
Data-Based Receding Horizon Control of Linear Network Systems

This work was supported by ARO-W911NF-18-1-0213.

Ahmed Allibhoy and Jorge Cortés. The authors are with the Department of Mechanical and Aerospace Engineering, UC San Diego, {aallibho,cortes}@ucsd.edu.
Abstract

We propose a distributed data-based predictive control scheme to stabilize a network system described by linear dynamics. Agents cooperate to predict the future system evolution without knowledge of the dynamics, relying instead on learning a data-based representation from a single sample trajectory. We employ this representation to reformulate the finite-horizon Linear Quadratic Regulator problem as a network optimization with separable objective functions and locally expressible constraints. We show that the controller resulting from approximately solving this problem using a distributed optimization algorithm in a receding horizon manner is stabilizing. We validate our results through numerical simulations.

Index Terms:
Data-based control, network systems, predictive control of linear systems

I Introduction

With the growing complexity of engineering systems, data-based methods in control theory are becoming increasingly popular, particularly for systems where it is too difficult to develop models from first principles and parameter identification is impractical or too costly. An important class of such systems are network systems, which arise in many applications such as neuroscience, power systems, traffic management, and robotics. Without a system model, agents must use sampled data to characterize the network behavior. However, the decentralized nature of the system means that agents only have access to information that can be measured locally, and must coordinate with one another to predict the network response and decide their control actions. These observations motivate the focus here on distributed data-based control of network systems with linear dynamics.

Literature Review

Distributed control of network systems is a burgeoning area of research, see e.g., [1, 2, 3] and references therein. In general, designing optimal controllers for network systems is an NP-hard problem, but under certain conditions optimal distributed controllers for linear systems can be obtained as the solution to a convex program [4]. When these conditions do not hold, suboptimal controllers can be obtained by convex relaxations [5, 6] or convex restrictions [7] of the original problem. Although these methods produce distributed controllers, the computation of the controller itself is typically done offline, in a centralized manner, and requires knowledge of the underlying system model. Reinforcement learning (RL) is an increasingly popular approach for controlling robots [8] and multi-agent systems [9]. However, RL approaches typically require a very large number of samples to perform effectively [10] and their complexity makes it difficult to get stability, safety, and robustness guarantees as is standard with other control approaches. For applications where safety assurances are required, model predictive control (MPC) is widely used since performance and safety constraints can be directly incorporated into an optimization problem that is solved online. Several distributed MPC formulations are available for multi-agent systems where the dynamics of the agents are coupled, such as [11, 12] where each agent implements a control policy minimizing its own objective while accounting for network interactions locally, or [13] where agents cooperate to minimize a system-wide objective using a network optimization algorithm. Data-based approaches to predictive control have also been proposed. System identification [14] is often leveraged to learn a parameterized model which can then be used with any of the MPC formulations previously mentioned. Methods for implementing a controller directly from sampled data without any intermediate identification also exist. The fundamental lemma from behavioral systems theory [15], which characterizes system trajectories from a single sample trajectory, has recently gained attention in the area of data-based control [16, 17, 18], and has been used for predictive control in the recently developed DeePC framework [19, 20]. Our work here extends the DeePC framework to network systems where each node only has partial access to the data.

Statement of Contributions

We develop distributed data-based feedback controllers for network systems.¹ We consider a group of agents whose states evolve according to unknown coupled linear dynamics; each agent has access to its own state and those of its neighbors in some sample trajectory. Their collective objective is to drive the network state to the origin while minimizing a quadratic objective function without knowledge of the system dynamics. The approach we use computes the control policy online and in a distributed manner by extending the DeePC formalism to the network case. Building upon the fundamental lemma, we introduce a new distributed, data-based representation of possible network trajectories. We use this representation to pose the control synthesis as a network optimization problem, without state or input constraints, in terms of the data available to each agent. We show that this optimization problem is equivalent to the standard finite-horizon Linear Quadratic Regulation (LQR) problem and introduce a primal-dual method along with a suboptimality certificate to allow agents to cooperatively find an approximate solution. Finally, we show that the controller that results from implementing the distributed solver in a receding horizon manner is stabilizing.

¹Throughout the paper, we make use of the following notation. Given $a,b\in\mathbb{Z}$ with $a<b$, let $[a,b]=\{a,a+1,\dots,b\}$. Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be an undirected graph with $N$ nodes, where $\mathcal{V}=[1,N]$ and $\mathcal{E}\subset\mathcal{V}\times\mathcal{V}$. The neighbors of $i\in\mathcal{V}$ are $\mathcal{N}_i=\{j:(i,j)\in\mathcal{E}\}$. Given $S=\{s_1,s_2,\dots,s_M\}\subseteq[1,N]$ and a vector $x=[x_1^{\mathsf{T}},x_2^{\mathsf{T}},\dots,x_N^{\mathsf{T}}]^{\mathsf{T}}$, we denote $x_S=[x_{s_1}^{\mathsf{T}}\;x_{s_2}^{\mathsf{T}}\;\cdots\;x_{s_M}^{\mathsf{T}}]^{\mathsf{T}}$. For $x_i\in\mathbb{R}^{d_i}$ with $i\in[1,K]$, we let $\textnormal{col}(x_1,x_2,\dots,x_K)=[x_1^{\mathsf{T}},x_2^{\mathsf{T}},\dots,x_K^{\mathsf{T}}]^{\mathsf{T}}$. For positive semidefinite $Q\in\mathbb{R}^{n\times n}$, we denote $\lVert x\rVert_Q=\sqrt{x^{\mathsf{T}}Qx}$. For $M\in\mathbb{R}^{n\times m}$, we denote by $M^{\dagger}$ its Moore-Penrose pseudoinverse. The Hankel matrix of a signal $w:[0,T-1]\to\mathbb{R}^k$ with $t\leq T$ block rows is the $kt\times(T-t+1)$ matrix
$$\mathcal{H}_t(w)=\begin{bmatrix}w(0)&w(1)&\cdots&w(T-t)\\ w(1)&w(2)&\cdots&w(T-t+1)\\ \vdots&\vdots&\ddots&\vdots\\ w(t-1)&w(t)&\cdots&w(T-1)\end{bmatrix}.$$
Given two signals $v_1:[0,T-1]\to\mathbb{R}^{k_1}$ and $v_2:[0,T-1]\to\mathbb{R}^{k_2}$, let $v=\textnormal{col}(v_1,v_2)$ be the signal where $v(t)=v_1(t)$ for $0\leq t<T$ and $v(t)=v_2(t-T)$ for $T\leq t<2T$.

II Preliminaries

We briefly recall here basic concepts on the identifiability of Linear Time-Invariant (LTI) systems from data. Given $t,T_d\in\mathbb{Z}_{\geq 0}$ with $t<T_d$, a signal $w:[0,T_d-1]\to\mathbb{R}^k$ is persistently exciting of order $t$ if $\textnormal{rowrank}\,\mathcal{H}_t(w)=kt$. Informally, this means that any arbitrary signal $v:[0,t-1]\to\mathbb{R}^k$ can be described as a linear combination of windows of width $t$ in the signal $w$. A necessary condition for persistence of excitation is $T_d\geq(k+1)t-1$.
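To make these definitions concrete, the following sketch (ours, in Python with NumPy, not part of the paper) builds the block-Hankel matrix $\mathcal{H}_t(w)$ and checks the rank condition for persistency of excitation:

```python
import numpy as np

def hankel_matrix(w: np.ndarray, t: int) -> np.ndarray:
    """Block-Hankel matrix H_t(w) of a signal w whose rows are the samples
    w(0), ..., w(T-1) in R^k; returns the k*t x (T - t + 1) matrix of the
    definition in the notation footnote."""
    T, k = w.shape
    cols = T - t + 1
    H = np.empty((k * t, cols))
    for i in range(t):  # i-th block row holds w(i), ..., w(i + cols - 1)
        H[i * k:(i + 1) * k, :] = w[i:i + cols, :].T
    return H

def is_persistently_exciting(u: np.ndarray, t: int) -> bool:
    """u (shape (T_d, m)) is persistently exciting of order t iff H_t(u)
    has full row rank m*t."""
    H = hankel_matrix(u, t)
    return np.linalg.matrix_rank(H) == H.shape[0]
```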

Lemma II.1.

(Fundamental Lemma [15]): Consider the LTI system $x(t+1)=Ax(t)+Bu(t)$, with $(A,B)$ controllable. Let $u^d:[0,T_d-1]\to\mathbb{R}^m$, $x^d:[0,T_d-1]\to\mathbb{R}^n$ be sequences such that $w^d=\textnormal{col}(u^d,x^d)$ is a trajectory of the system and $u^d$ is persistently exciting of order $n+\tau$. Then for any pair $u:[0,\tau-1]\to\mathbb{R}^m$, $x:[0,\tau-1]\to\mathbb{R}^n$, $w=\textnormal{col}(u,x)$ is a trajectory of the system if and only if there exists $g\in\mathbb{R}^{T_d-\tau+1}$ such that $\mathcal{H}_\tau(w^d)g=w$.

Lemma II.1 is stated here in state-space form, even though the result was originally presented in the language of behavioral systems theory. The result states that all trajectories of a controllable LTI system can be characterized by a single trajectory if the corresponding input is persistently exciting of sufficiently high order, obviating the need for a model or parameter estimation when designing a controller.
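As an illustration, Lemma II.1 turns the question "is $w$ a trajectory of the system?" into a linear feasibility problem. A minimal sketch, reusing the hankel_matrix helper above (our naming) and assuming the data was generated under a persistently exciting input:

```python
def is_system_trajectory(w_d: np.ndarray, w: np.ndarray, tol: float = 1e-8) -> bool:
    """Per Lemma II.1, w (shape (tau, m + n)) is a trajectory of the system
    that generated w_d (shape (T_d, m + n)) iff H_tau(w_d) g = w is solvable."""
    tau = w.shape[0]
    H = hankel_matrix(w_d, tau)
    target = w.reshape(-1)  # stack w(0), ..., w(tau - 1)
    g, *_ = np.linalg.lstsq(H, target, rcond=None)
    return float(np.linalg.norm(H @ g - target)) <= tol
```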

III Problem Formulation

Consider a network system described by an undirected graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with $N$ nodes. Each node corresponds to an agent with sensing, communication, and computation capabilities. Each edge corresponds to both a physical coupling and a communication link between the corresponding agents. A subset of the nodes $S\subset\mathcal{V}$, with $|S|=M$, also has actuation capabilities via inputs $u_i\in\mathbb{R}^{m_i}$. The system dynamics are then

$$x_i(t+1)=\begin{cases}A_{ii}x_i(t)+\sum_{j\in\mathcal{N}_i}A_{ij}x_j(t)+B_iu_i(t)&i\in S,\\ A_{ii}x_i(t)+\sum_{j\in\mathcal{N}_i}A_{ij}x_j(t)&i\notin S,\end{cases}\qquad(1)$$

where $x_i\in\mathbb{R}^{n_i}$, $A_{ij}\in\mathbb{R}^{n_i\times n_j}$, and $B_i\in\mathbb{R}^{n_i\times m_i}$. Let $n=\sum_{i=1}^N n_i$ and $m=\sum_{i\in S}m_i$, and define $x=\textnormal{col}(x_1,x_2,\dots,x_N)\in\mathbb{R}^n$ and $u=\textnormal{col}(u_{s_1},u_{s_2},\dots,u_{s_M})\in\mathbb{R}^m$. Let $A\in\mathbb{R}^{n\times n}$ and $B\in\mathbb{R}^{n\times m}$ be matrices so that (1) takes the compact form $x(t+1)=Ax(t)+Bu(t)$.
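For intuition on this block structure, here is a small sketch (our construction, not part of the paper's method, which never forms $A$ and $B$) assembling the global matrices of the compact form from the local blocks $A_{ij}$ and $B_i$:

```python
import numpy as np

def assemble_network_matrices(A_blocks, B_blocks, n_dims, m_dims, S):
    """Build global (A, B) with x(t+1) = A x(t) + B u(t) from (1).
    A_blocks: dict (i, j) -> A_ij, nonzero only for j == i or j in N_i;
    B_blocks: dict i -> B_i for i in S; n_dims/m_dims: local dimensions;
    S: ordered list of actuated nodes (s_1, ..., s_M)."""
    xoff = np.concatenate(([0], np.cumsum(n_dims)))
    uoff = np.concatenate(([0], np.cumsum([m_dims[i] for i in S])))
    A = np.zeros((xoff[-1], xoff[-1]))
    B = np.zeros((xoff[-1], uoff[-1]))
    for (i, j), Aij in A_blocks.items():
        A[xoff[i]:xoff[i + 1], xoff[j]:xoff[j + 1]] = Aij
    for k, i in enumerate(S):  # k-th input block belongs to node s_k
        B[xoff[i]:xoff[i + 1], uoff[k]:uoff[k + 1]] = B_blocks[i]
    return A, B
```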

To each node $i\in\mathcal{V}$, we associate an objective of the form $J_i(x_i,u_i)=\sum_{t=0}^{T-1}\lVert x_i(t)\rVert_{Q_i}^2+\lVert u_i(t)\rVert_{R_i}^2$ when $i\in S$ and $J_i(x_i)=\sum_{t=0}^{T-1}\lVert x_i(t)\rVert_{Q_i}^2$ otherwise. Here, each $Q_i\in\mathbb{R}^{n_i\times n_i}$ is positive semidefinite, each $R_i\in\mathbb{R}^{m_i\times m_i}$ is positive definite, $\textnormal{col}(u,x)$ is a system trajectory, and $T$ is the time horizon of trajectories being considered.

Each node wants to drive its state $x_i$ to the origin while minimizing $J_i$ and satisfying the constraints. The resulting network objective function is the sum of the objective functions across the nodes. Letting $Q=\text{blkdiag}(Q_1,Q_2,\dots,Q_N)\in\mathbb{R}^{n\times n}$ and $R=\text{blkdiag}(R_{s_1},R_{s_2},\dots,R_{s_M})\in\mathbb{R}^{m\times m}$, this objective can be written as

$$J(x,u)=\sum_{i\in S}J_i(x_i,u_i)+\sum_{i\in\mathcal{V}\setminus S}J_i(x_i),$$

so that $J(x,u)=\sum_{t=0}^{T-1}\lVert x(t)\rVert_Q^2+\lVert u(t)\rVert_R^2$. If the system starts from $x(0)=x^0\in\mathbb{R}^n$, the agents' goal can be formulated as the network optimization problem:

$$\begin{aligned}\underset{u,x}{\text{minimize}}\quad&\sum_{t=0}^{T-1}\lVert x(t)\rVert_Q^2+\lVert u(t)\rVert_R^2&&\qquad(2)\\ \text{subject to}\quad&x(t+1)=Ax(t)+Bu(t),&&t\in[0,T-1],\\ &x(0)=x^0,\;x(T)=0.\end{aligned}$$

Note that the agents' decisions on their control inputs are coupled through the constraints. Since $R\succ 0$, if (2) is feasible, its optimal state and input trajectories are unique.

A key aspect of this paper is that we consider scenarios where the system matrices $A$ and $B$ are unknown to the network. Instead, we assume that, for a set of given input sequences $\{u_i^d:[0,T_d-1]\to\mathbb{R}^{m_i}\}_{i\in S}$, the corresponding state trajectories $\{x_i^d:[0,T_d-1]\to\mathbb{R}^{n_i}\}_{i\in\mathcal{V}}$ are available, and each node $i\in\mathcal{V}$ has access to its own state trajectory as well as those of its neighbors. Actuated nodes $i\in S$ also have access to their own input $u_i^d$, but this input is unknown to their neighbors $\mathcal{N}_i$. Our aim is to synthesize a control policy that can be implemented by each node in a distributed way with the data available to it. The resulting controller should stabilize the system to the origin while minimizing $J(x,u)$.

IV Data-Based Representation for Optimization

Here, we introduce a data-based representation of system trajectories that is employed to pose a network optimization problem equivalent to (2). Throughout this section, we let $x^d:[0,T_d-1]\to\mathbb{R}^n$, $u^d:[0,T_d-1]\to\mathbb{R}^m$ be sequences such that $w^d(t)=\textnormal{col}(u^d(t),x^d(t))$ is a trajectory of (1). Let

$$w_i^d(t)=\begin{cases}\textnormal{col}(u_i^d(t),x_{\mathcal{N}_i}^d(t),x_i^d(t))&\text{if }i\in S,\\ \textnormal{col}(x_{\mathcal{N}_i}^d(t),x_i^d(t))&\text{if }i\notin S,\end{cases}$$

for each $i\in\mathcal{V}$ and $0\leq t\leq T_d-1$. Then $w_i^d$ is the data available to node $i$. Let $u:[0,\tau-1]\to\mathbb{R}^m$, $x:[0,\tau-1]\to\mathbb{R}^n$ be arbitrary sequences, where $T_d\geq(n+m)\tau-1$. Define $w(t)=\textnormal{col}(u(t),x(t))$ and

$$w_i(t)=\begin{cases}\textnormal{col}(u_i(t),x_{\mathcal{N}_i}(t),x_i(t))&\text{if }i\in S,\\ \textnormal{col}(x_{\mathcal{N}_i}(t),x_i(t))&\text{if }i\notin S.\end{cases}$$

Let $k_i=n_i\tau+\sum_{j\in\mathcal{N}_i}n_j\tau+m_i\tau$ for $i\in S$, and $k_i=n_i\tau+\sum_{j\in\mathcal{N}_i}n_j\tau$ otherwise. We define $E_i\in\{0,1\}^{k_i\times(m+n)\tau}$ to be the selection matrix consisting of all ones and zeros such that $E_iw^d=w_i^d$ and $E_iw=w_i$.

IV-A Data-Based Representation of Network Trajectories

Lemma II.1 states conditions under which the behavior of the system can be described completely by the Hankel matrix of the sampled data. Here we extend Lemma II.1 to the setting of a network system to build a data-based representation of network trajectories using the Hankel matrices of the data available to each agent, $\mathcal{H}_\tau(w_i^d)$. We show that under certain conditions the image of $\mathcal{H}_\tau(w_i^d)$ is the set of all possible trajectories of node $i$.

Proposition IV.1.

(Sufficiency of Data-Based Image Representation): If for each $i\in\mathcal{V}$ there exists $g_i\in\mathbb{R}^{T_d-\tau+1}$ with $\mathcal{H}_\tau(w_i^d)g_i=w_i$, then $w$ is a trajectory of (1).

Proof.

Writing $g_i=(g_i(0),g_i(1),\dots,g_i(T_d-\tau))^{\mathsf{T}}$, we have $w_i(t)=\sum_{k=0}^{T_d-\tau}g_i(k)w_i^d(t+k)$ for all $0\leq t\leq\tau-1$. In particular, $x_i(t+1)=\sum_{k=0}^{T_d-\tau}g_i(k)x_i^d(t+k+1)$ for $0\leq t<\tau-1$, so for $i\notin S$,

$$\begin{aligned}x_i(t+1)&=\sum_{k=0}^{T_d-\tau}g_i(k)\Big(A_{ii}x_i^d(t+k)+\sum_{j\in\mathcal{N}_i}A_{ij}x_j^d(t+k)\Big)\\&=A_{ii}x_i(t)+\sum_{j\in\mathcal{N}_i}A_{ij}x_j(t).\end{aligned}$$

By a similar computation, we can show that for each $i\in S$,

$$x_i(t+1)=A_{ii}x_i(t)+\sum_{j\in\mathcal{N}_i}A_{ij}x_j(t)+B_iu_i(t),$$

which is consistent with (1). ∎

Next we identify conditions for the converse of the above result to hold, i.e., when the Hankel matrices of all the agents characterize all possible network trajectories.

Proposition IV.2.

(Necessity of Data-Based Image Representation): If $(A,B)$ is controllable, $w^d$ is a trajectory of (1), and either

  1. (i)

    $u^d$ is persistently exciting of order $n+\tau$;

  2. (ii)

    $\textnormal{col}(u_i^d,x_{\mathcal{N}_i}^d)$ is persistently exciting of order $n_i+\tau$ for each $i\in S$, and $x_{\mathcal{N}_i}^d$ is persistently exciting of order $n_i+\tau$ for each $i\in\mathcal{V}\setminus S$;

then for all $i\in\mathcal{V}$ there exists $g_i\in\mathbb{R}^{T_d-\tau+1}$ such that $\mathcal{H}_\tau(w_i^d)g_i=w_i$.

Proof.

In the case of (i), we simply apply Lemma II.1 to obtain $g\in\mathbb{R}^{T_d-\tau+1}$ with $\mathcal{H}_\tau(w^d)g=w$, and note that for all $i\in\mathcal{V}$, $w_i=E_iw=E_i\mathcal{H}_\tau(w^d)g=\mathcal{H}_\tau(w_i^d)g$, so the result follows by letting $g_i=g$. For case (ii), we think of $x_j$ for $j\in\mathcal{N}_i$ as an input to node $i$. Letting $k=|\mathcal{N}_i|$, where $\mathcal{N}_i=\{j_1,j_2,\dots,j_k\}$, and defining

$$\tilde{B}_i=\begin{cases}\begin{bmatrix}A_{ij_1}&A_{ij_2}&\cdots&A_{ij_k}&B_i\end{bmatrix}&i\in S,\\ \begin{bmatrix}A_{ij_1}&A_{ij_2}&\cdots&A_{ij_k}\end{bmatrix}&i\notin S,\end{cases}$$

we have

$$x_i(t+1)=\begin{cases}A_{ii}x_i(t)+\tilde{B}_i\,\textnormal{col}(x_{j_1}(t),\dots,x_{j_k}(t),u_i(t))&i\in S,\\ A_{ii}x_i(t)+\tilde{B}_i\,\textnormal{col}(x_{j_1}(t),\dots,x_{j_k}(t))&i\notin S.\end{cases}$$

Let $x_i^0\in\mathbb{R}^{n_i}$ be arbitrary, and let $x^0\in\mathbb{R}^n$ be such that its $i$th block component is $x_i^0$. Since $(A,B)$ is controllable, there exists an input $\bar{u}:[0,n]\to\mathbb{R}^m$ such that the corresponding state trajectory $\bar{x}:[0,n]\to\mathbb{R}^n$ with $\bar{x}(0)=x^0$ has $\bar{x}(n)=0$. Note that if $i\in S$, then $\bar{x}_i$ is the state trajectory corresponding to the input $\textnormal{col}(\bar{u}_i,\bar{x}_{\mathcal{N}_i})$, with $\bar{x}_i(0)=x_i^0$ and $\bar{x}_i(n)=0$. Likewise, if $i\notin S$, $\bar{x}_i$ is the state trajectory corresponding to the input $\bar{x}_{\mathcal{N}_i}$, with $\bar{x}_i(0)=x_i^0$ and $\bar{x}_i(n)=0$. Hence $(A_{ii},\tilde{B}_i)$ is controllable for all $i\in\mathcal{V}$ and the result follows from Lemma II.1. ∎

Remark IV.3.

(Feasibility of Identifiability Conditions): Proposition IV.2 identifies when the data is rich enough to characterize all possible trajectories of the system. Condition (i) constrains the input sequence $u^d$ and guarantees a priori the identifiability of the system from data. This condition is generically true, in the sense that the set of sequences $u^d$ that are not persistently exciting of order $n+\tau$ (even though $u_i^d$ is, for each $i\in S$) has zero Lebesgue measure. In general, it is difficult to verify condition (i) in a distributed manner. On the other hand, it is straightforward to verify condition (ii) using only information available to the individual agents, as sketched below. However, this verification must be done in an ad hoc manner, after the input has been applied to the system. While the condition is sufficient for identifiability, there are systems where, for every input $u^d$, the resulting trajectory $w^d$ never satisfies it. $\square$
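For instance, each agent could run the local check of condition (ii) along the following lines (a sketch reusing the hypothetical is_persistently_exciting helper from Section II):

```python
def check_condition_ii(u_i_d, x_neigh_d, n_i, tau, actuated):
    """Local check of condition (ii) of Proposition IV.2 at one node.
    u_i_d: (T_d, m_i) local input data (used only if the node is actuated);
    x_neigh_d: (T_d, sum of neighbor n_j) stacked neighbor state data."""
    v = np.hstack([u_i_d, x_neigh_d]) if actuated else x_neigh_d
    return is_persistently_exciting(v, n_i + tau)
```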

IV-B Equivalent Network Optimization Problem

Here, we build on the data-based image representation of network trajectories to pose, in a distributed fashion, a network optimization problem that can be solved with the data available to each agent and that is equivalent to an LQR problem with time horizon $T$. Each node can use this representation, along with $T_{\text{ini}}>0$ past states and inputs, to predict future trajectories, provided the hypotheses of Proposition IV.2 are satisfied. Formally, let $\tau=T_{\text{ini}}+T$ and let $u^{\text{ini}}:[0,T_{\text{ini}}-1]\to\mathbb{R}^m$ and $x^{\text{ini}}:[0,T_{\text{ini}}-1]\to\mathbb{R}^n$ be sequences such that $\textnormal{col}(u^{\text{ini}},x^{\text{ini}})$ is a $T_{\text{ini}}$-long trajectory of the system. In the network optimization we introduce below, we optimize over system trajectories $\textnormal{col}(u,x)$ of length $\tau$, constrained so that the first $T_{\text{ini}}$ samples of $u$ and $x$ are $u^{\text{ini}}$ and $x^{\text{ini}}$, respectively. This plays a similar role to the initial condition constraint in (2).

For each node i𝒱i\in\mathcal{V}, define

$$H_i=\begin{bmatrix}\mathcal{H}_\tau(u_i^d)\\ \mathcal{H}_\tau(x_{\mathcal{N}_i}^d)\\ \mathcal{H}_\tau(x_i^d)\end{bmatrix}\text{ if }i\in S\quad\text{and}\quad H_i=\begin{bmatrix}\mathcal{H}_\tau(x_{\mathcal{N}_i}^d)\\ \mathcal{H}_\tau(x_i^d)\end{bmatrix}\text{ if }i\notin S.\qquad(3)$$
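In code, each node could assemble $H_i$ by stacking the Hankel matrices of its locally available data, e.g. (a sketch with the same hypothetical helpers as before):

```python
def local_data_matrix(u_i_d, x_neigh_d, x_i_d, tau, actuated):
    """Stack the local Hankel matrices as in (3); each argument holds the
    locally available samples with shape (T_d, dim)."""
    blocks = [hankel_matrix(u_i_d, tau)] if actuated else []
    blocks.append(hankel_matrix(x_neigh_d, tau))
    blocks.append(hankel_matrix(x_i_d, tau))
    return np.vstack(blocks)
```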

Consider the following problem

$$\begin{aligned}\underset{g_i,u_i,x_i}{\text{minimize}}\quad&\sum_{i\in S}J_i(x_i,u_i)+\sum_{i\in\mathcal{V}\setminus S}J_i(x_i)&&\qquad(4)\\ \text{subject to}\quad&H_ig_i=\textnormal{col}(u_i^{\text{ini}},u_i,x_{\mathcal{N}_i}^{\text{ini}},x_{\mathcal{N}_i},x_i^{\text{ini}},x_i),&&i\in S,\\ &H_ig_i=\textnormal{col}(x_{\mathcal{N}_i}^{\text{ini}},x_{\mathcal{N}_i},x_i^{\text{ini}},x_i),&&i\notin S,\\ &x_i(T)=0,&&i\in\mathcal{V}.\end{aligned}$$

Although (4) does not necessarily have a unique optimizer, any optimizer $(g^*,u^*,x^*)$ of (4) is such that $u^*$ and $x^*$ are the optimal input and state sequences of (2), as formalized next.

Proposition IV.4.

(Equivalent Network Optimization): Consider the system (1) and a sample trajectory $w^d$ satisfying the hypotheses of Proposition IV.2, and let $x^0=Ax^{\text{ini}}(T_{\text{ini}}-1)+Bu^{\text{ini}}(T_{\text{ini}}-1)$. Then the following hold:

  1. (i)

    If problem (2) is feasible, then it has a unique optimizer;

  2. (ii)

    Problem (4) is feasible if and only if (2) is feasible;

  3. (iii)

    If (4) is feasible, $(u^{1,*},x^{1,*})$ is the optimizer of (2), and $(g^*,u^{2,*},x^{2,*})$ is an optimizer of (4), then $u^{1,*}=u^{2,*}$ and $x^{1,*}=x^{2,*}$.

We omit the proof for space reasons, but note that it is the analogue of Theorem 5.1 and Corollary 5.2 in [19] for the case of network systems, once one invokes Propositions IV.1 and IV.2 above. Unlike the original network optimization problem (2), for which agents lack knowledge of the system matrices $A$, $B$, the network optimization problem (4) can be solved in a distributed way with the information available to them. The structure of the problem (aggregate objective functions plus locally expressible constraints) makes it amenable to a variety of distributed optimization algorithms, see e.g., [21, 22]. In Section V-B below, we employ a primal-dual dynamic to find asymptotically a solution of (4) in a distributed way.
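As a centralized reference point (useful for validating a distributed implementation), the single-node specialization of (4), which coincides with the DeePC problem of [19], can be solved with an off-the-shelf QP modeling tool. A sketch using cvxpy, with variable names of our choosing:

```python
import cvxpy as cp

def deepc_step(H_u, H_x, u_ini, x_ini, Q, R, T_ini, T):
    """Single-node version of (4): H_u, H_x are Hankel matrices of the input
    and state data with T_ini + T block rows; u_ini, x_ini stack the most
    recent T_ini inputs/states; returns optimal future input and state."""
    m = H_u.shape[0] // (T_ini + T)
    n = H_x.shape[0] // (T_ini + T)
    g = cp.Variable(H_u.shape[1])
    u = cp.Variable(m * T)
    x = cp.Variable(n * T)
    cost = sum(cp.quad_form(x[t * n:(t + 1) * n], Q)
               + cp.quad_form(u[t * m:(t + 1) * m], R) for t in range(T))
    constraints = [H_u[:m * T_ini] @ g == u_ini,   # past inputs match data
                   H_u[m * T_ini:] @ g == u,       # future inputs
                   H_x[:n * T_ini] @ g == x_ini,   # past states match data
                   H_x[n * T_ini:] @ g == x,       # future states
                   x[(T - 1) * n:] == 0]           # terminal constraint x(T) = 0
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u.value, x.value
```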

Remark IV.5.

(Scalability of Network Optimization): As the number of nodes in the network increases, so does the state dimension; hence more data is required to maintain persistency of excitation. A necessary condition is $T_d\geq(n+m+1)(T_{\text{ini}}+T)-1$. Assuming that $T\sim O(n)$ and $T_{\text{ini}}\sim O(1)$, we have $T_d\sim O(n^2+mn)$. The decision variable for each node is $z_i=\textnormal{col}(g_i,u_i,x_i)$ when $i\in S$ and $z_i=\textnormal{col}(g_i,x_i)$ otherwise. The size of $z_i$ is $O(n^2+mn)$. However, using the distributed optimization algorithm of Section V-B, agent $i$ only needs to send messages of size $O(k_i)$ to each neighbor $j\in\mathcal{N}_i$. $\square$

V Distributed Data-Based Predictive Control

Here we introduce a distributed data-based predictive control scheme to stabilize the system (1) to the origin, as described in Section III. To do so, we solve the network optimization problem (4) in a receding horizon manner, with $u^{\text{ini}}$ and $x^{\text{ini}}$ updated at every time step based on the system's current state. The control scheme is summarized in Algorithm 1.

Algorithm 1 Distributed Data-Based Predictive Control
1: Input: Sample trajectory $w^d$, performance indices $(Q_i)_{i=1}^N$, $(R_i)_{i=1}^N$
2: Initialize $H_i$ as in equation (3), let $u^{\text{ini}}$ and $x^{\text{ini}}$ be the $T_{\text{ini}}$ most recent inputs and states, respectively, and set $t=0$.
3: while $\lVert x\rVert>0$ do
4:     Use a distributed optimization algorithm to obtain an approximate solution to (4), $\hat{z}=\textnormal{col}(\hat{g},\hat{u},\hat{x})$, such that $\lVert\hat{u}-u^*\rVert\leq\delta\min\{1,\lVert x^{\text{ini}}(T_{\text{ini}}-1)\rVert\}$.
5:     Apply the input $\hat{u}(0)$.
6:     Set $t$ to $t+1$, and $u^{\text{ini}}$ and $x^{\text{ini}}$ to the $T_{\text{ini}}$ most recent inputs and states, respectively.

The rest of the section proceeds by first showing that the controller resulting from Algorithm 1 is stabilizing even when the network optimization (4) is solved only approximately, and then introducing a particular distributed solver for (4), along with a suboptimality certificate to check, in a distributed manner, the stopping condition in Step 4 of Algorithm 1.
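For concreteness, the receding horizon loop of Algorithm 1 could look as follows (a sketch; solve_step stands for the distributed solver of Section V-B and plant_step for the physical system, both hypothetical callables):

```python
import numpy as np

def receding_horizon_control(plant_step, solve_step, u_ini, x_ini, delta, steps=50):
    """Sketch of Algorithm 1. u_ini, x_ini: lists of the T_ini most recent
    inputs/states; solve_step(u_ini, x_ini, tol) returns an approximate
    minimizer u_hat of (4) meeting the tolerance of Step 4."""
    x = x_ini[-1]
    for _ in range(steps):
        tol = delta * min(1.0, float(np.linalg.norm(x_ini[-1])))
        u_hat = solve_step(u_ini, x_ini, tol)
        u0 = u_hat[0]                # apply only the first predicted input
        x = plant_step(x, u0)
        u_ini = u_ini[1:] + [u0]     # slide the T_ini-window forward
        x_ini = x_ini[1:] + [x]
    return x
```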

V-A Stability Analysis of Closed-Loop System

In the rest of the paper, we rely on the following assumption.

Assumption V.1.

Consider the system (1), and suppose that:

  1. (i)

    the collected data satisfies the hypotheses of Proposition IV.2;

  2. (ii)

    the optimization problem (4) is feasible for every $(u^{\text{ini}},x^{\text{ini}})$ that is a valid $T_{\text{ini}}$-sample-long system trajectory.

Since the system is controllable (cf. (i)), a sufficient condition for guaranteeing the feasibility in (ii) is $T\geq n$. In fact, in such a case, there exists a trajectory $(u,x)$ such that $x(T)=0$. It follows that $(u,x)$ is feasible for (2), so by Proposition IV.4, there exists some $g$ such that $(g,u,x)$ is feasible for (4).

Under Assumption V.1, the closed-loop system with the controller corresponding to a receding horizon implementation of (2) is globally exponentially stable, cf. [23, Theorem 12.2]. Unlike [19], we do not assume we have access to the exact solution of (4), since distributed optimization algorithms typically only converge asymptotically to the true optimizer and must be terminated in finite time. Here we show that Algorithm 1 still stabilizes the system when the tolerance $\delta$ is sufficiently small.

Theorem V.2.

(Distributed, Data-Based Predictive Control is Stabilizing): For $\delta>0$, let $\phi_\delta:\mathbb{R}^n\to\mathbb{R}^m$ be the feedback control corresponding to Algorithm 1. Under Assumption V.1, there exists $\delta^*>0$ such that for all $\delta<\delta^*$, the origin is globally asymptotically stable with respect to the closed-loop dynamics $x(t+1)=Ax(t)+B\phi_\delta(x(t))$.

Proof.

Let $\phi_{\text{mpc}}:\mathbb{R}^n\to\mathbb{R}^m$ be the feedback corresponding to a receding horizon implementation of (2). Consider the system $x(t+1)=f(x(t),v(t))$, where $f(x,v)=Ax+B\phi_{\text{mpc}}(x)+Bv$. Let $J^*(x^0)=J(x^*,u^*)$, where $(u^*,x^*)$ is the optimizer of (2). Because (4) is nondegenerate, $J^*$ is continuously differentiable, cf. [23, Theorem 6.9], and the system is input-to-state stable (ISS), with $J^*$ an ISS-Lyapunov function satisfying $J^*(f(x,v))-J^*(x)\leq-\alpha\lVert x\rVert^2+\sigma\lVert v\rVert^2$ for constants $\alpha,\sigma>0$, cf. [24, Theorem 1] (although the result there is stated for systems where $A$ is Schur stable, the same reasoning is valid when there are no state or input constraints and $A$ is unstable). Because the system $x(t+1)=f(x(t),0)$ without disturbances is exponentially stable [25], it follows from the proof of [26, Lemma 3.5] that the gain function is linear, so there exist $\gamma>0$ and a class $\mathcal{KL}$ function $\beta$ such that

$$\lVert x(t)\rVert\leq\beta(\lVert x(0)\rVert,t)+\gamma\sup_{0\leq\tau\leq t}\lVert v(\tau)\rVert,$$

for all $t\in\mathbb{Z}_{\geq 0}$. Let $x:\mathbb{Z}_{\geq 0}\to\mathbb{R}^n$ be a trajectory of the closed-loop dynamics of (1) with the controller described by Algorithm 1, where $\delta<\delta^*=\gamma^{-1}$, and define $v(t)=\phi_\delta(x(t))-\phi_{\text{mpc}}(x(t))$. It follows that $x(t+1)=f(x(t),v(t))$. Note that $\phi_{\text{mpc}}(x(t))=u^*(0;u^{\text{ini}},x^{\text{ini}})$, where $u^{\text{ini}}=\phi_\delta(x(t-1))$ and $x^{\text{ini}}=x(t-1)$, so $\lVert v(t)\rVert=\lVert\hat{u}(0;u^{\text{ini}},x^{\text{ini}})-u^*(0;u^{\text{ini}},x^{\text{ini}})\rVert\leq\delta\lVert x(t-1)\rVert$. We claim that for all $k\in\mathbb{N}$, there exists $T_k\in\mathbb{N}$ such that $\lVert x(t)\rVert\leq(k+1)(\gamma\delta)^k$ whenever $t\geq T_k$. The case $k=1$ follows by observing that $\lVert x(t)\rVert\leq\beta(\lVert x(0)\rVert,t)+\gamma\delta$ (since Step 4 of Algorithm 1 ensures $\lVert v(\tau)\rVert\leq\delta$) and that there exists $T_1$ such that $\beta(\lVert x(0)\rVert,t)<\gamma\delta$ for all $t\geq T_1$. If the claim holds for some $k$, then for all $t\geq T_k+1$,

$$\begin{aligned}\lVert x(t)\rVert&\leq\beta((k+1)(\gamma\delta)^k,t)+\gamma\sup_{T_k+1<\tau\leq t}\lVert v(\tau)\rVert\\&\leq\beta((k+1)(\gamma\delta)^k,t)+(k+1)(\gamma\delta)^{k+1},\end{aligned}$$

so by choosing $T_{k+1}$ such that $\beta((k+1)(\gamma\delta)^k,t)<(\gamma\delta)^{k+1}$ for all $t\geq T_{k+1}$, we get $\lVert x(t)\rVert<(k+2)(\gamma\delta)^{k+1}$ for all $t>T_{k+1}$, and the claim follows by induction. Since $\gamma\delta<1$, we conclude $\limsup_{t\to\infty}\lVert x(t)\rVert\leq\limsup_{k\to\infty}(k+1)(\gamma\delta)^k=0$. To show global Lyapunov stability, let $\eta>0$ be arbitrary, and suppose that $x(0)$ is chosen so that $\beta(\lVert x(0)\rVert,0)<(1-\gamma\delta)\eta$ and $\lVert x(0)\rVert<\eta$. Then for all $t>0$,

$$\begin{aligned}\lVert x(t)\rVert&\leq(1-\gamma\delta)\eta+\gamma\sup_{0\leq\tau\leq t}\lVert v(\tau)\rVert\\&\leq(1-\gamma\delta)\eta+\gamma\delta\lVert x(t-1)\rVert.\end{aligned}$$

If $\lVert x(t-1)\rVert<\eta$, then $\lVert x(t)\rVert<\eta$. It follows by induction on $t$ that $\lVert x(t)\rVert<\eta$ for all $t$. ∎

V-B Primal-Dual Solver for Network Optimization

In this section we introduce a method for solving the optimization problem (4) in a distributed way. We let

$$z_i=\begin{cases}\textnormal{col}(g_i,u_i,x_i)&i\in S,\\ \textnormal{col}(g_i,x_i)&i\notin S,\end{cases}$$

and $z=\textnormal{col}(z_1,\dots,z_N)$. Note that $z_i\in\mathbb{R}^{d_i}$, where $d_i=T_d-(T_{\text{ini}}+T)+1+(n_i+m_i)T$ for $i\in S$ and $d_i=T_d-(T_{\text{ini}}+T)+1+n_iT$ otherwise. Problem (4) can be written as

$$\begin{aligned}\underset{z_i\in\mathbb{R}^{d_i}}{\text{minimize}}\quad&\sum_{i\in\mathcal{V}}\lVert z_i\rVert_{\mathcal{Q}_i}^2&&\qquad(5)\\ \text{subject to}\quad&\mathcal{A}_iz_{\mathcal{N}_i}=b_i,&&i\in\mathcal{V},\end{aligned}$$

for suitable $\mathcal{Q}_i\in\mathbb{R}^{d_i\times d_i}$ with $\mathcal{Q}_i\succeq 0$, $\mathcal{A}_i\in\mathbb{R}^{c_i\times d_i}$, and $b_i\in\mathbb{R}^{c_i}$, with $c_i\in\mathbb{Z}_{\geq 0}$. The Lagrangian of (5) is

$$\mathcal{L}(z,\lambda)=\sum_{i\in\mathcal{V}}\Big(\lVert z_i\rVert_{\mathcal{Q}_i}^2+\lambda_i^{\mathsf{T}}(\mathcal{A}_iz_{\mathcal{N}_i}-b_i)\Big).$$

If $\lambda^*$ is an optimizer of the dual problem, then the pair $(z^*,\lambda^*)$ is a (min-max) saddle point of $\mathcal{L}$, meaning that $\mathcal{L}(z^*,\lambda)\leq\mathcal{L}(z^*,\lambda^*)\leq\mathcal{L}(z,\lambda^*)$ for all $z\in\mathbb{R}^d$ and $\lambda\in\mathbb{R}^c$. The saddle-point property of the Lagrangian suggests that the primal-dual flow, which descends along the gradient of the primal variable and ascends along the gradient of the dual variable,

$$\begin{bmatrix}\dot{z}_i\\ \dot{\lambda}_i\end{bmatrix}=\begin{bmatrix}-\nabla_{z_i}\mathcal{L}(z,\lambda)\\ \nabla_{\lambda_i}\mathcal{L}(z,\lambda)\end{bmatrix}=\begin{bmatrix}-2\mathcal{Q}_iz_i-F_{ii}\mathcal{A}_i^{\mathsf{T}}\lambda_i-\sum_{j\in\mathcal{N}_i}F_{ij}\mathcal{A}_j^{\mathsf{T}}\lambda_j\\ \mathcal{A}_iz_{\mathcal{N}_i}-b_i\end{bmatrix},\qquad(6)$$

can be used to compute the optimizer. Here, $F_{ij}\in\mathbb{R}^{d_i\times(d_j+\sum_{k\in\mathcal{N}_j}d_k)}$ is the matrix such that $F_{ij}z_{\mathcal{N}_j}=z_i$. By [22, Corollary 4.5], the flow converges asymptotically to a saddle point of $\mathcal{L}$. This procedure is fully distributed, since the flow equations in (6) can be computed with the information available to each agent or its direct neighbors. In particular, if $j\in\mathcal{N}_i$, then the message agent $j$ shares with agent $i$ consists of $\textnormal{col}(x_j,\lambda_j)\in\mathbb{R}^{n_jT+k_j}$, which is of size $O(k_j)$ (cf. Remark IV.5).
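In practice, the flow (6) is integrated numerically. A minimal forward-Euler sketch (our discretization; the step size eta is an assumption, and each node's neighborhood vector is ordered as $z_i$ followed by its neighbors' $z_j$):

```python
import numpy as np

def primal_dual_step(Q, A, b, F, neighbors, z, lam, eta=1e-3):
    """One forward-Euler step of the primal-dual flow (6). Q[i], A[i], b[i]
    hold each node's local data; F[(i, j)] realizes the selection F_ij;
    z[i], lam[i] are node i's primal and dual variables."""
    N = len(z)
    z_new, lam_new = [], []
    for i in range(N):
        zNi = np.concatenate([z[i]] + [z[j] for j in neighbors[i]])
        dz = -2.0 * Q[i] @ z[i] - F[(i, i)] @ (A[i].T @ lam[i])
        for j in neighbors[i]:
            dz -= F[(i, j)] @ (A[j].T @ lam[j])
        dlam = A[i] @ zNi - b[i]
        z_new.append(z[i] + eta * dz)
        lam_new.append(lam[i] + eta * dlam)
    return z_new, lam_new
```

Each node's update uses only its own data and the $(x_j,\lambda_j)$ received from its neighbors, matching the message structure described above.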

We conclude by providing a certificate that can be used to verify the stopping condition of Step 4 in Algorithm 1.

Proposition V.3.

(Suboptimality Certificate): Let $u^*$ and $x^*$ denote the optimal input and state trajectories of (2), $y=\textnormal{col}(z,\lambda)$, $\mathcal{Q}=\textnormal{diag}(\mathcal{Q}_1,\mathcal{Q}_2,\dots,\mathcal{Q}_N)$, $\mathcal{A}=[F_1^{\mathsf{T}}\mathcal{A}_1^{\mathsf{T}},F_2^{\mathsf{T}}\mathcal{A}_2^{\mathsf{T}},\dots,F_N^{\mathsf{T}}\mathcal{A}_N^{\mathsf{T}}]^{\mathsf{T}}$, $b=\textnormal{col}(b_1,b_2,\dots,b_N)$, and

$$M=\begin{bmatrix}-2\mathcal{Q}&-\mathcal{A}^{\mathsf{T}}\\ \mathcal{A}&0\end{bmatrix},\qquad q=\begin{bmatrix}0\\ -b\end{bmatrix}.$$

Under Assumption V.1, and with the flow given by (6), if $\lVert\textnormal{col}(\dot{z}_i,\dot{\lambda}_i)\rVert^2<\rho$ for all $i\in\mathcal{V}$, where $\rho=\frac{\delta^2}{N\lVert M^{\dagger}\rVert^2}$, then $\lVert u-u^*\rVert<\delta$.

Proof.

The set of saddle points of $\mathcal{L}$ is $\mathcal{S}=\{y\;|\;My+q=0\}$, and with the flow (6) we have $\dot{y}=My+q$. Since the optimal input and state trajectories are unique, all saddle points share the property that, for each $i\in S$, the $(u_i,x_i)$ components of their $z_i$ equal $(u_i^*,x_i^*)$. Given an arbitrary $y$, we therefore have $\lVert u-u^*\rVert\leq\lVert y-w\rVert$ for all $w\in\mathcal{S}$, and hence $\lVert u-u^*\rVert\leq\inf_{w\in\mathcal{S}}\lVert y-w\rVert$. The set of saddle points can also be described as $\mathcal{S}=\{\hat{y}\}+\ker M$, for any $\hat{y}\in\mathcal{S}$. Therefore, $\inf_{w\in\mathcal{S}}\lVert y-w\rVert=\inf_{v\in\ker M}\lVert y-\hat{y}-v\rVert$. Since $I-M^{\dagger}M$ is the orthogonal projection onto $\ker M$,

$$\begin{aligned}\inf_{v\in\ker M}\lVert y-\hat{y}-v\rVert&=\lVert(y-\hat{y})-(I-M^{\dagger}M)(y-\hat{y})\rVert\\&=\lVert M^{\dagger}(My-M\hat{y})\rVert=\lVert M^{\dagger}(My+q)\rVert\leq\lVert M^{\dagger}\rVert\lVert\dot{y}\rVert,\end{aligned}$$

and therefore,

$$\lVert u-u^*\rVert^2\leq\lVert M^{\dagger}\rVert^2\sum_{i\in\mathcal{V}}\lVert\dot{y}_i\rVert^2<\lVert M^{\dagger}\rVert^2\sum_{i\in\mathcal{V}}\frac{\delta^2}{N\lVert M^{\dagger}\rVert^2}=\delta^2.\;\;∎$$

The suboptimality certificate can be checked in a fully distributed manner using information locally available to each agent, provided that $\lVert M^{\dagger}\rVert$ is known. Because $M$ depends only on the objective $\mathcal{Q}$ and constraints $\mathcal{A}$, which in turn come from the sample trajectory $w^d$, it can be computed offline. Finally, it is possible for each agent to compute a bound on $\lVert M^{\dagger}\rVert$ using the fact that, for $M=[M_1^{\mathsf{T}},M_2^{\mathsf{T}},\dots,M_N^{\mathsf{T}}]^{\mathsf{T}}$, one has $\lVert M^{\dagger}\rVert\leq\lVert M_i^{\dagger}\rVert$ for all $i\in\mathcal{V}$. It follows that for all $i\in\mathcal{V}$,

$$\lVert M^{\dagger}\rVert\leq\left\lVert\begin{bmatrix}2\mathcal{Q}_i&F_{ii}\mathcal{A}_i^{\mathsf{T}}&F_{ij_1}\mathcal{A}_{j_1}^{\mathsf{T}}&\cdots&F_{ij_{|\mathcal{N}_i|}}\mathcal{A}_{j_{|\mathcal{N}_i|}}^{\mathsf{T}}\end{bmatrix}^{\dagger}\right\rVert,$$

where $\{j_1,j_2,\dots,j_{|\mathcal{N}_i|}\}=\mathcal{N}_i$, so each agent can compute a bound on $\lVert M^{\dagger}\rVert$ using data available to itself and its neighbors.
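A sketch of this local computation (with the same hypothetical containers as in the primal-dual sketch above):

```python
def local_pinv_bound(Q, A, F, neighbors, i):
    """Upper bound on ||M^dagger|| computable by node i from its own data
    and its neighbors': the spectral norm of the pseudoinverse of the
    bracketed matrix in the display above."""
    blocks = [2.0 * Q[i], F[(i, i)] @ A[i].T]
    blocks += [F[(i, j)] @ A[j].T for j in neighbors[i]]
    M_i = np.hstack(blocks)
    return float(np.linalg.norm(np.linalg.pinv(M_i), 2))
```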

[Figure 1: State trajectories of the data-based receding horizon controller on various network topologies. (a) Simulation of a Newman-Watts-Strogatz network with $T=n$. (b) Simulation of a star network with $T=5<n$.]

V-C Numerical Simulations

We simulate the proposed distributed data-based predictive controller on a Newman-Watts-Strogatz network [27] and a star network. In each case, $A$ and $B$ are chosen at random so that $(A,B)$ is controllable. The input sequence is chosen as $u^d(t)=Kx^d(t)+w(t)$, where $K$ is a matrix such that $A+BK$ is marginally stable (the data does not need to be generated from a stable system, but this avoids numerical issues), and $w(t)$ is a Gaussian white noise process. We use Proposition IV.2 to ensure that the data is informative enough for data-driven control. In both cases, condition (i) is satisfied. Condition (ii) fails for the Newman-Watts-Strogatz network, but is satisfied by the star network (cf. Remark IV.3). We integrate the primal-dual flow, using the stopping condition in Proposition V.3 to terminate it. Fig. 1 shows the closed-loop state trajectories and the number of iterations at each time step for different values of $\rho$. For the Newman-Watts-Strogatz network, cf. Fig. 1(a), the time horizon is $T=n$. For the star network, cf. Fig. 1(b), the time horizon is $T=5<n$, but the optimization at each time step is still feasible. In both cases, the distributed data-based predictive controller better approximates the exact MPC for smaller values of $\rho$, at the cost of more iterations per time step.

VI Conclusions and Future Work

We have introduced a distributed data-based predictive controller for stabilizing network systems with linear dynamics described by unknown system matrices. Instead of building a dynamic model, agents learn a non-parametric representation based on a single sample trajectory and use it to implement a controller, obtained as the solution of a network optimization problem solved in a receding horizon manner and in a distributed way. Future work will explicitly quantify the tolerance $\delta^*$ in terms of the available data and study ways to construct a terminal cost, without knowledge of the underlying model, that guarantees stability when the stabilizing terminal constraint is omitted. We plan to extend the results to cases where there are constraints on the state and input, characterize the robustness properties of the introduced control scheme, investigate ways of improving its scalability, and consider more general scenarios, including the presence of noise in the data, inputs not persistently exciting of sufficiently high order, and partial observations of the network state. We also plan to explore improvements to the primal-dual flow to solve the optimization problem with fewer iterations and less communication between the agents.

References

  • [1] F. Bullo, J. Cortés, and S. Martinez, Distributed Control of Robotic Networks. Applied Mathematics Series, Princeton University Press, 2009.
  • [2] M. Mesbahi and M. Egerstedt, Graph Theoretic Methods in Multiagent Networks. Applied Mathematics Series, Princeton University Press, 2010.
  • [3] R. R. Negenborn and J. M. Maestre, “Distributed model predictive control: An overview and roadmap of future research opportunities,” IEEE Control Systems, vol. 34, no. 4, pp. 87–97, 2014.
  • [4] M. Rotkowitz and S. Lall, “A characterization of convex problems in decentralized control,” IEEE Transactions on Automatic Control, vol. 51, no. 2, pp. 274–286, 2006.
  • [5] F. Lin, M. Fardad, and M. R. Jovanovic, “Design of optimal sparse feedback gains via the alternating direction method of multipliers,” IEEE Transactions on Automatic Control, vol. 58, no. 9, pp. 2426–2431, 2013.
  • [6] G. Fazelnia, R. Madani, A. Kalbat, and J. Lavaei, “Convex relaxation for optimal distributed control problems,” IEEE Transactions on Automatic Control, vol. 62, no. 1, pp. 206–221, 2017.
  • [7] L. Furieri, Y. Zheng, A. Papachristodoulou, and M. Kamgarpour, “On separable quadratic Lyapunov functions for convex design of distributed controllers,” in European Control Conference, (Naples, Italy), pp. 42–49, 2019.
  • [8] J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
  • [9] L. Buşoniu, R. Babuška, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156–172, 2008.
  • [10] B. Recht, “A tour of reinforcement learning: The view from continuous control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, pp. 253–279, 2019.
  • [11] D. Jia and B. Krogh, “Min-max feedback model predictive control for distributed control with communication,” in American Control Conference, (Anchorage, AK), pp. 4507–4512, 2002.
  • [12] W. B. Dunbar, “Distributed receding horizon control of dynamically coupled nonlinear systems,” IEEE Transactions on Automatic Control, vol. 52, no. 7, pp. 1249–1263, 2007.
  • [13] B. T. Stewart, A. N. Venkat, J. B. Rawlings, S. J. Wright, and G. Pannocchia, “Cooperative distributed model predictive control,” Systems & Control Letters, vol. 59, no. 8, pp. 460–469, 2010.
  • [14] L. Ljung, System Identification: Theory for the User. Prentice Hall information and system sciences series, Prentice Hall, 1999.
  • [15] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. M. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.
  • [16] U. Park and M. Ikeda, “Stability analysis and control design of LTI discrete-time systems by the direct use of time series data,” Automatica, vol. 45, no. 5, pp. 1265–1271, 2009.
  • [17] T. M. Maupong and P. Rapisarda, “Data-driven control: A behavioral approach,” Systems & Control Letters, vol. 101, pp. 37–43, 2017.
  • [18] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality and robustness,” arXiv preprint arXiv:1903.06842, 2019.
  • [19] J. Coulson, J. Lygeros, and F. Dörfler, “Data-enabled predictive control: in the shallows of the DeePC,” in European Control Conference, (Naples, Italy), pp. 307–312, 2019.
  • [20] J. Berberich, J. Köhler, M. A. Müller, and F. Allgöwer, “Data-driven model predictive control with stability and robustness guarantees,” arXiv preprint arXiv:1906.04679, 2019.
  • [21] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
  • [22] A. Cherukuri, B. Gharesifard, and J. Cortés, “Saddle-point dynamics: conditions for asymptotic stability of saddle points,” SIAM Journal on Control and Optimization, vol. 55, no. 1, pp. 486–511, 2017.
  • [23] F. Borrelli, A. Bemporad, and M. Morari, Predictive Control for Linear and Hybrid Systems. Cambridge, UK: Cambridge University Press, 2017.
  • [24] A. Jadbabaie and A. S. Morse, “On the ISS property for receding horizon control of constrained linear systems,” IFAC Proceedings Volumes, vol. 35, no. 1, pp. 37–40, 2002.
  • [25] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert, “Constrained model predictive control: Stability and optimality,” Automatica, vol. 36, pp. 789–814, 2000.
  • [26] Z.-P. Jiang and Y. Wang, “Input-to-state stability for discrete-time nonlinear systems,” Automatica, vol. 37, no. 6, pp. 857–869, 2001.
  • [27] M. E. J. Newman and D. J. Watts, “Renormalization group analysis of the small-world network model,” Physics Letters A, vol. 263, no. 4-6, pp. 341–346, 1999.