
Data-driven design of explicit predictive controllers
using model-based priors

Valentina Breschi [email protected]    Andrea Sassella [email protected]    Simone Formentin [email protected] Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza L. da Vinci 32, 20133 Milano, Italy.
Abstract

In this paper, we propose a data-driven approach to derive explicit predictive control laws, without requiring any intermediate identification step. The keystone of the presented strategy is the exploitation of available priors on the control law, coming from model-based analysis. Specifically, by leveraging the knowledge that the optimal predictive controller is expressed as a piecewise affine (PWA) law, we directly optimize the parameters of such an analytical controller from data, instead of running an on-line optimization problem. As the proposed method also allows us to automatically retrieve a model of the closed-loop system, we show that we can apply model-based techniques to perform a stability check prior to the controller deployment. The effectiveness of the proposed strategy is assessed on two benchmark simulation examples, through which we also discuss the use of regularization and its combination with averaging techniques to handle the presence of noise.

keywords:
Data-driven control; learning-based control; predictive control; explicit predictive control
This project was partially supported by the Italian Ministry of University and Research under the PRIN’17 project “Data-driven learning of constrained control systems”, contract no. 2017J89ARP.


1 Introduction

Thanks to its capability of handling constraints, model predictive control (MPC) is a widely employed technique for advanced control applications (see, e.g., [24, 23, 19, 14, 21]). Due to the increasing complexity of the systems to be controlled, the model required by MPC is often no longer parameterized from the physics, but learned from data in a black-box fashion. However, this identification step generally takes the lion’s share of the time and effort required for control design. In the last years, research has thus branched into two main directions. On the one side, efforts have been focused on improving and easing learning procedures [9, 20]. On the other side, many approaches have been proposed to directly employ data for the design of predictive controllers, while bypassing any model identification step. Among existing works in this direction, we recall the foundational contributions of [10, 5], which have been extended to handle tracking problems [4], to deal with nonlinear systems [6], and to improve the performance in the presence of noise [13, 11], just to mention a few. For both traditional and data-driven predictive control, the computational effort required to solve the related constrained optimization problem is known to be a potential limit to its application, especially for fast-sampling systems. Nonetheless, when the MPC problem is relatively small, i.e., one has to control a low-order system and/or the prediction horizon is relatively short, this limitation can be overcome by explicitly deriving its solution [3]. In fact, when the cost of the optimization problem is quadratic and the constraints are linear, the explicit solution of MPC is known to be a piecewise affine (PWA) state-feedback law.

In this work, we propose an approach to directly learn explicit predictive control laws from data. More specifically, we initially build on the foundational results in [12, 26] and on model-based priors (namely, the aforementioned fact that the optimal solution of linear MPC is PWA [3]) to construct a data-driven, multi-step closed-loop predictor. The latter is exploited to formulate a fully data-based optimization problem, yielding the estimation of the parameters of the optimal predictive control law. As a by-product, we obtain a data-driven description of the closed-loop behavior, which can be used in combination with existing model-based techniques (see, e.g., [22]) to check the stability of the system controlled with the data-driven explicit controller prior to its deployment. As far as we are aware, this is the first time a data-driven predictive control strategy is provided together with a preliminary assessment of its closed-loop performance.

We should mention here that an early attempt to obtain a data-driven counterpart of explicit MPC was already carried out in [25]. However, the strategy proposed therein relies on an implicit open-loop identification step. Another method was presented in [8], relying on the behavioral predictor used in [11, 5]. Even though that method involves no open-loop identification phase and does not require the state to be fully measurable, it does not leverage priors coming from model-based analysis and design. By leveraging such priors, we are instead able to retrieve a data-based characterization of the closed loop and, thus, to practically assess its stability in a data-driven fashion prior to deployment. This is not possible in [8], where stability cannot be directly assessed, nor is it theoretically guaranteed in the presence of noisy data.

The remainder of the paper is organized as follows. The targeted problem is formalized in Section 2, while all the steps required to obtain its data-driven formulation from priors are introduced in Section 3. The explicit control law is derived in Section 4, where we additionally discuss practical aspects related to its implementation, certified deployment and noise handling. The effectiveness of the proposed strategy is assessed on two benchmark examples in Section 5. Section 6 concludes the work and indicates some directions for future research.

Notation

We denote with $\mathbb{N}_{0}$ the set of natural numbers including zero. Let $\mathbb{R}$, $\mathbb{R}^{n}$ and $\mathbb{R}^{n\times m}$ be the set of real numbers, of $n$-dimensional column vectors and of $n\times m$ real matrices, respectively. Given $B\in\mathbb{R}^{m\times n}$, its transpose is $B^{\top}$, its Moore-Penrose inverse is $B^{\dagger}$ and, when $m=n$, its inverse is indicated as $B^{-1}$. Given a vector $v\in\mathbb{R}^{n}$, $[v]_{i:j}$ indicates its rows from $i$ to $j$, with $i\leq j\leq n$. For a matrix $B\in\mathbb{R}^{n\times m}$, $[B]_{1:i,1:j}$ denotes the sub-matrix comprising the first $i$ rows and $j$ columns of $B$, with $i\leq n$ and $j\leq m$. Identity matrices are denoted as $I$, while zero matrices and vectors are denoted as $0$. If a matrix $Q\in\mathbb{R}^{n\times n}$ is positive definite (positive semi-definite), this is denoted as $Q\succ 0$ ($Q\succeq 0$). Given a vector $x\in\mathbb{R}^{n}$, the quadratic form $x^{\top}Qx$ is compactly indicated as $\|x\|_{Q}^{2}$. Given a signal $\{\nu(t)\in\mathbb{R}^{m}\}_{t\in\mathbb{N}_{0}}$ and $0<L<T$, we denote with $N_{0,L,T-1}\in\mathbb{R}^{mL\times(T-L+1)}$ the associated Hankel matrix

$$N_{0,L,T-1}=\begin{bmatrix}\nu(0)&\nu(1)&\cdots&\nu(T-L)\\ \nu(1)&\nu(2)&\cdots&\nu(T-L+1)\\ \vdots&\vdots&\ddots&\vdots\\ \nu(L-1)&\nu(L)&\cdots&\nu(T-1)\end{bmatrix},\tag{1}$$

while, for i,j0i,j\in\mathbb{N}_{0}, we introduce

$$N_{i,T-j}=\begin{bmatrix}\nu(i)&\nu(i+1)&\cdots&\nu(T-j)\end{bmatrix},\quad i<T-j.\tag{2}$$
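For concreteness, the following minimal sketch (ours, in Python with numpy, the language we also adopt for the later snippets; function names are illustrative, not part of the paper) builds the matrices in (1)-(2) from a sampled signal:

```python
import numpy as np

def block_hankel(nu, L):
    """Depth-L block Hankel matrix N_{0,L,T-1} in (1).

    nu: (T, m) array with rows nu(0), ..., nu(T-1). The j-th column of the
    returned (m*L, T-L+1) matrix stacks nu(j), ..., nu(j+L-1).
    """
    T = nu.shape[0]
    return np.column_stack([nu[j:j + L].ravel() for j in range(T - L + 1)])

def sample_row(nu, i, j):
    """Matrix N_{i,T-j} in (2): the samples nu(i), ..., nu(T-j) side by side."""
    return nu[i:nu.shape[0] - j + 1].T
```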

2 Problem formulation

Consider the class $\mathcal{S}$ of discrete-time, linear time-invariant (LTI), controllable systems with fully measurable state, i.e.,

$$\mathcal{S}:\begin{cases}x(t+1)=Ax(t)+Bu(t),\\ y^{\mathrm{o}}(t)=x(t),\end{cases}\tag{3}$$

where $x(t)\in\mathbb{R}^{n}$ denotes the state of $\mathcal{S}$ at time $t\in\mathbb{N}_{0}$, $u(t)\in\mathbb{R}^{m}$ is an exogenous input and $y^{\mathrm{o}}(t)\in\mathbb{R}^{n}$ is the associated noiseless output. Let us consider the following model predictive control (MPC) problem:

$$\begin{aligned}\underset{\{\tilde{u}(k)\}_{k=0}^{L-1}}{\text{minimize}}~~&\sum_{k=0}^{L-1}\left[\|\tilde{x}(k)\|_{Q}^{2}+\|\tilde{u}(k)\|_{R}^{2}\right]+\|\tilde{x}(L)\|_{P}^{2} && (4\text{a})\\ \text{s.t.}~~&\tilde{x}(k+1)=A\tilde{x}(k)+B\tilde{u}(k),\quad k=0,\ldots,L-1, && (4\text{b})\\ &C_{x}\tilde{x}(k)+C_{u}\tilde{u}(k)\leq d,\quad k=0,\ldots,L-1, && (4\text{c})\\ &\tilde{x}(0)=x(t). && (4\text{d})\end{aligned}$$

The objective of (4) is to steer both the (predicted) state $\tilde{x}(k)$ and the input $\tilde{u}(k)$ to zero over a prediction horizon of prefixed length $L>0$. Optimality is indeed dictated by: $(i)$ the distance of the predicted state from zero, penalized via $Q\succeq 0$ over the whole horizon except for the terminal state (weighted via $P\succeq 0$), and $(ii)$ the control effort, penalized via $R\succ 0$. Meanwhile, the set of $n_{c}$ polyhedral constraints in (4c) has to be satisfied, with $C_{x}\in\mathbb{R}^{n_{c}\times n}$, $C_{u}\in\mathbb{R}^{n_{c}\times m}$ and $d\in\mathbb{R}^{n_{c}}$, while relying on the latest information on the system (see (4d)). Assume also that the matrices $A\in\mathbb{R}^{n\times n}$ and $B\in\mathbb{R}^{n\times m}$ characterizing the dynamics of $\mathcal{S}$ are unknown, and that we can access a set of input/output data pairs $\mathcal{D}_{T}=\{\mathcal{U}_{T},\mathcal{Y}_{T}\}$, where $\mathcal{U}_{T}$ and $\mathcal{Y}_{T}$ denote the available input and output sequences, respectively, satisfying the following assumptions.

Assumption 1 (Persistently exciting inputs).

The input sequence $\mathcal{U}_{T}=\{u(t)\}_{t=0}^{T}$ is persistently exciting of order $n+L$.

Assumption 2 (Noisy outputs).

The output sequence $\mathcal{Y}_{T}=\{y(t)\}_{t=0}^{T}$ is corrupted by noise, namely

$$y(t)=y^{\mathrm{o}}(t)+w(t),\tag{5}$$

where $w(t)$ is the realization of a zero-mean white noise with covariance $\Omega\in\mathbb{R}^{n\times n}$.

Assumption 3 (Sufficiently long dataset).

The length $T$ of the dataset $\mathcal{D}_{T}$ satisfies the following:

$$T\geq(m+1)(n+L)-1.$$

The goal of this work is to directly exploit the available data to find an explicit solution to (4), under Assumptions 1-3, while bypassing any identification step.

3 Exploiting priors for explicit DDPC

To attain our goal, it is fundamental to replace the model in (4b) with an expression that directly depends on the available data, without disregarding what we have learned from model-based predictive control. To start, we recall the following lemma, which explicitly states the form of the optimal explicit predictive controller in our setting [3].

Lemma 1 (On the solution of (4)).

Let $U$ denote the vector stacking the inputs over the prediction horizon, i.e.,

$$U=U(x(t))=\begin{bmatrix}\tilde{u}(0)\\ \tilde{u}(1)\\ \vdots\\ \tilde{u}(L-1)\end{bmatrix}\in\mathbb{R}^{mL}.\tag{6}$$

The optimal control sequence $U^{\star}=U^{\star}(x(t))$ solving (4) is a continuous piecewise affine (PWA) function of $x(t)$.

Then, we generalize the one-step-ahead predictor introduced in [12] to the multi-step case. To this end, it is important to recall that, thanks to Assumptions 1 and 3 and the Fundamental Lemma [26], the following rank condition holds:

$$\operatorname{rank}\left(\begin{bmatrix}X_{0,T-L-1}\\ U_{0,L,T-L-1}\end{bmatrix}\right)=n+mL.\tag{7}$$

We can now derive the multi-step data-based predictor as follows.

Theorem 1 (Data-based multi-step predictor).

Let Assumptions 1 and 3 hold. Given $x(t)$, the sequence of predicted states $\tilde{X}=\tilde{X}(x(t))$ admits the following data-based representation:

$$\tilde{X}=\begin{bmatrix}\tilde{x}(1)\\ \vdots\\ \tilde{x}(L)\end{bmatrix}=X_{1,L,T-L}\begin{bmatrix}X_{0,T-L-1}\\ U_{0,L,T-L-1}\end{bmatrix}^{\dagger}\begin{bmatrix}x(t)\\ U\end{bmatrix}.\tag{8}$$
Proof.

The proof follows the steps of the one in [12, Appendix B]. Specifically, let $v\in\mathbb{R}^{n+mL}$ and $S\in\mathbb{R}^{(n+mL)\times(T-L)}$ be respectively defined as:

$$v:=\begin{bmatrix}x(t)\\ U\end{bmatrix},\qquad S:=\begin{bmatrix}X_{0,T-L-1}\\ U_{0,L,T-L-1}\end{bmatrix},$$

with $S$ being full row rank by (7), i.e., $\operatorname{rank}(S)=n+mL$. By the Rouché-Capelli theorem, for any given $v$, the equality

$$v=S\alpha$$

admits infinitely many solutions $\alpha$ of the form

$$\alpha=S^{\dagger}v+\Pi_{S}^{\perp}w,\qquad\forall w\in\mathbb{R}^{T-L},\tag{9}$$

with $\Pi_{S}^{\perp}=(I-S^{\dagger}S)$ being the orthogonal projector onto the kernel of $S$. Meanwhile, based on the model in (4b), the predicted state sequence can be written as a function of $x(t)$ and $U$ as:

$$\tilde{X}=\underbrace{\begin{bmatrix}A\\ A^{2}\\ \vdots\\ A^{L}\end{bmatrix}}_{\xi}x(t)+\underbrace{\begin{bmatrix}B&0&\cdots&0\\ AB&B&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ A^{L-1}B&A^{L-2}B&\cdots&B\end{bmatrix}}_{\Gamma}U.\tag{10}$$

In turn, such a sequence can be recast as a function of $\alpha$, i.e.,

$$\tilde{X}=\begin{bmatrix}\xi&\Gamma\end{bmatrix}S\alpha=X_{1,L,T-L}\alpha,$$

where the second equality straightforwardly follows from (10) and the definition of $S$. By replacing $\alpha$ with (9), we then obtain

$$\tilde{X}=X_{1,L,T-L}\left(S^{\dagger}v+\Pi_{S}^{\perp}w\right)=X_{1,L,T-L}S^{\dagger}v,$$

as $X_{1,L,T-L}\Pi_{S}^{\perp}=\begin{bmatrix}\xi&\Gamma\end{bmatrix}S\Pi_{S}^{\perp}=0$, based on the definition of the projector. ∎
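As a sanity check, the following sketch (ours) verifies the predictor (8) on noiseless synthetic data; the illustrative system matrices below are used only to generate the data and to compute the model-based benchmark (10):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.7326, -0.0861], [0.1722, 0.9909]])   # toy system, used only
B = np.array([[0.0609], [0.0064]])                    # for data generation
n, m, L, T = 2, 1, 2, 100

u = rng.uniform(-5, 5, size=(T, m))
x = np.zeros((T + 1, n))
for t in range(T):
    x[t + 1] = A @ x[t] + B @ u[t]

hank = lambda s, depth: np.column_stack(
    [s[j:j + depth].ravel() for j in range(s.shape[0] - depth + 1)])

X0 = x[:T - L].T                   # X_{0,T-L-1}, n x (T-L)
U0L = hank(u, L)[:, :T - L]        # U_{0,L,T-L-1}, mL x (T-L)
X1L = hank(x[1:], L)[:, :T - L]    # X_{1,L,T-L}, nL x (T-L)

xt = np.array([1.0, -0.5])         # current state x(t)
U = rng.uniform(-1.0, 1.0, m * L)  # candidate input sequence
S = np.vstack([X0, U0L])           # full row rank by (7)
X_tilde = X1L @ np.linalg.pinv(S) @ np.concatenate([xt, U])

# model-based check via (10): x~(1) = A x(t) + B u~(0), x~(2) = A x~(1) + B u~(1)
x1 = A @ xt + B @ U[:m]
x2 = A @ x1 + B @ U[m:]
assert np.allclose(X_tilde, np.concatenate([x1, x2]))
```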

This preliminary result allows us to exploit priors on the solution of (4) in the definition of the DDPC problem. In fact, according to Lemma 1, we can parameterize the control sequence $U$ as:

$$U=\begin{cases}K_{1}x(t)+f_{1},&\text{if }H_{1}x(t)\leq\ell_{1},\\ \quad\vdots\\ K_{M}x(t)+f_{M},&\text{if }H_{M}x(t)\leq\ell_{M},\end{cases}\tag{11}$$

where $K_{i}\in\mathbb{R}^{mL\times n}$ and $f_{i}\in\mathbb{R}^{mL}$ are the (unknown) feedback and affine gains characterizing the control law, $\{H_{i},\ell_{i}\}_{i=1}^{M}$ dictates the associated polyhedral partition, for $i=1,\ldots,M$, and the number of modes $M$ is dictated by the number of possible combinations of active constraints. Therefore, for a given state $x(t)$, the input sequence is the affine function

$$U(x(t))=Kx(t)+f,\tag{12}$$

with $K=K_{s(x(t))}$ and $f=f_{s(x(t))}$ denoting the gains associated with the active control law, and with

$$s(x(t))=i\quad\Longleftrightarrow\quad H_{i}x(t)\leq\ell_{i},\qquad i\in\{1,\ldots,M\}.$$

Based on this parameterization, we can compute a data-based closed-loop characterization of the predictor in (4b), as outlined in the following theorem.

Theorem 2 (Closed-loop multi-step predictor).

Let Assumptions 1 and 3 hold. Given $x(t)$, the sequence of predicted states $\tilde{X}=\tilde{X}(x(t))$ can be equivalently expressed as:

$$\tilde{X}=X_{1,L,T-L}\left(G_{K}x(t)+G_{f}\right),\tag{13a}$$

with $G_{K}\in\mathbb{R}^{(T-L)\times n}$ and $G_{f}\in\mathbb{R}^{T-L}$ satisfying:

$$\begin{bmatrix}I\\ K\end{bmatrix}=\begin{bmatrix}X_{0,T-L-1}\\ U_{0,L,T-L-1}\end{bmatrix}G_{K},\tag{13b}$$

$$\begin{bmatrix}0\\ f\end{bmatrix}=\begin{bmatrix}X_{0,T-L-1}\\ U_{0,L,T-L-1}\end{bmatrix}G_{f},\tag{13c}$$

where $K$ and $f$ characterize the local control law (12). Accordingly, the input sequence is given by:

$$U=U_{0,L,T-L-1}\left(G_{K}x(t)+G_{f}\right).\tag{14}$$
Proof.

By the Rouché-Capelli theorem, there exist a $(T-L)\times n$ matrix $G_{K}$ and a $(T-L)$-dimensional vector $G_{f}$ such that (13b)-(13c) hold. Meanwhile, replacing (12) into the open-loop predictor in (8), the predicted state sequence can be represented as:

$$\tilde{X}=X_{1,L,T-L}\begin{bmatrix}X_{0,T-L-1}\\ U_{0,L,T-L-1}\end{bmatrix}^{\dagger}\left(\begin{bmatrix}I\\ K\end{bmatrix}x(t)+\begin{bmatrix}0\\ f\end{bmatrix}\right).$$

By combining these two results, the closed-loop representation in (13) and the equivalent definition of the input sequence in (14) straightforwardly follow. ∎

By leveraging (13)-(14), we can now equivalently recast the predictive control task in (4) as an optimization problem with the closed-loop matrices $G_{K},G_{f}$ as decision variables, as follows (the term in the cost that depends only on $x(t)$ has been neglected; this can be done without loss of generality, as the optimal solution does not change):

$$\begin{aligned}\underset{G_{K},G_{f}}{\text{minimize}}~~&\tilde{X}^{\top}\mathcal{Q}\tilde{X}+U^{\top}\mathcal{R}U && (15\text{a})\\ \text{s.t.}~~&\tilde{X}=X_{1,L,T-L}\left(G_{K}x(t)+G_{f}\right), && (15\text{b})\\ &U=U_{0,L,T-L-1}\left(G_{K}x(t)+G_{f}\right), && (15\text{c})\\ &\begin{bmatrix}C_{x}&0\\ 0&\mathcal{C}_{x}\end{bmatrix}\begin{bmatrix}x(t)\\ \tilde{X}\end{bmatrix}+\mathcal{C}_{u}U\leq\mathcal{D}, && (15\text{d})\\ &X_{0,T-L-1}G_{K}x(t)=x(t), && (15\text{e})\\ &X_{0,T-L-1}G_{f}=0. && (15\text{f})\end{aligned}$$

In (15), $\mathcal{Q}=\operatorname{diag}\left(Q,\cdots,Q,P\right)$, $\mathcal{R}=\operatorname{diag}\left(R,\cdots,R\right)$, $\mathcal{C}_{u}=\operatorname{diag}\left(C_{u},\cdots,C_{u}\right)$ and

$$\mathcal{C}_{x}=\begin{bmatrix}C_{x}&0&\cdots&0&0\\ 0&C_{x}&\cdots&0&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&\cdots&C_{x}&0\end{bmatrix},\qquad\mathcal{D}=\begin{bmatrix}d\\ \vdots\\ d\end{bmatrix}.$$

Note that the last constraints (see (15e)-(15f)) are introduced for the problem to be consistent with the closed-loop representation in (13).
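For concreteness, a possible assembly of these stacked matrices (a sketch of ours, assuming $L\geq 2$) is:

```python
import numpy as np
from scipy.linalg import block_diag

def stacked_matrices(Q, R, P, Cx, Cu, d, L):
    """Assemble the stacked cost/constraint matrices defined below (15)."""
    QQ = block_diag(*([Q] * (L - 1) + [P]))   # diag(Q, ..., Q, P)
    RR = block_diag(*([R] * L))               # diag(R, ..., R)
    CCu = block_diag(*([Cu] * L))             # diag(Cu, ..., Cu)
    # Cx acts on x~(1), ..., x~(L-1); the terminal state carries no constraint
    nc, n = Cx.shape
    CCx = np.hstack([block_diag(*([Cx] * (L - 1))),
                     np.zeros(((L - 1) * nc, n))])
    DD = np.tile(np.asarray(d).ravel(), L)    # [d; ...; d]
    return QQ, RR, CCx, CCu, DD
```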

This shift from an open-loop predictor to its closed-loop counterpart allows us to directly learn the control law from data, and avoid any system identification step.

Remark 1.

Both the equivalences in (8) and (13) exactly hold in a noiseless setting only. As such, problem (15) is equivalent to (4) only when the available batch of data is noise-free.

4 Learning Explicit DDPC

To derive its explicit solution, the problem in (15) is manipulated to obtain a multi-parametric Quadratic Program (mp-QP). As a preliminary step, we condense the unknowns of (15) into a single variable:

$$G(t)=\begin{bmatrix}G_{K}x(t)\\ G_{f}\end{bmatrix}\in\mathbb{R}^{2(T-L)}.\tag{16}$$

Accordingly, we can recast (15) as the following mp-QP

$$\begin{aligned}\underset{G(t)}{\text{minimize}}~~&G(t)^{\top}\mathcal{H}_{d}G(t) && (17\text{a})\\ \text{s.t.}~~&\Xi_{d}G(t)+\Psi x(t)\leq\mathcal{D}, && (17\text{b})\\ &\Theta_{d}G(t)=\begin{bmatrix}x(t)\\ 0\end{bmatrix}, && (17\text{c})\end{aligned}$$

where

$$\mathcal{H}_{d}=\mathcal{X}^{\top}\mathcal{Q}\mathcal{X}+\mathcal{V}^{\top}\mathcal{R}\mathcal{V},\tag{18a}$$

$$\Xi_{d}=\begin{bmatrix}C_{x}&0\\ 0&\mathcal{C}_{x}\end{bmatrix}\begin{bmatrix}0\\ \mathcal{X}\end{bmatrix}+\mathcal{C}_{u}\mathcal{V},\qquad\Psi=\begin{bmatrix}C_{x}&0\\ 0&\mathcal{C}_{x}\end{bmatrix}\begin{bmatrix}I\\ 0\end{bmatrix},\tag{18b}$$

$$\Theta_{d}=\operatorname{diag}\left(X_{0,T-L-1},X_{0,T-L-1}\right),\tag{18c}$$

and

$$\mathcal{X}=\begin{bmatrix}X_{1,L,T-L}&X_{1,L,T-L}\end{bmatrix},\qquad\mathcal{V}=\begin{bmatrix}U_{0,L,T-L-1}&U_{0,L,T-L-1}\end{bmatrix}.$$

Focusing on $\mathcal{H}_{d}$ in (17a), it can be proven that this weighting matrix has the property stated in the following lemma.

Lemma 2 (Features of d\mathcal{H}_{d}).

Under Assumptions 1 and 3, $\mathcal{H}_{d}$ is positive semi-definite, but not positive definite.

Proof.

This is a direct consequence of Assumptions 1 and 3, under which $\mathcal{V}^{\top}\in\mathbb{R}^{2(T-L)\times mL}$ is not full row rank (and the same holds for $\mathcal{X}^{\top}$), so that $\mathcal{H}_{d}$ in (18a) is singular, while positive semi-definite by construction. ∎

As the cost should be strictly convex for a unique explicit solution to be retrieved, this feature of $\mathcal{H}_{d}$ prevents us from deriving the explicit law. To overcome this limitation, we introduce a regularization term in the cost of (17), thus replacing the weight $\mathcal{H}_{d}$ with:

$$\mathcal{H}_{d}^{\gamma}=\frac{1}{2}\left(\mathcal{H}_{d}+\gamma I\right),\tag{19}$$

where $\gamma>0$ is a hyper-parameter to be tuned, and the factor $1/2$ normalizes the cost to ease the subsequent derivations. The data-driven control problem then corresponds to the regularized mp-QP:

$$\begin{aligned}\underset{G(t)}{\text{minimize}}~~&G(t)^{\top}\mathcal{H}_{d}^{\gamma}G(t) && (20\text{a})\\ \text{s.t.}~~&\Xi_{d}G(t)+\Psi x(t)\leq\mathcal{D}, && (20\text{b})\\ &\Theta_{d}G(t)=\begin{bmatrix}x(t)\\ 0\end{bmatrix}. && (20\text{c})\end{aligned}$$
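Although the focus here is on the explicit solution, (20) can also be solved numerically for a fixed $x(t)$, e.g., to validate the explicit law region by region. A minimal sketch with cvxpy (our choice of solver interface; the Cholesky factorization below exists because $\mathcal{H}_{d}^{\gamma}\succ 0$ for any $\gamma>0$) is:

```python
import cvxpy as cp
import numpy as np

def solve_regularized_ddpc(Hd, Xi_d, Psi, Theta_d, DD, x_t, gamma):
    """Solve the regularized mp-QP (20) for one value of x(t)."""
    dim = Hd.shape[0]                          # 2(T - L)
    Hg = 0.5 * (Hd + gamma * np.eye(dim))      # H_d^gamma in (19)
    Lc = np.linalg.cholesky(Hg)                # Hg = Lc Lc^T
    G = cp.Variable(dim)
    rhs = np.concatenate([x_t, np.zeros(Theta_d.shape[0] - x_t.size)])
    prob = cp.Problem(cp.Minimize(cp.sum_squares(Lc.T @ G)),  # = G^T Hg G
                      [Xi_d @ G + Psi @ x_t <= DD, Theta_d @ G == rhs])
    prob.solve()
    return G.value
```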

4.1 Derivation of the explicit DDPC law

The introduction of the regularizer in (20) allows us to derive the explicit DDPC law through the manipulation of the Karush-Kuhn-Tucker (KKT) conditions associated with the new DDPC problem. To ease the computations, let us consider the following further assumption.

Assumption 4 (Non-degenerate constraints).

The active constraints of (20) are linearly independent.

Based on this assumption, we now follow the same steps used to derive the explicit model-based predictive control law in [3].

The KKT conditions for the regularized DDPC problem in (20) are:

$$\begin{aligned}&\mathcal{H}_{d}^{\gamma}G(t)+\Xi_{d}^{\top}\lambda+\Theta_{d}^{\top}\mu=0, && (21\text{a})\\ &\lambda^{\top}\left(\Xi_{d}G(t)+\Psi x(t)-\mathcal{D}\right)=0, && (21\text{b})\\ &\lambda\geq 0, && (21\text{c})\\ &\Xi_{d}G(t)+\Psi x(t)-\mathcal{D}\leq 0, && (21\text{d})\\ &\Theta_{d}G(t)-\begin{bmatrix}x(t)\\ 0\end{bmatrix}=0, && (21\text{e})\end{aligned}$$

where $\lambda$ and $\mu$ are the Lagrange multipliers associated with the inequality and equality constraints in (20b) and (20c), respectively.

Let us focus on the $i$-th set of active constraints only, distinguishing between the Lagrange multipliers associated with the active inequality constraints, denoted as $\tilde{\lambda}_{i}$, and those associated with the inactive ones, denoted as $\bar{\lambda}_{i}$. It is straightforward to notice that the combination of (21b) and (21c) leads to the following condition on $\bar{\lambda}_{i}$:

$$\bar{\lambda}_{i}=0.$$

By merging (21b) and (21e) for the $i$-th set of active constraints, it is also straightforward to show that the optimal solution $G_{i}(t)$ satisfies

$$\Phi_{d,i}G_{i}(t)-\tilde{S}_{i}x(t)-\tilde{W}_{i}=0,\tag{22}$$

where

$$\Phi_{d,i}=\begin{bmatrix}\tilde{\Xi}_{d,i}\\ \Theta_{d}\end{bmatrix},\qquad\tilde{S}_{i}=\begin{bmatrix}-\tilde{\Psi}_{i}\\ I\\ 0\end{bmatrix},\qquad\tilde{W}_{i}=\begin{bmatrix}\tilde{\mathcal{D}}_{i}\\ 0\end{bmatrix},$$

and $\tilde{\Xi}_{d,i}$, $\tilde{\Psi}_{i}$ and $\tilde{\mathcal{D}}_{i}$ are the rows of $\Xi_{d}$, $\Psi$ and $\mathcal{D}$ coupled with the considered set of active constraints. By leveraging (21a), we can now express our optimization variable $G_{i}(t)$ as a function of the Lagrange multipliers, i.e.,

$$G_{i}(t)=-(\mathcal{H}_{d}^{\gamma})^{-1}\underbrace{\begin{bmatrix}\tilde{\Xi}_{d,i}^{\top}&\Theta_{d}^{\top}\end{bmatrix}}_{\Phi_{d,i}^{\top}}\tilde{\Lambda}_{i},\tag{23}$$

where

$$\tilde{\Lambda}_{i}=\begin{bmatrix}\tilde{\lambda}_{i}\\ \mu\end{bmatrix}.$$

We can now replace the latter into (22) to obtain an explicit expression for the Lagrange multipliers as a function of the matrices characterizing (20):

$$\tilde{\Lambda}_{i}=-\underbrace{\left[\Phi_{d,i}\left(\mathcal{H}_{d}^{\gamma}\right)^{-1}\Phi_{d,i}^{\top}\right]^{-1}}_{\Upsilon_{d,i}}\left(\tilde{S}_{i}x(t)+\tilde{W}_{i}\right).\tag{24}$$

In turn, this allows us to explicitly retrieve $G_{i}(t)$ as:

$$G_{i}(t)=\left(\mathcal{H}_{d}^{\gamma}\right)^{-1}\Phi_{d,i}^{\top}\Upsilon_{d,i}\left(\tilde{S}_{i}x(t)+\tilde{W}_{i}\right),\tag{25}$$

and the associated optimal input sequence as

$$U_{i}(x(t))=\mathcal{V}\left(\mathcal{H}_{d}^{\gamma}\right)^{-1}\Phi_{d,i}^{\top}\Upsilon_{d,i}\left(\tilde{S}_{i}x(t)+\tilde{W}_{i}\right).\tag{26}$$

Thus, the input to be fed to the system when the $i$-th set of constraints is active is defined as:

$$u_{i}(x(t))=\left[U_{i}(x(t))\right]_{1:m}.\tag{27}$$

Through (21c) and (21d), we can finally define the polyhedral region associated with the considered combination of active constraints, which is dictated by the following inequalities:

$$\Upsilon_{d,i}\left(\tilde{S}_{i}x(t)+\tilde{W}_{i}\right)\leq 0,\tag{28a}$$

$$\Xi_{d}\left(\mathcal{H}_{d}^{\gamma}\right)^{-1}\Phi_{d,i}^{\top}\Upsilon_{d,i}\left(\tilde{S}_{i}x(t)+\tilde{W}_{i}\right)+\Psi x(t)-\mathcal{D}\leq 0.\tag{28b}$$
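For one active set, the computations in (24)-(28) reduce to plain linear algebra; a compact sketch (ours, with argument names mirroring the symbols above) is:

```python
import numpy as np

def explicit_piece(Hg, Phi_i, S_i, W_i, Xi_d, Psi, DD, V):
    """Local gains and region for the i-th active set, following (24)-(28).

    Returns (Ki, fi, Fi, Ei) such that U_i(x) = Ki x + fi on the region
    {x : Fi x <= Ei}; V stacks the input data matrices as defined below (18).
    """
    Hinv = np.linalg.inv(Hg)
    Ups = np.linalg.inv(Phi_i @ Hinv @ Phi_i.T)   # Upsilon_{d,i} in (24)
    M = Hinv @ Phi_i.T @ Ups                      # G_i(t) = M (S_i x + W_i), cf. (25)
    Ki, fi = V @ M @ S_i, V @ M @ W_i             # optimal sequence (26)
    # region inequalities (28a)-(28b), rearranged as Fi x <= Ei
    Fi = np.vstack([Ups @ S_i, Xi_d @ M @ S_i + Psi])
    Ei = np.concatenate([-Ups @ W_i, DD - Xi_d @ M @ W_i])
    return Ki, fi, Fi, Ei
```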

The complete data-driven expression for (11) is then straightforwardly obtained by following the above steps for all possible combinations of the active constraints. This operation ultimately yields an optimal input sequence U(x(t))U(x(t)) of the form:

$$U(x(t))=\begin{cases}U_{1}(x(t)),&\text{if }\mathcal{F}_{d,1}x(t)\leq\mathcal{E}_{d,1},\\ \quad\vdots\\ U_{M}(x(t)),&\text{if }\mathcal{F}_{d,M}x(t)\leq\mathcal{E}_{d,M},\end{cases}\tag{29}$$

where $M$ is given by the number of possible combinations of active constraints, $U_{i}$ corresponds to (26) for all $i\in\{1,\ldots,M\}$, while $\{\mathcal{F}_{d,i},\mathcal{E}_{d,i}\}_{i=1}^{M}$ can be easily obtained from (28). Consequently, the input to be fed to $\mathcal{S}$ starting from $x(t)$ can be retrieved by evaluating the PWA law

$$u(x(t))=\begin{cases}u_{1}(x(t)),&\text{if }\mathcal{F}_{d,1}x(t)\leq\mathcal{E}_{d,1},\\ \quad\vdots\\ u_{M}(x(t)),&\text{if }\mathcal{F}_{d,M}x(t)\leq\mathcal{E}_{d,M},\end{cases}\tag{30}$$

with $u_{i}(x(t))$ given by (27), for $i=1,\ldots,M$.
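Online, evaluating (30) then amounts to a point-location step followed by an affine map, e.g.:

```python
import numpy as np

def pwa_input(x, pieces, m, tol=1e-9):
    """Evaluate the explicit law (30): `pieces` is a list of tuples
    (Fi, Ei, Ki, fi) computed offline; the applied input is the first
    m entries of the local optimal sequence, as in (27)."""
    for Fi, Ei, Ki, fi in pieces:
        if np.all(Fi @ x <= Ei + tol):
            return (Ki @ x + fi)[:m]
    raise ValueError("x lies outside the computed polyhedral partition")
```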

Remark 2 (On Assumption 4).

Although introduced to ease computations, we remark that Assumption 4 is not restrictive. Indeed, degenerate cases can be straightforwardly handled via existing approaches, e.g., see [3].

Remark 3 (Data-driven and model-based).

Within a noiseless setting, the results in Theorem 2 and the one-to-one correspondence between the chosen parameterization of the control law in (11) and its model-based counterpart guarantee the equivalence between (4) and (15). Therefore, when there is no noise, the data-driven explicit controller coincides with the E-MPC law as $\gamma\rightarrow 0$.

4.2 Implementing Explicit DDPC

Based on the available batch of data and the features of the considered predictive control problem, the explicit DDPC law can be completely retrieved offline from the available measurements, as summarized in Algorithm 1.

Algorithm 1 Offline construction of the explicit law

Input: Dataset $\mathcal{D}_{T}$; penalties $Q,P\succeq 0$ and $R\succ 0$; horizon $L>0$; constraint matrices $C_{x},C_{u},d$; regularization parameter $\gamma>0$.

  1. Construct the data-based matrices $X_{1,L,T-L}$, $U_{0,L,T-L-1}$, $X_{0,T-L-1}$.

  2. Build $\mathcal{H}_{d}^{\gamma}$, $\Xi_{d}$, $\Psi$, $\Theta_{d}$ in (18) based on the cost and constraints of the DDPC problem.

  3. Find all possible combinations of active constraints.

  4. For each combination, isolate the matrices $\tilde{\Xi}_{d}$, $\tilde{W}$ and $\tilde{S}$ characterizing (22).

  5. If not all rows of $\tilde{\Xi}_{d}$ are linearly independent, handle the degeneracy, e.g., as in [3].

  6. Find the PWA explicit law by retrieving (26)-(28) for all possible combinations of active constraints.

  7. Merge polyhedral regions as in [2].

  8. Extract the first component of the optimal input sequence $U(x(t))$.

Output: Optimal input $u(x(t))$.

Given the data, one has to initially construct the Hankel matrices needed to build the DDPC problem (see Steps 1-2). Once all the possible combinations of active constraints have been detected at Step 3 and degenerate scenarios have been handled (see Step 5), at Step 6 the local controllers and the associated polyhedral regions are retrieved according to (26)-(28). Lastly, at Step 7, the optimal control sequence is simplified by merging polyhedral regions whenever possible. After this step, the explicit optimal input can simply be retrieved by extracting the first element of the input sequence $U(x(t))$ (see Step 8).

Once the explicit DDPC law has been retrieved, the computation of the optimal input at each time instant simply consists of a function evaluation. Specifically, one has to $(i)$ search for the polyhedral region the current state $x(t)$ belongs to, and $(ii)$ apply the corresponding parametric law. We stress that this computational advantage is retained for simple control problems only (i.e., for short prediction horizons and small systems). Indeed, the complexity of the PWA law is known to rapidly increase [7] with that of the DDPC problem to be solved, analogously to the model-based case.

4.3 Explicit data-driven predictive control and closed-loop stability

When designing a controller in a data-driven setting, it is crucial to check the stability of the resulting closed-loop system before the controller deployment. Towards this objective, we now show how the peculiar features of explicit data-driven predictive control can be leveraged in combination with existing techniques to devise an off-line, data-driven stability test. To this end, let us assume that the $i$-th set of constraints is active and consider the following multi-step-ahead closed-loop model:

$$\hat{X}(t)=\mathcal{X}(\mathcal{H}_{d}^{\gamma})^{-1}\Phi_{d,i}^{\top}\Upsilon_{d,i}\tilde{S}_{i}x(t)+\mathcal{X}(\mathcal{H}_{d}^{\gamma})^{-1}\Phi_{d,i}^{\top}\Upsilon_{d,i}\tilde{W}_{i},\tag{31}$$

obtained by combining (13) with the result of our explicit derivation in (25), where $\hat{X}$ stacks the states predicted by the learned closed-loop model, i.e.,

$$\hat{X}(t)=\begin{bmatrix}\hat{x}(t+1)\\ \vdots\\ \hat{x}(t+L)\end{bmatrix}.$$

From (31), we can then isolate the learned one-step ahead closed-loop model, namely

$$\hat{x}(t+1)=A_{d,i}^{cl}x(t)+f_{d,i}^{cl},\tag{32a}$$

where

$$A_{d,i}^{cl}=\left[\mathcal{X}(\mathcal{H}_{d}^{\gamma})^{-1}\Phi_{d,i}^{\top}\Upsilon_{d,i}\tilde{S}_{i}\right]_{1:n,1:n},\tag{32b}$$

$$f_{d,i}^{cl}=\left[\mathcal{X}(\mathcal{H}_{d}^{\gamma})^{-1}\Phi_{d,i}^{\top}\Upsilon_{d,i}\tilde{W}_{i}\right]_{1:n}.\tag{32c}$$

When performed for each mode $i\in\{1,\ldots,M\}$, these manipulations allow us to retrieve all the data-driven closed-loop transition matrices $A_{d,i}^{cl}$. Retrieving these matrices ultimately enables us to apply model-based techniques, e.g., the ones presented in [22], to shed light on the features of the final closed loop. As an example, one can search for a matrix $\mathcal{P}\in\mathbb{R}^{n\times n}$ satisfying the following sufficient conditions for asymptotic stability:

$$\mathcal{P}\succ 0,\tag{33a}$$

$$(A_{d,i}^{cl})^{\top}\mathcal{P}A_{d,i}^{cl}-\mathcal{P}\prec 0,\quad i=1,\ldots,M,\tag{33b}$$

or, alternatively, look for the set of matrices $\{\mathcal{P}_{i}\in\mathbb{R}^{n\times n}\}_{i=1}^{M}$ verifying the following LMIs:

$$\mathcal{P}_{i}\succ 0,\quad i=1,\ldots,M,\tag{34a}$$

$$(A_{d,i}^{cl})^{\top}\mathcal{P}_{j}A_{d,i}^{cl}-\mathcal{P}_{i}\prec 0,\quad i,j=1,\ldots,M,\tag{34b}$$

which are also sufficient conditions for asymptotic stability.
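Either test is a semidefinite feasibility problem. A sketch of the common-Lyapunov check (33) with cvxpy (ours; a small margin $\epsilon$ turns the strict inequalities into solvable non-strict ones) is:

```python
import cvxpy as cp
import numpy as np

def common_lyapunov(Acl_list, eps=1e-6):
    """Return P satisfying (33) for all closed-loop modes, or None.

    The test is only sufficient: failure does not prove instability.
    """
    n = Acl_list[0].shape[0]
    P = cp.Variable((n, n), symmetric=True)
    cons = [P >> eps * np.eye(n)]
    for A in Acl_list:
        Mi = A.T @ P @ A - P
        cons.append((Mi + Mi.T) / 2 << -eps * np.eye(n))  # symmetrized LMI
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve()
    ok = prob.status in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE)
    return P.value if ok else None
```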
Remark 4 (On the tuning of $P$ in (4a)).

When $\mathcal{S}$ is known to be open-loop stable, the terminal weight $P\succeq 0$ in (4a) is generally selected as the solution of the Lyapunov equation

$$P=A^{\top}PA+Q.$$

This equation can be directly translated into its data-driven counterpart by exploiting [12] as follows:

$$P=A_{d}^{\top}PA_{d}+Q,\tag{35}$$

with

$$A_{d}=X_{1,T}\begin{bmatrix}X_{0,T-1}\\ U_{0,1,T-1}\end{bmatrix}^{\dagger}\begin{bmatrix}I\\ 0\end{bmatrix},$$

thus providing a data-based approach for the selection of this parameter.
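A possible implementation of this data-based selection (a sketch of ours, using scipy's discrete Lyapunov solver and the one-step data matrices of [12]) is:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def terminal_weight(X1, X0, U0, Q):
    """Terminal weight P from (35), with A_d built from one-step data.

    X0, X1: n x T state matrices (current/shifted); U0: m x T input matrix.
    Assumes the underlying system is open-loop stable, as in Remark 4.
    """
    n, m = X0.shape[0], U0.shape[0]
    Ad = X1 @ np.linalg.pinv(np.vstack([X0, U0])) @ np.vstack(
        [np.eye(n), np.zeros((m, n))])
    # scipy solves X = a X a^T + q; with a = Ad^T this is exactly (35)
    return solve_discrete_lyapunov(Ad.T, Q)
```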

Remark 5 (Hyper-parameter tuning).

The possibility of performing an off-line data-based stability check on the data-driven explicit law can be useful to preliminarily assess the effects of different choices of the tuning parameters $Q$, $R$ and $P$ in (4a) and $\gamma$ in (19), allowing one to discard those resulting in a failure of the data-driven stability tests.

4.4 Regularization and noise handling

As highlighted in Remark 1, all the equivalences we rely on to derive the explicit predictive control law are verified when $\mathcal{D}_{T}$ is noiseless. However, in practice, $\Omega$ in Assumption 2 is generally a non-zero matrix.

To cope with noisy data, we follow in the footsteps of [5, 13] and propose to leverage the regularization term introduced in (19). Indeed, as in standard ridge regression [18], this additional element of the cost steers all the components of $G(t)$ towards small values. In turn, this potentially limits the impact of noise on the constraints in (20b)-(20c) and, thus, on the final explicit law. This shrinkage effect is modulated by the regularization parameter $\gamma$, with the reduction in the magnitude of $G(t)$ being stronger for larger values of $\gamma$. At the same time, $\gamma$ implicitly changes the balance between the penalties in the original data-driven control problem in (15), with excessively high values of $\gamma$ potentially driving the explicit data-driven controller far away from its model-based counterpart. The choice of this hyper-parameter thus becomes an important tuning knob of the approach, requiring one to trade off between handling noise and keeping the explicit DDPC law as close as possible to the solution of the implicit DDPC problem.

Although the stability checks in (33)-(34) can be used for a preliminary assessment of the effect of different choices of $\gamma$, at the moment this balance can only be attained through closed-loop trials for several values of $\gamma$. Such a procedure allows one to ultimately select the hyper-parameter that best fits one’s needs, at the price of requiring closed-loop experiments that can be rather safety-critical in practice, especially when a simulator of the system is not available.

Whenever multiple experiments can be performed by feeding the plant with the same input sequence $\mathcal{U}_{T}$, the burden associated with the choice of $\gamma$ can be alleviated by exploiting the features of Assumption 2 itself. In this scenario, one can indeed replace $\mathcal{D}_{T}$ with the averaged dataset $\bar{\mathcal{D}}_{T}=\{\mathcal{U}_{T},\bar{\mathcal{Y}}_{T}\}$, where $\bar{\mathcal{Y}}_{T}=\{\bar{y}_{t}\}_{t=0}^{T}$ and

$$\bar{y}_{t}=\frac{1}{N}\sum_{i=1}^{N}y_{t}^{(i)},\tag{36}$$

with $y_{t}^{(i)}$ denoting the output of the $i$-th experiment. Since the noise is assumed to be zero mean, the law of large numbers asymptotically yields

$$\bar{y}_{t}\underset{N\rightarrow\infty}{\longrightarrow}y_{t}^{\mathrm{o}}.\tag{37}$$

As such, when the number $N$ of experiments increases, the role of $\gamma$ in handling noise becomes progressively less dominant. In this case, $\gamma$ should then be used only to make the DDPC problem well-defined, and any small $\gamma>0$ is acceptable for this purpose.
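In code, the averaging step (36) is a one-liner, e.g.:

```python
import numpy as np

def averaged_outputs(Y_experiments):
    """Average (36): Y_experiments is an (N, T+1, n) array of outputs from
    N repeated experiments driven by the same input sequence."""
    return np.asarray(Y_experiments).mean(axis=0)
```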

5 Numerical examples

The performance of the explicit predictive controller is now assessed on two benchmark examples: $(i)$ the regulation to zero of the open-loop stable system of [3], for the case in which the state is fully measurable; and $(ii)$ the altitude control of a quadcopter. Since the latter example features an open-loop unstable linearized plant, data are collected in closed loop, by assuming that the drone is stabilized by an (unknown) pre-existing controller. In both examples, the level of noise acting on the measured states is assessed through the averaged Signal-to-Noise Ratio (SNR):

$$\overline{\mathrm{SNR}}=\frac{1}{n}\sum_{i=1}^{n}10\log_{10}\left(\frac{\sum_{t=0}^{T}(x_{i}(t)-w_{i}(t))^{2}}{\sum_{t=0}^{T}(w_{i}(t))^{2}}\right)~\text{[dB]},\tag{38}$$

where $x_{i}(t)$ and $w_{i}(t)$ denote the $i$-th components of the state and of the measurement noise, respectively. All computations are carried out on an Intel Core i7-7700HQ processor, running MATLAB 2019b.
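For completeness, a sketch of how (38) is computed in our simulations (where the noise realization is available by construction) is:

```python
import numpy as np

def averaged_snr_db(x_meas, w):
    """Averaged SNR (38): x_meas, w are (T+1, n) arrays of measured states
    and noise samples; x_meas - w recovers the noiseless component."""
    num = np.sum((x_meas - w) ** 2, axis=0)
    den = np.sum(w ** 2, axis=0)
    return float(np.mean(10 * np.log10(num / den)))
```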

5.1 Open-loop stable benchmark system

Figure 1: Open-loop stable benchmark: polyhedral partition of the explicit data-driven law.

Figure 2: Open-loop stable benchmark: (a) state and (b) input trajectories obtained with the explicit data-driven law.

Let us consider the benchmark system described by:

$$\mathcal{S}:~x(t+1)=\begin{bmatrix}0.7326&-0.0861\\ 0.1722&0.9909\end{bmatrix}x(t)+\begin{bmatrix}0.0609\\ 0.0064\end{bmatrix}u(t).\tag{39}$$

Our goal is to regulate both components of the state to zero, while enforcing the following box constraint on the input:

$$-2\leq u(k)\leq 2.\tag{40}$$

Towards this goal, we collect a set $\mathcal{D}_{T}$ of $T=100$ input/state pairs, by feeding $\mathcal{S}$ with an input sequence uniformly distributed within the interval $[-5,5]$. According to Assumption 2, the measured states are corrupted by an additive zero-mean white noise sequence, with variance yielding $\overline{\mathrm{SNR}}=20$ dB. The parameters characterizing the DDPC problem to be solved are selected as in [3], namely $L=2$, $Q=I$, $R=0.01$, while $P$ is chosen as the solution of the data-driven Lyapunov equation in (35). By setting $\gamma=10$, the partition associated with the explicit data-driven predictive controller (plotted thanks to the Hybrid Toolbox [1]) is the one shown in Figure 1, which approximately corresponds to that reported in [3], the negligible differences being due to the noise on the batch data. Prior to the controller deployment, we have performed the data-based closed-loop stability check in (33), solving the LMIs with CVX [17, 16] and obtaining

$$\mathcal{P}=\begin{bmatrix}24.8695&10.5595\\ 10.5595&43.2657\end{bmatrix}\succ 0.$$

This indicates that the explicit law preserves the stability of the open-loop system. Figure 2 reports the trajectories of the state and of the optimal input obtained over a noise-free closed-loop test with the explicit data-driven law, which confirm its effectiveness and validate the result of the data-driven stability check.

Figure 3: Open-loop stable benchmark: $\mathcal{J}$ in (41) vs $\log_{10}(\gamma)$ for increasing levels of noise ($\overline{\mathrm{SNR}}=40$, $20$ and $10$ dB) over 30 realizations of $\mathcal{D}_{T}$.

We now assess the sensitivity of the explicit controller to the choice of $\gamma$ over 20 Monte-Carlo realizations of the batch dataset $\mathcal{D}_{T}$, for different noise levels. This evaluation is performed by looking at the cost attained by the controller over the same noiseless closed-loop test of length $T_{v}=50$ considered previously, i.e.,

$$\mathcal{J}=\sum_{t=0}^{T_{v}}\|x(t)\|_{Q}^{2}+\|u(t)\|_{R}^{2}.\tag{41}$$
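In code, (41) is evaluated along the recorded closed-loop trajectory, e.g.:

```python
import numpy as np

def closed_loop_cost(xs, us, Q, R):
    """Cost (41): xs is a (Tv+1, n) state trajectory and us the associated
    (Tv+1, m) input trajectory; scalar weights go in as 1x1 matrices."""
    return float(sum(x @ Q @ x + u @ R @ u for x, u in zip(xs, us)))
```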

As shown in Figure 3, the value of $\gamma$ that leads to the minimum closed-loop cost tends to decrease when the noise level increases, supporting our considerations in Section 4.4. At the same time, by properly selecting $\gamma$ we attain a cost $\mathcal{J}=16.37\pm 0.58$, which is generally close to $\mathcal{J}^{\mathcal{O}}=15.77$, i.e., the cost achieved by the oracle law, namely the model-based predictive controller obtained by exploiting the true model of $\mathcal{S}$. These results additionally show that, for increasing levels of noise, the choice of $\gamma$ becomes more challenging, since the range of values leading to the minimum $\mathcal{J}$ progressively shrinks. Note that, when $\gamma$ is excessively small, the optimal input is always zero and $\mathcal{S}$ evolves freely; this behavior is observed for $\gamma\leq 10^{-6}$, $\gamma\leq 10^{-4}$ and $\gamma\leq 10^{-3}$ when $\overline{\mathrm{SNR}}=40$ dB, $\overline{\mathrm{SNR}}=20$ dB and $\overline{\mathrm{SNR}}=10$ dB, respectively.

Figure 4: Open-loop stable benchmark: $\mathcal{J}$ in (41) vs $\log_{10}(\gamma)$ for an increasing number of repeated experiments ($N=1$, $10$ and $100$) over 30 realizations of $\mathcal{D}_{T}$.

We additionally evaluate the effect of averaging, by looking at the performance index $\mathcal{J}$ in (41) over 30 Monte-Carlo data collections for an increasing number $N$ of repeated experiments of length $T=100$. The measurements are affected by noise, yielding $\overline{\mathrm{SNR}}=20$ dB. Figure 4 shows that the use of the averaged dataset $\bar{\mathcal{D}}_{T}$ has an effect similar to a reduction of the noise level. Indeed, for increasing $N$ the optimal $\gamma$ slowly shifts towards smaller values, confirming that the role of $\gamma$ in noise handling gradually diminishes.

5.2 Altitude control of a quadcopter

Figure 5: Altitude control: measured (dotted-dashed gray line) and actual (black line) altitude vs reference (dashed red line), for (a) take-off and (b) landing.

As a final case study, we consider a nonlinear system, namely the problem of controlling the altitude of a quadcopter to perform landing or take-off maneuvers. To this end, we exploit the simulator used in [15] to collect the data and to carry out the closed-loop experiments with the learned explicit law. Let $z(t)$ [m] be the altitude of the quadcopter, $v_{z}(t)$ [m/s] be its vertical velocity and $(\theta(t),\phi(t),\psi(t))$ [deg] its roll, pitch and yaw angles at time $t$. Both the altitude $z(t)$ and the vertical velocity $v_{z}(t)$ are assumed to be measured, with the measurements being corrupted by a zero-mean white noise, resulting in $\overline{\mathrm{SNR}}=30$ dB over these two outputs. As this system is open-loop unstable, the data collection phase is carried out in closed loop for 20 [s] at a sampling rate of 40 [Hz], by using the four proportional-derivative (PD) controllers introduced in [15]. The altitude set point used at this stage is generated uniformly at random in the interval $[0,4]$ [m]. The set points for the attitude angles are instead selected as slowly varying random signals within the interval $[-0.2,0.2]$ [rad]. These choices yield a dataset $\mathcal{D}_{T}$ of length $T=800$ that satisfies Assumption 1 and allows us to retain information on possible non-zero angular configurations.

The three attitude controllers introduced in [15] are retained in testing to keep the attitude angles at zero and to decouple the altitude dynamics from those of the other state variables. Within this setting, the explicit data-driven law is designed by imposing $L=5$, $Q=P=\operatorname{diag}([1,0.1])$, $R=10^{-5}$ and $\gamma=1$. To mitigate the effect of the gravitational force, the design and the closed-loop deployment of the explicit controller are carried out by pre-compensating gravity. As a result, the input to be optimized is

$$u(t)=\frac{u_{z}(t)}{m}-g,\tag{42}$$

where $m=0.5$ [kg] is the mass of the quadcopter, $g=9.81$ [m/s$^{2}$] is the gravitational acceleration and $u_{z}(t)$ is the input prior to the compensation. A similar approach is adopted for the control problem to fit our framework in both landing and take-off scenarios. We thus consider the reduced state

$$x(t)=\begin{bmatrix}z(t)-\bar{z}\\ v_{z}(t)\end{bmatrix},\tag{43}$$

where $\bar{z}$ [m] is the altitude set point. To avoid potential crashes of the quadcopter, in designing the explicit law we impose the following constraint on the state of the system:

$$x_{1}(t)\geq-\bar{z},\tag{44}$$

which, in turn, guarantees that the altitude is always non-negative. Meanwhile, the pre-compensated input is constrained to the interval:

$$-9.81\leq u(t)\leq 9.564,\tag{45}$$

where the lower bound corresponds to a null input and the upper limit is dictated by the maximum power of the motors (the reader is referred to [15] for additional details on the system).

The performance attained by the learned explicit law in take-off and landing is reported in Figure 5. Here we consider closed-loop tests in which the altitude and the vertical velocity measurements are noisy, with the noise acting on the closed-loop measurements sharing the features of the one corrupting the batch data. Despite the noise acting on the initial condition of the predictions at each step, both maneuvers are successfully performed, thus showing the effectiveness of the retrieved explicit data-driven law.

6 Conclusions

By leveraging the known PWA nature of the explicit MPC law within linear quadratic predictive control, in this paper we have proposed an approach to derive such an explicit controller from data only, without undertaking a full modeling/identification step. Thanks to the formalization of the problem, well-known model-based techniques can be straightforwardly adapted to check the stability of the closed-loop system before deploying the controller.
Future research will be devoted to extending these preliminary results to cases in which the state is not fully measurable, and to exploiting priors to guarantee practical closed-loop stability by design. Future work will also be devoted to formalizing the connections between the explicit solution proposed in this paper and the one introduced in [8], consequently providing a comparative analysis of the two approaches.

References

  • [1] A. Bemporad. Hybrid Toolbox - User’s Guide, 2004. http://cse.lab.imtlucca.it/~bemporad/hybrid/toolbox.
  • [2] A. Bemporad, K. Fukuda, and F.D. Torrisi. Convexity recognition of the union of polyhedra. Computational Geometry, 18(3):141–154, 2001.
  • [3] A. Bemporad, M. Morari, V. Dua, and E.N. Pistikopoulos. The explicit linear quadratic regulator for constrained systems. Automatica, 38(1):3–20, 2002.
  • [4] J. Berberich, J. Köhler, M.A. Müller, and F. Allgöwer. Data-driven tracking MPC for changing setpoints. IFAC-PapersOnLine, 53(2):6923–6930, 2020. 21st IFAC World Congress.
  • [5] J. Berberich, J. Köhler, M.A. Müller, and F. Allgöwer. Data-driven model predictive control with stability and robustness guarantees. IEEE Transactions on Automatic Control, 66(4):1702–1717, 2021.
  • [6] J. Berberich, J. Köhler, M.A. Müller, and F. Allgöwer. Linear tracking MPC for nonlinear systems part II: The data-driven case. In arXiv/2105.08567, 2021.
  • [7] F. Borrelli, M. Baotić, J. Pekar, and G. Stewart. On the complexity of explicit MPC laws. In 2009 European Control Conference (ECC), pages 2408–2413, 2009.
  • [8] V. Breschi, A. Sassella, and S. Formentin. On the design of regularized explicit predictive controllers from input-output data. In arXiv/2110.11808, 2021.
  • [9] A. Chiuso and G. Pillonetto. System identification: A machine learning perspective. Annual Review of Control, Robotics, and Autonomous Systems, 2(1):281–304, 2019.
  • [10] J. Coulson, J. Lygeros, and F. Dörfler. Data-enabled predictive control: In the shallows of the DeePC. In 2019 18th European Control Conference (ECC), pages 307–312, 2019.
  • [11] J. Coulson, J. Lygeros, and F. Dörfler. Regularized and distributionally robust data-enabled predictive control. In 2019 IEEE 58th Conference on Decision and Control (CDC), pages 2696–2701, 2019.
  • [12] C. De Persis and P. Tesi. Formulas for data-driven control: Stabilization, optimality, and robustness. IEEE Transactions on Automatic Control, 65(3):909–924, 2019.
  • [13] F. Dörfler, J. Coulson, and I. Markovsky. Bridging direct & indirect data-driven control formulations via regularizations and relaxations. In arXiv/2101.01273, 2021.
  • [14] T. Faulwasser, M.A. Müller, and K. Worthmann. Recent advances in model predictive control: Theory, algorithms, and applications. 2021.
  • [15] S. Formentin and M. Lovera. Flatness-based control of a quadrotor helicopter via feedforward linearization. In 2011 50th IEEE Conference on Decision and Control and European Control Conference, pages 6171–6176, 2011.
  • [16] M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95–110. Springer-Verlag Limited, 2008. http://stanford.edu/~boyd/graph_dcp.html.
  • [17] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, March 2014.
  • [18] T. Hastie, R. Tibshirani, and J.H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer series in statistics. Springer, 2009.
  • [19] D. Hrovat, S. Di Cairano, H.E. Tseng, and I.V. Kolmanovsky. The development of model predictive control in automotive industry: A survey. In 2012 IEEE International Conference on Control Applications, pages 295–302, 2012.
  • [20] L. Ljung. Perspectives on system identification. Annual Reviews in Control, 34(1):1–12, 2010.
  • [21] D.Q. Mayne. Model predictive control: Recent developments and future promise. Automatica, 50(12):2967–2986, 2014.
  • [22] D. Mignone, G. Ferrari-Trecate, and M. Morari. Stability and stabilization of piecewise affine and hybrid systems: an LMI approach. In Proceedings of the 39th IEEE Conference on Decision and Control, volume 1, pages 504–509, 2000.
  • [23] S.J. Qin and T.A. Badgwell. A survey of industrial model predictive control technology. Control Engineering Practice, 11(7):733–764, 2003.
  • [24] J.B. Rawlings. Tutorial overview of model predictive control. IEEE control systems magazine, 20(3):38–52, 2000.
  • [25] A. Sassella, V. Breschi, and S. Formentin. Learning explicit predictive controllers: theory and applications. In arXiv/2108.08412, 2021.
  • [26] J.C. Willems, P. Rapisarda, I. Markovsky, and B.L.M. De Moor. A note on persistency of excitation. Systems & Control Letters, 54(4):325–329, 2005.