
Safety Filter Design for Neural Network Systems via
Convex Optimization

Shaoru Chen, Kong Yao Chee, Nikolai Matni, M. Ani Hsieh, George J. Pappas Shaoru Chen is with Microsoft Research, 300 Lafayette Street, New York, NY, 10012, USA, [email protected]. Kong Yao Chee, Nikolai Matni, M. Ani Hsieh, George J. Pappas are with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, 19104, USA. {ckongyao, nmatni, m.hsieh, pappasg}@seas.upenn.edu.
The first two authors contributed equally and are co-first authors.
Abstract

With the increase in data availability, it has been widely demonstrated that neural networks (NN) can capture complex system dynamics precisely in a data-driven manner. However, the architectural complexity and nonlinearity of the NNs make it challenging to synthesize a provably safe controller. In this work, we propose a novel safety filter that relies on convex optimization to ensure safety for a NN system, subject to additive disturbances that are capable of capturing modeling errors. Our approach leverages tools from NN verification to over-approximate NN dynamics with a set of linear bounds, followed by an application of robust linear MPC to search for controllers that can guarantee robust constraint satisfaction. We demonstrate the efficacy of the proposed framework numerically on a nonlinear pendulum system.

I Introduction

With the rapid development in machine learning infrastructure, neural networks (NN) have been ubiquitously applied in the modeling of complex dynamical systems [1, 2]. Through a data collection and training procedure [3], NNs can capture accurate representations of the system dynamics even in challenging scenarios where high-speed aerodynamic effects [4, 5, 6] or contact-rich environments [7, 8] are present. Moreover, NNs can be easily updated online as more data is collected, making them suitable for online tasks or modeling changing environments. For example, NN dynamical systems are widely used in model-based reinforcement learning [9] and learning-based adaptive control [6].

However, applying NN dynamics brings about significant challenges in providing safety guarantees for the controlled system. While we benefit from the expressivity of NNs, we must also contend with their high nonlinearity and large scale, since they are often overparameterized. The runtime assurance (RTA) [10] mechanism provides a practical and effective solution to guarantee the safety of complex dynamical systems by designing a safety filter that primarily focuses on enforcing safety constraints. Given a primary controller that aims to optimize performance, the safety filter monitors and modifies the output of the primary controller online to guarantee that only safe control inputs are applied. The safety filter allows the design of the safety and performance-based controllers to be decoupled and has found wide applications in safe learning-based control [11, 12, 13].

In this work, we focus on the design of a predictive safety filter (PSF) [14] for uncertain NN dynamics. The PSF essentially follows a model predictive control (MPC) formulation with the nonlinear dynamics and constraints encoded in an optimization problem. Different from MPC, the PSF is less complex to solve since it does not consider any performance objectives [14]. Compared with the alternative safety filter construction schemes through control barrier functions (CBF) [15, 16] or Hamilton-Jacobi (HJ) reachability analysis [17, 18], the PSF enjoys flexibility in handling dynamically changing NN models or model uncertainty bounds when updated online. We refer the interested readers to [14, Section 1 and 2] for a detailed discussion of the PSF, CBF, and HJ reachability-based safety filters.

Contributions: In this work, we consider uncertain NN dynamics subject to bounded additive disturbances, where the disturbances can encapsulate the errors between the learned NN dynamics and the true system. Despite being highly expressive, the considered uncertain NN dynamics require solving a robust optimization problem involving NN dynamical constraints online in the PSF. To resolve this computational challenge, we propose to apply NN verification tools [19] to abstract the NN dynamics locally as a linear uncertain system, thereby reducing the original PSF problem into one that is amenable to robust linear MPC and convex optimization. In particular, we adapt the SLS (System Level Synthesis) MPC method [20] to solve the resulting robust MPC problem. A schematic of our pipeline is shown in Fig. 1. Soft constraints are used in robust linear MPC where slack variables denoting constraint violations are penalized. By applying a hierarchy of conservative function and model uncertainty approximations, we transform the original optimization problem into a convex one. A safety certificate for the uncertain NN dynamics over a finite horizon can then be provided when all slack variables are zero. Our contributions are summarized below.

  1. Drawing tools from NN verification and robust linear MPC, we propose a novel predictive safety filter for uncertain NN dynamics through convex optimization. Importantly, the complexity of the convex optimization problem is independent of the NN size (i.e., the width and depth of the NN).

  2. Our PSF provides a safety certificate for the uncertain NN dynamics over a finite horizon when a certain numerical criterion is met by the convex optimization solutions.

I-A Related works

The problem of ensuring the safety of learning-based systems has received significant interest, with a plethora of methods described in [21]. Directly related to our work is the PSF developed in [14], which monitors and modifies a given control input by solving a predictive control problem online to guarantee the safety of the system. This formulation has been extended to the SLS setting [22] and applied to racing cars [12], and a soft-constrained variant was proposed in [23] to handle unexpected disturbances to the states. The PSF that we propose differs from those in the existing work in the following ways. First, we exploit the structure of the neural networks to extract linear bounds on the NN outputs using NN verification tools [24, 19], simplifying the PSF formulation for NN dynamics. Second, our proposed pipeline circumvents the need to solve a robust non-convex optimization problem, even with the consideration of additive disturbances within the uncertain dynamics, as is typical for nonlinear variants of SLS [25]. Unlike existing work on predictive control of NN dynamics [26, 27, 2], our work considers robust control of uncertain NN dynamics with a focus on obtaining formal safety guarantees.

Notation: $[N]$ denotes the set $\{0,1,\dots,N\}$. $\mathcal{B}_{p}(z,r)$ denotes the $\ell_{p}$ ball centered at $z$ with radius $r$. We use $x_{0:T}$ to denote the sequence $\{x_{0},\dotsc,x_{T}\}$. $\mathbf{0}$ denotes a vector of all zeros, and its dimension can be inferred from context. For a vector $d\in\mathbb{R}^{n}$, $S=\mathrm{diag}(d)\in\mathbb{R}^{n\times n}$ denotes the diagonal matrix with $d$ as its diagonal. For a sequence of matrices $S_{1},\dots,S_{N}$, $S=\mathrm{blkdiag}(S_{1},\dots,S_{N})$ denotes the block-diagonal matrix whose diagonal blocks are $S_{1},\dots,S_{N}$ in that order. We represent a linear, causal operator $\mathbf{R}$ defined over a horizon $T$ by the block-lower-triangular matrix

$\mathbf{R}=\begin{bmatrix}R^{0,0}& & & \\ R^{1,1}&R^{1,0}& & \\ \vdots&\ddots&\ddots& \\ R^{T,T}&\cdots&R^{T,1}&R^{T,0}\end{bmatrix}$ (1)

where $R^{i,j}\in\mathbb{R}^{p\times q}$ is a matrix of compatible dimensions. The set of such operators is denoted by $\mathcal{L}_{TV}^{T,p\times q}$, and the superscript $T$ or $p\times q$ will be dropped when it is clear from the context.

Figure 1: Pipeline of the proposed predictive safety filter for uncertain NN systems.

II Problem Formulation

Consider the following discrete-time nonlinear system,

$x(k+1)=Ax(k)+Bu(k)+f\left(x(k),u(k)\right)+w(k),$ (2)

where the vectors $x(k)\in\mathbb{R}^{n_{x}}$ and $u(k)\in\mathbb{R}^{n_{u}}$ are the state and control input. The vector $w(k)\in\mathbb{R}^{n_{x}}$ denotes additive disturbances that can account for unknown effects from the environment or unmodeled dynamics. We assume that $w(k)$ is norm-bounded, i.e., $w(k)\in\mathcal{W}:=\{w\mid\lVert w\rVert_{\infty}\leq\sigma_{w}\}$. The system consists of linear dynamics characterized by the matrices $A$ and $B$ and nonlinear dynamics $f(x(k),u(k))$. Specifically, the nonlinear dynamics are given by a NN $f:\mathbb{R}^{n_{x}+n_{u}}\mapsto\mathbb{R}^{n_{x}}$ with an arbitrary architecture. While we present our approach for the general dynamics (2), the approach allows the matrices $A$ and $B$ to be zero and has the flexibility to account for time-varying dynamics $(A(k),B(k))$. The system (2) is required to satisfy the following constraints,

$x(k)\in\mathcal{X},\quad u(k)\in\mathcal{U},$ (3)

where $\mathcal{X}\subset\mathbb{R}^{n_{x}}$ and $\mathcal{U}\subset\mathbb{R}^{n_{u}}$ are polytopes. The state $x(k)$ and input $u(k)$ are considered safe if they satisfy constraints (3).

II-A Predictive safety filter

We assume a primary controller $\pi(x)$, which aims to complete a task or achieve high performance, is given for system (2). Following the runtime assurance scheme, the primary controller $\pi(x)$ is not guaranteed to be safe, since its design may be decoupled from the safety requirements. To ensure constraint satisfaction of the closed-loop system, we aim to design a predictive safety filter that monitors and modifies the control input $\pi(x(k))$ given by the primary controller online, in a minimally invasive manner. This is achieved by solving the following robust optimization problem at each time step $k$,

$\begin{aligned}
\underset{u_{0:T-1}}{\text{minimize}}\quad & \lVert u_{0}-\pi(x(k))\rVert_{2}^{2}\\
\text{subject to}\quad & x_{t+1}=Ax_{t}+Bu_{t}+f(x_{t},u_{t})+w_{t},\\
& x_{t}\in\mathcal{X},\ u_{t}\in\mathcal{U},\ \forall w_{t}\in\mathcal{W},\quad t\in[T],\\
& x_{0}=x(k),
\end{aligned}$ (4)

where the vectors $x_{t},u_{t}$ denote the predicted states and inputs over the horizon $T$, with $x(k)$ representing the current state of the system. We refer to Problem (4) as the PSF problem and denote by $u_{0:T-1}^{*}$ its optimal solution. When applied, the control inputs $u_{0:T-1}^{*}$ guarantee the safety of the system for the next $T$ steps. In practice, the PSF problem (4) is solved recursively at each time step $k$, and only the first optimal control input $u_{0}^{*}$ is applied to the system, analogous to an MPC scheme.
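To make the receding-horizon filtering scheme concrete, the following sketch implements a deliberately simplified one-dimensional instance with horizon $T=1$, where the robust PSF reduces to clipping the primary input onto a closed-form safe interval. The dynamics, constraint bounds, and primary controller below are illustrative assumptions, not the paper's pendulum benchmark.

```python
# Minimal sketch of the receding-horizon safety filter on a 1-D toy system
# x(k+1) = a*x(k) + b*u(k) + w(k), |w| <= sigma_w, with state constraint |x| <= x_max.
# With horizon T = 1, the robust PSF reduces to projecting pi(x) onto a safe interval.
# All numerical values are illustrative assumptions.

def safe_input_interval(x, a=0.9, b=1.0, sigma_w=0.05, x_max=1.0):
    """Inputs u keeping a*x + b*u + w inside [-x_max, x_max] for every |w| <= sigma_w."""
    lo = (-x_max + sigma_w - a * x) / b
    hi = (x_max - sigma_w - a * x) / b
    return lo, hi

def safety_filter(x, u_nominal, **kw):
    """Minimally invasive modification: project the primary input onto the safe interval."""
    lo, hi = safe_input_interval(x, **kw)
    return min(max(u_nominal, lo), hi)

def simulate(T_sim=50):
    import random
    rng = random.Random(0)
    x, traj = 0.8, []
    for _ in range(T_sim):
        u_nom = 2.0 * x                    # aggressive primary controller, unsafe on its own
        u = safety_filter(x, u_nom)        # filtered input
        w = rng.uniform(-0.05, 0.05)       # bounded disturbance
        x = 0.9 * x + u + w
        traj.append(x)
    return traj
```

In the paper's setting, the filter instead solves a robust convex program over a horizon $T$ with feedback policies; the clipping above is the scalar, $T=1$ special case of the same idea.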

II-B Challenges with the predictive safety filter

While the solution to Problem (4) is able to provide safety guarantees, solving Problem (4) is a challenging task. Some potential issues include

  (i) it is well known in robust MPC that searching over open-loop control sequences $u_{0:T-1}$ can be overly conservative [28],

  (ii) the presence of the NN dynamics $f(x,u)$ makes solving the PSF computationally challenging,

  (iii) the safety certificate of the solution $u_{0:T-1}^{*}$ is not available until convergence is reached, and

  (iv) without the availability of a robust forward invariant set, attempting to solve the PSF may result in infeasibility.

To handle all the aforementioned issues, we combine NN verification tools [19] and robust MPC [20]. Our solution consists of two steps. First, we generate local linear bounds for the NN dynamics $f(x,u)$ using tools from NN verification. Next, we apply robust linear MPC to synthesize a state feedback control policy that guarantees robust constraint satisfaction for the system. This combined procedure provides a powerful simplification of the PSF problem and resolves issues (i) to (iii). To address issue (iv), we introduce soft constraints in our formulation, which provides formal safety guarantees for the system when the slack variables are zero. We describe these two steps in Sections III and IV, respectively. Section V demonstrates our method numerically, and Section VI concludes the paper.

Remark 1

To ensure the safety constraints are satisfied at all times or guarantee recursive feasibility of the PSF (4), a local forward invariant set for the nonlinear system (2) is required, which is generally challenging to find. In this work, we do not assume the availability of such a forward invariant set and use slack variables as numerical certificates of safety. We leave the synthesis of the forward invariant terminal set for NN dynamics and its integration into the PSF as part of our future work.

III Neural network verification bounds

In this section, we demonstrate how tools from NN verification can be utilized to over-approximate the NN dynamics with a linear time-varying (LTV) representation. This enables us to conservatively transform the PSF problem into a robust convex optimization problem, which is simpler to solve.

Given a bounded input set, the linear relaxation-based perturbation analysis (LiRPA) [19] is an efficient method to synthesize linear lower and upper bounds for the outputs of a NN with a general architecture. The bounds computed from this method are described in the following theorem.

Theorem 1

(rephrasing [24, Theorem 3.2]) Given a NN $f(z):\mathbb{R}^{n_{0}}\mapsto\mathbb{R}^{n_{L}}$, there exist two explicit linear functions $f_{U}:\mathbb{R}^{n_{0}}\mapsto\mathbb{R}^{n_{L}}$ and $f_{L}:\mathbb{R}^{n_{0}}\mapsto\mathbb{R}^{n_{L}}$ such that for all $z\in\mathcal{B}_{p}(z_{0},r)$, we have

$f_{L}(z)=A_{L}z+b_{L}\leq f(z)\leq A_{U}z+b_{U}=f_{U}(z),$ (5)

where the inequalities are applied component-wise.
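To make Theorem 1 concrete, the sketch below computes bounds of the form (5) for a one-hidden-layer ReLU network, using the standard elementwise ReLU linear relaxation back-substituted through the output layer. This is a minimal stand-in for general-purpose LiRPA tools handling arbitrary architectures; the lower-slope heuristic and the network data are illustrative assumptions.

```python
import numpy as np

def relu_relaxation(l, u):
    """Elementwise linear bounds a_l*x + c_l <= relu(x) <= a_u*x + c_u valid on [l, u]."""
    # stable neurons: identity (l >= 0) or zero (u <= 0); unstable: chord upper bound,
    # and a 0/1 lower slope chosen by the common heuristic (1 if u > -l, else 0)
    a_l = np.where(l >= 0, 1.0, np.where(u <= 0, 0.0, (u > -l).astype(float)))
    a_u = np.where(l >= 0, 1.0, np.where(u <= 0, 0.0, u / (u - l)))
    c_u = np.where((l < 0) & (u > 0), -u * l / (u - l), 0.0)
    return a_l, np.zeros_like(l), a_u, c_u

def linear_bounds(W1, b1, W2, b2, z0, r):
    """Linear bounds A_L z + b_L <= W2 relu(W1 z + b1) + b2 <= A_U z + b_U on B_inf(z0, r)."""
    # interval bounds on the hidden pre-activations over the inf-ball
    center = W1 @ z0 + b1
    rad = r * np.abs(W1).sum(axis=1)
    a_l, c_l, a_u, c_u = relu_relaxation(center - rad, center + rad)
    # back-substitution: positive output weights take the upper relaxation, negative the lower
    W2p, W2n = np.clip(W2, 0, None), np.clip(W2, None, 0)
    A_U = (W2p * a_u + W2n * a_l) @ W1
    b_U = (W2p * a_u + W2n * a_l) @ b1 + W2p @ c_u + W2n @ c_l + b2
    A_L = (W2p * a_l + W2n * a_u) @ W1
    b_L = (W2p * a_l + W2n * a_u) @ b1 + W2p @ c_l + W2n @ c_u + b2
    return A_L, b_L, A_U, b_U
```

The resulting bounds are sound over the entire input ball, which is what allows the NN dynamics to be replaced by the linear over-approximation (7) below.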

The parameters $A_{L},A_{U},b_{L},b_{U}$ are derived from the weights, biases, and activation functions of the NN. In this paper, we choose $p:=\infty$ such that $\mathcal{B}_{\infty}(z,r)$ is polyhedral. The bounds (5) are computed using closed-form updates with a computational complexity polynomial in the number of neurons [24]. This allows the method to scale well to deep networks. However, if the NN is deep or if the input domain $\mathcal{B}_{\infty}(z,r)$ is large, the computed bounds tend to be loose. Motivated by this observation, we propose to extract a set of local linear bounds along a reference trajectory. Specifically, at every time step $k$, we construct a reference trajectory given by the sequences of reference states $\mathbf{\hat{x}}:=\hat{x}_{0:T}$ and control inputs $\mathbf{\hat{u}}:=\hat{u}_{0:T-1}$ where

$\hat{x}_{t+1}=A\hat{x}_{t}+B\hat{u}_{t}+f(\hat{x}_{t},\hat{u}_{t}),\quad t\in[T-1],\qquad \hat{x}_{0}=x(k).$ (6)

The reference control inputs $\mathbf{\hat{u}}$ can be obtained, e.g., by rolling out the nominal NN dynamics under the primary policy $\pi(\cdot)$. Denoting $z_{t}:=[x_{t}^{\top}\ u_{t}^{\top}]^{\top}$, $\hat{z}_{t}:=[\hat{x}_{t}^{\top}\ \hat{u}_{t}^{\top}]^{\top}$, and $r_{t}$ the radius of the $\ell_{\infty}$ ball around $\hat{z}_{t}$, we apply Theorem 1 to obtain the following bounds for the NN dynamics along the reference trajectory,

$\begin{bmatrix}\underline{G}_{t}^{x}&\underline{G}_{t}^{u}\end{bmatrix}z_{t}+\underline{g}_{t}\leq f(x_{t},u_{t})\leq\begin{bmatrix}\overline{G}_{t}^{x}&\overline{G}_{t}^{u}\end{bmatrix}z_{t}+\overline{g}_{t},$ (7)

for all $(x_{t},u_{t})\in\mathcal{B}_{\infty}(\hat{z}_{t},r_{t})$. In other words, the NN dynamics $f(x_{t},u_{t})$ is over-approximated by a set of linear lower and upper bounds. The ball $\mathcal{B}_{\infty}(\hat{z}_{t},r_{t})$ is referred to as the trust region in which the bounds (7) are valid.

To reduce conservatism in the formulation of the filter within the robust MPC framework, we integrate the bounds into the linear dynamics of the system. Specifically, using the bounds in (7), we define

$f_{d}(x_{t},u_{t}):=f(x_{t},u_{t})-\left(\tilde{A}_{t}x_{t}+\tilde{B}_{t}u_{t}+\tilde{c}_{t}\right),$ (8)

where

$\tilde{A}_{t}:=\frac{\underline{G}_{t}^{x}+\overline{G}_{t}^{x}}{2},\quad \tilde{B}_{t}:=\frac{\underline{G}_{t}^{u}+\overline{G}_{t}^{u}}{2},\quad \tilde{c}_{t}:=\frac{\underline{g}_{t}+\overline{g}_{t}}{2}$ (9)

denote the means of the linear bounds.

Corollary 1

Given the bounds in (7), for every $z_{t}:=[x_{t}^{\top}\ u_{t}^{\top}]^{\top}\in\mathcal{B}_{\infty}(\hat{z}_{t},r_{t})$, the dynamics $f_{d}(x_{t},u_{t})$ in (8) satisfy the following bounds,

$\begin{bmatrix}\underline{D}_{t}^{x}&\underline{D}_{t}^{u}\end{bmatrix}z_{t}+\underline{d}_{t}\leq f_{d}(x_{t},u_{t})\leq\begin{bmatrix}\overline{D}_{t}^{x}&\overline{D}_{t}^{u}\end{bmatrix}z_{t}+\overline{d}_{t},$ (10)

where

$\underline{D}_{t}^{x}=\frac{\underline{G}_{t}^{x}-\overline{G}_{t}^{x}}{2}=-\overline{D}_{t}^{x},\quad \underline{D}_{t}^{u}=\frac{\underline{G}_{t}^{u}-\overline{G}_{t}^{u}}{2}=-\overline{D}_{t}^{u},\quad \underline{d}_{t}=\frac{\underline{g}_{t}-\overline{g}_{t}}{2}=-\overline{d}_{t}.$
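The decomposition (8)-(10) can be checked numerically: the residual bounds (10) are symmetric about zero, and any function sandwiched by (7) has its residual sandwiched by (10). A small sketch with made-up bound matrices, taking the trust region inside the positive orthant so that the upper bound dominates the lower bound on it:

```python
import numpy as np

rng = np.random.default_rng(1)
nx, nu = 3, 2
nz = nx + nu

# Illustrative linear bounds (7): lower <= f <= upper on the trust region [0, 1]^nz
G_lo = rng.standard_normal((nx, nz)); g_lo = rng.standard_normal(nx)
G_up = G_lo + rng.uniform(0.0, 0.2, (nx, nz)); g_up = g_lo + rng.uniform(0.0, 0.2, nx)

# Means (9), which get merged into the nominal LTV dynamics
G_mid, g_mid = (G_lo + G_up) / 2, (g_lo + g_up) / 2

# Residual bounds (10): symmetric about zero by construction
D_up, d_up = (G_up - G_lo) / 2, (g_up - g_lo) / 2
assert np.allclose(G_lo - G_mid, -D_up) and np.allclose(g_lo - g_mid, -d_up)

# Any f between the bounds (7) has residual f_d = f - mid inside the bounds (10)
for _ in range(100):
    z = rng.uniform(0.0, 1.0, nz)                # point in the assumed trust region
    theta = rng.uniform(0.0, 1.0, nx)            # f somewhere between lower and upper
    f = (1 - theta) * (G_lo @ z + g_lo) + theta * (G_up @ z + g_up)
    f_d = f - (G_mid @ z + g_mid)
    assert np.all(f_d <= D_up @ z + d_up + 1e-9)
    assert np.all(f_d >= -(D_up @ z + d_up) - 1e-9)
```

The residual term is thus half the gap between the original bounds, which is why it can be treated as a small structured disturbance.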

It is important to note that although the NN dynamics $f(x_{t},u_{t})$ can take large values within the trust region $\mathcal{B}_{\infty}(\hat{z}_{t},r_{t})$, the residual dynamics $f_{d}(x_{t},u_{t})$ tend to be small in magnitude and can be treated as disturbances to the system. With the extraction of $f_{d}(x_{t},u_{t})$, we obtain an LTV reformulation of the PSF problem, referred to as the linear PSF problem,

$\begin{aligned}
\underset{u_{0:T-1}}{\text{minimize}}\quad & \lVert u_{0}-\pi(x(k))\rVert_{2}^{2}\\
\text{subject to}\quad & x_{t+1}=A_{t}x_{t}+B_{t}u_{t}+c_{t}+\Delta_{t}(x_{t},u_{t})+w_{t},\\
& (x_{t},u_{t})\in\mathcal{B}_{\infty}\left(\hat{z}_{t},r_{t}\right),\quad t\in[T-1],\\
& x_{t}\in\mathcal{X},\ u_{t}\in\mathcal{U},\quad t\in[T],\\
& \forall\Delta_{t}(\cdot)\in\mathcal{P}_{t},\ w_{t}\in\mathcal{W},\quad t\in[T],\\
& x_{0}=x(k),
\end{aligned}$ (11)

where the means in (9) are merged into the linear dynamics of the system via the following time-varying system parameters (when time-varying dynamics $(A(k),B(k))$ are considered in (2), we replace $(A,B)$ by their time-varying counterparts in the definitions of $(A_{t},B_{t})$),

$A_{t}:=A+\tilde{A}_{t},\quad B_{t}:=B+\tilde{B}_{t},\quad c_{t}:=\tilde{c}_{t}.$

The uncertainty set $\mathcal{P}_{t}$ is given as

$\mathcal{P}_{t}:=\left\{\Delta_{t}(x_{t},u_{t})\ \middle|\ \begin{array}{l}\Delta_{t}(x_{t},u_{t})\geq\begin{bmatrix}\underline{D}_{t}^{x}&\underline{D}_{t}^{u}\end{bmatrix}z_{t}+\underline{d}_{t}\\ \Delta_{t}(x_{t},u_{t})\leq\begin{bmatrix}\overline{D}_{t}^{x}&\overline{D}_{t}^{u}\end{bmatrix}z_{t}+\overline{d}_{t}\end{array}\right\},$ (12)

using the bounds obtained from Corollary 1. It immediately follows that $f_{d}(x_{t},u_{t})$ corresponds to a realization of $\Delta_{t}(x_{t},u_{t})$.

A few remarks about the linear PSF problem are in order. First, the solution of the linear PSF problem depends on the centers $\mathbf{\hat{z}}=\hat{z}_{0:T}$ and radii $r_{t}$ of the trust regions. The centers play an important role when the reference trajectory lies near or beyond the boundaries of the constraint set; in this case, the reference trajectory should be shifted towards the constraint set by adjusting the centers of the trust regions. Next, based on how the linear bounds in (12) are computed, there is a trade-off in the size of the radius $r_{t}$. A small radius ensures that the computed bounds are accurate, but it limits the range in which the centers $\mathbf{\hat{z}}$ can be updated at each iteration. On the other hand, a large radius provides more flexibility in updating the reference trajectory, but the bounds can be overly conservative. Lastly, any feasible solution $u_{0:T-1}^{*}$ to the linear PSF problem (11) is also feasible for the PSF problem (4).

With these considerations, we devise a method such that the trust regions $\mathcal{B}_{\infty}(\hat{z}_{t},r_{t})$ can be updated online. Details of this update are given in Section IV-E. In Sections IV-A to IV-D, we first describe how to solve the linear PSF problem through robust linear MPC.

IV Robust Linear MPC

Compared with the PSF problem (4), the linear PSF problem (11) involves only uncertain linear dynamics. However, solving it can still be challenging, and a conservative approach may fail. Since optimizing over open-loop control sequences is conservative in robust MPC, we instead optimize over state-feedback controllers $u_{t}=\mu_{t}(x_{0:t})$. To achieve this, we apply an extension of the SLS MPC algorithm in [20], which has been shown to enjoy outstanding tightness among existing robust linear MPC methods.

IV-A Overview of SLS MPC

In SLS MPC, we consider the following uncertain linear time-varying system,

$x_{t+1}=A_{t}x_{t}+B_{t}u_{t}+\xi_{t}$ (13)

where the time-varying matrices $(A_{t},B_{t})$ representing the nominal dynamics are known, and $\xi_{t}$ denotes the lumped uncertainty, which will be used to capture the effects of uncertainty in the dynamics.

Consider the dynamics (13) over a horizon $T$. We define the following variables, which are concatenations of the variables in (13) over the horizon $T$,

$\mathbf{x}:=[x_{0}^{\top}\ \cdots\ x_{T}^{\top}]^{\top},\quad \mathbf{u}:=[u_{0}^{\top}\ \cdots\ u_{T}^{\top}]^{\top},\quad \boldsymbol{\xi}:=[x_{0}^{\top}\ \xi_{0}^{\top}\ \cdots\ \xi_{T-1}^{\top}]^{\top},$ (14)

and the following concatenated system matrices,

$\mathbf{A}:=\mathrm{blkdiag}\left(A_{0},\dots,A_{T}\right),\quad \mathbf{B}:=\mathrm{blkdiag}\left(B_{0},\dots,B_{T}\right).$

We define $Z$ as the block-downshift operator, with identity matrices on the first block sub-diagonal and zeros everywhere else. The dynamics (13) over the horizon $T$ can then be compactly written as

$\mathbf{x}=Z\mathbf{A}\mathbf{x}+Z\mathbf{B}\mathbf{u}+\boldsymbol{\xi}.$ (15)
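A quick numerical check of the compact form (15), with illustrative dimensions and random data: stacking a step-by-step rollout of (13) satisfies $\mathbf{x}=Z\mathbf{A}\mathbf{x}+Z\mathbf{B}\mathbf{u}+\boldsymbol{\xi}$, where $\boldsymbol{\xi}$ stacks $(x_{0},\xi_{0},\dots,\xi_{T-1})$.

```python
import numpy as np

def block_downshift(T, n):
    """Z: identity blocks on the first block sub-diagonal, zeros everywhere else."""
    Z = np.zeros(((T + 1) * n, (T + 1) * n))
    for t in range(T):
        Z[(t + 1) * n:(t + 2) * n, t * n:(t + 1) * n] = np.eye(n)
    return Z

rng = np.random.default_rng(0)
T, nx, nu = 4, 3, 2
As = [rng.standard_normal((nx, nx)) for _ in range(T + 1)]
Bs = [rng.standard_normal((nx, nu)) for _ in range(T + 1)]
x0 = rng.standard_normal(nx)
us = [rng.standard_normal(nu) for _ in range(T + 1)]
xis = [rng.standard_normal(nx) for _ in range(T)]     # xi_0, ..., xi_{T-1}

# step-by-step rollout of (13)
xs = [x0]
for t in range(T):
    xs.append(As[t] @ xs[t] + Bs[t] @ us[t] + xis[t])

# blkdiag(A_0, ..., A_T) and blkdiag(B_0, ..., B_T)
Abig = np.zeros(((T + 1) * nx, (T + 1) * nx))
Bbig = np.zeros(((T + 1) * nx, (T + 1) * nu))
for t in range(T + 1):
    Abig[t * nx:(t + 1) * nx, t * nx:(t + 1) * nx] = As[t]
    Bbig[t * nx:(t + 1) * nx, t * nu:(t + 1) * nu] = Bs[t]

Z = block_downshift(T, nx)
xvec, uvec = np.concatenate(xs), np.concatenate(us)
xivec = np.concatenate([x0] + xis)                    # xi stacks (x_0, xi_0, ..., xi_{T-1})

# compact form (15): x = Z A x + Z B u + xi
assert np.allclose(xvec, Z @ Abig @ xvec + Z @ Bbig @ uvec + xivec)
```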

Next, we consider an LTV state feedback controller $u_{t}=\sum_{i=0}^{t}K^{t,t-i}x_{i}$, compactly given as $\mathbf{u}=\mathbf{K}\mathbf{x}$, $\mathbf{K}\in\mathcal{L}_{TV}^{T,n_{u}\times n_{x}}$. Plugging $\mathbf{u}$ into (15) gives the following system responses $\{\mathbf{\Phi}_{x},\mathbf{\Phi}_{u}\}$, which map $\boldsymbol{\xi}$ to the closed-loop states and inputs $(\mathbf{x},\mathbf{u})$,

$\begin{bmatrix}\mathbf{\Phi}_{x}\\ \mathbf{\Phi}_{u}\end{bmatrix}:=\begin{bmatrix}(I-Z(\mathbf{A}+\mathbf{B}\mathbf{K}))^{-1}\\ \mathbf{K}(I-Z(\mathbf{A}+\mathbf{B}\mathbf{K}))^{-1}\end{bmatrix}.$ (16)

The following theorem establishes the connection between $\{\mathbf{\Phi}_{x},\mathbf{\Phi}_{u}\}$ and state feedback controllers.

Theorem 2

[29, Theorem 2.1] Over the horizon $t\in[T]$, for the dynamics (13) with the LTV state feedback controller $\mathbf{u}=\mathbf{K}\mathbf{x}$, $\mathbf{K}\in\mathcal{L}_{TV}^{T,n_{u}\times n_{x}}$, we have:

  1. The affine subspace defined by

$\begin{bmatrix}I-Z\mathbf{A}&-Z\mathbf{B}\end{bmatrix}\begin{bmatrix}\mathbf{\Phi}_{x}\\ \mathbf{\Phi}_{u}\end{bmatrix}=I,\quad \mathbf{\Phi}_{x},\mathbf{\Phi}_{u}\in\mathcal{L}_{TV}^{T}$ (17)

    parameterizes all possible system responses (16).

  2. For any $\{\mathbf{\Phi}_{x},\mathbf{\Phi}_{u}\}\in\mathcal{L}_{TV}^{T}$ satisfying (17), the controller gain $\mathbf{K}=\mathbf{\Phi}_{u}\mathbf{\Phi}_{x}^{-1}\in\mathcal{L}_{TV}^{T}$ achieves the desired responses (16).

The system responses explicitly characterize the effects of the lumped uncertainty $\boldsymbol{\xi}$ on $(\mathbf{x},\mathbf{u})$. In previous work [20], a system subject to polytopic uncertainties and additive disturbances is considered, i.e., $\xi_{t}:=\Delta_{A}x_{t}+\Delta_{B}u_{t}+w_{t}$, where the parameters $(\Delta_{A},\Delta_{B})$ belong to a polytopic set. Interested readers are referred to [20] for more details on SLS MPC.
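Theorem 2 can be sanity-checked numerically: for a random causal (block-lower-triangular) $\mathbf{K}$, the responses (16) satisfy the affine constraint (17), and $\mathbf{K}$ is recovered as $\mathbf{\Phi}_{u}\mathbf{\Phi}_{x}^{-1}$. The dimensions and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, nx, nu = 3, 2, 1
N = T + 1

# block-diagonal nominal dynamics A, B and the block-downshift operator Z
A = np.zeros((N * nx, N * nx)); B = np.zeros((N * nx, N * nu)); Z = np.zeros((N * nx, N * nx))
for t in range(N):
    A[t*nx:(t+1)*nx, t*nx:(t+1)*nx] = rng.standard_normal((nx, nx))
    B[t*nx:(t+1)*nx, t*nu:(t+1)*nu] = rng.standard_normal((nx, nu))
    if t < T:
        Z[(t+1)*nx:(t+2)*nx, t*nx:(t+1)*nx] = np.eye(nx)

# random causal (block-lower-triangular) feedback gain K
K = np.zeros((N * nu, N * nx))
for t in range(N):
    for i in range(t + 1):
        K[t*nu:(t+1)*nu, i*nx:(i+1)*nx] = rng.standard_normal((nu, nx))

# system responses (16); I - Z(A + BK) is unit block-lower-triangular, hence invertible
Phi_x = np.linalg.inv(np.eye(N * nx) - Z @ (A + B @ K))
Phi_u = K @ Phi_x

# affine subspace constraint (17)
lhs = (np.eye(N * nx) - Z @ A) @ Phi_x - Z @ B @ Phi_u
assert np.allclose(lhs, np.eye(N * nx))
# the gain is recovered from the responses
assert np.allclose(Phi_u @ np.linalg.inv(Phi_x), K)
```

The check mirrors the proof idea: substituting $\mathbf{\Phi}_{u}=\mathbf{K}\mathbf{\Phi}_{x}$ into (17) collapses it to $(I-Z(\mathbf{A}+\mathbf{B}\mathbf{K}))\mathbf{\Phi}_{x}=I$, which is (16).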

Despite its outstanding performance in reducing conservatism compared to existing robust MPC methods [20], applying SLS MPC directly to solve the linear PSF comes with two challenges: (i) the uncertainty set $\mathcal{P}_{t}$ is defined through affine inequalities, and converting $\mathcal{P}_{t}$ into a vertex representation, which is amenable to existing robust MPC methods [30, 31, 32, 33], requires $2^{n_{x}}$ vertices; (ii) applying SLS MPC requires merging the constant $c_{t}$ into the lumped uncertainty $\xi_{t}$, which causes the bounds on $\boldsymbol{\xi}$ to be overly conservative. In the following subsections, we describe an extension of SLS MPC that addresses these two challenges.

IV-B Controller parameterization

For the uncertain linear dynamics

$x_{t+1}=A_{t}x_{t}+B_{t}u_{t}+c_{t}+\Delta_{t}(x_{t},u_{t})+w_{t}$ (18)

stated in (11), we define $\eta_{t}:=\Delta_{t}(x_{t},u_{t})+w_{t}$ as the lumped uncertainty. Instead of setting $\xi_{t}=c_{t}+\eta_{t}$ in (13), we decompose (18) into a set of nominal and error dynamics.

First, in addition to (14), we concatenate these variables over the horizon $T$,

$\boldsymbol{\eta}:=[\mathbf{0}^{\top}\ \eta_{0}^{\top}\ \cdots\ \eta_{T-1}^{\top}]^{\top},\quad \mathbf{w}:=[\mathbf{0}^{\top}\ w_{0}^{\top}\ \cdots\ w_{T-1}^{\top}]^{\top},\quad \mathbf{c}:=[\mathbf{0}^{\top}\ c_{0}^{\top}\ \cdots\ c_{T-1}^{\top}]^{\top},\quad \boldsymbol{\delta}_{x_{0}}:=[x_{0}^{\top}\ \mathbf{0}^{\top}\ \cdots\ \mathbf{0}^{\top}]^{\top}.$ (19)

We define the nominal and error states $\{\mathbf{h},\mathbf{x}_{e}\}$ and control inputs $\{\mathbf{v},\mathbf{u}_{e}\}$ as

$\mathbf{h}:=[h_{0}^{\top}\ \cdots\ h_{T}^{\top}]^{\top},\quad \mathbf{v}:=[v_{0}^{\top}\ \cdots\ v_{T}^{\top}]^{\top},\quad \mathbf{x}_{e}:=[x_{e,0}^{\top}\ \cdots\ x_{e,T}^{\top}]^{\top}=\mathbf{x}-\mathbf{h},\quad \mathbf{u}_{e}:=[u_{e,0}^{\top}\ \cdots\ u_{e,T}^{\top}]^{\top}=\mathbf{u}-\mathbf{v},$

with the nominal and error dynamics as

$\mathbf{h}=Z(\mathbf{A}\mathbf{h}+\mathbf{B}\mathbf{v})+\mathbf{c}+\boldsymbol{\delta}_{x_{0}},$ (20a)

$\mathbf{x}_{e}=Z(\mathbf{A}\mathbf{x}_{e}+\mathbf{B}\mathbf{u}_{e})+\boldsymbol{\eta}.$ (20b)

It is important to note that (20b) conforms with (13). An LTV state feedback controller $\mathbf{K}\in\mathcal{L}_{TV}^{T,n_{u}\times n_{x}}$ is then applied to control the error states. The overall controller for (18) is given by $\mathbf{u}=\mathbf{K}\mathbf{x}_{e}+\mathbf{v}=\mathbf{K}(\mathbf{x}-\mathbf{h})+\mathbf{v}$.

IV-C Lumped uncertainty over-approximation

For the lumped uncertainty $\boldsymbol{\eta}$, its dependence on $\mathbf{x}$, $\mathbf{u}$, and $\mathbf{w}$ complicates the design of the robust controller. As in SLS MPC, the approach is to over-approximate $\boldsymbol{\eta}$ by an independent, filtered virtual disturbance signal $\mathbf{\Psi}\widetilde{\mathbf{w}}$, where $\widetilde{\mathbf{w}}=[\mathbf{0}^{\top}\ \tilde{w}_{0}^{\top}\ \cdots\ \tilde{w}_{T-1}^{\top}]^{\top}$, $\lVert\tilde{w}_{t}\rVert_{\infty}\leq 1$. The matrix $\mathbf{\Psi}\in\mathcal{L}_{TV}^{T,n_{x}\times n_{x}}$ is a filter acting on the finite-horizon virtual disturbance signal $\widetilde{\mathbf{w}}$, with the diagonal blocks of $\mathbf{\Psi}$ structured as $\Psi^{0,0}=I$, $\Psi^{t,0}=\mathrm{diag}(\psi_{t-1})$ with $\psi_{t-1}\in\mathbb{R}^{n_{x}}$, $\psi_{t-1}>0$, $t=1,\dots,T$.

We define $\widetilde{\mathcal{W}}=\left\{\widetilde{\mathbf{w}}\mid\lVert\tilde{w}_{t}\rVert_{\infty}\leq 1,\ t\in[T-1]\right\}$ as the set of admissible virtual disturbances. Since $\tilde{w}_{t}$ are unit norm-bounded, we tune the filter $\mathbf{\Psi}$ to shape the reachable set of $\mathbf{\Psi}\widetilde{\mathbf{w}}$, defined as $\mathcal{R}(\mathbf{\Psi}\widetilde{\mathbf{w}}):=\{\boldsymbol{\zeta}\mid\boldsymbol{\zeta}=\mathbf{\Psi}\widetilde{\mathbf{w}},\ \widetilde{\mathbf{w}}\in\widetilde{\mathcal{W}}\}$, $\boldsymbol{\zeta}:=[\mathbf{0}^{\top}\ \zeta_{0}^{\top}\ \cdots\ \zeta_{T-1}^{\top}]^{\top}$.
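Since $\tilde{w}_{t}$ are unit $\ell_{\infty}$-bounded, the per-coordinate reach of the filtered signal is the row-wise $1$-norm of the filter matrix; this is the quantity being shaped when $\mathbf{\Psi}$ is tuned. A small numerical illustration with a generic lower-triangular filter (dimensions and entries made up, not the structured $\mathbf{\Psi}$ of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
Psi = np.tril(rng.standard_normal((n, n)))   # illustrative causal (lower-triangular) filter

# per-coordinate reach of {Psi w : ||w||_inf <= 1} is the row-wise 1-norm of Psi
reach = np.abs(Psi).sum(axis=1)

# every admissible w stays inside the box [-reach, reach] ...
for _ in range(200):
    w = rng.uniform(-1.0, 1.0, n)
    assert np.all(np.abs(Psi @ w) <= reach + 1e-12)

# ... and the bound is attained coordinate-wise at w = sign(Psi[i])
i = 3
assert np.isclose(Psi[i] @ np.sign(Psi[i]), reach[i])
```

Enlarging the rows of the filter enlarges this box, which is what allows the reachable set of the lumped uncertainty to be covered in Proposition 1.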

Our goal is to find sufficient conditions on the control parameters $\{\mathbf{K},\mathbf{h},\mathbf{v}\}$ and $\mathbf{\Psi}$ such that the reachable set of $\boldsymbol{\eta}$, denoted by $\mathcal{R}\left(\boldsymbol{\eta};\,\mathbf{K},\mathbf{h},\mathbf{v}\right):=\left\{\boldsymbol{\eta}\mid\eta_{t}=\Delta_{t}(x_{t},u_{t})+w_{t},\ \Delta_{t}\in\mathcal{P}_{t},\ t\in[T-1]\right\}$, is a subset of the reachable set of $\mathbf{\Psi}\widetilde{\mathbf{w}}$. The following proposition provides these sufficient conditions; the proof is given in Appendix -A.

Proposition 1

Let $e_{i}\in\mathbb{R}^{n_{x}}$ denote the $i$-th standard basis vector for $i=1,\dots,n_{x}$, and let $y_{t}=[h_{t}^{\top}\ v_{t}^{\top}]^{\top}$, $\Phi^{t,t-i}:=\begin{bmatrix}\Phi_{x}^{t,t-i\top}&\Phi_{u}^{t,t-i\top}\end{bmatrix}^{\top}$ for $t\in[T-1]$. The following constraints:

$\begin{bmatrix}I-Z\mathbf{A}&-Z\mathbf{B}\end{bmatrix}\begin{bmatrix}\mathbf{\Phi}_{x}\\ \mathbf{\Phi}_{u}\end{bmatrix}=\mathbf{\Psi},\quad \mathbf{\Phi}_{x},\mathbf{\Phi}_{u}\in\mathcal{L}_{TV}^{T},$ (21)

and

\sigma_{w}+e_{i}^{\top}\big([\overline{D}_{0}^{x}\;\overline{D}_{0}^{u}]y_{0}+\overline{d}_{0}\big)\leq\psi_{0,i}, (22a)
\sigma_{w}-e_{i}^{\top}\big([\underline{D}_{0}^{x}\;\underline{D}_{0}^{u}]y_{0}+\underline{d}_{0}\big)\leq\psi_{0,i}, (22b)
\sigma_{w}+e_{i}^{\top}\big([\overline{D}_{t}^{x}\;\overline{D}_{t}^{u}]y_{t}+\overline{d}_{t}\big)+\sum_{j=1}^{t}\big\lVert e_{i}^{\top}\big([\overline{D}_{t}^{x}\;\overline{D}_{t}^{u}]\Phi^{t,t-j}-\Psi^{t+1,t+1-j}\big)\big\rVert_{1}\leq\psi_{t,i}, (22c)
\sigma_{w}-e_{i}^{\top}\big([\underline{D}_{t}^{x}\;\underline{D}_{t}^{u}]y_{t}+\underline{d}_{t}\big)+\sum_{j=1}^{t}\big\lVert e_{i}^{\top}\big([\underline{D}_{t}^{x}\;\underline{D}_{t}^{u}]\Phi^{t,t-j}-\Psi^{t+1,t+1-j}\big)\big\rVert_{1}\leq\psi_{t,i}, (22d)
i\in[n_{x}],\quad t=1,\dots,T-1,

guarantee that (𝛈;𝐊,𝐡,𝐯)(𝚿𝐰~)\mathcal{R}(\boldsymbol{\eta};\mathbf{K},\mathbf{h},\mathbf{v})\subseteq\mathcal{R}(\mathbf{\Psi}\widetilde{\mathbf{w}}) holds.
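For the base case t = 0, conditions (22a)-(22b) simply require each ψ_{0,i} to dominate the worst-case magnitude of η_0 = Δ_0(x_0, u_0) + w_0. A small numerical sketch, where the elementwise bounds on Δ_0 are hypothetical numbers standing in for [D̄_0^x D̄_0^u]y_0 + d̄_0 and its lower counterpart (in the paper these come from the NN verifier):

```python
import numpy as np

rng = np.random.default_rng(1)
nx = 2
sigma_w = 0.05

# Hypothetical elementwise bounds on Delta_0(x_0, u_0) at the fixed
# nominal point (h_0, v_0) -- illustrative values only.
delta_lo = np.array([-0.3, -0.1])
delta_hi = np.array([0.2, 0.4])

# Smallest psi_0 satisfying (22a)-(22b): componentwise,
# psi_{0,i} >= sigma_w + max(upper_i, -lower_i).
psi0 = sigma_w + np.maximum(delta_hi, -delta_lo)

# Any realization eta_0 = Delta_0 + w_0 then admits an admissible
# virtual disturbance w~_0 = diag(psi_0)^{-1} eta_0, ||w~_0||_inf <= 1.
for _ in range(1000):
    delta = rng.uniform(delta_lo, delta_hi)
    w0 = rng.uniform(-sigma_w, sigma_w, nx)
    w_tilde0 = (delta + w0) / psi0
    assert np.linalg.norm(w_tilde0, np.inf) <= 1.0
```

The conditions for t ≥ 1 in (22c)-(22d) follow the same pattern, with additional ℓ1 terms accounting for the worst case over past virtual disturbances.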

IV-D Convex formulation of the linear PSF

With the constraints (21) and (22), for any realization of 𝜼\boldsymbol{\eta} there exists 𝐰~𝒲~\widetilde{\mathbf{w}}\in\widetilde{\mathcal{W}} such that 𝜼=𝚿𝐰~\boldsymbol{\eta}=\mathbf{\Psi}\widetilde{\mathbf{w}}. Therefore, we can represent 𝜼\boldsymbol{\eta} as 𝚿𝐰~\mathbf{\Psi}\widetilde{\mathbf{w}} and write the error dynamics (20b) as

𝐱e=Z(𝐀𝐱e+𝐁𝐮e)+𝚿𝐰~.\mathbf{x}_{e}=Z(\mathbf{A}\mathbf{x}_{e}+\mathbf{B}\mathbf{u}_{e})+\mathbf{\Psi}\widetilde{\mathbf{w}}. (23)

By [20, Corollary 1], constraint (21) parameterizes all system responses 𝐱e=𝚽x𝐰~,𝐮e=𝚽u𝐰~\mathbf{x}_{e}=\mathbf{\Phi}_{x}\widetilde{\mathbf{w}},\mathbf{u}_{e}=\mathbf{\Phi}_{u}\widetilde{\mathbf{w}} of system (23) under the controller 𝐮e=𝐊𝐱e\mathbf{u}_{e}=\mathbf{K}\mathbf{x}_{e}. The closed-loop states and control inputs of system (18) are then given by

𝐱=𝐡+𝚽x𝐰~,𝐮=𝐯+𝚽u𝐰~.\vspace{-0.1cm}\mathbf{x}=\mathbf{h}+\mathbf{\Phi}_{x}\widetilde{\mathbf{w}},\quad\mathbf{u}=\mathbf{v}+\mathbf{\Phi}_{u}\widetilde{\mathbf{w}}. (24)
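The parameterization (24) and constraint (21) can be checked numerically on a small instance. The sketch below uses an LTI model with a static feedback gain (an assumption for illustration; the synthesized gain is block lower triangular in general) and an identity filter, forming Φ_x = (I - Z(A + BK))^{-1}Ψ and Φ_u = KΦ_x:

```python
import numpy as np

rng = np.random.default_rng(2)
T, nx, nu = 3, 2, 1
n = (T + 1) * nx

# Illustrative LTI data and a stabilizing static gain (assumptions,
# not the paper's synthesized controller).
A = np.array([[1.0, 0.05], [0.1, 1.0]])
B = np.array([[0.0], [0.05]])
K0 = np.array([[-2.0, -1.0]])

def blkdiag(M, reps):
    return np.kron(np.eye(reps), M)

bA, bB, bK = blkdiag(A, T + 1), blkdiag(B, T + 1), blkdiag(K0, T + 1)

# Block-downshift operator Z: identity blocks on the first subdiagonal.
Z = np.zeros((n, n))
for t in range(T):
    Z[(t + 1) * nx:(t + 2) * nx, t * nx:(t + 1) * nx] = np.eye(nx)

Psi = np.eye(n)  # trivial filter for this sketch

# Closed-loop responses of x_e = Z(A x_e + B u_e) + Psi w~, u_e = K x_e.
Phi_x = np.linalg.solve(np.eye(n) - Z @ (bA + bB @ bK), Psi)
Phi_u = bK @ Phi_x

# They satisfy the affine constraint (21).
assert np.allclose((np.eye(n) - Z @ bA) @ Phi_x - Z @ bB @ Phi_u, Psi)

# And Phi_x w~ reproduces the simulated closed-loop error trajectory.
w = np.concatenate([np.zeros(nx), rng.uniform(-1.0, 1.0, n - nx)])
x_sim = np.zeros(n)
for t in range(T):
    xt = x_sim[t * nx:(t + 1) * nx]
    x_sim[(t + 1) * nx:(t + 2) * nx] = (A + B @ K0) @ xt + w[(t + 1) * nx:(t + 2) * nx]
assert np.allclose(Phi_x @ w, x_sim)
```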

To guarantee robust constraint satisfaction of the controller 𝐮=𝐊(𝐱𝐡)+𝐯\mathbf{u}=\mathbf{K}(\mathbf{x}-\mathbf{h})+\mathbf{v}, we tighten the constraints in the linear PSF (11). Consider the state constraint xt𝒳x_{t}\in\mathcal{X} as an example. The constraints are represented by a polyhedral set, 𝒳={xnxFxxbx}\mathcal{X}=\{x\in\mathbb{R}^{n_{x}}\mid F_{x}x\leq b_{x}\}, where Fxn𝒳×nxF_{x}\in\mathbb{R}^{n_{\mathcal{X}}\times n_{x}}, bx:=[b1x,,bn𝒳x]n𝒳b_{x}:=[b_{1}^{x},\dotsc,b_{n_{\mathcal{X}}}^{x}]^{\top}\in\mathbb{R}^{n_{\mathcal{X}}}, and {ajx}j=1n𝒳\{a_{j}^{x\top}\}_{j=1}^{n_{\mathcal{X}}} denote the rows of FxF_{x}. This implies that (ajx,bjx)(a_{j}^{x},b_{j}^{x}) denotes the jj-th set of linear constraint parameters in 𝒳\mathcal{X}. From (24), we have xt=ht+i=1tΦxt,tiw~i1x_{t}=h_{t}+\sum_{i=1}^{t}\Phi_{x}^{t,t-i}\tilde{w}_{i-1}. Then, the following constraints

ajxht+i=1tajxΦxt,ti1bjx,j=1,,n𝒳\vspace{-0.2cm}\small a_{j}^{x\top}h_{t}+\sum_{i=1}^{t}\left\lVert a_{j}^{x\top}\Phi_{x}^{t,t-i}\right\rVert_{1}\leq b_{j}^{x},\quad j=1,\dotsc,n_{\mathcal{X}} (25)

guarantee that all constraints in 𝒳\mathcal{X} are satisfied robustly. As discussed in Section II, the recursive feasibility of the linear PSF cannot be guaranteed without a robust forward invariant set. Therefore, in this work, we introduce soft constraints into (25),

ajxht+i=1tajxΦxt,ti1bjx+ϵx,jt,\displaystyle a_{j}^{x\top}h_{t}+\sum_{i=1}^{t}\left\lVert a_{j}^{x\top}\Phi_{x}^{t,t-i}\right\rVert_{1}\leq b_{j}^{x}+\epsilon_{x,j}^{t}, ϵx,jt0,t[T],\displaystyle\epsilon_{x,j}^{t}\geq 0,\,t\in[T], (26)
j=1,,n𝒳.\displaystyle j=1,\dotsc,n_{\mathcal{X}}.

In the cost function, a large penalty on ϵxt1\lVert\epsilon_{x}^{t}\rVert_{1} is applied. In the case ϵx,jt=0\epsilon_{x,j}^{t}=0, we obtain a robust constraint satisfaction guarantee for the constraint with parameters (ajx,bjx)(a_{j}^{x},b_{j}^{x}). Similarly, the input constraints can be tightened as

akuvt+i=1takuΦut,ti1bku+ϵu,kt,\displaystyle a_{k}^{u\top}v_{t}+\sum_{i=1}^{t}\left\lVert a_{k}^{u\top}\Phi_{u}^{t,t-i}\right\rVert_{1}\leq b_{k}^{u}+\epsilon_{u,k}^{t}, ϵu,kt0,t[T],\displaystyle\epsilon_{u,k}^{t}\geq 0,\,t\in[T], (27)
k=1,,n𝒰,\displaystyle k=1,\dotsc,n_{\mathcal{U}},

where n𝒰n_{\mathcal{U}} denotes the number of linear inequalities defining 𝒰\mathcal{U} and (aku,bku)(a_{k}^{u},b_{k}^{u}) denotes the kk-th set of linear constraint parameters in 𝒰\mathcal{U}. The trust region constraints (xt,ut)(z^t,rt)(x_{t},u_{t})\in\mathcal{B}_{\infty}\left(\hat{z}_{t},r_{t}\right) can analogously be tightened as

ajht+i=1tajΦxt,ti1bj+σx,jt,σx,jt0,\displaystyle a_{j}^{\top}h_{t}+\sum_{i=1}^{t}\left\lVert a_{j}^{\top}\Phi_{x}^{t,t-i}\right\rVert_{1}\leq b_{j}+\sigma_{x,j}^{t},\,\sigma_{x,j}^{t}\geq 0, (28)
akvt+i=1takΦut,ti1bk+σu,kt,σu,kt0,\displaystyle a_{k}^{\top}v_{t}+\sum_{i=1}^{t}\left\lVert a_{k}^{\top}\Phi_{u}^{t,t-i}\right\rVert_{1}\leq b_{k}+\sigma_{u,k}^{t},\,\sigma_{u,k}^{t}\geq 0,
j=1,,2nx,k=1,,2nu,t[T1],\displaystyle j=1,\dotsc,2n_{x},\,k=1,\dotsc,2n_{u},\,t\in[T-1],

where (aj,bj)(a_{j},b_{j}) and (ak,bk)(a_{k},b_{k}) are defined similarly for the state and input-related constraint parameters in (z^t,rt)\mathcal{B}_{\infty}\left(\hat{z}_{t},r_{t}\right).
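The tightening terms in (25)-(28) all follow the same pattern: the worst case of a half-space evaluation over unit-norm-bounded virtual disturbances equals its nominal value plus a sum of ℓ1 norms. A numpy sketch with a hypothetical nominal trajectory and system response verifying both the bound and its tightness:

```python
import numpy as np

rng = np.random.default_rng(3)
T, nx = 4, 2
n = (T + 1) * nx

# Hypothetical nominal trajectory h and block-lower-triangular system
# response Phi_x (illustrative random values, not solver output).
h = rng.standard_normal(n)
Phi_x = np.tril(rng.standard_normal((n, n)))

a = np.array([1.0, 0.0])   # half-space a^T x_t <= b at time t
t = T
rows = slice(t * nx, (t + 1) * nx)

# Tightened left-hand side of (25): the nominal term plus the sum of
# l1 norms is the supremum of a^T x_t over ||w~_i||_inf <= 1 (the
# leading block of w~ is zero, hence columns from nx onward).
tight = a @ h[rows] + np.abs(a @ Phi_x[rows, nx:]).sum()

worst = -np.inf
for _ in range(2000):
    w = np.concatenate([np.zeros(nx), rng.uniform(-1.0, 1.0, n - nx)])
    val = a @ (h[rows] + Phi_x[rows] @ w)
    worst = max(worst, val)
    assert val <= tight + 1e-12   # never exceeded

# The bound is attained by the sign pattern of a^T Phi_x.
w_star = np.concatenate([np.zeros(nx), np.sign(a @ Phi_x[rows, nx:])])
assert np.isclose(a @ (h[rows] + Phi_x[rows] @ w_star), tight)
```

So requiring the tightened value to stay below b_j^x (plus a slack) is exactly robust satisfaction of the original half-space constraint.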

Summarizing the results above, we propose a convex tightening of the linear PSF, which can be written as

minimize𝚽x,𝚽u,𝚿,𝐡,𝐯,{ϵxt,ϵut,σxt,σut}\displaystyle\underset{\begin{subarray}{c}\mathbf{\Phi}_{x},\mathbf{\Phi}_{u},\mathbf{\Psi},\mathbf{h},\mathbf{v},\\ \{\epsilon_{x}^{t},\epsilon_{u}^{t},\sigma_{x}^{t},\sigma_{u}^{t}\}\end{subarray}}{\textrm{minimize}} u0π(x(k))22+Mϵt=0T(ϵxt1+ϵut1)\displaystyle\quad\left\lVert u_{0}-\pi(x(k))\right\rVert_{2}^{2}+M_{\epsilon}\sum_{t=0}^{T}\left(\left\lVert\epsilon_{x}^{t}\right\rVert_{1}+\left\lVert\epsilon_{u}^{t}\right\rVert_{1}\right) (29)
+Mσt=0T1(σxt1+σut1)\displaystyle\quad+M_{\sigma}\sum_{t=0}^{T-1}\left(\left\lVert\sigma_{x}^{t}\right\rVert_{1}+\left\lVert\sigma_{u}^{t}\right\rVert_{1}\right)
subject to affine constraint (21),\displaystyle\quad\text{affine constraint~{}\eqref{eq:affine_scaled}},
over-approximation constraints (22),\displaystyle\quad\text{over-approximation constraints~{}\eqref{eq:over_approx_constr}},
soft state and input constraints (26) and (27),\displaystyle\quad\text{soft state and input constraints~{}\eqref{eq:soft_state} and~{}\eqref{eq:soft_input}},
soft trust region constraints (28),\displaystyle\quad\text{soft trust region constraints~{}\eqref{eq:soft_trust_region}},
x0=x(k),\displaystyle\quad x_{0}=x(k),

where Mϵ,Mσ>0M_{\epsilon},M_{\sigma}>0 are chosen to be large. When a polyhedral robust forward invariant set is used as the terminal constraint, it can be tightened similarly to (26). All the constraints in Problem (29) are linear, making (29) a quadratic program. With the use of soft constraints, (29) is always feasible. If the slack variables {ϵxt,ϵut,σxt,σut}\{\epsilon_{x}^{t},\epsilon_{u}^{t},\sigma_{x}^{t},\sigma_{u}^{t}\} in the solution are all zero, we obtain a certificate that the system is safe for the next TT steps under the state-feedback controller 𝐮=𝐊(𝐱𝐡)+𝐯\mathbf{u}=\mathbf{K}(\mathbf{x}-\mathbf{h})+\mathbf{v} with 𝐊=𝚽u𝚽x1\mathbf{K}=\mathbf{\Phi}_{u}\mathbf{\Phi}_{x}^{-1}.

IV-E Trust region update

Following the discussion on the trust regions in Section III, we describe a method to update the trust regions online. Starting with the reference trajectory given by the primary policy π(x)\pi(x), we propose to iteratively increase rtr_{t} and update the reference trajectory by applying the policy 𝐮=𝐊(𝐱𝐡)+𝐯\mathbf{u}=\mathbf{K}(\mathbf{x}-\mathbf{h})+\mathbf{v}, synthesized in Section IV. We then pick the reference trajectory that gives the smallest slack variables and apply the corresponding control inputs to the system. Our framework is summarized in Algorithm 1.

Algorithm 1 Robust Linear MPC-based PSF

Input: Current state x(k)x(k), horizon TT, number of iterations NN, initial reference trajectory z^0:T\hat{z}_{0:T}, reference control input π(x(k))\pi(x(k)), initial trust region radius r0>0r_{0}>0
Output: Filtered control input u0u_{0}^{*}, safety certificate safe_cert

1:At each time step kk,
2:Initialize: rtr0,t[T1],{𝐊,𝐡,𝐯}{0}r_{t}\leftarrow r_{0},\;t\in[T-1],\{\mathbf{K}^{*},\mathbf{h}^{*},\mathbf{v}^{*}\}\leftarrow\{\textbf{0}\}, ϵ\epsilon^{*}\leftarrow\infty.
3:for =1,,N\ell=1,\dots,N do
4:     Construct (z^t,rt)\mathcal{B}_{\infty}(\hat{z}_{t},r_{t}) and 𝒫t\mathcal{P}_{t} with (7) and (10)
5:     Solve (29) with x(k)x(k) to get {𝐊,𝐡,𝐯}\{\mathbf{K},\mathbf{h},\mathbf{v}\}
6:     Extract ϵmax:=\epsilon_{max}:= max. value of slack variables
7:     if ϵmax=0\epsilon_{max}=0 then
8:         Set {𝐊,𝐡,𝐯}{𝐊,𝐡,𝐯}\{\mathbf{K}^{*},\mathbf{h}^{*},\mathbf{v}^{*}\}\leftarrow\{\mathbf{K},\mathbf{h},\mathbf{v}\}
9:         Set safe_cert \leftarrow True
10:     else
11:         Update z^0:T\hat{z}_{0:T} with 𝐮=𝐊(𝐱𝐡)+𝐯\mathbf{u}=\mathbf{K}(\mathbf{x}-\mathbf{h})+\mathbf{v}, as described in (6)
12:         Update rtβrtr_{t}\leftarrow\beta r_{t} for t[T1]t\in[T-1] with β>1\beta>1.
13:         if ϵmaxϵ\epsilon_{max}\leq\epsilon^{*} then
14:              Set ϵϵmax\epsilon^{*}\leftarrow\epsilon_{max}
15:              Set {𝐊,𝐡,𝐯}{𝐊,𝐡,𝐯}\{\mathbf{K}^{*},\mathbf{h}^{*},\mathbf{v}^{*}\}\leftarrow\{\mathbf{K},\mathbf{h},\mathbf{v}\}
16:              Set safe_cert \leftarrow False
17: Set u0v0u_{0}^{*}\leftarrow v_{0}, extracted from {𝐊,𝐡,𝐯}\{\mathbf{K}^{*},\mathbf{h}^{*},\mathbf{v}^{*}\}
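The control flow of Algorithm 1 can be sketched in Python. Here solve_psf is a stub standing in for the QP (29), whose slack value is assumed to shrink as the trust region grows purely to exercise the loop, and the reference-trajectory update is omitted:

```python
def solve_psf(x, radii):
    """Stub for the robust QP (29): returns a placeholder controller
    and a maximum slack value.  The slack model below is an assumption
    made only to illustrate the loop, not the real solver's behavior."""
    ctrl = {"K": None, "h": None, "v": [0.0]}
    eps_max = max(0.0, 1.0 - radii[0])
    return ctrl, eps_max

def safety_filter(x, N=5, r0=0.4, beta=1.5):
    """Iteratively enlarge the trust region, keep the controller with
    the smallest slack, and certify safety once the slack reaches zero."""
    radii = [r0]
    ctrl_best, eps_best, safe_cert = None, float("inf"), False
    for _ in range(N):
        ctrl, eps_max = solve_psf(x, radii)
        if eps_max == 0.0:
            return ctrl, True              # zero slack: certified safe
        if eps_max <= eps_best:            # keep the smallest-slack controller
            ctrl_best, eps_best = ctrl, eps_max
        radii = [beta * r for r in radii]  # enlarge the trust region
    return ctrl_best, safe_cert

ctrl, safe_cert = safety_filter(x=[0.0, 0.0])
```

With the stub's slack model, the loop certifies safety once the trust region radius exceeds one; in the actual filter, the filtered input u_0* is then extracted from the stored controller.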

V Numerical Example

To verify the efficacy of the proposed solution, we test it on a NN proxy of the nonlinear pendulum system; our code is publicly available at https://github.com/ShaoruChen/NN-System-PSF. The pendulum dynamics are given by [34],

θ¨=3gsin(θ)2l+3τml2,\ddot{\theta}=\frac{3g\sin(\theta)}{2l}+\frac{3\tau}{ml^{2}}\,, (30)

where θ\theta is the angle between the pendulum and the vertical, mm and ll are the mass and length of the pendulum, gg is the gravitational acceleration, and τ\tau is the external torque acting on the pendulum. The state and control input are defined as x:=[θθ˙]2x:=[\theta\;\dot{\theta}]^{\top}\in\mathbb{R}^{2} and u:=τu:=\tau\in\mathbb{R}. To obtain the linear dynamics in (2), we linearize (30) about the origin and discretize with a sampling time of 0.05 s, which yields

xt+1=[1.00920.050150.3691.0092]xt+[0.001250.05015]ut.x_{t+1}=\begin{bmatrix}1.0092&0.05015\\ 0.369&1.0092\end{bmatrix}x_{t}+\begin{bmatrix}0.00125\\ 0.05015\end{bmatrix}u_{t}. (31)

Next, we train a NN f(x,u)f(x,u) to approximate the residual dynamics that are not captured by the linear dynamics in (31). We first collect data by simulating the nonlinear dynamics in (30) for a duration of 15 s. The NN is then trained through a backpropagation procedure [3], using the mean squared error between the predicted and true states as the loss function. The NN consists of 3 hidden layers with 64 neurons in each layer and uses the rectified linear unit (ReLU) as the activation function. Additive noise with maximum magnitude σw{0.05,0.1}\sigma_{w}\in\{0.05,0.1\} is injected into the states of the system. For the primary control policy, we consider an iterative linear quadratic regulator (iLQR) scheme [35] with the box-constrained heuristic [36]. This is implemented with the mpc.pytorch library [37]. The box-constrained heuristic allows the system to adhere to the control constraints, but does not account for state constraints.
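The residual-learning setup can be illustrated with a short numpy sketch: the training target for f(x, u) is the gap between one step of the nonlinear dynamics (30) and the linear model. The constants and the Euler discretization below are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

# Illustrative pendulum constants (assumptions for this sketch).
g, l, m, dt = 9.81, 1.0, 1.0, 0.05

def f_nonlinear(x, u):
    """One Euler step of the pendulum dynamics (30)."""
    theta, theta_dot = x
    theta_ddot = 3 * g * np.sin(theta) / (2 * l) + 3 * u / (m * l**2)
    return np.array([theta + dt * theta_dot, theta_dot + dt * theta_ddot])

# Euler discretization of the linearization about the origin
# (sin(theta) ~ theta), playing the role of (A, B) in the linear model.
A = np.array([[1.0, dt], [3 * g * dt / (2 * l), 1.0]])
B = np.array([0.0, 3 * dt / (m * l**2)])

def residual(x, u):
    """Training target for the NN f(x, u): the part of the dynamics
    that the linear model misses."""
    return f_nonlinear(x, u) - (A @ x + B * u)

# The residual vanishes at the origin and grows with |theta|.
r_small = np.linalg.norm(residual(np.array([0.01, 0.0]), 0.0))
r_large = np.linalg.norm(residual(np.array([1.0, 0.0]), 0.0))
assert r_small < r_large
```

In the paper this residual is fit by the 3-layer ReLU network via standard gradient-based training; the sketch only shows how the regression targets are formed.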

Four test cases are considered in our experiments. In each case, the system is required to track a pair of reference angles (θr,1,θr,2)\left(\theta_{r,1},\,\theta_{r,2}\right) sequentially, starting from an initial condition x0x_{0}, over a duration of 2 s. The initial conditions and reference angles are given in Table I.

TABLE I: Initial conditions and angles for the test scenarios.
Test Case x0x_{0} [deg; deg/s] θr,1\theta_{r,1} [deg] θr,2\theta_{r,2} [deg]
1 [57.3; -120.3] 120 -50
2 [-85.9; -85.9] -150 40
3 [-85.9; -114.6] -100 -180
4 [85.9; 57.3] 100 180

We simulate each of these test cases under four control schemes: (i) a nominal iLQR framework, (ii) a soft-constrained iLQR framework (SC-iLQR), (iii) safe-filtered iLQR, where we apply the proposed safety filter to the nominal iLQR scheme, and (iv) safe-filtered SC-iLQR, where the safety filter is applied to SC-iLQR. For SC-iLQR, soft state constraints are incorporated into the cost function of the forward pass of the iLQR algorithm. Specifically, the function ϕ(x)=max{0,x}\phi(x)=\text{max}\{0,x\} is applied to each of the constraints, which increases the cost proportionally whenever the constraints are violated. In safe-filtered iLQR and safe-filtered SC-iLQR, the initial reference trajectories z^0:T\hat{z}_{0:T} in Algorithm 1 are initialized by iLQR and SC-iLQR, respectively.

The state trajectories under these control schemes are plotted in Fig. 2 and the percentages of constraint violations are tabulated in Table II. These percentages are computed as the ratio between the number of points at which the states violate the constraints and the total number of points in the state trajectory. Since the nominal iLQR method does not account for state constraints, it results in the largest percentage of constraint violations. While soft-constrained iLQR reduces the level of constraint violation, there remain a number of instances where the constraints are violated, as depicted in Fig. 2. On the other hand, as shown in Table II, with the safety filter applied to iLQR and SC-iLQR, no constraint violations are observed in any of the test cases. To illustrate the safety certificate obtained, we plot the slack variables that characterize the trust region, state and input constraints, together with the safety certificate, for the third test case under a maximum noise level σw=0.05\sigma_{w}=0.05 in Fig. 3. The combination of Table II and Fig. 3 indicates that, under the formulation in (29), the PSF may not return a numerical safety certificate even when the state trajectories are safe. Nevertheless, the proposed PSF effectively encourages the system to behave safely by minimizing the conservative upper bounds on the constraint violation.

The statistics of the computational times of the control scheme, in this case the iLQR scheme, and the safety filter are depicted in Fig. 4. While introducing soft constraints increases the run times of the iLQR scheme, it significantly reduces the run times of the safety filter, as observed in the right panel of Fig. 4. With the soft constraints added to the iLQR scheme, the state trajectories stay close to satisfying the constraints even before the safety filter is activated. This allows the filter to find a solution that satisfies the constraints in fewer iterations, which reduces the computational time.

Refer to caption
Figure 2: Top panel: Plots of the state trajectories under the 4 test cases. The initial and final states are marked with green and magenta circles. The state constraints are plotted with black dashed lines. Bottom panel: Zoomed-in plots of the state trajectories, around the boundaries where constraint violation potentially occurs.
Refer to caption
Figure 3: Top two panels: Time histories of the slack variables for the trust region, state and input constraints. Bottom panel: Time histories of the safety certificate attained. The values of 1 and 0 denote True and False respectively.
TABLE II: Percentage of state constraint violations.
Method Maximum noise level, σw=0.05\sigma_{w}=0.05
Case 1 [%] Case 2 [%] Case 3 [%] Case 4 [%]
iLQR 12.20 14.63 14.63 21.95
SC-iLQR 7.32 4.88 12.20 12.20
Safe-filtered iLQR 0.0 0.0 0.0 0.0
Safe-filtered SC-iLQR 0.0 0.0 0.0 0.0
Maximum noise level, σw=0.1\sigma_{w}=0.1
Case 1 [%] Case 2 [%] Case 3 [%] Case 4 [%]
iLQR 14.63 14.63 17.07 26.83
SC-iLQR 4.87 2.44 7.32 9.75
Safe-filtered iLQR 0.0 0.0 0.0 0.0
Safe-filtered SC-iLQR 0.0 0.0 0.0 0.0
Refer to caption
Figure 4: Statistics of the run times of the controller and the safety filter, under the four test methods.

VI Conclusion

We propose a convex optimization-based predictive safety filter for uncertain NN dynamical systems subject to additive disturbances. By utilizing tools from NN verification and robust linear MPC, our method requires solving a soft-constrained convex program online whose complexity is independent of the NN size. With our framework, formal safety guarantees, together with a robust state-feedback controller, are attained when the slack variables in the solution are all zero.

-A Proof of Proposition 1

Showing that (𝛈;𝐊,𝐡,𝐯)(𝚿𝐰~)\mathcal{R}(\boldsymbol{\eta};\mathbf{K},\mathbf{h},\mathbf{v})\subseteq\mathcal{R}(\mathbf{\Psi}\widetilde{\mathbf{w}}) is equivalent to showing that for every possible realization of 𝜼\boldsymbol{\eta}, there exists 𝐰~𝒲~\widetilde{\mathbf{w}}^{*}\in\widetilde{\mathcal{W}} such that 𝜼=𝚿𝐰~\boldsymbol{\eta}=\mathbf{\Psi}\widetilde{\mathbf{w}}^{*}. Following this perspective, the sufficient conditions in Proposition 1 can be derived inductively.

Initial case

At t=0t=0, we have x0=h0,u0=v0x_{0}=h_{0},u_{0}=v_{0}, and

Δ0(x0,u0)\displaystyle\Delta_{0}(x_{0},u_{0}) [D¯0xD¯0u]y0+d¯0,\displaystyle\geq\begin{bmatrix}\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{0}^{x}&\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{0}^{u}\end{bmatrix}y_{0}+\underline{d}_{0}, (32)
Δ0(x0,u0)\displaystyle\Delta_{0}(x_{0},u_{0}) [D¯0xD¯0u]y0+d¯0,\displaystyle\leq\begin{bmatrix}\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{0}^{x}&\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{0}^{u}\end{bmatrix}y_{0}+\overline{d}_{0},

with y0=[h0v0]y_{0}=[h_{0}^{\top}\ v_{0}^{\top}]^{\top}. Note that η0=Δ0(x0,u0)+w0\eta_{0}=\Delta_{0}(x_{0},u_{0})+w_{0}. For η0=Ψ1,0w~0=diag(ψ0)w~0\eta_{0}=\Psi^{1,0}\tilde{w}_{0}=\textrm{diag}(\psi_{0})\tilde{w}_{0} to admit a solution w~0\tilde{w}_{0} satisfying w~01\lVert\tilde{w}_{0}\rVert_{\infty}\leq 1 for all possible realizations of the lumped uncertainty η0\eta_{0}, it is both sufficient and necessary to have

\lVert\textrm{diag}(\psi_{0})^{-1}\eta_{0}\rVert_{\infty}\leq 1\Leftrightarrow\left\{\begin{array}[]{c}e_{i}^{\top}\eta_{0}\leq\psi_{0,i},\ i\in[n_{x}],\\ -e_{i}^{\top}\eta_{0}\leq\psi_{0,i},\ i\in[n_{x}],\end{array}\right. (33)

hold robustly for η0\eta_{0}, where eie_{i} denotes the ii-th standard basis vector. Using the bounds in (32) and the fact that w0σw\lVert w_{0}\rVert_{\infty}\leq\sigma_{w}, constraints (22a) and (22b) guarantee the robust feasibility of (33) for all possible values of η0\eta_{0}. For any realization of η0\eta_{0}, we denote the corresponding solution by w~0\tilde{w}_{0}^{*}, so that η0=diag(ψ0)w~0\eta_{0}=\textrm{diag}(\psi_{0})\tilde{w}_{0}^{*}.

Induction step

For a given controller \{\mathbf{K},\mathbf{h},\mathbf{v}\}, consider an arbitrary realization of the lumped uncertainty \eta_{0:T}. At time t\geq 1, let \boldsymbol{\eta}_{0:t}:=[\mathbf{0}^{\top}\ \eta_{0}^{\top}\ \cdots\ \eta_{t-1}^{\top}]^{\top} denote the truncation of \boldsymbol{\eta} consisting of its first t+1 block components, and let \mathbf{\Psi}_{0:t}\in\mathcal{L}_{TV}^{t} denote the truncation of \mathbf{\Psi} up to its (t+1)-th block row and column. The truncated vectors \mathbf{x}_{e,0:t},\mathbf{u}_{e,0:t} and matrices \mathbf{A}_{0:t},\mathbf{B}_{0:t} are defined similarly. Assume there exists \widetilde{\mathbf{w}}^{*}_{0:t}=[\mathbf{0}^{\top}\ \tilde{w}_{0}^{*\top}\ \cdots\ \tilde{w}_{t-1}^{*\top}]^{\top} such that \boldsymbol{\eta}_{0:t}=\mathbf{\Psi}_{0:t}\widetilde{\mathbf{w}}^{*}_{0:t}. Next, we show that under constraint (22) there exists \tilde{w}^{*}_{t} with \lVert\tilde{w}^{*}_{t}\rVert_{\infty}\leq 1 such that

ηt\displaystyle\eta_{t} =i=1t+1Ψt+1,t+1iw~i1\displaystyle=\sum_{i=1}^{t+1}\Psi^{t+1,t+1-i}\tilde{w}_{i-1}^{*} (34)
=i=1tΨt+1,t+1iw~i1+diag(ψt)w~t\displaystyle=\sum_{i=1}^{t}\Psi^{t+1,t+1-i}\tilde{w}_{i-1}^{*}+\textrm{diag}(\psi_{t})\tilde{w}_{t}^{*}

holds.

Since 𝜼0:t=𝚿0:t𝐰~0:t\boldsymbol{\eta}_{0:t}=\mathbf{\Psi}_{0:t}\widetilde{\mathbf{w}}^{*}_{0:t}, the error dynamics (20b) up to time tt can be written as

\mathbf{x}_{e,0:t}=Z(\mathbf{A}_{0:t}\mathbf{x}_{e,0:t}+\mathbf{B}_{0:t}\mathbf{u}_{e,0:t})+\mathbf{\Psi}_{0:t}\widetilde{\mathbf{w}}^{*}_{0:t}. (35)

According to [20, Corollary 1], the affine constraint (21) parameterizes all closed-loop system responses 𝐱e,0:t=𝚽x,0:t𝐰~0:t\mathbf{x}_{e,0:t}=\mathbf{\Phi}_{x,0:t}\widetilde{\mathbf{w}}^{*}_{0:t}, 𝐮e,0:t=𝚽u,0:t𝐰~0:t\mathbf{u}_{e,0:t}=\mathbf{\Phi}_{u,0:t}\widetilde{\mathbf{w}}^{*}_{0:t} under the controller 𝐮e,0:t=𝐊0:t𝐱e,0:t\mathbf{u}_{e,0:t}=\mathbf{K}_{0:t}\mathbf{x}_{e,0:t} where 𝚽x,0:t,𝚽u,0:t,𝐊0:tTVt\mathbf{\Phi}_{x,0:t},\mathbf{\Phi}_{u,0:t},\mathbf{K}_{0:t}\in\mathcal{L}_{TV}^{t} are the corresponding truncation of 𝚽x,𝚽u,𝐊\mathbf{\Phi}_{x},\mathbf{\Phi}_{u},\mathbf{K}, respectively. Define yt:=[htvt]y_{t}:=[h_{t}^{\top}\,v_{t}^{\top}]^{\top} and Φt,ti:=[Φxt,tiΦut,ti]\Phi^{t,t-i}:=\begin{bmatrix}\Phi_{x}^{t,t-i\top}&\Phi_{u}^{t,t-i\top}\end{bmatrix}^{\top}. It follows from (35) and the fact 𝐱e,0:t=𝚽x,0:t𝐰~0:t\mathbf{x}_{e,0:t}=\mathbf{\Phi}_{x,0:t}\widetilde{\mathbf{w}}^{*}_{0:t}, 𝐮e,0:t=𝚽u,0:t𝐰~0:t\mathbf{u}_{e,0:t}=\mathbf{\Phi}_{u,0:t}\widetilde{\mathbf{w}}^{*}_{0:t} that the state xtx_{t} and control input utu_{t} at time tt under the policy {𝐊,𝐡,𝐯}\{\mathbf{K},\mathbf{h},\mathbf{v}\} are given by

\begin{bmatrix}x_{t}\\ u_{t}\end{bmatrix}=y_{t}+\sum_{i=1}^{t}\Phi^{t,t-i}\tilde{w}_{i-1}^{*}. (36)

Correspondingly, the uncertainty Δt(xt,ut)\Delta_{t}(x_{t},u_{t}) at time tt is bounded by

Δt(xt,ut)\displaystyle\Delta_{t}(x_{t},u_{t}) [D¯txD¯tu]zt+d¯t\displaystyle\geq\begin{bmatrix}\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{x}&\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{u}\end{bmatrix}z_{t}+\underline{d}_{t} (37)
=[D¯txD¯tu](yt+i=1tΦt,tiw~i1)+d¯t,\displaystyle=\begin{bmatrix}\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{x}&\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{u}\end{bmatrix}\Big{(}y_{t}+\sum_{i=1}^{t}\Phi^{t,t-i}\tilde{w}_{i-1}^{*}\Big{)}+\underline{d}_{t},
Δt(xt,ut)\displaystyle\Delta_{t}(x_{t},u_{t}) [D¯txD¯tu]zt+d¯t\displaystyle\leq\begin{bmatrix}\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{x}&\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{u}\end{bmatrix}z_{t}+\overline{d}_{t}
=[D¯txD¯tu](yt+i=1tΦt,tiw~i1)+d¯t,\displaystyle=\begin{bmatrix}\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{x}&\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{u}\end{bmatrix}\Big{(}y_{t}+\sum_{i=1}^{t}\Phi^{t,t-i}\tilde{w}_{i-1}^{*}\Big{)}+\overline{d}_{t},

which is achieved by plugging (36) into the definition of 𝒫t\mathcal{P}_{t}. Recall that ηt=Δt(xt,ut)+wt\eta_{t}=\Delta_{t}(x_{t},u_{t})+w_{t}. For

ηt=i=1tΨt+1,t+1iw~i1+diag(ψt)w~t\eta_{t}=\sum_{i=1}^{t}\Psi^{t+1,t+1-i}\tilde{w}_{i-1}^{*}+\textrm{diag}(\psi_{t})\tilde{w}_{t} (38)

to admit a solution w~t\tilde{w}_{t} such that w~t1\lVert\tilde{w}_{t}\rVert_{\infty}\leq 1, it is necessary and sufficient that

\displaystyle e_{i}^{\top}\Big(\eta_{t}-\sum_{j=1}^{t}\Psi^{t+1,t+1-j}\tilde{w}_{j-1}^{*}\Big)\leq\psi_{t,i},\quad i\in[n_{x}], (39)
\displaystyle-e_{i}^{\top}\Big(\eta_{t}-\sum_{j=1}^{t}\Psi^{t+1,t+1-j}\tilde{w}_{j-1}^{*}\Big)\leq\psi_{t,i},\quad i\in[n_{x}],

hold. Since \Delta_{t}(x_{t},u_{t}) is bounded by (37), \lVert\tilde{w}_{i}^{*}\rVert_{\infty}\leq 1 for i\in[t-1], and \lVert w_{t}\rVert_{\infty}\leq\sigma_{w}, constraints (22c) and (22d) are sufficient to guarantee that the inequalities (39) hold. We then denote the solution of (38) by \tilde{w}_{t}^{*} for the given realization of \eta_{t}. Repeating this process until t=T constructs the virtual disturbance \widetilde{\mathbf{w}}^{*} such that \boldsymbol{\eta}=\mathbf{\Psi}\widetilde{\mathbf{w}}^{*}. Since the realization of \boldsymbol{\eta} was chosen arbitrarily, this proves Proposition 1.
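The inductive construction above is effectively a block forward substitution: because 𝚿 is block lower triangular with invertible diagonal blocks diag(ψ_t), each η_t uniquely determines w̃*_t once w̃*_0, …, w̃*_{t-1} are fixed. A numpy sketch with illustrative values, where the off-diagonal blocks are scaled small and ψ_t is chosen large so that admissibility holds by construction:

```python
import numpy as np

rng = np.random.default_rng(4)
T, nx = 3, 2
n = (T + 1) * nx

# Filter with Psi^{0,0} = I, diagonal blocks diag(psi_t) >= 2, and small
# strictly-lower blocks (illustrative choices that keep w~ admissible).
Psi = 0.1 * np.tril(rng.uniform(-1.0, 1.0, (n, n)), k=-1)
Psi[:nx, :nx] = np.eye(nx)
psi = 2.0 + rng.uniform(0.0, 1.0, (T, nx))
for t in range(T):
    idx = slice((t + 1) * nx, (t + 2) * nx)
    Psi[idx, idx] = np.diag(psi[t])

# Forward substitution mirroring the induction: eta_t determines w~_t
# once w~_0, ..., w~_{t-1} are fixed, since diag(psi_t) is invertible.
eta = np.concatenate([np.zeros(nx), rng.uniform(-1.0, 1.0, n - nx)])
w_tilde = np.zeros(n)
for t in range(T):
    idx = slice((t + 1) * nx, (t + 2) * nx)
    past = Psi[idx, :(t + 1) * nx] @ w_tilde[:(t + 1) * nx]
    w_tilde[idx] = (eta[idx] - past) / psi[t]
    assert np.linalg.norm(w_tilde[idx], np.inf) <= 1.0  # admissible

assert np.allclose(Psi @ w_tilde, eta)  # eta = Psi w~* exactly
```

Constraints (22) play the role of the generous choices above: they guarantee that the recursively constructed w̃*_t always stays within the unit infinity-norm ball.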

References

  • [1] O. Ogunmolu, X. Gu, S. Jiang, and N. Gans, “Nonlinear systems identification using deep dynamic neural networks,” arXiv preprint arXiv:1610.01439, 2016.
  • [2] K. Y. Chee, T. Z. Jiahao, and M. A. Hsieh, “Knode-mpc: A knowledge-based data-driven predictive control framework for aerial robots,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2819–2826, 2022.
  • [3] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
  • [4] S. Bansal, A. K. Akametalu, F. J. Jiang, F. Laine, and C. J. Tomlin, “Learning quadrotor dynamics using neural network for flight control,” in 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4653–4660, IEEE, 2016.
  • [5] L. Bauersfeld, E. Kaufmann, P. Foehn, S. Sun, and D. Scaramuzza, “Neurobem: Hybrid aerodynamic quadrotor model,” arXiv preprint arXiv:2106.08015, 2021.
  • [6] M. O’Connell, G. Shi, X. Shi, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, “Neural-fly enables rapid learning for agile flight in strong winds,” Science Robotics, vol. 7, no. 66, p. eabm6597, 2022.
  • [7] Y. Yang, K. Caluwaerts, A. Iscen, T. Zhang, J. Tan, and V. Sindhwani, “Data efficient reinforcement learning for legged robots,” in Conference on Robot Learning, pp. 1–10, PMLR, 2020.
  • [8] G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou, “Information theoretic MPC for model-based reinforcement learning,” in 2017 IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 1714–1721, IEEE, 2017.
  • [9] A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, “Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning,” in 2018 IEEE international conference on robotics and automation (ICRA), pp. 7559–7566, IEEE, 2018.
  • [10] K. L. Hobbs, M. L. Mote, M. C. Abate, S. D. Coogan, and E. M. Feron, “Runtime assurance for safety-critical systems: An introduction to safety filtering approaches for complex control systems,” IEEE Control Systems Magazine, vol. 43, no. 2, pp. 28–65, 2023.
  • [11] Y. Emam, G. Notomista, P. Glotfelter, Z. Kira, and M. Egerstedt, “Safe reinforcement learning using robust control barrier functions,” IEEE Robotics and Automation Letters, no. 99, pp. 1–8, 2022.
  • [12] B. Tearle, K. P. Wabersich, A. Carron, and M. N. Zeilinger, “A predictive safety filter for learning-based racing control,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7635–7642, 2021.
  • [13] M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, and U. Topcu, “Safe reinforcement learning via shielding,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018.
  • [14] K. P. Wabersich and M. N. Zeilinger, “A predictive safety filter for learning-based control of constrained nonlinear dynamical systems,” Automatica, vol. 129, p. 109597, 2021.
  • [15] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2016.
  • [16] H. Ma, B. Zhang, M. Tomizuka, and K. Sreenath, “Learning differentiable safety-critical control using control barrier functions for generalization to novel environments,” in 2022 European Control Conference (ECC), pp. 1301–1308, IEEE, 2022.
  • [17] A. K. Akametalu, J. F. Fisac, J. H. Gillula, S. Kaynama, M. N. Zeilinger, and C. J. Tomlin, “Reachability-based safe learning with gaussian processes,” in 53rd IEEE Conference on Decision and Control, pp. 1424–1431, IEEE, 2014.
  • [18] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, “A general safety framework for learning-based control in uncertain robotic systems,” IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2737–2752, 2018.
  • [19] K. Xu, Z. Shi, H. Zhang, Y. Wang, K.-W. Chang, M. Huang, B. Kailkhura, X. Lin, and C.-J. Hsieh, “Automatic perturbation analysis for scalable certified robustness and beyond,” Advances in Neural Information Processing Systems, vol. 33, pp. 1129–1141, 2020.
  • [20] S. Chen, V. M. Preciado, M. Morari, and N. Matni, “Robust model predictive control with polytopic model uncertainty through system level synthesis,” arXiv preprint arXiv:2203.11375, 2022.
  • [21] L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig, “Safe learning in robotics: From learning-based control to safe reinforcement learning,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 5, pp. 411–444, 2022.
  • [22] A. P. Leeman, J. Köhler, S. Bennani, and M. N. Zeilinger, “Predictive safety filter using system level synthesis,” in Proceedings of The 5th Annual Learning for Dynamics and Control Conference, pp. 1180–1192, PMLR, 2023.
  • [23] K. P. Wabersich and M. N. Zeilinger, “Predictive control barrier functions: Enhanced safety mechanisms for learning-based control,” IEEE Transactions on Automatic Control, 2022.
  • [24] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel, “Efficient neural network robustness certification with general activation functions,” Advances in neural information processing systems, vol. 31, 2018.
  • [25] A. P. Leeman, J. Köhler, A. Zanelli, S. Bennani, and M. N. Zeilinger, “Robust nonlinear optimal control via system level synthesis,” arXiv preprint arXiv:2301.04943, 2023.
  • [26] A. Saviolo, G. Li, and G. Loianno, “Physics-inspired temporal learning of quadrotor dynamics for accurate model predictive trajectory tracking,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10256–10263, 2022.
  • [27] N. A. Spielberg, M. Brown, and J. C. Gerdes, “Neural network model predictive motion control applied to automated driving with unknown friction,” IEEE Transactions on Control Systems Technology, vol. 30, no. 5, pp. 1934–1945, 2021.
  • [28] D. Q. Mayne, “Model predictive control: Recent developments and future promise,” Automatica, vol. 50, no. 12, pp. 2967–2986, 2014.
  • [29] J. Anderson, J. C. Doyle, S. H. Low, and N. Matni, “System level synthesis,” Annual Reviews in Control, vol. 47, pp. 364–393, 2019.
  • [30] W. Langson, I. Chryssochoos, S. Raković, and D. Q. Mayne, “Robust model predictive control using tubes,” Automatica, vol. 40, no. 1, pp. 125–133, 2004.
  • [31] J. Köhler, E. Andina, R. Soloperto, M. A. Müller, and F. Allgöwer, “Linear robust adaptive model predictive control: Computational complexity and conservatism,” in 2019 IEEE 58th Conference on Decision and Control (CDC), pp. 1383–1388, IEEE, 2019.
  • [32] J. Fleming, B. Kouvaritakis, and M. Cannon, “Robust tube mpc for linear systems with multiplicative uncertainty,” IEEE Transactions on Automatic Control, vol. 60, no. 4, pp. 1087–1092, 2014.
  • [33] M. Bujarbaruah, U. Rosolia, Y. R. Stürz, X. Zhang, and F. Borrelli, “Robust mpc for lpv systems via a novel optimization-based constraint tightening,” Automatica, vol. 143, p. 110459, 2022.
  • [34] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “Openai gym,” arXiv preprint arXiv:1606.01540, 2016.
  • [35] W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems.,” in ICINCO (1), pp. 222–229, Citeseer, 2004.
  • [36] Y. Tassa, N. Mansard, and E. Todorov, “Control-limited differential dynamic programming,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1168–1175, IEEE, 2014.
  • [37] B. Amos, I. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, “Differentiable mpc for end-to-end planning and control,” Advances in neural information processing systems, vol. 31, 2018.