
Safety Filter Design for Neural Network Systems via
Convex Optimization

Shaoru Chen, Kong Yao Chee, Nikolai Matni, M. Ani Hsieh, George J. Pappas Shaoru Chen is with Microsoft Research, 300 Lafayette Street, New York, NY, 10012, USA, [email protected]. Kong Yao Chee, Nikolai Matni, M. Ani Hsieh, George J. Pappas are with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, 19104, USA. {ckongyao, nmatni, m.hsieh, pappasg}@seas.upenn.edu.
The first two authors contributed equally and are co-first authors.
Abstract

With the increase in data availability, it has been widely demonstrated that neural networks (NN) can capture complex system dynamics precisely in a data-driven manner. However, the architectural complexity and nonlinearity of the NNs make it challenging to synthesize a provably safe controller. In this work, we propose a novel safety filter that relies on convex optimization to ensure safety for a NN system, subject to additive disturbances that are capable of capturing modeling errors. Our approach leverages tools from NN verification to over-approximate NN dynamics with a set of linear bounds, followed by an application of robust linear MPC to search for controllers that can guarantee robust constraint satisfaction. We demonstrate the efficacy of the proposed framework numerically on a nonlinear pendulum system.

I Introduction

With the rapid development in machine learning infrastructure, neural networks (NN) have been ubiquitously applied in the modeling of complex dynamical systems [1, 2]. Through a data collection and training procedure [3], NNs can capture accurate representations of the system dynamics even in challenging scenarios where high-speed aerodynamic effects [4, 5, 6] or contact-rich environments [7, 8] are present. Moreover, NNs can be easily updated online as more data is collected, making them suitable for online tasks or modeling changing environments. For example, NN dynamical systems are widely used in model-based reinforcement learning [9] and learning-based adaptive control [6].

However, applying NN dynamics brings about significant challenges in providing safety guarantees for the controlled system. While we benefit from the expressivity of NNs, we must also contend with their high nonlinearity and large scale, since they are often overparameterized. The runtime assurance (RTA) [10] mechanism provides a practical and effective solution to guarantee the safety of complex dynamical systems by designing a safety filter that primarily focuses on enforcing safety constraints. Given a primary controller that aims to optimize performance, the safety filter monitors and modifies the output of the primary controller online to guarantee that only safe control inputs are applied. The safety filter allows the design of the safety and performance-based controllers to be decoupled and has found wide applications in safe learning-based control [11, 12, 13].

In this work, we focus on the design of a predictive safety filter (PSF) [14] for uncertain NN dynamics. The PSF essentially follows a model predictive control (MPC) formulation with the nonlinear dynamics and constraints encoded in an optimization problem. Different from MPC, the PSF is less complex to solve since it does not consider any performance objectives [14]. Compared with the alternative safety filter construction schemes through control barrier functions (CBF) [15, 16] or Hamilton-Jacobi (HJ) reachability analysis [17, 18], the PSF enjoys flexibility in handling dynamically changing NN models or model uncertainty bounds when updated online. We refer the interested readers to [14, Section 1 and 2] for a detailed discussion of the PSF, CBF, and HJ reachability-based safety filters.

Contributions: In this work, we consider uncertain NN dynamics subject to bounded additive disturbances, where the disturbances can encapsulate the errors between the learned NN dynamics and the true system. Despite being highly expressive, the considered uncertain NN dynamics require solving a robust optimization problem involving NN dynamical constraints online in the PSF. To resolve this computational challenge, we propose to apply NN verification tools [19] to abstract the NN dynamics locally as a linear uncertain system, thereby reducing the original PSF problem into one that is amenable to robust linear MPC and convex optimization. In particular, we adapt the SLS (System Level Synthesis) MPC method [20] to solve the resulting robust MPC problem. A schematic of our pipeline is shown in Fig. 1. Soft constraints are used in robust linear MPC where slack variables denoting constraint violations are penalized. By applying a hierarchy of conservative function and model uncertainty approximations, we transform the original optimization problem into a convex one. A safety certificate for the uncertain NN dynamics over a finite horizon can then be provided when all slack variables are zero. Our contributions are summarized below.

  1. Drawing tools from NN verification and robust linear MPC, we propose a novel predictive safety filter for uncertain NN dynamics through convex optimization. Importantly, the complexity of the convex optimization problem is independent of the NN size (i.e., the width and depth of the NN).

  2. Our PSF provides a safety certificate for the uncertain NN dynamics over a finite horizon when a certain numerical criterion is met by the convex optimization solutions.

I-A Related works

The problem of ensuring the safety of learning-based systems has received significant interest, with a plethora of methods described in [21]. Directly related to our work is the PSF developed in [14], which monitors and modifies a given control input by solving a predictive control problem online to guarantee the safety of the system. This formulation has been extended to the SLS setting [22] and applied to racing cars [12], and a soft-constrained variant was proposed in [23] to handle unexpected disturbances to the states. The PSF that we propose differs from those in the existing work in the following ways. First, we exploit the structure of the neural networks to extract linear bounds on the NN outputs using NN verification tools [24, 19], simplifying the PSF formulation for NN dynamics. Second, our proposed pipeline circumvents the need to solve a robust non-convex optimization problem, even with the consideration of additive disturbances within the uncertain dynamics, as is typical for nonlinear variants of SLS [25]. Unlike existing work on predictive control of NN dynamics [26, 27, 2], our work considers robust control of uncertain NN dynamics with a focus on obtaining formal safety guarantees.

Notation: $[N]$ denotes the set $\{0,1,\dots,N\}$. $\mathcal{B}_{p}(z,r)$ denotes the $\ell_{p}$ ball centered at $z$ with radius $r$. We use $x_{0:T}$ to denote the sequence $\{x_{0},\dotsc,x_{T}\}$. $\mathbf{0}$ denotes a vector of all zeros, and its dimension can be inferred from context. For a vector $d\in\mathbb{R}^{n}$, $S=\mathrm{diag}(d)\in\mathbb{R}^{n\times n}$ denotes the diagonal matrix with $d$ as its diagonal. For a sequence of matrices $S_{1},\dots,S_{N}$, $S=\mathrm{blkdiag}(S_{1},\dots,S_{N})$ denotes the block-diagonal matrix whose diagonal blocks are $S_{1},\dots,S_{N}$ in that order. We represent a linear, causal operator $\mathbf{R}$ defined over a horizon $T$ by the block-lower-triangular matrix

$\mathbf{R}=\begin{bmatrix}R^{0,0}& & & \\ R^{1,1}&R^{1,0}& & \\ \vdots&\ddots&\ddots& \\ R^{T,T}&\cdots&R^{T,1}&R^{T,0}\end{bmatrix}$ (1)

where $R^{i,j}\in\mathbb{R}^{p\times q}$ is a matrix of compatible dimensions. The set of such operators is denoted by $\mathcal{L}_{TV}^{T,p\times q}$, and the superscript $T$ or $p\times q$ will be dropped when it is clear from the context.

Figure 1: Pipeline of the proposed predictive safety filter for uncertain NN systems.

II Problem Formulation

Consider the following discrete-time nonlinear system,

$x(k+1)=Ax(k)+Bu(k)+f\left(x(k),u(k)\right)+w(k),$ (2)

where the vectors $x(k)\in\mathbb{R}^{n_{x}}$ and $u(k)\in\mathbb{R}^{n_{u}}$ are the state and control input. The vector $w(k)\in\mathbb{R}^{n_{x}}$ denotes additive disturbances that can account for unknown effects from the environment or unmodeled dynamics. We assume that $w(k)$ is norm-bounded, i.e., $w(k)\in\mathcal{W}:=\{w\mid\lVert w\rVert_{\infty}\leq\sigma_{w}\}$. The system consists of linear dynamics characterized by the matrices $A$ and $B$ and nonlinear dynamics $f(x(k),u(k))$. Specifically, the nonlinear dynamics are given by a NN $f:\mathbb{R}^{n_{x}+n_{u}}\mapsto\mathbb{R}^{n_{x}}$ with an arbitrary architecture. While we present our approach for the general dynamics (2), the approach allows the matrices $A$ and $B$ to be zero and has the flexibility to account for time-varying dynamics $(A(k),B(k))$. The system (2) is required to satisfy the following constraints,

$x(k)\in\mathcal{X},\quad u(k)\in\mathcal{U},$ (3)

where $\mathcal{X}\subset\mathbb{R}^{n_{x}}$ and $\mathcal{U}\subset\mathbb{R}^{n_{u}}$ are polytopes. The state $x(k)$ and input $u(k)$ are considered safe if they satisfy constraints (3).

II-A Predictive safety filter

We assume a primary controller $\pi(x)$, which aims to complete a task or achieve high performance, is given for system (2). Following the runtime assurance scheme, the primary controller $\pi(x)$ is not guaranteed to be safe, since its design may be decoupled from the safety requirements. To ensure constraint satisfaction of the closed-loop system, we aim to design a predictive safety filter that monitors and modifies the control input $\pi(x(k))$ given by the primary controller online, in a minimally invasive manner. This is achieved by solving the following robust optimization problem at each time step $k$,

$\begin{aligned}
\underset{u_{0:T-1}}{\text{minimize}}\quad & \lVert u_{0}-\pi(x(k))\rVert_{2}^{2}\\
\text{subject to}\quad & x_{t+1}=Ax_{t}+Bu_{t}+f(x_{t},u_{t})+w_{t},\\
& x_{t}\in\mathcal{X},\ u_{t}\in\mathcal{U},\ \forall w_{t}\in\mathcal{W},\quad t\in[T],\\
& x_{0}=x(k),
\end{aligned}$ (4)

where the vectors $x_{t},u_{t}$ denote the predicted states and inputs over the horizon $T$, with $x(k)$ representing the current state of the system. We refer to Problem (4) as the PSF problem and denote by $u_{0:T-1}^{*}$ its optimal solution. When applied, the control inputs $u_{0:T-1}^{*}$ guarantee the safety of the system for the next $T$ steps. In practice, the PSF problem (4) is solved recursively at each time step $k$, and only the first optimal control input $u_{0}^{*}$ is applied to the system, analogous to an MPC scheme.
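To make the receding-horizon filtering scheme concrete, the following sketch implements a deliberately simplified one-dimensional instance with horizon $T=1$, where the robust PSF reduces to clipping the primary input onto a closed-form safe interval. The dynamics, constraint bounds, and primary controller below are illustrative assumptions, not the paper's pendulum benchmark.

```python
# Minimal sketch of the receding-horizon safety filter on a 1-D toy system
# x(k+1) = a*x(k) + b*u(k) + w(k), |w| <= sigma_w, with state constraint |x| <= x_max.
# With horizon T = 1, the robust PSF reduces to projecting pi(x) onto a safe interval.
# All numerical values are illustrative assumptions.

def safe_input_interval(x, a=0.9, b=1.0, sigma_w=0.05, x_max=1.0):
    """Inputs u keeping a*x + b*u + w inside [-x_max, x_max] for every |w| <= sigma_w."""
    lo = (-x_max + sigma_w - a * x) / b
    hi = (x_max - sigma_w - a * x) / b
    return lo, hi

def safety_filter(x, u_nominal, **kw):
    """Minimally invasive modification: project the primary input onto the safe interval."""
    lo, hi = safe_input_interval(x, **kw)
    return min(max(u_nominal, lo), hi)

def simulate(T_sim=50):
    import random
    rng = random.Random(0)
    x, traj = 0.8, []
    for _ in range(T_sim):
        u_nom = 2.0 * x                    # aggressive primary controller, unsafe on its own
        u = safety_filter(x, u_nom)        # filtered input
        w = rng.uniform(-0.05, 0.05)       # bounded disturbance
        x = 0.9 * x + u + w
        traj.append(x)
    return traj
```

In the paper's setting, the filter instead solves a robust convex program over a horizon $T$ with feedback policies; the clipping above is the scalar, $T=1$ special case of the same idea.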

II-B Challenges with the predictive safety filter

While the solution to Problem (4) is able to provide safety guarantees, solving Problem (4) is a challenging task. Some potential issues include

  (i) it is well known in robust MPC that searching over open-loop control sequences $u_{0:T-1}$ can be overly conservative [28],

  (ii) the presence of the NN dynamics $f(x,u)$ makes solving the PSF computationally challenging,

  (iii) the safety certificate of the solution $u_{0:T-1}^{*}$ is not available until convergence is reached, and

  (iv) without the availability of a robust forward invariant set, attempting to solve the PSF may result in infeasibility.

To handle all the aforementioned issues, we combine NN verification tools [19] and robust MPC [20]. Our solution consists of two steps. First, we generate local linear bounds for the NN dynamics $f(x,u)$ using tools from NN verification. Next, we apply robust linear MPC to synthesize a state feedback control policy that guarantees robust constraint satisfaction for the system. This combined procedure provides a powerful simplification of the PSF problem and resolves issues (i) to (iii). To address issue (iv), we introduce soft constraints in our formulation, which provides formal safety guarantees for the system when the slack variables are zero. We describe these two steps in Sections III and IV, respectively. Section V demonstrates our method numerically, and Section VI concludes the paper.

Remark 1

To ensure the safety constraints are satisfied at all times or guarantee recursive feasibility of the PSF (4), a local forward invariant set for the nonlinear system (2) is required, which is generally challenging to find. In this work, we do not assume the availability of such a forward invariant set and use slack variables as numerical certificates of safety. We leave the synthesis of the forward invariant terminal set for NN dynamics and its integration into the PSF as part of our future work.

III Neural network verification bounds

In this section, we demonstrate how tools from NN verification can be utilized to over-approximate the NN dynamics with a linear time-varying (LTV) representation. This enables us to conservatively transform the PSF problem into a robust convex optimization problem, which is simpler to solve.

Given a bounded input set, the linear relaxation-based perturbation analysis (LiRPA) [19] is an efficient method to synthesize linear lower and upper bounds for the outputs of a NN with a general architecture. The bounds computed from this method are described in the following theorem.

Theorem 1

(rephrasing [24, Theorem 3.2]) Given a NN $f(z):\mathbb{R}^{n_{0}}\mapsto\mathbb{R}^{n_{L}}$, there exist two explicit linear functions $f_{U}:\mathbb{R}^{n_{0}}\mapsto\mathbb{R}^{n_{L}}$ and $f_{L}:\mathbb{R}^{n_{0}}\mapsto\mathbb{R}^{n_{L}}$ such that for all $z\in\mathcal{B}_{p}(z_{0},r)$, we have

$f_{L}(z)=A_{L}z+b_{L}\leq f(z)\leq A_{U}z+b_{U}=f_{U}(z),$ (5)

where the inequalities are applied component-wise.
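To make Theorem 1 concrete, the sketch below computes bounds of the form (5) for a one-hidden-layer ReLU network, using the standard elementwise ReLU linear relaxation back-substituted through the output layer. This is a minimal stand-in for general-purpose LiRPA tools handling arbitrary architectures; the lower-slope heuristic and the network data are illustrative assumptions.

```python
import numpy as np

def relu_relaxation(l, u):
    """Elementwise linear bounds a_l*x + c_l <= relu(x) <= a_u*x + c_u valid on [l, u]."""
    # stable neurons: identity (l >= 0) or zero (u <= 0); unstable: chord upper bound,
    # and a 0/1 lower slope chosen by the common heuristic (1 if u > -l, else 0)
    a_l = np.where(l >= 0, 1.0, np.where(u <= 0, 0.0, (u > -l).astype(float)))
    a_u = np.where(l >= 0, 1.0, np.where(u <= 0, 0.0, u / (u - l)))
    c_u = np.where((l < 0) & (u > 0), -u * l / (u - l), 0.0)
    return a_l, np.zeros_like(l), a_u, c_u

def linear_bounds(W1, b1, W2, b2, z0, r):
    """Linear bounds A_L z + b_L <= W2 relu(W1 z + b1) + b2 <= A_U z + b_U on B_inf(z0, r)."""
    # interval bounds on the hidden pre-activations over the inf-ball
    center = W1 @ z0 + b1
    rad = r * np.abs(W1).sum(axis=1)
    a_l, c_l, a_u, c_u = relu_relaxation(center - rad, center + rad)
    # back-substitution: positive output weights take the upper relaxation, negative the lower
    W2p, W2n = np.clip(W2, 0, None), np.clip(W2, None, 0)
    A_U = (W2p * a_u + W2n * a_l) @ W1
    b_U = (W2p * a_u + W2n * a_l) @ b1 + W2p @ c_u + W2n @ c_l + b2
    A_L = (W2p * a_l + W2n * a_u) @ W1
    b_L = (W2p * a_l + W2n * a_u) @ b1 + W2p @ c_l + W2n @ c_u + b2
    return A_L, b_L, A_U, b_U
```

The resulting bounds are sound over the entire input ball, which is what allows the NN dynamics to be replaced by the linear over-approximation (7) below.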

The parameters $A_{L},A_{U},b_{L},b_{U}$ are derived from the weights, biases, and activation functions of the NN. In this paper, we choose $p:=\infty$ such that $\mathcal{B}_{\infty}(z,r)$ is polyhedral. The bounds (5) are computed using closed-form updates with a computational complexity polynomial in the number of neurons [24]. This allows the method to scale well to deep networks. However, if the NN is deep or if the input domain $\mathcal{B}_{\infty}(z,r)$ is large, the computed bounds tend to be loose. Motivated by this observation, we propose to extract a set of local linear bounds along a reference trajectory. Specifically, at every time step $k$, we construct a reference trajectory given by the sequences of reference states $\mathbf{\hat{x}}:=\hat{x}_{0:T}$ and control inputs $\mathbf{\hat{u}}:=\hat{u}_{0:T-1}$ where

$\hat{x}_{t+1}=A\hat{x}_{t}+B\hat{u}_{t}+f(\hat{x}_{t},\hat{u}_{t}),\quad t\in[T-1],\qquad \hat{x}_{0}=x(k).$ (6)

The reference control inputs $\mathbf{\hat{u}}$ can be obtained, e.g., by rolling out the nominal NN dynamics under the primary policy $\pi(\cdot)$. Denoting $z_{t}:=[x_{t}^{\top}\ u_{t}^{\top}]^{\top}$, $\hat{z}_{t}:=[\hat{x}_{t}^{\top}\ \hat{u}_{t}^{\top}]^{\top}$, and $r_{t}$ the radius of the $\ell_{\infty}$ ball around $\hat{z}_{t}$, we apply Theorem 1 to obtain the following bounds for the NN dynamics along the reference trajectory,

$\begin{bmatrix}\underline{G}_{t}^{x}&\underline{G}_{t}^{u}\end{bmatrix}z_{t}+\underline{g}_{t}\leq f(x_{t},u_{t})\leq\begin{bmatrix}\overline{G}_{t}^{x}&\overline{G}_{t}^{u}\end{bmatrix}z_{t}+\overline{g}_{t},$ (7)

for all $(x_{t},u_{t})\in\mathcal{B}_{\infty}(\hat{z}_{t},r_{t})$. In other words, the NN dynamics $f(x_{t},u_{t})$ is over-approximated by a set of linear lower and upper bounds. The ball $\mathcal{B}_{\infty}(\hat{z}_{t},r_{t})$ is referred to as the trust region in which the bounds (7) are valid.

To reduce conservatism in the formulation of the filter within the robust MPC framework, we integrate the bounds into the linear dynamics of the system. Specifically, using the bounds in (7), we define

$f_{d}(x_{t},u_{t}):=f(x_{t},u_{t})-\left(\tilde{A}_{t}x_{t}+\tilde{B}_{t}u_{t}+\tilde{c}_{t}\right),$ (8)

where

$\tilde{A}_{t}:=\frac{\underline{G}_{t}^{x}+\overline{G}_{t}^{x}}{2},\quad \tilde{B}_{t}:=\frac{\underline{G}_{t}^{u}+\overline{G}_{t}^{u}}{2},\quad \tilde{c}_{t}:=\frac{\underline{g}_{t}+\overline{g}_{t}}{2}$ (9)

denote the means of the linear bounds.

Corollary 1

Given the bounds in (7), for every $z_{t}:=[x_{t}^{\top}\ u_{t}^{\top}]^{\top}\in\mathcal{B}_{\infty}(\hat{z}_{t},r_{t})$, the dynamics $f_{d}(x_{t},u_{t})$ in (8) satisfy the following bounds,

$\begin{bmatrix}\underline{D}_{t}^{x}&\underline{D}_{t}^{u}\end{bmatrix}z_{t}+\underline{d}_{t}\leq f_{d}(x_{t},u_{t})\leq\begin{bmatrix}\overline{D}_{t}^{x}&\overline{D}_{t}^{u}\end{bmatrix}z_{t}+\overline{d}_{t},$ (10)

where

$\underline{D}_{t}^{x}=\frac{\underline{G}_{t}^{x}-\overline{G}_{t}^{x}}{2}=-\overline{D}_{t}^{x},\quad \underline{D}_{t}^{u}=\frac{\underline{G}_{t}^{u}-\overline{G}_{t}^{u}}{2}=-\overline{D}_{t}^{u},\quad \underline{d}_{t}=\frac{\underline{g}_{t}-\overline{g}_{t}}{2}=-\overline{d}_{t}.$
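The decomposition (8)-(10) can be checked numerically: the residual bounds (10) are symmetric about zero, and any function sandwiched by (7) has its residual sandwiched by (10). A small sketch with made-up bound matrices, taking the trust region inside the positive orthant so that the upper bound dominates the lower bound on it:

```python
import numpy as np

rng = np.random.default_rng(1)
nx, nu = 3, 2
nz = nx + nu

# Illustrative linear bounds (7): lower <= f <= upper on the trust region [0, 1]^nz
G_lo = rng.standard_normal((nx, nz)); g_lo = rng.standard_normal(nx)
G_up = G_lo + rng.uniform(0.0, 0.2, (nx, nz)); g_up = g_lo + rng.uniform(0.0, 0.2, nx)

# Means (9), which get merged into the nominal LTV dynamics
G_mid, g_mid = (G_lo + G_up) / 2, (g_lo + g_up) / 2

# Residual bounds (10): symmetric about zero by construction
D_up, d_up = (G_up - G_lo) / 2, (g_up - g_lo) / 2
assert np.allclose(G_lo - G_mid, -D_up) and np.allclose(g_lo - g_mid, -d_up)

# Any f between the bounds (7) has residual f_d = f - mid inside the bounds (10)
for _ in range(100):
    z = rng.uniform(0.0, 1.0, nz)                # point in the assumed trust region
    theta = rng.uniform(0.0, 1.0, nx)            # f somewhere between lower and upper
    f = (1 - theta) * (G_lo @ z + g_lo) + theta * (G_up @ z + g_up)
    f_d = f - (G_mid @ z + g_mid)
    assert np.all(f_d <= D_up @ z + d_up + 1e-9)
    assert np.all(f_d >= -(D_up @ z + d_up) - 1e-9)
```

The residual term is thus half the gap between the original bounds, which is why it can be treated as a small structured disturbance.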

It is important to note that although the NN dynamics $f(x_{t},u_{t})$ can take large values within the trust region $\mathcal{B}_{\infty}(\hat{z}_{t},r_{t})$, the residual dynamics $f_{d}(x_{t},u_{t})$ tend to be small in magnitude and can be treated as disturbances to the system. With the extraction of $f_{d}(x_{t},u_{t})$, we obtain an LTV reformulation of the PSF problem, referred to as the linear PSF problem,

$\begin{aligned}
\underset{u_{0:T-1}}{\text{minimize}}\quad & \lVert u_{0}-\pi(x(k))\rVert_{2}^{2}\\
\text{subject to}\quad & x_{t+1}=A_{t}x_{t}+B_{t}u_{t}+c_{t}+\Delta_{t}(x_{t},u_{t})+w_{t},\\
& (x_{t},u_{t})\in\mathcal{B}_{\infty}\left(\hat{z}_{t},r_{t}\right),\quad t\in[T-1],\\
& x_{t}\in\mathcal{X},\ u_{t}\in\mathcal{U},\quad t\in[T],\\
& \forall\Delta_{t}(\cdot)\in\mathcal{P}_{t},\ w_{t}\in\mathcal{W},\quad t\in[T],\\
& x_{0}=x(k),
\end{aligned}$ (11)

where the means in (9) are merged into the linear dynamics of the system via the following time-varying system parameters (when time-varying dynamics $(A(k),B(k))$ are considered in (2), we replace $(A,B)$ by their time-varying counterparts in the definitions of $(A_{t},B_{t})$),

$A_{t}:=A+\tilde{A}_{t},\quad B_{t}:=B+\tilde{B}_{t},\quad c_{t}:=\tilde{c}_{t}.$

The uncertainty set $\mathcal{P}_{t}$ is given as

$\mathcal{P}_{t}:=\left\{\Delta_{t}(x_{t},u_{t})\ \middle|\ \begin{array}{l}\Delta_{t}(x_{t},u_{t})\geq\begin{bmatrix}\underline{D}_{t}^{x}&\underline{D}_{t}^{u}\end{bmatrix}z_{t}+\underline{d}_{t}\\ \Delta_{t}(x_{t},u_{t})\leq\begin{bmatrix}\overline{D}_{t}^{x}&\overline{D}_{t}^{u}\end{bmatrix}z_{t}+\overline{d}_{t}\end{array}\right\},$ (12)

using the bounds obtained from Corollary 1. It immediately follows that $f_{d}(x_{t},u_{t})$ corresponds to a realization of $\Delta_{t}(x_{t},u_{t})$.

A few remarks about the linear PSF problem are in order. First, the solution of the linear PSF problem depends on the centers $\mathbf{\hat{z}}=\hat{z}_{0:T}$ and radii $r_{t}$ of the trust regions. The centers play an important role when the reference trajectory lies near or beyond the boundaries of the constraint set; in this case, the reference trajectory should be shifted towards the constraint set by adjusting the centers of the trust regions. Next, based on how the linear bounds in (12) are computed, there is a trade-off in the size of the radius $r_{t}$. A small radius ensures that the computed bounds are accurate, but it limits the range in which the centers $\mathbf{\hat{z}}$ can be updated at each iteration. On the other hand, a large radius provides more flexibility in updating the reference trajectory, but the bounds can be overly conservative. Lastly, any feasible solution $u_{0:T-1}^{*}$ to the linear PSF problem (11) is also feasible for the PSF problem (4).

With these considerations, we devise a method such that the trust regions $\mathcal{B}_{\infty}(\hat{z}_{t},r_{t})$ can be updated online. Details of this update are given in Section IV-E. In Sections IV-A to IV-D, we first describe how to solve the linear PSF problem through robust linear MPC.

IV Robust Linear MPC

Compared with the PSF problem (4), the linear PSF problem (11) involves only uncertain linear dynamics. However, solving it can still be challenging, and a conservative approach may fail. Since optimizing over open-loop control sequences is conservative in robust MPC, we instead optimize over state-feedback controllers $u_{t}=\mu_{t}(x_{0:t})$. To achieve this, we apply an extension of the SLS MPC algorithm in [20], which has been shown to enjoy outstanding tightness among existing robust linear MPC methods.

IV-A Overview of SLS MPC

In SLS MPC, we consider the following uncertain linear time-varying system,

$x_{t+1}=A_{t}x_{t}+B_{t}u_{t}+\xi_{t}$ (13)

where the time-varying matrices $(A_{t},B_{t})$ representing the nominal dynamics are known, and $\xi_{t}$ denotes the lumped uncertainty, which will be used to capture the effects of uncertainty in the dynamics.

Consider the dynamics (13) over a horizon $T$. We define the following variables, which are concatenations of the variables in (13) over the horizon $T$,

$\mathbf{x}:=[x_{0}^{\top}\ \cdots\ x_{T}^{\top}]^{\top},\quad \mathbf{u}:=[u_{0}^{\top}\ \cdots\ u_{T}^{\top}]^{\top},\quad \boldsymbol{\xi}:=[x_{0}^{\top}\ \xi_{0}^{\top}\ \cdots\ \xi_{T-1}^{\top}]^{\top},$ (14)

and the following concatenated system matrices,

$\mathbf{A}:=\mathrm{blkdiag}\left(A_{0},\dots,A_{T}\right),\quad \mathbf{B}:=\mathrm{blkdiag}\left(B_{0},\dots,B_{T}\right).$

We define $Z$ as the block-downshift operator, with identity matrices on the first block sub-diagonal and zeros everywhere else. The dynamics (13) over the horizon $T$ can then be compactly written as

$\mathbf{x}=Z\mathbf{A}\mathbf{x}+Z\mathbf{B}\mathbf{u}+\boldsymbol{\xi}.$ (15)
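A quick numerical check of the compact form (15), with illustrative dimensions and random data: stacking a step-by-step rollout of (13) satisfies $\mathbf{x}=Z\mathbf{A}\mathbf{x}+Z\mathbf{B}\mathbf{u}+\boldsymbol{\xi}$, where $\boldsymbol{\xi}$ stacks $(x_{0},\xi_{0},\dots,\xi_{T-1})$.

```python
import numpy as np

def block_downshift(T, n):
    """Z: identity blocks on the first block sub-diagonal, zeros everywhere else."""
    Z = np.zeros(((T + 1) * n, (T + 1) * n))
    for t in range(T):
        Z[(t + 1) * n:(t + 2) * n, t * n:(t + 1) * n] = np.eye(n)
    return Z

rng = np.random.default_rng(0)
T, nx, nu = 4, 3, 2
As = [rng.standard_normal((nx, nx)) for _ in range(T + 1)]
Bs = [rng.standard_normal((nx, nu)) for _ in range(T + 1)]
x0 = rng.standard_normal(nx)
us = [rng.standard_normal(nu) for _ in range(T + 1)]
xis = [rng.standard_normal(nx) for _ in range(T)]     # xi_0, ..., xi_{T-1}

# step-by-step rollout of (13)
xs = [x0]
for t in range(T):
    xs.append(As[t] @ xs[t] + Bs[t] @ us[t] + xis[t])

# blkdiag(A_0, ..., A_T) and blkdiag(B_0, ..., B_T)
Abig = np.zeros(((T + 1) * nx, (T + 1) * nx))
Bbig = np.zeros(((T + 1) * nx, (T + 1) * nu))
for t in range(T + 1):
    Abig[t * nx:(t + 1) * nx, t * nx:(t + 1) * nx] = As[t]
    Bbig[t * nx:(t + 1) * nx, t * nu:(t + 1) * nu] = Bs[t]

Z = block_downshift(T, nx)
xvec, uvec = np.concatenate(xs), np.concatenate(us)
xivec = np.concatenate([x0] + xis)                    # xi stacks (x_0, xi_0, ..., xi_{T-1})

# compact form (15): x = Z A x + Z B u + xi
assert np.allclose(xvec, Z @ Abig @ xvec + Z @ Bbig @ uvec + xivec)
```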

Next, we consider an LTV state feedback controller $u_{t}=\sum_{i=0}^{t}K^{t,t-i}x_{i}$, compactly given as $\mathbf{u}=\mathbf{K}\mathbf{x}$, $\mathbf{K}\in\mathcal{L}_{TV}^{T,n_{u}\times n_{x}}$. Plugging $\mathbf{u}$ into (15) gives the following system responses $\{\mathbf{\Phi}_{x},\mathbf{\Phi}_{u}\}$, which map $\boldsymbol{\xi}$ to the closed-loop states and inputs $(\mathbf{x},\mathbf{u})$,

$\begin{bmatrix}\mathbf{\Phi}_{x}\\ \mathbf{\Phi}_{u}\end{bmatrix}:=\begin{bmatrix}(I-Z(\mathbf{A}+\mathbf{B}\mathbf{K}))^{-1}\\ \mathbf{K}(I-Z(\mathbf{A}+\mathbf{B}\mathbf{K}))^{-1}\end{bmatrix}.$ (16)

The following theorem establishes the connection between $\{\mathbf{\Phi}_{x},\mathbf{\Phi}_{u}\}$ and state feedback controllers.

Theorem 2

[29, Theorem 2.1] Over the horizon $t\in[T]$, for the dynamics (13) with the LTV state feedback controller $\mathbf{u}=\mathbf{K}\mathbf{x}$, $\mathbf{K}\in\mathcal{L}_{TV}^{T,n_{u}\times n_{x}}$, we have:

  1. The affine subspace defined by

$\begin{bmatrix}I-Z\mathbf{A}&-Z\mathbf{B}\end{bmatrix}\begin{bmatrix}\mathbf{\Phi}_{x}\\ \mathbf{\Phi}_{u}\end{bmatrix}=I,\quad \mathbf{\Phi}_{x},\mathbf{\Phi}_{u}\in\mathcal{L}_{TV}^{T}$ (17)

    parameterizes all possible system responses (16).

  2. For any $\{\mathbf{\Phi}_{x},\mathbf{\Phi}_{u}\}\in\mathcal{L}_{TV}^{T}$ satisfying (17), the controller gain $\mathbf{K}=\mathbf{\Phi}_{u}\mathbf{\Phi}_{x}^{-1}\in\mathcal{L}_{TV}^{T}$ achieves the desired responses (16).

The system responses explicitly characterize the effects of the lumped uncertainty $\boldsymbol{\xi}$ on $(\mathbf{x},\mathbf{u})$. In previous work [20], a system subject to polytopic uncertainties and additive disturbances is considered, i.e., $\xi_{t}:=\Delta_{A}x_{t}+\Delta_{B}u_{t}+w_{t}$, where the parameters $(\Delta_{A},\Delta_{B})$ belong to a polytopic set. Interested readers are referred to [20] for more details on SLS MPC.
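Theorem 2 can be sanity-checked numerically: for a random causal (block-lower-triangular) $\mathbf{K}$, the responses (16) satisfy the affine constraint (17), and $\mathbf{K}$ is recovered as $\mathbf{\Phi}_{u}\mathbf{\Phi}_{x}^{-1}$. The dimensions and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, nx, nu = 3, 2, 1
N = T + 1

# block-diagonal nominal dynamics A, B and the block-downshift operator Z
A = np.zeros((N * nx, N * nx)); B = np.zeros((N * nx, N * nu)); Z = np.zeros((N * nx, N * nx))
for t in range(N):
    A[t*nx:(t+1)*nx, t*nx:(t+1)*nx] = rng.standard_normal((nx, nx))
    B[t*nx:(t+1)*nx, t*nu:(t+1)*nu] = rng.standard_normal((nx, nu))
    if t < T:
        Z[(t+1)*nx:(t+2)*nx, t*nx:(t+1)*nx] = np.eye(nx)

# random causal (block-lower-triangular) feedback gain K
K = np.zeros((N * nu, N * nx))
for t in range(N):
    for i in range(t + 1):
        K[t*nu:(t+1)*nu, i*nx:(i+1)*nx] = rng.standard_normal((nu, nx))

# system responses (16); I - Z(A + BK) is unit block-lower-triangular, hence invertible
Phi_x = np.linalg.inv(np.eye(N * nx) - Z @ (A + B @ K))
Phi_u = K @ Phi_x

# affine subspace constraint (17)
lhs = (np.eye(N * nx) - Z @ A) @ Phi_x - Z @ B @ Phi_u
assert np.allclose(lhs, np.eye(N * nx))
# the gain is recovered from the responses
assert np.allclose(Phi_u @ np.linalg.inv(Phi_x), K)
```

The check mirrors the proof idea: substituting $\mathbf{\Phi}_{u}=\mathbf{K}\mathbf{\Phi}_{x}$ into (17) collapses it to $(I-Z(\mathbf{A}+\mathbf{B}\mathbf{K}))\mathbf{\Phi}_{x}=I$, which is (16).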

Despite its outstanding performance in reducing conservatism compared to existing robust MPC methods [20], applying SLS MPC directly to solve the linear PSF comes with two challenges: (i) the uncertainty set $\mathcal{P}_{t}$ is defined through affine inequalities, and converting $\mathcal{P}_{t}$ into a vertex representation, which is amenable to existing robust MPC methods [30, 31, 32, 33], requires $2^{n_{x}}$ vertices; (ii) applying SLS MPC requires merging the constant $c_{t}$ into the lumped uncertainty $\xi_{t}$, which causes the bounds on $\boldsymbol{\xi}$ to be overly conservative. In the following subsections, we describe an extension of SLS MPC that addresses these two challenges.

IV-B Controller parameterization

For the uncertain linear dynamics

$x_{t+1}=A_{t}x_{t}+B_{t}u_{t}+c_{t}+\Delta_{t}(x_{t},u_{t})+w_{t}$ (18)

stated in (11), we define $\eta_{t}:=\Delta_{t}(x_{t},u_{t})+w_{t}$ as the lumped uncertainty. Instead of setting $\xi_{t}=c_{t}+\eta_{t}$ in (13), we decompose (18) into a set of nominal and error dynamics.

First, in addition to (14), we concatenate these variables over the horizon $T$,

$\boldsymbol{\eta}:=[\mathbf{0}^{\top}\ \eta_{0}^{\top}\ \cdots\ \eta_{T-1}^{\top}]^{\top},\quad \mathbf{w}:=[\mathbf{0}^{\top}\ w_{0}^{\top}\ \cdots\ w_{T-1}^{\top}]^{\top},\quad \mathbf{c}:=[\mathbf{0}^{\top}\ c_{0}^{\top}\ \cdots\ c_{T-1}^{\top}]^{\top},\quad \boldsymbol{\delta}_{x_{0}}:=[x_{0}^{\top}\ \mathbf{0}^{\top}\ \cdots\ \mathbf{0}^{\top}]^{\top}.$ (19)

We define the nominal and error states $\{\mathbf{h},\mathbf{x}_{e}\}$ and control inputs $\{\mathbf{v},\mathbf{u}_{e}\}$ as

$\mathbf{h}:=[h_{0}^{\top}\ \cdots\ h_{T}^{\top}]^{\top},\quad \mathbf{v}:=[v_{0}^{\top}\ \cdots\ v_{T}^{\top}]^{\top},\quad \mathbf{x}_{e}:=[x_{e,0}^{\top}\ \cdots\ x_{e,T}^{\top}]^{\top}=\mathbf{x}-\mathbf{h},\quad \mathbf{u}_{e}:=[u_{e,0}^{\top}\ \cdots\ u_{e,T}^{\top}]^{\top}=\mathbf{u}-\mathbf{v},$

with the nominal and error dynamics as

$\mathbf{h}=Z(\mathbf{A}\mathbf{h}+\mathbf{B}\mathbf{v})+\mathbf{c}+\boldsymbol{\delta}_{x_{0}},$ (20a)

$\mathbf{x}_{e}=Z(\mathbf{A}\mathbf{x}_{e}+\mathbf{B}\mathbf{u}_{e})+\boldsymbol{\eta}.$ (20b)

It is important to note that (20b) conforms with (13). An LTV state feedback controller $\mathbf{K}\in\mathcal{L}_{TV}^{T,n_{u}\times n_{x}}$ is then applied to control the error states. The overall controller for (18) is given by $\mathbf{u}=\mathbf{K}\mathbf{x}_{e}+\mathbf{v}=\mathbf{K}(\mathbf{x}-\mathbf{h})+\mathbf{v}$.

IV-C Lumped uncertainty over-approximation

For the lumped uncertainty $\boldsymbol{\eta}$, its dependence on $\mathbf{x}$, $\mathbf{u}$, and $\mathbf{w}$ complicates the design of the robust controller. As in SLS MPC, the approach is to over-approximate $\boldsymbol{\eta}$ by an independent, filtered virtual disturbance signal $\mathbf{\Psi}\widetilde{\mathbf{w}}$, where $\widetilde{\mathbf{w}}=[\mathbf{0}^{\top}\ \tilde{w}_{0}^{\top}\ \cdots\ \tilde{w}_{T-1}^{\top}]^{\top}$, $\lVert\tilde{w}_{t}\rVert_{\infty}\leq 1$. The matrix $\mathbf{\Psi}\in\mathcal{L}_{TV}^{T,n_{x}\times n_{x}}$ is a filter acting on the finite-horizon virtual disturbance signal $\widetilde{\mathbf{w}}$, with the diagonal blocks of $\mathbf{\Psi}$ structured as $\Psi^{0,0}=I$, $\Psi^{t,0}=\mathrm{diag}(\psi_{t-1})$ with $\psi_{t-1}\in\mathbb{R}^{n_{x}}$, $\psi_{t-1}>0$, $t=1,\dots,T$.

We define $\widetilde{\mathcal{W}}=\left\{\widetilde{\mathbf{w}}\mid\lVert\tilde{w}_{t}\rVert_{\infty}\leq 1,\ t\in[T-1]\right\}$ as the set of admissible virtual disturbances. Since $\tilde{w}_{t}$ are unit norm-bounded, we tune the filter $\mathbf{\Psi}$ to shape the reachable set of $\mathbf{\Psi}\widetilde{\mathbf{w}}$, defined as $\mathcal{R}(\mathbf{\Psi}\widetilde{\mathbf{w}}):=\{\boldsymbol{\zeta}\mid\boldsymbol{\zeta}=\mathbf{\Psi}\widetilde{\mathbf{w}},\ \widetilde{\mathbf{w}}\in\widetilde{\mathcal{W}}\}$, $\boldsymbol{\zeta}:=[\mathbf{0}^{\top}\ \zeta_{0}^{\top}\ \cdots\ \zeta_{T-1}^{\top}]^{\top}$.
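Since $\tilde{w}_{t}$ are unit $\ell_{\infty}$-bounded, the per-coordinate reach of the filtered signal is the row-wise $1$-norm of the filter matrix; this is the quantity being shaped when $\mathbf{\Psi}$ is tuned. A small numerical illustration with a generic lower-triangular filter (dimensions and entries made up, not the structured $\mathbf{\Psi}$ of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
Psi = np.tril(rng.standard_normal((n, n)))   # illustrative causal (lower-triangular) filter

# per-coordinate reach of {Psi w : ||w||_inf <= 1} is the row-wise 1-norm of Psi
reach = np.abs(Psi).sum(axis=1)

# every admissible w stays inside the box [-reach, reach] ...
for _ in range(200):
    w = rng.uniform(-1.0, 1.0, n)
    assert np.all(np.abs(Psi @ w) <= reach + 1e-12)

# ... and the bound is attained coordinate-wise at w = sign(Psi[i])
i = 3
assert np.isclose(Psi[i] @ np.sign(Psi[i]), reach[i])
```

Enlarging the rows of the filter enlarges this box, which is what allows the reachable set of the lumped uncertainty to be covered in Proposition 1.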

Our goal is to find sufficient conditions on the control parameters $\{\mathbf{K},\mathbf{h},\mathbf{v}\}$ and $\mathbf{\Psi}$ such that the reachable set of $\boldsymbol{\eta}$, denoted by $\mathcal{R}\left(\boldsymbol{\eta};\,\mathbf{K},\mathbf{h},\mathbf{v}\right):=\left\{\boldsymbol{\eta}\mid\eta_{t}=\Delta_{t}(x_{t},u_{t})+w_{t},\ \Delta_{t}\in\mathcal{P}_{t},\ t\in[T-1]\right\}$, is a subset of the reachable set of $\mathbf{\Psi}\widetilde{\mathbf{w}}$. The following proposition provides these sufficient conditions; the proof is given in Appendix -A.

Proposition 1

Let $e_{i}\in\mathbb{R}^{n_{x}}$ denote the $i$-th standard basis vector for $i=1,\dots,n_{x}$, and let $y_{t}=[h_{t}^{\top}\ v_{t}^{\top}]^{\top}$, $\Phi^{t,t-i}:=\begin{bmatrix}\Phi_{x}^{t,t-i\top}&\Phi_{u}^{t,t-i\top}\end{bmatrix}^{\top}$ for $t\in[T-1]$. The following constraints:

$\begin{bmatrix}I-Z\mathbf{A}&-Z\mathbf{B}\end{bmatrix}\begin{bmatrix}\mathbf{\Phi}_{x}\\ \mathbf{\Phi}_{u}\end{bmatrix}=\mathbf{\Psi},\quad \mathbf{\Phi}_{x},\mathbf{\Phi}_{u}\in\mathcal{L}_{TV}^{T},$ (21)

and

\sigma_{w}+e_{i}^{\top}\big([\overline{D}_{0}^{x}\;\overline{D}_{0}^{u}]y_{0}+\overline{d}_{0}\big)\leq\psi_{0,i}, (22a)
\sigma_{w}-e_{i}^{\top}\big([\underline{D}_{0}^{x}\;\underline{D}_{0}^{u}]y_{0}+\underline{d}_{0}\big)\leq\psi_{0,i}, (22b)
\sigma_{w}+e_{i}^{\top}\big([\overline{D}_{t}^{x}\;\overline{D}_{t}^{u}]y_{t}+\overline{d}_{t}\big)+\sum_{j=1}^{t}\big\lVert e_{i}^{\top}\big([\overline{D}_{t}^{x}\;\overline{D}_{t}^{u}]\Phi^{t,t-j}-\Psi^{t+1,t+1-j}\big)\big\rVert_{1}\leq\psi_{t,i}, (22c)
\sigma_{w}-e_{i}^{\top}\big([\underline{D}_{t}^{x}\;\underline{D}_{t}^{u}]y_{t}+\underline{d}_{t}\big)+\sum_{j=1}^{t}\big\lVert e_{i}^{\top}\big([\underline{D}_{t}^{x}\;\underline{D}_{t}^{u}]\Phi^{t,t-j}-\Psi^{t+1,t+1-j}\big)\big\rVert_{1}\leq\psi_{t,i}, (22d)
i\in[n_{x}],\quad t=1,\dots,T-1,

guarantee that (𝛈;𝐊,𝐡,𝐯)(𝚿𝐰~)\mathcal{R}(\boldsymbol{\eta};\mathbf{K},\mathbf{h},\mathbf{v})\subseteq\mathcal{R}(\mathbf{\Psi}\widetilde{\mathbf{w}}) holds.
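For the base case t = 0, conditions (22a)-(22b) simply require each ψ_{0,i} to dominate the worst-case magnitude of η_0 = Δ_0(x_0, u_0) + w_0. A small numerical sketch, where the elementwise bounds on Δ_0 are hypothetical numbers standing in for [D̄_0^x D̄_0^u]y_0 + d̄_0 and its lower counterpart (in the paper these come from the NN verifier):

```python
import numpy as np

rng = np.random.default_rng(1)
nx = 2
sigma_w = 0.05

# Hypothetical elementwise bounds on Delta_0(x_0, u_0) at the fixed
# nominal point (h_0, v_0) -- illustrative values only.
delta_lo = np.array([-0.3, -0.1])
delta_hi = np.array([0.2, 0.4])

# Smallest psi_0 satisfying (22a)-(22b): componentwise,
# psi_{0,i} >= sigma_w + max(upper_i, -lower_i).
psi0 = sigma_w + np.maximum(delta_hi, -delta_lo)

# Any realization eta_0 = Delta_0 + w_0 then admits an admissible
# virtual disturbance w~_0 = diag(psi_0)^{-1} eta_0, ||w~_0||_inf <= 1.
for _ in range(1000):
    delta = rng.uniform(delta_lo, delta_hi)
    w0 = rng.uniform(-sigma_w, sigma_w, nx)
    w_tilde0 = (delta + w0) / psi0
    assert np.linalg.norm(w_tilde0, np.inf) <= 1.0
```

The conditions for t ≥ 1 in (22c)-(22d) follow the same pattern, with additional ℓ1 terms accounting for the worst case over past virtual disturbances.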

IV-D Convex formulation of the linear PSF

With the constraints (21) and (22), for any realization of 𝜼\boldsymbol{\eta} there exists 𝐰~𝒲~\widetilde{\mathbf{w}}\in\widetilde{\mathcal{W}} such that 𝜼=𝚿𝐰~\boldsymbol{\eta}=\mathbf{\Psi}\widetilde{\mathbf{w}}. Therefore, we can represent 𝜼\boldsymbol{\eta} as 𝚿𝐰~\mathbf{\Psi}\widetilde{\mathbf{w}} and write the error dynamics (20b) as

𝐱e=Z(𝐀𝐱e+𝐁𝐮e)+𝚿𝐰~.\mathbf{x}_{e}=Z(\mathbf{A}\mathbf{x}_{e}+\mathbf{B}\mathbf{u}_{e})+\mathbf{\Psi}\widetilde{\mathbf{w}}. (23)

By [20, Corollary 1], constraint (21) parameterizes all system responses 𝐱e=𝚽x𝐰~,𝐮e=𝚽u𝐰~\mathbf{x}_{e}=\mathbf{\Phi}_{x}\widetilde{\mathbf{w}},\mathbf{u}_{e}=\mathbf{\Phi}_{u}\widetilde{\mathbf{w}} of system (23) under the controller 𝐮e=𝐊𝐱e\mathbf{u}_{e}=\mathbf{K}\mathbf{x}_{e}. The closed-loop states and control inputs of system (18) are then given by

𝐱=𝐡+𝚽x𝐰~,𝐮=𝐯+𝚽u𝐰~.\vspace{-0.1cm}\mathbf{x}=\mathbf{h}+\mathbf{\Phi}_{x}\widetilde{\mathbf{w}},\quad\mathbf{u}=\mathbf{v}+\mathbf{\Phi}_{u}\widetilde{\mathbf{w}}. (24)
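The parameterization (24) and constraint (21) can be checked numerically on a small instance. The sketch below uses an LTI model with a static feedback gain (an assumption for illustration; the synthesized gain is block lower triangular in general) and an identity filter, forming Φ_x = (I - Z(A + BK))^{-1}Ψ and Φ_u = KΦ_x:

```python
import numpy as np

rng = np.random.default_rng(2)
T, nx, nu = 3, 2, 1
n = (T + 1) * nx

# Illustrative LTI data and a stabilizing static gain (assumptions,
# not the paper's synthesized controller).
A = np.array([[1.0, 0.05], [0.1, 1.0]])
B = np.array([[0.0], [0.05]])
K0 = np.array([[-2.0, -1.0]])

def blkdiag(M, reps):
    return np.kron(np.eye(reps), M)

bA, bB, bK = blkdiag(A, T + 1), blkdiag(B, T + 1), blkdiag(K0, T + 1)

# Block-downshift operator Z: identity blocks on the first subdiagonal.
Z = np.zeros((n, n))
for t in range(T):
    Z[(t + 1) * nx:(t + 2) * nx, t * nx:(t + 1) * nx] = np.eye(nx)

Psi = np.eye(n)  # trivial filter for this sketch

# Closed-loop responses of x_e = Z(A x_e + B u_e) + Psi w~, u_e = K x_e.
Phi_x = np.linalg.solve(np.eye(n) - Z @ (bA + bB @ bK), Psi)
Phi_u = bK @ Phi_x

# They satisfy the affine constraint (21).
assert np.allclose((np.eye(n) - Z @ bA) @ Phi_x - Z @ bB @ Phi_u, Psi)

# And Phi_x w~ reproduces the simulated closed-loop error trajectory.
w = np.concatenate([np.zeros(nx), rng.uniform(-1.0, 1.0, n - nx)])
x_sim = np.zeros(n)
for t in range(T):
    xt = x_sim[t * nx:(t + 1) * nx]
    x_sim[(t + 1) * nx:(t + 2) * nx] = (A + B @ K0) @ xt + w[(t + 1) * nx:(t + 2) * nx]
assert np.allclose(Phi_x @ w, x_sim)
```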

To guarantee robust constraint satisfaction of the controller 𝐮=𝐊(𝐱𝐡)+𝐯\mathbf{u}=\mathbf{K}(\mathbf{x}-\mathbf{h})+\mathbf{v}, we tighten the constraints in the linear PSF (11). Consider the state constraint xt𝒳x_{t}\in\mathcal{X} as an example. The constraints are represented by a polyhedral set, 𝒳={xnxFxxbx}\mathcal{X}=\{x\in\mathbb{R}^{n_{x}}\mid F_{x}x\leq b_{x}\}, where Fxn𝒳×nxF_{x}\in\mathbb{R}^{n_{\mathcal{X}}\times n_{x}}, bx:=[b1x,,bn𝒳x]n𝒳b_{x}:=[b_{1}^{x},\dotsc,b_{n_{\mathcal{X}}}^{x}]^{\top}\in\mathbb{R}^{n_{\mathcal{X}}}, and {ajx}j=1n𝒳\{a_{j}^{x\top}\}_{j=1}^{n_{\mathcal{X}}} denote the rows of FxF_{x}. This implies that (ajx,bjx)(a_{j}^{x},b_{j}^{x}) denotes the jj-th set of linear constraint parameters in 𝒳\mathcal{X}. From (24), we have xt=ht+i=1tΦxt,tiw~i1x_{t}=h_{t}+\sum_{i=1}^{t}\Phi_{x}^{t,t-i}\tilde{w}_{i-1}. Then, the following constraints

ajxht+i=1tajxΦxt,ti1bjx,j=1,,n𝒳\vspace{-0.2cm}\small a_{j}^{x\top}h_{t}+\sum_{i=1}^{t}\left\lVert a_{j}^{x\top}\Phi_{x}^{t,t-i}\right\rVert_{1}\leq b_{j}^{x},\quad j=1,\dotsc,n_{\mathcal{X}} (25)

guarantee that all constraints in 𝒳\mathcal{X} are satisfied robustly. As discussed in Section II, the recursive feasibility of the linear PSF cannot be guaranteed without a robust forward invariant set. Therefore, in this work, we introduce soft constraints into (25),

ajxht+i=1tajxΦxt,ti1bjx+ϵx,jt,\displaystyle a_{j}^{x\top}h_{t}+\sum_{i=1}^{t}\left\lVert a_{j}^{x\top}\Phi_{x}^{t,t-i}\right\rVert_{1}\leq b_{j}^{x}+\epsilon_{x,j}^{t}, ϵx,jt0,t[T],\displaystyle\epsilon_{x,j}^{t}\geq 0,\,t\in[T], (26)
j=1,,n𝒳.\displaystyle j=1,\dotsc,n_{\mathcal{X}}.

In the cost function, a large penalty on ϵxt1\lVert\epsilon_{x}^{t}\rVert_{1} is applied. In the case ϵx,jt=0\epsilon_{x,j}^{t}=0, we obtain a robust constraint satisfaction guarantee for the constraint with parameters (ajx,bjx)(a_{j}^{x},b_{j}^{x}). Similarly, the input constraints can be tightened as

akuvt+i=1takuΦut,ti1bku+ϵu,kt,\displaystyle a_{k}^{u\top}v_{t}+\sum_{i=1}^{t}\left\lVert a_{k}^{u\top}\Phi_{u}^{t,t-i}\right\rVert_{1}\leq b_{k}^{u}+\epsilon_{u,k}^{t}, ϵu,kt0,t[T],\displaystyle\epsilon_{u,k}^{t}\geq 0,\,t\in[T], (27)
k=1,,n𝒰,\displaystyle k=1,\dotsc,n_{\mathcal{U}},

where n𝒰n_{\mathcal{U}} denotes the number of linear inequalities defining 𝒰\mathcal{U} and (aku,bku)(a_{k}^{u},b_{k}^{u}) denotes the kk-th set of linear constraint parameters in 𝒰\mathcal{U}. The trust region constraints (xt,ut)(z^t,rt)(x_{t},u_{t})\in\mathcal{B}_{\infty}\left(\hat{z}_{t},r_{t}\right) can analogously be tightened as

ajht+i=1tajΦxt,ti1bj+σx,jt,σx,jt0,\displaystyle a_{j}^{\top}h_{t}+\sum_{i=1}^{t}\left\lVert a_{j}^{\top}\Phi_{x}^{t,t-i}\right\rVert_{1}\leq b_{j}+\sigma_{x,j}^{t},\,\sigma_{x,j}^{t}\geq 0, (28)
akvt+i=1takΦut,ti1bk+σu,kt,σu,kt0,\displaystyle a_{k}^{\top}v_{t}+\sum_{i=1}^{t}\left\lVert a_{k}^{\top}\Phi_{u}^{t,t-i}\right\rVert_{1}\leq b_{k}+\sigma_{u,k}^{t},\,\sigma_{u,k}^{t}\geq 0,
j=1,,2nx,k=1,,2nu,t[T1],\displaystyle j=1,\dotsc,2n_{x},\,k=1,\dotsc,2n_{u},\,t\in[T-1],

where (aj,bj)(a_{j},b_{j}) and (ak,bk)(a_{k},b_{k}) are defined similarly for the state and input-related constraint parameters in (z^t,rt)\mathcal{B}_{\infty}\left(\hat{z}_{t},r_{t}\right).
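The tightening terms in (25)-(28) all follow the same pattern: the worst case of a half-space evaluation over unit-norm-bounded virtual disturbances equals its nominal value plus a sum of ℓ1 norms. A numpy sketch with a hypothetical nominal trajectory and system response verifying both the bound and its tightness:

```python
import numpy as np

rng = np.random.default_rng(3)
T, nx = 4, 2
n = (T + 1) * nx

# Hypothetical nominal trajectory h and block-lower-triangular system
# response Phi_x (illustrative random values, not solver output).
h = rng.standard_normal(n)
Phi_x = np.tril(rng.standard_normal((n, n)))

a = np.array([1.0, 0.0])   # half-space a^T x_t <= b at time t
t = T
rows = slice(t * nx, (t + 1) * nx)

# Tightened left-hand side of (25): the nominal term plus the sum of
# l1 norms is the supremum of a^T x_t over ||w~_i||_inf <= 1 (the
# leading block of w~ is zero, hence columns from nx onward).
tight = a @ h[rows] + np.abs(a @ Phi_x[rows, nx:]).sum()

worst = -np.inf
for _ in range(2000):
    w = np.concatenate([np.zeros(nx), rng.uniform(-1.0, 1.0, n - nx)])
    val = a @ (h[rows] + Phi_x[rows] @ w)
    worst = max(worst, val)
    assert val <= tight + 1e-12   # never exceeded

# The bound is attained by the sign pattern of a^T Phi_x.
w_star = np.concatenate([np.zeros(nx), np.sign(a @ Phi_x[rows, nx:])])
assert np.isclose(a @ (h[rows] + Phi_x[rows] @ w_star), tight)
```

So requiring the tightened value to stay below b_j^x (plus a slack) is exactly robust satisfaction of the original half-space constraint.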

Summarizing the results above, we propose a convex tightening of the linear PSF, which can be written as

minimize𝚽x,𝚽u,𝚿,𝐡,𝐯,{ϵxt,ϵut,σxt,σut}\displaystyle\underset{\begin{subarray}{c}\mathbf{\Phi}_{x},\mathbf{\Phi}_{u},\mathbf{\Psi},\mathbf{h},\mathbf{v},\\ \{\epsilon_{x}^{t},\epsilon_{u}^{t},\sigma_{x}^{t},\sigma_{u}^{t}\}\end{subarray}}{\textrm{minimize}} u0π(x(k))22+Mϵt=0T(ϵxt1+ϵut1)\displaystyle\quad\left\lVert u_{0}-\pi(x(k))\right\rVert_{2}^{2}+M_{\epsilon}\sum_{t=0}^{T}\left(\left\lVert\epsilon_{x}^{t}\right\rVert_{1}+\left\lVert\epsilon_{u}^{t}\right\rVert_{1}\right) (29)
+Mσt=0T1(σxt1+σut1)\displaystyle\quad+M_{\sigma}\sum_{t=0}^{T-1}\left(\left\lVert\sigma_{x}^{t}\right\rVert_{1}+\left\lVert\sigma_{u}^{t}\right\rVert_{1}\right)
subject to affine constraint (21),\displaystyle\quad\text{affine constraint~{}\eqref{eq:affine_scaled}},
over-approximation constraints (22),\displaystyle\quad\text{over-approximation constraints~{}\eqref{eq:over_approx_constr}},
soft state and input constraints (26) and (27),\displaystyle\quad\text{soft state and input constraints~{}\eqref{eq:soft_state} and~{}\eqref{eq:soft_input}},
soft trust region constraints (28),\displaystyle\quad\text{soft trust region constraints~{}\eqref{eq:soft_trust_region}},
x0=x(k),\displaystyle\quad x_{0}=x(k),

where Mϵ,Mσ>0M_{\epsilon},M_{\sigma}>0 are chosen to be large. When a polyhedral robust forward invariant set is used as the terminal constraint, it can be tightened similarly to (26). All the constraints in Problem (29) are linear, making (29) a quadratic program. With the use of soft constraints, (29) is always feasible. If the slack variables {ϵxt,ϵut,σxt,σut}\{\epsilon_{x}^{t},\epsilon_{u}^{t},\sigma_{x}^{t},\sigma_{u}^{t}\} in the solution are all zero, we obtain a certificate that the system is safe for the next TT steps under the state-feedback controller 𝐮=𝐊(𝐱𝐡)+𝐯\mathbf{u}=\mathbf{K}(\mathbf{x}-\mathbf{h})+\mathbf{v} with 𝐊=𝚽u𝚽x1\mathbf{K}=\mathbf{\Phi}_{u}\mathbf{\Phi}_{x}^{-1}.

IV-E Trust region update

Following the discussion on the trust regions in Section III, we describe a method to update the trust regions online. Starting with the reference trajectory given by the primary policy π(x)\pi(x), we propose to iteratively increase rtr_{t} and update the reference trajectory by applying the policy 𝐮=𝐊(𝐱𝐡)+𝐯\mathbf{u}=\mathbf{K}(\mathbf{x}-\mathbf{h})+\mathbf{v}, synthesized in Section IV. We then pick the reference trajectory that gives the smallest slack variables and apply the corresponding control inputs to the system. Our framework is summarized in Algorithm 1.

Algorithm 1 Robust Linear MPC-based PSF

Input: Current state x(k)x(k), horizon TT, number of iterations NN, initial reference trajectory z^0:T\hat{z}_{0:T}, reference control input π(x(k))\pi(x(k)), initial trust region radius r0>0r_{0}>0
Output: Filtered control input u0u_{0}^{*}, safety certificate safe_cert

1:At each time step kk,
2:Initialize: rtr0,t[T1],{𝐊,𝐡,𝐯}{0}r_{t}\leftarrow r_{0},\;t\in[T-1],\{\mathbf{K}^{*},\mathbf{h}^{*},\mathbf{v}^{*}\}\leftarrow\{\textbf{0}\}, ϵ\epsilon^{*}\leftarrow\infty.
3:for =1,,N\ell=1,\dots,N do
4:     Construct (z^t,rt)\mathcal{B}_{\infty}(\hat{z}_{t},r_{t}) and 𝒫t\mathcal{P}_{t} with (7) and (10)
5:     Solve (29) with x(k)x(k) to get {𝐊,𝐡,𝐯}\{\mathbf{K},\mathbf{h},\mathbf{v}\}
6:     Extract ϵmax:=\epsilon_{max}:= max. value of slack variables
7:     if ϵmax=0\epsilon_{max}=0 then
8:         Set {𝐊,𝐡,𝐯}{𝐊,𝐡,𝐯}\{\mathbf{K}^{*},\mathbf{h}^{*},\mathbf{v}^{*}\}\leftarrow\{\mathbf{K},\mathbf{h},\mathbf{v}\}
9:         Set safe_cert \leftarrow True
10:     else
11:         Update z^0:T\hat{z}_{0:T} with 𝐮=𝐊(𝐱𝐡)+𝐯\mathbf{u}=\mathbf{K}(\mathbf{x}-\mathbf{h})+\mathbf{v}, as described in (6)
12:         Update rtβrtr_{t}\leftarrow\beta r_{t} for t[T1]t\in[T-1] with β>1\beta>1.
13:         if ϵmaxϵ\epsilon_{max}\leq\epsilon^{*} then
14:              Set ϵϵmax\epsilon^{*}\leftarrow\epsilon_{max}
15:              Set {𝐊,𝐡,𝐯}{𝐊,𝐡,𝐯}\{\mathbf{K}^{*},\mathbf{h}^{*},\mathbf{v}^{*}\}\leftarrow\{\mathbf{K},\mathbf{h},\mathbf{v}\}
16:              Set safe_cert \leftarrow False
17: Set u0v0u_{0}^{*}\leftarrow v_{0}, extracted from {𝐊,𝐡,𝐯}\{\mathbf{K}^{*},\mathbf{h}^{*},\mathbf{v}^{*}\}
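The control flow of Algorithm 1 can be sketched in Python. Here solve_psf is a stub standing in for the QP (29), whose slack value is assumed to shrink as the trust region grows purely to exercise the loop, and the reference-trajectory update is omitted:

```python
def solve_psf(x, radii):
    """Stub for the robust QP (29): returns a placeholder controller
    and a maximum slack value.  The slack model below is an assumption
    made only to illustrate the loop, not the real solver's behavior."""
    ctrl = {"K": None, "h": None, "v": [0.0]}
    eps_max = max(0.0, 1.0 - radii[0])
    return ctrl, eps_max

def safety_filter(x, N=5, r0=0.4, beta=1.5):
    """Iteratively enlarge the trust region, keep the controller with
    the smallest slack, and certify safety once the slack reaches zero."""
    radii = [r0]
    ctrl_best, eps_best, safe_cert = None, float("inf"), False
    for _ in range(N):
        ctrl, eps_max = solve_psf(x, radii)
        if eps_max == 0.0:
            return ctrl, True              # zero slack: certified safe
        if eps_max <= eps_best:            # keep the smallest-slack controller
            ctrl_best, eps_best = ctrl, eps_max
        radii = [beta * r for r in radii]  # enlarge the trust region
    return ctrl_best, safe_cert

ctrl, safe_cert = safety_filter(x=[0.0, 0.0])
```

With the stub's slack model, the loop certifies safety once the trust region radius exceeds one; in the actual filter, the filtered input u_0* is then extracted from the stored controller.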

V Numerical Example

To verify the efficacy of the proposed solution, we test it on a NN proxy of the nonlinear pendulum system; our code is publicly available at https://github.com/ShaoruChen/NN-System-PSF. The pendulum dynamics are given by [34],

θ¨=3gsin(θ)2l+3τml2,\ddot{\theta}=\frac{3g\sin(\theta)}{2l}+\frac{3\tau}{ml^{2}}\,, (30)

where θ\theta is the angle between the pendulum and the vertical, mm and ll are the mass and length of the pendulum, gg is the gravitational acceleration, and τ\tau is the external torque acting on the pendulum. The state and control input are defined as x:=[θθ˙]2x:=[\theta\;\dot{\theta}]^{\top}\in\mathbb{R}^{2} and u:=τu:=\tau\in\mathbb{R}. To obtain the linear dynamics in (2), we linearize (30) about the origin and discretize with a sampling time of 0.05 s, which yields

xt+1=[1.00920.050150.3691.0092]xt+[0.001250.05015]ut.x_{t+1}=\begin{bmatrix}1.0092&0.05015\\ 0.369&1.0092\end{bmatrix}x_{t}+\begin{bmatrix}0.00125\\ 0.05015\end{bmatrix}u_{t}. (31)

Next, we train a NN f(x,u)f(x,u) to approximate the residual dynamics that are not captured by the linear dynamics in (31). We first collect data by simulating the nonlinear dynamics in (30) for a duration of 15 s. The NN is then trained through a backpropagation procedure [3], using the mean squared error between the predicted and true states as the loss function. The NN consists of 3 hidden layers with 64 neurons in each layer and uses the rectified linear unit (ReLU) as the activation function. Additive noise with maximum magnitude σw{0.05,0.1}\sigma_{w}\in\{0.05,0.1\} is injected into the states of the system. For the primary control policy, we consider an iterative linear quadratic regulator (iLQR) scheme [35] with the box-constrained heuristic [36]. This is implemented with the mpc.pytorch library [37]. The box-constrained heuristic allows the system to adhere to the control constraints, but does not account for state constraints.
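The residual-learning setup can be illustrated with a short numpy sketch: the training target for f(x, u) is the gap between one step of the nonlinear dynamics (30) and the linear model. The constants and the Euler discretization below are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

# Illustrative pendulum constants (assumptions for this sketch).
g, l, m, dt = 9.81, 1.0, 1.0, 0.05

def f_nonlinear(x, u):
    """One Euler step of the pendulum dynamics (30)."""
    theta, theta_dot = x
    theta_ddot = 3 * g * np.sin(theta) / (2 * l) + 3 * u / (m * l**2)
    return np.array([theta + dt * theta_dot, theta_dot + dt * theta_ddot])

# Euler discretization of the linearization about the origin
# (sin(theta) ~ theta), playing the role of (A, B) in the linear model.
A = np.array([[1.0, dt], [3 * g * dt / (2 * l), 1.0]])
B = np.array([0.0, 3 * dt / (m * l**2)])

def residual(x, u):
    """Training target for the NN f(x, u): the part of the dynamics
    that the linear model misses."""
    return f_nonlinear(x, u) - (A @ x + B * u)

# The residual vanishes at the origin and grows with |theta|.
r_small = np.linalg.norm(residual(np.array([0.01, 0.0]), 0.0))
r_large = np.linalg.norm(residual(np.array([1.0, 0.0]), 0.0))
assert r_small < r_large
```

In the paper this residual is fit by the 3-layer ReLU network via standard gradient-based training; the sketch only shows how the regression targets are formed.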

Four test cases are considered in our experiments. In each case, the system is required to track a pair of reference angles (θr,1,θr,2)\left(\theta_{r,1},\,\theta_{r,2}\right) sequentially, starting from an initial condition x0x_{0}, over a duration of 2 s. The initial conditions and reference angles are given in Table I.

TABLE I: Initial conditions and angles for the test scenarios.
Test Case x0x_{0} [deg; deg/s] θr,1\theta_{r,1} [deg] θr,2\theta_{r,2} [deg]
1 [57.3; -120.3] 120 -50
2 [-85.9; -85.9] -150 40
3 [-85.9; -114.6] -100 -180
4 [85.9; 57.3] 100 180

We simulate each of these test cases under four control schemes: (i) a nominal iLQR framework, (ii) a soft-constrained iLQR framework (SC-iLQR), (iii) safe-filtered iLQR, where we apply the proposed safety filter to the nominal iLQR scheme, and (iv) safe-filtered SC-iLQR, where the safety filter is applied to SC-iLQR. For SC-iLQR, soft state constraints are incorporated into the cost function of the forward pass of the iLQR algorithm. Specifically, the function ϕ(x)=max{0,x}\phi(x)=\text{max}\{0,x\} is applied to each of the constraints, which increases the cost proportionally whenever the constraints are violated. In safe-filtered iLQR and safe-filtered SC-iLQR, the initial reference trajectories z^0:T\hat{z}_{0:T} in Algorithm 1 are initialized by iLQR and SC-iLQR, respectively.

The state trajectories under these control schemes are plotted in Fig. 2 and the percentages of constraint violations are tabulated in Table II. These percentages are computed as the ratio between the number of points at which the states violate the constraints and the total number of points in the state trajectory. Since the nominal iLQR method does not account for state constraints, it results in the largest percentage of constraint violations. While soft-constrained iLQR reduces the level of constraint violation, there remain a number of instances where the constraints are violated, as depicted in Fig. 2. On the other hand, as shown in Table II, with the safety filter applied to iLQR and SC-iLQR, no constraint violations are observed in any of the test cases. To illustrate the safety certificate obtained, we plot the slack variables that characterize the trust region, state and input constraints, together with the safety certificate, for the third test case under a maximum noise level σw=0.05\sigma_{w}=0.05 in Fig. 3. The combination of Table II and Fig. 3 indicates that, under the formulation in (29), the PSF may not return a numerical safety certificate even when the state trajectories are safe. Nevertheless, the proposed PSF effectively encourages the system to behave safely by minimizing the conservative upper bounds on the constraint violation.

The statistics of the computational times of the control scheme, in this case the iLQR scheme, and the safety filter are depicted in Fig. 4. While introducing soft constraints increases the run times of the iLQR scheme, it significantly reduces the run times of the safety filter, as observed in the right panel of Fig. 4. With the soft constraints added to the iLQR scheme, the state trajectories stay close to satisfying the constraints even before the safety filter is activated. This allows the filter to find a solution that satisfies the constraints in fewer iterations, which reduces the computational time.

Refer to caption
Figure 2: Top panel: Plots of the state trajectories under the 4 test cases. The initial and final states are marked with green and magenta circles. The state constraints are plotted with black dashed lines. Bottom panel: Zoomed-in plots of the state trajectories, around the boundaries where constraint violation potentially occurs.
Refer to caption
Figure 3: Top two panels: Time histories of the slack variables for the trust region, state and input constraints. Bottom panel: Time histories of the safety certificate attained. The values of 1 and 0 denote True and False respectively.
TABLE II: Percentage of state constraint violations.
Method Maximum noise level, σw=0.05\sigma_{w}=0.05
Case 1 [%] Case 2 [%] Case 3 [%] Case 4 [%]
iLQR 12.20 14.63 14.63 21.95
SC-iLQR 7.32 4.88 12.20 12.20
Safe-filtered iLQR 0.0 0.0 0.0 0.0
Safe-filtered SC-iLQR 0.0 0.0 0.0 0.0
Maximum noise level, σw=0.1\sigma_{w}=0.1
Case 1 [%] Case 2 [%] Case 3 [%] Case 4 [%]
iLQR 14.63 14.63 17.07 26.83
SC-iLQR 4.87 2.44 7.32 9.75
Safe-filtered iLQR 0.0 0.0 0.0 0.0
Safe-filtered SC-iLQR 0.0 0.0 0.0 0.0
Refer to caption
Figure 4: Statistics of the run times of the controller and the safety filter, under the four test methods.

VI Conclusion

We propose a convex optimization-based predictive safety filter for uncertain NN dynamical systems subject to additive disturbances. By utilizing tools from NN verification and robust linear MPC, our method requires solving a soft-constrained convex program online whose complexity is independent of the NN size. With our framework, formal safety guarantees, together with a robust state-feedback controller, are attained when the slack variables in the solution are all zero.

-A Proof of Proposition 1

Showing that (𝛈;𝐊,𝐡,𝐯)(𝚿𝐰~)\mathcal{R}(\boldsymbol{\eta};\mathbf{K},\mathbf{h},\mathbf{v})\subseteq\mathcal{R}(\mathbf{\Psi}\widetilde{\mathbf{w}}) is equivalent to showing that for every possible realization of 𝜼\boldsymbol{\eta}, there exists 𝐰~𝒲~\widetilde{\mathbf{w}}^{*}\in\widetilde{\mathcal{W}} such that 𝜼=𝚿𝐰~\boldsymbol{\eta}=\mathbf{\Psi}\widetilde{\mathbf{w}}^{*}. Following this perspective, the sufficient conditions in Proposition 1 can be derived inductively.

Initial case

At t=0t=0, we have x0=h0,u0=v0x_{0}=h_{0},u_{0}=v_{0}, and

Δ0(x0,u0)\displaystyle\Delta_{0}(x_{0},u_{0}) [D¯0xD¯0u]y0+d¯0,\displaystyle\geq\begin{bmatrix}\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{0}^{x}&\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{0}^{u}\end{bmatrix}y_{0}+\underline{d}_{0}, (32)
Δ0(x0,u0)\displaystyle\Delta_{0}(x_{0},u_{0}) [D¯0xD¯0u]y0+d¯0,\displaystyle\leq\begin{bmatrix}\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{0}^{x}&\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{0}^{u}\end{bmatrix}y_{0}+\overline{d}_{0},

with y0=[h0v0]y_{0}=[h_{0}^{\top}\ v_{0}^{\top}]^{\top}. Note that η0=Δ0(x0,u0)+w0\eta_{0}=\Delta_{0}(x_{0},u_{0})+w_{0}. For η0=Ψ1,0w~0=diag(ψ0)w~0\eta_{0}=\Psi^{1,0}\tilde{w}_{0}=\textrm{diag}(\psi_{0})\tilde{w}_{0} to admit a solution w~0\tilde{w}_{0} satisfying w~01\lVert\tilde{w}_{0}\rVert_{\infty}\leq 1 for all possible realizations of the lumped uncertainty η0\eta_{0}, it is both sufficient and necessary to have

\lVert\textrm{diag}(\psi_{0})^{-1}\eta_{0}\rVert_{\infty}\leq 1\Leftrightarrow\left\{\begin{array}[]{c}e_{i}^{\top}\eta_{0}\leq\psi_{0,i},\ i\in[n_{x}],\\ -e_{i}^{\top}\eta_{0}\leq\psi_{0,i},\ i\in[n_{x}],\end{array}\right. (33)

hold robustly for η0\eta_{0}, where eie_{i} denotes the ii-th standard basis vector. Using the bounds in (32) and the fact that w0σw\lVert w_{0}\rVert_{\infty}\leq\sigma_{w}, constraints (22a) and (22b) guarantee the robust feasibility of (33) for all possible values of η0\eta_{0}. For any realization of η0\eta_{0}, we denote the corresponding solution by w~0\tilde{w}_{0}^{*}, so that η0=diag(ψ0)w~0\eta_{0}=\textrm{diag}(\psi_{0})\tilde{w}_{0}^{*}.

Induction step

For a given controller \{\mathbf{K},\mathbf{h},\mathbf{v}\}, consider an arbitrary realization of the lumped uncertainty \eta_{0:T}. At time t\geq 1, let \boldsymbol{\eta}_{0:t}:=[\mathbf{0}^{\top}\ \eta_{0}^{\top}\ \cdots\ \eta_{t-1}^{\top}]^{\top} denote the truncation of \boldsymbol{\eta} consisting of its first t+1 block components, and let \mathbf{\Psi}_{0:t}\in\mathcal{L}_{TV}^{t} denote the truncation of \mathbf{\Psi} up to its (t+1)-th block row and column. The truncated vectors \mathbf{x}_{e,0:t},\mathbf{u}_{e,0:t} and matrices \mathbf{A}_{0:t},\mathbf{B}_{0:t} are defined similarly. Assume there exists \widetilde{\mathbf{w}}^{*}_{0:t}=[\mathbf{0}^{\top}\ \tilde{w}_{0}^{*\top}\ \cdots\ \tilde{w}_{t-1}^{*\top}]^{\top} such that \boldsymbol{\eta}_{0:t}=\mathbf{\Psi}_{0:t}\widetilde{\mathbf{w}}^{*}_{0:t}. Next, we show that under constraint (22) there exists \tilde{w}^{*}_{t} with \lVert\tilde{w}^{*}_{t}\rVert_{\infty}\leq 1 such that

ηt\displaystyle\eta_{t} =i=1t+1Ψt+1,t+1iw~i1\displaystyle=\sum_{i=1}^{t+1}\Psi^{t+1,t+1-i}\tilde{w}_{i-1}^{*} (34)
=i=1tΨt+1,t+1iw~i1+diag(ψt)w~t\displaystyle=\sum_{i=1}^{t}\Psi^{t+1,t+1-i}\tilde{w}_{i-1}^{*}+\textrm{diag}(\psi_{t})\tilde{w}_{t}^{*}

holds.

Since 𝜼0:t=𝚿0:t𝐰~0:t\boldsymbol{\eta}_{0:t}=\mathbf{\Psi}_{0:t}\widetilde{\mathbf{w}}^{*}_{0:t}, the error dynamics (20b) up to time tt can be written as

\mathbf{x}_{e,0:t}=Z(\mathbf{A}_{0:t}\mathbf{x}_{e,0:t}+\mathbf{B}_{0:t}\mathbf{u}_{e,0:t})+\mathbf{\Psi}_{0:t}\widetilde{\mathbf{w}}^{*}_{0:t}. (35)

According to [20, Corollary 1], the affine constraint (21) parameterizes all closed-loop system responses 𝐱e,0:t=𝚽x,0:t𝐰~0:t\mathbf{x}_{e,0:t}=\mathbf{\Phi}_{x,0:t}\widetilde{\mathbf{w}}^{*}_{0:t}, 𝐮e,0:t=𝚽u,0:t𝐰~0:t\mathbf{u}_{e,0:t}=\mathbf{\Phi}_{u,0:t}\widetilde{\mathbf{w}}^{*}_{0:t} under the controller 𝐮e,0:t=𝐊0:t𝐱e,0:t\mathbf{u}_{e,0:t}=\mathbf{K}_{0:t}\mathbf{x}_{e,0:t} where 𝚽x,0:t,𝚽u,0:t,𝐊0:tTVt\mathbf{\Phi}_{x,0:t},\mathbf{\Phi}_{u,0:t},\mathbf{K}_{0:t}\in\mathcal{L}_{TV}^{t} are the corresponding truncation of 𝚽x,𝚽u,𝐊\mathbf{\Phi}_{x},\mathbf{\Phi}_{u},\mathbf{K}, respectively. Define yt:=[htvt]y_{t}:=[h_{t}^{\top}\,v_{t}^{\top}]^{\top} and Φt,ti:=[Φxt,tiΦut,ti]\Phi^{t,t-i}:=\begin{bmatrix}\Phi_{x}^{t,t-i\top}&\Phi_{u}^{t,t-i\top}\end{bmatrix}^{\top}. It follows from (35) and the fact 𝐱e,0:t=𝚽x,0:t𝐰~0:t\mathbf{x}_{e,0:t}=\mathbf{\Phi}_{x,0:t}\widetilde{\mathbf{w}}^{*}_{0:t}, 𝐮e,0:t=𝚽u,0:t𝐰~0:t\mathbf{u}_{e,0:t}=\mathbf{\Phi}_{u,0:t}\widetilde{\mathbf{w}}^{*}_{0:t} that the state xtx_{t} and control input utu_{t} at time tt under the policy {𝐊,𝐡,𝐯}\{\mathbf{K},\mathbf{h},\mathbf{v}\} are given by

\begin{bmatrix}x_{t}\\ u_{t}\end{bmatrix}=y_{t}+\sum_{i=1}^{t}\Phi^{t,t-i}\tilde{w}_{i-1}^{*}. (36)

Correspondingly, the uncertainty Δt(xt,ut)\Delta_{t}(x_{t},u_{t}) at time tt is bounded by

Δt(xt,ut)\displaystyle\Delta_{t}(x_{t},u_{t}) [D¯txD¯tu]zt+d¯t\displaystyle\geq\begin{bmatrix}\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{x}&\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{u}\end{bmatrix}z_{t}+\underline{d}_{t} (37)
=[D¯txD¯tu](yt+i=1tΦt,tiw~i1)+d¯t,\displaystyle=\begin{bmatrix}\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{x}&\mkern 3.0mu\underline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{u}\end{bmatrix}\Big{(}y_{t}+\sum_{i=1}^{t}\Phi^{t,t-i}\tilde{w}_{i-1}^{*}\Big{)}+\underline{d}_{t},
Δt(xt,ut)\displaystyle\Delta_{t}(x_{t},u_{t}) [D¯txD¯tu]zt+d¯t\displaystyle\leq\begin{bmatrix}\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{x}&\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{u}\end{bmatrix}z_{t}+\overline{d}_{t}
=[D¯txD¯tu](yt+i=1tΦt,tiw~i1)+d¯t,\displaystyle=\begin{bmatrix}\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{x}&\mkern 3.0mu\overline{\mkern-3.0muD\mkern-3.0mu}\mkern 3.0mu_{t}^{u}\end{bmatrix}\Big{(}y_{t}+\sum_{i=1}^{t}\Phi^{t,t-i}\tilde{w}_{i-1}^{*}\Big{)}+\overline{d}_{t},

which is achieved by plugging (36) into the definition of 𝒫t\mathcal{P}_{t}. Recall that ηt=Δt(xt,ut)+wt\eta_{t}=\Delta_{t}(x_{t},u_{t})+w_{t}. For

ηt=i=1tΨt+1,t+1iw~i1+diag(ψt)w~t\eta_{t}=\sum_{i=1}^{t}\Psi^{t+1,t+1-i}\tilde{w}_{i-1}^{*}+\textrm{diag}(\psi_{t})\tilde{w}_{t} (38)

to admit a solution w~t\tilde{w}_{t} such that w~t1\lVert\tilde{w}_{t}\rVert_{\infty}\leq 1, it is necessary and sufficient that

\displaystyle e_{i}^{\top}\Big(\eta_{t}-\sum_{j=1}^{t}\Psi^{t+1,t+1-j}\tilde{w}_{j-1}^{*}\Big)\leq\psi_{t,i},\quad i\in[n_{x}], (39)
\displaystyle-e_{i}^{\top}\Big(\eta_{t}-\sum_{j=1}^{t}\Psi^{t+1,t+1-j}\tilde{w}_{j-1}^{*}\Big)\leq\psi_{t,i},\quad i\in[n_{x}],

hold. Since \Delta_{t}(x_{t},u_{t}) is bounded by (37), \lVert\tilde{w}_{i}^{*}\rVert_{\infty}\leq 1 for i\in[t-1], and \lVert w_{t}\rVert_{\infty}\leq\sigma_{w}, constraints (22c) and (22d) are sufficient to guarantee that the inequalities (39) hold. We then denote the solution of (38) by \tilde{w}_{t}^{*} for the given realization of \eta_{t}. Repeating this process until t=T constructs the virtual disturbance \widetilde{\mathbf{w}}^{*} such that \boldsymbol{\eta}=\mathbf{\Psi}\widetilde{\mathbf{w}}^{*}. Since the realization of \boldsymbol{\eta} was chosen arbitrarily, this proves Proposition 1.
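The inductive construction above is effectively a block forward substitution: because 𝚿 is block lower triangular with invertible diagonal blocks diag(ψ_t), each η_t uniquely determines w̃*_t once w̃*_0, …, w̃*_{t-1} are fixed. A numpy sketch with illustrative values, where the off-diagonal blocks are scaled small and ψ_t is chosen large so that admissibility holds by construction:

```python
import numpy as np

rng = np.random.default_rng(4)
T, nx = 3, 2
n = (T + 1) * nx

# Filter with Psi^{0,0} = I, diagonal blocks diag(psi_t) >= 2, and small
# strictly-lower blocks (illustrative choices that keep w~ admissible).
Psi = 0.1 * np.tril(rng.uniform(-1.0, 1.0, (n, n)), k=-1)
Psi[:nx, :nx] = np.eye(nx)
psi = 2.0 + rng.uniform(0.0, 1.0, (T, nx))
for t in range(T):
    idx = slice((t + 1) * nx, (t + 2) * nx)
    Psi[idx, idx] = np.diag(psi[t])

# Forward substitution mirroring the induction: eta_t determines w~_t
# once w~_0, ..., w~_{t-1} are fixed, since diag(psi_t) is invertible.
eta = np.concatenate([np.zeros(nx), rng.uniform(-1.0, 1.0, n - nx)])
w_tilde = np.zeros(n)
for t in range(T):
    idx = slice((t + 1) * nx, (t + 2) * nx)
    past = Psi[idx, :(t + 1) * nx] @ w_tilde[:(t + 1) * nx]
    w_tilde[idx] = (eta[idx] - past) / psi[t]
    assert np.linalg.norm(w_tilde[idx], np.inf) <= 1.0  # admissible

assert np.allclose(Psi @ w_tilde, eta)  # eta = Psi w~* exactly
```

Constraints (22) play the role of the generous choices above: they guarantee that the recursively constructed w̃*_t always stays within the unit infinity-norm ball.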

References

  • [1] O. Ogunmolu, X. Gu, S. Jiang, and N. Gans, “Nonlinear systems identification using deep dynamic neural networks,” arXiv preprint arXiv:1610.01439, 2016.
  • [2] K. Y. Chee, T. Z. Jiahao, and M. A. Hsieh, “Knode-mpc: A knowledge-based data-driven predictive control framework for aerial robots,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2819–2826, 2022.
  • [3] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
  • [4] S. Bansal, A. K. Akametalu, F. J. Jiang, F. Laine, and C. J. Tomlin, “Learning quadrotor dynamics using neural network for flight control,” in 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4653–4660, IEEE, 2016.
  • [5] L. Bauersfeld, E. Kaufmann, P. Foehn, S. Sun, and D. Scaramuzza, “Neurobem: Hybrid aerodynamic quadrotor model,” arXiv preprint arXiv:2106.08015, 2021.
  • [6] M. O’Connell, G. Shi, X. Shi, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, “Neural-fly enables rapid learning for agile flight in strong winds,” Science Robotics, vol. 7, no. 66, p. eabm6597, 2022.
  • [7] Y. Yang, K. Caluwaerts, A. Iscen, T. Zhang, J. Tan, and V. Sindhwani, “Data efficient reinforcement learning for legged robots,” in Conference on Robot Learning, pp. 1–10, PMLR, 2020.
  • [8] G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou, “Information theoretic MPC for model-based reinforcement learning,” in 2017 IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 1714–1721, IEEE, 2017.
  • [9] A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, “Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning,” in 2018 IEEE international conference on robotics and automation (ICRA), pp. 7559–7566, IEEE, 2018.
  • [10] K. L. Hobbs, M. L. Mote, M. C. Abate, S. D. Coogan, and E. M. Feron, “Runtime assurance for safety-critical systems: An introduction to safety filtering approaches for complex control systems,” IEEE Control Systems Magazine, vol. 43, no. 2, pp. 28–65, 2023.
  • [11] Y. Emam, G. Notomista, P. Glotfelter, Z. Kira, and M. Egerstedt, “Safe reinforcement learning using robust control barrier functions,” IEEE Robotics and Automation Letters, no. 99, pp. 1–8, 2022.
  • [12] B. Tearle, K. P. Wabersich, A. Carron, and M. N. Zeilinger, “A predictive safety filter for learning-based racing control,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7635–7642, 2021.
  • [13] M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, and U. Topcu, “Safe reinforcement learning via shielding,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018.
  • [14] K. P. Wabersich and M. N. Zeilinger, “A predictive safety filter for learning-based control of constrained nonlinear dynamical systems,” Automatica, vol. 129, p. 109597, 2021.
  • [15] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2016.
  • [16] H. Ma, B. Zhang, M. Tomizuka, and K. Sreenath, “Learning differentiable safety-critical control using control barrier functions for generalization to novel environments,” in 2022 European Control Conference (ECC), pp. 1301–1308, IEEE, 2022.
  • [17] A. K. Akametalu, J. F. Fisac, J. H. Gillula, S. Kaynama, M. N. Zeilinger, and C. J. Tomlin, “Reachability-based safe learning with gaussian processes,” in 53rd IEEE Conference on Decision and Control, pp. 1424–1431, IEEE, 2014.
  • [18] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, “A general safety framework for learning-based control in uncertain robotic systems,” IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2737–2752, 2018.
  • [19] K. Xu, Z. Shi, H. Zhang, Y. Wang, K.-W. Chang, M. Huang, B. Kailkhura, X. Lin, and C.-J. Hsieh, “Automatic perturbation analysis for scalable certified robustness and beyond,” Advances in Neural Information Processing Systems, vol. 33, pp. 1129–1141, 2020.
  • [20] S. Chen, V. M. Preciado, M. Morari, and N. Matni, “Robust model predictive control with polytopic model uncertainty through system level synthesis,” arXiv preprint arXiv:2203.11375, 2022.
  • [21] L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig, “Safe learning in robotics: From learning-based control to safe reinforcement learning,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 5, pp. 411–444, 2022.
  • [22] A. P. Leeman, J. Köhler, S. Bennani, and M. N. Zeilinger, “Predictive safety filter using system level synthesis,” in Proceedings of The 5th Annual Learning for Dynamics and Control Conference, pp. 1180–1192, PMLR, 2023.
  • [23] K. P. Wabersich and M. N. Zeilinger, “Predictive control barrier functions: Enhanced safety mechanisms for learning-based control,” IEEE Transactions on Automatic Control, 2022.
  • [24] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel, “Efficient neural network robustness certification with general activation functions,” Advances in neural information processing systems, vol. 31, 2018.
  • [25] A. P. Leeman, J. Köhler, A. Zanelli, S. Bennani, and M. N. Zeilinger, “Robust nonlinear optimal control via system level synthesis,” arXiv preprint arXiv:2301.04943, 2023.
  • [26] A. Saviolo, G. Li, and G. Loianno, “Physics-inspired temporal learning of quadrotor dynamics for accurate model predictive trajectory tracking,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10256–10263, 2022.
  • [27] N. A. Spielberg, M. Brown, and J. C. Gerdes, “Neural network model predictive motion control applied to automated driving with unknown friction,” IEEE Transactions on Control Systems Technology, vol. 30, no. 5, pp. 1934–1945, 2021.
  • [28] D. Q. Mayne, “Model predictive control: Recent developments and future promise,” Automatica, vol. 50, no. 12, pp. 2967–2986, 2014.
  • [29] J. Anderson, J. C. Doyle, S. H. Low, and N. Matni, “System level synthesis,” Annual Reviews in Control, vol. 47, pp. 364–393, 2019.
  • [30] W. Langson, I. Chryssochoos, S. Raković, and D. Q. Mayne, “Robust model predictive control using tubes,” Automatica, vol. 40, no. 1, pp. 125–133, 2004.
  • [31] J. Köhler, E. Andina, R. Soloperto, M. A. Müller, and F. Allgöwer, “Linear robust adaptive model predictive control: Computational complexity and conservatism,” in 2019 IEEE 58th Conference on Decision and Control (CDC), pp. 1383–1388, IEEE, 2019.
  • [32] J. Fleming, B. Kouvaritakis, and M. Cannon, “Robust tube mpc for linear systems with multiplicative uncertainty,” IEEE Transactions on Automatic Control, vol. 60, no. 4, pp. 1087–1092, 2014.
  • [33] M. Bujarbaruah, U. Rosolia, Y. R. Stürz, X. Zhang, and F. Borrelli, “Robust mpc for lpv systems via a novel optimization-based constraint tightening,” Automatica, vol. 143, p. 110459, 2022.
  • [34] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “Openai gym,” arXiv preprint arXiv:1606.01540, 2016.
  • [35] W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems.,” in ICINCO (1), pp. 222–229, Citeseer, 2004.
  • [36] Y. Tassa, N. Mansard, and E. Todorov, “Control-limited differential dynamic programming,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1168–1175, IEEE, 2014.
  • [37] B. Amos, I. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, “Differentiable mpc for end-to-end planning and control,” Advances in neural information processing systems, vol. 31, 2018.