
Provably Correct Training of Neural Network
Controllers Using Reachability Analysis

Xiaowu Sun [email protected]    Yasser Shoukry [email protected] Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697 USA
Abstract

In this paper, we consider the problem of training neural network (NN) controllers for nonlinear dynamical systems that are guaranteed to satisfy safety and liveness (e.g., reach-avoid) properties. Our approach combines model-based design methodologies for dynamical systems with data-driven approaches to achieve this target. We confine our attention to NNs with Rectified Linear Unit (ReLU) nonlinearity, which are known to represent Continuous Piece-Wise Affine (CPWA) functions. Given a mathematical model of the dynamical system, we compute a finite-state abstract model that captures the closed-loop behavior under all possible CPWA controllers. Using this finite-state abstract model, our framework identifies a family of CPWA functions guaranteed to satisfy the safety requirements. We augment the learning algorithm with a NN weight projection operator during training that forces the resulting NN to represent a CPWA function from this provably safe family. Moreover, the proposed framework uses the finite-state abstract model to identify candidate CPWA functions that may satisfy the liveness properties. Using such candidate CPWA functions, the proposed framework biases the NN training to achieve the liveness specification. We show the efficacy of the proposed framework both in simulation and on an actual robotic vehicle.

keywords:
Neural networks; Formal methods; Reachability analysis
This work was partially sponsored by the NSF awards #CNS-2002405 and #CNS-2013824.
This paper was not presented at any conference.


1 Introduction

The last decade has witnessed tremendous success in using machine learning (ML) in a multitude of safety-critical cyber-physical system domains, such as autonomous vehicles, drones, and smart cities. Indeed, end-to-end learning is attractive for the realization of feedback controllers for such complex cyber-physical systems, thanks to the appeal of designing control systems based on purely data-driven architectures. However, regardless of the explosion in the use of machine learning to design data-driven feedback controllers, providing formal safety and reliability guarantees for these ML-based controllers remains an open question. It is then unsurprising to see the recent focus in the literature on the problem of safe and trustworthy autonomy in general, and safe reinforcement learning in particular.

The literature on the safe design of ML-based controllers for dynamical and hybrid systems can be classified according to three broad approaches, namely (i) incorporating safety in the training of ML-based controllers, (ii) post-training verification of ML-based controllers, and (iii) online validation of safety and control intervention. Representative examples of the first approach include reward-shaping (Jiang et al., 2020; Saunders et al., 2018), Bayesian and robust regression (Berkenkamp et al., 2016; Liu et al., 2019a; Pauli et al., 2020), and policy optimization with constraints (Achiam et al., 2017; Turchetta et al., 2016; Wen et al., 2020). Unfortunately, these approaches do not provide provable guarantees on the safety of the trained controller.

To provide strong safety and reliability guarantees, several works in the literature focus on applying formal verification techniques (e.g., model checking) to verify the safety properties of pre-trained ML-based controllers. Representative examples of this approach are the use of SMT-like solvers (Dutta et al., 2018; Liu et al., 2019b; Sun et al., 2019) and hybrid-system verification (Fazlyab et al., 2019; Ivanov et al., 2019; Xiang et al., 2019). However, these techniques only assess the safety of a given ML-based controller rather than design or train a safe agent.

Due to the lack of safety guarantees on the resulting ML-based controllers, researchers have proposed several techniques to restrict the output of the ML-based controller to a set of safe control actions. Such a set of safe actions can be obtained through Hamilton-Jacobi analysis (Fisac et al., 2018; Hsu et al., 2021) and barrier certificates (Abate et al., 2021; Chen et al., 2021; Cheng et al., 2019; Ferlez et al., 2020; Li and Belta, 2019; Robey et al., 2020; Taylor et al., 2020; Wabersich and Zeilinger, 2018b; Wang et al., 2018; Xiao et al., 2020). Unfortunately, methods of this type suffer from being computationally expensive, are specific to certain controller structures, or employ training algorithms that require certain assumptions on the system model. Other techniques in this domain include synthesizing a safety layer or shield based on safe model predictive control, under the assumption that a terminal safe set exists (Bastani et al., 2021; Wabersich and Zeilinger, 2018a, 2021), and Lyapunov methods (Berkenkamp et al., 2017; Chow et al., 2018, 2019), which focus mainly on providing stability guarantees rather than general safety and liveness guarantees.

This paper proposes a principled framework combining model-based control and data-driven neural network training to achieve enhanced reliability and verifiability. Our framework uses ideas from reachability analysis to guide and bias the training of the neural network controller and provides strong reliability guarantees. Our starting point is the fact that Neural Networks (NNs) with Rectified Linear Unit (ReLU) nonlinearity represent Continuous Piece-Wise Affine (CPWA) functions. Given a nonlinear model of the system, we compute a finite-state abstract model capable of capturing the closed-loop behavior under all possible CPWA controllers. Such a finite-state abstract model can be computed using a direct extension of existing reachability tools. Next, our framework uses this abstract model to search for a family of safe CPWA controllers such that the system controlled by any controller from this family is guaranteed to satisfy the safety properties. During neural network training, we use a novel projection operator that projects the trained neural network weights to ensure that the resulting NN represents a CPWA function from the identified family of safe CPWA functions.

Unlike the safety property, satisfying the liveness property cannot be enforced by projecting the trained neural network weights. Therefore, to account for the liveness properties, our framework utilizes the abstract model further to refine the family of safe CPWA functions and find candidate CPWA functions that may satisfy the liveness properties. The framework then ranks these candidates and biases the NN training accordingly until a NN that satisfies the liveness property is obtained. In conclusion, the contributions of this paper can be summarized as follows:

  1. An abstraction-based framework that captures the behavior of all neural network controllers.

  2. A novel projected training algorithm to train provably safe NN controllers.

  3. A procedure to bias the NN training to satisfy the liveness properties.

2 Problem Formulation

Notation: The symbols $\mathbb{R}$ and $\mathbb{N}$ denote the set of real and natural numbers, respectively. The symbols $\land$ and $\Longrightarrow$ denote logical AND and logical IMPLIES, respectively. We denote by $\mathrm{Int}(\mathcal{S})$ the interior of a set $\mathcal{S}$ and by $\mathrm{Vert}(\mathcal{R})$ the set of all vertices of a polytope $\mathcal{R}$. We use $\Psi_{\text{CPWA}}:\mathcal{X}\rightarrow\mathbb{R}^{m}$ to denote a Continuous Piece-Wise Affine (CPWA) function of the form:

$$\Psi_{\text{CPWA}}(x)=K_{i}x+b_{i}\quad\text{if}\ x\in\mathcal{R}_{i},\quad i=1,\ldots,L, \tag{1}$$

where the collection of polytopic subsets $\{\mathcal{R}_{1},\ldots,\mathcal{R}_{L}\}$ is a partition of the set $\mathcal{X}$ such that $\bigcup_{i=1}^{L}\mathcal{R}_{i}=\mathcal{X}$ and $\mathrm{Int}(\mathcal{R}_{i})\cap\mathrm{Int}(\mathcal{R}_{j})=\emptyset$ if $i\neq j$. We call each polytopic subset $\mathcal{R}_{i}\subset\mathcal{X}$ a linear region, and denote by $\mathbb{L}_{\Psi_{\text{CPWA}}}$ the set of linear regions associated with $\Psi_{\text{CPWA}}$, i.e.:

$$\mathbb{L}_{\Psi_{\text{CPWA}}}=\{\mathcal{R}_{1},\ldots,\mathcal{R}_{L}\}. \tag{2}$$
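To make the notation concrete, the following minimal Python sketch evaluates a CPWA function of the form (1) over a partition of axis-aligned boxes (boxes are a special case of the polytopic linear regions; all names and numeric values are illustrative, not from the paper):

import numpy as np

# A CPWA function as in (1): each linear region R_i (here an axis-aligned
# box) carries an affine control law u = K_i x + b_i.
regions = [(np.array([0.0, 0.0]), np.array([1.0, 1.0])),   # R_1 = [0,1] x [0,1]
           (np.array([1.0, 0.0]), np.array([2.0, 1.0]))]   # R_2 = [1,2] x [0,1]
gains   = [np.array([[0.5, -0.2]]), np.array([[0.3, 0.1]])]  # K_1, K_2 (m=1, n=2)
biases  = [np.array([0.1]), np.array([0.3])]                 # b_1, b_2

def psi_cpwa(x):
    """Evaluate the CPWA function by locating the linear region containing x."""
    for (lo, hi), K, b in zip(regions, gains, biases):
        if np.all(lo <= x) and np.all(x <= hi):
            return K @ x + b
    raise ValueError("x is outside the partitioned set X")

print(psi_cpwa(np.array([0.5, 0.5])))  # lies in R_1, so uses (K_1, b_1)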

2.1 Dynamical Model, NN Controller, and Specification

Consider discrete-time nonlinear dynamical systems of the form:

$$x^{(t+1)}=f(x^{(t)},u^{(t)}), \tag{3}$$

where the state vector $x^{(t)}\in\mathcal{X}\subset\mathbb{R}^{n}$, the control vector $u^{(t)}\in\mathcal{U}\subset\mathbb{R}^{m}$, and $t\in\mathbb{N}$. Given a feedback control law $\Psi:\mathcal{X}\rightarrow\mathcal{U}$, we use $\xi_{x_{0},\Psi}:\mathbb{N}\rightarrow\mathcal{X}$ to denote the closed-loop trajectory of (3) that starts from the state $x_{0}\in\mathcal{X}$ and evolves under the control law $\Psi$.

In this paper, our primary focus is on controlling the nonlinear system (3) with a state-feedback neural network controller $\mathcal{NN}:\mathcal{X}\rightarrow\mathcal{U}$. An $F$-layer Rectified Linear Unit (ReLU) NN is specified by composing $F$ layer functions (or just layers). A layer $l$ with $\mathfrak{i}_{l}$ inputs and $\mathfrak{o}_{l}$ outputs is specified by a weight matrix $W^{(l)}\in\mathbb{R}^{\mathfrak{o}_{l}\times\mathfrak{i}_{l}}$ and a bias vector $b^{(l)}\in\mathbb{R}^{\mathfrak{o}_{l}}$ as follows:

$$L_{\theta^{(l)}}:z\mapsto\max\{W^{(l)}z+b^{(l)},0\}, \tag{4}$$

where the $\max$ function is taken element-wise, and $\theta^{(l)}\triangleq(W^{(l)},b^{(l)})$ for brevity. Thus, an $F$-layer ReLU NN is specified by $F$ layer functions $\{L_{\theta^{(l)}}:l=1,\dots,F\}$ whose input and output dimensions are composable: that is, they satisfy $\mathfrak{i}_{l}=\mathfrak{o}_{l-1}$, $l=2,\dots,F$. Specifically:

$$\mathcal{NN}_{\theta}(x)=(L_{\theta^{(F)}}\circ L_{\theta^{(F-1)}}\circ\dots\circ L_{\theta^{(1)}})(x), \tag{5}$$

where we index a ReLU NN function by a list of matrices $\theta\triangleq(\theta^{(1)},\dots,\theta^{(F)})$. Also, it is common to allow the final layer function to omit the $\max$ function altogether, and we will be explicit about this when it is the case.
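The layer composition in (4)-(5) corresponds to the following numpy sketch of a forward pass, using the common convention noted above that the output layer omits the $\max$ (the network shape and random weights are for illustration only):

import numpy as np

def relu_layer(theta, z):
    """One hidden layer L_theta as in (4): element-wise max{Wz + b, 0}."""
    W, b = theta
    return np.maximum(W @ z + b, 0.0)

def nn_forward(thetas, x):
    """F-layer ReLU NN as in (5); the final (output) layer is affine only."""
    z = x
    for theta in thetas[:-1]:
        z = relu_layer(theta, z)
    W_F, b_F = thetas[-1]
    return W_F @ z + b_F

# A tiny 2-16-1 network with random weights.
rng = np.random.default_rng(0)
thetas = [(rng.standard_normal((16, 2)), rng.standard_normal(16)),
          (rng.standard_normal((1, 16)), rng.standard_normal(1))]
print(nn_forward(thetas, np.array([0.5, -0.3])))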

As a typical control task, this paper considers a reach-avoid specification $\phi$, which combines a safety property $\phi_{\text{safety}}$ for avoiding a set of obstacles $\{\mathcal{O}_{1},\ldots,\mathcal{O}_{o}\}$ with $\mathcal{O}_{i}\subset\mathcal{X}$, and a liveness property $\phi_{\text{liveness}}$ for reaching a goal region $\mathcal{X}_{\text{goal}}\subset\mathcal{X}$ within a bounded time horizon $T$. We use $\xi_{x_{0},\Psi}\models\phi_{\text{safety}}$ and $\xi_{x_{0},\Psi}\models\phi_{\text{liveness}}$ to denote that a trajectory $\xi_{x_{0},\Psi}$ satisfies the safety and liveness specifications, respectively, i.e.:

$$\xi_{x_{0},\Psi}\models\phi_{\text{safety}}\Longleftrightarrow\forall t\in\mathbb{N},\ \forall i\in\{1,\ldots,o\}:\ \xi_{x_{0},\Psi}(t)\not\in\mathcal{O}_{i},$$
$$\xi_{x_{0},\Psi}\models\phi_{\text{liveness}}\Longleftrightarrow\exists t^{\prime}\in\{1,\ldots,T\}:\ \xi_{x_{0},\Psi}(t^{\prime})\in\mathcal{X}_{\text{goal}}.$$

Given a set of initial states $\mathcal{X}_{\text{init}}$, a control law $\Psi:\mathcal{X}\rightarrow\mathbb{R}^{m}$ satisfies the specification $\phi$ (denoted by $\Psi,\mathcal{X}_{\text{init}}\models\phi$) if all trajectories starting from the set $\mathcal{X}_{\text{init}}$ satisfy the specification, i.e., $\xi_{x,\Psi}\models\phi$ for all $x\in\mathcal{X}_{\text{init}}$.
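As a sanity check of these definitions, a finite trajectory can be tested against $\phi_{\text{safety}}$ and $\phi_{\text{liveness}}$ as in the sketch below (boxes stand in for general obstacle and goal sets; all names are illustrative):

import numpy as np

def in_box(x, lo, hi):
    return np.all(lo <= x) and np.all(x <= hi)

def satisfies_safety(traj, obstacles):
    """phi_safety: no state of the trajectory lies in any obstacle O_i."""
    return all(not in_box(x, lo, hi) for x in traj for (lo, hi) in obstacles)

def satisfies_liveness(traj, goal, T):
    """phi_liveness: some state at time 1..T lies in the goal region."""
    lo, hi = goal
    return any(in_box(x, lo, hi) for x in traj[1:T + 1])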

2.2 Main Problem

Given the dynamical model (3) and a reach-avoid specification $\phi=\phi_{\text{safety}}\land\phi_{\text{liveness}}$, we consider the problem of designing a NN controller with provable guarantees, as stated in the following problem.

Problem 1.

Given the nonlinear dynamical system (3) and a reach-avoid specification $\phi$, compute an assignment of the neural network weights $\theta$ and a set of initial states $\mathcal{X}_{\text{init}}\subseteq\mathcal{X}$ such that $\mathcal{NN}_{\theta},\mathcal{X}_{\text{init}}\models\phi_{\text{safety}}\land\phi_{\text{liveness}}$.

While it is desirable to find the largest possible set of safe initial states $\mathcal{X}_{\text{init}}$, our algorithm will instead focus on finding an $\epsilon$-suboptimal $\mathcal{X}_{\text{init}}$. For space considerations, the quantification of the sub-optimality in the computation of $\mathcal{X}_{\text{init}}$ is omitted.

3 Overview of the Proposed Formal Training Framework

Before describing our approach to solve Problem 1, we start by recalling the connection between ReLU neural networks and Continuous Piece-Wise Affine (CPWA) functions (Pascanu et al., 2013):

Proposition 2.

Every $\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}$ ReLU NN represents a continuous piece-wise affine function.

In this paper, we confine our attention to CPWA controllers (and hence neural network controllers) that are selected from a bounded polytopic set $\mathcal{P}^{K}\times\mathcal{P}^{b}\subset\mathbb{R}^{m\times n}\times\mathbb{R}^{m}$, i.e., we assume that $K_{i}\in\mathcal{P}^{K}$ and $b_{i}\in\mathcal{P}^{b}$.

Our solution to Problem 1 is to use the mathematical model of the dynamical system (3) to guide the training of neural networks. In particular, our approach is split into two components: one addresses the safety specification while the other addresses the liveness specification, as described in the next two subsections.

Figure 1: (a) An example state space partitioned into abstract states $q_{i}$, $i=1,2,3$, goal (the goal region), and obst (the obstacle). The controller parameter space is discretized into controller partitions $\mathcal{P}_{1}$ and $\mathcal{P}_{2}$. (b) Posterior graph constructed using reachability analysis of the dynamical model. (c) Assignment of one controller partition to each abstract state. Since both controller partitions $\mathcal{P}_{1}$ and $\mathcal{P}_{2}$ appear in the label from the state $q_{3}$ to the obstacle, $q_{3}$ is unsafe and hence is not considered in (c). (d) Train a local NN for each abstract state and enforce that the CPWA functions represented by the NNs are within the assigned controller partitions. (e) Combine local NNs into a single NN controller that is guaranteed to satisfy the given specification $\phi$.

3.1 Addressing the Safety Specification $\phi_{\text{safety}}$

Our approach to address the safety specification $\phi_{\text{safety}}$ is as follows:

  • Step 1: Capture the closed-loop behavior of the system under all CPWA controllers using a finite-state abstract model. Such an abstract model can be obtained by extending current results on reachability analysis for polytopic systems (Yordanov et al., 2012).

  • Step 2: Identify a family of CPWA controllers that lead to correct behavior on the abstract model.

  • Step 3: Enforce the training of the NN to pick from the identified family of the CPWA controllers.

Figure 1 conceptualizes our framework. To start with, we construct an abstract model by partitioning both the state space $\mathcal{X}$ and the set of all allowed CPWA functions $\mathcal{P}^{K}\times\mathcal{P}^{b}$. In Figure 1 (a), the state space $\mathcal{X}$ is partitioned into a set of abstract states $\mathbb{X}=\{q_{1},q_{2},q_{3},\text{obst},\text{goal}\}$ such that $\mathcal{X}=\bigcup_{q\in\mathbb{X}}q$. Similarly, the controller space $\mathcal{P}^{K}\times\mathcal{P}^{b}$ is partitioned into a set of controller partitions $\mathbb{P}=\{\mathcal{P}_{1},\mathcal{P}_{2}\}$ such that $\mathcal{P}^{K}\times\mathcal{P}^{b}=\bigcup_{\mathcal{P}\in\mathbb{P}}\mathcal{P}$ (each controller partition $\mathcal{P}\in\mathbb{P}$ represents a subset of CPWA functions). The resulting abstract model is a non-deterministic finite transition system whose nodes represent the abstract states in $\mathbb{X}$ and whose transitions are labeled by controller partitions in $\mathbb{P}$. Transitions between the abstract states are computed based on the reachable sets of the nonlinear system (3) from each abstract state $q\in\mathbb{X}$ and under every controller partition $\mathcal{P}\in\mathbb{P}$.

Based on the abstract model, we compute a function $P_{\text{safe}}$ that maps each abstract state $q\in\mathbb{X}$ to a subset of the controller partitions that are considered safe at the abstract state $q$, denoted by $P_{\text{safe}}(q)$. For example, in Figure 1 (b), since the transition from $q_{1}$ labeled by $\mathcal{P}_{2}$ leads to the obstacle, the controller partition $\mathcal{P}_{2}$ is unsafe at $q_{1}$, and hence $P_{\text{safe}}(q_{1})=\{\mathcal{P}_{1}\}$. Similarly, $P_{\text{safe}}(q_{2})=\{\mathcal{P}_{1},\mathcal{P}_{2}\}$. For the abstract state $q_{3}$, since both $\mathcal{P}_{1}$ and $\mathcal{P}_{2}$ can lead to the obstacle, $P_{\text{safe}}(q_{3})$ is empty, and hence $q_{3}$ is considered an unsafe abstract state. The set of initial states that can provide the safety guarantee is the union of the safe abstract states, i.e., $\mathcal{X}_{\text{init}}=q_{1}\cup q_{2}$.
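This reasoning can be reproduced directly on a transition map. Below is a minimal Python sketch; the dictionary next_map is a hypothetical encoding consistent with the description of Figure 1 above, not read from the actual figure:

# A hypothetical Next(q, P) map consistent with the example above.
next_map = {
    ('q1', 'P1'): {'q2'},    ('q1', 'P2'): {'obst'},
    ('q2', 'P1'): {'goal'},  ('q2', 'P2'): {'q1', 'goal'},
    ('q3', 'P1'): {'obst'},  ('q3', 'P2'): {'obst', 'q2'},
}

def p_safe(q, unsafe):
    """Partitions whose one-step successors avoid every unsafe state."""
    return {P for P in ('P1', 'P2') if not next_map[(q, P)] & unsafe}

unsafe = {'obst', 'q3'}  # q3 is unsafe: both partitions may hit the obstacle
for q in ('q1', 'q2'):
    print(q, p_safe(q, unsafe))  # q1 -> {'P1'}, q2 -> {'P1', 'P2'}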

Using the set of safe controllers captured by $P_{\text{safe}}(q)$, it is direct to show that if a neural network gives rise to CPWA functions in $P_{\text{safe}}(q)$ at every abstract state $q\in\mathbb{X}$, then the NN satisfies the safety specification. Therefore, to ensure the safety of the resulting trained neural network, we propose a NN weight "projection" operator that enforces that the trained NN only gives rise to the CPWA functions indicated by $P_{\text{safe}}(q)$.

3.2 Addressing the Liveness Specification $\phi_{\text{liveness}}$

Our approach to addressing the liveness specification $\phi_{\text{liveness}}$ (reaching the goal) can be summarized as follows:

  • Step 1: Use the abstract model to identify candidate controller partitions $\mathcal{P}^{\star}\in P_{\text{safe}}(q)$ that can lead to satisfaction of the liveness properties.

  • Step 2: Train one local neural network $\text{NN}_{q}$ for each of the abstract states using either imitation learning or reinforcement learning. We take the knowledge of $\mathcal{P}^{\star}$ into account as constraints during training, and use the NN weight projection operator to ensure that the resulting NN still satisfies the safety specification.

  • Step 3: Combine all the local neural networks $\text{NN}_{q}$ into a single global NN.

In the context of the example in Figure 1 (c), the controller partition $\mathcal{P}_{1}$ is assigned to both $q_{1}$ and $q_{2}$ as the candidate controller partition $\mathcal{P}^{\star}$ that may lead to the satisfaction of the liveness properties.

Next, we train one local neural network $\text{NN}_{q}$ for each abstract state, or for a subset of abstract states assigned the same controller partition, as shown in Figure 1 (d). During training of the local neural networks, we project their weights to enforce that the resulting NNs only give rise to CPWA functions belonging to the assigned controller partitions $\mathcal{P}^{\star}$. Since the controller partition $\mathcal{P}^{\star}$ is chosen from the subset $P_{\text{safe}}(q)$ at every $q\in\mathbb{X}$, the resulting NN enjoys the same safety guarantees.

Finally, in Figure 1 (e), we combine the local networks trained for each abstract state into a single NN controller, with layers of fixed weights to decide which part of the NN should be activated.

3.3 Formal Guarantees

We highlight that the proposed framework always guarantees that the resulting NN satisfies the safety specification $\phi_{\text{safety}}$, thanks to the NN weight projection operator. This is reflected in Theorem 10 in Section 5.2.

On the other hand, achieving the liveness specification $\phi_{\text{liveness}}$ depends on the neural network training effort and the quality of the training data, and hence needs an extra step of formally verifying the resulting neural networks and iteratively changing the candidate controller partitions $\mathcal{P}^{\star}$ if needed. However, we argue that the resulting NN architecture is modular and is composed of a set of local networks $\text{NN}_{q}$ that are more amenable to verification. The proposed architecture leads to a direct divide-and-conquer approach in which only some of the local networks $\text{NN}_{q}$ may need to be re-trained whenever the liveness properties are not met.

4 Provably Safe Training of NN Controllers

In this section, we provide details on how to construct the abstract model that captures the behavior of all CPWA controllers, how to choose the controller partitions that satisfy the safety specification, and how to train the corresponding NN controllers.

4.1 Step 1: Constructing the Abstract Model

In order to capture the behavior of the system (3) under all possible CPWA controllers $\Psi_{\text{CPWA}}$ in the form of (1), we construct a finite-state abstract model by discretizing both the state space and the set of all allowed CPWA functions. In particular, we partition the state space $\mathcal{X}\subset\mathbb{R}^{n}$ into a collection of abstract states, denoted by $\mathbb{X}=\{q_{1},\ldots,q_{N}\}$. The goal region $\mathcal{X}_{\text{goal}}\subset\mathcal{X}$ is represented by a single abstract state $q_{\text{goal}}\in\mathbb{X}$, and the set of obstacles $\mathbb{X}_{\text{obst}}=\bigcup_{i=1}^{o}\{q_{\text{obst}_{i}}\}$ represents each obstacle $\mathcal{O}_{i}\subset\mathcal{X}$ by an abstract state $q_{\text{obst}_{i}}\in\mathbb{X}$, $i=1,\ldots,o$. The other abstract states $q_{i}\in\mathbb{X}\setminus(\mathbb{X}_{\text{obst}}\cup\{q_{\text{goal}}\})$ represent infinity-norm balls in the state space $\mathcal{X}\subset\mathbb{R}^{n}$, and the partitioning of the state space satisfies $\mathcal{X}=\bigcup_{q\in\mathbb{X}}q$ and $\mathrm{Int}(q_{i})\cap\mathrm{Int}(q_{j})=\emptyset$ if $i\neq j$. In this paper, we consider the size of the abstract states to be pre-specified, noting that discretization techniques such as adaptive partitioning (e.g., Hsu et al. (2018)) can be directly applied in our framework. In the experiments, we show results with both uniform and non-uniform partitioning of the state space.

Similarly, we partition the controller space into polytopic subsets. For simplicity of notation, we define the set of parameters $\mathcal{P}^{K\times b}\subset\mathbb{R}^{m\times(n+1)}$ to be a polytope that combines $\mathcal{P}^{K}\subset\mathbb{R}^{m\times n}$ and $\mathcal{P}^{b}\subset\mathbb{R}^{m}$, and with some abuse of notation, we use $K_{i}(x)$ with a single parameter $K_{i}\in\mathcal{P}^{K\times b}$ to denote $K^{\prime}_{i}x+b^{\prime}_{i}$ with the pair $(K^{\prime}_{i},b^{\prime}_{i})=K_{i}$. The controller space $\mathcal{P}^{K\times b}\subset\mathbb{R}^{m\times(n+1)}$ is discretized into a collection of polytopic subsets in $\mathbb{R}^{m\times(n+1)}$, denoted by $\mathbb{P}=\{\mathcal{P}_{1},\ldots,\mathcal{P}_{M}\}$, such that $\mathcal{P}^{K\times b}=\bigcup_{\mathcal{P}\in\mathbb{P}}\mathcal{P}$ and $\mathrm{Int}(\mathcal{P}_{i})\cap\mathrm{Int}(\mathcal{P}_{j})=\emptyset$ if $i\neq j$. We call each of the subsets $\mathcal{P}_{i}\in\mathbb{P}$ a controller partition. Each controller partition $\mathcal{P}\in\mathbb{P}$ represents a subset of CPWA functions, obtained by restricting the parameters $K_{i}$ in a CPWA function to take values from $\mathcal{P}$.

In order to reason about the safety property $\phi_{\text{safety}}$, we introduce a posterior operator based on the dynamical model (3). The posterior of an abstract state $q\in\mathbb{X}$ under a controller partition $\mathcal{P}\in\mathbb{P}$ is the set of states that can be reached in one step from states $x\in q$ by using affine state feedback controllers with parameters $K\in\mathcal{P}$, i.e.:

$$\mathrm{Post}(q,\mathcal{P})\triangleq\{f(x,K(x))\in\mathbb{R}^{n}\;|\;x\in q,\ K\in\mathcal{P}\}. \tag{6}$$

Calculating the exact posterior of a nonlinear system is computationally daunting. Therefore, we rely on an over-approximation of the posterior set, denoted by $\overline{\mathrm{Post}}$.

Our abstract model is defined using the set of abstract states $\mathbb{X}$, the set of controller partitions $\mathbb{P}$, and the posterior operator $\overline{\mathrm{Post}}$. Intuitively, an abstract state $q\in\mathbb{X}$ has a transition to $q^{\prime}\in\mathbb{X}$ under a controller partition $\mathcal{P}\in\mathbb{P}$ if the intersection between $q^{\prime}$ and $\overline{\mathrm{Post}}(q,\mathcal{P})$ is non-empty.

Definition 3.

(Posterior Graph) A posterior graph is a finite transition system $S_{\mathrm{Post}}\triangleq(X,X_{0},L,\longrightarrow)$, where:

  • $X=\mathbb{X}$;

  • $X_{0}=\mathbb{X}$;

  • $L=2^{\mathbb{P}}$;

  • $q\overset{l}{\longrightarrow}q^{\prime}$, if $q\not\in\{q_{\text{goal}}\}\cup\mathbb{X}_{\text{obst}}$ and $l=\{\mathcal{P}\in\mathbb{P}\;|\;q^{\prime}\cap\overline{\mathrm{Post}}(q,\mathcal{P})\neq\emptyset\}\neq\emptyset$.

The computation of the posterior operator can be done by borrowing existing techniques from reachability analysis of polytopic systems (Yordanov et al., 2012), with the subtle difference that we need to consider polytopic partitions of the parameter space $\mathcal{P}^{K\times b}\subset\mathbb{R}^{m\times(n+1)}$ instead of the well-studied problem of considering polytopic partitions of the input space $\mathcal{U}\subset\mathbb{R}^{m}$. We refer to the posterior graph $S_{\mathrm{Post}}$ as the finite-state abstract model of (3).
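Given such an over-approximation, assembling the posterior graph of Definition 3 is a direct double loop. The following Python sketch assumes a callback post_over(q, P) returned by a reachability tool and a polytope intersection test intersects; both are stand-ins for the machinery of Yordanov et al. (2012), not part of the paper:

from itertools import product

def build_posterior_graph(states, partitions, post_over, intersects, terminal):
    """Posterior graph S_Post: edge q -> q' carries the label l collecting
    every partition P such that q' intersects the over-approximation of
    Post(q, P); terminal states (goal, obstacles) have no outgoing edges."""
    edges = {}  # (q, q') -> set of controller partitions (the label l)
    for q, P in product(states, partitions):
        if q in terminal:
            continue
        reach = post_over(q, P)      # over-approximation of Post(q, P)
        for q2 in states:
            if intersects(q2, reach):
                edges.setdefault((q, q2), set()).add(P)
    return edges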

4.2 Step 2: Computing the Function $P_{\text{safe}}$

Once the abstract model $S_{\mathrm{Post}}$ is computed, our framework identifies a set of safe controller partitions $P_{\text{safe}}(q)\subseteq\mathbb{P}$ at each abstract state $q\in\mathbb{X}$. It is possible that the set $P_{\text{safe}}(q)$ is empty at some state $q\in\mathbb{X}$, in which case the state $q$ is considered to be unsafe.

We first introduce the $\mathrm{Next}$ operator as follows:

$$\mathrm{Next}(q,\mathcal{P})\triangleq\{q^{\prime}\in\mathbb{X}\;|\;q^{\prime}\cap\overline{\mathrm{Post}}(q,\mathcal{P})\neq\emptyset\}. \tag{7}$$

We identify the set of unsafe abstract states recursively by backtracking from the set of obstacles $\mathbb{X}_{\text{obst}}$ in the posterior graph $S_{\mathrm{Post}}$:

$$\mathbb{X}_{\text{unsafe}}^{0}=\mathbb{X}_{\text{obst}},$$
$$\mathbb{X}_{\text{unsafe}}^{1}=\{q\in\mathbb{X}\;|\;\forall\mathcal{P}\in\mathbb{P}:\mathrm{Next}(q,\mathcal{P})\cap\mathbb{X}_{\text{unsafe}}^{0}\neq\emptyset\}\cup\mathbb{X}_{\text{unsafe}}^{0},$$
$$\vdots$$
$$\mathbb{X}_{\text{unsafe}}^{k}=\{q\in\mathbb{X}\;|\;\forall\mathcal{P}\in\mathbb{P}:\mathrm{Next}(q,\mathcal{P})\cap\mathbb{X}_{\text{unsafe}}^{k-1}\neq\emptyset\}\cup\mathbb{X}_{\text{unsafe}}^{k-1}.$$

The backtracking stops at iteration $k^{*}$ when no new unsafe states are found, i.e., $\mathbb{X}_{\text{unsafe}}^{k^{*}}=\mathbb{X}_{\text{unsafe}}^{k^{*}-1}$. This backtracking process ensures that the set $\mathbb{X}_{\text{unsafe}}^{k^{*}}$ includes all the abstract states that inevitably lead to the obstacles $\mathbb{X}_{\text{obst}}$ (either directly or over multiple transitions).

Then, we define the set of safe abstract states as $\mathbb{X}_{\text{safe}}\triangleq\mathbb{X}\setminus\mathbb{X}_{\text{unsafe}}^{k^{*}}$. The following proposition guarantees that there exists a controller partition assignment for each safe abstract state $q\in\mathbb{X}_{\text{safe}}$ such that collision avoidance can be guaranteed for all future times.

Proposition 4.

The set of safe abstract states $\mathbb{X}_{\text{safe}}$ is control positive invariant, i.e.:

$$\forall q\in\mathbb{X}_{\text{safe}},\;\exists\mathcal{P}\in\mathbb{P}\;\text{s.t.}\;\mathrm{Next}(q,\mathcal{P})\subseteq\mathbb{X}_{\text{safe}}. \tag{8}$$

Proof. Assume for the sake of contradiction that $\exists q\in\mathbb{X}_{\text{safe}}$ s.t. $\forall\mathcal{P}\in\mathbb{P}$, $\mathrm{Next}(q,\mathcal{P})\cap\mathbb{X}_{\text{unsafe}}^{k^{*}-1}\neq\emptyset$. Then, by the backtracking process, $q\in\mathbb{X}_{\text{unsafe}}^{k^{*}}$, which contradicts the assumption that $q\in\mathbb{X}_{\text{safe}}=\mathbb{X}\setminus\mathbb{X}_{\text{unsafe}}^{k^{*}}$.

Correspondingly, the function $P_{\text{safe}}$ maps each abstract state $q\in\mathbb{X}_{\text{safe}}$ to the subset of controller partitions that can be used to force the invariance of the set $\mathbb{X}_{\text{safe}}$:

$$P_{\text{safe}}:q\mapsto\{\mathcal{P}\in\mathbb{P}\ |\ \mathrm{Next}(q,\mathcal{P})\subseteq\mathbb{X}_{\text{safe}}\}. \tag{9}$$
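The backtracking above and the map (9) amount to a small fixpoint computation once the $\mathrm{Next}$ operator has been tabulated. A minimal Python sketch, assuming `states` is a set of abstract-state identifiers and `next_map[(q, P)]` stores $\mathrm{Next}(q,\mathcal{P})$ as a set (missing entries are treated as no transitions):

def compute_safe(states, partitions, next_map, obstacles):
    """Backtrack from the obstacles to the fixpoint X_unsafe^{k*},
    then read off X_safe and P_safe(q) via (9)."""
    unsafe = set(obstacles)
    while True:
        grown = {q for q in states - unsafe
                 if all(next_map.get((q, P), set()) & unsafe
                        for P in partitions)}
        if not grown:                      # fixpoint reached: k = k*
            break
        unsafe |= grown
    safe = states - unsafe
    # P_safe(q): partitions keeping every one-step successor in the safe set.
    p_safe = {q: {P for P in partitions
                  if next_map.get((q, P), set()) <= safe}
              for q in safe}
    return safe, p_safe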

The following theorem summarizes the resulting safety guarantee.

Theorem 5.

Let $\Psi_{q,\text{CPWA}}:q\rightarrow\mathcal{U}$ be a state feedback CPWA controller defined at an abstract state $q\in\mathbb{X}_{\text{safe}}$. Assume that the parameters of the affine functions $K_{i}$ in $\Psi_{q,\text{CPWA}}$ are chosen such that $K_{i}\in\mathcal{P}$ and $\mathcal{P}\in P_{\text{safe}}(q)$. Consider the set of initial states $\mathcal{X}_{\text{init}}=\bigcup_{q\in\mathbb{X}_{\text{safe}}}q$ and define a controller $\Psi:\mathcal{X}_{\text{init}}\rightarrow\mathcal{U}$ as:

$$\Psi(x)\triangleq\begin{cases}\Psi_{q_{1},\text{CPWA}}(x)\quad&\forall x\in q_{1}\\ \qquad\vdots\\ \Psi_{q_{N^{\prime}},\text{CPWA}}(x)&\forall x\in q_{N^{\prime}},\end{cases} \tag{10}$$

where $N^{\prime}=|\mathbb{X}_{\text{safe}}|$ is the number of safe abstract states. Then, it holds that $\Psi,\mathcal{X}_{\text{init}}\models\phi_{\text{safety}}$.

Proof. It is sufficient to show that the set $\mathcal{X}_{\text{init}}=\bigcup_{q\in\mathbb{X}_{\text{safe}}}q$ is positive invariant under the controller $\Psi$ in the form of (10). In other words, for any state $x\in\mathcal{X}_{\text{init}}$, let $q\in\mathbb{X}_{\text{safe}}$ be the abstract state that contains $x$, i.e., $x\in q$; then, by applying any controller $K\in\mathcal{P}$ with any $\mathcal{P}\in P_{\text{safe}}(q)$, it holds that $f(x,K(x))\in\mathcal{X}_{\text{init}}$.

The definitions of the posterior operator (6) and the $\mathrm{Next}$ operator (7) directly yield $f(x,K(x))\in\overline{\mathrm{Post}}(q,\mathcal{P})$ and $\overline{\mathrm{Post}}(q,\mathcal{P})\subseteq\bigcup_{q^{\prime}\in\mathrm{Next}(q,\mathcal{P})}q^{\prime}$, and hence:

$$f(x,K(x))\in\bigcup_{q^{\prime}\in\mathrm{Next}(q,\mathcal{P})}q^{\prime}, \tag{11}$$

where $x\in q$, $q\in\mathbb{X}$, $K\in\mathcal{P}$, and $\mathcal{P}\in\mathbb{P}$ are arbitrarily chosen. Now consider an arbitrary safe abstract state $q\in\mathbb{X}_{\text{safe}}$; by Proposition 4 and the definition of $P_{\text{safe}}$ in (9), the set $P_{\text{safe}}(q)\neq\emptyset$, and for any $\mathcal{P}\in P_{\text{safe}}(q)$, it holds that:

$$\bigcup_{q^{\prime}\in\mathrm{Next}(q,\mathcal{P})}q^{\prime}\subseteq\mathcal{X}_{\text{init}}, \tag{12}$$

which uses the definition $\mathcal{X}_{\text{init}}=\bigcup_{q\in\mathbb{X}_{\text{safe}}}q$. Therefore, by (11) and (12), $f(x,K(x))\in\mathcal{X}_{\text{init}}$ for any $x\in q\in\mathbb{X}_{\text{safe}}$ and any $K\in\mathcal{P}\in P_{\text{safe}}(q)$.

By Theorem 5, the system is guaranteed to satisfy the safety specification $\phi_{\text{safety}}$ by applying any CPWA controller $\Psi_{q,\text{CPWA}}$ at each abstract state $q\in\mathbb{X}_{\text{safe}}$, as long as the controller parameters $K_{i}$ are chosen from a safe controller partition $\mathcal{P}\in P_{\text{safe}}(q)$. This allows us to conclude the safety guarantee provided by a NN controller if the CPWA function represented by the NN is chosen from the safe controller partitions, as we show in detail in the next subsection.

4.3 Step 3: Safe Training Using NN Weight Projection

As described in Section 3, the final NN consists of several local networks $\text{NN}_{q}$ that will be trained to account for the liveness property. Nevertheless, to ensure that the safety property is met by each of the local networks $\text{NN}_{q}$, we propose a NN weight projection operator that can be incorporated in the training of these local NNs. Given a controller partition $\mathcal{P}^{\star}\in P_{\text{safe}}(q)$, this operator projects the weights of $\text{NN}_{q}$ to ensure that the network represents a CPWA function belonging to $\mathcal{P}^{\star}$.

Consider a local network $\text{NN}_{q}$ of $F$ layers, including $F-1$ hidden layers and an output layer. The NN weight projection operator updates the weight matrix $W^{(F)}$ and the bias vector $b^{(F)}$ of the output layer to $\widehat{W}^{(F)}$ and $\widehat{b}^{(F)}$, respectively, such that the CPWA function represented by the updated local network belongs to the selected controller partition $\mathcal{P}^{\star}\in P_{\text{safe}}(q)$ on the linear regions intersecting $q$. To be specific, let the CPWA functions represented by the local network $\text{NN}_{q}$ before and after the weight projection be $\mathcal{NN}_{\theta}$ and $\mathcal{NN}_{\widehat{\theta}}$, respectively. Then, we formulate the NN weight projection operator as an optimization problem:

$$\underset{\widehat{W}^{(F)},\widehat{b}^{(F)}}{\min}\;\underset{x\in q}{\max}\;\|\mathcal{NN}_{\widehat{\theta}}(x)-\mathcal{NN}_{\theta}(x)\|_{1} \tag{13}$$
$$\text{s.t.}\;K_{i}\in\mathcal{P}^{\star},\;\forall\mathcal{R}_{i}\in\{\mathcal{R}\in\mathbb{L}_{\mathcal{NN}_{\theta}}\;|\;\mathcal{R}\cap q\neq\emptyset\}, \tag{14}$$

where, following the notation in (2), $\mathbb{L}_{\mathcal{NN}_{\theta}}=\{\mathcal{R}_{1},\ldots,\mathcal{R}_{L}\}$ is the set of linear regions of the CPWA function $\mathcal{NN}_{\theta}$. Assume the CPWA function $\mathcal{NN}_{\widehat{\theta}}$ is written in the form of (1), where the affine functions parametrized by $K_{i}\in\mathbb{R}^{m\times(n+1)}$ are associated with the linear regions $\mathcal{R}_{i}$, $i=1,\ldots,L$. Then, the constraints (14) require that $K_{i}\in\mathcal{P}^{\star}$ whenever the corresponding linear region $\mathcal{R}_{i}$ intersects the abstract state $q$. As is common practice, we consider there to be no ReLU activation function in the output layer. Then, the projection operator does not change the set of linear regions, i.e., $\mathbb{L}_{\mathcal{NN}_{\widehat{\theta}}}=\mathbb{L}_{\mathcal{NN}_{\theta}}$, since the linear regions of a neural network only depend on the hidden layers, while the projection operator updates only the output layer weights.

The optimization problem (13)-(14) minimizes the change in the NN outputs due to the projection, measured by the largest 1-norm difference between the control signals $\mathcal{NN}_{\widehat{\theta}}$ and $\mathcal{NN}_{\theta}$ across the abstract state $q$, i.e., $\max_{x\in q}\|\mathcal{NN}_{\widehat{\theta}}(x)-\mathcal{NN}_{\theta}(x)\|_{1}$. The relation between this objective function and the change in the output layer weights is captured in the following proposition.

Proposition 6.

Consider an $F$-layer local network $\text{NN}_{q}$ whose output layer weights change from $W^{(F)}$, $b^{(F)}$ to $\widehat{W}^{(F)}$, $\widehat{b}^{(F)}$. Then, the largest change of the NN's outputs across the abstract state $q$ is bounded as follows:

$$\underset{x\in q}{\max}\;\|\mathcal{NN}_{\widehat{\theta}}(x)-\mathcal{NN}_{\theta}(x)\|_{1}\leq\underset{x\in\mathrm{Vert}(\mathbb{L}_{\mathcal{NN}_{\theta}\cap q})}{\max}\;\sum_{i=1}^{m}\sum_{j=1}^{\mathfrak{o}_{F-1}}|\Delta W_{ij}^{(F)}|h_{j}(x)+\sum_{i=1}^{m}|\Delta b_{i}^{(F)}| \tag{15}$$

where $\Delta W_{ij}^{(F)}$ and $\Delta b_{i}^{(F)}$ are the $(i,j)$-th and the $i$-th entry of $\Delta W^{(F)}=\widehat{W}^{(F)}-W^{(F)}$ and $\Delta b^{(F)}=\widehat{b}^{(F)}-b^{(F)}$, respectively.

In the above proposition, following the notation of layer functions (4), we use a single function $h:\mathbb{R}^{n}\rightarrow\mathbb{R}^{\mathfrak{o}_{F-1}}$ to represent all the hidden layers:

$$h(x)=(L_{\theta^{(F-1)}}\circ L_{\theta^{(F-2)}}\circ\dots\circ L_{\theta^{(1)}})(x), \tag{16}$$

where $\mathfrak{o}_{F-1}$ is the number of neurons in the $(F-1)$-th layer (the last hidden layer). We denote by $\mathbb{L}_{\mathcal{NN}_{\theta}\cap q}$ the intersections between the linear regions in $\mathbb{L}_{\mathcal{NN}_{\theta}}$ and the abstract state $q$, i.e.:

$$\mathbb{L}_{\mathcal{NN}_{\theta}\cap q}\triangleq\{\mathcal{R}\cap q\;|\;\mathcal{R}\in\mathbb{L}_{\mathcal{NN}_{\theta}},\ \mathcal{R}\cap q\neq\emptyset\}, \tag{17}$$

and denote by $\mathrm{Vert}(\mathbb{L}_{\mathcal{NN}_{\theta}\cap q})$ the set of all vertices of regions in $\mathbb{L}_{\mathcal{NN}_{\theta}\cap q}$, i.e., $\mathrm{Vert}(\mathbb{L}_{\mathcal{NN}_{\theta}\cap q})\triangleq\bigcup_{\mathcal{R}\in\mathbb{L}_{\mathcal{NN}_{\theta}\cap q}}\mathrm{Vert}(\mathcal{R})$.

Proof. The function $\mathcal{NN}_{\theta}$ can be written as $\mathcal{NN}_{\theta}(x)=W^{(F)}h(x)+b^{(F)}$, and after the change of output layer weights, $\mathcal{NN}_{\widehat{\theta}}(x)=\widehat{W}^{(F)}h(x)+\widehat{b}^{(F)}$. Then,

$$\underset{x\in q}{\max}\;\|\mathcal{NN}_{\widehat{\theta}}(x)-\mathcal{NN}_{\theta}(x)\|_{1} \tag{18}$$
$$=\underset{x\in q}{\max}\;\sum_{i=1}^{m}\Big|\sum_{j=1}^{\mathfrak{o}_{F-1}}\Delta W_{ij}^{(F)}h_{j}(x)+\Delta b_{i}^{(F)}\Big| \tag{19}$$
$$\leq\underset{x\in q}{\max}\;\sum_{i=1}^{m}\sum_{j=1}^{\mathfrak{o}_{F-1}}|\Delta W_{ij}^{(F)}|h_{j}(x)+\sum_{i=1}^{m}|\Delta b_{i}^{(F)}| \tag{20}$$
$$=\underset{x\in\mathrm{Vert}(\mathbb{L}_{\mathcal{NN}_{\theta}\cap q})}{\max}\sum_{i=1}^{m}\sum_{j=1}^{\mathfrak{o}_{F-1}}|\Delta W_{ij}^{(F)}|h_{j}(x)+\sum_{i=1}^{m}|\Delta b_{i}^{(F)}| \tag{21}$$

where (19) directly follows from the form of $\mathcal{NN}_{\theta}$ and $\mathcal{NN}_{\widehat{\theta}}$, and (20) takes the absolute value of each term before summation and uses the fact that the hidden layer outputs satisfy $h(x)\geq 0$. We notice that when $h$ is restricted to a single linear region of $\mathcal{NN}_{\theta}$, the objective in (20) is linear in $x$, so its maximum over each (polytopic) region is attained at a vertex. Therefore, in (21), the maximum can be taken over the finite set of all vertices of the regions in $\mathbb{L}_{\mathcal{NN}_{\theta}\cap q}$.
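The right-hand side of (15) is cheap to evaluate once the vertex set is available. A numpy sketch, assuming `hidden` computes $h(x)$ as in (16) and `vertices` enumerates $\mathrm{Vert}(\mathbb{L}_{\mathcal{NN}_{\theta}\cap q})$ (both supplied by the caller):

import numpy as np

def projection_bound(dW, db, vertices, hidden):
    """Upper bound (15): max over region vertices of
    sum_{i,j} |dW_ij| h_j(x) + sum_i |db_i|."""
    col = np.abs(dW).sum(axis=0)   # sum_i |dW_ij|, one entry per hidden unit j
    const = np.abs(db).sum()       # sum_i |db_i|
    return max(col @ hidden(x) + const for x in vertices)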

Proposition 6 suggests a direct way to design the intended projection operator: minimizing the right-hand side of the inequality (15) minimizes an upper bound on the change in the control signal due to the projection. To that end, given a controller partition $\mathcal{P}^{\star}\in P_{\text{safe}}(q)$, let $\Pi_{\mathcal{P}^{\star}}$ be the NN weight projection operator that updates the output layer weights of $\text{NN}_{q}$ as $\widehat{W}^{(F)},\widehat{b}^{(F)}=\Pi_{\mathcal{P}^{\star}}(\mathcal{NN}_{\theta})$. In order to minimize the change of NN outputs due to the projection, we minimize its upper bound in (15). Accordingly, we write the NN weight projection operator $\Pi_{\mathcal{P}^{\star}}$ as the following optimization problem:

$$\underset{\widehat{W}^{(F)},\widehat{b}^{(F)}}{\text{argmin}}\;\underset{x\in\mathrm{Vert}(\mathbb{L}_{\mathcal{NN}_{\theta}\cap q})}{\max}\ \sum_{i=1}^{m}\sum_{j=1}^{\mathfrak{o}_{F-1}}|\Delta W_{ij}^{(F)}|h_{j}(x)+\sum_{i=1}^{m}|\Delta b_{i}^{(F)}| \tag{22}$$
$$\text{s.t.}\;K_{i}\in\mathcal{P}^{\star},\;\forall\mathcal{R}_{i}\in\{\mathcal{R}\in\mathbb{L}_{\mathcal{NN}_{\theta}}\;|\;\mathcal{R}\cap q\neq\emptyset\}. \tag{23}$$

A natural question concerns the computational complexity of the proposed projection operator $\Pi_{\mathcal{P}^{\star}}$. This is addressed in the following proposition.

Proposition 7.

The NN weight projection operator $\Pi_{\mathcal{P}^{\star}}$, defined as the optimization problem (22)-(23), is a linear program.

Proof. Using the epigraph form of the problem (22)-(23), we can write it in the equivalent form:

$$\underset{\widehat{W}^{(F)},\widehat{b}^{(F)},t,s_{ij},v_{i}}{\min}\;t\qquad\text{such that}$$
$$\sum_{i=1}^{m}\sum_{j=1}^{\mathfrak{o}_{F-1}}s_{ij}h_{j}(x)+\sum_{i=1}^{m}v_{i}\leq t,\;\forall x\in\mathrm{Vert}(\mathbb{L}_{\mathcal{NN}_{\theta}\cap q}) \tag{24}$$
$$|\widehat{W}_{ij}^{(F)}-W_{ij}^{(F)}|\leq s_{ij},\;i=1,\ldots,m,\;j=1,\ldots,\mathfrak{o}_{F-1} \tag{25}$$
$$|\widehat{b}_{i}^{(F)}-b_{i}^{(F)}|\leq v_{i},\;i=1,\ldots,m \tag{26}$$
$$K_{i}\in\mathcal{P}^{\star},\;\forall\mathcal{R}_{i}\in\{\mathcal{R}\in\mathbb{L}_{\mathcal{NN}_{\theta}}\;|\;\mathcal{R}\cap q\neq\emptyset\}. \tag{27}$$

The inequality (24) is affine since the hidden layer function $h$ is known and does not depend on the optimization variables. The number of inequalities in (24) is finite since the set of vertices $\mathrm{Vert}(\mathbb{L}_{\mathcal{NN}_{\theta}\cap q})$ is finite. To see that the constraint (27) is affine, consider the CPWA function $\mathcal{NN}_{\widehat{\theta}}:x\mapsto\widehat{W}^{(F)}h(x)+\widehat{b}^{(F)}$, which is represented by the NN with the output layer weights $\widehat{W}^{(F)}$, $\widehat{b}^{(F)}$ and the hidden layer function $h$. Since $h$ restricted to each linear region $\mathcal{R}_{i}\in\mathbb{L}_{\mathcal{NN}_{\theta}}$ is a known affine function, the parameter $K_{i}$ (the coefficients of $\mathcal{NN}_{\widehat{\theta}}$ restricted to $\mathcal{R}_{i}$) depends affinely on $\widehat{W}^{(F)}$ and $\widehat{b}^{(F)}$.
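This LP can be prototyped with an off-the-shelf convex solver. Below is a sketch in cvxpy under the simplifying assumption that $\mathcal{P}^{\star}$ is a box $K_{\text{lo}}\leq K\leq K_{\text{hi}}$ in $\mathbb{R}^{m\times(n+1)}$; the inputs `verts_h` and `regions_affine` are assumed to be precomputed from the linear-region enumeration, and the absolute values play the role of the slacks $s_{ij}$, $v_{i}$ in (25)-(26):

import cvxpy as cp

def project_output_layer(W, b, verts_h, regions_affine, K_lo, K_hi):
    """Projection LP (24)-(27) for a box controller partition P*.
    W: (m, o) output weights; b: (m,) bias.
    verts_h: list of h(x) evaluated at the vertices Vert(L_{NN cap q}).
    regions_affine: list of (C_i, d_i) with h(x) = C_i x + d_i on region R_i.
    K_lo, K_hi: (m, n+1) elementwise bounds of the box P*."""
    m, o = W.shape
    W_hat, b_hat, t = cp.Variable((m, o)), cp.Variable(m), cp.Variable()
    cons = []
    for hx in verts_h:  # epigraph constraints (24), one per vertex
        cons.append(cp.sum(cp.abs(W_hat - W) @ hx)
                    + cp.sum(cp.abs(b_hat - b)) <= t)
    for C, d in regions_affine:  # constraint (27): induced K_i lies in P*
        # On R_i, NN(x) = (W_hat C) x + (W_hat d + b_hat), affine in W_hat, b_hat.
        K_i = cp.hstack([W_hat @ C, cp.reshape(W_hat @ d + b_hat, (m, 1))])
        cons += [K_lo <= K_i, K_i <= K_hi]
    cp.Problem(cp.Minimize(t), cons).solve()
    return W_hat.value, b_hat.value

Note that the products with $h(x)$ are sign-safe because ReLU hidden outputs are non-negative, which is exactly the fact used in step (20) of the proof of Proposition 6.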

Finally, the following result shows the safety guarantee provided by applying the projection operator $\Pi_{\mathcal{P}^{\star}}$ during training of the neural network controller.

Theorem 8.

Consider an abstract state $q\in\mathbb{X}_{\text{safe}}$ and an arbitrary controller partition $\mathcal{P}^{\star}\in P_{\text{safe}}(q)$. Let the weights of $\text{NN}_{q}$ be projected by the operator $\Pi_{\mathcal{P}^{\star}}$ (i.e., the optimization problem (22)-(23)); then $\text{NN}_{q}$ is guaranteed to be safe at $q$, i.e., $\text{NN}_{q},q\models\phi_{\text{safety}}$.

Proof. The proof follows directly from Theorem 5 and the constraints (23) in the NN weight projection operator.

4.4 Safety-biased Training to Reduce the Effect of Projection

As a practical matter, it is often desirable for the trained NNs to be safe or close to safe even before the weight projection, so that the projection operator $\Pi_{\mathcal{P}^{\star}}$ modifies the NN weights only slightly, if at all, without compromising performance. To reduce the effect of the projection, our algorithm alternates between training and projection, and uses the safe controller partitions $\mathcal{P}^{\star}\in P_{\text{safe}}(q)$ as constraints to bias the training.

We show the provably safe training of each local network $\text{NN}_{q}$ in Algorithm 1. Similar to a projected gradient algorithm, Algorithm 1 alternates between training (Line 3) and NN weight projection (Line 5) up to a pre-specified maximum number of iterations max_iter. The training procedure Constrained-Train (Line 3) can be any imitation learning or reinforcement learning algorithm, augmented with the extra constraints required by the safe controller partition $\mathcal{P}^{\star}\in P_{\text{safe}}(q)$. In the imitation learning setting, we use $\{(x,u)\}$ to denote the collection of training data generated by an expert, where each state $x\in\mathcal{X}$ is associated with an input label $u\in\mathbb{R}^{m}$. The procedure Constrained-Train requires that all the training data satisfy $u=K(x)$ with $K\in\mathcal{P}^{\star}$. Similarly, in the reinforcement learning setting, we assign high reward to the explored state-action pairs $(x,u)$ that satisfy $u=K(x)$ with $K\in\mathcal{P}^{\star}$.

To guarantee that the trained neural networks are safe, Algorithm 1 identifies the set of linear regions $\mathbb{L}_{\mathcal{NN}_{\theta}}$ (Line 4) and projects the neural network weights by solving the linear program (22)-(23) (Line 5). The solved weights $\widehat{W}^{(F)}$, $\widehat{b}^{(F)}$ then replace the currently trained output layer weights $W^{(F)}$, $b^{(F)}$ (Line 6), and the largest possible change of the NN's outputs is bounded by (15). Since the training procedure Constrained-Train takes the constraints required by $\mathcal{P}^{\star}$ into consideration, and the problem (22)-(23) minimizes the change in NN outputs due to the projection, we observe in our experiments that the projection operator usually changes the trained NN weights only slightly. Indeed, in the results section, we show that the trained NNs perform well even with just a single projection at the end (i.e., $\text{max\_iter}=1$).

Algorithm 1 SAFE-TRAIN($q,\mathcal{P}^{\star}$)
1: Initialize local network $\text{NN}_{q}$, $i=1$
2: for $i\leq\text{max\_iter}$ do
3:     $\text{NN}_{q}=\texttt{Constrained-Train}(\text{NN}_{q},\mathcal{P}^{\star})$
4:     $\mathbb{L}_{\mathcal{NN}_{\theta}}=\texttt{Identify-LR}(\text{NN}_{q})$
5:     $\widehat{W}^{(F)},\widehat{b}^{(F)}=\Pi_{\mathcal{P}^{\star}}(\mathcal{NN}_{\theta})$
6:     Set the output layer weights of $\text{NN}_{q}$ to $\widehat{W}^{(F)},\widehat{b}^{(F)}$
7: end for
8: Return $\text{NN}_{q}$
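Algorithm 1 maps to the following Python skeleton, where the injected callables constrained_train, identify_lr, and project stand for Constrained-Train, Identify-LR, and $\Pi_{\mathcal{P}^{\star}}$; these, and the set_output_layer method, are assumptions about the surrounding code base, not APIs defined in the paper:

def safe_train(nn_q, partition_star, constrained_train, identify_lr, project,
               max_iter=1):
    """Algorithm 1 (SAFE-TRAIN): alternate biased training and projection.
    After the final projection, nn_q is provably safe at q (Theorem 8)."""
    for _ in range(max_iter):
        nn_q = constrained_train(nn_q, partition_star)          # Line 3
        regions = identify_lr(nn_q)                             # Line 4
        W_hat, b_hat = project(nn_q, regions, partition_star)   # Line 5: LP (22)-(23)
        nn_q.set_output_layer(W_hat, b_hat)                     # Line 6
    return nn_q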

5 Extension to Liveness Property

5.1 Controller Partition Assignment

Among all the safe controller partitions in $P_{\text{safe}}(q)$, we assign one, $\mathcal{P}^{\star}\in P_{\text{safe}}(q)$, to each abstract state $q\in\mathbb{X}_{\text{safe}}$, taking into account the liveness specification $\phi_{\text{liveness}}$. The liveness property requires that the nonlinear system (3) reach the goal $\mathcal{X}_{\text{goal}}\subset\mathcal{X}$ using the trained NN controller. Unlike the safety property, which can be enforced using the projection operator $\Pi_{\mathcal{P}^{\star}}$, achieving the liveness property depends on practical issues, such as the amount of training data and the effort spent on training the NNs. To that end, we show that the safety guarantee provided by our algorithm does not impede training NNs to satisfy the liveness property using standard learning techniques.

Since the posterior graph $S_{\mathrm{Post}}$ over-approximates the behavior of the system, a transition from the abstract state $q$ to $q^{\prime}$ in $S_{\mathrm{Post}}$ does not guarantee that every state $x\in q$ can reach $q^{\prime}$ in one step. Therefore, we introduce a predecessor operator to capture the liveness property. The predecessor of an abstract state $q^{\prime}\in\mathbb{X}$ under a controller partition $\mathcal{P}\in\mathbb{P}$ is defined as the set of states that can reach $q^{\prime}$ in one step by using an affine state feedback controller with some parameter $K\in\mathcal{P}$:

$$\mathrm{Pre}(q^{\prime},\mathcal{P})\triangleq\{x\in\mathbb{R}^{n}\;|\;\exists K\in\mathcal{P}:\;f(x,K(x))\in q^{\prime}\}.$$

The predecessor operator can be computed using backward reachability analysis of polytopic systems (Yordanov et al., 2012). We then introduce the predecessor graph:

Definition 9.

(Predecessor Graph) A predecessor graph is a finite transition system $S_{\mathrm{Pre}}\triangleq(X,X_{0},L,\longrightarrow)$, where:

  • $X=\mathbb{X}_{\text{safe}}\cup\{q_{\text{goal}}\}$;

  • $X_{0}=\mathbb{X}_{\text{safe}}$;

  • $L=2^{\mathbb{P}}$;

  • $q\overset{l}{\longrightarrow}q^{\prime}$, if $q\neq q_{\text{goal}}$ and $l=\{\mathcal{P}\in P_{\text{safe}}(q)\ |\ q\cap\mathrm{Pre}(q^{\prime},\mathcal{P})\neq\emptyset\}\neq\emptyset$.

Notice that in the construction of $S_{\mathrm{Pre}}$, we restrict the transition labels to the safe controller partitions $\mathcal{P}\in P_{\text{safe}}(q)$ at each state $q\in\mathbb{X}_{\text{safe}}$. Let $\mathcal{T}_{\mathrm{Pre}}$ be the set of all trajectories over the predecessor graph $S_{\mathrm{Pre}}$. We use $\pi_{X}^{(t)}:\omega\mapsto q$ to denote the map from a trajectory $\omega\in\mathcal{T}_{\mathrm{Pre}}$ to the abstract state $q$ at time step $t$ in $\omega$, and $\pi_{L}^{(t)}:\omega\mapsto l$ to denote the map from $\omega\in\mathcal{T}_{\mathrm{Pre}}$ to the label $l\in 2^{\mathbb{P}}$ associated with the transition from $\pi_{X}^{(t)}(\omega)$ to $\pi_{X}^{(t+1)}(\omega)$. By extending the specification to the abstract state space, a trajectory $\omega$ in $S_{\mathrm{Pre}}$ satisfies the reach-avoid specification $\phi$, denoted by $\omega\models\phi$, if $\omega$ reaches the goal state $q_{\text{goal}}$ within $T$ steps. Similar to the notation introduced for the posterior graph, we define a $\mathrm{Next}$ operator on the predecessor graph:

$$\mathrm{Next}(q,\mathcal{P})\triangleq\{q^{\prime}\in\mathbb{X}_{\text{safe}}\cup\{q_{\text{goal}}\}\;|\;q\cap\mathrm{Pre}(q^{\prime},\mathcal{P})\neq\emptyset\}.$$

At each abstract state $q\in\mathbb{X}_{\text{safe}}$, our objective is to choose the candidate controller partition $\mathcal{P}^{\star}\in P_{\text{safe}}(q)$ that leads most of the states $x\in q$ to the goal. To that end, we restrict our attention to the trajectories in $S_{\mathrm{Pre}}$ that progress towards the goal. That is, let $|\omega|$ be the length of a trajectory $\omega\in\mathcal{T}_{\mathrm{Pre}}$, and let $\mathrm{Dist}:\mathbb{X}_{\text{safe}}\rightarrow\mathbb{N}$ map a state $q\in\mathbb{X}_{\text{safe}}$ to the length of the shortest trajectory from the state $q$ to the goal in the predecessor graph $S_{\mathrm{Pre}}$. Then, we use $\mathcal{T}^{\prime}_{\mathrm{Pre}}\subseteq\mathcal{T}_{\mathrm{Pre}}$ to denote the subset of trajectories that make progress towards the goal:

$$\mathcal{T}^{\prime}_{\mathrm{Pre}}\triangleq\{\omega\in\mathcal{T}_{\mathrm{Pre}}\;|\;\mathrm{Dist}(\pi_{X}^{(t)}(\omega))<\mathrm{Dist}(\pi_{X}^{(t-1)}(\omega)),\ t=1,\ldots,|\omega|-1\}. \tag{28}$$

Now we can define the subset of abstract states that progress towards the goal from abstract state $q$ under controller partition $\mathcal{P}$, denoted by $\mathcal{Q}_{q,\mathcal{P}}\subseteq\mathrm{Next}(q,\mathcal{P})$, as the set of abstract states along trajectories $\omega\in\mathcal{T}^{\prime}_{\mathrm{Pre}}$ that satisfy the given specification:

$$\mathcal{Q}_{q,\mathcal{P}}\triangleq\{q^{\prime}\in\mathrm{Next}(q,\mathcal{P})\;|\;\exists\omega\in\mathcal{T}^{\prime}_{\mathrm{Pre}}:\omega\models\phi,\ \pi_{X}^{(0)}(\omega)=q,\ \pi_{X}^{(1)}(\omega)=q^{\prime},\ \mathcal{P}\in\pi_{L}^{(0)}(\omega)\}. \tag{29}$$

Then, the intersection between the abstract state $q$ and the predecessors of the abstract states $q^{\prime}\in\mathcal{Q}_{q,\mathcal{P}}$ under the controller partition $\mathcal{P}$, i.e.:

$$\mathcal{I}_{q,\mathcal{P}}\triangleq\bigcup_{q^{\prime}\in\mathcal{Q}_{q,\mathcal{P}}}\left(q\cap\mathrm{Pre}(q^{\prime},\mathcal{P})\right), \tag{30}$$

measures the portion of states $x\in q$ that progress towards the goal by applying the input $u=K(x)$ with $K\in\mathcal{P}$ at the current step. Our algorithm then assigns to each abstract state $q\in\mathbb{X}_{\text{safe}}$ the controller partition $\mathcal{P}^{\star}\in P_{\text{safe}}(q)$ corresponding to the maximum volume of $\mathcal{I}_{q,\mathcal{P}}$, i.e., $\mathcal{P}^{\star}=\underset{\mathcal{P}\in P_{\text{safe}}(q)}{\text{argmax}}\ \mu(\mathcal{I}_{q,\mathcal{P}})$, where $\mu$ is the Lebesgue measure on $\mathbb{R}^{n}$. Indeed, as mentioned in Section 3.3, this procedure only ranks the available choices of controller partitions, and one may need to iterate over the remaining choices using the same heuristic.

5.2 Combined NN Controller

The final step of our framework is to combine all the local networks $\text{NN}_{q}$ into one global NN controller. Figure 2 (a) shows the overall structure of the global NN controller obtained by combining modules $[\text{NN}_{q}]_{\mathfrak{M}}$ corresponding to the local networks $\text{NN}_{q}$. As input to the NN controller, the state $x\in\mathcal{X}$ is fed into all the local networks, and the output of the NN controller is the summation of all the local network outputs. In the figures, we show a single output for simplicity, but the construction easily extends to multiple outputs (indeed, even in the figures, $u$ and $u_{q}$ can be thought of as vectors in $\mathbb{R}^{m}$, and the summation and product operations correspond to vector addition and scalar multiplication, respectively).

Each module $[\text{NN}_{q}]_{\mathfrak{M}}$ consists of two parts: a logic component and a ReLU NN. The logic component decides whether the current state $x$ is in the abstract state $q$ associated with $[\text{NN}_{q}]_{\mathfrak{M}}$, and outputs $1$ if the answer is affirmative and $0$ otherwise. The ReLU network $\text{NN}_{q}$ is trained for this abstract state $q$ using Algorithm 1. By multiplying the outputs of the logic component and the ReLU NN, the output of the module $[\text{NN}_{q}]_{\mathfrak{M}}$ is identical to the output of the ReLU NN if $x\in q$, and zero otherwise. Figure 2 (b) shows an example of the module $[\text{NN}_{q}]_{\mathfrak{M}}$ for an abstract state $q\subset\mathbb{R}^{2}$ given by $0\leq x_{1}\leq 2$ and $0\leq x_{2}\leq 2$.

The logic component in each module $[\text{NN}_{q}]_{\mathfrak{M}}$ can be implemented as a single-layer NN with fixed weights. Given the H-representation $Ax\leq c$ of the abstract state $q$, the weight matrix and the bias vector of the single-layer NN are $W^{(1)}=-A$ and $b^{(1)}=c$, respectively. Essentially, this choice of weights encodes each hyperplane inequality of the H-representation into one neuron of the single layer. To represent whether an inequality holds, we use a step function as the nonlinear activation function for the single layer: $\mathrm{Step}(x)=1$ if $x\geq 0$, and $0$ otherwise. The product of the outputs of all the neurons in the single layer is computed at the end (by the product operator $\Pi$), and hence the logic component returns $1$ if and only if all the hyperplane inequalities are satisfied.
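This construction can be mirrored in a few lines of numpy; the sketch below instantiates the logic component for the box abstract state of Figure 2 (b) and gates a local network with it (`nn_q` is any callable local network):

import numpy as np

def make_logic(A, c):
    """Single fixed-weight layer with step activation encoding Ax <= c."""
    W1, b1 = -A, c
    def logic(x):
        neurons = (W1 @ x + b1 >= 0).astype(float)  # Step, element-wise
        return np.prod(neurons)                     # 1 iff all inequalities hold
    return logic

# H-representation of q = [0,2] x [0,2] as in Figure 2 (b): Ax <= c.
A = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
c = np.array([0.0, 2.0, 0.0, 2.0])
logic_q = make_logic(A, c)

def module(x, nn_q):
    """[NN_q]_M: local network output gated by the logic component."""
    return logic_q(x) * nn_q(x)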

Figure 2: (a) The global NN controller consists of one module $[\text{NN}_{q_{i}}]_{\mathfrak{M}}$ for each abstract state $q_{i}\in\mathbb{X}_{\text{safe}}$. (b) An example of the module $[\text{NN}_{q}]_{\mathfrak{M}}$, where the abstract state $q\subset\mathbb{R}^{2}$ is given by $q=[0,2]\times[0,2]$.

We refer to the single-layer NN implementing the logic component as $\text{NN}_{q,\Pi}$. Together with the ReLU network $\text{NN}_{q}$, each module can be written as $[\text{NN}_{q}]_{\mathfrak{M}}=\text{NN}_{q}\,||\,\text{NN}_{q,\Pi}$, where $||$ denotes parallel composition. Using the same notation, we can write the global NN controller as:

$$\text{NN}=[\text{NN}_{q_{1}}]_{\mathfrak{M}}\;||\;\ldots\;||\;[\text{NN}_{q_{N^{\prime}}}]_{\mathfrak{M}},\qquad(31)$$

where $N^{\prime}=|\mathbb{X}_{\text{safe}}|$ is the number of safe abstract states. We summarize the guarantees on the global NN controller in Theorem 10, whose proof follows directly from the discussion above along with Theorem 5 and Theorem 8. Let $\mathcal{Q}_{q_{i},\mathcal{P}^{\star}_{i}}$ be defined as in (5.1), and let $\mathrm{Reach}(q,\text{NN}_{q})$ be the set of states that can be reached in one step from abstract state $q$ using the local network $\text{NN}_{q}$:

$$\mathrm{Reach}(q,\text{NN}_{q})\triangleq\{f(x,\text{NN}_{q}(x))\ |\ x\in q\}.\qquad(32)$$
Theorem 10.

Consider the nonlinear system (3) and the reach-avoid specification $\phi=\phi_{\text{safety}}\land\phi_{\text{liveness}}$. Let the controller partition assignment $\mathcal{P}^{\star}_{1},\ldots,\mathcal{P}^{\star}_{N^{\prime}}$ and the local neural networks $\text{NN}_{q_{1}},\ldots,\text{NN}_{q_{N^{\prime}}}$ satisfy (i) $\mathcal{P}^{\star}_{i}\in P_{\text{safe}}(q_{i})$, and (ii) the weights $\theta_{i}$ of $\text{NN}_{q_{i}}$ are projected onto $\mathcal{P}^{\star}_{i}$ using the projection operator (22)-(23). Then, the global neural network controller NN given by (31) is guaranteed to be safe, i.e., $\text{NN},\mathcal{X}_{\text{init}}\models\phi_{\text{safety}}$ with $\mathcal{X}_{\text{init}}=\bigcup_{q\in\mathbb{X}_{\text{safe}}}q$. Moreover, if all the local networks $\text{NN}_{q_{i}}$ satisfy $\mathrm{Reach}(q_{i},\text{NN}_{q_{i}})\subseteq\bigcup_{q^{\prime}\in\mathcal{Q}_{q_{i},\mathcal{P}^{\star}_{i}}}q^{\prime}$, then NN also satisfies the liveness property, i.e., $\text{NN},\mathcal{X}_{\text{init}}\models\phi_{\text{liveness}}$.

In words, Theorem 10 guarantees that the global NN composed from provably safe local networks is still safe (i.e., satisfies $\phi_{\text{safety}}$). This reflects the fact that the composition of the global network respects the linear regions on which the local networks are defined. Moreover, if the local networks satisfy the local reachability requirements, then the global NN satisfies the liveness property $\phi_{\text{liveness}}$, which reflects the fact that the set $\mathcal{Q}_{q,\mathcal{P}}$ in (5.1) is defined to guarantee progress towards the goal.

In practice, combining the local networks into a single NN controller allows one to repair the controller systematically when it fails to meet the liveness property. For example, if a local network $\text{NN}_{q}$ fails to satisfy the local reachability requirement $\mathrm{Reach}(q,\text{NN}_{q})\subseteq\bigcup_{q^{\prime}\in\mathcal{Q}_{q,\mathcal{P}^{\star}}}q^{\prime}$, then only $\text{NN}_{q}$ needs to be improved, e.g., by further training with augmented data collected at the state $q$, without affecting the local networks that already satisfy the specification at other abstract states.
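This repair procedure can be organized as a simple loop over the abstract states, sketched below under stated assumptions: `reach_states(q, net)` is a hypothetical helper returning the indices of abstract states intersected by an over-approximation of $\mathrm{Reach}(q,\text{NN}_{q})$, `targets[q]` holds the indices of the progress set $\mathcal{Q}_{q,\mathcal{P}^{\star}}$, and `retrain(q, net)` further trains a local network on augmented data from $q$.

```python
def repair_for_liveness(local_nets, targets, reach_states, retrain, max_rounds=5):
    # Only the local networks violating their local reachability requirement
    # are retrained; modules at other abstract states are left untouched, so
    # the safety guarantee of Theorem 10 is preserved throughout.
    for _ in range(max_rounds):
        failing = [q for q, net in local_nets.items()
                   if not set(reach_states(q, net)) <= set(targets[q])]
        if not failing:
            break  # every local network makes progress towards the goal
        for q in failing:
            local_nets[q] = retrain(q, local_nets[q])
    return local_nets
```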

6 Experimental Results

We evaluated the proposed framework both in simulation and on an actual robotic vehicle. All experiments were executed on an Intel Core i9 2.4-GHz processor with 32 GB of memory.

6.1 Controller Performance Comparison in Simulation

We first present simulation results of a wheeled robot under the control of NN controllers trained by our algorithm. Let the state vector of the system be $x=[\zeta_{x},\zeta_{y},\theta]^{\top}\in\mathcal{X}\subset\mathbb{R}^{3}$, where $\zeta_{x}$, $\zeta_{y}$ denote the coordinates of the robot and $\theta$ is the heading direction. The discrete-time dynamics of the robot are given by:

$$\begin{aligned}\zeta_{x}^{(t+\Delta t)}&=\zeta_{x}^{(t)}+\Delta t\,v\cos(\theta^{(t)})\\ \zeta_{y}^{(t+\Delta t)}&=\zeta_{y}^{(t)}+\Delta t\,v\sin(\theta^{(t)})\\ \theta^{(t+\Delta t)}&=\theta^{(t)}+\Delta t\,u^{(t)}\end{aligned}\qquad(33)$$

where the control input $u^{(t)}\in\mathbb{R}$ is determined by a neural network controller, i.e., $u^{(t)}=\text{NN}(x^{(t)})$ with $\text{NN}\in\mathcal{P}^{K\times b}\subset\mathbb{R}^{1\times 4}$, and the controller space $\mathcal{P}^{K\times b}$ is taken to be a hyperrectangle. We choose the discrete time step $\Delta t=0.1$.
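For reference, one step of the dynamics (33) can be simulated as follows; the linear speed `v` is an assumption here, since its value is not stated in this section.

```python
import numpy as np

DT = 0.1  # discrete time step

def robot_step(x, u, v=1.0):
    # x = [zeta_x, zeta_y, theta]; u is the scalar steering input from the NN.
    # v = 1.0 is an assumed constant linear speed (not specified above).
    zx, zy, th = x
    return np.array([zx + DT * v * np.cos(th),
                     zy + DT * v * np.sin(th),
                     th + DT * u])
```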

We considered two different workspaces, indexed by 1 and 2, shown in the upper and lower rows of Figure 3, respectively. As the first step of our algorithm, we discretized the state space $\mathcal{X}\subset\mathbb{R}^{3}$ and the controller space $\mathcal{P}^{K\times b}\subset\mathbb{R}^{1\times 4}$ as described in Section 4.1. To illustrate the flexibility in the choice of partition strategies, we partitioned the state space corresponding to workspace 1 uniformly into 552 abstract states, while partitioning the state space corresponding to workspace 2 non-uniformly into 904 abstract states. In both cases, the range of the heading direction $\theta\in[0,2\pi)$ is uniformly partitioned into 8 intervals, and the partitions of the $x$, $y$ dimensions are shown as the dashed lines in Figure 3. We uniformly partitioned $\mathcal{P}^{K\times b}$ into 160 hyperrectangles. By computing the reachable sets using the reachability tool TIRA (Meyer et al., 2019), we constructed the posterior graph $S_{\mathrm{Post}}$, which is then used to find the set of safe abstract states $\mathbb{X}_{\text{safe}}$ and the function $P_{\text{safe}}$. The execution times for computing the posterior graph and identifying the set of safe states are reported in Table 1.
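A uniform grid discretization of a hyperrectangular space, of the kind used for workspace 1, can be sketched as follows; the exact partition parameters are those of Section 4.1, and this snippet is only illustrative.

```python
import itertools
import numpy as np

def uniform_partition(lows, highs, counts):
    # Grid a hyperrectangle into axis-aligned cells (abstract states),
    # returned as (lower_corner, upper_corner) pairs.
    edges = [np.linspace(lo, hi, n + 1) for lo, hi, n in zip(lows, highs, counts)]
    return [
        (np.array([edges[d][i] for d, i in enumerate(idx)]),
         np.array([edges[d][i + 1] for d, i in enumerate(idx)]))
        for idx in itertools.product(*(range(n) for n in counts))
    ]

# e.g., the heading dimension alone, split into 8 uniform intervals:
theta_cells = uniform_partition([0.0], [2 * np.pi], [8])
```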

We used the nonlinear MPC solver FORCES (Domahidi and Jerez, 2019; Zanelli et al., 2017) as an expert to collect training data, and used Keras (Chollet et al., 2015) to train a shallow NN (one hidden layer) with 4 hidden neurons for each abstract state $q\in\mathbb{X}_{\text{safe}}$. At the end of training, we projected the trained NN weights only once, as mentioned in Section 4.3. For workspace 1, it took 367 seconds to collect all the training data and 463 seconds to train all the local NNs, including the projection of the NN weights. For workspace 2, collecting the data took 601 seconds, and training plus projection took 695 seconds.
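A minimal Keras sketch of one local network $\text{NN}_{q}$ is shown below: a single hidden layer with 4 ReLU neurons, trained by regression on (state, expert input) pairs produced by the MPC expert. The weight projection of Section 4.3 is indicated only by the hypothetical placeholder `project_weights`, since its implementation depends on the assigned controller partition.

```python
from tensorflow import keras

def build_local_network(state_dim=3):
    # Shallow ReLU network: one hidden layer with 4 neurons, scalar output u.
    model = keras.Sequential([
        keras.layers.Dense(4, activation="relu", input_shape=(state_dim,)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# nn_q = build_local_network()
# nn_q.fit(states_q, expert_inputs_q, epochs=200, verbose=0)
# project_weights(nn_q, partition_q)  # hypothetical: projection (22)-(23)
```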

In Figure 3, we show several trajectories in both workspaces under NN controllers trained by our algorithm. Although we chose initial states in the set $\mathcal{X}_{\text{init}}=\bigcup_{q\in\mathbb{X}_{\text{safe}}}q$ that are close to the obstacles or initially heading towards them, all the trajectories are collision-free, as guaranteed by our algorithm. Moreover, by assigning controller partitions based on the strategy in Section 5.1, all trajectories satisfy the liveness specification $\phi_{\text{liveness}}$.

Figure 3: Workspace 1 (upper row) and workspace 2 (lower row) are partitioned into abstract states (dashed lines), uniformly and non-uniformly, respectively. Trajectories starting from different initial states satisfy both the safety specification $\phi_{\text{safety}}$ (blue areas are obstacles) and the liveness specification $\phi_{\text{liveness}}$ for reaching the goal (green area).

Figure 4: The upper row shows trajectories resulting from NN controllers trained using standard imitation learning, with NN architectures of (left) 2 hidden layers with 10 neurons per layer, (middle) 2 hidden layers with 64 neurons per layer, and (right) 3 hidden layers with 128 neurons per layer. The lower row shows trajectories resulting from NN controllers trained using our algorithm. For the same initial states (the two sub-figures in the same column), only the NN controllers trained by our algorithm result in collision-free trajectories.

Next, we compare NN controllers trained by our algorithm with those trained by standard imitation learning, which minimizes the regression loss without accounting for the safety guarantee. All NN controllers are trained on the same training data. For the NNs trained by standard imitation learning, we varied the architectures to achieve better performance and trained for enough episodes to drive the loss sufficiently low. Nevertheless, as shown in Figure 4, for every NN controller trained by standard imitation learning we were able to find initial states from which the resulting trajectories are unsafe. With the same initial states, trajectories under NN controllers trained by our algorithm are collision-free, as the framework guarantees.

6.2 Actual Robotic Vehicle

We tested the proposed framework on a small robotic vehicle called PiCar, which carries a Raspberry Pi that runs the NN controller trained by our algorithm. We used the Vicon motion capture system to measure the PiCar's state in real time. Figure 5 (a) shows the PiCar and our experimental setup. We modeled the PiCar dynamics by the rear-wheel bicycle drive (Klančar et al., 2017) and accounted for model error by adding a bounded disturbance when computing the reachable sets. We trained the local networks $\text{NN}_{q}$ using Proximal Policy Optimization (PPO) (Schulman et al., 2017). Figure 5 (b) shows the PiCar's trajectory under the NN controller trained by our framework: the PiCar runs multiple loops on the race track and satisfies the safety property $\phi_{\text{safety}}$.

Figure 5: (a) The PiCar and its workspace. (b) The PiCar's trajectory (red) on the race track satisfies the safety property $\phi_{\text{safety}}$.

6.3 Scalability Study

1- Scalability with respect to Partition Granularity: We first show the scalability of our algorithm with respect to the choice of partition parameters. Using the two workspaces in Figure 3, we increased the number of abstract states and controller partitions by decreasing the partition grid size, and report the execution time of each part of our framework in Table 1. We observe that the execution time grows linearly with the number of abstract states and the number of controller partitions.

Although we conducted all the experiments on a single CPU core, we note that our algorithm can be highly parallelized. For example, computing reachable sets of the abstract states, checking intersection between the posteriors and the abstract states when constructing the posterior graph, and training local neural networks NNq\text{NN}_{q} can all be done in parallel. After training the NN controller, the execution time of the controller is almost instantaneous, which is a major advantage of NN controllers.
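Since the per-abstract-state computations are independent, they parallelize naturally. A schematic using Python's multiprocessing is shown below, with `compute_posterior` standing in for a call to the reachability tool on one (abstract state, controller partition) pair; it is a sketch, not the paper's implementation.

```python
from multiprocessing import Pool

def compute_posterior(job):
    # Placeholder for one independent unit of work, e.g., computing the
    # reachable set of abstract state q under controller partition P.
    q, P = job
    return (q, P)  # a real implementation would return the reachable set

def parallel_posteriors(abstract_states, partitions, workers=8):
    # Each (q, P) pair is independent, so the jobs map cleanly onto a pool.
    # (On platforms that spawn workers, call this under a __main__ guard.)
    jobs = [(q, P) for q in abstract_states for P in partitions]
    with Pool(workers) as pool:
        return pool.map(compute_posterior, jobs)
```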

Table 1: Scalability with respect to Partition Granularity
Workspace Index | Abstract States | Controller Partitions | Safe & Reachable Abstract States | Compute Reachable Sets [s] | Construct Posterior Graph [s] | Compute Function $P_{\text{safe}}$ [s] | Assign Controller Partitions [s]
1 | 552 | 160 | 400 | 52.6 | 82.3 | 0.06 | 0.7
1 | 552 | 320 | 400 | 107.5 | 160.3 | 0.1 | 0.9
1 | 552 | 640 | 400 | 223.1 | 329.6 | 0.2 | 1.7
1 | 1104 | 160 | 800 | 108.2 | 333.0 | 0.2 | 2.3
1 | 1104 | 320 | 800 | 219.6 | 684.2 | 0.4 | 2.7
1 | 1104 | 640 | 800 | 451.5 | 1297.4 | 0.6 | 4.2
2 | 904 | 160 | 632 | 88.1 | 159.1 | 0.1 | 1.0
2 | 904 | 320 | 632 | 203.6 | 313.2 | 0.2 | 1.1
2 | 904 | 640 | 632 | 393.2 | 660.8 | 0.3 | 1.7
2 | 1808 | 160 | 1264 | 202.1 | 634.6 | 0.3 | 3.4
2 | 1808 | 320 | 1264 | 388.6 | 1298.1 | 0.6 | 4.0
2 | 1808 | 640 | 1264 | 778.2 | 2564.4 | 0.9 | 5.9

2- Scalability with respect to System Dimension: Abstraction-based controller design is known to be computationally expensive for high-dimensional systems due to the curse of dimensionality. In Table 2, we show the scalability of our algorithm with respect to the system dimension. To conveniently increase the system dimension, we consider a chain of integrators represented as the linear system $x^{(t+1)}=Ax^{(t)}+Bu^{(t)}$, where $A\in\mathbb{R}^{n\times n}$ is the identity matrix and $u^{(t)}\in\mathbb{R}^{2}$. With a fixed number of controller partitions, Table 2 shows that the number of abstract states and the execution time grow exponentially with the system dimension $n$. Nevertheless, our algorithm handles this high-dimensional system in a reasonable amount of time.

Table 2: Scalability with respect to System Dimension
System Dimension $n$ | Abstract States | Compute Reachable Sets [s] | Construct Posterior Graph [s]
2 | 69 | 0.6 | 0.7
4 | 276 | 2.7 | 2.6
6 | 1104 | 11.7 | 34.2
8 | 4416 | 57.1 | 521.0
10 | 17664 | 258.1 | 9840.4
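As a side observation (ours, not a claim of the paper), the abstract-state counts in Table 2 are consistent with a growth factor of two per added dimension starting from the 69 states at $n=2$:

```python
# Illustrative check: 69 * 2**(n - 2) reproduces the counts in Table 2.
for n in (2, 4, 6, 8, 10):
    print(n, 69 * 2 ** (n - 2))  # -> 69, 276, 1104, 4416, 17664
```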

7 Conclusion

In this paper, we proposed a framework for training neural network controllers for nonlinear dynamical systems with rigorous safety and liveness guarantees. Our framework computes a finite-state abstract model that captures the behavior of all possible CPWA controllers and enforces correct behavior during training through a NN weight projection operator, which can be applied on top of any existing training algorithm. We demonstrated our framework on an actual robotic vehicle, showing the capability of our approach in real-world applications. Compared to existing techniques, our framework produces provably correct neural network controllers, removing the need for online monitoring, predictive filters, barrier functions, or post-training formal verification. Future research directions include extending the framework to multiple agents in unknown environments and using the proposed abstract model to generalize learning across tasks in a meta-learning setting.

References

  • Abate et al., (2021) Abate, A., Ahmed, D., Edwards, A., Giacobbe, M., and Peruffo, A. (2021). Fossil: A software tool for the formal synthesis of lyapunov functions and barrier certificates using neural networks. In Proceedings of the 24th ACM International Conference on Hybrid Systems: Computation and Control.
  • Achiam et al., (2017) Achiam, J., Held, D., Tamar, A., and Abbeel, P. (2017). Constrained policy optimization. In Proceedings of the 34th International Conference on Machine Learning, pages 22–31.
  • Bastani et al., (2021) Bastani, O., Li, S., and Xu, A. (2021). Safe reinforcement learning via statistical model predictive shielding. In Robotics: Science and Systems.
  • Berkenkamp et al., (2016) Berkenkamp, F., Krause, A., and Schoellig, A. P. (2016). Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics. arXiv preprint arXiv:1602.04450.
  • Berkenkamp et al., (2017) Berkenkamp, F., Turchetta, M., Schoellig, A., and Krause, A. (2017). Safe model-based reinforcement learning with stability guarantees. In Advances in neural information processing systems.
  • Chen et al., (2021) Chen, S., Fazlyab, M., Morari, M., Pappas, G. J., and Preciado, V. M. (2021). Learning lyapunov functions for hybrid systems. In Proceedings of the 24th ACM International Conference on Hybrid Systems: Computation and Control.
  • Cheng et al., (2019) Cheng, R., Orosz, G., Murray, R. M., and Burdick, J. W. (2019). End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3387–3395.
  • Chollet et al., (2015) Chollet, F. et al. (2015). Keras. https://keras.io.
  • Chow et al., (2018) Chow, Y., Nachum, O., Duenez-Guzman, E., and Ghavamzadeh, M. (2018). A lyapunov-based approach to safe reinforcement learning. In Advances in neural information processing systems, pages 8092–8101.
  • Chow et al., (2019) Chow, Y., Nachum, O., Faust, A., Duenez-Guzman, E., and Ghavamzadeh, M. (2019). Lyapunov-based safe policy optimization for continuous control. arXiv preprint arXiv:1901.10031.
  • Domahidi and Jerez, (2019) Domahidi, A. and Jerez, J. (2014–2019). FORCES Professional. Embotech AG. https://embotech.com/FORCES-Pro.
  • Dutta et al., (2018) Dutta, S., Jha, S., Sankaranarayanan, S., and Tiwari, A. (2018). Output range analysis for deep feedforward neural networks. In NASA Formal Methods Symposium. Springer.
  • Fazlyab et al., (2019) Fazlyab, M., Robey, A., Hassani, H., Morari, M., and Pappas, G. (2019). Efficient and accurate estimation of lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, pages 11423–11434.
  • Ferlez et al., (2020) Ferlez, J., Elnaggar, M., Shoukry, Y., and Fleming, C. (2020). Shieldnn: A provably safe nn filter for unsafe nn controllers. arXiv preprint arXiv:2006.09564.
  • Fisac et al., (2018) Fisac, J. F., Akametalu, A. K., Zeilinger, M. N., Kaynama, S., Gillula, J., and Tomlin, C. J. (2018). A general safety framework for learning-based control in uncertain robotic systems. IEEE Transactions on Automatic Control, 64(7):2737–2752.
  • Hsu et al., (2018) Hsu, K., Majumdar, R., Mallik, K., and Schmuck, A.-K. (2018). Multi-layered abstraction-based controller synthesis for continuous-time systems. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pages 120–129.
  • Hsu et al., (2021) Hsu, K.-C., Rubies-Royo, V., Tomlin, C. J., and Fisac, J. F. (2021). Safety and liveness guarantees through reach-avoid reinforcement learning. In Robotics: Science and Systems.
  • Ivanov et al., (2019) Ivanov, R., Weimer, J., Alur, R., Pappas, G. J., and Lee, I. (2019). Verisig: verifying safety properties of hybrid systems with neural network controllers. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pages 169–178.
  • Jiang et al., (2020) Jiang, Y., Bharadwaj, S., Wu, B., Shah, R., Topcu, U., and Stone, P. (2020). Temporal-logic-based reward shaping for continuing learning tasks. arXiv:2007.01498.
  • Klančar et al., (2017) Klančar, G., Zdešar, A., Blažič, S., and Škrjanc, I. (2017). Wheeled mobile robotics.
  • Li and Belta, (2019) Li, X. and Belta, C. (2019). Temporal logic guided safe reinforcement learning using control barrier functions. arXiv preprint arXiv:1903.09885.
  • (22) Liu, A., Shi, G., Chung, S.-J., Anandkumar, A., and Yue, Y. (2019a). Robust regression for safe exploration in control. arXiv preprint arXiv:1906.05819.
  • (23) Liu, C., Arnon, T., Lazarus, C., Barrett, C., and Kochenderfer, M. J. (2019b). Algorithms for verifying deep neural networks. arXiv preprint arXiv:1903.06758.
  • Meyer et al., (2019) Meyer, P.-J., Devonport, A., and Arcak, M. (2019). Tira: toolbox for interval reachability analysis. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pages 224–229.
  • Pascanu et al., (2013) Pascanu, R., Montufar, G., and Bengio, Y. (2013). On the number of response regions of deep feed forward networks with piece-wise linear activations. arXiv preprint arXiv:1312.6098.
  • Pauli et al., (2020) Pauli, P., Koch, A., Berberich, J., and Allgöwer, F. (2020). Training robust neural networks using lipschitz bounds. arXiv preprint arXiv:2005.02929.
  • Robey et al., (2020) Robey, A., Hu, H., Lindemann, L., Zhang, H., Dimarogonas, D. V., Tu, S., and Matni, N. (2020). Learning control barrier functions from expert demonstrations. In 59th IEEE Conference on Decision and Control.
  • Saunders et al., (2018) Saunders, W., Sastry, G., Stuhlmueller, A., and Evans, O. (2018). Trial without error: Towards safe reinforcement learning via human intervention. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 2067–2069.
  • Schulman et al., (2017) Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347.
  • Sun et al., (2019) Sun, X., Khedr, H., and Shoukry, Y. (2019). Formal verification of neural network controlled autonomous systems. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pages 147–156.
  • Taylor et al., (2020) Taylor, A. J., Singletary, A., Yue, Y., and Ames, A. D. (2020). A control barrier perspective on episodic learning via projection-to-state safety. arXiv preprint arXiv:2003.08028.
  • Turchetta et al., (2016) Turchetta, M., Berkenkamp, F., and Krause, A. (2016). Safe exploration in finite markov decision processes with gaussian processes. In Advances in Neural Information Processing Systems, pages 4312–4320.
  • (33) Wabersich, K. P. and Zeilinger, M. N. (2018a). Linear model predictive safety certification for learning-based control. In 57th IEEE Conference on Decision and Control.
  • (34) Wabersich, K. P. and Zeilinger, M. N. (2018b). Scalable synthesis of safety certificates from data with application to learning-based control. In 2018 European Control Conference (ECC), pages 1691–1697. IEEE.
  • Wabersich and Zeilinger, (2021) Wabersich, K. P. and Zeilinger, M. N. (2021). A predictive safety filter for learning-based control of constrained nonlinear dynamical systems. arXiv:1812.05506.
  • Wang et al., (2018) Wang, L., Theodorou, E. A., and Egerstedt, M. (2018). Safe learning of quadrotor dynamics using barrier certificates. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 2460–2465. IEEE.
  • Wen et al., (2020) Wen, L., Duan, J., Li, S. E., Xu, S., and Peng, H. (2020). Safe reinforcement learning for autonomous vehicles through parallel constrained policy optimization. arXiv preprint arXiv:2003.01303.
  • Xiang et al., (2019) Xiang, W., Lopez, D. M., Musau, P., and Johnson, T. T. (2019). Reachable set estimation and verification for neural network models of nonlinear dynamic systems. In Safe, Autonomous and Intelligent Vehicles, pages 123–144. Springer.
  • Xiao et al., (2020) Xiao, W., Belta, C., and Cassandras, C. G. (2020). Adaptive control barrier functions for safety-critical systems. arXiv:2002.04577.
  • Yordanov et al., (2012) Yordanov, B., Tumova, J., Cerna, I., Barnat, J., and Belta, C. (2012). Temporal logic control of discrete-time piecewise affine systems. IEEE Transactions on Automatic Control, 57(6):1491–1504.
  • Zanelli et al., (2017) Zanelli, A., Domahidi, A., Jerez, J., and Morari, M. (2017). Forces nlp: an efficient implementation of interior-point methods for multistage nonlinear nonconvex programs. International Journal of Control, pages 1–17.