This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Incremental Composition of Learned Control Barrier Functions in Unknown Environments

Paul Lutkus, Deepika Anantharaman, Stephen Tu, and Lars Lindemann Paul Lutkus and Lars Lindemann are with the Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, USA. Deepika Anantharaman performed this work while affiliated with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA. Stephen Tu is with the Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California. Corresponding E-mail: [email protected].
Abstract

We consider the problem of safely exploring a static and unknown environment while learning valid control barrier functions (CBFs) from sensor data. Existing works either assume known environments, target specific dynamics models, or use a-priori valid CBFs, and are thus limited in their safety guarantees for general systems during exploration. We present a method for safely exploring the unknown environment by incrementally composing a global CBF from locally-learned CBFs. The challenge here is that local CBFs may not have well-defined end behavior outside their training domain, i.e. local CBFs may be positive (indicating safety) in regions where no training data is available. We show that well-defined end behavior can be obtained when local CBFs are parameterized by compactly-supported radial basis functions. For learning local CBFs, we collect sensor data, e.g. LiDAR capturing obstacles in the environment, and augment it with simulated data from a safe oracle controller. Our work complements recent efforts to learn CBFs from safe demonstrations—where learned safe sets are limited to their training domains—by demonstrating how to grow the safe set over time as more data becomes available. We evaluate our approach on two simulated systems, where our method successfully explores an unknown environment while maintaining safety throughout the entire execution.

I Introduction

A key characteristic of a truly autonomous agent is the ability to safely explore an unknown environment by iteratively gathering data, in a safe manner, about its surroundings, and then replanning based on its new belief. Consider, as an example, a mobile search and rescue robot placed in an abandoned warehouse. Upon entering its environment, the robot has only an incomplete understanding of the obstacles in this previously unseen warehouse, and hence must explore the environment, using its onboard sensors to gather information about the surrounding obstacles. If this exploration is not done in a safe manner, the robot may accidentally enter situations where it becomes damaged and can no longer complete its mission. For real-world mission-critical deployments, handling this inherent tension between safety and liveliness in a mathematically rigorous way is of paramount importance.

In this work, we attack the safe exploration problem by incrementally composing locally-learned control barrier functions (CBFs). In particular, we extend prior work in offline CBF learning [1, 2, 3, 4] to the online setting by incrementally learning local CBFs that individually ensure that the system acts safely with respect its immediate surroundings. Learning local CBFs has considerable computational advantages, especially for long-running agents, since the complexity of each learning problem scales only with the amount of new data collected, and not with cumulative data. We compose these locally-valid CBFs into a single valid global CBF via a pointwise maximum, using recent progress on non-smooth CBFs [5]. In order to ensure that our CBFs can indeed be meaningfully composed in such a manner, we utilize compactly supported basis functions [6] which ensure negative behavior off the support of the training data. Together, these ingredients allow us to incrementally expand the number of safe states that an autonomous system is able to operate in, as more data becomes available, while provably remaining safe throughout the exploration process by inheriting the rigorous safety guarantees of CBFs.

Our approach improves on existing work in the area of safe exploration with CBFs [7, 8] by leveraging a learning framework to produce dynamics-aware CBFs, rather than CBFs which only encode obstacle information, a limitation that can lead to safety-filter infeasibility or the requirement of unbounded actuation to maintain safety.

Contributions: (1) We learn local CBFs that are globally well-defined by leveraging compactly supported basis functions to obtain negative end-behavior [6]. (2) We provide conditions on the basis function parameters and CBF-synthesis QP to certify that valid local CBFs compose, via pointwise maximum, to a valid, nonsmooth global CBF. (3) We provide simulations that illustrate the correctness and applicability of our method.

Related Work: For a known dynamical system and environment, there is a rich body of literature on safe control synthesis via both Hamilton-Jacobi (HJ) reachability [9] and control barrier functions (CBF) [10], which are closely related [11]. Furthermore, generalizations of both HJ reachability and CBFs that handle complex system types—including multi-agent and hybrid systems—through the framework of composition have been recently explored [5, 12, 13, 14].

In this work, we primarily focus on CBFs, motivated by their practical use in designing safe control laws via convex programming [10]. However, directly applying CBFs to the problem of safe exploration is difficult for a multitude of reasons. First, even when both the system and environment are known beforehand, deriving a valid CBF from first principles is a challenging problem. Recent work has focused instead on using safe expert-demonstration trajectories to learn a CBF directly from data [1, 3, 4]. While learning-based approaches significantly expand the applicability of CBFs, we cannot naïvely apply these techniques to safe exploration since we do not have a full description of the environment, nor a complete set of safe demonstration trajectories that covers the entire safe space.

Recent works have addressed the issue of safe exploration with CBFs for various special cases. The work [15] considers online tuning of parameterized CBFs specialized to single-integrator, multi-agent systems. In [16], the authors show how to ensure simultaneous safety and exploration when learning a general stochastic, discrete-time system, assuming that a ground truth CBF is available a-priori. Furthermore, the works [17, 8, 7, 18, 19, 20] consider the use of sensor data (particularly LiDAR data) to obtain CBFs online. However, these works either consider specialized dynamics models [18, 17], require retraining on the entire demonstration dataset whenever new measurements are gathered [8], or restrict CBFs to e.g. elliptical sub-level sets or signed distance functions [7, 20, 19]. The latter restriction is not guaranteed to satisfy dynamic feasibility—especially under input constraints or underactuation—and hence cannot guarantee safety in general; we illustrate this issue with a Dubins car example in Section IV.

II Background

Consider the control-affine system x˙=f(x)+g(x)u\dot{x}=f(x)+g(x)u where xdxx\in\mathbb{R}^{d_{x}} and uduu\in\mathbb{R}^{d_{u}} are states and control inputs, respectively. We wish to ensure that the system’s trajectory x(t)x(t) remains within a compact set SdxS\subseteq\mathbb{R}^{d_{x}} for all times t0t\geq 0. To accomplish this, we use control barrier functions (CBFs) [10], which certify the existence of control inputs guaranteeing forward invariance of the set SS.

Definition 1 (Control Barrier Function)

Given a compact set SdxS\subseteq\mathbb{R}^{d_{x}}, a set of control inputs UduU\subseteq\mathbb{R}^{d_{u}}, and a locally Lipschitz continuous extended class-𝒦\mathcal{K} function α:\alpha:\mathbb{R}\to\mathbb{R}111An extended class-𝒦\mathcal{K} function α\alpha is strictly increasing and s.t. α(0)=0\alpha(0)=0., a continuously differentiable function h:Dh:D\rightarrow\mathbb{R} is said to be a control barrier function on an open set DSD\supseteq S if:

h(x)\displaystyle h(x) >0xint(S)\displaystyle>0\ \forall x\in\mathrm{int}(S) (1a)
h(x)\displaystyle h(x) <0xDS\displaystyle<0\ \forall x\in D\setminus S (1b)
h(x)\displaystyle h(x) =0xbd(S)\displaystyle=0\ \forall x\in\mathrm{bd}(S) (1c)
supuUh(x)\displaystyle\ \sup_{u\in U}\langle\nabla h(x) ,f(x)+g(x)uα(h(x))xD.\displaystyle,\ f(x)+g(x)u\rangle\geq-\alpha(h(x))\ \forall x\in D. (1d)

Given a reference control signal ur:0Uu_{r}:\mathbb{R}_{\geq 0}\to U, i.e. an open-loop control signal that we would like to match, we can use the CBF hh as a safety filter to keep the trajectory x(t)x(t) within the set SS while closely following uru_{r}.

Lemma 1 (Safety Filter [10])

Given a CBF h:Dh:D\rightarrow\mathbb{R} and a reference control signal ur:0Uu_{r}:\mathbb{R}_{\geq 0}\to U, we compute a control signal u:0Uu^{*}:\mathbb{R}_{\geq 0}\to U that renders the set SS forward invariant by solving the convex quadratic program:

u(t)=argminuUuur(t)22s.t.h(x(t)),f(x(t))+g(x(t))uα(h(x(t))).\displaystyle\begin{split}u^{*}(t)&=\operatorname*{arg\,min}_{u\in U}\quad\|u-u_{r}(t)\|_{2}^{2}\\ \mathrm{s.t.}&\left\langle\nabla h(x(t)),f(x(t))+g(x(t))u\right\rangle\geq-\alpha(h(x(t))).\end{split} (2)

In the remainder, we will refer to the quadratic program in Eq. 2 as the CBF-QP.

Refer to caption

Figure 1: A valid global CBF is obtained as the pointwise maximum of locally-valid CBFs. A necessary condition for this construction is that each local CBF have negative end-behavior.

II-A Learning CBFs from Data

Constructing valid CBFs is generally challenging and has motivated learning-based approaches. We briefly review our prior work in [1, 2] on learning CBFs from demonstrations, which serves as a basis for this paper. Here, an expert dataset 𝒟:={(xi,ui)}i=1N\mathcal{D}:=\{(x_{i},u_{i})\}_{i=1}^{N} is given where NN state-action pairs demonstrate safe system behavior. From the dataset 𝒟\mathcal{D}, a CBF of the form h(x):=j=1Jθjϕj(x)=θTϕ(x)h(x):=\sum_{j=1}^{J}\theta_{j}\phi_{j}(x)=\theta^{T}\phi(x) is learned by solving a quadratic program (QP) where ϕ(x):dxJ\phi(x):\mathbb{R}^{d_{x}}\rightarrow\mathbb{R}^{J} are JJ pre-defined basis functions and θJ\theta\in\mathbb{R}^{J} are the decision variables. This QP is formed by constructing three distinct datasets using information from 𝒟\mathcal{D}. The buffer dataset :={(xi,ui)}i=1Nb\mathcal{B}:=\{(x_{i},u_{i})\}_{i=1}^{N_{b}} contains the boundary points of the dataset 𝒟\mathcal{D}, obtained via boundary point detection algorithms (see [2] for details), on which we enforce the dynamics constraint (1d). The safe dataset 𝒮:=𝒟={(xi,ui)}i=1Ns\mathcal{S}:=\mathcal{D}\setminus\mathcal{B}=\{(x_{i},u_{i})\}_{i=1}^{N_{s}} contains the remaining points in 𝒟\mathcal{D}, on which we enforce the positivity and dynamics constraints (1a) and (1d), respectively. The unsafe dataset 𝒰:={xi}i=1Nu\mathcal{U}:=\{x_{i}\}_{i=1}^{N_{u}} contains points that surround 𝒟\mathcal{D}, e.g. obtained by sampling or again applying boundary point detection, on which we enforce the negativity constraint (1b).

Definition 2 (Learning-CBF QP [1])

Given the datasets 𝒮\mathcal{S}, \mathcal{B}, and 𝒰\mathcal{U}, the hyperparameters γsafe,γunsafe,γdyn>0,b0\gamma_{\text{safe}},\gamma_{\text{unsafe}},\gamma_{\text{dyn}}>0,b\geq 0, a locally Lipschitz continuous extended class-𝒦\mathcal{K} function α:\alpha:\mathbb{R}\to\mathbb{R}, and a basis function ϕ:dxJ\phi:\mathbb{R}^{d_{x}}\rightarrow\mathbb{R}^{J} with bounded Lipschitz constant, we learn a candidate CBF h(x):=ϕ(x)Tθbh(x):=\phi(x)^{T}\theta-b by solving the convex quadratic program:

minθJ\displaystyle\min_{\theta\in\mathbb{R}^{J}}\quad θ22\displaystyle\|\theta\|_{2}^{2} (3a)
s.t.\displaystyle\mathrm{s.t.}\quad ϕ(xi)Tθbγsafe\displaystyle\phi(x_{i})^{T}\theta-b\geq\gamma_{\text{safe}} xi𝒮\displaystyle\mathllap{\forall x_{i}\in\mathcal{S}\quad\quad} (3b)
ϕ(xi)Tθbγunsafe\displaystyle\phi(x_{i})^{T}\theta-b\leq-\gamma_{\text{unsafe}} xi𝒰\displaystyle\mathllap{\forall x_{i}\in\mathcal{U}\quad\quad} (3c)
Dxϕ(xi)Tθ,f(xi)+g(xi)ui+α(h(xi))γdyn\displaystyle\left\langle D_{x}\phi(x_{i})^{T}\theta,\ f(x_{i})+g(x_{i})u_{i}\right\rangle+\alpha(h(x_{i}))\geq\gamma_{\text{dyn}}
(xi,ui)𝒮\displaystyle\mathllap{\forall(x_{i},u_{i})\in\mathcal{S}\cup\mathcal{B}\quad\quad} (3d)

where Dxϕ(x)D_{x}\phi(x) denotes the Jacobian of ϕ(x)\phi(x).

We refer to (3) as the Learning-CBF QP, and remark that the candidate CBF obtained by solving the Learning-CBF QP is valid when conditions on the density of the datapoints in 𝒮\mathcal{S}, \mathcal{B}, and 𝒰\mathcal{U} and the Lipschitz constants of hh, ff, and gg are satisfied (we refer the reader to [1] for details). Specifically, we will use the Learning-CBF QP to learn local CBFs hih_{i} that are valid on a domain DiD_{i}. For the experiments in [1], the basis functions ϕ(x)\phi(x) are chosen to be random Fourier features [21]. Unfortunately, the oscillatory end-behavior of Random Fourier features prevents composition of multiple learned CBFs via the max\max-operator (cf. Figure 1), since the locally learned CBFs hih_{i} may oscillate to positive outside of their domain DiD_{i}. Obtaining negative end-behavior motivates our use of compactly supported RBFs.

II-B Compactly Supported Radial Basis Functions

Radial basis functions (RBFs) are scalar functions that exhibit radial symmetry about a center zdxz\in\mathbb{R}^{d_{x}}, i.e. ϕ(x,z):=ϕ¯(xz2){\phi}(x,z):=\bar{\phi}(\|x-z\|_{2}) for a scalar function ϕ¯:\bar{\phi}:\mathbb{R}\to\mathbb{R}. For a set of JJ centers Z:=(z1,,zJ)Z:=(z_{1},\ldots,z_{J}), we will combine RBFs linearly as h(x):=θTϕ(x,Z)bh(x):=\theta^{T}\phi(x,Z)-b, where ϕ(x,Z):=(ϕ(x,z1),,ϕ(x,zJ))\phi(x,Z):=({\phi}(x,z_{1}),\ldots,{\phi}(x,z_{J})). We consider a specific family of RBFs, referred to as compactly supported radial basis functions (CS-RBFs). CS-RBFs are zero outside of a pre-specified radius around zz, which ensures that hh can be guaranteed to be negative outside of its region of validity DD for b>0b>0. When using CS-RBFs to learn control barrier functions, we require continuous differentiability, and so we use Wendland’s functions, which are compactly supported, polynomial basis functions that are minimal in their polynomial degree [6]. Some examples of Wendland’s functions are ϕ¯(r):=max(0,1r)4(1+4r)/20\bar{\phi}(r):=\max(0,1-r)^{4}(1+4r)/20 for dx3d_{x}\leq 3, and ϕ¯(r):=max(0,1r)8(3+24r+63r2)/5040\bar{\phi}(r):=\max(0,1-r)^{8}(3+24r+63r^{2})/5040 for dx7d_{x}\leq 7.

III Safe Exploration via Incremental Composition of CBFs

In this section, we describe our approach to safe exploration via incrementally composing local CBFs in an unknown environment. Let CdxC\subseteq\mathbb{R}^{d_{x}} denote the a-priori unknown geometric safe region, e.g. the obstacle-free space for a robot, along with those states from which obstacles can be avoided via an appropriate control signal. Denote q:=(x1,,xdq)q:=(x^{1},\ldots,x^{d_{q}}) for dqdxd_{q}\leq d_{x} as the observable coordinates from a subset of the full coordinates x:=(x1,,xdx)x:=(x^{1},\ldots,x^{d_{x}}) for which safety can be determined by a measurement map M:dxprojq(2C)M:\mathbb{R}^{d_{x}}\to\text{proj}_{q}(2^{C}) that takes the state xx to a subset of the geometric safe region, projected222projq(S)={qdq:(q,xdq+1,,xdx)S}\text{proj}_{q}(S)=\{q\in\mathbb{R}^{d_{q}}:\exists(q,x^{d_{q}+1},\ldots,x^{d_{x}})\in S\} onto the coordinates qq. M(x)M(x) represents local information obtainable about CC from state xx; for example, spatial measurements collected from onboard sensors, e.g. LiDAR. For a sensor that detects all obstacles PdqP\subseteq\mathbb{R}^{d_{q}} in a radius rr around xx, M(x)=Pcr(x)M(x)=P^{c}\cap\mathcal{B}_{r}(x), where r(x)\mathcal{B}_{r}(x) is a ball of radius rr centered at xx and PcP^{c} is the complement of PP.

Our goal is to safely guide the system to collect new measurements M(x(t))M(x(t)) in order to compose a CBF H(x)H(x) that defines a safe set S:={xdx:H(x)0}S:=\{x\in\mathbb{R}^{d_{x}}:H(x)\geq 0\} which covers as much of CC as possible, while respecting x(t)Cx(t)\in C for all times. We solve this safe exploration problem by combining local CBFs as follows. At time t=0t=0, we take a measurement M(x(0))M(x(0)) and, using the Learning-CBF QP (cf. Definition 2), learn an initial CBF h0h_{0} which is valid on some set D0dxD_{0}\subseteq\mathbb{R}^{d_{x}}, and renders a subset S0:={xD0:h0(x)0}CS_{0}:=\{x\in D_{0}:h_{0}(x)\geq 0\}\subseteq C forward invariant.333Here, D0D_{0} is the set on which the constraints in Definition 2 are satisfied. In addition, we require that S0S_{0} agrees with the measurement, i.e. projq(S0)M(x0)\text{proj}_{q}(S_{0})\subseteq M(x_{0}). Let kk denote the number of measurements taken so far (starting at k=1k=1). We proceed iteratively as follows:

  1. 1.

    Let Hk(x):=maxi{0,,k1}hi(x)H_{k}(x):=\max_{i\in\{0,\dots,k-1\}}h_{i}(x) denote the CBF that renders i{0,,k1}Si\cup_{i\in\{0,\dots,k-1\}}S_{i} forward invariant.

  2. 2.

    Use HkH_{k} to approach the boundary of the current invariant set i{0,,k1}Si\cup_{i\in\{0,\dots,k-1\}}S_{i} (cf. Remark 1).

  3. 3.

    Take a new measurement Mk=M(x(t))M_{k}=M(x(t)), and learn a CBF hkh_{k} which is valid on some set DkdxD_{k}\subseteq\mathbb{R}^{d_{x}}, renders a subset Sk:={xDk:hk(x)0}S_{k}:=\{x\in D_{k}:h_{k}(x)\geq 0\} forward invariant, and satisfies projq(Sk)Mk\text{proj}_{q}(S_{k})\subseteq M_{k}.

  4. 4.

    Increment kk+1k\leftarrow k+1.

Remark 1

In order to approach the boundary of the current invariant set i{0,,k1}Si\cup_{i\in\{0,\dots,k-1\}}S_{i}, one can choose a point xx^{-} such that Hk(x)<0H_{k}(x^{-})<0, and construct a model predictive controller (MPC) that steers x(t)x(t) towards xx^{-}. The control signal ur(t)u_{r}(t) generated by this MPC can then be filtered by the CBF-QP in Eq. 2 to obtain a signal u(t)u(t) that safely approaches the boundary of i{0,,k1}Si\cup_{i\in\{0,\dots,k-1\}}S_{i}.

Our procedure works by learning CBFs in an incremental manner when new measurements MkM_{k} become available. This yields an efficient algorithm where one does not need to recompute CBFs for regions of the state space where no new information about the set CC is obtained. However, this incremental CBF construction raises two important questions. The first question, which we address in Section III-A, is how one obtains the necessary expert dataset in Step 3) in order to apply the Learning-CBF QP while satisfying projq(Sk)Mk\text{proj}_{q}(S_{k})\subseteq M_{k}. The second question, addressed in Section III-B, is what conditions are needed to ensure that the max\max composition in Step 1) yields a valid global CBF.

III-A Obtaining Expert Demonstrations Online

Refer to caption

Figure 2: Left-to-Right: Incrementally learning a global CBF for the Dubins car, while safely exploring an unknown environment under input constraints. LiDAR scans are taken (and CBFs computed) where the trajectory color changes. To showcase the correctness of our local CBFs, each trajectory is constrained by the last-learned local CBF, whose zero-level set is shaded. At each (q1,q2)(q^{1},q^{2}), red arcs denote unsafe angles h(q1,q2,θ)<0h(q^{1},q^{2},\theta)<0. Blue arcs denote h(q1,q2,θ)0h(q^{1},q^{2},\theta)\geq 0. Red arcs that are not on obstacles reflect the fact that unexplored states must be considered unsafe.

The Learning-CBF QP described in Section II-A provides us with a powerful framework to learn valid CBFs from demonstrations. In order to apply the framework, we need access to a dataset of expert demonstrations containing safe state-action pairs (cf. Definition 2). However, this may not be the case as we freely explore the state space. For example, suppose our measurement map is implemented via a LiDAR sensor. While this sensor provides us with a point cloud of obstacles, it does not provide us with expert control inputs that safely avoid said obstacles.

We provide two remedies for the lack of safe inputs. Our first solution is based on the dual norm. Suppose that the input set UU is a norm ball of the form uumax\|u\|\leq u_{\max}. Then, the CBF derivative constraint supuUh(x),f(x)+g(x)uα(h(x))\sup_{u\in U}\langle\nabla h(x),\ f(x)+g(x)u\rangle\geq-\alpha(h(x)) and hence the constraint Eq. 3 in the Learning-CBF QP can be rewritten in terms of the dual norm u=supxdn{x,u:x1}\|u\|^{\star}=\sup_{x\in\mathbb{R}^{d_{n}}}\{\langle x,u\rangle:\|x\|\leq 1\} as follows [22]:

h(x),f(x)+umaxg(x)Th(x)+α(h(x))γdyn.\langle\nabla h(x),f(x)\rangle+u_{\max}\|g(x)^{T}\nabla h(x)\|^{\star}+\alpha(h(x))\geq\gamma_{\text{dyn}}.

Observe that the above expression does not require any input demonstrations uu. However, there are two limitations to this approach. Firstly, the resulting Learning-CBF QP is nonconvex. Secondly, sensors may not fully characterize the safe states (i.e. dq<dxd_{q}<d_{x}), e.g., consider a car with a LiDAR sensor. The LiDAR provides safe and unsafe positions, but not the safe steering angles and velocities which are needed to avoid system states that inevitably lead to a collision.

To address the limitations of the dual norm, our second approach is based on the notion of a safety oracle. Given a safe set returned by a sensor, we define a safety oracle to be a function that takes this safe set as input and returns safe state-action pairs for which there exists a control signal that keeps the system in the aforementioned safe set for all time.

Definition 3 (Safety Oracle)

Recall that qdqq\in\mathbb{R}^{d_{q}} denotes the observed subset of the full coordinates xdxx\in\mathbb{R}^{d_{x}}. For some state xsdxx_{s}\in\mathbb{R}^{d_{x}} at which a measurement is obtained, assume that the measurement map M(xs)M(x_{s}) returns a local safe subset Clocprojq(C)C_{\mathrm{loc}}\subset\text{proj}_{q}(C) contained in a local scan region RClocR\supseteq C_{\mathrm{loc}} around xsx_{s}, i.e. x=(q,xdq+1,,xdx)Cx=(q,x^{d_{q}+1},\ldots,x^{d_{x}})\notin C for all qRCloccq\in R\cap C_{\mathrm{loc}}^{c} and (xdq+1,,xdx)dxdq(x^{d_{q}+1},\ldots,x^{d_{x}})\in\mathbb{R}^{d_{x}-d_{q}}. A safety oracle 𝒪:M(xs){(xi,ui)}i=1Ns\mathcal{O}:M(x_{s})\mapsto\{(x_{i},u_{i})\}_{i=1}^{N_{s}} takes the observed safe set M(xs)=ClocM(x_{s})=C_{\mathrm{loc}} and returns some safe state-action pairs 𝒟={(xi,ui)}i=1Ns\mathcal{D}=\{(x_{i},u_{i})\}_{i=1}^{N_{s}} for which there exists a control signal u(t)u(t) that ensures projq(x(t))Cloc\text{proj}_{q}(x(t))\in C_{\mathrm{loc}} for all times t0t\geq 0 and for all initial conditions (x(0),u(0))=(xi,ui)𝒟(x(0),u(0))=(x_{i},u_{i})\in\mathcal{D}.

Some examples of safety oracles are Hamilton-Jacobi reachability analysis [9], safe MPC [23], and obstacle-aware shooting methods [24]. Our proposed approach allows us to efficiently use local oracle computations in two ways. Firstly, while oracle computations may only provide safe state-action pairs for a finite set of states, the learned locally-valid CBF interpolates between the given datapoints (cf. Definition 2), yielding safety guarantees over the entire local domain. Secondly, oracle computatations may be costly. By synthesizing local safety computations into a composable CBF, we can store and combine them in a principled way, circumventing the need to recalculate the safety of already-certified trajectories. The following example illustrates how our approach leverages a safety oracle.

Refer to caption

Figure 3: The global CBF for the Dubins car. Observe that the magenta trajectory of the system under the global CBF interpolates the behavior of the system under the locally learned CBFs shown in Fig. 2.

Dubins car: Consider dynamics x˙=(q˙1,q˙2,θ˙)=(vcosθ,vsinθ,u)\dot{x}=(\dot{q}^{1},\dot{q}^{2},\dot{\theta})=(v\cos\theta,v\sin\theta,u), with position q:=(q1,q2)q:=(q^{1},q^{2}) and a fixed velocity vv. The system operates in an unknown environment with access to a LiDAR sensor that returns a dataset of safe positions 𝒮q:={qi}i=1Ns\mathcal{S}_{q}:=\{q_{i}\}_{i=1}^{N_{s}}, and a dataset of unsafe positions 𝒰q:={qi}i=1Nu\mathcal{U}_{q}:=\{q_{i}\}_{i=1}^{N_{u}}. Critically, the LiDAR does not provide safe or unsafe θ\theta values, nor does it provide control inputs uu. By joining the unsafe dataset 𝒰q\mathcal{U}_{q} with unsafe positions 𝒵q\mathcal{Z}_{q} sampled from beyond the convex hull of 𝒮q\mathcal{S}_{q}, we can perform Hamilton-Jacobi reachability analysis [9] to obtain a value function V(x)V(x) that, for each q𝒮qq\in\mathcal{S}_{q}, is positive for safe θ\theta’s for which there exists a control signal that avoids 𝒰q𝒵q\mathcal{U}_{q}\cup\mathcal{Z}_{q}. For each q𝒮qq\in\mathcal{S}_{q} the safe angles are {θ:V(q1,q2,θ)0}\{\theta:V(q^{1},q^{2},\theta)\geq 0\}. For a discretized range of θ\theta’s, the result is a dataset 𝒟:={(xi,ui)}i=1N\mathcal{D}:=\{(x_{i},u_{i})\}_{i=1}^{N} of safe states and control inputs, where the gradient of the value function, V(x)\nabla V(x), is used to obtain safe control inputs according to π(x)=argmaxuUV(x),f(x)+g(x)u\pi^{*}(x)=\arg\max_{u\in U}\langle\nabla V(x),f(x)+g(x)u\rangle. Note that synthesizing 𝒟\mathcal{D} into a locally-valid CBF effectively allows us to interpolate the gridded solution to the HJB-PDE, and to combine information from multiple PDE solutions by composing their learned CBFs.

III-B Composing Locally-Valid CBFs

In the previous subsection, we generated datasets of safe state-action pairs, which can be used to obtain locally-valid CBFs via the Learning-CBF QP. Consider the task of composing kk locally-valid CBFs h1,,hkh_{1},\dots,h_{k}. Clearly, since hih_{i} is only constrained to be valid on some region DiD_{i} where its dataset was sampled from, hih_{i} should be negative for dxDi\mathbb{R}^{d_{x}}\setminus D_{i}. Thus, if each CBF hih_{i} is valid on DiD_{i}, and hi(x)<0xdxDih_{i}(x)<0\ \forall x\in\mathbb{R}^{d_{x}}\setminus D_{i}, then H(x):=maxi{1,,k}hi(x)H(x):=\max_{i\in\{1,\ldots,k\}}h_{i}(x) is a valid nonsmooth CBF on all i{1,,k}Di\bigcup_{i\in\{1,\ldots,k\}}D_{i}.

Let ϕs(x,Z):dx×dx×JJ\phi_{s}(x,Z):\mathbb{R}^{d_{x}}\times\mathbb{R}^{d_{x}\times J}\to\mathbb{R}^{J} denote a vector of compactly supported radial basis functions that decay to zero in radius ss. Specifically, each component [ϕs(x,Z)]j[\phi_{s}(x,Z)]_{j} of ϕs(x,Z)\phi_{s}(x,Z) is a Wendland’s function of the form ϕs(x,zj):=ϕ¯(xzj2/s)=(1rj/s)+pf(rj/s){\phi}_{s}(x,z_{j}):=\bar{\phi}(\|x-z_{j}\|_{2}/s)=(1-r_{j}/s)_{+}^{p}f(r_{j}/s) for rj:=xzj2r_{j}:=\|x-z_{j}\|_{2}, where ()+:=max(0,)(\cdot)_{+}:=\max(0,\cdot) and f(rj/s)f(r_{j}/s) is polynomial in rj/sr_{j}/s with order less than pp [6]. Note that ϕ¯(rj/s)=0ris\bar{\phi}(r_{j}/s)=0\ \forall r_{i}\geq s.

Assume that DiD_{i} is a compact and contains all centers zjz_{j} of hih_{i}. We show that as long as all centers zjz_{j} whose corresponding weight θj\theta_{j} is positive are at least a distance of ss away from bd(Di)\mathrm{bd}(D_{i}), which can be guaranteed by sampling unsafe points to increase the size of 𝒰\mathcal{U} in Eq. 3c, then hi(x)<0h_{i}(x)<0 on all dxDi\mathbb{R}^{d_{x}}\setminus D_{i}. It follows that iSi\bigcup_{i}S_{i} is forward invariant under control inputs generated by H(x)H(x). Let d(z,D)d(z,D) denote the smallest Euclidean distance from point zz to set DD.

Theorem 1 (Forward Invariance for H(x)=maxi{hi(x)}H(x)=\max_{i}\{h_{i}(x)\})

Consider kk continuously differentiable CBFs {hi(x)}i=1k\{h_{i}(x)\}_{i=1}^{k} parameterized as hi(x):=ϕs(x,Z(i))Tθ(i)bh_{i}(x):=\phi_{s}(x,Z^{(i)})^{T}\theta^{(i)}-b with weights θ(i)\theta^{(i)}, centers Z(i)Z^{(i)}, and b>0b>0. Let each hi(x)h_{i}(x) be locally valid on a corresponding region from {Di}i=1k\{D_{i}\}_{i=1}^{k}. For all zj(i)Z(i)z_{j}^{(i)}\in Z^{(i)}, assume that zj(i)Diz_{j}^{(i)}\in D_{i} and that d(zj(i),bd(Di))sd(z_{j}^{(i)},\mathrm{bd}(D_{i}))\geq s if θj(i)>0\theta_{j}^{(i)}>0. If for all xi{1,,k}Dix\in\bigcup_{i\in\{1,\ldots,k\}}D_{i} with H(x)0H(x)\geq 0 there exists ϵ>0\epsilon>0 such that the following (modified) CBF-QP has a bounded solution

u(t)=argminuU\displaystyle u^{*}(t)=\operatorname*{arg\,min}_{u\in U} uur(t)22\displaystyle\quad\|u-u_{r}(t)\|_{2}^{2}
s.t.\displaystyle\mathrm{s.t.} hi(x),f(x)+g(x)uα(H(x))\displaystyle\quad\left\langle\nabla h_{i}(x),f(x)+g(x)u\right\rangle\geq-\alpha(H(x))
i{i:|hi(x)H(x)|ϵ},\displaystyle\mathllap{\forall i\in\{i^{\prime}:|h_{i^{\prime}}(x)-H(x)|\leq\epsilon\}},

then H(x)H(x) is a valid non-smooth CBF that guarantees forward invariance of the set i{1,,k}Sii{1,,k}Di\bigcup_{i\in\{1,\ldots,k\}}S_{i}\subseteq\bigcup_{i\in\{1,\ldots,k\}}D_{i}.

Proof: If d(zj(i),bd(Di))sd(z_{j}^{(i)},\mathrm{bd}(D_{i}))\geq s for all zj(i)z_{j}^{(i)} s.t. θj(i)>0\theta_{j}^{(i)}>0, then b>0hi(x)<0b>0\Rightarrow h_{i}(x)<0 (strict ineq.) xdxDi\forall x\in\mathbb{R}^{d_{x}}\setminus D_{i} by the compact support property of ϕ¯s\bar{\phi}_{s}. By Theorem 3 of [5], it immediately follows that H(x)H(x) is a non-smooth CBF that guarantees forward invariance of the set i{1,,k}Si\bigcup_{i\in\{1,\ldots,k\}}S_{i}. ∎

The set Iϵ(x):={i:|hi(x)H(x)|ϵ}I_{\epsilon}(x):=\{i^{\prime}:|h_{i^{\prime}}(x)-H(x)|\leq\epsilon\} is called the “almost-active” set in [5]. The CBF-QP in Theorem 1 hence imposes a derivative constraint for each hi(x)h_{i}(x) with iIϵ(x)i\in I_{\epsilon}(x) to obtain forward invariant control inputs.

Refer to caption
Figure 4: Left: Local Wendland’s function CBF (green and blue contours) constrains the Dubins car system (magenta trajectory) to the safe region. A signed distance function cannot certify forward invariance for this system (gold trajectory, leaves safe set), even with unbounded actuation.

IV Case Studies

To illustrate the validity of our approach, we incrementally learn global CBFs for two systems: a planar, feedback-linearizable system and a fixed-velocity Dubins car. For both case studies, we choose s=1s=1 and obtain CS-RBFs ϕs=1(x,zj):=max{0,1rj}4(1+4rj)/20\phi_{s=1}(x,z_{j}):=\max\{0,1-r_{j}\}^{4}(1+4r_{j})/20. For simplicity, we pick the centers zjz_{j} to fall on a grid on their domains; more sophisticated arrangements of centers could be chosen, e.g. along safe trajectories provided by the oracle. The code for our case studies can be found at https://github.com/paullutkus/learning-cbfs-online.

IV-A Dubins Car

The dynamics of the Dubins car are x˙=(q˙1,q˙2,θ˙)=(Vcosθ,Vsinθ,u)\dot{x}=(\dot{q}^{1},\dot{q}^{2},\dot{\theta})=(V\cos\theta,V\sin\theta,u) with fixed velocity V:=0.1V:=0.1 and initial condition x(0):=(1.1,1.1,0)x(0):=(-1.1,-1.1,0). We also consider the input constraints |u|umax:=0.4|u|\leq u_{\textrm{max}}:=0.4. The environment consists of a central circular obstacle and four walls (Fig. 2).

For our idealized LiDAR sensor, we define a circular scan radius of 1.11.1 (depicted in red in Fig. 4) around the position qq of the system. Because all states outside of the scan radius could potentially be obstacles, the incrementally learned CBF must keep the system within the scan radius while avoiding detected obstacles. Our local CBFs hi(x)h_{i}(x) and thus also the incremental CBF H(x)H(x) accomplish this requirement, while satisfying input constraints, see Figs. 2, LABEL:, and 3.

In contrast, candidate CBFs constructed from sensor scans without incorporating dynamics may not be valid and fail to satisfy safety constraints. To illustrate this point, we consider signed distance functions (SDFs), which have been used as candidate CBFs in the past, see e.g.  [7]. Fig. 4 shows that an SDF cannot provide safety for this system, even with unbounded actuation (umax=)u_{\text{max}}=\infty), because the gradient of an SDF to obstacles l(x)l(x) is always orthogonal to the vector field of the Dubins car (since l/θ=0\partial l/\partial\theta=0). Though high-order CBFs can be used to resolve this incompatibility, we are able to control the system with a simpler, first-order CBF.

Refer to caption
Figure 5: Let hi|Xih_{i}|X_{i} denote that the CBF hih_{i} was obtained from data XiX_{i}. Top: The magnitudes of gradients of maxi{hi|Xi}\max_{i}\{h_{i}|X_{i}\} are visibly larger than those of h|iXih|\bigcup_{i}X_{i}, and that the zero-level set of maxi{hi|Xi}\max_{i}\{h_{i}|X_{i}\} is closer to the central obstacle. Bottom: The trajectory under maxi{hi|Xi}\max_{i}\{h_{i}|X_{i}\} reaches the target in less time, displaying larger actuation and speed while still maintaining safety. These results suggest that maxi{hi|Xi}\max_{i}\{h_{i}|X_{i}\} enjoys decreased conservatism.

IV-B Planar System

The dynamics of the planar system are:

x˙=[1001]x+[(x1)2+δ00(x2)2+δ]u\displaystyle\dot{x}=\begin{bmatrix}1&0\\ 0&1\end{bmatrix}x+\begin{bmatrix}(x^{1})^{2}+\delta&0\\ 0&(x^{2})^{2}+\delta\end{bmatrix}u

with δ:=0.33\delta:=0.33 and initial condition x(0)=(0,0)x(0)=(0,0). We consider the input constraints |u|umax=1|u|\leq u_{\textrm{max}}=1. The environment consists of two vertically-aligned circular obstacles and four walls (Fig. 5). The scan radius of the LiDAR sensor is 11.

Here, we compare our incremental CBF H(x)H(x) with a CBF that is trained in a single-shot on all the data that was used to incrementally train H(x)H(x). We find that not only do we match the performance of such a single-shot CBF, but our incremental CBF also performs better in this case (Fig. 5). Specifically, the zero-level set is closer to the obstacle, and larger gradients allow the system to approach a target more quickly. This is not surprising, as the maximum of continuously-differentiable CBFs is a more expressive function class than a single continuously differentiable CBF. Furthermore, incrementally learning a CBF is almost three times faster (5.25.2s total for four local CBFs vs. 14.914.9s).

V Conclusion

We presented an online approach for incremental composition of a non-smooth CBF via locally learned CBFs. As a result, our approach allows for the safe exploration of unknown and static environments. We rely on a class of compactly-supported radial basis functions, called Wendland’s functions, whose end behavior guarantee the valid composition of local CBFs. We validated our approach on a feedback-linearizable planar system and a fixed-velocity Dubins car, and compared our approach to existing works.

Though we observe good performance from using Wendland’s functions, we find that their Lipschitz constants can sometimes be too large to certify validity of local CBFs following the conditions in [1], which are sufficient but not necessary. This suggests that we may benefit from a richer class of basis functions, perhaps compactly-supported neural networks [25], or less-conservative conditions for validity. Another avenue of future work is to generalize our approach to handle non-static and stochastic environments.

References

  • [1] A. Robey, H. Hu, L. Lindemann, H. Zhang, D. V. Dimarogonas, S. Tu, and N. Matni, “Learning control barrier functions from expert demonstrations,” 2020.
  • [2] L. Lindemann, A. Robey, L. Jiang, S. Das, S. Tu, and N. Matni, “Learning robust output control barrier functions from safe expert demonstrations,” IEEE Open Journal of Control Systems, 2024.
  • [3] H. Zhao, X. Zeng, T. Chen, Z. Liu, and J. Woodcock, “Learning safe neural network controllers with barrier certificates,” Formal Aspects of Computing, vol. 33, pp. 437–455, 2021.
  • [4] D. Sun, S. Jha, and C. Fan, “Learning certified control using contraction metric,” in Conference on Robot Learning.   PMLR, 2021, pp. 1519–1539.
  • [5] P. Glotfelter, J. Cortés, and M. Egerstedt, “Boolean composability of constraints and control synthesis for multi-robot systems via nonsmooth control barrier functions,” in 2018 IEEE Conference on Control Technology and Applications (CCTA).   IEEE, 2018, pp. 897–902.
  • [6] H. Wendland, “Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree,” Advances in computational Mathematics, vol. 4, pp. 389–396, 1995.
  • [7] K. Long, C. Qian, J. Cortés, and N. Atanasov, “Learning barrier functions with memory for robust safe navigation,” IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4931–4938, 2021.
  • [8] M. Srinivasan, A. Dabholkar, S. Coogan, and P. A. Vela, “Synthesis of control barrier functions using a supervised machine learning approach,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2020, pp. 7139–7145.
  • [9] S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin, “Hamilton-jacobi reachability: A brief overview and recent advances,” in 2017 IEEE 56th Annual Conference on Decision and Control (CDC).   IEEE, 2017, pp. 2242–2253.
  • [10] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” 2019.
  • [11] J. J. Choi, D. Lee, K. Sreenath, C. J. Tomlin, and S. L. Herbert, “Robust control barrier–value functions for safety-critical control,” in 2021 60th IEEE Conference on Decision and Control (CDC).   IEEE, 2021, pp. 6814–6821.
  • [12] L. Lindemann and D. V. Dimarogonas, “Control barrier functions for signal temporal logic tasks,” IEEE control systems letters, vol. 3, no. 1, pp. 96–101, 2018.
  • [13] M. Black and D. Panagou, “Consolidated control barrier functions: Synthesis and online verification via adaptation under input constraints,” arXiv preprint arXiv:2304.01815, 2023.
  • [14] S. Yang, M. Black, G. Fainekos, B. Hoxha, H. Okamoto, and R. Mangharam, “Safe control synthesis for hybrid systems through local control barrier functions,” arXiv preprint arXiv:2311.17201, 2023.
  • [15] Z. Gao, G. Yang, and A. Prorok, “Online control barrier functions for decentralized multi-agent navigation,” in 2023 International Symposium on Multi-Robot and Multi-Agent Systems (MRS).   IEEE, 2023, pp. 107–113.
  • [16] W. Luo, W. Sun, and A. Kapoor, “Sample-efficient safe learning for online nonlinear control with control barrier functions,” in International Workshop on the Algorithmic Foundations of Robotics.   Springer, 2022, pp. 419–435.
  • [17] C. Li, Z. Zhang, A. Nesrin, Q. Liu, F. Liu, and M. Buss, “Instantaneous local control barrier function: An online learning approach for collision avoidance,” arXiv preprint arXiv:2106.05341, 2021.
  • [18] A. S. Lafmejani, S. Berman, and G. Fainekos, “Nmpc-lbf: Nonlinear mpc with learned barrier function for decentralized safe navigation of multiple robots in unknown environments,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2022, pp. 10 297–10 303.
  • [19] Y. Zhang, G. Tian, L. Wen, X. Yao, L. Zhang, Z. Bing, W. He, and A. Knoll, “Online efficient safety-critical control for mobile robots in unknown dynamic multi-obstacle environments,” arXiv preprint arXiv:2402.16449, 2024.
  • [20] A. Safari and J. B. Hoagg, “Time-varying soft-maximum control barrier functions for safety in an a priori unknown environment,” arXiv preprint arXiv:2310.05261, 2023.
  • [21] A. Rahimi and B. Recht, “Random features for large-scale kernel machines,” Advances in neural information processing systems, vol. 20, 2007.
  • [22] L. Lindemann, H. Hu, A. Robey, H. Zhang, D. Dimarogonas, S. Tu, and N. Matni, “Learning hybrid control barrier functions from data,” in Conference on Robot Learning.   PMLR, 2021, pp. 1351–1370.
  • [23] J. V. Frasch, A. Gray, M. Zanon, H. J. Ferreau, S. Sager, F. Borrelli, and M. Diehl, “An auto-generated nonlinear mpc algorithm for real-time obstacle avoidance of ground vehicles,” in 2013 European Control Conference (ECC).   IEEE, 2013, pp. 4136–4141.
  • [24] P. T. An and N. T. Le, “Multiple shooting approach for finding approximately shortest paths for autonomous robots in unknown environments in 2d,” Journal of Combinatorial Optimization, vol. 47, no. 5, pp. 1–32, 2024.
  • [25] A. Barbu and H. Mou, “The compact support neural network,” Sensors, vol. 21, no. 24, p. 8494, 2021.