
Bayesian Meta-Learning on Control Barrier Functions with Data from On-Board Sensors

Wataru Hashimoto, Kazumune Hashimoto, Akifumi Wachi, Xun Shen, Masako Kishida, and Shigemasa Takai

Wataru Hashimoto, Kazumune Hashimoto, Xun Shen, and Shigemasa Takai are with the Graduate School of Engineering, Osaka University, Suita, Japan (e-mail: [email protected], {hashimoto, shenxun, takai}@eei.eng.osaka-u.ac.jp). Akifumi Wachi is with LINE Corporation, Tokyo, Japan (e-mail: [email protected]). Masako Kishida is with the National Institute of Informatics, Tokyo, Japan (e-mail: [email protected]). This work is supported by JST CREST JPMJCR201, Japan, and by JSPS KAKENHI Grant 21K14184.
Abstract

In this paper, we consider a way to safely navigate robots in unknown environments using measurement data from on-board sensory devices. The control barrier function (CBF) is a promising approach to encoding safety requirements of a system, and recent progress on learning-based approaches enables online synthesis of CBF-based safe controllers from sensor measurements. However, the existing methods are inefficient in the sense that the trained CBF cannot be generalized to different environments, so the controller must be re-synthesized whenever the environment changes. This paper therefore presents a way to learn a CBF that can quickly adapt to a new environment with a small amount of data by utilizing a recently developed Bayesian meta-learning framework. As shown in the simulation study, the proposed scheme realizes efficient online synthesis of the controller and provides probabilistic safety guarantees on the resulting controller.

Index Terms:
Control barrier function, learning-based control, meta-learning.

I Introduction

ENSURING safety while achieving certain tasks is a fundamental yet challenging problem that has drawn significant attention from researchers in the control and robotics communities over the past few decades. In the control community, such safety requirements are often handled by imposing corresponding state constraints on an optimal control problem. Model Predictive Control (MPC) [1] and control synthesis methods based on certificate functions such as the Control Lyapunov Function (CLF) and the Control Barrier Function (CBF) [2, 3, 4] are notable examples of such methods. Desirable theoretical results for these methods, such as feasibility, stability, and safety, are offered for known system dynamics [1, 2, 3, 4] or noisy dynamics with known noise distribution [5, 6, 7].

The CBF is a prominent tool to impose forward invariance of a safe set in the state space. If a dynamical system is control-affine, a quadratic programming formulation called CBF-QP [4] is available, which can produce safe control inputs much faster than many other control methods including MPC, and enables real-time implementations for challenging applications such as autonomous vehicles [8] and bipedal locomotion [9]. However, the majority of the existing works presuppose the availability of a valid CBF that represents safe and unsafe regions in the state space. This assumption cannot be maintained if a robot is expected to operate autonomously in unknown or uncertain environments. In such situations, it is essential for a robotic system to automatically identify unsafe regions on the fly, employing data from sensory devices. Thus, in this paper, we consider a learning-based method for constructing a CBF from online measurements obtained from on-board sensors such as LiDAR scanners.

In previous studies along this line of research (e.g., [31, 32, 33]), the robotic system iteratively improves the prediction of the CBF by collecting online sensor measurements without any prior knowledge about the environment or the learning task. However, in such cases, even a slight change in the environment necessitates a complete retraining process, which is inefficient and requires a substantial amount of data. To learn a CBF efficiently and effectively, adapting it to a new environment with a small amount of training data, we propose employing meta-learning [10, 11], a technique that allows an agent to learn how to learn, as opposed to learning from scratch. During the meta-training process, the model parameters are trained with data collected from various different environments, aiming to learn common underlying knowledge or structure of the learning task. The parameters obtained through meta-learning are then used as initial parameters for the online phase in a new environment and updated with online measurements in that environment. In this paper, we specifically employ a learning scheme that combines the recently developed Bayesian meta-learning method [12], which is equipped with probabilistic bounds on the prediction [13], with a Bayesian surface reconstruction method [14, 15, 16] that can learn unsafe regions from noisy sensor measurements. With this method, we can learn a CBF and synthesize a safe controller with a small amount of data and provide a probabilistic guarantee on the resulting controller.

Related works on learning and CBF: Learning-based approaches for certificate functions such as the CBF and CLF form an active research topic at the intersection of control and learning [17]. In the following, we summarize recent progress on learning and CBFs. The works [18, 19, 20, 21, 22, 23] used CBFs for active safe learning of uncertain dynamics. The works [18, 19] consider machine-learning approaches to safely reduce model uncertainty and show that the methods yield empirically good performance, while the Gaussian Process (GP) is used in [20, 21, 22, 23], where rigorous theoretical safety guarantees are offered under certain assumptions. CBFs are also used in Imitation Learning (IL) and Reinforcement Learning (RL) [25, 26, 24] to account for safety concerns.

While the above studies assume a valid CBF is given and leverage it to impose safety conditions, several previous studies also parameterize the CBF itself by a Neural Network (NN) and learn it to be a valid barrier function [27, 28, 29, 30, 31, 32, 33]. The work [27] jointly learns NNs representing the CBF and CLF as well as a control policy from safe and unsafe samples. The method proposed in [28] recovers a CBF from expert demonstrations and theoretically guarantees that the learned NN meets the CBF conditions under Lipschitz continuity assumptions. In [31], a GP is used to synthesize a CBF. The works most related to this paper are [32, 33], which propose ways to learn a CBF from sensor measurements. In [32], a Support Vector Machine (SVM) classifier representing the CBF is trained with safe and unsafe samples constructed from LiDAR measurements. Similarly, the authors of [33] present a synthesis method using the Signed Distance Function (SDF).

Related works on meta-learning and control: Meta-learning, also known as “learning to learn,” is a technique for learning from previous learning experiences or tasks in order to improve the efficiency of future learning [10]. Meta-learning has recently become increasingly popular in the control literature. In several previous studies such as [13, 34], meta-learning is used to learn system dynamics so as to enable quick adaptation to changes in the surrounding situation. In these works, the trained models are incorporated into MPC, which requires relatively heavy online computation. Meta-learning is also used for policy learning in the contexts of adaptive control [35] and reinforcement learning [10, 36].

Contributions: Our contributions compared to the existing methods are threefold. First, compared to the methods that synthesize a CBF from sensor measurements [31, 32, 33], which rely on traditional supervised learning, the proposed meta-learning scheme can effectively use past data collected from different environments and produce a prediction of the CBF with a small amount of data. Consequently, the resulting control scheme realizes less conservative control performance with a small amount of data (a small number of online CBF updates), as we show in the case study. Second, different from the previous NN-based CBF synthesis methods [27, 29, 30, 32, 33], our method can readily take into account uncertainty in the data and compute formal probabilistic bounds on the CBF. Based on this, we provide a probabilistic safety guarantee for the proposed control scheme. Although the work [33] takes the prediction error in the SDF into account, the error bound is assumed to be given and its actual computation is not provided. Third, the learned CBF can readily be incorporated into an ordinary QP formulation, which can be solved efficiently. Thus, our method can produce control inputs much faster than previous methods combining meta-learning and MPC such as [13, 34].

Notations: Continuous functions $\alpha_{1}:\mathbb{R}_{\geq 0}\rightarrow\mathbb{R}_{\geq 0}$ and $\alpha_{2}:\mathbb{R}\rightarrow\mathbb{R}$ are class $\mathcal{K}$ and extended class $\mathcal{K}_{\infty}$ functions, respectively, if they are strictly increasing with $\alpha_{1}(0)=0$ and $\alpha_{2}(0)=0$. $\chi_{d}^{2}(p)$ is the $p$-th quantile of the $\chi^{2}$ distribution with $d$ degrees of freedom. The maximum and minimum eigenvalues of a positive definite matrix $A$ are denoted by $\bar{\lambda}(A)$ and $\underline{\lambda}(A)$, respectively. For a vector field $f$ and a function $F$, $L_{f}F(x):=\frac{\partial F(x)}{\partial x}f(x)$ is the Lie derivative of $F$ in the direction of $f$.

II Problem Statement

Throughout this paper, we consider the following control affine system:

$\dot{x}=f(x)+g(x)u,$  (1)

where $x\in\mathcal{D}\subset\mathbb{R}^{n}$ and $u\in\mathcal{U}:=\{u\in\mathbb{R}^{m}\mid Au\leq b\}$ with $A\in\mathbb{R}^{m\times m}$ and $b\in\mathbb{R}^{m}$ are the admissible system states and control inputs, respectively. The functions $f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ and $g:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n\times m}$ are continuously differentiable and represent the dynamics of a robotic system (e.g., a ground vehicle or drone) equipped with sensors such as LiDAR. The state space is assumed to contain $N_{\mathrm{obs}}$ initially unknown obstacles, and the outside of each obstacle is defined by the following:

$\mathcal{C}_{\xi_{i}}=\{x\in\mathcal{D}\subset\mathbb{R}^{n}\mid h(x;\xi_{i})\geq 0\},\ i\in\mathbb{N}_{1:N_{\mathrm{obs}}},$  (2)

where $h(\cdot;\xi):\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a continuously differentiable function with an unobserved latent variable $\xi$ that encodes the geometric information about an obstacle, e.g., its position and shape. We assume the realizations $\xi_{i}$, $i\in\mathbb{N}_{1:N_{\mathrm{obs}}}$, are sampled from a probability distribution $p(\xi)$ and fixed during the control execution. For simplicity, the obstacles are assumed not to overlap with each other.

Our goal is to synthesize a controller that achieves a given task (e.g., a goal-reaching task) without deviating from the safe region $\mathcal{C}=\bigcap_{i=1}^{N_{\mathrm{obs}}}\mathcal{C}_{\xi_{i}}$, given an initial state $x(0)\in\mathcal{C}$. Since the safe/unsafe regions are initially unknown, the robotic system needs to identify them from sensor measurements. A naive approach toward this end is supervised learning on measurement data obtained in a pre-determined environment. In this case, however, the trained model cannot be generalized to different environments, and even slight changes in the environment necessitate a complete re-fitting of the model, which is not only inefficient but also demands a substantial amount of data. Thus, this paper considers a way to effectively use past data from the different environments sampled from the distribution $p(\xi)$ to quickly learn the safe/unsafe regions in a new environment and synthesize a safe controller with a small amount of online data.

In the following, we first introduce the control notions suited for our purpose, the Control Barrier Function (CBF) and the Control Lyapunov Function (CLF), in Section III. Then, the meta-learning procedure for the CBF and a way to synthesize a QP with the learned CBF are discussed in Sections IV and V.

III CBF-CLF-QP

In this section, we first introduce the Zeroing Control Barrier Function (ZCBF) as a tool to enforce safety on the system (1). The following discussion summarizes the works [2, 4]. For a ZCBF, safety is defined as forward invariance of a safe set $\mathcal{C}$ given by the zero super-level set of a function $h:\mathbb{R}^{n}\rightarrow\mathbb{R}$; i.e., the system (1) is considered safe if $x(t)\in\mathcal{C}:=\{x\mid h(x)\geq 0\}$ holds for all $t\geq 0$ whenever $x(0)\in\mathcal{C}$. Given an extended class $\mathcal{K}_{\infty}$ function $\alpha_{C}:\mathbb{R}\rightarrow\mathbb{R}$, the ZCBF is defined as follows.

Definition 1.

Let $\mathcal{C}\subset\mathcal{D}\subset\mathbb{R}^{n}$ be the zero super-level set of a continuously differentiable function $h:\mathbb{R}^{n}\rightarrow\mathbb{R}$. Then, the function $h$ is a zeroing control barrier function (ZCBF) for (1) on $\mathcal{C}$ if there exists an extended class $\mathcal{K}_{\infty}$ function $\alpha_{C}$ such that the following inequality holds for all $x\in\mathcal{D}$:

$\sup_{u\in\mathbb{R}^{m}}\left\{L_{f}h(x)+L_{g}h(x)u+\alpha_{C}(h(x))\right\}\geq 0.$  (3)

We additionally define the set of control inputs that satisfy the CBF condition as follows:

$U_{C}(x)=\{u\in\mathbb{R}^{m}\mid L_{f}h(x)+L_{g}h(x)u+\alpha_{C}(h(x))\geq 0\}.$

If the function $h$ is a valid CBF for (1) on $\mathcal{C}$, the safety of the system (1) is guaranteed by the following theorem.

Theorem 1 ([4]).

Let $\mathcal{C}\subset\mathcal{D}\subset\mathbb{R}^{n}$ be the zero super-level set of a continuously differentiable function $h:\mathbb{R}^{n}\rightarrow\mathbb{R}$, and let $h$ be a ZCBF for (1). Then, any policy that selects control inputs from $U_{C}(x)$ renders the safe set $\mathcal{C}$ forward invariant for the system (1).

In this paper, the control objectives other than the safety requirements are encoded through a control Lyapunov function (CLF), which is given in advance. With a class $\mathcal{K}$ function $\alpha_{V}$, the definition of a CLF is formally given as follows [2].

Definition 2.

A continuously differentiable function $V:\mathbb{R}^{n}\rightarrow\mathbb{R}_{\geq 0}$ is a CLF if $V(x_{e})=0$ for an equilibrium point $x_{e}\in\mathcal{D}$ and there exists a class $\mathcal{K}$ function $\alpha_{V}:\mathbb{R}\rightarrow\mathbb{R}$ such that the following inequality holds for all $x\in\mathcal{D}\subset\mathbb{R}^{n}$:

$\inf_{u\in\mathbb{R}^{m}}\left\{L_{f}V(x)+L_{g}V(x)u+\alpha_{V}(V(x))\right\}\leq 0.$  (4)

Given a CLF $V$, we consider the set of all control inputs that satisfy (4) for a state $x\in\mathcal{D}$ as

$U_{V}(x)=\{u\in\mathbb{R}^{m}\mid L_{f}V(x)+L_{g}V(x)u+\alpha_{V}(V(x))\leq 0\}.$

If a function $V$ is a CLF, the stability of the equilibrium point $x_{e}$ is guaranteed by the following theorem.

Theorem 2 ([3]).

Let a continuously differentiable function $V:\mathbb{R}^{n}\rightarrow\mathbb{R}_{\geq 0}$ be a CLF for (1). Then, any policy that selects control inputs from $U_{V}(x)$ renders the equilibrium point $x_{e}$ of the system (1) asymptotically stable.

Having the CBF and CLF defined above, we can synthesize a controller satisfying the CBF and CLF conditions (3) and (4) via the following quadratic program (QP):

$\min_{(u,\epsilon)\in\mathbb{R}^{m+1}}\ \frac{1}{2}u^{\top}Hu+\lambda\epsilon^{2},$  (5a)
$\mathrm{s.t.}\ \ L_{f}V(x)+L_{g}V(x)u+\alpha_{V}(V(x))\leq\epsilon,$  (5b)
$\ \ \ \ \ \ \ L_{f}h(x)+L_{g}h(x)u+\alpha_{C}(h(x))\geq 0,$  (5c)
$\ \ \ \ \ \ \ Au\leq b,$  (5d)

where $H\in\mathbb{R}^{m\times m}$ is a positive definite matrix and $\lambda>0$ is a tunable parameter for relaxing the CLF condition (5b). The relaxation of the CLF condition ensures the solvability of the QP. If safety is defined by multiple CBFs, we can handle them by adding the corresponding CBF constraints to the optimization problem.
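To make the QP (5) concrete, the following sketch solves it for a hypothetical 2-D single integrator $\dot{x}=u$ (so $f=0$, $g=I$) with a known circular-obstacle CBF and a quadratic CLF. The dynamics, obstacle, and gains are illustrative assumptions rather than the paper's setup, and SciPy's SLSQP stands in for a dedicated QP solver.

```python
import numpy as np
from scipy.optimize import minimize

def cbf_clf_qp(x, x_goal, x_obs, r_obs, alpha_c=1.0, alpha_v=1.0, lam=10.0):
    """Solve the CBF-CLF-QP (5) for a 2-D single integrator x_dot = u."""
    H = np.eye(2)
    V = np.sum((x - x_goal) ** 2)              # CLF: V(x) = ||x - x_goal||^2
    h = np.sum((x - x_obs) ** 2) - r_obs ** 2  # CBF: h(x) = ||x - x_obs||^2 - r^2
    gradV = 2.0 * (x - x_goal)                 # L_g V (L_f V = 0 since f = 0)
    gradh = 2.0 * (x - x_obs)                  # L_g h

    def cost(zv):                              # decision variable zv = (u1, u2, eps)
        u, eps = zv[:2], zv[2]
        return 0.5 * u @ H @ u + lam * eps ** 2

    cons = [
        # relaxed CLF condition (5b): gradV . u + alpha_v * V <= eps
        {"type": "ineq", "fun": lambda zv: zv[2] - gradV @ zv[:2] - alpha_v * V},
        # CBF condition (5c): gradh . u + alpha_c * h >= 0
        {"type": "ineq", "fun": lambda zv: gradh @ zv[:2] + alpha_c * h},
    ]
    res = minimize(cost, np.zeros(3), constraints=cons, method="SLSQP")
    return res.x[:2]

# Robot at the origin, goal behind an obstacle of radius 1 centered at (2, 0):
# the CBF constraint caps the forward velocity so the safe set is not left.
u = cbf_clf_qp(x=np.array([0.0, 0.0]), x_goal=np.array([4.0, 0.0]),
               x_obs=np.array([2.0, 0.0]), r_obs=1.0)
```

Here the CBF constraint is active and clips the goal-seeking input; input bounds (5d) could be added as further linear `ineq` constraints.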

IV Bayesian Meta-Learning of CBF

In this section, we consider learning the safe region (2) from measurement data provided by sensor devices. To this end, we employ a technique based on Gaussian process implicit surfaces (GPIS) [14, 15, 16], developed for non-parametric probabilistic reconstruction of object surfaces. Note that we consider the case of $N_{\mathrm{obs}}=1$ for now; the case of multiple obstacles is discussed in the next section. In this paper, an implicit surface (IS) is defined by the distance from the surface of an object to a point $z$ in 2D or 3D space, and the inside and outside of the object surface are distinguished through the sign of the function as follows:

$h_{\mathrm{IS}}(z;\xi)=\begin{cases}d_{\xi}(z)&\text{if $z$ is outside the obstacle}\\ 0&\text{if $z$ is on the surface}\\ -d_{\xi}(z)&\text{if $z$ is inside the obstacle}\end{cases}$  (9)

where $z\in\mathbb{R}^{b}$, $b\in\{2,3\}$, is a position in 2D or 3D space and $d_{\xi}:\mathbb{R}^{b}\rightarrow\mathbb{R}_{\geq 0}$ is the function that returns the minimum Euclidean distance between $z$ and the surface of the obstacle corresponding to $\xi\sim p(\xi)$. Though this paper focuses on a 2D environment, our method can easily be extended to the 3D case. In GPIS, the function $h_{\mathrm{IS}}$ is estimated with a Gaussian process (GP), which enables us to deal with uncertainty arising from measurement noise and scarcity of data. The GP is estimated with a dataset $\{(z^{i},h_{\mathrm{IS}}(z^{i};\xi)+\varsigma_{i})\}_{i=1}^{n_{\mathrm{data}}}$, where $\varsigma_{i}$ is bounded $\sigma$-sub-Gaussian noise and $n_{\mathrm{data}}$ is the number of data points.
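As a concrete instance of (9), for a disk-shaped obstacle the implicit surface reduces to the signed distance to the circle; the disk parameterization of $\xi$ by a center and radius below is a hypothetical example, not the paper's choice.

```python
import numpy as np

def h_is_disk(z, center, radius):
    """Implicit surface (9) for a disk obstacle with (hypothetical) xi = (center, radius).
    ||z - center|| - radius equals d_xi(z) outside, 0 on the surface, and
    -d_xi(z) inside, matching the three cases of (9)."""
    return np.linalg.norm(np.asarray(z, dtype=float)
                          - np.asarray(center, dtype=float)) - radius

d_out = h_is_disk([3.0, 0.0], [0.0, 0.0], 1.0)  # two units outside the surface
```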

In the following, we explain how to construct a dataset and apply Bayesian meta-learning to the training of $h_{\mathrm{IS}}$.

IV-A Dataset Construction

Figure 1: The construction of data points from noisy sensor measurements: the gray dots are the points on the surface detected by the sensors. The green, blue, and red dots represent the points at distances $2\Delta$, $\Delta$, and $-\Delta$ from the surface (a negative value means the inside of the obstacle), which are calculated using the surface normal approximation explained in Section IV-A (we show the case of $n_{-}=1$ and $n_{+}=2$).

In this subsection, we explain how to construct a dataset for learning the implicit surface (9) from the measurement data of sensors such as LiDAR. A pictorial illustration of obtaining training data from noisy sensor measurements is shown in Fig. 1. First, assume we have a point cloud on the surface $\mathcal{P}^{0}=\cup_{i=1}^{n_{\mathrm{surf}}}\{z^{0,i}\}$, where $n_{\mathrm{surf}}\in\mathbb{Z}_{>0}$ is the number of surface points and $z^{0,i}\in\mathbb{R}^{2}$, $\forall i\in\mathbb{N}_{1:n_{\mathrm{surf}}}$, are the world Cartesian coordinates of the surface points computed from the raw depth readings produced by the sensors and the current position of the robot. Then, for each surface point $z^{0,i}$, the normal to the surface of the obstacle at that point is approximated by the perpendicular to the segment between the point and its nearest neighbor within the same scan [16]. Under the assumption that the sensor measurements on the surface are dense enough, this method yields accurate approximations. For each surface point $z^{0,i}$ and the corresponding approximated surface normal, we construct a set of points $\cup_{p=-n_{-}}^{n_{+}}\{z^{p\Delta,i}\}$ with $\Delta\in\mathbb{R}_{>0}$ and $n_{+},n_{-}\in\mathbb{Z}_{\geq 0}$, where $z^{p\Delta,i}$, $\forall p\in\mathbb{Z}_{-n_{-}:n_{+}}$, are the points that are $p\Delta$ away from the surface of the obstacle (see Fig. 1). A negative $p$ means that the point $z^{p\Delta,i}$ is inside the obstacle. Then, we construct the dataset to train $h_{\mathrm{IS}}$ as $D=\cup_{i=1}^{n_{\mathrm{surf}}}\cup_{p=-n_{-}}^{n_{+}}\{(z^{p\Delta,i},p\Delta)\}$.
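The steps above can be sketched in a few lines. This is a minimal 2-D implementation under the stated assumptions (dense scans, non-overlapping obstacles); the nearest-neighbor normal approximation and the orientation of the normal toward the sensor follow the description in this subsection, while the function name and defaults are illustrative.

```python
import numpy as np

def build_dataset(surface_pts, robot_pos, delta=0.1, n_minus=1, n_plus=2):
    """Sketch of the dataset construction in Sec. IV-A: for each surface point,
    approximate the normal as the perpendicular to the segment joining the
    point and its nearest neighbour in the same scan, orient it toward the
    robot (outward), and label points at p*delta along the normal with p*delta."""
    surface_pts = np.asarray(surface_pts, dtype=float)
    robot_pos = np.asarray(robot_pos, dtype=float)
    data = []
    for i, z0 in enumerate(surface_pts):
        others = np.delete(surface_pts, i, axis=0)
        nn = others[np.argmin(np.linalg.norm(others - z0, axis=1))]
        tangent = nn - z0
        normal = np.array([-tangent[1], tangent[0]])   # perpendicular in 2-D
        normal = normal / np.linalg.norm(normal)
        if normal @ (robot_pos - z0) < 0:              # point outward (toward sensor)
            normal = -normal
        for p in range(-n_minus, n_plus + 1):
            data.append((z0 + p * delta * normal, p * delta))
    return data

# Flat wall along y = 0 seen by a robot above it: labels are signed offsets.
data = build_dataset([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
                     robot_pos=[1.0, 5.0], delta=0.1, n_minus=1, n_plus=2)
```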

IV-B Bayesian meta-learning of the function $h_{\mathrm{IS}}$

Figure 2: A summary of the proposed approach: the proposed method consists of offline meta-training and online control execution. The offline training follows the procedures in Section IV-B2. In each iteration, we first sample $J$ obstacles from the distribution $p(\xi)$ and construct the datasets $D_{\xi_{j}}^{\mathrm{tr}}$ and $D_{\xi_{j}}^{\mathrm{ts}}$ for each of these obstacles. Then, the loss (13) of the prediction made by the model adapted from $D_{\xi_{j}}^{\mathrm{tr}}$ is evaluated on the test data $D_{\xi_{j}}^{\mathrm{ts}}$, and the parameters are updated with the gradients of the loss. In the online control execution, the control input is generated by the CBF-CLF-QP (5) with the CBFs obtained through the adaptation of the posterior parameters (11) and the probabilistic bounds (17). The dataset for the adaptation is constructed from the sensor measurements obtained during operation by following the procedures in Section IV-A and the data selection scheme discussed in Section V-A.

When the IS function is trained only online, the efficiency of the training may be problematic, while a model trained only offline is not capable of generalizing to new environments. Thus, instead of relying on either online or offline training alone, we employ a meta-learning [10] scheme, which enables quick prediction of $h_{\mathrm{IS}}$ with a small amount of online data by effectively using data from different settings. In the following, we first introduce the Bayesian meta-learning framework ALPaCA (Adaptive Learning for Probabilistic Connectionist Architectures) [12]. The probabilistic bounds on the prediction are subsequently discussed.

IV-B1 Overview of ALPaCA

We first parameterize the prediction of the function $h_{\mathrm{IS}}$ in the following form:

$\hat{h}_{\mathrm{IS}}(z;\theta)=\theta^{\top}\phi_{w}(z),$  (10)

where $\phi_{w}:\mathbb{R}^{2}\rightarrow\mathbb{R}^{d}$ represents a feed-forward neural network with parameters $w$, and $\theta\in\mathbb{R}^{d}$ is a coefficient vector that follows a Gaussian distribution $\mathcal{N}(\bar{\theta},\sigma^{2}\Lambda^{-1})$, which encodes the information and uncertainty associated with the unknown variable $\xi\sim p(\xi)$. Here, $\bar{\theta}\in\mathbb{R}^{d}$ denotes the mean parameters and $\Lambda\in\mathbb{R}^{d\times d}$ is the positive definite precision matrix. Once the parameters of $\phi_{w}$ and the prior distribution, $(w,\bar{\theta}_{0},\Lambda_{0})$, are determined through the offline meta-training procedure explained later, and new measurement data for a fixed $\xi\sim p(\xi)$ is given as $D_{\xi}=\{(z_{i},y_{i})\}_{i=1}^{n_{\mathrm{adapt}}}$, the posterior distribution on $\theta$ is obtained as follows:

$\Lambda_{\xi}=\Phi_{\xi}^{\top}\Phi_{\xi}+\Lambda_{0},\quad \bar{\theta}_{\xi}=\Lambda_{\xi}^{-1}(\Phi_{\xi}^{\top}G_{\xi}+\Lambda_{0}\bar{\theta}_{0}),$  (11)

where $G_{\xi}^{\top}=[y_{1},y_{2},\ldots,y_{n_{\mathrm{adapt}}}]\in\mathbb{R}^{n_{\mathrm{adapt}}}$ and $\Phi_{\xi}^{\top}=[\phi_{w}(z_{1}),\phi_{w}(z_{2}),\ldots,\phi_{w}(z_{n_{\mathrm{adapt}}})]\in\mathbb{R}^{d\times n_{\mathrm{adapt}}}$. The above computation is based on Bayesian linear regression. Here, the mean and precision matrices are subscripted by $\xi$ to indicate explicitly that they are calculated from the data associated with the unknown variable $\xi\sim p(\xi)$. Then, for any given $z\in\mathbb{R}^{2}$, the mean and variance of the posterior predictive distribution are obtained as

μξ(z)=θ¯ξϕw(z),Σξ(z)=σ2(1+ϕw(z)Λξ1ϕw(z)).\displaystyle\mu_{\xi}(z)=\bar{\theta}_{\xi}^{\top}\phi_{w}(z),\ \Sigma_{\xi}(z)=\sigma^{2}(1+\phi_{w}(z)^{\top}\Lambda_{\xi}^{-1}\phi_{w}(z)). (12)

IV-B2 Offline meta-training procedures

In the offline meta-training procedure, the parameters $(w,\bar{\theta}_{0},\Lambda_{0})$ are trained so that the online adaptation discussed above yields accurate predictions (the pseudo-code of the offline training is summarized in Algorithm 2 in Appendix -A). To this end, we iterate the following procedure a prescribed $N_{\mathrm{ite}}\in\mathbb{Z}_{>0}$ times. First, we prepare $J\in\mathbb{Z}_{>0}$ datasets by sampling obstacles $\xi_{1},\xi_{2},\ldots,\xi_{J}$ from the distribution $p(\xi)$ and collecting the data for each obstacle as $D_{\xi_{j}}=\cup_{i=1}^{n_{j}}\{(z_{i}^{j},y_{i}^{j})\}$, where $n_{j}$ is the number of data points in $D_{\xi_{j}}$. Then, we randomly split each dataset $D_{\xi_{j}}$ into a training set $D_{\xi_{j}}^{\mathrm{tr}}=\cup_{i=1}^{n_{j}^{\mathrm{tr}}}\{(z_{i}^{\mathrm{tr},j},y_{i}^{\mathrm{tr},j})\}$ and a test set $D_{\xi_{j}}^{\mathrm{ts}}=\cup_{i=1}^{n_{j}^{\mathrm{ts}}}\{(z_{i}^{\mathrm{ts},j},y_{i}^{\mathrm{ts},j})\}$, where $n_{j}^{\mathrm{tr}}$ and $n_{j}^{\mathrm{ts}}$ are the numbers of data points in $D_{\xi_{j}}^{\mathrm{tr}}$ and $D_{\xi_{j}}^{\mathrm{ts}}$ with $n_{j}^{\mathrm{tr}}+n_{j}^{\mathrm{ts}}=n_{j}$. In each iteration, $n_{j}^{\mathrm{tr}}$ is randomly chosen from the uniform distribution over $\{1,\ldots,n_{j}\}$. The data in $D_{\xi_{j}}^{\mathrm{tr}}$ are used to calculate the posterior distribution of the parameter $\theta_{\xi_{j}}$ (i.e., the mean $\bar{\theta}_{\xi_{j}}$ and precision matrix $\Lambda_{\xi_{j}}$) through (11), while the data in $D_{\xi_{j}}^{\mathrm{ts}}$ are used to evaluate the accuracy of the model (10) with the posterior parameters $\bar{\theta}_{\xi_{j}}$ and $\Lambda_{\xi_{j}}$ calculated above. We define the meta-learning objective via the marginal log-likelihood of the test data and update the parameters through stochastic gradient descent:

$\ell(\bar{\theta}_{0},\Lambda_{0},w):=-\sum_{j=1}^{J}\sum_{i=1}^{n_{j}^{\mathrm{ts}}}\log p(y_{i}^{\mathrm{ts},j}\mid z_{i}^{\mathrm{ts},j})$
$=\frac{1}{2}\sum_{j=1}^{J}\sum_{i=1}^{n_{j}^{\mathrm{ts}}}\Big(\log\big(1+\phi_{w}(z_{i}^{\mathrm{ts},j})^{\top}\Lambda_{\xi_{j}}^{-1}\phi_{w}(z_{i}^{\mathrm{ts},j})\big)+\big(y_{i}^{\mathrm{ts},j}-\bar{\theta}_{\xi_{j}}^{\top}\phi_{w}(z_{i}^{\mathrm{ts},j})\big)^{\top}\Sigma_{\xi_{j}}^{-1}(z_{i}^{\mathrm{ts},j})\big(y_{i}^{\mathrm{ts},j}-\bar{\theta}_{\xi_{j}}^{\top}\phi_{w}(z_{i}^{\mathrm{ts},j})\big)\Big)+\mathrm{const.}$  (13)

IV-B3 Probabilistic bounds on CBF

From the discussion above, once the offline meta-training procedure of Section IV-B2 has been completed and new measurements are obtained during the online execution, we can compute the predictive distribution of the function value $h_{\mathrm{IS}}(z;\xi)$ for any $z\in\mathbb{R}^{2}$ and $\xi\sim p(\xi)$ by (12). Here, we consider deriving a deterministic function with probabilistic guarantees, which can readily be incorporated into the QP formulation (5). The following discussion follows [13]. Before deriving the probabilistic bounds on the prediction of $h_{\mathrm{IS}}$, we make the following two assumptions on the quality of the offline meta-training, both of which are realistic and intuitive.

Assumption 1.

For all $\xi\sim p(\xi)$, there exists $\theta^{*}_{\xi}\in\mathbb{R}^{d}$ such that

${\theta^{*}_{\xi}}^{\top}\phi_{w}(z)=h_{\mathrm{IS}}(z;\xi),\quad\forall z\in\mathbb{R}^{2}.$  (14)
Assumption 2.

For all $\xi\sim p(\xi)$, we have the following:

(θξθ¯0Λi2σ2𝒳d2(1δ))1δ.\displaystyle\mathbb{P}(\|\theta^{*}_{\xi}-\bar{\theta}_{0}\|_{\Lambda_{i}}^{2}\leq\sigma^{2}\mathcal{X}_{d}^{2}(1-\delta))\geq 1-\delta. (15)

Assumption 1 implies that the meta-learning model (10) is capable of fitting the true function, while Assumption 2 states that the uncertainty in the prior is conservative enough. Under these assumptions, we can derive bounds on the function $h_{\mathrm{IS}}$ as follows.

Theorem 3.

Suppose the offline meta-training of the parameters $w$, $\bar{\theta}_{0}$, and $\Lambda_{0}$ is performed so that Assumptions 1 and 2 are satisfied, and the parameters defining the posterior distribution, $\bar{\theta}_{\xi}$ and $\Lambda_{\xi}$, are computed by (11) with data from the new environment $\xi\sim p(\xi)$. Then, the absolute difference between the true function value $h_{\mathrm{IS}}(z;\xi)$ and the mean prediction $\bar{h}_{\mathrm{IS}}(z;\theta_{\xi})=\bar{\theta}_{\xi}^{\top}\phi_{w}(z)$ with $\theta_{\xi}\sim\mathcal{N}(\bar{\theta}_{\xi},\Lambda_{\xi})$, for all $z\in\mathbb{R}^{2}$ and $\xi\sim p(\xi)$, is bounded as follows:

|hIS(z;ξ)h¯IS(z;θξ)||ϕ(z)(θ¯ξθξ)|ϕ(z)Λξ1βξ,\displaystyle|h_{\mathrm{IS}}(z;\xi)-\bar{h}_{\mathrm{IS}}(z;\theta_{\xi})|\leq|\phi(z)^{\top}(\bar{\theta}_{\xi}-\theta^{*}_{\xi})|\leq\|\phi(z)\|_{\Lambda_{\xi}^{-1}}\beta_{\xi}, (16)

with

$\beta_{\xi}=\sigma\left(\sqrt{2\log\left(\frac{1}{\delta}\frac{\det(\Lambda_{\xi})^{1/2}}{\det(\Lambda_{0})^{1/2}}\right)}+\sqrt{\frac{\bar{\lambda}(\Lambda_{0})}{\underline{\lambda}(\Lambda_{\xi})}\chi_{d}^{2}(1-\delta)}\right),$

with probability at least $1-2\delta$.

Proof.

The proof follows from [13] and [37]. In the course of the proof of Theorem 1 in [13], it is shown that $|a^{\top}(\bar{\theta}_{\xi}-\theta^{*}_{\xi})|\leq\|a\|_{\Lambda_{\xi}^{-1}}\beta_{\xi}$ holds for any $a\in\mathbb{R}^{d}$ with probability at least $1-2\delta$ (see p. 18 of [13]). Then, (16) is obtained by substituting $\phi_{w}(z)$ for $a$ in the inequality. ∎

Since the difference between the true value and the mean prediction is bounded by $\|\phi_{w}(z)\|_{\Lambda_{\xi}^{-1}}\beta_{\xi}$, we can provide a lower bound on $h_{\mathrm{IS}}$ at a robot state $x$ as follows:

$h^{b}(x;\theta_{\xi}):=\bar{h}_{\mathrm{IS}}(v(x);\theta_{\xi})-\|\phi_{w}(v(x))\|_{\Lambda_{\xi}^{-1}}\beta_{\xi},$  (17)

where $v:\mathbb{R}^{n}\rightarrow\mathbb{R}^{2}$ is the function that maps the robot's state to its position in 2D space. By using $h^{b}$ as the CBF, we can provide a probabilistic guarantee on the resulting controller, as elaborated in Section V.
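Computing the certified lower bound (17) only needs the posterior quantities and the scalar $\beta_{\xi}$ from Theorem 3; a sketch using SciPy's chi-squared quantile for $\chi_{d}^{2}(1-\delta)$, with the numeric inputs chosen purely for illustration:

```python
import numpy as np
from scipy.stats import chi2

def beta_xi(Lam0, Lam_xi, sigma, delta):
    """Confidence scaling beta_xi from Theorem 3."""
    d = Lam0.shape[0]
    # 0.5 * log(det(Lam_xi) / det(Lam0)), computed stably via slogdet
    half_logdet_ratio = 0.5 * (np.linalg.slogdet(Lam_xi)[1]
                               - np.linalg.slogdet(Lam0)[1])
    t1 = np.sqrt(2.0 * (np.log(1.0 / delta) + half_logdet_ratio))
    # lambda_max(Lam0) / lambda_min(Lam_xi) * chi2 quantile
    t2 = np.sqrt(np.linalg.eigvalsh(Lam0)[-1] / np.linalg.eigvalsh(Lam_xi)[0]
                 * chi2.ppf(1.0 - delta, d))
    return sigma * (t1 + t2)

def h_lower(phi_vx, theta_bar, Lam_xi, beta):
    """Probabilistic lower bound (17): mean minus beta * ||phi||_{Lam_xi^{-1}}."""
    mean = theta_bar @ phi_vx
    width = np.sqrt(phi_vx @ np.linalg.solve(Lam_xi, phi_vx))
    return mean - beta * width

b = beta_xi(np.eye(2), 4.0 * np.eye(2), sigma=1.0, delta=0.1)
hb = h_lower(np.array([1.0, 0.0]), np.array([2.0, 0.0]), 4.0 * np.eye(2), b)
```

The bound shrinks toward the mean prediction as more data sharpens $\Lambda_{\xi}$, since the width term decays with the posterior precision.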

V Online Control Execution

Input: $f$, $g$ (system dynamics model); $p(\xi)$ (distribution of obstacles); $(w,\bar{\theta}_{0},\Lambda_{0})$ (prior parameters); $\delta$ (confidence level); $V$ (CLF); $T$ (execution time)
1: $\xi_{i}\sim p(\xi)$, $\mathcal{B}_{i}\leftarrow\emptyset$, $\forall i\in\mathbb{N}_{1:N_{\mathrm{obs}}}$;
2: while $t<T$ do
3:  if $t=k\Delta_{\mathrm{lidar}}$, $k\in\mathbb{Z}_{\geq 0}$ then
4:   [CBF update]
5:   for $i=1:N_{\mathrm{obs}}$ do
6:    Obtain sensor measurements $\mathcal{P}^{0}_{i}$;
7:    Construct dataset $D_{i}$ by the procedures in Sec. IV-A;
8:    Update the buffer $\mathcal{B}_{i}$ by the procedures in Sec. V-A;
9:    if $\mathcal{B}_{i}\neq\emptyset$ then
10:     Compute the posterior parameters $\bar{\theta}_{\xi_{i}}$ and $\Lambda_{\xi_{i}}$ by (11);
11:     Set $h^{b}(\cdot;\theta_{\xi_{i}})=h^{b}_{\mathrm{IS}}(v(\cdot);\theta_{\xi_{i}})$ by (17);
12:  [Control execution]
13:  Solve the QP (5) with the CBFs obtained above;
14:  Solve (1) and update the state $x(t)$;
Algorithm 1 Online control execution

Given the system (1), the meta-learned parameters $(w,\bar{\theta}_{0},\Lambda_{0})$, and a CLF $V$ that encodes the control objective, the procedures of the online control execution are summarized in Algorithm 1. Before the execution, an environment is determined by sampling the obstacles $\xi_{i}$, $\forall i\in\mathbb{N}_{1:N_{\mathrm{obs}}}$, from the distribution $p(\xi)$, and the data buffers $\mathcal{B}_{i}$ are initialized as empty sets (Line 1). Then, the control is executed by repeatedly solving the QP (5) with the CBFs (17) defined using the current posteriors $\bar{\theta}_{\xi_{i}}$ and $\Lambda_{\xi_{i}}$ calculated from the data in the current buffers (Lines 13 and 14). The buffers $\mathcal{B}_{i}$ and the posterior parameters $\bar{\theta}_{\xi_{i}}$ and $\Lambda_{\xi_{i}}$, $\forall i\in\mathbb{N}_{1:N_{\mathrm{obs}}}$, are updated every $\Delta_{\mathrm{lidar}}\in\mathbb{R}_{>0}$ [s] through the procedures in the following subsection (Lines 3-11). Since each buffer $\mathcal{B}_{i}$ is empty at the beginning of the execution, we ignore the CBF constraint for $\xi_{i}$ until surface points of the corresponding obstacle are detected. To justify this, we make the following assumption.
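The QP (5) itself is not reproduced in this section, but the standard CBF-CLF-QP combines a relaxed CLF decrease condition with a hard CBF condition. The following is a minimal sketch of one control step under that standard form, using a generic nonlinear solver instead of a dedicated QP solver; the objective weighting, constraint shapes, and all names are assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import minimize

def cbf_clf_qp_step(x, f, g, h, grad_h, V, grad_V,
                    gamma_C=1.0, gamma_V=1.0, lam=10.0):
    """One step of a generic CBF-CLF-QP: minimize ||u||^2 + lam*delta^2
    subject to a relaxed CLF condition and a hard CBF condition."""
    fx, gx = f(x), g(x)
    dh, dV = grad_h(x), grad_V(x)

    def cost(z):  # z = [u_1, u_2, delta]
        return z[0]**2 + z[1]**2 + lam * z[2]**2

    cons = [
        # CBF condition: dh/dx . (f(x) + g(x) u) + gamma_C * h(x) >= 0
        {"type": "ineq",
         "fun": lambda z: dh @ (fx + gx @ z[:2]) + gamma_C * h(x)},
        # relaxed CLF condition: dV/dx . (f(x) + g(x) u) + gamma_V * V(x) <= delta
        {"type": "ineq",
         "fun": lambda z: -(dV @ (fx + gx @ z[:2]) + gamma_V * V(x)) + z[2]},
    ]
    res = minimize(cost, np.zeros(3), constraints=cons, method="SLSQP")
    return res.x[:2]  # control input u = [v, omega]
```

In practice a dedicated QP solver would be used for speed, but the constraint structure is the same.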

Assumption 3.

The range of the laser is long enough and the measurements are taken frequently enough that control invariance of the safe region is maintained even if the unobserved obstacles are ignored in the current QP.

In the following subsections, we discuss the update scheme of each buffer i\mathcal{B}_{i} and the theoretical results of the proposed control scheme.

V-A The update scheme of each buffer i\mathcal{B}_{i}

The update procedures for each buffer $\mathcal{B}_{i}$ are discussed here. We omit the subscript $i$ in the following discussion for simplicity of notation. At time instance $k$, suppose we have new data points $D_{k}=\cup_{i=1}^{n_{k}}\cup_{p=n_{-}}^{n_{+}}\{(z_{k}^{p\Delta,i},p\Delta)\}$ obtained by following the procedures in Section IV-A, and the buffer $\mathcal{B}$ constructed from the data points obtained at the past time instances $0$ to $k-1$. Intuitively, if the time interval of updating the dataset is small, $D_{k}$ will include many data points that are similar to those obtained at past instances. Thus, it is important to select only the informative data points before adding them to the buffer. Here, we consider a data selection scheme for this purpose based on [39]. We repeat the following procedures for $i\in\mathbb{N}_{1:n_{k}}$. First, we compute the predictive variance $\Sigma_{\xi}(z_{k}^{0,i})$ at a data point on the surface $z_{k}^{0,i}$, with the posterior parameters calculated from the current $\mathcal{B}$. Then, if the value $\Sigma_{\xi}(z_{k}^{0,i})$ is larger than a prescribed threshold $\eta\in\mathbb{R}_{\geq 0}$, the data points $\cup_{p=n_{-}}^{n_{+}}\{(z_{k}^{p\Delta,i},p\Delta)\}$ are considered informative for the current model and thus added to the buffer $\mathcal{B}$.
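A minimal sketch of this selection rule is given below. It assumes the predictive variance takes the standard Bayesian linear regression form $\Sigma_{\xi}(z)=\sigma^{2}(1+\phi(z)^{\top}\Lambda_{\xi}^{-1}\phi(z))$ (the paper's exact expression is not reproduced in this excerpt), and all names are illustrative.

```python
import numpy as np

def select_informative(buffer, new_groups, phi, Lambda_xi, sigma, eta):
    """Add a group of data points to the buffer only if the predictive
    variance at its surface point exceeds the threshold eta.
    new_groups: list of (surface_point, group_of_points) pairs."""
    for z_surf, group in new_groups:
        feat = phi(z_surf)
        # predictive variance at the surface point (Bayesian linear regression form)
        var = sigma**2 * (1.0 + feat @ np.linalg.solve(Lambda_xi, feat))
        if var > eta:  # the posterior is still uncertain here: keep the group
            buffer.extend(group)
    return buffer
```

As the posterior precision $\Lambda_{\xi}$ grows with data, the variance at previously observed regions shrinks below $\eta$ and near-duplicate measurements are discarded.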

V-B Theoretical Results

The probabilistic safety guarantees for our control scheme are summarized in the following theorem.

Theorem 4.

Suppose Assumption 3 holds, the offline meta-training of the parameters $w$, $\bar{\theta}_{0}$, and $\Lambda_{0}$ is performed so that Assumptions 1 and 2 are satisfied, and a feasible solution of the QP in Algorithm 1 is obtained at all times $t<T$. Then, $x(t)\in\mathcal{C}_{\xi_{i}}$ holds with probability at least $1-2\delta$ for all $t\in\mathbb{R}_{>0}$ and $i\in\mathbb{N}_{1:N_{\mathrm{obs}}}$.

Proof.

From Theorem 1, the implicit control policy defined by the QP in Algorithm 1 keeps the system states within $\hat{\mathcal{C}}_{\xi_{i}}:=\{x\in\mathcal{D}\subset\mathbb{R}^{n}\mid h^{b}(x;\xi_{i})\geq 0\}$ for all of the observed obstacles $\xi_{i}$. Since each state in $\hat{\mathcal{C}}_{\xi_{i}}$ is included in the actual safe set $\mathcal{C}_{\xi_{i}}$ with probability at least $1-2\delta$ by Theorem 3, we can guarantee $x(t)\in\mathcal{C}_{\xi_{i}}$ with probability at least $1-2\delta$ for all $t\in\mathbb{R}_{>0}$ and all observed obstacles $\xi_{i}$. This result and Assumption 3 prove the theorem. ∎

VI Case Study

We test the proposed approach through a simulation of vehicle navigation in 2D space. All the experiments are conducted in Python on a Windows 10 machine with a 2.80 GHz Core i7 CPU and 32 GB of RAM. TensorFlow is used for the meta-learning of the IS function, and PyBullet [38] is used for the implementation of the vehicle system with LiDAR scanners. The controlled system is an unmanned vehicle with the following dynamics:

$\underbrace{\begin{bmatrix}\dot{q}_{x}\\ \dot{q}_{y}\\ \dot{\vartheta}\end{bmatrix}}_{\dot{x}}=\underbrace{\begin{bmatrix}0\\ 0\\ 0\end{bmatrix}}_{f(x)}+\underbrace{\begin{bmatrix}\cos\vartheta&-\ell\sin\vartheta\\ \sin\vartheta&\ell\cos\vartheta\\ 0&1\end{bmatrix}}_{g(x)}\underbrace{\begin{bmatrix}v\\ \omega\end{bmatrix}}_{u}$,   (18)

where $q_{x}$ and $q_{y}$ are the $x$-$y$ coordinates of the robot position, $\vartheta$ is the heading angle of the robot, and $v$ and $\omega$ are the velocity and angular velocity of the vehicle, respectively. The control inputs and system states are defined as $u=[v,\omega]^{\top}\in\mathbb{R}^{2}$ and $x=[q_{x},q_{y},\vartheta]^{\top}\in\mathbb{R}^{3}$, respectively. Note that in (18), we consider the dynamics of a point off the axis of the robot by a distance $\ell\ (>0)$ to make the system have relative degree one [40]. Moreover, a LiDAR scanner with a 360-degree field of view that radiates 150 rays of 3-meter length is assumed to be mounted on the vehicle.
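For concreteness, the model (18) can be simulated with a simple forward-Euler step, as sketched below. The value of $\ell$ is a hypothetical placeholder, not a value reported in the paper.

```python
import numpy as np

ELL = 0.1  # off-axis distance l (hypothetical value)

def g_matrix(x):
    """Input matrix g(x) of the unicycle model (18)."""
    th = x[2]
    return np.array([[np.cos(th), -ELL * np.sin(th)],
                     [np.sin(th),  ELL * np.cos(th)],
                     [0.0,         1.0]])

def step(x, u, dt=0.02):
    """Forward-Euler integration of x_dot = f(x) + g(x) u with f(x) = 0."""
    return x + dt * (g_matrix(x) @ u)
```

For example, one step with $u=[1,0]^{\top}$ from the origin moves the vehicle straight along its heading.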

VI-A Meta learning of unsafe regions

Refer to caption
Figure 3: PyBullet objects in case (II). The figures in the first and second rows are the GUI views of the objects and the corresponding cross sections in $x$-$y$ coordinates, respectively. The surfaces with the smallest and largest sizes are shown by the black and blue lines, respectively.
Refer to caption
Figure 4: A few examples of the prediction of unsafe regions: the black and blue lines show the actual surface and the mean prediction of the surface, respectively. The blue shaded areas show the regions between the 0-level sets of the upper and lower 90% bounds of $h_{\mathrm{IS}}$, derived following the discussion in Section IV-B3, and the black cross marks show the data points on the surface used for the adaptation.

We first test the performance of the proposed scheme for constructing the IS function $h_{\mathrm{IS}}:\mathbb{R}^{2}\rightarrow\mathbb{R}$. We consider two settings: (I) ellipsoidal obstacles with randomly chosen semi-axis lengths, center positions, and rotation angles; (II) the objects in Fig. 3 with randomly chosen positions, rotation angles, and scales. In setting (I), the IS function $h_{\mathrm{IS}}$ is defined by the ellipsoids as follows:

$h_{\mathrm{IS}}(z;\xi)=\frac{[(q_{x}-q_{x,0}^{\xi})\cos\vartheta_{\mathrm{ob}}^{\xi}+(q_{y}-q_{y,0}^{\xi})\sin\vartheta_{\mathrm{ob}}^{\xi}]^{2}}{(c_{x}^{\xi})^{2}}+\frac{[(q_{x}-q_{x,0}^{\xi})\sin\vartheta_{\mathrm{ob}}^{\xi}-(q_{y}-q_{y,0}^{\xi})\cos\vartheta_{\mathrm{ob}}^{\xi}]^{2}}{(c_{y}^{\xi})^{2}}-1$,   (19)

where $c_{x}^{\xi}$, $c_{y}^{\xi}\ (>0)$ are the semi-axis lengths along the $x$ and $y$ coordinates, respectively, $\vartheta_{\mathrm{ob}}^{\xi}$ is the rotation angle, $(q_{x,0}^{\xi},q_{y,0}^{\xi})$ is the center of the ellipsoid, and $z=[q_{x},q_{y}]^{\top}$. In this example, the unobserved parameter $\xi$ can be explicitly written as $\xi=[c_{x}^{\xi},c_{y}^{\xi},q_{x,0}^{\xi},q_{y,0}^{\xi},\vartheta_{\mathrm{ob}}^{\xi}]^{\top}$. The parameters $(c_{x}^{\xi}, c_{y}^{\xi})$, $(q_{x,0}^{\xi}, q_{y,0}^{\xi})$, and $\vartheta_{\mathrm{ob}}^{\xi}$ are randomly sampled from the uniform distributions over the ranges $[0.4,0.8]$, $[-0.8,0.8]$, and $[0,2\pi]$, respectively. In setting (II), we used object files in pybullet_data [41] to test with more arbitrary shapes. The rotation angles and center positions of the objects are sampled in the same way as in setting (I). For both cases, the architecture of $\phi$ in (10) is defined by three hidden layers of 256 units with 32 basis functions. Moreover, we set $N_{\mathrm{ite}}$, $\sigma$, $n_{-}$, $n_{+}$, and $\Delta$ to 30000, 0.001, 1, 5, and 0.1, respectively.
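The ground-truth ellipsoidal IS function (19) translates directly into code. The sketch below evaluates it for a parameter vector $\xi$; the ordering of the entries in $\xi$ is ours.

```python
import numpy as np

def h_IS_ellipse(z, xi):
    """Implicit-surface function (19) for a rotated ellipse.
    xi = (c_x, c_y, q_x0, q_y0, theta_ob): semi-axes, center, rotation."""
    cx, cy, qx0, qy0, th = xi
    dx, dy = z[0] - qx0, z[1] - qy0
    # rotate the offset into the ellipse's body frame
    u = dx * np.cos(th) + dy * np.sin(th)
    w = dx * np.sin(th) - dy * np.cos(th)
    return (u / cx)**2 + (w / cy)**2 - 1.0
```

The function is zero on the obstacle surface, negative inside (unsafe), and positive outside (safe), which is the sign convention a CBF requires.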

The prediction of an obstacle surface calculated with a small amount of data is shown in Figure 4. The blue lines and blue shaded areas in the figure are the mean predictions of the surfaces and the regions between the 0-level sets of the upper and lower 90% bounds of $h_{\mathrm{IS}}$, respectively. We can see that the confidence bounds are reasonably tight even when the number of data points used for the adaptation is small, which is the main effect of the offline meta-training. Moreover, as reported in [12], the time required for the adaptation is much smaller than that of the Gaussian process (GP) [42], a commonly used Bayesian method. More results, including comparisons to the GP in terms of both the accuracy of the prediction and the time required for the training, are provided in Appendix -C.

VI-B Control execution

Refer to caption
(a)
Refer to caption
(b)
Figure 5: An example of control execution with the proposed method (above) and GP case (below). The red crosses, heat maps, blue solid lines, black solid lines, and black dotted lines represent data points on the surface, the values of CBF, 0-level sets of 95% lower bounds of CBF, actual surfaces, and robot trajectories, respectively. The CBF is updated every 5 [s].

Based on the trained $h_{\mathrm{IS}}$, we next execute the control with Algorithm 1. The control objective is to steer the vehicle toward the goal position $(q_{x}^{g},q_{y}^{g})$ while avoiding obstacles. The goal-reaching objective is encoded through the CLF $V(x)=(q_{x}-q_{x}^{g})^{2}+(q_{y}-q_{y}^{g})^{2}$. Moreover, $\alpha_{C}$ and $\alpha_{V}$ in (5) are defined by the linear functions $\alpha_{C}(c)=\gamma_{C}c$ and $\alpha_{V}(c)=\gamma_{V}c$. We set the parameters $\gamma_{C}$, $\gamma_{V}$, and $\lambda$ to 1.0, 1.0, and 10, respectively. For comparison, we test the control performance of the proposed scheme against the case where the GP is used instead of ALPaCA. Since the theoretically guaranteed bounds for GPs [43, 44] tend to be overly conservative, we instead use the 2-$\sigma$ lower bounds in the GP case. The metric of the control performance is the cumulative squared error (CSE) between the robot and goal positions along the resulting state trajectory (the squared errors are collected every 0.02 [s]). An example of the control execution with $\Delta_{\mathrm{lidar}}=5$ [s] is visualized in Figure 5, from which we can see that the vehicle successfully reaches the goal while avoiding the unsafe region. Notably, the control of the proposed method is much less conservative than the GP case because of the tight prediction of the unsafe region (the vehicle in the GP case is often forced to be stuck in small safe regions until the subsequent update of the CBF). The CSEs in this case for the proposed method and the GP case are 618 and 1010, respectively. We additionally provide results for 5 different environments with single or multiple obstacles in Appendix -D (the results for the cases $\Delta_{\mathrm{lidar}}=1,3,5$ are shown), and similar results are observed.
Although the CSEs of the proposed method and the GP case get closer as $\Delta_{\mathrm{lidar}}$ becomes smaller, it is desirable to keep $\Delta_{\mathrm{lidar}}$ large to save computation and the use of sensor devices with limited batteries. Moreover, the GP prediction cannot be updated so frequently because of the time required for the training (see the time analysis in Appendix -C). From the results above, we can conclude that the proposed scheme has a practical advantage from the perspective of control.
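The CSE metric used in this comparison can be sketched as follows, assuming the state is logged every control period and its first two entries are the position (the sampling period and state layout follow the description above).

```python
import numpy as np

def cumulative_squared_error(trajectory, goal):
    """Cumulative squared error (CSE) between robot positions and the goal,
    summed over samples collected at a fixed period (0.02 s in the paper)."""
    traj = np.asarray(trajectory)[:, :2]  # (q_x, q_y) part of each state
    return float(np.sum(np.sum((traj - np.asarray(goal))**2, axis=1)))
```

A lower CSE indicates a faster, less conservative approach to the goal.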

VII Conclusion

In this paper, we proposed an efficient online learning method of CBFs from sensor measurements with probabilistic guarantees. In the proposed method, we utilized a technique based on the GPIS to learn unsafe regions from the measurements. Specifically, a Bayesian meta-learning scheme was employed to learn the IS function, which enables us to effectively use past data from different settings. To avoid the volume of the data buffers becoming unnecessarily large during the online control execution, we also considered a data selection scheme that effectively uses the uncertainty information provided by the current model. The prediction made by the proposed scheme was readily incorporated into the CBF-CLF-QP formulation, and a probabilistic safety guarantee for the control scheme was also provided. In the case study, we have shown the efficacy of our method in terms of the tightness of the prediction, the time required for the online training, and the conservativeness of the control performance.

References

  • [1] E. F. Camacho and C. B. Alba, Model predictive control. Springer science & business media, 2013.
  • [2] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” in European Control Conference (ECC), 2019, pp. 3420–3431.
  • [3] A. D. Ames, K. Galloway, K. Sreenath and J. W. Grizzle, “Rapidly Exponentially Stabilizing Control Lyapunov Functions and Hybrid Zero Dynamics,” in IEEE Transactions on Automatic Control, vol. 59, no. 4, pp. 876–891, 2014.
  • [4] A. D. Ames, X. Xu, J. W. Grizzle and P. Tabuada, “Control Barrier Function Based Quadratic Programs for Safety Critical Systems,” in IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2017.
  • [5] D. Q. Mayne, M. M. Seron, and S. V. Raković, “Robust model predictive control of constrained linear systems with bounded disturbances,” Automatica, vol. 41, no. 2, pp. 219–224, 2005.
  • [6] A. Clark, “Control Barrier Functions for Complete and Incomplete Information Stochastic Systems,” in American Control Conference (ACC), Philadelphia, 2019, pp. 2928-2935.
  • [7] K. Garg and D. Panagou, “Robust Control Barrier and Control Lyapunov Functions with Fixed-Time Convergence Guarantees,” 2021 American Control Conference (ACC), 2021, pp. 2292-2297.
  • [8] Y. Chen, H. Peng, and J. Grizzle, “Obstacle Avoidance for Low-Speed Autonomous Vehicles With Barrier Function,” in IEEE Transactions on Control Systems Technology, vol. 26, no. 1, pp. 194–206, 2018.
  • [9] A. D. Ames, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs with application to adaptive cruise control,” in IEEE Conference on Decision and Control, 2014, pp. 6271–6278.
  • [10] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 1126–1135.
  • [11] T. Hospedales, A. Antoniou, P. Micaelli and A. Storkey, “Meta-Learning in Neural Networks: A Survey,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5149–5169, 2022.
  • [12] J. Harrison, A. Sharma, and M. Pavone, “Meta-learning priors for efficient online Bayesian regression,” in Proc. Workshop Algorithmic Found. Robot., 2018, pp. 318–337.
  • [13] T. Lew, A. Sharma, J. Harrison, A. Bylard, and M. Pavone, “Safe Active Dynamics Learning and Control: A Sequential Exploration–Exploitation Framework,” in IEEE Transactions on Robotics, vol. 38, no. 5, pp. 2888–2907, 2022.
  • [14] O. Williams and A. Fitzgibbon, “Gaussian process implicit surfaces,” in Proc. Gaussian Process. Practice Workshop, 2007.
  • [15] W. Martens, Y. Poffet, P. R. Soria, R. Fitch and S. Sukkarieh, “Geometric Priors for Gaussian Process Implicit Surfaces,” in IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 373–380, 2017.
  • [16] M. Gerardo-Castro, T. Peynot, and F. Ramos, “Laser-radar data fusion with Gaussian process implicit surfaces,” in Proc. Field Serv. Robot., 2015, pp. 289–302.
  • [17] C. Dawson, S. Gao, and C. Fan, “Safe Control with Learned Certificates: A Survey of Neural Lyapunov, Barrier, and Contraction Methods,” in IEEE Transactions on Robotics, to appear, 2023.
  • [18] A. Taylor, A. Singletary, Y. Yue, and A. Ames, “Learning for safety critical control with control barrier functions,” in Learning for Dynamics and Control, 2020, pp. 708–717.
  • [19] J. Choi, F. Castañeda, C. Tomlin, and K. Sreenath, “Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions,” in Robotics: Science and Systems, Corvallis, OR, 2020.
  • [20] P. Jagtap, G. J. Pappas and M. Zamani, “Control Barrier Functions for Unknown Nonlinear Systems using Gaussian Processes,” in IEEE Conference on Decision and Control (CDC), 2020, pp. 3699–3704.
  • [21] V. Dhiman, M. J. Khojasteh, M. Franceschetti and N. Atanasov, “Control Barriers in Bayesian Learning of System Dynamics,” in IEEE Transactions on Automatic Control, vol. 68, no. 1, pp. 214–229, 2023.
  • [22] F. Castaneda, J. J. Choi, W. Jung, B. Zhang, C. J. Tomlin, and K. Sreenath, “Probabilistic Safe Online Learning with Control Barrier Functions,” arXiv:2208.10733, 2022.
  • [23] K. Long, V. Dhiman, M. Leok, J. Cortés, and N. Atanasov, “Safe Control Synthesis With Uncertain Dynamics and Constraints,” in IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 7295–7302, 2022.
  • [24] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, “End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks,” in AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3387–3395.
  • [25] R. K. Cosner, Y. Yue and A. D. Ames, “End-to-End Imitation Learning with Safety Guarantees using Control Barrier Functions,” in IEEE Conference on Decision and Control (CDC), 2022, pp. 5316–5322.
  • [26] S. Yaghoubi, G. Fainekos and S. Sankaranarayanan, “Training Neural Network Controllers Using Control Barrier Functions in the Presence of Disturbances,” in IEEE International Conference on Intelligent Transportation Systems (ITSC), 2020, pp. 1–6.
  • [27] W. Jin, Z. Wang, Z. Yang, and S. Mou, “Neural Certificates for Safe Control Policies,” arXiv:2006.08465, 2020.
  • [28] A. Robey, H. Hu, L. Lindemann, H. Zhang, D. V. Dimarogonas, S. Tu, and N. Matni, “Learning Control Barrier Functions from Expert Demonstrations,” in IEEE Conference on Decision and Control (CDC), 2020, pp. 3717–3724.
  • [29] C. Dawson, Z. Qin, S. Gao, and C. Fan, “Safe Nonlinear Control Using Robust Neural Lyapunov-Barrier Functions,” in Proceedings of the 5th Conference on Robot Learning, vol. 164, 2022, pp. 1724–1735.
  • [30] F. B. Mathiesen, S. C. Calvert and L. Laurenti, “Safety Certification for Stochastic Systems via Neural Barrier Functions,” in IEEE Control Systems Letters, vol. 7, pp. 973–978, 2023.
  • [31] M. Khan, T. Ibuki, and A. Chatterjee, “Gaussian Control Barrier Functions : A Non-Parametric Paradigm to Safety,” arXiv:2203.15474, 2022.
  • [32] M. Srinivasan, A. Dabholkar, S. Coogan and P. A. Vela, “Synthesis of Control Barrier Functions Using a Supervised Machine Learning Approach,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 7139–7145.
  • [33] K. Long, C. Qian, J. Cortés and N. Atanasov, “Learning Barrier Functions With Memory for Robust Safe Navigation,” in IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4931–4938, 2021.
  • [34] A. Nagabandi, I. Clavera, S. Liu, R. S. Fearing, P. Abbeel, S. Levine, and C. Finn, “Learning to adapt in dynamic, real-world environments through meta-reinforcement learning,” in Int. Conf. on Learning Representations, 2019.
  • [35] S. M. Richards, N. Azizan, J. E. Slotine, and M. Pavone, “Adaptive control-oriented meta-learning for nonlinear systems,” arXiv preprint arXiv:2103.04490, 2021
  • [36] K. Rakelly, A. Zhou, C. Finn, S. Levine, and D. Quillen, “Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables,” in Proceedings of the 36th International Conference on Machine Learning, 2019, pp. 5331–5340.
  • [37] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, “Improved algorithms for linear stochastic bandits,” in Neural Information Processing Systems, 2011.
  • [38] https://pybullet.org/wordpress/
  • [39] D. Nguyen-Tuong and J. Peters, “Incremental online sparsification for model learning in real-time robot control,” Neurocomputing, vol. 74, no. 11, pp. 1859–1867, 2011.
  • [40] J. Cortés and M. Egerstedt, “Coordinated control of multi-robot systems: A survey,” SICE Journal of Control, Measurement, and System Integration, vol. 10, no. 6, pp. 495–503, 2017.
  • [41] https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_data
  • [42] C. E. Rasmussen, “Gaussian processes in machine learning,” in Summer school on machine learning. Springer, 2003, pp. 63–71.
  • [43] N. Srinivas, A. Krause, S. M. Kakade and M. W. Seeger, “Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting,” in IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3250–3265, 2012.
  • [44] K. Hashimoto, A. Saoud, M. Kishida, T. Ushio, and D. Dimarogonas, “Learning-based safe symbolic abstractions for nonlinear control systems,” Automatica, 2022.
  • [45] https://gpy.readthedocs.io/en/deploy/

-A Pseudo code of offline meta-learning

The pseudo code of the offline meta-learning is summarized in Algorithm 2.

Input: $p(\xi)$ (distribution of obstacles); $\sigma$ (noise variance)
1: Randomly initialize the parameters $\bar{\theta}_{0}$, $\Lambda_{0}$, $w$;
2: for $i=1:N_{\mathrm{ite}}$ do
3:   for $j=1:J$ do
4:     Sample obstacles $\xi_{j}\sim p(\xi)$;
5:     Construct the datasets corresponding to the sampled obstacles as $D_{\xi_{j}}=\cup_{i=1}^{n_{j}}\{(z_{i}^{j},y_{i}^{j})\}$;
6:     Sample $n_{j}^{\mathrm{tr}}$ from the uniform distribution over $\{1,\ldots,n_{j}\}$;
7:     Split the dataset into training and test sets: $D_{\xi_{j}}^{\mathrm{tr}}=\cup_{i=1}^{n_{j}^{\mathrm{tr}}}\{(z_{i}^{\mathrm{tr},j},y_{i}^{\mathrm{tr},j})\}$ and $D_{\xi_{j}}^{\mathrm{ts}}=\cup_{i=1}^{n_{j}^{\mathrm{ts}}}\{(z_{i}^{\mathrm{ts},j},y_{i}^{\mathrm{ts},j})\}$;
8:     Compute the posterior parameters $\bar{\theta}_{\xi_{j}}$ and $\Lambda_{\xi_{j}}$ via (11) with the data $D_{\xi_{j}}^{\mathrm{tr}}$;
9:   Update $\bar{\theta}_{0}$, $\Lambda_{0}$, $w$ via a gradient step on the loss (IV-B2) calculated with $\bar{\theta}_{\xi_{j}}$, $\Lambda_{\xi_{j}}$, and $D_{\xi_{j}}^{\mathrm{ts}}$, $\forall j\in\mathbb{N}_{1:J}$;
Algorithm 2 Offline meta-learning
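Line 8 of Algorithm 2 computes the task posterior via (11). Since (11) is not reproduced in this excerpt, the sketch below assumes the standard Bayesian linear regression update used by ALPaCA-style last-layer models (precision accumulation plus a prior-weighted mean), which may differ from the paper's exact scaling.

```python
import numpy as np

def posterior_update(Phi, y, theta_bar0, Lambda0):
    """Posterior parameters for the last-layer weights, assuming a
    standard Bayesian linear regression update (hedged sketch of (11)).
    Phi: (N, d) feature matrix, y: (N,) targets."""
    Lambda_xi = Lambda0 + Phi.T @ Phi                       # precision update
    theta_bar_xi = np.linalg.solve(Lambda_xi,
                                   Lambda0 @ theta_bar0 + Phi.T @ y)  # mean
    return theta_bar_xi, Lambda_xi
```

Because the update is a closed-form linear solve, the per-task adaptation is fast, which is the property the paper exploits for frequent online CBF updates.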

-B Mitigation of conservativeness

To avoid an unnecessarily large $\beta_{\xi}$ defined in Section IV-B3 and mitigate the conservativeness of the probabilistic bounds without compromising the safety guarantees, we can add the following regularization term to the original loss function (IV-B2), as proposed in [13]:

$\mathcal{L}_{\mathrm{reg}}^{\xi}(\Lambda_{0})=\gamma\,\mathrm{Tr}(\Lambda_{\xi}^{-\top}\Lambda_{\xi}^{-1})\,\mathrm{Tr}(\Lambda_{0}^{-\top}\Lambda_{0}^{-1})$,   (20)

where $\gamma\in\mathbb{R}_{\geq 0}$ is the weight for the regularization term and $\mathrm{Tr}(\cdot)$ is the trace of the argument matrix. This term corresponds to an upper bound on the term $\frac{\bar{\lambda}(\Lambda_{0})}{\underline{\lambda}(\Lambda_{\xi})}$ in $\beta_{\xi}$; thus, minimizing (20) leads to a small $\beta_{\xi}$ value. In the experiments, we have confirmed that the regularization term with $\gamma=10^{-9}$ reasonably reduces the conservativeness of the bounds.
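The regularizer (20) is straightforward to evaluate; the sketch below mirrors the trace product with a configurable weight $\gamma$ (in a training loop this would be computed with the framework's differentiable operations rather than NumPy).

```python
import numpy as np

def reg_term(Lambda_xi, Lambda0, gamma=1e-9):
    """Regularization term (20):
    gamma * Tr(Lambda_xi^{-T} Lambda_xi^{-1}) * Tr(Lambda_0^{-T} Lambda_0^{-1})."""
    inv_xi = np.linalg.inv(Lambda_xi)
    inv_0 = np.linalg.inv(Lambda0)
    return gamma * np.trace(inv_xi.T @ inv_xi) * np.trace(inv_0.T @ inv_0)
```

Penalizing these inverse-precision traces keeps both $\Lambda_{0}$ and $\Lambda_{\xi}$ well conditioned, which in turn keeps $\beta_{\xi}$ small.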

-C Detailed Results for Meta-Learning of Unsafe Regions

We first show the negative log-likelihood (NLL) of the predictions for cases (I) and (II) discussed in Section VI in Figure 6. The NLLs against the number of data points are calculated for both the proposed scheme and the case where Gaussian process regression (GPR) is used instead of ALPaCA. The NLLs are calculated for 100 different obstacles, and the averages and 3-$\sigma$ intervals are plotted in the figures. Note that the number of data points in the figures means the number of surface points; the actual number of data points is $n_{+}+n_{-}+1\ (=7)$ times that number. The squared exponential kernel is used for the GP, and the parameters of the kernel are fitted by maximum likelihood. For the GP implementation, the Python library GPy [45] is used. From the figures, we can see that the prediction made by the proposed meta-learning scheme is superior to that of the GPR case when the number of data points is small. A time analysis is also provided in Figure 7, which shows that the proposed scheme produces predictions much faster than fitting the GP at every update; this is especially beneficial for the online synthesis of the CBF-QP since we can update the CBFs more frequently. Additional examples of the prediction are shown in Figures 8a and 8b.

Refer to caption
(a) Setting (I)
Refer to caption
(b) Setting (II)
Figure 6: NLLs of the predictions of the proposed meta-learning scheme and the GP case against the number of data points (the number of data points in the figure means the number of surface points; the actual number of data points is $n_{+}+n_{-}+1\ (=7)$ times that number). The NLLs are collected for 100 randomly chosen functions, and the mean and 3-$\sigma$ interval of the NLL values are plotted.
Refer to caption
Figure 7: The time required for predictions of the proposed meta-learning scheme and the GP case against the number of data points (the number of data points in the figure means the number of surface points; the actual number of data points is $n_{+}+n_{-}+1\ (=7)$ times that number). The times are collected for 100 randomly chosen functions, and the mean and 3-$\sigma$ interval of the times are plotted.
Refer to caption
(a) Additional visualization for case (I)
Refer to caption
(b) Additional visualization for case (II)

-D Detailed Results for Control Execution

Next, we summarize the detailed results for the control execution. In Figures 8-13, we show the resulting trajectories and the time evolution of the squared errors between the robot positions and the goal positions when $\Delta_{\mathrm{lidar}}=5$, for all 6 environments. From these figures, we can see that the tight prediction of the proposed scheme leads to less conservative control performance, as explained in Section VI. In Table I, we also summarize the cumulative squared errors between the robot position and the goal position for all environments and the cases $\Delta_{\mathrm{lidar}}=1,3,5$ [s].

Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Refer to caption
(f)
Figure 8: A control execution in Environment 1 with the proposed method (above) and GP case (below). The red crosses, heat maps, blue solid lines, black solid lines, and black dotted lines represent data points on the surface, the values of CBF, 0-level sets of 95% lower bounds of CBF, actual surfaces, and robot trajectories, respectively. The prediction of CBF is updated every 5 [s]. The time evolution of the squared errors between goal and system states is also shown in the figure.
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 9: A control execution in Environment 2 with the proposed method (above) and GP case (below). The red crosses, blue/red solid lines, black solid lines, and black dotted lines represent data points on the surface, 0-level sets of 95% lower bounds of CBF, actual surfaces, and robot trajectories, respectively. The prediction of CBF is updated every 5 [s]. The time evolution of the squared errors between goal and system states is also shown in the figure.
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 10: A control execution in Environment 3. The prediction of CBF is updated every 5 [s].
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 11: A control execution in Environment 4. The prediction of CBF is updated every 5 [s].
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 12: A control execution in Environment 5. The prediction of CBF is updated every 5 [s].
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 13: A control execution in Environment 6. The prediction of CBF is updated every 5 [s].
TABLE I: The cumulative squared errors between the robot position and the goal position
Environment | Proposed ($\Delta_{\mathrm{lidar}}=1$ / 3 / 5 [s]) | GP ($\Delta_{\mathrm{lidar}}=1$ / 3 / 5 [s])
1 | 554.93 / 620.12 / 618.56 | 543.78 / 715.75 / 1010.93
2 | 654.10 / 702.75 / 830.22 | 697.65 / 818.47 / 1094.79
3 | 534.40 / 585.63 / 654.25 | 533.77 / 609.42 / 897.31
4 | 794.06 / 794.06 / 843.25 | 1792.50 / 1792.50 / 1803.99
5 | 475.41 / 475.41 / 486.61 | 472.43 / 472.43 / 503.12
6 | 1068.52 / 1068.56 / 1076.81 | 1271.35 / 1271.32 / 1648.49