
Bayesian Meta-Learning on Control Barrier Functions with Data from On-Board Sensors

Wataru Hashimoto, Kazumune Hashimoto, Akifumi Wachi, Xun Shen, Masako Kishida, and Shigemasa Takai

Wataru Hashimoto, Kazumune Hashimoto, Xun Shen, and Shigemasa Takai are with the Graduate School of Engineering, Osaka University, Suita, Japan (e-mail: [email protected], {hashimoto, shenxun, takai}@eei.eng.osaka-u.ac.jp). Akifumi Wachi is with LINE Corporation, Tokyo, Japan (e-mail: [email protected]). Masako Kishida is with the National Institute of Informatics, Tokyo, Japan (e-mail: [email protected]). This work is supported by JST CREST JPMJCR201, Japan, and by JSPS KAKENHI Grant 21K14184.
Abstract

In this paper, we consider a way to safely navigate robots in unknown environments using measurement data from on-board sensory devices. The control barrier function (CBF) is a promising approach to encoding safety requirements of a system, and recent progress on learning-based approaches enables online synthesis of CBF-based safe controllers from sensor measurements. However, the existing methods are inefficient in the sense that the trained CBF cannot be generalized to different environments, so the controller must be re-synthesized whenever the environment changes. This paper therefore presents a way to learn a CBF that can quickly adapt to a new environment with a small amount of data by utilizing a recently developed Bayesian meta-learning framework. As shown in the simulation study, the proposed scheme realizes efficient online synthesis of the controller and provides probabilistic safety guarantees on the resulting controller.

Index Terms:
Control barrier function, learning-based control, meta-learning.

I Introduction

ENSURING safety while achieving certain tasks is a fundamental yet challenging problem that has drawn significant attention from researchers in the control and robotics communities over the past few decades. In the control community, such safety requirements are often handled by imposing corresponding state constraints on an optimal control problem. Model Predictive Control (MPC) [1] and control synthesis methods based on certificate functions such as the Control Lyapunov Function (CLF) and the Control Barrier Function (CBF) [2, 3, 4] are notable examples of such methods. Desirable theoretical results for these methods, such as feasibility, stability, and safety, are offered for known system dynamics [1, 2, 3, 4] or noisy dynamics with known noise distribution [5, 6, 7].

The CBF is a prominent tool to impose forward invariance of a safe set in the state space. If a dynamical system is control-affine, a quadratic programming formulation called CBF-QP [4] is available, which can produce safe control inputs much faster than many other control methods including MPC, and enables real-time implementations for challenging applications such as autonomous vehicles [8] and bipedal locomotion [9]. However, the majority of the existing works presuppose the availability of a valid CBF that represents safe and unsafe regions in the state space. This assumption cannot be maintained if a robot is expected to operate autonomously in unknown or uncertain environments. In such situations, it is essential for a robotic system to automatically identify unsafe regions on the fly, employing data from sensory devices. Thus, in this paper, we consider a learning-based method for constructing a CBF from online measurements obtained from on-board sensors such as LiDAR scanners.

In previous studies along this line of research (e.g., [31, 32, 33]), the robotic system iteratively improves the prediction of the CBF by collecting online sensor measurements without any prior knowledge about the environment or the learning task. However, in such cases, even a slight change in the environment necessitates a complete retraining process, which is inefficient and requires a substantial amount of data. To learn a CBF efficiently and effectively, adapting it to a new environment with a small amount of training data, we propose employing meta-learning [10, 11], a technique that allows an agent to learn how to learn, as opposed to learning from scratch. During the meta-training process, the model parameters are trained with data collected from various different environments, aiming to learn common underlying knowledge or structure of the learning task. The parameters obtained through meta-learning are then used as initial parameters for the online phase in a new environment and updated with online measurements in that environment. In this paper, we specifically employ a learning scheme that combines the recently developed Bayesian meta-learning method [12], which is equipped with probabilistic bounds on the prediction [13], with a Bayesian surface reconstruction method [14, 15, 16] that can learn unsafe regions from noisy sensor measurements. With this method, we can learn a CBF and synthesize a safe controller with a small amount of data and provide a probabilistic guarantee on the resulting controller.

Related works on learning and CBF: Learning-based approaches for certificate functions such as the CBF and CLF form an active research topic at the intersection of control and learning [17]. In the following, we summarize recent progress on learning and CBFs. The works [18, 19, 20, 21, 22, 23] used CBFs for active safe learning of uncertain dynamics. The works [18, 19] consider machine-learning approaches to safely reduce model uncertainty and show that the methods yield empirically good performance, while the Gaussian Process (GP) is used in [20, 21, 22, 23], where rigorous theoretical safety guarantees are offered under certain assumptions. CBFs are also used in Imitation Learning (IL) and Reinforcement Learning (RL) [25, 26, 24] to account for safety concerns.

While the above studies assume a valid CBF is given and leverage it to impose safety conditions, several previous studies also parameterize the CBF itself by a Neural Network (NN) and learn it to be a valid barrier function [27, 28, 29, 30, 31, 32, 33]. The work [27] jointly learns NNs representing the CBF and CLF as well as a control policy from safe and unsafe samples. The method proposed in [28] recovers a CBF from expert demonstrations and theoretically guarantees that the learned NN meets the CBF conditions under Lipschitz continuity assumptions. In [31], a GP is used to synthesize a CBF. The works most related to this paper are [32, 33], which propose ways to learn a CBF from sensor measurements. In [32], a Support Vector Machine (SVM) classifier representing the CBF is trained with safe and unsafe samples constructed from LiDAR measurements. Similarly, the authors of [33] present a synthesis method using the Signed Distance Function (SDF).

Related works on meta-learning and control: Meta-learning, also known as “learning to learn,” is a technique for learning from previous learning experiences or tasks in order to improve the efficiency of future learning [10]. Meta-learning has recently become increasingly popular in the control literature. In several previous studies such as [13, 34], meta-learning is used to learn system dynamics so as to enable quick adaptation to changes in the surrounding situation. In these works, the trained models are incorporated into MPC, which requires relatively heavy online computation. Meta-learning is also used for policy learning in the contexts of adaptive control [35] and reinforcement learning [10, 36].

Contributions: Our contributions compared to the existing methods are threefold. First, compared to the methods that synthesize a CBF from sensor measurements [31, 32, 33], which rely on traditional supervised learning, the proposed meta-learning scheme can effectively use past data collected from different environments and produce a prediction of the CBF with a small amount of data. Consequently, the resulting control scheme realizes less conservative control performance with a small amount of data (a small number of online CBF updates), as we show in the case study. Second, different from the previous NN-based CBF synthesis methods [27, 29, 30, 32, 33], our method can readily take into account uncertainty in the data and compute formal probabilistic bounds on the CBF. Based on this, we provide a probabilistic safety guarantee for the proposed control scheme. Although the work [33] takes the prediction error in the SDF into account, the error bound is assumed to be given and its actual computation is not provided. Third, the learned CBF can readily be incorporated into an ordinary QP formulation, which can be solved efficiently. Thus, our method can produce control inputs much faster than previous methods combining meta-learning and MPC such as [13, 34].

Notations: Continuous functions $\alpha_{1}:\mathbb{R}_{\geq 0}\rightarrow\mathbb{R}_{\geq 0}$ and $\alpha_{2}:\mathbb{R}\rightarrow\mathbb{R}$ are class $\mathcal{K}$ and extended class $\mathcal{K}_{\infty}$ functions, respectively, if they are strictly increasing with $\alpha_{1}(0)=0$ and $\alpha_{2}(0)=0$. $\chi_{d}^{2}(p)$ is the $p$-th quantile of the $\chi^{2}$ distribution with $d$ degrees of freedom. The maximum and minimum eigenvalues of a positive definite matrix $A$ are denoted by $\bar{\lambda}(A)$ and $\underline{\lambda}(A)$, respectively. For a vector field $f$ and a function $F$, $L_{f}F(x):=\frac{\partial F(x)}{\partial x}f(x)$ is the Lie derivative of $F$ in the direction of $f$.

II Problem Statement

Throughout this paper, we consider the following control affine system:

$\dot{x}=f(x)+g(x)u,$  (1)

where $x\in\mathcal{D}\subset\mathbb{R}^{n}$ and $u\in\mathcal{U}:=\{u\in\mathbb{R}^{m}\mid Au\leq b\}$ with $A\in\mathbb{R}^{m\times m}$ and $b\in\mathbb{R}^{m}$ are the admissible system states and control inputs, respectively. The functions $f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ and $g:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n\times m}$ are continuously differentiable and represent the dynamics of a robotic system (e.g., a ground vehicle or drone) equipped with sensors such as LiDAR. The state space is assumed to contain $N_{\mathrm{obs}}$ initially unknown obstacles, and the outside of each obstacle is defined by the following:

$\mathcal{C}_{\xi_{i}}=\{x\in\mathcal{D}\subset\mathbb{R}^{n}\mid h(x;\xi_{i})\geq 0\},\ i\in\mathbb{N}_{1:N_{\mathrm{obs}}},$  (2)

where $h(\cdot;\xi):\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a continuously differentiable function with an unobserved latent variable $\xi$ that encodes the geometric information about an obstacle, e.g., its position and shape. We assume the realizations $\xi_{i}$, $i\in\mathbb{N}_{1:N_{\mathrm{obs}}}$, are sampled from a probability distribution $p(\xi)$ and fixed during the control execution. For simplicity, the obstacles are assumed not to overlap with each other.

Our goal is to synthesize a controller that achieves a given task (e.g., a goal-reaching task) without deviating from the safe region $\mathcal{C}=\bigcap_{i=1}^{N_{\mathrm{obs}}}\mathcal{C}_{\xi_{i}}$, given an initial state $x(0)\in\mathcal{C}$. Since the safe/unsafe regions are initially unknown, the robotic system needs to identify them from sensor measurements. A naive approach toward this end is supervised learning on measurement data obtained in a pre-determined environment. In this case, however, the trained model cannot be generalized to different environments, and even slight changes in the environment necessitate a complete re-fitting of the model, which is not only inefficient but also demands a substantial amount of data. Thus, this paper considers a way to effectively use past data from the different environments sampled from the distribution $p(\xi)$ to quickly learn the safe/unsafe regions in a new environment and synthesize a safe controller with a small amount of online data.

In the following, we first introduce the control notions suited for our purpose, the Control Barrier Function (CBF) and the Control Lyapunov Function (CLF), in Section III. Then, the meta-learning procedure for the CBF and a way to synthesize a QP with the learned CBF are discussed in Sections IV and V.

III CBF-CLF-QP

In this section, we first introduce the Zeroing Control Barrier Function (ZCBF) as a tool to enforce safety on the system (1). The following discussion summarizes the works [2, 4]. For a ZCBF, safety is defined as forward invariance of a safe set $\mathcal{C}$ given by the zero super-level set of a function $h:\mathbb{R}^{n}\rightarrow\mathbb{R}$; i.e., the system (1) is considered safe if $x(t)\in\mathcal{C}:=\{x\mid h(x)\geq 0\}$ holds for all $t\geq 0$ whenever $x(0)\in\mathcal{C}$. Given an extended class $\mathcal{K}_{\infty}$ function $\alpha_{C}:\mathbb{R}\rightarrow\mathbb{R}$, the ZCBF is defined as follows.

Definition 1.

Let $\mathcal{C}\subset\mathcal{D}\subset\mathbb{R}^{n}$ be the zero super-level set of a continuously differentiable function $h:\mathbb{R}^{n}\rightarrow\mathbb{R}$. Then, the function $h$ is a zeroing control barrier function (ZCBF) for (1) on $\mathcal{C}$ if there exists an extended class $\mathcal{K}_{\infty}$ function $\alpha_{C}$ such that the following inequality holds for all $x\in\mathcal{D}$:

$\sup_{u\in\mathbb{R}^{m}}\left\{L_{f}h(x)+L_{g}h(x)u+\alpha_{C}(h(x))\right\}\geq 0.$  (3)

We additionally define the set of control inputs that satisfy the CBF condition as follows:

$U_{C}(x)=\{u\in\mathbb{R}^{m}\mid L_{f}h(x)+L_{g}h(x)u+\alpha_{C}(h(x))\geq 0\}.$

If the function $h$ is a valid CBF for (1) on $\mathcal{C}$, the safety of the system (1) is guaranteed by the following theorem.

Theorem 1 ([4]).

Let $\mathcal{C}\subset\mathcal{D}\subset\mathbb{R}^{n}$ be the zero super-level set of a continuously differentiable function $h:\mathbb{R}^{n}\rightarrow\mathbb{R}$, and let $h$ be a ZCBF for (1). Then, any policy that selects control inputs from $U_{C}(x)$ renders the safe set $\mathcal{C}$ forward invariant for the system (1).

In this paper, the control objectives other than the safety requirements are encoded through a control Lyapunov function (CLF), which is given in advance. With a class $\mathcal{K}$ function $\alpha_{V}$, the definition of a CLF is formally given as follows [2].

Definition 2.

A continuously differentiable function $V:\mathbb{R}^{n}\rightarrow\mathbb{R}_{\geq 0}$ is a CLF if $V(x_{e})=0$ for an equilibrium point $x_{e}\in\mathcal{D}$ and there exists a class $\mathcal{K}$ function $\alpha_{V}:\mathbb{R}\rightarrow\mathbb{R}$ such that the following inequality holds for all $x\in\mathcal{D}\subset\mathbb{R}^{n}$:

$\inf_{u\in\mathbb{R}^{m}}\left\{L_{f}V(x)+L_{g}V(x)u+\alpha_{V}(V(x))\right\}\leq 0.$  (4)

Given a CLF $V$, we consider the set of all control inputs that satisfy (4) for a state $x\in\mathcal{D}$ as

$U_{V}(x)=\{u\in\mathbb{R}^{m}\mid L_{f}V(x)+L_{g}V(x)u+\alpha_{V}(V(x))\leq 0\}.$

If a function $V$ is a CLF, the stability of the equilibrium point $x_{e}$ is guaranteed by the following theorem.

Theorem 2 ([3]).

Let a continuously differentiable function $V:\mathbb{R}^{n}\rightarrow\mathbb{R}_{\geq 0}$ be a CLF for (1). Then, any policy that selects control inputs from $U_{V}(x)$ renders the equilibrium point $x_{e}$ of the system (1) asymptotically stable.

Having the CBF and CLF defined above, we can synthesize a controller satisfying the CBF and CLF conditions (3) and (4) via the following quadratic program (QP):

$\min_{(u,\epsilon)\in\mathbb{R}^{m+1}}\ \frac{1}{2}u^{\top}Hu+\lambda\epsilon^{2},$  (5a)
$\mathrm{s.t.}\ \ L_{f}V(x)+L_{g}V(x)u+\alpha_{V}(V(x))\leq\epsilon,$  (5b)
$\ \ \ \ \ \ \ L_{f}h(x)+L_{g}h(x)u+\alpha_{C}(h(x))\geq 0,$  (5c)
$\ \ \ \ \ \ \ Au\leq b,$  (5d)

where $H\in\mathbb{R}^{m\times m}$ is a positive definite matrix and $\lambda>0$ is a tunable parameter for relaxing the CLF condition (5b). The relaxation of the CLF condition ensures the solvability of the QP. If safety is defined by multiple CBFs, we can handle them by adding the corresponding CBF constraints to the optimization problem.
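To make the QP (5) concrete, the following sketch solves it for a hypothetical 2-D single integrator $\dot{x}=u$ (so $f=0$, $g=I$) with a known circular-obstacle CBF and a quadratic CLF. The dynamics, obstacle, and gains are illustrative assumptions rather than the paper's setup, and SciPy's SLSQP stands in for a dedicated QP solver.

```python
import numpy as np
from scipy.optimize import minimize

def cbf_clf_qp(x, x_goal, x_obs, r_obs, alpha_c=1.0, alpha_v=1.0, lam=10.0):
    """Solve the CBF-CLF-QP (5) for a 2-D single integrator x_dot = u."""
    H = np.eye(2)
    V = np.sum((x - x_goal) ** 2)              # CLF: V(x) = ||x - x_goal||^2
    h = np.sum((x - x_obs) ** 2) - r_obs ** 2  # CBF: h(x) = ||x - x_obs||^2 - r^2
    gradV = 2.0 * (x - x_goal)                 # L_g V (L_f V = 0 since f = 0)
    gradh = 2.0 * (x - x_obs)                  # L_g h

    def cost(zv):                              # decision variable zv = (u1, u2, eps)
        u, eps = zv[:2], zv[2]
        return 0.5 * u @ H @ u + lam * eps ** 2

    cons = [
        # relaxed CLF condition (5b): gradV . u + alpha_v * V <= eps
        {"type": "ineq", "fun": lambda zv: zv[2] - gradV @ zv[:2] - alpha_v * V},
        # CBF condition (5c): gradh . u + alpha_c * h >= 0
        {"type": "ineq", "fun": lambda zv: gradh @ zv[:2] + alpha_c * h},
    ]
    res = minimize(cost, np.zeros(3), constraints=cons, method="SLSQP")
    return res.x[:2]

# Robot at the origin, goal behind an obstacle of radius 1 centered at (2, 0):
# the CBF constraint caps the forward velocity so the safe set is not left.
u = cbf_clf_qp(x=np.array([0.0, 0.0]), x_goal=np.array([4.0, 0.0]),
               x_obs=np.array([2.0, 0.0]), r_obs=1.0)
```

Here the CBF constraint is active and clips the goal-seeking input; input bounds (5d) could be added as further linear `ineq` constraints.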

IV Bayesian Meta-Learning of CBF

In this section, we consider learning the safe region (2) from measurement data provided by sensor devices. To this end, we employ a technique based on Gaussian process implicit surfaces (GPIS) [14, 15, 16], developed for non-parametric probabilistic reconstruction of object surfaces. Note that we consider the case of $N_{\mathrm{obs}}=1$ for now; the case of multiple obstacles is discussed in the next section. In this paper, an implicit surface (IS) is defined by the distance from the surface of an object to a point $z$ in 2D or 3D space, and the inside and outside of the object surface are distinguished through the sign of the function as follows:

$h_{\mathrm{IS}}(z;\xi)=\begin{cases}d_{\xi}(z)&\text{if $z$ is outside the obstacle}\\ 0&\text{if $z$ is on the surface}\\ -d_{\xi}(z)&\text{if $z$ is inside the obstacle}\end{cases}$  (9)

where $z\in\mathbb{R}^{b}$, $b\in\{2,3\}$, is a position in 2D or 3D space and $d_{\xi}:\mathbb{R}^{b}\rightarrow\mathbb{R}_{\geq 0}$ is the function that returns the minimum Euclidean distance between $z$ and the surface of the obstacle corresponding to $\xi\sim p(\xi)$. Though this paper focuses on a 2D environment, our method can easily be extended to the 3D case. In GPIS, the function $h_{\mathrm{IS}}$ is estimated with a Gaussian process (GP), which enables us to deal with uncertainty arising from measurement noise and scarcity of data. The GP is estimated with a dataset $\{(z^{i},h_{\mathrm{IS}}(z^{i};\xi)+\varsigma_{i})\}_{i=1}^{n_{\mathrm{data}}}$, where $\varsigma_{i}$ is bounded $\sigma$-sub-Gaussian noise and $n_{\mathrm{data}}$ is the number of data points.
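As a concrete instance of (9), for a disk-shaped obstacle the implicit surface reduces to the signed distance to the circle; the disk parameterization of $\xi$ by a center and radius below is a hypothetical example, not the paper's choice.

```python
import numpy as np

def h_is_disk(z, center, radius):
    """Implicit surface (9) for a disk obstacle with (hypothetical) xi = (center, radius).
    ||z - center|| - radius equals d_xi(z) outside, 0 on the surface, and
    -d_xi(z) inside, matching the three cases of (9)."""
    return np.linalg.norm(np.asarray(z, dtype=float)
                          - np.asarray(center, dtype=float)) - radius

d_out = h_is_disk([3.0, 0.0], [0.0, 0.0], 1.0)  # two units outside the surface
```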

In the following, we explain how to construct a dataset and apply Bayesian meta-learning to the training of $h_{\mathrm{IS}}$.

IV-A Dataset Construction

Figure 1: The construction of data points from noisy sensor measurements: the gray dots are the points on the surface detected by the sensors. The green, blue, and red dots represent the points at distances $2\Delta$, $\Delta$, and $-\Delta$ from the surface (a negative value means the inside of the obstacle), which are calculated using the surface normal approximation explained in Section IV-A (we show the case of $n_{-}=1$ and $n_{+}=2$).

In this subsection, we explain how to construct a dataset for learning the implicit surface (9) from the measurement data of sensors such as LiDAR. A pictorial illustration of obtaining training data from noisy sensor measurements is shown in Fig. 1. First, assume we have a point cloud on the surface $\mathcal{P}^{0}=\cup_{i=1}^{n_{\mathrm{surf}}}\{z^{0,i}\}$, where $n_{\mathrm{surf}}\in\mathbb{Z}_{>0}$ is the number of surface points and $z^{0,i}\in\mathbb{R}^{2}$, $\forall i\in\mathbb{N}_{1:n_{\mathrm{surf}}}$, are the world Cartesian coordinates of the surface points computed from the raw depth readings produced by the sensors and the current position of the robot. Then, for each surface point $z^{0,i}$, the normal to the surface of the obstacle at that point is approximated by the perpendicular to the segment between the point and its nearest neighbor within the same scan [16]. Under the assumption that the sensor measurements on the surface are dense enough, this method yields accurate approximations. For each surface point $z^{0,i}$ and the corresponding approximated surface normal, we construct a set of points $\cup_{p=-n_{-}}^{n_{+}}\{z^{p\Delta,i}\}$ with $\Delta\in\mathbb{R}_{>0}$ and $n_{+},n_{-}\in\mathbb{Z}_{\geq 0}$, where $z^{p\Delta,i}$, $\forall p\in\mathbb{Z}_{-n_{-}:n_{+}}$, are the points that are $p\Delta$ away from the surface of the obstacle (see Fig. 1). A negative $p$ means that the point $z^{p\Delta,i}$ is inside the obstacle. Then, we construct the dataset to train $h_{\mathrm{IS}}$ as $D=\cup_{i=1}^{n_{\mathrm{surf}}}\cup_{p=-n_{-}}^{n_{+}}\{(z^{p\Delta,i},p\Delta)\}$.
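The steps above can be sketched in a few lines. This is a minimal 2-D implementation under the stated assumptions (dense scans, non-overlapping obstacles); the nearest-neighbor normal approximation and the orientation of the normal toward the sensor follow the description in this subsection, while the function name and defaults are illustrative.

```python
import numpy as np

def build_dataset(surface_pts, robot_pos, delta=0.1, n_minus=1, n_plus=2):
    """Sketch of the dataset construction in Sec. IV-A: for each surface point,
    approximate the normal as the perpendicular to the segment joining the
    point and its nearest neighbour in the same scan, orient it toward the
    robot (outward), and label points at p*delta along the normal with p*delta."""
    surface_pts = np.asarray(surface_pts, dtype=float)
    robot_pos = np.asarray(robot_pos, dtype=float)
    data = []
    for i, z0 in enumerate(surface_pts):
        others = np.delete(surface_pts, i, axis=0)
        nn = others[np.argmin(np.linalg.norm(others - z0, axis=1))]
        tangent = nn - z0
        normal = np.array([-tangent[1], tangent[0]])   # perpendicular in 2-D
        normal = normal / np.linalg.norm(normal)
        if normal @ (robot_pos - z0) < 0:              # point outward (toward sensor)
            normal = -normal
        for p in range(-n_minus, n_plus + 1):
            data.append((z0 + p * delta * normal, p * delta))
    return data

# Flat wall along y = 0 seen by a robot above it: labels are signed offsets.
data = build_dataset([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
                     robot_pos=[1.0, 5.0], delta=0.1, n_minus=1, n_plus=2)
```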

IV-B Bayesian meta-learning of the function $h_{\mathrm{IS}}$

Figure 2: A summary of the proposed approach: the proposed method consists of offline meta-training and online control execution. The offline training follows the procedures in Section IV-B2. In each iteration, we first sample $J$ obstacles from the distribution $p(\xi)$ and construct the datasets $D_{\xi_{j}}^{\mathrm{tr}}$ and $D_{\xi_{j}}^{\mathrm{ts}}$ for each of these obstacles. Then, the loss (13) of the prediction made by the model adapted from $D_{\xi_{j}}^{\mathrm{tr}}$ is evaluated on the test data $D_{\xi_{j}}^{\mathrm{ts}}$, and the parameters are updated with the gradients of the loss. In the online control execution, the control input is generated by the CBF-CLF-QP (5) with the CBFs obtained through the adaptation of the posterior parameters (11) and the probabilistic bounds (17). The dataset for the adaptation is constructed from the sensor measurements obtained during operation by following the procedures in Section IV-A and the data selection scheme discussed in Section V-A.

When the IS function is trained only online, the efficiency of the training may be problematic, while a model trained only offline is not capable of generalizing to new environments. Thus, instead of relying on either online or offline training alone, we employ a meta-learning [10] scheme, which enables quick prediction of $h_{\mathrm{IS}}$ with a small amount of online data by effectively using data from different settings. In the following, we first introduce the Bayesian meta-learning framework ALPaCA (Adaptive Learning for Probabilistic Connectionist Architectures) [12]. The probabilistic bounds on the prediction are subsequently discussed.

IV-B1 Overview of ALPaCA

We first parameterize the prediction of the function $h_{\mathrm{IS}}$ in the following form:

$\hat{h}_{\mathrm{IS}}(z;\theta)=\theta^{\top}\phi_{w}(z),$  (10)

where $\phi_{w}:\mathbb{R}^{2}\rightarrow\mathbb{R}^{d}$ represents a feed-forward neural network with parameters $w$, and $\theta\in\mathbb{R}^{d}$ is a coefficient vector that follows a Gaussian distribution $\mathcal{N}(\bar{\theta},\sigma^{2}\Lambda^{-1})$, which encodes the information and uncertainty associated with the unknown variable $\xi\sim p(\xi)$. Here, $\bar{\theta}\in\mathbb{R}^{d}$ denotes the mean parameters and $\Lambda\in\mathbb{R}^{d\times d}$ is the positive definite precision matrix. Once the parameters of $\phi_{w}$ and the prior distribution, $(w,\bar{\theta}_{0},\Lambda_{0})$, are determined through the offline meta-training procedure explained later, and new measurement data for a fixed $\xi\sim p(\xi)$ is given as $D_{\xi}=\{(z_{i},y_{i})\}_{i=1}^{n_{\mathrm{adapt}}}$, the posterior distribution on $\theta$ is obtained as follows:

$\Lambda_{\xi}=\Phi_{\xi}^{\top}\Phi_{\xi}+\Lambda_{0},\quad \bar{\theta}_{\xi}=\Lambda_{\xi}^{-1}(\Phi_{\xi}^{\top}G_{\xi}+\Lambda_{0}\bar{\theta}_{0}),$  (11)

where $G_{\xi}^{\top}=[y_{1},y_{2},\ldots,y_{n_{\mathrm{adapt}}}]\in\mathbb{R}^{n_{\mathrm{adapt}}}$ and $\Phi_{\xi}^{\top}=[\phi_{w}(z_{1}),\phi_{w}(z_{2}),\ldots,\phi_{w}(z_{n_{\mathrm{adapt}}})]\in\mathbb{R}^{d\times n_{\mathrm{adapt}}}$. The above computation is based on Bayesian linear regression. Here, the mean and precision matrices are subscripted by $\xi$ to indicate explicitly that they are calculated from the data associated with the unknown variable $\xi\sim p(\xi)$. Then, for any given $z\in\mathbb{R}^{2}$, the mean and variance of the posterior predictive distribution are obtained as

μξ(z)=θ¯ξϕw(z),Σξ(z)=σ2(1+ϕw(z)Λξ1ϕw(z)).\displaystyle\mu_{\xi}(z)=\bar{\theta}_{\xi}^{\top}\phi_{w}(z),\ \Sigma_{\xi}(z)=\sigma^{2}(1+\phi_{w}(z)^{\top}\Lambda_{\xi}^{-1}\phi_{w}(z)). (12)

IV-B2 Offline meta-training procedures

In the offline meta-training procedure, the parameters $(w,\bar{\theta}_{0},\Lambda_{0})$ are trained so that the online adaptation discussed above yields accurate predictions (the pseudo-code of the offline training is summarized in Algorithm 2 in Appendix -A). To this end, we iterate the following procedure a prescribed $N_{\mathrm{ite}}\in\mathbb{Z}_{>0}$ times. First, we prepare $J\in\mathbb{Z}_{>0}$ datasets by sampling obstacles $\xi_{1},\xi_{2},\ldots,\xi_{J}$ from the distribution $p(\xi)$ and collecting the data for each obstacle as $D_{\xi_{j}}=\cup_{i=1}^{n_{j}}\{(z_{i}^{j},y_{i}^{j})\}$, where $n_{j}$ is the number of data points in $D_{\xi_{j}}$. Then, we randomly split each dataset $D_{\xi_{j}}$ into a training set $D_{\xi_{j}}^{\mathrm{tr}}=\cup_{i=1}^{n_{j}^{\mathrm{tr}}}\{(z_{i}^{\mathrm{tr},j},y_{i}^{\mathrm{tr},j})\}$ and a test set $D_{\xi_{j}}^{\mathrm{ts}}=\cup_{i=1}^{n_{j}^{\mathrm{ts}}}\{(z_{i}^{\mathrm{ts},j},y_{i}^{\mathrm{ts},j})\}$, where $n_{j}^{\mathrm{tr}}$ and $n_{j}^{\mathrm{ts}}$ are the numbers of data points in $D_{\xi_{j}}^{\mathrm{tr}}$ and $D_{\xi_{j}}^{\mathrm{ts}}$ with $n_{j}^{\mathrm{tr}}+n_{j}^{\mathrm{ts}}=n_{j}$. In each iteration, $n_{j}^{\mathrm{tr}}$ is randomly chosen from the uniform distribution over $\{1,\ldots,n_{j}\}$. The data in $D_{\xi_{j}}^{\mathrm{tr}}$ are used to calculate the posterior distribution of the parameter $\theta_{\xi_{j}}$ (i.e., the mean $\bar{\theta}_{\xi_{j}}$ and precision matrix $\Lambda_{\xi_{j}}$) through (11), while the data in $D_{\xi_{j}}^{\mathrm{ts}}$ are used to evaluate the accuracy of the model (10) with the posterior parameters $\bar{\theta}_{\xi_{j}}$ and $\Lambda_{\xi_{j}}$ calculated above. We define the meta-learning objective via the marginal log-likelihood of the test data and update the parameters through stochastic gradient descent:

$\ell(\bar{\theta}_{0},\Lambda_{0},w):=-\sum_{j=1}^{J}\sum_{i=1}^{n_{j}^{\mathrm{ts}}}\log p(y_{i}^{\mathrm{ts},j}\mid z_{i}^{\mathrm{ts},j})$
$=\frac{1}{2}\sum_{j=1}^{J}\sum_{i=1}^{n_{j}^{\mathrm{ts}}}\Big(\log\big(1+\phi_{w}(z_{i}^{\mathrm{ts},j})^{\top}\Lambda_{\xi_{j}}^{-1}\phi_{w}(z_{i}^{\mathrm{ts},j})\big)+\big(y_{i}^{\mathrm{ts},j}-\bar{\theta}_{\xi_{j}}^{\top}\phi_{w}(z_{i}^{\mathrm{ts},j})\big)^{\top}\Sigma_{\xi_{j}}^{-1}(z_{i}^{\mathrm{ts},j})\big(y_{i}^{\mathrm{ts},j}-\bar{\theta}_{\xi_{j}}^{\top}\phi_{w}(z_{i}^{\mathrm{ts},j})\big)\Big)+\mathrm{const.}$  (13)

IV-B3 Probabilistic bounds on CBF

From the discussion above, once the offline meta-training procedure of Section IV-B2 has been completed and new measurements are obtained during the online execution, we can compute the predictive distribution of the function value $h_{\mathrm{IS}}(z;\xi)$ for any $z\in\mathbb{R}^{2}$ and $\xi\sim p(\xi)$ by (12). Here, we consider deriving a deterministic function with probabilistic guarantees, which can readily be incorporated into the QP formulation (5). The following discussion follows [13]. Before deriving the probabilistic bounds on the prediction of $h_{\mathrm{IS}}$, we make the following two assumptions on the quality of the offline meta-training, both of which are realistic and intuitive.

Assumption 1.

For all $\xi\sim p(\xi)$, there exists $\theta^{*}_{\xi}\in\mathbb{R}^{d}$ such that

${\theta^{*}_{\xi}}^{\top}\phi_{w}(z)=h_{\mathrm{IS}}(z;\xi),\quad\forall z\in\mathbb{R}^{2}.$  (14)
Assumption 2.

For all $\xi\sim p(\xi)$, we have the following:

(θξθ¯0Λi2σ2𝒳d2(1δ))1δ.\displaystyle\mathbb{P}(\|\theta^{*}_{\xi}-\bar{\theta}_{0}\|_{\Lambda_{i}}^{2}\leq\sigma^{2}\mathcal{X}_{d}^{2}(1-\delta))\geq 1-\delta. (15)

Assumption 1 implies that the meta-learning model (10) is capable of fitting the true function, while Assumption 2 states that the uncertainty in the prior is conservative enough. Under these assumptions, we can derive bounds on the function $h_{\mathrm{IS}}$ as follows.

Theorem 3.

Suppose the offline meta-training of the parameters $w$, $\bar{\theta}_{0}$, and $\Lambda_{0}$ is performed so that Assumptions 1 and 2 are satisfied, and the parameters defining the posterior distribution, $\bar{\theta}_{\xi}$ and $\Lambda_{\xi}$, are computed by (11) with data from the new environment $\xi\sim p(\xi)$. Then, the absolute difference between the true function value $h_{\mathrm{IS}}(z;\xi)$ and the mean prediction $\bar{h}_{\mathrm{IS}}(z;\theta_{\xi})=\bar{\theta}_{\xi}^{\top}\phi_{w}(z)$ with $\theta_{\xi}\sim\mathcal{N}(\bar{\theta}_{\xi},\Lambda_{\xi})$, for all $z\in\mathbb{R}^{2}$ and $\xi\sim p(\xi)$, is bounded as follows:

|hIS(z;ξ)h¯IS(z;θξ)||ϕ(z)(θ¯ξθξ)|ϕ(z)Λξ1βξ,\displaystyle|h_{\mathrm{IS}}(z;\xi)-\bar{h}_{\mathrm{IS}}(z;\theta_{\xi})|\leq|\phi(z)^{\top}(\bar{\theta}_{\xi}-\theta^{*}_{\xi})|\leq\|\phi(z)\|_{\Lambda_{\xi}^{-1}}\beta_{\xi}, (16)

with

$\beta_{\xi}=\sigma\left(\sqrt{2\log\left(\frac{1}{\delta}\frac{\det(\Lambda_{\xi})^{1/2}}{\det(\Lambda_{0})^{1/2}}\right)}+\sqrt{\frac{\bar{\lambda}(\Lambda_{0})}{\underline{\lambda}(\Lambda_{\xi})}\chi_{d}^{2}(1-\delta)}\right),$

with probability at least $1-2\delta$.

Proof.

The proof follows from [13] and [37]. In the course of the proof of Theorem 1 in [13], it is shown that $|a^{\top}(\bar{\theta}_{\xi}-\theta^{*}_{\xi})|\leq\|a\|_{\Lambda_{\xi}^{-1}}\beta_{\xi}$ holds for any $a\in\mathbb{R}^{d}$ with probability at least $1-2\delta$ (see p. 18 of [13]). Then, (16) is obtained by substituting $\phi_{w}(z)$ for $a$ in the inequality. ∎

Since the difference between the true value and the mean prediction is bounded by $\|\phi_{w}(z)\|_{\Lambda_{\xi}^{-1}}\beta_{\xi}$, we can provide a lower bound on $h_{\mathrm{IS}}$ at a robot state $x$ as follows:

$h^{b}(x;\theta_{\xi}):=\bar{h}_{\mathrm{IS}}(v(x);\theta_{\xi})-\|\phi_{w}(v(x))\|_{\Lambda_{\xi}^{-1}}\beta_{\xi},$  (17)

where $v:\mathbb{R}^{n}\rightarrow\mathbb{R}^{2}$ is the function that maps the robot's state to its position in 2D space. By using $h^{b}$ as the CBF, we can provide a probabilistic guarantee on the resulting controller, as elaborated in Section V.
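Computing the certified lower bound (17) only needs the posterior quantities and the scalar $\beta_{\xi}$ from Theorem 3; a sketch using SciPy's chi-squared quantile for $\chi_{d}^{2}(1-\delta)$, with the numeric inputs chosen purely for illustration:

```python
import numpy as np
from scipy.stats import chi2

def beta_xi(Lam0, Lam_xi, sigma, delta):
    """Confidence scaling beta_xi from Theorem 3."""
    d = Lam0.shape[0]
    # 0.5 * log(det(Lam_xi) / det(Lam0)), computed stably via slogdet
    half_logdet_ratio = 0.5 * (np.linalg.slogdet(Lam_xi)[1]
                               - np.linalg.slogdet(Lam0)[1])
    t1 = np.sqrt(2.0 * (np.log(1.0 / delta) + half_logdet_ratio))
    # lambda_max(Lam0) / lambda_min(Lam_xi) * chi2 quantile
    t2 = np.sqrt(np.linalg.eigvalsh(Lam0)[-1] / np.linalg.eigvalsh(Lam_xi)[0]
                 * chi2.ppf(1.0 - delta, d))
    return sigma * (t1 + t2)

def h_lower(phi_vx, theta_bar, Lam_xi, beta):
    """Probabilistic lower bound (17): mean minus beta * ||phi||_{Lam_xi^{-1}}."""
    mean = theta_bar @ phi_vx
    width = np.sqrt(phi_vx @ np.linalg.solve(Lam_xi, phi_vx))
    return mean - beta * width

b = beta_xi(np.eye(2), 4.0 * np.eye(2), sigma=1.0, delta=0.1)
hb = h_lower(np.array([1.0, 0.0]), np.array([2.0, 0.0]), 4.0 * np.eye(2), b)
```

The bound shrinks toward the mean prediction as more data sharpens $\Lambda_{\xi}$, since the width term decays with the posterior precision.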

V Online Control Execution

Input: $f$, $g$ (system dynamics model); $p(\xi)$ (distribution of obstacles); $(w,\bar{\theta}_{0},\Lambda_{0})$ (prior parameters); $\delta$ (confidence level); $V$ (CLF); $T$ (execution time)
1: $\xi_{i}\sim p(\xi)$, $\mathcal{B}_{i}\leftarrow\emptyset$, $\forall i\in\mathbb{N}_{1:N_{\mathrm{obs}}}$;
2: while $t<T$ do
3:  if $t=k\Delta_{\mathrm{lidar}}$, $k\in\mathbb{Z}_{\geq 0}$ then
4:   [CBF update]
5:   for $i=1:N_{\mathrm{obs}}$ do
6:    Obtain sensor measurements $\mathcal{P}^{0}_{i}$;
7:    Construct dataset $D_{i}$ by the procedures in Sec. IV-A;
8:    Update the buffer $\mathcal{B}_{i}$ by the procedures in Sec. V-A;
9:    if $\mathcal{B}_{i}\neq\emptyset$ then
10:     Compute the posterior parameters $\bar{\theta}_{\xi_{i}}$ and $\Lambda_{\xi_{i}}$ by (11);
11:     Set $h^{b}(\cdot;\theta_{\xi_{i}})=h^{b}_{\mathrm{IS}}(v(\cdot);\theta_{\xi_{i}})$ by (17);
12:  [Control execution]
13:  Solve the QP (5) with the CBFs obtained above;
14:  Solve (1) and update the state $x(t)$;
Algorithm 1 Online control execution

Given the system (1), the meta-learned parameters $(w,\bar{\theta}_{0},\Lambda_{0})$, and a CLF $V$ that encodes the control objective, the procedures of the online control execution are summarized in Algorithm 1. Before the execution, an environment is determined by sampling the obstacles $\xi_{i}$, $\forall i\in\mathbb{N}_{1:N_{\mathrm{obs}}}$, from the distribution $p(\xi)$, and the data buffers $\mathcal{B}_{i}$ are initialized as empty sets (Line 1). Then, the control is executed by repeatedly solving the QP (5) with the CBFs (17) defined using the current posteriors $\bar{\theta}_{\xi_{i}}$ and $\Lambda_{\xi_{i}}$ calculated from the data in the current buffers (Lines 13 and 14). The buffers $\mathcal{B}_{i}$ and the posterior parameters $\bar{\theta}_{\xi_{i}}$ and $\Lambda_{\xi_{i}}$, $\forall i\in\mathbb{N}_{1:N_{\mathrm{obs}}}$, are updated every $\Delta_{\mathrm{lidar}}\in\mathbb{R}_{>0}$ [s] through the procedures in the following subsection (Lines 3-11). Since each buffer $\mathcal{B}_{i}$ is empty at the beginning of the execution, we ignore the CBF constraint for $\xi_{i}$ until surface points of the corresponding obstacle are detected. To justify this, we make the following assumption.
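The QP (5) itself is not reproduced in this section, but the standard CBF-CLF-QP combines a relaxed CLF decrease condition with a hard CBF condition. The following is a minimal sketch of one control step under that standard form, using a generic nonlinear solver instead of a dedicated QP solver; the objective weighting, constraint shapes, and all names are assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import minimize

def cbf_clf_qp_step(x, f, g, h, grad_h, V, grad_V,
                    gamma_C=1.0, gamma_V=1.0, lam=10.0):
    """One step of a generic CBF-CLF-QP: minimize ||u||^2 + lam*delta^2
    subject to a relaxed CLF condition and a hard CBF condition."""
    fx, gx = f(x), g(x)
    dh, dV = grad_h(x), grad_V(x)

    def cost(z):  # z = [u_1, u_2, delta]
        return z[0]**2 + z[1]**2 + lam * z[2]**2

    cons = [
        # CBF condition: dh/dx . (f(x) + g(x) u) + gamma_C * h(x) >= 0
        {"type": "ineq",
         "fun": lambda z: dh @ (fx + gx @ z[:2]) + gamma_C * h(x)},
        # relaxed CLF condition: dV/dx . (f(x) + g(x) u) + gamma_V * V(x) <= delta
        {"type": "ineq",
         "fun": lambda z: -(dV @ (fx + gx @ z[:2]) + gamma_V * V(x)) + z[2]},
    ]
    res = minimize(cost, np.zeros(3), constraints=cons, method="SLSQP")
    return res.x[:2]  # control input u = [v, omega]
```

In practice a dedicated QP solver would be used for speed, but the constraint structure is the same.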

Assumption 3.

The range of the laser is long enough and the measurements are taken frequently enough that control invariance of the safe region is maintained even if the unobserved obstacles are ignored in the current QP.

In the following subsections, we discuss the update scheme of each buffer i\mathcal{B}_{i} and the theoretical results of the proposed control scheme.

V-A The update scheme of each buffer i\mathcal{B}_{i}

The update procedures for each buffer $\mathcal{B}_{i}$ are discussed here. We omit the subscript $i$ in the following discussion for simplicity of notation. At time instance $k$, suppose we have new data points $D_{k}=\cup_{i=1}^{n_{k}}\cup_{p=n_{-}}^{n_{+}}\{(z_{k}^{p\Delta,i},p\Delta)\}$ obtained by following the procedures in Section IV-A, and the buffer $\mathcal{B}$ constructed from the data points obtained at the past time instances $0$ to $k-1$. Intuitively, if the time interval of updating the dataset is small, $D_{k}$ will include many data points that are similar to those obtained at past instances. Thus, it is important to select only the informative data points before adding them to the buffer. Here, we consider a data selection scheme for this purpose based on [39]. We repeat the following procedures for $i\in\mathbb{N}_{1:n_{k}}$. First, we compute the predictive variance $\Sigma_{\xi}(z_{k}^{0,i})$ at a data point on the surface $z_{k}^{0,i}$, with the posterior parameters calculated from the current $\mathcal{B}$. Then, if the value $\Sigma_{\xi}(z_{k}^{0,i})$ is larger than a prescribed threshold $\eta\in\mathbb{R}_{\geq 0}$, the data points $\cup_{p=n_{-}}^{n_{+}}\{(z_{k}^{p\Delta,i},p\Delta)\}$ are considered informative for the current model and thus added to the buffer $\mathcal{B}$.
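A minimal sketch of this selection rule is given below. It assumes the predictive variance takes the standard Bayesian linear regression form $\Sigma_{\xi}(z)=\sigma^{2}(1+\phi(z)^{\top}\Lambda_{\xi}^{-1}\phi(z))$ (the paper's exact expression is not reproduced in this excerpt), and all names are illustrative.

```python
import numpy as np

def select_informative(buffer, new_groups, phi, Lambda_xi, sigma, eta):
    """Add a group of data points to the buffer only if the predictive
    variance at its surface point exceeds the threshold eta.
    new_groups: list of (surface_point, group_of_points) pairs."""
    for z_surf, group in new_groups:
        feat = phi(z_surf)
        # predictive variance at the surface point (Bayesian linear regression form)
        var = sigma**2 * (1.0 + feat @ np.linalg.solve(Lambda_xi, feat))
        if var > eta:  # the posterior is still uncertain here: keep the group
            buffer.extend(group)
    return buffer
```

As the posterior precision $\Lambda_{\xi}$ grows with data, the variance at previously observed regions shrinks below $\eta$ and near-duplicate measurements are discarded.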

V-B Theoretical Results

The probabilistic safety guarantees for our control scheme are summarized in the following theorem.

Theorem 4.

Suppose Assumption 3 holds, the offline meta-training of the parameters $w$, $\bar{\theta}_{0}$, and $\Lambda_{0}$ is performed so that Assumptions 1 and 2 are satisfied, and a feasible solution of the QP in Algorithm 1 is obtained at all times $t<T$. Then, $x(t)\in\mathcal{C}_{\xi_{i}}$ holds with probability at least $1-2\delta$ for all $t\in\mathbb{R}_{>0}$ and $i\in\mathbb{N}_{1:N_{\mathrm{obs}}}$.

Proof.

From Theorem 1, the implicit control policy defined by the QP in Algorithm 1 keeps the system states within $\hat{\mathcal{C}}_{\xi_{i}}:=\{x\in\mathcal{D}\subset\mathbb{R}^{n}\mid h^{b}(x;\xi_{i})\geq 0\}$ for all of the observed obstacles $\xi_{i}$. Since each state in $\hat{\mathcal{C}}_{\xi_{i}}$ is included in the actual safe set $\mathcal{C}_{\xi_{i}}$ with probability at least $1-2\delta$ by Theorem 3, we can guarantee $x(t)\in\mathcal{C}_{\xi_{i}}$ with probability at least $1-2\delta$ for all $t\in\mathbb{R}_{>0}$ and all observed obstacles $\xi_{i}$. This result and Assumption 3 prove the theorem. ∎

VI Case Study

We test the proposed approach through a simulation of vehicle navigation in 2D space. All the experiments are conducted in Python on a Windows 10 machine with a 2.80 GHz Core i7 CPU and 32 GB of RAM. TensorFlow is used for the meta-learning of the IS function, and PyBullet [38] is used for the implementation of the vehicle system with LiDAR scanners. The controlled system is an unmanned vehicle with the following dynamics:

$\underbrace{\begin{bmatrix}\dot{q}_{x}\\ \dot{q}_{y}\\ \dot{\vartheta}\end{bmatrix}}_{\dot{x}}=\underbrace{\begin{bmatrix}0\\ 0\\ 0\end{bmatrix}}_{f(x)}+\underbrace{\begin{bmatrix}\cos\vartheta&-\ell\sin\vartheta\\ \sin\vartheta&\ell\cos\vartheta\\ 0&1\end{bmatrix}}_{g(x)}\underbrace{\begin{bmatrix}v\\ \omega\end{bmatrix}}_{u}$,   (18)

where $q_{x}$ and $q_{y}$ are the $x$-$y$ coordinates of the robot position, $\vartheta$ is the heading angle of the robot, and $v$ and $\omega$ are the velocity and angular velocity of the vehicle, respectively. The control inputs and system states are defined as $u=[v,\omega]^{\top}\in\mathbb{R}^{2}$ and $x=[q_{x},q_{y},\vartheta]^{\top}\in\mathbb{R}^{3}$, respectively. Note that in (18), we consider the dynamics of a point off the axis of the robot by a distance $\ell\ (>0)$ to make the system have relative degree one [40]. Moreover, a LiDAR scanner with a 360-degree field of view that radiates 150 rays of 3-meter length is assumed to be mounted on the vehicle.
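For concreteness, the model (18) can be simulated with a simple forward-Euler step, as sketched below. The value of $\ell$ is a hypothetical placeholder, not a value reported in the paper.

```python
import numpy as np

ELL = 0.1  # off-axis distance l (hypothetical value)

def g_matrix(x):
    """Input matrix g(x) of the unicycle model (18)."""
    th = x[2]
    return np.array([[np.cos(th), -ELL * np.sin(th)],
                     [np.sin(th),  ELL * np.cos(th)],
                     [0.0,         1.0]])

def step(x, u, dt=0.02):
    """Forward-Euler integration of x_dot = f(x) + g(x) u with f(x) = 0."""
    return x + dt * (g_matrix(x) @ u)
```

For example, one step with $u=[1,0]^{\top}$ from the origin moves the vehicle straight along its heading.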

VI-A Meta learning of unsafe regions

Refer to caption
Figure 3: PyBullet objects in case (II). The figures in the first and second rows are the GUI views of the objects and the corresponding cross sections in $x$-$y$ coordinates, respectively. The surfaces with the smallest and largest sizes are shown by the black and blue lines, respectively.
Refer to caption
Figure 4: A few examples of the prediction of unsafe regions: the black and blue lines show the actual surface and the mean prediction of the surface, respectively. The blue shaded areas show the regions between the 0-level sets of the upper and lower 90% bounds of $h_{\mathrm{IS}}$, derived following the discussion in Section IV-B3, and the black cross marks show the data points on the surface used for the adaptation.

We first test the performance of the proposed scheme for constructing the IS function $h_{\mathrm{IS}}:\mathbb{R}^{2}\rightarrow\mathbb{R}$. We consider two settings: (I) ellipsoidal obstacles with randomly chosen semi-axis lengths, center positions, and rotation angles; (II) the objects in Fig. 3 with randomly chosen positions, rotation angles, and scales. In setting (I), the IS function $h_{\mathrm{IS}}$ is defined by the ellipsoids as follows:

$h_{\mathrm{IS}}(z;\xi)=\frac{[(q_{x}-q_{x,0}^{\xi})\cos\vartheta_{\mathrm{ob}}^{\xi}+(q_{y}-q_{y,0}^{\xi})\sin\vartheta_{\mathrm{ob}}^{\xi}]^{2}}{(c_{x}^{\xi})^{2}}+\frac{[(q_{x}-q_{x,0}^{\xi})\sin\vartheta_{\mathrm{ob}}^{\xi}-(q_{y}-q_{y,0}^{\xi})\cos\vartheta_{\mathrm{ob}}^{\xi}]^{2}}{(c_{y}^{\xi})^{2}}-1$,   (19)

where $c_{x}^{\xi}$, $c_{y}^{\xi}\ (>0)$ are the semi-axis lengths along the $x$ and $y$ coordinates, respectively, $\vartheta_{\mathrm{ob}}^{\xi}$ is the rotation angle, $(q_{x,0}^{\xi},q_{y,0}^{\xi})$ is the center of the ellipsoid, and $z=[q_{x},q_{y}]^{\top}$. In this example, the unobserved parameter $\xi$ can be explicitly written as $\xi=[c_{x}^{\xi},c_{y}^{\xi},q_{x,0}^{\xi},q_{y,0}^{\xi},\vartheta_{\mathrm{ob}}^{\xi}]^{\top}$. The parameters $(c_{x}^{\xi}, c_{y}^{\xi})$, $(q_{x,0}^{\xi}, q_{y,0}^{\xi})$, and $\vartheta_{\mathrm{ob}}^{\xi}$ are randomly sampled from the uniform distributions over the ranges $[0.4,0.8]$, $[-0.8,0.8]$, and $[0,2\pi]$, respectively. In setting (II), we used object files in pybullet_data [41] to test with more arbitrary shapes. The rotation angles and center positions of the objects are sampled in the same way as in setting (I). For both cases, the architecture of $\phi$ in (10) is defined by three hidden layers of 256 units with 32 basis functions. Moreover, we set $N_{\mathrm{ite}}$, $\sigma$, $n_{-}$, $n_{+}$, and $\Delta$ to 30000, 0.001, 1, 5, and 0.1, respectively.
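The ground-truth ellipsoidal IS function (19) translates directly into code. The sketch below evaluates it for a parameter vector $\xi$; the ordering of the entries in $\xi$ is ours.

```python
import numpy as np

def h_IS_ellipse(z, xi):
    """Implicit-surface function (19) for a rotated ellipse.
    xi = (c_x, c_y, q_x0, q_y0, theta_ob): semi-axes, center, rotation."""
    cx, cy, qx0, qy0, th = xi
    dx, dy = z[0] - qx0, z[1] - qy0
    # rotate the offset into the ellipse's body frame
    u = dx * np.cos(th) + dy * np.sin(th)
    w = dx * np.sin(th) - dy * np.cos(th)
    return (u / cx)**2 + (w / cy)**2 - 1.0
```

The function is zero on the obstacle surface, negative inside (unsafe), and positive outside (safe), which is the sign convention a CBF requires.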

The prediction of an obstacle surface calculated with a small amount of data is shown in Figure 4. The blue lines and blue shaded areas in the figure are the mean predictions of the surfaces and the regions between the 0-level sets of the upper and lower 90% bounds of $h_{\mathrm{IS}}$, respectively. We can see that the confidence bounds are reasonably tight even when the number of data points used for the adaptation is small, which is the main effect of the offline meta-training. Moreover, as reported in [12], the time required for the adaptation is much smaller than that of the Gaussian process (GP) [42], a commonly used Bayesian method. More results, including comparisons to the GP in terms of both the accuracy of the prediction and the time required for the training, are provided in Appendix -C.

VI-B Control execution

Refer to caption
(a)
Refer to caption
(b)
Figure 5: An example of control execution with the proposed method (above) and GP case (below). The red crosses, heat maps, blue solid lines, black solid lines, and black dotted lines represent data points on the surface, the values of CBF, 0-level sets of 95% lower bounds of CBF, actual surfaces, and robot trajectories, respectively. The CBF is updated every 5 [s].

Based on the trained $h_{\mathrm{IS}}$, we next execute the control with Algorithm 1. The control objective is to steer the vehicle toward the goal position $(q_{x}^{g},q_{y}^{g})$ while avoiding obstacles. The goal-reaching objective is encoded through the CLF $V(x)=(q_{x}-q_{x}^{g})^{2}+(q_{y}-q_{y}^{g})^{2}$. Moreover, $\alpha_{C}$ and $\alpha_{V}$ in (5) are defined by the linear functions $\alpha_{C}(c)=\gamma_{C}c$ and $\alpha_{V}(c)=\gamma_{V}c$. We set the parameters $\gamma_{C}$, $\gamma_{V}$, and $\lambda$ to 1.0, 1.0, and 10, respectively. For comparison, we test the control performance of the proposed scheme against the case where the GP is used instead of ALPaCA. Since the theoretically guaranteed bounds for GPs [43, 44] tend to be overly conservative, we instead use the 2-$\sigma$ lower bounds in the GP case. The metric of the control performance is the cumulative squared error (CSE) between the robot and goal positions along the resulting state trajectory (the squared errors are collected every 0.02 [s]). An example of the control execution with $\Delta_{\mathrm{lidar}}=5$ [s] is visualized in Figure 5, from which we can see that the vehicle successfully reaches the goal while avoiding the unsafe region. Notably, the control of the proposed method is much less conservative than the GP case because of the tight prediction of the unsafe region (the vehicle in the GP case is often forced to be stuck in small safe regions until the subsequent update of the CBF). The CSEs in this case for the proposed method and the GP case are 618 and 1010, respectively. We additionally provide results for 5 different environments with single or multiple obstacles in Appendix -D (the results for the cases $\Delta_{\mathrm{lidar}}=1,3,5$ are shown), and similar results are observed.
Although the CSEs of the proposed method and the GP case get closer as $\Delta_{\mathrm{lidar}}$ becomes smaller, it is desirable to keep $\Delta_{\mathrm{lidar}}$ large to save computation and the use of sensor devices with limited batteries. Moreover, the GP prediction cannot be updated so frequently because of the time required for the training (see the time analysis in Appendix -C). From the results above, we can conclude that the proposed scheme has a practical advantage from the perspective of control.
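The CSE metric used in this comparison can be sketched as follows, assuming the state is logged every control period and its first two entries are the position (the sampling period and state layout follow the description above).

```python
import numpy as np

def cumulative_squared_error(trajectory, goal):
    """Cumulative squared error (CSE) between robot positions and the goal,
    summed over samples collected at a fixed period (0.02 s in the paper)."""
    traj = np.asarray(trajectory)[:, :2]  # (q_x, q_y) part of each state
    return float(np.sum(np.sum((traj - np.asarray(goal))**2, axis=1)))
```

A lower CSE indicates a faster, less conservative approach to the goal.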

VII Conclusion

In this paper, we proposed an efficient online learning method of CBFs from sensor measurements with probabilistic guarantees. In the proposed method, we utilized a technique based on the GPIS to learn unsafe regions from the measurements. Specifically, a Bayesian meta-learning scheme was employed to learn the IS function, which enables us to effectively use past data from different settings. To avoid the volume of the data buffers becoming unnecessarily large during the online control execution, we also considered a data selection scheme that effectively uses the uncertainty information provided by the current model. The prediction made by the proposed scheme was readily incorporated into the CBF-CLF-QP formulation, and a probabilistic safety guarantee for the control scheme was also provided. In the case study, we have shown the efficacy of our method in terms of the tightness of the prediction, the time required for the online training, and the conservativeness of the control performance.

References

  • [1] E. F. Camacho and C. B. Alba, Model predictive control. Springer science & business media, 2013.
  • [2] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” in European Control Conference (ECC), 2019, pp. 3420–3431.
  • [3] A. D. Ames, K. Galloway, K. Sreenath and J. W. Grizzle, “Rapidly Exponentially Stabilizing Control Lyapunov Functions and Hybrid Zero Dynamics,” in IEEE Transactions on Automatic Control, vol. 59, no. 4, pp. 876–891, 2014.
  • [4] A. D. Ames, X. Xu, J. W. Grizzle and P. Tabuada, “Control Barrier Function Based Quadratic Programs for Safety Critical Systems,” in IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2017.
  • [5] D. Q. Mayne, M. M. Seron, and S. V. Raković, “Robust model predictive control of constrained linear systems with bounded disturbances,” Automatica, vol. 41, no. 2, pp. 219–224, 2005.
  • [6] A. Clark, “Control Barrier Functions for Complete and Incomplete Information Stochastic Systems,” in American Control Conference (ACC), Philadelphia, 2019, pp. 2928-2935.
  • [7] K. Garg and D. Panagou, “Robust Control Barrier and Control Lyapunov Functions with Fixed-Time Convergence Guarantees,” 2021 American Control Conference (ACC), 2021, pp. 2292-2297.
  • [8] Y. Chen, H. Peng, and J. Grizzle, “Obstacle Avoidance for Low-Speed Autonomous Vehicles With Barrier Function,” in IEEE Transactions on Control Systems Technology, vol. 26, no. 1, pp. 194–206, 2018.
  • [9] A. D. Ames, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs with application to adaptive cruise control,” in IEEE Conference on Decision and Control, 2014, pp. 6271–6278.
  • [10] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 1126–1135.
  • [11] T. Hospedales, A. Antoniou, P. Micaelli and A. Storkey, “Meta-Learning in Neural Networks: A Survey,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5149–5169, 2022.
  • [12] J. Harrison, A. Sharma, and M. Pavone, “Meta-learning priors for efficient online Bayesian regression,” in Proc. Workshop Algorithmic Found. Robot., 2018, pp. 318–337.
  • [13] T. Lew, A. Sharma, J. Harrison, A. Bylard, and M. Pavone, “Safe Active Dynamics Learning and Control: A Sequential Exploration–Exploitation Framework,” in IEEE Transactions on Robotics, vol. 38, no. 5, pp. 2888–2907, 2022.
  • [14] O. Williams and A. Fitzgibbon, “Gaussian process implicit surfaces,” in Proc. Gaussian Process. Practice Workshop, 2007.
  • [15] W. Martens, Y. Poffet, P. R. Soria, R. Fitch and S. Sukkarieh, “Geometric Priors for Gaussian Process Implicit Surfaces,” in IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 373–380, 2017.
  • [16] M. Gerardo-Castro, T. Peynot, and F. Ramos, “Laser-radar data fusion with Gaussian process implicit surfaces,” in Proc. Field Serv. Robot., 2015, pp. 289–302.
  • [17] C. Dawson, S. Gao, and C. Fan, “Safe Control with Learned Certificates: A Survey of Neural Lyapunov, Barrier, and Contraction Methods,” in IEEE Transactions on Robotics, to appear, 2023.
  • [18] A. Taylor, A. Singletary, Y. Yue, and A. Ames, “Learning for safety critical control with control barrier functions,” in Learning for Dynamics and Control, 2020, pp. 708–717.
  • [19] J. Choi, F. Castañeda, C. Tomlin, and K. Sreenath, “Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions,” in Robotics: Science and Systems, Corvallis, OR, 2020.
  • [20] P. Jagtap, G. J. Pappas and M. Zamani, “Control Barrier Functions for Unknown Nonlinear Systems using Gaussian Processes,” in IEEE Conference on Decision and Control (CDC), 2020, pp. 3699–3704.
  • [21] V. Dhiman, M. J. Khojasteh, M. Franceschetti and N. Atanasov, “Control Barriers in Bayesian Learning of System Dynamics,” in IEEE Transactions on Automatic Control, vol. 68, no. 1, pp. 214–229, 2023.
  • [22] F. Castaneda, J. J. Choi, W. Jung, B. Zhang, C. J. Tomlin, and K. Sreenath, “Probabilistic Safe Online Learning with Control Barrier Functions,” arXiv:2208.10733, 2022.
  • [23] K. Long, V. Dhiman, M. Leok, J. Cortés, and N. Atanasov, “Safe Control Synthesis With Uncertain Dynamics and Constraints,” in IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 7295–7302, 2022.
  • [24] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, “End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks,” in AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3387–3395.
  • [25] R. K. Cosner, Y. Yue and A. D. Ames, “End-to-End Imitation Learning with Safety Guarantees using Control Barrier Functions,” in IEEE Conference on Decision and Control (CDC), 2022, pp. 5316–5322.
  • [26] S. Yaghoubi, G. Fainekos and S. Sankaranarayanan, “Training Neural Network Controllers Using Control Barrier Functions in the Presence of Disturbances,” in IEEE International Conference on Intelligent Transportation Systems (ITSC), 2020, pp. 1–6.
  • [27] W. Jin, Z. Wang, Z. Yang, and S. Mou, “Neural Certificates for Safe Control Policies,” arXiv:2006.08465, 2020.
  • [28] A. Robey, H. Hu, L. Lindemann, H. Zhang, D. V. Dimarogonas, S. Tu, and N. Matni, “Learning Control Barrier Functions from Expert Demonstrations,” in IEEE Conference on Decision and Control (CDC), 2020, pp. 3717–3724.
  • [29] C. Dawson, Z. Qin, S. Gao, and C. Fan, “Safe Nonlinear Control Using Robust Neural Lyapunov-Barrier Functions,” in Proceedings of the 5th Conference on Robot Learning, vol. 164, 2022, pp. 1724–1735.
  • [30] F. B. Mathiesen, S. C. Calvert and L. Laurenti, “Safety Certification for Stochastic Systems via Neural Barrier Functions,” in IEEE Control Systems Letters, vol. 7, pp. 973–978, 2023.
  • [31] M. Khan, T. Ibuki, and A. Chatterjee, “Gaussian Control Barrier Functions : A Non-Parametric Paradigm to Safety,” arXiv:2203.15474, 2022.
  • [32] M. Srinivasan, A. Dabholkar, S. Coogan and P. A. Vela, “Synthesis of Control Barrier Functions Using a Supervised Machine Learning Approach,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 7139–7145.
  • [33] K. Long, C. Qian, J. Cortés and N. Atanasov, “Learning Barrier Functions With Memory for Robust Safe Navigation,” in IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4931–4938, 2021.
  • [34] A. Nagabandi, I. Clavera, S. Liu, R. S. Fearing, P. Abbeel, S. Levine, and C. Finn, “Learning to adapt in dynamic, real-world environments through meta-reinforcement learning,” in Int. Conf. on Learning Representations, 2019.
  • [35] S. M. Richards, N. Azizan, J. E. Slotine, and M. Pavone, “Adaptive control-oriented meta-learning for nonlinear systems,” arXiv preprint arXiv:2103.04490, 2021
  • [36] K. Rakelly, A. Zhou, C. Finn, S. Levine, and D. Quillen, “Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables,” in Proceedings of the 36th International Conference on Machine Learning, 2019, pp. 5331–5340.
  • [37] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, “Improved algorithms for linear stochastic bandits,” in Neural Information Processing Systems, 2011.
  • [38] https://pybullet.org/wordpress/
  • [39] D. Nguyen-Tuong and J. Peters, “Incremental online sparsification for model learning in real-time robot control,” Neurocomputing, vol. 74, no. 11, pp. 1859–1867, 2011.
  • [40] J. Cortés and M. Egerstedt, “Coordinated control of multi-robot systems: A survey,” SICE Journal of Control, Measurement, and System Integration, vol. 10, no. 6, pp. 495–503, 2017.
  • [41] https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_data
  • [42] C. E. Rasmussen, “Gaussian processes in machine learning,” in Summer school on machine learning. Springer, 2003, pp. 63–71.
  • [43] N. Srinivas, A. Krause, S. M. Kakade and M. W. Seeger, “Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting,” in IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3250–3265, 2012.
  • [44] K. Hashimoto, A. Saoud, M. Kishida, T. Ushio, and D. Dimarogonas, “Learning-based safe symbolic abstractions for nonlinear control systems,” Automatica, 2022.
  • [45] https://gpy.readthedocs.io/en/deploy/

-A Pseudo code of offline meta-learning

The pseudo code of the offline meta-learning is summarized in Algorithm 2.

Input: $p(\xi)$ (distribution of obstacles); $\sigma$ (noise variance)
1: Randomly initialize the parameters $\bar{\theta}_{0}$, $\Lambda_{0}$, $w$;
2: for $i=1:N_{\mathrm{ite}}$ do
3:   for $j=1:J$ do
4:     Sample obstacles $\xi_{j}\sim p(\xi)$;
5:     Construct the datasets corresponding to the sampled obstacles as $D_{\xi_{j}}=\cup_{i=1}^{n_{j}}\{(z_{i}^{j},y_{i}^{j})\}$;
6:     Sample $n_{j}^{\mathrm{tr}}$ from the uniform distribution over $\{1,\ldots,n_{j}\}$;
7:     Split the dataset into training and test sets: $D_{\xi_{j}}^{\mathrm{tr}}=\cup_{i=1}^{n_{j}^{\mathrm{tr}}}\{(z_{i}^{\mathrm{tr},j},y_{i}^{\mathrm{tr},j})\}$ and $D_{\xi_{j}}^{\mathrm{ts}}=\cup_{i=1}^{n_{j}^{\mathrm{ts}}}\{(z_{i}^{\mathrm{ts},j},y_{i}^{\mathrm{ts},j})\}$;
8:     Compute the posterior parameters $\bar{\theta}_{\xi_{j}}$ and $\Lambda_{\xi_{j}}$ via (11) with the data $D_{\xi_{j}}^{\mathrm{tr}}$;
9:   Update $\bar{\theta}_{0}$, $\Lambda_{0}$, $w$ via a gradient step on the loss (IV-B2) calculated with $\bar{\theta}_{\xi_{j}}$, $\Lambda_{\xi_{j}}$, and $D_{\xi_{j}}^{\mathrm{ts}}$, $\forall j\in\mathbb{N}_{1:J}$;
Algorithm 2 Offline meta-learning
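Line 8 of Algorithm 2 computes the task posterior via (11). Since (11) is not reproduced in this excerpt, the sketch below assumes the standard Bayesian linear regression update used by ALPaCA-style last-layer models (precision accumulation plus a prior-weighted mean), which may differ from the paper's exact scaling.

```python
import numpy as np

def posterior_update(Phi, y, theta_bar0, Lambda0):
    """Posterior parameters for the last-layer weights, assuming a
    standard Bayesian linear regression update (hedged sketch of (11)).
    Phi: (N, d) feature matrix, y: (N,) targets."""
    Lambda_xi = Lambda0 + Phi.T @ Phi                       # precision update
    theta_bar_xi = np.linalg.solve(Lambda_xi,
                                   Lambda0 @ theta_bar0 + Phi.T @ y)  # mean
    return theta_bar_xi, Lambda_xi
```

Because the update is a closed-form linear solve, the per-task adaptation is fast, which is the property the paper exploits for frequent online CBF updates.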

-B Mitigation of conservativeness

To avoid an unnecessarily large $\beta_{\xi}$ defined in Section IV-B3 and mitigate the conservativeness of the probabilistic bounds without compromising the safety guarantees, we can add the following regularization term to the original loss function (IV-B2), as proposed in [13]:

$\mathcal{L}_{\mathrm{reg}}^{\xi}(\Lambda_{0})=\gamma\,\mathrm{Tr}(\Lambda_{\xi}^{-\top}\Lambda_{\xi}^{-1})\,\mathrm{Tr}(\Lambda_{0}^{-\top}\Lambda_{0}^{-1})$,   (20)

where $\gamma\in\mathbb{R}_{\geq 0}$ is the weight for the regularization term and $\mathrm{Tr}(\cdot)$ is the trace of the argument matrix. This term corresponds to an upper bound on the term $\frac{\bar{\lambda}(\Lambda_{0})}{\underline{\lambda}(\Lambda_{\xi})}$ in $\beta_{\xi}$; thus, minimizing (20) leads to a small $\beta_{\xi}$ value. In the experiments, we have confirmed that the regularization term with $\gamma=10^{-9}$ reasonably reduces the conservativeness of the bounds.
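The regularizer (20) is straightforward to evaluate; the sketch below mirrors the trace product with a configurable weight $\gamma$ (in a training loop this would be computed with the framework's differentiable operations rather than NumPy).

```python
import numpy as np

def reg_term(Lambda_xi, Lambda0, gamma=1e-9):
    """Regularization term (20):
    gamma * Tr(Lambda_xi^{-T} Lambda_xi^{-1}) * Tr(Lambda_0^{-T} Lambda_0^{-1})."""
    inv_xi = np.linalg.inv(Lambda_xi)
    inv_0 = np.linalg.inv(Lambda0)
    return gamma * np.trace(inv_xi.T @ inv_xi) * np.trace(inv_0.T @ inv_0)
```

Penalizing these inverse-precision traces keeps both $\Lambda_{0}$ and $\Lambda_{\xi}$ well conditioned, which in turn keeps $\beta_{\xi}$ small.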

-C Detailed Results for Meta-Learning of Unsafe Regions

We first show the negative log-likelihood (NLL) of the predictions for cases (I) and (II) discussed in Section VI in Figure 6. The NLLs against the number of data points are calculated for both the proposed scheme and the case where Gaussian process regression (GPR) is used instead of ALPaCA. The NLLs are calculated for 100 different obstacles, and the averages and 3-$\sigma$ intervals are plotted in the figures. Note that the number of data points in the figures means the number of surface points; the actual number of data points is $n_{+}+n_{-}+1\ (=7)$ times that number. The squared exponential kernel is used for the GP, and the parameters of the kernel are fitted by maximum likelihood. For the GP implementation, the Python library GPy [45] is used. From the figures, we can see that the prediction made by the proposed meta-learning scheme is superior to that of the GPR case when the number of data points is small. A time analysis is also provided in Figure 7, which shows that the proposed scheme produces predictions much faster than fitting the GP at every update; this is especially beneficial for the online synthesis of the CBF-QP since we can update the CBFs more frequently. Additional examples of the prediction are shown in Figures 8a and 8b.

Refer to caption
(a) Setting (I)
Refer to caption
(b) Setting (II)
Figure 6: NLLs of the predictions of the proposed meta-learning scheme and the GP case against the number of data points (the number of data points in the figure means the number of surface points; the actual number of data points is $n_{+}+n_{-}+1\ (=7)$ times that number). The NLLs are collected for 100 randomly chosen functions, and the mean and 3-$\sigma$ interval of the NLL values are plotted.
Refer to caption
Figure 7: The time required for predictions of the proposed meta-learning scheme and the GP case against the number of data points (the number of data points in the figure means the number of surface points; the actual number of data points is $n_{+}+n_{-}+1\ (=7)$ times that number). The times are collected for 100 randomly chosen functions, and the mean and 3-$\sigma$ interval of the times are plotted.
Refer to caption
(a) Additional visualization for case (I)
Refer to caption
(b) Additional visualization for case (II)

-D Detailed Results for Control Execution

Next, we summarize the detailed results for the control execution. In Figures 8-13, we show the resulting trajectories and the time evolution of the squared errors between the robot positions and the goal positions when $\Delta_{\mathrm{lidar}}=5$, for all 6 environments. From these figures, we can see that the tight prediction of the proposed scheme leads to less conservative control performance, as explained in Section VI. In Table I, we also summarize the cumulative squared errors between the robot position and the goal position for all environments and the cases $\Delta_{\mathrm{lidar}}=1,3,5$ [s].

Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Refer to caption
(f)
Figure 8: A control execution in Environment 1 with the proposed method (above) and GP case (below). The red crosses, heat maps, blue solid lines, black solid lines, and black dotted lines represent data points on the surface, the values of CBF, 0-level sets of 95% lower bounds of CBF, actual surfaces, and robot trajectories, respectively. The prediction of CBF is updated every 5 [s]. The time evolution of the squared errors between goal and system states is also shown in the figure.
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 9: A control execution in Environment 2 with the proposed method (above) and GP case (below). The red crosses, blue/red solid lines, black solid lines, and black dotted lines represent data points on the surface, 0-level sets of 95% lower bounds of CBF, actual surfaces, and robot trajectories, respectively. The prediction of CBF is updated every 5 [s]. The time evolution of the squared errors between goal and system states is also shown in the figure.
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 10: A control execution in Environment 3. The prediction of CBF is updated every 5 [s].
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 11: A control execution in Environment 4. The prediction of CBF is updated every 5 [s].
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 12: A control execution in Environment 5. The prediction of CBF is updated every 5 [s].
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 13: A control execution in Environment 6. The prediction of CBF is updated every 5 [s].
TABLE I: The cumulative squared errors between the robot position and the goal position
Environment | Proposed ($\Delta_{\mathrm{lidar}}=1$ / 3 / 5 [s]) | GP ($\Delta_{\mathrm{lidar}}=1$ / 3 / 5 [s])
1 | 554.93 / 620.12 / 618.56 | 543.78 / 715.75 / 1010.93
2 | 654.10 / 702.75 / 830.22 | 697.65 / 818.47 / 1094.79
3 | 534.40 / 585.63 / 654.25 | 533.77 / 609.42 / 897.31
4 | 794.06 / 794.06 / 843.25 | 1792.50 / 1792.50 / 1803.99
5 | 475.41 / 475.41 / 486.61 | 472.43 / 472.43 / 503.12
6 | 1068.52 / 1068.56 / 1076.81 | 1271.35 / 1271.32 / 1648.49