
Safety Guaranteed Robust Multi-Agent Reinforcement Learning with Hierarchical Control for Connected and Automated Vehicles

Zhili Zhang∗1 H M Sabbir Ahmad∗2 Ehsan Sabouni∗2 Yanchao Sun3
Furong Huang3  Wenchao Li2 Fei Miao1
*Z. Zhang, H. M. S. Ahmad, and E. Sabouni contributed equally.
1Z. Zhang and F. Miao are with the Department of Computer Science and Engineering, University of Connecticut, Storrs Mansfield, CT, USA 06268. {zhili.zhang, fei.miao}@uconn.edu
2H. M. S. Ahmad, E. Sabouni, and W. Li are with the Division of Systems Engineering and Department of Electrical & Computer Engineering, Boston University, Boston, MA, USA 02215. {sabbir92, esabouni, wenchao}@bu.edu
3Y. Sun and F. Huang are with the Department of Computer Science, University of Maryland, College Park, MD, USA 20742. {ycs, furongh}@umd.edu
Abstract

We address the problem of coordination and control of Connected and Automated Vehicles (CAVs) in the presence of imperfect observations in mixed traffic environments. A commonly used approach is learning-based decision-making, such as reinforcement learning (RL). However, most existing safe RL methods suffer from two limitations: (i) they assume accurate state information, and (ii) safety is generally defined over the expectation of the trajectories. It remains challenging to design optimal coordination between multiple agents while ensuring hard safety constraints under system state uncertainties (e.g., those that arise from noisy sensor measurements, communication, or state estimation methods) at every time step. We propose a safety-guaranteed hierarchical coordination and control scheme called Safe-RMM to address this challenge. Specifically, the high-level coordination policy of CAVs in mixed traffic environments is trained by the Robust Multi-Agent Proximal Policy Optimization (RMAPPO) method. Though trained without uncertainty, our method leverages a worst-case Q network to ensure the model's robust performance when state uncertainties are present during testing. The low-level controller is implemented using model predictive control (MPC) with robust Control Barrier Functions (CBFs) to guarantee safety through their forward invariance property. We compare our method with baselines in different road networks in the CARLA simulator. Results show that our method achieves the best safety and efficiency among the evaluated methods in challenging mixed traffic environments with uncertainties.

I Introduction

Figure 1: Intersection (a, b): two HDVs run the red light while two CAVs are passing the box in the Intersection scenario. CAVs are in green; HDVs are in red. (1a): CAVs successfully avoid collision with our method during testing with state uncertainties; (1b): CAV1, adopting the benchmark (MCP), collides with an HDV because the perturbed location of HDV1 (with yellow triangle) misleads the CAV into believing a collision will not happen.

Machine learning models assisted by increasingly accurate on-board sensors, such as cameras and LiDAR, have enabled intelligent driving to a certain degree. Meanwhile, advances in wireless communication technologies also make it possible to share information beyond an individual vehicle's perception [1, 2]. Through vehicle-to-everything (V2X) communications, it has been shown that shared information can contribute to CAVs' decision-making [3, 4, 5] and improve the safety and coordination of CAVs [6, 7, 8].

However, it remains challenging for reinforcement learning (RL) or multi-agent reinforcement learning (MARL)-based decision-making methods to guarantee the safety of CAVs in complicated dynamic environments containing human-driven vehicles (HDVs) and to optimize the joint behavior of the entire system. For real-world CAVs, state uncertainties that may result from noisy sensor measurements, state estimation algorithms, or the communication medium pose another challenge. There can be scenarios where safety is highly correlated with the correctness of state information, especially in the presence of HDVs/unconnected vehicles, as shown in Fig. 1b.

In this work, we propose a hierarchical decision-making approach for the coordination and control problem of CAVs in mixed traffic environments. At the top of the hierarchy is a robust MARL policy that learns the cooperative behavior of CAVs by generating discrete planning actions for each vehicle. At the lower level, to guarantee the safety of each CAV, an MPC controller with control barrier function (CBF) constraints is designed to track the planned path according to the MARL action. Specifically, for cooperative policy learning, we design a robust MAPPO (RMAPPO) algorithm that optimizes the worst-case Q network [9] as a critic, allowing MAPPO [10] to train a robust policy without simulating potential state uncertainties during the training process. The MPC controller using robust CBFs serves two purposes: it guarantees safety for the CAVs in mixed traffic environments in the presence of state uncertainties, and it tracks the planned path determined by the RMAPPO policy's actions. In summary, the main contributions of this work are:

  • We propose a hierarchical decision-making framework, Safe-RMM, for CAVs in mixed traffic environments. The framework comprises two levels: the top level is a robust MARL policy (the "RM" in Safe-RMM) that determines discrete actions conditioned on the behavior of other CAVs and HDVs, and the low-level controller uses MPC (the final "M" in Safe-RMM) with CBFs to execute the high-level plan while guaranteeing safety with respect to neighboring vehicles through the forward invariance property of CBFs.

  • To handle state uncertainties, we design a robust MARL algorithm that only requires training one additional critic per agent and no prior knowledge of the uncertainties. Additionally, the MPC controller incorporates robust CBFs to consistently generate safe controls given MARL decisions, endowing it with robustness against erroneous system states.

  • We validate through experiments in the CARLA simulator that the proposed Safe-RMM approach significantly improves the collision-free rate and allows the CAV agents to achieve higher overall returns compared to baseline methods. Ablation studies further highlight the contributions of both the robust MARL algorithm and the MPC-CBF controller, as well as their reciprocal effects.

II Related Work

In this section, we survey the existing literature in this area and discuss its limitations to motivate our proposed approach.

Safe RL and Robust RL

Different approaches have been proposed to guarantee or improve the safety of the system, such as defining a safety shield or barrier that assists the RL or MARL algorithm in either the training or the execution stage [11, 12, 13], and constrained RL/MARL that learns a risk network [14], an expected cost function [15, 16], or cost constraints from language [17] that define the safety requirements. For MARL of CAVs, safety-checking modules with a CBF-PID controller for each individual vehicle have been designed [8, 18, 19]. However, the above works assume accurate state inputs to the RL or MARL algorithm from the driving environment and cannot tolerate noisy or inaccurate state input. Meanwhile, robust RL and robust MARL methods that only consider training a policy under state uncertainty or model uncertainty [9, 20, 21, 22, 23], without explicitly considering the safety requirements, have been proposed recently. However, in multi-agent settings with imperfect observations, considering both safety requirements and robustness in a unified decision-making framework for CAVs still remains challenging.

Rule-Based Approaches

Solving the coordination and control problem in a single unified optimization framework poses challenges that can be addressed by decomposing the problem into a hierarchical structure. Specifically, the higher-level control is responsible for decision making and the lower-level control is responsible for safe execution. For the higher-level planner, heuristic rule-based methods can be employed in which a set of rules governs the behavior of each agent within the system. For instance, existing driving behavior models for mixed traffic can be found in [24], [25], [26, 27]. However, these models often lack robustness and make various assumptions about HDVs, which prevents generalization to all scenarios. MPC can be used for the lower-level controller due to its ability to track references and handle hard constraints in real time. In situations where imperfect observations are present, robust MPC approaches may be used, such as tube MPC [28, 29]. Nevertheless, tube-based MPC approaches require a feedback controller that can keep the actual system trajectory close to the nominal one, and the calculation of such a feedback controller is not trivial in multi-agent systems with nonlinear dynamics. Min-max MPC [30] can also be adopted, but it is often difficult to solve, and when it is approximated, the approximation can result in an overly conservative solution.

In this work, we consider safe and robust coordination and control for multi-agent CAV systems in mixed traffic environments with state uncertainties. We define safety as the collision-free condition for CAVs, and the concept of robustness refers to the agents' capability of maintaining their performance, including safety and efficiency, under state uncertainties. The proposed hierarchical scheme involves robust MARL that works in tandem with a low-level MPC controller using robust CBFs to guarantee safety for CAVs under input state uncertainties. Our robust MARL algorithm does not require injecting perturbations during training.

III Problem Formulation

III-A Problem Description

We consider the robust cooperative policy-learning problem under uncertain state inputs for CAVs in mixed traffic environments that include HDVs which do not communicate or coordinate with CAVs, and various driving scenarios such as a multi-lane intersection and a highway (as shown in Fig. 1 and Fig. 4). We assume that each CAV can receive shared information through V2V and V2I communications. We consider that a CAV agent $i$ has an accurate self-observation of its own driving state but potentially perturbed observations of the other vehicles. The two parts collectively constitute its state $s_{i}$ in reinforcement learning, explained in Sec. III-B, and are also used by the MPC controller as inputs to the robust CBFs.

III-B Formulation of MARL with State Uncertainty for CAVs

The problem of Multi-Agent Reinforcement Learning with State Uncertainty for CAVs is defined as a tuple $G=(\mathcal{S},\mathcal{A},P,\{r_{i}\},\tilde{o},\mathcal{G},\gamma)$, where $\mathcal{G}\coloneqq(\mathcal{N},\mathcal{E})$ is the communication network of all CAV agents. $\mathcal{S}$ is the joint state space of all agents: $\mathcal{S}:=\mathcal{S}_{1}\times\dots\times\mathcal{S}_{n}$. The state space of agent $i$, $\mathcal{S}_{i}=\{o_{i},o_{\mathcal{N}_{i}},o_{\mathcal{N}_{i}^{UV}}\}$, contains the self-observation $o_{i}$, observations $o_{\mathcal{N}_{i}}=\{o_{j}\,|\,j\in\mathcal{N}_{i}\}$ given by the messages shared by neighboring connected agents $\mathcal{N}_{i}$, and observations $o_{\mathcal{N}_{i}^{UV}}$ of unconnected vehicles $\mathcal{N}_{i}^{UV}$, either observed by agent $i$ itself or shared by other agents or infrastructure. For example, the self-observation $o_{i}$ and shared observations $o_{\mathcal{N}_{i}}$ can contain location, velocity, acceleration, and lane detection: $(\bm{l},\bm{v},\bm{\alpha},LD)$; observations of unconnected vehicles $\mathcal{N}_{i}^{UV}$ can contain location and velocity $(\bm{l},\bm{v})$.

In this work, we consider that agent $i$ observes uncertain locations and velocities of the vehicles other than the ego vehicle (i.e., $\tilde{s}_{i}=\{o_{i}\}\cup\{\tilde{o}_{j}\,|\,j\in\mathcal{N}_{i}\cup\mathcal{N}_{i}^{UV}\}$), where $\tilde{o}_{j}$ denotes an uncertain observation of vehicle $j$, in contrast to the accurate self-observation $o_{i}$. The uncertainty is defined by bounded errors $(e_{l},e_{v})$: $\tilde{o}=(\tilde{\bm{l}},\tilde{\bm{v}})$ with $\tilde{\bm{l}}=\bm{l}+e_{l}$ and $\tilde{\bm{v}}=\bm{v}+e_{v}$. The implementation of state uncertainty in the testing experiments is explained in Sec. V.
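As a concrete illustration, the following minimal sketch (in Python, with hypothetical helper names and a data layout not prescribed by the paper) shows how an agent's state $\tilde{s}_{i}$ could be assembled from an accurate self-observation and bounded-error perturbations of the other vehicles' observed locations and velocities.

```python
import numpy as np

def perturb_observation(obs, eps_l=3.0, eps_v=3.0, rng=None):
    """Return a copy of another vehicle's observation with bounded errors added to
    its location and velocity: l_tilde = l + e_l, v_tilde = v + e_v, |e| <= eps."""
    rng = rng or np.random.default_rng()
    noisy = dict(obs)
    noisy["location"] = np.asarray(obs["location"], dtype=float) + rng.uniform(-eps_l, eps_l, size=2)
    noisy["velocity"] = np.asarray(obs["velocity"], dtype=float) + rng.uniform(-eps_v, eps_v, size=2)
    return noisy

def build_state(o_self, neighbor_obs, unconnected_obs, perturb=True):
    """Assemble agent i's state s_i (or s_i tilde): the accurate self-observation plus
    (possibly perturbed) observations of connected neighbors and unconnected vehicles."""
    others = list(neighbor_obs) + list(unconnected_obs)
    if perturb:
        others = [perturb_observation(o) for o in others]
    return {"self": o_self, "others": others}
```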

The joint action set is $\mathcal{A}:=\mathcal{A}_{1}\times\cdots\times\mathcal{A}_{n}$, where $\mathcal{A}_{i}=\{a_{i,1},a_{i,2},\cdots,a_{i,3+k}\}$ is the discrete finite action space of agent $i$. $a_{i,1}$: KEEP-LANE-MAX, CAV $i$ maximizes its reference throttle in the current lane. $a_{i,2}$: CHANGE-LANE-LEFT, CAV $i$ changes to its left lane. $a_{i,3}$: CHANGE-LANE-RIGHT, CAV $i$ changes to its right lane. In experiments, the path planner sets a target waypoint trajectory in the left/right neighboring lane. $a_{i,4},\ldots,a_{i,3+k}$ are $k$ lane-keeping actions associated with different reference throttle values. By choosing action $a_{i,5}$, for example, the reference throttle value $throttle_{5}$ is first converted into a reference acceleration and then fed to the controller; the calculation of the safe control is introduced in Sec. IV-B. The state transition function is $P:\mathcal{S}\times\mathcal{A}\times\mathcal{S}\mapsto[0,1]$. The reward functions $\{r_{i}:\mathcal{S}\times\mathcal{A}\mapsto\mathbb{R}\}$ are defined as

$$r_{i}(s,a)=\sum_{j}\mu^{v}_{i,j}\|\bm{v}_{j}\|_{2}+\sum_{j}\mu^{l}_{i,j}\|\bm{l}_{j}-\bm{d}_{j}\|_{2}+\sum_{j}\mu^{s}_{i,j}r^{s}_{j}(s,a) \qquad (1)$$

in which $\bm{v}_{j}$ is vehicle $j$'s velocity, $\bm{d}_{j}$ is $j$'s default destination, and $r^{s}_{j}$ is the safety reward. $\mu^{v},\mu^{l},\mu^{s}$ are non-negative weights balancing the proportions of individual and total achievement. The safety reward decomposes as $r^{s}_{j}(s,a)=p^{\textit{Col}}(s,a)+p^{\textit{MPC}}(s,s_{\mathcal{N}_{i}},a)$, where the collision penalty $p^{\textit{Col}}(s,a)$ penalizes collisions and $p^{\textit{MPC}}(s,s_{\mathcal{N}_{i}},a)$ penalizes infeasibility of the low-level controller.
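A minimal sketch of computing the per-agent reward in (1); the weight values, penalty magnitudes, and data layout below are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def agent_reward(vehicles, mu_v, mu_l, mu_s, collided, mpc_infeasible,
                 p_col=-50.0, p_mpc=-5.0):
    """Per-agent reward following the structure of Eq. (1).
    vehicles: dict j -> {'v': velocity, 'l': location, 'd': destination};
    mu_v, mu_l, mu_s: dicts of non-negative weights; collided / mpc_infeasible:
    dicts of booleans feeding the safety reward r_j^s = p_col + p_mpc terms."""
    reward = 0.0
    for j, veh in vehicles.items():
        reward += mu_v[j] * np.linalg.norm(veh["v"])                         # speed term
        reward += mu_l[j] * np.linalg.norm(np.asarray(veh["l"]) - np.asarray(veh["d"]))  # destination term as written in (1)
        r_safe = (p_col if collided[j] else 0.0) + (p_mpc if mpc_infeasible[j] else 0.0)
        reward += mu_s[j] * r_safe                                           # safety term
    return reward
```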

IV Methodology

Figure 2: Safe-RMM algorithm. The figure demonstrates one agent's decision pipeline; the other agents share the same procedure. During training, both the value network and the worst-Q network contribute to the update of the actor's policy. During testing, agent $i$ feeds states with uncertainty to its actor and samples the high-level action $a_{i}$, which is subsequently handled by the robust MPC controller for path planning and generating the safe control $\boldsymbol{u}_{i}$.

In this section, we present Safe-RMM, our hierarchical decision-making framework. We begin by presenting the design of our robust MARL algorithm and subsequently present the details of the MPC controller using robust CBFs. Our robust MARL algorithm augments MAPPO [10] such that each PPO agent is equipped with a worst-case Q network [9]. The worst-case Q estimates the potential impact of state perturbations on the policy's action selection and the resulting expected return. By incorporating it into the policy's training objective, we enhance the robustness of the trained policy against state perturbations. Consequently, the proposed algorithm improves both the safety and efficiency of CAVs under state uncertainties. We present our proposed safe MARL algorithm, Safe-RMM, in Algorithm 1.

1  Initialize policy, PPO critic, and worst-case Q networks $\boldsymbol{\theta}_{i}^{0},\boldsymbol{\phi}_{i}^{0},\boldsymbol{\omega}_{i}^{0}$; worst-Q weight $\boldsymbol{\lambda}_{i}^{0}=0$
2  for each episode $E$ do
3      Initialize $s=\prod_{i}s_{i}\in\mathcal{S}$, $\mathcal{A}=\prod_{i}\mathcal{A}_{i}$
4      Initialize centralized memory $M=\emptyset$
5      Rollout$(s,\mathcal{A})$: for each step, each agent $i$ do
6          Choose $a_{i}\in\mathcal{A}_{i}$ based on $\epsilon$-greedy, $a=\prod a_{i}$
7          Compute safe control and safety reward $\boldsymbol{u}_{i}^{\textit{safe}},p_{i}^{\textit{MPC}}=\textit{MPC}(s_{i},a_{i})$ (Sec. IV-B)
8          Execute $\boldsymbol{u}_{i}^{\textit{safe}}$, observe next state $s^{\prime}=\prod_{i}s^{\prime}_{i}$
9          Compute rewards $r=\{r_{i}\}$ according to (1); store $(s_{i},a_{i},r_{i},s^{\prime}_{i}),\forall i$, in $M$
10         $s\leftarrow s^{\prime}$
11     end for
12     Training: for each agent $i$ do
13         Compute PPO critic loss gradient $\nabla_{\boldsymbol{\phi}_{i}}(\mathcal{L}^{V}_{i})$ [31]
14         Compute worst-Q critic loss gradient $\nabla_{\boldsymbol{\omega}_{i}}(\mathcal{L}^{\underline{Q}}_{i})$ [9]
15         Update $\boldsymbol{\phi}_{i}^{E}\leftarrow\boldsymbol{\phi}_{i}^{E+1}$, $\boldsymbol{\omega}_{i}^{E}\leftarrow\boldsymbol{\omega}_{i}^{E+1}$
16         Compute policy loss gradient $\nabla_{\boldsymbol{\theta}_{i}}(\mathcal{L}_{i}(\boldsymbol{\theta}_{i}))$ (2)
17         Update $\boldsymbol{\theta}_{i}^{E}\leftarrow\boldsymbol{\theta}_{i}^{E+1}$, $\boldsymbol{\lambda}_{i}^{E}\leftarrow\boldsymbol{\lambda}_{i}^{E+1}$
18     end for
19 end for
Algorithm 1 Safe-RMM

IV-A Robust MAPPO

The proposed robust MARL algorithm (Alg. 1; Fig. 2) uses centralized training and decentralized execution. Inspired by the worst-case-aware robust RL framework [9], we design the robust MAPPO. Each robust PPO agent maintains a policy network $\pi^{\boldsymbol{\theta}_{i}}$ ("actor") with parameters $\boldsymbol{\theta}_{i}$, a value network ("critic") $V(s)$ with parameters $\boldsymbol{\phi}_{i}$, and a second critic network $\underline{Q}^{\boldsymbol{\omega}_{i}}(s_{i},a_{i})$ approximating the worst-case action values with parameters $\boldsymbol{\omega}_{i}$. In order to learn a safe cooperative policy for the CAVs, the MARL interacts with the MPC controller (Sec. IV-B) during the rollout process. As the algorithm starts, agent $i$'s policy takes the initial state $s_{i}$ and samples an action $a_{i}$; given $a_{i}$, the MPC controller with robust CBFs computes a safe control $\boldsymbol{u}_{i}$ for the vehicle to execute. The agent receives $r_{i}$ (1), and all agents synchronously move to the next time step by observing the new state $s^{\prime}=\prod_{i}s^{\prime}_{i}$.

During training, both critics contribute to updating the policy through the loss function (2), so that the trained policy can balance the goals of maximizing the expected advantage $\hat{A}_{i}$ and the worst-case return $\underline{Q}^{(\boldsymbol{\omega}_{i})}$. Through the value-based state regularization $\mathcal{L}^{reg}_{i}(\boldsymbol{\theta}_{i})$, the policy is trained to be robust at crucial "vulnerable" states around which uncertainties are more likely to affect the policy [32, 10].

$$\mathcal{L}_{i}(\boldsymbol{\theta}_{i})=\frac{1}{N}\sum_{t}^{N}\min\big(\rho_{\boldsymbol{\theta}_{i}},\operatorname{clip}(\rho_{\boldsymbol{\theta}_{i}})\big)\big(\hat{A}^{t}_{i}+\kappa\underline{Q}^{(\boldsymbol{\omega}_{i})t}\big)+\mathcal{L}^{reg}_{i}(\boldsymbol{\theta}_{i}) \qquad (2)$$
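The following sketch shows how the objective in (2) could be evaluated with PyTorch (a tool choice we assume here); the clipping threshold, the weight $\kappa$, and the handling of the regularization term are placeholders rather than the paper's exact settings.

```python
import torch

def robust_policy_loss(logp_new, logp_old, advantages, worst_q, reg_loss,
                       clip_eps=0.2, kappa=0.5):
    """Sketch of Eq. (2): clipped importance ratio applied to the sum of the advantage
    estimate and the weighted worst-case Q value, plus the state-regularization term.
    clip_eps and kappa are placeholder values, not the paper's settings."""
    ratio = torch.exp(logp_new - logp_old)                       # rho_theta
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    target = advantages + kappa * worst_q                        # A_hat^t + kappa * worst-case Q^t
    objective = (torch.min(ratio, clipped) * target).mean()      # (1/N) sum_t min(rho, clip(rho)) (.)
    # Gradient ascent on the objective is implemented as descent on its negation,
    # combined with the regularization loss L^reg.
    return -objective + reg_loss
```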

IV-B Robust CBF-based Model Predictive Control

We adopt receding horizon control to implement the low-level controller for every agent $i$ in the road network. The low-level controller maps the high-level plans/actions $a_{i}\sim\pi_{\boldsymbol{\theta}_{i}}$ into primitive actions/control inputs for agent $i$. First, a path planning function $z:\mathcal{A}\rightarrow\mathcal{X}_{i}\times\mathcal{U}_{i}$ is used to map the high-level plans/actions into state and action references, i.e., $(\boldsymbol{x}_{i}^{ref},\boldsymbol{u}_{i}^{ref})=z(\boldsymbol{x}_{i},a_{i})$, where $\mathcal{X}_{i}\subset\mathcal{S}_{i}$ and $\mathcal{U}_{i}$ are the state space and the space of primitive control inputs of CAV $i$, respectively. This information is then fed to the MPC controller. To prevent collisions between agents, safety constraints are incorporated into the controller using CBFs.
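A simplified sketch of the path-planning map $z$ from a high-level action to state/input references; the lane width, lookahead horizon, and throttle-to-acceleration conversion are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

LANE_WIDTH = 3.5   # assumed lane width (m)
LOOKAHEAD = 2.0    # assumed reference horizon (s)

def plan_reference(x, action, throttle_to_accel=None):
    """Map a high-level action to state/input references (x_ref, u_ref) for the MPC
    tracker. x = [x_pos, y_pos, psi, v]; action names follow Sec. III-B."""
    x_pos, y_pos, psi, v = x
    lateral = {"CHANGE-LANE-LEFT": +LANE_WIDTH,
               "CHANGE-LANE-RIGHT": -LANE_WIDTH}.get(action, 0.0)
    if action == "KEEP-LANE-MAX":
        a_ref = 3.0                              # placeholder maximum reference acceleration
    elif throttle_to_accel is not None:
        a_ref = throttle_to_accel(action)        # throttle-indexed lane-keeping actions
    else:
        a_ref = 0.0
    v_ref = v + a_ref * LOOKAHEAD
    x_ref = np.array([x_pos + v * LOOKAHEAD, y_pos + lateral, psi, v_ref])
    u_ref = np.array([a_ref, 0.0])               # [acceleration, steering angle]
    return x_ref, u_ref
```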

We consider that the dynamics of each vehicle are affine in its control input $\bm{u}$ as follows:

$$\boldsymbol{\dot{x}}=f(\boldsymbol{x})+g(\boldsymbol{x})\boldsymbol{u}$$

$$\underbrace{\begin{bmatrix}\dot{x}\\ \dot{y}\\ \dot{\psi}\\ \dot{v}\end{bmatrix}}_{\dot{\boldsymbol{x}}(t)}=\underbrace{\begin{bmatrix}v\cos\psi\\ v\sin\psi\\ 0\\ 0\end{bmatrix}}_{f(\boldsymbol{x}(t))}+\underbrace{\begin{bmatrix}0&0\\ 0&0\\ 0&v/(l_{f}+l_{r})\\ 1&0\end{bmatrix}}_{g(\boldsymbol{x}(t))}\underbrace{\begin{bmatrix}u\\ \phi\end{bmatrix}}_{\boldsymbol{u}(t)}, \qquad (3)$$

where $f$ and $g$ are locally Lipschitz, $\boldsymbol{x}\in\mathcal{X}\subset\mathcal{S}$ denotes the state vector, and $\bm{u}\in\mathcal{U}=[\boldsymbol{u}_{min},\boldsymbol{u}_{max}]$ denotes the control input. In the above equations, $x(t),y(t),\psi(t),v(t)$ represent the current longitudinal position, lateral position, heading angle, and speed, respectively. $u(t)$ and $\phi(t)$ are the acceleration and steering angle of the vehicle at time $t$, respectively, and $g(\boldsymbol{x}(t))=[g_{u}(\boldsymbol{x}(t)),g_{\phi}(\boldsymbol{x}(t))]$.
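A direct numerical transcription of the dynamics (3), together with a simple forward-Euler step for simulation; the axle distances and the discretization are our assumptions.

```python
import numpy as np

L_F, L_R = 1.4, 1.4   # assumed distances from the center of gravity to the front/rear axles (m)

def dynamics(x, u):
    """Continuous-time bicycle model of Eq. (3).
    x = [x, y, psi, v]; u = [acceleration, steering angle]."""
    _, _, psi, v = x
    f = np.array([v * np.cos(psi), v * np.sin(psi), 0.0, 0.0])
    g = np.array([[0.0, 0.0],
                  [0.0, 0.0],
                  [0.0, v / (L_F + L_R)],
                  [1.0, 0.0]])
    return f + g @ np.asarray(u, dtype=float)

def step(x, u, dt=0.05):
    """Forward-Euler discretization (our choice) for simulation and MPC prediction."""
    return np.asarray(x, dtype=float) + dt * dynamics(x, u)
```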

We incorporate safety with respect to other vehicles (primarily unconnected vehicles) as follows. Suppose CAV $i$ needs to remain safe with respect to a vehicle $j\in\{o_{i},o_{\mathcal{N}_{i}},o_{\mathcal{N}_{i}^{UV}}\}$ in its vicinity. To achieve this, we enforce a constraint on vehicle $i$ by defining a speed-dependent ellipsoidal safe region $b(\boldsymbol{x}_{i},\boldsymbol{x}_{j})$ as follows:

$$b(\boldsymbol{x}_{i},\boldsymbol{x}_{j}):=\frac{(x_{i}(t)-x_{j}(t))^{2}}{(a_{i}v_{i}(t))^{2}}+\frac{(y_{i}(t)-y_{j}(t))^{2}}{(b_{i}v_{i}(t))^{2}}-1\geq 0, \qquad (4)$$

where $a_{i}$, $b_{i}$ are weights adjusting the lengths of the major and minor axes of the ellipse, as illustrated in Fig. 3. We enforce safety constraints on any CAV $i$ with respect to vehicles in $\{o_{i},o_{\mathcal{N}_{i}},o_{\mathcal{N}_{i}^{UV}}\}$ in one of three scenarios: (i) immediately preceding in the same lane, (ii) located in the lane the ego vehicle is changing to, and (iii) arriving from another lane that merges into the lane of the ego vehicle.
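The speed-dependent ellipsoidal barrier (4) in code; the axis weights and the small-speed guard are illustrative choices.

```python
import numpy as np

def ellipsoid_cbf(x_i, x_j, a_i=0.5, b_i=0.25, v_min=1.0):
    """Barrier b(x_i, x_j) of Eq. (4): b >= 0 means vehicle j lies outside the
    speed-dependent safety ellipse of CAV i. States are [x, y, psi, v]."""
    v = max(x_i[3], v_min)   # guard against division by zero at standstill (our addition)
    dx, dy = x_i[0] - x_j[0], x_i[1] - x_j[1]
    return dx**2 / (a_i * v)**2 + dy**2 / (b_i * v)**2 - 1.0
```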

Given a continuously differentiable function $b:\mathbb{R}^{n}\rightarrow\mathbb{R}$ and the safe set defined as $C:=\{\boldsymbol{x}\in\mathbb{R}^{n}:b(\boldsymbol{x})\geq 0\}$, $b(\boldsymbol{x})$ is a candidate control barrier function (CBF) for the system (3) if there exist a class $\mathcal{K}$ function $\alpha$ and a control $\boldsymbol{u}$ such that

$$L_{f}b(\boldsymbol{x})+L_{g}b(\boldsymbol{x})\boldsymbol{u}+\alpha(b(\boldsymbol{x}))\geq 0, \qquad (5)$$

for all $\boldsymbol{x}\in C$, where $L_{f},L_{g}$ denote the Lie derivatives along $f$ and $g$, respectively. Additionally, we use CLFs to incorporate state references into the controller. A continuously differentiable function $V:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a globally and exponentially stabilizing CLF for (3) if there exist constants $c_{i}\in\mathbb{R}_{>0}$, $i=1,2$, such that $c_{1}\|\bm{x}\|^{2}\leq V(\bm{x})\leq c_{2}\|\bm{x}\|^{2}$, $\boldsymbol{u}\in\mathcal{U}$, and the following inequality holds

$$L_{f}V(\boldsymbol{x})+L_{g}V(\boldsymbol{x})\boldsymbol{u}+\eta(\boldsymbol{x})\leq e, \qquad (6)$$

where ee makes this a soft constraint.
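A sketch of numerically evaluating the CBF condition (5) and the CLF condition (6); the finite-difference Lie derivatives and the linear class-$\mathcal{K}$ choice $\alpha(b)=\gamma b$ are our simplifications.

```python
import numpy as np

def grad(fun, x, h=1e-5):
    """Central finite-difference gradient of a scalar function of the state."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return g

def cbf_constraint(b, f, g_mat, x, u, gamma=1.0):
    """Left-hand side of Eq. (5) with alpha(b) = gamma * b:
    L_f b(x) + L_g b(x) u + gamma * b(x), required to be >= 0."""
    db = grad(b, x)
    return db @ f(x) + db @ g_mat(x) @ np.asarray(u) + gamma * b(x)

def clf_constraint(V, f, g_mat, x, u, eta):
    """Left-hand side of Eq. (6) without the slack e:
    L_f V(x) + L_g V(x) u + eta(x), required to be <= e."""
    dV = grad(V, x)
    return dV @ f(x) + dV @ g_mat(x) @ np.asarray(u) + eta(x)
```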

The uncertain state measurement, stemming from process noise and measurement noise and denoted by $\hat{\bm{x}}(t)$, is expressed as follows:

$$\hat{\bm{x}}(t)=\bm{x}(t)+\bm{w}(t) \qquad (7)$$

where $\bm{w}(t)$ is bounded noise such that $\|\bm{w}(t)\|_{\infty}\leq\epsilon$. In the presence of noise, the robust CBF constraint becomes the following, which has been shown in [33] to render the safe set $C$ forward invariant:

$$\min_{\{\bm{w}(t):\|\bm{w}(t)\|_{\infty}\leq\epsilon\}}\Big[L_{f}b\big(\boldsymbol{\hat{x}}(t)-\bm{w}(t)\big)+L_{g}b\big(\boldsymbol{\hat{x}}(t)-\bm{w}(t)\big)\boldsymbol{u}(t)+\alpha\big(b(\boldsymbol{\hat{x}}(t)-\bm{w}(t))\big)\Big]\geq 0 \qquad (8)$$
Figure 3: Illustration of the ellipsoidal safety set.
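Because the noise in (7) is $\infty$-norm bounded, the inner minimization in (8) can be approximated by evaluating the nominal CBF condition at the corners of the noise hypercube. The sketch below takes the nominal left-hand side (e.g., `cbf_constraint` from the earlier sketch) as a callable; the corner enumeration is our approximation, not the paper's exact treatment.

```python
import itertools
import numpy as np

def robust_cbf_constraint(cbf_lhs, x_hat, u, eps):
    """Approximate the worst case in Eq. (8): evaluate the nominal CBF left-hand side
    cbf_lhs(x, u) at every corner w of the box ||w||_inf <= eps, using the candidate
    true state x_hat - w, and return the minimum (required to be >= 0)."""
    x_hat = np.asarray(x_hat, dtype=float)
    worst = np.inf
    for signs in itertools.product([-eps, eps], repeat=len(x_hat)):
        x_candidate = x_hat - np.asarray(signs)
        worst = min(worst, cbf_lhs(x_candidate, u))
    return worst
```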

Finally, the MPC control problem with robust CBF and CLF constraints can be expressed as follows:

$$\begin{aligned}
\min_{\substack{\boldsymbol{x}_{0:N|k}\\ \boldsymbol{u}_{0:N-1|k}}}\ & \sum_{h=0}^{N-1}\Big((\boldsymbol{x}_{h|k}-\bm{x}_{h|k}^{ref})^{T}A_{x}(\boldsymbol{x}_{h|k}-\bm{x}_{h|k}^{ref})+(\boldsymbol{u}_{h|k}-\bm{u}_{h|k}^{ref})^{T}A_{u}(\boldsymbol{u}_{h|k}-\bm{u}_{h|k}^{ref})+\bm{B}^{T}\begin{bmatrix}\boldsymbol{x}_{h|k}\\ \boldsymbol{u}_{h|k}\end{bmatrix}+\bm{\delta}^{T}\bm{e}_{h|k}^{2}\Big)+V_{N}(\boldsymbol{x}_{N|k})\\
\text{subject to}\ & \boldsymbol{x}_{h+1|k}=f(\boldsymbol{x}_{h|k},\boldsymbol{u}_{h|k}), && h=0,\dots,N-1,\\
& \text{(5), (6)}, && h=0,\dots,N-1,\\
& \boldsymbol{u}_{h|k}\in\mathcal{U},\ \boldsymbol{x}_{h|k}\in\mathcal{X},\ \boldsymbol{x}_{N}\in\mathcal{X}_{f}, && h=0,\dots,N-1,
\end{aligned}$$

where $A_{x}\in\mathbb{R}^{n\times n}$ and $A_{u}\in\mathbb{R}^{q\times q}$ are weight matrices, $\bm{B}\in\mathbb{R}^{n+q}$ is a weight vector, and $\bm{\delta}$ is the vector of weights of the penalty terms associated with the relaxation parameters of the CLF constraints.
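For intuition, the receding-horizon problem above can be prototyped as a generic nonlinear program over a flattened control sequence, here with SciPy; the weights, horizon, omission of the linear and CLF-slack cost terms, and the choice of solver are illustrative simplifications rather than the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

def solve_mpc(x0, x_ref, u_ref, step, cbf_lhs, N=10,
              A_x=np.diag([1.0, 1.0, 0.1, 1.0]), A_u=np.diag([0.1, 0.1]),
              u_min=np.array([-5.0, -0.5]), u_max=np.array([3.0, 0.5])):
    """Receding-horizon tracking with CBF safety imposed as inequality constraints.
    step(x, u): one-step discretized model; cbf_lhs(x, u): value required to be >= 0;
    x_ref, u_ref: reference arrays of shapes (N, 4) and (N, 2)."""

    def rollout(u_flat):
        u_seq = u_flat.reshape(N, 2)
        xs, x = [], np.asarray(x0, dtype=float)
        for h in range(N):
            xs.append(x)
            x = step(x, u_seq[h])
        return np.array(xs), u_seq

    def cost(u_flat):
        xs, us = rollout(u_flat)
        dx, du = xs - x_ref, us - u_ref
        return float(np.sum((dx @ A_x) * dx) + np.sum((du @ A_u) * du))

    def safety(u_flat):
        xs, us = rollout(u_flat)
        return np.array([cbf_lhs(xs[h], us[h]) for h in range(N)])   # each entry >= 0

    u0 = np.tile(u_ref.mean(axis=0), N)
    bounds = list(zip(np.tile(u_min, N), np.tile(u_max, N)))
    res = minimize(cost, u0, bounds=bounds, method="SLSQP",
                   constraints=[{"type": "ineq", "fun": safety}])
    return res.x.reshape(N, 2)[0], res.success   # apply only the first control input
```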

V Experiments and Evaluations

We conduct our experiments in the CARLA simulator environment [34], where each vehicle is configured with onboard GPS and IMU sensors and a collision sensor that detects collisions with other objects. We consider two challenging daily-driving scenarios, at an Intersection (Fig. 4a) and on a Highway (Fig. 4b), where we spawn multiple CAVs and several HDVs randomly. We adopt three types of state uncertainties, used exclusively for testing: a random error $e^{\text{rand}}\sim\textit{U}(-3,3)$ ($\textit{U}$: uniform distribution); error_over_time, $\mathtt{ERR}^{T}$; and error_target_vehicles, $\mathtt{ERR}^{\mathcal{V}}$, the latter two imposing perturbations of consistent values on CAVs' states to affect their behavior patterns:

$$\mathtt{ERR}^{T}=\{(e^{t},e^{t})\,|\,e^{t}\sim\textit{U}(e^{0}-\tfrac{1}{2},e^{0}+\tfrac{1}{2}),\ \pm e^{0}\sim\textit{U}(4,5),\ t\in T\}$$
$$\mathtt{ERR}^{\mathcal{V}}=\{(e^{\nu},e^{\nu})\,|\,e^{\nu}\sim\textit{U}(e^{0}-\tfrac{1}{2},e^{0}+\tfrac{1}{2}),\ \pm e^{0}\sim\textit{U}(4,5),\ \nu\in\mathcal{V}\}$$

The former, $\mathtt{ERR}^{T}$, applies state errors to all vehicles during a time window $T\subset E$, while the latter, $\mathtt{ERR}^{\mathcal{V}}$, samples a random subset of vehicles $\mathcal{V}$ in each episode and adds uncertainties to how the vehicles $\nu\in\mathcal{V}$ are observed by others throughout the current episode.
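A sketch of how the three test-time perturbation types could be generated; the parameter ranges follow the definitions above, while the episode/vehicle bookkeeping is our own scaffolding.

```python
import numpy as np

rng = np.random.default_rng()

def sample_rand_error():
    """Per-step random error e_rand ~ U(-3, 3)."""
    return rng.uniform(-3.0, 3.0)

def sample_base_error():
    """Base magnitude |e0| ~ U(4, 5) with a random sign, shared within ERR^T / ERR^V."""
    return rng.choice([-1.0, 1.0]) * rng.uniform(4.0, 5.0)

def err_over_time(window):
    """ERR^T: every vehicle's observed state is perturbed during the step window T,
    with per-step values jittered around the shared base error e0."""
    e0 = sample_base_error()
    return {t: rng.uniform(e0 - 0.5, e0 + 0.5) for t in window}

def err_target_vehicles(vehicle_ids, num_targets):
    """ERR^V: a random subset of vehicles is perturbed for the whole episode."""
    e0 = sample_base_error()
    targets = rng.choice(vehicle_ids, size=num_targets, replace=False)
    return {v: rng.uniform(e0 - 0.5, e0 + 0.5) for v in targets}
```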

Intersection

Fig. 1 and Fig. 4a present snapshots of the Intersection scenario, with CAVs (green) passing through the intersection while HDVs (red) from opposite sides cross the box at the same time. The HDVs pose critical safety threats, as they could either hit or be hit by a CAV from the side when driving fast ($\approx 10\,m/s$). The CAVs aim to avoid collisions and reach their preset destinations after passing through the intersection.

Highway

Fig. 4b illustrates the scenario, where CAVs (green) are spawned behind HDVs (red) on a multi-lane highway. During training and evaluation, the HDVs keep to their lanes at random speeds in $[7,9]$ $m/s$, except for one randomly chosen HDV that simulates a stop-and-go scenario. The CAVs aim to avoid any collision and drive at the speed limit of the road to arrive at their destinations.

Figure 4: Intersection (a) and Highway (b) scenarios for testing. (4a) 3 CAVs and 2 HDVs participate (left); the CAVs could either dodge (middle) or collide with (right) the HDVs. (4b) 3 CAVs and 3 HDVs participate in a multi-lane Highway scenario, with one HDV braking suddenly.
Figure 5: Discounted Efficiency Returns during Training in Intersection.
Figure 6: Discounted Efficiency Returns during Training in Highway.

V-A Experiment Results

In this section, we highlight our method's performance in terms of safety guarantees and robustness against state uncertainty, as well as its generalizability to different driving scenarios. We demonstrate through ablation studies that incorporating the MPC with robust CBF-based controller improves the performance of the MARL algorithm. Additionally, we demonstrate that our proposed hierarchical approach with robust MARL also improves the MPC-CBF controller.

We trained three models: our Safe-RMM method, the "non-robust" Safe-MM, which also adopts our framework, and the MCP method, which adopts a MARL-PID controller with a CBF safety shield [8]. Each model is trained on Intersection and Highway, respectively, for 200 episodes. In evaluation, aside from the three trained models, we include "MP", MARL with a PID controller, as an example of a learning-based method without safety shielding, and we also implement the benchmark "RULE", which adopts a rule-based planner and a robust MPC controller. The "RULE" benchmark is implemented based on the method proposed in [35], which introduces a safety-guaranteed rule for managing vehicle merging on roadways; this rule ensures safe interactions between vehicles arriving from different roads converging at a common point. Methods are evaluated for 50 episodes in both scenarios under four uncertainty configurations: None (uncertainty-free), random error $e^{\text{rand}}$, and the two targeted errors $\mathtt{ERR}^{\mathcal{V}}$ and $\mathtt{ERR}^{T}$. Training results in Intersection are shown in Fig. 5; evaluations in both scenarios are presented in Table I. For each entry in the table, the left integer is the number of collisions that occurred during evaluation (over 50 episodes); the right number is the agents' mean discounted return, considering only the rewards related to velocity and goal achievement in (1). We highlight the top performance across all methods, i.e., the fewest collisions and the highest efficiency return.

TABLE I: Evaluation Results in Intersection and Highway (columns: uncertainty type)

Method | None | $e^{\text{rand}}$ | $\mathtt{ERR}^{\mathcal{V}}$ | $\mathtt{ERR}^{T}$

Intersection
Safe-RMM (1) | 0, 162.9 | 0, 161.4 | 0, 162.2 | 0, 161.8
Safe-MM (2) | 0, 157.9 | 0, 155.7 | 0, 155.9 | 0, 155.7
MCP (3) | 3, 65.7 | 2, 60.6 | 0, 66.2 | 2, 67.7
MP (4) | 33, 148.4 | 41, 149.1 | 36, 145.9 | 30, 139.0
RULE (5) | 2, 120.9 | 1, 113.9 | 3, 105.5 | 2, 112.3

Highway
Safe-RMM | 0, 162.0 | 0, 169.4 | 0, 166.4 | 0, 161.8
Safe-MM | 0, 161.3 | 0, 168.7 | 0, 168.7 | 0, 163.0
MCP | 0, 56.8 | 2, 55.8 | 1, 60.7 | 2, 58.4
MP | 35, 74.1 | 34, 74.5 | 38, 73.8 | 38, 74.5

  • (1) Safe-RMM: our method, Safe Robust MARL-MPC; (2) Safe-MM: Safe MARL-MPC adopting the same framework as (1) but trained without the worst-case Q. Benchmarks: (3) MCP: MARL-PID with CBF safety shield; (4) MP: MARL-PID without shielding; (5) RULE: rule-based planner with MPC controller.

  • Each entry contains (number of collisions, mean discounted efficiency return over episodes).

V-A1 Top Safety and Efficiency Achieved by the Framework

Our proposed MARL-MPC framework demonstrates top safety performance: both Safe-RMM and Safe-MM achieve zero collisions across all evaluation scenarios, even when subjected to uncertainties. Additionally, Safe-RMM and Safe-MM rank among the top two in terms of efficiency across all settings. Our proposed approach with the robust MPC controller enables the MARL to fully realize its potential, allowing its decisions to be executed accurately and to receive precise reward feedback. This interplay between the MARL policy and the MPC controller enhances policy training and demonstrates that ensuring safety in autonomous vehicles need not compromise efficiency.

In contrast, the MP baseline, which adopts MARL without safety shielding, experiences collisions in 60%-80% of evaluation episodes, as shown in Table I. This result highlights the limited safety awareness of pure learning-based approaches in complex driving scenarios. The MCP baseline, which incorporates a CBF-based safety shield, significantly reduces collisions to just 3% of episodes on average. However, this comes at the cost of efficiency due to more conservative behaviors. Specifically, MCP with the safety shield shows a 54% performance drop in the Intersection scenario and a 23% drop in the Highway scenario compared to the MP method. These results verify that, when applying learning-based algorithms in CAV systems, the policy and the controller (or the action actuator) are not independent. Our Safe-RMM, whose MARL is enabled by the accuracy and safety of the MPC controller, achieves the best overall performance, while the same MARL algorithm is hindered from reaching its optimum by the limited capability of the PID controller.

V-A2 Ablation Study with Rule-based Benchmark and Robustness

We conducted an ablation study by evaluating the rule-based benchmark "RULE" in the Intersection scenario. As shown in Table I, our Safe-RMM method outperforms "RULE" in both safety and efficiency metrics. Compared with the other benchmarks, the rule-based method offers a more "balanced" performance: it is significantly safer and less "reckless" than MP, while achieving comparable safety and much higher efficiency than MCP. However, in a safety-critical scenario like the Intersection, more precise decision-making is required for CAVs to ensure both safety and speed. The lack of adaptability and learning capability prevents rule-based approaches from achieving performance comparable to our approach in these complex scenarios.

From the results of the RULE benchmark in Table I, we observe a 6%-13% drop in efficiency when comparing episodes with and without state uncertainties. This supports our earlier assessment in Sec. II that rule-based methods, as high-level planners, generally lack the robustness of learning-based approaches. Furthermore, a comparison between Safe-RMM and its “non-robust” counterpart, Safe-MM, shows that Safe-RMM outperforms Safe-MM by approximately 4% in efficiency across all evaluation settings in Intersection. However, this advantage does not extend to the Highway scenario. Our analysis of both scenarios suggests that the uncertainties of HDVs come not only from tested errors, but also from the initial randomization of HDVs’ location and speed. In Intersection, even slight variations in vehicle states may require completely different optimal strategies by the CAVs – either seizing the opportunity to pass ahead of the HDVs, or yielding for safety at the cost of efficiency. Safe-RMM demonstrates greater robustness against these uncertainties and can effectively manage these “critical” moments, optimizing for higher expected returns with a worst-case consideration. However, in less safety-critical scenarios like Highway, the worst-case awareness of Safe-RMM can result in sub-optimality. In these cases, the algorithm may avoid taking greedier actions and have lower efficiency, even when such actions carry little risk.

VI Conclusion

In this work, we study the safe and robust planning and control problem for connected autonomous vehicles in common driving scenarios under state uncertainties. We propose the Safe-RMM algorithm for coordinated CAVs and validate the effectiveness of our method through experiments. MARL can raise the performance ceiling of the MPC controller in safety-critical scenarios, while the robust MPC controller can safely and accurately execute the actions from RL and reciprocally contributes to better-trained policies. The method achieves top safety and efficiency performance in evaluation and maintains robustness against the tested perturbations. Future work could consider optimizing the control policy in a mixed traffic scenario with both RL and rule-based intelligent agents.

References

  • [1] D. Martín-Sacristán, S. Roger, D. Garcia-Roger, J. F. Monserrat, P. Spapis, C. Zhou, and A. Kaloxylos, “Low-latency infrastructure-based cellular v2v communications for multi-operator environments with regional split,” IEEE Trans. Intell. Transp. Syst., vol. 22, no. 2, pp. 1052–1067, 2020.
  • [2] H. Mun, M. Seo, and D. H. Lee, “Secure privacy-preserving v2v communication in 5g-v2x supporting network slicing,” IEEE Trans. Intell. Transp. Syst., 2021.
  • [3] N. Buckman, A. Pierson, S. Karaman, and D. Rus, “Generating visibility-aware trajectories for cooperative and proactive motion planning,” pp. 3220–3226, 2020.
  • [4] A. Miller and K. Rim, “Cooperative perception and localization for cooperative driving,” pp. 1256–1262, 2020.
  • [5] S. Han, H. Wang, S. Su, Y. Shi, and F. Miao, “Stable and efficient shapley value-based reward reallocation for multi-agent reinforcement learning of autonomous vehicles,” pp. 8765–8771, 2022.
  • [6] J. Rios-Torres and A. A. Malikopoulos, “A survey on the coordination of connected and automated vehicles at intersections and merging at highway on-ramps,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 5, pp. 1066–1077, May 2017.
  • [7] J. Lee and B. Park, “Development and evaluation of a cooperative vehicle intersection control algorithm under the connected vehicles environment,” IEEE Trans. Intell. Transp. Syst., vol. 13, no. 1, pp. 81–90, March 2012.
  • [8] Z. Zhang, S. Han, J. Wang, and F. Miao, “Spatial-temporal-aware safe multi-agent reinforcement learning of connected autonomous vehicles in challenging scenarios,” pp. 5574–5580, 2023.
  • [9] Y. Liang, Y. Sun, R. Zheng, and F. Huang, “Efficient adversarial training without attacking: Worst-case-aware robust reinforcement learning,” Advances in Neural Information Processing Systems, vol. 35, pp. 22 547–22 561, 2022.
  • [10] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, “The surprising effectiveness of ppo in cooperative multi-agent games,” Advances in Neural Information Processing Systems, vol. 35, pp. 24 611–24 624, 2022.
  • [11] L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig, “Safe learning in robotics: From learning-based control to safe reinforcement learning,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 5, pp. 411–444, 2022.
  • [12] I. ElSayed-Aly, S. Bharadwaj, C. Amato, R. Ehlers, U. Topcu, and L. Feng, “Safe multi-agent reinforcement learning via shielding,” in Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, ser. AAMAS ’21.   Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems, 2021, p. 483–491.
  • [13] Z. Cai, H. Cao, W. Lu, L. Zhang, and H. Xiong, “Safe multi-agent reinforcement learning through decentralized multiple control barrier functions,” 2021.
  • [14] L. Wen, J. Duan, S. E. Li, S. Xu, and H. Peng, “Safe reinforcement learning for autonomous vehicles through parallel constrained policy optimization,” pp. 1–7, 2020.
  • [15] S. Lu, K. Zhang, T. Chen, T. Başar, and L. Horesh, “Decentralized policy gradient descent ascent for safe multi-agent reinforcement learning,” vol. 35, no. 10, pp. 8767–8775, 2021.
  • [16] S. Gu, J. Grudzien Kuba, Y. Chen, Y. Du, L. Yang, A. Knoll, and Y. Yang, “Safe multi-agent reinforcement learning for multi-robot control,” Artificial Intelligence, vol. 319, p. 103905, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0004370223000516
  • [17] Z. Wang, M. Fang, T. Tomilin, F. Fang, and Y. Du, “Safe multi-agent reinforcement learning with natural language constraints,” 2024. [Online]. Available: https://arxiv.org/abs/2405.20018
  • [18] J. Wang, S. Yang, Z. An, S. Han, Z. Zhang, R. Mangharam, M. Ma, and F. Miao, “Multi-agent reinforcement learning guided by signal temporal logic specifications,” arXiv preprint arXiv:2306.06808, 2023.
  • [19] S. Han, S. Zhou, J. Wang, L. Pepin, C. Ding, J. Fu, and F. Miao, “A multi-agent reinforcement learning approach for safe and efficient behavior planning of connected autonomous vehicles,” arXiv:2003.04371, 2022.
  • [20] S. Han, S. Su, S. He, S. Han, H. Yang, and F. Miao, “What is the solution for state adversarial multi-agent reinforcement learning?” arXiv preprint arXiv:2212.02705, 2022.
  • [21] S. He, S. Han, S. Su, S. Han, S. Zou, and F. Miao, “Robust multi-agent reinforcement learning with state uncertainty,” Transactions on Machine Learning Research, 2023.
  • [22] E. Salvato, G. Fenu, E. Medvet, and F. A. Pellegrino, “Crossing the reality gap: A survey on sim-to-real transferability of robot controllers in reinforcement learning,” IEEE Access, vol. 9, pp. 153 171–153 187, 2021.
  • [23] L. Pinto, J. Davidson, R. Sukthankar, and A. Gupta, “Robust adversarial reinforcement learning,” in Proceedings of the 34th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. Teh, Eds., vol. 70.   PMLR, 06–11 Aug 2017, pp. 2817–2826. [Online]. Available: https://proceedings.mlr.press/v70/pinto17a.html
  • [24] M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in empirical observations and microscopic simulations,” Physical Review E, vol. 62, pp. 1805–1824, 02 2000.
  • [25] A. Kesting, M. Treiber, and D. Helbing, “General lane-changing model mobil for car-following models,” Transportation Research Record, vol. 1999, no. 1, pp. 86–94, 2007. [Online]. Available: https://doi.org/10.3141/1999-10
  • [26] C. R. Munigety, “Modelling behavioural interactions of drivers’ in mixed traffic conditions,” Journal of Traffic and Transportation Engineering (English Edition), vol. 5, no. 4, pp. 284–295, 2018.
  • [27] J. J. Olstam and A. Tapani, “Comparison of car-following models,” 2004. [Online]. Available: https://api.semanticscholar.org/CorpusID:15720655
  • [28] B. T. Lopez, J.-J. E. Slotine, and J. P. How, “Dynamic tube mpc for nonlinear systems,” in 2019 American Control Conference (ACC).   IEEE, 2019, pp. 1655–1662.
  • [29] D. Q. Mayne and E. C. Kerrigan, “Tube-based robust nonlinear model predictive control,” IFAC Proceedings Volumes, vol. 40, no. 12, pp. 36–41, 2007, 7th IFAC Symposium on Nonlinear Control Systems. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1474667016354994
  • [30] D. M. Raimondo, D. Limon, M. Lazar, L. Magni, and E. F. Camacho, “Min-max model predictive control of nonlinear systems: A unifying overview on stability,” European Journal of Control, vol. 15, no. 1, pp. 5–21, 2009. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0947358009707034
  • [31] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
  • [32] H. Zhang, H. Chen, C. Xiao, B. Li, M. Liu, D. Boning, and C.-J. Hsieh, “Robust deep reinforcement learning against adversarial perturbations on state observations,” Advances in Neural Information Processing Systems, vol. 33, pp. 21 024–21 037, 2020.
  • [33] H. M. S. Ahmad, E. Sabouni, A. Dickson, W. Xiao, C. G. Cassandras, and W. Li, “Secure control of connected and automated vehicles using trust-aware robust event-triggered control barrier functions,” 2024. [Online]. Available: https://arxiv.org/abs/2401.02306
  • [34] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” pp. 1–16, 2017.
  • [35] E. Sabouni, H. S. Ahmad, C. G. Cassandras, and W. Li, “Merging control in mixed traffic with safety guarantees: A safe sequencing policy with optimal motion control,” in 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), 2023, pp. 4260–4265.