Multi-Adversarial Safety Analysis for Autonomous Vehicles

Gilbert Bahati
Civil and Environmental Engineering
University of California - Berkeley
[email protected]

Marsalis Gibson
Electrical Engineering and Computer Science
University of California - Berkeley
[email protected]

Alexandre Bayen
Institute of Transportation Studies
University of California - Berkeley
[email protected]

Abstract

This work in progress considers reachability-based safety analysis for autonomous driving in multi-agent systems. We formulate the safety problem for a car-following scenario as a differential game and study how different modeling strategies yield very different behaviors, regardless of the validity of those strategies in other scenarios. Given the nature of real-life driving scenarios, we propose a modeling strategy that accounts for subtle interactions between agents, and compare its Hamiltonian results to other baselines. Our formulation encourages reducing the conservativeness of Hamilton-Jacobi safety analysis to provide better safety guarantees during navigation.

I Introduction

If autonomous vehicles are to serve as traffic management systems [5], safe navigation around human-driven vehicles on highways and in cities is crucial. However, safe navigation can be difficult to provide because the considerable uncertainty in real driving scenarios complicates the driving problem. Typically, Hamilton-Jacobi (HJ) reachability analysis can be used to find safe strategies around unknown components of a dynamical system [1]. Previous work [2] developed a framework that uses Hamilton-Jacobi reachability to protect a system against one known source of uncertainty, with the goal of protecting the system from the worst-case scenario. However, in real driving scenarios, it may be necessary to consider multiple sources of uncertainty. As depicted in Figures 1 and 2, assuming the extreme worst case may never yield a feasible safety strategy, and establishing safety may then be impossible.

Figure: (a) Two-player scenario (2D state system), Fig. 2(a).

Therefore, in this work, we study how to reduce conservativeness in Hamilton-Jacobi safety analysis by introducing structure into some or all of the human models. Specifically, we study a modeling strategy for the second disturbance that takes advantage of the structure of human behavior in a way that allows us to use differential game theory in denser dynamic driving environments.

II System dynamics

We consider a dynamical system with state $z\in\mathbb{R}^{n}$ , and three inputs, $u\in\mathcal{U}\subset\mathbb{R}^{n_{u}}$ , $d_{1}\in\mathcal{D}_{1}\subset\mathbb{R}^{n_{d}}$ , $d_{2}\in\mathcal{D}_{2}\subset\mathbb{R}^{n_{d}}$ , which we refer to as the controls, disturbance 1, and disturbance 2 respectively. Our system dynamics are generally defined as:

\dot{z}=f(z,u,d_{1},d_{2})

(1)

Disturbances 1 and 2 represent the uncertainty around the leading and following human vehicles, respectively. In our car-following scenario in Figure 3, the goal of the autonomous agent is to establish safety and remain between the other two players given their actions. Thus, the dynamics between all three vehicles can be described using the relative positions $x_{j}$, the relative speeds $v_{j}$, and the vehicle accelerations $u_{i}$ as follows:

\begin{bmatrix}\dot{x}_{g_{1}}\\ \dot{v}_{g_{1}}\\ \dot{x}_{g_{2}}\\ \dot{v}_{g_{2}}\end{bmatrix}=\begin{bmatrix}v_{g_{1}}\\ u_{1}-u_{2}\\ v_{g_{2}}\\ u_{2}-u_{3}\end{bmatrix}

(2)

s.t.

u_{i}\in[a_{min},a_{max}],\ \forall\ i=1,\dots,3

x_{j}>0,\ \forall\ j=g_{1},g_{2}
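As a minimal sketch of eq. (2), the relative dynamics can be written as a function of the state $z=[x_{g_{1}},v_{g_{1}},x_{g_{2}},v_{g_{2}}]$ and the three accelerations. The bound values, the function names, and the forward-Euler integrator are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

A_MIN, A_MAX = -2.0, 2.0  # assumed acceleration bounds, for illustration only


def relative_dynamics(z, u1, u2, u3):
    """Right-hand side of eq. (2) for z = [x_g1, v_g1, x_g2, v_g2]."""
    _, v_g1, _, v_g2 = z
    return np.array([v_g1, u1 - u2, v_g2, u2 - u3])


def euler_step(z, u1, u2, u3, dt=0.1):
    """One forward-Euler step with accelerations clipped to [A_MIN, A_MAX]."""
    u1, u2, u3 = np.clip([u1, u2, u3], A_MIN, A_MAX)
    return np.asarray(z, dtype=float) + dt * relative_dynamics(z, u1, u2, u3)
```

For example, `euler_step([10, 0, 10, 0], -2, 0, 2)` shrinks both relative speeds by 0.2 while leaving the gaps unchanged after one step, since the gaps only respond to the relative speeds.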

In Section III, we discuss how we choose our uncertainty and how we pose our safety problem.

III Three-Player Differential Game

The safety problem is posed as a differential game between three players, where the system controller, $u$, plays against two adversaries, $d_{1}$ and $d_{2}$, which together represent the system's uncertainty. To obtain a safe policy for the system, we choose a function, $l(\textbf{z})$, that assigns a safety value to the current state $\textbf{z}$, and formulate a game whose outcome is given by the function $\mathcal{V}:\mathbb{R}^{n}\times\mathcal{U}\times\mathcal{D}_{1}\times\mathcal{D}_{2}\rightarrow\mathbb{R}$. $\mathcal{V}$ assigns to each initial state $\textbf{z}$ and set of player strategies $u(\cdot)$, $d_{1}(\cdot)$, $d_{2}(\cdot)$ the lowest value of $l(\cdot)$ ever achieved by the trajectory $\xi^{\textbf{z},u}_{d_{1},d_{2}}(\cdot)$ starting from state $\textbf{z}$.

\mathcal{V}(\textbf{z},u(\cdot),d_{1}(\cdot),d_{2}(\cdot))=\inf_{t\geq 0}l(\xi^{\textbf{z},u}_{d_{1},d_{2}}(t))

(3)
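A discretized approximation of eq. (3) may make the outcome functional concrete: simulate a trajectory under given open-loop accelerations and record the lowest safety value seen. The choice $l(z)=\min(x_{g_{1}},x_{g_{2}})$, the finite horizon, and the Euler integration are assumptions made for this sketch only; the paper does not specify them.

```python
import numpy as np


def safety_margin(z):
    """Hypothetical l(z): the smaller of the two gaps (non-positive means collision)."""
    x_g1, _, x_g2, _ = z
    return min(x_g1, x_g2)


def trajectory_outcome(z0, accelerations, dt=0.1):
    """Approximate eq. (3): lowest l(z) along a sampled trajectory.

    `accelerations` is a sequence of (u1, u2, u3) tuples, one per time step.
    """
    z = np.asarray(z0, dtype=float)
    worst = safety_margin(z)
    for u1, u2, u3 in accelerations:
        dz = np.array([z[1], u1 - u2, z[3], u2 - u3])  # dynamics of eq. (2)
        z = z + dt * dz
        worst = min(worst, safety_margin(z))
    return worst
```

The infimum over $t\geq 0$ is replaced here by a minimum over finitely many samples, so the result is only an upper bound on the true outcome.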

The goal of the system is to maximize the objective, while the goal of the active adversaries is to minimize it. (Technically, as in [3], we restrict each disturbance to a set of nonanticipative strategies. Therefore, $d_{1}(\cdot)$ and $d_{2}(\cdot)$ in eq. 4 are actually maps, $\beta_{1}[u(\cdot)](\cdot)$ and $\beta_{2}[u(\cdot)](\cdot)$, that respectively map our control input to the corresponding disturbance input.) Thus, the game formulation that we want to solve is:

V(\textbf{z})=\inf_{d_{1},d_{2}}\sup_{u}\mathcal{V}(\textbf{z},u(\cdot),d_{1}(\cdot),d_{2}(\cdot))

(4)

III-A Player Strategies

We formulate the uncertainties $d_{1}$ and $d_{2}$ around the two human drivers' actions, $u_{1}$ and $u_{3}$, to represent behaviors that try to perturb the autonomous system. More specifically:

1. First, we consider a baseline assignment:
   • $d_{1}=u_{1}$ where $u_{1}\in[a_{min},a_{max}]$
   • $d_{2}=u_{3}$ where $u_{3}\in[a_{min},a_{max}]$
2. Then, we consider an alternative assignment for $d_{2}$ by taking advantage of the structure of human driving and modeling $u_{3}$ using a car-following model $g(z,d_{2})$:
   • $d_{1}=u_{1}$ where $u_{1}\in[a_{min},a_{max}]$
   • $u_{3}=g(z,d_{2})$

In our second strategy, $d_{2}$ captures psycho-physiological characteristics of human driving [4]. Additionally, we ensure that the values of $u_{3}$ stay within realistic bounds for all possible actions $u_{2}$ of the autonomous agent. This modeling strategy relaxes the unrealistic extremes of the previous dynamic game formulation and implicitly models interaction effects between agents for realistic safety. We model the following vehicle's driving behavior using the Intelligent Driver Model (IDM), a car-following model, and explicitly model $d_{2}$ as the safe reaction time, $T$, as follows:

g(z,T)=a\left(1-\left(\frac{v_{3}}{v_{0}}\right)^{\delta}-\left(\frac{s^{*}(z,T)}{x_{g_{2}}}\right)^{2}\right)

(5)

s^{*}(z,T)=s_{0}+\max\left(0,\ v_{3}T+\frac{v_{3}(-v_{g_{2}})}{2\sqrt{ab}}\right)

(6)

where:
$T$: safe reaction time (i.e. $d_{2}=T\in[0,T_{max}]$)
$s^{*}(z,T)$: desired headway of the following vehicle
$s_{0}$: minimum desired headway (i.e. $s_{0}=0$ to allow crashes)
$a,b$: maximum acceleration and deceleration, respectively
$\delta,v_{0}$: acceleration exponent (usually 4) and desired velocity, respectively
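Eqs. (5) and (6) translate directly into code. The sketch below is illustrative: all default parameter values are hypothetical, and the follower's absolute speed $v_{3}$ is passed as a separate argument because it is not part of the relative state in eq. (2).

```python
import math


def desired_headway(v3, v_g2, T, s0=0.0, a=1.5, b=1.5):
    """Eq. (6): desired headway s*(z, T) of the following vehicle."""
    return s0 + max(0.0, v3 * T + v3 * (-v_g2) / (2.0 * math.sqrt(a * b)))


def idm_acceleration(v3, x_g2, v_g2, T, v0=30.0, delta=4, s0=0.0, a=1.5, b=1.5):
    """Eq. (5): follower acceleration u3 = g(z, T)."""
    s_star = desired_headway(v3, v_g2, T, s0, a, b)
    return a * (1.0 - (v3 / v0) ** delta - (s_star / x_g2) ** 2)
```

With these assumed parameters, a larger reaction time $T$ increases the desired headway and therefore lowers the follower's acceleration, which is how the disturbance $d_{2}=T$ influences $u_{3}$.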

III-B Resulting policies

The optimal control strategy, $u_{2}^{*}$, and the optimal disturbance strategy for the first human vehicle, $d_{1}^{*}$, are obtained from (4) by optimizing the Hamiltonian:

\displaystyle u_{2}^{*}=\begin{cases}a_{max}&\mbox{if }(p_{4}-p_{2})>0\\ a_{min}&\mbox{else }\end{cases}

(7)

\displaystyle d_{1}^{*}=\begin{cases}a_{max}&\mbox{if }p_{2}<0\\ a_{min}&\mbox{else }\end{cases}

(8)

and the optimal disturbance strategy for the second human vehicle is likewise calculated to be:

\displaystyle\text{Baseline }d_{2}^{*}=\begin{cases}a_{max}&\mbox{if }p_{4}>0\\ a_{min}&\mbox{else }\end{cases}

(9)

\displaystyle\text{Alternative }d_{2}^{*}=\begin{cases}\min(T_{max},\max(T_{min},\frac{-v_{g_{2}}}{2\sqrt{ab}}))&\mbox{if }p_{4}>0\\ \displaystyle\max_{T}|2\sqrt{ab}T+v_{g_{2}}|&\mbox{else }\end{cases}

(10)

where $p_{2}=\nabla_{v_{g_{1}}}V$ and $p_{4}=\nabla_{v_{g_{2}}}V$.
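The bang-bang structure of eqs. (7) to (9) can be checked against the Hamiltonian $H=p\cdot f(z,u,d_{1},d_{2})$, with $f$ taken from eq. (2). The sketch below is illustrative: it assumes $p_{1}$ and $p_{3}$ denote the gradients of $V$ with respect to $x_{g_{1}}$ and $x_{g_{2}}$ (not defined in the text), and it uses the control and disturbance bounds quoted in Section IV.

```python
U_MIN, U_MAX = -2.0, 2.0  # control bounds, as quoted in Section IV
D_MIN, D_MAX = -1.5, 1.5  # disturbance bounds, as quoted in Section IV


def optimal_control(p2, p4):
    """Eq. (7): the autonomous vehicle maximizes the (p4 - p2) * u2 term."""
    return U_MAX if (p4 - p2) > 0 else U_MIN


def optimal_lead_disturbance(p2):
    """Eq. (8): the lead vehicle minimizes the p2 * u1 term."""
    return D_MAX if p2 < 0 else D_MIN


def baseline_follow_disturbance(p4):
    """Eq. (9): the following vehicle minimizes the -p4 * u3 term."""
    return D_MAX if p4 > 0 else D_MIN


def hamiltonian(p, z, u1, u2, u3):
    """H = p . f(z, u, d1, d2) with f taken from eq. (2)."""
    p1, p2, p3, p4 = p
    _, v_g1, _, v_g2 = z
    return p1 * v_g1 + p2 * (u1 - u2) + p3 * v_g2 + p4 * (u2 - u3)
```

Because $H$ is linear in each acceleration, each player's optimum sits at a bound determined only by the sign of its coefficient, which is why no interior solutions appear in the baseline policies.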

IV Results and Conclusion

As depicted in Figure 4, the alternative formulation for the second disturbance is able to uncover hidden safe strategies for the 3-car scenario. With the disturbance and control bounds set to [-1.5, 1.5] and [-2, 2] respectively, the baseline produces an empty blue set, meaning that there are no guaranteed safe states under this particular disturbance/control setting. However, by taking advantage of psycho-physiological characteristics and driver influences on the road, we find that a non-empty blue set does exist.

In conclusion, if we are to guarantee safety in the chaotic world of driving while considering worst-case uncertainty in human behavior, we may need to incorporate better information structures of human behavior into our analysis and update our assumptions as we uncover more knowledge about the system. However, by choosing a specific model structure to reduce the conservativeness of reachability analysis, we run the risk of failing to capture human behavior some of the time, due to the limitations of the chosen model (which captures only approximations). The performance of a chosen model will vary greatly depending on the particular type of driver and circumstance. Therefore, to tackle this trade-off, we further aim to incorporate real-time analysis and data-driven models to learn disturbances (and how they accurately evolve), and to maintain formal and robust safety guarantees using different learning strategies such as deep reinforcement learning.

Acknowledgments

This material is based upon work supported by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under the Vehicle Technologies Office award number CID DE-EE0008872. The views expressed herein do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

References

[1] Somil Bansal, Mo Chen, Sylvia Herbert, and Claire J Tomlin. Hamilton-Jacobi reachability: A brief overview and recent advances. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 2242–2253. IEEE, 2017.
[2] Jaime F Fisac, Anayo K Akametalu, Melanie N Zeilinger, Shahab Kaynama, Jeremy Gillula, and Claire J Tomlin. A general safety framework for learning-based control in uncertain robotic systems. IEEE Transactions on Automatic Control, 64(7):2737–2752, 2018.
[3] Ian M Mitchell and Jeremy A Templeton. A toolbox of Hamilton-Jacobi solvers for analysis of nondeterministic continuous and hybrid systems. In International Workshop on Hybrid Systems: Computation and Control, pages 480–494. Springer, 2005.
[4] M. Treiber and C. Thiemann. Traffic Flow Dynamics: Data, Models and Simulation. Springer-Verlag Berlin Heidelberg, Berlin, Germany, 2013.
[5] Cathy Wu, Aboudy Kreidieh, Kanaad Parvate, Eugene Vinitsky, and Alexandre M Bayen. Flow: Architecture and benchmarking for reinforcement learning in traffic control. arXiv preprint arXiv:1710.05465, 2017.