
Multi-Adversarial Safety Analysis for Autonomous Vehicles

Gilbert Bahati
Civil and Environmental Engineering
University of California - Berkeley
[email protected]

Marsalis Gibson
Electrical Engineering and Computer Science
University of California - Berkeley
[email protected]

Alexandre Bayen
Institute of Transportation Studies
University of California - Berkeley
[email protected]

Abstract

This work in progress considers reachability-based safety analysis for autonomous driving in multi-agent systems. We formulate the safety problem for a car-following scenario as a differential game and study how different modeling strategies yield very different behaviors, regardless of the validity of the strategies in other scenarios. Given the nature of real-life driving scenarios, we propose a modeling strategy that accounts for subtle interactions between agents, and compare its Hamiltonian results to other baselines. Our formulation encourages reduction of conservativeness in Hamilton-Jacobi safety analysis to provide better safety guarantees during navigation.

I Introduction

If autonomous vehicles are to serve as traffic management systems [5], safe navigation around human-driven vehicles on highways and in cities is crucial. However, safe navigation is difficult to provide because real driving scenarios contain considerable uncertainty that complicates the driving problem. Typically, Hamilton-Jacobi-Isaacs (HJI) reachability analysis can be used to find safe strategies around unknown components of a dynamical system [1]. Previous work [2] developed a framework that protects a system against one known source of uncertainty using Hamilton-Jacobi reachability, with the goal of protecting the system from the worst-case scenario. However, in real driving scenarios, it may be necessary to consider multiple sources of uncertainty. As depicted in figures 1 and 2, extreme worst-case assumptions may never yield a feasible safety strategy, in which case establishing safety may be impossible.

(a) Two-player scenario (2D state system) in fig 2(a)
(b) Three-player scenario (4D state system) in fig 2(b)
Figure 1: HJI reachability analysis [3] computations for fig 2: the constraint set (green) represents our state boundaries, and the reachable safe set (blue) remains within it, propagating inwards as t increases. To interpret this, a state that starts within the blue safe set is guaranteed to remain within the green constraint set for t seconds. For ease of visualization, 1(b) is a 3D slice of the 4D state system where the relative velocity with the follower is 0.
(a) Autonomous agent finds optimal strategy
(b) Autonomous agent fails to reason and find optimal strategy
Figure 2: The black car represents the autonomous agent, while the other two cars represent human drivers. In the two-car scenario in 2(a), we compute the safe reachable set under the assumption that the leading car acts adversarially. In this scenario, we obtain a feasible solution, see fig 1(a). However, this is not the case when a third human-driven car is introduced behind the autonomous agent in 2(b). The autonomous agent cannot find a safe reachable set, see fig 1(b), which means that there are no states or points in time at which the car is guaranteed to be safe if the two human vehicles choose to act adversarially.

Therefore, in this work, we study the reduction of conservativeness in Hamilton-Jacobi safety analysis by introducing structure into some or all of the human models. Specifically, we study a modeling strategy for the second disturbance that takes advantage of the structure of human behavior, allowing us to use differential game theory in denser dynamic driving environments.

II System dynamics

Figure 3: Car following scenario where the subject vehicle drives in between two human vehicles.

We consider a dynamical system with state $z\in\mathbb{R}^{n}$ and three inputs, $u\in\mathcal{U}\subset\mathbb{R}^{n_{u}}$, $d_{1}\in\mathcal{D}_{1}\subset\mathbb{R}^{n_{d}}$, and $d_{2}\in\mathcal{D}_{2}\subset\mathbb{R}^{n_{d}}$, which we refer to as the controls, disturbance 1, and disturbance 2 respectively. Our system dynamics are generally defined as:

$$\dot{z}=f(z,u,d_{1},d_{2}) \tag{1}$$

Disturbances 1 and 2 represent the uncertainty around the leading and following human vehicles respectively. In our car-following scenario in figure 3, the goal for the autonomous agent is to establish safety and remain in between the other two players given their actions. Thus, the dynamics between all three vehicles can be described using the relative positions $x_{j}$, the relative speeds $v_{j}$, and the vehicle accelerations $u_{i}$ (whose differences give the relative accelerations), as follows:

$$\begin{bmatrix}\dot{x}_{g_{1}}\\ \dot{v}_{g_{1}}\\ \dot{x}_{g_{2}}\\ \dot{v}_{g_{2}}\end{bmatrix}=\begin{bmatrix}v_{g_{1}}\\ u_{1}-u_{2}\\ v_{g_{2}}\\ u_{2}-u_{3}\end{bmatrix} \tag{2}$$

s.t.

$$u_{i}\in[a_{min},a_{max}],\ \forall\ i=1,\dots,3$$
$$x_{j}>0,\ \forall\ j=g_{1},g_{2}$$
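As a minimal sketch of how the relative dynamics in (2) could be evaluated numerically, the following Python snippet computes the right-hand side and takes one forward-Euler step. The numerical bounds, the step size, and the function names are illustrative assumptions and are not taken from the paper's implementation.

```python
# Illustrative sketch of the relative dynamics (2); all names are hypothetical.
A_MIN, A_MAX = -2.0, 2.0  # assumed acceleration bounds [a_min, a_max]


def clip(value, low, high):
    """Restrict value to the interval [low, high]."""
    return max(low, min(high, value))


def relative_dynamics(z, u1, u2, u3):
    """Right-hand side of (2) for z = (x_g1, v_g1, x_g2, v_g2)."""
    _, v_g1, _, v_g2 = z
    # Every acceleration is kept inside the admissible interval.
    u1, u2, u3 = (clip(u, A_MIN, A_MAX) for u in (u1, u2, u3))
    return (v_g1, u1 - u2, v_g2, u2 - u3)


def euler_step(z, u1, u2, u3, dt=0.1):
    """One forward-Euler step of the relative dynamics (assumed integrator)."""
    dz = relative_dynamics(z, u1, u2, u3)
    return tuple(zi + dt * dzi for zi, dzi in zip(z, dz))


def in_constraint_set(z):
    """State constraint from (2): both gaps must stay positive."""
    return z[0] > 0 and z[2] > 0
```

For example, `relative_dynamics((10.0, 0.0, 10.0, 0.0), 0.0, 1.0, 0.0)` shows that when only the subject vehicle accelerates, the front gap starts closing while the rear gap starts opening.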

In the next section, Section III, we discuss how we choose our uncertainty and how we pose our safety problem.

III Three-Player Differential Game

The safety problem is posed as a differential game between three players, where the system controller, $u$, plays against two adversaries, $d_{1}$ and $d_{2}$, also known as the system's uncertainty. To obtain a safe policy for the system, we choose a function, $l(\mathbf{z})$, that assigns a safety value to the current state, $\mathbf{z}$, and formulate a game whose outcome is given by the function $\mathcal{V}:\mathbb{R}^{n}\times\mathcal{U}\times\mathcal{D}_{1}\times\mathcal{D}_{2}\rightarrow\mathbb{R}$. $\mathcal{V}$ assigns to each initial state $\mathbf{z}$ and set of player strategies $u(\cdot)$, $d_{1}(\cdot)$, $d_{2}(\cdot)$ the lowest value of $l(\cdot)$ ever achieved by a trajectory $\xi^{\mathbf{z},u}_{d_{1},d_{2}}(\cdot)$ from state $\mathbf{z}$.

$$\mathcal{V}(\mathbf{z},u(\cdot),d_{1}(\cdot),d_{2}(\cdot))=\inf_{t\geq 0}l(\xi^{\mathbf{z},u}_{d_{1},d_{2}}(t)) \tag{3}$$
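To make the outcome in (3) concrete, the sketch below approximates it along a simulated trajectory. It assumes a hypothetical safety function $l(\mathbf{z})=\min(x_{g_{1}},x_{g_{2}})$ (the paper does not specify $l$), the baseline reading $d_{1}=u_{1}$, $d_{2}=u_{3}$, a finite horizon in place of $t\geq 0$, and forward-Euler integration; all function names are illustrative.

```python
# Illustrative approximation of (3); l(z), horizon, and integrator are assumptions.
def safety_margin(z):
    """Hypothetical l(z): the smaller of the two gaps (negative means collision)."""
    return min(z[0], z[2])


def step(z, u1, u2, u3, dt):
    """Forward-Euler step of the relative dynamics for z = (x_g1, v_g1, x_g2, v_g2)."""
    x_g1, v_g1, x_g2, v_g2 = z
    return (x_g1 + dt * v_g1, v_g1 + dt * (u1 - u2),
            x_g2 + dt * v_g2, v_g2 + dt * (u2 - u3))


def trajectory_value(z0, policy_u, policy_d1, policy_d2, horizon=10.0, dt=0.01):
    """Lowest l(z) seen along the simulated trajectory over a finite horizon."""
    z = tuple(z0)
    lowest = safety_margin(z)
    for k in range(int(round(horizon / dt))):
        t = k * dt
        # All players observe the same state before the step is taken.
        u1, u2, u3 = policy_d1(t, z), policy_u(t, z), policy_d2(t, z)
        z = step(z, u1, u2, u3, dt)
        lowest = min(lowest, safety_margin(z))
    return lowest
```

A negative return value indicates that, under the chosen strategies, the trajectory leaves the constraint set at some time within the horizon.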

The goal of the system is to maximize the objective, while the goal of the active adversaries is to minimize it. Technically, as in [3], we restrict each disturbance to a set of nonanticipative strategies; $d_{1}(\cdot)$ and $d_{2}(\cdot)$ in eq. 4 are therefore maps, $\beta_{1}[u(\cdot)](\cdot)$ and $\beta_{2}[u(\cdot)](\cdot)$, that each map our control input to the corresponding disturbance input. The game formulation that we want to solve is thus:

$$V(\mathbf{z})=\inf_{d_{1},d_{2}}\sup_{u}\mathcal{V}(\mathbf{z},u(\cdot),d_{1}(\cdot),d_{2}(\cdot)) \tag{4}$$

III-A Player Strategies

We formulate uncertainties $d_{1}$ and $d_{2}$ around the two human driving actions, for example $u_{1}$ and $u_{3}$, to represent behavioral properties that are trying to perturb the autonomous system. More specifically:

  1. First, we consider a baseline assignment:

    • $d_{1}=u_{1}$ where $u_{1}\in[a_{min},a_{max}]$

    • $d_{2}=u_{3}$ where $u_{3}\in[a_{min},a_{max}]$

  2. Then, we consider an alternative assignment for $d_{2}$ by taking advantage of the structure of human driving and modeling $u_{3}$ using a car-following model $g(z,d_{2})$:

    • $d_{1}=u_{1}$ where $u_{1}\in[a_{min},a_{max}]$

    • $u_{3}=g(z,d_{2})$

In our second strategy, $d_{2}$ uses psycho-physiological characteristics of human driving as an alternative modeling strategy [4]. Additionally, we ensure that the values of $u_{3}$ are within realistic bounds given all possible actions $u_{2}$ of the autonomous agent. This modeling strategy relaxes the unrealistic extremes of the previous dynamic game formulation and implicitly models interaction effects between agents for realistic safety. We model the following vehicle's driving behavior using the Intelligent Driver Model (a car-following model) and explicitly model $d_{2}$ as the safe reaction time, $T$, as follows:

$$g(z,T)=a\bigg(1-\bigg(\frac{v_{3}}{v_{0}}\bigg)^{\delta}-\bigg(\frac{s^{*}(z,T)}{x_{g_{2}}}\bigg)^{2}\bigg) \tag{5}$$
$$s^{*}(z,T)=s_{0}+\max\bigg(0,\ v_{3}T+\frac{v_{3}(-v_{g_{2}})}{2\sqrt{ab}}\bigg) \tag{6}$$

where:
$T$: safe reaction time (i.e. $d_{2}=T\in[0,T_{max}]$)
$s^{*}(z,T)$: desired headway of the following vehicle
$s_{0}$: minimum desired headway (i.e. $s_{0}=0$ to allow crashes)
$a,b$: maximum acceleration and deceleration respectively
$\delta,v_{0}$: acceleration exponent (usually 4) and desired velocity respectively
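The car-following model in (5) and (6) can be sketched in a few lines of Python. The default parameter values below (`a`, `b`, `v0`) are placeholders chosen for illustration, not values reported in the paper, and the follower's speed $v_{3}$ is passed explicitly because it is an assumption of this sketch that it is available alongside the relative state.

```python
import math


def desired_headway(v3, v_g2, T, s0=0.0, a=1.5, b=1.5):
    """s*(z, T) from (6); v_g2 is the relative speed across the rear gap."""
    return s0 + max(0.0, v3 * T + v3 * (-v_g2) / (2.0 * math.sqrt(a * b)))


def idm_acceleration(v3, x_g2, v_g2, T, v0=30.0, delta=4, s0=0.0, a=1.5, b=1.5):
    """Follower acceleration u3 = g(z, T) from (5).

    v3   : follower speed
    x_g2 : rear gap (must be positive, matching the state constraint)
    T    : safe reaction time, playing the role of disturbance d2
    """
    if x_g2 <= 0:
        raise ValueError("x_g2 must be positive")
    s_star = desired_headway(v3, v_g2, T, s0, a, b)
    return a * (1.0 - (v3 / v0) ** delta - (s_star / x_g2) ** 2)
```

With everything else fixed, a larger $T$ increases the desired headway and therefore lowers the follower's acceleration, which is how the reaction time acts as a bounded, structured disturbance instead of an arbitrary acceleration.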

III-B Resulting policies

The optimal control strategy, $u_{2}^{*}$, and the optimal disturbance strategy for the first human vehicle, $d_{1}^{*}$, are obtained from (4) by optimizing the Hamiltonian:

$$u_{2}^{*}=\begin{cases}a_{max}&\text{if }(p_{4}-p_{2})>0\\ a_{min}&\text{else}\end{cases} \tag{7}$$
$$d_{1}^{*}=\begin{cases}a_{max}&\text{if }p_{2}<0\\ a_{min}&\text{else}\end{cases} \tag{8}$$

and the optimal disturbance strategy for the second human vehicle is likewise calculated to be:

$$\text{Baseline } d_{2}^{*}=\begin{cases}a_{max}&\text{if }p_{4}>0\\ a_{min}&\text{else}\end{cases} \tag{9}$$
$$\text{Alternative } d_{2}^{*}=\begin{cases}\min\left(T_{max},\max\left(T_{min},\frac{-v_{g_{2}}}{2\sqrt{ab}}\right)\right)&\text{if }p_{4}>0\\ \displaystyle\max_{T}|2\sqrt{ab}T+v_{g_{2}}|&\text{else}\end{cases} \tag{10}$$

where: $p_{2}=\nabla_{v_{g_{1}}}V$ and $p_{4}=\nabla_{v_{g_{2}}}V$.
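The bang-bang structure of (7) to (9) can be illustrated with a short Python sketch. It assumes that the Hamiltonian for the dynamics (2) has the form $H=p_{1}v_{g_{1}}+p_{2}(u_{1}-u_{2})+p_{3}v_{g_{2}}+p_{4}(u_{2}-u_{3})$, which is our reading of (2) and is not written out in the text; under that assumption the control maximizes the term $(p_{4}-p_{2})u_{2}$ while the baseline disturbances minimize $p_{2}u_{1}$ and $-p_{4}u_{3}$. The default bounds are the control and disturbance settings quoted in Section IV, and the function names are hypothetical.

```python
# Illustrative bang-bang policies for (7)-(9); names and defaults are assumptions.
def optimal_control(p2, p4, a_min=-2.0, a_max=2.0):
    """u2* from (7): maximizes the (p4 - p2) * u2 term of the assumed Hamiltonian."""
    return a_max if (p4 - p2) > 0 else a_min


def optimal_d1(p2, a_min=-1.5, a_max=1.5):
    """d1* from (8): minimizes the p2 * u1 term."""
    return a_max if p2 < 0 else a_min


def baseline_d2(p4, a_min=-1.5, a_max=1.5):
    """Baseline d2* from (9): minimizes the -p4 * u3 term."""
    return a_max if p4 > 0 else a_min


def hamiltonian(p, z, u1, u2, u3):
    """Assumed Hamiltonian p . f(z, u1, u2, u3) for the dynamics (2)."""
    p1, p2, p3, p4 = p
    _, v_g1, _, v_g2 = z
    return p1 * v_g1 + p2 * (u1 - u2) + p3 * v_g2 + p4 * (u2 - u3)
```

Because the assumed Hamiltonian is affine in each acceleration, every optimizer sits at an endpoint of its interval, which is why only the signs of $p_{2}$, $p_{4}$, and $p_{4}-p_{2}$ matter. The alternative strategy (10) is not covered by this sketch.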

IV Results and Conclusion

As depicted in figure 4, the alternative formulation for the second disturbance is able to uncover hidden safe strategies for the 3-car scenario. Setting the disturbance and control bounds to [-1.5, 1.5] and [-2, 2] respectively, the baseline produces an empty blue set, meaning there are no guaranteed safe states for the simulation with this particular disturbance/control setting. However, by taking advantage of psycho-physiological characteristics and driver influences on the road, we discover that a non-empty blue set does exist.

(a) $d_{2}$ = Extreme Actions (baseline)
(b) $d_{2}$ = Reaction Time
Figure 4: Invariant safe states resulting from the baseline and alternative technique.

In conclusion, if we are to guarantee safety in the chaotic world of driving under worst-case uncertainty in human behavior, we may need to incorporate better information structures of human behavior in our analysis and update our assumptions as we uncover more knowledge about the system. However, by choosing a specific model structure to reduce the conservativeness of reachability analysis, we run the risk of failing to capture human behavior some of the time due to the limitations of our chosen model (which captures only approximations). The performance of a chosen model will vary greatly depending on the particular type of driver and circumstance. Therefore, to tackle this trade-off, we further aim to incorporate real-time analysis and data-driven models to learn disturbances (and how they evolve), and to maintain formal and robust safety guarantees using different learning strategies such as deep reinforcement learning.

Acknowledgments

This material is based upon work supported by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under the Vehicle Technologies Office award number CID DE-EE0008872. The views expressed herein do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

References

  [1] Somil Bansal, Mo Chen, Sylvia Herbert, and Claire J Tomlin. Hamilton-Jacobi reachability: A brief overview and recent advances. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 2242–2253. IEEE, 2017.
  [2] Jaime F Fisac, Anayo K Akametalu, Melanie N Zeilinger, Shahab Kaynama, Jeremy Gillula, and Claire J Tomlin. A general safety framework for learning-based control in uncertain robotic systems. IEEE Transactions on Automatic Control, 64(7):2737–2752, 2018.
  [3] Ian M Mitchell and Jeremy A Templeton. A toolbox of Hamilton-Jacobi solvers for analysis of nondeterministic continuous and hybrid systems. In International Workshop on Hybrid Systems: Computation and Control, pages 480–494. Springer, 2005.
  [4] M. Treiber and C. Thiemann. Traffic Flow Dynamics: Data, Models and Simulation. Springer-Verlag Berlin Heidelberg, Berlin, Germany, 2013.
  [5] Cathy Wu, Aboudy Kreidieh, Kanaad Parvate, Eugene Vinitsky, and Alexandre M Bayen. Flow: Architecture and benchmarking for reinforcement learning in traffic control. arXiv preprint arXiv:1710.05465, 2017.