This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Delay-aware Robust Control for Safe Autonomous Driving

Dvij Kalaria1, Qin Lin2∗, and John M. Dolan3 1Dvij Kalaria is with the Department of Computer Science and Engineering, IIT Kharagpur, India [email protected]2Qin Lin is with the Electrical Engineering and Computer Science Department, Cleveland State University [email protected] (*corresponding author)3John M. Dolan is with the Robotics Institute, Carnegie Mellon University [email protected]
Abstract

With the advancement of affordable self-driving vehicles using complicated nonlinear optimization but limited computation resources, computation time becomes a matter of concern. Other factors such as actuator dynamics and actuator command processing cost also unavoidably cause delays. In high-speed scenarios, these delays are critical to the safety of a vehicle. Recent works consider these delays individually, but none unifies them all in the context of autonomous driving. Moreover, recent works inappropriately consider computation time as a constant or a large upper bound, which makes the control either less responsive or over-conservative. To deal with all these delays, we present a unified framework by 1) modeling actuation dynamics, 2) using robust tube model predictive control, and 3) using a novel adaptive Kalman filter without assuming a known process model and noise covariance, which makes the controller safe while minimizing conservativeness. On the one hand, our approach can serve as a standalone controller; on the other hand, our approach provides a safety guard for a high-level controller, which assumes no delay. This can be used for compensating the sim-to-real gap when deploying a black-box learning-enabled controller trained in a simplistic environment without considering delays for practical vehicle systems.

I Introduction

The recent surge in autonomous driving has led to an increase in demand for increased affordability and accessibility. The stringent requirements of onboard computing and high-resolution sensors pose a major challenge in meeting this demand. While there has been much work on making algorithms more efficient, the required computation time is still dangerous in fast-changing environments such as high-speed scenarios. Most commonly used control algorithms assume the planned command to be applied instantaneously. This assumption can generate significant tracking errors and jeopardize stability.

This work presents a unified approach to dealing with three types of delay: 1) computation time delay; 2) actuator command processing delay; and 3) actuator dynamics delay. First, we model the actuator’s steering dynamics delay using a first-order ordinary differential equation (ODE). Second, we propose a delay-aware robust tube Model Predictive Control (MPC). It is coupled with our proposed filter, called INFLUENCE - adaptIve kalmaN FiLter with Unknown process modEl and Noise CovariancE (including process noise and measurement noise). INFLUENCE is a novel adaptive Kalman filter variant combining online identification of an unknown dynamic model and estimation of noise covariances. INFLUENCE probabilistically safely estimates the computation time delay. We propose two controller plans: the first plan serves as a standalone controller for a delay-aware robust control; in the second controller plan, we compensate a learning-enabled (LE) primary controller to boost its safety performance. This controller plan has application merit, since LE controllers are being used in autonomous systems. However, simplistic assumptions are usually made in the training procedure of the LE controllers. For safety-critical systems such as autonomous driving, it is crucial to close the sim-to-real gap by providing a safety-assured guard. We treat the LE controller as a high-level controller. In the low-level control part, we track the reference generated by the LE controller, but actively regulate its unsafe control by delay compensation.

In summary, we make the following novel contributions:

1. A unified delay-aware robust control approach dealing with three major delays: computation time delay, actuator command processing delay, and actuator dynamics delay.

2. INFLUENCE, a probabilistic framework for real-time estimation of computation time instead of taking an upper bound, which makes the algorithm safe while minimizing conservativeness. INFLUENCE has application merit for general safe prediction problems, since it does not assume known process model and noise covariance.

3. A control plan to safely compensate for these delays for a LE controller which doesn’t consider delays.

The rest of this paper is organized as follows. Section II provides a review of some important related work. Section III describes the methodology. Section IV presents the experimental results. The conclusions are in Section V.

II Related works

In the low-level control layer of a vehicle system, many algorithms ignore the delay arising from various factors such as computation, actuator command processing, sensor delay, and actuator dynamics. For the compensation of computation delay for discrete MPC, [1] proposes the simple solution of shifting the initial state by one control step, approximating it from the prediction model. However, this is not suitable if the computation time is more than one control step. [2] further proposes to use a buffer to store control commands from the previous batch. It also proposes the use of a pre-compensating unit and robust tube MPC to prove the safety of the system under bounded perturbation. This plan is well suited for static scenarios, where the objective is nearly constant throughout. However, for highly dynamic scenarios where the planned trajectory changes rapidly, taking the upper bound as the horizon length might lead to the algorithm’s being less responsive. We instead propose to use an adaptive Kalman filter to approximate a local upper bound on computation time and adapt to its changing values instead of taking an upper bound as the horizon length.

For compensation of delay due to actuator dynamics, [3] proposes approximating the actuator dynamics by a first-order ODE, after which the actuator state can be augmented in the state space model. The approach has been tested on differential braking stability control. Instead, we use a similar design for compensating steering delay.

For considering delay caused by processing of actuator commands at the actuator, [4] proposes an initial state transition method similar to the compensation for computation delay, but a preview continuous controller. It proposes a closed-loop solution to compensate for a dead time between when the command is planned and when it reaches the actuator. [5] further extends the idea by including compensation for actuator saturation, as well to make the solution deployable on real systems with control limits. However, with the use of the preview controller, it becomes difficult to include state constraints in the system. [6] proposes a simple way to compensate for the sensor delay by transforming the frame of sensor values to reflect values at the current time and not at the time when they were recorded. Our work, however, considers computation delay, actuator command processing delay, and actuator dynamics as well as control and state constraints under one optimization framework in the context of autonomous vehicle control.

III Methodology

III-A Notation

A polytope is defined as a convex hull of finite points in dd-dimensional space d\mathbb{R}^{d}. The Minkowski sum of two polytopes is defined as PQ:={x+qd:xP,qQ}P\oplus Q:=\{x+q\in\mathbb{R}^{d}:x\in P,q\in Q\}. The Pontryagin difference of two polytopes is defined as PQ:={xd:x+qP,qQ}P\ominus Q:=\{x\in\mathbb{R}^{d}:x+q\in P,q\in Q\}.

III-B System dynamics

A kinematic bicycle model describes the dynamics of the vehicle with the state variables being position (pxp_{x}, pyp_{y}), heading angle (θ\theta), and velocity (vv), and the control variables acceleration (aa) and steering angle (δ\delta). It is commonly assumed in the literature that steering angle δ\delta and acceleration aa are applied instantaneously by the actuators. But in reality, there is a certain lag between the command and when the actuator physically modifies the steering angle state, which is called the actuator dynamic delay. We modify the system dynamics to include the actual steering angle as a state denoted as δa\delta_{a}. The control command is now the desired steering angle δ\delta. We approximate the change in the steering angle state by a first-order ODE similar to [3], i.e., δa˙=Kδ(δδa)\dot{\delta_{a}}=K_{\delta}(\delta-\delta_{a}), where KδK_{\delta} is the inverse of the time constant for the steering actuator. For acceleration, pedal dynamics are assumed to be instantaneous for the experiments in this paper. However, they can also be approximated in the same way. After discretization, the vehicle state is now modified as xk=[px,k,py,k,θk,vk,δa,k]x_{k}=[p_{x,k},p_{y,k},\theta_{k},v_{k},\delta_{a,k}]. The discrete dynamics are given in Eq. 1.

px,k+1=px,k+sin(θk+κklk)sin(θk)κk\displaystyle p_{x,k+1}=p_{x,k}+\frac{\sin(\theta_{k}+\kappa_{k}l_{k})-\sin(\theta_{k})}{\kappa_{k}} (1a)
py,k+1=py,k+cos(θk)cos(θk+κklk)κk\displaystyle p_{y,k+1}=p_{y,k}+\frac{\cos(\theta_{k})-\cos(\theta_{k}+\kappa_{k}l_{k})}{\kappa_{k}} (1b)
θk+1=θk+κklk\displaystyle\theta_{k+1}=\theta_{k}+\kappa_{k}l_{k} (1c)
vk+1=vk+akΔt\displaystyle v_{k+1}=v_{k}+a_{k}\Delta t (1d)
δa,k+1=δk(δkδa,k)(eKΔt1)\displaystyle\delta_{a,k+1}=\delta_{k}-(\delta_{k}-\delta_{a,k})(e^{K\Delta t}-1) (1e)
where the curvature κk=tan(δa,k)CrL\kappa_{k}=\frac{\tan(\delta_{a,k})C_{r}}{L}, LL is the vehicle length, and the travel distance lk=vkΔt+12akΔt2l_{k}=v_{k}\Delta t+\frac{1}{2}a_{k}\Delta t^{2}.

III-C Robust Tube MPC

For a discrete linear system with system matrices An×nA\in\mathbb{R}^{n\times n} and Bn×mB\in\mathbb{R}^{n\times m}, let the control gain Km×nK\in\mathbb{R}^{m\times n} be such that the feedback system of AK=A+BKA_{K}=A+BK is stable. Let 𝒵\mathcal{Z} be the disturbance-invariant set for the controlled uncertain system x+=AKx+wx^{+}=A_{K}x+w, satisfying AK𝒵𝒲𝒵A_{K}\mathcal{Z}\oplus\mathcal{W}\subseteq\mathcal{Z}, where the disturbance ww is assumed to be bounded (w𝒲w\in\mathcal{W}) by a polyhedron that contains the origin in its interior. The following finite optimization problem is solved at each step for X¯={x¯0,x¯1,x¯N}\bar{X}=\{\bar{x}_{0},\bar{x}_{1}...,\bar{x}_{N}\}, U¯={u¯0,u¯1,u¯N1}\bar{U}=\{\bar{u}_{0},\bar{u}_{1}...,\bar{u}_{N-1}\} and reference state sequence Xref={xref,0,xref,1,xref,N}X_{ref}=\{x_{ref,0},x_{ref,1}...,x_{ref,N}\} obtained from the path planner, where NN is the horizon length, and X¯\bar{X} and U¯\bar{U} are the state and control sequences of a nominal system ignoring ww:

minx¯0,Ut=0N1(x¯kxref,k)TQ(x¯kxref,k)+u¯kTRu¯k+(x¯Nxref,N)TQN(x¯Nxref,N)s.t.x¯k+1=Ax¯k+Bu¯kx0x¯0𝒵u¯𝒰K𝒵x¯𝒳𝒵\begin{split}\mathop{\min}\limits_{\bar{x}_{0},U}&\quad\sum_{t=0}^{N-1}(\bar{x}_{k}-{x_{ref,k}})^{T}Q(\bar{x}_{k}-{x_{ref,k}})+\bar{u}_{k}^{T}R\bar{u}_{k}\\ &+(\bar{x}_{N}-{x_{ref,N}})^{T}Q_{N}(\bar{x}_{N}-{x_{ref,N}})\\ s.t.&\quad\bar{x}_{k+1}=A\bar{x}_{k}+B\bar{u}_{k}\\ &\quad x_{0}\in\bar{x}_{0}\oplus\mathcal{Z}\\ &\quad\bar{u}\in\mathcal{U}\ominus K\mathcal{Z}\\ &\quad\bar{x}\in\mathcal{X}\ominus\mathcal{Z}\end{split} (2)

where QQ, RR and QNQ_{N} are the state, control and terminal state cost matrices, respectively. The control command given is u=u¯+K(xx¯)u=\bar{u}+K(x-\bar{x}), where xx is the current state. This guarantees x+x¯++𝒵x^{+}\in\bar{x}^{+}+\mathcal{Z} for any w𝒲w\in\mathcal{W}, i.e., all states xkx_{k} will be inside the constraint set 𝒳\mathcal{X}. However, for a nonlinear system as in our case, we use the equivalent LTV system, where the system matrices AA and BB are replaced with Jacobian matrices AkA_{k} and BkB_{k} at the current state for the system dynamics used in Eq. 1. For more details as well as a detailed proof of the feasibility and the stability of the above controller, see [7]. Also, a mismatch between the linearized and the actual model can be compensated by adding an additional disturbance w^\hat{w} assuming the model non-linear function to be a Lipschitz function [8]. For the experiments in this work, we assume the disturbance margin is large enough to cover this extra disturbance.

III-D Delay-aware robust tube MPC

For the above formulation, we assumed delay time to be zero, meaning that the computed command is delivered to the system at the same time the observation is made for the current state xx. But in practice, there is computation time denoted as tct_{c} and an actuator command processing delay tat_{a} after the calculated command is delivered, resulting in a total delay time td=tc+tat_{d}=t_{c}+t_{a}. Hence, if the current state xx is observed at time tt, the computed command influences the actuator state at t+tdt+t_{d} time. This may lead to instability of the system if tdt_{d} is large and the robust tube assumptions no longer hold true. In order to tackle this problem, [2] proposes a bi-level control scheme to deal with time delay and also proves robustness using the tube MPC. A buffer block of commands is used for communication between higher- and lower-level units, as depicted in Fig. 1. At time tt, the set of possible states at time t+tht+t_{h}, where tht_{h} is the horizon length, is predicted. The high-level tube MPC updates the buffer with nominal states and control commands from time t+tht+t_{h} to t+2tht+2t_{h}. If the higher-level MPC requires a delay time tct_{c} that is less than tht_{h}, the system waits for the remaining time thtct_{h}-t_{c} for time alignment. However, we believe this is only suited for systems when the objective is nearly constant. For dynamically changing objectives as well as state constraints, it is necessary to update the reference path more frequently, since waiting for the full horizon path to be followed may lead to inconsistencies. In the case of autonomous driving, for dynamic scenarios where the reference path has to be updated frequently, it would not be feasible to wait for tht_{h} to get a new updated path.

Hence, we propose to get a local probabilistic upper-bound estimate t^c\hat{t}_{c} of the computation time. We update the buffer from (t+t^ct+\hat{t}_{c} to t+t^c+tht+\hat{t}_{c}+t_{h}) instead of from (t+tht+t_{h} to t+2tht+2t_{h}), as shown in Fig. 1. This increases the controller plan update rate for the higher-level MPC and also makes the controller robust to changing computation times. For estimation of the local upper bound t^c\hat{t}_{c}, we use an adaptive Kalman filter, as further described in Section III-E. Considering tat_{a} to be the extra delay due to actuator command processing, we thus get a new local upper-bound delay time estimate t^d=t^c+ta\hat{t}_{d}=\hat{t}_{c}+t_{a}, which we use to find the initial state estimate xt^d|tx_{\hat{t}_{d}|t} assuming no disturbance after t^d\hat{t}_{d} time given the current state xtx_{t}. It can be calculated by piece-wise integration of the system dynamics using the control commands from the buffer. Hence, a command issued at t+t^ct+\hat{t}_{c} will be executed at t+t^c+ta=t+t^dt+\hat{t}_{c}+t_{a}=t+\hat{t}_{d}, where we consider our initial state to be for optimization, as depicted in Fig. 1. Mathematically, the updated objective function is described in Eq. 3, where xt^d+tx_{\hat{t}_{d}+t} is the actual state after time t^d\hat{t}_{d}. The calculated nominal discrete states (X¯\bar{X}) and controls (U¯\bar{U}) are used to fill the buffer B from time t+t^ct+\hat{t}_{c} to t+t^c+tht+\hat{t}_{c}+t_{h} as u¯[t+t^c+kΔt,t+t^c+(k+1)Δt]=u¯k\bar{u}_{[t+\hat{t}_{c}+k\Delta t,t+\hat{t}_{c}+(k+1)\Delta t]}=\bar{u}_{k} for k{0,1,2N1}k\in\{0,1,2...N-1\}, as shown in Fig. 1. The pre-compensator unit is a low-level process which executes commands in the buffer at a higher frequency than the high-level MPC.

Refer to caption
Figure 1: Dual cycle control scheme for tube MPC with delay
minx¯0,Uk=0N1(x¯kxref,k)TQ(x¯kxref,k)+u¯kTRu¯k+(x¯Nxref,N)TQN(x¯Nxref,N)s.t.x¯k+1=Ax¯k+Bu¯kxtd+tx¯0𝒵xtd|tx¯0𝒵(j=0s1AKj𝒲)s=tdΔtu¯𝒰K𝒵x¯𝒳𝒵\begin{split}\mathop{\min}\limits_{\bar{x}_{0},U}&\quad\sum_{k=0}^{N-1}(\bar{x}_{k}-{x_{ref,k}})^{T}Q(\bar{x}_{k}-{x_{ref,k}})+\bar{u}_{k}^{T}R\bar{u}_{k}\\ &\quad\quad\quad+(\bar{x}_{N}-{x_{ref,N}})^{T}Q_{N}(\bar{x}_{N}-{x_{ref,N}})\\ s.t.&\quad\bar{x}_{k+1}=A\bar{x}_{k}+B\bar{u}_{k}\\ &\quad x_{t_{d}+t}\in\bar{x}_{0}\oplus\mathcal{Z}\implies x_{t_{d}|t}\in\bar{x}_{0}\oplus\mathcal{Z}\ominus(\oplus_{j=0}^{s-1}A_{K}^{j}\mathcal{W})\\ &\quad\ s=\left\lceil\frac{t_{d}}{\Delta t}\right\rceil\\ &\quad\bar{u}\in\mathcal{U}\ominus K\mathcal{Z}\\ &\quad\bar{x}\in\mathcal{X}\ominus\mathcal{Z}\end{split} (3)

III-D1 Control Constraints (𝒰\mathcal{U})

Limits on acceleration and steering are formulated as control constraints for the optimization problem.

III-D2 State Constraints (𝒳\mathcal{X})

For the state variables θ\theta and vv, we set the upper and lower bound for their range. However, for the state variables pxp_{x} and pyp_{y}, free space is non-convex in nature, hence it becomes quite computationally expensive to set them in the non-convex form for the optimization problem. Hence we use the IRIS algorithm we have used in previous work [9] to derive a set of convex constraints which can be used for efficient optimization of the path tracking problem while also ensuring safety through collision avoidance. IRIS optimizes the objective of finding linear constraints for each obstacle such that the resultant convex space fits the largest possible ellipsoid. We set the seed for IRIS as the predicted position (without uncertainty) of the vehicle after time t^d\hat{t}_{d} to get the resultant convex space 𝒳\mathcal{X}.

III-D3 Disturbance-invariant set (𝒵\mathcal{Z})

The disturbance-invariant set can be over-approximated using 𝒵k=Σi=0NAK,ki𝒲\mathcal{Z}_{k}=\Sigma^{N}_{i=0}A_{K,k}^{i}\mathcal{W} [10] where AK,k=Ak+BkKkA_{K,k}=A_{k}+B_{k}K_{k}. However, in the presence of delay, AK,k+1A_{K,k+1} would be different for the next control cycle, hence for robustness, 𝒵k\mathcal{Z}_{k} must be sufficient to be covered by 𝒵k+1\mathcal{Z}_{k+1}.

Theorem 1.

Given the optimization in Eq. 3, the disturbance-invariant set 𝒵k\mathcal{Z}_{k} is calculated as 𝒵=AKSA(j=0N1AKj𝒲)\mathcal{Z}=\bigcup\limits_{A_{K}\sim S_{A}}(\oplus_{j=0}^{N-1}A_{K}^{j}\mathcal{W}), i.e., a union set with all possible values of initial heading angle, speed and steering angle within the admissible range, which determines the matrix AK=A+BKA_{K}=A+BK. The invariant set guarantees robust initialization of the optimization problem. The calculation of 𝒵\mathcal{Z} is computationally expensive, hence is carried out offline. Calculating 𝒵\mathcal{Z} offline and using the fixed value online for calculating the safe region for the vehicle doesn’t increase the computation time of the online algorithm significantly.

Proof.

xtd|tkx_{t_{d}|t_{k}} is the expected state assuming no disturbance after tdt_{d} for the current time tkt_{k}, and the application of u¯\bar{u} with feedback. xtd+tkx_{t_{d}+t_{k}} is the actual state at td+tkt_{d}+t_{k}. We have xtd+tkxtd|tk(j=0s1Akj𝒲)x_{t_{d}+t_{k}}\in x_{t_{d}|t_{k}}\oplus(\oplus_{j=0}^{s-1}A_{k}^{j}\mathcal{W}), where s=tdΔts=\left\lceil\frac{t_{d}}{\Delta t}\right\rceil, From Eq. 3, we have xtd+tkx¯0|td+tk𝒵x_{t_{d}+t_{k}}\in\bar{x}_{0|t_{d}+t_{k}}\oplus\mathcal{Z}. Hence, in order to guarantee robustness, xtd|tkx¯0|td+tk𝒵(j=0s1AKj𝒲)x_{t_{d}|t_{k}}\in\bar{x}_{0|t_{d}+t_{k}}\oplus\mathcal{Z}\ominus(\oplus_{j=0}^{s-1}A_{K}^{j}\mathcal{W}), we need to ensure x¯0|td+tk\bar{x}_{0|t_{d}+t_{k}} has a valid solution, i.e., (j=0s1AKj𝒲)𝒵(\oplus_{j=0}^{s-1}A_{K}^{j}\mathcal{W})\subseteq\mathcal{Z}. As sNs\leq N and 0n𝒲0_{n}\in\mathcal{W}, we can establish

(j=0s1AKj𝒲)(j=0s1AKj𝒲)(j=sN1AKj𝒲)(\oplus_{j=0}^{s-1}A_{K}^{j}\mathcal{W})\subseteq(\oplus_{j=0}^{s-1}A_{K}^{j}\mathcal{W})\oplus(\oplus_{j=s}^{N-1}A_{K}^{j}\mathcal{W}) (4)

Thus, j=0s1AKj𝒲j=0N1AKj𝒲\oplus_{j=0}^{s-1}A_{K}^{j}\mathcal{W}\subseteq\oplus_{j=0}^{N-1}A_{K}^{j}\mathcal{W}. AKA_{K} is the Jacobian matrix, which depends on θ\theta, vv, and δ\delta of xtd|tkx_{t_{d}|t_{k}}. Let’s define set SA={AK(θ,v,δ)|θ[π,+π],δ[δmax,δmax],v[vmin,vmax]}S_{A}=\{A_{K}(\theta,v,\delta)|\theta\in[-\pi,+\pi],\delta\in[-\delta_{max},\delta_{max}],v\in[v_{min},v_{max}]\}, which consists of all possible matrices.

(j=0N1AKj𝒲)AKSA(j=0N1AKj𝒲)(\oplus_{j=0}^{N-1}A_{K}^{j}\mathcal{W})\subseteq\bigcup\limits_{A_{K}\sim S_{A}}(\oplus_{j=0}^{N-1}A_{K}^{j}\mathcal{W}) (5)

Thus, j=0s1AKj𝒲AKSA(j=0N1AKj𝒲)\oplus_{j=0}^{s-1}A_{K}^{j}\mathcal{W}\subseteq\bigcup\limits_{A_{K}\sim S_{A}}(\oplus_{j=0}^{N-1}A_{K}^{j}\mathcal{W}). 𝒵\mathcal{Z} is chosen as AKSA(j=0N1AKj𝒲)\bigcup\limits_{A_{K}\sim S_{A}}(\oplus_{j=0}^{N-1}A_{K}^{j}\mathcal{W}), which concludes the proof. ∎

III-E Estimating computation time

We propose INFLUENCE for estimating a local upper bound on computation time. The conventional Kalman filter faces a crucial challenge when the dynamic model and noise covariance are unknown. On the subject of adaptive filters addressing this challenge, existing approaches either assume an unknown dynamic model but known covariance [11] or unknown covariance but a known dynamic model [12, 13]. INFLUENCE assumes the process model γ\gamma, process noise variance qq and measurement noise variance rr are all unknown. Since measuring time is direct but noisy, it is reasonable to assume the measurement matrix to be the identity matrix. However, if the assumption does not hold in other applications, the identification can be facilitated using the similar approach for process model identification in the INFLUENCE algorithm. We assume both noise distributions to be Gaussian, independent and mutually uncorrelated throughout. To make the optimization tractable, INFLUENCE iteratively fixes γ\gamma first to identify qq and rr (see Eq. 6c to Eq. 6k), which are then fixed to update γ\gamma (see Eq. 6l to Eq. 6m). For the identification of qq and rr, an exponential moving average is maintained to estimate prediction error ee and measurement error ww, respectively. Parameters NqN_{q} and NrN_{r} determine the influence of older values on the moving average: the greater their values, the greater the weighting of older values. Thus, an incremental update for qq and rr can be done in Eq. 6d and Eq. 6j. These update rules have been adapted from [12, 13], while the update rules for state transition parameters θ\theta have been adapted from [11]. FF is the learning gain, and λ=Nθ1Nθ\lambda=\frac{N_{\theta}-1}{N_{\theta}} denotes the forgetting factor for estimation of θ\theta, where NθN_{\theta} is the buffer size for the process model’s identification. The initialization of INFLUENCE is in Eq. 6a.

x0|0=tc,0,p0|0=0,q0=ϵ,r0=ϵ\displaystyle x_{0|0}=t_{c,0},p_{0|0}=0,q_{0}=\epsilon,r_{0}=\epsilon
e0=0,w0=0,F0=I2,γ=[1 0]T\displaystyle e_{0}=0,w_{0}=0,F_{0}=I_{2},\gamma=[1\ \ 0]^{T} (6a)
xn|n1=γn1T[xn1|n1 1]T\displaystyle x_{n|n-1}=\gamma_{n-1}^{T}[x_{n-1|n-1}\ \ 1]^{T} (6b)
pn|n1=γn1,02pn1|n1+qn1\displaystyle p_{n|n-1}=\gamma_{n-1,0}^{2}\ p_{n-1|n-1}+q_{n-1} (6c)
en=Nr1Nren1+1Nr(tc,nxn|n1)\displaystyle e_{n}=\frac{N_{r}-1}{N_{r}}e_{n-1}+\frac{1}{N_{r}}(t_{c,n}-x_{n|n-1}) (6d)
Δrn=((tc,nxn|n1)en)2Nr1pn|n1Nr\displaystyle\Delta r_{n}=\frac{((t_{c,n}-x_{n|n-1})-e_{n})^{2}}{N_{r}-1}-\frac{p_{n|n-1}}{N_{r}} (6e)
rn=|Nr1Nrrn1+Δrn|\displaystyle r_{n}=\left|\frac{N_{r}-1}{N_{r}}r_{n-1}+\Delta r_{n}\right| (6f)
Kn=pn|n1pn|n1+rn\displaystyle K_{n}=\frac{p_{n|n-1}}{p_{n|n-1}+r_{n}} (6g)
xn|n=xn|n1+Kn(tc,nxn|n1)\displaystyle x_{n|n}=x_{n|n-1}+K_{n}(t_{c,n}-x_{n|n-1}) (6h)
pn|n=(1Kn)pn|n1\displaystyle p_{n|n}=(1-K_{n})p_{n|n-1} (6i)
wn=Nq1Nqwn1+1Nq(xn|nxn|n1)\displaystyle w_{n}=\frac{N_{q}-1}{N_{q}}w_{n-1}+\frac{1}{N_{q}}(x_{n|n}-x_{n|n-1}) (6j)
Δqn=pn|nγn1,02pn1|n1Nq+((xn|nxn|n1)wn)2Nq1\displaystyle\Delta q_{n}=\frac{p_{n|n}-\gamma_{n-1,0}^{2}\ p_{n-1|n-1}}{N_{q}}+\frac{((x_{n|n}-x_{n|n-1})-w_{n})^{2}}{N_{q}-1} (6k)
qn=|Nq1Nqqn1+Δqn|\displaystyle q_{n}=\left|\frac{N_{q}-1}{N_{q}}q_{n-1}+\Delta q_{n}\right| (6l)
Let ϕ be [xn1|n1 1]T\displaystyle\text{Let }\phi\text{ be }[x_{n-1|n-1}\ \ 1]^{T}
Fn=1λ(Fn1Fn1ϕϕTFn1λ+ϕTFn1ϕ)\displaystyle F_{n}=\frac{1}{\lambda}\left(F_{n-1}-\frac{F_{n-1}\phi\phi^{T}F_{n-1}}{\lambda+\phi^{T}F_{n-1}\phi}\right) (6m)
γn=γn1+Fnϕ(xn|nxn|n1)\displaystyle\gamma_{n}=\gamma_{n-1}+F_{n}\phi(x_{n|n}-x_{n|n-1}) (6n)

where tc,nt_{c,n} is the observed computation time at step nn, and γn1,0\gamma_{n-1,0} is the first element of the vector at step n1n-1.

For the local upper bound estimate, we use the predicted value and variance to get an upper-bound estimate on computation time, i.e., t^c,n=xn|n1+βpn|n1\hat{t}_{c,n}=x_{n|n-1}+\beta p_{n|n-1}. We choose the parameter β\beta accordingly to get sufficiently high confidence as an upper bound assuming a Gaussian distribution. It is important to note here that the addition of an adaptive Kalman filter for estimation of computation time doesn’t significantly affect the computation time of the overall algorithm, as augmenting by a fixed number of computations in the cycle leads to an insignificant increase in computation time as opposed to window-based estimation methods whose computational complexity depends on the window size.

III-F Controller Plan A

We present a standalone controller (called plan A) as depicted in Fig. 2. We compensate for the actuator’s steering dynamics by modelling a first-order ODE. For the compensation of the computation and actuator command processing delays, we use initial state shift by the estimated local upper bound on the net delay time. The optimization problem updates the robust tube buffer from t+t^ct+\hat{t}_{c} to t+t^c+tht+\hat{t}_{c}+t_{h} with the nominal commands and states. The pre-compensator unit runs as a low-level process to refine the control with a higher frequency (see Fig. 2).

Refer to caption
Figure 2: Controller Plan A for Robust tube MPC. If the true delay tdt_{d} is smaller than the estimated t^d\hat{t}_{d}, we need to wait t^dtd\hat{t}_{d}-t_{d} for a time alignment.

III-G Controller Plan B

As an alternate plan, we compensate for the actuator and computation delay of a nominal controller (see Fig. 3). It can be a black-box controller, which can be used for a LE controller trained in a simplified simulation environment without any practical delay. For the computation time and actuator processing delay compensation, we use the same design as plan A by shifting the initial state. For actuator dynamic delay compensation, we use a separate unit which takes the commands from the nominal controller as a reference to track. The computation time and actuator processing delays are estimated via Plan A to shift the initial state of the LE controller and conduct rollouts to obtain sequential commands U^={u^1,u^2,u^N}\hat{U}=\{\hat{u}_{1},\hat{u}_{2}...,\hat{u}_{N}\}. The refined commands after compensating for actuator dynamic delay are U={u1,u2..,uN}U=\{u_{1},u_{2}..,u_{N}\}. UU is obtained by solving a quadratic optimization problem (Eq. 7) where ustartu_{start} is the current value of the steering angle, rkr_{k} is the unit step response of the steering actuator at the kthk^{th} time step, and QacQ_{ac} and RacR_{ac} are positive semidefinite weight matrices. The optimization is to track the desired actuator commands from the LE controller as closely as possible while minimizing the control effort. Though collision avoidance is not considered in the experiments for Plan B, safety constraints such as control barrier functions [14] can be easily incorporated into the optimization.

minUk=1Nu^k(u0+i=1i=k(uiui1)rki+1)Qac+ukRacs.t.u0ustart=0\begin{split}\mathop{\min}\limits_{U}&\quad\sum_{k=1}^{N}\left\|\hat{u}_{k}-(u_{0}+\sum_{i=1}^{i=k}(u_{i}-u_{i-1})r_{k-i+1})\right\|_{Q_{ac}}\\ &\quad+\left\|u_{k}\right\|_{R_{ac}}\\ s.t.&\quad u_{0}-u_{start}=0\\ \end{split} (7)

where ri=(1eKiΔt)r_{i}=(1-e^{-Ki\Delta t}).

Refer to caption
Figure 3: Controller Plan B for black box controller.

IV Simulation Results

We conduct the experiments in the Gazebo simulator with a Prius vehicle model. In order to get the time constant value for the steering actuator, we test its unit response, i.e., we set the actuator command to 1 and record the steering angle values over a time window sufficient for the steering angle to converge at the maximum value. We then fit the observed response values with the first-order ODE described in Section III-B and determine the parameter KδK_{\delta}. As shown in Fig. 4, using Kδ=30K_{\delta}=30 hz well approximates the actuator dynamics for the Prius model in Gazebo. We have also attached a short compiled video111https://www.youtube.com/watch?v=KdqClN4fiR0 run for all the experiments discussed.

Refer to caption
Figure 4: Steering angle response for Prius model in Gazebo

IV-A Static scenario

For testing controller plan A, we perform a static obstacle avoidance experiment. The planner used is hybrid A [15]. We compare the paths followed by the controller with and without considering delays. In the case in which no compensation is considered, due to the delays, the vehicle overshoots during the first turn and when it tries to get back to the reference safely, the vehicle collides with the static obstacle as marked by pose B in Fig. 5. In the cases in which delay compensations are considered, the resulting path followed is closer to the reference line and smoother, while safely avoiding collision. We also compare the results between when 1) the delay time is taken as an upper bound equal to the horizon length [2] and 2) the local upper bound estimate is found using INFLUENCE. The path followed using our proposed method is clearly seen to be smoother in Fig. 5. This is because in the case of constant delay compensation, at pose A, the state constraints generated from IRIS force the vehicle to deviate from the reference path, which thus overshoots by a significant amount, but is still able to get back to the reference safely. On the other hand, if we approximate the delay time using INFLUENCE and adjust the expected local upper bound value accordingly (see Fig. 6), the controller responds faster. Hence, after passing pose A, the state constraints change and the controller reacts faster to get back to the reference path, giving less overshoot. It is important to note here that the minute computation time difference between the with and without delay compensation cases is due to the shifted state calculation and the extended Kalman filter, but it seems that having an efficient delay-aware algorithm outweighs any effect the increased computation time has on making the controller less responsive.

Refer to caption
Figure 5: Comparison of paths followed. Yellow region is the convex state constraint set from IRIS at pose A for Experiment IV-A. The red box is a static obstacle.
Refer to caption
Figure 6: Observed and estimated computation times using INFLUENCE for Experiment IV-A.

IV-B Overtaking scenario

We further test the controller plan A in an overtaking scenario, in which the lead vehicle brakes suddenly at point A when t=4st=4s. We use the Frenet planner [16] with the reference path as the lane center. Frenet frame-based planning has been successful in practice due to the significant advantage of its independence from complex road geometry. We perform the experiment with the same starting conditions and compare the results with (Fig. 7(a)) and without (Fig. 7(b)) delay compensation. The Frenet planner expects the ego vehicle to move at constant speed, but as the speed rapidly drops at point A, the reference path changes rapidly. The Frenet planner thus rapidly changes path after point A. Point B is the closest position between the ego vehicle and lead vehicle in all the cases. If delay time is not considered, the ego vehicle hits the other vehicle slightly at point B. Also, in this case if computation time is taken as a constant upper bound of 0.2s0.2s, due to the slow reaction of the controller, the ego vehicle hits the lead vehicle at point B (Fig. 7(c)), which proves that taking the computation time as an upper bound is ineffective in a rapidly changing environment.

Refer to caption
(a) With variable local upper bound on computation delay.
Refer to caption
(b) Without computation delay consideration.
Refer to caption
(c) With constant upper bound on computation delay (0.2s).
Figure 7: Comparison in Experiment IV-B. The red boxes denote the ego vehicle at A and B, while the blue boxes denote the opponent vehicle at AA^{\prime} and BB^{\prime}.

IV-C Closed track scenario

We test controller plan B for a LE controller trained in an ideal environment without delays. The LE lateral controller is a neural network trained on waypoint following with inputs [Δx,Δy,Δθ][\Delta x,\Delta y,\Delta\theta], where (Δx,Δy)(\Delta x,\Delta y) and Δθ\Delta\theta are respectively the relative position and heading with a target waypoint. The output is the steering angle δ\delta. The network architecture is a simple feed-forward neural network with hidden layer sizes (4,16,4)(4,16,4). For longitudinal control, simple PID control is used to track a constant speed of 75m/s75\ m/s throughout. The simulation has been performed in the Ansys VRX simulator [17]. When deploying the LE controller in such a high-speed waypoint-following scenario in Gazebo, it performs worse due to the practical delays. The LE controller loses control at the time of turning, see Fig. 8. By using the proposed plan B, the vehicle retains control. The vehicle is operating at its friction limits, hence even a little bit of error caused due to delay leads to the vehicle losing control, even when the computation time is just around 0.02s.

Refer to caption
Figure 8: Comparison of high-speed tracking for Experiment IV-C

V Conclusion

We propose a unified framework for compensating the delays of computation, actuator command processing and actuator dynamics in autonomous driving systems. We propose the INFLUENCE algorithm to safely approximate the computation time. With the use of tube MPC, the vehicle safely tracks the planned trajectories in realistic scenarios tested in the high-fidelity Gazebo simulator. Lastly, we present a framework for compensating delays for a black-box controller trained in an ideal environment. The simulation results demonstrate safety and real-time performance of our proposed framework.

Acknowledgment

This material is based upon work supported by the United States Air Force and DARPA under Contract No. FA8750-18-C-0092. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and DARPA.

References

  • [1] P. Cortes, J. Rodriguez, C. Silva, and A. Flores, “Delay compensation in model predictive current control of a three-phase inverter,” IEEE Transactions on Industrial Electronics, vol. 59, pp. 1323–1325, 2011.
  • [2] Y. Su, K. K. Tan, and T. H. Lee, “Computation delay compensation for real time implementation of robust model predictive control,” Journal of Process Control, vol. 23, no. 9, pp. 1342–1349, 2013.
  • [3] A. Nahidi, A. Khajepour, A. Kasaeizadeh, S.-K. Chen, and B. Litkouhi, “A study on actuator delay compensation using predictive control technique with experimental verification,” Mechatronics, vol. 57, pp. 140–149, 2019.
  • [4] Y. Liao and F. Liao, “Design of preview controller for linear continuous-time systems with input delay,” International Journal of Control, Automation and Systems, vol. 16, no. 3, pp. 1080–1090, 2018.
  • [5] N. E. Kahveci and P. A. Ioannou, “Automatic steering of vehicles subject to actuator saturation and delay,” in 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), 2011, pp. 119–124.
  • [6] A Multi-Stage Optimization Formulation for MPC-Based Obstacle Avoidance in Autonomous Vehicles Using a LIDAR Sensor, ser. Dynamic Systems and Control Conference, vol. Volume 2: Ground and Space Vehicle Dynamics, 10 2014.
  • [7] R. Smith, “Robust model predictive control of constrained linear systems,” in Proceedings of the 2004 American Control Conference, vol. 1, 2004, pp. 245–250 vol.1.
  • [8] Y. Gao, A. Gray, H. E. Tseng, and F. Borrelli, “A tube-based robust nonlinear predictive control approach to semiautonomous ground vehicles,” Vehicle System Dynamics, vol. 52, no. 6, pp. 802–823, 2014.
  • [9] S. Khaitan, Q. Lin, and J. M. Dolan, “Safe planning and control under uncertainty for self-driving,” IEEE Transactions on Vehicular Technology, vol. 70, no. 10, pp. 9826–9837, 2021.
  • [10] S. V. Raković, E. Kerrigan, K. Kouramas, and D. Mayne, Invariant approximations of robustly positively invariant sets for constrained linear discrete-time systems subject to bounded disturbances.   University of Cambridge, Department of Engineering Cambridge, 2004.
  • [11] C. Liu and M. Tomizuka, “Safe exploration: Addressing various uncertainty levels in human robot interactions,” in 2015 American Control Conference (ACC).   IEEE, 2015, pp. 465–470.
  • [12] I. Hashlamon and K. Erbatur, “An improved real-time adaptive kalman filter with recursive noise covariance updating rules,” Turkish Journal of Electrical Engineering and Computer Sciences, 12 2013.
  • [13] K. Myers and B. Tapley, “Adaptive sequential estimation with unknown noise statistics,” IEEE Transactions on Automatic Control, vol. 21, no. 4, pp. 520–523, 1976.
  • [14] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” in 2019 18th European Control Conference (ECC).   IEEE, 2019, pp. 3420–3431.
  • [15] D. Dolgov, S. Thrun, M. Montemerlo, and J. Diebel, “Practical search techniques in path planning for autonomous driving,” Ann Arbor, vol. 1001, no. 48105, pp. 18–80, 2008.
  • [16] M. Werling, J. Ziegler, S. Kammel, and S. Thrun, “Optimal trajectory generation for dynamic street scenarios in a frenet frame,” in 2010 IEEE International Conference on Robotics and Automation.   IEEE, 2010.
  • [17] Ansys corporation, “Vrxperience driving simulator.” [Online]. Available: https://www.ansys.com/en-in/products/av-simulation/ansys-vrxperience-driving-simulator