Pedestrian Trajectory Forecasting Using Deep Ensembles Under Sensing Uncertainty

Anshul Nayak, Azim Eskandarian, Zachary Doerzaph, Prasenjit Ghorai
Autonomous Systems and Intelligent Machines lab, Virginia Tech.

Abstract

One of the fundamental challenges in the prediction of dynamic agents is robustness. Most predictions are deterministic estimates of future states, which are over-confident and prone to error. Recently, a few works have addressed capturing uncertainty when forecasting future states. However, these probabilistic estimation methods fail to account for the upstream noise in perception data during tracking. Sensors always have noise, and state estimation becomes even more difficult under adverse weather conditions and occlusion. Traditionally, Bayes filters have been used to fuse information from noisy sensors to update states with an associated belief. However, they fail to address non-linearities and long-term predictions. Therefore, we propose an end-to-end estimator that takes noisy sensor measurements and makes robust future state predictions with uncertainty bounds while simultaneously accounting for the upstream perceptual uncertainty. In this work, we consider an encoder-decoder based deep ensemble network that captures both perception and predictive uncertainty simultaneously. We compare the model to other approximate Bayesian inference methods. Overall, deep ensembles provided more robust predictions, and the consideration of upstream uncertainty further increased the estimation accuracy of the model.

Index Terms:

Uncertainty quantification, Bayesian Inference, Deep Ensembles, MC Dropout

I INTRODUCTION

Most prediction algorithms output deterministic estimates of future states from raw sensor data [1]. Deterministic predictions are over-confident and prone to error. Therefore, it is important to make probabilistic predictions of future states to improve robustness downstream, especially for uncertainty-aware planning [2][3]. Past research has tried to address the issue by developing probabilistic methods for prediction [4, 5, 6]. One popular approach is to use a Bayesian neural network (BNN) to capture uncertainty in both classification and regression problems [7]. However, exact Bayesian inference is computationally challenging due to the large number of model parameters. Therefore, approximate inference methods such as Monte Carlo (MC) dropout [8] and deep ensembles [9] have been developed, which output probabilistic predictions approximating the posterior without significant changes to the neural network (NN) architecture. A deep ensemble consists of independently trained neural networks, each with a random initialization. Each network outputs probabilistic predictions based on a negative log-likelihood loss function, and the predictions are averaged over all the networks to obtain the predictive mean and variance, assuming a Gaussian posterior distribution. Meanwhile, the MC dropout method uses dropout [10] layers during both training and inference to capture predictive uncertainty. During inference, weights are randomly dropped to generate a distribution of outputs rather than deterministic predictions. The predictive uncertainty accounts for noise in the output data, called aleatoric uncertainty, as well as the variation in predictions by the neural network model, known as epistemic uncertainty [11]. Feng et al. [12] designed an NN architecture for Lidar 3D vehicle detection and localization that captures aleatoric and epistemic uncertainty during predictions.

Figure 1: Model: The Kalman filter module updates the state and covariance at each step. Trajectory Sampling: Conditional Trajectory Sampling propagates an initial state based on system dynamics to generate a trajectory. TS is called recursively to generate a distribution of trajectories.

Although the prediction model provides probabilistic outputs for future states, the model itself considers deterministic states during training and prediction. In other words, the model assumes that the input states observed by the sensors are deterministic and makes predictions based on these deterministic state inputs. However, sensors are inherently noisy and cannot accurately estimate the state of an object. Further, state estimation becomes even more uncertain under adverse weather or occlusion [13][14]. Therefore, treating state estimates as deterministic may not be correct, and capturing the perception uncertainty associated with each state is necessary for robust predictions downstream. The main idea of the current paper is to investigate why incorporating and propagating perceptual uncertainty into the prediction pipeline is necessary. Traditionally, Bayes filters have been used to capture the state and its associated covariance during tracking [15][16]. A simple Bayes filter such as the Kalman filter (KF) recursively updates the state and covariance at each step by fusing raw noisy sensor measurements with the prior computed using a motion model. However, the KF can reliably estimate only the states for which it has measurements, while our problem requires the model to accurately learn and predict the covariance associated with future states. Deep neural networks have been utilized to estimate covariance from raw sensor data [17]. The authors learned a representation of the measurement model by minimizing the loss between ground truth and raw sensor measurements. Similarly, Bertoni et al. [18] captured 3D localization uncertainty during tracking through a loss function based on the Laplace distribution. They used MC dropout and captured both aleatoric and epistemic uncertainty during state localization using monocular RGB images. Recently, Rebecca et al. [19] modelled multivariate uncertainty for regression problems by training an NN end-to-end through a KF. Our model draws inspiration from previous work and has a simple encoder-decoder architecture that learns the KF covariance by minimising the MSE loss between model and ground truth covariance measurements via supervised regression. The learned NN model is able to estimate covariance for future states, capturing perceptual uncertainty. Once the model learns to estimate covariance, we propagate the sensing uncertainty into the prediction pipeline.

We design the current NN model as an end-to-end estimator that can simultaneously predict the perception and prediction uncertainty associated with future states. In the past, end-to-end approaches such as FAF [20] projected Lidar points into a bird’s eye view (BEV) grid and generated predictions by inferring detections multiple steps into the future. Further, PTP [21] unified multi-object tracking (MOT) and prediction under one framework. However, they used generative modeling for the trajectory distribution, which is not particularly efficient at incorporating the propagation of perception uncertainty into prediction. More recently, Pavone et al. [22] showed the importance of propagating state uncertainty and leveraged the idea of penalizing the loss function to encode state uncertainty. Our approach combines end-to-end tracking and prediction while simultaneously estimating the perceptual uncertainty for robust probabilistic trajectory predictions. We designed our model around a loss function that minimises the mean-squared error (MSE) and negative log-likelihood (NLL) losses simultaneously to perform robust end-to-end predictions while estimating perceptual uncertainty. The model learns to perform state covariance estimation by minimising the MSE loss with the KF ground truth covariance. Meanwhile, the predictive uncertainty is captured by minimising the NLL loss using a deep ensemble model. Overall, our end-to-end NN model can take raw sensory inputs with measurement noise and make robust probabilistic predictions of future states downstream without ignoring the upstream perceptual uncertainty.

Contributions. Our key contributions are as follows. We propose a simple end-to-end NN model that can capture both perceptual and predictive uncertainty for robust state prediction. Secondly, we show the essence of using deep ensembles for predictive uncertainty. Finally, we also show how incorporating state uncertainty into prediction pipeline improved overall robustness and compared the deep ensemble model with MC dropout on publicly available pedestrian datasets. Further, we performed offline experiments with the trained model to understand the out-of-distribution prediction accuracy.

TABLE I: NOTATION

$\mathbf{X}$	State
$F:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$	State transition matrix
$z\in\mathbb{R}^{k}$	Raw measurement
$P\in\mathbb{R}^{n\times n}$	Posterior covariance
$K\in\mathbb{R}^{n\times k}$	Kalman gain
$H:\mathbb{R}^{n}\rightarrow\mathbb{R}^{k}$	Observation matrix
$Q\in\mathbb{R}^{n\times n}$	Process noise covariance
$R\in\mathbb{R}^{k\times k}$	Measurement noise covariance

II PROPOSED METHOD

Our method is two-fold. First, the neural network model approximates a Bayes filter to predict the sensing uncertainty at each future state. Second, a prediction model considers the upstream sensing uncertainty and makes robust future state predictions downstream. We use deep ensembles for quantifying predictive uncertainty, while sensing covariance estimation is carried out using a simple Kalman filter.

II-A Covariance Estimation using Bayes Filter

In this section, we present an approach for estimating the state and covariance of a Bayes filter using a neural network (NN). Bayes filters, such as the KF, the Extended Kalman filter, and the Particle filter, are used to account for uncertainty during sensing and localization. These filters estimate the states of a system as well as their associated beliefs, which are updated by fusing information from raw sensor measurements. The belief, which characterizes the state uncertainty, can have either uniform variance across states, known as homoscedastic noise, or state-dependent variance, known as heteroscedastic noise. Capturing this upstream uncertainty in the sensing data is critical for subsequent robust predictions. Therefore, the current objective is to design an NN model that can learn the heteroscedastic measurement covariance of a Bayes filter. We approach the problem of state estimation as a 2D object tracking problem, using the KF.

The KF module consists of two steps: a prediction step and an update step (Figure 1). The prediction step takes the previous state $\mathbf{X_{k-1}}$ and covariance $\mathbf{P_{k-1}}$ and computes the prior distribution based on the constant velocity motion model (1).

{\mathbf{\bar{X}_{k}}}=\mathbf{F}\mathbf{X_{k-1}}+\mathbf{B}u_{k}

\mathbf{\bar{P}_{k}}=\mathbf{F}\mathbf{P_{k-1}}\mathbf{F}^{T}+\mathbf{Q}

(1)

Here, $\mathbf{F}$ is the state transition matrix, and $u_{k}$ is the control input. For the tracking problem, the control input has no significance. Further, $\mathbf{Q}$ represents the process noise covariance.

We model the innovation, $y_{k}$, as the difference between the actual measurement, $z_{k}$, and the predicted measurement of the state,

y_{k}=z_{k}-\mathbf{H}{\mathbf{\bar{X}_{k}}}

which has covariance,

\mathbf{S_{k}}=\mathbf{R}+(\mathbf{H}\mathbf{\bar{P}_{k}}\mathbf{H}^{T})

It is the sum of the measurement noise covariance, $\mathbf{R}$, and the predicted state covariance, $\mathbf{\bar{P}_{k}}$, projected into the measurement space. The Kalman gain is then computed as,

\mathbf{K_{k}}=\mathbf{\bar{P}_{k}}\mathbf{H}^{T}(\mathbf{H}{\mathbf{\bar{P}_{k}}}\mathbf{H}^{T}+\mathbf{R})^{-1}

The state, $\mathbf{{X}_{k}}$ and associated covariance, ${\mathbf{P_{k}}}$ are updated at each step using the Kalman gain, $\mathbf{K_{k}}$ which resembles a weighting factor between predicted (prior) state and actual measurement (likelihood) of the state.

\mathbf{{X}_{k}}=\mathbf{\bar{X}_{k}}+\mathbf{K_{k}}y_{k}

{\mathbf{P_{k}}}=\mathbf{(I-K_{k}H)}\mathbf{\bar{P}_{k}}

(2)

The posterior states, $\mathbf{{X}_{k}},\mathbf{P_{k}}$ (2) are updated recursively at each step by fusing the actual noisy measurement from sensors and predicted measurement using the motion model.
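The predict and update steps of (1) and (2) can be sketched for a small case. The sketch below assumes a 1D constant-velocity state [position, velocity], a position-only measurement (H = [1, 0]), and a time step of 0.4 s (8 steps over 3.2 s). Following Table I, `z` is the raw measurement and `y` the innovation. The function name `kf_step` and the noise values `q` and `r` are illustrative assumptions, not the settings used in the experiments.

```python
def kf_step(x, P, z, dt=0.4, q=0.01, r=0.05):
    """One predict/update cycle of a 1D constant-velocity Kalman filter.

    x = [position, velocity], P = 2x2 covariance as nested lists,
    z = scalar position measurement. F = [[1, dt], [0, 1]], H = [1, 0].
    q and r are assumed process and measurement noise variances.
    """
    # Predict: x_bar = F x, P_bar = F P F^T + Q, with Q = q * I (assumption).
    xb = [x[0] + dt * x[1], x[1]]
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    # Innovation y = z - H x_bar and its covariance S = H P_bar H^T + R.
    y = z - xb[0]
    S = p00 + r
    # Kalman gain K = P_bar H^T S^{-1}.
    K = [p00 / S, p10 / S]
    # Update: x = x_bar + K y, P = (I - K H) P_bar.
    x_new = [xb[0] + K[0] * y, xb[1] + K[1] * y]
    P_new = [[(1 - K[0]) * p00, (1 - K[0]) * p01],
             [p10 - K[1] * p00, p11 - K[1] * p01]]
    return x_new, P_new
```

After one update the position variance falls below both the predicted variance and the measurement variance `r`, which is the fusion behaviour described above.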

II-B Conditional Trajectory Sampling

Usually, trajectories can be randomly sampled from the posterior distribution of states obtained using the KF. However, random sampling generates non-smooth and infeasible trajectories, especially with high sensor noise. To address this infeasibility, we propose conditional trajectory sampling (CTS), where we propagate an initial sampled point using a dynamics model [24] and then resample a new point from the adjacent posterior distribution. This process is repeated recursively until a trajectory is generated (Figure 1). In this way, the generated trajectories adhere to system dynamics and are not random.

The CTS technique is used to sample from the posterior distribution of the KF state covariance at each time step. We randomly sample the initial state $x_{0}$ and then propagate it based on a motion model as a Markovian process $x_{t}\sim P(x_{t}|x_{t-1},a)$ [25]. The prior distribution, $P(x_{t-1})$, is propagated based on some action, $a$. This transition predicts the likelihood of the next state based on the constant-velocity motion model. The process is repeated recursively based on the Markovian dynamics to generate a trajectory. Further, we repeat the trajectory generation process for each bootstrap, i $\in$ {1,…,M}, where M represents the number of trajectories. The ensemble of generated trajectories roughly represents the state uncertainty. For the current research, trajectory sampling can be considered a data augmentation step for training the neural network to capture this perceptual uncertainty. Another benefit of CTS is that each sampled trajectory within the distribution can be fed into an independent NN of an ensemble model to capture the total predictive uncertainty. Details of our implementation are discussed in Sec. II-C and Figure 2.
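The Markovian propagation described above can be illustrated with a minimal 1D sketch. It covers only the sampling of $x_{0}$ and the recursive transition $x_{t}\sim P(x_{t}|x_{t-1},a)$ under a constant-velocity model; it does not reproduce the resampling from the adjacent KF posterior. The function name `sample_trajectories`, the transition noise `step_std`, and the 0.4 s step are assumptions made for illustration.

```python
import random


def sample_trajectories(mean0, std0, velocity, n_steps, M, dt=0.4,
                        step_std=0.02, seed=0):
    """Draw M 1D trajectories by propagating a sampled initial state.

    mean0, std0: posterior mean and standard deviation of the initial state.
    velocity: constant velocity of the motion model (plays the role of a).
    step_std: hypothetical noise of the transition P(x_t | x_{t-1}, a).
    """
    rng = random.Random(seed)
    trajectories = []
    for _ in range(M):
        # Sample the initial state x_0 from the posterior.
        x = rng.gauss(mean0, std0)
        path = [x]
        for _ in range(n_steps - 1):
            # Markov transition: constant-velocity motion plus small noise.
            x = rng.gauss(x + velocity * dt, step_std)
            path.append(x)
        trajectories.append(path)
    return trajectories
```

Because each state depends on the previous one, a sampled path stays smooth, unlike paths built from independent draws at every step.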

II-C Uncertainty Estimation

We treat trajectory prediction as a regression problem. For regression problems with deterministic predictions, the NN outputs a single value, say $\mu(x)$ . It is estimated by minimizing the mean squared error on the training set, MSE = $\frac{1}{N}\sum_{n=1}^{N}(y_{n}-\mu({x_{n}}))^{2}$ . However, the outputs, $\mu(x_{n})$, are point estimates and do not contain any information on the associated uncertainty. To capture the uncertainty, we assume the outputs are sampled from a Gaussian distribution such that the final layer outputs two values: the predicted mean, $\mu({x})$, and variance, $\sigma^{2}(x)$, of the distribution. The variance, $\sigma^{2}(x)$, can be learned by minimizing the Gaussian negative log-likelihood (NLL) loss on training samples with input $x_{n}$ and output $y_{n}$:

-logP(y_{n}|x_{n})=\frac{log\,\sigma^{2}(x_{n})}{2}+\frac{(y_{n}-\mu({x_{n}}))^{2}}{2\sigma^{2}(x_{n})}+constant

(3)

$\sigma^{2}(x)$ represents the observation noise parameter of the model, i.e., the amount of noise present in the model’s outputs. However, the standard NLL depends strongly on the predictive variance, $\sigma^{2}(x)$, and scales down the gradient for poorly predicted points [11]. Hence, an alternative loss function, the $\beta$ -exponentiated negative log-likelihood loss ( $\beta$ -NLL) [23], has been used.

{L_{\beta-NLL}}=-logP(y_{n}|x_{n})\,stop(\sigma^{2\beta})

(4)

$\beta$ controls the dependency of the gradients on the predictive variance, while stop() is the stop-gradient operation, which prevents gradients from flowing through the factor $\sigma^{2\beta}$. $\beta$ = 0 recovers the standard NLL loss. Meanwhile, $\beta$ = 1 completely removes the dependency of the mean gradients on the variance, $\sigma^{2}(x)$, so that the mean is trained as with the standard mean-squared error (MSE). The $\beta$ -NLL loss function therefore gives us the flexibility to interpolate between the NLL and MSE loss functions.
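The effect of $\beta$ can be checked numerically with a scalar sketch of (3) and (4). The function names are hypothetical, and the stop-gradient is represented only by treating the factor `var**beta` as a constant when writing the gradient with respect to the mean; in an autodiff framework that factor would be detached from the graph.

```python
import math


def gaussian_nll(y, mu, var):
    """Gaussian NLL of equation (3), dropping the constant term."""
    return 0.5 * math.log(var) + (y - mu) ** 2 / (2.0 * var)


def beta_nll(y, mu, var, beta):
    """Value of the beta-NLL loss (4): the NLL scaled by var**beta."""
    return (var ** beta) * gaussian_nll(y, mu, var)


def beta_nll_grad_mu(y, mu, var, beta):
    """Gradient of (4) w.r.t. the mean, with var**beta held constant."""
    return (var ** beta) * (mu - y) / var
```

With `beta = 0` the mean gradient is `(mu - y) / var`, so points with large predicted variance are down-weighted. With `beta = 1` it reduces to `mu - y`, the gradient of half the squared error, independent of the variance.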

Deep Ensembles

Deep ensembles are an approximate Bayesian inference method that can capture predictive uncertainty during forecasting. The method is simple and scalable compared to Bayesian NNs. As the name suggests, an ensemble consists of several NNs that differ from one another due to random initialization. Let M denote the number of NNs within the ensemble. Then, $\mu_{i}(x)$ and $\sigma^{2}_{i}(x)$ represent the mean and variance predicted by a single NN indexed i $\in$ {1,…,M}. Lakshminarayanan et al. [9] treated the ensemble as a uniformly weighted mixture model and combined the predictions into a single Gaussian mixture distribution $p(y|x)$ using:

p(y|x)=M^{-1}\sum_{i=1}^{M}\mathcal{N}(y;\mu_{i}(x),\sigma^{2}_{i}(x))

(5)

And for ease of estimating predictive probabilities, they further approximated the ensemble prediction as a Gaussian whose mean and variance correspond to the respective mean and variance of the mixture model.

\mu_{*}(x)=M^{-1}\sum_{i}{}\mu_{i}(x)

(6)

\sigma_{*}^{2}(x)=M^{-1}\sum_{i}{}(\sigma^{2}_{i}(x)+\mu^{2}_{i}(x))-\mu^{2}_{*}(x)

(7)
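Equations (6) and (7) amount to moment matching of the mixture, which the following sketch computes for scalar predictions. The function name `combine_ensemble` and the example numbers in the usage note are illustrative only.

```python
def combine_ensemble(means, variances):
    """Mean and variance of a uniformly weighted Gaussian mixture, (6)-(7)."""
    M = len(means)
    # Equation (6): average of the member means.
    mu_star = sum(means) / M
    # Equation (7): average second moment minus the squared mixture mean.
    second_moment = sum(v + m * m for m, v in zip(means, variances)) / M
    var_star = second_moment - mu_star ** 2
    return mu_star, var_star
```

For three members with means 1, 2, 3 and a common variance of 0.5, the combined mean is 2 and the combined variance is 0.5 + 2/3, larger than any single member's variance because the members disagree on the mean.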

Dropout as Bayesian approximation

We compare the performance of other approximate Bayesian inference methods used for uncertainty quantification with deep ensembles [9]. One such approach focuses on dropout [10] to capture the total predictive uncertainty. The key notion is to randomly drop weights during both training and inference. Concisely, we can formulate the MC Dropout algorithm as,

	for b = 1:B
		$\displaystyle e_{(b)}^{*}=\textit{VariationalDropout}(g(x^{*}),p)$
		$\displaystyle \hat{y}_{(b)}^{*}=\textit{Dropout}(h(e_{(b)}^{*}),p)$
	end for

Provided the input data $x^{*}$ , an encoder-decoder network $g(.)$ , prediction network $h(.)$ , dropout probability $p$ , and number of iterations B, we train the encoder-decoder model, $e=g(.)$ with dropout, $p$ . Further, during inference, for the same input, $x^{*}$ , the prediction network, $h(.)$ is inferred by randomly dropping weights to generate a distribution of B outputs. [8] showed the output distribution approximates a Bayesian NN without the additional complexity. The mean and variance of the predicted samples are presented below.

\hat{y}_{mc}^{*}=\frac{1}{B}\sum_{b=1}^{B}\hat{y}_{(b)}^{*}

\eta_{1}^{2}=\frac{1}{B}\sum_{b=1}^{B}(\hat{y}_{(b)}^{*}-\hat{y}_{mc}^{*})^{2}

(8)
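The sampling loop and the statistics in (8) can be sketched without a deep learning framework. The example below collapses the encoder $g(.)$ and prediction network $h(.)$ into a single linear layer with inverted dropout on its inputs, which is a simplifying assumption; the function name `mc_dropout_predict` and the toy weights are hypothetical.

```python
import random


def mc_dropout_predict(x, weights, p=0.5, B=100, seed=0):
    """Mean and variance of B stochastic passes through a linear layer.

    Each pass keeps an input with probability 1 - p and rescales it by
    1 / (1 - p) (inverted dropout), so dropout stays active at inference.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(B):
        y = 0.0
        for xi, wi in zip(x, weights):
            if rng.random() >= p:  # this unit is kept
                y += wi * xi / (1.0 - p)
        outputs.append(y)
    # Equation (8): sample mean and variance over the B passes.
    mean = sum(outputs) / B
    var = sum((y - mean) ** 2 for y in outputs) / B
    return mean, var
```

The spread of the B outputs plays the role of $\eta_{1}^{2}$; with dropout disabled every pass would return the same value and the variance would be zero.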

II-D Uncertainty Disentanglement

Total predictive variance (7) can be disentangled into aleatoric uncertainty, associated with the inherent noise of the data, and epistemic uncertainty accounting for uncertainty in model predictions[11].

\displaystyle\begin{split}\sigma_{*}^{2}(x)&=M^{-1}\sum_{i}\sigma^{2}_{i}(x)+M^{-1}\sum_{i}\mu^{2}_{i}(x)-\mu^{2}_{*}(x)\\&=\mathbb{E}_{i}[\sigma^{2}_{i}(x)]+\mathbb{E}_{i}[\mu^{2}_{i}(x)]-\mathbb{E}_{i}[\mu_{i}(x)]^{2}\\&=\underbrace{\mathbb{E}_{i}[\sigma^{2}_{i}(x)]}_{\text{Aleatoric uncertainty}}+\underbrace{\mathrm{Var}_{i}[\mu_{i}(x)]}_{\text{Epistemic uncertainty}}\end{split}

(9)

Equation (9) shows that, across the ensemble members, the mean of the variances represents aleatoric uncertainty, while the variance of the means represents epistemic uncertainty. The predictive variance, $\sigma_{i}^{2}(x)$ , is obtained using the Gaussian NLL loss (4). However, predictive uncertainty only accounts for the data and model uncertainty during future trajectory prediction and carries no information about the upstream perceptual uncertainty obtained using the KF. In order to capture the perceptual uncertainty, the NN is trained on augmented trajectory samples that take both the state and the associated covariance as inputs. The perceptual uncertainty is then estimated by minimising the MSE loss between the actual covariance obtained using the KF and the covariance predicted by the NN. Details of the method and results are given in Section IV-C.
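The split in (9) is straightforward to compute from the per-network outputs. The sketch below assumes scalar means and variances; the function name `disentangle` is hypothetical.

```python
def disentangle(means, variances):
    """Split the ensemble variance of equation (9) into its two parts."""
    M = len(means)
    mu_star = sum(means) / M
    # Aleatoric part: mean of the member variances, E_i[sigma_i^2].
    aleatoric = sum(variances) / M
    # Epistemic part: variance of the member means, Var_i[mu_i].
    epistemic = sum((m - mu_star) ** 2 for m in means) / M
    return aleatoric, epistemic
```

When all members predict the same mean, the epistemic term is exactly zero and the total variance reduces to the aleatoric term.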

Algorithm

function Kalman($\mathbf{X_{k-1}},\mathbf{P_{k-1}},\mathbf{R},\mathbf{Q}$) $\triangleright$ Where $\mathbf{X_{k-1}}$ - state, $\mathbf{P_{k-1}}$ - cov, $\mathbf{R}$ - measurement noise, $\mathbf{Q}$ - process noise
	for $k=1$ to $N$
		$x_{k}=Fx_{k-1}$ $\triangleright$ Predict Step
		$P_{k}=FP_{k-1}F^{T}+Q$
		$S=HP_{k}H^{T}+R$
		$K=P_{k}H^{T}S^{-1}$
		$y_{k}=z_{k}-Hx_{k}$ $\triangleright$ Innovation
		$x_{k}=x_{k}+Ky_{k}$
		$P_{k}=P_{k}-KHP_{k}$ $\triangleright$ Update Step
	end for
end function

for $i=1$ to $M$ $\triangleright$ M samples for M ensembles
	function Trajectory Sampling($\mathbf{X_{k}},\mathbf{P_{k}}$)
		$x_{sample}=\mathrm{MultivariateNormal}(\mathbf{X_{k}},\mathbf{P_{k}})$
	end function
end for

function Model(input = $\mathbf{[X_{k},\Sigma_{k}]^{T}}$, target = $\mathbf{[y_{k},\Sigma_{k}^{y}]^{T}}$, num epochs, batch, M) $\triangleright$ End-to-End Training Model
	for $epoch=1$ to $num\,epochs$
		$\mathbf{[\hat{y}_{k},\hat{\Sigma}_{k}^{s},\hat{\Sigma}_{k}^{p}]}=\mathrm{Model}(\mathbf{[X_{k},\Sigma_{k}]^{T}})$ $\triangleright$ Outputs
		$MSE=\lVert\mathbf{\hat{\Sigma}_{k}^{s}-\Sigma_{k}^{y}}\rVert^{2}$
		$NLL=\dfrac{\lVert\mathbf{y_{k}-\hat{y}_{k}}\rVert^{2}}{2\mathbf{\hat{\Sigma}_{k}^{p}}}+\dfrac{\log(\mathbf{\hat{\Sigma}_{k}^{p}})}{2}$
	end for
end function
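The two loss terms of the Model function can be written out for scalar variances as a rough sketch. The text does not state how the MSE and NLL terms are weighted against each other, so the unweighted sum below is an assumption, as is the function name `joint_loss`.

```python
import math


def joint_loss(y, y_hat, var_pred, var_sens_hat, var_kf):
    """Unweighted sum of the covariance MSE and the prediction NLL.

    y, y_hat: ground-truth and predicted positions (lists of floats).
    var_pred: predicted predictive variances (must be positive).
    var_sens_hat, var_kf: predicted and Kalman-filter sensing variances.
    """
    n = len(y)
    # MSE between predicted sensing variance and the KF ground truth.
    mse = sum((s - k) ** 2 for s, k in zip(var_sens_hat, var_kf)) / n
    # Gaussian NLL of the predicted positions, as in equation (3).
    nll = sum(0.5 * math.log(v) + (t - p) ** 2 / (2.0 * v)
              for t, p, v in zip(y, y_hat, var_pred)) / n
    return mse + nll
```

A perfect prediction with unit predictive variance and an exact sensing variance gives a loss of zero under this formulation.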

III Experiments

In this section, we discuss the datasets, data augmentation, implementation details for each network, and the performance metrics. Following common practice in the literature [27], we trained our models on publicly available pedestrian datasets. The two most popular datasets are the ETH dataset [29], which contains the ETH and HOTEL scenes, and the UCY dataset [30], which contains the UNIV, ZARA1 and ZARA2 scenes. For comparability with past works [28], we used 8 historical steps (3.2 s) to predict 12 steps (4.8 s) into the future.

III-A Data Augmentation

Initially, we trained our model only on the ETH dataset, which contains approximately 420 pedestrian trajectories under varied crowd settings. However, such a small number of trajectories is insufficient for training. Therefore, we performed data augmentation using Takens' embedding theorem [31]. We used a sliding window with a stride of T = 1 step to generate multiple short trajectories out of a single long trajectory. For instance, a pedestrian’s trajectory of 29 steps results in 10 short $\{x,y\}$ trajectory pairs of 20 steps each when 8 past steps are used to predict 12 steps into the future. In total, we constructed 1597 multivariate time series sequences, which we split into 1260 training and 337 testing sequences for the ETH hotel dataset. Further, each trajectory was augmented using the KF to generate the posterior state and covariance distribution. Then, M trajectories were sampled from the distribution for each original trajectory. Details of data augmentation using the KF and TS are discussed in Sec. II.
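The sliding-window step can be sketched as follows. The function name `sliding_windows` and its parameter names are hypothetical; the defaults follow the 8 observed and 12 predicted steps used here.

```python
def sliding_windows(trajectory, n_obs=8, n_pred=12, stride=1):
    """Split one trajectory into (observed, future) pairs."""
    window = n_obs + n_pred
    pairs = []
    # Slide a fixed-length window over the trajectory, one stride at a time.
    for start in range(0, len(trajectory) - window + 1, stride):
        chunk = trajectory[start:start + window]
        pairs.append((chunk[:n_obs], chunk[n_obs:]))
    return pairs
```

A 29-step trajectory yields 29 - 20 + 1 = 10 pairs, matching the example in the text.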

III-B Implementation details

The encoder-decoder neural network was trained end-to-end using PyTorch. The Adam optimizer with a learning rate of $1e-3$ was used to minimize the MSE and NLL losses. The MSE loss was minimized to estimate the covariance of the KF, while the NLL loss was minimized to capture the predictive uncertainty. Each model was trained for 150 epochs with a batch size of 64. For the ensemble model, M=3 networks were considered, while for MC dropout, a single model with dropout probability p = 0.5 was used based on our previous research [4]. The model was fit on the training data, and the test data was used for predictions.

III-C Performance Metrics

The trained model is then used to predict the distribution for pedestrian future states. Overall, the predictions are averaged to generate the mean predicted path along with the associated variance that quantifies uncertainty. We adopt the widely used performance metrics [27] namely average displacement error (ADE) and final displacement error (FDE) for prediction comparison between the ground truth and mean predicted path. Further, we define valid prediction intervals for regression problems based on performance metrics like prediction interval coverage probability (PICP) and mean prediction interval width (MPIW) [26].

(a) Prediction Interval Coverage Probability (PICP): Coverage probability for a single state shows whether the ground truth state, $\mathbf{y_{k}}$ lies within the predicted covariance ellipse $\Gamma(X_{k})$ for the state $\mathbf{X_{k}}$ ,

\displaystyle\mathcal{C}(\Gamma)\approx\frac{1}{|\mathcal{D^{*}}|}\sum_{(x,y\in\mathcal{D^{*}})}\mathbbm{1}(y_{k}\in\Gamma(X_{k}))

(10)

$\mathbbm{1}$ denotes an indicator function representing Boolean values.

(b) Mean Prediction Interval Width (MPIW): It refers to the average width of the confidence interval. For the current results, we consider MPIW as the average of the major and minor axes of the predicted covariance ellipse.

\displaystyle\mathcal{W}(\Gamma)\approx\frac{1}{|\mathcal{D^{*}}|}\sum_{(x,y\in\mathcal{D^{*}})}{(|u(x)-l(x)|)}

(11)

$u(x)$ and $l(x)$ refer to the upper and lower bounds of the prediction interval.
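For a 2D predicted covariance, both metrics can be sketched with closed-form 2x2 algebra. The sketch assumes that a point is covered when its squared Mahalanobis distance is at most $k^{2}$ for a $k\sigma$ ellipse, and that the interval widths are the full lengths of the ellipse axes. All function names are hypothetical.

```python
import math


def inside_ellipse(y, mean, cov, k=1.0):
    """True if point y lies within the k-sigma ellipse of N(mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = y[0] - mean[0], y[1] - mean[1]
    # Squared Mahalanobis distance using the closed-form 2x2 inverse.
    m2 = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return m2 <= k * k


def ellipse_widths(cov, k=1.0):
    """Full lengths of the major and minor axes of the k-sigma ellipse."""
    (a, b), (_, d) = cov
    half_trace = 0.5 * (a + d)
    root = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    # Eigenvalues of the symmetric covariance matrix.
    lam_max, lam_min = half_trace + root, max(half_trace - root, 0.0)
    return 2.0 * k * math.sqrt(lam_max), 2.0 * k * math.sqrt(lam_min)


def picp_mpiw(truths, means, covs, k=1.0):
    """Coverage (10) and mean major/minor axis widths (11) over a test set."""
    n = len(truths)
    hits = sum(inside_ellipse(y, m, c, k)
               for y, m, c in zip(truths, means, covs))
    widths = [ellipse_widths(c, k) for c in covs]
    mpiw_major = sum(w[0] for w in widths) / n
    mpiw_minor = sum(w[1] for w in widths) / n
    return hits / n, mpiw_major, mpiw_minor
```

A useful model has high coverage at small widths; coverage alone can be inflated simply by predicting very large ellipses.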

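(c) Average Displacement Error (ADE): Mean Euclidean distance between the predicted and true states over the T predicted time steps.

\displaystyle\begin{split}\text{ADE}&=\frac{1}{T}\sum_{t=t_{0}}^{t_{f}}||{\mathbf{\hat{y}}_{(t)}-{\mathbf{y}}_{(t)}}||\end{split}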

(12)

(d) Final Displacement Error (FDE): Euclidean distance between the predicted and true final state across all trajectories.

\displaystyle\begin{split}\text{FDE}&=||{\mathbf{\hat{y}}_{(t_{f})}-{\mathbf{y}}_{(t_{f})}}||\end{split}

(13)

where $\mathbf{\hat{y}}_{t}$ is the predicted location at timestamp t and $\mathbf{y}_{t}$ is the ground truth position.
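Equations (12) and (13) for a single trajectory can be computed as in the sketch below, where positions are (x, y) tuples. The function name `ade_fde` is hypothetical; averaging over a test set would apply it per trajectory and take the mean.

```python
import math


def ade_fde(pred, truth):
    """ADE (12) and FDE (13) for one trajectory of (x, y) positions."""
    # Euclidean distance between prediction and ground truth at each step.
    dists = [math.hypot(p[0] - t[0], p[1] - t[1])
             for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]
```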

IV Results

IV-A Why Ensemble?

For quantifying uncertainty, deep ensembles average predictions over an ensemble of independently trained networks. In the current scenario, each network is trained using the Gaussian negative log-likelihood (NLL) loss function (4) such that the network outputs probabilistic predictions with both mean ( $\mu$ ) and variance ( $\sigma^{2}$ ). Since a single network can generate probabilistic outputs when trained with the NLL loss function, why average the predictions over an ensemble of M networks? To answer this question, we observe how the NLL and test MSE scale with the number of independent networks (M) within an ensemble. The losses were evaluated on the ETH [29] dataset for pedestrians.

TABLE II: Scalability of NLL (nats) and MSE with number of networks (M) within an ensemble

M	NLL	MSE
1	-0.335	0.214
2	-0.362	0.205
3	-0.377	0.205
4	-0.371	0.208
5	-0.379	0.200

Each neural network within the ensemble was randomly initialized at the beginning of training. Additionally, for each network, a training trajectory was randomly sampled as input from the distribution of trajectories. Training was performed, and the final NLL loss was averaged over the M networks within the ensemble. Test MSE was evaluated on a set of test trajectories different from the training data after the model was fully trained (Table II). The results indicate that an ensemble of five networks had the lowest NLL as well as the lowest test MSE. The lower NLL shows that an ensemble captures predictive uncertainty better than a single network. Further, the lower test MSE shows that the predictions of an ensemble are closer to the ground truth than those of a single network. Overall, every ensemble with $M\geq 2$ produced better NLL and MSE than a single network.

IV-B Predictions: single vs Ensemble

In this section, we compare the predictive uncertainty of a single network with an ensemble of three networks (M=3) on a pedestrian trajectory from the ETH dataset. Figure 3 shows the predictive uncertainty. The model takes 8 input states ( $\bullet$ , green dot) to predict 12 states into the future. $\blacktriangle$ represents the actual ground truth trajectory of the pedestrian. Further, the plot shows the mean predicted path ( $\blacklozenge$ , blue diamond) along with the $1\sigma$ covariance ellipse to quantify uncertainty during prediction.

Figure 3b shows the predictive uncertainty for a single network, which fails to generate accurate prediction interval with respect to the ground truth. A significant portion of the ground truth trajectory lies outside of the $1\sigma$ predictive covariance. On the other hand, the ensemble network (Figure 3a) produced better predictive uncertainty and mean path by averaging the mean and variance of predictions over an ensemble of networks. The plot shows that the ground truth completely lies within the confidence interval at each time step. Further, the ADE/FDE for the ensemble network (0.618/1.137) was significantly lower compared to the ADE/FDE for a single network (0.704/1.394).

Figure 3a shows the total predictive uncertainty, which is the combination of aleatoric and epistemic uncertainty. The aleatoric uncertainty represents the inherent noise in the data, while the epistemic uncertainty arises from the variation in model predictions. Since the test data is sampled from the same distribution as the training data, the model uncertainty, highlighted in yellow, is negligible compared to the aleatoric uncertainty. In contrast, a single network provides no estimate of model uncertainty, so its total predictive uncertainty equals the aleatoric uncertainty. Thus, the ensemble network is better suited to handle epistemic uncertainty, which is critical for robust real-world applications.

Further, we compare the performance metrics, coverage probability PICP (10) and prediction interval width MPIW (11), for different ensemble sizes on the ETH dataset (Figure 4). Results show that ensembles with M $\in\{3,5\}$ networks provide better predictive uncertainty estimates than a single network, with an average coverage probability of $\approx 63\%$ compared to $45\%$ . In addition, the results reveal that the MPIW of an ensemble is either comparable to or less than that of a single network, indicating that even with a smaller confidence interval, the ensemble model achieves a higher coverage probability for the predictions. We denote the average width of the major and minor axes as $\mathrm{MPIW_{x}}$ and $\mathrm{MPIW_{y}}$ respectively.

IV-C Incorporating Perception uncertainty

Previous studies have focused on capturing predictive uncertainty while neglecting upstream state uncertainty during perception. Incorporating perception or state uncertainty into the prediction pipeline remains a challenge, as it is unclear how the total predictive uncertainty will be affected. To address this challenge, we propose incorporating and propagating state uncertainty by including the associated state covariance, $\bar{P_{k}}$ , obtained at each step from the KF. We append the heteroscedastic noise associated with each state to the state, $\mathbf{X_{k}}$ , and train them together. In section IV-B, the state $\mathbf{X_{k}}$ = $[x,y]$ contained only the positions sampled from the posterior distribution of the state covariance using the KF. We neglected the velocity, [u,v], in the states for training, as no significant improvement was observed with its inclusion. In the current scenario, we append the states and the associated covariance together as $\mathbf{[X_{k},\Sigma_{k}]^{T}}$ and train them jointly. The outputs are $\mathbf{[\hat{y}_{k},\hat{\Sigma}_{k}^{s},\hat{\Sigma}_{k}^{p}]^{T}}$ . $\mathbf{\hat{\Sigma}_{k}^{s}}$ corresponds to the estimated state covariance, which is trained by minimizing the MSE loss with the actual covariance, $\mathbf{\Sigma_{k}}$ , obtained using the KF. This enables the NN to capture the upstream perceptual uncertainty. Meanwhile, $\mathbf{\hat{\Sigma}_{k}^{p}}$ corresponds to the predictive covariance estimated by the NN and was obtained by minimising the NLL loss for the state $\mathbf{X_{k}}$ . Overall, our method enables us to estimate the state uncertainty and incorporate it into the prediction pipeline, which can help improve the total uncertainty estimation.

IV-C1 Perception uncertainty

Sensing uncertainty during state estimation of a dynamic object can be obtained recursively from sensor measurements using the KF. However, estimating perceptual uncertainty for future states over a long prediction horizon using the KF can be challenging. Further, the uncertainty represented by the covariance will grow continuously based on the motion model. To address these issues, a neural network (NN) model is trained to learn the KF for estimating perceptual uncertainty at any future state. We perform domain randomization on the training trajectories by varying the measurement covariance, $\mathbf{R}\in(2\%,20\%)$ , to generate trajectories across a wide range of sensor noise. This makes the NN more robust in estimating the sensing uncertainty.

To achieve this, we apply the KF to both the input and the ground truth of every trajectory to obtain the estimated state and covariance. The resulting states and covariances are then concatenated and trained together using an encoder-decoder network. The network minimizes the mean squared loss between the KF ground truth covariance, $\mathbf{\Sigma_{k}}$ , and the predicted covariance, $\mathbf{\hat{\Sigma}_{k}^{s}}$ . Figure 5 depicts the outputs of the neural network for $\mathbf{R}=5\%$ , which closely resemble the covariance predictions of the KF. The NN takes 8 states with the associated covariance obtained using the KF as input and predicts the 12 future states, as well as the estimated state covariance due to sensing uncertainty. Our results demonstrate that the $2\sigma$ confidence interval predicted using the neural network closely matches the state uncertainty on the ground truth obtained using the KF. This capability allows the NN to estimate the state covariance associated with perception uncertainty at each future state, which can then be integrated with the prediction uncertainty to enhance the system’s robustness.
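As a rough illustration of how filtered states and covariances could be produced for one trajectory, the sketch below runs a constant-velocity Kalman filter over noisy 2D positions. The motion model, the process noise `q`, the noise level `r_std`, the synthetic trajectory, and the function name `kf_track` are assumptions made for this example and are not the configuration used in this paper.

```python
import numpy as np

def kf_track(zs, dt=0.4, r_std=0.05, q=1e-3):
    """Run a constant-velocity KF over 2D position measurements zs of shape (T, 2).

    Returns filtered positions (T, 2) and position covariances (T, 2, 2).
    """
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt            # state is [x, y, u, v]
    H = np.zeros((2, 4))
    H[0, 0] = H[1, 1] = 1.0           # only the position is measured
    Q = q * np.eye(4)                 # assumed process noise
    R = (r_std ** 2) * np.eye(2)      # assumed measurement noise covariance
    x = np.array([zs[0, 0], zs[0, 1], 0.0, 0.0])
    P = np.eye(4)
    xs, Ps = [], []
    for z in zs:
        # Predict step: propagate the state and covariance through the motion model
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: correct the prediction with the new measurement
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        xs.append(x[:2].copy())
        Ps.append(P[:2, :2].copy())
    return np.array(xs), np.array(Ps)

rng = np.random.default_rng(0)
t = np.arange(20) * 0.4
truth = np.stack([1.2 * t, 0.3 * t], axis=1)        # synthetic straight walk
noisy = truth + rng.normal(0.0, 0.05, truth.shape)  # simulated sensor noise
states, covs = kf_track(noisy)
```

The pairs `(states[k], covs[k])` play the role of $\mathbf{[X_{k},\Sigma_{k}]^{T}}$ in the text: the first 8 would form the network input and the remaining 12 the regression targets for $\mathbf{\hat{\Sigma}_{k}^{s}}$ .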

IV-C2 Prediction uncertainty

Unlike perception uncertainty which accounts for noise in the process or measurement during sensing, prediction uncertainty captures the unpredictability associated with future states. In Figure 6, we show the total predictive uncertainty for the same trajectory as before. For capturing predictive uncertainty, we train the network using NLL loss and the NN outputs both the mean, $\mu_{k}$ and covariance, $\mathbf{\hat{\Sigma}_{k}^{p}}$ for the predicted distribution. We treat $\mathbf{\hat{\Sigma}_{k}^{p}}=[\hat{\Sigma}_{xx},\hat{\Sigma}_{xy},\hat{\Sigma}_{yx},\hat{\Sigma}_{yy}]$ as the full state covariance of a bivariate distribution.
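For reference, the NLL of a bivariate Gaussian can be sketched as follows. This is a minimal NumPy version for a single state, assuming the four covariance outputs have already been constrained to form a symmetric positive-definite matrix (how the network enforces this is not covered here). The names `cov_from_flat` and `bivariate_nll` are hypothetical.

```python
import numpy as np

def cov_from_flat(flat):
    """Reshape [S_xx, S_xy, S_yx, S_yy] into a 2x2 covariance matrix."""
    return np.asarray(flat, dtype=float).reshape(2, 2)

def bivariate_nll(y, mu, cov):
    """Negative log-likelihood of the 2D point y under N(mu, cov)."""
    diff = np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    maha = diff @ np.linalg.solve(cov, diff)    # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)          # log-determinant of the covariance
    return 0.5 * (maha + logdet + 2.0 * np.log(2.0 * np.pi))

cov = cov_from_flat([1.0, 0.0, 0.0, 1.0])
nll_at_mean = bivariate_nll([0.0, 0.0], [0.0, 0.0], cov)
nll_off_mean = bivariate_nll([1.0, 1.0], [0.0, 0.0], cov)
```

The loss grows both when the predicted mean is far from the target and when the predicted covariance is needlessly large, which is what lets the network learn a state-dependent covariance.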

We generate results for an ensemble of 3 networks and average the predicted distributions to obtain the mean predicted path and uncertainty at each state. The average ADE/FDE of the mean predicted path across all test trajectories is 0.64/1.08. For the ensemble network, the mean of the predicted variances, $\mathbf{\hat{\Sigma}_{k}^{p}}$ , represents the aleatoric uncertainty across the ensemble. Meanwhile, the variance of the predicted means represents the "model" or epistemic uncertainty. As the test samples are from the same distribution as the training samples, the variation in model predictions is insignificant, and thus the epistemic uncertainty is negligible too. Further, the predictive uncertainty around each state is significantly larger than the predicted sensing uncertainty.
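The decomposition described above can be sketched for a single future state as follows. The function name `combine_ensemble` and the toy numbers are illustrative assumptions; the ensemble members are weighted equally.

```python
import numpy as np

def combine_ensemble(means, covs):
    """Combine M Gaussian predictions for one state.

    means: (M, 2) predicted means, covs: (M, 2, 2) predicted covariances.
    Returns the ensemble mean, aleatoric, epistemic and total covariance.
    """
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mu = means.mean(axis=0)                 # mean predicted state
    aleatoric = covs.mean(axis=0)           # mean of the predicted covariances
    dev = means - mu
    epistemic = dev.T @ dev / len(means)    # covariance of the predicted means
    return mu, aleatoric, epistemic, aleatoric + epistemic

means = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]   # toy outputs of M = 3 networks
covs = [np.eye(2)] * 3
mu, aleatoric, epistemic, total = combine_ensemble(means, covs)
```

When the members agree closely, as reported for in-distribution test samples, `epistemic` shrinks towards zero and the total is dominated by the aleatoric term.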

IV-C3 State and Prediction Uncertainty

Why incorporate sensing uncertainty into the prediction pipeline?

The primary objective of this paper is to design an end-to-end estimator that can effectively estimate state uncertainty by taking noisy sensor measurements and propagate the state uncertainty into the future predicted states. This approach enables the neural network to make precise future state predictions while remaining robust to upstream uncertainty. Mathematically, we formulate total uncertainty as the combination of sensing and predictive uncertainty.

Assume that the upstream state uncertainty due to noisy measurements is represented by the covariance ellipse, E1.

E1=\{x_{1}{\in\mathbb{R}^{2}}:\,(x_{1}-\mu_{k})^{T}(\mathbf{\hat{\Sigma}_{k}^{s}})^{-1}(x_{1}-\mu_{k})\leq 1\}

Similarly, at each time step, a state can be randomly sampled from the covariance ellipse, E2, representing prediction uncertainty:

E2=\{x_{2}{\in\mathbb{R}^{2}}:\,(x_{2}-\mu_{k})^{T}(\mathbf{\hat{\Sigma}_{k}^{p}})^{-1}(x_{2}-\mu_{k})\leq 1\}

Covariance ellipses E1 and E2 represent the convex sets of all reachable states during perception and prediction, respectively. Any state $x_{2}$ sampled from the predictive uncertainty, $\mathrm{E2}$ (defined by $\mathbf{\hat{\Sigma}_{k}^{p}}$ ), represents a possible future state of the tracked object. However, the sampled state carries no information about the upstream uncertainty, $\mathrm{E1}$ (defined by $\mathbf{\hat{\Sigma}_{k}^{s}}$ ). Therefore, perceptual uncertainty can be incorporated into the prediction uncertainty as the Minkowski addition of the closed convex sets $\mathrm{E1}$ and $\mathrm{E2}$ centred around the origin (14).

E=\{x_{1}\oplus x_{2}:\,x_{1}\in E1,\,x_{2}\in E2\}

(14)

Here $\oplus$ denotes the vector addition. Further, we translate the summed covariance ellipse representing total uncertainty to the mean predicted state, $\mu_{k}$ .

E^{\prime}=\{x+\mu_{k}:\,x\in E\}

Overall, $\mathrm{E^{\prime}}$ represents the total reachable set of states for the end-to-end estimator at any instance.
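One way to trace the boundary of $\mathrm{E^{\prime}}$ numerically is through support points: for each direction $d$, the farthest point of an origin-centred ellipse with covariance $\Sigma$ is $\Sigma d/\sqrt{d^{T}\Sigma d}$, and the farthest point of a Minkowski sum is the sum of the farthest points of its summands. The sketch below uses this construction; the function names and the example covariances are assumptions for illustration.

```python
import numpy as np

def ellipse_support_point(cov, d):
    """Point of {x : x^T cov^{-1} x <= 1} that lies farthest along direction d."""
    return cov @ d / np.sqrt(d @ cov @ d)

def minkowski_boundary(cov_s, cov_p, mu, n=90):
    """Boundary points of E' = mu + (E1 (+) E2), one per sampled direction."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    pts = []
    for a in angles:
        d = np.array([np.cos(a), np.sin(a)])
        p = ellipse_support_point(cov_s, d) + ellipse_support_point(cov_p, d)
        pts.append(mu + p)                  # translate to the mean predicted state
    return np.array(pts)

cov_s = np.diag([0.04, 0.01])   # sensing covariance (semi-axes 0.2 and 0.1)
cov_p = np.diag([1.0, 0.25])    # prediction covariance (semi-axes 1.0 and 0.5)
mu = np.array([2.0, 3.0])
pts = minkowski_boundary(cov_s, cov_p, mu, n=4)
```

Note that the Minkowski sum of two ellipses is convex but, in general, not itself an ellipse, so a sampled boundary such as `pts` is a convenient representation for plotting or containment checks.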

By including state uncertainty, the total uncertainty now becomes the Minkowski sum of state and prediction uncertainty, as shown in Figure 8. We evaluated the coverage probability for an ensemble of three networks; with predictive uncertainty alone, the coverage probability was 0.63 (Figure 4). This predictive uncertainty was based on deterministic state inputs without considering any sensing uncertainty. However, when we accounted for measurement noise, $\mathrm{R}=5\%$ , for the states and augmented the data using the KF for training, the coverage probability improved from 0.63 to 0.80, a gain of about 17 percentage points. This demonstrates the effectiveness of our proposed approach in capturing both state and predictive uncertainty for accurate and robust future predictions.
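A coverage probability of this kind can be computed in several ways; one common choice, sketched below, counts the fraction of ground-truth states whose squared Mahalanobis distance to the predicted mean is at most $k^{2}$, i.e. that fall inside the $k\sigma$ covariance ellipse. This is an illustrative assumption rather than the exact definition used in section III-C, and it checks a single covariance per state instead of the Minkowski sum. The name `ellipse_coverage` is hypothetical.

```python
import numpy as np

def ellipse_coverage(truth, means, covs, k=1.0):
    """Fraction of truth points inside the k-sigma ellipse of each prediction.

    truth, means: (N, 2) arrays; covs: (N, 2, 2) array of covariances.
    """
    truth = np.asarray(truth, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    diff = truth - means
    # Solve cov @ s = diff for every state, then form diff^T cov^{-1} diff
    sol = np.linalg.solve(covs, diff[..., None])[..., 0]
    maha2 = np.sum(diff * sol, axis=-1)
    return float(np.mean(maha2 <= k ** 2))

truth = [[0.5, 0.0], [3.0, 0.0], [0.0, 0.9], [0.0, 1.1]]
means = np.zeros((4, 2))
covs = np.stack([np.eye(2)] * 4)
cov_1sigma = ellipse_coverage(truth, means, covs, k=1.0)
cov_2sigma = ellipse_coverage(truth, means, covs, k=2.0)
```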

IV-D UQ Methods

In this section, we compare the prediction efficacy of deep ensembles with Monte Carlo (MC) dropout, which is an approximate Bayesian inference method. MC dropout leverages the idea of training the NN using dropout layers and then performing inference at test time by randomly dropping weights. This generates a distribution of varying outputs instead of a single deterministic prediction. As with ensembles, the mean and variance of the output distribution can be computed to obtain the mean predicted path and quantify uncertainty. Details of the method have been described in section II-C.
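The test-time sampling loop can be sketched with a toy two-layer network in NumPy. The weights, layer sizes, and the name `mc_dropout_predict` are placeholders rather than the encoder-decoder used in this work, and the sketch applies standard (inverted) dropout to hidden units, which is one common way to realise the random dropping described above.

```python
import numpy as np

def mc_dropout_predict(x, W1, b1, W2, b2, p=0.5, n_samples=200, seed=0):
    """Mean and variance of n_samples stochastic forward passes with dropout."""
    rng = np.random.default_rng(seed)
    outs = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ W1 + b1)      # hidden layer with ReLU
        mask = rng.random(h.shape) >= p       # keep each unit with probability 1 - p
        h = h * mask / (1.0 - p)              # inverted-dropout rescaling
        outs.append(h @ W2 + b2)
    outs = np.array(outs)
    return outs.mean(axis=0), outs.var(axis=0)

init = np.random.default_rng(1)
W1, b1 = init.normal(size=(4, 16)), np.zeros(16)   # placeholder weights
W2, b2 = init.normal(size=(16, 2)), np.zeros(2)
x = np.ones(4)
mean, var = mc_dropout_predict(x, W1, b1, W2, b2, p=0.5)
_, var_no_dropout = mc_dropout_predict(x, W1, b1, W2, b2, p=0.0)
```

With `p=0.0` every pass is identical and the variance collapses to zero, which makes explicit that the spread comes only from the dropout masks.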

We show the uncertainty plots for three test trajectories from the ZARA01 dataset at $\mathbf{R}=5\%$ , comparing both methods (Figure 9). All trajectories start from the origin. For the deep ensemble, we consider M = 3 networks, while a dropout probability of p = 0.5 has been used on a single network for training with the MC dropout method.

$\mathbf{Left:}$ The plot compares the $2\sigma$ perception uncertainty between the NN and KF for each method. Both methods are slightly under-confident and overestimate the uncertainty bounds when compared with the KF ground truth. This may arise from the simultaneous training with the NLL and MSE loss functions: the NN fails to minimise the MSE loss accurately while estimating the covariance. However, the deep ensemble model slightly outperforms MC dropout in estimating the perception uncertainty at each state. The MC dropout model overestimates the covariance associated with the initial states.

$\mathbf{Center:}$ The predictive uncertainty plot shows the $1\sigma$ distribution for future states with the uncertainty bound for each method. The predictive uncertainty can be disentangled into epistemic and aleatoric uncertainty. The uncertainty estimation is scalable and no significant difference was observed between the predictions of the two models. However, the ADE/FDE of the mean predicted path of the deep ensemble model (0.53/0.97) is lower than that of the dropout model (0.58/1.00), i.e., closer to the ground truth, as seen in Table III.

$\mathbf{Right:}$ The top and bottom right plots show the $1\sigma$ total uncertainty for deep ensemble and MC dropout, respectively. As discussed, the total uncertainty is the Minkowski addition of the covariance ellipses representing the perception and prediction uncertainty. Since MC dropout overestimates the perception uncertainty, the total uncertainty is affected too, and the MC dropout method makes under-confident predictions for total uncertainty. This phenomenon is more pronounced for the two trajectories along the negative y-axis when compared to the scalable predictions from deep ensembles.

Apart from uncertainty quantification, the current study also compared the performance metrics (sec. III-C) for both methods across the pedestrian datasets ETH [29] and UCY [30]. The ADE/FDE results in Table III indicate that the mean predicted path of the deep ensemble is closer to the ground truth than that of MC dropout on every dataset, with the single exception of the Hotel FDE (0.67 vs. 0.66). Meanwhile, the $1\sigma$ coverage probabilities of the two methods are comparable; the deep ensemble is noticeably higher only on the ETH dataset (0.80 vs. 0.73). We have combined the prediction interval widths across the major, $\mathrm{MPIW_{x}}$ , and minor, $\mathrm{MPIW_{y}}$ , axes of the predicted covariance ellipse to obtain the mean prediction interval width,

MPIW=\sqrt{\frac{\mathrm{MPIW_{x}}^{2}}{2}+\frac{\mathrm{MPIW_{y}}^{2}}{2}}

Again, the deep ensembles have a lower MPIW than MC dropout on all datasets except ZARA 02 (2.08 vs. 2.06), which shows that deep ensembles achieve comparable or slightly higher coverage probability with a narrower prediction interval. This suggests that ensembles make robust predictions with scalable uncertainty and estimates closer to the ground truth.
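The combined width above, together with the displacement errors, can be computed with a few lines of plain Python. The ADE/FDE definitions used here are the standard ones (mean and final Euclidean displacement) and are assumed to match section III-C; the function names and toy inputs are illustrative.

```python
import math

def combined_mpiw(mpiw_x, mpiw_y):
    """Combine per-axis interval widths: sqrt(MPIW_x^2 / 2 + MPIW_y^2 / 2)."""
    return math.sqrt(mpiw_x ** 2 / 2.0 + mpiw_y ** 2 / 2.0)

def ade_fde(pred, truth):
    """Average and final displacement error between two paths of (x, y) points."""
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]

width = combined_mpiw(3.0, 1.0)
ade, fde = ade_fde([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
                   [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
```

The combined width is the root mean square of the two axis widths, so it equals the common value when both axes are equally wide.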

TABLE III: Performance metrics (ADE, FDE, PICP, and MPIW) for predicting 12 future time steps given 8 historical steps

Method	Dataset	ADE	FDE	PICP	MPIW
Deep Ensemble	ETH	0.60	1.11	0.80	2.23
Deep Ensemble	Hotel	0.40	0.67	0.88	1.44
Deep Ensemble	ZARA 01	0.53	0.97	0.81	1.82
Deep Ensemble	ZARA 02	0.56	1.12	0.87	2.08
Deep Ensemble	UNIV	0.25	0.50	0.93	1.55
Dropout	ETH	0.7	1.2	0.73	2.38
Dropout	Hotel	0.44	0.66	0.88	1.46
Dropout	ZARA 01	0.58	1.00	0.8	1.83
Dropout	ZARA 02	0.59	1.15	0.86	2.06
Dropout	UNIV	0.27	0.54	0.94	1.59

IV-E Out-of-distribution Results

The simulation results showed that the NN-based end-to-end estimator yields good prediction performance on trajectories that follow the same distribution as the training data. However, one fundamental challenge for NN-based prediction models is out-of-distribution (OOD) robustness. In particular, if the test samples are real-world pedestrian trajectories with a distributional shift, how effectively can the trained NN model predict the future states as well as the prediction and sensing uncertainty? To answer this question, we studied multiple scenarios, namely walking fast, walking slow, turning left, turning right, and walking normally (Figure 11), which constitute a set of edge cases that differ from the training samples.

IV-E1 Test trajectory generation

To collect test trajectories, we use a depth camera recording at 30 frames per second (Figure 10). The camera estimates the depth of the object from a pair of images to obtain its real-world position and velocity in 3D Cartesian coordinates. For object detection, we train a simple Mask R-CNN [32] on the COCO dataset [33]. The object detection module accurately classifies the pedestrian and tracks it in real time. The sampling interval is set to 12 frames so that the camera obtains the object’s position and velocity every 0.4 seconds, similar to the simulation. Every trajectory has a duration of 8 seconds, resulting in 20 $\{x,y,u,v\}$ samples, of which 8 samples (3.2 s) represent the past trajectory and 12 samples (4.8 s) represent the ground truth. We apply the end-to-end estimator to the past trajectory to predict the future states with associated uncertainty and compare the predictions with the ground truth.
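The timing arithmetic and the observation/ground-truth split described above can be checked with a short sketch. The constant and function names are illustrative, and the integer list stands in for recorded $\{x,y,u,v\}$ samples.

```python
FPS = 30             # camera frame rate stated in the text
STRIDE_FRAMES = 12   # one sample is kept every 12 frames
N_PAST, N_FUTURE = 8, 12

def split_trajectory(samples):
    """Split 20 samples into 8 observed and 12 ground-truth future samples."""
    if len(samples) != N_PAST + N_FUTURE:
        raise ValueError("expected 20 samples per trajectory")
    return samples[:N_PAST], samples[N_PAST:]

dt = STRIDE_FRAMES / FPS                 # seconds between consecutive samples
duration = (N_PAST + N_FUTURE) * dt      # total trajectory length in seconds
past, future = split_trajectory(list(range(20)))
```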

IV-E2 Sensor measurement noise

To estimate the measurement covariance, $\mathbf{R}$ , we perform a simple calibration test. A single object was placed exactly 3 m away from the camera, and 60 samples of the $\{x,y\}$ position of the object were recorded. The measured distance was 2.9 $\pm$ 0.06 m, which indicates roughly $4\%$ noise on all measurements. A Kalman filter was then applied to each test trajectory for data augmentation based on the estimated measurement covariance for the sensor.

IV-E3 Inference

The weights of the NN model are trained on the publicly available ETH and UCY datasets. Further, to ensure robustness to varying degrees of sensing noise, domain randomization was performed on the measurement covariance, $\mathbf{R}$ . As discussed previously, the trajectories are augmented using the KF with a range of measurement covariance, $\mathbf{R}\in(2\%,20\%)$ , and then used to train the NN. This makes the NN model more robust when predicting on test samples generated by a different sensor with a different measurement noise. During inference, only the trained model parameters (weights and biases) are used, which makes the inference process computationally cheap.
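The domain randomization step can be sketched as drawing one noise level per training trajectory from the stated 2% to 20% range and perturbing the positions accordingly. How a percentage maps to a standard deviation is not specified here, so the sketch assumes the noise is scaled by the per-axis extent of the trajectory; that scaling and the name `randomize_measurement_noise` are hypothetical.

```python
import numpy as np

def randomize_measurement_noise(traj, rng, low=0.02, high=0.20):
    """Perturb a (T, 2) trajectory with Gaussian noise at a random relative level."""
    traj = np.asarray(traj, dtype=float)
    level = rng.uniform(low, high)                  # one noise level per trajectory
    span = traj.max(axis=0) - traj.min(axis=0)      # assumed reference scale per axis
    sigma = level * span
    noisy = traj + rng.normal(0.0, 1.0, traj.shape) * sigma
    return noisy, level, sigma

rng = np.random.default_rng(0)
traj = np.stack([np.linspace(0.0, 5.0, 20), np.linspace(0.0, 1.0, 20)], axis=1)
noisy, level, sigma = randomize_measurement_noise(traj, rng)
```

Each noisy trajectory would then be passed through the KF with a matching measurement covariance to produce the state and covariance pairs used for training.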

Figure 11 shows end-to-end prediction on out-of-distribution trajectories for the considered scenarios. To draw a parallel with the simulation results, we predict 12 states into the future based on 8 historical steps. The NN model based on deep ensembles with M = 3 networks predicts both the 1 $\hat{\Sigma}$ sensing (blue) and prediction (olive) covariance ellipses along with the mean estimated path for each scenario. The ground truth trajectory lies within the predicted 1 $\hat{\Sigma}$ total covariance ellipse except for the left-turn trajectory (Figure 11a). Typically, the NN estimation model fails to capture drastic changes in the trajectory. Overall, the current end-to-end prediction model is largely robust to out-of-distribution samples as well.

V Conclusion

The current paper presents an end-to-end estimator that can take raw noisy sensor measurements and make robust future state predictions considering the upstream perceptual uncertainty. The NN model uses deep ensembles and averages outputs over a batch of networks to provide the mean predicted path and associated uncertainty for each state. For perceptual uncertainty, the NN model approximates the characteristics of a Bayes filter and estimates the associated covariance. Further, the model also estimates the predictive uncertainty associated with future states to which the perceptual uncertainty is incorporated to obtain the total uncertainty. Our results show that the incorporation of sensing uncertainty into the prediction pipeline enables the model to make robust downstream predictions. Overall, an ensemble model of 3 networks has been considered over a single network owing to better predictive uncertainty. The performance metrics indicate that the mean predicted path for an ensemble model is closer to the ground truth compared to the MC dropout predictions. Further, the coverage probability for an ensemble network is higher even with a smaller prediction interval width. Finally, the end-to-end prediction model showed robustness on out-of-distribution samples in quantifying both estimated future state and uncertainty.

In the future, it will be interesting to consider non-parametric filters such as the particle filter (PF), which does not assume a Gaussian distribution over the filter estimates, and to evaluate whether the NN model produces better estimates when trained with the PF.

References

[1] Houenou, Adam, Philippe Bonnifait, Véronique Cherfaoui, and Wen Yao. ”Vehicle trajectory prediction based on motion model and maneuver recognition.” In 2013 IEEE/RSJ international conference on intelligent robots and systems, pp. 4363-4369. IEEE, 2013.
[2] Kahn, Gregory, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, and Sergey Levine. ”Uncertainty-aware reinforcement learning for collision avoidance.” arXiv preprint arXiv:1702.01182 (2017).
[3] Wu, Xihui, Anshul Nayak, and Azim Eskandarian. ”Motion planning of autonomous vehicles under dynamic traffic environment in intersections using probabilistic rapidly exploring random tree.” SAE International Journal of Connected and Automated Vehicles 4, no. 12-04-04-0029 (2021): 383-399.
[4] Nayak, Anshul, Azim Eskandarian, and Zachary Doerzaph. ”Uncertainty estimation of pedestrian future trajectory using Bayesian approximation.” IEEE Open Journal of Intelligent Transportation Systems 3 (2022): 617-630.
[5] Li, Jiachen, Hengbo Ma, and Masayoshi Tomizuka. ”Conditional generative neural system for probabilistic trajectory prediction.” In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6150-6156. IEEE, 2019.
[6] Wiest, Jürgen, Matthias Höffken, Ulrich Kreßel, and Klaus Dietmayer. ”Probabilistic trajectory prediction with Gaussian mixture models.” In 2012 IEEE Intelligent Vehicles Symposium, pp. 141-146. IEEE, 2012.
[7] Jospin, Laurent Valentin, Hamid Laga, Farid Boussaid, Wray Buntine, and Mohammed Bennamoun. ”Hands-on Bayesian neural networks—A tutorial for deep learning users.” IEEE Computational Intelligence Magazine 17, no. 2 (2022): 29-48.
[8] Gal, Yarin, and Zoubin Ghahramani. ”Dropout as a bayesian approximation: Representing model uncertainty in deep learning.” In international conference on machine learning, pp. 1050-1059. PMLR, 2016.
[9] Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. ”Simple and scalable predictive uncertainty estimation using deep ensembles.” Advances in neural information processing systems 30 (2017).
[10] Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. ”Dropout: a simple way to prevent neural networks from overfitting.” The journal of machine learning research 15, no. 1 (2014): 1929-1958.
[11] Valdenegro-Toro, Matias, and Daniel Saromo Mori. ”A deeper look into aleatoric and epistemic uncertainty disentanglement.” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1508-1516. IEEE, 2022.
[12] Feng, Di, Lars Rosenbaum, and Klaus Dietmayer. ”Towards safe autonomous driving: Capture uncertainty in the deep neural network for lidar 3d vehicle detection.” In 2018 21st international conference on intelligent transportation systems (ITSC), pp. 3266-3273. IEEE, 2018.
[13] Yoneda, Keisuke, Naoki Suganuma, Ryo Yanase, and Mohammad Aldibaja. ”Automated driving recognition technologies for adverse weather conditions.” IATSS research 43, no. 4 (2019): 253-262.
[14] Schulz, Dirk, and Wolfram Burgard. ”Probabilistic state estimation of dynamic objects with a moving mobile robot.” Robotics and Autonomous Systems 34, no. 2-3 (2001): 107-115.
[15] Prevost, Carole G., Andre Desbiens, and Eric Gagnon. ”Extended KF for state estimation and trajectory prediction of a moving object detected by an unmanned aerial vehicle.” In 2007 American control conference, pp. 1805-1810. IEEE, 2007.
[16] Breitenstein, Michael D., Fabian Reichlin, Bastian Leibe, Esther Koller-Meier, and Luc Van Gool. ”Robust tracking-by-detection using a detector confidence particle filter.” In 2009 IEEE 12th International Conference on Computer Vision, pp. 1515-1522. IEEE, 2009.
[17] Liu, Katherine, Kyel Ok, William Vega-Brown, and Nicholas Roy. ”Deep inference for covariance estimation: Learning gaussian noise models for state estimation.” In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1436-1443. IEEE, 2018.
[18] Bertoni, Lorenzo, Sven Kreiss, and Alexandre Alahi. ”Monoloco: Monocular 3d pedestrian localization and uncertainty estimation.” In Proceedings of the IEEE/CVF international conference on computer vision, pp. 6861-6871. 2019.
[19] Russell, Rebecca L., and Christopher Reale. ”Multivariate uncertainty in deep learning.” IEEE Transactions on Neural Networks and Learning Systems 33, no. 12 (2021): 7937-7943.
[20] Luo, Wenjie, Bin Yang, and Raquel Urtasun. ”Fast and furious: Real time end-to-end 3d detection, tracking and motion forecasting with a single convolutional net.” In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 3569-3577. 2018.
[21] Weng, Xinshuo, Ye Yuan, and Kris Kitani. ”PTP: Parallelized tracking and prediction with graph neural networks and diversity sampling.” IEEE Robotics and Automation Letters 6, no. 3 (2021): 4640-4647.
[22] Ivanovic, Boris, Yifeng Lin, Shubham Shrivastava, Punarjay Chakravarty, and Marco Pavone. ”Propagating state uncertainty through trajectory forecasting.” In 2022 International Conference on Robotics and Automation (ICRA), pp. 2351-2358. IEEE, 2022.
[23] Seitzer, Maximilian, Arash Tavakoli, Dimitrije Antic, and Georg Martius. ”On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks.” arXiv preprint arXiv:2203.09168 (2022).
[24] Chua, Kurtland, Roberto Calandra, Rowan McAllister, and Sergey Levine. ”Deep reinforcement learning in a handful of trials using probabilistic dynamics models.” Advances in neural information processing systems 31 (2018).
[25] Rose, Dominic C., Jamie F. Mair, and Juan P. Garrahan. ”A reinforcement learning approach to rare trajectory sampling.” New Journal of Physics 23, no. 1 (2021): 013013.
[26] Dewolf, Nicolas, Bernard De Baets, and Willem Waegeman. ”Valid prediction intervals for regression problems.” Artificial Intelligence Review 56, no. 1 (2023): 577-613.
[27] Alahi, Alexandre, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. ”Social lstm: Human trajectory prediction in crowded spaces.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 961-971. 2016.
[28] Nikhil, Nishant, and Brendan Tran Morris. ”Convolutional neural network for trajectory prediction.” In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pp. 0-0. 2018.
[29] Pellegrini, Stefano, Andreas Ess, and Luc Van Gool. ”Improving data association by joint modeling of pedestrian trajectories and groupings.” In European conference on computer vision, pp. 452-465. Springer, Berlin, Heidelberg, 2010.
[30] L. Leal-Taixe, M. Fenzi, A. Kuznetsova, B. Rosenhahn, and S. Savarese. ”Learning an Image-Based Motion Context for Multiple People Tracking.” In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3542-3549. IEEE, 2014. doi: 10.1109/CVPR.2014.453.
[31] Takens, Floris. ”Detecting strange attractors in turbulence.” In Dynamical systems and turbulence, Warwick 1980, pp. 366-381. Springer, Berlin, Heidelberg, 1981.
[32] He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. ”Mask r-cnn.” In Proceedings of the IEEE international conference on computer vision, pp. 2961-2969. 2017.
[33] Lin, Tsung-Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. ”Microsoft coco: Common objects in context.” In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740-755. Springer International Publishing, 2014.
