
Neural Generalised AutoRegressive Conditional Heteroskedasticity

Zexuan Yin† and Paolo Barucca†‡
Corresponding author. Email: [email protected]
†Department of Computer Science, University College London, WC1E 7JE, United Kingdom
[email protected]
(v1.1 released November 2021)
Abstract

We propose Neural GARCH, a class of methods to model conditional heteroskedasticity in financial time series. Neural GARCH is a neural network adaptation of the GARCH(1,1) model in the univariate case, and the diagonal BEKK(1,1) model in the multivariate case. We allow the coefficients of a GARCH model to be time-varying in order to reflect the constantly changing dynamics of financial markets. The time-varying coefficients are parameterised by a recurrent neural network that is trained with stochastic gradient variational Bayes. We propose two variants of our model, one with normal innovations and the other with Student’s t innovations. We test our models on a wide range of univariate and multivariate financial time series, and we find that the Neural Student’s t model consistently outperforms the others.

keywords:
Heteroskedasticity; Recurrent neural networks; Variational inference; Volatility Forecasting
Classification codes: C32, C45, C53

1 Introduction

Modelling conditional heteroskedasticity (time-varying volatility) in financial time series such as energy prices (Chan and Grant, 2016), cryptocurrencies (Chu et al., 2017), and foreign currency exchange rates (Malik, 2005) is of great importance to financial practitioners, as it allows better decision making with regards to portfolio selection, asset pricing and risk management. In the univariate setting, popular methods include Autoregressive Conditional Heteroskedastic (ARCH) models (Engle, 1982) and Generalised ARCH (GARCH) models (Bollerslev, 1986). ARCH and GARCH models are regression-based models estimated using maximum likelihood, and are capable of capturing stylised facts about financial time series such as volatility clustering (Bauwens et al., 2006). The ARCH($p$) model describes the conditional volatility as a function of $p$ lagged squared residuals, and the GARCH($p$,$q$) model additionally includes contributions from the last $q$ conditional variances. Many variants of the GARCH model have been proposed to better capture properties of financial time series; for example, the EGARCH (Nelson, 1991) and GJR-GARCH (Glosten et al., 1993) models were designed to capture the so-called leverage effect, which describes the negative relationship between asset price and volatility.

In a multivariate setting, instead of modelling only time-varying conditional variances, for an $n$-dimensional system we estimate the $n\times n$ time-varying variance-covariance matrix. This allows us to investigate interactions between the volatility of different time series and whether there is a transmission of volatility (spillover effect) between markets (Bauwens et al., 2006; Erten et al., 2012). Popular multivariate GARCH models include the VEC model (Bollerslev et al., 1988), the BEKK model (Engle and Kroner, 1995), the GO-GARCH model (Van Der Weide, 2002) and the DCC model (Christodoulakis and Satchell, 2002; Tse and Tsui, 2002; Engle, 2002).

In this paper we focus specifically on GARCH(1,1) models in the univariate case and the diagonal BEKK(1,1) model in the multivariate case to model daily financial asset returns. We consider several asset classes such as foreign exchange rates, commodities and stock indices. GARCH(1,1) models work well in general practical settings due to their simplicity and robustness to overfitting (Wu et al., 2013).

In traditional GARCH models, the estimated coefficients are constant, which implies a stationary returns process with a constant unconditional mean and variance (Bollerslev, 1986). However, there is evidence in the existing literature that relaxing the stationarity constraint on the returns time series can often lead to better performance, as it allows the model to better capture time-varying market conditions. In Stǎricǎ and Granger (2005) the authors modelled daily S&P 500 returns with locally stationary models and found that most of the dynamics were concentrated in shifts of the unconditional variance, and forecasts based on non-stationary unconditional modelling yielded better performance than a stationary GARCH(1,1) model. Similarly, the authors in Wu et al. (2013) designed a GARCH(1,1) model with time-varying coefficients that followed a random walk process, and they reported better forecasting performance on the test dataset relative to the GARCH(1,1) model.

To this end, we propose univariate and multivariate GARCH models with time-varying coefficients that are parameterised by a recurrent neural network. Our method combines the simplicity and interpretability of GARCH models with the expressive power of neural networks, and this approach follows a trend in the literature that combines classical time series models with deep learning. In Rangapuram et al. (2018), for example, the authors proposed to parameterise the coefficients of a linear Gaussian state space model with a recurrent neural network, and the latent states were then inferred using a Kalman filter. This approach is advantageous as the neural network allows modelling of more complex relationships between time steps whilst preserving the structural form of the state space model. Similarly, by preserving the structural form of the BEKK model, we obtain covariance matrices that are symmetric and positive definite (Engle and Kroner, 1995) without the need to impose further constraints. We treat the time-varying GARCH coefficients as latent variables to be inferred, and to achieve this we leverage recent advances in amortised variational inference in the form of the variational autoencoder (VAE) (Kingma and Welling, 2014), and subsequent combinations of a VAE with a recurrent neural network (the so-called variational RNN, or VRNN) (Chung et al., 2015; Bayer and Osendorfer, 2014; Krishnan et al., 2017; Fabius and van Amersfoort, 2015; Fraccaro et al., 2016; Karl et al., 2017), to allow efficient structured inference over a sequence of latent random variables.

The rest of the paper is organised as follows: in Section 2 we outline the preliminary mathematical concepts of GARCH modelling and amortised variational inference, in Section 3 we introduce the generative and inference model components of Neural GARCH, and in Section 4 we present the performance of Neural GARCH on univariate and multivariate daily returns time series covering foreign exchange rates, commodity prices, and stock indices.

2 Preliminaries

2.1 Univariate GARCH Model

The GARCH($p$,$q$) model (Bollerslev, 1986) for a returns process $r_t$ is specified in terms of the conditional mean equation:

$r_t \sim \mathcal{N}(0,\sigma_t^2)$, (1)

and the conditional variance equation:

$\sigma_t^2 = \omega + \sum_{i=1}^{p}\alpha_i r_{t-i}^2 + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^2$. (2)

Under the GARCH(1,1) model, the returns process $r_t$ is covariance stationary with a constant unconditional mean and variance given by $\mathbb{E}[r_t]=0$ and $\mathbb{E}[r_t^2]=\frac{\omega}{1-\alpha-\beta}$, where $\omega>0$, $\alpha\geq 0$ and $\beta\geq 0$ ensure that $\sigma_t^2>0$, and $\alpha+\beta<1$ ensures a finite unconditional variance. For parameter estimation assuming normal innovations, the following log-likelihood function is maximised:

$\mathcal{L} = -\sum_{t=1}^{T}\left(\frac{1}{2}\log(\sigma_t^2) + \frac{r_t^2}{2\sigma_t^2}\right)$. (3)

To model the leptokurtic (fat-tailed) behaviour of financial returns, Bollerslev (1987) considered GARCH models with Student's t innovations, with the following log-likelihood function to be maximised:

$\mathcal{L} = \sum_{t=1}^{T}\left(\log\Gamma\left(\frac{\nu+1}{2}\right) - \log\Gamma\left(\frac{\nu}{2}\right) - \frac{1}{2}\log(\pi(\nu-2)) - \frac{1}{2}\log(\sigma_t^2) - \frac{\nu+1}{2}\log\left(1+\frac{r_t^2}{(\nu-2)\sigma_t^2}\right)\right)$, (4)

where $\nu>2$ is the degrees-of-freedom parameter and $\Gamma$ is the gamma function.
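
As a concrete illustration of (2)-(4), the sketch below (our own, not the authors' released code) filters GARCH(1,1) conditional variances for fixed coefficients and evaluates both log-likelihoods; initialising $\sigma_1^2$ with the sample variance is an assumption on our part.

```python
import numpy as np
from scipy.special import gammaln

def garch11_variance(r, omega, alpha, beta):
    """Filter conditional variances sigma_t^2 for a returns series r, cf. (2)."""
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()  # initialisation is an assumption, not specified above
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def normal_loglik(r, sigma2):
    # Gaussian log-likelihood of (3); additive constants omitted
    return -0.5 * np.sum(np.log(sigma2) + r ** 2 / sigma2)

def student_t_loglik(r, sigma2, nu):
    # standardised Student's t log-likelihood of (4), valid for nu > 2
    const = gammaln((nu + 1) / 2) - gammaln(nu / 2) - 0.5 * np.log(np.pi * (nu - 2))
    return np.sum(const - 0.5 * np.log(sigma2)
                  - (nu + 1) / 2 * np.log1p(r ** 2 / ((nu - 2) * sigma2)))
```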

2.2 BEKK Model

The BEKK multivariate GARCH model (Engle and Kroner, 1995) parameterises an $n$-dimensional multivariate returns process $\boldsymbol{r}_t \in \mathbb{R}^n$:

$\boldsymbol{r}_t \sim \mathcal{N}(0,\boldsymbol{\Sigma}_t)$, (5)
$\boldsymbol{\Sigma}_t = \boldsymbol{\Omega}^T\boldsymbol{\Omega} + \sum_{i=1}^{p}\boldsymbol{A}_i^T\boldsymbol{r}_{t-i}\boldsymbol{r}_{t-i}^T\boldsymbol{A}_i + \sum_{j=1}^{q}\boldsymbol{B}_j^T\boldsymbol{\Sigma}_{t-j}\boldsymbol{B}_j$, (6)

where $\boldsymbol{\Sigma}_t$ is the $n\times n$ symmetric and positive-definite covariance matrix, $\boldsymbol{\Omega}$ is an upper triangular matrix with $\frac{n(n+1)}{2}$ non-zero entries, and $\boldsymbol{A}$ and $\boldsymbol{B}$ are $n\times n$ coefficient matrices. In this paper we consider the diagonal BEKK model, in which $\boldsymbol{A}$ and $\boldsymbol{B}$ are diagonal matrices.
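
The recursion in (6) is straightforward to implement; the following minimal sketch (our illustration, with our own variable names) filters the covariance matrices of a diagonal BEKK(1,1) model and shows why each $\boldsymbol{\Sigma}_t$ is symmetric positive semi-definite by construction.

```python
import numpy as np

def diagonal_bekk_filter(R, Omega, a_diag, b_diag, Sigma0):
    """R: (T, n) returns; Omega: (n, n) upper triangular; a_diag, b_diag: (n,) diagonals."""
    A, B = np.diag(a_diag), np.diag(b_diag)
    C = Omega.T @ Omega                    # constant term, PSD by construction
    Sigmas = [Sigma0]
    for t in range(1, len(R)):
        rr = np.outer(R[t - 1], R[t - 1])  # r_{t-1} r_{t-1}^T
        # each summand is a congruence M^T X M of a PSD matrix X, hence PSD
        Sigmas.append(C + A.T @ rr @ A + B.T @ Sigmas[-1] @ B)
    return np.stack(Sigmas)
```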

2.3 Neural Network Variational Inference

For a latent variable model with parameters $\theta$, target variable $y$ and latent variable $z$, we wish to maximise the marginal likelihood with the latent variable integrated out, which often involves an intractable integral:

$\log P_\theta(y) = \log\int P_\theta(y|z)P_\theta(z)\,dz$. (7)

Instead, we perform variational inference by approximating the true posterior distribution $P_\theta(z|y)$ with a variational approximation $q_\phi(z|y)$ and maximising the evidence lower bound ($ELBO$), where $\log P_\theta(y) \geq ELBO$; this is equivalent to minimising the Kullback-Leibler ($KL$) divergence between the variational posterior $q_\phi(z|y)$ and the true posterior $P_\theta(z|y)$ (Kingma and Welling, 2014):

$\log P_\theta(y) = ELBO + KL(q_\phi(z|y)\,||\,P_\theta(z|y))$, (8)

where the $ELBO$ is given by:

$ELBO = \mathbb{E}_{z\sim q_\phi(z|y)}[\log P_\theta(y|z)] - KL(q_\phi(z|y)\,||\,P_\theta(z))$, (9)

where $P_\theta(z)$ is a prior distribution for $z$; in a variational autoencoder (VAE), the generative and inference distributions $P_\theta(y|z)$ and $q_\phi(z|y)$ are parameterised by neural networks. An uninformative prior such as $\mathcal{N}(0,1)$ is often used for $P_\theta(z)$; in our model, however, we adopt a learned prior distribution $P_\theta(z|\mathcal{I}_{t-1})$, where $\mathcal{I}_{t-1}$ is the information set up to time step $t-1$. This learned-prior approach has achieved great success in sequential generation tasks such as video prediction (Franceschi et al., 2020; Denton and Fergus, 2018).
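
For concreteness, the sketch below (our illustration; all names are ours) computes a one-sample Monte Carlo estimate of the $ELBO$ in (9) for a diagonal Gaussian posterior and prior, using the reparameterisation trick of Kingma and Welling (2014); `log_likelihood_fn` is a placeholder for the model's likelihood term.

```python
import torch

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    return 0.5 * torch.sum(logvar_p - logvar_q
                           + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def elbo_one_sample(log_likelihood_fn, mu_q, logvar_q, mu_p, logvar_p):
    eps = torch.randn_like(mu_q)                 # reparameterised draw z ~ q(z|y)
    z = mu_q + (0.5 * logvar_q).exp() * eps
    return log_likelihood_fn(z) - gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
```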

3 Materials and Methods

3.1 Neural GARCH Models

In this section we introduce the intuition behind neural GARCH and its various components. We focus specifically on univariate and multivariate GARCH(1,1) models, as we would like to keep the GARCH model structure as simple as possible and delegate the modelling of complex relationships between time steps to the underlying neural network, which outputs the coefficients of the GARCH models. For the rest of this paper, we use the terms multivariate GARCH(1,1) and BEKK(1,1) interchangeably when referring to multivariate systems.

In neural GARCH, the coefficients $\{\omega, \alpha, \beta\}$ in the univariate case and $\{\boldsymbol{\Omega}, \boldsymbol{A}, \boldsymbol{B}\}$ in the multivariate case are allowed to vary freely with time. This approach allows the model to capture the time-varying nature of market dynamics (Wu et al., 2013). The GARCH(1,1) and BEKK(1,1) models thus become:

$\sigma_t^2 = \omega_t + \alpha_t r_{t-1}^2 + \beta_t\sigma_{t-1}^2$, (10)
$\boldsymbol{\Sigma}_t = \boldsymbol{\Omega}_t^T\boldsymbol{\Omega}_t + \boldsymbol{A}_t^T\boldsymbol{r}_{t-1}\boldsymbol{r}_{t-1}^T\boldsymbol{A}_t + \boldsymbol{B}_t^T\boldsymbol{\Sigma}_{t-1}\boldsymbol{B}_t$. (11)

For notational purposes we define the parameter set $\boldsymbol{\gamma}_t = [\omega_t, \alpha_t, \beta_t]^T$ for GARCH(1,1) and $\boldsymbol{\gamma}_t = [\boldsymbol{\Omega}_t, \boldsymbol{A}_t, \boldsymbol{B}_t]^T$ for BEKK(1,1).

In our proposed framework, $\boldsymbol{\gamma}_t$ is a multivariate normal latent random variable with a diagonal covariance matrix, to be estimated at every time step. For GARCH(1,1) with normal innovations, this involves estimating a vector of size 3:

$\boldsymbol{\gamma}_t = [\omega_t, \alpha_t, \beta_t]^T \sim \mathcal{N}(\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_{\gamma,t})$, (12)

and the vector $[\sigma_{\omega_t}^2, \sigma_{\alpha_t}^2, \sigma_{\beta_t}^2]^T$ represents the diagonal elements of the covariance matrix $\boldsymbol{\Sigma}_{\gamma,t}$. We write the covariance matrix of the parameter set $\boldsymbol{\gamma}_t$ as $\boldsymbol{\Sigma}_{\gamma,t}$ to distinguish it from the covariance matrix of the asset returns $\boldsymbol{\Sigma}_t$. For neural GARCH(1,1) with Student's t innovations, $\boldsymbol{\gamma}_t$ is augmented with the degrees-of-freedom parameter $\nu_t$ such that $\boldsymbol{\gamma}_t = [\omega_t, \alpha_t, \beta_t, \nu_t]^T$.

For the multivariate diagonal BEKK(1,1), we adopt a similar methodology. For a system of $n$ assets, $\boldsymbol{\gamma}_t$ of a model with normal innovations is a vector of size $2n+\frac{n(n+1)}{2}$ (Engle and Kroner, 1995), and with Student's t innovations $\boldsymbol{\gamma}_t$ is of size $2n+\frac{n(n+1)}{2}+1$. As an example, for a system of 2 assets ($n=2$), the BEKK model is given by:

$\boldsymbol{\Sigma}_t = \begin{bmatrix}c_{11,t}&0\\ c_{12,t}&c_{22,t}\end{bmatrix}\begin{bmatrix}c_{11,t}&c_{12,t}\\ 0&c_{22,t}\end{bmatrix} + \begin{bmatrix}a_{11,t}&0\\ 0&a_{22,t}\end{bmatrix}\begin{bmatrix}r_{1,t-1}\\ r_{2,t-1}\end{bmatrix}\begin{bmatrix}r_{1,t-1}\\ r_{2,t-1}\end{bmatrix}^T\begin{bmatrix}a_{11,t}&0\\ 0&a_{22,t}\end{bmatrix} + \begin{bmatrix}b_{11,t}&0\\ 0&b_{22,t}\end{bmatrix}\begin{bmatrix}\sigma_{11,t-1}^2&\sigma_{12,t-1}^2\\ \sigma_{21,t-1}^2&\sigma_{22,t-1}^2\end{bmatrix}\begin{bmatrix}b_{11,t}&0\\ 0&b_{22,t}\end{bmatrix}$, (13)

where $a_{ij,t}$ denotes the $(i,j)$-th element of the matrix $\boldsymbol{A}_t$. The parameter set $\boldsymbol{\gamma}_t$, which also follows a multivariate normal distribution, is given by:

$\boldsymbol{\gamma}_t = [a_{11,t}, a_{22,t}, b_{11,t}, b_{22,t}, c_{11,t}, c_{12,t}, c_{22,t}]^T$. (14)
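
To make the parameterisation explicit, the sketch below (our illustration, not the authors' code) maps the vector $\boldsymbol{\gamma}_t$ of (14) to the matrices of (13) and performs one covariance update for $n=2$ assets.

```python
import numpy as np

def bekk2_step(gamma_t, r_prev, Sigma_prev):
    """gamma_t = [a11, a22, b11, b22, c11, c12, c22]; r_prev: (2,); Sigma_prev: (2, 2)."""
    a11, a22, b11, b22, c11, c12, c22 = gamma_t
    A = np.diag([a11, a22])
    B = np.diag([b11, b22])
    Omega = np.array([[c11, c12],
                      [0.0, c22]])          # upper triangular, cf. (13)
    rr = np.outer(r_prev, r_prev)
    return Omega.T @ Omega + A.T @ rr @ A + B.T @ Sigma_prev @ B
```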

The main contribution of our paper is the estimation of $\boldsymbol{\gamma}_t$ with a recurrent neural network (RNN) and a multilayer perceptron (MLP); we provide the exact estimation schemes in Sections 3.2 and 3.3. Since we assume a multivariate normal distribution with a diagonal covariance matrix for $\boldsymbol{\gamma}_t$, our neural network estimates the means and variances of the elements of $\boldsymbol{\gamma}_t$.

3.2 Generative Model

The generative model distribution $P_\theta(\boldsymbol{r}_{1:T},\boldsymbol{\Sigma}_{1:T},\boldsymbol{\gamma}_{1:T})$ of a general multivariate neural GARCH is presented in Figure 1 and given by (15). For the univariate case, one simply replaces $\boldsymbol{\Sigma}_t$ in (15) with $\sigma_t^2$.

$P_\theta(\boldsymbol{r}_{1:T},\boldsymbol{\Sigma}_{1:T},\boldsymbol{\gamma}_{1:T}) = P(\boldsymbol{\gamma}_0)P(\boldsymbol{\Sigma}_0)\prod_{t=1}^{T}P_\theta(\boldsymbol{r}_t|\boldsymbol{\Sigma}_t)P_\theta(\boldsymbol{\Sigma}_t|\boldsymbol{\gamma}_t,\boldsymbol{r}_{t-1},\boldsymbol{\Sigma}_{t-1})P_\theta(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{r}_{1:t-1})$. (15)

The initial priors were set to delta distributions: $P(\boldsymbol{\Sigma}_0)$ was centered on the covariance matrix estimated from the training dataset, and $P(\boldsymbol{\gamma}_0)$ was centered on a vector of ones. The predictive distribution $P_\theta(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{r}_{1:t-1})$ takes as input the information set $\mathcal{I}_{t-1}=\{\boldsymbol{\gamma}_{t-1},\boldsymbol{r}_{1:t-1}\}$ and predicts the 1-step-ahead value $\boldsymbol{\gamma}_t$. For this parameterisation, we leverage a recurrent neural network to carry $\boldsymbol{r}_{1:t-1}$ such that:

$P_\theta(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{r}_{1:t-1}) = P_\theta(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{h}_{t-1})$, (16)

where $\boldsymbol{h}_t$ is the hidden state of the underlying RNN; in our model we use a gated recurrent unit (GRU) (Cho et al., 2014). We then use an MLP which takes $\mathcal{I}_{t-1}$ as input and maps it to the means and variances of the elements of $\boldsymbol{\gamma}_t$. In the 2-dimensional example given in (14), the estimation is done using:

$[\mu_{a_{11,t}},...,\mu_{c_{22,t}},\sigma^2_{a_{11,t}},...,\sigma^2_{c_{22,t}}]^T = MLP_{pred}(\boldsymbol{\gamma}_{t-1},\boldsymbol{h}_{t-1})$, (17)

and we apply a sigmoid function to the neural network output to ensure that the estimated variances of the elements of $\boldsymbol{\gamma}_t$, and hence the GARCH coefficients, are non-negative. We also tested other ways of ensuring non-negativity, such as a softplus function, but found that the sigmoid gave the best performance. For neural GARCH with Student's t innovations we require $\nu>2$ in order to have a well-defined covariance; since applying the sigmoid ensures that our estimated coefficients are non-negative, we estimate $\nu'=\nu-2$ (instead of $\nu$ directly) so that $\nu>2$ is guaranteed.

The conditional distribution $P_\theta(\boldsymbol{\Sigma}_t|\boldsymbol{\gamma}_t,\boldsymbol{r}_{t-1},\boldsymbol{\Sigma}_{t-1})$ is a delta distribution centered on (10) in the univariate case and (11) in the multivariate case, since the covariance matrix $\boldsymbol{\Sigma}_t$ can be calculated deterministically given $\{\boldsymbol{\gamma}_t,\boldsymbol{r}_{t-1},\boldsymbol{\Sigma}_{t-1}\}$. The distribution $P_\theta(\boldsymbol{r}_t|\boldsymbol{\Sigma}_t)$ is the likelihood function, whose logarithm (in the univariate case) is given in (3) for normal innovations and (4) for Student's t innovations.

Figure 1: Generative model of neural GARCH. The generative MLP takes as input $\{\boldsymbol{\gamma}_{t-1},\boldsymbol{h}_{t-1}\}$ and outputs the estimated means and variances of the elements of $\boldsymbol{\gamma}_t$.

3.3 Inference Model

The inference model distribution $q_\phi(\boldsymbol{\Sigma}_{1:T},\boldsymbol{\gamma}_{1:T}|\boldsymbol{r}_{1:T})$ is presented in Figure 2 and can be factorised as:

$q_\phi(\boldsymbol{\Sigma}_{1:T},\boldsymbol{\gamma}_{1:T}|\boldsymbol{r}_{1:T}) = P(\boldsymbol{\gamma}_0)P(\boldsymbol{\Sigma}_0)\prod_{t=1}^{T}q_\phi(\boldsymbol{\Sigma}_t|\boldsymbol{\gamma}_t,\boldsymbol{r}_{t-1},\boldsymbol{\Sigma}_{t-1})q_\phi(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{r}_{1:t})$, (18)

where $P(\boldsymbol{\gamma}_0)$ and $P(\boldsymbol{\Sigma}_0)$ are the same as in the generative model, and $q_\phi(\boldsymbol{\Sigma}_t|\boldsymbol{\gamma}_t,\boldsymbol{r}_{t-1},\boldsymbol{\Sigma}_{t-1})$ has the same functional form (a delta distribution) as $P_\theta(\boldsymbol{\Sigma}_t|\boldsymbol{\gamma}_t,\boldsymbol{r}_{t-1},\boldsymbol{\Sigma}_{t-1})$; however, $\boldsymbol{\gamma}_t$ is now drawn from the posterior distribution $q_\phi(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{r}_{1:t})$, where:

$q_\phi(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{r}_{1:t}) = q_\phi(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{h}_t)$. (19)

We note that the generative and inference networks share the same underlying recurrent neural network but use information from different time steps: the generative model predicts $\boldsymbol{\gamma}_t$ using the information set $\mathcal{I}_{t-1}$, whereas the inference model infers $\boldsymbol{\gamma}_t$ using $\mathcal{I}_t$. The inference MLP ($MLP_{inf}$), however, is different from the generative MLP ($MLP_{pred}$), and it outputs the posterior estimates of the elements of $\boldsymbol{\gamma}_t$:

$[\mu_{a_{11,t}},...,\mu_{c_{22,t}},\sigma^2_{a_{11,t}},...,\sigma^2_{c_{22,t}}]_{post}^T = MLP_{inf}(\boldsymbol{\gamma}_{t-1},\boldsymbol{h}_t)$. (20)
Figure 2: Inference model of neural GARCH. The inference MLP outputs the posterior estimate of $\boldsymbol{\gamma}_t$ conditioned on available information up to time $t$.
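
To make Sections 3.2 and 3.3 concrete, the condensed PyTorch sketch below shows one possible implementation of the shared GRU and the two MLPs. Layer sizes follow Section 3.6, but all names are ours, and squashing both the coefficient means and their variances with a sigmoid is our reading of the non-negativity constraint; this is not the authors' released code.

```python
import torch
import torch.nn as nn

class NeuralGARCHNets(nn.Module):
    def __init__(self, gamma_dim=3, n_assets=1, hidden=64):
        super().__init__()
        self.gru = nn.GRU(input_size=n_assets, hidden_size=hidden, batch_first=True)
        def make_mlp():
            return nn.Sequential(
                nn.Linear(gamma_dim + hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * gamma_dim))   # means and variances
        self.mlp_pred = make_mlp()   # prior network, cf. (17): uses h_{t-1}
        self.mlp_inf = make_mlp()    # posterior network, cf. (20): uses h_t

    def encode(self, returns):      # returns: (batch, T, n_assets)
        h, _ = self.gru(returns)
        return h                    # h[:, t] summarises the returns seen so far

    def _params(self, mlp, gamma_prev, h):
        mu, var = mlp(torch.cat([gamma_prev, h], dim=-1)).chunk(2, dim=-1)
        # one reading of the sigmoid constraint of Section 3.2: squash both the
        # coefficient means and their variances into (0, 1) to keep them non-negative
        return torch.sigmoid(mu), torch.sigmoid(var)

    def prior(self, gamma_prev, h_prev):    # P(gamma_t | gamma_{t-1}, h_{t-1})
        return self._params(self.mlp_pred, gamma_prev, h_prev)

    def posterior(self, gamma_prev, h_t):   # q(gamma_t | gamma_{t-1}, h_t)
        return self._params(self.mlp_inf, gamma_prev, h_t)
```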

3.4 Model Training

For neural network training we optimise the generative and inference model parameters ($\theta$ and $\phi$) jointly using stochastic gradient variational Bayes (Kingma and Welling, 2014). Our objective function is the $ELBO$, defined as:

$ELBO(\theta,\phi) = \sum_{t=1}^{T}\mathbb{E}_{\boldsymbol{\gamma}_t\sim q_\phi}[\log P_\theta(\boldsymbol{r}_t|\boldsymbol{\gamma}_t)] - KL(q_\phi(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{r}_{1:t})\,||\,P_\theta(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{r}_{1:t-1}))$, (21)

and we seek:

$\{\theta^*,\phi^*\} = \operatorname*{argmax}_{\theta,\phi} ELBO(\theta,\phi)$. (22)
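
A schematic training step for (21)-(22) in the univariate normal case might look as follows, reusing `gaussian_kl` and `NeuralGARCHNets` from the earlier sketches; this is our illustration of the recursion, with initialisation and batching details omitted.

```python
import torch

def elbo_univariate(nets, returns, gamma0, sigma2_0):
    """One-series ELBO of (21) with normal innovations; returns: (T,) tensor."""
    h = nets.encode(returns.view(1, -1, 1)).squeeze(0)
    gamma_prev, sigma2_prev = gamma0, sigma2_0
    elbo = returns.new_zeros(())
    for t in range(1, len(returns)):
        mu_p, var_p = nets.prior(gamma_prev, h[t - 1])        # P(gamma_t | I_{t-1})
        mu_q, var_q = nets.posterior(gamma_prev, h[t])        # q(gamma_t | I_t)
        gamma_t = mu_q + var_q.sqrt() * torch.randn_like(mu_q)  # reparameterised draw
        omega, alpha, beta = gamma_t                          # gamma_dim = 3
        sigma2_t = omega + alpha * returns[t - 1] ** 2 + beta * sigma2_prev  # eq. (10)
        loglik = -0.5 * (torch.log(sigma2_t) + returns[t] ** 2 / sigma2_t)   # eq. (3)
        elbo = elbo + loglik - gaussian_kl(mu_q, var_q.log(), mu_p, var_p.log())
        gamma_prev, sigma2_prev = gamma_t, sigma2_t
    return elbo   # maximise over (theta, phi), i.e. minimise -elbo with an optimiser
```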

3.5 Model Prediction

Neural GARCH produces 1-step-ahead conditional volatility predictions. Given $\mathcal{I}_t=\{\boldsymbol{\gamma}_t,\boldsymbol{\Sigma}_t,\boldsymbol{r}_{1:t}\}$, we use (17) to obtain our prediction of $\boldsymbol{\gamma}_{t+1}$ by drawing from the multivariate normal distribution whose parameters are given by $MLP_{pred}$. We then obtain our estimate of $\boldsymbol{\Sigma}_{t+1}$ deterministically using (11). To estimate $\boldsymbol{\Sigma}_{t+2}$, we now have access to $\boldsymbol{r}_{t+1}$, and we therefore obtain the posterior estimate of $\boldsymbol{\gamma}_{t+1}$ using (20) and predict $\boldsymbol{\Sigma}_{t+2}$ using the posterior estimate of $\boldsymbol{\Sigma}_{t+1}$. This posterior update is crucial, as it ensures that we use all available and up-to-date information to predict the next covariance matrix.
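
In code, the univariate prediction step might look like the sketch below (again our illustration, reusing `NeuralGARCHNets` from the sketch in Section 3.3):

```python
import torch

def predict_next_variance(nets, returns, gamma_t, sigma2_t):
    """1-step-ahead sigma^2_{t+1} given I_t; returns: (t,) tensor of r_1..r_t."""
    h_t = nets.encode(returns.view(1, -1, 1)).squeeze(0)[-1]  # summarises r_{1:t}
    mu, var = nets.prior(gamma_t, h_t)                        # P(gamma_{t+1} | I_t)
    gamma_next = mu + var.sqrt() * torch.randn_like(mu)       # or simply take mu
    omega, alpha, beta = gamma_next
    return omega + alpha * returns[-1] ** 2 + beta * sigma2_t  # eq. (10)
```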

3.6 Experiments

We test neural GARCH on a range of daily asset log-returns time series covering univariate and multivariate foreign exchange rates (20 pairs), commodity prices (Brent crude, silver and gold) and stock indices (DAX, S&P 500, NASDAQ, FTSE 100, Dow Jones). We provide a brief data description in Table 1.

Table 1: Description of asset log-returns time series analysed in our experiments.
Dataset | No. of time series | Frequency | Observations | Date range
Foreign exchange | 20 | daily | 3128 | 05/08/2011 - 05/08/2021
Brent crude | 1 | daily | 2065 | 05/08/2013 - 05/08/2021
Silver & gold | 2 | daily | 3109 | 05/08/2011 - 05/08/2021
Stock indices | 5 | daily | 2054 | 05/08/2013 - 05/08/2021

For model training, we split each time series such that 80% was used for training, 10% for validation and 10% for testing. The underlying recurrent neural network (GRU) has a hidden state of size 64; the generative and inference MLPs ($MLP_{pred}$ and $MLP_{inf}$) are both 3-layer MLPs with 64 hidden nodes and ReLU activation functions.

For univariate time series, we compare the performance of six models: GARCH(1,1)-Normal, GARCH(1,1)-Student's t, Neural-GARCH(1,1), Neural-GARCH(1,1)-Student's t, EGARCH(1,1,1)-Normal and EGARCH(1,1,1)-Student's t. Although neural GARCH is an adaptation of the GARCH(1,1) model, we include the EGARCH(1,1,1) model (where the middle index denotes the order of the asymmetric term) as a benchmark, because it is capable of accounting for the asymmetric leverage effect: negative shocks lead to larger volatilities than positive shocks. We would like to investigate whether the data-driven approach of neural GARCH allows it to model the leverage effect without the explicit dependence on an asymmetric term as in an EGARCH model. For multivariate time series, we compare the performance of multivariate GARCH(1,1) (BEKK(1,1)) with normal and Student's t innovations against their neural network adaptations. We evaluate model performance using the log-likelihood of the test dataset.

4 Results & Discussion

In Tables 2, 3, 4 and 5 we provide the log-likelihoods evaluated on the test dataset for commodity prices, stock indices, and univariate and multivariate foreign exchange time series; the best model for each time series is highlighted in bold. For commodity prices, we observe that EGARCH(1,1,1)-Student's t is the best performer on Brent crude, whilst Neural-GARCH(1,1)-Student's t performs best on silver and gold price returns.

For stock indices, we observe that Neural-GARCH(1,1)-Student's t performs best on the DAX and Dow Jones indices, whilst EGARCH(1,1,1)-Student's t performs best on the S&P 500, NASDAQ and FTSE 100. The fact that the neural GARCH models perform better than EGARCH on some datasets shows that our data-driven approach can learn to accommodate many, but not all, scenarios of the leverage effect; in cases where EGARCH outperforms, there are benefits to modelling the asymmetric effect directly. For univariate foreign exchange time series, we observe that the neural GARCH variants outperform traditional GARCH models on 16 out of 20 time series; where neural GARCH outperforms, Neural-GARCH(1,1) with normal innovations is the better variant on 5 of the 16 series and Neural-GARCH(1,1)-Student's t on the remaining 11.

Table 2: Test log-likelihoods for commodity price time series. Best result highlighted in bold; higher log-likelihood is better.
Time series | GARCH(1,1)-Normal | GARCH(1,1)-Student's t | Neural-GARCH(1,1) | Neural-GARCH(1,1)-Student's t | EGARCH(1,1,1)-Normal | EGARCH(1,1,1)-Student's t
BRENT | -298.738 | -298.689 | -307.921 | -295.895 | -299.966 | $\boldsymbol{-292.798}$
SILVER | -554.595 | -551.936 | -541.713 | $\boldsymbol{-514.476}$ | -572.780 | -581.834
GOLD | -462.28 | -450.752 | -473.074 | $\boldsymbol{-421.566}$ | -462.857 | -468.509
Table 3: Test log-likelihoods for stock index time series.
Time series | GARCH(1,1)-Normal | GARCH(1,1)-Student's t | Neural-GARCH(1,1) | Neural-GARCH(1,1)-Student's t | EGARCH(1,1,1)-Normal | EGARCH(1,1,1)-Student's t
DAX | -261.275 | -268.944 | -259.321 | $\boldsymbol{-244.190}$ | -257.767 | -266.163
SNP | -300.849 | -298.614 | -308.559 | -295.934 | -300.577 | $\boldsymbol{-284.841}$
NASDAQ | -327.547 | -326.401 | -331.539 | -320.387 | -334.237 | $\boldsymbol{-312.366}$
FTSE | -324.437 | $\boldsymbol{-314.480}$ | -326.572 | -315.606 | -322.425 | $\boldsymbol{-311.135}$
DOW | -298.406 | -302.196 | -315.164 | $\boldsymbol{-284.247}$ | -292.974 | -293.486

For multivariate foreign exchange time series, we observe that Neural-BEKK(1,1)-Student's t is the best performer on 8 of the 9 time series considered. Across different asset classes, the Student's t version of neural GARCH consistently performs better than the traditional GARCH models as well as neural GARCH with normal innovations. This suggests that a model with Student's t innovations does indeed capture the leptokurtic behaviour of financial returns better than a model with normal innovations, a finding in line with our expectations from the literature (for example Bollerslev (1987) and Heracleous (2007)).

Table 4: Test log-likelihoods for univariate foreign exchange time series.
Time series | GARCH(1,1)-Normal | GARCH(1,1)-Student's t | Neural-GARCH(1,1) | Neural-GARCH(1,1)-Student's t | EGARCH(1,1,1)-Normal | EGARCH(1,1,1)-Student's t
AUDCAD | $\boldsymbol{-397.251}$ | -402.582 | -409.553 | -398.645 | -397.776 | -473.302
AUDCHF | -311.566 | -308.029 | $\boldsymbol{-293.853}$ | -294.010 | -309.295 | -312.965
AUDJPY | -346.024 | -350.401 | -353.213 | $\boldsymbol{-335.945}$ | -346.478 | -354.095
AUDNZD | -303.986 | -318.345 | -307.44 | $\boldsymbol{-301.514}$ | -303.627 | -322.777
AUDUSD | -423.602 | -424.594 | -432.518 | $\boldsymbol{-422.753}$ | -424.498 | -425.807
CADJPY | -351.749 | -359.545 | $\boldsymbol{-349.209}$ | -349.842 | -350.460 | -362.875
CHFJPY | -238.566 | -241.360 | -215.536 | $\boldsymbol{-208.710}$ | -230.120 | -253.050
EURAUD | -338.378 | -344.922 | -347.995 | $\boldsymbol{-336.604}$ | -337.481 | -347.259
EURCAD | -347.177 | -359.499 | $\boldsymbol{-345.989}$ | -347.730 | -346.547 | -366.701
EURCHF | -277.643 | -153.502 | -156.567 | $\boldsymbol{-142.963}$ | -275.073 | -321.051
EURGBP | -366.187 | -378.950 | -373.515 | $\boldsymbol{-364.619}$ | -364.727 | -389.416
EURJPY | -266.674 | -278.327 | -267.374 | $\boldsymbol{-256.341}$ | -262.667 | -290.897
EURUSD | -332.917 | -347.818 | $\boldsymbol{-330.471}$ | -334.488 | -334.178 | -361.348
GBPAUD | $\boldsymbol{-335.530}$ | -346.944 | -353.800 | -344.842 | -335.812 | -353.034
GBPJPY | -330.030 | -348.729 | -337.981 | $\boldsymbol{-324.559}$ | -329.013 | -359.506
GBPUSD | $\boldsymbol{-418.593}$ | -431.554 | -423.460 | -419.658 | -420.534 | -441.162
NZDUSD | $\boldsymbol{-415.648}$ | -416.944 | -425.841 | -417.380 | -416.094 | -417.153
USDCAD | -408.008 | -416.483 | $\boldsymbol{-404.614}$ | -413.507 | -406.735 | -419.863
USDCHF | -315.963 | -303.351 | -276.461 | $\boldsymbol{-260.177}$ | -282.682 | -308.410
USDJPY | -295.295 | -304.539 | -291.419 | $\boldsymbol{-277.477}$ | -294.519 | -318.100
Table 5: Test log-likelihoods for multivariate foreign exchange time series.
Time series | GARCH(1,1)-Normal | GARCH(1,1)-Student's t | Neural-GARCH(1,1) | Neural-GARCH(1,1)-Student's t
EURGBP,EURCHF | -643.521 | -558.275 | -523.725 | $\boldsymbol{-513.214}$
GBPJPY,GBPUSD | -629.950 | -656.198 | -649.221 | $\boldsymbol{-605.305}$
AUDCHF,AUDJPY | -534.49 | -522.934 | -497.726 | $\boldsymbol{-477.992}$
EURGBP,EURUSD,EURJPY | -920.085 | -959.420 | -985.156 | $\boldsymbol{-917.907}$
USDCAD,USDCHF,USDJPY | -1008.821 | -998.041 | -990.601 | $\boldsymbol{-957.912}$
EURGBP,GBPJPY,USDJPY | $\boldsymbol{-916.957}$ | -943.66 | -1011.435 | -966.806
GBPAUD,GBPJPY,GBPUSD | -971.522 | -991.8238 | -1037.296 | $\boldsymbol{-967.500}$
EURCHF,EURGBP,EURJPY,EURUSD | -1196.477 | -1127.192 | -1105.298 | $\boldsymbol{-1078.165}$
AUDJPY,AUDCHF,EURCHF,GBPJPY | -1505.540 | -862.995 | -865.471 | $\boldsymbol{-783.955}$

In order to evaluate whether the models' performance differences across time series are statistically significant, we plot a critical difference (CD) diagram following the approach of Ismail Fawaz et al. (2019): a Friedman test at $\alpha=0.05$ (Friedman, 1940) is first used to reject the null hypothesis that the models are equivalent and have equal rankings, and a post-hoc analysis is then performed using Wilcoxon signed-rank tests (Wilcoxon, 1945) at the 95% confidence level. The critical difference diagram shows the average ranking of each model across the different datasets.
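
The testing procedure can be sketched as follows (our illustration; `scores` is a hypothetical array of test log-likelihoods, one row per dataset and one column per model):

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon, rankdata

def compare_models(scores, alpha=0.05):
    """scores: (n_datasets, n_models) test log-likelihoods, higher is better."""
    _, p = friedmanchisquare(*scores.T)          # H0: all models rank equally
    print(f"Friedman test p-value: {p:.4f}")
    if p < alpha:                                # pairwise post-hoc comparisons
        m = scores.shape[1]
        for i in range(m):
            for j in range(i + 1, m):
                _, p_ij = wilcoxon(scores[:, i], scores[:, j])
                print(f"model {i} vs model {j}: Wilcoxon p = {p_ij:.4f}")
    # average rank per model (rank 1 = best), as shown on a CD diagram
    print("average ranks:", rankdata(-scores, axis=1).mean(axis=0))
```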

In Figure 3 we show the CD plot for the univariate time series. A bold horizontal line indicates no significant difference amongst the group of models on the line. In the univariate experiments we observe no significant difference amongst EGARCH(1,1,1)-Student's t, GARCH(1,1)-Student's t and Neural-GARCH(1,1); likewise, there is no significant difference amongst GARCH(1,1)-Student's t, Neural-GARCH(1,1), GARCH(1,1)-Normal and EGARCH(1,1,1)-Normal. We also observe that, on average, GARCH(1,1)-Normal and EGARCH(1,1,1)-Normal perform significantly better than EGARCH(1,1,1)-Student's t. We establish that Neural-GARCH(1,1)-Student's t is the best performer overall on the univariate datasets, significantly outperforming the other models with an average rank of 1.8929.

Figure 3: Critical difference diagram of the univariate experiments. A horizontal bold line indicates no significant difference amongst the group of models. We establish that Neural-GARCH(1,1)-Student's t is the best performer in the univariate experiments.
Figure 4: Critical difference diagram showing the average rankings of GARCH(1,1) and Neural-GARCH(1,1) with normal and Student's t innovations on all time series experiments. We find that Neural-GARCH(1,1)-Student's t is the best-performing model with an average rank of 1.4324.

In Figure 4 we show the CD plot constructed using all the time series experiments (univariate and multivariate); our aim here is to compare the class of traditional GARCH(1,1) models against their neural network adaptations. We observe no significant difference between GARCH(1,1)-Student's t, Neural-GARCH(1,1) and GARCH(1,1)-Normal, and we establish that Neural-GARCH(1,1)-Student's t is the best performer overall with an average ranking of 1.4324.

For a GARCH(1,1) model, the returns process is often assumed to be stationary with a constant unconditional mean and variance; Neural-GARCH(1,1) relaxes this stationarity assumption. The unconditional variance of Neural-GARCH(1,1) in the univariate case

$\sigma_t^2 = \omega_t + \alpha_t r_{t-1}^2 + \beta_t\sigma_{t-1}^2$ (23)

is obtained by taking the expectation of (23):

$\mathbb{E}[r_t^2] = \mathbb{E}[\omega_t + \alpha_t r_{t-1}^2 + \beta_t\sigma_{t-1}^2] = \omega_t + \alpha_t\mathbb{E}[r_{t-1}^2] + \beta_t\mathbb{E}[\sigma_{t-1}^2] = \omega_t + (\alpha_t+\beta_t)\mathbb{E}[r_{t-1}^2]$. (24)

For a GARCH(1,1) model with constant coefficients $\{\omega,\alpha,\beta\}$, we have $\mathbb{E}[r_t^2]=\mathbb{E}[r_{t-1}^2]$ (constant unconditional variance) and therefore $\mathbb{E}[r_t^2]=\frac{\omega}{1-\alpha-\beta}$. With Neural-GARCH(1,1), $\mathbb{E}[r_t^2]\neq\mathbb{E}[r_{t-1}^2]$; however, we can assume that the parameters $\{\omega_t,\alpha_t,\beta_t\}$ change gradually with no sudden jumps, so that $\mathbb{E}[r_t^2]\approx\mathbb{E}[r_{t-1}^2]$ (Bringmann et al., 2017), and we can approximate the time-varying unconditional variance of Neural-GARCH(1,1) by $\mathbb{E}[r_t^2]\approx\frac{\omega_t}{1-\alpha_t-\beta_t}$, with $\alpha_t+\beta_t<1$.

Our analysis of the Neural-GARCH(1,1) coefficients shows a consistent pattern when compared to GARCH(1,1) models. We provide an example for the currency pair USDCHF in Figure 5, which shows the time-varying parameter set $\{\omega_t,\alpha_t,\beta_t\}$ of Neural-GARCH(1,1) against the constant set $\{\omega,\alpha,\beta\}$ of GARCH(1,1). Across different time series, Neural-GARCH(1,1) consistently estimates higher values for $\omega$ and $\alpha$, and a lower value for $\beta$. In Figure 6 we show zoomed-in views of the Neural-GARCH(1,1) coefficients from Figure 5 for the currency pair USDCHF. The coefficients exhibit well-behaved time-varying dynamics, and similar dynamics are observed across all three parameters. This shows the effectiveness of our learned-prior neural network ($MLP_{pred}$), which models the distribution $P_\theta(\boldsymbol{\gamma}_t|\boldsymbol{\gamma}_{t-1},\boldsymbol{r}_{1:t-1})$.

Figure 5: Plots of Neural-GARCH(1,1) coefficients against GARCH(1,1) coefficients. The blue line shows the Neural-GARCH(1,1) $\alpha_t$ (left), $\beta_t$ (middle) and $\omega_t$ (right); the orange line shows the GARCH(1,1) coefficients.
Figure 6: Zoomed-in plots of the Neural-GARCH(1,1) coefficients shown in Figure 5 for USDCHF.

Having time-varying coefficients allows us to model the financial returns time series as a non-stationary process with a zero unconditional mean but time-varying unconditional variance. Similarly, the authors in Stǎricǎ and Granger (2005) reported that relaxing the stationarity assumption on daily S&P 500 returns and using locally stationary linear models achieved better forecasting performance, and their analysis showed most of the dynamics of the returns time series to be concentrated in shifts of the unconditional variance. Our model provides a data-driven approach to modelling the returns process: during model training we optimise the neural network parameters without imposing any external constraints, yet we observe in Figure 6 that the model nonetheless outputs time-varying coefficients satisfying $\alpha_t+\beta_t<1$, which is required for the model to have a well-defined unconditional variance.

5 Conclusions

In this paper we propose neural GARCH: a neural network adaptation of the univariate GARCH(1,1) and multivariate diagonal BEKK(1,1) models for modelling conditional heteroskedasticity in financial time series. Our model consists of a recurrent neural network that captures the temporal dynamics of the returns process and a multilayer perceptron that predicts the next-step-ahead GARCH coefficients, which are then used to determine the conditional volatilities. The generative model of neural GARCH makes predictions based on all available information, and the inference model makes updated posterior estimates of the GARCH coefficients when new information becomes available. We tested two versions of neural GARCH on univariate and multivariate financial returns time series: one with normal innovations and the other with Student's t innovations. When compared against their GARCH counterparts, we observe that neural GARCH Student's t is the best performer, and from our analysis we hypothesise that this is due to the neural network's ability to capture complex temporal dynamics present in the time series, and to the relaxation of the stationarity assumption that is fundamental to traditional GARCH models.

Acknowledgement

The authors would like to thank Fabio Caccioli, Department of Computer Science, University College London, for proofreading the manuscript and providing feedback.

References

  • Bringmann et al. (2017) Bringmann, L.F., Hamaker, E.L., Vigo, D.E., Aubert, A., Borsboom, D. and Tuerlinckx, F., Changing dynamics: Time-varying autoregressive models using generalized additive modeling. Psychological Methods, 2017, 22, 409–425.
  • Bauwens et al. (2006) Bauwens, L., Laurent, S. and Rombouts, J.V., Multivariate GARCH models: A survey. Journal of Applied Econometrics, 2006, 21, 79–109.
  • Bayer and Osendorfer (2014) Bayer, J. and Osendorfer, C., Learning Stochastic Recurrent Networks. arXiv preprint, 2014, pp. 1–9.
  • Bollerslev (1986) Bollerslev, T., Generalised Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 1986, 31, 307–327.
  • Bollerslev (1987) Bollerslev, T., A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return. The Review of Economics and Statistics, 1987, 69, 542–547.
  • Bollerslev et al. (1988) Bollerslev, T., Engle, R.F. and Wooldridge, J.M., A Capital Asset Pricing Model with Time-Varying Covariances. Journal of Political Economy, 1988, 96, 116–131.
  • Chan and Grant (2016) Chan, J.C. and Grant, A.L., Modeling energy price dynamics: GARCH versus stochastic volatility. Energy Economics, 2016, 54, 182–189.
  • Cho et al. (2014) Cho, K., van Merrienboer, B., Bahdanau, D. and Bengio, Y., On the Properties of Neural Machine Translation: Encoder–Decoder Approaches. In Proceedings of the Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), 2014.
  • Christodoulakis and Satchell (2002) Christodoulakis, G.A. and Satchell, S.E., Correlated ARCH (CorrARCH): Modelling the time-varying conditional correlation between financial asset returns. European Journal of Operational Research, 2002, 139, 351–370.
  • Chu et al. (2017) Chu, J., Chan, S., Nadarajah, S. and Osterrieder, J., GARCH Modelling of Cryptocurrencies. Journal of Risk and Financial Management, 2017, 10, 17.
  • Chung et al. (2015) Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A. and Bengio, Y., A recurrent latent variable model for sequential data. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 2015-January, pp. 2980–2988, 2015.
  • Denton and Fergus (2018) Denton, E. and Fergus, R., Stochastic Video Generation with a Learned Prior. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Vol.  3, pp. 1906–1919, 2018.
  • Engle (1982) Engle, R., Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica, 1982, 50, 987–1007.
  • Engle (2002) Engle, R., Dynamic Conditional Correlation. Journal of Business and Economic Statistics, 2002, 20, 339–350.
  • Engle and Kroner (1995) Engle, R. and Kroner, K., Multivariate Simultaneous Generalized ARCH. Econometric Theory, 1995, 11, 122–150.
  • Erten et al. (2012) Erten, I., Murat, M. and Okay, N., Volatility Spillovers in Emerging Markets During the Global Financial Crisis : Diagonal BEKK Approach. Munich Personal RePEc Archive, 2012, pp. 1–18.
  • Fabius and van Amersfoort (2015) Fabius, O. and van Amersfoort, J.R., Variational recurrent auto-encoders. 3rd International Conference on Learning Representations, ICLR 2015 - Workshop Track Proceedings, 2015, pp. 1–5.
  • Fraccaro et al. (2016) Fraccaro, M., Sønderby, S.K., Paquet, U. and Winther, O., Sequential neural models with stochastic layers. In Proceedings of the Advances in Neural Information Processing Systems, pp. 2207–2215, 2016.
  • Franceschi et al. (2020) Franceschi, J.Y., Delasalles, E., Chen, M., Lamprier, S. and Gallinari, P., Stochastic latent residual video prediction. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Vol. PartF16814, pp. 3191–3204, 2020.
  • Friedman (1940) Friedman, M., A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics, 1940, 11, 86–92.
  • Glosten et al. (1993) Glosten, L.R., Jagannathan, R. and Runkle, D.E., On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. The Journal of Finance, 1993, 48, 1779–1801.
  • Heracleous (2007) Heracleous, M., Sample Kurtosis, GARCH-t and the Degrees of Freedom Issue. EUR Working Papers, 2007.
  • Ismail Fawaz et al. (2019) Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L. and Muller, P.A., Deep learning for time series classification: a review. Data Mining and Knowledge Discovery, 2019, 33, 917–963.
  • Karl et al. (2017) Karl, M., Soelch, M., Bayer, J. and Van Der Smagt, P., Deep variational Bayes filters: Unsupervised learning of state space models from raw data. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings, ii, pp. 1–13, 2017.
  • Kingma and Welling (2014) Kingma, D.P. and Welling, M., Auto-encoding variational bayes. In Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014 - Conference Track Proceedings, pp. 1–14, 2014.
  • Krishnan et al. (2017) Krishnan, R.G., Shalit, U. and Sontag, D., Structured inference networks for nonlinear state space models. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, AAAI 2017, pp. 2101–2109, 2017.
  • Malik (2005) Malik, A.K., European exchange rate volatility dynamics: An empirical investigation. Journal of Empirical Finance, 2005, 12, 187–215.
  • Nelson (1991) Nelson, D., Conditional Heteroskedasticity in Asset Returns : A New Approach. Econometrica, 1991, 59, 347–370.
  • Rangapuram et al. (2018) Rangapuram, S.S., Seeger, M., Gasthaus, J., Stella, L., Wang, Y. and Januschowski, T., Deep state space models for time series forecasting. In Proceedings of the Advances in Neural Information Processing Systems, pp. 7785–7794, 2018.
  • Stǎricǎ and Granger (2005) Stǎricǎ, C. and Granger, C., Nonstationarities in stock returns. Review of Economics and Statistics, 2005, 87, 503–522.
  • Tse and Tsui (2002) Tse, Y.K. and Tsui, A.K., A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business and Economic Statistics, 2002, 20, 351–362.
  • Van Der Weide (2002) Van Der Weide, R., GO-GARCH: A multivariate generalized orthogonal GARCH model. Journal of Applied Econometrics, 2002, 17, 549–564.
  • Wilcoxon (1945) Wilcoxon, F., Individual comparisons by ranking methods. Biometrics Bulletin, 1945, 1, 80–83.
  • Wu et al. (2013) Wu, Y., Lobato, J.M.H. and Ghahramani, Z., Dynamic covariance models for multivariate financial time series. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Vol.  28, pp. 1595–1603, 2013.