
Censored EM algorithm for Weibull mixtures: application to arrival times of market orders

Markus Kreer CSSM, Faculty of Sciences, School of Physical Sciences, Department of Physics, University of Adelaide, 5005, Adelaide, Australia. CAMPUSERVICE GmbH, Servicegesellschaft der Johann Wolfgang Goethe-Universitat Frankfurt, Rossertstrasse 2, 60323 Frankfurt am Main, Germany Feldbergschule, Oberhochstadter Str., 20, 61440, Oberrursel (Taunus), Germany    Ayşe Kızılersü CSSM, Faculty of Sciences, School of Physical Sciences, Department of Physics, University of Adelaide, 5005, Adelaide, Australia. [email protected]    Anthony W. Thomas CSSM, Faculty of Sciences, School of Physical Sciences, Department of Physics, University of Adelaide, 5005, Adelaide, Australia.
Abstract

In a previous analysis the problem of “zero-inflated” time data (caused by high frequency trading in the electronic order book) was handled by left-truncating the inter-arrival times. We demonstrated, using rigorous statistical methods, that the Weibull distribution describes the corresponding stochastic dynamics for all inter-arrival time differences except in the region near zero. However, since the truncated Weibull distribution was not able to describe the huge “zero-inflated” probability mass in the neighbourhood of zero (making up approximately 50% of the data for limit orders), it became clear that the entire probability distribution is a mixture distribution of which the Weibull distribution is a significant part. Here we use a censored EM algorithm to analyse data for the difference of the arrival times of market orders, which usually have a much lower percentage of zero inflation, for four selected stocks trading on the London Stock Exchange.

preprint: ADP-20-34/T1144

I Introduction

Electronic Order Book (EOB) trading is now common to most stock exchanges. A set of EOB data from the FTSE for the period June to September 2010 was described and partly analysed in Ref.[1]. In this data set the timestamps of changes in the EOB were given in milliseconds and, because of the huge volume of EOB trading by (ultra-)high frequency trading algorithms taking place on a microsecond scale, many differences of timestamps appeared to be zero. This is why the timestamp time series appeared to be "zero-inflated". Because of this rounding error, all differences in timestamps, $\Delta t$, are mapped to our actual data set as follows

$$\begin{aligned}
\Delta t\in(0.0,0.5)=\mathcal{I}_{1} &\longrightarrow 0=c_{1}\\
\Delta t\in[0.5,1.5)=\mathcal{I}_{2} &\longrightarrow 1=c_{2}\\
\Delta t\in[1.5,2.5)=\mathcal{I}_{3} &\longrightarrow 2=c_{3}\\
&\;\;\text{etc.}
\end{aligned}$$

where the integer numbers $c_{1},c_{2},\ldots$ are the rounded timestamp differences to the precision of milliseconds.
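The rounding map above can be illustrated with a small helper (an illustrative sketch; the function name is ours, not from the paper):

```python
def recorded_value(dt):
    """Map a raw time difference dt (in ms, with sub-ms precision) to the
    reported integer millisecond difference c_l, following the intervals above:
    (0.0, 0.5) -> 0, [0.5, 1.5) -> 1, [1.5, 2.5) -> 2, ...
    Note: int(dt + 0.5) is used instead of round() to avoid Python's
    banker's rounding at the half-way points (round(0.5) == 0)."""
    return int(dt + 0.5)
```

A burst of sub-millisecond arrivals is thus reported as a run of zeros, which is exactly the "zero-inflation" described above.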

In Ref.[1] it was demonstrated that sets of non-negative time differences between subsequent market orders (MO) with $\Delta t>10$ milliseconds describe a random variable $X$ that could be fitted to parametric distributions, such as a left-truncated Weibull distribution, which passed a rigorous set of goodness-of-fit tests: Kolmogorov-Smirnov, Anderson-Darling, Cramer-von Mises and Kuiper. (The analysis in Ref.[1] also included limit orders (LO), but these are not the topic of this letter.)

Here we analyse the entire set of observations of the random variable $X$, including the "zero-inflated" ones, by starting from a Weibull distribution for the whole range

$$f_{i}(x|\alpha_{i},\beta_{i})=\frac{\beta_{i}}{\alpha_{i}}\left(\frac{x}{\alpha_{i}}\right)^{\beta_{i}-1}e^{-\left(\frac{x}{\alpha_{i}}\right)^{\beta_{i}}}\,, \qquad (1)$$

where $\alpha_{i}>0$ is the scale parameter and $\beta_{i}>0$ the shape parameter. Note that for $\beta_{i}=1$ we recover the exponential distribution. We denote the parameter vector by $\theta_{i}=(\alpha_{i},\beta_{i})$ and use the subscript index $i$ to denote various Weibull or exponential distributions with different parameter vectors $\theta_{i}$.
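For reference, Eq.(1) can be written directly in code (a minimal sketch):

```python
import math

def weibull_pdf(x, alpha, beta):
    """Weibull density of Eq. (1), with scale alpha > 0 and shape beta > 0."""
    z = (x / alpha) ** beta
    return (beta / alpha) * (x / alpha) ** (beta - 1.0) * math.exp(-z)
```

For `beta = 1` this reduces, as noted above, to the exponential density $(1/\alpha)e^{-x/\alpha}$.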

Now, the attempt to analyse the entire data set including the "zero-inflated" part (which sometimes was even more than half of the observed data set) requires a different procedure. One promising approach is to apply interval censoring to the observed data for the intervals of small time differences $\Delta t$: we use the above intervals $\mathcal{I}_{1},\mathcal{I}_{2},\mathcal{I}_{3},\ldots$ together with the information of how many observations belong to them, i.e. $N_{1},N_{2},N_{3},\ldots$. In this notation we can write an observed sample of a positive random variable $X$ as

$$x_{1},x_{2},\ldots,x_{n},x_{n+1},x_{n+2},\ldots,x_{N}=x_{1},x_{2},\ldots,x_{n},\underbrace{c_{1},c_{1},\ldots,c_{1}}_{N_{1}\ \text{times}},\ldots,\underbrace{c_{L},c_{L},\ldots,c_{L}}_{N_{L}\ \text{times}}\,, \qquad (2)$$

where, after grouping, all observations with index greater than $n$ have been censored (with $L$ censoring intervals).
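The grouping in Eq.(2) amounts to a frequency count over the censored codes; a hypothetical helper (names are ours, not from the paper):

```python
from collections import Counter

def group_censored(values, censored_codes):
    """Arrange a sample as in Eq. (2): exact observations x_1, ..., x_n are
    kept as-is, while censored observations are collapsed into (c_l, N_l)
    pairs, i.e. each censored code together with its count."""
    uncensored = [v for v in values if v not in censored_codes]
    counts = Counter(v for v in values if v in censored_codes)
    return uncensored, sorted(counts.items())
```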

Despite censoring, the random variable $X$ itself will be assumed to come from a mixed distribution with $M$ components. As a consequence of the findings in Ref.[1], we expect that a significant component of this mixed distribution should come from a Weibull distribution. Similar results were found in Ref.[3]: the authors found that the differences between subsequent MOs for 30 DJIA stocks from the NYSE in October 1999 can be well described by Weibull distributions. Thus we want to investigate here mixtures of the following kind

$$f(x|\theta)=\sum_{i=1}^{M}\Pi_{i}f_{i}(x|\theta_{i}) \qquad (3)$$

where the weights $\Pi_{i}$ satisfy $\Pi_{i}\geq 0$ with $\sum_{i=1}^{M}\Pi_{i}=1$, and the probability density functions $f_{i}(\cdot|\theta_{i})$ are given by Eq.(1); the parameters $\theta_{i}$ need to be estimated by maximum likelihood estimation or other methods. (In the following the index $i$ stands for the mixture components and the index $j$ for the number of the observation.) Defining $\Theta=(\theta_{1},\ldots,\theta_{M})$, the likelihood function $\mathcal{L}$ for $L$ censoring intervals $\mathcal{I}_{1},\ldots,\mathcal{I}_{L}$ is now given, as in Ref.[5], by

$$\mathcal{L}=f(x_{1}|\Theta)\cdots f(x_{n}|\Theta)\cdot\left(\int_{\mathcal{I}_{1}}dy\,f(y|\Theta)\right)^{N_{1}}\cdots\left(\int_{\mathcal{I}_{L}}dy\,f(y|\Theta)\right)^{N_{L}} \qquad (4)$$

which is usually a difficult expression to handle. However, given our data sample Eq.(2), the estimation problem would be a standard maximum likelihood estimation (MLE) problem if we knew indicators $z_{ij}$ (taking values 0 or 1) denoting which observation $x_{j}$ belongs to which probability density function $f_{i}(\cdot|\theta_{i})$, and likewise indicators $\tilde{z}_{i\ell}$ (also taking values 0 or 1) denoting which censored observation $c_{\ell}$ belongs to which distribution. In this case, we would just group the observations and thus factorize the likelihood, $\mathcal{L}=\mathcal{L}_{1}\cdots\mathcal{L}_{M}$, and separately maximize each group likelihood $\mathcal{L}_{i}$ corresponding to one mixture component $f_{i}(\cdot|\theta_{i})$.

The expectation-maximization (EM) algorithm (Refs.[6, 7, 8]) will be used to iteratively generate estimates for the indicators $z_{ij}$ for the uncensored observations $x_{j}$ and $\tilde{z}_{i\ell}$ for the censored observations $c_{\ell}$, and to solve the estimation problem (provided it converges). The problem for censored mixtures has been discussed in some detail in Ref.[9]. In our study we apply this analysis to mixtures of exponential and Weibull distributions and study market order (MO) inter-arrival times. In Section 2 we give a very brief review of the censored EM algorithm as given by Ref.[9], and in Section 3 the relevant equations for Weibull distributions are given. Section 4 summarizes our results and conclusions.

II Censored EM algorithm for mixtures in a nutshell

The EM algorithm is an iterative procedure in which an expectation step (E-step) estimates the unobservable indicators $z_{ij}$ and $\tilde{z}_{i\ell}$, and the result is then used in the maximization step (M-step) to estimate the parameters of the mixtures by an MLE. After the M-step there is another E-step, and so on. Here we follow closely Ref.[9], where a hint is given that this iteration converges for certain function families (including exponential and Weibull distribution functions) in a certain limited parameter range. In the following sections the integer index $k$ denotes the number of the iteration.

II.1 The E-step

Using the above notation and assuming that the parameter vectors $\theta_{i}^{(k-1)}$ for each mixture component $i$ are known from a previous step, Eq.(3.3) of Ref.[9] provides the following term for the uncensored observations, $j=1,\ldots,n$, for the mixture component $i$

$$z_{ij}^{(k)}=\frac{f_{i}(x_{j}|\theta_{i}^{(k-1)})}{\sum_{i'=1}^{M}\Pi_{i'}^{(k-1)}f_{i'}(x_{j}|\theta_{i'}^{(k-1)})}\cdot\Pi_{i}^{(k-1)} \qquad (5)$$

and for the censored observations in the censoring intervals $\mathcal{I}_{\ell}=[\xi_{\ell-1},\xi_{\ell})$ with $\ell=1,2,\ldots,L$, Eq.(3.4) of Ref.[9] provides the following expression for the $i$th mixture component

$$\tilde{z}_{i\ell}^{(k)}=\frac{\int_{\mathcal{I}_{\ell}}dy\,f_{i}(y|\theta_{i}^{(k-1)})}{\sum_{i'=1}^{M}\Pi_{i'}^{(k-1)}\int_{\mathcal{I}_{\ell}}dy\,f_{i'}(y|\theta_{i'}^{(k-1)})}\cdot\Pi_{i}^{(k-1)} \qquad (6)$$
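Equations (5) and (6) can be sketched as follows, assuming all components are Weibull densities as in Eq.(1) (with $\beta=1$ giving the exponential); the interval integrals reduce to differences of the Weibull CDF:

```python
import math

def weibull_pdf(x, a, b):
    """Weibull density, Eq. (1); b = 1 gives the exponential density."""
    return (b / a) * (x / a) ** (b - 1.0) * math.exp(-((x / a) ** b))

def weibull_cdf(x, a, b):
    """Weibull CDF, so each interval probability is a CDF difference."""
    return 1.0 - math.exp(-((x / a) ** b))

def e_step(xs, intervals, weights, params):
    """E-step: responsibilities z_ij (Eq. 5) for the uncensored points xs and
    z~_il (Eq. 6) for the censoring intervals [lo, hi).
    params is a list of (alpha, beta) pairs, one per mixture component."""
    z = []
    for x in xs:
        num = [w * weibull_pdf(x, a, b) for w, (a, b) in zip(weights, params)]
        s = sum(num)
        z.append([v / s for v in num])
    z_tilde = []
    for lo, hi in intervals:
        num = [w * (weibull_cdf(hi, a, b) - weibull_cdf(lo, a, b))
               for w, (a, b) in zip(weights, params)]
        s = sum(num)
        z_tilde.append([v / s for v in num])
    return z, z_tilde
```

By construction, each row of responsibilities sums to one.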

II.2 The M-step

The sample size, $N=n+\sum_{\ell=1}^{L}N_{\ell}$, is the number of all observations (uncensored and censored), and the new weights after the E-step are given by the following update rule (Eq.(3.7) of Ref.[9]):

$$\Pi_{i}^{(k)}=\frac{1}{N}\sum_{j=1}^{n}\frac{f_{i}(x_{j}|\theta_{i}^{(k-1)})}{\sum_{i'=1}^{M}\Pi_{i'}^{(k-1)}f_{i'}(x_{j}|\theta_{i'}^{(k-1)})}\cdot\Pi_{i}^{(k-1)}+\frac{1}{N}\sum_{\ell=1}^{L}N_{\ell}\frac{\int_{\mathcal{I}_{\ell}}dy\,f_{i}(y|\theta_{i}^{(k-1)})}{\sum_{i'=1}^{M}\Pi_{i'}^{(k-1)}\int_{\mathcal{I}_{\ell}}dy\,f_{i'}(y|\theta_{i'}^{(k-1)})}\cdot\Pi_{i}^{(k-1)}\,.$$

Note that this update rule is just the same as the following formula

$$\Pi_{i}^{(k)}=\frac{1}{N}\left(\sum_{j=1}^{n}z_{ij}^{(k)}+\sum_{\ell=1}^{L}N_{\ell}\tilde{z}_{i\ell}^{(k)}\right) \qquad (7)$$
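The update rule Eq.(7) is then a weighted average of the responsibilities (a minimal sketch; `counts` holds the interval counts $N_\ell$):

```python
def update_weights(z, z_tilde, counts):
    """Mixture-weight update, Eq. (7): average the responsibilities over all
    N observations, with each censoring interval weighted by its count N_l."""
    M = len(z[0]) if z else len(z_tilde[0])
    n = len(z)
    N = n + sum(counts)
    new_weights = []
    for i in range(M):
        total = sum(row[i] for row in z)
        total += sum(Nl * zt[i] for Nl, zt in zip(counts, z_tilde))
        new_weights.append(total / N)
    return new_weights
```

Since each responsibility row sums to one, the new weights again sum to one.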

Now, the following expression needs to be maximised with respect to the parameter vectors $\theta_{i}^{(k)}$, where the index $i$ refers to the mixture component and $k$ to the current step of the iterative procedure

$$\sum_{i=1}^{M}\sum_{j=1}^{n}z_{ij}^{(k)}\log f_{i}(x_{j}|\theta_{i}^{(k)})+\sum_{i=1}^{M}\sum_{\ell=1}^{L}N_{\ell}\,\tilde{z}_{i\ell}^{(k)}\int_{\mathcal{I}_{\ell}}dy\,\log f_{i}(y|\theta_{i}^{(k)})\,h_{i}(y|c_{\ell},\theta_{i}^{(k-1)}) \qquad (8)$$

Here, the conditional density function on the censoring interval $\mathcal{I}_{\ell}$ is given by Eq.(3.5) of Ref.[9]:

$$h_{i}(y|c_{\ell},\theta_{i}^{(k-1)})=\frac{f_{i}(y|\theta_{i}^{(k-1)})}{\int_{\mathcal{I}_{\ell}}dy'\,f_{i}(y'|\theta_{i}^{(k-1)})} \qquad (9)$$

The MLE equations will be obtained by differentiating Eq.(8) with respect to the components of the parameter vector $\theta_{i}^{(k)}$. Note that in this expression terms like $\sum_{i,j}z_{ij}^{(k)}\log\Pi_{i}^{(k)}$ and the normalization condition $\mu\cdot\left(\sum_{i}\Pi_{i}^{(k)}-1\right)$ are not given, because they depend only on the parameter vector $\theta_{i}^{(k-1)}$ from the previous iteration and are irrelevant for the maximization with respect to the parameter vector $\theta_{i}^{(k)}$. Note also that the maximization problem decouples into independent maximization problems for each individual probability distribution.

III Implementation of the censored Weibull mixtures

For simplicity we now consider a 2-component mixture consisting of an exponential distribution, denoted by $i=1$, and a general Weibull distribution, denoted by $i=2$, as given in Eq.(1). Note that here $\alpha$ and $\beta$ carry a subscript $i$ for the mixture component and a superscript $(k)$ for the iteration step of the EM algorithm. The expression corresponding to Eq.(8) is given in Appendix A. Our algorithmic results extend those given by Ref.[10] for exponential mixtures. It is clear how to modify our computations for arbitrary mixtures, e.g. $(p+r)$ mixtures consisting of $p$ exponentials and $r$ Weibulls, with $p,r=0,1,2,\ldots$

III.1 MLE equations for M-step

Maximising the expression in Appendix A with respect to the first parameter, $\alpha_{1}^{(k)}$, leads for the exponential distribution (index $i=1$) to an explicit solution

$$\alpha_{1}^{(k)}=\frac{\sum_{j=1}^{n}z_{1j}^{(k)}x_{j}+\sum_{\ell=1}^{L}N_{\ell}\tilde{z}_{1\ell}^{(k)}C_{1\ell}^{(k-1)}}{\sum_{j=1}^{n}z_{1j}^{(k)}+\sum_{\ell=1}^{L}N_{\ell}\tilde{z}_{1\ell}^{(k)}} \qquad (10)$$

where the quantity $C_{1\ell}^{(k-1)}$ is defined in Appendix A, Eq.(15).
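Equation (10) can be implemented once $C_{1\ell}^{(k-1)}$ is available. Consistent with the M-step derivation via Eq.(9), this quantity plays the role of the conditional mean of the exponential distribution on the censoring interval under the previous parameter; since Appendix A's Eq.(15) is not reproduced in full here, treat that identification as our assumption in the sketch below:

```python
import math

def cond_mean_exponential(lo, hi, alpha):
    """E[Y | Y in [lo, hi)] for an exponential with scale alpha. This is the
    role we assume the appendix quantity C_{1l}^{(k-1)} plays in Eq. (10),
    evaluated at the previous iterate alpha_1^{(k-1)}."""
    p_lo, p_hi = math.exp(-lo / alpha), math.exp(-hi / alpha)
    return alpha + (lo * p_lo - hi * p_hi) / (p_lo - p_hi)

def update_alpha_exp(xs, z1, intervals, counts, z1_tilde, alpha_prev):
    """Explicit exponential-scale update of Eq. (10)."""
    num = sum(z * x for z, x in zip(z1, xs))
    den = sum(z1)
    for (lo, hi), Nl, zt in zip(intervals, counts, z1_tilde):
        num += Nl * zt * cond_mean_exponential(lo, hi, alpha_prev)
        den += Nl * zt
    return num / den
```

Without censoring the update reduces to the responsibility-weighted sample mean, as expected for an exponential MLE.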

The MLE equations for the Weibull distribution (index $i=2$) are obtained by computing $\partial/\partial\alpha_{2}^{(k)}$ and equating the expression to 0,

$$0=\sum_{j=1}^{n}z_{2j}^{(k)}\left[-1+\left(\frac{x_{j}}{\alpha_{2}^{(k)}}\right)^{\beta_{2}^{(k)}}\right]+\sum_{\ell=1}^{L}N_{\ell}\,\tilde{z}_{2\ell}^{(k)}\left\{-1+\left(\frac{\alpha_{2}^{(k-1)}}{\alpha_{2}^{(k)}}\right)^{\beta_{2}^{(k)}}\frac{\Gamma\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1,\zeta_{\ell-1}\right)-\Gamma\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1,\zeta_{\ell}\right)}{e^{-\zeta_{\ell-1}}-e^{-\zeta_{\ell}}}\right\} \qquad (11)$$

and likewise for $\partial/\partial\beta_{2}^{(k)}$

$$0=\sum_{j=1}^{n}z_{2j}^{(k)}\left[\frac{1}{\beta_{2}^{(k)}}+\log\frac{x_{j}}{\alpha_{2}^{(k)}}-\left(\frac{x_{j}}{\alpha_{2}^{(k)}}\right)^{\beta_{2}^{(k)}}\log\frac{x_{j}}{\alpha_{2}^{(k)}}\right]+\sum_{\ell=1}^{L}N_{\ell}\,\tilde{z}_{2\ell}^{(k)}\left\{\frac{1}{\beta_{2}^{(k)}}+\log\frac{\alpha_{2}^{(k-1)}}{\alpha_{2}^{(k)}}+\frac{D_{2\ell}^{(k)}}{e^{-\zeta_{\ell-1}}-e^{-\zeta_{\ell}}}\right\} \qquad (12)$$

The auxiliary functions $D_{2\ell}^{(k)}$, as well as $\zeta_{\ell}$, are defined in Appendix A. Reference [11] obtained similar results for the complete finite mixture of Weibull distributions, albeit without the censoring terms displayed here.
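Anticipating the self-consistency simplification of Sec. III.2 (setting $\beta_{2}^{(k)}=\beta_{2}^{(k-1)}$), the incomplete gamma functions in Eq.(11) reduce to $\Gamma(2,x)=(1+x)e^{-x}$ and the equation becomes explicit in $\alpha_{2}^{(k)}$. A sketch under that assumption (the closed-form rearrangement is ours):

```python
import math

def upper_gamma2(x):
    """Upper incomplete gamma Gamma(2, x) = (1 + x) * exp(-x), closed form."""
    return (1.0 + x) * math.exp(-x)

def update_alpha_weibull(xs, z2, intervals, counts, z2_tilde, alpha_prev, beta):
    """Weibull scale update solving Eq. (11) with beta^(k) = beta^(k-1) = beta
    (step 4 of the algorithm in Sec. III.2). With zeta evaluated at the
    previous iterate, Eq. (11) rearranges to alpha = (B / A)**(1 / beta)."""
    A = sum(z2)
    B = sum(z * x ** beta for z, x in zip(z2, xs))
    for (lo, hi), Nl, zt in zip(intervals, counts, z2_tilde):
        zeta_lo = (lo / alpha_prev) ** beta
        zeta_hi = (hi / alpha_prev) ** beta
        G = (upper_gamma2(zeta_lo) - upper_gamma2(zeta_hi)) / \
            (math.exp(-zeta_lo) - math.exp(-zeta_hi))
        A += Nl * zt
        B += Nl * zt * alpha_prev ** beta * G
    return (B / A) ** (1.0 / beta)
```

Without censoring this is the standard profile MLE for the Weibull scale at fixed shape, $\alpha^{\beta}=\sum_j z_{2j}x_j^{\beta}/\sum_j z_{2j}$.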

III.2 Algorithmic implementation

We use one censoring interval only, $\mathcal{I}_{1}=(0,0.5)$ with $L=1$, to handle our zero-inflated data sets (see Ref.[1]). Also, in the spirit of Ref.[12], as a practical approximation to simplify the solution of these non-linear equations, we use a self-consistency assumption by setting the ratios $\alpha_{2}^{(k-1)}/\alpha_{2}^{(k)}$ and $\beta_{2}^{(k-1)}/\beta_{2}^{(k)}$ equal to 1 in the MLE equations for the Weibull parameters. This considerably simplifies the MLE equations, Eqs.(11) and (12), and in our experience leads to rapid convergence in most cases. (A comparison to the direct maximisation of the terms in Eq.(8), as given in Appendix A, leads to comparable results and may justify this trick to speed up convergence.) Note that if we already knew the exact solution $\alpha_{2}^{\infty}$ and $\beta_{2}^{\infty}$, i.e. if we started with the true fixed point, these equations would be trivially satisfied. For the EM algorithm we proceed as follows (we describe the censored EM algorithm for a mixture of 1 exponential + 1 Weibull; the generalisation to arbitrary mixtures is obvious):

  1. At $k=0$ initialize the mixture weights $\Pi_{i}^{(0)}$ and the initial values $\alpha_{1}^{(0)},\alpha_{2}^{(0)},\beta_{2}^{(0)}$.

  2. Compute $z_{ij}^{(k)}$, $\tilde{z}_{i\ell}^{(k)}$, $\Pi_{i}^{(k)}$ using Eqs.(5), (6), (7) for $k=1,2,\ldots$

  3. Compute $\alpha_{1}^{(k)}$ for the exponential distribution using Eq.(10).

  4. Compute $\alpha_{2}^{(k)}$ for the Weibull distribution using Eq.(11), putting here only $\beta_{2}^{(k)}=\beta_{2}^{(k-1)}$.

  5. Compute $\beta_{2}^{(k)}$ for the Weibull distribution using Eq.(12).

  6. Compute the current log-likelihood from Eqs.(3) and (4).

  7. If the absolute value of "log-likelihood at step $k$ minus log-likelihood at step $k-1$" is bigger than $\varepsilon>0$, put $k\rightarrow k+1$ and go back to step 2; otherwise terminate.
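Steps 6 and 7 evaluate the logarithm of the censored likelihood, Eq.(4), with the mixture density Eq.(3), and stop once it changes by less than $\varepsilon$. A minimal sketch (all components written as Weibulls, with $\beta=1$ for the exponential):

```python
import math

def censored_loglik(xs, intervals, counts, weights, params):
    """Log of Eq. (4): density terms for the uncensored points plus interval
    probabilities, each weighted by its count N_l.
    params is a list of (alpha, beta) pairs, one per mixture component."""
    def pdf(x, a, b):
        return (b / a) * (x / a) ** (b - 1.0) * math.exp(-((x / a) ** b))
    def cdf(x, a, b):
        return 1.0 - math.exp(-((x / a) ** b))
    ll = 0.0
    for x in xs:
        ll += math.log(sum(w * pdf(x, a, b)
                           for w, (a, b) in zip(weights, params)))
    for (lo, hi), Nl in zip(intervals, counts):
        p = sum(w * (cdf(hi, a, b) - cdf(lo, a, b))
                for w, (a, b) in zip(weights, params))
        ll += Nl * math.log(p)
    return ll
```

The stopping rule of step 7 is then simply `abs(ll_k - ll_prev) <= eps`.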

In our experience this version of the censored EM algorithm (based on the MLE equations) converges sufficiently fast to a desired maximum for suitable initial conditions. We have taken $\varepsilon=10^{-5}$, start with equal mixture weights, set $\beta=1$ and take values of $\alpha$ motivated by our previous study, Ref.[1]. When testing the algorithm, its results have been cross-checked against the corresponding algorithm which uses a direct maximisation of the objective function Eq.(8) in the M-step. Usually the results obtained both ways agree fairly well. In Ref.[9] it is claimed that there is local convergence almost surely.

IV Analysis of MO arrival times and model selection

IV.1 A first approach: “naive” analysis of entire data set

Here we take all time stamps of a given stock from 1st June to 30th September 2010 into one large sample and fit various mixture models to the inter-arrival times (i.e. the differences of subsequent timestamps) using the algorithm described above. This analysis would be sensible if the stochastic process were stationary and the sample data were independent and identically distributed (i.i.d.). In Table 1 we provide the log-likelihood per data point to obtain a first idea about the best model. The percentage numbers in round brackets denote the proportion of data points which have been censored. Note that the quantity "log-likelihood per data point", or "average log-likelihood", corresponds to a negative Shannon entropy per event, because for large sample size $N$ and i.i.d. data we have, under the usual consistency property of the maximum likelihood estimator,

$$\frac{1}{N}\log\mathcal{L}\sim\mathbb{E}(\log p)=\int\log p\cdot dp$$

The physical meaning of this quantity is the “information content” or “surprisal” when a new MO enters the EOB.

Table 1: Average log-likelihood

Model           dof   RIO      BARC     RRLN     ABFLN
                      (2.5%)   (2.3%)   (2.5%)   (2.1%)
1 exp + 1 wbl   4     -8.357   -8.294   -9.510   -10.074
0 exp + 2 wbl   5     -8.349   -8.288   -9.492   -10.060
3 exp + 0 wbl   5     -8.422   -8.394   -9.792   -10.323
2 exp + 1 wbl   6     -8.350   -8.290   -9.494   -10.061
1 exp + 2 wbl   7     -8.348   -8.288   -9.489   -10.057
4 exp + 0 wbl   7     -8.360   -8.300   -9.511   -10.081
0 exp + 3 wbl   8     -8.348   -8.288   -9.489   -10.054
3 exp + 1 wbl   8     -8.349   -8.289   -9.491   -10.055
5 exp + 0 wbl   9     -8.351   -8.291   -9.497   -10.063

We see from Table 1 that all models yield very similar Shannon entropy measures for an individual stock. Also, depending on the trading activity, the entropies differ: for a heavily traded stock such as RIO or BARC the "surprise" is lower than for less actively traded stocks such as RRLN or ABFLN. A model selection criterion here would be to pick the model with the least entropy, and thus a mixture of 3 Weibull distributions seems to be the best choice.

However, we know from Ref.[1] that the data are only i.i.d. for smaller subsamples, and thus the scale parameters in the exponential and Weibull distributions will exhibit a time dependence. When looking at smaller samples, it has become customary for model selection to relate the sample size to the degrees of freedom (dof) rather than to look only at the above Shannon entropy (see e.g. Ref.[15]). We have decided to use the Bayesian Information Criterion (BIC) (Refs.[16], [15])

$$\text{BIC}=-2\log\mathcal{L}+d\log N \qquad (13)$$

where $\mathcal{L}$ is the likelihood and $d$ is the number of degrees of freedom of the individual model. From Table 1 we can generate, hypothetically, Table 2 by using Eq.(13) with $N=200$ for "larger stocks" with higher trading activity (RIO, BARC) and $N=100$ for "smaller stocks" with lower trading activity (RRLN, ABFLN). These values of $N$ correspond to time intervals of approximately 10 minutes, and we have seen in Ref.[1] that these time intervals yield samples for which the assumption of stationarity of the data set can be somewhat justified. We clearly see that now mixtures with low dof are favored, in particular the mixture "1 exponential + 1 Weibull". Note also that the suggestion of Ref.[17], to model the arrival time distribution as a suitable mixture of exponential waiting times, will be excluded in the model selection using BIC by a too high value of the dof.
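Equation (13) is straightforward to evaluate; as a check, taking the RIO average log-likelihood $-8.357$ from Table 1 with $N=200$ and $d=4$ reproduces the first entry of Table 2:

```python
import math

def bic(loglik, dof, N):
    """Bayesian Information Criterion, Eq. (13): -2 log L + d log N."""
    return -2.0 * loglik + dof * math.log(N)

# First entry of Table 2: RIO, '1 exp + 1 wbl', dof = 4, N = 200, with
# total log-likelihood = average log-likelihood (-8.357) times N.
rio_bic = bic(-8.357 * 200, 4, 200)  # approximately 3363.99
```

Lower BIC is better, and the $d\log N$ term penalises the high-dof exponential mixtures mentioned above.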

Table 2: Expected BIC from Table 1 with different sample size $N$

Model           dof   RIO       BARC      RRLN      ABFLN
                      (2.5%)    (2.3%)    (2.5%)    (2.1%)
                      N=200     N=200     N=100     N=100
1 exp + 1 wbl   4     3363.99   3338.79   1923.19   2035.99
0 exp + 2 wbl   5     3366.09   3341.69   1924.89   2038.49
3 exp + 0 wbl   5     3395.29   3384.09   1984.89   2091.09
2 exp + 1 wbl   6     3371.79   3347.79   1930.59   2043.99
1 exp + 2 wbl   7     3376.29   3352.29   1934.89   2048.49
4 exp + 0 wbl   7     3381.09   3357.09   1939.29   2053.29
0 exp + 3 wbl   8     3381.59   3357.59   1940.19   2053.19
3 exp + 1 wbl   8     3381.99   3357.99   1940.59   2053.39
5 exp + 0 wbl   9     3388.08   3364.08   1947.08   2060.28

IV.2 A second approach: Analysis of stationary subsamples

The previous analysis was naive, as we assumed the entire data sample to consist of i.i.d. random variables. This is clearly not the case. As already noticed in Ref.[1], the volume of trading changes a great deal during a trading day, so that the scale parameter $\alpha$ must also vary. However, in Ref.[1] Kizilersü et al. argued that for "small" subsamples the assumptions of stationarity and i.i.d. might be expected to be justified. We take as subsample size $N=200$ for "larger stocks" with higher trading activity (RIO, BARC) and $N=100$ for "smaller stocks" with lower trading activity (RRLN, ABFLN). Motivated by Table 2, our candidates for possible models are the following mixtures

  • 1 exp + 1 wbl (dof=4)

  • 0 exp + 2 wbl (dof=5)

  • 3 exp + 0 wbl (dof=5)

  • 2 exp + 1 wbl (dof=6)

To quote Ref.[18] “[t]he practice of using the same data set to select a best-fitting model and to assess the significance of model parameter estimates or interpret the model structure is based on the often implicit assumption that the selected model is the true model that generated the data […]. However, this assumption does not hold in general. The sampling error related to model selection is ignored if the same data are used for inference.” Thus, we separate the task of model selection from the best fit of the parameters.

IV.2.1 Model selection and bootstrapping

For the model selection we take, for every trading day in the months of June and July 2010, a random time stamp for each individual stock. From this time stamp onward we take 200 successive time stamps for the big stocks (RIO and BARC) and 100 successive time stamps for the small stocks (RRLN and ABFLN). For each of these original samples we generate an additional 999 bootstrap samples out of the original sample (see the standard reference [19] for more details on bootstrapping). For each ensemble of 1000 bootstrap samples we run the censored EM algorithm and compute the log-likelihood and the BIC. Hence, for each model we have obtained a BIC distribution, which is approximately normal. We then perform a Welch t-test at the 5% significance level on the following hypothesis:

“Can the alternative model beat 1 exp + 1 Weibull using BIC?”

We then count the success rate for the winning distribution. Our results are depicted in Table 3, displaying the proportion of wins. Our first intuition is confirmed: the mixture "1 exponential + 1 Weibull" is the clear winner.

Table 3: BIC-winners from bootstrapping ensembles in June and July 2010
Model dof RIO BARC RRLN ABFLN
1 exp + 1 wbl 4 0.77 0.77 0.77 0.76
0 exp + 2 wbl 5 0.09 0.14 0.09 0.12
3 exp + 0 wbl 5 0.14 0.09 0.14 0.12
2 exp + 1 wbl 6 0 0 0 0
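The resampling and testing scheme above can be sketched as follows (an illustrative sketch with hypothetical helper names; the paper uses 999 resamples per original sample and a Welch t-test):

```python
import math
import random

def bootstrap_samples(xs, B=999, seed=0):
    """Draw B bootstrap resamples (sampling with replacement, same size
    as the original sample xs)."""
    rng = random.Random(seed)
    return [[rng.choice(xs) for _ in xs] for _ in range(B)]

def welch_t(a, b):
    """Welch t statistic for two samples (here: two BIC distributions)
    with possibly unequal variances."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
```

In the analysis above, the censored EM algorithm is run on every resample of each ensemble, and the resulting per-model BIC distributions are compared pairwise with the Welch statistic.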

IV.2.2 Results for the preferred model: “1 exponential + 1 Weibull”

In this subsection we use the convention that for the exponential contribution of the mixture the shape parameter $\beta_{1}=1$ will be suppressed, and for the Weibull contribution we write $\beta$ rather than $\beta_{2}$ for easier reading. In Table 4 the results for the estimated parameters are summarised. We find for the "complete" data samples that the Weibull shape parameter $\beta$ takes on a universal value of approximately 0.57, as already found in Ref.[1] (where left-truncation was the way of handling the "zero-inflated" data).

Table 4: Weibull component for "1 exp + 1 Weibull"

        $\frac{1}{N}(\log L\pm\Delta\log L)$   weight   median $\beta$   $\beta\pm\Delta\beta$   No. samples
RIO     -8.29 ± 0.63                           0.82     0.55             0.57 ± 0.11             3107
BARC    -8.24 ± 0.72                           0.81     0.57             0.58 ± 0.11             3346
RRLN    -9.35 ± 0.83                           0.78     0.48             0.50 ± 0.12             535
ABFLN   -9.91 ± 0.78                           0.81     0.47             0.49 ± 0.12             291

Table 5: Scale parameters $\alpha$ in [ms] for "1 exp + 1 Weibull"

        median $\alpha_{2}$   range           median $\alpha_{1}$   range        No. samples
RIO     2499                  [1099,5491]     17.2                  [6.5,70.8]   3107
BARC    2452                  [1078,5310]     19.0                  [9.0,64.0]   3346
RRLN    13846                 [6079,28988]    16.0                  [6.6,47.1]   535
ABFLN   23095                 [10374,45580]   18.7                  [7.0,49.1]   291

From Table 5 we see that the median of the Weibull scale parameter $\alpha_{2}$ varies for different stocks (depending on their trading activity), whereas the median of the exponential scale parameter $\alpha_{1}$ seems to be the same for different stocks. From Table 5 we also suspect a stronger time dependence for the Weibull scale parameter. To investigate this time dependence, we divide the trading hours from 9:00 to 17:30 UK summer time into intervals of 10 minutes for RIO and BARC (30 minutes for RRLN and ABFLN) and average the values of $\alpha_{2}$, $\alpha_{1}$ and $\beta$ over the trading days from June to September 2010. Due to this partitioning of the data, the sample size varies significantly, depending on the time of day, with $N\gg 200$ at some times and $N\ll 200$ at others. As we have already remarked in Ref.[1], the Weibull scale parameter $\alpha_{2}$ exhibits a strong time dependence during the trading day: the more actively a stock is traded, the smaller the Weibull scale parameter $\alpha_{2}$. We see from Figure 1 that the typical scale parameter for the big stocks RIO and BARC is well below 10 seconds, and both curves as a function of time are nearly identical. This can be explained by index arbitrage in the FTSE100, which requires trading in big stocks such as RIO and BARC at the same time to exploit the arbitrage. Obviously at lunch time there is less activity and $\alpha_{2}$ becomes larger.

Figure 1: Weibull $\alpha_{2}$ in milliseconds for various tickers during trading hours.
Figure 2: $\beta$ for various tickers during trading hours.

Since the Fisher information matrix for the Weibull MLE problem is not diagonal, a bias in the estimated scale parameter $\alpha_{2}$ will also result in a bias of the estimated shape parameter $\beta$. Thus, we see in Figure 2 that the value of $\beta$ becomes slightly larger at lunch time, when the estimated value of $\alpha_{2}$ is larger than that found at the opening or closing of trading hours. Despite this, the high level of stability found for $\beta$ suggests that it is reasonable to assume that the true shape parameter is universal, with $\beta=0.57$ as conjectured in Ref.[1].

Figure 3: $\alpha_{1}$ in milliseconds for various tickers during trading hours.
Figure 4: Mixture weights for various tickers during trading hours.

Finally, we draw the reader's attention to Fig. 3, which displays the time dependence of the scale parameter $\alpha_{1}$ of the exponential contribution, and to Fig. 4, which displays the time dependence of the mixture weights. The time dependence of $\alpha_{1}$ is mainly in the region of 10 to 20 milliseconds for all stocks, with exceptions for the "smallest" and sometimes illiquid stock ABFLN. Also, the weight of this exponential contribution in the two-component mixture is nearly independent of time and consistently around 20%, for all stocks.

V Conclusion

The censored EM algorithm, in combination with a bootstrapping argument applied to the Bayesian Information Criterion (BIC), allows us to choose as a model for the MO inter-arrival times a two-component mixture distribution consisting of "1 exponential + 1 Weibull". Of course, this conclusion is not applicable in the exceptional case of extraordinary trading activity in a stock when critical information is disclosed to the market participants. The first component of this mixed distribution is an exponential distribution with a relative weight of approximately 20% and a rather short scale parameter, in the range of 10 to 20 milliseconds. This result is independent of the stock under consideration and almost constant during the trading day. The second component, with a relative weight of approximately 80%, is a Weibull distribution whose scale parameter varies intra-day with trading activity and lies typically between 1000 and 25000 milliseconds, albeit with a universal shape parameter $\beta\approx 0.57$.

This result can be understood with a view to a typical stock exchange computer architecture (e.g. Ref.[20]): market orders are captured via various order gateways which forward the electronic orders to the so-called accumulator, where timestamps are given. Low latency order gateways typically cater for high-frequency traders (e.g. algorithmic hedge funds), whose buy and sell orders are generated by computers located in the stock exchange's immediate neighbourhood to minimize the transmission time. Here an exponential waiting time between these signals (with a short time scale parameter) is to be expected, resulting in a Poisson process for high-frequency buy and sell orders. On the other hand, VSAT gateways cater for stock brokers whose customers include both institutional and private clients and whose orders are usually transmitted via an internet connection. It has been shown that internet traffic sends information as TCP "parcels" with arrival times that can be described by Weibull distributions with a shape parameter less than 1 (see Refs.[21], [22]). In Ref.[21] it was shown that a shape parameter $\beta=0.569$ provided an excellent fit to the observed data; this is close to the value that we find for the Weibull distribution. The natural conclusion would be that, for the time period we have studied, approximately 20% of the trading on the LSE was done by market participants whose orders were generated electronically by a computer in the immediate neighbourhood, whereas the other 80% of market participants were using the internet to generate their buy and sell orders.

Finally, we want to emphasize that by Ref.[23] any Weibull distribution function with shape parameter $\beta$ smaller than 1 can be expressed as a superposition of exponential waiting time distributions. Thus our preferred model “1 exponential + 1 Weibull” is equivalent to a suitable mixture of exponential waiting time distributions. Consequently, the proposed censored EM algorithm for finite Weibull mixtures may be compared to the problem, first discussed in Ref.[17] and later expanded in Ref.[24], of how to fit a waiting time distribution consisting of a finite number of exponential waiting time distributions to the observed tick-by-tick data. Although exponential mixtures with more than 5 components seem to describe our actual data sets very well, they are excluded by the BIC because of their excessive number of degrees of freedom.
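The BIC penalty that rules out the larger exponential mixtures is easy to make concrete. In the sketch below the log-likelihoods and sample size are hypothetical round numbers; only the parameter counts follow from the models themselves (1 weight + 2 scales + 1 shape = 4 for “1 exponential + 1 Weibull”, versus 4 weights + 5 scales = 9 for a 5-component exponential mixture).

```python
import math

def bic(log_lik, n_params, n_obs):
    """Schwarz criterion BIC = k ln(n) - 2 ln(L); smaller is better."""
    return n_params * math.log(n_obs) - 2.0 * log_lik

# hypothetical numbers: the exponential mixture fits marginally better
# but pays the k*ln(n) penalty for its extra free parameters
n = 50_000
bic_weib_mix = bic(-410_000.0, 4, n)   # "1 exponential + 1 Weibull"
bic_exp5_mix = bic(-409_990.0, 9, n)   # 5-component exponential mixture
print(bic_weib_mix < bic_exp5_mix)
```

With $n$ of this order, each extra parameter costs $\ln n \approx 10.8$ BIC units, so a small likelihood gain cannot compensate for five additional degrees of freedom.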

Appendix A Expression of the objective function in the M-step and further auxiliary functions

First define the censoring intervals $\mathcal{I}_{\ell}=[\xi_{\ell-1},\xi_{\ell})$. For simplicity introduce the transformed quantity $\zeta_{\ell}=(\xi_{\ell}/\alpha_{i}^{(k-1)})^{\beta_{i}^{(k-1)}}$. Then, after some lengthy computations in the spirit of Section 2, we find the following version of Eq.(8):

\begin{align}
&\sum_{j=1}^{n}z_{1j}^{(k)}\left[\log\frac{1}{\alpha_{1}^{(k)}}-\frac{x_{j}}{\alpha_{1}^{(k)}}\right]+\sum_{\ell=1}^{L}N_{\ell}\,\tilde{z}_{1\ell}^{(k)}\left[\log\frac{1}{\alpha_{1}^{(k)}}-\frac{1}{\alpha_{1}^{(k)}}C_{1\ell}^{(k-1)}\right]\nonumber\\
&\quad+\sum_{j=1}^{n}z_{2j}^{(k)}\left[\log\frac{\beta_{2}^{(k)}}{\alpha_{2}^{(k)}}+(\beta_{2}^{(k)}-1)\log\frac{x_{j}}{\alpha_{2}^{(k)}}-\left(\frac{x_{j}}{\alpha_{2}^{(k)}}\right)^{\beta_{2}^{(k)}}\right]\nonumber\\
&\quad+\sum_{\ell=1}^{L}N_{\ell}\,\tilde{z}_{2\ell}^{(k)}\Bigg\{\log\frac{\beta_{2}^{(k)}}{\alpha_{2}^{(k)}}+(\beta_{2}^{(k)}-1)\log\frac{\alpha_{2}^{(k-1)}}{\alpha_{2}^{(k)}}\nonumber\\
&\qquad+\frac{\beta_{2}^{(k)}-1}{\beta_{2}^{(k-1)}}\,\frac{e^{-\zeta_{\ell-1}}\log\zeta_{\ell-1}-e^{-\zeta_{\ell}}\log\zeta_{\ell}+\Gamma(0,\zeta_{\ell-1})-\Gamma(0,\zeta_{\ell})}{e^{-\zeta_{\ell-1}}-e^{-\zeta_{\ell}}}\nonumber\\
&\qquad-\left(\frac{\alpha_{2}^{(k-1)}}{\alpha_{2}^{(k)}}\right)^{\beta_{2}^{(k)}}\frac{\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1,\zeta_{\ell-1}\right)-\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1,\zeta_{\ell}\right)}{e^{-\zeta_{\ell-1}}-e^{-\zeta_{\ell}}}\Bigg\}
\end{align}

where

\begin{equation}
C_{1\ell}^{(k-1)}=\alpha_{1}^{(k-1)}+\frac{\xi_{\ell-1}\,e^{-\xi_{\ell-1}/\alpha_{1}^{(k-1)}}-\xi_{\ell}\,e^{-\xi_{\ell}/\alpha_{1}^{(k-1)}}}{e^{-\xi_{\ell-1}/\alpha_{1}^{(k-1)}}-e^{-\xi_{\ell}/\alpha_{1}^{(k-1)}}}\tag{15}
\end{equation}
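The quantity $C_{1\ell}^{(k-1)}$ of Eq.(15) is the mean of an exponential variable conditioned on the censoring interval, and is straightforward to implement. In the sketch below the function name and test values are ours; the second call checks that the formula reduces to the unconditional mean $\alpha$ when the interval covers the whole support.

```python
import math

def cond_mean_exp(alpha, xi_lo, xi_hi):
    """C_{1l}^{(k-1)} of Eq.(15): mean of an Exponential(alpha) variable
    conditioned on falling in the censoring interval [xi_lo, xi_hi)."""
    num = xi_lo * math.exp(-xi_lo / alpha) - xi_hi * math.exp(-xi_hi / alpha)
    den = math.exp(-xi_lo / alpha) - math.exp(-xi_hi / alpha)
    return alpha + num / den

c_full = cond_mean_exp(10.0, 0.0, 1e9)   # whole support: recovers alpha
c_part = cond_mean_exp(10.0, 2.0, 5.0)   # must lie inside (2, 5)
print(c_full, c_part)
```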

Note that in the objective function above, since the lower censoring interval boundary is $\zeta_{0}=0$, the term with $\ell=1$ is well-behaved and reduces to

\begin{align*}
\lim_{\zeta_{0}\to 0}\frac{e^{-\zeta_{0}}\log\zeta_{0}-e^{-\zeta_{1}}\log\zeta_{1}+\Gamma(0,\zeta_{0})-\Gamma(0,\zeta_{1})}{e^{-\zeta_{0}}-e^{-\zeta_{1}}}
=-\frac{\gamma+e^{-\zeta_{1}}\log\zeta_{1}+\Gamma(0,\zeta_{1})}{1-e^{-\zeta_{1}}},
\end{align*}
where $\gamma$ denotes the Euler–Mascheroni constant.
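This reduction can be verified numerically, using the identity $\Gamma(0,x)=E_{1}(x)$ provided by `scipy.special.exp1`; the test values and tolerance below are ours.

```python
import numpy as np
from scipy.special import exp1  # exp1(x) equals Gamma(0, x) for x > 0

def general_term(z_lo, z_hi):
    """The generic l-th bracket of the objective function."""
    num = (np.exp(-z_lo) * np.log(z_lo) - np.exp(-z_hi) * np.log(z_hi)
           + exp1(z_lo) - exp1(z_hi))
    return num / (np.exp(-z_lo) - np.exp(-z_hi))

def reduced_term(z1):
    """The closed form of the l = 1 term quoted above."""
    return -(np.euler_gamma + np.exp(-z1) * np.log(z1) + exp1(z1)) / (1.0 - np.exp(-z1))

# as zeta_0 -> 0 the general term converges to the reduced form
print(general_term(1e-8, 0.5), reduced_term(0.5))
```

The finiteness of the limit rests on $e^{-\zeta}\log\zeta+\Gamma(0,\zeta)\to-\gamma$ as $\zeta\to 0$, which is exactly what the comparison probes.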

We need to maximise this expression with respect to the parameters $\alpha_{1}^{(k)},\alpha_{2}^{(k)},\beta_{2}^{(k)}$, all other quantities being known from previous steps.
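In practice this maximisation can be carried out numerically. The sketch below treats only the uncensored sums of the objective, on synthetic data with crude fixed responsibilities standing in for the E-step output; every numerical choice here is illustrative, not our actual procedure.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# synthetic waiting times: a short-scale exponential plus a heavy Weibull part
x = np.concatenate([rng.exponential(15.0, 500), 5000.0 * rng.weibull(0.57, 2000)])
# crude stand-in responsibilities (the E-step would supply these)
z2 = 1.0 / (1.0 + 200.0 / x)
z1 = 1.0 - z2

def neg_q(log_params):
    # log-parametrisation keeps alpha1, alpha2, beta2 positive
    a1, a2, b2 = np.exp(log_params)
    q1 = z1 * (-np.log(a1) - x / a1)
    q2 = z2 * (np.log(b2 / a2) + (b2 - 1.0) * np.log(x / a2) - (x / a2) ** b2)
    return -(q1 + q2).sum()

res = minimize(neg_q, x0=np.log([10.0, 1000.0, 1.0]),
               method="Nelder-Mead", options={"maxiter": 2000})
a1, a2, b2 = np.exp(res.x)
print(a1, a2, b2)  # updated (alpha1, alpha2, beta2)
```

A derivative-free method is used here purely for brevity; the censored terms would add the $C_{1\ell}$ and incomplete-gamma contributions to `neg_q`.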

In the MLE equations the following expression arises during the lengthy computations:

\begin{align*}
D_{2\ell}^{(k)} &= \frac{e^{-\zeta_{\ell-1}}\log\zeta_{\ell-1}-e^{-\zeta_{\ell}}\log\zeta_{\ell}+\Gamma(0,\zeta_{\ell-1})-\Gamma(0,\zeta_{\ell})}{\beta_{2}^{(k-1)}}\\
&\quad-\frac{1}{\beta_{2}^{(k-1)}}\left(\frac{\alpha_{2}^{(k-1)}}{\alpha_{2}^{(k)}}\right)^{\beta_{2}^{(k)}}\Bigg[\sum_{p=0}^{\infty}\frac{(-1)^{p}}{p!}\,\frac{\zeta_{\ell-1}^{\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1+p}-\zeta_{\ell}^{\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1+p}}{\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1+p\right)^{2}}\\
&\qquad-\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1\right)\left(\log\zeta_{\ell-1}-\log\zeta_{\ell}\right)\\
&\qquad+\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1,\zeta_{\ell-1}\right)\log\zeta_{\ell-1}-\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1,\zeta_{\ell}\right)\log\zeta_{\ell}\Bigg]\\
&\quad-\log\frac{\alpha_{2}^{(k-1)}}{\alpha_{2}^{(k)}}\left(\frac{\alpha_{2}^{(k-1)}}{\alpha_{2}^{(k)}}\right)^{\beta_{2}^{(k)}}\Bigg\{\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1,\zeta_{\ell-1}\right)-\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1,\zeta_{\ell}\right)\Bigg\}
\end{align*}

and for $\ell=1$:

\begin{align*}
D_{21}^{(k)} &= \frac{-\gamma-e^{-\zeta_{1}}\log\zeta_{1}-\Gamma(0,\zeta_{1})}{\beta_{2}^{(k-1)}}\\
&\quad-\frac{1}{\beta_{2}^{(k-1)}}\left(\frac{\alpha_{2}^{(k-1)}}{\alpha_{2}^{(k)}}\right)^{\beta_{2}^{(k)}}\Bigg[-\sum_{p=0}^{\infty}\frac{(-1)^{p}}{p!}\,\frac{\zeta_{1}^{\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1+p}}{\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1+p\right)^{2}}\\
&\qquad+\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1\right)\log\zeta_{1}-\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1,\zeta_{1}\right)\log\zeta_{1}\Bigg]\\
&\quad-\log\frac{\alpha_{2}^{(k-1)}}{\alpha_{2}^{(k)}}\left(\frac{\alpha_{2}^{(k-1)}}{\alpha_{2}^{(k)}}\right)^{\beta_{2}^{(k)}}\Bigg\{\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1\right)-\Gamma\!\left(\frac{\beta_{2}^{(k)}}{\beta_{2}^{(k-1)}}+1,\zeta_{1}\right)\Bigg\}
\end{align*}

References

  • Kizilersu et al. [2016] A. Kizilersu, M. Kreer, A. Thomas, and M. Feindt, Universal behaviour in the stock market: Time dynamics of the electronic orderbook, Physics Letters A 380, 2501 (2016).
  • Note [1] The analysis in Ref. [1] also included limit orders (LO) but this will not be the topic of our letter.
  • Politi and Scalas [2008] M. Politi and E. Scalas, Fitting the empirical distribution of intratrade durations, Physica A 387 (8-9), 2025 (2008).
  • Note [2] In the following the index.
  • Kendall and Stuart [1979] M. Kendall and A. Stuart, The advanced theory of statistics II - Inference and relationship (Griffin London, 1979) 4th revised edition.
  • McLachlan and Krishnan [2008] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions (John Wiley and Sons, Inc., 2008).
  • Dempster et al. [1977] A. Dempster, N. Laird, and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39 (1), 1 (1977).
  • Redner and Walker [1984] R. Redner and F. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Review 26 (2), 195 (1984).
  • Chauveau [1995] D. Chauveau, A stochastic EM algorithm for mixtures with censored data, Journal of Statistical Planning and Inference 46, 1 (1995).
  • Jewell [1982] N. Jewell, Mixtures of exponential distributions, The Annals of Statistics 10 (2), 479 (1982).
  • Elmahdy and Aboutahoun [2013] E. Elmahdy and A. Aboutahoun, A new approach for parameter estimation of finite Weibull mixture distributions for reliability modeling, Applied Mathematical Modelling 37 (4), 1800 (2013).
  • Efron [1967] B. Efron, The two sample problem with censored data, Vol. 4 (Proc. Fifth Berkeley Symp. Math. Statist. Probab., 1967) pp. 831–853.
  • Note [3] A comparison to the direct maximisation of the terms in Eq. (8) as given in Appendix A leads to comparable results and might justify our trick to speed up convergence. Note that if we already knew the exact solution α2\alpha_{2}^{\infty} and β2\beta_{2}^{\infty}, i.e. if we started with the true fixed point, these equations would be trivially satisfied.
  • Note [4] We describe the censored EM-algorithm for a mixture of 1 exponential + 1 Weibull. The generalisation to arbitrary mixtures is obvious.
  • Burnham and Anderson [2002] K. Burnham and D. Anderson, Model Selection and Multimodel Inference - A Practical Information-Theoretic Approach (Springer, New York, 2002) 2nd edition.
  • Kass and Raftery [1995] R. E. Kass and A. Raftery, Bayes factors, Journal of the American Statistical Association 90, 773 (1995).
  • Scalas [2013] E. Scalas, Mixtures of compound Poisson processes as models of tick-by-tick financial data, Chaos, Solitons & Fractals 34 (1), 33 (2013).
  • Lubke and Campbell [2016] G. Lubke and I. Campbell, Inference based on the best-fitting model can contribute to the replication crisis: Assessing model selection uncertainty using a bootstrap approach, Structural Equation Modeling: A Multidisciplinary Journal 23, 479 (2016).
  • Efron [1979] B. Efron, Bootstrap methods: Another look at the jackknife, Ann. Statist. 7, 1 (1979).
  • Loveless [2013] J. Loveless, Barbarians at the gateways, Commun. ACM 56, 42–49 (2013).
  • Feldmann [2000] A. Feldmann, Characteristics of TCP connection arrivals, in Self-Similar Network Traffic and Performance Evaluation (John Wiley and Sons, Ltd, 2000) Chap. 15, pp. 367–399.
  • Arfeen et al. [2019] A. Arfeen, K. Pawlikowski, D. McNickle, and A. Willig, The role of the Weibull distribution in modelling traffic in internet access and backbone core networks, Journal of Network and Computer Applications 141, 1 (2019).
  • Yannaros [1994] N. Yannaros, Weibull renewal processes, Annals of the Institute of Statistical Mathematics 46 (4), 641 (1994).
  • Ponta et al. [2019] L. Ponta, M. Trinh, M. Raberto, and E. Scalas, Modeling non-stationarities in high-frequency financial time series, Physica A 521, 173 (2019).