
2021

Censored autoregressive regression models with Student-$t$ innovations

Katherine A. L. Valeriano, Fernanda L. Schumacher, Christian E. Galarza, and Larissa A. Matos

[1] Departamento de Estatística, Universidade Estadual de Campinas, Campinas, 13083-859, São Paulo, Brazil
[2] Departamento de Matemáticas, Escuela Superior Politécnica del Litoral, Guayaquil, 090112, Ecuador
Abstract

The Student-$t$ distribution is widely used in statistical modeling of datasets involving outliers since its longer-than-normal tails provide a robust approach to handling such data. Furthermore, data collected over time may contain censored or missing observations, making it impossible to apply standard statistical procedures. This paper proposes an algorithm to estimate the parameters of a censored linear regression model when the regression errors are autocorrelated and the innovations follow a Student-$t$ distribution. To fit the proposed model, maximum likelihood estimates are obtained through the SAEM algorithm, a stochastic approximation of the EM algorithm that is useful for models in which the E-step does not have an analytic form. The methods are illustrated by the analysis of a real dataset with left-censored and missing observations. We also conducted two simulation studies to examine the asymptotic properties of the estimates and the robustness of the model.

keywords:
Autoregressive models; Censored data; Heavy-tailed distributions; Missing data; SAEM algorithm

1 Introduction

A linear regression model is a commonly used statistical tool to analyze the relationship between a response (dependent) variable and some explanatory (independent) variables. In these models, the errors are usually assumed to be independent and identically distributed random variables with zero mean and constant variance. However, observations collected over time are often autocorrelated rather than independent. Examples of this kind of data abound in economics, medicine, environmental monitoring, and social sciences, and therefore the study of the dependence among observations is of considerable practical interest (see tsay2005analysis; box2015time).

A stochastic model that has been successfully used in many real-world applications to deal with serial correlation in the error term is the autoregressive (AR) model. In the AR model, the current state of the process is expressed as a finite linear combination of previous states and a stochastic shock of disturbance, also known as the innovation in the time series literature. In general, the disturbance is assumed to be normally distributed. For example, beach1978maximum suggested a maximum likelihood (ML) method to estimate the parameters of a regression model with AR(1) errors; ullah1983estimation used the two-stage Prais-Winsten estimator for a linear regression model with first-order autocorrelated disturbances, where a Gaussian assumption on the innovations is used to derive a large-sample asymptotic approximation to the variance-covariance matrix; and more recently alpuim2008efficiency proposed a linear regression model whose error terms follow a Gaussian autoregressive stationary process, with the model parameters estimated by ML and ordinary least squares.

In practice, however, the normality assumption may be unrealistic, especially in the presence of outliers. To relax this assumption, tiku1999estimating suggested considering non-normal symmetric innovations in a simple regression model with AR(1) errors. tuacc2018robust proposed a model with AR($p$) errors considering the Student-$t$ distribution with fixed degrees of freedom as an alternative to the normal distribution, applying the conditional maximum likelihood method to estimate the model parameters, and nduka2018based developed an EM algorithm to estimate the parameters of autoregressive models of order $p$ with Student-$t$ innovations.

An additional challenge arises when some observations are censored or missing. In the first case, values can occur outside the range of a measuring instrument, and in the second, the value is unknown. Censored time series data are frequently encountered in environmental monitoring, medicine, and economics, and there are some proposals in the literature related to censored autoregressive linear regression models with normal innovations. For example, zeger1986regression proposed a Gaussian censored autocorrelated model with parameters estimated by maximizing the full likelihood using the quasi-Newton and Expectation-Maximization (EM) algorithms; the authors pointed out that the quasi-Newton approach is preferable because it also estimates the variance-covariance matrix of the parameters.

Additionally, park2007censored developed a censored autoregressive moving average model with Gaussian white noise, in which the censored or missing observations are imputed iteratively by sampling from the truncated normal distribution and the parameters are estimated by maximizing an approximate full likelihood function, and wang2018quasi suggested a quasi-likelihood method using the complete-incomplete data framework. Recently, schumacher2017censored considered an autoregressive censored linear model whose parameters are estimated via an analytically tractable and efficient stochastic approximation of the EM (SAEM) algorithm, and liu2019parameter proposed a coupled Markov chain Monte Carlo (MCMC)-SAEM algorithm to fit an AR($p$) regression model with Student-$t$ innovations accounting for missing data.

Nevertheless, to the best of our knowledge, there are no studies that consider the Student-$t$ distribution for the innovations in censored autoregressive models from a likelihood-based perspective. Hence, in this work, we propose an EM-type algorithm to estimate the parameters of a censored regression model with autoregressive errors and innovations following a Student-$t$ distribution. Specifically, the SAEM algorithm is considered to avoid the direct computation of complex expressions in the E-step of the EM algorithm (dempster1977maximum), and because its computational effort is much smaller than that of the Monte Carlo EM algorithm (wei1990monte), as shown by jank2006implementing.

The paper is organized as follows. Section 2 introduces the autoregressive regression model of order $p$ and the SAEM algorithm. Section 3 formulates the censored autoregressive model with Student-$t$ innovations, providing the expressions used to estimate the parameters of the proposed model, the method used to approximate the observed information matrix, and the expressions used to make predictions. Section 4 presents the results of two simulation studies carried out to examine the asymptotic properties of the estimators and to demonstrate the robustness of the model. Section 5 applies the proposed model to the total phosphorus concentration dataset available in the R package ARCensReg (schumacher2016package; rteam), along with an analysis of quantile residuals. Finally, Section 6 concludes with a discussion.

2 Preliminaries

We begin by defining some notation and introducing the basic definitions and algorithms used throughout this work. The normal distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $\mathcal{N}(\mu,\sigma^2)$, the notation for a $p$-variate normal distribution with mean $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$ is $\mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$, and a truncated multivariate normal distribution with truncation region $\mathbb{A}$ is denoted by $\mbox{TN}_p(\boldsymbol{\mu},\boldsymbol{\Sigma};\mathbb{A})$. The notation for a random variable with gamma distribution is $\mathcal{G}(\alpha,\beta)$, where $\alpha>0$ is the shape parameter and $\beta>0$ is the rate parameter. The Student-$t$ distribution with location parameter $\mu$, scale parameter $\sigma^2>0$, and $\nu>0$ degrees of freedom is denoted by $t(\mu,\sigma^2,\nu)$. A random variable $X$ with uniform distribution on the interval $(a,b)$ is denoted by $X\sim\mathcal{U}(a,b)$. Furthermore, the expression "iid" means independent and identically distributed, $\mathbf{A}^\top$ represents the transpose of $\mathbf{A}$, $\Gamma(a)=\int_0^\infty x^{a-1}e^{-x}\,\mathrm{d}x$ is the gamma function evaluated at $a>0$, and $\mathcal{F}_t=\sigma(Y_1,Y_2,\ldots,Y_t)$ is the $\sigma$-field generated by $\{Y_1,Y_2,\ldots,Y_t\}$. Finally, for integers $2\leq t\leq n$ and $1\leq p<t$, we use the index $(t,p)$ as follows: for a vector $\textbf{y}=(y_1,y_2,\ldots,y_n)^\top\in\mathbb{R}^n$, $\textbf{y}_{(t,p)}=(y_{t-1},y_{t-2},\ldots,y_{t-p})^\top\in\mathbb{R}^p$; and for a matrix $\textbf{X}\in\mathbb{R}^{n\times q}$ with rows $\textbf{x}_i\in\mathbb{R}^q$, $1\leq i\leq n$, $\textbf{X}_{(t,p)}=[\textbf{x}_{t-1},\textbf{x}_{t-2},\ldots,\textbf{x}_{t-p}]^\top\in\mathbb{R}^{p\times q}$.

2.1 The autoregressive regression model of order $p$

First, consider a linear regression model with errors that are autocorrelated as a discrete-time autoregressive process of order $p$; thus, the model for an observation at time $t$ is given by

$$Y_t=\textbf{x}_t^\top\boldsymbol{\beta}+\xi_t, \quad (1)$$
$$\xi_t=\phi_1\xi_{t-1}+\ldots+\phi_p\xi_{t-p}+\eta_t,$$
$$\eta_t\stackrel{iid}{\sim}F(\cdot),\quad t=1,\ldots,n, \quad (2)$$

where $Y_t$ represents the dependent variable observed at time $t$, $\textbf{x}_t=(x_{t1},\ldots,x_{tq})^\top$ is a $q\times 1$ vector of known covariables, $\boldsymbol{\beta}$ is a $q\times 1$ vector of unknown regression parameters to be estimated, and $\xi_t$ is the regression error, assumed to follow an autoregressive model with $\boldsymbol{\phi}=(\phi_1,\ldots,\phi_p)^\top$ being a $p\times 1$ vector of autoregressive coefficients and $\eta_t$ being a shock of disturbance with distribution $F(\cdot)$. The term $\eta_t$ is also known as the innovation in the time series literature (see, for instance, box2015time; schumacher2017censored; wang2018quasi).

Now, suppose that $\eta_t$ in Equation 2 follows a Student-$t$ distribution with location parameter 0, scale parameter $\sigma^2>0$, and $\nu>0$ degrees of freedom, whose probability density function (pdf) can be written as

$$f(\eta;\sigma^2,\nu)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)(\pi\nu\sigma^2)^{1/2}}\left(1+\frac{\eta^2}{\nu\sigma^2}\right)^{-\frac{\nu+1}{2}},\quad\eta\in\mathbb{R}. \quad (3)$$

Then, the model defined by Equations 1-3 will be called the autoregressive regression $t$ model of order $p$ (AR$t$($p$)) hereinafter.

In particular, the distribution of $\eta_t\sim t(0,\sigma^2,\nu)$ can be written using a scale mixture of normal (SMN) representation, as described in kotz2004multivariate, where $\eta_t$ is equal in distribution to $Z_t/\sqrt{U_t}$, with $Z_t\stackrel{iid}{\sim}\mathcal{N}(0,\sigma^2)$, $U_t\stackrel{iid}{\sim}\mathcal{G}(\nu/2,\nu/2)$, and $Z_t$ independent of $U_t$. Consequently, the conditional distribution of $\eta_t$ given $U_t=u_t$ is normal with zero mean and variance $\sigma^2/u_t$, i.e.,

$$\eta_t\mid(U_t=u_t)\stackrel{ind}{\sim}\mathcal{N}(0,\sigma^2/u_t)\quad\mbox{and}\quad U_t\stackrel{iid}{\sim}\mathcal{G}(\nu/2,\nu/2),\quad t=1,\ldots,n.$$

This representation facilitates the implementation of an EM-type algorithm, which is discussed next.
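To make this representation concrete, the following R sketch (with illustrative values of $\sigma^2$ and $\nu$, not taken from the paper) draws innovations by mixing normal draws with gamma mixing weights and compares them with direct draws from a scaled Student-$t$ distribution:

```r
# Sketch: generate Student-t innovations via the SMN representation.
# sigma2 and nu are illustrative values, not taken from the paper.
set.seed(123)
n      <- 10000
sigma2 <- 2
nu     <- 4

u   <- rgamma(n, shape = nu / 2, rate = nu / 2)  # mixing weights U_t
z   <- rnorm(n, mean = 0, sd = sqrt(sigma2))     # Z_t ~ N(0, sigma2)
eta <- z / sqrt(u)                               # eta_t = Z_t / sqrt(U_t)

# Direct draws from t(0, sigma2, nu) for comparison:
eta_direct <- sqrt(sigma2) * rt(n, df = nu)
qqplot(eta, eta_direct); abline(0, 1, col = "red")
```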

2.2 The SAEM algorithm

The EM algorithm was first introduced by dempster1977maximum and provides a general approach to the iterative computation of ML estimates in problems with incomplete data. Let $\textbf{y}_c=(\textbf{y}_o^\top,\textbf{y}_m^\top)^\top$ be the complete data, where $\textbf{y}_o$ is the observed data and $\textbf{y}_m$ is the incomplete (or missing) data, and let $\ell_c(\boldsymbol{\theta};\textbf{y}_c)$ be the complete-data log-likelihood function. The EM algorithm proceeds as follows:

  • E-step: Let $\widehat{\boldsymbol{\theta}}^{(k)}$ be the estimate of $\boldsymbol{\theta}$ at the $k$th iteration. Compute the conditional expectation of the complete log-likelihood function

    $$Q_k(\boldsymbol{\theta})=\mathbb{E}\left[\ell_c(\boldsymbol{\theta};\textbf{y}_c)\mid\textbf{y}_o,\widehat{\boldsymbol{\theta}}^{(k)}\right].$$

  • M-step: Maximize $Q_k(\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$ to obtain $\widehat{\boldsymbol{\theta}}^{(k+1)}$.

There are situations where the E-step has no analytic form. To deal with these cases, wei1990monte proposed replacing the expectation in the computation of $Q_k(\boldsymbol{\theta})$ with a Monte Carlo (MC) approximation based on a large number of independent simulations of the missing data, an approach called the Monte Carlo EM (MCEM) algorithm. As an alternative to the computationally intensive MCEM algorithm, delyon1999convergence proposed a scheme that splits the E-step into a simulation step and an integration step (using a stochastic approximation), while the maximization step remains unchanged. This algorithm, called the SAEM algorithm, performs as follows:

  • E-step:

    • 1. Simulation: Draw $M$ samples of the missing data, $\textbf{y}_m^{(k,l)}$, $l=1,\ldots,M$, from the conditional distribution $f(\textbf{y}_m;\widehat{\boldsymbol{\theta}}^{(k)},\textbf{y}_o)$.

    • 2. Stochastic approximation: Update $\widehat{Q}_k(\boldsymbol{\theta})$ by

      $$\widehat{Q}_k(\boldsymbol{\theta})=\widehat{Q}_{k-1}(\boldsymbol{\theta})+\delta_k\left(\frac{1}{M}\sum_{l=1}^M\ell_c(\boldsymbol{\theta};\textbf{y}_m^{(k,l)},\textbf{y}_o)-\widehat{Q}_{k-1}(\boldsymbol{\theta})\right),$$

      where $\delta_k$ is a decreasing sequence of positive numbers such that $\sum_{k=1}^\infty\delta_k=\infty$ and $\sum_{k=1}^\infty\delta_k^2<\infty$, also known as the smoothness parameter (kuhn2005maximum).

  • M-step: Update $\widehat{\boldsymbol{\theta}}^{(k)}$ as

    $$\widehat{\boldsymbol{\theta}}^{(k+1)}=\underset{\boldsymbol{\theta}}{\operatorname{argmax}}\;\widehat{Q}_k(\boldsymbol{\theta}).$$

This process is iterated until some distance between two successive evaluations of the actual log-likelihood function becomes small enough.
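To illustrate the simulation/approximation/maximization cycle, the sketch below implements SAEM for a deliberately simple toy problem: estimating the mean and variance of a normal sample whose values below a known detection limit are left-censored. This is not the CAR$t$($p$) algorithm of Section 3, only a minimal demonstration of the scheme above; all settings are illustrative.

```r
# Minimal SAEM sketch: estimating (mu, sigma2) of N(mu, sigma2)
# under left-censoring at a known LOD. All settings are illustrative.
set.seed(1)
n <- 200; mu_true <- 2; s2_true <- 1; lod <- 1.5
y <- rnorm(n, mu_true, sqrt(s2_true))
cens <- y < lod
y[cens] <- lod                        # censored values observed as the LOD

W <- 400; c0 <- 0.25; M <- 5          # iterations, cutoff, MC samples
delta <- ifelse(1:W <= c0 * W, 1, 1 / (1:W - c0 * W))

mu <- mean(y); s2 <- var(y)           # crude starting values
S1 <- sum(y); S2 <- sum(y^2)          # sufficient statistics to approximate

for (k in 1:W) {
  s1k <- 0; s2k <- 0
  for (l in 1:M) {
    yc <- y
    # Simulation: draw censored values from N(mu, s2) truncated to
    # (-Inf, lod] via the inverse-CDF method.
    u <- runif(sum(cens), 0, pnorm(lod, mu, sqrt(s2)))
    yc[cens] <- qnorm(u, mu, sqrt(s2))
    s1k <- s1k + sum(yc); s2k <- s2k + sum(yc^2)
  }
  # Stochastic approximation of the sufficient statistics
  S1 <- S1 + delta[k] * (s1k / M - S1)
  S2 <- S2 + delta[k] * (s2k / M - S2)
  # M-step: closed-form normal updates
  mu <- S1 / n
  s2 <- S2 / n - mu^2
}
c(mu = mu, sigma2 = s2)
```

Here the sufficient statistics, rather than $\widehat{Q}_k$ itself, are approximated stochastically; this is equivalent in this example because the complete-data model belongs to the exponential family.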

3 Proposed model and parameter estimation

This section is devoted to the formulation and ML estimation of the censored autoregressive linear model. Here, we specify the log-likelihood function, the algorithm used to overcome the parameter estimation problem, expressions to approximate the standard errors, and expressions to predict at unobserved times when the values of the covariables (if needed) at these times are available.

3.1 The censored AR$t$($p$) model

Assume that the response variable $Y_t$ given in Equation 1 is not fully observed at all times. Instead, we observe $(V_t,C_t)$, where $V_t$ represents either an observed value or the limit of detection (LOD) of a censored variable, and $C_t$ is the censoring indicator defined as

$$C_t=\begin{cases}1 & \mbox{if }V_{t1}\leq Y_t\leq V_{t2}\quad(\mbox{censored}),\\ 0 & \mbox{if }V_t=Y_t\quad(\mbox{observed}).\end{cases} \quad (6)$$

Thereby, the model defined by Equations 1-6 will be called the censored autoregressive regression $t$ model of order $p$ (CAR$t$($p$)). We say that $Y_t$ is left-censored if $C_t=1$ and $V_t=(-\infty,V_{t2}]$, right-censored if $C_t=1$ and $V_t=[V_{t1},+\infty)$, interval-censored if $C_t=1$ and $V_t=[V_{t1},V_{t2}]$, and it represents a missing value if $C_t=1$ and $V_t=(-\infty,+\infty)$.

To compute the log-likelihood function of the CAR$t$($p$) model, we condition the marginal distribution on the first $p$ observations, which are assumed to be fully observed, and we denote by $\textbf{y}_o\in\mathbb{R}^{n_o}$ the vector of observed data and by $\textbf{y}_m\in\mathbb{R}^{n_m}$ the vector of censored or missing observations, with $n_o+n_m=n-p$. Note also that the distribution of $Y_t$ conditional on all preceding data $\mathcal{F}_{t-1}$ depends only on the previous $p$ observations. Then, for $\boldsymbol{\theta}=(\boldsymbol{\beta}^\top,\boldsymbol{\phi}^\top,\sigma^2,\nu)^\top$, $Y_t$ given $\textbf{y}_{(t,p)}$ follows a Student-$t$ distribution with location parameter $\mu_t=\textbf{x}_t^\top\boldsymbol{\beta}+(\textbf{y}_{(t,p)}-\textbf{X}_{(t,p)}\boldsymbol{\beta})^\top\boldsymbol{\phi}$, scale parameter $\sigma^2$, and $\nu$ degrees of freedom, where $\textbf{X}_{(t,p)}$ is the $p\times q$ matrix of covariates related to $\textbf{y}_{(t,p)}$ and $\textbf{x}_t$ is the vector of covariates related to $Y_t$. In other words, we have

$$f(y_t\mid\boldsymbol{\theta},\mathcal{F}_{t-1})=f(y_t\mid\boldsymbol{\theta},\textbf{y}_{(t,p)})=f(y_t\mid\boldsymbol{\theta},y_{t-1},y_{t-2},\ldots,y_{t-p}),\quad t=p+1,\ldots,n.$$

Thus, the observed (conditional) log-likelihood function can be computed as

$$\ell(\boldsymbol{\theta};\textbf{y}_o)=\log\left(\int_{\mathbb{R}^{n_m}}\prod_{t=p+1}^n f(y_t\mid\boldsymbol{\theta},y_{t-1},\ldots,y_{t-p})\,\mathrm{d}\textbf{y}_m\right). \quad (7)$$

3.2 Parameter estimation via the SAEM algorithm

The observed log-likelihood function in Equation 7 involves complex expressions, making it very difficult to work directly with $\ell(\boldsymbol{\theta};\textbf{y}_o)$. Hence, to obtain the ML estimates of $\boldsymbol{\theta}$, we apply an EM-type algorithm based on the hierarchical representation presented in Subsection 2.1.

Let $\textbf{y}=(y_{p+1},\ldots,y_n)^\top$ and $\textbf{u}=(u_{p+1},\ldots,u_n)^\top$ be hypothetical missing data, and consider that we observe $(\textbf{V},\textbf{C})$, where $\textbf{V}=(V_{p+1},\ldots,V_n)^\top$ and $\textbf{C}=(C_{p+1},\ldots,C_n)^\top$. Then, the complete dataset is given by $\textbf{y}_c=(\textbf{y}^\top,\textbf{u}^\top,\textbf{V}^\top,\textbf{C}^\top)^\top$, and the complete log-likelihood function $\ell_c(\boldsymbol{\theta};\textbf{y}_c)$ can be written as

$$\ell_c(\boldsymbol{\theta};\textbf{y}_c)=\textrm{cte}+g(\nu\mid\textbf{u})-\frac{n-p}{2}\log\sigma^2-\frac{1}{2\sigma^2}\sum_{i=p+1}^n u_i\left(\tilde{y}_i-\textbf{w}_i^\top\boldsymbol{\phi}\right)^2,$$

with $g(\nu\mid\textbf{u})=\frac{n-p}{2}\left[\nu\log\left(\frac{\nu}{2}\right)-2\log\Gamma\left(\frac{\nu}{2}\right)\right]+\frac{\nu}{2}\left(\sum_{i=p+1}^n\log u_i-\sum_{i=p+1}^n u_i\right)$, $\tilde{y}_t=y_t-\textbf{x}_t^\top\boldsymbol{\beta}$, $\textbf{w}_t=\textbf{y}_{(t,p)}-\textbf{X}_{(t,p)}\boldsymbol{\beta}$, and cte a constant that does not depend on $\boldsymbol{\theta}$.

Let $\widehat{\boldsymbol{\theta}}^{(k)}$ denote the current estimate of $\boldsymbol{\theta}$. Then, the conditional expectation of the complete log-likelihood function given the observed data, ignoring constant terms, is $Q_k(\boldsymbol{\theta})=\mathbb{E}\left[\ell_c(\boldsymbol{\theta};\textbf{y}_c)\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}\right]$, with

$$Q_k(\boldsymbol{\theta})=\widehat{g(\nu\mid\textbf{u})}^{(k)}-\frac{n-p}{2}\log\widehat{\sigma}^{2(k)}-\frac{1}{2\widehat{\sigma}^{2(k)}}\left(\widehat{uy^2_*}^{(k)}-2\widehat{\boldsymbol{\phi}}^{(k)\top}\widehat{uy\textbf{y}_*}^{(k)}+\widehat{\boldsymbol{\phi}}^{(k)\top}\widehat{u\textbf{y}^2_*}^{(k)}\widehat{\boldsymbol{\phi}}^{(k)}\right),$$

where

$$\widehat{g(\nu\mid\textbf{u})}^{(k)}=\frac{\widehat{\nu}^{(k)}}{2}\left(\widehat{\log(u)}^{(k)}-\widehat{u}^{(k)}\right)+\frac{n-p}{2}\left[\widehat{\nu}^{(k)}\log\left(\frac{\widehat{\nu}^{(k)}}{2}\right)-2\log\Gamma\left(\frac{\widehat{\nu}^{(k)}}{2}\right)\right],$$

$$\widehat{uy^2_*}^{(k)}=\widehat{uy^2}^{(k)}-\sum_{i=p+1}^n\left(2\widehat{uy_i}^{(k)}\textbf{x}_i^\top\widehat{\boldsymbol{\beta}}^{(k)}-\widehat{u}_i^{(k)}\widehat{\boldsymbol{\beta}}^{(k)\top}\textbf{x}_i\textbf{x}_i^\top\widehat{\boldsymbol{\beta}}^{(k)}\right),$$

$$\widehat{uy\textbf{y}_*}^{(k)}=\widehat{uy\textbf{y}}^{(k)}-\sum_{i=p+1}^n\left(\widehat{uy_i}^{(k)}\textbf{X}_{(i,p)}\widehat{\boldsymbol{\beta}}^{(k)}+\widehat{u\textbf{y}_i}^{(k)}\textbf{x}_i^\top\widehat{\boldsymbol{\beta}}^{(k)}-\widehat{u}_i^{(k)}\textbf{X}_{(i,p)}\widehat{\boldsymbol{\beta}}^{(k)}\widehat{\boldsymbol{\beta}}^{(k)\top}\textbf{x}_i\right),$$

$$\widehat{u\textbf{y}^2_*}^{(k)}=\widehat{u\textbf{y}^2}^{(k)}-\sum_{i=p+1}^n\left(\widehat{u\textbf{y}_i}^{(k)}\widehat{\boldsymbol{\beta}}^{(k)\top}\textbf{X}_{(i,p)}^\top+\textbf{X}_{(i,p)}\widehat{\boldsymbol{\beta}}^{(k)}\widehat{u\textbf{y}_i}^{(k)\top}-\widehat{u}_i^{(k)}\textbf{X}_{(i,p)}\widehat{\boldsymbol{\beta}}^{(k)}\widehat{\boldsymbol{\beta}}^{(k)\top}\textbf{X}_{(i,p)}^\top\right),$$

such that $\widehat{u}^{(k)}=\sum_{i=p+1}^n\widehat{u}_i^{(k)}$, $\widehat{u}_i^{(k)}=\mathbb{E}[U_i\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$, $\widehat{\log(u)}^{(k)}=\mathbb{E}[\sum_{i=p+1}^n\log(U_i)\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$, $\widehat{uy^2}^{(k)}=\mathbb{E}[\sum_{i=p+1}^n U_iY_i^2\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$, $\widehat{uy_i}^{(k)}=\mathbb{E}[U_iY_i\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$, $\widehat{uy\textbf{y}}^{(k)}=\mathbb{E}[\sum_{i=p+1}^n U_iY_i\textbf{Y}_{(i,p)}\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$, $\widehat{u\textbf{y}_i}^{(k)}=\mathbb{E}[U_i\textbf{Y}_{(i,p)}\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$, and $\widehat{u\textbf{y}^2}^{(k)}=\mathbb{E}[\sum_{i=p+1}^n U_i\textbf{Y}_{(i,p)}\textbf{Y}_{(i,p)}^\top\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$, for $i=p+1,\ldots,n$.

It is easy to observe that the E-step reduces to the computation of $\widehat{u}_i^{(k)}$, $\widehat{\log(u)}^{(k)}$, $\widehat{uy^2}^{(k)}$, $\widehat{uy_i}^{(k)}$, $\widehat{uy\textbf{y}}^{(k)}$, $\widehat{u\textbf{y}_i}^{(k)}$, and $\widehat{u\textbf{y}^2}^{(k)}$, for $i=p+1,\ldots,n$. Because of the difficulties in calculating these conditional expectations, particularly when there are successive censored observations, we consider the SAEM algorithm introduced in Subsection 2.2, which at the $k$th iteration proceeds as follows:

Step E-1 (Sampling). We first consider the observed and censored/missing parts of $\textbf{y}=(\textbf{y}_o^\top,\textbf{y}_m^\top)^\top$ separately and rearrange the elements of the location vector and scale matrix to compute the conditional distribution of $\textbf{y}_m$, as demonstrated in Appendix A.1. Thus, in the simulation step, we generate $M$ samples from the conditional distribution of the latent variables $(\textbf{y},\textbf{u})$ via the Gibbs sampler according to the following scheme:

  • Step 1: Sample $\textbf{y}_m^{(k,l)}$ from $\mbox{TN}_{n_m}(\tilde{\boldsymbol{\mu}}^{*(k)},\tilde{\boldsymbol{\Sigma}}^{*(k)};\textbf{V}_m)$, with $\tilde{\boldsymbol{\mu}}^{*(k)}=\tilde{\boldsymbol{\mu}}_m^{(k)}+\tilde{\boldsymbol{\Sigma}}_{mo}^{(k)}\tilde{\boldsymbol{\Sigma}}_{oo}^{-1(k)}(\textbf{y}_o-\tilde{\boldsymbol{\mu}}_o^{(k)})$, $\tilde{\boldsymbol{\Sigma}}^{*(k)}=\tilde{\boldsymbol{\Sigma}}_{mm}^{(k)}-\tilde{\boldsymbol{\Sigma}}_{mo}^{(k)}\tilde{\boldsymbol{\Sigma}}_{oo}^{-1(k)}\tilde{\boldsymbol{\Sigma}}_{om}^{(k)}$, and $\textbf{V}_m=\{\textbf{y}_m=(y_1^m,\ldots,y_{n_m}^m)^\top;V_{11}^m\leq y_1^m\leq V_{12}^m,\ldots,V_{n_m1}^m\leq y_{n_m}^m\leq V_{n_m2}^m\}$, where the parameters $\tilde{\boldsymbol{\mu}}_m^{(k)}$, $\tilde{\boldsymbol{\mu}}_o^{(k)}$, $\tilde{\boldsymbol{\Sigma}}_{mm}^{(k)}$, $\tilde{\boldsymbol{\Sigma}}_{mo}^{(k)}$, and $\tilde{\boldsymbol{\Sigma}}_{oo}^{(k)}$ are computed using Equations 19 and 20 in Appendix A.1. Then we construct a full vector of observations as $\textbf{y}^{(k,l)}=(\textbf{y}_o^\top,\textbf{y}_m^{(k,l)\top})^\top$ from the observed and sampled values, for $l=1,\ldots,M$.

  • Step 2: Sample $\textbf{u}^{(k,l)}$ from $f(\textbf{u}\mid\textbf{y}^{(k,l)},\widehat{\boldsymbol{\theta}}^{(k)})$, whose elements are independent and gamma distributed as $u_i^{(k,l)}\mid\textbf{y}^{(k,l)},\widehat{\boldsymbol{\theta}}^{(k)}\stackrel{ind}{\sim}\mathcal{G}(a_i^{(k,l)},b_i^{(k,l)})$, with $a_i^{(k,l)}=(\widehat{\nu}^{(k)}+1)/2$, $b_i^{(k,l)}=(\widehat{\nu}^{(k)}+\varrho_i^{(k,l)2}/\widehat{\sigma}^{2(k)})/2$, and $\varrho_i^{(k,l)}=y_i^{(k,l)}-\textbf{x}_i^\top\widehat{\boldsymbol{\beta}}^{(k)}-\textbf{y}_{(i,p)}^{(k,l)\top}\widehat{\boldsymbol{\phi}}^{(k)}+\widehat{\boldsymbol{\beta}}^{(k)\top}\textbf{X}_{(i,p)}^\top\widehat{\boldsymbol{\phi}}^{(k)}$, for $i=p+1,\ldots,n$ and $l=1,\ldots,M$ (see the sketch after this list).
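As referenced in Step 2, a minimal R sketch of this gamma draw, conditional on a completed series, is given below; variable names are illustrative, with beta_hat, phi_hat, sigma2_hat, and nu_hat denoting the current SAEM estimates:

```r
# Sketch: one Gibbs draw of the mixing weights u_i given a completed series y.
# X is the n x q design matrix and p the AR order (all assumed given).
sample_u <- function(y, X, beta_hat, phi_hat, sigma2_hat, nu_hat, p) {
  n   <- length(y)
  idx <- (p + 1):n
  # AR residual: rho_i = (y_i - x_i'beta) - sum_j phi_j (y_{i-j} - x_{i-j}'beta)
  e   <- y - as.vector(X %*% beta_hat)
  rho <- sapply(idx, function(i) e[i] - sum(phi_hat * e[i - (1:p)]))
  a   <- (nu_hat + 1) / 2
  b   <- (nu_hat + rho^2 / sigma2_hat) / 2
  rgamma(length(idx), shape = a, rate = b)
}
```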

Step E-2 (Stochastic Approximation). Given the sequences $(\textbf{y}^{(k,l)},\textbf{u}^{(k,l)})$, $l=1,\ldots,M$, we replace the conditional expectations in $Q_k(\boldsymbol{\theta})$ by the following stochastic approximations:

$$\widehat{u}_i^{(k)}=\widehat{u}_i^{(k-1)}+\delta_k\left(\frac{1}{M}\sum_{l=1}^M u_i^{(k,l)}-\widehat{u}_i^{(k-1)}\right),$$
$$\widehat{\log(u)}^{(k)}=\widehat{\log(u)}^{(k-1)}+\delta_k\left(\frac{1}{M}\sum_{l=1}^M\sum_{i=p+1}^n\log u_i^{(k,l)}-\widehat{\log(u)}^{(k-1)}\right),$$
$$\widehat{uy^2}^{(k)}=\widehat{uy^2}^{(k-1)}+\delta_k\left(\frac{1}{M}\sum_{l=1}^M\sum_{i=p+1}^n u_i^{(k,l)}y_i^{2(k,l)}-\widehat{uy^2}^{(k-1)}\right),$$
$$\widehat{uy_i}^{(k)}=\widehat{uy_i}^{(k-1)}+\delta_k\left(\frac{1}{M}\sum_{l=1}^M u_i^{(k,l)}y_i^{(k,l)}-\widehat{uy_i}^{(k-1)}\right),$$
$$\widehat{uy\textbf{y}}^{(k)}=\widehat{uy\textbf{y}}^{(k-1)}+\delta_k\left(\frac{1}{M}\sum_{l=1}^M\sum_{i=p+1}^n u_i^{(k,l)}y_i^{(k,l)}\textbf{y}_{(i,p)}^{(k,l)}-\widehat{uy\textbf{y}}^{(k-1)}\right),$$
$$\widehat{u\textbf{y}_i}^{(k)}=\widehat{u\textbf{y}_i}^{(k-1)}+\delta_k\left(\frac{1}{M}\sum_{l=1}^M u_i^{(k,l)}\textbf{y}_{(i,p)}^{(k,l)}-\widehat{u\textbf{y}_i}^{(k-1)}\right),$$
$$\widehat{u\textbf{y}^2}^{(k)}=\widehat{u\textbf{y}^2}^{(k-1)}+\delta_k\left(\frac{1}{M}\sum_{l=1}^M\sum_{i=p+1}^n u_i^{(k,l)}\textbf{y}_{(i,p)}^{(k,l)}\textbf{y}_{(i,p)}^{(k,l)\top}-\widehat{u\textbf{y}^2}^{(k-1)}\right),$$

for $i=p+1,\ldots,n$.

Following galarza2017quantile, we set $\delta_k=1$ if $1\leq k\leq cW$ and $\delta_k=1/(k-cW)$ if $cW+1\leq k\leq W$, where $W$ is the maximum number of iterations and $c$ is a cutoff point $(0\leq c\leq 1)$ determining the proportion of initial iterations with no memory. If $c=0$, the algorithm has memory over all iterations and hence converges slowly to the ML estimates, and $W$ needs to be large. If $c=1$, the algorithm is memory-free, converges quickly to a neighborhood of the solution, and behaves as a Markov chain whose mean can be reasonably well estimated after suitable burn-in and thinning. A value between 0 and 1 $(0<c<1)$ ensures initial convergence in distribution to a neighborhood of the solution during the first $cW$ iterations and almost sure convergence for the remaining iterations.
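A minimal sketch of this step-size schedule (with illustrative values of $W$ and $c$) is:

```r
# Sketch: the SAEM step-size schedule described above.
# W = maximum number of iterations, c = cutoff proportion (both illustrative).
saem_delta <- function(W, c) {
  k <- 1:W
  ifelse(k <= c * W, 1, 1 / (k - c * W))
}
delta <- saem_delta(W = 400, c = 0.25)
plot(delta, type = "l", xlab = "iteration k", ylab = expression(delta[k]))
```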

The choice of $c$ and $W$ can affect the speed of convergence of the SAEM algorithm. As suggested by lavielle2014mixed, a graphical approach can be used to monitor the convergence of the estimates of all parameters and to determine values for these constants. An advantage of the SAEM algorithm is that, even though it performs an MCMC-type E-step, it only requires a small, fixed number of samples $M$, making it much faster than the MCEM algorithm.

Step CM (Conditional Maximization). A conditional maximization step is then carried out, and $\widehat{\boldsymbol{\theta}}^{(k)}$ is updated by maximizing $Q_k(\boldsymbol{\theta})$ over $\boldsymbol{\theta}$ to obtain the new estimate $\widehat{\boldsymbol{\theta}}^{(k+1)}$, leading to the expressions:

$$\widehat{\boldsymbol{\phi}}^{(k+1)}=\left(\widehat{u\textbf{y}^2_*}^{(k)}\right)^{-1}\widehat{uy\textbf{y}_*}^{(k)}, \quad (8)$$
$$\widehat{\sigma}^{2(k+1)}=\frac{1}{n-p}\left(\widehat{uy^2_*}^{(k)}-2\widehat{\boldsymbol{\phi}}^{(k+1)\top}\widehat{uy\textbf{y}_*}^{(k)}+\widehat{\boldsymbol{\phi}}^{(k+1)\top}\widehat{u\textbf{y}^2_*}^{(k)}\widehat{\boldsymbol{\phi}}^{(k+1)}\right), \quad (9)$$
$$\widehat{\boldsymbol{\beta}}^{(k+1)}=\left(\sum_{i=p+1}^n\widehat{u}_i^{(k)}\widehat{\boldsymbol{\alpha}}_i^{(k+1)}\widehat{\boldsymbol{\alpha}}_i^{(k+1)\top}\right)^{-1}\sum_{i=p+1}^n\left(\widehat{uy_i}^{(k)}-\widehat{\boldsymbol{\phi}}^{(k+1)\top}\widehat{u\textbf{y}_i}^{(k)}\right)\widehat{\boldsymbol{\alpha}}_i^{(k+1)}, \quad (10)$$
$$\widehat{\nu}^{(k+1)}=\underset{\nu}{\operatorname{argmax}}\;\widehat{g(\nu\mid\textbf{u})}^{(k)}, \quad (11)$$

with $\widehat{\boldsymbol{\alpha}}_i^{(k+1)}=\textbf{x}_i-\textbf{X}_{(i,p)}^\top\widehat{\boldsymbol{\phi}}^{(k+1)}$, for $i=p+1,\ldots,n$.
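The updates in Equations 8-10 are closed-form, while the $\nu$ update in Equation 11 requires a one-dimensional numerical search. A minimal R sketch of the latter, assuming the stochastic approximations of $\sum\log u_i$ and $\sum u_i$ are available (the search interval is an assumption):

```r
# Sketch: the nu update of Equation (11), maximizing g(nu | u) numerically.
# sum_log_u and sum_u are the stochastic approximations of sum(log u_i) and
# sum(u_i); n and p as in the text. The search range for nu is illustrative.
update_nu <- function(sum_log_u, sum_u, n, p) {
  g <- function(nu) {
    (n - p) / 2 * (nu * log(nu / 2) - 2 * lgamma(nu / 2)) +
      nu / 2 * (sum_log_u - sum_u)
  }
  optimize(g, interval = c(2.01, 100), maximum = TRUE)$maximum
}
```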

3.3 Standard error approximation

The Fisher information matrix is a good measure of the amount of information that a sample dataset provides about the parameters, and it can be used to compute the asymptotic variance of the estimators. louis1982finding developed a procedure for extracting the observed information matrix when the EM algorithm is used to find the ML estimates in problems with incomplete data.

Let $\mathbf{S}_c(\textbf{y}_c;\boldsymbol{\theta})$ and $\mathbf{B}_c(\textbf{y}_c;\boldsymbol{\theta})$ denote, respectively, the first derivative and the negative of the second derivative of the complete-data log-likelihood function with respect to the parameter vector $\boldsymbol{\theta}$, and let $\mathbf{S}_o(\textbf{y}_o;\boldsymbol{\theta})$ and $\mathbf{B}_o(\textbf{y}_o;\boldsymbol{\theta})$ be the corresponding derivatives of the observed-data log-likelihood function. Then, the observed information matrix is given by

$$\textbf{I}_o(\boldsymbol{\theta})=\mathbf{B}_o(\textbf{y}_o;\boldsymbol{\theta})=\mathbf{S}_o(\textbf{y}_o;\boldsymbol{\theta})\mathbf{S}_o^\top(\textbf{y}_o;\boldsymbol{\theta})+\mathbb{E}\left[\mathbf{B}_c(\textbf{y}_c;\boldsymbol{\theta})\mid\textbf{y}_o\right]-\mathbb{E}\left[\mathbf{S}_c(\textbf{y}_c;\boldsymbol{\theta})\mathbf{S}_c^\top(\textbf{y}_c;\boldsymbol{\theta})\mid\textbf{y}_o\right].$$

delyon1999convergence adapted the method proposed by louis1982finding to compute the observed information matrix when the SAEM algorithm is used to estimate the parameters. For this, it is necessary to compute the auxiliary terms

$$\mathbf{H}_k=-\mathbf{G}_k+\mathbf{\Delta}_k\mathbf{\Delta}_k^\top, \quad (14)$$

$$\mathbf{G}_k=\mathbf{G}_{k-1}+\delta_k\left(\frac{1}{M}\sum_{l=1}^M\left(\frac{\partial^2\ell_c(\boldsymbol{\theta};\textbf{y}^{(k,l)})}{\partial\boldsymbol{\theta}\partial\boldsymbol{\theta}^\top}+\frac{\partial\ell_c(\boldsymbol{\theta};\textbf{y}^{(k,l)})}{\partial\boldsymbol{\theta}}\frac{\partial\ell_c(\boldsymbol{\theta};\textbf{y}^{(k,l)})}{\partial\boldsymbol{\theta}^\top}\right)-\mathbf{G}_{k-1}\right),$$

and

$$\mathbf{\Delta}_k=\mathbf{\Delta}_{k-1}+\delta_k\left(\frac{1}{M}\sum_{l=1}^M\frac{\partial\ell_c(\boldsymbol{\theta};\textbf{y}^{(k,l)})}{\partial\boldsymbol{\theta}}-\mathbf{\Delta}_{k-1}\right),$$

where $\textbf{y}^{(k,l)}=(\textbf{y}_o^\top,\textbf{y}_m^{(k,l)\top})^\top$ is the vector of observations sampled at iteration $(k,l)$, for $k=1,\ldots,W$ and $l=1,\ldots,M$, with $W$ denoting the maximum number of iterations of the SAEM algorithm and $M$ the number of MC samples used in the stochastic approximation. The inverse of the limiting value of $\mathbf{H}_k$ can be used to assess the dispersion of the estimators (delyon1999convergence). Analytical expressions for the first and second derivatives of the complete-data log-likelihood function are given in Appendix A.2.
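Assuming functions that return the complete-data score and Hessian of Appendix A.2 are available, these recursions can be accumulated alongside the SAEM iterations; a minimal R sketch (function and argument names are illustrative):

```r
# Sketch: Louis-type information accumulation within SAEM.
# score_fn(theta, y) and hess_fn(theta, y) are assumed to return the
# complete-data score vector and Hessian matrix (see Appendix A.2).
update_info <- function(Delta, G, theta, y_samples, delta_k,
                        score_fn, hess_fn) {
  M <- length(y_samples)
  d <- length(Delta)
  s_bar <- rep(0, d); g_bar <- matrix(0, d, d)
  for (y in y_samples) {
    s <- score_fn(theta, y)
    s_bar <- s_bar + s / M
    g_bar <- g_bar + (hess_fn(theta, y) + tcrossprod(s)) / M
  }
  Delta <- Delta + delta_k * (s_bar - Delta)
  G     <- G + delta_k * (g_bar - G)
  H     <- -G + tcrossprod(Delta)   # observed information approximation
  list(Delta = Delta, G = G, H = H)
}
# Standard errors at convergence: sqrt(diag(solve(H))).
```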

3.4 Prediction

Considering the interest in predicting values from the CAR$t$($p$) model, we denote by $\textbf{y}_{\textrm{obs}}$ the $n$-vector of random variables (RVs) corresponding to the given sample and by $\textbf{y}_{\textrm{pred}}$ the vector of RVs of length $n_{\textrm{pred}}$ corresponding to the time points we are interested in predicting.

Let $\textbf{y}_{\textrm{obs}}=(\textbf{y}_o^\top,\textbf{y}_m^\top)^\top$, where $\textbf{y}_o$ is the vector of uncensored observations and $\textbf{y}_m$ is the vector of censored or missing observations. To deal with the censored values in $\textbf{y}_{\textrm{obs}}$, we use an imputation procedure that replaces them with the values obtained at the last iteration of the SAEM algorithm, i.e., $\textbf{y}_m=\mathbb{E}[\textbf{y}_m\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(W)}]\approx\widehat{\textbf{y}}_m^{(W)}$, since the elements of $\widehat{\textbf{y}}_m^{(k)}$ can also be updated during Step E-2 of the SAEM algorithm as

$$\widehat{\textbf{y}}_m^{(k)}=\widehat{\textbf{y}}_m^{(k-1)}+\delta_k\left(\frac{1}{M}\sum_{l=1}^M\textbf{y}_m^{(k,l)}-\widehat{\textbf{y}}_m^{(k-1)}\right), \quad (15)$$

$k=1,\ldots,W$, with the same $\delta_k$, $M$, and $W$ settings considered in the estimation. The new vector of observed values will be denoted by $\textbf{y}_{\textrm{obs*}}=(\textbf{y}_o^\top,\widehat{\textbf{y}}_m^{(W)\top})^\top$.

Now, supposing that all values in $\textbf{y}_{\textrm{obs*}}$ are completely observed and that the explanatory variables for $\textbf{y}_{\textrm{pred}}$ are available, the forecasting procedure is performed recursively (box2015time) as

$$\widehat{y}_{n+k}=\begin{cases}\textbf{x}_{n+k}^\top\boldsymbol{\beta}+\sum_{j=k}^p\phi_j(y_{n+k-j}-\textbf{x}_{n+k-j}^\top\boldsymbol{\beta}), & k=1,\\[4pt] \textbf{x}_{n+k}^\top\boldsymbol{\beta}+\sum_{i=1}^{k-1}\phi_i(\widehat{y}_{n+k-i}-\textbf{x}_{n+k-i}^\top\boldsymbol{\beta})+\sum_{j=k}^p\phi_j(y_{n+k-j}-\textbf{x}_{n+k-j}^\top\boldsymbol{\beta}), & 1<k\leq p,\\[4pt] \textbf{x}_{n+k}^\top\boldsymbol{\beta}+\sum_{j=1}^p\phi_j(\widehat{y}_{n+k-j}-\textbf{x}_{n+k-j}^\top\boldsymbol{\beta}), & p<k\leq n_{\textrm{pred}}.\end{cases}$$

In practice, $\boldsymbol{\theta}$ is substituted by $\widehat{\boldsymbol{\theta}}=(\widehat{\boldsymbol{\beta}}^\top,\widehat{\boldsymbol{\phi}}^\top,\widehat{\sigma}^2,\widehat{\nu})^\top$, the estimate obtained via the SAEM algorithm, yielding $\widehat{\textbf{y}}_{\textrm{pred}}=(\widehat{y}_{n+1},\widehat{y}_{n+2},\ldots,\widehat{y}_{n+n_{\textrm{pred}}})^\top$.
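A minimal R sketch of this recursion (illustrative names; y and X denote the completed in-sample data and X_pred the covariates at the prediction times):

```r
# Sketch: recursive forecasts for the fitted AR(p) regression model.
predict_art <- function(y, X, X_pred, beta_hat, phi_hat) {
  p <- length(phi_hat)
  e <- y - as.vector(X %*% beta_hat)        # in-sample AR residuals
  n_pred <- nrow(X_pred)
  e_path <- c(tail(e, p), numeric(n_pred))  # last p residuals, then forecasts
  y_pred <- numeric(n_pred)
  for (k in 1:n_pred) {
    # AR recursion on the residual scale; unknown future residuals are
    # replaced by their own forecasts, exactly as in the cases above.
    e_path[p + k] <- sum(phi_hat * e_path[(p + k - 1):k])
    y_pred[k] <- as.numeric(X_pred[k, ] %*% beta_hat) + e_path[p + k]
  }
  y_pred
}
```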

4 Simulation Study

In this section, we examine the asymptotic properties of the SAEM estimates through a simulation study considering different sample sizes and levels of censoring. A second simulation study is performed to demonstrate the robustness of the estimates obtained from the proposed model when the data is perturbed.

Table 1: Simulation study 1. Summary statistics of parameter estimates for the CAR$t$(2) model based on 300 samples of sizes $n=100$, $300$, and $600$, and different levels of censoring. Columns: $n$; LOD; average level of censoring; measure; and estimates for $\beta_0$, $\beta_1$, $\beta_2$, $\phi_1$, $\phi_2$, $\sigma^2$, and $\nu$, whose true values are 5.00, 0.50, 0.90, -0.40, 0.12, 2.00, and 4.00, respectively.
100 No 0% MC-Mean 4.988 0.501 0.937 -0.409 0.093 1.779 4.150
IM-SE 0.325 0.141 0.565 0.086 0.086 0.531 2.612
MC-SD 0.334 0.149 0.567 0.090 0.091 0.465 2.089
CP (%) 94.0 91.3 94.0 - - - -
1.60 5.10% MC-Mean 4.984 0.502 0.942 -0.409 0.092 1.781 4.345
IM-SE 0.319 0.141 0.557 0.091 0.088 0.533 3.326
MC-SD 0.339 0.149 0.576 0.096 0.094 0.494 2.692
CP (%) 94.3 92.0 94.3 - - - -
3.45 19.59% MC-Mean 4.977 0.505 0.931 -0.418 0.089 1.812 4.559
IM-SE 0.337 0.150 0.586 0.105 0.099 0.584 3.957
MC-SD 0.349 0.156 0.588 0.110 0.103 0.516 2.914
CP (%) 93.3 93.7 93.3 - - - -
4.30 34.11% MC-Mean 4.939 0.511 0.928 -0.434 0.077 1.856 4.930
IM-SE 0.371 0.163 0.644 0.122 0.115 0.721 4.872
MC-SD 0.372 0.171 0.642 0.126 0.120 0.571 3.517
CP (%) 94.0 93.7 93.3 - - - -
300 No 0% MC-Mean 5.015 0.501 0.893 -0.401 0.115 1.937 4.162
IM-SE 0.167 0.084 0.299 0.048 0.048 0.293 1.172
MC-SD 0.180 0.090 0.312 0.052 0.050 0.295 1.266
CP (%) 93.3 93.7 93.7 - - - -
1.60 5.31% MC-Mean 5.017 0.501 0.889 -0.399 0.115 1.944 4.244
IM-SE 0.169 0.085 0.303 0.053 0.051 0.314 1.444
MC-SD 0.182 0.091 0.319 0.055 0.051 0.311 1.498
CP (%) 93.0 94.0 93.0 - - - -
3.45 20.58% MC-Mean 4.999 0.507 0.891 -0.409 0.110 1.997 4.645
IM-SE 0.177 0.090 0.317 0.060 0.057 0.365 2.122
MC-SD 0.191 0.095 0.332 0.061 0.058 0.341 1.935
CP (%) 93.7 92.3 93.7 - - - -
4.30 35.44% MC-Mean 4.958 0.518 0.901 -0.409 0.110 2.080 5.043
IM-SE 0.194 0.098 0.341 0.068 0.065 0.421 2.716
MC-SD 0.199 0.106 0.339 0.074 0.069 0.385 2.258
CP (%) 95.0 93.3 95.3 - - - -
600 No 0% MC-Mean 5.001 0.507 0.909 -0.400 0.118 1.966 4.072
IM-SE 0.125 0.063 0.225 0.037 0.036 0.221 0.889
MC-SD 0.125 0.064 0.226 0.038 0.037 0.212 0.918
CP (%) 93.0 94.7 92.3 - - - -
1.60 5.11% MC-Mean 4.999 0.508 0.911 -0.401 0.117 1.977 4.145
IM-SE 0.125 0.063 0.225 0.037 0.036 0.221 0.899
MC-SD 0.124 0.065 0.224 0.038 0.037 0.212 0.937
CP (%) 92.7 94.3 92.0 - - - -
3.45 19.99% MC-Mean 4.981 0.513 0.921 -0.405 0.117 2.034 4.447
IM-SE 0.132 0.066 0.235 0.042 0.040 0.254 1.252
MC-SD 0.127 0.069 0.229 0.042 0.041 0.239 1.342
CP (%) 94.0 93.7 94.3 - - - -
4.30 34.64% MC-Mean 4.939 0.521 0.933 -0.410 0.112 2.113 4.826
IM-SE 0.142 0.072 0.251 0.048 0.045 0.300 1.672
MC-SD 0.139 0.074 0.251 0.049 0.046 0.284 1.657
CP (%) 93.7 94.7 94.7 - - - -
Figure 1: Simulation study 1. Boxplots of the estimates for the CAR$t$(2) model considering different sample sizes and LODs.

4.1 Simulation study 1

This study aims to provide empirical evidence about the consistency of the ML estimates under different scenarios. Hence, 300 MC samples were generated considering different sample sizes: $n=100$, $300$, and $600$. The data were generated from the model specified by Equations 1-2, considering a Student-$t$ distribution for the innovations ($\eta$'s) with $\sigma^2=2$ and $\nu=4$. The remaining parameters were set as $\boldsymbol{\beta}=(5,0.50,0.90)^\top$ and $\boldsymbol{\phi}=(-0.40,0.12)^\top$, and the covariables at time $i$ were $\textbf{x}_i=(1,x_{i1},x_{i2})^\top$, with $x_{i1}\sim\mathcal{N}(0,1)$ and $x_{i2}\sim\mathcal{U}(0,1)$, for $i=1,2,\ldots,n$. From this scenario, two analyses were conducted and are discussed next.
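A hedged R sketch of this data-generating scheme (variable names are illustrative) is:

```r
# Sketch: simulate one sample from the ARt(2) regression model of Section 4.1.
set.seed(42)
n      <- 300
beta   <- c(5, 0.50, 0.90)
phi    <- c(-0.40, 0.12)
sigma2 <- 2; nu <- 4; p <- length(phi)

X   <- cbind(1, rnorm(n), runif(n))      # covariates x_i = (1, x_i1, x_i2)
eta <- sqrt(sigma2) * rt(n, df = nu)     # Student-t innovations
xi  <- numeric(n)                        # AR(2) regression errors
for (t in (p + 1):n) xi[t] <- sum(phi * xi[(t - 1):(t - p)]) + eta[t]
y <- as.vector(X %*% beta) + xi

# Left-censoring at an LOD (e.g., 3.45 for roughly 20% censoring):
lod   <- 3.45
cens  <- y < lod
y_obs <- ifelse(cens, lod, y)
```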

4.1.1 Asymptotic properties

Aiming to have scenarios with an average level of censoring/missing of 5%, 20%, and 35%, respectively, we considered the following LODs (values lower than the LOD are substituted by the LOD): 1.60, 3.45, and 4.30. Furthermore, 20% of the desired censored rate corresponds to observations randomly selected to be treated as missing. Additionally, we considered the case without censoring (original data) for comparison.

For each sample size and LOD, we computed the mean of the 300 MC estimates (MC-Mean), the mean of the standard error (IM-SE) computed by the inverse of the observed information matrix given in Equation 14, the standard deviation of the 300 MC estimates (MC-SD), and the coverage probability (CP) of a 95% confidence interval, i.e.,

$$\mbox{MC-Mean}_i=\bar{\widehat{\theta}}_i=\frac{1}{300}\sum_{j=1}^{300}\widehat{\theta}_i^{(j)},\qquad\mbox{MC-SD}_i=\sqrt{\frac{1}{299}\sum_{j=1}^{300}\left(\widehat{\theta}_i^{(j)}-\bar{\widehat{\theta}}_i\right)^2},\qquad\mbox{IM-SE}_i=\frac{1}{300}\sum_{j=1}^{300}SE\left(\widehat{\theta}_i^{(j)}\right),$$

where $\widehat{\theta}_i^{(j)}$ is the estimate of the $i$th element of $\boldsymbol{\theta}=(\beta_0,\beta_1,\beta_2,\phi_1,\phi_2,\sigma^2,\nu)^\top$ in the $j$th MC sample.

Figure 2: Simulation study 1. MSE of the estimates for the CAR$t$(2) model considering different sample sizes and LODs.

The results are shown in Table 1, where we can observe that the mean of the estimates (MC-Mean) is close to the true parameter value in all combinations of sample size and LOD, except for the scale parameter $\sigma^2$ for a sample size of $n=100$. As expected, this difference decreases as the sample size increases. Notice that the mean of the standard errors obtained through the inverse of the observed information matrix (IM-SE) is in general close to the standard deviation of the estimates (MC-SD) in all scenarios, indicating that the proposed method for obtaining the standard errors is reliable.

Figure 1 shows the boxplots of the estimates for each parameter, considering different sample sizes and LODs. The solid red line represents the true parameter value. In general, the median of the estimates is close to the true value regardless of the sample size and LOD. However, for $\phi_2$ and $\sigma^2$, the median underestimates the true value in samples of size $n=100$, i.e., the smallest sample size in the simulation study. Furthermore, the interquartile ranges decrease as the sample size increases, suggesting the consistency of the estimates. Additionally, boxplots of the estimates of $\nu$ are shown in Figure 8 in Appendix B.

Finally, to study the asymptotic properties of the estimates, we analyzed the mean squared error (MSE) of the estimates obtained from the proposed algorithm in all scenarios, defined as $\mbox{MSE}_i=\frac{1}{300}\sum_{j=1}^{300}(\widehat{\theta}_i^{(j)}-\theta_i)^2$. The results for each parameter, sample size, and LOD are shown in Figure 2, where we may note that the MSE tends to zero as the sample size increases. Thus, the proposed SAEM algorithm seems to provide ML estimates with good asymptotic properties for our proposed autoregressive censored linear model with Student-$t$ innovations.

4.1.2 Residual Analysis

Checking the specification of a statistical model usually involves statistical tests and graphical methods based on residuals. However, conventional residuals (e.g., Pearson's residuals) are not appropriate for some models, since they may lead to erroneous inference, as kalliovirta2006misspecification demonstrated for models based on mixtures of distributions. As an alternative, dunn1996randomized developed quantile residuals, a type of residual for regression models with independent responses that produces normally distributed residuals by inverting the fitted distribution function at each response value and computing the corresponding standard normal quantile. These results assume that the model is correctly specified and that the parameters are consistently estimated. The method can be extended to dependent data by expressing the likelihood as a product of univariate conditional likelihoods.

Figure 3: Simulation study 1. Plots of quantile residuals for a sample of size $n=600$ generated from the CAR$t$(2) model considering different levels of left censoring.

To compute the quantile residuals for the CAR$t$($p$) model, we first impute the censored or missing observations as defined in Section 3.4. Then, considering all values as completely observed, the residual for the $i$th observation is computed as

$$\widehat{r}_i=\Phi^{-1}\left(T_1\left(y_i;\widehat{\mu}_i,\widehat{\sigma}^2,\widehat{\nu}\right)\right),\quad i=p+1,\ldots,n, \quad (16)$$
Figure 4: Simulation study 1. Plots of the residuals for a sample of size $n=600$ generated from the CAR$t$(2) model and fitted with a model with normal innovations, considering different levels of left censoring.

where $\Phi(\cdot)$ denotes the cumulative distribution function (CDF) of the standard normal distribution and $T_1(\cdot;\widehat{\mu}_i,\widehat{\sigma}^2,\widehat{\nu})$ is the CDF of the univariate Student-$t$ distribution with location parameter $\widehat{\mu}_i=\textbf{x}_i^\top\widehat{\boldsymbol{\beta}}+(\textbf{y}_{(i,p)}-\textbf{X}_{(i,p)}\widehat{\boldsymbol{\beta}})^\top\widehat{\boldsymbol{\phi}}$, scale parameter $\widehat{\sigma}^2$, and $\widehat{\nu}$ degrees of freedom, where $\widehat{\boldsymbol{\theta}}$ refers to the ML estimates of $\boldsymbol{\theta}$ obtained through the SAEM algorithm. Note that the quantile residuals are calculated only for the last $n-p$ observations.
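In R, these residuals can be computed directly from the location-scale form of the Student-$t$ CDF; a minimal sketch, assuming the completed series y, the design matrix X, and the SAEM estimates are available (names illustrative):

```r
# Sketch: quantile residuals for the CARt(p) model.
# T1(y; mu, sigma2, nu) = pt((y - mu) / sqrt(sigma2), df = nu).
quantile_residuals <- function(y, X, beta_hat, phi_hat, sigma2_hat, nu_hat) {
  p   <- length(phi_hat)
  n   <- length(y)
  e   <- y - as.vector(X %*% beta_hat)
  idx <- (p + 1):n
  mu  <- sapply(idx, function(i)
    sum(X[i, ] * beta_hat) + sum(phi_hat * e[i - (1:p)]))
  qnorm(pt((y[idx] - mu) / sqrt(sigma2_hat), df = nu_hat))
}
# Residuals should be approximately iid N(0, 1) if the model is well
# specified, e.g., inspect with qqnorm(...).
```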

To analyze how the quantile residuals behave for the proposed model, they were computed for a simulated dataset of size $n=600$ considering four levels of left censoring: 0%, 5.17%, 20%, and 34.67%. Figure 3 shows a Quantile-Quantile (Q-Q) plot, a dispersion plot (residual vs. time), and a histogram of the residuals. For all levels of censoring, the histogram resembles that of a normally distributed variable, and the dispersion plot shows independent residuals. The Q-Q plots suggest that the residuals are roughly normally distributed, since all points form an approximately straight line inside the confidence band. However, for samples with 20% and 34.67% censoring, the Q-Q plots present a slight deviation from the center line in the lower tail, which might be due to the high proportion of censored values.

For comparison, we fitted the same dataset considering the normal distribution (i.e., disregarding the heavy tails) and computed the corresponding quantile residuals. The resulting plots are given in Figure 4, where we can see clear signs of non-normality, such as large residuals and several points outside the confidence band in the Q-Q plots. Therefore, this illustration indicates that quantile residuals can help check the model specification in the CAR$t$($p$) model. Nevertheless, for high levels of censoring, more caution is needed in the analysis of residuals, since our proposal imputes the unobserved data by their conditional expectations.

4.2 Simulation study 2: Robustness of the estimators

Figure 5: Simulation study 2. Boxplots of the estimates obtained from the CAR(2) and CAR$t$(2) models based on 300 MC samples of size 100.

a. Level of censoring: 0%

b. Level of censoring: 10.04%

c. Level of censoring: 30%
This simulation study aims to compare the performance of the estimates of two censored AR models in the presence of outliers in the response variable. In this case, we generated 300 MC samples of size $n=100$ under the model specified in Equations 1-2, considering the standard normal distribution for the innovations. The other parameters were set as $\boldsymbol{\beta}=(4,0.50)^\top$ and $\boldsymbol{\phi}=(0.48,-0.20)^\top$, and the covariates were set as $\textbf{x}_i=(1,x_{i1})^\top$, with $x_{i1}\sim\mathcal{N}(0,1)$, $i=1,\ldots,100$. After generating the data, each MC sample was perturbed under the following scheme: the maximum value was increased by $\vartheta$ times the sample standard deviation, i.e., $\max(\textbf{y})=\max(\textbf{y})+\vartheta\,\mbox{SD}(\textbf{y})$, for $\vartheta\in\{0,1,2,3,4,5,7\}$.

Table 2: Simulation study 2. Mean of the estimates for the CAR(2) and CAR$t$(2) models based on 300 MC samples of size $n=100$ for different levels of perturbation. Columns: perturbation $\vartheta$; CAR(2) estimates of $\beta_0$, $\beta_1$, $\sigma^2$, $\phi_1$, $\phi_2$; CAR$t$(2) estimates of $\beta_0$, $\beta_1$, $\sigma^{2*}$, $\phi_1$, $\phi_2$, $\nu$; and DI (%).

a. Level of censoring: 0%
0 | 4.007 0.510 0.963 0.478 -0.215 | 4.007 0.510 0.979 0.480 -0.216 24.424 | 17.00
1 | 4.021 0.526 1.030 0.469 -0.209 | 4.008 0.518 1.042 0.464 -0.207 17.098 | 73.33
2 | 4.036 0.541 1.135 0.447 -0.195 | 3.999 0.518 1.130 0.433 -0.186 9.568 | 98.67
3 | 4.050 0.557 1.275 0.416 -0.175 | 3.991 0.515 1.229 0.398 -0.162 6.064 | 99.67
4 | 4.064 0.572 1.449 0.383 -0.154 | 3.987 0.513 1.328 0.369 -0.140 4.896 | 100
5 | 4.079 0.588 1.655 0.349 -0.133 | 3.986 0.510 1.426 0.346 -0.121 4.324 | 100
7 | 4.107 0.619 2.160 0.287 -0.100 | 3.985 0.508 1.628 0.309 -0.090 3.710 | 100

b. Level of censoring: 10.04%
0 | 4.006 0.511 0.966 0.478 -0.214 | 4.006 0.511 0.997 0.478 -0.213 20.892 | 21.00
1 | 4.018 0.529 1.044 0.468 -0.208 | 4.007 0.519 1.079 0.460 -0.203 14.290 | 77.67
2 | 4.027 0.549 1.165 0.445 -0.194 | 4.002 0.518 1.207 0.424 -0.180 7.764 | 98.67
3 | 4.036 0.570 1.327 0.415 -0.175 | 3.999 0.514 1.362 0.387 -0.155 5.061 | 99.67
4 | 4.044 0.592 1.527 0.383 -0.156 | 3.999 0.511 1.535 0.357 -0.133 4.105 | 100
5 | 4.052 0.614 1.764 0.351 -0.138 | 4.000 0.509 1.704 0.332 -0.115 3.668 | 100
7 | 4.065 0.659 2.340 0.294 -0.109 | 4.003 0.505 2.170 0.295 -0.088 3.166 | 100

c. Level of censoring: 30%
0 | 4.014 0.505 0.944 0.476 -0.219 | 4.015 0.503 0.972 0.476 -0.218 19.338 | 24.67
1 | 4.014 0.532 1.050 0.464 -0.213 | 4.011 0.513 1.076 0.454 -0.206 12.553 | 82.33
2 | 4.007 0.562 1.213 0.439 -0.199 | 4.007 0.512 1.230 0.415 -0.181 6.854 | 98.33
3 | 3.996 0.594 1.429 0.409 -0.182 | 4.006 0.507 1.406 0.376 -0.156 4.651 | 100
4 | 3.982 0.628 1.694 0.378 -0.165 | 4.008 0.503 1.616 0.344 -0.135 3.803 | 100
5 | 3.965 0.664 2.005 0.349 -0.151 | 4.010 0.500 1.837 0.318 -0.118 3.386 | 100
7 | 3.926 0.737 2.760 0.301 -0.130 | 4.016 0.495 2.428 0.280 -0.092 2.926 | 100

Furthermore, we considered three different levels of censoring: the first case has no censoring; the second case uses an LOD of 2.34, where all values lower than or equal to the LOD were substituted by 2.34 (yielding an average of 10.04% censoring); and the third scenario uses an LOD of 3.31 (yielding an average of 30% censoring). For each scenario, we fitted two models: the first considers normal innovations, denoted by CAR(2), whose parameters were estimated with the function ARCensReg from the R package ARCensReg (schumacher2016package); the second is our proposed CAR$t$(2) model, fitted through the function ARtCensReg from the same package.
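For reference, a hedged sketch of how the two fits might be invoked is shown below; the argument names (cc, lcl, ucl, y, x, p) follow our reading of the package documentation and should be checked against the installed version:

```r
# Sketch: fitting the two censored AR(2) models with the ARCensReg package.
# cc is the censoring indicator (1 = censored/missing, 0 = observed) and
# lcl/ucl are the lower/upper censoring limits; argument names are assumptions.
library(ARCensReg)

fit_normal <- ARCensReg(cc = cc, lcl = lcl, ucl = ucl,
                        y = y, x = X, p = 2)   # normal innovations
fit_t      <- ARtCensReg(cc = cc, lcl = lcl, ucl = ucl,
                         y = y, x = X, p = 2)  # Student-t innovations
```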

Table 2 displays the mean of the estimates for each parameter by level of perturbation and censoring rate. To obtain comparable values of $\sigma^2$, for the model with Student-$t$ innovations we report the estimated variance of the innovations, $\widehat{\sigma}^{2*}=\widehat{\nu}\widehat{\sigma}^2/(\widehat{\nu}-2)$, where $\widehat{\sigma}^2$ is the estimate of the scale parameter under our proposal. Moreover, the percentage of times that our model detected the perturbed observation as influential, denoted by DI (%), is also reported; it was computed as the number of times the estimated weight ($\widehat{u}_i$, computed during the E-step of the SAEM algorithm) of the perturbed observation was the lowest one, divided by the number of MC samples.

Table 2a and Figure 5a show the results for the non-censored dataset. For the normal distribution, the bias increases as the perturbation increases, as is natural; however, when Student-tt innovations are considered, the bias is much smaller, illustrating the robustness of the model against atypical observations. As expected, estimates for ν\nu decrease as the perturbation grows. For the non-perturbed samples, the observation with the maximum value was detected as influential in only 17% of the samples, but this percentage increases rapidly with the perturbation. Observe that for ϑ=4,5\vartheta=4,5, and 77, the perturbed observation was detected as influential in all MC samples.

Table 2b and Figure 5b display the results for samples with an average of 10.04% of censoring. The estimates for β0\beta_{0} have a distribution similar to the non-censored case. On the other hand, for β1\beta_{1}, a more significant difference between the true value and its estimate was observed under the normal model. In contrast, the model with heavy-tailed innovations performs better in recovering the true parameter values, with a mean value of ν\nu smaller than that observed previously.

Finally, results for the scenario with an average of 30% of left censoring are shown in Table 2c and Figure 5c. For the normal case, the bias for β1\beta_{1} is more pronounced than in the previous two cases, while for our model its mean and median are generally close to the true parameter value at all perturbation levels. For β0\beta_{0}, the normal model returned estimates close to the true value only for levels of perturbation lower than 4 (ϑ<4\vartheta<4), while for larger perturbations the model tends to underestimate it. These results confirm that the heavy tails of the Student-tt distribution give our model the ability to mitigate the effect of outliers, yielding a method that is much more robust against atypical values.

5 Application

In this section, the CARt(p)t(p) model is applied to a real environmental dataset containing both missing and censored observations. We consider the phosphorus concentration dataset previously analyzed by schumacher2017censored and wang2018quasi, available in the package ARCensReg. We are interested in tracking phosphorus concentration levels over time as an indicator of river water quality since, for instance, excessive phosphorus in surface water may result in eutrophication. The data consist of n=181n=181 observations of phosphorus concentration (PP) in mg/L, measured monthly from October 1998 to October 2013 in the West Fork Cedar River at Finchford, Iowa, USA. The phosphorus concentration measurement was subject to a LOD of 0.02, 0.05, or 0.10, depending on the time; therefore, 28 (15.47%) observations are left-censored. Moreover, there are 7 (3.87%) missing observations from September 2008 to March 2009 due to a program suspension caused by a temporary lack of funding.

As mentioned by wang2018quasi, PP levels are generally correlated with the water discharge (QQ), measured in cubic feet per second; hence, our objective is to explore the relationship between PP and QQ when the response contains censored and missing observations. To evaluate the prediction accuracy, the dataset was split into training and testing sets. The training set consists of 169 observations, of which 20.71% are left-censored or missing, while the testing set contains the remaining 12 observations, all fully observed. Following wang2018quasi, to make the PP-QQ relationship more linear, we considered the logarithmic transformation of PP and QQ and fitted the following CARt(p)t(p) model:

log(Pt)=j=14[β0jSjt+β1jSjtlog(Qt)]+𝝃t,\log(P_{t})=\sum_{j=1}^{4}\left[\beta_{0j}S_{jt}+\beta_{1j}S_{jt}\log(Q_{t})\right]+\boldsymbol{\xi}_{t}, (17)

t=p+1,,nt=p+1,\ldots,n, where the regression error 𝝃t\boldsymbol{\xi}_{t} follows an autoregressive model and SjS_{j} is a dummy seasonal variable for quarters j=1,2,3j=1,2,3, and 44. The first quarter corresponds to January through March, the second to April through June, and so on. In this model, β0j\beta_{0j} and β1j\beta_{1j} are, respectively, the intercept and slope for quarter jj.
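A minimal R sketch of the data preparation follows. We assume the dataset shipped with ARCensReg is named phosphorus and carries the concentration P, the discharge Q, and a censoring indicator; the actual variable names should be checked with str(phosphorus) before use.

## Hedged sketch: load the phosphorus data and reproduce the train-test split.
library(ARCensReg)
data(phosphorus)
str(phosphorus)                    # verify the variable names before use

train <- phosphorus[1:169, ]       # 20.71% left-censored or missing
test  <- phosphorus[170:181, ]     # 12 fully observed observations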

The AR order is unknown, but since the second observation of the training set (November 1998) is censored, it is not possible to consider a model of order greater than one because the proposed algorithm assumes that the first pp values are completely observed. Therefore, for this application we only considered a model of order 1. Moreover, schumacher2017censored concluded, based on information criteria and the mean squared prediction error (MSPE), that a censored autoregressive model of order 1 was the best fit for this dataset. The authors also found that the dataset contains some influential observations, so it seems reasonable to consider a model with innovations following a heavy-tailed distribution.
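With the order fixed at p=1, the design matrix of model (17) can be assembled from quarterly dummies. A minimal sketch, continuing the previous block, is given below; the column names are ours.

## Hedged sketch: quarterly dummies S_jt and their interactions with log(Q_t).
months  <- seq(as.Date("1998-10-01"), by = "month", length.out = nrow(phosphorus))
quarter <- factor((as.integer(format(months, "%m")) - 1) %/% 3 + 1)  # 1 = Jan-Mar

S <- model.matrix(~ quarter - 1)             # n x 4 matrix of dummies S_1, ..., S_4
X <- cbind(S, S * log(phosphorus$Q))         # columns [S_j, S_j * log(Q_t)]
colnames(X) <- c(paste0("beta0_", 1:4), paste0("beta1_", 1:4))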

Table 3: Phosphorus concentration data. Parameter estimates and their standard errors (SE) for the CAR(1) and CARtt(1) models.
Parameters CARtt(1) CAR(1)
Estimate SE 95% CI Estimate SE 95% CI
β01\beta_{01} -4.338 0.609 (-5.530 , -3.145) -4.670 0.513 (-5.674 , -3.665)
β02\beta_{02} -2.731 0.750 (-4.201 , -1.262) -3.022 0.751 (-4.494 , -1.549)
β03\beta_{03} -4.141 0.419 (-4.963 , -3.319) -4.174 0.442 (-5.041 , -3.307)
β04\beta_{04} -4.651 0.545 (-5.719 , -3.583) -4.999 0.549 (-6.075 , -3.922)
β11\beta_{11} 0.293 0.107 (0.084 , 0.503) 0.373 0.085 (0.206 , 0.541)
β12\beta_{12} 0.139 0.105 (-0.066 , 0.345) 0.185 0.105 (-0.021 , 0.391)
β13\beta_{13} 0.363 0.070 (0.227 , 0.500) 0.364 0.075 (0.218 , 0.510)
β14\beta_{14} 0.351 0.097 (0.161 , 0.542) 0.410 0.098 (0.218 , 0.601)
ϕ1\phi_{1} -0.087 0.085 -0.077 0.089
σ2\sigma^{2} 0.176 0.046 0.254 0.030
ν\nu 5.002 3.059 - -
MSPE 0.101 0.126
MAPE 0.240 0.255

For comparison purposes, we considered the model in Equation 17 with regression errors following an autoregressive model of order 1 (as defined by Equation 2), with innovations ηt\eta_{t} independent and identically distributed as 𝒩(0,σ2)\mathcal{N}(0,\sigma^{2}) and t(0,σ2,ν)t(0,\sigma^{2},\nu) (CAR(1) and CARtt(1), respectively). Both models were fitted using the R library ARCensReg through the functions ARCensReg and ARtCensReg, respectively. Parameter estimates and their corresponding standard errors (SEs) are displayed in Table 3, along with the MSPE and the mean absolute prediction error (MAPE), both computed from one-step-ahead predictions for the testing dataset. These criteria indicate that the heavy-tailed Student-tt model (ν^=5.002\widehat{\nu}=5.002) provides better predictions than the normal model for the phosphorus concentration data.
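Both criteria are simple functions of the one-step-ahead forecasts. In the sketch below, yhat is a hypothetical vector of forecasts of log(P_t) for the 12 test observations; the actual prediction interface of the package may differ across versions.

## Prediction criteria for the testing period.
mspe <- function(y, yhat) mean((y - yhat)^2)    # mean squared prediction error
mape <- function(y, yhat) mean(abs(y - yhat))   # mean absolute prediction error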

One can also note that the estimates for β0j,j=1,2,3,4\beta_{0j},j=1,2,3,4, under the CARtt(1) model are all negative but larger (less negative) than those obtained from the CAR(1) model. On the other hand, the estimates for the slopes β1j\beta_{1j} are all positive and, except for the second quarter, significantly different from zero, evidencing the correlation between water discharge and phosphorus concentration. Regarding the autoregressive coefficient ϕ1\phi_{1}, both models returned similar estimates.

Figure 6 presents a Q-Q plot (left), a residuals vs. time scatterplot (middle), and a histogram (right) for the residual analysis of both models. For the CARtt(1) model (top), the points in the Q-Q plot form a roughly straight line and lie within the confidence bands. Further, the histogram resembles that of a normally distributed variable, and the scatterplot is consistent with independent residuals. On the other hand, the Q-Q plot for the CAR(1) model (bottom) presents some points outside the confidence bands in the upper tail, indicating that the distribution is skewed or heavy-tailed. Additionally, the scatterplot and histogram show larger residuals than those from the tt model. Therefore, the CARtt(1) model seems to provide a better fit to the phosphorus concentration data than the CAR(1) model. Plots of the sample autocorrelation of the quantile residuals are displayed in Appendix B, Figure 9.
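The three panels of Figure 6 can be reproduced with base R graphics once the quantile residuals are available. In the sketch below, res is a placeholder vector standing in for the residuals extracted from a fitted model.

## Hedged sketch of the three diagnostic panels for one model.
res <- rnorm(169)                        # placeholder: use the model's quantile residuals
op  <- par(mfrow = c(1, 3))
qqnorm(res); qqline(res)                 # Q-Q plot
plot(res, ylab = "residual")             # residuals vs. time
hist(res, main = "", xlab = "residual")  # histogram
par(op)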

Figure 6: Phosphorus concentration data. Plots of residuals for the CAR(1) and CARtt(1) models.


Finally, Figure 7 shows the imputed values for the censored and missing observations from October 1998 to October 2012 (training dataset) and the predicted values from November 2012 to October 2013 (yellow box), considering normal (light blue line) and Student-tt (pink line) innovations, with observed values represented by a black solid line and vertical black dashed lines indicating the period with missing values. We see only slight differences between the predicted values obtained under the two fitted models. Moreover, the imputed values for the missing period capture the seasonal behavior of the time series well and are also similar for both models. In addition, convergence plots for the SAEM parameter estimates under our proposal are displayed in Figure 10 in Section B of the Appendix.

Figure 7: Phosphorus concentration data. Observed (black solid line) and predicted values considering Student-tt (pink line) and normal (light blue line) innovations. Black dashed lines represent the period with missing observations.

6 Conclusions

Extending autoregressive regression methods to include censored response variables is a promising area of research. This paper introduces a novel model that can handle left-, right-, or interval-censored time series while simultaneously modeling heavy tails and missing observations, the latter treated as interval-censored observations.

Our approach extends previous works, such as schumacher2017censored and liu2019parameter, and handles missing data while respecting the nature of the data, without the need for transformations. The proposed methods have been implemented in the R package ARCensReg, which is available on the CRAN repository.

It is important to remark that we assumed the dropout/censoring mechanism to be missing at random (MAR) (see diggle2002analysis, p. 283). In cases where MAR with ignorability is not realistic, the relationship between the unobserved measurements and the censoring process should be further investigated. Future work will aim to relax the assumption that the first pp observations are fully observed when fitting a CARt(p)t(p) model. Furthermore, a natural and interesting path for future research is to extend this model to a multivariate framework.

Acknowledgments

This study was financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. Larissa A. Matos acknowledges support from FAPESP-Brazil (Grant 2020/16713-0).

Appendix A Conditional distributions and derivatives

A.1 Full conditional distributions

The full conditional distributions needed to perform the E-step of the SAEM algorithm are f(uyo,ym,𝜽)f(\textbf{u}\mid\textbf{y}_{o},\textbf{y}_{m},\boldsymbol{\theta}) and f(ymu,yo,𝜽)f(\textbf{y}_{m}\mid\textbf{u},\textbf{y}_{o},\boldsymbol{\theta}). The conditional probability density function (pdf) of u=(u1,,un)\textbf{u}=(u_{1},\ldots,u_{n})^{\top} is given by

f(\textbf{u}\mid\textbf{y}_{o},\textbf{y}_{m},\boldsymbol{\theta}) = \frac{f(\textbf{u},\textbf{y}_{o},\textbf{y}_{m}\mid\boldsymbol{\theta})}{f(\textbf{y}_{o},\textbf{y}_{m}\mid\boldsymbol{\theta})} \propto f(\textbf{u},\textbf{y}_{o},\textbf{y}_{m}\mid\boldsymbol{\theta})
\propto \prod_{i=p+1}^{n} u_{i}^{(\nu-1)/2}\exp\left\{-\frac{u_{i}}{2\sigma^{2}}\left(y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta}-\textbf{y}_{(i,p)}^{\top}\boldsymbol{\phi}+\boldsymbol{\beta}^{\top}\textbf{X}_{(i,p)}^{\top}\boldsymbol{\phi}\right)^{2}-\frac{\nu}{2}u_{i}\right\}
= \prod_{i=p+1}^{n} u_{i}^{(\nu-1)/2}\exp\left\{-\frac{u_{i}}{2}\left(\nu+\left(y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta}-\textbf{y}_{(i,p)}^{\top}\boldsymbol{\phi}+\boldsymbol{\beta}^{\top}\textbf{X}_{(i,p)}^{\top}\boldsymbol{\phi}\right)^{2}\big/\sigma^{2}\right)\right\}.

From the last expression, we deduce that UiU_{i} is independent of UjU_{j} for all iji\neq j, where Uiyo,ym,𝜽ind𝒢(ai,bi)U_{i}\mid\textbf{y}_{o},\textbf{y}_{m},\boldsymbol{\theta}\stackrel{{\scriptstyle ind}}{{\sim}}\mathcal{G}\left(a_{i},b_{i}\right), with ai=ν+12a_{i}=\displaystyle\frac{\nu+1}{2}, bi=12(ν+ϱi2σ2)b_{i}=\displaystyle\frac{1}{2}\left(\nu+\frac{\varrho_{i}^{2}}{\sigma^{2}}\right), and ϱi=yixi𝜷y(i,p)ϕ+𝜷X(i,p)ϕ\varrho_{i}=y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta}-\textbf{y}_{(i,p)}^{\top}\boldsymbol{\phi}+\boldsymbol{\beta}^{\top}\textbf{X}_{(i,p)}^{\top}\boldsymbol{\phi}, for i=p+1,,ni=p+1,\ldots,n.
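In code, this full conditional translates directly into the conditional weight used in the E-step. A minimal sketch, assuming the shape-rate parameterization of the Gamma distribution:

## E[U_i | y, theta] = (nu + 1) / (nu + rho_i^2 / sigma2), and a draw from the
## full conditional Gamma(a_i, b_i) with a_i = (nu + 1)/2 and
## b_i = (nu + rho_i^2 / sigma2)/2 taken as a rate parameter.
u_cond_mean <- function(rho, sigma2, nu) (nu + 1) / (nu + rho^2 / sigma2)
u_cond_draw <- function(rho, sigma2, nu) {
  rgamma(length(rho), shape = (nu + 1) / 2, rate = (nu + rho^2 / sigma2) / 2)
}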

To compute the conditional distribution of ym\textbf{y}_{m}, we consider that the first pp observations are completely observed. Using the VAR(1) model representation, as suggested by zhou2020student, we obtain

wt=𝚽wt1+𝜶t,\textbf{w}_{t}=\boldsymbol{\Phi}{\textbf{w}}_{t-1}+\boldsymbol{\alpha}_{t},

where 𝜶t=(ηt,0,,0)\boldsymbol{\alpha}_{t}=(\eta_{t},0,\ldots,0)^{\top} is a vector of dimension pp, 𝚽=[ϕ,A]\boldsymbol{\Phi}=[\boldsymbol{\phi},\textbf{A}^{\top}]^{\top} is a p×pp\times p matrix in which A is a (p1)×p(p-1)\times p matrix with the identity matrix in its first p1p-1 columns and zeros in the last column, wt=(y~t,y~t1,,y~tp+1)\textbf{w}_{t}=(\tilde{y}_{t},\tilde{y}_{t-1},\ldots,\tilde{y}_{t-p+1})^{\top}, and y~t=ytxt𝜷\tilde{y}_{t}=y_{t}-\textbf{x}_{t}^{\top}\boldsymbol{\beta}. Through a recursive process based on the VAR(1) form, we have

t=p+1\displaystyle t=p+1 \displaystyle\Rightarrow wp+1=𝚽wp+𝜶p+1.\displaystyle{\textbf{w}}_{p+1}=\boldsymbol{\Phi}{\textbf{w}}_{p}+\boldsymbol{\alpha}_{p+1}.
t=p+2\displaystyle t=p+2 \displaystyle\Rightarrow wp+2=𝚽wp+1+𝜶p+2\displaystyle{\textbf{w}}_{p+2}=\boldsymbol{\Phi}{\textbf{w}}_{p+1}+\boldsymbol{\alpha}_{p+2}
=𝚽(𝚽wp+𝜶p+1)+𝜶p+2\displaystyle\quad=\boldsymbol{\Phi}(\boldsymbol{\Phi}{\textbf{w}}_{p}+\boldsymbol{\alpha}_{p+1})+\boldsymbol{\alpha}_{p+2}
=𝚽2wp+𝚽𝜶p+1+𝜶p+2.\displaystyle\quad=\boldsymbol{\Phi}^{2}{\textbf{w}}_{p}+\boldsymbol{\Phi}\boldsymbol{\alpha}_{p+1}+\boldsymbol{\alpha}_{p+2}.
t=p+3\displaystyle t=p+3 \displaystyle\Rightarrow wp+3=𝚽wp+2+𝜶p+3\displaystyle{\textbf{w}}_{p+3}=\boldsymbol{\Phi}{\textbf{w}}_{p+2}+\boldsymbol{\alpha}_{p+3}
=𝚽(𝚽2wp+𝚽𝜶p+1+𝜶p+2)+𝜶p+3\displaystyle\quad=\boldsymbol{\Phi}(\boldsymbol{\Phi}^{2}{\textbf{w}}_{p}+\boldsymbol{\Phi}\boldsymbol{\alpha}_{p+1}+\boldsymbol{\alpha}_{p+2})+\boldsymbol{\alpha}_{p+3}
=𝚽3wp+𝚽2𝜶p+1+𝚽𝜶p+2+𝜶p+3.\displaystyle\quad=\boldsymbol{\Phi}^{3}{\textbf{w}}_{p}+\boldsymbol{\Phi}^{2}\boldsymbol{\alpha}_{p+1}+\boldsymbol{\Phi}\boldsymbol{\alpha}_{p+2}+\boldsymbol{\alpha}_{p+3}.
\displaystyle\vdots
t=p+k\displaystyle t=p+k \displaystyle\Rightarrow wp+k=𝚽wp+k1+𝜶p+k\displaystyle{\textbf{w}}_{p+k}=\boldsymbol{\Phi}{\textbf{w}}_{p+k-1}+\boldsymbol{\alpha}_{p+k}
=𝚽kwp+j=0k1𝚽j𝜶p+kj.\displaystyle\quad=\boldsymbol{\Phi}^{k}{\textbf{w}}_{p}+\sum_{j=0}^{k-1}\boldsymbol{\Phi}^{j}\boldsymbol{\alpha}_{p+k-j}.

Note that the model defined in Equations 1 and 2 can be recovered through the first element of the preceding vectors, i.e.,

y_{p+k} = \textbf{x}_{p+k}^{\top}\boldsymbol{\beta}+\left(\boldsymbol{\Phi}^{k}\right)^{\top}_{1.}\left(\textbf{y}_{(p+1,p)}-\textbf{X}_{(p+1,p)}\boldsymbol{\beta}\right)+\sum_{j=0}^{k-1}\left(\boldsymbol{\Phi}^{j}\right)_{11}\eta_{p+k-j}, \quad (18)

where 𝚽k\boldsymbol{\Phi}^{k} denotes the kkth power of 𝚽\boldsymbol{\Phi}, (𝚽k)1.\big{(}\boldsymbol{\Phi}^{k}\big{)}_{1.} is a p×1p\times 1 vector whose elements correspond to the first row of 𝚽k\boldsymbol{\Phi}^{k}, and (𝚽k)11\big{(}\boldsymbol{\Phi}^{k}\big{)}_{11} is the element (1,1)(1,1) of 𝚽k\boldsymbol{\Phi}^{k}. From Equation 18, we deduce that Yp+kY_{p+k} given y(p+1,p)\textbf{y}_{(p+1,p)}, 𝜽\boldsymbol{\theta}, and U=u\textbf{U}=\textbf{u} is normally distributed, for all k=1,,npk=1,\ldots,n-p. Thus, the conditional expectation and the elements of the variance-covariance matrix that characterize this normal distribution are given by

\tilde{\mu}_{(k)} = \mathbb{E}\big[Y_{p+k}\mid\textbf{u},\textbf{y}_{(p+1,p)},\boldsymbol{\theta}\big]
= \mathbb{E}\Big[\textbf{x}_{p+k}^{\top}\boldsymbol{\beta}+\big(\boldsymbol{\Phi}^{k}\big)^{\top}_{1.}\big(\textbf{y}_{(p+1,p)}-\textbf{X}_{(p+1,p)}\boldsymbol{\beta}\big)+\sum_{j=0}^{k-1}\big(\boldsymbol{\Phi}^{j}\big)_{11}\eta_{p+k-j}\;\Big|\;\textbf{u},\textbf{y}_{(p+1,p)},\boldsymbol{\theta}\Big]
= \textbf{x}_{p+k}^{\top}\boldsymbol{\beta}+\big(\boldsymbol{\Phi}^{k}\big)^{\top}_{1.}\big(\textbf{y}_{(p+1,p)}-\textbf{X}_{(p+1,p)}\boldsymbol{\beta}\big), \quad (19)

\tilde{\sigma}_{(kl)} = \mathrm{Cov}\big(Y_{p+k},Y_{p+l}\mid\textbf{u},\textbf{y}_{(p+1,p)},\boldsymbol{\theta}\big)
= \sum_{j=0}^{k-1}\sum_{i=0}^{l-1}\big(\boldsymbol{\Phi}^{j}\big)_{11}\big(\boldsymbol{\Phi}^{i}\big)_{11}\,\mathrm{Cov}\big(\eta_{p+k-j},\eta_{p+l-i}\mid\textbf{u},\textbf{y}_{1:p},\boldsymbol{\theta}\big)
= \sum_{j=0}^{r-1}\frac{\sigma^{2}}{u_{p+r-j}}\big(\boldsymbol{\Phi}^{j}\big)_{11}\big(\boldsymbol{\Phi}^{|k-l|+j}\big)_{11}
= \sum_{j=1}^{r}\frac{\sigma^{2}}{u_{p+j}}\big(\boldsymbol{\Phi}^{k-j}\big)_{11}\big(\boldsymbol{\Phi}^{l-j}\big)_{11}, \quad (20)

where r=min(k,l)r=\min(k,l). Now, suppose that y is partitioned into two vectors: y1:pp\textbf{y}_{1:p}\in\mathbb{R}^{p}, containing the first pp observed values, and ypnp\textbf{y}_{-p}\in\mathbb{R}^{n-p}, containing the remaining observations, i.e., y=(y1:p,yp)\textbf{y}=(\textbf{y}_{1:p}^{\top},\textbf{y}_{-p}^{\top})^{\top}. Let yo\textbf{y}_{o} and ym\textbf{y}_{m} be the observed and the censored/missing parts of yp\textbf{y}_{-p}, respectively; then ypu,y1:p,𝜽yo,ymu,y1:p,𝜽𝒩np(𝝁~,𝚺~)\textbf{y}_{-p}\mid\textbf{u},\textbf{y}_{1:p},\boldsymbol{\theta}\equiv\textbf{y}_{o},\textbf{y}_{m}\mid\textbf{u},\textbf{y}_{1:p},\boldsymbol{\theta}\sim\mathcal{N}_{n-p}(\tilde{\boldsymbol{\mu}},\tilde{\boldsymbol{\Sigma}}), where the iith element of 𝝁~\tilde{\boldsymbol{\mu}} is μ~(i)\tilde{\mu}_{(i)} and the element (i,j)(i,j) of 𝚺~\tilde{\boldsymbol{\Sigma}} is σ~(ij)\tilde{\sigma}_{(ij)}, for all i,j=1,,npi,j=1,\ldots,n-p.
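The scalars (\boldsymbol{\Phi}^{k})_{11} appearing in Equations 18-20 are easy to evaluate numerically from the companion matrix; a minimal sketch:

## Companion matrix Phi of an AR(p) process and the scalar (Phi^k)_{11}.
companion <- function(phi) {
  p <- length(phi)
  if (p == 1) return(matrix(phi, 1, 1))
  rbind(phi, cbind(diag(p - 1), rep(0, p - 1)))
}
phi_pow_11 <- function(phi, k) {
  M <- diag(length(phi))               # Phi^0 = identity, so (Phi^0)_{11} = 1
  P <- companion(phi)
  for (j in seq_len(k)) M <- M %*% P   # M = Phi^k after the loop
  M[1, 1]
}
phi_pow_11(c(0.48, -0.2), 3)           # example: (Phi^3)_{11} for an AR(2)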

To compute the conditional distribution of ymu,yo,y1:p,𝜽\textbf{y}_{m}\mid\textbf{u},\textbf{y}_{o},\textbf{y}_{1:p},\boldsymbol{\theta}, we rearrange the elements of yp,𝝁~\textbf{y}_{-p},\tilde{\boldsymbol{\mu}}, and 𝚺~\tilde{\boldsymbol{\Sigma}} as follows:

yp=(yoym),𝝁~=(𝝁~o𝝁~m),\displaystyle\textbf{y}_{-p}=\left(\begin{array}[]{c}\textbf{y}_{o}\\ \textbf{y}_{m}\end{array}\right),\quad\tilde{\boldsymbol{\mu}}=\left(\begin{array}[]{c}\tilde{\boldsymbol{\mu}}_{o}\\ \tilde{\boldsymbol{\mu}}_{m}\end{array}\right),
and𝚺~=(𝚺~oo𝚺~om𝚺~mo𝚺~mm).\displaystyle\mbox{and}\quad\tilde{\boldsymbol{\Sigma}}=\left(\begin{array}[]{cc}\tilde{\boldsymbol{\Sigma}}_{oo}&\tilde{\boldsymbol{\Sigma}}_{om}\\ \tilde{\boldsymbol{\Sigma}}_{mo}&\tilde{\boldsymbol{\Sigma}}_{mm}\end{array}\right).

Using standard results for the conditional distributions of the multivariate normal, we obtain

ymu,yo,y1:p,𝜽𝒩(𝝁~,𝚺~),\textbf{y}_{m}\mid\textbf{u},\textbf{y}_{o},\textbf{y}_{1:p},\boldsymbol{\theta}\sim\mathcal{N}(\tilde{\boldsymbol{\mu}}^{*},\tilde{\boldsymbol{\Sigma}}^{*}),

with 𝝁~=𝝁~m+𝚺~mo𝚺~oo1(yo𝝁~o)\tilde{\boldsymbol{\mu}}^{*}=\tilde{\boldsymbol{\mu}}_{m}+\tilde{\boldsymbol{\Sigma}}_{mo}\tilde{\boldsymbol{\Sigma}}_{oo}^{-1}(\textbf{y}_{o}-\tilde{\boldsymbol{\mu}}_{o}) and 𝚺~=𝚺~mm𝚺~mo𝚺~oo1𝚺~om\tilde{\boldsymbol{\Sigma}}^{*}=\tilde{\boldsymbol{\Sigma}}_{mm}-\tilde{\boldsymbol{\Sigma}}_{mo}\tilde{\boldsymbol{\Sigma}}_{oo}^{-1}\tilde{\boldsymbol{\Sigma}}_{om}.
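This partitioned conditioning can be coded directly; a minimal sketch:

## Conditional mean and covariance of the censored/missing block given the
## observed block under N(mu, Sigma): the mu* and Sigma* above.
cond_mvnorm <- function(mu, Sigma, obs, y_o) {
  m        <- setdiff(seq_along(mu), obs)                 # indices of y_m
  Soo_inv  <- solve(Sigma[obs, obs, drop = FALSE])
  mu_star  <- mu[m] + Sigma[m, obs, drop = FALSE] %*% Soo_inv %*% (y_o - mu[obs])
  Sig_star <- Sigma[m, m, drop = FALSE] -
    Sigma[m, obs, drop = FALSE] %*% Soo_inv %*% Sigma[obs, m, drop = FALSE]
  list(mean = drop(mu_star), var = Sig_star)
}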

A.2 First and second derivatives of the complete-data log-likelihood function

The complete-data log-likelihood function for the model defined by Equations 1-6 is given by

\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c}) = g(\nu\mid\textbf{u})-\frac{n-p}{2}\log\sigma^{2}-\frac{1}{2\sigma^{2}}\sum_{i=p+1}^{n}u_{i}\left(\tilde{y}_{i}-\textbf{w}_{i}^{\top}\boldsymbol{\phi}\right)^{2}+\text{cte},

where g(νu)=np2(νlog(ν2)2logΓ(ν2))+ν2(i=p+1nloguii=p+1nui)g(\nu\mid\textbf{u})=\frac{n-p}{2}\left(\nu\log\left(\frac{\nu}{2}\right)-2\log\Gamma\left(\frac{\nu}{2}\right)\right)+\frac{\nu}{2}\left(\sum_{i=p+1}^{n}\log u_{i}-\sum_{i=p+1}^{n}u_{i}\right), y~i=yixi𝜷\tilde{y}_{i}=y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta}, and wi=y(i,p)X(i,p)𝜷\textbf{w}_{i}=\textbf{y}_{(i,p)}-\textbf{X}_{(i,p)}\boldsymbol{\beta}.

The first derivative of \ell_{c}(\boldsymbol{\theta};\textbf{y}_{c}) with respect to each element of \boldsymbol{\theta}=(\boldsymbol{\beta}^{\top},\boldsymbol{\phi}^{\top},\sigma^{2},\nu)^{\top} is:

\frac{\partial\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\nu}=\frac{n-p}{2}\left(\log\left(\frac{\nu}{2}\right)+1-\psi\left(\frac{\nu}{2}\right)\right)+\frac{1}{2}\left(\sum_{i=p+1}^{n}\log u_{i}-\sum_{i=p+1}^{n}u_{i}\right),

\frac{\partial\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\sigma^{2}}=-\frac{n-p}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\sum_{i=p+1}^{n}u_{i}\left(\tilde{y}_{i}-\textbf{w}_{i}^{\top}\boldsymbol{\phi}\right)^{2},

\frac{\partial\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\boldsymbol{\phi}}=\frac{1}{\sigma^{2}}\sum_{i=p+1}^{n}\left[u_{i}(y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta})\textbf{w}_{i}-u_{i}\textbf{w}_{i}\textbf{w}_{i}^{\top}\boldsymbol{\phi}\right],

\frac{\partial\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\boldsymbol{\beta}}=\frac{1}{\sigma^{2}}\sum_{i=p+1}^{n}\left[u_{i}(y_{i}-\textbf{y}_{(i,p)}^{\top}\boldsymbol{\phi})\boldsymbol{\alpha}_{i}-u_{i}\boldsymbol{\alpha}_{i}\boldsymbol{\alpha}_{i}^{\top}\boldsymbol{\beta}\right],

with \boldsymbol{\alpha}_{i}=\textbf{x}_{i}-\textbf{X}_{(i,p)}^{\top}\boldsymbol{\phi} and \psi(x)=\frac{\mathrm{d}}{\mathrm{d}x}\log\Gamma(x)=\frac{\Gamma^{\prime}(x)}{\Gamma(x)}.
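These expressions can be verified numerically; the sketch below checks the analytic \nu-score against a central finite difference of g(\nu\mid\textbf{u}).

## Numerical check of the nu-score: analytic derivative of g(nu | u) versus
## a central finite difference.
g_nu  <- function(nu, u, np) np / 2 * (nu * log(nu / 2) - 2 * lgamma(nu / 2)) +
  nu / 2 * (sum(log(u)) - sum(u))
dg_nu <- function(nu, u, np) np / 2 * (log(nu / 2) + 1 - digamma(nu / 2)) +
  0.5 * (sum(log(u)) - sum(u))

set.seed(1); u <- rgamma(50, 2, 2); np <- 50
c(analytic = dg_nu(4, u, np),
  numeric  = (g_nu(4 + 1e-6, u, np) - g_nu(4 - 1e-6, u, np)) / 2e-6)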

In addition, the elements of the Hessian matrix are given by:

\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\nu^{2}}=\frac{n-p}{2}\left(\frac{1}{\nu}-\frac{1}{2}\psi_{1}\left(\frac{\nu}{2}\right)\right),

\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\sigma^{2}\partial\nu}=0, \quad \frac{\partial^{2}\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\boldsymbol{\phi}\partial\nu}=\textbf{0}, \quad \frac{\partial^{2}\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\boldsymbol{\beta}\partial\nu}=\textbf{0},

\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial(\sigma^{2})^{2}}=\frac{n-p}{2\sigma^{4}}-\frac{1}{\sigma^{6}}\sum_{i=p+1}^{n}u_{i}\left(\tilde{y}_{i}-\textbf{w}_{i}^{\top}\boldsymbol{\phi}\right)^{2},

\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\boldsymbol{\phi}\partial\sigma^{2}}=-\frac{1}{\sigma^{4}}\sum_{i=p+1}^{n}\left[u_{i}\left(y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta}\right)\textbf{w}_{i}-u_{i}\textbf{w}_{i}\textbf{w}_{i}^{\top}\boldsymbol{\phi}\right],

\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\boldsymbol{\beta}\partial\sigma^{2}}=-\frac{1}{\sigma^{4}}\sum_{i=p+1}^{n}\left[u_{i}\left(y_{i}-\textbf{y}_{(i,p)}^{\top}\boldsymbol{\phi}\right)\boldsymbol{\alpha}_{i}-u_{i}\boldsymbol{\alpha}_{i}\boldsymbol{\alpha}_{i}^{\top}\boldsymbol{\beta}\right],

\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\boldsymbol{\phi}\partial\boldsymbol{\phi}^{\top}}=-\frac{1}{\sigma^{2}}\sum_{i=p+1}^{n}u_{i}\textbf{w}_{i}\textbf{w}_{i}^{\top},

\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\boldsymbol{\beta}\partial\boldsymbol{\phi}^{\top}}=\frac{1}{\sigma^{2}}\sum_{i=p+1}^{n}\left[u_{i}\left(\textbf{w}_{i}^{\top}\boldsymbol{\phi}+\textbf{x}_{i}^{\top}\boldsymbol{\beta}-y_{i}\right)\textbf{X}_{(i,p)}^{\top}-u_{i}\boldsymbol{\alpha}_{i}\textbf{w}_{i}^{\top}\right],

\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta};\textbf{y}_{c})}{\partial\boldsymbol{\beta}\partial\boldsymbol{\beta}^{\top}}=-\frac{1}{\sigma^{2}}\sum_{i=p+1}^{n}u_{i}\boldsymbol{\alpha}_{i}\boldsymbol{\alpha}_{i}^{\top},

with \psi_{1}(x)=\frac{\mathrm{d}^{2}}{\mathrm{d}x^{2}}\log\Gamma(x).

Appendix B Additional results

This section displays some additional results obtained from the simulation study and the analysis of the phosphorus concentration dataset.

B.1 Simulation study 1

Figure 8 shows boxplots of the estimates of ν\nu considering different sample sizes and limits of detection (LOD). The median of the estimates is close to the true value (ν=4\nu=4) regardless of the sample size and LOD. Furthermore, the interquartile ranges decrease as the sample size increases, suggesting the consistency of the estimates.

Figure 8: Simulation Study 1. Boxplot of the estimates for ν\nu in the CARt(2)t(2) model considering different sample sizes and LOD.

B.2 Application

For the phosphorus concentration dataset, available in the package ARCensReg, two censored autoregressive models were fitted: one with Student-tt innovations (CARtt(1)) and the other with normal innovations (CAR(1)). Figure 9 displays the sample autocorrelation of the quantile residuals computed from each model, where we note that the residuals present negligible serial autocorrelation.

Figure 9: Phosphorus concentration data. Sample autocorrelation function for the quantile residuals obtained from the CARtt(1) and CAR(1) models.

Finally, Figure 10 shows the convergence plots of the SAEM parameter estimates for the CARtt(1) model at each iteration; this model was selected as the best for fitting this dataset based on prediction accuracy and residual analysis.

Figure 10: Phosphorus concentration data. Convergence of the SAEM parameter estimates for the CARtt(1) model.