\jyear

2021

[1]\fnm Larissa A. \surMatos

[1]\orgdivDepartamento de Estatística, \orgnameUniversidade Estadual de Campinas, \orgaddress\cityCampinas, \postcode13083-859, \stateSão Paulo, \countryBrazil

2]\orgdivDepartamento de Matemáticas, \orgnameEscuela Superior Politécnica del Litoral, \orgaddress\cityGuayaquil, \postcode090112, \countryEcuador

Censored autoregressive regression models with Student- $t$ innovations

\fnmKatherine A. L. \surValeriano [email protected] \fnmFernanda L. \surSchumacher [email protected] \fnmChristian E. \surGalarza [email protected] [email protected] * [

Abstract

The Student- $t$ distribution is widely used in statistical modeling of datasets involving outliers since its longer-than-normal tails provide a robust approach to hand such data. Furthermore, data collected over time may contain censored or missing observations, making it impossible to use standard statistical procedures. This paper proposes an algorithm to estimate the parameters of a censored linear regression model when the regression errors are autocorrelated and the innovations follow a Student- $t$ distribution. To fit the proposed model, maximum likelihood estimates are obtained throughout the SAEM algorithm, which is a stochastic approximation of the EM algorithm useful for models in which the E-step does not have an analytic form. The methods are illustrated by the analysis of a real dataset that has left-censored and missing observations. We also conducted two simulations studies to examine the asymptotic properties of the estimates and the robustness of the model.

keywords:

Autoregressive models; Censored data; Heavy-tailed distributions; Missing data; SAEM algorithm

1 Introduction

A linear regression model is a commonly used statistical tool to analyze the relation between a response (dependent) and some explanatory (independent) variables. In these models, the errors are usually considered as independent and identically distributed random variables with zero-mean and constant-variance. However, observations collected over time are often autocorrelated rather than independent. Examples of this kind of data abound in economics, medicine, environmental monitoring, and social sciences and therefore the study of the dependence among observations is of considerable practical interest (see tsay2005analysis; box2015time).

A stochastic model that has been successfully used in many real-world applications to deal with serial correlation in the error term is the autoregressive (AR) model. In the AR model, the current state of the process is expressed as a finite linear combination of previous states and a stochastic shock of disturbance, which is also known as innovation in the time series literature. In general, it is assumed that the disturbance is normally distributed. For example, beach1978maximum suggested a maximum likelihood (ML) estimation method to estimate the parameters of AR(1) error term regression model, ullah1983estimation used the two-stage Prais-Winsten estimator for a linear regression model with first-order autocorrelated disturbances, where a Gaussian assumption is considered for the innovations to derive a large-sample asymptotic approximation for the variance-covariance matrix, and more recently alpuim2008efficiency proposed a linear regression model with the sequence of error terms following a Gaussian autoregressive stationary process, in which the parameters of the model are estimated using ML and ordinary least squares.

In practice, however, the normality assumption may be unrealistic, especially in the presence of outliers. To relax this assumption, tiku1999estimating suggested considering non-normal symmetric innovations in a simple regression model with AR(1) error term. tuacc2018robust proposed a model with AR(p) error term considering the $t$ distribution with fixed degrees of freedom as an alternative to the normal distribution and applying the conditional maximum likelihood method to find the estimators of the model parameters, and nduka2018based developed an EM algorithm to estimate the parameters in autoregressive models of order $p$ with Student- $t$ innovations.

An additional challenge arises when some observations are censored or missing. In the first case, values can occur out of the range of a measuring instrument, and in the second, the value is unknown. Censored time series data are frequently encountered in environmental monitoring, medicine, and economics, and there are some proposals in the literature related to censored autoregressive linear regression models with normal innovations. For example, zeger1986regression proposed a Gaussian censored autocorrelated model with parameters estimated by maximizing the full likelihood using the quasi-Newton and Expectation-Maximization (EM) algorithms, and the authors pointed out that the quasi-Newton approach is preferable because it estimates the variance-covariance matrix for the parameters.

Additionally, park2007censored developed a censored autoregressive moving average model with Gaussian white noise, in which the algorithm works iteratively imputing the censored or missing observations by sampling from the truncated normal distribution and the parameters are estimated by maximizing an approximate full likelihood function, and wang2018quasi suggested a quasi-likelihood method using the complete-incomplete data framework. Recently, schumacher2017censored considered an autoregressive censored linear model with the parameters estimated via an analytically tractable and efficient stochastic approximation of the EM (SAEM) algorithm, and liu2019parameter proposed a coupled Monte Carlo Markov Chain (MCMC)-SAEM algorithm to fit an AR $(p)$ regression model with Student- $t$ innovations accounting for missing data.

Nevertheless, to the best of our knowledge there are no studies that consider the Student- $t$ distribution for the innovations in censored autoregressive models from a likelihood-based perspective. Thence, in this work we propose an EM-type algorithm to estimate the parameters of a censored regression model with autoregressive errors and innovations following a Student- $t$ distribution. Specifically, the SAEM algorithm is considered to avoid the direct computation of complex expressions on the E-step of the EM algorithm (dempster1977maximum) and since its computational effort is much smaller in comparison to the Monte Carlo EM algorithm (wei1990monte), as shown by jank2006implementing.

The paper is organized as follows. Section 2 introduces the autoregressive regression models of order $p$ and the SAEM algorithm. Section 3 is devoted to formulate the censored autoregressive model with Student- $t$ innovations, providing the expressions used to estimate the parameters of the proposed model, the method used to estimate the observed information matrix, and the expression to make predictions. Section 4 displays some results of two simulation studies carried out to examine the asymptotic properties of the estimators, and to demonstrate the robustness of the model. Section 5 applies the proposed model to the total phosphorus concentration dataset available in the R package ARCensReg (schumacher2016package; rteam), along with an analysis of quantile residuals. Finally, Section 6 concludes with a discussion.

2 Preliminaries

We begin by defining some notations and introducing the basic definitions and algorithms used throughout this work. The normal distribution with mean $\mu$ and variance $\sigma^{2}$ is denoted by $\mathcal{N}(\mu,\sigma^{2})$ , the notation for a $p$ -variate normal distribution with mean $\boldsymbol{\mu}$ and variance-covariance matrix $\boldsymbol{\Sigma}$ is $\mathcal{N}_{p}(\boldsymbol{\mu},\boldsymbol{\Sigma})$ , and a truncated multivariate normal distribution with truncation region $\mathbb{A}$ is denoted by $\mbox{TN}_{p}(\boldsymbol{\mu},\boldsymbol{\Sigma}\mathchar 24635\relax\;\mathbb{A})$ . The notation for a random variable with Gamma distribution is given by $\mathcal{G}(\alpha,\beta)$ , where $\alpha>0$ is the shape parameter and $\beta>0$ is the rate parameter. The Student- $t$ distribution with location parameter $\mu$ , scale parameter $\sigma^{2}>0$ , and $\nu>0$ degrees of freedom is denoted by $t(\mu,\sigma^{2},\nu)$ . A random variable $X$ with uniform distribution on the interval $(a,b)$ is denoted by $X\sim\mathcal{U}(a,b)$ . Furthermore, the expression “ $iid$ ” means independent and identically distributed, ${\bf A}^{\top}$ represents the transpose of ${\bf A}$ , $\Gamma(a)=\int_{0}^{\infty}x^{a-1}e^{-x}\mathrm{d}x$ is the gamma function evaluated at $a>0$ , and $\mathcal{F}_{t}=\sigma(Y_{1},Y_{2},\ldots,Y_{t})$ is the $\sigma$ -field generated by $\{Y_{1},Y_{2},\ldots,Y_{t}\}$ . Finally, for integers $2\leq t\leq n$ and $1\leq p<t$ , we use the index $(t,p)$ as follows: for a vector $\textbf{y}=(y_{1},y_{2},\ldots,y_{n})^{\top}\in\mathbb{R}^{n}$ , then $\textbf{y}_{(t,p)}=(y_{t-1},y_{t-2},\ldots,y_{t-p})^{\top}\in\mathbb{R}^{p}$ ; and for a matrix $\textbf{X}\in\mathbb{R}^{n\times q}$ with rows $\textbf{x}_{i}\in\mathbb{R}^{q}$ , $1\leq i\leq n$ , then $\textbf{X}_{(t,p)}=[\textbf{x}_{t-1},\textbf{x}_{t-2},\ldots,\textbf{x}_{t-p}]^{\top}\in\mathbb{R}^{p\times q}$ .

2.1 The autoregressive regression model of order $p$

First, consider a linear regression model with errors that are autocorrelated as a discrete-time autoregressive process of order $p$ ; thus, the model for an observation at time $t$ is given by

$\displaystyle Y_{t}$	$\displaystyle=$	$\displaystyle\textbf{x}_{t}^{\top}\boldsymbol{\beta}+\boldsymbol{\xi}_{t},$	(1)
$\displaystyle\boldsymbol{\xi}_{t}$	$\displaystyle=$	$\displaystyle\phi_{1}\boldsymbol{\xi}_{t-1}+\ldots+\phi_{p}\boldsymbol{\xi}_{t-p}+\eta_{t},$
$\displaystyle\eta_{t}$	$\displaystyle\stackrel{{\scriptstyle iid}}{{\sim}}$	$\displaystyle F(\cdot),\quad t=1,\ldots,n,$	(2)

where $Y_{t}$ represents the dependent variable observed at time $t$ , $\textbf{x}_{t}=(x_{t1},\ldots,x_{tq})^{\top}$ is a $q\times 1$ vector of known covariables, $\boldsymbol{\beta}$ is a $q\times 1$ vector of unknown regression parameters to be estimated, and $\boldsymbol{\xi}_{t}$ is the regression error and is assumed to follow an autoregressive model, with $\boldsymbol{\phi}=(\phi_{1},\ldots,\phi_{p})^{\top}$ begin a $p\times 1$ vector of autoregressive coefficients and $\eta_{t}$ being a shock of disturbance with distribution $F(\cdot)$ . The term denoted by $\eta_{t}$ is also known as the innovation in the time series literature (see, for instance, box2015time; schumacher2017censored; wang2018quasi).

Now, suppose that $\eta_{t}$ in Equation 2 follows a Student- $t$ distribution with location parameter 0, scale parameter $\sigma^{2}>0$ , and $\nu>0$ degrees of freedom, whose probability density function (pdf) can be written as

f(\eta\mathchar 24635\relax\;\sigma^{2},\nu)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)(\pi\nu\sigma^{2})^{\frac{1}{2}}}\left(1+\frac{\eta^{2}}{\nu\sigma^{2}}\right)^{-\frac{\nu+1}{2}},

(3)

$\eta\in\mathbb{R}.$ Then, the model defined by Equations 1-3 will be called the autoregressive regression $t$ model of order $p$ (AR $t(p)$ ) hereinafter.

Particularly, the distribution of $\eta_{t}\sim t(0,\sigma^{2},\nu)$ might be written using a scale mixture of normal (SMN) distribution as described in kotz2004multivariate, where $\eta_{t}$ is equal in distribution to $Z_{t}/\sqrt{U_{t}}$ , with $Z_{t}\stackrel{{\scriptstyle iid}}{{\sim}}\mathcal{N}(0,\sigma^{2})$ , $U_{t}\stackrel{{\scriptstyle iid}}{{\sim}}\mathcal{G}(\nu/2,\nu/2)$ , and $Z_{t}$ being independent of $U_{t}$ . Consequently, the conditional distribution of $\eta_{t}$ given $U_{t}=u_{t}$ is normal with zero mean and variance $\sigma^{2}/u_{t}$ , i.e.,

\eta_{t}\mid\,(U_{t}=u_{t})\stackrel{{\scriptstyle ind}}{{\sim}}\mathcal{N}(0,\sigma^{2}/u_{t})\ \mbox{and}\ U_{t}\stackrel{{\scriptstyle iid}}{{\sim}}\mathcal{G}(\nu/2,\nu/2),

$t=1,\ldots,n.$ This representation facilitates the implementation of an EM-type algorithm, which is discussed next.

2.2 The SAEM algorithm

The EM algorithm was first introduced by dempster1977maximum and provides a general approach to the iterative computation of ML estimates in problems with incomplete data. Let $\textbf{y}_{c}=(\textbf{y}_{o}^{\top},\textbf{y}_{m}^{\top})^{\top}$ be the complete data, where $\textbf{y}_{o}$ is the observed data and $\textbf{y}_{m}$ is the incomplete (or missing) data. Let $\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})$ be the complete log-likelihood function. The EM algorithm proceeds as follows:

•

E-step: Let $\widehat{\boldsymbol{\theta}}^{(k)}$ be the estimate of $\boldsymbol{\theta}$ at the $k$ th iteration. Compute the conditional expectation of the complete log-likelihood function

Q_{k}(\boldsymbol{\theta})=\mathbb{E}\left[\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})\mid\textbf{y}_{o},\widehat{\boldsymbol{\theta}}^{(k)}\right].

•

M-step: Maximize $Q_{k}(\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$ to obtain $\widehat{\boldsymbol{\theta}}^{(k+1)}$ .

There are situations where the E-step has no analytic form. To deal with these cases, wei1990monte proposed to replace the expectation in the computation of $Q_{k}(\boldsymbol{\theta})$ with a Monte Carlo (MC) approximation based on a large number of independent simulations of the missing data, which was called the Monte Carlo EM (MCEM) algorithm. As an alternative to the computationally intensive MCEM algorithm, delyon1999convergence proposed a scheme that splits the E-step into a simulation step and an integration step (using a stochastic approximation), while the maximization step remains unchanged. This algorithm was called the SAEM algorithm and performs as follows:

•

E-step:

1.

Simulation: Draw $M$ samples of the missing data $\textbf{y}_{m}^{(l,k)},l=1,\ldots,M$ from the conditional distribution $f(\textbf{y}_{m}\mathchar 24635\relax\;\widehat{\boldsymbol{\theta}}^{(k)},\textbf{y}_{o})$ .

Stochastic approximation: Update ${Q}_{k}(\boldsymbol{\theta})$ by

	$\displaystyle\widehat{Q}_{k}(\boldsymbol{\theta})=\widehat{Q}_{k-1}(\boldsymbol{\theta})$
	$\displaystyle\hskip 14.22636pt+\ \delta_{k}\left(\frac{1}{M}\sum_{l=1}^{M}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{m}^{(k,l)},\textbf{y}_{o})-\widehat{Q}_{k-1}(\boldsymbol{\theta})\right),$

where $\delta_{k}$ is a decreasing sequence of positive numbers such that $\sum_{k=1}^{\infty}\delta_{k}=\infty$ and $\sum_{k=1}^{\infty}\delta_{k}^{2}<\infty$ , also known as the smoothness parameter (kuhn2005maximum).

•

M-step: Update $\widehat{\boldsymbol{\theta}}^{(k)}$ as

$\widehat{\boldsymbol{\theta}}^{(k+1)}=\underset{\boldsymbol{\theta}}{\operatorname{argmax}}\;\widehat{Q}_{k}(\boldsymbol{\theta}).$

This process is iterated until some distance between two successive evaluations of the actual log-likelihood function becomes small enough.

3 Proposed model and parameter estimation

This section is devoted the formulating and the ML estimation of the censored autoregressive linear model. Here, we specify the log-likelihood function, the algorithm used to overcome the parameter estimation problem, expressions to approximate the standard errors, and expressions to predict at unobserved times when the values of some covariables (if needed) at these times are available.

3.1 The censored ART( $p$ ) model

Assume that the response variable $Y_{t}$ given in Equation 1 is not fully observed for all times. Instead, we observe $(V_{t},C_{t})$ , where $V_{t}$ represents either an observed value or the limit of detection (LOD) of a censored variable, and $C_{t}$ is the censoring indicator defined as

\displaystyle C_{t}=\left\{\begin{array}[]{lll}1&\mbox{if }\quad V_{t1}\leq Y_{t}\leq V_{t2},&(\mbox{censored})\\ 0&\mbox{if }\quad V_{t}=Y_{t}.&(\mbox{observed})\end{array}\right.

(6)

Thereby, the model defined by Equations 1-6 will be called censored autoregressive regression $t$ model of order $p$ (CAR $t(p)$ ). We say that $Y_{t}$ is left-censored if $C_{t}=1$ and $V_{t}=(-\infty,V_{t2}]$ , right-censored if $C_{t}=1$ and $V_{t}=[V_{t1},+\infty)$ , interval censored if $C_{t}=1$ and $V_{t}=[V_{t1},V_{t2}]$ , and it represents a missing value if $C_{t}=1$ and $V_{t}=(-\infty,+\infty)$ .

To compute the log-likelihood function of the CAR $t(p)$ model, we condition the marginal distribution on the first $p$ observations, which are considered fully observed, and we denote by $\textbf{y}_{o}\in\mathbb{R}^{n_{o}}$ the vector of observed data and by $\textbf{y}_{m}\in\mathbb{R}^{n_{m}}$ the vector of censored or missing observations, with $n_{o}+n_{m}=n-p$ . Note also that the distribution of $Y_{t}$ conditional to all the preceding data $\mathcal{F}_{t-1}$ only depends on the previous $p$ observations. Then, for $\boldsymbol{\theta}=(\boldsymbol{\beta}^{\top},\boldsymbol{\phi}^{\top},\sigma^{2},\nu)^{\top}$ , $Y_{t}$ given $\textbf{y}_{(t,p)}$ follows a Student- $t$ distribution with location parameter $\mu_{t}=\textbf{x}_{t}^{\top}\boldsymbol{\beta}+(\textbf{y}_{(t,p)}-\textbf{X}_{(t,p)}\boldsymbol{\beta})^{\top}\boldsymbol{\phi}$ , scale parameter $\sigma^{2}$ , and $\nu$ degrees of freedom, where $\textbf{X}_{(t,p)}$ is a $p\times q$ matrix with the covariates related to $\textbf{y}_{(t,p)}$ and $\textbf{x}_{t}$ is a vector with the covariates related to $Y_{t}$ . In other words, we have

	$\displaystyle f(y_{t}\mid\boldsymbol{\theta},\mathcal{F}_{t-1})$	$\displaystyle=$	$\displaystyle f(y_{t}\mid\boldsymbol{\theta},\textbf{y}_{(t,p)})$
		$\displaystyle=$	$\displaystyle f(y_{t}\mid\boldsymbol{\theta},y_{t-1},y_{t-2},\ldots,y_{t-p}),$

$t=p+1,\ldots,n.$ Thus, the observed (conditional) log-likelihood function can be computed by

	$\displaystyle\ell(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{o})=$
	$\displaystyle\log\left(\int_{\mathbb{R}^{m}}\prod_{t=p+1}^{n}f(y_{t}\mid\boldsymbol{\theta},y_{t-1},\ldots,y_{t-p})\mathrm{d}\textbf{y}_{m}\right).$		(7)

3.2 Parameter estimation via the SAEM algorithm

The observed log-likelihood function in Equation 3.1 involves complex expressions, making work directly with $\ell(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{o})$ very difficult. Hence, to obtain the ML estimates of $\boldsymbol{\theta}$ , we now apply an EM-type algorithm considering the hierarchical representation presented in Subsection 2.1.

Let $\textbf{y}=(y_{p+1},\ldots,y_{n})^{\top}$ and $\textbf{u}=(u_{p+1},\ldots,u_{n})^{\top}$ be hypothetical missing data, and consider that we observe $(\textbf{V},\textbf{C})$ , where $\textbf{V}=(V_{p+1},\ldots,V_{n})^{\top}$ and $\textbf{C}=(C_{p+1},\ldots,C_{n})^{\top}$ . Then, the complete dataset is given by $\textbf{y}_{c}=\left(\textbf{y}^{\top},\textbf{u}^{\top},\textbf{V}^{\top},\textbf{C}^{\top}\right)^{\top}$ , and the complete log-likelihood function $\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})$ can be written as

	$\displaystyle\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})$	$\displaystyle=$	$\displaystyle\textrm{cte}+g(\nu\mid\textbf{u})-\frac{n-p}{2}\log\sigma^{2}$
		$\displaystyle-$	$\displaystyle\frac{1}{2\sigma^{2}}\sum_{i=p+1}^{n}u_{i}\left(\tilde{y}_{i}-\textbf{w}_{i}^{\top}\boldsymbol{\phi}\right)^{2},$

with $g(\nu\mid\textbf{u})=\frac{n-p}{2}\left[\nu\log\left(\frac{\nu}{2}\right)-2\log\Gamma\left(\frac{\nu}{2}\right)\right]+\frac{\nu}{2}\left(\sum_{i=p+1}^{n}\log u_{i}-\sum_{i=p+1}^{n}u_{i}\right)$ , $\tilde{y}_{t}=y_{t}-\textbf{x}_{t}^{\top}\boldsymbol{\beta}$ , $\textbf{w}_{t}=\textbf{y}_{(t,p)}-\textbf{X}_{(t,p)}\boldsymbol{\beta}$ , and cte being a constant.

Let $\widehat{\boldsymbol{\theta}}^{(k)}$ denote the current estimate of $\boldsymbol{\theta}$ . Then, the conditional expectation of the complete log-likelihood function given the observed data and ignoring constant terms is given by $Q_{k}(\boldsymbol{\theta})=\mathbb{E}\left[\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}\right]$ , with

	$\displaystyle Q_{k}(\boldsymbol{\theta})$	$\displaystyle=$	$\displaystyle\widehat{g(\nu\mid\textbf{u})}^{(k)}-\frac{n-p}{2}\log\widehat{\sigma}^{2(k)}-\frac{1}{2\widehat{\sigma}^{2(k)}}\left(\widehat{uy^{2}_{*}}^{(k)}\right.$
			$\displaystyle\left.\hskip 22.76228pt-2\widehat{\boldsymbol{\phi}}^{(k)\top}\widehat{uy\textbf{y}}_{}^{(k)}+\widehat{\boldsymbol{\phi}}^{(k)\top}\widehat{u\textbf{y}^{2}}_{}^{(k)}\widehat{\boldsymbol{\phi}}^{(k)}\right),\hskip 28.45274pt$

where

	$\displaystyle\widehat{g(\nu\mid\textbf{u})}^{(k)}=\frac{\widehat{\nu}^{(k)}}{2}\left(\widehat{\mbox{log(u)}}^{(k)}-\widehat{u}^{(k)}\right)$
	$\displaystyle\hskip 28.45274pt+\frac{n-p}{2}\left[\widehat{\nu}^{(k)}\log\left(\frac{\widehat{\nu}^{(k)}}{2}\right)-2\log\Gamma\left(\frac{\widehat{\nu}^{(k)}}{2}\right)\right]$

	$\displaystyle\widehat{uy^{2}_{*}}^{(k)}=\widehat{uy^{2}}^{(k)}-\sum_{i=p+1}^{n}\left(2\widehat{uy_{i}}^{(k)}\textbf{x}_{i}^{\top}\widehat{\boldsymbol{\beta}}^{(k)}\right.$
	$\displaystyle\left.\hskip 99.58464pt-\ \widehat{u}_{i}^{(k)}\widehat{\boldsymbol{\beta}}^{(k)\top}\textbf{x}_{i}\textbf{x}_{i}^{\top}\widehat{\boldsymbol{\beta}}^{(k)}\right),$
	$\displaystyle\widehat{uy\textbf{y}_{*}}^{(k)}=\widehat{uy\textbf{y}}^{(k)}-\sum_{i=p+1}^{n}\left(\widehat{uy_{i}}^{(k)}\textbf{X}_{(i,p)}\widehat{\boldsymbol{\beta}}^{(k)}\right.$
	$\displaystyle\left.\hskip 28.45274pt+\ \widehat{u\textbf{y}_{i}}^{(k)}\textbf{x}_{i}^{\top}\widehat{\boldsymbol{\beta}}^{(k)}-\widehat{u}_{i}^{(k)}\textbf{X}_{(i,p)}\widehat{\boldsymbol{\beta}}^{(k)}\widehat{\boldsymbol{\beta}}^{(k)\top}\textbf{x}_{i}\right),$
	$\displaystyle\widehat{u\textbf{y}^{2}_{*}}^{(k)}=\widehat{u\textbf{y}^{2}}^{(k)}-\sum_{i=p+1}^{n}\left(\widehat{u\textbf{y}_{i}}^{(k)}\widehat{\boldsymbol{\beta}}^{(k)\top}\textbf{X}_{(i,p)}^{\top}\right.$
	$\displaystyle\left.+\ \textbf{X}_{(i,p)}\widehat{\boldsymbol{\beta}}^{(k)}\widehat{u\textbf{y}_{i}}^{(k)\top}-\widehat{u}_{i}^{(k)}\textbf{X}_{(i,p)}\widehat{\boldsymbol{\beta}}^{(k)}\widehat{\boldsymbol{\beta}}^{(k)\top}\textbf{X}_{(i,p)}^{\top}\right),$

such that $\widehat{u}^{(k)}=\sum_{i=p+1}^{n}\widehat{u}_{i}^{(k)}$ , $\widehat{u}_{i}^{(k)}=\mathbb{E}[U_{i}\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$ , $\widehat{\mbox{log(u)}}^{(k)}=\mathbb{E}[\sum_{i=p+1}^{n}\log(U_{i})\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$ , $\widehat{uy^{2}}^{(k)}=\mathbb{E}[\sum_{i=p+1}^{n}U_{i}Y_{i}^{2}\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$ , $\widehat{uy_{i}}^{(k)}=\mathbb{E}[U_{i}Y_{i}\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$ , $\widehat{uy\textbf{y}}^{(k)}=\mathbb{E}[\sum_{i=p+1}^{n}U_{i}Y_{i}\textbf{Y}_{(i,p)}\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$ , $\widehat{u\textbf{y}_{i}}^{(k)}=\mathbb{E}[U_{i}\textbf{Y}_{(i,p)}\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k)}]$ , and $\widehat{u\textbf{y}^{2}}^{(k)}=\mathbb{E}[\sum_{i=p+1}^{n}U_{i}\textbf{Y}_{(i,p)}\textbf{Y}_{(i,p)}^{\top}\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(k-1)}]$ , for $i=p+1,\ldots,n$ .

It is easy to observe that the E-step reduces to the computation of $\widehat{u}_{i}^{(k)}$ , $\widehat{\mbox{log(u)}}^{(k)}$ , $\widehat{uy^{2}}^{(k)}$ , $\widehat{u_{i}\textbf{y}_{i}}^{(k)}$ , $\widehat{uy\textbf{y}}^{(k)}$ , $\widehat{u\textbf{y}_{i}}^{(k)}$ , and $\widehat{u\textbf{y}^{2}}^{(k)}$ , for $i=p+1,\ldots,n$ . Because of the difficulties in the calculation of the conditional expectations in the E-step, specifically, when we have successive censored observations, we consider the implementation of the SAEM algorithm, introduced in Subsection 2.2, which at the $k$ th iteration proceeds as follows:

Step E-1 (Sampling). We first consider the observed and censored/missing part of $\textbf{y}=(\textbf{y}_{o}^{\top},\textbf{y}_{m}^{\top})^{\top}$ separately and rearrange the elements of the location vector and scale matrix parameters to compute the conditional distribution of $\textbf{y}_{m}$ as demonstrated in Appendix A.1. Thus, in the simulation step, we generate $M$ samples from the conditional distribution of the latent variables $(\textbf{y},\textbf{u})$ via the Gibbs sampler algorithm according to the following scheme:

•

Step 1: Sample $\textbf{y}_{m}^{(k,l)}$ from the $\mbox{TN}_{n_{m}}(\tilde{\boldsymbol{\mu}}^{*(k)},\tilde{\boldsymbol{\Sigma}}^{*(k)}\mathchar 24635\relax\;\textbf{V}_{m})$ , with $\tilde{\boldsymbol{\mu}}^{*(k)}=\tilde{\boldsymbol{\mu}}_{m}^{(k)}+\tilde{\boldsymbol{\Sigma}}_{mo}^{(k)}\tilde{\boldsymbol{\Sigma}}_{oo}^{-1(k)}(\textbf{y}_{o}-\tilde{\boldsymbol{\mu}}_{o}^{(k)})$ , $\tilde{\boldsymbol{\Sigma}}^{*(k)}=\tilde{\boldsymbol{\Sigma}}_{mm}^{(k)}-\tilde{\boldsymbol{\Sigma}}_{mo}^{(k)}\tilde{\boldsymbol{\Sigma}}_{oo}^{-1(k)}\tilde{\boldsymbol{\Sigma}}_{om}^{(k)}$ , and $\textbf{V}_{m}=\{\textbf{y}_{m}=(y_{1}^{m},\ldots,y_{n_{m}}^{m})^{\top}\mathchar 24635\relax\;V_{11}^{m}\leq y_{1}^{m}\leq V_{12}^{m},\ldots,V_{n_{m}1}^{m}\leq\textbf{y}_{n_{m}}^{m}\leq V_{n_{m}2}^{m}\}$ , where the parameters $\tilde{\boldsymbol{\mu}}_{m}^{(k)},\tilde{\boldsymbol{\mu}}_{o}^{(k)},\tilde{\boldsymbol{\Sigma}}_{mm}^{(k)},\tilde{\boldsymbol{\Sigma}}_{mo}^{(k)}$ , and $\tilde{\boldsymbol{\Sigma}}_{oo}^{(k)}$ are computed using Equations 19 and 20 available in Appendix A.1. Then, we construct a full vector of observations as $\textbf{y}^{(k,l)}=(\textbf{y}_{o}^{\top},\textbf{y}_{m}^{(k,l)\top})^{\top}$ , using the observed and the sample generated values, for $l=1,\ldots,M$ .
•

Step 2: Sample $\textbf{u}^{(k,l)}$ from $f(\textbf{u}\mid\textbf{y}^{(k,l)},\widehat{\boldsymbol{\theta}}^{(k)})$ , whose elements are independent and Gamma distributed as $u_{i}^{(k,l)}\mid\textbf{y}^{(k,l)},\widehat{\boldsymbol{\theta}}^{(k)}\stackrel{{\scriptstyle ind}}{{\sim}}\mathcal{G}(a_{i}^{(k,l)},b_{i}^{(k,l)})$ with $a_{i}^{(k,l)}=(\widehat{\nu}^{(k)}+1)/2$ , $b_{i}^{(k,l)}=(\widehat{\nu}^{(k)}+\varrho_{i}^{(k,l)2}/\widehat{\sigma}^{2(k)})/2$ , and $\varrho_{i}^{(k,l)}=y_{i}^{(k,l)}-\textbf{x}_{i}^{\top}\widehat{\boldsymbol{\beta}}^{(k)}-\textbf{y}_{(i,p)}^{(k,l)\top}\widehat{\boldsymbol{\phi}}^{(k)}+\widehat{\boldsymbol{\beta}}^{(k)\top}\textbf{X}_{(i,p)}^{\top}\widehat{\boldsymbol{\phi}}^{(k)}$ , for $i=p+1,\ldots,n$ and $l=1,\ldots,M$ .

Step E-2 (Stochastic Approximation). Given the sequence $(\textbf{y}^{(k,l)},\textbf{u}^{(k,l)})$ for $l=1,\ldots,M$ , we replace the conditional expectations in $Q_{k}(\boldsymbol{\theta})$ by the following stochastic approximations:

	$\displaystyle\widehat{u}_{i}^{(k)}=\widehat{u}_{i}^{(k-1)}+\delta_{k}\left(\frac{1}{M}\sum_{l=1}^{M}u_{i}^{(k,l)}-\widehat{u}_{i}^{(k-1)}\right),$
	$\displaystyle\widehat{\mbox{log(u)}}^{(k)}=\widehat{\mbox{log(u)}}^{(k-1)}$
	$\displaystyle+\ \delta_{k}\left(\frac{1}{M}\sum_{l=1}^{M}\sum_{i=p+1}^{n}\log u_{i}^{(k,l)}-\widehat{\mbox{log(u)}}^{(k-1)}\right),$
	$\displaystyle\widehat{uy^{2}}^{(k)}=\widehat{uy^{2}}^{(k-1)}$
	$\displaystyle+\ \delta_{k}\left(\frac{1}{M}\sum_{l=1}^{M}\sum_{i=p+1}^{n}u_{i}^{(k,l)}y_{i}^{2(k,l)}-\widehat{uy^{2}}^{(k-1)}\right),$
	$\displaystyle\widehat{uy_{i}}^{(k)}=\widehat{uy_{i}}^{(k-1)}$
	$\displaystyle+\ \delta_{k}\left(\frac{1}{M}\sum_{l=1}^{M}u_{i}^{(k,l)}y_{i}^{(k,l)}-\widehat{uy_{i}}^{(k-1)}\right),$
	$\displaystyle\widehat{uy\textbf{y}}^{(k)}=\widehat{uy\textbf{y}}^{(k-1)}$
	$\displaystyle+\ \delta_{k}\left(\frac{1}{M}\sum_{l=1}^{M}\sum_{i=p+1}^{n}u_{i}^{(k,l)}y_{i}^{(k,l)}\textbf{y}_{(i,p)}^{(k,l)}-\widehat{uy\textbf{y}}^{(k-1)}\right),$
	$\displaystyle\widehat{u\textbf{y}_{i}}^{(k)}=\widehat{u\textbf{y}_{i}}^{(k-1)}$
	$\displaystyle+\ \delta_{k}\left(\frac{1}{M}\sum_{l=1}^{M}u_{i}^{(k,l)}\textbf{y}_{(i,p)}^{(k,l)}-\widehat{u\textbf{y}_{i}}^{(k-1)}\right),$
	$\displaystyle\widehat{u\textbf{y}^{2}}^{(k)}=\widehat{u\textbf{y}^{2}}^{(k-1)}$
	$\displaystyle+\ \delta_{k}\left(\frac{1}{M}\sum_{l=1}^{M}\sum_{i=p+1}^{n}u_{i}^{(k,l)}\textbf{y}_{(i,p)}^{(k,l)}\textbf{y}_{(i,p)}^{(k,l)\top}-\widehat{u\textbf{y}^{2}}^{(k-1)}\right),\hskip 156.49014pt$

where $i=p+1,\ldots,n.$

Following galarza2017quantile, the variable $\delta_{k}$ will be considered as $\delta_{k}=1$ , if $1\leq k\leq cW$ , and $\delta_{k}=1/(k-cW)$ , if $cW+1\leq k\leq W$ , where $W$ is the maximum number of iterations and $c$ is a cutoff point $(0\leq c\leq 1)$ which determines the percentage of initial iterations with no-memory. If $c=0$ , the algorithm will have a memory for all iterations and hence will converge slowly to the ML estimates, and $W$ needs to be large. If $c=1$ , the algorithm will be memory-free, it will converge quickly to a solution neighborhood, and the algorithm will initiate a Markov chain leading to a reasonably well-estimated mean after applying the necessary burn-in and thinning steps. A number between 0 and 1 $(0<c<1)$ will assure an initial convergence in distribution to a solution neighborhood for the first $cW$ iterations and an almost sure convergence for the rest of the iterations.

The selection of $c$ and $W$ could affect the speed of convergence of the SAEM algorithm. A graphical approach can monitor the convergence of the estimates for all parameters and determine the values for these constants, as suggested by lavielle2014mixed. An advantage of the SAEM algorithm is that, even though it performs an MCMC E-step, it only requires a small and fixed sample size $M$ making it much faster than the MCEM algorithm.

Step CM (Conditional Maximization). A conditional maximization step is then carried out, and $\widehat{\boldsymbol{\theta}}^{(k)}$ is updated by maximizing $Q_{k}(\boldsymbol{\theta})$ over $\boldsymbol{\theta}$ to obtain a new estimate $\widehat{\boldsymbol{\theta}}^{(k+1)}$ , leading to the expressions:

$\displaystyle\widehat{\boldsymbol{\phi}}^{(k+1)}$	$\displaystyle=$	$\displaystyle\left(\widehat{u\textbf{y}^{2}_{}}^{(k)}\right)^{-1}\widehat{uy\textbf{y}_{}}^{(k)},$	(8)
$\displaystyle\widehat{\sigma}^{2(k+1)}$	$\displaystyle=$	$\displaystyle\frac{1}{n-p}\left(\widehat{uy_{}^{2}}^{(k)}-2\widehat{\boldsymbol{\phi}}^{(k+1)\top}\widehat{uy\textbf{y}_{}}^{(k)}\right.$	(10)
		$\displaystyle\hskip 56.9055pt+\ \left.\widehat{\boldsymbol{\phi}}^{(k+1)\top}\widehat{u\textbf{y}^{2}_{*}}^{(k)}\widehat{\boldsymbol{\phi}}^{(k+1)}\right),$	(10)
$\displaystyle\widehat{\boldsymbol{\beta}}^{(k+1)}$	$\displaystyle=$	$\displaystyle\left(\sum_{i=p+1}^{n}\widehat{u}_{i}^{(k)}\widehat{\boldsymbol{\alpha}}_{i}^{(k+1)}\widehat{\boldsymbol{\alpha}}_{i}^{(k+1)\top}\right)^{-1}$	(12)
		$\displaystyle\times\sum_{i=p+1}^{n}\left(\widehat{uy_{i}}^{(k)}-\widehat{\boldsymbol{\phi}}^{(k+1)\top}\widehat{u\textbf{y}_{i}}^{(k)}\right)\widehat{\boldsymbol{\alpha}}_{i}^{(k+1)},$	(12)
$\displaystyle\widehat{\nu}^{(k+1)}$	$\displaystyle=$	$\displaystyle\underset{\nu}{\operatorname{argmax}}\quad\widehat{g(\nu\mid u)}^{(k)},\hskip 312.9803pt$	(13)

with $\widehat{\boldsymbol{\alpha}}_{i}^{(k+1)}=\textbf{x}_{i}-\textbf{X}_{(i,p)}^{\top}\widehat{\boldsymbol{\phi}}^{(k+1)}$ for $i=p+1,\ldots,n$ .

3.3 Standard error approximation

The Fisher information matrix is a good measure of the amount of information that a sample dataset provides about the parameters, and it can be used to compute the asymptotic variance of the estimators. louis1982finding developed a procedure for extracting the observed information matrix when the EM algorithm is used to find the ML estimates in problems with incomplete data.

Let $\mathbf{S}_{c}(\textbf{y}_{c}\mathchar 24635\relax\;\boldsymbol{\theta})$ and $\mathbf{B}_{c}(\textbf{y}_{c}\mathchar 24635\relax\;\boldsymbol{\theta})$ respectively denote the first derivative and the negative of the second derivative of the complete-data log-likelihood function with respect to the parameter vector $\boldsymbol{\theta}$ . Let $\mathbf{S}_{o}(\textbf{y}_{o}\mathchar 24635\relax\;\boldsymbol{\theta})$ and $\mathbf{B}_{o}(\textbf{y}_{o}\mathchar 24635\relax\;\boldsymbol{\theta})$ be the corresponding derivatives of the log-likelihood function of the observed data. Then, the observed information matrix is given by

	$\displaystyle\textbf{I}_{o}(\boldsymbol{\theta})=\mathbf{B}_{o}(\textbf{y}_{o}\mathchar 24635\relax\;\boldsymbol{\theta})=\mathbf{S}_{o}(\textbf{y}_{o}\mathchar 24635\relax\;\boldsymbol{\theta})\mathbf{S}_{o}^{\top}(\textbf{y}_{o}\mathchar 24635\relax\;\boldsymbol{\theta})$
	$\displaystyle+\mathbb{E}\left[\mathbf{B}_{c}(\textbf{y}_{c}\mathchar 24635\relax\;\boldsymbol{\theta})\mid\textbf{y}_{o}\right]-\mathbb{E}\left[\mathbf{S}_{c}(\textbf{y}_{c}\mathchar 24635\relax\;\boldsymbol{\theta})\mathbf{S}_{c}^{\top}(\textbf{y}_{c}\mathchar 24635\relax\;\boldsymbol{\theta})\mid\textbf{y}_{o}\right].$

delyon1999convergence adapted the method proposed by louis1982finding to compute the observed information matrix when the SAEM algorithm is used to estimate the parameters. For this, it is necessary to compute the auxiliary terms

$\displaystyle\mathbf{H}_{k}$	$\displaystyle=-\mathbf{G}_{k}+\mathbf{\Delta}_{k}\mathbf{\Delta}_{k}^{\top},$	(14)
$\displaystyle\mathbf{G}_{k}$	$\displaystyle=\mathbf{G}_{k-1}+\delta_{k}\Bigg{(}\frac{1}{M}\sum_{l=1}^{M}\Bigg{(}\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}^{(k,l)})}{\partial\boldsymbol{\theta}\partial\boldsymbol{\theta}^{\top}}$
	$\displaystyle+\frac{\partial\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}^{(k,l)})}{\partial\boldsymbol{\theta}}\frac{\partial\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}^{(k,l)})}{\partial\boldsymbol{\theta}}^{\top}\Bigg{)}-\mathbf{G}_{k-1}\Bigg{)},$
and
$\displaystyle\mathbf{\Delta}_{k}$	$\displaystyle=\mathbf{\Delta}_{k-1}+\delta_{k}\left(\frac{1}{M}\sum_{l=1}^{M}\frac{\partial\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}^{(k,l)})}{\partial\boldsymbol{\theta}}-\mathbf{\Delta}_{k-1}\right),\hskip 227.62204pt$

where $\textbf{y}^{(k,l)}=(\textbf{y}_{o}^{\top},\textbf{y}_{m}^{(k,l)\top})^{\top}$ is the vector of observations sampled at iteration $(k,l)$ for $k=1,\ldots,W$ and $l=1,\ldots,M$ , with $W$ denoting the maximum number of iteration of the SAEM algorithm and $M$ the number of MC samples for stochastic approximation. The inverse of the limiting value of $\mathbf{H}_{k}$ can be used to assess the dispersion of the estimators (delyon1999convergence). The analytical expressions for the first and second derivatives of the complete data log-likelihood function are given in Appendix A.2.

3.4 Prediction

Considering the interest of predicting values from the CAR $t(p)$ model, we denote by $\textbf{y}_{\textrm{obs}}$ the $n$ -vector of random variables (RVs) corresponding to the given sample and by $\textbf{y}_{\textrm{pred}}$ the vector of RVs of length $n_{\textrm{pred}}$ corresponding to the time points that we are interested in predicting.

Let $\textbf{y}_{\textrm{obs}}=(\textbf{y}_{o}^{\top},\textbf{y}_{m}^{\top})^{\top}$ , where $\textbf{y}_{o}$ is the vector of uncensored observations and $\textbf{y}_{m}$ is the vector of censored or missing observations. To deal with the censored values existing in $\textbf{y}_{\textrm{obs}}$ , we use an imputation procedure that consists in replacing the censored values with the values obtained in the last iteration of the SAEM algorithm, i.e., $\textbf{y}_{m}=\mathbb{E}[\textbf{y}_{m}\mid\textbf{V},\textbf{C},\widehat{\boldsymbol{\theta}}^{(W)}]\approx\widehat{\textbf{y}}_{m}^{(W)}$ , since elements of $\widehat{\textbf{y}}_{m}^{(k)}$ can also be updated during the Step E-2 of the SAEM algorithm as

\widehat{\textbf{y}}_{m}^{(k)}=\widehat{\textbf{y}}_{m}^{(k-1)}+\delta_{k}\left(\frac{1}{M}\sum_{l=1}^{M}\textbf{y}_{m}^{(k,l)}-\widehat{\textbf{y}}_{m}^{(k-1)}\right),

(15)

$k=1,\ldots,W$ , with the same $\delta_{k}$ , $M$ , and $W$ settings considering in the estimation. The new vector of observed values will be denoted by $\textbf{y}_{\textrm{obs*}}=(\textbf{y}_{o}^{\top},\widehat{\textbf{y}}_{m}^{(W)\top})^{\top}$ .

Now, supposing that all values in $\textbf{y}_{\textrm{obs*}}$ are completely observed and that the explanatory variables for $\textbf{y}_{\textrm{pred}}$ are available, the forecasting procedure will be performed recursively (box2015time) as follows

$\widehat{y}_{n+k}=$

\begin{cases}\textbf{x}_{n+k}^{\top}\boldsymbol{\beta}+\sum_{j=k}^{p}\phi_{j}(y_{n+k-j}-\textbf{x}_{n+k-j}^{\top}\boldsymbol{\beta}),&k=1\\ &\\ \textbf{x}_{n+k}^{\top}\boldsymbol{\beta}+\sum_{i=1}^{k-1}\phi_{i}(\widehat{y}_{n+k-i}-\textbf{x}_{n+k-i}^{\top}\boldsymbol{\beta})\\ +\sum_{j=k}^{p}\phi_{j}(y_{n+k-j}-\textbf{x}_{n+k-j}^{\top}\boldsymbol{\beta}),&1<k\leq p\\ &\\ \textbf{x}_{n+k}^{\top}\boldsymbol{\beta}+\sum_{j=1}^{p}\phi_{j}(\widehat{y}_{n+k-j}-\textbf{x}_{n+k-j}^{\top}\boldsymbol{\beta}),&p<k\leq n_{\textrm{pred}}.\end{cases}

In practice, $\boldsymbol{\theta}$ is substituted by $\widehat{\boldsymbol{\theta}}=\left(\widehat{\boldsymbol{\beta}}^{\top},\widehat{\boldsymbol{\phi}}^{\top},\widehat{\sigma}^{2},\widehat{\nu}\right)^{\top}$ , where $\widehat{\boldsymbol{\theta}}$ is the estimate obtained via the SAEM algorithm, such that $\widehat{\textbf{y}}_{\textrm{pred}}=(\widehat{y}_{n+1},\widehat{y}_{n+2},\ldots,\widehat{y}_{n+n_{\textrm{pred}}})^{\top}$ .

4 Simulation Study

In this section, we examine the asymptotic properties of the SAEM estimates through a simulation study considering different sample sizes and levels of censoring. A second simulation study is performed to demonstrate the robustness of the estimates obtained from the proposed model when the data is perturbed.

Table 1: Simulation study 1. Summary statistics of parameter estimates for the CAR

t

(2) model based on 300 samples of sizes

n=100,300,600

, and different levels of censoring.

$n$	LOD	level of	Measure	$\beta_{0}$	$\beta_{1}$	$\beta_{2}$	$\phi_{1}$	$\phi_{2}$	$\sigma^{2}$	$\nu$
		Average		Parameter
		censoring		5.00	0.50	0.90	-0.40	0.12	2.00	4.00
100	No	0%	MC-Mean	4.988	0.501	0.937	-0.409	0.093	1.779	4.150
			IM-SE	0.325	0.141	0.565	0.086	0.086	0.531	2.612
			MC-SD	0.334	0.149	0.567	0.090	0.091	0.465	2.089
			CP (%)	94.0	91.3	94.0	-	-	-	-
	1.60	5.10%	MC-Mean	4.984	0.502	0.942	-0.409	0.092	1.781	4.345
			IM-SE	0.319	0.141	0.557	0.091	0.088	0.533	3.326
			MC-SD	0.339	0.149	0.576	0.096	0.094	0.494	2.692
			CP (%)	94.3	92.0	94.3	-	-	-	-
	3.45	19.59%	MC-Mean	4.977	0.505	0.931	-0.418	0.089	1.812	4.559
			IM-SE	0.337	0.150	0.586	0.105	0.099	0.584	3.957
			MC-SD	0.349	0.156	0.588	0.110	0.103	0.516	2.914
			CP (%)	93.3	93.7	93.3	-	-	-	-
	4.30	34.11%	MC-Mean	4.939	0.511	0.928	-0.434	0.077	1.856	4.930
			IM-SE	0.371	0.163	0.644	0.122	0.115	0.721	4.872
			MC-SD	0.372	0.171	0.642	0.126	0.120	0.571	3.517
			CP (%)	94.0	93.7	93.3	-	-	-	-
300	No	0%	MC-Mean	5.015	0.501	0.893	-0.401	0.115	1.937	4.162
			IM-SE	0.167	0.084	0.299	0.048	0.048	0.293	1.172
			MC-SD	0.180	0.090	0.312	0.052	0.050	0.295	1.266
			CP (%)	93.3	93.7	93.7	-	-	-	-
	1.60	5.31%	MC-Mean	5.017	0.501	0.889	-0.399	0.115	1.944	4.244
			IM-SE	0.169	0.085	0.303	0.053	0.051	0.314	1.444
			MC-SD	0.182	0.091	0.319	0.055	0.051	0.311	1.498
			CP (%)	93.0	94.0	93.0	-	-	-	-
	3.45	20.58%	MC-Mean	4.999	0.507	0.891	-0.409	0.110	1.997	4.645
			IM-SE	0.177	0.090	0.317	0.060	0.057	0.365	2.122
			MC-SD	0.191	0.095	0.332	0.061	0.058	0.341	1.935
			CP (%)	93.7	92.3	93.7	-	-	-	-
	4.30	35.44%	MC-Mean	4.958	0.518	0.901	-0.409	0.110	2.080	5.043
			IM-SE	0.194	0.098	0.341	0.068	0.065	0.421	2.716
			MC-SD	0.199	0.106	0.339	0.074	0.069	0.385	2.258
			CP (%)	95.0	93.3	95.3	-	-	-	-
600	No	0%	MC-Mean	5.001	0.507	0.909	-0.400	0.118	1.966	4.072
			IM-SE	0.125	0.063	0.225	0.037	0.036	0.221	0.889
			MC-SD	0.125	0.064	0.226	0.038	0.037	0.212	0.918
			CP (%)	93.0	94.7	92.3	-	-	-	-
	1.60	5.11%	MC-Mean	4.999	0.508	0.911	-0.401	0.117	1.977	4.145
			IM-SE	0.125	0.063	0.225	0.037	0.036	0.221	0.899
			MC-SD	0.124	0.065	0.224	0.038	0.037	0.212	0.937
			CP (%)	92.7	94.3	92.0	-	-	-	-
	3.45	19.99%	MC-Mean	4.981	0.513	0.921	-0.405	0.117	2.034	4.447
			IM-SE	0.132	0.066	0.235	0.042	0.040	0.254	1.252
			MC-SD	0.127	0.069	0.229	0.042	0.041	0.239	1.342
			CP (%)	94.0	93.7	94.3	-	-	-	-
	4.30	34.64%	MC-Mean	4.939	0.521	0.933	-0.410	0.112	2.113	4.826
			IM-SE	0.142	0.072	0.251	0.048	0.045	0.300	1.672
			MC-SD	0.139	0.074	0.251	0.049	0.046	0.284	1.657
			CP (%)	93.7	94.7	94.7	-	-	-	-

Refer to caption — Figure 1: Simulation study 1. Boxplot of the estimates for CAR $t$ (2) model considering different sample sizes and LOD.

4.1 Simulation study 1

This study aims to provide empirical evidence about the consistence of the ML estimates under different scenarios. Thence, 300 MC samples were generated considering different sample sizes: $n=100,300$ , and $600$ . The data was generated from the model specified by Equations 1-2, considering a Student- $t$ distribution for the innovations ( $\eta$ ’s) with $\sigma^{2}=2$ and $\nu=4$ . The rest of the parameters were set as $\boldsymbol{\beta}=(5,0.50,0.90)^{\top}$ , $\boldsymbol{\phi}=(-0.40,0.12)^{\top}$ , and the covariables at time $i$ were $\textbf{x}_{i}=(1,x_{i1},x_{i2})^{\top}$ , with $x_{i1}\sim\mathcal{N}(0,1)$ and $x_{2i}\sim\mathcal{U}(0,1)$ , for $i=1,2,\ldots,n$ . From this scenario, two analysis were conducted and will be discussed next.

4.1.1 Asymptotic properties

Aiming to have scenarios with an average level of censoring/missing of 5%, 20%, and 35%, respectively, we considered the following LODs (values lower than the LOD are substituted by the LOD): 1.60, 3.45, and 4.30. Furthermore, 20% of the desired censored rate corresponds to observations randomly selected to be treated as missing. Additionally, we considered the case without censoring (original data) for comparison.

For each sample size and LOD, we computed the mean of the 300 MC estimates (MC-Mean), the mean of the standard error (IM-SE) computed by the inverse of the observed information matrix given in Equation 14, the standard deviation of the 300 MC estimates (MC-SD), and the coverage probability (CP) of a 95% confidence interval, i.e.,

$\displaystyle\mbox{MC-Mean}_{i}$	$\displaystyle=$	$\displaystyle\bar{\widehat{\theta}}_{i}=\frac{1}{300}\sum_{j=1}^{300}\widehat{\theta}_{i}^{(j)},$
$\displaystyle\mbox{MC-SD}_{i}$	$\displaystyle=$	$\displaystyle\sqrt{\frac{1}{299}\sum_{j=1}^{299}\left(\widehat{\theta}_{i}^{(j)}-\bar{\widehat{\theta}}_{i}\right)^{2}},\quad\mbox{and}$
$\displaystyle\mbox{IM-SE}_{i}$	$\displaystyle=$	$\displaystyle\frac{1}{300}\sum_{j=1}^{300}SE\left(\widehat{\theta}_{i}^{(j)}\right),$

where $\widehat{\theta}_{i}^{(j)}$ is the estimate of the $i$ th parameter of $\boldsymbol{\theta}=(\beta_{0},\beta_{1},\beta_{2},\phi_{1},\phi_{2},\sigma^{2},\nu)^{\top}$ in the $j$ th MC sample.

The results are shown in Table 1, where we can observe that the mean of the estimates (MC-mean) is close to the true parameter value in all combinations of sample size and LOD, except for the scale parameter $\sigma^{2}$ for a sample size of $n=100$ . As expected, this difference decreases as the sample size increases. Notice that the mean of the standard errors obtained through the inverse of the observed information matrix (IM-SE) are in general close to the standard deviation of the estimates (MC-SD) for all scenarios, indicating that the proposed method to obtain the standard errors is reliable.

Figure 1 shows the boxplots of the estimates for each parameter, considering different sample sizes and LODs. The solid red line represents the true parameter value. In general, the median of the estimates is close to the real value independent of the sample size and LOD. However, for $\phi_{2}$ and $\sigma^{2}$ , the median underestimates the true value in samples of size $n=100$ , i.e., the smallest sample size in the simulation study. Furthermore, interquartile ranges decrease as sample sizes increase, suggesting the consistence of the estimates. Additionally, boxplots for the estimates of $\nu$ are shown in Figure 8 in Appendix B.

Finally, to study the asymptotic properties of the estimates, we analyzed the mean squared error (MSE) of the estimates obtained from the proposed algorithm for all scenarios, which can defined by $\mbox{MSE}_{i}=\frac{1}{n}\sum_{j=1}^{n}(\widehat{\theta}_{i}^{(j)}-\theta_{i})^{2}$ . The results for each parameter, sample size, and LOD are shown in Figure 2, where we may note that the MSE tends to zero as the sample size increases. Thus, the proposed SAEM algorithm seems to provide ML estimates with good asymptotic properties for our proposed autoregressive censored linear model with Student- $t$ innovations.

4.1.2 Residual Analysis

Checking the specification of a statistical model usually involves statistical tests and graphical methods based on residuals. However, conventional residuals (as Pearson’s residuals, p.e.) are not appropriate for some models, since they may lead to erroneous inference as kalliovirta2006misspecification demonstrated for models based on mixtures of distributions. As an alternative, dunn1996randomized developed the quantile residuals, a type of residuals for regression models with independent responses that produce residuals normally distributed by inverting the fitted distribution function for each response value and finding the equivalent standard normal quantile. These results assume that the model is correctly specified and parameters are consistently estimated. The method can be extended to dependent data by expressing the likelihood as a product of univariate conditional likelihoods.

To compute the quantile residuals for the CAR $t(p)$ model, we first impute the censored or missing observations as defined in Section 3.4. Then, considering all values as completely observed, the residual for $i$ th observation is computed as follows

\widehat{r}_{i}=\Phi^{-1}\left(T_{1}\left(y_{i}\mathchar 24635\relax\;\widehat{\mu}_{i},\widehat{\sigma}^{2},\widehat{\nu}\right)\right),\quad i=p+1,\ldots n,

(16)

where $\Phi(\cdot)$ denotes the cumulative distribution function (CDF) of the standard normal distribution, $T_{1}(\cdot\mathchar 24635\relax\;\hat{\mu}_{i},\hat{\sigma}^{2},\hat{\nu})$ is the CDF of the univariate Student- $t$ distribution with location parameter $\widehat{\mu}_{i}=\textbf{x}_{i}^{\top}\widehat{\boldsymbol{\beta}}+(\textbf{y}_{(i,p)}-\textbf{X}_{(i,p)}\widehat{\boldsymbol{\beta}})^{\top}\widehat{\boldsymbol{\phi}}$ , scale parameter $\hat{\sigma}^{2}$ , and $\hat{\nu}$ degrees of freedom, where $\widehat{\boldsymbol{\theta}}$ refers to the ML estimates of $\boldsymbol{\theta}$ obtained through the SAEM algorithm. Note that the quantile residual is calculated only for the latest $n-p$ observations.

Aiming to analyze how the quantile residuals behave for the proposed model, they were computed for a simulated dataset of sample size $n=600$ and considering four levels of left censoring: 0%, 5.17%, 20%, and 34.67%. Figure 3 shows a Quantile-Quantile (Q-Q) plot, a dispersion plot (residual vs. time), and a histogram for the residuals. For all levels of censoring, the histogram seems to correspond to a histogram of a normally distributed variable, and the dispersion plot shows independent residuals. We can deduce through the Q-Q plot that the residuals are roughly normally distributed because all points form a roughly straight line inside the confidence band. However, for samples with 20% and 34.67% of censoring, the Q-Q plots present a slight deviation from the center line in the lower tail, which might be due to the high proportion of censored values.

For comparison, we fitted the same dataset considering the normal distribution (i.e., disregarding the heavy tails), and computed the corresponding quantile residuals. The resulting plots are given in Figure 4, where we can see clear signs of non-normality, such as large residuals and several points outside the confidence band in the Q-Q plots. Therefore, this illustration indicates that this method can help checking the model specification in the CAR $t(p)$ model. Nevertheless, for significant levels of censoring, more caution is needed in the analysis of residuals since our proposal imputes the unobserved data by its conditional expectation.

4.2 Simulation study 2: Robustness of the estimators

This simulation study aims to compare the performance of the estimates for two censored AR models in the presence of outliers on the response variable. In this case, we generated 300 MC samples of size $n=100$ under the model specified in Equations 1-2, considering the standard normal distribution for the innovations. The other parameters were set as $\boldsymbol{\beta}=(4,0.50)^{\top}$ and $\boldsymbol{\phi}=(0.48,-0.20)^{\top}$ , and the covariates were set as $\textbf{x}_{i}=(1,x_{i1})^{\top}$ , where $x_{i1}\sim\mathcal{N}(0,1)$ , $i=1,\ldots,100$ . After generating the data, each MC sample was perturbed considering the following scheme: the maximum value was increased in $\vartheta$ times the sample standard deviation, i.e., $\max(\textbf{y})=\max(\textbf{y})+\vartheta\mbox{SD}(\textbf{y})$ , for $\vartheta\in\{0,1,2,3,4,5,7\}$ .

Table 2: Simulation study 2. Mean of the estimates for CAR(2) and CAR

t

(2) model based on 300 MC samples of size

n=100

for different levels of perturbation.

a. Level of censoring: 0% Pert. CAR(2) CAR $t$ (2) ( $\vartheta$ ) $\beta_{0}$ $\beta_{1}$ $\sigma^{2}$ $\phi_{1}$ $\phi_{2}$ $\beta_{0}$ $\beta_{1}$ $\sigma^{2*}$ $\phi_{1}$ $\phi_{2}$ $\nu$ DI (%) 0 4.007 0.510 0.963 0.478 -0.215 4.007 0.510 0.979 0.480 -0.216 24.424 17 1 4.021 0.526 1.030 0.469 -0.209 4.008 0.518 1.042 0.464 -0.207 17.098 73.33 2 4.036 0.541 1.135 0.447 -0.195 3.999 0.518 1.130 0.433 -0.186 9.568 98.67 3 4.050 0.557 1.275 0.416 -0.175 3.991 0.515 1.229 0.398 -0.162 6.064 99.67 4 4.064 0.572 1.449 0.383 -0.154 3.987 0.513 1.328 0.369 -0.140 4.896 100 5 4.079 0.588 1.655 0.349 -0.133 3.986 0.510 1.426 0.346 -0.121 4.324 100 7 4.107 0.619 2.160 0.287 -0.100 3.985 0.508 1.628 0.309 -0.090 3.710 100

b. Level of censoring: 10.04% Pert. CAR(2) CAR $t$ (2) ( $\vartheta$ ) $\beta_{0}$ $\beta_{1}$ $\sigma^{2}$ $\phi_{1}$ $\phi_{2}$ $\beta_{0}$ $\beta_{1}$ $\sigma^{2*}$ $\phi_{1}$ $\phi_{2}$ $\nu$ DI (%) 0 4.006 0.511 0.966 0.478 -0.214 4.006 0.511 0.997 0.478 -0.213 20.892 21.00 1 4.018 0.529 1.044 0.468 -0.208 4.007 0.519 1.079 0.460 -0.203 14.290 77.67 2 4.027 0.549 1.165 0.445 -0.194 4.002 0.518 1.207 0.424 -0.180 7.764 98.67 3 4.036 0.570 1.327 0.415 -0.175 3.999 0.514 1.362 0.387 -0.155 5.061 99.67 4 4.044 0.592 1.527 0.383 -0.156 3.999 0.511 1.535 0.357 -0.133 4.105 100 5 4.052 0.614 1.764 0.351 -0.138 4.000 0.509 1.704 0.332 -0.115 3.668 100 7 4.065 0.659 2.340 0.294 -0.109 4.003 0.505 2.170 0.295 -0.088 3.166 100

c. Level of censoring: 30% Pert. CAR(2) CAR $t$ (2) ( $\vartheta$ ) $\beta_{0}$ $\beta_{1}$ $\sigma^{2}$ $\phi_{1}$ $\phi_{2}$ $\beta_{0}$ $\beta_{1}$ $\sigma^{2*}$ $\phi_{1}$ $\phi_{2}$ $\nu$ DI (%) 0 4.014 0.505 0.944 0.476 -0.219 4.015 0.503 0.972 0.476 -0.218 19.338 24.67 1 4.014 0.532 1.050 0.464 -0.213 4.011 0.513 1.076 0.454 -0.206 12.553 82.33 2 4.007 0.562 1.213 0.439 -0.199 4.007 0.512 1.230 0.415 -0.181 6.854 98.33 3 3.996 0.594 1.429 0.409 -0.182 4.006 0.507 1.406 0.376 -0.156 4.651 100 4 3.982 0.628 1.694 0.378 -0.165 4.008 0.503 1.616 0.344 -0.135 3.803 100 5 3.965 0.664 2.005 0.349 -0.151 4.010 0.500 1.837 0.318 -0.118 3.386 100 7 3.926 0.737 2.760 0.301 -0.130 4.016 0.495 2.428 0.280 -0.092 2.926 100

Furthermore, we considered three different levels of censoring: the first case corresponds to the case without censoring; the second case considered 2.34 as a LOD, where all values lower or equal than the LOD were substituted by 2.34 (which implied an average of 10.04% of censoring); and finally the third scenario considered a LOD of 3.31 (which yielded an average of 30% of censoring). For each scenario, we fitted two models: the first considers normal innovations, denoted by CAR(2) whose parameters were estimated by the function ARCensReg from the R package ARCensReg (schumacher2016package), and the second one is our proposed model CAR $t$ (2) fitted through the function ARtCensReg from the same package.

Table 2 displays the mean of the estimates for each parameter by level of perturbation and censoring rate. To obtain comparable values of $\sigma^{2}$ , for the model with Student- $t$ innovations we reported the estimated variance of the innovation $\widehat{\sigma}^{2*}=\widehat{\nu}\widehat{\sigma}^{2}/(\widehat{\nu}-2)$ , where $\widehat{\sigma}^{2}$ is the estimate of the scale parameter under our proposal. Moreover, the percentage of times that our model detected the perturbed observation as influential, denoted by DI(%), is also reported, which was computed as the number of times the estimated weight ( ${\widehat{u}_{i}}$ , computed during the E-step in the SAEM algorithm) of the perturbed observation was the lowest one, divided by the number of MC samples.

Table 2a and Figure 5a show the results for the non-censored dataset, where it is possible to observe that for the normal distribution the bias increases as the perturbation increases, as natural; however, when Student- $t$ innovations are considered, the bias is much smaller, illustrating the robustness of the model against atypical distributions. As expected, estimates for $\nu$ decrease as the perturbation grows. For the non-perturbed samples, the observation with the maximum value was detected as influential in only 17% of the samples, but this percentage increases fast along the perturbation. Observe that for $\vartheta=4,5$ , and $7$ , the perturbed observation was detected as influential in all MC samples.

Table 2b and Figure 5b display the results for samples with an average of 10.04% of censoring. The estimates for $\beta_{0}$ have a distribution similar to the non-censored case. On the other hand, for $\beta_{1}$ , a more significant difference was observed between the real value and its estimate in the normal model. In contrast, the model with heavy-tailed innovations performs better in recovering the true parameter values, with a mean value of $\nu$ smaller than the observed previously.

Finally, results for the scenario with an average of 30% of left censoring are shown in Table 2c and Figure 5c. For the normal case, the bias for $\beta_{1}$ is more outstanding than the observed in the previous two cases, while for our model, its mean and median are generally close to the true parameter value, for all perturbation levels. For $\beta_{0}$ , the normal model returned estimates close to the real value only for levels of perturbation lower than 4 ( $\vartheta<4$ ), while for larger perturbations the models tends to underestimate it. These results confirm that the heavy-tails of the Student- $t$ distribution provides to our model the ability to mitigate the effect of outliers, i.e., a much more robust method against atypical values.

5 Application

In this section, the CAR $t(p)$ model is applied to a real environmental dataset that has both missing and censored observations. We consider the phosphorus concentration dataset previously analyzed by schumacher2017censored and wang2018quasi, available in the package ARCensReg. For this study, we are interested in tracking the phosphorus concentration levels over time as an indicator of the river water quality since, for instance, excessive phosphorus in surface water may result in eutrophication. The data consist of $n=181$ observations of phosphorus concentration ( $P$ ) in mg/L monthly measured from October 1998 to October 2013 in West Fork Cedar River at Finchford, Iowa, USA. The phosphorus concentration measurement was subject to a LOD of 0.02, 0.05, or 0.10, depending on the time, and therefore 28 (15.47%) observations are left-censored. Moreover, there are 7 (3.87%) missing observations from September 2008 to March 2009 due to a program suspension caused by a temporary lack of funding.

As mentioned by wang2018quasi, $P$ levels are generally correlated with the water discharge ( $Q$ ), which is measured in cubic feet per second; then, our objective is to explore the relationship between $P$ and $Q$ , when the response contains censored and missing observations. Aiming to evaluate the prediction accuracy, the dataset was train-test split. The training dataset consists of 169 observations, where 20.71% are left-censored or missing, while the testing dataset contains 12 observations, all fully observed. Following wang2018quasi, to make the $P$ - $Q$ relationship more linear we considered the logarithmic transformation of $P$ and $Q$ and fitted the following CAR $t(p)$ model:

\log(P_{t})=\sum_{j=1}^{4}\left[\beta_{0j}S_{jt}+\beta_{1j}S_{jt}\log(Q_{t})\right]+\boldsymbol{\xi}_{t},

(17)

$t=p+1,\ldots,n$ , where the regression error $\boldsymbol{\xi}_{t}$ follows an autoregressive model and $S_{j}$ is a dummy seasonal variable for quarters $j=1,2,3$ , and $4$ . The first quarter corresponds from January to March, the second one from April to June, and so on. In this model, $\beta_{0j}$ and $\beta_{1j}$ are respectively the intercept and slope for quarter $j$ , for $j=1,2,3$ , and $4$ .

The AR order is unknown, but since the second observation from the training set (November 1998) is censored, it is not possible to consider a model of order greater than one because the proposed algorithm assumes that the first $p$ values are completely observed. Therefore, for this application we only considered a model of order 1. Besides that schumacher2017censored concluded that a censored autoregressive model of order 1 was the best to fit this dataset based on information criteria and mean squared prediction error (MSPE). The authors also found that the dataset has some influential observations, then it seems reasonable to consider a model with innovations following a heavy-tailed distribution.

Table 3: Phosphorus concentration data. Parameter estimates and their standard errors (SE) for the CAR(1) and CAR

t

(1) models.

Parameters	CAR $t$ (1)			CAR(1)
Parameters	Estimate	SE	95% CI	Estimate	SE	95% CI
$\beta_{01}$	-4.338	0.609	(-5.530 , -3.145)	-4.670	0.513	(-5.674 , -3.665)
$\beta_{02}$	-2.731	0.750	(-4.201 , -1.262)	-3.022	0.751	(-4.494 , -1.549)
$\beta_{03}$	-4.141	0.419	(-4.963 , -3.319)	-4.174	0.442	(-5.041 , -3.307)
$\beta_{04}$	-4.651	0.545	(-5.719 , -3.583)	-4.999	0.549	(-6.075 , -3.922)
$\beta_{11}$	0.293	0.107	(0.084 , 0.503)	0.373	0.085	(0.206 , 0.541)
$\beta_{12}$	0.139	0.105	(-0.066 , 0.345)	0.185	0.105	(-0.021 , 0.391)
$\beta_{13}$	0.363	0.070	(0.227 , 0.500)	0.364	0.075	(0.218 , 0.510)
$\beta_{14}$	0.351	0.097	(0.161 , 0.542)	0.410	0.098	(0.218 , 0.601)
$\phi_{1}$	-0.087	0.085		-0.077	0.089
$\sigma^{2}$	0.176	0.046		0.254	0.030
$\nu$	5.002	3.059		-	-
MSPE	0.101			0.126
MAPE	0.240			0.255

For comparison purposes, we considered the model in Equation 17 with regression errors following an autoregressive model of order 1 (as defined by Equation 2), with innovations $\eta_{t}$ being independent and identically distributed as $\mathcal{N}(0,\sigma^{2})$ and $t(0,\sigma^{2},\nu)$ (CAR(1) and CAR $t$ (1), respectively). Both models were fitted using the R library ARCensReg through the functions ARCensReg and ARtCensReg, respectively. Parameter estimates and their corresponding standard errors (SEs) are displayed in Table 3, along with the MSPE and the mean absolute prediction error (MAPE) were computed considering one-step-ahead predictions for the testing dataset. These criteria indicate that the heavy-tailed Student- $t$ ( $\widehat{\nu}=5.002$ ) model provides better predictions than the normal model for the phosphorus concentration data.

One can also note that the estimates for $\beta_{0j},j=1,2,3,4$ under the CAR $t$ (1) model are negative and greater than the estimates obtained from the CAR(1) model. On the other hand, estimates for slopes $\beta_{1j}$ are all positive and, except for the second quarter, they are significantly different from zero, evidencing the correlation between the water discharge and the phosphorus concentration. Regarding the autoregressive coefficient $\phi_{1}$ , both models estimated similar values.

Figure 6 presents a Q-Q plot (left), a residual vs. time scatterplot (middle), and a histogram (right) for residual analysis for both models. For the CAR $t$ (1) model (top), we see in the Q-Q plot that all points are forming a roughly straight line and lie within the confidence bands. Further, the histogram seems to correspond to a normally distributed variable, and the dispersion plot seems to be related to independent residuals. On the other hand, the Q-Q plot for the CAR(1) model (bottom) presents some points outside of the confidence bands on the upper tail, indicating that the distribution is skewed or heavy-tailed. Additionally, we see from the scatterplot and histogram larger residual values than those from the $t$ model. Therefore, the CAR $t$ (1) model seems to provide a better fit to the phosphorus concentration data than the CAR(1) model. Plots with the sample autocorrelation for the quantile residuals are displayed in Appendix B, Figure 9.

Finally, Figure 7 shows the imputed values for the censored and missing observations from October 1998 to October 2012 (training dataset) and the predicted values from November 2012 to October 2013 (yellow box) considering normal (light blue line) and Student- $t$ innovations (pink line), with observed values represented as a black solid line and vertical black dashed lines indicating the period with missing values. Here, we see slight differences between the predicted values obtained under both fitted models. Besides, the general behavior of the imputed values for the missing period seems to capture well the seasonal behavior of the time series and is also similar for both models. In addition, for assessing the convergence of SAEM parameter estimates, convergence plots for our proposal are displayed in Figure 10 in Section B of the Appendix.

6 Conclusions

Extending autoregressive regression methods to include censored response variables is a promising area of research. This paper introduces a novel model that can handle left, right, or interval censoring time series, while simultaneously modeling heavy-tails and missing observations, which can be treated as interval-censored observations.

Our approach extends some previous works, such as schumacher2017censored and liu2019parameter, and handles missing data while respecting the nature of the data without the need for transformations. The proposed methods have been coded and implemented in the R package ARCensReg, which is available for the users on the CRAN repository.

It is important to remark that we assumed the dropout/censoring mechanism to be missing at random (MAR) (see diggle2002analysis, p 283). However, in the case where MAR with ignorability is not realistic, the relationship between the unobserved measurements and the censoring process should be further investigated. Future directions point to tackling the limitation of assuming that the first $p$ observations are fully observed to fit a CAR $t(p)$ model. Furthermore, a natural and interesting path for future research is to extend this model to a multivariate framework.

\bmhead

Acknowledgments

This study was financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. Larissa A. Matos acknowledges support from FAPESP-Brazil (Grant 2020/16713-0).

Appendix A Conditional distributions and derivatives

A.1 Full conditional distributions

The full conditional distributions are needed to perform the E-Step of the SAEM algorithm; these are $f(\textbf{u}\mid\textbf{y}_{o},\textbf{y}_{m},\boldsymbol{\theta})$ and $f(\textbf{y}_{m}\mid\textbf{u},\textbf{y}_{o},\boldsymbol{\theta})$ . We have that, the conditional probability density function (pdf) of $\textbf{u}=(u_{1},\ldots,u_{n})^{\top}$ is given by

	$\displaystyle f(\textbf{u}\mid\textbf{y}_{o},\textbf{y}_{m},\boldsymbol{\theta})=\frac{f(\textbf{u},\textbf{y}_{o},\textbf{y}_{m}\mid\boldsymbol{\theta})}{f(\textbf{y}_{o},\textbf{y}_{m}\mid\boldsymbol{\theta})}\propto f(\textbf{u},\textbf{y}_{o},\textbf{y}_{m}\mid\boldsymbol{\theta})$
	$\displaystyle\propto\prod_{i=p+1}^{n}u_{i}^{(\nu-1)/2}\exp\Big{\{}-\frac{u_{i}}{2\sigma^{2}}\Big{(}y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta}-\textbf{y}_{(i,p)}^{\top}\boldsymbol{\phi}$
	$\displaystyle\hskip 99.58464pt+\boldsymbol{\beta}^{\top}\textbf{X}_{(i,p)}^{\top}\boldsymbol{\phi}\Big{)}^{2}-\frac{\nu}{2}u_{i}\Big{\}}$
	$\displaystyle=\prod_{i=p+1}^{n}u_{i}^{(\nu-1)/2}\exp\Big{\{}-\frac{u_{i}}{2}\Big{(}\nu+\big{(}y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta}$
	$\displaystyle\hskip 85.35826pt-\textbf{y}_{(i,p)}^{\top}\boldsymbol{\phi}+\boldsymbol{\beta}^{\top}\textbf{X}_{(i,p)}^{\top}\boldsymbol{\phi}\big{)}^{2}\big{/}\sigma^{2}\Big{)}\Big{\}}.$

From the last expression, we deduce that $U_{i}$ is independent of $U_{j}$ for all $i\neq j$ , where $U_{i}\mid\textbf{y}_{o},\textbf{y}_{m},\boldsymbol{\theta}\stackrel{{\scriptstyle ind}}{{\sim}}\mathcal{G}\left(a_{i},b_{i}\right)$ , with $a_{i}=\displaystyle\frac{\nu+1}{2}$ , $b_{i}=\displaystyle\frac{1}{2}\left(\nu+\frac{\varrho_{i}^{2}}{\sigma^{2}}\right)$ , and $\varrho_{i}=y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta}-\textbf{y}_{(i,p)}^{\top}\boldsymbol{\phi}+\boldsymbol{\beta}^{\top}\textbf{X}_{(i,p)}^{\top}\boldsymbol{\phi}$ , for $i=p+1,\ldots,n$ .

To compute the condition distribution of $\textbf{y}_{m}$ , we consider that the first $p$ observations are completely observed. Using the VAR(1) model representation, as zhou2020student suggested, we get the expression,

\textbf{w}_{t}=\boldsymbol{\Phi}{\textbf{w}}_{t-1}+\boldsymbol{\alpha}_{t},

where $\boldsymbol{\alpha}_{t}=(\eta_{t},0,\ldots,0)^{\top}$ is a vector of dimension $p$ , $\boldsymbol{\Phi}=[\boldsymbol{\phi},\textbf{A}^{\top}]^{\top}$ is a $p\times p$ matrix in which A is a $(p-1)\times p$ matrix with the identity matrix in its first $p-1$ columns and $0$ s in the last one, $\textbf{w}_{t}=(\tilde{y}_{t},\tilde{y}_{t-1},\ldots,\tilde{y}_{t-p+1})^{\top}$ , and $\tilde{y}_{t}=y_{t}-\textbf{x}_{t}^{\top}\boldsymbol{\beta}$ . Through a recursive process based on VAR(1) form, we have

$\displaystyle t=p+1$	$\displaystyle\Rightarrow$	$\displaystyle{\textbf{w}}_{p+1}=\boldsymbol{\Phi}{\textbf{w}}_{p}+\boldsymbol{\alpha}_{p+1}.$
$\displaystyle t=p+2$	$\displaystyle\Rightarrow$	$\displaystyle{\textbf{w}}_{p+2}=\boldsymbol{\Phi}{\textbf{w}}_{p+1}+\boldsymbol{\alpha}_{p+2}$
		$\displaystyle\quad=\boldsymbol{\Phi}(\boldsymbol{\Phi}{\textbf{w}}_{p}+\boldsymbol{\alpha}_{p+1})+\boldsymbol{\alpha}_{p+2}$
		$\displaystyle\quad=\boldsymbol{\Phi}^{2}{\textbf{w}}_{p}+\boldsymbol{\Phi}\boldsymbol{\alpha}_{p+1}+\boldsymbol{\alpha}_{p+2}.$
$\displaystyle t=p+3$	$\displaystyle\Rightarrow$	$\displaystyle{\textbf{w}}_{p+3}=\boldsymbol{\Phi}{\textbf{w}}_{p+2}+\boldsymbol{\alpha}_{p+3}$
		$\displaystyle\quad=\boldsymbol{\Phi}(\boldsymbol{\Phi}^{2}{\textbf{w}}_{p}+\boldsymbol{\Phi}\boldsymbol{\alpha}_{p+1}+\boldsymbol{\alpha}_{p+2})+\boldsymbol{\alpha}_{p+3}$
		$\displaystyle\quad=\boldsymbol{\Phi}^{3}{\textbf{w}}_{p}+\boldsymbol{\Phi}^{2}\boldsymbol{\alpha}_{p+1}+\boldsymbol{\Phi}\boldsymbol{\alpha}_{p+2}+\boldsymbol{\alpha}_{p+3}.$
	$\displaystyle\vdots$
$\displaystyle t=p+k$	$\displaystyle\Rightarrow$	$\displaystyle{\textbf{w}}_{p+k}=\boldsymbol{\Phi}{\textbf{w}}_{p+k-1}+\boldsymbol{\alpha}_{p+k}$
		$\displaystyle\quad=\boldsymbol{\Phi}^{k}{\textbf{w}}_{p}+\sum_{j=0}^{k-1}\boldsymbol{\Phi}^{j}\boldsymbol{\alpha}_{p+k-j}.$

Note that the model defined in Equations 1 and 2 can be recovered through the first element of the preceding vectors, i.e.,

	$\displaystyle y_{p+k}$	$\displaystyle=$	$\displaystyle\textbf{x}_{p+k}^{\top}\boldsymbol{\beta}+\left(\boldsymbol{\Phi}^{k}\right)^{\top}_{1.}\Big{(}{\textbf{y}}_{(p+1,p)}-\textbf{X}_{(p+1,p)}\boldsymbol{\beta}\Big{)}$		(18)
			$\displaystyle+\sum_{j=0}^{k-1}\left(\boldsymbol{\Phi}^{j}\right)_{11}\eta_{p+k-j},$		(18)

where $\boldsymbol{\Phi}^{k}$ represents the matrix $\boldsymbol{\Phi}$ multiplied by itself $k$ times, $\big{(}\boldsymbol{\Phi}^{k}\big{)}_{1.}$ is a $p\times 1$ vector whose elements correspond to the first row of $\boldsymbol{\Phi}^{k}$ , and $\big{(}\boldsymbol{\Phi}^{k}\big{)}_{11}$ is the element (1,1) of $\boldsymbol{\Phi}^{k}$ . From Equation 18, we deduce that $Y_{p+k}$ given $\textbf{y}_{(p+1,p)}$ , $\boldsymbol{\theta}$ , and $\textbf{U}=\textbf{u}$ is normally distributed, for all $k=1,\ldots,n-p$ . Thus, the conditional expectation and the elements of the variance-covariance matrix that characterizes the normal distribution will be given by

$\displaystyle\tilde{\mu}_{(k)}$	$\displaystyle=\mathbb{E}\big{[}Y_{p+k}\mid\textbf{u},\textbf{y}_{(p+1,p)},\boldsymbol{\theta}\big{]}$
	$\displaystyle=\mathbb{E}\Bigg{[}\textbf{x}_{p+k}^{\top}\boldsymbol{\beta}+\big{(}\boldsymbol{\Phi}^{k}\big{)}^{\top}_{1.}({\textbf{y}}_{(p+1,p)}-\textbf{X}_{(p+1,p)}\boldsymbol{\beta})$
	$\displaystyle\quad\quad+\sum_{j=0}^{k-1}\big{(}\boldsymbol{\Phi}^{j}\big{)}_{11}\eta_{p+k-j}\mid\textbf{u},\textbf{y}_{(p+1,p)},\boldsymbol{\theta}\Bigg{]}$
	$\displaystyle=\textbf{x}_{p+k}^{\top}\boldsymbol{\beta}+\big{(}\boldsymbol{\Phi}^{k}\big{)}^{\top}_{1.}({\textbf{y}}_{(p+1,p)}-\textbf{X}_{(p+1,p)}\boldsymbol{\beta}).$	(19)
$\displaystyle\tilde{\sigma}_{(kl)}$	$\displaystyle=\mathrm{Cov}(Y_{p+k},Y_{p+l}\mid\textbf{u},\textbf{y}_{(p+1,p)},\boldsymbol{\theta})$
	$\displaystyle=\mathrm{Cov}\Bigg{(}\sum_{j=0}^{k-1}\Big{(}\boldsymbol{\Phi}^{j}\Big{)}_{11}\eta_{p+k-j},$
	$\displaystyle\hskip 36.98866pt\sum_{i=0}^{l-1}\left(\boldsymbol{\Phi}^{i}\right)_{11}\eta_{p+l-i}\mid\textbf{u},\textbf{y}_{(p+1,p)},\boldsymbol{\theta}\Bigg{)}$
	$\displaystyle=\sum_{j=0}^{k-1}\sum_{i=0}^{l-1}\left(\boldsymbol{\Phi}^{j}\right)_{11}\left(\boldsymbol{\Phi}^{i}\right)_{11}$
	$\displaystyle\hskip 36.98866pt\times\mathrm{Cov}\Bigg{(}\eta_{p+k-j},\eta_{p+l-i}\mid\textbf{u},\textbf{y}_{1:p},\boldsymbol{\theta}\Bigg{)}$
	$\displaystyle=\sum_{j=0}^{r-1}\frac{\sigma^{2}}{u_{p+r-j}}\left(\boldsymbol{\Phi}^{j}\right)_{11}\Big{(}\boldsymbol{\Phi}^{\mid k-l\mid+j}\Big{)}_{11}$
	$\displaystyle=\sum_{j=1}^{r}\frac{\sigma^{2}}{u_{p+j}}\big{(}\boldsymbol{\Phi}^{k-j}\big{)}_{11}\big{(}\boldsymbol{\Phi}^{l-j}\big{)}_{11},$	(20)

where $r=\min(k,l)$ . Now, suppose that y is partitioned into two vectors, $\textbf{y}_{1:p}\in\mathbb{R}^{p}$ containing the first $p$ observed values and $\textbf{y}_{-p}\in\mathbb{R}^{n-p}$ containing the remaining observations, i.e., $\textbf{y}=(\textbf{y}_{1:p}^{\top},\textbf{y}_{-p}^{\top})^{\top}$ . Let $\textbf{y}_{o}$ and $\textbf{y}_{m}$ be the observed and the censored/missing part of $\textbf{y}_{-p}$ , respectively, then the distribution of $\textbf{y}_{-p}\mid\textbf{u},\textbf{y}_{1:p},\boldsymbol{\theta}\equiv\textbf{y}_{o},\textbf{y}_{m}\mid\textbf{u},\textbf{y}_{1:p},\boldsymbol{\theta}\sim\mathcal{N}_{n-p}(\tilde{\boldsymbol{\mu}},\tilde{\boldsymbol{\Sigma}})$ , where the $i$ th element of $\tilde{\boldsymbol{\mu}}$ is $\tilde{\mu}_{(i)}$ and the element $(i,j)$ of $\tilde{\boldsymbol{\Sigma}}$ is equal to $\tilde{\sigma}_{(ij)}$ , for all $i,j=1,\ldots,n-p$ .

To compute the conditional distribution of $\textbf{y}_{m}\mid\textbf{u},\textbf{y}_{o},\textbf{y}_{1:p},\boldsymbol{\theta}$ , we rearrange the elements of $\textbf{y}_{-p},\tilde{\boldsymbol{\mu}}$ , and $\tilde{\boldsymbol{\Sigma}}$ as follows:

	$\displaystyle\textbf{y}_{-p}=\left(\begin{array}[]{c}\textbf{y}_{o}\\ \textbf{y}_{m}\end{array}\right),\quad\tilde{\boldsymbol{\mu}}=\left(\begin{array}[]{c}\tilde{\boldsymbol{\mu}}_{o}\\ \tilde{\boldsymbol{\mu}}_{m}\end{array}\right),$
	$\displaystyle\mbox{and}\quad\tilde{\boldsymbol{\Sigma}}=\left(\begin{array}[]{cc}\tilde{\boldsymbol{\Sigma}}_{oo}&\tilde{\boldsymbol{\Sigma}}_{om}\\ \tilde{\boldsymbol{\Sigma}}_{mo}&\tilde{\boldsymbol{\Sigma}}_{mm}\end{array}\right).$

Using the results for the conditional distribution of the normal distribution, we obtain

\textbf{y}_{m}\mid\textbf{u},\textbf{y}_{o},\textbf{y}_{1:p},\boldsymbol{\theta}\sim\mathcal{N}(\tilde{\boldsymbol{\mu}}^{*},\tilde{\boldsymbol{\Sigma}}^{*}),

with $\tilde{\boldsymbol{\mu}}^{*}=\tilde{\boldsymbol{\mu}}_{m}+\tilde{\boldsymbol{\Sigma}}_{mo}\tilde{\boldsymbol{\Sigma}}_{oo}^{-1}(\textbf{y}_{o}-\tilde{\mu}_{o})$ and $\tilde{\boldsymbol{\Sigma}}^{*}=\tilde{\boldsymbol{\Sigma}}_{mm}-\tilde{\boldsymbol{\Sigma}}_{mo}\tilde{\boldsymbol{\Sigma}}_{oo}^{-1}\tilde{\boldsymbol{\Sigma}}_{om}$ .

A.2 First and second derivatives of the complete-data log-likelihood function

The complete log-likelihood function for the model defined by Equations 1-6 is given by

	$\displaystyle\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})$	$\displaystyle=$	$\displaystyle g(\nu\mid\textbf{u})-\frac{n-p}{2}\log\sigma^{2}$
			$\displaystyle-\frac{1}{2\sigma^{2}}\sum_{i=p+1}^{n}u_{i}\left(\tilde{y}_{i}-\textbf{w}_{i}^{\top}\boldsymbol{\phi}\right)^{2}+\text{cte},$

where $g(\nu\mid\textbf{u})=\frac{n-p}{2}\left(\nu\log\left(\frac{\nu}{2}\right)-2\log\Gamma\left(\frac{\nu}{2}\right)\right)+\frac{\nu}{2}\left(\sum_{i=p+1}^{n}\log u_{i}-\sum_{i=p+1}^{n}u_{i}\right)$ , $\tilde{y}_{i}=y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta}$ , and $\textbf{w}_{i}=\textbf{y}_{(i,p)}-\textbf{X}_{(i,p)}\boldsymbol{\beta}$ .

The first derivative of $\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})$ with respect to each element of $\boldsymbol{\theta}=(\boldsymbol{\beta}^{\top},\boldsymbol{\phi}^{\top},\sigma^{2},\nu)$ is:

	$\displaystyle\frac{\partial\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\nu}=\frac{n-p}{2}\left(\log\left(\frac{\nu}{2}\right)+1-\psi\left(\frac{\nu}{2}\right)\right)$
	$\displaystyle\hskip 56.9055pt+\frac{1}{2}\left(\sum_{i=p+1}^{n}\log u_{i}-\sum_{i=p+1}^{n}u_{i}\right),$
	$\displaystyle\frac{\partial\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\sigma^{2}}=-\frac{n-p}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\sum_{i=p+1}^{n}u_{i}\left(\tilde{y}_{i}-\textbf{w}_{i}^{\top}\boldsymbol{\phi}\right)^{2},$
	$\displaystyle\frac{\partial\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\boldsymbol{\phi}}=\frac{1}{\sigma^{2}}\sum_{i=p+1}^{n}\left[u_{i}(y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta})\textbf{w}_{i}-u_{i}\textbf{w}_{i}\textbf{w}_{i}^{\top}\boldsymbol{\phi}\right],$
	$\displaystyle\frac{\partial\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\boldsymbol{\beta}}=\frac{1}{\sigma^{2}}\sum_{i=p+1}^{n}\left[u_{i}(y_{i}-\textbf{y}_{(i,p)}^{\top}\boldsymbol{\phi})\boldsymbol{\alpha}_{i}-u_{i}\boldsymbol{\alpha}_{i}\boldsymbol{\alpha}_{i}^{\top}\boldsymbol{\beta}\right],$

with $\boldsymbol{\alpha}_{i}=\textbf{x}_{i}-\textbf{X}_{(i,p)}^{\top}\boldsymbol{\phi}\,$ and $\psi(x)=\frac{\mbox{d}}{\mbox{d}x}\log\Gamma(x)=\frac{\Gamma^{\prime}(x)}{\Gamma(x)}$ .

Besides, elements of the Hessian matrix are given by:

	$\displaystyle\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\nu^{2}}=\frac{n-p}{2}\left(\frac{1}{\nu}-\frac{1}{2}\psi_{1}\left(\frac{\nu}{2}\right)\right),$
	$\displaystyle\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\sigma^{2}\partial\nu}=0,$
	$\displaystyle\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\boldsymbol{\phi}\partial\nu}=\textbf{0},$
	$\displaystyle\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\boldsymbol{\beta}\partial\nu}=\textbf{0},$
	$\displaystyle\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial(\sigma^{2})^{2}}=\frac{n-p}{2\sigma^{4}}-\frac{1}{\sigma^{6}}\sum_{i=p+1}^{n}u_{i}\left(\tilde{y}_{i}-\textbf{w}_{i}^{\top}\boldsymbol{\phi}\right)^{2},$
	$\displaystyle\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\boldsymbol{\phi}\partial\sigma^{2}}=-\frac{1}{\sigma^{4}}\sum_{i=p+1}^{n}\Big{[}u_{i}\left(y_{i}-\textbf{x}_{i}^{\top}\boldsymbol{\beta}\right)\textbf{w}_{i}$
	$\displaystyle\hskip 113.81102pt-u_{i}\textbf{w}_{i}\textbf{w}_{i}^{\top}\boldsymbol{\phi}\Big{]},$
	$\displaystyle\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\boldsymbol{\beta}\partial\sigma^{2}}=-\frac{1}{\sigma^{4}}\sum_{i=p+1}^{n}\Big{[}u_{i}\left(y_{i}-\textbf{y}_{(i,p)}^{\top}\boldsymbol{\phi}\right)\boldsymbol{\alpha}_{i}$
	$\displaystyle\hskip 113.81102pt-u_{i}\boldsymbol{\alpha}_{i}\boldsymbol{\alpha}_{i}^{\top}\boldsymbol{\beta}\Big{]},$
	$\displaystyle\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\boldsymbol{\phi}\partial\boldsymbol{\phi}^{\top}}=-\frac{1}{\sigma^{2}}\sum_{i=p+1}^{n}u_{i}\textbf{w}_{i}\textbf{w}_{i}^{\top},$
	$\displaystyle\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\boldsymbol{\beta}\partial\boldsymbol{\phi}^{\top}}=\frac{1}{\sigma^{2}}\sum_{i=p+1}^{n}\Big{[}u_{i}\left(\textbf{w}_{i}^{\top}\boldsymbol{\phi}+\textbf{x}_{i}^{\top}\boldsymbol{\beta}-y_{i}\right)\textbf{X}_{(i,p)}^{\top}$
	$\displaystyle\hskip 105.2751pt-u_{i}\boldsymbol{\alpha}_{i}\textbf{w}_{i}^{\top}\Big{]},$
	$\displaystyle\frac{\partial^{2}\ell_{c}(\boldsymbol{\theta}\mathchar 24635\relax\;\textbf{y}_{c})}{\partial\boldsymbol{\beta}\partial\boldsymbol{\beta}^{\top}}=-\frac{1}{\sigma^{2}}\sum_{i=p+1}^{n}u_{i}\boldsymbol{\alpha}_{i}\boldsymbol{\alpha}_{i}^{\top},$

with $\psi_{1}(x)=\frac{\mbox{d}^{2}}{\mbox{d}x^{2}}\log\Gamma(x)$ .

Appendix B Additional results

This section displays some additional results obtained from the simulation study and the analysis of the phosphorus concentration dataset.

B.1 Simulation study 1

Figure 8 shows boxplots for the estimates of $\nu$ considering different sample sizes and limits of detection (LOD). Here it is possible to observe that the median of the estimates is close to the true value ( $\nu=4$ ) independent of the sample size and LOD. Furthermore, interquartile ranges decrease as sample sizes increase, suggesting the consistence of the estimates.

B.2 Application

For the phosphorus concentration dataset, available in the package ARCensReg, was fitted two censored autoregressive models, one which considered a model with Student- $t$ innovations (CAR $t$ (1)) and other with normal innovations (CAR(1)). Then, Figure 9 displays the sample autocorrelation for the quantile residuals computed from each model, where we noted that the obtained residuals present negligible serial autocorrelation.

Finally, Figure 10 shows the convergence graphics of the SAEM parameter estimates obtained for the CAR $t$ (1) model at each iteration, the model selected as the best for fitting this dataset based on prediction accuracy and residual analysis.

Censored autoregressive regression models with Student-tt innovations