
Limitation on the Student-t Linear Regression Model

Yoshiko Hayashi ([email protected])
Osaka Central Advanced Mathematical Institute
Osaka Metropolitan University
Abstract

For the outlier problem in linear regression models, the Student-t linear regression model is one of the common methods for robust modeling and is widely adopted in the literature. To examine the model, it is important to determine whether the model recognizes outliers. This study provides practically useful and quite simple conditions ensuring that the Student-t linear regression model is robust against an outlier in the y-direction.

Keywords: Outlier, Linear regression model, Regularly varying distributions, Bayesian modeling

1 Introduction

In regression analysis, outliers in a linear regression model can jeopardize the results obtained by the ordinary least squares (OLS) estimator. The Student-t linear regression model, a linear regression model whose error term follows a t-distribution, is one of the common methods to solve the outlier problem (Lange et al., 1989). While the Student-t linear regression model has been widely adopted, most studies apply it without careful theoretical consideration.

Bayesian robustness modeling using heavy-tailed distributions, which include the t-distribution, provides the theoretical solution for the outlier problem. For a simple Bayesian model, when both the prior distribution and the likelihood of an observation are normal, the posterior distribution is also normal, and the posterior mean is a weighted average of the prior mean and the observation. When the prior distribution and the observation are located far from each other and both are normal, the posterior distribution is far from both pieces of information; this is called conflict. For example, when a single observation x = 15 follows N(μ, 1) and the prior of the location parameter follows N(0, 1), the posterior distribution follows N(7.5, 0.5). In this case, the posterior distribution is supported by neither the prior distribution nor the observation.
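
For illustration, the conjugate normal-normal update in this example can be computed directly. A minimal Python sketch, using the numerical values of the example above, shows how the posterior lands midway between the two conflicting sources:

```python
# Normal prior N(0, 1) for mu; single observation x = 15 from N(mu, 1).
m0, v0 = 0.0, 1.0   # prior mean and variance
x, v = 15.0, 1.0    # observation and known likelihood variance

# Conjugate update: the posterior mean is a precision-weighted average.
post_var = 1.0 / (1.0 / v0 + 1.0 / v)
post_mean = post_var * (m0 / v0 + x / v)
print(post_mean, post_var)  # 7.5 0.5 -- supported by neither source
```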

For this problem, Dawid (1973) formally provides the theoretical resolution of the conflict between the prior distribution and the data, also known as conflict of information. He uses the pure location model, in which the scale parameter is given, and clarifies how an outlier is automatically ignored in the posterior distribution when the outlier follows a heavy-tailed distribution. This result occurs because we trust the information in the prior distribution more than the observation.

O’Hagan (1990) presents the concept of credence, which measures the information carried by a distribution’s tail. As Andrade and O’Hagan (2006) mention, credence represents how much we are prepared to believe in one source of information rather than another in the case of conflict; it is represented by the index of a regularly varying function. Andrade and O’Hagan (2011) show that in a univariate model, many observations that are located close enough create a larger credence, which equals the sum of the individual credences of the observations. When an outlier is far from the group of non-outliers with the same heavy-tailed distribution, the information of the group of non-outliers creates a larger credence. Thus, the posterior distribution is located closer to the non-outliers and is robust against the outlier. Andrade and O’Hagan (2011) establish sufficient conditions for robust modeling against a single outlier in n samples for a univariate model using regular variation theory. The sufficient condition requires a minimum number of non-outliers for robustness against an outlier. O’Hagan and Pericchi (2012) review previous studies on the resolution of such conflicts.

O’Hagan (1988) applies heavy-tailed modeling to a Student-t linear regression model without an intercept term under the pure location structure and demonstrates its robustness. For the model without the intercept term, the outlier unconditionally conflicts with non-outliers; therefore, a univariate model can be directly applied. By contrast, as Peña et al. (2009) mention, we need to be careful about outliers in the x-direction for the model with an intercept term. Peña et al. (2009) show that when the outliers in the x-direction reach infinity, the Student-t linear model fails to be robust. They examine the phenomenon using Kullback–Leibler divergence and propose a down-weighting method that assigns a lower weight to outliers. As Andrade and O’Hagan (2011) show, heavy-tail modeling using the t-distribution provides only partial robustness; the location-scale model cannot completely ignore outliers. Gagnon et al. (2020) theoretically develop a robust linear regression model using a super heavy-tailed distribution, whose tails are heavier than those of the t-distribution, which provides wholly robust modeling. He et al. (2021), Andrade (2022), and Gagnon and Hayashi (2023) provide theoretical analyses of the Student-t linear regression model. Although the Student-t linear regression model provides only partial robustness, the model is widely applied. Thus, it is quite important to clarify how the model works as a robust model.

Our study investigates the conditions for the Student-t linear model with an intercept term for an outlier in the y-direction by extending Andrade and O’Hagan’s (2011) conditions. For this purpose, first, we investigate the range in which there is a conflict between an outlier and non-outliers, which is the necessary condition to apply heavy-tail modeling. Then, we clarify the condition for the model’s robustness.

Figure 1: Conflict in a linear model: straight lines show the OLS regression line, and dotted lines show the regression line computed without the outlier.

Heavy-tail modeling as a resolution of a conflict between an outlier and non-outliers works when the outlier and the mean of the group of non-outliers are located sufficiently far apart, and the sufficient condition on the number of non-outliers is satisfied. A linear regression model provides the mean of y conditioned on x. Thus, the conflict of information in a linear regression model with an intercept term occurs when an outlier is located far from the regression line and non-outliers lie close to the regression line created from non-outliers.

The left panel in Figure 1 shows the case in which the outlier conflicts with the group of non-outliers. The figure shows that the outlier is located far from the OLS regression line. In this case, the Student-t linear regression model is robust against the outlier in the y-direction. This is because the information of the conditional distribution of the outlier is less credible than that of the grouped non-outlier data, under the assumption that all data share the same degrees of freedom of the t-distribution, which determine the credence. Non-outliers in the left panel of Figure 1 are close to each other and create a large credence, while the outlier does not belong to the regression line suggested by the grouped data and creates a small credence. Meanwhile, as shown in the right panel of Figure 1, when the outlier is in the x-direction, which is called a leverage point, all data, including the outlier, are sufficiently close to the regression line and create a larger credence than the regression line without the outlier, which is represented by the dotted line in Figure 1. In this case, the straight line in the right panel of Figure 1 has a larger credence than the dotted line does.

The rest of the paper is organized as follows. In Section 2, we provide the condition for the existence of conflict between an outlier and non-outliers in the Student-t linear regression model. Section 3 shows the sufficient conditions for robustness of the Student-t linear regression model and presents simulation results for a simple linear regression model. Section 4 concludes.

2 Conflicting Information in the Student-t Linear Regression Model

To examine the limitation of the robustness of the Student-t linear model with an intercept term, we consider the following linear regression model. The dependent variable y is an n × 1 vector, the independent variable X is an n × (k+1) full-rank matrix, β is a (k+1) × 1 vector, and u is an n × 1 vector of errors assumed to be independent and identically distributed:

y = Xβ + u,  (1)

where

X = [ 1  X_{11} ⋯ X_{k1}
      ⋮   ⋮    ⋱   ⋮
      1  X_{1n} ⋯ X_{kn} ].

Consider the residual of the result from OLS for the model in equation (1):

y = X β̂_{ols} + e,  (2)

where

e′ = [ e^{1}_{/out}, …, e^{n−1}_{/out}, e_{out} ],

where the subscripts /out and out denote a non-outlier and the outlier, respectively.

According to Cook and Weisberg (1982) and Chatterjee and Hadi (1988), the prediction or hat matrix, with ŷ = Hy, is given by

H = X(X^T X)^{−1} X^T.  (3)

Then the residual is defined as

e = (I_n − H)y.  (4)

For the model with an intercept term, the (i, j)-th element of H, h_{ij}, is given by

h_{ij} = 1/n + (x_i − x̄)^T (X̃^T X̃)^{−1} (x_j − x̄)  (i, j = 1, 2, …, n),  (5)

where

X̃ = [ X_{11} − X̄_1 ⋯ X_{k1} − X̄_k
      ⋮              ⋱  ⋮
      X_{1n} − X̄_1 ⋯ X_{kn} − X̄_k ],   x_i = (X_{1i}, …, X_{ki})^T,   x̄ = (X̄_1, …, X̄_k)^T,

and X̄_l = Σ_{i=1}^{n} X_{li}/n  (l = 1, …, k).

In the model with an intercept term, the diagonal elements, which are called the leverage, range from a smallest value of 1/n to a largest value of 1:

1/n ≤ h_{ii} ≤ 1  (i = 1, 2, …, n).  (6)

Mohammadi (2016) derived the range of the off-diagonal elements for the model with an intercept term as follows:

1/n − 1/2 ≤ h_{ij} ≤ 1/2  (i ≠ j).  (7)
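
These bounds are easy to verify numerically. The following sketch, assuming a randomly generated design matrix with an intercept column, checks (6) and (7) for the hat matrix of equation (3):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k regressors

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix, equation (3)

# Diagonal bounds (6): 1/n <= h_ii <= 1 for the model with an intercept.
lev = np.diag(H)
assert lev.min() >= 1.0 / n - 1e-12 and lev.max() <= 1.0 + 1e-12

# Off-diagonal bounds (7): 1/n - 1/2 <= h_ij <= 1/2 (Mohammadi, 2016).
off = H[~np.eye(n, dtype=bool)]
assert off.min() >= 1.0 / n - 0.5 - 1e-12 and off.max() <= 0.5 + 1e-12
```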

Assume that the non-outliers are located close enough to the regression line. If the outlier moves away from the regression line faster than the group of non-outliers does, then the residual e_{out} reaches infinity as the outlier reaches infinity in the y-direction. Since non-outliers create a combined credence, if one of the non-outliers conflicts with the outlier, then the group of non-outliers conflicts with the outlier. As shown in Figure 1, when an outlier is located close enough to the group of non-outliers in the x-direction, they conflict. Therefore, if the partial derivative of e_{out} with respect to y_{out} is larger than the partial derivative of the closest non-outlier’s residual, the outlier conflicts with the group of non-outliers.

We derive the relationship between the hat matrix and the range of outliers in a linear regression model. Let the n-th observation be the outlier, so that h^{out}_{nn} denotes the corresponding element of H. The condition reads

∂e_{out}/∂y_{out} > ∂e^{max}_{/out}/∂y_{out},  (8)

where the subscript /out denotes a non-outlier.

By definition, the partial derivatives of the residual of the outlier, e_{out}, and of a non-outlier, e^{j}_{/out}, with respect to the outlier y_{out} are given by

∂e_{out}/∂y_{out} = 1 − h^{out}_{nn},  (9)

and

∂e^{j}_{/out}/∂y_{out} = −h_{nj}.  (10)
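
Because e = (I_n − H)y is linear in y, these derivatives can be confirmed exactly by perturbing a single response. A small sketch with simulated data (the design and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([3.0, 2.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ y                  # residuals, equation (4)

# Perturb the last response (the candidate outlier) by delta.
delta = 1.0
y2 = y.copy()
y2[-1] += delta
e2 = (np.eye(n) - H) @ y2

# Equations (9) and (10): de_n/dy_n = 1 - h_nn and de_j/dy_n = -h_nj.
assert np.isclose(e2[-1] - e[-1], (1.0 - H[-1, -1]) * delta)
assert np.allclose(e2[:-1] - e[:-1], -H[-1, :-1] * delta)
```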

To investigate the range in which an outlier conflicts with the group of non-outliers, this study examines the location of an outlier as it goes to infinity in the y-direction. We first establish when an observation becomes an outlier and then derive the condition under which the non-outliers form a group.

Lemma 1 If the following condition holds, the observation becomes an outlier in the linear regression model as y_{out} → ±∞:

h_{nj} > 0  (j = 1, …, n−1).  (11)

Proof of Lemma 1

The condition for the observation to become an outlier is that it moves away from the regression line, i.e., the residual e_{out} reaches infinity as the outlier reaches infinity in the y-direction, faster than every non-outlier residual:

∂e_{out}/∂y_{out} > ∂e^{j}_{/out}/∂y_{out}  (j = 1, …, n−1).  (12)

Substituting equations (9) and (10) into the condition (12), we have

1 − h^{out}_{nn} > −h_{nj}.  (13)

Using (6), Lemma 1 is obtained.

Lemma 2 If the following condition holds, the non-outliers create a group against the outlier as y_{out} goes to infinity in the linear regression model:

h^{out}_{nn} < 1/2.  (14)

Proof of Lemma 2

The outlier moves away from the regression line faster than the group of non-outliers. Therefore, if the partial derivative of e_{out} with respect to y_{out} is larger than the absolute value of each non-outlier’s partial derivative, the non-outliers create a group against the outlier. To create a group of non-outliers, the following conditions are required:

(i)  ∂e_{out}/∂y_{out} > ∂e_{/out}/∂y_{out}   for ∂e_{/out}/∂y_{out} ≥ 0,
(ii) ∂e_{out}/∂y_{out} > −∂e_{/out}/∂y_{out}  for ∂e_{/out}/∂y_{out} < 0.  (15)

When Lemma 1 holds, condition (i) in (15) is satisfied. To satisfy condition (ii) in (15), we need

1 − h^{out}_{nn} > h_{nj}  (16)

for ∂e_{/out}/∂y_{out} < 0. By the range of h_{nj} in (7), Lemma 2 is obtained.
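
The two lemmas reduce to sign and magnitude checks on one row of the hat matrix. A sketch of such a check, using an illustrative simple-regression design in which the candidate outlier’s x is either central or an extreme leverage point:

```python
import numpy as np

def conflict_conditions(X, out=-1):
    """Check Lemma 1 (h_nj > 0 for all non-outliers j) and
    Lemma 2 (h_nn < 1/2) for the observation indexed by `out`."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    row = np.delete(H[out], out)
    return bool(np.all(row > 0)), bool(H[out, out] < 0.5)

# A central x_out satisfies both lemmas; a far leverage point violates them.
x_base = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for x_out in (0.5, 50.0):
    X = np.column_stack([np.ones(6), np.append(x_base, x_out)])
    print(x_out, conflict_conditions(X))   # (True, True), then (False, False)
```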


Corollary 1 Under the conditions of Lemma 1 and Lemma 2, the idempotency and symmetry of the hat matrix imply the following condition:

Σ_{k≠j}^{n−1} (h_{nk})² ≥ 1/8  (j = 1, …, n−1).  (17)

Proof of Corollary 1

As Chatterjee and Hadi (1988) note, since the hat matrix is idempotent and symmetric, h^{out}_{nn} can be written as

h^{out}_{nn} = (h^{out}_{nn})² + Σ_{j=1}^{n−1} (h_{nj})² = (h^{out}_{nn})² + (h_{nj})² + Σ_{k≠j}^{n−1} (h_{nk})².  (18)

By arranging (18), we have

1/4 = (h^{out}_{nn} − 1/2)² + (h_{nj})² + Σ_{k≠j}^{n−1} (h_{nk})².  (19)

The dashed line and the dot-dashed line in Figure 2 show Lemmas 1 and 2 based on conditions (13) and (16). The dotted circle in Figure 2 shows the case with Σ_{k≠j}^{n−1} (h_{nk})² = 0, which fails to satisfy Lemmas 1 and 2 for h^{out}_{nn} > 0.5. From the figure, Σ_{k≠j}^{n−1} (h_{nk})² must be greater than or equal to 1/8 to satisfy the conditions of Lemmas 1 and 2.
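
Identity (19) itself can be confirmed numerically for any full-rank design, since it uses only the idempotency and symmetry of H. A minimal check:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 9
X = np.column_stack([np.ones(n), rng.normal(size=n)])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Identity (19), for the row of the candidate outlier (the n-th row):
lhs = (H[-1, -1] - 0.5) ** 2 + np.sum(H[-1, :-1] ** 2)
assert np.isclose(lhs, 0.25)
```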

Figure 2: The dashed line and the dot-dashed line show the conditions of Lemmas 1 and 2, respectively. The solid circle shows the case with Σ_{k≠j}^{n−1} (h_{nk})² = 1/8, and the dotted circle shows Σ_{k≠j}^{n−1} (h_{nk})² = 0.

3 Sufficient Conditions for Rejecting an Outlier in the Student-t Linear Regression Model

This section investigates the sufficient conditions for the Student-t linear regression model to be robust, based on Andrade and O’Hagan’s (2011) Corollary 4, which gives the conditions for robustness against a single outlier among n samples in a univariate model. To examine the conditions for the Student-t linear regression model, we adopt the independent Jeffreys priors derived by Fonseca et al. (2008, 2014) under given degrees of freedom.

As shown in Andrade and O’Hagan (2006), the credence is defined as c for f(x) ∈ R_{−c} (c > 0), where R_{−c} denotes that f(x) is regularly varying at ∞ with index −c. For a t-distribution with γ degrees of freedom, the credence is γ + 1 (see Appendix A).

Assume all data, including a single outlier among the n observations, follow a t-distribution with γ degrees of freedom, t_γ(μ, σ), with location μ and scale σ. As the t-distribution is a location-scale family, the likelihood can be written as f(y_i | X, β, σ) = (1/σ) h[(y_i − X^T_i β)/σ], where X^T_i is the i-th row of X.

For simplicity, assume all data have the same likelihood function and the non-outliers are close enough to the conditional mean X^T_i β. Suppose the n-th observation is an outlier.

Model

y_i | X, β, σ ~ t_γ(y_i | X_i, β, σ) = (1/σ) h[(y_i − X^T_i β)/σ]  independent (i = 1, …, n),
π(β_q) ∝ 1  (q = 1, …, p),
π(σ) ∝ 1/σ,
h ∈ R_{−(γ+1)},  γ > 0.
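
For concreteness, the unnormalized log posterior of this model can be coded directly. A minimal sketch, assuming given degrees of freedom γ and taking h to be the standardized Student-t density from SciPy (the function and argument names are illustrative):

```python
import numpy as np
from scipy.stats import t as student_t

def log_posterior(beta, sigma, y, X, gamma):
    """Unnormalised log posterior of the model above: flat priors on the
    regression coefficients, pi(sigma) proportional to 1/sigma, and
    t_gamma errors playing the role of h."""
    if sigma <= 0:
        return -np.inf
    r = (y - X @ beta) / sigma
    loglik = np.sum(student_t.logpdf(r, df=gamma) - np.log(sigma))
    return loglik - np.log(sigma)   # the 1/sigma prior contributes -log(sigma)
```

Any sampler or numerical integrator can then be applied to this function, as in the example of Section 3.1.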

Theorem 1 (Robustness against an outlier among n observations)

Consider n observations in the model above. Suppose Lemma 1 holds, so that the residual e_n = y_n − x^T_n β reaches infinity as y_n goes to infinity, and that the following condition holds: (γ + 1) < n − p.

Then, the posterior distribution partially ignores the outlier:

π(β, σ | X, y) ∝ σ^γ π(β, σ | X^{(n−1)}, y^{(n−1)})  as y_n → ∞,  (20)

where the superscript notation (n1){(n-1)} is used to indicate the omission of the nn-th observation.

Proof of Theorem 1

When the scale parameter σ is given, the posterior distribution of β based on the n − 1 non-outliers is as follows:

π(β | σ, X^{(n−1)}, y^{(n−1)}) ∝ π(β) ∏_{i=1}^{n−1} h[(y_i − X^T_i β)/σ] ∝ ∏_{i=1}^{n−1} h[(y_i − X^T_i β)/σ] ∈ R_{−(n−1)(γ+1)}.  (21)

Applying the transformation τ = β/σ, which is a p × 1 vector, we obtain

π(y^{(n−1)} | σ, X^{(n−1)}) = (1/σ)^{n−p−1} ∫_{ℝ^p} ∏_{i=1}^{n−1} h[(y_i/σ) − X^T_i τ] dτ.  (22)

When all elements of X are given and bounded, the integral ∫_{ℝ^p} ∏_{i=1}^{n−1} h[(y_i/σ) − X^T_i τ] dτ is O(1) in σ. Thus, as a function of σ, it is slowly varying:

∫_{ℝ^p} ∏_{i=1}^{n−1} h[(y_i/σ) − X^T_i τ] dτ ∈ R_0.  (23)

Thus, the marginal posterior distribution of σ\sigma given information X(n1)\textbf{X}^{(n-1)} and y(n1)\textbf{y}^{(n-1)} becomes

π(σ | X^{(n−1)}, y^{(n−1)}) ∝ π(σ) π(y^{(n−1)} | σ, X^{(n−1)}) ∈ R_{−(n−p)}.  (24)

Again, applying the transformation τ = β/σ produces the conditional density of y_n given X^{(n−1)} and y^{(n−1)} as

f(y_n | σ, X^{(n−1)}, y^{(n−1)}) = (1/σ)^{n−p} ∫_{ℝ^p} h[(y_n/σ) − X^T_n τ] ∏_{i=1}^{n−1} h[(y_i/σ) − X^T_i τ] dτ.  (25)

When the non-outliers are located close enough to the regression line, Andrade and O’Hagan’s (2011) Proposition 1, by which the convolution of regularly varying densities behaves as their sum, f∗g(x) ∼ f(x) + g(x), can be applied with f(y) = h[(y_n/σ) − X^T_n τ] and g(y) = ∏_{i=1}^{n−1} h[(y_i/σ) − X^T_i τ]. Since min((γ+1), (n−1)(γ+1)) = (γ+1) for n > 2, when the residual e_n = y_n − X^T_n β reaches infinity as y_n goes to infinity, we obtain

∫_{ℝ^p} h[(y_n/σ) − X^T_n τ] ∏_{i=1}^{n−1} h[(y_i/σ) − X^T_i τ] dτ ∈ R_{−(γ+1)}.  (26)

Lemmas 1 and 2 give the conditions under which the residual e_n = y_n − X^T_n β reaches infinity as y_n goes to infinity. Accordingly, the marginal posterior distribution of σ is

π(σ | y) = f(y_n | σ, X^{(n−1)}, y^{(n−1)}) π(σ | X^{(n−1)}, y^{(n−1)}) / ∫_0^∞ f(y_n | σ, X^{(n−1)}, y^{(n−1)}) π(σ | X^{(n−1)}, y^{(n−1)}) dσ.  (27)

Next, consider the case in which y_n goes to infinity. As a function of y_n, the density f(y_n | σ, X^{(n−1)}, y^{(n−1)}) takes the form (1/σ) g(y_n/σ) ∈ R_{−(γ+1)}. Thus, by the limiting relationship g(y_n/σ)/g(y_n) → σ^{γ+1},

lim_{y_n→∞} π(σ | y) = σ^γ π(σ | X^{(n−1)}, y^{(n−1)}) / [ lim_{y_n→∞} ∫_0^∞ (1/σ) (g(y_n/σ)/g(y_n)) π(σ | X^{(n−1)}, y^{(n−1)}) dσ ].  (28)

From (24), we obtain

π(σ | X^{(n−1)}, y^{(n−1)}) ∝ σ^{−(n−p)} l(σ).  (29)

Thus, for the denominator of (28) to exist, (γ + 1) < n − p must hold. ∎

3.1 Example

We consider the following case of a simple linear regression model with a single outlier:

y_i = β_0 + β_1 x_i + u_i  (i = 1, …, n).  (30)

From Lemma 1, we obtain the following condition:

h_{nj} = 1/n + (x_{out} − x̄)(x_j − x̄) / Σ_{i=1}^{n} (x_i − x̄)²
       = 1/n + [ ((n−1)/n)(x_{out} − x̄*)(x_j − x̄*) − ((n−1)/n²)(x_{out} − x̄*)² ] / [ Σ_{i=1}^{n−1} (x_i − x̄*)² + ((n−1)/n)(x_{out} − x̄*)² ]
       = 1/n + [ (x_{out} − x̄*)(x_j − x̄*) − (1/n)(x_{out} − x̄*)² ] / [ (n/(n−1)) Σ_{i=1}^{n−1} (x_i − x̄*)² + (x_{out} − x̄*)² ]
       > 0,  (31)

where x̄* = (1/(n−1)) Σ_{i=1}^{n−1} x_i.

By rearranging condition (31), we obtain the range as

−(x_{out} − x̄*)(x_j − x̄*) < (1/(n−1)) Σ_{i=1}^{n−1} (x_i − x̄*)².  (32)

Thus, when (x_{out} − x̄*) > 0, we obtain the range for satisfying Lemma 1 as follows:

−(x_j − x̄*) < [ (1/(n−1)) Σ_{i=1}^{n−1} (x_i − x̄*)² ] / (x_{out} − x̄*).  (33)
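
Condition (32) can be cross-checked against the sign of h_{nj} computed directly from the hat matrix; the equivalence holds for any simple-regression design. A sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 7
x = rng.normal(size=n)                   # the last entry plays the role of x_out
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T

xbar_star = x[:-1].mean()                # mean over the n-1 non-outliers
S = np.sum((x[:-1] - xbar_star) ** 2)

for j in range(n - 1):
    direct = H[-1, j] > 0                                             # Lemma 1 via H
    via_32 = -(x[-1] - xbar_star) * (x[j] - xbar_star) < S / (n - 1)  # condition (32)
    assert direct == via_32
```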

From Lemma 2, we obtain the following condition:

h^{out}_{nn} = 1/n + (x_{out} − x̄)² / Σ_{i=1}^{n} (x_i − x̄)²
            = 1/n + ((n−1)/n)² (x_{out} − x̄*)² / [ Σ_{i=1}^{n−1} (x_i − x̄*)² + ((n−1)/n)(x_{out} − x̄*)² ]
            = 1/n + ((n−1)/n)(x_{out} − x̄*)² / [ (n/(n−1)) Σ_{i=1}^{n−1} (x_i − x̄*)² + (x_{out} − x̄*)² ]
            < 1/2.  (34)

By rearranging the above condition, we obtain the range as

(x_{out} − x̄*)² < ((n−2)/(n−1)) Σ_{i=1}^{n−1} (x_i − x̄*)².  (35)

Thus, taking square roots in (35), we obtain the range for satisfying Lemma 2 as follows:

x̄* − [ (n−2) Σ_{i=1}^{n−1} (x_i − x̄*)² / (n−1) ]^{1/2} < x_{out} < x̄* + [ (n−2) Σ_{i=1}^{n−1} (x_i − x̄*)² / (n−1) ]^{1/2}.  (36)

These results highlight that the robust range widens as the number of non-outliers grows. In addition, when the independent variable of the outlier is located far from the other data, there is no conflict of information, irrespective of the value of y.
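
Interval (36) is straightforward to compute. A sketch, using the five design points of the simulation below as an illustration:

```python
import numpy as np

def lemma2_range(x_nonout):
    """Interval (36): values of x_out for which a y-direction outlier still
    conflicts with the non-outliers in a simple linear regression."""
    n = len(x_nonout) + 1                # sample size including the outlier
    xbar = x_nonout.mean()
    half = np.sqrt((n - 2) * np.sum((x_nonout - xbar) ** 2) / (n - 1))
    return xbar - half, xbar + half

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
lo, hi = lemma2_range(x)
print(lo, hi)   # roughly (-2.83, 2.83) for these five design points
```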

This subsection investigates the robustness performance in relation to the value of the outlier in the Student-t regression model for simple linear regression. For robustness, the degrees of freedom of the t-distributed errors need to be sufficiently small. Thus, we use three degrees of freedom, u_i ~ t_3(0, σ), for the error term. We employ the independent Jeffreys priors: the priors of β_0 and β_1 are uniform, and the prior of σ is proportional to 1/σ. By Theorem 1, the sufficient condition for a t-distribution with γ degrees of freedom is γ < n − p − 1; thus, in this model the condition becomes 3 < n − 3. The simulated observations are defined as y_i = 3 + 2x_i + u_i. We set x_{/out} = [−2, −1, 0, 1, 2] for the first simulation and [−2, −1, 0, 1, 2, −2, −1, 0, 1, 2] for the second one. The error terms are generated from the normal distribution with mean 0 and variance 1. We move the outlier in the x-direction from −50 to 50 and set y_{out} = −10. The left panels of Figure 3 depict the simulated data used for these simulations. The right panels of Figure 3 show the results of the numerical evaluation of the posterior mean of the parameter β_1; the upper right panel illustrates the result for n = 6, which does not satisfy the sufficient condition, and the lower right panel presents it for n = 11, which satisfies the condition. The results indicate that the Student-t linear regression model is robust within the controllable range defined in Lemma 2, which is shown by the vertical dotted lines.

Figure 3: Left panels: scatterplots of simulated data. Right panels: posterior mean of the slope β_1. The upper panel shows the result for n = 6, which does not satisfy the sufficient condition, and the lower panel shows the result for n = 11, which satisfies the condition. The straight line depicts the result of the Student-t linear regression model, and the dashed line depicts that of the linear regression model with normally distributed error terms. The vertical dotted lines show the range defined in Lemma 2.
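
The posterior means in Figure 3 were obtained by numerical evaluation. As an illustrative stand-in, not the authors' implementation, the following sketch fits the Student-t model by a plain random-walk Metropolis sampler for a single outlier position (x_out = 0, inside the Lemma 2 range, with the n = 11 design and y_out = −10); inside that range the posterior mean of β_1 should stay close to 2:

```python
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(3)

# Data as in the text: y_i = 3 + 2 x_i + u_i with N(0, 1) errors,
# plus one outlier with y_out = -10 at an illustrative x_out = 0.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, -2.0, -1.0, 0.0, 1.0, 2.0, 0.0])
y = 3.0 + 2.0 * x + rng.normal(size=x.size)
y[-1] = -10.0
X = np.column_stack([np.ones_like(x), x])
gamma = 3.0                        # degrees of freedom of the t errors

def log_post(theta):
    """Unnormalised log posterior in (beta0, beta1, log sigma); the flat
    prior on log sigma corresponds to pi(sigma) proportional to 1/sigma."""
    b0, b1, log_s = theta
    r = (y - X @ np.array([b0, b1])) / np.exp(log_s)
    return np.sum(student_t.logpdf(r, df=gamma) - log_s)

# Plain random-walk Metropolis; crude but sufficient for a sketch.
theta = np.zeros(3)
lp = log_post(theta)
draws = []
for i in range(20000):
    prop = theta + rng.normal(scale=0.2, size=3)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    if i >= 5000:
        draws.append(theta[1])

print("posterior mean of beta1:", np.mean(draws))  # close to 2 when robust
```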

4 Concluding remarks

This study extended Andrade and O’Hagan’s (2011) conditions to resolve the outlier problem of the Student-t linear regression model. The model treats outliers as a natural outcome of the data and does not remove them arbitrarily. The conditions work when there is conflicting information between the outlier and the non-outliers. However, in a linear regression model, an outlier does not conflict with non-outliers when it is located far from them in the x-direction. Thus, we first clarified the range in which conflicting information is present in a linear regression model. Then, we derived the sufficient condition for robustness of the Student-t linear regression model within that range. Future research should investigate the conditions for multiple outliers. Furthermore, it would be interesting to extend this study to a model with unknown degrees of freedom for the t-distribution.

References

  • [1] Andrade, J.A.A. (2022), On the robustness to outliers of the Student-t process, Scandinavian Journal of Statistics.
  • [2] Andrade, J.A.A. and O’Hagan, A. (2006), Bayesian robustness modelling using regularly varying distributions. Bayesian Analysis 1, 169–188.
  • [3] Andrade, J.A.A. and O’Hagan, A. (2011), Bayesian robustness modelling of location and scale parameters. Scandinavian Journal of Statistics 38(4), 691–711.
  • [4] Bingham, N.H., Goldie, C.M., and Teugels, J.L. (1987), Regular Variation. Cambridge University Press, Cambridge.
  • [5] Chatterjee, S. and Hadi, A.S. (1988), Sensitivity Analysis in Linear Regression. Wiley, New York.
  • [6] Cook, R. D. and Weisberg, S. (1982), Residuals and Influence in Regression. New York: Chapman and Hall.
  • [7] Dawid, A.P. (1973), Posterior expectations for large observations. Biometrika 60(3), 664–667.
  • [8] Fonseca, T.C.O., Ferreira, M.A.R., and Migon, H.S. (2008), Objective Bayesian analysis for the Student-tt regression model. Biometrika 95(2), 325–333.
  • [9] Gagnon, P., Desgagné, A., and Bédard, M. (2020). A New Bayesian Approach to Robustness Against Outliers in Linear Regression. Bayesian Analysis 15, 389–414.
  • [10] Gagnon, P. and Hayashi, Y. (2023), Theoretical properties of Bayesian Student-t linear regression. Statistics and Probability Letters 193, 109693.
  • [11] He, D., Sun, D., and He, L. (2021), Objective Bayesian analysis for the Student-t linear regression. Bayesian Analysis 16, 129–145.
  • [12] Lange, K.L., Little, R.J.A., and Taylor, J.M.G. (1989), Robust statistical modeling using the t distribution. Journal of the American Statistical Association 84(408), 881–892.
  • [13] O’Hagan, A. (1988), Modelling with heavy tails. Bayesian Statistics 3, 345–359.
  • [14] O’Hagan, A. (1990), On outliers and credence for location parameter inference. Journal of the American Statistical Association 85, 172–176.
  • [15] O’Hagan, A. and Pericchi, L. (2012), Bayesian heavy-tailed models and conflict resolution: A review. Brazilian Journal of Probability and Statistics 26(4), 372–401.
  • [16] Peña, D., Zamar, R., and Yan, G. (2009), Bayesian likelihood robustness in linear models. Journal of Statistical Planning and Inference 139(7), 2196–2207.
  • [17] Resnick, S.I. (2007), Heavy-tail Phenomena: Probabilistic and Statistical Modeling. Springer, New York.

Appendix A: Regularly varying functions

The tail behavior can be represented by the index of a regularly varying function. The index ρ is defined as follows.

A positive measurable function f(x) is regularly varying at ∞ with index ρ ∈ ℝ if, for an arbitrary positive t,

lim_{x→∞} f(tx)/f(x) = t^ρ.  (A.1)

We write f(x) ∈ R_ρ in this study, and l(x) ∈ R_0 is called “slowly varying.” A regularly varying function can be written as f(x) = x^ρ l(x).

Using the property

lim_{x→∞} log(f(x))/log x = ρ,  (A.2)

we obtain the index for the t-distribution with d degrees of freedom as

lim_{x→∞} log(p(x; d, μ, σ²))/log x = lim_{x→∞} [ A − ((d+1)/2) log(1 + (1/d)((x−μ)/σ)²) ] / log x = −(d+1),

where A = log( Γ((d+1)/2) / (Γ(d/2) π^{1/2} d^{1/2} σ) ).
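
The limits (A.1) and (A.2) for the t density can be checked numerically; a sketch with d = 3 degrees of freedom:

```python
import numpy as np
from scipy.stats import t as student_t

d = 3.0            # degrees of freedom
rho = -(d + 1.0)   # expected index: the t density lies in R_{-(d+1)}

# (A.1): f(tx)/f(x) -> t^rho as x -> infinity (here with t = 2).
for x in (1e2, 1e4, 1e6):
    print(student_t.pdf(2.0 * x, df=d) / student_t.pdf(x, df=d), 2.0 ** rho)

# (A.2): log f(x) / log x -> rho.
print(np.log(student_t.pdf(1e8, df=d)) / np.log(1e8))  # close to -4
```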

Some properties of regularly varying functions used in this study can be found in Bingham et al. (1987) and Resnick (2007).