
Symmetric generalized Heckman models

Helton Saulo, Roberto Vila and Shayane S. Cordeiro
Department of Statistics, Universidade de Brasília, Brasília, Brazil
Abstract

The sample selection bias problem arises when a variable of interest is correlated with a latent variable, and involves situations in which the response variable has part of its observations censored. Heckman (1976) proposed a sample selection model based on the bivariate normal distribution that jointly fits the variable of interest and the latent variable. Recently, this normality assumption has been relaxed by more flexible models such as the Student-t distribution (Marchenko and Genton, 2012; Lachos et al., 2021). The aim of this work is to propose generalized Heckman sample selection models based on symmetric distributions (Fang et al., 1990). This is a new class of sample selection models, in which covariates are added to the dispersion and correlation parameters. A Monte Carlo simulation study is performed to assess the behavior of the parameter estimation method. Two real data sets are analyzed to illustrate the proposed approach.

Keywords. Generalized Heckman models · Symmetric distributions · Variable dispersion · Variable correlation.

1 Introduction

It is common in areas such as economics, statistics and sociology that the sampling process induces a relationship between a variable of interest and a latent variable, the former being observable only in a subset of the population under study. This problem is called sample selection bias and was studied by Heckman (1976). The author proposed a sample selection model that jointly models the variable of interest and the latent variable. The classical Heckman sample selection model (classical Heckman-normal model) received several criticisms, due to the need to assume bivariate normality and the difficulty of estimating the parameters using the maximum likelihood (ML) method, which led to the introduction of an alternative estimation method known as the two-step method; see Heckman (1979). Some studies on Heckman models have been done by Nelson (1984), Paarsch (1984), Manning et al. (1987), Stolzenberg and Relles (1990) and Leung and Yu (1996). These works suggested that the Heckman sample selection model can reduce or eliminate selection bias when its assumptions hold, but deviations from the normality assumption may distort the results.

The normality assumption of the classical Heckman-normal model (Heckman, 1976) has been relaxed by more flexible models such as the Student-t distribution (Marchenko and Genton, 2012; Ding, 2014; Lachos et al., 2021), the skew-normal distribution (Ogundimu and Hutton, 2016) and the Birnbaum-Saunders distribution (Bastos and Barreto-Souza, 2021). Moreover, the classical Heckman-normal model assumes that the dispersion and correlation (sample selection bias parameter) are constant, which may not be adequate. In this context, the present work proposes generalized Heckman sample selection models based on symmetric distributions (Fang et al., 1990). In the proposed model, covariates are added to the dispersion and correlation parameters, so that covariates explain possible heteroscedasticity and sample selection bias, respectively. Our proposed methodology can be seen as a generalization of the generalized Heckman-normal model with varying sample selection bias and dispersion parameters proposed by Bastos et al. (2021), and of the Heckman-Student-t model proposed by Marchenko and Genton (2012). We demonstrate that the proposed symmetric generalized Heckman model outperforms the generalized Heckman-normal and Heckman-Student-t models in terms of model fit, making it a practical and useful model for data with sample selection bias.

The rest of this work proceeds as follows. In Section 2, we briefly describe the bivariate symmetric distributions and then introduce the symmetric generalized Heckman models; we also describe the ML estimation of the model parameters. In Section 3, we derive the generalized Heckman-Student-t model, a special case of the symmetric generalized Heckman models. In Section 4, we carry out a Monte Carlo simulation study to evaluate the performance of the estimators. In Section 5, we apply the generalized Heckman-Student-t model to two real data sets to demonstrate the usefulness of the proposed approach, and finally, in Section 6, we provide some concluding remarks.

2 Symmetric generalized Heckman models

Let \bm{Y}=(Y_{1},Y_{2})^{\top} be a random vector following a bivariate symmetric (BSY) distribution (Fang et al., 1990) with location (mean) vector \bm{\mu}=(\mu_{1},\mu_{2})^{\top}, covariance matrix

\bm{\Sigma}=\begin{pmatrix}\sigma_{1}^{2}&\rho\sigma_{1}\sigma_{2}\\ \rho\sigma_{1}\sigma_{2}&\sigma_{2}^{2}\end{pmatrix}, \qquad (2.1)

and density generator g_{c}, with \mu_{i}\in\mathbb{R} and \sigma_{i}>0 for i=1,2. We use the notation \bm{Y}\sim{\rm BSY}(\bm{\mu},\bm{\Sigma},g_{c}). The probability density function (PDF) of \bm{Y} is then given by

f_{\bm{Y}}(\bm{y};\bm{\mu},\bm{\Sigma},g_{c})=\frac{1}{|\bm{\Sigma}|^{1/2}{Z_{g_{c}}}}\,g_{c}\left((\bm{y}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{y}-\bm{\mu})\right),\quad\bm{y}\in\mathbb{R}^{2}, \qquad (2.2)

where |\bm{\Sigma}|=\sigma_{1}^{2}\sigma_{2}^{2}(1-\rho^{2}) and Z_{g_{c}} is the normalization constant making f_{\bm{Y}} a PDF, that is,

Z_{g_{c}}=\int_{\mathbb{R}^{2}}\frac{1}{|\bm{\Sigma}|^{1/2}}\,g_{c}\left((\bm{y}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{y}-\bm{\mu})\right)\,{\rm d}\bm{y}=\pi\int_{0}^{\infty}g_{c}(u)\,{\rm d}u.

The density generator g_{c} in (2.2) leads to different bivariate symmetric distributions, which may involve an extra parameter (or an extra parameter vector).
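As an illustration, the Gaussian and Student-t generators used later in the paper can be coded directly, and their common normalization constant checked numerically (a minimal sketch; the function names are ours):

```python
import numpy as np
from scipy import integrate

# Two common density generators g_c: Gaussian and Student-t with nu degrees
# of freedom (nu is the "extra parameter" mentioned in the text).
def g_normal(u):
    return np.exp(-u / 2.0)

def g_student(u, nu=4.0):
    return (1.0 + u / nu) ** (-(nu + 2.0) / 2.0)

# Normalization constant Z_gc = pi * int_0^inf g_c(u) du.
def z_gc(g):
    val, _ = integrate.quad(g, 0.0, np.inf)
    return np.pi * val

# Both generators give Z_gc = 2*pi, matching the familiar constant
# 1/(2*pi*|Sigma|^(1/2)) of the bivariate normal and Student-t densities.
print(z_gc(g_normal), z_gc(g_student))
```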


We propose a generalization of the classical Heckman-normal model (Heckman, 1976) by considering independent error terms following a BSY distribution, with regression structures for the sample selection bias (-1<\rho<1) and dispersion (\sigma>0) parameters:

{Y^{*}_{i}\choose U^{*}_{i}}\sim\text{BSY}\Biggl{(}\bm{\mu}={\mu_{1i}\choose\mu_{2i}},\ \bm{\Sigma}=\begin{pmatrix}\sigma_{i}^{2}&\rho_{i}\sigma_{i}\\ \rho_{i}\sigma_{i}&1\end{pmatrix},\ g_{c}\Biggr{)},\quad i=1,\ldots,n. \qquad (2.3)

In the above equation, \mu_{1i} and \mu_{2i} are the mean parameters, and \sigma_{i} and \rho_{i} are the dispersion and correlation parameters, respectively, with regression structures g_{1}(\mu_{1i})=\bm{x}_{i}^{\top}\bm{\beta}, g_{2}(\mu_{2i})=\bm{w}_{i}^{\top}\bm{\gamma}, h_{1}(\sigma_{i})=\bm{z}_{i}^{\top}\bm{\lambda} and h_{2}(\rho_{i})=\bm{v}_{i}^{\top}\bm{\kappa}, where \bm{\beta}=(\beta_{1},\ldots,\beta_{k})^{\top}\in\mathbb{R}^{k}, \bm{\gamma}=(\gamma_{1},\ldots,\gamma_{l})^{\top}\in\mathbb{R}^{l}, \bm{\lambda}=(\lambda_{1},\ldots,\lambda_{p})^{\top}\in\mathbb{R}^{p} and \bm{\kappa}=(\kappa_{1},\ldots,\kappa_{q})^{\top}\in\mathbb{R}^{q} are vectors of regression coefficients, \bm{x}_{i}=(x_{i1},\ldots,x_{ik})^{\top}, \bm{w}_{i}=(w_{i1},\ldots,w_{il})^{\top}, \bm{z}_{i}=(z_{i1},\ldots,z_{ip})^{\top} and \bm{v}_{i}=(v_{i1},\ldots,v_{iq})^{\top} are the values of k, l, p and q covariates, and k+l+p+q<n. The link functions g_{1}:\mathbb{R}\rightarrow\mathbb{R}, g_{2}:\mathbb{R}\rightarrow\mathbb{R}, h_{1}:\mathbb{R}^{+}\rightarrow\mathbb{R} and h_{2}:(-1,1)\rightarrow\mathbb{R} are strictly monotone and at least twice differentiable, with g_{1}^{-1}(\cdot), g_{2}^{-1}(\cdot), h_{1}^{-1}(\cdot) and h_{2}^{-1}(\cdot) being their respective inverse functions. For g_{1}(\cdot) and g_{2}(\cdot) the most common choice is the identity link, whereas for h_{1}(\cdot) and h_{2}(\cdot) the most common choices are the logarithm and arctanh (inverse hyperbolic tangent) links, respectively.
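A minimal sketch of these link choices (the covariate and coefficient values below are hypothetical): the exp and tanh inverses of the log and arctanh links guarantee \sigma_{i}>0 and -1<\rho_{i}<1 for any real linear predictor.

```python
import numpy as np

# Inverse links mapping linear predictors to the parameter spaces.
def mean_link_inv(eta):     # g1^{-1} = g2^{-1}: identity
    return eta

def sigma_link_inv(eta):    # h1^{-1}: exp, inverse of the log link
    return np.exp(eta)

def rho_link_inv(eta):      # h2^{-1}: tanh, inverse of the arctanh link
    return np.tanh(eta)

z_i = np.array([1.0, 0.5])  # hypothetical dispersion covariates
v_i = np.array([1.0, -2.0]) # hypothetical correlation covariates
lam = np.array([0.2, -0.4])
kap = np.array([0.8, 0.6])

sigma_i = sigma_link_inv(z_i @ lam)
rho_i = rho_link_inv(v_i @ kap)
print(sigma_i, rho_i)       # always sigma_i > 0 and -1 < rho_i < 1
```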

The information from U_{i}^{*} can be summarized by the indicator U_{i}=\mathds{1}_{\{U_{i}^{*}>0\}}.

Let Y_{i}=Y_{i}^{*}U_{i} be the observed outcome, for i=1,\ldots,n; only the n_{1} out of n observations Y_{i}^{*} for which U_{i}^{*}>0 are observed. This model is known as the “Type 2 tobit model” in the econometrics literature. Notice that U_{i}\sim{\rm Bernoulli}(\mathbb{P}(U_{i}^{*}>0)). By the law of total probability, for \bm{\theta}=(\bm{\beta}^{\top},\bm{\gamma}^{\top},\bm{\lambda}^{\top},\bm{\kappa}^{\top})^{\top}, the random variable Y_{i} has distribution function

F_{Y_{i}}(y_{i};\bm{\theta})=\mathbb{P}(Y_{i}\leq y_{i}\,|\,U_{i}^{*}>0)\mathbb{P}(U_{i}^{*}>0)+\mathbb{P}(Y_{i}\leq y_{i}\,|\,U_{i}^{*}\leq 0)\mathbb{P}(U_{i}^{*}\leq 0)
=\begin{cases}\mathbb{P}(Y_{i}^{*}\leq y_{i}\,|\,U_{i}^{*}>0)\mathbb{P}(U_{i}^{*}>0),&\text{if }y_{i}<0,\\ \mathbb{P}(Y_{i}^{*}\leq y_{i}\,|\,U_{i}^{*}>0)\mathbb{P}(U_{i}^{*}>0)+\mathbb{P}(U_{i}^{*}\leq 0),&\text{if }y_{i}\geq 0.\end{cases}

The function FYiF_{Y_{i}} has only one jump, at yi=0y_{i}=0, and (Yi=0)=(Ui0)\mathbb{P}(Y_{i}=0)=\mathbb{P}(U_{i}^{*}\leq 0). Therefore, YiY_{i} is a random variable that is neither discrete nor absolutely continuous, but a mixture of the two types. In other words,

FYi(yi;𝜽)=(Ui0)Fd(yi)+(Ui>0)Fac(yi),\displaystyle F_{Y_{i}}(y_{i};\bm{\theta})=\mathbb{P}(U_{i}^{*}\leq 0)F_{\rm d}(y_{i})+\mathbb{P}(U_{i}^{*}>0)F_{\rm ac}(y_{i}),

where Fd(yi)=𝟙[0,+)(yi)F_{\rm d}(y_{i})=\mathds{1}_{[0,+\infty)}(y_{i}) and Fac(yi)=(Yiyi|Ui>0)F_{\rm ac}(y_{i})=\mathbb{P}(Y_{i}^{*}\leq y_{i}|U_{i}^{*}>0). Hence, the PDF of YiY_{i} is given by

fYi(yi;𝜽)\displaystyle f_{Y_{i}}(y_{i};\bm{\theta}) =(Ui0)δ0(yi)+(Ui>0)fYi|Ui>0(yi;𝜽)\displaystyle=\mathbb{P}(U_{i}^{*}\leq 0)\delta_{0}(y_{i})+\mathbb{P}(U_{i}^{*}>0)f_{Y_{i}^{*}|U_{i}^{*}>0}(y_{i};\bm{\theta})
=((Ui0))1ui((Ui>0))ui(fYi|Ui>0(yi;𝜽))ui\displaystyle=(\mathbb{P}(U_{i}^{*}\leq 0))^{1-u_{i}}(\mathbb{P}(U_{i}^{*}>0))^{u_{i}}(f_{Y_{i}^{*}|U_{i}^{*}>0}(y_{i};\bm{\theta}))^{u_{i}} (2.4)
=(Ui=ui)(fYi|Ui>0(yi;𝜽))ui,ui=0,1,\displaystyle=\mathbb{P}(U_{i}=u_{i})(f_{Y_{i}^{*}|U_{i}^{*}>0}(y_{i};\bm{\theta}))^{u_{i}},\quad u_{i}=0,1,

wherein (Ui=0)=1(Ui=1)=(Ui0)\mathbb{P}(U_{i}=0)=1-\mathbb{P}(U_{i}=1)=\mathbb{P}(U_{i}^{*}\leq 0) for i=1,,ni=1,\ldots,n, and δ0\delta_{0} is the Dirac delta function. That is, the density of YiY_{i} is composed of a discrete component described by the probit model (Ui=ui)=((Ui0))1ui((Ui>0))ui\mathbb{P}(U_{i}=u_{i})=(\mathbb{P}(U_{i}^{*}\leq 0))^{1-u_{i}}(\mathbb{P}(U_{i}^{*}>0))^{u_{i}}, for ui=0,1u_{i}=0,1, and a continuous part given by the conditional PDF fYi|Ui>0(yi;𝜽)f_{Y_{i}^{*}|U_{i}^{*}>0}(y_{i};\bm{\theta}).
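For the Gaussian generator, the discrete component above can be checked by simulation (the parameter values are hypothetical): the proportion of exact zeros in Y_{i} should match \mathbb{P}(U_{i}^{*}\leq 0)=\Phi(-\mu_{2i}).

```python
import numpy as np
from scipy.stats import norm

# Simulate (Y*, U*) from the Gaussian stochastic representation, censor,
# and compare the frequency of zeros with the probit probability Phi(-mu2).
rng = np.random.default_rng(42)
n = 200_000
mu1, mu2, sigma_, rho_ = 1.0, 0.3, 1.0, 0.5

z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
y_star = mu1 + sigma_ * z1
u_star = mu2 + rho_ * z1 + np.sqrt(1.0 - rho_**2) * z2
y = np.where(u_star > 0, y_star, 0.0)   # observed outcome Y_i = Y_i* U_i

p_zero_mc = np.mean(y == 0.0)
p_zero_th = norm.cdf(-mu2)
print(p_zero_mc, p_zero_th)             # the two should be close
```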

2.1 Finding the conditional density of Y^{*}_{i}, given that U^{*}_{i}>0

In the context of sample selection models, the interest lies in finding the PDF of Yi|Ui>0{Y}^{*}_{i}|{U}^{*}_{i}>0 given that (Yi,Ui)({Y}^{*}_{i},{U}^{*}_{i})^{\top} follows the BSY distribution (2.3); see Theorem 2.5.

Before stating and proving the main result of this section (Theorem 2.5), we introduce the following notation, used throughout the paper:

Gi(x)=xfZ2i|Z1i(wi|yiμ1iσi)dwi\displaystyle G_{i}(x)={\displaystyle\int_{-x}^{\infty}}{f_{Z_{2i}|Z_{1i}}\Big{(}w_{i}\,\Big{|}\,{y_{i}-\mu_{1i}\over\sigma_{i}}\Big{)}}\,{\rm d}w_{i} (2.5)

and

Hi(x)=xfρiZ1i+1ρi2Z2i(ui)dui,\displaystyle H_{i}(x)=\int_{-x}^{\infty}f_{\rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}}(u_{i})\,{\rm d}u_{i}, (2.6)

where

Z1i=RDV1iandZ2i=R1D2V2i\displaystyle Z_{1i}=RDV_{1i}\quad\text{and}\quad Z_{2i}=R\sqrt[\ ]{1-D^{2}}V_{2i} (2.7)

have the joint PDF fZ1i,Z2if_{Z_{1i},Z_{2i}} and fXf_{X} denotes the PDF corresponding to a random variable XX. Here, the random variables V1iV_{1i}, V2iV_{2i}, RR, and DD are mutually independent and (Vki=1)=(Vki=1)=1/2\mathbb{P}(V_{ki}=-1)=\mathbb{P}(V_{ki}=1)=1/2, k=1,2k=1,2. The random variable DD in (2.7) is positive and has PDF

fD(d)=2π1d2,d(0,1).\displaystyle f_{D}(d)={2\over\pi\sqrt[\ ]{1-d^{2}}},\quad d\in(0,1).

The random variable RR in (2.7) is positive and is called the generator of the random vector (Yi,Ui)(Y_{i}^{*},U_{i}^{*})^{\top}. Moreover, RR has PDF given by

fR(r)=2rgc(r2)0gc(u)du,r>0,\displaystyle f_{R}(r)={2rg_{c}(r^{2})\over\int_{0}^{\infty}g_{c}(u)\,{\rm d}{u}},\quad r>0,

where gcg_{c} is the density generator in (2.2).
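A quick numerical sanity check (our own) that f_{R} integrates to one; for the Gaussian generator, R is the Rayleigh-distributed radius of a standard bivariate normal vector.

```python
import numpy as np
from scipy import integrate

g = lambda u: np.exp(-u / 2.0)          # Gaussian density generator
c = integrate.quad(g, 0.0, np.inf)[0]   # int_0^inf g_c(u) du = 2

def f_R(r):
    # f_R(r) = 2 r g_c(r^2) / int_0^inf g_c(u) du
    return 2.0 * r * g(r**2) / c

total = integrate.quad(f_R, 0.0, np.inf)[0]
mean_R = integrate.quad(lambda r: r * f_R(r), 0.0, np.inf)[0]
print(total, mean_R)                    # 1 and sqrt(pi/2) (Rayleigh mean)
```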

Proposition 2.1.

Let us denote S=RDS=RD and T=R1D2T=R\sqrt[\ ]{1-D^{2}}.

  1. 1.

    The PDFs of SS and TT are given by

    fS(v)=fT(v)=v4gc(w2)1v2w2dwπ0gc(u)du,v>0.\displaystyle f_{S}(v)=f_{T}(v)={\int_{v}^{\infty}{4g_{c}(w^{2})\over\sqrt[\ ]{1-{v^{2}\over w^{2}}}}\,{\rm d}w\over\pi\int_{0}^{\infty}g_{c}(u)\,{\rm d}{u}}\,,\quad v>0. (2.8)
  2. 2.

    The PDFs of Z1iZ_{1i} and Z2iZ_{2i} are

    f_{Z_{1i}}(z)=f_{Z_{2i}}(z)={\int_{|z|}^{\infty}{2g_{c}(w^{2})\over\sqrt[\ ]{1-{z^{2}\over w^{2}}}}\,{\rm d}w\over\pi\int_{0}^{\infty}g_{c}(u)\,{\rm d}{u}}\,,\quad-\infty<z<\infty.
Proof.

Using the known formula for the PDF of the product of two random variables X and Y:

fXY(u)=1|x|fX,Y(x,ux)dx\displaystyle f_{XY}(u)=\int_{-\infty}^{\infty}{1\over|x|}\,f_{X,Y}\Big{(}x,{u\over x}\Big{)}\,{\rm d}x

the proof of (2.8) follows.

The proof of second item follows by combining the law of total probability with (2.8). ∎
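The product-density formula used above can be illustrated numerically (our own example): for X and Y independent standard normals, f_{XY}(u)=K_{0}(|u|)/\pi is a classical result, which the integral formula reproduces.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm
from scipy.special import k0

# f_{XY}(u) = int |x|^{-1} f_X(x) f_Y(u/x) dx for independent X, Y.
def f_prod(u):
    integrand = lambda x: norm.pdf(x) * norm.pdf(u / x) / abs(x)
    left = integrate.quad(integrand, -np.inf, 0.0)[0]
    right = integrate.quad(integrand, 0.0, np.inf)[0]
    return left + right

# Compare with the known closed form K_0(|u|)/pi at u = 1.
print(f_prod(1.0), k0(1.0) / np.pi)
```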

Proposition 2.2.

The random vector (Z1i,Z2i)(Z_{1i},Z_{2i})^{\top} is jointly symmetric about (0,0)(0,0). That is, fZ1i,Z2i(x,y)=fZ1i,Z2i(x,y)=fZ1i,Z2i(x,y)=fZ1i,Z2i(x,y)f_{Z_{1i},Z_{2i}}(x,y)=f_{Z_{1i},Z_{2i}}(x,-y)=f_{Z_{1i},Z_{2i}}(-x,y)=f_{Z_{1i},Z_{2i}}(-x,-y).

Proof.

Let S=RDS=RD and T=R1D2T=R\sqrt[\ ]{1-D^{2}}. Since RR and DD are mutually independent, by using change of variables (Jacobian Method), the joint density of SS and TT is

fS,T(s,t)=4gc(s2+t2)π0gc(u)du,s,t>0.\displaystyle f_{S,T}(s,t)={4g_{c}(s^{2}+t^{2})\over\pi\int_{0}^{\infty}g_{c}(u)\,{\rm d}{u}},\quad s,t>0. (2.9)

Moreover, since V_{1i}\stackrel{d}{=}V_{2i} are mutually independent sign variables with \mathbb{P}(V_{ki}=\pm 1)=1/2, by the law of total probability we get

FZ1i,Z2i(x,y)=14{FS,T(x,y)𝟙[0,)×[0,)(x,y)+(Sx,Ty)𝟙(,0)×(,0)(x,y)+(Sx,Ty)𝟙(0,)×(,0)(x,y)+(Sx,Ty)𝟙(,0)×(0,)(x,y)},F_{Z_{1i},Z_{2i}}(x,y)={1\over 4}\,\big{\{}F_{S,T}(x,y)\mathds{1}_{[0,\infty)\times[0,\infty)}(x,y)+\mathbb{P}(S\geqslant-x,T\geqslant-y)\mathds{1}_{(-\infty,0)\times(-\infty,0)}(x,y)\\[5.69046pt] +\mathbb{P}(S\leqslant x,T\geqslant-y)\mathds{1}_{(0,\infty)\times(-\infty,0)}(x,y)+\mathbb{P}(S\geqslant-x,T\leqslant y)\mathds{1}_{(-\infty,0)\times(0,\infty)}(x,y)\big{\}}, (2.10)

where FX,Y(,)F_{X,Y}(\cdot,\cdot) denotes the (joint) distribution function of (X,Y)(X,Y)^{\top}. Using the following well-known identity:

(a1<Xb1,a2<Yb2)=FX,Y(b1,b2)FX,Y(b1,a2)FX,Y(a1,b2)+FX,Y(a1,a2),\displaystyle\mathbb{P}(a_{1}<X\leqslant b_{1},a_{2}<Y\leqslant b_{2})=F_{X,Y}(b_{1},b_{2})-F_{X,Y}(b_{1},a_{2})-F_{X,Y}(a_{1},b_{2})+F_{X,Y}(a_{1},a_{2}),

we have

(Sx,Ty)=FS,T(,)FS,T(,y)FS,T(x,)+FS,T(x,y),x<0,y<0;\displaystyle\mathbb{P}(S\geqslant-x,T\geqslant-y)=F_{S,T}(\infty,\infty)-F_{S,T}(\infty,-y)-F_{S,T}(-x,\infty)+F_{S,T}(-x,-y),\ x<0,y<0;
(Sx,Ty)=FS,T(x,)FS,T(x,y)FS,T(0,)+FS,T(0,y),x>0,y<0;\displaystyle\mathbb{P}(S\leqslant x,T\geqslant-y)=F_{S,T}(x,\infty)-F_{S,T}(x,-y)-F_{S,T}(0,\infty)+F_{S,T}(0,-y),\quad x>0,y<0;
(Sx,Ty)=FS,T(,y)FS,T(,0)FS,T(x,y)+FS,T(x,0),x<0,y>0.\displaystyle\mathbb{P}(S\geqslant-x,T\leqslant y)=F_{S,T}(\infty,y)-F_{S,T}(\infty,0)-F_{S,T}(-x,y)+F_{S,T}(-x,0),\quad x<0,y>0.

By replacing the last three identities in (2.10) and then by differentiating FZ1i,Z2i(x,y)F_{Z_{1i},Z_{2i}}(x,y) with respect to xx and yy, we obtain

fZ1i,Z2i(x,y)\displaystyle f_{Z_{1i},Z_{2i}}(x,y) =14{fS,T(x,y)𝟙(0,)×(0,)(x,y)+fS,T(x,y)𝟙(,0)×(,0)(x,y)\displaystyle={1\over 4}\,\big{\{}f_{S,T}(x,y)\mathds{1}_{(0,\infty)\times(0,\infty)}(x,y)+f_{S,T}(-x,-y)\mathds{1}_{(-\infty,0)\times(-\infty,0)}(x,y)
+fS,T(x,y)𝟙(0,)×(,0)(x,y)+fS,T(x,y)𝟙(,0)×(0,)(x,y)}\displaystyle\quad+f_{S,T}(x,-y)\mathds{1}_{(0,\infty)\times(-\infty,0)}(x,y)+f_{S,T}(-x,y)\mathds{1}_{(-\infty,0)\times(0,\infty)}(x,y)\big{\}}
=14fS,T(x,y)𝟙{0}×{0}(x,y)=gc(x2+y2)π0gc(u)du𝟙{0}×{0}(x,y),\displaystyle={1\over 4}\,f_{S,T}(x,y)\mathds{1}_{\mathbb{R}\setminus\{0\}\times\mathbb{R}\setminus\{0\}}(x,y)={g_{c}(x^{2}+y^{2})\over\pi\int_{0}^{\infty}g_{c}(u)\,{\rm d}{u}}\mathds{1}_{\mathbb{R}\setminus\{0\}\times\mathbb{R}\setminus\{0\}}(x,y),

where in the last line we used Equation (2.9). Note that the function f_{Z_{1i},Z_{2i}} above is not defined on the abscissa or ordinate axes (nor at the origin), but this is not relevant because these events have null probability measure. Therefore, we can say that

fZ1i,Z2i(x,y)=gc(x2+y2)π0gc(u)du,<x,y<.\displaystyle f_{Z_{1i},Z_{2i}}(x,y)={g_{c}(x^{2}+y^{2})\over\pi\int_{0}^{\infty}g_{c}(u)\,{\rm d}{u}},\quad-\infty<x,y<\infty. (2.11)

From (2.11) it is clear that the vector (Z1i,Z2i)(Z_{1i},Z_{2i})^{\top} is jointly symmetric about (0,0)(0,0). ∎
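Equation (2.11) can be checked numerically for the Gaussian generator (our own sketch): the joint density of (Z_{1i},Z_{2i})^{\top} integrates to one and exhibits the stated joint symmetry.

```python
import numpy as np
from scipy import integrate

g = lambda u: np.exp(-u / 2.0)                  # Gaussian generator
Z = np.pi * integrate.quad(g, 0.0, np.inf)[0]   # normalizing constant 2*pi

def f_joint(x, y):
    # Equation (2.11); here it is the standard bivariate normal density.
    return g(x**2 + y**2) / Z

total = integrate.dblquad(lambda y, x: f_joint(x, y),
                          -np.inf, np.inf, -np.inf, np.inf)[0]
print(total)                                    # integrates to 1
# Joint symmetry about (0, 0): sign flips leave the density unchanged.
print(f_joint(0.7, -1.1) == f_joint(-0.7, 1.1) == f_joint(0.7, 1.1))
```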

Proposition 2.3.
  1. 1.

    The marginal PDFs of Z_{1i} and Z_{2i}, denoted by f_{1i} and f_{2i}, respectively, are given by

    f1i(x)=gc(x2+y2)dyπ0gc(u)du=fZ1i(x),<x<,\displaystyle f_{{1i}}(x)={\int_{-\infty}^{\infty}g_{c}(x^{2}+y^{2})\,{\rm d}y\over\pi\int_{0}^{\infty}g_{c}(u)\,{\rm d}{u}}=f_{Z_{1i}}(x),\quad-\infty<x<\infty,
    f2i(y)=gc(x2+y2)dxπ0gc(u)du=fZ2i(y),<y<,\displaystyle f_{{2i}}(y)={\int_{-\infty}^{\infty}g_{c}(x^{2}+y^{2})\,{\rm d}x\over\pi\int_{0}^{\infty}g_{c}(u)\,{\rm d}{u}}=f_{Z_{2i}}(y),\quad-\infty<y<\infty,

    where f_{Z_{1i}} and f_{Z_{2i}} are given in Proposition 2.1.

  2. 2.

    The random vector (Z1i,Z2i)(Z_{1i},Z_{2i})^{\top} is marginally symmetric about (0,0)(0,0). That is, f1i(x)=f1i(x)f_{{1i}}(x)=f_{{1i}}(-x) and f2i(y)=f2i(y)f_{{2i}}(y)=f_{{2i}}(-y).

Proof.

The proof of first item is immediate from the joint density of (Z1i,Z2i)(Z_{1i},Z_{2i}), given in (2.11), and by Proposition 2.1. The proof of the second item follows from the first one. ∎

Proposition 2.4.

The function GiG_{i} in (2.5) satisfies the following identity:

Gi(x)=1Gi(x)=xfZ2i|Z1i(wi|yiμ1iσi)dwi.\displaystyle G_{i}(x)=1-G_{i}(-x)={\displaystyle\int_{-\infty}^{x}}{f_{Z_{2i}|Z_{1i}}\Big{(}w_{i}\,\Big{|}\,{y_{i}-\mu_{1i}\over\sigma_{i}}\Big{)}}\,{\rm d}w_{i}.

That is, G_{i} is the conditional CDF of Z_{2i}, given that Z_{1i}=(y_{i}-\mu_{1i})/\sigma_{i}. Consequently, G_{i} is the CDF of a distribution symmetric about 0.

Proof.

The proof immediately follows by applying Proposition 2.2. ∎

We now proceed to establish the main result of this section.

Theorem 2.5.

If (Yi,Ui)BSY(𝝁,𝚺,gc)({Y}^{*}_{i},{U}^{*}_{i})^{\top}\sim{\rm BSY}(\bm{\mu},\bm{\Sigma},g_{c}) then the PDF of Yi|Ui>0{Y}^{*}_{i}|{U}^{*}_{i}>0 is given by

fYi|Ui>0(yi;𝜽)=1σifZ1i(yiμ1iσi)Gi(11ρi2μ2i+ρi1ρi2(yiμ1iσi))Hi(μ2i),\displaystyle f_{{Y}_{i}^{*}|{U}_{i}^{*}>0}(y_{i};\bm{\theta})={1\over\sigma_{i}}\,f_{Z_{1i}}\biggl{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\biggr{)}\,{G_{i}\Big{(}{1\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,\mu_{2i}+{\rho_{i}\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,({y_{i}-\mu_{1i}\over\sigma_{i}})\Big{)}\over H_{i}(\mu_{2i})}, (2.12)

where GiG_{i}, HiH_{i} and Z1iZ_{1i} are given in Proposition 2.4, (2.6) and (2.7), respectively.

As a by-product of the proof we get that (Ui>0)=Hi(μ2i)\mathbb{P}(U_{i}^{*}>0)=H_{i}(\mu_{2i}).

Proof.

Since the random vector (Y_{i}^{*},U_{i}^{*})^{\top} follows the BSY distribution (2.3), it admits the stochastic representation (Abdous, 2005)

Yi=σiZ1i+μ1i,Ui=ρiZ1i+1ρi2Z2i+μ2i,\displaystyle\begin{array}[]{llcc}&Y_{i}^{*}=\sigma_{i}Z_{1i}+\mu_{1i},\\[11.38092pt] &U_{i}^{*}=\rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}+\mu_{2i},\end{array} (2.15)

where Z_{1i} and Z_{2i} are as in (2.7). If Y_{i}^{*}=y_{i}, then Z_{1i}=(y_{i}-\mu_{1i})/\sigma_{i}. So, the conditional distribution of U_{i}^{*}, given that Y_{i}^{*}=y_{i}, is the same as the distribution of

ρi(yiμ1iσi)+1ρi2Z2i+μ2i|Yi=yi.\displaystyle\rho_{i}\,\biggl{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\biggr{)}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}+\mu_{2i}\ \Big{|}\ Y_{i}^{*}=y_{i}.

Consequently, the PDF of UiU_{i}^{*} given that Yi=yiY_{i}^{*}=y_{i} is given by

fUi|Yi(ui|yi)=fZ1i,Z2i(yiμ1iσi,11ρi2(uiμ2i)ρi1ρi2(yiμ1iσi))1ρi2fZ1i(yiμ1iσi).\displaystyle f_{U_{i}^{*}|Y_{i}^{*}}(u_{i}|y_{i})=\dfrac{f_{Z_{1i},\,Z_{2i}}\Big{(}{y_{i}-\mu_{1i}\over\sigma_{i}},{1\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,(u_{i}-\mu_{2i})-{\rho_{i}\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,({y_{i}-\mu_{1i}\over\sigma_{i}})\Big{)}}{\sqrt[\ ]{1-\rho_{i}^{2}}\,f_{Z_{1i}}({y_{i}-\mu_{1i}\over\sigma_{i}})}. (2.16)

Further,

fYi(yi)=1σifZ1i(yiμ1iσi).\displaystyle f_{Y_{i}^{*}}(y_{i})={1\over\sigma_{i}}\,f_{Z_{1i}}\biggl{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\biggr{)}. (2.17)

By using the identity

fYi|Ui>0(yi;𝜽)=fYi(yi)0fUi|Yi(ui|yi)dui(Ui>0)=fYi(yi)0fUi|Yi(ui|yi)dui0fUi(ui)dui,f_{{Y}_{i}^{*}|{U}_{i}^{*}>0}(y_{i};\bm{\theta})=f_{Y_{i}^{*}}(y_{i})\,\dfrac{\int_{0}^{\infty}f_{U_{i}^{*}|Y_{i}^{*}}(u_{i}|y_{i})\,{\rm d}u_{i}}{\mathbb{P}(U_{i}^{*}>0)}=f_{Y_{i}^{*}}(y_{i})\,\dfrac{\int_{0}^{\infty}f_{U_{i}^{*}|Y_{i}^{*}}(u_{i}|y_{i})\,{\rm d}u_{i}}{\int_{0}^{\infty}f_{U_{i}^{*}}(u_{i})\,{\rm d}u_{i}},

and by employing identities (2.16) and (2.17), we get

fYi|Ui>0(yi;𝜽)\displaystyle f_{{Y}_{i}^{*}|{U}_{i}^{*}>0}(y_{i};\bm{\theta}) =1σifZ1i(yiμ1iσi)0fZ1i,Z2i(yiμ1iσi,11ρi2(uiμ2i)ρi1ρi2(yiμ1iσi))1ρi2fZ1i(yiμ1iσi)duiμ2ifρiZ1i+1ρi2Z2i(ui)dui\displaystyle={1\over\sigma_{i}}\,f_{Z_{1i}}\biggl{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\biggr{)}\,\dfrac{\displaystyle\int_{0}^{\infty}\frac{f_{Z_{1i},\,Z_{2i}}\Big{(}{y_{i}-\mu_{1i}\over\sigma_{i}},{1\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,(u_{i}-\mu_{2i})-{\rho_{i}\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,({y_{i}-\mu_{1i}\over\sigma_{i}})\Big{)}}{\sqrt[\ ]{1-\rho_{i}^{2}}\,f_{Z_{1i}}({y_{i}-\mu_{1i}\over\sigma_{i}})}\,{\rm d}u_{i}}{\displaystyle\int_{-\mu_{2i}}^{\infty}f_{\rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}}(u_{i})\,{\rm d}u_{i}}
=1σifZ1i(yiμ1iσi)11ρi2μ2iρi1ρi2(yiμ1iσi)fZ2i|Z1i(wi|yiμ1iσi)dwiμ2ifρiZ1i+1ρi2Z2i(ui)dui,\displaystyle={1\over\sigma_{i}}\,f_{Z_{1i}}\biggl{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\biggr{)}\,\dfrac{{\displaystyle\int_{-{1\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,\mu_{2i}-{\rho_{i}\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,({y_{i}-\mu_{1i}\over\sigma_{i}})}^{\infty}}{f_{Z_{2i}|Z_{1i}}(w_{i}|{y_{i}-\mu_{1i}\over\sigma_{i}})}\,{\rm d}w_{i}}{\displaystyle\int_{-\mu_{2i}}^{\infty}f_{\rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}}(u_{i})\,{\rm d}u_{i}},

where in the last line a change of variables was used. Finally, by using the notations of GiG_{i} and HiH_{i} given in (2.5) and (2.6), respectively, from the above identity and from Proposition 2.4 the proof follows. ∎

Corollary 2.6 (Gaussian density generator).

If (Yi,Ui)BSY(𝝁,𝚺,gc)({Y}^{*}_{i},{U}^{*}_{i})^{\top}\sim{\rm BSY}(\bm{\mu},\bm{\Sigma},g_{c}), where gc(x)=exp(x/2)g_{c}(x)=\exp(-x/2) is the density generator of the bivariate normal distribution, then the PDF of Yi|Ui>0{Y}^{*}_{i}|{U}^{*}_{i}>0 is given by

fYi|Ui>0(yi;𝜽)=1σiϕ(yiμ1iσi)Φ(11ρi2μ2i+ρi1ρi2(yiμ1iσi))Φ(μ2i),\displaystyle f_{{Y}_{i}^{*}|{U}_{i}^{*}>0}(y_{i};\bm{\theta})={1\over\sigma_{i}}\,\phi\biggl{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\biggr{)}\,{\Phi\Big{(}{1\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,\mu_{2i}+{\rho_{i}\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,({y_{i}-\mu_{1i}\over\sigma_{i}})\Big{)}\over\Phi(\mu_{2i})},

wherein ϕ\phi and Φ\Phi denote the PDF and CDF of the standard normal distribution, respectively.

Proof.

If (Y^{*}_{i},U^{*}_{i})^{\top} follows the bivariate normal distribution, then there exist independent standard normal random variables Z_{1i} and Z_{2i} such that a stochastic representation of type (2.15) is satisfied. Therefore, Z_{2i}|Z_{1i}=z and \rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i} both follow the standard normal distribution. Hence, G_{i} and H_{i}, given in Proposition 2.4 and (2.6), respectively, can be written as:

\displaystyle G_{i}(x)={\displaystyle\int_{-\infty}^{x}}{f_{Z_{2i}|Z_{1i}}\Big{(}w_{i}\,\Big{|}\,{y_{i}-\mu_{1i}\over\sigma_{i}}\Big{)}}\,{\rm d}w_{i}=\mathbb{P}\Big{(}Z_{2i}\leqslant x\,\Big{|}\,Z_{1i}={y_{i}-\mu_{1i}\over\sigma_{i}}\Big{)}=\Phi(x)

and

Hi(x)=xfρiZ1i+1ρi2Z2i(ui)dui=(ρiZ1i+1ρi2Z2i>x)=Φ(x).\displaystyle H_{i}(x)=\int_{-x}^{\infty}f_{\rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}}(u_{i})\,{\rm d}u_{i}=\mathbb{P}\big{(}\rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}>-x\big{)}=\Phi(x).

Applying Theorem 2.5 the proof concludes. ∎
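Corollary 2.6 translates directly into code (the parameter values below are hypothetical); as a numerical sanity check, the conditional density integrates to one.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Conditional density of Y* given U* > 0 under the Gaussian generator.
def f_cond_normal(y, mu1, mu2, sigma, rho):
    r = (y - mu1) / sigma
    return (norm.pdf(r) / sigma
            * norm.cdf((mu2 + rho * r) / np.sqrt(1.0 - rho**2))
            / norm.cdf(mu2))

total = integrate.quad(lambda y: f_cond_normal(y, 1.0, 0.3, 1.2, 0.5),
                       -np.inf, np.inf)[0]
print(total)   # should be 1
```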

Remark 2.7.

It is well-known that, if (X_{1},X_{2})^{\top} follows a bivariate Student-t distribution, denoted (X_{1},X_{2})^{\top}\sim t_{\nu}(\bm{\mu},\bm{\Sigma}), where \bm{\mu}=(\mu_{1},\mu_{2})^{\top}\in\mathbb{R}^{2} and \bm{\Sigma} is as in (2.1), then both the marginal distribution of X_{1} and the conditional distribution of X_{2} given X_{1} are univariate Student-t distributions: X_{1}\sim t_{\nu}(\mu_{1},\sigma_{1}) and

X2|X1=x1tν+1(μ2+ρσ2(x1μ1σ1),ν+(x1μ1σ1)2ν+1σ22(1ρ2)).\displaystyle X_{2}|X_{1}=x_{1}\sim t_{\nu+1}\biggl{(}\mu_{2}+\rho\sigma_{2}\Big{(}{x_{1}-\mu_{1}\over\sigma_{1}}\Big{)},{\nu+({x_{1}-\mu_{1}\over\sigma_{1}})^{2}\over\nu+1}\,\sigma_{2}^{2}(1-\rho^{2})\biggr{)}.

The above statement is equivalent to

ν+1(ν+r2)(1ρ2)(X2μ2σ2ρr)|X1μ1σ1=rtν+1(0,1)tν+1.\displaystyle\sqrt[\ ]{\nu+1\over(\nu+r^{2})(1-\rho^{2})}\,\Big{(}{X_{2}-\mu_{2}\over\sigma_{2}}-\rho r\Big{)}\,\bigg{|}{X_{1}-\mu_{1}\over\sigma_{1}}=r\sim t_{\nu+1}(0,1)\equiv t_{\nu+1}.
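Remark 2.7 can be verified numerically by comparing the conditional density obtained as joint/marginal of a bivariate Student-t with the stated t_{\nu+1} location-scale density (the parameter values below are hypothetical):

```python
import numpy as np
from scipy.stats import multivariate_t, t as student_t

nu, mu1, mu2, s1, s2, rho = 5.0, 0.5, -0.2, 1.0, 1.5, 0.6
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])
joint = multivariate_t(loc=[mu1, mu2], shape=Sigma, df=nu)

x1, x2 = 1.3, 0.4
r = (x1 - mu1) / s1
# Conditional density as joint / marginal (marginal of X1 is t_nu).
f_cond_num = joint.pdf([x1, x2]) / (student_t.pdf(r, nu) / s1)

# Conditional density stated in Remark 2.7: t_{nu+1} location-scale.
loc = mu2 + rho * s2 * r
scale = s2 * np.sqrt((nu + r**2) * (1.0 - rho**2) / (nu + 1.0))
f_cond_th = student_t.pdf(x2, nu + 1.0, loc=loc, scale=scale)
print(f_cond_num, f_cond_th)   # the two should agree
```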
Corollary 2.8 (Student-t density generator).

If (Y^{*}_{i},U^{*}_{i})^{\top}\sim{\rm BSY}(\bm{\mu},\bm{\Sigma},g_{c}), where g_{c}(x)=(1+x/\nu)^{-(\nu+2)/2} is the density generator of the bivariate Student-t distribution with \nu degrees of freedom, then the PDF of Y^{*}_{i}|U^{*}_{i}>0 is given by

fYi|Ui>0(yi;𝜽)=1σifν(yiμ1iσi)Fν+1(ν+1ν+(yiμ1iσi)2[11ρi2μ2i+ρi1ρi2(yiμ1iσi)])Fν(μ2i),\displaystyle f_{{Y}_{i}^{*}|{U}_{i}^{*}>0}(y_{i};\bm{\theta})={1\over\sigma_{i}}\,f_{\nu}\biggl{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\biggr{)}\,{F_{\nu+1}\Big{(}\sqrt[\ ]{{\nu+1\over\nu+({y_{i}-\mu_{1i}\over\sigma_{i}})^{2}}}\ \big{[}{1\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,\mu_{2i}+{\rho_{i}\over\sqrt[\ ]{1-\rho_{i}^{2}}}\,({y_{i}-\mu_{1i}\over\sigma_{i}})\big{]}\Big{)}\over F_{\nu}(\mu_{2i})},

wherein f_{\nu} and F_{\nu} denote the PDF and CDF of a classical Student-t distribution with \nu degrees of freedom, respectively.

Proof.

It is well-known that a vector (Y^{*}_{i},U^{*}_{i})^{\top} following a bivariate Student-t distribution admits the stochastic representation below; see Subsection 9.2.6, p. 354, of Balakrishnan and Lai (2009):

Yi=σiZ1i+μ1i,Ui=ρiZ1i+1ρi2Z2i+μ2i,\displaystyle\begin{array}[]{llcc}&Y_{i}^{*}=\sigma_{i}Z_{1i}+\mu_{1i},\\[11.38092pt] &U_{i}^{*}=\rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}+\mu_{2i},\end{array} (2.20)

where Z1i=νZ~1i/QZ_{1i}=\sqrt[\ ]{\nu}\widetilde{Z}_{1i}/\sqrt[\ ]{Q} and Z2i=νZ~2i/QZ_{2i}=\sqrt[\ ]{\nu}\widetilde{Z}_{2i}/\sqrt[\ ]{Q} are Student-tt random variables, wherein Qχν2Q\sim\chi^{2}_{\nu} (chi-square with ν\nu degrees of freedom) is independent of Z~1i\widetilde{Z}_{1i} and ρiZ~1i+1ρi2Z~2i\rho_{i}\widetilde{Z}_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,\widetilde{Z}_{2i}, and Z~1i=dZ~2iN(0,1)\widetilde{Z}_{1i}\stackrel{{\scriptstyle d}}{{=}}\widetilde{Z}_{2i}\sim N(0,1) are independent.

When (Yiμ1i)/σi=r({Y}^{*}_{i}-\mu_{1i})/\sigma_{i}=r, from Remark 2.7 we have

ν+1(ν+r2)(1ρi2)(Uiμ2iρir)|Yiμ1iσi=rtν+1.\displaystyle\sqrt[\ ]{\nu+1\over(\nu+r^{2})(1-\rho_{i}^{2})}\,({U}^{*}_{i}-\mu_{2i}-\rho_{i}r)\,\bigg{|}{Y_{i}^{*}-\mu_{1i}\over\sigma_{i}}=r\sim t_{\nu+1}.

Hence,

Fν+1(w)=(ν+1(ν+r2)(1ρi2)(Uiμ2iρir)w|Yiμ1iσi=r),<w<.\displaystyle F_{\nu+1}(w)=\mathbb{P}\left(\sqrt[\ ]{{\nu+1\over(\nu+r^{2})(1-\rho_{i}^{2})}}\,({U}^{*}_{i}-\mu_{2i}-\rho_{i}r)\leqslant w\,\Bigg{|}\,{{Y}^{*}_{i}-\mu_{1i}\over\sigma_{i}}=r\right),\quad-\infty<w<\infty.

By using the representation (2.20), a simple algebraic manipulation shows that the right-hand side of the above probability equals

=(Z2iwν+1ν+r2|Z1i=r).\displaystyle=\mathbb{P}\left(Z_{2i}\leqslant{w\over\sqrt[\ ]{{\nu+1\over\nu+r^{2}}}}\,\Bigg{|}\,Z_{1i}=r\right).

Setting w=\sqrt[\ ]{(\nu+1)/(\nu+r^{2})}\,x and r=(y_{i}-\mu_{1i})/\sigma_{i}, we obtain

\displaystyle\mathbb{P}\Big{(}Z_{2i}\leqslant x\,\Big{|}\,Z_{1i}={y_{i}-\mu_{1i}\over\sigma_{i}}\Big{)}=F_{\nu+1}\left(\sqrt[\ ]{{\nu+1\over\nu+({y_{i}-\mu_{1i}\over\sigma_{i}})^{2}}}\ x\right).

Therefore, the function GiG_{i}, given in Proposition 2.4, is

Gi(x)=xfZ2i|Z1i(wi|yiμ1iσi)dwi\displaystyle G_{i}(x)={\displaystyle\int_{-\infty}^{x}}{f_{Z_{2i}|Z_{1i}}\Big{(}w_{i}\,\Big{|}\,{y_{i}-\mu_{1i}\over\sigma_{i}}\Big{)}}\,{\rm d}w_{i} =(Z2ix|Z1i=yiμ1iσi)\displaystyle=\mathbb{P}\Big{(}Z_{2i}\leqslant-x\,\Big{|}\,Z_{1i}={y_{i}-\mu_{1i}\over\sigma_{i}}\Big{)}
=Fν+1(ν+1ν+(yiμ1iσi)2x).\displaystyle=F_{\nu+1}\left(\sqrt[\ ]{{\nu+1\over\nu+({y_{i}-\mu_{1i}\over\sigma_{i}})^{2}}}\ x\right).

On the other hand, note that \rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}=\sqrt[\ ]{\nu}(\rho_{i}\widetilde{Z}_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,\widetilde{Z}_{2i})/\sqrt[\ ]{Q} follows a Student-t distribution with \nu degrees of freedom, because \rho_{i}\widetilde{Z}_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,\widetilde{Z}_{2i}\sim N(0,1). Then

Hi(x)=xfρiZ1i+1ρi2Z2i(ui)dui=(ρiZ1i+1ρi2Z2i>x)=Fν(x).\displaystyle H_{i}(x)=\int_{-x}^{\infty}f_{\rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}}(u_{i})\,{\rm d}u_{i}=\mathbb{P}\big{(}\rho_{i}Z_{1i}+\sqrt[\ ]{1-\rho_{i}^{2}}\,Z_{2i}>-x\big{)}=F_{\nu}(x).

Applying Theorem 2.5 the proof follows. ∎
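Corollary 2.8 in code (with hypothetical parameter values): as a sanity check, the resulting conditional density integrates to one.

```python
import numpy as np
from scipy import integrate
from scipy.stats import t as student_t

# Conditional density of Y* given U* > 0 under the Student-t generator.
def f_cond_t(y, mu1, mu2, sigma, rho, nu):
    r = (y - mu1) / sigma
    arg = (mu2 + rho * r) / np.sqrt(1.0 - rho**2)
    shrink = np.sqrt((nu + 1.0) / (nu + r**2))
    return (student_t.pdf(r, nu) / sigma
            * student_t.cdf(shrink * arg, nu + 1.0)
            / student_t.cdf(mu2, nu))

total = integrate.quad(lambda y: f_cond_t(y, 1.0, 0.3, 1.2, 0.5, 4.0),
                       -np.inf, np.inf)[0]
print(total)   # should be 1
```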

Given the density generator gcg_{c} we can directly determine the PDF of Yi|Ui>0{Y}^{*}_{i}|{U}^{*}_{i}>0 as follows.

Corollary 2.9.

If (Yi,Ui)BSY(𝝁,𝚺,gc)({Y}^{*}_{i},{U}^{*}_{i})^{\top}\sim{\rm BSY}(\bm{\mu},\bm{\Sigma},g_{c}) then the PDF of Yi|Ui>0{Y}^{*}_{i}|{U}^{*}_{i}>0 is given by

fYi|Ui>0(yi;𝜽)=ρi1ρi2σizigc((yiμ1iσi)2+wi2)dwiμ2i{gc((xρi)2+(uix1ρi2)2)dx}dui,\displaystyle f_{{Y}_{i}^{*}|{U}_{i}^{*}>0}(y_{i};\bm{\theta})={\rho_{i}\sqrt[\ ]{1-\rho_{i}^{2}}\over\sigma_{i}}\,{{{\int_{-\infty}^{z_{i}}g_{c}(({y_{i}-\mu_{1i}\over\sigma_{i}})^{2}+w_{i}^{2})\,{\rm d}w_{i}}}\over\int_{-\mu_{2i}}^{\infty}\big{\{}\int_{-\infty}^{\infty}{g_{c}\big{(}({x\over\rho_{i}})^{2}+\big{(}{u_{i}-x\over\sqrt[\ ]{1-\rho_{i}^{2}}}\big{)}^{2}\big{)}}\,{\rm d}x\big{\}}\,{\rm d}u_{i}},

where zi=μ2i/1ρi2+ρi(yiμ1iσi)/1ρi2z_{i}=\mu_{2i}/\sqrt[\ ]{1-\rho_{i}^{2}}+{\rho_{i}}\,({y_{i}-\mu_{1i}\over\sigma_{i}})/\sqrt[\ ]{1-\rho_{i}^{2}}.

Proof.

By using the known formula for the PDF of the sum of two random variables X and Y:

fX+Y(z)=fX,Y(x,zx)dx,\displaystyle f_{X+Y}(z)=\int_{-\infty}^{\infty}f_{X,Y}(x,z-x)\,{\rm d}x,

we have that HiH_{i}, defined in (2.6), can be written as

\displaystyle H_{i}(x)={1\over\rho_{i}\sqrt{1-\rho_{i}^{2}}}{\int_{-x}^{\infty}\big{\{}\int_{-\infty}^{\infty}{g_{c}\big{(}({\zeta\over\rho_{i}})^{2}+\big{(}{u_{i}-\zeta\over\sqrt{1-\rho_{i}^{2}}}\big{)}^{2}\big{)}}\,{\rm d}\zeta\big{\}}\,{\rm d}u_{i}\over\pi\int_{0}^{\infty}g_{c}(u)\,{\rm d}{u}}.

Proposition 2.3 provides

\displaystyle f_{Z_{1i}}\Big{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\Big{)}={\int_{-\infty}^{\infty}g_{c}(({y_{i}-\mu_{1i}\over\sigma_{i}})^{2}+y^{2})\,{\rm d}y\over\pi\int_{0}^{\infty}g_{c}(u)\,{\rm d}{u}}. (2.21)

By combining Proposition 2.4, Equations (2.11) and (2.21), we have

\displaystyle G_{i}(x)={\int_{-\infty}^{x}{f_{Z_{1i},Z_{2i}}({y_{i}-\mu_{1i}\over\sigma_{i}},w_{i})}\,{\rm d}w_{i}\over f_{Z_{1i}}({y_{i}-\mu_{1i}\over\sigma_{i}})}={\int_{-\infty}^{x}g_{c}(({y_{i}-\mu_{1i}\over\sigma_{i}})^{2}+w_{i}^{2})\,{\rm d}w_{i}\over{\int_{-\infty}^{\infty}g_{c}(({y_{i}-\mu_{1i}\over\sigma_{i}})^{2}+y^{2})\,{\rm d}y}}.

Substituting the above formulas for $H_{i}$, $f_{Z_{1i}}$ and $G_{i}$ into Theorem 2.5, we complete the proof. ∎
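Equation (2.21) can be sanity-checked for a concrete generator. With the Gaussian density generator $g_{c}(x)=e^{-x/2}$, the right-hand side of (2.21) should collapse to the standard normal PDF. The following SciPy sketch verifies this numerically; the evaluation point is arbitrary:

```python
import numpy as np
from scipy import integrate, stats

gc = lambda u: np.exp(-u / 2.0)  # Gaussian density generator
z = 1.7                           # arbitrary evaluation point

# numerator and normalizer of (2.21)
num, _ = integrate.quad(lambda y: gc(z**2 + y**2), -np.inf, np.inf)
den, _ = integrate.quad(gc, 0, np.inf)
pdf_via_generator = num / (np.pi * den)
```

Analytically, the numerator equals $e^{-z^{2}/2}\sqrt{2\pi}$ and $\pi\int_{0}^{\infty}e^{-u/2}{\rm d}u=2\pi$, so the ratio is $\phi(z)$.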

2.2 Maximum likelihood estimation

By combining Equation (2) with Theorem 2.5, the following formula for the PDF of $Y_{i}$ is valid:

\displaystyle f_{Y_{i}}(y_{i};\bm{\theta})=(1-H_{i}(\mu_{2i}))^{1-u_{i}}(H_{i}(\mu_{2i}))^{u_{i}}\Biggl{[}{1\over\sigma_{i}}\,f_{Z_{1i}}\biggl{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\biggr{)}\,{G_{i}\big{(}\tau_{i}+\alpha_{i}({y_{i}-\mu_{1i}\over\sigma_{i}})\big{)}\over H_{i}(\mu_{2i})}\Biggr{]}^{u_{i}},

where $\alpha_{i}=\rho_{i}/\sqrt{1-\rho_{i}^{2}}$, $\tau_{i}=\mu_{2i}/\sqrt{1-\rho_{i}^{2}}$, $u_{i}=1$ if $u_{i}^{*}>0$ and $u_{i}=0$ otherwise, $g_{1}(\mu_{1i})=\bm{x}_{i}^{\top}\bm{\beta}$, $g_{2}(\mu_{2i})=\bm{w}_{i}^{\top}\bm{\gamma}$, $h_{1}(\sigma_{i})=\bm{z}_{i}^{\top}\bm{\lambda}$ and $h_{2}(\rho_{i})=\bm{v}_{i}^{\top}\bm{\kappa}$.

The log-likelihood function of the symmetric generalized Heckman model for $\bm{\theta}=(\bm{\beta}^{\top},\bm{\gamma}^{\top},\bm{\lambda}^{\top},\bm{\kappa}^{\top})^{\top}$ is given by

\displaystyle\ell(\bm{\theta})=\sum_{i=1}^{n}\log f_{Y_{i}}(y_{i};\bm{\theta})
=\sum_{i=1}^{n}u_{i}\biggl{[}-\log(\sigma_{i})+\log f_{Z_{1i}}\left(\frac{y_{i}-\mu_{1i}}{\sigma_{i}}\right)+\log{G_{i}\left(\tau_{i}+\alpha_{i}\left(\frac{y_{i}-\mu_{1i}}{\sigma_{i}}\right)\right)}\biggr{]}
+\sum_{i=1}^{n}(1-u_{i})\log(1-H_{i}(\mu_{2i})). (2.22)

To obtain the ML estimate of $\bm{\theta}$, we maximize the log-likelihood function (2.22) by equating the score vector $\dot{\ell}(\bm{\theta})$ to zero, which yields the likelihood equations. These are solved by means of an iterative procedure for non-linear optimization, such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method.
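The maximization step can be illustrated with a toy example. The sketch below fits a two-parameter normal likelihood by BFGS via `scipy.optimize.minimize`, using a log link to keep the dispersion positive, in the same spirit as $h_{1}(\sigma_{i})$; the sample and the likelihood are deliberately simple stand-ins, not the Heckman likelihood itself:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.5, size=5000)  # toy data with known parameters

def negloglik(theta):
    # negative log-likelihood of N(mu, sigma^2), up to an additive constant
    mu, logsig = theta
    sig = np.exp(logsig)  # log link keeps sigma > 0 during optimization
    return np.sum(np.log(sig) + 0.5 * ((y - mu) / sig) ** 2)

fit = minimize(negloglik, x0=np.array([0.0, 0.0]), method="BFGS")
mu_hat, sig_hat = fit.x[0], np.exp(fit.x[1])
```

The same pattern (negative log-likelihood plus a quasi-Newton optimizer) applies to (2.22), with the score equations below available for analytic gradients.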

The likelihood equations are given by

\displaystyle 0={\partial\ell(\bm{\theta})\over\partial\beta_{j}}=\sum_{i=1}^{n}{u_{i}\over\sigma_{i}}\Biggl{[}-{f^{\prime}_{Z_{1i}}(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})\over f_{Z_{1i}}(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})}-\frac{\alpha_{i}f_{Z_{2i}|Z_{1i}}\big{(}\alpha_{i}\,(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})+\tau_{i}\,\big{|}\,{y_{i}-\mu_{1i}\over\sigma_{i}}\big{)}}{G_{i}\big{(}\alpha_{i}\,(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})+\tau_{i}\big{)}}\Biggr{]}{\partial\mu_{1i}\over\partial\beta_{j}};
\displaystyle 0={\partial\ell(\bm{\theta})\over\partial\gamma_{r}}=\sum_{i=1}^{n}\left[{u_{i}\over\sqrt{1-\rho_{i}^{2}}}\,{f_{Z_{2i}|Z_{1i}}\big{(}\alpha_{i}\,(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})+\tau_{i}\,\big{|}\,{y_{i}-\mu_{1i}\over\sigma_{i}}\big{)}\over G_{i}\big{(}\alpha_{i}\,(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})+\tau_{i}\big{)}}-(1-u_{i})\,{f_{\rho_{i}Z_{1i}+\sqrt{1-\rho_{i}^{2}}\,Z_{2i}}(-\mu_{2i})\over 1-H_{i}(\mu_{2i})}\right]{\partial\mu_{2i}\over\partial\gamma_{r}};
\displaystyle 0={\partial\ell(\bm{\theta})\over\partial\lambda_{s}}=\sum_{i=1}^{n}{u_{i}\over\sigma_{i}}\Biggl{[}-1-{(y_{i}-\mu_{1i})\over\sigma_{i}}\,{f^{\prime}_{Z_{1i}}(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})\over f_{Z_{1i}}(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})}-\alpha_{i}\,\frac{(y_{i}-\mu_{1i})}{\sigma_{i}}\,\frac{f_{Z_{2i}|Z_{1i}}\big{(}\alpha_{i}\,(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})+\tau_{i}\,\big{|}\,{y_{i}-\mu_{1i}\over\sigma_{i}}\big{)}}{G_{i}\big{(}\alpha_{i}\,(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})+\tau_{i}\big{)}}\Biggr{]}{\partial\sigma_{i}\over\partial\lambda_{s}};
\displaystyle 0={\partial\ell(\bm{\theta})\over\partial\kappa_{m}}=\sum_{i=1}^{n}{u_{i}\over\sqrt{1-\rho_{i}^{2}}}\,\Biggl{[}{({y_{i}-\mu_{1i}\over\sigma_{i}})+\mu_{2i}\rho_{i}\over 1-\rho_{i}^{2}}\Biggr{]}\frac{f_{Z_{2i}|Z_{1i}}\big{(}\alpha_{i}\,(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})+\tau_{i}\,\big{|}\,{y_{i}-\mu_{1i}\over\sigma_{i}}\big{)}}{G_{i}\big{(}\alpha_{i}\,(\frac{y_{i}-\mu_{1i}}{\sigma_{i}})+\tau_{i}\big{)}}\,{\partial\rho_{i}\over\partial\kappa_{m}};

where

{\partial\mu_{1i}\over\partial\beta_{j}}={x_{ij}\over g_{1}^{\prime}(\mu_{1i})},\quad j=1,\ldots,k;\ i=1,\ldots,n;
{\partial\mu_{2i}\over\partial\gamma_{r}}={w_{ir}\over g_{2}^{\prime}(\mu_{2i})},\quad r=1,\ldots,l;\ i=1,\ldots,n;
{\partial\sigma_{i}\over\partial\lambda_{s}}={z_{is}\over h_{1}^{\prime}(\sigma_{i})},\quad s=1,\ldots,p;\ i=1,\ldots,n;
{\partial\rho_{i}\over\partial\kappa_{m}}={v_{im}\over h_{2}^{\prime}(\rho_{i})},\quad m=1,\ldots,q;\ i=1,\ldots,n.

3 Generalized Heckman-Student-$t$ model

The generalized Heckman-normal model proposed by Bastos et al., (2021) is a special case of (2.3) in which the underlying distribution is bivariate normal. In this work, we focus on the generalized Heckman-$t$ model, which is based on the bivariate Student-$t$ (B$t$) distribution. This distribution is a good alternative within the symmetric family because it has heavier tails than the bivariate normal distribution. From (2.2), if $\bm{Y}=(Y_{1},Y_{2})^{\top}$ follows a B$t$ distribution, then the associated PDF is given by

\displaystyle f(\bm{y};\bm{\mu},\bm{\Sigma},\nu)={\frac{1}{|\bm{\Sigma}|^{1/2}Z_{g_{c}}}}\left\{1+\frac{(\bm{y}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{y}-\bm{\mu})}{\nu}\right\}^{-(\nu+2)/2}, (3.1)

where $\nu$ is the number of degrees of freedom. Here, the density generator of the B$t$ distribution is $g_{c}(x)=(1+x/\nu)^{-(\nu+2)/2}$, $|\bm{\Sigma}|=\sigma_{i}^{2}(1-\rho_{i}^{2})$, and $Z_{g_{c}}=[\nu\pi\Gamma(\nu/2)]/\Gamma((\nu+2)/2)$ is the normalization constant. Therefore, if $(Y^{*}_{i},U^{*}_{i})^{\top}$ follows a B$t$ distribution, then, by Corollary 2.8, the PDF of $Y^{*}_{i}\,|\,U^{*}_{i}>0$ is written as

\displaystyle f_{{Y}_{i}^{*}|{U}_{i}^{*}>0}(y_{i};\mu_{1i},\sigma^{2}_{i},\alpha_{i},\tau_{i},\nu)={1\over\sigma_{i}}\,f_{\nu}\biggl{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\biggr{)}\,\frac{F_{\nu+1}\Big{(}\sqrt{{\nu+1\over\nu+({y_{i}-\mu_{1i}\over\sigma_{i}})^{2}}}\ \big{(}\tau_{i}+\alpha_{i}({y_{i}-\mu_{1i}\over\sigma_{i}})\big{)}\Big{)}}{F_{\nu}\big{(}\tau_{i}/\sqrt{1+\alpha_{i}^{2}}\,\big{)}},

where $f_{\nu}$ and $F_{\nu}$ are the PDF and CDF, respectively, of a univariate Student-$t$ distribution with $\nu$ degrees of freedom, $\alpha_{i}=\rho_{i}/\sqrt{1-\rho_{i}^{2}}$ and $\tau_{i}=\mu_{2i}/\sqrt{1-\rho_{i}^{2}}$. The log-likelihood function for $\bm{\theta}=(\bm{\beta}^{\top},\bm{\gamma}^{\top},\bm{\lambda}^{\top},\bm{\kappa}^{\top},\nu)^{\top}$ is given by

\displaystyle\ell(\bm{\theta})=\sum_{i=1}^{n}\log f_{Y_{i}}(y_{i};\bm{\theta})
=\sum_{i=1}^{n}\log\left\{(F_{\nu}(-\mu_{2i}))^{1-u_{i}}(F_{\nu}(\mu_{2i}))^{u_{i}}(f_{{Y}_{i}^{*}|{U}_{i}^{*}>0}(y_{i};\mu_{1i},\sigma^{2}_{i},\alpha_{i},\tau_{i},\nu))^{u_{i}}\right\}
=\sum_{i=1}^{n}u_{i}\left[-\log(\sigma_{i})+\log f_{\nu}\left(\frac{y_{i}-\mu_{1i}}{\sigma_{i}}\right)+\log F_{\nu+1}\left(\sqrt{{\nu+1\over\nu+({y_{i}-\mu_{1i}\over\sigma_{i}})^{2}}}\ \big{(}\tau_{i}+\alpha_{i}\big{(}{y_{i}-\mu_{1i}\over\sigma_{i}}\big{)}\big{)}\right)\right]
+\sum_{i=1}^{n}(1-u_{i})\log F_{\nu}(-\mu_{2i}), (3.2)

where $u_{i}=1$ if $u_{i}^{*}>0$ and $u_{i}=0$ otherwise, and $\mu_{1i}$, $\mu_{2i}$, $\sigma_{i}$ and $\rho_{i}$ are as in (2.3). The ML estimate of $\bm{\theta}$ is obtained by maximizing the log-likelihood function (3.2), that is, by equating the score vector $\dot{\ell}(\bm{\theta})$ (given in Subsection 2.2) to zero, which yields the likelihood equations. They are solved using an iterative procedure for non-linear optimization, such as the BFGS quasi-Newton method.
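As a sketch of how the ingredients of (3.2) can be coded, the snippet below implements the conditional PDF of $Y^{*}_{i}\,|\,U^{*}_{i}>0$ and a per-observation log-likelihood contribution with `scipy.stats.t`. The parameter values are hypothetical, chosen only for illustration; a numerical integration confirms that the conditional density integrates to one, and note that $\tau_{i}/\sqrt{1+\alpha_{i}^{2}}=\mu_{2i}$:

```python
import numpy as np
from scipy import stats, integrate

# hypothetical parameter values for a single observation (illustrative only)
nu, mu1, sigma, mu2, rho = 4.0, 1.0, 1.5, 0.5, 0.4
alpha = rho / np.sqrt(1 - rho**2)
tau = mu2 / np.sqrt(1 - rho**2)

def cond_pdf(y):
    # PDF of Y*_i | U*_i > 0 for the generalized Heckman-t model
    z = (y - mu1) / sigma
    scale = np.sqrt((nu + 1) / (nu + z**2))
    num = stats.t.pdf(z, df=nu) * stats.t.cdf(scale * (tau + alpha * z), df=nu + 1)
    return num / (sigma * stats.t.cdf(mu2, df=nu))  # F_nu(tau/sqrt(1+alpha^2)) = F_nu(mu2)

total, _ = integrate.quad(cond_pdf, -np.inf, np.inf)  # should be ~1

def loglik_obs(y, u):
    # contribution of one observation to the log-likelihood (3.2)
    if u == 0:
        return stats.t.logcdf(-mu2, df=nu)
    z = (y - mu1) / sigma
    scale = np.sqrt((nu + 1) / (nu + z**2))
    return (-np.log(sigma) + stats.t.logpdf(z, df=nu)
            + stats.t.logcdf(scale * (tau + alpha * z), df=nu + 1))
```

For a selected observation, the contribution equals $\log f_{Y^{*}|U^{*}>0}(y)+\log F_{\nu}(\mu_{2})$, which can be used as a cross-check.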

4 Monte Carlo simulation

In this section, we carry out Monte Carlo simulation studies to evaluate the performance of the ML estimators under the symmetric generalized Heckman model. We focus on the generalized Heckman-$t$ model and consider three different sets of true parameter values, leading to scenarios that cover moderate to high censoring percentages. The studies consider simulated data generated from each scenario according to

\mu_{1i}=\beta_{1}+\beta_{2}x_{1i}+\beta_{3}x_{2i}, (4.1)
\mu_{2i}=\gamma_{1}+\gamma_{2}x_{1i}+\gamma_{3}x_{2i}+\gamma_{4}x_{3i}, (4.2)
\log\sigma_{i}=\lambda_{1}+\lambda_{2}x_{1i}, (4.3)
\text{arctanh}\,\rho_{i}=\kappa_{1}+\kappa_{2}x_{1i}, (4.4)

for $i=1,\ldots,n$, where $x_{1i}$, $x_{2i}$ and $x_{3i}$ are covariates obtained from a standard normal distribution, $N(0,1)$. Moreover, the simulation scenarios consider sample sizes $n\in\{500,1000,2000\}$ and $\nu=4$, with $\text{NREP}=1000$ Monte Carlo replicates for each sample size. In the structure presented in (4.1)-(4.4), $\mu_{1i}$ is the equation of primary interest, while $\mu_{2i}$ represents the selection equation. All numerical calculations were carried out in the R software; see R Core Team, (2022).
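The data-generating scheme in (4.1)-(4.4) can be sketched as follows. Bivariate Student-$t$ errors with observation-specific correlation are built by dividing correlated standard normals by $\sqrt{\chi^{2}_{\nu}/\nu}$; the parameter values below are those of Scenario 2, and the sketch is a NumPy illustration rather than the authors' R code:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(n, beta, gamma, lam, kappa, nu=4):
    # covariates drawn from N(0,1), as described in the text
    x1, x2, x3 = rng.standard_normal((3, n))
    mu1 = beta[0] + beta[1] * x1 + beta[2] * x2                      # (4.1)
    mu2 = gamma[0] + gamma[1] * x1 + gamma[2] * x2 + gamma[3] * x3   # (4.2)
    sigma = np.exp(lam[0] + lam[1] * x1)                             # (4.3), log link
    rho = np.tanh(kappa[0] + kappa[1] * x1)                          # (4.4), arctanh link

    # bivariate Student-t errors: correlated normals over sqrt(chi2_nu / nu)
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    q = np.sqrt(rng.chisquare(nu, n) / nu)
    ystar = mu1 + sigma * z1 / q
    ustar = mu2 + z2 / q

    u = (ustar > 0).astype(int)            # selection indicator
    y = np.where(u == 1, ystar, np.nan)    # outcome observed only when selected
    return y, u

# Scenario 2 true values
beta, gamma = [1.0, 0.7, 1.1], [0.9, 0.5, 1.1, 0.6]
lam, kappa = [-0.2, 1.2], [0.7, 0.3]
y, u = simulate(2000, beta, gamma, lam, kappa)
censoring = 1 - u.mean()                   # around 30% for this scenario
```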

The performance of the ML estimators is evaluated through the bias and mean squared error (MSE), computed from the Monte Carlo replicates as

\widehat{\textrm{Bias}}(\widehat{\theta})=\frac{1}{\text{NREP}}\sum_{i=1}^{\text{NREP}}\big(\widehat{\theta}^{(i)}-\theta\big)\quad\text{and}\quad\widehat{\mathrm{MSE}}(\widehat{\theta})=\frac{1}{\text{NREP}}\sum_{i=1}^{\text{NREP}}\big(\widehat{\theta}^{(i)}-\theta\big)^{2}, (4.5)

where $\theta$ is the true parameter value, $\widehat{\theta}^{(i)}$ is its $i$-th ML estimate, and NREP is the number of Monte Carlo replicates.
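In code, the empirical bias and MSE in (4.5) are one-liners; a minimal sketch with a toy set of three replicates:

```python
import numpy as np

def bias_mse(estimates, theta):
    # empirical bias and MSE over Monte Carlo replicates, as in (4.5)
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - theta
    mse = np.mean((est - theta) ** 2)
    return bias, mse

# toy example: three replicates of an estimate of theta = 2.0
bias, mse = bias_mse([1.9, 2.0, 2.2], 2.0)
```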

We consider the following sets of true parameter values for the regression structure in (4.1)-(4.4):

  • Scenario 1) $\bm{\beta}=(1.1,0.7,0.1)^{\top}$, $\bm{\gamma}=(0.9,0.5,1.1,0.6)^{\top}$, $\bm{\lambda}=(-0.4,0.7)^{\top}$, and $\bm{\kappa}=(0.3,0.5)^{\top}$.

  • Scenario 2) $\bm{\beta}=(1.0,0.7,1.1)^{\top}$, $\bm{\gamma}=(0.9,0.5,1.1,0.6)^{\top}$, $\bm{\lambda}=(-0.2,1.2)^{\top}$, and $\bm{\kappa}=(0.7,0.3)^{\top}$ or $\bm{\kappa}=(-0.7,0.3)^{\top}$.

  • Scenario 3) $\bm{\beta}=(1.1,0.7,0.1)^{\top}$, $\bm{\gamma}=(0,0.5,1.1,0.6)^{\top}$, $\bm{\lambda}=(-0.4,1.2)^{\top}$, and $\bm{\kappa}=(-0.3,-0.3)^{\top}$ (moderate correlation) or $\bm{\kappa}=(-0.7,-0.7)^{\top}$ (strong correlation).

To keep the censoring proportion around 50%, a threshold greater than zero was used in Scenario 1, so that $U_{i}^{*}>a$. According to Bastos et al., (2021), the value of $a$ is in general zero, as any other value would be absorbed by the intercept, so considering another value does not cause problems for the model. In Scenario 2, the dispersion and correlation parameters were changed and the censoring proportion was kept around 30%. In Scenario 3, a censoring rate of around 50% was obtained by changing the parameters of the selection equation $\mu_{2i}$.

The ML estimation results for Scenarios 1), 2) and 3) are presented in Tables 1-3, respectively, in which the bias and MSE are reported. Since the ML estimators are consistent and asymptotically normally distributed, we expect the bias and MSE to approach zero as $n$ grows. Moreover, we expect the performance of the estimates to deteriorate as the censoring proportion (%) grows. The results in Tables 1-3 show that, as the sample size increases, both the bias and the MSE decrease, as expected. In addition, the performance of the estimates deteriorates when the censoring proportion increases.

Table 1: Bias and MSE for the indicated ML estimates of the generalized Heckman-$t$ model parameters (Scenario 1).
Generalized Heckman-$t$ Generalized Heckman-$t$
Parameters n Censoring Bias MSE Censoring Bias MSE
β1\beta_{1}     1.1 500 30.8878 0.0034 0.0053 54.0284 0.0082 0.0148
1000 30.9352 0.0019 0.0025 53.421 0.0016 0.0060
2000 31.0035 0.0021 0.0011 52.8706 0.0058 0.0030
β2\beta_{2}     0.7 500 30.8878 0.0037 0.0014 54.0284 0.0060 0.0034
1000 30.9352 0.0020 0.0007 53.421 0.0033 0.0013
2000 31.0035 0.0016 0.0003 52.8706 0.0020 0.0005
β3\beta_{3}     0.1 500 30.8878 0.0009 0.002 54.0284 -0.0014 0.0045
1000 30.9352 -0.0003 0.0008 53.421 0.0003 0.0017
2000 31.0035 0.0008 0.0004 52.8706 -0.0022 0.0008
γ1\gamma_{1}     0.9 500 30.8878 0.0163 0.0137 54.0284 -1.0167 1.0443
1000 30.9352 0.0055 0.0065 53.421 -1.0095 1.0246
2000 31.0035 -0.0008 0.0032 52.8706 -0.9981 0.9980
γ2\gamma_{2}     0.5 500 30.8878 0.0066 0.0091 54.0284 0.0071 0.0077
1000 30.9352 0.0064 0.0044 53.421 0.0026 0.0039
2000 31.0035 0.0025 0.0021 52.8706 0.0021 0.0019
γ3\gamma_{3}     1.1 500 30.8878 0.0177 0.0194 54.0284 0.0131 0.0170
1000 30.9352 0.0081 0.0085 53.421 0.0090 0.0074
2000 31.0035 0.0032 0.0041 52.8706 0.0044 0.0038
γ4\gamma_{4}     0.6 500 30.8878 0.0091 0.0110 54.0284 0.0076 0.0089
1000 30.9352 0.0081 0.0085 53.421 0.0043 0.0045
2000 31.0035 0.0028 0.0026 52.8706 0.0020 0.0020
κ1\kappa_{1}     0.3 500 30.8878 0.0139 0.0564 54.0284 0.0034 0.0467
1000 30.9352 0.0048 0.0048 53.421 0.0080 0.0207
2000 31.0035 -0.0005 0.0102 52.8706 -0.0029 0.0097
κ2\kappa_{2}     0.5 500 30.8878 0.0589 0.0554 54.0284 0.0416 0.0383
1000 30.9352 0.0054 0.0225 53.421 0.0202 0.0157
2000 31.0035 0.0120 0.0083 52.8706 0.0136 0.0064
λ1\lambda_{1}     -0.4 500 30.8878 0.0011 0.0049 54.0284 -0.0029 0.0075
1000 30.9352 0.0210 0.018 53.421 0.0009 0.0035
2000 31.0035 0.0018 0.0011 52.8706 -0.0014 0.0018
λ2\lambda_{2}     0.7 500 30.8878 0.0017 0.0031 54.0284 0.0037 0.0046
1000 30.9352 0.0006 0.0023 53.421 0.0026 0.0021
2000 31.0035 0.0007 0.0007 52.8706 0.0006 0.0010
ν\nu      4 500 30.8878 0.3633 2.3387 54.0284 0.6463 8.4442
1000 30.9352 0.1506 0.5565 53.421 0.2221 0.8141
2000 31.0035 0.0833 0.2377 52.8706 0.0820 0.3078
Table 2: Bias and MSE for the indicated ML estimates of the generalized Heckman-$t$ model parameters (Scenario 2).
Generalized Heckman-$t$ Generalized Heckman-$t$
n Parameters Censoring Bias MSE Parameters Censoring Bias MSE
500 β1\beta_{1}     1.0 31.017 0.0048 0.0045 β1\beta_{1}     1.0 30.9268 0.0052 0.0039
1000 30.9395 0.0031 0.0022 30.8644 -0.0002 0.0016
2000 30.9098 0.0013 0.0009 30.9734 0.0005 0.0009
500 β2\beta_{2}     0.7 31.017 0.0013 0.0009 β2\beta_{2}     0.7 30.9268 0.0045 0.0008
1000 30.9395 -0.0002 0.0046 30.8644 0.0010 0.0003
2000 30.9098 0.0002 0.0001 30.9734 0.0006 0.0001
500 β3\beta_{3}     1.1 31.017 -0.001 0.0011 β3\beta_{3}     1.1 30.9268 0.0001 0.0007
1000 30.9395 0.0015 0.0008 30.8644 0.0009 0.0003
2000 30.9098 -0.0007 0.0002 30.9734 0.0003 0.0001
500 γ1\gamma_{1}     0.9 31.017 0.0071 0.0141 γ1\gamma_{1}     0.9 30.9268 0.0108 0.0153
1000 30.9395 0.0165 0.0814 30.8644 0.0205 0.0555
2000 30.9098 0.0036 0.0031 30.9734 0.0027 0.0034
500 γ2\gamma_{2}     0.5 31.017 0.0072 0.0090 γ2\gamma_{2}     0.5 30.9268 0.0111 0.0096
1000 30.9395 0.0076 0.0101 30.8644 0.0115 0.0270
2000 30.9098 -0.0005 0.0020 30.9734 0.0016 0.0035
500 γ3\gamma_{3}     1.1 31.017 0.0130 0.0180 γ3\gamma_{3}     1.1 30.9268 0.0143 0.0175
1000 30.9395 0.0151 0.0254 30.8644 0.0170 0.0450
2000 30.9098 0.0020 0.0041 30.9734 0.0028 0.0051
500 γ4\gamma_{4}     0.6 31.017 0.0067 0.0102 γ4\gamma_{4}     0.6 30.9268 -0.0003 0.0094
1000 30.9395 0.0064 0.0195 30.8644 0.0077 0.0139
2000 30.9098 0.001 0.0023 30.9734 0.0023 0.0024
500 κ1\kappa_{1}     0.7 31.017 0.0299 0.0560 κ1\kappa_{1}     -0.7 30.9268 -0.0449 0.0662
1000 30.9395 0.0197 0.0628 30.8644 -0.0198 0.033
2000 30.9098 0.0031 0.0089 30.9734 -0.0058 0.0218
500 κ2\kappa_{2}     0.3 31.017 0.052 0.0702 κ2\kappa_{2}     0.3 30.9268 0.0363 0.067
1000 30.9395 0.0197 0.0576 30.8644 0.0141 0.0632
2000 30.9098 0.0097 0.0097 30.9734 0.0108 0.0345
500 λ1\lambda_{1}     -0.2 31.017 0.0004 0.0053 λ1\lambda_{1}     -0.2 30.9268 0.0043 0.0054
1000 30.9395 -0.0018 0.0041 30.8644 -0.0031 0.0039
2000 30.9098 0.0019 0.0012 30.9734 -0.0004 0.0013
500 λ2\lambda_{2}     1.2 31.017 0.0032 0.0038 λ2\lambda_{2}     1.2 30.9268 0.0096 0.0035
1000 30.9395 0.0018 0.0020 30.8644 0.0058 0.0016
2000 30.9098 -0.0001 0.0007 30.9734 0.0024 0.0010
500 ν\nu      4 31.017 0.4404 2.3837 ν\nu      4 30.9268 0.5096 17.9132
1000 30.9395 0.1418 0.6321 30.8644 0.1619 0.6922
2000 30.9098 0.1234 0.2899 30.9734 0.0789 0.2731
Table 3: Bias and MSE for the indicated ML estimates of the generalized Heckman-$t$ model parameters (Scenario 3).
Generalized Heckman-$t$ Generalized Heckman-$t$
n Parameters Censoring Bias MSE Parameters Censoring Bias MSE
500 β1\beta_{1}     1.1 49.8604 -0.0037 0.0094 β1\beta_{1}     1.1 49.9824 -0.0109 0.0072
1000 49.9469 -0.0008 0.0035 49.9075 -0.0049 0.003
2000 50.0265 -0.0012 0.0017 49.925 -0.0046 0.0011
500 β2\beta_{2}     0.7 49.8604 -0.0025 0.0016 β2\beta_{2}     0.7 49.9824 -0.0057 0.0014
1000 49.9469 -0.0006 0.0005 49.9075 -0.0026 0.0005
2000 50.0265 -0.0009 0.0002 49.925 -0.0025 0.0002
500 β3\beta_{3}     0.1 49.8604 0.0001 0.0014 β3\beta_{3}     0.1 49.9824 0.0017 0.0011
1000 49.9469 <0.0001 0.0005 49.9075 0.0004 0.0004
2000 50.0265 0.0005 0.0002 49.925 -0.0002 0.0002
500 γ1\gamma_{1}     0 49.8604 0.0025 0.0065 γ1\gamma_{1}     0 49.9824 0.0012 0.0063
1000 49.9469 0.0036 0.003 49.9075 0.0029 0.004
2000 50.0265 -0.0023 0.0015 49.925 0.0022 0.0014
500 γ2\gamma_{2}     0.5 49.8604 0.0067 0.0078 γ2\gamma_{2}     0.5 49.9824 0.0051 0.0071
1000 49.9469 0.0061 0.0035 49.9075 0.0036 0.0051
2000 50.0265 0.0009 0.0019 49.925 0.0003 0.0015
500 γ3\gamma_{3}     1.1 49.8604 0.009 0.0161 γ3\gamma_{3}     1.1 49.9824 0.0095 0.0155
1000 49.9469 0.0047 0.008 49.9075 0.0116 0.0185
2000 50.0265 0.0024 0.0037 49.925 0.0046 0.0035
500 γ4\gamma_{4}     0.6 49.8604 0.0010 0.0091 γ4\gamma_{4}     0.6 49.9824 0.0048 0.0083
1000 49.9469 0.0014 0.0042 49.9075 0.0056 0.0086
2000 50.0265 -0.001 0.0023 49.925 0.0038 0.002
500 κ1\kappa_{1}     -0.3 49.8604 -0.0105 0.04 κ1\kappa_{1}     -0.7 49.9824 -0.0166 0.0437
1000 49.9469 -0.0107 0.0173 49.9075 -0.0061 0.0183
2000 50.0265 -0.0015 0.0085 49.925 -0.0034 0.0078
500 κ2\kappa_{2}     -0.3 49.8604 -0.0302 0.0394 κ2\kappa_{2}     -0.7 49.9824 -0.0831 0.0609
1000 49.9469 -0.0189 0.0163 49.9075 -0.0317 0.0195
2000 50.0265 -0.0053 0.007 49.925 -0.0214 0.0086
500 λ1\lambda_{1}     -0.4 49.8604 0.0032 0.0067 λ1\lambda_{1}     -0.4 49.9824 -0.0041 0.0072
1000 49.9469 -0.0003 0.0035 49.9075 -0.006 0.0034
2000 50.0265 0.0005 0.0016 49.925 -0.0033 0.0018
500 λ2\lambda_{2}     1.2 49.8604 0.0062 0.004 λ2\lambda_{2}     1.2 49.9824 0.0039 0.0038
1000 49.9469 0.0057 0.0019 49.9075 0.0029 0.0018
2000 50.0265 0.0026 0.0009 49.925 0.0016 0.0008
500 ν\nu      4 49.8604 0.5755 3.9653 ν\nu      4 49.9824 0.546 8.083
1000 49.9469 0.2316 0.8686 49.9075 0.1677 0.793
2000 50.0265 0.1139 0.3197 49.925 0.0841 0.4107

5 Application to real data

In this section, two real data sets, corresponding to outpatient expense and investments in education, are analyzed. The outpatient expense data set has already been analyzed in the literature using Heckman models (Marchenko and Genton, 2012), whereas the education investment data set is new and is analyzed here for the first time.

5.1 Outpatient expense

In this subsection, a real data set on outpatient expenses from the 2001 Medical Expenditure Panel Survey (MEPS) is used to illustrate the proposed methodology. This data set contains information about the cost and provision of outpatient services and offers the most complete coverage of health insurance in the United States, according to the Agency for Healthcare Research and Quality (AHRQ).

The MEPS data set contains information collected from 3328 individuals aged between 21 and 64 years. The variable of interest is the expenditure on medical services on the logarithmic scale ($Y_{i}^{*}=lnambx$), while the latent variable ($U_{i}^{*}=dambexp$) is the willingness of the individual to spend; $U_{i}=\mathds{1}_{\{U_{i}^{*}>0\}}$ corresponds to the decision of the individual to spend. It was verified that 526 (15.8%) of the outpatient costs are identified as zero (censored). The covariates considered in the study are: $age$, the age measured in tens of years; $fem$, a dummy variable taking value 1 for women and 0 for men; $educ$, the years of education; $blhisp$, a dummy variable for ethnicity (1 for black or Hispanic and 0 for non-black and non-Hispanic); $totchr$, the total number of chronic diseases; $ins$, the insurance status; and $revenue$, the individual income.

Table 4 reports descriptive statistics of the observed medical expenditures, including the minimum, mean, median, maximum, standard deviation (SD), coefficient of variation (CV), coefficient of skewness (CS) and coefficient of (excess) kurtosis (CK) values. From this table, we note the following: the mean is almost equal to the median; the skewness is slightly negative; and the excess kurtosis is very small. The symmetric nature of the data is confirmed by the histogram shown in Figure 1(a). The boxplot shown in Figure 1(b) indicates some potential outliers. Therefore, a symmetric distribution is a reasonable assumption, more specifically a Student-$t$ model, since we need to accommodate the outliers.

Table 4: Summary statistics for the medical expenditure data.
Minimum Mean Median Maximum SD CV CS CK n
0.693 6.557 6.659 10.819 1.406 21.434% -0.315 0.006 2801
Figure 1: Histogram (a) and boxplots (b) for the medical expenditure data.
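The summary statistics of Table 4 (SD, CV in percent, skewness CS, excess kurtosis CK) are standard sample quantities. The sketch below computes them for a synthetic, roughly symmetric sample that stands in for the log expenditures; it is an illustration of the definitions, not a reproduction of the MEPS data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# synthetic stand-in for the log expenditures (hypothetical, for illustration)
x = rng.standard_t(df=12, size=2801) * 1.4 + 6.56

summary = {
    "mean": x.mean(),
    "median": np.median(x),
    "SD": x.std(ddof=1),
    "CV": 100 * x.std(ddof=1) / x.mean(),  # coefficient of variation, in %
    "CS": stats.skew(x),                    # coefficient of skewness
    "CK": stats.kurtosis(x),                # excess kurtosis
}
```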

We then analyze the medical expenditure data using the generalized Heckman-$t$ model, expressed as

lnambx_{i}=\beta_{0}+\beta_{1}\,age_{i}+\beta_{2}\,fem_{i}+\beta_{3}\,educ_{i}+\beta_{4}\,blhisp_{i}+\beta_{5}\,totchr_{i}+\beta_{6}\,ins_{i}, (5.1)
dambexp_{i}=\gamma_{0}+\gamma_{1}\,age_{i}+\gamma_{2}\,fem_{i}+\gamma_{3}\,educ_{i}+\gamma_{4}\,blhisp_{i}+\gamma_{5}\,totchr_{i}+\gamma_{6}\,ins_{i}+\gamma_{7}\,revenue_{i}, (5.2)
\log\sigma_{i}=\lambda_{0}+\lambda_{1}\,age_{i}+\lambda_{2}\,totchr_{i}+\lambda_{3}\,ins_{i}, (5.3)
\text{arctanh}\,\rho_{i}=\kappa_{0}+\kappa_{1}\,fem_{i}+\kappa_{2}\,totchr_{i}. (5.4)

We initially compare the fit of the generalized Heckman-$t$ (GH$t$) model, in terms of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), with those of the classical Heckman-normal (CHN) (Heckman, 1976) and generalized Heckman-normal (GHN) (Bastos et al., 2021) models; see Table 5. The AIC and BIC values reveal that the GH$t$ model provides the best fit, followed by the GHN model.

Table 5: AIC and BIC of the indicated Heckman models.
CHN GHN GH$t$
AIC 11706.44 11660.29 11635.05
BIC 11810.31 11794.71 11775.59

Table 6 presents the estimation results of the GHN and GH$t$ models. From this table, we observe the following: the explanatory variables $totchr$ and $ins$ in the dispersion component are significant in both models, indicating the presence of heteroscedasticity in the data, while $age$ is significant only in the GHN model. For the correlation component, the covariates $fem$ and $totchr$ are significant in both models, which indicates the presence of selection bias in the data.

In the outcome equation, for both models, $age$, $fem$, $blhisp$ and $totchr$ are significant at the 5% level, and $educ$ is not significant. The explanatory variable $ins$ is significant at the 5% and 10% levels in the GHN and GH$t$ models, respectively; see Table 6. We can interpret the estimated coefficients in terms of their effect on the expenditure on medical services; see Weisberg, (2014, p.82). For example, a one-unit increase in $age$ (i.e., ten years) raises the expected expenditure on medical services by $(\exp(0.1838)-1)\times 100=20.18\%$ and $(\exp(0.1895)-1)\times 100=20.86\%$ according to the GHN and GH$t$ models, respectively. Moreover, a 1-unit increase in $totchr$ raises the expected value of the response by $(\exp(0.4306)-1)\times 100=53.82\%$ and $(\exp(0.4464)-1)\times 100=56.27\%$ according to the GHN and GH$t$ models, respectively.

In the case of the selection equation, we observe that, for both models, the explanatory variables $age$, $fem$, $educ$, $blhisp$, $totchr$ and $ins$ are significant at the 5% level and $revenue$ is significant at the 10% level; see Table 6. The interpretation is made in terms of the odds ratio, which is obtained by exponentiating the estimated coefficient. For example, in the GH$t$ case, the odds ratio for $age$ is $\exp(0.0930)=1.0975$, suggesting that each additional unit of $age$ raises the likelihood of an individual having expenditures on medical services by $(1.0975-1)\times 100=9.75\%$.
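Both interpretations are simple transformations of the fitted coefficients. The sketch below reproduces the percent-effect and odds-ratio arithmetic using the Table 6 estimates:

```python
from math import exp

# percent change in the expected expenditure per unit increase (outcome equation)
pct_age_ghn = (exp(0.1838) - 1) * 100  # GHN coefficient of age, Table 6
pct_age_ght = (exp(0.1895) - 1) * 100  # GHt coefficient of age, Table 6

# odds ratio for age in the selection equation (GHt)
or_age = exp(0.0930)
```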

Table 6: Estimation results of the GHN and GH$t$ models.
Probit selection equation
Variables Estimates Std. Error $t$ Value $p$-value
GHN GH$t$ GHN GH$t$ GHN GH$t$ GHN GH$t$
(Intercept) -0.5903 -0.6406 0.1867 0.2014 -3.1620 -3.1800 <0.001 <0.001
age 0.0863 0.0930 0.0264 0.0288 3.2610 3.2230 <0.001 <0.001
fem 0.6299 0.7087 0.0597 0.0681 10.543 10.395 <0.001 <0.001
educ 0.0569 0.0590 0.0114 0.0122 4.9830 4.8110 <0.001 <0.001
blhisp -0.3368 -0.3726 0.0596 0.0647 -5.6430 -5.7590 <0.001 <0.001
totchr 0.7585 0.8728 0.0686 0.0858 11.043 10.168 <0.001 <0.001
ins 0.1727 0.1863 0.0611 0.0665 2.8240 2.7990 <0.001 <0.001
revenue 0.0022 0.0025 0.0012 0.0013 1.8380 1.8750 0.0661 0.0610
Outcome equation
Variables Estimates Std. Error $t$ Value $p$-value
GHN GH$t$ GHN GH$t$ GHN GH$t$ GHN GH$t$
(Intercept) 5.7041 5.6078 0.1930 0.1912 29.553 29.316 <0.001 <0.001
age 0.1838 0.1895 0.0234 0.0230 7.8460 8.2350 <0.001 <0.001
fem 0.2498 0.2555 0.0587 0.0580 4.2530 4.4000 <0.001 <0.001
educ 0.0013 0.0062 0.0101 0.0100 0.1290 0.6240 0.8970 0.5329
blhisp -0.1283 -0.1344 0.0577 0.0569 -2.2210 -2.3630 0.0264 0.0182
totchr 0.4306 0.4464 0.0305 0.0297 14.115 15.002 <0.001 <0.001
ins -0.1027 -0.0976 0.0513 0.0501 -2.000 -1.9470 0.0456 0.0516
Dispersion
Variables Estimates Std. Error $t$ Value $p$-value
GHN GH$t$ GHN GH$t$ GHN GH$t$ GHN GH$t$
(Intercept) 0.5081 0.4172 0.0573 0.0643 8.8550 6.4820 <0.001 <0.001
age -0.0249 -0.0209 0.0125 0.0136 -1.9870 -1.5360 0.0469 0.1246
totchr -0.1046 -0.1118 0.0191 0.0208 -5.4760 -5.3720 <0.001 <0.001
ins -0.1070 -0.1117 0.0277 0.0303 -3.8630 -3.6810 <0.001 <0.001
Correlation
Variables Estimates Std. Error $t$ Value $p$-value
GHN GH$t$ GHN GH$t$ GHN GH$t$ GHN GH$t$
(Intercept) -0.6475 -0.6051 0.1143 0.1118 -5.666 -5.413 <0.001 <0.001
fem -0.4040 -0.4220 0.1356 0.1489 -2.978 -2.835 <0.001 <0.001
totchr -0.4379 -0.4999 0.1862 0.2102 -2.351 -2.378 0.0187 0.0174
$\nu$     - 12.3230     - 2.7570     - 4.4690     - <0.001

Figure 2 displays the quantile-quantile (QQ) plots of the martingale-type (MT) residuals for the GHN and GH$t$ models. This residual is given by

r_{i}^{\text{MT}}=\textrm{sign}(r_{i}^{\textrm{M}})\sqrt{-2\left(r_{i}^{\textrm{M}}+u_{i}\log(u_{i}-r_{i}^{\textrm{M}})\right)},\quad i=1,\dots,n, (5.5)

where $r_{i}^{\textrm{M}}=u_{i}+\log(\widehat{S}(t_{i}))$, $\widehat{S}(t_{i})$ is the fitted survival function, and $u_{i}=0$ or $1$ indicates that case $i$ is censored or not, respectively; see Therneau et al., (1990). If the model is correctly specified, the MT residual is asymptotically standard normal, whatever the specification of the model. From Figure 2, we see clearly that the GH$t$ model provides a better fit than the GHN model.
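The MT residual (5.5) is straightforward to vectorize. A minimal NumPy sketch, where `S_hat` stands for the fitted survival probabilities (a name assumed here for illustration):

```python
import numpy as np

def mt_residuals(u, S_hat):
    # martingale-type residuals (5.5); u_i = 1 if observed, 0 if censored
    u = np.asarray(u, dtype=float)
    rM = u + np.log(np.asarray(S_hat, dtype=float))  # martingale residual r^M_i
    inner = rM.copy()
    obs = u == 1
    inner[obs] += np.log(1.0 - rM[obs])              # u_i*log(u_i - r^M_i) with u_i = 1
    return np.sign(rM) * np.sqrt(-2.0 * inner)

r = mt_residuals([1, 0], [0.5, 0.8])
```

The argument of the square root is non-negative in both cases, since $r+\log(1-r)\le 0$ for $r<1$ and $r^{\textrm{M}}_{i}\le 0$ when $u_{i}=0$.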

(a) GHN
(b) GH$t$
Figure 2: QQ plots of the MT residuals for the GHN and GH$t$ models.

5.2 Investments in education

In this subsection, data on education investments are used to illustrate the proposed methodology. We consider the investments made by municipalities of two Brazilian states: Sao Paulo (SP) and Minas Gerais (MG). This data set was obtained from the website of the Brazilian National Fund for Educational Development (FNDE), a federal agency under the Ministry of Education (filtered data are available at https://repositorio.shinyapps.io/plataforma_de_dados_municipais). The investments originate from the Fund for the Maintenance and Development of Basic Education and the Valorization of Education Professionals (FUNDEB). The resources are distributed to the 27 federative units (26 states plus the Federal District) according to the number of students enrolled in their basic education networks. This rule is based on the school census data of the previous year; e.g., 2018 resources were based on 2017 student numbers. This method helps to distribute resources better across the country, as it takes the size of the education networks into account.

The variable of interest is the investment in education, with 1503 observations, of which 102 (7%) correspond to unobserved investment values identified as zero investment. The explanatory variables considered in the study are: $revenue$, the per capita revenue collected by the municipality (http://www.ipeadata.gov.br/Default.aspx); $gnp$, the Gross National Product of the municipality (http://www.ipeadata.gov.br/Default.aspx); $distribute$, a dummy variable indicating whether the municipality receives resources from the Financial Compensation for the Exploration of Mineral Resources (CFEM) (https://dados.gov.br/dataset/sistema-arrecadacao), a resource that must be destined to investments in health, education and community infrastructure; $sp$, an indicator variable for state ($sp$ takes the value 1 for Sao Paulo); and $enrollment$, the school census enrollment numbers (https://www.gov.br/inep/pt-br/areas-de-atuacao/pesquisas-estatisticas-e-indicadores/censo-escolar).

As in the previous study, the response variable, investment in education, is on the logarithmic scale, Y_{i}^{*}=lninvest. The latent variable U_{i}^{*}=dinvest denotes the willingness of the i-th municipality to invest in education, and U_{i}=\mathds{1}_{\{U_{i}^{*}>0\}} corresponds to the decision of the i-th municipality whether or not to invest in education.
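The construction of the response and of the selection indicator described above can be sketched as follows. This is a minimal illustration in Python; the column name `invest` and the toy values are assumptions, not the actual FNDE data.

```python
import numpy as np
import pandas as pd

# Toy data frame standing in for the FNDE municipality data;
# invest = 0 marks a municipality whose investment is unobserved.
df = pd.DataFrame({"invest": [1500.0, 0.0, 230000.0, 98000.0]})

# dinvest: selection indicator U_i = 1{U_i^* > 0}
df["dinvest"] = (df["invest"] > 0).astype(int)

# lninvest: log-investment, defined only where investment is observed
# (masking first avoids taking log of zero)
df["lninvest"] = np.log(df["invest"].where(df["invest"] > 0))
```

The masked `where` call leaves `lninvest` as NaN for censored municipalities, which is how the outcome equation sees only the selected sample.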

(Population estimates are available at https://www.ibge.gov.br/estatisticas/sociais/populacao/9103-estimativas-de-populacao.html?=&t=resultados.)

The descriptive statistics for the investments in education are reported in Table 7. From this table, we note that the mean is almost equal to the median, the skewness is small and negative, and the kurtosis is high. The symmetric nature of the data is confirmed by the histogram shown in Figure 3(a), while the boxplot shown in Figure 3(b) indicates potential outliers. Therefore, a Student-t model is a reasonable assumption.

Table 7: Summary statistics for the education investment data.
Minimum   Mean     Median   Maximum   SD      CV        CS        CK     n
4.271     12.212   12.575   18.511    1.903   15.587%   -0.5864   3.70   1401
Figure 3: Histogram (a) and boxplot (b) for the education investment data.

We then analyze the education investment data using the GHt model, expressed as

dinvest=β0+β1revenuei+β2spi+β3enrollmenti,dinvest=\beta_{0}+\beta_{1}\,revenue_{i}+\beta_{2}\,sp_{i}+\beta_{3}\,enrollment_{i}, (5.6)
lninvest=γ0+γ1revenuei+γ2spi+γ3enrollmenti+γ4gnpi,lninvest=\gamma_{0}+\gamma_{1}\,revenue_{i}+\gamma_{2}\,sp_{i}+\gamma_{3}\,enrollment_{i}+\gamma_{4}\,gnp_{i}, (5.7)
logσi=λ0+λ1revenuei+λ2spi,\log\sigma_{i}=\lambda_{0}+\lambda_{1}\,revenue_{i}+\lambda_{2}\,sp_{i}, (5.8)
arctanhρi=κ0+κ1revenuei+κ2distributei.\text{arctanh}\,\rho_{i}=\kappa_{0}+\kappa_{1}\,revenue_{i}+\kappa_{2}\,distribute_{i}. (5.9)
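The two link functions in (5.8)-(5.9) can be sketched as follows: the log link guarantees a positive dispersion, and the arctanh (Fisher z) link keeps the correlation in (-1, 1). The coefficient values below are illustrative placeholders, not the fitted values from Table 9.

```python
import numpy as np

# Illustrative coefficients (NOT the fitted values from Table 9).
lam = np.array([0.5, 0.04, -0.3])   # lambda_0, lambda_1, lambda_2 in (5.8)
kap = np.array([-3.7, 0.17, 0.8])   # kappa_0, kappa_1, kappa_2 in (5.9)

def sigma_i(revenue, sp):
    # inverse of the log link: exp of the linear predictor is always positive
    return np.exp(lam[0] + lam[1] * revenue + lam[2] * sp)

def rho_i(revenue, distribute):
    # inverse of arctanh is tanh: maps the real line into (-1, 1)
    return np.tanh(kap[0] + kap[1] * revenue + kap[2] * distribute)
```

Whatever the covariate values, `sigma_i` stays positive and `rho_i` stays inside (-1, 1), which is what makes regression on these parameters well defined.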

Table 8 reports the AIC and BIC of the CHN, GHN and GHt models. From Table 8, we observe that the GHt model provides a better fit than the other models based on the AIC and BIC values.

Table 8: AIC and BIC of the indicated Heckman models.
       CHN        GHN        GHt
AIC    8269.058   5873.579   5719.590
BIC    8306.147   5953.055   5804.365
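The criteria in Table 8 follow the usual definitions AIC = -2l + 2k and BIC = -2l + k log n, where l is the maximized log-likelihood, k the number of parameters and n the sample size. A minimal sketch (the log-likelihood, k and n values below are illustrative, not those of the fitted models):

```python
import math

def aic(loglik, k):
    # Akaike information criterion: -2 * log-likelihood + 2 * #parameters
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    # Bayesian information criterion: heavier penalty, grows with log(n)
    return -2.0 * loglik + k * math.log(n)
```

For each criterion, the model with the smaller value is preferred, which is how Table 8 selects the GHt model.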

Table 9 presents the estimation results of the GHN and GHt models. From this table, we note that the explanatory variables revenue and sp associated with the dispersion are significant in both models, suggesting the presence of heteroscedasticity. For the explanatory variables related to the correlation parameter, revenue is significant at the 5% and 10% levels in the GHN and GHt models, respectively, while distribute is significant at the 10% and 5% levels in the GHN and GHt models, respectively. Therefore, there is evidence of the presence of sample selection bias in the data.

From Table 9, we observe that in the outcome equation revenue, sp, enrollment and gnp are significant according to both the GHN and GHt models. Note that a one-unit increase in revenue is associated with changes of (\exp(-0.2683)-1)\times 100=-23.53\% and (\exp(-0.2167)-1)\times 100=-19.48\% in the average investment in education according to the GHN and GHt models, respectively. Note also that when the municipality is located in the state of Sao Paulo (sp = 1), the average investment in education increases by (\exp(1.0714)-1)\times 100=191.95\% and (\exp(1.0063)-1)\times 100=173.55\%, according to the GHN and GHt models, respectively. These results show that the municipalities of the state of Sao Paulo invest much more in education than those of the state of Minas Gerais.
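Because the outcome is on the log scale, each percentage effect quoted above is obtained by exponentiating the corresponding coefficient; a quick check in Python, using the coefficients from Table 9:

```python
import math

def pct_effect(beta):
    # percentage change in the (untransformed) mean implied by a
    # coefficient beta of a log-scale outcome equation
    return (math.exp(beta) - 1.0) * 100.0

# revenue: GHN and GHt coefficients from Table 9
ghn_revenue = pct_effect(-0.2683)   # about -23.53%
ght_revenue = pct_effect(-0.2167)   # about -19.48%
# sp indicator: GHN and GHt coefficients from Table 9
ghn_sp = pct_effect(1.0714)         # about +191.95%
ght_sp = pct_effect(1.0063)         # about +173.55%
```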

In the selection equation, we observe that the explanatory variables revenue, sp and enrollment are significant in the GHt model, whereas revenue and sp are not significant in the GHN model; see Table 9. For the GHt model, the odds ratio for sp is \exp(0.3674)=1.4439; that is, the odds of a municipality investing in education increase by ((1.4439-1)\times 100)=44.39\% when the municipality is located in the state of Sao Paulo.
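The odds-ratio reading of the sp coefficient can be reproduced directly (coefficient 0.3674 taken from the GHt selection equation in Table 9):

```python
import math

# odds ratio implied by the sp coefficient: sp = 1 versus sp = 0
or_sp = math.exp(0.3674)

# percentage increase in the odds of investing in education
pct_increase = (or_sp - 1.0) * 100.0   # roughly 44.4%
```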

Table 9: Estimation results of the GHN and GHt models.

Probit selection equation
Variables     Estimate             Std. Error        t Value              p-Value
              GHN       GHt        GHN      GHt      GHN       GHt        GHN      GHt
(intercept)    1.0523    1.4222    0.1013   0.0955   10.3870   14.8880    <0.01    <0.001
revenue       -0.0058   -0.0530    0.0223   0.0164   -0.2630   -3.2330    0.7920   <0.001
sp             0.1065    0.3674    0.0990   0.1032    1.0750    3.5600    0.2820   <0.001
enrollment     0.0311    0.0537    0.0032   0.0026    9.6380   20.6150    <0.001   <0.001

Outcome equation
Variables     Estimate             Std. Error        t Value              p-Value
              GHN       GHt        GHN      GHt      GHN       GHt        GHN      GHt
(intercept)   12.4789   12.4058    0.1252   0.1208   99.6480  102.6960    <0.001   <0.001
revenue       -0.2683   -0.2167    0.0332   0.0274   -8.0640   -7.9030    <0.001   <0.001
sp             1.0714    1.0063    0.1003   0.0881   10.6810   11.4190    <0.001   <0.001
enrollment     0.0031    0.0226    0.0005   0.0025    5.6670    8.8410    <0.001   <0.001
gnp            0.0202    0.0151    0.0023   0.0005    8.4630   27.4250    <0.001   <0.001

Dispersion
Variables     Estimate             Std. Error        t Value              p-Value
              GHN       GHt        GHN      GHt      GHN       GHt        GHN      GHt
(intercept)    0.5921    0.2947    0.0503   0.0623   11.7560    4.7250    <0.001   <0.001
revenue        0.0382    0.0546    0.0130   0.0146    2.9330    3.7180    <0.001   <0.001
sp            -0.3142   -0.4745    0.0416   0.0534   -7.5470   -8.8800    <0.001   <0.001

Correlation
Variables     Estimate             Std. Error        t Value              p-Value
              GHN       GHt        GHN      GHt      GHN       GHt        GHN      GHt
(intercept)   -3.7263   -6.7669    0.5463   0.9682   -6.8210   -6.9890    <0.001   <0.001
revenue        0.1686    0.2274    0.0839   0.1161    2.0080    1.9590    0.0448   0.0500
distribute     0.8298    3.1118    0.4411   0.6867    1.8810    1.9580    0.0602   <0.001
ν                  -     3.6240        -    0.4320        -     8.3900        -    <0.001
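The t values and p-values in Table 9 are consistent with Wald-type statistics, t = estimate/SE, with two-sided p-values from the large-sample standard normal approximation. A sketch, using the revenue coefficient of the correlation equation in the GHt model as an example:

```python
import math

def wald(estimate, se):
    # Wald statistic and two-sided p-value under the standard normal
    t = estimate / se
    p = math.erfc(abs(t) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|t|))
    return t, p

# revenue coefficient in the correlation equation, GHt model (Table 9)
t, p = wald(0.2274, 0.1161)   # t near 1.959, p near 0.05
```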

Figure 4 displays the QQ plots of the MT residuals. This figure indicates that the MT residuals of the GHt model show better agreement with the reference distribution.

Figure 4: QQ plots of the MT residuals for the (a) GHN and (b) GHt models.

6 Concluding Remarks

In this paper, a class of Heckman sample selection models was proposed based on symmetric distributions. In such models, covariates are added to the dispersion and correlation parameters, allowing the accommodation of heteroscedasticity and a varying sample selection bias, respectively. A Monte Carlo simulation study showed good performance of the parameter estimation method under high/low censoring rates and strong/weak correlation. We applied the proposed model, along with two existing models, to two data sets corresponding to outpatient expenses and investments in education. The applications favored the use of the proposed generalized Heckman-t model over the classical Heckman-normal and generalized Heckman-normal models. As part of future research, it will be of interest to propose sample selection models based on skew-symmetric distributions. Furthermore, the behavior of the Wald, score, likelihood ratio and gradient tests can be investigated. Work on these problems is currently in progress, and we hope to report the findings in a future paper.

References

  • Abdous et al., (2005) Abdous, B., Fougères, A.-L., and Ghoudi, K. (2005). Extreme behaviour for bivariate elliptical distributions. The Canadian Journal of Statistics, 33:317–334.
  • Balakrishnan and Lai, (2009) Balakrishnan, N. and Lai, C. D. (2009). Continuous Bivariate Distributions. Springer-Verlag, New York, NY.
  • Bastos and Barreto-Souza, (2021) Bastos, F. S. and Barreto-Souza, W. (2021). Birnbaum-Saunders sample selection model. Journal of Applied Statistics, 48:1896–1916.
  • Bastos et al., (2021) Bastos, F. S., Barreto-Souza, W., and Genton, M. G. (2021). A generalized Heckman model with varying sample selection bias and dispersion parameters. Statistica Sinica.
  • Ding, (2014) Ding, P. (2014). Bayesian robust inference of sample selection using selection-t models. Journal of Multivariate Analysis, 124:451–464.
  • Fang et al., (1990) Fang, K. T., Kotz, S., and Ng, K. W. (1990). Symmetric Multivariate and Related Distributions. Chapman and Hall, London, UK.
  • Heckman, (1976) Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5:475–492.
  • Heckman, (1979) Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47:153–161.
  • Lachos et al., (2021) Lachos, V. H., Prates, M. O., and Dey, D. K. (2021). Heckman selection-t model: Parameter estimation via the EM-algorithm. Journal of Multivariate Analysis, 184:104737.
  • Leung and Yu, (1996) Leung, S. F. and Yu, S. (1996). On the choice between sample selection and two-part models. Journal of Econometrics, 72:197–229.
  • Manning et al., (1987) Manning, W., Duan, N., and Rogers, W. (1987). Monte Carlo evidence on the choice between sample selection and two-part models. Journal of Econometrics, 35:59–82.
  • Marchenko and Genton, (2012) Marchenko, Y. V. and Genton, M. G. (2012). A Heckman selection-t model. Journal of the American Statistical Association, 107:304–317.
  • Nelson, (1984) Nelson, F. D. (1984). Efficiency of the two-step estimator for models with endogenous sample selection. Journal of Econometrics, 24:181–196.
  • Ogundimu and Hutton, (2016) Ogundimu, E. O. and Hutton, J. L. (2016). A sample selection model with skew-normal distribution. Scandinavian Journal of Statistics, 43:172–190.
  • Paarsch, (1984) Paarsch, H. J. (1984). A Monte Carlo comparison of estimators for censored regression models. Journal of Econometrics, 24:197–213.
  • R Core Team, (2022) R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Stolzenberg and Relles, (1990) Stolzenberg, R. M. and Relles, D. A. (1990). Theory testing in a world of constrained research design: The significance of Heckman's censored sampling bias correction for nonexperimental research. Sociological Methods & Research, 18:395–415.
  • Therneau et al., (1990) Therneau, T., Grambsch, P., and Fleming, T. (1990). Martingale-based residuals for survival models. Biometrika, 77:147–160.
  • Weisberg, (2014) Weisberg, S. (2014). Applied Linear Regression. John Wiley & Sons, Hoboken, New Jersey, fourth edition.