
The zero-adjusted log-symmetric quantile regression model applied to extramarital affairs data

Danúbia R. Cunha1, Jose A. Divino1, Helton Saulo2
1Department of Economics, Universidade Católica de Brasília, Brasília, Brazil
2Department of Statistics, Universidade de Brasília, Brasília, Brazil

Abstract. In this work, we propose a zero-adjusted log-symmetric quantile regression model. Initially, we introduce zero-adjusted log-symmetric distributions, which allow for the accommodation of zeros. The estimation of the parameters is approached by the maximum likelihood method and a Monte Carlo simulation is performed to evaluate the estimates. Finally, we illustrate the proposed methodology with the use of a real extramarital affairs data set.
Keywords: Zero-adjusted log-symmetric distributions; Quantile regression; Extramarital affairs data.

1 Introduction

The classical log-symmetric (LS) distributions generalize the log-normal distribution and are particularly flexible, providing models with positive asymmetry and with lighter or heavier tails than those of the log-normal distribution (Vanegas and Paula, 2016). Regression models based on LS distributions were recently studied by Vanegas and Paula (2015, 2017) and Medeiros and Ferrari (2017), who investigate issues such as semi-parametric approaches, the presence of censored data, and hypothesis tests.

Recently, Saulo et al. (2021) proposed a reparametrization of the LS distributions, here denoted by quantile-LS distributions, in which a quantile parameter is inserted into the LS distributions. Based on these distributions, the authors proposed parametric quantile regression models. Parametric quantile regression models (Gilchrist, 2000) are an alternative to semi-parametric (distribution-free) models (Koenker and Bassett Jr, 1978) and to models based on pseudo-likelihood using the asymmetric Laplace distribution or a mixture distribution. Based on quantile-LS distributions, Cunha et al. (2021) proposed a quantile tobit model useful for modeling left-censored data.

A limitation of the LS (Vanegas and Paula, 2015) and quantile-LS (Saulo et al., 2021) distributions, and consequently of their respective regression models, is that they cannot model data sets that contain zeros without some type of transformation, since such distributions have positive support. One strategy to circumvent this problem is to consider a mixture of a continuous distribution with support $(0,\infty)$ and a degenerate distribution with mass at zero; see, for example, Aitchison and Brown (1957) for the case of the log-normal distribution, Heller et al. (2006) for the inverse Gaussian distribution, Leiva et al. (2016) and Tomazella et al. (2019) for Birnbaum-Saunders distributions, and most recently Cosavalente and Cysneiros (2021) and Cosavalente (2021) for LS distributions. This two-component mixture can be called “zero-adjusted”, as in Heller et al. (2006), and it is similar to the approach of Cragg (1971).

In this context, the primary objective of this work is to propose a class of zero-adjusted log-symmetric quantile regression models. To this end, we first propose zero-adjusted quantile-LS distributions, denoted by quantile-ZALS distributions, and then the quantile-ZALS regression models. The immediate advantage of the proposed methodology lies in the flexibility of the quantile approach, which allows the effects of explanatory variables to be considered over the whole spectrum of the dependent variable, in addition to the possibility of including zeros, which is not possible in the quantile regression models studied by Saulo et al. (2021). The quantile-ZALS regression models proposed in this work generalize, to a quantile context, the works of Aitchison and Brown (1957), Cosavalente and Cysneiros (2021), Cosavalente (2021) and Leiva et al. (2016). Secondary objectives include: (i) to obtain estimates of the model parameters using the maximum likelihood method; (ii) to carry out a Monte Carlo simulation to assess the performance of the maximum likelihood estimates; and (iii) to illustrate the proposed methodology using a real data set on extramarital affairs.

The rest of this work proceeds as follows. In Section 2, the quantile-LS distributions introduced by Saulo et al. (2021) are briefly discussed, and then the quantile-ZALS distributions are introduced. In Section 3, the proposed quantile-ZALS regression model is presented. In Section 4, a Monte Carlo simulation is carried out to assess the performance of the maximum likelihood estimates. In Section 5, an application to real data is performed. Finally, Section 6 presents the concluding remarks.

2 Quantile-LS and quantile-ZALS distributions

In this section, the quantile-LS distributions introduced by Saulo et al. (2021) are first described. Then, a zero-adjusted version of these distributions, denoted by quantile-ZALS, is proposed.

2.1 Quantile-LS distributions

A random variable $T$ follows a quantile-LS distribution if its probability density function (PDF) and cumulative distribution function (CDF) are given, respectively, by

$$f_{T}(t;Q,\phi)=\dfrac{1}{\sqrt{\phi}\,t}\,g\!\left(\frac{1}{\phi}\left[\log(t)-\log(Q)+\sqrt{\phi}\,z_{p}\right]^{2}\right),\quad 0<t<\infty, \tag{1}$$

and

$$F_{T}(t;Q,\phi)=G\!\left(\frac{\log(t)-\log(\lambda)}{\sqrt{\phi}}\right)=G\!\left(\frac{\log(t)-\log(Q)+\sqrt{\phi}\,z_{p}}{\sqrt{\phi}}\right),\quad 0<t<\infty, \tag{2}$$

where $Q>0$ is a scale parameter and also the $q$-th quantile of the distribution, $\phi>0$ is a power parameter, $z_{p}=G^{-1}(q)$, $\lambda=Q\exp(-\sqrt{\phi}\,z_{p})$, $g$ is a density generator, which may involve an extra parameter $\xi$, and $G(\omega)=\eta\int_{-\infty}^{\omega}g(z^{2})\,\mathrm{d}z$ with $\omega\in\mathbb{R}$, with $\eta$ being a normalizing constant.

Saulo et al. (2021) have shown that if $T\sim\text{quantile-LS}(Q,\phi,g)$, then the following properties hold: (P1) $cT\sim\text{quantile-LS}(cQ,\phi,g)$, with $c>0$; (P2) $T^{c}\sim\text{quantile-LS}(Q^{c},c^{2}\phi,g)$, with $c>0$; and (P3) the quantile function is given by $Q_{T}(q;\lambda,\phi)=\lambda\exp\big(\sqrt{\phi}\,G^{-1}(q)\big)$, $q\in(0,1)$. The properties (P1) and (P2) imply that $T=Q\,\epsilon^{\sqrt{\phi}}$, where $\epsilon\sim\text{quantile-LS}(1,1,g)$.
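As a concrete check of (1), (2) and (P3), the following Python sketch implements the log-normal member, for which $g(u)\propto\exp(-u/2)$ and $G$ is the standard normal CDF. It assumes that $z_{p}=G^{-1}(q)$ and evaluates $G$ at the standardized log-deviation $[\log(t)-\log(Q)+\sqrt{\phi}\,z_{p}]/\sqrt{\phi}$, the form consistent with (P3). The function names are illustrative and do not come from any package.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # plays the role of G and G^{-1} for the log-normal member


def standardized(t, Q, phi, q):
    """Standardized log-deviation (log t - log Q + sqrt(phi) z_p) / sqrt(phi)."""
    z_p = STD_NORMAL.inv_cdf(q)  # assumption: z_p = G^{-1}(q)
    return (math.log(t) - math.log(Q)) / math.sqrt(phi) + z_p


def qln_pdf(t, Q, phi, q):
    """PDF (1) with the normalized log-normal generator."""
    return STD_NORMAL.pdf(standardized(t, Q, phi, q)) / (math.sqrt(phi) * t)


def qln_cdf(t, Q, phi, q):
    """CDF of the log-normal quantile-LS distribution."""
    return STD_NORMAL.cdf(standardized(t, Q, phi, q))


def qln_quantile(u, Q, phi, q):
    """Quantile function from (P3), with lambda = Q exp(-sqrt(phi) z_p)."""
    lam = Q * math.exp(-math.sqrt(phi) * STD_NORMAL.inv_cdf(q))
    return lam * math.exp(math.sqrt(phi) * STD_NORMAL.inv_cdf(u))


if __name__ == "__main__":
    Q, phi, q = 2.0, 0.5, 0.25
    print(qln_cdf(Q, Q, phi, q))       # numerically close to q
    print(qln_quantile(q, Q, phi, q))  # numerically close to Q
```

The two printed values illustrate the point of the reparametrization: whatever the value of $\phi$, the parameter $Q$ is the $q$-th quantile of $T$.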

The log-normal, log-Student-$t$, log-power-exponential and extended Birnbaum-Saunders distributions are obtained as particular cases for $g$:

  • Log-normal($Q,\phi$): $g(u)\propto\exp\left(-\frac{1}{2}u\right)$;

  • Log-Student-$t$($Q,\phi,\xi$): $g(u)\propto\left(1+\frac{u}{\xi}\right)^{-\frac{\xi+1}{2}}$, $\xi>0$;

  • Log-power-exponential($Q,\phi,\xi$): $g(u)\propto\exp\left(-\frac{1}{2}u^{\frac{1}{1+\xi}}\right)$, $-1<\xi\leq 1$;

  • Extended Birnbaum-Saunders($Q,\phi,\xi$): $g(u)\propto\cosh(u^{1/2})\exp\left(-\frac{2}{\xi^{2}}\sinh^{2}(u^{1/2})\right)$, $\xi>0$.
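The four generators listed above can be written as unnormalized kernels, with the constant $\eta$ of $G$ recovered numerically. The sketch below is only illustrative: the function names, the integration range and the trapezoidal rule are assumptions made here, not part of the original methodology.

```python
import math


def g_lognormal(u):
    return math.exp(-0.5 * u)


def g_log_student_t(u, xi):
    return (1.0 + u / xi) ** (-(xi + 1.0) / 2.0)


def g_log_power_exponential(u, xi):
    return math.exp(-0.5 * u ** (1.0 / (1.0 + xi)))


def g_extended_bs(u, xi):
    root = math.sqrt(u)
    return math.cosh(root) * math.exp(-(2.0 / xi**2) * math.sinh(root) ** 2)


def normalizing_constant(g, lower=-30.0, upper=30.0, steps=60000):
    """Approximate eta = 1 / integral of g(z^2) dz with the trapezoidal rule."""
    h = (upper - lower) / steps
    total = 0.5 * (g(lower**2) + g(upper**2))
    for k in range(1, steps):
        z = lower + k * h
        total += g(z * z)
    return 1.0 / (h * total)


if __name__ == "__main__":
    print(normalizing_constant(g_lognormal))                      # near 1/sqrt(2*pi)
    print(normalizing_constant(lambda u: g_log_student_t(u, 4.0)))
    print(normalizing_constant(lambda u: g_extended_bs(u, 1.0)))
```

For heavy-tailed kernels the finite range truncates a small amount of mass, so the result is an approximation whose accuracy depends on the chosen limits.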

2.2 Quantile-ZALS distributions

Consider a random variable $T\sim\text{quantile-LS}(Q,\phi,g)$ with PDF and CDF given by (1) and (2), respectively. We propose a quantile-LS distribution that accommodates zeros, denoted by quantile-ZALS, by using the mixture

$$f_{Z}(z)=\pi\,\mathcal{I}_{\{0\}}(z)+(1-\pi)f_{T}(z)\,(1-\mathcal{I}_{\{0\}}(z)),$$
or
$$f_{Z}(z)=\pi^{\mathcal{I}_{\{0\}}(z)}\times\left\{(1-\pi)f_{T}(z)\right\}^{1-\mathcal{I}_{\{0\}}(z)}, \tag{3}$$

where $0<\pi<1$ is a weight that determines the contribution of zeros, $f_{T}$ is the PDF of the random variable $T$, and $\mathcal{I}_{A}(\cdot)$ is an indicator function, that is,

$$\mathcal{I}_{A}(x)=\begin{cases}1,&\quad\text{if}\;x\in A;\\ 0,&\quad\text{if}\;x\notin A.\end{cases}$$

The PDF given by Equation (3) can be rewritten by replacing $f_{T}$ with (1), namely,

$$f_{Z}(z;Q,\phi,\pi)=\pi\,\mathcal{I}_{\{0\}}(z)+\left\{(1-\pi)\dfrac{1}{\sqrt{\phi}\,z}\,g\!\left(\frac{1}{\phi}\left[\log(z)-\log(Q)+\sqrt{\phi}\,z_{p}\right]^{2}\right)\right\}(1-\mathcal{I}_{\{0\}}(z)),$$
or
$$f_{Z}(z;Q,\phi,\pi)=\pi^{\mathcal{I}_{\{0\}}(z)}\times\left\{(1-\pi)\dfrac{1}{\sqrt{\phi}\,z}\,g\!\left(\frac{1}{\phi}\left[\log(z)-\log(Q)+\sqrt{\phi}\,z_{p}\right]^{2}\right)\right\}^{1-\mathcal{I}_{\{0\}}(z)}. \tag{4}$$

We use the notation $Z\sim\text{quantile-ZALS}(Q,\phi,\pi,g)$. The CDF of $Z$ can be written as

$$F_{Z}(z;Q,\phi,\pi)=\begin{cases}\pi,&\quad\text{if}\;z=0,\\ \pi+(1-\pi)F_{T}(z;Q,\phi),&\quad\text{if}\;z>0,\end{cases}$$

where $F_{T}(\cdot)$ is the CDF of $T\sim\text{quantile-LS}(Q,\phi,g)$ given in (2).
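The mixture structure can be sketched in Python as follows, assuming a log-normal reference distribution for the continuous component. The helper names and the use of `random.Random` for sampling are choices made for this illustration only.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ls_cdf(z, Q, phi, q):
    """CDF of the continuous component (log-normal case assumed here)."""
    z_p = STD_NORMAL.inv_cdf(q)
    return STD_NORMAL.cdf((math.log(z) - math.log(Q)) / math.sqrt(phi) + z_p)


def zals_cdf(z, Q, phi, pi, q):
    """CDF of Z: point mass pi at zero plus (1 - pi) times F_T for z > 0."""
    if z < 0:
        return 0.0
    if z == 0:
        return pi
    return pi + (1.0 - pi) * ls_cdf(z, Q, phi, q)


def zals_sample(n, Q, phi, pi, q, rng):
    """Draw n values: zero with probability pi, otherwise a log-normal draw."""
    mu = math.log(Q) - math.sqrt(phi) * STD_NORMAL.inv_cdf(q)
    return [0.0 if rng.random() < pi
            else math.exp(rng.gauss(mu, math.sqrt(phi)))
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    draws = zals_sample(10000, Q=2.0, phi=0.5, pi=0.3, q=0.5, rng=rng)
    print(sum(d == 0.0 for d in draws) / len(draws))  # share of zeros, near pi
    print(zals_cdf(2.0, 2.0, 0.5, 0.3, 0.5))          # pi + (1 - pi) * q
```

Note that $Q$ is the $q$-th quantile of the positive part only; evaluated at $z=Q$, the CDF of $Z$ equals $\pi+(1-\pi)q$ rather than $q$.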

3 Quantile-ZALS regression model

Based on the zero-adjusted quantile log-symmetric distributions proposed in Subsection 2.2, we propose the respective regression model. Consider $Z_{1},\ldots,Z_{n}$ an independent random sample with $Z_{i}\sim\text{quantile-ZALS}(Q_{i},\phi_{i},\pi_{i},g)$, for $i=1,\ldots,n$, such that

$$F_{Z}(z_{i};Q_{i},\phi_{i},\pi_{i})=\begin{cases}\pi_{i},&\quad\text{if }z_{i}=0,\\ \pi_{i}+(1-\pi_{i})F_{T}(z_{i};Q_{i},\phi_{i}),&\quad\text{if }z_{i}>0,\end{cases}$$

with

$$Q_{i}=\exp(\bm{x}_{i}^{\top}\bm{\beta}),\quad \phi_{i}=\exp(\bm{w}_{i}^{\top}\bm{\kappa})\quad\text{and}\quad \pi_{i}=\Lambda(\bm{v}_{i}^{\top}\bm{\eta})=\frac{\exp(\bm{v}_{i}^{\top}\bm{\eta})}{1+\exp(\bm{v}_{i}^{\top}\bm{\eta})},$$

where $\bm{\beta}=(\beta_{0},\beta_{1},\ldots,\beta_{p})^{\top}$, $\bm{\kappa}=(\kappa_{0},\kappa_{1},\ldots,\kappa_{q})^{\top}$ and $\bm{\eta}=(\eta_{0},\eta_{1},\ldots,\eta_{r})^{\top}$ are vectors of unknown parameters to be estimated, and $\bm{x}_{i}=(1,x_{i1},\ldots,x_{ip})^{\top}$, $\bm{w}_{i}=(1,w_{i1},\ldots,w_{iq})^{\top}$ and $\bm{v}_{i}=(1,v_{i1},\ldots,v_{ir})^{\top}$ contain the values of the $p$, $q$ and $r$ explanatory variables associated with the quantile $Q_{i}$, the relative dispersion $\phi_{i}$ and the probability of drawing a zero $\pi_{i}$, respectively. Note that $\Lambda(x)=\frac{\exp(x)}{1+\exp(x)}$ is the logistic function.
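The three link functions can be illustrated with a short Python sketch that maps covariates and coefficient vectors to $(Q_{i},\phi_{i},\pi_{i})$. The function names and the covariate values in the usage example are hypothetical; the coefficient values are those of the simulation setting in Section 4.

```python
import math


def linear_predictor(covariates, coefficients):
    """Inner product of (1, covariates) with the coefficient vector."""
    if len(covariates) != len(coefficients) - 1:
        raise ValueError("expected one coefficient per covariate plus an intercept")
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], covariates))


def logistic(x):
    """Lambda(x) = exp(x) / (1 + exp(x)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-x))


def zals_parameters(x, w, v, beta, kappa, eta):
    """Return (Q_i, phi_i, pi_i) for one observation."""
    Q = math.exp(linear_predictor(x, beta))       # log link for the quantile
    phi = math.exp(linear_predictor(w, kappa))    # log link for the dispersion
    pi = logistic(linear_predictor(v, eta))       # logit link for P(Z = 0)
    return Q, phi, pi


if __name__ == "__main__":
    beta, kappa, eta = (0.5, 0.7, 1.0), (0.5, 0.8, 1.0), (0.5, 0.3, 0.5)
    # Hypothetical covariate values for a single observation.
    print(zals_parameters((0.2, 0.4), (0.1, 0.9), (0.5, 0.5), beta, kappa, eta))
```

The log links keep $Q_{i}$ and $\phi_{i}$ positive and the logit link keeps $\pi_{i}$ in $(0,1)$ for any real coefficients, so the optimization can proceed without constraints.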

The estimation of the parameters of the quantile-ZALS regression model presented above is performed using the maximum likelihood method. Let $Z_{1},\ldots,Z_{n}$ be an independent random sample such that $Z_{i}\sim\text{quantile-ZALS}(Q_{i},\phi_{i},\pi_{i},g)$, and $z_{1},\ldots,z_{n}$ be the corresponding observed values. Then, the likelihood function for $\bm{\theta}=(\bm{\beta}^{\top},\bm{\kappa}^{\top},\bm{\eta}^{\top})^{\top}$ can be written as

$$\begin{aligned}L(\bm{\theta})&=\prod_{i=1}^{n}f_{Z}(z_{i};Q_{i},\phi_{i},\pi_{i})\\ &=\underbrace{\prod_{i=1}^{n}\pi_{i}^{\mathcal{I}_{\{0\}}(z_{i})}\,(1-\pi_{i})^{1-\mathcal{I}_{\{0\}}(z_{i})}}_{L_{1}(\bm{\eta})}\,\underbrace{\prod_{i=1}^{n}\left\{\dfrac{1}{\sqrt{\phi_{i}}\,z_{i}}\,g\!\left(\frac{1}{\phi_{i}}\left[\log(z_{i})-\log(Q_{i})+\sqrt{\phi_{i}}\,z_{p}\right]^{2}\right)\right\}^{1-\mathcal{I}_{\{0\}}(z_{i})}}_{L_{2}(\bm{\beta},\bm{\kappa})}.\end{aligned} \tag{5}$$

Taking the logarithm in (5), we obtain the log-likelihood function

$$\begin{aligned}\ell(\bm{\theta})&=\sum_{i=1}^{n}\log(f_{Z}(z_{i};Q_{i},\phi_{i},\pi_{i}))\\ &=\underbrace{\sum_{i=1}^{n}\left[\mathcal{I}_{\{0\}}(z_{i})\log(\pi_{i})+(1-\mathcal{I}_{\{0\}}(z_{i}))\log(1-\pi_{i})\right]}_{\ell_{1}(\bm{\eta})}\\ &\quad+\underbrace{\sum_{i=1}^{n}(1-\mathcal{I}_{\{0\}}(z_{i}))\log\left\{\dfrac{1}{\sqrt{\phi_{i}}\,z_{i}}\,g\!\left(\frac{1}{\phi_{i}}\left[\log(z_{i})-\log(Q_{i})+\sqrt{\phi_{i}}\,z_{p}\right]^{2}\right)\right\}}_{\ell_{2}(\bm{\beta},\bm{\kappa})}.\end{aligned} \tag{6}$$

Note that in (6), $\ell(\bm{\theta})$ separates into two additive terms (Pace and Salvan, 1997), one associated with the probability of occurrence of zero, $\ell_{1}(\bm{\eta})$, and another associated with the continuous and positive part, $\ell_{2}(\bm{\beta},\bm{\kappa})$. Therefore, the maximum likelihood estimates of $\bm{\eta}$ and of $(\bm{\beta}^{\top},\bm{\kappa}^{\top})^{\top}$ can be obtained independently, that is, the maximization is performed separately for $\ell_{1}(\bm{\eta})$ and $\ell_{2}(\bm{\beta},\bm{\kappa})$. As there is no analytical solution, an iterative procedure for non-linear optimization is required; in this work, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is used.
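The separation of the log-likelihood can be illustrated in a deliberately simplified setting. The Python sketch below assumes a log-normal reference distribution and no covariates (constant $Q$, $\phi$ and $\pi$), in which case closed-form maximum likelihood estimates exist; this replaces the BFGS iterations needed in the general regression case and is not the estimation routine used in this work. All function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def loglik_discrete(z, pi):
    """l_1: Bernoulli contribution of the zero / non-zero indicator."""
    return sum(math.log(pi) if zi == 0 else math.log(1.0 - pi) for zi in z)


def loglik_continuous(z, Q, phi, q):
    """l_2: log-normal contribution of the strictly positive observations."""
    z_p = STD_NORMAL.inv_cdf(q)
    total = 0.0
    for zi in z:
        if zi > 0:
            u = (math.log(zi) - math.log(Q)) / math.sqrt(phi) + z_p
            total += math.log(STD_NORMAL.pdf(u)) - 0.5 * math.log(phi) - math.log(zi)
    return total


def fit_intercept_only(z, q):
    """Closed-form ML estimates when Q, phi and pi have no covariates."""
    logs = [math.log(zi) for zi in z if zi > 0]
    pi_hat = 1.0 - len(logs) / len(z)               # maximizes l_1
    mu_hat = sum(logs) / len(logs)                  # mean of log z on z > 0
    phi_hat = sum((y - mu_hat) ** 2 for y in logs) / len(logs)
    # Invariance of ML estimates: log Q = mu + sqrt(phi) z_p.
    Q_hat = math.exp(mu_hat + math.sqrt(phi_hat) * STD_NORMAL.inv_cdf(q))
    return Q_hat, phi_hat, pi_hat


if __name__ == "__main__":
    data = [0.0, 0.0, 1.0, 2.0, 4.0]
    Q_hat, phi_hat, pi_hat = fit_intercept_only(data, q=0.5)
    print(Q_hat, phi_hat, pi_hat)
    print(loglik_discrete(data, pi_hat) + loglik_continuous(data, Q_hat, phi_hat, 0.5))
```

Because $\pi$ enters only `loglik_discrete` and $(Q,\phi)$ enter only `loglik_continuous`, each part can be maximized without reference to the other, which is the property exploited in (6).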

The quantile-ZALS regression model proposed above can be interpreted as being divided into two equations:

  • Participation equation

    $$\begin{cases}\mathbb{P}(Z_{i}=0)=\pi_{i}=\frac{\exp(\bm{v}_{i}^{\top}\bm{\eta})}{1+\exp(\bm{v}_{i}^{\top}\bm{\eta})},&\quad d_{i}=0,\;\text{if}\;z_{i}=0,\\ \mathbb{P}(Z_{i}>0)=1-\pi_{i}=\frac{1}{1+\exp(\bm{v}_{i}^{\top}\bm{\eta})},&\quad d_{i}=1,\;\text{if}\;z_{i}>0,\end{cases} \tag{7}$$

    where $d_{i}=1$ if the individual participates and $d_{i}=0$ otherwise.

  • Intensity equation

    $$Q_{i}=Q(Z_{i}|d_{i}=1)=\exp(\bm{x}_{i}^{\top}\bm{\beta}), \tag{8}$$

    where $Q(Z_{i}|d_{i}=1)$ is the quantile of $Z_{i}$ given that $d_{i}=1$.

4 Monte Carlo simulation

A Monte Carlo simulation study is carried out to evaluate the performance of the maximum likelihood estimates of the quantile-ZALS regression model. The performance is assessed using bias and mean square error (MSE) estimates, given by

$$\widehat{\mathrm{Bias}}(\widehat{\theta})=\frac{1}{\mathrm{NREP}}\sum_{i=1}^{\mathrm{NREP}}\widehat{\theta}^{(i)}-\theta\quad\text{and}\quad\widehat{\mathrm{MSE}}(\widehat{\theta})=\frac{1}{\mathrm{NREP}}\sum_{i=1}^{\mathrm{NREP}}(\widehat{\theta}^{(i)}-\theta)^{2},$$

where $\theta$ and $\widehat{\theta}^{(i)}$ denote the true parameter value and its $i$-th maximum likelihood estimate, respectively, and NREP is the number of Monte Carlo replicas. All numerical calculations were done in the R software; see R Core Team (2020). The model used to generate the samples is given by

$$Q_{i}=\exp\left(\beta_{0}+\beta_{1}x_{i1}+\beta_{2}x_{i2}\right),\quad\phi_{i}=\exp\left(\kappa_{0}+\kappa_{1}w_{i1}+\kappa_{2}w_{i2}\right)\quad\text{and}\quad\pi_{i}=\frac{\exp\left(\eta_{0}+\eta_{1}v_{i1}+\eta_{2}v_{i2}\right)}{1+\exp\left(\eta_{0}+\eta_{1}v_{i1}+\eta_{2}v_{i2}\right)},$$

where the reference distribution is log-normal (the results for the log-Student-$t$, log-power-exponential and extended Birnbaum-Saunders distributions are similar and are therefore omitted here). The simulation has the following setting: $(\beta_{0},\beta_{1},\beta_{2})=(0.5,0.7,1.0)$, $(\kappa_{0},\kappa_{1},\kappa_{2})=(0.5,0.8,1.0)$, $(\eta_{0},\eta_{1},\eta_{2})=(0.5,0.3,0.5)$, with $\mathrm{NREP}=5{,}000$ Monte Carlo replicas. The explanatory variables $x_{i}$, $w_{i}$ and $v_{i}$ are generated from the Uniform(0,1) distribution. Figures 1-3 show the Monte Carlo simulation results for $q=\{0.10,0.50,0.90\}$. An analysis of the results allows us to conclude that, in general, as the sample size increases, the bias and MSE decrease, as expected.
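The bias and MSE estimators defined above can be sketched in a few lines of Python. To keep the example short and free of numerical optimization, it considers only a constant $\pi$ (no covariates), whose maximum likelihood estimate is the sample share of zeros, and uses 500 replicas instead of 5,000. The sample sizes, the seed and the function names are assumptions of this illustration, not the design used in the study.

```python
import random


def bias(estimates, truth):
    """Monte Carlo bias: mean of the estimates minus the true value."""
    return sum(estimates) / len(estimates) - truth


def mse(estimates, truth):
    """Monte Carlo mean square error around the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def simulate_pi_hat(n, pi, nrep, rng):
    """ML estimates of a constant pi: the share of zeros in each replica."""
    return [sum(rng.random() < pi for _ in range(n)) / n for _ in range(nrep)]


if __name__ == "__main__":
    rng = random.Random(2021)
    for n in (50, 200, 800):  # hypothetical sample sizes
        estimates = simulate_pi_hat(n, pi=0.6, nrep=500, rng=rng)
        print(n, bias(estimates, 0.6), mse(estimates, 0.6))
```

In this simplified setting the MSE is expected to shrink roughly in proportion to $1/n$, which mirrors the qualitative pattern reported for the full model.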

Figure 1: Bias and MSE estimates for $q=0.10$ ($i=\{0,1,2\}$). Panels: (a) $\widehat{\mathrm{Bias}}(\widehat{\beta}_{i})$; (b) $\widehat{\mathrm{MSE}}(\widehat{\beta}_{i})$; (c) $\widehat{\mathrm{Bias}}(\widehat{\kappa}_{i})$; (d) $\widehat{\mathrm{MSE}}(\widehat{\kappa}_{i})$; (e) $\widehat{\mathrm{Bias}}(\widehat{\eta}_{i})$; (f) $\widehat{\mathrm{MSE}}(\widehat{\eta}_{i})$.

Figure 2: Bias and MSE estimates for $q=0.50$ ($i=\{0,1,2\}$). Panels: (a) $\widehat{\mathrm{Bias}}(\widehat{\beta}_{i})$; (b) $\widehat{\mathrm{MSE}}(\widehat{\beta}_{i})$; (c) $\widehat{\mathrm{Bias}}(\widehat{\kappa}_{i})$; (d) $\widehat{\mathrm{MSE}}(\widehat{\kappa}_{i})$; (e) $\widehat{\mathrm{Bias}}(\widehat{\eta}_{i})$; (f) $\widehat{\mathrm{MSE}}(\widehat{\eta}_{i})$.

Figure 3: Bias and MSE estimates for $q=0.90$ ($i=\{0,1,2\}$). Panels: (a) $\widehat{\mathrm{Bias}}(\widehat{\beta}_{i})$; (b) $\widehat{\mathrm{MSE}}(\widehat{\beta}_{i})$; (c) $\widehat{\mathrm{Bias}}(\widehat{\kappa}_{i})$; (d) $\widehat{\mathrm{MSE}}(\widehat{\kappa}_{i})$; (e) $\widehat{\mathrm{Bias}}(\widehat{\eta}_{i})$; (f) $\widehat{\mathrm{MSE}}(\widehat{\eta}_{i})$.

5 Application to extramarital affairs data

In this section, the quantile-ZALS regression models are illustrated using data on extramarital affairs. The database is available in Fair (1978) and the objective here is to study the allocation of time in extramarital affairs for men and women married for the first time. There are 6,366 observations in the database, where the dependent variable ($Z$) is the time spent in extramarital affairs, and the explanatory variables are

  • ratemarr: rating of the marriage, coded 1 to 4;

  • age: age, in years;

  • yrsmarr: number of years married;

  • numkids: number of children;

  • relig: religiosity, coded 1 to 4, 1 = not, 4 = very;

  • educ: education, coded 9, 12, 14, 16, 17, 20, that is, 9 = elementary school, 12 = high school, . . . , 20 = doctorate or other;

  • wifeocc: wife’s occupation - Hollingshead scale;

  • husocc: husband’s occupation - Hollingshead scale.

Descriptive statistics for the time spent in extramarital affairs ($Z$) indicate that the mean, median and standard deviation are 0.705, 0 and 2.203, respectively. The coefficient of variation is 312.37%, indicating a high dispersion of the data around the mean. The coefficients of skewness and kurtosis are equal to 8.761 and 131.912, respectively, which shows high positive skewness and heavy tails. The use of log-symmetric distributions is therefore plausible. The asymmetric nature of the data is confirmed by the histogram shown in Figure 4(a). Note also the high concentration of zeros in the sample: about 4,313 individuals report no time spent in extramarital affairs.

Figure 4: Histogram (a) for the time spent in extramarital affairs and QQ plot (b) and its envelope for the randomized quantile residuals based on the quantile-ZAEBS regression model ($q=0.50$).

The proposed quantile-ZALS regression models can accommodate heteroscedasticity, so specifications with and without explanatory variables in the relative dispersion $\phi$ can be fitted. We consider here quantile-ZALS regression models based on the log-normal (quantile-ZALNO), log-Student-$t$ (quantile-ZAL$t$), log-power-exponential (quantile-ZALPE) and extended Birnbaum-Saunders (quantile-ZAEBS) distributions. An analysis of the significance of the coefficients suggested the following specification:

$$\begin{aligned}Q_{i}&=\exp\left(\beta_{0}+\beta_{1}\texttt{ratemarr}+\beta_{2}\texttt{yrsmarr}+\beta_{3}\texttt{numkids}+\beta_{4}\texttt{relig}\right),\\ \phi_{i}&=\exp\left(\kappa_{0}+\kappa_{1}\texttt{age}\right)\quad\text{and}\\ \pi_{i}&=\Lambda\left(\eta_{0}+\eta_{1}\texttt{ratemarr}+\eta_{2}\texttt{age}+\eta_{3}\texttt{yrsmarr}+\eta_{4}\texttt{relig}+\eta_{5}\texttt{educ}+\eta_{6}\texttt{wifeocc}\right).\end{aligned}$$

Table 1 reports the averages of the AIC and BIC values based on $q=\{0.01,0.02,\ldots,0.99\}$ for the proposed quantile-ZALS regression models. The results indicate that the lowest values of AIC and BIC are those based on the quantile-ZALPE and quantile-ZAEBS models, with a slight advantage of the latter.

Table 1: Averages of the AIC and BIC values with $q=\{0.01,0.02,\ldots,0.99\}$ for different models.
Criterion   quantile-ZALNO   quantile-ZAL$t$   quantile-ZALPE   quantile-ZAEBS
AIC         13165.14         13173.48          13163.49         13163.41
BIC         13259.77         13268.10          13258.12         13258.03
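As a consistency check on Table 1, the sketch below recovers the number of estimated parameters implied by each AIC/BIC pair. It assumes the standard definitions $\mathrm{AIC}=-2\ell+2k$ and $\mathrm{BIC}=-2\ell+k\log(n)$, which are not spelled out in the text, so that $\mathrm{BIC}-\mathrm{AIC}=k(\log(n)-2)$ with $n=6{,}366$; since $k$ and $n$ do not depend on $q$, the relation also holds for averages over $q$.

```python
import math

N_OBS = 6366  # sample size reported in Section 5

# (AIC, BIC) averages copied from Table 1.
TABLE_1 = {
    "quantile-ZALNO": (13165.14, 13259.77),
    "quantile-ZALt": (13173.48, 13268.10),
    "quantile-ZALPE": (13163.49, 13258.12),
    "quantile-ZAEBS": (13163.41, 13258.03),
}


def implied_parameter_count(aic, bic, n):
    """Solve BIC - AIC = k (log n - 2) for k (standard definitions assumed)."""
    return (bic - aic) / (math.log(n) - 2.0)


for model, (aic, bic) in TABLE_1.items():
    print(model, round(implied_parameter_count(aic, bic, N_OBS), 2))
```

Under these assumptions every model yields a value very close to 14, which agrees with the fitted specification: five coefficients in $Q_{i}$, two in $\phi_{i}$ and seven in $\pi_{i}$.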

Since the quantile-ZAEBS model showed the best results, we compare it with the zero-adjusted gamma (ZAGA) and zero-adjusted inverse Gaussian (ZAIG) regression models studied by Heller et al. (2006) and Stasinopoulos et al. (2017). Table 2 reports the AIC and BIC values for these models, and we observe that the quantile-ZAEBS model has the best fit. The QQ plot for the randomized quantile residuals (Dunn and Smyth, 1996) of this model for $q=0.50$ is shown in Figure 4(b); similar plots are obtained for other values of $q$. The results thus indicate that the proposed model provides a better fit than existing models in the literature.

Table 2: AIC and BIC values for the quantile-ZAEBS ($q=0.50$), ZAGA and ZAIG regression models.
Criterion   quantile-ZAEBS ($q=0.50$)   ZAGA       ZAIG
AIC         13163.18                    13466.62   13953.84
BIC         13257.80                    13580.48   14067.71

The estimates of the parameters of the quantile-ZAEBS model for $\pi_{i}$ (discrete component) are shown in Table 3, and those for $Q_{i}$ and $\phi_{i}$ in Figure 5. The figure shows asymmetric dynamics: for example, the estimates of $\beta_{0}$ ($\beta_{2}$) tend to increase (decrease) as $q$ increases. In general, such results show the importance of considering a quantile approach.

Table 3: Estimated parameters (standard errors in parentheses) of the discrete part of the quantile-ZAEBS regression model.
                 Interc.    ratemarr   age        yrsmarr    relig      educ       wifeocc
Estimate         3.7371*    -0.7153*   -0.0602*   0.1095*    -0.3760*   -0.0380*   0.1628*
Standard error   (0.2961)   (0.0314)   (0.0103)   (0.0097)   (0.0346)   (0.0153)   (0.0337)
* significant at 5% level. ** significant at 10% level.
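The significance flags in Table 3 can be reproduced approximately from the reported estimates and standard errors. The sketch below assumes a two-sided Wald test based on the asymptotic normality of the maximum likelihood estimates; the text does not state which test was used, so this is an illustrative assumption.

```python
from statistics import NormalDist

# (estimate, standard error) pairs copied from Table 3.
ESTIMATES = {
    "Interc.": (3.7371, 0.2961),
    "ratemarr": (-0.7153, 0.0314),
    "age": (-0.0602, 0.0103),
    "yrsmarr": (0.1095, 0.0097),
    "relig": (-0.3760, 0.0346),
    "educ": (-0.0380, 0.0153),
    "wifeocc": (0.1628, 0.0337),
}


def wald_test(estimate, std_error):
    """Return the z statistic and the two-sided normal p-value."""
    z = estimate / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


for name, (estimate, std_error) in ESTIMATES.items():
    z, p_value = wald_test(estimate, std_error)
    print(f"{name:9s} z = {z:7.2f}  significant at 5%: {p_value < 0.05}")
```

Under this assumption all seven coefficients have $|z|>1.96$, in line with the single-asterisk flags in the table.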
Figure 5: Parameter estimates (confidence intervals in grey) for the positive continuous part of the quantile-ZAEBS regression model. Panels: (a) $\beta_{0}$; (b) $\beta_{1}$; (c) $\beta_{2}$; (d) $\beta_{3}$; (e) $\beta_{4}$; (f) $\kappa_{0}$; (g) $\kappa_{1}$.

6 Concluding remarks

In this work, a class of zero-adjusted log-symmetric quantile regression models was proposed. The proposed regression model is based on a zero-adjusted version of the log-symmetric distributions parameterized by the quantile, also proposed in this work. The quantile approach provides wide flexibility in the analysis of the effects of the explanatory variables on the dependent variable, which makes the proposed model a more flexible alternative to the existing zero-adjusted log-symmetric regression models (Cosavalente and Cysneiros, 2021; Cosavalente, 2021). The estimation of the parameters was performed by the maximum likelihood method and a Monte Carlo simulation study was carried out to evaluate the performance of the maximum likelihood estimates. The proposed models were applied to study the allocation of time in extramarital affairs of men and women married for the first time. The results showed that the proposed zero-adjusted log-symmetric quantile regression models perform better than the existing zero-adjusted gamma (ZAGA) and zero-adjusted inverse Gaussian (ZAIG) regression models studied by Heller et al. (2006) and Stasinopoulos et al. (2017).

References

  • Aitchison, J. and Brown, J. A. C. (1957). The Lognormal Distribution with Special Reference to its Uses in Econometrics. University of Cambridge Department of Applied Economics Monograph: 5. Cambridge University Press, 1st edition.
  • Cosavalente, D. R. R. (2021). Distribuição zero ajustada log-simétrica: estimação e modelagem. Master's dissertation, Universidade Federal de Pernambuco, Programa de Pós-Graduação em Estatística do Centro de Ciências Exatas e da Natureza, Brazil.
  • Cosavalente, D. R. R. and Cysneiros, F. (2021). The zero-adjusted log-symmetric distributions: point and intervalar estimation. Annals of the Brazilian Academy of Sciences.
  • Cragg, J. G. (1971). Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica, 39:829–844.
  • Cunha, D. R., Divino, J. A., and Saulo, H. (2021). On a log-symmetric quantile tobit model applied to female labor supply data. arXiv:2103.04449.
  • Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5.
  • Fair, R. (1978). A theory of extramarital affairs. Journal of Political Economy, 86:45–61.
  • Gilchrist, W. (2000). Statistical Modelling with Quantile Functions. Chapman & Hall/CRC, Boca Raton, FL, 1st edition.
  • Heller, G., Stasinopoulos, M., and Rigby, B. (2006). The zero-adjusted inverse Gaussian distribution as a model for insurance claims. In J. Hinde, J. E. and Newell, J., editors, Proceedings of the 21st International Workshop on Statistical Modelling, pages 226–233, Galway, Ireland. Statistical Modelling Society, University of Lancaster.
  • Koenker, R. and Bassett Jr, G. (1978). Regression quantiles. Econometrica, 46:33–50.
  • Leiva, V., Santos-Neto, M., Cysneiros, F. J. A., and Barros, M. (2016). A methodology for stochastic inventory models based on a zero-adjusted Birnbaum-Saunders distribution. Applied Stochastic Models in Business and Industry, 32(1):74–89.
  • Medeiros, M. C. and Ferrari, S. L. P. (2017). Small-sample testing inference in symmetric and log-symmetric linear regression models. Statistica Neerlandica, 71:200–224.
  • Pace, L. and Salvan, A. (1997). Principles of Statistical Inference from a Neo-Fisherian Perspective. Advanced Series on Statistical Science & Applied Probability: Volume 4. World Scientific, 1st edition.
  • R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Saulo, H., Dasilva, A., Leiva, V., Sánchez, L., and Fuente-Mella, H. L. (2021). Log-symmetric quantile regression models. Statistica Neerlandica, pages 1–26.
  • Stasinopoulos, M. D., Rigby, R. A., Heller, G. Z., Voudouris, V., and Bastiani, F. D. (2017). Flexible Regression and Smoothing: Using GAMLSS in R. Chapman & Hall/CRC The R Series. Chapman and Hall/CRC, 1st edition.
  • Tomazella, V., Pereira, G. H. A., Nobre, J. S., and Santos-Neto, M. (2019). Zero-adjusted reparameterized Birnbaum-Saunders regression model. Statistics & Probability Letters, 149:142–145.
  • Vanegas, L. H. and Paula, G. A. (2015). A semiparametric approach for joint modeling of median and skewness. Test, 24:110–135.
  • Vanegas, L. H. and Paula, G. A. (2016). Log-symmetric distributions: statistical properties and parameter estimation. Brazilian Journal of Probability and Statistics, 30:196–220.
  • Vanegas, L. H. and Paula, G. A. (2017). Log-symmetric regression models under the presence of non-informative left- or right-censored observations. Test, 26:405–428.