This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

A General Framework for Random Effects Models for Binary, Ordinal, Count Type and Continuous Dependent Variables Including Variable Selection

Gerhard Tutz et al
Ludwig-Maximilians-Universität München
Akademiestraße 1, 80799 München
Abstract

A general random effects model is proposed that allows for continuous as well as discrete distributions of the responses. Responses can be unrestricted continuous, bounded continuous, binary, ordered categorical or given in the form of counts. The distribution of the responses is not restricted to exponential families, which is a severe restriction in generalized mixed models. Generalized mixed models use fixed distributions for responses, for example the Poisson distribution in count data, which has the disadvantage of not accounting for overdispersion. By using a response function and a thresholds function the proposed mixed thresholds model can account for a variety of alternative distributions that often show better fits than fixed distributions used within the generalized linear model framework. A particular strength of the model is that it provides a tool for joint modeling, responses may be of different types, some can be discrete, others continuous. In addition to introducing the mixed thresholds model parameter sparsity is addressed. Random effects models can contain a large number of parameters, in particular if effects have to be assumed as measurement-specific. Methods to obtain sparser representations are proposed and illustrated. The methods are shown to work in the thresholds model but could also be adapted to other modeling approaches.

Keywords: Random effects models; joint modeling; count data; bounded continuous data; ordinal data.

1 Introduction

Random effects models are a strong tool to model the heterogeneity of clustered responses. By postulating the existence of unobserved latent variables, the so-called random effects, which are shared by the measurement within a cluster, correlation between the measurements within clusters is introduced. The clusters or units can refer to persons in repeated measurement trials or to larger units as, for example, schools with measurements referring to performance scores of students.

Detailed expositions of linear mixed models, which are typically used for continuous dependent variables, are found in Hsiao (1986), Lindsey (1993), and Jones (1993). Models for binary variables and counts are often discusses within the framework of generalized mixed models, see, for example, McCulloch and Searle (2001). Random effects models for ordinal dependent variables were considered by Harville and Mee (1984), Jansen (1990), Tutz and Hennevogl (1996) and Hartzel et al. (2001). Mixed model versions for continuous bounded data in the form of rates and proportions that take values in the interval (0,1)(0,1), have been considered by Qiu et al. (2008) based on the simplex model and by Bonat et al. (2015) who propagate beta distribution models. Several R packages are available to fit generalized mixed linaer models, for example, glmmTMB (Brooks et al., 2017) for various continuous and discrete distributions, lme4 for continuous and binary data, ordinal and MultOrdRS (Schauberger, 2024) for ordinal data.

Generalized mixed models within the generalized linear model framework as well as extended approaches as the models for bounded continuous data postulate familiar fixed distributions for the responses. This is different in the approach propagated here although there is some overlap with generalized linear models. The mixed thresholds model used here gains its flexibility concerning distributional assumptions by using two components, a response function, which is a distribution function, and a thresholds function that modifies the distribution. In the simplest case, by assuming a linear thresholds function, the distribution of the responses follows the response function. Thus, familiar linear Gaussian response models are obtained but also linear models with quite different, possibly skewed distribution functions are available. Distributions with a restricted support, for example if responses are observed in an interval or are positive only, are obtained by using non-linear thresholds functions. They are also useful when modeling discrete data, which typically are restricted to a specific range, for example, count data take only values 0,1,0,1,\dots and ordered categorical responses can be coded by 0,1,,k0,1,\dots,k, where numbers only represent the order of values. In the case of binary and ordered categorical responses cumulative generalized mixed model is a special case of the proposed thresholds model.

More concrete, let yi1,,yimy_{i1,\dots,}y_{im} denote the observations on unit ii (i=1,,ni=1,\dots,n), which can be continuous or discrete. In addition, let 𝒙ij,𝒛ij\boldsymbol{x}_{ij},\boldsymbol{z}_{ij} denote covariates associated with response yijy_{ij}. Then, the Mixed Thresholds Model (MTM) has the form

P(Yij>y|𝒃i,𝒛ij,𝒙ij)=F(𝒛ijT𝒃i+𝒙ijT𝜷jδj(y))),\displaystyle P(Y_{ij}>y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(y))), (1)

where F(.)F(.) is a strictly increasing distribution function and δj(.))\delta_{j}(.)) is a non-decreasing measurement-specific function defined on the support SS of the dependent variables, referred to as thresholds function. The predictor ηij=𝒛ijT𝒃i+𝒙ijT𝜷jδj(y))\eta_{ij}=\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(y)) contains two components linked to explanatory variables.

  • The term 𝒛ijT𝒃i\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i} contains the cluster-specific effects 𝒃i\boldsymbol{b}_{i}. They are assumed to vary independently across clusters and are assumed to follow a specific distribution, typically the normal distribution, 𝒃iN(𝟎,𝚺)\boldsymbol{b}_{i}\thicksim N(\boldsymbol{0},\boldsymbol{\Sigma}).

  • The term 𝒙ijT𝜷j\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j} contains the effects of 𝒙ij\boldsymbol{x}_{ij} on the dependent variable. The parameters 𝜷j{\boldsymbol{\beta}}_{j} are fixed population-specific parameters.

The distribution of the dependent variables is crucially determined by the choice of the distribution function F(.)F(.), also referred to as response function, and the thresholds function δj(.)\delta_{j}(.). Specific choices yield models that are in common use in random effects modeling. Other choices widen the toolbox yielding models that show better fits than classical approaches. Thresholds models have been considered before in the form of item thresholds model (Tutz, 2022), which are latent trait models that aim at measurement and do not contain any covariates. The objective of the models considered here is quite different. The focus is on the effect of covariates on responses, with covariates that can take quite dfferent forms. They can be measurement-specific, unit-specific or both, which raises problems, in particular with the possible number of parameters involved.

The role the response and the thresholds function play in modeling the response distribution will become obvious when considering specific choices. In Section 2 the case of continuous responses is considered with linear and non-linear thresholds functions. Section 3 is devoted to discrete data with infinite and finite support. Joint modeling, which allows for different types of responses, in particular a mixture of continuous and discrete responses, are considered in Section 4. Marginal likelihood estimation methods are given in Section 5. In Section 6 the problem of obtaining sparser representations is addressed. Several small examples are used to demonstrate the versatility of the approach. They are meant for illustration, no in-depth investigation of effects is given.

2 Continuous Dependent Variables

We start with models that contain linear thresholds functions. The model class comprises the classical normal response model but allows for alternative distributions. Then we consider models for restricted support, which call for non-linear thresholds functions.

2.1 Linear Random Effects Models for Gaussian Data and Other Distributions

Let the dependent variables be continuous with support \mathbb{R}. Then a thresholds function that is simple but already yields very flexible models is the linear thresholds function

δj(y)=δ0j+δjy,δj>0.\delta_{j}(y)=\delta_{0j}+\delta_{j}y,\quad\delta_{j}>0.

Let F(.)F(.) denote a fixed, typically standardized, distribution function with support \mathbb{R}, for example the standardized normal distribution function. Then, the means μij=E(Yij)\mu_{ij}=\operatorname{E}(Y_{ij}) and variances σij2=var(Yij)\sigma_{ij}^{2}=\operatorname{var}(Y_{ij}) of dependent variables have a very simple form,

μij=1δj(β0j+𝒛ijT𝒃i+𝒙ijT𝜷j),σij2=σF2δj2,\displaystyle\mu_{ij}=\frac{1}{\delta_{j}}(\beta_{0j}+\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}),\quad\sigma_{ij}^{2}=\frac{\sigma_{F}^{2}}{\delta_{j}^{2}}, (2)

where β0j=μFδ0j\beta_{0j}=-\mu_{F}-\delta_{0j}, and μF,σF2\mu_{F},\sigma_{F}^{2} are constants that are determined by the distribution function F(.)F(.). More concrete, μF=yf(y)𝑑y\mu_{F}=\int yf(y)dy is the expectation corresponding to distribution function F(.)F(.) and σF2=varF=(yμF)2f(y)𝑑y\sigma_{F}^{2}=\operatorname{var}_{F}=\int(y-\mu_{F})^{2}f(y)dy the corresponding variance; for a proof see Proposition 7.1.

It is seen that the means of dependent variables are simple linear functions of 𝒛ij,𝒙ij\boldsymbol{z}_{ij},\boldsymbol{x}_{ij} and the variances vary across measurements. It is a linear model but not necessarily for Gaussian data. Responses can take any strictly increasing distribution function. The model parameterizes the mean as a linear function of covariates also if responses follow a skewed distribution, which might be more appropriate in applications.

The distribution of responses is easily derived since the density fij(.)f_{ij}(.) of variable YijY_{ij} is given by

fij(y)=f(𝒛ijT𝒃i+𝒙ijT𝜷jδ0jδjy)δj,\displaystyle f_{ij}(y)=f(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{0j}-\delta_{j}y)\delta_{j}, (3)

where f(.)f(.) denotes the density linked to F(.)F(.), f(y)=F(y)/yf(y)=\partial F(y)/{\partial y}. It means, in particular, that for symmetric distribution functions F(.)F(.) the distributions of all dependent variables are scaled and shifted versions of the distribution specified by F(.)F(.). If F(.)F(.) is not symmetric the distributions of all dependent variables are scaled and shifted versions of the distribution function F~(y)=1F(y)\tilde{F}(y)=1-F(-y). This is easily seen by considering the distribution function of YijY_{ij}, which is given by P(Yijy|𝒃i,𝒛ij,𝒙ij)=1P(Yij>y|𝒃i,𝒛ij,𝒙ij)P(Y_{ij}\leq y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=1-P(Y_{ij}>y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij}).

A simplified version is the homogeneous GMT model, in which δj\delta_{j} does not depend on the measurement, that is δ1==δm=δ\delta_{1}=\dots=\delta_{m}=\delta. Then, the variance is the same for all observations and the mean and variance have the simple form

μij=β~0j+𝒛ijT𝒃~i+𝒙ijT𝜷~j,σij2=σ~F2,\displaystyle\mu_{ij}=\tilde{\beta}_{0j}+\boldsymbol{z}_{ij}^{T}\tilde{\boldsymbol{b}}_{i}+\boldsymbol{x}_{ij}^{T}\tilde{\boldsymbol{\beta}}_{j},\quad\sigma_{ij}^{2}={\tilde{\sigma}_{F}^{2}}, (4)

where β~0j,𝒃~i,,𝜷~j,σ~F2\tilde{\beta}_{0j},\tilde{\boldsymbol{b}}_{i},,\tilde{\boldsymbol{\beta}}_{j},\tilde{\sigma}_{F}^{2} are the original parameters divided by δ\delta.

A special case of the GMTM is the classical linear random effects model with normally distributed dependent variables. Let F(.)F(.) denote the standardized normal distribution function and assume that dispersion homogeneity holds (δ1==δm=δ\delta_{1}=\dots=\delta_{m}=\delta). Then, the dependent variables are normally distributed with means and variance given by (4). A more familiar representation of the model is the vector-valued representation

𝒚i=𝜷0+𝑿i𝜷+𝒁i𝒃i+𝜺i,𝒃iN(𝟎,𝜮),𝜺iN(𝟎,σ~F2𝑰)\boldsymbol{y}_{i}={\boldsymbol{\beta}}_{0}+\boldsymbol{X}_{i}{\boldsymbol{\beta}}+\boldsymbol{Z}_{i}\boldsymbol{b}_{i}+{\boldsymbol{\varepsilon}}_{i},\boldsymbol{b}_{i}\sim N(\boldsymbol{0},\boldsymbol{\mathit{\Sigma}}),{\boldsymbol{\varepsilon}}_{i}\sim N(\boldsymbol{0},\tilde{\sigma}_{F}^{2}\boldsymbol{I})

where 𝒚iT=(yi1,,yim)\boldsymbol{y}_{i}^{T}=(y_{i1},\dots,y_{im}), the matrices 𝑿i,𝒁i\boldsymbol{X}_{i},\boldsymbol{Z}_{i} are composed from the vectors 𝒛ij,𝒙ij\boldsymbol{z}_{ij},\boldsymbol{x}_{ij} and 𝑰\boldsymbol{I} denotes the unit matrix.

The more general model with varying dispersion parameters δj\delta_{j} is an extension of this classical model. It is more flexible and can be more appropriate, in particular when repeated measurements on a unit are time-dependent and the variance changes over time.

Within the MTM framework there is no need to assume that dependent variable are normally distributed. Any strictly monotone distribution function F(.)F(.) can be used in the model. With linear thresholds function δj(.)\delta_{j}(.) one obtains a linear form of the expectation and simple terms for the variance. In particular skewed distribution can be used. This extends the usual normal distribution approach to modeling clustered data to a wider class of models with a simple link between covariates and measurements.

Rent data

For illustration we use the Munich rent index data. The variables are rent (monthly rent in Euros), floor (floor space), rooms (number of rooms), age in years. There are 25 districts (residential areas) which have an effect on rents and are modeled as random effects. For an extensive description of the rent data see Fahrmeir et al. (2011). The rent is the dependent variable, floor space, number of rooms and age are explanatory variables.

Monthly rents tend to have a right-skewed distribution since there are typically some houses that are much more expensive than the average house. Thus, the normal distribution might not be the best choice for modeling this kind of data. A candidate for a right right-skewed distribution is the Gumbel distribution F(y)=exp(exp(y))F(y)=\exp(-\exp(-y)), which is the distribution of responses if one chooses the Gompertz distribution F(y)=1exp(exp(y))F(y)=1-\exp(-\exp(y)) as response function F(.)F(.). As the log-likelihoods in Table 1 show the assumption of the Gumbel distribution for responses (F(.) is chosen as the Gompertz distribution) yields better fit. The Gompertz distribution as a left-skewed distribution for responses (if F(.)F(.) is chosen as the Gumbel distribution function ) yields much worse fit and is not shown.

Table 1: Parameter estimates for self esteem dat
response distribution Floor Rooms Age std mixture log-lik
Normal 0.073 -0.593 -0.010 0.244 -3389.992
stderr ( 0.004) (0.087) (0.002)
Gumbel 0.050 -0.280 -0.012 0.331 -3366.059
stderr ( 0.004) (0.099) (0.002)

2.2 Random Effects Models for Positive-Valued Variables

In many applications the dependent variable can take only positive values, for example if responses are response times. Although often used, the assumption of a normal distribution or any other distribution with support \mathbb{R} is not warranted and will only yield a crude approximation to the true distribution.

In the MTM the support of the dependent variable can be restricted by using an appropriate thresholds function δj(.)\delta_{j}(.). If the difficulty function is chosen such that limy0δi(y)=\lim_{y\rightarrow 0}\delta_{i}(y)=-\infty holds the responses automatically has positive values, y0y\geq 0. One candidate that can be chosen is the logarithmic thresholds function

δj(y)=δ0j+δjlog(y).\delta_{j}(y)=\delta_{0j}+\delta_{j}\log(y).

Thresholds functions of this type combine linearity with a transformation function. The general form of thresholds functions of this type, which are used throughout the paper, is

δj(y)=δ0j+δjg(y),\displaystyle\delta_{j}(y)=\delta_{0j}+\delta_{j}g(y), (5)

where g(.)g(.) is a non-decreasing function. Thresholds functions, of this form are simply named after the transformation function g(.)g(.).

If the thresholds function is logarithmic a familiar distribution is found if F(.)F(.) is chosen as the standard normal distribution function. Then, one can derive that the density of yijy_{ij} denoted by fij(.)f_{ij}(.) is given by

fij(y)=\displaystyle f_{ij}(y)= 1σ¯jy2πexp((log(y)μ~ij)2σ¯j2)\displaystyle\frac{1}{\bar{\sigma}_{j}y\sqrt{2\pi}}\exp(\frac{-(\log(y)-\tilde{\mu}_{ij})^{2}}{\bar{\sigma}_{j}^{2}}) (6)

where μ~ij=𝒛ijT𝒃i+𝒙ijT𝜷jδ0jη~ijδj\tilde{\mu}_{ij}=\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{0j}\tilde{\eta}_{ij}\delta_{j}, σ¯j=1/δj\bar{\sigma}_{j}=1/\delta_{j}. This is the lognormal distribution with parameters μ¯pi,σ¯i\bar{\mu}_{pi},\bar{\sigma}_{i}. Thus the logarithmic thresholds function generates a random effects model in which dependent variables follow a lognormal distribution.

Sleep Data

In a sleep deprivation study the average reaction time per day for subjects has been measured (data set sleepstudy from package lme4). On day 1 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject. The 10 days represent the repeated measurements on 18 persons.

Instead of using a normal distribution model with linear effect of days a thresholds model with normal response function and logarithmic thresholds function is used. In the model

P(Yij>y|bi,𝒛ij,𝒙ij)=F(biδ0jδjlog(y)),\displaystyle P(Y_{ij}>y|b_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(b_{i}-\delta_{0j}-\delta_{j}\log(y)),

the random effect bib_{i} refers to the person and δ0j\delta_{0j} accounts for the effect of days. It is not assumed that the mean is a linear function of days as is common in typical random effects models. Instead, the basic variation of responses over repeated measurements is captured in the parameters δ0j\delta_{0j}. The parameters δj\delta_{j} account for possible heterogeneity of variances. The log(y)\log(y) function, which makes the dependent variables follow a log-normal distribution shows slightly better fit than the common normal distribution model. The log-likelihood was -872.96 with the log(y)\log(y) function and -875.50 for the identity function (Gaussian distribution). Figure 1 shows the fitted densities for days 1,3,5,9 for bi=0b_{i}=0 and bi=1b_{i}=1. It is seen that the distributions have quite different forms and variances vary across days. The mean reaction time increases as well as the variances increase over days of sleep deprivation .

Refer to caption
Refer to caption
Figure 1: Reaction time for days 1,3,5,9 of sleep deprivation for bi=0b_{i}=0 (left) and bi=1b_{i}=1 (right).

2.3 Random Effects Models for Continuous Bounded Data

Various regression model have been proposed for continuous bounded data in the form of rates and proportions that take values in the interval (0,1)(0,1), see, for example, Kieschnick and McCullough (2003), Bonat et al. (2019). Also mixed model versions for repeated measurement have been developed, in particular the simplex mixed model (Qiu et al., 2008) and beta mixed models (Bonat et al., 2015).

Let us more generally consider the case where Yij(a,b)Y_{ij}\in(a,b). The restriction of responses to the interval is obtained within the thresholds model framework by choosing thresholds functions for which limyaδi(y)=\lim_{y\rightarrow a}\delta_{i}(y)=-\infty and limybδi(y)=\lim_{y\rightarrow b}\delta_{i}(y)=\infty hold since then P(Yij>a)1P(Y_{ij}>a)\rightarrow 1 and P(Yij>b)0P(Y_{ij}>b)\rightarrow 0. A thresholds function that meets these demands is, for example, the logit thresholds function

δj(y)=δ0j+δjlog((ya)/(by)).\delta_{j}(y)=\delta_{0j}+\delta_{j}\log((y-a)/(b-y)).

Instead of g(.)=log((ya)/(by)g(.)=\log((y-a)/(b-y) any inverse distribution function can be used. The logistic distribution function is just one option, which yields the logit function with g(y)=log((ya)/(by))g(y)=\log((y-a)/(b-y)).

While simple terms for means and variances of the dependent variable can be found only for simple thresholds functions as the linear one it is straightforward to show that for any (non-decreasing) transformation function g(y)g(y) means and expectations of the transformed variables g(Yij)g(Y_{ij}) are given by

E(g(Yij))=1δj(β0j+𝒛ijT𝒃i+𝒙ijT𝜷j),var(g(Yij))=σF2δj2,\displaystyle\operatorname{E}(g(Y_{ij}))=\frac{1}{\delta_{j}}(\beta_{0j}+\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}),\quad\operatorname{var}(g(Y_{ij}))=\frac{\sigma_{F}^{2}}{\delta_{j}^{2}}, (7)

see Proposition 7.2 for a proof. That means the mean of the responses is a linear function of covariates and variances can vary over measurements.

To illustrate the restriction to intervals generated by properly chosen thresholds functions Figure 2 shows the obtained distributions if F(.)F(.) is the normal distribution, the predictor is ηij=bi+xiβ\eta_{ij}=b_{i}+x_{i}\beta with binary predictor xix_{i}, β=1\beta=1 and δj(y)=log((ya)/(by)),a=0,b=10\delta_{j}(y)=\log((y-a)/(b-y)),a=0,b=10. In the left picture bi=0b_{i}=0, in the right picture bi=0.5b_{i}=0.5. The drawn line shows the density for xi=0x_{i}=0, the dashed line for xi=1x_{i}=1. It is seen that the logit type distribution ensures that the support of the distribution is (0,1)(0,1). For larger random effect (right picture) the density becomes larger close to the upper boundary.

Refer to caption
Refer to caption
Figure 2: Densities for predictor is ηij=bi+xiβ\eta_{ij}=b_{i}+x_{i}\beta with binary predictor xix_{i}, β=1\beta=1 and δj(y)=log((ya)/(by)),a=0,b=10\delta_{j}(y)=\log((y-a)/(b-y)),a=0,b=10; left: bi=0b_{i}=0; right: bi=0.5b_{i}=0.5; drawn line: xi=0x_{i}=0, dashed line: xi=1x_{i}=1.

3 Discrete Data

Discrete data come in two forms, with infinite support and finite support. We will consider first the case of infinite support than the case where the response is in categories. In the latter typically only ordinal scale level is assumed for the dependent variable.

3.1 Random Effects Models for Count Data

Let the responses be counts, that is, YijY_{ij} takes values from {0,1,}\{0,1,\dots\}. If responses are assumed to follow a Poisson distribution mixed models can be formulated within the generalized linear model framework. Extended versions that are able to account for overdispersion as the negative binomial model have been considered by Tempelman and Gianola (1996), Molenberghs et al. (2007), Zhang et al. (2017). In psychometrics also the Conway-Maxwell-Poisson model has been used albeit without covariates (Forthmann et al., 2020).

In the mixed thresholds model the discrete density or mass function is given by

fij(0)\displaystyle f_{ij}(0) =1P(Ypi>0)=1F(𝒛ijT𝒃i+𝒙ijT𝜷jδj(0))),\displaystyle=1-P(Y_{pi}>0)=1-F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(0))),
fij(r)\displaystyle f_{ij}(r) =P(Yij>r1)P(Yij>r)=\displaystyle=P(Y_{ij}>r-1)-P(Y_{ij}>r)=
=F(𝒛ijT𝒃i+𝒙ijT𝜷jδj(r1)))F(𝒛ijT𝒃i+𝒙ijT𝜷jδj(r))),r=1,2,\displaystyle=F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(r-1)))-F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(r))),r=1,2,\dots

A thresholds function that ensures that the responses have support 0,1,0,1,\dots is the shifted logarithmic thresholds function

δj(y)=δ0j+δjlog(1+y).\delta_{j}(y)=\delta_{0j}+\delta_{j}\log(1+y).

The resulting model is as flexible as models that account for overdispersion. In particular the varying slopes δj\delta_{j} make it very flexible and able to account for changes in distributional shape over measurements.

Epileptics Data

The response in the data set epil from R package mass is the number of seizures in a fixed period (four periods considered). As covariates we use age, treatment (1: treatment, 0: placebo) and base (number of seizures at the beginning of the trial). The model uses a logarithmic thresholds function δj(y)=δj0+δjlog(1+y)\delta_{j}(y)=\delta_{j0}+\delta_{j}\log(1+y). As response function F(.)F(.) we used again the normal, the Gompertz and the Gumbel distribution. As the following table shows the Gumbel distribution shows much better fit than the normal distribution, the fit of the Gompertz was much worse (not given).

Response distribution treatment base age log-lik AIC
Gumbel -0.534 0.059 0.019 -604.520 1233.040
NV -0.479 0.048 0.024 -622.706 1269.412

The treatment effect is more pronounced in the Gumbel model yielding the likelihood-ratio test 5.658 as compared to 3.29 if the normal distribution is used. For illustration Figure 3 shows the densities for period 2 and 4 for placebo (left) and treatment (right) when using the Gumbel distribution, The other variables were chosen by bi=0b_{i}=0, base=20, age=20, diamonds indicate period four, circles period two. It is seen that treatment distictly reduces the number of seizures.

The thresholds model fits much better than the Poisson model, which yields log-likelihood -651.8071 and AIC 1313.614. It also fits better than the more general negative binomial model, which yielded log-likelihood -609.711. AIC values of the Gumbel thresholds model and the negative binomial model are comparable (AIC for negative binomial model: 1231.423. Fitting of the Poisson and the negative binomial model was done by using the R package glmmTMB (Brooks et al., 2017).

Refer to caption
Refer to caption
Figure 3: Densities for epileptics data with bi=0b_{i}=0, base=20, age=20; left: placebo, right: traetment, diamonds indicate period four, circles period two.

3.2 Random Effects Models for Ordered Responses

Let YijY_{ij} take values from {1,,k}\{1,\dots,k\} and assume that categories are ordered. The typical mixed model for this type of data is the cumulative mixed model considered among others by Jansen (1990) and Tutz and Hennevogl (1996) for univariate random effects. It has the form

P(Yij>r|𝒃i,𝒛ij,𝒙ij)=F(β0r+𝒛ijT𝒃i+𝒙ijT𝜷)),r=1,,k1\displaystyle P(Y_{ij}>r|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\beta_{0r}+\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}})),\quad r=1,\dots,k-1 (8)

with ordered intercepts β0rβ0,r+1\beta_{0r}\geq\beta_{0,r+1} for all rr. For random intercepts 𝒛ijT𝒃i=bi\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}=b_{i} it can be fitted by using the package ordinal .

The model is equivalent to the thresholds model

P(Yij>r|𝒃i,𝒛ij,𝒙ij)=F(𝒛ijT𝒃i+𝒙ijT𝜷jδj(r)))\displaystyle P(Y_{ij}>r|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(r))) (9)

with the special choices δj(r)=β0r\delta_{j}(r)=-\beta_{0r} and 𝜷1==𝜷m=𝜷{\boldsymbol{\beta}}_{1}=\dots={\boldsymbol{\beta}}_{m}={\boldsymbol{\beta}}.

However, the model (9) without restrictions is more general than the simple cumulative model (8). In the simple cumulative model the thresholds β01β0,k1\beta_{01}\geq\dots\geq\beta_{0,k-1} do not depend on the measurement jj. It is implicitly assumed that only covariates 𝒙ij\boldsymbol{x}_{ij} modify the distribution of the dependent variables. This is far too restrictive in many applications. In addition, in model (8) the effect of covariates does not depend on the measurement (𝜷1==𝜷m=𝜷{\boldsymbol{\beta}}_{1}=\dots={\boldsymbol{\beta}}_{m}={\boldsymbol{\beta}}). This is hardly realistic, in particular if dependent variables refer to different variables as in the fears data example considered in the following.

Since the range of outcomes is bounded similar thresholds functions as in the case of bounded continuous data should be used. We use the logit type function

δj(y)=δ0j+δjlog((ya)/(by)).\delta_{j}(y)=\delta_{0j}+\delta_{j}\log((y-a)/(b-y)).

with a<1a<1 but close to 1, b=kb=k. grid or optimization, but very stable over varying values of aa.

An advantage of using a threshold function of the form δj(y)=δ0j+δjg(y)\delta_{j}(y)=\delta_{0j}+\delta_{j}g(y) instead of letting all intercepts vary freely (apart from order restrictions) is that sparser representations are obtained. Only 2m2m parameters are needed instead of m(k1)m(k-1), also order restriction problems do not occur.

Fears Data

As an illustrating example, we consider data from the German Longitudinal Election Study (GLES). The data originate from the pre-election survey for the German federal election in 2017 and are concerned with political fears. The participants were asked: “How afraid are you due to the …” - (1) refugee crisis? - (2) global climate change? - (3) international terrorism? - (4) globalization? - (5) use of nuclear energy? The answers were measured on Likert scales from 1 (not afraid at all) to 7 (very afraid).

We fitted a discrete thresholds with logistic response function and logit difficulty function including covariates, gender (1: female; 0: male), standardized age in decades, EastWest (1: Eastern German countries/former GDR, 0: Western German countries/former FRG) and Abitur (High school degree for the admission to the university, 1:yes, 0:no). Table 2 shows the estimates of item parameters. The parameters given are the intercept and the slope of the difficulty function δ(y)=δ0j+δjlog((ya)/(by))\delta(y)=\delta_{0j}+\delta_{j}\log((y-a)/(b-y)). It is seen that all items show significant covariate effects for at least one of the covariates ( z-values given below estimates). Older respondents tend to be more afraid than younger respondents, in particular concerning terrorism but less concerning climate change. Females have for all items higher fear levels than males, the effects of EastWest are rather mixed, people from the Eastern parts of the country are more afraid of globalization but less afraid of nuclear energy. Higher education seems to reduce the level of fears. The necessity of covariates is also supported by testing. The log-likelihood test that compares the model without covariates to the model with covariates is 88.081on 10 df. Thus, the covariates turn out to be influential if one accounts for the heterogeneity in the population.

The model fits better than the common cumulative model (8), in which thresholds do not depend on jj. By default the package ordinal fits models with global covariate effects, that is, 𝜷1==𝜷k1=𝜷{\boldsymbol{\beta}}_{1}=\dots={\boldsymbol{\beta}}_{k-1}={\boldsymbol{\beta}}. By constructing appropriate design matrices it is possible to fit variable-specific covariate effects. The corresponding model has log-likelihood -1710.76 which is much smaller than -1684.871 for the thresholds model. Consequently, in terms of AIC values the thresholds model fits better. AIC value for the model with measurement-specific covariate effects but global thresholds effects is 3475.52 (20 covariate effects, 6 thresholds, variance of mixing distribution). For the thresholds model one obtains 3431.87 (20 covariate effects, 10 difficulty function parameters, variance of mixing distribution).

Table 2: Parameter estimates for the fears data with logit difficulty function, logistic response function, z-values of parameter estimates of covariate parameters are given in the lower part, variable age was standardized.
Parameters
Item intercepts slopes Age Gender EastWest Abitur
measurement- specific effects
1 refugee -0.341 1.653 0.151 0.606 0.392 -1.388
2 climate change -1.140 1.858 0.013 1.061 -0.594 -0.062
3 terrorism -2.062 1.795 0.393 1.329 0.321 -1.207
4 globalization 0.716 1.862 0.195 1.020 0.725 -0.719
5 nuclear energy -0.922 1.641 0.254 0.379 -0.416 -0.300
Log-lik -1684.871
zz-values covariates
1 refugee 0.937 1.883 1.179 -4.002
2 climate change 0.079 3.313 -1.798 -0.186
3 terrorism 2.354 3.935 0.939 -3.468
4 globalization 1.236 3.184 2.206 -2.108
5 nuclear energy 1.565 1.189 -1.253 -0.892
global effects
Log-lik -1715.82 0.230 1.030 0.290 -0.545

4 Joint Modeling of Different Types of Responses

A strength of the thresholds model is that dependent variables can be of various types. It allows for some of the measurements to be continous while others are binary, ordinal or given as counts. Also the combination of continuous measurements with differing support can be modeled in a joint random effects model. There have been some approaches to joint modeling of different types of responses in specificsettings, see, for example, Ivanova et al. (2016) with a focus on ordinal variables or Loeys et al. (2011), where a joint modeling approach for reaction time and accuracy in psycholinguistic experiments has been proposed. However, no general random effects model that allows to combine different types of responses seems available

The flexibility of thresholds models to account for various types of measurement is due to the general form of the model. Since the same model form, which specifies P(Yij>y)P(Y_{ij}>y), applies to different types of measurement it is straightforward to obtain a joint model simply by allowing for different distributions (continuous or discrete) and specifying the thresholds function accordingly. For example, if measurement 1 is continuous and measurement 2 ordered categorical, one can choose for the first measurement the linear or logarithmic thresholds function (depending on the support) and for the second measurement the logistic thresholds function. The common random effects will account for the correlation between measurements without the need for an explicit new concept for the correlation between a continuous and an ordered categorical variable.

Sleep Data

As an illustrating example let us again consider the sleep deprivation data. Instead of using the average reaction time per day in all 10 measurements, the last two measurements were transformed to ordered categorical data. More concrete, the interval (200,500)(200,500), which covers the reaction times, has been divided into six equidistant intervals and responses were coded as 1,,61,\dots,6 according to the responses in intervals. For the first eight measurement the logarithmic threshold function has been chosen, and they are specified as continuous. For the last two measurements the logit thresholds function has been chosen and the measurements are specified as discrete with values 1,,61,\dots,6. The fitting of the model with normal response function yielded log-likelihood -768.597, which has to differ from the likelihood of the model with continuous responses considered earlier (-875.50) since now a combination of continuous and discrete distributions is assumed.

However, the variation of reaction times over days remains essentially the same. Figure 4 shows the densities for days 1,3,5 for bi=0b_{i}=0 (left) and bi=1b_{i}=1 (right). They are practically the same as the densities given in Figure 1, which shows the densities if all variables are considered as continuous.

Refer to caption
Refer to caption
Figure 4: Reaction time for days 1,3,5 of sleep deprivation for bi=0b_{i}=0 (left) and bi=1b_{i}=1 (right) for mixed responses.

5 Estimation

A general form of the threshold function term is given by δj(y))=𝚽j(y)T𝜹j\delta_{j}(y))={\boldsymbol{\Phi}}_{j}(y)^{T}\boldsymbol{\delta}_{j}, where 𝚽j(y)=(Φj0(y),,ΦjM(y)){\boldsymbol{\Phi}}_{j}(y)=(\Phi_{j0}(y),\dots,\Phi_{jM}(y)) is a vector that contains functions of yy. If the same threshold function is used for all measurements it can be specified as 𝚽j(y)T=(1,g(y)){\boldsymbol{\Phi}}_{j}(y)^{T}=(1,g(y)). Measurement-specific thresholds functions can be obtained by 𝚽j(y)T=(1,gj(y)){\boldsymbol{\Phi}}_{j}(y)^{T}=(1,g_{j}(y)). But also more general vectors of functions can be useful. Using this parameterization, the model has the form

P(Yij>y|𝒃i,𝒛ij,𝒙ij)=F(𝒛ijT𝒃i+𝒙ijT𝜷j𝚽j(y)T𝜹j),\displaystyle P(Y_{ij}>y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-{\boldsymbol{\Phi}}_{j}(y)^{T}\boldsymbol{\delta}_{j}), (10)

For continuous measurement jj the density is

fij(y)=f(𝒛ijT𝒃i+𝒙ijT𝜷jδ0j𝚽j(y)T𝜹j)𝚽j(y)T𝜹j,\displaystyle f_{ij}(y)=f(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{0j}-{\boldsymbol{\Phi}}_{j}(y)^{T}\boldsymbol{\delta}_{j}){\boldsymbol{\Phi}}_{j}^{{}^{\prime}}(y)^{T}\boldsymbol{\delta}_{j},

where 𝚽j(y)=(Φj0(y),,ΦjM(y)){\boldsymbol{\Phi}}^{{}^{\prime}}_{j}(y)=(\Phi^{{}^{\prime}}_{j0}(y),\dots,\Phi^{{}^{\prime}}_{jM}(y)) contains the derivatives of the components and f(.)f(.) is the density linked to F(.)F(.). If 𝚽j(y)T=(1,g(y)){\boldsymbol{\Phi}}_{j}(y)^{T}=(1,g(y)) one obtains the simpler form

fij(y)=f(𝒛ijT𝒃i+𝒙ijT𝜷jδ0j𝚽j(y)T𝜹j)δjgj(y).\displaystyle f_{ij}(y)=f(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{0j}-{\boldsymbol{\Phi}}_{j}(y)^{T}\boldsymbol{\delta}_{j})\delta_{j}g^{{}^{\prime}}_{j}(y).

For discrete data with Yij{0,1,}Y_{ij}\in\{0,1,\dots\}. the discrete density is given by

fij(0)\displaystyle f_{ij}(0) =1P(Ypi>0)=1F(𝒛ijT𝒃i+𝒙ijT𝜷j𝚽j(0)T𝜹j),\displaystyle=1-P(Y_{pi}>0)=1-F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-{\boldsymbol{\Phi}}_{j}(0)^{T}\boldsymbol{\delta}_{j}),
fij(r)\displaystyle f_{ij}(r) =P(Yij>r1)P(Yij>r)=\displaystyle=P(Y_{ij}>r-1)-P(Y_{ij}>r)=
=F(𝒛ijT𝒃i+𝒙ijT𝜷j𝚽j(r1)T𝜹j)F(𝒛ijT𝒃i+𝒙ijT𝜷j𝚽j(r)T𝜹j),r=1,2,\displaystyle=F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-{\boldsymbol{\Phi}}_{j}(r-1)^{T}\boldsymbol{\delta}_{j})-F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-{\boldsymbol{\Phi}}_{j}(r)^{T}\boldsymbol{\delta}_{j}),r=1,2,\dots

When assuming that random effects are normally distributed with mean zero and covariance matrix 𝜮\boldsymbol{\mathit{\Sigma}} the marginal log-likelihood has the form

L({𝜷j},{𝜹j})=i=1nj=1mfij(yij)f0,𝜮(𝒃i)d𝒃i,\displaystyle L(\{{\boldsymbol{\beta}}_{j}\},\{\boldsymbol{\delta}_{j}\})=\prod_{i=1}^{n}\int\prod_{j=1}^{m}f_{ij}(y_{ij})f_{0,\boldsymbol{\mathit{\Sigma}}}(\boldsymbol{b}_{i})d\boldsymbol{b}_{i},

where f0,𝜮(.)f_{0,\boldsymbol{\mathit{\Sigma}}}(.) is the density of the normal distribution with mean zero and covariance matrix 𝜮\boldsymbol{\mathit{\Sigma}}. The corresponding log-likelihood is given by

l({𝜷j},{𝜹j})=log(L({𝜷j},{𝜹j}))=i=1nlog(j=1mfij(yij)f0,𝜮(𝒃i)d𝒃i),\displaystyle l(\{{\boldsymbol{\beta}}_{j}\},\{\boldsymbol{\delta}_{j}\})=\log(L(\{{\boldsymbol{\beta}}_{j}\},\{\boldsymbol{\delta}_{j}\}))=\sum_{i=1}^{n}\log(\int\prod_{j=1}^{m}f_{ij}(y_{ij})f_{0,\boldsymbol{\mathit{\Sigma}}}(\boldsymbol{b}_{i})d\boldsymbol{b}_{i}),

Maximization of the marginal log-likelihood can be obtained by using Gauss Hermite quadrature, which has been used in generalized mixed models, see, for example, Anderson and Aitkin (1985), Liu and Pierce (1994), Hedeker and Gibbons (1994), Rodríguez (2008), Gueorguieva (2001). An alternatively is the EM, which was considered for mixed models among others by Bock and Aitkin (1981) Anderson and Aitkin (1985). Overviews on inference tools for generalized moxed models are found in Jiang and Nguyen (2007), McCulloch and Searle (2001).

6 Finding Sparser Models

As is seen from the fears data the effects of covariates can vary strongly across measurements. For example, the variable EastWest has positive values for some items but negative values for other items. People living in the Eastern part of Germany tend to show stronger fears concerning globalization and refugees but lesser fears concerning the climate crisis than people living in the Western part of Germany. Thus, the cultural imprint from living in different countries with differing experiences seems to still be present.

The inclusion of measurement-specific fixed effects 𝜷j{\boldsymbol{\beta}}_{j} make the general model considered here very flexible. A possible downside is that the model contains many parameters. Each variable comes with as many parameters as measurements are used. However, not each variable necessarily has effects that vary across measurements. Selecting those variables that have global effects, that is effects that do not vary across measurements, would yield sparser models with an easier interpretation.

One strategy to identify variables with global effect is testing. Likelihood ratio test for hypotheses of the form

H0:β1j==βmj against H1:βsjγlj for at least one pair (s,l)\displaystyle H_{0}:\beta_{1j}=\dots=\beta_{mj}\text{ against }H_{1}:\beta_{sj}\neq\gamma_{lj}\text{ for at least one pair }(s,l)

can be used to test if single variables are global. Table 3 shows the results of testing for the fears data. It is seen that all of the variables should be considered as having effects that vary over measurements.

Table 3: Testing if covariates can considered global, one at a time
Age Gender EastWest Abitur log-likelihood LR test df
var var var var -1684.871
global var var var -1695.435 21.128 3
var global var var -1708.562 47.382 3
var var global var –1704.869 39.996 3
var var var global -1702.910 36.079 3
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 5: Coefficient paths for grouped lasso on differences and selection; last panel is cross validation loss.

An alternative way to obtain sparser representations is to use penalization methods that have been widely used since the advent of the Lasso (Tibshirani, 1996). Instead of maximizing the usual marginal log-likelihood one maximizes the penalized log-likelihood

lp({𝜷j},{𝜹j})=l({𝜷j,{𝜹j})Pλ({𝜷j}),\displaystyle l_{p}(\{{\boldsymbol{\beta}}_{j}\},\{\boldsymbol{\delta}_{j}\})=l(\{{\boldsymbol{\beta}}_{j},\{\boldsymbol{\delta}_{j}\})-P_{\lambda}(\{{\boldsymbol{\beta}}_{j}\}),

where l(.)l(.) is the usual log-likelihood and P𝝀({𝜷j})P_{\boldsymbol{\lambda}}(\{{\boldsymbol{\beta}}_{j}\}) is a penalty term depending on a tuning parameter λ\lambda.

A penalty term that enforces the building of global variables and simultaneously select variables is given by

Pλ({𝜷j})=λf(s=1m𝜷s,dif+s=1m𝜷.s)\displaystyle P_{\lambda}(\{{\boldsymbol{\beta}}_{j}\})=\lambda_{f}(\sum_{s=1}^{m}||{\boldsymbol{\beta}}_{s,\text{dif}}||+\sum_{s=1}^{m}||{\boldsymbol{\beta}}_{.s}||)

where ||.||||.|| is the L2L_{2}-norm, 𝜷s,difT=(β1sβ2s,β2sβ3s,){\boldsymbol{\beta}}_{s,\text{dif}}^{T}=(\beta_{1s}-\beta_{2s},\beta_{2s}-\beta_{3s},\dots) contains all differences of effects between pairs of measurement for the ssth variable, and 𝜷.sT=(β1s,βms){\boldsymbol{\beta}}_{.s}^{T}=(\beta_{1s}\dots,\beta_{ms}) is the vector that collects all parameters linked to the ss-th variable. It is a penalty term built from two terms. The first term enforces the building of global variables. It is a grouped lasso type penalty that aims at the fusion of parameters. It enforces that whole groups of differences are set to zero with the group referring to differences of effects of single variables. In common grouped lasso penalties as considered by Yuan and Lin (2006) and Meier et al. (2008) groups refer to parameters linked to variables variables, not differences of parameters linked to variables. Fusion penalties have been used in particular to combine effects of adjacent categories when modeling the effects of categorical predictors, see Tutz and Gertheiss (2016), Gertheiss and Tutz (2023) and to enforce selection and shrinkage to global effects, for example, by Oelker and Tutz (2017). The second term is a selection term. It enforces that all parameters linked to specific variables are set to zero.

Figure 5 shows the resulting coefficient paths plotted against log(1+λ)\log(1+\lambda). For growing λ\lambda coefficients become more similar (global) while the shrinkage toward zero (elimination of variables) sets in rather late. Zero estimates are obtained for very large values of λ\lambda, well beyond the values that are shown. This is sensible since the variables certainly have an effect on the levels of fear. The last panel shows the negative cross-validation log-likelihood (5-fold cross-validation). It is seen that the smallest levels are obtained if log(1+λ)\log(1+\lambda) is slightly above 2. In this range coefficients are still measurement-specific, which supports the testing results.

7 Concluding Remarks

It has been demonstrated that the thresholds model is very versatile and can be adapted to quite different distribution functions. Moreover, it can be used in joint modeling of different responses, which has not been considered in the literature since appropriate models have not been available. A rare exception is the modeling of survival data where joint modeling of survival times and longitudinal data has been considered, see, for example, Hsieh et al. (2006).

We used marginal estimation based on integration methods, but alternative estimation from the toolbox of mixed models could be used. Also alternative regularization methods could be useful when trying to find sparser representations, for an overview on basic regularization method including boosting, see Hastie et al. (2009).

The class of mixed thresholds models could be made even more flexible by letting the data decide which thresholds function is appropriate. In particular, B-splines as considered extensively by Eilers and Marx (2021) could be useful, which has been demonstrated by Tutz (2022) in the item-response setting without covariates.

Appendix

For simplicity, let the model be given in the form

P(Yij>y|𝒃i,𝒛ij,𝒙ij)=F(η~ijδj(y)),\displaystyle P(Y_{ij}>y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\tilde{\eta}_{ij}-\delta_{j}(y)), (11)

where η~ij=𝒛ijT𝒃i+𝒙ijT𝜷j\tilde{\eta}_{ij}=\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j} and F(.)F(.) is a strictly increasing distribution function.

Proposition 7.1

Let the threshold function have the form δj(y)=δ0j+δjy\delta_{j}(y)=\delta_{0j}+\delta_{j}y, δj0\delta_{j}\geq 0. Then, one obtains for the expectation and the variance

E(Yij)=μij=1δj(β0j+𝒛ijT𝒃i+𝒙ijT𝜷j),\displaystyle\operatorname{E}(Y_{ij})=\mu_{ij}=\frac{1}{\delta_{j}}(\beta_{0j}+\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}),
var(Yij)=σij2=σF2δj2,\displaystyle\operatorname{var}(Y_{ij})=\sigma_{ij}^{2}=\frac{\sigma_{F}^{2}}{\delta_{j}^{2}},

where β0j=μFδ0j\beta_{0j}=-\mu_{F}-\delta_{0j} with μF=yf(y)𝑑y\mu_{F}=\int yf(y)dy, σF2=varF=(yμF)2f(y)𝑑y\sigma_{F}^{2}=\operatorname{var}_{F}=\int(y-\mu_{F})^{2}f(y)dy, f(y)=F(y)yf(y)=\frac{\partial F(y)}{\partial y}.

Proof: For linear item function the thresholds model has the form P(Yij>y|𝒃i,𝒛ij,𝒙ij)=F(η~ijδj(y))P(Y_{ij}>y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\tilde{\eta}_{ij}-\delta_{j}(y)). The corresponding distribution function is

FYij(y)=P(Yijy)=1F(η~ijδ0jδjy),F_{Y_{ij}}(y)=P(Y_{ij}\leq y)=1-F(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}y),

which has the density

fYij(y)=FYij(y)y=f(η~ijδ0jδjy)δj.f_{Y_{ij}}(y)=\frac{\partial F_{Y_{ij}}(y)}{\partial y}=f(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}y)\delta_{j}.

Thus, the mean is given by

E(Yij)=δjyf(η~ijδ0jδjy)𝑑y.\operatorname{E}(Y_{ij})=\delta_{j}\int yf(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}y)dy.

With η=η~ijδ0jδjy\eta=\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}y and dη/dy=δjd\eta/dy=-\delta_{j} one obtains

E(Ypi)\displaystyle\operatorname{E}(Y_{pi}) =1δj(η~ijηδ0j)f(η)𝑑η=1δj(η~ijηδ0j)f(η)𝑑η\displaystyle=-\frac{1}{\delta_{j}}\int_{\infty}^{-\infty}(\tilde{\eta}_{ij}-\eta-\delta_{0j})f(\eta)d\eta=\frac{1}{\delta_{j}}\int_{-\infty}^{\infty}(\tilde{\eta}_{ij}-\eta-\delta_{0j})f(\eta)d\eta
=1δj(η~ijμFδ0j),\displaystyle=\frac{1}{\delta_{j}}(\tilde{\eta}_{ij}-\mu_{F}-\delta_{0j}),

where μF=yf(y)𝑑y\mu_{F}=\int yf(y)dy is a parameter that depends on FF only.

The variance is given by

var(Ypi)\displaystyle\operatorname{var}(Y_{pi}) =(yη~ijμFδ0jδj)2f(η~ijδ0jδjy)δj𝑑y=(ημFδj)2f(η)𝑑η\displaystyle=\int(y-\frac{\tilde{\eta}_{ij}-\mu_{F}-\delta_{0j}}{\delta_{j}})^{2}f(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}y)\delta_{j}dy=\int(\frac{\eta-\mu_{F}}{\delta_{j}})^{2}f(\eta)d\eta
=varF/δj2,\displaystyle=\operatorname{var}_{F}/{\delta_{j}}^{2},

where varF=(ημ)2f(η)𝑑η\operatorname{var}_{F}=\int({\eta-\mu})^{2}f(\eta)d\eta.

Proposition 7.2

Let the threshold function have the form δj(y)=δ0j+δjg(y)\delta_{j}(y)=\delta_{0j}+\delta_{j}g(y), δj0\delta_{j}\geq 0. Then, one obtains

E(g(Yij))=1δj(β0j+𝒛ijT𝒃i+𝒙ijT𝜷j),var(g(Yij))=σF2δj2,\displaystyle\operatorname{E}(g(Y_{ij}))=\frac{1}{\delta_{j}}(\beta_{0j}+\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}),\quad\operatorname{var}(g(Y_{ij}))=\frac{\sigma_{F}^{2}}{\delta_{j}^{2}},

Proof: The mean is given by

E(g(y))=g(y)f(η~ijδ0jδjg(y))δjg(y)𝑑y\displaystyle\operatorname{E}(g(y))=\int g(y)f(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}g(y))\delta_{j}g^{{}^{\prime}}(y)dy

With η=η~ijδ0jδjg(y)\eta=\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}g(y) one obtains

E(g(y))=ηη~ij+δ0jδjf(η)𝑑η=η~ijδ0jμFδj=η~ij+β0jδj,\displaystyle\operatorname{E}(g(y))=-\int\frac{\eta-\tilde{\eta}_{ij}+\delta_{0j}}{\delta_{j}}f(\eta)d\eta=\frac{\tilde{\eta}_{ij}-\delta_{0j}-\mu_{F}}{\delta_{j}}=\frac{\tilde{\eta}_{ij}+\beta_{0j}}{\delta_{j}},

since β0j=δ0jμF\beta_{0j}=\delta_{0j}-\mu_{F}.

The variance is given by

var(g(y))\displaystyle\operatorname{var}(g(y)) =(g(y)η~ijδ0jμFδj))2f(η~ijδ0jδjg(y))δjg(y)dy\displaystyle=\int(g(y)-\frac{\tilde{\eta}_{ij}-\delta_{0j}-\mu_{F}}{\delta_{j}}))^{2}f(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}g(y))\delta_{j}g^{{}^{\prime}}(y)dy
=(ημFδj)2f(η)𝑑η=varF/δj2.\displaystyle=\int(\frac{\eta-\mu_{F}}{\delta_{j}})^{2}f(\eta)d\eta=\operatorname{var}_{F}/{\delta_{j}}^{2}.

References

  • Anderson and Aitkin (1985) Anderson, D. A. and M. Aitkin (1985). Variance component models with binary response: Interviewer variability. Journal of the Royal Statistical Society Series B 47, 203–210.
  • Bock and Aitkin (1981) Bock, R. D. and M. Aitkin (1981). Marginal maximum likelihood estimation of item parameters: An application of an EM algorithm. Psychometrika 46, 443–459.
  • Bonat et al. (2019) Bonat, W. H., R. R. Petterle, J. Hinde, and C. G. Demétrio (2019). Flexible quasi-beta regression models for continuous bounded data. Statistical Modelling 19(6), 617–633.
  • Bonat et al. (2015) Bonat, W. H., P. J. Ribeiro Jr, and W. M. Zeviani (2015). Likelihood analysis for a class of beta mixed models. Journal of Applied Statistics 42(2), 252–266.
  • Brooks et al. (2017) Brooks, M. E., K. Kristensen, K. J. Van Benthem, A. Magnusson, C. W. Berg, A. Nielsen, H. J. Skaug, M. Machler, and B. M. Bolker (2017). glmmtmb balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. The R journal 9(2), 378–400.
  • Eilers and Marx (2021) Eilers, P. H. and B. D. Marx (2021). Practical Smoothing: The Joys of P-splines. Cambridge University Press.
  • Fahrmeir et al. (2011) Fahrmeir, L., T. Kneib, S. Lang, and B. Marx (2011). Regression. Models, Methods and Applications. Berlin: Springer Verlag.
  • Forthmann et al. (2020) Forthmann, B., D. Gühne, and P. Doebler (2020). Revisiting dispersion in count data item response theory models: The Conway–Maxwell–Poisson counts model. British Journal of Mathematical and Statistical Psychology 73(1), 32–50.
  • Gertheiss and Tutz (2023) Gertheiss, J. and G. Tutz (2023). Regularization and predictor selection for ordinal and categorical data. In Trends and Challenges in Categorical Data Analysis: Statistical Modelling and Interpretation, pp.  199–232. Springer.
  • Gueorguieva (2001) Gueorguieva, R. (2001). A multivariate generalized linear mixed model for joint modelling of clustered outcomes in the exponential family. Statistical modelling 1(3), 177–193.
  • Hartzel et al. (2001) Hartzel, J., I. Liu, and A. Agresti (2001). Describing heterogenous effects in stratified ordinal contingency tables, with applications to multi-center clinical trials. Computational Statistics & Data Analysis 35(4), 429–449.
  • Harville and Mee (1984) Harville, D. A. and R. W. Mee (1984). A mixed-model procedure for analyzing ordered categorical data. Biometrics 40, 393–408.
  • Hastie et al. (2009) Hastie, T., R. Tibshirani, and J. H. Friedman (2009). The Elements of Statistical Learning (Second Edition). New York: Springer-Verlag.
  • Hedeker and Gibbons (1994) Hedeker, D. and R. B. Gibbons (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics 50, 933–944.
  • Hsiao (1986) Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
  • Hsieh et al. (2006) Hsieh, F., Y.-K. Tseng, and J.-L. Wang (2006). Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics 62(4), 1037–1043.
  • Ivanova et al. (2016) Ivanova, A., G. Molenberghs, and G. Verbeke (2016). Mixed models approaches for joint modeling of different types of responses. Journal of Biopharmaceutical Statistics 26(4), 601–618.
  • Jansen (1990) Jansen, J. (1990). On the statistical analysis of ordinal data when extravariation is present. Applied Statistics 39, 74–85.
  • Jiang and Nguyen (2007) Jiang, J. and T. Nguyen (2007). Linear and generalized linear mixed models and their applications, Volume 1. Springer.
  • Jones (1993) Jones, R. H. (1993). Longitudinal Data with Serial Correlation: A State-Space Approach. London: Chapman & Hall.
  • Kieschnick and McCullough (2003) Kieschnick, R. and B. D. McCullough (2003). Regression analysis of variates observed on (0, 1): percentages, proportions and fractions. Statistical modelling 3(3), 193–213.
  • Lindsey (1993) Lindsey, J. J. (1993). Models for Repeated Measurements. Oxford: Oxford University Press.
  • Liu and Pierce (1994) Liu, Q. and D. A. Pierce (1994). A note on Gauss-Hermite quadrature. Biometrika 81, 624–629.
  • Loeys et al. (2011) Loeys, T., Y. Rosseel, and K. Baten (2011). A joint modeling approach for reaction time and accuracy in psycholinguistic experiments. Psychometrika 76, 487–503.
  • McCulloch and Searle (2001) McCulloch, C. and S. Searle (2001). Generalized, Linear, and Mixed Models. New York: Wiley.
  • Meier et al. (2008) Meier, L., S. van de Geer, and P. Bühlmann (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society, Series B 70, 53–71.
  • Molenberghs et al. (2007) Molenberghs, G., G. Verbeke, and C. G. Demétrio (2007). An extended random-effects approach to modeling repeated, overdispersed count data. Lifetime data analysis 13, 513–531.
  • Oelker and Tutz (2017) Oelker, M. and G. Tutz (2017). A uniform framework for the combination of penalties in generalized structured models. Advances in Data Analysis and Classification 11, 97–120.
  • Qiu et al. (2008) Qiu, Z., P. X.-K. Song, and M. Tan (2008). Simplex mixed-effects models for longitudinal proportional data. Scandinavian Journal of Statistics 35(4), 577–596.
  • Rodríguez (2008) Rodríguez, G. (2008). Multilevel generalized linear models. In Handbook of multilevel analysis, pp.  335–376. Springer.
  • Schauberger (2024) Schauberger, G. (2024). Package multordrs.
  • Tempelman and Gianola (1996) Tempelman, R. J. and D. Gianola (1996). A mixed effects model for overdispersed count data in animal breeding. Biometrics, 265–279.
  • Tibshirani (1996) Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58, 267–288.
  • Tutz (2022) Tutz, G. (2022). Item response thresholds models: A general class of models for varying types of items. Psychometrika (87), 1238–1269.
  • Tutz and Gertheiss (2016) Tutz, G. and J. Gertheiss (2016). Regularized regression for categorical data. Statistical Modelling 16, 161–200.
  • Tutz and Hennevogl (1996) Tutz, G. and W. Hennevogl (1996). Random effects in ordinal regression models. Computational Statistics and Data Analysis 22, 537–557.
  • Yuan and Lin (2006) Yuan, M. and Y. Lin (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B 68, 49–67.
  • Zhang et al. (2017) Zhang, X., H. Mallick, Z. Tang, L. Zhang, X. Cui, A. K. Benson, and N. Yi (2017). Negative binomial mixed models for analyzing microbiome count data. BMC bioinformatics 18, 1–10.