A General Framework for Random Effects Models for Binary, Ordinal, Count Type and Continuous Dependent Variables Including Variable Selection

Gerhard Tutz et al
Ludwig-Maximilians-Universität München
Akademiestraße 1, 80799 München

Abstract

A general random effects model is proposed that allows for continuous as well as discrete distributions of the responses. Responses can be unrestricted continuous, bounded continuous, binary, ordered categorical or given in the form of counts. The distribution of the responses is not restricted to exponential families, which is a severe restriction in generalized mixed models. Generalized mixed models use fixed distributions for responses, for example the Poisson distribution in count data, which has the disadvantage of not accounting for overdispersion. By using a response function and a thresholds function the proposed mixed thresholds model can account for a variety of alternative distributions that often show better fits than fixed distributions used within the generalized linear model framework. A particular strength of the model is that it provides a tool for joint modeling, responses may be of different types, some can be discrete, others continuous. In addition to introducing the mixed thresholds model parameter sparsity is addressed. Random effects models can contain a large number of parameters, in particular if effects have to be assumed as measurement-specific. Methods to obtain sparser representations are proposed and illustrated. The methods are shown to work in the thresholds model but could also be adapted to other modeling approaches.

Keywords: Random effects models; joint modeling; count data; bounded continuous data; ordinal data.

1 Introduction

Random effects models are a strong tool to model the heterogeneity of clustered responses. By postulating the existence of unobserved latent variables, the so-called random effects, which are shared by the measurement within a cluster, correlation between the measurements within clusters is introduced. The clusters or units can refer to persons in repeated measurement trials or to larger units as, for example, schools with measurements referring to performance scores of students.

Detailed expositions of linear mixed models, which are typically used for continuous dependent variables, are found in Hsiao (1986), Lindsey (1993), and Jones (1993). Models for binary variables and counts are often discusses within the framework of generalized mixed models, see, for example, McCulloch and Searle (2001). Random effects models for ordinal dependent variables were considered by Harville and Mee (1984), Jansen (1990), Tutz and Hennevogl (1996) and Hartzel et al. (2001). Mixed model versions for continuous bounded data in the form of rates and proportions that take values in the interval $(0,1)$ , have been considered by Qiu et al. (2008) based on the simplex model and by Bonat et al. (2015) who propagate beta distribution models. Several R packages are available to fit generalized mixed linaer models, for example, glmmTMB (Brooks et al., 2017) for various continuous and discrete distributions, lme4 for continuous and binary data, ordinal and MultOrdRS (Schauberger, 2024) for ordinal data.

Generalized mixed models within the generalized linear model framework as well as extended approaches as the models for bounded continuous data postulate familiar fixed distributions for the responses. This is different in the approach propagated here although there is some overlap with generalized linear models. The mixed thresholds model used here gains its flexibility concerning distributional assumptions by using two components, a response function, which is a distribution function, and a thresholds function that modifies the distribution. In the simplest case, by assuming a linear thresholds function, the distribution of the responses follows the response function. Thus, familiar linear Gaussian response models are obtained but also linear models with quite different, possibly skewed distribution functions are available. Distributions with a restricted support, for example if responses are observed in an interval or are positive only, are obtained by using non-linear thresholds functions. They are also useful when modeling discrete data, which typically are restricted to a specific range, for example, count data take only values $0,1,\dots$ and ordered categorical responses can be coded by $0,1,\dots,k$ , where numbers only represent the order of values. In the case of binary and ordered categorical responses cumulative generalized mixed model is a special case of the proposed thresholds model.

More concrete, let $y_{i1,\dots,}y_{im}$ denote the observations on unit $i$ ( $i=1,\dots,n$ ), which can be continuous or discrete. In addition, let $\boldsymbol{x}_{ij},\boldsymbol{z}_{ij}$ denote covariates associated with response $y_{ij}$ . Then, the Mixed Thresholds Model (MTM) has the form

\displaystyle P(Y_{ij}>y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(y))),

(1)

where $F(.)$ is a strictly increasing distribution function and $\delta_{j}(.))$ is a non-decreasing measurement-specific function defined on the support $S$ of the dependent variables, referred to as thresholds function. The predictor $\eta_{ij}=\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(y))$ contains two components linked to explanatory variables.

The term $\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}$ contains the cluster-specific effects $\boldsymbol{b}_{i}$ . They are assumed to vary independently across clusters and are assumed to follow a specific distribution, typically the normal distribution, $\boldsymbol{b}_{i}\thicksim N(\boldsymbol{0},\boldsymbol{\Sigma})$ .
The term $\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}$ contains the effects of $\boldsymbol{x}_{ij}$ on the dependent variable. The parameters ${\boldsymbol{\beta}}_{j}$ are fixed population-specific parameters.

The distribution of the dependent variables is crucially determined by the choice of the distribution function $F(.)$ , also referred to as response function, and the thresholds function $\delta_{j}(.)$ . Specific choices yield models that are in common use in random effects modeling. Other choices widen the toolbox yielding models that show better fits than classical approaches. Thresholds models have been considered before in the form of item thresholds model (Tutz, 2022), which are latent trait models that aim at measurement and do not contain any covariates. The objective of the models considered here is quite different. The focus is on the effect of covariates on responses, with covariates that can take quite dfferent forms. They can be measurement-specific, unit-specific or both, which raises problems, in particular with the possible number of parameters involved.

The role the response and the thresholds function play in modeling the response distribution will become obvious when considering specific choices. In Section 2 the case of continuous responses is considered with linear and non-linear thresholds functions. Section 3 is devoted to discrete data with infinite and finite support. Joint modeling, which allows for different types of responses, in particular a mixture of continuous and discrete responses, are considered in Section 4. Marginal likelihood estimation methods are given in Section 5. In Section 6 the problem of obtaining sparser representations is addressed. Several small examples are used to demonstrate the versatility of the approach. They are meant for illustration, no in-depth investigation of effects is given.

2 Continuous Dependent Variables

We start with models that contain linear thresholds functions. The model class comprises the classical normal response model but allows for alternative distributions. Then we consider models for restricted support, which call for non-linear thresholds functions.

2.1 Linear Random Effects Models for Gaussian Data and Other Distributions

Let the dependent variables be continuous with support $\mathbb{R}$ . Then a thresholds function that is simple but already yields very flexible models is the linear thresholds function

\delta_{j}(y)=\delta_{0j}+\delta_{j}y,\quad\delta_{j}>0.

Let $F(.)$ denote a fixed, typically standardized, distribution function with support $\mathbb{R}$ , for example the standardized normal distribution function. Then, the means $\mu_{ij}=\operatorname{E}(Y_{ij})$ and variances $\sigma_{ij}^{2}=\operatorname{var}(Y_{ij})$ of dependent variables have a very simple form,

\displaystyle\mu_{ij}=\frac{1}{\delta_{j}}(\beta_{0j}+\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}),\quad\sigma_{ij}^{2}=\frac{\sigma_{F}^{2}}{\delta_{j}^{2}},

(2)

where $\beta_{0j}=-\mu_{F}-\delta_{0j}$ , and $\mu_{F},\sigma_{F}^{2}$ are constants that are determined by the distribution function $F(.)$ . More concrete, $\mu_{F}=\int yf(y)dy$ is the expectation corresponding to distribution function $F(.)$ and $\sigma_{F}^{2}=\operatorname{var}_{F}=\int(y-\mu_{F})^{2}f(y)dy$ the corresponding variance; for a proof see Proposition 7.1.

It is seen that the means of dependent variables are simple linear functions of $\boldsymbol{z}_{ij},\boldsymbol{x}_{ij}$ and the variances vary across measurements. It is a linear model but not necessarily for Gaussian data. Responses can take any strictly increasing distribution function. The model parameterizes the mean as a linear function of covariates also if responses follow a skewed distribution, which might be more appropriate in applications.

The distribution of responses is easily derived since the density $f_{ij}(.)$ of variable $Y_{ij}$ is given by

\displaystyle f_{ij}(y)=f(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{0j}-\delta_{j}y)\delta_{j},

(3)

where $f(.)$ denotes the density linked to $F(.)$ , $f(y)=\partial F(y)/{\partial y}$ . It means, in particular, that for symmetric distribution functions $F(.)$ the distributions of all dependent variables are scaled and shifted versions of the distribution specified by $F(.)$ . If $F(.)$ is not symmetric the distributions of all dependent variables are scaled and shifted versions of the distribution function $\tilde{F}(y)=1-F(-y)$ . This is easily seen by considering the distribution function of $Y_{ij}$ , which is given by $P(Y_{ij}\leq y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=1-P(Y_{ij}>y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})$ .

A simplified version is the homogeneous GMT model, in which $\delta_{j}$ does not depend on the measurement, that is $\delta_{1}=\dots=\delta_{m}=\delta$ . Then, the variance is the same for all observations and the mean and variance have the simple form

\displaystyle\mu_{ij}=\tilde{\beta}_{0j}+\boldsymbol{z}_{ij}^{T}\tilde{\boldsymbol{b}}_{i}+\boldsymbol{x}_{ij}^{T}\tilde{\boldsymbol{\beta}}_{j},\quad\sigma_{ij}^{2}={\tilde{\sigma}_{F}^{2}},

(4)

where $\tilde{\beta}_{0j},\tilde{\boldsymbol{b}}_{i},,\tilde{\boldsymbol{\beta}}_{j},\tilde{\sigma}_{F}^{2}$ are the original parameters divided by $\delta$ .

A special case of the GMTM is the classical linear random effects model with normally distributed dependent variables. Let $F(.)$ denote the standardized normal distribution function and assume that dispersion homogeneity holds ( $\delta_{1}=\dots=\delta_{m}=\delta$ ). Then, the dependent variables are normally distributed with means and variance given by (4). A more familiar representation of the model is the vector-valued representation

\boldsymbol{y}_{i}={\boldsymbol{\beta}}_{0}+\boldsymbol{X}_{i}{\boldsymbol{\beta}}+\boldsymbol{Z}_{i}\boldsymbol{b}_{i}+{\boldsymbol{\varepsilon}}_{i},\boldsymbol{b}_{i}\sim N(\boldsymbol{0},\boldsymbol{\mathit{\Sigma}}),{\boldsymbol{\varepsilon}}_{i}\sim N(\boldsymbol{0},\tilde{\sigma}_{F}^{2}\boldsymbol{I})

where $\boldsymbol{y}_{i}^{T}=(y_{i1},\dots,y_{im})$ , the matrices $\boldsymbol{X}_{i},\boldsymbol{Z}_{i}$ are composed from the vectors $\boldsymbol{z}_{ij},\boldsymbol{x}_{ij}$ and $\boldsymbol{I}$ denotes the unit matrix.

The more general model with varying dispersion parameters $\delta_{j}$ is an extension of this classical model. It is more flexible and can be more appropriate, in particular when repeated measurements on a unit are time-dependent and the variance changes over time.

Within the MTM framework there is no need to assume that dependent variable are normally distributed. Any strictly monotone distribution function $F(.)$ can be used in the model. With linear thresholds function $\delta_{j}(.)$ one obtains a linear form of the expectation and simple terms for the variance. In particular skewed distribution can be used. This extends the usual normal distribution approach to modeling clustered data to a wider class of models with a simple link between covariates and measurements.

Rent data

For illustration we use the Munich rent index data. The variables are rent (monthly rent in Euros), floor (floor space), rooms (number of rooms), age in years. There are 25 districts (residential areas) which have an effect on rents and are modeled as random effects. For an extensive description of the rent data see Fahrmeir et al. (2011). The rent is the dependent variable, floor space, number of rooms and age are explanatory variables.

Monthly rents tend to have a right-skewed distribution since there are typically some houses that are much more expensive than the average house. Thus, the normal distribution might not be the best choice for modeling this kind of data. A candidate for a right right-skewed distribution is the Gumbel distribution $F(y)=\exp(-\exp(-y))$ , which is the distribution of responses if one chooses the Gompertz distribution $F(y)=1-\exp(-\exp(y))$ as response function $F(.)$ . As the log-likelihoods in Table 1 show the assumption of the Gumbel distribution for responses (F(.) is chosen as the Gompertz distribution) yields better fit. The Gompertz distribution as a left-skewed distribution for responses (if $F(.)$ is chosen as the Gumbel distribution function ) yields much worse fit and is not shown.

Table 1: Parameter estimates for self esteem dat

response distribution	Floor	Rooms	Age	std mixture	log-lik
Normal	0.073	-0.593	-0.010	0.244	-3389.992
stderr	( 0.004)	(0.087)	(0.002)
Gumbel	0.050	-0.280	-0.012	0.331	-3366.059
stderr	( 0.004)	(0.099)	(0.002)

2.2 Random Effects Models for Positive-Valued Variables

In many applications the dependent variable can take only positive values, for example if responses are response times. Although often used, the assumption of a normal distribution or any other distribution with support $\mathbb{R}$ is not warranted and will only yield a crude approximation to the true distribution.

In the MTM the support of the dependent variable can be restricted by using an appropriate thresholds function $\delta_{j}(.)$ . If the difficulty function is chosen such that $\lim_{y\rightarrow 0}\delta_{i}(y)=-\infty$ holds the responses automatically has positive values, $y\geq 0$ . One candidate that can be chosen is the logarithmic thresholds function

\delta_{j}(y)=\delta_{0j}+\delta_{j}\log(y).

Thresholds functions of this type combine linearity with a transformation function. The general form of thresholds functions of this type, which are used throughout the paper, is

\displaystyle\delta_{j}(y)=\delta_{0j}+\delta_{j}g(y),

(5)

where $g(.)$ is a non-decreasing function. Thresholds functions, of this form are simply named after the transformation function $g(.)$ .

If the thresholds function is logarithmic a familiar distribution is found if $F(.)$ is chosen as the standard normal distribution function. Then, one can derive that the density of $y_{ij}$ denoted by $f_{ij}(.)$ is given by

\displaystyle f_{ij}(y)=

\displaystyle\frac{1}{\bar{\sigma}_{j}y\sqrt{2\pi}}\exp(\frac{-(\log(y)-\tilde{\mu}_{ij})^{2}}{\bar{\sigma}_{j}^{2}})

(6)

where $\tilde{\mu}_{ij}=\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{0j}\tilde{\eta}_{ij}\delta_{j}$ , $\bar{\sigma}_{j}=1/\delta_{j}$ . This is the lognormal distribution with parameters $\bar{\mu}_{pi},\bar{\sigma}_{i}$ . Thus the logarithmic thresholds function generates a random effects model in which dependent variables follow a lognormal distribution.

Sleep Data

In a sleep deprivation study the average reaction time per day for subjects has been measured (data set sleepstudy from package lme4). On day 1 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject. The 10 days represent the repeated measurements on 18 persons.

Instead of using a normal distribution model with linear effect of days a thresholds model with normal response function and logarithmic thresholds function is used. In the model

\displaystyle P(Y_{ij}>y|b_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(b_{i}-\delta_{0j}-\delta_{j}\log(y)),

the random effect $b_{i}$ refers to the person and $\delta_{0j}$ accounts for the effect of days. It is not assumed that the mean is a linear function of days as is common in typical random effects models. Instead, the basic variation of responses over repeated measurements is captured in the parameters $\delta_{0j}$ . The parameters $\delta_{j}$ account for possible heterogeneity of variances. The $\log(y)$ function, which makes the dependent variables follow a log-normal distribution shows slightly better fit than the common normal distribution model. The log-likelihood was -872.96 with the $\log(y)$ function and -875.50 for the identity function (Gaussian distribution). Figure 1 shows the fitted densities for days 1,3,5,9 for $b_{i}=0$ and $b_{i}=1$ . It is seen that the distributions have quite different forms and variances vary across days. The mean reaction time increases as well as the variances increase over days of sleep deprivation .

Refer to caption — Figure 1: Reaction time for days 1,3,5,9 of sleep deprivation for $b_{i}=0$ (left) and $b_{i}=1$ (right).

2.3 Random Effects Models for Continuous Bounded Data

Various regression model have been proposed for continuous bounded data in the form of rates and proportions that take values in the interval $(0,1)$ , see, for example, Kieschnick and McCullough (2003), Bonat et al. (2019). Also mixed model versions for repeated measurement have been developed, in particular the simplex mixed model (Qiu et al., 2008) and beta mixed models (Bonat et al., 2015).

Let us more generally consider the case where $Y_{ij}\in(a,b)$ . The restriction of responses to the interval is obtained within the thresholds model framework by choosing thresholds functions for which $\lim_{y\rightarrow a}\delta_{i}(y)=-\infty$ and $\lim_{y\rightarrow b}\delta_{i}(y)=\infty$ hold since then $P(Y_{ij}>a)\rightarrow 1$ and $P(Y_{ij}>b)\rightarrow 0$ . A thresholds function that meets these demands is, for example, the logit thresholds function

\delta_{j}(y)=\delta_{0j}+\delta_{j}\log((y-a)/(b-y)).

Instead of $g(.)=\log((y-a)/(b-y)$ any inverse distribution function can be used. The logistic distribution function is just one option, which yields the logit function with $g(y)=\log((y-a)/(b-y))$ .

While simple terms for means and variances of the dependent variable can be found only for simple thresholds functions as the linear one it is straightforward to show that for any (non-decreasing) transformation function $g(y)$ means and expectations of the transformed variables $g(Y_{ij})$ are given by

\displaystyle\operatorname{E}(g(Y_{ij}))=\frac{1}{\delta_{j}}(\beta_{0j}+\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}),\quad\operatorname{var}(g(Y_{ij}))=\frac{\sigma_{F}^{2}}{\delta_{j}^{2}},

(7)

see Proposition 7.2 for a proof. That means the mean of the responses is a linear function of covariates and variances can vary over measurements.

To illustrate the restriction to intervals generated by properly chosen thresholds functions Figure 2 shows the obtained distributions if $F(.)$ is the normal distribution, the predictor is $\eta_{ij}=b_{i}+x_{i}\beta$ with binary predictor $x_{i}$ , $\beta=1$ and $\delta_{j}(y)=\log((y-a)/(b-y)),a=0,b=10$ . In the left picture $b_{i}=0$ , in the right picture $b_{i}=0.5$ . The drawn line shows the density for $x_{i}=0$ , the dashed line for $x_{i}=1$ . It is seen that the logit type distribution ensures that the support of the distribution is $(0,1)$ . For larger random effect (right picture) the density becomes larger close to the upper boundary.

3 Discrete Data

Discrete data come in two forms, with infinite support and finite support. We will consider first the case of infinite support than the case where the response is in categories. In the latter typically only ordinal scale level is assumed for the dependent variable.

3.1 Random Effects Models for Count Data

Let the responses be counts, that is, $Y_{ij}$ takes values from $\{0,1,\dots\}$ . If responses are assumed to follow a Poisson distribution mixed models can be formulated within the generalized linear model framework. Extended versions that are able to account for overdispersion as the negative binomial model have been considered by Tempelman and Gianola (1996), Molenberghs et al. (2007), Zhang et al. (2017). In psychometrics also the Conway-Maxwell-Poisson model has been used albeit without covariates (Forthmann et al., 2020).

In the mixed thresholds model the discrete density or mass function is given by

	$\displaystyle f_{ij}(0)$	$\displaystyle=1-P(Y_{pi}>0)=1-F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(0))),$
	$\displaystyle f_{ij}(r)$	$\displaystyle=P(Y_{ij}>r-1)-P(Y_{ij}>r)=$
		$\displaystyle=F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(r-1)))-F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(r))),r=1,2,\dots$

A thresholds function that ensures that the responses have support $0,1,\dots$ is the shifted logarithmic thresholds function

\delta_{j}(y)=\delta_{0j}+\delta_{j}\log(1+y).

The resulting model is as flexible as models that account for overdispersion. In particular the varying slopes $\delta_{j}$ make it very flexible and able to account for changes in distributional shape over measurements.

Epileptics Data

The response in the data set epil from R package mass is the number of seizures in a fixed period (four periods considered). As covariates we use age, treatment (1: treatment, 0: placebo) and base (number of seizures at the beginning of the trial). The model uses a logarithmic thresholds function $\delta_{j}(y)=\delta_{j0}+\delta_{j}\log(1+y)$ . As response function $F(.)$ we used again the normal, the Gompertz and the Gumbel distribution. As the following table shows the Gumbel distribution shows much better fit than the normal distribution, the fit of the Gompertz was much worse (not given).

Response distribution	treatment	base	age	log-lik	AIC
Gumbel	-0.534	0.059	0.019	-604.520	1233.040
NV	-0.479	0.048	0.024	-622.706	1269.412

The treatment effect is more pronounced in the Gumbel model yielding the likelihood-ratio test 5.658 as compared to 3.29 if the normal distribution is used. For illustration Figure 3 shows the densities for period 2 and 4 for placebo (left) and treatment (right) when using the Gumbel distribution, The other variables were chosen by $b_{i}=0$ , base=20, age=20, diamonds indicate period four, circles period two. It is seen that treatment distictly reduces the number of seizures.

The thresholds model fits much better than the Poisson model, which yields log-likelihood -651.8071 and AIC 1313.614. It also fits better than the more general negative binomial model, which yielded log-likelihood -609.711. AIC values of the Gumbel thresholds model and the negative binomial model are comparable (AIC for negative binomial model: 1231.423. Fitting of the Poisson and the negative binomial model was done by using the R package glmmTMB (Brooks et al., 2017).

3.2 Random Effects Models for Ordered Responses

Let $Y_{ij}$ take values from $\{1,\dots,k\}$ and assume that categories are ordered. The typical mixed model for this type of data is the cumulative mixed model considered among others by Jansen (1990) and Tutz and Hennevogl (1996) for univariate random effects. It has the form

\displaystyle P(Y_{ij}>r|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\beta_{0r}+\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}})),\quad r=1,\dots,k-1

(8)

with ordered intercepts $\beta_{0r}\geq\beta_{0,r+1}$ for all $r$ . For random intercepts $\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}=b_{i}$ it can be fitted by using the package ordinal .

The model is equivalent to the thresholds model

\displaystyle P(Y_{ij}>r|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{j}(r)))

(9)

with the special choices $\delta_{j}(r)=-\beta_{0r}$ and ${\boldsymbol{\beta}}_{1}=\dots={\boldsymbol{\beta}}_{m}={\boldsymbol{\beta}}$ .

However, the model (9) without restrictions is more general than the simple cumulative model (8). In the simple cumulative model the thresholds $\beta_{01}\geq\dots\geq\beta_{0,k-1}$ do not depend on the measurement $j$ . It is implicitly assumed that only covariates $\boldsymbol{x}_{ij}$ modify the distribution of the dependent variables. This is far too restrictive in many applications. In addition, in model (8) the effect of covariates does not depend on the measurement ( ${\boldsymbol{\beta}}_{1}=\dots={\boldsymbol{\beta}}_{m}={\boldsymbol{\beta}}$ ). This is hardly realistic, in particular if dependent variables refer to different variables as in the fears data example considered in the following.

Since the range of outcomes is bounded similar thresholds functions as in the case of bounded continuous data should be used. We use the logit type function

\delta_{j}(y)=\delta_{0j}+\delta_{j}\log((y-a)/(b-y)).

with $a<1$ but close to 1, $b=k$ . grid or optimization, but very stable over varying values of $a$ .

An advantage of using a threshold function of the form $\delta_{j}(y)=\delta_{0j}+\delta_{j}g(y)$ instead of letting all intercepts vary freely (apart from order restrictions) is that sparser representations are obtained. Only $2m$ parameters are needed instead of $m(k-1)$ , also order restriction problems do not occur.

Fears Data

As an illustrating example, we consider data from the German Longitudinal Election Study (GLES). The data originate from the pre-election survey for the German federal election in 2017 and are concerned with political fears. The participants were asked: “How afraid are you due to the …” - (1) refugee crisis? - (2) global climate change? - (3) international terrorism? - (4) globalization? - (5) use of nuclear energy? The answers were measured on Likert scales from 1 (not afraid at all) to 7 (very afraid).

We fitted a discrete thresholds with logistic response function and logit difficulty function including covariates, gender (1: female; 0: male), standardized age in decades, EastWest (1: Eastern German countries/former GDR, 0: Western German countries/former FRG) and Abitur (High school degree for the admission to the university, 1:yes, 0:no). Table 2 shows the estimates of item parameters. The parameters given are the intercept and the slope of the difficulty function $\delta(y)=\delta_{0j}+\delta_{j}\log((y-a)/(b-y))$ . It is seen that all items show significant covariate effects for at least one of the covariates ( z-values given below estimates). Older respondents tend to be more afraid than younger respondents, in particular concerning terrorism but less concerning climate change. Females have for all items higher fear levels than males, the effects of EastWest are rather mixed, people from the Eastern parts of the country are more afraid of globalization but less afraid of nuclear energy. Higher education seems to reduce the level of fears. The necessity of covariates is also supported by testing. The log-likelihood test that compares the model without covariates to the model with covariates is 88.081on 10 df. Thus, the covariates turn out to be influential if one accounts for the heterogeneity in the population.

The model fits better than the common cumulative model (8), in which thresholds do not depend on $j$ . By default the package ordinal fits models with global covariate effects, that is, ${\boldsymbol{\beta}}_{1}=\dots={\boldsymbol{\beta}}_{k-1}={\boldsymbol{\beta}}$ . By constructing appropriate design matrices it is possible to fit variable-specific covariate effects. The corresponding model has log-likelihood -1710.76 which is much smaller than -1684.871 for the thresholds model. Consequently, in terms of AIC values the thresholds model fits better. AIC value for the model with measurement-specific covariate effects but global thresholds effects is 3475.52 (20 covariate effects, 6 thresholds, variance of mixing distribution). For the thresholds model one obtains 3431.87 (20 covariate effects, 10 difficulty function parameters, variance of mixing distribution).

Table 2: Parameter estimates for the fears data with logit difficulty function, logistic response function, z-values of parameter estimates of covariate parameters are given in the lower part, variable age was standardized.

		Parameters
	Item	intercepts	slopes	Age	Gender	EastWest	Abitur
				measurement-	specific	effects
1	refugee	-0.341	1.653	0.151	0.606	0.392	-1.388
2	climate change	-1.140	1.858	0.013	1.061	-0.594	-0.062
3	terrorism	-2.062	1.795	0.393	1.329	0.321	-1.207
4	globalization	0.716	1.862	0.195	1.020	0.725	-0.719
5	nuclear energy	-0.922	1.641	0.254	0.379	-0.416	-0.300
	Log-lik	-1684.871
					$z-$ values	covariates
1	refugee			0.937	1.883	1.179	-4.002
2	climate change			0.079	3.313	-1.798	-0.186
3	terrorism			2.354	3.935	0.939	-3.468
4	globalization			1.236	3.184	2.206	-2.108
5	nuclear energy			1.565	1.189	-1.253	-0.892
					global	effects
	Log-lik	-1715.82		0.230	1.030	0.290	-0.545

4 Joint Modeling of Different Types of Responses

A strength of the thresholds model is that dependent variables can be of various types. It allows for some of the measurements to be continous while others are binary, ordinal or given as counts. Also the combination of continuous measurements with differing support can be modeled in a joint random effects model. There have been some approaches to joint modeling of different types of responses in specificsettings, see, for example, Ivanova et al. (2016) with a focus on ordinal variables or Loeys et al. (2011), where a joint modeling approach for reaction time and accuracy in psycholinguistic experiments has been proposed. However, no general random effects model that allows to combine different types of responses seems available

The flexibility of thresholds models to account for various types of measurement is due to the general form of the model. Since the same model form, which specifies $P(Y_{ij}>y)$ , applies to different types of measurement it is straightforward to obtain a joint model simply by allowing for different distributions (continuous or discrete) and specifying the thresholds function accordingly. For example, if measurement 1 is continuous and measurement 2 ordered categorical, one can choose for the first measurement the linear or logarithmic thresholds function (depending on the support) and for the second measurement the logistic thresholds function. The common random effects will account for the correlation between measurements without the need for an explicit new concept for the correlation between a continuous and an ordered categorical variable.

Sleep Data

As an illustrating example let us again consider the sleep deprivation data. Instead of using the average reaction time per day in all 10 measurements, the last two measurements were transformed to ordered categorical data. More concrete, the interval $(200,500)$ , which covers the reaction times, has been divided into six equidistant intervals and responses were coded as $1,\dots,6$ according to the responses in intervals. For the first eight measurement the logarithmic threshold function has been chosen, and they are specified as continuous. For the last two measurements the logit thresholds function has been chosen and the measurements are specified as discrete with values $1,\dots,6$ . The fitting of the model with normal response function yielded log-likelihood -768.597, which has to differ from the likelihood of the model with continuous responses considered earlier (-875.50) since now a combination of continuous and discrete distributions is assumed.

However, the variation of reaction times over days remains essentially the same. Figure 4 shows the densities for days 1,3,5 for $b_{i}=0$ (left) and $b_{i}=1$ (right). They are practically the same as the densities given in Figure 1, which shows the densities if all variables are considered as continuous.

5 Estimation

A general form of the threshold function term is given by $\delta_{j}(y))={\boldsymbol{\Phi}}_{j}(y)^{T}\boldsymbol{\delta}_{j}$ , where ${\boldsymbol{\Phi}}_{j}(y)=(\Phi_{j0}(y),\dots,\Phi_{jM}(y))$ is a vector that contains functions of $y$ . If the same threshold function is used for all measurements it can be specified as ${\boldsymbol{\Phi}}_{j}(y)^{T}=(1,g(y))$ . Measurement-specific thresholds functions can be obtained by ${\boldsymbol{\Phi}}_{j}(y)^{T}=(1,g_{j}(y))$ . But also more general vectors of functions can be useful. Using this parameterization, the model has the form

\displaystyle P(Y_{ij}>y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-{\boldsymbol{\Phi}}_{j}(y)^{T}\boldsymbol{\delta}_{j}),

(10)

For continuous measurement $j$ the density is

\displaystyle f_{ij}(y)=f(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{0j}-{\boldsymbol{\Phi}}_{j}(y)^{T}\boldsymbol{\delta}_{j}){\boldsymbol{\Phi}}_{j}^{{}^{\prime}}(y)^{T}\boldsymbol{\delta}_{j},

where ${\boldsymbol{\Phi}}^{{}^{\prime}}_{j}(y)=(\Phi^{{}^{\prime}}_{j0}(y),\dots,\Phi^{{}^{\prime}}_{jM}(y))$ contains the derivatives of the components and $f(.)$ is the density linked to $F(.)$ . If ${\boldsymbol{\Phi}}_{j}(y)^{T}=(1,g(y))$ one obtains the simpler form

\displaystyle f_{ij}(y)=f(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-\delta_{0j}-{\boldsymbol{\Phi}}_{j}(y)^{T}\boldsymbol{\delta}_{j})\delta_{j}g^{{}^{\prime}}_{j}(y).

For discrete data with $Y_{ij}\in\{0,1,\dots\}$ . the discrete density is given by

	$\displaystyle f_{ij}(0)$	$\displaystyle=1-P(Y_{pi}>0)=1-F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-{\boldsymbol{\Phi}}_{j}(0)^{T}\boldsymbol{\delta}_{j}),$
	$\displaystyle f_{ij}(r)$	$\displaystyle=P(Y_{ij}>r-1)-P(Y_{ij}>r)=$
		$\displaystyle=F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-{\boldsymbol{\Phi}}_{j}(r-1)^{T}\boldsymbol{\delta}_{j})-F(\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}-{\boldsymbol{\Phi}}_{j}(r)^{T}\boldsymbol{\delta}_{j}),r=1,2,\dots$

When assuming that random effects are normally distributed with mean zero and covariance matrix $\boldsymbol{\mathit{\Sigma}}$ the marginal log-likelihood has the form

\displaystyle L(\{{\boldsymbol{\beta}}_{j}\},\{\boldsymbol{\delta}_{j}\})=\prod_{i=1}^{n}\int\prod_{j=1}^{m}f_{ij}(y_{ij})f_{0,\boldsymbol{\mathit{\Sigma}}}(\boldsymbol{b}_{i})d\boldsymbol{b}_{i},

where $f_{0,\boldsymbol{\mathit{\Sigma}}}(.)$ is the density of the normal distribution with mean zero and covariance matrix $\boldsymbol{\mathit{\Sigma}}$ . The corresponding log-likelihood is given by

\displaystyle l(\{{\boldsymbol{\beta}}_{j}\},\{\boldsymbol{\delta}_{j}\})=\log(L(\{{\boldsymbol{\beta}}_{j}\},\{\boldsymbol{\delta}_{j}\}))=\sum_{i=1}^{n}\log(\int\prod_{j=1}^{m}f_{ij}(y_{ij})f_{0,\boldsymbol{\mathit{\Sigma}}}(\boldsymbol{b}_{i})d\boldsymbol{b}_{i}),

Maximization of the marginal log-likelihood can be obtained by using Gauss Hermite quadrature, which has been used in generalized mixed models, see, for example, Anderson and Aitkin (1985), Liu and Pierce (1994), Hedeker and Gibbons (1994), Rodríguez (2008), Gueorguieva (2001). An alternatively is the EM, which was considered for mixed models among others by Bock and Aitkin (1981) Anderson and Aitkin (1985). Overviews on inference tools for generalized moxed models are found in Jiang and Nguyen (2007), McCulloch and Searle (2001).

6 Finding Sparser Models

As is seen from the fears data the effects of covariates can vary strongly across measurements. For example, the variable EastWest has positive values for some items but negative values for other items. People living in the Eastern part of Germany tend to show stronger fears concerning globalization and refugees but lesser fears concerning the climate crisis than people living in the Western part of Germany. Thus, the cultural imprint from living in different countries with differing experiences seems to still be present.

The inclusion of measurement-specific fixed effects ${\boldsymbol{\beta}}_{j}$ make the general model considered here very flexible. A possible downside is that the model contains many parameters. Each variable comes with as many parameters as measurements are used. However, not each variable necessarily has effects that vary across measurements. Selecting those variables that have global effects, that is effects that do not vary across measurements, would yield sparser models with an easier interpretation.

One strategy to identify variables with global effect is testing. Likelihood ratio test for hypotheses of the form

\displaystyle H_{0}:\beta_{1j}=\dots=\beta_{mj}\text{ against }H_{1}:\beta_{sj}\neq\gamma_{lj}\text{ for at least one pair }(s,l)

can be used to test if single variables are global. Table 3 shows the results of testing for the fears data. It is seen that all of the variables should be considered as having effects that vary over measurements.

Table 3: Testing if covariates can considered global, one at a time

Age	Gender	EastWest	Abitur	log-likelihood	LR test	df
var	var	var	var	-1684.871
global	var	var	var	-1695.435	21.128	3
var	global	var	var	-1708.562	47.382	3
var	var	global	var	–1704.869	39.996	3
var	var	var	global	-1702.910	36.079	3

An alternative way to obtain sparser representations is to use penalization methods that have been widely used since the advent of the Lasso (Tibshirani, 1996). Instead of maximizing the usual marginal log-likelihood one maximizes the penalized log-likelihood

\displaystyle l_{p}(\{{\boldsymbol{\beta}}_{j}\},\{\boldsymbol{\delta}_{j}\})=l(\{{\boldsymbol{\beta}}_{j},\{\boldsymbol{\delta}_{j}\})-P_{\lambda}(\{{\boldsymbol{\beta}}_{j}\}),

where $l(.)$ is the usual log-likelihood and $P_{\boldsymbol{\lambda}}(\{{\boldsymbol{\beta}}_{j}\})$ is a penalty term depending on a tuning parameter $\lambda$ .

A penalty term that enforces the building of global variables and simultaneously select variables is given by

\displaystyle P_{\lambda}(\{{\boldsymbol{\beta}}_{j}\})=\lambda_{f}(\sum_{s=1}^{m}||{\boldsymbol{\beta}}_{s,\text{dif}}||+\sum_{s=1}^{m}||{\boldsymbol{\beta}}_{.s}||)

where $||.||$ is the $L_{2}$ -norm, ${\boldsymbol{\beta}}_{s,\text{dif}}^{T}=(\beta_{1s}-\beta_{2s},\beta_{2s}-\beta_{3s},\dots)$ contains all differences of effects between pairs of measurement for the $s$ th variable, and ${\boldsymbol{\beta}}_{.s}^{T}=(\beta_{1s}\dots,\beta_{ms})$ is the vector that collects all parameters linked to the $s$ -th variable. It is a penalty term built from two terms. The first term enforces the building of global variables. It is a grouped lasso type penalty that aims at the fusion of parameters. It enforces that whole groups of differences are set to zero with the group referring to differences of effects of single variables. In common grouped lasso penalties as considered by Yuan and Lin (2006) and Meier et al. (2008) groups refer to parameters linked to variables variables, not differences of parameters linked to variables. Fusion penalties have been used in particular to combine effects of adjacent categories when modeling the effects of categorical predictors, see Tutz and Gertheiss (2016), Gertheiss and Tutz (2023) and to enforce selection and shrinkage to global effects, for example, by Oelker and Tutz (2017). The second term is a selection term. It enforces that all parameters linked to specific variables are set to zero.

Figure 5 shows the resulting coefficient paths plotted against $\log(1+\lambda)$ . For growing $\lambda$ coefficients become more similar (global) while the shrinkage toward zero (elimination of variables) sets in rather late. Zero estimates are obtained for very large values of $\lambda$ , well beyond the values that are shown. This is sensible since the variables certainly have an effect on the levels of fear. The last panel shows the negative cross-validation log-likelihood (5-fold cross-validation). It is seen that the smallest levels are obtained if $\log(1+\lambda)$ is slightly above 2. In this range coefficients are still measurement-specific, which supports the testing results.

7 Concluding Remarks

It has been demonstrated that the thresholds model is very versatile and can be adapted to quite different distribution functions. Moreover, it can be used in joint modeling of different responses, which has not been considered in the literature since appropriate models have not been available. A rare exception is the modeling of survival data where joint modeling of survival times and longitudinal data has been considered, see, for example, Hsieh et al. (2006).

We used marginal estimation based on integration methods, but alternative estimation from the toolbox of mixed models could be used. Also alternative regularization methods could be useful when trying to find sparser representations, for an overview on basic regularization method including boosting, see Hastie et al. (2009).

The class of mixed thresholds models could be made even more flexible by letting the data decide which thresholds function is appropriate. In particular, B-splines as considered extensively by Eilers and Marx (2021) could be useful, which has been demonstrated by Tutz (2022) in the item-response setting without covariates.

Appendix

For simplicity, let the model be given in the form

\displaystyle P(Y_{ij}>y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\tilde{\eta}_{ij}-\delta_{j}(y)),

(11)

where $\tilde{\eta}_{ij}=\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}$ and $F(.)$ is a strictly increasing distribution function.

Proposition 7.1

Let the threshold function have the form $\delta_{j}(y)=\delta_{0j}+\delta_{j}y$ , $\delta_{j}\geq 0$ . Then, one obtains for the expectation and the variance

	$\displaystyle\operatorname{E}(Y_{ij})=\mu_{ij}=\frac{1}{\delta_{j}}(\beta_{0j}+\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}),$
	$\displaystyle\operatorname{var}(Y_{ij})=\sigma_{ij}^{2}=\frac{\sigma_{F}^{2}}{\delta_{j}^{2}},$

where $\beta_{0j}=-\mu_{F}-\delta_{0j}$ with $\mu_{F}=\int yf(y)dy$ , $\sigma_{F}^{2}=\operatorname{var}_{F}=\int(y-\mu_{F})^{2}f(y)dy$ , $f(y)=\frac{\partial F(y)}{\partial y}$ .

Proof: For linear item function the thresholds model has the form $P(Y_{ij}>y|\boldsymbol{b}_{i},\boldsymbol{z}_{ij},\boldsymbol{x}_{ij})=F(\tilde{\eta}_{ij}-\delta_{j}(y))$ . The corresponding distribution function is

F_{Y_{ij}}(y)=P(Y_{ij}\leq y)=1-F(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}y),

which has the density

f_{Y_{ij}}(y)=\frac{\partial F_{Y_{ij}}(y)}{\partial y}=f(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}y)\delta_{j}.

Thus, the mean is given by

\operatorname{E}(Y_{ij})=\delta_{j}\int yf(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}y)dy.

With $\eta=\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}y$ and $d\eta/dy=-\delta_{j}$ one obtains

	$\displaystyle\operatorname{E}(Y_{pi})$	$\displaystyle=-\frac{1}{\delta_{j}}\int_{\infty}^{-\infty}(\tilde{\eta}_{ij}-\eta-\delta_{0j})f(\eta)d\eta=\frac{1}{\delta_{j}}\int_{-\infty}^{\infty}(\tilde{\eta}_{ij}-\eta-\delta_{0j})f(\eta)d\eta$
		$\displaystyle=\frac{1}{\delta_{j}}(\tilde{\eta}_{ij}-\mu_{F}-\delta_{0j}),$

where $\mu_{F}=\int yf(y)dy$ is a parameter that depends on $F$ only.

The variance is given by

	$\displaystyle\operatorname{var}(Y_{pi})$	$\displaystyle=\int(y-\frac{\tilde{\eta}_{ij}-\mu_{F}-\delta_{0j}}{\delta_{j}})^{2}f(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}y)\delta_{j}dy=\int(\frac{\eta-\mu_{F}}{\delta_{j}})^{2}f(\eta)d\eta$
		$\displaystyle=\operatorname{var}_{F}/{\delta_{j}}^{2},$

where $\operatorname{var}_{F}=\int({\eta-\mu})^{2}f(\eta)d\eta$ .

Proposition 7.2

Let the threshold function have the form $\delta_{j}(y)=\delta_{0j}+\delta_{j}g(y)$ , $\delta_{j}\geq 0$ . Then, one obtains

\displaystyle\operatorname{E}(g(Y_{ij}))=\frac{1}{\delta_{j}}(\beta_{0j}+\boldsymbol{z}_{ij}^{T}\boldsymbol{b}_{i}+\boldsymbol{x}_{ij}^{T}{\boldsymbol{\beta}}_{j}),\quad\operatorname{var}(g(Y_{ij}))=\frac{\sigma_{F}^{2}}{\delta_{j}^{2}},

Proof: The mean is given by

\displaystyle\operatorname{E}(g(y))=\int g(y)f(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}g(y))\delta_{j}g^{{}^{\prime}}(y)dy

With $\eta=\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}g(y)$ one obtains

\displaystyle\operatorname{E}(g(y))=-\int\frac{\eta-\tilde{\eta}_{ij}+\delta_{0j}}{\delta_{j}}f(\eta)d\eta=\frac{\tilde{\eta}_{ij}-\delta_{0j}-\mu_{F}}{\delta_{j}}=\frac{\tilde{\eta}_{ij}+\beta_{0j}}{\delta_{j}},

since $\beta_{0j}=\delta_{0j}-\mu_{F}$ .

The variance is given by

	$\displaystyle\operatorname{var}(g(y))$	$\displaystyle=\int(g(y)-\frac{\tilde{\eta}_{ij}-\delta_{0j}-\mu_{F}}{\delta_{j}}))^{2}f(\tilde{\eta}_{ij}-\delta_{0j}-\delta_{j}g(y))\delta_{j}g^{{}^{\prime}}(y)dy$
		$\displaystyle=\int(\frac{\eta-\mu_{F}}{\delta_{j}})^{2}f(\eta)d\eta=\operatorname{var}_{F}/{\delta_{j}}^{2}.$

References

Anderson and Aitkin (1985) Anderson, D. A. and M. Aitkin (1985). Variance component models with binary response: Interviewer variability. Journal of the Royal Statistical Society Series B 47, 203–210.
Bock and Aitkin (1981) Bock, R. D. and M. Aitkin (1981). Marginal maximum likelihood estimation of item parameters: An application of an EM algorithm. Psychometrika 46, 443–459.
Bonat et al. (2019) Bonat, W. H., R. R. Petterle, J. Hinde, and C. G. Demétrio (2019). Flexible quasi-beta regression models for continuous bounded data. Statistical Modelling 19(6), 617–633.
Bonat et al. (2015) Bonat, W. H., P. J. Ribeiro Jr, and W. M. Zeviani (2015). Likelihood analysis for a class of beta mixed models. Journal of Applied Statistics 42(2), 252–266.
Brooks et al. (2017) Brooks, M. E., K. Kristensen, K. J. Van Benthem, A. Magnusson, C. W. Berg, A. Nielsen, H. J. Skaug, M. Machler, and B. M. Bolker (2017). glmmtmb balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. The R journal 9(2), 378–400.
Eilers and Marx (2021) Eilers, P. H. and B. D. Marx (2021). Practical Smoothing: The Joys of P-splines. Cambridge University Press.
Fahrmeir et al. (2011) Fahrmeir, L., T. Kneib, S. Lang, and B. Marx (2011). Regression. Models, Methods and Applications. Berlin: Springer Verlag.
Forthmann et al. (2020) Forthmann, B., D. Gühne, and P. Doebler (2020). Revisiting dispersion in count data item response theory models: The Conway–Maxwell–Poisson counts model. British Journal of Mathematical and Statistical Psychology 73(1), 32–50.
Gertheiss and Tutz (2023) Gertheiss, J. and G. Tutz (2023). Regularization and predictor selection for ordinal and categorical data. In Trends and Challenges in Categorical Data Analysis: Statistical Modelling and Interpretation, pp. 199–232. Springer.
Gueorguieva (2001) Gueorguieva, R. (2001). A multivariate generalized linear mixed model for joint modelling of clustered outcomes in the exponential family. Statistical modelling 1(3), 177–193.
Hartzel et al. (2001) Hartzel, J., I. Liu, and A. Agresti (2001). Describing heterogenous effects in stratified ordinal contingency tables, with applications to multi-center clinical trials. Computational Statistics & Data Analysis 35(4), 429–449.
Harville and Mee (1984) Harville, D. A. and R. W. Mee (1984). A mixed-model procedure for analyzing ordered categorical data. Biometrics 40, 393–408.
Hastie et al. (2009) Hastie, T., R. Tibshirani, and J. H. Friedman (2009). The Elements of Statistical Learning (Second Edition). New York: Springer-Verlag.
Hedeker and Gibbons (1994) Hedeker, D. and R. B. Gibbons (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics 50, 933–944.
Hsiao (1986) Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Hsieh et al. (2006) Hsieh, F., Y.-K. Tseng, and J.-L. Wang (2006). Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics 62(4), 1037–1043.
Ivanova et al. (2016) Ivanova, A., G. Molenberghs, and G. Verbeke (2016). Mixed models approaches for joint modeling of different types of responses. Journal of Biopharmaceutical Statistics 26(4), 601–618.
Jansen (1990) Jansen, J. (1990). On the statistical analysis of ordinal data when extravariation is present. Applied Statistics 39, 74–85.
Jiang and Nguyen (2007) Jiang, J. and T. Nguyen (2007). Linear and generalized linear mixed models and their applications, Volume 1. Springer.
Jones (1993) Jones, R. H. (1993). Longitudinal Data with Serial Correlation: A State-Space Approach. London: Chapman & Hall.
Kieschnick and McCullough (2003) Kieschnick, R. and B. D. McCullough (2003). Regression analysis of variates observed on (0, 1): percentages, proportions and fractions. Statistical modelling 3(3), 193–213.
Lindsey (1993) Lindsey, J. J. (1993). Models for Repeated Measurements. Oxford: Oxford University Press.
Liu and Pierce (1994) Liu, Q. and D. A. Pierce (1994). A note on Gauss-Hermite quadrature. Biometrika 81, 624–629.
Loeys et al. (2011) Loeys, T., Y. Rosseel, and K. Baten (2011). A joint modeling approach for reaction time and accuracy in psycholinguistic experiments. Psychometrika 76, 487–503.
McCulloch and Searle (2001) McCulloch, C. and S. Searle (2001). Generalized, Linear, and Mixed Models. New York: Wiley.
Meier et al. (2008) Meier, L., S. van de Geer, and P. Bühlmann (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society, Series B 70, 53–71.
Molenberghs et al. (2007) Molenberghs, G., G. Verbeke, and C. G. Demétrio (2007). An extended random-effects approach to modeling repeated, overdispersed count data. Lifetime data analysis 13, 513–531.
Oelker and Tutz (2017) Oelker, M. and G. Tutz (2017). A uniform framework for the combination of penalties in generalized structured models. Advances in Data Analysis and Classification 11, 97–120.
Qiu et al. (2008) Qiu, Z., P. X.-K. Song, and M. Tan (2008). Simplex mixed-effects models for longitudinal proportional data. Scandinavian Journal of Statistics 35(4), 577–596.
Rodríguez (2008) Rodríguez, G. (2008). Multilevel generalized linear models. In Handbook of multilevel analysis, pp. 335–376. Springer.
Schauberger (2024) Schauberger, G. (2024). Package multordrs.
Tempelman and Gianola (1996) Tempelman, R. J. and D. Gianola (1996). A mixed effects model for overdispersed count data in animal breeding. Biometrics, 265–279.
Tibshirani (1996) Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58, 267–288.
Tutz (2022) Tutz, G. (2022). Item response thresholds models: A general class of models for varying types of items. Psychometrika (87), 1238–1269.
Tutz and Gertheiss (2016) Tutz, G. and J. Gertheiss (2016). Regularized regression for categorical data. Statistical Modelling 16, 161–200.
Tutz and Hennevogl (1996) Tutz, G. and W. Hennevogl (1996). Random effects in ordinal regression models. Computational Statistics and Data Analysis 22, 537–557.
Yuan and Lin (2006) Yuan, M. and Y. Lin (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B 68, 49–67.
Zhang et al. (2017) Zhang, X., H. Mallick, Z. Tang, L. Zhang, X. Cui, A. K. Benson, and N. Yi (2017). Negative binomial mixed models for analyzing microbiome count data. BMC bioinformatics 18, 1–10.