Inference of Random Effects for Linear Mixed-Effects Models with a Fixed Number of Clusters
Abstract
We consider a linear mixed-effects model with a clustered structure, where the parameters are estimated using maximum likelihood (ML) based on possibly unbalanced data. Inference with this model is typically based on asymptotic theory, which assumes that the number of clusters tends to infinity with the sample size. However, when the number of clusters is fixed, classical asymptotic theory developed under a divergent number of clusters is no longer valid and can lead to erroneous conclusions. In this paper, we establish the asymptotic properties of the ML estimators of random-effects parameters under a general setting, which can be applied to conduct valid statistical inference with a fixed number of clusters. Our asymptotic theorems allow both fixed effects and random effects to be misspecified, and the dimensions of both effects to go to infinity with the sample size.
keywords:
1 Introduction
Over the past several decades, linear mixed-effects models have been broadly applied to clustered data [13], longitudinal data [12, 23], spatial data [15], and data arising in many other scientific fields [10, 11], particularly because of their usefulness in modeling data with clustered structures. Model parameters are traditionally estimated via, for example, minimum norm quadratic unbiased estimation, maximum likelihood (ML), and restricted ML (REML); ML and REML estimators are compared in Gumedze and Dunne [6].
Estimating random-effects variances in mixed-effects models is usually more challenging than estimating fixed-effects parameters. Although desired asymptotic properties have been developed for ML and REML estimators of random-effects variances [7, 8, 18], these are mainly obtained under the mathematical device of requiring the number of clusters (denoted as ) to grow to infinity with the sample size (denoted as ) and the numbers of fixed effects and random effects (denoted as and ) to be fixed. In fact, most asymptotic results for likelihood ratio tests and model selection in linear mixed-effects models are established under a similar mathematical device; see Self and Liang [21], Stram and Lee [22], Crainiceanu and Ruppert [4], Pu and Niu [20], Fan and Li [5], and Peng and Lu [19]. However, in many practical situations, we are faced with a small , which does not grow to infinity with . As pointed out by McNeish and Stapleton [16] and Huang [9], data collected in the fields of education or developmental psychology typically have a small number of clusters, corresponding, for example, to classrooms or schools. Unfortunately, to the best of our knowledge, no theoretical justification has been provided for random-effects estimators when is fixed.
As shown by Maas and Hox [14], Bell et al. [1], and McNeish and Stapleton [17], for a linear mixed-effects model with few clusters, random-effects variances are not well estimated by either ML or REML. This is because when is fixed, the Fisher information for random-effects variances fails to grow with , and hence the corresponding ML estimators do not achieve consistency. A similar difficulty arises in a spatial-regression model of Chang et al. [2] under the fixed domain asymptotics, in which the spatial covariance parameters cannot be consistently estimated. A direct impact of this difficulty is that the classical central limit theorem established under for the ML (or REML) estimators [7, 8, 18] is no longer valid. Consequently, statistical inference based on the asymptotic results for can be misleading.
In this article, we focus on the ML estimators in linear mixed-effects models with possibly unbalanced data. We first develop the asymptotic properties of the ML estimators, without assuming that the fixed- and random-effects models are correctly specified, that and are fixed, or that . Based on these asymptotic properties, we provide, for the first time in the mixed-effects models literature, asymptotically valid confidence intervals for random-effects variances when is fixed. In addition, we present an example illustrating that empirical best linear unbiased predictors (BLUPs) of random effects (namely, the BLUPs with the unknown parameters replaced by their ML estimators) compare favorably to least squares (LS) predictors even when the ML estimators are not consistent; see Section 3.1 for details. Also note that our asymptotic theorems allow both the fixed- and random-effects models to be misspecified. Consequently, our results are crucial for facilitating further studies on model selection for linear mixed-effects models with fixed , in which investigating the impact of model misspecification is indispensable.
This article is organized as follows. Section 2 introduces the linear mixed-effects model and the regularity conditions. The asymptotic results for the ML estimators are given in Section 3. Section 4 describes simulation studies that confirm our asymptotic theory, including a comparison between the conventional confidence intervals and the proposed ones for random-effects variances. A brief discussion is given in Section 5. The proofs of all the theoretical results are deferred to the online supplementary material.
2 Linear Mixed-Effects Models
Consider a set of observations with clusters, , where is the response vector, and are and design matrices of and covariates with the -th entries and , respectively, and is the number of observations in cluster . A general linear mixed-effects model can be written as
(1) |
where is the -vector of fixed effects, is the -vector of random effects, , and is the -dimensional identity matrix. Here and are mutually independent. Let , , , and be obtained by stacking , , , and . Also let be the block diagonal matrix with diagonal blocks and dimension , where is the total sample size. Let ; and . Then we can rewrite (1) as
(2) |
where , , and ; .
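To fix ideas, here is a minimal numerical sketch (not from the paper) of simulating balanced data from the stacked form (2): the fixed-effects design and the block-diagonal random-effects design are built cluster by cluster, and the random effects have a diagonal covariance. All dimensions and parameter values below are hypothetical choices for illustration only.

```python
# Simulate clustered data of the form y = X beta + Z b + e, with Z block diagonal.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
m, n_i, p, q = 10, 30, 3, 2            # clusters, cluster size, #fixed, #random effects
beta = np.array([0.5, 1.0, 1.5])       # fixed effects (hypothetical values)
sigma2 = np.array([1.0, 0.8])          # random-effect variances, one per random effect
sigma2_0 = 1.0                         # error variance

X_blocks = [rng.normal(size=(n_i, p)) for _ in range(m)]
Z_blocks = [rng.normal(size=(n_i, q)) for _ in range(m)]
X = np.vstack(X_blocks)                         # N x p fixed-effects design
Z = block_diag(*Z_blocks)                       # N x (m*q) block-diagonal random-effects design
b = np.concatenate([rng.normal(scale=np.sqrt(sigma2)) for _ in range(m)])
e = rng.normal(scale=np.sqrt(sigma2_0), size=m * n_i)
y = X @ beta + Z @ b + e
```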
Let be the set of candidate models with and corresponding to the fixed-effects and random-effects covariates indexed by and , respectively. Then a linear mixed-effects model corresponding to can be written as
(3) |
For , let be the sub-matrix of and be the sub-vector of corresponding to . Then for ,
where is the -th column of and is the parameter vector of ; . In other words, under ,
(4) |
where
(5) |
, and is the -vector of zeros. Here, for notational simplicity, we suppress the dependence of on .
For , let be the dimension of and let be the dimension of . Assume that the true model of is
(6) |
where is the underlying mean trend, is the true value of , , , and for some ; . Similarly, let with being the true values of , for . We say that a fixed-effects model is correct if there exists such that . Similarly, a random-effects model is correct if . Let and denote the sets of all correct fixed-effects and random-effects models, respectively. A linear mixed-effects model is said to be correct if . We denote the smallest correct model by , which satisfies
where and are assumed fixed.
Given a model , the covariance parameters consist of and . We estimate these by ML. We assume that and are of full column rank. The ML estimators and of and based on model can be obtained by minimizing the negative twice profile log-likelihood function:
(7) |
where
(8) |
(9) |
Note that , and
For model , the ML estimator of is given by
(10) |
where satisfies
Then the ML estimator of is
where is the ML estimator of based on model .
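The estimation scheme just described can be sketched numerically as follows: the variance parameters minimize the negative twice profile log-likelihood (with the fixed effects profiled out by generalized least squares), and the ML estimator of the fixed effects is the GLS estimator evaluated at the minimizer. This is a generic illustration of ML for a Gaussian linear mixed model, not the authors' code; it reuses X, Z, y, m, q from the previous sketch, and a naive dense-matrix inverse is used for clarity rather than efficiency.

```python
import numpy as np
from scipy.optimize import minimize

N = y.shape[0]

def neg2_profile_loglik(log_sig2):
    # Variance parameters on the log scale to keep them positive.
    sig2_0, sig2 = np.exp(log_sig2[0]), np.exp(log_sig2[1:])
    G = np.diag(np.tile(sig2, m))                  # Cov(b): diag(sigma^2) repeated per cluster
    V = sig2_0 * np.eye(N) + Z @ G @ Z.T           # marginal covariance of y
    Vinv = np.linalg.inv(V)
    beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # profiled fixed effects
    r = y - X @ beta_gls
    _, logdet = np.linalg.slogdet(V)
    return logdet + r @ Vinv @ r                   # constants dropped

res = minimize(neg2_profile_loglik, x0=np.zeros(1 + q), method="Nelder-Mead")
sig2_0_hat, sig2_hat = np.exp(res.x[0]), np.exp(res.x[1:])
Vhat = sig2_0_hat * np.eye(N) + Z @ np.diag(np.tile(sig2_hat, m)) @ Z.T
beta_hat = np.linalg.solve(X.T @ np.linalg.solve(Vhat, X), X.T @ np.linalg.solve(Vhat, y))
```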
To establish the asymptotic theory for the ML estimators of the parameters in linear mixed-effects models, we impose regularity conditions on covariates of fixed effects and random effects.
- (A0) Let . Assume that and , for some constant , where and .
- (A1) With given in (A0), there exist constants and ; , , with such that for and ,
  where is the -th column of , for and .
- (A2) With given in (A0), there exist constants and ; , , with such that for and ,
- (A3) For , , and ,
  where , , and are given in (A0), (A1), and (A2), respectively.
Condition (A0) allows the numbers of fixed effects and random effects (i.e., and ) to go to infinity with at a certain rate. Conditions (A1)–(A3) impose correlation constraints on and . For example, Condition (A2) implies that the maximum eigenvalue satisfies , which is similar to an assumption given in Condition 3 of Fan and Li [5].
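As a rough numerical illustration of the kind of eigenvalue control involved in (A2) (the displayed bound itself did not survive extraction, so the exact quantity below is an assumption made here for illustration), one can check that for covariates behaving like independent draws, the largest eigenvalue of the normalized Gram matrix of the random-effects covariates stays bounded as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (200, 2000, 20000):
    Zc = rng.normal(size=(n, 5))                        # hypothetical covariate matrix
    lam_max = np.linalg.eigvalsh(Zc.T @ Zc / n).max()   # largest eigenvalue of Z'Z/n
    print(n, round(lam_max, 3))                         # approaches 1 as n grows
```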
3 Asymptotic Properties
In this section, we investigate the asymptotic properties of the ML estimators of and for any . We allow and to go to infinity with the sample size . In addition, as we allow to be fixed, we must account for the fact that may not be estimated consistently.
3.1 Asymptotics under correct specification
In this subsection, we consider a correct (but possibly overfitted) model . We derive not only the convergence rates for the ML estimators of and , but also their asymptotic distributions.
Theorem 1.
When is fixed and , it follows from (4) that does not converge to , for . This is because the data do not contain enough information for . Nevertheless, converges to , for , at a rate , which can be faster than . On the other hand, when , by applying the law of large numbers and the central limit theorem to ; , we immediately have the following corollary.
Corollary 1.
Under the assumptions of Theorem 1, as , for . If, in addition, , then
From Corollary 1, for , we obtain a confidence interval of :
(5) |
where is the -th percentile of the standard normal distribution. Although this confidence interval is commonly applied in practice (e.g., Maas and Hox [14]; McNeish and Stapleton [17]), it is only valid when is large, as detailed in a simulation experiment of Section 4.2. Thanks to Theorem 1, we can derive a confidence interval of valid for a fixed .
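For concreteness, here is a minimal sketch of how a Wald-type interval such as (5) is computed. The standard error is supplied by the user (in practice it would come from the inverse of the observed information; the exact expression in (5) did not survive extraction), and the numbers in the usage comment are purely hypothetical.

```python
from scipy.stats import norm

def wald_ci(sig2_hat, se_hat, alpha=0.05):
    """Conventional normal-theory interval: estimate +/- z * standard error."""
    z = norm.ppf(1 - alpha / 2)
    return sig2_hat - z * se_hat, sig2_hat + z * se_hat

# e.g. wald_ci(0.93, 0.41); such an interval is only trustworthy when the number
# of clusters is large, as the simulations in Section 4.2 illustrate.
```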
Theorem 2.
Under the assumptions of Theorem 1, suppose that is fixed. Then for , a confidence interval of is
(6) |
where denotes the -th percentile of the chi-square distribution on degrees of freedom.
Note that the length of the confidence interval of in (6) does not shrink to as , which is not surprising due to the fact that is not a consistent estimator of when is fixed, for .
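A corresponding sketch of a chi-square-based interval in the spirit of (6) follows. It assumes, for illustration only, that the pivot m * sig2_hat / sig2 is asymptotically chi-square with m degrees of freedom when m is fixed; Theorem 2 gives the exact form, and this specific pivot is an assumption made here. The sketch nonetheless reproduces the key feature that the interval length does not shrink as the sample size grows.

```python
from scipy.stats import chi2

def chisq_ci(sig2_hat, m, alpha=0.05):
    """Chi-square-based interval for a random-effects variance with m clusters (assumed pivot)."""
    lo = m * sig2_hat / chi2.ppf(1 - alpha / 2, df=m)
    hi = m * sig2_hat / chi2.ppf(alpha / 2, df=m)
    return lo, hi

# e.g. chisq_ci(0.93, m=5) is wide, reflecting that the variance estimator is not
# consistent when the number of clusters is fixed.
```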
We close this section by mentioning that although a fixed prevents us from consistently estimating , the empirical BLUPs of the random effects, based on the ML estimator of , are still asymptotically more efficient than the LS predictors, as illustrated in the following example.
Example 1.
Consider model (2) with , , and fixed. Assume that (A2) holds with and . Let be the LS predictor of and be the BLUP of given . Define
Then, we show in Appendix B of the supplementary material that
where and are the ML estimators of and , and is some random variable depending on . Moreover, it is shown in the same appendix that the moments of do not exist for and
(7) |
for . Equation (7) reveals that for any fixed , the empirical BLUP, of , is asymptotically more efficient than its LS counterpart, , even when is not a consistent estimator of . In addition, the advantage of the former over the latter rapidly increases with .
3.2 Asymptotics under misspecification
In this subsection, we consider a misspecified model . We derive not only the convergence rates for and , but also their asymptotic distributions. These results are crucial for developing model selection consistency and efficiency in linear mixed-effects models under fixed ; see Chang et al. [3].
We start by investigating the asymptotic properties for the ML estimators of and for under a misspecified random-effects model.
Theorem 3.
Under the assumptions of Theorem 1, except that ,
(8) |
and
(11) |
where . In addition, if , then
Furthermore, if and , then
Note that in (8) is the dominant bias term for , which is contributed by the non-negligible random effects missed by model . It is asymptotically positive with probability one when . Hence has a non-negligible positive bias when . On the other hand, for or nearly balanced data, the following corollary shows that ; , as , even though is misspecified.
The following theorem presents the asymptotic properties of and for under a misspecified fixed-effects model.
Theorem 4.
Under the assumptions of Theorem 1 except that ,
(12) |
and
(15) |
In addition, if , then
Furthermore, if and , then
Note that in (12) is asymptotically positive with probability one when . Therefore, under the assumptions of Theorem 4, has a non-negligible positive bias when . Nevertheless, is consistent for when , as .
The following theorem establishes the asymptotic properties of and for when both the fixed-effects model and the random-effects model are misspecified.
Theorem 5.
Under the assumptions of Theorem 1 except that ,
(16) |
and
(19) |
where . In addition, if , then
Furthermore, if and , then
Note that in (16) is asymptotically positive with probability one when either or . Therefore, under the assumptions of Theorem 5, has a non-negligible positive bias when either or . Also, we have the following corollary.
Corollary 3.
4 Simulations
We conduct two simulation experiments for linear mixed-effects models. The first one examines estimation of mixed-effects models, and the second concerns confidence intervals.
4.1 Experiment 1
We generate data according to (1) with , , , and , where and are independent, for and . This setup satisfies (A1)–(A3) with and , for and . We consider parameter estimation under two scenarios corresponding to balanced data and unbalanced data. We also consider model selection under balanced data.
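For readers who wish to reproduce a fit of this kind, the following sketch shows one way to generate data in the spirit of Experiment 1 and fit it by ML with statsmodels. It is not the authors' code: the covariate and parameter choices are placeholders, and MixedLM estimates an unstructured random-effects covariance by default rather than the diagonal one assumed in (1), so this is only an approximate illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
m, n_i = 20, 20
groups = np.repeat(np.arange(m), n_i)
X = sm.add_constant(rng.normal(size=(m * n_i, 2)))   # fixed-effects design (with intercept)
Zre = rng.normal(size=(m * n_i, 2))                  # random-effects design
beta = np.array([0.5, 1.0, 1.5])
b = rng.normal(scale=1.0, size=(m, 2))               # cluster-specific random effects
y = X @ beta + np.einsum("ij,ij->i", Zre, b[groups]) + rng.normal(size=m * n_i)

fit = sm.MixedLM(y, X, groups=groups, exog_re=Zre).fit(reml=False)   # ML, not REML
print(fit.fe_params)      # ML estimates of the fixed effects
print(fit.cov_re)         # estimated random-effects covariance
print(fit.scale)          # estimated error variance
```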
For parameter estimation, we first consider balanced data with , , and hence . The ML estimators of and under the full model based on 100 simulated replicates are summarized in Table 1. The ML estimators of and under model with correct fixed effects but misspecified random effects based on 100 simulated replicates are summarized in Table 2. The ML estimators of , and under model with both misspecified fixed and random effects based on 100 simulated replicates are summarized in Table 3.
10 | 0.033 (0.062) | 0.467 (0.281) | 0.994 (0.564) | 1.453 (0.615) | 0.041 (0.073) | 0.850 (0.165)
20 | 0.008 (0.017) | 0.512 (0.194) | 1.028 (0.362) | 1.470 (0.490) | 0.008 (0.014) | 0.983 (0.087)
30 | 0.003 (0.006) | 0.490 (0.116) | 0.994 (0.260) | 1.534 (0.396) | 0.004 (0.007) | 0.989 (0.049)
| 0.000 | 0.500 | 1.000 | 1.500 | 0.000 | 1.000
True | 0.000 | 0.500 | 1.000 | 1.500 | 0.000 | 1.000
10 | 0.151 (0.541) | 0.583 (0.546) | 0.944 (0.572) | 2.363 (1.054)
20 | 0.047 (0.091) | 0.555 (0.285) | 1.050 (0.392) | 2.415 (0.650)
30 | 0.030 (0.070) | 0.521 (0.198) | 0.961 (0.268) | 2.442 (0.666)
| 0.000 | 0.500 | 1.000 | 2.500
True | 0.000 | 0.500 | 1.000 | 1.000
10 | 1.604 (1.073) | 0.176 (0.465) | 3.494 (0.788)
20 | 1.353 (0.567) | 0.043 (0.077) | 3.915 (0.540)
30 | 1.525 (0.427) | 0.030 (0.065) | 3.880 (0.436)
| 1.500 | 0.000 | 3.690
True | 1.500 | 0.000 | 1.000
As seen in Table 1, the ML estimators, and , based on the full model have small biases, except for with . We note that their standard deviations tend to be smaller when is larger. In particular, the standard deviations of and are much smaller than the others, which echoes Theorem 1, which shows that has a faster convergence rate when it converges to zero. For model with misspecified random effects, Table 2 shows that the ML estimator overestimates by about on average, particularly for larger values of , which also complies with Theorem 3. Finally, for model with both fixed and random effects misspecified, Table 3 confirms that is far from its true value and reasonably close to its probability limit, , derived in Theorem 5. In addition, tends to be closer to when is larger, as expected from Theorem 5.
Next, we consider unbalanced data with and . We set , , , and hence . The ML estimators of and under the full model based on 100 simulated replicates are summarized in Table 4. The ML estimators of and under model with correct fixed effects but misspecified random effects based on 100 simulated replicates are summarized in Table 5. The ML estimators of , , and under model with both misspecified fixed and random effects based on 100 simulated replicates are summarized in Table 6. The ML estimators of and based on unbalanced data can be seen to perform similarly to those based on balanced data.
10 | 3 | 32 | 0.017 (0.041) | 0.500 (0.330) | 1.011 (0.602) | 1.414 (0.790) | 0.021 (0.040) | 0.877 (0.154)
20 | 4 | 89 | 0.009 (0.020) | 0.516 (0.175) | 1.029 (0.399) | 1.490 (0.493) | 0.008 (0.014) | 0.974 (0.082)
30 | 5 | 164 | 0.002 (0.005) | 0.497 (0.121) | 1.007 (0.263) | 1.539 (0.374) | 0.004 (0.008) | 0.991 (0.049)
| | | 0.000 | 0.500 | 1.000 | 1.500 | 0.000 | 1.000
True | | | 0.000 | 0.500 | 1.000 | 1.500 | 0.000 | 1.000
10 | 3 | 32 | 0.213 (0.927) | 0.576 (0.696) | 1.175 (1.220) | 2.283 (1.127)
20 | 4 | 89 | 0.044 (0.096) | 0.536 (0.260) | 1.091 (0.461) | 2.456 (0.792)
30 | 5 | 165 | 0.028 (0.079) | 0.500 (0.208) | 0.959 (0.290) | 2.426 (0.645)
| | | 0.000 | 0.500 | 1.000 | 2.500
True | | | 0.000 | 0.500 | 1.000 | 1.000
10 | 3 | 32 | 1.522 (0.902) | 0.065 (0.180) | 3.535 (0.944)
20 | 4 | 89 | 1.362 (0.540) | 0.057 (0.142) | 3.960 (0.716)
30 | 5 | 164 | 1.494 (0.458) | 0.030 (0.068) | 3.892 (0.539)
| | | 1.500 | 0.000 | 3.690
True | | | 1.500 | 0.000 | 1.000
4.2 Experiment 2
In the second experiment, we compare the conventional confidence interval given by (5) with the proposed confidence interval given by (6). Similar to Experiment 1, we generate data according to (1) with , , , and , where and are independent, for and . Here we consider a more challenging situation of dependent covariates. Specifically, we assume that is a matrix with the -th entry , and is a matrix with the -th entry . We consider balanced data with and three numbers of clusters, , resulting in a total of nine different combinations.
We compare the 95% confidence intervals of (5) and (6) for and based on model . The coverage probabilities of the two confidence intervals for various cases, based on 1,000 simulated replicates, are shown in Table 7. The proposed method has better coverage probabilities than the conventional method in almost all cases. The coverage probabilities of our confidence interval tend to the nominal level (i.e., ) as increases in all cases, even when is very small. In contrast, the conventional method tends to be too optimistic for both and . For example, the coverage probabilities are less than when , regardless of . Although the coverage probabilities are a bit closer to the nominal level when is larger, they are still in the range of when , showing that the conventional confidence interval is not valid for small .
| | Classical | | Proposed | |
2 | 10 | 0.651 (0.015) | 0.649 (0.015) | 0.814 (0.012) | 0.763 (0.013)
| 50 | 0.724 (0.014) | 0.703 (0.014) | 0.932 (0.008) | 0.935 (0.008)
| 100 | 0.725 (0.014) | 0.722 (0.014) | 0.942 (0.007) | 0.929 (0.008)
5 | 10 | 0.778 (0.013) | 0.738 (0.014) | 0.895 (0.010) | 0.871 (0.011)
| 50 | 0.809 (0.012) | 0.818 (0.012) | 0.936 (0.008) | 0.937 (0.008)
| 100 | 0.811 (0.012) | 0.809 (0.012) | 0.940 (0.008) | 0.929 (0.008)
10 | 10 | 0.838 (0.012) | 0.816 (0.012) | 0.900 (0.009) | 0.893 (0.010)
| 50 | 0.874 (0.010) | 0.849 (0.011) | 0.952 (0.007) | 0.946 (0.007)
| 100 | 0.849 (0.011) | 0.867 (0.011) | 0.941 (0.007) | 0.956 (0.006)
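The coverage comparison in Table 7 can be mimicked with a small Monte Carlo along the following lines, mirroring the two interval constructions sketched in Section 3.1. The data-generating mechanism and the plug-in quantities are deliberate simplifications (the "estimator" below is the sample variance of the true cluster effects, standing in for the fixed-m behavior of the ML estimator, and the standard-error formula is an assumption), so the output illustrates the qualitative pattern rather than reproducing the table.

```python
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(4)
m, sig2, reps, alpha = 5, 1.0, 5000, 0.05
cover_wald = cover_chi2 = 0
for _ in range(reps):
    b = rng.normal(scale=np.sqrt(sig2), size=m)
    sig2_hat = np.mean(b ** 2)                  # stand-in for the fixed-m ML estimator (assumed)
    se_hat = sig2_hat * np.sqrt(2 / m)          # normal-theory standard error (assumed)
    z = norm.ppf(1 - alpha / 2)
    cover_wald += (sig2_hat - z * se_hat <= sig2 <= sig2_hat + z * se_hat)
    lo = m * sig2_hat / chi2.ppf(1 - alpha / 2, df=m)
    hi = m * sig2_hat / chi2.ppf(alpha / 2, df=m)
    cover_chi2 += (lo <= sig2 <= hi)
print(cover_wald / reps, cover_chi2 / reps)     # the chi-square interval covers near 0.95
```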
5 Discussion
In this article, we establish the asymptotic theory for the ML estimators of the random-effects parameters in linear mixed-effects models with unbalanced data, without assuming that grows to infinity with . We not only allow the dimensions of both the fixed-effects and random-effects models to go to infinity with , but also allow both models to be misspecified. In addition, we provide an asymptotically valid confidence interval for the random-effects parameters when is fixed. These asymptotic results are essential for investigating the asymptotic properties of model-selection methods for linear mixed-effects models, which, to the best of our knowledge, have only been developed under the assumption that .
Although it is common to assume the random effects to be uncorrelated as done in model (1), it is also of interest to consider correlated random effects with no structure imposed on . However, the technique developed in this article may not be directly applicable to the latter situation; further research in this direction is thus warranted.
Conditions (A1) and (A2) assume that the covariates are asymptotically uncorrelated. These restrictions can be relaxed. Here is a simple example.
Lemma 1.
Consider the data generated from (2) with , , , and the true parameters given in (6). Suppose that is the smallest true model and is a misspecified model defined in (4). Let and be the ML estimators of and based on . Assume that (A1)–(A3) hold except that and , for some constants . Then
where is the true parameter of .
From Lemma 1, it is not surprising to see that . On the other hand, tends to overestimate by . Since and , the amount of overestimation is smaller when either or is larger. In contrast, tends to be more upward biased when is larger, since . Lemma 1 demonstrates how the correlations between the two covariates affect the behavior of and . However, when the number of covariates is larger, the ML estimators of and become much more complicated. We leave this extension of Lemma 1 to the general case for future work.
Acknowledgements
The research of Chih-Hao Chang is supported by ROC Ministry of Science and Technology grant MOST 107-2118-M-390-001.
The research of Hsin-Cheng Huang is supported by ROC Ministry of Science and Technology grant MOST 106-2118-M-001-002-MY3.
The research of Ching-Kang Ing is supported by the Science Vanguard Research Program under the Ministry of Science and Technology, Taiwan, ROC.
References
- [1] Bell, B. A., Morgan, G. B., Schoeneberger, J. A., Kromney, J. D. and Ferron, J. M. (2014). How low can you go? Methodology: European Journal of Research Methods for the Behavioral and Social Sciences 10, 1–11.
- [2] Chang, C. H., Huang, H. C. and Ing, C. K. (2017). Mixed domain asymptotics for a stochastic process model with time trend and measurement error. Bernoulli 23, 159–190.
- [3] Chang, C. H., Huang, H. C. and Ing, C. K. (2020). Selection of linear mixed-effects models with a small number of clusters. Submitted.
- [4] Crainiceanu, C. M. and Ruppert, D. (2004). Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society, Series B 66, 165–185.
- [5] Fan, Y. and Li, R. (2012). Variable selection in linear mixed effects models. The Annals of Statistics 40, 2043–2068.
- [6] Gumedze, F. N. and Dunne, T. T. (2011). Parameter estimation and inference in the linear mixed model. Linear Algebra and Its Applications 435, 1920–1944.
- [7] Hartley, H. O. and Rao, J. N. K. (1967). Maximum likelihood estimation for the mixed analysis of variance model. Biometrika 54, 93–108.
- [8] Harville, D. A. (1977). Maximum likelihood approaches to variance components estimation and related problems. Journal of the American Statistical Association 72, 320–338.
- [9] Huang, F. (2018). Using cluster bootstrapping to analyze nested data with a few clusters. Educational and Psychological Measurement 78, 297–318.
- [10] Jiang, J. (2007). Linear and Generalized Linear Mixed Models and Their Applications. Springer, New York.
- [11] Jiang, J. (2017). Asymptotic Analysis of Mixed Effects Models: Theory, Applications, and Open Problems. Springer, New York.
- [12] Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics 38, 963–974.
- [13] Longford, N. T. (1993). Random Coefficient Models. Oxford Statistical Science Series 11. Oxford University Press, New York.
- [14] Maas, C. J. M. and Hox, J. J. (2004). Robustness issues in multilevel regression analysis. Statistica Neerlandica 58, 127–137.
- [15] Mardia, K. V. and Marshall, R. J. (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71, 135–146.
- [16] McNeish, D. and Stapleton, L. M. (2016). Modeling clustered data with very few clusters. Multivariate Behavioral Research 51, 495–518.
- [17] McNeish, D. and Stapleton, L. M. (2016). The effect of small sample size on two-level model estimates: A review and illustration. Educational Psychology Review 28, 295–314.
- [18] Miller, J. J. (1977). Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance. The Annals of Statistics 5, 746–762.
- [19] Peng, H. and Lu, Y. (2012). Model selection in linear mixed effect models. Journal of Multivariate Analysis 109, 109–129.
- [20] Pu, W. and Niu, X. F. (2006). Selecting mixed-effects models based on a generalized information criterion. Journal of Multivariate Analysis 97, 733–758.
- [21] Self, S. G. and Liang, K. Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association 82, 605–610.
- [22] Stram, D. O. and Lee, J. W. (1994). Variance component testing in the longitudinal mixed effects model. Biometrics 50, 1171–1177.
- [23] Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer, New York.
Supplementary Material
The supplementary materials consist of three appendices that prove all the theoretical results except for Theorem 2, whose proof is straightforward and is hence omitted. Appendix A contains auxiliary lemmas that are required in the proofs. Appendix B provides proofs for Example 1 and Theorems 1 and 3–5. Appendix C gives proofs for all the lemmas.
Appendix A Auxiliary Lemmas
We start with the following matrix identities, which will be repeatedly applied:
(A.1) |
(A.2) |
where is an nonsingular matrix, and and are column vectors. Note that (A.2) is applied iteratively to establish the decomposition of the precision matrix , where
(A.3) |
Heuristically speaking, let ; be the -th column of and
(A.4) |
where denotes the -th element of ; . Suppose that . Then by (A.2),
(A.5) |
Applying (A.2) iteratively, we obtain the decomposition
(A.6) |
note that . The proofs of Lemmas 2, 3, and 4 are then based on induction and the decomposition (A.6).
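The displays (A.1)–(A.2) did not survive extraction. Identity (A.2) is described as a rank-one update formula for the inverse of a nonsingular matrix with two column vectors, so as an illustration we verify numerically the classical Sherman–Morrison identity, the standard identity of that form; whether it coincides exactly with (A.2) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
A = rng.normal(size=(n, n)) + n * np.eye(n)      # a well-conditioned nonsingular matrix
u, v = rng.normal(size=n), rng.normal(size=n)
Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + np.outer(u, v))
rhs = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1.0 + v @ Ainv @ u)
print(np.allclose(lhs, rhs))                     # True: (A + uv')^{-1} via a rank-one update
```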
The proofs of theorems in Section 3 heavily rely on the asymptotic properties of the quadratic forms, , , , , , and , with defined in (A.3), for ; and . The following lemmas give their convergence rates.
Lemma 2.
Lemma 3.
Lemma 4.
Note that Lemma 2 (i) implies that, for ,
uniformly over , where denotes a matrix with elements equal to and is a diagonal matrix with diagonal elements bounded away from and . Hence by (A.2) with and , we have, for ,
(A.7) |
uniformly over , which plays a key role in proving lemmas for theorems.
The following lemma shows that does not converge to in probability for , which allows us to restrict the parameter space of from to
(A.8) |
Lemma 5.
Appendix B Theoretical Proofs
B.1 Proof of Theorem 1
We shall focus on the asymptotic properties of and , and derive the asymptotic properties of via ; . If and ; , then we can derive them using the likelihood equations. Differentiating the profile log-likelihood function of (7) with respect to and , we obtain
(B.1) |
and
(B.2) |
To derive and , we must study the convergence rate of each term on the right-hand sides of both (B.1) and (B.2) by Lemmas 2–4 and Lemma 6.
The first two terms of (B.3) are zeros because
(B.4) |
uniformly over . Note that by the Cauchy–Schwarz inequality,
(B.5) |
Hence, by Lemma 6 (i), Lemma 6 (iii), and Lemma 6 (v), the last term of (B.3) can be written as
uniformly over . Therefore, we can rewrite (B.3) as
It follows from (B.1) that for ,
uniformly over . This and Lemma 5 imply that
(B.6) |
Thus (1) follows by applying the law of large numbers to . In addition, the asymptotic normality of follows by and an application of the central limit theorem to in (B.6).
uniformly over . This and (B.4) imply that for ,
uniformly over , where the last equality follows from Lemma 3 (ii)–(iii) and Lemma 4 (i). Hence, for ,
It remains to prove (4), for . We prove by showing that (B.2) is asymptotically nonnegative, for ; using a recursive argument. Let be except that are replaced by . By Lemma 6 (i) and Lemma 6 (iii), we have, for ,
uniformly over . This and (B.4) imply that for ,
(B.9) |
uniformly over , where the last equality follows from Lemma 3 (iii) and Lemma 4 (i). Hence by (B.5), Lemma 3 (ii), and (B.2), we have, for and ,
This implies that is an asymptotically nondecreasing function on , for given other . It follows that ; . The above convergence rate can be recursively improved. Without loss of generality, assume that . We can restrict the parameter space of in the next step to
(B.10) |
with . Then, by Lemma 6 (i) and Lemma 6 (iii), we have, for ,
uniformly over . This and (B.4) imply that for ,
B.2 Proof of Example 1
Note that for , and . Note that by Lemma 5, we consider the sample space . We first derive the explicit forms of the ML estimators and . By (B.2), we have
where and
(B.11) |
Note that ML estimators and satisfy
which implies that
(B.12) |
where
(B.13) |
with defined in (B.11). By (B.12), we have
(B.14) |
(B.15) |
Similarly, by (B.1), we have
The ML estimators and satisfy
which implies that
(B.16) |
with
This together with (B.14) yields
(B.17) |
We are now ready to compare the asymptotic behavior of the LS predictors with that of the empirical BLUPs. Note that for , we have
Hence
(B.18) |
and
which implies that
(B.19) |
Note that by (B.18),
and by (B.19),
which implies that
(B.20) |
Note that by (B.20) and
we have
(B.21) |
with
Note that by (B.14),
(B.22) |
Further, by (B.12) and (B.16), we have
(B.23) |
with
Similarly,
(B.24) |
(B.25) |
with
Hence by (B.14), (B.15), and (B.17), we have
(B.26) |
Furthermore, we have
(B.27) |
with
Note that
(B.28) |
By (B.21), (B.23), (B.24), (B.25), and (B.27), we have
with
where the last equality follows from (B.22), (B.26), and (B.28). Note that follows the inverse-chi-squared distribution with degrees of freedom. We have
(B.29) |
By (B.29) and
we have, for ,
This completes the proofs.
B.3 Proof of Theorem 5
In this section, we first prove Theorem 5 to simplify the proofs of Theorems 3 and 4. As with the proof of Theorem 1, we shall focus on the asymptotic properties of and , and derive them by solving the likelihood equations directly.
where denotes the sub-vector of corresponding to . Note that by the Cauchy–Schwarz inequality, we have
(B.31) |
Hence by (B.31) and Lemma 6, we have
uniformly over , where the last equality follows from (B.5) and Lemmas 2–4. Hence by (B.1), we have, for ,
uniformly over . This and Lemma 5 imply that
(B.32) |
Thus (16) follows by applying the law of large numbers to . In addition, if , the asymptotic normality of follows by and an application of the central limit theorem to in (B.32).
uniformly over . This and (B.30) imply that for ,
uniformly over , where the last equality follows from Lemma 2 (iii), Lemma 3 (ii)–(iv), and Lemma 4 (i). It follows that for ,
uniformly over . Hence by Lemma 3 (ii) and (B.2), we have, for ,
uniformly over . This implies that for ,
This proves (19), for .
It remains to prove (19), for . Let be except that are replaced by . By (B.31) and Lemma 6 (i)–(iv), we have, for ,
uniformly over . This and (B.30) imply that for ,
B.4 Proof of Theorem 3
As with the proof of Theorem 1, we shall focus on the asymptotic properties of and , and derive them by solving the likelihood equations directly.
We first prove (8) using (B.1). Hence by (B.31), Lemma 6 (i)–(iii), Lemma 6 (v)–(vi), and Lemma 6 (viii), we have
uniformly over . This and (B.4) imply
uniformly over , where the last equality follows from Lemma 3, Lemma 4 (i)–(ii), and Lemma 4 (iv). Hence by (B.1), we have, for ,
uniformly over . This and Lemma 5 imply that for ,
(B.33) |
Thus (8) follows by applying the law of large numbers to . In addition, if , the asymptotic normality of follows by and an application of the central limit theorem to in (B.33).
uniformly over . This and (B.4) imply that for ,
uniformly over , where the last equality follows from Lemma 3 (ii)–(iv) and Lemma 4 (i). Hence, for ,
uniformly over . Hence by Lemma 3 (ii) and (B.2), we have, for ,
uniformly over . Hence we have, for ,
This completes the proof of (11), for .
B.5 Proof of Theorem 4
As with the proof of Theorem 1, we shall focus on the asymptotic properties of and , and derive them by solving the likelihood equations directly.
We first prove (12) using (B.1). By Lemma 6 (i), Lemma 6 (iii)–(v), Lemma 6 (vii), and Lemma 6 (x), we have
uniformly over , where the last equality follows from Lemma 3 (ii)–(iv) and Lemma 4. Hence by (B.1), we have, for ,
uniformly over . This and Lemma 5 imply that for ,
(B.34) |
Thus (12) follows by applying the law of large numbers to . In addition, if , the asymptotic normality of follows by and an application of the central limit theorem to in (B.34).
uniformly over . This and (B.30) imply that for ,
uniformly over , where the last equality follows from Lemma 2 (iii), Lemma 3 (ii)–(iii), and Lemma 4 (i). Hence, for ,
uniformly over . Hence by Lemma 3 (ii) and (B.2), we have, for ,
uniformly over . This and Lemma 5 imply that for ,
This completes the proof of (15) when .
It remains to prove (15), for . Let be except that are replaced by . By Lemma 6 (i) and Lemma 6 (iii)–(iv), we have, for ,
uniformly over . This and (B.30) imply that for ,
uniformly over , where the last equality follows from Lemma 2 (iii), Lemma 3 (iii), and Lemma 4 (i). Therefore,
uniformly over . Hence by Lemma 3 (ii) and (B.2), we have, for ,
uniformly over . This and Lemma 5 imply that for ,
This completes the proof of (15), for . Hence the proof of Theorem 4 is complete.
Appendix C Proofs of Auxiliary Lemmas
C.1 Proof of Lemma 2
Let ; be the -th column of and defined in (A.4). For Lemma 2 (i)–(ii) to hold, it suffices to prove that for and ,
(C.1) |
(C.2) |
(C.3) |
uniformly over . For , and , by (A.2) and (A1)–(A3), we have
uniformly over . For , and , by (A.2) and (A1)–(A3), we have
uniformly over . Now suppose that (C.1)–(C.3) hold for . Then for and , by (A.2) and (C.1)–(C.3) with , and Lemma 3 (i), we have
uniformly over . This completes the proofs of (C.1)–(C.3). Hence the proofs of Lemma 2 (i)–(ii) are complete.
where we note that can be arbitrarily small and the dominant term of the denominator of the last equation can be equal to (i) or (ii) . For the case of (i), by Lemma 3 (i); hence, using Lemma 2 (ii) and Lemma 3 (i), we have
and thus
For the case of (ii), by Lemma 3 (i); hence, using Lemma 3 (i), we have
which also gives the following two results:
In conclusion, we have
(C.4) |
uniformly over . This completes the proof.
C.2 Proof of Lemma 3
Let ; be the -th column of and defined in (A.4). We first prove Lemma 3 (i). By (A.4), it suffices to prove that for ,
(C.5) |
and for and ,
(C.6) |
uniformly over by induction. For and , by (A.2) and (A2), we have
uniformly over . For and , by (A.2) and (A2), we have
uniformly over . Now suppose that (C.5) and (C.6) hold for . Then for and , by (A.2), and (C.5) and (C.6) with , we have
uniformly over . This completes the proof of (C.5) and (C.6). Hence Lemma 3 (i) follows from (C.5), (C.6) with and . This completes the proof of Lemma 3 (i).
We now prove Lemma 3 (ii). Without loss of generality, we assume that and . Then by Lemma 3 (i) and (A.2),
uniformly over . Again, by Lemma 3 (i), we have
uniformly over . This completes the proof of Lemma 3 (ii).
where we note that and can be arbitrarily small and the dominant term of the denominator of the last equation can be equal to
- (i)
- (ii)
- (iii)
For the case of (i), and by Lemma 3 (i); hence, using Lemma 3 (i), we have
which also gives the following two results:
For the case of (ii), and (or vice versa) by Lemma 3 (i); hence, using Lemma 3 (i), we have
which gives the following three results:
For the case of (iii), and by Lemma 3 (i); hence, using Lemma 3 (i), we have
which also gives the following three results:
In conclusion, we have
(C.7) |
uniformly over . This completes the proof of Lemma 3 (iii).
C.3 Proof of Lemma 4
Note that for and ,
Lemma 4 (ii)–(iii) then follow from an induction argument similar to that in the proof of Lemma 2 (iii); the details are hence omitted.
We next prove Lemma 4 (iv). Let be the -th column of and be defined in (A.4). Without loss of generality, we assume . Hence by (A.6), Lemma 3 (i), and Lemma 4 (ii), we have
uniformly over . This completes the proof of Lemma 4 (iv).
C.4 Proof of Lemma 5
We show the lemma for , where the proofs with respect to the remaining models are similar and are hence omitted.
Let be the -th column of and be defined in (A.4). Without loss of generality, we assume that and . It then suffices to prove that for and
(C.8) |
as both and for some , where , and being the true value of ; . Note that by (A.3) and (A.1), we have
Repeatedly expanding the above equation via (A.1) yields
uniformly over and
(C.10) |
as both and for some . Before proving (C.9) and (C.10), we prove the following equations, for being defined in (5) and :
(C.11) |
(C.12) |
and
(C.13) |
(C.14) |
uniformly over , where the second last equality follows from Lemma 3 (i) and Lemma 4 (i)–(ii). For (C.12) with , we have
uniformly over , where the second equality follows from (9) and (A.5) and the third equality follows from (A.7), Lemma 2 (iii), and Lemma 4 (ii)–(iii). For (C.13) with , we have
uniformly over , where the second equality follows from (A.7), Lemma 2 (iii), and Lemma 4 (ii)–(iii). For (C.14) with ,
uniformly over , where the second equality follows from (A.7), Lemma 2 (ii)–(iii), and Lemma 4 (iii). This completes the proofs of (C.11)–(C.14). We now prove (C.9). Note that
(C.15) |
uniformly over , where the second equality follows from (C.12) that
uniformly over , the second last equality follows from (C.13) that
uniformly over , and the last equality follows from (C.14) that
uniformly over . Also, by (C.11),
uniformly over . This together with (C.15) gives (C.9). We now prove (C.10). As with the proof of (C.15), we have
uniformly over . Hence
uniformly over . Hence, for (C.10) to hold, it suffices to prove that for and ,
(C.16) |
as both and for some . It suffices to prove (C.16) for . By Lemma 3 (ii)–(iii), we have
Hence, for (C.16) with to hold, it suffices to prove that
as both and , which follows from Lemma 3 (i) and L'Hospital's rule. This completes the proof of (C.16), and hence the proof of Lemma 5.
C.5 Proof of Lemma 6
We first prove Lemma 6 (i). For , and , we have
uniformly over , where the second equality follows from (A.7) and Lemma 2 (iii). Similarly, by (A.7) and Lemma 2 (iii), we have
uniformly over . This completes the proof of Lemma 6 (i).
We now prove Lemma 6 (ii). For , , and ,
uniformly over , where the second equality follows from Lemma 2 (ii)–(iii) and (A.7). Similarly, by (A.7) and Lemma 2 (ii)–(iii), we have
uniformly over . This completes the proof of Lemma 6 (ii).
We now prove Lemma 6 (iii). For and ,
uniformly over , where the second equality follows from (A.7), Lemma 2 (iii), and Lemma 4 (iii). Similarly, by (A.7), Lemma 2 (iii), and Lemma 4 (iii), we have
uniformly over . This completes the proof of Lemma 6 (iii).
We now prove Lemma 6 (iv). For , ,
uniformly over , where the second equality follows from (A.7), Lemma 2 (i), and Lemma 2 (iii). Similarly, by (A.7) and Lemma 2 (i) and (iii), we have
uniformly over . This completes the proof of Lemma 6 (iv).
We now prove Lemma 6 (v). For , we have
uniformly over , where the second equality follows from (A.7) and Lemma 4 (iii). This completes the proof of Lemma 6 (v).
We now prove Lemma 6 (vi). For and , we have
uniformly over , where the second equality follows from (A.7), Lemma 2 (ii), and Lemma 4 (iii). This completes the proof of Lemma 6 (vi).
We now prove Lemma 6 (vii). For , we have
uniformly over , where the second equality follows from (A.7), Lemma 2 (i), and Lemma 4 (iii). This completes the proof of Lemma 6 (vii).
We now prove Lemma 6 (viii). For , and , we have
uniformly over , where the second equality follows from (A.7) and Lemma 2 (ii). This completes the proof of Lemma 6 (viii).
We now prove Lemma 6 (ix). For , , we have
uniformly over , where the second equality follows from (A.7) and Lemma 2 (i)–(ii). This completes the proof of Lemma 6 (ix).
We finally prove Lemma 6 (x). For , we have