\ul
Impact of existence and nonexistence of pivot on the coverage of empirical best linear prediction intervals for small areas
Abstract
We advance the theory of parametric bootstrap in constructing highly efficient empirical best (EB) prediction intervals of small area means. The coverage error of such a prediction interval is of the order , where is the number of small areas to be pooled using a linear mixed normal model. In the context of an area level model where the random effects follow a non-normal known distribution except possibly for unknown hyperparameters, we analytically show that the order of coverage error of empirical best linear (EBL) prediction interval remains the same even if we relax the normality of the random effects by the existence of pivot for a suitably standardized random effects when hyperpameters are known. Recognizing the challenge of showing existence of a pivot, we develop a simple moment-based method to claim non-existence of pivot. We show that existing parametric bootstrap EBL prediction interval fails to achieve the desired order of the coverage error, i.e. , in absence of a pivot. We obtain a surprising result that the order term is always positive under certain conditions indicating possible overcoverage of the existing parametric bootstrap EBL prediction interval. In general, we analytically show for the first time that the coverage problem can be corrected by adopting a suitably devised double parametric bootstrap. Our Monte Carlo simulations show that our proposed single bootstrap method performs reasonably well when compared to rival methods.
Keywords: Small area estimation, empirical Bayes, linear mixed model, best linear predictor
1 Introduction
The following two-level model, commonly referred to as the area-level model, has been extensively used in small area applications.
The area-level model. For ,
Level 1 (Sampling model): ;
Level 2 (Linking model): .
Here, represents the number of small areas, is a fully parametric distribution, not necessarily normal, with mean , variance , and any additional parameters . Assuming the same normal distribution at level 1, Datta and Lahiri (1995) assumed that is a scale mixture of normal distribution. Later, Bell and Huang (2006) and Xie et al. (2007) applied a distribution as a specific case of the scale mixture of normals at level 2, with mean , variance and degrees of freedom , to mitigate the influence of outliers. Fabrizi and Trivisano (2010) introduced two robust area-level models: the first assumes that follows an exponential power distribution with a mean , variance and shape parameter ; the second model assumes a skewed exponential power distribution on level 2, with a mean , variance , shape parameter and the skewness parameter . Thus, in this case, the parameter vector can be defined as .
In the above model, level 1 is used to account for the sampling distribution of the direct estimates , which are weighted averages of observations from small area . Level 2 links the true small area means to a vector of known auxiliary variables , often obtained from various administrative records. The parameters and of the linking model are generally unknown and are estimated from the available data. As in other papers on the area-level model, the sampling variances are assumed to be known. In practice, ’s are estimated using a smoothing technique such as the ones given in Fay and Herriot (1979), Otto and Bell (1995), Ha et al. (2014) and Hawala and Lahiri (2018).
The two-level model can be rewritten as the following simple linear mixed model
(1.1) |
where the random effect and the sampling error are independent. When is different from normal distribution, the best predictor (BP) of , may not have a closed form. Instead, the best linear predictor (BLP) of always has the explicit form as below:
(1.2) |
where , and the BLP minimizes the mean squared prediction error (MSPE) among all linear predictors of . The variance of the prediction error is denoted by . If is known, one can obtain the standard weighted least squares estimator of , denoting as . Replacing in (1.2) by , a best linear unbiased prediction (BLUP) estimator of is given by
(1.3) |
which does not require normality assumptions typically used in small area estimation models. In practice, it is common that both and are unknown and need to be estimated based on data. After plugging in their estimators, an empirical best linear unbiased predictor (EBLUP) of is given by:
(1.4) |
where and , and is a consistent estimator of for large . After plugging in the estimator , we have . To simplify the notation, throughout the remainder of this paper, we will use and to denote and , respectively. Point prediction using EBLUP and the associated mean squared prediction error (MSPE) estimation have been studied extensively. See Rao and Molina (2015) and Jiang and Nguyen (2007) for a detailed discussion on this subject.
In this paper, we consider prediction interval approximates of small area means . A prediction interval of , denoted by , is called a interval for if , for any fixed , and , the parameter space of . The probability is with respect to the area-level model. There are several options for constructing interval estimates for . The prediction interval based on only the Level 1 model for the observed data is given by , where is the th standard normal percentile. While the coverage probability for this direct interval is , it is not efficient when is large as in the case of small area estimation. Hall and Maiti (2006) considered the interval based the regression synthetic estimator: , where and are obtained using a parametric bootstrap method (described in detail in a later section). However, this interval is synthetic in the sense that it is constructed using synthetic regression estimator of and its associated uncertainty measure, which are not area specific in the outcome variable and hence may cause larger length of the confidence interval.
It is of importance to combine both levels of the model in the interval estimation. A effective approach is to use empirical best methodology. We call an interval empirical best (EB) prediction interval if it is based on empirical best predictor of . For a special case of the area-level model that follows a normal distribution, Cox (1975) initiated the exact empirical Bayes interval: , where is the empirical Bayes estimator of . Although the Cox interval always has smaller length than that of the direct interval, its coverage error is of the order , not accurate enough in most small area applications. Yoshimori and Lahiri (2014) improved the cox-type empirical Bayes interval by using a carefully devised adjusted residual maximum likelihood (ARML) estimator of . Their interval has a coverage error of order . Additionally, they analytically showed that their interval always produces shorter length than the corresponding direct interval. However, the properties of both the ARML estimator of and the associated interval have not yet been explored for cases involving non-normally distributed random effects.
A function of the data and parameters is called to be a pivot if its distribution does not depend on any unknown quantity (Shao, 2008; Hall, 2013). When we have a linear mixed normal model on (1.1) , is a standard normal pivot. The traditional method of interval estimation for is of the form , where mspe is an estimate of the true mean squared prediction error (MSPE) of the EBLUP. Unfortunately, is not a pivot and this traditional approach produces too short or too long intervals. The coverage error of such interval is of the order , not accurate enough in most small area applications. Recognizing that does not follow a standard normal distribution, Chatterjee et al. (2008) and Li and Lahiri (2010) developed a parametric bootstrap method to approximate its distribution and obtained a EB prediction interval for in linear mixed normal models. They showed such interval has the coverage error of the order . However, the property remains unknown for non-normal distributed random effects.
In this paper, one main aim is to bring out the virtues of pivoting or rescaling, which can decrease the dependence of our statistics on unknown parameters and yield improved prediction interval approximation for small areas under the general model (1.1). Analogous to Chatterjee et al. (2008), we propose parametric bootstrap methods to approximate the distribution of a suitably centered and scaled EBLUP under the general model (1.1), and apply it to construct a prediction interval estimation of . Here, we define an interval based on the EBLUP of as an Empirical Best Linear (EBL) prediction interval. Specifically, we introduce these two key quantities: one is the centered and scaled EBLUP following the distribution, which can be expressed as below
and the other is based on BLP which has the distribution and can be expressed as
If does not depend on any unknown parameters, can be referred as a pivot; otherwise, is not a pivot and we rewrite as , where is the unknown parameter vector determining the distribution of . We define as the -level quantile of the distribution , if for any fixed .
In Section 2, we introduce a list of notation and regularity conditions used throughout the paper. In Section 3, we propose a single parametric bootstrap EBL prediction interval for small area means in the context of an area-level model, where the random effects follow a general (non-normal) known distribution, except possibly for unknown hyperparameters. We analytically demonstrate that the coverage error order of the EBL prediction interval remains the same as in Chatterjee et al. (2008), even when relaxing the normality of random effects through the existence of a pivot for suitably standardized random effects when hyperparameters are known. However, when a pivot does not exist, the EBL prediction interval fails to achieve the desired coverage error order, i.e. . Surprisingly, we find that the order term is always positive under certain conditions, indicating potential overcoverage in the current single parametric bootstrap EBL prediction interval. Recognizing the challenge of showing existence of a pivot, we develop a simple moment-based method to claim non-existence of pivot. In Section 4, we propose a double parametric bootstrap method under a general area level model and, for the first time, analytically show that this approach can correct the coverage problem. In Section 5, we compare our proposed EBL prediction interval methods with the direct method, other traditional approaches, and the parametric bootstrap prediction interval proposed by Hall and Maiti (2006).
2 A list of notations and regularity conditions
We use the following notations throughout the paper:
-
, a column vector of direct estimates;
-
, a known matrix of rank ;
-
, a diagonal matrix;
-
, weighted least square estimator of with known ;
-
;
-
, derivative with respect to evaluated at .
We assume following regularity conditions for proving various results presented in this paper:
-
R1:
The rank of , , is bounded for a large ;
-
R2:
, and the true , the interior of ;
-
R3:
;
-
R4:
for ;
-
R5:
The distribution function is three times continuously differentiable with respect to (), and third derivative is uniformly bounded. When is not a pivot, is three times continuously differentiable with respect to the parameter vector , and its third derivative is uniformly bounded;
-
R6:
When having a non-pivot , we assume the estimator satisfies:
where the expectation is taken at the true . For a special case that , using the method of moment from Prasad and Rao (1990) to estimate , Lahiri and Rao (1995) showed that . Moreover, they provided the Lemma C.1 that under the regularity conditions R1 - R5, for any satisfying with .
3 Single parametric bootstrap
For the remainder of this paper, without further explicit mention, we will use to represent the BLP of , and to denote the EBLUP of . The single parametric bootstrap method has been widely studied for its simplicity in obtaining prediction intervals directly from the bootstrap histogram. Ideally, a prediction interval of can be constructed based on the distribution of , although this distribution function is complex and difficult to approximate analytically. In this paper, we firstly follow the procedure introduced by Chatterjee et al. (2008) and Li and Lahiri (2010) to provide a bootstrap approximation of by using a parametric bootstrap method. The implementation is straightforward, following these steps:
-
1.
Conditionally on the data , draw for , independently from the distribution ;
-
2.
Given , draw from the distribution ;
-
3.
Construct the bootstrap estimators , using the data and then obtain and ;
-
4.
Calibrate on using the bootstrap method. Let be the solution of the following equations:
(3.1) where ;
-
5.
The single bootstrap calibrated prediction interval is constructed by:
One of our main results from the algorithm above is that we relax the normality of the random effects by the existence of a pivot , the prediction interval obtained above still has a high degree of coverage accuracy. That is, it brings the coverage error down to .
Theorem 3.1.
Under regularity conditions, for a preassigned and arbitrary , when is a pivot, we have
(3.2) |
where and are determined via the single parametric bootstrap procedure described above.
The proof of Theorem 3.1 is given in the Appendix A. An example of Theorem 3.1 is that when is a normal distribution, , using Theorem 3.2 of Chatterjee et al. (2008), we have
(3.3) |
Proposition 3.1.
When is not a pivot, we have
(3.4) |
where is obtained from the single parametric bootstrap procedure described above.
The proof is given in the Appendix B.
Proposition 3.2.
Suppose that:
-
(i)
The random effects are symmetrically distributed;
-
(ii)
and , where means the largest eigenvalue.
This condition is satisfied for some continuous distributions. For instance, when follows a logistic or distribution with known degrees of freedom. The only unknown parameter of is the variance . As Remark 1 indicates that the kurtosis of is a decreasing function of , it is not difficult to show that and ; -
(iii)
The estimators of the unknown parameters, , are either second-order unbiased or negatively biased, that is with , for .
Under these conditions, the prediction interval (3.4) has an overcoverage property. More specifically, we can rewrite (3.4) as below:
(3.5) |
where is of the order with .
See detailed proof in the Appendix C. The proposition indicates that under the regularity conditions, the prediction intervals conducted by the proposed single parametric bootstrap can produce higher coverage than the nominal coverage with a non pivot up to the order of , which could be beneficial for practitioners without considering other properties of prediction intervals.
Remark 1.
When is not a normal distribution, it is challenging to obtain the explicit form of the distribution of . Consequently, it is difficult to verify whether is a pivot. Note the fact that for to be a pivot, its moments should not depend on any unknown parameters. Based on that, we develop a simple moment-based method to claim non-existence of pivot. Under the symmetry assumption of , the odd moments of are zero if they exist, and the second moment equals to 1 because it is standardized. To verify if is a pivot, we calculate its fourth moment as follows:
(3.6) |
where is the excess kurtosis of . Note that when is normally distributed, is zero. When the distribution of is other than normal, such as t, double exponential, and logistic, is a constant other than zero, indicating that the fourth moment of depends on the unknown parameter and thus is not a pivot. Moreover, in these cases, the fourth moment of is a decreasing function of .
4 Double parametric bootstrap
Hall and Maiti (2006) considered parametric bootstrap methods to approximate the distribution of . Then, the prediction interval can be constructed as , where and are obtained from single bootstrap approximation based on their algorithm. Their prediction interval is based on a synthetic model or the regression model, which does not permit approximation of the conditional distribution of given the data . As a consequence, it is likely to underweight the area specific data. When the level 2 distribution determined only by and , it is easy to know that the quantity, , is a pivot. As Hall and Maiti (2006) stated, their prediction interval is effective as . However, when additional parameters involved into the distribution of random effects, might not a pivot and we may lose the effectiveness.
Proposition 4.1.
When considering a general distribution , such as a t distribution with unknown degree of freedom, we have
(4.1) |
where and are obtained from single bootstrap approximation based on Hall and Maiti (2006) algorithm.
Hall and Maiti (2006) proposed a double-bootstrap method to calibrate from single parametric bootstraps, which can achieve a high degree of coverage accuracy . However, their calibration approach can overcorrect and produce a calibrated greater than 1, which makes the implementation infeasible.
While the order term in (3.5) is theoretically positive under certain conditions, it remains unclear whether this positiveness holds when is asymmetrically distributed. Unlike Hall and Maiti (2006), we introduce a new double bootstrap method, which does not require a pivot or symmetric . This method reduces the coverage error to , even when is asymmetrically distributed. Our double bootstrap approach is based on the algorithm from Shi (1992), where the double bootstrap is proposed to obtain accurate and efficient confidence interval for parameters of interest in both nonparametric and univariate parametric distribution settings. Later on, McCullough and Vinod (1998) discussed the theory of the double bootstrap both with and without pivoting, and provided the implementations in some nonlinear production functions. In this paper, we develop the double bootstrap in the context of our mixed effect model and apply it to obtain the EBL prediction intervals of . The framework of our double parametric bootstrap is as below:
-
1.
First-stage bootstrap
-
1.1
Conditionally on the data , draw , from the distribution ;
-
1.2
Given , draw from the distribution ;
-
1.3
Compute , , from the data , and obtain and ;
-
1.1
-
2.
Second-stage bootstrap
-
2.1
Given , , , draw from the distribution ;
-
2.2
Given , draw from the distribution ;
-
2.3
Compute , from the data , also obtain and ;
-
2.4
Consider to obtain a -level, two-sided, equal-tailed prediction interval. Define
(4.2) For seeking the upper limit, we firstly solve the following system of equations in order to obtain such that
(4.3) Using the definition (4.2), we have . Then, the above system of equations is equivalent to
(4.4) Rewriting (4.4) gives
(4.5) Note that the inner probability, , is a function of the first-stage bootstrap resample. More specifically, on the th first-stage bootstrap sample, after all second-stage bootstrap operations are completed, let be the proportion of times that , i.e.,
We will use to adjust the first-stage intervals. After all bootstrapping operations are complete, we have estimates , , where is the number of first-stage bootstrap operations. Sort and choose the quantile as the percentile point for defining the double-bootstrap upper limit.
-
2.5
After completing the step 2.4, we have all estimates , and . Choose such that
(4.6) -
2.6
Take as the upper limit of the double bootstrap prediction interval. Similar operations determine the lower limit, . Finally, construct the prediction interval of as
-
2.1
The algorithm above shows that the single parametric bootstrap is calibrated by the second-stage bootstrap. In general, such a calibration improves the coverage accuracy to , even when the pivot does not exist. One more advantage of our double bootstrap algorithm is that it avoids the problem of over correction and so make it more practical than that of Hall and Maiti (2006).
Theorem 4.1.
Under regularity conditions, for a preassigned and arbitrary , we have
(4.7) |
where and are obtained from the double parametric bootstrap procedure described above.
See proof of the theorem in appendix E.
5 Monte Carlo Simulations
In this section, we compare the performance of the proposed parametric bootstrap with their competitors where available, using Monte Carlo simulation studies. To maintain comparability with existing studies, we adopt part of the simulation framework of Chatterjee et al. (2008). We consider an area-level model with , and five groups of small areas. Within each group, the sampling variances ’s remain the same. Two patterns for the ’s are considered: (i) ; (ii) . For pattern (i), we take ; for pattern (ii), we take doubling the variances of pattern (i) while preserving the ratios. To examine the effect of , we consider and 50. With the increase of , all methods improve and get closer to one another, supporting our asymptotic theory. Since we obtained virtually identical results under these two patterns, for the full study we confined attention to the pattern (i) and the results for pattern (ii) are provided in the Appendix G.
The Prasad-Rao method-of-moments (Prasad and Rao, 1990), and the Fay-Herriot method of estimating the variance component are considered. For the Fay-Herriot estimator of , we employ the method of scoring to solve the estimating equation, which has showed to be more stable (Datta et al., 2005) than the Newton-Raphson method originally used in Fay and Herriot (1979). The estimation equation of the Fay-Herriot estimator is
Here, is a decreasing function with respect to , and the expectation of its first derivative . To improve computational efficiency, we implemented a slight modification to the algorithm. Specifically, the revised Fay-Herriot algorithm begins by calculating at the initial point , as in Fay and Herriot (1979). If , we truncate the estimator to . Otherwise, the iterative process continues to search for a positive solution, with also applied if no positive solution is found. This revised Fay-Herriot algorithm further enhances computational efficiency compared to the original method.
5.1 Simulations on symmetric cases
First, we consider the scenario that is symmetrically distributed. Specifically, we assume is a distribution with 9 degrees of freedom. In this setting, we compare coverage probabilities and average lengths of the following seven different prediction intervals of :
-
•
Two prediction intervals based on the proposed single parametric bootstrap method with two different variance estimators: and , denoted as SB.FH and SB.PR, respectively;
-
•
Two prediction intervals based on the single parametric bootstrap methods proposed by Hall and Maiti (2006), using the same variance estimators, denoted as HM.FH and HM.PR;
- •
-
•
The direct confidence interval (DIRECT) given by .
Each reported result is based on 1000 simulation runs. For all cases, we consider single bootstrap sample of size 400 and three different nominal coverages: .
We report the percentages of negative estimates for A under distribution in Table 1. For , the Prasad-Rao method-of-moment approach yields as high as about 13% negative estimates of under pattern (i) of . Fay-Herroit method produces significantly fewer negative estimates than the Prasad-Rao method-of-moment approach across all scenarios.
Pattern (i) | Pattern (ii) | |||
---|---|---|---|---|
FH | PR | FH | PR | |
15 | 1.6 (6.4) | 13.1 (25.0) | 1.1 (5.7) | 11.8 ( 23.2) |
50 | 0 (0.07) | 1.2(6.6) | 0 (0.07) | 1.2 (6.1) |
Table 2 presents coverage probabilities and average lengths for each prediction interval method with and the distribution under the pattern (i) of . The SB.PR and HM.PR prediction intervals consistently over-cover. SB.PR prediction intervals have shorter lengths than HM.PR, especially for groups 2-5 with smaller sampling variances . The SB.FH and HM.FH prediction intervals perform well regarding coverage errors. Specifically, their coverage probabilities are very close to all three nominal coverages. The SB.FH method produces the shortest average lengths among these four single parametric bootstrap methods. The FH intervals tend to undercover for G1 group when and . The PR intervals have the undercoverage issue for G1 group across all three nominal coverages.
SB.FH HM.FH SB.PR HM.PR FH PR DIRECT (i) G1 79.94 ( 2.31 ) 80.30 ( 2.53 ) 83.7 ( 2.59 ) 84.45 ( 2.84 ) 79.89 ( 2.30 ) 78.66 ( 2.31 ) 79.43 ( 5.13 ) G2 80.15 ( 1.59 ) 80.31 ( 2.49 ) 83.63 ( 1.79 ) 84.54 ( 2.81 ) 80.38 ( 1.59 ) 81.55 ( 1.67 ) 79.85 ( 1.99 ) G3 79.71 ( 1.50 ) 79.53 ( 2.48 ) 83.12 ( 1.69 ) 83.75 ( 2.80 ) 79.86 ( 1.50 ) 80.88 ( 1.58 ) 79.99 ( 1.81 ) G4 80.47 ( 1.39 ) 80.03 ( 2.48 ) 84.06 ( 1.57 ) 83.95 ( 2.80 ) 80.49 ( 1.39 ) 82.06 ( 1.47 ) 80.29 ( 1.62 ) G5 79.65 ( 1.05 ) 80.48 ( 2.46 ) 83.32 ( 1.20 ) 84.89 ( 2.78 ) 79.99 ( 1.06 ) 81.8 ( 1.14 ) 80.32 ( 1.15 ) (i) G1 89.88 ( 3.05 ) 90.35 ( 3.39 ) 94.13 ( 3.79 ) 94.25 ( 4.21 ) 88.9 ( 2.96 ) 87.82 ( 2.97 ) 89.42 ( 6.58 ) G2 90.17 ( 2.05 ) 90.37 ( 3.30 ) 93.25 ( 2.59 ) 94.27 ( 4.14 ) 89.89 ( 2.04 ) 90.58 ( 2.14 ) 90.19 ( 2.55 ) G3 90.06 ( 1.94 ) 89.34 ( 3.30 ) 93.38 ( 2.44 ) 93.76 ( 4.14 ) 90.15 ( 1.92 ) 90.78 ( 2.03 ) 90.11 ( 2.33 ) G4 90.07 ( 1.79 ) 89.50 ( 3.29 ) 93.41 ( 2.26 ) 94.24 ( 4.13 ) 90.12 ( 1.78 ) 91.19 ( 1.89 ) 90.24 ( 2.08 ) G5 90.29 ( 1.36 ) 90.28 ( 3.26 ) 93.33 ( 1.72 ) 94.2 ( 4.09 ) 90.26 ( 1.36 ) 91.56 ( 1.47 ) 90.32 ( 1.47 ) (i) G1 95.13 ( 3.75 ) 95.04 ( 4.22 ) 97.3 ( 5.31 ) 97.42 ( 5.91 ) 93.57 ( 3.52 ) 92.65 ( 3.53 ) 94.77 ( 7.84 ) G2 95.11 ( 2.47 ) 95.35 ( 4.07 ) 96.99 ( 3.61 ) 97.5 ( 5.79 ) 94.87 ( 2.43 ) 95.52 ( 2.55 ) 95.38 ( 3.04 ) G3 94.96 ( 2.32 ) 94.45 ( 4.06 ) 97.05 ( 3.39 ) 97.42 ( 5.77 ) 94.70 ( 2.29 ) 95.51 ( 2.41 ) 95.00 ( 2.77 ) G4 95.06 ( 2.14 ) 95.03 ( 4.04 ) 96.86 ( 3.14 ) 97.58 ( 5.75 ) 95.12 ( 2.12 ) 95.86 ( 2.25 ) 95.07 ( 2.48 ) G5 95.31 ( 1.62 ) 94.98 ( 3.99 ) 96.87 ( 2.38 ) 97.42 ( 5.69 ) 95.28 ( 1.62 ) 96.01 ( 1.75 ) 95.17 ( 1.75 )
Table 3 reports the results for . As illustrated in Table 1, the Prasad–Rao method produces an extremely high percentage of zero estimates for a small . Thus, it is not surprising to note that the SB.PR intervals have severe undercoverage problem when and 95%. The high percentage of negative estimates of might also contribute to the similar undercoverage problem of HM.PR intervals at and 95% as well as the large average lengths of both SB.PR and HM.PR intervals. The SB.FH and HM.FH intervals uniformly tend to overcover. Still, SB.FH intervals have the shortest average lengths among the four types of parametric bootstrap intervals. The FH intervals significantly undercover for group G1 at three normial coverages and have slight undercoverage for groups G2 and G3, when and 95%. The PR intervals undercover for group G1 and switch to overcover for rest of the groups at all three nominal coverages.
SB.FH HM.FH SB.PR HM.PR FH PR DIRECT (i) G1 82.43 ( 2.68 ) 82.50 ( 2.94 ) 81.50 ( 3.37 ) 82.30 ( 3.74 ) 75.07 ( 2.31 ) 76.23 ( 2.45 ) 81.37 ( 5.13 ) G2 82.80 ( 1.78 ) 83.50 ( 2.78 ) 80.77 ( 2.25 ) 82.13 ( 3.60 ) 79.43 ( 1.63 ) 88.97 ( 2.19 ) 80.03 ( 1.99 ) G3 81.40 ( 1.67 ) 81.37 ( 2.76 ) 80.57 ( 2.12 ) 82.27 ( 3.58 ) 79.07 ( 1.54 ) 88.20 ( 2.15 ) 80.40 ( 1.81 ) G4 84.07 ( 1.55 ) 84.17 ( 2.74 ) 81.50 ( 1.95 ) 82.97 ( 3.54 ) 81.40 ( 1.43 ) 90.47 ( 2.1 ) 80.67 ( 1.62 ) G5 81.80 ( 1.17 ) 83.17 ( 2.66 ) 79.33 ( 1.47 ) 82.23 ( 3.42 ) 80.73 ( 1.11 ) 88.50 ( 1.98 ) 80.00 ( 1.15 ) (i) G1 93.07 ( 3.89 ) 92.87 ( 4.33 ) 87.00 ( 5.77 ) 87.3 ( 6.45 ) 83.97 ( 2.97 ) 85.80 ( 3.14 ) 90.33 ( 6.58 ) G2 93.00 ( 2.51 ) 93.1 ( 3.96 ) 86.30 ( 3.78 ) 87.27 ( 6.03 ) 89.37 ( 2.09 ) 95.70 ( 2.81 ) 90.87 ( 2.55 ) G3 92.63 ( 2.35 ) 92.5 ( 3.91 ) 86.50 ( 3.57 ) 87.77 ( 6.01 ) 88.37 ( 1.98 ) 95.27 ( 2.76 ) 90.20 ( 2.33 ) G4 93.90 ( 2.16 ) 93.83 ( 3.86 ) 87.53 ( 3.28 ) 87.87 ( 5.94 ) 90.17 ( 1.84 ) 96.67 ( 2.70 ) 90.93 ( 2.08 ) G5 91.83 ( 1.60 ) 92.87 ( 3.67 ) 86.03 ( 2.42 ) 87.53 ( 5.62 ) 90.03 ( 1.43 ) 94.73 ( 2.54 ) 89.60 ( 1.47 ) (i) G1 97.13 ( 5.46 ) 97.07 ( 6.11 ) 88.77 ( 8.91 ) 88.80 ( 9.98 ) 88.73 ( 3.54 ) 90.73 ( 3.75 ) 94.73 ( 7.84 ) G2 96.33 ( 3.41 ) 96.83 ( 5.36 ) 88.43 ( 5.73 ) 88.87 ( 9.17 ) 93.7 ( 2.49 ) 97.87 ( 3.35 ) 95.7 ( 3.04 ) G3 96.60 ( 3.18 ) 96.83 ( 5.26 ) 89.20 ( 5.39 ) 89.50 ( 9.10 ) 93.33 ( 2.36 ) 97.90 ( 3.29 ) 94.77 ( 2.77 ) G4 97.23 ( 2.90 ) 97.37 ( 5.15 ) 89.87 ( 4.93 ) 89.90 ( 8.95 ) 95.33 ( 2.19 ) 98.83 ( 3.22 ) 95.63 ( 2.48 ) G5 96.30 ( 2.11 ) 96.93 ( 4.77 ) 88.93 ( 3.59 ) 89.30 ( 8.38 ) 95.13 ( 1.7 ) 97.80 ( 3.03 ) 94.57 ( 1.75 )
5.2 Further simulations on asymmetric cases
While some of our theoretical results for single bootstrap are based on the symmetry assumption, we also use simulation studies to assess our proposed parametric bootstrap method under asymmetric conditions. Specifically, for the same models and parameter choices as above, we change to be a shifted exponential distribution (SE). Besides the seven prediction intervals mentioned before, we also include our proposed double parametric bootstrap intervals in this subsection and we expect they might improve the potential coverage error under asymmetric conditions. Below, the double bootstrap method based on the Fay-Herriot variance estimator is denoted by DB.FH, and the double bootstrap method based on the Prasad-Rao variance estimator is indicated by DB.PR. We keep bootstrap sample of size 400 in the first stage and apply two different sizes in the second stage. Since for and , we obtained very similar results. Moreover, the two patterns of gave us similar results. In the following, we provide detailed discussions only for under the pattern (i), but the results for are provided in the Appendix G.
Table 4 shows the percentages of negative estimates for under the SE distribution. When , the Prasad-Rao method-of-moments approach yields very high percentages of negative estimates at both the first and second stages of bootstrapping. Although the Fay-Herriot method results in fewer negative estimates than the Prasad-Rao approach, the occurrence of negative estimates remains notable, especially when the number of small areas is small (e.g., ).
Pattern (i) | Pattern (ii) | |||
---|---|---|---|---|
FH | PR | FH | PR | |
15 | 3.6 (10.72) [16.32] | 16.7 (27.98) [31.95] | 3.9 (10.93) [16.46] | 18.1 (28.49) [31.97] |
50 | 0 (0.37) [1.44] | 1.80 ( 7.89) [ 12.82] | 0 (0.44) [1.56] | 2.80 (8.89) [13.68] |
Table 5 displays the coverage probabilities and average lengths for each prediction interval under . The parametric bootstrap methods based on Prasad-Rao estimators of , SB.PR, HM.PR and DB.PR, consistently tend to over-cover at and 90%. DB.PR intervals bring the coverage probabilities closer to the nominal coverages but have larger lengths than single bootstrap SB.PR intervals. Even with , the relatively high percentage of zero estimates of in the second bootstrap stage showed in Table 4, might contribute to the increment on interval length of DB.PR. When applying the Fay-Herriot method in estimating , our proposed single parametric bootstrap intervals SB.FH already showed good performance, DB.FH has little or no effect. The FH intervals perform well, except overcoverage at . PR intervals have undercoverage problem for group G1 at . Overall, the SB.FH and FH intervals perform the best in terms of both coverage probabilities and average lengths in this setting.
SB.FH HM.FH DB.FH SB.PR HM.PR DB.PR FH PR DIRECT (i) G1 80.29 ( 2.20 ) 80.75 ( 2.35 ) 79.58 ( 2.22 ) 84.50 ( 2.50 ) 87.12 ( 2.68 ) 83.16 ( 2.86 ) 83.45 ( 2.28 ) 81.94 ( 2.30 ) 79.90 ( 5.13 ) G2 79.99 ( 1.55 ) 81.01 ( 2.31 ) 79.42 ( 1.56 ) 83.66 ( 1.77 ) 87.17 ( 2.65 ) 82.99 ( 2.07 ) 80.56 ( 1.57 ) 82.99 ( 1.67 ) 79.94 ( 1.99 ) G3 80.06 ( 1.47 ) 81.27 ( 2.30 ) 79.62 ( 1.47 ) 83.41 ( 1.68 ) 87.39 ( 2.65 ) 83.09 ( 1.97 ) 80.38 ( 1.48 ) 82.61 ( 1.59 ) 79.86 ( 1.81 ) G4 80.60 ( 1.36 ) 80.68 ( 2.30 ) 80.20 ( 1.37 ) 83.71 ( 1.56 ) 87.00 ( 2.64 ) 82.96 ( 1.83 ) 81.05 ( 1.37 ) 83.78 ( 1.48 ) 80.15 ( 1.62 ) G5 80.81 ( 1.05 ) 80.52 ( 2.29 ) 80.34 ( 1.05 ) 83.79 ( 1.20 ) 87.13 ( 2.63 ) 83.23 ( 1.4 ) 80.95 ( 1.05 ) 83.54 ( 1.17 ) 80.69 ( 1.15 ) (i) G1 90.46 ( 3.01 ) 91.81 ( 3.26 ) 89.56 ( 3.03 ) 93.49 ( 3.79 ) 94.70 ( 4.16 ) 92.15 ( 5.53 ) 90.98 ( 2.92 ) 90.23 ( 2.95 ) 90.01 ( 6.58 ) G2 90.22 ( 2.04 ) 91.74 ( 3.16 ) 89.53 ( 2.05 ) 93.01 ( 2.61 ) 94.91 ( 4.10 ) 92.09 ( 4.05 ) 90.14 ( 2.02 ) 92.22 ( 2.14 ) 89.81 ( 2.55 ) G3 90.00 ( 1.93 ) 91.93 ( 3.15 ) 89.48 ( 1.93 ) 92.51 ( 2.46 ) 94.89 ( 4.10 ) 91.69 ( 3.85 ) 90.2 ( 1.9 ) 91.69 ( 2.03 ) 89.80 ( 2.33 ) G4 90.48 ( 1.78 ) 91.74 ( 3.14 ) 89.8 ( 1.79 ) 92.72 ( 2.28 ) 94.65 ( 4.09 ) 92.09 ( 3.59 ) 90.47 ( 1.76 ) 92.06 ( 1.90 ) 90.17 ( 2.08 ) G5 90.69 ( 1.36 ) 91.34 ( 3.10 ) 90.24 ( 1.37 ) 92.97 ( 1.74 ) 94.64 ( 4.04 ) 92.23 ( 2.76 ) 90.71 ( 1.35 ) 92.35 ( 1.50 ) 90.29 ( 1.47 ) (i) G1 95.43 ( 3.83 ) 96.68 ( 4.18 ) 94.37 ( 3.90 ) 96.45 ( 5.44 ) 96.80 ( 6.02 ) 95.73 ( 9.56 ) 94.27 ( 3.48 ) 93.58 ( 3.52 ) 95.04 ( 7.84 ) G2 95.32 ( 2.51 ) 96.69 ( 3.98 ) 94.77 ( 2.58 ) 96.37 ( 3.69 ) 96.91 ( 5.90 ) 95.88 ( 6.82 ) 94.91 ( 2.40 ) 96.04 ( 2.55 ) 95.07 ( 3.04 ) G3 94.87 ( 2.36 ) 96.54 ( 3.96 ) 94.35 ( 2.43 ) 96.34 ( 3.49 ) 97.19 ( 5.89 ) 95.72 ( 6.48 ) 94.72 ( 2.27 ) 95.51 ( 2.42 ) 94.79 ( 2.77 ) G4 95.41 ( 2.17 ) 96.36 ( 3.93 ) 94.87 ( 2.23 ) 96.35 ( 3.22 ) 97.01 ( 5.86 ) 95.87 ( 6.06 ) 95.13 ( 2.10 ) 95.92 ( 2.27 ) 95.17 ( 2.48 ) G5 95.34 ( 1.64 ) 96.27 ( 3.86 ) 95.16 ( 1.69 ) 96.34 ( 2.43 ) 96.99 ( 5.79 ) 95.92 ( 4.60 ) 95.31 ( 1.61 ) 96.30 ( 1.78 ) 95.44 ( 1.75 )
With , similar to the cases under distribution, the Prasad-Rao variance estimator produces very high percentages of zero estimates of , especially at the bootstrap stages. See Table 4. Thus, it is not abnormal to have the results that the three parametric bootstrap intervals based on have very low coverage probabilities when the nominal coverage or 95%. See Table 6 for the cases . The other three parametric bootstrap intervals based on overcover when nominal covarage is 80% or 90%. When , SB.FH intervals have good performance as well as DB.FH in terms of coverage error, while SB.FH intervals have shorter average lengths than those of HM.FH and DB.FH. The FH intervals suffer from severe undercoverage for group G1 at all nominal coverages as well as PR intervals at . With and 90%, PR intervals have longer average lengths than those of SB.FH intervals for groups G2 - G5.
SB.FH HM.FH DB.FH SB.PR HM.PR DB.PR FH PR DIRECT (i) G1 84.63 ( 2.66 ) 86.00 ( 2.90 ) 83.63 ( 2.68 ) 79.77 ( 3.31 ) 81.33 ( 3.69 ) 80.6 ( 4.89 ) 77.17 ( 2.23 ) 80.43 ( 2.43 ) 79.8 ( 5.13 ) G2 82.40 ( 1.78 ) 85.5 ( 2.74 ) 81.63 ( 1.80 ) 79.50 ( 2.20 ) 81.33 ( 3.56 ) 79.57 ( 3.26 ) 79.43 ( 1.59 ) 88.6 ( 2.21 ) 79.23 ( 1.99 ) G3 82.77 ( 1.68 ) 85.87 ( 2.72 ) 81.97 ( 1.70 ) 80.17 ( 2.08 ) 81.53 ( 3.55 ) 80.40 ( 3.07 ) 80.13 ( 1.51 ) 89.27 ( 2.18 ) 78.7 ( 1.81 ) G4 84.13 ( 1.55 ) 84.90 ( 2.70 ) 82.97 ( 1.57 ) 79.63 ( 1.92 ) 80.83 ( 3.51 ) 79.63 ( 2.80 ) 81.83 ( 1.41 ) 90.9 ( 2.15 ) 82.03 ( 1.62 ) G5 82.80 ( 1.18 ) 84.23 ( 2.61 ) 81.70 ( 1.18 ) 79.7 ( 1.44 ) 80.67 ( 3.38 ) 80.00 ( 2.00 ) 81.40 ( 1.12 ) 88.93 ( 2.07 ) 80.60 ( 1.15 ) (i) G1 93.77 ( 4.08 ) 94.33 ( 4.54 ) 93.13 ( 4.83 ) 85.83 ( 5.84 ) 86.10 ( 6.59 ) 86.07 ( 13.91 ) 86.13 ( 2.86 ) 89.37 ( 3.12 ) 90.43 ( 6.58 ) G2 92.70 ( 2.59 ) 94.00 ( 4.06 ) 91.53 ( 2.93 ) 86.00 ( 3.78 ) 86.33 ( 6.19 ) 85.97 ( 8.66 ) 89.07 ( 2.04 ) 95.13 ( 2.84 ) 89.67 ( 2.55 ) G3 92.83 ( 2.43 ) 93.97 ( 4.01 ) 91.67 ( 2.69 ) 86.80 ( 3.55 ) 86.80 ( 6.14 ) 86.73 ( 8.16 ) 89.5 ( 1.94 ) 95.77 ( 2.8 ) 90.20 ( 2.33 ) G4 93.20 ( 2.23 ) 93.83 ( 3.94 ) 92.00 ( 2.48 ) 85.93 ( 3.26 ) 86.37 ( 6.07 ) 85.5 ( 7.39 ) 91.43 ( 1.81 ) 96.2 ( 2.76 ) 91.60 ( 2.08 ) G5 92.17 ( 1.65 ) 93.1 ( 3.71 ) 91.00 ( 1.75 ) 86.53 ( 2.39 ) 86.57 ( 5.74 ) 86.33 ( 5.20 ) 91.30 ( 1.43 ) 95.23 ( 2.66 ) 90.23 ( 1.47 ) (i) G1 96.60 ( 6.02 ) 96.77 ( 6.72 ) 96.50 ( 10.31 ) 87.97 ( 9.34 ) 88.07 ( 10.58 ) 88.4 ( 22.55 ) 90.33 ( 3.41 ) 93.63 ( 3.71 ) 95.07 ( 7.84 ) G2 96.17 ( 3.64 ) 96.67 ( 5.67 ) 95.43 ( 5.73 ) 88.8 ( 5.86 ) 88.80 ( 9.64 ) 88.93 ( 13.23 ) 93.73 ( 2.43 ) 97.57 ( 3.38 ) 95.00 ( 3.04 ) G3 96.07 ( 3.38 ) 96.30 ( 5.53 ) 95.57 ( 5.13 ) 89.13 ( 5.48 ) 89.00 ( 9.53 ) 89.8 ( 12.69 ) 93.97 ( 2.31 ) 98.00 ( 3.34 ) 95.67 ( 2.77 ) G4 95.93 ( 3.09 ) 96.63 ( 5.41 ) 95.40 ( 4.59 ) 88.53 ( 5.03 ) 88.30 ( 9.42 ) 88.67 ( 11.26 ) 95.33 ( 2.16 ) 98.03 ( 3.28 ) 96.00 ( 2.48 ) G5 96.17 ( 2.22 ) 96.4 ( 4.95 ) 95.47 ( 2.99 ) 88.83 ( 3.63 ) 88.63 ( 8.76 ) 89.07 ( 8.17 ) 95.77 ( 1.71 ) 97.77 ( 3.16 ) 94.90 ( 1.75 )
6 Discussion
In this study, we put forward parametric bootstrap approaches to construct prediction intervals in the contexts of small area estimation under a general (non-normal) area-level model. Our simulation results show that the proposed single bootstrap method with the Fay-Herriot method of variance estimator performs well for all cases. Moreover, it is more efficient in terms of average lengths than the existing parametric bootstrap method proposed by Hall and Maiti (2006).
When the number of area is small, there is a high likelihood to obtain negative estimates of the variance when applying Prasad-Rao method. Throughout our simulation studies, we notice that the likelihood of obtaining negative estimates of variance might affect the performance of the parametric bootstrap methods in terms of both coverage probabilities and average lengths. One might consider a better variance estimator, such as the Fay-Herriot estimator we used, or a sensible truncation of negative estimates. In this study, we arbitrarily truncate the negative estimates of the variance at 0.01 as similar to Datta et al. (2005). To this end, in the future we will consider an extension of adjusted maximum likelihood estimators of such as the ones considered by Li and Lahiri (2010) or Hirose and Lahiri (2018) to the model proposed in this paper.
There is an another issue that even though the double bootstrap calibration can bring the coverage accuracy to regardless the existence of pivot, our simulations suggest that it is not always beneficial to attempt boosting the theoretical coverage probability via double bootstrap, disregarding other properties of the interval. Specifically, variability of calibrated intervals are greater than uncalibrated ones, minimum length property is almost never preserved, and the results are quite dependent on the parameters and fixed constants of the problem, such as estimation of the variance components in this study. For instance, Table 5 shows that the proposed single bootstrap intervals already produced good performance and thus the calibration using double bootstrap has little or no effect, where . When is relatively small, i.e., , double bootstrap improves the coverage probability marginally but it produces much larger interval length than the corresponding single bootstrap method.
Appendices
Appendix A Appendix
Proof of Theorem3.1: Under regularity conditions, with a pivot , we have
(A.1) |
where , and is a smooth function of order . The last equation is derived from similar results of the chapter 4 of Li (2007). See details in Appendix F. Then, from equations (3.1) and (A.1), we have
(A.2) |
Therefore, we have
(A.3) |
and
(A.4) |
(A.5) |
Appendix B Appendix
Proof of Proposition3.1: Consider is not a pivot, that is, , depends some unknown true parameters, . Here, we rewrite the distribution function of as , and Taylor expand around , obtaining,
(B.1) |
where
with lying between and .
Under the regularity conditions, we have,
(B.2) |
where denotes a smooth function of order , and
(B.3) |
with lying between and .
Appendix C Appendix
Proof of Proposition 3.2: From the third equation of (B.6) and (B.2), we have
(C.1) |
where and are unknown but fixed quantites.
In view of the above, we can write
(C.2) |
where
With the assumption of symmetrically distributed random effects , it is easy to show that . Hence, we have
Given the assumptions -, we have
under the regularity conditions. Thus, .
Appendix D Appendix
Proof of Proposition 4.1 Let , and , where is a vector of unknown parameters. We have
(D.1) |
(D.2) |
where and are smooth functions of order .
Eventually, we have
(D.3) |
Appendix E Appendix
Appendix F Appendix
Let denote the probability density function of . Let and be the first and second derivative of . Then (A.1) can be expressed as:
(F.1) |
where . Note that for , we have . Assuming that the third derivative is uniformly bounded, we have
(F.2) |
for some constant . Then, to show the last equation of (A.1), it reduces to the evaluation of , and . In particular, if , and , then it follows that
(F.3) |
by the Lyapunov inequality so that
First, note that
We now simplify the expression for :
(F.4) |
In view of the above, similar to Li (2007) and Saegusa et al. (2022), we can write
where
Therefore, in order to obtain the last equation of (F.1), we need to show that , and , . Firstly, it is easy to show that . Using the regularity condition that , we have
(F.5) |
Under the assumption , it is easily to see that
where is a column vector with the element unity and the rest zeros, . An application of the Cauchy–Schwartz inequality yields
(F.6) |
To evaluate the fourth moment of rest of the terms in , we need to evaluate the moment of and . To see this, for example, we can break down in terms of more tractable variables and remainder terms:
(F.7) |
Using the Taylor series expansion, we have
and for , we have
where lies between and .
For the last term , letting , we have
(F.8) |
where the remainder being . Then, apply Taylor expansion on :
(F.9) |
As pointed out in Chatterjee et al. (2008) and Saegusa et al. (2022), this evaluation of and involves numerous elementary calculations. In the end, both fourth moments reduce to the fourth moment of . If using the Prasad-Rao method of moment estimator , Lahiri and Rao (1995) showed that and for any satisfying with under regularity conditions.
Therefore, the above arguments support the final expression:
Appendix G Appendix
SB.FH HM.FH SB.PR HM.PR FH PR DIRECT (i) G1 79.94 ( 3.26 ) 80.30 ( 3.58 ) 84.56 ( 3.76 ) 85.44 ( 4.12 ) 79.89 ( 3.26 ) 78.65 ( 3.27 ) 79.43 ( 7.25 ) G2 80.15 ( 2.25 ) 80.31 ( 3.52 ) 84.51 ( 2.61 ) 85.40 ( 4.07 ) 80.38 ( 2.25 ) 81.56 ( 2.36 ) 79.85 ( 2.81 ) G3 79.71 ( 2.12 ) 79.53 ( 3.51 ) 83.85 ( 2.47 ) 84.58 ( 4.07 ) 79.86 ( 2.12 ) 80.88 ( 2.23 ) 79.99 ( 2.56 ) G4 80.47 ( 1.96 ) 80.05 ( 3.51 ) 84.72 ( 2.29 ) 84.72 ( 4.06 ) 80.49 ( 1.96 ) 82.06 ( 2.08 ) 80.29 ( 2.29 ) G5 79.65 ( 1.49 ) 80.48 ( 3.48 ) 84.04 ( 1.76 ) 85.73 ( 4.03 ) 79.99 ( 1.49 ) 81.8 ( 1.62 ) 80.32 ( 1.62 ) (i) G1 89.89 ( 4.31 ) 90.38 ( 4.80 ) 94.62 ( 5.73 ) 94.68 ( 6.33 ) 88.90 ( 4.18 ) 87.79 ( 4.19 ) 89.42 ( 9.30 ) G2 90.18 ( 2.91 ) 90.37 ( 4.67 ) 93.44 ( 3.96 ) 94.56 ( 6.24 ) 89.89 ( 2.89 ) 90.58 ( 3.03 ) 90.19 ( 3.60 ) G3 90.06 ( 2.74 ) 89.34 ( 4.66 ) 93.68 ( 3.74 ) 94.17 ( 6.23 ) 90.15 ( 2.72 ) 90.78 ( 2.87 ) 90.11 ( 3.29 ) G4 90.09 ( 2.53 ) 89.5 ( 4.65 ) 93.69 ( 3.46 ) 94.65 ( 6.21 ) 90.12 ( 2.52 ) 91.18 ( 2.67 ) 90.24 ( 2.94 ) G5 90.29 ( 1.92 ) 90.28 ( 4.60 ) 93.56 ( 2.65 ) 94.73 ( 6.15 ) 90.26 ( 1.92 ) 91.56 ( 2.07 ) 90.32 ( 2.08 ) (i) G1 95.13 ( 5.30 ) 95.04 ( 5.97 ) 97.45 ( 8.42 ) 97.61 ( 9.31 ) 93.57 ( 4.98 ) 92.61 ( 5.00 ) 94.77 ( 11.09 ) G2 95.12 ( 3.49 ) 95.37 ( 5.76 ) 97.15 ( 5.82 ) 97.62 ( 9.13 ) 94.87 ( 3.44 ) 95.51 ( 3.60 ) 95.38 ( 4.29 ) G3 94.97 ( 3.29 ) 94.46 ( 5.74 ) 97.22 ( 5.48 ) 97.60 ( 9.08 ) 94.7 ( 3.24 ) 95.51 ( 3.42 ) 95 ( 3.92 ) G4 95.07 ( 3.03 ) 95.03 ( 5.71 ) 96.93 ( 5.08 ) 97.68 ( 9.04 ) 95.12 ( 3.00 ) 95.86 ( 3.18 ) 95.07 ( 3.51 ) G5 95.31 ( 2.29 ) 95.01 ( 5.64 ) 96.99 ( 3.88 ) 97.52 ( 8.94 ) 95.28 ( 2.29 ) 96.02 ( 2.47 ) 95.17 ( 2.48 )
SB.FH HM.FH SB.PR HM.PR FH PR DIRECT (i) G1 84.30 ( 3.89 ) 84.53 ( 4.26 ) 82.80 ( 5.42 ) 84.1 ( 5.98 ) 76.07 ( 3.30 ) 76.50 ( 3.48 ) 79.97 ( 7.25 ) G2 83.43 ( 2.59 ) 83.77 ( 4.03 ) 81.33 ( 3.65 ) 83.77 ( 5.74 ) 79.57 ( 2.32 ) 87.10 ( 3.09 ) 79.17 ( 2.81 ) G3 84.20 ( 2.43 ) 84.53 ( 3.99 ) 82.73 ( 3.42 ) 84.57 ( 5.67 ) 80.00 ( 2.19 ) 88.30 ( 3.03 ) 80.03 ( 2.56 ) G4 83.07 ( 2.25 ) 84.50 ( 3.97 ) 81.87 ( 3.17 ) 83.67 ( 5.65 ) 79.77 ( 2.04 ) 89.33 ( 2.97 ) 80.57 ( 2.29 ) G5 82.53 ( 1.69 ) 84.87 ( 3.87 ) 80.73 ( 2.38 ) 84 ( 5.42 ) 79.17 ( 1.57 ) 87.20 ( 2.82 ) 78.13 ( 1.62 ) (i) G1 94.50 ( 5.78 ) 94.97 ( 6.43 ) 87.57 ( 10.07 ) 87.8 ( 11.18 ) 85.23 ( 4.23 ) 85.83 ( 4.47 ) 89.5 ( 9.3 ) G2 93.73 ( 3.77 ) 94.00 ( 5.88 ) 87.57 ( 6.69 ) 87.87 ( 10.47 ) 88.80 ( 2.98 ) 94.87 ( 3.96 ) 90.03 ( 3.60 ) G3 93.83 ( 3.52 ) 95.10 ( 5.80) 87.60 ( 6.26 ) 88.33 ( 10.32 ) 88.93 ( 2.81 ) 95.37 ( 3.89 ) 89.47 ( 3.29 ) G4 93.87 ( 3.23 ) 94.50 ( 5.72 ) 87.60 ( 5.81 ) 88.53 ( 10.25 ) 89.83 ( 2.61 ) 95.50 ( 3.81 ) 89.83 ( 2.94 ) G5 91.73 ( 2.39 ) 94.47 ( 5.42 ) 87.17 ( 4.28 ) 88.7 ( 9.66 ) 89.17 ( 2.02 ) 93.73 ( 3.62 ) 88.5 ( 2.08 ) (i) G1 98.10 ( 8.48 ) 97.9 ( 9.45 ) 89.27 ( 16.44 ) 89.37 ( 18.35 ) 90.4 ( 5.04 ) 91.17 ( 5.33 ) 94.40 ( 11.09 ) G2 97.73 ( 5.36 ) 97.73 ( 8.3 ) 89.33 ( 10.69 ) 89.43 ( 16.85 ) 93.8 ( 3.55 ) 98.50 ( 4.72 ) 94.77 ( 4.29 ) G3 97.50 ( 4.97 ) 97.73 ( 8.1 ) 89.33 ( 9.96 ) 89.50 ( 16.57 ) 93.67 ( 3.35 ) 98.27 ( 4.64 ) 94.53 ( 3.92 ) G4 97.23 ( 4.55 ) 97.80 ( 7.92 ) 89.67 ( 9.20 ) 89.87 ( 16.39 ) 94.2 ( 3.11 ) 97.93 ( 4.54 ) 94.90 ( 3.51 ) G5 96.5 ( 3.29 ) 97.4 ( 7.30 ) 89.93 ( 6.70 ) 90.53 ( 15.28 ) 94.57 ( 2.40 ) 97.13 ( 4.32 ) 93.40 ( 2.48 )
SB.FH HM.FH DB.FH SB.PR HM.PR DB.PR FH PR DIRECT (i) G1 80.41 ( 2.22 ) 81.98 ( 2.36 ) 80.46 ( 2.27 ) 84.82 ( 2.51 ) 87.88 ( 2.69 ) 84.59 ( 3.08 ) 83.69 ( 2.29 ) 82.36 ( 2.32 ) 79.51 ( 5.13 ) G2 79.91 ( 1.56 ) 80.96 ( 2.33 ) 80.36 ( 1.59 ) 83.76 ( 1.77 ) 87.22 ( 2.66 ) 84.04 ( 2.26 ) 80.64 ( 1.58 ) 82.97 ( 1.67 ) 80.08 ( 1.99 ) G3 79.99 ( 1.48 ) 81.47 ( 2.32 ) 80.06 ( 1.50 ) 83.42 ( 1.67 ) 87.41 ( 2.65 ) 83.80 ( 2.06 ) 80.62 ( 1.49 ) 82.90 ( 1.59 ) 79.56 ( 1.81 ) G4 80.32 ( 1.37 ) 81.26 ( 2.32 ) 80.89 ( 1.4 ) 83.69 ( 1.55 ) 87.52 ( 2.65 ) 83.83 ( 1.96 ) 80.67 ( 1.38 ) 83.41 ( 1.48 ) 80.17 ( 1.62 ) G5 80.10 ( 1.05 ) 81.69 ( 2.30 ) 80.41 ( 1.07 ) 83.22 ( 1.20 ) 87.29 ( 2.63 ) 83.32 ( 1.51 ) 80.14 ( 1.05 ) 82.80 ( 1.17 ) 80.30 ( 1.15 ) (i) G1 90.86 ( 3.03 ) 92.24 ( 3.28 ) 90.70 ( 3.16 ) 93.66 ( 3.78 ) 95.04 ( 4.14 ) 93.44 ( 6.27 ) 91.23 ( 2.94 ) 90.27 ( 2.97 ) 89.62 ( 6.58 ) G2 90.53 ( 2.05 ) 91.43 ( 3.19 ) 90.85 ( 2.13 ) 92.97 ( 2.59 ) 94.46 ( 4.09 ) 93.16 ( 4.54 ) 90.63 ( 2.02 ) 92.15 ( 2.15 ) 89.99 ( 2.55 ) G3 89.86 ( 1.93 ) 91.99 ( 3.18 ) 90.27 ( 2.00 ) 93.28 ( 2.44 ) 95.21 ( 4.08 ) 92.79 ( 4.27 ) 89.95 ( 1.91 ) 91.67 ( 2.04 ) 89.71 ( 2.33 ) G4 90.34 ( 1.79 ) 91.91 ( 3.17 ) 90.64 ( 1.85 ) 92.80 ( 2.26 ) 94.41 ( 4.07 ) 92.7 ( 4.08 ) 90.20 ( 1.77 ) 91.98 ( 1.90 ) 89.74 ( 2.08 ) G5 90.28 ( 1.36 ) 92.04 ( 3.13 ) 90.80 ( 1.41 ) 92.83 ( 1.72 ) 94.90 ( 4.03 ) 92.48 ( 3.21 ) 90.47 ( 1.35 ) 92.10 ( 1.50 ) 90.20 ( 1.47 ) (i) G1 95.90 ( 3.85 ) 96.88 ( 4.21 ) 95.72 ( 4.42 ) 96.67 ( 5.39 ) 97.25 ( 5.96 ) 96.93 ( 11.17 ) 94.28 ( 3.51 ) 93.54 ( 3.54 ) 94.54 ( 7.84 ) G2 95.51 ( 2.51 ) 96.10 ( 4.00 ) 95.72 ( 2.82 ) 96.56 ( 3.64 ) 96.98 ( 5.84 ) 96.83 ( 7.92 ) 94.98 ( 2.41 ) 95.36 ( 2.56 ) 94.81 ( 3.04 ) G3 94.99 ( 2.36 ) 96.76 ( 3.98 ) 95.51 ( 2.65 ) 96.43 ( 3.42 ) 97.28 ( 5.82 ) 96.70 ( 7.51 ) 94.85 ( 2.27 ) 95.94 ( 2.43 ) 94.68 ( 2.77 ) G4 95.34 ( 2.17 ) 96.54 ( 3.96 ) 95.64 ( 2.43 ) 96.40 ( 3.17 ) 96.99 ( 5.82 ) 96.30 ( 7.07 ) 94.98 ( 2.11 ) 95.91 ( 2.27 ) 94.84 ( 2.48 ) G5 95.51 ( 1.64 ) 96.57 ( 3.89 ) 95.76 ( 1.83 ) 96.41 ( 2.40) 97.08 ( 5.74 ) 96.5 ( 5.53 ) 95.13 ( 1.61 ) 96.10 ( 1.78 ) 95.21 ( 1.75 )
SB.FH HM.FH DB.FH SB.PR HM.PR DB.PR FH PR DIRECT (i) G1 83.90( 2.66 ) 85.60 ( 2.90 ) 84.43 ( 2.77 ) 79.27 ( 3.29 ) 80.23 ( 3.67 ) 80.27 ( 5.36 ) 76.83 ( 2.23 ) 80.90 ( 2.43 ) 79.83 ( 5.13 ) G2 83.00 ( 1.78 ) 85.90 ( 2.74 ) 82.90 ( 1.85 ) 79.73 ( 2.19 ) 80.67 ( 3.54 ) 80.20 ( 3.49 ) 79.77 ( 1.59 ) 89.60 ( 2.22 ) 79.40 ( 1.99 ) G3 83.40 ( 1.68 ) 85.47 ( 2.71 ) 83.23 ( 1.74 ) 79.67 ( 2.06 ) 80.77 ( 3.52 ) 80.10 ( 3.28 ) 80.57 ( 1.51 ) 89.90 ( 2.19 ) 79.90 ( 1.81 ) G4 83.23 ( 1.55 ) 85.70 ( 2.69 ) 82.93 ( 1.60 ) 78.90 ( 1.90 ) 80.77 ( 3.48 ) 78.97 ( 2.98 ) 81.30 ( 1.41 ) 90.30 ( 2.15 ) 79.40 ( 1.62 ) G5 83.17 ( 1.18 ) 85.00 ( 2.60 ) 82.73 ( 1.21 ) 80.40 ( 1.43 ) 81.03 ( 3.36 ) 80.50 ( 2.08 ) 83.20 ( 1.12 ) 89.50 ( 2.08 ) 80.77 ( 1.15 ) (i) G1 93.30 ( 4.08 ) 93.93 ( 4.54 ) 93.40 ( 5.54 ) 85.07 ( 5.83 ) 85.27 ( 6.58 ) 85.53 ( 15.6 ) 85.53 ( 2.87 ) 89.27 ( 3.12 ) 90.10 ( 6.58 ) G2 92.87 ( 2.59 ) 93.83 ( 4.06 ) 92.63 ( 3.31 ) 85.77 ( 3.75 ) 86.07 ( 6.14 ) 86.43 ( 9.76 ) 89.83 ( 2.04 ) 95.43 ( 2.84 ) 90.33 ( 2.55 ) G3 92.87 ( 2.42 ) 93.50 ( 3.99 ) 92.60 ( 3.04 ) 85.33 ( 3.53 ) 85.67 ( 6.12 ) 85.90 ( 9.26 ) 89.83 ( 1.94 ) 95.83 ( 2.81 ) 89.67 ( 2.33 ) G4 92.77 ( 2.22 ) 94.53 ( 3.92 ) 92.87 ( 2.7 ) 85.70 ( 3.23 ) 86.37 ( 6.02 ) 86.07 ( 8.26 ) 90.43 ( 1.81 ) 95.57 ( 2.76 ) 89.53 ( 2.08 ) G5 92.23 ( 1.65 ) 93.40 ( 3.71 ) 92.00 ( 1.88 ) 86.87 ( 2.38 ) 87.03 ( 5.73 ) 87.13 ( 5.79 ) 91.47 ( 1.43 ) 95.30 ( 2.67 ) 90.6 ( 1.47 ) (i) G1 96.33 ( 5.99 ) 96.57 ( 6.68 ) 96.70 ( 12.50 ) 87.33 ( 9.30 ) 87.43 ( 10.53 ) 88.53 ( 24.21 ) 89.53 ( 3.41 ) 92.73 ( 3.71 ) 94.93 ( 7.84 ) G2 96.17 ( 3.64 ) 96.6 ( 5.65 ) 96.10 ( 7.03 ) 87.97 ( 5.82 ) 87.90 ( 9.59 ) 89.23 ( 14.30 ) 93.93 ( 2.44 ) 97.77 ( 3.39 ) 95.33 ( 3.04 ) G3 95.93 ( 3.38 ) 96.13 ( 5.53 ) 96.30 ( 6.24 ) 87.70 ( 5.46 ) 87.70 ( 9.51 ) 88.87 ( 13.53 ) 94.07 ( 2.31 ) 97.9 ( 3.34 ) 94.97 ( 2.77 ) G4 96.47 ( 3.08 ) 97.07 ( 5.39 ) 96.53 ( 5.67 ) 87.93 ( 5.01 ) 88.2 ( 9.39 ) 89.40 ( 12.21 ) 95.03 ( 2.16 ) 98.13 ( 3.29 ) 94.63 ( 2.48 ) G5 96.07 ( 2.22 ) 96.50 ( 4.95 ) 96.23 ( 3.69 ) 89.17 ( 3.60 ) 88.90 ( 8.72 ) 90.27 ( 8.69 ) 96.00 ( 1.71 ) 98.00 ( 3.19 ) 95.07 ( 1.75 )
References
- Bell and Huang (2006) Bell, W. R. and E. T. Huang (2006). Using the t-distribution to deal with outliers in small area estimation. In Proceedings of statistics Canada symposium.
- Chatterjee et al. (2008) Chatterjee, S., P. Lahiri, and H. Li (2008). Parametric bootstrap approximation to the distribution of EBLUP and related prediction intervals in linear mixed models. The Annals of Statistics 36(3), 1221 – 1245.
- Cox (1975) Cox, D. (1975). Prediction intervals and empirical bayes confidence intervals. Journal of Applied Probability 12(S1), 47–55.
- Datta and Lahiri (1995) Datta, G. S. and P. Lahiri (1995). Robust hierarchical bayes estimation of small area characteristics in the presence of covariates and outliers. Journal of Multivariate Analysis 54(2), 310–328.
- Datta et al. (2005) Datta, G. S., J. Rao, and D. D. Smith (2005). On measuring the variability of small area estimators under a basic area level model. Biometrika 92(1), 183–196.
- Fabrizi and Trivisano (2010) Fabrizi, E. and C. Trivisano (2010). Robust linear mixed models for small area estimation. Journal of Statistical Planning and Inference 140(2), 433–443.
- Fay and Herriot (1979) Fay, R. E. and R. A. Herriot (1979). Estimates of income for small places: an application of james-stein procedures to census data. Journal of the American Statistical Association 74(366a), 269–277.
- Ha et al. (2014) Ha, N. S., P. Lahiri, and V. Parsons (2014). Methods and results for small area estimation using smoking data from the 2008 national health interview survey. Statistics in Medicine 33(22), 3932–3945.
- Hall (2013) Hall, P. (2013). The bootstrap and Edgeworth expansion. Springer Science & Business Media.
- Hall and Maiti (2006) Hall, P. and T. Maiti (2006). On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society Series B: Statistical Methodology 68(2), 221–238.
- Hawala and Lahiri (2018) Hawala, S. and P. Lahiri (2018). Variance modeling for domains. Statistics and Applications 16(1), 399–409.
- Hirose and Lahiri (2018) Hirose, M. Y. and P. Lahiri (2018). Estimating variance of random effects to solve multiple problems simultaneously. The Annals of Statistics 46(4), 1721–1741.
- Jiang and Nguyen (2007) Jiang, J. and T. Nguyen (2007). Linear and generalized linear mixed models and their applications, Volume 1. Springer.
- Lahiri and Rao (1995) Lahiri, P. and J. Rao (1995). Robust estimation of mean squared error of small area estimators. Journal of the American Statistical Association 90(430), 758–766.
- Li (2007) Li, H. (2007). Small area estimation: An empirical best linear unbiased prediction approach. University of Maryland, College Park.
- Li and Lahiri (2010) Li, H. and P. Lahiri (2010). An adjusted maximum likelihood method for solving small area estimation problems. Journal of multivariate analysis 101(4), 882–892.
- McCullough and Vinod (1998) McCullough, B. and H. Vinod (1998). Implementing the double bootstrap. Computational Economics 12, 79–95.
- Otto and Bell (1995) Otto, M. C. and W. R. Bell (1995). Sampling error modelling of poverty and income statistics for states. In American Statistical Association, Proceedings of the Section on Government Statistics, pp. 160–165.
- Prasad and Rao (1990) Prasad, N. N. and J. N. Rao (1990). The estimation of the mean squared error of small-area estimators. Journal of the American statistical association 85(409), 163–171.
- Rao and Molina (2015) Rao, J. N. and I. Molina (2015). Small area estimation. John Wiley & Sons.
- Saegusa et al. (2022) Saegusa, T., S. Sugasawa, and P. Lahiri (2022). Parametric bootstrap confidence intervals for the multivariate fay–herriot model. Journal of Survey Statistics and Methodology 10(1), 115–130.
- Shao (2008) Shao, J. (2008). Mathematical statistics. Springer Science & Business Media.
- Shi (1992) Shi, S. G. (1992). Accurate and efficient double-bootstrap confidence limit method. Computational statistics & data analysis 13(1), 21–32.
- Xie et al. (2007) Xie, D., T. E. Raghunathan, and J. M. Lepkowski (2007). Estimation of the proportion of overweight individuals in small areas—a robust extension of the fay–herriot model. Statistics in Medicine 26(13), 2699–2715.
- Yoshimori and Lahiri (2014) Yoshimori, M. and P. Lahiri (2014). A second-order efficient empirical Bayes confidence interval. The Annals of Statistics 42(4), 1233 – 1261.