
Impact of existence and nonexistence of pivot on the coverage of empirical best linear prediction intervals for small areas

Yuting Chen¹, Masayo Y. Hirose², and Partha Lahiri¹

¹Joint Program in Survey Methodology, University of Maryland, College Park, United States
²Institute of Mathematics for Industry, Kyushu University, Japan
Abstract

We advance the theory of the parametric bootstrap in constructing highly efficient empirical best (EB) prediction intervals of small area means. The coverage error of such a prediction interval is of the order $O(m^{-3/2})$, where $m$ is the number of small areas to be pooled using a linear mixed normal model. In the context of an area-level model where the random effects follow a non-normal known distribution, except possibly for unknown hyperparameters, we analytically show that the order of coverage error of the empirical best linear (EBL) prediction interval remains the same even if we relax the normality of the random effects, provided a pivot exists for suitably standardized random effects when the hyperparameters are known. Recognizing the challenge of establishing the existence of a pivot, we develop a simple moment-based method for establishing the non-existence of a pivot. We show that the existing parametric bootstrap EBL prediction interval fails to achieve the desired order of coverage error, i.e., $O(m^{-3/2})$, in the absence of a pivot. We obtain the surprising result that the order $O(m^{-1})$ term is always positive under certain conditions, indicating possible overcoverage of the existing parametric bootstrap EBL prediction interval. In general, we analytically show for the first time that the coverage problem can be corrected by adopting a suitably devised double parametric bootstrap. Our Monte Carlo simulations show that our proposed single bootstrap method performs reasonably well when compared to rival methods.

Keywords: Small area estimation, empirical Bayes, linear mixed model, best linear predictor

1 Introduction

The following two-level model, commonly referred to as the area-level model, has been extensively used in small area applications.

The area-level model. For $i=1,\cdots,m$,

Level 1 (Sampling model): $y_i\,|\,\theta_i\overset{\mathrm{ind}}{\sim}\mathcal{N}(\theta_i,D_i)$;

Level 2 (Linking model): $\theta_i\overset{\mathrm{ind}}{\sim}\mathcal{G}(\bm{x}_i^{\prime}\bm{\beta},A,\bm{\phi})$.

Here, $m$ represents the number of small areas, and $\mathcal{G}$ is a fully parametric distribution, not necessarily normal, with mean $\text{E}(\theta_i)=\bm{x}_i^{\prime}\bm{\beta}$, variance $\text{Var}(\theta_i)=A\geq 0$, and any additional parameters $\bm{\phi}$. Assuming the same normal distribution at level 1, Datta and Lahiri (1995) assumed that $\mathcal{G}$ is a scale mixture of normal distributions. Later, Bell and Huang (2006) and Xie et al. (2007) applied a $t$ distribution, a specific case of the scale mixture of normals, at level 2, with mean $\bm{x}_i^{\prime}\bm{\beta}$, variance $A$ and degrees of freedom $\phi$, to mitigate the influence of outliers. Fabrizi and Trivisano (2010) introduced two robust area-level models: the first assumes that $\theta_i$ follows an exponential power distribution with mean $\bm{x}_i^{\prime}\bm{\beta}$, variance $A$ and shape parameter $\phi$; the second assumes a skewed exponential power distribution at level 2, with mean $\bm{x}_i^{\prime}\bm{\beta}$, variance $A$, shape parameter $\psi$ and skewness parameter $\lambda$. Thus, in this case, the parameter vector $\bm{\phi}$ can be defined as $(\psi,\lambda)^{\prime}$.

In the above model, level 1 accounts for the sampling distribution of the direct estimates $y_i$, which are weighted averages of observations from small area $i$. Level 2 links the true small area means $\theta_i$ to a vector of $p$ known auxiliary variables $\bm{x}_i=(x_{i1},\cdots,x_{ip})^{\prime}$, often obtained from various administrative records. The parameters $\bm{\beta}$ and $A$ of the linking model are generally unknown and are estimated from the available data. As in other papers on the area-level model, the sampling variances $D_i$ are assumed to be known. In practice, the $D_i$'s are estimated using a smoothing technique such as the ones given in Fay and Herriot (1979), Otto and Bell (1995), Ha et al. (2014) and Hawala and Lahiri (2018).

The two-level model can be rewritten as the following simple linear mixed model

$$y_i=\theta_i+e_i=\bm{x}_i^{\prime}\bm{\beta}+u_i+e_i,\quad i=1,\cdots,m, \qquad (1.1)$$

where the random effect $u_i\overset{\mathrm{iid}}{\sim}\mathcal{G}(0,A,\bm{\phi})$ and the sampling error $e_i\overset{\mathrm{ind}}{\sim}\mathcal{N}(0,D_i)$ are independent. When $\mathcal{G}$ differs from the normal distribution, the best predictor (BP) of $\theta_i$, $\tilde{\theta}_i^{\rm BP}=\text{E}(\theta_i|y_i)$, may not have a closed form. In contrast, the best linear predictor (BLP) of $\theta_i$ always has the explicit form

$$\tilde{\theta}_i^{\rm BLP}=(1-B_i)y_i+B_i\bm{x}_i^{\prime}\bm{\beta}, \qquad (1.2)$$

where $B_i=D_i/(A+D_i)$, and the BLP minimizes the mean squared prediction error (MSPE) among all linear predictors of $\theta_i$. The variance of the prediction error $(\theta_i-\tilde{\theta}_i^{\rm BLP})$ is denoted by $g_{1i}(A)\coloneqq\text{Var}(\theta_i-\tilde{\theta}_i^{\rm BLP})=AD_i/(A+D_i)$. If $A$ is known, one can obtain the standard weighted least squares estimator of $\bm{\beta}$, denoted by $\tilde{\bm{\beta}}(A)$. Replacing $\bm{\beta}$ in (1.2) by $\tilde{\bm{\beta}}=\tilde{\bm{\beta}}(A)$, a best linear unbiased prediction (BLUP) estimator of $\theta_i$ is given by

$$\tilde{\theta}_i^{\rm BLUP}=(1-B_i)y_i+B_i\bm{x}_i^{\prime}\tilde{\bm{\beta}}, \qquad (1.3)$$

which does not require the normality assumptions typically used in small area estimation models. In practice, it is common that both $\bm{\beta}$ and $A$ are unknown and need to be estimated from the data. After plugging in their estimators, an empirical best linear unbiased predictor (EBLUP) of $\theta_i$ is given by:

$$\hat{\theta}_i^{\rm EBLUP}=(1-\hat{B}_i)y_i+\hat{B}_i\bm{x}_i^{\prime}\hat{\bm{\beta}}, \qquad (1.4)$$

where $\hat{B}_i=D_i/(\hat{A}+D_i)$, $\hat{\bm{\beta}}=\tilde{\bm{\beta}}(\hat{A})$, and $\hat{A}$ is a consistent estimator of $A$ for large $m$. After plugging in the estimator $\hat{A}$, we have $g_{1i}(\hat{A})=\hat{A}D_i/(\hat{A}+D_i)$. To simplify the notation, throughout the remainder of this paper we will use $g_{1i}$ and $\hat{g}_{1i}$ to denote $g_{1i}(A)$ and $g_{1i}(\hat{A})$, respectively. Point prediction using the EBLUP and the associated mean squared prediction error (MSPE) estimation have been studied extensively. See Rao and Molina (2015) and Jiang and Nguyen (2007) for a detailed discussion of this subject.
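To make (1.3)-(1.4) concrete, here is a minimal sketch (our own illustrative Python, not code from the paper) that computes the EBLUP and $\hat{g}_{1i}$ for a given variance estimate $\hat{A}$:

```python
import numpy as np

def eblup(y, X, D, A_hat):
    """EBLUP (1.4): y and D are length-m arrays, X is an (m, p) design
    matrix, and A_hat is a consistent estimator of the model variance A."""
    w = 1.0 / (A_hat + D)                            # diagonal of Sigma^{-1}
    beta_hat = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    B_hat = D / (A_hat + D)                          # estimated shrinkage weights
    theta_hat = (1.0 - B_hat) * y + B_hat * (X @ beta_hat)
    g1_hat = A_hat * D / (A_hat + D)                 # g_{1i}(A_hat)
    return theta_hat, g1_hat, beta_hat
```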

In this paper, we consider prediction interval estimation for the small area means $\theta_i$. A prediction interval of $\theta_i$, denoted by $I_i$, is called a $100(1-\alpha)\%$ interval for $\theta_i$ if $\mathrm{P}(\theta_i\in I_i|\bm{\beta},A,\bm{\phi})=1-\alpha$ for any fixed $\bm{\beta}\in\mathcal{R}^p$, $A\in\mathcal{R}^+$ and $\bm{\phi}\in\Theta_{\bm{\phi}}$, the parameter space of $\bm{\phi}$. The probability $\mathrm{P}$ is with respect to the area-level model. There are several options for constructing interval estimates for $\theta_i$. The prediction interval based only on the level 1 model for the observed data is given by $y_i\pm z_{\alpha/2}\sqrt{D_i}$, where $z_{\alpha/2}$ is the $100(1-\alpha/2)$th standard normal percentile. While the coverage probability of this direct interval is $1-\alpha$, it is not efficient when $D_i$ is large, as is the case in small area estimation. Hall and Maiti (2006) considered an interval based on the regression synthetic estimator: $(\bm{x}_i^{\prime}\hat{\bm{\beta}}+b_{\hat{\alpha}_l^s}\sqrt{\hat{A}},\;\bm{x}_i^{\prime}\hat{\bm{\beta}}+b_{\hat{\alpha}_u^s}\sqrt{\hat{A}})$, where $b_{\hat{\alpha}_l^s}$ and $b_{\hat{\alpha}_u^s}$ are obtained using a parametric bootstrap method (described in detail in a later section). However, this interval is synthetic in the sense that it is constructed from the synthetic regression estimator of $\theta_i$ and its associated uncertainty measure, which are not area-specific in the outcome variable and hence may result in wider confidence intervals.

It is important to combine both levels of the model in the interval estimation. An effective approach is to use empirical best methodology. We call an interval an empirical best (EB) prediction interval if it is based on the empirical best predictor of $\theta_i$. For the special case of the area-level model in which $\theta_i$ follows a normal distribution, Cox (1975) initiated the exact empirical Bayes interval: $\hat{\theta}_i^{\rm EB}\pm z_{\alpha/2}\sqrt{\hat{g}_{1i}}$, where $\hat{\theta}_i^{\rm EB}$ is the empirical Bayes estimator of $\theta_i$. Although the Cox interval is always shorter than the direct interval, its coverage error is of the order $O(m^{-1})$, not accurate enough for most small area applications. Yoshimori and Lahiri (2014) improved the Cox-type empirical Bayes interval by using a carefully devised adjusted residual maximum likelihood (ARML) estimator of $A$. Their interval has a coverage error of order $O(m^{-3/2})$. Additionally, they analytically showed that their interval is always shorter than the corresponding direct interval. However, the properties of both the ARML estimator of $A$ and the associated interval have not yet been explored for cases involving non-normally distributed random effects.

A function of the data and parameters is called a pivot if its distribution does not depend on any unknown quantity (Shao, 2008; Hall, 2013). Under the linear mixed normal model (1.1), $(\theta_i-\tilde{\theta}_i^{\rm BLP})/\sqrt{g_{1i}}$ is a standard normal pivot. The traditional method of interval estimation for $\theta_i$ is of the form $\text{EBLUP}\pm z_{\alpha/2}\sqrt{\text{mspe}}$, where mspe is an estimate of the true mean squared prediction error (MSPE) of the EBLUP. Unfortunately, $(\theta_i-\hat{\theta}_i^{\rm EBLUP})/\sqrt{\hat{g}_{1i}}$ is not a pivot, and this traditional approach produces intervals that are too short or too long. The coverage error of such an interval is of the order $O(m^{-1})$, not accurate enough for most small area applications. Recognizing that $(\theta_i-\hat{\theta}_i^{\rm EBLUP})/\sqrt{\hat{g}_{1i}}$ does not follow a standard normal distribution, Chatterjee et al. (2008) and Li and Lahiri (2010) developed a parametric bootstrap method to approximate its distribution and obtained an EB prediction interval for $\theta_i$ in linear mixed normal models. They showed that such an interval has a coverage error of the order $O(m^{-3/2})$. However, this property remains unknown for non-normally distributed random effects.

In this paper, one main aim is to bring out the virtues of pivoting, or rescaling, which can decrease the dependence of our statistics on unknown parameters and yield improved prediction intervals for small areas under the general model (1.1). Analogous to Chatterjee et al. (2008), we propose parametric bootstrap methods to approximate the distribution of a suitably centered and scaled EBLUP under the general model (1.1), and apply it to construct a $100(1-\alpha)\%$ prediction interval for $\theta_i$. Here, we define an interval based on the EBLUP of $\theta_i$ as an empirical best linear (EBL) prediction interval. Specifically, we introduce two key quantities: one is the centered and scaled EBLUP, which follows the $F_{1i}$ distribution and can be expressed as

$$H_i(\hat{\bm{\beta}},\hat{A})\coloneqq(\theta_i-\hat{\theta}_i^{\rm EBLUP})/\sqrt{\hat{g}_{1i}}\sim F_{1i},$$

and the other is based on the BLP, which has the $F_{2i}$ distribution and can be expressed as

$$H_i(\bm{\beta},A)\coloneqq(\theta_i-\tilde{\theta}_i^{\rm BLP})/\sqrt{g_{1i}}\sim F_{2i}.$$

If $F_{2i}$ does not depend on any unknown parameters, $H_i(\bm{\beta},A)$ can be referred to as a pivot; otherwise, $H_i(\bm{\beta},A)$ is not a pivot and we rewrite $F_{2i}$ as $F_{2i}(\bm{\nu})$, where $\bm{\nu}=(\nu_1,\cdots,\nu_s)^{\prime}$ is the unknown parameter vector determining the distribution of $H_i(\bm{\beta},A)$. We define $q_\alpha$ as the $\alpha$-level quantile of the distribution $F_{2i}(\bm{\nu})$ if $\mathrm{P}(H_i(\bm{\beta},A)\leq q_\alpha\,|\,\bm{\nu})=\alpha$ for any fixed $\bm{\nu}$.
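For example, under the normal linking model $\mathcal{G}=\mathcal{N}(\bm{x}_i^{\prime}\bm{\beta},A)$ noted above, $H_i(\bm{\beta},A)\sim\mathcal{N}(0,1)$ for every $(\bm{\beta},A)$, so $F_{2i}=\Phi$ involves no unknown parameters, $H_i(\bm{\beta},A)$ is a pivot, and $q_\alpha=z_\alpha$, the $100\alpha$th standard normal percentile.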

In Section 2, we introduce a list of notation and regularity conditions used throughout the paper. In Section 3, we propose a single parametric bootstrap EBL prediction interval for small area means in the context of an area-level model, where the random effects follow a general (non-normal) known distribution, except possibly for unknown hyperparameters. We analytically demonstrate that the coverage error order of the EBL prediction interval remains the same as in Chatterjee et al. (2008), even when relaxing the normality of the random effects, provided a pivot exists for suitably standardized random effects when the hyperparameters are known. However, when a pivot does not exist, the EBL prediction interval fails to achieve the desired coverage error order, i.e., $O(m^{-3/2})$. Surprisingly, we find that the order $O(m^{-1})$ term is always positive under certain conditions, indicating potential overcoverage of the current single parametric bootstrap EBL prediction interval. Recognizing the challenge of establishing the existence of a pivot, we develop a simple moment-based method for establishing the non-existence of a pivot. In Section 4, we propose a double parametric bootstrap method under a general area-level model and, for the first time, analytically show that this approach can correct the coverage problem. In Section 5, we compare our proposed EBL prediction interval methods with the direct method, other traditional approaches, and the parametric bootstrap prediction interval proposed by Hall and Maiti (2006).

2 A list of notations and regularity conditions

We use the following notations throughout the paper:

$\bm{Y}=(y_1,\cdots,y_m)^{\prime}$, an $m\times 1$ column vector of direct estimates;

$\bm{X}^{\prime}=(\bm{x}_1,\cdots,\bm{x}_m)$, a $p\times m$ known matrix of rank $p$;

$\bm{\Sigma}={\rm diag}(A+D_1,\cdots,A+D_m)$, an $m\times m$ diagonal matrix;

$\tilde{\bm{\beta}}=(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{Y}$, the weighted least squares estimator of $\bm{\beta}$ with known $A$;

$\bm{P}=\bm{\Sigma}^{-1}-\bm{\Sigma}^{-1}\bm{X}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\bm{\Sigma}^{-1}$;

$\partial F_{2i}(x;\tilde{\bm{\nu}})/\partial\bm{\nu}=\partial F_{2i}(x;\bm{\nu})/\partial\bm{\nu}\big|_{\bm{\nu}=\tilde{\bm{\nu}}}$, the derivative with respect to $\bm{\nu}$ evaluated at $\tilde{\bm{\nu}}$.

We assume the following regularity conditions for proving the various results presented in this paper:

  • R1:

The rank of $\bm{X}$, $\text{rank}(\bm{X})=p$, is bounded for large $m$;

  • R2:

$0<\inf_{i\geq 1}D_i\leq\sup_{i\geq 1}D_i<\infty$, $A\in(0,\infty)$, and the true $\bm{\phi}\in\Theta_{\bm{\phi}}^{0}$, the interior of $\Theta_{\bm{\phi}}$;

  • R3:

$\sup_{i\geq 1}h_{ii}\equiv\bm{x}_i^{\prime}(\bm{X}^{\prime}\bm{X})^{-1}\bm{x}_i=O(m^{-1})$;

  • R4:

$\text{E}|u_i|^{8+\delta}<\infty$ for some $0<\delta<1$;

  • R5:

The distribution function $F_{2i}(\cdot)$ is three times continuously differentiable with respect to its argument, and its third derivative is uniformly bounded. When $H_i(\bm{\beta},A)$ is not a pivot, $F_{2i}(\cdot;\bm{\nu})$ is three times continuously differentiable with respect to the parameter vector $\bm{\nu}$, and its third derivative is uniformly bounded;

  • R6:

When $H_i(\bm{\beta},A)$ is not a pivot, we assume the estimator $\hat{\bm{\nu}}$ satisfies:

$$\begin{split}\operatorname{E}\left[(\hat{\bm{\nu}}-\bm{\nu})^{\prime}\,\partial F_{2i}(x;\bm{\nu})/\partial\bm{\nu}\right]&=O(m^{-1}),\\ \operatorname{E}\left[(\hat{\bm{\nu}}-\bm{\nu})^{\prime}\left\{\partial^{2}F_{2i}(x;\bm{\nu})/\partial\bm{\nu}\partial\bm{\nu}^{\prime}\right\}(\hat{\bm{\nu}}-\bm{\nu})\right]&=O(m^{-1}),\\ \operatorname{E}\lVert\hat{\bm{\nu}}-\bm{\nu}\rVert^{3}=\sum_{i,j,k}\operatorname{E}\left[(\hat{\nu}_{i}-\nu_{i})(\hat{\nu}_{j}-\nu_{j})(\hat{\nu}_{k}-\nu_{k})\right]&=o(m^{-1}),\end{split}$$

where the expectation is taken at the true $\bm{\nu}$. For the special case $\bm{\nu}=A$, using the method of moments of Prasad and Rao (1990) to estimate $A$, Lahiri and Rao (1995) showed that $\text{E}(\hat{A}_{\rm PR}-A)=o(m^{-1})$. Moreover, they provided Lemma C.1, which states that under the regularity conditions R1-R5, $\text{E}|\hat{A}_{\rm PR}-A|^{2q}=O(m^{-q})$ for any $q$ satisfying $0<q\leq 2+\delta^{\prime}$ with $0<\delta^{\prime}<\frac{1}{4}\delta$.
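For reference, here is a minimal sketch of the Prasad-Rao method-of-moments estimator mentioned above (our own illustrative Python; it returns the untruncated value so that negative estimates, as tabulated in Section 5, can be recorded):

```python
import numpy as np

def prasad_rao_A(y, X, D):
    """Prasad-Rao (1990) moment estimator of A based on OLS residuals:
    A_hat = [sum(r_i^2) - sum(D_i * (1 - h_ii))] / (m - p)."""
    m, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix; h_ii = diag(H)
    r = y - H @ y                             # OLS residuals
    return (np.sum(r**2) - np.sum(D * (1.0 - np.diag(H)))) / (m - p)
```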

3 Single parametric bootstrap

For the remainder of this paper, without further explicit mention, we will use $\tilde{\theta}_i$ to represent the BLP of $\theta_i$ and $\hat{\theta}_i$ to denote the EBLUP of $\theta_i$. The single parametric bootstrap method has been widely studied for its simplicity in obtaining prediction intervals directly from the bootstrap histogram. Ideally, a prediction interval of $\theta_i$ can be constructed based on the distribution of $(\theta_i-\hat{\theta}_i)/\sqrt{\hat{g}_{1i}}$, although this distribution function $F_{1i}$ is complex and difficult to approximate analytically. In this paper, we first follow the procedure introduced by Chatterjee et al. (2008) and Li and Lahiri (2010) to provide a bootstrap approximation of $F_{1i}$ by using a parametric bootstrap method. The implementation is straightforward, following these steps (a code sketch is given after the algorithm):

  • 1.

Conditionally on the data $(\bm{X},\bm{Y})$, draw $\theta_i^*$ for $i=1,\cdots,m$, independently from the distribution $\mathcal{G}(\bm{x}_i^{\prime}\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})$;

  • 2.

Given $\theta_i^*$, draw $y_i^*$ from the distribution $\mathcal{N}(\theta_i^*,D_i)$;

  • 3.

Construct the bootstrap estimators $\hat{\bm{\beta}}^*$, $\hat{A}^*$ using the data $(\bm{X},\bm{Y}^*)$, and then obtain $\hat{\theta}_i^*=(1-\hat{B}_i^*)y_i^*+\hat{B}_i^*\bm{x}_i^{\prime}\hat{\bm{\beta}}^*$ and $\hat{g}_{1i}^*=\hat{A}^*D_i/(\hat{A}^*+D_i)$, where $\hat{B}_i^*=D_i/(\hat{A}^*+D_i)$;

  • 4.

Calibrate on $\alpha$ using the bootstrap method. Let $(q_{\alpha_l},q_{\alpha_u})=(q_{\hat{\alpha}_l^s},q_{\hat{\alpha}_u^s})$ be the solution of the following equations:

$$\begin{split}\mathrm{P}^{*}(H_i(\hat{\bm{\beta}}^*,\hat{A}^*)\leq q_{\alpha_u}\,|\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})&=1-\alpha/2,\\ \mathrm{P}^{*}(H_i(\hat{\bm{\beta}}^*,\hat{A}^*)\leq q_{\alpha_l}\,|\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})&=\alpha/2,\end{split} \qquad (3.1)$$

where $H_i(\hat{\bm{\beta}}^*,\hat{A}^*)=(\theta_i^*-\hat{\theta}_i^*)/\sqrt{\hat{g}_{1i}^*}$;

  • 5.

    The single bootstrap calibrated prediction interval is constructed by:

$$\hat{I}_i^s=\left(\hat{\theta}_i+q_{\hat{\alpha}_l^s}\sqrt{\hat{g}_{1i}}\,,\;\hat{\theta}_i+q_{\hat{\alpha}_u^s}\sqrt{\hat{g}_{1i}}\right).$$
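The following minimal sketch implements steps 1-5 (our own illustrative Python). The helpers `estimate_A` (e.g., a Fay-Herriot or Prasad-Rao estimator) and `draw_G` (a sampler from the fitted level-2 distribution $\mathcal{G}$) are assumed to be supplied by the user; the names are ours, not from the paper.

```python
import numpy as np

def fit_eblup(y, X, D, estimate_A):
    """Refit the area-level model; returns (theta_hat, g1_hat, beta_hat, A_hat)."""
    A = estimate_A(y, X, D)
    w = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    B = D / (A + D)
    return (1.0 - B) * y + B * (X @ beta), A * D / (A + D), beta, A

def single_bootstrap_interval(y, X, D, estimate_A, draw_G, alpha=0.1, R=400):
    """Single parametric bootstrap EBL prediction intervals for all areas."""
    m = len(y)
    theta_hat, g1_hat, beta_hat, A_hat = fit_eblup(y, X, D, estimate_A)
    H = np.empty((R, m))
    for r in range(R):
        theta_b = draw_G(X @ beta_hat, A_hat)                    # step 1: theta_i*
        y_b = theta_b + np.sqrt(D) * np.random.randn(m)          # step 2: y_i*
        th_hat_b, g1_b, _, _ = fit_eblup(y_b, X, D, estimate_A)  # step 3: refit
        H[r] = (theta_b - th_hat_b) / np.sqrt(g1_b)
    q_l = np.quantile(H, alpha / 2.0, axis=0)                    # step 4: calibrate
    q_u = np.quantile(H, 1.0 - alpha / 2.0, axis=0)
    return (theta_hat + q_l * np.sqrt(g1_hat),                   # step 5: interval
            theta_hat + q_u * np.sqrt(g1_hat))
```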

One of our main results for the algorithm above is that if we relax the normality of the random effects but a pivot $H_i(\bm{\beta},A)$ exists, the prediction interval $\hat{I}_i^s$ obtained above still has a high degree of coverage accuracy. That is, it brings the coverage error down to $O(m^{-3/2})$.

Theorem 3.1.

Under regularity conditions, for a preassigned $\alpha\in(0,1)$ and arbitrary $i=1,\cdots,m$, when $H_i(\bm{\beta},A)$ is a pivot, we have

$$\mathrm{P}(\theta_i\in\hat{I}_i^s)=\mathrm{P}\left(\hat{\theta}_i+q_{\hat{\alpha}_l^s}\sqrt{\hat{g}_{1i}}\leq\theta_i\leq\hat{\theta}_i+q_{\hat{\alpha}_u^s}\sqrt{\hat{g}_{1i}}\right)=1-\alpha+O(m^{-3/2}), \qquad (3.2)$$

where $q_{\hat{\alpha}_l^s}$ and $q_{\hat{\alpha}_u^s}$ are determined via the single parametric bootstrap procedure described above.

The proof of Theorem 3.1 is given in Appendix A. An example of Theorem 3.1 is that when $\mathcal{G}(\bm{x}_i^{\prime}\bm{\beta},A,\bm{\phi})$ is a normal distribution, $\mathcal{N}(\bm{x}_i^{\prime}\bm{\beta},A)$, using Theorem 3.2 of Chatterjee et al. (2008), we have

$$\mathrm{P}(\theta_i\in\hat{I}_i^s)=1-\alpha+O(m^{-3/2}). \qquad (3.3)$$

Proposition 3.1.

When $H_i(\bm{\beta},A)$ is not a pivot, we have

$$\mathrm{P}(\theta_i\in\hat{I}_i^s)=1-\alpha+O(m^{-1}), \qquad (3.4)$$

where $\hat{I}_i^s$ is obtained from the single parametric bootstrap procedure described above.

The proof is given in Appendix B.

Proposition 3.2.

Suppose that:

  • (i)

The random effects $u_i$ are symmetrically distributed;

  • (ii)

$\partial F_{2i}(x;\bm{\nu})/\partial\nu_j>0$ for $1\leq j\leq s$, and $\lambda_{\rm max}(\partial^2F_{2i}(x;\bm{\nu})/\partial\bm{\nu}\partial\bm{\nu}^{\prime})<0$, where $\lambda_{\rm max}$ denotes the largest eigenvalue.
    This condition is satisfied by some continuous distributions, for instance when $u_i$ follows a logistic or $t$ distribution with known degrees of freedom. In those cases the only unknown parameter of $F_{2i}(x;\bm{\nu})$ is the variance $A$. Since Remark 1 indicates that the kurtosis of $H_i(\bm{\beta},A)$ is a decreasing function of $A$, it is not difficult to show that $\partial F_{2i}(x;A)/\partial A>0$ and $\partial^2F_{2i}(x;A)/\partial A^2<0$;

  • (iii)

The estimators of the unknown parameters, $\hat{\bm{\nu}}=(\hat{\nu}_1,\cdots,\hat{\nu}_s)^{\prime}$, are either second-order unbiased or negatively biased; that is, $\text{E}(\hat{\nu}_j-\nu_j)=b_j+o(m^{-1})$ with $b_j\leq 0$, for $j=1,\cdots,s$.

Under these conditions, the prediction interval in (3.4) has an overcoverage property. More specifically, we can rewrite (3.4) as below:

$$\mathrm{P}(\theta_i\in\hat{I}_i^s)=1-\alpha+T_1+o(m^{-1}), \qquad (3.5)$$

where $T_1$ is of the order $O(m^{-1})$ with $T_1>0$.

See the detailed proof in Appendix C. The proposition indicates that, under the regularity conditions, the prediction intervals constructed by the proposed single parametric bootstrap can produce coverage higher than the nominal level with a non-pivot $H_i(\bm{\beta},A)$, up to the order $O(m^{-1})$, which could be beneficial for practitioners when other properties of prediction intervals are not a concern.

Remark 1.

When $\mathcal{G}(\bm{x}_i^{\prime}\bm{\beta},A,\bm{\phi})$ is not a normal distribution, it is challenging to obtain the explicit form of the distribution of $H_i(\bm{\beta},A)$. Consequently, it is difficult to verify whether $H_i(\bm{\beta},A)$ is a pivot. Note that for $H_i(\bm{\beta},A)$ to be a pivot, its moments must not depend on any unknown parameters. Based on this fact, we develop a simple moment-based method for establishing the non-existence of a pivot. Under the symmetry assumption on $u_i$, the odd moments of $H_i(\bm{\beta},A)$ are zero if they exist, and the second moment equals 1 because $H_i(\bm{\beta},A)$ is standardized. To verify whether $H_i(\bm{\beta},A)$ is a pivot, we calculate its fourth moment as follows:

$$\begin{split}\text{E}\left[(\theta_i-\tilde{\theta}_i)/\sqrt{g_{1i}}\right]^4&=\frac{1}{g_{1i}^2}\text{E}\left[\bm{x}_i^{\prime}\bm{\beta}+u_i-(1-B_i)y_i-B_i\bm{x}_i^{\prime}\bm{\beta}\right]^4\\ &=\frac{1}{g_{1i}^2}\text{E}\left[B_iu_i-(1-B_i)e_i\right]^4\\ &=\frac{1}{g_{1i}^2}\text{E}\left[B_i^4u_i^4-4B_i^3u_i^3(1-B_i)e_i+6B_i^2u_i^2(1-B_i)^2e_i^2-4B_iu_i(1-B_i)^3e_i^3+(1-B_i)^4e_i^4\right]\\ &=\frac{1}{g_{1i}^2}\text{E}\left[B_i^4u_i^4+6B_i^2u_i^2(1-B_i)^2e_i^2+(1-B_i)^4e_i^4\right]\\ &=3+\frac{D_i^2}{(A+D_i)^2}\left[\frac{\text{E}(u_i^4)}{A^2}-3\right],\end{split} \qquad (3.6)$$

where $\left[\text{E}(u_i^4)/A^2-3\right]$ is the excess kurtosis of $u_i$. Note that when $u_i$ is normally distributed, $\left[\text{E}(u_i^4)/A^2-3\right]$ is zero. When the distribution of $u_i$ is other than normal, such as the $t$, double exponential, or logistic, $\left[\text{E}(u_i^4)/A^2-3\right]$ is a nonzero constant, indicating that the fourth moment of $H_i(\bm{\beta},A)$ depends on the unknown parameter $A$, and thus $H_i(\bm{\beta},A)$ is not a pivot. Moreover, in these cases, the fourth moment of $H_i(\bm{\beta},A)$ is a decreasing function of $A$.
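As a concrete instance of this check (our own worked example, using the standard excess-kurtosis value of the $t$ distribution), suppose $u_i$ follows a scaled $t_\nu$ distribution with variance $A$ and $\nu>4$, so that $\text{E}(u_i^4)/A^2-3=6/(\nu-4)$. Then (3.6) becomes

$$\text{E}\left[(\theta_i-\tilde{\theta}_i)/\sqrt{g_{1i}}\right]^4=3+\frac{6}{\nu-4}\cdot\frac{D_i^2}{(A+D_i)^2}.$$

For the $t_9$ distribution used in Section 5, the excess kurtosis is $6/5=1.2$, and the fourth moment of $H_i(\bm{\beta},A)$ varies with $A$ for any fixed $D_i>0$; hence $H_i(\bm{\beta},A)$ cannot be a pivot.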

4 Double parametric bootstrap

Hall and Maiti (2006) considered parametric bootstrap methods to approximate the distribution of $(\theta_i-\bm{x}_i^{\prime}\hat{\bm{\beta}})/\sqrt{\hat{A}}$. The prediction interval can then be constructed as $(\bm{x}_i^{\prime}\hat{\bm{\beta}}+b_{\hat{\alpha}_l^s}\sqrt{\hat{A}},\;\bm{x}_i^{\prime}\hat{\bm{\beta}}+b_{\hat{\alpha}_u^s}\sqrt{\hat{A}})$, where $b_{\hat{\alpha}_l^s}$ and $b_{\hat{\alpha}_u^s}$ are obtained from a single bootstrap approximation based on their algorithm. Their prediction interval is based on a synthetic, or regression, model, which does not permit approximation of the conditional distribution of $\theta_i$ given the data $\bm{Y}$. As a consequence, it is likely to underweight the area-specific data. When the level 2 distribution $\mathcal{G}$ is determined only by $\bm{x}_i^{\prime}\bm{\beta}$ and $A$, it is easy to see that the quantity $(\theta_i-\bm{x}_i^{\prime}\bm{\beta})/\sqrt{A}$ is a pivot. As Hall and Maiti (2006) stated, their prediction interval is then effective, since $\mathrm{P}(\bm{x}_i^{\prime}\hat{\bm{\beta}}+b_{\hat{\alpha}_l^s}\sqrt{\hat{A}}\leq\theta_i\leq\bm{x}_i^{\prime}\hat{\bm{\beta}}+b_{\hat{\alpha}_u^s}\sqrt{\hat{A}})=1-\alpha+O(m^{-2})$. However, when additional parameters are involved in the distribution of the random effects, $(\theta_i-\bm{x}_i^{\prime}\bm{\beta})/\sqrt{A}$ might not be a pivot and this effectiveness may be lost.

Proposition 4.1.

When considering a general distribution $\mathcal{G}(\bm{x}_i^{\prime}\bm{\beta},A,\bm{\phi})$, such as a $t$ distribution with unknown degrees of freedom, we have

$$\mathrm{P}(\bm{x}_i^{\prime}\hat{\bm{\beta}}+b_{\hat{\alpha}_l^s}\sqrt{\hat{A}}\leq\theta_i\leq\bm{x}_i^{\prime}\hat{\bm{\beta}}+b_{\hat{\alpha}_u^s}\sqrt{\hat{A}})=1-\alpha+O(m^{-1}), \qquad (4.1)$$

where $b_{\hat{\alpha}_l^s}$ and $b_{\hat{\alpha}_u^s}$ are obtained from a single bootstrap approximation based on the Hall and Maiti (2006) algorithm.

See the proof in Appendix D, which is similar to that of Proposition 3.1.

Hall and Maiti (2006) proposed a double-bootstrap method to calibrate $(\hat{\alpha}_l^s,\hat{\alpha}_u^s)$ from single parametric bootstraps, which can achieve a high degree of coverage accuracy, $O(m^{-3})$. However, their calibration approach can overcorrect and produce a calibrated $\hat{\alpha}$ greater than 1, which makes the implementation infeasible.

While the order $O(m^{-1})$ term $T_1$ in (3.5) is theoretically positive under certain conditions, it remains unclear whether this positiveness holds when $u_i$ is asymmetrically distributed. Unlike Hall and Maiti (2006), we introduce a new double bootstrap method, which requires neither a pivot nor symmetric $u_i$. This method reduces the coverage error to $o(m^{-1})$, even when $u_i$ is asymmetrically distributed. Our double bootstrap approach is based on the algorithm of Shi (1992), where the double bootstrap was proposed to obtain accurate and efficient confidence intervals for parameters of interest in both nonparametric and univariate parametric distribution settings. Later on, McCullough and Vinod (1998) discussed the theory of the double bootstrap both with and without pivoting, and provided implementations for some nonlinear production functions. In this paper, we develop the double bootstrap in the context of our mixed effects model and apply it to obtain the EBL prediction intervals of $\theta_i$. The framework of our double parametric bootstrap is as follows (a code sketch is given after the algorithm):

  • 1.

    First-stage bootstrap

    • 1.1

Conditionally on the data $(\bm{X},\bm{Y})$, draw $\theta_i^*$, $i=1,\cdots,m$, from the distribution $\mathcal{G}(\bm{x}_i^{\prime}\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})$;

    • 1.2

Given $\theta_i^*$, draw $y_i^*$ from the distribution $\mathcal{N}(\theta_i^*,D_i)$;

    • 1.3

Compute $\hat{\bm{\beta}}^*$, $\hat{A}^*$, $\hat{\bm{\phi}}^*$ from the data $(\bm{X},\bm{Y}^*)$, and obtain $\hat{\theta}_i^*$ and $\hat{g}_{1i}^*$;

  • 2.

    Second-stage bootstrap

    • 2.1

Given $\hat{\bm{\beta}}^*$, $\hat{A}^*$, $\hat{\bm{\phi}}^*$, draw $\theta_i^{**}$ from the distribution $\mathcal{G}(\bm{x}_i^{\prime}\hat{\bm{\beta}}^*,\hat{A}^*,\hat{\bm{\phi}}^*)$;

    • 2.2

Given $\theta_i^{**}$, draw $y_i^{**}$ from the distribution $\mathcal{N}(\theta_i^{**},D_i)$;

    • 2.3

Compute $\hat{\bm{\beta}}^{**}$, $\hat{A}^{**}$ from the data $(\bm{X},\bm{Y}^{**})$, and also obtain $\hat{\theta}_i^{**}$ and $\hat{g}_{1i}^{**}$;

    • 2.4

Consider obtaining a $(1-\alpha)$-level, two-sided, equal-tailed prediction interval. Define

$$\hat{G}^*(z)\equiv\mathrm{P}^{**}(H_i(\hat{\bm{\beta}}^{**},\hat{A}^{**})\leq z\,|\,\hat{\bm{\beta}}^*,\hat{A}^*,\hat{\bm{\phi}}^*). \qquad (4.2)$$

To find the upper limit, we first solve the following system of equations to obtain $\hat{\alpha}_u$ such that

$$\left\{\begin{array}{l}\mathrm{P}^{*}(H_i(\hat{\bm{\beta}}^*,\hat{A}^*)\leq q_{\hat{\alpha}_u}\,|\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})=1-\alpha/2,\\ \mathrm{P}^{**}(H_i(\hat{\bm{\beta}}^{**},\hat{A}^{**})\leq q_{\hat{\alpha}_u}\,|\,\hat{\bm{\beta}}^*,\hat{A}^*,\hat{\bm{\phi}}^*)=\hat{\alpha}_u.\end{array}\right. \qquad (4.3)$$

Using the definition (4.2), we have $q_{\hat{\alpha}_u}=\hat{G}^{*-1}(\hat{\alpha}_u)$. The above system of equations is then equivalent to

$$\mathrm{P}^*\left(H_i(\hat{\bm{\beta}}^*,\hat{A}^*)\leq\hat{G}^{*-1}(\hat{\alpha}_u)\,|\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}}\right)=1-\alpha/2. \qquad (4.4)$$

      Rewriting (4.4) gives

$$\mathrm{P}^*\left[\mathrm{P}^{**}\left(H_i(\hat{\bm{\beta}}^{**},\hat{A}^{**})\leq H_i(\hat{\bm{\beta}}^*,\hat{A}^*)\,|\,\hat{\bm{\beta}}^*,\hat{A}^*,\hat{\bm{\phi}}^*\right)\leq\hat{\alpha}_u\,|\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}}\right]=1-\alpha/2. \qquad (4.5)$$

Note that the inner probability, $\mathrm{P}^{**}(H_i(\hat{\bm{\beta}}^{**},\hat{A}^{**})\leq H_i(\hat{\bm{\beta}}^*,\hat{A}^*)\,|\,\hat{\bm{\beta}}^*,\hat{A}^*,\hat{\bm{\phi}}^*)$, is a function of the first-stage bootstrap resample. More specifically, for the $j$th first-stage bootstrap sample, after all $K$ second-stage bootstrap operations are completed, let $Z_j$ be the proportion of times that $H_i(\hat{\bm{\beta}}^{**},\hat{A}^{**})\leq H_i(\hat{\bm{\beta}}^*,\hat{A}^*)$, i.e.,

$$Z_j=\mathrm{P}^{**}\left(H_i(\hat{\bm{\beta}}^{**},\hat{A}^{**})\leq H_i(\hat{\bm{\beta}}^*,\hat{A}^*)\,|\,\hat{\bm{\beta}}^*,\hat{A}^*,\hat{\bm{\phi}}^*\right)\approx\#\left(H_i(\hat{\bm{\beta}}^{**},\hat{A}^{**})\leq H_i(\hat{\bm{\beta}}^*,\hat{A}^*)\right)/K.$$

We will use the $Z_j$ to adjust the first-stage intervals. After all bootstrap operations are complete, we have estimates $Z_j$, $j=1,2,\cdots,J$, where $J$ is the number of first-stage bootstrap operations. Sort the $Z_j$ and choose their $1-\alpha/2$ quantile as the percentile point $\hat{\alpha}_u$ defining the double-bootstrap upper limit.

    • 2.5

After completing step 2.4, we have all estimates $H_i^{(j)}(\hat{\bm{\beta}}^*,\hat{A}^*)$, $j=1,\cdots,J$, and $\hat{\alpha}_u$. Choose $q_{\alpha_u}=q_{\hat{\alpha}_u^d}$ such that

$$\mathrm{P}^*(H_i(\hat{\bm{\beta}}^*,\hat{A}^*)\leq q_{\alpha_u}\,|\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})=\hat{\alpha}_u. \qquad (4.6)$$
    • 2.6

Take $q_{\hat{\alpha}_u^d}$ as the upper limit of the double bootstrap prediction interval. Similar operations determine the lower limit, $q_{\hat{\alpha}_l^d}$. Finally, construct the prediction interval of $\theta_i$ as

$$\hat{I}_i^d=\left(\hat{\theta}_i+q_{\hat{\alpha}_l^d}\sqrt{\hat{g}_{1i}}\,,\;\hat{\theta}_i+q_{\hat{\alpha}_u^d}\sqrt{\hat{g}_{1i}}\right).$$
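The following minimal sketch implements the two-stage algorithm for a single area $i$ (our own illustrative Python; `estimate_A` and `draw_G` are the same user-supplied stand-ins as in the single bootstrap sketch, and the mirrored treatment of the lower limit is our reading of "similar operations" in step 2.6):

```python
import numpy as np

def fit_eblup(y, X, D, estimate_A):
    """Refit the area-level model; returns (theta_hat, g1_hat, beta_hat, A_hat)."""
    A = estimate_A(y, X, D)
    w = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    B = D / (A + D)
    return (1.0 - B) * y + B * (X @ beta), A * D / (A + D), beta, A

def double_bootstrap_interval(y, X, D, i, estimate_A, draw_G,
                              alpha=0.1, J=400, K=100):
    """Double parametric bootstrap EBL prediction interval for area i."""
    m = len(y)
    theta_hat, g1_hat, beta_hat, A_hat = fit_eblup(y, X, D, estimate_A)
    H1 = np.empty(J)   # first-stage statistics H_i(beta*, A*)
    Z = np.empty(J)    # second-stage proportions Z_j of step 2.4
    for j in range(J):
        th1 = draw_G(X @ beta_hat, A_hat)                          # 1.1
        y1 = th1 + np.sqrt(D) * np.random.randn(m)                 # 1.2
        th_hat1, g11, beta1, A1 = fit_eblup(y1, X, D, estimate_A)  # 1.3
        H1[j] = (th1[i] - th_hat1[i]) / np.sqrt(g11[i])
        H2 = np.empty(K)
        for k in range(K):                                         # 2.1-2.3
            th2 = draw_G(X @ beta1, A1)
            y2 = th2 + np.sqrt(D) * np.random.randn(m)
            th_hat2, g12, _, _ = fit_eblup(y2, X, D, estimate_A)
            H2[k] = (th2[i] - th_hat2[i]) / np.sqrt(g12[i])
        Z[j] = np.mean(H2 <= H1[j])                                # 2.4: Z_j
    alpha_u = np.quantile(Z, 1.0 - alpha / 2.0)   # calibrated upper level
    alpha_l = np.quantile(Z, alpha / 2.0)         # assumed mirror for lower
    q_u = np.quantile(H1, alpha_u)                # 2.5: empirical (4.6)
    q_l = np.quantile(H1, alpha_l)
    return (theta_hat[i] + q_l * np.sqrt(g1_hat[i]),               # 2.6
            theta_hat[i] + q_u * np.sqrt(g1_hat[i]))
```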

The algorithm above shows that the single parametric bootstrap interval $\hat{I}_i^s$ is calibrated by the second-stage bootstrap. In general, such a calibration improves the coverage accuracy to $o(m^{-1})$, even when a pivot does not exist. A further advantage of our double bootstrap algorithm is that it avoids the problem of overcorrection, making it more practical than that of Hall and Maiti (2006).

Theorem 4.1.

Under regularity conditions, for a preassigned $\alpha\in(0,1)$ and arbitrary $i=1,\cdots,m$, we have

$$\mathrm{P}\left(\hat{\theta}_i+q_{\hat{\alpha}_l^d}\sqrt{\hat{g}_{1i}}\leq\theta_i\leq\hat{\theta}_i+q_{\hat{\alpha}_u^d}\sqrt{\hat{g}_{1i}}\right)=1-\alpha+o(m^{-1}), \qquad (4.7)$$

where $q_{\hat{\alpha}_l^d}$ and $q_{\hat{\alpha}_u^d}$ are obtained from the double parametric bootstrap procedure described above.

See the proof of the theorem in Appendix E.

5 Monte Carlo Simulations

In this section, we compare the performance of the proposed parametric bootstrap methods with their competitors, where available, using Monte Carlo simulation studies. To maintain comparability with existing studies, we adopt part of the simulation framework of Chatterjee et al. (2008). We consider an area-level model with $\bm{x}_i^{\prime}\bm{\beta}=0$ and five groups of small areas. Within each group, the sampling variances $D_i$ remain the same. Two patterns for the $D_i$'s are considered: (i) $(4.0,0.6,0.5,0.4,0.2)$; (ii) $(8.0,1.2,1.0,0.8,0.4)$. For pattern (i), we take $A=1$; for pattern (ii), we take $A=2$, doubling the variances of pattern (i) while preserving the ratios $B_i=D_i/(A+D_i)$. To examine the effect of $m$, we consider $m=15$ and 50. As $m$ increases, all methods improve and get closer to one another, supporting our asymptotic theory. Since we obtained virtually identical results under the two patterns, for the full study we confined attention to pattern (i); the results for pattern (ii) are provided in Appendix G.

The Prasad-Rao method-of-moments (Prasad and Rao, 1990) and the Fay-Herriot method of estimating the variance component $A$ are considered. For the Fay-Herriot estimator of $A$, we employ the method of scoring to solve the estimating equation, which has been shown to be more stable (Datta et al., 2005) than the Newton-Raphson method originally used in Fay and Herriot (1979). The estimating equation of the Fay-Herriot estimator is

$$f(A)=\sum_{i=1}^m\frac{(y_i-\bm{x}_i^{\prime}\tilde{\bm{\beta}})^2}{A+D_i}-(m-p)=0.$$

Here, $f(A)$ is a decreasing function of $A$, and the expectation of its first derivative is $\text{E}(f^{\prime}(A))=-{\rm tr}(\bm{P})<0$. To improve computational efficiency, we implemented a slight modification to the algorithm. Specifically, the revised Fay-Herriot algorithm begins by calculating $f(A)$ at the initial point $A_0=0$, as in Fay and Herriot (1979). If $f(A_0)<0$, we truncate the estimator to $\hat{A}_{\rm FH}=0.01$. Otherwise, the iterative process continues to search for a positive solution, with $\hat{A}_{\rm FH}=0.01$ also applied if no positive solution is found. This revised Fay-Herriot algorithm further enhances computational efficiency compared to the original method.
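A minimal sketch of this revised scoring iteration (our own illustrative Python; following the text, the scoring denominator is ${\rm tr}(\bm{P})=\text{E}(-f^{\prime}(A))$, and negative or failed searches are truncated at 0.01):

```python
import numpy as np

def fay_herriot_A(y, X, D, tol=1e-6, max_iter=100, A_floor=0.01):
    """Revised Fay-Herriot scoring iteration for A (a sketch).

    Solves f(A) = sum_i r_i^2/(A + D_i) - (m - p) = 0 with the update
    A <- A + f(A)/tr(P); truncates at A_floor when f(0) < 0 or when no
    positive solution is found."""
    m, p = X.shape

    def f_and_trP(A):
        w = 1.0 / (A + D)
        Xw = X * w[:, None]                       # Sigma^{-1} X
        XtWX_inv = np.linalg.inv(X.T @ Xw)
        beta = XtWX_inv @ (Xw.T @ y)              # WLS estimator beta_tilde(A)
        r = y - X @ beta
        f = np.sum(w * r**2) - (m - p)
        trP = np.sum(w) - np.trace(XtWX_inv @ (Xw.T @ Xw))  # tr(P)
        return f, trP

    f0, _ = f_and_trP(0.0)
    if f0 < 0:                                    # f is decreasing in A
        return A_floor
    A = 0.0
    for _ in range(max_iter):
        f, trP = f_and_trP(A)
        A_new = A + f / trP                       # Fisher scoring step
        if A_new <= 0:
            return A_floor
        if abs(A_new - A) < tol:
            return A_new
        A = A_new
    return max(A, A_floor)
```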

5.1 Simulations on symmetric cases

First, we consider the scenario in which $u_i$ is symmetrically distributed. Specifically, we assume $\mathcal{G}$ is a $t$ distribution with 9 degrees of freedom. In this setting, we compare the coverage probabilities and average lengths of the following seven prediction intervals of $\theta_i$:

Two prediction intervals based on the proposed single parametric bootstrap method with two different variance estimators, $\hat{A}_{\rm FH}$ and $\hat{A}_{\rm PR}$, denoted by SB.FH and SB.PR, respectively;

Two prediction intervals based on the single parametric bootstrap method proposed by Hall and Maiti (2006), using the same variance estimators, denoted by HM.FH and HM.PR;

Two traditional prediction intervals of the form $\hat{\theta}_i\pm z_{\alpha/2}\sqrt{\text{mspe}(\hat{\theta}_i)}$, where the variance estimator $\hat{A}_{\rm FH}$ is used along with the MSPE estimator of Datta et al. (2005), denoted by FH, and the variance estimator $\hat{A}_{\rm PR}$ is used with the Prasad-Rao MSPE estimator (Prasad and Rao, 1990), denoted by PR;

The direct confidence interval (DIRECT), given by $y_i\pm z_{\alpha/2}\sqrt{D_i}$.

Each reported result is based on 1000 simulation runs. For all cases, we consider a single bootstrap sample of size 400 and three different nominal coverages: $\alpha\in(80\%,90\%,95\%)$.
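For concreteness, here is a minimal sketch of one simulation replicate under this design (our own illustrative Python; we assume equal group sizes $m/5$ and the mean-zero, variance-$A$ scaling of the $t$ distribution):

```python
import numpy as np

rng = np.random.default_rng(2024)  # seed is arbitrary

def draw_t_effects(m, A, df=9):
    """Random effects u_i with mean 0 and variance A from a scaled t:
    Var(t_df) = df/(df-2), so multiply by sqrt(A*(df-2)/df)."""
    return np.sqrt(A * (df - 2) / df) * rng.standard_t(df, size=m)

m, A = 50, 1.0
D = np.repeat([4.0, 0.6, 0.5, 0.4, 0.2], m // 5)  # pattern (i), five groups
theta = 0.0 + draw_t_effects(m, A)                # x_i' beta = 0 here
y = theta + np.sqrt(D) * rng.standard_normal(m)   # level 1 sampling errors
```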

We report the percentages of negative estimates of $A$ under the $t_9$ distribution in Table 1. For $m=15$, the Prasad-Rao method-of-moments approach yields as high as about 13% negative estimates of $A$ under pattern (i) of the $D_i$. The Fay-Herriot method produces significantly fewer negative estimates than the Prasad-Rao method-of-moments approach across all scenarios.

Table 1: Percentages of negative estimates of $A$ ($\hat{A}^*$) for different estimation methods and different patterns of $D_i$ under $t_9$.

        Pattern (i)              Pattern (ii)
 m      FH          PR           FH          PR
 15     1.6 (6.4)   13.1 (25.0)  1.1 (5.7)   11.8 (23.2)
 50     0 (0.07)    1.2 (6.6)    0 (0.07)    1.2 (6.1)

Table 2 presents the coverage probabilities and average lengths of each prediction interval method with $m=50$ and the $t_9$ distribution under pattern (i) of the $D_i$. The SB.PR and HM.PR prediction intervals consistently over-cover. The SB.PR prediction intervals are shorter than the HM.PR intervals, especially for groups 2-5 with smaller sampling variances $D_i$. The SB.FH and HM.FH prediction intervals perform well in terms of coverage error. Specifically, their coverage probabilities are very close to all three nominal coverages. The SB.FH method produces the shortest average lengths among these four single parametric bootstrap methods. The FH intervals tend to undercover for group G1 when $\alpha=90\%$ and $95\%$. The PR intervals have an undercoverage issue for group G1 across all three nominal coverages.

Table 2: Coverage probabilities (average length) of different intervals under $t_9$ distribution with $m$ = 50 and Pattern (i)

        SB.FH         HM.FH         SB.PR         HM.PR         FH            PR            DIRECT
α = 80%
G1      79.94 (2.31)  80.30 (2.53)  83.70 (2.59)  84.45 (2.84)  79.89 (2.30)  78.66 (2.31)  79.43 (5.13)
G2      80.15 (1.59)  80.31 (2.49)  83.63 (1.79)  84.54 (2.81)  80.38 (1.59)  81.55 (1.67)  79.85 (1.99)
G3      79.71 (1.50)  79.53 (2.48)  83.12 (1.69)  83.75 (2.80)  79.86 (1.50)  80.88 (1.58)  79.99 (1.81)
G4      80.47 (1.39)  80.03 (2.48)  84.06 (1.57)  83.95 (2.80)  80.49 (1.39)  82.06 (1.47)  80.29 (1.62)
G5      79.65 (1.05)  80.48 (2.46)  83.32 (1.20)  84.89 (2.78)  79.99 (1.06)  81.80 (1.14)  80.32 (1.15)
α = 90%
G1      89.88 (3.05)  90.35 (3.39)  94.13 (3.79)  94.25 (4.21)  88.90 (2.96)  87.82 (2.97)  89.42 (6.58)
G2      90.17 (2.05)  90.37 (3.30)  93.25 (2.59)  94.27 (4.14)  89.89 (2.04)  90.58 (2.14)  90.19 (2.55)
G3      90.06 (1.94)  89.34 (3.30)  93.38 (2.44)  93.76 (4.14)  90.15 (1.92)  90.78 (2.03)  90.11 (2.33)
G4      90.07 (1.79)  89.50 (3.29)  93.41 (2.26)  94.24 (4.13)  90.12 (1.78)  91.19 (1.89)  90.24 (2.08)
G5      90.29 (1.36)  90.28 (3.26)  93.33 (1.72)  94.20 (4.09)  90.26 (1.36)  91.56 (1.47)  90.32 (1.47)
α = 95%
G1      95.13 (3.75)  95.04 (4.22)  97.30 (5.31)  97.42 (5.91)  93.57 (3.52)  92.65 (3.53)  94.77 (7.84)
G2      95.11 (2.47)  95.35 (4.07)  96.99 (3.61)  97.50 (5.79)  94.87 (2.43)  95.52 (2.55)  95.38 (3.04)
G3      94.96 (2.32)  94.45 (4.06)  97.05 (3.39)  97.42 (5.77)  94.70 (2.29)  95.51 (2.41)  95.00 (2.77)
G4      95.06 (2.14)  95.03 (4.04)  96.86 (3.14)  97.58 (5.75)  95.12 (2.12)  95.86 (2.25)  95.07 (2.48)
G5      95.31 (1.62)  94.98 (3.99)  96.87 (2.38)  97.42 (5.69)  95.28 (1.62)  96.01 (1.75)  95.17 (1.75)

Table 3 reports the results for $m=15$. As illustrated in Table 1, the Prasad-Rao method produces an extremely high percentage of negative (truncated) estimates for a small $m=15$. Thus, it is not surprising that the SB.PR intervals have a severe undercoverage problem when $\alpha=90\%$ and 95%. The high percentage of negative estimates of $A$ might also contribute to the similar undercoverage problem of the HM.PR intervals at $\alpha=90\%$ and 95%, as well as the large average lengths of both the SB.PR and HM.PR intervals. The SB.FH and HM.FH intervals uniformly tend to overcover. Still, the SB.FH intervals have the shortest average lengths among the four types of parametric bootstrap intervals. The FH intervals significantly undercover for group G1 at all three nominal coverages and slightly undercover for groups G2 and G3 when $\alpha=90\%$ and 95%. The PR intervals undercover for group G1 and switch to overcovering for the rest of the groups at all three nominal coverages.

Table 3: Coverage probabilities (average length) of different intervals under $t_9$ distribution with $m$ = 15 and Pattern (i)

        SB.FH         HM.FH         SB.PR         HM.PR         FH            PR            DIRECT
α = 80%
G1      82.43 (2.68)  82.50 (2.94)  81.50 (3.37)  82.30 (3.74)  75.07 (2.31)  76.23 (2.45)  81.37 (5.13)
G2      82.80 (1.78)  83.50 (2.78)  80.77 (2.25)  82.13 (3.60)  79.43 (1.63)  88.97 (2.19)  80.03 (1.99)
G3      81.40 (1.67)  81.37 (2.76)  80.57 (2.12)  82.27 (3.58)  79.07 (1.54)  88.20 (2.15)  80.40 (1.81)
G4      84.07 (1.55)  84.17 (2.74)  81.50 (1.95)  82.97 (3.54)  81.40 (1.43)  90.47 (2.10)  80.67 (1.62)
G5      81.80 (1.17)  83.17 (2.66)  79.33 (1.47)  82.23 (3.42)  80.73 (1.11)  88.50 (1.98)  80.00 (1.15)
α = 90%
G1      93.07 (3.89)  92.87 (4.33)  87.00 (5.77)  87.30 (6.45)  83.97 (2.97)  85.80 (3.14)  90.33 (6.58)
G2      93.00 (2.51)  93.10 (3.96)  86.30 (3.78)  87.27 (6.03)  89.37 (2.09)  95.70 (2.81)  90.87 (2.55)
G3      92.63 (2.35)  92.50 (3.91)  86.50 (3.57)  87.77 (6.01)  88.37 (1.98)  95.27 (2.76)  90.20 (2.33)
G4      93.90 (2.16)  93.83 (3.86)  87.53 (3.28)  87.87 (5.94)  90.17 (1.84)  96.67 (2.70)  90.93 (2.08)
G5      91.83 (1.60)  92.87 (3.67)  86.03 (2.42)  87.53 (5.62)  90.03 (1.43)  94.73 (2.54)  89.60 (1.47)
α = 95%
G1      97.13 (5.46)  97.07 (6.11)  88.77 (8.91)  88.80 (9.98)  88.73 (3.54)  90.73 (3.75)  94.73 (7.84)
G2      96.33 (3.41)  96.83 (5.36)  88.43 (5.73)  88.87 (9.17)  93.70 (2.49)  97.87 (3.35)  95.70 (3.04)
G3      96.60 (3.18)  96.83 (5.26)  89.20 (5.39)  89.50 (9.10)  93.33 (2.36)  97.90 (3.29)  94.77 (2.77)
G4      97.23 (2.90)  97.37 (5.15)  89.87 (4.93)  89.90 (8.95)  95.33 (2.19)  98.83 (3.22)  95.63 (2.48)
G5      96.30 (2.11)  96.93 (4.77)  88.93 (3.59)  89.30 (8.38)  95.13 (1.70)  97.80 (3.03)  94.57 (1.75)

5.2 Further simulations on asymmetric cases

While some of our theoretical results for the single bootstrap are based on the symmetry assumption, we also use simulation studies to assess our proposed parametric bootstrap methods under asymmetric conditions. Specifically, for the same models and parameter choices as above, we change $\mathcal{G}$ to a shifted exponential (SE) distribution. Besides the seven prediction intervals mentioned before, we also include our proposed double parametric bootstrap intervals in this subsection, expecting them to improve the potential coverage error under asymmetric conditions. Below, the double bootstrap method based on the Fay-Herriot variance estimator is denoted by DB.FH, and the double bootstrap method based on the Prasad-Rao variance estimator is denoted by DB.PR. We keep a bootstrap sample of size 400 in the first stage and apply two different sizes, $B_2\in(50,100)$, in the second stage. For $B_2=50$ and $B_2=100$, we obtained very similar results; moreover, the two patterns of $D_i$ gave similar results. In the following, we provide detailed discussions only for $B_2=100$ under pattern (i); the results for $B_2=50$ are provided in Appendix G.
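A minimal sketch of the corresponding level-2 sampler (our own illustrative Python; the mean-zero, variance-$A$ parametrization $u_i=\sqrt{A}(E_i-1)$ with $E_i\sim\mathrm{Exp}(1)$ is our assumption, as the paper does not spell out the SE parametrization):

```python
import numpy as np

rng = np.random.default_rng(2024)

def draw_se_effects(m, A):
    """Shifted exponential random effects with mean 0 and variance A,
    under our assumed parametrization: u_i = sqrt(A) * (E_i - 1)."""
    return np.sqrt(A) * (rng.exponential(1.0, size=m) - 1.0)
```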

Table 4 shows the percentages of negative estimates of $A$ under the SE distribution. When $m=15$, the Prasad-Rao method-of-moments approach yields very high percentages of negative estimates at both the first and second stages of bootstrapping. Although the Fay-Herriot method results in fewer negative estimates than the Prasad-Rao approach, the occurrence of negative estimates remains notable, especially when the number of small areas is small (e.g., $m=15$).

Table 4: Percentages of negative estimates of $A$ ($\hat{A}^*$) [$\hat{A}^{**}$] for different estimation methods and different patterns of $D_i$ under the SE distribution.

        Pattern (i)                                 Pattern (ii)
 m      FH                   PR                     FH                   PR
 15     3.6 (10.72) [16.32]  16.7 (27.98) [31.95]   3.9 (10.93) [16.46]  18.1 (28.49) [31.97]
 50     0 (0.37) [1.44]      1.80 (7.89) [12.82]    0 (0.44) [1.56]      2.80 (8.89) [13.68]

Table 5 displays the coverage probabilities and average lengths of each prediction interval for $m=50$. The parametric bootstrap methods based on Prasad-Rao estimators of $A$ (SB.PR, HM.PR and DB.PR) consistently tend to over-cover at $\alpha=80\%$ and 90%. The DB.PR intervals bring the coverage probabilities closer to the nominal coverages but are longer than the single bootstrap SB.PR intervals. Even with $m=50$, the relatively high percentage of negative estimates of $A$ in the second bootstrap stage, shown in Table 4, might contribute to the increased interval length of DB.PR. When the Fay-Herriot method is applied to estimate $A$, our proposed single parametric bootstrap intervals SB.FH already show good performance, and DB.FH has little or no effect. The FH intervals perform well, except for overcoverage at $\alpha=80\%$. The PR intervals have an undercoverage problem for group G1 at $\alpha=95\%$. Overall, the SB.FH and FH intervals perform best in terms of both coverage probabilities and average lengths in this setting.

Table 5: Coverage probabilities (average length) of different intervals under shifted exponential distribution with $m$ = 50, $B_2$ = 100 and Pattern (i)

        SB.FH         HM.FH         DB.FH         SB.PR         HM.PR         DB.PR         FH            PR            DIRECT
α = 80%
G1      80.29 (2.20)  80.75 (2.35)  79.58 (2.22)  84.50 (2.50)  87.12 (2.68)  83.16 (2.86)  83.45 (2.28)  81.94 (2.30)  79.90 (5.13)
G2      79.99 (1.55)  81.01 (2.31)  79.42 (1.56)  83.66 (1.77)  87.17 (2.65)  82.99 (2.07)  80.56 (1.57)  82.99 (1.67)  79.94 (1.99)
G3      80.06 (1.47)  81.27 (2.30)  79.62 (1.47)  83.41 (1.68)  87.39 (2.65)  83.09 (1.97)  80.38 (1.48)  82.61 (1.59)  79.86 (1.81)
G4      80.60 (1.36)  80.68 (2.30)  80.20 (1.37)  83.71 (1.56)  87.00 (2.64)  82.96 (1.83)  81.05 (1.37)  83.78 (1.48)  80.15 (1.62)
G5      80.81 (1.05)  80.52 (2.29)  80.34 (1.05)  83.79 (1.20)  87.13 (2.63)  83.23 (1.40)  80.95 (1.05)  83.54 (1.17)  80.69 (1.15)
α = 90%
G1      90.46 (3.01)  91.81 (3.26)  89.56 (3.03)  93.49 (3.79)  94.70 (4.16)  92.15 (5.53)  90.98 (2.92)  90.23 (2.95)  90.01 (6.58)
G2      90.22 (2.04)  91.74 (3.16)  89.53 (2.05)  93.01 (2.61)  94.91 (4.10)  92.09 (4.05)  90.14 (2.02)  92.22 (2.14)  89.81 (2.55)
G3      90.00 (1.93)  91.93 (3.15)  89.48 (1.93)  92.51 (2.46)  94.89 (4.10)  91.69 (3.85)  90.20 (1.90)  91.69 (2.03)  89.80 (2.33)
G4      90.48 (1.78)  91.74 (3.14)  89.80 (1.79)  92.72 (2.28)  94.65 (4.09)  92.09 (3.59)  90.47 (1.76)  92.06 (1.90)  90.17 (2.08)
G5      90.69 (1.36)  91.34 (3.10)  90.24 (1.37)  92.97 (1.74)  94.64 (4.04)  92.23 (2.76)  90.71 (1.35)  92.35 (1.50)  90.29 (1.47)
α = 95%
G1      95.43 (3.83)  96.68 (4.18)  94.37 (3.90)  96.45 (5.44)  96.80 (6.02)  95.73 (9.56)  94.27 (3.48)  93.58 (3.52)  95.04 (7.84)
G2      95.32 (2.51)  96.69 (3.98)  94.77 (2.58)  96.37 (3.69)  96.91 (5.90)  95.88 (6.82)  94.91 (2.40)  96.04 (2.55)  95.07 (3.04)
G3      94.87 (2.36)  96.54 (3.96)  94.35 (2.43)  96.34 (3.49)  97.19 (5.89)  95.72 (6.48)  94.72 (2.27)  95.51 (2.42)  94.79 (2.77)
G4      95.41 (2.17)  96.36 (3.93)  94.87 (2.23)  96.35 (3.22)  97.01 (5.86)  95.87 (6.06)  95.13 (2.10)  95.92 (2.27)  95.17 (2.48)
G5      95.34 (1.64)  96.27 (3.86)  95.16 (1.69)  96.34 (2.43)  96.99 (5.79)  95.92 (4.60)  95.31 (1.61)  96.30 (1.78)  95.44 (1.75)

For $m=15$, similar to the cases under the $t_9$ distribution, the Prasad-Rao variance estimator produces very high percentages of negative estimates of $A$, especially at the bootstrap stages; see Table 4. Thus, it is not abnormal that the three parametric bootstrap intervals based on $\hat{A}_{\rm PR}$ have very low coverage probabilities when the nominal coverage is $\alpha=90\%$ or 95%; see Table 6 for the $m=15$ cases. The other three parametric bootstrap intervals, based on $\hat{A}_{\rm FH}$, overcover when the nominal coverage is 80% or 90%. When $\alpha=95\%$, the SB.FH intervals perform well in terms of coverage error, as does DB.FH, while the SB.FH intervals have shorter average lengths than those of HM.FH and DB.FH. The FH intervals suffer from severe undercoverage for group G1 at all nominal coverages, as do the PR intervals at $\alpha=95\%$. At $\alpha=80\%$ and 90%, the PR intervals have longer average lengths than the SB.FH intervals for groups G2-G5.

Table 6: Coverage probabilities (average length) of different intervals under shifted exponential distribution with $m$ = 15, $B_2$ = 100 and Pattern (i)

        SB.FH         HM.FH         DB.FH          SB.PR         HM.PR          DB.PR          FH            PR            DIRECT
α = 80%
G1      84.63 (2.66)  86.00 (2.90)  83.63 (2.68)   79.77 (3.31)  81.33 (3.69)   80.60 (4.89)   77.17 (2.23)  80.43 (2.43)  79.80 (5.13)
G2      82.40 (1.78)  85.50 (2.74)  81.63 (1.80)   79.50 (2.20)  81.33 (3.56)   79.57 (3.26)   79.43 (1.59)  88.60 (2.21)  79.23 (1.99)
G3      82.77 (1.68)  85.87 (2.72)  81.97 (1.70)   80.17 (2.08)  81.53 (3.55)   80.40 (3.07)   80.13 (1.51)  89.27 (2.18)  78.70 (1.81)
G4      84.13 (1.55)  84.90 (2.70)  82.97 (1.57)   79.63 (1.92)  80.83 (3.51)   79.63 (2.80)   81.83 (1.41)  90.90 (2.15)  82.03 (1.62)
G5      82.80 (1.18)  84.23 (2.61)  81.70 (1.18)   79.70 (1.44)  80.67 (3.38)   80.00 (2.00)   81.40 (1.12)  88.93 (2.07)  80.60 (1.15)
α = 90%
G1      93.77 (4.08)  94.33 (4.54)  93.13 (4.83)   85.83 (5.84)  86.10 (6.59)   86.07 (13.91)  86.13 (2.86)  89.37 (3.12)  90.43 (6.58)
G2      92.70 (2.59)  94.00 (4.06)  91.53 (2.93)   86.00 (3.78)  86.33 (6.19)   85.97 (8.66)   89.07 (2.04)  95.13 (2.84)  89.67 (2.55)
G3      92.83 (2.43)  93.97 (4.01)  91.67 (2.69)   86.80 (3.55)  86.80 (6.14)   86.73 (8.16)   89.50 (1.94)  95.77 (2.80)  90.20 (2.33)
G4      93.20 (2.23)  93.83 (3.94)  92.00 (2.48)   85.93 (3.26)  86.37 (6.07)   85.50 (7.39)   91.43 (1.81)  96.20 (2.76)  91.60 (2.08)
G5      92.17 (1.65)  93.10 (3.71)  91.00 (1.75)   86.53 (2.39)  86.57 (5.74)   86.33 (5.20)   91.30 (1.43)  95.23 (2.66)  90.23 (1.47)
α = 95%
G1      96.60 (6.02)  96.77 (6.72)  96.50 (10.31)  87.97 (9.34)  88.07 (10.58)  88.40 (22.55)  90.33 (3.41)  93.63 (3.71)  95.07 (7.84)
G2      96.17 (3.64)  96.67 (5.67)  95.43 (5.73)   88.80 (5.86)  88.80 (9.64)   88.93 (13.23)  93.73 (2.43)  97.57 (3.38)  95.00 (3.04)
G3      96.07 (3.38)  96.30 (5.53)  95.57 (5.13)   89.13 (5.48)  89.00 (9.53)   89.80 (12.69)  93.97 (2.31)  98.00 (3.34)  95.67 (2.77)
G4      95.93 (3.09)  96.63 (5.41)  95.40 (4.59)   88.53 (5.03)  88.30 (9.42)   88.67 (11.26)  95.33 (2.16)  98.03 (3.28)  96.00 (2.48)
G5      96.17 (2.22)  96.40 (4.95)  95.47 (2.99)   88.83 (3.63)  88.63 (8.76)   89.07 (8.17)   95.77 (1.71)  97.77 (3.16)  94.90 (1.75)

6 Discussion

In this study, we put forward parametric bootstrap approaches to construct prediction intervals in the context of small area estimation under a general (non-normal) area-level model. Our simulation results show that the proposed single bootstrap method with the Fay-Herriot variance estimator performs well in all cases. Moreover, it is more efficient in terms of average lengths than the existing parametric bootstrap method proposed by Hall and Maiti (2006).

When the number of areas is small, there is a high likelihood of obtaining negative estimates of the variance $A$ when applying the Prasad-Rao method. Throughout our simulation studies, we noticed that the likelihood of obtaining negative variance estimates might affect the performance of the parametric bootstrap methods in terms of both coverage probabilities and average lengths. One might consider a better variance estimator, such as the Fay-Herriot estimator we used, or a sensible truncation of negative estimates. In this study, we arbitrarily truncated the negative estimates of the variance $A$ at 0.01, similar to Datta et al. (2005). To this end, in the future we will consider an extension of adjusted maximum likelihood estimators of $A$, such as the ones considered by Li and Lahiri (2010) or Hirose and Lahiri (2018), to the model proposed in this paper.

Another issue is that even though the double bootstrap calibration can bring the coverage accuracy to $o(m^{-1})$ regardless of the existence of a pivot, our simulations suggest that it is not always beneficial to attempt to boost the theoretical coverage probability via the double bootstrap while disregarding other properties of the interval. Specifically, the variability of calibrated intervals is greater than that of uncalibrated ones, the minimum length property is almost never preserved, and the results are quite dependent on the parameters and fixed constants of the problem, such as the estimation of the variance component $A$ in this study. For instance, Table 5 shows that for $m=50$ the proposed single bootstrap intervals already produce good performance, and thus the calibration using the double bootstrap has little or no effect. When $m$ is relatively small, i.e., $m=15$, the double bootstrap improves the coverage probability marginally but produces much larger interval lengths than the corresponding single bootstrap method.

Appendices

Appendix A Appendix

Proof of Theorem 3.1: Under regularity conditions, with a pivot H_{i}(\bm{\beta},A), we have

\begin{split}F_{1i}(q_{\alpha})&=\mathrm{P}((\theta_{i}-\hat{\theta}_{i})/\sqrt{\hat{g}_{1i}}\leq q_{\alpha})\\ &=E\left[\mathrm{P}\left((\theta_{i}-\tilde{\theta}_{i})/\sqrt{g_{1i}}\leq q_{\alpha}+(\hat{\theta}_{i}-\tilde{\theta}_{i}+q_{\alpha}(\sqrt{\hat{g}_{1i}}-\sqrt{g_{1i}}))/\sqrt{g_{1i}}\right)\right]\\ &=E\left[F_{2i}(q_{\alpha}+Q(q_{\alpha},\bm{Y}))\right]\\ &=F_{2i}(q_{\alpha})+m^{-1}\gamma(\bm{\beta},A;q_{\alpha})+O(m^{-3/2}),\\ \end{split} (A.1)

where Q(q_{\alpha},\bm{Y})=(\hat{\theta}_{i}-\tilde{\theta}_{i}+q_{\alpha}(\sqrt{\hat{g}_{1i}}-\sqrt{g_{1i}}))/\sqrt{g_{1i}}, and \gamma is a smooth function of order O(1). The last equality follows from results similar to those in Chapter 4 of Li (2007); see Appendix F for details. Then, from equations (3.1) and (A.1), we have

\begin{split}\text{E}\left[\mathrm{P}^{*}(H_{i}(\hat{\bm{\beta}}^{*},\hat{A}^{*})\leq q_{\hat{\alpha}_{u}^{s}}\,\big{|}\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})\right]&=\text{E}\left[F_{2i}(q_{\hat{\alpha}_{u}^{s}})+m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}_{u}^{s}})\right]+O(m^{-3/2})\\ &=1-\alpha/2,\\ \text{E}\left[\mathrm{P}^{*}(H_{i}(\hat{\bm{\beta}}^{*},\hat{A}^{*})\leq q_{\hat{\alpha}_{l}^{s}}\,\big{|}\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})\right]&=\text{E}\left[F_{2i}(q_{\hat{\alpha}_{l}^{s}})+m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}_{l}^{s}})\right]+O(m^{-3/2})\\ &=\alpha/2.\end{split} (A.2)

Therefore, we have

E[F2i(qα^us)]=1α/2E[m1γ(𝜷^,A^;qα^us)]+O(m3/2),\begin{split}\text{E}\left[F_{2i}(q_{\hat{\alpha}_{u}^{s}})\right]=1-\alpha/2-\text{E}\left[m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}_{u}^{s}})\right]+O(m^{-3/2}),\end{split} (A.3)

and

E[F2i(qα^ls)]=α/2E[m1γ(𝜷^,A^;qα^ls)]+O(m3/2).\begin{split}\text{E}\left[F_{2i}(q_{\hat{\alpha}_{l}^{s}})\right]=\alpha/2-\text{E}\left[m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}_{l}^{s}})\right]+O(m^{-3/2}).\end{split} (A.4)
Combining (A.1) with (A.3) and (A.4), we obtain

\begin{split}\mathrm{P}&\left(\hat{\theta}_{i}+q_{\hat{\alpha}_{l}^{s}}\sqrt{\hat{g}_{1i}}\leq\theta_{i}\leq\hat{\theta}_{i}+q_{\hat{\alpha}_{u}^{s}}\sqrt{\hat{g}_{1i}}\right)\\ =&\mathrm{P}\left(H_{i}(\hat{\bm{\beta}},\hat{A})\leq q_{\hat{\alpha}_{u}^{s}}\right)-\mathrm{P}\left(H_{i}(\hat{\bm{\beta}},\hat{A})\leq q_{\hat{\alpha}_{l}^{s}}\right)\\ =&\text{E}\left[F_{2i}(q_{\hat{\alpha}_{u}^{s}})+m^{-1}\gamma(\bm{\beta},A;q_{\hat{\alpha}_{u}^{s}})\right]-\text{E}\left[F_{2i}(q_{\hat{\alpha}_{l}^{s}})+m^{-1}\gamma(\bm{\beta},A;q_{\hat{\alpha}_{l}^{s}})\right]+O(m^{-3/2})\\ =&1-\alpha/2+\text{E}\left[m^{-1}\gamma(\bm{\beta},A;q_{\hat{\alpha}_{u}^{s}})-m^{-1}\gamma(\bm{\hat{\beta}},\hat{A};q_{\hat{\alpha}_{u}^{s}})\right]-\alpha/2\\ &+\text{E}\left[m^{-1}\gamma(\bm{\hat{\beta}},\hat{A};q_{\hat{\alpha}_{l}^{s}})-m^{-1}\gamma(\bm{\beta},A;q_{\hat{\alpha}_{l}^{s}})\right]+O(m^{-3/2})\\ =&1-\alpha+O(m^{-3/2}).\end{split} (A.5)

Appendix B Appendix

Proof of Proposition 3.1: Suppose H_{i}(\bm{\beta},A) is not a pivot; that is, F_{2i} depends on some unknown true parameters \bm{\nu}. Here, we rewrite the distribution function of H_{i}(\bm{\beta},A) as F_{2i}(\bm{\nu}), suppressing the fixed argument, and Taylor expand F_{2i}(\bm{\nu}) around \bm{\nu}=\bm{\nu}_{1}, obtaining

F2i(𝝂)=F2i(𝝂1)+(𝝂𝝂𝟏)F2i(𝝂1)𝝂+12(𝝂𝝂𝟏)2F2i(𝝂1)𝝂𝝂(𝝂𝝂𝟏)+𝐑(𝝂),\begin{split}F_{2i}(\bm{\nu})&=F_{2i}(\bm{\nu}_{1})+(\bm{\nu}-\bm{\nu_{1}})^{\prime}\frac{\partial F_{2i}(\bm{\nu}_{1})}{\partial\bm{\nu}}+\frac{1}{2}(\bm{\nu}-\bm{\nu_{1}})^{\prime}\frac{\partial^{2}F_{2i}(\bm{\nu}_{1})}{\partial\bm{\nu}\partial\bm{\nu^{\prime}}}(\bm{\nu}-\bm{\nu_{1}})+\mathbf{R}(\bm{\nu}),\end{split} (B.1)

where

\mathbf{R}(\bm{\nu})=\frac{1}{6}\sum_{j,k,l}\frac{\partial^{3}F_{2i}}{\partial\nu_{j}\partial\nu_{k}\partial\nu_{l}}\bigg{|}_{\bm{\nu}^{*}}(\nu_{j}-\nu_{1j})(\nu_{k}-\nu_{1k})(\nu_{l}-\nu_{1l})

with 𝝂\bm{\nu^{*}} lying between 𝝂\bm{\nu} and 𝝂𝟏\bm{\nu_{1}}.

Under the regularity conditions, we have,

\begin{split}\text{E}\left[(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\frac{\partial F_{2i}(\bm{\nu})}{\partial\bm{\nu}}\right]+\text{E}\left[\frac{1}{2}(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\frac{\partial^{2}F_{2i}(\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}(\bm{\hat{\nu}}-\bm{\nu})\right]=m^{-1}c(\bm{\nu}),\end{split} (B.2)

where cc denotes a smooth function of order O(1)O(1), and

\text{E}\left[\mathbf{R}(\bm{\hat{\nu}})\right]=\frac{1}{6}\sum_{j,k,l}\text{E}\left[\frac{\partial^{3}F_{2i}(\bm{\nu}^{*})}{\partial\nu_{j}\partial\nu_{k}\partial\nu_{l}}(\hat{\nu}_{j}-\nu_{j})(\hat{\nu}_{k}-\nu_{k})(\hat{\nu}_{l}-\nu_{l})\right]=o(m^{-1}) (B.3)

with 𝝂\bm{\nu^{*}} lying between 𝝂^\bm{\hat{\nu}} and 𝝂\bm{\nu}.

Thus, using equations (3.1) and (A.1), when H_{i}(\bm{\beta},A) is not a pivot, we have

E[P(Hi(𝜷^,A^)qα^us|𝜷^,A^,ϕ^)]=E[F2i(qα^us;𝝂^)+m1γ(𝜷^,A^;qα^us)]+o(m1)=E[F2i(qα^us;𝝂)+m1c(𝝂;qα^us)+m1γ(𝜷^,A^;qα^us)]+o(m1)=1α/2,\begin{split}\text{E}\left[\mathrm{P}^{*}(H_{i}(\hat{\bm{\beta}}^{*},\hat{A}^{*})\leq q_{\hat{\alpha}_{u}^{s}}\,\big{|}\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})\right]&=\text{E}\left[F_{2i}(q_{\hat{\alpha}_{u}^{s}};\hat{\bm{\nu}})+m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}_{u}^{s}})\right]+o(m^{-1})\\ &=\text{E}\left[F_{2i}(q_{\hat{\alpha}_{u}^{s}};\bm{\nu})+m^{-1}c(\bm{\nu};q_{\hat{\alpha}_{u}^{s}})+m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}_{u}^{s}})\right]+o(m^{-1})\\ &=1-\alpha/2,\\ \end{split} (B.4)
E[P(Hi(𝜷^,A^)qα^ls|𝜷^,A^,ϕ^)]=E[F2i(qα^ls;𝝂^)+m1γ(𝜷^,A^;qα^ls)]+o(m1)=E[F2i(qα^ls;𝝂)+m1c(𝝂;qα^ls)+m1γ(𝜷^,A^;qα^ls)]+o(m1)=α/2.\begin{split}\text{E}\left[\mathrm{P}^{*}(H_{i}(\hat{\bm{\beta}}^{*},\hat{A}^{*})\leq q_{\hat{\alpha}_{l}^{s}}\,\big{|}\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})\right]&=\text{E}\left[F_{2i}(q_{\hat{\alpha}_{l}^{s}};\hat{\bm{\nu}})+m^{-1}\gamma(\bm{\hat{\beta}},\hat{A};q_{\hat{\alpha}_{l}^{s}})\right]+o(m^{-1})\\ &=\text{E}\left[F_{2i}(q_{\hat{\alpha}_{l}^{s}};\bm{\nu})+m^{-1}c(\bm{\nu};q_{\hat{\alpha}_{l}^{s}})+m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}_{l}^{s}})\right]+o(m^{-1})\\ &=\alpha/2.\end{split} (B.5)

Finally, we obtain,

P(θ^i+qα^lsg^1iθiθ^i+qα^usg^1i)=E[F2i(qα^us;𝝂)+m1γ(𝜷,A;qα^us)]E[F2i(qα^ls;𝝂)+m1γ(𝜷,A;qα^ls)]+o(m1)=E[1α/2m1c(𝝂;qα^us)m1γ(𝜷^,A^;qα^us)+m1γ(𝜷,A;qα^us)]E[α/2m1c(𝝂;qα^ls)m1γ(𝜷^,A^;qα^ls)+m1γ(𝜷,A;qα^ls)]+o(m1)=1αE[m1c(𝝂;qα^us)m1c(𝝂;qα^ls)]+o(m1)=1α+O(m1).\begin{split}\mathrm{P}&\left(\hat{\theta}_{i}+q_{\hat{\alpha}_{l}^{s}}\sqrt{\hat{g}_{1i}}\leq\theta_{i}\leq\hat{\theta}_{i}+q_{\hat{\alpha}_{u}^{s}}\sqrt{\hat{g}_{1i}}\right)\\ =&\text{E}\left[F_{2i}(q_{\hat{\alpha}_{u}^{s}};\bm{\nu})+m^{-1}\gamma(\bm{\beta},A;q_{\hat{\alpha}_{u}^{s}})\right]-\text{E}\left[F_{2i}(q_{\hat{\alpha}_{l}^{s}};\bm{\nu})+m^{-1}\gamma(\bm{\beta},A;q_{\hat{\alpha}_{l}^{s}})\right]+o(m^{-1})\\ =&\text{E}\left[1-\alpha/2-m^{-1}c(\bm{\nu};q_{\hat{\alpha}_{u}^{s}})-m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}_{u}^{s}})+m^{-1}\gamma(\bm{\beta},A;q_{\hat{\alpha}_{u}^{s}})\right]\\ &-\text{E}\left[\alpha/2-m^{-1}c(\bm{\nu};q_{\hat{\alpha}_{l}^{s}})-m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}_{l}^{s}})+m^{-1}\gamma(\bm{\beta},A;q_{\hat{\alpha}_{l}^{s}})\right]+o(m^{-1})\\ =&1-\alpha-\text{E}\left[m^{-1}c(\bm{\nu};q_{\hat{\alpha}_{u}^{s}})-m^{-1}c(\bm{\nu};q_{\hat{\alpha}_{l}^{s}})\right]+o(m^{-1})\\ =&1-\alpha+O(m^{-1}).\end{split} (B.6)

Appendix C Appendix

Proof of Proposition 3.2: From the third equality of (B.6) and from (B.2), we have

\begin{split}T_{1}&=-\text{E}\left[m^{-1}c(\bm{\nu};q_{\hat{\alpha}_{u}^{s}})-m^{-1}c(\bm{\nu};q_{\hat{\alpha}_{l}^{s}})\right]\\ &=-\text{E}\left[(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\Bigl{\{}\frac{\partial F_{2i}(q_{\hat{\alpha}_{u}^{s}};\bm{\nu})}{\partial\bm{\nu}}-\frac{\partial F_{2i}(q_{\hat{\alpha}_{l}^{s}};\bm{\nu})}{\partial\bm{\nu}}\Bigr{\}}+\frac{1}{2}(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\Bigl{\{}\frac{\partial^{2}F_{2i}(q_{\hat{\alpha}_{u}^{s}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}-\frac{\partial^{2}F_{2i}(q_{\hat{\alpha}_{l}^{s}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}\Bigr{\}}(\bm{\hat{\nu}}-\bm{\nu})\right]\\ &=-\text{E}\left[(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\frac{\partial}{\partial\bm{\nu}}\Bigl{\{}F_{2i}(q_{\alpha_{u}};\bm{\nu})+\frac{\partial F_{2i}(q_{\alpha_{u}};\bm{\nu})}{\partial\bm{\nu}^{\prime}}(\bm{\hat{\nu}}-\bm{\nu})\Bigr{\}}-(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\frac{\partial}{\partial\bm{\nu}}\Bigl{\{}F_{2i}(q_{\alpha_{l}};\bm{\nu})+\frac{\partial F_{2i}(q_{\alpha_{l}};\bm{\nu})}{\partial\bm{\nu}^{\prime}}(\bm{\hat{\nu}}-\bm{\nu})\Bigr{\}}\right]\\ &\quad-\text{E}\left[\frac{1}{2}(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\Bigl{\{}\frac{\partial^{2}F_{2i}(q_{\alpha_{u}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}-\frac{\partial^{2}F_{2i}(q_{\alpha_{l}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}\Bigr{\}}(\bm{\hat{\nu}}-\bm{\nu})\right]+o(m^{-1})\\ &=-\text{E}\left[(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\Bigl{\{}\frac{\partial F_{2i}(q_{\alpha_{u}};\bm{\nu})}{\partial\bm{\nu}}-\frac{\partial F_{2i}(q_{\alpha_{l}};\bm{\nu})}{\partial\bm{\nu}}\Bigr{\}}\right]\\ &\quad-\frac{3}{2}\text{E}\left[(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\Bigl{\{}\frac{\partial^{2}F_{2i}(q_{\alpha_{u}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}-\frac{\partial^{2}F_{2i}(q_{\alpha_{l}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}\Bigr{\}}(\bm{\hat{\nu}}-\bm{\nu})\right]+o(m^{-1}),\end{split} (C.1)

where q_{\alpha_{u}} and q_{\alpha_{l}} are unknown but fixed quantities.

In view of the above, we can write

T1=T11+T12+o(m1),T_{1}=T_{11}+T_{12}+o(m^{-1}), (C.2)

where

\begin{split}T_{11}&=-\text{E}\left[(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\Bigl{\{}\frac{\partial F_{2i}(q_{\alpha_{u}};\bm{\nu})}{\partial\bm{\nu}}-\frac{\partial F_{2i}(q_{\alpha_{l}};\bm{\nu})}{\partial\bm{\nu}}\Bigr{\}}\right],\\ T_{12}&=-\frac{3}{2}\text{E}\left[(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\Bigl{\{}\frac{\partial^{2}F_{2i}(q_{\alpha_{u}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}-\frac{\partial^{2}F_{2i}(q_{\alpha_{l}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}\Bigr{\}}(\bm{\hat{\nu}}-\bm{\nu})\right].\end{split}

With the assumption of symmetrically distributed random effects uiu_{i}, it is easy to show that F2i(qαu;𝝂)=1F2i(qαl;𝝂)F_{2i}(q_{\alpha_{u}};\bm{\nu})=1-F_{2i}(q_{\alpha_{l}};\bm{\nu}). Hence, we have

\begin{split}T_{11}&=-2\text{E}\left[(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\frac{\partial F_{2i}(q_{\alpha_{u}};\bm{\nu})}{\partial\bm{\nu}}\right],\\ T_{12}&=-3\text{E}\left[(\bm{\hat{\nu}}-\bm{\nu})^{\prime}\frac{\partial^{2}F_{2i}(q_{\alpha_{u}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}(\bm{\hat{\nu}}-\bm{\nu})\right].\end{split}
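To spell out the sign flip used here: differentiating the symmetry identity with respect to \bm{\nu} (the quantiles q_{\alpha_{u}} and q_{\alpha_{l}} are fixed) gives

F_{2i}(q_{\alpha_{u}};\bm{\nu})=1-F_{2i}(q_{\alpha_{l}};\bm{\nu})\;\Longrightarrow\;\frac{\partial F_{2i}(q_{\alpha_{u}};\bm{\nu})}{\partial\bm{\nu}}=-\frac{\partial F_{2i}(q_{\alpha_{l}};\bm{\nu})}{\partial\bm{\nu}},\qquad\frac{\partial^{2}F_{2i}(q_{\alpha_{u}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}}=-\frac{\partial^{2}F_{2i}(q_{\alpha_{l}};\bm{\nu})}{\partial\bm{\nu}\partial\bm{\nu}^{\prime}},

so each braced difference above equals twice the corresponding upper-quantile term, which produces the factors 2 and 3=(3/2)\cdot 2 in T_{11} and T_{12}.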

Given the assumptions (i)(i)-(iii)(iii), we have

T11+T12=O(m1)>0,T_{11}+T_{12}=O(m^{-1})>0,

under the regularity conditions. Thus, T1>0T_{1}>0.

Appendix D Appendix

Proof of Proposition 4.1: Let M_{i}(\bm{\beta},A)=(\theta_{i}-f_{i}(\bm{\beta}))/\sqrt{A}\sim J_{2i}(\bm{\delta}) and M_{i}(\hat{\bm{\beta}},\hat{A})=(\theta_{i}-f_{i}(\hat{\bm{\beta}}))/\sqrt{\hat{A}}\sim J_{1i}(\bm{\delta}), where \bm{\delta} is a vector of unknown parameters. We have

\begin{split}\text{E}\left[\mathrm{P}^{*}(M_{i}(\hat{\bm{\beta}}^{*},\hat{A}^{*})\leq b_{\hat{\alpha}_{u}^{s}}\,\big{|}\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})\right]&=\text{E}\left[J_{2i}(b_{\hat{\alpha}_{u}^{s}};\hat{\bm{\delta}})+m^{-1}c_{1}(\hat{\bm{\beta}},\hat{A};b_{\hat{\alpha}_{u}^{s}})\right]+o(m^{-1})\\ &=\text{E}\left[J_{2i}(b_{\hat{\alpha}_{u}^{s}};\bm{\delta})+m^{-1}c_{2}(\bm{\delta};b_{\hat{\alpha}_{u}^{s}})+m^{-1}c_{1}(\hat{\bm{\beta}},\hat{A};b_{\hat{\alpha}_{u}^{s}})\right]+o(m^{-1})\\ &=1-\alpha/2,\\ \end{split} (D.1)
E[P(Mi(𝜷^,A^)bα^ls|𝜷^,A^,ϕ^)]=E[J2i(bα^ls;𝜹^)+m1c1(𝜷^,A^;bα^ls)]+o(m1)=E[J2i(bα^ls;𝜹)+m1c2(𝜹;bα^ls)+m1c1(𝜷^,A^;bα^ls)]+o(m1)=α/2,\begin{split}\text{E}\left[\mathrm{P}^{*}(M_{i}(\hat{\bm{\beta}}^{*},\hat{A}^{*})\leq b_{\hat{\alpha}_{l}^{s}}\,\big{|}\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})\right]&=\text{E}\left[J_{2i}(b_{\hat{\alpha}_{l}^{s}};\hat{\bm{\delta}})+m^{-1}c_{1}(\bm{\hat{\beta}},\hat{A};b_{\hat{\alpha}_{l}^{s}})\right]+o(m^{-1})\\ &=\text{E}\left[J_{2i}(b_{\hat{\alpha}_{l}^{s}};\bm{\delta})+m^{-1}c_{2}(\bm{\delta};b_{\hat{\alpha}_{l}^{s}})+m^{-1}c_{1}(\hat{\bm{\beta}},\hat{A};b_{\hat{\alpha}_{l}^{s}})\right]+o(m^{-1})\\ &=\alpha/2,\end{split} (D.2)

where c1c_{1} and c2c_{2} are smooth functions of order O(1)O(1).

Eventually, we have

\begin{split}\mathrm{P}&(f_{i}(\hat{\bm{\beta}})+b_{\hat{\alpha}_{l}^{s}}\hat{A}^{1/2}\leq\theta_{i}\leq f_{i}(\hat{\bm{\beta}})+b_{\hat{\alpha}_{u}^{s}}\hat{A}^{1/2})\\ =&\text{E}\left[J_{2i}(b_{\hat{\alpha}_{u}^{s}};\bm{\delta})+m^{-1}c_{1}(\bm{\beta},A;b_{\hat{\alpha}_{u}^{s}})\right]-\text{E}\left[J_{2i}(b_{\hat{\alpha}_{l}^{s}};\bm{\delta})+m^{-1}c_{1}(\bm{\beta},A;b_{\hat{\alpha}_{l}^{s}})\right]+o(m^{-1})\\ =&\text{E}\left[1-\alpha/2-m^{-1}c_{2}(\bm{\delta};b_{\hat{\alpha}_{u}^{s}})-m^{-1}c_{1}(\hat{\bm{\beta}},\hat{A};b_{\hat{\alpha}_{u}^{s}})+m^{-1}c_{1}(\bm{\beta},A;b_{\hat{\alpha}_{u}^{s}})\right]\\ &-\text{E}\left[\alpha/2-m^{-1}c_{2}(\bm{\delta};b_{\hat{\alpha}_{l}^{s}})-m^{-1}c_{1}(\hat{\bm{\beta}},\hat{A};b_{\hat{\alpha}_{l}^{s}})+m^{-1}c_{1}(\bm{\beta},A;b_{\hat{\alpha}_{l}^{s}})\right]+o(m^{-1})\\ =&1-\alpha-\text{E}\left[m^{-1}c_{2}(\bm{\delta};b_{\hat{\alpha}_{u}^{s}})-m^{-1}c_{2}(\bm{\delta};b_{\hat{\alpha}_{l}^{s}})\right]+o(m^{-1})\\ =&1-\alpha+O(m^{-1}).\end{split} (D.3)

Appendix E Appendix

Proof of Theorem 4.1: With equations (4.6) and (A.1), for the upper limit, we can obtain

E[α^u]=E[P(Hi(𝜷^,A^)qα^ud|𝜷^,A^,ϕ^)]=E[F1i(qα^ud;𝝂^)]=E[F2i(qα^ud;𝝂^)+m1γ(𝜷^,A^;qα^ud)]+O(m3/2)=E[F2i(qα^ud;𝝂)+m1c(𝝂;qα^ud)+m1γ(𝜷^,A^;qα^ud)]+o(m1).\begin{split}\text{E}\left[\hat{\alpha}_{u}\right]&=\text{E}\left[\mathrm{P}^{*}(H_{i}(\hat{\bm{\beta}}^{*},\hat{A}^{*})\leq q_{\hat{\alpha}^{d}_{u}}\,|\,\hat{\bm{\beta}},\hat{A},\hat{\bm{\phi}})\right]\\ &=\text{E}\left[F_{1i}(q_{\hat{\alpha}^{d}_{u}};\hat{\bm{\nu}})\right]\\ &=\text{E}\left[F_{2i}(q_{\hat{\alpha}^{d}_{u}};\hat{\bm{\nu}})+m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}^{d}_{u}})\right]+O(m^{-3/2})\\ &=\text{E}\left[F_{2i}(q_{\hat{\alpha}^{d}_{u}};\bm{\nu})+m^{-1}c(\bm{\nu};q_{\hat{\alpha}^{d}_{u}})+m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}^{d}_{u}})\right]+o(m^{-1}).\end{split} (E.1)

Replacing αu\alpha_{u} by α^u\hat{\alpha}_{u} in the system of equations (4.3), we obtain

E[1α/2α^u]=E[F1i(qα^u;𝝂^)F1i(qα^u;𝝂^)]=E[F2i(qα^u;𝝂^)F2i(qα^u;𝝂^)+m1γ(𝜷^,A^;qα^u)m1γ(𝜷^,A^;qα^u)]+O(m3/2)=E[F2i(qα^u;𝝂^)F2i(qα^u;𝝂^)]+O(m3/2)=E[F2i(qα^u;𝝂^)F2i(qα^u;𝝂^)m1c(𝝂^;qα^u)]+o(m1)=E[m1c(𝝂^;qα^u)]+o(m1).\begin{split}&\text{E}\left[1-\alpha/2-\hat{\alpha}_{u}\right]\\ &=\text{E}\left[F_{1i}(q_{\hat{\alpha}_{u}};\hat{\bm{\nu}})-F_{1i}(q_{\hat{\alpha}_{u}};\hat{\bm{\nu}}^{*})\right]\\ &=\text{E}\left[F_{2i}(q_{\hat{\alpha}_{u}};\hat{\bm{\nu}})-F_{2i}(q_{\hat{\alpha}_{u}};\hat{\bm{\nu}}^{*})+m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}_{u}})-m^{-1}\gamma(\hat{\bm{\beta}}^{*},\hat{A}^{*};q_{\hat{\alpha}_{u}})\right]+O(m^{-3/2})\\ &=\text{E}\left[F_{2i}(q_{\hat{\alpha}_{u}};\hat{\bm{\nu}})-F_{2i}(q_{\hat{\alpha}_{u}};\hat{\bm{\nu}}^{*})\right]+O(m^{-3/2})\\ &=\text{E}\left[F_{2i}(q_{\hat{\alpha}_{u}};\hat{\bm{\nu}})-F_{2i}(q_{\hat{\alpha}_{u}};\hat{\bm{\nu}})-m^{-1}c(\hat{\bm{\nu}};q_{\hat{\alpha}_{u}})\right]+o(m^{-1})\\ &=\text{E}\left[-m^{-1}c(\hat{\bm{\nu}};q_{\hat{\alpha}_{u}})\right]+o(m^{-1}).\end{split} (E.2)

Therefore,

E[α^u]=1α/2+E[m1c(𝝂^;qα^u)]+o(m1).\text{E}\left[\hat{\alpha}_{u}\right]=1-\alpha/2+\text{E}\left[m^{-1}c(\hat{\bm{\nu}};q_{\hat{\alpha}_{u}})\right]+o(m^{-1}). (E.3)

For the upper limit, using the last equation of (E.1) and (E.3), we have

P(θiθ^i+qα^udg^1i)=E[F1i(qα^ud)]=E[F2i(qα^ud;𝝂)+m1γ(𝜷,A;qα^ud)]+O(m3/2)=E[1α/2+m1c(𝝂^;qα^ud)m1c(𝝂;qα^ud)m1γ(𝜷^,A^;qα^ud)+m1γ(𝜷,A;qα^ud)]+o(m1)=1α/2+o(m1).\begin{split}&\mathrm{P}(\theta_{i}\leq\hat{\theta}_{i}+q_{\hat{\alpha}^{d}_{u}}\sqrt{\hat{g}_{1i}})\\ &=\text{E}\left[F_{1i}(q_{\hat{\alpha}^{d}_{u}})\right]\\ &=\text{E}\left[F_{2i}(q_{\hat{\alpha}^{d}_{u}};\bm{\nu})+m^{-1}\gamma(\bm{\beta},A;q_{\hat{\alpha}^{d}_{u}})\right]+O(m^{-3/2})\\ &=\text{E}\left[1-\alpha/2+m^{-1}c(\hat{\bm{\nu}};q_{\hat{\alpha}^{d}_{u}})-m^{-1}c(\bm{\nu};q_{\hat{\alpha}^{d}_{u}})-m^{-1}\gamma(\hat{\bm{\beta}},\hat{A};q_{\hat{\alpha}^{d}_{u}})+m^{-1}\gamma(\bm{\beta},A;q_{\hat{\alpha}^{d}_{u}})\right]+o(m^{-1})\\ &=1-\alpha/2+o(m^{-1}).\end{split} (E.4)

Similarly, for the lower limit, we have

P(θiθ^i+qα^ldg^1i)=E[F1i(qα^ld)]=α/2+o(m1).\begin{split}\mathrm{P}(\theta_{i}\leq\hat{\theta}_{i}+q_{\hat{\alpha}^{d}_{l}}\sqrt{\hat{g}_{1i}})&=\text{E}\left[F_{1i}(q_{\hat{\alpha}^{d}_{l}})\right]\\ &=\alpha/2+o(m^{-1}).\end{split} (E.5)

Finally, we obtain

P(θ^i+qα^ldg^1iθiθ^i+qα^udg^1i)=E[F1i(qα^ud)F1i(qα^ld)]=1α/2α/2+o(m1)=1α+o(m1).\begin{split}&\mathrm{P}(\hat{\theta}_{i}+q_{\hat{\alpha}^{d}_{l}}\sqrt{\hat{g}_{1i}}\leq\theta_{i}\leq\hat{\theta}_{i}+q_{\hat{\alpha}^{d}_{u}}\sqrt{\hat{g}_{1i}})\\ &=\text{E}\left[F_{1i}(q_{\hat{\alpha}^{d}_{u}})-F_{1i}(q_{\hat{\alpha}^{d}_{l}})\right]\\ &=1-\alpha/2-\alpha/2+o(m^{-1})\\ &=1-\alpha+o(m^{-1}).\end{split} (E.6)

Appendix F Appendix

Let f_{2i}(\cdot) denote the probability density function of H_{i}(\bm{\beta},A), and let f^{\prime}_{2i} and f^{\prime\prime}_{2i} be its first and second derivatives. Then (A.1) can be expressed as:

\begin{split}F_{1i}(q_{\alpha})&=\mathrm{P}((\theta_{i}-\hat{\theta}_{i})/\sqrt{\hat{g}_{1i}}\leq q_{\alpha})\\ &=E\left[F_{2i}(q_{\alpha}+Q(q_{\alpha},\bm{Y}))\right]\\ &=F_{2i}(q_{\alpha})+f_{2i}(q_{\alpha})\text{E}Q+\frac{1}{2}f^{\prime}_{2i}(q_{\alpha})\text{E}Q^{2}+\frac{1}{2}\text{E}\left[\int_{q_{\alpha}}^{q_{\alpha}+Q}f^{\prime\prime}_{2i}(t)(q_{\alpha}+Q-t)^{2}dt\right],\end{split} (F.1)

where Q=Q(q_{\alpha},\bm{Y}). Note that for t\in(q_{\alpha},q_{\alpha}+Q), we have 0\leq|q_{\alpha}+Q-t|\leq|Q|. Assuming that f^{\prime\prime}_{2i}, the third derivative of F_{2i}, is uniformly bounded, we have

Eqαqα+Qf2i′′(t)(qα+Qt)2𝑑tCE|Q|3\text{E}\int_{q_{\alpha}}^{q_{\alpha}+Q}f^{\prime\prime}_{2i}(t)(q_{\alpha}+Q-t)^{2}dt\leq C\text{E}|Q|^{3} (F.2)

for some constant C>0. Then, establishing the last equality of (A.1) reduces to evaluating \text{E}Q, \text{E}Q^{2}, and \text{E}|Q|^{3}. In particular, if \text{E}Q=O(m^{-1}), \text{E}Q^{2}=O(m^{-1}) and \text{E}Q^{4}=O(m^{-2}), then it follows that

E|Q|3(EQ4)3/4=O(m3/2),\text{E}|Q|^{3}\leq(\text{E}Q^{4})^{3/4}=O(m^{-3/2}), (F.3)

by the Lyapunov inequality so that

F1i(qα)=F2i(qα)+m1γ(𝜷,A;qα)+O(m3/2).F_{1i}(q_{\alpha})=F_{2i}(q_{\alpha})+m^{-1}\gamma(\bm{\beta},A;q_{\alpha})+O(m^{-3/2}).

First, note that

𝜷^=(𝑿𝚺^𝟏𝑿)𝟏𝑿𝚺^𝟏𝒀=𝜷+(𝑿𝚺^𝟏𝑿)𝟏𝑿𝚺^𝟏(𝒖+𝒆).\bm{\hat{\beta}}=(\bm{X^{\prime}\hat{\Sigma}^{-1}X)^{-1}X^{\prime}\hat{\Sigma}^{-1}Y}=\bm{\beta}+(\bm{X^{\prime}\hat{\Sigma}^{-1}X)^{-1}X^{\prime}\hat{\Sigma}^{-1}(u+e)}.

We now simplify the expression for Q=Q(q_{\alpha},\bm{Y}):

θ^iθ~i=A^A^+Diyi+DiA^+Di𝒙i𝜷^AA+DiyiDiA+Di𝒙i𝜷=DiA+Di𝒙i(𝑿𝚺1𝑿)1𝑿𝚺1(𝒖+𝒆)+DiA^+Di𝒙i(𝑿𝚺^1𝑿)1𝑿𝚺^1(𝒖+𝒆)DiA+Di𝒙i(𝑿Σ1𝑿)1𝑿Σ1(𝒖+𝒆)+(A^A^+DiAA+Di)(ui+ei).\begin{split}\hat{\theta}_{i}-\tilde{\theta}_{i}=&\frac{\hat{A}}{\hat{A}+D_{i}}y_{i}+\frac{D_{i}}{\hat{A}+D_{i}}\bm{x}_{i}^{\prime}\bm{\hat{\beta}}-\frac{A}{A+D_{i}}y_{i}-\frac{D_{i}}{A+D_{i}}\bm{x}_{i}^{\prime}\bm{\beta}\\ =&\frac{D_{i}}{A+D_{i}}\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\bm{\Sigma}^{-1}(\bm{u}+\bm{e})\\ &+\frac{D_{i}}{\hat{A}+D_{i}}\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\hat{\bm{\Sigma}}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\hat{\bm{\Sigma}}^{-1}(\bm{u}+\bm{e})-\frac{D_{i}}{A+D_{i}}\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\Sigma^{-1}\bm{X})^{-1}\bm{X}^{\prime}\Sigma^{-1}(\bm{u}+\bm{e})\\ &+(\frac{\hat{A}}{\hat{A}+D_{i}}-\frac{A}{A+D_{i}})(u_{i}+e_{i}).\\ \end{split} (F.4)

In view of the above, similar to Li (2007) and Saegusa et al. (2022), we can write

Q(qα,𝒀)=Q1+Q2(𝒀)+Q3(𝒀)+Q4(𝒀).Q(q_{\alpha},\bm{Y})=Q_{1}+Q_{2}(\bm{Y})+Q_{3}(\bm{Y})+Q_{4}(\bm{Y}).

where

Q1=1g1iDiA+Di𝒙i(𝑿𝚺1𝑿)1𝑿𝚺1(𝒖+𝒆),Q2(𝒀)=1g1i[DiA^+Di𝒙i(𝑿𝚺^1𝑿)1𝑿𝚺^1(𝒖+𝒆)DiA+Di𝒙i(𝑿Σ1𝑿)1𝑿Σ1(𝒖+𝒆)],Q3(𝒀)=1g1i[(A^A^+DiAA+Di)(ui+ei)],Q4(𝒀)=qα(g^1ig1i)/g1i.\begin{split}Q_{1}=&\frac{1}{\sqrt{g_{1i}}}\frac{D_{i}}{A+D_{i}}\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\bm{\Sigma}^{-1}(\bm{u}+\bm{e}),\\ Q_{2}(\bm{Y})=&\frac{1}{\sqrt{g_{1i}}}\left[\frac{D_{i}}{\hat{A}+D_{i}}\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\hat{\bm{\Sigma}}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\hat{\bm{\Sigma}}^{-1}(\bm{u}+\bm{e})\right.\\ &\left.-\frac{D_{i}}{A+D_{i}}\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\Sigma^{-1}\bm{X})^{-1}\bm{X}^{\prime}\Sigma^{-1}(\bm{u}+\bm{e})\right],\\ Q_{3}(\bm{Y})=&\frac{1}{\sqrt{g_{1i}}}\left[(\frac{\hat{A}}{\hat{A}+D_{i}}-\frac{A}{A+D_{i}})(u_{i}+e_{i})\right],\\ Q_{4}(\bm{Y})=&q_{\alpha}(\sqrt{\hat{g}_{1i}}-\sqrt{g_{1i}})/\sqrt{g_{1i}}.\end{split}

Therefore, in order to obtain the last equality of (F.1), we need to show that \text{E}Q_{i}=O(m^{-1}), \text{E}Q_{i}^{2}=O(m^{-1}) and \text{E}Q_{i}^{4}=O(m^{-2}), i=1,\cdots,4. First, it is easy to show that \text{E}Q_{1}=0. Using the regularity condition that \text{sup}_{i\geq 1}h_{ii}\equiv\bm{x_{i}^{\prime}(X^{\prime}X)^{-1}x_{i}}=O(m^{-1}), we have

\begin{split}\text{E}Q_{1}^{2}&=\frac{D_{i}}{A(A+D_{i})}\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\bm{\Sigma}^{-1}\text{E}\left[(\bm{u}+\bm{e})(\bm{u}+\bm{e})^{\prime}\right]\bm{\Sigma}^{-1}\bm{X}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{x}_{i}\\ &=\frac{D_{i}}{A(A+D_{i})}h_{ii}\\ &=O(m^{-1}).\end{split} (F.5)

Under the assumption h_{ii}=O(m^{-1}), it is easy to see that

\begin{split}\tilde{h}_{ii}&=\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\bm{\Sigma}^{-2}\bm{X}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{x}_{i}\\ &=\bm{d_{i}^{\prime}P_{X}P_{X}^{\prime}d_{i}}\\ &=O(m^{-1}),\end{split}

where \bm{d_{i}} is a column vector with the i^{th} element unity and the rest zeros, and P_{X}=\bm{X(X^{\prime}\Sigma^{-1}X)^{-1}X^{\prime}\Sigma^{-1}}. An application of the Cauchy-Schwarz inequality yields

\begin{split}\text{E}Q_{1}^{4}&=\frac{D_{i}^{2}}{A^{2}(A+D_{i})^{2}}\text{E}(\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\bm{\Sigma}^{-1}(\bm{u}+\bm{e}))^{4}\\ &=\frac{D_{i}^{2}}{A^{2}(A+D_{i})^{2}}\text{E}(\bm{d_{i}^{\prime}P_{X}P_{X}(u+e)})^{4}\\ &\leq\frac{D_{i}^{2}}{A^{2}(A+D_{i})^{2}}(\bm{d_{i}^{\prime}P_{X}P_{X}^{\prime}d_{i}})^{2}\{\text{E}[\bm{(u+e)^{\prime}P_{X}^{\prime}P_{X}(u+e)}]\}^{2}\\ &=Cm^{-2}\{\text{tr}(\bm{X(X^{\prime}\Sigma^{-1}X)^{-1}X^{\prime}})\}^{2}\\ &=O(m^{-2}).\end{split} (F.6)

To evaluate the fourth moments of the remaining terms in Q, we need to evaluate the moments of (\hat{A}+D_{i})^{-1}-(A+D_{i})^{-1} and \hat{\bm{\Sigma}}^{-1}-\bm{\Sigma}^{-1}. To see this, for example, we can break Q_{2} down into more tractable variables and remainder terms:

Q2(𝒀)=1g1i[DiA^+Di𝒙i(𝑿𝚺^1𝑿)1𝑿𝚺^1(𝒖+𝒆)DiA+Di𝒙i(𝑿Σ1𝑿)1𝑿𝚺1(𝒖+𝒆)]=1g1i[(DiA^+DiDiA+Di)𝒙i((𝑿𝚺^1𝑿)1(𝑿𝚺1𝑿)1)𝑿(𝚺^1𝚺1)(𝒖+𝒆)+(DiA^+DiDiA+Di)𝒙i((𝑿𝚺^1𝑿)1(𝑿𝚺1𝑿)1)𝑿𝚺1(𝒖+𝒆)+(DiA^+DiDiA+Di)𝒙i(𝑿𝚺1𝑿)1𝑿(𝚺^1𝚺1)(𝒖+𝒆)+(DiA^+DiDiA+Di)𝒙i(𝑿𝚺1𝑿)1𝑿𝚺1(𝒖+𝒆)+DiA+Di𝒙i((𝑿𝚺^1𝑿)1(𝑿𝚺1𝑿)1)𝑿(𝚺^1𝚺1)(𝒖+𝒆)+DiA+Di𝒙i((𝑿𝚺^1𝑿)1(𝑿𝚺1𝑿)1)𝑿𝚺1(𝒖+𝒆)+DiA+Di𝒙i(𝑿𝚺1𝑿)1𝑿(𝚺^1𝚺1)(𝒖+𝒆)].\begin{split}Q_{2}(\bm{Y})=&\frac{1}{\sqrt{g_{1i}}}\left[\frac{D_{i}}{\hat{A}+D_{i}}\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\hat{\bm{\Sigma}}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\hat{\bm{\Sigma}}^{-1}(\bm{u}+\bm{e})-\frac{D_{i}}{A+D_{i}}\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\Sigma^{-1}\bm{X})^{-1}\bm{X}^{\prime}\bm{\Sigma}^{-1}(\bm{u}+\bm{e})\right]\\ =&\frac{1}{\sqrt{g_{1i}}}\bigg{[}\left(\frac{D_{i}}{\hat{A}+D_{i}}-\frac{D_{i}}{A+D_{i}}\right)\bm{x}_{i}^{\prime}\left((\bm{X}^{\prime}\hat{\bm{\Sigma}}^{-1}\bm{X})^{-1}-(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\right)\bm{X}^{\prime}\left(\hat{\bm{\Sigma}}^{-1}-\bm{\Sigma}^{-1}\right)(\bm{u+e})\\ &+\left(\frac{D_{i}}{\hat{A}+D_{i}}-\frac{D_{i}}{A+D_{i}}\right)\bm{x}_{i}^{\prime}\left((\bm{X}^{\prime}\hat{\bm{\Sigma}}^{-1}\bm{X})^{-1}-(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\right)\bm{X}^{\prime}\bm{\Sigma}^{-1}(\bm{u+e})\\ &+\left(\frac{D_{i}}{\hat{A}+D_{i}}-\frac{D_{i}}{A+D_{i}}\right)\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\left(\hat{\bm{\Sigma}}^{-1}-\bm{\Sigma}^{-1}\right)(\bm{u+e})\\ &+\left(\frac{D_{i}}{\hat{A}+D_{i}}-\frac{D_{i}}{A+D_{i}}\right)\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\bm{\Sigma}^{-1}(\bm{u+e})\\ &+\frac{D_{i}}{A+D_{i}}\bm{x}_{i}^{\prime}\left((\bm{X}^{\prime}\hat{\bm{\Sigma}}^{-1}\bm{X})^{-1}-(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\right)\bm{X}^{\prime}\left(\hat{\bm{\Sigma}}^{-1}-\bm{\Sigma}^{-1}\right)(\bm{u+e})\\ &+\frac{D_{i}}{A+D_{i}}\bm{x}_{i}^{\prime}\left((\bm{X}^{\prime}\hat{\bm{\Sigma}}^{-1}\bm{X})^{-1}-(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\right)\bm{X}^{\prime}\bm{\Sigma}^{-1}(\bm{u+e})\\ &+\frac{D_{i}}{A+D_{i}}\bm{x}_{i}^{\prime}(\bm{X}^{\prime}\bm{\Sigma}^{-1}\bm{X})^{-1}\bm{X}^{\prime}\left(\hat{\bm{\Sigma}}^{-1}-\bm{\Sigma}^{-1}\right)(\bm{u+e})\bigg{]}.\end{split} (F.7)

Using the Taylor series expansion, we have

DiA^+DiDiA+Di=(A^A)Di(A+Di)2+(A^A)2Di(A+Di)3(A^A)3Di(A+Di)4,\begin{split}&\frac{D_{i}}{\hat{A}+D_{i}}-\frac{D_{i}}{A+D_{i}}\\ =&-(\hat{A}-A)\frac{D_{i}}{(A+D_{i})^{2}}+(\hat{A}-A)^{2}\frac{D_{i}}{(A+D_{i})^{3}}-(\hat{A}-A)^{3}\frac{D_{i}}{(A^{*}+D_{i})^{4}},\end{split}
𝚺^1𝚺1=(A^A)diag{1(A+Di)2}i=1m+(A^A)2diag{1(A+Di)3}i=1m(A^A)3diag{1(A+Di)4}i=1m,\begin{split}&\hat{\bm{\Sigma}}^{-1}-\bm{\Sigma}^{-1}\\ =&-(\hat{A}-A)\text{diag}\Bigl{\{}\frac{1}{(A+D_{i})^{2}}\Bigr{\}}_{i=1}^{m}+(\hat{A}-A)^{2}\text{diag}\Bigl{\{}\frac{1}{(A+D_{i})^{3}}\Bigr{\}}_{i=1}^{m}\\ &-(\hat{A}-A)^{3}\text{diag}\Bigl{\{}\frac{1}{(A^{*}+D_{i})^{4}}\Bigr{\}}_{i=1}^{m},\end{split}

and for Q3Q_{3}, we have

A^A^+DiAA+Di=(A^A)Di(A+Di)2(A^A)2Di(A+Di)3+(A^A)3Di(A+Di)4,\begin{split}&\frac{\hat{A}}{\hat{A}+D_{i}}-\frac{A}{A+D_{i}}\\ =&(\hat{A}-A)\frac{D_{i}}{(A+D_{i})^{2}}-(\hat{A}-A)^{2}\frac{D_{i}}{(A+D_{i})^{3}}+(\hat{A}-A)^{3}\frac{D_{i}}{(A^{*}+D_{i})^{4}},\end{split}

where AA^{*} lies between AA and A^\hat{A}.

For the last term Q4Q_{4}, letting W=g1i1(g^1ig1i)W=g_{1i}^{-1}(\hat{g}_{1i}-g_{1i}), we have

Q4(qα,𝒀)=qα(g^1ig1i)/g1i=qα(g^1i/g1i1)=qα[{g1i1(g^1ig1i)+1}1/21]=qα[{W+1}1/21]=qα[W/2W2/8+rn],\begin{split}Q_{4}(q_{\alpha},\bm{Y})&=q_{\alpha}(\sqrt{\hat{g}_{1i}}-\sqrt{g_{1i}})/\sqrt{g_{1i}}\\ &=q_{\alpha}(\sqrt{\hat{g}_{1i}}/\sqrt{g_{1i}}-1)\\ &=q_{\alpha}\bigg{[}\{g_{1i}^{-1}(\hat{g}_{1i}-g_{1i})+1\}^{1/2}-1\bigg{]}\\ &=q_{\alpha}\bigg{[}\{W+1\}^{1/2}-1\bigg{]}\\ &=q_{\alpha}\bigg{[}W/2-W^{2}/8+r_{n}\bigg{]},\end{split} (F.8)

where the remainder r_{n}=O(W^{3}). Then, applying a Taylor expansion to W:

W=g1i1(g^1ig1i)=g1i1(A^DiA^+DiADiA+Di)=g1i1((A^A)Di2(A+Di)2(A^A)2Di2(A+Di)3+(A^A)3Di2(A+Di)4).\begin{split}W&=g_{1i}^{-1}(\hat{g}_{1i}-g_{1i})\\ &=g_{1i}^{-1}\bigg{(}\frac{\hat{A}D_{i}}{\hat{A}+D_{i}}-\frac{AD_{i}}{A+D_{i}}\bigg{)}\\ &=g_{1i}^{-1}\bigg{(}(\hat{A}-A)\frac{D_{i}^{2}}{(A+D_{i})^{2}}-(\hat{A}-A)^{2}\frac{D_{i}^{2}}{(A+D_{i})^{3}}+(\hat{A}-A)^{3}\frac{D_{i}^{2}}{(A^{*}+D_{i})^{4}}\bigg{)}.\\ \end{split} (F.9)

As pointed out in Chatterjee et al. (2008) and Saegusa et al. (2022), the evaluation of (\hat{A}+D_{i})^{-1}-(A+D_{i})^{-1} and \hat{\bm{\Sigma}}^{-1}-\bm{\Sigma}^{-1} involves numerous elementary calculations. In the end, both fourth moments reduce to the fourth moment of \hat{A}-A. When the Prasad-Rao method-of-moments estimator \hat{A}_{\rm{PR}} is used, Lahiri and Rao (1995) showed that \text{E}(\hat{A}_{\rm{PR}}-A)=o(m^{-1}) and \text{E}|\hat{A}_{\rm{PR}}-A|^{2q}=O(m^{-q}) for any q satisfying 0<q\leq 2+\delta^{\prime} with 0<\delta^{\prime}<\frac{1}{4}\delta, under regularity conditions.
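As a purely illustrative numerical check (ours, not part of the proof), these moment orders can be examined by simulation; the sketch below reuses the hypothetical prasad_rao_A function from the Discussion and reports the bias and the m^{2}-scaled fourth moment, which should stabilize as m grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def check_moments(m, A=1.0, reps=2000):
    """Monte Carlo estimates of E(A_hat - A) and E(A_hat - A)^4 for the
    Prasad-Rao estimator under a simple two-covariate design."""
    X = np.column_stack([np.ones(m), rng.uniform(size=m)])
    D = rng.uniform(0.5, 2.0, size=m)
    beta = np.array([1.0, 2.0])
    diffs = np.empty(reps)
    for r in range(reps):
        theta = X @ beta + np.sqrt(A) * rng.standard_normal(m)
        y = theta + np.sqrt(D) * rng.standard_normal(m)
        diffs[r] = prasad_rao_A(y, X, D) - A
    return diffs.mean(), np.mean(diffs**4)

for m in (15, 50, 200):
    bias, m4 = check_moments(m)
    print(f"m={m:4d}  bias={bias:+.4f}  E|A_hat-A|^4={m4:.5f}  m^2*E=|{m*m*m4:.2f}|")
```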

Therefore, the above arguments support the final expression:

F1i(qα)=F2i(qα)+m1γ(𝜷,A;qα)+O(m3/2).F_{1i}(q_{\alpha})=F_{2i}(q_{\alpha})+m^{-1}\gamma(\bm{\beta},A;q_{\alpha})+O(m^{-3/2}).

Appendix G Appendix

Table 7: Coverage probabilities (average length) of different intervals under the t_{9} distribution with m = 50 and Pattern (ii)

Columns: SB.FH, HM.FH, SB.PR, HM.PR, FH, PR, DIRECT; entries are coverage probabilities in % (average lengths in parentheses).

α = 80%
G1  79.94 (3.26)  80.30 (3.58)  84.56 (3.76)  85.44 (4.12)  79.89 (3.26)  78.65 (3.27)  79.43 (7.25)
G2  80.15 (2.25)  80.31 (3.52)  84.51 (2.61)  85.40 (4.07)  80.38 (2.25)  81.56 (2.36)  79.85 (2.81)
G3  79.71 (2.12)  79.53 (3.51)  83.85 (2.47)  84.58 (4.07)  79.86 (2.12)  80.88 (2.23)  79.99 (2.56)
G4  80.47 (1.96)  80.05 (3.51)  84.72 (2.29)  84.72 (4.06)  80.49 (1.96)  82.06 (2.08)  80.29 (2.29)
G5  79.65 (1.49)  80.48 (3.48)  84.04 (1.76)  85.73 (4.03)  79.99 (1.49)  81.80 (1.62)  80.32 (1.62)

α = 90%
G1  89.89 (4.31)  90.38 (4.80)  94.62 (5.73)  94.68 (6.33)  88.90 (4.18)  87.79 (4.19)  89.42 (9.30)
G2  90.18 (2.91)  90.37 (4.67)  93.44 (3.96)  94.56 (6.24)  89.89 (2.89)  90.58 (3.03)  90.19 (3.60)
G3  90.06 (2.74)  89.34 (4.66)  93.68 (3.74)  94.17 (6.23)  90.15 (2.72)  90.78 (2.87)  90.11 (3.29)
G4  90.09 (2.53)  89.50 (4.65)  93.69 (3.46)  94.65 (6.21)  90.12 (2.52)  91.18 (2.67)  90.24 (2.94)
G5  90.29 (1.92)  90.28 (4.60)  93.56 (2.65)  94.73 (6.15)  90.26 (1.92)  91.56 (2.07)  90.32 (2.08)

α = 95%
G1  95.13 (5.30)  95.04 (5.97)  97.45 (8.42)  97.61 (9.31)  93.57 (4.98)  92.61 (5.00)  94.77 (11.09)
G2  95.12 (3.49)  95.37 (5.76)  97.15 (5.82)  97.62 (9.13)  94.87 (3.44)  95.51 (3.60)  95.38 (4.29)
G3  94.97 (3.29)  94.46 (5.74)  97.22 (5.48)  97.60 (9.08)  94.70 (3.24)  95.51 (3.42)  95.00 (3.92)
G4  95.07 (3.03)  95.03 (5.71)  96.93 (5.08)  97.68 (9.04)  95.12 (3.00)  95.86 (3.18)  95.07 (3.51)
G5  95.31 (2.29)  95.01 (5.64)  96.99 (3.88)  97.52 (8.94)  95.28 (2.29)  96.02 (2.47)  95.17 (2.48)

Table 8: Coverage probabilities (average length) of different intervals under the t_{9} distribution with m = 15 and Pattern (ii)

Columns: SB.FH, HM.FH, SB.PR, HM.PR, FH, PR, DIRECT; entries are coverage probabilities in % (average lengths in parentheses).

α = 80%
G1  84.30 (3.89)  84.53 (4.26)  82.80 (5.42)  84.10 (5.98)  76.07 (3.30)  76.50 (3.48)  79.97 (7.25)
G2  83.43 (2.59)  83.77 (4.03)  81.33 (3.65)  83.77 (5.74)  79.57 (2.32)  87.10 (3.09)  79.17 (2.81)
G3  84.20 (2.43)  84.53 (3.99)  82.73 (3.42)  84.57 (5.67)  80.00 (2.19)  88.30 (3.03)  80.03 (2.56)
G4  83.07 (2.25)  84.50 (3.97)  81.87 (3.17)  83.67 (5.65)  79.77 (2.04)  89.33 (2.97)  80.57 (2.29)
G5  82.53 (1.69)  84.87 (3.87)  80.73 (2.38)  84.00 (5.42)  79.17 (1.57)  87.20 (2.82)  78.13 (1.62)

α = 90%
G1  94.50 (5.78)  94.97 (6.43)  87.57 (10.07)  87.80 (11.18)  85.23 (4.23)  85.83 (4.47)  89.50 (9.30)
G2  93.73 (3.77)  94.00 (5.88)  87.57 (6.69)  87.87 (10.47)  88.80 (2.98)  94.87 (3.96)  90.03 (3.60)
G3  93.83 (3.52)  95.10 (5.80)  87.60 (6.26)  88.33 (10.32)  88.93 (2.81)  95.37 (3.89)  89.47 (3.29)
G4  93.87 (3.23)  94.50 (5.72)  87.60 (5.81)  88.53 (10.25)  89.83 (2.61)  95.50 (3.81)  89.83 (2.94)
G5  91.73 (2.39)  94.47 (5.42)  87.17 (4.28)  88.70 (9.66)  89.17 (2.02)  93.73 (3.62)  88.50 (2.08)

α = 95%
G1  98.10 (8.48)  97.90 (9.45)  89.27 (16.44)  89.37 (18.35)  90.40 (5.04)  91.17 (5.33)  94.40 (11.09)
G2  97.73 (5.36)  97.73 (8.30)  89.33 (10.69)  89.43 (16.85)  93.80 (3.55)  98.50 (4.72)  94.77 (4.29)
G3  97.50 (4.97)  97.73 (8.10)  89.33 (9.96)  89.50 (16.57)  93.67 (3.35)  98.27 (4.64)  94.53 (3.92)
G4  97.23 (4.55)  97.80 (7.92)  89.67 (9.20)  89.87 (16.39)  94.20 (3.11)  97.93 (4.54)  94.90 (3.51)
G5  96.50 (3.29)  97.40 (7.30)  89.93 (6.70)  90.53 (15.28)  94.57 (2.40)  97.13 (4.32)  93.40 (2.48)

Table 9: Coverage probabilities (average length) of different intervals under the shifted exponential distribution with m = 50, B_{2} = 50 and Pattern (i)

Columns: SB.FH, HM.FH, DB.FH, SB.PR, HM.PR, DB.PR, FH, PR, DIRECT; entries are coverage probabilities in % (average lengths in parentheses).

α = 80%
G1  80.41 (2.22)  81.98 (2.36)  80.46 (2.27)  84.82 (2.51)  87.88 (2.69)  84.59 (3.08)  83.69 (2.29)  82.36 (2.32)  79.51 (5.13)
G2  79.91 (1.56)  80.96 (2.33)  80.36 (1.59)  83.76 (1.77)  87.22 (2.66)  84.04 (2.26)  80.64 (1.58)  82.97 (1.67)  80.08 (1.99)
G3  79.99 (1.48)  81.47 (2.32)  80.06 (1.50)  83.42 (1.67)  87.41 (2.65)  83.80 (2.06)  80.62 (1.49)  82.90 (1.59)  79.56 (1.81)
G4  80.32 (1.37)  81.26 (2.32)  80.89 (1.40)  83.69 (1.55)  87.52 (2.65)  83.83 (1.96)  80.67 (1.38)  83.41 (1.48)  80.17 (1.62)
G5  80.10 (1.05)  81.69 (2.30)  80.41 (1.07)  83.22 (1.20)  87.29 (2.63)  83.32 (1.51)  80.14 (1.05)  82.80 (1.17)  80.30 (1.15)

α = 90%
G1  90.86 (3.03)  92.24 (3.28)  90.70 (3.16)  93.66 (3.78)  95.04 (4.14)  93.44 (6.27)  91.23 (2.94)  90.27 (2.97)  89.62 (6.58)
G2  90.53 (2.05)  91.43 (3.19)  90.85 (2.13)  92.97 (2.59)  94.46 (4.09)  93.16 (4.54)  90.63 (2.02)  92.15 (2.15)  89.99 (2.55)
G3  89.86 (1.93)  91.99 (3.18)  90.27 (2.00)  93.28 (2.44)  95.21 (4.08)  92.79 (4.27)  89.95 (1.91)  91.67 (2.04)  89.71 (2.33)
G4  90.34 (1.79)  91.91 (3.17)  90.64 (1.85)  92.80 (2.26)  94.41 (4.07)  92.70 (4.08)  90.20 (1.77)  91.98 (1.90)  89.74 (2.08)
G5  90.28 (1.36)  92.04 (3.13)  90.80 (1.41)  92.83 (1.72)  94.90 (4.03)  92.48 (3.21)  90.47 (1.35)  92.10 (1.50)  90.20 (1.47)

α = 95%
G1  95.90 (3.85)  96.88 (4.21)  95.72 (4.42)  96.67 (5.39)  97.25 (5.96)  96.93 (11.17)  94.28 (3.51)  93.54 (3.54)  94.54 (7.84)
G2  95.51 (2.51)  96.10 (4.00)  95.72 (2.82)  96.56 (3.64)  96.98 (5.84)  96.83 (7.92)  94.98 (2.41)  95.36 (2.56)  94.81 (3.04)
G3  94.99 (2.36)  96.76 (3.98)  95.51 (2.65)  96.43 (3.42)  97.28 (5.82)  96.70 (7.51)  94.85 (2.27)  95.94 (2.43)  94.68 (2.77)
G4  95.34 (2.17)  96.54 (3.96)  95.64 (2.43)  96.40 (3.17)  96.99 (5.82)  96.30 (7.07)  94.98 (2.11)  95.91 (2.27)  94.84 (2.48)
G5  95.51 (1.64)  96.57 (3.89)  95.76 (1.83)  96.41 (2.40)  97.08 (5.74)  96.50 (5.53)  95.13 (1.61)  96.10 (1.78)  95.21 (1.75)

Table 10: Coverage probabilities (average length) of different intervals under the shifted exponential distribution with m = 15, B_{2} = 50 and Pattern (i)

Columns: SB.FH, HM.FH, DB.FH, SB.PR, HM.PR, DB.PR, FH, PR, DIRECT; entries are coverage probabilities in % (average lengths in parentheses).

α = 80%
G1  83.90 (2.66)  85.60 (2.90)  84.43 (2.77)  79.27 (3.29)  80.23 (3.67)  80.27 (5.36)  76.83 (2.23)  80.90 (2.43)  79.83 (5.13)
G2  83.00 (1.78)  85.90 (2.74)  82.90 (1.85)  79.73 (2.19)  80.67 (3.54)  80.20 (3.49)  79.77 (1.59)  89.60 (2.22)  79.40 (1.99)
G3  83.40 (1.68)  85.47 (2.71)  83.23 (1.74)  79.67 (2.06)  80.77 (3.52)  80.10 (3.28)  80.57 (1.51)  89.90 (2.19)  79.90 (1.81)
G4  83.23 (1.55)  85.70 (2.69)  82.93 (1.60)  78.90 (1.90)  80.77 (3.48)  78.97 (2.98)  81.30 (1.41)  90.30 (2.15)  79.40 (1.62)
G5  83.17 (1.18)  85.00 (2.60)  82.73 (1.21)  80.40 (1.43)  81.03 (3.36)  80.50 (2.08)  83.20 (1.12)  89.50 (2.08)  80.77 (1.15)

α = 90%
G1  93.30 (4.08)  93.93 (4.54)  93.40 (5.54)  85.07 (5.83)  85.27 (6.58)  85.53 (15.60)  85.53 (2.87)  89.27 (3.12)  90.10 (6.58)
G2  92.87 (2.59)  93.83 (4.06)  92.63 (3.31)  85.77 (3.75)  86.07 (6.14)  86.43 (9.76)  89.83 (2.04)  95.43 (2.84)  90.33 (2.55)
G3  92.87 (2.42)  93.50 (3.99)  92.60 (3.04)  85.33 (3.53)  85.67 (6.12)  85.90 (9.26)  89.83 (1.94)  95.83 (2.81)  89.67 (2.33)
G4  92.77 (2.22)  94.53 (3.92)  92.87 (2.70)  85.70 (3.23)  86.37 (6.02)  86.07 (8.26)  90.43 (1.81)  95.57 (2.76)  89.53 (2.08)
G5  92.23 (1.65)  93.40 (3.71)  92.00 (1.88)  86.87 (2.38)  87.03 (5.73)  87.13 (5.79)  91.47 (1.43)  95.30 (2.67)  90.60 (1.47)

α = 95%
G1  96.33 (5.99)  96.57 (6.68)  96.70 (12.50)  87.33 (9.30)  87.43 (10.53)  88.53 (24.21)  89.53 (3.41)  92.73 (3.71)  94.93 (7.84)
G2  96.17 (3.64)  96.60 (5.65)  96.10 (7.03)  87.97 (5.82)  87.90 (9.59)  89.23 (14.30)  93.93 (2.44)  97.77 (3.39)  95.33 (3.04)
G3  95.93 (3.38)  96.13 (5.53)  96.30 (6.24)  87.70 (5.46)  87.70 (9.51)  88.87 (13.53)  94.07 (2.31)  97.90 (3.34)  94.97 (2.77)
G4  96.47 (3.08)  97.07 (5.39)  96.53 (5.67)  87.93 (5.01)  88.20 (9.39)  89.40 (12.21)  95.03 (2.16)  98.13 (3.29)  94.63 (2.48)
G5  96.07 (2.22)  96.50 (4.95)  96.23 (3.69)  89.17 (3.60)  88.90 (8.72)  90.27 (8.69)  96.00 (1.71)  98.00 (3.19)  95.07 (1.75)

References

  • Bell and Huang (2006) Bell, W. R. and E. T. Huang (2006). Using the t-distribution to deal with outliers in small area estimation. In Proceedings of Statistics Canada Symposium.
  • Chatterjee et al. (2008) Chatterjee, S., P. Lahiri, and H. Li (2008). Parametric bootstrap approximation to the distribution of EBLUP and related prediction intervals in linear mixed models. The Annals of Statistics 36(3), 1221 – 1245.
  • Cox (1975) Cox, D. (1975). Prediction intervals and empirical Bayes confidence intervals. Journal of Applied Probability 12(S1), 47–55.
  • Datta and Lahiri (1995) Datta, G. S. and P. Lahiri (1995). Robust hierarchical Bayes estimation of small area characteristics in the presence of covariates and outliers. Journal of Multivariate Analysis 54(2), 310–328.
  • Datta et al. (2005) Datta, G. S., J. Rao, and D. D. Smith (2005). On measuring the variability of small area estimators under a basic area level model. Biometrika 92(1), 183–196.
  • Fabrizi and Trivisano (2010) Fabrizi, E. and C. Trivisano (2010). Robust linear mixed models for small area estimation. Journal of Statistical Planning and Inference 140(2), 433–443.
  • Fay and Herriot (1979) Fay, R. E. and R. A. Herriot (1979). Estimates of income for small places: an application of James-Stein procedures to census data. Journal of the American Statistical Association 74(366a), 269–277.
  • Ha et al. (2014) Ha, N. S., P. Lahiri, and V. Parsons (2014). Methods and results for small area estimation using smoking data from the 2008 National Health Interview Survey. Statistics in Medicine 33(22), 3932–3945.
  • Hall (2013) Hall, P. (2013). The bootstrap and Edgeworth expansion. Springer Science & Business Media.
  • Hall and Maiti (2006) Hall, P. and T. Maiti (2006). On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society Series B: Statistical Methodology 68(2), 221–238.
  • Hawala and Lahiri (2018) Hawala, S. and P. Lahiri (2018). Variance modeling for domains. Statistics and Applications 16(1), 399–409.
  • Hirose and Lahiri (2018) Hirose, M. Y. and P. Lahiri (2018). Estimating variance of random effects to solve multiple problems simultaneously. The Annals of Statistics 46(4), 1721–1741.
  • Jiang and Nguyen (2007) Jiang, J. and T. Nguyen (2007). Linear and generalized linear mixed models and their applications, Volume 1. Springer.
  • Lahiri and Rao (1995) Lahiri, P. and J. Rao (1995). Robust estimation of mean squared error of small area estimators. Journal of the American Statistical Association 90(430), 758–766.
  • Li (2007) Li, H. (2007). Small area estimation: An empirical best linear unbiased prediction approach. University of Maryland, College Park.
  • Li and Lahiri (2010) Li, H. and P. Lahiri (2010). An adjusted maximum likelihood method for solving small area estimation problems. Journal of Multivariate Analysis 101(4), 882–892.
  • McCullough and Vinod (1998) McCullough, B. and H. Vinod (1998). Implementing the double bootstrap. Computational Economics 12, 79–95.
  • Otto and Bell (1995) Otto, M. C. and W. R. Bell (1995). Sampling error modelling of poverty and income statistics for states. In American Statistical Association, Proceedings of the Section on Government Statistics, pp. 160–165.
  • Prasad and Rao (1990) Prasad, N. N. and J. N. Rao (1990). The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association 85(409), 163–171.
  • Rao and Molina (2015) Rao, J. N. and I. Molina (2015). Small area estimation. John Wiley & Sons.
  • Saegusa et al. (2022) Saegusa, T., S. Sugasawa, and P. Lahiri (2022). Parametric bootstrap confidence intervals for the multivariate Fay-Herriot model. Journal of Survey Statistics and Methodology 10(1), 115–130.
  • Shao (2008) Shao, J. (2008). Mathematical statistics. Springer Science & Business Media.
  • Shi (1992) Shi, S. G. (1992). Accurate and efficient double-bootstrap confidence limit method. Computational Statistics & Data Analysis 13(1), 21–32.
  • Xie et al. (2007) Xie, D., T. E. Raghunathan, and J. M. Lepkowski (2007). Estimation of the proportion of overweight individuals in small areas—a robust extension of the Fay-Herriot model. Statistics in Medicine 26(13), 2699–2715.
  • Yoshimori and Lahiri (2014) Yoshimori, M. and P. Lahiri (2014). A second-order efficient empirical Bayes confidence interval. The Annals of Statistics 42(4), 1233 – 1261.