
Efficiency of QMLE for dynamic panel data models with interactive effects

Jushan Bai, Department of Economics, Columbia University
Abstract

This paper studies the problem of efficient estimation of panel data models in the presence of an increasing number of incidental parameters. We formulate the dynamic panel as a simultaneous equations system, and derive the efficiency bound under the normality assumption. We then show that the Gaussian quasi-maximum likelihood estimator (QMLE) applied to the system achieves the normality efficiency bound without the normality assumption. Comparison of QMLE with the fixed effects estimators is made.

MSC: 62H12, 62F12

Keywords: Fixed effects, incidental parameters, local likelihood ratios, local parameter space, regular estimators, efficiency bound, factor models

1 Introduction

Consider the dynamic panel data model with interactive effects

y_{it}=\alpha\,y_{it-1}+\delta_{t}+\lambda_{i}^{\prime}f_{t}+\varepsilon_{it} \qquad (1)
i=1,2,...,N;\quad t=1,...,T

where y_{it} is the outcome variable, \lambda_{i} and f_{t} are each r\times 1 and both are unobservable, \delta_{t} is the time effect, and \varepsilon_{it} is the error term. Only y_{it} are observable. The above model is increasingly used in empirical studies in the social sciences. The purpose of this paper is the efficient estimation of the model. We argue that quasi-maximum likelihood estimation is a preferred method. But first, we explain the meaning of QMLE for this model and its motivation.

The index i refers to individuals (e.g., households) and t to time. If f_{t}=1 for all t and \lambda_{i} is a scalar, then \delta_{t}+\lambda_{i}^{\prime}f_{t}=\delta_{t}+\lambda_{i}, and we obtain the usual additive individual and time fixed effects model. Dynamic panel models with additive fixed effects remain the workhorse for empirical research. The product \lambda_{i}^{\prime}f_{t} is known as an interactive effect [5] and is more general than additive fixed effects. The model allows individual heterogeneities (such as unobserved innate ability, captured by \lambda_{i}) to have a time-varying impact (through f_{t}) on the outcome variable y_{it}. From a different perspective, the model allows common shocks (modeled by f_{t}) to have heterogeneous impacts (through \lambda_{i}) on the outcome. For many panel data sets, N is much larger than T because it is costly to keep track of the same individuals over time. Under fixed T, typical estimation methods such as least squares do not yield consistent estimates of the model parameters. Consider the special case

y_{it}=\alpha y_{it-1}+c_{i}+\varepsilon_{it} \qquad (2)

where c_{i} are fixed effects. This corresponds to \delta_{t}=0 and f_{t}=1 for all t. Even if the c_{i} are iid with zero mean and finite variance, and independent of \varepsilon_{it}, the least squares estimator \widehat{\alpha}=(\sum_{i}\sum_{t}y_{it-1}^{2})^{-1}\sum_{i}\sum_{t}y_{it-1}y_{it} is easily shown to be inconsistent. When the c_{i} are treated as parameters to be estimated along with \alpha, the least squares method is still biased, and the order of the bias is O(1/T) no matter how large N is, see [18]. So unless T goes to infinity, the least squares method remains inconsistent, an issue known as the incidental parameters problem, see for example [13] and [14].
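The following is a minimal Monte Carlo sketch (not part of the paper; the values of N, T, and \alpha are illustrative) of the fixed-T inconsistency just described: both the pooled least squares estimator that ignores c_{i} and the within (fixed effects) estimator that estimates the c_{i} stay away from the true \alpha even as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, alpha = 2000, 6, 0.5          # illustrative sample sizes and true alpha

c = rng.normal(size=N)              # individual effects c_i
y = np.empty((N, T + 1))
y[:, 0] = c / (1 - alpha) + rng.normal(size=N)   # start near the stationary mean
for t in range(1, T + 1):
    y[:, t] = alpha * y[:, t - 1] + c + rng.normal(size=N)

# Pooled least squares of y_it on y_{i,t-1}: ignores c_i, biased upward.
ylag, ycur = y[:, :-1].ravel(), y[:, 1:].ravel()
alpha_pooled = (ylag @ ycur) / (ylag @ ylag)

# Within estimator: demean per individual; bias of order O(1/T) (downward).
yd = y - y.mean(axis=1, keepdims=True)
alpha_within = (yd[:, :-1].ravel() @ yd[:, 1:].ravel()) / (yd[:, :-1].ravel() @ yd[:, :-1].ravel())

print(alpha_pooled, alpha_within)   # both differ from 0.5 even for large N
```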

However, provided T\geq 3, consistent estimation of \alpha is possible with the instrumental variables (IV) method. Anderson and Hsiao [3] suggested an IV estimator obtained by solving \sum_{i=1}^{N}y_{i1}(\Delta y_{i3}-\alpha\Delta y_{i2})=0, where \Delta y_{it}=y_{it}-y_{it-1}. Differencing the data purges c_{i} but introduces correlation between the regressor and the resulting errors, which is why the IV method is used. With T strictly greater than 3, a more efficient IV estimator is suggested by Arellano and Bond [4]. Model (2) can also be estimated by the Gaussian quasi-maximum likelihood method, for example, [1], [6], and [17].
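As a quick illustration (again not from the paper; the data generating process and parameter values are assumed), the Anderson-Hsiao moment condition can be solved in closed form with T=3:

```python
import numpy as np

rng = np.random.default_rng(1)
N, alpha = 5000, 0.5
c = rng.normal(size=N)                              # individual effects
y1 = c / (1 - alpha) + rng.normal(size=N)           # illustrative first observation
y2 = alpha * y1 + c + rng.normal(size=N)
y3 = alpha * y2 + c + rng.normal(size=N)

d3, d2 = y3 - y2, y2 - y1                           # differencing purges c_i
alpha_iv = (y1 @ d3) / (y1 @ d2)                    # y_{i1} instruments Delta y_{i2}
print(alpha_iv)                                     # consistent as N grows, T fixed
```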

For model (1), differencing cannot remove the interactive effects since \Delta y_{it}=\alpha\,\Delta y_{it-1}+\Delta\delta_{t}+\lambda_{i}^{\prime}\Delta f_{t}+\Delta\varepsilon_{it}. The model can be estimated by the fixed effects approach, treating both \lambda_{i} and f_{t} as parameters. Just like least squares for the earlier additive effects model, the fixed effects method produces a bias. Below we introduce the quasi-likelihood approach, similar to [8, 9] for non-dynamic models.

Project the first observation y_{i1} on [1,\lambda_{i}] and write y_{i1}=\delta_{1}^{*}+\lambda_{i}^{\prime}f_{1}^{*}+\varepsilon_{i1}^{*}, where (\delta_{1}^{*},f_{1}^{*}) are the projection coefficients and \varepsilon_{i1}^{*} is the projection error. The asterisk variables differ from the true (\delta_{1},f_{1},\varepsilon_{i1}) that generate y_{i1}. This projection is needed because y_{i0} is not observable. (The first observation starts at t=1, so y_{i0} is not available. If y_{i0} were observable we would have y_{i1}=\alpha y_{i0}+\delta_{1}+\lambda_{i}^{\prime}f_{1}+\varepsilon_{i1}, but then a projection of y_{i0} on [1,\lambda_{i}] would be required.) Note that we can drop the superscript * to simplify the notation. This is because we will treat \delta_{t} and f_{t} as (nuisance and free) parameters, and we do not require \varepsilon_{it} to have the same distribution over time. This means we can rewrite y_{i1} as y_{i1}=\delta_{1}+\lambda_{i}^{\prime}f_{1}+\varepsilon_{i1}.

The following notation will be used

y_{i}=\begin{bmatrix}y_{i1}\\ \vdots\\ y_{iT}\end{bmatrix},\quad\delta=\begin{bmatrix}\delta_{1}\\ \vdots\\ \delta_{T}\end{bmatrix},\quad F=\begin{bmatrix}f_{1}^{\prime}\\ \vdots\\ f_{T}^{\prime}\end{bmatrix},\quad\varepsilon_{i}=\begin{bmatrix}\varepsilon_{i1}\\ \vdots\\ \varepsilon_{iT}\end{bmatrix} \qquad (3)

together with the following T×TT\times T matrices,

B=\begin{bmatrix}1&0&\cdots&0\\ -\alpha&1&\cdots&0\\ \vdots&\ddots&\ddots&\vdots\\ 0&\cdots&-\alpha&1\end{bmatrix},\quad J=\begin{bmatrix}0&0&\cdots&0\\ 1&0&\cdots&0\\ \vdots&\ddots&\ddots&\vdots\\ 0&\cdots&1&0\end{bmatrix},\quad L=\begin{bmatrix}0&0&\cdots&0&0\\ 1&0&\cdots&0&0\\ \alpha&1&\ddots&0&0\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ \alpha^{T-2}&\cdots&\alpha&1&0\end{bmatrix} \qquad (4)

Note that L=JB^{-1}. With this notation, we can write the model as

By_{i}=\delta+F\lambda_{i}+\varepsilon_{i}

This gives a simultaneous equations system with T equations. We assume \lambda_{i} are iid, independent of \varepsilon_{i}. Without loss of generality, we assume \mathbb{E}(\lambda_{i})=0; otherwise, absorb F\lambda (where \lambda=\mathbb{E}\lambda_{i}) into \delta. Further assume \mathbb{E}(\lambda_{i}\lambda_{i}^{\prime})=I_{r} (a normalization restriction, where I_{r} is an identity matrix). We also assume \varepsilon_{i} are iid with zero mean, and

D=\mathrm{var}(\varepsilon_{i})=\operatorname{diag}(\sigma_{1}^{2},\sigma_{2}^{2},...,\sigma_{T}^{2})

These assumptions imply that By_{i} are iid with mean \delta and covariance matrix FF^{\prime}+D. Consider the Gaussian quasi likelihood function

\ell_{NT}(\theta)=-\frac{N}{2}\log|FF^{\prime}+D|-\frac{1}{2}\sum_{i=1}^{N}(By_{i}-\delta)^{\prime}(FF^{\prime}+D)^{-1}(By_{i}-\delta)

where \theta=(\alpha,\delta,F,\sigma_{1}^{2},...,\sigma_{T}^{2}); the Jacobian does not enter since the determinant of B is 1. The quasi-maximum likelihood estimator (QMLE) is defined as

\widehat{\theta}=\operatorname{argmax}_{\theta}\ \ell_{NT}(\theta)

The asymptotic distribution of this estimator is studied by [7].
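As a concrete illustration (a hedged sketch with assumed parameter values, not code from the paper), the following builds the matrices B, J, and L of (4), verifies L=JB^{-1}, simulates data from the system By_{i}=\delta+F\lambda_{i}+\varepsilon_{i}, and evaluates the Gaussian quasi log-likelihood \ell_{NT}(\theta) at the true parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, r, alpha = 500, 8, 1, 0.5                     # illustrative dimensions

B = np.eye(T) - alpha * np.eye(T, k=-1)             # 1 on diagonal, -alpha below
J = np.eye(T, k=-1)                                 # lag (shift-down) matrix
L = np.zeros((T, T))
for t in range(1, T):
    for s in range(t):
        L[t, s] = alpha ** (t - 1 - s)              # entries of L in (4)
assert np.allclose(L, J @ np.linalg.inv(B))         # the identity L = J B^{-1}

delta = rng.normal(size=T)
F = rng.normal(size=(T, r))
D = np.diag(rng.uniform(0.5, 1.5, size=T))          # heteroskedastic variances
lam = rng.normal(size=(N, r))
eps = rng.normal(size=(N, T)) * np.sqrt(np.diag(D))
Y = (delta + lam @ F.T + eps) @ np.linalg.inv(B).T  # rows are y_i'

def loglik(alpha, delta, F, D, Y):
    Ba = np.eye(T) - alpha * np.eye(T, k=-1)
    Sigma = F @ F.T + D
    U = Y @ Ba.T - delta                            # rows are (B y_i - delta)'
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('it,ts,is->', U, np.linalg.inv(Sigma), U)
    return -0.5 * N * logdet - 0.5 * quad

print(loglik(alpha, delta, F, D, Y))                # value at the true parameters
```

Maximizing this function over (\alpha,\delta,F,\sigma_{1}^{2},...,\sigma_{T}^{2}) with a numerical optimizer yields the QMLE \widehat{\theta}.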

An alternative estimator, the fixed effects estimator, treats both \lambda_{i} and f_{t} as parameters, in addition to \alpha and \delta_{t}. The corresponding likelihood function under normality of \varepsilon_{it} is given in (7) below. The fixed effects framework estimates more nuisance parameters (substantially more when N is large), which is the source of the incidental parameters problem. Our analysis focuses on the QMLE. Comparison of the two approaches will be made based on local likelihood methods.

The objectives of the present paper are threefold. First, what is the efficiency bound for the system maximum likelihood estimator under the normality assumption? Second, does the QMLE attain the normality efficiency bound without the normality assumption? Third, how does the QMLE fare in comparison to the fixed effects estimator?

We approach these questions with a Le Cam type of analysis. The difficulty lies in the increasing dimension of the parameter space as T goes to infinity, because the number of parameters is of order T. No sparsity in the parameters is assumed. With sparsity, [12] derived efficiency bounds and constructed efficient estimators via regularization for various models. The ability to deal with non-sparsity in the current model relies on panel data.

On notation: \|A\| denotes the Frobenius norm of a matrix (or vector) A, that is, \|A\|=(\mathrm{tr}(A^{\prime}A))^{1/2}, and \|A\|_{sp} denotes the spectral norm of A, that is, the square root of the largest eigenvalue of A^{\prime}A. Notice \|AB\|\leq\|A\|_{sp}\|B\|. The transpose of A is denoted by A^{\prime}; |A| and \mathrm{tr}(A) denote, respectively, the determinant and trace of a square matrix A.

2 Assumptions for QMLE

We assume |α|<1|\alpha|<1 for asymptotic analysis. The following assumptions are made for the model.

Assumption A

(i) \varepsilon_{i} are iid over i; \mathbb{E}(\varepsilon_{it})=0, \mathrm{var}(\varepsilon_{it})=\sigma_{t}^{2}>0, and \varepsilon_{it} are independent over t; \mathbb{E}\varepsilon_{it}^{4}\leq M<\infty for all i and t.

(ii) The \lambda_{i} are iid, independent of \varepsilon_{i}, with \mathbb{E}\lambda_{i}=0, \mathbb{E}(\lambda_{i}\lambda_{i}^{\prime})=I_{r}, and \mathbb{E}\|\lambda_{i}\|^{4}\leq M.

(iii) There exist constants a and b such that 0<a<\sigma_{t}^{2}<b<\infty for all t; \frac{1}{T}F^{\prime}D^{-1}F=\frac{1}{T}\sum_{t=1}^{T}\sigma_{t}^{-2}f_{t}f_{t}^{\prime}\rightarrow\Sigma_{ff}>0.

Two comments are in order for this model. First, Assumption A(ii) assumes the \lambda_{i} are random variables, but they can also be fixed bounded constants. All that is needed is that \Psi_{N}:=\frac{1}{N}\sum_{i=1}^{N}(\lambda_{i}-\bar{\lambda})(\lambda_{i}-\bar{\lambda})^{\prime}\rightarrow\Psi>0 (an r\times r positive definite matrix), where \bar{\lambda} is the sample average of the \lambda_{i}. One can normalize the matrix \Psi to be an identity matrix. Second, F is determined only up to an orthogonal rotation since FF^{\prime}=FR(FR)^{\prime} for RR^{\prime}=I_{r}. The rotational indeterminacy can be removed by the normalization that F^{\prime}D^{-1}F is a diagonal matrix (with distinct elements), see [2] (p.573) and [15] (p.8). Rotational indeterminacy does not affect the estimates of \alpha, D, and \delta.

Under Assumption A, [7] showed that the QMLE \widehat{\alpha} has the following asymptotic representation

\sqrt{NT}(\widehat{\alpha}-\alpha)=\Big(\frac{1}{T}\mathrm{tr}(LDL^{\prime}D^{-1})\Big)^{-1}\Big[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}\Big]+o_{p}(1) \qquad (5)

where LL, DD, and εi\varepsilon_{i} are defined earlier. Note that 𝔼[(Lεi)D1εi]=tr(L)=0\mathbb{E}[(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}]=\mathrm{tr}(L^{\prime})=0, and 𝔼[(Lεi)D1εi]2=tr(LDLD1)\mathbb{E}[(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}]^{2}=\mathrm{tr}(LDL^{\prime}D^{-1}) with the expression

\frac{1}{T}\mathrm{tr}(LDL^{\prime}D^{-1})=\frac{1}{T}\sum_{t=2}^{T}\frac{1}{\sigma_{t}^{2}}\Big(\sigma_{t-1}^{2}+\alpha^{2}\sigma_{t-2}^{2}+\cdots+\alpha^{2(t-2)}\sigma_{1}^{2}\Big)

Assume the above converges to \gamma as T\rightarrow\infty; then we have, as N,T\rightarrow\infty,

\sqrt{NT}(\widehat{\alpha}-\alpha)\stackrel{d}{\longrightarrow}\mathcal{N}(0,1/\gamma).

For the special case of homoskedasticity (\sigma_{t}^{2}=\sigma^{2} for all t), \gamma=1/(1-\alpha^{2}), and hence \sqrt{NT}(\widehat{\alpha}-\alpha)\stackrel{d}{\longrightarrow}\mathcal{N}(0,1-\alpha^{2}). The QMLE requires no bias correction, unlike the fixed effects regression; the latter is considered by [5] and [16]. Our objective is to show that 1/\gamma is the efficiency bound under the normality assumption, and that the QMLE attains the normality efficiency bound. This result is obtained in the presence of an increasing number of incidental parameters. The estimator \widehat{\alpha} is also consistent under fixed T, in contrast to the fixed effects estimator. The estimated \delta_{t}, f_{t}, \sigma_{t}^{2} are all \sqrt{N} consistent and asymptotically normal. In particular, the estimated factors \widehat{f}_{t} have the asymptotic representation, for each t=1,2,...,T,

\sqrt{N}(\widehat{f}_{t}-f_{t})=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}\varepsilon_{it}+o_{p}(1) \qquad (6)

This implies \sqrt{N}(\widehat{f}_{t}-f_{t})\stackrel{d}{\longrightarrow}\mathcal{N}(0,\sigma_{t}^{2}I_{r}). Details are given in Bai [7]. Also see [8] (under the IC3 normalization) for factor models.
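A small numerical sketch (with illustrative values of \alpha, T, and the variances; not from the paper) can be used to check the trace expression for \frac{1}{T}\mathrm{tr}(LDL^{\prime}D^{-1}) against its explicit sum and against the homoskedastic limit 1/(1-\alpha^{2}):

```python
import numpy as np

alpha, T = 0.5, 400
sig2 = np.ones(T)                                   # homoskedastic case
D = np.diag(sig2)
B = np.eye(T) - alpha * np.eye(T, k=-1)
L = np.eye(T, k=-1) @ np.linalg.inv(B)              # L = J B^{-1}

trace_form = np.trace(L @ D @ L.T @ np.linalg.inv(D)) / T
explicit = sum((alpha ** (2 * np.arange(t - 1)) * sig2[:t - 1][::-1]).sum() / sig2[t - 1]
               for t in range(2, T + 1)) / T
print(trace_form, explicit, 1 / (1 - alpha ** 2))   # all close for large T
```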

3 Local likelihood ratios and efficiency bound

3.1 Related literature

A closely related work is that of Iwakura and Okui [11]. They consider the fixed effects framework instead of the QMLE. The fixed effects estimation procedure treats both \lambda_{i} and f_{t} as parameters (i=1,2,...,N;\ t=1,2,...,T), along with \alpha and \delta. The corresponding likelihood function under normality of \varepsilon_{it} is given in (7) below. (The fixed effects likelihood does not have a global maximum under heteroskedasticity, for example, [2] (p.587), but local maximization is still meaningful. Another solution is to impose homoskedasticity.)

\ell_{\text{fixed effects}}(\theta)=-\frac{N}{2}\sum_{t=2}^{T}\log\sigma_{t}^{2}-\frac{1}{2}\sum_{i=1}^{N}\sum_{t=2}^{T}(y_{it}-\alpha y_{i,t-1}-\delta_{t}-\lambda_{i}^{\prime}f_{t})^{2}/\sigma_{t}^{2} \qquad (7)

The fixed effects estimator for \alpha generates a bias, similar to the fixed effects estimator for dynamic panels with additive effects. The bias is studied by [16]. (In contrast, the QMLE does not generate a bias under fixed T.) Iwakura and Okui [11] derive the efficiency bound for the fixed effects estimators under homoskedasticity (\sigma_{t}^{2}=\sigma^{2} for all t). Another closely related work is that of Hahn and Kuersteiner [10]. They consider the efficiency bound problem under the fixed effects framework for the additive effects model described by (2). Throughout this paper, the fixed effects framework refers to methods that also estimate the factor loadings \lambda_{i} in addition to f_{t}.
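For comparison with the system likelihood below, here is a minimal sketch (simulated inputs with assumed values) of how the fixed effects objective (7) is evaluated; note that it involves N\times r loading parameters in addition to (\alpha,\delta,F,\sigma_{t}^{2}).

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, r, alpha = 200, 6, 1, 0.5
lam = rng.normal(size=(N, r))                       # loadings, treated as parameters
F = rng.normal(size=(T, r))
delta = rng.normal(size=T)
sig2 = rng.uniform(0.5, 1.5, size=T)
eps = rng.normal(size=(N, T)) * np.sqrt(sig2)
y = np.empty((N, T))
y[:, 0] = delta[0] + lam @ F[0] + eps[:, 0]
for t in range(1, T):
    y[:, t] = alpha * y[:, t - 1] + delta[t] + lam @ F[t] + eps[:, t]

def loglik_fe(alpha, delta, lam, F, sig2, y):
    # sums run over t = 2,...,T as in (7); 0-based index t-1
    resid = y[:, 1:] - alpha * y[:, :-1] - delta[1:] - lam @ F[1:].T
    return -0.5 * N * np.log(sig2[1:]).sum() - 0.5 * (resid ** 2 / sig2[1:]).sum()

print(loglik_fe(alpha, delta, lam, F, sig2, y))     # value at the true parameters
```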

In contrast, we consider the likelihood function for the system of equations

\ell(\theta)=-\frac{N}{2}\log|FF^{\prime}+D|-\frac{1}{2}\sum_{i=1}^{N}(By_{i}-\delta)^{\prime}(FF^{\prime}+D)^{-1}(By_{i}-\delta) \qquad (8)

The QMLE does not estimate \lambda_{i} (even if they are fixed constants, as explained earlier), thus eliminating the incidental parameters in the cross-section dimension. The incidental parameters are now \delta, F, and D, and their number increases with T. Despite the smaller number of incidental parameters, the analysis of the local likelihood is more demanding than that of the fixed effects likelihood (7). Intuitively, the fixed effects likelihood (7) is quadratic in F, but the QMLE likelihood \ell(\theta) in (8) depends on F through the inverse matrix (FF^{\prime}+D)^{-1} and through the log-determinant of this matrix. The high degree of nonlinearity makes the perturbation analysis more challenging. As demonstrated later, the local analysis brings insights regarding the relative merits of the QMLE and the fixed effects estimators.

3.2 The \ell^{\infty} local parameter space

Local likelihood ratio processes are indexed by local parameters. Since the convergence rate of the estimator of \alpha^{0} is \sqrt{NT}, that is, \sqrt{NT}(\widehat{\alpha}-\alpha^{0})=O_{p}(1), it is natural to consider local parameters of the form

\alpha^{0}+\frac{1}{\sqrt{NT}}\widetilde{\alpha}

where α~\widetilde{\alpha}\in\mathbb{R}. However, the consideration of local parameters for ft0f_{t}^{0} is non-trivial, as explained by Iwakura and Okui [11] for the fixed effects likelihood ratio. We consider the following local parameters

f_{t}^{0}+\frac{1}{\sqrt{N}}\Big(\frac{1}{\sqrt{T}}\widetilde{f}_{t}\Big),\quad t=1,2,...,T \qquad (9)

where \|\widetilde{f}_{t}\|\leq M<\infty for all t with M arbitrarily given; \|\cdot\| denotes the r-dimensional Euclidean norm. In view of the fact that the estimated factor \widehat{f}_{t} is \sqrt{N} consistent, that is, \sqrt{N}(\widehat{f}_{t}-f_{t}^{0})=O_{p}(1), one would expect local parameters of the form f_{t}^{0}+N^{-1/2}\widetilde{f}_{t}, so the extra scale factor T^{-1/2} in the above local rate looks rather unusual. However, (9) is the suitable local rate for the local likelihood ratio to be O_{p}(1), as is shown in both the statement and the proof of Theorem 1 below. Without the scale factor T^{-1/2}, the local likelihood ratio diverges to infinity (in absolute value) if no restrictions are imposed on \widetilde{f}_{t} other than its boundedness. This type of local parameter was used in earlier work by [10] for the additive fixed effects estimator. Later we shall consider a different type of local parameter without the extra scale factor T^{-1/2}, but other restrictions on \widetilde{f}_{t} will then be needed.

Additionally, even if one regards \frac{1}{\sqrt{NT}}\widetilde{f}_{t} as small (relative to \frac{1}{\sqrt{N}}\widetilde{f}_{t}), this is for the better provided that the associated efficiency bound is achievable by an estimator. This is because the smaller the perturbation, the lower the efficiency bound, and hence the harder it is to attain by any estimator.

Consider the space

\ell_{r}^{\infty}:=\Big\{(\widetilde{f}_{t})_{t=1}^{\infty}\,\Big|\,\widetilde{f}_{t}\in\mathbb{R}^{r},\ \sup_{s}\|\widetilde{f}_{s}\|<\infty\Big\}

the space of bounded sequences, each coordinate is r\mathbb{R}^{r}-valued. Let

f~=(f~1,f~2,)r\widetilde{f}=(\widetilde{f}_{1},\widetilde{f}_{2},...)\in\ell_{r}^{\infty}

and define F~=(f~1,f~2,,f~T)\widetilde{F}=(\widetilde{f}_{1},\widetilde{f}_{2},...,\widetilde{f}_{T})^{\prime}, the projection of f~\widetilde{f} onto the first TT coordinates. The matrix F~\widetilde{F} is T×rT\times r, but we suppress its dependence on TT for notational simplicity. Since f~r\widetilde{f}\in\ell_{r}^{\infty}, it follows that

\frac{1}{T}\widetilde{F}^{\prime}\widetilde{F}=\frac{1}{T}\sum_{t=1}^{T}\widetilde{f}_{t}\widetilde{f}_{t}^{\prime}=O(1) \qquad (10)

Similarly, for the time effects, we consider the local parameters

δt0+1NTδ~t\delta_{t}^{0}+\frac{1}{\sqrt{NT}}\widetilde{\delta}_{t}

with |δ~t|M|\widetilde{\delta}_{t}|\leq M for all tt. Let 1:={(δ~t)t=1|δ~t,sups|δ~s|<}\ell_{1}^{\infty}:=\{(\widetilde{\delta}_{t})_{t=1}^{\infty}\,\Big{|}\widetilde{\delta}_{t}\in\mathbb{R},\,\sup_{s}|\widetilde{\delta}_{s}|<\infty\}. For each {δ~t}=(δ~1,δ~2,)1\{\widetilde{\delta}_{t}\}=(\widetilde{\delta}_{1},\widetilde{\delta}_{2},...)\in\ell_{1}^{\infty}, we use δ~\widetilde{\delta} to denote the projection of the sequence onto the first TT coordinates δ~=(δ~1,δ~2,,δ~T)\widetilde{\delta}=(\widetilde{\delta}_{1},\widetilde{\delta}_{2},...,\widetilde{\delta}_{T})^{\prime}, a T×1T\times 1 vector.

To simplify the analysis, we assume D is known. It can be shown that this simplifying assumption does not affect the efficiency bound for \alpha, but it reduces the complexity of the derivation. Let \theta^{0}=(\alpha^{0},\delta^{0},F^{0}) and \widetilde{\theta}=(\widetilde{\alpha},\widetilde{\delta},\widetilde{F}); we study the asymptotic behavior of

(θ0+1NTθ~)(θ0)\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta})-\ell(\theta^{0})

under the normality of \varepsilon_{it} and \lambda_{i}. The normality assumption allows us to derive the parametric efficiency bound in the presence of an increasing number of nuisance parameters. We then show that the QMLE without normality attains the efficiency bound. In the rest of the paper, we use (\alpha^{0},\delta^{0},F^{0}) and (\alpha,\delta,F) interchangeably (they represent the true parameters); (\widetilde{\alpha},\widetilde{\delta},\widetilde{F}) represent local parameters; and \widehat{\alpha}, \widehat{\delta}, and \widehat{F}=(\widehat{f}_{1},...,\widehat{f}_{T})^{\prime} are the QMLE estimates.

Assumption B

(i): εit\varepsilon_{it} are iid over ii, and independent over tt such that εit𝒩(0,σt2)\varepsilon_{it}\sim\mathcal{N}(0,\sigma_{t}^{2}).

(ii): λi\lambda_{i} are iid 𝒩(0,Ir)\mathcal{N}(0,I_{r}), independent of εit\varepsilon_{it} for all ii and tt.

(iii): σt2[a,b]\sigma_{t}^{2}\in[a,b] with 0<a<b<0<a<b<\infty for all tt.

(iv): ftM<\|f_{t}\|\leq M<\infty for all tt, and 1TFD1FΣff>0\frac{1}{T}F^{\prime}D^{-1}F\rightarrow\Sigma_{ff}>0, where D=diag(σ12,σ22,,σT2)D=\operatorname*{diag}(\sigma_{1}^{2},\sigma_{2}^{2},...,\sigma_{T}^{2}).

(v): As T\rightarrow\infty,
(a) \frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)\rightarrow\gamma>0,
(b) \frac{1}{T}\mathrm{tr}[(LF)^{\prime}(D^{-1/2}M_{D^{-1/2}F}D^{-1/2})(LF)]\rightarrow\nu\geq 0,
(c) \frac{1}{T}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(L\delta)\rightarrow\mu\geq 0,
where M_{D^{-1/2}F} denotes the projection matrix orthogonal to D^{-1/2}F. Specifically,

D^{-1/2}M_{D^{-1/2}F}D^{-1/2}=D^{-1}-D^{-1}F(F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}.
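The displayed identity is easy to verify numerically; the following is a minimal check (assumed shapes and values) that the weighted projection D^{-1/2}M_{D^{-1/2}F}D^{-1/2} equals D^{-1}-D^{-1}F(F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}.

```python
import numpy as np

rng = np.random.default_rng(3)
T, r = 10, 2
F = rng.normal(size=(T, r))
D = np.diag(rng.uniform(0.5, 1.5, size=T))
Dinv = np.linalg.inv(D)
Dhalf_inv = np.diag(1 / np.sqrt(np.diag(D)))

G = Dhalf_inv @ F                                   # D^{-1/2} F
M = np.eye(T) - G @ np.linalg.inv(G.T @ G) @ G.T    # projection orthogonal to D^{-1/2}F
lhs = Dhalf_inv @ M @ Dhalf_inv
rhs = Dinv - Dinv @ F @ np.linalg.inv(F.T @ Dinv @ F) @ F.T @ Dinv
assert np.allclose(lhs, rhs)
```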

Under Assumptions B(i) and (ii), F\lambda_{i}+\varepsilon_{i} are iid \mathcal{N}(0,FF^{\prime}+D), implying a parametric model with an increasing number of incidental parameters. Normality of \lambda_{i} and \varepsilon_{it} is a standard assumption in factor analysis, see, e.g., Anderson [2] (p.576). Here we switch the roles of \lambda_{i} and f_{t}. Note that in classical factor analysis the time dimension T (in our notation) is fixed, so there is no incidental parameters problem since the number of parameters is fixed. But we consider T that goes to infinity. The following theorem gives the asymptotic representation of the local likelihood ratios.

Theorem 1.

Under Assumption B, for α~\widetilde{\alpha}\in\mathbb{R}, f~r\widetilde{f}\in\ell_{r}^{\infty}, {δ~t}1\{\widetilde{\delta}_{t}\}\in\ell_{1}^{\infty}, F~=(f~1,f~2,,f~T)\widetilde{F}=(\widetilde{f}_{1},\widetilde{f}_{2},...,\widetilde{f}_{T})^{\prime}, and δ~=(δ~1,,δ~T)\widetilde{\delta}=(\widetilde{\delta}_{1},...,\widetilde{\delta}_{T})^{\prime}, we have as N,TN,T\rightarrow\infty,

\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta})-\ell(\theta^{0})=\Delta_{NT}(\widetilde{\theta})-\frac{1}{2}\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2}+o_{p}(1)

where

\Delta_{NT}(\widetilde{\theta})=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}
+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i} \qquad (11)
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})

\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2}=\frac{1}{T}\mathrm{tr}\Big[\widetilde{F}^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\widetilde{F}\Big]
+\widetilde{\alpha}^{2}\Big[\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)\Big]
+\widetilde{\alpha}^{2}\,\mathrm{tr}\Big[\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}](LF)\Big]
+2\widetilde{\alpha}\,\mathrm{tr}\Big[\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\widetilde{F}\Big]
+2\widetilde{\alpha}\,\frac{1}{T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}L\delta
+\frac{1}{T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\widetilde{\delta}
+\widetilde{\alpha}^{2}\,\frac{1}{T}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(L\delta)

where the o_{p}(1) is uniform over \widetilde{\theta} such that |\widetilde{\alpha}|\leq M, \frac{1}{T}\|\widetilde{F}^{\prime}\widetilde{F}\|=\frac{1}{T}\sum_{t=1}^{T}\|\widetilde{f}_{t}\|^{2}\leq M, and \frac{1}{T}\|\widetilde{\delta}\|^{2}=\frac{1}{T}\sum_{t=1}^{T}\widetilde{\delta}_{t}^{2}\leq M, for any given M<\infty.

The proof of Theorem 1 is given in the Appendix.

Note that the expected value of ΔNT(θ~)\Delta_{NT}(\widetilde{\theta}) is zero, so 𝔼[ΔNT(θ~)]2\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2} is the variance.

All terms in \Delta_{NT}(\widetilde{\theta}) are stochastically bounded; they have expressions of the form \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi_{it}, where \xi_{it} have zero mean and finite variance (in fact, finite moments of any order under Assumption B). By assuming \{\widetilde{\delta}_{t}\} and \widetilde{f} are such that

1TF~D1F~\displaystyle\frac{1}{T}\widetilde{F}^{\prime}D^{-1}\widetilde{F} =1Tt=1T1σt2f~tf~tΣf~f~,\displaystyle=\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_{t}^{2}}\widetilde{f}_{t}\widetilde{f}_{t}^{\prime}\rightarrow\Sigma_{\widetilde{f}\widetilde{f}},
1Tδ~D1δ~\displaystyle\frac{1}{T}\widetilde{\delta}^{\prime}D^{-1}\widetilde{\delta} =1Tt=1T1σt2δ~t2σδ2\displaystyle=\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_{t}^{2}}\widetilde{\delta}_{t}^{2}\rightarrow\sigma_{\delta}^{2}

as well as existence of limits for several cross products 1TF~D1F\frac{1}{T}\widetilde{F}^{\prime}D^{-1}F, 1Tδ~D1F\frac{1}{T}\widetilde{\delta}^{\prime}D^{-1}F, and so forth, then the variance 𝔼[ΔNT(θ~)]2\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2} has a limit. Let 𝔼[ΔNT(θ~)]2τ2\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2}\rightarrow\tau^{2} for some τ2\tau^{2} depending on (α~,{δ~t},f~)(\widetilde{\alpha},\{\widetilde{\delta}_{t}\},\widetilde{f}). We can further show

ΔNT(θ~)d𝒩(0,τ2).\Delta_{NT}(\widetilde{\theta})\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}(0,\tau^{2}).

Thus the local likelihood ratio can be rewritten as

(θ0+1NTθ~)(θ0)=ΔNT(θ~)12τ2+op(1).\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta})-\ell(\theta^{0})=\Delta_{NT}(\widetilde{\theta})-\frac{1}{2}\tau^{2}+o_{p}(1).

We next consider the asymptotic efficiency bound for regular estimators. Regularity rules out Hodges-type “superefficient” estimators and James-Stein-type estimators. A regular estimator sequence converges locally uniformly (under the local laws) to a limiting distribution that is free of the local parameters (van der Vaart [19], p.115, p.365).

3.3 Efficient scores and efficiency bound

In the likelihood ratio expansion, the term \Delta_{NT}(\widetilde{\theta}) contains the scores of the likelihood function. The coefficient of \widetilde{\alpha} gives the score for \alpha^{0}, the coefficient of \widetilde{f}_{t} gives the score for f_{t}^{0}, and the same holds for \widetilde{\delta} and \delta^{0}. The efficient score for \alpha^{0} is the residual from projecting its own score onto the scores of f_{1}^{0},...,f_{T}^{0} and of \delta_{1}^{0},...,\delta_{T}^{0}. Moreover, the inverse of the variance of the efficient score gives the efficiency bound. To derive the efficient score for \alpha^{0}, rewrite \Delta_{NT}(\widetilde{\theta}) of Theorem 1 as

ΔNT(θ~)=ΔNT1+ΔNT2+α~[ΔNT3+ΔNT4+ΔNT5]\Delta_{NT}(\widetilde{\theta})=\Delta_{NT1}+\Delta_{NT2}+\widetilde{\alpha}\,[\Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5}]

where \Delta_{NT1} and \Delta_{NT2} denote the first two terms of \Delta_{NT}(\widetilde{\theta}), see (11), and \Delta_{NTj} (j=3,4,5) denote the last three terms of (11) with \widetilde{\alpha} factored out. So the score for \alpha^{0} is the sum \Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5}. Next, rewrite

ΔNT1\displaystyle\Delta_{NT1} =1NTi=1NλiF~D1/2MD1/2FD1/2εi=1Tt=1Tf~tvt\displaystyle=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}=\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\widetilde{f}_{t}^{\prime}v_{t}

where vt=1Ni=1Nλivitv_{t}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}v_{it} (r×1)(r\times 1) and vitv_{it} is the tt-th element of the vector D1/2MD1/2FD1/2εiD^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}. Thus vtv_{t} is the score of ft0f^{0}_{t} (t=1,2,,Tt=1,2,...,T). Similarly, rewrite

ΔNT2\displaystyle\Delta_{NT2} =1NTi=1Nδ~(FF+D)1(Fλi+εi)=1Tt=1Tδ~tut\displaystyle=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})=\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\widetilde{\delta}_{t}u_{t}

where u_{t}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}u_{it} (a scalar), u_{it} is the t-th element of the vector (FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i}), and u_{t} is the score of \delta_{t}^{0}. To obtain the efficient score for \alpha^{0}, we project \Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5} onto the scores of f_{t}^{0} and \delta_{t}^{0} (t=1,2,...,T), that is, onto [v_{1},v_{2},...,v_{T}] and [u_{1},u_{2},...,u_{T}], to get the projection residual. Let V_{T}=(v_{1}^{\prime},v_{2}^{\prime},...,v_{T}^{\prime})^{\prime}, U_{T}=(u_{1},u_{2},...,u_{T})^{\prime}, and Z_{T}=(V_{T}^{\prime},\;U_{T}^{\prime})^{\prime}. The projection residual is given by

\Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5}-Z_{T}^{\prime}[\mathbb{E}(Z_{T}Z_{T}^{\prime})]^{-1}\mathbb{E}\left[Z_{T}(\Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5})\right] \qquad (12)

Notice \Delta_{NT3} is uncorrelated with the scores of f_{t}^{0} and of \delta_{t}^{0} (t=1,2,...,T), i.e., \mathbb{E}(Z_{T}\Delta_{NT3})=0. This follows because L\varepsilon_{i} contains the lags of \varepsilon_{i}, so \Delta_{NT3} is composed of terms \varepsilon_{it-s}\varepsilon_{it} (with s\geq 1), and \mathbb{E}(\varepsilon_{it-s}\varepsilon_{it}\varepsilon_{ik})=0 for any k. Next, \Delta_{NT4} is simply a linear combination of V_{T}=[v_{1},v_{2},...,v_{T}] since \Delta_{NT4} can be written as T^{-1/2}\sum_{t=1}^{T}p_{t}^{\prime}v_{t}, where p_{t}^{\prime} is the t-th row of the matrix LF. Because V_{T} is a subvector of Z_{T}, \Delta_{NT4} is also a linear combination of Z_{T}. Thus Z_{T}^{\prime}[\mathbb{E}(Z_{T}Z_{T}^{\prime})]^{-1}\mathbb{E}[Z_{T}\Delta_{NT4}]\equiv\Delta_{NT4}. Similarly, \Delta_{NT5} is a linear combination of U_{T} because \Delta_{NT5}=\frac{1}{\sqrt{T}}\sum_{t=1}^{T}q_{t}u_{t} with q_{t} being the t-th element of L\delta. Thus \Delta_{NT5} is a linear combination of Z_{T}. Hence, Z_{T}^{\prime}[\mathbb{E}(Z_{T}Z_{T}^{\prime})]^{-1}\mathbb{E}[Z_{T}\Delta_{NT5}]\equiv\Delta_{NT5}. In summary, we have

Z_{T}^{\prime}[\mathbb{E}(Z_{T}Z_{T}^{\prime})]^{-1}\mathbb{E}\left[Z_{T}(\Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5})\right]=\Delta_{NT4}+\Delta_{NT5}

It follows that the projection residual in (12) is equal to ΔNT3\Delta_{NT3}. Hence the efficient score for α0\alpha^{0} is ΔNT3\Delta_{NT3}. Notice,

\mathrm{var}(\Delta_{NT3})=\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)=\frac{1}{T}\sum_{t=2}^{T}\frac{1}{\sigma_{t}^{2}}\Big(\sigma_{t-1}^{2}+\alpha^{2}\sigma_{t-2}^{2}+\cdots+\alpha^{2(t-2)}\sigma_{1}^{2}\Big) \qquad (13)

Its limit is \gamma by Assumption B(v), so 1/\gamma gives the asymptotic efficiency bound.

We summarize the result in the following corollary

Corollary 1.

Under Assumption B, the asymptotic efficiency bound for regular estimators of \alpha^{0} is 1/\gamma, with \gamma being the limit of (13).

Since Assumption B is stronger than Assumption A, the asymptotic representation in (5) holds under Assumption B. That is, under the normality assumption, the system maximum likelihood estimator satisfies

\sqrt{NT}(\widehat{\alpha}-\alpha)=\Big(\frac{1}{T}\mathrm{tr}(LDL^{\prime}D^{-1})\Big)^{-1}\Big[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}\Big]+o_{p}(1).

We see that NT(α^α0)\sqrt{NT}(\widehat{\alpha}-\alpha^{0}) is expressed in terms of the efficient influence functions, thus α^\widehat{\alpha} is regular and asymptotically efficient (van der Vaart [19], p.121 and p.369). We state the result in the following corollary.

Corollary 2.

Under Assumption B, the system maximum likelihood estimator α^\widehat{\alpha} is a regular estimator and achieves the asymptotic efficiency bound (in spite of an increasing number of incidental parameters).

The preceding corollaries imply that, under normality, we are able to establish the asymptotic efficiency bound in the presence of an increasing number of nuisance parameters. Further, the system maximum likelihood estimator achieves the efficiency bound. These results are not obvious owing to the incidental parameters problem.

The QMLE in Section 2 does not require normality, but it achieves the normality efficiency bound; see equation (5). So the QMLE is robust to the normality assumption. If \lambda_{i} and \varepsilon_{i} are non-normal and their distributions are known, one should be able to construct a more efficient estimator than the QMLE. But in practice, panel data researchers usually do not impose distributional assumptions other than some moment conditions such as those in Assumption A. Thus the QMLE presents a viable estimation procedure, with the knowledge that it achieves the normality efficiency bound. Furthermore, the QMLE does not need bias correction, unlike the fixed effects estimator.

The result of Corollary 1 is not directly obtained via a limit experiment and the convolution theorem (e.g., van der Vaart and Wellner [20], chapter 3.11). Since \ell_{r}^{\infty} is not a Hilbert space, the convolution theorem is not directly applicable. However, using the line of argument in [11], it is possible to construct a Hilbert subspace with an appropriate inner product in which the efficiency bound for the low dimensional parameter \alpha^{0} can be shown to be 1/\gamma. That is, Corollary 1 can be obtained via the convolution theorem. But [11] also show that the estimators of the incidental parameters f_{t}^{0} under the corresponding local parameter space are not regular. Therefore, we shall not pursue this approach. Below we shall consider \ell^{2} perturbations.

3.4 The 2\ell^{2} local parameter space

To have the limit process of the local likelihood ratios reside in a Hilbert space and to directly apply the convolution theorem ([20], chapter 3.11), we consider a second type of local parameter space, which is also used by [11] for the fixed effects estimators:

f_{t}^{0}+\frac{1}{\sqrt{N}}\widetilde{f}_{t},\quad t=1,2,...,T \qquad (14)

with f~=(f~1,f~2,)\widetilde{f}=(\widetilde{f}_{1},\widetilde{f}_{2},...) being required to be in r2\ell_{r}^{2}:

\ell_{r}^{2}:=\Big\{(\widetilde{f}_{t})_{t=1}^{\infty}\,\Big|\,\widetilde{f}_{t}\in\mathbb{R}^{r},\ \sum_{s=1}^{\infty}\|\widetilde{f}_{s}\|^{2}<\infty\Big\}.

For this type of local parameters, we can remove the scale factor T1/2T^{-1/2}, (cf. (9)). Since f~r2\widetilde{f}\in\ell_{r}^{2}, we have, for F~=(f~1,f~2,,f~T)\widetilde{F}=(\widetilde{f}_{1},\widetilde{f}_{2},...,\widetilde{f}_{T})^{\prime} (projection of f~\widetilde{f} on the first TT coordinates),

\widetilde{F}^{\prime}\widetilde{F}=\sum_{t=1}^{T}\widetilde{f}_{t}\widetilde{f}_{t}^{\prime}=O(1)

This is in contrast with (10). A necessary condition for \widetilde{f}\in\ell_{r}^{2} is \widetilde{f}_{t}\rightarrow 0 as t\rightarrow\infty. In comparison, the \ell_{r}^{\infty} perturbation considered earlier only requires the boundedness of \widetilde{f}_{t}, so the process \widetilde{f}_{t} can be rather “jagged.” In a certain sense, requiring \widetilde{f} to be in \ell_{r}^{2} can be viewed as a “smoothness” restriction (for example, the Banach-Mazur theorem).

Similar to f~\widetilde{f}, we assume the sequence {δ~t}\{\widetilde{\delta}_{t}\} is in 12\ell_{1}^{2}, i.e. s=1δ~s2<\sum_{s=1}^{\infty}\widetilde{\delta}_{s}^{2}<\infty. We still use δ~\widetilde{\delta} to denote the projection of the sequence onto the first TT coordinates, δ~=(δ~1,,δ~T)\widetilde{\delta}=(\widetilde{\delta}_{1},...,\widetilde{\delta}_{T}).

Theorem 2.

Under Assumption B, for α~\widetilde{\alpha}\in\mathbb{R}, {δ~t}12\{\widetilde{\delta}_{t}\}\in\ell_{1}^{2}, f~r2\widetilde{f}\in\ell_{r}^{2}, F~=(f~1,f~2,,f~T)\widetilde{F}=(\widetilde{f}_{1},\widetilde{f}_{2},...,\widetilde{f}_{T})^{\prime}, and δ~=(δ~1,,δ~T)\widetilde{\delta}=(\widetilde{\delta}_{1},...,\widetilde{\delta}_{T}), we have as N,TN,T\rightarrow\infty,

\ell(\alpha^{0}+\frac{\widetilde{\alpha}}{\sqrt{NT}},\,\delta^{0}+\frac{1}{\sqrt{N}}\widetilde{\delta},\,F^{0}+\frac{1}{\sqrt{N}}\widetilde{F})-\ell(\theta^{0})=\Delta_{NT}^{\dagger}(\widetilde{\theta})-\frac{1}{2}\mathbb{E}[\Delta_{NT}^{\dagger}(\widetilde{\theta})]^{2}+o_{p}(1)

where

\Delta_{NT}^{\dagger}(\widetilde{\theta})=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}
+\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}D^{-1}\varepsilon_{i}
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i} \qquad (15)
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})

\mathbb{E}[\Delta_{NT}^{\dagger}(\widetilde{\theta})]^{2}=\mathrm{tr}\Big[\widetilde{F}^{\prime}D^{-1}\widetilde{F}\Big]
+\widetilde{\delta}^{\prime}D^{-1}\widetilde{\delta}
+\widetilde{\alpha}^{2}\Big[\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)\Big] \qquad (16)
+\widetilde{\alpha}^{2}\,\mathrm{tr}\Big[\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}](LF)\Big]
+\widetilde{\alpha}^{2}\,\frac{1}{T}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(L\delta)

where op(1)o_{p}(1) is uniform over |α~|M|\widetilde{\alpha}|\leq M, δ~M\|\widetilde{\delta}\|\leq M, and F~M\|\widetilde{F}\|\leq M for any given M<M<\infty.

In comparison with Theorem 1, Theorem 2 has simpler expressions, due to the smaller local parameter space. The first two terms in ΔNT(θ~)\Delta_{NT}(\widetilde{\theta}) are simplified, with the corresponding simplification in the variance, and in addition, two covariance terms in 𝔼[ΔNT(θ~)]2\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2} are dropped. The proof is given in the appendix.

We next establish the local asymptotic normality (LAN) property of the local likelihood ratios. Here we introduce further notation to make the expressions more compact. Let

\lambda_{i}^{+}=\begin{bmatrix}1\\ \lambda_{i}\end{bmatrix},\quad\widetilde{f}_{t}^{+}=\begin{bmatrix}\widetilde{\delta}_{t}\\ \widetilde{f}_{t}\end{bmatrix},\quad\widetilde{F}^{+}=(\widetilde{\delta},\widetilde{F})

Both \lambda_{i}^{+} and \widetilde{f}_{t}^{+} are vectors with r+1 elements, and \widetilde{F}^{+} is a T\times(r+1) matrix. With these notations, the sum of the first two terms of (15) equals \frac{1}{\sqrt{N}}\sum_{i=1}^{N}(\lambda_{i}^{+})^{\prime}(\widetilde{F}^{+})^{\prime}D^{-1}\varepsilon_{i}.

For each (α~,f~+)=(α~,f~1+,f~2+,.)×r+12(\widetilde{\alpha},\widetilde{f}^{+})=(\widetilde{\alpha},\widetilde{f}_{1}^{+},\widetilde{f}_{2}^{+},....)\in\mathbb{R}\times\ell_{r+1}^{2}, we introduce a new sequence

h(\widetilde{\alpha},\widetilde{f}^{+})=(h_{0},h_{1},h_{2},...)=\Big(\widetilde{\alpha}(\gamma+\nu+\mu)^{1/2},\frac{1}{\sigma_{1}}\widetilde{f}_{1}^{+},\frac{1}{\sigma_{2}}\widetilde{f}_{2}^{+},...\Big) \qquad (17)

so h_{0}=\widetilde{\alpha}(\gamma+\nu+\mu)^{1/2} and h_{s}=\frac{1}{\sigma_{s}}\widetilde{f}_{s}^{+} for s\geq 1, where \gamma, \nu, and \mu are defined in Assumption B(v), and \sigma_{s}^{2} is the variance of \varepsilon_{is}. Hence, h(\widetilde{\alpha},\widetilde{f}^{+}) is a scaled version of (\widetilde{\alpha},\widetilde{f}^{+}). By Assumption B(iii), \min_{s}\sigma_{s}^{2}\geq a>0, so it follows that h(\widetilde{\alpha},\widetilde{f}^{+})\in\mathbb{H}:=\mathbb{R}\times\ell_{r+1}^{2}. For any h,g\in\mathbb{H}, define the inner product \langle h,g\rangle=h_{0}g_{0}+\sum_{s=1}^{\infty}h_{s}^{\prime}g_{s}; then \mathbb{H} is a Hilbert space. Let \|h\|_{\mathbb{H}}^{2}=\langle h,h\rangle. In particular, for h=h(\widetilde{\alpha},\widetilde{f}^{+}) in (17), we have

\|h\|_{\mathbb{H}}^{2}=\widetilde{\alpha}^{2}(\gamma+\nu+\mu)+\sum_{s=1}^{\infty}\frac{1}{\sigma_{s}^{2}}(\widetilde{f}^{+}_{s})^{\prime}\widetilde{f}^{+}_{s} \qquad (18)

Notice tr(F~D1F~)+tr(δ~D1δ~)=s=1T1σs2f~sf~s+s=1T1σs2δ~s2=s=11σs2(f~s+)f~s++o(1)\mathrm{tr}(\widetilde{F}^{\prime}D^{-1}\widetilde{F})+\mathrm{tr}(\widetilde{\delta}^{\prime}D^{-1}\widetilde{\delta})=\sum_{s=1}^{T}\frac{1}{\sigma_{s}^{2}}\widetilde{f}_{s}^{\prime}\widetilde{f}_{s}+\sum_{s=1}^{T}\frac{1}{\sigma_{s}^{2}}\widetilde{\delta}_{s}^{2}=\sum_{s=1}^{\infty}\frac{1}{\sigma_{s}^{2}}(\widetilde{f}^{+}_{s})^{\prime}\widetilde{f}^{+}_{s}+o(1) because the series are convergent and nonnegative; rearranging does not alter the limit. By Assumption B(v), we can write (16) as

\mathbb{E}[\Delta_{NT}^{\dagger}(\widetilde{\theta})]^{2}=\widetilde{\alpha}^{2}(\gamma+\nu+\mu)+\sum_{s=1}^{\infty}\frac{1}{\sigma_{s}^{2}}(\widetilde{f}^{+}_{s})^{\prime}\widetilde{f}^{+}_{s}+o(1)=\|h\|_{\mathbb{H}}^{2}+o(1) \qquad (19)

where h=h(\widetilde{\alpha},\widetilde{f}^{+}) is given in (17). Next, rewrite (15) as

ΔNT(θ~)=1Ni=1N(λi+)(F~+)D1εi+α~(ΔNT3+ΔNT4+ΔNT5)\Delta_{NT}^{\dagger}(\widetilde{\theta})=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(\lambda^{+}_{i})^{\prime}(\widetilde{F}^{+})^{\prime}D^{-1}\varepsilon_{i}+\widetilde{\alpha}(\Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5})

where ΔNTj\Delta_{NTj} (j=3,4,5)j=3,4,5) are defined earlier. The first term

1Ni=1N(λi+)(F~+)D1εi=t=1T1σt2(f~t+)(1Ni=1Nλi+εit)d𝒩(0,t=1(f~t+)f~t+/σt2)\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(\lambda^{+}_{i})^{\prime}(\widetilde{F}^{+})^{\prime}D^{-1}\varepsilon_{i}=\sum_{t=1}^{T}\frac{1}{\sigma_{t}^{2}}(\widetilde{f}^{+}_{t})^{\prime}\Big{(}\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{+}\varepsilon_{it}\Big{)}\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}\Big{(}0,\sum_{t=1}^{\infty}(\widetilde{f}^{+}_{t})^{\prime}\widetilde{f}^{+}_{t}/\sigma_{t}^{2}\Big{)}

because N^{-1/2}\sum_{i=1}^{N}\lambda_{i}^{+}\varepsilon_{it}\stackrel{d}{\longrightarrow}\mathcal{N}(0,\sigma_{t}^{2}I_{r+1}). Note that the left-hand side above is asymptotically independent of \Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5} (their covariance being zero, as is shown in the appendix). From

ΔNT,3+ΔNT,4+ΔNT,5d𝒩(0,γ+ν+μ)\Delta_{NT,3}+\Delta_{NT,4}+\Delta_{NT,5}\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}(0,\gamma+\nu+\mu)

where γ,ν,μ\gamma,\nu,\mu are given in Assumption B, we have

ΔNT(θ~)d𝒩(0,h2)\Delta_{NT}^{\dagger}(\widetilde{\theta})\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}\left(0,\|h\|_{\mathbb{H}}^{2}\,\right)

Moreover, it is not difficult to establish the finite dimensional weak convergence. Let \widetilde{\alpha}^{j}\in\mathbb{R} and \widetilde{f}^{+j}\in\ell_{r+1}^{2} for j=1,2,...,q. Let h^{j}=h(\widetilde{\alpha}^{j},\widetilde{f}^{+j}) and \widetilde{\theta}^{j}=(\widetilde{\alpha}^{j},\widetilde{F}^{+j}); then for any finite integer q,

(ΔNT(θ~1),,ΔNT(θ~q))d𝒩(0,(hj,hk)j,k=1q).(\Delta_{NT}^{\dagger}(\widetilde{\theta}^{1}),...,\Delta_{NT}^{\dagger}(\widetilde{\theta}^{q}))^{\prime}\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}\Big{(}0,(\langle h^{j},h^{k}\rangle)_{j,k=1}^{q}\Big{)}.

Summarizing the above, we have

Corollary 3.

Under the assumption of Theorem 2,

\ell(\alpha^{0}+\frac{\widetilde{\alpha}}{\sqrt{NT}},\,\delta^{0}+\frac{1}{\sqrt{N}}\widetilde{\delta},\,F^{0}+\frac{1}{\sqrt{N}}\widetilde{F})-\ell(\theta^{0})=\Delta_{NT}^{\dagger}(\widetilde{\theta})-\frac{1}{2}\|h\|_{\mathbb{H}}^{2}+o_{p}(1)

and

ΔNT(θ~)d𝒩(0,h2)\Delta_{NT}^{\dagger}(\widetilde{\theta})\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}\left(0,\|h\|_{\mathbb{H}}^{2}\right)

where h=h(α~,f~+)h=h(\widetilde{\alpha},\widetilde{f}^{+}) and h2\|h\|_{\mathbb{H}}^{2} are defined in (17) and (18), respectively. Furthermore, the likelihood ratio is locally asymptotically normal (LAN).

Using the convolution theorem for locally asymptotically normal (LAN) experiments, the implied efficiency bound for regular estimators of α0\alpha^{0} is 1/(γ+ν+μ)1/(\gamma+\nu+\mu). The implied efficiency bound for regular estimators of ft0+:=(δt0,ft0)f_{t}^{0+}:=(\delta_{t}^{0},f_{t}^{0}) is σt2Ir+1\sigma_{t}^{2}I_{r+1} for each tt. These bounds are, respectively, the inverse of the coefficient of α~2\widetilde{\alpha}^{2}, and the inverse of the matrix in the quadratic form (f~t+)f~t+/σt2=(f~t+)(Ir+1/σt2)f~t+(\widetilde{f}_{t}^{+})^{\prime}\widetilde{f}^{+}_{t}/\sigma_{t}^{2}=(\widetilde{f}_{t}^{+})^{\prime}(I_{r+1}/\sigma_{t}^{2})\widetilde{f}^{+}_{t} in the expression for h2\|h\|_{\mathbb{H}}^{2}.

To see this, fix ss\in\mathbb{N}, with s1s\geq 1. For h=(h0,h1,,hs,)h=(h_{0},h_{1},...,h_{s},...)\in\mathbb{H}, consider the parameter sequence,

ϕNT,s(h):=fs0++N1/2f~s+=fs0++N1/2σshs,ϕNT,s(0)=fs0+\phi_{NT,s}(h):=f_{s}^{0+}+N^{-1/2}\widetilde{f}_{s}^{+}=f_{s}^{0+}+N^{-1/2}\sigma_{s}h_{s},\quad\phi_{NT,s}(0)=f_{s}^{0+}

so N[ϕNT,s(h)ϕNT,s(0)]=σshs\sqrt{N}[\phi_{NT,s}(h)-\phi_{NT,s}(0)]=\sigma_{s}h_{s}. If we define ϕ˙s(h)=σshs\dot{\phi}_{s}(h)=\sigma_{s}h_{s}, then

N[ϕNT,s(h)ϕNT,s(0)]=ϕ˙s(h).\sqrt{N}[\phi_{NT,s}(h)-\phi_{NT,s}(0)]=\dot{\phi}_{s}(h).

Since \dot{\phi}_{s} is a coordinate projection map (multiplied by a positive constant \sigma_{s}), it is a continuous linear map \dot{\phi}_{s}:\mathbb{H}\rightarrow\mathbb{R}^{r+1}. Its adjoint map \dot{\phi}_{s}^{*}:\mathbb{R}^{r+1}\rightarrow\mathbb{H} (both spaces are self-dual) is the inclusion map (i.e., embedding): \dot{\phi}_{s}^{*}x=(0,...,0,\sigma_{s}x,0,...)\in\mathbb{H} for all x\in\mathbb{R}^{r+1}. The adjoint map satisfies \langle\dot{\phi}_{s}^{*}x,h\rangle=\sigma_{s}x^{\prime}h_{s}=x^{\prime}\dot{\phi}_{s}(h)=\langle x,\dot{\phi}_{s}(h)\rangle. Let Z denote the limiting distribution of efficient estimators of f_{s}^{0+}. Theorem 3.11.2 in van der Vaart and Wellner ([20], p.414) shows that x^{\prime}Z\sim\mathcal{N}(0,\|\dot{\phi}_{s}^{*}x\|_{\mathbb{H}}^{2}) for all x\in\mathbb{R}^{r+1}. But \|\dot{\phi}_{s}^{*}x\|_{\mathbb{H}}^{2}=\sigma_{s}^{2}x^{\prime}x. It follows that Z\sim\mathcal{N}(0,\sigma_{s}^{2}I_{r+1}). Thus the efficiency bound for regular estimators of f_{s}^{0+} is \sigma_{s}^{2}I_{r+1}. For s=0, the same argument shows that the efficiency bound for regular estimators of \alpha^{0} is 1/(\gamma+\nu+\mu). In summary, we have

Corollary 4.

Under the assumptions of Theorem 2, the asymptotic efficiency bound for regular estimators of \alpha^{0} is 1/(\gamma+\nu+\mu), and the efficiency bound for regular estimators of f_{t}^{0+} is \sigma_{t}^{2}I_{r+1}.

It can be shown that the efficiency bound 1/(\gamma+\nu+\mu) corresponds to the case in which the incidental parameters (\delta^{0},F^{0}) are known rather than estimated; thus the implied efficiency bound is too low to be attainable. The implication is that the \ell^{2} perturbation is “too small”. Intuitively, the smaller the local parameter space, the lower the efficiency bound. It is thus harder to achieve the implied bound, unless the estimation is done within the given local parameter space. But estimators, in general, are not constructed in reference to local parameter spaces.

When the same model is estimated by the fixed effects method (that is, the \lambda_{i}’s are also treated as parameters), Iwakura and Okui [11] show that the \ell^{2} perturbation is a suitable choice, and that the corresponding efficiency bound for \alpha^{0} is 1/\gamma (the authors confined their analysis to the homoskedastic case, so 1/\gamma=1-\alpha^{2}). For the QMLE, however, the \ell^{2} perturbation does not provide sufficient variation, and thus implies a bound that is too small. That is to say, the QMLE is a more efficient estimation procedure than the fixed effects approach. This finding is consistent with the result that, even under fixed T, the QMLE provides a consistent estimator of \alpha^{0}, whereas the fixed effects estimator is not consistent, see [17] and [6].

To recap, the \ell^{2} local parameter space is “too small” for the QMLE, but it is the suitable local parameter space for the fixed effects approach. This implies, as explained earlier, that the QMLE is a better procedure than the fixed effects method. By analyzing the local likelihood ratios, we are able to shed light on the relative merits of different estimators that are otherwise hard to discern from the usual asymptotics alone (e.g., limiting distributions).

4 Concluding remarks

We derive the efficiency bound for estimating dynamic panel models with interactive effects by treating the model as a simultaneous equations system. We show that the Gaussian quasi-maximum likelihood method applied to the system attains the efficiency bound. These results are obtained under an increasing number of incidental parameters. Contrasts with the fixed effects estimators are made. In particular, the analysis shows that the system QMLE is preferred to the fixed effects estimators.

Proof of Results

We focus on the local parameter space with \widetilde{f}\in\ell_{r}^{\infty} and \{\widetilde{\delta}_{t}\}\in\ell_{1}^{\infty}. The analysis for sequences in the \ell^{2} space is a special case. Let G=F+\frac{1}{\sqrt{NT}}\widetilde{F}. We drop the superscript 0 on the true parameters to make the notation less cumbersome. When evaluated at the true parameters, we have the equality By_{i}-\delta=F\lambda_{i}+\varepsilon_{i}. Thus

(θ0)=N2log|FF+D|12i=1N(Fλi+εi)(FF+D)1(Fλi+εi)\ell(\theta^{0})=-\frac{N}{2}\log|FF^{\prime}+D|-\frac{1}{2}\sum_{i=1}^{N}(F\lambda_{i}+\varepsilon_{i})^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})

and (θ0+1NTθ~)\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta}) is equal to

(θ0+1NTθ~)\displaystyle\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta}) =N2log|GG+D|\displaystyle=-\frac{N}{2}\log|GG^{\prime}+D|
12i=1N(yi(δ+1NTδ~)(α+1NTα~)yi,1)(GG+D)1\displaystyle-\frac{1}{2}\sum_{i=1}^{N}\Big{(}y_{i}-(\delta+\frac{1}{\sqrt{NT}}\widetilde{\delta})-(\alpha+\frac{1}{\sqrt{NT}}\widetilde{\alpha})y_{i,-1}\Big{)}^{\prime}(GG^{\prime}+D)^{-1}
×(yi(δ+1NTδ~)(α+1NTα~)yi,1)\displaystyle\quad\times\Big{(}y_{i}-(\delta+\frac{1}{\sqrt{NT}}\widetilde{\delta})-(\alpha+\frac{1}{\sqrt{NT}}\widetilde{\alpha})y_{i,-1}\Big{)}
=N2log|GG+D|12i=1N(Fλi+εi1NTδ~α~1NTyi,1)(GG+D)1\displaystyle=-\frac{N}{2}\log|GG^{\prime}+D|-\frac{1}{2}\sum_{i=1}^{N}\Big{(}F\lambda_{i}+\varepsilon_{i}-\frac{1}{\sqrt{NT}}\widetilde{\delta}-\widetilde{\alpha}\frac{1}{\sqrt{NT}}y_{i,-1}\Big{)}^{\prime}(GG^{\prime}+D)^{-1}
×(Fλi+εi1NTδ~α~1NTyi,1)\displaystyle\quad\times\Big{(}F\lambda_{i}+\varepsilon_{i}-\frac{1}{\sqrt{NT}}\widetilde{\delta}-\widetilde{\alpha}\frac{1}{\sqrt{NT}}y_{i,-1}\Big{)}

where yi,1=Jyi=(0,yi1,yi2,,yiT1)y_{i,-1}=Jy_{i}=(0,y_{i1},y_{i2},...,y_{iT-1})^{\prime}. Thus, the difference is

(θ0+1NTθ~)\displaystyle\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta}) (θ0)=\displaystyle-\ell(\theta^{0})=
\displaystyle- N2[log|GG+D|log|FF+D|]\displaystyle\frac{N}{2}\Big{[}\log|GG^{\prime}+D|-\log|FF^{\prime}+D|\Big{]}
\displaystyle- 12i=1N(Fλi+εi)[(GG+D)1(FF+D)1](Fλi+εi)\displaystyle\frac{1}{2}\sum_{i=1}^{N}(F\lambda_{i}+\varepsilon_{i})^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}(F\lambda_{i}+\varepsilon_{i})
+\displaystyle+ α~1NTi=1Nyi,1(GG+D)1(Fλi+εi)\displaystyle\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}(GG^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i}) (20)
\displaystyle- 12α~21NTi=1Nyi,1(GG+D)1yi,1\displaystyle\frac{1}{2}\widetilde{\alpha}^{2}\frac{1}{NT}\sum_{i=1}^{N}y_{i,-1}^{\prime}(GG^{\prime}+D)^{-1}y_{i,-1}
+\displaystyle+ 1NTi=1Nδ~(GG+D)1(Fλi+εi)\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
\displaystyle- α~1NTi=1Nδ~(GG+D)1yi,1\displaystyle\widetilde{\alpha}\frac{1}{NT}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}y_{i,-1}
\displaystyle- 121Tδ~(GG+D)1δ~\displaystyle\frac{1}{2}\frac{1}{T}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}\widetilde{\delta}

Throughout, we use the matrix inversion formula (Woodbury formula)

(FF^{\prime}+D)^{-1}=D^{-1}-D^{-1}F(I_{r}+F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}
(FF^{\prime}+D)^{-1}F=D^{-1}F(I_{r}+F^{\prime}D^{-1}F)^{-1}

and the matrix determinant result

|FF^{\prime}+D|=|D|\,|I_{r}+F^{\prime}D^{-1}F|
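These identities are easy to confirm numerically; the following is a quick sketch (illustrative sizes, not part of the proof) of such a check.

```python
import numpy as np

rng = np.random.default_rng(4)
T, r = 8, 2
F = rng.normal(size=(T, r))
D = np.diag(rng.uniform(0.5, 1.5, size=T))
Dinv = np.linalg.inv(D)

inv_direct = np.linalg.inv(F @ F.T + D)
inv_woodbury = Dinv - Dinv @ F @ np.linalg.inv(np.eye(r) + F.T @ Dinv @ F) @ F.T @ Dinv
assert np.allclose(inv_direct, inv_woodbury)
assert np.allclose(inv_direct @ F,
                   Dinv @ F @ np.linalg.inv(np.eye(r) + F.T @ Dinv @ F))
det_direct = np.linalg.det(F @ F.T + D)
det_identity = np.linalg.det(D) * np.linalg.det(np.eye(r) + F.T @ Dinv @ F)
assert np.allclose(det_direct, det_identity)
```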

From now on, we assume r=1 to simplify the derivation. We define \omega_{F}^{2} as

ωF2=1TFD1F\omega_{F}^{2}=\frac{1}{T}F^{\prime}D^{-1}F

We start with a number of lemmas. The first few do not involve any random quantities, and are results of matrix algebra and Taylor expansions.

Lemma 1.

For G=F+1NTF~G=F+\frac{1}{\sqrt{NT}}\widetilde{F},

-\frac{N}{2}\Big[\log|GG^{\prime}+D|-\log|FF^{\prime}+D|\Big]=-\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}+o(1) \qquad (21)

Proof: Notice

|GG+D|=|(F+1NTF~)(F+1NTF~)+D|=|D|[1+(F+1NTF~)D1(F+1NTF~)]=|D|(1+FD1F+21NTFD1F~+1NTF~D1F~)=|D|(1+FD1F)[1+21T1NTFD1F~1ωF2+1NT2F~D1F~1ωF2]+RNT\begin{split}|GG^{\prime}+D|=&|(F+\frac{1}{\sqrt{NT}}\widetilde{F})(F+\frac{1}{\sqrt{NT}}\widetilde{F})^{\prime}+D|\\ =&|D|[1+(F+\frac{1}{\sqrt{NT}}\widetilde{F})^{\prime}D^{-1}(F+\frac{1}{\sqrt{NT}}\widetilde{F})]\\ =&|D|\Big{(}1+F^{\prime}D^{-1}F+2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}+\frac{1}{NT}\widetilde{F}^{\prime}D^{-1}\widetilde{F}\Big{)}\\ =&|D|(1+F^{\prime}D^{-1}F)\Big{[}1+2\frac{1}{T}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{2}}+\frac{1}{NT^{2}}\widetilde{F}^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{2}}\Big{]}+R_{NT}\\ \end{split}

where RNTR_{NT} is negligible. From log(1+x)x\log(1+x)\approx x,

log|GG+D|log|FF+D|=log[1+21T1NTFD1F~/ωF2+1NT2F~D1F~/ωF2]\log|GG^{\prime}+D|-\log|FF^{\prime}+D|=\log\Big{[}1+2\frac{1}{T}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}+\frac{1}{NT^{2}}\widetilde{F}^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}\Big{]}
=21T1NTFD1F~/ωF2+1NT2F~D1F~/ωF2+RNT1=2\frac{1}{T}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}+\frac{1}{NT^{2}}\widetilde{F}^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}+R_{NT1}

where RNT1R_{NT1} is a higher order remainder term. Thus

N2[log|GG+D|log|FF+D|]\displaystyle-\frac{N}{2}\Big{[}\log|GG^{\prime}+D|-\log|FF^{\prime}+D|\Big{]}
=NT1NTFD1F~/ωF212T2F~D1F~/ωF2+o(1)\displaystyle=-\frac{N}{T}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}-\frac{1}{2T^{2}}\widetilde{F}^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}+o(1)

The second term on the right is negligible. This proves Lemma 1. \Box
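As an informal numerical illustration (not part of the proof), the following Python sketch compares the exact scaled log-determinant difference with the leading term in (21) for r=1; the sample sizes and random draws are illustrative assumptions.

\begin{verbatim}
# Numerical illustration of Lemma 1: the scaled log-determinant difference
# is close to -sqrt(N/T)(F'D^{-1}Ftilde/T)/omega_F^2 for moderately large N, T.
import numpy as np

rng = np.random.default_rng(1)
N, T = 2000, 50
F  = rng.standard_normal(T)                     # true factor (r = 1)
Ft = rng.standard_normal(T)                     # local perturbation Ftilde
d  = rng.uniform(0.5, 2.0, size=T)              # diagonal of D
G  = F + Ft / np.sqrt(N * T)

def logdet(f):                                  # log|ff' + D| via |D|(1 + f'D^{-1}f)
    return np.sum(np.log(d)) + np.log(1.0 + np.sum(f * f / d))

omega2 = np.sum(F * F / d) / T                  # omega_F^2 = F'D^{-1}F/T
exact  = -0.5 * N * (logdet(G) - logdet(F))
approx = -np.sqrt(N / T) * (np.sum(F * Ft / d) / T) / omega2
print(exact, approx)                            # the two values should be close
\end{verbatim}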

Lemma 2.

Let H:=(1+GD1G)1(1+FD1F)1H:=(1+G^{\prime}D^{-1}G)^{-1}-(1+F^{\prime}D^{-1}F)^{-1}. Then

H=\displaystyle H= 21T21NTFD1F~1ωF41NT3F~D1F~1ωF4\displaystyle-2\frac{1}{T^{2}}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{4}}-\frac{1}{NT^{3}}\widetilde{F}^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{4}} (22)
+41T31NTFD1F~1ωF6+4(FD1F~T)21NT2ωF6+R1\displaystyle+4\frac{1}{T^{3}}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{6}}+4(\frac{F^{\prime}D^{-1}\widetilde{F}}{T})^{2}\frac{1}{NT^{2}\omega_{F}^{6}}+R_{1}

with R1R_{1} being negligible (note this order of expansion is necessary).

Proof:

(GG+D)1\displaystyle(GG^{\prime}+D)^{-1} =D1D1G(1+GD1G)1GD1\displaystyle=D^{-1}-D^{-1}G(1+G^{\prime}D^{-1}G)^{-1}G^{\prime}D^{-1}
(1+GD1G)\displaystyle(1+G^{\prime}D^{-1}G) =(1+FD1F+21NTFD1F~+1NTF~D1F~)\displaystyle=\Big{(}1+F^{\prime}D^{-1}F+2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}+\frac{1}{NT}\widetilde{F}^{\prime}D^{-1}\widetilde{F}\Big{)}
=(1+FD1F)(1+21NTFD1F~+1NTF~D1F~1+FD1F)\displaystyle=(1+F^{\prime}D^{-1}F)\Big{(}1+\frac{2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}+\frac{1}{NT}\widetilde{F}^{\prime}D^{-1}\widetilde{F}}{1+F^{\prime}D^{-1}F}\Big{)}
(1+GD1G)1\displaystyle(1+G^{\prime}D^{-1}G)^{-1} =(1+FD1F)1(1+21NTFD1F~+1NTF~D1F~1+FD1F)1\displaystyle=(1+F^{\prime}D^{-1}F)^{-1}\Big{(}1+\frac{2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}+\frac{1}{NT}\widetilde{F}^{\prime}D^{-1}\widetilde{F}}{1+F^{\prime}D^{-1}F}\Big{)}^{-1}

Let A=2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}+\frac{1}{NT}\widetilde{F}^{\prime}D^{-1}\widetilde{F}. Using the expansion 1/(1+x)\simeq 1-x+x^{2}, we have

(1+GD1G)1\displaystyle(1+G^{\prime}D^{-1}G)^{-1}
(1+FD1F)1(1A1+FD1F+A2(1+FD1F)2)\displaystyle\approx(1+F^{\prime}D^{-1}F)^{-1}\Big{(}1-\frac{A}{1+F^{\prime}D^{-1}F}+\frac{A^{2}}{(1+F^{\prime}D^{-1}F)^{2}}\Big{)}
=(1+FD1F)1A(1+FD1F)2+A2(1+FD1F)3\displaystyle=(1+F^{\prime}D^{-1}F)^{-1}-\frac{A}{(1+F^{\prime}D^{-1}F)^{2}}+\frac{A^{2}}{(1+F^{\prime}D^{-1}F)^{3}}
=(1+FD1F)1AT2ωF4(1+1TωF2)2+A2(1+FD1F)3\displaystyle=(1+F^{\prime}D^{-1}F)^{-1}-\frac{A}{T^{2}\omega_{F}^{4}(1+\frac{1}{T\omega_{F}^{2}})^{2}}+\frac{A^{2}}{(1+F^{\prime}D^{-1}F)^{3}}
(1+FD1F)1AT2ωF4(121TωF2)+A2(1+FD1F)3\displaystyle\approx(1+F^{\prime}D^{-1}F)^{-1}-\frac{A}{T^{2}\omega_{F}^{4}}(1-2\frac{1}{T\omega_{F}^{2}})+\frac{A^{2}}{(1+F^{\prime}D^{-1}F)^{3}}
=(1+FD1F)1AT2ωF4+2AT3ωF6+A2(1+FD1F)3\displaystyle=(1+F^{\prime}D^{-1}F)^{-1}-\frac{A}{T^{2}\omega_{F}^{4}}+2\frac{A}{T^{3}\omega_{F}^{6}}+\frac{A^{2}}{(1+F^{\prime}D^{-1}F)^{3}}

The above can be further approximated. In the third expression we keep only the first term of A (i.e., 2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}), and in the last expression we keep only the first term of A^{2} (i.e., 4\frac{1}{NT}(F^{\prime}D^{-1}\widetilde{F})^{2}); the other terms are negligible. The denominator of the last expression is treated as T^{3}\omega_{F}^{6}. This gives the lemma. \Box
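As an informal numerical check (not part of the proof) of the expansion (22), the following Python sketch compares the exact scalar H with the four retained terms for r=1; the sample sizes and random draws are illustrative assumptions.

\begin{verbatim}
# Numerical check of the expansion of H in Lemma 2 (r = 1).
import numpy as np

rng = np.random.default_rng(2)
N, T = 5000, 80
F  = rng.standard_normal(T)
Ft = rng.standard_normal(T)                     # Ftilde
d  = rng.uniform(0.5, 2.0, size=T)              # diagonal of D
G  = F + Ft / np.sqrt(N * T)

a  = np.sum(F * Ft / d)                         # F'D^{-1}Ftilde
b  = np.sum(Ft * Ft / d)                        # Ftilde'D^{-1}Ftilde
w2 = np.sum(F * F / d) / T                      # omega_F^2

H_exact = 1 / (1 + np.sum(G * G / d)) - 1 / (1 + np.sum(F * F / d))
H_approx = (-2 * a / (T**2 * np.sqrt(N * T) * w2**2)
            - b / (N * T**3 * w2**2)
            + 4 * a / (T**3 * np.sqrt(N * T) * w2**3)
            + 4 * (a / T)**2 / (N * T**2 * w2**3))
print(H_exact, H_approx)                        # the two values agree closely
\end{verbatim}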

Lemma 3.

The following T×TT\times T matrix Ξ\Xi satisfies

Ξ:=(GG+D)1(FF+D)1=ΞaΞbΞcΞd+R\Xi:=(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}=-\Xi_{a}-\Xi_{b}-\Xi_{c}-\Xi_{d}+R

where

Ξa=HD1FFD1\displaystyle\Xi_{a}=HD^{-1}FF^{\prime}D^{-1} (23)
Ξb=[1NT(1TωF21T2ωF4)21NT3FD1F~1ωF4]D1FF~D1\displaystyle\Xi_{b}=\Big{[}\frac{1}{\sqrt{NT}}\Big{(}\frac{1}{T\omega_{F}^{2}}-\frac{1}{T^{2}\omega_{F}^{4}}\Big{)}-2\frac{1}{NT^{3}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{4}}\Big{]}D^{-1}F\widetilde{F}^{\prime}D^{-1} (24)
Ξc=[1NT(1TωF21T2ωF4)21NT3FD1F~1ωF4]D1F~FD1\displaystyle\Xi_{c}=\Big{[}\frac{1}{\sqrt{NT}}\Big{(}\frac{1}{T\omega_{F}^{2}}-\frac{1}{T^{2}\omega_{F}^{4}}\Big{)}-2\frac{1}{NT^{3}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{4}}\Big{]}D^{-1}\widetilde{F}F^{\prime}D^{-1} (25)
Ξd=1NT21ωF2D1F~F~D1\displaystyle\Xi_{d}=\frac{1}{NT^{2}}\frac{1}{\omega_{F}^{2}}D^{-1}\widetilde{F}\widetilde{F}^{\prime}D^{-1} (26)

where HH is defined in Lemma 2, and R is a negligible higher order term with Rsp=o(1/(NT))\|R\|_{sp}=o(1/(NT)).

Proof: From the Woodbury formula

-\Xi=D^{-1}G(1+G^{\prime}D^{-1}G)^{-1}G^{\prime}D^{-1}-D^{-1}F(1+F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}

we can write

G(1+GD1G)1G=(1+FD1F)1GG+HGGG(1+G^{\prime}D^{-1}G)^{-1}G^{\prime}=(1+F^{\prime}D^{-1}F)^{-1}GG^{\prime}+HGG^{\prime} (27)

where HH is defined in Lemma 2. The first term on the right hand side above is

\displaystyle(1+F^{\prime}D^{-1}F)^{-1}GG^{\prime} =(1+F^{\prime}D^{-1}F)^{-1}\Big{(}FF^{\prime}+\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}+\frac{1}{NT}\widetilde{F}\widetilde{F}^{\prime}\Big{)}
=(1+FD1F)1FF\displaystyle=(1+F^{\prime}D^{-1}F)^{-1}FF^{\prime}
+(1+FD1F)1(1NTFF~+1NTF~F+1NTF~F~)\displaystyle+(1+F^{\prime}D^{-1}F)^{-1}\Big{(}\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}+\frac{1}{NT}\widetilde{F}\widetilde{F}^{\prime}\Big{)}

Using

(1+FD1F)1=1T1ωF21T21ωF4+R2(1+F^{\prime}D^{-1}F)^{-1}=\frac{1}{T}\frac{1}{\omega_{F}^{2}}-\frac{1}{T^{2}}\frac{1}{\omega_{F}^{4}}+R_{2}

where R2=O(1/T3)R_{2}=O(1/T^{3}) is negligible, we have

\displaystyle(1+F^{\prime}D^{-1}F)^{-1}GG^{\prime}-(1+F^{\prime}D^{-1}F)^{-1}FF^{\prime}
=(1TωF21T2ωF4)[1NTFF~+1NTF~F+1NTF~F~]+R3\displaystyle=\Big{(}\frac{1}{T\omega_{F}^{2}}-\frac{1}{T^{2}\omega_{F}^{4}}\Big{)}\Big{[}\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}+\frac{1}{NT}\widetilde{F}\widetilde{F}^{\prime}\Big{]}+R_{3} (28)
=(1TωF21T2ωF4)[1NTFF~+1NTF~F]+1ωF21NT2F~F~+R4\displaystyle=\Big{(}\frac{1}{T\omega_{F}^{2}}-\frac{1}{T^{2}\omega_{F}^{4}}\Big{)}\Big{[}\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}\Big{]}+\frac{1}{\omega_{F}^{2}}\frac{1}{NT^{2}}\widetilde{F}\widetilde{F}^{\prime}+R_{4}

The last equality follows by ignoring higher-order terms; R_{3} and R_{4} are negligible. To see this, note that R_{3} equals R_{2} multiplied by the three matrices in the brackets of (28). The Frobenius norm of each of these matrices is O(T/\sqrt{NT}) and R_{2}=O(1/T^{3}), so the product is O(1/(T^{2}\sqrt{NT})). R_{4} equals R_{3}+O(1/(T^{2}N)). To analyze HGG^{\prime} in (27), we use GG^{\prime}=FF^{\prime}+\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}+\frac{1}{NT}\widetilde{F}\widetilde{F}^{\prime}, but we can ignore \frac{1}{NT}\widetilde{F}\widetilde{F}^{\prime} because its product with H is of higher order. Thus

HGG=HFF+H1NTFF~+H1NTF~F+R5HGG^{\prime}=HFF^{\prime}+H\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+H\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}+R_{5} (29)

where H is given in (22). All four terms in H are non-negligible for the matrix HFF^{\prime}; only the first term in H is non-negligible for H\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime} and H\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}. Thus, combining (28) and (29) and pre- and post-multiplying by D^{-1}, we obtain the lemma. \Box

Lemma 4.
12\displaystyle-\frac{1}{2} i=1N(Fλi)[(GG+D)1(FF+D)1]Fλi\displaystyle\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}F\lambda_{i}
=NT(FD1F~/T)1ωF2121TF~D1/2MD1/2FD1/2F~+op(1)\displaystyle=\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}-\frac{1}{2}\frac{1}{T}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\widetilde{F}+o_{p}(1)

where D1/2MD1/2FD1/2=D1D1F(FD1F)1FD1D^{-1/2}M_{D^{-1/2}F}D^{-1/2}=D^{-1}-D^{-1}F(F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}.

Note that the first term is the negative of the leading term in (21), so the two cancel.

Proof: By Lemma 3, consider

\begin{split}\sum_{i=1}^{N}(F\lambda_{i})^{\prime}(-\Xi_{a})(F\lambda_{i})=&-H\sum_{i=1}^{N}(F\lambda_{i})^{\prime}D^{-1}FF^{\prime}D^{-1}(F\lambda_{i})\\ =&2\sqrt{NT}(F^{\prime}D^{-1}\widetilde{F}/T)\\ &+(\widetilde{F}^{\prime}D^{-1}\widetilde{F}/T)\\ &-4\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}\\ &-4(F^{\prime}D^{-1}\widetilde{F}/T)^{2}\frac{1}{\omega_{F}^{2}}+o_{p}(1)\\ \end{split} (30)

We have used the fact that 1Ni=1Nλi2=1+op(1)\frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2}=1+o_{p}(1). Next,

\displaystyle\sum_{i=1}^{N}(F\lambda_{i})^{\prime}(-\Xi_{b}-\Xi_{c})(F\lambda_{i})= \displaystyle-2\sqrt{NT}(\widetilde{F}^{\prime}D^{-1}F/T)+2\sqrt{\frac{N}{T}}(\widetilde{F}^{\prime}D^{-1}F/T)\frac{1}{\omega_{F}^{2}}
\displaystyle\quad+4(F^{\prime}D^{-1}\widetilde{F}/T)^{2}\frac{1}{\omega_{F}^{2}}+o_{p}(1)

and

\sum_{i=1}^{N}(F\lambda_{i})^{\prime}(-\Xi_{d})(F\lambda_{i})=-(F^{\prime}D^{-1}\widetilde{F}/T)^{2}\frac{1}{\omega_{F}^{2}}+o_{p}(1).

Since \Xi=-\Xi_{a}-\Xi_{b}-\Xi_{c}-\Xi_{d}+R by Lemma 3, summing the three preceding displays gives \sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi(F\lambda_{i}) up to a negligible term; multiplying by -1/2 gives

12i=1N\displaystyle-\frac{1}{2}\sum_{i=1}^{N} (Fλi)[(GG+D)1(FF+D)1]Fλi\displaystyle(F\lambda_{i})^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}F\lambda_{i}
=NT(FD1F~/T)1ωF212(F~D1F~/T)+12(FD1F~/T)21ωF2+op(1)\displaystyle=\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}-\frac{1}{2}(\widetilde{F}^{\prime}D^{-1}\widetilde{F}/T)+\frac{1}{2}(F^{\prime}D^{-1}\widetilde{F}/T)^{2}\frac{1}{\omega_{F}^{2}}+o_{p}(1)

We can rewrite

(F~D1F~/T)(FD1F~/T)21ωF2=1T[F~D1F~F~D1F(FD1F)1FD1F~](\widetilde{F}^{\prime}D^{-1}\widetilde{F}/T)-(F^{\prime}D^{-1}\widetilde{F}/T)^{2}\frac{1}{\omega_{F}^{2}}=\frac{1}{T}\Big{[}\widetilde{F}^{\prime}D^{-1}\widetilde{F}-\widetilde{F}^{\prime}D^{-1}F(F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}\widetilde{F}\Big{]}
=1TF~D1/2MD1/2FD1/2F~=\frac{1}{T}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\widetilde{F}

This gives Lemma 4. \Box.
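As an informal numerical check (not part of the proof) of Lemma 4, the following Python sketch imposes \sum_{i}\lambda_{i}^{2}=N (so that the left-hand side reduces to -(N/2)F^{\prime}[(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}]F) and compares it with the right-hand side for r=1; the sample sizes and random draws are illustrative assumptions, and the two values differ only by the o_{p}(1) remainder.

\begin{verbatim}
# Numerical check of Lemma 4 (r = 1), imposing sum_i lambda_i^2 = N.
import numpy as np

rng = np.random.default_rng(3)
N, T = 10000, 200
F  = rng.standard_normal(T)
Ft = rng.standard_normal(T)                     # Ftilde
d  = rng.uniform(0.5, 2.0, size=T)
D, Dinv = np.diag(d), np.diag(1.0 / d)
G  = F + Ft / np.sqrt(N * T)

inv_G = np.linalg.inv(np.outer(G, G) + D)       # (GG' + D)^{-1}
inv_F = np.linalg.inv(np.outer(F, F) + D)       # (FF' + D)^{-1}
lhs = -0.5 * N * F @ (inv_G - inv_F) @ F

w2 = F @ Dinv @ F / T                           # omega_F^2
M  = Dinv - np.outer(Dinv @ F, Dinv @ F) / (F @ Dinv @ F)   # D^{-1/2} M_{D^{-1/2}F} D^{-1/2}
rhs = np.sqrt(N / T) * (F @ Dinv @ Ft / T) / w2 - 0.5 * (Ft @ M @ Ft) / T
print(lhs, rhs)                                 # close; the gap is the o_p(1) remainder
\end{verbatim}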

Lemma 5.
12i=1Nεi[(GG+D)1(FF+D)1]εi=op(1)-\frac{1}{2}\sum_{i=1}^{N}\varepsilon_{i}^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}\varepsilon_{i}=o_{p}(1) (31)

Proof: By the notation of Lemma 3, we evaluate i=1NεiΞεi\sum_{i=1}^{N}\varepsilon_{i}^{\prime}\Xi\varepsilon_{i}. Note that Ξ\Xi consists of four parts. For this lemma, it is sufficient to approximate Ξ\Xi by Ξ1+Ξ2+Ξ3\Xi_{1}+\Xi_{2}+\Xi_{3}, where

Ξ1\displaystyle\Xi_{1} =(21T21NTFD1F~1ωF4)D1FFD1\displaystyle=\Big{(}2\frac{1}{T^{2}}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{4}}\Big{)}D^{-1}FF^{\prime}D^{-1}
Ξ2\displaystyle\Xi_{2} =(1NT1T1ωF2)D1FF~D1\displaystyle=-\Big{(}\frac{1}{\sqrt{NT}}\frac{1}{T}\frac{1}{\omega_{F}^{2}}\Big{)}D^{-1}F\widetilde{F}^{\prime}D^{-1} (32)
Ξ3\displaystyle\Xi_{3} =(1NT1T1ωF2)D1F~FD1\displaystyle=-\Big{(}\frac{1}{\sqrt{NT}}\frac{1}{T}\frac{1}{\omega_{F}^{2}}\Big{)}D^{-1}\widetilde{F}F^{\prime}D^{-1}

In the above approximation, \Xi_{1}, \Xi_{2}, and \Xi_{3} correspond to -\Xi_{a}, -\Xi_{b}, and -\Xi_{c} in Lemma 3, keeping only the first term of H in (22) and only the leading term \frac{1}{\sqrt{NT}}\frac{1}{T\omega_{F}^{2}} inside the brackets of (24) and (25); all other terms are negligible in the evaluation of \sum_{i=1}^{N}\varepsilon_{i}^{\prime}\Xi\varepsilon_{i}. Taking traces, it is easy to obtain the expected values

𝔼i=1NεiΞ1εi=2NT(FD1F~/T)1ωF2\mathbb{E}\sum_{i=1}^{N}\varepsilon_{i}^{\prime}\Xi_{1}\varepsilon_{i}=2\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}

And

𝔼i=1Nεi(Ξ2+Ξ3)εi=2NT(FD1F~/T)1ωF2\mathbb{E}\sum_{i=1}^{N}\varepsilon_{i}^{\prime}(\Xi_{2}+\Xi_{3})\varepsilon_{i}=-2\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}

Thus the sum of the expected values is zero. In addition, the deviation of each term from its expected value is o_{p}(1), because the variance of each term is O(1/T)=o(1). For example, \mbox{var}(\sum_{i=1}^{N}\varepsilon_{i}^{\prime}\Xi_{2}\varepsilon_{i})=N[\mathrm{tr}(\Xi_{2}D\Xi_{2}D)+\mathrm{tr}(\Xi_{2}^{\prime}D\Xi_{2}D)]=\frac{1}{T}(\frac{1}{T}\widetilde{F}^{\prime}D^{-1}F)^{2}\frac{1}{\omega_{F}^{4}}+\frac{1}{T}(\frac{1}{T}\widetilde{F}^{\prime}D^{-1}\widetilde{F})(\frac{1}{T}F^{\prime}D^{-1}F)\frac{1}{\omega_{F}^{4}}=O(1/T). We have used the fact that for normal \varepsilon_{i}\sim N(0,D), \mbox{var}(\varepsilon_{i}^{\prime}A\varepsilon_{i})=\mathrm{tr}(ADAD)+\mathrm{tr}(A^{\prime}DAD) for any A. This proves the lemma. \Box
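As an informal Monte Carlo check (not part of the proof) of the quadratic-form variance formula just used, the following Python sketch compares a simulated variance with \mathrm{tr}(ADAD)+\mathrm{tr}(A^{\prime}DAD) for a non-symmetric A; the matrices and the number of draws are illustrative assumptions.

\begin{verbatim}
# Monte Carlo check of var(eps' A eps) = tr(ADAD) + tr(A'DAD) for eps ~ N(0, D).
import numpy as np

rng = np.random.default_rng(4)
T = 6
d = rng.uniform(0.5, 2.0, size=T)
D = np.diag(d)
A = rng.standard_normal((T, T))                 # arbitrary, non-symmetric

eps = rng.standard_normal((200000, T)) * np.sqrt(d)     # draws from N(0, D)
q = np.einsum('nt,ts,ns->n', eps, A, eps)               # eps' A eps, draw by draw
theory = np.trace(A @ D @ A @ D) + np.trace(A.T @ D @ A @ D)
print(q.var(), theory)                          # close up to simulation error
\end{verbatim}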

Lemma 6.
i=1N(Fλi)[(GG+D)1(FF+D)1]εi=1NTi=1NλiF~D1/2MD1/2FD1/2εi+op(1)\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}\varepsilon_{i}=-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}+o_{p}(1)

Proof: Recall Ξ=(GG+D)1(FF+D)1\Xi=(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}. The preceding approximation of Ξ\Xi by Ξ1+Ξ2+Ξ3\Xi_{1}+\Xi_{2}+\Xi_{3} in (32) is sufficient (other terms are negligible). We evaluate i=1N(Fλi)Ξkεi\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi_{k}\varepsilon_{i} (k=1,2,3)(k=1,2,3).

i=1N(Fλi)Ξ1εi\displaystyle\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi_{1}\varepsilon_{i} =2(FD1F~/T)(FD1F/T)1NTi=1NFD1εiλi1ωF4\displaystyle=2(F^{\prime}D^{-1}\widetilde{F}/T)(F^{\prime}D^{-1}F/T)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}\frac{1}{\omega_{F}^{4}}
=21ωF2(FD1F~/T)1NTi=1NFD1εiλi\displaystyle=2\frac{1}{\omega_{F}^{2}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}
i=1N(Fλi)Ξ2εi=1ωF2(FD1F/T)1NTi=1NF~D1εiλi=1NTi=1NF~D1εiλi\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi_{2}\,\varepsilon_{i}=-\frac{1}{\omega_{F}^{2}}(F^{\prime}D^{-1}F/T)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}=-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}

and

i=1N(Fλi)Ξ3εi=1ωF2(FD1F~/T)1NTi=1NFD1εiλi\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi_{3}\,\varepsilon_{i}=-\frac{1}{\omega_{F}^{2}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}

Combining the three expressions

i=1N(Fλi)Ξεi=1NTi=1NλiF~D1εi+1ωF2(FD1F~/T)1NTi=1NFD1εiλi+op(1)\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi\varepsilon_{i}=-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}+\frac{1}{\omega_{F}^{2}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}+o_{p}(1)

Using the definition of M_{D^{-1/2}F} and \omega_{F}^{2} and rewriting the right-hand side gives the lemma. \Box

Corollary 5.
12i=1N(Fλi+εi)\displaystyle-\frac{1}{2}\sum_{i=1}^{N}(F\lambda_{i}+\varepsilon_{i})^{\prime} [(GG+D)1(FF+D)1](Fλi+εi)=NT(FD1F~/T)1ωF2\displaystyle\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}(F\lambda_{i}+\varepsilon_{i})=\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}
+1NTi=1NλiF~D1/2MD1/2FD1/2εi\displaystyle+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}
121TF~D1/2MD1/2FD1/2F~+op(1)\displaystyle-\frac{1}{2}\frac{1}{T}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\widetilde{F}+o_{p}(1)

Proof: This follows by combining the results of Lemmas 4, 5, and 6. \Box

Lemma 7.
1NTi=1Nyi,1\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime} (FF+D)1(Fλi+εi)=1NTi=1NδL(FF+D)1(Fλi+εi)\displaystyle(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\delta^{\prime}L^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
+1NTi=1N(Lεi)D1εi\displaystyle+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i} (33)
+1NTi=1Nλi(LF)[D1/2MD1/2FD1/2]εi+op(1)\displaystyle+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}+o_{p}(1)

Proof: Using yi,1=Lδ+LFλi+Lεiy_{i,-1}=L\delta+LF\lambda_{i}+L\varepsilon_{i}, the left hand side equals the sum of the first term on the right hand side and the following two expressions

1NTi=1N(LFλi+Lεi)(FF+D)1Fλi+1NTi=1N(LFλi+Lεi)(FF+D)1εi:=a+b\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(LF\lambda_{i}+L\varepsilon_{i})^{\prime}(FF^{\prime}+D)^{-1}F\lambda_{i}+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(LF\lambda_{i}+L\varepsilon_{i})^{\prime}(FF^{\prime}+D)^{-1}\varepsilon_{i}:=a+b

where a denotes the first term and b the second term. From the formula

(FF+D)1F=D1F(1+FD1F)1=D1F(1+TωF2)1(FF^{\prime}+D)^{-1}F=D^{-1}F({1+F^{\prime}D^{-1}F})^{-1}=D^{-1}F(1+T\omega_{F}^{2})^{-1}

term aa equals

a\displaystyle a =N/T1+TωF2(1Ni=1Nλi2)(FLD1F)+11+TωF21NTi=1NεiLD1Fλi\displaystyle=\frac{\sqrt{N/T}}{1+T\omega_{F}^{2}}(\frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2})(F^{\prime}L^{\prime}D^{-1}F)+\frac{1}{1+T\omega_{F}^{2}}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\varepsilon_{i}^{\prime}L^{\prime}D^{-1}F\lambda_{i}
=N/T1+TωF2(FLD1F)+op(1)\displaystyle=\frac{\sqrt{N/T}}{1+T\omega_{F}^{2}}(F^{\prime}L^{\prime}D^{-1}F)+o_{p}(1)

where we used 1Ni=1Nλi2=1\frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2}=1, and the second term is Op(1/T)O_{p}(1/T).

For term bb, use the Woodbury formula,

b\displaystyle b =1NTi=1N(LFλi)D1εi+1NTi=1N(Lεi)D1εi\displaystyle=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(LF\lambda_{i})^{\prime}D^{-1}\varepsilon_{i}+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}
1NTi=1NλiFLD1F(1+FD1F)FD1εi1(1+FD1F)1NTi=1NεiLD1FFD1εi\displaystyle-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\frac{\lambda_{i}^{\prime}F^{\prime}L^{\prime}D^{-1}F}{(1+F^{\prime}D^{-1}F)}F^{\prime}D^{-1}\varepsilon_{i}-\frac{1}{(1+F^{\prime}D^{-1}F)}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\varepsilon_{i}^{\prime}L^{\prime}D^{-1}FF^{\prime}D^{-1}\varepsilon_{i}

Note the expected value of the last term in the preceding equation is

FLD1F1+FD1FNT-\frac{F^{\prime}L^{\prime}D^{-1}F}{1+F^{\prime}D^{-1}F}\sqrt{\frac{N}{T}}

and the deviation from its expected value is negligible (its variance is O(1/T)=o(1), by the same argument as in Lemma 5). This expected value cancels with term a. Thus, we can rewrite

a+b=1NTi=1N(Lεi)D1εi+1NTi=1Nλi(LF)[D1/2MD1/2FD1/2]εi+op(1)a+b=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}+o_{p}(1)

where we have used 1/(1+F^{\prime}D^{-1}F)\approx 1/(F^{\prime}D^{-1}F). This proves Lemma 7. \Box

Lemma 8.
1NTi=1Nyi,1\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime} [(GG+D)1(FF+D)1](Fλi+εi)\displaystyle\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}(F\lambda_{i}+\varepsilon_{i}) (34)
=1T(LF)[D1/2MD1/2FD1/2]F~+op(1)\displaystyle=-\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\widetilde{F}+o_{p}(1)

Proof: It is not difficult to show

1NTi=1Nyi,1[(GG+D)1(FF+D)1]εi=op(1)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}\varepsilon_{i}=o_{p}(1)

We thus focus on \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi F\lambda_{i}, where \Xi=(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}. Approximating \Xi by (32) is sufficient.

Using yi,1=Lδ+LFλi+Lεiy_{i,-1}=L\delta+LF\lambda_{i}+L\varepsilon_{i}, we have

1NTi=1Nyi,1Ξ1Fλi\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi_{1}F\lambda_{i} =1NTi=1N(Lδ+LFλi+Lεi)(21T21NT(FD1F~)1ωF4)D1(FF)D1Fλi\displaystyle=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta+LF\lambda_{i}+L\varepsilon_{i})^{\prime}\Big{(}2\frac{1}{T^{2}}\frac{1}{\sqrt{NT}}(F^{\prime}D^{-1}\widetilde{F})\frac{1}{\omega_{F}^{4}}\Big{)}D^{-1}(FF^{\prime})D^{-1}F\lambda_{i}
=2(FLD1F/T)(FD1F~/T)1ωF2+op(1)\displaystyle=2(F^{\prime}L^{\prime}D^{-1}F/T)(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}+o_{p}(1)

where the two terms involving LδL\delta and LεiL\varepsilon_{i} are negligible, and each is Op(N1/2)O_{p}(N^{-1/2}). Here we have used FD1F=TωF2F^{\prime}D^{-1}F=T\omega_{F}^{2} and 1Ni=1Nλi2=1+op(1)\frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2}=1+o_{p}(1). Next,

1NTi=1Nyi,1Ξ2Fλi\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi_{2}F\lambda_{i} =1NTi=1N(Lδ+LFλi+Lεi)(1NT1T1ωF2)D1(FF~)D1Fλi\displaystyle=-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta+LF\lambda_{i}+L\varepsilon_{i})^{\prime}\Big{(}\frac{1}{\sqrt{NT}}\frac{1}{T}\frac{1}{\omega_{F}^{2}}\Big{)}D^{-1}(F\widetilde{F}^{\prime})D^{-1}F\lambda_{i}
=1ωF2(FLD1F/T)(F~D1F/T)+op(1),\displaystyle=-\frac{1}{\omega_{F}^{2}}(F^{\prime}L^{\prime}D^{-1}F/T)(\widetilde{F}^{\prime}D^{-1}F/T)+o_{p}(1),

where terms involving LδL\delta and LεiL\varepsilon_{i} are negligible. Next

1NTi=1Nyi,1Ξ3Fλi\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi_{3}F\lambda_{i} =1NTi=1N(Lδ+LFλi+Lεi)(1NT1T1ωF2)D1(F~F)D1Fλi\displaystyle=-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta+LF\lambda_{i}+L\varepsilon_{i})^{\prime}\Big{(}\frac{1}{\sqrt{NT}}\frac{1}{T}\frac{1}{\omega_{F}^{2}}\Big{)}D^{-1}(\widetilde{F}F^{\prime})D^{-1}F\lambda_{i}
=(FLD1F~/T)+op(1)\displaystyle=-(F^{\prime}L^{\prime}D^{-1}\widetilde{F}/T)+o_{p}(1)

which follows from the same reasoning as given for the term involving Ξ1\Xi_{1}. Summing up,

1NTi=1Nyi,1Ξ(Fλi+εi)\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi(F\lambda_{i}+\varepsilon_{i}) =1ωF2(FLD1F/T)(F~D1F/T)(FLD1F~/T)+op(1)\displaystyle=\frac{1}{\omega_{F}^{2}}(F^{\prime}L^{\prime}D^{-1}F/T)(\widetilde{F}^{\prime}D^{-1}F/T)-(F^{\prime}L^{\prime}D^{-1}\widetilde{F}/T)+o_{p}(1)

Using the definition of MD1/2FM_{D^{-1/2}F} and rewriting give the Lemma. \Box

Corollary 6.
α~1NTi=1Nyi,1(GG+D)1(Fλi+εi)=α~1NTi=1N(Lεi)D1εi+α~1NTi=1Nλi(LF)[D1/2MD1/2FD1/2]εiα~1T(LF)[D1/2MD1/2FD1/2]F~+α~1NTi=1NδL(FF+D)1(Fλi+εi)+op(1)\begin{split}\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}&(GG^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})\\ &=\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}\\ &+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}\\ &-\widetilde{\alpha}\,\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\widetilde{F}\\ &+\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\delta^{\prime}L^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})+o_{p}(1)\end{split} (35)

Proof: adding and subtracting terms

α~1NT\displaystyle\widetilde{\alpha}\frac{1}{\sqrt{NT}} i=1Nyi,1(GG+D)1(Fλi+εi)\displaystyle\sum_{i=1}^{N}y_{i,-1}^{\prime}(GG^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
=α~1NTi=1Nyi,1(FF+D)1(Fλi+εi)+α~1NTi=1Nyi,1Ξ(Fλi+εi)\displaystyle=\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})+\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi(F\lambda_{i}+\varepsilon_{i})

where, by definition, Ξ=(GG+D)1(FF+D)1\Xi=(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}. The corollary follows from Lemmas 7 and 8, that is, by summing (33) and (34), where every term is multiplied by α~\widetilde{\alpha}. \Box

Lemma 9.
1NTi=1Nyi,1(GG+D)1yi,1\displaystyle\frac{1}{NT}\sum_{i=1}^{N}y_{i,-1}^{\prime}(GG^{\prime}+D)^{-1}y_{i,-1} =1T(Lδ)(FF+D)1Lδ\displaystyle=\frac{1}{T}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}L\delta
+1Ttr(LD1LD)\displaystyle+\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD) (36)
\displaystyle\quad+\frac{1}{T}\mathrm{tr}[(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}](LF)]+o_{p}(1)

Proof: Here it is sufficient to approximate (GG+D)1(GG^{\prime}+D)^{-1} by (FF+D)1(FF^{\prime}+D)^{-1} (recall (GG+D)1=(FF+D)1+Ξ(GG^{\prime}+D)^{-1}=(FF^{\prime}+D)^{-1}+\Xi, terms involving Ξ\Xi are negligible). Using yi,1=Lδ+LFλi+Lεiy_{i,-1}=L\delta+LF\lambda_{i}+L\varepsilon_{i}, we rewrite

1NT\displaystyle\frac{1}{NT} i=1Nyi,1(FF+D)1yi,1=1T(Lδ)(FF+D)1Lδ\displaystyle\sum_{i=1}^{N}y_{i,-1}^{\prime}(FF^{\prime}+D)^{-1}y_{i,-1}=\frac{1}{T}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}L\delta
+21NTi=1N(Lδ)(FF+D)1(Fλi+εi)\displaystyle+2\frac{1}{NT}\sum_{i=1}^{N}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
+1NTi=1N(LFλi+Lεi)(FF+D)1(LFλi+Lεi)\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}(LF\lambda_{i}+L\varepsilon_{i})^{\prime}(FF^{\prime}+D)^{-1}(LF\lambda_{i}+L\varepsilon_{i})

The second term is negligible (its mean is zero and its variance converges to zero). Consider the third term. Using the Woodbury formula, we rewrite it as

1NTi=1N(LFλi+Lεi)D1(LFλi+Lεi)\displaystyle\frac{1}{NT}\sum_{i=1}^{N}(LF\lambda_{i}+L\varepsilon_{i})^{\prime}D^{-1}(LF\lambda_{i}+L\varepsilon_{i})
1(1+FD1F)1NTi=1N(LFλi+Lεi)D1FFD1(LFλi+Lεi)\displaystyle-\frac{1}{(1+F^{\prime}D^{-1}F)}\frac{1}{NT}\sum_{i=1}^{N}(LF\lambda_{i}+L\varepsilon_{i})^{\prime}D^{-1}FF^{\prime}D^{-1}(LF\lambda_{i}+L\varepsilon_{i})
:=ab\displaystyle:=a-b

where aa and bb represent the first and the second expressions, respectively. Notice

a=1T(LF)D1LF+1Ttr(LD1LD)+op(1)a=\frac{1}{T}(LF)^{\prime}D^{-1}LF+\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)+o_{p}(1)

where we have used 𝔼(εiLD1Lεi)=tr(LD1LD)\mathbb{E}(\varepsilon_{i}^{\prime}L^{\prime}D^{-1}L\varepsilon_{i})=\mathrm{tr}(L^{\prime}D^{-1}LD) and 1Ni=1Nλi2=1+op(1)\frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2}=1+o_{p}(1). The cross product terms are negligible. Next,

\displaystyle b =\frac{T}{1+T\omega_{F}^{2}}\Big{[}\Big{(}\frac{F^{\prime}L^{\prime}D^{-1}F}{T}\Big{)}^{2}+\frac{1}{T}\Big{(}\frac{F^{\prime}D^{-1}LDL^{\prime}D^{-1}F}{T}\Big{)}\Big{]}+o_{p}(1)
\displaystyle\quad=\frac{1}{\omega_{F}^{2}}\Big{(}\frac{F^{\prime}L^{\prime}D^{-1}F}{T}\Big{)}^{2}+o_{p}(1)

Here we used \mathbb{E}(\varepsilon_{i}^{\prime}L^{\prime}D^{-1}FF^{\prime}D^{-1}L\varepsilon_{i})=F^{\prime}D^{-1}LDL^{\prime}D^{-1}F and \frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2}=1+o_{p}(1); the cross-product terms are negligible. It follows that

\displaystyle a-b =\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)+\frac{1}{T}(LF)^{\prime}D^{-1}LF-\frac{1}{\omega_{F}^{2}}\Big{(}\frac{F^{\prime}L^{\prime}D^{-1}F}{T}\Big{)}^{2}+o_{p}(1)
=1Ttr(LD1LD)+1T(LF)(D1/2MD1/2FD1/2)LF+op(1)\displaystyle=\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)+\frac{1}{T}(LF)^{\prime}(D^{-1/2}M_{D^{-1/2}F}D^{-1/2})LF+o_{p}(1)

Combining results we obtain Lemma 9. \Box
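As an informal Monte Carlo check (not part of the proof) of the two expectation formulas used above, both of which are instances of \mathbb{E}(\varepsilon_{i}^{\prime}A\varepsilon_{i})=\mathrm{tr}(AD) for \varepsilon_{i}\sim N(0,D), the following Python sketch treats L as an arbitrary T\times T matrix purely for illustration (the actual L in the proof is the lag-type matrix defined earlier).

\begin{verbatim}
# Monte Carlo check of E(eps' A eps) = tr(AD) for eps ~ N(0, D), with
# A = L'D^{-1}L and A = L'D^{-1}FF'D^{-1}L as in the proof of Lemma 9.
import numpy as np

rng = np.random.default_rng(5)
T = 6
d = rng.uniform(0.5, 2.0, size=T)
D, Dinv = np.diag(d), np.diag(1.0 / d)
L = rng.standard_normal((T, T))                 # placeholder for the lag-type matrix
F = rng.standard_normal(T)

eps = rng.standard_normal((200000, T)) * np.sqrt(d)
for A in (L.T @ Dinv @ L, L.T @ Dinv @ np.outer(F, F) @ Dinv @ L):
    mc = np.einsum('nt,ts,ns->n', eps, A, eps).mean()
    print(mc, np.trace(A @ D))                  # Monte Carlo mean vs trace formula
\end{verbatim}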

The next lemma is concerned with the last three terms of equation (20).

Lemma 10.
(i) \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})+o_{p}(1)
(ii) \frac{1}{NT}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}y_{i,-1}=\frac{1}{T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}L\delta+o_{p}(1)
(iii) -\frac{1}{2}\frac{1}{T}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}\widetilde{\delta}=-\frac{1}{2T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\widetilde{\delta}+o_{p}(1)

Proof: Consider (i). It is not difficult to show that replacing GG+DGG^{\prime}+D by FF+DFF^{\prime}+D generates a negligible term. Consider (ii). From yi,1=Lδ+LFλi+Lεiy_{i,-1}=L\delta+LF\lambda_{i}+L\varepsilon_{i}, we can write the left hand side of (ii) as

1NTi=1Nδ~(GG+D)1Lδ+1NTi=1Nδ~(GG+D)1LFλi+1NTi=1Nδ~(GG+D)1Lεi\frac{1}{NT}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}L\delta+\frac{1}{NT}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}LF\lambda_{i}+\frac{1}{NT}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}L\varepsilon_{i}

The first term does not depend on i and is equal to T^{-1}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}L\delta. Replacing G by F generates only a negligible term, so the first term is T^{-1}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}L\delta+o_{p}(1). Using the fact that the \lambda_{i} are iid zero-mean random variables, we can show that the second term is O_{p}(N^{-1/2}). Similarly, the last term is O_{p}((NT)^{-1/2}). Thus the last two terms are negligible. For (iii), replacing G by F again generates only a negligible term. \Box

Proof of Theorem 1. The local likelihood ratio is given by (20). Using Lemma 1, Corollary 5, Corollary 6, Lemma 9 (multiplied by 12α~2-\frac{1}{2}\widetilde{\alpha}^{2}), and Lemma 10, we obtain

(θ0+1NTθ~)(θ0)=A(θ~)+B(θ~)+op(1)\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta})-\ell(\theta^{0})=A(\widetilde{\theta})+B(\widetilde{\theta})+o_{p}(1) (37)

where

A(θ~)\displaystyle A(\widetilde{\theta}) =1NTi=1NλiF~D1/2MD1/2FD1/2εi\displaystyle=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}
+α~1NTi=1N(Lεi)D1εi\displaystyle+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}
+α~1NTi=1Nλi(LF)[D1/2MD1/2FD1/2]εi\displaystyle+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}
121Ttr[F~D1/2MD1/2FD1/2F~]\displaystyle-\frac{1}{2}\frac{1}{T}\mathrm{tr}[\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\widetilde{F}] (38)
12α~2tr[1T(LD1LD)]\displaystyle-\frac{1}{2}\widetilde{\alpha}^{2}\mathrm{tr}\Big{[}\frac{1}{T}(L^{\prime}D^{-1}LD)\Big{]}
12α~2tr[1T(LF)[D1/2MD1/2FD1/2](LF)]\displaystyle-\frac{1}{2}\widetilde{\alpha}^{2}\mathrm{tr}\Big{[}\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}](LF)\Big{]}
α~1Ttr[(LF)[D1/2MD1/2FD1/2]F~]\displaystyle-\widetilde{\alpha}\,\frac{1}{T}\mathrm{tr}[(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\widetilde{F}]
B(θ~)\displaystyle B(\widetilde{\theta}) =α~1NTi=1N(Lδ)(FF+D)1(Fλi+εi)\displaystyle=\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
12α~21T[(Lδ)(FF+D)1(Lδ)]\displaystyle-\frac{1}{2}\widetilde{\alpha}^{2}\;\frac{1}{T}\Big{[}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(L\delta)\Big{]}
+1NTi=1Nδ~(FF+D)1(Fλi+εi)\displaystyle+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i}) (39)
α~1Tδ~(FF+D)1Lδ\displaystyle-\widetilde{\alpha}\frac{1}{T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}L\delta
12Tδ~(FF+D)1δ~\displaystyle-\frac{1}{2T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\widetilde{\delta}

Note that A(θ~)A(\widetilde{\theta}) does not involve δ~\widetilde{\delta} or δ\delta. Terms related to δ~\widetilde{\delta} and δ\delta are given in B(θ~)B(\widetilde{\theta}).

Inspecting A(\widetilde{\theta}), the variances of the first three terms are given by the fourth through sixth terms (times -1/2), respectively. The last term is twice the covariance between the first and third terms (times -1/2); thus the last term is simply the negative of that covariance. The second term is uncorrelated with the first and third terms (L\varepsilon_{i} involves only lags of \varepsilon_{it}, so \varepsilon_{it-1}\varepsilon_{it} has mean zero and is uncorrelated with \varepsilon_{ik} for all t and k).

Inspecting B(\widetilde{\theta}), the variance of the first term is given by the second term (times -1/2), and the variance of the third term by the last term (also times -1/2). The fourth term is the negative of the covariance between the first and third terms.

Furthermore, the random variables in A(\widetilde{\theta}) are uncorrelated with those in B(\widetilde{\theta}): variables of the form \lambda_{i}^{\prime}C\varepsilon_{i} are uncorrelated with \lambda_{i} and with \varepsilon_{i}, and (L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i} is uncorrelated with \lambda_{i} and with \varepsilon_{i} (L\varepsilon_{i} depends only on lags of \varepsilon_{i} and D is diagonal). Hence all variances and covariances are accounted for in A(\widetilde{\theta}) and B(\widetilde{\theta}).

Finally, ΔNT(θ~)\Delta_{NT}(\widetilde{\theta}) of Theorem 1 is composed of the first three terms of A(θ~)A(\widetilde{\theta}), plus the first and the third terms of B(θ~)B(\widetilde{\theta}) (these are the random terms). All the remaining terms constitute (1/2)𝔼ΔNT(θ~)2-(1/2)\mathbb{E}\Delta_{NT}(\widetilde{\theta})^{2}. This completes the proof of Theorem 1. \Box
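In other words, restating what the proof has established, the local log-likelihood ratio admits the representation

\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta})-\ell(\theta^{0})=\Delta_{NT}(\widetilde{\theta})-\frac{1}{2}\mathbb{E}\Delta_{NT}(\widetilde{\theta})^{2}+o_{p}(1),

with \Delta_{NT}(\widetilde{\theta}) collecting the random terms listed above.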

Proof of Theorem 2. With respect to the local parameters F~\widetilde{F}, the proof of the previous theorem only uses T1/2F~=O(1)\|T^{-1/2}\widetilde{F}\|=O(1). If f~r2\widetilde{f}\in\ell_{r}^{2}, then F~=O(1)\|\widetilde{F}\|=O(1). The entire earlier proof holds with T1/2F~T^{-1/2}\widetilde{F} replaced by F~\widetilde{F}. In particular, equation (38) holds with T1/2F~T^{-1/2}\widetilde{F} replaced by F~\widetilde{F} (that is, omitting T1/2T^{-1/2}) due to F~=O(1)\|\widetilde{F}\|=O(1). Notice F~\widetilde{F} appears in three terms on the right hand side of (38), namely, the first, fourth, and the last term. We analyze each of them. The first term of (38) after replacing T1/2F~T^{-1/2}\widetilde{F} with F~\widetilde{F} is written as

1Ni=1NλiF~D1/2MD1/2FD1/2εi\displaystyle\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i} =1Ni=1NλiF~D1εi\displaystyle=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}
tr[F~D1F(FD1F)11Ni=1NFD1εiλi]\displaystyle-\mathrm{tr}[\widetilde{F}^{\prime}D^{-1}F(F^{\prime}D^{-1}F)^{-1}\frac{1}{\sqrt{N}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}^{\prime}]

But the second term on the right-hand side is o_{p}(1), because it can be written as (ignoring the trace)

1T(F~D1F)(FD1F/T)11NTi=1NFD1εiλi\frac{1}{\sqrt{T}}(\widetilde{F}^{\prime}D^{-1}F)(F^{\prime}D^{-1}F/T)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}^{\prime}

Now \|(F^{\prime}D^{-1}F/T)^{-1}\|=O(1) and \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}^{\prime}=\frac{1}{\sqrt{NT}}\sum_{t=1}^{T}\sum_{i=1}^{N}\frac{1}{\sigma_{t}^{2}}f_{t}\lambda_{i}^{\prime}\varepsilon_{it}=O_{p}(1), but

1T(F~D1F)=1Tt=1T1σt2f~tft1aM1Tt=1Tf~t=o(1)\|\frac{1}{\sqrt{T}}(\widetilde{F}^{\prime}D^{-1}F)\|=\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\frac{1}{\sigma_{t}^{2}}\widetilde{f}_{t}f_{t}^{\prime}\|\leq\frac{1}{a}M\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\|\widetilde{f}_{t}\|=o(1) (40)

We have used \sigma_{t}^{2}\geq a>0 and \|f_{t}\|\leq M. To see that \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\|\widetilde{f}_{t}\|=o(1) for \widetilde{f}\in\ell_{r}^{2}, notice that \widetilde{f}_{s}\rightarrow 0 as s\rightarrow\infty, so by the Toeplitz lemma, \frac{1}{\sqrt{T}}\sum_{t=1}^{\sqrt{T}}\|\widetilde{f}_{t}\|\rightarrow 0. By the Cauchy-Schwarz inequality, \frac{1}{\sqrt{T}}\sum_{t=\sqrt{T}+1}^{T}\|\widetilde{f}_{t}\|\leq(\sum_{t=\sqrt{T}+1}^{T}\|\widetilde{f}_{t}\|^{2})^{1/2}\leq(\sum_{t=\sqrt{T}}^{\infty}\|\widetilde{f}_{t}\|^{2})^{1/2}\rightarrow 0 (see also [11] for a similar argument). Thus,

1Ni=1NλiF~D1/2MD1/2FD1/2εi=1Ni=1NλiF~D1εi+op(1)\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}+o_{p}(1)
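As an informal numerical illustration (not part of the proof) of the claim below (40), the following Python sketch uses the square-summable example \widetilde{f}_{t}=1/t (an illustrative assumption) and shows that T^{-1/2}\sum_{t\leq T}\|\widetilde{f}_{t}\| shrinks as T grows.

\begin{verbatim}
# Illustration: for the square-summable sequence f_tilde_t = 1/t,
# T^{-1/2} * sum_{t<=T} |f_tilde_t| tends to zero as T grows.
import numpy as np

for T in (10**2, 10**4, 10**6):
    t = np.arange(1, T + 1)
    print(T, np.sum(1.0 / t) / np.sqrt(T))      # decreases toward zero
\end{verbatim}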

The fourth term of (38), after replacing T^{-1/2}\widetilde{F} with \widetilde{F} (so that the factor 1/T disappears), becomes (ignoring the -1/2 and the trace)

F~D1/2MD1/2FD1/2F~\displaystyle\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\widetilde{F} =F~D1F~F~D1F(FD1F)1FD1F~\displaystyle=\widetilde{F}^{\prime}D^{-1}\widetilde{F}-\widetilde{F}^{\prime}D^{-1}F(F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}\widetilde{F}
=F~D1F~+o(1)\displaystyle=\widetilde{F}^{\prime}D^{-1}\widetilde{F}+o(1)

owing to (FD1F/T)1=O(1)(F^{\prime}D^{-1}F/T)^{-1}=O(1) and F~D1F/T1/2=o(1)\widetilde{F}^{\prime}D^{-1}F/T^{1/2}=o(1) due to (40). The last term of (38) being o(1)o(1) with F~\widetilde{F} in place of 1TF~\frac{1}{\sqrt{T}}\widetilde{F} follows from the same argument.

The same analysis applies to the terms involving \widetilde{\delta} with \|\widetilde{\delta}\|=O(1). Analogously to the argument for (40), using the Toeplitz lemma we can show

T1/2(δ~D1F)=o(1),T1/2(δ~D1Lδ)=o(1).T^{-1/2}(\widetilde{\delta}^{\prime}D^{-1}F)=o(1),\quad T^{-1/2}(\widetilde{\delta}^{\prime}D^{-1}L\delta)=o(1). (41)

There are three terms in B(\widetilde{\theta}) involving \widetilde{\delta}; see (39). We analyze each in turn. Replacing \frac{1}{\sqrt{T}}\widetilde{\delta} by \widetilde{\delta}, we will show

1Ni=1Nδ~(FF+D)1(Fλi+εi)=1Ni=1Nδ~D1εi+op(1).\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}D^{-1}\varepsilon_{i}+o_{p}(1). (42)

Using (FF+D)1F=D1F(Ir+FD1F)1(FF^{\prime}+D)^{-1}F=D^{-1}F(I_{r}+F^{\prime}D^{-1}F)^{-1}, the left hand side above is

(δ~D1F)(Ir+FD1F)1(1Ni=1Nλi)+1Ni=1Nδ~(FF+D)1εi(\widetilde{\delta}^{\prime}D^{-1}F)(I_{r}+F^{\prime}D^{-1}F)^{-1}(\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i})+\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\varepsilon_{i}

The first term is o_{p}(T^{-1/2}) because of (41), \|(I_{r}+F^{\prime}D^{-1}F)^{-1}\|=O(1/T), and \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}=O_{p}(1). For the second term, using the Woodbury formula and (41),

1Ni=1Nδ~(FF+D)1εi=1Ni=1Nδ~D1εi+op(1).\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\varepsilon_{i}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}D^{-1}\varepsilon_{i}+o_{p}(1).

This proves (42). Next, for the second term in B(\widetilde{\theta}) that depends on \widetilde{\delta} (replacing T^{-1/2}\widetilde{\delta} by \widetilde{\delta}),

T1/2δ~(FF+D)1Lδ=o(1)T^{-1/2}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}L\delta=o(1)

This follows from the Woodbury formula and (41). Finally, the last term becomes (after replacing T1/2δ~T^{-1/2}\widetilde{\delta} by δ~\widetilde{\delta})

\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\widetilde{\delta}=\widetilde{\delta}^{\prime}D^{-1}\widetilde{\delta}-(\widetilde{\delta}^{\prime}D^{-1}F)(I_{r}+F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}\widetilde{\delta}

The second term is negligible because T^{-1/2}(\widetilde{\delta}^{\prime}D^{-1}F)=o(1) and (I_{r}+F^{\prime}D^{-1}F)^{-1}=O(T^{-1}). Thus \widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\widetilde{\delta}=\widetilde{\delta}^{\prime}D^{-1}\widetilde{\delta}+o(1).

Collecting the simplified and the non-negligible terms, we obtain the expressions in Theorem 2. \Box

References

  • [1] Alvarez, J. and M. Arellano (2022). Robust likelihood estimation of dynamic panel data models. Journal of Econometrics, 226, 21-61.
  • [2] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, Hoboken, NJ.
  • [3] Anderson, T.W., and C. Hsiao (1982). Formulation and estimation of dynamic Models with Error Components, Journal of Econometrics, 76, 598-606.
  • [4] Arellano, M. and S. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations, The Review of Economic Studies 58 (2), 277-297.
  • [5] Bai, J. (2009). Panel data models with interactive fixed effects, Econometrica, 77 1229-1279.
  • [6] Bai, J. (2013). Fixed effects dynamic panel data models, a factor analytical method, Econometrica, 81, 285-314.
  • [7] Bai, J. (2024). Likelihood approach to dynamic panel models with interactive effects. Journal of Econometrics. Vol 240, Issue 1. https://doi.org/10.1016/j.jeconom.2023.105636
  • [8] Bai, J. and K.P. Li (2012). Statistical analysis of factor models of high dimension. Annals of Statistics, 40, 436-465.
  • [9] Bai, J. and K.P. Li (2014). Theory and methods for panel data models with interactive effects. The Annals of Statistics Vol. 42, No. 1, 142-170.
  • [10] Hahn, J., and G. Kuersteiner (2002). “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects when Both n and T Are Large,” Econometrica, 70, 1639-1657.
  • [11] Iwakura, H. and R. Okui (2014). Asymptotic efficiency in factor models and dynamic panel data models, Institute of Economic Research, Kyoto University, Available at SSRN 2395722
  • [12] Jankova J. and S. van de Geer (2018). Semiparametric efficiency bounds for high-dimensional models. The Annals of Statistics 46 (5), 2336-2359.
  • [13] Kiviet, J. (1995). “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models”, Journal of Econometrics, 68, 53-78.
  • [14] Lancaster, T. (2000). “The incidental parameter problem since 1948,” Journal of Econometrics, 95 391-413.
  • [15] Lawley, D.N. and A.E. Maxwell (1971). Factor Analysis as a Statistical Method, London: Butterworth.
  • [16] Moon H.R. and M. Weidner (2017). “Dynamic Linear Panel Regression Models with Interactive Fixed Effects”, Econometric Theory 33, 158-195.
  • [17] Moreira, M.J. (2009), A Maximum Likelihood Method for the Incidental Parameter Problem, Annals of Statistics, 37, 3660-3696.
  • [18] Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects, Econometrica, 49, 1417-1426.
  • [19] van der Vaart, A.W. (1998). Asymptotic Statistics, Cambridge University Press.
  • [20] van der Vaart, A.W. and J.A. Wellner (1996). Weak Convergence and Empirical Processes with Applications to Statistics, Springer-Verlag.