
Efficiency of QMLE for dynamic panel data models with interactive effects

Jushan Bai, Department of Economics, Columbia University
Abstract

This paper studies the problem of efficient estimation of panel data models in the presence of an increasing number of incidental parameters. We formulate the dynamic panel as a simultaneous equations system, and derive the efficiency bound under the normality assumption. We then show that the Gaussian quasi-maximum likelihood estimator (QMLE) applied to the system achieves the normality efficiency bound without the normality assumption. Comparison of QMLE with the fixed effects estimators is made.

MSC: 62H12, 62F12

Keywords: Fixed effects, incidental parameters, local likelihood ratios, local parameter space, regular estimators, efficiency bound, factor models

1 Introduction

Consider the dynamic panel data model with interactive effects

y_{it}=\alpha\,y_{it-1}+\delta_{t}+\lambda_{i}^{\prime}f_{t}+\varepsilon_{it} \qquad (1)
i=1,2,...,N;\quad t=1,...,T

where y_{it} is the outcome variable, \lambda_{i} and f_{t} are each r\times 1 and both are unobservable, \delta_{t} is the time effect, and \varepsilon_{it} is the error term. Only y_{it} are observable. The above model is increasingly used in empirical studies in the social sciences. The purpose of this paper is the efficient estimation of the model. We argue that quasi-maximum likelihood estimation is a preferred method. But first, we explain the meaning of QMLE for this model and its motivation.

The index i refers to individuals (e.g., households) and t to time. If f_{t}=1 for all t and \lambda_{i} is a scalar, then \delta_{t}+\lambda_{i}^{\prime}f_{t}=\delta_{t}+\lambda_{i}, and we obtain the usual additive individual and time fixed effects model. Dynamic panel models with additive fixed effects remain the workhorse for empirical research. The product \lambda_{i}^{\prime}f_{t} is known as an interactive effect [5] and is more general than additive fixed effects. The model allows individual heterogeneities (such as unobserved innate ability, captured by \lambda_{i}) to have a time-varying impact (through f_{t}) on the outcome variable y_{it}. From a different perspective, the model allows common shocks (modeled by f_{t}) to have heterogeneous impacts (through \lambda_{i}) on the outcome. For many panel data sets, N is much larger than T because it is costly to keep track of the same individuals over time. Under fixed T, typical estimation methods such as least squares do not yield consistent estimates of the model parameters. Consider the special case

y_{it}=\alpha y_{it-1}+c_{i}+\varepsilon_{it} \qquad (2)

where c_{i} are fixed effects. This corresponds to \delta_{t}=0 and f_{t}=1 for all t. Even if the c_{i} are iid with zero mean and finite variance, and independent of \varepsilon_{it}, the least squares estimator \widehat{\alpha}=(\sum_{i}\sum_{t}y_{it-1}^{2})^{-1}\sum_{i}\sum_{t}y_{it-1}y_{it} is easily shown to be inconsistent. When the c_{i} are treated as parameters to be estimated along with \alpha, the least squares method is still biased, and the order of the bias is O(1/T) no matter how large N is, see [18]. So unless T goes to infinity, the least squares method remains inconsistent, an issue known as the incidental parameters problem, see for example [13] and [14].
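The following is a minimal Monte Carlo sketch (not part of the paper; the values of N, T, and \alpha are illustrative) of the fixed-T inconsistency just described: both the pooled least squares estimator that ignores c_{i} and the within (fixed effects) estimator that estimates the c_{i} stay away from the true \alpha even as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, alpha = 2000, 6, 0.5          # illustrative sample sizes and true alpha

c = rng.normal(size=N)              # individual effects c_i
y = np.empty((N, T + 1))
y[:, 0] = c / (1 - alpha) + rng.normal(size=N)   # start near the stationary mean
for t in range(1, T + 1):
    y[:, t] = alpha * y[:, t - 1] + c + rng.normal(size=N)

# Pooled least squares of y_it on y_{i,t-1}: ignores c_i, biased upward.
ylag, ycur = y[:, :-1].ravel(), y[:, 1:].ravel()
alpha_pooled = (ylag @ ycur) / (ylag @ ylag)

# Within estimator: demean per individual; bias of order O(1/T) (downward).
yd = y - y.mean(axis=1, keepdims=True)
alpha_within = (yd[:, :-1].ravel() @ yd[:, 1:].ravel()) / (yd[:, :-1].ravel() @ yd[:, :-1].ravel())

print(alpha_pooled, alpha_within)   # both differ from 0.5 even for large N
```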

However, provided T\geq 3, consistent estimation of \alpha is possible with the instrumental variables (IV) method. Anderson and Hsiao [3] suggested an IV estimator obtained by solving \sum_{i=1}^{N}y_{i1}(\Delta y_{i3}-\alpha\Delta y_{i2})=0, where \Delta y_{it}=y_{it}-y_{it-1}. Differencing the data purges c_{i} but introduces correlation between the regressor and the resulting errors, which is why the IV method is used. With T strictly greater than 3, a more efficient IV estimator is suggested by Arellano and Bond [4]. Model (2) can also be estimated by the Gaussian quasi-maximum likelihood method, for example, [1], [6], and [17].
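As a quick illustration (again not from the paper; the data generating process and parameter values are assumed), the Anderson-Hsiao moment condition can be solved in closed form with T=3:

```python
import numpy as np

rng = np.random.default_rng(1)
N, alpha = 5000, 0.5
c = rng.normal(size=N)                              # individual effects
y1 = c / (1 - alpha) + rng.normal(size=N)           # illustrative first observation
y2 = alpha * y1 + c + rng.normal(size=N)
y3 = alpha * y2 + c + rng.normal(size=N)

d3, d2 = y3 - y2, y2 - y1                           # differencing purges c_i
alpha_iv = (y1 @ d3) / (y1 @ d2)                    # y_{i1} instruments Delta y_{i2}
print(alpha_iv)                                     # consistent as N grows, T fixed
```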

For model (1), differencing cannot remove the interactive effects since \Delta y_{it}=\alpha\,\Delta y_{it-1}+\Delta\delta_{t}+\lambda_{i}^{\prime}\Delta f_{t}+\Delta\varepsilon_{it}. The model can be estimated by the fixed effects approach, treating both \lambda_{i} and f_{t} as parameters. Just like least squares for the earlier additive effects model, the fixed effects method produces a bias. Below we introduce the quasi-likelihood approach, similar to [8, 9] for non-dynamic models.

Project the first observation y_{i1} on [1,\lambda_{i}] and write y_{i1}=\delta_{1}^{*}+\lambda_{i}^{\prime}f_{1}^{*}+\varepsilon_{i1}^{*}, where (\delta_{1}^{*},f_{1}^{*}) are the projection coefficients and \varepsilon_{i1}^{*} is the projection error. The asterisk variables differ from the true (\delta_{1},f_{1},\varepsilon_{i1}) that generate y_{i1}. This projection is needed because y_{i0} is not observable. (The first observation starts at t=1, so y_{i0} is not available. If y_{i0} were observable we would have y_{i1}=\alpha y_{i0}+\delta_{1}+\lambda_{i}^{\prime}f_{1}+\varepsilon_{i1}, but then a projection of y_{i0} on [1,\lambda_{i}] would be required.) Note that we can drop the superscript * to simplify the notation. This is because we will treat \delta_{t} and f_{t} as (nuisance and free) parameters, and we do not require \varepsilon_{it} to have the same distribution over time. This means we can rewrite y_{i1} as y_{i1}=\delta_{1}+\lambda_{i}^{\prime}f_{1}+\varepsilon_{i1}.

The following notation will be used

y_{i}=\begin{bmatrix}y_{i1}\\ \vdots\\ y_{iT}\end{bmatrix},\quad\delta=\begin{bmatrix}\delta_{1}\\ \vdots\\ \delta_{T}\end{bmatrix},\quad F=\begin{bmatrix}f_{1}^{\prime}\\ \vdots\\ f_{T}^{\prime}\end{bmatrix},\quad\varepsilon_{i}=\begin{bmatrix}\varepsilon_{i1}\\ \vdots\\ \varepsilon_{iT}\end{bmatrix} \qquad (3)

together with the following T×TT\times T matrices,

B=\begin{bmatrix}1&0&\cdots&0\\ -\alpha&1&\cdots&0\\ \vdots&\ddots&\ddots&\vdots\\ 0&\cdots&-\alpha&1\end{bmatrix},\quad J=\begin{bmatrix}0&0&\cdots&0\\ 1&0&\cdots&0\\ \vdots&\ddots&\ddots&\vdots\\ 0&\cdots&1&0\end{bmatrix},\quad L=\begin{bmatrix}0&0&\cdots&0&0\\ 1&0&\cdots&0&0\\ \alpha&1&\ddots&0&0\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ \alpha^{T-2}&\cdots&\alpha&1&0\end{bmatrix} \qquad (4)

Note that L=JB^{-1}. With this notation, we can write the model as

By_{i}=\delta+F\lambda_{i}+\varepsilon_{i}

This gives a simultaneous equations system with T equations. We assume \lambda_{i} are iid, independent of \varepsilon_{i}. Without loss of generality, we assume \mathbb{E}(\lambda_{i})=0; otherwise, absorb F\lambda (where \lambda=\mathbb{E}\lambda_{i}) into \delta. Further assume \mathbb{E}(\lambda_{i}\lambda_{i}^{\prime})=I_{r} (a normalization restriction, where I_{r} is an identity matrix). We also assume \varepsilon_{i} are iid with zero mean, and

D=\mathrm{var}(\varepsilon_{i})=\operatorname{diag}(\sigma_{1}^{2},\sigma_{2}^{2},...,\sigma_{T}^{2})

These assumptions imply that By_{i} are iid with mean \delta and covariance matrix FF^{\prime}+D. Consider the Gaussian quasi likelihood function

\ell_{NT}(\theta)=-\frac{N}{2}\log|FF^{\prime}+D|-\frac{1}{2}\sum_{i=1}^{N}(By_{i}-\delta)^{\prime}(FF^{\prime}+D)^{-1}(By_{i}-\delta)

where \theta=(\alpha,\delta,F,\sigma_{1}^{2},...,\sigma_{T}^{2}); the Jacobian does not enter since the determinant of B is 1. The quasi-maximum likelihood estimator (QMLE) is defined as

\widehat{\theta}=\operatorname{argmax}_{\theta}\ \ell_{NT}(\theta)

The asymptotic distribution of this estimator is studied by [7].
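As a concrete illustration (a hedged sketch with assumed parameter values, not code from the paper), the following builds the matrices B, J, and L of (4), verifies L=JB^{-1}, simulates data from the system By_{i}=\delta+F\lambda_{i}+\varepsilon_{i}, and evaluates the Gaussian quasi log-likelihood \ell_{NT}(\theta) at the true parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, r, alpha = 500, 8, 1, 0.5                     # illustrative dimensions

B = np.eye(T) - alpha * np.eye(T, k=-1)             # 1 on diagonal, -alpha below
J = np.eye(T, k=-1)                                 # lag (shift-down) matrix
L = np.zeros((T, T))
for t in range(1, T):
    for s in range(t):
        L[t, s] = alpha ** (t - 1 - s)              # entries of L in (4)
assert np.allclose(L, J @ np.linalg.inv(B))         # the identity L = J B^{-1}

delta = rng.normal(size=T)
F = rng.normal(size=(T, r))
D = np.diag(rng.uniform(0.5, 1.5, size=T))          # heteroskedastic variances
lam = rng.normal(size=(N, r))
eps = rng.normal(size=(N, T)) * np.sqrt(np.diag(D))
Y = (delta + lam @ F.T + eps) @ np.linalg.inv(B).T  # rows are y_i'

def loglik(alpha, delta, F, D, Y):
    Ba = np.eye(T) - alpha * np.eye(T, k=-1)
    Sigma = F @ F.T + D
    U = Y @ Ba.T - delta                            # rows are (B y_i - delta)'
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('it,ts,is->', U, np.linalg.inv(Sigma), U)
    return -0.5 * N * logdet - 0.5 * quad

print(loglik(alpha, delta, F, D, Y))                # value at the true parameters
```

Maximizing this function over (\alpha,\delta,F,\sigma_{1}^{2},...,\sigma_{T}^{2}) with a numerical optimizer yields the QMLE \widehat{\theta}.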

An alternative estimator, the fixed effects estimator, treats both \lambda_{i} and f_{t} as parameters, in addition to \alpha and \delta_{t}. The corresponding likelihood function under normality of \varepsilon_{it} is given in (7) below. The fixed effects framework estimates more nuisance parameters (substantially more when N is large), which is the source of the incidental parameters problem. Our analysis focuses on the QMLE. Comparison of the two approaches will be made based on local likelihood methods.

The objectives of the present paper are threefold. First, what is the efficiency bound for the system maximum likelihood estimator under the normality assumption? Second, does the QMLE attain the normality efficiency bound without the normality assumption? Third, how does the QMLE fare in comparison to the fixed effects estimator?

We approach these questions with a Le Cam type of analysis. The difficulty lies in the increasing dimension of the parameter space as T goes to infinity, because the number of parameters is of order T. No sparsity in the parameters is assumed. With sparsity, [12] derived efficiency bounds and constructed efficient estimators via regularization for various models. The ability to deal with non-sparsity in the current model relies on panel data.

On notation: \|A\| denotes the Frobenius norm of a matrix (or vector) A, that is, \|A\|=(\mathrm{tr}(A^{\prime}A))^{1/2}, and \|A\|_{sp} denotes the spectral norm of A, that is, the square root of the largest eigenvalue of A^{\prime}A. Notice \|AB\|\leq\|A\|_{sp}\|B\|. The transpose of A is denoted by A^{\prime}; |A| and \mathrm{tr}(A) denote, respectively, the determinant and trace of a square matrix A.

2 Assumptions for QMLE

We assume |α|<1|\alpha|<1 for asymptotic analysis. The following assumptions are made for the model.

Assumption A

(i) \varepsilon_{i} are iid over i; \mathbb{E}(\varepsilon_{it})=0, \mathrm{var}(\varepsilon_{it})=\sigma_{t}^{2}>0, and \varepsilon_{it} are independent over t; \mathbb{E}\varepsilon_{it}^{4}\leq M<\infty for all i and t.

(ii) The \lambda_{i} are iid, independent of \varepsilon_{i}, with \mathbb{E}\lambda_{i}=0, \mathbb{E}(\lambda_{i}\lambda_{i}^{\prime})=I_{r}, and \mathbb{E}\|\lambda_{i}\|^{4}\leq M.

(iii) There exist constants a and b such that 0<a<\sigma_{t}^{2}<b<\infty for all t; \frac{1}{T}F^{\prime}D^{-1}F=\frac{1}{T}\sum_{t=1}^{T}\sigma_{t}^{-2}f_{t}f_{t}^{\prime}\rightarrow\Sigma_{ff}>0.

Two comments are in order for this model. First, Assumption A(ii) assumes the \lambda_{i} are random variables, but they can also be fixed bounded constants. All that is needed is that \Psi_{N}:=\frac{1}{N}\sum_{i=1}^{N}(\lambda_{i}-\bar{\lambda})(\lambda_{i}-\bar{\lambda})^{\prime}\rightarrow\Psi>0 (an r\times r positive definite matrix), where \bar{\lambda} is the sample average of the \lambda_{i}. One can normalize the matrix \Psi to be an identity matrix. Second, F is determined only up to an orthogonal rotation since FF^{\prime}=FR(FR)^{\prime} for RR^{\prime}=I_{r}. The rotational indeterminacy can be removed by the normalization that F^{\prime}D^{-1}F is a diagonal matrix (with distinct elements), see [2] (p.573) and [15] (p.8). Rotational indeterminacy does not affect the estimates of \alpha, D, and \delta.

Under Assumption A, [7] showed that the QMLE \widehat{\alpha} has the following asymptotic representation

\sqrt{NT}(\widehat{\alpha}-\alpha)=\Big(\frac{1}{T}\mathrm{tr}(LDL^{\prime}D^{-1})\Big)^{-1}\Big[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}\Big]+o_{p}(1) \qquad (5)

where LL, DD, and εi\varepsilon_{i} are defined earlier. Note that 𝔼[(Lεi)D1εi]=tr(L)=0\mathbb{E}[(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}]=\mathrm{tr}(L^{\prime})=0, and 𝔼[(Lεi)D1εi]2=tr(LDLD1)\mathbb{E}[(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}]^{2}=\mathrm{tr}(LDL^{\prime}D^{-1}) with the expression

\frac{1}{T}\mathrm{tr}(LDL^{\prime}D^{-1})=\frac{1}{T}\sum_{t=2}^{T}\frac{1}{\sigma_{t}^{2}}\Big(\sigma_{t-1}^{2}+\alpha^{2}\sigma_{t-2}^{2}+\cdots+\alpha^{2(t-2)}\sigma_{1}^{2}\Big)

Assume the above converges to \gamma as T\rightarrow\infty; then we have, as N,T\rightarrow\infty,

\sqrt{NT}(\widehat{\alpha}-\alpha)\stackrel{d}{\longrightarrow}\mathcal{N}(0,1/\gamma).

For the special case of homoskedasticity (\sigma_{t}^{2}=\sigma^{2} for all t), \gamma=1/(1-\alpha^{2}), and hence \sqrt{NT}(\widehat{\alpha}-\alpha)\stackrel{d}{\longrightarrow}\mathcal{N}(0,1-\alpha^{2}). The QMLE requires no bias correction, unlike the fixed effects regression; the latter is considered by [5] and [16]. Our objective is to show that 1/\gamma is the efficiency bound under the normality assumption, and that the QMLE attains the normality efficiency bound. This result is obtained in the presence of an increasing number of incidental parameters. The estimator \widehat{\alpha} is also consistent under fixed T, in contrast to the fixed effects estimator. The estimated \delta_{t}, f_{t}, \sigma_{t}^{2} are all \sqrt{N} consistent and asymptotically normal. In particular, the estimated factors \widehat{f}_{t} have the asymptotic representation, for each t=1,2,...,T,

\sqrt{N}(\widehat{f}_{t}-f_{t})=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}\varepsilon_{it}+o_{p}(1) \qquad (6)

This implies \sqrt{N}(\widehat{f}_{t}-f_{t})\stackrel{d}{\longrightarrow}\mathcal{N}(0,\sigma_{t}^{2}I_{r}). Details are given in Bai [7]. Also see [8] (under the IC3 normalization) for factor models.
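A small numerical sketch (with illustrative values of \alpha, T, and the variances; not from the paper) can be used to check the trace expression for \frac{1}{T}\mathrm{tr}(LDL^{\prime}D^{-1}) against its explicit sum and against the homoskedastic limit 1/(1-\alpha^{2}):

```python
import numpy as np

alpha, T = 0.5, 400
sig2 = np.ones(T)                                   # homoskedastic case
D = np.diag(sig2)
B = np.eye(T) - alpha * np.eye(T, k=-1)
L = np.eye(T, k=-1) @ np.linalg.inv(B)              # L = J B^{-1}

trace_form = np.trace(L @ D @ L.T @ np.linalg.inv(D)) / T
explicit = sum((alpha ** (2 * np.arange(t - 1)) * sig2[:t - 1][::-1]).sum() / sig2[t - 1]
               for t in range(2, T + 1)) / T
print(trace_form, explicit, 1 / (1 - alpha ** 2))   # all close for large T
```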

3 Local likelihood ratios and efficiency bound

3.1 Related literature

A closely related work is that of Iwakura and Okui [11]. They consider the fixed effects framework instead of the QMLE. The fixed effects estimation procedure treats both \lambda_{i} and f_{t} as parameters (i=1,2,...,N;\ t=1,2,...,T), along with \alpha and \delta. The corresponding likelihood function under normality of \varepsilon_{it} is given in (7) below. (The fixed effects likelihood does not have a global maximum under heteroskedasticity, for example, [2] (p.587), but local maximization is still meaningful. Another solution is to impose homoskedasticity.)

\ell_{\text{fixed effects}}(\theta)=-\frac{N}{2}\sum_{t=2}^{T}\log\sigma_{t}^{2}-\frac{1}{2}\sum_{i=1}^{N}\sum_{t=2}^{T}(y_{it}-\alpha y_{i,t-1}-\delta_{t}-\lambda_{i}^{\prime}f_{t})^{2}/\sigma_{t}^{2} \qquad (7)

The fixed effects estimator for \alpha generates a bias, similar to the fixed effects estimator for dynamic panels with additive effects. The bias is studied by [16]. (In contrast, the QMLE does not generate a bias under fixed T.) Iwakura and Okui [11] derive the efficiency bound for the fixed effects estimators under homoskedasticity (\sigma_{t}^{2}=\sigma^{2} for all t). Another closely related work is that of Hahn and Kuersteiner [10]. They consider the efficiency bound problem under the fixed effects framework for the additive effects model described by (2). Throughout this paper, the fixed effects framework refers to methods that also estimate the factor loadings \lambda_{i} in addition to f_{t}.
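For comparison with the system likelihood below, here is a minimal sketch (simulated inputs with assumed values) of how the fixed effects objective (7) is evaluated; note that it involves N\times r loading parameters in addition to (\alpha,\delta,F,\sigma_{t}^{2}).

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, r, alpha = 200, 6, 1, 0.5
lam = rng.normal(size=(N, r))                       # loadings, treated as parameters
F = rng.normal(size=(T, r))
delta = rng.normal(size=T)
sig2 = rng.uniform(0.5, 1.5, size=T)
eps = rng.normal(size=(N, T)) * np.sqrt(sig2)
y = np.empty((N, T))
y[:, 0] = delta[0] + lam @ F[0] + eps[:, 0]
for t in range(1, T):
    y[:, t] = alpha * y[:, t - 1] + delta[t] + lam @ F[t] + eps[:, t]

def loglik_fe(alpha, delta, lam, F, sig2, y):
    # sums run over t = 2,...,T as in (7); 0-based index t-1
    resid = y[:, 1:] - alpha * y[:, :-1] - delta[1:] - lam @ F[1:].T
    return -0.5 * N * np.log(sig2[1:]).sum() - 0.5 * (resid ** 2 / sig2[1:]).sum()

print(loglik_fe(alpha, delta, lam, F, sig2, y))     # value at the true parameters
```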

In contrast, we consider the likelihood function for the system of equations

\ell(\theta)=-\frac{N}{2}\log|FF^{\prime}+D|-\frac{1}{2}\sum_{i=1}^{N}(By_{i}-\delta)^{\prime}(FF^{\prime}+D)^{-1}(By_{i}-\delta) \qquad (8)

The QMLE does not estimate \lambda_{i} (even if they are fixed constants, as explained earlier), thus eliminating the incidental parameters in the cross-section dimension. The incidental parameters are now \delta, F, and D, and their number increases with T. Despite the smaller number of incidental parameters, the analysis of the local likelihood is more demanding than that of the fixed effects likelihood (7). Intuitively, the fixed effects likelihood (7) is quadratic in F, but the QMLE likelihood \ell(\theta) in (8) depends on F through the inverse matrix (FF^{\prime}+D)^{-1} and through the log-determinant of this matrix. The high degree of nonlinearity makes the perturbation analysis more challenging. As demonstrated later, the local analysis brings insights regarding the relative merits of the QMLE and the fixed effects estimators.

3.2 The \ell^{\infty} local parameter space

Local likelihood ratio processes are indexed by local parameters. Since the convergence rate of the estimator of \alpha^{0} is \sqrt{NT}, that is, \sqrt{NT}(\widehat{\alpha}-\alpha^{0})=O_{p}(1), it is natural to consider local parameters of the form

\alpha^{0}+\frac{1}{\sqrt{NT}}\widetilde{\alpha}

where α~\widetilde{\alpha}\in\mathbb{R}. However, the consideration of local parameters for ft0f_{t}^{0} is non-trivial, as explained by Iwakura and Okui [11] for the fixed effects likelihood ratio. We consider the following local parameters

f_{t}^{0}+\frac{1}{\sqrt{N}}\Big(\frac{1}{\sqrt{T}}\widetilde{f}_{t}\Big),\quad t=1,2,...,T \qquad (9)

where \|\widetilde{f}_{t}\|\leq M<\infty for all t with M arbitrarily given; \|\cdot\| denotes the r-dimensional Euclidean norm. In view of the fact that the estimated factor \widehat{f}_{t} is \sqrt{N} consistent, that is, \sqrt{N}(\widehat{f}_{t}-f_{t}^{0})=O_{p}(1), one would expect local parameters of the form f_{t}^{0}+N^{-1/2}\widetilde{f}_{t}, so the extra scale factor T^{-1/2} in the above local rate looks rather unusual. However, (9) is the suitable local rate for the local likelihood ratio to be O_{p}(1), as is shown in both the statement and the proof of Theorem 1 below. Without the scale factor T^{-1/2}, the local likelihood ratio diverges to infinity (in absolute value) if no restrictions are imposed on \widetilde{f}_{t} other than its boundedness. This type of local parameter was used in earlier work by [10] for the additive fixed effects estimator. Later we shall consider a different type of local parameter without the extra scale factor T^{-1/2}, but other restrictions on \widetilde{f}_{t} will then be needed.

Additionally, even if one regards \frac{1}{\sqrt{NT}}\widetilde{f}_{t} as small (relative to \frac{1}{\sqrt{N}}\widetilde{f}_{t}), this is for the better provided that the associated efficiency bound is achievable by an estimator. This is because the smaller the perturbation, the lower the efficiency bound, and hence the harder it is to attain by any estimator.

Consider the space

\ell_{r}^{\infty}:=\Big\{(\widetilde{f}_{t})_{t=1}^{\infty}\,\Big|\,\widetilde{f}_{t}\in\mathbb{R}^{r},\ \sup_{s}\|\widetilde{f}_{s}\|<\infty\Big\}

the space of bounded sequences, each coordinate is r\mathbb{R}^{r}-valued. Let

f~=(f~1,f~2,)r\widetilde{f}=(\widetilde{f}_{1},\widetilde{f}_{2},...)\in\ell_{r}^{\infty}

and define F~=(f~1,f~2,,f~T)\widetilde{F}=(\widetilde{f}_{1},\widetilde{f}_{2},...,\widetilde{f}_{T})^{\prime}, the projection of f~\widetilde{f} onto the first TT coordinates. The matrix F~\widetilde{F} is T×rT\times r, but we suppress its dependence on TT for notational simplicity. Since f~r\widetilde{f}\in\ell_{r}^{\infty}, it follows that

\frac{1}{T}\widetilde{F}^{\prime}\widetilde{F}=\frac{1}{T}\sum_{t=1}^{T}\widetilde{f}_{t}\widetilde{f}_{t}^{\prime}=O(1) \qquad (10)

Similarly, for the time effects, we consider the local parameters

δt0+1NTδ~t\delta_{t}^{0}+\frac{1}{\sqrt{NT}}\widetilde{\delta}_{t}

with |δ~t|M|\widetilde{\delta}_{t}|\leq M for all tt. Let 1:={(δ~t)t=1|δ~t,sups|δ~s|<}\ell_{1}^{\infty}:=\{(\widetilde{\delta}_{t})_{t=1}^{\infty}\,\Big{|}\widetilde{\delta}_{t}\in\mathbb{R},\,\sup_{s}|\widetilde{\delta}_{s}|<\infty\}. For each {δ~t}=(δ~1,δ~2,)1\{\widetilde{\delta}_{t}\}=(\widetilde{\delta}_{1},\widetilde{\delta}_{2},...)\in\ell_{1}^{\infty}, we use δ~\widetilde{\delta} to denote the projection of the sequence onto the first TT coordinates δ~=(δ~1,δ~2,,δ~T)\widetilde{\delta}=(\widetilde{\delta}_{1},\widetilde{\delta}_{2},...,\widetilde{\delta}_{T})^{\prime}, a T×1T\times 1 vector.

To simplify the analysis, we assume D is known. It can be shown that this simplifying assumption does not affect the efficiency bound for \alpha, but it reduces the complexity of the derivation. Let \theta^{0}=(\alpha^{0},\delta^{0},F^{0}) and \widetilde{\theta}=(\widetilde{\alpha},\widetilde{\delta},\widetilde{F}); we study the asymptotic behavior of

(θ0+1NTθ~)(θ0)\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta})-\ell(\theta^{0})

under the normality of \varepsilon_{it} and \lambda_{i}. The normality assumption allows us to derive the parametric efficiency bound in the presence of an increasing number of nuisance parameters. We then show that the QMLE without normality attains the efficiency bound. In the rest of the paper, we use (\alpha^{0},\delta^{0},F^{0}) and (\alpha,\delta,F) interchangeably (they represent the true parameters); (\widetilde{\alpha},\widetilde{\delta},\widetilde{F}) represent local parameters; and \widehat{\alpha}, \widehat{\delta}, and \widehat{F}=(\widehat{f}_{1},...,\widehat{f}_{T})^{\prime} are the QMLE estimates.

Assumption B

(i): εit\varepsilon_{it} are iid over ii, and independent over tt such that εit𝒩(0,σt2)\varepsilon_{it}\sim\mathcal{N}(0,\sigma_{t}^{2}).

(ii): λi\lambda_{i} are iid 𝒩(0,Ir)\mathcal{N}(0,I_{r}), independent of εit\varepsilon_{it} for all ii and tt.

(iii): σt2[a,b]\sigma_{t}^{2}\in[a,b] with 0<a<b<0<a<b<\infty for all tt.

(iv): ftM<\|f_{t}\|\leq M<\infty for all tt, and 1TFD1FΣff>0\frac{1}{T}F^{\prime}D^{-1}F\rightarrow\Sigma_{ff}>0, where D=diag(σ12,σ22,,σT2)D=\operatorname*{diag}(\sigma_{1}^{2},\sigma_{2}^{2},...,\sigma_{T}^{2}).

(v): As T\rightarrow\infty,
(a) \frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)\rightarrow\gamma>0,
(b) \frac{1}{T}\mathrm{tr}[(LF)^{\prime}(D^{-1/2}M_{D^{-1/2}F}D^{-1/2})(LF)]\rightarrow\nu\geq 0,
(c) \frac{1}{T}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(L\delta)\rightarrow\mu\geq 0,
where M_{D^{-1/2}F} denotes the projection matrix orthogonal to D^{-1/2}F. Specifically,

D^{-1/2}M_{D^{-1/2}F}D^{-1/2}=D^{-1}-D^{-1}F(F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}.
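The displayed identity is easy to verify numerically; the following is a minimal check (assumed shapes and values) that the weighted projection D^{-1/2}M_{D^{-1/2}F}D^{-1/2} equals D^{-1}-D^{-1}F(F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}.

```python
import numpy as np

rng = np.random.default_rng(3)
T, r = 10, 2
F = rng.normal(size=(T, r))
D = np.diag(rng.uniform(0.5, 1.5, size=T))
Dinv = np.linalg.inv(D)
Dhalf_inv = np.diag(1 / np.sqrt(np.diag(D)))

G = Dhalf_inv @ F                                   # D^{-1/2} F
M = np.eye(T) - G @ np.linalg.inv(G.T @ G) @ G.T    # projection orthogonal to D^{-1/2}F
lhs = Dhalf_inv @ M @ Dhalf_inv
rhs = Dinv - Dinv @ F @ np.linalg.inv(F.T @ Dinv @ F) @ F.T @ Dinv
assert np.allclose(lhs, rhs)
```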

Under Assumptions B(i) and (ii), F\lambda_{i}+\varepsilon_{i} are iid \mathcal{N}(0,FF^{\prime}+D), implying a parametric model with an increasing number of incidental parameters. Normality of \lambda_{i} and \varepsilon_{it} is a standard assumption in factor analysis, see, e.g., Anderson [2] (p.576). Here we switch the roles of \lambda_{i} and f_{t}. Note that in classical factor analysis the time dimension T (in our notation) is fixed, so there is no incidental parameters problem since the number of parameters is fixed. But we consider T that goes to infinity. The following theorem gives the asymptotic representation of the local likelihood ratios.

Theorem 1.

Under Assumption B, for α~\widetilde{\alpha}\in\mathbb{R}, f~r\widetilde{f}\in\ell_{r}^{\infty}, {δ~t}1\{\widetilde{\delta}_{t}\}\in\ell_{1}^{\infty}, F~=(f~1,f~2,,f~T)\widetilde{F}=(\widetilde{f}_{1},\widetilde{f}_{2},...,\widetilde{f}_{T})^{\prime}, and δ~=(δ~1,,δ~T)\widetilde{\delta}=(\widetilde{\delta}_{1},...,\widetilde{\delta}_{T})^{\prime}, we have as N,TN,T\rightarrow\infty,

\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta})-\ell(\theta^{0})=\Delta_{NT}(\widetilde{\theta})-\frac{1}{2}\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2}+o_{p}(1)

where

\Delta_{NT}(\widetilde{\theta})=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}
+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i} \qquad (11)
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})

\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2}=\frac{1}{T}\mathrm{tr}\Big[\widetilde{F}^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\widetilde{F}\Big]
+\widetilde{\alpha}^{2}\Big[\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)\Big]
+\widetilde{\alpha}^{2}\,\mathrm{tr}\Big[\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}](LF)\Big]
+2\widetilde{\alpha}\,\mathrm{tr}\Big[\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\widetilde{F}\Big]
+2\widetilde{\alpha}\,\frac{1}{T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}L\delta
+\frac{1}{T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\widetilde{\delta}
+\widetilde{\alpha}^{2}\,\frac{1}{T}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(L\delta)

where the o_{p}(1) is uniform over \widetilde{\theta} such that |\widetilde{\alpha}|\leq M, \frac{1}{T}\|\widetilde{F}^{\prime}\widetilde{F}\|=\frac{1}{T}\sum_{t=1}^{T}\|\widetilde{f}_{t}\|^{2}\leq M, and \frac{1}{T}\|\widetilde{\delta}\|^{2}=\frac{1}{T}\sum_{t=1}^{T}\widetilde{\delta}_{t}^{2}\leq M, for any given M<\infty.

The proof of Theorem 1 is given in the Appendix.

Note that the expected value of ΔNT(θ~)\Delta_{NT}(\widetilde{\theta}) is zero, so 𝔼[ΔNT(θ~)]2\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2} is the variance.

All terms in \Delta_{NT}(\widetilde{\theta}) are stochastically bounded; they have expressions of the form \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi_{it}, where \xi_{it} have zero mean and finite variance (in fact, finite moments of any order under Assumption B). By assuming \{\widetilde{\delta}_{t}\} and \widetilde{f} are such that

1TF~D1F~\displaystyle\frac{1}{T}\widetilde{F}^{\prime}D^{-1}\widetilde{F} =1Tt=1T1σt2f~tf~tΣf~f~,\displaystyle=\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_{t}^{2}}\widetilde{f}_{t}\widetilde{f}_{t}^{\prime}\rightarrow\Sigma_{\widetilde{f}\widetilde{f}},
1Tδ~D1δ~\displaystyle\frac{1}{T}\widetilde{\delta}^{\prime}D^{-1}\widetilde{\delta} =1Tt=1T1σt2δ~t2σδ2\displaystyle=\frac{1}{T}\sum_{t=1}^{T}\frac{1}{\sigma_{t}^{2}}\widetilde{\delta}_{t}^{2}\rightarrow\sigma_{\delta}^{2}

as well as existence of limits for several cross products 1TF~D1F\frac{1}{T}\widetilde{F}^{\prime}D^{-1}F, 1Tδ~D1F\frac{1}{T}\widetilde{\delta}^{\prime}D^{-1}F, and so forth, then the variance 𝔼[ΔNT(θ~)]2\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2} has a limit. Let 𝔼[ΔNT(θ~)]2τ2\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2}\rightarrow\tau^{2} for some τ2\tau^{2} depending on (α~,{δ~t},f~)(\widetilde{\alpha},\{\widetilde{\delta}_{t}\},\widetilde{f}). We can further show

ΔNT(θ~)d𝒩(0,τ2).\Delta_{NT}(\widetilde{\theta})\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}(0,\tau^{2}).

Thus the local likelihood ratio can be rewritten as

(θ0+1NTθ~)(θ0)=ΔNT(θ~)12τ2+op(1).\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta})-\ell(\theta^{0})=\Delta_{NT}(\widetilde{\theta})-\frac{1}{2}\tau^{2}+o_{p}(1).

We next consider the asymptotic efficiency bound for regular estimators. Regularity rules out Hodges-type “superefficient” estimators and James-Stein-type estimators. A regular estimator sequence converges locally uniformly (under the local laws) to a limiting distribution that is free of the local parameters (van der Vaart [19], p.115, p.365).

3.3 Efficient scores and efficiency bound

In the likelihood ratio expansion, the term \Delta_{NT}(\widetilde{\theta}) contains the scores of the likelihood function. The coefficient of \widetilde{\alpha} gives the score for \alpha^{0}, the coefficient of \widetilde{f}_{t} gives the score for f_{t}^{0}, and the same holds for \widetilde{\delta} and \delta^{0}. The efficient score for \alpha^{0} is the residual from projecting its own score onto the scores of f_{1}^{0},...,f_{T}^{0} and of \delta_{1}^{0},...,\delta_{T}^{0}. Moreover, the inverse of the variance of the efficient score gives the efficiency bound. To derive the efficient score for \alpha^{0}, rewrite \Delta_{NT}(\widetilde{\theta}) of Theorem 1 as

ΔNT(θ~)=ΔNT1+ΔNT2+α~[ΔNT3+ΔNT4+ΔNT5]\Delta_{NT}(\widetilde{\theta})=\Delta_{NT1}+\Delta_{NT2}+\widetilde{\alpha}\,[\Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5}]

where \Delta_{NT1} and \Delta_{NT2} denote the first two terms of \Delta_{NT}(\widetilde{\theta}), see (11), and \Delta_{NTj} (j=3,4,5) denote the last three terms of (11) with \widetilde{\alpha} factored out. So the score for \alpha^{0} is the sum \Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5}. Next, rewrite

ΔNT1\displaystyle\Delta_{NT1} =1NTi=1NλiF~D1/2MD1/2FD1/2εi=1Tt=1Tf~tvt\displaystyle=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}=\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\widetilde{f}_{t}^{\prime}v_{t}

where vt=1Ni=1Nλivitv_{t}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}v_{it} (r×1)(r\times 1) and vitv_{it} is the tt-th element of the vector D1/2MD1/2FD1/2εiD^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}. Thus vtv_{t} is the score of ft0f^{0}_{t} (t=1,2,,Tt=1,2,...,T). Similarly, rewrite

ΔNT2\displaystyle\Delta_{NT2} =1NTi=1Nδ~(FF+D)1(Fλi+εi)=1Tt=1Tδ~tut\displaystyle=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})=\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\widetilde{\delta}_{t}u_{t}

where u_{t}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}u_{it} (a scalar), u_{it} is the t-th element of the vector (FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i}), and u_{t} is the score of \delta_{t}^{0}. To obtain the efficient score for \alpha^{0}, we project \Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5} onto the scores of f_{t}^{0} and \delta_{t}^{0} (t=1,2,...,T), that is, onto [v_{1},v_{2},...,v_{T}] and [u_{1},u_{2},...,u_{T}], to get the projection residual. Let V_{T}=(v_{1}^{\prime},v_{2}^{\prime},...,v_{T}^{\prime})^{\prime}, U_{T}=(u_{1},u_{2},...,u_{T})^{\prime}, and Z_{T}=(V_{T}^{\prime},\;U_{T}^{\prime})^{\prime}. The projection residual is given by

\Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5}-Z_{T}^{\prime}[\mathbb{E}(Z_{T}Z_{T}^{\prime})]^{-1}\mathbb{E}\left[Z_{T}(\Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5})\right] \qquad (12)

Notice \Delta_{NT3} is uncorrelated with the scores of f_{t}^{0} and of \delta_{t}^{0} (t=1,2,...,T), i.e., \mathbb{E}(Z_{T}\Delta_{NT3})=0. This follows because L\varepsilon_{i} contains the lags of \varepsilon_{i}, so \Delta_{NT3} is composed of terms \varepsilon_{it-s}\varepsilon_{it} (with s\geq 1), and \mathbb{E}(\varepsilon_{it-s}\varepsilon_{it}\varepsilon_{ik})=0 for any k. Next, \Delta_{NT4} is simply a linear combination of V_{T}=[v_{1},v_{2},...,v_{T}] since \Delta_{NT4} can be written as T^{-1/2}\sum_{t=1}^{T}p_{t}^{\prime}v_{t}, where p_{t}^{\prime} is the t-th row of the matrix LF. Because V_{T} is a subvector of Z_{T}, \Delta_{NT4} is also a linear combination of Z_{T}. Thus Z_{T}^{\prime}[\mathbb{E}(Z_{T}Z_{T}^{\prime})]^{-1}\mathbb{E}[Z_{T}\Delta_{NT4}]\equiv\Delta_{NT4}. Similarly, \Delta_{NT5} is a linear combination of U_{T} because \Delta_{NT5}=\frac{1}{\sqrt{T}}\sum_{t=1}^{T}q_{t}u_{t} with q_{t} being the t-th element of L\delta. Thus \Delta_{NT5} is a linear combination of Z_{T}. Hence, Z_{T}^{\prime}[\mathbb{E}(Z_{T}Z_{T}^{\prime})]^{-1}\mathbb{E}[Z_{T}\Delta_{NT5}]\equiv\Delta_{NT5}. In summary, we have

Z_{T}^{\prime}[\mathbb{E}(Z_{T}Z_{T}^{\prime})]^{-1}\mathbb{E}\left[Z_{T}(\Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5})\right]=\Delta_{NT4}+\Delta_{NT5}

It follows that the projection residual in (12) is equal to ΔNT3\Delta_{NT3}. Hence the efficient score for α0\alpha^{0} is ΔNT3\Delta_{NT3}. Notice,

\mathrm{var}(\Delta_{NT3})=\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)=\frac{1}{T}\sum_{t=2}^{T}\frac{1}{\sigma_{t}^{2}}\Big(\sigma_{t-1}^{2}+\alpha^{2}\sigma_{t-2}^{2}+\cdots+\alpha^{2(t-2)}\sigma_{1}^{2}\Big) \qquad (13)

Its limit is \gamma by Assumption B(v), so 1/\gamma gives the asymptotic efficiency bound.

We summarize the result in the following corollary

Corollary 1.

Under Assumption B, the asymptotic efficiency bound for regular estimators of \alpha^{0} is 1/\gamma, with \gamma being the limit of (13).

Since Assumption B is stronger than Assumption A, the asymptotic representation in (5) holds under Assumption B. That is, under the normality assumption, the system maximum likelihood estimator satisfies

\sqrt{NT}(\widehat{\alpha}-\alpha)=\Big(\frac{1}{T}\mathrm{tr}(LDL^{\prime}D^{-1})\Big)^{-1}\Big[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}\Big]+o_{p}(1).

We see that NT(α^α0)\sqrt{NT}(\widehat{\alpha}-\alpha^{0}) is expressed in terms of the efficient influence functions, thus α^\widehat{\alpha} is regular and asymptotically efficient (van der Vaart [19], p.121 and p.369). We state the result in the following corollary.

Corollary 2.

Under Assumption B, the system maximum likelihood estimator α^\widehat{\alpha} is a regular estimator and achieves the asymptotic efficiency bound (in spite of an increasing number of incidental parameters).

The preceding corollaries imply that, under normality, we are able to establish the asymptotic efficiency bound in the presence of an increasing number of nuisance parameters. Further, the system maximum likelihood estimator achieves the efficiency bound. These results are not obvious owing to the incidental parameters problem.

The QMLE in Section 2 does not require normality, but it achieves the normality efficiency bound; see equation (5). So the QMLE is robust to the normality assumption. If \lambda_{i} and \varepsilon_{i} are non-normal and their distributions are known, one should be able to construct a more efficient estimator than the QMLE. But in practice, panel data researchers usually do not impose distributional assumptions other than some moment conditions such as those in Assumption A. Thus the QMLE presents a viable estimation procedure, with the knowledge that it achieves the normality efficiency bound. Furthermore, the QMLE does not need bias correction, unlike the fixed effects estimator.

The result of Corollary 1 is not directly obtained via a limit experiment and the convolution theorem (e.g., van der Vaart and Wellner [20], chapter 3.11). Since \ell_{r}^{\infty} is not a Hilbert space, the convolution theorem is not directly applicable. However, using the line of argument in [11], it is possible to construct a Hilbert subspace with an appropriate inner product in which the efficiency bound for the low dimensional parameter \alpha^{0} can be shown to be 1/\gamma. That is, Corollary 1 can be obtained via the convolution theorem. But [11] also show that the estimators of the incidental parameters f_{t}^{0} under the corresponding local parameter space are not regular. Therefore, we shall not pursue this approach. Below we shall consider \ell^{2} perturbations.

3.4 The 2\ell^{2} local parameter space

To have the limit process of the local likelihood ratios reside in a Hilbert space and to directly apply the convolution theorem ([20], chapter 3.11), we consider a second type of local parameter space, which is also used by [11] for the fixed effects estimators:

f_{t}^{0}+\frac{1}{\sqrt{N}}\widetilde{f}_{t},\quad t=1,2,...,T \qquad (14)

with f~=(f~1,f~2,)\widetilde{f}=(\widetilde{f}_{1},\widetilde{f}_{2},...) being required to be in r2\ell_{r}^{2}:

\ell_{r}^{2}:=\Big\{(\widetilde{f}_{t})_{t=1}^{\infty}\,\Big|\,\widetilde{f}_{t}\in\mathbb{R}^{r},\ \sum_{s=1}^{\infty}\|\widetilde{f}_{s}\|^{2}<\infty\Big\}.

For this type of local parameters, we can remove the scale factor T1/2T^{-1/2}, (cf. (9)). Since f~r2\widetilde{f}\in\ell_{r}^{2}, we have, for F~=(f~1,f~2,,f~T)\widetilde{F}=(\widetilde{f}_{1},\widetilde{f}_{2},...,\widetilde{f}_{T})^{\prime} (projection of f~\widetilde{f} on the first TT coordinates),

\widetilde{F}^{\prime}\widetilde{F}=\sum_{t=1}^{T}\widetilde{f}_{t}\widetilde{f}_{t}^{\prime}=O(1)

This is in contrast with (10). A necessary condition for \widetilde{f}\in\ell_{r}^{2} is \widetilde{f}_{t}\rightarrow 0 as t\rightarrow\infty. In comparison, the \ell_{r}^{\infty} perturbation considered earlier only requires the boundedness of \widetilde{f}_{t}, so the process \widetilde{f}_{t} can be rather “jagged.” In a certain sense, requiring \widetilde{f} to be in \ell_{r}^{2} can be viewed as a “smoothness” restriction (for example, the Banach-Mazur theorem).

Similar to f~\widetilde{f}, we assume the sequence {δ~t}\{\widetilde{\delta}_{t}\} is in 12\ell_{1}^{2}, i.e. s=1δ~s2<\sum_{s=1}^{\infty}\widetilde{\delta}_{s}^{2}<\infty. We still use δ~\widetilde{\delta} to denote the projection of the sequence onto the first TT coordinates, δ~=(δ~1,,δ~T)\widetilde{\delta}=(\widetilde{\delta}_{1},...,\widetilde{\delta}_{T}).

Theorem 2.

Under Assumption B, for α~\widetilde{\alpha}\in\mathbb{R}, {δ~t}12\{\widetilde{\delta}_{t}\}\in\ell_{1}^{2}, f~r2\widetilde{f}\in\ell_{r}^{2}, F~=(f~1,f~2,,f~T)\widetilde{F}=(\widetilde{f}_{1},\widetilde{f}_{2},...,\widetilde{f}_{T})^{\prime}, and δ~=(δ~1,,δ~T)\widetilde{\delta}=(\widetilde{\delta}_{1},...,\widetilde{\delta}_{T}), we have as N,TN,T\rightarrow\infty,

\ell(\alpha^{0}+\frac{\widetilde{\alpha}}{\sqrt{NT}},\,\delta^{0}+\frac{1}{\sqrt{N}}\widetilde{\delta},\,F^{0}+\frac{1}{\sqrt{N}}\widetilde{F})-\ell(\theta^{0})=\Delta_{NT}^{\dagger}(\widetilde{\theta})-\frac{1}{2}\mathbb{E}[\Delta_{NT}^{\dagger}(\widetilde{\theta})]^{2}+o_{p}(1)

where

\Delta_{NT}^{\dagger}(\widetilde{\theta})=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}
+\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}D^{-1}\varepsilon_{i}
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i} \qquad (15)
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}
+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})

\mathbb{E}[\Delta_{NT}^{\dagger}(\widetilde{\theta})]^{2}=\mathrm{tr}\Big[\widetilde{F}^{\prime}D^{-1}\widetilde{F}\Big]
+\widetilde{\delta}^{\prime}D^{-1}\widetilde{\delta}
+\widetilde{\alpha}^{2}\Big[\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)\Big] \qquad (16)
+\widetilde{\alpha}^{2}\,\mathrm{tr}\Big[\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}](LF)\Big]
+\widetilde{\alpha}^{2}\,\frac{1}{T}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(L\delta)

where op(1)o_{p}(1) is uniform over |α~|M|\widetilde{\alpha}|\leq M, δ~M\|\widetilde{\delta}\|\leq M, and F~M\|\widetilde{F}\|\leq M for any given M<M<\infty.

In comparison with Theorem 1, Theorem 2 has simpler expressions, due to the smaller local parameter space. The first two terms in ΔNT(θ~)\Delta_{NT}(\widetilde{\theta}) are simplified, with the corresponding simplification in the variance, and in addition, two covariance terms in 𝔼[ΔNT(θ~)]2\mathbb{E}[\Delta_{NT}(\widetilde{\theta})]^{2} are dropped. The proof is given in the appendix.

We next establish the local asymptotic normality (LAN) property of the local likelihood ratios. Here we introduce further notation to make the expressions more compact. Let

\lambda_{i}^{+}=\begin{bmatrix}1\\ \lambda_{i}\end{bmatrix},\quad\widetilde{f}_{t}^{+}=\begin{bmatrix}\widetilde{\delta}_{t}\\ \widetilde{f}_{t}\end{bmatrix},\quad\widetilde{F}^{+}=(\widetilde{\delta},\widetilde{F})

Both \lambda_{i}^{+} and \widetilde{f}_{t}^{+} are vectors with r+1 elements, and \widetilde{F}^{+} is a T\times(r+1) matrix. With these notations, the sum of the first two terms of (15) equals \frac{1}{\sqrt{N}}\sum_{i=1}^{N}(\lambda_{i}^{+})^{\prime}(\widetilde{F}^{+})^{\prime}D^{-1}\varepsilon_{i}.

For each (α~,f~+)=(α~,f~1+,f~2+,.)×r+12(\widetilde{\alpha},\widetilde{f}^{+})=(\widetilde{\alpha},\widetilde{f}_{1}^{+},\widetilde{f}_{2}^{+},....)\in\mathbb{R}\times\ell_{r+1}^{2}, we introduce a new sequence

h(\widetilde{\alpha},\widetilde{f}^{+})=(h_{0},h_{1},h_{2},...)=\Big(\widetilde{\alpha}(\gamma+\nu+\mu)^{1/2},\frac{1}{\sigma_{1}}\widetilde{f}_{1}^{+},\frac{1}{\sigma_{2}}\widetilde{f}_{2}^{+},...\Big) \qquad (17)

so h_{0}=\widetilde{\alpha}(\gamma+\nu+\mu)^{1/2} and h_{s}=\frac{1}{\sigma_{s}}\widetilde{f}_{s}^{+} for s\geq 1, where \gamma, \nu, and \mu are defined in Assumption B(v), and \sigma_{s}^{2} is the variance of \varepsilon_{is}. Hence, h(\widetilde{\alpha},\widetilde{f}^{+}) is a scaled version of (\widetilde{\alpha},\widetilde{f}^{+}). By Assumption B(iii), \min_{s}\sigma_{s}^{2}\geq a>0, so it follows that h(\widetilde{\alpha},\widetilde{f}^{+})\in\mathbb{H}:=\mathbb{R}\times\ell_{r+1}^{2}. For any h,g\in\mathbb{H}, define the inner product \langle h,g\rangle=h_{0}g_{0}+\sum_{s=1}^{\infty}h_{s}^{\prime}g_{s}; then \mathbb{H} is a Hilbert space. Let \|h\|_{\mathbb{H}}^{2}=\langle h,h\rangle. In particular, for h=h(\widetilde{\alpha},\widetilde{f}^{+}) in (17), we have

\|h\|_{\mathbb{H}}^{2}=\widetilde{\alpha}^{2}(\gamma+\nu+\mu)+\sum_{s=1}^{\infty}\frac{1}{\sigma_{s}^{2}}(\widetilde{f}^{+}_{s})^{\prime}\widetilde{f}^{+}_{s} \qquad (18)

Notice tr(F~D1F~)+tr(δ~D1δ~)=s=1T1σs2f~sf~s+s=1T1σs2δ~s2=s=11σs2(f~s+)f~s++o(1)\mathrm{tr}(\widetilde{F}^{\prime}D^{-1}\widetilde{F})+\mathrm{tr}(\widetilde{\delta}^{\prime}D^{-1}\widetilde{\delta})=\sum_{s=1}^{T}\frac{1}{\sigma_{s}^{2}}\widetilde{f}_{s}^{\prime}\widetilde{f}_{s}+\sum_{s=1}^{T}\frac{1}{\sigma_{s}^{2}}\widetilde{\delta}_{s}^{2}=\sum_{s=1}^{\infty}\frac{1}{\sigma_{s}^{2}}(\widetilde{f}^{+}_{s})^{\prime}\widetilde{f}^{+}_{s}+o(1) because the series are convergent and nonnegative; rearranging does not alter the limit. By Assumption B(v), we can write (16) as

\mathbb{E}[\Delta_{NT}^{\dagger}(\widetilde{\theta})]^{2}=\widetilde{\alpha}^{2}(\gamma+\nu+\mu)+\sum_{s=1}^{\infty}\frac{1}{\sigma_{s}^{2}}(\widetilde{f}^{+}_{s})^{\prime}\widetilde{f}^{+}_{s}+o(1)=\|h\|_{\mathbb{H}}^{2}+o(1) \qquad (19)

where h=h(\widetilde{\alpha},\widetilde{f}^{+}) is given in (17). Next, rewrite (15) as

ΔNT(θ~)=1Ni=1N(λi+)(F~+)D1εi+α~(ΔNT3+ΔNT4+ΔNT5)\Delta_{NT}^{\dagger}(\widetilde{\theta})=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(\lambda^{+}_{i})^{\prime}(\widetilde{F}^{+})^{\prime}D^{-1}\varepsilon_{i}+\widetilde{\alpha}(\Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5})

where ΔNTj\Delta_{NTj} (j=3,4,5)j=3,4,5) are defined earlier. The first term

1Ni=1N(λi+)(F~+)D1εi=t=1T1σt2(f~t+)(1Ni=1Nλi+εit)d𝒩(0,t=1(f~t+)f~t+/σt2)\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(\lambda^{+}_{i})^{\prime}(\widetilde{F}^{+})^{\prime}D^{-1}\varepsilon_{i}=\sum_{t=1}^{T}\frac{1}{\sigma_{t}^{2}}(\widetilde{f}^{+}_{t})^{\prime}\Big{(}\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{+}\varepsilon_{it}\Big{)}\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}\Big{(}0,\sum_{t=1}^{\infty}(\widetilde{f}^{+}_{t})^{\prime}\widetilde{f}^{+}_{t}/\sigma_{t}^{2}\Big{)}

because N^{-1/2}\sum_{i=1}^{N}\lambda_{i}^{+}\varepsilon_{it}\stackrel{d}{\longrightarrow}\mathcal{N}(0,\sigma_{t}^{2}I_{r+1}). Note that the left-hand side above is asymptotically independent of \Delta_{NT3}+\Delta_{NT4}+\Delta_{NT5} (their covariance being zero, as is shown in the appendix). From

ΔNT,3+ΔNT,4+ΔNT,5d𝒩(0,γ+ν+μ)\Delta_{NT,3}+\Delta_{NT,4}+\Delta_{NT,5}\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}(0,\gamma+\nu+\mu)

where γ,ν,μ\gamma,\nu,\mu are given in Assumption B, we have

ΔNT(θ~)d𝒩(0,h2)\Delta_{NT}^{\dagger}(\widetilde{\theta})\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}\left(0,\|h\|_{\mathbb{H}}^{2}\,\right)

Moreover, it is not difficult to establish the finite dimensional weak convergence. Let \widetilde{\alpha}^{j}\in\mathbb{R} and \widetilde{f}^{+j}\in\ell_{r+1}^{2} for j=1,2,...,q. Let h^{j}=h(\widetilde{\alpha}^{j},\widetilde{f}^{+j}) and \widetilde{\theta}^{j}=(\widetilde{\alpha}^{j},\widetilde{F}^{+j}); then for any finite integer q,

(ΔNT(θ~1),,ΔNT(θ~q))d𝒩(0,(hj,hk)j,k=1q).(\Delta_{NT}^{\dagger}(\widetilde{\theta}^{1}),...,\Delta_{NT}^{\dagger}(\widetilde{\theta}^{q}))^{\prime}\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}\Big{(}0,(\langle h^{j},h^{k}\rangle)_{j,k=1}^{q}\Big{)}.

Summarizing the above, we have

Corollary 3.

Under the assumption of Theorem 2,

\ell(\alpha^{0}+\frac{\widetilde{\alpha}}{\sqrt{NT}},\,\delta^{0}+\frac{1}{\sqrt{N}}\widetilde{\delta},\,F^{0}+\frac{1}{\sqrt{N}}\widetilde{F})-\ell(\theta^{0})=\Delta_{NT}^{\dagger}(\widetilde{\theta})-\frac{1}{2}\|h\|_{\mathbb{H}}^{2}+o_{p}(1)

and

ΔNT(θ~)d𝒩(0,h2)\Delta_{NT}^{\dagger}(\widetilde{\theta})\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}\left(0,\|h\|_{\mathbb{H}}^{2}\right)

where h=h(α~,f~+)h=h(\widetilde{\alpha},\widetilde{f}^{+}) and h2\|h\|_{\mathbb{H}}^{2} are defined in (17) and (18), respectively. Furthermore, the likelihood ratio is locally asymptotically normal (LAN).

Using the convolution theorem for locally asymptotically normal (LAN) experiments, the implied efficiency bound for regular estimators of α0\alpha^{0} is 1/(γ+ν+μ)1/(\gamma+\nu+\mu). The implied efficiency bound for regular estimators of ft0+:=(δt0,ft0)f_{t}^{0+}:=(\delta_{t}^{0},f_{t}^{0}) is σt2Ir+1\sigma_{t}^{2}I_{r+1} for each tt. These bounds are, respectively, the inverse of the coefficient of α~2\widetilde{\alpha}^{2}, and the inverse of the matrix in the quadratic form (f~t+)f~t+/σt2=(f~t+)(Ir+1/σt2)f~t+(\widetilde{f}_{t}^{+})^{\prime}\widetilde{f}^{+}_{t}/\sigma_{t}^{2}=(\widetilde{f}_{t}^{+})^{\prime}(I_{r+1}/\sigma_{t}^{2})\widetilde{f}^{+}_{t} in the expression for h2\|h\|_{\mathbb{H}}^{2}.

To see this, fix ss\in\mathbb{N}, with s1s\geq 1. For h=(h0,h1,,hs,)h=(h_{0},h_{1},...,h_{s},...)\in\mathbb{H}, consider the parameter sequence,

ϕNT,s(h):=fs0++N1/2f~s+=fs0++N1/2σshs,ϕNT,s(0)=fs0+\phi_{NT,s}(h):=f_{s}^{0+}+N^{-1/2}\widetilde{f}_{s}^{+}=f_{s}^{0+}+N^{-1/2}\sigma_{s}h_{s},\quad\phi_{NT,s}(0)=f_{s}^{0+}

so N[ϕNT,s(h)ϕNT,s(0)]=σshs\sqrt{N}[\phi_{NT,s}(h)-\phi_{NT,s}(0)]=\sigma_{s}h_{s}. If we define ϕ˙s(h)=σshs\dot{\phi}_{s}(h)=\sigma_{s}h_{s}, then

N[ϕNT,s(h)ϕNT,s(0)]=ϕ˙s(h).\sqrt{N}[\phi_{NT,s}(h)-\phi_{NT,s}(0)]=\dot{\phi}_{s}(h).

Since \dot{\phi}_{s} is a coordinate projection map (multiplied by a positive constant \sigma_{s}), it is a continuous linear map \dot{\phi}_{s}:\mathbb{H}\rightarrow\mathbb{R}^{r+1}. Its adjoint map \dot{\phi}_{s}^{*}:\mathbb{R}^{r+1}\rightarrow\mathbb{H} (both spaces are self-dual) is the inclusion map (i.e., embedding): \dot{\phi}_{s}^{*}x=(0,...,0,\sigma_{s}x,0,...)\in\mathbb{H} for all x\in\mathbb{R}^{r+1}. The adjoint map satisfies \langle\dot{\phi}_{s}^{*}x,h\rangle=\sigma_{s}x^{\prime}h_{s}=x^{\prime}\dot{\phi}_{s}(h)=\langle x,\dot{\phi}_{s}(h)\rangle. Let Z denote the limiting distribution of efficient estimators of f_{s}^{0+}. Theorem 3.11.2 in van der Vaart and Wellner ([20], p.414) shows that x^{\prime}Z\sim\mathcal{N}(0,\|\dot{\phi}_{s}^{*}x\|_{\mathbb{H}}^{2}) for all x\in\mathbb{R}^{r+1}. But \|\dot{\phi}_{s}^{*}x\|_{\mathbb{H}}^{2}=\sigma_{s}^{2}x^{\prime}x. It follows that Z\sim\mathcal{N}(0,\sigma_{s}^{2}I_{r+1}). Thus the efficiency bound for regular estimators of f_{s}^{0+} is \sigma_{s}^{2}I_{r+1}. For s=0, the same argument shows that the efficiency bound for regular estimators of \alpha^{0} is 1/(\gamma+\nu+\mu). In summary, we have

Corollary 4.

Under the assumptions of Theorem 2, the asymptotic efficiency bound for regular estimators of \alpha^{0} is 1/(\gamma+\nu+\mu), and the efficiency bound for regular estimators of f_{t}^{0+} is \sigma_{t}^{2}I_{r+1}.

It can be shown that the efficiency bound 1/(\gamma+\nu+\mu) corresponds to the case in which the incidental parameters (\delta^{0},F^{0}) are known rather than estimated; thus the implied efficiency bound is too low to be attainable. The implication is that the \ell^{2} perturbation is “too small”. Intuitively, the smaller the local parameter space, the lower the efficiency bound. It is thus harder to achieve the implied bound, unless the estimation is done within the given local parameter space. But estimators, in general, are not constructed in reference to local parameter spaces.

When the same model is estimated by the fixed effects method (that is, the \lambda_{i}’s are also treated as parameters), Iwakura and Okui [11] show that the \ell^{2} perturbation is a suitable choice, and that the corresponding efficiency bound for \alpha^{0} is 1/\gamma (the authors confined their analysis to the homoskedastic case, so 1/\gamma=1-\alpha^{2}). For the QMLE, however, the \ell^{2} perturbation does not provide sufficient variation, and thus implies a bound that is too small. That is to say, the QMLE is a more efficient estimation procedure than the fixed effects approach. This finding is consistent with the result that, even under fixed T, the QMLE provides a consistent estimator of \alpha^{0}, whereas the fixed effects estimator is not consistent, see [17] and [6].

To recap, the \ell^{2} local parameter space is “too small” for the QMLE, but it is the suitable local parameter space for the fixed effects approach. This implies, as explained earlier, that the QMLE is a better procedure than the fixed effects method. By analyzing the local likelihood ratios, we are able to shed light on the relative merits of different estimators that are otherwise hard to discern from the usual asymptotics alone (e.g., limiting distributions).

4 Concluding remarks

We derive the efficiency bound for estimating dynamic panel models with interactive effects by treating the model as a simultaneous equations system. We show that the Gaussian quasi-maximum likelihood method applied to the system attains the efficiency bound. These results are obtained under an increasing number of incidental parameters. Contrasts with the fixed effects estimators are made. In particular, the analysis shows that the system QMLE is preferred to the fixed effects estimators.

Proof of Results

We focus on the local parameter space with \widetilde{f}\in\ell_{r}^{\infty} and \{\widetilde{\delta}_{t}\}\in\ell_{1}^{\infty}. The analysis for sequences in the \ell^{2} space is a special case. Let G=F+\frac{1}{\sqrt{NT}}\widetilde{F}. We drop the superscript 0 on the true parameters to make the notation less cumbersome. When evaluated at the true parameters, we have the equality By_{i}-\delta=F\lambda_{i}+\varepsilon_{i}. Thus

(θ0)=N2log|FF+D|12i=1N(Fλi+εi)(FF+D)1(Fλi+εi)\ell(\theta^{0})=-\frac{N}{2}\log|FF^{\prime}+D|-\frac{1}{2}\sum_{i=1}^{N}(F\lambda_{i}+\varepsilon_{i})^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})

and (θ0+1NTθ~)\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta}) is equal to

(θ0+1NTθ~)\displaystyle\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta}) =N2log|GG+D|\displaystyle=-\frac{N}{2}\log|GG^{\prime}+D|
12i=1N(yi(δ+1NTδ~)(α+1NTα~)yi,1)(GG+D)1\displaystyle-\frac{1}{2}\sum_{i=1}^{N}\Big{(}y_{i}-(\delta+\frac{1}{\sqrt{NT}}\widetilde{\delta})-(\alpha+\frac{1}{\sqrt{NT}}\widetilde{\alpha})y_{i,-1}\Big{)}^{\prime}(GG^{\prime}+D)^{-1}
×(yi(δ+1NTδ~)(α+1NTα~)yi,1)\displaystyle\quad\times\Big{(}y_{i}-(\delta+\frac{1}{\sqrt{NT}}\widetilde{\delta})-(\alpha+\frac{1}{\sqrt{NT}}\widetilde{\alpha})y_{i,-1}\Big{)}
=N2log|GG+D|12i=1N(Fλi+εi1NTδ~α~1NTyi,1)(GG+D)1\displaystyle=-\frac{N}{2}\log|GG^{\prime}+D|-\frac{1}{2}\sum_{i=1}^{N}\Big{(}F\lambda_{i}+\varepsilon_{i}-\frac{1}{\sqrt{NT}}\widetilde{\delta}-\widetilde{\alpha}\frac{1}{\sqrt{NT}}y_{i,-1}\Big{)}^{\prime}(GG^{\prime}+D)^{-1}
×(Fλi+εi1NTδ~α~1NTyi,1)\displaystyle\quad\times\Big{(}F\lambda_{i}+\varepsilon_{i}-\frac{1}{\sqrt{NT}}\widetilde{\delta}-\widetilde{\alpha}\frac{1}{\sqrt{NT}}y_{i,-1}\Big{)}

where yi,1=Jyi=(0,yi1,yi2,,yiT1)y_{i,-1}=Jy_{i}=(0,y_{i1},y_{i2},...,y_{iT-1})^{\prime}. Thus, the difference is

(θ0+1NTθ~)\displaystyle\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta}) (θ0)=\displaystyle-\ell(\theta^{0})=
\displaystyle- N2[log|GG+D|log|FF+D|]\displaystyle\frac{N}{2}\Big{[}\log|GG^{\prime}+D|-\log|FF^{\prime}+D|\Big{]}
\displaystyle- 12i=1N(Fλi+εi)[(GG+D)1(FF+D)1](Fλi+εi)\displaystyle\frac{1}{2}\sum_{i=1}^{N}(F\lambda_{i}+\varepsilon_{i})^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}(F\lambda_{i}+\varepsilon_{i})
+\displaystyle+ α~1NTi=1Nyi,1(GG+D)1(Fλi+εi)\displaystyle\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}(GG^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i}) (20)
\displaystyle- 12α~21NTi=1Nyi,1(GG+D)1yi,1\displaystyle\frac{1}{2}\widetilde{\alpha}^{2}\frac{1}{NT}\sum_{i=1}^{N}y_{i,-1}^{\prime}(GG^{\prime}+D)^{-1}y_{i,-1}
+\displaystyle+ 1NTi=1Nδ~(GG+D)1(Fλi+εi)\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
\displaystyle- α~1NTi=1Nδ~(GG+D)1yi,1\displaystyle\widetilde{\alpha}\frac{1}{NT}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}y_{i,-1}
\displaystyle- 121Tδ~(GG+D)1δ~\displaystyle\frac{1}{2}\frac{1}{T}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}\widetilde{\delta}

Throughout, we use the matrix inversion formula (Woodbury formula)

(FF^{\prime}+D)^{-1}=D^{-1}-D^{-1}F(I_{r}+F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}
(FF^{\prime}+D)^{-1}F=D^{-1}F(I_{r}+F^{\prime}D^{-1}F)^{-1}

and the matrix determinant result

|FF^{\prime}+D|=|D|\,|I_{r}+F^{\prime}D^{-1}F|
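These identities are easy to confirm numerically; the following is a quick sketch (illustrative sizes, not part of the proof) of such a check.

```python
import numpy as np

rng = np.random.default_rng(4)
T, r = 8, 2
F = rng.normal(size=(T, r))
D = np.diag(rng.uniform(0.5, 1.5, size=T))
Dinv = np.linalg.inv(D)

inv_direct = np.linalg.inv(F @ F.T + D)
inv_woodbury = Dinv - Dinv @ F @ np.linalg.inv(np.eye(r) + F.T @ Dinv @ F) @ F.T @ Dinv
assert np.allclose(inv_direct, inv_woodbury)
assert np.allclose(inv_direct @ F,
                   Dinv @ F @ np.linalg.inv(np.eye(r) + F.T @ Dinv @ F))
det_direct = np.linalg.det(F @ F.T + D)
det_identity = np.linalg.det(D) * np.linalg.det(np.eye(r) + F.T @ Dinv @ F)
assert np.allclose(det_direct, det_identity)
```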

From now on, we assume r=1 to simplify the derivation. We define \omega_{F}^{2} as

ωF2=1TFD1F\omega_{F}^{2}=\frac{1}{T}F^{\prime}D^{-1}F

We start with a number of lemmas. The first few do not involve any random quantities, and are results of matrix algebra and Taylor expansions.

Lemma 1.

For G=F+1NTF~G=F+\frac{1}{\sqrt{NT}}\widetilde{F},

-\frac{N}{2}\Big[\log|GG^{\prime}+D|-\log|FF^{\prime}+D|\Big]=-\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}+o(1) \qquad (21)

Proof: Notice

|GG+D|=|(F+1NTF~)(F+1NTF~)+D|=|D|[1+(F+1NTF~)D1(F+1NTF~)]=|D|(1+FD1F+21NTFD1F~+1NTF~D1F~)=|D|(1+FD1F)[1+21T1NTFD1F~1ωF2+1NT2F~D1F~1ωF2]+RNT\begin{split}|GG^{\prime}+D|=&|(F+\frac{1}{\sqrt{NT}}\widetilde{F})(F+\frac{1}{\sqrt{NT}}\widetilde{F})^{\prime}+D|\\ =&|D|[1+(F+\frac{1}{\sqrt{NT}}\widetilde{F})^{\prime}D^{-1}(F+\frac{1}{\sqrt{NT}}\widetilde{F})]\\ =&|D|\Big{(}1+F^{\prime}D^{-1}F+2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}+\frac{1}{NT}\widetilde{F}^{\prime}D^{-1}\widetilde{F}\Big{)}\\ =&|D|(1+F^{\prime}D^{-1}F)\Big{[}1+2\frac{1}{T}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{2}}+\frac{1}{NT^{2}}\widetilde{F}^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{2}}\Big{]}+R_{NT}\\ \end{split}

where RNTR_{NT} is negligible. From log(1+x)x\log(1+x)\approx x,

log|GG+D|log|FF+D|=log[1+21T1NTFD1F~/ωF2+1NT2F~D1F~/ωF2]\log|GG^{\prime}+D|-\log|FF^{\prime}+D|=\log\Big{[}1+2\frac{1}{T}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}+\frac{1}{NT^{2}}\widetilde{F}^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}\Big{]}
=21T1NTFD1F~/ωF2+1NT2F~D1F~/ωF2+RNT1=2\frac{1}{T}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}+\frac{1}{NT^{2}}\widetilde{F}^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}+R_{NT1}

where RNT1R_{NT1} is a higher order remainder term. Thus

N2[log|GG+D|log|FF+D|]\displaystyle-\frac{N}{2}\Big{[}\log|GG^{\prime}+D|-\log|FF^{\prime}+D|\Big{]}
=NT1NTFD1F~/ωF212T2F~D1F~/ωF2+o(1)\displaystyle=-\frac{N}{T}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}-\frac{1}{2T^{2}}\widetilde{F}^{\prime}D^{-1}\widetilde{F}/\omega_{F}^{2}+o(1)

The second term on the right is negligible. This proves Lemma 1. \Box
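As an informal numerical illustration (not part of the proof), the following Python sketch compares the exact scaled log-determinant difference with the leading term in (21) for r=1; the sample sizes and random draws are illustrative assumptions.

\begin{verbatim}
# Numerical illustration of Lemma 1: the scaled log-determinant difference
# is close to -sqrt(N/T)(F'D^{-1}Ftilde/T)/omega_F^2 for moderately large N, T.
import numpy as np

rng = np.random.default_rng(1)
N, T = 2000, 50
F  = rng.standard_normal(T)                     # true factor (r = 1)
Ft = rng.standard_normal(T)                     # local perturbation Ftilde
d  = rng.uniform(0.5, 2.0, size=T)              # diagonal of D
G  = F + Ft / np.sqrt(N * T)

def logdet(f):                                  # log|ff' + D| via |D|(1 + f'D^{-1}f)
    return np.sum(np.log(d)) + np.log(1.0 + np.sum(f * f / d))

omega2 = np.sum(F * F / d) / T                  # omega_F^2 = F'D^{-1}F/T
exact  = -0.5 * N * (logdet(G) - logdet(F))
approx = -np.sqrt(N / T) * (np.sum(F * Ft / d) / T) / omega2
print(exact, approx)                            # the two values should be close
\end{verbatim}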

Lemma 2.

Let H:=(1+GD1G)1(1+FD1F)1H:=(1+G^{\prime}D^{-1}G)^{-1}-(1+F^{\prime}D^{-1}F)^{-1}. Then

H=\displaystyle H= 21T21NTFD1F~1ωF41NT3F~D1F~1ωF4\displaystyle-2\frac{1}{T^{2}}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{4}}-\frac{1}{NT^{3}}\widetilde{F}^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{4}} (22)
+41T31NTFD1F~1ωF6+4(FD1F~T)21NT2ωF6+R1\displaystyle+4\frac{1}{T^{3}}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{6}}+4(\frac{F^{\prime}D^{-1}\widetilde{F}}{T})^{2}\frac{1}{NT^{2}\omega_{F}^{6}}+R_{1}

with R1R_{1} being negligible (note this order of expansion is necessary).

Proof:

(GG+D)1\displaystyle(GG^{\prime}+D)^{-1} =D1D1G(1+GD1G)1GD1\displaystyle=D^{-1}-D^{-1}G(1+G^{\prime}D^{-1}G)^{-1}G^{\prime}D^{-1}
(1+GD1G)\displaystyle(1+G^{\prime}D^{-1}G) =(1+FD1F+21NTFD1F~+1NTF~D1F~)\displaystyle=\Big{(}1+F^{\prime}D^{-1}F+2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}+\frac{1}{NT}\widetilde{F}^{\prime}D^{-1}\widetilde{F}\Big{)}
=(1+FD1F)(1+21NTFD1F~+1NTF~D1F~1+FD1F)\displaystyle=(1+F^{\prime}D^{-1}F)\Big{(}1+\frac{2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}+\frac{1}{NT}\widetilde{F}^{\prime}D^{-1}\widetilde{F}}{1+F^{\prime}D^{-1}F}\Big{)}
(1+GD1G)1\displaystyle(1+G^{\prime}D^{-1}G)^{-1} =(1+FD1F)1(1+21NTFD1F~+1NTF~D1F~1+FD1F)1\displaystyle=(1+F^{\prime}D^{-1}F)^{-1}\Big{(}1+\frac{2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}+\frac{1}{NT}\widetilde{F}^{\prime}D^{-1}\widetilde{F}}{1+F^{\prime}D^{-1}F}\Big{)}^{-1}

Let A=2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}+\frac{1}{NT}\widetilde{F}^{\prime}D^{-1}\widetilde{F}. Using the expansion 1/(1+x)\simeq 1-x+x^{2}, we have

(1+GD1G)1\displaystyle(1+G^{\prime}D^{-1}G)^{-1}
(1+FD1F)1(1A1+FD1F+A2(1+FD1F)2)\displaystyle\approx(1+F^{\prime}D^{-1}F)^{-1}\Big{(}1-\frac{A}{1+F^{\prime}D^{-1}F}+\frac{A^{2}}{(1+F^{\prime}D^{-1}F)^{2}}\Big{)}
=(1+FD1F)1A(1+FD1F)2+A2(1+FD1F)3\displaystyle=(1+F^{\prime}D^{-1}F)^{-1}-\frac{A}{(1+F^{\prime}D^{-1}F)^{2}}+\frac{A^{2}}{(1+F^{\prime}D^{-1}F)^{3}}
=(1+FD1F)1AT2ωF4(1+1TωF2)2+A2(1+FD1F)3\displaystyle=(1+F^{\prime}D^{-1}F)^{-1}-\frac{A}{T^{2}\omega_{F}^{4}(1+\frac{1}{T\omega_{F}^{2}})^{2}}+\frac{A^{2}}{(1+F^{\prime}D^{-1}F)^{3}}
(1+FD1F)1AT2ωF4(121TωF2)+A2(1+FD1F)3\displaystyle\approx(1+F^{\prime}D^{-1}F)^{-1}-\frac{A}{T^{2}\omega_{F}^{4}}(1-2\frac{1}{T\omega_{F}^{2}})+\frac{A^{2}}{(1+F^{\prime}D^{-1}F)^{3}}
=(1+FD1F)1AT2ωF4+2AT3ωF6+A2(1+FD1F)3\displaystyle=(1+F^{\prime}D^{-1}F)^{-1}-\frac{A}{T^{2}\omega_{F}^{4}}+2\frac{A}{T^{3}\omega_{F}^{6}}+\frac{A^{2}}{(1+F^{\prime}D^{-1}F)^{3}}

The above can be further approximated. In the third expression we keep only the first term of A (i.e., 2\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}), and in the last expression we keep only the first term of A^{2} (i.e., 4\frac{1}{NT}(F^{\prime}D^{-1}\widetilde{F})^{2}); the other terms are negligible. The denominator of the last expression is treated as T^{3}\omega_{F}^{6}. This gives the lemma. \Box
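As an informal numerical check (not part of the proof) of the expansion (22), the following Python sketch compares the exact scalar H with the four retained terms for r=1; the sample sizes and random draws are illustrative assumptions.

\begin{verbatim}
# Numerical check of the expansion of H in Lemma 2 (r = 1).
import numpy as np

rng = np.random.default_rng(2)
N, T = 5000, 80
F  = rng.standard_normal(T)
Ft = rng.standard_normal(T)                     # Ftilde
d  = rng.uniform(0.5, 2.0, size=T)              # diagonal of D
G  = F + Ft / np.sqrt(N * T)

a  = np.sum(F * Ft / d)                         # F'D^{-1}Ftilde
b  = np.sum(Ft * Ft / d)                        # Ftilde'D^{-1}Ftilde
w2 = np.sum(F * F / d) / T                      # omega_F^2

H_exact = 1 / (1 + np.sum(G * G / d)) - 1 / (1 + np.sum(F * F / d))
H_approx = (-2 * a / (T**2 * np.sqrt(N * T) * w2**2)
            - b / (N * T**3 * w2**2)
            + 4 * a / (T**3 * np.sqrt(N * T) * w2**3)
            + 4 * (a / T)**2 / (N * T**2 * w2**3))
print(H_exact, H_approx)                        # the two values agree closely
\end{verbatim}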

Lemma 3.

The following T×TT\times T matrix Ξ\Xi satisfies

Ξ:=(GG+D)1(FF+D)1=ΞaΞbΞcΞd+R\Xi:=(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}=-\Xi_{a}-\Xi_{b}-\Xi_{c}-\Xi_{d}+R

where

Ξa=HD1FFD1\displaystyle\Xi_{a}=HD^{-1}FF^{\prime}D^{-1} (23)
Ξb=[1NT(1TωF21T2ωF4)21NT3FD1F~1ωF4]D1FF~D1\displaystyle\Xi_{b}=\Big{[}\frac{1}{\sqrt{NT}}\Big{(}\frac{1}{T\omega_{F}^{2}}-\frac{1}{T^{2}\omega_{F}^{4}}\Big{)}-2\frac{1}{NT^{3}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{4}}\Big{]}D^{-1}F\widetilde{F}^{\prime}D^{-1} (24)
Ξc=[1NT(1TωF21T2ωF4)21NT3FD1F~1ωF4]D1F~FD1\displaystyle\Xi_{c}=\Big{[}\frac{1}{\sqrt{NT}}\Big{(}\frac{1}{T\omega_{F}^{2}}-\frac{1}{T^{2}\omega_{F}^{4}}\Big{)}-2\frac{1}{NT^{3}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{4}}\Big{]}D^{-1}\widetilde{F}F^{\prime}D^{-1} (25)
Ξd=1NT21ωF2D1F~F~D1\displaystyle\Xi_{d}=\frac{1}{NT^{2}}\frac{1}{\omega_{F}^{2}}D^{-1}\widetilde{F}\widetilde{F}^{\prime}D^{-1} (26)

where HH is defined in Lemma 2, and R is a negligible higher order term with Rsp=o(1/(NT))\|R\|_{sp}=o(1/(NT)).

Proof: From the Woodbury formula

-\Xi=D^{-1}G(1+G^{\prime}D^{-1}G)^{-1}G^{\prime}D^{-1}-D^{-1}F(1+F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}

we can write

G(1+GD1G)1G=(1+FD1F)1GG+HGGG(1+G^{\prime}D^{-1}G)^{-1}G^{\prime}=(1+F^{\prime}D^{-1}F)^{-1}GG^{\prime}+HGG^{\prime} (27)

where HH is defined in Lemma 2. The first term on the right hand side above is

\displaystyle(1+F^{\prime}D^{-1}F)^{-1}GG^{\prime} =(1+F^{\prime}D^{-1}F)^{-1}\Big{(}FF^{\prime}+\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}+\frac{1}{NT}\widetilde{F}\widetilde{F}^{\prime}\Big{)}
=(1+FD1F)1FF\displaystyle=(1+F^{\prime}D^{-1}F)^{-1}FF^{\prime}
+(1+FD1F)1(1NTFF~+1NTF~F+1NTF~F~)\displaystyle+(1+F^{\prime}D^{-1}F)^{-1}\Big{(}\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}+\frac{1}{NT}\widetilde{F}\widetilde{F}^{\prime}\Big{)}

Using

(1+FD1F)1=1T1ωF21T21ωF4+R2(1+F^{\prime}D^{-1}F)^{-1}=\frac{1}{T}\frac{1}{\omega_{F}^{2}}-\frac{1}{T^{2}}\frac{1}{\omega_{F}^{4}}+R_{2}

where R2=O(1/T3)R_{2}=O(1/T^{3}) is negligible, we have

\displaystyle(1+F^{\prime}D^{-1}F)^{-1}GG^{\prime}-(1+F^{\prime}D^{-1}F)^{-1}FF^{\prime}
=(1TωF21T2ωF4)[1NTFF~+1NTF~F+1NTF~F~]+R3\displaystyle=\Big{(}\frac{1}{T\omega_{F}^{2}}-\frac{1}{T^{2}\omega_{F}^{4}}\Big{)}\Big{[}\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}+\frac{1}{NT}\widetilde{F}\widetilde{F}^{\prime}\Big{]}+R_{3} (28)
=(1TωF21T2ωF4)[1NTFF~+1NTF~F]+1ωF21NT2F~F~+R4\displaystyle=\Big{(}\frac{1}{T\omega_{F}^{2}}-\frac{1}{T^{2}\omega_{F}^{4}}\Big{)}\Big{[}\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}\Big{]}+\frac{1}{\omega_{F}^{2}}\frac{1}{NT^{2}}\widetilde{F}\widetilde{F}^{\prime}+R_{4}

The last equality follows by ignoring higher-order terms; R_{3} and R_{4} are negligible. To see this, note that R_{3} equals R_{2} multiplied by the three matrices in the brackets of (28). The Frobenius norm of each of these matrices is O(T/\sqrt{NT}) and R_{2}=O(1/T^{3}), so the product is O(1/(T^{2}\sqrt{NT})). R_{4} equals R_{3}+O(1/(T^{2}N)). To analyze HGG^{\prime} in (27), we use GG^{\prime}=FF^{\prime}+\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}+\frac{1}{NT}\widetilde{F}\widetilde{F}^{\prime}, but we can ignore \frac{1}{NT}\widetilde{F}\widetilde{F}^{\prime} because its product with H is of higher order. Thus

HGG=HFF+H1NTFF~+H1NTF~F+R5HGG^{\prime}=HFF^{\prime}+H\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime}+H\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}+R_{5} (29)

where H is given in (22). All four terms in H are non-negligible for the matrix HFF^{\prime}; only the first term in H is non-negligible for H\frac{1}{\sqrt{NT}}F\widetilde{F}^{\prime} and H\frac{1}{\sqrt{NT}}\widetilde{F}F^{\prime}. Thus, combining (28) and (29) and pre- and post-multiplying by D^{-1}, we obtain the lemma. \Box

Lemma 4.
12\displaystyle-\frac{1}{2} i=1N(Fλi)[(GG+D)1(FF+D)1]Fλi\displaystyle\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}F\lambda_{i}
=NT(FD1F~/T)1ωF2121TF~D1/2MD1/2FD1/2F~+op(1)\displaystyle=\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}-\frac{1}{2}\frac{1}{T}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\widetilde{F}+o_{p}(1)

where D1/2MD1/2FD1/2=D1D1F(FD1F)1FD1D^{-1/2}M_{D^{-1/2}F}D^{-1/2}=D^{-1}-D^{-1}F(F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}.

Note that the first term is the negative of the leading term in (21), so the two cancel.

Proof: By Lemma 3, consider

\begin{split}\sum_{i=1}^{N}(F\lambda_{i})^{\prime}(-\Xi_{a})(F\lambda_{i})=&-H\sum_{i=1}^{N}(F\lambda_{i})^{\prime}D^{-1}FF^{\prime}D^{-1}(F\lambda_{i})\\ =&2\sqrt{NT}(F^{\prime}D^{-1}\widetilde{F}/T)\\ &+(\widetilde{F}^{\prime}D^{-1}\widetilde{F}/T)\\ &-4\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}\\ &-4(F^{\prime}D^{-1}\widetilde{F}/T)^{2}\frac{1}{\omega_{F}^{2}}+o_{p}(1)\\ \end{split} (30)

We have used the fact that 1Ni=1Nλi2=1+op(1)\frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2}=1+o_{p}(1). Next,

\displaystyle\sum_{i=1}^{N}(F\lambda_{i})^{\prime}(-\Xi_{b}-\Xi_{c})(F\lambda_{i})= \displaystyle-2\sqrt{NT}(\widetilde{F}^{\prime}D^{-1}F/T)+2\sqrt{\frac{N}{T}}(\widetilde{F}^{\prime}D^{-1}F/T)\frac{1}{\omega_{F}^{2}}
\displaystyle\quad+4(F^{\prime}D^{-1}\widetilde{F}/T)^{2}\frac{1}{\omega_{F}^{2}}+o_{p}(1)

and

\sum_{i=1}^{N}(F\lambda_{i})^{\prime}(-\Xi_{d})(F\lambda_{i})=-(F^{\prime}D^{-1}\widetilde{F}/T)^{2}\frac{1}{\omega_{F}^{2}}+o_{p}(1).

Since \Xi=-\Xi_{a}-\Xi_{b}-\Xi_{c}-\Xi_{d}+R by Lemma 3, summing the three preceding displays gives \sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi(F\lambda_{i}) up to a negligible term; multiplying by -1/2 gives

12i=1N\displaystyle-\frac{1}{2}\sum_{i=1}^{N} (Fλi)[(GG+D)1(FF+D)1]Fλi\displaystyle(F\lambda_{i})^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}F\lambda_{i}
=NT(FD1F~/T)1ωF212(F~D1F~/T)+12(FD1F~/T)21ωF2+op(1)\displaystyle=\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}-\frac{1}{2}(\widetilde{F}^{\prime}D^{-1}\widetilde{F}/T)+\frac{1}{2}(F^{\prime}D^{-1}\widetilde{F}/T)^{2}\frac{1}{\omega_{F}^{2}}+o_{p}(1)

We can rewrite

(F~D1F~/T)(FD1F~/T)21ωF2=1T[F~D1F~F~D1F(FD1F)1FD1F~](\widetilde{F}^{\prime}D^{-1}\widetilde{F}/T)-(F^{\prime}D^{-1}\widetilde{F}/T)^{2}\frac{1}{\omega_{F}^{2}}=\frac{1}{T}\Big{[}\widetilde{F}^{\prime}D^{-1}\widetilde{F}-\widetilde{F}^{\prime}D^{-1}F(F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}\widetilde{F}\Big{]}
=1TF~D1/2MD1/2FD1/2F~=\frac{1}{T}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\widetilde{F}

This gives Lemma 4. \Box.
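As an informal numerical check (not part of the proof) of Lemma 4, the following Python sketch imposes \sum_{i}\lambda_{i}^{2}=N (so that the left-hand side reduces to -(N/2)F^{\prime}[(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}]F) and compares it with the right-hand side for r=1; the sample sizes and random draws are illustrative assumptions, and the two values differ only by the o_{p}(1) remainder.

\begin{verbatim}
# Numerical check of Lemma 4 (r = 1), imposing sum_i lambda_i^2 = N.
import numpy as np

rng = np.random.default_rng(3)
N, T = 10000, 200
F  = rng.standard_normal(T)
Ft = rng.standard_normal(T)                     # Ftilde
d  = rng.uniform(0.5, 2.0, size=T)
D, Dinv = np.diag(d), np.diag(1.0 / d)
G  = F + Ft / np.sqrt(N * T)

inv_G = np.linalg.inv(np.outer(G, G) + D)       # (GG' + D)^{-1}
inv_F = np.linalg.inv(np.outer(F, F) + D)       # (FF' + D)^{-1}
lhs = -0.5 * N * F @ (inv_G - inv_F) @ F

w2 = F @ Dinv @ F / T                           # omega_F^2
M  = Dinv - np.outer(Dinv @ F, Dinv @ F) / (F @ Dinv @ F)   # D^{-1/2} M_{D^{-1/2}F} D^{-1/2}
rhs = np.sqrt(N / T) * (F @ Dinv @ Ft / T) / w2 - 0.5 * (Ft @ M @ Ft) / T
print(lhs, rhs)                                 # close; the gap is the o_p(1) remainder
\end{verbatim}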

Lemma 5.
12i=1Nεi[(GG+D)1(FF+D)1]εi=op(1)-\frac{1}{2}\sum_{i=1}^{N}\varepsilon_{i}^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}\varepsilon_{i}=o_{p}(1) (31)

Proof: By the notation of Lemma 3, we evaluate i=1NεiΞεi\sum_{i=1}^{N}\varepsilon_{i}^{\prime}\Xi\varepsilon_{i}. Note that Ξ\Xi consists of four parts. For this lemma, it is sufficient to approximate Ξ\Xi by Ξ1+Ξ2+Ξ3\Xi_{1}+\Xi_{2}+\Xi_{3}, where

Ξ1\displaystyle\Xi_{1} =(21T21NTFD1F~1ωF4)D1FFD1\displaystyle=\Big{(}2\frac{1}{T^{2}}\frac{1}{\sqrt{NT}}F^{\prime}D^{-1}\widetilde{F}\frac{1}{\omega_{F}^{4}}\Big{)}D^{-1}FF^{\prime}D^{-1}
Ξ2\displaystyle\Xi_{2} =(1NT1T1ωF2)D1FF~D1\displaystyle=-\Big{(}\frac{1}{\sqrt{NT}}\frac{1}{T}\frac{1}{\omega_{F}^{2}}\Big{)}D^{-1}F\widetilde{F}^{\prime}D^{-1} (32)
Ξ3\displaystyle\Xi_{3} =(1NT1T1ωF2)D1F~FD1\displaystyle=-\Big{(}\frac{1}{\sqrt{NT}}\frac{1}{T}\frac{1}{\omega_{F}^{2}}\Big{)}D^{-1}\widetilde{F}F^{\prime}D^{-1}

In the above approximation, \Xi_{1}, \Xi_{2}, and \Xi_{3} correspond to -\Xi_{a}, -\Xi_{b}, and -\Xi_{c} in Lemma 3, keeping only the first term of H in (22) and only the leading term \frac{1}{\sqrt{NT}}\frac{1}{T\omega_{F}^{2}} inside the brackets of (24) and (25); all other terms are negligible in the evaluation of \sum_{i=1}^{N}\varepsilon_{i}^{\prime}\Xi\varepsilon_{i}. Taking traces, it is easy to obtain the expected values

𝔼i=1NεiΞ1εi=2NT(FD1F~/T)1ωF2\mathbb{E}\sum_{i=1}^{N}\varepsilon_{i}^{\prime}\Xi_{1}\varepsilon_{i}=2\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}

And

𝔼i=1Nεi(Ξ2+Ξ3)εi=2NT(FD1F~/T)1ωF2\mathbb{E}\sum_{i=1}^{N}\varepsilon_{i}^{\prime}(\Xi_{2}+\Xi_{3})\varepsilon_{i}=-2\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}

Thus the sum of the expected values is zero. In addition, the deviation of each term from its expected value is o_{p}(1), because the variance of each term is O(1/T)=o(1). For example, \mbox{var}(\sum_{i=1}^{N}\varepsilon_{i}^{\prime}\Xi_{2}\varepsilon_{i})=N[\mathrm{tr}(\Xi_{2}D\Xi_{2}D)+\mathrm{tr}(\Xi_{2}^{\prime}D\Xi_{2}D)]=\frac{1}{T}(\frac{1}{T}\widetilde{F}^{\prime}D^{-1}F)^{2}\frac{1}{\omega_{F}^{4}}+\frac{1}{T}(\frac{1}{T}\widetilde{F}^{\prime}D^{-1}\widetilde{F})(\frac{1}{T}F^{\prime}D^{-1}F)\frac{1}{\omega_{F}^{4}}=O(1/T). We have used the fact that for normal \varepsilon_{i}\sim N(0,D), \mbox{var}(\varepsilon_{i}^{\prime}A\varepsilon_{i})=\mathrm{tr}(ADAD)+\mathrm{tr}(A^{\prime}DAD) for any A. This proves the lemma. \Box
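As an informal Monte Carlo check (not part of the proof) of the quadratic-form variance formula just used, the following Python sketch compares a simulated variance with \mathrm{tr}(ADAD)+\mathrm{tr}(A^{\prime}DAD) for a non-symmetric A; the matrices and the number of draws are illustrative assumptions.

\begin{verbatim}
# Monte Carlo check of var(eps' A eps) = tr(ADAD) + tr(A'DAD) for eps ~ N(0, D).
import numpy as np

rng = np.random.default_rng(4)
T = 6
d = rng.uniform(0.5, 2.0, size=T)
D = np.diag(d)
A = rng.standard_normal((T, T))                 # arbitrary, non-symmetric

eps = rng.standard_normal((200000, T)) * np.sqrt(d)     # draws from N(0, D)
q = np.einsum('nt,ts,ns->n', eps, A, eps)               # eps' A eps, draw by draw
theory = np.trace(A @ D @ A @ D) + np.trace(A.T @ D @ A @ D)
print(q.var(), theory)                          # close up to simulation error
\end{verbatim}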

Lemma 6.
i=1N(Fλi)[(GG+D)1(FF+D)1]εi=1NTi=1NλiF~D1/2MD1/2FD1/2εi+op(1)\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}\varepsilon_{i}=-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}+o_{p}(1)

Proof: Recall Ξ=(GG+D)1(FF+D)1\Xi=(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}. The preceding approximation of Ξ\Xi by Ξ1+Ξ2+Ξ3\Xi_{1}+\Xi_{2}+\Xi_{3} in (32) is sufficient (other terms are negligible). We evaluate i=1N(Fλi)Ξkεi\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi_{k}\varepsilon_{i} (k=1,2,3)(k=1,2,3).

i=1N(Fλi)Ξ1εi\displaystyle\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi_{1}\varepsilon_{i} =2(FD1F~/T)(FD1F/T)1NTi=1NFD1εiλi1ωF4\displaystyle=2(F^{\prime}D^{-1}\widetilde{F}/T)(F^{\prime}D^{-1}F/T)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}\frac{1}{\omega_{F}^{4}}
=21ωF2(FD1F~/T)1NTi=1NFD1εiλi\displaystyle=2\frac{1}{\omega_{F}^{2}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}
i=1N(Fλi)Ξ2εi=1ωF2(FD1F/T)1NTi=1NF~D1εiλi=1NTi=1NF~D1εiλi\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi_{2}\,\varepsilon_{i}=-\frac{1}{\omega_{F}^{2}}(F^{\prime}D^{-1}F/T)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}=-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}

and

i=1N(Fλi)Ξ3εi=1ωF2(FD1F~/T)1NTi=1NFD1εiλi\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi_{3}\,\varepsilon_{i}=-\frac{1}{\omega_{F}^{2}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}

Combining the three expressions

i=1N(Fλi)Ξεi=1NTi=1NλiF~D1εi+1ωF2(FD1F~/T)1NTi=1NFD1εiλi+op(1)\sum_{i=1}^{N}(F\lambda_{i})^{\prime}\Xi\varepsilon_{i}=-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}+\frac{1}{\omega_{F}^{2}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}+o_{p}(1)

Using the definition of M_{D^{-1/2}F} and \omega_{F}^{2} and rewriting the right-hand side gives the lemma. \Box

Corollary 5.
12i=1N(Fλi+εi)\displaystyle-\frac{1}{2}\sum_{i=1}^{N}(F\lambda_{i}+\varepsilon_{i})^{\prime} [(GG+D)1(FF+D)1](Fλi+εi)=NT(FD1F~/T)1ωF2\displaystyle\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}(F\lambda_{i}+\varepsilon_{i})=\sqrt{\frac{N}{T}}(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}
+1NTi=1NλiF~D1/2MD1/2FD1/2εi\displaystyle+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}
121TF~D1/2MD1/2FD1/2F~+op(1)\displaystyle-\frac{1}{2}\frac{1}{T}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\widetilde{F}+o_{p}(1)

Proof: This follows by combining the results of Lemmas 4, 5, and 6. \Box

Lemma 7.
1NTi=1Nyi,1\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime} (FF+D)1(Fλi+εi)=1NTi=1NδL(FF+D)1(Fλi+εi)\displaystyle(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\delta^{\prime}L^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
+1NTi=1N(Lεi)D1εi\displaystyle+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i} (33)
+1NTi=1Nλi(LF)[D1/2MD1/2FD1/2]εi+op(1)\displaystyle+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}+o_{p}(1)

Proof: Using yi,1=Lδ+LFλi+Lεiy_{i,-1}=L\delta+LF\lambda_{i}+L\varepsilon_{i}, the left hand side equals the sum of the first term on the right hand side and the following two expressions

1NTi=1N(LFλi+Lεi)(FF+D)1Fλi+1NTi=1N(LFλi+Lεi)(FF+D)1εi:=a+b\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(LF\lambda_{i}+L\varepsilon_{i})^{\prime}(FF^{\prime}+D)^{-1}F\lambda_{i}+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(LF\lambda_{i}+L\varepsilon_{i})^{\prime}(FF^{\prime}+D)^{-1}\varepsilon_{i}:=a+b

where a denotes the first term and b the second term. From the formula

(FF+D)1F=D1F(1+FD1F)1=D1F(1+TωF2)1(FF^{\prime}+D)^{-1}F=D^{-1}F({1+F^{\prime}D^{-1}F})^{-1}=D^{-1}F(1+T\omega_{F}^{2})^{-1}

term aa equals

a\displaystyle a =N/T1+TωF2(1Ni=1Nλi2)(FLD1F)+11+TωF21NTi=1NεiLD1Fλi\displaystyle=\frac{\sqrt{N/T}}{1+T\omega_{F}^{2}}(\frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2})(F^{\prime}L^{\prime}D^{-1}F)+\frac{1}{1+T\omega_{F}^{2}}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\varepsilon_{i}^{\prime}L^{\prime}D^{-1}F\lambda_{i}
=N/T1+TωF2(FLD1F)+op(1)\displaystyle=\frac{\sqrt{N/T}}{1+T\omega_{F}^{2}}(F^{\prime}L^{\prime}D^{-1}F)+o_{p}(1)

where we used 1Ni=1Nλi2=1\frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2}=1, and the second term is Op(1/T)O_{p}(1/T).

For term bb, use the Woodbury formula,

b\displaystyle b =1NTi=1N(LFλi)D1εi+1NTi=1N(Lεi)D1εi\displaystyle=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(LF\lambda_{i})^{\prime}D^{-1}\varepsilon_{i}+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}
1NTi=1NλiFLD1F(1+FD1F)FD1εi1(1+FD1F)1NTi=1NεiLD1FFD1εi\displaystyle-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\frac{\lambda_{i}^{\prime}F^{\prime}L^{\prime}D^{-1}F}{(1+F^{\prime}D^{-1}F)}F^{\prime}D^{-1}\varepsilon_{i}-\frac{1}{(1+F^{\prime}D^{-1}F)}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\varepsilon_{i}^{\prime}L^{\prime}D^{-1}FF^{\prime}D^{-1}\varepsilon_{i}

Note the expected value of the last term in the preceding equation is

FLD1F1+FD1FNT-\frac{F^{\prime}L^{\prime}D^{-1}F}{1+F^{\prime}D^{-1}F}\sqrt{\frac{N}{T}}

and the deviation from its expected value is negligible (its variance is O(1/T)=o(1), by the same argument as in Lemma 5). This expected value cancels with term a. Thus, we can rewrite

a+b=1NTi=1N(Lεi)D1εi+1NTi=1Nλi(LF)[D1/2MD1/2FD1/2]εi+op(1)a+b=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}+o_{p}(1)

where we have used 1/(1+F^{\prime}D^{-1}F)\approx 1/(F^{\prime}D^{-1}F). This proves Lemma 7. \Box

Lemma 8.
1NTi=1Nyi,1\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime} [(GG+D)1(FF+D)1](Fλi+εi)\displaystyle\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}(F\lambda_{i}+\varepsilon_{i}) (34)
=1T(LF)[D1/2MD1/2FD1/2]F~+op(1)\displaystyle=-\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\widetilde{F}+o_{p}(1)

Proof: It is not difficult to show

1NTi=1Nyi,1[(GG+D)1(FF+D)1]εi=op(1)\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Big{[}(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}\Big{]}\varepsilon_{i}=o_{p}(1)

We thus focus on \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi F\lambda_{i}, where \Xi=(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}. Approximating \Xi by (32) is sufficient.

Using yi,1=Lδ+LFλi+Lεiy_{i,-1}=L\delta+LF\lambda_{i}+L\varepsilon_{i}, we have

1NTi=1Nyi,1Ξ1Fλi\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi_{1}F\lambda_{i} =1NTi=1N(Lδ+LFλi+Lεi)(21T21NT(FD1F~)1ωF4)D1(FF)D1Fλi\displaystyle=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta+LF\lambda_{i}+L\varepsilon_{i})^{\prime}\Big{(}2\frac{1}{T^{2}}\frac{1}{\sqrt{NT}}(F^{\prime}D^{-1}\widetilde{F})\frac{1}{\omega_{F}^{4}}\Big{)}D^{-1}(FF^{\prime})D^{-1}F\lambda_{i}
=2(FLD1F/T)(FD1F~/T)1ωF2+op(1)\displaystyle=2(F^{\prime}L^{\prime}D^{-1}F/T)(F^{\prime}D^{-1}\widetilde{F}/T)\frac{1}{\omega_{F}^{2}}+o_{p}(1)

where the two terms involving LδL\delta and LεiL\varepsilon_{i} are negligible, and each is Op(N1/2)O_{p}(N^{-1/2}). Here we have used FD1F=TωF2F^{\prime}D^{-1}F=T\omega_{F}^{2} and 1Ni=1Nλi2=1+op(1)\frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2}=1+o_{p}(1). Next,

1NTi=1Nyi,1Ξ2Fλi\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi_{2}F\lambda_{i} =1NTi=1N(Lδ+LFλi+Lεi)(1NT1T1ωF2)D1(FF~)D1Fλi\displaystyle=-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta+LF\lambda_{i}+L\varepsilon_{i})^{\prime}\Big{(}\frac{1}{\sqrt{NT}}\frac{1}{T}\frac{1}{\omega_{F}^{2}}\Big{)}D^{-1}(F\widetilde{F}^{\prime})D^{-1}F\lambda_{i}
=1ωF2(FLD1F/T)(F~D1F/T)+op(1),\displaystyle=-\frac{1}{\omega_{F}^{2}}(F^{\prime}L^{\prime}D^{-1}F/T)(\widetilde{F}^{\prime}D^{-1}F/T)+o_{p}(1),

where terms involving LδL\delta and LεiL\varepsilon_{i} are negligible. Next

1NTi=1Nyi,1Ξ3Fλi\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi_{3}F\lambda_{i} =1NTi=1N(Lδ+LFλi+Lεi)(1NT1T1ωF2)D1(F~F)D1Fλi\displaystyle=-\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta+LF\lambda_{i}+L\varepsilon_{i})^{\prime}\Big{(}\frac{1}{\sqrt{NT}}\frac{1}{T}\frac{1}{\omega_{F}^{2}}\Big{)}D^{-1}(\widetilde{F}F^{\prime})D^{-1}F\lambda_{i}
=(FLD1F~/T)+op(1)\displaystyle=-(F^{\prime}L^{\prime}D^{-1}\widetilde{F}/T)+o_{p}(1)

which follows from the same reasoning as given for the term involving Ξ1\Xi_{1}. Summing up,

1NTi=1Nyi,1Ξ(Fλi+εi)\displaystyle\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi(F\lambda_{i}+\varepsilon_{i}) =1ωF2(FLD1F/T)(F~D1F/T)(FLD1F~/T)+op(1)\displaystyle=\frac{1}{\omega_{F}^{2}}(F^{\prime}L^{\prime}D^{-1}F/T)(\widetilde{F}^{\prime}D^{-1}F/T)-(F^{\prime}L^{\prime}D^{-1}\widetilde{F}/T)+o_{p}(1)

Using the definition of MD1/2FM_{D^{-1/2}F} and rewriting give the Lemma. \Box

Corollary 6.
α~1NTi=1Nyi,1(GG+D)1(Fλi+εi)=α~1NTi=1N(Lεi)D1εi+α~1NTi=1Nλi(LF)[D1/2MD1/2FD1/2]εiα~1T(LF)[D1/2MD1/2FD1/2]F~+α~1NTi=1NδL(FF+D)1(Fλi+εi)+op(1)\begin{split}\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}&(GG^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})\\ &=\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}\\ &+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}\\ &-\widetilde{\alpha}\,\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\widetilde{F}\\ &+\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\delta^{\prime}L^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})+o_{p}(1)\end{split} (35)

Proof: adding and subtracting terms

α~1NT\displaystyle\widetilde{\alpha}\frac{1}{\sqrt{NT}} i=1Nyi,1(GG+D)1(Fλi+εi)\displaystyle\sum_{i=1}^{N}y_{i,-1}^{\prime}(GG^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
=α~1NTi=1Nyi,1(FF+D)1(Fλi+εi)+α~1NTi=1Nyi,1Ξ(Fλi+εi)\displaystyle=\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})+\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}y_{i,-1}^{\prime}\Xi(F\lambda_{i}+\varepsilon_{i})

where, by definition, Ξ=(GG+D)1(FF+D)1\Xi=(GG^{\prime}+D)^{-1}-(FF^{\prime}+D)^{-1}. The corollary follows from Lemmas 7 and 8, that is, by summing (33) and (34), where every term is multiplied by α~\widetilde{\alpha}. \Box

Lemma 9.
1NTi=1Nyi,1(GG+D)1yi,1\displaystyle\frac{1}{NT}\sum_{i=1}^{N}y_{i,-1}^{\prime}(GG^{\prime}+D)^{-1}y_{i,-1} =1T(Lδ)(FF+D)1Lδ\displaystyle=\frac{1}{T}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}L\delta
+1Ttr(LD1LD)\displaystyle+\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD) (36)
\displaystyle\quad+\frac{1}{T}\mathrm{tr}[(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}](LF)]+o_{p}(1)

Proof: Here it is sufficient to approximate (GG+D)1(GG^{\prime}+D)^{-1} by (FF+D)1(FF^{\prime}+D)^{-1} (recall (GG+D)1=(FF+D)1+Ξ(GG^{\prime}+D)^{-1}=(FF^{\prime}+D)^{-1}+\Xi, terms involving Ξ\Xi are negligible). Using yi,1=Lδ+LFλi+Lεiy_{i,-1}=L\delta+LF\lambda_{i}+L\varepsilon_{i}, we rewrite

1NT\displaystyle\frac{1}{NT} i=1Nyi,1(FF+D)1yi,1=1T(Lδ)(FF+D)1Lδ\displaystyle\sum_{i=1}^{N}y_{i,-1}^{\prime}(FF^{\prime}+D)^{-1}y_{i,-1}=\frac{1}{T}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}L\delta
+21NTi=1N(Lδ)(FF+D)1(Fλi+εi)\displaystyle+2\frac{1}{NT}\sum_{i=1}^{N}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
+1NTi=1N(LFλi+Lεi)(FF+D)1(LFλi+Lεi)\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}(LF\lambda_{i}+L\varepsilon_{i})^{\prime}(FF^{\prime}+D)^{-1}(LF\lambda_{i}+L\varepsilon_{i})

The second term is negligible (its mean is zero and its variance converges to zero). Consider the third term. Using the Woodbury formula, we rewrite it as

1NTi=1N(LFλi+Lεi)D1(LFλi+Lεi)\displaystyle\frac{1}{NT}\sum_{i=1}^{N}(LF\lambda_{i}+L\varepsilon_{i})^{\prime}D^{-1}(LF\lambda_{i}+L\varepsilon_{i})
1(1+FD1F)1NTi=1N(LFλi+Lεi)D1FFD1(LFλi+Lεi)\displaystyle-\frac{1}{(1+F^{\prime}D^{-1}F)}\frac{1}{NT}\sum_{i=1}^{N}(LF\lambda_{i}+L\varepsilon_{i})^{\prime}D^{-1}FF^{\prime}D^{-1}(LF\lambda_{i}+L\varepsilon_{i})
:=ab\displaystyle:=a-b

where aa and bb represent the first and the second expressions, respectively. Notice

a=1T(LF)D1LF+1Ttr(LD1LD)+op(1)a=\frac{1}{T}(LF)^{\prime}D^{-1}LF+\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)+o_{p}(1)

where we have used 𝔼(εiLD1Lεi)=tr(LD1LD)\mathbb{E}(\varepsilon_{i}^{\prime}L^{\prime}D^{-1}L\varepsilon_{i})=\mathrm{tr}(L^{\prime}D^{-1}LD) and 1Ni=1Nλi2=1+op(1)\frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2}=1+o_{p}(1). The cross product terms are negligible. Next,

\displaystyle b =\frac{T}{1+T\omega_{F}^{2}}\Big{[}\Big{(}\frac{F^{\prime}L^{\prime}D^{-1}F}{T}\Big{)}^{2}+\frac{1}{T}\Big{(}\frac{F^{\prime}D^{-1}LDL^{\prime}D^{-1}F}{T}\Big{)}\Big{]}+o_{p}(1)
\displaystyle\quad=\frac{1}{\omega_{F}^{2}}\Big{(}\frac{F^{\prime}L^{\prime}D^{-1}F}{T}\Big{)}^{2}+o_{p}(1)

Here we used \mathbb{E}(\varepsilon_{i}^{\prime}L^{\prime}D^{-1}FF^{\prime}D^{-1}L\varepsilon_{i})=F^{\prime}D^{-1}LDL^{\prime}D^{-1}F and \frac{1}{N}\sum_{i=1}^{N}\lambda_{i}^{2}=1+o_{p}(1); the cross-product terms are negligible. It follows that

\displaystyle a-b =\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)+\frac{1}{T}(LF)^{\prime}D^{-1}LF-\frac{1}{\omega_{F}^{2}}\Big{(}\frac{F^{\prime}L^{\prime}D^{-1}F}{T}\Big{)}^{2}+o_{p}(1)
=1Ttr(LD1LD)+1T(LF)(D1/2MD1/2FD1/2)LF+op(1)\displaystyle=\frac{1}{T}\mathrm{tr}(L^{\prime}D^{-1}LD)+\frac{1}{T}(LF)^{\prime}(D^{-1/2}M_{D^{-1/2}F}D^{-1/2})LF+o_{p}(1)

Combining results we obtain Lemma 9. \Box
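As an informal Monte Carlo check (not part of the proof) of the two expectation formulas used above, both of which are instances of \mathbb{E}(\varepsilon_{i}^{\prime}A\varepsilon_{i})=\mathrm{tr}(AD) for \varepsilon_{i}\sim N(0,D), the following Python sketch treats L as an arbitrary T\times T matrix purely for illustration (the actual L in the proof is the lag-type matrix defined earlier).

\begin{verbatim}
# Monte Carlo check of E(eps' A eps) = tr(AD) for eps ~ N(0, D), with
# A = L'D^{-1}L and A = L'D^{-1}FF'D^{-1}L as in the proof of Lemma 9.
import numpy as np

rng = np.random.default_rng(5)
T = 6
d = rng.uniform(0.5, 2.0, size=T)
D, Dinv = np.diag(d), np.diag(1.0 / d)
L = rng.standard_normal((T, T))                 # placeholder for the lag-type matrix
F = rng.standard_normal(T)

eps = rng.standard_normal((200000, T)) * np.sqrt(d)
for A in (L.T @ Dinv @ L, L.T @ Dinv @ np.outer(F, F) @ Dinv @ L):
    mc = np.einsum('nt,ts,ns->n', eps, A, eps).mean()
    print(mc, np.trace(A @ D))                  # Monte Carlo mean vs trace formula
\end{verbatim}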

The next lemma is concerned with the last three terms of equation (20).

Lemma 10.
(i) \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})+o_{p}(1)
(ii) \frac{1}{NT}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}y_{i,-1}=\frac{1}{T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}L\delta+o_{p}(1)
(iii) -\frac{1}{2}\frac{1}{T}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}\widetilde{\delta}=-\frac{1}{2T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\widetilde{\delta}+o_{p}(1)

Proof: Consider (i). It is not difficult to show that replacing GG+DGG^{\prime}+D by FF+DFF^{\prime}+D generates a negligible term. Consider (ii). From yi,1=Lδ+LFλi+Lεiy_{i,-1}=L\delta+LF\lambda_{i}+L\varepsilon_{i}, we can write the left hand side of (ii) as

1NTi=1Nδ~(GG+D)1Lδ+1NTi=1Nδ~(GG+D)1LFλi+1NTi=1Nδ~(GG+D)1Lεi\frac{1}{NT}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}L\delta+\frac{1}{NT}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}LF\lambda_{i}+\frac{1}{NT}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}L\varepsilon_{i}

The first term does not depend on i and is equal to T^{-1}\widetilde{\delta}^{\prime}(GG^{\prime}+D)^{-1}L\delta. Replacing G by F generates only a negligible term, so the first term is T^{-1}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}L\delta+o_{p}(1). Using the fact that the \lambda_{i} are iid zero-mean random variables, we can show that the second term is O_{p}(N^{-1/2}). Similarly, the last term is O_{p}((NT)^{-1/2}). Thus the last two terms are negligible. For (iii), replacing G by F again generates only a negligible term. \Box

Proof of Theorem 1. The local likelihood ratio is given by (20). Using Lemma 1, Corollary 5, Corollary 6, Lemma 9 (multiplied by 12α~2-\frac{1}{2}\widetilde{\alpha}^{2}), and Lemma 10, we obtain

(θ0+1NTθ~)(θ0)=A(θ~)+B(θ~)+op(1)\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta})-\ell(\theta^{0})=A(\widetilde{\theta})+B(\widetilde{\theta})+o_{p}(1) (37)

where

A(θ~)\displaystyle A(\widetilde{\theta}) =1NTi=1NλiF~D1/2MD1/2FD1/2εi\displaystyle=\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}
+α~1NTi=1N(Lεi)D1εi\displaystyle+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i}
+α~1NTi=1Nλi(LF)[D1/2MD1/2FD1/2]εi\displaystyle+\widetilde{\alpha}\,\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\lambda_{i}^{\prime}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\varepsilon_{i}
121Ttr[F~D1/2MD1/2FD1/2F~]\displaystyle-\frac{1}{2}\frac{1}{T}\mathrm{tr}[\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\widetilde{F}] (38)
12α~2tr[1T(LD1LD)]\displaystyle-\frac{1}{2}\widetilde{\alpha}^{2}\mathrm{tr}\Big{[}\frac{1}{T}(L^{\prime}D^{-1}LD)\Big{]}
12α~2tr[1T(LF)[D1/2MD1/2FD1/2](LF)]\displaystyle-\frac{1}{2}\widetilde{\alpha}^{2}\mathrm{tr}\Big{[}\frac{1}{T}(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}](LF)\Big{]}
α~1Ttr[(LF)[D1/2MD1/2FD1/2]F~]\displaystyle-\widetilde{\alpha}\,\frac{1}{T}\mathrm{tr}[(LF)^{\prime}[D^{-1/2}M_{D^{-1/2}F}D^{-1/2}]\widetilde{F}]
B(θ~)\displaystyle B(\widetilde{\theta}) =α~1NTi=1N(Lδ)(FF+D)1(Fλi+εi)\displaystyle=\widetilde{\alpha}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})
12α~21T[(Lδ)(FF+D)1(Lδ)]\displaystyle-\frac{1}{2}\widetilde{\alpha}^{2}\;\frac{1}{T}\Big{[}(L\delta)^{\prime}(FF^{\prime}+D)^{-1}(L\delta)\Big{]}
+1NTi=1Nδ~(FF+D)1(Fλi+εi)\displaystyle+\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i}) (39)
α~1Tδ~(FF+D)1Lδ\displaystyle-\widetilde{\alpha}\frac{1}{T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}L\delta
12Tδ~(FF+D)1δ~\displaystyle-\frac{1}{2T}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\widetilde{\delta}

Note that A(θ~)A(\widetilde{\theta}) does not involve δ~\widetilde{\delta} or δ\delta. Terms related to δ~\widetilde{\delta} and δ\delta are given in B(θ~)B(\widetilde{\theta}).

Inspecting A(\widetilde{\theta}), the variances of the first three terms are given by the fourth through sixth terms (times -1/2), respectively. The last term is twice the covariance between the first and third terms (times -1/2); thus the last term is simply the negative of that covariance. The second term is uncorrelated with the first and third terms (L\varepsilon_{i} involves only lags of \varepsilon_{it}, so \varepsilon_{it-1}\varepsilon_{it} has mean zero and is uncorrelated with \varepsilon_{ik} for all t and k).

Inspecting B(\widetilde{\theta}), the variance of the first term is given by the second term (times -1/2), and the variance of the third term by the last term (also times -1/2). The fourth term is the negative of the covariance between the first and third terms.

Furthermore, the random variables in A(\widetilde{\theta}) are uncorrelated with those in B(\widetilde{\theta}): variables of the form \lambda_{i}^{\prime}C\varepsilon_{i} are uncorrelated with \lambda_{i} and with \varepsilon_{i}, and (L\varepsilon_{i})^{\prime}D^{-1}\varepsilon_{i} is uncorrelated with \lambda_{i} and with \varepsilon_{i} (L\varepsilon_{i} depends only on lags of \varepsilon_{i} and D is diagonal). Hence all variances and covariances are accounted for in A(\widetilde{\theta}) and B(\widetilde{\theta}).

Finally, ΔNT(θ~)\Delta_{NT}(\widetilde{\theta}) of Theorem 1 is composed of the first three terms of A(θ~)A(\widetilde{\theta}), plus the first and the third terms of B(θ~)B(\widetilde{\theta}) (these are the random terms). All the remaining terms constitute (1/2)𝔼ΔNT(θ~)2-(1/2)\mathbb{E}\Delta_{NT}(\widetilde{\theta})^{2}. This completes the proof of Theorem 1. \Box
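In other words, restating what the proof has established, the local log-likelihood ratio admits the representation

\ell(\theta^{0}+\frac{1}{\sqrt{NT}}\widetilde{\theta})-\ell(\theta^{0})=\Delta_{NT}(\widetilde{\theta})-\frac{1}{2}\mathbb{E}\Delta_{NT}(\widetilde{\theta})^{2}+o_{p}(1),

with \Delta_{NT}(\widetilde{\theta}) collecting the random terms listed above.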

Proof of Theorem 2. With respect to the local parameters F~\widetilde{F}, the proof of the previous theorem only uses T1/2F~=O(1)\|T^{-1/2}\widetilde{F}\|=O(1). If f~r2\widetilde{f}\in\ell_{r}^{2}, then F~=O(1)\|\widetilde{F}\|=O(1). The entire earlier proof holds with T1/2F~T^{-1/2}\widetilde{F} replaced by F~\widetilde{F}. In particular, equation (38) holds with T1/2F~T^{-1/2}\widetilde{F} replaced by F~\widetilde{F} (that is, omitting T1/2T^{-1/2}) due to F~=O(1)\|\widetilde{F}\|=O(1). Notice F~\widetilde{F} appears in three terms on the right hand side of (38), namely, the first, fourth, and the last term. We analyze each of them. The first term of (38) after replacing T1/2F~T^{-1/2}\widetilde{F} with F~\widetilde{F} is written as

1Ni=1NλiF~D1/2MD1/2FD1/2εi\displaystyle\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i} =1Ni=1NλiF~D1εi\displaystyle=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}
tr[F~D1F(FD1F)11Ni=1NFD1εiλi]\displaystyle-\mathrm{tr}[\widetilde{F}^{\prime}D^{-1}F(F^{\prime}D^{-1}F)^{-1}\frac{1}{\sqrt{N}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}^{\prime}]

But the second term on the right-hand side is o_{p}(1), because it can be written as (ignoring the trace)

1T(F~D1F)(FD1F/T)11NTi=1NFD1εiλi\frac{1}{\sqrt{T}}(\widetilde{F}^{\prime}D^{-1}F)(F^{\prime}D^{-1}F/T)^{-1}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}^{\prime}

Now \|(F^{\prime}D^{-1}F/T)^{-1}\|=O(1) and \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}F^{\prime}D^{-1}\varepsilon_{i}\lambda_{i}^{\prime}=\frac{1}{\sqrt{NT}}\sum_{t=1}^{T}\sum_{i=1}^{N}\frac{1}{\sigma_{t}^{2}}f_{t}\lambda_{i}^{\prime}\varepsilon_{it}=O_{p}(1), but

1T(F~D1F)=1Tt=1T1σt2f~tft1aM1Tt=1Tf~t=o(1)\|\frac{1}{\sqrt{T}}(\widetilde{F}^{\prime}D^{-1}F)\|=\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\frac{1}{\sigma_{t}^{2}}\widetilde{f}_{t}f_{t}^{\prime}\|\leq\frac{1}{a}M\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\|\widetilde{f}_{t}\|=o(1) (40)

We have used \sigma_{t}^{2}\geq a>0 and \|f_{t}\|\leq M. To see that \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\|\widetilde{f}_{t}\|=o(1) for \widetilde{f}\in\ell_{r}^{2}, notice that \widetilde{f}_{s}\rightarrow 0 as s\rightarrow\infty, so by the Toeplitz lemma, \frac{1}{\sqrt{T}}\sum_{t=1}^{\sqrt{T}}\|\widetilde{f}_{t}\|\rightarrow 0. By the Cauchy-Schwarz inequality, \frac{1}{\sqrt{T}}\sum_{t=\sqrt{T}+1}^{T}\|\widetilde{f}_{t}\|\leq(\sum_{t=\sqrt{T}+1}^{T}\|\widetilde{f}_{t}\|^{2})^{1/2}\leq(\sum_{t=\sqrt{T}}^{\infty}\|\widetilde{f}_{t}\|^{2})^{1/2}\rightarrow 0 (see also [11] for a similar argument). Thus,

1Ni=1NλiF~D1/2MD1/2FD1/2εi=1Ni=1NλiF~D1εi+op(1)\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\varepsilon_{i}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}^{\prime}\widetilde{F}^{\prime}D^{-1}\varepsilon_{i}+o_{p}(1)
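As an informal numerical illustration (not part of the proof) of the claim below (40), the following Python sketch uses the square-summable example \widetilde{f}_{t}=1/t (an illustrative assumption) and shows that T^{-1/2}\sum_{t\leq T}\|\widetilde{f}_{t}\| shrinks as T grows.

\begin{verbatim}
# Illustration: for the square-summable sequence f_tilde_t = 1/t,
# T^{-1/2} * sum_{t<=T} |f_tilde_t| tends to zero as T grows.
import numpy as np

for T in (10**2, 10**4, 10**6):
    t = np.arange(1, T + 1)
    print(T, np.sum(1.0 / t) / np.sqrt(T))      # decreases toward zero
\end{verbatim}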

The fourth term of (38), after replacing T^{-1/2}\widetilde{F} with \widetilde{F} (so that the factor 1/T disappears), becomes (ignoring the -1/2 and the trace)

F~D1/2MD1/2FD1/2F~\displaystyle\widetilde{F}^{\prime}D^{-1/2}M_{D^{-1/2}F}D^{-1/2}\widetilde{F} =F~D1F~F~D1F(FD1F)1FD1F~\displaystyle=\widetilde{F}^{\prime}D^{-1}\widetilde{F}-\widetilde{F}^{\prime}D^{-1}F(F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}\widetilde{F}
=F~D1F~+o(1)\displaystyle=\widetilde{F}^{\prime}D^{-1}\widetilde{F}+o(1)

owing to (FD1F/T)1=O(1)(F^{\prime}D^{-1}F/T)^{-1}=O(1) and F~D1F/T1/2=o(1)\widetilde{F}^{\prime}D^{-1}F/T^{1/2}=o(1) due to (40). The last term of (38) being o(1)o(1) with F~\widetilde{F} in place of 1TF~\frac{1}{\sqrt{T}}\widetilde{F} follows from the same argument.

The same analysis applies to the terms involving \widetilde{\delta} with \|\widetilde{\delta}\|=O(1). Analogously to the argument for (40), using the Toeplitz lemma we can show

T1/2(δ~D1F)=o(1),T1/2(δ~D1Lδ)=o(1).T^{-1/2}(\widetilde{\delta}^{\prime}D^{-1}F)=o(1),\quad T^{-1/2}(\widetilde{\delta}^{\prime}D^{-1}L\delta)=o(1). (41)

There are three terms in B(\widetilde{\theta}) involving \widetilde{\delta}; see (39). We analyze each in turn. Replacing \frac{1}{\sqrt{T}}\widetilde{\delta} by \widetilde{\delta}, we will show

1Ni=1Nδ~(FF+D)1(Fλi+εi)=1Ni=1Nδ~D1εi+op(1).\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}(F\lambda_{i}+\varepsilon_{i})=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}D^{-1}\varepsilon_{i}+o_{p}(1). (42)

Using (FF+D)1F=D1F(Ir+FD1F)1(FF^{\prime}+D)^{-1}F=D^{-1}F(I_{r}+F^{\prime}D^{-1}F)^{-1}, the left hand side above is

(δ~D1F)(Ir+FD1F)1(1Ni=1Nλi)+1Ni=1Nδ~(FF+D)1εi(\widetilde{\delta}^{\prime}D^{-1}F)(I_{r}+F^{\prime}D^{-1}F)^{-1}(\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i})+\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\varepsilon_{i}

The first term is o_{p}(T^{-1/2}) because of (41), \|(I_{r}+F^{\prime}D^{-1}F)^{-1}\|=O(1/T), and \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_{i}=O_{p}(1). For the second term, using the Woodbury formula and (41),

1Ni=1Nδ~(FF+D)1εi=1Ni=1Nδ~D1εi+op(1).\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\varepsilon_{i}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\widetilde{\delta}^{\prime}D^{-1}\varepsilon_{i}+o_{p}(1).

This proves (42). Next, for the second term in B(\widetilde{\theta}) that depends on \widetilde{\delta} (replacing T^{-1/2}\widetilde{\delta} by \widetilde{\delta}),

T1/2δ~(FF+D)1Lδ=o(1)T^{-1/2}\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}L\delta=o(1)

This follows from the Woodbury formula and (41). Finally, the last term becomes (after replacing T1/2δ~T^{-1/2}\widetilde{\delta} by δ~\widetilde{\delta})

\widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\widetilde{\delta}=\widetilde{\delta}^{\prime}D^{-1}\widetilde{\delta}-(\widetilde{\delta}^{\prime}D^{-1}F)(I_{r}+F^{\prime}D^{-1}F)^{-1}F^{\prime}D^{-1}\widetilde{\delta}

The second term is negligible because T^{-1/2}(\widetilde{\delta}^{\prime}D^{-1}F)=o(1) and (I_{r}+F^{\prime}D^{-1}F)^{-1}=O(T^{-1}). Thus \widetilde{\delta}^{\prime}(FF^{\prime}+D)^{-1}\widetilde{\delta}=\widetilde{\delta}^{\prime}D^{-1}\widetilde{\delta}+o(1).

Collecting the simplified and the non-negligible terms, we obtain the expressions in Theorem 2. \Box

References

  • [1] Alvarez, J. and M. Arellano (2022). Robust likelihood estimation of dynamic panel data models. Journal of Econometrics, 226, 21-61.
  • [2] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, Hoboken, NJ.
  • [3] Anderson, T.W., and C. Hsiao (1982). Formulation and estimation of dynamic Models with Error Components, Journal of Econometrics, 76, 598-606.
  • [4] Arellano, M. and S. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations, The Review of Economic Studies 58 (2), 277-297.
  • [5] Bai, J. (2009). Panel data models with interactive fixed effects, Econometrica, 77 1229-1279.
  • [6] Bai, J. (2013). Fixed effects dynamic panel data models, a factor analytical method, Econometrica, 81, 285-314.
  • [7] Bai, J. (2024). Likelihood approach to dynamic panel models with interactive effects. Journal of Econometrics. Vol 240, Issue 1. https://doi.org/10.1016/j.jeconom.2023.105636
  • [8] Bai, J. and K.P. Li (2012). Statistical analysis of factor models of high dimension. Annals of Statistics, 40, 436-465.
  • [9] Bai, J. and K.P. Li (2014). Theory and methods for panel data models with interactive effects. The Annals of Statistics Vol. 42, No. 1, 142-170.
  • [10] Hahn, J., and G. Kuersteiner (2002). “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects when Both n and T Are Large,” Econometrica, 70, 1639-1657.
  • [11] Iwakura, H. and R. Okui (2014). Asymptotic efficiency in factor models and dynamic panel data models, Institute of Economic Research, Kyoto University, Available at SSRN 2395722
  • [12] Jankova J. and S. van de Geer (2018). Semiparametric efficiency bounds for high-dimensional models. The Annals of Statistics 46 (5), 2336-2359.
  • [13] Kiviet, J. (1995). “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models”, Journal of Econometrics, 68, 53-78.
  • [14] Lancaster, T. (2000). “The incidental parameter problem since 1948,” Journal of Econometrics, 95 391-413.
  • [15] Lawley, D.N. and A.E. Maxwell (1971). Factor Analysis as a Statistical Method, London: Butterworth.
  • [16] Moon H.R. and M. Weidner (2017). “Dynamic Linear Panel Regression Models with Interactive Fixed Effects”, Econometric Theory 33, 158-195.
  • [17] Moreira, M.J. (2009), A Maximum Likelihood Method for the Incidental Parameter Problem, Annals of Statistics, 37, 3660-3696.
  • [18] Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects, Econometrica, 49, 1417-1426.
  • [19] van der Vaart, A.W. (1998). Asymptotic Statistics, Cambridge University Press.
  • [20] van der Vaart, A.W. and J.A. Wellner (1996). Weak Convergence and Empirical Processes with Applications to Statistics, Springer-Verlag.