Efficiency of QMLE for dynamic panel data models with interactive effects
Abstract
This paper studies the problem of efficient estimation of panel data models in the presence of an increasing number of incidental parameters. We formulate the dynamic panel as a simultaneous equations system, and derive the efficiency bound under the normality assumption. We then show that the Gaussian quasi-maximum likelihood estimator (QMLE) applied to the system achieves the normality efficiency bound without the normality assumption. Comparison of QMLE with the fixed effects estimators is made.
1 Introduction
Consider the dynamic panel data model with interactive effects
(1)  $y_{it} = \rho\, y_{i,t-1} + \lambda_i' f_t + \delta_t + \varepsilon_{it}, \qquad i = 1, \dots, N; \; t = 1, \dots, T,$
where $y_{it}$ is the outcome variable, $\lambda_i$ and $f_t$ are each $r \times 1$ and both are unobservable, $\delta_t$ is the time effect, and $\varepsilon_{it}$ is the error term. Only the $y_{it}$ are observable. The above model is increasingly used for empirical studies in the social sciences. The purpose of this paper is the efficient estimation of the model. We argue that quasi-maximum likelihood estimation is a preferred method. But first, we explain the meaning of QMLE for this model and its motivations.
The index $i$ is referred to as individuals (e.g., households) and $t$ as time. If $f_t = 1$ for all $t$, and $\lambda_i$ is scalar, we obtain the usual additive individual and time fixed effects model. Dynamic panel models with additive fixed effects remain the workhorse for empirical research. The product $\lambda_i' f_t$ is known as the interactive effects [5], and is more general than additive fixed effects models. The models allow the individual heterogeneities (such as unobserved innate ability, captured by $\lambda_i$) to have time-varying impact (through $f_t$) on the outcome variable $y_{it}$. From a different perspective, the models allow common shocks (modeled by $f_t$) to have heterogeneous impact (through $\lambda_i$) on the outcome. For many panel data sets, $N$ is usually much larger than $T$ because it is costly to keep track of the same individuals over time. Under fixed $T$, typical estimation methods such as least squares do not yield consistent estimates of the model parameters. Consider the special case
(2)  $y_{it} = \rho\, y_{i,t-1} + \mu_i + \varepsilon_{it},$
where $\mu_i$ are fixed effects. This corresponds to $\lambda_i = \mu_i$ and $f_t = 1$ for all $t$ (with the time effect dropped). Even if the $\varepsilon_{it}$ are iid, zero mean and finite variance, and independent of $\mu_i$, the least squares estimator is easily shown to be inconsistent. When the $\mu_i$ are treated as parameters to be estimated along with $\rho$, the least squares method is still biased, and the order of the bias is $O(1/T)$, no matter how large $N$ is; see [18]. So unless $T$ goes to infinity, the least squares method remains inconsistent, an issue known as the incidental parameters problem; see, for example, [13] and [14].
However, provided $T \ge 3$, consistent estimation of $\rho$ is possible with the instrumental variables (IV) method. Anderson and Hsiao [3] suggested the IV estimator obtained by solving $\sum_{i=1}^{N} y_{i,t-2}\,(\Delta y_{it} - \rho\, \Delta y_{i,t-1}) = 0$, where $\Delta y_{it} = y_{it} - y_{i,t-1}$. Differencing the data purges $\mu_i$, but introduces correlation between the regressor and the resulting errors, which is why the IV method is used. With $T$ strictly greater than 3, a more efficient IV estimator is suggested by Arellano and Bond [4]. Model (2) can also be estimated by the Gaussian quasi-maximum likelihood method; for example, [1], [6], and [17].
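The incidental parameters bias of least squares and the IV remedy can be illustrated with a short simulation. This is a sketch under assumed parameter values (rho = 0.5, N = 5000, T = 5, all chosen for illustration only, not taken from the paper): the within (fixed effects) estimator is biased downward at small T, while the Anderson-Hsiao IV estimator, which instruments the differenced lag with a twice-lagged level, is consistent.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, rho = 5000, 5, 0.5

# simulate y_it = rho * y_{i,t-1} + mu_i + eps_it (illustrative values)
mu = rng.normal(size=N)
eps = rng.normal(size=(N, T + 1))
y = np.zeros((N, T + 1))
y[:, 0] = mu / (1 - rho) + eps[:, 0]          # start near the stationary mean
for t in range(1, T + 1):
    y[:, t] = rho * y[:, t - 1] + mu + eps[:, t]

# within (fixed effects) estimator: demean each individual's series over time
Y, X = y[:, 1:], y[:, :-1]
Yd = Y - Y.mean(axis=1, keepdims=True)
Xd = X - X.mean(axis=1, keepdims=True)
rho_within = (Xd * Yd).sum() / (Xd ** 2).sum()   # biased downward at small T

# Anderson-Hsiao IV: first-difference out mu_i, instrument dy_{t-1} by y_{t-2}
dY = Y[:, 1:] - Y[:, :-1]        # delta y_t for t = 2, ..., T
dX = X[:, 1:] - X[:, :-1]        # delta y_{t-1}
Z = y[:, :T - 1]                 # y_{t-2}, uncorrelated with delta eps_t
rho_iv = (Z * dY).sum() / (Z * dX).sum()         # consistent for rho
```

With draws like these, one should find rho_within well below the true value 0.5, while rho_iv is close to it.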
For model (1), differencing cannot remove the interactive effects, since $\lambda_i' f_t - \lambda_i' f_{t-1} = \lambda_i'(f_t - f_{t-1}) \ne 0$ in general. The model can be estimated by the fixed effects approach, treating both $\lambda_i$ and $f_t$ as parameters. Just like least squares for the earlier additive effects model, the fixed effects method will produce bias. Below we introduce the quasi-likelihood approach, similar to [8, 9] for non-dynamic models.
Project the first observation on and write , where is the vector of projection coefficients, and is the projection error. The asterisk variables are different from the true values that generate . This projection is called for because is not observable. (Footnote 1: The first observation starts at $t = 1$; $y_{i0}$ is not available. If $y_{i0}$ were observable we would have . But then a projection of on would be required.) Note that we can drop the superscript to simplify the notation. This is because we will treat and as (nuisance and free) parameters, and we do not require to have the same distribution over time. This means we can rewrite as .
The following notation will be used
(3) |
together with the following matrices,
(4) |
Note that . With these notations, we can write the model as
This gives a simultaneous equations system with $T$ equations. We assume are iid, independent of . Without loss of generality, we assume zero mean; otherwise, absorb the mean into . Further assume (a normalization restriction, where is an identity matrix). We also assume are iid with zero mean, and
These assumptions imply that are iid with mean and covariance matrix . Consider the Gaussian quasi likelihood function
where the Jacobian does not enter since the determinant of is 1, where . The quasi-maximum likelihood estimator (QMLE) is defined as
The asymptotic distribution of this estimator is studied by [7].
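To make the construction concrete, the following is a stylized sketch of the system quasi-log-likelihood: the matrix B is unit lower triangular (so the Jacobian drops out), and B y_i has mean delta and factor-structure covariance F F' + diag(d). This ignores the initial-condition projection equation and any normalization of the factors; all names and parameter values are our illustrative choices, not the paper's exact objective.

```python
import numpy as np

def neg_quasi_loglik(rho, delta, F, d, Y):
    """Average negative Gaussian quasi-log-likelihood of the T-equation system.

    Rows of Y are individual time series y_i (length T); the sketch assumes
    B(rho) y_i has mean delta and covariance Sigma = F F' + diag(d).
    The Jacobian term drops out because det(B) = 1 (unit lower triangular).
    """
    N, T = Y.shape
    B = np.eye(T) - rho * np.eye(T, k=-1)      # 1's on the diagonal, -rho below
    Sigma = F @ F.T + np.diag(d)
    W = Y @ B.T - delta                        # row i is (B y_i - delta)'
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ij,jk,ik->', W, np.linalg.inv(Sigma), W) / N
    return 0.5 * (logdet + quad)

# simulate from the system (illustrative values; r = 1 factor)
rng = np.random.default_rng(1)
N, T, r, rho = 1000, 6, 1, 0.5
F = rng.normal(size=(T, r))
delta = rng.normal(size=T)
d = np.ones(T)
lam = rng.normal(size=(N, r))
BY = delta + lam @ F.T + rng.normal(size=(N, T))   # rows are (B y_i)'
B = np.eye(T) - rho * np.eye(T, k=-1)
Y = BY @ np.linalg.inv(B).T                        # recover the y_i

nll_true = neg_quasi_loglik(rho, delta, F, d, Y)
nll_off = neg_quasi_loglik(rho + 0.3, delta, F, d, Y)
```

Maximizing this objective over (rho, delta, F, d) would be the QMLE of the sketch; here one can check only that the objective is smaller at the data-generating rho than at a distant value.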
An alternative estimator, the fixed effects estimator, treats both and as parameters, in addition to and . The corresponding likelihood function under normality of is given in (7) below. The fixed effects framework estimates more nuisance parameters (substantially more under large $N$), which is the source of the incidental parameters problem. Our analysis focuses on QMLE. Comparison of the two approaches will be made based on local likelihood methods.
The objectives of the present paper are threefold. First, what is the efficiency bound for the system maximum likelihood estimator under the normality assumption? Second, does the QMLE attain the normality efficiency bound without the normality assumption? Third, how does QMLE fare in comparison with the fixed effects estimator?
We approach these questions with a Le Cam type of analysis. The difficulty lies in the increasing dimension of the parameter space as $T$ goes to infinity, because the number of parameters is of order $T$. No sparsity in the parameters is assumed. With sparsity, [12] derived efficiency bounds and constructed efficient estimators via regularization for various models. The ability to deal with non-sparsity in the current model relies on the panel data structure.
On notation: denotes the Frobenius norm for matrix (or vector) , that is, , and denotes the spectral norm of , that is, the square root of the largest eigenvalue of . Notice . The transpose of is denoted by ; and denote, respectively, its determinant and trace for a square matrix .
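As a small numerical aside on the two norms (the example matrix is ours, for illustration): the spectral norm never exceeds the Frobenius norm, and the two differ already for the identity matrix.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
fro = np.linalg.norm(A, 'fro')    # sqrt(trace(A'A)) = sqrt(2)
spec = np.linalg.norm(A, 2)       # sqrt of largest eigenvalue of A'A = 1
```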
2 Assumptions for QMLE
We assume for asymptotic analysis. The following assumptions are made for the model.
Assumption A
(i) are iid over ; , , and are independent over ; for all and .
(ii) The are iid, independent of , with , , and .
(iii) There exist constants and such that for all ; .
Two comments are in order for this model. First, Assumption A(ii) assumes are random variables, but they can be fixed bounded constants. All that is needed is that (a positive definite matrix), where is the sample average of . One can normalize the matrix to be an identity matrix. Second, is determined up to an orthogonal rotation, since for . The rotational indeterminacy can be removed by the normalization that is a diagonal matrix (with distinct elements); see [2] (p.573) and [15] (p.8). Rotational indeterminacy does not affect the estimates of , , and .
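The rotational indeterminacy can be seen numerically: rotating the loadings and factors by any orthogonal matrix leaves the interactive-effects term, and hence the fit, unchanged. The dimensions and names below are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, r = 50, 10, 2
Lam = rng.normal(size=(N, r))                   # loadings
Fmat = rng.normal(size=(T, r))                  # factors
Q, _ = np.linalg.qr(rng.normal(size=(r, r)))    # a random orthogonal matrix

# rotated loadings/factors produce the identical N x T interactive-effects term
effects = Lam @ Fmat.T
effects_rotated = (Lam @ Q) @ (Fmat @ Q).T      # (Lam Q)(Fmat Q)' = Lam Fmat'
```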
Under Assumption A, [7] showed that the QMLE for has the following asymptotic representation
(5) |
where , , and are defined earlier. Note that , and with the expression
Assume the above converges to , as , then we have, as ,
For the special case of homoskedasticity ( for all ), , and hence . QMLE requires no bias correction, unlike the fixed effects regression; the latter is considered by [5] and [16]. Our objective is to show that is the efficiency bound under the normality assumption, and that QMLE attains the normality efficiency bound. This result is obtained in the presence of an increasing number of incidental parameters. The estimator is also consistent under fixed $T$, in contrast to the fixed effects estimator. The estimated are all consistent and asymptotically normal. In particular, the estimated factors have the asymptotic representation, for each ,
(6) |
This implies . Details are given in Bai [7]. Also see [8] (under the IC3 normalization) for factor models.
3 Local likelihood ratios and efficiency bound
3.1 Related literature
A closely related work is that of Iwakura and Okui [11]. They consider the fixed effects framework instead of the QMLE. The fixed effects estimation procedure treats both and as parameters, along with and . The corresponding likelihood function under normality of is (Footnote 2: The fixed effects likelihood does not have a global maximum under heteroskedasticity, see, for example, [2] (p.587), but local maximization is still meaningful. Another solution is to impose homoskedasticity.)
(7) |
The fixed effects estimator for will generate bias, similar to the fixed effects estimator for dynamic panels with additive effects. The bias is studied by [16]. (Footnote 3: In contrast, the QMLE does not generate bias under fixed $T$.) Iwakura and Okui [11] derive the efficiency bound for the fixed effects estimators under homoskedasticity ( for all ). Another closely related work is that of Hahn and Kuersteiner [10]. They consider the efficiency bound problem under the fixed effects framework for the additive effects model described by (2). Throughout this paper, the fixed effects framework refers to methods that also estimate the factor loadings in addition to .
In contrast, we consider the likelihood function for the system of equations
(8) |
QMLE does not estimate the loadings (even if they are fixed constants, as explained earlier), thus eliminating the incidental parameters in the cross-section dimension. The incidental parameters are now , , and , and the number of parameters increases with $T$. Despite the smaller number of incidental parameters, the analysis of the local likelihood is more demanding than that of the fixed effects likelihood (7). Intuitively, the fixed effects likelihood (7) is quadratic in , but the QMLE likelihood in (8) depends on through the inverse matrix and through the log-determinant of this matrix. The high degree of nonlinearity makes the perturbation analysis more challenging. As demonstrated later, the local analysis brings insights into the relative merits of the QMLE and the fixed effects estimators.
3.2 The local parameter space
Local likelihood ratio processes are indexed by local parameters. Since the convergence rate for the estimated parameter of is , that is, , it is natural to consider the local parameters of the form
where . However, the consideration of local parameters for is non-trivial, as explained by Iwakura and Okui [11] for the fixed effects likelihood ratio. We consider the following local parameters
(9) |
where for all , with arbitrarily given; denotes the r-dimensional Euclidean norm. Given that the estimated factors are consistent, that is, , one would expect local parameters of the form ; the extra scale factor in the above local rate looks rather unusual. However, (9) is the suitable local rate for the local likelihood ratio to be , as is shown in both the statement and the proof of Theorem 1 below. Without the scale factor , the local likelihood ratio will diverge to infinity (in absolute value) if no restrictions are imposed on other than its boundedness. This type of local parameters was used in earlier work by [10] for the additive fixed effects estimator. Later we shall consider a different type of local parameters without the extra scale factor , but other restrictions on will be needed.
Additionally, even if one regards the perturbation to be small (relative to ), this is all to the good, provided that the associated efficiency bound is achievable by an estimator. This is because the smaller the perturbation, the lower the efficiency bound, and hence the harder it is to attain by any estimator.
Consider the space
the space of bounded sequences, each coordinate is -valued. Let
and define , the projection of onto the first coordinates. The matrix is , but we suppress its dependence on for notational simplicity. Since , it follows that
(10) |
Similarly, for the time effects, we consider the local parameters
with for all . Let . For each , we use to denote the projection of the sequence onto the first coordinates , a vector.
To simplify the analysis, we assume is known. It can be shown that this simplifying assumption does not affect the efficiency bound for , but it reduces the complexity of the derivation. Letting and , we study the asymptotic behavior of
under the normality of and . The normality assumption allows us to derive the parametric efficiency bound in the presence of an increasing number of nuisance parameters. We then show that the QMLE without normality attains the efficiency bound. In the rest of the paper, we use and interchangeably (they represent the true parameters); represent local parameters; and , , and are the QMLE estimates.
Assumption B
(i): are iid over , and independent over such that .
(ii): are iid , independent of for all and .
(iii): with for all .
(iv): for all , and , where .
(v): As ,
(a) ,
(b)
(c)
where denote the projection matrix orthogonal to . Specifically,
Under Assumptions B(i) and (ii), are iid , implying a parametric model with an increasing number of incidental parameters. Normality of and is a standard assumption in factor analysis; see, e.g., Anderson [2] (p.576). Here we switch the roles of and . Note that in classical factor analysis, the time dimension ( in our notation) is fixed, and there is no incidental parameters problem since the number of parameters is fixed. But we consider $T$ going to infinity. The following theorem gives the asymptotic representation for the local likelihood ratios.
Theorem 1.
Under Assumption B, for , , , , and , we have as ,
where
(11) | ||||
where is uniform over such that , , and , for any given .
Note that the expected value of is zero, so is the variance.
All terms in are stochastically bounded; they have expressions of the form , where have zero mean and finite variance (and in fact finite moments of any order under Assumption B). By assuming that and are such that
as well as the existence of limits for several cross products , , and so forth, the variance has a limit. Let for some depending on . We can further show
Thus the local likelihood ratio can be rewritten as
We next consider the asymptotic efficiency bound for regular estimators. Regular estimators rule out Hodges type “superefficient” and James-Stein type estimators. A regular estimator sequence converges locally uniformly (under the local laws) to a limiting distribution that is free from the local parameters (van der Vaart [19], p.115, p.365).
3.3 Efficient scores and efficiency bound
In the likelihood ratio expansion, the term contains the scores of the likelihood function. The coefficient of gives the score for , the coefficient of gives the score of , and the same holds for and . The efficient score for is the projection residual of its own score onto the scores of and of . Moreover, the inverse of the variance of the efficient score gives the efficiency bound. To derive the efficient score for , rewrite of Theorem 1 as
where and denote the first two terms of (see (11)), and denotes the last three terms of (11), but taking out . So the score for is the sum . Next, rewrite
where and is the -th element of the vector . Thus is the score of (). Similarly, rewrite
where (a scalar), where is the th element of the vector , and is the score of . To obtain the efficient score for , we project onto the scores of and , that is onto and to get the projection residuals. Let and and . The projection residual is given by
(12) |
Notice that is uncorrelated with the scores of and of , i.e., . This follows because contains the lags of , so it is composed of terms (with ), and for any . Next, is simply a linear combination of , since can be written as , where is the -th row of the matrix . Because is a subvector of , is also a linear combination of . Thus . Similarly, is a linear combination of , because , with being the th element of . Thus is a linear combination of . Hence, . In summary, we have
It follows that the projection residual in (12) is equal to . Hence the efficient score for is . Notice,
(13) |
its limit is by Assumption B(v), so gives the asymptotic efficiency bound.
We summarize the result in the following corollary.
Corollary 1.
Under Assumption B, the asymptotic efficiency bound for regular estimators of is , with being the limit of (13).
Since Assumption B is stronger than Assumption A, the asymptotic representation in (5) holds under Assumption B. That is, under the normality assumption, the system maximum likelihood estimator satisfies
We see that is expressed in terms of the efficient influence functions, thus is regular and asymptotically efficient (van der Vaart [19], p.121 and p.369). We state the result in the following corollary.
Corollary 2.
Under Assumption B, the system maximum likelihood estimator is a regular estimator and achieves the asymptotic efficiency bound (in spite of an increasing number of incidental parameters).
The preceding corollaries imply that, under normality, we are able to establish the asymptotic efficiency bound in the presence of an increasing number of nuisance parameters. Further, the system maximum likelihood estimator achieves the efficiency bound. These results are not obvious owing to the incidental parameters problem.
QMLE in Section 2 does not require normality, yet it achieves the normality efficiency bound; see equation (5). So QMLE is robust to departures from normality. If and are non-normal and their distributions are known, one should be able to construct a more efficient estimator than the QMLE. But in panel data analysis in practice, researchers usually do not impose distributional assumptions other than some moment conditions such as those in Assumption A. Thus QMLE presents a viable estimation procedure, knowing that it achieves the normality efficiency bound. Furthermore, QMLE does not need bias correction, unlike the fixed effects estimator.
The result of Corollary 1 is not directly obtained via a limit experiment and the convolution theorem (e.g., van der Vaart and Wellner, [20], chapter 3.11). Since is not a Hilbert space, the convolution theorem is not directly applicable. However, using the line of argument in [11] it is possible to construct a Hilbert subspace with an appropriate inner product in which the efficiency bound for the low dimensional parameter can be shown to be . That is, Corollary 1 can be obtained via the convolution theorem. But [11] also show that the incidental parameters under the corresponding local parameter space are not regular. Therefore, we shall not pursue this approach. Below we shall consider perturbations.
3.4 The local parameter space
To have the limit process of the local likelihood ratios reside in a Hilbert space and to directly apply the convolution theorem ([20], chapter 3.11), we consider the second type of local parameter space, which is also used by [11] for the fixed effects estimators:
(14) |
with being required to be in :
For this type of local parameters, we can remove the scale factor , (cf. (9)). Since , we have, for (projection of on the first coordinates),
This is in contrast with (10). A necessary condition for membership in is that as . In comparison, the perturbation considered earlier only requires the boundedness of , so the process can be rather “jagged.” In a certain sense, requiring to be in can be viewed as a “smoothness” restriction (for example, the Banach–Mazur theorem).
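The contrast between the two local parameter spaces can be illustrated numerically: for a square-summable ("smooth") perturbation, the squared norm of the first-T projection scaled by 1/T vanishes, while for a merely bounded ("jagged") perturbation it stays of order one. The specific sequences below are our illustrative choices.

```python
import numpy as np

T = 10_000
t = np.arange(1, T + 1)
eta_l2 = 1.0 / t           # square-summable: sum of squares converges
eta_bdd = np.ones(T)       # bounded but not square-summable

ratio_l2 = (eta_l2 ** 2).sum() / T     # tends to 0 as T grows
ratio_bdd = (eta_bdd ** 2).sum() / T   # equals 1 for every T
```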
Similar to , we assume the sequence is in , i.e. . We still use to denote the projection of the sequence onto the first coordinates, .
Theorem 2.
Under Assumption B, for , , , , and , we have as ,
where
(15) | ||||
(16) | ||||
where is uniform over , , and for any given .
In comparison with Theorem 1, Theorem 2 has simpler expressions, due to the smaller local parameter space. The first two terms in are simplified, with the corresponding simplification in the variance, and in addition, two covariance terms in are dropped. The proof is given in the appendix.
We next establish the local asymptotic normality (LAN) property for the local likelihood ratios. Here we introduce further notation to make the expressions more compact. Let
Both and are vectors with elements, and is a matrix of . With these notations, the sum of the first two terms of (15) equals .
For each , we introduce a new sequence
(17) |
so , and for , where , , and are defined in Assumption B(v), and is the variance of . Hence, is a scaled version of . By Assumption B(iii), , it follows that . For any , define the inner product, , then is a Hilbert space. Let . In particular, for in (17), we have
(18) |
Notice because the series are convergent and nonnegative; rearranging does not alter the limit. By Assumption B(v), we can write (16) as
(19) |
where is given in (17). Next, rewrite (15) as
where are defined earlier. The first term
because . Note that the LHS above is asymptotically independent of (their covariance being zero, as is shown in the appendix). From
where are given in Assumption B, we have
Moreover, it is not difficult to establish the finite dimensional weak convergence. Let , for . Let and , for any finite integer ,
Summarizing the above, we have
Corollary 3.
Using the convolution theorem for locally asymptotically normal (LAN) experiments, the implied efficiency bound for regular estimators of is . The implied efficiency bound for regular estimators of is for each . These bounds are, respectively, the inverse of the coefficient of , and the inverse of the matrix in the quadratic form in the expression for .
To see this, fix , with . For , consider the parameter sequence,
so . If we define , then
Since is a coordinate projection map (multiplied by a positive constant ), it is a continuous linear map, . Its adjoint map (both spaces are self-dual) is the inclusion map (i.e., embedding): , for all . The adjoint map satisfies . Let denote the limiting distribution of efficient estimators of . Theorem 3.11.2 in van der Vaart and Wellner ([20], p.414) shows that for all . But . It follows that . Thus the efficiency bound for regular estimators of is . For , the same argument shows that the efficiency bound for regular estimators of is . In summary, we have
Corollary 4.
Under the assumptions of Theorem 2, the asymptotic efficiency bound for regular estimators of is , and the efficiency bound for regular estimators of is .
It can be shown that the efficiency bound corresponds to the case in which the incidental parameters are known rather than estimated; thus the implied efficiency bound is too low to be attainable. The implication is that the perturbation is “too small.” Intuitively, the smaller the local parameter space, the lower the efficiency bound. Thus it is harder to achieve the implied bound, unless the estimation is done within the given local parameter space. But estimators, in general, are not constructed with reference to local parameter spaces.
When the same model is estimated by the fixed effects method (that is, the ’s are also treated as parameters), Iwakura and Okui [11] show that the perturbation is a suitable choice, and that the corresponding efficiency bound for is (the authors confined their analysis to the homoskedastic case, so ). For the QMLE, however, there is not sufficient variation in this perturbation, thus implying a smaller bound. That is to say, QMLE is a more efficient estimation procedure than the fixed effects approach. This finding is consistent with the result that, even under fixed $T$, QMLE provides a consistent estimator for , while the fixed effects estimator is inconsistent; see [17] and [6].
To recap, the local parameter space is “too small” for the QMLE, but it is the suitable local parameter space for the fixed effects approach. This implies, as explained earlier, that QMLE is a better procedure than the fixed effects method. By analyzing the local likelihood ratios, we are able to shed light on the relative merits of different estimators that are otherwise hard to discern from the usual asymptotics alone (e.g., limiting distributions).
4 Concluding remarks
We derive the efficiency bound for estimating dynamic panel models with interactive effects by treating the model as a simultaneous equations system. We show that the quasi-maximum likelihood method applied to the system attains the efficiency bound. These results are obtained under an increasing number of incidental parameters. Contrasts with the fixed effects estimators are made. In particular, the analysis shows that the system QMLE is preferable to the fixed effects estimators.
Proof of Results
We focus on the local parameter space for and . The analysis for sequences in the space is a special case. Let . We drop the superscript 0 associated with the true parameters to make the notation less cumbersome. When evaluated at the true parameters we have the equality . Thus
and is equal to
where . Thus, the difference is
(20) | ||||
Throughout, we use the matrix inversion formula (Woodbury formula): for a positive definite diagonal matrix $D$ and a $T \times r$ matrix $F$,
$(D + FF')^{-1} = D^{-1} - D^{-1} F\, (I_r + F' D^{-1} F)^{-1} F' D^{-1},$
and the matrix determinant result
$|D + FF'| = |D| \cdot |I_r + F' D^{-1} F|.$
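Both identities, the Woodbury inverse and the matrix determinant lemma, can be checked numerically for the factor-covariance structure D + FF' with D diagonal; the dimensions and random draws below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
T, r = 6, 2
F = rng.normal(size=(T, r))
D = np.diag(rng.uniform(0.5, 2.0, size=T))   # positive definite diagonal
Di = np.linalg.inv(D)
A = D + F @ F.T

# Woodbury: (D + F F')^{-1} = D^{-1} - D^{-1} F (I_r + F' D^{-1} F)^{-1} F' D^{-1}
core = np.linalg.inv(np.eye(r) + F.T @ Di @ F)
woodbury = Di - Di @ F @ core @ F.T @ Di

# determinant lemma: |D + F F'| = |D| * |I_r + F' D^{-1} F|
det_lhs = np.linalg.det(A)
det_rhs = np.linalg.det(D) * np.linalg.det(np.eye(r) + F.T @ Di @ F)
```

The practical point of both identities is that they reduce T-dimensional inversions and determinants to r-dimensional ones.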
From now on, we assume to simplify the derivation. We define as
We start with a number of lemmas. The first few do not involve any random quantities, and are results of matrix algebra and Taylor expansions.
Lemma 1.
For ,
(21) |
Proof: Notice
where is negligible. From ,
where is a higher order remainder term. Thus
The second term on the right is negligible. This proves Lemma 1.
Lemma 2.
Let . Then
(22) | ||||
with being negligible (note this order of expansion is necessary).
Proof:
Let and use the expansion , we have
The above can be further approximated. For the third expression, we keep the first term of (i.e., ), and for the last expression, we keep the first term of (i.e., ); other terms are negligible. The denominator of the last expression is treated as . This gives the lemma.
Lemma 3.
The following matrix satisfies
where
(23) | |||
(24) | |||
(25) | |||
(26) |
where is defined in Lemma 2, and R is a negligible higher order term with .
Proof: From the Woodbury formula
we can write
(27) |
where is defined in Lemma 2. The first term on the right hand side above is
Use
where is negligible, we have
(28) | ||||
The last equality follows by ignoring higher order terms; and are negligible. To see this, is equal to multiplied by the three matrices in the brackets of (28). But the Frobenius norm of each matrix is , and so the product is . is equal to . To analyze in (27), we use , but we can ignore because its product with is a higher order term. Thus
(29) |
where is given in (22). All four terms in are non-negligible for the matrix ; only the first term in is non-negligible for and . Thus, combining (28) and (29) and pre- and post-multiplying by , we obtain the lemma.
Lemma 4.
where .
Note that the first term has the opposite sign to the corresponding term in (21).
Proof: By Lemma 3, consider
(30) |
We have used the fact that . Next,
and
Summing up terms gives
we can rewrite
This gives Lemma 4.
Lemma 5.
(31) |
Proof: By the notation of Lemma 3, we evaluate . Note that consists of four parts. For this lemma, it is sufficient to approximate by , where
(32) | ||||
In the above approximation, we kept the first term of in (23) ( has four terms), and kept the very first term inside the brackets in (24) and (25). All other terms are negligible in the evaluation of . Using trace, it is easy to obtain the expected values
And
Thus the sum of the expected values is zero. In addition, the deviation of each term from its expected value is . This is because the variance of each term is . For example, . We have used the fact that for normal , for any . This proves the lemma.
Lemma 6.
Proof: Recall . The preceding approximation of by in (32) is sufficient (other terms are negligible). We evaluate .
and
Combining the three expressions
Using the definition of and and rewriting the RHS gives the lemma.
Corollary 5.
Lemma 7.
(33) | ||||
Proof: Using , the left hand side equals the sum of the first term on the right hand side and the following two expressions
where is defined as the first term, and as the second term. From the formula,
term equals
where we used , and the second term is .
For term , use the Woodbury formula,
Note the expected value of the last term in the preceding equation is
and the deviation from its expected value is negligible (because its variance is , following from the same argument in Lemma 5). The above expected value cancels out with term . Thus, we can rewrite
where we have used . This proves Lemma 7.
Lemma 8.
(34) | ||||
Using , we have
where the two terms involving and are negligible, and each is . Here we have used and . Next,
where terms involving and are negligible. Next
which follows from the same reasoning as given for the term involving . Summing up,
Using the definition of and rewriting give the Lemma.
Corollary 6.
(35) |
Proof: adding and subtracting terms
where, by definition, . The corollary follows from Lemmas 7 and 8, that is, by summing (33) and (34), where every term is multiplied by .
Lemma 9.
(36) | ||||
Proof: Here it is sufficient to approximate by (recall , terms involving are negligible). Using , we rewrite
The second term is negligible (its mean is zero, and its variance converges to zero). Consider the third term. Using the Woodbury formula, we rewrite the third term as
where and represent the first and the second expressions, respectively. Notice
where we have used and . The cross product terms are negligible. Next,
here we used , and and the cross product terms are negligible. It follows that
Combining results we obtain Lemma 9.
The next lemma is concerned with the last three terms of equation (20).
Lemma 10.
Proof: Consider (i). It is not difficult to show that replacing by generates only a negligible term. Consider (ii). From , we can write the left hand side of (ii) as
The first term does not depend on and is equal to . Replacing by generates only a negligible term. Thus the first term is . Using the fact that are iid zero mean random variables, we can show that the second term is . Similarly, the last term is . Thus the last two terms are negligible. For (iii), replacing by generates only a negligible term.
Proof of Theorem 1. The local likelihood ratio is given by (20). Using Lemma 1, Corollary 5, Corollary 6, Lemma 9 (multiplied by ), and Lemma 10, we obtain
(37) |
where
(38) | ||||
(39) | ||||
Note that does not involve or . Terms related to and are given in .
Inspecting , the variances of the first three terms are given by the fourth to the sixth terms (times -1/2), respectively. The last term is twice the covariance between the first and the third terms (times -1/2); thus the last term is simply the negative covariance. The second term is uncorrelated with the first and third terms (since depends on the lags of , and is uncorrelated with for all and ).
Inspecting , the variance of the first term is given by the second term (times -1/2). The variance of the third term is given by the last term (also times -1/2). The fourth term is the negative covariance between the first and third terms.
Furthermore, the random variables in are uncorrelated with those in , because variables of the form are uncorrelated with and with ; is uncorrelated with and with ( depends on the lags of , and is diagonal). Hence all variances and covariances are already accounted for and are included in and .
Finally, of Theorem 1 is composed of the first three terms of , plus the first and the third terms of (these are the random terms). All the remaining terms constitute . This completes the proof of Theorem 1.
Proof of Theorem 2. With respect to the local parameters , the proof of the previous theorem only uses . If , then . The entire earlier proof holds with replaced by . In particular, equation (38) holds with replaced by (that is, omitting ) due to . Notice appears in three terms on the right hand side of (38), namely, the first, fourth, and the last term. We analyze each of them. The first term of (38) after replacing with is written as
But the second term on the right-hand side is , because it can be written as (ignoring the trace)
Now , , but
(40) |
we have used , and . To see for , notice as , and by the Toeplitz lemma, . By the Cauchy-Schwarz, (also see [11] for a similar argument). Thus,
The fourth term of (38) after replacing by becomes (ignore the -1/2 and the trace),
owing to and due to (40). The last term of (38) being with in place of follows from the same argument.
The same analysis applies to terms involving with . Analogous to the proof of (40), using the Toeplitz lemma, we can show
(41) |
There are three terms in relating to , see (39). We analyze each. Replacing by , we will show
(42) |
Using , the left hand side above is
The first term is bounded by because of (41), , and . For the second term, using Woodbury formula, and (41)
This proves (42). Next, for the second term in that depends on (replacing by )
This follows from the Woodbury formula and (41). Finally, the last term becomes (after replacing by )
the second term is negligible because and . Thus .
Collecting the simplified and the non-negligible terms, we obtain the expressions in Theorem 2.
References
- [1] Alvarez, J. and M. Arellano (2022). Robust likelihood estimation of dynamic panel data models. Journal of Econometrics, 226, 21-61.
- [2] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, Hoboken, NJ.
- [3] Anderson, T.W., and C. Hsiao (1981). Estimation of dynamic models with error components, Journal of the American Statistical Association, 76, 598-606.
- [4] Arellano, M. and S. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations, The Review of Economic Studies 58 (2), 277-297.
- [5] Bai, J. (2009). Panel data models with interactive fixed effects, Econometrica, 77 1229-1279.
- [6] Bai, J. (2013). Fixed effects dynamic panel data models, a factor analytical method, Econometrica, 81, 285-314.
- [7] Bai, J. (2024). Likelihood approach to dynamic panel models with interactive effects. Journal of Econometrics. Vol 240, Issue 1. https://doi.org/10.1016/j.jeconom.2023.105636
- [8] Bai, J. and K.P. Li (2012). Statistical analysis of factor models of high dimension. Annals of Statistics, 40, 436-465.
- [9] Bai, J. and K.P. Li (2014). Theory and methods for panel data models with interactive effects. The Annals of Statistics Vol. 42, No. 1, 142-170.
- [10] Hahn, J., and G. Kuersteiner (2002). “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects when Both n and T Are Large,” Econometrica, 70, 1639-1657.
- [11] Iwakura, H. and R. Okui (2014). Asymptotic efficiency in factor models and dynamic panel data models, Institute of Economic Research, Kyoto University, Available at SSRN 2395722
- [12] Jankova J. and S. van de Geer (2018). Semiparametric efficiency bounds for high-dimensional models. The Annals of Statistics 46 (5), 2336-2359.
- [13] Kiviet, J. (1995). “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models”, Journal of Econometrics, 68, 53-78.
- [14] Lancaster, T. (2000). “The incidental parameter problem since 1948,” Journal of Econometrics, 95 391-413.
- [15] Lawley, D.N. and A.E. Maxwell (1971). Factor Analysis as a Statistical Method, London: Butterworth.
- [16] Moon H.R. and M. Weidner (2017). “Dynamic Linear Panel Regression Models with Interactive Fixed Effects”, Econometric Theory 33, 158-195.
- [17] Moreira, M.J. (2009), A Maximum Likelihood Method for the Incidental Parameter Problem, Annals of Statistics, 37, 3660-3696.
- [18] Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects, Econometrica, 49, 1417-1426.
- [19] van der Vaart, A.W. (1998). Asymptotic Statistics, Cambridge University Press.
- [20] van der Vaart, A.W. and J.A. Wellner (1996). Weak Convergence and Empirical Processes: With Applications to Statistics, Springer-Verlag.