Dyadic Regression with Sample Selection

Kensuke Sakamoto University of Wisconsin-Madison [email protected]
(Date: 07/03/2024)
Abstract.

This paper addresses the sample selection problem in panel dyadic regression analysis. Dyadic data often include many zeros in the main outcomes due to the underlying network formation process. This not only contaminates popular estimators used in practice but also complicates inference due to the dyadic dependence structure. We extend Kyriazidou (1997)’s approach to dyadic data and characterize the asymptotic distribution of our proposed estimator. The convergence rate is $\sqrt{n}$ or $\sqrt{n^{2}h_{n}}$, depending on the degeneracy of the Hájek projection part of the estimator, where $n$ is the number of nodes and $h_{n}$ is a bandwidth. We propose a bias-corrected confidence interval and a variance estimator that adapts to the degeneracy. A Monte Carlo simulation shows the good finite sample performance of our estimator and highlights the importance of bias correction in both asymptotic regimes when the fraction of zeros in outcomes varies. We illustrate our procedure using data from Moretti and Wilson (2017)’s paper on migration.

The author thanks Jack Porter, Bruce Hansen, Xiaoxia Shi, Harold Chiang, Kohei Yata, and participants in New York Camp Econometrics XVII for their support and comments. This paper was supported by the Summer Fellowship from the Department of Economics at the University of Wisconsin-Madison.

Keywords: Dyadic Data, Sample Selection, Fixed Effects, Network Formation, Bias Correction.

1. Introduction

Dyadic data describe pairwise outcomes, such as trade volume between countries. Numerous applications have analyzed such data using a regression model, referred to as dyadic regression. Examples include gravity equations in trade, migration, and urban economics (Helpman et al., 2008; Moretti and Wilson, 2017; Monte et al., 2018), and risk-sharing networks in development economics (Fafchamps and Gubert, 2007). One of the prominent features of dyadic data is the non-negligible number of zeros in the outcomes of interest, possibly due to economic mechanisms such as prohibitive fixed costs. (Footnote 1: Helpman et al. (2008) document that there was no trade among roughly 50% of country pairs from 1970 to 1997. In 2017, there was no migration among about 60% of country pairs, according to the author's calculation using data available from the World Bank (https://www.worldbank.org/en/topic/migrationremittancesdiasporaissues/brief/migration-remittances-data).) This paper deals with panel dyadic data, where zeros are prevalent both cross-sectionally and in the time series.

How should we treat zeros in dyadic regression? In applications, zeros are often discarded due to the log-linear specification (Moretti and Wilson, 2017). The Poisson pseudo-maximum-likelihood (PPML) estimator is also frequently used to avoid discarding zeros and address issues related to log-linearization (Silva and Tenreyro, 2006). These approaches implicitly assume that zeros occur exogenously. Since a zero in a pairwise outcome results from no link between two units, we can associate zeros with the underlying network formation mechanism that determines which pairs appear in a sample. If the network is formed endogenously as a result of an interaction between two agents, the empirical practices mentioned above can be subject to sample selection bias, as in Heckman (1979).

This paper has two primary objectives. First, we aim to jointly model network formation and the outcome generation on such networks. This joint modeling allows identification of the effects of changes in pair-level or individual-level characteristics, separating them from the effects caused by changes in networks. In contrast, the dyadic regression literature has primarily focused on regression with fixed or exogenous networks. Second, we develop a robust inference method that accounts for the dyadic dependence structure. Pairwise outcomes are likely to be dependent on each other through common shocks to individuals. This dyadic dependence can be especially important in the presence of zeros and network formation because a few individuals can have significantly more links than others, which strengthens the influence of shocks to those individuals on the dyadic dependence. (Footnote 2: For example, in Moretti and Wilson (2017)’s migration flow data, star scientists’ migration from or to California constituted approximately 14% of the links in the sample on average. This percentage is much higher than the expected 2% when considering all potential links in the sample.) At the same time, it is known that with dyadic data, we can have different asymptotic regimes depending on the nature of those individual-level shocks (Menzel, 2021). To be practitioner-friendly, our inference method needs to account for the dyadic dependence and adapt to the different resulting asymptotic regimes.

Our setup will be a linear panel dyadic regression model, featuring the network formation process as a sample selection mechanism that generates both zeros and unobservable outcomes. To capture the dyadic dependence structure, we incorporate two types of unobservable individual heterogeneity into the model: time-invariant fixed effects and time-varying random effects, which is a new modeling strategy in the literature. We extend Kyriazidou (1997)’s identification argument, originally designed for individualistic data, to dyadic data, and correspondingly propose a semiparametric, kernel-based estimator that assigns weights to pairs whose selection index remains stable over time. A significant challenge we face when analyzing our estimator is the need to address the dependence structure caused by node-level shocks, which is absent in individualistic data models analyzed in Kyriazidou (1997). To control for this type of dependence, we utilize the U-statistic-like structure of our estimator, which gives us a mutually uncorrelated decomposition into the node-level Hájek projection part and the dyad-level projection error part.

We show that our estimator is asymptotically normal with two different convergence rates depending on the nature of the errors. If the Hájek projection is non-degenerate (i.e., each summand has positive variance), our estimator achieves $\sqrt{n}$-asymptotic normality, where $n$ is the number of nodes. In this case, the estimator not only has zero asymptotic bias but also shares the same convergence rate as the usual fixed effect estimator and the PPML estimator when their leading terms are also non-degenerate. The latter point implies that using a kernel-based local method entails no loss in effective sample size relative to the usual non-weighted estimators. If the Hájek projection is degenerate, our estimator achieves $\sqrt{Nh_{n}}$-asymptotic normality, where $N\sim n^{2}$ is the number of dyads and $h_{n}$ is a bandwidth. While the usual fixed effect estimator and the PPML estimator can be non-Gaussian in the limit (Menzel, 2021), our estimator is guaranteed to be asymptotically normal regardless of degeneracy. This result is analogous to Hall (1984)’s central limit theorem for degenerate U-statistics, allowing common statistics of interest, such as confidence intervals, to be constructed in a standard manner. In the degenerate case, our estimator exhibits asymptotic bias, which motivates us to introduce a bias correction.

We propose a variance estimator and bias-corrected confidence intervals that adapt to the degeneracy. Our variance estimator is similar to the one proposed by Graham et al. (2019) for nonparametric dyadic density estimation. We show that this variance estimator is consistent for the asymptotic variances in both the non-degenerate and degenerate cases, after being rescaled by $\sqrt{n}$ or $\sqrt{Nh_{n}}$, respectively. For the bias correction, we use a consistent estimator for the asymptotic bias in the degenerate case. We show that the correction term is negligible in the non-degenerate case after being rescaled by $\sqrt{n}$. Combining the bias-corrected estimator and the variance estimator, we can construct bias-corrected confidence intervals for our estimator. These intervals have asymptotically correct sizes regardless of the (non-)degeneracy of the leading term in our estimator.

We conduct a simple simulation exercise to demonstrate the performance of our estimators compared to the usual fixed effect estimator and PPML estimator, as we vary the fraction of selected dyads from 10% to 90%. Our proposed estimator exhibits better finite sample properties than the other two estimators. Our bias-corrected confidence intervals also outperform the alternatives in coverage probabilities, regardless of degeneracy. This result underscores the importance of bias correction in finite samples, even though the asymptotic bias is zero in the non-degenerate case, which is a new finding in the literature.

As an empirical application, we extend and apply our estimator to the regression specification proposed by Moretti and Wilson (2017), which estimates the effects of state tax differences on the internal migration flows in the U.S. Comparing our proposed estimator with Moretti and Wilson (2017)’s, we find that their conclusion, which suggests that state tax differences have a significant impact on internal migration, may not be robust in the presence of a dyadic dependence structure and sample selection biases.

This paper is closely related to the burgeoning literature on dyadic regression (Cameron and Miller, 2014; Tabord-Meehan, 2019; Bonhomme, 2020; Zeleneev, 2020; Graham, 2020; Graham et al., 2021). Except for Bonhomme (2020) and Zeleneev (2020), these papers do not touch on non-random sample selection but instead focus on the consequence of the dyadic dependence structure. Bonhomme (2020) mainly focuses on the case where the selection is conditionally random with random effects. He also discusses the case where the selection is conditionally non-random without a theoretical analysis. Zeleneev (2020) studies identification and estimation of a dyadic regression model with flexible fixed effects. While his focus is on identification and a rate of convergence analysis, our paper provides the full inference results.

This paper also contributes to the literature on econometric analysis of models with endogenous network formation. Examples include Johnsson and Moon (2021), Auerbach (2022), and Jochmans (2023). While these papers study social interaction/peer effects type models where outcomes of interest are individualistic, our paper studies the direct consequence of network formation on dyadic outcomes.

2. Model

There are $n$ nodes in the data (e.g., states, countries), indexed by $i=1,\dots,n$. Let $\{(X_{it},Z_{it})_{t=1,\dots,T}\}_{i=1}^{n}$ be the node-level observations, where $X_{it}\in\mathbb{R}^{q_{x}}$ and $Z_{it}\in\mathbb{R}^{q_{z}}$. For each dyad $ij$ and time $t$, $Y_{ijt}\in\mathbb{R}$ is the main outcome, and we observe a binary variable $d_{ijt}\in\{0,1\}$ such that $Y_{ijt}$ is observable only if $d_{ijt}=1$. (Footnote 3: Since we focus on a linear model, we can interchange unobservability with zero. Alternatively, we can interpret $Y_{ijt}$ as the logarithm of $\tilde{Y}_{ijt}\geq 0$.) We can interpret the adjacency matrix $D_{t}\equiv[d_{ijt}]_{i,j=1,\dots,n}$ as a network that summarizes the existence of interactions between nodes. In this paper, we restrict our attention to a model with $T=2$ and an undirected graph where $Y_{ijt}=Y_{jit}$ and $d_{ijt}=d_{jit}$ for all $i,j,t$. We also rule out self-loops by convention: $Y_{iit}=d_{iit}=0$ for all $i,t$. An extension to $T>2$ and a directed graph is discussed in Section 4.1.

The data is generated according to the following model:

$$
\begin{aligned}
W_{ijt} &= w(X_{it},X_{jt}),\quad R_{ijt}=r(Z_{it},Z_{jt}), && (2.1)\\
Y_{ijt}^{*} &= W_{ijt}^{\prime}\beta+A_{i}+A_{j}+\epsilon_{ijt}, && (2.2)\\
d_{ijt} &= \boldsymbol{1}\{R_{ijt}^{\prime}\gamma+B_{i}+B_{j}-\eta_{ijt}\geq 0\}, && (2.3)\\
Y_{ijt} &= \begin{cases}Y_{ijt}^{*} & \text{if } d_{ijt}=1,\\ \text{unobserved} & \text{if } d_{ijt}=0.\end{cases} && (2.4)
\end{aligned}
$$

The regressors $W_{ijt}\in\mathbb{R}^{q_{w}}$ and $R_{ijt}\in\mathbb{R}^{q_{r}}$ are constructed from user-specified symmetric functions $w:\mathbb{R}^{q_{x}}\times\mathbb{R}^{q_{x}}\to\mathbb{R}^{q_{w}}$ and $r:\mathbb{R}^{q_{z}}\times\mathbb{R}^{q_{z}}\to\mathbb{R}^{q_{r}}$ such that $w(x,y)=w(y,x)$ and $r(x^{\prime},y^{\prime})=r(y^{\prime},x^{\prime})$ for any $x,y\in\mathbb{R}^{q_{x}}$ and $x^{\prime},y^{\prime}\in\mathbb{R}^{q_{z}}$. For example, we can specify $w$ to be a pairwise summation, $w(x,y)=x+y$. The symmetry of these functions is needed because our graphs are undirected; we can relax this requirement with directed graphs, as discussed in Section 4.1. The node-level fixed effects $A_{i},B_{i}\in\mathbb{R}$ are unobservable, and we allow them to correlate with the regressors, as in the usual fixed effect model.

We specify the structure of the errors $\epsilon_{ijt},\eta_{ijt}$ as follows: for $1\leq i<j\leq n$,

$$(\epsilon_{ij1},\epsilon_{ij2},\eta_{ij1},\eta_{ij2})=\tau(U_{i1},U_{i2},U_{j1},U_{j2},U_{ij1},U_{ij2}), \tag{2.5}$$

where $U_{i}\equiv(U_{i1},U_{i2})$ and $U_{ij}\equiv(U_{ij1},U_{ij2})$ are node-level and dyad-level random vectors, respectively, and $\tau$ is an unknown multivariate function. (Footnote 4: We need not specify the dimensions of these vectors and of the function, since the following results do not depend on them as long as the dimensions are fixed.)

Let $\xi_{i}\equiv(X_{i1},X_{i2},Z_{i1},Z_{i2},A_{i},B_{i})$ be a vector that contains the observed and unobserved information on node $i$ in the two periods. We impose the following distributional assumption:

Assumption 1.

  (1) $\xi_{i}$, $i=1,\dots,n$, are independently and identically distributed.

  (2) $(\epsilon_{ijt},\eta_{ijt})_{t=1,2}$, $1\leq i<j\leq n$, are generated according to (2.5).

  (3) Conditionally on $\{\xi_{i}\}_{i=1}^{n}$, the $U_{i}$, $i=1,\dots,n$, are independent, the $U_{ij}$, $1\leq i<j\leq n$, are independent, and the two collections are mutually independent.

  (4) For $i<j$, $(U_{i},U_{j},U_{ij})$ conditional on $\{\xi_{i}\}_{i=1}^{n}$ has the same distribution as $(U_{i},U_{j},U_{ij})$ conditional on $\xi_{i},\xi_{j}$.

  (5) For $i<j<k$, if $\xi_{i}=\xi_{j}=\xi_{k}$, then $(U_{i},U_{j},U_{ij})$ and $(U_{i},U_{k},U_{ik})$ have the same distribution conditionally on $\xi_{i},\xi_{j},\xi_{k}$.

Part (1) imposes homogeneity on the node-level data-generating process. Parts (2) and (3) are new to the literature on dyadic regression with fixed effects. While the previous literature assumes conditional independence of dyad-level errors (Graham, 2017; Zeleneev, 2020; Candelaria, 2020), our error structure (2.5) allows for conditional dependence between errors with a common node (e.g., $\epsilon_{ij1}$ and $\epsilon_{ik1}$) through $U_{i}$, but also includes conditional independence as a special case where the node-level random vectors $U_{it},U_{jt}$ are degenerate given $\{\xi_{i}\}_{i=1}^{n}$. Part (4) is the standard assumption in the literature and excludes "externalities," where dyad $ij$ can be affected by nodes other than $i$ or $j$. Part (5) ensures the conditional exchangeability of $(\epsilon_{ijt},\eta_{ijt})_{t=1,2}$ across dyads.
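To make the data-generating process concrete, the following is a minimal simulation sketch of (2.1)-(2.5) with $T=2$. Every specific choice here is an illustrative assumption rather than part of the paper: scalar $X_{it}$ and $Z_{it}$, $w(x,y)=r(x,y)=x+y$, Gaussian shocks, one particular additive form for $\tau$, and the names `simulate_dyadic_panel` and `rho`. The parameter `rho` links $\epsilon_{ijt}$ to $\eta_{ijt}$, so selection is non-random whenever it is non-zero.

```python
import numpy as np

def simulate_dyadic_panel(n, beta=1.0, gamma=1.0, rho=0.5, seed=0):
    """Simulate an undirected two-period dyadic panel (illustrative DGP only)."""
    rng = np.random.default_rng(seed)
    T = 2
    X = rng.normal(size=(n, T))                 # node-level regressor for the outcome
    Z = rng.normal(size=(n, T))                 # node-level regressor for selection
    A = X.mean(axis=1) + rng.normal(size=n)     # fixed effects correlated with X
    B = Z.mean(axis=1) + rng.normal(size=n)     # fixed effects correlated with Z
    U = rng.normal(size=(n, T))                 # node-level shocks U_it

    # Dyad-level shocks U_ijt, symmetrised so that dyad ij and ji coincide.
    V = rng.normal(size=(n, n, T, 2))
    V = (V + V.transpose(1, 0, 2, 3)) / np.sqrt(2.0)

    W = X[:, None, :] + X[None, :, :]           # w(x, y) = x + y
    R = Z[:, None, :] + Z[None, :, :]           # r(x, y) = x + y

    eta = V[..., 0]
    # Assumed form of tau: node shocks enter eps additively; rho ties eps to eta.
    eps = U[:, None, :] + U[None, :, :] + rho * eta + V[..., 1]

    Y_star = beta * W + A[:, None, None] + A[None, :, None] + eps
    d = (gamma * R + B[:, None, None] + B[None, :, None] - eta >= 0).astype(int)

    idx = np.arange(n)
    d[idx, idx, :] = 0                          # no self-loops
    Y = np.where(d == 1, Y_star, np.nan)        # outcome observed only if d = 1
    return {"W": W, "R": R, "Y": Y, "d": d}
```

Because the shocks are drawn independently and identically across the two periods, this design satisfies the time-exchangeability condition used for identification below; shifting the mean of $B_{i}$ is one simple way to vary the fraction of observed dyads.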

2.1. Identification


The following two assumptions are crucial for the identification of $\beta$:

Assumption 2.

$(\epsilon_{ij1},\epsilon_{ij2},\eta_{ij1},\eta_{ij2})$ and $(\epsilon_{ij2},\epsilon_{ij1},\eta_{ij2},\eta_{ij1})$ are identically distributed conditionally on $\xi_{i},\xi_{j}$.

Assumption 3.

$E[d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}|\Delta R_{ij}^{\prime}\gamma=0]$ is non-singular, where $\Delta W_{ij}=W_{ij1}-W_{ij2}$ and $\Delta R_{ij}=R_{ij1}-R_{ij2}$.

Assumption 2 excludes cases where, for example, the conditional variance of $\epsilon_{ijt}$ depends only on period $t$’s information: $Var(\epsilon_{ijt}|\xi_{i},\xi_{j})=\sigma^{2}\times W_{ijt}^{\prime}\beta$. However, it allows time-invariant heteroskedasticity such as $Var(\epsilon_{ijt}|\xi_{i},\xi_{j})=\sigma^{2}(W_{ij1}+W_{ij2})^{\prime}\beta\times A_{i}\times A_{j}$. From (2.5), this assumption is implied by the conditional exchangeability of $U_{it}$ and $U_{ijt}$ with respect to time. Assumption 3 excludes cases where $W_{ijt}$ is exactly the same as $R_{ijt}$ and implies that some variables in $R_{ijt}$ must be excluded from $W_{ijt}$. Since

$$
\begin{aligned}
&E[d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}|\Delta R_{ij}^{\prime}\gamma=0]\\
&\quad=Pr(d_{ij1}d_{ij2}=1|\Delta R_{ij}^{\prime}\gamma=0)\times E[\Delta W_{ij}\Delta W_{ij}^{\prime}|d_{ij1}d_{ij2}=1,\Delta R_{ij}^{\prime}\gamma=0],
\end{aligned}
$$

this assumption also implies that the networks $D_{1},D_{2}$ are locally dense across time in the sense that $Pr(d_{ij1}d_{ij2}=1|\Delta R_{ij}^{\prime}\gamma=0)>0$.

Our identification argument proceeds in two steps, similarly to Kyriazidou (1997). First, take the time difference of observed outcomes (dyads with $d_{ij1}=d_{ij2}=1$) to eliminate the fixed effects:

$$\Delta Y_{ij}=\Delta W_{ij}^{\prime}\beta+\epsilon_{ij1}-\epsilon_{ij2}.$$

Taking the expectation of both sides conditionally on $d_{ij1}=d_{ij2}=1$ and $\xi_{i},\xi_{j}$ gives

$$E[\Delta Y_{ij}|d_{ij1}d_{ij2}=1,\xi_{i},\xi_{j}]=\Delta W_{ij}^{\prime}\beta+\underbrace{E[\epsilon_{ij1}-\epsilon_{ij2}|d_{ij1}d_{ij2}=1,\xi_{i},\xi_{j}]}_{\text{Sample selection effect}}.$$

Note that, in general, the sample selection effect is not 0.

Second, we seek conditions that eliminate the selection effect. Assumption 2 is equivalent to

$$F(\epsilon_{ij1},\epsilon_{ij2},\eta_{ij1},\eta_{ij2}|\xi_{i},\xi_{j})=F(\epsilon_{ij2},\epsilon_{ij1},\eta_{ij2},\eta_{ij1}|\xi_{i},\xi_{j}),$$

where $F$ is the conditional distribution of the errors given $\xi_{i},\xi_{j}$. Then, for dyad $ij$ with $\Delta R_{ij}^{\prime}\gamma=R_{ij1}^{\prime}\gamma-R_{ij2}^{\prime}\gamma=0$,

$$
\begin{aligned}
&E[\epsilon_{ij1}|d_{ij1}d_{ij2}=1,\xi_{i},\xi_{j},\Delta R_{ij}^{\prime}\gamma=0]\\
&=E[\epsilon_{ij1}|R_{ij1}^{\prime}\gamma+B_{i}+B_{j}\geq\eta_{ij1},R_{ij2}^{\prime}\gamma+B_{i}+B_{j}\geq\eta_{ij2},\xi_{i},\xi_{j},\Delta R_{ij}^{\prime}\gamma=0]\\
&=E[\epsilon_{ij2}|R_{ij2}^{\prime}\gamma+B_{i}+B_{j}\geq\eta_{ij2},R_{ij1}^{\prime}\gamma+B_{i}+B_{j}\geq\eta_{ij1},\xi_{i},\xi_{j},\Delta R_{ij}^{\prime}\gamma=0]\\
&=E[\epsilon_{ij2}|d_{ij1}d_{ij2}=1,\xi_{i},\xi_{j},\Delta R_{ij}^{\prime}\gamma=0].
\end{aligned}
$$

Hence, the conditional expectation of $\Delta Y_{ij}$ given $d_{ij1}d_{ij2}=1$, $\xi_{i},\xi_{j}$, and $\Delta R_{ij}^{\prime}\gamma=0$ is

$$E[\Delta Y_{ij}|d_{ij1}d_{ij2}=1,\xi_{i},\xi_{j},\Delta R_{ij}^{\prime}\gamma=0]=\Delta W_{ij}^{\prime}\beta.$$

Multiplying both sides by $\Delta W_{ij}$ and aggregating over $\xi_{i},\xi_{j}$, we get

$$E[\Delta W_{ij}\Delta Y_{ij}|d_{ij1}d_{ij2}=1,\Delta R_{ij}^{\prime}\gamma=0]=E[\Delta W_{ij}\Delta W_{ij}^{\prime}|d_{ij1}d_{ij2}=1,\Delta R_{ij}^{\prime}\gamma=0]\beta.$$

Then, under Assumption 3, $\beta$ is uniquely written as

$$\beta=E[d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}|\Delta R_{ij}^{\prime}\gamma=0]^{-1}E[d_{ij1}d_{ij2}\Delta W_{ij}\Delta Y_{ij}|\Delta R_{ij}^{\prime}\gamma=0]. \tag{2.6}$$
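The passage from the moment equality conditional on $d_{ij1}d_{ij2}=1$ to (2.6) relies only on $d_{ij1}d_{ij2}$ being binary. A sketch of that intermediate step, in the notation above and with the product $d_{ij1}d_{ij2}\Delta W_{ij}\Delta Y_{ij}$ understood as zero whenever the outcome is unobserved:

```latex
\begin{aligned}
E[d_{ij1}d_{ij2}\Delta W_{ij}\Delta Y_{ij} \mid \Delta R_{ij}'\gamma=0]
 &= \Pr(d_{ij1}d_{ij2}=1 \mid \Delta R_{ij}'\gamma=0)\,
    E[\Delta W_{ij}\Delta Y_{ij} \mid d_{ij1}d_{ij2}=1,\ \Delta R_{ij}'\gamma=0] \\
 &= \Pr(d_{ij1}d_{ij2}=1 \mid \Delta R_{ij}'\gamma=0)\,
    E[\Delta W_{ij}\Delta W_{ij}' \mid d_{ij1}d_{ij2}=1,\ \Delta R_{ij}'\gamma=0]\,\beta \\
 &= E[d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}' \mid \Delta R_{ij}'\gamma=0]\,\beta .
\end{aligned}
```

Premultiplying by the inverse of the matrix on the right, which exists under Assumption 3, gives (2.6).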

2.2. Estimation


Estimation proceeds in two steps. In the first step, we estimate $\gamma$ with a consistent estimator $\hat{\gamma}_{n}$. In the second step, we estimate $\beta$ with $\hat{\beta}_{n}$, a sample analogue of the identified $\beta$ with $\gamma$ replaced by $\hat{\gamma}_{n}$.

In the following, we focus on the second step. The sample analogue of (2.6) is given by

$$\hat{\beta}_{n}=\left[\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}K_{h_{n}}(\Delta R_{ij}^{\prime}\hat{\gamma}_{n})\right]^{-1}\left[\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\Delta Y_{ij}K_{h_{n}}(\Delta R_{ij}^{\prime}\hat{\gamma}_{n})\right],$$

where $\sum_{i<j}=\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}$, $K_{h_{n}}(v)=h_{n}^{-1}K(v/h_{n})$ is a kernel, and $h_{n}$ is a bandwidth. The kernel weight smooths the condition $\Delta R_{ij}^{\prime}\gamma=0$ and puts larger weight on observations with $\Delta R_{ij}^{\prime}\hat{\gamma}_{n}$ close to zero.
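The second step amounts to a kernel-weighted least squares fit on time-differenced data. The sketch below is one possible implementation under stated assumptions: the dyads $i<j$ are already stacked into arrays of length $N$, a first-step estimate `gamma_hat` is supplied by the user, and the biweight kernel is used. The function names `biweight` and `kernel_weighted_fd` are hypothetical.

```python
import numpy as np

def biweight(u):
    """Biweight kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def kernel_weighted_fd(dY, dW, dR, d1, d2, gamma_hat, h):
    """Kernel-weighted first-difference estimator of beta.

    dY : (N,) differenced outcomes (may be NaN where unobserved)
    dW : (N, q_w) differenced outcome regressors
    dR : (N, q_r) differenced selection regressors
    d1, d2 : (N,) selection indicators in periods 1 and 2
    gamma_hat : (q_r,) first-step estimate, taken as given here
    h : bandwidth h_n > 0
    """
    index = dR @ gamma_hat                           # estimated change in selection index
    weights = d1 * d2 * biweight(index / h) / h      # d_ij1 d_ij2 K_h(.)
    keep = weights > 0                               # drops unobserved and far-away dyads
    Wk, wk = dW[keep], weights[keep]
    S_ww = (Wk * wk[:, None]).T @ Wk                 # weighted sum of dW dW'
    S_wy = (Wk * wk[:, None]).T @ dY[keep]           # weighted sum of dW dY
    return np.linalg.solve(S_ww, S_wy)
```

Only dyads observed in both periods and with a kernel argument inside the support contribute, so missing outcomes elsewhere never enter the sums. Any common normalization such as $1/N$ cancels between the two brackets and is therefore omitted.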

To evaluate $\hat{\beta}_{n}$ in terms of $\beta$, rewrite the time-differenced model as

$$\Delta Y_{ij}=\Delta W_{ij}^{\prime}\beta+\lambda_{ij}+\nu_{ij},$$

where

$$
\begin{aligned}
\lambda_{ij}&\equiv E[\epsilon_{ij1}-\epsilon_{ij2}|d_{ij1}d_{ij2}=1,\xi_{i},\xi_{j}],\\
\nu_{ij}&\equiv\epsilon_{ij1}-\epsilon_{ij2}-\lambda_{ij}.
\end{aligned}
$$

Note that $E[\nu_{ij}|d_{ij1}d_{ij2}=1,\xi_{i},\xi_{j}]=0$ by construction. Define

$$
\begin{aligned}
\hat{S}_{WW}&\equiv\frac{1}{N}\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}K_{h_{n}}(\Delta R_{ij}^{\prime}\hat{\gamma}_{n}),\\
\hat{S}_{W\lambda}&\equiv\frac{1}{N}\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\lambda_{ij}K_{h_{n}}(\Delta R_{ij}^{\prime}\hat{\gamma}_{n}),\\
\hat{S}_{W\nu}&\equiv\frac{1}{N}\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\nu_{ij}K_{h_{n}}(\Delta R_{ij}^{\prime}\hat{\gamma}_{n}).
\end{aligned}
$$

Substituting $\Delta Y_{ij}$ into $\hat{\beta}_{n}$ yields

$$\hat{\beta}_{n}=\beta+\hat{S}_{WW}^{-1}\hat{S}_{W\lambda}+\hat{S}_{WW}^{-1}\hat{S}_{W\nu}.$$

The terms $\hat{S}_{WW}^{-1}\hat{S}_{W\lambda}$ and $\hat{S}_{WW}^{-1}\hat{S}_{W\nu}$ can be understood as the selection bias term and the stochastic error term in the estimator, respectively.
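For completeness, the substitution behind this decomposition can be written out as follows. The $1/N$ normalization cancels between the two brackets of the estimator, so $\hat{\beta}_{n}$ equals $\hat{S}_{WW}^{-1}$ times the normalized second bracket:

```latex
\begin{aligned}
\hat{\beta}_{n}
 &= \hat{S}_{WW}^{-1}\,\frac{1}{N}\sum_{i<j} d_{ij1}d_{ij2}\,\Delta W_{ij}\,
    \bigl(\Delta W_{ij}'\beta+\lambda_{ij}+\nu_{ij}\bigr)\,
    K_{h_{n}}(\Delta R_{ij}'\hat{\gamma}_{n}) \\
 &= \hat{S}_{WW}^{-1}\hat{S}_{WW}\,\beta
    + \hat{S}_{WW}^{-1}\hat{S}_{W\lambda}
    + \hat{S}_{WW}^{-1}\hat{S}_{W\nu}.
\end{aligned}
```

By the identification argument in Section 2.1, $\lambda_{ij}$ vanishes when $\Delta R_{ij}^{\prime}\gamma=0$, which is why concentrating the kernel weight near zero shrinks the selection bias term.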

3. Asymptotic Analysis

3.1. Regularity Conditions


For ease of notation, we write the following conditions in terms of dyads $12$ and $13$, which entails no loss of generality under the undirected graph and Assumption 1.

Let $f_{R\gamma,2}$ be the joint density of $\Delta R_{12}^{\prime}\gamma$ and $\Delta R_{13}^{\prime}\gamma$ when it exists. Let $f_{R\gamma}$ be the marginal density and $f_{R\gamma|\xi_{1},U_{1}}$ be the conditional density given $\xi_{1},U_{1}$.

Assumption 4.

The joint distribution of $\Delta R_{12}^{\prime}\gamma$ and $\Delta R_{13}^{\prime}\gamma$ is absolutely continuous, and for some $\kappa_{0}>0$, the following hold in the neighborhoods $(-\kappa_{0},\kappa_{0})^{2}$ or $(-\kappa_{0},\kappa_{0})$ around $(0,0)$ or $0$, respectively:

  (1) The density $f_{R\gamma,2}(\cdot,\cdot)$ is $k\geq 2$ times continuously differentiable, and the derivatives $\frac{\partial^{p+q}}{\partial x^{p}\partial y^{q}}f_{R\gamma,2}(x,y)$ are uniformly bounded for $p+q\leq k$, $p,q\geq 0$, and bounded away from 0.

  (2) The marginal density $f_{R\gamma}(\cdot)$ is bounded away from 0.

  (3) The conditional density $f_{R\gamma|\xi_{1},U_{1}}(\cdot)$ given $\xi_{1},U_{1}$ is continuous and uniformly bounded almost surely.

Part (1) is a smoothness assumption on the density, as in the nonparametric regression literature. Part (2) ensures that we observe $\Delta R_{12}^{\prime}\gamma$ around 0, which is crucial for identification. Part (3) essentially requires a well-behaved $r(\cdot,\cdot)$ in (2.1).

Define $(w_{1},w_{2})\mapsto\Lambda(w_{1},w_{2},\xi_{1},\xi_{2})$ as

$$\Lambda(w_{1},w_{2},\xi_{1},\xi_{2})\equiv E[\epsilon_{12t}|\eta_{12t}\leq w_{1},\eta_{12s}\leq w_{2},\xi_{1},\xi_{2}]$$

with $t,s=1,2$, $t\neq s$. This $\Lambda$ is the sample selection effect caused by the correlation between the errors $\epsilon_{12t},\epsilon_{12s}$ and $\eta_{12t},\eta_{12s}$. Note that the function $\Lambda$ does not depend on time $t$ or $s$ because of Assumption 2.

Assumption 5.

The function $(w_{1},w_{2})\mapsto\Lambda(w_{1},w_{2},\xi_{1},\xi_{2})$ is differentiable in the neighborhood $(-\kappa_{0},\kappa_{0})^{2}$ around $(0,0)$ for some $\kappa_{0}>0$.

This assumption is essential for controlling the sample selection effect and characterizing the asymptotic bias in some cases. An implication of this assumption is that, for some $\Lambda_{12}\equiv\tilde{\Lambda}(w_{1},w_{2},\xi_{1},\xi_{2})$,

$$\Lambda(w_{1},w_{2},\xi_{1},\xi_{2})-\Lambda(w_{2},w_{1},\xi_{1},\xi_{2})=\Lambda_{12}\times(w_{1}-w_{2}),$$

by the multivariate mean-value theorem. This assumption is strong because the difference in $\Lambda$ must be exactly linear in the difference between its first and second arguments. If we focus on the non-degenerate case discussed below, where the asymptotic bias is 0, we can relax differentiability to a Lipschitz-like continuity condition on $\Lambda$: $|\Lambda(w_{1},w_{2},\xi_{1},\xi_{2})-\Lambda(w_{2},w_{1},\xi_{1},\xi_{2})|\leq|\Lambda_{12}|\times|w_{1}-w_{2}|$.

Let $\|\cdot\|$ denote the Euclidean norm of vectors.

Assumption 6.

For some $\kappa_{0}>0$, the following hold in the neighborhoods $(-\kappa_{0},\kappa_{0})^{2}$ or $(-\kappa_{0},\kappa_{0})$ around $(0,0)$ or $0$, respectively.

  (1) The following moments are uniformly bounded almost surely:

      $$
      \begin{aligned}
      &E[\|\Delta W_{12}\|^{8}|\Delta R_{12}^{\prime}\gamma=\cdot,\xi_{1},U_{1}],\quad E[\|\Delta R_{12}\|^{6}|\Delta R_{12}^{\prime}\gamma=\cdot,\xi_{1},U_{1}],\\
      &E[\nu_{12}^{8}|\Delta R_{12}^{\prime}\gamma=\cdot,\xi_{1},U_{1}],\quad E[\Lambda_{12}^{6}|\Delta R_{12}^{\prime}\gamma=\cdot,\xi_{1},U_{1}].
      \end{aligned}
      $$

  (2) The following moments are continuous and bounded, and the first two are positive definite:

      $$
      \begin{aligned}
      &E[d_{121}d_{122}\Delta W_{12}\Delta W_{12}^{\prime}|\Delta R_{12}^{\prime}\gamma=\cdot],\quad E[d_{121}d_{122}\Delta W_{12}\Delta W_{12}^{\prime}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=\cdot],\\
      &E[d_{121}d_{122}d_{131}d_{132}\Delta W_{12}\Delta W_{13}^{\prime}\nu_{12}\nu_{13}|\Delta R_{12}^{\prime}\gamma=\cdot,\Delta R_{13}^{\prime}\gamma=\cdot].
      \end{aligned}
      $$

  (3) $g(\cdot)\equiv E[d_{121}d_{122}\Delta W_{12}\Lambda_{12}|\Delta R_{12}^{\prime}\gamma=\cdot]f_{R\gamma}(\cdot)$ is $k$-times continuously differentiable with uniformly bounded derivatives.

  (4) $g_{\xi_{1},U_{1}}(\cdot)\equiv E[d_{121}d_{122}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=\cdot,\xi_{1},U_{1}]f_{R\gamma|\xi_{1},U_{1}}(\cdot)$ is $k$-times continuously differentiable with uniformly bounded derivatives almost surely.

Part (1) assumes the existence of conditional moments for the relevant variables. The conditioning on $\xi_{1}$ and $U_{1}$ is needed for controlling the dyadic dependence structure. Part (2) is crucial for obtaining the convergence results used below, and the positive definiteness is needed for ensuring the non-degeneracy of our estimator in the limit. Part (3) is used for characterizing the asymptotic bias provided below. Part (4) is essential for the negligibility of the approximation error of our variance estimator.

Assumption 7.

The following moments exist:

$$E[\|\Delta W_{12}\|^{8}],\quad E[\|\Delta R_{12}\|^{8}],\quad E[\Lambda_{12}^{6}],\quad E[\nu_{12}^{6}].$$

In addition to Assumption 6, which restricts the moments locally around $(0,0)$ or $0$, we use the existence of these unconditional moments when bounding the error terms that arise from the use of $\hat{\gamma}_{n}$.

Assumption 8.

A kernel function $K(\cdot)$ satisfies the following:

  (1) For some $\kappa>0$, $K$ is 0 outside of $[-\kappa,\kappa]$, bounded in $[-\kappa,\kappa]$, and three times continuously differentiable with bounded derivatives in $(-\kappa,\kappa)$.

  (2) $\int K(s)ds=1$.

  (3) $\int s^{i}K(s)ds=0$ for $i=1,\dots,k$.

For example, a biweight kernel $K(x)=\frac{15}{16}(1-x^{2})^{2}\boldsymbol{1}\{|x|\leq 1\}$ satisfies this assumption with $\kappa=1$ and $k=2$.
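The kernel moments that enter the conditions above, together with the roughness $\int K^{2}(s)ds$ that appears in the asymptotic variance, can be computed exactly for the biweight kernel. The following is a small sketch using exact rational arithmetic; the helper names are hypothetical. Which order $k$ a given kernel attains depends on which of these moments vanish, and the sketch simply reports them.

```python
from fractions import Fraction

def monomial_integral(m):
    """Exact integral of s**m over [-1, 1]: zero for odd m, 2/(m+1) for even m."""
    return Fraction(0) if m % 2 else Fraction(2, m + 1)

def biweight_moment(i):
    """Exact integral of s**i * K(s), with K(s) = (15/16)(1 - 2 s^2 + s^4) on [-1, 1]."""
    return Fraction(15, 16) * (
        monomial_integral(i)
        - 2 * monomial_integral(i + 2)
        + monomial_integral(i + 4)
    )

def biweight_roughness():
    """Exact integral of K(s)**2, using (1 - s^2)^4 = 1 - 4s^2 + 6s^4 - 4s^6 + s^8."""
    coeffs = {0: 1, 2: -4, 4: 6, 6: -4, 8: 1}
    total = sum(c * monomial_integral(m) for m, c in coeffs.items())
    return Fraction(15, 16) ** 2 * total

if __name__ == "__main__":
    for i in range(4):
        print(i, biweight_moment(i))
    print("roughness", biweight_roughness())
```

For the biweight kernel, the zeroth moment is 1 and the odd moments vanish by symmetry, while the second moment equals $1/7$ and $\int K^{2}(s)ds=5/7$.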

Assumption 9.

The sequence of bandwidths $\{h_{n}\}$ satisfies $h_{n}\to 0$ and $nh_{n}\to\infty$ as $n\to\infty$.

This assumption is standard in the nonparametric regression literature. We impose further conditions on $\{h_{n}\}$ in each statement below.

Assumption 10.

The first-step estimator $\hat{\gamma}_{n}$ satisfies $\sqrt{Nh_{n}}(\hat{\gamma}_{n}-\gamma)=o_{p}(1)$.

This assumption requires the first-step estimator to be consistent and to converge faster than our estimator. For example, if $\eta_{ijt}\sim Logistic(0,1)$ independently across $ij$ and $t$, we can show that Chamberlain (1980)’s conditional logit estimator satisfies $\hat{\gamma}_{n}-\gamma=O_{p}(1/\sqrt{N})$, so that $\sqrt{Nh_{n}}(\hat{\gamma}_{n}-\gamma)=O_{p}(\sqrt{h_{n}})=o_{p}(1)$. In Section 3.6, we discuss the availability of alternative estimators for $\gamma$.

3.2. Asymptotic Normality


Define the following components, which appear in the asymptotic bias and variance expressions:

$$
\begin{aligned}
\Sigma_{WW}&\equiv f_{R\gamma}(0)E[d_{121}d_{122}\Delta W_{12}\Delta W_{12}^{\prime}|\Delta R_{12}^{\prime}\gamma=0],\\
\Sigma_{W\lambda}&\equiv\frac{1}{k!}\frac{\partial^{k}g(0)}{\partial w^{k}}\int s^{k+1}K(s)ds,\\
\Sigma_{W\nu,1}&\equiv 4f_{R\gamma,2}(0,0)E[d_{121}d_{122}d_{131}d_{132}\Delta W_{12}\Delta W_{13}^{\prime}\nu_{12}\nu_{13}|\Delta R_{12}^{\prime}\gamma=\Delta R_{13}^{\prime}\gamma=0],\\
\Sigma_{W\nu,2}&\equiv f_{R\gamma}(0)E[d_{121}d_{122}\Delta W_{12}\Delta W_{12}^{\prime}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=0]\int K^{2}(s)ds.
\end{aligned}
$$

We have the following result:

Theorem 1.

Suppose that Assumptions 1-10 hold. Fix an arbitrary non-zero vector $c\in\mathbb{R}^{q_{w}}$ and some constant $h\in(0,\infty)$. Let $c_{W}=\Sigma_{WW}^{-1}c$. Then, as $n\to\infty$, we have the following three cases:

  (1) If $Nh_{n}^{2k+3}\to h$ and $c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}>0$:

    \[\sqrt{n}c^{\prime}(\hat{\beta}_{n}-\beta)\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}).\]

  (2) If $Nh_{n}^{2k+3}\to h$ and $c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}=0$:

    \[\sqrt{Nh_{n}}c^{\prime}(\hat{\beta}_{n}-\beta)\to_{d}\mathcal{N}(\sqrt{h}c_{W}^{\prime}\Sigma_{W\lambda},c_{W}^{\prime}\Sigma_{W\nu,2}c_{W}).\]

  (3) If $Nh_{n}^{2k+3}\to\infty$ and $nh_{n}^{2k+2}\to\infty$:

    \[h_{n}^{-(k+1)}(\hat{\beta}_{n}-\beta)\to_{p}\Sigma_{WW}^{-1}\Sigma_{W\lambda}.\]

Parts (1) and (2) of Theorem 1 show that our estimator is asymptotically normal with different convergence rates depending on $\Sigma_{W\nu,1}$. Part (1) departs from Kyriazidou (1997)’s result in that we have a parametric convergence rate based on the number of nodes $n$, not the number of dyads (or units) $N$. Under the condition $c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}>0$ in part (1), the covariance of two summands that share a common index (e.g., dyads $ij$ and $ik$) in the estimator does not vanish in the limit, which reduces the effective sample size to $n$, the number of nodes. At the same time, the leading term is an average of conditional means of the summand given $\xi_{i}$ and $U_{i}$, which averages out the kernel smoothing and removes $h_{n}$ from the convergence rate. This $\sqrt{n}$-asymptotic normality is in line with the dyadic non-parametric density estimation literature (Graham et al., 2019). Once we have $\Sigma_{W\nu,1}=0$ as in part (2), our result is in line with Kyriazidou (1997) in that the convergence rate is non-parametric and based on the number of dyads (units). Part (3) of Theorem 1 shows that, with suitable normalization, our estimator converges to the asymptotic bias term regardless of the degeneracy. We use this result to propose the bias-corrected estimator in a later section.

We can compare our estimator with the usual fixed effect estimator:

\[\hat{\beta}_{FE}=\left[\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\right]^{-1}\left[\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\Delta Y_{ij}\right], \tag{3.1}\]

which is biased because of the selection effect $\lambda_{ij}$. First, in the non-degenerate case, our estimator and the re-centered (infeasible) fixed effect estimator share the same convergence rate of $\sqrt{n}$ (Davezies et al., 2021). This implies that using our kernel-based local estimator does not reduce the effective sample size, which alleviates the need for fairly large samples discussed in Kyriazidou (1997). Second, in the degenerate case, the fixed effect estimator applied to our model can exhibit a non-Gaussian distribution in the limit (Menzel, 2021), while our estimator is asymptotically normal regardless of the degeneracy. This guaranteed asymptotic normality is analogous to Hall (1984)’s central limit theorem for degenerate U-statistics, and thus the common statistics of interest, such as confidence intervals, can be constructed in a standard manner.

If we interpret the structural equation (2.2) as the log-linearized version of the canonical gravity model (Silva and Tenreyro, 2006; Head and Mayer, 2014),

\[\tilde{Y}_{ijt}=exp(W_{ijt}^{\prime}\beta+A_{i}+A_{j})\times\underbrace{\eta_{ijt}}_{=d_{ijt}exp(\epsilon_{ijt})},\]

the Poisson pseudo-maximum-likelihood (PPML) estimator for $\beta$ can be compared with our estimator. The PPML estimator is given by

\[\sum_{i<j}\sum_{t=1}^{2}\left[\tilde{Y}_{ijt}-exp(W_{ijt}^{\prime}\hat{\beta}_{PPML}+a_{i}\boldsymbol{1}\{i=1\}+a_{j}\boldsymbol{1}\{j=1\})\right]\begin{pmatrix}W_{ijt},\boldsymbol{1}\{i=1\},\boldsymbol{1}\{j=1\}\end{pmatrix}^{\prime}=0. \tag{3.2}\]

We can make a comparison similar to the one for the fixed effect estimator, based on the results of Davezies et al. (2021) and Menzel (2021): $\hat{\beta}_{PPML}$ will be biased because of the misspecified errors, and the re-centered $\hat{\beta}_{PPML}$ is asymptotically normal at the rate of $\sqrt{n}$ in the non-degenerate case and can be non-Gaussian in the degenerate case.

3.3. Variance Estimation


Since our estimator exhibits different asymptotic distributions depending on $\Sigma_{W\nu,1}$, it is desirable to have a variance estimator that adapts to the degeneracy.

First, we estimate $\Sigma_{W\nu,1}$. Define

\[\hat{S}_{ij}\equiv 2d_{ij1}d_{ij2}K_{h_{n}}(\Delta R_{ij}^{\prime}\hat{\gamma}_{n})\Delta W_{ij}\Delta\hat{\epsilon}_{ij},\]

where $\Delta\hat{\epsilon}_{ij}$ is the residual $\Delta Y_{ij}-\Delta W_{ij}^{\prime}\hat{\beta}_{n}$. Then, we propose the following estimator for $\Sigma_{W\nu,1}$:

\[\hat{\Sigma}_{W\nu,1}={n\choose 3}^{-1}\sum_{i<j<k}\frac{1}{3}(\hat{S}_{ij}\hat{S}_{ik}^{\prime}+\hat{S}_{ij}\hat{S}_{jk}^{\prime}+\hat{S}_{ik}\hat{S}_{jk}^{\prime}).\]

Next, we estimate $\Sigma_{W\nu,2}$ by

\[\hat{\Sigma}_{W\nu,2}=\frac{h_{n}}{N}\sum_{i<j}d_{ij1}d_{ij2}K_{h_{n}}(\Delta R_{ij}^{\prime}\hat{\gamma}_{n})^{2}\Delta W_{ij}\Delta W_{ij}^{\prime}\Delta\hat{\epsilon}_{ij}^{2}.\]

The following result shows the consistency of these estimators and their usefulness in adaptive variance estimation.

Proposition 1.

Suppose that Assumptions 1-10 hold. Set $h_{n}=hN^{-1/(2k+3)}$ for some $h\in(0,\infty)$. We have

\begin{align*}
\hat{\Sigma}_{W\nu,1} &\to_{p}\Sigma_{W\nu,1},\\
\hat{\Sigma}_{W\nu,2} &\to_{p}\Sigma_{W\nu,2},
\end{align*}

as $n\to\infty$. If $c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}=0$ with $c_{W}=\Sigma_{WW}^{-1}c$ for some $c\in\mathbb{R}^{q_{w}}$, we have

\[nh_{n}c^{\prime}\hat{S}_{WW}^{-1}\hat{\Sigma}_{W\nu,1}\hat{S}_{WW}^{-1}c\to_{p}0,\]

as $n\to\infty$.

We now propose our variance estimator as follows:

\[\hat{\Sigma}\equiv\hat{S}_{WW}^{-1}\left[\frac{n-2}{n(n-1)}\hat{\Sigma}_{W\nu,1}+\frac{1}{Nh_{n}}\hat{\Sigma}_{W\nu,2}\right]\hat{S}_{WW}^{-1}.\]

This estimator is adaptive to the degeneracy. When $\Sigma_{W\nu,1}$ is positive definite, since $n/(Nh_{n})=o(1)$,

\[nc^{\prime}\hat{\Sigma}c=c^{\prime}\hat{S}_{WW}^{-1}\left[\frac{n-2}{n-1}\hat{\Sigma}_{W\nu,1}+\frac{n}{Nh_{n}}\hat{\Sigma}_{W\nu,2}\right]\hat{S}_{WW}^{-1}c\to_{p}c_{W}^{\prime}\Sigma_{W\nu,1}c_{W},\]

as $n\to\infty$ by Proposition 1 and Lemma 1 in Appendix A. When $c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}=0$, since $nh_{n}c^{\prime}\hat{S}_{WW}^{-1}\hat{\Sigma}_{W\nu,1}\hat{S}_{WW}^{-1}c=o_{p}(1)$ by Proposition 1,

\[Nh_{n}c^{\prime}\hat{\Sigma}c=c^{\prime}\hat{S}_{WW}^{-1}\left[2(n-2)h_{n}\hat{\Sigma}_{W\nu,1}+\hat{\Sigma}_{W\nu,2}\right]\hat{S}_{WW}^{-1}c\to_{p}c_{W}^{\prime}\Sigma_{W\nu,2}c_{W},\]

as $n\to\infty$.

Our variance estimator is adapted from the one provided in Graham et al. (2019) for a dyadic non-parametric density estimator. They show that this type of estimator can be adaptive to the "knife edge" case, where $nh_{n}$ is bounded from above and below asymptotically so that $Nh_{n}\sim n$. Here, we additionally show that the estimator is adaptive to the degeneracy by showing that the term involving $\hat{\Sigma}_{W\nu,1}$ decays fast enough to be negligible when the convergence rate is $\sqrt{Nh_{n}}$.
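The construction of $\hat{\Sigma}$ can be sketched numerically as follows. This is a minimal illustration, not the author's implementation: the function names `sigma_nu1_hat` and `adaptive_variance` are hypothetical, the summands $\hat{S}_{ij}$ are assumed to be stored in a symmetric array, $N$ is assumed to equal $n(n-1)/2$, and $\hat{S}_{WW}$ and $\hat{\Sigma}_{W\nu,2}$ are taken as precomputed inputs. The triple loop costs $O(n^{3})$ and is meant for small $n$ only.

```python
from itertools import combinations
from math import comb

import numpy as np


def sigma_nu1_hat(S):
    """Average over triples i<j<k of products of summands sharing one node.

    S has shape (n, n, q); S[i, j] is the q-vector S_hat_ij (assumed symmetric
    in i and j). Only entries with a smaller first index are read.
    """
    n, _, q = S.shape
    total = np.zeros((q, q))
    for i, j, k in combinations(range(n), 3):
        total += (np.outer(S[i, j], S[i, k])
                  + np.outer(S[i, j], S[j, k])
                  + np.outer(S[i, k], S[j, k])) / 3.0
    return total / comb(n, 3)


def adaptive_variance(S, sigma_nu2, S_ww, h_n):
    """Sandwich form S_ww^{-1} [ (n-2)/(n(n-1)) Sigma1 + Sigma2/(N h_n) ] S_ww^{-1}."""
    n = S.shape[0]
    N = n * (n - 1) / 2  # assumption: N counts unordered dyads
    middle = ((n - 2) / (n * (n - 1))) * sigma_nu1_hat(S) + sigma_nu2 / (N * h_n)
    S_ww_inv = np.linalg.inv(S_ww)
    return S_ww_inv @ middle @ S_ww_inv
```

The standard error of $c^{\prime}\hat{\beta}_{n}$ would then be `np.sqrt(c @ adaptive_variance(...) @ c)`; no separate branch for the degenerate case is needed, which is the point of the adaptive form.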

3.4. Bandwidth Selection


From the asymptotic distributional approximation in Theorem 1, we can write down the mean squared error of our estimator (omitting negligible parts):

\[MSE(c^{\prime}\hat{\beta}_{n})=h_{n}^{2(k+1)}(c_{W}^{\prime}\Sigma_{W\lambda})^{2}+\frac{1}{n}c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}+\frac{1}{Nh_{n}}c_{W}^{\prime}\Sigma_{W\nu,2}c_{W}.\]

The bandwidth that minimizes this mean squared error is given by

\begin{align*}
h_{n}^{*} &=\left(\frac{c_{W}^{\prime}\Sigma_{W\nu,2}c_{W}}{2(k+1)N(c_{W}^{\prime}\Sigma_{W\lambda})^{2}}\right)^{\frac{1}{2k+3}}\\
&=h^{*}N^{-\frac{1}{2k+3}}.
\end{align*}

We can estimate $h^{*}$ by the plug-in method. By Proposition 1, we have a consistent estimator for the variance part. For the bias part, we use a pilot bandwidth given by

\[h_{n,\delta}=hN^{-\delta/(2k+3)},\]

for some $\delta\in(0,\frac{2k+3}{4k+4})$ and $h>0$. Let $\hat{\beta}_{n,\delta}$ be our estimator calculated with $h_{n,\delta}$. We can check that this bandwidth satisfies $Nh_{n,\delta}^{2k+3}\to\infty$ and $nh_{n,\delta}^{2k+2}\to\infty$. Thus, by Theorem 1,

\[h_{n,\delta}^{-(k+1)}(\hat{\beta}_{n,\delta}-\beta)\to_{p}\Sigma_{WW}^{-1}\Sigma_{W\lambda},\]

as $n\to\infty$. By replacing $\beta$ with $\hat{\beta}_{n}$, calculated with $h_{n}=hN^{-\frac{1}{2k+3}}$, we have the following result:

Proposition 2.

Suppose that Assumptions 1-10 hold. Let $\hat{\beta}_{n}$ and $\hat{\beta}_{n,\delta}$ be the proposed estimators with bandwidths $h_{n}=hN^{-1/(2k+3)}$ and $h_{n,\delta}=hN^{-\delta/(2k+3)}$, respectively, for some $h>0$ and $\delta\in(0,\frac{2k+3}{4k+4})$. Then,

\[h_{n,\delta}^{-(k+1)}(\hat{\beta}_{n,\delta}-\hat{\beta}_{n})\to_{p}\Sigma_{WW}^{-1}\Sigma_{W\lambda},\]

as $n\to\infty$.

Thus,

\[\hat{h}^{*}=\left(\frac{c^{\prime}\hat{S}_{WW}^{-1}\hat{\Sigma}_{W\nu,2}\hat{S}_{WW}^{-1}c}{2(k+1)\{h_{n,\delta}^{-(k+1)}c^{\prime}(\hat{\beta}_{n,\delta}-\hat{\beta}_{n})\}^{2}}\right)^{\frac{1}{2k+3}}\]

is a consistent estimator for $h^{*}$ by Propositions 1 and 2.
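The plug-in rule can be illustrated with a short sketch. The function names below are hypothetical, and the inputs (a scalar variance estimate for the $1/(Nh_{n})$ term and the two scalar estimates $c^{\prime}\hat{\beta}_{n}$ and $c^{\prime}\hat{\beta}_{n,\delta}$) are assumed to be computed elsewhere. The sketch only trades off a squared bias of order $h^{2(k+1)}$ against a variance of order $1/(Nh)$; the $1/n$ variance term does not depend on the bandwidth and is therefore left out.

```python
def optimal_bandwidth_constant(variance, bias, k):
    """Constant h* in h_n* = h* N^{-1/(2k+3)}; requires a non-zero bias term."""
    return (variance / (2 * (k + 1) * bias ** 2)) ** (1.0 / (2 * k + 3))


def plug_in_bandwidth_constant(cb_n, cb_delta, variance, h, N, k, delta):
    """Estimate h* from the main and pilot estimates of c'beta."""
    h_pilot = h * N ** (-delta / (2 * k + 3))          # pilot bandwidth h_{n,delta}
    bias_hat = (cb_delta - cb_n) / h_pilot ** (k + 1)  # rescaled estimator difference
    return optimal_bandwidth_constant(variance, bias_hat, k)


def approx_mse(h_n, variance, bias, k, N):
    """Bandwidth-dependent part of the MSE: squared bias plus 1/(N h) variance."""
    return h_n ** (2 * (k + 1)) * bias ** 2 + variance / (N * h_n)
```

If the estimated bias is exactly zero, the rule is undefined (division by zero), so a practical implementation would need a guard for that case.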

3.5. Bias Correction


By Theorem 1, our estimator has an asymptotic bias of $\sqrt{h}\Sigma_{WW}^{-1}\Sigma_{W\lambda}$ in the degenerate case, $\Sigma_{W\nu,1}=0$. If the bias is non-negligible, it distorts the coverage probability of the confidence interval. Correcting the bias is desirable because it is generally unknown whether the degeneracy occurs. Fortunately, since the asymptotic distribution in the degenerate case is similar to that in Kyriazidou (1997), we can use her bias correction strategy as follows.

Note that $h_{n,\delta}^{-(k+1)}(\hat{\beta}_{n,\delta}-\beta)$ directly estimates the asymptotic bias from Theorem 1. We can construct a bias-corrected estimator $\hat{\beta}_{n,bc}(\beta)$ by subtracting this bias estimator from the original estimator with suitable normalization. Let $r_{n,\delta}=N^{(1-\delta)/(2k+3)}$. The bias-corrected estimator is given by

\[\hat{\beta}_{n,bc}(\beta)=\hat{\beta}_{n}-r_{n,\delta}^{-(k+1)}(\hat{\beta}_{n,\delta}-\beta).\]

We can check that this estimator is asymptotically unbiased regardless of the degeneracy. Note that $r_{n,\delta}^{-(k+1)}=h_{n}^{k+1}h_{n,\delta}^{-(k+1)}$ with $h_{n}=hN^{-1/(2k+3)}$. When $c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}>0$,

\begin{align*}
\sqrt{n}c^{\prime}(\hat{\beta}_{n,bc}(\beta)-\beta) &=\sqrt{n}c^{\prime}(\hat{\beta}_{n}-\beta)-\sqrt{n}r_{n,\delta}^{-(k+1)}c^{\prime}(\hat{\beta}_{n,\delta}-\beta)\\
&=\underbrace{\sqrt{n}c^{\prime}(\hat{\beta}_{n}-\beta)}_{\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,1}c_{W})}-\underbrace{\sqrt{n}h^{k+1}N^{-(k+1)/(2k+3)}}_{\to 0}\underbrace{h_{n,\delta}^{-(k+1)}c^{\prime}(\hat{\beta}_{n,\delta}-\beta)}_{\to_{p}c_{W}^{\prime}\Sigma_{W\lambda}}\\
&\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}),
\end{align*}

as $n\to\infty$. When $c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}=0$, using $\sqrt{Nh_{n}}h_{n}^{k+1}=\sqrt{Nh_{n}^{2k+3}}=\sqrt{h^{2k+3}}$,

\begin{align*}
\sqrt{Nh_{n}}c^{\prime}(\hat{\beta}_{n,bc}(\beta)-\beta) &=\sqrt{Nh_{n}}c^{\prime}(\hat{\beta}_{n}-\beta)-\sqrt{Nh_{n}}r_{n,\delta}^{-(k+1)}c^{\prime}(\hat{\beta}_{n,\delta}-\beta)\\
&=\underbrace{\sqrt{Nh_{n}}c^{\prime}(\hat{\beta}_{n}-\beta)}_{\to_{d}\mathcal{N}(\sqrt{h^{2k+3}}c_{W}^{\prime}\Sigma_{W\lambda},c_{W}^{\prime}\Sigma_{W\nu,2}c_{W})}-\underbrace{\sqrt{h^{2k+3}}h_{n,\delta}^{-(k+1)}c^{\prime}(\hat{\beta}_{n,\delta}-\beta)}_{\to_{p}\sqrt{h^{2k+3}}c_{W}^{\prime}\Sigma_{W\lambda}}\\
&\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,2}c_{W}),
\end{align*}

as $n\to\infty$. Thus, given the adaptivity of $\hat{\Sigma}$ to the degeneracy, we have

\[(c^{\prime}\hat{\Sigma}c)^{-1/2}c^{\prime}(\hat{\beta}_{n,bc}(\beta)-\beta)\to_{d}\mathcal{N}(0,1),\]

as $n\to\infty$ for an arbitrary non-zero vector $c\in\mathbb{R}^{q_{w}}$.

Then, we can construct the bias-corrected confidence interval as follows. Letting $\Phi_{1-\alpha/2}^{-1}$ be the $1-\alpha/2$ quantile of the standard normal distribution, we have

\begin{align*}
&-\Phi_{1-\alpha/2}^{-1}\leq(c^{\prime}\hat{\Sigma}c)^{-1/2}c^{\prime}(\hat{\beta}_{n,bc}(\beta)-\beta)\leq\Phi_{1-\alpha/2}^{-1}\\
&\Longleftrightarrow-(c^{\prime}\hat{\Sigma}c)^{1/2}\Phi_{1-\alpha/2}^{-1}\leq c^{\prime}\hat{\beta}_{n}-r_{n,\delta}^{-(k+1)}c^{\prime}\hat{\beta}_{n,\delta}-(1-r_{n,\delta}^{-(k+1)})c^{\prime}\beta\leq(c^{\prime}\hat{\Sigma}c)^{1/2}\Phi_{1-\alpha/2}^{-1}\\
&\Longleftrightarrow CI_{L,\alpha,c}\leq c^{\prime}\beta\leq CI_{U,\alpha,c},
\end{align*}

where

\begin{align*}
CI_{L,\alpha,c} &\equiv(1-r_{n,\delta}^{-(k+1)})^{-1}\left[c^{\prime}\hat{\beta}_{n}-r_{n,\delta}^{-(k+1)}c^{\prime}\hat{\beta}_{n,\delta}-(c^{\prime}\hat{\Sigma}c)^{1/2}\Phi_{1-\alpha/2}^{-1}\right],\\
CI_{U,\alpha,c} &\equiv(1-r_{n,\delta}^{-(k+1)})^{-1}\left[c^{\prime}\hat{\beta}_{n}-r_{n,\delta}^{-(k+1)}c^{\prime}\hat{\beta}_{n,\delta}+(c^{\prime}\hat{\Sigma}c)^{1/2}\Phi_{1-\alpha/2}^{-1}\right].
\end{align*}

The full inference procedure is summarized as follows:

  (1) Compute the first-step estimator $\hat{\gamma}_{n}$.

  (2) Choose $k\geq 2$, $\delta\in(0,\frac{2k+3}{4k+4})$, and $h>0$ to compute $\hat{\beta}_{n}$ and $\hat{\beta}_{n,\delta}$ with bandwidths $h_{n}=hN^{-1/(2k+3)}$ and $h_{n,\delta}=hN^{-\delta/(2k+3)}$, respectively.

  (3) Compute $\hat{\Sigma}$ and $h_{n,\delta}^{-(k+1)}(\hat{\beta}_{n,\delta}-\hat{\beta}_{n})$ to estimate the asymptotic variance and bias and obtain $\hat{h}^{*}$.

  (4) Update $\hat{\beta}_{n}$ and $\hat{\beta}_{n,\delta}$ with bandwidths $h_{n}=\hat{h}^{*}N^{-1/(2k+3)}$ and $h_{n,\delta}=\hat{h}^{*}N^{-\delta/(2k+3)}$, respectively.

  (5) Construct the confidence interval by computing $CI_{L,\alpha,c}$ and $CI_{U,\alpha,c}$ from $\hat{\beta}_{n}$, $\hat{\beta}_{n,\delta}$, and $c^{\prime}\hat{\Sigma}c$.
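The last step of the procedure can be sketched as follows. This is an illustration under stated assumptions rather than the author's code: the function name `bias_corrected_ci` is hypothetical, the interval is obtained by inverting the standardized statistic for the bias-corrected estimator $\hat{\beta}_{n}-r_{n,\delta}^{-(k+1)}(\hat{\beta}_{n,\delta}-\beta)$ with $r_{n,\delta}=N^{(1-\delta)/(2k+3)}$, and `se` denotes the standard error $(c^{\prime}\hat{\Sigma}c)^{1/2}$. The inputs `cb_n` and `cb_delta` stand for the scalars $c^{\prime}\hat{\beta}_{n}$ and $c^{\prime}\hat{\beta}_{n,\delta}$.

```python
from statistics import NormalDist


def bias_corrected_ci(cb_n, cb_delta, se, N, k, delta, alpha=0.05):
    """Bias-corrected interval for c'beta (sketch; requires N > 1)."""
    # r_{n,delta}^{-(k+1)} with r_{n,delta} = N^{(1-delta)/(2k+3)}; lies in (0, 1)
    r_pow = N ** (-(1.0 - delta) * (k + 1) / (2 * k + 3))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    center = cb_n - r_pow * cb_delta
    scale = 1.0 - r_pow  # positive, so the inequalities keep their direction
    return (center - se * z) / scale, (center + se * z) / scale
```

When the pilot estimate exceeds the main estimate, the interval is shifted below $c^{\prime}\hat{\beta}_{n}$, and vice versa; when the two coincide, the interval is centered at $c^{\prime}\hat{\beta}_{n}$ and only widened by the factor $(1-r_{n,\delta}^{-(k+1)})^{-1}$.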

3.6. First-step Estimator


Recall that we want to estimate $\gamma$ from the selection equation, or network formation process, (2.3):

\[d_{ijt}=\boldsymbol{1}\{R_{ijt}^{\prime}\gamma+B_{i}+B_{j}-\eta_{ijt}\geq 0\}.\]

This DGP can be interpreted as a panel discrete choice model as well as a network formation model. Estimators for discrete choice models such as Chamberlain (1980), Manski (1987), or Horowitz (1992) are candidates for estimating $\gamma$. Estimators for network formation models such as Graham (2017) or Candelaria (2020) are also applicable under additional conditions.

Whether those estimators can be used as our first-step estimator $\hat{\gamma}_{n}$ boils down to their convergence rates. Recall that Assumption 10 requires $\sqrt{Nh_{n}}(\hat{\gamma}_{n}-\gamma)=o_{p}(1)$, which implies that the first-step estimator needs to converge faster than $\hat{\beta}_{n}$. We conjecture that, without additional conditions on $\eta_{ijt}$, the convergence rates of those estimators are $\sqrt{n}$ in the worst case due to the conditional dependence across dyads. A $\sqrt{n}$-rate is incompatible with Assumption 10, because $Nh_{n}/n$ is of the same order as $nh_{n}$, which diverges under Assumption 9. In the following, we discuss what kind of additional conditions are needed to ensure Assumption 10.

We may assume additive separability for $\eta_{ijt}$: $\eta_{ijt}=V_{it}+V_{jt}+V_{ijt}$, where conditionally on $\{\xi_{i}\}_{i=1}^{n}$, $(V_{i1},V_{i2}),i=1,...,n$ are independent, $(V_{ij1},V_{ij2}),1\leq i<j\leq n$ are independent, and the two collections are mutually independent. This assumption is weaker than assuming that $(\eta_{ij1},\eta_{ij2}),1\leq i<j\leq n$ are conditionally independent given $\{\xi_{i}\}_{i=1}^{n}$, which corresponds to treating $V_{it}$ as degenerate. With additional conditions, we can directly apply Graham (2017)’s joint maximum likelihood estimator or Candelaria (2020)’s semiparametric estimator, both of which leverage the cross-sectional variation in $d_{ijt}$ and $R_{ijt}$. We can show that in our setting (especially under Assumptions 1 and 6), the limiting networks are dense, which implies that both Graham (2017)’s and Candelaria (2020)’s estimators satisfy $\sqrt{N}(\hat{\gamma}_{n}-\gamma)=O_{p}(1)$ and hence Assumption 10.

Alternatively, we may assume that $(\eta_{ij1},\eta_{ij2}),1\leq i<j\leq n$ are conditionally independent given $\{\xi_{i}\}_{i=1}^{n}$. Graham (2017)’s and Candelaria (2020)’s estimators still satisfy Assumption 10, but we can also show that Chamberlain (1980)’s conditional logit estimator and Horowitz (1992)’s smoothed maximum score estimator can satisfy Assumption 10. Under the conditional independence assumption, the latter two estimators can be written in an asymptotically linear form where the corresponding influence function is indexed by $ij$ and has zero covariances across dyads. Thus, the convergence rates are based on $N$, and Assumption 10 can be satisfied depending on the tuning parameters.

4. Extension

4.1. Directed Graph with Multiple Periods


In the above analysis, we restricted our attention to an undirected graph; the variables are all symmetric with respect to nodes (e.g., $Y_{ijt}=Y_{jit}$). Also, there were only two time periods, $t=1,2$. The extension to a directed graph with $t=1,...,T\,(T\geq 2)$ is straightforward. Letting $\Delta_{st}A\equiv A_{s}-A_{t}$ denote the time difference between $s$ and $t$, we propose the following estimator:

\begin{align*}
\hat{\beta}_{n}= &\left[\sum_{s<t}\sum_{i=1}^{n}\sum_{j\neq i}d_{ijs}d_{ijt}\Delta_{st}W_{ij}\Delta_{st}W_{ij}^{\prime}K_{h_{n}}(\Delta_{st}R_{ij}^{\prime}\hat{\gamma}_{n})\right]^{-1}\\
&\times\left[\sum_{s<t}\sum_{i=1}^{n}\sum_{j\neq i}d_{ijs}d_{ijt}\Delta_{st}W_{ij}\Delta_{st}Y_{ij}K_{h_{n}}(\Delta_{st}R_{ij}^{\prime}\hat{\gamma}_{n})\right].
\end{align*}

All the results and their proofs are valid with some modification because we can always rewrite the double sum $\sum_{i=1}^{n}\sum_{j\neq i}A_{ij}$ as $\sum_{i<j}(A_{ij}+A_{ji})$ for any variables $\{A_{ij}\}$. We will use this version of the estimator in our empirical application.
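The estimator above is a kernel-weighted least squares fit on differenced data, which the following sketch makes concrete. It is a minimal illustration with hypothetical names (`biweight`, `kernel_weighted_fd`): each row is assumed to be one (ordered pair, period pair) observation already stacked into flat arrays, and the scaled kernel is assumed to take the usual form $K_{h}(u)=K(u/h)/h$ with the biweight kernel. The $1/h$ factor cancels between the two brackets and is kept only for readability.

```python
import numpy as np


def biweight(u):
    """Biweight kernel 15/16 (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)


def kernel_weighted_fd(sel, dW, dY, index_diff, h):
    """Kernel-weighted first-difference estimator (sketch).

    sel        : (m,) product of selection indicators d_ijs * d_ijt (0 or 1)
    dW         : (m, q) differenced regressors
    dY         : (m,) differenced outcomes
    index_diff : (m,) differenced selection index evaluated at gamma_hat
    h          : bandwidth
    """
    w = sel * biweight(index_diff / h) / h  # zero weight if unselected or far from 0
    A = (dW * w[:, None]).T @ dW            # weighted Gram matrix
    b = (dW * w[:, None]).T @ dY            # weighted cross moment
    return np.linalg.solve(A, b)
```

Replacing the kernel weight by one gives the unweighted fixed effect estimator on the selected sample, so the only difference between the two is the down-weighting of observations whose selection index changes between periods.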

4.2. Pairwise Fixed Effects


In the model (2.2) and (2.3), all the fixed effects are node-wise. Since we are interested in coefficients on time-varying dyadic variables, it is possible to include pairwise fixed effects $A_{ij}$ and $B_{ij}$ in each equation, in addition to $A_{i},A_{j}$ and $B_{i},B_{j}$. With pairwise fixed effects, the identification argument and the estimator are the same as with node-wise fixed effects, since time differencing removes any time-invariant dyad-level term. Thus, a similar asymptotic analysis will also hold as long as $(A_{ij},B_{ij}),1\leq i<j\leq n$ are independently distributed conditionally on $\{\xi_{i}\}_{i=1}^{n}$.

Alternatively, we can also do away with the additive separability by incorporating node-wise fixed effects into pairwise ones:

\[A_{ij}=\tilde{\tau}\left(\tilde{A}_{i},\tilde{A}_{j},\tilde{A}_{ij}\right),\]

where $\tilde{\tau}$ is some unknown function, $\tilde{A}_{i}$ is a node-wise fixed effect, and $\tilde{A}_{ij}$ is a pairwise fixed effect. We can impose a similar structure on $B_{ij}$. Again, the asymptotic analysis will hold as long as $(\tilde{A}_{ij},\tilde{B}_{ij}),1\leq i<j\leq n$ are conditionally independent. With a more general dependence structure, we could show a similar asymptotic result using Kojevnikov et al. (2021)’s central limit theorem for $\psi$-dependent data.

4.3. Sparsity


Above, we argued that our model and assumptions imply that the limiting networks $D_{1}$ and $D_{2}$ are locally dense around $\Delta R_{ij}^{\prime}\gamma\sim 0$. Thus, we limit our attention to cases where the number of dyads in the sample is proportional to $N$. Our modeling is appropriate in applications such as trade or migration, where the network is rather dense. However, it can be inappropriate for applications where the networks are sparse, such as employee-employer or bank-firm matched data (e.g., Abowd et al. (1999), Jiménez et al. (2014)).

We can accommodate sparse networks by the following modification: let us modify Assumption 1 so that $\xi_{i},i=1,...,n$ are drawn from some distribution that is allowed to depend on $n$. For example, as argued in Graham (2017), we can consider a distribution where the fixed effects are such that $\liminf_{1\leq i\leq n}B_{i}=-\infty$. Then, we can discuss identification and estimation with fixed $n$, and the moments of interest all depend on $n$. In particular, we can consider a sequence of networks such that $Pr(d_{121}d_{122}=1|\Delta R_{12}^{\prime}\gamma=0)\to 0$ and $r_{n}Pr(d_{121}d_{122}=1|\Delta R_{12}^{\prime}\gamma=0)=\Omega(1)$ for some $r_{n}\to\infty$ to incorporate sparsity. We do not pursue sparsity in this paper and leave it for future work.

5. Simulation

To assess the performance of the estimator, we conduct simulation exercises. Consider the following data-generating process:

\begin{align*}
&W_{ijt}=X_{it}+X_{jt},\quad R_{ijt}=(W_{ijt},Z_{it}+Z_{jt})^{\prime},\\
&A_{i}=\frac{X_{i1}+X_{i2}}{2},\quad B_{i}=\frac{Z_{i1}+Z_{i2}}{2},\\
&d_{ijt}=\boldsymbol{1}\{R_{ijt}^{\prime}(1,1)^{\prime}+\theta\times(B_{i}+B_{j})-\eta_{ijt}\geq 0\},\\
&Y_{ijt}=d_{ijt}(W_{ijt}+A_{i}+A_{j}+\epsilon_{ijt}),
\end{align*}

where

\begin{align*}
&X_{it},Z_{it}\sim\mathcal{N}(2,1),\ i.i.d.\text{ across }i,t,\\
&\eta_{ijt}\sim Logistic(0,1),\ i.i.d.\text{ across }ij,t,\\
&\epsilon_{ijt}=U_{it}+U_{jt}+\eta_{ijt},\text{ where }U_{it}\sim\mathcal{N}(0,\sigma),\ i.i.d.\text{ across }i,t.
\end{align*}

Note that $\beta=1$ and $\gamma=(1,1)^{\prime}$. We include $\theta\in\{-0.3,-2.0,-3.0\}$ inside $d_{ijt}$ to control the fraction of zeros in the simulated data set:

\[Pr(d_{121}\times d_{122}=0)\sim\begin{cases}20\%\text{ if }\theta=-0.3\\ 75\%\text{ if }\theta=-2.0\\ 90\%\text{ if }\theta=-3.0\end{cases}.\]

We also vary $\sigma\in\{0.0,1.0\}$ for $U_{it}$ so that $\sigma=0.0$ ($\sigma=1.0$) corresponds to the degenerate (non-degenerate) case.
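One way to generate a single draw from this design is sketched below. The function name `simulate_dyadic_panel` and the returned dictionary layout are hypothetical, dyads are stored as the unordered pairs $i<j$, and $\sigma$ is treated as the standard deviation of $U_{it}$ (an assumption; the distinction does not matter for $\sigma\in\{0,1\}$). The sketch follows the equations as written and does not attempt to reproduce the reported fractions of zeros.

```python
import numpy as np


def simulate_dyadic_panel(n, theta, sigma, rng):
    """One draw of the two-period design; arrays are indexed by (dyad, period)."""
    X = rng.normal(2.0, 1.0, size=(n, 2))
    Z = rng.normal(2.0, 1.0, size=(n, 2))
    U = sigma * rng.standard_normal(size=(n, 2))  # sigma = 0 gives the degenerate case
    A = X.mean(axis=1)                            # A_i = (X_i1 + X_i2) / 2
    B = Z.mean(axis=1)                            # B_i = (Z_i1 + Z_i2) / 2

    i, j = np.triu_indices(n, k=1)                # all unordered dyads i < j
    W = X[i] + X[j]                               # W_ijt
    Zsum = Z[i] + Z[j]                            # second component of R_ijt
    eta = rng.logistic(0.0, 1.0, size=W.shape)
    eps = U[i] + U[j] + eta

    index = W + Zsum + theta * (B[i] + B[j])[:, None]  # R_ijt'(1,1)' + theta (B_i + B_j)
    d = (index - eta >= 0).astype(int)
    Y = d * (W + (A[i] + A[j])[:, None] + eps)
    return {"i": i, "j": j, "W": W, "Zsum": Zsum, "d": d, "Y": Y}
```

For example, `simulate_dyadic_panel(100, -2.0, 1.0, np.random.default_rng(0))` returns arrays with $100\times 99/2=4950$ rows, and the share of dyads with `d[:, 0] * d[:, 1] == 0` can be compared with the fractions quoted above.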

As described above, we can interpret this data-generating process as a log-linearized version of the canonical gravity model (Head and Mayer, 2014). Writing $\tilde{Y}_{ijt}$ for the observable outcome, we redefine the main equation as

\[\tilde{Y}_{ijt}=exp(W_{ijt}+A_{i}+A_{j})\times\underbrace{\eta_{ijt}}_{=d_{ijt}exp(\epsilon_{ijt})}.\]

We can take logs and recover the original model for a unit with $d_{ijt}=1$. This modeling allows a mass at $\tilde{Y}_{ijt}=0$, an important feature of dyadic data.

We conduct experiments for $n\in\{50,100,150,200\}$, $\theta\in\{-0.3,-2.0,-3.0\}$, and $\sigma\in\{0.0,1.0\}$, and run $2000$ replications for each design. We calculate $\hat{\gamma}_{n}$ by Chamberlain (1980)’s conditional logit estimator:

\[\hat{\gamma}_{n}=\underset{g\in\mathcal{G}}{argmax}\sum_{i<j:d_{ij1}+d_{ij2}=1}M_{ij}(g),\]

where $\mathcal{G}$ is a compact subset of $\mathbb{R}^{q_{r}}$ and

\[M_{ij}(g)=\boldsymbol{1}\{d_{ij1}=1\}\ln\left(\frac{exp(\Delta R_{ij}^{\prime}g)}{1+exp(\Delta R_{ij}^{\prime}g)}\right)+\boldsymbol{1}\{d_{ij2}=1\}\ln\left(\frac{1}{1+exp(\Delta R_{ij}^{\prime}g)}\right).\]

For $\hat{\beta}_{n}$, we use the biweight kernel for $K(\cdot)$, given by $K(x)=\frac{15}{16}(1-x^{2})^{2}\boldsymbol{1}\{|x|\leq 1\}$. This choice implies that we assume that the smoothness of the model is given by $k=2$. We set $\delta=0.4$ and $h=3.0$ and calculate each estimator and confidence interval according to the inference procedure discussed above.
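The first-step conditional logit objective above is a logistic log-likelihood without intercept on the dyads that switch status, so it can be maximized with a few Newton steps. The sketch below is an illustration only: the function name `conditional_logit` is hypothetical, the difference is taken as period 1 minus period 2 (consistent with the definition $\Delta_{st}A\equiv A_{s}-A_{t}$ for $s<t$), and the maximization is unconstrained rather than restricted to the compact set $\mathcal{G}$.

```python
import numpy as np


def conditional_logit(d1, d2, dR, n_iter=50, tol=1e-10):
    """Maximize the sum of M_ij(g) over dyads with d1 + d2 == 1 (sketch).

    d1, d2 : (m,) 0/1 link indicators in periods 1 and 2
    dR     : (m, q) differenced covariates, period 1 minus period 2
    """
    switch = (d1 + d2) == 1
    y = d1[switch].astype(float)  # 1 if the link exists only in period 1
    X = dR[switch]
    g = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ g)))        # logistic probability of y = 1
        score = X.T @ (y - p)                      # gradient of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # negative Hessian
        step = np.linalg.solve(hess, score)
        g = g + step
        if np.max(np.abs(step)) < tol:
            break
    return g
```

Dyads with the same status in both periods carry no information about $\gamma$ in this likelihood and are dropped by the `switch` mask, which is why many zeros in both periods reduce the effective first-step sample.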

For comparison, we calculate the fixed effect estimator $\hat{\beta}_{FE}$ given by (3.1). Its standard error is calculated from $\hat{\Sigma}$, with $K_{h_{n}}(\cdot)$ replaced by $1$. We also calculate the Poisson pseudo-maximum-likelihood (PPML) estimator $\hat{\beta}_{PPML}$ given by (3.2). We compute $\hat{\beta}_{PPML}$ and its standard error with the penppml package in R (Ferreras Garrucho and Zylkin, 2023). The standard error is clustered at the node level, which is close to $\hat{\Sigma}_{WW}^{-2}\hat{\Sigma}_{W\nu,1}$ in our setting (Graham, 2020).

The results are summarized in Tables 1 and 2. In Table 1, we evaluate the three estimators by mean bias (MeanBias) and root mean squared error (RMSE) for $\sigma=0,1$. In Table 2, we compute 95% coverage probabilities (Coverage) of four different confidence intervals: $CI_{conv}$ (conventional CI from $\hat{\beta}_{n}$ and $\hat{\Sigma}$), $CI_{bc}$ (bias-corrected CI given by $CI_{L,0.05}$ and $CI_{U,0.05}$), $CI_{FE}$ (conventional CI from $\hat{\beta}_{FE}$ and $\hat{\Sigma}$ with a flat kernel), and $CI_{PPML}$ (conventional CI from $\hat{\beta}_{PPML}$ and its node-level clustered standard error).

From Table 1, we can see that our estimator performs better than the fixed effect estimator and the PPML estimator in terms of bias, which shows that the weights given by the first-step estimator work well in eliminating the bias. Our estimator also outperforms the competitors in terms of RMSE, which implies that the loss in precision is not severe. Our estimator also performs well even when there is a large fraction of zeros in $Y$ ($Pr(d_{ij1}\times d_{ij2}=0)\sim 90\%$ when $\theta=-3.0$). There is little difference between $\sigma=0$ and $\sigma=1$ other than the added variance in the estimators.

From Table 2, we can see that the coverage of $CI_{bc}$ is close to 95% regardless of the degeneracy ($\Sigma_{W\nu,1}=0$ or $>0$), while the others are far from the nominal coverage. This result confirms the effectiveness of the bias correction strategy as well as the adaptivity of our variance estimator, as claimed in Section 3.3. It is also notable that the bias correction is important for obtaining correct coverage probabilities even in the case of $\sigma=1.0$, where $\Sigma_{W\nu,1}>0$, the asymptotic bias is $0$ (Theorem 1), and $CI_{conv}$ would deliver asymptotically correct coverage.

Table 1. Finite sample properties of $\hat{\beta}_{n}$, $\hat{\beta}_{FE}$, and $\hat{\beta}_{PPML}$
(a) $\sigma=1.0$
$\theta$ $n$ MeanBias($\hat{\beta}_{n}$) MeanBias($\hat{\beta}_{FE}$) MeanBias($\hat{\beta}_{PPML}$) RMSE($\hat{\beta}_{n}$) RMSE($\hat{\beta}_{FE}$) RMSE($\hat{\beta}_{PPML}$)
-0.3 50 0.045 0.133 0.185 0.122 0.160 0.430
-2.0 50 0.141 0.352 0.467 0.210 0.377 0.617
-3.0 50 0.162 0.369 0.582 0.273 0.415 0.752
-0.3 100 0.038 0.136 0.195 0.087 0.148 0.376
-2.0 100 0.099 0.349 0.438 0.142 0.359 0.536
-3.0 100 0.117 0.359 0.542 0.184 0.378 0.657
-0.3 150 0.028 0.135 0.193 0.070 0.143 0.327
-2.0 150 0.075 0.346 0.427 0.112 0.353 0.496
-3.0 150 0.095 0.356 0.527 0.145 0.367 0.607
-0.3 200 0.024 0.134 0.193 0.060 0.140 0.305
-2.0 200 0.061 0.344 0.417 0.091 0.348 0.471
-3.0 200 0.076 0.352 0.510 0.118 0.360 0.572
(b) $\sigma=0.0$
$\theta$ $n$ MeanBias($\hat{\beta}_{n}$) MeanBias($\hat{\beta}_{FE}$) MeanBias($\hat{\beta}_{PPML}$) RMSE($\hat{\beta}_{n}$) RMSE($\hat{\beta}_{FE}$) RMSE($\hat{\beta}_{PPML}$)
-0.3 50 0.048 0.134 0.194 0.082 0.142 0.399
-2.0 50 0.140 0.352 0.468 0.176 0.365 0.586
-3.0 50 0.161 0.369 0.581 0.229 0.397 0.714
-0.3 100 0.037 0.135 0.193 0.053 0.138 0.332
-2.0 100 0.093 0.348 0.438 0.110 0.352 0.508
-3.0 100 0.113 0.359 0.546 0.145 0.368 0.630
-0.3 150 0.028 0.135 0.191 0.039 0.136 0.301
-2.0 150 0.071 0.345 0.427 0.082 0.348 0.477
-3.0 150 0.089 0.354 0.529 0.108 0.359 0.592
-0.3 200 0.024 0.135 0.191 0.031 0.136 0.278
-2.0 200 0.058 0.345 0.415 0.067 0.347 0.451
-3.0 200 0.074 0.355 0.508 0.089 0.358 0.553
Table 2. 95% coverage probabilities of $CI_{conv}$, $CI_{bc}$, $CI_{FE}$, and $CI_{PPML}$
(a) $\sigma=1.0$
$\theta$ $n$ Coverage($CI_{conv}$) Coverage($CI_{bc}$) Coverage($CI_{FE}$) Coverage($CI_{PPML}$)
-0.3 50 0.790 0.961 0.498 0.537
-2.0 50 0.646 0.963 0.150 0.236
-3.0 50 0.640 0.901 0.311 0.211
-0.3 100 0.785 0.978 0.233 0.498
-2.0 100 0.668 0.970 0.011 0.173
-3.0 100 0.674 0.953 0.072 0.143
-0.3 150 0.790 0.971 0.103 0.472
-2.0 150 0.689 0.949 0.001 0.117
-3.0 150 0.688 0.944 0.016 0.09
-0.3 200 0.817 0.964 0.040 0.426
-2.0 200 0.730 0.947 0.000 0.08
-3.0 200 0.720 0.946 0.004 0.08
(b) $\sigma=0.0$
$\theta$ $n$ Coverage($CI_{conv}$) Coverage($CI_{bc}$) Coverage($CI_{FE}$) Coverage($CI_{PPML}$)
-0.3 50 0.698 0.918 0.141 0.547
-2.0 50 0.535 0.935 0.026 0.204
-3.0 50 0.592 0.869 0.168 0.182
-0.3 100 0.655 0.960 0.001 0.515
-2.0 100 0.482 0.958 0.000 0.12
-3.0 100 0.571 0.944 0.004 0.106
-0.3 150 0.673 0.977 0.000 0.45
-2.0 150 0.471 0.945 0.000 0.073
-3.0 150 0.532 0.949 0.001 0.065
-0.3 200 0.660 0.970 0.000 0.407
-2.0 200 0.444 0.939 0.000 0.052
-3.0 200 0.520 0.933 0.000 0.047

6. Empirical example

6.1. Background


As a leading application of our model, consider Moretti and Wilson (2017). They study how state-level tax differences affect migration by top scientists in the U.S. Specifically, they estimate the following model implied by their economic theory:

$$
\begin{aligned}
\log(P_{ijt}/P_{iit}) &= \eta\left[\log(1-\tau_{jt})-\log(1-\tau_{it})\right] \\
&\quad+\eta^{\prime}\left[\log(1-\tau_{jt}^{\prime})-\log(1-\tau_{it}^{\prime})\right]+\gamma_{j}+\gamma_{i}+u_{ijt},
\end{aligned}
$$

where $P_{ijt}$ is the number of scientists migrating from state $i$ to state $j$ in year $t$, $\tau_{it}$ and $\tau_{it}^{\prime}$ are personal and corporate taxes imposed in state $i$ in year $t$, $\gamma_{i}$ is a state fixed effect, and $u_{ijt}$ is an error term.

Note that if there is no migration from $i$ to $j$ in year $t$, then $P_{ijt}=0$ and $\log(P_{ijt}/P_{iit})$ is undefined. In Moretti and Wilson (2017)’s dataset, more than 70% of state pairs exhibit no migration flow:
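To make the undefined-outcome problem concrete, the following minimal Python sketch computes the log odds ratio outcome for a single origin-destination pair and returns `None` when the flow is zero. The function name `log_odds_outcome` and the toy counts are illustrative assumptions, not part of Moretti and Wilson (2017)’s code.

```python
import math

def log_odds_outcome(p_ij, p_ii):
    """Return log(P_ijt / P_iit), or None when the outcome is undefined.

    p_ij: number of movers from state i to state j in year t (hypothetical input).
    p_ii: number of stayers in state i in year t (assumed positive).
    """
    if p_ii <= 0:
        raise ValueError("p_ii must be positive")
    if p_ij <= 0:
        # Zero flow: the log is undefined, so the pair drops out of the
        # outcome regression and is informative only about selection.
        return None
    return math.log(p_ij / p_ii)

# Toy example: 3 movers and 300 stayers, then a pair with no movers.
print(log_odds_outcome(3, 300))
print(log_odds_outcome(0, 300))
```

Pairs for which the function returns `None` are exactly the observations that the selection equation has to account for.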

Figure 6.1. Fraction of positive migration flows in Moretti and Wilson (2017)’s dataset. The migration flow is positive in a given year if at least one scientist moves from state $i$ to state $j$. (Scientists are "stars": they are at or above the 95% quantile in the number of patents over the past ten years.)

When running a regression, they are concerned with a potential sample selection bias stemming from these undefined outcomes. They argue that if the main regressors are not systematically associated with whether there is a positive migration flow, the selection bias should be minimal. Running OLS on the linear probability model, they find little correlation between the main regressors and the absence of a flow. Re-estimating their regression with our estimator helps check the validity of their argument and the appropriateness of using the linear probability model.

When applying our model to their context, we must consider what $R_{ijt}$ should be. Since Moretti and Wilson (2017)’s underlying theory is based on scientists’ and firms’ discrete choices, one consistent way to generate zero migration flows between some states is to consider endogenous choice sets as in Dubé et al. (2021). Formally, we can write the choice set of representative scientists in state $i$ as $C_{it}=\{j\in\{1,...,51\}:d_{ijt}=1\}$. Here, $\{d_{ijt}\}$ represents the job-market network; if $d_{ijt}=1$, it is possible to move from $i$ to $j$, and vice versa. We can attribute the determinants of the network to the utilities and profits of scientists and firms, as well as the matching costs between the two parties. Such costs do not appear in the structural equation if they are not compensated through wages, because the structural equation consists of the determinants of log wage differences between two states. Thus, in the selection equation (2.3), in addition to $W_{ijt}$, we can include in $R_{ijt}$ variables that capture non-monetary matching costs between states $i$ and $j$. Doing so does not violate Assumption 3, as $R_{ijt}$ satisfies the exclusion restriction.

6.2. Implementation


For $W_{ijt}$, as in Moretti and Wilson (2017), we include the state-to-state differences in (i) the individual income average tax rate (ATR) faced by a hypothetical taxpayer at the 99% quantile of the national income distribution, (ii) the corporate tax rate (CIT), (iii) the investment tax credit (ITC), and (iv) the R&D tax credit (R&D credit). This is the same set of regressors as in Moretti and Wilson (2017)’s baseline regression. For $R_{ijt}$, we use $W_{ijt}$ plus the state-to-state difference in the logarithm of population (POP) and a dummy variable that indicates whether the governors of $i$ and $j$ belong to the same political party (GOV). The additional variables in $R_{ijt}$ arguably measure non-monetary costs of connecting firms and workers in two states.

We implement the first-step estimation as follows. We use the conditional logit estimator extended to a directed graph with multiple periods, which is given by

$$
\hat{\gamma}_{n}=\underset{g\in\mathcal{G}}{\arg\max}\sum_{s<t}\sum_{i,j}M_{ij,st}(g),
$$

where $\mathcal{G}$ is a compact subset of $\mathbb{R}^{q_{r}}$ and

$$
\begin{aligned}
M_{ij,st}(g) &= \boldsymbol{1}\{d_{ijs}+d_{ijt}=1\}\times\left[\boldsymbol{1}\{d_{ijs}=1\}\ln(e_{ij,st})+\boldsymbol{1}\{d_{ijt}=1\}\ln(1-e_{ij,st})\right], \\
e_{ij,st} &= \frac{\exp(\Delta_{st}R_{ij}^{\prime}g)}{1+\exp(\Delta_{st}R_{ij}^{\prime}g)}.
\end{aligned}
$$
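The objective above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the record layout `(d_s, d_t, delta_r)`, the helper names, and the toy data are hypothetical, and the differenced regressor $\Delta_{st}R_{ij}$ is assumed to be supplied already computed.

```python
import math

def logistic(x):
    """Numerically stable logistic function exp(x) / (1 + exp(x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def pair_contribution(d_s, d_t, delta_r, g):
    """M_{ij,st}(g) for one directed pair and one period pair (s, t).

    d_s, d_t: link indicators (0 or 1) in periods s and t.
    delta_r:  list holding Delta_{st} R_ij (assumed already differenced).
    g:        candidate coefficient vector of the same length.
    """
    if d_s + d_t != 1:
        return 0.0  # pairs that do not switch status contribute nothing
    index = sum(r * c for r, c in zip(delta_r, g))
    e = logistic(index)
    return math.log(e) if d_s == 1 else math.log(1.0 - e)

def objective(data, g):
    """Sum of contributions over all (pair, period-pair) records."""
    return sum(pair_contribution(d_s, d_t, dr, g) for d_s, d_t, dr in data)

# Toy records: two switchers and one pair linked in both periods.
toy = [(1, 0, [1.0]), (0, 1, [-0.5]), (1, 1, [2.0])]
print(objective(toy, [0.0]))
print(objective(toy, [1.0]))
```

Maximizing `objective` over `g` with any numerical optimizer would give a first-step estimate in this simplified setting; only pairs whose link status changes between the two periods affect the maximizer.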

Table 3 reports the first-step estimation result. The coefficients on the newly added variables $GOV$ and $POP$ deviate from zero, which suggests that part of the identification assumptions (Assumption 3) is satisfied. Also, for these two variables, the estimated coefficients imply that the job-market network exhibits homophily: similar states are more likely to be connected.

Table 3. First Step Estimation Result
Variable $\hat{\gamma}_{n}$
GOV 0.162
POP -13.144
ATR 8.561
CIT 5.850
ITC 5.350
R&D credit -0.230

For the second-step estimator, we use our $\hat{\beta}_{n}$ defined above and extend it to the directed graph with multiple periods, as discussed in Section 4. We use a biweight kernel for $K(\cdot)$ (so $k=2$), choose $h=3.0$ as an initial constant for pilot bandwidths, and use $\delta=0.4$ for calculating $h_{n,\delta}$. We extend and use $\hat{\Sigma}$ to calculate the standard error while taking into account the correlation across time (case 1). We also calculate the bias-corrected $95\%$ confidence interval by computing $CI_{L,0.05}$ and $CI_{U,0.05}$ as defined above. In addition, we list $\hat{\beta}_{MW}$, an estimate from Moretti and Wilson (2017) (page 1883, Table 2A, specification (3)), and calculate the conventional $95\%$ confidence interval based on their standard errors. Note that $\hat{\beta}_{MW}$ is the fixed effects estimator, whose standard error is calculated by clustering across time and the origin, destination, and origin-destination pairs.
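As a rough illustration of the second step, the sketch below implements the standard biweight kernel and a scalar kernel-weighted least squares estimator in the spirit of Kyriazidou (1997). It is a simplification under stated assumptions: the regressor and the selection index are scalars, the record layout and function names are hypothetical, and the exact $\hat{\beta}_{n}$ defined earlier in the paper may differ in normalization and in how multiple periods are pooled.

```python
def biweight(u):
    """Biweight (quartic) kernel: (15/16) * (1 - u^2)^2 on [-1, 1], else 0."""
    if abs(u) > 1.0:
        return 0.0
    return 15.0 / 16.0 * (1.0 - u * u) ** 2

def kernel_weighted_slope(records, gamma_hat, h_n):
    """Scalar kernel-weighted least squares on differenced data.

    records: iterable of (d1, d2, delta_w, delta_y, delta_r) with scalar
             delta_w and delta_r (a simplification for illustration).
    """
    num = den = 0.0
    for d1, d2, dw, dy, dr in records:
        # Weight is positive only if the outcome is observed in both
        # periods and the two selection indices are close.
        weight = biweight(dr * gamma_hat / h_n) * d1 * d2
        num += weight * dw * dy
        den += weight * dw * dw
    if den == 0.0:
        raise ValueError("no observation receives positive kernel weight")
    return num / den

toy = [
    (1, 1, 1.0, 2.0, 0.1),
    (1, 1, -2.0, -4.0, -0.2),
    (1, 0, 3.0, 0.0, 0.0),    # outcome unobserved in period 2: zero weight
    (1, 1, 1.0, 9.0, 50.0),   # selection indices far apart: zero weight
]
print(kernel_weighted_slope(toy, gamma_hat=1.0, h_n=3.0))
```

In the toy data the two informative records satisfy `delta_y = 2 * delta_w` exactly, so the estimate is driven entirely by them; the other two records receive zero weight.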

We summarize the result in Table 4. We can see that $\hat{\beta}_{n}$ takes values similar to $\hat{\beta}_{MW}$, which supports the robust positive effect of income and corporate-related tax differences on migration. Thus, Moretti and Wilson (2017)’s estimates are not likely to be qualitatively affected by the sample selection effects. However, while Moretti and Wilson (2017)’s estimates are statistically significant at the $5\%$ level, our confidence intervals show that none of the estimates except ITC remains statistically significant at that level. Our insignificance result is driven by both the increase in standard errors and the asymptotic bias correction. (Footnote 5: Our standard error is from $\hat{\Sigma}$, which takes fully into account the dependence among pairs that share an origin or a destination state, such as California$\to$Wisconsin and New York$\to$California. Moretti and Wilson (2017)’s standard error calculation ignores such a dependence structure.) Thus, our exercise shows that some of the results in Moretti and Wilson (2017) may not be robust to the presence of sample selection due to the endogeneity of the job-market network.

Table 4. Comparison of our estimator and Moretti and Wilson (2017)
(a) This paper
Variable Estimator (s.e.) CI
ATR 1.634 (1.886) $[-2.656,5.872]$
CIT 1.666 (2.040) $[-2.948,6.276]$
ITC 1.980 (0.584) $[0.651,3.290]$
R&D credit 0.429 (0.780) $[-1.352,2.117]$
(b) Moretti and Wilson (2017)
Variable Estimator (s.e.) CI
ATR 1.926 (0.514) $[0.918,2.933]$
CIT 1.840 (0.588) $[0.687,2.992]$
ITC 1.793 (0.411) $[0.987,2.598]$
R&D credit 0.368 (0.182) $[0.011,0.724]$
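The conventional intervals in panel (b) can be checked by hand from the reported estimates and standard errors. The sketch below does this in Python, assuming the usual normal critical value of 1.96 and assuming that the rows of panel (b) follow the variable order of panel (a); the helper name `conventional_ci` is illustrative. It does not reproduce the bias-corrected intervals in panel (a), which depend on the bias estimate defined earlier.

```python
def conventional_ci(estimate, std_error, z=1.96):
    """Two-sided normal-approximation interval: estimate +/- z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

# Point estimates and standard errors from panel (b) of Table 4.
panel_b = {
    "ATR": (1.926, 0.514),
    "CIT": (1.840, 0.588),
    "ITC": (1.793, 0.411),
    "R&D credit": (0.368, 0.182),
}
for name, (est, se) in panel_b.items():
    lo, hi = conventional_ci(est, se)
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
```

The computed endpoints agree with the tabulated ones up to rounding in the third decimal place.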

7. Conclusion

This paper studies identification and inference of a panel dyadic data sample selection model. We show that Kyriazidou (1997)’s identification strategy can be extended to our dyadic data setting, and we prove asymptotic normality of the proposed estimator.

Our estimator has some appealing properties. The distributional result implies that, in the non-degenerate case, our estimator has the same convergence rate as the usual estimators used in practice, so there is no loss of effective sample size from using our nonparametric-type estimator. Also, our estimator is guaranteed to be asymptotically normal, while others can be non-Gaussian in the limit.

We also provide consistent estimators for the asymptotic bias and variance that adapt to the degeneracy. Specifically, the bias-corrected confidence interval has an asymptotically correct size. Our simple simulation exercise confirms the validity of these estimators and highlights the importance of bias correction in both degenerate and non-degenerate cases.

References

  • Abowd et al. (1999) Abowd, J. M., F. Kramarz, and D. N. Margolis (1999): “High Wage Workers and High Wage Firms,” Econometrica, 67, 251–333.
  • Ahn and Powell (1993) Ahn, H. and J. L. Powell (1993): “Semiparametric estimation of censored selection models with a nonparametric selection mechanism,” Journal of Econometrics, 58, 3–29.
  • Auerbach (2022) Auerbach, E. (2022): “Identification and Estimation of a Partially Linear Regression Model Using Network Data,” Econometrica, 90, 347–365.
  • Bonhomme (2020) Bonhomme, S. (2020): “Econometric analysis of bipartite networks,” The Econometric Analysis of Network Data.
  • Cameron and Miller (2014) Cameron, A. C. and D. Miller (2014): “Robust Inference for Dyadic Data,” Unpublished manuscript.
  • Candelaria (2020) Candelaria, L. E. (2020): “A Semiparametric Network Formation Model with Unobserved Linear Heterogeneity,” .
  • Chamberlain (1980) Chamberlain, G. (1980): “Analysis of Covariance with Qualitative Data,” The Review of Economic Studies, 47, 225–238.
  • Davezies et al. (2021) Davezies, L., X. D’Haultfœuille, and Y. Guyonvarch (2021): “Empirical process results for exchangeable arrays,” The Annals of Statistics, 49, 845–862.
  • Dubé et al. (2021) Dubé, J. P., A. Hortaçsu, and J. Joo (2021): “Random-coefficients logit demand estimation with zero-valued market shares,” Marketing Science, 40.
  • Fafchamps and Gubert (2007) Fafchamps, M. and F. Gubert (2007): “The formation of risk sharing networks,” Journal of Development Economics, 83.
  • Ferreras Garrucho and Zylkin (2023) Ferreras Garrucho, D. and T. Zylkin (2023): penppml: Penalized Poisson Pseudo Maximum Likelihood Regression, r package version 0.2.3.
  • Graham (2017) Graham, B. S. (2017): “An Econometric Model of Network Formation With Degree Heterogeneity,” Econometrica, 85, 1033–1063.
  • Graham (2020) ——— (2020): “Dyadic regression,” The Econometric Analysis of Network Data.
  • Graham et al. (2019) Graham, B. S., F. Niu, and J. L. Powell (2019): “Kernel Density Estimation for Undirected Dyadic Data,” .
  • Graham et al. (2021) ——— (2021): “Minimax Risk and Uniform Convergence Rates for Nonparametric Dyadic Regression,” .
  • Hall (1984) Hall, P. (1984): “Central limit theorem for integrated square error of multivariate nonparametric density estimators,” Journal of Multivariate Analysis, 14, 1–16.
  • Head and Mayer (2014) Head, K. and T. Mayer (2014): “Gravity Equations: Workhorse, Toolkit, and Cookbook,” Handbook of International Economics, 4, 131–195.
  • Heckman (1979) Heckman, J. (1979): “Sample Selection Bias as a Specification Error,” Econometrica, 47, 153–161.
  • Helpman et al. (2008) Helpman, E., M. Melitz, and Y. Rubinstein (2008): “Estimating trade flows: Trading partners and trading volumes,” Quarterly Journal of Economics, 123, 441–487.
  • Hoeffding (1961) Hoeffding, W. (1961): “The strong law of large numbers for U-statistics,” .
  • Horowitz (1992) Horowitz, J. L. (1992): “A Smoothed Maximum Score Estimator for the Binary Response Model,” Econometrica, 60.
  • Jiménez et al. (2014) Jiménez, G., S. Ongena, J.-L. Peydró, and J. Saurina (2014): “Hazardous Times for Monetary Policy: What Do Twenty-Three Million Bank Loans Say About the Effects of Monetary Policy on Credit Risk-Taking?” Econometrica, 82, 463–505.
  • Jochmans (2023) Jochmans, K. (2023): “Peer effects and endogenous social interactions,” Journal of Econometrics, 235, 1203–1214.
  • Johnsson and Moon (2021) Johnsson, I. and H. R. Moon (2021): “Estimation of Peer Effects in Endogenous Social Networks: Control Function Approach,” The Review of Economics and Statistics, 103, 328–345.
  • Kojevnikov et al. (2021) Kojevnikov, D., V. Marmer, and K. Song (2021): “Limit theorems for network dependent random variables,” Journal of Econometrics, 222, 882–908.
  • Kyriazidou (1997) Kyriazidou, E. (1997): “Estimation of a Panel Data Sample Selection Model,” Econometrica, 65.
  • Manski (1987) Manski, C. F. (1987): “Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data,” Econometrica, 55, 357–362.
  • Menzel (2021) Menzel, K. (2021): “Bootstrap With Cluster-Dependence in Two or More Dimensions,” Econometrica, 89, 2143–2188.
  • Monte et al. (2018) Monte, F., S. J. Redding, and E. Rossi-Hansberg (2018): “Commuting, migration, and local employment elasticities,” American Economic Review, 108.
  • Moretti and Wilson (2017) Moretti, E. and D. J. Wilson (2017): “The Effect of State Taxes on the Geographical Location of Top Earners: Evidence from Star Scientists,” American Economic Review, 107, 1858–1903.
  • Silva and Tenreyro (2006) Silva, J. M. S. and S. Tenreyro (2006): “The log of gravity,” Review of Economics and Statistics, 88.
  • Tabord-Meehan (2019) Tabord-Meehan, M. (2019): “Inference With Dyadic Data: Asymptotic Behavior of the Dyadic-Robust t-Statistic,” Journal of Business and Economic Statistics, 37.
  • White (2001) White, H. (2001): Asymptotic Theory for Econometricians, Academic Press.
  • Zeleneev (2020) Zeleneev, A. (2020): “Identification and Estimation of Network Models with Nonparametric Unobserved Heterogeneity,” https://www.princeton.edu/~zeleneev/azeleneev_jmp.pdf.

Appendix A. Proofs

Proof of Theorem 1

Proof.

First, consider the infeasible version of $\hat{\beta}_{n}$, where $\hat{\gamma}_{n}$ is replaced by the true $\gamma$:

$$
\tilde{\beta}_{n}=\beta+S_{WW}^{-1}S_{W\lambda}+S_{WW}^{-1}S_{W\nu},
$$

where $S_{WW}$, $S_{W\lambda}$, and $S_{W\nu}$ are the same as $\hat{S}_{WW}$, $\hat{S}_{W\lambda}$, and $\hat{S}_{W\nu}$ except that $\hat{\gamma}_{n}$ is replaced by $\gamma$. We use the following lemmas.

Lemma 1.

Suppose Assumptions 1-9 hold. Then,

$$
S_{WW}\to_{p}\Sigma_{WW},
$$

as $n\to\infty$.

Lemma 2.

Suppose Assumptions 1-9 hold. Fix some $h\in(0,\infty)$. If $Nh_{n}^{2k+3}\to h$, then

$$
\sqrt{Nh_{n}}S_{W\lambda}\to_{p}\sqrt{h}\Sigma_{W\lambda},
$$

as $n\to\infty$. If $Nh_{n}^{2k+3}\to\infty$ and $nh_{n}^{2k+2}\to\infty$, then

$$
h_{n}^{-(k+1)}S_{W\lambda}\to_{p}\Sigma_{W\lambda},
$$

as $n\to\infty$.

Lemma 3.

Suppose Assumptions 1-9 hold. Fix an arbitrary nonzero vector $c\in\mathbb{R}^{q_{w}}$ and some constant $h\in(0,\infty]$. Let $c_{W}=\Sigma_{WW}^{-1}c$. If $c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}>0$ and $Nh_{n}^{2k+3}\to h$, then

$$
\sqrt{n}c^{\prime}S_{WW}^{-1}S_{W\nu}\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}),
$$

as $n\to\infty$. If $c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}=0$ and $Nh_{n}^{2k+3}\to h$, then

$$
\sqrt{Nh_{n}}c^{\prime}S_{WW}^{-1}S_{W\nu}\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,2}c_{W}),
$$

as $n\to\infty$.

By combining Lemmas 1-3, the statement of Theorem 1 follows for $\tilde{\beta}_{n}$. The following lemmas are used to show the negligibility of $\hat{\beta}_{n}-\tilde{\beta}_{n}$:

Lemma 4.

Suppose Assumptions 1-10 hold. Fix some constant $h\in(0,\infty]$. If $Nh_{n}^{2k+3}\to h$, then

$$
\hat{S}_{WW}=S_{WW}+o_{p}(1).
$$

Lemma 5.

Suppose Assumptions 1-10 hold. Fix some constant $h\in(0,\infty]$. If $Nh_{n}^{2k+3}\to h$, then

$$
\hat{S}_{W\lambda}=S_{W\lambda}+o_{p}\left(\frac{1}{\sqrt{Nh_{n}}}\right).
$$

Lemma 6.

Suppose Assumptions 1-10 hold. Fix some constant $h\in(0,\infty]$. If $Nh_{n}^{2k+3}\to h$, then

$$
\hat{S}_{W\nu}=S_{W\nu}+o_{p}\left(\frac{1}{\sqrt{Nh_{n}}}\right).
$$

By combining Lemmas 4-6, we have

$$
\hat{\beta}_{n}-\beta=\tilde{\beta}_{n}-\beta+o_{p}\left(\frac{1}{\sqrt{Nh_{n}}}\right).
$$

Thus, the normalization $r_{n}\in\{\sqrt{n},\sqrt{Nh_{n}},h_{n}^{-(k+1)}\}$ corresponding to each case results in

$$
r_{n}(\hat{\beta}_{n}-\beta)=r_{n}(\tilde{\beta}_{n}-\beta)+o_{p}(1).
$$

Since $\tilde{\beta}_{n}$ satisfies the statement of Theorem 1, this completes the proof. ∎

Proof of Proposition 1

Proof.

We show the claim by the following steps.

Step 1: $\hat{\Sigma}_{W\nu,2}\to_{p}\Sigma_{W\nu,2}$

By expanding $K^{2}(\Delta R_{ij}^{\prime}\hat{\gamma}_{n}/h_{n})$ around $\Delta R_{ij}^{\prime}\gamma$, we get

$$
K^{2}(\Delta R_{ij}^{\prime}\hat{\gamma}_{n}/h_{n})=K^{2}(\Delta R_{ij}^{\prime}\gamma/h_{n})+\frac{2\Delta R_{ij}^{\prime}(\hat{\gamma}_{n}-\gamma)}{h_{n}}K^{\prime}(c_{ij,n}^{*}/h_{n})K(c_{ij,n}^{*}/h_{n}),
$$

where $c_{ij,n}^{*}$ is between $\Delta R_{ij}^{\prime}\gamma$ and $\Delta R_{ij}^{\prime}\hat{\gamma}_{n}$. Then,

$$
\begin{aligned}
\hat{\Sigma}_{W\nu,2} &= \underbrace{\frac{1}{Nh_{n}}\sum_{i<j}K^{2}(\Delta R_{ij}^{\prime}\gamma/h_{n})d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\Delta\hat{\varepsilon}^{2}_{ij}}_{D_{p1,1}} \\
&\quad+\underbrace{\frac{2}{Nh_{n}^{2}}\sum_{i<j}\Delta R_{ij}^{\prime}(\hat{\gamma}_{n}-\gamma)K^{\prime}(c_{ij,n}^{*}/h_{n})K(c_{ij,n}^{*}/h_{n})d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\Delta\hat{\varepsilon}^{2}_{ij}}_{D_{p1,2}}.
\end{aligned}
$$

Sub-Step 1: $D_{p1,1}\to_{p}\Sigma_{W\nu,2}$


Observe that

$$
\begin{aligned}
\Delta\hat{\varepsilon}_{ij}^{2}-\nu_{ij}^{2} &= \left(\Delta W_{ij}^{\prime}(\beta-\hat{\beta}_{n})\right)^{2}+\lambda_{ij}^{2}+2\Delta W_{ij}^{\prime}(\beta-\hat{\beta}_{n})\lambda_{ij} \\
&\quad+2\Delta W_{ij}^{\prime}(\beta-\hat{\beta}_{n})\nu_{ij}+2\lambda_{ij}\nu_{ij}.
\end{aligned}
$$

Thus,

$$
\begin{aligned}
D_{p1,1} &= \frac{1}{Nh_{n}}\sum_{i<j}K^{2}(\Delta R_{ij}^{\prime}\gamma/h_{n})d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\nu^{2}_{ij} \\
&\quad+\frac{1}{Nh_{n}}\sum_{i<j}K^{2}(\Delta R_{ij}^{\prime}\gamma/h_{n})d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\left(\Delta W_{ij}^{\prime}(\beta-\hat{\beta}_{n})\right)^{2} \\
&\quad+\frac{1}{Nh_{n}}\sum_{i<j}K^{2}(\Delta R_{ij}^{\prime}\gamma/h_{n})d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\lambda_{ij}^{2} \\
&\quad+\frac{2}{Nh_{n}}\sum_{i<j}K^{2}(\Delta R_{ij}^{\prime}\gamma/h_{n})d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\Delta W_{ij}^{\prime}(\beta-\hat{\beta}_{n})\lambda_{ij} \\
&\quad+\frac{2}{Nh_{n}}\sum_{i<j}K^{2}(\Delta R_{ij}^{\prime}\gamma/h_{n})d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\Delta W_{ij}^{\prime}(\beta-\hat{\beta}_{n})\nu_{ij} \\
&\quad+\frac{2}{Nh_{n}}\sum_{i<j}K^{2}(\Delta R_{ij}^{\prime}\gamma/h_{n})d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\lambda_{ij}\nu_{ij}.
\end{aligned}
$$

We denote these terms by $D_{p1,1}^{i}$ for $i=1,...,6$, one for each row.

The first term $D_{p1,1}^{1}$ converges to $\Sigma_{W\nu,2}$. Its expectation coincides with $\Sigma_{W\nu,2}$ in the limit as

$$
\begin{aligned}
E[D_{p1,1}^{1}] &= \frac{1}{h_{n}}\int E[d_{121}d_{122}\Delta W_{12}\Delta W_{12}^{\prime}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r]K^{2}(r/h_{n})f_{R\gamma}(r)dr \\
&= \int E[d_{121}d_{122}\Delta W_{12}\Delta W_{12}^{\prime}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=rh_{n}]K^{2}(r)f_{R\gamma}(rh_{n})dr \\
&= \Sigma_{W\nu,2}+o(1),
\end{aligned}
$$

where the last line holds by the dominated convergence theorem under Assumptions 4, 6, and 8. For the variance, denoting each summand by $D_{p1,1,ij}^{1}$, we have

$$
\begin{aligned}
Var[\|D_{p1,1}^{1}\|] &\leq \frac{1}{Nh_{n}^{2}}E\left[\|D_{p1,1,12}^{1}\|^{2}\right] \\
&\quad+\frac{2(n-2)}{Nh_{n}^{2}}E[\|D_{p1,1,12}^{1}\|\times\|D_{p1,1,13}^{1}\|].
\end{aligned}
$$

The first term on the right-hand side is $O(1/(Nh_{n}))$ because

$$
\begin{aligned}
E\left[\|D_{p1,1,12}^{1}\|^{2}\right] &\leq \int E[\|\Delta W_{12}\|^{4}\nu_{12}^{4}|\Delta R_{12}^{\prime}\gamma=r]K^{2}(r/h_{n})f_{R\gamma}(r)dr \\
&= h_{n}\int E[\|\Delta W_{12}\|^{4}\nu_{12}^{4}|\Delta R_{12}^{\prime}\gamma=rh_{n}]K^{2}(r)f_{R\gamma}(rh_{n})dr \\
&= O(h_{n}),
\end{aligned}
$$

where the last line holds from Assumptions 4, 6, and 8. The second term on the right-hand side is $O(1/n)$ because

$$
\begin{aligned}
E[\|D_{p1,1,12}^{1}\|\times\|D_{p1,1,13}^{1}\|] &\leq E\Big[\int E[\|\Delta W_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r_{1},\xi_{1},U_{1}] \\
&\quad\times E[\|\Delta W_{13}\|^{2}\nu_{13}^{2}|\Delta R_{13}^{\prime}\gamma=r_{2},\xi_{1},U_{1}] \\
&\quad\times K^{2}(r_{1}/h_{n})K^{2}(r_{2}/h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{1})f_{R\gamma|\xi_{1},U_{1}}(r_{2})dr_{1}dr_{2}\Big] \\
&= h_{n}^{2}E\Big[\int E[\|\Delta W_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r_{1}h_{n},\xi_{1},U_{1}] \\
&\quad\times E[\|\Delta W_{13}\|^{2}\nu_{13}^{2}|\Delta R_{13}^{\prime}\gamma=r_{2}h_{n},\xi_{1},U_{1}] \\
&\quad\times K^{2}(r_{1})K^{2}(r_{2})f_{R\gamma|\xi_{1},U_{1}}(r_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{2}h_{n})dr_{1}dr_{2}\Big] \\
&= O(h_{n}^{2}),
\end{aligned}
$$

where the last line follows from Assumptions 4, 6, and 8. Thus,

$$
Var[\|D_{p1,1}^{1}\|]=O\left(\frac{1}{Nh_{n}}\right)+O\left(\frac{1}{n}\right)=o(1).
$$

This implies that $D_{p1,1}^{1}\to_{p}\Sigma_{W\nu,2}$ as $n\to\infty$.
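The factor $2(n-2)$ in the variance bound comes from counting, for a fixed dyad, how many other dyads share exactly one node with it; dyads with no common node are treated as contributing no covariance. The following Python sketch verifies this count by brute force. It is only a combinatorial check, assuming $N=n(n-1)/2$ undirected dyads, and the function name `overlap_counts` is hypothetical.

```python
from itertools import combinations

def overlap_counts(n):
    """Count ordered pairs of dyads by the number of nodes they share."""
    dyads = list(combinations(range(n), 2))
    same = one_shared = disjoint = 0
    for a in dyads:
        for b in dyads:
            shared = len(set(a) & set(b))
            if shared == 2:
                same += 1          # variance terms
            elif shared == 1:
                one_shared += 1    # covariance terms such as (12, 13)
            else:
                disjoint += 1      # no common node
    return len(dyads), same, one_shared, disjoint

n = 6
N, same, one_shared, disjoint = overlap_counts(n)
print(N, same, one_shared, disjoint)
print(one_shared == 2 * (n - 2) * N)
```

Dividing the counts by $N^{2}$ gives the weights $1/N$ and $2(n-2)/N$ on the variance and covariance terms, the latter being of order $1/n$.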

The second term $D_{p1,1}^{2}$ converges to 0. Observe that, as $K$ is bounded by Assumption 8, for some absolute constant $C>0$,

$$
\|D_{p1,1}^{2}\|\leq\frac{\|\beta-\hat{\beta}_{n}\|^{2}}{h_{n}}\times\frac{C}{N}\sum_{i<j}\|\Delta W_{ij}\|^{4}.
$$

Since $E[\|\Delta W_{ij}\|^{4}]<\infty$ by Assumption 7, we can apply the law of large numbers for U-statistics (Hoeffding (1961)) to $\frac{C}{N}\sum_{i<j}\|\Delta W_{ij}\|^{4}$, which is therefore $O_{p}(1)$. Also, since $\|\beta-\hat{\beta}_{n}\|=O_{p}(1/\sqrt{n})$ (which is the worst-case rate for the specified $h_{n}$ by Theorem 1), we have $\|\beta-\hat{\beta}_{n}\|^{2}/h_{n}=O_{p}(1/(nh_{n}))=o_{p}(1)$, as $nh_{n}\sim n\times n^{-2/(2k+3)}=n^{(2k+1)/(2k+3)}$ diverges. Thus,

$$
\|D_{p1,1}^{2}\|=o_{p}(1)\times O_{p}(1)=o_{p}(1),
$$

and $D_{p1,1}^{2}\to_{p}0$ as $n\to\infty$.
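The rate arguments in this proof rely on products of powers of $n$ and the bandwidth diverging even though $h_{n}\to 0$. A small numerical sketch can make this visible. It assumes $h_{n}=hN^{-1/(2k+3)}$ with $N=n(n-1)/2$ dyads and $k=2$; the function name `bandwidth` and the grid of $n$ values are illustrative choices.

```python
import math

def bandwidth(n, k, h=1.0):
    """h_n = h * N^(-1/(2k+3)), with N = n(n-1)/2 dyads (an assumption here)."""
    big_n = n * (n - 1) / 2
    return h * big_n ** (-1.0 / (2 * k + 3))

# Columns: n, h_n, n * h_n, sqrt(n) * h_n
for n in (50, 100, 200, 400):
    h_n = bandwidth(n, k=2)
    print(n, round(h_n, 4), round(n * h_n, 2), round(math.sqrt(n) * h_n, 3))
```

Under these assumptions the bandwidth shrinks slowly, at rate $n^{-2/7}$ for $k=2$, so both $nh_{n}$ and $\sqrt{n}h_{n}$ grow with $n$.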

The third term $D_{p1,1}^{3}$ converges to 0. Observe that

$$
\begin{aligned}
E[\|D_{p1,1}^{3}\|] &\leq \int E[\|\Delta W_{12}\|^{2}\Lambda_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r]r^{2}K^{2}(r/h_{n})f_{R\gamma}(r)dr \\
&= h_{n}^{3}\int E[\|\Delta W_{12}\|^{2}\Lambda_{12}^{2}|\Delta R_{12}^{\prime}\gamma=rh_{n}]r^{2}K^{2}(r)f_{R\gamma}(rh_{n})dr \\
&= O(h_{n}^{3}),
\end{aligned}
$$

where the last line follows from Assumptions 4, 6, and 8. Thus, $E[\|D_{p1,1}^{3}\|]=o(1)$. Next, writing each summand of $D_{p1,1}^{3}$ as $D_{p1,1,ij}^{3}$,

$$
Var[\|D_{p1,1}^{3}\|]\leq\frac{1}{Nh_{n}^{2}}E[\|D_{p1,1,12}^{3}\|^{2}]+\frac{2(n-2)}{Nh_{n}^{2}}E[\|D_{p1,1,12}^{3}\|\times\|D_{p1,1,13}^{3}\|].
$$

The first term on the right-hand side is $O(h_{n}^{3}/N)$ because

$$
\begin{aligned}
E[\|D_{p1,1,12}^{3}\|^{2}] &\leq \int E[\|\Delta W_{12}\|^{4}\Lambda_{12}^{4}|\Delta R_{12}^{\prime}\gamma=r]r^{4}K^{2}(r/h_{n})f_{R\gamma}(r)dr \\
&= h_{n}^{5}\int E[\|\Delta W_{12}\|^{4}\Lambda_{12}^{4}|\Delta R_{12}^{\prime}\gamma=rh_{n}]r^{4}K^{2}(r)f_{R\gamma}(rh_{n})dr \\
&= O(h_{n}^{5}),
\end{aligned}
$$

where the last line holds from Assumptions 4, 6, and 8. The second term on the right-hand side is $O(h_{n}^{4}/n)$ because

$$
\begin{aligned}
E[\|D_{p1,1,12}^{3}\|\times\|D_{p1,1,13}^{3}\|] &\leq E\Big[\int E[\|\Delta W_{12}\|^{2}\Lambda_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r_{1},\xi_{1},U_{1}] \\
&\quad\times E[\|\Delta W_{13}\|^{2}\Lambda_{13}^{2}|\Delta R_{13}^{\prime}\gamma=r_{2},\xi_{1},U_{1}] \\
&\quad\times r_{1}^{2}r_{2}^{2}K^{2}(r_{1}/h_{n})K^{2}(r_{2}/h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{1})f_{R\gamma|\xi_{1},U_{1}}(r_{2})dr_{1}dr_{2}\Big] \\
&= h_{n}^{6}E\Big[\int E[\|\Delta W_{12}\|^{2}\Lambda_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r_{1}h_{n},\xi_{1},U_{1}] \\
&\quad\times E[\|\Delta W_{13}\|^{2}\Lambda_{13}^{2}|\Delta R_{13}^{\prime}\gamma=r_{2}h_{n},\xi_{1},U_{1}] \\
&\quad\times r_{1}^{2}r_{2}^{2}K^{2}(r_{1})K^{2}(r_{2})f_{R\gamma|\xi_{1},U_{1}}(r_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{2}h_{n})dr_{1}dr_{2}\Big] \\
&= O(h_{n}^{6}),
\end{aligned}
$$

where the last line follows from Assumptions 4, 6, and 8. Hence, we have

$$
Var[\|D_{p1,1}^{3}\|]=O\left(\frac{h_{n}^{3}}{N}\right)+O\left(\frac{h_{n}^{4}}{n}\right)=o(1).
$$

This implies that $D_{p1,1}^{3}\to_{p}0$ as $n\to\infty$.

The fourth term $D_{p1,1}^{4}$ converges to 0. Observe that, since $K$ is bounded by Assumption 8 and $\|\gamma\|<\infty$, for some constant $C>0$,

$$
\|D_{p1,1}^{4}\|\leq\frac{C\|\beta-\hat{\beta}_{n}\|}{h_{n}}\times\frac{1}{N}\sum_{i<j}\|\Delta W_{ij}\|^{3}\|\Delta R_{ij}\||\Lambda_{ij}|.
$$

The sum converges to the expectation of the summand by the law of large numbers for U-statistics (Hoeffding (1961)), as $E[\|\Delta W_{12}\|^{3}\|\Delta R_{12}\||\Lambda_{12}|]<\infty$ by the Cauchy-Schwarz inequality and Assumption 7. Thus, this part is $O_{p}(1)$. Also note that

$$
\frac{\|\beta-\hat{\beta}_{n}\|}{h_{n}}=O_{p}\left(\frac{1}{\sqrt{n}h_{n}}\right)=o_{p}(1),
$$

by Assumption 9. Hence,

$$
\|D_{p1,1}^{4}\|=o_{p}(1).
$$

This shows that $D_{p1,1}^{4}\to_{p}0$ as $n\to\infty$.

The fifth term $D_{p1,1}^{5}$ converges to 0. Observe that, since $K$ is bounded by Assumption 8, for some constant $C>0$,

$$
\|D_{p1,1}^{5}\|\leq\frac{C}{N}\sum_{i<j}\|\Delta W_{ij}\|^{3}|\nu_{ij}|\times\frac{\|\beta-\hat{\beta}_{n}\|}{h_{n}}.
$$

The sum is $O_{p}(1)$ because

$$
E[\|\Delta W_{12}\|^{3}|\nu_{12}|]<\infty,
$$

by Assumption 7 and

$$
Var\left[\frac{1}{N}\sum_{i<j}\|\Delta W_{ij}\|^{3}|\nu_{ij}|\right]\leq\frac{E[\|\Delta W_{12}\|^{6}\nu_{12}^{2}]}{N}+\frac{2(n-2)}{N}E[\|\Delta W_{12}\|^{3}\|\Delta W_{13}\|^{3}|\nu_{12}||\nu_{13}|]=o(1),
$$

as these two moments are bounded by Assumption 7. Thus,

$$
\|D_{p1,1}^{5}\|=o_{p}(1),
$$

by the previous calculation for the term involving $\hat{\beta}_{n}-\beta$. This shows that $D_{p1,1}^{5}\to_{p}0$ as $n\to\infty$.

The sixth term $D_{p1,1}^{6}$ converges to 0. Its expectation is exactly 0 by the conditional mean independence of $\nu_{ij}$. Also, by repeating a calculation similar to that for $Var[\|D_{p1,1}^{1}\|]$ (replacing $\nu_{ij}^{2}$ by $\lambda_{ij}\nu_{ij}$), we have

$$
Var[\|D_{p1,1}^{6}\|]=O\left(\frac{1}{N}\right)+O\left(\frac{h_{n}^{2}}{n}\right)=o(1).
$$

This shows that $D_{p1,1}^{6}\to_{p}0$ as $n\to\infty$.

Sub-Step 2: $D_{p1,2}\to_{p}0$

As before, we can decompose $D_{p1,2}$ into $D_{p1,2}^{i}$ for $i=1,...,6$. Unlike for $D_{p1,1}$, the moments are no longer scaled by $h_{n}^{\alpha}$ because the intermediate values $c_{ij,n}^{*}$ appear inside the kernels. Thus, by the previous calculation for $D_{p1,1}$, the terms $D_{p1,2}^{i}$ that involve $\nu_{ij}^{2}$, $\lambda_{ij}^{2}$, or $\lambda_{ij}\nu_{ij}$ have the slowest convergence rates. So, it suffices to show that those terms converge to 0 in probability.

Consider the term with $\nu_{ij}^{2}$, namely $D_{p1,2}^{1}$, which is given by

$$
D_{p1,2}^{1}=\frac{2}{Nh_{n}^{2}}\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\Delta R_{ij}^{\prime}\nu_{ij}^{2}K^{\prime}(c_{ij,n}^{*}/h_{n})K(c_{ij,n}^{*}/h_{n})(\hat{\gamma}_{n}-\gamma).
$$

Observe that, for some constant $C>0$,

$$
\|D_{p1,2}^{1}\|\leq\frac{C}{N}\sum_{i<j}\|\Delta W_{ij}\|^{2}\|\Delta R_{ij}\|\nu_{ij}^{2}\times\frac{\|\hat{\gamma}_{n}-\gamma\|}{h_{n}^{2}}.
$$

The sum is $O_{p}(1)$ because

$$
E[\|\Delta W_{12}\|^{2}\|\Delta R_{12}\|\nu_{12}^{2}]<\infty,
$$

by Assumption 7, and

$$
\begin{aligned}
&Var\left[\frac{1}{N}\sum_{i<j}\|\Delta W_{ij}\|^{2}\|\Delta R_{ij}\|\nu_{ij}^{2}\right] \\
&\leq\frac{E[\|\Delta W_{12}\|^{4}\|\Delta R_{12}\|^{2}\nu_{12}^{4}]}{N}+\frac{2(n-2)}{N}E[\|\Delta W_{12}\|^{2}\|\Delta W_{13}\|^{2}\|\Delta R_{12}\|\|\Delta R_{13}\|\nu_{12}^{2}\nu_{13}^{2}] \\
&=o(1),
\end{aligned}
$$

as these moments are bounded by Assumption 7. The term involving $\hat{\gamma}_{n}$ is $o_{p}(1)$ because

$$
\frac{\|\hat{\gamma}_{n}-\gamma\|}{h_{n}^{2}}=\frac{\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|}{\sqrt{Nh_{n}^{5}}}=o_{p}(1),
$$

by Assumption 10 and because $Nh_{n}^{5}=Nh_{n}^{2k+3}\times h_{n}^{-2k+2}$ diverges for $k\geq 2$. Hence,

$$
\|D_{p1,2}^{1}\|=O_{p}(1)\times o_{p}(1)=o_{p}(1).
$$

This shows that $D_{p1,2}^{1}\to_{p}0$ as $n\to\infty$. Thus, by the above argument, it follows that $D_{p1,2}\to_{p}0$ as $n\to\infty$.

These two sub-steps conclude that

$$
\hat{\Sigma}_{W\nu,2}\to_{p}\Sigma_{W\nu,2},
$$

as $n\to\infty$. This finishes Step 1.
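For intuition, the leading term $D_{p1,1}$ of the variance estimator analyzed in Step 1 can be written out for a scalar regressor. The Python sketch below is a simplified illustration under stated assumptions: $\Delta W_{ij}$ and $\Delta R_{ij}$ are scalars, the residual is taken to be $\Delta\hat{\varepsilon}_{ij}=\Delta y_{ij}-\Delta W_{ij}\hat{\beta}_{n}$, $N$ is the number of dyad records supplied, and the function names and record layout are hypothetical.

```python
def biweight(u):
    """Biweight kernel: (15/16) * (1 - u^2)^2 on [-1, 1], else 0."""
    if abs(u) > 1.0:
        return 0.0
    return 15.0 / 16.0 * (1.0 - u * u) ** 2

def sigma_hat_2(records, gamma_hat, beta_hat, h_n):
    """(1 / (N h_n)) * sum of K^2 * d1 * d2 * dw^2 * residual^2 (scalar case).

    records: list of (d1, d2, delta_w, delta_y, delta_r) tuples.
    """
    total = 0.0
    for d1, d2, dw, dy, dr in records:
        resid = dy - dw * beta_hat              # estimated differenced error
        k = biweight(dr * gamma_hat / h_n)      # kernel at the estimated index
        total += k * k * d1 * d2 * dw * dw * resid * resid
    return total / (len(records) * h_n)

toy = [
    (1, 1, 1.0, 3.0, 0.0),   # observed in both periods, residual 1
    (1, 0, 1.0, 5.0, 0.0),   # unobserved in period 2, contributes zero
]
print(sigma_hat_2(toy, gamma_hat=1.0, beta_hat=2.0, h_n=1.0))
```

Only dyads observed in both periods with nearby selection indices contribute, which is why the normalization is by $Nh_{n}$ and not by $N$.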

Step 2: $\hat{\Sigma}_{W\nu,1}\to_{p}\Sigma_{W\nu,1}$

Define

$$
S_{ij}\equiv 2d_{ij1}d_{ij2}K_{h_{n}}(\Delta R_{ij}^{\prime}\gamma)\Delta W_{ij}\Delta\hat{\varepsilon}_{ij},
$$

and let $\tilde{\Sigma}_{W\nu,1}$ be $\hat{\Sigma}_{W\nu,1}$ with $\hat{S}_{ij}$ replaced by $S_{ij}$. First, we use the following result:

Lemma 7.

Suppose that Assumptions 1-10 hold. If $h_{n}=hN^{-1/(2k+3)}$ for some $h>0$, we have

$$
\tilde{\Sigma}_{W\nu,1}\to_{p}\Sigma_{W\nu,1},
$$

as $n\to\infty$.

Then, it is enough to show that $\hat{\Sigma}_{W\nu,1}$ is well approximated by $\tilde{\Sigma}_{W\nu,1}$:

Lemma 8.

Suppose that Assumptions 1-10 hold. If $h_{n}=hN^{-1/(2k+3)}$ for some $h>0$, we have

$$
\|\hat{\Sigma}_{W\nu,1}-\tilde{\Sigma}_{W\nu,1}\|=o_{p}(1).
$$

Lemmas 7 and 8 imply that

$$
\|\hat{\Sigma}_{W\nu,1}-\Sigma_{W\nu,1}\|\leq\|\hat{\Sigma}_{W\nu,1}-\tilde{\Sigma}_{W\nu,1}\|+\|\tilde{\Sigma}_{W\nu,1}-\Sigma_{W\nu,1}\|=o_{p}(1),
$$

which shows the consistency of $\hat{\Sigma}_{W\nu,1}$ for $\Sigma_{W\nu,1}$. This finishes Step 2.

Step 3: cWΣWν,1cW=0c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}=0 case


Observe that, by some algebra,

nhncS^WW1Σ^Wν,1S^WW1c\displaystyle nh_{n}c^{\prime}\hat{S}_{WW}^{-1}\hat{\Sigma}_{W\nu,1}\hat{S}_{WW}^{-1}c
=nhnc(S^WW1ΣWW1)Σ^Wν,1(S^WW1ΣWW1)c\displaystyle=nh_{n}c^{\prime}(\hat{S}_{WW}^{-1}-\Sigma_{WW}^{-1})\hat{\Sigma}_{W\nu,1}(\hat{S}_{WW}^{-1}-\Sigma_{WW}^{-1})c
\displaystyle+nh_{n}c^{\prime}\Sigma_{WW}^{-1}\hat{\Sigma}_{W\nu,1}(\hat{S}_{WW}^{-1}-\Sigma_{WW}^{-1})c+nh_{n}c^{\prime}(\hat{S}_{WW}^{-1}-\Sigma_{WW}^{-1})\hat{\Sigma}_{W\nu,1}\Sigma_{WW}^{-1}c
\displaystyle+nh_{n}c^{\prime}\Sigma_{WW}^{-1}\hat{\Sigma}_{W\nu,1}\Sigma_{WW}^{-1}c.

We show the negligibility of the first line in the right hand side of this decomposition. By the proof of Lemma 1, we have that

S^WW1ΣWW1=op(nα/2)\displaystyle\hat{S}_{WW}^{-1}-\Sigma_{WW}^{-1}=o_{p}(n^{-\alpha/2})

for any α(0,1)\alpha\in(0,1). Thus,

nhnc(S^WW1ΣWW1)Σ^Wν,1(S^WW1ΣWW1)=n1αhnop(1)Σ^Wν,1op(1)=op(1),\displaystyle nh_{n}c^{\prime}(\hat{S}_{WW}^{-1}-\Sigma_{WW}^{-1})\hat{\Sigma}_{W\nu,1}(\hat{S}_{WW}^{-1}-\Sigma_{WW}^{-1})=n^{1-\alpha}h_{n}o_{p}(1)\hat{\Sigma}_{W\nu,1}o_{p}(1)=o_{p}(1),

for α[(2k+1)/(2k+3),1)\alpha\in[(2k+1)/(2k+3),1) as n1αhn=n(2k+1α(2k+3))/(2k+3)=o(1)n^{1-\alpha}h_{n}=n^{(2k+1-\alpha(2k+3))/(2k+3)}=o(1) and Σ^Wν,1=Op(1)\hat{\Sigma}_{W\nu,1}=O_{p}(1) by the above Step 2.
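The exponent bookkeeping in this step can be checked with exact rational arithmetic. The following is a minimal sketch, assuming $N$ is proportional to $n^{2}$ so that $h_{n}\propto n^{-2/(2k+3)}$; the function name `exponent_n1a_hn` is illustrative and not part of the paper.

```python
from fractions import Fraction

def exponent_n1a_hn(k, alpha):
    """Exponent e such that n^(1-alpha) * h_n is proportional to n^e,
    assuming h_n ∝ N^(-1/(2k+3)) and N ∝ n^2 (assumptions of this sketch)."""
    return 1 - Fraction(alpha) - Fraction(2, 2 * k + 3)

k = 2
threshold = Fraction(2 * k + 1, 2 * k + 3)             # (2k+1)/(2k+3) = 5/7 for k = 2
at_threshold = exponent_n1a_hn(k, threshold)           # exponent at the left endpoint
above_threshold = exponent_n1a_hn(k, Fraction(6, 7))   # an alpha strictly inside the range
```

Under these assumptions the exponent is zero at the left endpoint, so the factor $n^{1-\alpha}h_{n}$ is bounded there and strictly vanishing for larger $\alpha$; in either case the two $o_{p}(1)$ factors deliver the stated negligibility.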

The remaining terms are shown to be negligible by applying the following lemmas:

Lemma 9.

Suppose that Assumptions 1-10 hold. If cWΣWν,1cW=0c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}=0 and hn=hN1/(2k+3)h_{n}=hN^{-1/(2k+3)} for some h(0,)h\in(0,\infty), we have

n1α/2hnΣ^Wν,1cW\displaystyle n^{1-\alpha/2}h_{n}\hat{\Sigma}_{W\nu,1}c_{W} p0,\displaystyle\to_{p}0,
n1α/2hncWΣ^Wν,1\displaystyle n^{1-\alpha/2}h_{n}c_{W}^{\prime}\hat{\Sigma}_{W\nu,1} p0,\displaystyle\to_{p}0,

as nn\to\infty for α[6/(2k+3),1)\alpha\in[6/(2k+3),1).

Lemma 10.

Suppose that Assumptions 1-10 hold. If cWΣWν,1cW=0c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}=0 and hn=hN1/(2k+3)h_{n}=hN^{-1/(2k+3)} for some h(0,)h\in(0,\infty), we have

nhncWΣ^Wν,1cWp0,\displaystyle nh_{n}c_{W}^{\prime}\hat{\Sigma}_{W\nu,1}c_{W}\to_{p}0,

as nn\to\infty.

Then, by Lemmas 9 and 10, the last two lines are shown to be

nhncΣWW1Σ^Wν,1(S^WW1ΣWW1)c+nhnc(S^WW1ΣWW1)Σ^Wν,1ΣWW1c\displaystyle nh_{n}c^{\prime}\Sigma_{WW}^{-1}\hat{\Sigma}_{W\nu,1}(\hat{S}_{WW}^{-1}-\Sigma_{WW}^{-1})c+nh_{n}c^{\prime}(\hat{S}_{WW}^{-1}-\Sigma_{WW}^{-1})\hat{\Sigma}_{W\nu,1}\Sigma_{WW}^{-1}c
\displaystyle+nh_{n}c^{\prime}\Sigma_{WW}^{-1}\hat{\Sigma}_{W\nu,1}\Sigma_{WW}^{-1}c
=n1α/2hncΣWW1Σ^Wν,1op(1)+cop(1)n1α/2hnΣ^Wν,1ΣWW1c+op(1)\displaystyle=n^{1-\alpha/2}h_{n}c^{\prime}\Sigma_{WW}^{-1}\hat{\Sigma}_{W\nu,1}o_{p}(1)+c^{\prime}o_{p}(1)n^{1-\alpha/2}h_{n}\hat{\Sigma}_{W\nu,1}\Sigma_{WW}^{-1}c+o_{p}(1)
=op(1).\displaystyle=o_{p}(1).

Hence,

\displaystyle nh_{n}c^{\prime}\hat{S}_{WW}^{-1}\hat{\Sigma}_{W\nu,1}\hat{S}_{WW}^{-1}c=o_{p}(1).

Steps 1-3 finish the proof of Proposition 1. ∎

Proof of Proposition 2

Proof.

Since

hn,δ(k+1)(β^n,δβ^n)=hn,δ(k+1)(β^n,δβ)hn,δ(k+1)(β^nβ),\displaystyle h_{n,\delta}^{-(k+1)}(\hat{\beta}_{n,\delta}-\hat{\beta}_{n})=h_{n,\delta}^{-(k+1)}(\hat{\beta}_{n,\delta}-\beta)-h_{n,\delta}^{-(k+1)}(\hat{\beta}_{n}-\beta),

where the first term on the right hand side converges to ΣWW1ΣWλ\Sigma_{WW}^{-1}\Sigma_{W\lambda} by Theorem 1 as Nhn,δ2k+3Nh_{n,\delta}^{2k+3}\to\infty, it suffices to show that

hn,δ(k+1)(β^nβ)=op(1).\displaystyle h_{n,\delta}^{-(k+1)}(\hat{\beta}_{n}-\beta)=o_{p}(1).

Take an arbitrary non-zero vector cqwc\in\mathbb{R}^{q_{w}}. Since β^n\hat{\beta}_{n} is calculated based on hn=hN1/(2k+3)h_{n}=hN^{-1/(2k+3)} such that Nhn2k+3hNh_{n}^{2k+3}\to h, by Theorem 1,

hn,δ(k+1)c(β^nβ)\displaystyle h_{n,\delta}^{-(k+1)}c^{\prime}(\hat{\beta}_{n}-\beta) =1nhn,δ2(k+1)×nc(β^nβ)=Op(1)=op(1)\displaystyle=\frac{1}{\sqrt{nh_{n,\delta}^{2(k+1)}}}\times\underbrace{\sqrt{n}c^{\prime}(\hat{\beta}_{n}-\beta)}_{=O_{p}(1)}=o_{p}(1)

since

nhn,δ2(k+1)n×n4δ(k+1)/(2k+3)=n2k+34δ(k+1)2k+3\displaystyle nh_{n,\delta}^{2(k+1)}\sim n\times n^{-4\delta(k+1)/(2k+3)}=n^{\frac{2k+3-4\delta(k+1)}{2k+3}}

diverges for δ(0,2k+34k+4)\delta\in(0,\frac{2k+3}{4k+4}), which is assumed by the hypothesis. Since cc is arbitrary, hn,δ(k+1)(β^nβ)=op(1)h_{n,\delta}^{-(k+1)}(\hat{\beta}_{n}-\beta)=o_{p}(1), which completes the proof. ∎
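The range of $\delta$ in the last display can be verified mechanically. This is a small sketch under the assumption that $h_{n,\delta}\propto N^{-\delta/(2k+3)}$ with $N\propto n^{2}$, which is how the rate in the display reads; the helper name is hypothetical.

```python
from fractions import Fraction

def exponent_n_hdelta(k, delta):
    """Exponent e with n * h_{n,delta}^(2(k+1)) ∝ n^e, assuming
    h_{n,delta} ∝ N^(-delta/(2k+3)) and N ∝ n^2 (assumptions of this sketch)."""
    return 1 - Fraction(4 * (k + 1), 2 * k + 3) * Fraction(delta)

k = 2
delta_max = Fraction(2 * k + 3, 4 * k + 4)        # upper bound (2k+3)/(4k+4) = 7/12
inside = exponent_n_hdelta(k, Fraction(1, 2))     # a delta below the bound
boundary = exponent_n_hdelta(k, delta_max)        # exactly at the bound
```

A positive exponent means the normalizing sequence diverges; at the boundary value of $\delta$ the exponent is zero, which is why the interval for $\delta$ is open on the right.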

Proofs of Lemmas

Proof of Lemma 1

Proof.

Write each summand of $S_{WW}$ as $S_{WW,ij}$. Since it suffices to show the element-wise convergence of $S_{WW}$ to $\Sigma_{WW}$, we use a unit vector $e\in\mathbb{R}^{q_{w}}$ with an arbitrary element being $1$ and $0$ elsewhere. Observe that

E[eSWWe]=E[eSWW,ije]\displaystyle E[e^{\prime}S_{WW}e]=E[e^{\prime}S_{WW,ij}e] =1hnE[d121d122eΔW12ΔW12e|ΔR12γ=r]K(r/hn)fRγ(r)𝑑r\displaystyle=\frac{1}{h_{n}}\int E[d_{121}d_{122}e^{\prime}\Delta W_{12}\Delta W_{12}^{\prime}e|\Delta R_{12}^{\prime}\gamma=r]K(r/h_{n})f_{R\gamma}(r)dr
=E[d121d122eΔW12ΔW12e|ΔR12γ=rhn]K(r)fRγ(rhn)𝑑r\displaystyle=\int E[d_{121}d_{122}e^{\prime}\Delta W_{12}\Delta W_{12}^{\prime}e|\Delta R_{12}^{\prime}\gamma=rh_{n}]K(r)f_{R\gamma}(rh_{n})dr
\displaystyle=e^{\prime}\Sigma_{WW}e+o(1),

where the last line holds from the dominated convergence theorem under Assumptions 4, 6 and 8. Since SWW,ijS_{WW,ij} and SWW,klS_{WW,kl} are independent if ik,li\neq k,l and jk,lj\neq k,l by Assumption 1, observe that

\displaystyle Var(e^{\prime}S_{WW}e)=\frac{Var(e^{\prime}S_{WW,12}e)}{N}+\frac{2(n-2)}{N}Cov(e^{\prime}S_{WW,12}e,e^{\prime}S_{WW,13}e).
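The coefficient $2(n-2)/N$ on the covariance term comes from counting dyads that share exactly one node. A brute-force count illustrates this; the sketch below uses only the standard library, and the function name is illustrative.

```python
from itertools import combinations

def count_overlaps(n):
    """Return (N, number of ordered pairs of distinct dyads sharing exactly one node)."""
    dyads = list(combinations(range(n), 2))
    shared_one = sum(
        1
        for a in dyads
        for b in dyads
        if a != b and len(set(a) & set(b)) == 1
    )
    return len(dyads), shared_one

n = 7
N, shared_one = count_overlaps(n)
```

Dividing the count by $N^{2}$ gives $2(n-2)/N$, the weight on the covariance in the variance of the dyadic average; dyads with no common node contribute nothing because they are independent.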

For the variance, we have

\displaystyle Var(e^{\prime}S_{WW,12}e)\leq E[(e^{\prime}S_{WW,12}e)^{2}]
1hn2E[ΔW124|ΔR12γ=r]K2(r/hn)fRγ(r)𝑑r\displaystyle\leq\frac{1}{h_{n}^{2}}\int E[\|\Delta W_{12}\|^{4}|\Delta R_{12}^{\prime}\gamma=r]K^{2}(r/h_{n})f_{R\gamma}(r)dr
=1hnE[ΔW124|ΔR12γ=rhn]K2(r)fRγ(rhn)𝑑r\displaystyle=\frac{1}{h_{n}}\int E[\|\Delta W_{12}\|^{4}|\Delta R_{12}^{\prime}\gamma=rh_{n}]K^{2}(r)f_{R\gamma}(rh_{n})dr
=O(1hn),\displaystyle=O\left(\frac{1}{h_{n}}\right),

where the last line holds from Assumptions 4, 6 and 8. For the covariance, by the conditional independence of $\Delta W_{12}$ and $\Delta W_{13}$ given $(\xi_{1},U_{1})$, we have

\displaystyle Cov(e^{\prime}S_{WW,12}e,e^{\prime}S_{WW,13}e)
\displaystyle\leq E[e^{\prime}S_{WW,12}e\times e^{\prime}S_{WW,13}e]
\displaystyle\leq\frac{1}{h_{n}^{2}}E\Big[\int E[\|\Delta W_{12}\|^{2}|\Delta R_{12}^{\prime}\gamma=r_{1},\xi_{1},U_{1}]\times E[\|\Delta W_{13}\|^{2}|\Delta R_{13}^{\prime}\gamma=r_{2},\xi_{1},U_{1}]
\displaystyle\times|K(r_{1}/h_{n})||K(r_{2}/h_{n})|f_{R\gamma|\xi_{1},U_{1}}(r_{1})f_{R\gamma|\xi_{1},U_{1}}(r_{2})dr_{1}dr_{2}\Big]
\displaystyle=E\Big[\int E[\|\Delta W_{12}\|^{2}|\Delta R_{12}^{\prime}\gamma=r_{1}h_{n},\xi_{1},U_{1}]\times E[\|\Delta W_{13}\|^{2}|\Delta R_{13}^{\prime}\gamma=r_{2}h_{n},\xi_{1},U_{1}]
\displaystyle\times|K(r_{1})||K(r_{2})|f_{R\gamma|\xi_{1},U_{1}}(r_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{2}h_{n})dr_{1}dr_{2}\Big]
\displaystyle=O(1),

where the first inequality uses $E[e^{\prime}S_{WW,12}e]E[e^{\prime}S_{WW,13}e]\geq 0$, the change of variables $r_{l}\mapsto r_{l}h_{n}$ for $l=1,2$ contributes the factor $h_{n}^{2}$, and the last line holds from Assumptions 4, 6 and 8. Thus,

\displaystyle Var(e^{\prime}S_{WW}e)=O\left(\frac{1}{Nh_{n}}\right)+O\left(\frac{1}{n}\right)=o(1).

By Chebyshev’s inequality, $e^{\prime}S_{WW}e\to_{p}e^{\prime}\Sigma_{WW}e$ as $n\to\infty$. Since $e$ is arbitrary, this completes the proof. ∎
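The change of variables $r\mapsto rh_{n}$ used repeatedly in this proof can be illustrated numerically. The sketch below is only an illustration under assumed ingredients: it takes the standard normal density as the kernel and $m(r)=\cos(r)$ as a stand-in for the smooth conditional-moment function, neither of which is taken from the paper. For this choice the smoothed integral equals $\exp(-h^{2}/2)$ exactly, so it tends to $m(0)=1$ as the bandwidth shrinks.

```python
import math

def smoothed_value(h, grid_half_width=12.0, points=48001):
    """Riemann-sum approximation of the integral of cos(r*h) * phi(r) dr, which equals
    (1/h) * integral of cos(r) * phi(r/h) dr after substituting r -> r*h."""
    step = 2 * grid_half_width / (points - 1)
    total = 0.0
    for i in range(points):
        r = -grid_half_width + i * step
        phi = math.exp(-0.5 * r * r) / math.sqrt(2 * math.pi)
        total += math.cos(r * h) * phi
    return total * step

# Smaller bandwidths bring the smoothed value closer to m(0) = 1.
values = {h: smoothed_value(h) for h in (0.5, 0.1, 0.01)}
```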

Proof of Lemma 2

Proof.

Write each summand of SWλS_{W\lambda} as SWλ,ijS_{W\lambda,ij}. We use a unit vector eqwe\in\mathbb{R}^{q_{w}} with an arbitrary element being 11 and 0 elsewhere. Observe that, for large enough nn,

E[eSWλ]=E[eSWλ,ij]\displaystyle E[e^{\prime}S_{W\lambda}]=E[e^{\prime}S_{W\lambda,ij}] =1hnE[d121d122eΔW12λ12|ΔR12γ=r]K(r/hn)fRγ(r)𝑑r\displaystyle=\frac{1}{h_{n}}\int E[d_{121}d_{122}e^{\prime}\Delta W_{12}\lambda_{12}|\Delta R_{12}^{\prime}\gamma=r]K(r/h_{n})f_{R\gamma}(r)dr
=hneg(rhn)rK(r)𝑑r\displaystyle=h_{n}\int e^{\prime}g(rh_{n})rK(r)dr
=hnk+1k!(ekg(rhn)rk+o(1))rk+1K(r)𝑑r\displaystyle=\frac{h_{n}^{k+1}}{k!}\int\left(e^{\prime}\frac{\partial^{k}g(rh_{n})}{\partial r^{k}}+o(1)\right)r^{k+1}K(r)dr
=hnk+1eΣWλ+o(hnk+1),\displaystyle=h_{n}^{k+1}e^{\prime}\Sigma_{W\lambda}+o(h_{n}^{k+1}),

where the second line holds from λ12=Λ12×ΔR12γ\lambda_{12}=\Lambda_{12}\times\Delta R_{12}^{\prime}\gamma, the third line holds from Assumption 8 eliminating siK(s)\int s^{i}K(s) for i=1,,ki=1,...,k, and the last line holds from the dominated convergence theorem under Assumptions 6 and 8. Observe that

Var[eSWλ]=Var[eSWλ,12]N+2(n2)NCov[eSWλ,12,eSWλ,13].\displaystyle Var[e^{\prime}S_{W\lambda}]=\frac{Var[e^{\prime}S_{W\lambda,12}]}{N}+\frac{2(n-2)}{N}Cov[e^{\prime}S_{W\lambda,12},e^{\prime}S_{W\lambda,13}].

For the variance, we have

Var[eSWλ,12]\displaystyle Var[e^{\prime}S_{W\lambda,12}] E[(eSWλ,12)2]\displaystyle\leq E[(e^{\prime}S_{W\lambda,12})^{2}]
1hn2E[ΔW122λ122|ΔR12γ=r]K2(r/hn)fRγ(r)𝑑r\displaystyle\leq\frac{1}{h_{n}^{2}}\int E[\|\Delta W_{12}\|^{2}\lambda_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r]K^{2}(r/h_{n})f_{R\gamma}(r)dr
=hnE[ΔW122Λ122|ΔR12γ=rhn]r2K2(r)fRγ(rhn)𝑑r\displaystyle=h_{n}\int E[\|\Delta W_{12}\|^{2}\Lambda_{12}^{2}|\Delta R_{12}^{\prime}\gamma=rh_{n}]r^{2}K^{2}(r)f_{R\gamma}(rh_{n})dr
=O(hn),\displaystyle=O(h_{n}),

where the last line holds from the Cauchy-Schwarz inequality and Assumptions 4, 6, and 8. For the covariance, we have

Cov[eSWλ,12,eSWλ,13]\displaystyle Cov[e^{\prime}S_{W\lambda,12},e^{\prime}S_{W\lambda,13}]
E[eSWλ,12×eSWλ,13]\displaystyle\leq E[e^{\prime}S_{W\lambda,12}\times e^{\prime}S_{W\lambda,13}]
\displaystyle=\frac{1}{h_{n}^{2}}E\Big[\int E[d_{121}d_{122}e^{\prime}\Delta W_{12}\Lambda_{12}|\Delta R_{12}^{\prime}\gamma=r_{1},\xi_{1},U_{1}]E[d_{131}d_{132}e^{\prime}\Delta W_{13}\Lambda_{13}|\Delta R_{13}^{\prime}\gamma=r_{2},\xi_{1},U_{1}]
\displaystyle\times r_{1}r_{2}K(r_{1}/h_{n})K(r_{2}/h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{1})f_{R\gamma|\xi_{1},U_{1}}(r_{2})dr_{1}dr_{2}\Big]
\displaystyle\leq h_{n}^{2}E\Big[\int E[\|\Delta W_{12}\||\Lambda_{12}||\Delta R_{12}^{\prime}\gamma=r_{1}h_{n},\xi_{1},U_{1}]E[\|\Delta W_{13}\||\Lambda_{13}||\Delta R_{13}^{\prime}\gamma=r_{2}h_{n},\xi_{1},U_{1}]
\displaystyle\times|r_{1}||r_{2}||K(r_{1})||K(r_{2})|f_{R\gamma|\xi_{1},U_{1}}(r_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{2}h_{n})dr_{1}dr_{2}\Big]
=O(hn2),\displaystyle=O(h_{n}^{2}),

where the last line holds from the Cauchy-Schwarz inequality and Assumptions 4, 6, and 8. Thus,

Var[eSWλ]=O(hnN)+O(hn2n)=O(hn2n),\displaystyle Var[e^{\prime}S_{W\lambda}]=O\left(\frac{h_{n}}{N}\right)+O\left(\frac{h_{n}^{2}}{n}\right)=O\left(\frac{h_{n}^{2}}{n}\right),

since O(hn/N)=O(hn2/n)×O(1/(nhn))=o(hn2/n)O(h_{n}/N)=O(h_{n}^{2}/n)\times O(1/(nh_{n}))=o(h_{n}^{2}/n) under Assumption 9.

If Nhn2k+3hNh_{n}^{2k+3}\to h for some 0<h<0<h<\infty, note that

\displaystyle\sqrt{Nh_{n}}E[e^{\prime}S_{W\lambda}]=\sqrt{Nh_{n}^{2k+3}}e^{\prime}\Sigma_{W\lambda}+o(\sqrt{Nh_{n}^{2k+3}})\to\sqrt{h}e^{\prime}\Sigma_{W\lambda},

as nn\to\infty. Also,

Var[NhneSWλ]=O(Nhn3n)=O(nhn)×O(hn2)=o(1),\displaystyle Var[\sqrt{Nh_{n}}e^{\prime}S_{W\lambda}]=O\left(\frac{Nh_{n}^{3}}{n}\right)=O(nh_{n})\times O(h_{n}^{2})=o(1),

by Assumption 9. Thus, by Chebyshev’s inequality, we have

NhneSWλpheΣWλ,\displaystyle\sqrt{Nh_{n}}e^{\prime}S_{W\lambda}\to_{p}\sqrt{h}e^{\prime}\Sigma_{W\lambda},

as nn\to\infty.

If Nhn2k+3Nh_{n}^{2k+3}\to\infty, note that

hn(k+1)E[eSWλ]=eΣWλ+o(1)eΣWλ,\displaystyle h_{n}^{-(k+1)}E[e^{\prime}S_{W\lambda}]=e^{\prime}\Sigma_{W\lambda}+o(1)\to e^{\prime}\Sigma_{W\lambda},

as nn\to\infty. Also,

Var[hn(k+1)eSWλ]=O(hn2nhn2k+2)=o(1),\displaystyle Var[h_{n}^{-(k+1)}e^{\prime}S_{W\lambda}]=O\left(\frac{h_{n}^{2}}{nh_{n}^{2k+2}}\right)=o(1),

as nhn2k+2nh_{n}^{2k+2}\to\infty by the hypothesis. Thus, by Chebyshev’s inequality, we have

hn(k+1)eSWλpeΣWλ,\displaystyle h_{n}^{-(k+1)}e^{\prime}S_{W\lambda}\to_{p}e^{\prime}\Sigma_{W\lambda},

as nn\to\infty. Since ee is arbitrary, this completes the proof. ∎
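The bias rate $h_{n}^{k+1}$ in this proof relies on the kernel having vanishing lower-order moments. As a hedged illustration, the sketch below checks the moments of one well-known higher-order kernel, $K(u)=(3/2-u^{2}/2)\phi(u)$ with $\phi$ the standard normal density. This kernel is an assumption chosen for illustration; whether it satisfies every part of Assumption 8 is not checked here.

```python
import math

def kernel4(u):
    """Fourth-order kernel built from the standard normal density phi:
    K(u) = (3/2 - u^2/2) * phi(u)."""
    phi = math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return (1.5 - 0.5 * u * u) * phi

def moment(i, half_width=12.0, points=48001):
    """Riemann-sum approximation of the integral of u^i * K(u) du."""
    step = 2 * half_width / (points - 1)
    return step * sum(
        (-half_width + j * step) ** i * kernel4(-half_width + j * step)
        for j in range(points)
    )

# Moments of order 0 through 4: the kernel integrates to one, its first three
# moments vanish, and the fourth moment is the first nonzero one.
moments = [moment(i) for i in range(5)]
```

With such a kernel the terms of a Taylor expansion that are multiplied by the first three moments drop out of the integral, which is the mechanism behind the elimination step in the display for $E[e^{\prime}S_{W\lambda}]$.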

Proof of Lemma 3

Proof.

The proof is done in the following steps:

Step 0: Decomposition


Observe that

cSWW1SWν=c(SWW1ΣWW1)SWν+cΣWW1SWν\displaystyle c^{\prime}S_{WW}^{-1}S_{W\nu}=c^{\prime}(S_{WW}^{-1}-\Sigma_{WW}^{-1})S_{W\nu}+c^{\prime}\Sigma_{WW}^{-1}S_{W\nu}

In Steps 1-2, we verify the asymptotic normality of SWνS_{W\nu}, with the worst-case convergence rate being n\sqrt{n}. Given that result, the first term on the right-hand side is shown to be negligible even when normalized by Nhn\sqrt{Nh_{n}}:

Nhnc(SWW1ΣWW1)SWν\displaystyle\sqrt{Nh_{n}}c^{\prime}(S_{WW}^{-1}-\Sigma_{WW}^{-1})S_{W\nu} =Nhnop(nα/2)Op(1/n)\displaystyle=\sqrt{Nh_{n}}o_{p}(n^{-\alpha/2})O_{p}(1/\sqrt{n})
=n1αhnop(1)=op(1)\displaystyle=\sqrt{n^{1-\alpha}h_{n}}o_{p}(1)=o_{p}(1)

because by Lemma 1, SWW1ΣWW1=op(nα/2)S_{WW}^{-1}-\Sigma_{WW}^{-1}=o_{p}(n^{-\alpha/2}) for any α(0,1)\alpha\in(0,1) and n1αhn=o(1)n^{1-\alpha}h_{n}=o(1) for sufficiently large α\alpha under the hypothesis. Thus,

cSWW1SWν=cΣWW1SWν+op(1/Nhn),\displaystyle c^{\prime}S_{WW}^{-1}S_{W\nu}=c^{\prime}\Sigma_{WW}^{-1}S_{W\nu}+o_{p}(1/\sqrt{Nh_{n}}),

and it suffices to establish the asymptotic normality of SWνS_{W\nu}. Write c=cWc=c_{W} for short. Observe that, since E[SWν]=0E[S_{W\nu}]=0 by the definition of νij\nu_{ij}, cSWνc^{\prime}S_{W\nu} can be decomposed as

cSWν=1ni=1nLi,WνLWν+1Ni<jQij,WνQWν,\displaystyle c^{\prime}S_{W\nu}=\underbrace{\frac{1}{n}\sum_{i=1}^{n}L_{i,W\nu}}_{L_{W\nu}}+\underbrace{\frac{1}{N}\sum_{i<j}Q_{ij,W\nu}}_{Q_{W\nu}},

where

Li,Wν\displaystyle L_{i,W\nu} 2E[dij1dij2cΔWijνijKhn(ΔRijγ)|ξi,Ui]\displaystyle\equiv 2E[d_{ij1}d_{ij2}c^{\prime}\Delta W_{ij}\nu_{ij}K_{h_{n}}(\Delta R_{ij}^{\prime}\gamma)|\xi_{i},U_{i}]
Qij,Wν\displaystyle Q_{ij,W\nu} =dij1dij2cΔWijνijKhn(ΔRijγ)E[dij1dij2cΔWijνijKhn(ΔRijγ)|ξi,Ui]\displaystyle=d_{ij1}d_{ij2}c^{\prime}\Delta W_{ij}\nu_{ij}K_{h_{n}}(\Delta R_{ij}^{\prime}\gamma)-E[d_{ij1}d_{ij2}c^{\prime}\Delta W_{ij}\nu_{ij}K_{h_{n}}(\Delta R_{ij}^{\prime}\gamma)|\xi_{i},U_{i}]
\displaystyle-E[d_{ij1}d_{ij2}c^{\prime}\Delta W_{ij}\nu_{ij}K_{h_{n}}(\Delta R_{ij}^{\prime}\gamma)|\xi_{j},U_{j}].

By design, we have that Cov[Li,Wν,Lj,Wν]=Cov[Li,Wν,Qkl,Wν]=Cov[Qij,Wν,Qkl,Wν]=0Cov[L_{i,W\nu},L_{j,W\nu}]=Cov[L_{i,W\nu},Q_{kl,W\nu}]=Cov[Q_{ij,W\nu},Q_{kl,W\nu}]=0 for any iji\neq j, klk\neq l, and ijklij\neq kl. We show the asymptotic normality of cSWνc^{\prime}S_{W\nu} in the following.
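The orthogonality of the projection part and the remainder can be seen in a toy example by exact enumeration. The sketch below is a simplification and not the paper's model: node attributes are uniform on $\{-1,1\}$, there are no dyad-level shocks, the kernel `s(a, b) = a*b + a + b` is a hypothetical mean-zero symmetric kernel, and the factor $2$ in $L_{i,W\nu}$ is omitted.

```python
from fractions import Fraction
from itertools import product

VALUES = (-1, 1)  # support of the toy node attribute (an assumption of this sketch)

def s(a, b):
    """Hypothetical symmetric kernel with mean zero under the uniform law on VALUES."""
    return a * b + a + b

def proj(a):
    """Projection E[s(a, B)] with B uniform on VALUES (the linear part)."""
    return Fraction(sum(s(a, b) for b in VALUES), len(VALUES))

def q(a, b):
    """Remainder after removing both projections (the degenerate part)."""
    return s(a, b) - proj(a) - proj(b)

def expect(f, m):
    """Exact expectation of f over m independent uniform draws from VALUES."""
    draws = list(product(VALUES, repeat=m))
    return Fraction(sum(f(*x) for x in draws), len(draws))

cov_L_Q = expect(lambda a, b: proj(a) * q(a, b), 2)        # linear part vs. remainder
cov_Q12_Q13 = expect(lambda a, b, c: q(a, b) * q(a, c), 3)  # remainders sharing a node
cov_L1_L2 = expect(lambda a, b: proj(a) * proj(b), 2)       # linear parts of two nodes
```

All three covariances are exactly zero in this toy setting, mirroring the uncorrelatedness stated for $L_{i,W\nu}$ and $Q_{ij,W\nu}$.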

Step 1: Asymptotic Normality of LWνL_{W\nu}


Define VLV_{L} by

\displaystyle V_{L}=\sqrt{n}L_{W\nu}=\sum_{i=1}^{n}\underbrace{\frac{L_{i,W\nu}}{\sqrt{n}}}_{V_{i,L}}.

Note that $E[V_{i,L}]=0$ by the mean independence of $\nu_{ij}$. Observe that

Var[VL]\displaystyle Var[V_{L}] =Var[Li,Wν]\displaystyle=Var[L_{i,W\nu}]
=4E[E[d121d122cΔW12ν12Khn(ΔR12γ)|ξ1,U1]2]\displaystyle=4E\left[E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}K_{h_{n}}(\Delta R_{12}^{\prime}\gamma)|\xi_{1},U_{1}]^{2}\right]
=4E[d121d122d131d132cΔW12cΔW13ν12ν13Khn(ΔR12γ)Khn(ΔR13γ)]\displaystyle=4E[d_{121}d_{122}d_{131}d_{132}c^{\prime}\Delta W_{12}c^{\prime}\Delta W_{13}\nu_{12}\nu_{13}K_{h_{n}}(\Delta R_{12}^{\prime}\gamma)K_{h_{n}}(\Delta R_{13}^{\prime}\gamma)]
=4hn2E[d121d122d131d132cΔW12cΔW13ν12ν13|ΔR12γ=r1,ΔR13γ=r2]\displaystyle=\frac{4}{h_{n}^{2}}\int E[d_{121}d_{122}d_{131}d_{132}c^{\prime}\Delta W_{12}c^{\prime}\Delta W_{13}\nu_{12}\nu_{13}|\Delta R_{12}^{\prime}\gamma=r_{1},\Delta R_{13}^{\prime}\gamma=r_{2}]
×K(r1/hn)K(r2/hn)fRγ,2(r1,r2)dr1dr2\displaystyle\times K(r_{1}/h_{n})K(r_{2}/h_{n})f_{R\gamma,2}(r_{1},r_{2})dr_{1}dr_{2}
=4E[d121d122d131d132cΔW12cΔW13ν12ν13|ΔR12γ=r1hn,ΔR13γ=r2hn]\displaystyle=4\int E[d_{121}d_{122}d_{131}d_{132}c^{\prime}\Delta W_{12}c^{\prime}\Delta W_{13}\nu_{12}\nu_{13}|\Delta R_{12}^{\prime}\gamma=r_{1}h_{n},\Delta R_{13}^{\prime}\gamma=r_{2}h_{n}]
×K(r1)K(r2)fRγ,2(r1hn,r2hn)dr1dr2\displaystyle\times K(r_{1})K(r_{2})f_{R\gamma,2}(r_{1}h_{n},r_{2}h_{n})dr_{1}dr_{2}
\displaystyle=c^{\prime}\Sigma_{W\nu,1}c+o(1),

where the last line holds from the dominated convergence theorem under Assumptions 4, 6, and 8. Furthermore, note that

E[d121d122cΔW12ν12Khn(ΔR12γ)|ξ1,U1]\displaystyle E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}K_{h_{n}}(\Delta R_{12}^{\prime}\gamma)|\xi_{1},U_{1}]
=1hnE[d121d122cΔW12ν12|ΔR12γ=r,ξ1,U1]K(r/hn)fRγ|ξ1,U1(r)𝑑r\displaystyle=\frac{1}{h_{n}}\int E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=r,\xi_{1},U_{1}]K(r/h_{n})f_{R\gamma|\xi_{1},U_{1}}(r)dr
\displaystyle=\int E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=rh_{n},\xi_{1},U_{1}]K(r)f_{R\gamma|\xi_{1},U_{1}}(rh_{n})dr
=O(1),\displaystyle=O(1),

almost surely for sufficiently large nn by Assumptions 4, 6, and 8. Thus, we have

i=1nE[|Vi,L|3]=n×O(1nn)=o(1).\displaystyle\sum_{i=1}^{n}E[|V_{i,L}|^{3}]=n\times O\left(\frac{1}{n\sqrt{n}}\right)=o(1).

If ΣWν,1\Sigma_{W\nu,1} is positive definite, we have cΣWν,1c>0c^{\prime}\Sigma_{W\nu,1}c>0. Thus, by Lyapunov CLT, we have

VL/Var[VL]d𝒩(0,1),\displaystyle V_{L}/\sqrt{Var[V_{L}]}\to_{d}\mathcal{N}(0,1),

as nn\to\infty. Thus, VL=nLWνd𝒩(0,cΣWν,1c)V_{L}=\sqrt{n}L_{W\nu}\to_{d}\mathcal{N}(0,c^{\prime}\Sigma_{W\nu,1}c) as nn\to\infty.

If cΣWν,1c=0c^{\prime}\Sigma_{W\nu,1}c=0, observe that, for some constant C>0C>0

Var[Li,Wν]\displaystyle Var[L_{i,W\nu}]
=4E[(hn1E[d121d122cΔW12ν12|ΔR12γ=r,ξ1,U1]K(r/hn)fRγ|ξ1,U1(r)𝑑r)2]\displaystyle=4E\left[\left(\int h_{n}^{-1}E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=r,\xi_{1},U_{1}]K(r/h_{n})f_{R\gamma|\xi_{1},U_{1}}(r)dr\right)^{2}\right]
=4E[(gξ1,U1(rhn)K(r)𝑑r)2]\displaystyle=4E\left[\left(\int g_{\xi_{1},U_{1}}(rh_{n})K(r)dr\right)^{2}\right]
4E[(gξ1,U1(0)+Chnk)2]\displaystyle\leq 4E\left[\left(g_{\xi_{1},U_{1}}(0)+Ch_{n}^{k}\right)^{2}\right]
=O(hn2k),\displaystyle=O(h_{n}^{2k}),

where the inequality follows from a Taylor expansion of $g_{\xi_{1},U_{1}}(rh_{n})$ around zero together with $\int r^{i}K(r)dr=0$ for $i=1,\ldots,k$, and the last line holds since

E[(gξ1,U1(0))2]\displaystyle E\left[\left(g_{\xi_{1},U_{1}}(0)\right)^{2}\right]
=E[(E[d121d122cΔW12ν12fRγ|ξ1,U1(0)|ΔR12γ=0,ξ1,U1])2]\displaystyle=E\left[\left(E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}f_{R\gamma|\xi_{1},U_{1}}(0)|\Delta R_{12}^{\prime}\gamma=0,\xi_{1},U_{1}]\right)^{2}\right]
=E[E[d121d122d131d132cΔW12cΔW13ν12ν13fRγ,2(0,0)|ΔR12γ=ΔR13γ=0,ξ1,U1]]\displaystyle=E\left[E[d_{121}d_{122}d_{131}d_{132}c^{\prime}\Delta W_{12}c^{\prime}\Delta W_{13}\nu_{12}\nu_{13}f_{R\gamma,2}(0,0)|\Delta R_{12}^{\prime}\gamma=\Delta R_{13}^{\prime}\gamma=0,\xi_{1},U_{1}]\right]
=fRγ,2(0,0)E[d121d122d131d132cΔW12cΔW13ν12ν13|ΔR12γ=ΔR13γ=0]\displaystyle=f_{R\gamma,2}(0,0)E[d_{121}d_{122}d_{131}d_{132}c^{\prime}\Delta W_{12}c^{\prime}\Delta W_{13}\nu_{12}\nu_{13}|\Delta R_{12}^{\prime}\gamma=\Delta R_{13}^{\prime}\gamma=0]
=cΣWν,1c/4=0,\displaystyle=c^{\prime}\Sigma_{W\nu,1}c/4=0,

and E[gξ1,U1]=0E[g_{\xi_{1},U_{1}}]=0 by the conditional mean independence of ν12\nu_{12}. Thus,

Var[NhnLWν]=nhn×O(hn2k)=hnk1/2×O(Nhn2k+3)=o(1),\displaystyle Var[\sqrt{Nh_{n}}L_{W\nu}]=nh_{n}\times O(h_{n}^{2k})=h_{n}^{k-1/2}\times O(\sqrt{Nh_{n}^{2k+3}})=o(1),

since Nhn2k+3=O(1)\sqrt{Nh_{n}^{2k+3}}=O(1) by the hypothesis and k2k\geq 2 so that hnk1/2=o(1)h_{n}^{k-1/2}=o(1). Hence, by Chebyshev’s inequality,

NhnLWνp0,\displaystyle\sqrt{Nh_{n}}L_{W\nu}\to_{p}0,

as $n\to\infty$ when $c^{\prime}\Sigma_{W\nu,1}c=0$.

Step 2: Asymptotic Normality of QWνQ_{W\nu}


We use the CLT for martingale differences (Theorem 5.24 and Corollary 5.26 in White (2001)). Define Vn,t(1tN)V_{n,t}(1\leq t\leq N), a triangular array, as

Vn,1\displaystyle V_{n,1} =1NQ12,Wν,\displaystyle=\frac{1}{N}Q_{12,W\nu},
Vn,2\displaystyle V_{n,2} =1NQ13,Wν,\displaystyle=\frac{1}{N}Q_{13,W\nu},
\displaystyle\vdots
\displaystyle V_{n,n-1}=\frac{1}{N}Q_{1n,W\nu},
\displaystyle\vdots
Vn,N\displaystyle V_{n,N} =1NQn1n,Wν.\displaystyle=\frac{1}{N}Q_{n-1n,W\nu}.
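The triangular array lists the dyads one after another. Assuming the ordering is lexicographic in $(i,j)$ with $i<j$, as the displayed first and last entries suggest, the position of each dyad has a closed form. The sketch below, with the hypothetical helper `dyad_index`, checks that form against a direct enumeration.

```python
from itertools import combinations

def dyad_index(i, j, n):
    """Position t of dyad (i, j), 1 <= i < j <= n, in lexicographic order
    (the ordering itself is an assumption of this sketch)."""
    return (i - 1) * n - i * (i - 1) // 2 + (j - i)

n = 6
N = n * (n - 1) // 2
ordered_dyads = list(combinations(range(1, n + 1), 2))
positions = [dyad_index(i, j, n) for i, j in ordered_dyads]
```

Under this ordering the dyad $(1,2)$ sits at position $1$, the dyad $(1,n)$ at position $n-1$, and the dyad $(n-1,n)$ at position $N$.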

Notice that $Q_{ij,W\nu}$ is independent of $Q_{km,W\nu}$ if $i\neq k,m$ and $j\neq k,m$. Also, $Q_{ij,W\nu}$ is conditionally independent of $Q_{km,W\nu}$ even if $i=k$ or $m$, or $j=k$ or $m$: note that $(\epsilon_{ijt},\eta_{ijt})_{t=1,2}$ and $(\epsilon_{imt},\eta_{imt})_{t=1,2}$ are conditionally independent given $(\xi_{i},U_{i})$ by Assumption 1. Since $(\xi_{i},U_{i})$, $i=1,\ldots,n$, are i.i.d., this implies that, for $1<t\leq N$,

\displaystyle E[V_{n,t}|\{V_{n,s};s<t\}]=E\left[E[V_{n,t}|\{V_{n,s};s<t\},\{\xi_{i},U_{i}\}_{i=1}^{n}]|\{V_{n,s};s<t\}\right]
=E[E[Vn,t|ξt,Ut]|{Vn,s;s<t}]\displaystyle=E\left[E[V_{n,t}|\xi_{t},U_{t}]|\{V_{n,s};s<t\}\right]
=0,\displaystyle=0,

where

E[Vn,t|ξt,Ut]\displaystyle E[V_{n,t}|\xi_{t},U_{t}]
=1NE[d121d122cΔW12ν12Khn(ΔR12γ)|ξ1,U1]1NE[d121d122cΔW12ν12Khn(ΔR12γ)|ξ1,U1]\displaystyle=\frac{1}{N}E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}K_{h_{n}}(\Delta R_{12}^{\prime}\gamma)|\xi_{1},U_{1}]-\frac{1}{N}E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}K_{h_{n}}(\Delta R_{12}^{\prime}\gamma)|\xi_{1},U_{1}]
1NE[E[d121d122cΔW12ν12Khn(ΔR12γ)|ξ2,U2]|ξ1,U1]\displaystyle-\frac{1}{N}E\left[E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}K_{h_{n}}(\Delta R_{12}^{\prime}\gamma)|\xi_{2},U_{2}]|\xi_{1},U_{1}\right]
=1NE[d121d122cΔW12ν12Khn(ΔR12γ)]\displaystyle=-\frac{1}{N}E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}K_{h_{n}}(\Delta R_{12}^{\prime}\gamma)]
=0\displaystyle=0

because $(\xi_{1},U_{1})$ is independent of $(\xi_{2},U_{2})$. Thus, letting $\mathcal{F}_{t}\equiv\sigma(V_{n,s}:1\leq s\leq t)$ be the $\sigma$-algebra generated by $V_{n,1},\ldots,V_{n,t}$ (with $\mathcal{F}_{0}$ set to be the trivial $\sigma$-algebra) and $\mathbb{F}\equiv(\mathcal{F}_{t})_{0\leq t\leq N}$ be the filtration, we have

E[Vn,t|t1]=0\displaystyle E[V_{n,t}|\mathcal{F}_{t-1}]=0

for 1tN1\leq t\leq N. Also, for each tt, for some constant C>0C>0,

\displaystyle E[|V_{n,t}|]\leq\frac{3}{N}E[\|\Delta W_{12}\||\nu_{12}||K_{h_{n}}(\Delta R_{12}^{\prime}\gamma)|]\leq\frac{C}{Nh_{n}}E[\|\Delta W_{12}\|^{2}]^{1/2}E[\nu_{12}^{2}]^{1/2}<\infty,

by Assumptions 7 and 8. This shows that {Vn,t}\{V_{n,t}\} is a martingale difference sequence.

Let Vn=t=1NVn,tV_{n}=\sum_{t=1}^{N}V_{n,t}. Define the variance of this sequence by

vn2=Var[t=1NVn,t]=NVar[Vn,1]=1NE[Q12,Wν2].\displaystyle v_{n}^{2}=Var\left[\sum^{N}_{t=1}V_{n,t}\right]=NVar[V_{n,1}]=\frac{1}{N}E[Q_{12,W\nu}^{2}].

We can calculate that

\displaystyle E[Q_{12,W\nu}^{2}]=E\left[d_{121}d_{122}(c^{\prime}\Delta W_{12})^{2}\nu_{12}^{2}K_{h_{n}}^{2}(\Delta R_{12}^{\prime}\gamma)\right]
2E[E[d121d122cΔW12ν12Khn(ΔR12γ)|ξ1,U1]2].\displaystyle-2E\left[E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}K_{h_{n}}(\Delta R_{12}^{\prime}\gamma)|\xi_{1},U_{1}]^{2}\right].

Observe that

\displaystyle E\left[d_{121}d_{122}(c^{\prime}\Delta W_{12})^{2}\nu_{12}^{2}K_{h_{n}}^{2}(\Delta R_{12}^{\prime}\gamma)\right]
\displaystyle=\frac{1}{h_{n}^{2}}\int E[d_{121}d_{122}(c^{\prime}\Delta W_{12})^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r]K^{2}(r/h_{n})f_{R\gamma}(r)dr
\displaystyle=\frac{1}{h_{n}}\int E[d_{121}d_{122}(c^{\prime}\Delta W_{12})^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=rh_{n}]K^{2}(r)f_{R\gamma}(rh_{n})dr
=1hncΣWν,2c+o(1hn),\displaystyle=\frac{1}{h_{n}}c^{\prime}\Sigma_{W\nu,2}c+o\left(\frac{1}{h_{n}}\right),

where the last line holds from the dominated convergence theorem under Assumptions 4, 6, and 8, and

E[d121d122cΔW12ν12Khn(ΔR12γ)|ξ1,U1]\displaystyle E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}K_{h_{n}}(\Delta R_{12}^{\prime}\gamma)|\xi_{1},U_{1}]
=1hnE[d121d122cΔW12ν12|ΔR12γ=r,ξ1,U1]K(r/hn)fRγ|ξ1,U1(r)𝑑r\displaystyle=\frac{1}{h_{n}}\int E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=r,\xi_{1},U_{1}]K(r/h_{n})f_{R\gamma|\xi_{1},U_{1}}(r)dr
=E[d121d122cΔW12ν12|ΔR12γ=rhn,ξ1,U1]K(r)fRγ|ξ1,U1(rhn)𝑑r\displaystyle=\int E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=rh_{n},\xi_{1},U_{1}]K(r)f_{R\gamma|\xi_{1},U_{1}}(rh_{n})dr
=O(1),\displaystyle=O(1),

almost surely for sufficiently large nn by Assumptions 4, 6, and 8. Since 1/hn1/h_{n} diverges as nn\to\infty,

E[Q12,Wν2]=1hncΣWν,2c+o(1hn).\displaystyle E[Q_{12,W\nu}^{2}]=\frac{1}{h_{n}}c^{\prime}\Sigma_{W\nu,2}c+o\left(\frac{1}{h_{n}}\right).

Hence, we have

vn2=1NhncΣWν,2c+o(1Nhn).\displaystyle v_{n}^{2}=\frac{1}{Nh_{n}}c^{\prime}\Sigma_{W\nu,2}c+o\left(\frac{1}{Nh_{n}}\right).

The CLT for martingale differences holds if we can show the following two conditions:

\displaystyle\sum_{t=1}^{N}E\left[\left|\frac{V_{n,t}}{v_{n}}\right|^{2+\delta}\right]\to 0\text{ (Lyapunov)},

for some δ>0\delta>0 as nn\to\infty and

t=1N(Vn,tvn)2p1 (Stability),\displaystyle\sum_{t=1}^{N}\left(\frac{V_{n,t}}{v_{n}}\right)^{2}\to_{p}1\text{ (Stability)},

as nn\to\infty. If these conditions are met, we can apply Theorem 5.24 and Corollary 5.26 in White (2001) to show that

Vnvnd𝒩(0,1),\displaystyle\frac{V_{n}}{v_{n}}\to_{d}\mathcal{N}(0,1),

as $n\to\infty$. Since $\sqrt{Nh_{n}}\,v_{n}\to\sqrt{c^{\prime}\Sigma_{W\nu,2}c}$, by Slutsky’s lemma,

NhnVnd𝒩(0,cΣWν,2c),\displaystyle\sqrt{Nh_{n}}V_{n}\to_{d}\mathcal{N}(0,c^{\prime}\Sigma_{W\nu,2}c),

which is equivalent to

NhnQWνd𝒩(0,cΣWν,2c),\displaystyle\sqrt{Nh_{n}}Q_{W\nu}\to_{d}\mathcal{N}(0,c^{\prime}\Sigma_{W\nu,2}c),

as nn\to\infty.

For Lyapunov’s condition, observe that for some constant C>0C>0,

E[|Vn,1|3]\displaystyle E\left[|V_{n,1}|^{3}\right] C(Nhn)3E[ΔW123|ν12|3|ΔR12γ=r]|K(r/hn)|3fRγ(r)𝑑r\displaystyle\leq\frac{C}{(Nh_{n})^{3}}\int E[\|\Delta W_{12}\|^{3}|\nu_{12}|^{3}|\Delta R_{12}^{\prime}\gamma=r]|K(r/h_{n})|^{3}f_{R\gamma}(r)dr
=Chn(Nhn)3E[ΔW123|ν12|3|ΔR12γ=rhn]|K(r)|3fRγ(rhn)𝑑r\displaystyle=\frac{Ch_{n}}{(Nh_{n})^{3}}\int E[\|\Delta W_{12}\|^{3}|\nu_{12}|^{3}|\Delta R_{12}^{\prime}\gamma=rh_{n}]|K(r)|^{3}f_{R\gamma}(rh_{n})dr
=O(1N3hn2),\displaystyle=O\left(\frac{1}{N^{3}h_{n}^{2}}\right),

where the first inequality follows from Jensen’s inequality, and the last line follows from the Cauchy-Schwarz inequality and Assumptions 4, 6, and 8. Since $v_{n}=O(1/\sqrt{Nh_{n}})$, we have

\displaystyle\sum_{t=1}^{N}E\left[\left|\frac{V_{n,t}}{v_{n}}\right|^{3}\right]=N\times O\left(\frac{(Nh_{n})^{3/2}}{N^{3}h_{n}^{2}}\right)=O\left(\frac{1}{\sqrt{Nh_{n}}}\right)=o(1)

by Assumption 9. Thus Lyapunov’s condition holds.

For the stability condition, we can alternatively show that

1vn2t=1N(Vn,t2E[Vn,t2])p0,\displaystyle\frac{1}{v_{n}^{2}}\sum_{t=1}^{N}\left(V_{n,t}^{2}-E[V_{n,t}^{2}]\right)\to_{p}0,

as nn\to\infty. Note that

1vn2t=1N(Vn,t2E[Vn,t2])=1Nvn2(1Ni<jQij,Wν2E[Q12,Wν2]).\displaystyle\frac{1}{v_{n}^{2}}\sum_{t=1}^{N}\left(V_{n,t}^{2}-E[V_{n,t}^{2}]\right)=\frac{1}{Nv_{n}^{2}}\left(\frac{1}{N}\sum_{i<j}Q_{ij,W\nu}^{2}-E[Q_{12,W\nu}^{2}]\right).

Since $Nv_{n}^{2}=O(1/h_{n})$, we need to show that the remaining term is $o_{p}(1/h_{n})$. Since $Q_{ij,W\nu}$ is independent of $Q_{km,W\nu}$ if the two dyads have no common node,

E[(1Ni<jQij,Wν2E[Q12,Wν2])2]\displaystyle E\left[\left(\frac{1}{N}\sum_{i<j}Q_{ij,W\nu}^{2}-E[Q_{12,W\nu}^{2}]\right)^{2}\right]
=Var[Q12,Wν2]N+2(n2)NCov[Q12,Wν2,Q13,Wν2]\displaystyle=\frac{Var[Q_{12,W\nu}^{2}]}{N}+\frac{2(n-2)}{N}Cov[Q_{12,W\nu}^{2},Q_{13,W\nu}^{2}]
E[Q12,Wν4]N+2(n2)NE[Q12,Wν2×Q13,Wν2].\displaystyle\leq\frac{E[Q_{12,W\nu}^{4}]}{N}+\frac{2(n-2)}{N}E[Q_{12,W\nu}^{2}\times Q_{13,W\nu}^{2}].

The first term in the far right-hand side is bounded as follows: For some constant C>0C>0,

\displaystyle\frac{E[Q_{12,W\nu}^{4}]}{N}\leq\frac{C}{Nh_{n}^{4}}\int E[\|\Delta W_{12}\|^{4}\nu_{12}^{4}|\Delta R_{12}^{\prime}\gamma=r]K^{4}(r/h_{n})f_{R\gamma}(r)dr
=CNhn3E[ΔW124ν124|ΔR12γ=rhn]K4(r)fRγ(rhn)𝑑r\displaystyle=\frac{C}{Nh_{n}^{3}}\int E[\|\Delta W_{12}\|^{4}\nu_{12}^{4}|\Delta R_{12}^{\prime}\gamma=rh_{n}]K^{4}(r)f_{R\gamma}(rh_{n})dr
=O(1Nhn3),\displaystyle=O\left(\frac{1}{Nh_{n}^{3}}\right),

where the first inequality follows from Jensen’s inequality, and the last line follows from the Cauchy-Schwarz inequality and Assumptions 4, 6, and 8. The second term on the far right-hand side is bounded as follows: For some constant $C>0$,

2(n2)NE[Q12,Wν2×Q13,Wν2]\displaystyle\frac{2(n-2)}{N}E[Q_{12,W\nu}^{2}\times Q_{13,W\nu}^{2}]
C(n2)Nhn4E[ΔW122ν122|ΔR12γ=r1,ξ1,U1]×E[ΔW132ν132|ΔR13γ=r2,ξ1,U1]\displaystyle\leq\frac{C(n-2)}{Nh_{n}^{4}}\int E[\|\Delta W_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r_{1},\xi_{1},U_{1}]\times E[\|\Delta W_{13}\|^{2}\nu_{13}^{2}|\Delta R_{13}^{\prime}\gamma=r_{2},\xi_{1},U_{1}]
×K2(r1/hn)K2(r2/hn)fRγ|ξ1,U1(r1)fRγ|ξ1,U1(r2)dr1dr2\displaystyle\times K^{2}(r_{1}/h_{n})K^{2}(r_{2}/h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{1})f_{R\gamma|\xi_{1},U_{1}}(r_{2})dr_{1}dr_{2}
=C(n2)Nhn2E[ΔW122ν122|ΔR12γ=r1hn,ξ1,U1]×E[ΔW132ν132|ΔR13γ=r2hn,ξ1,U1]\displaystyle=\frac{C(n-2)}{Nh_{n}^{2}}\int E[\|\Delta W_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r_{1}h_{n},\xi_{1},U_{1}]\times E[\|\Delta W_{13}\|^{2}\nu_{13}^{2}|\Delta R_{13}^{\prime}\gamma=r_{2}h_{n},\xi_{1},U_{1}]
×K2(r1)K2(r2)fRγ|ξ1,U1(r1hn)fRγ|ξ1,U1(r2hn)dr1dr2\displaystyle\times K^{2}(r_{1})K^{2}(r_{2})f_{R\gamma|\xi_{1},U_{1}}(r_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{2}h_{n})dr_{1}dr_{2}
=O(1nhn2)\displaystyle=O\left(\frac{1}{nh_{n}^{2}}\right)

Thus,

hnE[(1Ni<jQij,Wν2E[Q12,Wν2])2]=O(1Nhn2)+O(1nhn)=o(1),\displaystyle h_{n}E\left[\left(\frac{1}{N}\sum_{i<j}Q_{ij,W\nu}^{2}-E[Q_{12,W\nu}^{2}]\right)^{2}\right]=O\left(\frac{1}{Nh_{n}^{2}}\right)+O\left(\frac{1}{nh_{n}}\right)=o(1),

and by Markov’s inequality,

hn(1Ni<jQij,Wν2E[Q12,Wν2])=op(1).\displaystyle\sqrt{h_{n}}\left(\frac{1}{N}\sum_{i<j}Q_{ij,W\nu}^{2}-E[Q_{12,W\nu}^{2}]\right)=o_{p}(1).

Then,

1vn2t=1N(Vn,t2E[Vn,t2])=O(hn)×op(1hn)=op(hn)=op(1),\displaystyle\frac{1}{v_{n}^{2}}\sum_{t=1}^{N}\left(V_{n,t}^{2}-E[V_{n,t}^{2}]\right)=O(h_{n})\times o_{p}\left(\frac{1}{\sqrt{h_{n}}}\right)=o_{p}(\sqrt{h_{n}})=o_{p}(1),

which shows the stability condition.

Step 3: Conclusion


By Steps 0-2, we have established that if cWΣWν,1cW>0c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}>0,

ncSWW1SWν\displaystyle\sqrt{n}c^{\prime}S_{WW}^{-1}S_{W\nu}
=nLWνd𝒩(0,cWΣWν,1cW)+nNhn0×NhnQWνd𝒩(0,cWΣWν,2cW)+op(n/Nhn)d𝒩(0,cWΣWν,1cW),\displaystyle=\underbrace{\sqrt{n}L_{W\nu}}_{\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,1}c_{W})}+\underbrace{\frac{\sqrt{n}}{\sqrt{Nh_{n}}}}_{\to 0}\times\underbrace{\sqrt{Nh_{n}}Q_{W\nu}}_{\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,2}c_{W})}+o_{p}(\sqrt{n}/\sqrt{Nh_{n}})\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}),

as $n\to\infty$ by Assumption 9, and if $c_{W}^{\prime}\Sigma_{W\nu,1}c_{W}=0$,

NhncSWW1SWν=NhnLWνp0+NhnQWνd𝒩(0,cWΣWν,2cW)+op(1)d𝒩(0,cWΣWν,2cW),\displaystyle\sqrt{Nh_{n}}c^{\prime}S_{WW}^{-1}S_{W\nu}=\underbrace{\sqrt{Nh_{n}}L_{W\nu}}_{\to_{p}0}+\underbrace{\sqrt{Nh_{n}}Q_{W\nu}}_{\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,2}c_{W})}+o_{p}(1)\to_{d}\mathcal{N}(0,c_{W}^{\prime}\Sigma_{W\nu,2}c_{W}),

as $n\to\infty$. This completes the proof. ∎

Proof of Lemma 4

Proof.

By expanding K(ΔRijγ^n/hn)K(\Delta R_{ij}^{\prime}\hat{\gamma}_{n}/h_{n}) around ΔRijγ\Delta R_{ij}^{\prime}\gamma, we have

\displaystyle\hat{S}_{WW}=S_{WW}+\frac{1}{Nh_{n}^{2}}\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\Delta W_{ij}^{\prime}\Delta R_{ij}^{\prime}(\hat{\gamma}_{n}-\gamma)K^{\prime}(c_{ij,n}^{*}/h_{n}),

where $c_{ij,n}^{*}$ lies between $\Delta R_{ij}^{\prime}\gamma$ and $\Delta R_{ij}^{\prime}\hat{\gamma}_{n}$. Thus, for some constant $C>0$,

S^WWSWW\displaystyle\|\hat{S}_{WW}-S_{WW}\|\leq CNi<jΔWij2ΔRijD4,1×hn2γ^nγ\displaystyle\underbrace{\frac{C}{N}\sum_{i<j}\|\Delta W_{ij}\|^{2}\|\Delta R_{ij}\|}_{D_{4,1}}\times h_{n}^{-2}\|\hat{\gamma}_{n}-\gamma\|

Notice that $\|\Delta W_{ij}\|=\|w(X_{i1},X_{j1})-w(X_{i2},X_{j2})\|$ and $\|\Delta R_{ij}\|=\|r(Z_{i1},Z_{j1})-r(Z_{i2},Z_{j2})\|$ are symmetric in $i$ and $j$ by the symmetry of $w$, $r$, and $\|\cdot\|$, so that $D_{4,1}$ is a second-order U-statistic. Also,

\displaystyle E[\|\Delta W_{12}\|^{2}\|\Delta R_{12}\|]<\infty

by the Cauchy-Schwarz inequality with Assumption 7. Thus, we can apply the law of large numbers for U-statistics (Hoeffding (1961)):

D4,1=Op(1).\displaystyle D_{4,1}=O_{p}(1).

By the hypothesis and Assumption 10,

hn2γ^nγ\displaystyle h_{n}^{-2}\|\hat{\gamma}_{n}-\gamma\| =Nhn(γ^nγ)Nhn5\displaystyle=\frac{\|\sqrt{Nh_{n}}(\hat{\gamma}_{n}-\gamma)\|}{\sqrt{Nh_{n}^{5}}}
=Nhn(γ^nγ)Nhn2k+3×hn2k2\displaystyle=\frac{\|\sqrt{Nh_{n}}(\hat{\gamma}_{n}-\gamma)\|}{\sqrt{Nh_{n}^{2k+3}}}\times\sqrt{h_{n}^{2k-2}}
=op(1),\displaystyle=o_{p}(1),

as $\sqrt{Nh_{n}^{2k+3}}$ is either diverging or $O(1)$, and $\sqrt{h_{n}^{2k-2}}=o(1)$ for $k\geq 2$. Thus,

S^WWSWW=Op(1)×op(1)=op(1).\displaystyle\|\hat{S}_{WW}-S_{WW}\|=O_{p}(1)\times o_{p}(1)=o_{p}(1).

This shows that S^WW=SWW+op(1)\hat{S}_{WW}=S_{WW}+o_{p}(1). ∎
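As a sketch of the bandwidth bookkeeping behind the last step of this proof (it only restates identities already used above), the exponent of $h_{n}$ splits as $5=(2k+3)-(2k-2)$:

```latex
\begin{align*}
N h_{n}^{5} &= N h_{n}^{2k+3}\times h_{n}^{-(2k-2)}, \\
\frac{1}{\sqrt{N h_{n}^{5}}} &= \frac{1}{\sqrt{N h_{n}^{2k+3}}}\times\sqrt{h_{n}^{2k-2}}, \\
h_{n}^{-2}\|\hat{\gamma}_{n}-\gamma\|
  &= \frac{\|\sqrt{N h_{n}}(\hat{\gamma}_{n}-\gamma)\|}{\sqrt{N h_{n}^{2k+3}}}\times h_{n}^{k-1}.
\end{align*}
```

For $k\geq 2$ the factor $h_{n}^{k-1}$ vanishes as $h_{n}\to 0$, so the product is $o_{p}(1)$ provided the ratio in front of it stays bounded in probability, which is how we read the hypothesis together with Assumption 10.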

Proof of Lemma 5

Proof.

Expanding K(ΔRijγ^n/hn)K(\Delta R_{ij}^{\prime}\hat{\gamma}_{n}/h_{n}) around ΔRijγ\Delta R_{ij}^{\prime}\gamma, for some constant C>0C>0, we have

NhnS^WλSWλ\displaystyle\sqrt{Nh_{n}}\|\hat{S}_{W\lambda}-S_{W\lambda}\|
1Nhn2i<jΔWijΔRij|λij||K(ΔRijγ/hn)|D5,1×Nhnγ^nγ\displaystyle\leq\underbrace{\frac{1}{Nh_{n}^{2}}\sum_{i<j}\|\Delta W_{ij}\|\|\Delta R_{ij}\||\lambda_{ij}||K^{\prime}(\Delta R_{ij}^{\prime}\gamma/h_{n})|}_{D_{5,1}}\times\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|
\displaystyle+\underbrace{\frac{1}{Nh_{n}^{2}}\sum_{i<j}\|\Delta W_{ij}\|\|\Delta R_{ij}\|^{2}|\lambda_{ij}||K^{\prime\prime}(\Delta R_{ij}^{\prime}\gamma/h_{n})|}_{D_{5,2}}\times\frac{\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|^{2}}{2h_{n}}
+C1Ni<jΔWijΔRij3|λij|D5,3×Nhnγ^nγ36hn4\displaystyle+C\underbrace{\frac{1}{N}\sum_{i<j}\|\Delta W_{ij}\|\|\Delta R_{ij}\|^{3}|\lambda_{ij}|}_{D_{5,3}}\times\frac{\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|^{3}}{6h_{n}^{4}}

We bound the right-hand side in the following steps.

Step 1: $D_{5,1}$ and $D_{5,2}$


Observe that

E[D5,1]\displaystyle E[D_{5,1}] =1hn2E[ΔW12ΔR12|Λ12||ΔR12γ=r]|r||K(r/hn)|fRγ(r)𝑑r\displaystyle=\frac{1}{h_{n}^{2}}\int E[\|\Delta W_{12}\|\|\Delta R_{12}\||\Lambda_{12}||\Delta R_{12}^{\prime}\gamma=r]|r||K^{\prime}(r/h_{n})|f_{R\gamma}(r)dr
=E[ΔW12ΔR12|Λ12||ΔR12γ=rhn]|r||K(r)|fRγ(rhn)𝑑r\displaystyle=\int E[\|\Delta W_{12}\|\|\Delta R_{12}\||\Lambda_{12}||\Delta R_{12}^{\prime}\gamma=rh_{n}]|r||K^{\prime}(r)|f_{R\gamma}(rh_{n})dr
=O(1),\displaystyle=O(1),

where the last line holds from Assumptions 4, 6, and 8. Also, writing each summand of $D_{5,1}$ as $D_{5,1,ij}$, we have

Var[D5,1]\displaystyle Var[D_{5,1}] =1Nhn4Var[D5,1,12]+2(n2)Nhn4Cov[D5,1,12,D5,1,13]\displaystyle=\frac{1}{Nh_{n}^{4}}Var[D_{5,1,12}]+\frac{2(n-2)}{Nh_{n}^{4}}Cov[D_{5,1,12},D_{5,1,13}]
1Nhn4E[D5,1,122]+2(n2)Nhn4E[D5,1,12×D5,1,13].\displaystyle\leq\frac{1}{Nh_{n}^{4}}E[D^{2}_{5,1,12}]+\frac{2(n-2)}{Nh_{n}^{4}}E[D_{5,1,12}\times D_{5,1,13}].

The first term on the far right side is $O(1/(Nh_{n}^{2}))$ because

E[D5,1,122]\displaystyle E[D_{5,1,12}^{2}] =E[ΔW122ΔR122Λ122|ΔR12γ=r]r2K(r/hn)2fRγ(r)𝑑r\displaystyle=\int E[\|\Delta W_{12}\|^{2}\|\Delta R_{12}\|^{2}\Lambda_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r]r^{2}K^{\prime}(r/h_{n})^{2}f_{R\gamma}(r)dr
=hn2E[ΔW122ΔR122Λ122|ΔR12γ=rhn]r2K(r)2fRγ(rhn)𝑑r\displaystyle=h_{n}^{2}\int E[\|\Delta W_{12}\|^{2}\|\Delta R_{12}\|^{2}\Lambda_{12}^{2}|\Delta R_{12}^{\prime}\gamma=rh_{n}]r^{2}K^{\prime}(r)^{2}f_{R\gamma}(rh_{n})dr
=O(hn2),\displaystyle=O(h_{n}^{2}),

where the last line holds from Assumptions 4, 6, and 8. The second term on the far right side is $O(1/n)$ because

E[D5,1,12×D5,1,13]\displaystyle E[D_{5,1,12}\times D_{5,1,13}] =E[E[ΔW12ΔR12|Λ12||ΔR12γ=r1,ξ1,U1]\displaystyle=E\Big{[}\int E[\|\Delta W_{12}\|\|\Delta R_{12}\||\Lambda_{12}||\Delta R_{12}^{\prime}\gamma=r_{1},\xi_{1},U_{1}]
×E[ΔW13ΔR13|Λ13||ΔR13γ=r2,ξ1,U1]\displaystyle\times E[\|\Delta W_{13}\|\|\Delta R_{13}\||\Lambda_{13}||\Delta R_{13}^{\prime}\gamma=r_{2},\xi_{1},U_{1}]
×|r1||r2||K(r1/hn)||K(r2/hn)|fRγ|ξ1,U1(r1)fRγ|ξ1,U1(r2)dr1dr2]\displaystyle\times|r_{1}||r_{2}||K^{\prime}(r_{1}/h_{n})||K^{\prime}(r_{2}/h_{n})|f_{R\gamma|\xi_{1},U_{1}}(r_{1})f_{R\gamma|\xi_{1},U_{1}}(r_{2})dr_{1}dr_{2}\Big{]}
=hn4E[E[ΔW12ΔR12|Λ12||ΔR12γ=r1hn,ξ1,U1]\displaystyle=h_{n}^{4}E\Big{[}\int E[\|\Delta W_{12}\|\|\Delta R_{12}\||\Lambda_{12}||\Delta R_{12}^{\prime}\gamma=r_{1}h_{n},\xi_{1},U_{1}]
×E[ΔW13ΔR13|Λ13||ΔR13γ=r2hn,ξ1,U1]\displaystyle\times E[\|\Delta W_{13}\|\|\Delta R_{13}\||\Lambda_{13}||\Delta R_{13}^{\prime}\gamma=r_{2}h_{n},\xi_{1},U_{1}]
×|r1||r2||K(r1)||K(r2)|fRγ|ξ1,U1(r1hn)fRγ|ξ1,U1(r2hn)dr1dr2]\displaystyle\times|r_{1}||r_{2}||K^{\prime}(r_{1})||K^{\prime}(r_{2})|f_{R\gamma|\xi_{1},U_{1}}(r_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{2}h_{n})dr_{1}dr_{2}\Big{]}
=O(hn4),\displaystyle=O(h_{n}^{4}),

where the last line holds from Assumptions 4, 6, and 8. Thus,

Var[D5,1]=O(1Nhn2)+O(1n)=o(1),\displaystyle Var[D_{5,1}]=O\left(\frac{1}{Nh_{n}^{2}}\right)+O\left(\frac{1}{n}\right)=o(1),

since $Nh_{n}^{2}=Nh_{n}^{2k+3}\times h_{n}^{-(2k+1)}$ diverges by the hypothesis. Thus,

D5,1=Op(1).\displaystyle D_{5,1}=O_{p}(1).
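The scaling in the mean calculation of this step can be checked with the substitution $r=h_{n}u$. In the sketch below, $m(\cdot)$ is our shorthand for the conditional expectation in the integrand; it is an illustrative symbol, not notation from the paper. The second display records the Chebyshev step that turns a bounded mean and a vanishing variance into $O_{p}(1)$.

```latex
\begin{align*}
\frac{1}{h_{n}^{2}}\int m(r)\,|r|\,|K^{\prime}(r/h_{n})|\,f_{R\gamma}(r)\,dr
 &= \frac{1}{h_{n}^{2}}\int m(h_{n}u)\,|h_{n}u|\,|K^{\prime}(u)|\,f_{R\gamma}(h_{n}u)\,h_{n}\,du \\
 &= \int m(h_{n}u)\,|u|\,|K^{\prime}(u)|\,f_{R\gamma}(h_{n}u)\,du .
\end{align*}
\begin{align*}
\Pr\bigl(|D_{5,1}-E[D_{5,1}]|>M\bigr)\leq\frac{Var[D_{5,1}]}{M^{2}}\to 0
 \quad\Longrightarrow\quad
 D_{5,1}=E[D_{5,1}]+o_{p}(1)=O_{p}(1).
\end{align*}
```

One factor of $h_{n}$ comes from $|r|=h_{n}|u|$ and one from $dr=h_{n}\,du$, which together cancel the leading $h_{n}^{-2}$.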

By a similar calculation, we have

D5,2=Op(1).\displaystyle D_{5,2}=O_{p}(1).

Step 2: D5,3D_{5,3}


First, observe that

E[D5,3]=E[ΔW12ΔR124|Λ12|]<\displaystyle E[D_{5,3}]=E[\|\Delta W_{12}\|\|\Delta R_{12}\|^{4}|\Lambda_{12}|]<\infty

by Hölder’s inequality with Assumption 7. Note that, by construction, $\Lambda_{ij}$ is a function of $\xi_{i}$ and $\xi_{j}$ that is symmetric in $i$ and $j$, which implies that $D_{5,3}$ is a second-order U-statistic. Since each summand is non-negative and has a finite mean, we can apply the law of large numbers for U-statistics (Hoeffding (1961)) to show that

D5,3=Op(1).\displaystyle D_{5,3}=O_{p}(1).

Step 3: Conclusion


Finally, by Assumption 10 and the hypothesis,

Nhnγ^nγ=op(1),\displaystyle\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|=o_{p}(1),
Nhnγ^nγ22hn=Nhn(γ^nγ)22Nhn3=op(1),\displaystyle\frac{\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|^{2}}{2h_{n}}=\frac{\|\sqrt{Nh_{n}}(\hat{\gamma}_{n}-\gamma)\|^{2}}{2\sqrt{Nh_{n}^{3}}}=o_{p}(1),
Nhnγ^nγ36hn4=Nhn(γ^nγ)36Nhn5=op(1).\displaystyle\frac{\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|^{3}}{6h_{n}^{4}}=\frac{\|\sqrt{Nh_{n}}(\hat{\gamma}_{n}-\gamma)\|^{3}}{6Nh_{n}^{5}}=o_{p}(1).

Thus,

NhnS^WλSWλ=op(1).\displaystyle\sqrt{Nh_{n}}\|\hat{S}_{W\lambda}-S_{W\lambda}\|=o_{p}(1).

This implies that

S^Wλ=SWλ+op(1Nhn).\displaystyle\hat{S}_{W\lambda}=S_{W\lambda}+o_{p}\left(\frac{1}{\sqrt{Nh_{n}}}\right).

This completes the proof. ∎
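The two rate identities used in Step 3 can be verified directly. In the sketch below, $a_{n}\equiv\|\sqrt{Nh_{n}}(\hat{\gamma}_{n}-\gamma)\|$ is shorthand introduced here for illustration only, so that $\|\hat{\gamma}_{n}-\gamma\|=a_{n}/\sqrt{Nh_{n}}$.

```latex
\begin{align*}
\frac{\sqrt{Nh_{n}}\,\|\hat{\gamma}_{n}-\gamma\|^{2}}{2h_{n}}
 &= \frac{\sqrt{Nh_{n}}}{2h_{n}}\cdot\frac{a_{n}^{2}}{Nh_{n}}
  = \frac{a_{n}^{2}}{2\sqrt{Nh_{n}}\,h_{n}}
  = \frac{a_{n}^{2}}{2\sqrt{Nh_{n}^{3}}}, \\
\frac{\sqrt{Nh_{n}}\,\|\hat{\gamma}_{n}-\gamma\|^{3}}{6h_{n}^{4}}
 &= \frac{\sqrt{Nh_{n}}}{6h_{n}^{4}}\cdot\frac{a_{n}^{3}}{(Nh_{n})^{3/2}}
  = \frac{a_{n}^{3}}{6\,Nh_{n}\cdot h_{n}^{4}}
  = \frac{a_{n}^{3}}{6Nh_{n}^{5}} .
\end{align*}
```

Both right-hand sides vanish in probability once $a_{n}$ is bounded in probability and the denominators $\sqrt{Nh_{n}^{3}}$ and $Nh_{n}^{5}$ diverge.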

Proof of Lemma 6

Proof.

By expanding K(ΔRijγ^n/hn)K(\Delta R_{ij}^{\prime}\hat{\gamma}_{n}/h_{n}) around ΔRijγ\Delta R_{ij}^{\prime}\gamma, we have

Nhn(S^WνSWν)\displaystyle\sqrt{Nh_{n}}(\hat{S}_{W\nu}-S_{W\nu})
=1Nhn2i<jdij1dij2ΔWijΔRijνijK(ΔRijγ/hn)D6,1Nhn(γ^nγ)\displaystyle=\underbrace{\frac{1}{Nh_{n}^{2}}\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\Delta R_{ij}^{\prime}\nu_{ij}K^{\prime}(\Delta R_{ij}^{\prime}\gamma/h_{n})}_{D_{6,1}}\sqrt{Nh_{n}}(\hat{\gamma}_{n}-\gamma)
+(γ^nγ)1Nhn2i<jdij1dij2ΔWijΔRijΔRijνijK′′(ΔRijγ/hn)D6,2Nhn(γ^nγ)hn\displaystyle+(\hat{\gamma}_{n}-\gamma)^{\prime}\underbrace{\frac{1}{Nh_{n}^{2}}\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\Delta R_{ij}\Delta R_{ij}^{\prime}\nu_{ij}K^{\prime\prime}(\Delta R_{ij}^{\prime}\gamma/h_{n})}_{D_{6,2}}\sqrt{Nh_{n}}\frac{(\hat{\gamma}_{n}-\gamma)}{h_{n}}
+NhnNhn4i<jdij1dij2ΔWijνijK′′′(cij,n/hn)(ΔRij(γ^nγ))3D6,3\displaystyle+\underbrace{\frac{\sqrt{Nh_{n}}}{Nh_{n}^{4}}\sum_{i<j}d_{ij1}d_{ij2}\Delta W_{ij}\nu_{ij}K^{\prime\prime\prime}(c_{ij,n}^{*}/h_{n})\left(\Delta R_{ij}^{\prime}(\hat{\gamma}_{n}-\gamma)\right)^{3}}_{D_{6,3}}

We bound each component by the following steps.

Step 1: D6,1D_{6,1} and D6,2D_{6,2}


Note that $E[D_{6,1}]=E[D_{6,2}]=0$ by the conditional mean independence of $\nu_{ij}$. Write $D_{6,1,ij}$ for each summand of $D_{6,1}$. Observe that, by a calculation similar to the one above,

Var[D6,1]1Nhn4E[D6,1,122]+2(n2)Nhn4E[D6,1,12×D6,1,13].\displaystyle Var[\|D_{6,1}\|]\leq\frac{1}{Nh_{n}^{4}}E[\|D_{6,1,12}\|^{2}]+\frac{2(n-2)}{Nh_{n}^{4}}E[\|D_{6,1,12}\|\times\|D_{6,1,13}\|].

The first term on the right-hand side is $O(1/(Nh_{n}^{3}))$ since

E[D6,1,122]\displaystyle E[\|D_{6,1,12}\|^{2}] E[ΔW122ΔR122ν122|ΔR12γ=r]K(r/hn)2fRγ(r)𝑑r\displaystyle\leq\int E[\|\Delta W_{12}\|^{2}\|\Delta R_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=r]K^{\prime}(r/h_{n})^{2}f_{R\gamma}(r)dr
=hnE[ΔW122ΔR122ν122|ΔR12γ=rhn]K(r)2fRγ(rhn)𝑑r\displaystyle=h_{n}\int E[\|\Delta W_{12}\|^{2}\|\Delta R_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=rh_{n}]K^{\prime}(r)^{2}f_{R\gamma}(rh_{n})dr
=O(hn),\displaystyle=O(h_{n}),

where the last line holds from Assumptions 4, 6, and 8. The second term on the right-hand side is $O(1/(nh_{n}^{2}))$ since

\displaystyle E[\|D_{6,1,12}\|\times\|D_{6,1,13}\|]\leq E\Big[\int E[\|\Delta W_{12}\|\|\Delta R_{12}\||\nu_{12}||\Delta R_{12}^{\prime}\gamma=r_{1},\xi_{1},U_{1}]
\displaystyle\times E[\|\Delta W_{13}\|\|\Delta R_{13}\||\nu_{13}||\Delta R_{13}^{\prime}\gamma=r_{2},\xi_{1},U_{1}]
\displaystyle\times|K^{\prime}(r_{1}/h_{n})||K^{\prime}(r_{2}/h_{n})|f_{R\gamma|\xi_{1},U_{1}}(r_{1})f_{R\gamma|\xi_{1},U_{1}}(r_{2})dr_{1}dr_{2}\Big]
\displaystyle=h_{n}^{2}E\Big[\int E[\|\Delta W_{12}\|\|\Delta R_{12}\||\nu_{12}||\Delta R_{12}^{\prime}\gamma=r_{1}h_{n},\xi_{1},U_{1}]
\displaystyle\times E[\|\Delta W_{13}\|\|\Delta R_{13}\||\nu_{13}||\Delta R_{13}^{\prime}\gamma=r_{2}h_{n},\xi_{1},U_{1}]
\displaystyle\times|K^{\prime}(r_{1})||K^{\prime}(r_{2})|f_{R\gamma|\xi_{1},U_{1}}(r_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(r_{2}h_{n})dr_{1}dr_{2}\Big]
\displaystyle=O(h_{n}^{2}),

where the last line holds from Assumptions 4, 6, and 8. Hence,

Var[D6,1]=O(1Nhn3)+O(1nhn2)=o(1),\displaystyle Var[\|D_{6,1}\|]=O\left(\frac{1}{Nh_{n}^{3}}\right)+O\left(\frac{1}{nh_{n}^{2}}\right)=o(1),

since both $Nh_{n}^{3}=Nh_{n}^{2k+3}\times h_{n}^{-2k}$ and $nh_{n}^{2}\sim\sqrt{Nh_{n}^{4}}=\sqrt{Nh_{n}^{2k+3}}\times\sqrt{h_{n}^{-2k+1}}$ diverge under the hypothesis. This shows that

D6,1=op(1).\displaystyle D_{6,1}=o_{p}(1).

A similar calculation shows that

D6,2=op(1),\displaystyle D_{6,2}=o_{p}(1),

as well.
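A brief sketch of the rate bookkeeping behind the two variance terms in this step, assuming (as the normalization by $N$ in the sums over $i<j$ suggests) that $N={n\choose 2}$ counts the dyads, so that $n\sim\sqrt{2N}$:

```latex
\begin{align*}
N h_{n}^{3} &= N h_{n}^{2k+3}\times h_{n}^{-2k}, \\
n h_{n}^{2} &\sim \sqrt{2N}\,h_{n}^{2}
             = \sqrt{2}\,\sqrt{N h_{n}^{4}}
             = \sqrt{2}\,\sqrt{N h_{n}^{2k+3}}\times\sqrt{h_{n}^{-(2k-1)}} .
\end{align*}
```

Both negative powers of $h_{n}$ blow up as $h_{n}\to 0$ when $k\geq 1$, so each product diverges whenever $Nh_{n}^{2k+3}$ stays bounded away from zero, which is our reading of the hypothesis.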

Step 2: D6,3D_{6,3}


Observe that, for some constant $C>0$,

\displaystyle\|D_{6,3}\|\leq C\underbrace{\frac{1}{N}\sum_{i<j}\|\Delta W_{ij}\|\|\Delta R_{ij}\|^{3}|\nu_{ij}|}_{D_{6,4}}\times\frac{\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|^{3}}{h_{n}^{4}}.

Observe that

E[D6,4]=E[ΔW12ΔR123|ν12|]<,\displaystyle E[D_{6,4}]=E[\|\Delta W_{12}\|\|\Delta R_{12}\|^{3}|\nu_{12}|]<\infty,

by the Cauchy-Schwarz inequality with Assumption 6. Also, writing each summand of $D_{6,4}$ as $D_{6,4,ij}$, we have

Var[D6,4]E[D6,4,122]N+2(n2)NE[D6,4,12×D6,4,13].\displaystyle Var[D_{6,4}]\leq\frac{E[D_{6,4,12}^{2}]}{N}+\frac{2(n-2)}{N}E[D_{6,4,12}\times D_{6,4,13}].

Since

E[D6,4,122]\displaystyle E[D_{6,4,12}^{2}] =E[ΔW122ΔR126ν122]<\displaystyle=E[\|\Delta W_{12}\|^{2}\|\Delta R_{12}\|^{6}\nu_{12}^{2}]<\infty
E[D6,4,12×D6,4,13]\displaystyle E[D_{6,4,12}\times D_{6,4,13}] =E[ΔW12ΔW13ΔR123ΔR133|ν12||ν13|]<\displaystyle=E[\|\Delta W_{12}\|\|\Delta W_{13}\|\|\Delta R_{12}\|^{3}\|\Delta R_{13}\|^{3}|\nu_{12}||\nu_{13}|]<\infty

by Hölder’s inequality with Assumption 7,

Var[D6,4]=O(1N)+O(1n)=o(1).\displaystyle Var[D_{6,4}]=O\left(\frac{1}{N}\right)+O\left(\frac{1}{n}\right)=o(1).

This shows that

D6,4=Op(1).\displaystyle D_{6,4}=O_{p}(1).

Hence, by the previous calculation for the term involving γ^nγ\hat{\gamma}_{n}-\gamma,

D6,3=Op(1)×op(1)=op(1).\displaystyle\|D_{6,3}\|=O_{p}(1)\times o_{p}(1)=o_{p}(1).

Step 3: Conclusion


By the above steps and the hypothesis on γ^nγ\hat{\gamma}_{n}-\gamma,

NhnS^WνSWν=op(1).\displaystyle\sqrt{Nh_{n}}\|\hat{S}_{W\nu}-S_{W\nu}\|=o_{p}(1).

This implies that

S^Wν=SWν+op(1Nhn).\displaystyle\hat{S}_{W\nu}=S_{W\nu}+o_{p}\left(\frac{1}{\sqrt{Nh_{n}}}\right).

This completes the proof. ∎

Proof of Lemma 7

Proof.

Define

Sij,1\displaystyle S_{ij,1} 2dij1dij2Khn(ΔRijγ)ΔWijνij,\displaystyle\equiv 2d_{ij1}d_{ij2}K_{h_{n}}(\Delta R_{ij}^{\prime}\gamma)\Delta W_{ij}\nu_{ij},
Sij,2\displaystyle S_{ij,2} 2dij1dij2Khn(ΔRijγ)ΔWijλij,\displaystyle\equiv 2d_{ij1}d_{ij2}K_{h_{n}}(\Delta R_{ij}^{\prime}\gamma)\Delta W_{ij}\lambda_{ij},
Sij,3\displaystyle S_{ij,3} 2dij1dij2Khn(ΔRijγ)ΔWijΔWij(ββ^n).\displaystyle\equiv 2d_{ij1}d_{ij2}K_{h_{n}}(\Delta R_{ij}^{\prime}\gamma)\Delta W_{ij}\Delta W_{ij}^{\prime}(\beta-\hat{\beta}_{n}).

Since

Δϵ^ij=ΔWij(ββ^n)+λij+νij,\displaystyle\Delta\hat{\epsilon}_{ij}=\Delta W_{ij}^{\prime}(\beta-\hat{\beta}_{n})+\lambda_{ij}+\nu_{ij},

we have

Sij=Sij,1+Sij,2+Sij,3.\displaystyle S_{ij}=S_{ij,1}+S_{ij,2}+S_{ij,3}.

Thus,

Σ~Wν,1\displaystyle\tilde{\Sigma}_{W\nu,1}
\displaystyle=\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\underbrace{\frac{1}{3}(S_{ij,1}S_{ik,1}^{\prime}+S_{ij,1}S_{jk,1}^{\prime}+S_{ik,1}S_{jk,1}^{\prime})}_{D_{7,ijk}}}_{D_{7}}+\mathcal{O}_{7},

where 𝒪7\mathcal{O}_{7} is the remainder term.

We first show that $c^{\prime}D_{7}c\to_{p}c^{\prime}\Sigma_{W\nu,1}c$. Note that

\displaystyle E[c^{\prime}D_{7}c]=E[c^{\prime}S_{12,1}S_{13,1}^{\prime}c]
=4hn2E[d121d122d131d132cΔW12ΔW13cν12ν13|ΔR12γ=s1,ΔR13γ=s2]\displaystyle=\frac{4}{h_{n}^{2}}\int E[d_{121}d_{122}d_{131}d_{132}c^{\prime}\Delta W_{12}\Delta W_{13}^{\prime}c\nu_{12}\nu_{13}|\Delta R_{12}^{\prime}\gamma=s_{1},\Delta R_{13}^{\prime}\gamma=s_{2}]
×K(s1/hn)K(s2/hn)fRγ,2(s1,s2)ds1ds2\displaystyle\times K(s_{1}/h_{n})K(s_{2}/h_{n})f_{R\gamma,2}(s_{1},s_{2})ds_{1}ds_{2}
=4E[d121d122d131d132cΔW12ΔW13cν12ν13|ΔR12γ=s1hn,ΔR13γ=s2hn]\displaystyle=4\int E[d_{121}d_{122}d_{131}d_{132}c^{\prime}\Delta W_{12}\Delta W_{13}^{\prime}c\nu_{12}\nu_{13}|\Delta R_{12}^{\prime}\gamma=s_{1}h_{n},\Delta R_{13}^{\prime}\gamma=s_{2}h_{n}]
×K(s1)K(s2)fRγ,2(s1hn,s2hn)ds1ds2\displaystyle\times K(s_{1})K(s_{2})f_{R\gamma,2}(s_{1}h_{n},s_{2}h_{n})ds_{1}ds_{2}
cΣWν,1c,\displaystyle\to c^{\prime}\Sigma_{W\nu,1}c,

as $n\to\infty$ by the dominated convergence theorem under Assumptions 4, 6, and 8. Define the third-order U-statistic

Un,1=(n3)1i<j<kpn(𝝃i,𝝃j,𝝃k),\displaystyle U_{n,1}={n\choose 3}^{-1}\sum_{i<j<k}p_{n}(\boldsymbol{\xi}_{i},\boldsymbol{\xi}_{j},\boldsymbol{\xi}_{k}),

where 𝝃i=(ξi,Ui)\boldsymbol{\xi}_{i}=(\xi_{i},U_{i}) and

pn(𝝃i,𝝃j,𝝃k)=E[cD7,ijkc|𝝃i,𝝃j,𝝃k]\displaystyle p_{n}(\boldsymbol{\xi}_{i},\boldsymbol{\xi}_{j},\boldsymbol{\xi}_{k})=E[c^{\prime}D_{7,ijk}c|\boldsymbol{\xi}_{i},\boldsymbol{\xi}_{j},\boldsymbol{\xi}_{k}]

By the calculation of Graham et al. (2019) in Appendix B,

E[(cD7cUn,1)2]\displaystyle E[(c^{\prime}D_{7}c-U_{n,1})^{2}] =(n3)1E[(cD7,123cE[cD7,123c|𝝃1,𝝃2,𝝃3])2]\displaystyle={n\choose 3}^{-1}E\left[(c^{\prime}D_{7,123}c-E[c^{\prime}D_{7,123}c|\boldsymbol{\xi}_{1},\boldsymbol{\xi}_{2},\boldsymbol{\xi}_{3}])^{2}\right]
+(n3)2×3(n2)(n22)×E[cD7,123cE[cD7,123c|𝝃1,𝝃2,𝝃3]]\displaystyle+{n\choose 3}^{-2}\times 3{n\choose 2}{n-2\choose 2}\times E\left[c^{\prime}D_{7,123}c-E[c^{\prime}D_{7,123}c|\boldsymbol{\xi}_{1},\boldsymbol{\xi}_{2},\boldsymbol{\xi}_{3}]\right]
×E[cD7,124cE[cD7,124c|𝝃1,𝝃2,𝝃4]]\displaystyle\times E\left[c^{\prime}D_{7,124}c-E[c^{\prime}D_{7,124}c|\boldsymbol{\xi}_{1},\boldsymbol{\xi}_{2},\boldsymbol{\xi}_{4}]\right]
=O(E[(cD7,123c)2]n3).\displaystyle=O\left(\frac{E[(c^{\prime}D_{7,123}c)^{2}]}{n^{3}}\right).

Observe that

E[(cD7,123c)2]\displaystyle E[(c^{\prime}D_{7,123}c)^{2}] =19(3E[(cS12,1c×cS13,1c)2]+6E[(cS12,1c)2×cS13,1c×cS23,1c])\displaystyle=\frac{1}{9}\left(3E[(c^{\prime}S_{12,1}c\times c^{\prime}S_{13,1}c)^{2}]+6E[(c^{\prime}S_{12,1}c)^{2}\times c^{\prime}S_{13,1}c\times c^{\prime}S_{23,1}c]\right)
=O(1hn2),\displaystyle=O\left(\frac{1}{h_{n}^{2}}\right),

since for some positive constant C>0C>0,

E[(cS12,1c×cS13,1c)2]\displaystyle E[(c^{\prime}S_{12,1}c\times c^{\prime}S_{13,1}c)^{2}] Chn4E[ΔW122ΔW132ν122ν132|ΔR12γ=s1,ΔR13γ=s2]\displaystyle\leq\frac{C}{h_{n}^{4}}\int E[\|\Delta W_{12}\|^{2}\|\Delta W_{13}\|^{2}\nu_{12}^{2}\nu_{13}^{2}|\Delta R_{12}^{\prime}\gamma=s_{1},\Delta R_{13}^{\prime}\gamma=s_{2}]
×K2(s1/hn)K2(s2/hn)fRγ,2(s1,s2)ds1ds2\displaystyle\times K^{2}(s_{1}/h_{n})K^{2}(s_{2}/h_{n})f_{R\gamma,2}(s_{1},s_{2})ds_{1}ds_{2}
\displaystyle=\frac{C}{h_{n}^{2}}\int E[\|\Delta W_{12}\|^{2}\|\Delta W_{13}\|^{2}\nu_{12}^{2}\nu_{13}^{2}|\Delta R_{12}^{\prime}\gamma=s_{1}h_{n},\Delta R_{13}^{\prime}\gamma=s_{2}h_{n}]
×K2(s1)K2(s2)fRγ,2(s1hn,s2hn)ds1ds2\displaystyle\times K^{2}(s_{1})K^{2}(s_{2})f_{R\gamma,2}(s_{1}h_{n},s_{2}h_{n})ds_{1}ds_{2}
=O(1hn2),\displaystyle=O\left(\frac{1}{h_{n}^{2}}\right),

as $n\to\infty$, with the last line coming from Assumptions 4, 6, and 8, and

E[(cS12,1c)2×cS13,1c×cS23,1c]\displaystyle E[(c^{\prime}S_{12,1}c)^{2}\times c^{\prime}S_{13,1}c\times c^{\prime}S_{23,1}c]
=E[(cS12,1c)2×E[cS13,1c|ξ1,U1]×E[cS23,1c|ξ2,U2]]\displaystyle=E[(c^{\prime}S_{12,1}c)^{2}\times E[c^{\prime}S_{13,1}c|\xi_{1},U_{1}]\times E[c^{\prime}S_{23,1}c|\xi_{2},U_{2}]]
\displaystyle=E[E[(c^{\prime}S_{12,1}c)^{2}\times c^{\prime}S_{13,1}c|\xi_{1},U_{1}]\times E[c^{\prime}S_{23,1}c|\xi_{2},U_{2}]]
=E[(cS12,1c)2×cS13,1c]×E[cS12,1c]\displaystyle=E[(c^{\prime}S_{12,1}c)^{2}\times c^{\prime}S_{13,1}c]\times E[c^{\prime}S_{12,1}c]
\displaystyle\leq O(1)\times\frac{C}{h_{n}^{3}}\int E[\|\Delta W_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=s_{1},\xi_{1},U_{1}]\times E[\|\Delta W_{13}\||\nu_{13}||\Delta R_{13}^{\prime}\gamma=s_{2},\xi_{1},U_{1}]
\displaystyle\times K^{2}(s_{1}/h_{n})K(s_{2}/h_{n})f_{R\gamma|\xi_{1},U_{1}}(s_{1})f_{R\gamma|\xi_{1},U_{1}}(s_{2})ds_{1}ds_{2}
\displaystyle=O(1)\times\frac{C}{h_{n}}\int E[\|\Delta W_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=s_{1}h_{n},\xi_{1},U_{1}]\times E[\|\Delta W_{13}\||\nu_{13}||\Delta R_{13}^{\prime}\gamma=s_{2}h_{n},\xi_{1},U_{1}]
\displaystyle\times K^{2}(s_{1})K(s_{2})f_{R\gamma|\xi_{1},U_{1}}(s_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(s_{2}h_{n})ds_{1}ds_{2}
\displaystyle=O\left(\frac{1}{h_{n}}\right),

where the first to third lines follow from the conditional independence of $S_{ij,1}$, the random sampling of $\xi_{i}$, and the conditional independence and exchangeability of $U_{i}$ under Assumption 1, and the last line follows from Assumptions 4, 6, and 8. Observe that, by the conditional independence of $S_{ij,1}$ and $S_{ik,1}$ given $\xi_{i},U_{i}$ and by $S_{ij,1}=S_{ji,1}$, one can show that

E[cD7,123c×cD7,124c]\displaystyle E[c^{\prime}D_{7,123}c\times c^{\prime}D_{7,124}c] =19{2E[(cS12,1c)2×cS13,1c×cS14,1c]\displaystyle=\frac{1}{9}\big{\{}2E[(c^{\prime}S_{12,1}c)^{2}\times c^{\prime}S_{13,1}c\times c^{\prime}S_{14,1}c]
+2E[(cS12,1c)2×cS13,1c]×E[cS13,1c]+5E[cS12,1c×cS13,1c]2}\displaystyle+2E[(c^{\prime}S_{12,1}c)^{2}\times c^{\prime}S_{13,1}c]\times E[c^{\prime}S_{13,1}c]+5E[c^{\prime}S_{12,1}c\times c^{\prime}S_{13,1}c]^{2}\big{\}}
\displaystyle=O\left(\frac{1}{h_{n}}\right),

where the last line holds since

E[(cS12,1c)2×cS13,1c×cS14,1c]\displaystyle E[(c^{\prime}S_{12,1}c)^{2}\times c^{\prime}S_{13,1}c\times c^{\prime}S_{14,1}c]
\displaystyle\leq\frac{C}{h_{n}^{4}}E\bigg[\int E[\|\Delta W_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=s_{1},\xi_{1},U_{1}]\times E[\|\Delta W_{13}\||\nu_{13}||\Delta R_{13}^{\prime}\gamma=s_{2},\xi_{1},U_{1}]
\displaystyle\times E[\|\Delta W_{14}\||\nu_{14}||\Delta R_{14}^{\prime}\gamma=s_{3},\xi_{1},U_{1}]\times K^{2}(s_{1}/h_{n})K(s_{2}/h_{n})K(s_{3}/h_{n})
\displaystyle\times f_{R\gamma|\xi_{1},U_{1}}(s_{1})f_{R\gamma|\xi_{1},U_{1}}(s_{2})f_{R\gamma|\xi_{1},U_{1}}(s_{3})ds_{1}ds_{2}ds_{3}\bigg]
\displaystyle=\frac{C}{h_{n}}E\bigg[\int E[\|\Delta W_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=s_{1}h_{n},\xi_{1},U_{1}]\times E[\|\Delta W_{13}\||\nu_{13}||\Delta R_{13}^{\prime}\gamma=s_{2}h_{n},\xi_{1},U_{1}]
\displaystyle\times E[\|\Delta W_{14}\||\nu_{14}||\Delta R_{14}^{\prime}\gamma=s_{3}h_{n},\xi_{1},U_{1}]\times K^{2}(s_{1})K(s_{2})K(s_{3})
\displaystyle\times f_{R\gamma|\xi_{1},U_{1}}(s_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(s_{2}h_{n})f_{R\gamma|\xi_{1},U_{1}}(s_{3}h_{n})ds_{1}ds_{2}ds_{3}\bigg]
=O(1hn),\displaystyle=O\left(\frac{1}{h_{n}}\right),

where the last line holds by Assumptions 4, 6, and 8,

E[(cS12,1c)2×cS13,1c]\displaystyle E[(c^{\prime}S_{12,1}c)^{2}\times c^{\prime}S_{13,1}c]
Chn3E[E[ΔW122ν122|ΔR12γ=s1,ξ1,U1]\displaystyle\leq\frac{C}{h_{n}^{3}}E\bigg{[}\int E[\|\Delta W_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=s_{1},\xi_{1},U_{1}]
×E[ΔW13|ν13||ΔR13γ=s2,ξ1,U1]K2(s1/hn)K(s2/hn)fRγ|ξ1,U1(s1)fRγ|ξ1,U1(s2)ds1ds2]\displaystyle\times E[\|\Delta W_{13}\||\nu_{13}||\Delta R_{13}^{\prime}\gamma=s_{2},\xi_{1},U_{1}]K^{2}(s_{1}/h_{n})K(s_{2}/h_{n})f_{R\gamma|\xi_{1},U_{1}}(s_{1})f_{R\gamma|\xi_{1},U_{1}}(s_{2})ds_{1}ds_{2}\bigg{]}
ChnE[E[ΔW122ν122|ΔR12γ=s1hn,ξ1,U1]\displaystyle\leq\frac{C}{h_{n}}E\bigg{[}\int E[\|\Delta W_{12}\|^{2}\nu_{12}^{2}|\Delta R_{12}^{\prime}\gamma=s_{1}h_{n},\xi_{1},U_{1}]
×E[ΔW13|ν13||ΔR13γ=s2hn,ξ1,U1]K2(s1)K(s2)fRγ|ξ1,U1(s1hn)fRγ|ξ1,U1(s2hn)ds1ds2]\displaystyle\times E[\|\Delta W_{13}\||\nu_{13}||\Delta R_{13}^{\prime}\gamma=s_{2}h_{n},\xi_{1},U_{1}]K^{2}(s_{1})K(s_{2})f_{R\gamma|\xi_{1},U_{1}}(s_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(s_{2}h_{n})ds_{1}ds_{2}\bigg{]}
=O(1hn),\displaystyle=O\left(\frac{1}{h_{n}}\right),

where the last equality holds from Assumptions 4, 6, and 8, and

\displaystyle E[c^{\prime}S_{12,1}c\times c^{\prime}S_{13,1}c]
Chn2E[E[ΔW12|ν12||ΔR12γ=s1,ξ1,U1]\displaystyle\leq\frac{C}{h_{n}^{2}}E\bigg{[}\int E[\|\Delta W_{12}\||\nu_{12}||\Delta R_{12}^{\prime}\gamma=s_{1},\xi_{1},U_{1}]
×E[ΔW13|ν13||ΔR13γ=s2,ξ1,U1]K(s1/hn)K(s2/hn)fRγ|ξ1,U1(s1)fRγ|ξ1,U1(s2)ds1ds2]\displaystyle\times E[\|\Delta W_{13}\||\nu_{13}||\Delta R_{13}^{\prime}\gamma=s_{2},\xi_{1},U_{1}]K(s_{1}/h_{n})K(s_{2}/h_{n})f_{R\gamma|\xi_{1},U_{1}}(s_{1})f_{R\gamma|\xi_{1},U_{1}}(s_{2})ds_{1}ds_{2}\bigg{]}
CE[E[ΔW12|ν12||ΔR12γ=s1hn,ξ1,U1]\displaystyle\leq CE\bigg{[}\int E[\|\Delta W_{12}\||\nu_{12}||\Delta R_{12}^{\prime}\gamma=s_{1}h_{n},\xi_{1},U_{1}]
×E[ΔW13|ν13||ΔR13γ=s2hn,ξ1,U1]K(s1)K(s2)fRγ|ξ1,U1(s1hn)fRγ|ξ1,U1(s2hn)ds1ds2]\displaystyle\times E[\|\Delta W_{13}\||\nu_{13}||\Delta R_{13}^{\prime}\gamma=s_{2}h_{n},\xi_{1},U_{1}]K(s_{1})K(s_{2})f_{R\gamma|\xi_{1},U_{1}}(s_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(s_{2}h_{n})ds_{1}ds_{2}\bigg{]}
=O(1),\displaystyle=O(1),

where the last equality holds from Assumptions 4, 6, and 8. Thus,

E[(cD7cUn,1)2]=O(1n3hn2)=o(1).\displaystyle E\left[(c^{\prime}D_{7}c-U_{n,1})^{2}\right]=O\left(\frac{1}{n^{3}h_{n}^{2}}\right)=o(1).

Thus, $c^{\prime}D_{7}c$ is well approximated by $U_{n,1}$. Also, since $nh_{n}^{2}\to\infty$ under the stated assumption on $h_{n}$,

E[(pn(𝝃i,𝝃j,𝝃k))2]=O(E[(cD7,123c)2])=O(nnhn2)=o(1)×O(n),\displaystyle E\left[\left(p_{n}(\boldsymbol{\xi}_{i},\boldsymbol{\xi}_{j},\boldsymbol{\xi}_{k})\right)^{2}\right]=O\left(E[(c^{\prime}D_{7,123}c)^{2}]\right)=O\left(\frac{n}{nh_{n}^{2}}\right)=o(1)\times O(n),

and by Lemma A.3 of Ahn and Powell (1993), we have

\displaystyle U_{n,1}=E[U_{n,1}]+o_{p}(1).

This shows that

cD7c\displaystyle c^{\prime}D_{7}c =E[Un,1]+cD7cUn,1=op(1)+Un,1E[Un,1]=op(1)\displaystyle=E[U_{n,1}]+\underbrace{c^{\prime}D_{7}c-U_{n,1}}_{=o_{p}(1)}+\underbrace{U_{n,1}-E[U_{n,1}]}_{=o_{p}(1)}
=E[cD7c]+op(1)\displaystyle=E[c^{\prime}D_{7}c]+o_{p}(1)
=cΣWν,1c+op(1).\displaystyle=c^{\prime}\Sigma_{W\nu,1}c+o_{p}(1).

This establishes $c^{\prime}D_{7}c\to_{p}c^{\prime}\Sigma_{W\nu,1}c$ as $n\to\infty$.

Each term in the remainder $\mathcal{O}_{7}$ involves $S_{ij,2}$, $S_{ij,3}$, or both, and is of smaller order than $D_{7}$, since $S_{ij,2}$ involves $\lambda_{ij}\sim h_{n}$ and $S_{ij,3}$ involves $\|\hat{\beta}_{n}-\beta\|=O_{p}(1/\sqrt{n})$ for large $n$. By computing in a similar way as before, we can establish that $E[|c^{\prime}\mathcal{O}_{7}c|]=o(1)$ and $Var[c^{\prime}\mathcal{O}_{7}c]=o(1)$, so that $|c^{\prime}\mathcal{O}_{7}c|\to_{p}0$. Hence,

\displaystyle|c^{\prime}\tilde{\Sigma}_{W\nu,1}c-c^{\prime}\Sigma_{W\nu,1}c|\leq|c^{\prime}D_{7}c-c^{\prime}\Sigma_{W\nu,1}c|+|c^{\prime}\mathcal{O}_{7}c|=o_{p}(1),

which completes the proof for Lemma 7. ∎

Proof of Lemma 8

Proof.

Define

S^ij,1\displaystyle\hat{S}_{ij,1} =2hn2dij1dij2K(cij,nhn)ΔWijΔRijΔϵ^ij\displaystyle=\frac{2}{h_{n}^{2}}d_{ij1}d_{ij2}K^{\prime}\left(\frac{c_{ij,n}^{*}}{h_{n}}\right)\Delta W_{ij}\Delta R_{ij}^{\prime}\Delta\hat{\epsilon}_{ij}

where $c_{ij,n}^{*}$ lies between $\Delta R_{ij}^{\prime}\gamma$ and $\Delta R_{ij}^{\prime}\hat{\gamma}_{n}$. In the following argument, we treat $\Delta\hat{\epsilon}_{ij}$ in $\hat{S}_{ij,1}$ as $\nu_{ij}$, because only the existence of higher moments matters and bounding the terms involving $\nu_{ij}$ suffices. By the expression for $\Delta\hat{\epsilon}_{ij}$, we have

S^ij\displaystyle\hat{S}_{ij} =Sij,1+Sij,2+Sij,3+S^ij,1(γ^nγ).\displaystyle=S_{ij,1}+S_{ij,2}+S_{ij,3}+\hat{S}_{ij,1}(\hat{\gamma}_{n}-\gamma).

By the proof of Lemma 7, we know that

(n3)1i<j<k13(Sij,pSik,p+Sij,pSjk,p+Sik,pSjk,p)\displaystyle{n\choose 3}^{-1}\sum_{i<j<k}\frac{1}{3}(S_{ij,p}S_{ik,p}^{\prime}+S_{ij,p}S_{jk,p}^{\prime}+S_{ik,p}S_{jk,p}^{\prime}) =op(1),\displaystyle=o_{p}(1),

for p=2,3p=2,3. Then,

Σ^Wν,1Σ~Wν,1\displaystyle\|\hat{\Sigma}_{W\nu,1}-\tilde{\Sigma}_{W\nu,1}\| p=13(n3)1i<j<khn23(Sij,pS^ik,1+Sij,pS^jk,1+Sik,pS^jk,1)D8,1,ijkpD8,1pγ^nγhn2\displaystyle\leq\sum_{p=1}^{3}\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\underbrace{\frac{h_{n}^{2}}{3}(\|S_{ij,p}\|\|\hat{S}_{ik,1}\|+\|S_{ij,p}\|\|\hat{S}_{jk,1}\|+\|S_{ik,p}\|\|\hat{S}_{jk,1}\|)}_{D_{8,1,ijk}^{p}}}_{D_{8,1}^{p}}\frac{\|\hat{\gamma}_{n}-\gamma\|}{h_{n}^{2}}
+p=13(n3)1i<j<khn23(S^ij,1Sik,p+S^ij,1Sjk,p+S^ik,1Sjk,p)D8,2γ^nγhn2\displaystyle+\underbrace{\sum_{p=1}^{3}{n\choose 3}^{-1}\sum_{i<j<k}\frac{h_{n}^{2}}{3}(\|\hat{S}_{ij,1}\|\|S_{ik,p}\|+\|\hat{S}_{ij,1}\|\|S_{jk,p}\|+\|\hat{S}_{ik,1}\|\|S_{jk,p}\|)}_{D_{8,2}}\frac{\|\hat{\gamma}_{n}-\gamma\|}{h_{n}^{2}}
+(n3)1i<j<khn43(S^ij,1S^ik,1+S^ij,1S^jk,1+S^ik,1S^jk,1)D8,3γ^nγ2hn4\displaystyle+\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\frac{h_{n}^{4}}{3}(\|\hat{S}_{ij,1}\|\|\hat{S}_{ik,1}\|+\|\hat{S}_{ij,1}\|\|\hat{S}_{jk,1}\|+\|\hat{S}_{ik,1}\|\|\hat{S}_{jk,1}\|)}_{D_{8,3}}\frac{\|\hat{\gamma}_{n}-\gamma\|^{2}}{h_{n}^{4}}

For $D_{8,1}^{p}$, it suffices to bound $D_{8,1}^{1}$, since a similar calculation applies to the other terms. By Assumption 8, for some constant $C>0$,

Sij,1\displaystyle\|S_{ij,1}\| 2hnΔWij|νij||K(ΔRijγ/hn)|gij,1,\displaystyle\leq\frac{2}{h_{n}}\underbrace{\|\Delta W_{ij}\||\nu_{ij}||K(\Delta R_{ij}^{\prime}\gamma/h_{n})|}_{g_{ij,1}},
S^ij,1\displaystyle\|\hat{S}_{ij,1}\| Chn2ΔWijΔRij|νij|gij,2.\displaystyle\leq\frac{C}{h_{n}^{2}}\underbrace{\|\Delta W_{ij}\|\|\Delta R_{ij}\||\nu_{ij}|}_{g_{ij,2}}.

Thus,

\displaystyle D_{8,1}^{1}\leq\frac{C}{h_{n}}{n\choose 3}^{-1}\sum_{i<j<k}\underbrace{(g_{ij,1}g_{ik,2}+g_{ij,1}g_{jk,2}+g_{ik,1}g_{jk,2})}_{g_{ijk,12}}.

Observe that

E[D8,11]\displaystyle E[D_{8,1}^{1}] 1hnE[g12,1g13,2]\displaystyle\leq\frac{1}{h_{n}}E[g_{12,1}g_{13,2}]
=1hnE[E[ΔW12|ν12||ΔR12γ=r,ξ1,U1]K(r/hn)fRγ|ξ1,U1(r)dr\displaystyle=\frac{1}{h_{n}}E\Big{[}\int E[\|\Delta W_{12}\||\nu_{12}||\Delta R_{12}^{\prime}\gamma=r,\xi_{1},U_{1}]K(r/h_{n})f_{R\gamma|\xi_{1},U_{1}}(r)dr
×E[ΔW13ΔR13|ν13||ξ1,U1]]\displaystyle\times E[\|\Delta W_{13}\|\|\Delta R_{13}\||\nu_{13}||\xi_{1},U_{1}]\Big{]}
=E[E[ΔW12|ν12||ΔR12γ=rhn,ξ1,U1]K(r)fRγ|ξ1,U1(rhn)dr\displaystyle=E\Big{[}\int E[\|\Delta W_{12}\||\nu_{12}||\Delta R_{12}^{\prime}\gamma=rh_{n},\xi_{1},U_{1}]K(r)f_{R\gamma|\xi_{1},U_{1}}(rh_{n})dr
×E[ΔW13ΔR13|ν13||ξ1,U1]]\displaystyle\times E[\|\Delta W_{13}\|\|\Delta R_{13}\||\nu_{13}||\xi_{1},U_{1}]\Big{]}
=O(1),\displaystyle=O(1),

where the last equality follows from Assumptions 4, 6, 7, and 8. For variance, the leading term involves covariances between variables with one common node, which has n×(n14)n\times{n-1\choose 4} elements (up to some constant scale):

\displaystyle Var[D_{8,1}^{1}]=O\Bigg(\frac{1}{h_{n}^{2}}\times{n\choose 3}^{-2}\times n\times{n-1\choose 4}\times E[D_{8,1,123}^{1}\times D_{8,1,145}^{1}]\Bigg)=O\left(\frac{1}{nh_{n}^{2}}\right)=o(1),

since $nh_{n}^{2}\sim n^{(2k-1)/(2k+3)}$ diverges for $k\geq 2$ and

E[D8,1,1231×D8,1,1451]\displaystyle E[D_{8,1,123}^{1}\times D_{8,1,145}^{1}] E[g123,12×g145,12]\displaystyle\leq E[g_{123,12}\times g_{145,12}]
E[g123,122]\displaystyle\leq E[g_{123,12}^{2}]
CE[ΔW122ΔW132ΔR132ν122ν132]\displaystyle\leq CE[\|\Delta W_{12}\|^{2}\|\Delta W_{13}\|^{2}\|\Delta R_{13}\|^{2}\nu_{12}^{2}\nu_{13}^{2}]
=O(1),\displaystyle=O(1),

by the Cauchy-Schwarz inequality under Assumption 7. Thus, $D_{8,1}^{1}=O_{p}(1)$, and likewise for $D_{8,1}^{2}$ and $D_{8,1}^{3}$. Similarly, $D_{8,2}=O_{p}(1)$ and $D_{8,3}=O_{p}(1)$ hold.

Notice that

\[
\frac{\|\hat{\gamma}_{n}-\gamma\|}{h_{n}^{2}}=\frac{\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|}{\sqrt{Nh_{n}^{5}}}=o_{p}(1),
\]

since $\sqrt{Nh_{n}^{5}}$ diverges under the hypothesis, and similarly,

\[
\frac{\|\hat{\gamma}_{n}-\gamma\|^{2}}{h_{n}^{4}}=\frac{(\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|)^{2}}{Nh_{n}^{5}}=o_{p}(1).
\]

Hence,

\[
\|\hat{\Sigma}_{W\nu,1}-\tilde{\Sigma}_{W\nu,1}\|\leq O_{p}(1)\times o_{p}(1)+o_{p}(1)=o_{p}(1),
\]

which completes the proof of Lemma 8. ∎

Proof of Lemma 9

Proof.

We only show $nh_{n}c_{W}^{\prime}\hat{\Sigma}=o_{p}(1)$, as the other case follows by taking the transpose. We also write $c_{W}$ as $c$ for short. The statement is proved by showing that Lemmas 7 and 8 continue to hold after rescaling by $n^{1-\alpha/2}h_{n}$. First, we show that $c^{\prime}\Sigma_{W\nu,1}c=0$ implies $c^{\prime}\Sigma_{W\nu,1}=0$.

Step 1: The implication of $c^{\prime}\Sigma_{W\nu,1}c=0$


Note that

\[
\begin{aligned}
\Sigma_{W\nu,1} &=f_{R\gamma,2}(0,0)Pr(d_{121}d_{122}d_{131}d_{132}=1|\Delta R_{12}^{\prime}\gamma=\Delta R_{13}^{\prime}\gamma=0)\\
&\quad\times E[\Delta W_{12}\Delta W_{13}^{\prime}\nu_{12}\nu_{13}|\Delta R_{12}^{\prime}\gamma=\Delta R_{13}^{\prime}\gamma=0].
\end{aligned}
\]

By Assumption 4, $f_{R\gamma,2}(0,0)>0$. Also, by the conditional independence of $d_{12t}$ and $d_{13s}$ under Assumption 1,

\[
\begin{aligned}
&Pr(d_{121}d_{122}d_{131}d_{132}=1|\Delta R_{12}^{\prime}\gamma=\Delta R_{13}^{\prime}\gamma=0)\\
&=E\left[\left(Pr(d_{121}d_{122}=1|\Delta R_{12}^{\prime}\gamma=0,\xi_{1},U_{1})\right)^{2}\,\middle|\,\Delta R_{12}^{\prime}\gamma=0\right].
\end{aligned}
\]

Since $Pr(d_{121}d_{122}=1|\Delta R_{12}^{\prime}\gamma=0)>0$ is implied by Assumption 3, it must be that

\[
Pr(d_{121}d_{122}d_{131}d_{132}=1|\Delta R_{12}^{\prime}\gamma=\Delta R_{13}^{\prime}\gamma=0)>0,
\]

as otherwise $Pr(d_{121}d_{122}=1|\Delta R_{12}^{\prime}\gamma=0,\xi_{1},U_{1})$ would equal 0 almost surely, which contradicts the locally positive probability of $d_{121}d_{122}=1$. Thus, $c^{\prime}\Sigma_{W\nu,1}c=0$ is equivalent to

\[
\begin{aligned}
&E[c^{\prime}\Delta W_{12}\Delta W_{13}^{\prime}c\,\nu_{12}\nu_{13}|\Delta R_{12}^{\prime}\gamma=\Delta R_{13}^{\prime}\gamma=0]\\
&=E\left[\left(E[c^{\prime}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=0,\xi_{1},U_{1}]\right)^{2}\,\middle|\,\Delta R_{12}^{\prime}\gamma=0\right]\\
&=0,
\end{aligned}
\]

which, in turn, is equivalent to (by the mean independence of $\nu_{12}$)

\[
E[c^{\prime}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=0,\xi_{1},U_{1}]=0
\]

almost surely. Thus, $c^{\prime}\Sigma_{W\nu,1}c=0$ implies that

\[
\begin{aligned}
c^{\prime}\Sigma_{W\nu,1} &=f_{R\gamma,2}(0,0)Pr(d_{121}d_{122}d_{131}d_{132}=1|\Delta R_{12}^{\prime}\gamma=\Delta R_{13}^{\prime}\gamma=0)\\
&\quad\times E\left[E[c^{\prime}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=0,\xi_{1},U_{1}]E[\Delta W_{13}^{\prime}\nu_{13}|\Delta R_{13}^{\prime}\gamma=0,\xi_{1},U_{1}]\,\middle|\,\Delta R_{12}^{\prime}\gamma=\Delta R_{13}^{\prime}\gamma=0\right]\\
&=0.
\end{aligned}
\]
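The logic of Step 1 can be condensed into a short chain of implications. The sketch below introduces the shorthand $m_{c}$, which is a label used here for illustration only and is not notation from the paper; it relies on the positivity of $f_{R\gamma,2}(0,0)$ and of the joint selection probability established above.

```latex
% m_c is hypothetical shorthand introduced for this summary only.
\begin{align*}
m_{c}(\xi_{1},U_{1}) &:= E[c^{\prime}\Delta W_{12}\nu_{12}\,|\,\Delta R_{12}^{\prime}\gamma=0,\xi_{1},U_{1}], \\
c^{\prime}\Sigma_{W\nu,1}c=0
 &\iff E\big[m_{c}(\xi_{1},U_{1})^{2}\,\big|\,\Delta R_{12}^{\prime}\gamma=0\big]=0 \\
 &\iff m_{c}(\xi_{1},U_{1})=0 \ \text{almost surely} \\
 &\implies c^{\prime}\Sigma_{W\nu,1}=0 .
\end{align*}
```

The last implication holds because $m_{c}$ enters $c^{\prime}\Sigma_{W\nu,1}$ as a multiplicative factor inside the outer expectation.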

Step 2: $nh_{n}c^{\prime}\tilde{\Sigma}_{W\nu,1}=o_{p}(1)$


Recall that

\[
\tilde{\Sigma}_{W\nu,1}=D_{7}+\mathcal{O}_{7}.
\]

For $D_{7}$, from the calculation in the proof of Lemma 3, we have

\[
E[c^{\prime}D_{7}]=E[c^{\prime}S_{12,1}S_{13,1}^{\prime}]=O(h_{n}^{k}),
\]

which shows $nh_{n}E[c^{\prime}D_{7}]=O(nh_{n}^{k+1})=o(1)$ under the hypothesis. For any non-zero vector $a\in\mathbb{R}^{q_{w}}$, redefining $U_{n}$ and $U_{n,1}$ with the kernel $p_{n}(\boldsymbol{\xi}_{i},\boldsymbol{\xi}_{j},\boldsymbol{\xi}_{k})=E[c^{\prime}D_{7,ijk}a|\boldsymbol{\xi}_{i},\boldsymbol{\xi}_{j},\boldsymbol{\xi}_{k}]$, we can repeat the calculation in the proof of Lemma 7 to get

\[
E[(nh_{n}c^{\prime}D_{7}a-nh_{n}U_{n,1})^{2}]=n^{2}h_{n}^{2}\times O\left(\frac{1}{n^{3}h_{n}^{2}}\right)=o(1).
\]

Also, $E[nh_{n}p_{n}(\boldsymbol{\xi}_{i},\boldsymbol{\xi}_{j},\boldsymbol{\xi}_{k})]=E[nh_{n}c^{\prime}D_{7}]=o(1)$ and

\[
E[(nh_{n}p_{n}(\boldsymbol{\xi}_{i},\boldsymbol{\xi}_{j},\boldsymbol{\xi}_{k}))^{2}]=O(n),
\]

so that by Lemma A.3 of Ahn and Powell (1993),

\[
nh_{n}U_{n}=nh_{n}E[U_{n,1}]+o_{p}(1)=o_{p}(1).
\]

This shows that, since $a$ is arbitrary,

\[
nh_{n}c^{\prime}D_{7}=o_{p}(1).
\]

The remainder term $\mathcal{O}_{7}$ should again be of smaller order than $nh_{n}c^{\prime}D_{7}$, since $\beta-\hat{\beta}_{n}=O_{p}(1/\sqrt{n})$ and $\lambda_{ij}=\Delta R_{ij}^{\prime}\gamma\Lambda_{ij}$ is locally $O(h_{n}^{k+1})$ under the smoothing kernel and the smoothness conditions on the density. For example, one of the elements in $\mathcal{O}_{7}$, scaled by $nh_{n}$, is bounded as

\[
\begin{aligned}
&nh_{n}{n\choose 3}^{-1}\sum_{i<j<k}\frac{1}{3}(S_{ij,2}S_{ik,2}^{\prime}+S_{ij,2}S_{jk,2}^{\prime}+S_{ik,2}S_{jk,2}^{\prime})\\
&\leq{n\choose 3}^{-1}\sum_{i<j<k}\frac{4}{3h_{n}^{2}}\Bigg(\|\Delta W_{ij}\|^{2}\|\Delta W_{ik}\|^{2}|K(\Delta R_{ij}^{\prime}\gamma/h_{n})||K(\Delta R_{ik}^{\prime}\gamma/h_{n})|\\
&\quad+\|\Delta W_{ij}\|^{2}\|\Delta W_{jk}\|^{2}|K(\Delta R_{ij}^{\prime}\gamma/h_{n})||K(\Delta R_{jk}^{\prime}\gamma/h_{n})|\\
&\quad+\|\Delta W_{ik}\|^{2}\|\Delta W_{jk}\|^{2}|K(\Delta R_{ik}^{\prime}\gamma/h_{n})||K(\Delta R_{jk}^{\prime}\gamma/h_{n})|\Bigg)nh_{n}\|\beta-\hat{\beta}_{n}\|^{2}\\
&=O_{p}(h_{n})=o_{p}(1),
\end{aligned}
\]

where the last line follows because the summation part is $O_{p}(1)$ by the same calculation as before and $\|\beta-\hat{\beta}_{n}\|^{2}=O_{p}(1/n)$ by Theorem 1. The negligibility of the other elements in $\mathcal{O}_{7}$ can be shown similarly. This completes Step 2.

Step 3: $n^{1-\alpha/2}h_{n}\|\hat{\Sigma}_{W\nu,1}-\tilde{\Sigma}_{W\nu,1}\|=o_{p}(1)$


By the proof of Lemma 8, we have

\[
n^{1-\alpha/2}h_{n}\|\hat{\Sigma}_{W\nu,1}-\tilde{\Sigma}_{W\nu,1}\|\leq O_{p}(1)\Bigg(\frac{n^{1-\alpha/2}\|\hat{\gamma}_{n}-\gamma\|}{h_{n}}+\frac{n^{1-\alpha/2}\|\hat{\gamma}_{n}-\gamma\|^{2}}{h_{n}^{3}}\Bigg).
\]

Observe that

\[
\begin{aligned}
\frac{n^{1-\alpha/2}\|\hat{\gamma}_{n}-\gamma\|}{h_{n}} &=O\Bigg(\frac{\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|}{\sqrt{n^{\alpha}h_{n}^{3}}}\Bigg)=o_{p}(1),\\
\frac{n^{1-\alpha/2}\|\hat{\gamma}_{n}-\gamma\|^{2}}{h_{n}^{3}} &=O\Bigg(\frac{\|\sqrt{Nh_{n}}(\hat{\gamma}_{n}-\gamma)\|^{2}}{\sqrt{n^{2+\alpha}h_{n}^{7}}}\Bigg)=o_{p}(1),
\end{aligned}
\]

by Assumption 10 and

\[
\begin{aligned}
n^{\alpha}h_{n}^{3} &\sim n^{(\alpha(2k+3)-6)/(2k+3)}\to\infty,\\
n^{2+\alpha}h_{n}^{7} &\sim n^{((2k+3)\alpha+4k-8)/(2k+3)}\to\infty,
\end{aligned}
\]

for $\alpha\in[6/(2k+3),1)$. Hence,

\[
n^{1-\alpha/2}h_{n}\|\hat{\Sigma}_{W\nu,1}-\tilde{\Sigma}_{W\nu,1}\|=o_{p}(1).
\]
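The two exponents in Step 3 can be recovered by direct substitution. The worked computation below assumes $h_{n}\propto n^{-2/(2k+3)}$, a rate inferred from the exponents displayed in this proof rather than quoted from the assumptions, so it should be read as an illustrative sketch.

```latex
% Assumption (illustrative): h_n \propto n^{-2/(2k+3)}.
\begin{align*}
n^{\alpha}h_{n}^{3} &\sim n^{\alpha-\frac{6}{2k+3}} = n^{\frac{\alpha(2k+3)-6}{2k+3}}, \\
n^{2+\alpha}h_{n}^{7} &\sim n^{2+\alpha-\frac{14}{2k+3}} = n^{\frac{(2k+3)\alpha+4k-8}{2k+3}}.
\end{align*}
% For k = 2 these exponents reduce to \alpha - 6/7 and \alpha, respectively.
```

Under this assumed rate, the first exponent is strictly positive when $\alpha>6/(2k+3)$, and the second is strictly positive for any $\alpha>0$ once $k\geq 2$.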

Steps 1-3 complete the proof of Lemma 9. ∎

Proof of Lemma 10

Proof.

Write $c_{W}$ as $c$ for short. It suffices to show that $nh_{n}\|c^{\prime}\hat{\Sigma}_{W\nu,1}c-c^{\prime}\tilde{\Sigma}_{W\nu,1}c\|=o_{p}(1)$, since we have already shown in the proof of Lemma 9 that $nh_{n}c^{\prime}\tilde{\Sigma}_{W\nu,1}=o_{p}(1)$, which implies $nh_{n}c^{\prime}\tilde{\Sigma}_{W\nu,1}c=o_{p}(1)$. To save space, in the following argument we treat $\Delta\hat{\epsilon}_{ij}$ as $\nu_{ij}$; the other terms are bounded similarly using the properties of $\beta-\hat{\beta}_{n}$ and $\lambda_{ij}$.

Redefine

\[
\begin{aligned}
S_{ij} &=\frac{2}{h_{n}}d_{ij1}d_{ij2}K\left(\frac{\Delta R_{ij}^{\prime}\gamma}{h_{n}}\right)c^{\prime}\Delta W_{ij}\nu_{ij},\\
\hat{S}_{ij,1} &=\frac{2}{h_{n}^{2}}d_{ij1}d_{ij2}K^{\prime}\left(\frac{\Delta R_{ij}^{\prime}\gamma}{h_{n}}\right)c^{\prime}\Delta W_{ij}\Delta R_{ij}^{\prime}\nu_{ij},\\
\hat{S}_{ij,2} &=\frac{1}{h_{n}^{3}}d_{ij1}d_{ij2}K^{\prime\prime}\left(\frac{c_{ij,n}^{*}}{h_{n}}\right)\Delta R_{ij}c^{\prime}\Delta W_{ij}\Delta R_{ij}^{\prime}\nu_{ij},
\end{aligned}
\]

where $c_{ij,n}^{*}$ lies between $\Delta R_{ij}^{\prime}\gamma$ and $\Delta R_{ij}^{\prime}\hat{\gamma}_{n}$. We have that

\[
\hat{S}_{ij}=S_{ij}+\hat{S}_{ij,1}(\hat{\gamma}_{n}-\gamma)+(\hat{\gamma}_{n}-\gamma)^{\prime}\hat{S}_{ij,2}(\hat{\gamma}_{n}-\gamma).
\]

Note that, for some constant $C>0$, $\|\hat{S}_{ij,1}\|\leq Ch_{n}^{-2}g_{ij,2}$ as before and

\[
\|\hat{S}_{ij,2}\|\leq\frac{C}{h_{n}^{3}}\underbrace{\|\Delta W_{ij}\|\|\Delta R_{ij}\|^{2}|\nu_{ij}|}_{g_{ij,3}}.
\]

Observe that

\[
\begin{aligned}
&c^{\prime}\hat{\Sigma}_{W\nu,1}c-c^{\prime}\tilde{\Sigma}_{W\nu,1}c\\
&\leq\frac{(\hat{\gamma}_{n}-\gamma)^{\prime}}{\sqrt{h_{n}}}\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\underbrace{\frac{\sqrt{h_{n}}}{3}(S_{ij}\hat{S}_{ik,1}^{\prime}+S_{ij}\hat{S}_{jk,1}^{\prime}+S_{ik}\hat{S}_{jk,1}^{\prime})}_{D_{10,1,ijk}}}_{D_{10,1}}\\
&\quad+\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\frac{\sqrt{h_{n}}}{3}(\hat{S}_{ij,1}S_{ik}+\hat{S}_{ij,1}S_{jk}+\hat{S}_{ik,1}S_{jk})}_{D_{10,2}}\frac{(\hat{\gamma}_{n}-\gamma)}{\sqrt{h_{n}}}\\
&\quad+\frac{(\hat{\gamma}_{n}-\gamma)^{\prime}}{h_{n}^{3/2}}\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\frac{h_{n}^{3}}{3}(S_{ij}\hat{S}_{ik,2}^{\prime}+S_{ij}\hat{S}_{jk,2}^{\prime}+S_{ik}\hat{S}_{jk,2}^{\prime})}_{D_{10,3}}\frac{(\hat{\gamma}_{n}-\gamma)}{h_{n}^{3/2}}\\
&\quad+\frac{(\hat{\gamma}_{n}-\gamma)^{\prime}}{h_{n}^{3/2}}\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\frac{h_{n}^{3}}{3}(\hat{S}_{ij,2}S_{ik}+\hat{S}_{ij,2}S_{jk}+\hat{S}_{ik,2}S_{jk})}_{D_{10,4}}\frac{(\hat{\gamma}_{n}-\gamma)}{h_{n}^{3/2}}\\
&\quad+C^{2}\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\frac{1}{3}(g_{ij,2}g_{ik,3}+g_{ij,2}g_{jk,3}+g_{ik,2}g_{jk,3})}_{D_{10,5}}\frac{\|\hat{\gamma}_{n}-\gamma\|^{3}}{h_{n}^{5}}\\
&\quad+C^{2}\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\frac{1}{3}(g_{ij,3}g_{ik,2}+g_{ij,3}g_{jk,2}+g_{jk,3}g_{ik,2})}_{D_{10,6}}\frac{\|\hat{\gamma}_{n}-\gamma\|^{3}}{h_{n}^{5}}\\
&\quad+\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\frac{h_{n}^{2}}{3}(\|\hat{S}_{ij,1}\|\|\hat{S}_{ik,1}\|+\|\hat{S}_{ij,1}\|\|\hat{S}_{jk,1}\|+\|\hat{S}_{jk,1}\|\|\hat{S}_{ik,1}\|)}_{D_{10,7}}\frac{\|\hat{\gamma}_{n}-\gamma\|^{2}}{h_{n}^{2}}\\
&\quad+C^{2}\underbrace{{n\choose 3}^{-1}\sum_{i<j<k}\frac{1}{3}(g_{ij,3}g_{ik,3}+g_{ij,3}g_{jk,3}+g_{jk,3}g_{ik,3})}_{D_{10,8}}\frac{\|\hat{\gamma}_{n}-\gamma\|^{4}}{h_{n}^{6}}.
\end{aligned}
\]

First, we stochastically bound $D_{10,1}$ and $D_{10,2}$. For any vector $a\in\mathbb{R}^{q_{r}}$ and some constant $C>0$,

\[
\begin{aligned}
E[a^{\prime}D_{10,1}] &=\frac{\sqrt{h_{n}}}{h_{n}^{3}}E\left[E[c^{\prime}S_{12}|\xi_{1},U_{1}]E[a^{\prime}\hat{S}_{13}c|\xi_{1},U_{1}]\right]\\
&\leq\frac{1}{h_{n}^{3/2}}\left\{E\left[\int E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=s_{1}h_{n},\xi_{1},U_{1}]^{2}K(s_{1})f_{R\gamma|\xi_{1},U_{1}}(s_{1}h_{n})ds_{1}\right]\right\}^{1/2}\\
&\quad\times CE[\|\Delta W_{12}\|^{2}\|\Delta R_{13}\|^{2}\nu_{13}^{2}]^{1/2}\\
&=O(h_{n}^{(2k-1)/2})=o(1),
\end{aligned}
\]

where the first line holds by conditional independence, the inequality follows from the Cauchy-Schwarz inequality and Assumption 8, and the final line follows from the implication of $c^{\prime}\Sigma_{W\nu,1}c=0$ and Assumptions 4, 6, 7, and 8. Repeating the calculation in the proof of Lemma 7 and adjusting for covariances with one index in common,

\[
Var[a^{\prime}D_{10,1}]=O\left(\frac{1}{n^{2}h_{n}^{3}}\right)+O\left(\frac{h_{n}E[S_{12}a^{\prime}\hat{S}_{13,1}^{\prime}S_{14}a^{\prime}\hat{S}_{15,1}^{\prime}]}{n}\right)=o(1),
\]

where the last equality holds by the stated assumption on $h_{n}$ for $k\geq 1$ and

\[
\begin{aligned}
&E[S_{12}a^{\prime}\hat{S}_{13,1}^{\prime}S_{14}a^{\prime}\hat{S}_{15,1}^{\prime}]\\
&=\frac{16}{h_{n}^{2}}E\bigg[E[d_{121}d_{122}c^{\prime}\Delta W_{12}\nu_{12}|\Delta R_{12}^{\prime}\gamma=h_{n}s_{1},\boldsymbol{\xi}_{1}]\\
&\quad\times E[d_{131}d_{132}c^{\prime}\Delta W_{13}a^{\prime}\Delta R_{13}\nu_{13}|\Delta R_{13}^{\prime}\gamma=h_{n}s_{2},\boldsymbol{\xi}_{1}]\\
&\quad\times E[d_{141}d_{142}c^{\prime}\Delta W_{14}\nu_{14}|\Delta R_{14}^{\prime}\gamma=h_{n}s_{3},\boldsymbol{\xi}_{1}]\\
&\quad\times E[d_{151}d_{152}c^{\prime}\Delta W_{15}a^{\prime}\Delta R_{15}\nu_{15}|\Delta R_{15}^{\prime}\gamma=h_{n}s_{4},\boldsymbol{\xi}_{1}]\\
&\quad\times K(s_{1})K^{\prime}(s_{2})K(s_{3})K^{\prime}(s_{4})\prod_{i=1}^{4}f_{R\gamma|\xi_{1},U_{1}}(s_{i}h_{n})\bigg]\\
&=O\left(\frac{1}{h_{n}^{2}}\right)
\end{aligned}
\]

by Assumptions 4, 6, and 8. Thus, $D_{10,1}=o_{p}(1)$. Similarly, we have $D_{10,2}=o_{p}(1)$.

Next, we stochastically bound $D_{10,3}$ and $D_{10,4}$. For any finite $a,b\in\mathbb{R}^{q_{r}}$ and some $C>0$,

\[
\begin{aligned}
&E[a^{\prime}D_{10,3}b]\\
&\leq\frac{C}{h_{n}}E[|S_{12}|g_{13,3}]\\
&=CE\bigg[\int E[d_{121}d_{122}|c^{\prime}\Delta W_{12}||\nu_{12}|\,|\,\Delta R_{12}^{\prime}\gamma=h_{n}s_{1},\boldsymbol{\xi}_{1}]\\
&\quad\times E[g_{13,3}|\boldsymbol{\xi}_{1}]\\
&\quad\times K(s_{1})f_{R\gamma|\xi_{1},U_{1}}(s_{1}h_{n})f_{R\gamma|\xi_{1},U_{1}}(s_{2})\bigg]\\
&=O(1),
\end{aligned}
\]

where the last equality holds by Assumptions 4, 6, 7, and 8. The variance is calculated as before:

\[
Var[a^{\prime}D_{10,3}b]=O\left(\frac{1}{n^{2}h_{n}}\right)+O\left(\frac{h_{n}}{n}\right)=o(1).
\]

Thus, $D_{10,3}=O_{p}(1)$. Similarly, $D_{10,4}=O_{p}(1)$.

$D_{10,5}$, $D_{10,6}$, and $D_{10,8}$ are all $O_{p}(1)$ by a computation similar to that in Lemma 7.

$D_{10,7}$ is stochastically bounded as follows. Observe that

\[
\begin{aligned}
E[D_{10,7}] &=\frac{4}{h_{n}^{2}}E[\|\hat{S}_{12,1}\|\|\hat{S}_{13,1}\|]\\
&\leq 4\int E[|c^{\prime}\Delta W_{12}||c^{\prime}\Delta W_{13}|\|\Delta R_{12}\|\|\Delta R_{13}\||\nu_{12}||\nu_{13}|\,|\,\Delta R_{12}^{\prime}\gamma=h_{n}s_{1},\Delta R_{13}^{\prime}\gamma=h_{n}s_{2}]\\
&\quad\times K^{\prime}(s_{1})K^{\prime}(s_{2})f_{R\gamma,2}(h_{n}s_{1},h_{n}s_{2})\\
&=O(1),
\end{aligned}
\]

where the last line holds by Assumptions 4, 6, and 8. The variance is calculated as before:

\[
Var[D_{10,7}]=O\left(\frac{1}{n^{2}h_{n}^{2}}\right)+O\left(\frac{1}{n}\right)=o(1).
\]

Thus, $D_{10,7}=O_{p}(1)$.

Finally, the above implies

\[
\begin{aligned}
&nh_{n}|c^{\prime}\hat{\Sigma}_{W\nu,1}c-c^{\prime}\tilde{\Sigma}_{W\nu,1}c|\\
&\leq O_{p}(1)\times\left(\frac{nh_{n}\|\hat{\gamma}_{n}-\gamma\|}{h_{n}}+\frac{nh_{n}\|\hat{\gamma}_{n}-\gamma\|^{2}}{h_{n}^{3}}+\frac{nh_{n}\|\hat{\gamma}_{n}-\gamma\|^{3}}{h_{n}^{5}}+\frac{nh_{n}\|\hat{\gamma}_{n}-\gamma\|^{4}}{h_{n}^{6}}\right)\\
&=o_{p}(1),
\end{aligned}
\]

because

\[
\begin{aligned}
\frac{nh_{n}\|\hat{\gamma}_{n}-\gamma\|}{h_{n}} &=O(\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|)=o_{p}(1),\\
\frac{nh_{n}\|\hat{\gamma}_{n}-\gamma\|^{2}}{h_{n}^{3}} &=O\left(\frac{(\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|)^{2}}{nh_{n}^{3}}\right)=o_{p}(1),\\
\frac{nh_{n}\|\hat{\gamma}_{n}-\gamma\|^{3}}{h_{n}^{5}} &=O\left(\frac{(\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|)^{3}}{n^{2}h_{n}^{11/2}}\right)=o_{p}(1),\\
\frac{nh_{n}\|\hat{\gamma}_{n}-\gamma\|^{4}}{h_{n}^{6}} &=O\left(\frac{(\sqrt{Nh_{n}}\|\hat{\gamma}_{n}-\gamma\|)^{4}}{n^{3}h_{n}^{7}}\right)=o_{p}(1),
\end{aligned}
\]

by Assumption 10, with $nh_{n}^{3}=O(n^{(2k-3)/(2k+3)})$, $n^{2}h_{n}^{11/2}=O(n^{(4k-5)/(2k+3)})$, and $n^{3}h_{n}^{7}=O(n^{(6k-5)/(2k+3)})$ all diverging for $k\geq 2$. This completes the proof of Lemma 10. ∎
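The three rate exponents in the last step can be checked mechanically. The sketch below is illustrative only: it assumes the bandwidth satisfies $h_{n}\propto n^{-2/(2k+3)}$ (a rate inferred from the exponents displayed in the proofs, not quoted from the assumptions), and the function names are hypothetical helpers introduced here.

```python
from fractions import Fraction


def bandwidth_exponent(k: int) -> Fraction:
    """Exponent b with h_n proportional to n**b (assumed rate, see lead-in)."""
    return Fraction(-2, 2 * k + 3)


def rate_exponent(n_power, h_power, k: int) -> Fraction:
    """Exponent of n in the product n**n_power * h_n**h_power."""
    return Fraction(n_power) + Fraction(h_power) * bandwidth_exponent(k)


def lemma10_exponents(k: int) -> dict:
    """Exponents of the three denominators appearing in the final bound."""
    return {
        "n h^3": rate_exponent(1, 3, k),
        "n^2 h^(11/2)": rate_exponent(2, Fraction(11, 2), k),
        "n^3 h^7": rate_exponent(3, 7, k),
    }


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, lemma10_exponents(k))
```

With exact rational arithmetic the exponents equal $(2k-3)/(2k+3)$, $(4k-5)/(2k+3)$, and $(6k-5)/(2k+3)$; the first is negative at $k=1$, which is consistent with the requirement $k\geq 2$.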