
General Covariance-Based Conditions for Central Limit Theorems with Dependent Triangular Arrays

Arun G. Chandrasekhar‡,§ Matthew O. Jackson‡,⋆ Tyler H. McCormick⋄,¶  and  Vydhourie Thiyageswaran
Abstract.

We present a general central limit theorem with simple, easy-to-check covariance-based sufficient conditions for triangular arrays of random vectors when all variables could be interdependent. The result is constructed from Stein’s method, but the conditions are distinct from those of related work. Existing approaches require checking bespoke conditions that in many contexts are difficult to verify scientifically and either (i) impose rigid structure, often difficult to interpret and lacking microfoundations (e.g., strong mixing in random fields), or (ii) allow more flexibility but limit the extent of correlations (e.g., sparsity restrictions in dependency graphs). Our approach, in contrast, permits researchers to work with high-level but intuitive conditions based on overall correlation. We show that these covariance conditions nest standard assumptions studied in the literature, such as M-dependence, mixing random fields, non-mixing autoregressive processes, and dependency graphs, which themselves need not imply each other. We apply our result to practical settings previously not covered in the literature, such as treatment effects with spillovers in more general settings than previously admitted, covariance matrices, processes with global dependencies such as epidemic spread and information diffusion, and spatial processes with Matérn dependencies.
Keywords: Central limit theorem, dependent data, Stein’s method

‡Stanford, Department of Economics
§J-PAL
⋆Santa Fe Institute
⋄University of Washington, Department of Statistics
¶University of Washington, Department of Sociology

1. Introduction

In many contexts researchers use an interdependent set of random vectors to develop estimators and need to establish whether the estimators are asymptotically normal. In existing results, dependency is modeled in idiosyncratic ways, with perhaps unintuitive, if not unappealing, conditions describing assumptions on correlation. Further, in many settings—such as treatment effects with spillovers, epidemic spread, information diffusion, general equilibrium effects, and so on—the correlation is non-zero for any finite set of observations across all random vectors, which is not allowed for in the literature.

We present simple covariance-based sufficient conditions for a central limit theorem to be applied to a triangular array of dependent random vectors. We use Stein’s method (Stein, 1986) to derive three high-level, easy-to-interpret conditions. Stein’s method is widely used in theoretical arguments (in fact, a special case of the argument here first appeared in Chandrasekhar and Jackson (2024)), and our goal is not to advance a new technique for proving limit theorems. Instead, we present a new approach to proving asymptotic normality that is extremely general, nesting many existing approaches. More importantly, it consists of easily interpretable conditions that are also easy to check. Researchers can check our conditions by thinking directly about correlations, using calculations that we demonstrate are straightforward in several consequential examples. This feature makes the result accessible to a wider range of researchers without having to derive bespoke limit theorems in every setting. Our approach also eases restrictions required by existing methods, again doing so in a way that is not idiosyncratically imposed by a specific setting.

We follow a literature that uses Stein’s method to prove asymptotic normality. Our goal is not to derive a new proof, but rather to provide much more general conditions that allow for some amount of dependence between all observations, but at the same time are minimal in terms of parametric/shape restrictions, making them flexible and easy to check.

To review, Stein’s method observes that

\mathrm{E}\left[Yf(Y)\right]=\mathrm{E}\left[f^{\prime}(Y)\right]

for all continuously differentiable functions f if and only if Y has a standard normal distribution. So, when a normalized sum takes the role of Y, it is enough to show that this equality holds asymptotically for all such functions. Rinott and Rotar (2000) and Ross (2011), among others, provide a detailed view of the method. Proving normality using Stein’s method typically amounts to checking dependence conditions. Bolthausen (1982), for example, assumes that the amount by which the probability of a joint set of events differs from what it would be under independence decays in temporal distance. So long as the mixing coefficients decay fast enough in distance, the proof proceeds by checking that the Stein argument goes through under the mixing structure. Distance in time can be generalized to space and, further, to random fields. Existing literature is organized around a number of such conditions that apply in particular data settings.
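As a quick numerical sanity check of the identity above (a Monte Carlo sketch of our own, not part of the formal argument), one can verify Stein’s characterization for a standard normal draw with the test function f(y) = y³, so that f′(y) = 3y²:

```python
import numpy as np

# Monte Carlo check of Stein's identity E[Y f(Y)] = E[f'(Y)] for Y ~ N(0,1),
# using the test function f(y) = y^3 (so f'(y) = 3 y^2).
rng = np.random.default_rng(0)
y = rng.standard_normal(1_000_000)

lhs = np.mean(y * y**3)   # estimates E[Y^4], which equals 3 for a standard normal
rhs = np.mean(3 * y**2)   # estimates 3 E[Y^2] = 3

# lhs and rhs agree up to Monte Carlo error.
```

For a non-normal Y (say, exponential) the two sides separate, which is what the method exploits when quantifying distance to normality.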

A peculiar consequence of this organization, however, is that these conditions generally lack a scientific basis for the assumed dependence structure. For example, spatial standard errors are often used—e.g., in Froot (1989); Conley (1999); Driscoll and Kraay (1998)—when conducting Z-estimation (or GMM). However, in actual applications—for instance, agricultural shocks such as rainfall, pests, or soil quality—it is not clear that the shocks follow a specific form of interdependence satisfying ϕ-mixing with a certain decay rate, as invoked in Conley (1999) (cross-sectionally) or Driscoll and Kraay (1998) (temporally and cross-sectionally). Surely shocks correlate over space, but it is hard to argue empirically that their correlation matches the required mixing conditions.

To take a different example, some models of social network formation orient themselves by embedding individuals in a random field to deliver central limit theorems. Distance in the metric space is, then, inversely proportional to the likelihood of a connection. Given the structure of the field, researchers can toggle dependence as a function of distance, reducing the size of the covariance sum. However, the structure of the field has implications for consequential properties of the graph, such as clustering patterns (Hoff et al., 2002; Lubold et al., 2023), that then often fail to match the empirical observations. As in the previous example, the researcher probably has some intuition as to whether graph properties (e.g., clustering) are likely or not, but assessing the reasonableness of specific mixing random field assumptions is all but impossible.

As an alternative to embedding variables in a metric space and toggling dependency by distance, a literature on dependency graphs emerged (Baldi and Rinott, 1989; Goldstein and Rinott, 1996; Ross, 2011) in part to be more flexible on the correlation structure. There, the cost is extreme sparsity: observations have indices in a graph, where those that are not edge-adjacent are independent. This provides a different strategy to apply Stein’s method, by creating for each observation a dependency neighborhood. Sufficient sparsity in the graph structure allows a central limit theorem to apply, despite not forcing a time- or space-like structure. Examples include Ross (2011); Goldstein and Rinott (1996) and a more general treatment in Chen and Shao (2004).

Both embedding indices in a metric space and using a more unstructured dependency graph are similar in the sense that they constrain the total amount of correlation between the n random variables. In principle there are on the order of n² components to this sum, but, via mixing conditions or sparsity conditions, the sum is assumed to be of order n. However, in many settings—such as treatment effects with spillovers, epidemic spread, information diffusion, or general equilibrium effects—we see dependence that neither forms a random field nor exhibits any conditional independencies. The correlation is non-zero for any finite set of observations across all random vectors of interest. Our approach allows for this general dependence.

In the remainder of this section, we provide a brief overview of our approach, which we formalize in the next section. We consider a triangular array of n random vectors X_{1:n}^n ∈ ℝ^p, which are neither necessarily independent nor identically distributed. We study conditions under which their appropriately normalized sample mean is asymptotically normally distributed. In principle, at any n, all X_i^n and X_j^n can be correlated. The proof follows the well-known Stein’s method, though we develop and apply specific bounds for our purpose. To apply Stein’s method, we first associate each random variable with a set of other random variables with which it could have a higher level of correlation, keeping in mind that it could in principle have some non-zero correlation with all variables.

We call these affinity sets, denoted 𝒜_{(i,d)}^n, to capture the other random variables with which i may have high correlations in the d-th dimension. We use the term affinity set rather than “dependency neighborhood” to emphasize the possibility of high and low non-zero covariance structures with arbitrary groupings that satisfy summation conditions. We provide sufficient conditions for asymptotic normality in terms of the total amount of covariance within affinity sets and the total amount of covariance across affinity sets. As long as, in the limit, the bulk of the interdependence in the overall mean comes from the covariances within affinity sets, asymptotic normality follows.

This yields substantially weaker conditions than in the previous literature. In Arratia et al. (1989), the authors present Chen’s method (Chen, 1975) for Poisson approximation rather than normality, which takes an approach similar to ours in collecting random variables into dependency groupings. While this results in nice finite-sample bounds, the bounds consist of three almost separate pieces, making them less amenable to analysis alongside growing sums of covariances across the sample. In fact, all five of the examples studied in Arratia et al. (1989) are limited to cases where at least one of these pieces is identically zero. Many empirically relevant examples do not have such zeros, and so our approach substantially expands the relevant applications.

To preview our conditions, presented formally below, let us take p = 1 so X_i^n is scalar, and let Z_i^n be centered: Z_i^n := X_i^n − E(X_i^n). Let

\Omega_{n}:=\sum_{i=1}^{n}\sum_{j\in\mathcal{A}^{n}_{i}}\mathrm{cov}\left(Z_{i},Z_{j}\right),

denote the total covariance within the affinity sets. Let also 𝐙_{−i} := Σ_{j∉𝒜_i^n} Z_j, the sum of the random variables outside of i’s affinity set. Informally, our conditions are the following.

  1. Within affinity set covariance control:

     \sum_{i}\sum_{j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}(Z_{j}Z_{k}\cdot|Z_{i}|)\text{ is small relative to }(\Omega_{n})^{3/2}.

     The total covariance between pairs of variables within an affinity set, weighted by the realized magnitude of the reference variable, is small relative to the comparably scaled total covariance between the reference variables and their affinity sets.

  2. Cross affinity set covariance control:

     \sum_{i,j}\sum_{k\in\mathcal{A}_{i}^{n},l\in\mathcal{A}_{j}^{n}}\mathrm{cov}(Z_{i}Z_{k},Z_{j}Z_{l})\text{ is small relative to }(\Omega_{n})^{2}.

     The average covariance across members of two different affinity sets (weighted by the reference variables) is sufficiently small compared to the squared covariance within affinity sets.

  3. Outside affinity set covariance control:

     \sum_{i}\mathrm{E}(|\mathbf{Z}_{-i}\mathrm{E}(Z_{i}\mid\mathbf{Z}_{-i})|)\text{ is small relative to }\Omega_{n}.

     The average absolute conditional covariance outside of affinity sets is small compared to the covariance within affinity sets.

These conditions are general and easy to check, as we show. They also further simplify in a number of cases such as with positive correlations (as in diffusion models and auto-regressive processes) and with binary variables.
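To make the conditions above concrete, the following simulation sketch (our own construction, with hypothetical parameters) takes a Gaussian MA(1) array Z_i = ε_i + θε_{i−1} with M-ball affinity sets 𝒜_i = {j : |j−i| ≤ 1}. Condition (3) holds exactly here, since Z_i is independent of everything outside its affinity set, and the ratio in condition (1) shrinks roughly like n^{−1/2}:

```python
import numpy as np

def ma1(rng, reps, n, theta=0.5):
    """Draw `reps` independent MA(1) paths Z_i = eps_i + theta * eps_{i-1}."""
    eps = rng.standard_normal((reps, n + 1))
    return eps[:, 1:] + theta * eps[:, :-1]

def condition1_ratio(Z):
    """Monte Carlo estimate of sum_i sum_{j,k in A_i} E(Z_j Z_k |Z_i|)
    divided by Omega_n^{3/2}, with A_i = {i-1, i, i+1} (boundary i trimmed)."""
    reps, n = Z.shape
    core = slice(1, n - 1)                       # keep i away from the edges
    shift = {a: np.roll(Z, -a, axis=1) for a in (-1, 0, 1)}
    omega = sum(np.mean(np.sum(Z[:, core] * shift[a][:, core], axis=1))
                for a in (-1, 0, 1))             # within-affinity-set covariance
    s1 = sum(np.mean(np.sum(shift[a][:, core] * shift[b][:, core]
                            * np.abs(Z[:, core]), axis=1))
             for a in (-1, 0, 1) for b in (-1, 0, 1))
    return s1 / omega**1.5

rng = np.random.default_rng(0)
r_small = condition1_ratio(ma1(rng, 4000, 100))
r_large = condition1_ratio(ma1(rng, 4000, 400))
# r_large is roughly half of r_small, consistent with an n^(-1/2) decay.
```

The same template—replace the shift offsets and affinity-set radius—can be reused to probe condition (2), whose ratio decays even faster (like n^{−1}) in this example.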

The advantages of this approach, which we see as most salient for empirical scientists, are several-fold. First, we do not require a sparse dependency structure at any n. That is, there can be non-zero correlation between any pair X_i, X_j. Much of the dependency graph literature leverages an independence structure in constructing its bounds, and, therefore, the bounds we build are different.

Second, because of this possibility of non-zero covariance across all random vectors, we organize our bounds through covariance conditions. We are reminded of a discussion in Chen (1975) in the context of Poisson approximation. Covariance conditions are easy to interpret and check, and from an applied perspective they are often easier to justify from a microfoundation.

Third, our result is for random vectors, and while the application of the Cramér-Wold device is simple in our setting—by the nature of how indexing works—it is useful to have and instructive for a practitioner.

Fourth, our setup nests many of the previous literature’s examples, most of which do not nest each other. We illustrate the utility of our central limit theorem through several distinct applications. We begin with an example of M-dependence in a stochastic process, noting that this implies many other types of mixing, such as α-, ϕ-, or ρ-mixing (Bradley, 2005). We then move to random fields, where we show an example with α- and ϕ-dependence. These mixing approaches require constructing idiosyncratic notions of dependence based on the underlying probability distribution that happen to imply bounds on the covariance function (see, for example, Rio (1993) or Rio (2017)), which, as noted above, is not derived from, and may not match, micro-economic foundations or scientific principles. Our covariance-based arguments are compact and direct, placing restrictions on the covariance explicitly and, thus, in a manner that is salient in the scientific context. They are also general rather than based on a specific model or type of dependence. We also show that our framework is applicable outside the context of mixing, giving examples with non-mixing autoregressive processes and dependency graphs, among others.

Fifth, we show that our generalizations permit a wider and more practical set of analyses than were otherwise ruled out or limited in the literature. This includes treatment effects with spillovers, covariance matrices, and epidemic and diffusion models. Specifically, we extend the treatment-effects-with-spillovers analysis, as in Aronow and Samii (2017), to allow every individual’s exposure to treatment to possibly be increasing in every other node’s treatment assignment; nonetheless, the relevant estimator is still asymptotically normally distributed. This case is ubiquitous in practice: e.g., in diffusion, epidemic, and financial flow models, in principle a shock anywhere could theoretically impact any other node’s outcomes (albeit with potentially small odds). But this is assumed away in applied work because conventional central limit theorems do not cover such a case. We also show how a researcher can model covariance matrices without forcing a random field structure as in Conley (1999) or Driscoll and Kraay (1998). This allows applied researchers to proceed with greater generality, and permits structure across units that do not have a natural ordering, such as race, ethnicity, caste, and occupation.

The next two examples concern diffusion. First, we look at a sub-critical epidemic process with a number of periods longer than the graph’s diameter. So, whether an individual is infected is correlated with the infection status of any other individual (assuming a connected, unweighted graph). Again, this practical situation is excluded by the previous central limit theorems in the literature. Second, we look at diffusion in stochastic block models to show that our conditions characterize when asymptotic normality holds and when it does not.

Lastly, we turn to the setting of Zhan and Datta (2023): the estimation of neural network models with irregular spatial dependence, e.g., Matérn covariance functions. The authors provide the first proof of consistent estimation of the neural network model in this dependent setting. We show that the covariance structure of the residuals, on which the asymptotic distribution of the estimator depends, satisfies our main assumptions and our CLT.

The remainder of the paper is organized as follows. Section 2 proves our main result. Section 3 shows that the conditions we provide nest several commonly used characterizations of dependence: M-dependence, non-mixing autoregressive processes, random fields, and dependency graphs. Section 4 discusses the process for checking our conditions in several applications (peer effects models, socio-demographic distances, sub-critical diffusion models, stochastic block models, and irregularly observed spatial processes), while also yielding new results. Section 5 provides a discussion.

2. The Theorem

We consider a triangular array of n random variables X_{1:n}^n ∈ ℝ^p, with entries X_{i,d}^n for d ∈ {1,…,p}, each of which has finite variance (possibly varying with n). We let Z_i^n ∈ ℝ^p denote the corresponding de-meaned variables, Z_i^n = X_i^n − E[X_i^n]. The sum, S^n ∈ ℝ^p, is given by S^n := Σ_{i=1}^n Z_i^n. We suppress the dependency on n for clarity, writing X_{i,d} unless otherwise needed.

2.1. Affinity Sets

Each real-valued random variable X_{i,d} has an affinity set, denoted 𝒜_{(i,d)}^n, which can depend on n. We require (i,d) ∈ 𝒜_{(i,d)}^n. Heuristically, 𝒜_{(i,d)}^n includes the indices (j,d′) for which the covariance between X_{j,d′} and X_{i,d} is relatively high in magnitude, but not those for which the covariance is low.

There is no independence requirement at any nn and, in fact, our sufficient conditions for the central limit theorem bound the total sums of covariances within and across affinity sets. The precise construction of affinity sets is flexible, as long as these bounds on the respective total sums are respected.

2.2. The Central Limit Theorem

Let Ω_n be a p × p matrix which houses the bulk of the covariance across observations and dimensions, summing, for each variable, its covariances with the variables in its affinity set:

\Omega_{n,dd^{\prime}}:=\sum_{i=1}^{n}\sum_{(j,d^{\prime})\in\mathcal{A}^{n}_{(i,d)}}\mathrm{cov}\left(Z_{i,d},Z_{j,d^{\prime}}\right).

This is distinct from the total variance-covariance matrix Σ_{n,dd′} := Σ_{i=1}^n Σ_{j=1}^n cov(Z_{i,d}, Z_{j,d′}), which includes terms outside of 𝒜_{(i,d)}^n.
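As a numerical illustration of the distinction (a toy example of our own, with arbitrary parameters), suppose p = 1, cov(Z_i, Z_j) = ρ^{|i−j|}, and the affinity sets are D-balls. Then Ω_n captures nearly all of the total Σ_n:

```python
import numpy as np

# Toy covariance cov(Z_i, Z_j) = rho**|i-j| with D-ball affinity sets
# A_i = {j : |j - i| <= D}.  All numbers below are purely illustrative.
n, rho, D = 500, 0.6, 10
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])
Sigma = rho ** dist                # full variance-covariance matrix entries
ball = dist <= D                   # membership mask for the affinity sets
Omega = Sigma[ball].sum()          # total covariance inside affinity sets
total = Sigma.sum()                # Sigma_n: covariance over all pairs
# Omega / total is close to 1: the D-balls house the bulk of Sigma_n.
```

Shrinking D or raising ρ moves covariance mass outside the affinity sets, which is exactly the mass that the assumptions below require to be asymptotically negligible.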

In what follows, we maintain the assumption that ‖Ω_n‖_F → ∞, where ‖·‖_F is the Frobenius norm. We also presume that for all (i,d), E[|Z_{i,d}|³]/E[|Z_{i,d}|²]^{3/2} is bounded above. Define 𝐙_{−i,d} := Σ_{(j,d′)∉𝒜_{(i,d)}^n} Z_{j,d′}, the sum of the random variables outside of (i,d)’s affinity set.

Our first assumption is that the total mass of the variance-covariance is not driven by the covariance between members of a given affinity set, neither of which is the reference random variable itself. That is, given reference variable X_{i,d}, the covariance of some X_{j,d′} and X_{k,d″}, where both are in the reference variable’s affinity set, is relatively small in total across all such triples of variables compared to the variance coming from the reference variables and their affinity sets.

Assumption 1 (Bound on total weighted-covariance within affinity sets).
\sum_{(i,d);(j,d^{\prime}),(k,d^{\prime\prime})\in\mathcal{A}^{n}_{(i,d)}}\mathrm{E}\left[|Z_{i,d}|Z_{j,d^{\prime}}Z_{k,d^{\prime\prime}}\right]=o\left(\left(\left\|\Omega_{n}\right\|_{F}\right)^{3/2}\right).

The second assumption is that the total mass of the variance-covariance is not driven by random variables across the affinity sets of two distinct reference variables. That is, given two random variables X_{i,d} and X_{j,d′}, the aggregate amount of weighted covariance between two other random variables—each within one of the reference variables’ affinity sets—is small compared to the (squared) variance coming from the reference variables and their affinity sets.

Assumption 2 (Bound on total weighted-covariance across affinity sets).
\sum_{(i,d),(j,d^{\prime});(k,d^{\prime\prime})\in\mathcal{A}^{n}_{(i,d)},(l,\hat{d})\in\mathcal{A}^{n}_{(j,d^{\prime})}}\mathrm{cov}\left(Z_{i,d}Z_{k,d^{\prime\prime}},Z_{j,d^{\prime}}Z_{l,\hat{d}}\right)=o\left(\left(\left\|\Omega_{n}\right\|_{F}\right)^{2}\right).

The third assumption is that the total mass of variance-covariance is not driven by reference random variables and the variables outside of their affinity sets, again compared to the variance coming from the reference variable and its affinity sets.

Assumption 3 (Bound on total weighted-covariance from outside of affinity sets).
\sum_{(i,d)}\mathrm{E}\left(|\mathrm{E}(Z_{i,d}\mathbf{Z}_{-i,d}|\mathbf{Z}_{-i,d})|\right)=\sum_{(i,d)}\mathrm{E}\left(|\mathbf{Z}_{-i,d}\mathrm{E}(Z_{i,d}|\mathbf{Z}_{-i,d})|\right)=o\left(\left\|\Omega_{n}\right\|_{F}\right).

These three assumptions imply a central limit theorem.

Theorem 1.

If Assumptions 1-3 are satisfied, then Ω_n^{−1/2} S^n ⇝ 𝒩(0, I_{p×p}).

The proof is provided in the Appendix. The argument follows by applying the Cramér-Wold device to the arguments following Stein’s method, as Chandrasekhar and Jackson (2024) argued for the univariate case. Since the Cramér-Wold device requires, for all c ∈ ℝ^p fixed in n, that the c-weighted sum satisfies a central limit theorem (Biscio et al., 2018)—that is, (c′Ω_n c)^{−1/2} c′S^n ⇝ 𝒩(0,1)—we can consider a problem of np random variables with affinity sets. Then, by checking Assumptions 1-3 for the case of c = 1_p, the result follows.

An important special case is where the affinity sets are the variables themselves: 𝒜_{(i,d)}^n = {(i,d)}. In that case, the conditions simplify to a total bound on the overall sum of covariances across variables (the univariate case is in Chandrasekhar and Jackson (2024)). It nests many cases in practice, and we provide an illustration in our second application.

Corollary 1.

If 𝒜_{(i,d)}^n = {(i,d)}, E[Z_{i,d} 𝐙_{−i,d} | 𝐙_{−i,d}] ≥ 0 for every (i,d), and

  • (i) \sum_{(i,d),(j,d^{\prime})}\mathrm{cov}(Z_{i,d}^{2},Z_{j,d^{\prime}}^{2})=o\left(\left(\left\|\Omega_{n}\right\|_{F}\right)^{2}\right), and

  • (ii) \sum_{(i,d)\neq(j,d^{\prime})}\mathrm{cov}(Z_{i,d},Z_{j,d^{\prime}})=o\left(\left\|\Omega_{n}\right\|_{F}\right),

then Ω_n^{−1/2} S^n ⇝ 𝒩(0, I_{p×p}).

If E[Z_{i,d} 𝐙_{−i,d} | 𝐙_{−i,d}] ≥ 0 does not hold, then (ii) can simply be replaced by Assumption 3. Also, it is useful to note that, for instance, if p = 1 and the X_i’s are Bernoulli random variables with E[X_i] → 0 (uniformly), then condition (ii) implies condition (i) (Chandrasekhar and Jackson, 2024).
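For intuition on the Bernoulli remark, here is a simulation sketch (with purely illustrative values of n and a small success probability, and independence so the corollary’s conditions hold trivially with singleton affinity sets): the standardized sum of rare Bernoullis is already close to standard normal.

```python
import numpy as np

# Independent Bernoulli(p_n) variables with small p_n; singleton affinity
# sets give Omega_n = n * p_n * (1 - p_n), and the standardized sum should
# be approximately N(0, 1).  Parameter values are purely illustrative.
rng = np.random.default_rng(0)
n, p_n, reps = 10_000, 0.1, 200_000
omega_n = n * p_n * (1 - p_n)
sums = rng.binomial(n, p_n, size=reps)            # each draw is sum_i X_i
standardized = (sums - n * p_n) / np.sqrt(omega_n)

mean, var = standardized.mean(), standardized.var()
skew = np.mean(standardized**3)                   # near 0 if roughly normal
```

Dependence can be layered on top of this (e.g., correlated Bernoullis within blocks), in which case conditions (i) and (ii) become substantive restrictions rather than trivial ones.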

3. Models of Dependence

We first present four applications from the literature in which asymptotic normality has been proven: (i) M-dependence, (ii) non-mixing autoregressive processes, (iii) mixing random fields, and (iv) dependency graphs. These examples do not necessarily nest each other, though we comment on relations between the dependence types in terms of mixing, where relevant. We can construct affinity sets that meet our conditions in each case. A key distinction in our work is that the conditions we provide are general, rather than specific to a particular model class or dependency type. We provide a sketch of the core assumptions made in the relevant papers to be self-contained for the reader. We show how these assumptions imply our covariance restrictions, and the relative complexity of these setups.

We then present five common applications that are not covered by the previous literature but are covered by our model: (i) peer effects, (ii) covariance estimation with socio-demographic characteristics, (iii) subcritical diffusion processes, (iv) diffusion in stochastic block models, and (v) spatial dependence via a Matérn covariance matrix.

In these examples, we maintain consistent use of notation defined in the previous sections. The remaining notation, however, is kept consistent only within each subsection.

3.1. M-dependence

3.1.1. Environment

We consider Theorem 2.1 of Romano and Wolf (2000). In this application there are real-valued time series data, so p = 1 (we drop the index d) and Ω_n is a scalar. Under Romano and Wolf’s setup, Z_{n,i} and Z_{n,j} are independent if |i−j| > M, where {Z_{n,i}} are mean-zero random variables. For the reader’s convenience, we include the assumptions made in their paper. Suppose Z_{n,1}, Z_{n,2}, …, Z_{n,r} is an M-dependent sequence of random variables and, for some δ > 0 and −1 ≤ γ < 1:

  1. \mathrm{E}|Z_{n,i}|^{2+\delta}\leq\Delta_{n} for all i;

  2. \mathrm{var}\left(\sum_{i=a}^{a+k-1}Z_{n,i}\right)k^{-1-\gamma}\leq K_{n} for all a and k\geq M;

  3. \mathrm{var}\left(\sum_{i=1}^{r}Z_{n,i}\right)r^{-1}M^{-\gamma}\geq L_{n};

  4. K_{n}=O\left(L_{n}\right);

  5. \Delta_{n}=O\left(L_{n}^{1+\delta/2}\right);

  6. M^{1+(1-\gamma)(1+2/\delta)}=o(r).

3.1.2. Application of Theorem 1

We consider the M-ball affinity sets, 𝒜_i^n = {j : |j−i| ≤ M}, and drop the subscript n in Z_{n,i} for convenience. In this case, cov(Z_i, Z_j) = 0 by independence for all j with |i−j| > M, so Assumption 3 is satisfied. Under bounded third and fourth moments, we check the remaining assumptions. Assumption 1 is easily verified:

\sum_{i;j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}[|Z_{i}|Z_{j}Z_{k}]=O\left(M^{2}\sum_{i}\mathrm{E}[|Z_{i}|^{3}]\right)=o\left(n^{3/2}M^{3/2}\right)=o\left(\Omega_{n}^{3/2}\right),

following their condition (6). Our Assumption 2 is satisfied similarly, again using their condition (6):

\sum_{i,j;k\in\mathcal{A}_{i}^{n},l\in\mathcal{A}_{j}^{n}}\mathrm{cov}(Z_{i}Z_{k},Z_{j}Z_{l})=O\left(\sum_{i,j:|i-j|\leq M,\,k:|k-i|\leq M,\,l:|l-i|\leq 2M}\mathrm{var}(Z_{i}^{2})\right)=O\left(n\cdot M^{3}\cdot\mathrm{var}(Z_{i}^{2})\right)=o(n^{2}M^{2})=o(\Omega_{n}^{2}).

This is because if the indices are not within those distances, then Z_k and Z_l cannot induce any correlation either.

Following the hierarchy established in Bradley (2007), M-dependence implies several commonly used forms of mixing, such as ρ-mixing (Bradley, 2005). It also implies ϕ-mixing and α-mixing in the time series context. We also give an example using α- and ϕ-mixing in the context of random fields below. Characterizing mixing in the context of random fields requires imposing restrictions on the dependence between σ-algebras as the number of points in those sets increases; see Jenish and Prucha (2009) or Bradley (2005).

Note that our conditions require bounded fourth moments, which is not necessarily invoked in every analysis of M-dependent processes in the literature; some analyses have slightly weaker moment requirements. Nonetheless, our results are not intended to provide the tightest bounds, but rather general conditions, spanning various types of dependence, that are easily checkable in most applied settings.
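As a numerical companion to this example (a sketch with arbitrary parameters, using uniform innovations so the normality is genuinely asymptotic rather than exact), an order-M moving average gives an M-dependent array whose normalized sum is close to standard normal:

```python
import numpy as np

# M-dependent array: an order-M moving average of iid uniform noise, so
# Z_i and Z_j are independent whenever |i - j| > M.  With M-ball affinity
# sets, Omega_n ~ 3n here (variance 1, lag-1 cov 2/3, lag-2 cov 1/3).
rng = np.random.default_rng(1)
reps, n, M = 20_000, 500, 2
eps = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(reps, n + M))  # variance 1
Z = (eps[:, 2:] + eps[:, 1:-1] + eps[:, :-2]) / np.sqrt(M + 1)

S = Z.sum(axis=1) / np.sqrt(3 * n)   # standardize by Omega_n^{1/2}
mean, var = S.mean(), S.var()
skew = np.mean(S**3)                 # near 0 if the CLT has kicked in
```

The choice M = 2 and the uniform innovation law are illustrative; any fixed M and innovation distribution with enough moments behaves the same way as n grows.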

3.2. Andrews’ Non-Mixing Autoregressive Processes

3.2.1. Environment

This application is from Andrews (1984), which allows interdependence in a time series that does not satisfy strong (α-) mixing, in order to clarify the distinction between dependence and mixing. Again, p = 1. We take X_t = Σ_{l=0}^∞ ρ^l ε_{t−l}, where ρ ∈ (0, 1/2] and the ε_t’s are drawn from a Bernoulli distribution with success probability q. We define Z_t as the centered X_t. Assume, without loss of generality, that s > t, so

\mathrm{cov}(Z_{t},Z_{s})=\mathrm{cov}\left(\sum_{l=0}^{\infty}\rho^{l}\epsilon_{t-l},\sum_{k=0}^{\infty}\rho^{k}\epsilon_{s-k}\right)=\rho^{M}\sum_{l=0}^{\infty}\rho^{2l}\mathrm{var}(\epsilon_{t-l})=\frac{\rho^{M}}{1-\rho^{2}}\cdot q(1-q),

where M = s − t. For a constant C depending only on ρ, we show asymptotic normality of the sum of the Z_t: \frac{1}{\sqrt{Cnq(1-q)}}\sum_{t}Z_{t}\rightsquigarrow\mathcal{N}(0,1).
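The displayed covariance can be checked directly against a truncated series (a sanity-check sketch under the stated Bernoulli-innovation setup; the parameter values and truncation level are arbitrary, and C = 1/(1−ρ)² is one candidate closed form consistent with summing the covariances over all lags):

```python
# Check cov(Z_t, Z_{t+M}) = rho^M q(1-q) / (1 - rho^2) by truncating the
# infinite series, and a candidate C = 1/(1-rho)^2 from summing the
# covariances over all lags.  Values below are illustrative only.
rho, q, M = 0.4, 0.3, 3      # rho in (0, 1/2], as in the setup
L = 200                      # truncation level; the tail is O(rho^(2L))

series = sum(rho**l * rho**(l + M) for l in range(L)) * q * (1 - q)
closed = rho**M * q * (1 - q) / (1 - rho**2)

# Summing cov(Z_t, Z_s) / (q(1-q)) over lags s - t in {-L, ..., L}:
C_series = sum(rho**abs(m) for m in range(-L, L + 1)) / (1 - rho**2)
C_closed = 1 / (1 - rho)**2
```

The geometric tail means the truncation error is astronomically small here, which is also why the truncation argument in the next subsection works.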

3.2.2. Application of Theorem 1

We begin by verifying our conditions for truncated versions of our random variables and then show that the full result applies. To define what we mean by “truncation,” for each i ∈ [n], let

X_{i}^{(D)}=\sum_{j=0}^{D}\rho^{j}\epsilon_{i-j}

for some D. Let also S_n^{(D)} := Σ_{i=1}^n Z_i^{(D)} ≡ Σ_{i=1}^n (X_i^{(D)} − E[X_i^{(D)}]). We then define the affinity sets to be 𝒜_i = {k : |i−k| ≤ D}.

We begin by verifying Assumption 1:

\sum_{i;j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}[|Z_{i}^{(D)}|Z_{j}^{(D)}Z_{k}^{(D)}]=O\left(D^{2}\sum_{i}\mathrm{E}[|Z_{i}|^{3}]\right)=o\left(n^{3/2}D^{3/2}\right)=o\left(\Omega_{n}^{3/2}\right).

Next, we verify Assumption 2:

i,j;k𝒜in,l𝒜jncov(Zi(D)Zk(D),Zj(D)Zl(D))\displaystyle\sum_{i,j;k\in\mathcal{A}_{i}^{n},l\in\mathcal{A}_{j}^{n}}\mathrm{cov}(Z_{i}^{(D)}Z_{k}^{(D)},Z_{j}^{(D)}Z_{l}^{(D)}) =\displaystyle= O(i,j:|ij|D,k:|ki|D,l:|li|2Dvar(Zi(D)2))\displaystyle O(\sum_{\mathclap{\begin{subarray}{c}i,j:|i-j|\leq D,\\ k:|k-i|\leq D,\\ l:|l-i|\leq 2D\end{subarray}}}\mathrm{var}({Z_{i}^{(D)}}^{2}))
=O\left(n\cdot D^{3}\cdot\mathrm{var}\left({Z_{i}^{(D)}}^{2}\right)\right)=o(n^{2}D^{2})=o(\Omega_{n}^{2}).

Verifying Assumption 3 is trivial: since Z_{i}^{(D)} is independent of \mathbf{Z}_{-i}^{(D)} (the variables outside its affinity set), \sum_{i}\mathbb{E}[|\mathbf{Z}_{-i}^{(D)}\mathbb{E}[Z^{(D)}_{i}|\mathbf{Z}_{-i}^{(D)}]|]=\sum_{i}\mathbb{E}[|\mathbf{Z}_{-i}^{(D)}\mathbb{E}[Z^{(D)}_{i}]|]=0. Therefore, the truncated variables satisfy \left(\Omega_{n}^{(D)}\right)^{-1/2}S_{n}^{(D)}\rightsquigarrow\mathcal{N}(0,1), and we would like to show the result in full generality, \Omega_{n}^{-1/2}S_{n}\rightsquigarrow\mathcal{N}(0,1). To do so, we begin by writing

(3.1) SnΩn1/2=Sn(D)Ωn(D)1/2Ωn(D)1/2Ωn1/2+Sn(D¯)Ωn1/2\displaystyle\frac{S_{n}}{\Omega_{n}^{1/2}}=\frac{S_{n}^{(D)}}{{\Omega_{n}^{(D)}}^{1/2}}\frac{{\Omega_{n}^{(D)}}^{1/2}}{\Omega_{n}^{1/2}}+\frac{S_{n}^{(\overline{D})}}{{\Omega_{n}}^{1/2}}

where S_{n}^{(\overline{D})}=\sum_{i}^{n}(Z_{i}-Z_{i}^{(D)})\equiv\sum_{i}^{n}Z_{i}^{(\overline{D})}. We will show that \lim_{D\to\infty}\lim_{n\to\infty}\frac{\Omega_{n}^{(\overline{D})}}{\Omega_{n}}=0, where \Omega_{n}^{(\overline{D})}=\sum_{i,j=1}^{n}\mathrm{cov}(Z_{i}^{(\overline{D})},Z_{j}^{(\overline{D})}), and use this, via Chebyshev's inequality, to control the second term on the right-hand side of the decomposition given in expression (3.1) above. We will then show that \lim_{D\to\infty}\lim_{n\to\infty}\frac{{\Omega_{n}^{(D)}}^{1/2}}{\Omega_{n}^{1/2}}=1, which yields weak convergence of the first term on the right-hand side of (3.1) to \mathcal{N}(0,1). We start with

limDlimnΩn(D¯)Ωn\displaystyle\lim_{D\to\infty}\lim_{n\to\infty}\frac{\Omega_{n}^{(\overline{D})}}{\Omega_{n}} =limDlimni,jcov(Zi(D¯),Zj(D¯))i,jcov(Zi,Zj)\displaystyle=\lim_{D\to\infty}\lim_{n\to\infty}\frac{\sum_{i,j}\mathrm{cov}(Z_{i}^{(\overline{D})},Z_{j}^{(\overline{D})})}{\sum_{i,j}\mathrm{cov}(Z_{i},Z_{j})}
=limDlimni,jρ2D+2+|ij|k=0ρ2kq(1q)i,jρ|ij|k=0ρ2kq(1q)\displaystyle=\lim_{D\to\infty}\lim_{n\to\infty}\frac{\sum_{i,j}\rho^{2D+2+|i-j|}\sum_{k=0}^{\infty}\rho^{2k}q(1-q)}{\sum_{i,j}\rho^{|i-j|}\sum_{k=0}^{\infty}\rho^{2k}q(1-q)}
=limDlimni,jρ2D+2+|ij|i,jρ|ij|=0.\displaystyle=\lim_{D\to\infty}\lim_{n\to\infty}\frac{\sum_{i,j}\rho^{2D+2+|i-j|}}{\sum_{i,j}\rho^{|i-j|}}=0.

The final line comes from examining the ratio {\sum_{i,j}\rho^{2D+2+|i-j|}}/{\sum_{i,j}\rho^{|i-j|}}. The denominator satisfies \sum_{i,j}\rho^{|i-j|}=\Omega(n), i.e., there exists some constant c>0 such that \sum_{i,j}\rho^{|i-j|}\geq cn; indeed, the diagonal terms alone give \sum_{i,j}\rho^{|i-j|}\geq\sum_{i}\rho^{0}=n. Additionally, for any fixed \epsilon>0, there exists a D>0 such that the numerator satisfies \sum_{i,j}\rho^{2D+2+|i-j|}=O(\epsilon n).
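The vanishing of this ratio can also be seen directly: since \rho^{2D+2+|i-j|}=\rho^{2D+2}\cdot\rho^{|i-j|}, the ratio equals \rho^{2D+2} exactly, for every n. A small numerical sketch (ours, with arbitrary \rho, n, and D):

```python
# The tail-to-total ratio from the display above, computed by brute force.
# It factors exactly as rho^(2D+2), independent of n, and so vanishes as
# D grows.

def tail_ratio(rho, n, D):
    num = sum(rho**(2 * D + 2 + abs(i - j)) for i in range(n) for j in range(n))
    den = sum(rho**abs(i - j) for i in range(n) for j in range(n))
    return num / den

rho = 0.5
assert abs(tail_ratio(rho, 50, 3) - rho**(2 * 3 + 2)) < 1e-12
assert tail_ratio(rho, 50, 10) < 1e-5  # already tiny for moderate D
```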

Together with Chebyshev’s inequality, we have that

(3.2) limDlimn(|Sn(D¯)Ωn1/2|>ϵ)limDlimnΩn(D¯)ϵ2Ωn=0,\displaystyle\lim_{D\to\infty}\lim_{n\to\infty}\mathbb{P}\left(\left|\frac{S_{n}^{(\overline{D})}}{{\Omega_{n}}^{1/2}}\right|>\epsilon\right)\leq\lim_{D\to\infty}\lim_{n\to\infty}\frac{{\Omega_{n}^{(\overline{D})}}}{\epsilon^{2}\Omega_{n}}=0,

and this gives us convergence in probability of the second term in (3.1) to zero.

Next, we show that limDlimnΩn(D)1/2Ωn1/2=1\lim_{D\to\infty}\lim_{n\to\infty}\frac{{\Omega_{n}^{(D)}}^{1/2}}{\Omega_{n}^{1/2}}=1. We begin by considering

\frac{\Omega_{n}^{(D)}}{\Omega_{n}}=\frac{\sum_{i,j}\mathrm{cov}(Z_{i}^{(D)},Z_{j}^{(D)})}{\sum_{i,j}\mathrm{cov}(Z_{i},Z_{j})}=\frac{\sum_{i,j}\rho^{|i-j|}\sum_{k=0}^{D-|i-j|}\rho^{2k}q(1-q)}{\sum_{i,j}\rho^{|i-j|}\sum_{k=0}^{\infty}\rho^{2k}q(1-q)}
=1-\frac{\sum_{i,j}\rho^{|i-j|}\sum_{k=(D-|i-j|+1)\vee 0}^{\infty}\rho^{2k}}{\sum_{i,j}\rho^{|i-j|}\sum_{k=0}^{\infty}\rho^{2k}}
(3.3) =1i,j:|ij|Dρ2D|ij|+2i,jρ|ij|i,j:|ij|>Dρ|ij|i,jρ|ij|\displaystyle=1-\frac{\sum_{i,j:|i-j|\leq D}\rho^{2D-|i-j|+2}}{\sum_{i,j}\rho^{|i-j|}}-\frac{\sum_{i,j:|i-j|>D}\rho^{|i-j|}}{\sum_{i,j}\rho^{|i-j|}}

We now consider the second term in expression (3.3), {\sum_{i,j:|i-j|\leq D}\rho^{2D-|i-j|+2}}/{\sum_{i,j}\rho^{|i-j|}}. As before, the denominator satisfies \sum_{i,j}\rho^{|i-j|}=\Omega(n) (i.e., there exists some constant c such that \sum_{i,j}\rho^{|i-j|}\geq cn). For a fixed \epsilon>0, there exists a D>0 such that \rho^{d}<\epsilon for all d\geq D. Therefore, we have that \sum_{i,j:|i-j|\leq D}\rho^{2D-|i-j|+2}=O(n\epsilon). Hence,

i,j:|ij|Dρ2D|ij|+2i,jρ|ij|=O(ϵ).\displaystyle\frac{\sum_{i,j:|i-j|\leq D}\rho^{2D-|i-j|+2}}{\sum_{i,j}\rho^{|i-j|}}=O(\epsilon).

Next, we consider the third term in (3.3):

\frac{\sum_{i,j:|i-j|>D}\rho^{|i-j|}}{\sum_{i,j}\rho^{|i-j|}}\leq\frac{2n\sum_{m=D+1}^{\infty}\rho^{m}}{\sum_{i,j}\rho^{|i-j|}}=\frac{2n\rho^{D+1}/(1-\rho)}{\sum_{i,j}\rho^{|i-j|}}.

Once again, we have the lower bound \sum_{i,j}\rho^{|i-j|}=\Omega(n) (i.e., there exists some constant c such that \sum_{i,j}\rho^{|i-j|}\geq cn), and for a fixed \epsilon>0, there exists a D>0 such that \rho^{d}<\epsilon for all d\geq D. Therefore, we have that 2n\sum_{m=D+1}^{\infty}\rho^{m}=O(n\epsilon), and hence,

i,j:|ij|>Dρ|ij|i,jρ|ij|=O(ϵ).\displaystyle\frac{\sum_{i,j:|i-j|>D}\rho^{|i-j|}}{\sum_{i,j}\rho^{|i-j|}}=O(\epsilon).

Putting all of this together gives us \lim_{D\to\infty}\lim_{n\to\infty}\frac{{\Omega_{n}^{(D)}}^{1/2}}{{\Omega_{n}}^{1/2}}=1. Consequently, the first term in (3.1) satisfies \frac{S_{n}^{(D)}}{{\Omega_{n}^{(D)}}^{1/2}}\frac{{\Omega_{n}^{(D)}}^{1/2}}{\Omega_{n}^{1/2}}\rightsquigarrow\mathcal{N}(0,1). Therefore, together with (3.2), we have that \frac{S_{n}}{\Omega_{n}^{1/2}}\rightsquigarrow\mathcal{N}(0,1), and the proof is complete.

3.3. Random Fields

3.3.1. Environment

This example nests many time series and spatial mixing models. Take the setting of Jenish and Prucha (2009), Theorem 1. Their setting has either \phi- or \alpha-mixing in random fields, allowing for non-stationarity and asymptotically unbounded second moments. They treat real mean-zero random field arrays \{Z_{i,n};i\in D_{n}\subseteq\mathbb{R}^{d},n\in\mathbb{N}\}, where each pair of elements i,j is separated by some minimum distance \rho(i,j)\geq\rho_{0}>0, with \rho(i,j):=\max_{1\leq l\leq d}|i_{l}-j_{l}|. At each point on the lattice a real-valued random variable is drawn, so p=1. The authors assume (see their Assumptions 2 and 5) a version of uniform integrability that allows for asymptotically unbounded second moments, while maintaining that no single variance summand dominates, by scaling X_{i,n}:=Z_{i,n}/\max_{i\in D_{n}}c_{i,n} so that X_{i,n} is uniformly integrable in L_{2}. They also assume conditions on the inverse function \alpha_{inv} of the mixing coefficients \alpha (their Assumption 3) and on \phi (their Assumption 4), together with the tail quantile functions Q_{i,n} (where Q_{X}(u):=\inf\{x:F_{X}(x)\geq 1-u\} and F_{X} is the cumulative distribution function of the random variable X). These require trade-offs between the two: under \alpha-mixing decaying at rate O(m^{-d-\delta}) for some \delta>0, \sum_{m=1}^{\infty}m^{d-1}\sup_{n}\alpha_{k,l,n}(m)<\infty for all k+l\leq 4, and \sup_{n}\sup_{i\in D_{n}}\int_{0}^{1}\alpha^{d}_{inv}(u)Q_{i,n}(u)du tends to zero in the limit of upper quantiles. Restating the assumptions made in their paper:

  • Assumption 2: limksupnsupiDnE[|Zi,n/ci,n|2𝟏{|Zi,n/ci,n|>k}]=0\lim_{k\to\infty}\sup_{n}\sup_{i\in D_{n}}\mathrm{E}[|Z_{i,n}/c_{i,n}|^{2}\mathbf{1}\{|Z_{i,n}/c_{i,n}|>k\}]=0 for ci,n+c_{i,n}\in\mathbb{R}^{+}

  • Assumption 3: The following conditions must be satisfied by the α\alpha-mixing coefficients:

    1. (1)

      limksupnsupiDn01αinvd(u)(Q|Zi,n/ci,n|𝟏{Zi,n/ci,n>k})2𝑑u=0\lim_{k\to\infty}\sup_{n}\sup_{i\in D_{n}}\int_{0}^{1}\alpha_{inv}^{d}(u)\left(Q_{|Z_{i,n}/c_{i,n}|\mathbf{1}\{Z_{i,n}/c_{i,n}>k\}}\right)^{2}du=0

    2. (2)

      m=1md1supnαk,l,n(m)<\sum_{m=1}^{\infty}m^{d-1}\sup_{n}\alpha_{k,l,n}(m)<\infty for k+l4k+l\leq 4 where

      αk,l,n(r)=sup(αn(U,V),|U|k,|V|l,ρ(U,V)r)\alpha_{k,l,n}(r)=\sup(\alpha_{n}(U,V),|U|\leq k,|V|\leq l,\rho(U,V)\geq r)
    3. (3)

      supnα1,,n(m)=O(mdδ)\sup_{n}\alpha_{1,\infty,n}(m)=O\left(m^{-d-\delta}\right)

  • Assumption 4: The following conditions must be satisfied by the ϕ\phi-mixing coefficients:

    1. (1)

      m=1md1ϕ¯1,11/2(m)<\sum_{m=1}^{\infty}m^{d-1}\overline{\phi}_{1,1}^{1/2}(m)<\infty

    2. (2)

      \sum_{m=1}^{\infty}m^{d-1}\overline{\phi}_{k,l}(m)<\infty for k+l\leq 4

    3. (3)

      ϕ¯1,(m)=𝒪(mdϵ)\overline{\phi}_{1,\infty}(m)=\mathcal{O}(m^{-d-\epsilon}) for some ϵ>0\epsilon>0

  • Assumption 5: liminfn|Dn|1Mn2σn2>0\lim\inf_{n\to\infty}|D_{n}|^{-1}{M_{n}}^{-2}\sigma_{n}^{2}>0

3.3.2. Application of Theorem 1

In the following, we assume that the Z_{i,n} have bounded second moments (otherwise, we can replace them with their scaled versions, as above, and the results go through under bounded third and fourth moments). Here, for any \epsilon_{n}>0, we take \mathcal{A}_{i}^{n}=\{j:\rho(i,j)\leq K^{i}(\epsilon_{n})\}, where K^{i} is a non-increasing function. That is, we pick K^{i}(\epsilon_{n}) to be large enough, which can be determined from the cumulative distribution function of the random variables.

By Assumption 3 (and Proposition B.10), and Lemma B.1 (a) in Jenish and Prucha (2009), we know that for any kik\neq i such that ρ(i,k)Ki(ϵn)\rho(i,k)\geq K^{i}(\epsilon_{n}), we have that for some constant CC,

(3.4) C0α¯1,1(ρ(i,k))Qi,n2(u)𝑑uϵn.\displaystyle C\int_{0}^{\overline{\alpha}_{1,1}(\rho(i,k))}Q_{i,n}^{2}(u)du\leq\epsilon_{n}.

In particular, we note that we can pick Ki(ϵn)K^{i}(\epsilon_{n}) by observing that ϵn\epsilon_{n} above satisfies O(1/ρ(i,k)r)O(1/\rho(i,k)^{r}) for r>1.r>1.

Taking \epsilon_{n}=\omega(\frac{1}{\rho_{0}^{d}n^{\gamma}}) allows control of the size of the affinity sets. Indeed, via a packing number calculation, we see that while this allows K^{i}(\epsilon_{n}) to grow with n, it grows more slowly than n. Specifically, taking \epsilon_{n}=\omega\left(\frac{1}{\rho_{0}^{d}n^{\gamma}}\right) for any 0<\gamma<1, since \delta>0 and using their Assumption 3, we have

\left(\frac{K^{i}(\epsilon_{n})}{\rho_{0}}\right)^{d}=\left(\frac{(1/\epsilon_{n})^{1/(d+\delta)}}{\rho_{0}}\right)^{d}<\left(\frac{1}{\epsilon_{n}\rho_{0}^{d}}\right)=o(n)

Now, we verify that our key conditions are satisfied in this setting. We write K:=maxiKi(ϵn)K:=\max_{i}K^{i}(\epsilon_{n}), and first, we check Assumption 1:

i;j,k𝒜inE[|Zi,n|Zj,nZk,n]\displaystyle\sum_{i;j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}[|Z_{i,n}|Z_{j,n}Z_{k,n}] =O(K2iE[|Zi,n|3])=O(K2n)=o(n3/2K3/2)=o(Ωn3/2)\displaystyle=O(K^{2}\sum_{i}\mathrm{E}[|Z_{i,n}|^{3}])=O(K^{2}n)=o(n^{3/2}K^{3/2})=o\left(\Omega_{n}^{3/2}\right)

since \Omega_{n}^{3/2}=(\sum_{i}\sum_{j\in\mathcal{A}_{i}^{n}}\mathrm{E}[Z_{i}Z_{j}])^{3/2}. The second inequality holds by the arithmetic-geometric mean inequality. The remaining argument relies on rearranging the summations and using the growth rate of K, i.e., K=o(n), in the third equality, as defined above.

Next, we check that Assumption 2 is satisfied using similar arguments and assuming bounded fourth moments:

i,j;k𝒜in,l𝒜jncov(Zi,nZk,n,Zj,nZl,n)\displaystyle\sum_{i,j;k\in\mathcal{A}_{i}^{n},l\in\mathcal{A}_{j}^{n}}\mathrm{cov}(Z_{i,n}Z_{k,n},Z_{j,n}Z_{l,n}) =O(i,j:|ij|K,k:|ki|K,l:|li|2KE(Zi,n4))=O(nK3E(Zi,n4))=o(Ωn2)\displaystyle=O(\sum_{\mathclap{\begin{subarray}{c}i,j:|i-j|\leq K,\\ k:|k-i|\leq K,\\ l:|l-i|\leq 2K\end{subarray}}}\mathrm{E}(Z_{i,n}^{4}))=O(n\cdot K^{3}\cdot\mathrm{E}(Z_{i,n}^{4}))=o(\Omega_{n}^{2})

where the first equality comes from the covariance terms within the affinity sets dominating those outside the affinity sets (see the following verification of Assumption 3).

Finally, we check Assumption 3. Together with (3.4), taking \epsilon_{n}=\omega(\frac{1}{\rho_{0}^{d}n^{\gamma}}) where \gamma=1-\beta for arbitrarily small \beta>0, we have \frac{n}{K^{1+1/(d+\delta)}}=o(1). Thus, defining the random variable \xi_{i,-i} such that \xi_{i,-i}=1 if \mathrm{E}(Z_{i,n}|\mathbf{Z}_{-i,n})>0 and \xi_{i,-i}=-1 otherwise, just as in Rio (2013), we have

iE(|E(Zi,n𝐙i,n|𝐙i,n)|)\displaystyle\sum_{i}\mathrm{E}(\left|\mathrm{E}(Z_{i,n}\mathbf{Z}_{-i,n}|\mathbf{Z}_{-i,n})\right|) =ij𝒜icov(ξi,iZj,n,Zi,n)\displaystyle=\sum_{i}\sum_{j\notin\mathcal{A}_{i}}\mathrm{cov}(\xi_{i,-i}Z_{j,n},Z_{i,n})
2ij𝒜i0α¯(ρ(i,j))Qi,n(u)Qj,n(u)𝑑u\displaystyle\leq 2\sum_{i}\sum_{j\notin\mathcal{A}_{i}}\int_{0}^{\overline{\alpha}(\rho(i,j))}Q_{i,n}(u)Q_{j,n}(u)du
=O(n(nK)ϵn)=O(n(nK)K1/(2+δ))=o(nK)=o(Ωn),\displaystyle=O(n(n-K)\epsilon_{n})=O(n(n-K)K^{-1/(2+\delta)})=o(nK)=o\left(\Omega_{n}\right),

by Rio's covariance inequality (Rio, 1993).

3.4. Dependency Graphs and Chen and Shao (2004)

Next, we consider dependency graphs. There is an undirected, unweighted graph G with dependency neighborhoods N_{i}:=\{j:G_{ij}=1\} such that Z_{i} is independent of all Z_{j} for j\notin N_{i} (Baldi and Rinott, 1989; Chen and Shao, 2004; Ross, 2011). Let \mathcal{A}_{i}^{n}=\{j:G_{ij}=1\}. Denote the maximum cardinality of these neighborhoods by D. Together with a bounded fourth moment assumption, the conditions of Ross (2011) (see Theorem 3.6) imply the conditions here. Indeed, we see that

i;j,k𝒜inE[|Zi|ZjZk]\displaystyle\sum_{i;j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}\left[|Z_{i}|Z_{j}Z_{k}\right] D2i=1nE[Zi3],\displaystyle\leq D^{2}\sum_{i=1}^{n}\mathrm{E}\left[Z_{i}^{3}\right],

and for \Omega_{n}^{-1/2}S_{n}\rightsquigarrow N(0,1), Ross (2011) (Theorem 3.6) requires D^{2}\sum_{i=1}^{n}\mathrm{E}[Z_{i}^{3}]=o\left(\Omega_{n}^{3/2}\right), and hence Assumption 1 is satisfied. Similarly,

\sum_{i,i^{\prime};j\in\mathcal{A}_{i}^{n},j^{\prime}\in\mathcal{A}_{i^{\prime}}^{n}}\mathrm{cov}\left(Z_{i}Z_{j},Z_{i^{\prime}}Z_{j^{\prime}}\right)=O(D^{3}\sum_{i=1}^{n}\mathrm{E}[Z_{i}^{4}]),

and for \Omega_{n}^{-1/2}S_{n}\rightsquigarrow N(0,1), Ross (2011) (Theorem 3.6) requires D^{3}\sum_{i=1}^{n}\mathrm{E}[Z_{i}^{4}]=o\left(\Omega_{n}^{2}\right), so Assumption 2 is satisfied. Assumption 3 holds by definition of the dependency neighborhoods.
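To make the dependency-graph setting concrete, here is a minimal sketch (ours, using an assumed 1-dependent moving average Z_i = \epsilon_i + \epsilon_{i+1}, with i included in its own neighborhood) that computes \Omega_n by summing covariances over dependency neighborhoods and checks it against the exact variance of the sum:

```python
# Dependency-graph illustration: Z_i = eps_i + eps_{i+1} with i.i.d. eps of
# variance s2, so Z_i is independent of Z_j whenever |i - j| > 1. Omega_n,
# summed over each node's dependency neighborhood (plus the node itself),
# equals var(sum_i Z_i) exactly, because all other covariances vanish.

def omega_n(n, s2=1.0):
    # cov(Z_i, Z_j) = 2*s2 if i == j, s2 if |i-j| == 1, and 0 otherwise
    def cov(i, j):
        return 2 * s2 if i == j else (s2 if abs(i - j) == 1 else 0.0)
    return sum(cov(i, j) for i in range(n) for j in range(n) if abs(i - j) <= 1)

n, s2 = 100, 1.0
# var(sum Z_i) = n diagonal terms of 2*s2 plus 2(n-1) adjacent terms of s2
assert omega_n(n, s2) == 2 * s2 * n + 2 * s2 * (n - 1)
```

Here D = 3 is fixed, so the D^{2} and D^{3} rates in the displays above are trivially o(\Omega_{n}^{3/2}) and o(\Omega_{n}^{2}), since \Omega_{n} grows linearly in n.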

Now, we consider Chen and Shao (2004). In particular, we consider their weakest assumption, LD1: given an index set \mathcal{I}, for any i\in\mathcal{I}, there exists an A_{i}\subseteq\mathcal{I} such that X_{i} and X_{A_{i}^{C}} are independent. The affinity sets can be defined as the complements of the independence sets: \mathcal{A}^{n}_{i}=\{j:Z_{j}\text{ is not independent of }Z_{i}\}, which is similar to the dependency graphs setting. The goals of their paper are different, however: they develop finite-sample Berry-Esseen bounds under bounded p-th moments, where 2<p\leq 3, whereas we focus on covariance conditions in the asymptotics and collect relatively more dependent sets along the triangular array.

4. Applications

4.1. Peer Effects Models

4.1.1. Environment

We now turn to an example of treatment effects with spillovers. Consider a setting in which units in a network are assigned treatment status T_{i}\in\{0,1\}, as in Aronow and Samii (2017). The network is a graph, \mathcal{G}, consisting of individuals (nodes) and connections (edges). For now, we consider the case in which treatment assignments are independent across nodes. However, there are spillovers in treatment effects determined by the topology of the network, where treatment status within one's network neighborhood may influence one's own outcome: for instance, whether a friend is vaccinated affects a person's chance of being exposed to a disease.

Rather than allowing arbitrary spillovers, Aronow and Samii (2017) consider an exposure function f that takes on one of K finite values; i.e., f(T_{i};T_{1:n},\mathcal{G})\in\{d_{1},\ldots,d_{K}\}. An estimand of the average causal effect is of the form

\tau(d_{k},d_{l})=\frac{1}{n}\sum_{i}y_{i}(d_{k})-\frac{1}{n}\sum_{i}y_{i}(d_{l})

where d_{k} and d_{l} are the induced exposures under the treatment vectors. The Horvitz-Thompson estimator (Horvitz and Thompson, 1952) is:

\hat{\tau}_{HT}(d_{k},d_{l})=\frac{1}{n}\sum_{i}\mathbf{1}\{D_{i}=d_{k}\}\frac{y_{i}(d_{k})}{\pi_{i}(d_{k})}-\frac{1}{n}\sum_{i}\mathbf{1}\{D_{i}=d_{l}\}\frac{y_{i}(d_{l})}{\pi_{i}(d_{l})}

where \pi_{i}(d_{k}) is the probability that node i receives exposure d_{k} over all treatments.

The challenge is that the treatment effects are not independent across subjects. Let N_{i,:} denote a dummy vector encoding whether each j is in i's neighborhood: N_{ij}=1 when G_{ij}=1, with the convention N_{ii}=0. Aronow and Samii (2017) consider an empirical study with K=4 exposure conditions: (i) only i is treated in their neighborhood (d_{1}=T_{i}\cdot 1\{T_{1:n}^{\prime}N_{i,:}=0\}), (ii) at least one neighbor is treated and i is treated (d_{2}=T_{i}\cdot 1\{T_{1:n}^{\prime}N_{i,:}>0\}), (iii) i is not treated but some member of the neighborhood is (d_{3}=(1-T_{i})\cdot 1\{T_{1:n}^{\prime}N_{i,:}>0\}), and (iv) neither i nor any neighbor is treated (d_{4}=(1-T_{i})\cdot\prod_{j:N_{ij}=1}(1-T_{j})). We show that our result allows for a more generalized setting.
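To fix ideas, the following sketch (ours; the small network, the assignment probability p, and the outcome values are illustrative assumptions, not from their study) computes the four exposure conditions, the exact exposure probabilities \pi_i(d_k) by enumerating all treatment vectors, and checks that the Horvitz-Thompson contrast is exactly unbiased when outcomes depend only on realized exposure:

```python
# Exposure mapping and Horvitz-Thompson estimator on a toy 4-node line graph,
# with i.i.d. Bernoulli(p) treatment assignment.
from itertools import product

G = [[0, 1, 0, 0],  # adjacency matrix of a small example network (our choice)
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
n = len(G)
p = 0.5  # treatment probability

def exposure(i, T):
    """Map node i's treated neighborhood to one of the K = 4 conditions."""
    nbr_treated = any(T[j] for j in range(n) if G[i][j])
    if T[i] and not nbr_treated: return 'd1'  # only i treated
    if T[i] and nbr_treated:     return 'd2'  # i and a neighbor treated
    if not T[i] and nbr_treated: return 'd3'  # only neighbors treated
    return 'd4'                               # no one nearby treated

def pi(i, d):
    """Exact probability that node i receives exposure d, by enumeration."""
    total = 0.0
    for T in product([0, 1], repeat=n):
        if exposure(i, T) == d:
            total += p**sum(T) * (1 - p)**(n - sum(T))
    return total

# the four conditions partition the assignment space for every node
for i in range(n):
    assert abs(sum(pi(i, d) for d in ['d1', 'd2', 'd3', 'd4']) - 1.0) < 1e-12

def ht_tau(Y, T, dk, dl):
    """Horvitz-Thompson contrast between exposure bins dk and dl."""
    return sum((exposure(i, T) == dk) * Y[i] / pi(i, dk)
               - (exposure(i, T) == dl) * Y[i] / pi(i, dl)
               for i in range(n)) / n

# With outcomes that depend only on realized exposure, the HT estimator is
# exactly unbiased for the contrast f(dk) - f(dl); verify by enumeration.
f = {'d1': 1.0, 'd2': 2.0, 'd3': 3.0, 'd4': 4.0}  # hypothetical outcomes
E_tau = sum(p**sum(T) * (1 - p)**(n - sum(T))
            * ht_tau([f[exposure(i, T)] for i in range(n)], T, 'd2', 'd4')
            for T in product([0, 1], repeat=n))
assert abs(E_tau - (f['d2'] - f['d4'])) < 1e-12
```

The enumeration is feasible only for tiny n; it is meant purely to illustrate the estimand and estimator above, not inference, which is where the dependence across the indicators 1\{D_i = d_k\} enters.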

4.1.2. Application of Theorem 1

To obtain consistency and asymptotic normality, Aronow and Samii (2017) assume a covariance restriction of local dependence (their Condition 5) and apply Chen and Shao (2004) to prove the result. Namely, their restriction is that there is a dependency graph H (with entries H_{ij}\in\{0,1\}) whose degree is uniformly bounded by some integer m independent of n; that is, \sum_{j}H_{ij}\leq m for every i. This setting is much more restrictive than our conditions, especially as there can exist indirect correlation in choices as effects propagate or diffuse through the graph. We can work with larger real exposure values, and with settings that concentrate the mass of influence in a neighborhood while allowing for spillovers from everywhere. This matters for centrality-based diffusion models, SIR models, and financial flow networks, since the spillovers in these settings are less restricted than the sparse dependency graph in their Condition 5.

Indeed, we can even allow the dependency graph to be a complete graph, as long as the correlations between the nodes satisfy our Assumptions 1-3. That is, we can handle cases where, for a given treatment assignment, each node has n real exposure conditions and the exposure conditions of the whole graph can be well approximated by simple functions: small perturbations to any node in large regions of small correlations do not substantially perturb the outcomes in those regions (i.e., across affinity sets), while perturbations of the same size in a region of larger correlations (i.e., within affinity sets) can cause significant changes in the outcomes in that region. One can think of the "shorter" monotonic regions of the simple function as lying over affinity sets, and the "longer" monotonic regions as lying across different affinity sets. Monotonically non-decreasing functions arise naturally, for instance, in epidemic-spread settings, where any increase in the "treatment" cannot decrease the number of infected nodes.

To take an example, let the true exposure of ii be given by ei(T1:n)e_{i}(T_{1:n}). Then consider a case where ei(T1:n+)ei(T1:n)0e_{i}(T_{1:n}^{+})-e_{i}(T_{1:n})\geq 0, where T1:n+T_{1:n}^{+} indicates an increase in any element j[n]j\in[n] from T1:nT_{1:n}. This is a structure that would happen naturally in a setting with diffusion. The potential outcome for ii given treatment assignment is assumed to be yi(ei)y_{i}(e_{i}).

In practice, for parsimony and ease, exposures are often binned. So consider the problem where the 2^{n} possible exposures e_{i}(T_{1:n}) can be approximated by K well-separated "effective" exposures \{d_{1},d_{2},...,d_{K}\} with |d_{i}-d_{j}|>\delta for any i,j\in\{1,2,...,K\} and some \delta>0; for any r\in\{1,2,...,K\}, two exposures e_{i}(T_{1:n}),e_{j}(T_{1:n}) fall in bin d_{r} if and only if |e_{i}(T_{1:n})-e_{j}(T_{1:n})|<\delta; and y_{i}(e_{i}) is smooth in its argument for every i.

Then, following the above, the researcher’s target estimand is the average causal effect switching between two exposure bins,

τ(dk,dl)=1ni1{ei(T1:n)dk}yi(ei)1ni1{ei(T1:n)dl}yi(ei).\tau(d_{k},d_{l})=\frac{1}{n}\sum_{i}1\{e_{i}(T_{1:n})\in d_{k}\}y_{i}(e_{i})-\frac{1}{n}\sum_{i}1\{e_{i}(T_{1:n})\in d_{l}\}y_{i}(e_{i}).

The estimator for this estimand cannot directly be shown to be asymptotically normally distributed using the prior literature. It is ruled out by Condition 5 in Aronow and Samii (2017) which uses Chen and Shao (2004). However, it is straightforward to apply our result.

An example of this is a sub-critical diffusion process in which a randomly selected set of nodes M_{n} is assigned some treatment, and every other node is subsequently infected with some probability. We provide two examples that speak to this in Subsections 4.3 and 4.4.

4.2. Covariance Estimation using Socio-economic Distances

4.2.1. Environment

One application of mixing random fields is to use them to develop covariance matrices for estimators (e.g., Driscoll and Kraay (1998); Bester et al. (2011); Barrios et al. (2012); Cressie (2015)). Here we consider the example of Conley and Topa (2002), which builds on Conley (1999). Essentially, their approach is to parameterize the characteristics (observable or unobservable) of units that drive correlation in shocks by the Euclidean metric, as we further describe below. This, however, rules out examples that are common in practice involving (discrete) characteristics with no intrinsic ordering driving degrees of correlation. For instance, correlational structures across race, ethnicity, caste, occupation, and so on, are not readily accommodated in the framework. For a concrete example, correlations between ethnicities e_{i} and e_{j} for units i,j that are parametrized by p_{e_{i}e_{j}} in an unstructured manner are ruled out. More generally, many such characteristics admit at most partial orderings. Yet these are important, practical considerations in applied work. Our Theorem 1 allows an intuitive treatment of such cases. Our discussion below also applies to combinations of temporal (and possibly cross-sectional) dependence as in Driscoll and Kraay (1998). Our conditions also provide consistent estimators for covariance matrices of moment conditions for parameters of interest in the GMM setting, under full-rank conditions of expected derivatives (Conley, 1999), since the author uses the CLT from Bolthausen (1982) under stationary random fields, which is generalized above.

In Conley (1999), the model is that the population lives in a Euclidean space (taken to be \mathbb{R}^{2} for the purposes of exposition), with each individual i at location s_{i}. Each location has an associated random field X_{s_{i}}. Conley (1999) obtains the limiting distribution of parameter estimates b of \beta\in B, where B\subset\mathbb{R}^{d} is a compact subset and \beta is the unique solution to \mathrm{E}[g(X_{s_{i}};\beta)]=0 for a moment function g. Conley (1999) lists sufficient conditions on the moment function implying consistent estimation of the expected derivatives and their full rank:

  1. a)

    for all bBb\in B, the derivative Dg(Xsi;b)Dg(X_{s_{i}};b) with respect to bb is measurable, continuous on BB for all XkX\in\mathbb{R}^{k}, and first-moment continuous.

  2. b)

    E[Dg(Xsi;b)]<\mathrm{E}[Dg(X_{s_{i}};b)]<\infty and is of full-rank.

  3. c)

    s2cov(g(X0;β),g(Xs;β))\sum_{s\in\mathbb{Z}^{2}}\mathrm{cov}(g(X_{0};\beta),g(X_{s};\beta)) corresponding to sampled locations XsX_{s} is a non-singular matrix.

In addition to the sufficient conditions on the expected derivatives above, we list the remaining sufficient conditions used by Conley (1999) on the random field X_{s} itself to obtain the limiting distribution of parameter estimates through GMM; these are nested in the conditions from Jenish and Prucha (2009), with the addition of a bounded 2+\delta moment of ||g(X_{s};\beta)||:

  1. (1)

    m=1mαk,l(m)<\sum_{m=1}^{\infty}m\alpha_{k,l}(m)<\infty for k+l4k+l\leq 4

  2. (2)

    α1,(m)=o(m2)\alpha_{1,\infty}(m)=o(m^{-2})

  3. (3)

    for some \delta>0, \mathrm{E}[||g(X_{s};\beta)||^{2+\delta}]<\infty and \sum_{m=1}^{\infty}m(\alpha_{k,l}(m))^{\delta/(2+\delta)}<\infty.

In Conley and Topa (2002), the authors develop consistent covariance estimators, using these conditions, combining different distance metrics including physical distance as well as ethnicity (or occupation, for another example) distance in L2L_{2} at an aggregate level (using census tracts data). In particular, the authors use indicator vectors to encode ethnicities (or occupations), and take the Euclidean distance from aggregated (at the census tract level) indicator vectors, as a measure of these ethnic/occupational distances. They use the Euclidean metric to write a “race and ethnicity” distance between census tracts ii and jj,

Dij=k=19(eikejk)2,\displaystyle D_{ij}=\sqrt{\sum_{k=1}^{9}(e_{ik}-e_{jk})^{2}},

where the sum is taken over nine ethnicities/races, indexed by kk, defined by the authors.

The use of indicator vectors and Euclidean distance results in people of different race/ethnicity groups being in orthogonal groups (with a fixed Euclidean distance of \sqrt{2} between any pair of different race/ethnicity groups). To apply in practice, one would often need to allow for varying degrees of pairwise correlation for each pair of race/ethnicity groups. Additionally, even if the correlation induced by physical distance vanishes, it may be of interest to maintain correlation arising from interactions within and between ethnicity groups, where being of similar ethnic groups may induce nontrivial transfer of information between people despite their being physically located large distances apart. The indicator vector formulation above does not allow for this: for a pair of distinct ethnicities with high correlation, the formulation in Conley and Topa (2002) would require a correlation of zero.
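The fixed-distance point is easy to verify numerically. The sketch below (ours) builds one-hot indicator vectors for the nine categories and confirms that every pair of distinct groups sits at exactly \sqrt{2}, so the metric cannot encode differing pairwise affinities between groups:

```python
# With one-hot group indicators, the Euclidean distance between any two
# distinct groups is the same constant sqrt(2), regardless of which groups
# they are; the metric therefore carries no pairwise affinity information.
import math

def euclid(u, v):
    return math.sqrt(sum((a - b)**2 for a, b in zip(u, v)))

K = 9  # nine race/ethnicity categories, as in Conley and Topa (2002)
onehot = [[1.0 if k == g else 0.0 for k in range(K)] for g in range(K)]

for g in range(K):
    for h in range(K):
        d = euclid(onehot[g], onehot[h])
        assert abs(d - (0.0 if g == h else math.sqrt(2))) < 1e-12
```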

4.2.2. Application of Theorem 1

Consider random variables ZiZ_{i} and ZjZ_{j}, whose correlation we can decompose into components of physical distance and racial/ethnic distance (just as in Conley and Topa (2002)). It is direct to see how our work above takes care of the physical distance component, and so we turn to the remaining distance component. For this, for instance, one could consider the pairwise interaction probabilities, pei,ejp_{e_{i},e_{j}} characterizing the correlation between ethnicity eie_{i} of ii, and ethnicity eje_{j} of jj. Our affinity set structure then allows us to incorporate this correlation structure. That is, one can construct an affinity set 𝒜in={j:ρ(i,j)Ki(ϵ/2), or pei,ej1ϵ/2}\mathcal{A}_{i}^{n}=\{j:\rho(i,j)\leq K^{i}(\epsilon/2),\text{ or }p_{e_{i},e_{j}}\geq 1-\epsilon/2\} with KiK^{i} defined just as in Subsection 3.3. Following our previous section, our generalization holds.

Attempts to (non-parametrically) estimate covariance in the cross-section often leverage a time or distance structure. For example, Driscoll and Kraay (1998) assume a mixing condition on a random field such that the correlation between shocks \epsilon_{it} and \epsilon_{j,t-s} tends to zero as s\rightarrow\infty. This allows for reasonably agnostic cross-sectional correlational structures but requires them to be temporally invariant, and studies T^{1/2} asymptotics. Although such an assumption applies in certain contexts, there are many socio-economic contexts in which it does not apply and yet our theorem can be applied. We provide two such examples.

For instance, in simple models of migration with migration cost, there is often persistence in how shocks to incentives to migrate in some areas affect populations in other areas. Nonetheless, there are very particular correlation patterns because, as an example, ethnic groups migrate to specific places based on existing populations, and so affinity sets are driven by the places to which a given ethnic group might consider moving and our central limit theorem can then be applied, provided our conditions on affinity sets apply.

Another example comes from social interaction. Individuals interact with others in small groups that experience correlated shocks which correlate grouped individuals’ behaviors and beliefs. Each group involves only a tiny portion of the population, and any given person interacts in a series of groups over time. Thus individuals’ behaviors or beliefs are correlated with others with whom they have interacted; but without any natural temporal or spatial structure. Each person has an affinity set composed of the groups (classes, teams, and so on) that they have been part of. People may also have their own idiosyncratic shocks to behaviors and beliefs. In this example again, our central limit theorem applies despite the lack of any spatial or temporal structure. One does not need to know the affinity sets, just to know that each person’s affinity group is appropriately small relative to the population.

4.3. Sub-Critical Diffusion Models

4.3.1. Environment

A finite-time SIR diffusion process occurs on a sequence of unweighted and undirected graphs G_{n}. A first-infected set, or seed set, M_{n} of size m_{n}, consisting of randomly selected nodes with treatment indicator W_{i}\in\{0,1\}, is seeded (set to have W_{i}=1) at t=0. In period t=1, each seed infects each of its network neighbors, \{j:G_{ij,n}=1\}, i.i.d. with probability q_{n}. The seeds are then no longer active in infecting others. In period t=2, each of the nodes infected at period t=1 infects each of its network neighbors who were never previously infected, i.i.d. with probability q_{n}. The process continues for T_{n} periods. Let X_{i}^{n}\in\{0,1\} be a binary indicator of whether i was ever infected throughout the process.

Assume that the sequence of SIR models under study, $(G_{n},q_{n},T_{n},W_{n})$, has $m_{n}\rightarrow\infty$ (with $\alpha_{n}:=m_{n}/n=o(1)$), $q_{n}\rightarrow 0$, and $T_{n}\rightarrow\infty$ (with $T_{n}\geq\text{diam}(G_{n})$ for each $n$), and is such that the process is sub-critical. Since the number of periods is at least as large as the diameter, for a connected $G_{n}$ it is guaranteed that $\mathrm{cov}(X_{i}^{n},X_{j}^{n})>0$ for each $i,j$.

The statistician may be interested in a number of quantities. For instance, the unknown parameter $q_{n}$ may be of interest. Suppose $\mathrm{E}[\Psi_{i}(X_{i};q_{n},W_{1:n})]=0$ is a (scalar) moment condition satisfied only at the true parameter $q_{n}$ given known seeding $W_{1:n}$. The $Z$-estimator (or GMM estimator) derives from the empirical analog, setting $\sum_{i}\Psi_{i}(X_{i};\hat{q},W_{1:n})=0$.

By a standard expansion argument

\[(\hat{q}-q_{n})=-\left\{\sum_{i}\nabla_{q}\Psi_{i}(X_{i};\tilde{q},W_{1:n})\right\}^{-1}\times\sum_{i}\Psi_{i}(X_{i};q_{n},W_{1:n})+o_{p}(n^{-1/2}).\]
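For intuition, here is a stylized sketch of the empirical-analog step with a purely illustrative moment function $\Psi_{i}(X_{i};q)=X_{i}-g(q)$, where $g$ is a hypothetical infection probability we adopt only for this example (it is not the paper's moment condition):

```python
import numpy as np
from scipy.optimize import brentq

def g(q):
    # hypothetical link: infection probability with 5 exposure chances
    return 1 - (1 - q) ** 5

rng = np.random.default_rng(1)
q_true = 0.1
X = (rng.random(10_000) < g(q_true)).astype(float)  # binary outcomes

def empirical_moment(q):
    # (1/n) * sum_i Psi_i(X_i; q) with Psi_i(X_i; q) = X_i - g(q)
    return float(np.mean(X) - g(q))

q_hat = brentq(empirical_moment, 1e-6, 1 - 1e-6)  # solve sum_i Psi_i = 0
```

Since $g$ is strictly increasing, the empirical moment is monotone in $q$ and the root is unique.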

To study the asymptotic normality of the estimator, we need to study

\[\frac{1}{\sqrt{\mathrm{var}\left(\sum_{i}\Psi_{i}\right)}}\sum_{i}\Psi_{i}(X_{i};q_{n},W_{1:n}),\]

which involves developing affinity sets for each $\Psi_{i}$. For simplicity, we consider the case where the estimator works directly with $X_{1:n}^{n}$. Letting $Z_{i}=X_{i}^{n}-\mathrm{E}(X_{i}^{n})$ be the de-meaned outcome, we want to show that

\[\frac{1}{\sqrt{\sum_{i}\sum_{j\in\mathcal{A}_{i}^{n}}\mathrm{cov}(Z_{i}^{n},Z_{j}^{n})}}\sum_{i}Z_{i}^{n}\rightsquigarrow\mathcal{N}(0,1).\]

Under sub-criticality, a vanishing share of nodes is infected from a single seed. Without sub-criticality, most of the graph can have a nontrivial probability of being infected, and accurate inference cannot be made with a single network. Let us define

\[\mathcal{B}_{j}^{n}:=\{i:\mathrm{P}(X_{i}^{n}=1\mid j\in M_{n},m_{n}=1)>\epsilon_{n}\}.\]

Then $\mathcal{B}_{j}^{n}$ is the set of nodes for which, if $j$ is the only seed, the probability of being infected in the process is at least $\epsilon_{n}$. As noted above, in a sub-critical process $|\mathcal{B}_{j}^{n}|=o(n)$ for every $j$, though not necessarily uniformly in $j$. For simplicity, assume that there is a sequence $\epsilon_{n}\rightarrow 0$ such that this holds uniformly (otherwise, one can simply work with the sums). Let $\beta_{n}:=\sup_{j}|\mathcal{B}_{j}^{n}|/n$, which tends to zero, and assume $n\alpha_{n}\beta_{n}=o(1)$.

Next, we assume that the rate at which infections occur within the affinity set is higher than outside of it, and that the share of seeds is sufficiently high and affinity sets are large enough to generate many small infection outbursts, but not so large as to infect the whole network. That is, there exists some $\mathcal{B}_{j}^{n\prime}\subset\mathcal{B}_{j}^{n}$ such that $|\mathcal{B}_{j}^{n\prime}|=\Theta(|\mathcal{B}_{j}^{n}|)$ and $\mathrm{P}(X_{i}=1\mid j\in M_{n},m_{n}=1)\geq\gamma_{n}$ for $i\in\mathcal{B}_{j}^{n\prime}$, with $\gamma_{n}/\epsilon_{n}\rightarrow\infty$, $\alpha_{n}^{3}=O(\gamma_{n})$, and $\beta_{n}=O(\gamma_{n})$. This would apply, for instance, to targeted advertisements or promotions that lead to local spread of information about a product but do not go viral. None of the prior examples, such as random fields and dependency graphs, cover this case since all $X_{i}^{n}$ are correlated. We now show that Theorem 1 applies to this case.

4.3.2. Application of Theorem 1

Let us define the affinity sets $\mathcal{A}_{i}^{n}=\mathcal{B}_{i}^{n}$. Next, consider a random seed $k$. Let $\mathcal{E}_{i,j,k}:=\{a\notin\mathcal{B}_{b}^{n}:\ a,b\in\{i,j,k\},\ a\neq b\}$ denote the event that none of the nodes are in each others' affinity sets. It is clear that $\mathrm{P}(\mathcal{E}_{i,j,k})\rightarrow 1$, since $|\mathcal{B}_{a}^{n}|=o(n)$ for $a\in\{i,j,k\}$ and seeds are chosen uniformly at random.

Within an affinity set, it suffices to examine the variance components and check that they are of a higher order of magnitude:

\[\sum_{i}\mathrm{var}(Z_{i}^{n})=n\times(1-(1-\beta_{n})^{2m_{n}})\gamma_{n}^{2}=O(n^{2}\times\alpha_{n}\beta_{n}\times\gamma_{n}^{2}).\]

Now to check Assumption 1, we compute:

\begin{align*}
\sum_{i;j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}[|Z_{i}|Z_{j}Z_{k}] &=\sum_{i;j,k\in\mathcal{A}_{i}^{n},\,j\notin\mathcal{A}_{k}^{n}\text{ or }k\notin\mathcal{A}_{j}^{n}}\mathrm{E}[|Z_{i}|Z_{j}Z_{k}]+\sum_{i;j,k\in\mathcal{A}_{i}^{n},\mathcal{A}_{j}^{n},\mathcal{A}_{k}^{n}}\mathrm{E}[|Z_{i}|Z_{j}Z_{k}]\\
&\approx n^{3}\beta_{n}\epsilon_{n}^{2}+n\alpha_{n}\beta_{n}\epsilon_{n}.
\end{align*}

Thus, we have

\[\frac{n^{6}\alpha_{n}^{3}\beta_{n}^{3}\gamma_{n}^{6}}{n^{6}\beta_{n}^{3}\epsilon_{n}^{6}}=\frac{\alpha_{n}^{3}\gamma_{n}^{6}}{\epsilon_{n}^{6}}\to\infty,\]

which is satisfied. The probability that no seed is in any other affinity set is

\[\mathrm{P}(\cap_{k\in M_{n}}\mathcal{E}_{i,j,k})=(1-\beta_{n})^{2m_{n}}\approx 1-2m_{n}\beta_{n}=1-2n\alpha_{n}\beta_{n}.\]

This puts an intuitive restriction on the number of seeds and percolation size as a function of nn. Next, we verify Assumption 2. We have,

\begin{align*}
\sum_{i;j,k\in\mathcal{A}_{i}^{n},r\in\mathcal{A}_{k}^{n}}\mathrm{cov}(Z_{i}^{n}Z_{j}^{n},Z_{k}^{n}Z_{r}^{n}) &=O\Big(\sum_{i;j,k\in\mathcal{A}_{i}^{n};\,r\in\mathcal{A}_{k}^{n},\,r\notin\mathcal{A}_{i}^{n}}\mathrm{cov}(Z_{i}^{n}Z_{j}^{n},Z_{k}^{n}Z_{r}^{n})\Big)\\
&=O(n^{2}\times(n-1)(1-\beta_{n})\beta_{n}\epsilon_{n}^{4})=O(n^{3}\beta_{n}\epsilon_{n}^{4}).
\end{align*}

Therefore, $(n^{4}\alpha_{n}^{2}\beta_{n}^{2}\gamma_{n}^{4})/(n^{3}\beta_{n}\epsilon_{n}^{4})=n\alpha_{n}\beta_{n}(\gamma_{n}/\epsilon_{n})^{4}\to\infty$ is satisfied. We next verify Assumption 3. Given the event $\mathcal{E}_{i,j,k}$, we can bound the conditional covariance, $\mathrm{cov}(Z_{i}^{n},Z_{j}^{n}\mid\mathcal{E}_{i,j,k})=O(\epsilon_{n}^{2})$, by bounding the probabilities of two contagions. So then

\begin{align*}
\sum_{i,j:\,j\notin\mathcal{A}_{i}^{n}}\sum_{k\in M_{n}}\mathrm{cov}(Z_{i}^{n},Z_{j}^{n}\mid\mathcal{E}_{i,j,k})\mathrm{P}(\mathcal{E}_{i,j,k}) &=C\sum_{i,j:\,j\notin\mathcal{A}_{i}^{n}}(1-2n\alpha_{n}\beta_{n})\cdot\mathrm{cov}(Z_{i}^{n},Z_{j}^{n}\mid\mathcal{E}_{i,j,k})\\
&\approx(1-\alpha_{n})n\times((1-\alpha_{n})n-1)(1-\beta_{n})\times(1-2n\alpha_{n}\beta_{n})\cdot\epsilon_{n}^{2}
\end{align*}

for some constant C>0C>0 fixed in nn. Keeping orders we have

\[\sum_{i,j:\,i\notin\mathcal{A}_{j}^{n}}\sum_{k\in M_{n}}\mathrm{cov}(Z_{i}^{n},Z_{j}^{n}\mid\mathcal{E}_{i,j,k})\mathrm{P}(\mathcal{E}_{i,j,k})=O((n\epsilon_{n})^{2}).\]

Since $\sum_{i,j:\,i\notin\mathcal{A}_{j}^{n}}\sum_{k\in M_{n}}\mathrm{cov}(\mathrm{E}[Z_{i}^{n}\mid\mathcal{E}_{i,j,k}],\mathrm{E}[Z_{j}^{n}\mid\mathcal{E}_{i,j,k}])=0$, we have $\sum_{i,j:\,i\notin\mathcal{A}_{j}^{n}}\sum_{k\in M_{n}}\mathrm{cov}(Z_{i}^{n},Z_{j}^{n})=O((n\epsilon_{n})^{2})$, and $(n^{2}\gamma_{n}^{2}\alpha_{n}\beta_{n})/(n^{2}\epsilon_{n}^{2})=(\gamma_{n}^{2}\alpha_{n}\beta_{n})/(\epsilon_{n}^{2})\rightarrow\infty$ is also satisfied.

4.4. Diffusion in Stochastic Block Models

4.4.1. Environment

An SIR diffusion process occurs on a sequence of unweighted and undirected networks, as in the previous example, except that the network has a block structure generated by a standard stochastic block model (Holland et al. (1983)). The $n$ nodes are partitioned into $k_{n}$ blocks, where block sizes are equal or within one of each other. Links form independently: with probability $p^{in}_{n}\in(0,1)$ inside blocks and with probability $p^{ac}_{n}\in(0,1)$ across blocks. Let $q_{n}$ be the probability of infection, as in the previous example. Within-block link probabilities are large enough for percolation within blocks: $p^{in}_{n}\frac{n}{k_{n}}q_{n}\gg\log(\frac{n}{k_{n}})$. Across-block link probabilities are small enough for vanishing probabilities of contagion across blocks, even if all other blocks are infected: $p^{ac}_{n}n^{2}q_{n}\ll 1$. The infections are seeded with $k_{n}/2$ seeds. With probability going to 1, all nodes in the blocks with the seeds will be infected and no others. The correlation of infection status of nodes within the same block (the affinity sets) goes to 1; the correlation across blocks goes to 0, but is always positive. If $k_{n}$ is bounded, then a central limit theorem fails. If $k_{n}$ grows without bound (while $n/k_{n}\rightarrow\infty$, so that blocks are large), then the central limit theorem holds.
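A minimal sketch of the block environment (parameter values are our own; the infection step would proceed as in the previous example):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 400, 20                  # n nodes partitioned into k equal blocks
block = np.repeat(np.arange(k), n // k)
p_in, p_ac = 0.5, 0.001         # within- vs across-block link probabilities

same = block[:, None] == block[None, :]
P = np.where(same, p_in, p_ac)  # link probability for each pair
U = rng.random((n, n))
A = (np.triu(U, 1) < np.triu(P, 1)).astype(int)
A = A + A.T                     # symmetric adjacency, no self-loops

within = int(A[same].sum())     # links inside blocks (each counted twice)
across = int(A[~same].sum())    # links across blocks (each counted twice)
```

With these rates, within-block links vastly outnumber across-block links, which is what drives the within-block percolation and vanishing across-block contagion described above.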

4.4.2. Application of Theorem 1

Given the previous examples, we simply sketch the application. The affinity set of node $i$, $\mathcal{A}_{i}^{n}$, is the block in which it resides. Letting $\sigma_{n}^{2}$ denote $\mathrm{var}(Z_{i})$, it follows that $\Omega_{n}\approx n\frac{n}{k_{n}}\sigma_{n}^{2}$. The first assumption is then satisfied, noting that if $k_{n}\rightarrow\infty$, then

\[\sum_{i;j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}[|Z_{i}|Z_{j}Z_{k}]=O\Big(\sum_{i;j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}[|Z_{i}|^{3}]\Big)=O\left(n\left(\frac{n}{k_{n}}\right)^{2}\mathrm{E}[|Z_{i}|^{3}]\right)=o(\Omega_{n}^{3/2}).\]

Verification of the second assumption comes from noting that if $k_{n}\rightarrow\infty$, then

\begin{align*}
\sum_{i;j,k\in\mathcal{A}_{i}^{n},r\in\mathcal{A}_{k}^{n}}\mathrm{cov}(Z_{i}^{n}Z_{j}^{n},Z_{k}^{n}Z_{r}^{n}) &= O\left(\sum_{i;j,k,r\in\mathcal{A}_{i}^{n}}\mathrm{cov}(Z_{i}^{n}Z_{j}^{n},Z_{k}^{n}Z_{r}^{n})\right)\\
&= O\left(n\left(\frac{n}{k_{n}}\right)^{3}\mathrm{E}[Z_{i}^{4}]\right)=o(\Omega_{n}^{2}).
\end{align*}

Next we check the third assumption. Let $\varepsilon_{n}=\mathrm{cov}(Z_{i}^{n},Z_{j}^{n})$ for $i,j$ in different blocks (ignoring the approximation due to blocks being of slightly different sizes). Note that $\varepsilon_{n}$ is on the order of the probability of contagion across blocks, which is $p^{ac}_{n}q_{n}(n^{2}/k_{n}^{2})=o(1/k_{n}^{2})$. Both blocks could also be infected by some other nodes, which happens with probability of order at most $n^{2}(p^{ac}_{n}q_{n}(n/k_{n}))^{2}$, which is also $o(1/k_{n}^{2})$. If $k_{n}$ grows without bound, then

\[\sum_{i}\mathrm{E}\left(|\mathrm{E}[Z_{i}\mathbf{Z}_{-i}|\mathbf{Z}_{-i}]|\right)=O\left(n^{2}\frac{k_{n}-1}{k_{n}}\varepsilon_{n}\right)=o\left(\frac{n^{2}}{k_{n}}\sigma^{2}_{n}\right).\]

In this example, not only do the assumptions fail if $k_{n}$ is bounded, but the conclusion of the theorem fails to hold as well; in that sense the conditions are tight.

4.5. Spatial Process with Irregular Observations and Matérn Covariance

4.5.1. Environment

Finally, we turn to an example of neural network models for geospatial data, specifically the environment of Zhan and Datta (2023). The authors propose a neural network generalized least squares estimator (NN-GLS), with the dependence in the residuals modeled by a Matérn covariance function, described below. Their paper is the first to demonstrate consistency of the NN-GLS estimator in this setting.

Consider a spatial process model, $Y_{i}=f_{0}(X_{i})+\varepsilon(s_{i})$, where $X_{i}\in\mathbb{R}^{k}$ is a vector of characteristics and the residuals correspond to observations at locations $s_{1},\dots,s_{n}$ in $\mathbb{R}^{2}$. Let $f_{0}(\cdot)$ be a continuous function, and let $\varepsilon(s_{i})$ be a Gaussian process with covariance function $\Sigma(s_{i},s_{j})=C(s_{i},s_{j})+\tau^{2}\delta(s_{i}=s_{j})$, where $\tau^{2}>0$ and $\delta$ is the indicator function. Here $C(s_{i},s_{j})=C(||s_{i}-s_{j}||_{2})=C(||h||_{2})$, where

\[C(||h||_{2})=\sigma^{2}\frac{2^{1-\nu}(\sqrt{2}\phi||h||_{2})^{\nu}}{\Gamma(\nu)}\mathcal{K}_{\nu}(\sqrt{2}\phi||h||_{2})\]

is the Matérn covariance function, with $\mathcal{K}_{\nu}(\cdot)$ the modified Bessel function of the second kind. We consider the setting of Zhan and Datta (2023) (Proposition 1), in which $C(||h||_{2})=o(||h||_{2}^{-(2+\kappa)})$ for some $\kappa>0$.
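The covariance function above can be evaluated numerically; the sketch below uses `scipy.special.kv` for $\mathcal{K}_{\nu}$ and defines $C(0)=\sigma^{2}$ by continuity (parameter values are ours). For $\nu=1/2$, this parametrization reduces to the exponential kernel $\sigma^{2}e^{-\sqrt{2}\phi||h||_{2}}$.

```python
import numpy as np
from scipy.special import kv, gamma

def matern_cov(h, sigma2=1.0, phi=1.0, nu=0.5):
    """Matern covariance C(||h||_2) as in the text, with modified
    Bessel function of the second kind K_nu; C(0) = sigma2 by continuity."""
    h = np.asarray(h, dtype=float)
    out = np.full(h.shape, sigma2)           # limit as ||h|| -> 0
    pos = h > 0
    x = np.sqrt(2.0) * phi * h[pos]
    out[pos] = sigma2 * 2 ** (1 - nu) * x ** nu * kv(nu, x) / gamma(nu)
    return out

h = np.linspace(0.0, 5.0, 6)
C = matern_cov(h, sigma2=2.0, phi=1.5, nu=0.5)
# For nu = 1/2: C(h) = 2.0 * exp(-sqrt(2) * 1.5 * h)
```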

The NN-GLS fits a system of multi-layered perceptrons via the $L_{2}$ loss function, and the authors prove consistency under assumptions including, in particular, restrictions on the spectral radius of a sparse approximation of the covariance function. This is covered by an assumption of a minimum separation distance $h_{0}>0$ between the locations $s_{i},s_{j}$ above, for $i\neq j$. Previous work characterizes the asymptotic properties, including asymptotic normality, of neural network estimators in the case of independent and identically distributed shocks (Shen et al., 2019). Zhan and Datta (2023) extend this to dependent data by modeling the dependence with the Matérn covariance function.

4.5.2. Application of Theorem 1

We create affinity sets using the same restrictions as in Zhan and Datta (2023). Reflecting the duality between spatial distance and dependence, we construct affinity sets such that the maximal separation within an affinity set controls the maximum covariance between random variables outside of it. Specifically, take the affinity sets to be $\mathcal{A}_{i}^{n}:=\{j:||s_{i}-s_{j}||_{2}<K(\epsilon_{n})\}$, where $|\mathrm{cov}(Z_{i,n},Z_{j,n})|\leq\epsilon_{n}$ for $||s_{i}-s_{j}||_{2}>K(\epsilon_{n})$. Using Zhan and Datta (2023)'s restriction on the amount of dependence associated with distance in space, namely $C(||h||_{2})=o(||h||_{2}^{-(2+\kappa)})$ for some $\kappa>0$, we can solve for the appropriate $K(\epsilon_{n})$: since the covariance is asymptotically bounded by $||h||_{2}^{-(2+\kappa)}$, we can set this bound equal to $\epsilon_{n}$ and solve for the appropriate distance. After the resulting algebra we take $K(\epsilon_{n})=1/(\epsilon_{n}^{2+\kappa})$.
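The construction can be sketched numerically. Here we threshold pairwise distances at an illustrative $K$ and, as a hypothetical stand-in for the Matérn bound, use a simple exponential covariance (all values below are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
s = rng.uniform(0, 50, size=(n, 2))      # irregular locations in R^2

# pairwise distances ||s_i - s_j||_2
D = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)

K = 5.0                                   # illustrative K(eps_n)
affinity = D < K                          # affinity[i, j] = 1{j in A_i^n}

# With any covariance decaying in distance (exponential here, as a
# stand-in), pairs outside the affinity sets have covariance <= C(K).
Sigma = np.exp(-D)                        # hypothetical covariance matrix
eps = np.exp(-K)
outside_max = Sigma[~affinity].max()      # <= eps by construction
```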

So far, we have defined distances that give affinity sets containing the bulk of the dependence. We also need to ensure that, under the setup of Zhan and Datta (2023), these sets are not so large that they violate our assumptions. To do this, we take $\epsilon_{n}=\omega(\frac{1}{h_{0}^{2}n^{\gamma}})$ for $0<\gamma<1$, with $h_{0}$ the minimum separation distance defined above. Using a packing-number calculation, we see that while this allows $K(\epsilon_{n})$ to grow with $n$, it grows more slowly than $n$. Specifically, we have

\[\Big(\frac{K(\epsilon_{n})}{h_{0}}\Big)^{2}=\Big(\frac{(1/\epsilon_{n})^{-(2+\kappa)}}{h_{0}}\Big)^{2}<\frac{1}{\epsilon_{n}h_{0}^{2}}=o(n).\]

This logic generalizes to dimensions $d\geq 1$, taking $\epsilon_{n}=\omega(\frac{1}{h_{0}^{d}n^{\gamma}})$ for $0<\gamma<1$. Using this construction and assuming bounded third and fourth moments, we check that our Assumptions 1-3 apply.

Letting $K:=K(\epsilon_{n})$, we show that Assumption 1 holds since

\begin{align*}
\sum_{i;j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}[|Z_{i,n}|Z_{j,n}Z_{k,n}] &\leq \sum_{i;j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}[|Z_{i,n}||Z_{j,n}||Z_{k,n}|]\\
&\leq \sum_{i;j,k\in\mathcal{A}_{i}^{n}}\left(\frac{1}{3}\mathrm{E}[|Z_{i,n}|^{3}]+\frac{1}{3}\mathrm{E}[|Z_{j,n}|^{3}]+\frac{1}{3}\mathrm{E}[|Z_{k,n}|^{3}]\right)\\
&= O\left(\sum_{i;j,k\in\mathcal{A}_{i}^{n}}\mathrm{E}[|Z_{i,n}|^{3}]\right)\\
&= O\Big(K^{2}\sum_{i}\mathrm{E}[|Z_{i,n}|^{3}]\Big)=O(K^{2}n)=o(n^{3/2}K^{3/2})=o(\Omega_{n}^{3/2}).
\end{align*}

The second inequality holds by the arithmetic-geometric mean inequality. The remaining argument relies on rearranging the summations and using the growth rate of $K$, i.e., $K=o(n)$ in the fourth equality, as defined above based on the conditions required by Zhan and Datta (2023). The last equality follows since $\Omega_{n}^{3/2}=(\sum_{i}\sum_{j\in\mathcal{A}_{i}^{n}}\mathrm{E}[Z_{i}Z_{j}])^{3/2}$.

We check that Assumption 2 is satisfied using similar arguments and relying on an assumption of finite fourth moment:

\[\sum_{i,j;k\in\mathcal{A}_{i}^{n},l\in\mathcal{A}_{j}^{n}}\mathrm{cov}(Z_{i,n}Z_{k,n},Z_{j,n}Z_{l,n})=O\Big(\sum_{\substack{i,j:|s_{i}-s_{j}|\leq K,\\ k:|s_{k}-s_{i}|\leq K,\ l:|s_{l}-s_{i}|\leq 2K}}\mathrm{var}(Z_{i,n}^{2})\Big)=O(n\cdot K^{3}\cdot\mathrm{E}(Z_{i,n}^{4}))=o(n^{2}K^{2})=o(\Omega_{n}^{2}).\]

The first equality comes from the construction of affinity sets such that the covariance terms within the affinity sets dominate those outside the affinity sets (additional details under Assumption 3). The remaining equalities use arguments similar to those for the previous assumption.

Assumption 3 follows from taking $\epsilon_{n}=\omega(1/(h_{0}^{d}n^{\gamma}))$, where $\gamma=1-\beta$ for arbitrarily small $\beta>0$. Indeed, taking $\epsilon_{n}$ as such, we have $n/K^{1+1/(2+\kappa)}=o(1)$, and thus

\begin{align*}
\sum_{i}\mathrm{E}(|\mathbf{Z}_{-i,n}\mathrm{E}(Z_{i,n}|\mathbf{Z}_{-i,n})|) &=O\left(\sum_{i}\frac{\epsilon_{n}}{2}\mathrm{E}(|\mathbf{Z}_{-i,n}Z_{i,n}|)\right)\\
&\leq O\left(\sum_{i}\sum_{j\notin\mathcal{A}_{i}}\epsilon_{n}\mathrm{E}(|Z_{j,n}Z_{i,n}|)\right)\\
&=O\left(\sum_{i}\sum_{j\notin\mathcal{A}_{i}}\epsilon_{n}\int_{0}^{1}Q_{Z}(u)^{2}\,du\right)\\
&=O(n(n-K)\epsilon_{n})=O(n(n-K)K^{-1/(2+\kappa)})=o(nK)=o(\Omega_{n}),
\end{align*}

where the first line follows from observing that for jointly Gaussian random variables $X_{1},X_{2}$, we can write $\mathbb{E}[X_{2}|X_{1}]=\mu_{2}+\frac{\mathrm{cov}(X_{1},X_{2})}{\mathrm{var}(X_{1})}(X_{1}-\mu_{1})$, with $0<\mathrm{var}(X_{1})<\infty$ because the eigenvalues of the covariance matrix are uniformly bounded in $n$.
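The Gaussian conditional-expectation identity used in the last step can be checked by simulation: the population regression slope $\mathrm{cov}(X_{1},X_{2})/\mathrm{var}(X_{1})$ is recovered from draws of a bivariate normal (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# E[X2 | X1] = mu2 + (cov(X1, X2) / var(X1)) * (X1 - mu1)
slope_theory = Sigma[0, 1] / Sigma[0, 0]                        # 0.4
slope_mc = np.cov(X[:, 0], X[:, 1])[0, 1] / np.var(X[:, 0], ddof=1)
```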

5. Discussion

We have provided an organizing principle for modeling dependency and obtaining a central limit theorem: affinity sets. It allows for non-zero correlation across all random vectors in the triangular array and places focus on correlations within and across sets. These conditions are intuitive and we illustrate their use through some practical applications for applied research. In some cases, as in several of our applied examples, our result is needed as previous conditions do not apply.

We now reflect on settings that our theorem does not cover. For example, the martingale central limit theorem (e.g., Billingsley (1961); Ibragimov (1963); Hall and Heyde (2014), among others) is not covered by our theorem without modification. It admits nontrivial unconditional correlation between all variables, but relies on other structural properties to deduce the result. In fact, proofs of the martingale central limit theorem did not appeal to Stein's method until Röllin (2018), who, by combining the Stein and Lindeberg methods, developed a shorter proof but did not find a direct proof using the Stein technique alone. Some biased processes that do not fall under the martingale umbrella can still yield a central limit theorem if they satisfy the covariance structure we have provided.

References

  • Andrews [1984] D. W. Andrews. Non-strong mixing autoregressive processes. Journal of Applied Probability, 21(4):930–934, 1984.
  • Aronow and Samii [2017] P. M. Aronow and C. Samii. Estimating average causal effects under general interference, with application to a social network experiment. Annals of Applied Statistics, 11(4):1912–1947, 2017. doi: 10.1214/16-AOAS1005.
  • Arratia et al. [1989] R. Arratia, L. Goldstein, and L. Gordon. Two moments suffice for Poisson approximations: the Chen-Stein method. The Annals of Probability, pages 9–25, 1989.
  • Baldi and Rinott [1989] P. Baldi and Y. Rinott. On normal approximations of distributions in terms of dependency graphs. The Annals of Probability, pages 1646–1650, 1989.
  • Barrios et al. [2012] T. Barrios, R. Diamond, G. W. Imbens, and M. Kolesár. Clustering, spatial correlations, and randomization inference. Journal of the American Statistical Association, 107(498):578–591, 2012.
  • Bester et al. [2011] C. A. Bester, T. G. Conley, and C. B. Hansen. Inference with dependent data using cluster covariance estimators. Journal of Econometrics, 165(2):137–151, 2011.
  • Billingsley [1961] P. Billingsley. The Lindeberg-Lévy theorem for martingales. Proceedings of the American Mathematical Society, 12(5):788–792, 1961.
  • Biscio et al. [2018] C. A. N. Biscio, A. Poinas, and R. Waagepetersen. A note on gaps in proofs of central limit theorems. Statistics & Probability Letters, 135:7–10, 2018.
  • Bolthausen [1982] E. Bolthausen. On the central limit theorem for stationary mixing random fields. Annals of Probability, 10:1047–1050, 1982.
  • Bradley [2005] R. C. Bradley. Basic properties of strong mixing conditions. A survey and some open questions. Probability Surveys, 2:107–144, 2005. doi: 10.1214/154957805100000104.
  • Bradley [2007] R. C. Bradley. Introduction to strong mixing conditions, volume 1. Kendrick Press, 2007.
  • Chandrasekhar and Jackson [2024] A. G. Chandrasekhar and M. O. Jackson. A network formation model based on subgraphs. Review of Economic Studies, forthcoming, 2024.
  • Chen [1975] L. H. Chen. Poisson approximation for dependent trials. The Annals of Probability, 3(3):534–545, 1975.
  • Chen and Shao [2004] L. H. Chen and Q.-M. Shao. Normal approximation under local dependence. The Annals of Probability, 32(3):1985–2028, 2004.
  • Conley [1999] T. G. Conley. GMM estimation with cross sectional dependence. Journal of Econometrics, 92(1):1–45, 1999.
  • Conley and Topa [2002] T. G. Conley and G. Topa. Socio-economic distance and spatial patterns in unemployment. Journal of Applied Econometrics, 17(4):303–327, 2002.
  • Cramér and Wold [1936] H. Cramér and H. Wold. Some theorems on distribution functions. Journal of the London Mathematical Society, 1(4):290–294, 1936.
  • Cressie [2015] N. Cressie. Statistics for spatial data. John Wiley & Sons, 2015.
  • Driscoll and Kraay [1998] J. C. Driscoll and A. C. Kraay. Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics, 80(4):549–560, 1998.
  • Froot [1989] K. A. Froot. Consistent covariance matrix estimation with cross-sectional dependence and heteroskedasticity in financial data. Journal of Financial and Quantitative analysis, 24(3):333–355, 1989.
  • Goldstein and Rinott [1996] L. Goldstein and Y. Rinott. Multivariate normal approximations by Stein's method and size bias couplings. Journal of Applied Probability, pages 1–17, 1996.
  • Hall and Heyde [2014] P. Hall and C. C. Heyde. Martingale limit theory and its application. Academic press, 2014.
  • Hoff et al. [2002] P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.
  • Holland et al. [1983] P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
  • Horvitz and Thompson [1952] D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685, 1952.
  • Ibragimov [1963] I. Ibragimov. A central limit theorem for a class of dependent random variables. Theory of Probability & Its Applications, 8(1):83–89, 1963.
  • Jenish and Prucha [2009] N. Jenish and I. R. Prucha. Central limit theorems and uniform laws of large numbers for arrays of random fields. Journal of Econometrics, 150(1):86–98, 2009.
  • Lubold et al. [2023] S. Lubold, A. G. Chandrasekhar, and T. H. McCormick. Identifying the latent space geometry of network models through analysis of curvature. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(2):240–292, 2023.
  • Rinott and Rotar [2000] Y. Rinott and V. Rotar. Normal approximations by Stein's method. Decisions in Economics and Finance, 23:15–29, 2000.
  • Rio [1993] E. Rio. Covariance inequalities for strongly mixing processes. Annales de l’I.H.P. Probabilités et statistiques, 29(4):587–597, 1993. doi: 10.1016/S0246-0203(93)80028-X.
  • Rio [2013] E. Rio. Inequalities and limit theorems for weakly dependent sequences. 2013.
  • Rio [2017] E. Rio. Weakly dependent sequences: theory and applications. Springer, 2017. doi: 10.1007/978-3-319-54235-7.
  • Röllin [2018] A. Röllin. On quantitative bounds in the mean martingale central limit theorem. Statistics & Probability Letters, 138:171–176, 2018.
  • Romano and Wolf [2000] J. P. Romano and M. Wolf. A more general central limit theorem for m-dependent random variables with unbounded m. Statistics & Probability Letters, 47(2):115–124, 2000.
  • Ross [2011] N. Ross. Fundamentals of Stein's method. Probability Surveys, 8:210–293, 2011.
  • Shen et al. [2019] X. Shen, C. Jiang, L. Sakhanenko, and Q. Lu. Asymptotic properties of neural network sieve estimators. arXiv preprint arXiv:1906.00875, 2019.
  • Stein [1986] C. Stein. Approximate computation of expectations. Lecture Notes-Monograph Series, 7:i–164, 1986.
  • Zhan and Datta [2023] W. Zhan and A. Datta. Neural networks for geospatial data. arXiv preprint arXiv:2304.09157, 2023.

Appendix

Proof of Central Limit Theorem 1 and Corollary 1

Recall that Ωn\Omega_{n} is a p×pp\times p matrix with entries

\[\Omega_{n,dd^{\prime}}:=\sum_{i=1}^{n}\sum_{(j,d^{\prime})\in\mathcal{A}^{n}_{(i,d)}}\mathrm{cov}\left(Z_{i,d},Z_{j,d^{\prime}}\right).\]

We start with the case $p=1$, reproducing elements of the proof of Theorem 2 in Chandrasekhar and Jackson [2024]. Let $\Omega_{n}:=\sum_{i=1}^{n}\sum_{j\in\mathcal{A}^{n}_{i}}\mathrm{cov}\left(Z_{i},Z_{j}\right)$.
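For the $p=1$ case, $\Omega_{n}$ is simply the sum of covariances over pairs within affinity sets. A sketch of this computation, given a covariance matrix and a boolean affinity mask (toy numbers are ours):

```python
import numpy as np

def omega_n(Sigma, affinity):
    """Omega_n = sum_i sum_{j in A_i^n} cov(Z_i, Z_j), for p = 1."""
    return float((Sigma * affinity).sum())

Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.1],
                  [0.0, 0.1, 1.5]])

diag_sets = np.eye(3, dtype=bool)          # A_i = {i}: sum of variances
full_sets = np.ones((3, 3), dtype=bool)    # A_i = everyone: var of the sum
```

With $\mathcal{A}_{i}=\{i\}$ this returns the classical normalization $\sum_{i}\mathrm{var}(Z_{i})$; with $\mathcal{A}_{i}$ equal to the whole index set it returns $\mathrm{var}(\sum_{i}Z_{i})$.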

The proof uses Stein’s lemma from Stein [1986].

Lemma .1 (Stein [1986], Ross [2011]).

If YY is a random variable and ZZ has the standard normal distribution, then

\[d_{W}(Y,Z)\leq\sup_{\{f:||f||,||f^{\prime\prime}||\leq 2,\,||f^{\prime}||\leq\sqrt{2/\pi}\}}\left|\mathrm{E}[f^{\prime}(Y)-Yf(Y)]\right|.\]

Further, $d_{K}(Y,Z)\leq(2/\pi)^{1/4}(d_{W}(Y,Z))^{1/2}$.

By this lemma, if we show that a normalized sum of random variables satisfies

\[\sup_{\{f:||f||,||f^{\prime\prime}||\leq 2,\,||f^{\prime}||\leq\sqrt{2/\pi}\}}\left|\mathrm{E}[f^{\prime}(\overline{S}^{n})-\overline{S}^{n}f(\overline{S}^{n})]\right|\rightarrow 0,\]

then $d_{W}(\overline{S}^{n},Z)\rightarrow 0$, and so $\overline{S}^{n}$ is asymptotically standard normal.
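The characterization can be sanity-checked by Monte Carlo: for $Z\sim N(0,1)$ and a smooth bounded $f$, Stein's identity gives $\mathrm{E}[f^{\prime}(Z)-Zf(Z)]=0$. With the illustrative choice $f=\sin$:

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal(1_000_000)

# Stein's identity: E[f'(Z)] = E[Z f(Z)] for Z ~ N(0, 1).
# With f = sin (so f' = cos), the gap should be near zero.
stein_gap = float(np.mean(np.cos(Z)) - np.mean(Z * np.sin(Z)))
```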

The following lemmas are useful in the proof.

Lemma .2 (Chandrasekhar and Jackson [2024], Lemma B.2).

A solution to $\max_{h}\mathrm{E}[Zh(Y)]$ subject to $|h|\leq 1$ (where $h$ is measurable) is $h(Y)={\rm sign}(\mathrm{E}[Z|Y])$, where we break ties by setting ${\rm sign}(\mathrm{E}[Z|Y])=1$ when $\mathrm{E}[Z|Y]=0$.

Lemma .3 (Chandrasekhar and Jackson [2024], Lemma B.3).

E[XYh(Y)]\mathrm{E}[XYh(Y)] when h()h(\cdot) is measurable and bounded by 2π\sqrt{\frac{2}{\pi}} satisfies

\[\mathrm{E}[XYh(Y)]\leq\sqrt{\frac{2}{\pi}}\,\mathrm{E}\left[XY\cdot{\rm sign}(\mathrm{E}[X|Y]Y)\right].\]
Lemma .4 (Cramér-Wold Device, Cramér and Wold [1936]).

The sequence $\{X_{n}\}_{n}$ of random vectors of dimension $p\in\mathbb{N}$ converges weakly to the random vector $X\in\mathbb{R}^{p}$ as $n\to\infty$ if and only if, for any $c\in\mathbb{R}^{p}$,

\[c^{\intercal}X_{n}\rightsquigarrow c^{\intercal}X\]

as $n\to\infty$.

Proof of Theorem 1. The $p=1$ case is Theorem 2 in Chandrasekhar and Jackson [2024], so we reproduce a sketch to provide intuition. By Lemma .1, it is sufficient to show that the appropriate sequence of random variables $\overline{S}^{n}$ satisfies

\[\sup_{\{f:||f||,||f^{\prime\prime}||\leq 2,\,||f^{\prime}||\leq\sqrt{2/\pi}\}}\left|\mathrm{E}[f^{\prime}(\overline{S}^{n})-\overline{S}^{n}f(\overline{S}^{n})]\right|\rightarrow 0.\]

Let

\[S_{i}:=\sum_{j\notin\mathcal{A}_{i}^{n}}Z_{j}\quad\mbox{and}\quad\overline{S}_{i}:=S_{i}/\Omega_{n}^{1/2}.\]

Let $\overline{S}^{n}=S^{n}/\Omega_{n}^{1/2}$, where $S^{n}=\sum_{i}Z_{i}$.

We consider

\[|\mathrm{E}[f^{\prime}(\overline{S}^{n})-\overline{S}^{n}f(\overline{S}^{n})]| \tag{.1}\]

for all ff such that f,f′′2,f2/π||f||,||f^{\prime\prime}||\leq 2,||f^{\prime}||\leq\sqrt{2/\pi}. Observe that

\[\mathrm{E}\left[\overline{S}f\left(\overline{S}\right)\right]=\mathrm{E}\left[\frac{1}{\Omega_{n}^{1/2}}\sum_{i}^{n}Z_{i}\cdot f\left(\overline{S}\right)\right]=\mathrm{E}\left[\frac{1}{\Omega_{n}^{1/2}}\sum_{i}^{n}Z_{i}\left(f\left(\overline{S}\right)-f\left(\overline{S}_{i}\right)\right)\right]+\mathrm{E}\left[\frac{1}{\Omega_{n}^{1/2}}\sum_{i}^{n}Z_{i}\cdot f\left(\overline{S}_{i}\right)\right]. \tag{.2}\]

We first consider the second term above. By a first-order Taylor approximation of $f(\overline{S}_{i})$ around $0$, using $\mathrm{E}[Z_{i}]=0$ and the triangle inequality, we get an upper bound:

\[\left|\mathrm{E}\left[\Omega_{n}^{-1/2}\sum_{i}^{n}Z_{i}\cdot f\left(\overline{S}_{i}\right)\right]\right| \leq \left|\mathrm{E}\left[\Omega_{n}^{-1/2}\sum_{i}^{n}Z_{i}\cdot f\left(0\right)\right]\right|+\left|\mathrm{E}\left[\Omega_{n}^{-1/2}\sum_{i}^{n}Z_{i}\cdot\overline{S}_{i}\,f^{\prime}\left(\tilde{S}_{i}\right)\right]\right|\]

where $\tilde{S}_{i}$ is an intermediate value between $0$ and $\overline{S}_{i}$. The first term is zero since $\mathrm{E}[Z_{i}]=0$. We bound the second term by applying Lemma .3:

\begin{align*}
\left|\mathrm{E}\left[\Omega_{n}^{-1/2}\sum_{i}^{n}Z_{i}\cdot\left(\Omega_{n}^{-1/2}\sum_{j\notin\mathcal{A}_{i}^{n}}Z_{j}\right)f^{\prime}\left(\tilde{S}_{i}\right)\right]\right| &\leq\left|\Omega_{n}^{-1}\sqrt{2/\pi}\cdot\mathrm{E}\left[\sum_{i}Z_{i}\mathbf{Z}_{-i}\cdot\mathrm{sign}(\mathrm{E}[Z_{i}\mid\mathbf{Z}_{-i}]\mathbf{Z}_{-i})\right]\right| \\
&=\left|\Omega_{n}^{-1}\sqrt{2/\pi}\cdot\mathrm{E}\left[\sum_{i}\left|\mathrm{E}\left[Z_{i}\mathbf{Z}_{-i}\mid\mathbf{Z}_{-i}\right]\right|\right]\right| \\
&=\left|\Omega_{n}^{-1}\sqrt{2/\pi}\cdot\mathrm{E}\left[\sum_{i}\left|\mathbf{Z}_{-i}\,\mathrm{E}\left[Z_{i}\mid\mathbf{Z}_{-i}\right]\right|\right]\right|.
\end{align*}

By Assumption 3, this upper bound is $o(1)$. A bound of this form is needed because, in principle, any two random variables may be correlated for a given $n$.
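To see why this is naturally a covariance-type restriction, the following sketch uses only that $\mathrm{E}[Z_{i}]=0$, the tower property, and Jensen's inequality (the notation matches the display above):
\[
\left|\mathrm{Cov}(Z_{i},\mathbf{Z}_{-i})\right|=\left|\mathrm{E}\left[Z_{i}\mathbf{Z}_{-i}\right]\right|=\left|\mathrm{E}\left[\mathbf{Z}_{-i}\,\mathrm{E}[Z_{i}\mid\mathbf{Z}_{-i}]\right]\right|\leq\mathrm{E}\left[\left|\mathbf{Z}_{-i}\,\mathrm{E}[Z_{i}\mid\mathbf{Z}_{-i}]\right|\right],
\]
so the quantity controlled by Assumption 3 dominates $\Omega_{n}^{-1}\sum_{i}\left|\mathrm{Cov}(Z_{i},\mathbf{Z}_{-i})\right|$; requiring the former to vanish is a high-level restriction on overall correlation.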

Therefore, (.2) is simply

\[
\mathrm{E}\left[\frac{1}{\Omega_{n}^{1/2}}\sum_{i}^{n}Z_{i}\left(f\left(\overline{S}\right)-f\left(\overline{S}_{i}\right)\right)\right]+o(1).
\]

Now, plugging this expression into (.1), we obtain an upper bound via the triangle inequality (following reasoning similar to that in Ross [2011]). We then apply a second-order Taylor approximation, the bounds on the derivatives of $f$, and the Cauchy-Schwarz inequality, together with Assumptions 1 and 2. In both of these pieces, rather than relying on conditional independence and applying the arithmetic-geometric mean inequality to write conditions in terms of moment restrictions (which cannot be used under the general dependency structure), we collect covariance terms.
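As a sketch of this step (the exact remainder control follows the full argument), a second-order Taylor expansion of $f(\overline{S})$ about $\overline{S}_{i}$ gives, for some intermediate value $\xi_{i}$ between $\overline{S}_{i}$ and $\overline{S}$,
\[
f\left(\overline{S}\right)-f\left(\overline{S}_{i}\right)=\left(\overline{S}-\overline{S}_{i}\right)f^{\prime}\left(\overline{S}_{i}\right)+\frac{1}{2}\left(\overline{S}-\overline{S}_{i}\right)^{2}f^{\prime\prime}\left(\xi_{i}\right),\quad\mbox{with}\quad\overline{S}-\overline{S}_{i}=\Omega_{n}^{-1/2}\sum_{j\in\mathcal{A}_{i}^{n}}Z_{j},
\]
so the first term of (.2) pairs against sums of covariance terms $\Omega_{n}^{-1}\mathrm{E}[Z_{i}Z_{j}]$ with $j\in\mathcal{A}_{i}^{n}$, while the remainder is controlled using $||f^{\prime\prime}||\leq 2$ and Cauchy-Schwarz.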

Therefore, we have shown the convergence $\overline{S}^{n}\operatorname*{\rightsquigarrow}N(0,1)$ in each dimension. Now we consider the multidimensional setting, and let $Y=(Y_{1},Y_{2},\ldots,Y_{p})$ be a mean-zero normally distributed random vector whose covariance is the $p$-dimensional identity matrix. By the Cramér-Wold device (Lemma .4), it is sufficient to show that

\begin{equation}
\sum_{u=1}^{p}c_{u}\sum_{i=1}^{n}\sum_{k=1}^{p}Z_{ik}({\Omega_{n}}^{-1/2})_{ku}\operatorname*{\rightsquigarrow}\sum_{u=1}^{p}c_{u}\sum_{i=1}^{n}Y_{iu} \tag{.3}
\end{equation}

for all $c\in\mathbb{R}^{p}$.

But, from the proof above, we see that for each $u\in[p]$,

\[
\sum_{i=1}^{n}\sum_{k=1}^{p}Z_{ik}({\Omega_{n}}^{-1/2})_{ku}\operatorname*{\rightsquigarrow}\sum_{i=1}^{n}Y_{iu}.
\]

It immediately follows that (.3) is satisfied, so we have shown $(\Omega_{n})^{-1/2}S^{n}\operatorname*{\rightsquigarrow}\mathcal{N}(0,I_{p\times p})$. $\square$

Proof sketch of Corollary 1. We refer the reader to the proof of Corollary 2 in Chandrasekhar and Jackson [2024]. Applying the Cramér-Wold device just as above gives the result.