
Rate-Optimal Cluster-Randomized Designs for Spatial Interference

Michael P. Leung, Department of Economics, UC Santa Cruz
Abstract

We consider a potential outcomes model in which interference may be present between any two units but the extent of interference diminishes with spatial distance. The causal estimand is the global average treatment effect, which compares outcomes under the counterfactuals that all or no units are treated. We study a class of designs in which space is partitioned into clusters that are randomized into treatment and control. For each design, we estimate the treatment effect using a Horvitz-Thompson estimator that compares the average outcomes of units with all or no neighbors treated, where the neighborhood radius is of the same order as the cluster size dictated by the design. We derive the estimator’s rate of convergence as a function of the design and degree of interference and use this to obtain estimator-design pairs that achieve near-optimal rates of convergence under relatively minimal assumptions on interference. We prove that the estimators are asymptotically normal and provide a variance estimator. For practical implementation of the designs, we suggest partitioning space using clustering algorithms.

Keywords: causal inference, interference, experimental design, spatial dependence

1 Introduction

Consider a population of $n$ experimental units. Denote by $Y_{i}(\bm{d})$ the potential outcome of unit $i$ under the counterfactual that the population is assigned treatments according to the vector $\bm{d}=(d_{i})_{i=1}^{n}\in\{0,1\}^{n}$, where $d_{i}=1$ ($d_{i}=0$) implies unit $i$ is assigned to treatment (control). Treatments assigned to alters can influence the ego since $Y_{i}(\bm{d})$ is a function of $d_{j}$ for $j\neq i$, which is known as interference.

An important estimand of practical interest is the global average treatment effect

\theta_{n}=\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}\big(Y_{i}(\bm{1}_{n})-Y_{i}(\bm{0}_{n})\big),

where $\bm{1}_{n}$ ($\bm{0}_{n}$) is the $n$-dimensional vector of ones (zeros). This compares average outcomes under the counterfactuals that all or no units are treated. Each average can only be directly observed under an extreme design that assigns all units to the same treatment arm, which would necessarily preclude observation of the other counterfactual. Common designs used in the literature, including those studied here, assign different units to different treatment arms, so neither average is directly observed in the data. Nonetheless, we show that asymptotic inference on $\theta_{n}$ is possible for a class of cluster-randomized designs under spatial interference in which the degree of interference diminishes with distance.
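To fix ideas, the estimand can be computed directly in settings where all potential outcomes are known, as in a simulation. A minimal Python sketch, using a hypothetical additive outcome model (purely for illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical potential-outcome model for illustration only:
# unit i's outcome is a baseline plus a direct effect of its own treatment.
baseline = rng.normal(size=n)
direct_effect = 2.0

def potential_outcome(i, d):
    """Y_i(d) for a counterfactual assignment vector d in {0,1}^n."""
    return baseline[i] + direct_effect * d[i]

ones, zeros = np.ones(n, dtype=int), np.zeros(n, dtype=int)

# Global average treatment effect: mean of Y_i(1_n) - Y_i(0_n).
theta_n = np.mean([potential_outcome(i, ones) - potential_outcome(i, zeros)
                   for i in range(n)])
print(theta_n)  # equals direct_effect = 2.0 in this no-interference toy model
```

Under interference, $Y_i(\bm{d})$ would depend on the full vector $\bm{d}$, and neither counterfactual average is observable from a single experiment, which is the central difficulty the paper addresses.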

Many phenomena diffuse primarily through physical interaction. The government of a large city may wish to compare the effect of two different policing strategies on crime, but more intensive policing in one neighborhood may displace crime to adjacent neighborhoods [8, 49]. A rideshare company may wish to compare the performance of two different pricing algorithms, but these may induce behavior that generates spatial externalities, such as congestion. Other phenomena exhibiting spatial interference include infectious diseases in animal [14] and human [35] populations, pollution [18], and environmental conservation programs [36].

Much of the existing literature assumes that interference is summarized by a low-dimensional exposure mapping and that units are individually randomized into treatment or control either via Bernoulli or complete randomization [e.g. 3, 7, 16, 34, 45]. Jagadeesan et al. [23] and Ugander et al. [46] also utilize exposure mappings but depart from unit-level randomization. They propose new designs that introduce cluster dependence in unit-level assignments in order to improve estimator precision. We build on this literature by (1) studying rate-optimal choices of both cluster-randomized designs and Horvitz-Thompson estimators, (2) avoiding exposure mapping restrictions on interference, which can be quite strong [15], and (3) developing a distributional theory for the estimator and a variance estimator.

Regarding (2), most exposure mappings used in the literature imply that only units within a small, known distance from the ego can interfere with the ego’s outcome. We instead study a weaker restriction on interference, similar to [31], which states that the degree of interference decreases with distance but does not necessarily zero out at any given distance. This is analogous to the widespread use of mixing-type conditions in the time series and spatial literatures in place of $m$-dependence, since the latter rules out interesting forms of autocorrelation, including models as basic as the AR(1).

Regarding (1), we study cluster-randomized designs in which units are partitioned into spatial clusters, clusters are independently randomized into treatment and control, and the assignment of a unit is dictated by the assignment of its cluster. By introducing correlation in assignments, such designs can avoid overlap problems common under Bernoulli randomization, which improves the rate of convergence. For analytical tractability, we focus on designs in which clusters are equally sized squares, each design distinguished by the number of such squares. We pair each design with a Horvitz-Thompson estimator that compares the average outcomes of units with all or no treated neighbors, where the neighborhood radius is of the same order as the cluster size dictated by the design. See Figure 1 for a depiction of a hypothetical design and neighborhoods used to construct the estimator.

Our results inform how the analyst should choose the number of clusters (and hence the cluster size and neighborhood radius of the estimator) to obtain the best possible rate of convergence of the estimator. Notably, existing work on cluster randomization with interference utilizes small clusters (those with asymptotically bounded size). We show that such designs are generally asymptotically biased under the weaker restriction on interference we impose, which motivates the large-cluster designs we study.

Figure 1: White and black dots depict units. White squares correspond to clusters and gray squares to neighborhoods of black units used to construct the estimator.

Finally, regarding (3), we show that the estimator is asymptotically normal and provide a variance estimator. These results appear to be novel, as no existing central limit theorems seem to apply to our setup in which treatments exhibit cluster dependence, clusters can be large, and units in different clusters are spatially dependent due to interference. As usual, the variance estimator is biased due to heterogeneity in unit-level treatment effects. However, we show that, in a superpopulation setting in which potential outcomes are weakly spatially dependent, the bias is asymptotically negligible.

Based on our theory, we provide practical recommendations for implementing cluster-randomized designs in § 3.3. Of course, rate-optimality results do not determine the choice of nonasymptotic constants that are often important in practice under smaller sample sizes. Still, they constitute an important first step toward designing practical procedures. Due to the generality of the setting, which imposes quite minimal assumptions on interference, it seems reasonable to first study rate-optimality, as finite-sample optimality appears to require substantial additional structure on the problem. We note that existing results on graph cluster randomization, which require stronger restrictions on interference than this paper, are nonetheless limited to rates, and how “best” to construct clusters in practice has been an open question.

1.1 Related Literature

Most of the literature supposes interference is mediated by a network. Studying optimal design in this setting is difficult because network clusters can be highly heterogeneous in topology, and their graph-theoretic properties can closely depend on the generative model of the network [32]. We study spatial interference, and to make the optimal design problem analytically tractable, we focus on a class of designs that partitions space into equally sized squares while exploring in simulations the performance of more realistic designs that partition using clustering algorithms. We discuss in § 6 the (pessimistic) prospects of extending our approach to network interference.

There is relatively little work on optimal experimental design under interference. Viviano [50] proposes variance-minimizing two-wave experiments under network interference. Baird et al. [5] study the power of randomized saturation designs under partial interference.

A recent literature studies designs for interference that depart from unit-level randomization. A key paper motivating our work is [46], who propose graph cluster randomization designs under network interference. Ugander and Yin [47] study a new variant of these designs, and [19] consider related designs for bipartite experiments. These papers assume interference is summarized by exposure mappings, which enables the construction of unbiased estimators and use of designs in which clusters are small. Under our weaker restriction on interference, we show that large clusters are required to reduce bias, which creates a bias-variance trade-off.

Eckles et al. [15] show that graph cluster randomization can reduce the bias of common estimators for $\theta_{n}$ in the absence of correctly specified exposure mappings. Pouget-Abadie et al. [40] propose two-stage cluster-randomized designs to minimize bias under a monotonicity restriction on interference. Several papers [6, 23, 44] study linear potential outcome models and propose designs targeting the direct average treatment effect, rather than $\theta_{n}$. Under a normal-sum model, [6] compute the mean-squared error of the difference-in-means estimator, which they use to suggest model-assisted designs.

The aforementioned papers on cluster randomization target global effects such as $\theta_{n}$ [also see 9, 10]. Much of the literature on interference considers fundamentally different estimands defined by exposure mappings. When these mappings are misspecified, the estimands become functions of assignment probabilities, in which case their interpretations can be specific to the experiment run [42, 43]. Hu et al. [21] (§5) view this as “largely unavoidable” in nonparametric settings with interference. Our results show that inference on $\theta_{n}$, which avoids this issue, is possible under restrictions on interference weaker than those typically used in the literature. Additionally, papers in the literature impose an overlap assumption, which implicitly restricts the estimand [31]. We study cluster-randomized designs that directly satisfy overlap.

There is a large literature on cluster-randomized trials [e.g. 20, 37]. This literature predominantly studies partial interference, meaning that units are divided into clusters such that those in distinct clusters do not interfere. That is, the clusters themselves impose restrictions on interference. In our setting, clusters are determined by the design and do not restrict interference.

Finally, [4], [39], and [51] study spatial interference in a different “bipartite” setting in which treatments are assigned to units or locations that are distinct from the units whose outcomes are of interest. This shares some similarities with spatial cluster randomization, where different spatial regions are randomized into treatment, so some of the ideas here may be applicable to optimal design there.

1.2 Outline

The next section defines the model of spatial interference and the class of designs and estimators studied. In § 3, we derive the estimator’s rate of convergence, discuss rate-optimal designs, and provide practical design recommendations. In § 4, we prove that the estimator is asymptotically normal, propose a variance estimator, and characterize its asymptotic properties. We report results from a simulation study in § 5, exploring the use of spectral clustering to implement the designs. Finally, § 6 concludes.

2 Setup

Let $\mathcal{N}_{n}$ be a set of $n$ units. We study experiments in which units are cluster-randomized into treatment and control, postponing to § 2.2 the specifics of the design. For each $i\in\mathcal{N}_{n}$, let $D_{i}$ be a binary random variable where $D_{i}=1$ indicates that $i$ is assigned to treatment and $D_{i}=0$ indicates assignment to control. Let $\bm{D}=(D_{i})_{i\in\mathcal{N}_{n}}$ be the vector of realized treatments, and let $\bm{d}=(d_{i})_{i\in\mathcal{N}_{n}}\in\{0,1\}^{n}$ denote a non-random vector of counterfactual treatments. Recall from § 1 that $Y_{i}(\bm{d})$ is the potential outcome of unit $i$ under the counterfactual that units are assigned treatments according to $\bm{d}$. Formally, for each $n\in\mathbb{N}$ and $i\in\mathcal{N}_{n}$, $Y_{i}(\cdot)$ is a non-random function from $\{0,1\}^{n}$ to $\mathbb{R}$. We denote $i$’s factual, or observed, outcome by $Y_{i}=Y_{i}(\bm{D})$ and maintain the standard assumption that potential outcomes are uniformly bounded.

Assumption 1 (Bounded Outcomes).

$\sup_{n\in\mathbb{N}}\max_{i\in\mathcal{N}_{n}}\max_{\bm{d}\in\{0,1\}^{n}}\lvert Y_{i}(\bm{d})\rvert<\infty$.

2.1 Spatial Interference

Thus far, the model allows for unrestricted interference in the sense that $Y_{i}(\bm{d})$ may vary essentially arbitrarily in any component of $\bm{d}$. To obtain a positive result on asymptotic inference, it is necessary to restrict interference so as to establish some form of weak dependence across unit outcomes. The existing literature primarily focuses on restrictions captured by $K$-neighborhood exposure mappings, which imply that $D_{j}$ can only interfere with $Y_{i}(\bm{D})$ if the distance between $i$ and $j$ is at most $K$. We discuss below why this assumption is potentially restrictive and establish results under weaker conditions.

We assume each unit is located in $\mathbb{R}^{2}$. Label each unit by its location, so that $\mathcal{N}_{n}\subset\mathbb{R}^{2}$, and equip this space with the sup metric $\rho(i,j)=\max_{t=1,2}\lvert i_{t}-j_{t}\rvert$ for $i=(i_{1},i_{2})$ and $j=(j_{1},j_{2})$ in $\mathbb{R}^{2}$. Let $Q(i,r)=\{j\in\mathbb{R}^{2}\colon\rho(i,j)\leq r\}$, the ball of radius $r$ centered at $i$. Under the sup metric, balls are squares, and the radius is half the side length of the square. Letting $\bm{0}$ denote the origin, we consider a sequence of population regions $\{Q(\bm{0},R_{n})\}_{n\in\mathbb{N}}$ such that

\mathcal{N}_{n}\subset Q(\bm{0},R_{n}),\quad\text{where}\quad R_{n}n^{-1/2}\rightarrow c\in(0,\infty).

That is, units are located in the square $Q(\bm{0},R_{n})$ with growing radius $R_{n}$. Combined with the next increasing-domain assumption, the number of units in the region is $O(n)$, but throughout, we will simply assume the number is exactly $n$.

Assumption 2 (Increasing Domain).

There exists $\rho_{0}>0$ such that, for any $n\in\mathbb{N}$ and distinct $i,j\in\mathcal{N}_{n}$, $\rho(i,j)\geq\rho_{0}$.

This allows units to be arbitrarily irregularly spaced, subject to being minimally separated by some distance $\rho_{0}$, a widely used sampling framework in the spatial literature [e.g. 25]. In contrast, “infill” asymptotic approaches, which do not require minimal separation and instead assume increasingly dense sampling from a fixed region, can yield nonstandard limiting behavior [28]. For some applications, the spatial distribution of units may exhibit “hotspots” with unusually high densities, perhaps making the infill approach more plausible. Some work adopts a hybrid of the two approaches [29, 30], and it may be possible to extend our results to this framework.

Let $\mathbb{R}_{+}$ denote the set of non-negative reals, and let

\mathcal{N}(i,K)=Q(i,K)\cap\mathcal{N}_{n}

denote the $K$-neighborhood of $i$. We study the following model of interference, similar to that proposed by [31].

Assumption 3 (Interference).

There exists a non-increasing function $\psi\colon\mathbb{R}_{+}\rightarrow\mathbb{R}_{+}$ such that $\psi(0)\in(0,\infty)$, $\sum_{s=1}^{\infty}s\,\psi(s)<\infty$, and, for all $s\in\mathbb{R}_{+}$,

\sup_{n\in\mathbb{N}}\max_{i\in\mathcal{N}_{n}}\max\big\{\lvert Y_{i}(\bm{d})-Y_{i}(\bm{d}^{\prime})\rvert\colon\bm{d},\bm{d}^{\prime}\in\{0,1\}^{n},\,d_{j}=d_{j}^{\prime}\,\,\forall j\in\mathcal{N}(i,s)\big\}\leq\psi(s).

To interpret this, observe that the inner maximum ranges over pairs of treatment assignment vectors that fix the assignments of units in $i$’s $s$-neighborhood but allow assignments to vary freely outside of this neighborhood. It therefore measures the degree of spatial interference in terms of the maximum change to $i$’s potential outcome caused by manipulating treatments assigned to units $k$ “distant” from $i$ in the sense that $\rho(i,k)>s$. The assumption requires the degree of interference to vanish with the neighborhood radius $s$, so that treatments assigned to more distant alters interfere less with the ego. The rate at which interference vanishes is controlled by $\psi(s)$, which is required to decay faster than $s^{-2}$.

Remark 1 (Necessity of rate condition).

Assumption 3(b) of [25] and Assumption 4(c) of [27] impose the same minimum rate of decay as Assumption 3 on various measures of spatial dependence (mixing or near-epoch dependence coefficients) to establish central limit theorems. If the rate is slower, then the variance can be infinite. For example, consider a spatial process $\{Z_{i}\}_{i\in\mathcal{N}_{n}}$ such that units are positioned on the integer lattice $\mathbb{Z}^{2}$ and $\text{Cov}(Z_{i},Z_{j})=f(\rho(i,j))$ for some function $f(\cdot)$ and any $i,j$. Then

\text{Var}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}Z_{i}\right)=\frac{1}{n}\sum_{i=1}^{n}\sum_{s=0}^{\infty}\sum_{j\colon\rho(i,j)=s}\text{Cov}(Z_{i},Z_{j})=\sum_{s=0}^{\infty}f(s)\frac{1}{n}\sum_{i=1}^{n}\lvert\{j\colon\rho(i,j)=s\}\rvert.

Note that $\lvert\{j\colon\rho(i,j)=s\}\rvert\leq 8s$ for $s\geq 1$, with equality achieved for all $i$ not near the boundary of the population region. Thus, for large $n$, a finite variance requires that $f(s)$ decay with $s$ faster than $s^{-2}$.
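The lattice count and the resulting convergence threshold can be checked numerically. The sketch below verifies that exactly $8s$ lattice points lie at sup-distance $s$ from the origin, and that the variance sum $\sum_s 8s\,f(s)$ with $f(s)=s^{-\gamma}$ converges only when $\gamma>2$:

```python
import math

# Lattice points at sup-distance exactly s from the origin: the boundary of a
# (2s+1) x (2s+1) square has (2s+1)^2 - (2s-1)^2 = 8s points.
def boundary_count(s):
    return sum(1 for x in range(-s, s + 1) for y in range(-s, s + 1)
               if max(abs(x), abs(y)) == s)

for s in [1, 2, 5]:
    assert boundary_count(s) == 8 * s

# Partial sums of 8s * f(s) with f(s) = s^(-gamma).
def partial(gamma, S):
    return sum(8 * s * s ** (-gamma) for s in range(1, S))

print(partial(3.0, 10_000))  # converges: approaches 8 * zeta(2) ≈ 13.16
print(partial(2.0, 10_000))  # grows like 8 * log(S): diverges as S -> infinity
```

This mirrors the calculation in the remark: the number of terms at distance $s$ grows linearly, so a decay rate of exactly $s^{-2}$ leaves a logarithmically divergent sum.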

We next discuss two models of interference satisfying Assumption 3. The first is the standard approach of specifying a $K$-neighborhood exposure mapping. Such a mapping is given by $T_{i}=T(i,\bm{D},\mathcal{N}_{n})$ with the crucial property that its dimension does not depend on $n$, unlike that of $\bm{D}$. The approach is to assume that the low-dimensional $T_{i}$ summarizes interference by reparameterizing potential outcomes as

Y_{i}(\bm{D})=\tilde{Y}_{i}(T_{i}). (1)

That is, once we fix $i$’s exposure mapping $T_{i}$, its potential outcome is fully determined. No less important, it is also typically assumed that exposure mappings are restricted to a unit’s $K$-neighborhood, where $K$ is small, meaning fixed with respect to $n$. Formally, $T(i,\bm{d},\mathcal{N}_{n})=T(i,\bm{d}^{\prime},\mathcal{N}_{n})$ for any $\bm{d},\bm{d}^{\prime}$ such that $d_{j}=d_{j}^{\prime}$ for all $j\in\mathcal{N}(i,K)$, which implies that the treatment assigned to a unit $j$ only interferes with $i$ if $\rho(i,j)\leq K$. In practice, choices with $K=1$ are most common, for example $T_{i}=(D_{i},S_{i})$ for $S_{i}=\bm{1}\{\sum_{j\in\mathcal{N}_{n}}G_{ij}D_{j}>0\}$ or $S_{i}=\sum_{j\in\mathcal{N}_{n}}G_{ij}D_{j}$, where $G_{ij}=\bm{1}\{\rho(i,j)\leq 1\}$. In these examples, $D_{i}$ captures the direct effect of the treatment, and $S_{i}$ captures interference from units near $i$.

Crucially, $T_{i}$ and $K$ must be known to the analyst in this approach, which is often a strong requirement. In contrast, Assumption 3 enables the analyst to impose (1) while requiring neither to be known. Indeed, if there exists a $K$-neighborhood exposure mapping satisfying (1), then Assumption 3 holds with $\psi(s)=c\,\bm{1}\{s\leq K\}$ for some sufficiently large $c$.

Furthermore, Assumption 3 allows for more complex forms of interference, ruled out by (1), in which interference decays smoothly with distance rather than being truncated at some distance $K$. The former is analogous to mixing conditions, which are widespread in the time series and spatial literatures, while the latter is analogous to $m$-dependence, which rules out interesting forms of autocorrelation, including models as basic as the AR(1).

In the spatial context, our assumption accommodates, for example, the Cliff-Ord autoregressive model [11, 12], which is a workhorse model of spatial autocorrelation used in a variety of fields, including geography [17], ecology [48], and economics [2]. A typical formulation of the model is

Y_{i}=\alpha+\lambda\sum_{j\in\mathcal{N}_{n}}W_{ij}Y_{j}+D_{i}\beta+\varepsilon_{i},

where we assume $\varepsilon_{i}$ is uniformly bounded to satisfy Assumption 1. Let $\bm{W}$ be the $n\times n$ spatial weight matrix whose $ij$th entry is $W_{ij}$. These weights typically decay with distance $\rho(i,j)$ in a sense made precise below. While this model is highly stylized, the important aspect it captures is autocorrelation through the spatial autoregressive parameter $\lambda$. If this is nonzero, then there is no $K$-neighborhood exposure mapping for which (1) holds, a point previously noted by [15] in the context of network interference.

To see this, first note that coherency of the model requires nonsingularity of $\bm{I}-\lambda\bm{W}$, where $\bm{I}$ is the $n\times n$ identity matrix. Let $\bm{V}$ be the inverse of this matrix and $V_{ij}$ its entry corresponding to units $(i,j)$. Then the reduced form of the model is

Y_{i}(\bm{D})\equiv Y_{i}=\sum_{j\in\mathcal{N}_{n}}V_{ij}(\alpha+D_{j}\beta+\varepsilon_{j}), (2)

a spatial “moving average” model with spatial weight matrix $\bm{V}$. (See § 5 for some examples of $\bm{V}$.) Noticeably, $Y_{i}(\bm{D})$ can potentially depend on $D_{j}$ for any $j\in\mathcal{N}_{n}$, which is ruled out if one imposes a $K$-neighborhood exposure mapping.
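The reduced form (2) is straightforward to simulate. The sketch below uses a hypothetical inverse-distance, row-normalized weight matrix (one concrete choice for illustration, not the paper's) and shows that flipping a single unit's treatment moves every outcome, so no $K$-neighborhood exposure mapping can hold:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
locations = rng.uniform(0, 20, size=(n, 2))
rho = np.abs(locations[:, None, :] - locations[None, :, :]).max(axis=2)

# Hypothetical row-normalized inverse-distance weights, zero on the diagonal.
W = 1.0 / (1.0 + rho)
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

alpha, beta, lam = 1.0, 2.0, 0.3   # |lam| < 1 keeps I - lam*W nonsingular here
eps = rng.uniform(-1, 1, size=n)   # bounded errors, in line with Assumption 1

# Reduced form: Y(D) = V (alpha + beta*D + eps) with V = (I - lam*W)^{-1}.
V = np.linalg.inv(np.eye(n) - lam * W)

def Y(D):
    return V @ (alpha + beta * np.asarray(D) + eps)

# Flip one unit's treatment and observe that every outcome changes,
# because every off-diagonal entry of V is nonzero in this example.
D0 = np.zeros(n)
D1 = D0.copy()
D1[0] = 1.0
print(np.abs(Y(D1) - Y(D0)).min())  # strictly positive: global interference
```

Because $\bm{V}=\sum_{k\geq 0}\lambda^{k}\bm{W}^{k}$ has all positive entries here, manipulating any $D_j$ perturbs every $Y_i$, exactly the behavior that rules out truncated exposure mappings.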

Outcomes satisfying (2) are near-epoch dependent, a notion of weak spatial dependence, when the weights decay with spatial distance in the following sense:

\sup_{n\in\mathbb{N}}\max_{i\in\mathcal{N}_{n}}\sum_{j\in\mathcal{N}_{n}}\lvert V_{ij}\rvert\rho(i,j)^{\gamma}<\infty (3)

for some $\gamma>0$ [see Proposition 5 and eq. (13) of 26]. The next result shows that this condition is sufficient for verifying Assumption 3 if $\gamma>2$.

Proposition 1.

Suppose potential outcomes are given by (2) and spatial weights satisfy (3) for some $\gamma>2$. Then Assumption 3 holds with $\psi(s)=c\,\min\{s^{-\gamma},1\}$ for some $c\in(0,\infty)$ that does not depend on $s$.

Proof.

Fix $s\geq 0$ and any $\bm{d},\bm{d}^{\prime}\in\{0,1\}^{n}$ such that $d_{j}=d_{j}^{\prime}$ for all $j\in\mathcal{N}(i,s)$. For $s<1$, $\lvert Y_{i}(\bm{d})-Y_{i}(\bm{d}^{\prime})\rvert\leq c_{1}\equiv 2\sup_{n}\max_{i}\max_{\bm{d}}\lvert Y_{i}(\bm{d})\rvert$. For $s\geq 1$,

\lvert Y_{i}(\bm{d})-Y_{i}(\bm{d}^{\prime})\rvert\leq\lvert\beta\rvert\sum_{j\in\mathcal{N}_{n}}\lvert V_{ij}\rvert\lvert d_{j}-d_{j}^{\prime}\rvert\bm{1}\{\rho(i,j)>s\}\leq s^{-\gamma}\lvert\beta\rvert\sum_{j\in\mathcal{N}_{n}}\lvert V_{ij}\rvert\rho(i,j)^{\gamma}.

Defining $c_{2}=\sup_{n}\max_{i}\sum_{j\in\mathcal{N}_{n}}\lvert\beta\rvert\lvert V_{ij}\rvert\rho(i,j)^{\gamma}$ and $c=\max\{c_{1},c_{2}\}+1>0$, the inequality in Assumption 3 holds with $\psi(s)=c\,\min\{s^{-\gamma},1\}$ by construction. Furthermore, $c<\infty$ by (3) and uniform boundedness of $\{\varepsilon_{i}\}_{i\in\mathcal{N}_{n}}$, and $\psi(s)$ satisfies $\sum_{s=1}^{\infty}s\,\psi(s)<\infty$ because $\gamma>2$. ∎

This result shows that, unlike the standard approach of imposing a $K$-neighborhood exposure mapping, Assumption 3 can allow for richer forms of interference in which alters arbitrarily distant from the ego can interfere with the ego’s response.

Remark 2 (Literature).

Assumption 3 and Proposition 1 are spatial analogs of Assumptions 4 and 6 and Proposition 1 of [31], who studies interference mediated by an unweighted network, Bernoulli designs, and a different class of estimands defined by exposure mappings satisfying overlap. We study the global average treatment effect and cluster-randomized designs that induce overlap by introducing dependence in treatment assignments, and we further derive rate-optimal designs. These differences require an entirely distinct asymptotic theory.

2.2 Design and Estimator

Much of the literature on interference considers designs in which units are individually randomized into treatment and control, either via Bernoulli or complete randomization. A common problem faced by such designs is limited overlap, meaning that some realizations of the exposure mapping occur with low probability. For example, suppose that (1) holds with exposure mapping $T_{i}=\sum_{j\in\mathcal{N}_{n}}\bm{1}\{\rho(i,j)\leq K\}D_{j}$, the number of treated units in $i$’s $K$-neighborhood. Then in a Bernoulli design, for large values of $K$, ${\bf P}(T_{i}=0)$ is small, tending to zero with $K$ at an exponential rate. This is problematic for a Horvitz-Thompson estimator such as $n^{-1}\sum_{i\in\mathcal{N}_{n}}(p_{i}(t)^{-1}\bm{1}\{T_{i}=t\}-p_{i}(t^{\prime})^{-1}\bm{1}\{T_{i}=t^{\prime}\})Y_{i}$, where $p_{i}(t)={\bf P}(T_{i}=t)$, since its variance grows rapidly with $K$ if either $t$ or $t^{\prime}$ is zero. Ugander et al. [46] propose cluster-randomized designs, which mitigate this problem by deliberately introducing dependence in treatment assignments across certain units.

We consider the following class of such designs. We assign units to mutually exclusive clusters by partitioning the population region $Q(\bm{0},R_{n})$ into $m_{n}\leq n$ equally sized squares, assuming for simplicity that $m_{n}\in\{4^{s}\colon s\in\mathbb{N}\}$. That is, to obtain increasingly more clusters, we first divide the population region into four squares, then divide each of these squares into four squares, and so on, as in Figure 2. Label the $m_{n}$ squares $Q_{1},\dots,Q_{m_{n}}$, and call

C_{k}=Q_{k}\cap\mathcal{N}_{n}

cluster $k$. Then the number of units in each cluster is uniformly $O(n/m_{n})$ under Assumption 2, and the radius of each cluster is

r_{n}\equiv R_{n}/\sqrt{m_{n}}=\sqrt{n/m_{n}}+o(1),

which we assume is greater than 1. We also assume there are no units on the common boundaries of different squares, so the squares partition $\mathcal{N}_{n}$.
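The square partition and cluster assignment can be sketched as follows, with hypothetical uniform unit locations; each unit inherits the label of the square containing it, and each square is randomized independently:

```python
import numpy as np

rng = np.random.default_rng(3)
n, R = 200, 10.0          # population region Q(0, R) is the square [-R, R]^2
locations = rng.uniform(-R, R, size=(n, 2))

s = 2
m = 4 ** s                # number of clusters, m_n in {4^s : s in N}
per_side = 2 ** s         # region splits into per_side x per_side squares
r = R / np.sqrt(m)        # cluster radius r_n = R_n / sqrt(m_n)

# Map each unit to the square (cluster) containing it; each square has
# side length 2r, and min() guards units on the upper boundary.
def cluster_of(loc):
    col = min(int((loc[0] + R) / (2 * r)), per_side - 1)
    row = min(int((loc[1] + R) / (2 * r)), per_side - 1)
    return row * per_side + col

labels = np.array([cluster_of(loc) for loc in locations])

# Cluster-randomized design: each cluster is treated independently w.p. p,
# and every unit inherits its cluster's assignment.
p = 0.5
cluster_treat = rng.binomial(1, p, size=m)
D = cluster_treat[labels]
```

The paper's § 3.3 recommends clustering algorithms for irregular regions; the equal-square partition here matches the analytically tractable class studied in the theory.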

Figure 2: Sequence of population regions and clusters as $n\rightarrow\infty$.

A cluster-randomized design first independently assigns each cluster to treatment or control with some probability $p\in(0,1)$ that is fixed with respect to $n$. Then within a treated (control) cluster $C_{k}$, all $i\in C_{k}$ are assigned $D_{i}=1$ ($D_{i}=0$). To emphasize that we use this design in later theorems, we state it as a separate assumption.

Assumption 4 (Design).

For any $n$, $\bm{D}$ is realized according to a cluster-randomized design with $m_{n}$ clusters constructed as above.

Note that $m_{n}$ will be required to diverge with $n$, since a large number of clusters is needed for the estimator to concentrate. If $m_{n}$ is of order $n$, then $r_{n}=O(1)$, so clusters are asymptotically bounded in size, the usual case studied in the literature, which includes unit-level Bernoulli randomization as a special case. If $m_{n}$ is of smaller order, then cluster sizes grow with $n$.

To construct the estimator, define the neighborhood exposure indicator

T_{ti}=\prod_{j\in\mathcal{N}(i,\kappa_{n})}\bm{1}\{D_{j}=t\}\quad\text{for}\quad t\in\{0,1\},\quad\kappa_{n}=r_{n}/2.

This is an indicator for whether $i$’s $\kappa_{n}$-neighborhood is entirely treated ($t=1$) or entirely untreated ($t=0$). Unlike $K$-neighborhood exposure mappings, the radius $\kappa_{n}$ will be allowed to diverge. Let $p_{ti}={\bf E}[T_{ti}]$. We study the Horvitz-Thompson estimator

\hat{\theta}=\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}Z_{i}\quad\text{for}\quad Z_{i}=\left(\frac{T_{1i}}{p_{1i}}-\frac{T_{0i}}{p_{0i}}\right)Y_{i}.

Intuitively, $n^{-1}\sum_{i\in\mathcal{N}_{n}}Y_{i}T_{1i}p_{1i}^{-1}$ estimates $n^{-1}\sum_{i\in\mathcal{N}_{n}}Y_{i}(\bm{1}_{n})$ using the outcomes of units whose neighbors within radius $\kappa_{n}$ are all treated. Since the radius depends on $m_{n}$ through $r_{n}$, $\hat{\theta}$ is a function of the number of clusters dictated by the design. Figure 1 depicts the relationship between the clusters and the $\kappa_{n}$-neighborhoods that determine exposure.
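Putting the design and estimator together, a compact simulation sketch follows. The outcome model is hypothetical, and the cluster-intersection count $k$ from Remark 3 is approximated here by the clusters of sampled neighbors (both are assumptions of this sketch, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(4)
n, R, s = 300, 10.0, 1
m, per_side = 4 ** s, 2 ** s
r = R / np.sqrt(m)          # cluster radius r_n
kappa = r / 2               # neighborhood exposure radius kappa_n = r_n / 2
p = 0.5

locations = rng.uniform(-R, R, size=(n, 2))
rho = np.abs(locations[:, None, :] - locations[None, :, :]).max(axis=2)

# Assign each unit to the square cluster containing it.
col = np.minimum(((locations[:, 0] + R) / (2 * r)).astype(int), per_side - 1)
row = np.minimum(((locations[:, 1] + R) / (2 * r)).astype(int), per_side - 1)
labels = row * per_side + col
cluster_treat = rng.binomial(1, p, size=m)
D = cluster_treat[labels]

# Hypothetical outcomes: direct effect plus spillover from kappa-neighbors.
neighbors = rho <= kappa                      # includes the ego itself
Y = 1.0 + 2.0 * D + 0.5 * (neighbors @ D) / neighbors.sum(axis=1)

# Exposure indicators: is i's kappa-neighborhood entirely treated/untreated?
T1 = np.array([D[neighbors[i]].all() for i in range(n)], dtype=float)
T0 = np.array([(1 - D[neighbors[i]]).all() for i in range(n)], dtype=float)

# Exposure probabilities p^k and (1-p)^k, with k approximated by the number
# of distinct clusters among sampled units in i's kappa-neighborhood.
k = np.array([len(set(labels[neighbors[i]].tolist())) for i in range(n)])
p1, p0 = p ** k, (1 - p) ** k

# Horvitz-Thompson estimator of the global average treatment effect.
theta_hat = np.mean((T1 / p1 - T0 / p0) * Y)
```

In this toy model the estimand is $2.5$ (direct effect plus full spillover); a single draw of $\hat{\theta}$ is noisy, which is exactly the bias-variance trade-off in the choice of $m_n$ discussed next.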

Since nontrivial designs include both treated and untreated units, $\hat{\theta}$ is biased for the global average treatment effect. The choice of design trades off the size of the bias against that of the variance: small choices of $m_{n}$ (few clusters, large radii) induce lower bias and higher variance. In § 3, we discuss nearly rate-optimal choices of $m_{n}$ for which the bias is asymptotically negligible.

Remark 3 (Overlap).

Under Bernoulli randomization, overlap needs to be imposed as a separate assumption for asymptotic inference. By overlap we mean that the probability weights $p_{1i}$ and $p_{0i}$ are uniformly bounded away from zero and one, which imposes potentially strong restrictions on the types of exposure mappings the analyst can use, as previously illustrated. In our setup, however, overlap is directly satisfied because $p_{1i}=p^{k}$ and $p_{0i}=(1-p)^{k}$, where $k$ is the number of clusters that intersect $i$’s $\kappa_{n}$-neighborhood. Our choice of $\kappa_{n}$ implies $k\in[1,4]$ for all $i$, so overlap holds.

Remark 4 (Neighborhood radius).

Let $\bm{c}(C_{k})\in\mathbb{R}^{2}$ be the centroid of cluster $C_{k}$. The choice of $\kappa_{n}=r_{n}/2$ ensures that, for any unit $i$ in the “interior” of a cluster in the sense that $i\in Q(\bm{c}(C_{k}),r_{n}/2)$, $i$’s $\kappa_{n}$-neighborhood also lies within that cluster, in which case the exposure probabilities are simply given by the cluster assignment probability: $p_{1i}=p$ and $p_{0i}=1-p$. If we had instead chosen, say, $\kappa_{n}=r_{n}$, then this would be true only for the centroid, while for the remaining units, $p_{1i}$ and $p_{0i}$ could be as small as $p^{4}$, which means less overlap and a more variable estimate. For the purposes of the asymptotic theory, the main requirement is that $\kappa_{n}$ has the same asymptotic order as $r_{n}$. If $\kappa_{n}$ were of smaller order, then results in § 3 show that the bias of $\hat{\theta}$ could be non-negligible, whereas if $\kappa_{n}$ were of larger order, then $k$ in Remark 3 would grow with $n$, overlap would be limited, and $\text{Var}(\hat{\theta})$ could be large.

3 Rate-Optimal Designs

We next derive the rate of convergence of θ^\hat{\theta} as a function of nn, mnm_{n}, and ψ()\psi(\cdot), which we use to obtain rate-optimal choices of mnm_{n}. Recall that designs are parameterized by mnm_{n}, which determines the number and sizes of clusters, and also that θ^\hat{\theta} depends on mnm_{n} through the neighborhood exposure radius κn\kappa_{n}, so we will be optimizing over both the design and the radius that determines the estimator.

3.1 Rate of Convergence

We first provide asymptotic upper bounds on the bias and variance of θ^\hat{\theta}. For two sequences {an}n\{a_{n}\}_{n\in\mathbb{N}} and {bn}n\{b_{n}\}_{n\in\mathbb{N}}, we write anbna_{n}\lesssim b_{n} to mean an/bn=O(1)a_{n}/b_{n}=O(1) and anbna_{n}\gtrsim b_{n} to mean bn/an=O(1)b_{n}/a_{n}=O(1).

Theorem 1.

Under Assumptions 14, |𝐄[θ^]θn|ψ(rn/2)\lvert{\bf E}[\hat{\theta}]-\theta_{n}\rvert\lesssim\psi(r_{n}/2) and Var(θ^)mn1\text{Var}(\hat{\theta})\lesssim m_{n}^{-1}.

Proof.

First, we bound the bias. If T1i=1T_{1i}=1, then all units in Q(i,κn)Q(i,\kappa_{n}) are treated, so by Assumption 3,

|𝐄[YiT1ip1i1]Yi(𝟏n)|=|𝐄[Yi(𝑫)T1i=1]Yi(𝟏n)|ψ(κn).\lvert{\bf E}[Y_{i}T_{1i}p_{1i}^{-1}]-Y_{i}(\bm{1}_{n})\rvert=\lvert{\bf E}[Y_{i}(\bm{D})\mid T_{1i}=1]-Y_{i}(\bm{1}_{n})\rvert\leq\psi(\kappa_{n}).

The same argument applies to |𝐄[YiT0ip0i1]Yi(𝟎n)|\lvert{\bf E}[Y_{i}T_{0i}p_{0i}^{-1}]-Y_{i}(\bm{0}_{n})\rvert, so combining these results, we obtain the rate for the bias.

Next, we bound the variance. The following is an important object in our analysis and also for later constructing the variance estimator:

Λi={j𝒩n:k s.t. Ck𝒩(i,κn),Ck𝒩(j,κn)}.\Lambda_{i}=\{j\in\mathcal{N}_{n}\colon\exists\,k\,\text{ s.t. }\,C_{k}\cap\mathcal{N}(i,\kappa_{n})\neq\varnothing,\,C_{k}\cap\mathcal{N}(j,\kappa_{n})\neq\varnothing\}. (4)

This is the set of units jj whose κn\kappa_{n}-neighborhoods intersect a cluster CkC_{k} that also intersects ii’s κn\kappa_{n}-neighborhood. We have

Var(1ni𝒩nZi)=1n2i𝒩njΛiCov(Zi,Zj)+1n2i𝒩njΛiCov(Zi,Zj)[A]+[B].\text{Var}\left(\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}Z_{i}\right)=\frac{1}{n^{2}}\sum_{i\in\mathcal{N}_{n}}\sum_{j\in\Lambda_{i}}\text{Cov}(Z_{i},Z_{j})+\frac{1}{n^{2}}\sum_{i\in\mathcal{N}_{n}}\sum_{j\not\in\Lambda_{i}}\text{Cov}(Z_{i},Z_{j})\equiv[A]+[B]. (5)

Note that Λi\Lambda_{i} contains units from at most 16 clusters (the worst case is when Q(i,κn)Q(i,\kappa_{n}) intersects four clusters), and clusters contain uniformly O(n/mn)O(n/m_{n}) units by Lemma A.1 of [25]. By Assumption 1 and Remark 3, {Zi}i𝒩n\{Z_{i}\}_{i\in\mathcal{N}_{n}} is uniformly bounded, so

[A]1n2nnmn=1mn.[A]\lesssim\frac{1}{n^{2}}\cdot n\cdot\frac{n}{m_{n}}=\frac{1}{m_{n}}.

The difficult part of the argument is obtaining a tight enough rate for [B][B], which requires control over the dependence between outcomes of units i,ji,j that are “distant” in the sense that jΛij\not\in\Lambda_{i}. Lemma B.1 in § B proves that [B](nmn)1/2[B]\lesssim(nm_{n})^{-1/2}. Therefore, Var(θ^)mn1+(nmn)1/2\text{Var}(\hat{\theta})\lesssim m_{n}^{-1}+(nm_{n})^{-1/2}, which is at most mn1m_{n}^{-1} since mnnm_{n}\leq n. ∎
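The sets Λ_i of (4), equivalently the indicators A_ij used later in the variance estimator (9), can be computed directly from locations and cluster labels. The sketch below is our own; it proxies "cluster C_k intersects the κ_n-neighborhood of i" by "some unit of C_k lies within κ_n of i", a finite-sample stand-in for the spatial intersection in the paper.

```python
import numpy as np

def dependency_matrix(pos, labels, kappa_n):
    """A[i, j] = 1 iff j is in Lambda_i, i.e. some cluster intersects both
    units' kappa_n-neighborhoods (intersection proxied at the unit level)."""
    n = len(pos)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    m = int(labels.max()) + 1
    touch = np.zeros((n, m), dtype=bool)  # touch[i, k]: C_k meets i's neighborhood
    for k in range(m):
        touch[:, k] = (dist[:, labels == k] <= kappa_n).any(axis=1)
    A = (touch.astype(int) @ touch.astype(int).T) > 0
    return A.astype(int)
```

By construction A is symmetric with unit diagonal, and it dominates the lower bound 1{ρ(i,j) ≤ κ_n} appearing in (A.1).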

Our second result provides asymptotic lower bounds.

Theorem 2.

Suppose ψ(s)=min{sγ,0.5}\psi(s)=\min\{s^{-\gamma},0.5\} for some γ>2\gamma>2. Under Assumption 4, there exists a sequence of units and potential outcomes {{Yi()}i𝒩n:n}\{\{Y_{i}(\cdot)\}_{i\in\mathcal{N}_{n}}\colon n\in\mathbb{N}\} satisfying Assumptions 13 such that |𝐄[θ^]θn|ψ(1.5rn)\lvert{\bf E}[\hat{\theta}]-\theta_{n}\rvert\gtrsim\psi(1.5r_{n}) and Var(θ^)mn1\text{Var}(\hat{\theta})\gtrsim m_{n}^{-1}.

Proof.

See supplemental material [33]. ∎

The result shows that we can construct potential outcomes satisfying the assumptions of Theorem 1 such that the bias is at least order ψ(1.5rn)\psi(1.5r_{n}) and the variance at least mn1m_{n}^{-1}. As discussed in § 1, existing work on cluster randomization under interference assumes clusters have asymptotically bounded size, which, in our setting, implies rn=O(1)r_{n}=O(1). Theorem 2 implies that the bias of the Horvitz-Thompson estimator can then be bounded away from zero, showing that existing results strongly rely on the exposure mapping assumption to obtain unbiased estimates. In the absence of this assumption, it is necessary to consider designs in which cluster sizes are large to ensure the bias vanishes with nn.

3.2 Design Examples

Theorem 1 implies the mean-squared error of θ^\hat{\theta} is at most of order ψ(rn/2)2+mn1\psi(r_{n}/2)^{2}+m_{n}^{-1}, and Theorem 2 provides a similar asymptotic lower bound. Under either bound, the bias increases with mnm_{n} while the variance decreases, so there exists a bias-variance trade-off in the choice of design. We next derive rates for mnm_{n} that minimize or nearly minimize the upper bound under different assumptions on ψ()\psi(\cdot). Based on these results, we make recommendations for practical implementation in the next subsection.

Oracle design. Suppose ψ(s)\psi(s) is known to decay with ss at some rate ϕ(s)\phi(s). Then by definition of rnr_{n}, a rate-optimal design chooses mnm_{n} to minimize ϕ(0.5Rnmn1/2)2+mn1\phi(0.5R_{n}m_{n}^{-1/2})^{2}+m_{n}^{-1}.

Exposure mappings. If we assume (1) holds for some KK-neighborhood exposure mapping, then ψ(s)=0\psi(s)=0 for all s>Ks>K. If KK is known, then by choosing mn=Rn2(2K)2m_{n}=R_{n}^{2}(2K)^{-2}, we have κn=K\kappa_{n}=K and zero bias. In this case, clusters are asymptotically bounded in size, the estimator converges at rate n1/2n^{-1/2}, and both the design and estimator qualitatively coincide with those of [46].

On the other hand, if KK is unknown, then for a nearly rate-optimal design, we can choose κn\kappa_{n} to grow at a slow rate so that it eventually exceeds any fixed KK. This may be achieved by choosing mnm_{n} to grow slightly slower than nn, say n/log(n)n/\log(n). Then for large enough nn, the bias is zero, and the rate of convergence is log(n)/n\sqrt{\log(n)/n}.

Exponential decay. Common specifications of the spatial weight matrix 𝑾\bm{W} in the Cliff-Ord model imply that ψ(s)\psi(s) decays exponentially with ss, for example, the row-normalized matrix

Wij=𝟏{ρ(i,j)1}k𝒩n𝟏{ρ(i,k)1}.W_{ij}=\frac{\bm{1}\{\rho(i,j)\leq 1\}}{\sum_{k\in\mathcal{N}_{n}}\bm{1}\{\rho(i,k)\leq 1\}}. (6)

If ψ(s)\psi(s) is known to decay at some exponential rate but the exponent is unknown, then we may choose mn=n1ϵm_{n}=n^{1-\epsilon} for any small ϵ>0\epsilon>0 for a nearly rate-optimal design, which yields a rate of convergence of n0.5(1ϵ)n^{-0.5(1-\epsilon)}, close to an n1/2n^{-1/2}-rate. This shows that rates close to n1/2n^{-1/2} are attainable in the absence of exposure mapping assumptions, despite targeting the global average treatment effect.

Worst-case decay. In practice, we may have little prior knowledge about ψ(s)\psi(s). Recall that Assumption 3 requires the rate of decay to be no slower than s2(1+ϵ)s^{-2(1+\epsilon)} for ϵ>0\epsilon>0. As discussed in Remark 1, this is the slowest rate for spatial dependence that ensures a finite variance. For this rate, since RnR_{n} is order n\sqrt{n}, the bias is order (n/mn)(1+ϵ)(n/m_{n})^{-(1+\epsilon)}. Without knowledge of ϵ\epsilon, we can settle for a nearly rate-optimal design by setting ϵ=0\epsilon=0 and choosing mnm_{n} to minimize (n/mn)2+mn1(n/m_{n})^{-2}+m_{n}^{-1}, which yields mn=n2/3m_{n}=n^{2/3} and an n1/3n^{-1/3}-rate of convergence. Under this design, cluster sizes grow at the rate rn2=n1/3r_{n}^{2}=n^{1/3}.
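The worst-case trade-off can be checked numerically. This small sketch (ours) minimizes the order bound (n/m)^{-2} + m^{-1} over integer m and compares the result with the closed-form minimizer m* = (n²/2)^{1/3}, which has the n^{2/3} order stated above.

```python
import numpy as np

def mse_bound(m, n):
    # bias^2 + variance orders under the worst-case decay, up to constants:
    # bias ~ (n/m)^{-1}, so bias^2 = (m/n)^2, while the variance is ~ 1/m.
    return (m / n) ** 2 + 1.0 / m

n = 10_000
grid = np.arange(1, n + 1, dtype=float)
m_star = grid[np.argmin(mse_bound(grid, n))]  # numerical minimizer
```

Setting the derivative 2m/n² − 1/m² to zero gives m* = (n²/2)^{1/3} ≈ 368.4 for n = 10,000, against n^{2/3} ≈ 464.2: the same order, a constant apart.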

In the last three designs, the bias is o(mn1/2)o(m_{n}^{-1/2}), which is of smaller order than the standard deviation. This renders the bias negligible from an asymptotic standpoint, although it would still be useful to develop bias-reduction methods. We also reiterate that, while this analysis only provides rates of convergence, this appears to be unavoidable at this level of generality: a finite-sample optimal design seems to require substantially more knowledge of the functional form of ψ()\psi(\cdot).

3.3 Practical Recommendations

The designs in the previous section rely on varying degrees of knowledge of ψ()\psi(\cdot), the rate at which interference vanishes with distance. In practice, this is likely unknown, so we recommend operating under the worst-case rate of decay discussed in the previous subsection. The default conservative choice we recommend using in practice is the near-optimal rate described there, namely

mn=n2/3.m_{n}=n^{2/3}. (7)

To construct the clusters, we recommend partitioning space into mnm_{n} clusters using a clustering algorithm, such as spectral clustering. A confidence interval (CI) for θn\theta_{n} is given in (11). In § 5, we explore in simulations the performance of the CI when clusters are constructed according to these recommendations.

Our large-sample theory assumes space is subdivided into evenly sized squares in order to avoid the difficult problem of optimizing over arbitrary shapes. However, since units are typically irregularly distributed in practice, division into equally sized squares may be inefficient, which is why we recommend the use of clustering algorithms. We suggest spectral clustering because it recovers, under weak conditions, low-conductance clusters [38], and low conductance is the key property of clusters utilized in our proofs, as discussed in § 6.
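A minimal implementation of these recommendations is sketched below (our own code, not the paper's; `cluster_randomize` and its arguments are our names). It uses scikit-learn's `SpectralClustering` with the Gaussian affinity from § 5 and the default choice (7) for the number of clusters.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_randomize(pos, p=0.5, seed=0):
    """Partition units into m_n = round(n^{2/3}) clusters by spectral
    clustering on the Gaussian affinity exp{-rho(i, j)^2}, then assign
    each cluster to treatment independently with probability p."""
    rng = np.random.default_rng(seed)
    n = len(pos)
    m_n = int(round(n ** (2 / 3)))  # the default conservative choice (7)
    dist2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=2)
    labels = SpectralClustering(
        n_clusters=m_n, affinity="precomputed", random_state=seed
    ).fit_predict(np.exp(-dist2))
    cluster_treat = rng.binomial(1, p, size=m_n)  # cluster-level coin flips
    return labels, cluster_treat[labels]  # cluster labels, unit treatments
```

The returned treatment vector assigns every unit its cluster's Bernoulli(p) draw, which is the randomization scheme assumed throughout.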

4 Inference

We next state results for asymptotic inference on θn\theta_{n}. Define σn2=Var(mnθ^)\sigma_{n}^{2}=\text{Var}(\sqrt{m_{n}}\hat{\theta}).

Assumption 5 (Non-degeneracy).

lim infnσn2>0\liminf_{n\rightarrow\infty}\sigma_{n}^{2}>0.

This is a standard condition and reasonable to impose in light of the lower bound on the variance derived in Theorem 2.

Theorem 3.

Suppose mnm_{n}\rightarrow\infty and mn=o(n)m_{n}=o(n). Under Assumptions 15,

σn1mn(θ^𝐄[θ^])d𝒩(0,1).\sigma_{n}^{-1}\sqrt{m_{n}}(\hat{\theta}-{\bf E}[\hat{\theta}])\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}(0,1).
Proof.

See § B. ∎

The result centers θ^\hat{\theta} at its expectation, not the estimand θn\theta_{n}. However, designs discussed in § 3.2 result in small bias, meaning |𝐄[θ^]θn|=o(mn1/2)\lvert{\bf E}[\hat{\theta}]-\theta_{n}\rvert=o(m_{n}^{-1/2}), so we can replace 𝐄[θ^]{\bf E}[\hat{\theta}] with θn\theta_{n} on the left-hand side. Also note that the assumption mn=o(n)m_{n}=o(n) implies that cluster sizes grow with nn. If instead mnm_{n} were of order nn, then rn=O(1)r_{n}=O(1), so by Theorem 2, we would additionally need to assume that there exists a KK-neighborhood exposure mapping in the sense of (1) in order to guarantee that the bias vanishes at all. In this case, it is straightforward to establish a normal approximation using existing results.

4.1 Proof Sketch

To our knowledge, there is no off-the-shelf central limit theorem that we can apply to θ^\hat{\theta}. Under Assumption 3, the outcomes appear to be near-epoch dependent on the input process {Di}i𝒩n\{D_{i}\}_{i\in\mathcal{N}_{n}}, but the treatments are cluster-dependent with growing cluster sizes, rather than α\alpha-mixing, as required by [27]. To prove a central limit theorem, they split the average into two parts: its expectation conditional on the dependent input process {Di}i𝒩n\{D_{i}\}_{i\in\mathcal{N}_{n}}, and a remainder that they show is small. Rather than conditioning on all treatments, we find that the following unit-specific conditioning event is more useful for proving our result.

Let 𝒞i\mathcal{C}_{i} be the cluster containing unit ii, and i={Dj:j𝒞i𝒩(i,κn)}\mathcal{F}_{i}=\{D_{j}\colon j\in\mathcal{C}_{i}\cup\mathcal{N}(i,\kappa_{n})\}. Rewrite the estimator as

θ^=1ni𝒩nZi=1ni𝒩n𝐄[Zii]+1ni𝒩n(Zi𝐄[Zii]).\hat{\theta}=\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}Z_{i}=\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}{\bf E}[Z_{i}\mid\mathcal{F}_{i}]+\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}(Z_{i}-{\bf E}[Z_{i}\mid\mathcal{F}_{i}]). (8)

We first show that the last term is relatively small, op(mn1/2)o_{p}(m_{n}^{-1/2}) to be precise, which means that, on average, ZiZ_{i} is primarily determined by i\mathcal{F}_{i}. The proof of this claim is somewhat complicated [and different from that of 27], but it is similar to the argument showing [B](nmn)1/2[B]\lesssim(nm_{n})^{-1/2} in (5). To then establish a central limit theorem for n1i𝒩n𝐄[Zii]n^{-1}\sum_{i\in\mathcal{N}_{n}}{\bf E}[Z_{i}\mid\mathcal{F}_{i}], we observe that the dependence between “observations” {𝐄[Zii]}i𝒩n\{{\bf E}[Z_{i}\mid\mathcal{F}_{i}]\}_{i\in\mathcal{N}_{n}} is characterized by the following dependency graph 𝑨\bm{A}, which, roughly speaking, links two units only if they are dependent. Recalling the definition of Λi\Lambda_{i} from (4), we connect units i,ji,j in 𝑨\bm{A} if and only if jΛij\in\Lambda_{i} (or equivalently iΛji\in\Lambda_{j}). Then 𝑨\bm{A} is indeed a dependency graph because, under Assumption 4, jΛij\not\in\Lambda_{i} implies that the treatment assignments that determine i\mathcal{F}_{i} are independent of those that determine j\mathcal{F}_{j}. The result follows from a central limit theorem for dependency graphs.

The proof highlights two sources of dependence. The first-order source is the first term on the right-hand side of (8). Dependence in this average is due to cluster randomization, which induces correlation in the treatments determining i\mathcal{F}_{i} across ii. The second-order source is the second term on the right-hand side of (8). Dependence in this average is due to interference, which decays with distance due to Assumption 3. Sävje [42] derives a similar decomposition in a different context with misspecified exposure mappings. The previous arguments show that the second-order source of dependence is small relative to the first-order source because, with large clusters, dependence induced by cluster randomization dominates dependence induced by interference. This is generally untrue with small clusters.

4.2 Variance Estimator

The proof sketch suggests that, to estimate σn2\sigma_{n}^{2}, it suffices to account for dependence induced by cluster randomization. Define Aij=𝟏{jΛi}A_{ij}=\bm{1}\{j\in\Lambda_{i}\}, where Λi\Lambda_{i} is defined in (4), and note that Aii=1A_{ii}=1 and Aij=AjiA_{ij}=A_{ji}. Let Z¯=n1i𝒩nZi\bar{Z}=n^{-1}\sum_{i\in\mathcal{N}_{n}}Z_{i}, which is equivalent to θ^\hat{\theta}. Our proposed variance estimator is

σ^2=mnn2i𝒩nj𝒩n(ZiZ¯)(ZjZ¯)Aij.\hat{\sigma}^{2}=\frac{m_{n}}{n^{2}}\sum_{i\in\mathcal{N}_{n}}\sum_{j\in\mathcal{N}_{n}}(Z_{i}-\bar{Z})(Z_{j}-\bar{Z})A_{ij}. (9)
Theorem 4.

Suppose mnm_{n}\rightarrow\infty and mn=o(n)m_{n}=o(n). Under Assumptions 15, (σ^2n)/σn2p1(\hat{\sigma}^{2}-\mathcal{R}_{n})/\sigma_{n}^{2}\stackrel{{\scriptstyle p}}{{\longrightarrow}}1, where

n=mnnσ~^2andσ~^2=1ni𝒩nj𝒩n(𝐄[Zi]𝐄[Z¯])(𝐄[Zj]𝐄[Z¯])Aij.\mathcal{R}_{n}=\frac{m_{n}}{n}\hat{\tilde{\sigma}}^{2}\quad\text{and}\quad\hat{\tilde{\sigma}}^{2}=\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}\sum_{j\in\mathcal{N}_{n}}({\bf E}[Z_{i}]-{\bf E}[\bar{Z}])({\bf E}[Z_{j}]-{\bf E}[\bar{Z}])A_{ij}.
Proof.

See supplemental material [33]. ∎

The bias term n\mathcal{R}_{n} is typically nonzero due to the unit-level heterogeneity. That is, |𝐄[Zi]𝐄[Z¯]|\lvert{\bf E}[Z_{i}]-{\bf E}[\bar{Z}]\rvert does not approach zero asymptotically, except in the special case of homogeneous treatment effects where Yi(𝟏n)Yi(𝟎n)Y_{i}(\bm{1}_{n})-Y_{i}(\bm{0}_{n}) does not vary across ii. In the no-interference setting, it is well-known that the variance of the difference-in-means estimator is biased for the same reason and that consistent estimation of the variance is impossible. However, due to the term mn/n=o(1)m_{n}/n=o(1) in n\mathcal{R}_{n}, we will argue that typically n=op(1)\mathcal{R}_{n}=o_{p}(1), meaning that σ^2\hat{\sigma}^{2} is asymptotically exact.

Let us first compare n\mathcal{R}_{n} to its formulation under no interference. In this case, Yi(𝑫)=Yi(Di)Y_{i}(\bm{D})=Y_{i}(D_{i}), and we replace T1iT_{1i} with DiD_{i} and T0iT_{0i} with 1Di1-D_{i} to estimate the usual average treatment effect. Furthermore, we set Aij=0A_{ij}=0 for all iji\neq j because units are independent and set mn=nm_{n}=n since there is no longer a need to cluster units. With these changes, Zi=(Di/p(1Di)/(1p))YiZ_{i}=(D_{i}/p-(1-D_{i})/(1-p))Y_{i}, and

n=1ni𝒩n(τiτ¯)2\mathcal{R}_{n}=\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}(\tau_{i}-\bar{\tau})^{2} (10)

for τi=Yi(1)Yi(0)\tau_{i}=Y_{i}(1)-Y_{i}(0) and τ¯=n1i𝒩nτi\bar{\tau}=n^{-1}\sum_{i\in\mathcal{N}_{n}}\tau_{i}. This is the well-known expression for the bias in the absence of interference [e.g. 22, Theorem 6.2].

In our setting, we have additional “covariance” terms included in n\mathcal{R}_{n} due to the non-zero off-diagonals of the dependency graph AijA_{ij}. These would be problematic if they were negative and larger in magnitude than the main variance terms since that would make σ^2\hat{\sigma}^{2} anti-conservative. We show that this occurs with small probability, and in fact, that n\mathcal{R}_{n} is op(1)o_{p}(1). Observe that mn/n=o(1)m_{n}/n=o(1) and σ~^2\hat{\tilde{\sigma}}^{2} has the form of a HAC (heteroskedasticity and autocorrelation consistent) variance estimator [1, 13]. Hence, under conventional regularity conditions, σ~^2\hat{\tilde{\sigma}}^{2} is consistent for a variance term σ~20\tilde{\sigma}^{2}\geq 0, in which case n\mathcal{R}_{n} is non-negative in large samples, and furthermore, op(1)o_{p}(1). To formalize this intuition, we need to specify conditions on the superpopulation from which potential outcomes are drawn. In § A, we show that, if potential outcomes are α\alpha-mixing, then σ~^2\hat{\tilde{\sigma}}^{2} is asymptotically unbiased for σ~2=Var(n1/2i=1n𝐄[Zi{Yi(𝒅)}𝒅{0,1}n])\tilde{\sigma}^{2}=\text{Var}(n^{-1/2}\sum_{i=1}^{n}{\bf E}[Z_{i}\mid\{Y_{i}(\bm{d})\}_{\bm{d}\in\{0,1\}^{n}}]), and furthermore, Var(σ~^2)=O(n2/mn3)\text{Var}(\hat{\tilde{\sigma}}^{2})=O(n^{2}/m_{n}^{3}). Consequently, Var(n)=O(mn1)\text{Var}(\mathcal{R}_{n})=O(m_{n}^{-1}) due to the mn/nm_{n}/n term in its expression.

Remark 5 (Confidence interval).

As previously discussed, the bias of θ^\hat{\theta} is o(mn1/2)o(m_{n}^{-1/2}) for the near-optimal designs discussed at the end of § 3. Thus, for such designs, the preceding discussion justifies the use of

θ^±1.96σ^mn1/2\hat{\theta}\pm 1.96\cdot\hat{\sigma}m_{n}^{-1/2} (11)

as an asymptotic 95-percent CI for θn\theta_{n}, where σ^2\hat{\sigma}^{2} is defined in (9).
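Combining (9) and (11), the variance estimator and confidence interval reduce to a few lines. The sketch below is ours; it takes the Horvitz-Thompson terms Z_i, the indicators A_ij = 1{j ∈ Λ_i}, and m_n as given inputs.

```python
import numpy as np

def variance_estimate(Z, A, m_n):
    """sigma_hat^2 of (9): (m_n / n^2) * sum_ij (Z_i - Zbar)(Z_j - Zbar) A_ij.
    Can be negative in finite samples since A need not be PSD; see § A."""
    n = len(Z)
    Zc = Z - Z.mean()
    return (m_n / n ** 2) * (Zc @ A @ Zc)

def confidence_interval(Z, A, m_n):
    """Asymptotic 95-percent CI (11): theta_hat +/- 1.96 * sigma_hat * m_n^{-1/2}."""
    theta_hat = Z.mean()
    se = np.sqrt(variance_estimate(Z, A, m_n) / m_n)
    return theta_hat - 1.96 * se, theta_hat + 1.96 * se
```

With A equal to the identity and m_n = n, the estimator collapses to the usual (biased) sample variance of the Z_i, matching the no-interference comparison around (10).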

Remark 6 (Literature).

Leung [31] proves a result similar to Theorem 4 but for a different variance estimator under a different design and variety of interference. Due to the lack of an analogous mn/nm_{n}/n term, in his setting, weak dependence conditions would only ensure npσ~20\mathcal{R}_{n}\stackrel{{\scriptstyle p}}{{\longrightarrow}}\tilde{\sigma}^{2}\geq 0, in which case the estimator would be asymptotically conservative, whereas ours is asymptotically exact. He does not provide a formal result on the limit of n\mathcal{R}_{n}.

5 Monte Carlo

We next present results from a simulation study illustrating the quality of the normal approximation in Theorem 3 and coverage of the CI (11) when constructing clusters using spectral clustering. To generate spatial locations, let {ρ~i}i𝒩n\{\tilde{\rho}_{i}\}_{i\in\mathcal{N}_{n}} be i.i.d. draws from 𝒰([1,1]2)\mathcal{U}([-1,1]^{2}). Unit locations in 2\mathbb{R}^{2} are given by {ρi}i𝒩n\{\rho_{i}\}_{i\in\mathcal{N}_{n}} for ρi=Rnρ~i\rho_{i}=R_{n}\tilde{\rho}_{i} with Rn=nR_{n}=\sqrt{n}. We let ρ(i,j)=ρiρj\rho(i,j)=\lVert\rho_{i}-\rho_{j}\rVert where \lVert\cdot\rVert is the Euclidean norm.

We set the number of clusters according to (7), rounded to the nearest integer, which corresponds to the near-optimal design under the worst-case decay discussed in § 3.2. To construct clusters, we apply spectral clustering to {ρi}i𝒩n\{\rho_{i}\}_{i\in\mathcal{N}_{n}} with the standard Gaussian affinity matrix whose ijijth entry is exp{ρ(i,j)2}\text{exp}\{-\rho(i,j)^{2}\}. Clusters are randomized into treatment with probability p=0.5p=0.5. Figure 3 displays the clusters and treatment assignments for a typical simulation draw.

Figure 3: The left figure colors units by cluster memberships obtained from spectral clustering (some colors are reused for different clusters). The right figure colors units by treatment assignment.

We generate outcomes from three different models. Let {εi}i𝒩niid𝒩(0,1)\{\varepsilon_{i}\}_{i\in\mathcal{N}_{n}}\stackrel{{\scriptstyle iid}}{{\sim}}\mathcal{N}(0,1) be drawn independently of the other primitives. The first model is Cliff-Ord:

Yi=α+λj𝒩nWijYj+δj𝒩nWijDj+Diβ+εiY_{i}=\alpha+\lambda\sum_{j\in\mathcal{N}_{n}}W_{ij}Y_{j}+\delta\sum_{j\in\mathcal{N}_{n}}W_{ij}D_{j}+D_{i}\beta+\varepsilon_{i}

with (α,λ,δ,β)=(1,0.8,1,1)(\alpha,\lambda,\delta,\beta)=(-1,0.8,1,1) and spatial weight matrix given by the row-normalized adjacency matrix (6). As discussed in § 3, this model features exponentially decaying ψ(s)\psi(s), in fact of order λs\lambda^{s} [31, Proposition 1].
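To simulate from this model, one solves the simultaneous system, Y = (I − λW)^{-1}(α + δWD + βD + ε). The sketch below is ours, not the paper's code; reading (6) literally, the j = i term (ρ(i,i) = 0 ≤ 1) is included, which makes W row-stochastic, and we flag that convention as an assumption.

```python
import numpy as np

def cliff_ord_outcomes(pos, D, alpha=-1.0, lam=0.8, delta=1.0, beta=1.0, seed=0):
    """Simulate Y = (I - lam*W)^{-1}(alpha + delta*W@D + beta*D + eps),
    with W the row-normalized indicator of rho(i, j) <= 1 as in (6).
    Assumption (ours): the j = i term is included, so rows sum to one."""
    rng = np.random.default_rng(seed)
    n = len(pos)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    W = (dist <= 1.0).astype(float)
    W /= W.sum(axis=1, keepdims=True)  # row-normalize
    eps = rng.standard_normal(n)
    rhs = alpha + delta * (W @ D) + beta * D + eps
    return np.linalg.solve(np.eye(n) - lam * W, rhs)
```

Under this row-stochastic convention, (I − λW)^{-1} applied to the all-ones vector gives 1/(1 − λ) times that vector, so each unit's global effect Y_i(1_n) − Y_i(0_n) equals exactly (δ + β)/(1 − λ); this provides a quick sanity check on the solver.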

We construct the second and third models to explore how our methods break down when Assumption 3 is violated or close to violated. For this purpose, we use the “moving average” model (2) with (α,β)=(1,1)(\alpha,\beta)=(-1,1) and Vij=ρ(i,j)ηV_{ij}=\rho(i,j)^{-\eta} for η=4,5\eta=4,5 for the two respective models, so that ψ(s)\psi(s) decays at a polynomial rate. Notably, the choice of η=4\eta=4 implies that the rate of decay is slow enough that Assumption 3 can fail to hold. This is because

(3)=s=1j𝒩n𝟏{ρ(i,j)[s1,s)}ρ(i,j)γ4cs=1sγ3\eqref{coani}=\sum_{s=1}^{\infty}\sum_{j\in\mathcal{N}_{n}}\bm{1}\{\rho(i,j)\in[s-1,s)\}\rho(i,j)^{\gamma-4}\leq c\sum_{s=1}^{\infty}s^{\gamma-3}

for some c>0c>0 by Lemma A.1(iii) of [25]. The right-hand side does not converge for some γ>2\gamma>2, as required by Proposition 1. On the other hand, the choice of η=5\eta=5 is large enough for Assumption 3 to be satisfied since we now replace the 3 on the right-hand side of the previous display with 4. However, in smaller samples, η=4\eta=4 or 5 may not be substantially different, so our methods may still break down from the assumption being “close to” violated.

Table 1 displays the results of 5000 simulation draws. Row “Bias(θ^)(\hat{\theta})” displays |𝐄[θ^θn]|\lvert{\bf E}[\hat{\theta}-\theta_{n}]\rvert, estimated by taking the average over the draws, while “Var(θ^)(\hat{\theta})” is the variance of θ^\hat{\theta} across the draws. The next rows display the coverage of three different confidence intervals. “Our CI” corresponds to the empirical coverage of (11). “Naive CI” corresponds to (11) but replaces σ^mn1/2\hat{\sigma}m_{n}^{-1/2} with the i.i.d. standard error, so the extent to which its coverage deviates from 95 percent illustrates the degree of spatial dependence. “Oracle CI” corresponds to (11) but replaces σ^mn1/2\hat{\sigma}m_{n}^{-1/2} with the “oracle” SE, which is the standard deviation of θ^\hat{\theta} across the draws. Note that the squared oracle SE approximates (σn2+n)/mn(\sigma_{n}^{2}+\mathcal{R}_{n})/m_{n} because the variance is taken over the randomness of the design as well as of the potential outcomes. Lastly, “SE” displays our standard error σ^mn1/2\hat{\sigma}m_{n}^{-1/2}.

Table 1: Simulation Results
Moving Avg, η=4\eta=4 Moving Avg, η=5\eta=5 Cliff-Ord
nn 250 500 1000 250 500 1000 250 500 1000
mnm_{n} 40 63 100 40 63 100 40 63 100
Our CI 0.909 0.925 0.924 0.922 0.937 0.940 0.979 0.983 0.982
Naive CI 0.530 0.489 0.447 0.575 0.539 0.494 0.918 0.913 0.916
Oracle CI 0.943 0.943 0.936 0.950 0.952 0.949 0.983 0.982 0.982
Bias(θ^)(\hat{\theta}) 0.108 0.093 0.083 0.033 0.027 0.024 0.009 0.004 0.003
Var(θ^)(\hat{\theta}) 0.143 0.087 0.054 0.108 0.065 0.039 0.341 0.177 0.087
SE 0.364 0.292 0.232 0.317 0.252 0.199 0.601 0.432 0.309
θ^\hat{\theta} 1.432 1.467 1.492 1.289 1.307 1.319 5.804 5.822 5.851
  • 5k simulations. The “CI” rows show the empirical coverage of 95% CIs. “Naive” and “Oracle” respectively correspond to i.i.d. and true standard errors.

There are at most 100 clusters in all designs, and the rate of convergence is quite slow at n1/3n^{-1/3} for our choice of mnm_{n}. Nonetheless, across all designs, the coverage of the oracle CI is close to 95 percent or above, which illustrates the quality of the normal approximation. For the Cliff-Ord model, our CI attains at least 95 percent coverage even for small sample sizes, despite mnm_{n} being chosen for the worst-case decay, which is suboptimal for this model’s exponentially decaying interference. For the moving average model with η=5\eta=5, we see some under-coverage in smaller samples due to the larger bias, which is unsurprising from the above discussion, but coverage is close to the nominal level for larger nn. The results for η=4\eta=4 are, as expected, worse, since that model is deliberately constructed to violate our main assumption. Once again, our CI exhibits under-coverage due to the larger bias, but coverage improves and bias decreases as nn grows.

6 Conclusion

This paper studies the design of cluster-randomized experiments targeting the global average treatment effect under spatial interference. Each design is characterized by a parameter mnm_{n} that determines the number and sizes of clusters. We propose a Horvitz-Thompson estimator that compares units with different neighborhood exposures to treatment, where the neighborhood radius is of the same order as clusters’ sizes given by the design. We asymptotically bound the estimator’s bias and variance as a function of mnm_{n} and the degree of interference and derive rate-optimal choices of mnm_{n}. Our lower asymptotic bound shows that designs using small clusters (those with asymptotically bounded sizes) generally result in a non-negligible asymptotic bias. On the other hand, constructing large clusters reduces the total number of clusters, resulting in a bias-variance trade-off that we seek to optimize in terms of rates through the choice of design.

In the worst case where the degree of interference is substantial, the estimator has an n1/3n^{-1/3}-rate of convergence under a nearly rate-optimal design, whereas in the best case where interference is characterized by a KK-neighborhood exposure mapping, the rate is n1/2n^{-1/2} under a rate-optimal design. We derive the asymptotic distribution of the estimator and provide an estimate of the variance.

Important areas for future research include data-driven choices of mnm_{n} and κn\kappa_{n} and methods to reduce the bias of the estimator. However, a rigorous theory appears to require more substantive restrictions on interference than what we impose.

Our results focus on the canonical case of spatial data in 2\mathbb{R}^{2}. We conjecture that they can be extended to d\mathbb{R}^{d} for d>2d>2 because our proofs fundamentally rely on the following key property of Euclidean space, which is true for any dimension: it is always possible to construct many clusters with low conductance, or boundary-to-volume ratio, for example by partitioning space into hypercubes or by spectral clustering [32]. This appears in our proofs through the use of Lemma A.1 of [25], which, together with Assumption 3, is crucial to establish that spatially distant units have small covariance, despite dependence induced by cluster randomization and interference. In this sense, the technical idea behind this paper is to exploit a useful property of Euclidean space – the existence of many low-conductance clusters – to show that cluster-randomized designs may be fruitfully applied to the problem of spatial interference.

The story for network interference appears to be different. Existing cluster-randomized designs have theoretical guarantees under exposure mapping assumptions, but it is an open question whether such designs work under weaker restrictions on interference such as Assumption 3. In order to directly apply our idea in the previous paragraph, the network must possess many low-conductance clusters across which we can randomize. Unfortunately, this is a strong requirement in practice because, as discussed in [32], not only do some networks not possess multiple low-conductance clusters, but, of those that do, some apparently possess only a small number of such clusters. Because network “space” differs from Euclidean space in this fundamental aspect, under network interference, clusters can be strongly dependent in the absence of exposure mapping assumptions.

Appendix A Bias of the Variance Estimator

Characterizing the asymptotic behavior of n\mathcal{R}_{n} requires conditions on the superpopulation from which units are drawn. In this section, we assume potential outcomes are random and constitute a weakly dependent spatial process (independent of 𝑫\bm{D}). Accordingly, we rewrite 𝐄[Zi]{\bf E}[Z_{i}] in Theorem 4 as

Z~i𝐄[Zi{Yi(𝒅)}𝒅{0,1}n].\tilde{Z}_{i}\equiv{\bf E}[Z_{i}\mid\{Y_{i}(\bm{d})\}_{\bm{d}\in\{0,1\}^{n}}].

We require the spatial process {Z~i}i𝒩n\{\tilde{Z}_{i}\}_{i\in\mathcal{N}_{n}} to be α\alpha-mixing, which is a standard concept of weak spatial dependence. The results we use in fact apply to the weaker concept of near-epoch dependence, but we focus on mixing since it requires less exposition.

Definition A.1.

Let (Ω,,𝐏)(\Omega,\mathcal{F},{\bf P}) be the probability space, 𝒜,\mathcal{A},\mathcal{B} be sub-σ\sigma-algebras of \mathcal{F}, and

α(𝒜,)=sup{|𝐏(AB)𝐏(A)𝐏(B)|;A𝒜,B}.\alpha(\mathcal{A},\mathcal{B})=\sup\{\lvert{\bf P}(A\cap B)-{\bf P}(A){\bf P}(B)\rvert;A\in\mathcal{A},B\in\mathcal{B}\}.

For U,V𝒩nU,V\subseteq\mathcal{N}_{n}, let αn(U,V)=α(σn(U),σn(V))\alpha_{n}(U,V)=\alpha(\sigma_{n}(U),\sigma_{n}(V)), where σn(U)\sigma_{n}(U) is the σ\sigma-algebra generated by {Z~i}iU\{\tilde{Z}_{i}\}_{i\in U}. The α\alpha-mixing coefficient of {Z~i}i𝒩n\{\tilde{Z}_{i}\}_{i\in\mathcal{N}_{n}} is

α¯(u,v,r)=supnsupU,V{αn(U,V);|U|u,|V|v,ρ(U,V)r}\bar{\alpha}(u,v,r)=\sup_{n}\sup_{U,V}\{\alpha_{n}(U,V);\lvert U\rvert\leq u,\lvert V\rvert\leq v,\rho(U,V)\geq r\}

for u,vu,v\in\mathbb{N}, r+r\in\mathbb{R}_{+}, and ρ(U,V)=min{ρ(i,j):iU,jV}\rho(U,V)=\min\{\rho(i,j)\colon i\in U,j\in V\}.

That is, for any two sets of units U,V𝒩nU,V\subseteq\mathcal{N}_{n} with respective sizes u,vu,v such that the minimum distance between U,VU,V is at least rr, α¯(u,v,r)\bar{\alpha}(u,v,r) bounds their dependence with respect to observations {Z~i}i𝒩n\{\tilde{Z}_{i}\}_{i\in\mathcal{N}_{n}}.

Example A.1.

Suppose that, for any nn and i𝒩ni\in\mathcal{N}_{n}, there exists a function f()f(\cdot) such that Yi(𝒅)=f(εi,𝒅)Y_{i}(\bm{d})=f(\varepsilon_{i},\bm{d}). If the unobserved heterogeneity {εi}i𝒩n\{\varepsilon_{i}\}_{i\in\mathcal{N}_{n}} is α\alpha-mixing and Z~i\tilde{Z}_{i} is a Borel-measurable function of εi\varepsilon_{i} (a mild requirement since treatments are independent of potential outcomes), then {Z~i}i𝒩n\{\tilde{Z}_{i}\}_{i\in\mathcal{N}_{n}} is α\alpha-mixing.

Example A.2.

Generalizing the previous example, suppose Yi(𝒅)=f(di,di,εi,εi)Y_{i}(\bm{d})=f(d_{i},d_{-i},\varepsilon_{i},\varepsilon_{-i}), where 𝒅=(di,di)\bm{d}=(d_{i},d_{-i}) and εi\varepsilon_{-i} is similarly defined. Under some conditions on f()f(\cdot), one can ensure that {Z~i}i𝒩n\{\tilde{Z}_{i}\}_{i\in\mathcal{N}_{n}} is near-epoch dependent on the input {εi}i𝒩n\{\varepsilon_{i}\}_{i\in\mathcal{N}_{n}} [e.g. 27, Proposition 1]. However, we only focus on mixing.

We next discuss the intuition behind our main result. Let Z~¯=n1i=1nZ~i\bar{\tilde{Z}}=n^{-1}\sum_{i=1}^{n}\tilde{Z}_{i}, so that

n=mnnσ~^2,whereσ~^2=1ni𝒩nj𝒩n(Z~i𝐄[Z~¯])(Z~j𝐄[Z~¯])Aij.\mathcal{R}_{n}=\frac{m_{n}}{n}\hat{\tilde{\sigma}}^{2},\quad\text{where}\quad\hat{\tilde{\sigma}}^{2}=\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}\sum_{j\in\mathcal{N}_{n}}(\tilde{Z}_{i}-{\bf E}[\bar{\tilde{Z}}])(\tilde{Z}_{j}-{\bf E}[\bar{\tilde{Z}}])A_{ij}.

Observe that mn/n=o(1)m_{n}/n=o(1), and σ~^2\hat{\tilde{\sigma}}^{2} is essentially a HAC variance estimator with “kernel” AijA_{ij}. More precisely, AijA_{ij} is sandwiched between two uniform kernels:

𝟏{ρ(i,j)κn}Aij𝟏{ρ(i,j)2rn+κn}.\bm{1}\{\rho(i,j)\leq\kappa_{n}\}\leq A_{ij}\leq\bm{1}\{\rho(i,j)\leq 2r_{n}+\kappa_{n}\}. (A.1)

This is a consequence of the construction of clusters. The lower bound holds because Λi\Lambda_{i} must include ii’s κn\kappa_{n}-neighborhood. The upper bound is achieved if ii is located at a corner shared by four clusters, and each such cluster intersects some κn\kappa_{n}-neighborhood that is maximally distant from the cluster. Notably, the bandwidths of the two kernels are of the same asymptotic order (recalling that κn\kappa_{n} has the same order as rnr_{n}). Hence, σ~^2\hat{\tilde{\sigma}}^{2} should behave as a HAC variance estimator. This has two implications.

  1. σ~^2\hat{\tilde{\sigma}}^{2} should be consistent for a non-negative variance term under standard regularity conditions, so n=op(1)\mathcal{R}_{n}=o_{p}(1).

  2. If, in the formula for σ^2\hat{\sigma}^{2}, we replace AijA_{ij} with the uniform kernel in the upper bound (A.1), then we would have a positive semidefinite HAC estimator, implying n0\mathcal{R}_{n}\geq 0 a.s. [1, 13, §3.3.1]. Then in the finite-population setting of our main results, σ^2\hat{\sigma}^{2} would be at worst asymptotically conservative.

     We choose not to use a spatial HAC for σ^2\hat{\sigma}^{2} because such estimators have a reputation for being anti-conservative in smaller samples [see references in e.g. 32]. In our estimator, AijA_{ij} functions as a heterogeneous bandwidth determined by the design, allowing different units to have different neighborhood radii in the variance estimator, whereas HAC kernels impose homogeneous radii determined by the bandwidth. The hope is that heterogeneous radii translate to better finite-sample properties since they directly capture the first-order dependency neighborhood.
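As a numerical illustration of the sandwich (A.1) and of AijA_{ij} acting as a heterogeneous bandwidth, the following sketch places units on an integer lattice partitioned into square clusters. All lattice, cluster, and radius choices here are hypothetical and chosen only for illustration; the sample mean stands in for 𝐄[Z~¯]{\bf E}[\bar{\tilde{Z}}], and the sup metric stands in for ρ\rho.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: units on an L x L integer lattice, partitioned into
# square clusters of side 2*rn; kappa_n is taken equal to rn for simplicity.
L, rn, kn = 24, 3, 3
pts = np.array([(x, y) for x in range(L) for y in range(L)], dtype=float)
n = len(pts)
side = 2 * rn
cluster = (pts[:, 0] // side) * (L // side) + (pts[:, 1] // side)

# sup-metric distances; any metric of the same order works for the sketch
rho = np.abs(pts[:, None, :] - pts[None, :, :]).max(axis=2)

# Lambda_i = union of clusters intersecting i's kappa_n-neighborhood,
# and A_ij = 1{j in Lambda_i}
near = rho <= kn
A = np.zeros((n, n), dtype=bool)
for i in range(n):
    touched = np.unique(cluster[near[i]])  # clusters meeting the neighborhood
    A[i] = np.isin(cluster, touched)

# the sandwich (A.1): 1{rho <= kn} <= A_ij <= 1{rho <= 2*rn + kn}
assert np.all(near <= A) and np.all(A <= (rho <= 2 * rn + kn))

# sigma-tilde-hat^2 with A_ij as heterogeneous bandwidth, next to the
# uniform-kernel analogue at the upper-bound bandwidth 2*rn + kn
Z = rng.normal(size=n)
Zc = Z - Z.mean()  # sample mean in place of E[Z-bar]
sigma2_het = Zc @ A.astype(float) @ Zc / n
sigma2_hac = Zc @ (rho <= 2 * rn + kn).astype(float) @ Zc / n
print(sigma2_het, sigma2_hac)
```

Both quantities are random; the sketch only verifies the kernel-sandwich relationship, not any ordering of the two realized estimates.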

We next state regularity conditions taken from [24], which we use to apply her Theorem 4 on consistency of HAC estimators.

Assumption A.1.

The mixing coefficient satisfies α¯(u,v,r)(u+v)ςα^(r)\bar{\alpha}(u,v,r)\leq(u+v)^{\varsigma}\hat{\alpha}(r) for ς0\varsigma\geq 0 and r=1r2(ς+1)1α^(r)<\sum_{r=1}^{\infty}r^{2(\varsigma+1)-1}\hat{\alpha}(r)<\infty.

This is Assumption 2 of [24]. The substance of the condition is the requirement that the mixing coefficient decays at a sufficiently fast rate with distance rr. For ς>0\varsigma>0, the rate requirement is stronger than what we require of ψ()\psi(\cdot) in Assumption 3.

Assumption A.2.

(a) 𝐄[Z~i]=𝐄[Z~j]{\bf E}[\tilde{Z}_{i}]={\bf E}[\tilde{Z}_{j}] for all i,ji,j. (b) Var(n1/2i=1nZ~i)σ~2<\text{Var}(n^{-1/2}\sum_{i=1}^{n}\tilde{Z}_{i})\rightarrow\tilde{\sigma}^{2}<\infty.

This is Assumption 7(a) of [24]. Part (a) is a standard mean-homogeneity condition required for the HAC estimator to be asymptotically unbiased. Such a requirement is untenable in the finite-population setting of Theorem 4 because Z~i\tilde{Z}_{i} is a function of ii’s potential outcomes, which are generally heterogeneous across units. This heterogeneity is responsible for the appearance of the bias n\mathcal{R}_{n}. However, in the superpopulation setting of this section, the assumption is much more tenable since we integrate over the randomness of the potential outcomes. The condition then requires that the mean be invariant to unit labels. The finiteness requirement in part (b) can be proven as a consequence of Assumption A.1 and moment conditions, so the only substance of the assumption is the existence of a limit.

Theorem A.1.

Under Assumptions 1, 2, A.1, and A.2, if mnm_{n}\rightarrow\infty, then (a) 𝐄[σ~^2]σ~2{\bf E}[\hat{\tilde{\sigma}}^{2}]\rightarrow\tilde{\sigma}^{2}, and (b) Var(σ~^2)=O(n2/mn3)\text{Var}(\hat{\tilde{\sigma}}^{2})=O(n^{2}/m_{n}^{3}).

We next discuss the implications of the theorem for n\mathcal{R}_{n} (also see the discussion in § 4.2) and then conclude with the proof. The designs discussed at the end of § 3, other than under the worst-case decay, choose mnm_{n} to be of substantially higher order than n2/3n^{2/3}. In this case, Theorem A.1 yields

n=mnno(1)σ~^2pσ~20p0.\mathcal{R}_{n}=\underbrace{\frac{m_{n}}{n}}_{o(1)}\underbrace{\hat{\tilde{\sigma}}^{2}}_{\stackrel{{\scriptstyle p}}{{\longrightarrow}}\tilde{\sigma}^{2}\geq 0}\stackrel{{\scriptstyle p}}{{\longrightarrow}}0.

For the worst-case decay, mn=n2/3m_{n}=n^{2/3}, in which case σ~^2\hat{\tilde{\sigma}}^{2} remains asymptotically unbiased for σ~20\tilde{\sigma}^{2}\geq 0, and we still have Var(n)=O(mn1)\text{Var}(\mathcal{R}_{n})=O(m_{n}^{-1}), so again n=op(1)\mathcal{R}_{n}=o_{p}(1).
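The order claims in the preceding two paragraphs reduce to power-counting; a symbolic sketch (sympy, with m standing in for mnm_{n} and all constants dropped):

```python
import sympy as sp

n, m = sp.symbols("n m", positive=True)  # m plays the role of m_n

var_sigma2 = n**2 / m**3             # Theorem A.1(b): Var of the HAC-type estimator
var_R = (m / n) ** 2 * var_sigma2    # R_n = (m_n / n) * sigma-tilde-hat^2

# Var(R_n) = O(1/m_n), so R_n = o_p(1) whenever m_n -> infinity
assert sp.simplify(var_R - 1 / m) == 0

# worst-case design m_n = n^(2/3): Var(R_n) = n^(-2/3) -> 0
worst = var_R.subs(m, n ** sp.Rational(2, 3))
assert sp.simplify(worst - n ** sp.Rational(-2, 3)) == 0
```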

Proof of Theorem A.1.

We apply Theorem 4 of [24]. Note that our setting is a simple mean-estimation problem, which is a special case of her semiparametric model Y1in=h(Y2in,θ0)+g(Xin)+UinY_{1in}=h(Y_{2in},\theta_{0})+g(X_{in})+U_{in} with h(Y2in,θ0)+g(Xin)=0h(Y_{2in},\theta_{0})+g(X_{in})=0 and UinU_{in} equal to our Z~i𝐄[Z~¯]=Z~i𝐄[Z~i]\tilde{Z}_{i}-{\bf E}[\bar{\tilde{Z}}]=\tilde{Z}_{i}-{\bf E}[\tilde{Z}_{i}] (by Assumption A.2(a)). The moments mεn(Win,θ^,τ^)m_{\varepsilon_{n}}(W_{in},\hat{\theta},\hat{\tau}) in her formula (13) for the HAC estimator correspond to our Z~i𝐄[Z~i]\tilde{Z}_{i}-{\bf E}[\tilde{Z}_{i}]. Other than Assumption 10, her assumptions are either satisfied (increasing domain corresponds to our Assumption 2, and the moment conditions hold by our Assumption 1 and Remark 3) or are irrelevant in our setting.

Assumption 10 concerns the properties of the kernel function. For context, note that if, hypothetically, we replaced AijA_{ij} in σ~^2\hat{\tilde{\sigma}}^{2} with its upper bound in (A.1), then the kernel K()K(\cdot) and bandwidth βn\beta_{n} in [24] would correspond in our setting to the uniform kernel 𝟏{1}\bm{1}\{\cdot\leq 1\} and 2rn+κn2r_{n}+\kappa_{n}, respectively, so that K(x/βn)K(x/\beta_{n}) in Jenish’s notation would correspond to our 𝟏{x2rn+κn}\bm{1}\{x\leq 2r_{n}+\kappa_{n}\}.

Now, because AijA_{ij} is only bounded by kernel functions but cannot be written as one, we cannot directly verify Assumption 10. However, inspection of the proof reveals that the assumption is used as follows. First, to derive bounds on the variance of the HAC estimator (Step 1 of the proof), uniform boundedness of the kernel is used, but this is trivially satisfied by Aij{0,1}A_{ij}\in\{0,1\}. Second, to derive bounds on the bias (Step 2 of the proof), Assumption 10 is used to show that the term ar,n=maxrxr+1|K(x/βn)1|0a_{r,n}=\max_{r\leq x\leq r+1}\lvert K(x/\beta_{n})-1\rvert\rightarrow 0 as nn\rightarrow\infty for any r>0r>0. In our case, ar,na_{r,n} corresponds to maxi,j𝒩n:ρ(i,j)[r,r+1]|Aij1|\max_{i,j\in\mathcal{N}_{n}\colon\rho(i,j)\in[r,r+1]}\lvert A_{ij}-1\rvert. But this has the desired property; it is in fact exactly zero for nn sufficiently large due to (A.1).

Hence, the conclusions of the proof of Jenish’s Theorem 4 apply to σ~^2\hat{\tilde{\sigma}}^{2}, which we now apply to prove our claims. Part (a) of our theorem follows from Step 2 of her proof. Next, in Step 1 of her proof, H2n,H3n,H4n=0H_{2n},H_{3n},H_{4n}=0 in our setting because the data is α\alpha-mixing rather than near-epoch dependent. Accordingly, the variance bound on H1nH_{1n} in that step implies that the variance of the HAC estimator is O(n1βn3d)O(n^{-1}\beta_{n}^{3d}) where βn\beta_{n} is the bandwidth and dd is the dimension of the space. In our case, by (A.1), the bandwidth corresponds to βn=2rn+κn=O(n/mn)\beta_{n}=2r_{n}+\kappa_{n}=O(\sqrt{n/m_{n}}), so n1βn3d=O(n2/mn3)n^{-1}\beta_{n}^{3d}=O(n^{2}/m_{n}^{3}), and part (b) of our theorem follows. ∎
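The final order computation can be checked the same way (a sketch with constants dropped, taking βn=(n/mn)1/2\beta_{n}=(n/m_{n})^{1/2} exactly and d=2d=2):

```python
import sympy as sp

n, m = sp.symbols("n m", positive=True)
beta = sp.sqrt(n / m)  # bandwidth 2*r_n + kappa_n = O(sqrt(n / m_n))
d = 2                  # spatial dimension

# variance bound O(n^{-1} * beta^{3d}) equals O(n^2 / m^3), part (b) of Theorem A.1
assert sp.simplify(beta ** (3 * d) / n - n**2 / m**3) == 0
```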

Appendix B Proofs

The proofs use the following definitions. Let 𝒄(Ck)2\bm{c}(C_{k})\in\mathbb{R}^{2} be the centroid of cluster CkC_{k}. For srn1s\leq r_{n}-1, define

J(s,Ck)={jCk:ρ(𝒄(Ck),j)[rns1,rns)}.J(s,C_{k})=\big{\{}j\in C_{k}\colon\rho(\bm{c}(C_{k}),j)\in[r_{n}-s-1,r_{n}-s)\big{\}}. (B.1)

For s=0s=0, this is the “boundary” of CkC_{k}, and as we increase ss, J(s,Ck)J(s,C_{k}) moves through contour sets within CkC_{k} that lie progressively farther from the boundary. Also, for any two sets S,T2S,T\subset\mathbb{R}^{2}, let ρ(S,T)=min{ρ(i,j):iS,jT}\rho(S,T)=\min\{\rho(i,j)\colon i\in S,j\in T\}.

The proofs make use of the following facts, which are a consequence of Lemma A.1 of [25] and use Assumption 2. Given that CkC_{k} has radius rnr_{n}, |Ck|crn2\lvert C_{k}\rvert\leq c^{\prime}r_{n}^{2} and |J(0,Ck)|crn\lvert J(0,C_{k})\rvert\leq c^{\prime}r_{n} for some universal constant c>0c^{\prime}>0 that does not depend on nn or kk. Also, since J(s,Ck)J(s,C_{k}) is the boundary of a ball of radius rnsr_{n}-s, |J(s,Ck)|c(rns)\lvert J(s,C_{k})\rvert\leq c^{\prime}(r_{n}-s).
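These counting facts can be visualized on the integer lattice; in the sketch below the constant c = 10 is ad hoc, chosen only to exhibit the quadratic and linear growth (it is not the c′c^{\prime} of the lemma):

```python
import numpy as np

rn = 20
xs, ys = np.meshgrid(np.arange(-rn, rn + 1), np.arange(-rn, rn + 1))
dist = np.sqrt(xs**2 + ys**2).ravel()
dist = dist[dist <= rn]  # lattice units within a ball of radius rn

c = 10.0  # ad hoc constant for the sketch
assert len(dist) <= c * rn**2  # |C_k| <= c' * rn^2

# annuli J(s, C_k) as in (B.1) thin out linearly toward the centroid
for s in range(rn - 1):
    J = np.sum((dist >= rn - s - 1) & (dist < rn - s))
    assert J <= c * (rn - s)  # |J(s, C_k)| <= c' * (rn - s)
```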

Lemma B.1.

Recall the definition of [B][B] from (5). Under the assumptions of Theorem 1, [B](nmn)1/2[B]\lesssim(nm_{n})^{-1/2}.

Proof.

Step 1. We first establish covariance bounds. Fix i,ji,j such that jΛij\not\in\Lambda_{i}, the latter defined in (4), and set s=ρ(i,j)s=\rho(i,j). Trivially,

|Cov(Zi,Zj)|=|Cov(Zi,Zj)|𝟏{s4rn}+|Cov(Zi,Zj)|𝟏{s>4rn}.\lvert\text{Cov}(Z_{i},Z_{j})\rvert=\lvert\text{Cov}(Z_{i},Z_{j})\rvert\bm{1}\{s\leq 4r_{n}\}+\lvert\text{Cov}(Z_{i},Z_{j})\rvert\bm{1}\{s>4r_{n}\}.

First consider the case s4rns\leq 4r_{n}. Let 𝒞i\mathcal{C}_{i} be the cluster containing unit ii, i(r)={Dj:ρ(i,j)r or j𝒞i}\mathcal{F}_{i}(r)=\{D_{j}\colon\rho(i,j)\leq r\text{ or }j\in\mathcal{C}_{i}\}, and Xir=𝐄[Zii(r)]X_{i}^{r}={\bf E}[Z_{i}\mid\mathcal{F}_{i}(r)]. As a preliminary result, we bound the discrepancy between ZiZ_{i} and XisX_{i}^{s}.

Let t=ρ({i},J(0,𝒞i))t=\rho(\{i\},J(0,\mathcal{C}_{i})), the distance between ii and the nearest unit in the boundary of 𝒞i\mathcal{C}_{i}. By Assumption 3, for any q>0q>0,

𝐄[|ZiXis|qT1i=1]=p1iq𝐄[|Yi(𝑫)𝐄[Yi(𝑫)i(s)]|qT1i=1]p1iqψ(max{t,s})q.{\bf E}[\lvert Z_{i}-X_{i}^{s}\rvert^{q}\mid T_{1i}=1]=p_{1i}^{-q}{\bf E}[\lvert Y_{i}(\bm{D})-{\bf E}[Y_{i}(\bm{D})\mid\mathcal{F}_{i}(s)]\rvert^{q}\mid T_{1i}=1]\\ \leq p_{1i}^{-q}\psi(\max\{t,s\})^{q}.

The equality holds because i(s)\mathcal{F}_{i}(s) conditions on DiD_{i}, and by Assumption 4, T1i=1T_{1i}=1 implies Di=1D_{i}=1, which implies T0i=0T_{0i}=0. The inequality holds because i(s)\mathcal{F}_{i}(s) fixes {Dj:jQ(i,t)Q(i,s)}\{D_{j}\colon j\in Q(i,t)\cup Q(i,s)\} at their realized values. Similarly, 𝐄[|ZiXis|qT0i=1]p0iqψ(max{t,s})q{\bf E}[\lvert Z_{i}-X_{i}^{s}\rvert^{q}\mid T_{0i}=1]\leq p_{0i}^{-q}\psi(\max\{t,s\})^{q}, so by the law of total probability and Remark 3, for some universal constant c′′>0c^{\prime\prime}>0,

𝐄[|ZiXis|q]1/qc′′ψ(max{t,s}).{\bf E}[\lvert Z_{i}-X_{i}^{s}\rvert^{q}]^{1/q}\leq c^{\prime\prime}\psi(\max\{t,s\}). (B.2)

Define Ri=ZiXiκnR_{i}=Z_{i}-X_{i}^{\kappa_{n}}. Since jΛij\not\in\Lambda_{i}, |Cov(Xiκn,Xjκn)|=0\lvert\text{Cov}(X_{i}^{\kappa_{n}},X_{j}^{\kappa_{n}})\rvert=0 by Assumption 4. Applying the Cauchy-Schwarz and Jensen’s inequalities and (B.2) for q=2q=2,

|Cov(Zi,Zj)|\displaystyle\lvert\text{Cov}(Z_{i},Z_{j})\rvert |Cov(Xiκn,Xjκn)|+|Cov(Xiκn,Rj)|+|Cov(Ri,Xjκn)|+|Cov(Ri,Rj)|\displaystyle\leq\lvert\text{Cov}(X_{i}^{\kappa_{n}},X_{j}^{\kappa_{n}})\rvert+\lvert\text{Cov}(X_{i}^{\kappa_{n}},R_{j})\rvert+\lvert\text{Cov}(R_{i},X_{j}^{\kappa_{n}})\rvert+\lvert\text{Cov}(R_{i},R_{j})\rvert
2c′′(Zi2ψ(max{t,κn})+Zj2ψ(max{t,κn})+ψ(max{t,κn})2)\displaystyle\leq 2c^{\prime\prime}(\lVert Z_{i}\rVert_{2}\psi(\max\{t,\kappa_{n}\})+\lVert Z_{j}\rVert_{2}\psi(\max\{t,\kappa_{n}\})+\psi(\max\{t,\kappa_{n}\})^{2})
cψ(max{t,κn})\displaystyle\leq c\,\psi(\max\{t,\kappa_{n}\})

for some universal c>0c>0.

Next consider the case s>4rns>4r_{n}. Abbreviate Xi=Xis/2rnX_{i}=X_{i}^{s/2-r_{n}}, and redefine Ri=ZiXiR_{i}=Z_{i}-X_{i}. Note that ρ(Q(i,s/2rn),Q(j,s/2rn))>2rn\rho(Q(i,s/2-r_{n}),Q(j,s/2-r_{n}))>2r_{n} and 2rn2r_{n} is the diameter of a cluster, so XiXjX_{i}\perp\!\!\!\perp X_{j}. Consequently, following the previous argument,

|Cov(Zi,Zj)||Cov(Xi,Xj)|+|Cov(Xi,Rj)|+|Cov(Ri,Xj)|+|Cov(Ri,Rj)|cψ(max{t,s/2rn}).\lvert\text{Cov}(Z_{i},Z_{j})\rvert\leq\lvert\text{Cov}(X_{i},X_{j})\rvert+\lvert\text{Cov}(X_{i},R_{j})\rvert+\lvert\text{Cov}(R_{i},X_{j})\rvert+\lvert\text{Cov}(R_{i},R_{j})\rvert\\ \leq c\,\psi(\max\{t,s/2-r_{n}\}). (B.3)

Step 2. For any cc\in\mathbb{R}, let c\lfloor c\rfloor denote cc rounded down to the nearest integer. Using the covariance bounds derived in step 1,

1n2i𝒩njΛi|Cov(Zi,Zj)|cn2k=1mns=12Rnt=0min{s,rn1}iJ(t,Ck)jΛi𝟏{ρ(i,j)[s1,s)}×(ψ(max{t,rn/2})𝟏{s4rn}+ψ(max{t,s/2rn})𝟏{s>4rn})[B1]+[B2],\frac{1}{n^{2}}\sum_{i\in\mathcal{N}_{n}}\sum_{j\not\in\Lambda_{i}}\lvert\text{Cov}(Z_{i},Z_{j})\rvert\leq\frac{c}{n^{2}}\sum_{k=1}^{m_{n}}\sum_{s=1}^{\lfloor 2R_{n}\rfloor}\sum_{t=0}^{\lfloor\min\{s,r_{n}-1\}\rfloor}\sum_{i\in J(t,C_{k})}\sum_{j\not\in\Lambda_{i}}\bm{1}\{\rho(i,j)\in[s-1,s)\}\\ \times\big{(}\psi(\max\{t,r_{n}/2\})\bm{1}\{s\leq 4r_{n}\}+\psi(\max\{t,s/2-r_{n}\})\bm{1}\{s>4r_{n}\}\big{)}\\ \equiv[B1]+[B2], (B.4)

where [B1][B1] takes the part involving s4rns\leq 4r_{n} and [B2][B2] the remainder. Note that tt can be at most rn1r_{n}-1 because J(rn1,Ck)J(r_{n}-1,C_{k}) is the 1-ball centered at the centroid of CkC_{k}, and it can be at most ss because ρ(i,j)[s1,s)\rho(i,j)\in[s-1,s) and jCkj\not\in C_{k} since jΛij\not\in\Lambda_{i}.

As discussed at the start of this section, jΛi𝟏{ρ(i,j)[s1,s)}j𝒩n𝟏{ρ(i,j)[s1,s)}cs\sum_{j\not\in\Lambda_{i}}\bm{1}\{\rho(i,j)\in[s-1,s)\}\leq\sum_{j\in\mathcal{N}_{n}}\bm{1}\{\rho(i,j)\in[s-1,s)\}\leq c^{\prime}\,s and |J(t,Ck)|c(rnt)\lvert J(t,C_{k})\rvert\leq c^{\prime}(r_{n}-t) for some universal c>0c^{\prime}>0 by Assumption 2. Then by Assumption 3,

[B1]\displaystyle[B1] cmnn2s=14rncst=0min{s,rn1}c(rnt)ψ(t)\displaystyle\leq c\,\frac{m_{n}}{n^{2}}\sum_{s=1}^{\lfloor 4r_{n}\rfloor}c^{\prime}s\sum_{t=0}^{\lfloor\min\{s,r_{n}-1\}\rfloor}c^{\prime}(r_{n}-t)\psi(t)
mnn2s=14rns(rnt=0ψ(t)+t=0tψ(t))mnn2rn2rn1nmn.\displaystyle\lesssim\frac{m_{n}}{n^{2}}\sum_{s=1}^{\lfloor 4r_{n}\rfloor}s\left(r_{n}\sum_{t=0}^{\infty}\psi(t)+\sum_{t=0}^{\infty}t\,\psi(t)\right)\lesssim\frac{m_{n}}{n^{2}}r_{n}^{2}r_{n}\lesssim\frac{1}{\sqrt{nm_{n}}}. (B.5)

Finally,

[B2]\displaystyle[B2] cmnn2s=4rn2Rncst=0min{s,rn1}c(rnt)ψ(s/2rn)\displaystyle\leq c\,\frac{m_{n}}{n^{2}}\sum_{s=\lfloor 4r_{n}\rfloor}^{\lfloor 2R_{n}\rfloor}c^{\prime}s\sum_{t=0}^{\lfloor\min\{s,r_{n}-1\}\rfloor}c^{\prime}(r_{n}-t)\psi(s/2-r_{n})
mnn2t=0rn1(rnt)s=4rn2Rnsψ(s/2rn)mnn2rn2s=4rn2Rnsψ(s/2rn)\displaystyle\lesssim\frac{m_{n}}{n^{2}}\sum_{t=0}^{\lfloor r_{n}-1\rfloor}(r_{n}-t)\sum_{s=\lfloor 4r_{n}\rfloor}^{\lfloor 2R_{n}\rfloor}s\,\psi(s/2-r_{n})\lesssim\frac{m_{n}}{n^{2}}r_{n}^{2}\sum_{s=\lfloor 4r_{n}\rfloor}^{\lfloor 2R_{n}\rfloor}s\,\psi(s/2-r_{n})
mnn2rn2rn1nmn.\displaystyle\lesssim\frac{m_{n}}{n^{2}}r_{n}^{2}r_{n}\lesssim\frac{1}{\sqrt{nm_{n}}}. (B.6)

∎
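As a numerical sanity check on (B.5), the sketch below evaluates the bounding sum for the hypothetical choices ψ(t)=2t\psi(t)=2^{-t}, mn=n2/3m_{n}=n^{2/3}, rn=(n/mn)1/2r_{n}=(n/m_{n})^{1/2}, and confirms that its ratio to the claimed rate (nmn)1/2(nm_{n})^{-1/2} stays bounded as nn grows:

```python
import numpy as np

def B1_bound(n, psi=lambda t: 0.5**t):
    """(m_n / n^2) * sum_s s * sum_t (r_n - t) * psi(t), as in (B.5)."""
    mn = round(n ** (2 / 3))
    rn = (n / mn) ** 0.5
    total = 0.0
    for s in range(1, int(4 * rn) + 1):
        t = np.arange(0, min(s, int(rn - 1)) + 1)
        total += s * np.sum((rn - t) * psi(t))
    return mn / n**2 * total

# ratio to 1/sqrt(n * m_n) is roughly constant across a 16-fold range of n
ratios = [B1_bound(n) * (n * round(n ** (2 / 3))) ** 0.5
          for n in (10_000, 40_000, 160_000)]
assert max(ratios) < 5 * min(ratios)
```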

Proof of Theorem 3.

Recall that i={Dj:j𝒞i𝒩(i,κn)}\mathcal{F}_{i}=\{D_{j}\colon j\in\mathcal{C}_{i}\cup\mathcal{N}(i,\kappa_{n})\}, and define Ri=Zi𝐄[Zii]R_{i}=Z_{i}-{\bf E}[Z_{i}\mid\mathcal{F}_{i}].

Step 1. We show that

𝐄[(mn1ni𝒩nRi)2]=o(1).{\bf E}\left[\left(\sqrt{m_{n}}\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}R_{i}\right)^{2}\right]=o(1).

Recalling (4), the left-hand side equals

mnn2i𝒩njΛi𝐄[RiRj]+mnn2i𝒩njΛi𝐄[RiRj][C]+[D].\frac{m_{n}}{n^{2}}\sum_{i\in\mathcal{N}_{n}}\sum_{j\in\Lambda_{i}}{\bf E}[R_{i}R_{j}]+\frac{m_{n}}{n^{2}}\sum_{i\in\mathcal{N}_{n}}\sum_{j\not\in\Lambda_{i}}{\bf E}[R_{i}R_{j}]\equiv[C]+[D].

If jΛij\in\Lambda_{i}, then ρ(i,j)3rn\rho(i,j)\leq 3r_{n}. Using this fact and (B.2) with s=rn/2s=r_{n}/2,

|[C]|\displaystyle\lvert[C]\rvert mnn2i𝒩n𝐄[Ri2]+mnn2i𝒩ns=13rnji(c′′ψ(rn/2))2𝟏{ρ(i,j)[s1,s)}\displaystyle\leq\frac{m_{n}}{n^{2}}\sum_{i\in\mathcal{N}_{n}}{\bf E}[R_{i}^{2}]+\frac{m_{n}}{n^{2}}\sum_{i\in\mathcal{N}_{n}}\sum_{s=1}^{\lfloor 3r_{n}\rfloor}\sum_{j\neq i}(c^{\prime\prime}\psi(r_{n}/2))^{2}\bm{1}\{\rho(i,j)\in[s-1,s)\}
mnn+mnns=13rnsψ(rn/2)2mnn+ψ(rn/2)2=o(1).\displaystyle\lesssim\frac{m_{n}}{n}+\frac{m_{n}}{n}\sum_{s=1}^{\lfloor 3r_{n}\rfloor}s\,\psi(r_{n}/2)^{2}\lesssim\frac{m_{n}}{n}+\psi(r_{n}/2)^{2}=o(1).

For |[D]|\lvert[D]\rvert, we first establish some covariance bounds. Fix i,ji,j such that jΛij\not\in\Lambda_{i}, and let s=ρ(i,j)s=\rho(i,j). Let t=ρ({i},J(0,𝒞i))t=\rho(\{i\},J(0,\mathcal{C}_{i})), the distance between ii and the nearest unit in the boundary of 𝒞i\mathcal{C}_{i}. By Assumption 3,

𝐄[|Ri|2T1i=1]=p1i2𝐄[|Yi(𝑫)𝐄[Yi(𝑫)i]|2T1i=1]p1i2ψ(t)2.{\bf E}[\lvert R_{i}\rvert^{2}\mid T_{1i}=1]=p_{1i}^{-2}{\bf E}\big{[}\lvert Y_{i}(\bm{D})-{\bf E}[Y_{i}(\bm{D})\mid\mathcal{F}_{i}]\rvert^{2}\mid T_{1i}=1\big{]}\leq p_{1i}^{-2}\psi(t)^{2}.

Similarly, 𝐄[|Ri|2T0i=1]p0i2ψ(t)2{\bf E}[\lvert R_{i}\rvert^{2}\mid T_{0i}=1]\leq p_{0i}^{-2}\psi(t)^{2}, so by the law of total probability and Remark 3, for some universal constant c>0c>0,

𝐄[|Ri|2]cψ(t)2and|Cov(Ri,Rj)|cψ(t).{\bf E}[\lvert R_{i}\rvert^{2}]\leq c\,\psi(t)^{2}\quad\text{and}\quad\lvert\text{Cov}(R_{i},R_{j})\rvert\leq c\,\psi(t).

We derive an alternate bound for the case s>4rns>4r_{n}. Let i(r)={Dj:ρ(i,j)r or j𝒞i}\mathcal{F}_{i}(r)=\{D_{j}\colon\rho(i,j)\leq r\text{ or }j\in\mathcal{C}_{i}\} and Xi=𝐄[Zii(s/2rn)]X_{i}={\bf E}[Z_{i}\mid\mathcal{F}_{i}(s/2-r_{n})]. Then

SiRi𝐄[Rii(s/2rn)]=R~iZiXiS_{i}\equiv R_{i}-{\bf E}[R_{i}\mid\mathcal{F}_{i}(s/2-r_{n})]=\tilde{R}_{i}\equiv Z_{i}-X_{i}

since 𝒩(i,s/2rn)𝒩(i,κn)\mathcal{N}(i,s/2-r_{n})\supseteq\mathcal{N}(i,\kappa_{n}). Moreover, XiXjX_{i}\perp\!\!\!\perp X_{j} since ρ(Q(i,s/2rn),Q(j,s/2rn))>2rn\rho(Q(i,s/2-r_{n}),Q(j,s/2-r_{n}))>2r_{n} and 2rn2r_{n} is the diameter of a cluster. Consequently, by (B.3),

|Cov(Ri,Rj)|\displaystyle\lvert\text{Cov}(R_{i},R_{j})\rvert |Cov(Xi,Xj)|+|Cov(Xi,Sj)|+|Cov(Si,Xj)|+|Cov(Si,Sj)|\displaystyle\leq\lvert\text{Cov}(X_{i},X_{j})\rvert+\lvert\text{Cov}(X_{i},S_{j})\rvert+\lvert\text{Cov}(S_{i},X_{j})\rvert+\lvert\text{Cov}(S_{i},S_{j})\rvert
=|Cov(Xi,R~j)|+|Cov(R~i,Xj)|+|Cov(R~i,R~j)|cψ(s/2rn).\displaystyle=\lvert\text{Cov}(X_{i},\tilde{R}_{j})\rvert+\lvert\text{Cov}(\tilde{R}_{i},X_{j})\rvert+\lvert\text{Cov}(\tilde{R}_{i},\tilde{R}_{j})\rvert\leq c\,\psi(s/2-r_{n}).

Applying the covariance bounds,

|[D]|mnn2i𝒩njΛi|𝐄[RiRj]|cmnn2k=1mns=12Rnt=0min{s,rn1}iJ(t,Ck)jΛi𝟏{ρ(i,j)[s1,s)}×(ψ(t)𝟏{s4rn}+ψ(s/2rn)𝟏{s>4rn}),\lvert[D]\rvert\leq\frac{m_{n}}{n^{2}}\sum_{i\in\mathcal{N}_{n}}\sum_{j\not\in\Lambda_{i}}\lvert{\bf E}[R_{i}R_{j}]\rvert\\ \leq c\frac{m_{n}}{n^{2}}\sum_{k=1}^{m_{n}}\sum_{s=1}^{\lfloor 2R_{n}\rfloor}\sum_{t=0}^{\lfloor\min\{s,r_{n}-1\}\rfloor}\sum_{i\in J(t,C_{k})}\sum_{j\not\in\Lambda_{i}}\bm{1}\{\rho(i,j)\in[s-1,s)\}\\ \times\big{(}\psi(t)\bm{1}\{s\leq 4r_{n}\}+\psi(s/2-r_{n})\bm{1}\{s>4r_{n}\}\big{)},

which is order (mn/n)1/2=o(1)(m_{n}/n)^{1/2}=o(1) by an argument similar to (B.5) and (B.6).

Step 2. We show that

σn1mn1ni𝒩n(𝐄[Zii]𝐄[Zi])d𝒩(0,1).\sigma_{n}^{-1}\sqrt{m_{n}}\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}\big{(}{\bf E}[Z_{i}\mid\mathcal{F}_{i}]-{\bf E}[Z_{i}]\big{)}\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}(0,1).

First, let σ~n2=Var(mnn1i𝒩n𝐄[Zii])\tilde{\sigma}_{n}^{2}=\text{Var}(\sqrt{m_{n}}n^{-1}\sum_{i\in\mathcal{N}_{n}}{\bf E}[Z_{i}\mid\mathcal{F}_{i}]). By Minkowski’s inequality,

|σnσ~n|Var(mn1ni𝒩n(Zi𝐄[Zii]))1/2,\lvert\sigma_{n}-\tilde{\sigma}_{n}\rvert\leq\text{Var}\left(\sqrt{m_{n}}\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}(Z_{i}-{\bf E}[Z_{i}\mid\mathcal{F}_{i}])\right)^{1/2}, (B.7)

which is o(1)o(1) by step 1. By Assumption 5, it then suffices to show

σ~n1mn1ni𝒩n(𝐄[Zii]𝐄[Zi])d𝒩(0,1).\tilde{\sigma}_{n}^{-1}\sqrt{m_{n}}\frac{1}{n}\sum_{i\in\mathcal{N}_{n}}\big{(}{\bf E}[Z_{i}\mid\mathcal{F}_{i}]-{\bf E}[Z_{i}]\big{)}\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}(0,1). (B.8)

We apply Lemma B.2 with Xi=n1mn1/2(𝐄[Zii]𝐄[Zi])X_{i}=n^{-1}m_{n}^{1/2}({\bf E}[Z_{i}\mid\mathcal{F}_{i}]-{\bf E}[Z_{i}]) and dependency graph 𝑨\bm{A} defined after (8). The maximum degree of 𝑨\bm{A} is at most 16maxk|Ck|16\max_{k}\lvert C_{k}\rvert, and maxk|Ck|=O(n/mn)\max_{k}\lvert C_{k}\rvert=O(n/m_{n}). Therefore, by Assumptions 1 and 5,

(B.9)(nmn)2n(mnn)3+(nmn)3/2n(mnn)4mn1/2.\eqref{wass}\lesssim\left(\frac{n}{m_{n}}\right)^{2}n\left(\frac{\sqrt{m_{n}}}{n}\right)^{3}+\left(\frac{n}{m_{n}}\right)^{3/2}\sqrt{n\left(\frac{\sqrt{m_{n}}}{n}\right)^{4}}\lesssim m_{n}^{-1/2}.

Since mnm_{n}\rightarrow\infty, (B.8) follows. ∎

Lemma B.2 ([41], Theorem 3.6).

Let {Xi}i=1n\{X_{i}\}_{i=1}^{n} be random variables with dependency graph 𝐀\bm{A} such that 𝐄[Xi4]<{\bf E}[X_{i}^{4}]<\infty and 𝐄[Xi]=0{\bf E}[X_{i}]=0. Define σ2Var(i=1nXi)\sigma^{2}\equiv\text{Var}(\sum_{i=1}^{n}X_{i}), 𝒲=i=1nXi/σ\mathcal{W}=\sum_{i=1}^{n}X_{i}/\sigma, and Ψ=maxi=1,,nj=1nAij\Psi=\max_{i=1,\dots,n}\sum_{j=1}^{n}A_{ij}. For 𝒵𝒩(0,1)\mathcal{Z}\sim\mathcal{N}(0,1),

d(𝒲,𝒵)Ψ2σ3i=1n𝐄[|Xi|3]+28Ψ3/2πσ2(i=1n𝐄[Xi4])1/2,d(\mathcal{W},\mathcal{Z})\leq\frac{\Psi^{2}}{\sigma^{3}}\sum_{i=1}^{n}{\bf E}[\lvert X_{i}\rvert^{3}]+\frac{\sqrt{28}\Psi^{3/2}}{\sqrt{\pi}\sigma^{2}}\left(\sum_{i=1}^{n}{\bf E}[X_{i}^{4}]\right)^{1/2}, (B.9)

where d(,)d(\cdot,\cdot) is the Wasserstein distance.
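To see the bound in action, the sketch below evaluates the right-hand side of (B.9) in closed form for a toy 1-dependent Gaussian sequence Xi=(ei+ei+1)/2nX_{i}=(e_{i}+e_{i+1})/\sqrt{2n}, a construction chosen purely for illustration (Ψ = 3 counts ii itself and its two neighbors, a conservative choice). The bound decays at the n1/2n^{-1/2} rate, paralleling the mn1/2m_{n}^{-1/2} rate obtained in the proof of Theorem 3:

```python
import numpy as np

def stein_bound(n, Psi=3):
    """Right-hand side of (B.9) for X_i = (e_i + e_{i+1}) / sqrt(2n), e_j iid N(0,1)."""
    v = 1.0 / n                               # Var(X_i)
    E_abs3 = v**1.5 * 2 * np.sqrt(2 / np.pi)  # E|N(0, v)|^3
    E_4 = 3 * v**2                            # E[N(0, v)^4]
    sigma2 = (4 * n - 2) / (2 * n)            # Var(sum_i X_i)
    term1 = Psi**2 / sigma2**1.5 * n * E_abs3
    term2 = np.sqrt(28) * Psi**1.5 / (np.sqrt(np.pi) * sigma2) * np.sqrt(n * E_4)
    return term1 + term2

bounds = [stein_bound(n) for n in (100, 400, 1600)]
assert all(b2 < b1 for b1, b2 in zip(bounds, bounds[1:]))  # bound shrinks in n
assert abs(bounds[0] / bounds[1] - 2) < 0.2                # ~n^{-1/2} decay
```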

Acknowledgments

The author thanks the referees and associate editor for helpful comments that improved the exposition of the paper.

Supplementary Material

supplement.zip: This zip file contains a PDF with proofs omitted in this text and code used to produce the simulation results in § 5.

References

  • [1] Andrews, D. (1991). Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation. Econometrica, 817–858.
  • [2] Anselin, L. (2001). Spatial Econometrics. In A Companion to Theoretical Econometrics (B. Baltagi, ed.), Chapter 14. Blackwell Publishing Ltd.
  • [3] Aronow, P. and Samii, C. (2017). Estimating Average Causal Effects Under General Interference, with Application to a Social Network Experiment. The Annals of Applied Statistics 11, 1912–1947.
  • [4] Aronow, P., Samii, C. and Wang, Y. (2020). Design-Based Inference for Spatial Experiments with Interference. arXiv preprint arXiv:2010.13599.
  • [5] Baird, S., Bohren, J., McIntosh, C. and Özler, B. (2018). Optimal Design of Experiments in the Presence of Interference. Review of Economics and Statistics 100, 844–860.
  • [6] Basse, G. and Airoldi, E. (2018). Model-Assisted Design of Experiments in the Presence of Network-Correlated Outcomes. Biometrika 105, 849–858.
  • [7] Basse, G., Feller, A. and Toulis, P. (2019). Randomization Tests of Causal Effects Under Interference. Biometrika 106, 487–494.
  • [8] Blattman, C., Green, D., Ortega, D. and Tobón, S. (2021). Place-Based Interventions at Scale: The Direct and Spillover Effects of Policing and City Services on Crime. Journal of the European Economic Association 19, 2022–2051.
  • [9] Chin, A. (2019). Regression Adjustments for Estimating the Global Treatment Effect in Experiments with Interference. Journal of Causal Inference 7.
  • [10] Choi, D. (2017). Estimation of Monotone Treatment Effects in Network Experiments. Journal of the American Statistical Association 112, 1147–1155.
  • [11] Cliff, A. and Ord, J. (1973). Spatial Autocorrelation. London: Pion.
  • [12] Cliff, A. and Ord, J. (1981). Spatial Processes: Models and Applications. London: Pion.
  • [13] Conley, T. (1999). GMM Estimation with Cross Sectional Dependence. Journal of Econometrics 92, 1–45.
  • [14] Donnelly, C., Woodroffe, R., Cox, D., Bourne, J., Gettinby, G., Le Fevre, A., McInerney, J. and Morrison, I. (2003). Impact of Localized Badger Culling on Tuberculosis Incidence in British Cattle. Nature 426, 834–837.
  • [15] Eckles, D., Karrer, B. and Ugander, J. (2017). Design and Analysis of Experiments in Networks: Reducing Bias from Interference. Journal of Causal Inference 5.
  • [16] Forastiere, L., Airoldi, E. and Mealli, F. (2021). Identification and Estimation of Treatment and Interference Effects in Observational Studies on Networks. Journal of the American Statistical Association 116, 901–918.
  • [17] Getis, A. (2008). A History of the Concept of Spatial Autocorrelation: A Geographer’s Perspective. Geographical Analysis 40, 297–309.
  • [18] Giffin, A., Reich, B., Yang, S. and Rappold, A. (2020). Generalized Propensity Score Approach to Causal Inference with Spatial Interference. arXiv preprint arXiv:2007.00106.
  • [19] Harshaw, C., Sävje, F., Eisenstat, D., Mirrokni, V. and Pouget-Abadie, J. (2021). Design and Analysis of Bipartite Experiments Under a Linear Exposure-Response Model. arXiv preprint arXiv:2103.06392.
  • [20] Hayes, R. and Moulton, L. (2017). Cluster Randomised Trials. Chapman and Hall/CRC.
  • [21] Hu, Y., Li, S. and Wager, S. (2022). Average Direct and Indirect Causal Effects Under Interference. Biometrika (forthcoming).
  • [22] Imbens, G. and Rubin, D. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.
  • [23] Jagadeesan, R., Pillai, N. and Volfovsky, A. (2020). Designs for Estimating the Treatment Effect in Networks with Interference. The Annals of Statistics 48, 679–712.
  • [24] Jenish, N. (2016). Spatial Semiparametric Model with Endogenous Regressors. Econometric Theory 32, 714–739.
  • [25] Jenish, N. and Prucha, I. (2009). Central Limit Theorems and Uniform Laws of Large Numbers for Arrays of Random Fields. Journal of Econometrics 150, 86–98.
  • [26] Jenish, N. and Prucha, I. (2011). On Spatial Processes and Asymptotic Inference Under Near-Epoch Dependence. U. Maryland working paper.
  • [27] Jenish, N. and Prucha, I. (2012). On Spatial Processes and Asymptotic Inference Under Near-Epoch Dependence. Journal of Econometrics 170, 178–190.
  • [28] Lahiri, S. (1996). On Inconsistency of Estimators Based on Spatial Data Under Infill Asymptotics. Sankhyā: The Indian Journal of Statistics, Series A, 403–417.
  • [29] Lahiri, S. (2003). Central Limit Theorems for Weighted Sums of a Spatial Process Under a Class of Stochastic and Fixed Designs. Sankhyā: The Indian Journal of Statistics, 356–388.
  • [30] Lahiri, S. and Zhu, J. (2006). Resampling Methods for Spatial Regression Models Under a Class of Stochastic Designs. The Annals of Statistics 34, 1774–1813.
  • [31] Leung, M. (2022). Causal Inference Under Approximate Neighborhood Interference. Econometrica 90, 267–293.
  • [32] Leung, M. (2022). Network Cluster-Robust Inference. arXiv preprint arXiv:2103.01470.
  • [33] Leung, M. (2022). Supplement to “Rate-Optimal Cluster-Randomized Designs for Spatial Interference”. DOI: 10.1214/22-AOS2224SUPP.
  • [34] Manski, C. (2013). Identification of Treatment Response with Social Interactions. The Econometrics Journal 16, S1–S23.
  • [35] Miguel, E. and Kremer, M. (2004). Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities. Econometrica 72, 159–217.
  • [36] Paler, L., Samii, C., Lisiecki, M. and Morel, A. (2015). Social and Environmental Impact of the Community Rangers Program in Aceh. Technical Report, World Bank, Washington, DC.
  • [37] Park, C. and Kang, H. (2021). Assumption-Lean Analysis of Cluster Randomized Trials in Infectious Diseases for Intent-to-Treat Effects and Spillover Effects Among a Vulnerable Subpopulation. Journal of the American Statistical Association (forthcoming).
  • [38] Peng, R., Sun, H. and Zanetti, L. (2017). Partitioning Well-Clustered Graphs: Spectral Clustering Works! SIAM Journal on Computing 46, 710–743.
  • [39] Pollmann, M. (2020). Causal Inference for Spatial Treatments. arXiv preprint arXiv:2011.00373.
  • [40] Pouget-Abadie, J., Mirrokni, V., Parkes, D. and Airoldi, E. (2018). Optimizing Cluster-Based Randomized Experiments Under Monotonicity. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2090–2099.
  • [41] Ross, N. (2011). Fundamentals of Stein’s Method. Probability Surveys 8, 210–293.
  • [42] Sävje, F. (2021). Causal Inference with Misspecified Exposure Mappings. arXiv preprint arXiv:2103.06471.
  • [43] Sävje, F., Aronow, P. and Hudgens, M. (2021). Average Treatment Effects in the Presence of Unknown Interference. The Annals of Statistics 49, 673–701.
  • [44] Sussman, D. and Airoldi, E. (2017). Elements of Estimation Theory for Causal Effects in the Presence of Network Interference. arXiv preprint arXiv:1702.03578.
  • [45] Toulis, P. and Kao, E. (2013). Estimation of Causal Peer Influence Effects. In International Conference on Machine Learning, 1489–1497.
  • [46] Ugander, J., Karrer, B., Backstrom, L. and Kleinberg, J. (2013). Graph Cluster Randomization: Network Exposure to Multiple Universes. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 329–337.
  • [47] Ugander, J. and Yin, H. (2020). Randomized Graph Cluster Randomization. arXiv preprint arXiv:2009.02297.
  • [48] Valcu, M. and Kempenaers, B. (2010). Spatial Autocorrelation: An Overlooked Concept in Behavioral Ecology. Behavioral Ecology 21, 902–905.
  • [49] Verbitsky-Savitz, N. and Raudenbush, S. (2012). Causal Inference Under Interference in Spatial Settings: A Case Study Evaluating Community Policing Program in Chicago. Epidemiologic Methods 1, 107–130.
  • [49] {barticle}[author] \bauthor\bsnmVerbitsky-Savitz, \bfnmN.\binitsN. and \bauthor\bsnmRaudenbush, \bfnmS.\binitsS. (\byear2012). \btitleCausal Inference Under Interference in Spatial Settings: A Case Study Evaluating Community Policing Program in Chicago. \bjournalEpidemiologic Methods \bvolume1 \bpages107–130. \endbibitem
  • [50] {barticle}[author] \bauthor\bsnmViviano, \bfnmD.\binitsD. (\byear2020). \btitleExperimental Design Under Network Interference. \bjournalarXiv preprint arXiv:2003.08421. \endbibitem
  • [51] {barticle}[author] \bauthor\bsnmZigler, \bfnmC.\binitsC. and \bauthor\bsnmPapadogeorgou, \bfnmG.\binitsG. (\byear2021). \btitleBipartite Causal Inference with Interference. \bjournalStatistical Science \bvolume36 \bpages109. \endbibitem
