This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Independent-Set Design of Experiments for Estimating Treatment and Spillover Effects under Network Interference

Chencheng Cai
Washington State University
[email protected]
Xu Zhang
Amazon Inc.
[email protected]
Edoardo M. Airoldi
Temple University
[email protected]
Chencheng Cai is Assistant Professor of Statistics at Department of Mathematics and Statistics, Washington State University. Xu Zhang is Data Scientist at Amazon Inc.. Edoardo M. Airoldi is Millard E. Gladfelter Professor of Statistics, Operations, and Data Science at the Fox School of Business at Temple University.
Abstract

Interference is ubiquitous when conducting causal experiments over networks. Except for certain network structures, causal inference on the network in the presence of interference is difficult due to the entanglement between the treatment assignments and the interference levels. In this article, we conduct causal inference under interference on an observed, sparse but connected network, and we propose a novel design of experiments based on an independent set. Compared to conventional designs, the independent-set design focuses on an independent subset of data and controls their interference exposures through the assignments to the rest (auxiliary set). We provide a lower bound on the size of the independent set from a greedy algorithm , and justify the theoretical performance of estimators under the proposed design. Our approach is capable of estimating both spillover effects and treatment effects. We justify its superiority over conventional methods and illustrate the empirical performance through simulations.

Keywords:
Causal Inference, Experimental Design, Independent Set, Network Interference

1 Introduction

Randomized experiments are widely regarded as the gold standard for estimating causal effects. However, spillover effect is ubiquitous in experiments when one unit’s treatment affects others’ outcomes, where the stable unit treatment value assumption (SUTVA) (Imbens and Rubin,, 2015) is violated. Examples include the traditional agricultural field experiments (Zaller and Köpke,, 2004), educational studies (Rosenbaum,, 2007), econometric assumptions (Banerjee et al.,, 2013; Johari et al.,, 2020), and social networks (Phan and Airoldi,, 2015). The spillover effect is also called “interference” and the two names are used interchangeably in the literature.

Estimating causal effects in the presence of interference is usually a difficult task, except for certain special cases. For instance, when the population of units can be well-partitioned into several isolated clusters, randomized saturation design (Hudgens and Halloran,, 2008) and its variations (for example, Eckles et al., (2017), Owen and Varian, (2020)) has shown success in estimating causal effects involving interference by controlling the interference level at each cluster with a pre-determined saturation. The difficulties in estimating spillover effects on a well-connected network are two-fold. The first is the entanglement between the treatment assignments and the spillover effects received by the units. Because the spillover effects received by the units are determined by the interference relationships and the assignment of all units, one cannot achieve separate controls on the treatment and the spillover of all units. The second difficulty comes from the collapse in spillover effect. A completely randomized experiment on all units anchors the spillover effect received by each units at its expectation for large networks because of the law of large numbers. In estimating spillover effects or total treatment effects, where we prefer diversified interference levels for the units, the collapse gives rise to an increase in variance and a reduction in power.

Although the individual treatment assignments and their interference received cannot be controlled independently for the whole dataset, an approximately separate control is feasible for a subset of the dataset. We propose a novel design of experiments on the network, where we focus on a high-quality subset of units. The subset contains non-interfering units and is called “independent set”, where we borrow the name from graph theory to emphasize that there is no edge in the independent set. The rest units form the “auxiliary set”, which provides interference to the units in the independent set. Such a partitioning of the independent set and the auxiliary set re-gains separate controls over the treatment and the interference on the independent set. The approach can be viewed as a trade-off between the quantity and the quality of data. The independent set has a smaller sample size compared to the full dataset. However, due to the possibility of separate controls of their treatments and interference, observed outcomes from the independent set units provide more information for the causal effect, and, therefore, are more efficient for causal inference purposes.

The remainder of this paper is organized as follows: Section 2 discusses related studies and summarizes our contribution. We formally define the causal effect under interference and introduce the independent set design in Section 3. Section 4 provides the theoretical foundation and quantifies the performance. In Section 5, we evaluate the performance of our approach on simulated data. Finally, in Section 6, we discuss our approach and highlight potential future work.

2 Related work and Our Contribution

In classical methods for randomized experiments, a key assumption is that there is no interference among the units and the outcome of each unit does not depend on the treatments assigned to the other units (SUTVA). However, this no-interference assumption is not plausible on networks. To address the interference problem, many approaches are proposed. First, researchers improve experimental designs to reduce interference, such as the cluster-based randomized design (Bland,, 2004), group-level experimental designs (Sinclair et al.,, 2012; Basse et al.,, 2017), randomized saturation design (Hudgens and Halloran,, 2008), and graph cluster randomization (Ugander et al.,, 2013; Eckles et al.,, 2017; Ugander and Yin,, 2020). These approaches aim to reduce interference by clustering and assume interference only exists within units in the cluster and no interference between clusters (known as partial interference (Sobel,, 2006)). However, when the isolated cluster assumption is violated, the randomized saturation design fails to estimate effects accurately (Cai et al.,, 2022). To extend the cluster-based randomized design, ego-clusters design (Saint-Jacques et al.,, 2019) is proposed to estimate spillover effects, which requires non-overlapping ego-clusters. Karwa and Airoldi, (2018) introduced a simple independent set design, which only focuses on estimating treatment effect and has no control of bias/variance of the estimator. Unlike graph cluster randomization and ego-clusters randomization that require non-overlapping clusters, our design allows units to share neighbors. Furthermore, recent studies incorporate network structure in the experiment based primarily on the measurement of local neighbors of units (Awan et al.,, 2020), and they estimate causal effects by matching the units with similar network structures. Some studies leverage special network structures in designs to relax SUTVA assumption to estimate causal effects (Aronow and Samii,, 2013; Toulis and Kao,, 2013; Yuan et al.,, 2021). In another direction, some studies leverage inference strategies, and they conduct a variety of hypotheses under network inference (Athey et al.,, 2015; Rosenbaum,, 2007; Pouget-Abadie et al., 2019b, ; Sussman and Airoldi,, 2017). When there exists a two-sided network or two-sided market for units, for example, consumers and suppliers, researchers proposed bipartite experiments (Pouget-Abadie et al., 2019a, ; Holtz et al.,, 2020; Doudchenko et al.,, 2020; Zigler and Papadogeorgou,, 2021) to mitigate spillover effects. In bipartite experiments, two distinct groups of units are connected together and form a bipartite graph. In contrast, our approach generates two sets from units that belong to one group in a network. In addition, an increasing number of studies focus on observational studies on networks (Liu et al.,, 2016; Ogburn and VanderWeele,, 2017; Forastiere et al.,, 2021). However, these methods require structural models and need to identify confounders.

Our method makes several key contributions: First, we propose a novel design that partitions data into two disjoint sets to gain separate controls on the treatment and the interference. Second, in contrast to most of the previous studies, which only focus on estimating of treatment effect or spillover effect, our design works as a general framework for network experiments involving interference and is capable of various causal tasks (estimating both treatment and spillover effects). Third, the treatment assignments on the two sets can be optimized directly to control the bias/variance of the estimator. Such a connection between the objective function and the bias/variance is discussed in Section 4. Fourth, we provide theoretical guarantees, including an almost-sure lower bound on the sample size and the analytic bias/variance for the estimators. Finally, unlike previous studies, which require SUTVA assumptions and no interference between units, or are restricted to special network structures to mitigate the spillover effects, our approach does not require SUTVA and could be implemented in arbitrary networks.

In Table 1, we compare our method to a few competitive designs in terms of assumptions on networks, effective sample size, and the ability to control interference.

Table 1: Comparison of designs for estimating spillover effects
Design Network Assumption Sample Size Interference Control
Independent Set random, sparse (nlogs)/s(n\log s)/s high
Ego-clusters random, sparse n/(s+1)n/(s+1) full
Randomized Saturation isolated clusters nn partial
Completely Randomized no nn no
*: number of units used for inference, nn: network size, ss: average degree

3 Independent-set design

3.1 Potential outcomes and causal effects under interference

Consider a set of nn experimental units, labeled with i=1,,ni=1,\dots,n. Each of them is assigned with a treatment Zi{0,1}Z_{i}\in\{0,1\}, for which we denote Zi=1Z_{i}=1 for treated, Zi=0Z_{i}=0 for control, and 𝒁:=(Z1,,Zn)\bm{Z}:=(Z_{1},\dots,Z_{n}) for the full treatment vector. Under Rubin’s framework of potential outcomes (Imbens and Rubin,, 2015), we assume Yi(𝒁)Y_{i}(\bm{Z}) is the potential outcome of unit ii that would be observed if the units were assigned with the treatment vector 𝒁\bm{Z}. We say ”unit ii is interfered by unit jj” when YiY_{i} depends on ZjZ_{j}, for iji\neq j. All the pairwise interference relationships can be represented by a graph 𝒢=(V,E)\mathcal{G}=(V,E), where V={1,,n}V=\{1,\dots,n\} is the vertex set, and (i,j)E(i,j)\in E if units ii and jj interfere with each other. Here, we assume all pairwise interference is symmetric such that 𝒢\mathcal{G} can be reduced to an undirected graph with each undirected edge representing the original bi-directed edges. The left panel in Figure 1 shows a diagram of 12 units and 18 interfering pairs.

Given the graph 𝒢\mathcal{G}, we write the unit ii’s potential outcome as Yi(Zi,𝒁𝒩i)Y_{i}(Z_{i},\bm{Z}_{\mathcal{N}_{i}}), where 𝒩i:={jV:(i,j)E}\mathcal{N}_{i}:=\{j\in V:(i,j)\in E\} is the neighbor of unit ii, and 𝒁𝒩i:=(Zj)j𝒩i\bm{Z}_{\mathcal{N}_{i}}:=(Z_{j})_{j\in\mathcal{N}_{i}} is the neighbor treatment (sub-)vector. Following Forastiere et al., (2021); Cai et al., (2022), we further assume that YiY_{i} depends on unit ii’s neighbor through the proportion of treated neighbors, which is defined by

ρi:=|𝒩i|1j𝒩iZj.\rho_{i}:=|\mathcal{N}_{i}|^{-1}\sum_{j\in\mathcal{N}_{i}}Z_{j}. (1)

Assumption 1 assumes the interference on the proportion of treated neighbor assumption. In this paper, we simply write Yi(Zi,ρi):{0,1}×[0,1]Y_{i}(Z_{i},\rho_{i}):\{0,1\}\times[0,1]\rightarrow\mathbb{R} as the potential outcome of unit ii.

Assumption 1 (interference on the proportion of treated neighbors)

Let 𝐳\bm{z} and 𝐳\bm{z}^{\prime} be two treatment vectors, and let Yi(𝐙)Y_{i}(\bm{Z}) be the potential outcome of unit ii. We assume

Yi(𝒛)=Yi(𝒛)whenever zi=zi and ρi=ρi,Y_{i}(\bm{z})=Y_{i}(\bm{z}^{\prime})\quad\text{whenever }z_{i}=z_{i}^{\prime}\text{ and }\rho_{i}=\rho_{i}^{\prime},

where ρi\rho_{i} and ρi\rho_{i}^{\prime} are the proportion of treated neighbors induced from 𝐳\bm{z} and 𝐳\bm{z}^{\prime}, correspondingly.

Furthermore, for any two tuples (z,ρ),(z,ρ){0,1}×[0,1](z,\rho),(z^{\prime},\rho^{\prime})\in\{0,1\}\times[0,1], we define the unit-level causal effect as τi(z,ρ,z,ρ):=Yi(z,ρ)Yi(z,ρ)\tau_{i}(z,\rho,z^{\prime},\rho^{\prime}):=Y_{i}(z,\rho)-Y_{i}(z^{\prime},\rho^{\prime}), with certain special cases defined as

(direct effect) τi(d)(ρ)\displaystyle\tau_{i}^{(d)}(\rho) :=τi(1,ρ,0,ρ),\displaystyle:=\tau_{i}(1,\rho,0,\rho),
(spillover effect) τi(i)(z,ρ,ρ)\displaystyle\tau_{i}^{(i)}(z,\rho,\rho^{\prime}) :=τi(z,ρ,z,ρ),\displaystyle:=\tau_{i}(z,\rho,z,\rho^{\prime}),
(total effect) τi(t)\displaystyle\tau_{i}^{(t)} :=τi(1,1,0,0).\displaystyle:=\tau_{i}(1,1,0,0).

The direct effect measures the marginal effect of ziz_{i} under a given level of interference. The indirect effect relates to the effect of interference and is often called “spillover effect”. The total treatment effect measures the change in outcome when all units are treated vs all units are control. Note that the total treatment effect can be decomposed as a combination of direct and indirect effects such that τi(t)=τi(d)(0)+τi(i)(1,1,0)\tau_{i}^{(t)}=\tau_{i}^{(d)}(0)+\tau_{i}^{(i)}(1,1,0).

Experimenters are usually interested in the population-level average effects, which are defined as τ¯(z,ρ,z,ρ):=n1i=1nτi(z,ρ,z,ρ)\bar{\tau}(z,\rho,z^{\prime},\rho^{\prime}):=n^{-1}\sum_{i=1}^{n}\tau_{i}(z,\rho,z^{\prime},\rho^{\prime}). The average direct / spillover / total treatment effect are defined in a similar way. The average total treatment effect τ¯(t)\bar{\tau}^{(t)} is of particular interest because it measures the average change in outcomes when all units are assigned to treatment. Further discussions on causal effects under interference can be found in Forastiere et al., (2021).

3.2 Independent-set design

123456789101112134791025681112Independent SetAuxiliary Set134791025681112𝒁I\bm{Z}_{I}𝒁A\bm{Z}_{A}
Figure 1: Illustration of Independent Set design. (Left) Example graph. (Middle) The graph is partitioned into an independent set and an auxiliary set. (Right) Conduct an experiment on the independent set and the auxiliary set, shaded nodes denote the treated nodes.

We disentangle the unit-level treatments and the interference by partitioning the units VV into two sets VIV_{I} and VAV_{A}. Let 𝒢I=(VI,EI)\mathcal{G}_{I}=(V_{I},E_{I}) be the sub-graph of 𝒢\mathcal{G} by restricting it to the vertex set VIV_{I}, and 𝒢A=(VA,EA)\mathcal{G}_{A}=(V_{A},E_{A}) be the sub-graph by restricting to VAV_{A}. Specifically, we require 𝒢I\mathcal{G}_{I} to be a null graph such that EI=E_{I}=\varnothing, or equivalently, VIV_{I} is an independent set of 𝒢\mathcal{G}. We call the counterpart VAV_{A} as the auxiliary set. The middle panel in Figure 1 gives one such partition for the interference graph in the left panel. We will later see that the two sets play different roles in the inference.

Denote the treatment vectors on the independent set VIV_{I} and on the auxiliary set VAV_{A} by 𝒁I\bm{Z}_{I} and 𝒁A\bm{Z}_{A}, correspondingly. Define the |VI|×|VA||V_{I}|\times|V_{A}| interference matrix 𝚪\bm{\Gamma} between VIV_{I} and VAV_{A} by [𝚪]ij:=di1𝕀{(i,j)E}[\bm{\Gamma}]_{ij}:=d_{i}^{-1}\mathbb{I}\{(i,j)\in E\}, for all iVI,jVAi\in V_{I},j\in V_{A}, where did_{i} is the degree of unit ii in 𝒢\mathcal{G}, and 𝕀{}\mathbb{I}\{\cdot\} is the indicator function. The interference vector on independent set 𝝆I:=(ρi)iVI\bm{\rho}_{I}:=(\rho_{i})_{i\in V_{I}} is given by 𝝆I=𝚪𝒁A\bm{\rho}_{I}=\bm{\Gamma}\bm{Z}_{A}. Because 𝚪\bm{\Gamma} is a constant matrix when the graph 𝒢\mathcal{G} is observed, 𝝆I\bm{\rho}_{I} is determined by the treatment vector 𝒁A\bm{Z}_{A} of the auxiliary set. If we restrict our estimation to the independent set VIV_{I}, the unit-level treatments 𝒁I\bm{Z}_{I} and the interference 𝝆I\bm{\rho}_{I} can be separately controlled through 𝒁I\bm{Z}_{I} and 𝒁A\bm{Z}_{A}, respectively. We call such a design ”independent set design”, which partitions the dataset into the independent set and the auxiliary set, controls the treatment and interference separately, and constructs an estimator based on the observed outcomes from the independent set.

The assignments for 𝒁I\bm{Z}_{I} and 𝒁A\bm{Z}_{A} can be designed to accomplish different causal inference goals. For example, to estimate the average direct effect τ¯(d)(ρ)\bar{\tau}^{(d)}(\rho), one can assign 𝒁I\bm{Z}_{I} by completely randomized design and optimize 𝒁A\bm{Z}_{A} by minimizing 𝚪𝒁Aρ1\|\bm{\Gamma}\bm{Z}_{A}-\rho\|_{1}. To estimate the average indirect effect τ¯(z,1,0)\bar{\tau}(z,1,0), one can assign Zi=zZ_{i}=z for all iVIi\in V_{I} and optimize 𝒁A\bm{Z}_{A} by maximizing Var[𝚪𝒁A]\mathrm{Var}[\bm{\Gamma}\bm{Z}_{A}]. To estimate the average total treatment effect, one can optimize 𝒁A\bm{Z}_{A} by maximizing Var[𝚪𝒁A]\mathrm{Var}[\bm{\Gamma}\bm{Z}_{A}] and let Zi=𝟏{ρi>0.5}Z_{i}=\bm{1}_{\{\rho_{i}>0.5\}} for all iVIi\in V_{I}. In the right panel of Figure 1, we provide one such assignment of 𝒁I\bm{Z}_{I} and 𝒁A\bm{Z}_{A} for estimating the total treatment effect. We will discuss the implementations for different causal inference tasks in the next sections.

Two aspects are compromised in order to improve the data quality of the independent set. The first one is the sample size. Independent set design restricts the estimation to use only observations of the independent set, which is usually a fraction of the dataset, but of higher quality in estimation. The second aspect that is compromised is the bias from representativity of the independent set, because the average effect on the independent set is not necessarily identical to the one on the whole dataset. The unbiasedness is achieved when we assume the observed interference graph is random and is unconfounded with the potential outcomes. We discuss the assumptions in Section 4.1.

Input: graph 𝒢=(V,E)\mathcal{G}=(V,E)
Output: independent set VIV_{I}
VIV_{I}\leftarrow\varnothing
while |V|>0|V|>0 do
       Choose ii from VV uniformly
       VIVI{i}V_{I}\leftarrow V_{I}\cup\{i\}
       VV{jV:(i,j)E or i=j}V\leftarrow V\setminus\{j\in V:(i,j)\in E\text{ or }i=j\}
return VIV_{I}
Algorithm 1 Greedy algorithm for finding independent set

Before proceeding to different design implementations, we first give a greedy algorithm of finding a locally largest independent set in Algorithm 1. The independent set of a graph is not unique in general — any subset of an independent set is another independent set. For inference purposes, one would expect the variance of the estimator scales as O(|VI|1)O(|V_{I}|^{-1}). Therefore, the goal is to find the largest independent set, which is known as NP-hard (Robson,, 1986). In practice, we adopt the fast greedy algorithm that can find the local largest independent set in linear time.

3.3 Inference

In this section, we assume the two sets VIV_{I} and VAV_{A} have been obtained through the aforementioned greedy algorithm, and discuss the designs of the assignments on 𝒁I\bm{Z}_{I} and 𝒁A\bm{Z}_{A} for different inference tasks. For simplicity, we denote their sizes |VI||V_{I}| and |VA||V_{A}| by nIn_{I} and nAn_{A}, respectively. Next, we discuss the estimations of direct treatment effect and spillover effect separately.

3.3.1 Estimating average direct effects

To estimate the average direct effects τ¯(d)(ρ)\bar{\tau}^{(d)}(\rho), one wish to find an assignment vector 𝒁A\bm{Z}_{A} on the auxiliary set such that all independent units’ interference received is close to ρ\rho. The optimal assignment vector can be found by the following optimization.

min𝒁A{0,1}nA𝚪𝒁Aρ𝟏1.\min_{\bm{Z}_{A}\in\{0,1\}^{n_{A}}}\ \|\bm{\Gamma}\bm{Z}_{A}-\rho\bm{1}\|_{1}. (2)

On the other hand, the treatment vector 𝒁I\bm{Z}_{I} for the independent set can be assigned randomly with a completely randomized design such that half of the units are randomly assigned to treated. The difference-in-means estimator on the independent set can be constructed as

τ^(d)(ρ)=1nI/2iVIYi(obs)Zi1nI/2iVIYi(obs)(1Zi),\hat{\tau}^{(d)}(\rho)=\frac{1}{n_{I}/2}\sum_{i\in V_{I}}Y_{i}^{(obs)}Z_{i}-\frac{1}{n_{I}/2}\sum_{i\in V_{I}}Y_{i}^{(obs)}(1-Z_{i}), (3)

where Yi(obs)Y_{i}^{(obs)} is the observed outcome of unit ii.

The optimization in (2) is a convex programming on a binary/integer space with an objective function lower bounded by 0. When ρ=0\rho=0 or 11, the optimization (2) has a trivial but perfect solution as 𝒁A=ρ𝟏\bm{Z}_{A}=\rho\bm{1} such that all the interference of independent units matches ρ\rho exactly. However, for general 0<ρ<10<\rho<1, this lower bound is not attainable. As we will later see in Section 4.3.1, the objective function in (2) is directly related to the bias of the estimator in (3).

3.3.2 Estimating average spillover effects and total treatment effects

When estimating the average indirect effect τ¯(i)(z,1,0)\bar{\tau}^{(i)}(z,1,0) or the average total treatment effect τ¯(t)\bar{\tau}^{(t)}, we hope the interference received by the independent set units spreads to the two extremes: either ρi=0\rho_{i}=0 for empty interference or ρi=1\rho_{i}=1 for saturated interference. Therefore, we propose to maximize the variance of (ρi)iVI(\rho_{i})_{i\in V_{I}} in the following optimization:

max𝒁A{0,1}nA𝒁AT𝚪T[𝑰1nI𝟏𝟏T]𝚪𝒁A.\max_{\bm{Z}_{A}\in\{0,1\}^{n_{A}}}\ \bm{Z}_{A}^{T}\bm{\Gamma}^{T}\left[\bm{I}-\frac{1}{n_{I}}\bm{1}\bm{1}^{T}\right]\bm{\Gamma}\bm{Z}_{A}. (4)

The above optimization is a concave quadratic programming, and the support can be expanded to its convex hull: [0,1]nA[0,1]^{n_{A}} while keeping the same solution. The objective function in (4) is bounded below by nI/4n_{I}/4, the largest eigenvalue of 𝑰nI1𝟏𝟏T\bm{I}-n_{I}^{-1}\bm{1}\bm{1}^{T}, by taking 𝚪𝒁A=(1,,1,0,,0)\bm{\Gamma}\bm{Z}_{A}=(1,\dots,1,0,\dots,0). The optimum may not be attainable when it is outside the manifold of 𝚪\bm{\Gamma}.

Consider a linear additive structure of the potential outcomes such that

Yi(Zi,ρi)=α+βZi+γρi+ϵi,Y_{i}(Z_{i},\rho_{i})=\alpha+\beta Z_{i}+\gamma\rho_{i}+\epsilon_{i}, (5)

where ϵi\epsilon_{i} represents the heterogeneity in potential outcomes and we assume Var[ϵi]=σ2\mathrm{Var}[\epsilon_{i}]=\sigma^{2}.

With the additive linear model in (5), the causal effects are written in a parametric form such that the direct effect is β\beta, the spillover effect is γ\gamma, and the total treatment effect is β+γ\beta+\gamma. The coefficients can be estimated through standard linear regression estimators for the units in the independent set. Specifically, the estimators are

(α^,β^,γ^)T=(𝑿T𝑿)1𝑿T𝒀I,(\hat{\alpha},\hat{\beta},\hat{\gamma})^{T}=(\bm{X}^{T}\bm{X})^{-1}\bm{X}^{T}\bm{Y}_{I},

where 𝑿=[𝟏,𝒁I,𝝆I]\bm{X}=[\bm{1},\bm{Z}_{I},\bm{\rho}_{I}] is the design matrix and 𝒀I\bm{Y}_{I} is the outcome vector on the independent set.

The variance of the estimators β^\hat{\beta} and γ^\hat{\gamma} depend inversely on the variance of 𝒁I\bm{Z}_{I} and 𝝆I\bm{\rho}_{I}, respectively. The optimization in (4) minimizes the variance of linear regression estimators. Detailed discussions on the variance of 𝝆I\bm{\rho}_{I} and the variance of the estimator are provided in Section 4.3.2.

Notice that, one can use non-parametric ways to estimate spillover or total treatment effects with difference-in-means estimators. However, it results in fewer units that have the required interference levels and, therefore, large variance.

4 Theoretical results

4.1 Assumptions

We first list a few more assumptions that are necessary for subsequent analysis.

Assumption 2 (super-population perspective)

We view the nn units from the super-population perspective such that their potential outcomes are i.i.d. from some super-population distribution.

In Assumption 2, we assume the sample units are i.i.d. from a super-population such that both VIV_{I} and VV can be viewed as representative finite samples. Furthermore, if we denote τ¯I\bar{\tau}_{I} as the average causal effects on the independent set VIV_{I}, then under Assumption 2, we have 𝔼[τ¯I]=𝔼[τ¯]\mathbb{E}[\bar{\tau}_{I}]=\mathbb{E}[\bar{\tau}], which gives the unbiasedness of average causal effect on VIV_{I} with respect to the (full) sample average effects.

Assumption 3 (unconfoundedness of network)

We assume the observed network 𝒢\mathcal{G} is a realization of random network, which is unconfounded with the units’ potential outcomes such that 𝒢𝒴\mathcal{G}\perp\!\!\!\perp\mathcal{Y}, where 𝒴:={Yi(Zi,ρi):i[n],Zi{0,1},ρi[0,1]}\mathcal{Y}:=\{Y_{i}(Z_{i},\rho_{i}):i\in[n],Z_{i}\in\{0,1\},\rho_{i}\in[0,1]\} is the table of potential outcomes.

Furthermore, we assume the network formation is independent of the potential outcomes in Assumption 3. Note that the greedy algorithm of finding the independent set in Algorithm 1, and the optimizations of 𝒁A\bm{Z}_{A} as in (2) and (4) are not node-symmetric — vertices with different degrees have different probabilities of being assigned to VIV_{I}. It gives rise to the selection bias of the set VIV_{I}. By assuming the network unconfoundedness, the selection bias from a single sample is averaged if we consider multiple repetitions of sampling and network randomization.

4.2 The greedy algorithm

The first question at hand is how large the set VIV_{I} we can find from a random graph 𝒢\mathcal{G}. There always exists such a graph that one can hardly find a large independent set (for example, the complete graph). Due to the randomness of graph 𝒢\mathcal{G}, we can show such extreme cases occur with a negligible probability as nn\rightarrow\infty. We consider the Erdös-Rényi random graph where each pair of units forms an edge with probability p=s/np=s/n independently. The random graph has an expected average degree of pn=spn=s. Note that most networks of interest (e.g. social networks) are connected and sparse networks. Therefore, we assume s=o(n)s=o(n) for sparsity and assume s=Ω(logn)s=\Omega(\log n) for connectivity. Theorem 1 gives a lower bound on the size of the independent set VIV_{I} resulting from Algorithm 1.

Theorem 1 (lower bound on the greedy algorithm)

Consider an Erdös-Rényi random graph 𝒢\mathcal{G} with nn vertices and edge probability p=s/np=s/n for sΩ(logn)s\in\Omega(\log n). Then with high probability, the independent set VIV_{I} from the greedy algorithm yields a size at least logssn\frac{\log s}{s}n.

Note that Algorithm 1 is a greedy approximation in finding the locally largest independent set. Dani and Moore, (2011) discussed the size of the globally largest independent set for random graphs. The resulting independent set has a similar order of size compared to the globally optimal one. In the egocentric design (Saint-Jacques et al.,, 2019), the number of ego-clusters is upper bounded by n/(s+1)n/(s+1), which is smaller than ours by a factor of logs\log s.

4.3 Estimation performance

In this section, we analyze the performance of the causal estimators for the tasks discussed in Section 3.3, and provide additional insights into the optimizations in (2) and (4).

4.3.1 Estimating average direct effect

We first investigate the performance of the difference-in-means estimator in (3) for the direct effect. Recall that in order to construct τ^(d)(ρ)\hat{\tau}^{(d)}(\rho), we first determine 𝒁A\bm{Z}_{A} through the optimization in (2), and then randomly assign half of VIV_{I} to treated through a completely randomized design. Given the sample set, including the potential outcomes 𝒴\mathcal{Y} and the graph 𝒢\mathcal{G}, for any assignment 𝒁A\bm{Z}_{A} on the auxiliary set, we provide an upper bound for the (conditional) bias and variance in Theorem 2.

Theorem 2 (bias and variance for estimating direct effects)

Suppose Assumptions 1, 2 and 3 hold, and the potential outcomes Yi(z,ρ)Y_{i}(z,\rho) is uniformly Lipschitz in ρ\rho such that there exists L>0L>0 satisfying |Yi(z,ρ1)Yi(z,ρ2)|L|ρ1ρ2||Y_{i}(z,\rho_{1})-Y_{i}(z,\rho_{2})|\leq L|\rho_{1}-\rho_{2}| , for all i[n],z{0,1}i\in[n],z\in\{0,1\} and ρ1,ρ2[0,1]\rho_{1},\rho_{2}\in[0,1]. Consider the estimator in (3). We have

𝔼[τ^(d)(ρ)τ¯(d)(ρ)𝒴,𝒢,𝒁A]2LnI𝚫1,and\displaystyle\mathbb{E}[\hat{\tau}^{(d)}(\rho)-\bar{\tau}^{(d)}(\rho)\mid\mathcal{Y},\mathcal{G},\bm{Z}_{A}]\leq\frac{2L}{n_{I}}\|\bm{\Delta}\|_{1},\quad\text{and}
|Var[τ^(d)(ρ)𝒴,𝒢,𝒁A]1nI𝕊I[Yi(1,ρ)+Yi(0,ρ)]|4nI(nI1)(LYmax𝚫1+L2𝚫12),\displaystyle\left|\mathrm{Var}[\hat{\tau}^{(d)}(\rho)\mid\mathcal{Y},\mathcal{G},\bm{Z}_{A}]-\frac{1}{n_{I}}\mathbb{S}_{I}[Y_{i}(1,\rho)+Y_{i}(0,\rho)]\right|\leq\frac{4}{n_{I}(n_{I}-1)}\left(LY_{max}\|\bm{\Delta}\|_{1}+L^{2}\|\bm{\Delta}\|_{1}^{2}\right),

where 𝚫=𝚪𝐙Aρ𝟏\bm{\Delta}=\bm{\Gamma}\bm{Z}_{A}-\rho\bm{1} is the deviation of interference from the target ρ\rho, 𝕊I[Yi(1,ρ)+Yi(0,ρ)]=(nI1)1iVI(Yi(1,ρ)+Yi(0,ρ)Y¯(1,ρ)Y¯(0,ρ))2\mathbb{S}_{I}[Y_{i}(1,\rho)+Y_{i}(0,\rho)]=(n_{I}-1)^{-1}\sum_{i\in V_{I}}(Y_{i}(1,\rho)+Y_{i}(0,\rho)-\bar{Y}(1,\rho)-\bar{Y}(0,\rho))^{2} with Y¯(z,ρ)=nI1iVIYi(z,ρ)\bar{Y}(z,\rho)=n_{I}^{-1}\sum_{i\in V_{I}}Y_{i}(z,\rho) is the sample variance of the potential outcomes on VIV_{I}, and Ymax=maxiVI|Yi(1,ρ)+Yi(0,ρ)Y¯(1,ρ)Y¯(0,ρ)|Y_{max}=\max_{i\in V_{I}}\ |Y_{i}(1,\rho)+Y_{i}(0,\rho)-\bar{Y}(1,\rho)-\bar{Y}(0,\rho)| is the maximum deviation of potential outcomes from their mean on VIV_{I}.

𝚫\bm{\Delta} represents the distance between the interference received by the independent units and the target value ρ\rho. When 𝚫=𝟎\bm{\Delta}=\bm{0}, the design is equivalent to a completely randomized experiment. In the general case, the optimization in (2) minimizes the upper bound for potential bias. Theorem 2 also tells the variance of τ^(d)\hat{\tau}^{(d)} is close to the completely randomized design on VIV_{I} if 𝚫1\|\bm{\Delta}\|_{1} is small.

4.3.2 Estimating spillover effect and total treatment effect

In this section, we consider the two treatment effects discussed in Section 3.3.2. We focus on the linear additive model in (5) such that the effects can be estimated through linear regression. Recall that, for estimating either the spillover effect or the total treatment effect, the assignment 𝒁A\bm{Z}_{A} on the auxiliary set is determined through the variance maximization in (4). We set 𝒁I\bm{Z}_{I} to zz for estimating the spillover effect and set 𝒁I\bm{Z}_{I} depending on 𝝆I\bm{\rho}_{I} for the total treatment effect.

The following theorem gives the bias and variance for the linear regression estimator for the spillover effects. The spillover effect estimator is always unbiased and its conditional variance depends inversely on the variance of the interference 𝝆I\bm{\rho}_{I} of the independent units. The optimization in 4 corresponds to the minimization of the conditional variance of the linear regression estimator.

Theorem 3 (bias and variance for estimating spillover effects)

Assume Assumptions 1, 2 and 3. Consider a linear regression over the parametric potential outcome model (5) with estimated parameters β^,γ^\hat{\beta},\hat{\gamma}. Set VI=z𝟏V_{I}=z\bm{1} and let τ^(i)(z,1,0)=γ^\hat{\tau}^{(i)}(z,1,0)=\hat{\gamma}. Then we have

𝔼[τ^(i)(z,1,0)𝒴]=τ¯(i)(z,1,0)andVar[τ^(i)(z,1,0)𝒴,𝒢,𝒁A]=σ2nIVar[𝝆I].\mathbb{E}[\hat{\tau}^{(i)}(z,1,0)\mid\mathcal{Y}]=\bar{\tau}^{(i)}(z,1,0)\qquad\text{and}\qquad\mathrm{Var}[\hat{\tau}^{(i)}(z,1,0)\mid\mathcal{Y},\mathcal{G},\bm{Z}_{A}]=\frac{\sigma^{2}}{n_{I}\mathrm{Var}[\bm{\rho}_{I}]}.

Theorem 4 gives the unbiasedness of the total treatment effect estimator, and provides a lower bound on the variance. Such a lower bound is, in general, not attainable in the binary support of 𝒁A\bm{Z}_{A}, but it provides the reciprocal dependence on Var[𝝆I]\mathrm{Var}[\bm{\rho}_{I}], which was maximized in the optimization 4 as well.

Theorem 4 (bias and variance for estimating total treatment effects)

Assume Assumptions 1, 2 and 3. Consider a linear regression over the parametric potential outcome model (5) with estimated parameters β^,γ^\hat{\beta},\hat{\gamma}. Let τ^(t)=β^+γ^\hat{\tau}^{(t)}=\hat{\beta}+\hat{\gamma}. If |Corr(𝐙I,𝛒I)|<1|\mathrm{Corr}(\bm{Z}_{I},\bm{\rho}_{I})|<1, we have

𝔼[τ^(t)𝒴]=τ¯(t), and Var[τ^(t)𝒢,𝒁A,𝒁I]=σ2nIVar[𝒁I𝝆I]Var[𝒁I]Var[𝝆I]Cov2[𝒁I,𝝆I].\mathbb{E}[\hat{\tau}^{(t)}\mid\mathcal{Y}]=\bar{\tau}^{(t)},\quad\text{ and }\quad\mathrm{Var}[\hat{\tau}^{(t)}\mid\mathcal{G},\bm{Z}_{A},\bm{Z}_{I}]=\frac{\sigma^{2}}{n_{I}}\frac{\mathrm{Var}[\bm{Z}_{I}-\bm{\rho}_{I}]}{\mathrm{Var}[\bm{Z}_{I}]\mathrm{Var}[\bm{\rho}_{I}]-\mathrm{Cov}^{2}[\bm{Z}_{I},\bm{\rho}_{I}]}.

The variance is lower bounded by σ2/(nIVar[𝛒I])\sigma^{2}/(n_{I}\mathrm{Var}[\bm{\rho}_{I}]).

5 Experiments

In this section, we implement the independent set design and validate results empirically on synthetic graphs. In order to evaluate the performance of the proposed design, we conduct simulations under a variety of simulated settings. The bias and variance of estimators are compared for completely randomized design on the independent set (CR), completely randomized design on the full graph (Full), ego-clusters, graph cluster, and Independent Set design (IS), which optimizes the assignments on the auxiliary set. To illustrate the benefits of IS design, we implement the design to estimate average spillover effects and average direct effects respectively.

5.1 Estimating average spillover effects

First, we examine the performance of estimating average spillover effects for different designs. We run experiments on 50 Erdös-Rényi random graphs (Erdős and Rényi,, 1959) with n=60,p=0.1n=60,p=0.1, and estimate the spillover effects. The potential outcomes are randomly generated by Yi(Zi,ρi)=α+βZi+γρi+U+ϵi,iVY_{i}(Z_{i},\rho_{i})=\alpha+\beta Z_{i}+\gamma\rho_{i}+U+\epsilon_{i},\ \forall i\in V, and we let baseline α=1\alpha=1, treatment parameter β=20\beta=20, interference parameter γ=5,10,15,20\gamma=5,10,15,20, UUnif(0,1)U\sim\mathrm{Unif}(0,1) and ϵiN(0,0.5)\epsilon_{i}\sim N(0,0.5).

Results are displayed in Figure 2. Independent set design outperforms all other methods to estimate the average spillover effects. At different levels of interference, IS design shows the lowest bias and variance. We notice that IS design is more stable than the other methods, the variance of IS design stabilizes at a very low level as spillover effects increase.

Refer to caption
(a) Bias
Refer to caption
(b) Variance
Figure 2: Bias and variance to estimate the average spillover effects. The bands around the lines represent the errors of the estimator, for each value of Gamma.

To introduce diversity, we run experiments on distinct random graphs: Erdös-Rényi (Erdős and Rényi,, 1959) G(n,p)G(n,p) , Barabási–Albert (Barabási and Albert,, 1999) G(n,m)G(n,m) and small-world (Watts and Strogatz,, 1998) G(n,p)G(n,p). The potential outcomes are based on Yi(Zi,ρi)=αi+βZi+γρi+ϵi,iVY_{i}(Z_{i},\rho_{i})=\alpha_{i}+\beta Z_{i}+\gamma\rho_{i}+\epsilon_{i},\ \forall i\in V, and αi=1,β=20,γ=10.\alpha_{i}=1,\beta=20,\gamma=10. ϵiN(0,0.5)\epsilon_{i}\sim N(0,0.5). Each configuration is run 2,000 times. The results of estimating spillover effects are summarized in Tables 2. As shown in Tables 2, IS design outperforms all other methods on distinct settings and achieves the lowest bias and variance.

Table 2: Performance of designs in estimating average spillover effects under distinct random graphs.
CR IS Full Graph Cluster Ego-Clusters
Graph Parameters Bias Variance Bias Variance Bias Variance Bias Variance Bias Variance
n = 100, p = 0.10 0.598 0.468 0.398 0.242 0.473 0.399 0.497 0.418 0.428 0.491
n = 200, p = 0.15 0.392 0.188 0.315 0.124 0.366 0.178 0.384 0.192 0.342 0.187
Erdös-Rényi n = 400, p = 0.15 0.246 0.096 0.225 0.067 0.266 0.094 0.242 0.089 0.239 0.091
n = 100, m = 1 0.192 0.064 0.152 0.032 0.168 0.049 0.177 0.056 0.159 0.072
Barabási–Albert n = 075, m = 1 0.239 0.055 0.135 0.041 0.181 0.051 0.185 0.067 0.176 0.079
n = 80, p = 0.05 0.303 0.165 0.212 0.087 0.263 0.089 0.274 0.093 0.232 0.117
Small world n = 50, p = 0.05 0.351 0.093 0.243 0.036 0.277 0.044 0.296 0.061 0.264 0.089

5.2 Estimating average direct effects

We next implement IS design to estimate average direct effects and evaluate the performance under various settings. Here, the potential outcomes are randomly generated according to Yi(Zi,ρi)=αi+βZi+γρi+ϵi,iVY_{i}(Z_{i},\rho_{i})=\alpha_{i}+\beta Z_{i}+\gamma\rho_{i}+\epsilon_{i},\ \forall i\in V, and we have αi=1,β=20,γ=10\alpha_{i}=1,\beta=20,\gamma=10. ϵiN(0,0.5)\epsilon_{i}\sim N(0,0.5). We run experiments 2,000 times and record the bias and variance of estimators. Table 3 shows the results of estimating average direct effects. In the presence of spillover effects, IS design achieves the best performance (the lowest bias and variance). When SUTVA is violated, network interference impairs the performance of conventional approaches. In this case, estimators in classical designs are biased to estimate average direct effects. In contrast, IS design could mitigate the spillover effects and improve performance by controlling treatment assignments on the auxiliary set.

Table 3: Performance of designs in estimating average direct effects under distinct random graphs.
CR IS Full Graph Cluster
Graph Parameters Bias Variance Bias Variance Bias Variance Bias Variance
n = 100, p = 0.1 0.262 0.035 0.117 0.009 0.261 0.023 0.221 0.024
n = 200, p = 0.1 0.193 0.038 0.094 0.005 0.188 0.015 0.156 0.013
Erdös-Rényi n = 400, p = 0.1 0.098 0.006 0.062 0.002 0.086 0.004 0.081 0.005
n = 080, m = 1 1.145 0.851 0.237 0.034 1.130 0.949 1.031 0.815
Barabási–Albert n = 100, m = 1 1.207 0.693 0.278 0.026 1.296 0.832 1.139 0.742
n = 100, p = 0.05 0.961 0.556 0.278 0.024 1.184 0.791 1.105 0.716
Small world n = 050, p = 0.05 1.101 0.869 0.225 0.029 1.115 0.877 1.013 0.854

6 Discussion and Conclusion

Conventional randomization experiments on networks are affected by interference among units in networks. Hence, causal inference on units in a network often presents technical challenges, and estimators for causal effects in classic randomization on networks will be biased. In this study, we propose a novel randomized design to estimate causal effects in networks. Our approach separates the units in a network into an independent set and an auxiliary set and controls interference by the assignments on the auxiliary set. Unlike classic randomization experiments that require SUTVA or special network structures, our design does not require SUTVA and can be applied to arbitrary networks. We show that our design provides more accurate estimations and has good interpretability.

Whilst IS design allows us to improve the estimation of causal effects on networks, there are several limitations. Due to the segmentation of units in a network, we can only estimate causal effects on the independent set. The sample size of the experiment will be smaller than using the full network. Moreover, the computational cost depends on the size of the auxiliary set, it may cost more time to optimize the assignments on the auxiliary set. Another limitation is that our design depends on observed networks. The performance of the proposed design on a misspecified network is unknown yet. Future research includes the improvement of the computational efficiency of the algorithm to optimize the assignments on the auxiliary set and further extension when the network is misspecified.

References

  • Aronow and Samii, (2013) Aronow, P. M. and Samii, C. (2013). Estimating average causal effects under interference between units. arXiv:1305.6156.
  • Athey et al., (2015) Athey, S., Eckles, D., and Imbens, G. W. (2015). Exact p-values for network interference. Technical report, National Bureau of Economic Research.
  • Awan et al., (2020) Awan, U., Morucci, M., Orlandi, V., Roy, S., Rudin, C., and Volfovsky, A. (2020). Almost-matching-exactly for treatment effect estimation under network interference. In International Conference on Artificial Intelligence and Statistics, pages 3252–3262. PMLR.
  • Banerjee et al., (2013) Banerjee, A., Chandrasekhar, A. G., Duflo, E., and Jackson, M. O. (2013). The diffusion of microfinance. Science, 341(6144).
  • Barabási and Albert, (1999) Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439):509–512.
  • Basse et al., (2017) Basse, G., Feller, A., and Toulis, P. (2017). Exact tests for two-stage randomized designs in the presence of interference. arXiv:1709.08036.
  • Bland, (2004) Bland, J. M. (2004). Cluster randomised trials in the medical literature: two bibliometric surveys. BMC Medical Research Methodology, 4(1):1–6.
  • Cai et al., (2022) Cai, C., Pouget-Abadie, J., and Airoldi, E. M. (2022). Optimizing randomized and deterministic saturation designs under interference. arXiv preprint arXiv:2203.09682.
  • Dani and Moore, (2011) Dani, V. and Moore, C. (2011). Independent sets in random graphs from the weighted second moment method. In APPROX-RANDOM, pages 472–482. Springer.
  • Doudchenko et al., (2020) Doudchenko, N., Zhang, M., Drynkin, E., Airoldi, E., Mirrokni, V., and Pouget-Abadie, J. (2020). Causal inference with bipartite designs. arXiv preprint arXiv:2010.02108.
  • Eckles et al., (2017) Eckles, D., Karrer, B., and Ugander, J. (2017). Design and analysis of experiments in networks: Reducing bias from interference. Journal of Causal Inference, 5(1).
  • Erdős and Rényi, (1959) Erdős, P. and Rényi, A. (1959). On random graphs i. Publ. Math. Debrecen, 6(290-297):18.
  • Forastiere et al., (2021) Forastiere, L., Airoldi, E. M., and Mealli, F. (2021). Identification and estimation of treatment and interference effects in observational studies on networks. Journal of the American Statistical Association, 116(534):901–918.
  • Frieze and Karoński, (2016) Frieze, A. and Karoński, M. (2016). Introduction to random graphs. Cambridge University Press.
  • Holtz et al., (2020) Holtz, D., Lobel, R., Liskovich, I., and Aral, S. (2020). Reducing interference bias in online marketplace pricing experiments. arXiv preprint arXiv:2004.12489.
  • Hudgens and Halloran, (2008) Hudgens, M. G. and Halloran, M. E. (2008). Toward causal inference with interference. Journal of the American Statistical Association, pages 832–842.
  • Imbens and Rubin, (2015) Imbens, G. W. and Rubin, D. B. (2015). Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
  • Johari et al., (2020) Johari, R., Li, H., and Weintraub, G. (2020). Experiment design in two-sided platforms: An analysis of bias. In ACM Conference on Economics and Computation (EC).
  • Karwa and Airoldi, (2018) Karwa, V. and Airoldi, E. M. (2018). A systematic investigation of classical causal inference strategies under mis-specification due to network interference. arXiv preprint arXiv:1810.08259.
  • Liu et al., (2016) Liu, L., Hudgens, M. G., and Becker-Dreps, S. (2016). On inverse probability-weighted estimators in the presence of interference. Biometrika, 103(4):829–842.
  • Ogburn and VanderWeele, (2017) Ogburn, E. L. and VanderWeele, T. J. (2017). Vaccines, contagion, and social networks. The Annals of Applied Statistics, 11(2):919–948.
  • Owen and Varian, (2020) Owen, A. B. and Varian, H. (2020). Optimizing the tie-breaker regression discontinuity design. Electronic Journal of Statistics, 14(2):4004–4027.
  • Phan and Airoldi, (2015) Phan, T. Q. and Airoldi, E. M. (2015). A natural experiment of social network formation and dynamics. Proceedings of the National Academy of Sciences, 112(21):6595–6600.
  • (24) Pouget-Abadie, J., Aydin, K., Schudy, W., Brodersen, K., and Mirrokni, V. (2019a). Variance reduction in bipartite experiments through correlation clustering. Advances in Neural Information Processing Systems, 32.
  • (25) Pouget-Abadie, J., Saint-Jacques, G., Saveski, M., Duan, W., Ghosh, S., Xu, Y., and Airoldi, E. M. (2019b). Testing for arbitrary interference on experimentation platforms. Biometrika, 106(4):929–940.
  • Robson, (1986) Robson, J. M. (1986). Algorithms for maximum independent sets. Journal of Algorithms, 7(3):425–440.
  • Rosenbaum, (2007) Rosenbaum, P. R. (2007). Interference between units in randomized experiments. Journal of the American Statistical Association, 102(477).
  • Saint-Jacques et al., (2019) Saint-Jacques, G., Varshney, M., Simpson, J., and Xu, Y. (2019). Using ego-clusters to measure network effects at linkedin. arXiv preprint arXiv:1903.08755.
  • Sinclair et al., (2012) Sinclair, B., McConnell, M., and Green, D. P. (2012). Detecting spillover effects: Design and analysis of multilevel experiments. American Journal of Political Science, 56(4):1055–1069.
  • Sobel, (2006) Sobel, M. E. (2006). What do randomized studies of housing mobility demonstrate? causal inference in the face of interference. Journal of the American Statistical Association, 101(476):1398–1407.
  • Sussman and Airoldi, (2017) Sussman, D. L. and Airoldi, E. M. (2017). Elements of estimation theory for causal effects in the presence of network interference. arXiv:1702.03578.
  • Toulis and Kao, (2013) Toulis, P. and Kao, E. (2013). Estimation of Causal Peer Influence Effects. In ICML.
  • Ugander et al., (2013) Ugander, J., Karrer, B., Backstrom, L., and Kleinberg, J. (2013). Graph cluster randomization: Network exposure to multiple universes. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’13, page 329–337.
  • Ugander and Yin, (2020) Ugander, J. and Yin, H. (2020). Randomized graph cluster randomization. arXiv preprint arXiv:2009.02297.
  • Wainwright, (2019) Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press.
  • Watts and Strogatz, (1998) Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘small-world’networks. Nature, 393(6684):440–442.
  • Yuan et al., (2021) Yuan, Y., Altenburger, K., and Kooti, F. (2021). Causal network motifs: Identifying heterogeneous spillover effects in a/b tests. In Proceedings of the Web Conference 2021, pages 3359–3370.
  • Zaller and Köpke, (2004) Zaller, J. G. and Köpke, U. (2004). Effects of traditional and biodynamic farmyard manure amendment on yields, soil chemical, biochemical and biological properties in a long-term field experiment. Biology and fertility of soils, 40(4):222–229.
  • Zigler and Papadogeorgou, (2021) Zigler, C. M. and Papadogeorgou, G. (2021). Bipartite causal inference with interference. Statistical science: a review journal of the Institute of Mathematical Statistics, 36(1):109.

Appendix A Proof of Theorem 1.

In order to prove the lower bound, we follow the differential equation method in (Frieze and Karoński,, 2016, Chapter 24).

Recall Algorithm 1, and let vt,t=1,,|VI|v_{t},t=1,\dots,|V_{I}| be the selected independent unit at step tt. Denote V(t)V(t) the set of vertices after step tt in the algorithm such that V(0)=nV(0)=n and V(|VI|)=0V(|V_{I}|)=0. Write X(t)=|V(t)|X(t)=|V(t)| as the number of remaining units after picking out tt independent units. Then the algorithm gives

X(t+1)=X(t)1d(vt+1V(t)),fort=0,,|VI|1,X(t+1)=X(t)-1-d(v_{t+1}\mid V(t)),\ \text{for}\ t=0,\dots,|V_{I}|-1,

where d(vt+1V(t))d(v_{t+1}\mid V(t)) is the degree of vt+1v_{t+1} in the subgraph containing the vertex set V(t)V(t). Let f(t,x)=sx(1p)f(t,x)=-sx-(1-p) and

D={(t,x):1n<t<1,0<x<1}.D=\left\{(t,x):-\frac{1}{n}<t<1,0<x<1\right\}.

We now show the conditions (P1)-(P4) in (Frieze and Karoński,, 2016, Chapter 24) are satisfied.

(P1): It is obvious X(t)<X(0)=nX(t)<X(0)=n. Therefore, (P1) is satisfied with C0=1C_{0}=1.
(P2): Since |X(t+1)X(t)|=d(vt+1V(t))+1d(vt+1V0)+1|X(t+1)-X(t)|=d(v_{t+1}\mid V(t))+1\leq d(v_{t+1}\mid V_{0})+1, where vtv_{t} is the vertex deleted at step tt. We claim with high probability,

d(vt+1V0)maxvGd(vV0)2(s+logn)1.d(v_{t+1}\mid V_{0})\leq\max_{v\in G}\ d(v\mid V_{0})\leq 2(s+\log n)-1.

It follows from

[maxvGd(vV0)ξ]\displaystyle\mathbb{P}\left[\max_{v\in G}\ d(v\mid V_{0})\geq\xi\right] vG[d(vV0)ξ]\displaystyle\leq\sum_{v\in G}\mathbb{P}[d(v\mid V_{0})\geq\xi]
(i)nexp{n1ns(nξ(n1)slognξ(n1)snξ(n1)s+1)}\displaystyle\stackrel{{\scriptstyle(i)}}{{\leq}}n\exp\left\{-\frac{n-1}{n}s\left(\frac{n\xi}{(n-1)s}\log\frac{n\xi}{(n-1)s}-\frac{n\xi}{(n-1)s}+1\right)\right\}
exp{lognξlogξs+ξ},\displaystyle\leq\exp\left\{\log n-\xi\log\frac{\xi}{s}+\xi\right\},

where (i)(i) follows from the Chernoff bound of binomial distributions (see, for example, Chapter 2 in Wainwright, (2019)).
By setting ξ=e2(s+logn)1>e2s\xi=e^{2}(s+\log n)-1>e^{2}s, we have

[maxvGd(vV0)e2(s+logn)1]neξene21ee2s.\mathbb{P}\left[\max_{v\in G}\ d(v\mid V_{0})\geq e^{2}(s+\log n)-1\right]\leq ne^{-\xi}\leq\frac{e}{n^{e^{2}-1}}e^{-e^{2}s}.

Therefore,

|X(t+1)X(t)|e2(s+logn)|X(t+1)-X(t)|\leq e^{2}(s+\log n)

is satisfied with probability at least 1n1e2e1e2s1n6e6s1-n^{1-e^{2}}e^{1-e^{2}s}\geq 1-n^{-6}e^{-6s}.
(P3):Let \mathcal{E} be the event that (P2) holds. Then,

|𝔼(X(t+1)X(t)V(t),)f(t/n,X(t)/n)|=0.|\mathbb{E}(X(t+1)-X(t)\mid V(t),\mathcal{E})-f(t/n,X(t)/n)|=0.

(P4): It is immediate that |f(t,x)f(t,x)|=s|xx|.|f(t,x)-f(t^{\prime},x^{\prime})|=s|x-x^{\prime}|.

To use Theorem 24.1 in Frieze and Karoński, (2016), we set

C0\displaystyle C_{0} =1\displaystyle=1
β\displaystyle\beta =e2(s+logn)\displaystyle=e^{2}(s+\log n)
λ\displaystyle\lambda =nϵ1/3(s+logn)\displaystyle=n^{\epsilon-1/3}(s+\log n)
γ\displaystyle\gamma =n6e6s\displaystyle=n^{-6}e^{-6s}
α\displaystyle\alpha =n3ϵe6.\displaystyle=n^{3\epsilon}e^{-6}.

Then we have X(t)=nz(t/n)+O(nϵ+2/3(s+logn))X(t)=nz(t/n)+O(n^{\epsilon+2/3}(s+\log n)) uniformly on 0tσt0\leq t\leq\sigma t, with probability at least 1O(n6e6s+n1/3ϵen3ϵe6)=1O(n6e6s)1O(n12)1-O(n^{-6}e^{-6s}+n^{1/3-\epsilon}e^{-n^{3\epsilon}e^{-6}})=1-O(n^{-6}e^{-6s})\geq 1-O(n^{-12}).

z(t)z(t) satisfies the following ODE that:

z(t)=f(t,z(t))=sz(t)(1p),z^{\prime}(t)=f(t,z(t))=-sz(t)-(1-p),

with initial condition that z(0)=X(0)=nz(0)=X(0)=n. The solution is

z(t)=1ps+(1+1ps)est.z(t)=-\frac{1-p}{s}+\left(1+\frac{1-p}{s}\right)e^{-st}.

We have z(t)>0z(t)>0 on [0,1slogs+1p1p][0,logss]\left[0,\frac{1}{s}\log\frac{s+1-p}{1-p}\right]\supset\left[0,\frac{\log s}{s}\right]. Therefore, with high probability, X(t)0X(t)\geq 0 for tlogssnt\leq\frac{\log s}{s}n.

In conclusion, we have at least logssn\frac{\log s}{s}n iterations in the greedy algorithm, resulting in an independent set of size at least logssn\frac{\log s}{s}n with probability at least 1O(n12)1-O(n^{-12}).

Appendix B Proof of Theorem 2.

We fix the potential outcomes 𝒴\mathcal{Y}, the graph 𝒢\mathcal{G}, and the assignment 𝒁A\bm{Z}_{A} for the auxiliary set. Recall the difference-in-means estimator:

τ^(d)(ρ)=1nI/2iVIYi(obs)Zi1nI/2iVIYi(obs)(1Zi).\hat{\tau}^{(d)}(\rho)=\frac{1}{n_{I}/2}\sum_{i\in V_{I}}Y_{i}^{(obs)}Z_{i}-\frac{1}{n_{I}/2}\sum_{i\in V_{I}}Y_{i}^{(obs)}(1-Z_{i}).

We have the expectation:

𝔼[τ^(d)(ρ)𝒴,𝒢,𝒁A]\displaystyle\mathbb{E}[\hat{\tau}^{(d)}(\rho)\mid\mathcal{Y},\mathcal{G},\bm{Z}_{A}] =1nIiVIYi(1,ρi)1nIiVIYi(0,ρi),\displaystyle=\frac{1}{n_{I}}\sum_{i\in V_{I}}Y_{i}(1,\rho_{i})-\frac{1}{n_{I}}\sum_{i\in V_{I}}Y_{i}(0,\rho_{i}),

where we use the facts that ρi\rho_{i} is fixed given 𝒢\mathcal{G} and 𝒁A\bm{Z}_{A}, and 𝔼[Zi]=1/2\mathbb{E}[Z_{i}]=1/2. Therefore, the bias is

|𝔼[τ^(d)(ρ)𝒴,𝒢,𝒁A]τ¯(d)(ρ)|\displaystyle\left|\mathbb{E}[\hat{\tau}^{(d)}(\rho)\mid\mathcal{Y},\mathcal{G},\bm{Z}_{A}]-\bar{\tau}^{(d)}(\rho)\right| =|1nIiVI[Yi(1,ρi)Yi(1,ρ)]1nIiVI[Yi(0,ρi)Yi(1,ρ)]|\displaystyle=\left|\frac{1}{n_{I}}\sum_{i\in V_{I}}[Y_{i}(1,\rho_{i})-Y_{i}(1,\rho)]-\frac{1}{n_{I}}\sum_{i\in V_{I}}[Y_{i}(0,\rho_{i})-Y_{i}(1,\rho)]\right|
2LnIiVi|ρiρ|\displaystyle\leq\frac{2L}{n_{I}}\sum_{i\in V_{i}}|\rho_{i}-\rho|
=2LnI𝝆Iρ𝟏1.\displaystyle=\frac{2L}{n_{I}}\|\bm{\rho}_{I}-\rho\bm{1}\|_{1}.

Next, we give the variance.

Var[τ^(d)(ρ)𝒴,𝒢,𝒁A]\displaystyle\mathrm{Var}[\hat{\tau}^{(d)}(\rho)\mid\mathcal{Y},\mathcal{G},\bm{Z}_{A}]
=\displaystyle= Var[1nI/2iVI[Yi(1,ρi)+Yi(0,ρi)]Zi]\displaystyle\mathrm{Var}\left[\frac{1}{n_{I}/2}\sum_{i\in V_{I}}[Y_{i}(1,\rho_{i})+Y_{i}(0,\rho_{i})]Z_{i}\right]
=\displaystyle= 4nI2iVI[Yi(1,ρi)+Yi(0,ρi)]2Var[Zi]\displaystyle\frac{4}{n_{I}^{2}}\sum_{i\in V_{I}}[Y_{i}(1,\rho_{i})+Y_{i}(0,\rho_{i})]^{2}\mathrm{Var}[Z_{i}]
+4nI2ijVI[Yi(1,ρi)+Yi(0,ρi)][Yj(1,ρj)+Yj(0,ρj)]Cov[Zi,Zj]\displaystyle\qquad+\frac{4}{n_{I}^{2}}\sum_{i\neq j\in V_{I}}[Y_{i}(1,\rho_{i})+Y_{i}(0,\rho_{i})][Y_{j}(1,\rho_{j})+Y_{j}(0,\rho_{j})]\mathrm{Cov}[Z_{i},Z_{j}]
=\displaystyle= 1nI2iVI[Yi(1,ρi)+Yi(0,ρi)]21nI2(nI1)ijVI[Yi(1,ρi)+Yi(0,ρi)][Yj(1,ρj)+Yj(0,ρj)]\displaystyle\frac{1}{n_{I}^{2}}\sum_{i\in V_{I}}[Y_{i}(1,\rho_{i})+Y_{i}(0,\rho_{i})]^{2}-\frac{1}{n_{I}^{2}(n_{I}-1)}\sum_{i\neq j\in V_{I}}[Y_{i}(1,\rho_{i})+Y_{i}(0,\rho_{i})][Y_{j}(1,\rho_{j})+Y_{j}(0,\rho_{j})]
=\displaystyle= 1nI𝕊I[Yi(1,ρi)+Yi(0,ρi)]\displaystyle\frac{1}{n_{I}}\mathbb{S}_{I}[Y_{i}(1,\rho_{i})+Y_{i}(0,\rho_{i})]

Therefore, we have

|Var[τ^(d)(ρ)𝒴,𝒢,𝒁A]1nI𝕊I[Yi(1,ρ)+Yi(0,ρ)]|\displaystyle\left|\mathrm{Var}[\hat{\tau}^{(d)}(\rho)\mid\mathcal{Y},\mathcal{G},\bm{Z}_{A}]-\frac{1}{n_{I}}\mathbb{S}_{I}[Y_{i}(1,\rho)+Y_{i}(0,\rho)]\right|
=\displaystyle= 1nI[2𝕊I[Yi(1,ρi)+Yi(0,ρi)]𝕊I[Yi(1,ρ)+Yi(0,ρ)]]\displaystyle\frac{1}{n_{I}}\left[2\mathbb{S}_{I}[Y_{i}(1,\rho_{i})+Y_{i}(0,\rho_{i})]-\mathbb{S}_{I}[Y_{i}(1,\rho)+Y_{i}(0,\rho)]\right]
=\displaystyle= 1nI(2𝕊I[δi,Yi(1,ρ)+Yi(0,ρ)]+𝕊I[δi])\displaystyle\frac{1}{n_{I}}\left(2\mathbb{S}_{I}[\delta_{i},Y_{i}(1,\rho)+Y_{i}(0,\rho)]+\mathbb{S}_{I}[\delta_{i}]\right)
\displaystyle\leq 1nI(nI1)[2𝜹I1Ymax+𝜹I22]\displaystyle\frac{1}{n_{I}(n_{I}-1)}\left[2\|\bm{\delta}_{I}\|_{1}Y_{max}+\|\bm{\delta}_{I}\|_{2}^{2}\right]
\displaystyle\leq 4nI(nI1)[L𝚫I1Ymax+L2𝚫I12],\displaystyle\frac{4}{n_{I}(n_{I}-1)}\left[L\|\bm{\Delta}_{I}\|_{1}Y_{max}+L^{2}\|\bm{\Delta}_{I}\|_{1}^{2}\right],

where δi=Yi(1,ρi)+Yi(0,ρi)Yi(1,ρ)+Yi(0,ρ)2LΔi\delta_{i}=Y_{i}(1,\rho_{i})+Y_{i}(0,\rho_{i})-Y_{i}(1,\rho)+Y_{i}(0,\rho)\leq 2L\Delta_{i}, and

Ymax=maxiVi|Yi(1,ρ)+Yi(0,ρ)1nIiVI[Yi(1,ρ)+Yi(0,ρ)]|.Y_{max}=\max_{i\in V_{i}}\ \left|Y_{i}(1,\rho)+Y_{i}(0,\rho)-\frac{1}{n_{I}}\sum_{i\in V_{I}}\left[Y_{i}(1,\rho)+Y_{i}(0,\rho)\right]\right|.

.

Appendix C Proof of Theorem 3.

Consider the linear regression Yiα+βZi+γρiY_{i}\sim\alpha+\beta Z_{i}+\gamma\rho_{i} for iVIi\in V_{I}. When ZiZ_{i} is a constant as in the design, the linear regression is equivalent to Yiα~+γρiY_{i}\sim\tilde{\alpha}+\gamma\rho_{i}, where α~:=α+βz\tilde{\alpha}:=\alpha+\beta z. By observing τ^(i)(z,1,0)=γ^\hat{\tau}^{(i)}(z,1,0)=\hat{\gamma}, the result follows immediately from univariate linear regression.

Appendix D Proof of Theorem 4.

Let 𝒀I\bm{Y}_{I} be the observed outcomes on the independent set such that 𝒀I=α𝟏+β𝒁I+γ𝝆I+ϵI\bm{Y}_{I}=\alpha\bm{1}+\beta\bm{Z}_{I}+\gamma\bm{\rho}_{I}+\bm{\epsilon}_{I}. We write the design matrix as

𝑿=[𝟏𝒁I𝝆I]=[𝟏𝒁~I𝝆~I][1Z¯Iρ¯I010001],\bm{X}=\begin{bmatrix}\bm{1}&\bm{Z}_{I}&\bm{\rho}_{I}\end{bmatrix}=\begin{bmatrix}\bm{1}&\tilde{\bm{Z}}_{I}&\tilde{\bm{\rho}}_{I}\end{bmatrix}\begin{bmatrix}1&\bar{Z}_{I}&\bar{\rho}_{I}\\ 0&1&0\\ 0&0&1\end{bmatrix},

where Z¯I\bar{Z}_{I} and ρ¯I\bar{\rho}_{I} are sample means of 𝒁I\bm{Z}_{I} and 𝝆I\bm{\rho}_{I}, correspondingly, and 𝒁~I\tilde{\bm{Z}}_{I} and 𝝆~I\tilde{\bm{\rho}}_{I} are de-meaned versions of 𝒁I\bm{Z}_{I} and 𝝆I\bm{\rho}_{I}, correspondingly.

Let α^\hat{\alpha}, β^\hat{\beta}, γ^\hat{\gamma} be the regression estimators. The unbiasedness of β^+γ^\hat{\beta}+\hat{\gamma} to the true value β+γ\beta+\gamma is immediate. Now we calculate the covariance matrix such that

Cov[(α^,β^,γ^)]\displaystyle\mathrm{Cov}[(\hat{\alpha},\hat{\beta},\hat{\gamma})] =σ2[𝑿T𝑿]1\displaystyle=\sigma^{2}[\bm{X}^{T}\bm{X}]^{-1}
=σ2[1Z¯Iρ¯I010001][nI000nIVar[𝒁I]nICov[𝒁I,𝝆I]0nICov[𝒁I,𝝆I]nIVar[𝝆I]]1[100Z¯I10ρ¯I01]\displaystyle=\sigma^{2}\begin{bmatrix}1&-\bar{Z}_{I}&-\bar{\rho}_{I}\\ 0&1&0\\ 0&0&1\end{bmatrix}\begin{bmatrix}n_{I}&0&0\\ 0&n_{I}\mathrm{Var}[\bm{Z}_{I}]&n_{I}\mathrm{Cov}[\bm{Z}_{I},\bm{\rho}_{I}]\\ 0&n_{I}\mathrm{Cov}[\bm{Z}_{I},\bm{\rho}_{I}]&n_{I}\mathrm{Var}[\bm{\rho}_{I}]\end{bmatrix}^{-1}\begin{bmatrix}1&0&0\\ -\bar{Z}_{I}&1&0\\ -\bar{\rho}_{I}&0&1\end{bmatrix}
=σ2nI[1S1(Z¯Iρ2¯ρ¯ZIρ¯)S1(Z¯IZIρ¯ρ¯Z2¯)S1(Z¯Iρ2¯ρ¯ZIρ¯)S1Var[𝝆I]S1Cov[𝒁I,𝝆I]S1(Z¯IZIρ¯ρ¯Z2¯)S1Cov[𝒁I,𝝆I]S1Var[𝒁I]],\displaystyle=\frac{\sigma^{2}}{n_{I}}\begin{bmatrix}1&-S^{-1}(\overline{Z}_{I}\overline{\rho^{2}}-\overline{\rho}\overline{Z_{I}\rho})&S^{-1}(\overline{Z}_{I}\overline{Z_{I}\rho}-\overline{\rho}\overline{Z^{2}})\\ -S^{-1}(\overline{Z}_{I}\overline{\rho^{2}}-\overline{\rho}\overline{Z_{I}\rho})&S^{-1}\mathrm{Var}[\bm{\rho}_{I}]&-S^{-1}\mathrm{Cov}[\bm{Z}_{I},\bm{\rho}_{I}]\\ S^{-1}(\overline{Z}_{I}\overline{Z_{I}\rho}-\overline{\rho}\overline{Z^{2}})&-S^{-1}\mathrm{Cov}[\bm{Z}_{I},\bm{\rho}_{I}]&S^{-1}\mathrm{Var}[\bm{Z}_{I}]\end{bmatrix},

where S=Var[𝒁I]Var[𝝆I]Cov2[𝒁I,𝝆I]S=\mathrm{Var}[\bm{Z}_{I}]\mathrm{Var}[\bm{\rho}_{I}]-\mathrm{Cov}^{2}[\bm{Z}_{I},\bm{\rho}_{I}].
Therefore, we have

Var[β^+γ^]=σ2nIVar[𝒁I]+Var[𝝆I]2Cov[𝒁I,𝝆I]Var[𝒁I]Var[𝝆I]Cov2[𝒁I,𝝆I].\mathrm{Var}[\hat{\beta}+\hat{\gamma}]=\frac{\sigma^{2}}{n_{I}}\frac{\mathrm{Var}[\bm{Z}_{I}]+\mathrm{Var}[\bm{\rho}_{I}]-2\mathrm{Cov}[\bm{Z}_{I},\bm{\rho}_{I}]}{\mathrm{Var}[\bm{Z}_{I}]\mathrm{Var}[\bm{\rho}_{I}]-\mathrm{Cov}^{2}[\bm{Z}_{I},\bm{\rho}_{I}]}.

Consider the function f(λ)f(\lambda):

λVar[𝒁I]+Var[𝝆I]2λVar[𝒁I]Var[𝝆I]λ2.\lambda\mapsto\frac{\mathrm{Var}[\bm{Z}_{I}]+\mathrm{Var}[\bm{\rho}_{I}]-2\lambda}{\mathrm{Var}[\bm{Z}_{I}]\mathrm{Var}[\bm{\rho}_{I}]-\lambda^{2}}.

Then we have

dfdλ=2(λVar[𝒁I])(λVar[𝝆I])(Var[𝒁I]Var[𝝆I]Cov2[𝒁I,𝝆I])2.\frac{df}{d\lambda}=-\frac{2(\lambda-\mathrm{Var}[\bm{Z}_{I}])(\lambda-\mathrm{Var}[\bm{\rho}_{I}])}{(\mathrm{Var}[\bm{Z}_{I}]\mathrm{Var}[\bm{\rho}_{I}]-\mathrm{Cov}^{2}[\bm{Z}_{I},\bm{\rho}_{I}])^{2}}.

Therefore, when λ2<Var[𝒁I]Var[𝝆I]\lambda^{2}<\mathrm{Var}[\bm{Z}_{I}]\mathrm{Var}[\bm{\rho}_{I}], f(λ)f(\lambda) takes its minimum at λ=Var[𝒁I]Var[𝝆I]\lambda^{*}=\mathrm{Var}[\bm{Z}_{I}]\wedge\mathrm{Var}[\bm{\rho}_{I}] such that

f(λ)=1Var[𝒁I]Var[𝝆I]1Var[𝝆I].f(\lambda^{*})=\frac{1}{\mathrm{Var}[\bm{Z}_{I}]\wedge\mathrm{Var}[\bm{\rho}_{I}]}\geq\frac{1}{\mathrm{Var}[\bm{\rho}_{I}]}.

Therefore, we have

Var[β^+γ^]σ2nIVar[𝝆I].\mathrm{Var}[\hat{\beta}+\hat{\gamma}]\geq\frac{\sigma^{2}}{n_{I}\mathrm{Var}[\bm{\rho}_{I}]}.

The lower bound can be obtained by choosing 𝒁I=𝝆I[0,1]nI\bm{Z}_{I}=\bm{\rho}_{I}\in[0,1]^{n_{I}}, which is in general outside of the binary support for 𝒁I\bm{Z}_{I}.