Causal Effects in Twin Studies: the Role of Interference
2 Department of Psychology, University of Minnesota, Minneapolis, MN
3 Division of Biostatistics, University of Minnesota, Minneapolis, MN
Abstract
The use of twin designs to address causal questions is becoming increasingly popular. A standard assumption is that there is no interference between twins; that is, no twin’s exposure has a causal impact on their co-twin’s outcome. However, there may be settings in which this assumption would not hold, and this would (1) impact the causal interpretation of parameters obtained by commonly used existing methods; (2) change which effects are of greatest interest; and (3) impact the conditions under which we may estimate these effects. We explore these issues, and we derive semi-parametric efficient estimators for causal effects in the presence of interference between twins. Using data from the Minnesota Twin Family Study, we apply our estimators to assess whether twins’ consumption of alcohol in early adolescence may have a causal impact on their co-twins’ substance use later in life.
Keywords: Co-twin control method; Semi-parametric efficiency; Spillover effect
1 Introduction
While researchers have predominantly used twin studies to assess the heritability of a given trait, they are also increasingly leveraging twin designs to learn about the causal relationship between an exposure and an outcome: see for example [15, 7, 8, 9, 10, 19] and references therein. A popular tool in this area of twins research is the co-twin control method, in which an unexposed twin serves as the control for their exposed co-twin. This method uses the fact that twins are naturally matched on many predictors of the exposure and the outcome, such as shared genetics and characteristics of their shared environment; this can make it possible to estimate the causal effect of the exposure on the outcome even if these shared predictors are unobserved [15]. In this paper, we draw attention to an important causal assumption that underlies this approach. This is the assumption that there is no interference between twins, meaning one twin’s exposure has no causal impact on their co-twin’s outcome. This is a standard assumption made in the twins literature [21, 10]; however, evidence from the literature on sibling influence suggests that it may not hold in some cases. A number of studies have identified variables, such as substance use, where a subject’s behavior may be causally impacted by that of their siblings [22, 11, 17]. Since there is evidence of interference between siblings in these settings, it seems tenable that between twins, especially, interference could be present in some cases.
The possibility of interference between twins is important for a number of reasons. One is that the parameter estimated using the co-twin control method has a different causal interpretation when interference is present. Incorrectly assuming that there is no interference may therefore lead one to draw misleading conclusions. Interference impacts several issues regarding the causal parameter to be studied, including which parameters are of most scientific interest, and conditions under which these causal parameters can be identified from the observed data. If interference is present, key causal effects of interest include spillover effects, which are changes that would result in twins’ outcomes from intervening on their co-twins’ exposures, while holding their own exposures fixed; and main effects, which are changes that would result in twins’ outcomes from intervening on their own exposures, while holding their co-twins’ exposures fixed. Unfortunately, the co-twin control method does not provide a means of estimating spillover effects or main effects when there is interference between twins; rather, it targets an effect of more limited scientific and policy relevance.
In this paper, we clarify which causal effect is estimated by the co-twin control method when there is interference between twins, and argue that it is not a causal effect of prime interest. In order to identify effects that are of prime interest, such as main effects and spillover effects, we then proceed under an assumption that the measured baseline covariates control for all confounding of the effects of the exposures on the outcomes. In this framework, we derive estimators of average main effects and average spillover effects which are semi-parametric efficient; that is, they make the most efficient use of the data possible, given the assumptions of the statistical models that we use.
In Section 2, we define potential outcomes and causal effects for our setting of independent pairs of twins with possible between-twin interference, and we review the co-twin control method for the case where there is no interference. In Section 3, we consider the co-twin control method with interference, then introduce the causal models that we will use in the rest of the paper, which make the assumption of no unmeasured confounding. In Section 4 we present the semi-parametric efficient estimators of average spillover effects and average main effects in these models. In Section 5 we apply these estimators to data from the Minnesota Twin Family Study, and investigate whether twins’ exposure to alcohol in early adolescence may have a causal impact on their co-twins’ drinking behavior in adulthood. We demonstrate the finite-sample performance of our estimators in a simulation study in Section 6, and conclude with a discussion in Section 7.
2 Background
2.1 Causal effects for the setting of within-pairs interference
Consider a twin study with a binary exposure $A$, where $A = 1$ if the subject is exposed and $A = 0$ if the subject is unexposed, and an outcome $Y$, and suppose first that there is no interference between subjects. In this setting each subject is considered to have two potential outcomes: $Y(1)$, the outcome that the subject would have if, possibly counter to fact, they were to have the exposure; and $Y(0)$, the outcome that the subject would have if, possibly counter to fact, they were to be unexposed. The observed outcome $Y$ is assumed to be equal to the potential outcome $Y(1)$ for subjects who have exposure $A = 1$, and equal to $Y(0)$ for subjects with $A = 0$. While only one potential outcome is observed for each twin (and hence the causal effect $Y(1) - Y(0)$ for an individual is never known), under additional assumptions (of positivity and exchangeability), the population average of these effects can be estimated from the observed data. Common targets of inference in twin studies include the average causal effect $E[Y(1) - Y(0)]$, where the mean is over all twins in the target population, and the average causal effect in the subgroup of twins who are discordant with their co-twin for the exposure.
Now suppose instead that there may be interference between the two twins in each pair, but not between twins from different twin pairs, so that each twin’s outcome may be impacted by their own exposure and their co-twin’s exposure. In this case, we cannot meaningfully talk about a twin’s outcome if they were to be exposed, or $Y(1)$, since they might have one outcome if both they and their co-twin were to be exposed, and a different outcome if they were to be exposed but their co-twin were not. Thus, in this setting, each twin has 4 potential outcomes: $Y(1,1)$, $Y(1,0)$, $Y(0,1)$, and $Y(0,0)$, where we write $Y(a,b)$ for the outcome that the twin would have if, possibly counter to fact, they were to have exposure $a$ and their co-twin were to have exposure $b$. For a twin who has exposure $a$ and whose co-twin has exposure $b$, the observed outcome $Y$ is assumed to be equal to the potential outcome $Y(a,b)$, while the twin’s other 3 potential outcomes are not observed. There are a number of different causal effects that we can consider in this setting: of particular interest are average main effects $E[Y(1,0) - Y(0,0)]$ and $E[Y(1,1) - Y(0,1)]$, in which the subject’s own exposure is varied while their co-twin’s exposure is held constant; and average spillover effects $E[Y(0,1) - Y(0,0)]$ and $E[Y(1,1) - Y(1,0)]$, in which the subject’s own exposure is held constant while their co-twin’s exposure is varied.
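To make the four potential outcomes concrete, the following minimal sketch computes average main effects and average spillover effects from a small table of potential outcomes. The three twins and all numeric values are hypothetical, chosen only for illustration; in real data only one of the four potential outcomes is observed per twin, so these averages cannot be computed directly.

```python
# Hypothetical potential outcomes for three twins. Each dict maps
# (own exposure a, co-twin exposure b) -> outcome. Values are made up.
twins = [
    {(1, 1): 14.0, (1, 0): 12.0, (0, 1): 11.0, (0, 0): 9.0},
    {(1, 1): 10.0, (1, 0): 9.0, (0, 1): 8.0, (0, 0): 8.0},
    {(1, 1): 15.0, (1, 0): 12.0, (0, 1): 11.0, (0, 0): 10.0},
]


def mean_potential_outcome(a, b):
    """Average of Y(a, b) over all twins in the toy population."""
    return sum(t[(a, b)] for t in twins) / len(twins)


def average_main_effect(b):
    """Vary the twin's own exposure; hold the co-twin's exposure at b."""
    return mean_potential_outcome(1, b) - mean_potential_outcome(0, b)


def average_spillover_effect(a):
    """Vary the co-twin's exposure; hold the twin's own exposure at a."""
    return mean_potential_outcome(a, 1) - mean_potential_outcome(a, 0)


for b in (0, 1):
    print(f"main effect, co-twin exposure {b}:", average_main_effect(b))
for a in (0, 1):
    print(f"spillover effect, own exposure {a}:", average_spillover_effect(a))
```

In this toy population the main and spillover effects differ depending on the level at which the other exposure is held, which is why two effects of each type are distinguished.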
Methods for estimating average main effects and average spillover effects have been studied by several authors. One body of research considers the setting of partial interference, in which there are multiple distinct groups of subjects and interference does not operate across different groups. See for example [5, 6, 23, 12, 14] for estimation of average main effects and average spillover effects under partial interference, and see [13, 12, 14] for asymptotic properties of the estimators. A different strand of the interference literature considers the setting where there is only one group, such as a single connected social network where interference could occur between any of the subjects. See for example [1, 26, 18, 16, 24]. The context that we consider here, namely independent groups of size two, falls into the first of these two frameworks. To the best of our knowledge, no previous work on partial interference demonstrated semi-parametric efficiency of an estimator. Here we derive semi-parametric efficient estimators for two models that are tailored to the setting of independent pairs.
2.2 Notation and data structure
In describing the data for a twin pair, we distinguish between those baseline covariates which are characteristics of the pair (and thus necessarily common between the two twins in a given pair), such as zygosity or parental characteristics; and baseline covariates which are characteristics of an individual twin and may or may not be the same for the two twins in a given pair. We refer to these respectively as shared covariates and individual covariates. In each twin pair, suppose that the twins have been randomly labeled as Twin 1 and Twin 2. The observed data for a given pair is $O = (C, X_1, X_2, A_1, A_2, Y_1, Y_2)$, where $C$ denotes the shared baseline covariates, $X_j$ denotes the individual baseline covariates for Twin $j$, and $A_j$ and $Y_j$ are the binary exposure and the outcome for Twin $j$, $j = 1, 2$. Throughout, $C$, $X_1$, and $X_2$ will refer to collections of measured baseline covariates. In some sections we will also consider factors which are unobserved; we will use separate notation when we refer to collections of variables at least some of which are unobserved. We assume that we observe data for $n$ twin pairs, and that the observed data for these pairs are independent and identically distributed.
We use the notation $Y_j(a,b)$ for a potential outcome for Twin $j$: specifically, let $Y_j(a,b)$ be the outcome that Twin $j$ would have if, possibly counter to fact, Twin $j$ were to have exposure $a$ and their co-twin were to have exposure $b$. Our primary targets of inference will be means of potential outcomes $E[Y(a,b)]$ and contrasts of such parameters, where here the mean is taken over all twins in the population. We can also write $E[Y(a,b)]$ as $E[\bar{Y}(a,b)]$, where $\bar{Y}(a,b) = \frac{1}{2}\{Y_1(a,b) + Y_2(a,b)\}$ is the average of the potential outcomes within a twin pair, and the last expectation is a mean taken over all pairs. We will often express $E[Y(a,b)]$ this way, so that the units we work in are the independent twin pairs rather than the individual twins.
2.3 The co-twin control method under the assumption of no interference
Here we review the co-twin control method for the setting where there is no interference between twins, following Sjölander et al. [21] and McGue et al. [15]; in Section 3 we compare the setting where there is interference.
The causal effect of the exposure on the outcome may be confounded by any predictor that is a common cause of both and (since there will be a non-causal source of association between and via the path through ). Such a predictor could be a measured or unmeasured factor individual to Twin , or a measured or unmeasured factor shared by both twins. While we can explicitly adjust for measured confounders, unobserved confounders would typically preclude identification of the causal effect. However, using the co-twin control method, all shared factors—whether measured or unmeasured—are naturally accounted for by the fact that the two twins are matched on these factors. Therefore, Sjölander et al.[21] showed, in cases where there is no unmeasured confounding due to individual factors, a causal effect is identified by adjusting for the measured individual covariates and .
Let denote the set of all confounders, including both measured and unmeasured factors, which are shared between the two twins. Consider the causal diagram in Figure 1(a), where a directed arrow means that the first variable has a possible causal impact on the second, while a bidirected arc between two variables indicates that there may be unobserved variables that are common causes of the two variables. Assume that the relationship among the variables is given by this diagram. In particular, assume that there is no interference between the two twins, as signaled by the absence of directed arrows and , and assume that there are no non-shared unmeasured variables that directly impact both and .
Consider the subgroup of twins who are discordant with their co-twins for the exposure, but who have the same level of individual covariates as their co-twins. Write to denote the average causal effect on this subgroup. For an exposed twin in this subgroup, their potential outcome is observed; and while their potential outcome is not observed, their unexposed co-twin’s potential outcome is observed. [21] show that, because of symmetry between the group of twins designated Twin 1 and the group designated Twin 2, we may use the unexposed co-twins’ outcomes as proxies for the exposed twins’ potential outcomes, and that the causal parameter is equal to , the mean difference in the observed outcome for the exposed twin and the observed outcome for the unexposed twin, within this subgroup. The latter difference can now be estimated, for example by fitting a between-within regression model as described in Sjölander et al. Thus when all individual confounders are fully observed, Sjölander et al. have shown that the within-pair coefficient in the between-within regression model has a causal interpretation, which is the causal effect for the subgroup of twins who are discordant with their co-twin for the exposure but have the same level of individual covariates as their co-twin.
3 Identification of causal effects when interference is present
Now we consider a setting where interference between twins may exist, that is, where it is possible for one twin’s exposure to affect their co-twin’s outcome. For example, this could occur because the two twins influence one another’s behavior or share information related to the exposure with each other. Interference is likely to be present in many studies where the exposure and outcome are behavioral; and incorrectly assuming that there is no interference can lead to misleading conclusions, and ignores spillover effects which may often be of interest.
3.1 The co-twin control effect when there is interference between twins
Here we modify the scenario described in Section 2.3 to allow for interference between the two twins in each pair. Suppose now that the relationship among the variables is as in Figure 1(b). In particular, there may be shared factors (both measured and unmeasured) which impact both the exposures and the outcomes, but we assume that there are no unmeasured non-shared factors that directly cause both and , or that directly cause both and . As before, the matching between the twins on shared factors allows for identification of one subgroup causal effect: importantly, this is the effect which is estimated using the co-twin control method if there is interference present. However, we argue that this is not an effect of prime importance, and that a different approach is needed in order to identify more useful effects such as average main effects and average spillover effects.
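Within-pair coefficient of the between-within regression model | Expression (within the subgroup of exposure-discordant twins with the same level of individual covariates)
---|---
Statistical value of the regression coefficient | mean of (observed outcome of the exposed twin) minus (observed outcome of the unexposed co-twin)
Causal interpretation with no interference between twins | $E[Y(1) - Y(0)]$
Causal interpretation with interference between twins | $E[Y(1,0) - Y(0,1)]$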
As in Section 2.3, consider the subgroup of twins who are discordant with their co-twins for the exposure, but who have the same level of individual covariates as their co-twins. Write $Y(a,b)$ for the outcome a twin would have under own exposure $a$ and co-twin exposure $b$. For an exposed twin in this subgroup, their potential outcome $Y(1,0)$ is observed, and for an unexposed twin in this subgroup, their potential outcome $Y(0,1)$ is observed. In Appendix A, we prove that, under the assumptions listed there, the mean of the contrast $Y(1,0) - Y(0,1)$ on this subgroup is equal to the mean difference between the observed outcome of the exposed twin and the observed outcome of the unexposed twin. Note that the latter difference is the same quantity that appears in Section 2.3, and which is estimated using a between-within regression model, but that the causal interpretation of this parameter is different in cases where there is interference between twins. The different interpretations for the two settings are highlighted in Table 1. The effect identified here is the average difference in twins’ outcomes that we would see if we could intervene to change the twins from unexposed to exposed, while conversely intervening to change their co-twins from exposed to unexposed, in the subgroup described above. It is difficult to see why one would want to target this particular combination of interventions. The contrast $Y(1,0) - Y(0,1)$ is the difference between the main effect $Y(1,0) - Y(0,0)$ and the spillover effect $Y(0,1) - Y(0,0)$; however we cannot tease apart these two effects, and the value of the co-twin control effect itself is not readily interpretable. For example, a zero value of this effect could reflect the fact that the main effect and the spillover effect are both zero, or it could reflect a qualitatively different scenario where the two effects are each nonzero but cancel each other out. That is, a null value of the co-twin control effect is equally compatible with the exposure having no causal effect, or with the presence of very strong interference.
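The cancellation described above can be checked with simple arithmetic. The sketch below is illustrative only: the function name and the numeric values are hypothetical, and the main and spillover effects are both taken relative to the potential outcome with neither twin exposed.

```python
def cotwin_contrast(y00, main_effect, spillover_effect):
    """Contrast Y(1,0) - Y(0,1), built from Y(0,0) and the two effects.

    main_effect      = Y(1,0) - Y(0,0)  (own exposure varied)
    spillover_effect = Y(0,1) - Y(0,0)  (co-twin exposure varied)
    """
    y10 = y00 + main_effect
    y01 = y00 + spillover_effect
    return y10 - y01


# No causal effect of the exposure at all: the contrast is zero.
no_effect = cotwin_contrast(y00=10.0, main_effect=0.0, spillover_effect=0.0)

# Strong main effect and equally strong spillover: the contrast is also zero.
cancelling = cotwin_contrast(y00=10.0, main_effect=2.0, spillover_effect=2.0)

# Unequal effects: the contrast recovers only their difference.
unequal = cotwin_contrast(y00=10.0, main_effect=3.0, spillover_effect=1.0)

print(no_effect, cancelling, unequal)
```

The first two scenarios are indistinguishable from the contrast alone, which is the interpretability problem noted in the text.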
The co-twin control method is a unique tool which allows for estimation of a causal effect even with shared unmeasured confounders, based on the use of discordant pairs. However, we cannot identify contrasts involving or from discordant pairs when interference is present, since these potential outcomes need not equal the observed outcome of either twin in a discordant pair. Therefore, the presence of shared unmeasured confounders poses a greater barrier than it does in settings with no interference, since the co-twin control method does not provide a means of estimating important causal effects for the interference setting. In order to identify average main effects and average spillover effects, throughout the rest of the paper we work under the assumption that there is no unmeasured confounding due to either individual or shared factors.
3.2 Identification under the assumption of no unmeasured confounding
Throughout the rest of the paper we make the following assumption of no unmeasured confounding: there are no unmeasured factors, whether shared or non-shared, which directly cause both and , or that directly cause both and . Specifically, we assume that any unmeasured factors which impact both an outcome and an exposure do so only through the measured baseline covariates , , and . Under this assumption, adjusting for the measured baseline covariates controls for all confounding of the effects of and on . This is an untestable assumption which is not automatically satisfied by design in an observational study; how reasonable the assumption is in a given study will depend on factors specific to that study, including how rich a set of baseline covariates is measured.
Consider the four groups of twin pairs with the four exposure patterns $(A_1, A_2) = (1,1)$, $(1,0)$, $(0,1)$, and $(0,0)$, and consider a specific potential outcome $Y_j(a,b)$, the outcome Twin $j$ would have under own exposure $a$ and co-twin exposure $b$. In general the potential outcomes $Y_j(a,b)$ could be systematically higher among one of these four groups than another. For example they could be higher among the $(1,1)$ group than among the $(0,0)$ group if having some predictor causes both exposures and potential outcomes to be higher. However, our assumption of no unmeasured confounding implies that adjusting for the measured baseline covariates breaks any such association between $Y_j(a,b)$ and the exposures, and that within levels of the measured baseline covariates (the shared covariates $C$ and the individual covariates $X_1$ and $X_2$), the four groups are exchangeable in terms of potential outcomes $Y_j(a,b)$. This assumption of no unmeasured confounding, or exchangeability, is given by:
A1 (exchangeability): $Y_j(a,b) \perp (A_1, A_2) \mid (C, X_1, X_2)$ for $j = 1, 2$ and all $a, b \in \{0, 1\}$.
We additionally make the positivity assumption that, within each level of the baseline covariates, there is a nonzero probability of having each of the 4 exposure patterns, and the consistency assumption that a twin’s observed outcome is equal to the twin’s potential outcome corresponding to the exposures actually received by the twin and their co-twin:
A2 (positivity): $P(A_1 = a, A_2 = b \mid C, X_1, X_2) > 0$ for all $a, b \in \{0, 1\}$.
A3 (consistency): $Y_j = Y_j(a,b)$ whenever $A_j = a$ and $A_{j'} = b$, where $j'$ denotes the co-twin of Twin $j$.
The assumptions of exchangeability, positivity, and consistency are sufficient for identification of the parameters $E[Y_j(a,b)]$. That is, under these three assumptions, we can express the causal parameter as a function of the observed data, as we show for completeness below. Exchangeability and positivity allow us to take the mean over just the group with the exposure pattern $(A_j, A_{j'}) = (a, b)$ rather than over all twins labeled as Twin $j$ in the population, within each level of the baseline covariates; and consistency implies that, among the twins in this group, the potential outcome $Y_j(a,b)$ is equal to the observed outcome $Y_j$.
$$
\begin{aligned}
E[Y_j(a,b)] &= E\{E[Y_j(a,b) \mid C, X_1, X_2]\} \\
&= E\{E[Y_j(a,b) \mid A_j = a, A_{j'} = b, C, X_1, X_2]\} && \text{by A1 and A2} \\
&= E\{E[Y_j \mid A_j = a, A_{j'} = b, C, X_1, X_2]\} && \text{by A3}
\end{aligned}
$$
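The identification argument above can be illustrated numerically. The following is a minimal sketch on simulated data, not the analysis used in the paper: the data-generating process, the single binary shared covariate `C`, and all coefficients are assumptions made for illustration. It compares a covariate-adjusted (stratified) estimate of a spillover effect with a naive unadjusted contrast.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # number of simulated twin pairs

# One binary shared covariate that raises both exposure and outcome.
C = rng.binomial(1, 0.5, n)
p = 0.3 + 0.4 * C                      # P(exposed | C), same for both twins
A1 = rng.binomial(1, p)
A2 = rng.binomial(1, p)

# Outcomes: own-exposure effect 2, co-twin (spillover) effect 1.
Y1 = 1 + 2 * A1 + 1 * A2 + 3 * C + rng.normal(size=n)
Y2 = 1 + 2 * A2 + 1 * A1 + 3 * C + rng.normal(size=n)

# Pool the two twins of each pair: (own exposure, co-twin exposure, outcome).
own = np.concatenate([A1, A2])
co = np.concatenate([A2, A1])
y = np.concatenate([Y1, Y2])
c = np.concatenate([C, C])


def adjusted_mean(a, b):
    """Stratify on C, average outcomes in the (a, b) group, then
    weight each stratum by its share of the population."""
    total = 0.0
    for level in (0, 1):
        in_group = (own == a) & (co == b) & (c == level)
        total += y[in_group].mean() * np.mean(c == level)
    return total


def naive_mean(a, b):
    """Unadjusted mean outcome in the (a, b) exposure group."""
    return y[(own == a) & (co == b)].mean()


spillover_adjusted = adjusted_mean(0, 1) - adjusted_mean(0, 0)
spillover_naive = naive_mean(0, 1) - naive_mean(0, 0)
print("adjusted:", spillover_adjusted, "naive:", spillover_naive)
```

In this simulation the true spillover effect is 1 by construction. The adjusted estimate should land close to it, while the naive contrast is inflated because the exposure groups differ in their distribution of `C`.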
We will consider two models for the data for each twin pair. The larger of these, Model 1, makes only the assumptions used for identification above, and corresponds to the diagram in Figure 2(a). Here $U$ represents any unmeasured factors (shared or not). The absence of arrows pointing directly from $U$ to the exposures and the outcomes corresponds to the no unmeasured confounding assumption. Model 2 is a smaller model in which we make three extra assumptions in addition to the identification assumptions, and corresponds to the diagram in Figure 2(b). These three extra assumptions, A4-A6, are that Twin $j$’s individual covariates $X_j$ do not have a causal impact on their co-twin’s exposure $A_{j'}$ (as seen by the lack of a directed arrow $X_j \to A_{j'}$), or on their co-twin’s outcome $Y_{j'}$ (as seen by the lack of a directed arrow $X_j \to Y_{j'}$); and that Twin 1’s exposure $A_1$ and Twin 2’s exposure $A_2$ are conditionally independent given the measured shared covariates $C$ and individual covariates $X_1$, $X_2$ (as seen by the lack of a bidirected arc $A_1 \leftrightarrow A_2$).
A4: $A_{j'} \perp X_j \mid (C, X_{j'})$
A5: $Y_{j'} \perp X_j \mid (A_1, A_2, C, X_{j'})$ (Model 2 assumptions)
A6: $A_1 \perp A_2 \mid (C, X_1, X_2)$
The assumption that Twin ’s individual-level covariates do not impact their co-twin’s exposure or outcome is one that is commonly used in the twin literature [21], and the distinction drawn in Model 2 between the individual and the shared covariates is a feature that distinguishes Model 2 from models considered elsewhere in the interference literature. While Model 1, being less restrictive, gives valid inference in a wider range of settings than Model 2, the advantage of Model 2 is that it allows for improved (asymptotic) efficiency: in settings which meet the assumptions posited for Model 2, we may leverage these additional assumptions to make a more efficient use of the data in estimating our causal parameters of interest, allowing for narrower confidence intervals in large samples. We demonstrate this improved efficiency in a simulation study in Section 6.
4 Efficient estimation of causal effects when interference is present
4.1 Efficient estimators
Here we derive the efficient estimator of the parameter for each of the two models described in Section 3.2. That is, we derive the estimator with the smallest asymptotic variance, out of all estimators for which are regular and asymptotically linear (RAL) under every distribution in Model 1, and similarly for Model 2. Since Model 2 is a smaller model contained in Model 1, any RAL estimator for Model 1 may be used in Model 2, while the converse need not hold; and we will see that, in fact, there are RAL estimators for Model 2 that are more efficient than any Model 1 estimator.
RAL estimators are asymptotically normal with asymptotic variance determined by their influence function. In order to obtain the RAL estimator with the smallest possible asymptotic variance, we first derive the efficient influence function for the parameter in each Model . The efficient estimator based on data from independent twin pairs is then found as the solution to the estimating equation . See Tsiatis [25] for more on efficient influence functions.
The efficient influence function for in Model 1 is:
Proofs are given in Appendix B. In order to use to obtain an estimator for , we need a model for the joint propensity score and a model for the outcome regression . The resulting estimator of will be efficient provided that both of these models are correctly specified, and provided that the estimated values converge to the truth at fast enough rates. (See Section 4.2 for more discussion of rates.) Suppose this is the case. Let be the predicted propensity score for a twin pair with covariates , and let be the predicted outcome regression for Twin in a twin pair with covariates if the twins’ exposures were set to . Then the efficient estimator for in Model 1, based on data , is given by:
The estimator is the augmented inverse probability weighted estimator for a bivariate exposure. It is a doubly robust estimator [20]: as long as at least one of the propensity score model or the outcome regression model is correctly specified, remains a consistent estimator for , even if the other model is misspecified. Liu et al. [14] have presented three doubly robust estimators of average main effects and average spillover effects in groups of size , working in the analog of Model 1. Our estimator corresponds to one of their estimators specialized to groups of size 2. Our contribution as regards Model 1 is to prove that this estimator is semiparametric efficient in Model 1, thus providing a partial answer to a question posed in Liu et al. [14]. To the best of our knowledge, Model 2 is distinct from other models that have been considered in the interference literature.
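The double robustness property can be illustrated with a small simulation. The sketch below uses the standard augmented inverse probability weighted form for a mean potential outcome, averaged over the two twins in each pair; it is written under assumed notation and is not a reproduction of the paper's display. The function `aipw_mean`, the data-generating process, and the deliberately wrong nuisance inputs are all hypothetical.

```python
import numpy as np


def aipw_mean(a, b, A1, A2, Y1, Y2, pi_ab, pi_ba, m1, m2):
    """AIPW estimate of E[Y(a, b)], averaging the two twins in each pair.

    pi_ab: estimated P(A1=a, A2=b | covariates) for each pair
    pi_ba: estimated P(A1=b, A2=a | covariates) for each pair
    m1, m2: predicted outcomes for Twin 1 and Twin 2 under own
            exposure a and co-twin exposure b
    """
    ind1 = ((A1 == a) & (A2 == b)).astype(float)  # Twin 1 has own=a, co=b
    ind2 = ((A2 == a) & (A1 == b)).astype(float)  # Twin 2 has own=a, co=b
    term1 = ind1 / pi_ab * (Y1 - m1) + m1
    term2 = ind2 / pi_ba * (Y2 - m2) + m2
    return float(np.mean(0.5 * (term1 + term2)))


# Simulated pairs with one binary shared covariate (illustrative only).
rng = np.random.default_rng(1)
n = 100_000
C = rng.binomial(1, 0.5, n)
p = 0.3 + 0.4 * C
A1 = rng.binomial(1, p)
A2 = rng.binomial(1, p)
Y1 = 1 + 2 * A1 + A2 + 3 * C + rng.normal(size=n)
Y2 = 1 + 2 * A2 + A1 + 3 * C + rng.normal(size=n)

a, b = 0, 1
truth = 1 + 2 * a + b + 3 * 0.5        # E[Y(0, 1)] = 3.5 by construction
true_pi = (1 - p) * p                  # P(own=0, co=1 | C) for either twin
true_m = 1 + 2 * a + b + 3 * C         # correct outcome regression

# Correct propensity score, badly wrong outcome model (all zeros).
est_wrong_outcome = aipw_mean(a, b, A1, A2, Y1, Y2,
                              true_pi, true_pi, np.zeros(n), np.zeros(n))

# Correct outcome model, badly wrong propensity score (constant 0.25).
wrong_pi = np.full(n, 0.25)
est_wrong_propensity = aipw_mean(a, b, A1, A2, Y1, Y2,
                                 wrong_pi, wrong_pi, true_m, true_m)

print(truth, est_wrong_outcome, est_wrong_propensity)
```

Both estimates should be close to the true value, since in each case one of the two nuisance inputs is correct. Here the true nuisance functions are plugged in directly; in practice they would be estimated from the data.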
If the true distribution lies inside the smaller Model 2, a more efficient estimator than is possible. The efficient influence function for in Model 2 is:
In order to use to obtain an estimator of , we need a model for the outcome regression , a model for the propensity score which relates Twin ’s exposure to their own covariates, and a model for the propensity score which relates Twin ’s exposure to their co-twin’s covariates. The resulting estimator of will be efficient provided that all three of these models are correctly specified, and provided that the estimated values converge to the truth at fast enough rates. Suppose this is the case. Let be the predicted value of the propensity score for a twin pair with covariates . Let be the predicted value of the propensity score for a twin pair with covariates . Let be the predicted outcome regression for Twin in a twin pair with covariates if the twins’ exposures were set to . Then the efficient estimator for in Model 2, based on data , is given by:
The estimator is also doubly robust: it is consistent if (i) both propensity score models are correct, even if the outcome regression model is incorrect, or (ii) the outcome regression model is correct, even if one or both propensity score models are incorrect. We show the double robustness of both estimators in Appendix C, and we also illustrate this property in the simulation study in Section 6.
Efficient influence functions of average main effects and average spillover effects are obtained as differences of efficient influence functions of the parameters. For example, the efficient influence function of the spillover effect in Model is . Similarly the efficient estimator of in Model is .
4.2 Confidence intervals
Here we consider confidence intervals for a parameter or a linear combination of such parameters, such as an average spillover effect or average main effect, based on the efficient estimators presented above.
Denote the parameter of interest by $\psi$. Let $\varphi_k(O; \psi, \eta)$ denote the efficient influence function of $\psi$ in Model $k$, where $O$ is the observed data for a twin pair and $\eta$ represents the parameters of the propensity score models and outcome regression model, which must be estimated in order to obtain the efficient estimator for $\psi$. Under the assumption that the parameters $\eta$ are estimated consistently at fast enough rates, the resulting estimator for $\psi$ will be asymptotically normal at root-$n$ rates. Therefore, we may use a Wald-type confidence interval for $\psi$. Estimating $\eta$ at rates faster than $n^{-1/4}$ (that is, using an estimator $\hat{\eta}$ such that $n^{\gamma} \lVert \hat{\eta} - \eta \rVert$ is bounded in probability for some $\gamma > 1/4$) is sufficient. However, this can be relaxed somewhat since our estimator for $\psi$ is doubly robust, and the bias of a doubly robust estimator is the product of the bias in the propensity score estimation times the bias in the outcome regression estimation. Therefore it suffices that we estimate the propensity scores and outcome regression at rates whose product is of smaller order than $n^{-1/2}$. Suppose this is the case, and let $\hat{\psi}$ be the solution of the estimating equation $\sum_{i=1}^{n} \varphi_k(O_i; \psi, \hat{\eta}) = 0$. Because of the assumption on the rate of convergence for $\hat{\eta}$, the influence function of $\hat{\psi}$ is the efficient influence function $\varphi_k(O; \psi, \eta)$ (with the true value of the nuisance parameters $\eta$). Therefore $\hat{\psi}$ is the efficient estimator of $\psi$, and the asymptotic variance of $\sqrt{n}(\hat{\psi} - \psi)$ is the variance of $\varphi_k(O; \psi, \eta)$.
Thus in large samples, the variance of $\hat{\psi}$ is approximately $n^{-1} E[\varphi_k(O; \psi, \eta)^2]$. A consistent estimator of $E[\varphi_k(O; \psi, \eta)^2]$ is $n^{-1} \sum_{i=1}^{n} \varphi_k(O_i; \hat{\psi}, \hat{\eta})^2$. Therefore, a large-sample Wald-type $(1 - \alpha)$ confidence interval for $\psi$ is given by:
$$
\hat{\psi} \pm z_{1-\alpha/2} \sqrt{\frac{1}{n^2} \sum_{i=1}^{n} \varphi_k(O_i; \hat{\psi}, \hat{\eta})^2}
$$
Alternatively, we may use a bootstrap-based confidence interval, which may have better coverage than the Wald-type confidence interval at moderate sample sizes.
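The two interval constructions described above can be sketched as follows. This is a simplified illustration under stated assumptions: `pair_terms` is a hypothetical array holding one estimating-equation contribution per twin pair (so that the point estimate is their mean), and the bootstrap here resamples these precomputed terms. A fuller bootstrap would refit the propensity score and outcome regression models in each resample.

```python
import numpy as np


def wald_ci(pair_terms, z=1.96):
    """Wald-type interval from per-pair contributions.

    The point estimate is the mean of the contributions; the estimated
    influence function values are the centered contributions.
    """
    n = len(pair_terms)
    est = pair_terms.mean()
    influence = pair_terms - est
    se = np.sqrt(np.mean(influence ** 2) / n)
    return est, est - z * se, est + z * se


def bootstrap_ci(pair_terms, n_boot=2000, seed=0):
    """Percentile bootstrap interval, resampling whole twin pairs."""
    rng = np.random.default_rng(seed)
    n = len(pair_terms)
    idx = rng.integers(0, n, size=(n_boot, n))   # resampled pair indices
    boot_means = pair_terms[idx].mean(axis=1)
    lower, upper = np.percentile(boot_means, [2.5, 97.5])
    return lower, upper


# Toy per-pair contributions (made-up values).
terms = np.array([1.0, 2.0, 3.0, 4.0])
est, wald_lo, wald_hi = wald_ci(terms)
boot_lo, boot_hi = bootstrap_ci(terms)
print(est, (wald_lo, wald_hi), (boot_lo, boot_hi))
```

Resampling is done at the level of pairs rather than individual twins because the pairs, not the twins, are the independent units.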
5 Application: spillover effects of alcohol in early adolescence
Early use of alcohol is often associated with substance use and dependence in adulthood [4, 3]. Recently, Irons et al. [7] used a twin design to determine whether this association is due in part to a causal relationship. The authors found that there is evidence of a causal effect of twins’ early alcohol use on their own adult substance use outcomes and antisocial behavior outcomes. Here we consider the same data from the Minnesota Twin Family Study; but rather than focusing on the effect of twins’ exposures on their own outcomes, we investigate whether twins’ early alcohol use has a causal impact on their co-twins’ outcomes. That is, our primary question of interest is whether there is evidence of a nonzero spillover effect.
In the Minnesota Twin Family Study, pairs of same-sex twins were contacted for an intake assessment at which baseline covariates were collected at target age 11; to assess exposure status at target age 14; and to measure outcomes at target age 24. The exposure that we consider here is an indicator of whether the subject had ever consumed alcohol without their parents’ permission by the time of the exposure interview. The outcome on which we focus is Drinking Index, which is a composite measure of adult drinking frequency and amount. Shared covariates which we include are severity of parent alcohol abuse, severity of parent drug abuse, parents’ occupation level, and sex and zygosity of the twins. Individual covariates which we include are a measure of academic motivation, number of externalizing disorder symptoms (such as symptoms of attention deficit/hyperactivity disorder or conduct disorder), amount of conflict with parents, and actual age at the time of the exposure assessment. Here we use only the complete case pairs, which account for 511 of the 761 pairs of twins in the dataset; future work is needed to develop methods to address missing data in settings where there is interference between pairs.
Our primary target of inference is the spillover effect , which is the mean difference in twins’ outcomes we would see if we could intervene to change all co-twins from exposed to unexposed, while keeping all twins unexposed. We also estimate the main effect , the mean difference in twins’ outcomes we would see if we could intervene to change all twins from exposed to unexposed, while keeping all co-twins unexposed. We estimate and using the efficient estimators that we derived in Section 4. For the Model 1 estimator, we use generalized additive models to model the outcome regression . In order to model the joint propensity score , we first use generalized additive models to fit the marginal distributions and , then model the association between the exposures via a Dale model [2]. We allow the strength of the association to vary by sex and zygosity. For the Model 2 estimator, we use generalized additive models to model the outcome regression , the propensity score , and the propensity score . Details are given in Appendix D.
Results are shown in Table 2. Using either of the two estimators, there is evidence of a nonzero spillover effect of twins’ use of alcohol in early adolescence on their co-twins’ adult drinking behavior. That is, there is evidence of interference between twins in this study. A spillover effect of -1.5 points (the point estimate using the Model 1 estimator) would indicate that twins’ Drinking Index outcomes would be an average of 1.5 points lower if all twins and all co-twins were unexposed to alcohol in early adolescence, compared to all twins being unexposed with their co-twins exposed. The Drinking Index outcome ranges from 0 to 20, with mean 10.7 and standard deviation 4.0. Larger values correspond to more frequent and/or greater amounts of drinking, so a decrease of 1.5 points would represent a moderate benefit. There is also evidence of a nonzero main effect. A main effect of -2.1 points would indicate that twins’ Drinking Index outcomes would be an average of 2.1 points lower if all twins changed from exposed to unexposed, while their co-twins remained unexposed.
Our estimators for the spillover effect and the main effect are based on the assumption of no unmeasured confounding due to shared or individual factors. This assumption is untestable, meaning that we have no means of determining that it holds in our study; and if it does not in fact hold, this could invalidate our results. However, we can check for one specific way in which the assumption may fail, through comparison of monozygotic (MZ) twins and dizygotic (DZ) twins. If there is no unmeasured confounding which is due to shared genetics or other unmeasured factors that are differential between MZ and DZ twins, then, since the distribution of measured covariates is well balanced across these two groups, the average spillover effect in the population of MZ twins should be equal to the average spillover effect in the population of DZ twins. Therefore, evidence that the average spillover effects are different in these two populations would be evidence that there is unmeasured confounding due to shared genetics.
Sample | Estimator | Spillover effect | Main effect | CTC effect |
---|---|---|---|---|
All twins (n=510) | Model 1 | -1.470 (-2.318, -0.716) | -2.093 (-3.251, -0.738) | 0.730 (0.064, 1.372) |
 | Model 2 | -1.599 (-2.318, -0.762) | -2.298 (-3.356, -0.823) | |
MZ twins (n=324) | Model 1 | -1.671 (-2.796, -0.640) | -2.131 (-3.422, -1.094) | 0.577 (-0.224, 1.347) |
 | Model 2 | -1.931 (-2.821, -0.705) | -2.424 (-3.499, -1.123) | |
DZ twins (n=186) | Model 1 | -1.039 (-2.523, 0.118) | -1.952 (-3.711, -0.856) | 1.016 (-0.106, 2.155) |
 | Model 2 | -0.957 (-2.245, 0.210) | -2.084 (-3.784, -0.890) | |
The point estimates are consistent with a somewhat stronger spillover effect among MZ twins than among DZ twins; however, the 95% confidence interval for the difference between the MZ and DZ average spillover effects includes zero, so the data do not provide evidence that this difference is different from zero. Thus, comparison of the MZ and DZ subgroups does not suggest that there is unmeasured confounding due to shared genetics or factors that are differential between MZ and DZ twins. However, this subgroup comparison does not provide a means of assessing whether other types of unmeasured confounding may be present. Unmeasured shared factors such as peer group or attributes of the twins’ school or community, and unmeasured individual factors, could still invalidate our results if they are not captured by measured covariates.
We also fit a between-within regression model and report the within-pair coefficient (the CTC effect in Table 2). The causal interpretation of this coefficient is valid even if there are shared unmeasured confounders, provided that there is no unmeasured confounding due to individual factors. In Section 3.1 we saw that, in settings where there is interference between twins, the within-pair coefficient is interpreted causally as the spillover effect minus the main effect on a subset of the group of exposure-discordant twins. If the estimates of the within-pair coefficient were not consistent with the estimates of the spillover effect minus the main effect, this could be an indication that the estimates of the spillover and main effects were invalid due to shared confounders (though it could also be an indication that the subgroup effect does not generalize to the whole population of twins). In our case the estimates are consistent with each other.
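The consistency check described here can be illustrated with a short sketch using the Model 1 point estimates from Table 2. It is only an informal comparison: it assumes that the parenthesized ranges in Table 2 are interval estimates, and it ignores the fact that the within-pair coefficient targets a subgroup rather than the whole population. The dictionary layout and function name are hypothetical.

```python
# Model 1 point estimates and CTC estimates copied from Table 2.
# Assumption: the parenthesized ranges in Table 2 are interval estimates.
table2 = {
    "All twins": {"spillover": -1.470, "main": -2.093, "ctc": 0.730, "ctc_ci": (0.064, 1.372)},
    "MZ twins": {"spillover": -1.671, "main": -2.131, "ctc": 0.577, "ctc_ci": (-0.224, 1.347)},
    "DZ twins": {"spillover": -1.039, "main": -1.952, "ctc": 1.016, "ctc_ci": (-0.106, 2.155)},
}


def spillover_minus_main(row):
    """Quantity that the within-pair coefficient targets under interference."""
    return row["spillover"] - row["main"]


checks = {}
for sample, row in table2.items():
    diff = round(spillover_minus_main(row), 3)
    low, high = row["ctc_ci"]
    # Informal check: does the difference fall inside the CTC interval?
    checks[sample] = (diff, low <= diff <= high)
    print(f"{sample}: spillover - main = {diff:.3f}, CTC = {row['ctc']:.3f}, "
          f"inside CTC interval: {checks[sample][1]}")
```

In each of the three samples the difference of the Model 1 point estimates falls inside the range reported for the CTC effect, which matches the statement that the estimates are consistent with each other.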
Irons et al. [7] estimated the effect of adolescent alcohol use on adult drinking behavior using two approaches, propensity score weighting and the co-twin control method, and found that the results using the co-twin control method were attenuated compared to the first method. They pointed out that one possible reason for this attenuation is that unmeasured shared confounders could be creating bias in the propensity score estimates. Another possible explanation which appears now is interference: if there is interference between twins in this study, then the parameter estimated using the co-twin control method is not simply the effect of twins’ exposure on their outcome, but the difference of this quantity and the spillover effect from their co-twins’ exposure.
6 Simulation study
Here we evaluate the finite-sample performance of the estimators derived in Section 4, using simulated data with sample size designed to mimic the data from the Minnesota Twin Family Study (MTFS). We consider two data-generating mechanisms. Under the first data-generating mechanism, the larger Model 1 holds but the smaller Model 2 does not, while under the second data-generating mechanism both models hold. We generate the covariates in the simulated data by resampling covariates from the MTFS dataset. We then generate the exposures from a joint distribution of exposures given covariates based on modeling of the MTFS data, and generate the outcomes from a joint distribution of outcomes given exposures and covariates based on modeling of the MTFS data. Details of the data-generating mechanisms are given in Appendix D.
For each data-generating mechanism, we estimate the spillover effect using each of 5 estimators. Under the first data-generating mechanism, we estimate the spillover effect using the efficient estimator for Model 1 with both the joint propensity score model and the outcome regression model correctly specified; the Model 1 estimator with one but not both of these correctly specified (two estimators); the Model 1 estimator with both misspecified; and the Model 2 efficient estimator with the outcome regression and propensity score models correctly specified. Table 3 shows the empirical bias of each estimator across simulations; the empirical variance of each estimator; the mean of the influence function-based variance estimates; the mean of the bootstrap variance estimates, based on bootstrap replicates within each simulation; the coverage of the 95% Wald confidence intervals for the spillover effect using the influence function-based variance estimates; and the coverage of the percentile bootstrap 95% confidence intervals. As expected, bias of the correctly specified Model 1 estimator is low. Also as expected, since the Model 1 estimator is doubly robust, bias remains low even when either the propensity score model or the outcome regression model is misspecified. Coverage probabilities of the percentile bootstrap confidence intervals are very close to the nominal 95% level, while the influence function-based variance estimates are slightly anti-conservative at this sample size, resulting in Wald coverage probabilities that are somewhat lower.
Table 3: Simulation results under the first data-generating mechanism.
Bias | Var | IF-Var Est | Wald Cov’g | Btstp-Var Est | Btstp Cov’g |
---|---|---|---|---|---|
-0.00658 | 0.11390 | 0.09886 | 92.96 | 0.12582 | 95.12 |
0.00320 | 0.11587 | 0.07195 | 87.50 | 0.13357 | 95.68 |
0.00042 | 0.11076 | 0.09971 | 93.44 | 0.12522 | 95.10 |
-0.02243 | 0.11766 | 0.11348 | 94.12 | 0.11136 | 94.70 |
-0.04121 | 0.11736 | 0.11618 | 94.56 | 0.11400 | 94.60 |
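The link between anti-conservative variance estimates and Wald undercoverage can be checked with a rough normal approximation. The sketch below is an illustration rather than part of the original analysis: it assumes each estimator is approximately unbiased and normally distributed, and it treats the mean influence function-based variance estimate as if it were the variance used in every interval. Under those assumptions, a 95% Wald interval has coverage of about 2Φ(z·sqrt(estimated variance / empirical variance)) − 1, where z is the 97.5% normal quantile.

```python
from statistics import NormalDist

# (empirical variance, mean IF-based variance estimate, reported Wald coverage in %)
# taken row by row from Table 3.
table3_rows = [
    (0.11390, 0.09886, 92.96),
    (0.11587, 0.07195, 87.50),
    (0.11076, 0.09971, 93.44),
    (0.11766, 0.11348, 94.12),
    (0.11736, 0.11618, 94.56),
]


def implied_wald_coverage(empirical_var, estimated_var, level=0.95):
    """Approximate coverage of a Wald interval whose variance estimate is off.

    Assumes an unbiased, normally distributed estimator (a simplification).
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(0.5 + level / 2)
    se_ratio = (estimated_var / empirical_var) ** 0.5
    return 2 * std_normal.cdf(z * se_ratio) - 1


implied = [100 * implied_wald_coverage(v, v_hat) for v, v_hat, _ in table3_rows]
for (v, v_hat, reported), approx in zip(table3_rows, implied):
    print(f"reported Wald coverage {reported:.2f}%, implied by variance ratio {approx:.2f}%")
```

The implied values track the reported Wald coverages closely, which suggests that the undercoverage is explained mainly by the underestimated variance rather than by bias.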
Table 4: Simulation results under the second data-generating mechanism.
Bias | Var | IF-Var Est | Wald Cov’g | Btstp-Var Est | Btstp Cov’g |
---|---|---|---|---|---|
0.00719 | 0.08526 | 0.07612 | 93.24 | 0.08967 | 95.30 |
0.00576 | 0.08273 | 0.07427 | 93.46 | 0.08567 | 95.14 |
-0.01473 | 0.08232 | 0.07171 | 92.88 | 0.08375 | 94.82 |
-0.00005 | 0.08710 | 0.08372 | 94.86 | 0.08263 | 95.00 |
-0.00453 | 0.08716 | 0.08253 | 94.20 | 0.08231 | 94.50 |
Under the second data-generating mechanism, we estimate the spillover effect using the efficient estimator for Model 1 with both the joint propensity score model and the outcome regression model correctly specified; the Model 2 efficient estimator with both propensity score models and the outcome regression model all correctly specified; the Model 2 estimator with either the propensity score models or the outcome regression model misspecified; and the Model 2 estimator with all of these misspecified. Results are displayed in Table 4. We expect each of the first four of these estimators to have low bias, as they do. Coverage probabilities are close to the nominal level, especially using the bootstrap confidence intervals.
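For readers unfamiliar with the percentile bootstrap used for these intervals, a minimal sketch follows. It assumes that whole twin pairs are the resampling unit, which is consistent with the independent-pairs setup, and uses a toy estimator (a mean within-pair difference) purely as a stand-in for the efficient estimators. The function names and toy data are hypothetical.

```python
import random
from statistics import mean, quantiles


def percentile_bootstrap_ci(pairs, estimator, n_boot=1000, seed=0):
    """95% percentile bootstrap interval, resampling whole pairs with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    replicates = []
    for _ in range(n_boot):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    # quantiles(..., n=40) returns cut points at 2.5%, 5%, ..., 97.5%.
    cuts = quantiles(replicates, n=40)
    return cuts[0], cuts[-1]


def mean_within_pair_difference(pairs):
    """Toy estimator used only for illustration."""
    return mean(y1 - y2 for y1, y2 in pairs)


# Toy data: 200 pairs of outcomes on a Drinking Index-like scale.
toy_rng = random.Random(1)
toy_pairs = [(toy_rng.gauss(10, 4), toy_rng.gauss(10, 4)) for _ in range(200)]

point_estimate = mean_within_pair_difference(toy_pairs)
ci_low, ci_high = percentile_bootstrap_ci(toy_pairs, mean_within_pair_difference)
print(f"estimate {point_estimate:.3f}, 95% percentile interval ({ci_low:.3f}, {ci_high:.3f})")
```

Resampling pairs rather than individual twins keeps each twin together with their co-twin, so the within-pair dependence is preserved in every bootstrap replicate.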
Additionally, the Model 2 efficient estimator has smaller variance than the Model 1 estimator under the second data-generating mechanism. The theory from Section 4 shows that the Model 2 estimator is asymptotically more efficient than the Model 1 estimator when both models are correct; and here we see improved precision at the sample size used in this simulation scenario.
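As a worked illustration of the size of this gain, suppose that the first two rows of Table 4 correspond to the correctly specified Model 1 and Model 2 estimators. This row ordering is an assumption based on the order in which the estimators are listed in the text. The ratio of empirical variances is then:

```latex
\frac{\widehat{\mathrm{Var}}(\text{Model 2 estimator})}{\widehat{\mathrm{Var}}(\text{Model 1 estimator})}
  = \frac{0.08273}{0.08526} \approx 0.970
```

Under that assumption, the Model 2 estimator shows roughly a 3% reduction in empirical variance in this scenario.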
7 Discussion
In this paper we have considered the setting of independent pairs of twins where one twin’s exposure may have a causal impact on their co-twin’s outcome. Whether or not interference is present in a given study will depend on the nature of the exposure and the outcome: in some cases, researchers may be able to rule out the possibility of between-twin interference at the outset based on their knowledge of the domain; while for many behavioral exposures and outcomes, interference is likely to be a possibility. For settings where there may or may not be interference based on scientific considerations, researchers may allow for possible interference and estimate the spillover effect of twins’ exposures on their co-twins’ outcomes using the estimators we have presented here. Evidence of a nonzero spillover effect would be evidence of interference. We have highlighted the impact that between-twin interference would have for researchers using the co-twin control method: when there is no interference, this method estimates the causal effect of twins’ exposures on their outcomes. When there is interference, however, it estimates the difference of two effects: the spillover effect of the co-twins’ exposures on the twins’ outcomes, minus the main effect of twins’ exposures on their own outcomes.
Under the assumption of no unmeasured confounding, we derived the semi-parametric efficient estimators of key causal effects for the interference setting, including average main effects and average spillover effects. We applied our estimators to data from the Minnesota Twin Family Study (MTFS), and found evidence that twins’ exposure to alcohol in early adolescence may have a spillover effect on their co-twins’ drinking behavior in adulthood. However, if there are genetic or other shared or individual unmeasured factors impacting both the choice to drink in early adolescence, and drinking behavior in adulthood (after controlling for measured covariates), this could invalidate our results. Comparing the causal effects among MZ twin pairs versus among DZ twin pairs did not yield evidence of unmeasured confounders due to shared genetics in the MTFS data. The development of sensitivity analyses, showing how a range of different strengths of unmeasured confounding would impact results in the interference setting, would be valuable future work.
Future work towards addressing missing data in this setting is also needed. As with the MTFS data, there may be missingness in baseline covariates, exposures, and outcomes, and the interference structure leads to some challenges in addressing missing data. An imputation approach, for example, should be designed in a way which is compatible with the different models to be fit for construction of the estimators; how best to do this for the Model 2 estimator under minimal modeling assumptions is an open research question.
A final direction of future research involves leveraging symmetry. Throughout, we randomly labeled the twins in each pair as Twin 1 and Twin 2. Because this labeling is random, it follows that there is symmetry between the population of twins who are labeled as Twin 1, and the population of twins who are labeled as Twin 2. In our models we did not explicitly make an assumption of symmetry, and our estimators therefore apply not only to twins data, but to any setting of independent pairs where there is possible interference between partners. However, in the twins setting, leveraging such a symmetry assumption could lead to additional efficiency gains, and in future work we plan to derive efficient estimators for models which do impose an assumption of symmetry between the twins.
Funding
This work was supported by ONR grant N00014-18-1-2760 to E.O. and B.S., and by R37-AA009367 to M.M.
Acknowledgments
Conflict of Interest: None declared.
References
- [1] Peter M. Aronow and Cyrus Samii. Estimating average causal effects under general interference, with application to a social network experiment. The Annals of Applied Statistics, 11(4):1912–1947, 12 2017.
- [2] Jocelyn R. Dale. Global cross-ratio models for bivariate, discrete, ordered responses. Biometrics, 42(4):909–917, 1986.
- [3] Bridget F. Grant and Deborah A. Dawson. Age at onset of alcohol use and its association with DSM-IV alcohol abuse and dependence: results from the National Longitudinal Alcohol Epidemiologic Survey. Journal of Substance Abuse, 9:103–110, 1997.
- [4] Julia D. Grant, Jeffrey F. Scherrer, Michael T. Lynskey, Michael J. Lyons, Seth A. Eisen, Ming T. Tsuang, William R. True, and Kathleen K. Bucholz. Adolescent alcohol use is a risk factor for adult alcohol and drug dependence: evidence from a twin design. Psychological Medicine, 36(1):109–118, 2006.
- [5] Guanglei Hong and Stephen W Raudenbush. Evaluating kindergarten retention policy. Journal of the American Statistical Association, 101(475):901–910, 2006.
- [6] Michael G Hudgens and M. Elizabeth Halloran. Toward causal inference with interference. Journal of the American Statistical Association, 103(482):832–842, 2008. PMID: 19081744.
- [7] Daniel E. Irons, William G. Iacono, and Matt McGue. Tests of the effects of adolescent early alcohol exposures on adult outcomes. Addiction, 110(2):269–278, 2015.
- [8] Wendy Johnson, Eric Turkheimer, Irving I. Gottesman, and Thomas J. Bouchard Jr. Beyond heritability: Twin studies in behavioral research. Current Directions in Psychological Science, 18(4):217–220, 2009.
- [9] Kenneth S. Kendler and Charles O. Gardner. Dependent Stressful Life Events and Prior Depressive Episodes in the Prediction of Major Depression: The Problem of Causal Inference in Psychiatric Epidemiology. JAMA Psychiatry, 67(11):1120–1127, 11 2010.
- [10] Benjamin B. Lahey and Brian M. D’Onofrio. All in the family: Comparing siblings to test causal hypotheses regarding environmental influences on behavior. Current Directions in Psychological Science, 19(5):319–323, 2010.
- [11] Brett Laursen, Amy C. Hartl, Frank Vitaro, Mara Brendgen, Ginette Dionne, and Michel Boivin. The spread of substance use and delinquency between adolescent twins. Developmental Psychology, 53(2):329–339, 2017.
- [12] L. Liu, M. G. Hudgens, and S. Becker-Dreps. On inverse probability-weighted estimators in the presence of interference. Biometrika, 103(4):829–842, 12 2016.
- [13] Lan Liu and Michael G. Hudgens. Large sample randomization inference of causal effects in the presence of interference. Journal of the American Statistical Association, 109(505):288–301, 2014. PMID: 24659836.
- [14] Lan Liu, Michael G. Hudgens, Bradley Saul, John D. Clemens, Mohammad Ali, and Michael E. Emch. Doubly robust estimation in observational studies with partial interference. Stat, 8(1):e214.
- [15] Matt McGue, Merete Osler, and Kaare Christensen. Causal inference and observational research: The utility of twins. Perspectives on Psychological Science, 5(5):546–556, 2010.
- [16] Caleb H. Miles, Maya Petersen, and Mark J. van der Laan. Causal inference when counterfactuals depend on the proportion of all subjects exposed. Biometrics, 75(3):768–777, 2019.
- [17] Gerald S. Oettinger. Sibling similarity in high school graduation outcomes: Causal interdependency or unobserved heterogeneity? Southern Economic Journal, 66(3):631–648, 2000.
- [18] Elizabeth L. Ogburn, Oleg Sofrygin, Ivan Diaz, and Mark J. van der Laan. Causal inference for social network data. arXiv e-prints, page arXiv:1705.08527, May 2017.
- [19] Jonathan D. Schaefer, Terrie E. Moffitt, Louise Arseneault, Andrea Danese, Helen L. Fisher, Renate Houts, Margaret A. Sheridan, Jasmin Wertz, and Avshalom Caspi. Adolescent victimization and early-adult psychopathology: Approaching causal inference using a longitudinal twin study to rule out noncausal explanations. Clinical Psychological Science, 6(3):352–371, 2018. PMID: 29805917.
- [20] Daniel O. Scharfstein, Andrea Rotnitzky, and James M. Robins. Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448):1096–1120, 1999.
- [21] Arvid Sjölander, Thomas Frisell, and Sara Öberg. Causal interpretation of between-within models for twin research. Epidemiologic Methods, 1(1):217–237, 2012.
- [22] Cheryl Slomkowski, Richard Rende, Scott Novak, Elizabeth Lloyd-Richardson, and Raymond Niaura. Sibling effects on smoking in adolescence: evidence for social influence from a genetically informative design. Addiction, 100(4):430–438, 2005.
- [23] Eric J Tchetgen Tchetgen and Tyler J VanderWeele. On causal inference in the presence of interference. Statistical Methods in Medical Research, 21(1):55–75, 2012. PMID: 21068053.
- [24] Eric J. Tchetgen Tchetgen, Isabel Fulcher, and Ilya Shpitser. Auto-G-Computation of Causal Effects on a Network. arXiv e-prints, page arXiv:1709.01577, September 2017.
- [25] Anastasios A. Tsiatis. Semiparametric Theory and Missing Data. Springer, New York, 2006.
- [26] Mark J. van der Laan. Causal inference for a population of causally connected units. Journal of Causal Inference, 2(1):13–74, 2014.
Appendix A Identification of the co-twin control effect under interference
Here we show identification of the subgroup causal effect discussed in Section 3.1, under the following assumptions:
C1: exchangeability within levels of the (possibly unobserved) shared factors and the observed individual covariates
C2: consistency
C3: positivity
C4: symmetry between the population of twins labeled as Twin 1 and the population labeled as Twin 2
C1-C3 are untestable assumptions. C4 should hold by design, as the labeling is randomly assigned.
Consider the subgroup of twins who are exposure-discordant with their co-twin, but who have the same value of all individual covariates as their co-twin. The subgroup causal effect is thus the mean of the contrast over all twins in this subgroup. Note first that this is a weighted average of Twin 1’s contrast and Twin 2’s contrast, taken over the part of the subgroup where Twin 1 is exposed and the part where Twin 2 is exposed. Specifically, writing , we have:
Now is equal to by consistency, while can be identified by leveraging the symmetry between the group of twins labeled as Twin 1 and the group labeled as Twin 2:
The successive steps of this derivation are justified, in order, by C1, C4, C1, and C2.
Similarly, is identified as . Therefore,
Appendix B Efficient influence functions
Here we derive the efficient influence function for the parameter , for each of the two models presented in Section 3.2. This also immediately gives the efficient influence function for , and of sums and differences of such parameters. We use the theory, terminology, and notation of Tsiatis [25] and Scharfstein et al. [20].
We assume that we have independent pairs of twins, and that the data from these twin pairs constitute i.i.d. draws from some distribution. Let be the observed data for each twin pair. Let be the full data for each twin pair. Under the consistency assumption, the observed data is a coarsening of the full data, since for each . We partition the full data into and , where determines which components of are observed.
Let and denote the Hilbert spaces of mean-zero functions of the observed data, and of the full data, respectively, with the covariance inner product. Full-data influence functions for are elements of the space , the orthogonal complement of the full-data nuisance tangent space . Observed-data influence functions are elements of the space , the orthogonal complement of the observed-data nuisance tangent space , where for some . The efficient influence function in a model is the unique influence function for which is an element of the observed data tangent space of that model; moreover, the efficient influence function is the projection of any influence function onto .
B.1 The efficient influence function in Model 1
Model 1 is the nonparametric model which places no restrictions on the observed data. This implies that there is just a single observed-data influence function for in Model 1, which is therefore the efficient influence function. By theory in [20] and [25], the observed-data influence function may be found by using a full data influence function for , say , and the mapping defined by . An element in the inverse image of under this mapping is in the space . This can be seen using adjoint operators: if we consider the linear map defined by , then is the range of . Therefore, by properties of adjoint operators, is the null space of , the adjoint of . In this case , and therefore . The residual from projecting onto the space is in the space . In our case, the spaces and are orthogonal, which implies that is equal to . Therefore the residual is in and is thus the observed data influence function for in Model 1.
A full data influence function for is . To verify this, set , and partition the parameters of into variation-independent components and , where is the distribution of and is formed by deleting from the vector . Then there is a one-to-one transformation from to , which we factor as . Because the only constraint on these distributions is that is mean zero, we can show that the nuisance tangent space for is . Therefore as claimed, since is orthogonal to each of the direct summands.
An element of which is in the inverse image of for the mapping , and hence in , is:
The space is . Since elements of are functions of the observed data (due to the no unobserved confounding assumption), the space is equal to the space . Projection onto is given by:
as this operation yields elements that are in , and whose residuals are orthogonal to . Below we write . Then:
Therefore, taking the residual, the efficient influence function for for Model 1 is
B.2 The efficient influence function in Model 2.
Model 2, by contrast, does impose constraints on the observed data. In a model that imposes restrictions on the observed law, there will be multiple observed-data influence functions. An approach for deriving the efficient influence function in such a setting is to (i) find one observed data influence function, and (ii) project it onto the observed data tangent space for the model. Here we will actually start by considering a slightly smaller model, which we will call Model 3, in which we impose one additional assumption. The observed data tangent space for Model 3 is more straightforward to compute than the observed data tangent space for Model 2; therefore we will first derive the efficient influence function for the target parameter in Model 3, then show that the efficient influence function in Model 3 is the same as the efficient influence function in Model 2.
We start by recalling the assumptions placed on the full data in Model 3. Here we write for the vector of potential outcomes :
A1: (exchangeability)
A2: (positivity)
A3: (consistency)
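Finally, we also assume the property of composition, meaning that for any sets of variables $R$, $S$, $T$, and $V$, if $R \perp\!\!\!\perp S \mid V$ and $R \perp\!\!\!\perp T \mid V$, then it also holds that $R \perp\!\!\!\perp (S, T) \mid V$, as with graphical d-separation. The following independencies follow from the assumptions listed above: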
- B1: . (Justified by A1 and A5.)
- B2: . By B1, A5, and composition, we have . Now the result follows immediately from the weak union property of conditional independence, which states that if $R \perp\!\!\!\perp (S, T) \mid V$, then $R \perp\!\!\!\perp S \mid (T, V)$.
- B3: . By A3, for each , . Therefore it suffices to show that , and this follows immediately from B2 by weak union.
- B4: . By A1, A7, and composition, . Since the observed outcome is a function of , this implies . Now the result follows by weak union.
- B5: . By B4 and weak union, . By A3, this implies .
- B6: . By B1, , and by A5, . Therefore by composition, . We show that (using A7 and A5). Therefore by composition , and this implies by weak union.
- B7: . (Justified by A4 and A6.)
We now derive the observed data tangent space for Model 3, by first computing the full data tangent space, then showing how to move from the full data to the observed data tangent space. In Model 3 the full data likelihood factors as:
The successive factorization steps use, in order, A1, A6, A4, A7, and A5.
The full data tangent space is , where:
The observed data tangent space is , where . For
, , while and .
We claim that the 5 spaces are mutually orthogonal. To show that , let and , where . Then:
A5 |
All other parts of the claim follow similarly from the orthogonality of the full data spaces , except for showing that and are orthogonal. To show that , let where , and let , where . Then:
The successive equalities use, in order, B5, B3, B4, B2, and B1.
Since is a sum of 5 mutually orthogonal spaces, projection onto is given by the sum of the projections onto each of the 5 spaces. Projection onto , , and is straightforward. For projection onto , we will use the following claim (and analogously for ):
Claim: is equal to the space , where
To show the claim, we first show that . Let . As shown in the preceding proof, by B5 and B3, which shows that is a function of only. We must also check that :
B1 |
Therefore . To show the opposite containment, we will show that , which is equivalent to . We begin by computing , using properties of adjoint operators.
Consider the linear map given by . The adjoint of is the map
such that for all and all . The range of is . Therefore, by properties of adjoint operators, is the null space of . We show that is given by . First, note that for any , is indeed an element in , since it is a function of that is mean zero given . Now we check that :
where the last equality follows since
Therefore,
Now let be the set
We will show the chain of containments .
We first show that, for any , for each , is a function of , , and only. This follows from B6: since , we can remove all but one of the potential outcomes in in the conditioning below.
B6 | ||||
Now by B1, . Write for this function. Then we have
Now take , so that . We will show that
. Isolating in the equation
, we have
Since the right hand side does not depend on , this shows that does not depend on . Therefore is constant in , and hence .
Similarly, , , and are functions of only, and therefore for each . Therefore as claimed, which shows that .
Finally we will show that Let , so that . We show that is orthogonal to every :
This completes the proof of the claim that . The analogous result holds for by symmetry. Therefore the observed data tangent space for Model 3 is the direct sum of 5 mutually orthogonal spaces , where
The efficient influence function for in Model 3 is the projection of any influence function for in Model 3, say , onto , where by orthogonality, .
Because Model 3 is a subset of Model 1, the full-data influence function from Model 1 is also a full-data influence function for Model 3. Following the steps outlined in Section B.1, the element
is in the space for Model 3. In Model 3, , so one influence function for in Model 3 is:
Therefore, the efficient influence function for in Model 3 is:
(1) |
where the last line follows because for each , and because for any , by orthogonality.
For notational convenience we take in the following calculations; the calculations for other values of are exactly similar. We write , . By A4 and A6, . Then:
We compute each of the 3 terms in (1):
The successive equalities use, in order, A1, A5, B1, and A3.
We relabel and as and , according to whether or in the parameter . Set , so that , and , so . Then:
The successive equalities use, in order, A3, B2, B2, and A3.
Using the fact that for binary , , we have:
B7 | ||||
Therefore,
Finally,
B6 | ||||
Therefore, the efficient influence function for in Model 3 is:
By exactly the same arguments, the efficient influence function for in Model 3 is:
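Next we will show that the function given above is also the efficient influence function for the target parameter in Model 2. It suffices to show that this function is an influence function for the parameter in Model 2, for then its projection onto the observed data tangent space of Model 2 is the efficient influence function in Model 2. But Model 3 is a submodel of Model 2, which implies that the observed data tangent space of Model 3 is contained in that of Model 2. Since the function is the efficient influence function for Model 3, we know that it lies in the observed data tangent space of Model 3. Therefore it also lies in the observed data tangent space of Model 2, and so it is its own projection onto that space. Thus, once we show that the function is an influence function for the parameter in Model 2, it will automatically follow that it is the efficient influence function in Model 2.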
To show that is an influence function for in Model 2, let be any regular parametric submodel of Model 2, parametrized say by , in such a way that corresponds to the true distribution . Write , and let be the score vector for evaluated at the truth. We will show that:
(2) |
where the right hand side is the derivative of the functional , evaluated at the truth. By [25], is an influence function for in Model 2 if and only if equation (2) is satisfied for every regular parametric submodel of Model 2.
Factoring the observed data likelihood, can be parametrized as
where the components of the parametrization are variation independent. We can partition the set of equations (2) into the 3 sets of equations
(3) |
for . We will show that, for , both sides of (3) are zero, since neither the functional nor the influence function involves . We will then show that (3) holds for because Model 2 and Model 3 are identical as regards features related to and .
The identifying functional for is:
(4) |
Note that (4) varies in only, so . Since is the score for
, . Note also that is a function of only. Therefore:
which shows that (3) is satisfied for .
Now consider the submodel given by
Note that for , and have the same scores, i.e. , and that . We claim that is a parametric submodel of Model 3: Model 2 and Model 3 impose exactly the same restrictions on , and neither imposes any restriction on . For each , is in Model 2; therefore it is also in Model 3. Furthermore, taking shows that contains the true distribution.
Therefore, since is an influence function for in Model 3, and is a regular parametric submodel of Model 3,
Since for and , this shows that equation (3) holds for as claimed. Therefore, is an influence function for in Model 2, and hence the efficient influence function for in Model 2.
Appendix C Double Robustness
Here we show that the Model 1 and Model 2 efficient estimators for are doubly robust.
We consider the Model 1 estimator first. Fix , , and , and let be a model for the propensity score , and let be a model for . These models may or may not be correctly specified, but suppose that they converge to some values, say and . In the case where the models are correctly specified, say with and corresponding to the truth, we would have and . Let and denote predicted values under these models, and consider the corresponding Model 1 estimator
We will show that is consistent even if either or is misspecified, as long as the other is correctly specified. We claim that , where
For we can rewrite as
(5) |
(6) | ||||
where (5) is a sample mean of i.i.d. terms, and hence converges to the expected value of each term, and (6) converges in probability to 0.
Now we show that, if either or is correctly specified, then . If is correctly specified, then . Conditioning on , we have:
by A1 | ||||
If is correctly specified, then we have:
by A1
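The double robustness argument above can be illustrated numerically in a simplified setting. The sketch below is not the Model 1 estimator itself: it assumes a single binary exposure A, a discrete covariate X, and the generic augmented inverse-probability-weighted form m*(X) + A/pi*(X) (Y - m*(X)) for the mean outcome under exposure. All names and numeric values are hypothetical. It computes the probability limit exactly by enumeration, so no sampling is involved.

```python
def aipw_limit(p_x, pi_true, m_true, pi_model, m_model):
    """Probability limit of a generic AIPW estimator of E[Y(1)].

    p_x      : dict x -> P(X = x)
    pi_true  : dict x -> true propensity P(A = 1 | X = x)
    m_true   : dict x -> true regression E[Y | A = 1, X = x]
    pi_model : dict x -> limit of the fitted propensity model
    m_model  : dict x -> limit of the fitted outcome regression
    """
    total = 0.0
    for x, px in p_x.items():
        # E[ m*(X) + A / pi*(X) * (Y - m*(X)) | X = x ],
        # using E[A | x] = pi(x) and E[Y | A = 1, x] = m(x).
        total += px * (m_model[x] + pi_true[x] / pi_model[x] * (m_true[x] - m_model[x]))
    return total


# Hypothetical population with a binary covariate.
p_x = {0: 0.6, 1: 0.4}
pi_true = {0: 0.2, 1: 0.7}
m_true = {0: 1.0, 1: 3.0}
target = sum(p_x[x] * m_true[x] for x in p_x)  # true E[Y(1)] under no unmeasured confounding

# Deliberately misspecified working models.
pi_wrong = {0: 0.5, 1: 0.5}
m_wrong = {0: 2.0, 1: 2.0}

limit_outcome_ok = aipw_limit(p_x, pi_true, m_true, pi_wrong, m_true)
limit_propensity_ok = aipw_limit(p_x, pi_true, m_true, pi_true, m_wrong)
limit_both_wrong = aipw_limit(p_x, pi_true, m_true, pi_wrong, m_wrong)
```

In this toy setting the limit equals the target whenever either working model is correct, and differs from it when both are misspecified, which mirrors the structure of the argument for the Model 1 estimator.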
Next we consider the Model 2 estimator. Now let be a model for the propensity score , let be a model for the propensity score , and let be a model for . Suppose that these models converge to some values, say , , and . In the case where the models are correctly specified, denote the truth by , , and respectively. Let , , and denote predicted values under these models, and consider the corresponding Model 2 estimator:
We show that is consistent under misspecification of one or both of the propensity score models, provided the outcome regression model is correctly specified; and that is consistent under misspecification of the outcome regression model, provided that both propensity score models are correctly specified.
We have , where
Now suppose first that the outcome regression model is correctly specified. Then . Conditioning on , we have:
by B1
Finally, if both propensity score models are correctly specified, then conditioning on , we have:
by B1
by B7
Appendix D Data Analysis and Simulations
Here we describe the models that were used in the data analysis. The Model 1 estimator uses predicted values from an outcome regression model, and from a joint propensity score model. We use generalized additive models to model , the conditional mean of the twin’s Drinking Index outcome, with the following model formulation:
Drinking.index ~ s(Parent.alcohol.abuse) + s(Parent.drug.abuse) +
ti(Parent.alcohol.abuse, Parent.drug.abuse) + Parent.occupation.level +
Sex + Zygosity + Academic.motivation + Sex*Academic.motivation +
s(Conflict.with.parents) + Age + Exposure + Parent.alcohol.abuse*Exposure +
Sex*Exposure + Cotwins.exposure + Zygosity*Cotwins.exposure + Exposure*Cotwins.exposure
where s( ) indicates a smooth function of the variable, and ti( , ) indicates an interaction term consisting of a smooth function of the two variables. To enforce symmetry between the Twin 1’s and the Twin 2’s, we fit this model on a stacked dataset combining the data for the Twin 1’s and the data for the Twin 2’s, so that the same fitted values of the regression parameters are used for both the Twin 1’s and the Twin 2’s.
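As a rough illustration of the stacking step, the sketch below turns each twin pair into two rows, one per twin, with the co-twin's exposure attached to each row. The input layout (the keys "pair_id", "shared", "twin1", "twin2", "covariates", "exposure", "outcome") is a hypothetical choice made for this example and is not taken from the MTFS data files.

```python
def stack_twin_pairs(pairs):
    """Return one row per twin, so that a single model fit applies to both twins.

    Each element of `pairs` is assumed (hypothetically) to look like:
      {"pair_id": ..., "shared": {...},
       "twin1": {"covariates": {...}, "exposure": 0 or 1, "outcome": float},
       "twin2": {"covariates": {...}, "exposure": 0 or 1, "outcome": float}}
    """
    rows = []
    for pair in pairs:
        # Each twin appears once as the index twin, with the other as co-twin.
        for own, co in (("twin1", "twin2"), ("twin2", "twin1")):
            row = {"pair_id": pair["pair_id"]}
            row.update(pair["shared"])               # covariates shared by the pair
            row.update(pair[own]["covariates"])      # the index twin's own covariates
            row["Exposure"] = pair[own]["exposure"]
            row["Cotwins.exposure"] = pair[co]["exposure"]
            row["Drinking.index"] = pair[own]["outcome"]
            rows.append(row)
    return rows


example_pairs = [{
    "pair_id": 1,
    "shared": {"Zygosity": "MZ", "Sex": "F"},
    "twin1": {"covariates": {"Academic.motivation": 3.0}, "exposure": 1, "outcome": 0.4},
    "twin2": {"covariates": {"Academic.motivation": 2.0}, "exposure": 0, "outcome": -0.1},
}]
stacked = stack_twin_pairs(example_pairs)
```

Because both rows of a pair enter the same fit, the estimated regression parameters do not depend on which twin was labelled Twin 1.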
For the joint propensity score model, we first use generalized additive models to model logit :
Exposure ~ Parent.alcohol.abuse + Parent.drug.abuse +
ti(Parent.alcohol.abuse, Parent.drug.abuse) + Externalizing.disorder +
Academic.motivation + Conflict.with.parents +
ti(Parent.alcohol.abuse, Conflict.with.parents) + s(Age), family=binomial
As before, we fit this propensity score model on a stacked dataset to enforce symmetry between the Twin 1’s and the Twin 2’s. We then obtain predicted values for the joint distribution from predicted values for the margins using a Dale model [2]. Under the Dale model, the joint distribution is determined by the two margins and together with a parameter for the association, the cross ratio , where corresponds to the case that and are conditionally independent. Here we allow the strength of the association between and to depend on zygosity and sex, and we assume . We estimate using maximum likelihood, treating the margins as fixed. Then is determined by the margins and and the estimated :
if , and otherwise.
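For readers who want to see how a joint exposure distribution can be recovered from two margins and a cross ratio, the following sketch uses the standard closed-form solution for a 2x2 table with fixed margins and a fixed odds ratio. The function and argument names (p1, p2, psi) are chosen for this example, and the code takes the cross ratio as given rather than estimating it by maximum likelihood.

```python
import math


def dale_joint_cells(p1, p2, psi):
    """Joint distribution of two binary exposures from margins and cross ratio.

    p1, p2 : marginal probabilities that twin 1 and twin 2 are exposed
    psi    : cross ratio (odds ratio) p11 * p00 / (p10 * p01); psi = 1 means
             the two exposures are independent given the covariates
    Returns a dict mapping (a1, a2) to its probability.
    """
    if abs(psi - 1.0) < 1e-12:
        p11 = p1 * p2
    else:
        # p11 solves (psi - 1) p11^2 - a p11 + psi p1 p2 = 0; take the smaller root.
        a = 1.0 + (p1 + p2) * (psi - 1.0)
        disc = a * a + 4.0 * psi * (1.0 - psi) * p1 * p2
        p11 = (a - math.sqrt(disc)) / (2.0 * (psi - 1.0))
    return {
        (1, 1): p11,
        (1, 0): p1 - p11,
        (0, 1): p2 - p11,
        (0, 0): 1.0 - p1 - p2 + p11,
    }


cells = dale_joint_cells(0.3, 0.4, 2.5)
```

A quick sanity check is that the four cells sum to one and that their odds ratio reproduces the value of psi that was supplied.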
The Model 2 estimator uses predicted values from three models. We use the same conditional mean model and the same propensity score model as above. Finally, we model logit, where is the probability that the twin is exposed given shared covariates and their co-twin’s individual covariates, as:
Exposure ~ Parent.alcohol.abuse + Parent.drug.abuse +
ti(Parent.alcohol.abuse, Parent.drug.abuse) +
Cotwins.externalizing.disorder + Cotwins.academic.motivation +
Cotwins.conflict.with.parents +
ti(Parent.alcohol.abuse, Cotwins.conflict.with.parents) + s(Age), family=binomial
and fit this model on the stacked dataset.
Next, we describe the data generating mechanisms used in our simulation study. Under the first data-generating mechanism, Model 1 is correctly specified but Model 2 is not. For each simulated dataset, we resample twin pairs from the Minnesota Twin Family Study (MTFS) data to generate shared and individual baseline covariates. Let the th twin pair in the simulated data have covariates . We then use the model for described above to generate exposures, such that twin pair will have exposures with probability , the predicted values from the model fit on the MTFS data. The Drinking Index outcome in the MTFS data is approximately normal, so we generate outcomes from a bivariate normal distribution where the mean corresponds to predicted values of the outcome regression model fit on the MTFS data, and where values of the variance-covariance matrix are chosen to approximate the variance and covariance seen in the MTFS data. In particular, we take outcomes for Twin 1 and Twin 2 to be more strongly correlated among MZ twins than among DZ twins. Specifically, for twin pair , we draw
where is the predicted value from the model fit on the MTFS data, and where if twin pair is MZ, and if twin pair is DZ.
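The outcome-generation step can be sketched as follows, assuming a common outcome standard deviation for both twins and a within-pair correlation that depends on zygosity. The numeric values of the correlations and the standard deviation below are hypothetical placeholders and are not the values used in the simulation study.

```python
import math
import random


def outcome_correlation(zygosity):
    """Within-pair outcome correlation; placeholder values, larger for MZ than DZ."""
    return 0.6 if zygosity == "MZ" else 0.3


def draw_pair_outcomes(mu1, mu2, sd, rho, rng):
    """Draw (Y1, Y2) from a bivariate normal with means (mu1, mu2),
    common standard deviation sd, and correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    y1 = mu1 + sd * z1
    # Mixing z1 into the second coordinate induces correlation rho.
    y2 = mu2 + sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return y1, y2


rng = random.Random(2024)
# mu1 and mu2 stand in for predicted values from the fitted outcome regression.
y1, y2 = draw_pair_outcomes(0.2, -0.1, 1.0, outcome_correlation("MZ"), rng)
```

Constructing the pair from two independent standard normal draws avoids any dependence on a linear algebra library and makes the role of the correlation parameter explicit.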
For the second data-generating mechanism, under which Model 1 and Model 2 are both correctly specified, we modify the step in which we generate exposures, drawing , where , and independently drawing , where .
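The difference between the two exposure-generation steps can be sketched as below: under the first mechanism the pair of exposures is drawn jointly from four cell probabilities, while under the second each twin's exposure is drawn independently from its own probability. The function names and the example probabilities are hypothetical. In the study, these probabilities would come from the fitted propensity score models.

```python
import random


def draw_joint_exposure(cell_probs, rng):
    """First mechanism: draw (A1, A2) jointly from the four cell probabilities.

    cell_probs maps (a1, a2) in {0, 1}^2 to a probability; the four values sum to 1.
    """
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    weights = [cell_probs[c] for c in cells]
    return rng.choices(cells, weights=weights, k=1)[0]


def draw_independent_exposure(p1, p2, rng):
    """Second mechanism: draw A1 and A2 independently as Bernoulli(p1), Bernoulli(p2)."""
    a1 = 1 if rng.random() < p1 else 0
    a2 = 1 if rng.random() < p2 else 0
    return a1, a2


rng = random.Random(7)
joint_draw = draw_joint_exposure(
    {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}, rng
)
independent_draw = draw_independent_exposure(0.4, 0.4, rng)
```

Under the independent scheme the joint propensity factorizes into the product of the two margins, whereas the joint scheme allows the twins' exposures to remain associated after conditioning on covariates.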