Single Proxy Control
Abstract
Negative control variables are sometimes used in non-experimental studies to detect the presence of confounding by hidden factors. A negative control outcome (NCO) is an outcome that is influenced by unobserved confounders of the exposure-outcome relationship in view, but is not causally impacted by the exposure. Tchetgen Tchetgen (2013) introduced the Control Outcome Calibration Approach (COCA) as a formal NCO counterfactual method to detect and correct for residual confounding bias. For identification, COCA treats the NCO as an error-prone proxy of the treatment-free counterfactual outcome of interest, and involves regressing the NCO on the treatment-free counterfactual, together with a rank-preserving structural model which assumes a constant individual-level causal effect. In this work, we establish nonparametric COCA identification of the average causal effect for the treated, without requiring rank preservation, therefore accommodating unrestricted effect heterogeneity across units. This nonparametric identification result has important practical implications, as it provides single proxy confounding control, in contrast to recently proposed proximal causal inference, which relies for identification on a pair of confounding proxies. For COCA estimation we propose three separate strategies: (i) an extended propensity score approach, (ii) an outcome bridge function approach, and (iii) a doubly-robust approach. Finally, we illustrate the proposed methods in an application evaluating the causal impact of a Zika virus outbreak on birth rate in Brazil.
Keywords: Confounding Proxy; Doubly Robust; Extended Propensity Score; Negative Controls; Unmeasured Confounding.
1 Introduction
Unmeasured confounding is a well-known threat to valid causal inference from observational data. An approach that is sometimes used in practice to assess residual confounding bias is to check whether known null effects can be recovered free of bias, by evaluating whether the exposure or treatment of interest is found to be associated with a so-called negative control outcome (NCO), upon adjusting for measured confounders (Rosenbaum, 1989; Lipsitch et al., 2010; Shi et al., 2020). An observed variable is said to be a valid negative control outcome or, more broadly, an outcome confounding proxy, to the extent that it is associated with hidden factors confounding the exposure-outcome relationship in view, although not directly impacted by the exposure. Therefore, an NCO which is empirically associated with the exposure might suggest the presence of residual confounding. In the event such an association is present, a natural question is whether the negative control outcome can be used for bias correction.
The most well-established NCO approach for debiasing observational causal effect estimates is the difference-in-differences (DiD) approach (Card and Krueger, 1994; Lechner, 2011; Caniglia and Murray, 2020). In fact, DiD may be viewed as directly leveraging the pre-treatment outcome as an NCO, since it cannot logically be causally impacted by the treatment. Identification then follows from an additive equi-confounding assumption that the unmeasured confounder association with the post-treatment outcome of interest matches that with the pre-treatment outcome on the additive scale (Sofer et al., 2016). The baseline outcome in DiD is thus implicitly assumed to be a valid NCO, and equi-confounding is equivalent to the so-called parallel trends assumption, that the average trends in treatment-free potential outcomes for treated and untreated units are parallel. In practice, equi-confounding, or equivalently parallel trends, may not be reasonable for a number of reasons, including if the outcome trend is also impacted by an unmeasured common cause with the treatment. Furthermore, additive equi-confounding may not be realistic as a broader debiasing method in non-DiD settings where the NCO is not necessarily a pre-treatment measurement of the outcome of interest, but is instead a post-treatment measurement of a different type of outcome (and might therefore be measured on a different scale than the outcome of interest).
To address these potential limitations of additive equi-confounding, Tchetgen Tchetgen (2013) introduced the Control Outcome Calibration Approach (COCA) as a simple yet formal counterfactual NCO approach to debias causal effect estimates in observational analyses. At its core, COCA essentially treats the NCO variable as a proxy measurement for the treatment-free potential outcome, which therefore is associated with the latter, and which becomes independent of the treatment assignment mechanism, upon conditioning on the treatment-free counterfactual outcome. As the treatment-free potential outcome can be viewed as an ultimate source of unmeasured confounding, this assumption formalizes the idea that, as a relevant proxy for the source of residual confounding, the NCO would be made irrelevant for the treatment assignment mechanism if one were to hypothetically condition on the underlying potential outcome.
For identification and inference for a continuous outcome, the original COCA approach of Tchetgen Tchetgen (2013) involves the correct specification of a regression model for the NCO, conditional on the treatment-free potential outcome and measured confounders, together with a rank-preserving structural model which effectively assumes a constant individual-level treatment effect. In this paper, we develop a nonparametric COCA identification framework for the average causal effect for the treated, which applies equally irrespective of the nature of the primary outcome, whether binary, continuous, or polytomous. Importantly, as we show, the proposed COCA identification framework completely obviates the need for rank preservation, therefore accommodating an arbitrary degree of effect heterogeneity across units. Relatedly, an alternative counterfactual approach named proximal causal inference has recently been developed in the causal inference literature (Miao et al., 2016, 2018; Tchetgen Tchetgen et al., 2024), which leverages a pair of negative treatment and outcome control variables, or more broadly treatment and outcome confounding proxies, to nonparametrically identify treatment causal effects subject to residual confounding without invoking a rank-preservation assumption. Importantly, while proximal causal inference relies on two proxies for causal identification, COCA is a single-proxy control approach, which may therefore present practical advantages. For estimation and inference, we introduce three strategies to implement COCA which improve on prior methods: (i) an extended propensity score approach, (ii) a so-called outcome calibration bridge function approach, and (iii) a doubly robust approach which carefully combines approaches (i) and (ii) and remains unbiased provided that at least one of the two approaches is unbiased, without requiring knowledge of which one. Finally, we illustrate the methods with an application evaluating the causal effect of a Zika outbreak on birth rate in Brazil, and we conclude with possible extensions to our methods and a brief discussion.
2 Notation and Brief Review of COCA
Consider an observational study where, as represented in Figure 1, one has observed an outcome variable $Y$, a binary treatment $A$ whose causal effect on $Y$ is of interest, and measured pre-treatment covariates $X$. We are concerned that, as displayed succinctly in the figure with the bow arc, the association between $A$ and $Y$ is confounded by hidden factors.
Throughout, $Y^{(a)}$ denotes the potential outcome, or counterfactual, had, possibly contrary to fact, the exposure been set to $A = a$ by an external hypothetical intervention. Furthermore, throughout, we also make the consistency assumption:
Assumption 1
$Y = A Y^{(1)} + (1 - A) Y^{(0)}$ almost surely.
Hereafter, we aim to make inferences about the causal effect of treatment on the treated (ETT), denoted by $\psi = E\{Y^{(1)} - Y^{(0)} \mid A = 1\}$. Under consistency, $E\{Y^{(1)} \mid A = 1\}$ is identified by $E(Y \mid A = 1)$; to identify the counterfactual mean $E\{Y^{(0)} \mid A = 1\}$ requires additional assumptions. Standard methods often resort to the no unmeasured confounding assumption, i.e., $A \perp Y^{(0)} \mid X$, a strong assumption we do not make. Instead, we suppose that one has measured a valid NCO $W$, possibly multi-dimensional, which is known a priori to satisfy the following conditions:
Assumption 2
Condition (i): $W^{(a)} = W$ almost surely for $a = 0, 1$, where $W^{(a)}$ is the potential NCO under an external intervention that sets $A = a$; Condition (ii): $W \not\perp Y^{(0)} \mid X$; Condition (iii): $A \perp W \mid (Y^{(0)}, X)$.
Assumption 2-(i) encodes the key assumption of a known null causal effect of the treatment on the NCO in potential outcome notation. Assumption 2-(ii) encodes that $W$ is relevant for predicting the treatment-free potential outcome of interest. Assumption 2-(iii) states that $W$ is independent of $A$ conditional on the treatment-free potential outcome and covariates. These conditions formally encode the assumption that $W$ is a valid proxy for the treatment-free potential outcome $Y^{(0)}$, a source of residual confounding bias; $W$ is only associated with the treatment mechanism to the extent that it is associated with the confounding mechanism captured by the potential outcome. We illustrate these NCO assumptions with the causal graph displayed in Figure 2. The thick arrows in the graph indicate the deterministic relationship defining the observed outcome in terms of potential outcomes and treatment variables by the consistency assumption. The missing arrows on the graph formally encode the core conditional independence conditions implied by Assumption 2.
It is enlightening to consider a data generating mechanism that is compatible with Assumption 2. As an example, we consider the following latent variable model for a continuous outcome:
(1a) $Y^{(0)} = h(U, X),$
(1b) $W \not\perp U \mid X$ and $A \perp W \mid (U, X),$
where $U$ is a continuously distributed unobserved variable and $h(u, x)$ is a function that is strictly monotone in $u$ for all $x$, but otherwise completely unrestricted. In the Supplementary Material A.1, we establish that expressions (1a)-(1b) imply Assumption 2. Expression (1a) means that the treatment-free outcome is a monotonic transformation of an unobserved variable $U$, a specific instance of a so-called changes-in-changes model (Athey and Imbens, 2006). The latter, however, would also assume under the causal graph in Figure 2 that $W = g(U, X)$ where $g(u, x)$ is strictly monotone in $u$ for all $x$, an assumption we do not make. Expression (1b) corresponds to Assumptions 2-(ii) and (iii). In fact, our formulation accommodates an additional measurement error in $W$, say $\varepsilon$, so that $W = g(U, \varepsilon, X)$ where $g$ varies in $u$ (potentially non-monotonically); provided that $\varepsilon \perp (A, U) \mid X$, (1b) is satisfied. Figure 3 provides a graphical representation compatible with, although not necessarily implied by, expressions (1a)-(1b).
For identification and estimation in the case of a continuous outcome, Tchetgen Tchetgen (2013) further assumed the rank-preserving structural model:
(2) $Y^{(1)} = Y^{(0)} + \psi$ almost surely,

which, by consistency, implies a constant individual-level causal effect, i.e., $Y = Y^{(0)} + \psi A$. Under this model, he noted that upon defining $Y(\psi^\dagger) = Y - \psi^\dagger A$, one can deduce from Assumption 2 that $A \perp W \mid \{Y(\psi^\dagger), X\}$ if and only if $\psi^\dagger = \psi$, in which case, given (2), $Y(\psi) = Y^{(0)}$; this motivates a regression-based implementation of COCA that entails searching for the parameter value $\psi^\dagger$ such that $W$ is conditionally independent of $A$ given $\{Y(\psi^\dagger), X\}$. A straightforward implementation of the approach uses linear models whereby, for each value of $\psi^\dagger$ on a sufficiently fine grid, one obtains an estimate of the regression model $E\{W \mid Y(\psi^\dagger), A, X\} = \beta_0 + \beta_1 Y(\psi^\dagger) + \beta_2 A + \beta_3^\top X$ using ordinary least squares (OLS), with estimated coefficients $\{\hat\beta_0, \hat\beta_1, \hat\beta_2(\psi^\dagger), \hat\beta_3\}$. Then a 95% confidence interval for $\psi$ consists of all values of $\psi^\dagger$ for which a valid test of the null hypothesis $\beta_2(\psi^\dagger) = 0$ fails to reject at the 0.05 type 1 error level. Such a hypothesis test might be performed by verifying whether the interval $\hat\beta_2(\psi^\dagger) \pm 1.96\,\widehat{\mathrm{se}}\{\hat\beta_2(\psi^\dagger)\}$ covers zero, with $\widehat{\mathrm{se}}$ the OLS estimate of the standard error of $\hat\beta_2(\psi^\dagger)$. Tchetgen Tchetgen (2013) also describes a potentially simpler one-shot approach which fits a single regression $E(W \mid Y, A, X) = \gamma_0 + \gamma_1 Y + \gamma_2 A + \gamma_3^\top X$ via OLS, in which case $\hat\psi = -\hat\gamma_2/\hat\gamma_1$ where $(\hat\gamma_1, \hat\gamma_2)$ are OLS estimates; a corresponding standard error estimator of $\hat\psi$ is given in the Supplementary Material A.2 for convenience. Though practically convenient, validity of either approach relies on both correct specification of the linear model for $W$, and on the rank-preserving structural model, which may be biologically implausible. In the following, we describe alternative methods aimed at addressing these limitations.
3 Identification of the Effect of Treatment on the Treated
3.1 Identification via Extended Propensity Score Weighting
In order to establish identification, consider the extended propensity score function (EPS):

$\pi(y, x) = \Pr(A = 1 \mid Y^{(0)} = y, X = x),$

which makes explicit the fact that, in the presence of unmeasured confounding, the treatment mechanism will generally depend on the treatment-free potential outcome even after conditioning on all observed confounders. For notational brevity, we denote the treatment odds by $\eta(y, x) = \pi(y, x)/\{1 - \pi(y, x)\}$. Since $\pi$ and $\eta$ have a one-to-one relationship, we use $\eta$ throughout to model the exposure mechanism. We assume that positivity holds, i.e.,
Assumption 3
$0 < \pi(y, x) < 1$ for all $(y, x)$ in the support of $(Y^{(0)}, X)$ given $A = a$, for $a = 0, 1$.
Next, we note that were $\eta$ known, the average treatment-free potential outcome in the treated would then be empirically identified by the expression

(3) $E\{Y^{(0)} \mid A = 1\} = \dfrac{E\{(1 - A)\,\eta(Y, X)\,Y\}}{\Pr(A = 1)}.$

See the Supplementary Material B.1 for the details. Therefore, the ETT would be identified by

$\psi = E(Y \mid A = 1) - \dfrac{E\{(1 - A)\,\eta(Y, X)\,Y\}}{\Pr(A = 1)}.$
As $\eta$ is unknown, we next demonstrate how the NCO assumption can be leveraged to identify the latter quantity. Let $f(y \mid A = 0, w, x)$ denote the conditional density (or probability mass function) of $Y$ given $(A = 0, W = w, X = x)$, evaluated at $y$. The proposed approach to identify $\eta$ is based on the following equality, which we prove in the Supplementary Material B.2:
Result 1
Under Assumptions 1-3, the treatment odds function $\eta$ satisfies

(4) $\displaystyle\sum_{y} \eta(y, X)\, f(y \mid A = 0, W, X) = \frac{\Pr(A = 1 \mid W, X)}{\Pr(A = 0 \mid W, X)}$ almost surely.

Result 1 provides an expression relating the exposure mechanism of interest to the observed data distribution: of the three quantities involved in the expression, namely $\eta(y, x)$, $f(y \mid A = 0, w, x)$, and the conditional treatment odds $\Pr(A = 1 \mid W, X)/\Pr(A = 0 \mid W, X)$, the latter two are uniquely determined by the observed data, and equation (4) relates them to the unknown function of interest $\eta$; see the Supplementary Material A.3 for a graphical illustration. Equation (4) is known as an integral equation, more precisely a Fredholm integral equation of the first kind. In slight abuse of notation, the sum may be interpreted as an integral if $Y$ is continuously distributed. We then have the following identification result.
Result 2
Suppose that Assumptions 1-3 hold and that $\eta$ is the unique solution to the integral equation (4). Then $\eta$ is identified from the observed data, and therefore the ETT $\psi$ is nonparametrically identified by substituting $\eta$ into expression (3).

Sufficient conditions for the existence and uniqueness of a solution to such an equation are well-studied; see Section A.5 for details. Such conditions were recently discussed in the context of proximal causal inference (Miao et al., 2016, 2018; Tchetgen Tchetgen et al., 2024). A detailed comparison of COCA with proximal causal inference is relegated to Section 6. Intuitively, the condition that the integral equation admits a unique solution essentially requires that $W$ is sufficiently relevant for $Y^{(0)}$, in the sense that for any variation in the latter, there is corresponding variation in the former. This assumption is akin to the assumption of relevance in the context of instrumental variable methodology, which states that variation in the instrument should induce variation in the treatment. Importantly, Result 2 applies whether $Y$ is binary, continuous or polytomous, provided that $W$ is sufficiently relevant for $Y^{(0)}$, to ensure that equation (4) admits a solution. Also, the result obviates the need for a rank-preserving structural model, and delivers fully nonparametric identification of the causal effect of treatment on the treated.
3.2 Identification via COCA Confounding Bridge Function
In this Section, we introduce an alternative nonparametric identification and estimation approach which does not rely on modeling the EPS, but instead relies on the existence of a so-called confounding bridge function formalized below.
Assumption 4
For all $y$ and $x$, there exists a function $b(w, x)$ (possibly nonlinear) that satisfies the following equation

(5) $E\{b(W, X) \mid Y = y, A = 0, X = x\} = y.$
Intuitively, upon noting that $Y = Y^{(0)}$ among the untreated, under Assumptions 1-2, Assumption 4 can equivalently be stated in terms of potential outcomes

(6) $E\{b(W, X) \mid Y^{(0)}, X\} = Y^{(0)}$ almost surely,
which essentially formalizes the idea that $W$ is a sufficiently relevant proxy for the potential outcome if there exists a (potentially nonlinear) transformation of $(W, X)$ whose conditional expectation given $(Y^{(0)}, X)$ recovers $Y^{(0)}$. As $b$ provides a bridge between the observed data equation (5) and its potential outcome counterpart (6), we aptly refer to $b$ as a COCA confounding bridge function. Note that classical measurement error is a special case of the equation in the display above, in which case $b$ is the identity map and $W = Y^{(0)} + \varepsilon$ where $\varepsilon$ is an independent mean-zero error. The condition can therefore be viewed as a nonparametric generalization of classical measurement error which allows $W$ and $Y^{(0)}$ to be of arbitrary nature and does not require the measurement error to be unbiased on the additive scale. We further illustrate the assumption in the case of binary $Y$ and $W$. As shown in the Supplementary Material B.4, in this case, suppressing covariates, the following satisfies (5):

$b(W) = \dfrac{W - \Pr(W = 1 \mid Y = 0, A = 0)}{\Pr(W = 1 \mid Y = 1, A = 0) - \Pr(W = 1 \mid Y = 0, A = 0)},$
provided that $\Pr(W = 1 \mid Y = 1, A = 0) \neq \Pr(W = 1 \mid Y = 0, A = 0)$, encoding the requirement that $W$ cannot be independent of $Y^{(0)}$, i.e., Assumption 2-(ii). Beyond the binary case, for more general outcome types, Assumption 4 likewise formally defines a Fredholm integral equation of the first kind, for which sufficient conditions for the existence of a solution are well characterized in functional analysis textbooks; we again refer the reader to Miao et al. (2016). We are now ready to state our result, which we prove in the Supplementary Material B.5:
Result 3
Suppose that Assumptions 1, 2, and 4 hold. Then $E\{Y^{(0)} \mid A = 1\} = E\{b(W, X) \mid A = 1\}$, and therefore

(7) $\psi = E\{Y - b(W, X) \mid A = 1\}.$

In the binary example discussed above where $b$ was uniquely identified, we have that

$\psi = E(Y \mid A = 1) - \dfrac{E(W \mid A = 1) - \Pr(W = 1 \mid Y = 0, A = 0)}{\Pr(W = 1 \mid Y = 1, A = 0) - \Pr(W = 1 \mid Y = 0, A = 0)}.$

We briefly highlight a key feature of the above result reflected in its proof, which is that $b$ need not be uniquely identified by equation (5): any solution to (5) leads to the same value of $\psi$. Interestingly, the identifying formula in the display above was also obtained by Tchetgen Tchetgen (2013) in the binary case, although he did not emphasize the key role of the bridge function as a general framework for identification beyond the binary case.
3.3 Semiparametric Efficiency Theory
Let $\mathcal{M}$ denote a semiparametric model defined as the collection of observed data laws of $O = (W, X, A, Y)$ that admit a solution to (5), i.e., laws under which a COCA confounding bridge function exists.
We further consider the following surjectivity condition:

- (Surjectivity): Let $T$ denote the conditional expectation operator given by $T(b)(y, x) = E\{b(W, X) \mid Y = y, A = 0, X = x\}$. At the true data law, $T$ is surjective.
The surjectivity condition states that the Hilbert space of square-integrable functions of $(W, X)$ is sufficiently rich so that any square-integrable function of $(Y, X)$ can be recovered from an element of the former space via the conditional expectation mapping $T$; see Cui et al. (2023), Dukes et al. (2023), and Ying et al. (2023) for related discussions. In addition, we consider the submodel $\mathcal{M}^\dagger \subseteq \mathcal{M}$ consisting of laws in $\mathcal{M}$ at which the surjectivity condition holds. We then establish the semiparametric local efficiency bound for $\psi$ under $\mathcal{M}$ at the submodel $\mathcal{M}^\dagger$.
Result 4
Suppose that Assumptions 1-4 hold. Then, the following results hold.

(i) The following function is an influence function for $\psi$ under $\mathcal{M}$:

(8) $\mathrm{IF}(O; \psi, \eta, b) = \dfrac{A\{Y - b(W, X) - \psi\} + (1 - A)\,\eta(Y, X)\,\{b(W, X) - Y\}}{\Pr(A = 1)}.$

(ii) The influence function $\mathrm{IF}$ is the efficient influence function for $\psi$ under $\mathcal{M}$ at the submodel $\mathcal{M}^\dagger$. Therefore, the corresponding semiparametric local efficiency bound for $\psi$ is $E\{\mathrm{IF}^2(O; \psi, \eta, b)\}$.
The influence function $\mathrm{IF}$ is similar to an influence function for the ETT in the proximal causal inference framework; see Section G of Cui et al. (2023) for details. Interestingly, the influence function has the following doubly robust property (Scharfstein et al., 1999; Lunceford and Davidian, 2004; Bang and Robins, 2005); see the Supplementary Material B.6 for the proof:
Result 5
Suppose that Assumptions 1-3 hold, and let $\eta^\dagger$ and $b^\dagger$ be candidate nuisance functions. Then $E\{\mathrm{IF}(O; \psi, \eta^\dagger, b^\dagger)\} = 0$ provided that either $\eta^\dagger = \eta$ or $b^\dagger = b$, not necessarily both.
In words, if either the COCA confounding bridge function or the EPS, but not necessarily both, is correctly specified, the influence function is an unbiased estimating function of $\psi$.
Using expressions (3), (7), and (8), one can construct parametric estimators of $\psi$. Specifically, the first estimator, based on (3), entails a priori specifying a parametric model for the EPS, say a logistic regression model. The second estimator, based on (7), entails a priori specifying a parametric model for the COCA bridge function, say a linear model. Lastly, the third estimator, based on (8), entails parametric models for both the EPS and the COCA bridge function. The first two estimators rely on correct specification of the EPS and the COCA bridge function, respectively. Thus, misspecification of either model will likely result in biased inferences about the ETT. On the other hand, the last estimator has a doubly-robust property (Scharfstein et al., 1999; Lunceford and Davidian, 2004; Bang and Robins, 2005) in that it can be used for unbiased inference about the ETT if either the EPS or the COCA bridge function is correct, without a priori knowledge of which model, if any, is incorrect. In Section A.4, we provide details on constructing these three parametric estimators and their large sample behavior; a schematic implementation is sketched below.
A significant limitation of the three parametric estimators is their dependence on specific parametric specifications of the nuisance components, which can lead to biased inference if the model specifications are incorrect. To address this concern, a potential solution is to develop an estimator whose nuisance components are estimated using nonparametric methods, drawing on recent advances in learning theory. In the following Section, we construct such an estimator and study its statistical properties.
4 A Semiparametric Locally Efficient Estimator
Our estimator is derived from the influence function $\mathrm{IF}$ in Result 5 and adopts the cross-fitting approach (Schick, 1986; Chernozhukov et al., 2018), which is implemented as follows. We randomly split the $N$ study units, indexed by $\{1, \ldots, N\}$, into $K$ non-overlapping folds, denoted by $\mathcal{I}_1, \ldots, \mathcal{I}_K$. For each $k = 1, \ldots, K$, we estimate the EPS and COCA confounding bridge functions using observations in $\mathcal{I}_k^c = \{1, \ldots, N\} \setminus \mathcal{I}_k$, and then evaluate the estimated nuisance functions using observations in $\mathcal{I}_k$ to obtain an estimator $\hat\psi^{(k)}$ of $\psi$. We refer to $\mathcal{I}_k^c$ and $\mathcal{I}_k$ as the estimation and evaluation folds, respectively. To use the entire sample, we take the simple average of the $K$ estimators.
We introduce the following additional notation in order to facilitate the discussion. For a generic variable $V$, let $\mathcal{H}_V$ be the Reproducing Kernel Hilbert Space (RKHS) of functions of $V$ endowed with a universal kernel function $K$, such as the Gaussian kernel $K(v, v') = \exp(-\|v - v'\|_2^2/\sigma^2)$, where $\sigma$ is a bandwidth parameter; see Chapter 4 of Steinwart and Christmann (2008) for the definition and examples of universal kernel functions. For a function $f$, let $\|f\|_{\mathcal{H}}$ denote its RKHS norm and $\|f\|_2$ its $L_2$-norm.
We estimate the EPS and COCA bridge functions by adopting a recently developed minimax estimation approach (Ghassami et al., 2022). We remark that other approaches (e.g., Mastouri et al. (2021)) can also be adopted with minor modification. Note that $\eta$ and $b$ satisfy the conditional moment restrictions

$E\{(1 - A)\,\eta(Y, X) - A \mid W, X\} = 0 \qquad \text{and} \qquad E\big[(1 - A)\{b(W, X) - Y\} \mid Y, X\big] = 0.$

Therefore, following Ghassami et al. (2022), minimax estimators of $\eta$ and $b$ are given by

$\hat\eta = \arg\min_{h} \max_{f} \mathbb{P}_n\big[\{(1 - A)\,h(Y, X) - A\}\,f(W, X) - f^2(W, X)\big] + \lambda_h \|h\|_{\mathcal{H}}^2 - \lambda_f \|f\|_{\mathcal{H}}^2,$
$\hat b = \arg\min_{h} \max_{f} \mathbb{P}_n\big[(1 - A)\{h(W, X) - Y\}\,f(Y, X) - f^2(Y, X)\big] + \lambda_h' \|h\|_{\mathcal{H}}^2 - \lambda_f' \|f\|_{\mathcal{H}}^2,$

where $\|\cdot\|_{\mathcal{H}}$ is an RKHS norm and $\lambda_h$, $\lambda_f$, $\lambda_h'$, and $\lambda_f'$ are positive regularization parameters.
We make a few remarks about the minimax estimation approach, the details of which are relegated to the Supplementary Material A.6. First, despite the complicated formulas, closed-form representations of $\hat\eta$ and $\hat b$ are available from the representer theorem (Kimeldorf and Wahba, 1970; Schölkopf et al., 2001). Second, the bandwidth and regularization parameters can be selected via cross-validation. Lastly, $\eta$ may vary widely because it is a ratio of two probabilities. In such cases, the proposed minimax estimator may yield implausibly small or even negative estimates. To mitigate this issue, one may consider a practical approach that regularizes the minimax estimator when it appears to be ill-behaved.
Using the minimax estimators of the nuisance functions, a semiparametric estimator of $\psi$ is then obtained as follows:

$\hat\psi = \frac{1}{K}\sum_{k=1}^{K} \hat\psi^{(k)}, \qquad \hat\psi^{(k)} = \frac{\mathbb{P}_n^{(k)}\big[A\{Y - \hat b^{(-k)}(W, X)\} + (1 - A)\,\hat\eta^{(-k)}(Y, X)\,\{\hat b^{(-k)}(W, X) - Y\}\big]}{\mathbb{P}_n^{(k)}(A)},$

where $\mathbb{P}_n^{(k)}$ denotes the empirical mean over the evaluation fold $\mathcal{I}_k$ and $(\hat\eta^{(-k)}, \hat b^{(-k)})$ are estimated from $\mathcal{I}_k^c$.
Under regularity conditions, the semiparametric estimator $\hat\psi$ is consistent and asymptotically normal for $\psi$.
Assumption 5
Suppose that the following conditions hold for all $k = 1, \ldots, K$:

(i) (Boundedness) There exists a finite constant $C$ such that $\|\eta\|_\infty$, $\|b\|_\infty$, $\|\hat\eta^{(-k)}\|_\infty$, and $\|\hat b^{(-k)}\|_\infty$ are all bounded by $C$.

(ii) (Consistency) As $N \to \infty$, we have $\|\hat\eta^{(-k)} - \eta\|_2 = o_P(1)$ and $\|\hat b^{(-k)} - b\|_2 = o_P(1)$.

(iii) (Cross-product Rates) As $N \to \infty$, we have $\|\hat\eta^{(-k)} - \eta\|_2 \cdot \|\hat b^{(-k)} - b\|_2 = o_P(N^{-1/2})$.
Assumption 5-(i) states that the nuisance functions and the corresponding estimators are uniformly bounded. Assumption 5-(ii) states that the estimated nuisance functions are consistent for the true nuisance functions in the $L_2$-norm sense. Assumption 5-(iii) states that the cross-product rate of the nuisance function estimators is $o_P(N^{-1/2})$. Assumption 5-(iii) is satisfied if $\eta$ and $b$ are sufficiently smooth, the associated conditional expectation operators are sufficiently smooth, and $\eta$ and $b$ are estimated over an RKHS with fast enough eigendecay; see Section 5 of Ghassami et al. (2022) for details. Importantly, if one nuisance function is estimated at a sufficiently fast rate, the other nuisance function is allowed to converge at a substantially slower rate, provided that the cross-product remains $o_P(N^{-1/2})$. This is an instance of the mixed-bias property described by Rotnitzky et al. (2020) and Ghassami et al. (2022). It is also worth highlighting the structure of the mixed bias in the current context, which is the minimum of two product biases, each containing a bias term for one nuisance function and a projected bias term for the other nuisance function; this property was first reported in Ghassami et al. (2022) for a large class of functionals including ours.
Result 6 establishes that $\hat\psi$ is consistent and asymptotically normal (CAN) for $\psi$.

Result 6
Suppose that Assumptions 1-5 hold. Then, under regularity conditions, $\sqrt{N}(\hat\psi - \psi) \xrightarrow{D} N(0, \sigma^2)$ where $\sigma^2 = E\{\mathrm{IF}^2(O; \psi, \eta, b)\}$, and the variance estimator $\hat\sigma^2$, obtained as the empirical second moment of the estimated influence function, is consistent for $\sigma^2$.

Using the variance estimator $\hat\sigma^2$, valid $100(1 - \alpha)$% confidence intervals for the ETT are given by $\hat\psi \pm z_{1 - \alpha/2}\,\hat\sigma/\sqrt{N}$, where $z_{1 - \alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution. Alternatively, one may construct confidence intervals using the multiplier bootstrap (van der Vaart and Wellner, 1996, Chapter 2.9); see Section A.6 for details.
Lastly, the cross-fitting estimator depends on a specific sample split, and thus may produce outlying estimates if some split samples do not represent the entire data. To mitigate this issue, Chernozhukov et al. (2018) propose a median adjustment of multiple cross-fitting estimates; details can be found in the Supplementary Material A.6.
5 Data Application: Zika Virus Outbreak in Brazil
The Zika virus, which can be transmitted from a pregnant woman to her fetus, can cause serious brain abnormalities, including microcephaly (i.e., an abnormally small head) (Rasmussen et al., 2016). Brazil is one of the countries hardest hit by the Zika virus. In particular, the outbreak in 2015 resulted in over 200,000 cases in Brazil by 2016 (Lowe et al., 2018). As a result, many prior works (Castro et al., 2018; Diaz-Quijano et al., 2018; Taddeo et al., 2022; Tchetgen Tchetgen et al., 2024) asked whether the Zika virus outbreak caused a drop in birth rates.
We re-analyzed the dataset analyzed in Taddeo et al. (2022) and Tchetgen Tchetgen et al. (2024). In the dataset, we focused on 673 municipalities in two states of Brazil, Pernambuco and Rio Grande do Sul, located in the northeast and the far south of the country, respectively. Out of the 1248 cases of microcephaly that occurred in Brazil by November 28, 2015, 51.8% (646 cases) were reported in Pernambuco (PE), while fewer than 10 cases of Zika-related microcephaly were reported in Rio Grande do Sul (RS) (Gregianini et al., 2017), which shows that PE was severely impacted by the Zika virus outbreak, while RS was minimally affected. Based on their epidemiologic histories, we defined the 185 municipalities in PE as the treated group and the 488 municipalities in RS as the control group.
For each municipality, we included the following variables in the analysis. As pre-treatment covariates $X$, we included municipality-level population size, population density, and proportion of females measured in 2014. We used the post-epidemic municipality-level birth rate in 2016 as the outcome $Y$, where the birth rate is defined as the total number of live human births per 1,000 persons. We used the pre-epidemic municipality-level birth rates in 2013 and 2014 as the outcome proxies (i.e., NCOs), denoted by $W_1$ and $W_2$, respectively. To be valid proxies, the birth rates in 2013 and 2014 must satisfy Assumption 2: (i) birth rates in 2013 and 2014 cannot be causally impacted by the Zika virus epidemic which occurred in 2015, (ii) birth rates in 2013 and 2014 are correlated with what the birth rate in 2016 would have been had there not been a Zika virus epidemic, and (iii) birth rates in 2013 and 2014 are independent of a municipality's Zika epidemic status, upon conditioning on its Zika virus epidemic-free potential birth rate in 2016. The first two conditions are uncontroversial, while the third condition largely relies on the extent to which pre-epidemic birth rates can accurately be viewed as a proxy for the counterfactual birth rate had the epidemic not occurred, and as such would not further be predictive of whether the municipality experienced a high rate of Zika virus incidence, conditional on the region's epidemic-free counterfactual birth rate in 2016. Although one might consider this last assumption reasonable, ultimately, it is empirically untestable without making an alternative assumption. Nevertheless, in the Supplementary Material A.8, we describe a straightforward sensitivity analysis to evaluate the extent to which violation of the assumption might impact inference.
Using the dataset, we estimated $\psi = E(Y \mid A = 1) - E\{Y^{(0)} \mid A = 1\}$, i.e., the difference between the observed average birth rate of Pernambuco and a forecast of what it would have been had the Zika outbreak been prevented. Therefore, the ETT quantifies the average treatment effect of the Zika outbreak on the birth rate within the Pernambuco region. Of note, the crude estimand $E(Y \mid A = 1) - E(Y \mid A = 0)$ was estimated to be equal to 3.384, suggesting that municipalities in the PE region (with higher incidence of Zika virus) experienced a higher birth rate than RS municipalities in 2016 during the Zika virus outbreak. An immediate concern is that this crude association between $A$ and $Y$ might be subject to significant confounding bias, leading us to conduct two separate analyses geared at addressing residual confounding bias: the proposed COCA methods, which we compared with a standard difference-in-differences analysis. Thus, we estimate the ETT using the approach outlined in Section 4, where the NCO is specified as either (i) $W_1$, the birth rate in 2013, (ii) $W_2$, the birth rate in 2014, or (iii) $(W_1, W_2)$. For comparison, we also obtained doubly-robust parametric estimators of the ETT using the three NCO specifications; see the Supplementary Material A.7 for details on how these estimators were constructed.
Table 1 summarizes the corresponding results. We find that the six COCA estimates are all negative, meaning that births per 1,000 persons were reduced in PE due to the Zika virus outbreak, an empirical finding better aligned with the scientific hypothesis that Zika may adversely impact the birth rates of exposed populations. Compared to the crude estimate of 3.384, the negative effect estimates indeed provide compelling evidence of potential confounding. We also obtain estimates using the difference-in-differences estimator under a standard parallel trends assumption (e.g., Card and Krueger (1994); Angrist and Pischke (2009)), which yields considerably smaller effect estimates; we note that the DiD estimator requires the assumption of equi-confounding of the association in the pre- and post-periods, while our proposed estimator does not (but instead requires conditions (i)-(iii) outlined above). Regardless of the estimator, all estimates appear to be consistent with the anticipated adverse causal impact of the Zika virus epidemic. Consequently, we conclude that, based on inferences aimed at accounting for confounding (DiD and COCA), the Zika virus outbreak likely led to a decline in the birth rate of affected regions in Brazil, which agrees with similar findings in the literature (Castro et al., 2018; Diaz-Quijano et al., 2018; Taddeo et al., 2022; Tchetgen Tchetgen et al., 2024).
| Estimator | Statistic | $W_1$ (2013) | $W_2$ (2014) | $(W_1, W_2)$ |
|---|---|---|---|---|
| Semiparametric COCA | Estimate | | | |
| | SE | | | |
| | 95% CI | | | |
| Doubly-robust parametric COCA | Estimate | | | |
| | SE | | | |
| | 95% CI | | | |
| Standard DiD under parallel trends | Estimate | | | |
| | SE | | | |
| | 95% CI | | | |
6 Discussion and Possible Extensions
We have described a COCA nonparametric identification framework, therefore extending previous results of Tchetgen Tchetgen (2013) to a more general setting accommodating outcomes of arbitrary nature and obviating the need for an assumption of constant treatment effects, i.e., rank preservation. We have proposed three estimation strategies, including a doubly robust method which has appealing robustness properties. Interestingly, the COCA central identifying assumption, that conditioning on the treatment-free counterfactual would in principle shield the treatment assignment from any association with the NCO, is isomorphic to an analogous assumption in the missing data literature, where an outcome might be missing not at random, but a fully observed so-called shadow variable (the missing data analog of an NCO) is available, which may reasonably be assumed to be conditionally independent of the missing data process given the value of the potentially missing outcome. For example, Zahner et al. (1992) considered a study of children's mental health evaluated through their teachers' assessments in Connecticut. However, the data for the teachers' assessments are subject to nonignorable missingness. As a proxy of the teacher's assessment, a separate parent report is available for all children in this study. The parent report is likely to be correlated with the teacher's assessment, but is unlikely to be related to the teacher's response rate given the teacher's assessment and fully observed covariates. Hence, the parental assessment is regarded as a shadow variable for the teacher's assessment in this study. The literature on shadow variables is fast-growing (d'Haultfoeuille, 2010; Kott, 2014; Wang et al., 2014; Miao and Tchetgen Tchetgen, 2016; Li et al., 2023; Miao et al., 2023), and the methods developed in this paper have close parallels to shadow variable counterparts in this literature. This connection to shadow variables is particularly salient when the COCA confounding bridge function is not uniquely defined, which can easily occur, for instance, when the shadow variable (or analogously the NCO) is multivariate, therefore significantly complicating inference. Fortunately, the methods developed by Li et al. (2023) for the analogous shadow variable setting directly apply to the corresponding COCA setting, and thus provide a complete solution for identification and inference for the average treatment effect for the treated without relying on completeness conditions nor on unique identification of either the EPS or the COCA confounding bridge function. We refer the interested reader to this latter work for further details. It is worth noting that the doubly robust estimator proposed in this paper appears to be completely new, and different from those of Miao and Tchetgen Tchetgen (2016), Li et al. (2023), and Miao et al. (2023), and therefore may also be of use in shadow variable applications. Likewise, the doubly robust estimators proposed in the latter works can equally be applied to the current COCA setting as an alternative inferential approach.
Additionally, as mentioned in the previous Section, the key assumption that conditioning on the treatment-free potential outcome would in principle make the NCO or outcome proxy irrelevant to the treatment mechanism is ultimately untestable, and may in certain settings not hold exactly. This would be the case, for example, if the NCO were explicitly used in assigning the treatment, in which case the assumption would be violated. In order to address such an eventuality, the analyst might consider several candidate proxies/NCOs when available, and may even perform an over-identification test, by inspecting the extent to which the estimated causal effect depends on the choice of proxy. Alternatively, a sensitivity analysis might also be performed to evaluate the potential impact of a hypothesized departure from the assumption. In the context of the Zika virus application, an over-identification test and a sensitivity analysis were carried out as illustrative examples, with the corresponding results and discussion provided in the Supplementary Material A.8 and A.9.
Finally, as previously mentioned in the Introduction, COCA offers an alternative approach to proximal causal inference for debiasing observational estimates of the ETT by leveraging negative controls or valid confounding proxies. A key difference highlighted earlier between these two frameworks is that COCA relies on a single valid NCO which directly proxies the treatment-free potential outcome, while proximal causal inference requires both valid NCO and negative control treatment variables that proxy an underlying unmeasured confounder. Importantly, COCA takes advantage of the fact that the treatment-free potential outcome is observed in the untreated, while in proximal causal inference, the unmeasured confounders for which proxies are available are themselves never observed, arguably a more challenging identification task. Despite the practical advantage of needing one rather than two proxies, it is important to note that though COCA identifies the ETT, it fails to nonparametrically identify the population average treatment effect (ATE) without an additional assumption. In contrast, proximal causal inference provides nonparametric identification of both causal parameters, and thus can be interpreted as providing richer identification opportunities. A key reason for this difference in the scope of identification is that in the current paper, we have emphasized an interpretation of the NCO as a proxy for the treatment-free potential outcome but not for the potential outcome under treatment; identification of the latter counterfactual mean among the untreated would require that the potential outcome under treatment likewise shields the treatment from an appropriate proxy. Two potential strategies to recover COCA identification of the population ATE might be to either (i) invoke a rank-preservation assumption, which, if appropriate, would imply that the ETT and the ATE are equal (this is the assumption made in Tchetgen Tchetgen (2013)); or (ii) identify a second NCO which is a valid proxy for the counterfactual outcome under treatment. The second condition would be needed if the potential outcome under treatment can also be viewed as a hidden confounder. By a symmetry argument, one can show that (ii) would in fact provide identification of the average counterfactual outcome under treatment for the untreated. A weighted average of both counterfactual means would then provide identification of the ATE. Details are not provided, but can easily be deduced from the presentation.
Acknowledgment
The authors would like to thank James Robins, Thomas Richardson, and Ilya Shpitser for helpful discussions.
Data Availability
The data and the analysis R code are accessible on the GitHub repository located at http://github.com/qkrcks0218/SingleProxyControl.
Supplementary Material
Appendix A Details of the Main Paper
A.1 Details of the Latent Variable Model (1)
Under these assumptions, we establish that

$W \not\perp Y^{(0)} \mid X,$

justifying Assumption 2-(ii), and

$A \perp W \mid (Y^{(0)}, X),$

justifying Assumption 2-(iii).
It is instructive to consider a data generating mechanism that can be formulated in a manner analogous to the changes-in-changes model (Athey and Imbens, 2006). To this end, we now consider a special case of model (1):

(SPC-a) $Y^{(0)} = h(U, X)$, where $h(u, x)$ is strictly increasing in $u$ for all $x$;
(SPC-b) $W = g(V, X)$;
(SPC-c) $U \not\perp V \mid X$ and $A \perp V \mid (U, X)$.
Figure 4 provides a graphical representation compatible with expressions (SPC-a)-(SPC-c).
Under these assumptions, we establish that

$W \not\perp Y^{(0)} \mid X \qquad \text{and} \qquad A \perp W \mid (Y^{(0)}, X),$

as required by Assumption 2.
Models (SPC-a)-(SPC-c) share similarity with a changes-in-changes model in the panel data setting (Athey and Imbens, 2006). Specifically, when a pre-treatment outcome is viewed as $W$, the changes-in-changes model in the panel data setting is represented as follows:

(CiC-a) $Y^{(0)} = h(U, X)$, where $h(u, x)$ is strictly increasing in $u$ for all $x$;
(CiC-b) $W = \tilde g(V, X)$, where $\tilde g(v, x)$ is strictly increasing in $v$ for all $x$;
(CiC-c) $U$ and $V$ have the identical distribution given $(A, X)$.
Expression (SPC-a) means that the treatment-free outcome is a monotonic transformation of an unobserved variable $U$. We remark that the same model for $Y^{(0)}$ is used in the changes-in-changes model (CiC-a). Expression (SPC-b) means that the negative control outcome is a transformation of an unobserved variable $V$. It is important to highlight that in model (SPC-b), the function $g$ is not necessarily monotonic in $v$, unlike (CiC-b). Consequently, (SPC-b) is strictly weaker than (CiC-b). Lastly, expressions (SPC-c) and (CiC-c) share a common goal of restricting the distributions of the unmeasured confounders, but have significant differences. Specifically, (CiC-c) states that $U$ and $V$ have the identical distribution given $(A, X)$ and does not place any restriction on the correlation between $U$ and $V$. On the other hand, (SPC-c) states that (i) $U$ and $V$ must be correlated given $X$ and (ii) $A$ is conditionally independent of $V$ given $(U, X)$, but it does not necessitate that $U$ and $V$ have the identical distribution. Consequently, $U$ and $V$ may have different supports under expression (SPC-c), a condition that does not apply to expression (CiC-c). Lastly, we note that both (SPC-c) and (CiC-c) hold if $U = V$ almost surely, yet perfect correlation between the two unmeasured confounders may not be reasonable.
A.2 Standard Error of the COCA Estimator in Tchetgen Tchetgen (2013)
We provide details on the standard error of the COCA estimator obtained from the ordinary least squares (OLS) estimators of the regression model $E(W \mid Y, A, X) = \gamma_0 + \gamma_1 Y + \gamma_2 A + \gamma_3^\top X$. It is well known that the variance estimator of the OLS estimator $\hat\gamma$ is given as

$\widehat{\mathrm{Var}}(\hat\gamma) = \Big(\sum_{i=1}^{N} D_i D_i^\top\Big)^{-1} \Big(\sum_{i=1}^{N} \hat\epsilon_i^2 D_i D_i^\top\Big) \Big(\sum_{i=1}^{N} D_i D_i^\top\Big)^{-1}, \qquad D_i = (1, Y_i, A_i, X_i^\top)^\top, \qquad \hat\epsilon_i = W_i - D_i^\top \hat\gamma,$

where $N$ is the number of observed units, indexed by subscript $i$. Let $\hat\psi = -\hat\gamma_2/\hat\gamma_1$. Then, the first-order Taylor expansion of $\hat\psi$ around $(\gamma_1, \gamma_2)$ is

$\hat\psi \simeq \psi + \frac{\gamma_2}{\gamma_1^2}(\hat\gamma_1 - \gamma_1) - \frac{1}{\gamma_1}(\hat\gamma_2 - \gamma_2).$

Therefore, we find

$\mathrm{Var}(\hat\psi) \simeq \nabla^\top\, \mathrm{Var}(\hat\gamma)\, \nabla, \qquad \nabla = \Big(0,\; \frac{\gamma_2}{\gamma_1^2},\; -\frac{1}{\gamma_1},\; 0^\top\Big)^\top.$

Consequently, the standard error estimator of $\hat\psi$ is given as $\{\hat\nabla^\top\, \widehat{\mathrm{Var}}(\hat\gamma)\, \hat\nabla\}^{1/2}$, where $\hat\nabla$ is the plug-in estimate of $\nabla$.
A.3 A Graphical Illustration of Result 1
Recall that Result 1 is given by

(4) $\sum_y \eta(y, X)\, f(y \mid A = 0, W, X) = \dfrac{\Pr(A = 1 \mid W, X)}{\Pr(A = 0 \mid W, X)},$

where $f(y \mid A = 0, W, X)$ is the conditional law of $Y$ given $(A = 0, W, X)$ evaluated at $y$. To illustrate, we consider a simple data generating process, the full specification of which is given in the replication code.
Under this data generating process, the odds function relating $A$ and $(Y^{(0)}, X)$ and that relating $A$ and $(W, X)$ are available in closed form in terms of the density of the unmeasured confounder. We then generate observations from the data generating process, and we draw a scatterplot of the two odds functions in Figure 5.

[Figure 5: A graphical illustration of Result 1 under the simulated data generating process.]
A.4 Details of the Construction of Parametric Estimators via GMM
In this Section, we provide details of the generalized method of moments (GMM, Hansen (1982)), which is implemented in many publicly available software programs, such as the gmm package in R (Chaussé, 2010). Before we present the details, we introduce additional notation. Let $N$ be the number of observed units, indexed by subscript $i$. Let $O_i = (W_i, X_i, A_i, Y_i)$ be the observed data for the $i$th unit. Let $\mathbb{P}_N\{f(O)\}$ denote the average of a function $f$ across the $N$ units, i.e., $\mathbb{P}_N\{f(O)\} = N^{-1}\sum_{i=1}^{N} f(O_i)$. For random variables $Z_N$ and $Z$, let $Z_N \xrightarrow{D} Z$ denote that $Z_N$ converges to $Z$ in distribution.
Let $\psi_1 = E(Y \mid A = 1)$ and $\psi_0 = E\{Y^{(0)} \mid A = 1\}$. The additive and multiplicative average causal effects of treatment on the treated are represented as $\psi_1 - \psi_0$ and $\psi_1/\psi_0$, respectively. Let $\theta$ be a vector of parameters of interest and $g(O; \theta)$ be a vector-valued function that is restricted by the mean-zero moment restriction, i.e., $E\{g(O; \theta^*)\} = 0$, where $\theta^*$ is the unique parameter that achieves the mean-zero moment restriction.
A.4.1 A Moment Restriction based on the EPS
The first approach entails a priori specifying a parametric logistic regression model for the EPS, say:

$\Pr(A = 1 \mid Y^{(0)}, X) = \mathrm{expit}\{\alpha^\top s(Y^{(0)}, X)\},$

where $s(y, x)$ is a user-specified sufficient statistic for the log-odds ratio association between $A$ and $(Y^{(0)}, X)$. Options for specifying $s$ include both more parsimonious specifications, e.g., $s(y, x) = (1, y, x^\top)^\top$, as well as more flexible specifications, say polynomials or spline basis functions with knots placed at user-specified percentiles of the observed data. Alternative nonparametric smoothing techniques, e.g., nearest neighbor, kernel smoothing, splines or wavelets, might also be considered. We assume that $\eta(y, x) = \exp\{\alpha^{*\top} s(y, x)\}$ for an unknown unique value $\alpha^*$.
While such logistic regression model specifications for the EPS might look familiar, estimation and inference about its unknown parameter via maximum likelihood is complicated by the fact that $Y^{(0)}$ is only observed among untreated units with $A = 0$. Instead, to find an empirical estimate of $\alpha^*$, we observe the following property of $\eta$ and $W$: the identity is trivial from Result 1, the law of iterated expectations, and some straightforward algebra:

$E\{(1 - A)\,\eta(Y, X; \alpha^*) \mid W, X\} = \Pr(A = 1 \mid W, X)$

and

$E(A \mid W, X) = \Pr(A = 1 \mid W, X).$

Consequently, one can use the following moment restriction to estimate $\alpha^*$:

(11) $E\big[\{(1 - A)\,\eta(Y, X; \alpha^*) - A\}\, g(W, X)\big] = 0$ for any vector-valued function $g(W, X)$.

We need to choose $g$ so that $\dim(g) \geq \dim(\alpha)$, which is required for (11) to admit a unique solution $\alpha^*$.
A.4.2 A Moment Restriction based on the COCA Confounding Bridge Function
We consider a parametric model $b(W, X; \beta) = \beta^\top t(W, X)$ for a user-specified function $t$, assuming that $b(W, X) = b(W, X; \beta^*)$ for an unknown unique value $\beta^*$. Similar to $s$ in the previous Section, $t$ can likewise account for nonlinearities by specifying specific basis functions, such as polynomials, splines, or wavelets. We then note that a standard least-squares regression of $Y$ on $t(W, X)$ among the untreated would generally fail to recover $\beta^*$ even if the model is correctly specified. In other words, under our assumptions it will generally be the case that $E(Y \mid W, A = 0, X) \neq b(W, X)$, as the latter equality would in fact imply that $E\{Y^{(0)} \mid W, X\} = b(W, X)$, which does not follow from our assumptions. A correct approach in fact is based on the following consequence of our assumptions:

(12) $E\big[(1 - A)\{Y - b(W, X; \beta^*)\}\, q(Y, X)\big] = 0$ for any vector-valued function $q(Y, X)$.

Again, we need to choose $q$ so that $\dim(q) \geq \dim(\beta)$ to admit a unique $\beta^*$. If only one NCO is available and $t$ is chosen as $t(W, X) = (1, W, X^\top)^\top$, $\beta^*$ is represented as

$\beta^* = \big[E\{(1 - A)\, q(Y, X)\, t(W, X)^\top\}\big]^{-1} E\{(1 - A)\, q(Y, X)\, Y\},$

assuming $E\{(1 - A)\, q(Y, X)\, t(W, X)^\top\}$ is invertible.
A.4.3 Construction of the GMM Estimators
Based on the moment restrictions (11) and (12), we consider the following three moment functions and the corresponding parameters $\theta$:

- (Extended Propensity Score)
(13) $g_{\mathrm{eps}}(O; \theta) = \begin{pmatrix} \{(1 - A)\,\eta(Y, X; \alpha) - A\}\, g(W, X) \\ A(Y - \psi_1) \\ (1 - A)\,\eta(Y, X; \alpha)\, Y - A\psi_0 \end{pmatrix}$, where $\theta = (\alpha^\top, \psi_1, \psi_0)^\top$.

- (COCA Confounding Bridge Function)
(14) $g_{\mathrm{ocb}}(O; \theta) = \begin{pmatrix} (1 - A)\{Y - b(W, X; \beta)\}\, q(Y, X) \\ A(Y - \psi_1) \\ A\{b(W, X; \beta) - \psi_0\} \end{pmatrix}$, where $\theta = (\beta^\top, \psi_1, \psi_0)^\top$.

- (Doubly Robust)
(15) $g_{\mathrm{dr}}(O; \theta) = \begin{pmatrix} \{(1 - A)\,\eta(Y, X; \alpha) - A\}\, g(W, X) \\ (1 - A)\{Y - b(W, X; \beta)\}\, q(Y, X) \\ A(Y - \psi_1) \\ A\, b(W, X; \beta) + (1 - A)\,\eta(Y, X; \alpha)\{Y - b(W, X; \beta)\} - A\psi_0 \end{pmatrix}$, where $\theta = (\alpha^\top, \beta^\top, \psi_1, \psi_0)^\top$.
The unique parameter $\theta^*$ can be represented as the minimizer of the weighted squared norm of $E\{g(O; \theta)\}$. Specifically, with a chosen weighting matrix $\Omega$ (which can depend on the observed data), $\theta^*$ is represented as

(16) $\theta^* = \arg\min_\theta\, E\{g(O; \theta)\}^\top\, \Omega\, E\{g(O; \theta)\}.$
From the law of large numbers, we find $\mathbb{P}_N\{g(O; \theta)\} \to E\{g(O; \theta)\}$. In Section B.1 of the Appendix, we show that $\theta^*$ recovers the true values of $(\psi_1, \psi_0)$ for each estimation strategy if
- (Extended Propensity Score) the EPS is correctly specified, i.e.,
(17) $\Pr(A = 1 \mid Y^{(0)}, X) = \mathrm{expit}\{\alpha^{*\top} s(Y^{(0)}, X)\};$

- (COCA Confounding Bridge Function) the COCA confounding bridge function is correctly specified, i.e.,
(18) $b(W, X) = \beta^{*\top} t(W, X);$

- (Doubly Robust) either (17) or (18) holds.
For efficiency, $\Omega$ is chosen as $\Omega^* = [\mathrm{Var}\{g(O; \theta^*)\}]^{-1}$, the inverse of the variance matrix of $g(O; \theta^*)$.
The GMM estimator is obtained from the empirical analog of the minimization task in (16). Specifically, we get

(19) $\hat\theta = \arg\min_\theta\, \mathbb{P}_N\{g(O; \theta)\}^\top\, \widehat\Omega\, \mathbb{P}_N\{g(O; \theta)\},$

where $\widehat\Omega$ is often chosen as a matrix that is consistent for $\Omega^*$; see equation (20) below for the exact form.
Under regularity conditions (see Newey and McFadden (1994) for details), the GMM estimator is consistent and asymptotically normal (CAN) as follows:

$\sqrt{N}(\hat\theta - \theta^*) \xrightarrow{D} N(0, \Sigma), \qquad \Sigma = (G^\top \Omega G)^{-1} G^\top \Omega\, \mathrm{Var}\{g(O; \theta^*)\}\, \Omega\, G\, (G^\top \Omega G)^{-1}, \qquad G = E\Big\{\frac{\partial g(O; \theta^*)}{\partial \theta^\top}\Big\}.$

In particular, $\Sigma$ reduces to $(G^\top \Omega^* G)^{-1}$ if $\Omega = \Omega^*$, which is the most efficient in the class of all GMM estimators. Therefore, we obtain the consistency and asymptotic normality of the GMM-based estimator of the ETT.
The variance estimator of $\hat\theta$ can be obtained from the empirical analog, i.e., $\widehat\Sigma = (\widehat G^\top \widehat\Omega \widehat G)^{-1} \widehat G^\top \widehat\Omega\, \widehat V\, \widehat\Omega\, \widehat G\, (\widehat G^\top \widehat\Omega \widehat G)^{-1}$ where

(20) $\widehat G = \mathbb{P}_N\Big\{\frac{\partial g(O; \hat\theta)}{\partial \theta^\top}\Big\}, \qquad \widehat V = \mathbb{P}_N\{g(O; \hat\theta)\, g(O; \hat\theta)^\top\}.$

If we choose $\widehat\Omega = \widehat V^{-1}$, $\widehat\Sigma$ reduces to $(\widehat G^\top \widehat V^{-1} \widehat G)^{-1}$.
Since $\Omega^*$ depends on the parameter $\theta^*$, it may be difficult to obtain the GMM estimator by directly working on (19). Therefore, a popular procedure is the following two-step approach:

(1) Compute the preliminary GMM estimator $\hat\theta_{\mathrm{pre}}$ from (19) where the weight matrix is chosen as the identity matrix or another positive definite fixed matrix.

(2) Compute the GMM estimator from (19) where the weight matrix is chosen as $\widehat\Omega = \big[\mathbb{P}_N\{g(O; \hat\theta_{\mathrm{pre}})\, g(O; \hat\theta_{\mathrm{pre}})^\top\}\big]^{-1}$.

In the second step, $\widehat\Omega$ is consistent for $\Omega^*$ because $\hat\theta_{\mathrm{pre}}$ is consistent (even though it is inefficient). One can repeat the iterative procedure multiple times until convergence criteria are satisfied.
A.4.4 Local Efficiency of the Doubly-robust GMM Estimator
Recall that $\Omega^*$ is defined as

$\Omega^* = \big[\mathrm{Var}\{g_{\mathrm{dr}}(O; \theta^*)\}\big]^{-1}.$

We show that the doubly robust GMM estimator is semiparametric locally efficient in $\mathcal{M}$ at the intersection submodel $\mathcal{M}_{\mathrm{int}}$ where both the EPS and the COCA confounding bridge function are correctly specified, i.e., (17) and (18) both hold. Moreover, we choose $g$ and $q$ with $\dim(g) = \dim(\alpha)$ and $\dim(q) = \dim(\beta)$. Then, the moment equation based on (15) is just identified, i.e., $\dim(g_{\mathrm{dr}}) = \dim(\theta)$. Therefore, under regularity conditions (see Newey and McFadden (1994) for details), we obtain
$\sqrt{N}(\hat\theta - \theta^*) \xrightarrow{D} N\big(0,\; G^{-1}\, \mathrm{Var}\{g_{\mathrm{dr}}(O; \theta^*)\}\, G^{-\top}\big), \qquad G = E\Big\{\frac{\partial g_{\mathrm{dr}}(O; \theta^*)}{\partial \theta^\top}\Big\},$

where the contribution of estimating $(\alpha^*, \beta^*)$ to the influence function of $\hat\psi_0$ is governed by the derivative terms

(g1) $E\Big[(1 - A)\, \dfrac{\partial \eta(Y, X; \alpha^*)}{\partial \alpha}\, \{Y - b(W, X; \beta^*)\}\Big]$

and

(g2) $E\Big[\{A - (1 - A)\,\eta(Y, X; \alpha^*)\}\, \dfrac{\partial b(W, X; \beta^*)}{\partial \beta}\Big].$

Here, (g1) and (g2) are zero at the intersection submodel: (g1) vanishes because $E\{Y - b(W, X; \beta^*) \mid Y, A = 0, X\} = 0$ under (18), and (g2) vanishes because the moment restriction (11) holds with $g(W, X)$ replaced by $\partial b(W, X; \beta^*)/\partial \beta$ under (17). Consequently, the influence function of $\hat\psi_0$ reduces to

$\frac{A\, b(W, X; \beta^*) + (1 - A)\,\eta(Y, X; \alpha^*)\{Y - b(W, X; \beta^*)\} - A\psi_0}{\Pr(A = 1)}.$
Therefore, the estimator achieves the semiparametric local efficiency bound for the ETT under $\mathcal{M}$ at the submodel $\mathcal{M}^\dagger$ so long as the posited parametric models belong to the intersection model $\mathcal{M}_{\mathrm{int}}$.
A.4.5 Regularized GMM Estimators
Finding the optimal solution in (19) can be quite challenging when dealing with finite samples, especially if the objective function is not convex. To mitigate the issue, one may regularize (19) and solve a penalized version of GMM, e.g.,

(21) $\hat\theta_{\mathrm{pen}} = \arg\min_\theta\, \Big[\mathbb{P}_N\{g(O; \theta)\}^\top\, \widehat\Omega\, \mathbb{P}_N\{g(O; \theta)\} + \lambda\, \mathrm{pen}(\theta)\Big],$

where $\mathrm{pen}(\theta)$ is a regularization term and $\lambda$ is a regularization parameter, which is chosen from cross-validation; see Algorithm 1 below for details.
A.5 Sufficient Conditions for the Existence and Uniqueness of the COCA Confounding Bridge Function
In this Section, we discuss sufficient conditions for the existence and uniqueness of the solutions to the integral equations in equations (4) and (5):

$\sum_y \eta(y, X)\, f(y \mid A = 0, W, X) = \frac{\Pr(A = 1 \mid W, X)}{\Pr(A = 0 \mid W, X)} \qquad \text{and} \qquad E\{b(W, X) \mid Y = y, A = 0, X = x\} = y.$
Since both equations share a similar structure, our focus here is primarily on the case of the COCA bridge function $b$.
A.5.1 Conditions for Existence
We begin by providing sufficient conditions for the existence of the COCA confounding bridge function $b$. In brief, we follow the approach in Miao et al. (2018). The proof relies on Theorem 15.18 of Kress (2014), which is stated below for completeness.
Theorem 15.18. (Kress, 2014)
Let $T: \mathcal{H}_1 \to \mathcal{H}_2$ be a compact operator with singular system $\{(\lambda_n, \varphi_n, g_n)\}_{n=1}^\infty$. The integral equation of the first kind $T\varphi = f$ is solvable if and only if (i) $f \in \mathcal{N}(T^*)^\perp$, where $T^*$ is the adjoint of $T$ and $\mathcal{N}$ denotes the null space, and (ii) $\sum_{n=1}^\infty \lambda_n^{-2}\, |\langle f, g_n \rangle|^2 < \infty$.
To apply the Theorem, we introduce some additional notation. For a fixed $x$, let $L_2\{F(w \mid A = 0, x)\}$ and $L_2\{F(y \mid A = 0, x)\}$ be the spaces of square-integrable functions of $w$ and $y$, respectively, which are equipped with the inner products

$\langle h_1, h_2 \rangle = E\{h_1(W)\, h_2(W) \mid A = 0, X = x\} \qquad \text{and} \qquad \langle g_1, g_2 \rangle = E\{g_1(Y)\, g_2(Y) \mid A = 0, X = x\}.$

Let $T_x$ be the conditional expectation operator of functions of $W$ given $Y$, i.e.,

$(T_x h)(y) = E\{h(W) \mid Y = y, A = 0, X = x\}.$

Then, the COCA confounding bridge function evaluated at $x$, i.e., $b_x(\cdot) = b(\cdot, x)$, solves $T_x b_x = f$ with $f(y) = y$, i.e.,

$E\{b_x(W) \mid Y = y, A = 0, X = x\} = y.$
Now, we assume the following conditions for each $x$:

- (Bridge-1) $\displaystyle\int\!\!\int f(w \mid y, A = 0, x)\, f(y \mid w, A = 0, x)\, dw\, dy < \infty$;
- (Bridge-2) For $g \in L_2\{F(y \mid A = 0, x)\}$, $E\{g(Y) \mid W, A = 0, X = x\} = 0$ almost surely implies $g(Y) = 0$ almost surely;
- (Bridge-3) $E(Y^2 \mid A = 0, X = x) < \infty$;
- (Bridge-4) Let the singular system of $T_x$ be $\{(\lambda_{x,n}, \varphi_{x,n}, g_{x,n})\}_{n=1}^\infty$. Then, we have $\sum_{n=1}^\infty \lambda_{x,n}^{-2}\, |\langle y, g_{x,n} \rangle|^2 < \infty$.
First, we show that $T_x$ is a compact operator under Condition (Bridge-1). Let $T_x^*$ be the conditional expectation operator of functions of $Y$ given $W$, i.e.,

$(T_x^* g)(w) = E\{g(Y) \mid W = w, A = 0, X = x\}.$

Then, $T_x$ and $T_x^*$ are the adjoint operators of each other, as follows:

$\langle T_x h, g \rangle = E\{h(W)\, g(Y) \mid A = 0, X = x\} = \langle h, T_x^* g \rangle.$

Additionally, as shown in page 5659 of Carrasco et al. (2007), $T_x$ and $T_x^*$ are compact operators under Condition (Bridge-1). Moreover, by Theorem 15.16 of Kress (2014), there exists a singular value decomposition of $T_x$, with singular system $\{(\lambda_{x,n}, \varphi_{x,n}, g_{x,n})\}_{n=1}^\infty$.

Second, we show that $\mathcal{N}(T_x^*) = \{0\}$, which suffices to show $f \in \mathcal{N}(T_x^*)^\perp$ for $f(y) = y$. Under Condition (Bridge-2), we have

$g \in \mathcal{N}(T_x^*) \;\Rightarrow\; E\{g(Y) \mid W, A = 0, X = x\} = 0 \text{ almost surely} \;\Rightarrow\; g(Y) = 0 \text{ almost surely},$

where the first arrow is from the definition of the null space $\mathcal{N}(T_x^*)$, and the second arrow is from Condition (Bridge-2). Therefore, any $g \in \mathcal{N}(T_x^*)$ must satisfy $g(Y) = 0$ almost surely, i.e., $\mathcal{N}(T_x^*) = \{0\}$.

Third, from the definition of $L_2\{F(y \mid A = 0, x)\}$, $f(y) = y$ belongs to $L_2\{F(y \mid A = 0, x)\}$ under Condition (Bridge-3).

Combining the three results, we establish that $f(y) = y$ satisfies the first condition of Theorem 15.18 of Kress (2014). The second condition of the Theorem is exactly the same as Condition (Bridge-4). Therefore, we establish that the Fredholm integral equation of the first kind is solvable under Conditions (Bridge-1)-(Bridge-4). Consequently, for each $x$, there exists a function $b_x$ satisfying

$E\{b_x(W) \mid Y = y, A = 0, X = x\} = y.$

We define $b(w, x) = b_x(w)$ as a function satisfying the above display for each $x$, which then solves (5).
A.5.2 Conditions for Uniqueness
The COCA confounding bridge function is unique under the following completeness condition:
-
(Completeness): Suppose that for any . Then, almost surely.
The proof is given below. Suppose that and satisfy the bridge function condition (5). Then, we find
From (Completeness), almost surely for any . This implies almost surely. Therefore, the COCA confounding bridge function is unique.
A.6 Details of the Minimax Estimation
A.6.1 Closed-form Representations of the Minimax Estimators
For notational brevity, let $m_\eta(O; h) = (1 - A)\, h(Y, X) - A$ and $m_b(O; h) = (1 - A)\{h(W, X) - Y\}$. Recall that $\eta$ and $b$ satisfy

$E\{m_\eta(O; \eta) \mid W, X\} = 0 \qquad \text{and} \qquad E\{m_b(O; b) \mid Y, X\} = 0.$

Following Ghassami et al. (2022), minimax estimators of $\eta$ and $b$ are given by solutions of the representation (22) below with appropriate choices of its components. Note that all true and estimated functions are defined in terms of the following representation:

(22) $\hat h = \arg\min_{h \in \mathcal{H}_1}\, \max_{f \in \mathcal{H}_2}\, \mathbb{P}_n\big[m(O; h)\, f(V) - f^2(V)\big] + \lambda_1 \|h\|_{\mathcal{H}_1}^2 - \lambda_2 \|f\|_{\mathcal{H}_2}^2,$

where $(m, V, \mathcal{H}_1, \mathcal{H}_2)$ are appropriately chosen. Specifically, for the weight function $\eta$, we choose $m = m_\eta$, $V = (W, X)$, $\mathcal{H}_1$ an RKHS of functions of $(Y, X)$, and $\mathcal{H}_2$ an RKHS of functions of $(W, X)$; for the COCA confounding bridge function $b$, we choose $m = m_b$, $V = (Y, X)$, $\mathcal{H}_1$ an RKHS of functions of $(W, X)$, and $\mathcal{H}_2$ an RKHS of functions of $(Y, X)$.
A closed-form representation of the solution to (22) is available from the representer theorem (Kimeldorf and Wahba, 1970; Schölkopf et al., 2001). Specifically, we have $\hat h(\cdot) = \sum_{i=1}^n \hat\gamma_i\, K(v_i, \cdot)$, where the coefficient vector $\hat\gamma$ solves a regularized linear system involving the Gram matrices of $\mathcal{H}_1$ and $\mathcal{H}_2$; the exact form is given in Ghassami et al. (2022). Therefore, minimax estimators of the nuisance functions are readily available by appropriately choosing $(m, V, \mathcal{H}_1, \mathcal{H}_2)$.
A.6.2 Cross-Validation
We use cross-validation to select the hyperparameters that minimize an empirical risk evaluated over a validation set $\mathcal{V}$. For the empirical risk, we can either use (i) the projected risk (Dikkala et al., 2020; Ghassami et al., 2022), or (ii) the V-statistic (Mastouri et al., 2021). To motivate (i), we remark that Dikkala et al. (2020) and Ghassami et al. (2022) defined the population-level projected risk of (22) as $E\big([E\{m(O; h) \mid V\}]^2\big)$, and showed that an empirical counterpart can be evaluated over a validation set by replacing the conditional expectation with an adversarial estimate over the RKHS.
To motivate the V-statistic, we begin by introducing Lemma 2 of Mastouri et al. (2021):

Lemma 2 (Mastouri et al., 2021): Suppose that

(23) $E\big[|m(O; h)|\, \|K(V, \cdot)\|_{\mathcal{H}}\big] < \infty.$

Then, we have

$\sup_{f \in \mathcal{H},\, \|f\|_{\mathcal{H}} \leq 1} \big[E\{m(O; h)\, f(V)\}\big]^2 = E\{m(O; h)\, m(O'; h)\, K(V, V')\},$

where $(O', V')$ is an independent copy of $(O, V)$. The result is also reported in other works, e.g., Theorem 3.3 of Muandet et al. (2020) and Lemma 1 of Zhang et al. (2023). The condition (23) implies that $m(O; h)\, K(V, \cdot)$ is Bochner integrable (Steinwart and Christmann, 2008, Definition A.5.20). One important property of Bochner integrability is that integration and a bounded linear operator can be interchanged. Therefore, we find

$\sup_{\|f\|_{\mathcal{H}} \leq 1} \big[E\{m(O; h)\, f(V)\}\big]^2 = \sup_{\|f\|_{\mathcal{H}} \leq 1} \big[E\{m(O; h)\, \langle K(V, \cdot), f \rangle_{\mathcal{H}}\}\big]^2 = \sup_{\|f\|_{\mathcal{H}} \leq 1} \big[\langle E\{m(O; h)\, K(V, \cdot)\}, f \rangle_{\mathcal{H}}\big]^2 = \big\|E\{m(O; h)\, K(V, \cdot)\}\big\|_{\mathcal{H}}^2 = E\{m(O; h)\, m(O'; h)\, K(V, V')\}.$

The first equality holds from the reproducing property $f(V) = \langle K(V, \cdot), f \rangle_{\mathcal{H}}$; the second from the Bochner integrability; the third from the Cauchy-Schwarz inequality, with the supremum attained at $f$ proportional to $E\{m(O; h)\, K(V, \cdot)\}$; and the last from the reproducing property $\langle K(V, \cdot), K(V', \cdot) \rangle_{\mathcal{H}} = K(V, V')$ together with the Bochner integrability. Using this risk function, one may use the following empirical risk evaluated over a validation set $\mathcal{V}$ to find hyperparameters:

(24) $\widehat{R}(h) = \frac{1}{|\mathcal{V}|^2} \sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{V}} m(O_i; h)\, m(O_j; h)\, K(V_i, V_j).$
Using these empirical risks, we may choose the hyperparameters by following Algorithm 2 below:
A.6.3 A Remedial Strategy to Address Widely Varying Nuisance Functions
We present a remedial strategy to address a widely varying $\eta$. First, following Section A.4.1, we may obtain a GMM estimator of $\eta$, denoted by $\hat\eta_{\mathrm{gmm}}$, using observations in the estimation fold. If the range of $\hat\eta_{\mathrm{gmm}}$ is excessively wide (e.g., the maximum is larger than ten times the minimum), we recommend estimating $r = \eta/\hat\eta_{\mathrm{gmm}}$ instead of $\eta$; note that $r$ stands for the ratio. If $\hat\eta_{\mathrm{gmm}}$ is a good estimator, $r$ would be close to 1, and thus, the minimax estimator of $r$ would vary less than the minimax estimator of $\eta$. Motivated by this phenomenon, we obtain a minimax estimator of $\eta$ as $\hat\eta = \hat r \cdot \hat\eta_{\mathrm{gmm}}$, where $\hat r$ is estimated from the minimax strategy (22) with $m(O; h) = (1 - A)\, h(Y, X)\, \hat\eta_{\mathrm{gmm}}(Y, X) - A$ and $V = (W, X)$.
A.6.4 Multiplier Bootstrap
One can obtain a standard error of the minimax estimator and confidence intervals for the ETT based on the multiplier bootstrap procedure; see Algorithm 3 for details.
A.6.5 Median Adjustment
Lastly, the cross-fitting estimator depends on a specific sample split, and thus, may produce outlying estimates if some split samples do not represent the entire data. To mitigate this issue, Chernozhukov et al. (2018) propose the so-called median adjustment from multiple cross-fitting estimates, which is implemented as follows. First, let $\hat\psi_s$ be the $s$th cross-fitting estimate with a variance estimate $\hat\sigma_s^2$, for $s = 1, \ldots, S$. Then, the median-adjusted cross-fitting estimate and its variance estimate are defined as follows:

(25) $\hat\psi_{\mathrm{med}} = \mathrm{median}\{\hat\psi_s : s = 1, \ldots, S\}, \qquad \hat\sigma_{\mathrm{med}}^2 = \mathrm{median}\big\{\hat\sigma_s^2 + (\hat\psi_s - \hat\psi_{\mathrm{med}})^2 : s = 1, \ldots, S\big\}.$
These estimates are more robust to the particular realization of sample partition.
A.7 Details of the Specifications for the Zika Virus Application
In this Appendix, we present a detailed explanation of the Zika virus application. We remark that the replication code is available at https://github.com/qkrcks0218/SingleProxyControl, and the code implements the estimation process outlined below.
Recall that the variables are defined as follows:

- $N = 673$: number of municipalities, indexed by $i$;
- $A_i = 1$: municipality $i$ belongs to Pernambuco (treated);
- $A_i = 0$: municipality $i$ belongs to Rio Grande do Sul (control);
- $Y_i$: birth rate of municipality $i$ in 2016;
- $W_{2,i}$: birth rate of municipality $i$ in 2014;
- $W_{1,i}$: birth rate of municipality $i$ in 2013;
- $X_i$: a vector of municipality $i$'s log population, log population density, and proportion of females in 2014.
A.7.1 GMM Estimators
We first provide details of GMM estimators. We consider three specifications according to how NCOs are used:
- (NCO 1) We use $W_1$, the birth rate in 2013, as an NCO.
- (NCO 2) We use $W_2$, the birth rate in 2014, as an NCO.
- (NCO 3) We use $(W_1, W_2)$, the birth rates in 2013 and 2014, as NCOs.
According to the specifications, we use the penalized GMM in (21) to estimate the ETT, where the nuisance functions and the basis functions in the moment functions (13), (14), and (15), i.e., the EPS statistic $s(y, x)$, the bridge basis $t(w, x)$, and the instrument functions $g(w, x)$ and $q(y, x)$, are specified separately for each of (NCO 1), (NCO 2), and (NCO 3); the exact specifications are provided in the replication code.
The penalization term in equation (21) is determined based on the specific moment function used. The regularization parameter $\lambda$ is chosen from cross-validation in Algorithm 1. Table 2 summarizes the choice of $\lambda$ for each moment function and NCO specification:

| Moment Function | NCO 1 | NCO 2 | NCO 3 |
|---|---|---|---|
| EPS | | | |
| COCA bridge function | | | |
| Doubly robust | | | |
A.7.2 Minimax Estimators
We next provide details of the minimax estimators. First, we employ the cross-fitting procedure with split samples. Second, the hyperparameters are chosen based on 5-fold cross-validation in Algorithm 2 using the V-statistic in (24) as a criterion. Third, we employ the remedial strategy in Section A.6.3, where $\hat\eta_{\mathrm{gmm}}$ is estimated from GMM using the specifications in Section A.7.1. Lastly, we implement the median adjustment in Section A.6.5 based on 500 cross-fitting estimates.
A.7.3 Summary of the Results
Table 3 summarizes the corresponding results. We find that the twelve COCA estimates are all negative, indicating that births per 1,000 persons were reduced in PE due to the Zika virus outbreak, an empirical finding aligned with the scientific hypothesis that Zika adversely impacts the birth rates of exposed populations. We remark that parametric estimators utilizing the COCA bridge function yield larger effect sizes compared to the other three COCA estimators. However, the estimates show minimal variability across different NCO specifications. Regardless of the estimator used, all the estimates support the conclusion that the Zika virus outbreak likely led to a decline in the birth rate in the affected regions of Brazil.
Table 3: ETT estimates, standard errors (SE), and 95% confidence intervals (CI) by NCO specification for the semiparametric estimator, the parametric EPS and COCA bridge function estimators, the doubly-robust estimator, and DiD under parallel trends.
A.8 Sensitivity Analysis
We provide details of the sensitivity analysis described in Section 6 of the main paper. To perform such a sensitivity analysis, one might consider the parametric approach of Section A.4, with the EPS model slightly modified by including the NCO in the EPS to encode a departure from the valid-NCO assumption. For example, we may consider:
where the sensitivity function is specified by the user and the pre-specified sensitivity parameter encodes a hypothetical violation of the valid-NCO assumption in the direction of that function. For instance, one might fix the sensitivity function and let the sensitivity parameter vary over a grid of values in a neighborhood of zero (zero corresponding to the valid-NCO assumption). For each hypothetical value of the sensitivity parameter, one would then re-estimate the EPS using the methods described in the paper, with the sensitivity term specified as an offset.
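A minimal sketch of this grid search, assuming for illustration a logistic EPS with a linear index in the covariates X and an offset term delta * W (all names, including estimate_ett, are hypothetical):

    import numpy as np
    import statsmodels.api as sm

    def sensitivity_curve(A, X, W, deltas, estimate_ett):
        # For each sensitivity-parameter value delta, fit a logistic EPS of the
        # hypothetical form P(A=1 | X, W) = expit(alpha0 + X'alpha + delta * W),
        # holding delta * W fixed as an offset, then re-estimate the ETT.
        # estimate_ett: user-supplied map from fitted propensity scores to an
        # ETT estimate (e.g., a weighting estimator).
        design = sm.add_constant(X)
        out = {}
        for d in deltas:
            fit = sm.GLM(A, design, family=sm.families.Binomial(),
                         offset=d * W).fit()
            pscore = fit.predict(design, offset=d * W)  # keep the offset in prediction
            out[d] = estimate_ett(pscore)
        return out

    # e.g., deltas = np.linspace(0.0, 3.0, 31); delta = 0 encodes a valid NCO.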
We implement a sensitivity analysis in the context of the Zika virus application. We focus on the GMM estimator based on the EPS model, with the nuisance functions and basis functions specified separately for each of (NCO 1), (NCO 2), and (NCO 3).
Therefore, using the sensitivity-analysis EPS with a specified sensitivity parameter value, we redefine the moment equation used in the GMM procedure as follows:
Using this moment function, we apply the penalized GMM in (21) to estimate the ETT, where the penalization term and the regularization parameter are chosen to match the values in Table 2.
Table 4 summarizes the sensitivity analysis results. We focus on the results for one NCO specification because the other two cases can be interpreted in a similar manner. First, at a sensitivity parameter of zero, we recover the EPS COCA estimates in Table 3. Second, the 95% pointwise confidence intervals for the effect estimate include the null effect once the sensitivity parameter equals 0.74. Third, the effect estimate becomes positive once the sensitivity parameter exceeds 0.99. Lastly, the lower bound of the 95% confidence interval for the crude estimate overlaps with the 95% pointwise confidence intervals for the effect estimate once the sensitivity parameter exceeds 4.19.
Table 4: Sensitivity analysis results by NCO specification.

NCO | Statistic | COCA | COCA (Null) | COCA (Positive) | COCA (Crude)
---|---|---|---|---|---
NCO 1 | Sensitivity parameter | 0.000 | 0.690 | 1.490 | 2.600
NCO 1 | Estimated coefficient | 2.677 | 1.750 | 0.325 | -0.484
NCO 1 | Estimate | -2.334 | -0.718 | 0.024 | 2.330
NCO 1 | 95% CI | (-3.050, -1.618) | (-1.444, 0.007) | (-0.379, 0.427) | (1.672, 2.989)
NCO 2 | Sensitivity parameter | 0.000 | 0.690 | 0.980 | 4.370
NCO 2 | Estimated coefficient | 3.060 | 1.047 | 0.273 | -4.243
NCO 2 | Estimate | -2.298 | -0.646 | 0.130 | 2.288
NCO 2 | 95% CI | (-3.261, -1.334) | (-1.303, 0.011) | (-0.444, 0.704) | (1.473, 3.103)
NCO 3 | Sensitivity parameter | 0.000 | 1.040 | 1.320 | 7.030
NCO 3 | Estimated coefficient | 2.629 | 0.829 | 0.165 | -4.611
NCO 3 | Estimate | -2.462 | -0.541 | 0.006 | 2.630
NCO 3 | 95% CI | (-3.357, -1.567) | (-1.083, 0.001) | (-0.457, 0.469) | (2.292, 2.969)
To interpret these results, consider the scenario where the baseline birth rates are used as NCOs. Note that, at the sensitivity parameter value in the last column of Table 4, the EPS COCA inference would become empirically consistent with the crude estimate. At this value, the estimated NCO coefficient changes sign relative to its value at a sensitivity parameter of zero. This suggests that it would take a substantial violation of our primary assumption for the crude estimate to have a causal interpretation, a possibility we do not believe credible. Likewise, our sensitivity analysis suggests that it would take a relatively large violation of our primary assumption for the COCA estimator to become consistent with the sharp null hypothesis of no causal effect; such a violation would indicate a strong common cause of the exposure and the baseline outcome, leading to a near doubling of the odds of exposure to Zika virus while not also confounding the outcome of interest, which we believe to be unlikely.
A.9 Over-identification Test
We formalize the over-identification test discussed in Section 6 of the paper. For simplicity, suppose that there are two potential NCOs and that the first is known a priori to be valid. One can obtain two semiparametric estimators of the ETT, constructed using either the first NCO alone or both NCOs; see Section 4 of the main paper or Appendix A.4 for details on how these estimators are constructed. From Result 6 and (20), the first estimator is consistent and asymptotically normal (CAN) for the ETT. Likewise, if the second NCO is also valid, the second estimator is CAN for the ETT by Result 6. However, if the second NCO is invalid, say it violates Assumption 2-(iii), the second estimator may no longer be CAN for the ETT. This implies the following result under the null hypothesis that both NCOs are valid:
A consistent estimator of the asymptotic variance of the difference between the two estimators can be obtained in two ways. First, if parametric estimators are used, it can be constructed from a consistent GMM variance estimator. Second, if semiparametric estimators are used, it can be chosen as the empirical variance of the difference between the two efficient influence functions, one associated with the first NCO only and the other associated with both NCOs. Therefore, a statistical test for evaluating the null hypothesis is given by
where the critical value is the corresponding percentile of the reference distribution.
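A minimal sketch of a z-type implementation of this comparison, under the assumption that the test standardizes the difference of the two estimates by its estimated standard error (function and variable names are hypothetical):

    import numpy as np
    from scipy.stats import norm

    def overid_test(theta1, theta2, var_diff, n, alpha=0.05):
        # theta1: ETT estimate using the a-priori-valid NCO only.
        # theta2: ETT estimate using both NCOs.
        # var_diff: consistent estimate of the asymptotic variance of the
        # difference (e.g., the empirical variance of the difference of the
        # two estimated influence functions).
        # Under the null that both NCOs are valid, the standardized
        # difference is asymptotically standard normal.
        z = (theta1 - theta2) / np.sqrt(var_diff / n)
        return z, abs(z) > norm.ppf(1 - alpha / 2)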
In the context of the Zika virus application, we conducted the over-identification test, and the results are summarized in Table 5. The findings from all four estimators indicate that there is insufficient statistical evidence to conclude that either NCO is invalid, provided the other is valid.
Table 5: Over-identification test results (estimates, SEs, and test statistics) for the semiparametric, parametric EPS, parametric COCA bridge function, and doubly-robust estimators.
Appendix B Proofs
B.1 Proof of (3)
Suppose that the EPS is correctly specified. Then,
The first equality is from the law of iterated expectations. The second equality is from the consistency assumption. The third equality holds when the EPS is correctly specified. The remaining identities hold from the law of iterated expectations. Similarly, we obtain
Consequently, we obtain
Therefore, the GMM estimator based on this moment function recovers the ETT if the EPS is correctly specified.
B.2 Proof of Result 1 and (4)
The proof is straightforward from the following algebra:
Equality (A) holds from the consistency assumption. Equality (B) holds from Assumption 2. The remaining identities are trivial.
B.3 Proof of Result 2
We have
Therefore, with appropriate choices of the two quantities, we obtain
B.4 Condition (5) Under Binary Variables
If the variable in question is binary, any function of it can be represented as a linear combination of the indicators of its two levels, with finite coefficients. Therefore, Condition (5) can be written as
(26)
(27)
Therefore, the system of equations admits a unique solution provided that the corresponding determinant is nonzero; in the binary case, Condition (5) is thus equivalent to this nonzero-determinant requirement.
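For concreteness, a minimal worked version of the key algebraic fact, with hypothetical notation $V \in \{0,1\}$ for the binary variable and $g$ for an arbitrary function of it:
\[
g(V) = g(1)\,\mathbf{1}(V=1) + g(0)\,\mathbf{1}(V=0),
\]
so a conditional moment restriction in $g(V)$ reduces to a two-equation linear system in the coefficients $(g(1), g(0))$, which admits a unique solution exactly when the determinant of the associated $2\times 2$ coefficient matrix is nonzero.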
B.5 Proof of Result 3 and (7)
Suppose that (5) and Assumption 2 hold. Then,
Equalities (A), (B), and (C) hold from Assumption 2, the consistency assumption, and (5), respectively.
Suppose that the COCA confounding bridge function is correctly specified. Then,
Therefore, the GMM estimator based on this moment function recovers the ETT if the COCA confounding bridge function is correctly specified.
B.6 Proof of Result 5
First, suppose that the EPS is correctly specified; then
The first equality holds from the law of iterated expectations. The second equality holds from Assumption 2 and the consistency assumption. The third equality holds from the correct specification of the EPS. The last equality is trivial.
Next, suppose that the COCA confounding bridge function is correctly specified; then
The first equality holds from the correct specification of the bridge function and the consistency assumption. The second equality holds from the law of iterated expectations. The third equality holds from Assumption 2. The remaining equalities are trivial. Therefore, the GMM estimator based on this moment function recovers the ETT if either the EPS or the COCA confounding bridge function is correctly specified.
B.7 Proof of Result 4
It is straightforward to verify the form of the efficient influence function for the first component of the ETT. Therefore, it suffices to find an (efficient) influence function of the treatment-free counterfactual mean. In the remainder of the proof, we establish that (i) the following expression is an influence function of the counterfactual mean under the model, and (ii) it is the efficient influence function at the stated submodel:
B.7.1 Proof of (i)
Consider a one-dimensional parametric submodel of the observed-data law, with density indexed by a scalar parameter such that the true density is recovered at the value zero. Let the score function be defined along the submodel and evaluated at zero, and let the expectation operator under the submodel be defined accordingly.
Let the COCA confounding bridge function along the submodel satisfy its defining restriction, and let the counterfactual mean be evaluated along the submodel, with both recovering their true values at zero. Taking the derivative of the above restrictions with respect to the submodel parameter and evaluating at zero, the score functions satisfy:
(28)
The pathwise derivative of the counterfactual mean is
which is evaluated at zero as follows:
We find that
Unmarked equalities are straightforward consequences of the law of iterated expectations. The marked equalities are established, respectively, from Assumption 2, from (28), from the property of the score function, and from the definition of the COCA confounding bridge function. As a consequence, we find
implying that the counterfactual mean is a pathwise differentiable parameter (Newey, 1990) with the influence function displayed above.
B.7.2 Proof of (ii)
It suffices to show that the influence function belongs to the tangent space of the model at the submodel. Note that the model imposes the restriction (28) on the score function. The restriction implies that the tangent space consists of the functions satisfying
(29) |
Under (Surjectivity), any candidate score satisfies the restriction (29) at the submodel. This implies that the influence function belongs to the tangent space at the submodel.
B.8 Proof of Result 6
In what follows, we adopt simplified notation. If we show that the estimator has the asymptotic representation
(30) |
then the estimator of the ETT has the asymptotic representation
(31) |
Therefore, the asymptotic normality result holds from the central limit theorem. Thus, we focus on showing that (30) holds.
Recall that the estimator is given by
(32) |
Therefore, we find that the left-hand side of (30) is
The second line holds from the law of large numbers together with the boundedness of the relevant quantities. The third line holds because term (AN) is asymptotically normal, which will be shown later.
Let the numerator of the uncentered influence function be denoted as follows:
Its expectation recovers the corresponding population quantity.
Let the empirical process be centered at the population expectation; similarly, define the empirical process centered at the expectation that treats the random functions estimated from the auxiliary sample as fixed. The empirical process of interest decomposes as
(33)
(34)
(35)
where
From the derivation below, we find that (34) and (35) are asymptotically negligible, indicating that (33) is asymptotically normal and, consequently, that term (AN) is asymptotically normal. Moreover, we establish (30), concluding the proof as follows:
Showing that (34) is asymptotically negligible
First, we introduce the following equality:
(36) |
We find
The equality holds from (36) with an appropriate choice of the function therein. Therefore, the term is upper bounded as
and
This concludes that
where the second inequality is from Assumption 5-(iii).
Showing that (35) is asymptotically negligible
Next, we establish that term (35) is asymptotically negligible. Since it is mean-zero, it suffices to show that its variance vanishes, i.e.,
The variance of (35) is
(39)
(42)
(46)
(55)
(56)
The first inequality holds from the Cauchy–Schwarz inequality. The second inequality is established in items (a) and (b) below. The last convergence rate is obtained from Assumption 5.
- (a) (Term-Q1): the first inequality holds from Assumption 5; the second holds from the Cauchy–Schwarz inequality.
- (b) (Term-Q2)–(Term-Q4): these terms are bounded analogously.
Consistency of the Variance Estimator
The proposed variance estimator is
where the relevant quantity is defined in (32). Therefore, it suffices to show that the variance estimator is consistent. Note that the difference between the variance estimator and its target is
(57) |
The third and fifth lines hold from the law of large numbers. Therefore, it suffices to show that (57) is also asymptotically negligible. From some algebra, we find that the term in (57) is
From Hölder's inequality, we find that the absolute value of (57) is upper bounded by
Hence, (57) is asymptotically negligible provided the remaining factor vanishes. From some algebra, we find
The first line holds by construction. The second line holds from the law of large numbers. The last line holds from (56) and the asymptotic normality of the estimator. This concludes the proof.
References
- Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
- Athey, S. and Imbens, G. W. (2006). Identification and inference in nonlinear difference-in-differences models. Econometrica, 74(2):431–497.
- Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4):962–973.
- Caniglia, E. C. and Murray, E. J. (2020). Difference-in-difference in the time of cholera: a gentle introduction for epidemiologists. Current Epidemiology Reports, 7(4):203–211.
- Card, D. and Krueger, A. B. (1994). Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. The American Economic Review, 84(4):772–793.
- Carrasco, M., Florens, J.-P., and Renault, E. (2007). Linear inverse problems in structural econometrics estimation based on spectral decomposition and regularization. In Heckman, J. J. and Leamer, E. E., editors, Handbook of Econometrics, volume 6, pages 5633–5751. Elsevier.
- Castro, M. C., Han, Q. C., Carvalho, L. R., Victora, C. G., and França, G. V. A. (2018). Implications of Zika virus and congenital Zika syndrome for the number of live births in Brazil. Proceedings of the National Academy of Sciences, 115(24):6177–6182.
- Chaussé, P. (2010). Computing generalized method of moments and generalized empirical likelihood with R. Journal of Statistical Software, 34(11):1–35.
- Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68.
- Cui, Y., Pu, H., Shi, X., Miao, W., and Tchetgen Tchetgen, E. (2023). Semiparametric proximal causal inference. Journal of the American Statistical Association, 0(0):1–12.
- Diaz-Quijano, F. A., Pelissari, D. M., and Chiavegatto Filho, A. D. P. (2018). Zika-associated microcephaly epidemic and birth rate reduction in Brazilian cities. American Journal of Public Health, 108(4):514–516.
- Dikkala, N., Lewis, G., Mackey, L., and Syrgkanis, V. (2020). Minimax estimation of conditional moment models. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS'20, Red Hook, NY, USA. Curran Associates Inc.
- Dukes, O., Shpitser, I., and Tchetgen Tchetgen, E. J. (2023). Proximal mediation analysis. Biometrika. asad015.
- d'Haultfoeuille, X. (2010). A new instrumental method for dealing with endogenous selection. Journal of Econometrics, 154(1):1–15.
- Ghassami, A., Ying, A., Shpitser, I., and Tchetgen Tchetgen, E. (2022). Minimax kernel machine learning for a class of doubly robust functionals with application to proximal causal inference. In Camps-Valls, G., Ruiz, F. J. R., and Valera, I., editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 7210–7239. PMLR.
- Gregianini, T. S., Ranieri, T., Favreto, C., Nunes, Z. M. A., Tumioto Giannini, G. L., Sanberg, N. D., da Rosa, M. T. M., and da Veiga, A. B. G. (2017). Emerging arboviruses in Rio Grande do Sul, Brazil: Chikungunya and Zika outbreaks, 2014–2016. Reviews in Medical Virology, 27(6):e1943.
- Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029–1054.
- Kimeldorf, G. S. and Wahba, G. (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Annals of Mathematical Statistics, 41(2):495–502.
- Kott, P. S. (2014). Calibration weighting when model and calibration variables can differ. Contributions to Sampling Statistics, pages 1–18.
- Kress, R. (2014). Linear Integral Equations. Springer, 3rd edition.
- Lechner, M. (2011). The estimation of causal effects by difference-in-difference methods. Foundations and Trends in Econometrics, 4(3):165–224.
- Li, W., Miao, W., and Tchetgen Tchetgen, E. (2023). Non-parametric inference about mean functionals of non-ignorable non-response data without identifying the joint distribution. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(3):913–935.
- Lipsitch, M., Tchetgen Tchetgen, E., and Cohen, T. (2010). Negative controls: A tool for detecting confounding and bias in observational studies. Epidemiology, 21(3):383.
- Lowe, R., Barcellos, C., Brasil, P., Cruz, O. G., Honório, N. A., Kuper, H., and Carvalho, M. S. (2018). The Zika virus epidemic in Brazil: From discovery to future implications. International Journal of Environmental Research and Public Health, 15(1):96.
- Lunceford, J. K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23(19):2937–2960.
- Mastouri, A., Zhu, Y., Gultchin, L., Korba, A., Silva, R., Kusner, M., Gretton, A., and Muandet, K. (2021). Proximal causal learning with kernels: Two-stage estimation and moment restriction. In Meila, M. and Zhang, T., editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 7512–7523. PMLR.
- Miao, W., Geng, Z., and Tchetgen Tchetgen, E. (2016). Identifying causal effects with proxy variables of an unmeasured confounder. Preprint arXiv:1609.08816.
- Miao, W., Geng, Z., and Tchetgen Tchetgen, E. J. (2018). Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika, 105(4):987–993.
- Miao, W., Liu, L., Li, Y., Tchetgen Tchetgen, E. J., and Geng, Z. (2023). Identification and semiparametric efficiency theory of nonignorable missing data with a shadow variable. ACM/IMS Journal of Data Science. In press.
- Miao, W. and Tchetgen Tchetgen, E. (2016). On varieties of doubly robust estimators under missingness not at random with a shadow variable. Biometrika, 103(2):475–482.
- Muandet, K., Jitkrittum, W., and Kübler, J. (2020). Kernel conditional moment test via maximum moment restriction. In Peters, J. and Sontag, D., editors, Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), volume 124 of Proceedings of Machine Learning Research, pages 41–50. PMLR.
- Newey, W. K. (1990). Semiparametric efficiency bounds. Journal of Applied Econometrics, 5(2):99–135.
- Newey, W. K. and McFadden, D. (1994). Large sample estimation and hypothesis testing. In Engle, R. F. and McFadden, D. L., editors, Handbook of Econometrics, volume 4, pages 2111–2245. Elsevier, New York.
- Rasmussen, S. A., Jamieson, D. J., Honein, M. A., and Petersen, L. R. (2016). Zika virus and birth defects: Reviewing the evidence for causality. New England Journal of Medicine, 374(20):1981–1987.
- Rosenbaum, P. R. (1989). The role of known effects in observational studies. Biometrics, pages 557–569.
- Rotnitzky, A., Smucler, E., and Robins, J. M. (2020). Characterization of parameters with a mixed bias property. Biometrika, 108(1):231–238.
- Scharfstein, D. O., Rotnitzky, A., and Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448):1096–1120.
- Schick, A. (1986). On asymptotically efficient estimation in semiparametric models. The Annals of Statistics, 14(3):1139–1151.
- Schölkopf, B., Herbrich, R., and Smola, A. J. (2001). A generalized representer theorem. In Helmbold, D. and Williamson, B., editors, Computational Learning Theory, pages 416–426. Springer, Berlin, Heidelberg.
- Shi, X., Miao, W., and Tchetgen Tchetgen, E. (2020). A selective review of negative control methods in epidemiology. Current Epidemiology Reports, 7:190–202.
- Sofer, T., Richardson, D. B., Colicino, E., Schwartz, J., and Tchetgen Tchetgen, E. J. (2016). On negative outcome control of unobserved confounding as a generalization of difference-in-differences. Statistical Science, 31(3):348–361.
- Steinwart, I. and Christmann, A. (2008). Support Vector Machines. Springer-Verlag, New York.
- Taddeo, M. M., Amorim, L. D., and Aquino, R. (2022). Causal measures using generalized difference-in-difference approach with nonlinear models. Statistics and Its Interface, 15(4):399–413.
- Tchetgen Tchetgen, E. (2013). The control outcome calibration approach for causal inference with unobserved confounding. American Journal of Epidemiology, 179(5):633–640.
- Tchetgen Tchetgen, E. J., Park, C., and Richardson, D. B. (2024). Universal difference-in-differences for causal inference in epidemiology. Epidemiology, 35(1):16–22.
- Tchetgen Tchetgen, E. J., Ying, A., Cui, Y., Shi, X., and Miao, W. (2024). An introduction to proximal causal inference. Statistical Science. To appear.
- van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer.
- Wang, S., Shao, J., and Kim, J. K. (2014). An instrumental variable approach for identification and estimation with nonignorable nonresponse. Statistica Sinica, 24(3):1097–1116.
- Ying, A., Miao, W., Shi, X., and Tchetgen Tchetgen, E. J. (2023). Proximal causal inference for complex longitudinal studies. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(3):684–704.
- Zahner, G. E., Pawelkiewicz, W., DeFrancesco, J. J., and Adnopoz, J. (1992). Children's mental health service needs and utilization patterns in an urban community: An epidemiological assessment. Journal of the American Academy of Child & Adolescent Psychiatry, 31(5):951–960.
- Zhang, R., Imaizumi, M., Schölkopf, B., and Muandet, K. (2023). Instrumental variable regression via kernel maximum moment loss. Journal of Causal Inference, 11(1):20220073.