Doubly Robust Proximal Causal Inference under Confounded Outcome-Dependent Sampling
Abstract
Unmeasured confounding and selection bias are often of concern in observational studies and may invalidate a causal analysis if not appropriately accounted for. Under outcome-dependent sampling, a latent factor that has causal effects on the treatment, outcome, and sample selection process may cause both unmeasured confounding and selection bias, rendering standard causal parameters unidentifiable without additional assumptions. Under an odds ratio model for the treatment effect, Li et al. (2022) established both proximal identification and estimation of causal effects by leveraging a pair of negative control variables as proxies of latent factors at the source of both confounding and selection bias. However, their approach relies exclusively on the existence and correct specification of a so-called treatment confounding bridge function, a model that restricts the treatment assignment mechanism. In this article, we propose doubly robust estimation under the odds ratio model with respect to two nuisance functions – a treatment confounding bridge function and an outcome confounding bridge function that restricts the outcome law, such that our estimator is consistent and asymptotically normal if either bridge function model is correctly specified, without knowing which one is. Thus, our proposed doubly robust estimator is potentially more robust than that of Li et al. (2022). Our simulations confirm that the proposed proximal estimators of an odds ratio causal effect can adequately account for both residual confounding and selection bias under stated conditions with well-calibrated confidence intervals in a wide range of scenarios, where standard methods generally fail to be consistent. In addition, the proposed doubly robust estimator is consistent if at least one confounding bridge function is correctly specified.
keywords:
endogenous selection bias; kernel machine learning; proximal causal inference; semiparametric estimation.
1 Introduction
Unmeasured confounding and selection bias are two ubiquitous challenges in observational studies, which may lead to biased causal effect estimates and misleading causal conclusions (Hernán MA, 2020). Lipsitch et al. (2010) reviewed and formalized the use of negative control variables as tools to detect the presence of unmeasured confounding in epidemiological research. Specifically, they identify two types of negative control variables: negative control exposures (NCE) known a priori to have no causal effect on the outcome and negative control outcomes (NCO) known a priori not to be causally impacted by the treatment. Such variables may be viewed as valid proxies of unmeasured confounders to the extent that they are associated with the latter. Tchetgen Tchetgen (2014) and Flanders et al. (2017) proposed regression-type calibration for unmeasured confounding adjustment using NCOs and NCEs, respectively, under fairly strong parametric restrictions. More recently, a double negative control approach which uses NCOs and NCEs jointly to correct for unmeasured confounding has been developed, which achieves nonparametric identification of the average treatment effect (Miao et al., 2018a); also see (Shi et al., 2020b) for a recent review of negative control methods in Epidemiology. Methods for identification, estimation and inference about causal parameters by leveraging such proxies to address unmeasured confounding bias have been referred to as “proximal causal inference” (Tchetgen Tchetgen et al., 2020).
The literature on proximal causal inference is fast growing. Miao et al. (2018b) introduced an outcome confounding bridge function approach to identify and estimate the average treatment effect by leveraging both NCE and NCO variables, while Cui et al. (2020) and Deaner (2018) introduced identification via a treatment confounding bridge function. Cui et al. (2020) further developed a doubly robust approach which can identify and consistently estimate the average treatment effect if either a treatment or an outcome confounding bridge function exists and can be estimated consistently. Ghassami et al. (2021) and Kallus et al. (2021) concurrently proposed a cross-fitting minimax kernel learning approach for nonparametric estimation of the confounding bridge functions, where the efficient influence function has a mixed bias structure (Rotnitzky et al., 2021), such that $\sqrt{n}$-consistent estimation of the average treatment effect remains possible even when both confounding bridge functions are estimated at rates considerably slower than $\sqrt{n}$.
Despite the rapid growth in methods to address confounding bias, few have considered situations in which confounding bias and selection bias might coexist. Selection bias, also sometimes called endogenous selection bias or collider stratification bias (Greenland, 2003; Elwert & Winship, 2014; Cole et al., 2010; Hernán et al., 2004), arises when the analysis conditions on selection that may be induced by both the primary treatment and outcome variables. Selection bias naturally occurs in outcome-dependent sampling designs that are widely used in epidemiological and econometric research to reduce cost and effort, such as case-control design (Breslow & Day, 1980; Pearce, 2018), case-cohort design (Prentice, 1986), choice-based design (Manski & McFadden, 1981), test-negative design (Jackson & Nelson, 2013; Sullivan et al., 2016), and retrospective cohort designs with electronic health records (EHR) data (Streeter et al., 2017). Similar to confounding bias, selection bias potentially induces spurious associations between treatment and outcome variables of primary scientific interest, even under the null hypothesis of no causal effect. Existing methods to adjust for selection bias include explicit modeling of sample selection mechanism (Heckman, 1979; Heckman et al., 1998; Cuddeback et al., 2004), matching (Heckman et al., 1996, 1998), difference-in-differences models (Heckman et al., 1998) and inverse probability weighting (Hernán et al., 2004; Mansournia & Altman, 2016). However, when a common latent factor causes the treatment, outcome and selection mechanisms, the above methods are not applicable and the causal effect cannot generally be identified. Gabriel et al. (2020) referred to such a sampling design as “confounded outcome-dependent sampling”. They derived causal bounds of the average treatment effect, but these bounds are often too wide to be useful in practice, which highlights the challenges of identification. Didelez et al. (2010), Bareinboim & Pearl (2012) and Bareinboim et al. (2014) studied graphical conditions for recoverability of the causal effect encoded on the additive scale and odds ratio scale under different forms of selection bias including outcome-dependent sampling. They concluded that neither causal effect is identifiable if selection is dependent on both outcome and unmeasured confounders.
Recently, Li et al. (2022) studied inference about an odds ratio model encoding a treatment’s association with an outcome of interest conditional on both measured and unmeasured confounders under confounded outcome-dependent sampling, leveraging a pair of negative control variables. The consistency of their estimator, which we refer to as the proximal inverse probability weighted (PIPW) estimator, requires consistently estimating a treatment confounding bridge function which restricts the treatment assignment mechanism. However, in practice, it may be difficult to posit a suitable parametric model for the treatment confounding bridge function, which may limit the applicability of their approach.
In this paper, we develop semiparametric inference for the conditional odds ratio estimand in Li et al. (2022). We present a new identification result and propose the proximal outcome regression (POR) estimator of conditional odds ratio that relies on the existence and correct specification of an outcome confounding bridge function which restricts the outcome law by the conditional odds of the outcome given the treatment and confounders. We further introduce a doubly-robust closed-form expression for the conditional odds ratio, based on which we propose a proximal doubly robust (PDR) estimator, which is consistent if either the treatment or the outcome confounding bridge function is consistently estimated and thus has improved robustness against model misspecification of the nuisance functions compared with PIPW and POR. We demonstrate the performance of the proposed estimators through comprehensive simulations. Throughout, we relegate all proofs to Section S1 of the Supplementary Materials.
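Figure 1: Directed acyclic graphs (DAGs). (a) Causal relationships among the variables under confounded outcome-dependent sampling; (b) and (c) scenarios in which Assumption 2.2 holds.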
2 Identification and estimation under a homogeneous odds ratio model
2.1 Notation and Setup
Suppose the data contain a sample of identically and independently distributed observations drawn from the population of interest, referred to as the “target population”. For each observation in the target population, let $A$ and $Y$ be the treatment and outcome of primary interest, respectively, assumed to be binary for the time being. Let $S$ be a binary indicator for selection into the study sample, such that the available data only include observations with $S = 1$. Let $X$ be a vector of measured pre-treatment covariates that may be associated with $A$, $Y$ and $S$. Suppose that in addition to $X$, there exist pre-treatment latent factors, denoted as $U$, that may cause $A$, $Y$ and $S$. Figure 1(a) shows the directed acyclic graph (DAG) for the causal relationships between the above variables. Similar to Li et al. (2022) Section 2.5, we assume the following model, where $\beta_0$, the conditional log odds ratio given $(U, X)$, encodes the treatment effect of primary interest:
Model 2.1 (Homogeneous odds ratio model; Li et al. (2022) Assumption 3’).
$$\mathrm{logit}\,\Pr(Y = 1 \mid A, U, X) = \beta_0 A + \eta(U, X),$$
where $\mathrm{logit}(x) = \log\{x/(1 - x)\}$ denotes the logit function and $\eta$ is an unknown real-valued function. Model 2.1 describes a semiparametric logistic regression model that assumes a homogeneous association between $A$ and $Y$ on the odds ratio scale, across strata of all measured and latent factors $(U, X)$. Model 2.1 encodes the structural model
$$\mathrm{logit}\,\Pr\{Y(a) = 1 \mid U, X\} = \beta_0 a + \eta(U, X),$$
where $Y(a)$ ($a = 0, 1$) denotes the potential outcome were the individual given the treatment status $a$, under standard identifiability assumptions of (a) consistency, i.e. $Y(a) = Y$ if $A = a$, which requires there is only one version of treatment and an individual’s outcome is not affected by others’ treatment status (Cole & Frangakis, 2009); (b) latent ignorability, i.e. $Y(a) \perp A \mid (U, X)$, which essentially requires $(U, X)$ to contain all confounders of the $A$-$Y$ association; and (c) positivity, i.e. $0 < \Pr(A = a \mid U, X) < 1$ almost surely (Hernán MA, 2020). Under these conditions, $\beta_0$ encodes the log conditional causal odds ratio effect in every $(U, X)$ stratum.
Li et al. (2022), in their Assumption 3, also considered a homogeneous risk ratio model, by which the marginal causal risk ratio can be identified. We focus on Model 2.1 for two reasons: first, proximal identification of $\beta_0$ under Model 2.1 permits treatment-induced selection under Assumption 2.1 below, while identification under the homogeneous risk ratio model appears to require that $A$ has no causal effect on selection other than through $Y$; second, Li et al. (2022) invoked a rare outcome assumption in the target population for proximal identification under the homogeneous risk ratio model, which restricts the applicability of the method. Such a rare outcome assumption is not necessary for proximal identification of $\beta_0$ under Model 2.1. As discussed in Section S2 of the Supplementary Materials, however, under standard identifiability assumptions (a)-(c) above and a rare outcome assumption akin to that assumed by Li et al. (2022), $\beta_0$ in Model 2.1 still approximates the log marginal causal risk ratio $\log\{\Pr(Y(1) = 1)/\Pr(Y(0) = 1)\}$. Alternatively, as discussed in Section S3 of the Supplementary Materials, under settings where control subjects are selected to be positive for an outcome variable that is a priori known to be not affected by the treatment, $\exp(\beta_0)$ may recover the marginal causal risk ratio without invoking the rare outcome condition. Such a setting may occur, for example, in test-negative design studies of vaccine effectiveness, where the control subjects are selected to have a pre-determined infection that is not affected by the vaccine (Jackson & Nelson, 2013; Sullivan et al., 2016; Chua et al., 2020; Schnitzer, 2022).
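The rare outcome approximation mentioned above can be illustrated numerically within a single stratum. The sketch below is only an illustration under assumed values: the function name `log_risk_ratio`, the log odds ratio `log(0.5)` and the baseline risks are hypothetical choices, and the marginal argument itself is the one given in Section S2 of the Supplementary Materials, not this calculation.

```python
from math import exp, log


def log_risk_ratio(beta0, p0):
    """Log risk ratio implied by a log odds ratio beta0 and an untreated risk p0."""
    odds1 = exp(beta0) * p0 / (1.0 - p0)  # treated odds = odds ratio * untreated odds
    p1 = odds1 / (1.0 + odds1)            # convert treated odds back to a risk
    return log(p1 / p0)


beta0 = log(0.5)  # illustrative value, not taken from the paper
# Absolute gap between the log risk ratio and the log odds ratio,
# evaluated at progressively rarer untreated outcome risks.
gaps = {p0: abs(log_risk_ratio(beta0, p0) - beta0) for p0 in (0.2, 0.05, 0.001)}
```

The gap equals $|\log\{(1 - p_1)/(1 - p_0)\}|$, which vanishes as both risks tend to zero; this is why the odds ratio and the risk ratio nearly coincide for rare outcomes.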
Although, to simplify the presentation, the results in the main text focus primarily on Model 2.1, Section S11 of the Supplementary Materials extends these results to a more general model that explicitly accounts for effect modification of the odds ratio with respect to measured covariates $X$.
In general, the conditional log odds ratio $\beta_0$ is not identifiable: first, the latent factors $U$ defining the strata are not observed; second, the data only include selected subjects with $S = 1$ whilst Model 2.1 is defined in the target population.
As in Li et al. (2022) Assumption 2’, we make an important assumption regarding the selection mechanism that the treatment does not modify the conditional risk ratio of selection against the outcome in every $(U, X)$ stratum. Formally, we assume:
Assumption 2.1 (No effect modification by $A$ on the outcome-dependent selection). For $a = 0, 1$ and some positive-valued unknown function $\xi(U, X)$,
$$\frac{\Pr(S = 1 \mid Y = 1, A = a, U, X)}{\Pr(S = 1 \mid Y = 0, A = a, U, X)} = \xi(U, X).$$
As mentioned above, a special case of Assumption 2.1 is if $A$ has no causal effect on $S$ other than through $Y$, a stronger assumption considered by Li et al. (2022) to identify the marginal causal risk ratio. As a result of Assumption 2.1, the conditional odds ratio given $(U, X)$ is identical to that given $(U, X, S = 1)$, which leaves only the challenge of controlling for unmeasured confounding by $U$ in the selected sample.
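The invariance of the conditional odds ratio to selection follows from Bayes' rule: within a stratum, the odds of the outcome among selected subjects equal the target-population odds multiplied by the ratio of selection probabilities for cases versus non-cases, and that ratio cancels between treatment arms when it does not depend on the treatment. The following sketch checks this numerically for one stratum. All numbers and the names `xi`, `sel_y0` and `selected_log_odds_ratio` are hypothetical, chosen only for illustration.

```python
from math import exp, log


def expit(x):
    return 1.0 / (1.0 + exp(-x))


def selected_log_odds_ratio(beta0, eta, sel_y0, xi):
    """Log odds ratio of Y on A among selected subjects (S = 1) in one stratum.

    sel_y0[a] is Pr(S = 1 | Y = 0, A = a) in the stratum. The selection
    probability among cases is set to xi * sel_y0[a], so the case/non-case
    selection ratio xi is the same in both treatment arms.
    """
    odds = {}
    for a in (0, 1):
        p_y1 = expit(beta0 * a + eta)  # outcome risk in the target population
        s1 = xi * sel_y0[a]            # Pr(S = 1 | Y = 1, A = a)
        s0 = sel_y0[a]                 # Pr(S = 1 | Y = 0, A = a)
        # Bayes' rule: odds of Y = 1 given A = a among the selected
        odds[a] = (s1 * p_y1) / (s0 * (1.0 - p_y1))
    return log(odds[1] / odds[0])


beta0 = log(0.2)  # illustrative log odds ratio
# Selection depends on treatment (0.10 vs 0.25) and on the outcome (xi = 3).
value = selected_log_odds_ratio(beta0, eta=-1.0, sel_y0={0: 0.10, 1: 0.25}, xi=3.0)
```

In this example selection depends on both the treatment and the outcome, yet `value` agrees with `beta0` up to floating-point error, because the factor `xi` appears in the odds of both arms and cancels in their ratio.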
2.2 Proximal Identification of $\beta_0$ in Li et al. (2022)
To detect and correct for unmeasured confounding bias, the proximal inference framework proposes to leverage a pair of proxy measurements of unmeasured confounders (Miao et al., 2018a; Cui et al., 2020; Tchetgen Tchetgen et al., 2020). In this line of work, Li et al. (2022) developed proximal causal inference for $\beta_0$ under confounded outcome-dependent sampling as represented in Figure 1(a) (Jackson & Nelson, 2013; Chua et al., 2020). We similarly assume proxies of the latent factors are available.
Assumption (Proximal independence conditions). There exist a pair of proxies of the latent factor $U$: a treatment proxy (which Li et al. (2022) refer to as a “negative control exposure”), denoted as $Z$, and an outcome proxy (which Li et al. (2022) refer to as a “negative control outcome”), denoted as $W$, that satisfy the conditional independence conditions described below.
Assumption 2.2 requires the treatment proxy and outcome proxy to satisfy certain conditional independences in the target population. Namely, the treatment proxy $Z$ has no direct effect on the primary outcome $Y$, selection indicator $S$, and the outcome proxy $W$; and the primary treatment $A$ has no direct effect on the outcome proxy $W$. It is crucial that $Z$ and $W$ are both $U$-relevant, that is, they carry information about $U$ and therefore are associated with it (Shi et al., 2020b; Tchetgen Tchetgen et al., 2020); this is formalized by Assumptions 2.2(a) and 2.3(a) below. As such, in the selected sample, an observed association between variables that have no direct causal link under Assumption 2.2 (for example, between the treatment proxy $Z$ and the outcome $Y$, or between the treatment $A$ and the outcome proxy $W$), after adjusting for the relevant measured variables, may indicate the presence of bias due to the latent factor $U$. Figure 1 (b) and (c) show the DAGs of scenarios where Assumption 2.2 holds.
As in Li et al. (2022), we assume the existence of a treatment confounding bridge function that connects the treatment proxy to the latent factor.
Assumption (Treatment confounding bridge function). There exists a treatment confounding bridge function $q(A, Z, X)$ that satisfies
(1) |
almost surely for $a = 0, 1$.
Equation (1) formally defines a Fredholm integral equation of the first kind. To ensure the existence and identifiability of the solution to Equation (1), we make the following completeness assumptions, similar to those assumed by Li et al. (2022) and Cui et al. (2020):
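Assumption (Completeness).
(a) For any square-integrable function $g$, if $E\{g(U) \mid A = a, Z, X, Y = 0, S = 1\} = 0$ almost surely, then $g(U) = 0$ almost surely;
(b) For any square-integrable function $g$, if $E\{g(Z) \mid A = a, W, X, Y = 0, S = 1\} = 0$ almost surely, then $g(Z) = 0$ almost surely.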
The completeness condition has been used to achieve identification for statistical functionals in the econometrics and semiparametric statistics literature (Newey & Powell, 2003; D’Haultfoeuille, 2011). Essentially, Assumption 2.2(a) formalizes the $U$-relevance of $Z$ and requires that $Z$ has at least as much variability as $U$ in every $(A, X)$ stratum of the outcome-free subjects in the selected sample. When $Z$ and $U$ are both categorical variables, $Z$ necessarily has at least as many categories as $U$. Assumption 2.2(a) and the regularity conditions discussed in Section S4 of the Supplementary Materials together constitute a sufficient condition for Assumption 2.2, i.e. the existence of $q$.
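For categorical variables, the completeness requirement reduces to a rank condition, which the following sketch illustrates. It assumes that completeness is read as "the only function of the latent factor with zero conditional mean given the proxy is the zero function", so that the matrix of conditional probabilities (rows indexed by proxy levels, columns by latent levels) must have full column rank. The matrices and the names `matrix_rank` and `is_complete` are hypothetical illustrations, not quantities from the paper.

```python
def matrix_rank(rows, tol=1e-10):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def is_complete(cond_prob):
    """cond_prob[z][u] = Pr(latent = u | proxy = z) within one stratum.

    If the matrix has full column rank, then sum_u g(u) * cond_prob[z][u] = 0
    for every z forces g = 0, which is the categorical form of completeness.
    """
    return matrix_rank(cond_prob) == len(cond_prob[0])


# Proxy with three levels, latent factor with two levels: completeness can hold.
p_ok = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
# Proxy with two levels, latent factor with three levels: completeness must fail.
p_bad = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]
```

Because a matrix with fewer rows than columns cannot have full column rank, the check makes concrete why the proxy needs at least as many categories as the latent factor.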
Similarly, Assumption 2.2(b) roughly requires that the outcome proxy $W$ is at least as variable as the treatment proxy $Z$, and is a sufficient condition under which $q$ can be uniquely identified from the selected sample, as stated in the theorem below:
Theorem 2.1 (Identification of $q$; Li et al. (2022) Theorem 2’).
Leveraging the proxies, Li et al. (2022) established the following identification result for $\beta_0$.
Theorem 2.2 (Identification of $\beta_0$; Li et al. (2022) Theorem 1’).
Under Model 2.1, if Assumptions 2.1-2.2 hold, then the conditional log odds ratio parameter $\beta_0$ solves the moment equation
(3) |
where is an arbitrary square-integrable unidimensional function that satisfies
Equation (3) admits a closed-form solution for $\beta_0$:
(4) |
Unique identification of $q$ by Assumption 2.2(b) is necessary. Although Equation (4) holds for any treatment confounding bridge function $q$ that satisfies Equation (1), $q$ can only be identified in the selected sample through solving Equation (2). If Equation (2) has multiple solutions, there is no guarantee that a solution to Equation (2) is also a solution to Equation (1).
2.3 New Proximal Identification Results
From Theorem 2.2(b), it is clear that the treatment confounding bridge function $q$ plays a crucial role for identifying $\beta_0$. However, if Assumption 2.2 does not hold, or Assumption 2.2 holds but Assumption 2.2(b) does not, then the solution to Equation (2) in the selected sample may not be unique and may not satisfy Equation (1), which defines a treatment confounding bridge function. As a result, the parameter identified by Equation (4), where $q$ is a solution to Equation (2), may not have a causal interpretation. In this section, we establish a new identification result for $\beta_0$ that does not rely on $q$. We instead define an outcome confounding bridge function as below:
Assumption (Outcome confounding bridge function). There exists an outcome confounding bridge function $h(A, W, X)$ that satisfies
(5) |
almost surely. Miao et al. (2018b) and Cui et al. (2020) previously proposed proximal identification of the average treatment effect using an outcome confounding bridge function. Our definition of $h$ by Equation (5) is different from those in previous works of proximal causal inference, in that the conditional expectation of our $h$ is defined to equal the conditional odds of the outcome, whilst the conditional expectation of the outcome confounding bridge function in previous works was defined to equal the conditional mean outcome. Theorem 2.4 below suggests that the standard outcome bridge function of prior literature would indeed fail to identify $\beta_0$ while our alternative definition yields the desired identification result.
In contrast with Assumption 2.2, we make the following completeness assumption for the existence and identifiability of the outcome confounding bridge function $h$.
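Assumption (Completeness).
(a) For any square-integrable function $g$, if $E\{g(U) \mid A = a, W, X, Y = 0, S = 1\} = 0$ almost surely, then $g(U) = 0$ almost surely;
(b) For any square-integrable function $g$, if $E\{g(W) \mid A = a, Z, X, Y = 0, S = 1\} = 0$ almost surely, then $g(W) = 0$ almost surely.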
Similar to Assumption 2.2(a), Assumption 2.3(a) essentially requires that $W$ has sufficient variability relative to $U$ in every $(A, X)$ stratum of the outcome-free subjects in the selected sample, and constitutes a sufficient condition for Assumption 2.3, i.e. the existence of $h$, together with other regularity conditions, as discussed in Section S4 of the Supplementary Materials. Assumption 2.3(b) requires the opposite of Assumption 2.2(b), that the treatment proxy $Z$ is sufficiently variable relative to the outcome proxy $W$, and is a sufficient condition under which $h$ can be uniquely identified from the selected sample, as stated in Theorem 2.3 below.
Theorem 2.3 (Identification of $h$).
We introduce the new identification result in Theorem 2.4 below:
Theorem 2.4 (Identification of $\beta_0$).
Under Model 2.1, if Assumptions 2.1, 2.2 and 2.3 hold, then the conditional log odds ratio parameter $\beta_0$ solves the moment equation
(7) |
where is an arbitrary square-integrable unidimensional function that satisfies
The integral equation (7) also admits a closed-form solution
(8) |
2.4 A doubly robust closed-form expression for $\beta_0$
Until now, identification of $\beta_0$ has required the existence and identification of either a treatment confounding bridge function or an outcome confounding bridge function. In this section, we present a closed-form expression for $\beta_0$ that has a desirable double robustness property, in the sense that it only requires one of the two confounding bridge functions to exist and be identified.
Theorem 2.5 (A doubly-robust closed-form expression for $\beta_0$).
2.5 Estimation and large sample inference
Denote data for the $i$th subject in the selected sample as $(A_i, Y_i, Z_i, W_i, X_i)$, $i = 1, \ldots, n$. As suggested by Equations (4), (8) or (9), consistent estimators for $q$ and $h$ can be used to construct estimators for $\beta_0$ following the standard plug-in principle (Bickel & Ritov, 2003). It remains to estimate the two confounding bridge functions. Below, we present crucial moment conditions for the identification of the confounding bridge functions.
Theorem 2.6 (Moment conditions for $q$ and $h$).
Equations (4), (8) and (9) and Theorem 2.6 together suggest a parametric approach to estimate : one may postulate suitable parametric working models and where and are unknown parameters of finite dimensions. One can then estimate the nuisance parameters and by solving the estimating equations
(12) | |||
(13) |
with user-specified functions and of dimensions equal to that of and respectively. Closed-form parametric models for the confounding bridge functions can be derived in some settings as described below.
Example 2.7.
If are all binary variable and is a categorical variable with levels , then a suitable model is the saturated model of the form
(14) | ||||
and | (15) |
for , where
Example 2.8.
If and are both multivariate Gaussian random variables conditioning on , under the setting given in Section S6 of the Supplementary Materials, suitable models for the confounding bridge functions are
(16) | ||||
and | (17) |
where and .
After obtaining the estimators and by solving Equations (12) and (13), can then be estimated with the following plug-in estimators:
(18) | ||||
(19) | ||||
(20) |
Alternatively, one can jointly estimate and nuisance parameters and by solving estimating equations with the following estimating functions:
and | ||
The resulting estimators are consistent and asymptotically normal, following standard estimating equation theory (Van der Vaart, 2000). Similar to Cui et al. (2020), we refer to the estimators in Equations (18), (19) and (20) as the proximal inverse probability weighted (PIPW) estimator, the proximal outcome regression (POR) estimator, and the proximal doubly robust (PDR) estimator, respectively. As discussed in Sections S2 and S3 in the Supplementary Materials, the PIPW estimator can estimate the marginal causal risk ratio assuming either (a) a homogeneous risk ratio model, a treatment-independent selection mechanism, and a rare outcome condition in the target population, or (b) a test-negative design where the controls are selected to have an outcome that is unaffected by the treatment. Because they estimate the same functional, the new POR and PDR estimators can be viewed as estimators of the marginal causal risk ratio under the same conditions.
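The appeal to standard estimating equation theory can be made concrete with a generic sketch: solve the empirical estimating equation by Newton-Raphson, then compute a sandwich standard error from the derivative and the second moment of the estimating function. The example below is deliberately a toy scalar problem (the intercept of a logistic model) and is not the estimating function of the PIPW, POR or PDR estimator; the names `solve_estimating_equation`, `sandwich_se`, `psi` and `dpsi` and the data are hypothetical.

```python
from math import exp, log, sqrt


def expit(x):
    return 1.0 / (1.0 + exp(-x))


def solve_estimating_equation(psi, dpsi, data, theta=0.0, tol=1e-12, max_iter=100):
    """Newton-Raphson for the scalar root of sum_i psi(o_i, theta) = 0."""
    for _ in range(max_iter):
        g = sum(psi(o, theta) for o in data)
        dg = sum(dpsi(o, theta) for o in data)
        step = g / dg
        theta -= step
        if abs(step) < tol:
            break
    return theta


def sandwich_se(psi, dpsi, data, theta):
    """Sandwich standard error: sqrt(meat / bread^2 / n) for a scalar parameter."""
    n = len(data)
    bread = sum(dpsi(o, theta) for o in data) / n    # average derivative
    meat = sum(psi(o, theta) ** 2 for o in data) / n  # average squared score
    return sqrt(meat / bread ** 2 / n)


def psi(y, theta):
    # Toy estimating function: score of a logistic intercept.
    return y - expit(theta)


def dpsi(y, theta):
    # Derivative of psi with respect to theta.
    p = expit(theta)
    return -p * (1.0 - p)


y = [1, 0, 0, 1, 1, 0, 1, 1]
theta_hat = solve_estimating_equation(psi, dpsi, y)
se_hat = sandwich_se(psi, dpsi, y, theta_hat)
```

For a vector of parameters, such as the nuisance parameters stacked with the odds ratio parameter, the same recipe applies with the derivative replaced by a Jacobian matrix and the division replaced by a linear solve.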
As implied by Theorem 2.5, the PDR estimator also enjoys the desirable double robustness property: it is consistent for $\beta_0$ if either $q$ or $h$ can be consistently estimated. We state this result formally in the theorem below.
Theorem 2.9.
Under Model 2.1 and Assumptions 2.1, 2.2 and 2.3, we have the following results:
(a) The PIPW estimator is consistent for $\beta_0$ and asymptotically normal if the model for $q$ is correctly specified and Assumption 2.2(b) holds;
(b) The POR estimator is consistent for $\beta_0$ and asymptotically normal if the model for $h$ is correctly specified and Assumption 2.3(b) holds;
(c) The PDR estimator is consistent for $\beta_0$ and asymptotically normal if either the conditions in (a) or those in (b) hold.
Theorem 2.9 indicates that the proposed PDR estimator has improved robustness over the PIPW and POR estimators against model misspecification of the two confounding bridge functions. If candidate proxies $Z$ and $W$ have different cardinalities, for example, when they are categorical variables of unequal dimensions, then only one of the completeness Assumptions 2.2 and 2.3 may hold. If, say, $W$ is of higher dimension, then Assumption 2.3(b) cannot hold, the outcome bridge function may not be uniquely identified, and the POR estimator may be biased for $\beta_0$. However, it may be possible to coarsen levels of $W$ to match those of $Z$ (Shi et al., 2020a). We highlight that the consistency of the PDR estimator only requires one of the two completeness assumptions to hold.
Our identification and estimation strategies involve user-specified functions , and . In principle, these nuisance functions can be chosen to construct a potentially more efficient estimator for , , and respectively, but the optimal choice of these functions may require modeling complex features of the observed data distribution. As discussed in Cui et al. (2020) and Stephens et al. (2014), such features can be considerably difficult to model correctly, and unsuccessful attempts to do so may lead to loss of efficiency, thus are not always worthwhile. In our simulation and real data analysis, we found that our generic choices for , and deliver reasonable efficiency.
Until now, our estimation has required specification of parametric working models for $q$ and/or $h$. When the observed variables are continuous or high-dimensional, however, neither of these working models may contain the true confounding bridge functions. Ghassami et al. (2021) and Kallus et al. (2021) concurrently proposed a cross-fitting minimax kernel learning method for proximal learning of the average treatment effect. Their method allows nonparametric estimation of the nuisance functions and $\sqrt{n}$-consistent estimation of the average treatment effect, even when both estimated confounding bridge functions converge at rates slower than $\sqrt{n}$. In Section S7 of the Supplementary Materials, we demonstrate that their method is also applicable to our setting, and describe a semiparametric kernel estimator for the odds ratio parameter with flexibly estimated confounding bridge functions, $\sqrt{n}$-consistency, and asymptotic normality.
3 Simulation
In this section, we perform simulation studies to evaluate the performance of our proposed estimators of under five different scenarios. All scenarios follow Model 2.1 with . In Scenario I, the variables are all univariate binary variables. We suppose there are no measured confounders . Therefore, saturated models for the confounding bridge functions are appropriate, i.e.
The nuisance parameters and are estimated by solving Equations (12) and (13), respectively, with and . In Scenario II, the variables are continuous. The conditional distributions of and given other variables follow Example 2.8. Therefore parametric working models in Equations (16) and (17) are appropriate. We use these parametric working models and estimate the nuisance parameters and by solving estimating equations (12) and (13), setting to be a vector including , , , all of their higher-order interactions, and an intercept term, and similarly. We set for all the estimators under comparison.
In Scenarios III-V, the data generating process is the same as in Scenario II but we misspecify different components of the parametric working models. In Scenario III, we misspecify the treatment confounding bridge function by ignoring , i.e., we set
In this scenario, we expect the POR and PDR estimators to be consistent and the PIPW estimator to be inconsistent. In Scenario IV, we similarly misspecify the outcome confounding bridge function by ignoring , i.e. we set
In this scenario, we expect the PIPW and PDR estimators to be consistent and the POR estimator to be inconsistent. Finally, in Scenario V, we misspecify both $q$ and $h$ as above. We expect all three estimators to be inconsistent.
We relegate the details of the data generating process to Section S10 of the Supplementary Materials. We report the bias, standard deviation, and coverage of 95% confidence intervals of the POR, PIPW, and PDR estimators for each scenario over 500 Monte Carlo samples.
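The three reported performance measures can be computed from the Monte Carlo output as in the sketch below. It assumes that each replicate yields a point estimate and a standard error and that the 95% intervals are Wald intervals (estimate plus or minus 1.96 standard errors); the interval type, the function name `summarize` and the toy numbers are assumptions made for illustration.

```python
from math import sqrt


def summarize(estimates, std_errors, truth, z=1.959964):
    """Monte Carlo bias, empirical SD, and coverage of Wald intervals."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - truth
    # Empirical standard deviation of the point estimates across replicates.
    sd = sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n - 1))
    # Share of replicates whose interval contains the true value.
    covered = sum(
        1 for e, s in zip(estimates, std_errors) if e - z * s <= truth <= e + z * s
    )
    return {"bias": bias, "sd": sd, "coverage": covered / n}


# Four hypothetical replicates with true value 1.0.
result = summarize([0.9, 1.1, 1.0, 1.6], [0.1, 0.1, 0.1, 0.1], truth=1.0)
```

In this toy input three of the four intervals contain the true value, so the coverage entry is 0.75, while the bias entry is the average estimate minus the truth.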
Table 1 shows the simulation results. In Scenarios I and II, as the parametric working models for both confounding bridge functions are correctly specified, all three estimators perform essentially identically, as expected. In Scenario III, the POR and PDR estimators are consistent and have calibrated 95% confidence intervals, while PIPW is inconsistent and has anti-conservative 95% confidence intervals. On the other hand, in Scenario IV, the PIPW and PDR estimators are consistent and have calibrated 95% confidence intervals, whereas the POR estimator is inconsistent and has anti-conservative 95% confidence intervals. In Scenario V, all three estimators are inconsistent and have anti-conservative 95% confidence intervals, but the PDR estimator appears to have slightly smaller biases and better confidence interval coverage.
Table 1: Bias, standard deviation (SD) and coverage of 95% confidence intervals of the POR, PIPW and PDR estimators over 500 Monte Carlo samples.
Scenario | N | POR Bias | POR SD | POR Coverage | PIPW Bias | PIPW SD | PIPW Coverage | PDR Bias | PDR SD | PDR Coverage |
---|---|---|---|---|---|---|---|---|---|---|
I | 3,500 | 0.05 | 0.27 | 97.4% | 0.05 | 0.05 | 97.4% | 0.05 | 0.27 | 97.4% |
I | 5,000 | 0.00 | 0.23 | 96.4% | 0.02 | 0.22 | 96.4% | 0.00 | 0.22 | 96.4% |
II | | -0.03 | 0.24 | 95.0% | -0.03 | 0.23 | 94.2% | -0.02 | 0.23 | 94.2% |
II | | -0.02 | 0.16 | 95.0% | -0.02 | 0.15 | 94.8% | -0.02 | 0.15 | 94.2% |
III | | -0.03 | 0.24 | 94.8% | -0.11 | 0.22 | 91.2% | -0.02 | 0.23 | 94.2% |
III | | -0.02 | 0.16 | 95.0% | -0.10 | 0.15 | 90.0% | -0.02 | 0.15 | 95.0% |
IV | | -0.12 | 0.24 | 92.2% | -0.03 | 0.23 | 94.0% | -0.03 | 0.23 | 94.0% |
IV | | -0.11 | 0.16 | 90.8% | -0.02 | 0.15 | 94.6% | -0.02 | 0.15 | 94.2% |
V | | -0.12 | 0.24 | 92.2% | -0.11 | 0.22 | 90.8% | -0.09 | 0.22 | 92.2% |
V | | -0.11 | 0.16 | 90.8% | -0.10 | 0.15 | 89.8% | -0.08 | 0.15 | 91.6% |
4 Extensions
We have focused on the settings where the treatment and outcome are both binary, but the proposed framework can extend to study the association between a polytomous treatment and a polytomous outcome with simple modification. These extensions are relegated to Section S12 of the Supplementary Materials.
Although we have primarily focused on test-negative design as an example of outcome-dependent sampling, our method applies to other outcome-dependent sampling designs where unmeasured confounding is of concern and suitable proxy variables may be identified, such as a case-cohort study (Breslow & Day, 1980) or a retrospective case-control/cohort study using data from electronic health records (Streeter et al., 2017), where subjects’ treatment status may be associated with underlying frailty or risk of developing the outcome of interest.
References
- Ai & Chen (2003) Ai, C. & Chen, X. (2003). Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica 71, 1795–1843.
- Bareinboim & Pearl (2012) Bareinboim, E. & Pearl, J. (2012). Controlling selection bias in causal inference. In Artificial Intelligence and Statistics. PMLR.
- Bareinboim et al. (2014) Bareinboim, E., Tian, J. & Pearl, J. (2014). Recovering from selection bias in causal and statistical inference. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28.
- Bickel & Ritov (2003) Bickel, P. J. & Ritov, Y. (2003). Nonparametric estimators which can be “plugged-in”. The Annals of Statistics 31, 1033–1053.
- Breslow & Day (1980) Breslow, N. & Day, N. (1980). Statistical methods in cancer research. Volume I: The analysis of case-control studies. IARC Sci Publ 32, 5–338.
- Chen (2007) Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. Handbook of econometrics 6, 5549–5632.
- Chua et al. (2020) Chua, H., Feng, S., Lewnard, J. A., Sullivan, S. G., Blyth, C. C., Lipsitch, M. & Cowling, B. J. (2020). The use of test-negative controls to monitor vaccine effectiveness: a systematic review of methodology. Epidemiology (Cambridge, Mass.) 31, 43.
- Cole & Frangakis (2009) Cole, S. R. & Frangakis, C. E. (2009). The consistency statement in causal inference: a definition or an assumption? Epidemiology 20, 3–5.
- Cole et al. (2010) Cole, S. R., Platt, R. W., Schisterman, E. F., Chu, H., Westreich, D., Richardson, D. & Poole, C. (2010). Illustrating bias due to conditioning on a collider. International journal of epidemiology 39, 417–420.
- Cuddeback et al. (2004) Cuddeback, G., Wilson, E., Orme, J. G. & Combs-Orme, T. (2004). Detecting and statistically correcting sample selection bias. Journal of Social Service Research 30, 19–33.
- Cui et al. (2020) Cui, Y., Pu, H., Shi, X., Miao, W. & Tchetgen Tchetgen, E. (2020). Semiparametric proximal causal inference. arXiv preprint arXiv:2011.08411 .
- Deaner (2018) Deaner, B. (2018). Proxy controls and panel data. arXiv preprint arXiv:1810.00283 .
- Didelez et al. (2010) Didelez, V., Kreiner, S. & Keiding, N. (2010). Graphical models for inference under outcome-dependent sampling. Statistical Science 25, 368–387.
- Dikkala et al. (2020) Dikkala, N., Lewis, G., Mackey, L. & Syrgkanis, V. (2020). Minimax estimation of conditional moment models. Advances in Neural Information Processing Systems 33, 12248–12262.
- D’Haultfoeuille (2011) D’Haultfoeuille, X. (2011). On the completeness condition in nonparametric instrumental problems. Econometric Theory 27, 460–471.
- Elwert & Winship (2014) Elwert, F. & Winship, C. (2014). Endogenous selection bias: The problem of conditioning on a collider variable. Annual review of sociology 40, 31–53.
- Engel (1988) Engel, J. (1988). Polytomous logistic regression. Statistica Neerlandica 42, 233–252.
- Flanders et al. (2017) Flanders, W. D., Strickland, M. J. & Klein, M. (2017). A new method for partial correction of residual confounding in time-series and other observational studies. American journal of epidemiology 185, 941–949.
- Gabriel et al. (2020) Gabriel, E. E., Sachs, M. C. & Sjölander, A. (2020). Causal bounds for outcome-dependent sampling in observational studies. Journal of the American Statistical Association , 1–12.
- Ghassami et al. (2021) Ghassami, A., Ying, A., Shpitser, I., Tchetgen Tchetgen, E. et al. (2021). Minimax kernel machine learning for a class of doubly robust functionals. Tech. rep.
- Greenland (2003) Greenland, S. (2003). Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology 14, 300–306.
- Heckman (1979) Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica: Journal of the econometric society , 153–161.
- Heckman et al. (1996) Heckman, J. J., Ichimura, H., Smith, J. & Todd, P. (1996). Sources of selection bias in evaluating social programs: An interpretation of conventional measures and evidence on the effectiveness of matching as a program evaluation method. Proceedings of the National Academy of Sciences 93, 13416–13420.
- Heckman et al. (1998) Heckman, J. J., Ichimura, H., Smith, J. A. & Todd, P. E. (1998). Characterizing selection bias using experimental data.
- Hernán et al. (2004) Hernán, M. A., Hernández-Díaz, S. & Robins, J. M. (2004). A structural approach to selection bias. Epidemiology , 615–625.
- Hernán MA (2020) Hernán, M. A. & Robins, J. M. (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
- Jackson & Nelson (2013) Jackson, M. L. & Nelson, J. C. (2013). The test-negative design for estimating influenza vaccine effectiveness. Vaccine 31, 2165–2168.
- Kallus et al. (2021) Kallus, N., Mao, X. & Uehara, M. (2021). Causal inference under unmeasured confounding with negative controls: A minimax learning approach. arXiv preprint arXiv:2103.14029 .
- Li et al. (2022) Li, K. Q., Shi, X., Miao, W. & Tchetgen, E. T. (2022). Double negative control inference in test-negative design studies of vaccine effectiveness. arXiv preprint arXiv:2203.12509 .
- Lipsitch et al. (2010) Lipsitch, M., Tchetgen Tchetgen, E. & Cohen, T. (2010). Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology (Cambridge, Mass.) 21, 383.
- Manski & McFadden (1981) Manski, C. F. & McFadden, D. (1981). Alternative estimators and sample designs for discrete choice analysis. Structural analysis of discrete data with econometric applications 2, 2–50.
- Mansournia & Altman (2016) Mansournia, M. A. & Altman, D. G. (2016). Inverse probability weighting. BMJ 352.
- Miao et al. (2018a) Miao, W., Geng, Z. & Tchetgen Tchetgen, E. J. (2018a). Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika 105, 987–993.
- Miao et al. (2018b) Miao, W., Shi, X. & Tchetgen Tchetgen, E. (2018b). A confounding bridge approach for double negative control inference on causal effects. arXiv e-prints , arXiv–1808.
- Newey & Powell (2003) Newey, W. K. & Powell, J. L. (2003). Instrumental variable estimation of nonparametric models. Econometrica 71, 1565–1578.
- Pearce (2018) Pearce, N. (2018). Bias in matched case–control studies: DAGs are not enough. European journal of epidemiology 33, 1–4.
- Prentice (1986) Prentice, R. L. (1986). A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11.
- Rotnitzky et al. (2021) Rotnitzky, A., Smucler, E. & Robins, J. M. (2021). Characterization of parameters with a mixed bias property. Biometrika 108, 231–238.
- Schnitzer (2022) Schnitzer, M. E. (2022). Estimands and estimation of COVID-19 vaccine effectiveness under the test-negative design: Connections to causal inference. Epidemiology (Cambridge, Mass.) 33, 325.
- Shi et al. (2020a) Shi, X., Miao, W., Nelson, J. C. & Tchetgen Tchetgen, E. J. (2020a). Multiply robust causal inference with double-negative control adjustment for categorical unmeasured confounding. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82, 521–540.
- Shi et al. (2020b) Shi, X., Miao, W. & Tchetgen Tchetgen, E. (2020b). A selective review of negative control methods in epidemiology. Current Epidemiology Reports , 1–13.
- Stephens et al. (2014) Stephens, A., Tchetgen, E. T. & De Gruttola, V. (2014). Locally efficient estimation of marginal treatment effects when outcomes are correlated: is the prize worth the chase? The international journal of biostatistics 10, 59–75.
- Streeter et al. (2017) Streeter, A. J., Lin, N. X., Crathorne, L., Haasova, M., Hyde, C., Melzer, D. & Henley, W. E. (2017). Adjusting for unmeasured confounding in nonrandomized longitudinal studies: a methodological review. Journal of clinical epidemiology 87, 23–34.
- Sullivan et al. (2016) Sullivan, S. G., Tchetgen Tchetgen, E. J. & Cowling, B. J. (2016). Theoretical basis of the test-negative study design for assessment of influenza vaccine effectiveness. American journal of epidemiology 184, 345–353.
- Tchetgen Tchetgen (2014) Tchetgen Tchetgen, E. (2014). The control outcome calibration approach for causal inference with unobserved confounding. American journal of epidemiology 179, 633–640.
- Tchetgen Tchetgen et al. (2020) Tchetgen Tchetgen, E. J., Ying, A., Cui, Y., Shi, X. & Miao, W. (2020). An introduction to proximal causal learning. arXiv preprint arXiv:2009.10982 .
- Van der Vaart (2000) Van der Vaart, A. (2000). Asymptotic statistics, vol. 3. Cambridge university press.
Supplementary Material to “Proximal learning of odds ratio under confounded outcome-dependent sampling designs”
S1 Proofs of Theorems.
S1.1 Proof of Theorem 2.1
To prove Equation (2), we have
(21) | ||||
(22) | ||||
To prove the uniqueness of a solution to Equation (2) under Assumption 2.2(b), we suppose there exists another function that solves Equation (2), that is,
Then
By Assumption 2.2(b), almost surely.
S1.2 Proof of Theorem 2.2
We first state a useful result for identification:
Lemma S1.
By Lemma S1, it suffices to prove that . Under Assumptions (2.2) and (2.2), solves Equation (1). We have
(25) |
S1.3 Proof of Theorem 2.3
First, we show that Equation (5) is equivalent to the moment equation
(26) |
This is because
Rearranging the terms, we obtain Equation (6).
To prove the uniqueness of a solution to Equation (6) under Assumption 2.3(b), we suppose there exists another function that solves Equation (6), that is,
Then
By Assumption 2.3(b), almost surely.
S1.4 Proof of Theorem 2.4
Under Model 2.1 and Assumptions 2.1 and 2.2, the result in Lemma S1 holds. It suffices to prove that
(28) |
for .
For , we have that
S1.5 Proof of Theorem 2.5
(29) |
for .
(a) If Assumptions 2.2 and Equation (1) hold, then solves Equation (2) and the conclusion in Theorem 2.2 holds.
The right-hand side of Equation (29) equals
(b) If Assumptions 2.2 and Equation (5) hold, then solves Equation (27) and the conclusion in Theorem 2.4 holds. The right-hand side of Equation (29) equals
S1.6 Proof of Theorem 2.6
To prove Equation (10), we have
S1.7 Proof of Theorem 2.9
S2 Approximate equivalence of RR and OR for a rare outcome
Under Model 2.1, the conditional odds ratio is
Consider the conditional risk ratio
Under a rare infection assumption that
almost surely for and a small , OR and RR satisfy
(30) |
If , then .
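As a numerical illustration of this approximation, the following minimal Python sketch compares the odds ratio and the risk ratio computed from two hypothetical risks `p1` and `p0` (treated and untreated; these names and values are assumptions for illustration, not quantities from the paper). It relies only on the elementary identity OR = RR × (1 − p0)/(1 − p1), so that when both risks are at most δ, the ratio OR/RR lies between 1 − δ and 1/(1 − δ). The exact form of this bound may differ from that of Equation (30).

```python
def odds_ratio(p1, p0):
    """Odds ratio comparing risk p1 (treated) with risk p0 (untreated)."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


def risk_ratio(p1, p0):
    """Risk ratio comparing risk p1 (treated) with risk p0 (untreated)."""
    return p1 / p0


def or_rr_bounds(delta):
    """Elementary bounds on OR/RR when both risks are at most delta."""
    return 1.0 - delta, 1.0 / (1.0 - delta)


# Hypothetical risks, chosen only for illustration.
delta = 0.01
p1, p0 = 0.004, 0.008
ratio = odds_ratio(p1, p0) / risk_ratio(p1, p0)
lower, upper = or_rr_bounds(delta)
print(f"OR = {odds_ratio(p1, p0):.4f}, RR = {risk_ratio(p1, p0):.4f}")
print(f"OR/RR = {ratio:.4f} lies in [{lower:.4f}, {upper:.4f}]")
```

As δ shrinks, both bounds approach one, which is the sense in which the odds ratio approximates the risk ratio for a rare outcome.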
Furthermore, denote as the potential outcome had the individual received the treatment . We make the following standard identifiability assumptions (Hernán MA, 2020):
(a) (Consistency) if ;
(b) (Confounded ignorability) for ;
(c) (Positivity) almost surely.
Under these assumptions, the marginal causal risk ratio is
cRR | |||
and
If , then .
S3 Identification of marginal causal risk ratio in a test-negative design with diseased controls
Similar to Schnitzer (2022), we denote as a categorical variable indicating a subject’s outcome, where indicates , i.e. the subject is positive for the infection of interest; indicates that the subject has but is positive for another outcome, referred to as the “control infection”, that is known a priori to not be affected by ; and indicates that neither outcome is positive. To formalize the assumption of no treatment effect on the control disease, we assume
(31) |
As sample selection is only restricted to subjects with infection, we have
(32) |
Rather than Assumption 2.1, we make the stronger assumption that
(33) |
That is, a subject’s treatment status has no impact on the sample selection. This assumption holds in a test-negative study if, given a subject’s disease status and other traits included in , their decision to seek care does not depend on their treatment status.
Under Model 2.1 and above assumptions, we have
Under the standard identifiability assumptions in Section S2, the marginal causal risk ratio is
cRR | |||
In the above discussion, we do not invoke the rare outcome assumption.
S4 Regularity conditions for the existence of a solution to Equations (1) and (26)
The results in this section are directly adapted from Appendix B of Cui et al. (2020) and Section B of the Supplementary Materials of Li et al. (2022).
Let denote the Hilbert space of all square-integrable functions of with respect to distribution function , equipped with inner product . Let denote the conditional expectation operator and let denote a singular value decomposition of . Similarly, let denote the conditional expectation operator and let denote a singular value decomposition of . Consider the following regularity conditions:
(1) ;
(2) ;
(3) ;
(4) .
S5 Summary of our assumptions and results and those in Li et al. (2022)
 | Li et al. (2022) Sections 2.2-2.4 | PIPW identification (Li et al. (2022) Section 2.5) |
Model | Model 2.1 | Model 2.1 |
Interpretation of the treatment effect parameter | conditional risk ratio | conditional odds ratio |
Selection mechanism | Assumption 2.1 | Assumption 2.1 |
Proximal independence | Assumption 2.2 | Assumption 2.2 |
Treatment confounding bridge function definition | Equation (1) | Equation (1) |
Outcome confounding bridge function definition | / | / |
Completeness | Assumption 2.2(b) | Assumption 2.2(b) |
Identification of the treatment confounding bridge function | Equation (2) | Equation (2) |
Identification of the outcome confounding bridge function | / | / |
Identification of the treatment effect parameter | Equation (4) | Equation (4) |
Note | (1) Identification of and requires a rare outcome condition; (2) Under standard identifiability assumptions in Section S2, identifies the marginal causal risk ratio. (3) The proximal independence conditions can be replaced by Assumption 2.2 | Under standard identifiability assumptions in Section S2, if either the outcome is rare or under the setting of Section S3, identifies the marginal causal risk ratio. |

 | POR identification | PDR closed-form expression |
Model | Model 2.1 | Model 2.1 |
Interpretation of the treatment effect parameter | conditional odds ratio | conditional odds ratio |
Selection mechanism | Assumption 2.1 | Assumption 2.1 |
Proximal independence | Assumption 2.2 | Assumption 2.2 |
Treatment confounding bridge function definition | / | Equation (1) |
Outcome confounding bridge function definition | Equation (5) | Equation (5) |
Completeness | Assumption 2.3(b) | Assumptions 2.2(b), 2.3(b) |
Identification of the treatment confounding bridge function | / | Equation (2) + Assumption 2.2(b) |
Identification of the outcome confounding bridge function | Equation (6) | Equation (6) + Assumption 2.3(b) |
Identification of the treatment effect parameter | Equation (8) | Equation (9) |
Note | Under standard identifiability assumptions in Section S2, if either the outcome is rare or under the setting of Section S3, identifies the marginal causal risk ratio. | (1) Under standard identifiability assumptions in Section S2, if either the outcome is rare or under the setting of Section S3, identifies the marginal causal risk ratio. (2) Equation (8) holds if either is a solution to Equation (1) or is a solution to Equation (5). |
S6 Detailed setting and derivation of and in Example 2.8
Suppose are both multivariate Gaussian random variables satisfying
Assume that the treatment and outcome both follow a logistic regression model given :
Further assume the selection indicator follows a log-linear model of the form:
It is straightforward to verify that the data distribution satisfies Model 2.1 and Assumptions 2.1-2.2. Then a suitable model for is
(34) |
While a closed-form parametric model for does not exist in this example, if the outcome is rare in the target population in the sense that
then the parametric model below may serve as an approximation
(35) |
We first derive the outcome confounding bridge function that satisfies Equation (5):
We have
Therefore
and
We further have
Now assume that , then
Hence satisfies
Here denotes the generalised inverse of the matrix .
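A generalised inverse $G$ of a matrix $M$ is any matrix satisfying $M G M = M$. The sketch below uses the Moore-Penrose pseudoinverse from NumPy as one concrete choice; the matrix `M` is a hypothetical rank-deficient example, and whether this particular generalised inverse is the one intended in the derivation above is an assumption.

```python
import numpy as np

# Hypothetical rank-deficient matrix (second row is twice the first).
M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

# Moore-Penrose pseudoinverse: one particular generalised inverse of M.
G = np.linalg.pinv(M)

# Defining property of a generalised inverse: M @ G @ M equals M.
is_generalised_inverse = np.allclose(M @ G @ M, M)
print(G.shape, is_generalised_inverse)
```

When the matrix is square and invertible, the pseudoinverse coincides with the ordinary inverse, so nothing is lost by using it in the full-rank case.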
To derive , we have
and | |||
Therefore
In the case where are small almost surely, both and are necessarily small, then
and therefore
We need to find a function that satisfies Equation (1):
and
We consider , then
and
Hence satisfies
S7 Cross-fitting kernel learning estimator for nonparametric estimation of .
Under Model 2.1 and Assumptions 2.1 and 2.2, by Lemma S1, estimation of reduces to estimation of . We first derive an influence function of .
Theorem S1.
An influence function for is
Proof S2.
We let Because , their influence functions and satisfy
We proceed to derive .
Consider regular one-dimensional parametric submodel indexed by , where indicates the truth, we have
where denotes the score function.
To calculate , we note that
Taking derivatives w.r.t on both sides and rearranging the terms, we have
(36) | |||
(37) | |||
(38) |
In Equation (36), taking , we have
In Equation (38), by taking , we have
We therefore have
Hence
and
Similarly,
The influence function can be written as
(39) |
where
and |
This influence function belongs to the class of doubly robust influence functions considered in Ghassami et al. (2021), and thus their method applies directly. By Theorem 1 of Rotnitzky et al. (2021), such an influence function also satisfies the so-called “mixed-bias” structure, and therefore $\sqrt{n}$-consistent doubly robust estimation is possible even when the nuisance functions are estimated nonparametrically. In this section, we describe a cross-fitting minimax kernel learning estimator for and state the theoretical properties of the resulting estimator of .
We randomly partition the data evenly into subsamples. For each , let and be the indices and size of data in the th sample, respectively. Let be the size of the sample excluding . Write and . As proposed in Ghassami et al. (2021), we estimate the nuisance functions and using data from by solving the following optimization problems:
(40) | ||||
(41) |
where , , and are reproducing kernel Hilbert spaces (RKHS) generated by the kernel functions , , and , equipped with norms , , and , respectively, and , , and are non-negative regularizing parameters. The above optimization problems have closed-form solutions, which we defer to the end of this section.
The objective function for can be viewed as a regularised version of empirical perturbation evaluated using data from , by moving the influence function from the pair to , where the perturbation is defined as
The objective function contains two forms of regularization: the term improves the robustness against model misspecification, stability and the convergence rates, while the Tikhonov-type penalties and penalise the complexity of the classes of nuisance functions considered. The objective function for can be interpreted similarly.
We estimate with the th subsample with the PDR estimator
(42) |
The final estimator for is , which gives the cross-fitting minimax kernel learning estimator for as We summarise the estimation of in Algorithm S7.
Algorithm S7. A cross-fitting minimax kernel learning algorithm to estimate
Split the data into even subsamples
For to
For to
  Solve and in Equations (40) and (41)
  Obtain according to Equation (42)
Output .
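The cross-fitting structure of the algorithm can be sketched generically as follows. This is a minimal illustration, not the minimax kernel estimator itself: `fit_nuisance` and `fold_estimate` are hypothetical callables standing in for the solutions of Equations (40) and (41) and for the fold-specific PDR estimator, and the sketch assumes the usual convention that nuisance functions are fitted on the data outside a fold and that the final estimate averages the fold-specific estimates.

```python
import numpy as np


def cross_fit(data, n_folds, fit_nuisance, fold_estimate, seed=0):
    """Generic cross-fitting loop.

    fit_nuisance(train) -> fitted nuisance object (hypothetical callable)
    fold_estimate(held_out, nuisance) -> scalar estimate (hypothetical callable)
    """
    n = data.shape[0]
    rng = np.random.default_rng(seed)
    # Randomly partition the indices into folds of (nearly) equal size.
    folds = np.array_split(rng.permutation(n), n_folds)
    estimates = []
    for held_out in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[held_out] = False
        # Nuisances are fit on the data outside the current fold ...
        nuisance = fit_nuisance(data[train_mask])
        # ... and the target is estimated on the held-out fold only.
        estimates.append(fold_estimate(data[held_out], nuisance))
    # Assumption: the final estimate averages the fold-specific estimates.
    return float(np.mean(estimates)), folds


# Toy check with stand-in callables (not the bridge-function estimators).
data = np.arange(12.0)
estimate, folds = cross_fit(
    data,
    n_folds=3,
    fit_nuisance=lambda train: train.mean(),
    fold_estimate=lambda held, m: m + np.mean(held - m),
)
print(estimate)
```

Separating the data used for nuisance fitting from the data used for evaluation is what allows flexible nuisance estimators to be used without restricting their complexity.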
The estimator is consistent and asymptotically normal, thus allowing valid statistical inference, as stated in Theorem S3 below.
Theorem S3.
Implementation of this approach is challenging, as the performance of the method proves to be sensitive to the value of tuning parameters. We leave an efficient implementation of this approach to future research.
The closed form solutions for and are
where
is a diagonal matrix with elements , and are respectively the empirical kernel matrices with data in the sample , is the -vector with elements , and is the -vector with elements .
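The closed-form solutions above are expressed through empirical kernel matrices. As a minimal sketch of how such a matrix might be formed, the snippet below builds a Gram matrix with a Gaussian (RBF) kernel; the kernel choice, the parameter `bandwidth`, and the toy inputs are assumptions for illustration and are not specified in this section.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Empirical kernel matrix for inputs x of shape (n, d).

    K[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)).
    """
    x = np.asarray(x, dtype=float)
    sq_norms = np.sum(x**2, axis=1)
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


# Toy inputs: five observations of a two-dimensional variable.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
K = gaussian_gram(x, bandwidth=1.0)
print(K.shape)
```

The resulting matrix is symmetric and positive semi-definite, which is what makes the regularised optimization problems solvable in closed form by linear algebra.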
S8 Regularity conditions in Theorem S3
For a function class , let .
Let and be the eigenvalues and eigenfunctions of the RKHS and and be the eigenvalues and eigenfunctions of the RKHS . For any , let and be the matrices with entry defined respectively as
and
Let and be the minimum eigenvalues of and , respectively.
R.1 The functions , and are bounded.
R.2 There exists a constant such that
almost surely for .
R.3 The nuisance functions and have finite second moments.
R.4 The nuisance functions and their estimators and in each fold of cross-fitting satisfy
R.5 , ;
R.6 There exists a constant such that and for .
R.7 and
for some constant and .
R.8 and
R.9 , , , and , where the decay rates , , and satisfy
S9 Proof of Theorem S3
Under Model 2.1 and Assumptions 2.1, 2.2 and 2.2, Theorem 2.2(b) states that . By the delta method, it suffices to prove that is asymptotically linear with the influence function , the proof of which follows Theorem 2 and Lemma 1 of Ghassami et al. (2021), which we reproduce below.
For and a given value , we define and . We define the measures of ill-posedness and .
Lemma S1.
The result follows by the theorem below:
S10 Details of the simulation studies
We generate the data for a target population of observations according to the following five scenarios:
Scenario I: Univariate binary , and
Scenarios II-V: Continuous , , , .
S11 Estimating conditional odds ratio under effect modification by
In Section 2, we assumed that the odds ratio effect size is constant across all strata. In practice, however, the treatment effect may be heterogeneous. We therefore relax Model 2.1 and propose Model S11 below, which allows a heterogeneous odds ratio across strata:
Model S11 (Heterogeneous odds ratio model by ).
(44) |
In Model S11, the odds ratio treatment effect is homogeneous with respect to the latent factors but is allowed to vary by . In such scenarios, if the negative control variables are available, the odds ratio function can still be identified, similar to Theorems 2.2, 2.4 and 2.5, as stated in the following theorem:
Theorem S1.
In cases where a parametric model for the odds ratio function is considered appropriate with a finite-dimensional parameter , similar to the previous PIPW, POR and PDR estimation, the low-dimensional parameter can be estimated by solving the estimating equations
(45) | ||||
(46) | ||||
or | ||||
(47) |
where and are estimators for the parameters indexing the treatment and outcome confounding bridge functions, obtained by solving Equations (12) and (13), respectively, and is a user-specified function with dimension no smaller than that of .
Alternatively, a similar kernel method to the one in Section S7 of the Supplementary Materials can be used to obtain a cross-fitting minimax learning estimator of which solves Equation (47) and nonparametrically estimates the nuisance functions and . With an estimator using either of the above methods, the conditional odds ratio at can then be estimated by .
Nonparametric estimation of is possible, for example, by using the minimax learning approach (Dikkala et al., 2020) or the semi-nonparametric sieve generalised method of moments approach (Ai & Chen, 2003; Chen, 2007). Such a nonparametric estimator may be of interest, for example, to account more flexibly for the heterogeneity of treatment effects and to make flexible predictions of the “individual treatment effect”.
S12 Extension to polytomous treatment and effect
Suppose we want to study the effect of a polytomous treatment on a polytomous outcome , where has levels and has levels . Here and denote the reference treatment and outcome, respectively. We make the following homogeneity assumption:
Model S12 (Homogeneous odds ratio model).
Model S12 indicates that the odds ratio of relative to against the treatment versus is across all strata of . Model S12 is a natural semiparametric extension to the polytomous logistic regression model (Engel, 1988). When and , Model S12 reduces to Model 2.1. The goal is to estimate and make inference on every in the presence of latent factors and confounded outcome-dependent sampling.
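To make these generalised odds ratios concrete, the following sketch computes, from a hypothetical table of joint probabilities of treatment level (rows) and outcome level (columns) within a single stratum, the odds ratio of each outcome level relative to the reference outcome, comparing each treatment level with the reference treatment. Placing the reference levels in row 0 and column 0, and the tables themselves, are assumptions for illustration.

```python
import numpy as np


def reference_odds_ratios(table):
    """Odds ratios relative to the reference treatment (row 0) and outcome (column 0).

    Entry [k, j] of the result is
        table[k, j] * table[0, 0] / (table[0, j] * table[k, 0]),
    so the first row and first column are identically one.
    """
    table = np.asarray(table, dtype=float)
    return table * table[0, 0] / (table[0:1, :] * table[:, 0:1])


# Hypothetical 2 x 2 table: with binary treatment and outcome this is the usual odds ratio.
binary = np.array([[0.4, 0.1],
                   [0.2, 0.3]])
print(reference_odds_ratios(binary)[1, 1])

# Hypothetical 3 x 3 table for a polytomous treatment and outcome.
poly = np.array([[0.20, 0.10, 0.05],
                 [0.10, 0.15, 0.10],
                 [0.05, 0.10, 0.15]])
print(reference_odds_ratios(poly))
```

Under the homogeneity model, each of these entries would take the same value in every stratum; the sketch only illustrates the definition within one stratum.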
We make the following assumptions, corresponding to Assumptions 2.1, 2.2 and 2.3:
Assumption (No effect modification by on the outcome-dependent selection). For and real-valued unknown functions , the sampling mechanism satisfies
for .
Assumption (Confounding bridge functions). There exists a treatment confounding bridge function and outcome confounding bridge functions such that for and ,
(48) | |||
(49) |
We state the following results for inference of . The proofs are similar to those in Section S1 and are therefore omitted.
Lemma S1.
Theorem S2 (Identification of and ).
The functions and , , satisfy the moment conditions
where and are arbitrary functions.
Theorem S3 (Identification of ).
By Theorems S2 and S3, to estimate one may:
1. Specify suitable parametric working models and ;
2. Estimate the nuisance parameters and by solving
for , where and are user-specified functions with dimensions at least as large as those of and respectively. Denote the resulting estimators as and .
3. Estimate by
or for and and a user-specified real-valued function (one may simply set ).
4. The resulting estimators for include the PIPW estimator , the POR estimator , and the PDR estimator
Under Model S12 and Assumptions 2.2 and S12, the PDR estimator is doubly robust, i.e. it is consistent if either (i) the parametric working model is correctly specified and the completeness Assumption 2.2(b) holds, or (ii) the parametric working model is correctly specified and Assumption 2.3(b) holds.
The kernel learning approach in Section S7 can similarly be employed to obtain semiparametric estimators for with flexible modeling for and .