Flexible sensitivity analysis for causal inference in observational studies subject to unmeasured confounding

Sizhu Lu and Peng Ding

Department of Statistics, University of California, Berkeley, CA 94720, USA (Emails: [email protected] and [email protected])
Abstract

Causal inference with observational studies often suffers from unmeasured confounding, yielding biased estimators based on the unconfoundedness assumption. Sensitivity analysis assesses how the causal conclusions change with respect to different degrees of unmeasured confounding. Most existing sensitivity analysis methods work well for specific types of statistical estimation or testing strategies. We propose a flexible sensitivity analysis framework that can deal with commonly-used inverse probability weighting, outcome regression, and doubly robust estimators simultaneously. It is based on the well-known parametrization of the selection bias as comparisons of the observed and counterfactual outcomes conditional on observed covariates. It is attractive for practical use because it only requires simple modifications of the standard estimators. Moreover, it naturally extends to many other causal inference settings, including the causal risk ratio or odds ratio, the average causal effect on the treated units, and studies with survival outcomes. We also develop an R package saci to implement our sensitivity analysis estimators.

Keywords: Double robustness; Efficient influence function; Ignorability; Inverse probability weighting; Potential outcome; Selection bias

1 Introduction

Learning causal effects is of great importance in empirical research, yet it presents a persistent challenge. When randomized controlled trials are not feasible, we must rely on observational data to conduct causal inference. The standard method for identifying causal parameters, such as the average causal effect, hinges upon the unconfoundedness assumption. Unconfoundedness postulates that the potential outcomes are conditionally independent of the treatment assignment given the observed covariates. However, unconfoundedness is untestable and necessitates the measurement of all confounding factors associated with both the treatment and the outcome. The presence of unmeasured confounders biases estimates of the true causal effect, even with a substantial sample size.

Motivated by the concern of unmeasured confounding, sensitivity analysis becomes an attractive alternative to assess the impact of unobserved confounders on causal estimates. Rosenbaum and Rubin (1983a), Lin et al. (1998), and Imbens (2003) built parametric models to assess the impact of an unobserved confounder on the estimation of the average causal effect. Rosenbaum (1987) proposed a sensitivity analysis framework to test the sharp null hypothesis of no unit-level causal effects in matched observational studies, with a recent application by Zubizarreta et al. (2013). Cornfield et al. (1959) and Ding and VanderWeele (2016) derived inequalities quantifying how strong unmeasured confounding must be to explain away observed causal estimates based on risk ratios, which motivated the concept of the E-value for causal inference in observational research (VanderWeele and Ding 2017). Zhao et al. (2019) and Dorn and Guo (2022) developed sensitivity analysis methods for the inverse propensity score weighting estimator.

The sensitivity analysis methods mentioned above are useful for specific statistical estimation or testing strategies. However, they may be limited in their ability to deal simultaneously with various types of estimators, such as outcome regression-based estimators, inverse propensity score weighting estimators, and doubly robust estimators. To develop a comprehensive approach, we propose a unified framework for sensitivity analysis adaptable to different estimators. Our framework uses sensitivity parameters defined as the ratios between the conditional means of the potential outcomes in the two treatment groups given the observed covariates (Robins 1999). We provide nonparametric identification formulas for the average causal effect given the specified sensitivity parameters. Our framework facilitates sensitivity analysis for various standard estimators by providing closed-form modifications. Furthermore, we give practical guidance on calibrating and specifying the sensitivity parameters to interpret the sensitivity analysis results.

1.1 A running example

Consider the causal impact of smoking on individuals’ health as a motivating example. Randomly assigning individuals to smoke is neither practical nor ethical, so we must rely on observational data to learn the causal effect of interest. Specifically, the observational study in Bazzano et al. (2003) explored whether cigarette smoking positively affects homocysteine levels, a known risk factor for cardiovascular disease. Drawing upon observations from the U.S. National Health and Nutrition Examination Survey (NHANES) 2005–2006, the study compares homocysteine levels between smokers and non-smokers while adjusting for observed covariates such as age, gender, race, and body mass index (BMI) under the unconfoundedness assumption. However, in the presence of suspected underlying factors associated with both an increased probability of smoking and elevated homocysteine levels, such as certain gene expressions, estimates adjusted solely for observed covariates are prone to bias. In such instances, conducting sensitivity analyses becomes essential to discern how the estimated causal effect changes in response to unmeasured confounders. Such analyses help determine the degree to which we would need to deviate from the unconfoundedness assumption to attribute the estimated positive effect to unobserved confounders. We will revisit this example in Section 5.1 to illustrate our proposed methods.

1.2 Organization of the paper

The remainder of the paper is organized as follows. In Section 2, we review causal inference in observational studies under the unconfoundedness assumption. In Section 3, we introduce our sensitivity analysis framework and provide identification and estimation procedures. In Section 4, we provide practical suggestions for calibration and specification of the sensitivity parameters. In Section 5, we revisit the motivating example, apply the proposed method to this real-world application, and conduct a simulation study to evaluate the finite sample performance of our proposed methods. In Section 6, we discuss alternative sensitivity parameters as well as the extensions to other causal inference settings. In Section 7, we conclude. In the supplementary material, we present additional technical details.

2 Review of causal inference in unconfounded observational studies

Causal inference in observational studies is a central task in many disciplines. The potential outcomes framework is a leading approach to causal inference in statistics. Let $Y_i(1)$ and $Y_i(0)$ denote the potential outcomes if unit $i$ receives the treatment and control, respectively. Let $Z_i$ denote the binary treatment, with $Z_i=1$ if unit $i$ receives the treatment and $Z_i=0$ if unit $i$ receives the control. Let $Y_i=Y_i(Z_i)=Z_iY_i(1)+(1-Z_i)Y_i(0)$ denote the observed outcome, and $X_i$ denote the observed covariates for unit $i$. We focus on the standard setting with independent and identically distributed $\{X_i,Z_i,Y_i(1),Y_i(0):i=1,\ldots,n\}$, and will drop the index $i$ henceforth when describing population quantities. We start with the average causal effect $\tau=E\{Y(1)-Y(0)\}$. By the law of total probability, it decomposes into

\tau = \left[E(Y\mid Z=1)\,\textup{pr}(Z=1)+E\{Y(1)\mid Z=0\}\,\textup{pr}(Z=0)\right]
\quad\ - \left[E\{Y(0)\mid Z=1\}\,\textup{pr}(Z=1)+E(Y\mid Z=0)\,\textup{pr}(Z=0)\right]. \quad (1)

In (1), the fundamental difficulty is to estimate the counterfactual means $E\{Y(1)\mid Z=0\}$ and $E\{Y(0)\mid Z=1\}$, which are not identifiable without further assumptions. Rubin (1978) and Rosenbaum and Rubin (1983b) proposed the following unconfoundedness assumption as a sufficient condition to identify $\tau$.

Assumption 1 (unconfoundedness).

$Z \perp\!\!\!\perp \{Y(1),Y(0)\}\mid X$.

Under Assumption 1, we can prove two identification formulas for $\tau$, which write our causal parameter of interest in terms of the observed data distribution:

\tau = E\{\mu_1(X)-\mu_0(X)\} = E\left\{\frac{ZY}{e(X)}-\frac{(1-Z)Y}{1-e(X)}\right\}, \quad (2)

where $\mu_1(X)=E(Y\mid Z=1,X)$ and $\mu_0(X)=E(Y\mid Z=0,X)$ are the conditional means of the outcome given covariates under treatment and control, respectively, and $e(X)=\textup{pr}(Z=1\mid X)$ is the conditional mean of the treatment given covariates, also called the propensity score (Rosenbaum and Rubin 1983b). The formulas in (2) rely on the overlap assumption $0<e(X)<1$. We assume it implicitly, since the focus of this paper is on the unconfoundedness assumption.

The two plug-in estimators corresponding to the two identification formulas for $\tau$ in (2) are the outcome regression estimator and the Horvitz–Thompson-type inverse propensity score weighting estimator, respectively:

\hat{\tau}^{\textup{reg}} = n^{-1}\sum_{i=1}^{n}\{\hat{\mu}_1(X_i)-\hat{\mu}_0(X_i)\}, \quad \hat{\tau}^{\textup{ht}} = n^{-1}\sum_{i=1}^{n}\left\{\frac{Z_iY_i}{\hat{e}(X_i)}-\frac{(1-Z_i)Y_i}{1-\hat{e}(X_i)}\right\},

with $\hat{e}(X_i)$ and $\hat{\mu}_z(X_i)$ signifying the fitted propensity score and outcome models. Moreover, we can construct the doubly robust estimator by combining both models (Bang and Robins 2005):

\hat{\tau}^{\textup{dr}} = \hat{\tau}^{\textup{reg}}+n^{-1}\sum_{i=1}^{n}\left\{\frac{Z_i\check{Y}_i}{\hat{e}(X_i)}-\frac{(1-Z_i)\check{Y}_i}{1-\hat{e}(X_i)}\right\} = \hat{\tau}^{\textup{ht}}-n^{-1}\sum_{i=1}^{n}\check{Z}_i\left\{\frac{\hat{\mu}_1(X_i)}{\hat{e}(X_i)}+\frac{\hat{\mu}_0(X_i)}{1-\hat{e}(X_i)}\right\}, \quad (3)

where $\check{Y}_i=Y_i-\hat{\mu}_{Z_i}(X_i)$ and $\check{Z}_i=Z_i-\hat{e}(X_i)$ are the residuals from the outcome model and the propensity score model, respectively. The two equivalent forms in (3) highlight that we can view $\hat{\tau}^{\textup{dr}}$ as a modification of either $\hat{\tau}^{\textup{reg}}$ or $\hat{\tau}^{\textup{ht}}$: it modifies $\hat{\tau}^{\textup{reg}}$ by inverse propensity score weighted residuals, and augments $\hat{\tau}^{\textup{ht}}$ by the imputed outcomes.
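To make these estimators concrete, the following R sketch computes $\hat{\tau}^{\textup{reg}}$, $\hat{\tau}^{\textup{ht}}$, and $\hat{\tau}^{\textup{dr}}$ with a logistic propensity score model and linear outcome models. The data layout and the covariate names x1 and x2 are our illustrative assumptions, not the interface of the saci package.

# Minimal sketch of the three standard estimators under unconfoundedness.
# Assumes a data frame `data` with binary treatment z, outcome y, and covariates x1, x2.
z <- data$z; y <- data$y
e_hat <- fitted(glm(z ~ x1 + x2, family = binomial, data = data))       # \hat{e}(X_i)
mu1 <- predict(lm(y ~ x1 + x2, data = data[z == 1, ]), newdata = data)  # \hat{\mu}_1(X_i)
mu0 <- predict(lm(y ~ x1 + x2, data = data[z == 0, ]), newdata = data)  # \hat{\mu}_0(X_i)
tau_reg <- mean(mu1 - mu0)
tau_ht  <- mean(z * y / e_hat - (1 - z) * y / (1 - e_hat))
# first form of (3): outcome regression plus inverse probability weighted residuals
tau_dr  <- tau_reg + mean(z * (y - mu1) / e_hat - (1 - z) * (y - mu0) / (1 - e_hat))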

3 Sensitivity analysis with unmeasured confounding

3.1 Motivation for sensitivity analysis and parametrization of unmeasured confounding

Assumption 1 is strong and untestable. Observational studies are often plagued with unmeasured confounding, that is, the existence of some hidden variables $U$ that affect both the treatment and the outcome simultaneously. Under these circumstances, we need to conduct sensitivity analyses to assess the impact of $U$ on the causal estimates. Existing methods only serve specific statistical estimation or testing strategies, as reviewed in Section 1. To provide a unified framework that can deal with the standard estimators $\hat{\tau}^{\textup{reg}}$, $\hat{\tau}^{\textup{ht}}$, and $\hat{\tau}^{\textup{dr}}$ reviewed in Section 2 simultaneously, we adopt the following parametrization of confounding.

Definition 1.

Define

\frac{E\{Y(1)\mid Z=1,X\}}{E\{Y(1)\mid Z=0,X\}}=\varepsilon_1(X), \quad \frac{E\{Y(0)\mid Z=1,X\}}{E\{Y(0)\mid Z=0,X\}}=\varepsilon_0(X).

In Definition 1, $\varepsilon_1(X)$ and $\varepsilon_0(X)$ are two sensitivity parameters. In sensitivity analysis, we can first fix them to obtain the corresponding estimators and then vary them within a range to obtain a sequence of estimators. Definition 1, as well as the theory below, allows $\varepsilon_1(X)$ and $\varepsilon_0(X)$ to depend on the covariates. For simplicity of implementation, we can further assume that they are not functions of the covariates.

The parametrization in Definition 1 compares the observable conditional means of the potential outcomes, $E\{Y(1)\mid Z=1,X\}$ and $E\{Y(0)\mid Z=0,X\}$, with the corresponding counterfactual conditional means, $E\{Y(1)\mid Z=0,X\}$ and $E\{Y(0)\mid Z=1,X\}$, respectively. It quantifies the violation of unconfoundedness in Assumption 1. Technically, the form of unconfoundedness in Assumption 1 is stronger than we need if the parameter of interest is $\tau$. When $\varepsilon_1(X)=\varepsilon_0(X)=1$ in Definition 1, we can recover all identification and estimation results for $\tau$ in Section 2. We revisit the running example in Section 1.1. If suspected factors associated with both an increased probability of smoking and elevated homocysteine levels exist, then under Definition 1, $\varepsilon_1(X)>1$ and $\varepsilon_0(X)>1$, leading to an upward bias of estimators of $\tau$ constructed under the unconfoundedness assumption. The sensitivity parameters in Definition 1 are also related to the parametrization based on the treatment selection model $\textup{pr}\{Z=z\mid X,Y(z)\}$ for $z=0,1$. We provide more discussion in Section S1.8 in the supplementary material.

Robins (1999) and Scharfstein et al. (1999) discussed the potential use of the parametrization in Definition 1. Similar parametrizations also appeared in other causal inference settings (Tchetgen and Shpitser 2012; VanderWeele et al. 2014; Ding and Lu 2017; Yang and Lok 2018; Jiang et al. 2022). The confounding function in Kasza et al. (2017) also adopts ratios, but between the marginal means $E\{Y(1)\mid Z=1\}/E\{Y(1)\mid Z=0\}$ in the special case of binary outcomes, without specifying it as a flexible function of the observed covariates. Franks et al. (2020) used a similar parametrization in Bayesian inference and pointed out that the parametrization in Definition 1 does not impose any testable restrictions on the observed data distribution. Nevertheless, despite the attractiveness of this parametrization, the statistical theory for sensitivity analysis is still missing for the standard estimators in the canonical setting of observational studies.

3.2 Identification and estimation under sensitivity analysis

This subsection will present the key identification and estimation results for $\tau$, in parallel with the results under the unconfoundedness assumption. In particular, by setting $\varepsilon_1(X)=\varepsilon_0(X)=1$ in the formulas below, we can recover the corresponding results under the unconfoundedness assumption. We will provide theory and methods for the outcome regression, inverse propensity score weighting, and doubly robust estimators of $\tau$ under Definition 1.

Theorem 1 (outcome regression).

Under Definition 1, we have

E\{Y(1)\mid Z=0\} = E\left\{\mu_1(X)/\varepsilon_1(X)\mid Z=0\right\}, \quad (4)
E\{Y(0)\mid Z=1\} = E\left\{\mu_0(X)\varepsilon_0(X)\mid Z=1\right\} \quad (5)

and therefore

\tau = E\{ZY+(1-Z)\mu_1(X)/\varepsilon_1(X)\}-E\{Z\mu_0(X)\varepsilon_0(X)+(1-Z)Y\} \quad (6)
\quad\ = E\{Z\mu_1(X)+(1-Z)\mu_1(X)/\varepsilon_1(X)\}-E\{Z\mu_0(X)\varepsilon_0(X)+(1-Z)\mu_0(X)\}. \quad (7)

In Theorem 1, we essentially obtain the identification of $E\{Y(1)\}$ and $E\{Y(0)\}$ from the identification formulas for the two counterfactual conditional means in (4) and (5), which also lead to the two identification formulas for $\tau$ in (6) and (7). Under the unconfoundedness assumption with $\varepsilon_1(X)=\varepsilon_0(X)=1$, (7) reduces to the first identification formula in (2). For general $\varepsilon_z(X)$, we need to reweight the conditional means $\mu_z(X)$ in the identification of the counterfactual means $E\{Y(z)\mid Z=1-z\}$.

With the fitted outcome model, we can construct two plug-in estimators corresponding to (6) and (7), called the predictive and projective estimators of $\tau$, respectively:

\hat{\tau}^{\textup{pred}} = n^{-1}\sum_{i=1}^{n}\left\{Z_iY_i+(1-Z_i)\hat{\mu}_1(X_i)/\varepsilon_1(X_i)\right\} - n^{-1}\sum_{i=1}^{n}\left\{Z_i\hat{\mu}_0(X_i)\varepsilon_0(X_i)+(1-Z_i)Y_i\right\},

and

\hat{\tau}^{\textup{proj}} = n^{-1}\sum_{i=1}^{n}\left\{Z_i\hat{\mu}_1(X_i)+\frac{(1-Z_i)\hat{\mu}_1(X_i)}{\varepsilon_1(X_i)}\right\} - n^{-1}\sum_{i=1}^{n}\left\{Z_i\hat{\mu}_0(X_i)\varepsilon_0(X_i)+(1-Z_i)\hat{\mu}_0(X_i)\right\}.

The terminologies “predictive” and “projective” are from the survey sampling literature (Firth and Bennett 1998; Ding and Li 2018). The estimators $\hat{\tau}^{\textup{pred}}$ and $\hat{\tau}^{\textup{proj}}$ differ slightly: the former uses the observed outcomes when available; in contrast, the latter replaces the observed outcomes with the fitted values.
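Continuing the R sketch above, and under the simplifying assumption of constant sensitivity parameters eps1 and eps0 (our notation), the two estimators are direct sample analogues of (6) and (7):

# Predictive and projective estimators of Theorem 1 with constant
# sensitivity parameters; z, y, mu1, mu0 are as in the earlier sketch.
eps1 <- 1.1; eps0 <- 1.1   # illustrative values, to be varied in the analysis
tau_pred <- mean(z * y + (1 - z) * mu1 / eps1) -
  mean(z * mu0 * eps0 + (1 - z) * y)
tau_proj <- mean(z * mu1 + (1 - z) * mu1 / eps1) -
  mean(z * mu0 * eps0 + (1 - z) * mu0)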

More interestingly, we can also identify $\tau$ by an inverse propensity score weighting formula, although Definition 1 only involves the conditional means of the potential outcomes.

Theorem 2 (inverse propensity score weighting).

Under Definition 1, we have

E\{Y(1)\}=E\left\{w_1(X)\frac{Z}{e(X)}Y\right\}, \quad E\{Y(0)\}=E\left\{w_0(X)\frac{1-Z}{1-e(X)}Y\right\}, \quad (8)

where

w_1(X)=e(X)+\{1-e(X)\}/\varepsilon_1(X), \quad w_0(X)=e(X)\varepsilon_0(X)+1-e(X).

Theorem 2 modifies the classic inverse probability weighting formulas with two extra factors $w_1(X)$ and $w_0(X)$ depending on both the propensity score and the sensitivity parameters. With the fitted propensity score model, (8) motivates the following estimator for $\tau$:

\hat{\tau}^{\textup{ht}}=n^{-1}\sum_{i=1}^{n}\hat{w}_1(X_i)\frac{Z_iY_i}{\hat{e}(X_i)}-n^{-1}\sum_{i=1}^{n}\hat{w}_0(X_i)\frac{(1-Z_i)Y_i}{1-\hat{e}(X_i)},

where $\hat{w}_1(X_i)=\hat{e}(X_i)+\{1-\hat{e}(X_i)\}/\varepsilon_1(X_i)$ and $\hat{w}_0(X_i)=\hat{e}(X_i)\varepsilon_0(X_i)+1-\hat{e}(X_i)$ are the estimated weights.
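In the notation of the running R sketch, this estimator simply multiplies the usual inverse probability weights by the estimated factors:

# Weighting estimator of Theorem 2 with constant eps1 and eps0.
w1 <- e_hat + (1 - e_hat) / eps1   # \hat{w}_1(X_i)
w0 <- e_hat * eps0 + (1 - e_hat)   # \hat{w}_0(X_i)
tau_ht_sens <- mean(w1 * z * y / e_hat - w0 * (1 - z) * y / (1 - e_hat))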

The identification formulas based on outcome regression and inverse propensity score weighting motivate us to develop a combined strategy that leads to doubly robust estimation. Following Bang and Robins (2005), we calculate the efficient influence function for $\tau$. The background on efficient influence functions is in Bickel et al. (1993), and the proof of Theorem 3 is in Section S3.3 in the supplementary material. Readers focused on applying the method do not need the efficient influence function to follow our proposed estimation procedure.

Theorem 3 (efficient influence functions).

Under Definition 1, the efficient influence function for $E\{Y(1)\}$ is

\phi_1=w_1(X)\frac{Z}{e(X)}Y-\frac{\{Z-e(X)\}\mu_1(X)}{e(X)\varepsilon_1(X)}-E\{Y(1)\},

the efficient influence function for $E\{Y(0)\}$ is

\phi_0=w_0(X)\frac{1-Z}{1-e(X)}Y-\frac{\{e(X)-Z\}\mu_0(X)\varepsilon_0(X)}{1-e(X)}-E\{Y(0)\},

so the efficient influence function for $\tau$ is $\phi_1-\phi_0$.

The efficient influence function for $\tau$ has mean zero by definition, so $\tau$ has the following representation:

\tau=E\left[w_1(X)\frac{Z}{e(X)}Y-\frac{\{Z-e(X)\}\mu_1(X)}{e(X)\varepsilon_1(X)}\right]-E\left[w_0(X)\frac{1-Z}{1-e(X)}Y-\frac{\{e(X)-Z\}\mu_0(X)\varepsilon_0(X)}{1-e(X)}\right],

which motivates the following estimator based on the fitted propensity score and outcome models:

\hat{\tau}^{\textup{dr}}=n^{-1}\sum_{i=1}^{n}\left\{\hat{w}_1(X_i)\frac{Z_iY_i}{\hat{e}(X_i)}-\frac{\check{Z}_i\hat{\mu}_1(X_i)}{\hat{e}(X_i)\varepsilon_1(X_i)}-\hat{w}_0(X_i)\frac{(1-Z_i)Y_i}{1-\hat{e}(X_i)}-\frac{\check{Z}_i\hat{\mu}_0(X_i)\varepsilon_0(X_i)}{1-\hat{e}(X_i)}\right\}.

It is numerically identical to the following forms:

\hat{\tau}^{\textup{dr}} = \hat{\tau}^{\textup{ht}}-n^{-1}\sum_{i=1}^{n}\check{Z}_i\left\{\frac{\hat{\mu}_1(X_i)}{\hat{e}(X_i)\varepsilon_1(X_i)}+\frac{\hat{\mu}_0(X_i)\varepsilon_0(X_i)}{1-\hat{e}(X_i)}\right\}
\quad\ = \hat{\tau}^{\textup{pred}}+n^{-1}\sum_{i=1}^{n}\left\{\frac{1-\hat{e}(X_i)}{\varepsilon_1(X_i)}\frac{Z_i\check{Y}_i}{\hat{e}(X_i)}-\hat{e}(X_i)\varepsilon_0(X_i)\frac{(1-Z_i)\check{Y}_i}{1-\hat{e}(X_i)}\right\}
\quad\ = \hat{\tau}^{\textup{proj}}+n^{-1}\sum_{i=1}^{n}\left\{\hat{w}_1(X_i)\frac{Z_i\check{Y}_i}{\hat{e}(X_i)}-\hat{w}_0(X_i)\frac{(1-Z_i)\check{Y}_i}{1-\hat{e}(X_i)}\right\},

which, similar to (3), augments $\hat{\tau}^{\textup{ht}}$ by imputed outcomes and modifies $\hat{\tau}^{\textup{pred}}$ and $\hat{\tau}^{\textup{proj}}$ by inverse propensity score weighted residuals. Importantly, Theorem 4 below shows that $\hat{\tau}^{\textup{dr}}$ is doubly robust.
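A minimal sketch of this doubly robust estimator, continuing the notation of the earlier R snippets:

# Doubly robust sensitivity analysis estimator, using z, y, e_hat, mu1, mu0, w1, w0 from above.
z_check <- z - e_hat   # propensity score residuals \check{Z}_i
tau_dr_sens <- mean(w1 * z * y / e_hat -
                    z_check * mu1 / (e_hat * eps1) -
                    w0 * (1 - z) * y / (1 - e_hat) -
                    z_check * mu0 * eps0 / (1 - e_hat))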

Theorem 4 (double robustness).

Under Definition 1, the estimator $\hat{\tau}^{\textup{dr}}$ is consistent if either $\hat{e}(X)$ is consistent for $e(X)$ or $\hat{\mu}_z(X)$ is consistent for $\mu_z(X)$, for $z=0,1$.

Under standard regularity conditions, the doubly robust estimator $\hat{\tau}^{\textup{dr}}$ is asymptotically normal with mean $\tau$ and asymptotic variance equal to the variance of the efficient influence function. Therefore, we can conduct statistical inference based on the closed-form plug-in variance estimator or the nonparametric bootstrap variance estimator of $\hat{\tau}^{\textup{dr}}$. We can also construct double machine learning estimators and apply the cross-fitting technique to achieve the same asymptotic distribution under relaxed conditions on the complexity of the nuisance parameter spaces (Chernozhukov et al. 2018).
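For the bootstrap, a sketch assuming a hypothetical wrapper tau_dr_fun(data, eps1, eps0) that refits both nuisance models on the supplied data and returns the doubly robust estimate:

# Nonparametric bootstrap confidence interval; tau_dr_fun is our hypothetical wrapper.
boot_ci <- function(data, eps1, eps0, B = 200) {
  est <- tau_dr_fun(data, eps1, eps0)
  boots <- replicate(B, {
    idx <- sample(nrow(data), replace = TRUE)   # resample units with replacement
    tau_dr_fun(data[idx, ], eps1, eps0)
  })
  c(est = est, lower = est - 1.96 * sd(boots), upper = est + 1.96 * sd(boots))
}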

4 Calibration and specification of the sensitivity parameters

4.1 Calibration of the sensitivity parameters

Specifying the ranges of the sensitivity parameters is fundamentally challenging in causal inference because the observed data do not provide direct information about them. We provide a method to calibrate the sensitivity parameters using the observed covariates. The identification and estimation procedures rely on the unidentifiable sensitivity parameters $(\varepsilon_1(X),\varepsilon_0(X))$. Calibration based on the observed covariates provides guidance on their sizes, helping us understand their relative magnitudes in the analytical framework. Zhang and Small (2020) proposed a calibration method based on the observed covariates as a benchmark to understand how sensitive the estimates are to an unobserved confounder in matched observational studies. In our sensitivity analysis framework, we do not directly model the unobserved confounders or their association with the treatment or potential outcomes. Moreover, our proposed sensitivity parameters depend on the set of observed covariates $X$ we have controlled for, so it is challenging to directly characterize the corresponding calibration against the observed covariates. Below we provide a leave-one-covariate-out approach to calibrate the sensitivity parameters. Similar approaches appeared in other sensitivity analysis methods (Chapter 17 in Ding 2024).

For each observed covariate $X_j$ in the $p$-dimensional $X=(X_1,\ldots,X_p)$, $j=1,\ldots,p$, we first drop it as if it were an unobserved confounder, and then estimate the value of the sensitivity parameters with $X_j$ unobserved. Under the unconfoundedness Assumption 1, we have

\varepsilon_z(X_{-j}) = \frac{E\{Y(z)\mid Z=1,X_{-j}\}}{E\{Y(z)\mid Z=0,X_{-j}\}} = \frac{E\{\mu_z(X)\mid Z=1,X_{-j}\}}{E\{\mu_z(X)\mid Z=0,X_{-j}\}}

for $z=0,1$, where $X_{-j}$ denotes the $(p-1)$-dimensional observed covariates after dropping covariate $X_j$. The quantity $\varepsilon_z(X_{-j})$ measures how much the observed data deviate from the unconfoundedness assumption when we delete an observed confounder $X_j$, and it can be estimated from the observed data. Thus, we can summarize the distribution of $\varepsilon_z(X_{-j})$ to characterize how strong a confounder each covariate $X_j$ is. To reduce $\varepsilon_z(X_{-j})$ from a function of $X_{-j}$ to a scalar, we can marginalize over the distribution of $X_{-j}$ to compute the mean of the sensitivity parameter ignoring $X_j$, or use other summary statistics such as the maximum, minimum, or upper and lower quantiles of the estimated distribution to guide the interpretation of the magnitude of the sensitivity parameter $\varepsilon_z(X)$. We will illustrate this calibration method in Section 5.1. In addition, we can calibrate using a subset of covariates via a leave-multiple-covariates-out approach: the definition, identification, and estimation of the sensitivity parameters $\varepsilon_z(X_{-\mathcal{S}})$ follow similarly from the unconfoundedness assumption, where $\mathcal{S}\subset\{1,\ldots,p\}$ denotes the indices of the subset of covariates used for calibration.
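A minimal sketch of this calibration under our own implementation choices: regress the fitted $\hat{\mu}_z(X)$ on $X_{-j}$ within each treatment arm, take the ratio of the two predictions, and summarize its distribution. The function name calibrate_loco and the covariate names in the example call are hypothetical.

# Leave-one-covariate-out calibration sketch.
calibrate_loco <- function(data, mu_hat, covariates, j) {
  data$mu_hat <- mu_hat
  f <- reformulate(setdiff(covariates, j), response = "mu_hat")
  num <- predict(lm(f, data = data[data$z == 1, ]), newdata = data)  # E{mu_z(X) | Z=1, X_{-j}}
  den <- predict(lm(f, data = data[data$z == 0, ]), newdata = data)  # E{mu_z(X) | Z=0, X_{-j}}
  summary(num / den)   # distribution of the estimated eps_z(X_{-j})
}
# Example: calibrate eps_1 leaving out BMI (hypothetical column names).
# calibrate_loco(data, mu1, c("age", "gender", "race", "bmi"), "bmi")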

4.2 Practical suggestions for specifying the sensitivity parameters

In this subsection, we provide practical suggestions on specifying the sensitivity parameters and reporting the results. When the sensitivity parameters are constants not depending on $X$, i.e., $\varepsilon_1(X)=\varepsilon_1$ and $\varepsilon_0(X)=\varepsilon_0$, the proposed estimators of $\tau$ monotonically decrease in both $\varepsilon_1$ and $\varepsilon_0$. Therefore, it often suffices to present one side of the parameter space if we are interested in how strong the unmeasured confounding must be to explain away the causal effect estimated under the unconfoundedness assumption. For instance, if the estimated causal effect under unconfoundedness is positive, researchers can report a table of estimated average causal effects for different $(\varepsilon_1,\varepsilon_0)$ with $\varepsilon_1\geq 1$ and $\varepsilon_0\geq 1$. Such a table shows how robust the positive result is as the sensitivity parameters increase, and how large the sensitivity parameters must be to overturn the estimated significant positive effect; the other direction with small sensitivity parameters is usually of less interest, since it only increases the estimated positive effect. For a similar reason, when the estimated average causal effect is negative under the unconfoundedness assumption, we suggest reporting results with $\varepsilon_1\leq 1$ and $\varepsilon_0\leq 1$. However, when we would like to learn how strong the unmeasured confounding must be to match some previously estimated causal effect of larger magnitude, the other direction may be of interest as well.

In our proposed framework, we allow the sensitivity parameters to depend on the observed covariates. When the sensitivity parameters depend on $X$, two alternative approaches can be pursued: modeling the sensitivity parameters explicitly or adopting a worst-case interpretation. In the first approach, we can assume $\varepsilon_z(X)=\exp(\alpha_z+\beta_z^{\mathrm{T}}X)$ for $z=0,1$, and specify the value of $\varepsilon_z(X)$ based on domain knowledge about the parameters $(\alpha_z,\beta_z)$. In the second approach, in the special scenario when the outcome is non-negative (e.g., binary), the estimates computed with constant values $(\varepsilon_1,\varepsilon_0)$ serve as worst-case bounds for the true average causal effect. Here, $\varepsilon_z$ is set to either $\min\,\varepsilon_z(X)$ or $\max\,\varepsilon_z(X)$, depending on the direction of the average causal effect. We give a formal result below.

Proposition 1.

Let $\varepsilon_{z,\textsc{l}}$ and $\varepsilon_{z,\textsc{u}}$ denote the lower and upper bounds of $\varepsilon_z(X)$, i.e., $\varepsilon_z(X)\in[\varepsilon_{z,\textsc{l}},\varepsilon_{z,\textsc{u}}]$ for $z=0,1$. In the special case when the potential outcomes are non-negative (e.g., binary), we have $\tau\in[\tau_{\textsc{l}},\tau_{\textsc{u}}]$, where

\tau_{\textsc{l}} = E\left\{Z\mu_1(X)+\frac{(1-Z)\mu_1(X)}{\varepsilon_{1,\textsc{u}}}\right\}-E\left\{Z\mu_0(X)\varepsilon_{0,\textsc{u}}+(1-Z)\mu_0(X)\right\}
\quad\ = E\left[\left\{e(X)+\frac{1-e(X)}{\varepsilon_{1,\textsc{u}}}\right\}\frac{ZY}{e(X)}\right]-E\left[\left\{e(X)\varepsilon_{0,\textsc{u}}+1-e(X)\right\}\frac{(1-Z)Y}{1-e(X)}\right]
\quad\ = E\left[\left\{e(X)+\frac{1-e(X)}{\varepsilon_{1,\textsc{u}}}\right\}\frac{ZY}{e(X)}-\frac{\{Z-e(X)\}\mu_1(X)}{e(X)\varepsilon_{1,\textsc{u}}}\right]
\qquad\ - E\left[\left\{e(X)\varepsilon_{0,\textsc{u}}+1-e(X)\right\}\frac{(1-Z)Y}{1-e(X)}-\frac{\{e(X)-Z\}\mu_0(X)\varepsilon_{0,\textsc{u}}}{1-e(X)}\right],

and $\tau_{\textsc{u}}$ is computed using the same formulas as $\tau_{\textsc{l}}$, with $\varepsilon_{1,\textsc{u}}$ and $\varepsilon_{0,\textsc{u}}$ replaced by $\varepsilon_{1,\textsc{l}}$ and $\varepsilon_{0,\textsc{l}}$, respectively.

Proposition 1 gives bounds based on the three identification formulas in Theorems 1–3. Define $\hat{\tau}^*_{\textsc{l}}$ and $\hat{\tau}^*_{\textsc{u}}$ as the estimators using the same formula as $\hat{\tau}^*$ with $(\varepsilon_1(X_i),\varepsilon_0(X_i))$ replaced by $(\varepsilon_{1,\textsc{u}},\varepsilon_{0,\textsc{u}})$ and $(\varepsilon_{1,\textsc{l}},\varepsilon_{0,\textsc{l}})$, respectively, for $*=\textup{pred, proj, ht, dr}$. Following Proposition 1, we can treat $\hat{\tau}^*_{\textsc{l}}$ and $\hat{\tau}^*_{\textsc{u}}$ as estimators of the lower and upper bounds of $\tau$, respectively. When the potential outcomes are non-positive, analogous results to Proposition 1 hold.

Suppose the estimated average causal effect under the unconfoundedness assumption is positive. We then focus on $\varepsilon_1(X)\geq 1$ and $\varepsilon_0(X)\geq 1$ to answer the question of how large the sensitivity parameters must be to explain away the estimated positive causal effect. Looping over pre-specified lists of constant $\varepsilon_1$ and $\varepsilon_0$, we find how large the lower bounds of $\varepsilon_1(X)$ and $\varepsilon_0(X)$ must be to explain away the positive effect.
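Operationally, this amounts to recomputing the estimate and confidence interval over a grid of constant sensitivity parameters; a sketch continuing boot_ci from above, with an illustrative grid:

# Sensitivity analysis over a grid of constant (eps1, eps0).
grid <- expand.grid(eps1 = seq(1, 1.25, by = 0.05),
                    eps0 = seq(1, 1.25, by = 0.05))
out <- t(apply(grid, 1, function(g) boot_ci(data, g["eps1"], g["eps0"])))
cbind(grid, out)   # estimate and 95% interval for each pair (cf. Table 1)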

5 Application and simulation study

5.1 Application

We apply the proposed sensitivity analysis method to a real-world application. We re-analyze the observational study in Bazzano et al. (2003) to study whether cigarette smoking has a causal effect on homocysteine levels, the elevation of which is considered a risk factor for cardiovascular disease. We compare the homocysteine levels of daily smokers and non-smokers based on the NHANES 2005–2006 data. The NHANES is a series of surveys whose target population is the civilian, noninstitutionalized U.S. population. Since 1999, the NHANES has been conducted annually by interviewing individuals in their homes and asking the respondents to complete the health examination component of the survey. We use the cross-sectional data collected in the NHANES 2005–2006, which include observed covariates such as gender, age, education level, BMI, and a measure of poverty. Consider the daily smokers to be the treated group and the never smokers to be the control group. The baseline homocysteine level in the control group is 7.86 µmol/L. Under the unconfoundedness assumption with $\varepsilon_1(X)=\varepsilon_0(X)=1$, the estimated $\tau$ using the doubly robust estimator is 1.48 with a 95% confidence interval (0.78, 2.18). We use a logit model for the propensity score and a linear model for the outcome, and use the nonparametric bootstrap to estimate the variance. Despite the large collection of covariates observed in the survey, unmeasured factors such as the lifestyle of spouses or parents, work intensity, stress level, and certain gene expressions possibly contribute to both a higher probability of smoking and a higher homocysteine level. With this concern about unmeasured confounders, we conduct sensitivity analysis using the doubly robust estimator $\hat{\tau}^{\textup{dr}}$. In this example, it is challenging to specify a correct model for the sensitivity parameters as functions of $X$, and the outcome of interest is non-negative. Thus, we adopt the conservative worst-case interpretation in Proposition 1, assume the sensitivity parameters $\varepsilon_1(X)$ and $\varepsilon_0(X)$ do not depend on $X$, and vary both of them from 0.75 to 1.25. Table 1 reports the doubly robust estimates and their 95% confidence intervals for different $(\varepsilon_1,\varepsilon_0)$, where we again use the nonparametric bootstrap to estimate the variance of $\hat{\tau}^{\textup{dr}}$. We omit the detailed results for small $\varepsilon_1$ and $\varepsilon_0$ since they all give significantly positive results at the 0.05 level. As $\varepsilon_1$ and $\varepsilon_0$ increase, the estimates become less significant, and the direction of the point estimate changes when both are very large.

However, the length of the confidence interval is not a monotone function of $\varepsilon_1(X)$ and $\varepsilon_0(X)$, since the asymptotic variance of $\hat{\tau}^{\textup{dr}}$ under our sensitivity analysis framework is a complicated function of the sensitivity parameters. In Section S1.7 in the supplementary material, we give the specific form of the semiparametric efficiency bound for $\tau$ based on its efficient influence function in Theorem 3.

Figure 1 shows the contour plot of the estimated average causal effect using the doubly robust estimator $\hat{\tau}^{\textup{dr}}$ for various values of $\varepsilon_1$ and $\varepsilon_0$, where the contour lines are generated through a grid search. We also implement the leave-one-covariate-out calibration method proposed in Section 4.1 in this example. For each observed covariate $X_j$, we label the maximum estimated values of $\varepsilon_1(X_{-j})$ and $\varepsilon_0(X_{-j})$ in the contour plot. This suggests that, to explain away the estimated positive causal effect, the unobserved confounder has to be stronger than all observed covariates. Figure 1 also shows that, under Assumption 1, covariates such as BMI and poverty are weak confounders, with $\varepsilon_1(X_{-\textup{BMI}})\in[1.002,1.020]$, $\varepsilon_0(X_{-\textup{BMI}})\in[0.998,1.001]$, $\varepsilon_1(X_{-\textup{poverty}})\in[0.972,1.026]$, and $\varepsilon_0(X_{-\textup{poverty}})\in[0.998,1.002]$.

Table 1: Sensitivity analysis for $\tau$ in the homocysteine observational study (rows: $\varepsilon_0$; columns: $\varepsilon_1$)

        0.90               1.00               1.10               1.20                1.25
0.90    2.48 (1.70, 3.27)  1.65 (0.97, 2.33)  0.98 (0.32, 1.63)  0.41 (-0.21, 1.03)  0.16 (-0.45, 0.77)
1.00    2.31 (1.58, 3.04)  1.48 (0.78, 2.18)  0.80 (0.12, 1.49)  0.24 (-0.39, 0.87)  -0.01 (-0.62, 0.60)
1.10    2.14 (1.38, 2.89)  1.31 (0.62, 2.00)  0.63 (0.01, 1.25)  0.07 (-0.56, 0.69)  -0.18 (-0.76, 0.40)
1.20    1.96 (1.13, 2.79)  1.14 (0.42, 1.85)  0.46 (-0.16, 1.07)  -0.11 (-0.73, 0.52)  -0.36 (-0.97, 0.26)
1.25    1.88 (1.10, 2.65)  1.05 (0.35, 1.75)  0.37 (-0.30, 1.04)  -0.19 (-0.80, 0.41)  -0.44 (-1.06, 0.17)
Figure 1: Sensitivity analysis for τ\tau in the homocysteine observational study with calibration of the sensitivity parameters

5.2 Simulation study

In this subsection, we conduct a simulation study to evaluate the finite sample performance of our proposed sensitivity analysis framework with the doubly robust estimator.

We generate the data using the following data-generating process with sample size $n=500$. We first generate observed covariates $X=(X_1,X_2,X_3)^{\mathrm{T}}$ with independent $X_j\sim\mathcal{N}(0,0.5^2)$ for $j=1,2$ and $X_3\sim\textup{Bernoulli}(0.5)$, and a binary unobserved confounder $U\sim\textup{Bernoulli}(0.5)$. Next, we generate the treatment $Z\mid(X,U)\sim\textup{Bernoulli}(0.25+0.5U)$. Then we generate the potential outcomes with log-normal conditional distributions, $\log\{Y(1)\}=X_1+X_2+bU+e_1$ and $\log\{Y(0)\}=X_1+X_2+bX_3+e_0$, where $e_1$ and $e_0$ are independent $\mathcal{N}(0,0.5^2)$. Under this data-generating process, the population average causal effect $\tau$ equals 0. Since $U$ is an unobserved confounder of $Y(1)$ and $Z$ conditional on $X$, the unconfoundedness assumption is violated; thus estimators constructed using only the observed variables are asymptotically biased, leading to invalid inference. Under the log-normal model for the potential outcomes and the Bernoulli treatment assignment, the true $\varepsilon_1(X)=\{0.75\exp(b)+0.25\}/\{0.25\exp(b)+0.75\}$, and $\varepsilon_0(X)=1$ because the distribution of $\log\{Y(0)\}$ does not depend on $U$. We evaluate the performance of $\hat{\tau}^{\textup{dr}}$ for $b\in\{0,0.2,0.3,0.5,1,1.5\}$. For $b>0$, the true $\varepsilon_1(X)>1$, and assuming unconfoundedness over-estimates $\tau$. Thus, we also take a pre-specified list of values larger than 1, $\varepsilon_1(X)\in\{1,1.10,1.16,1.28,1.60,1.93\}$, and set $\varepsilon_0(X)=1$. We use a nonparametric bootstrap with 200 bootstrap samples to estimate the asymptotic variance. For each value of $b$ and each pair $(\varepsilon_1(X),\varepsilon_0(X))$, we compute the coverage probability of the 95% confidence interval.
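A sketch of this data-generating process in R, written to match the description above:

# Data-generating process of Section 5.2.
gen_data <- function(n = 500, b = 0.5) {
  x1 <- rnorm(n, 0, 0.5); x2 <- rnorm(n, 0, 0.5); x3 <- rbinom(n, 1, 0.5)
  u  <- rbinom(n, 1, 0.5)                          # unobserved confounder
  z  <- rbinom(n, 1, 0.25 + 0.5 * u)
  y1 <- exp(x1 + x2 + b * u  + rnorm(n, 0, 0.5))   # log-normal Y(1)
  y0 <- exp(x1 + x2 + b * x3 + rnorm(n, 0, 0.5))   # log-normal Y(0)
  data.frame(x1, x2, x3, z, y = z * y1 + (1 - z) * y0)
}
# True eps_1(X) implied by the model, reproducing the pre-specified list:
eps1_true <- function(b) (0.75 * exp(b) + 0.25) / (0.25 * exp(b) + 0.75)
round(eps1_true(c(0, 0.2, 0.3, 0.5, 1, 1.5)), 2)   # 1.00 1.10 1.16 1.28 1.60 1.93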

When we observe a positive $\hat{\tau}^{\textup{dr}}$ under the unconfoundedness assumption, at each $\varepsilon_1$ level we reject the null hypothesis that $\tau=0$ if the lower bound of the constructed 95% confidence interval is positive. With a positive $b$, $\tau$ is overestimated if $\varepsilon_1$ takes a value smaller than the truth, so a false rejection is likely. For $\varepsilon_1$ larger than the truth, the inference is conservative, so a false rejection is less likely. To evaluate this, we also compute the frequency of falsely reporting a significantly positive average causal effect. The number of Monte Carlo samples is 500.

Table 2 reports the simulation results, including the coverage and false rejection rates of our proposed estimation procedure under various data-generating processes. First, when the sensitivity parameters are set equal to the true values, the coverage rate is correct. Second, the false rejection rates are valid if the value of $\varepsilon_1$ is specified to be larger than the truth. The proposed sensitivity analysis procedure thus has valid finite sample performance.

Table 2: Simulation results (columns: specified $\varepsilon_1$)

                                 coverage rate                          false rejection rate
$b$    true $\varepsilon_1(X)$   1.00  1.10  1.16  1.28  1.60  1.93     1.00  1.10  1.16  1.28  1.60  1.93
0.00   1.00                      0.95  0.53  0.21  0.01  0.00  0.00     0.03  0.00  0.00  0.00  0.00  0.00
0.20   1.10                      0.55  0.96  0.88  0.32  0.00  0.00     0.45  0.02  0.00  0.00  0.00  0.00
0.30   1.16                      0.25  0.85  0.97  0.70  0.00  0.00     0.75  0.15  0.02  0.00  0.00  0.00
0.50   1.28                      0.02  0.42  0.71  0.96  0.19  0.00     0.98  0.58  0.29  0.02  0.00  0.00
1.00   1.60                      0.00  0.03  0.09  0.40  0.96  0.68     1.00  0.97  0.91  0.60  0.02  0.00
1.50   1.93                      0.00  0.01  0.02  0.11  0.74  0.95     1.00  0.99  0.98  0.89  0.26  0.02

6 Extensions

6.1 Sensitivity analysis on the difference scale based on counterfactual means

We expressed the sensitivity parameters as ratios between the observed outcome means and the corresponding counterfactual outcome means. Alternatively, we can express the sensitivity parameters on the difference scale, as in Definition 2 below (Robins 1999).

Definition 2 (sensitivity parameters on the difference scale).

Define

E\{Y(1)\mid Z=1,X\}-E\{Y(1)\mid Z=0,X\} = \delta_1(X),
E\{Y(0)\mid Z=1,X\}-E\{Y(0)\mid Z=0,X\} = \delta_0(X).

In Definition 2, $\delta_1(X)$ and $\delta_0(X)$ are the two sensitivity parameters.

Under Definition 2, we can identify the two counterfactual means by $E\{Y(1)\mid Z=0\}=E\left\{\mu_1(X)-\delta_1(X)\mid Z=0\right\}$ and $E\{Y(0)\mid Z=1\}=E\left\{\mu_0(X)+\delta_0(X)\mid Z=1\right\}$. Therefore, we have the following identification formulas for $E\{Y(1)\}$ and $E\{Y(0)\}$ based on outcome regression, inverse probability weighting, and doubly robust estimation:

E\{Y(1)\} = E\{\mu_1(X)\}-E\{(1-Z)\delta_1(X)\} = E\left\{\frac{ZY}{e(X)}-(1-Z)\delta_1(X)\right\}
\quad\ = E\left[\mu_1(X)+\frac{Z\{Y-\mu_1(X)\}}{e(X)}-(1-Z)\delta_1(X)\right],

and

E\{Y(0)\} = E\{\mu_0(X)\}+E\{Z\delta_0(X)\} = E\left\{\frac{(1-Z)Y}{1-e(X)}+Z\delta_0(X)\right\}
\quad\ = E\left[\mu_0(X)+\frac{(1-Z)\{Y-\mu_0(X)\}}{1-e(X)}+Z\delta_0(X)\right],

where the two doubly robust formulas come from the efficient influence functions for $E\{Y(1)\}$ and $E\{Y(0)\}$, respectively. Taking the difference between these two expectations, we obtain identification formulas for the average causal effect $\tau$, which can be written as the standard identification formulas under the unconfoundedness assumption minus $\delta=E\{Z\delta_0(X)\}+E\{(1-Z)\delta_1(X)\}$. When the sensitivity parameters $\delta_0(X)$ and $\delta_1(X)$ do not depend on $X$, the correction simplifies to $\delta=e\delta_0+(1-e)\delta_1$, a constant that depends on the probability of treatment and $(\delta_1,\delta_0)$.

Similarly, for the average causal effect on the treated, the correction is $\delta_{\textsc{t}}=E\{Z\delta_0(X)\}/e$. When the sensitivity parameter $\delta_0(X)$ does not depend on $X$, the correction simplifies to $\delta_0$, a constant that equals the sensitivity parameter itself.
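As a sketch under the assumption of constant $\delta_1$ and $\delta_0$ (our notation delta1, delta0), with tau_dr from the earlier sketch and tau_t_dr denoting a standard estimate of the effect on the treated (assumed already computed), the corrections subtract prespecified constants:

# Difference-scale corrections with constant delta1 and delta0.
e_marg <- mean(z)   # estimate of pr(Z = 1)
tau_diff_sens   <- tau_dr   - (e_marg * delta0 + (1 - e_marg) * delta1)
tau_t_diff_sens <- tau_t_dr - delta0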

From the above discussion, sensitivity analysis based on the difference scale is much simpler than that based on the ratio scale. Its simplicity is an advantage. However, it also has disadvantages. First, the sensitivity analysis estimators reduce to the corresponding classic estimators minus prespecified constants. It makes sensitivity analysis a tautology of specifying how different the estimators and the estimands are. Second, the sensitivity parameters in Definition 2 depend on the baseline expectations of the outcome. They may be harder to specify than those in Definition 1. This is a classic reason to focus on the ratio scale in sensitivity analysis (Cornfield et al. 1959; Poole 2010; Ding and VanderWeele 2014).

Sjölander et al. (2022) generalized the sensitivity parameters on the difference scale to allow for the transformation of the conditional means by a smooth link function and proposed an outcome regression approach to conduct sensitivity analyses with generalized linear models.

6.2 Extension to nonlinear causal parameters

We can extend the current sensitivity analysis framework to a more general class of nonlinear causal parameters, $g(\mu_1,\mu_0)$, a general function of the marginal means of the potential outcomes. The previous results focus on the special case $g(\mu_1,\mu_0)=\mu_1-\mu_0$. More generally, many other functional forms are of interest. Specifically, when the potential outcomes are binary, the causal risk ratio and the causal odds ratio, $\textsc{rr}=\mu_1/\mu_0$ and $\textsc{or}=\{\mu_1/(1-\mu_1)\}/\{\mu_0/(1-\mu_0)\}$, are two canonical causal parameters. We can utilize the nonparametric identification results and proposed estimators for $\mu_1$ and $\mu_0$, and construct estimators for a general causal parameter by simply plugging the estimators $\hat{\mu}_z^*$ into the function $g(\mu_1,\mu_0)$ for $*\in\{\textup{pred, proj, ht, dr}\}$ and $z=0,1$. The plug-in outcome regression and inverse probability weighting estimators are straightforward. The plug-in estimator $g(\hat{\mu}_1^{\textup{dr}},\hat{\mu}_0^{\textup{dr}})$ remains doubly robust and achieves the semiparametric efficiency bound. We also implement estimators for $\textsc{rr}$, $\textsc{or}$, and their logarithms in the R package saci.
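For instance, given doubly robust estimates mu1_dr and mu0_dr of $E\{Y(1)\}$ and $E\{Y(0)\}$ for a binary outcome (assumed computed in the manner of the earlier sketches), plug-in estimation of these parameters is immediate:

# Plug-in estimators of the causal risk ratio and odds ratio.
rr_dr <- mu1_dr / mu0_dr
or_dr <- (mu1_dr / (1 - mu1_dr)) / (mu0_dr / (1 - mu0_dr))
log_rr_dr <- log(rr_dr)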

6.3 Other extensions in the supplementary material

For the inverse propensity score weighting and doubly robust estimators, we presented only the Horvitz–Thompson-type estimators. It is straightforward to obtain the corresponding Hájek-type estimators (Hájek 1971). We present them in Section S1.5 in the supplementary material.

Another canonical estimation strategy is matching (Rubin 2006). Lin et al. (2023) pointed out that Abadie and Imbens (2011)’s bias-corrected matching estimator has a form equivalent to the doubly robust estimator if we view matching as a nonparametric method to estimate the propensity score. Therefore, similar sensitivity analysis formulas also hold for matching. We present them in Section S1.6 in the supplementary material.

Another parameter of interest is the average causal effect on the treated units. Under our sensitivity analysis framework, we provide identification results, estimation procedures, and real-world examples to estimate the average causal effect on the treated units. We also implement estimators in our R package. We present details in Section S1.1 in the supplementary material.

We also extend our framework of sensitivity analysis to survival outcomes, with the parameter of interest defined as the difference between two survival functions, $\tau(t)=\textup{pr}\{Y(1)>t\}-\textup{pr}\{Y(0)>t\}$, and to observational studies with a multi-valued treatment. Moreover, estimating the controlled direct effect reduces to estimating causal effects with a multi-valued treatment if we view the treatment and the intermediate variable as two treatment factors (VanderWeele 2015). Again, we relegate the details to Sections S1.2 and S1.3 in the supplementary material.

7 Discussion

We focus on the cross-sectional setting, while the framework can be extended to deal with longitudinal data (Robins 1999). With a time-varying treatment, the sensitivity parameters are also time-varying and depend on the past treatment trajectory. We can derive similar nonparametric identification formulas by modifying the g-formula or the inverse probability weighting estimators under sequential ignorability. We leave the derivation of the semiparametric efficient influence function and the implementation of the motivated estimator to future research.

Acknowledgements

Peng Ding was partially supported by the U.S. National Science Foundation # 1945136. A reviewer and the Associate Editor made constructive comments.

References

  • Abadie and Imbens (2006) Abadie, A. and Imbens, G. W. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica 74, 235–267.
  • Abadie and Imbens (2011) Abadie, A. and Imbens, G. W. (2011). Bias-corrected matching estimators for average treatment effects. Journal of Business and Economic Statistics 29, 1–11.
  • Bang and Robins (2005) Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–973.
  • Bazzano et al. (2003) Bazzano, L. A., He, J., Muntner, P., Vupputuri, S., and Whelton, P. K. (2003). Relationship between cigarette smoking and novel risk factors for cardiovascular disease in the United States. Annals of Internal Medicine 138, 891–897.
  • Bickel et al. (1993) Bickel, P. J., Klaassen, C. A., Ritov, Y., and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press.
  • Chernozhukov et al. (2018) Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal 21, C1–C68.
  • Cornfield et al. (1959) Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., and Wynder, E. L. (1959). Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22, 173–203.
  • Cox (1972) Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34, 187–202.
  • Ding (2024) Ding, P. (2024). A First Course in Causal Inference. New York: Chapman & Hall.
  • Ding and Li (2018) Ding, P. and Li, F. (2018). Causal inference: a missing data perspective. Statistical Science 33, 214–237.
  • Ding and Lu (2017) Ding, P. and Lu, J. (2017). Principal stratification analysis using principal scores. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 757–777.
  • Ding and VanderWeele (2014) Ding, P. and VanderWeele, T. J. (2014). Generalized Cornfield conditions for the risk difference. Biometrika 101, 971–977.
  • Ding and VanderWeele (2016) Ding, P. and VanderWeele, T. J. (2016). Sensitivity analysis without assumptions. Epidemiology 27, 368–377.
  • Dorn and Guo (2022) Dorn, J. and Guo, K. (2022). Sharp sensitivity analysis for inverse propensity weighting via quantile balancing. Journal of the American Statistical Association 00, 1–13.
  • Firth and Bennett (1998) Firth, D. and Bennett, K. E. (1998). Robust models in probability sampling (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60, 3–21.
  • Franks et al. (2020) Franks, A., D’Amour, A., and Feller, A. (2020). Flexible sensitivity analysis for observational studies without observable implications. Journal of the American Statistical Association 115, 1730–1746.
  • Hahn (1998) Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66, 315–331.
  • Hájek (1971) Hájek, J. (1971). Comment: An essay on the logical foundations of survey sampling, part one. The Foundations of Survey Sampling, 236.
  • Hernán (2010) Hernán, M. A. (2010). The hazards of hazard ratios. Epidemiology 21, 13–15.
  • Imai and Van Dyk (2004) Imai, K. and Van Dyk, D. A. (2004). Causal inference with general treatment regimes: Generalizing the propensity score. Journal of the American Statistical Association 99, 854–866.
  • Imbens (2000) Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika 87, 706–710.
  • Imbens (2003) Imbens, G. W. (2003). Sensitivity to exogeneity assumptions in program evaluation. American Economic Review 93, 126–132.
  • Jiang et al. (2022) Jiang, Z., Yang, S., and Ding, P. (2022). Multiply robust estimation of causal effects under principal ignorability. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 84, 1423–1445.
  • Kasza et al. (2017) Kasza, J., Wolfe, R., and Schuster, T. (2017). Assessing the impact of unmeasured confounding for binary outcomes using confounding functions. International Journal of Epidemiology 46, 1303–1311.
  • Lin et al. (1998) Lin, D. Y., Psaty, B. M., and Kronmal, R. A. (1998). Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54, 948–963.
  • Lin et al. (2023) Lin, Z., Ding, P., and Han, F. (2023). Estimation based on nearest neighbor matching: from density ratio to average treatment effect. Econometrica 91, 2187–2217.
  • Poole (2010) Poole, C. (2010). On the origin of risk relativism. Epidemiology 21, 3–9.
  • Robins (1999) Robins, J. M. (1999). Association, causation, and marginal structural models. Synthese 121, 151–179.
  • Rosenbaum (1987) Rosenbaum, P. R. (1987). Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika 74, 13–26.
  • Rosenbaum and Rubin (1983a) Rosenbaum, P. R. and Rubin, D. B. (1983a). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 45, 212–218.
  • Rosenbaum and Rubin (1983b) Rosenbaum, P. R. and Rubin, D. B. (1983b). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.
  • Rubin (1978) Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics 6, 34–58.
  • Rubin (2006) Rubin, D. B. (2006). Matched Sampling for Causal Effects. Cambridge: Cambridge University Press.
  • Scharfstein et al. (1999) Scharfstein, D. O., Rotnitzky, A., and Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association 94, 1096–1120.
  • Sjölander et al. (2022) Sjölander, A., Gabriel, E. E., and Ciocănea-Teodorescu, I. (2022). Sensitivity analysis for causal effects with generalized linear models. Journal of Causal Inference 10, 441–479.
  • Tchetgen and Shpitser (2012) Tchetgen, E. J. T. and Shpitser, I. (2012). Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis. Annals of Statistics 40, 1816–1845.
  • VanderWeele (2015) VanderWeele, T. J. (2015). Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford: Oxford University Press.
  • VanderWeele and Ding (2017) VanderWeele, T. J. and Ding, P. (2017). Sensitivity analysis in observational research: introducing the e-value. Annals of Internal Medicine 167, 268–274.
  • VanderWeele et al. (2014) VanderWeele, T. J., Tchetgen, E. J. T., and Halloran, M. E. (2014). Interference and sensitivity analysis. Statistical science: a review journal of the Institute of Mathematical Statistics 29, 687.
  • Xie and Liu (2005) Xie, J. and Liu, C. (2005). Adjusted kaplan–meier estimator and log-rank test with inverse probability of treatment weighting for survival data. Statistics in Medicine 24, 3089–3110.
  • Yang and Lok (2018) Yang, S. and Lok, J. J. (2018). Sensitivity analysis for unmeasured confounding in coarse structural nested mean models. Statistica Sinica 28, 1703–1723.
  • Zhang and Small (2020) Zhang, B. and Small, D. S. (2020). A calibrated sensitivity analysis for matched observational studies with application to the effect of second-hand smoke exposure on blood lead levels in children. Journal of the Royal Statistical Society Series C: Applied Statistics 69, 1285–1305.
  • Zhao et al. (2019) Zhao, Q., Small, D. S., and Bhattacharya, B. B. (2019). Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 81, 735–761.
  • Zubizarreta et al. (2013) Zubizarreta, J. R., Cerda, M., and Rosenbaum, P. R. (2013). Effect of the 2010 chilean earthquake on posttraumatic stress reducing sensitivity to unmeasured bias through study design. Epidemiology 24, 79–87.

Supplementary Material

The supplementary materials contain the following sections.

Section S1 presents additional results: several extensions, formulas for the Hajek-type estimators and the bias-corrected matching estimator, semiparametric efficiency bounds for τ and τT, and the connection of the sensitivity parameters to the treatment selection model.

Section S2 provides guidance to our R package and presents additional data analysis results.

Section S3 presents all the proofs.

Appendix S1 Additional results

S1.1 Extension to the average treatment effect on the treated units

Another parameter of interest is the average treatment effect on the treated units

τt=E{Y(1)Y(0)Z=1}=E(YZ=1)E{Y(0)Z=1}.\tau_{\textsc{t}}=E\{Y(1)-Y(0)\mid Z=1\}=E(Y\mid Z=1)-E\{Y(0)\mid Z=1\}.

Let μt1=E(YZ=1)\mu_{\textsc{t}1}=E(Y\mid Z=1) and μt0=E{Y(0)Z=1}\mu_{\textsc{t}0}=E\{Y(0)\mid Z=1\} denote these two terms. The first term μt1\mu_{\textsc{t}1} has a simple moment estimator μ^t1=n11i=1nZiYi\hat{\mu}_{\textsc{t}1}=n_{1}^{-1}\sum_{i=1}^{n}Z_{i}Y_{i}, where n1=i=1nZin_{1}=\sum_{i=1}^{n}Z_{i} is the total number of treated units. The key is to identify and estimate the second term μt0\mu_{\textsc{t}0}. In parallel with Theorems 1 and 2, μt0\mu_{\textsc{t}0} has two identification forms based on the outcome and the propensity score models. Define e=pr(Z=1)e=\textup{pr}(Z=1).

Theorem S1 (outcome regression and inverse propensity score weighting for τt\tau_{\textsc{t}}).

Under Definition 1, we have

E{Y(0)Z=1}\displaystyle E\{Y(0)\mid Z=1\} =\displaystyle= E{Zμ0(X)ε0(X)}/e\displaystyle E\left\{Z\mu_{0}(X)\varepsilon_{0}(X)\right\}/e
=\displaystyle= E{e(X)ε0(X)1Z1e(X)Y}/e.\displaystyle E\left\{e(X)\varepsilon_{0}(X)\frac{1-Z}{1-e(X)}Y\right\}/e.

Theorem S1 motivates using τ^t=μ^t1μ^t0(=reg,ht)\hat{\tau}_{\textsc{t}}^{*}=\hat{\mu}_{\textsc{t}1}-\hat{\mu}_{\textsc{t}0}^{*}\ (*=\textup{reg},\textup{ht}) to estimate τt\tau_{\textsc{t}}, where

μ^t0reg\displaystyle\hat{\mu}_{\textsc{t}0}^{\textup{reg}} =\displaystyle= n11i=1nZiε0(Xi)μ^0(Xi),\displaystyle n_{1}^{-1}\sum_{i=1}^{n}Z_{i}\varepsilon_{0}(X_{i})\hat{\mu}_{0}(X_{i}),
μ^t0ht\displaystyle\hat{\mu}_{\textsc{t}0}^{\textup{ht}} =\displaystyle= n11i=1nε0(Xi)o^(Xi)(1Zi)Yi,\displaystyle n_{1}^{-1}\sum_{i=1}^{n}\varepsilon_{0}(X_{i})\hat{o}(X_{i})(1-Z_{i})Y_{i},

with the estimated conditional odds of the treatment given covariates, o^(Xi)=e^(Xi)/{1e^(Xi)}\hat{o}(X_{i})=\hat{e}(X_{i})/\{1-\hat{e}(X_{i})\}. To motivate the doubly robust estimator for τt\tau_{\textsc{t}}, we also derive the efficient influence function for μt0\mu_{\textsc{t}0}.

Theorem S2 (efficient influence function for μt0\mu_{\textsc{t}0}).

Under Definition 1, the efficient influence function for μt0\mu_{\textsc{t}0} equals

ϕt0=[Z{ε0(X)μ0(X)μt0}+ε0(X)e(X)(1Z){Yμ0(X)}1e(X)]/e.\phi_{\textsc{t}0}=\left[Z\left\{\varepsilon_{0}(X)\mu_{0}(X)-\mu_{\textsc{t}0}\right\}+\varepsilon_{0}(X)e(X)\frac{(1-Z)\{Y-\mu_{0}(X)\}}{1-e(X)}\right]/e.

The efficient influence function for μt0\mu_{\textsc{t}0} has mean 0, so μt0\mu_{\textsc{t}0} has the following representation:

μt0=E[Zε0(X)μ0(X)+ε0(X)e(X)(1Z){Yμ0(X)}1e(X)]/e.\mu_{\textsc{t}0}=E\left[Z\varepsilon_{0}(X)\mu_{0}(X)+\varepsilon_{0}(X)e(X)\frac{(1-Z)\{Y-\mu_{0}(X)\}}{1-e(X)}\right]/e.

This representation motivates the following estimator of μt0, which in turn yields the doubly robust estimator of τt:

μ^t0dr\displaystyle\hat{\mu}_{\textsc{t}0}^{\textup{dr}} =\displaystyle= μ^t0reg+n11i=1nε0(Xi)o^(Xi)(1Zi)Yˇi\displaystyle\hat{\mu}_{\textsc{t}0}^{\text{reg}}+n_{1}^{-1}\sum_{i=1}^{n}\varepsilon_{0}(X_{i})\hat{o}(X_{i})(1-Z_{i})\check{Y}_{i}
=\displaystyle= μ^t0ht+n11i=1nε0(Xi)Zˇiμ^0(Xi)1e^(Xi),\displaystyle\hat{\mu}_{\textsc{t}0}^{\textup{ht}}+n_{1}^{-1}\sum_{i=1}^{n}\varepsilon_{0}(X_{i})\frac{\check{Z}_{i}\hat{\mu}_{0}(X_{i})}{1-\hat{e}(X_{i})},

which are two numerically identical forms based on the fitted propensity score and outcome models.

Importantly, Theorem S3 below shows that μ^t0dr\hat{\mu}_{\textsc{t}0}^{\textup{dr}} is doubly robust.

Theorem S3 (double robustness for estimating τt\tau_{\textsc{t}}).

Under Definition 1, the estimator μ^t0dr\hat{\mu}_{\textsc{t}0}^{\textup{dr}} is doubly robust in the sense that it is consistent if either the propensity score or the outcome model is correctly specified.
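
To make the construction concrete, the following R sketch computes the three estimators of τt in Theorems S1–S3 from fitted generalized linear models. It is a minimal sketch, assuming a data frame with covariates X1 and X2, binary treatment Z, outcome Y, and a constant user-specified sensitivity parameter eps0; all names are illustrative, and the saci package (Section S2) implements the full procedure.

# A minimal sketch of the three estimators of tau_t in Theorems S1-S3,
# assuming a data frame with covariates X1, X2, treatment Z, outcome Y,
# and a constant sensitivity parameter eps0 (all names are illustrative).
att_estimators <- function(data, eps0) {
  ps_fit  <- glm(Z ~ X1 + X2, family = binomial, data = data)
  out_fit <- lm(Y ~ X1 + X2, data = subset(data, Z == 0))
  e_hat   <- fitted(ps_fit)                    # estimated propensity score
  mu0_hat <- predict(out_fit, newdata = data)  # estimated mu_0(X)
  o_hat   <- e_hat / (1 - e_hat)               # estimated conditional odds
  n1      <- sum(data$Z)
  mu_t1   <- sum(data$Z * data$Y) / n1
  mu_t0_reg <- sum(data$Z * eps0 * mu0_hat) / n1
  mu_t0_ht  <- sum(eps0 * o_hat * (1 - data$Z) * data$Y) / n1
  mu_t0_dr  <- mu_t0_reg +
    sum(eps0 * o_hat * (1 - data$Z) * (data$Y - mu0_hat)) / n1
  c(reg = mu_t1 - mu_t0_reg, ht = mu_t1 - mu_t0_ht, dr = mu_t1 - mu_t0_dr)
}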

We also provide a worst-case interpretation for the average treatment effect on the treated with a constant value of ε0 in the special scenario where the outcome is non-negative (e.g., binary). Again, the constant ε0 is taken to be either min ε0(X) or max ε0(X), depending on the direction of the average treatment effect on the treated. We give a formal statement in the following proposition.

Proposition S1.

Let ε0,l\varepsilon_{0,\textsc{l}} and ε0,u\varepsilon_{0,\textsc{u}} denote the lower and upper bound of ε0(X)\varepsilon_{0}(X), i.e., ε0(X)[ε0,l,ε0,u]\varepsilon_{0}(X)\in[\varepsilon_{0,\textsc{l}},\varepsilon_{0,\textsc{u}}]. In the special case when the potential outcomes are non-negative (e.g., binary), we have τT[τT,l,τT,u]\tau_{\mathrm{\scriptscriptstyle T}}\in[\tau_{{\mathrm{\scriptscriptstyle T}},\textsc{l}},\tau_{{\mathrm{\scriptscriptstyle T}},\textsc{u}}], where

τT,l\displaystyle\tau_{{\mathrm{\scriptscriptstyle T}},\textsc{l}} =\displaystyle= E(YZ=1)ε0,uE{Zμ0(X)}/e\displaystyle E(Y\mid Z=1)-\varepsilon_{0,\textsc{u}}E\left\{Z\mu_{0}(X)\right\}/e
=\displaystyle= E(YZ=1)ε0,uE[e(X)(1Z)Ye{1e(X)}]\displaystyle E(Y\mid Z=1)-\varepsilon_{0,\textsc{u}}E\left[\frac{e(X)(1-Z)Y}{e\left\{1-e(X)\right\}}\right]
=\displaystyle= E(YZ=1)ε0,uE[Zμ0(X)+e(X)(1Z){Yμ0(X)}1e(X)]/e,\displaystyle E(Y\mid Z=1)-\varepsilon_{0,\textsc{u}}E\left[Z\mu_{0}(X)+e(X)\frac{(1-Z)\left\{Y-\mu_{0}(X)\right\}}{1-e(X)}\right]/e,

and τT,u\tau_{{\mathrm{\scriptscriptstyle T}},\textsc{u}} is computed using the same formula as τT,l\tau_{{\mathrm{\scriptscriptstyle T}},\textsc{l}}, with the replacement of ε0,u\varepsilon_{0,\textsc{u}} by ε0,l\varepsilon_{0,\textsc{l}}.

Proposition S1 gives bounds based on the three identification formulas: the outcome model, the propensity score model, and the efficient influence function for τT. For *=reg, ht, dr, define the estimators τ̂*T,l and τ̂*T,u by replacing ε0(Xi) in τ̂*T with ε0,u and ε0,l, respectively. Following Proposition S1, we can treat τ̂*T,l and τ̂*T,u as estimators of the lower and upper bounds of τT, respectively. Symmetrically, when the potential outcomes are non-positive, we have results analogous to Proposition S1.

Example S1.

We continue the application in Section 5.1, using the same observational data to estimate the other causal parameter of interest, the average treatment effect on the treated τt. Under the unconfoundedness assumption, the estimated τt is 1.36, with a 95% confidence interval of (0.66, 2.05), in the same unit of homocysteine level (umol/L). We conduct sensitivity analysis using the doubly robust estimator τ̂tdr. Because the estimate depends only on ε0, we again simplify ε0 not to depend on X and vary it from 0.75 to 1.25, reporting results for the larger values since, for values of ε0 smaller than 1, the estimates are all significantly positive at the 0.05 level. Table 3 summarizes the results. For ε0 ≥ 1.10, the inference changes, and for ε0 ≥ 1.20, the estimated τt becomes negative.

Table 3: Sensitivity analysis for τt\tau_{\textsc{t}} in the homocysteine observational study
ε0  Estimate of τt (95% CI)
0.90 2.18 (1.41, 2.96)
0.95 1.77 (1.07, 2.46)
1.00 1.36 (0.66, 2.05)
1.05 0.94 (0.19, 1.70)
1.10 0.53 (-0.22, 1.29)
1.15 0.12 (-0.64, 0.88)
1.20 -0.29 (-1.05, 0.47)
1.25 -0.70 (-1.52, 0.11)

S1.2 Extension to survival outcomes

We can also extend the results to survival outcomes. Because the hazard ratio has a built-in selection bias for causal inference even under randomization (Hernán 2010), we define the parameter of interest as the difference between the two survival functions

τ(t)=S1(t)S0(t),\tau(t)=S_{1}(t)-S_{0}(t),

where Sz(t)=pr{Y(z)>t}S_{z}(t)=\textup{pr}\{Y(z)>t\} denotes the potential survival functions for z{0,1}z\in\{0,1\}. For a fixed tt, τ(t)\tau(t) is the average causal effect on the indicator 1(Y>t)1(Y>t). Define the sensitivity parameters as

pr{Y(1)>tZ=1,X}pr{Y(1)>tZ=0,X}=ε1t(X),pr{Y(0)>tZ=1,X}pr{Y(0)>tZ=0,X}=ε0t(X).\displaystyle\frac{\textup{pr}\{Y(1)>t\mid Z=1,X\}}{\textup{pr}\{Y(1)>t\mid Z=0,X\}}=\varepsilon_{1t}(X),\quad\frac{\textup{pr}\{Y(0)>t\mid Z=1,X\}}{\textup{pr}\{Y(0)>t\mid Z=0,X\}}=\varepsilon_{0t}(X).

Define pzt(X)=pr(Y>tZ=z,X)p_{zt}(X)=\textup{pr}(Y>t\mid Z=z,X) as the conditional survival probability for z{0,1}z\in\{0,1\}. We can rewrite the survival function of potential outcomes as

S1(t)\displaystyle S_{1}(t) =\displaystyle= E{Zp1t(X)+(1Z)p1t(X)ε1t(X)}\displaystyle E\left\{Zp_{1t}(X)+(1-Z)\frac{p_{1t}(X)}{\varepsilon_{1t}(X)}\right\} (S1)
=\displaystyle= E{w1t(X)Z1(Y>t)e(X)}\displaystyle E\left\{w_{1t}(X)\frac{Z1(Y>t)}{e(X)}\right\} (S2)
=\displaystyle= E{w1t(X)Z1(Y>t)e(X)Ze(X)e(X)p1t(X)ε1t(X)},\displaystyle E\left\{w_{1t}(X)\frac{Z1(Y>t)}{e(X)}-\frac{Z-e(X)}{e(X)}\frac{p_{1t}(X)}{\varepsilon_{1t}(X)}\right\}, (S3)

and

S0(t)\displaystyle S_{0}(t) =\displaystyle= E{Zp0t(X)ε0t(X)+(1Z)p0t(X)}\displaystyle E\left\{Zp_{0t}(X)\varepsilon_{0t}(X)+(1-Z)p_{0t}(X)\right\} (S4)
=\displaystyle= E{w0t(X)(1Z)1(Y>t)1e(X)}\displaystyle E\left\{w_{0t}(X)\frac{(1-Z)1(Y>t)}{1-e(X)}\right\} (S5)
=\displaystyle= E{w0t(X)(1Z)1(Y>t)1e(X)e(X)Z1e(X)p0t(X)ε0t(X)}\displaystyle E\left\{w_{0t}(X)\frac{(1-Z)1(Y>t)}{1-e(X)}-\frac{e(X)-Z}{1-e(X)}p_{0t}(X)\varepsilon_{0t}(X)\right\} (S6)

where w1t(X)=e(X)+{1e(X)}/ε1t(X)w_{1t}(X)=e(X)+\{1-e(X)\}/\varepsilon_{1t}(X) and w0t(X)=e(X)ε0t(X)+1e(X)w_{0t}(X)=e(X)\varepsilon_{0t}(X)+1-e(X) are the weights.

Censoring is a major challenge for analyzing survival outcomes. To construct the outcome regression estimator, we must use models and estimators tailored to survival outcomes and censoring (e.g., Cox 1972). With estimated pzt(X)p_{zt}(X), (S1) and (S4) motivate the outcome regression estimator

τ^reg(t)\displaystyle\hat{\tau}^{\textup{reg}}(t) =\displaystyle= {n1i=1nZip^1t(Xi)+n1i=1n(1Zi)p^1t(Xi)/ε1t(Xi)}\displaystyle\left\{n^{-1}\sum_{i=1}^{n}Z_{i}\hat{p}_{1t}(X_{i})+n^{-1}\sum_{i=1}^{n}(1-Z_{i})\hat{p}_{1t}(X_{i})/\varepsilon_{1t}(X_{i})\right\}
{n1i=1nZip^0t(Xi)ε0t(Xi)+n1i=1n(1Zi)p^0t(Xi)}.\displaystyle-\left\{n^{-1}\sum_{i=1}^{n}Z_{i}\hat{p}_{0t}(X_{i})\varepsilon_{0t}(X_{i})+n^{-1}\sum_{i=1}^{n}(1-Z_{i})\hat{p}_{0t}(X_{i})\right\}.

To construct the inverse propensity score weighting estimator, we must use an inverse probability weighting adjusted Kaplan–Meier estimator (Xie and Liu 2005). Let Y~i=min(Yi,Ci)\tilde{Y}_{i}=\min(Y_{i},C_{i}) denote the minimum of the outcome YiY_{i} and the censoring time CiC_{i}, and Δi=1(YiCi)\Delta_{i}=1(Y_{i}\leq C_{i}) denote the censoring indicator. Let tjt_{j}’s denote the times when at least one event happens. For each group with treatment z{0,1}z\in\{0,1\}, define the inverse probability weighted number of events that happened at time tjt_{j} and the inverse probability weighted number of individuals at risk at time tjt_{j} as

dj1=i:Y~i=tjw^1t(Xi)ZiΔie^(Xi),\displaystyle d_{j1}=\sum_{i:\tilde{Y}_{i}=t_{j}}\frac{\hat{w}_{1t}(X_{i})Z_{i}\Delta_{i}}{\hat{e}(X_{i})}, nj1=i:Y~itjw^1t(Xi)Zie^(Xi),\displaystyle n_{j1}=\sum_{i:\tilde{Y}_{i}\geq t_{j}}\frac{\hat{w}_{1t}(X_{i})Z_{i}}{\hat{e}(X_{i})},
dj0=i:Y~i=tjw^0t(Xi)(1Zi)Δi1e^(Xi),\displaystyle d_{j0}=\sum_{i:\tilde{Y}_{i}=t_{j}}\frac{\hat{w}_{0t}(X_{i})(1-Z_{i})\Delta_{i}}{1-\hat{e}(X_{i})}, nj0=i:Y~itjw^0t(Xi)(1Zi)1e^(Xi).\displaystyle n_{j0}=\sum_{i:\tilde{Y}_{i}\geq t_{j}}\frac{\hat{w}_{0t}(X_{i})(1-Z_{i})}{1-\hat{e}(X_{i})}.

Then (S2) and (S5) motivate the weighted Kaplan–Meier estimator

τ^ht(t)=tjt(1dj1nj1)tjt(1dj0nj0).\hat{\tau}^{\textup{ht}}(t)=\prod_{t_{j}\leq t}\left(1-\frac{d_{j1}}{n_{j1}}\right)-\prod_{t_{j}\leq t}\left(1-\frac{d_{j0}}{n_{j0}}\right).

With the inverse probability weighting estimator, (S3) and (S6) motivate the following estimator that combines the estimated propensity score and the survival probability models

τ^dr(t)=τ^ht(t)n1i=1nZˇip^1t(Xi)e^(Xi)ε1t(Xi)n1i=1nZˇip^0t(Xi)ε0t(Xi)1e^(Xi).\hat{\tau}^{\textup{dr}}(t)=\hat{\tau}^{\textup{ht}}(t)-n^{-1}\sum_{i=1}^{n}\frac{\check{Z}_{i}\hat{p}_{1t}(X_{i})}{\hat{e}(X_{i})\varepsilon_{1t}(X_{i})}-n^{-1}\sum_{i=1}^{n}\frac{\check{Z}_{i}\hat{p}_{0t}(X_{i})\varepsilon_{0t}(X_{i})}{1-\hat{e}(X_{i})}.
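
As a concrete illustration, the R sketch below computes the weighted Kaplan–Meier estimate of S0(t) at a single time point from fitted propensity scores; the analogous computation with w1t(X) and the treated group gives the estimate of S1(t), and their difference gives τ̂ht(t). It is a minimal sketch with a constant sensitivity parameter eps0t; all names are illustrative.

# A minimal sketch of the weighted Kaplan-Meier estimate of S_0(t) based
# on (S5); Yt is the observed (possibly censored) time, Delta the event
# indicator, and eps0t a constant sensitivity parameter (names illustrative).
weighted_km_S0 <- function(t, Yt, Delta, Z, e_hat, eps0t) {
  w0t <- e_hat * eps0t + 1 - e_hat               # weight w_{0t}(X)
  iw  <- w0t * (1 - Z) / (1 - e_hat)             # inverse probability weights
  tj  <- sort(unique(Yt[Delta == 1 & Yt <= t]))  # event times up to t
  surv <- 1
  for (s in tj) {
    d_j <- sum(iw[Yt == s & Delta == 1])         # weighted events at time s
    n_j <- sum(iw[Yt >= s])                      # weighted number at risk
    surv <- surv * (1 - d_j / n_j)
  }
  surv
}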

S1.3 Extensions to observational studies with a multi-valued treatment

The results also extend to observational studies with a multi-valued treatment, Z{1,,K}Z\in\{1,\ldots,K\}. Each unit has KK potential outcomes {Y(1),,Y(K)}\{Y(1),\ldots,Y(K)\} corresponding to the KK treatment levels. We can define the causal parameters of interest as the comparisons of potential outcomes

τc=k=1KckE{Y(k)},\tau_{c}=\sum_{k=1}^{K}c_{k}E\{Y(k)\},

where k=1Kck=0\sum_{k=1}^{K}c_{k}=0.

Define the sensitivity parameters as below.

Definition S1.

For any two treatment levels kk and ll, define the sensitivity parameters as

εk,l(X)=E{Y(k)Z=k,X}E{Y(k)Z=l,X}.\varepsilon_{k,l}(X)=\frac{E\{Y(k)\mid Z=k,X\}}{E\{Y(k)\mid Z=l,X\}}.

In Definition S1, we must have εk,k(X)=1 for all k. Under Definition S1, we have the following identification formulas for the means of the potential outcomes

E{Y(k)}\displaystyle E\{Y(k)\} =\displaystyle= l=1KE{1(Z=l)μk(X)εk,l(X)}\displaystyle\sum_{l=1}^{K}E\left\{1(Z=l)\frac{\mu_{k}(X)}{\varepsilon_{k,l}(X)}\right\}
=\displaystyle= l=1KE{el(X)εk,l(X)1(Z=k)Yek(X)}\displaystyle\sum_{l=1}^{K}E\left\{\frac{e_{l}(X)}{\varepsilon_{k,l}(X)}\frac{1(Z=k)Y}{e_{k}(X)}\right\}
=\displaystyle= l=1KE[1(Z=l)μk(X)εk,l(X)+el(X)εk,l(X)1(Z=k){Yμk(X)}ek(X)],\displaystyle\sum_{l=1}^{K}E\left[1(Z=l)\frac{\mu_{k}(X)}{\varepsilon_{k,l}(X)}+\frac{e_{l}(X)}{\varepsilon_{k,l}(X)}\frac{1(Z=k)\{Y-\mu_{k}(X)\}}{e_{k}(X)}\right],

where ek(X)=pr(Z=kX)e_{k}(X)=\textup{pr}(Z=k\mid X) is the generalized propensity score (Imbens 2000; Imai and Van Dyk 2004) and μk(X)=E{YZ=k,X}\mu_{k}(X)=E\{Y\mid Z=k,X\} is the conditional outcome mean. Motivated by the three identification formulas, we can construct the following estimators for E{Y(k)}E\{Y(k)\},

μ^kreg\displaystyle\hat{\mu}_{k}^{\textup{reg}} =\displaystyle= n1i=1nl=1K1(Zi=l)μ^k(Xi)εk,l(Xi)\displaystyle n^{-1}\sum_{i=1}^{n}\sum_{l=1}^{K}\frac{1(Z_{i}=l)\hat{\mu}_{k}(X_{i})}{\varepsilon_{k,l}(X_{i})}
μ^kht\displaystyle\hat{\mu}_{k}^{\textup{ht}} =\displaystyle= n1i=1nl=1Ke^l(Xi)1(Zi=k)Yiεk,l(Xi)e^k(Xi)\displaystyle n^{-1}\sum_{i=1}^{n}\sum_{l=1}^{K}\frac{\hat{e}_{l}(X_{i})1(Z_{i}=k)Y_{i}}{\varepsilon_{k,l}(X_{i})\hat{e}_{k}(X_{i})}
μ^kdr\displaystyle\hat{\mu}_{k}^{\textup{dr}} =\displaystyle= μ^kreg+n1i=1nl=1Ke^l(Xi)1(Zi=k){Yiμ^k(Xi)}εk,l(Xi)e^k(Xi),\displaystyle\hat{\mu}_{k}^{\textup{reg}}+n^{-1}\sum_{i=1}^{n}\sum_{l=1}^{K}\frac{\hat{e}_{l}(X_{i})1(Z_{i}=k)\{Y_{i}-\hat{\mu}_{k}(X_{i})\}}{\varepsilon_{k,l}(X_{i})\hat{e}_{k}(X_{i})},

and the following estimator for the causal parameter of interest

τ^c=k=1Kckμ^k(=reg,ht,dr).\hat{\tau}_{c}^{*}=\sum_{k=1}^{K}c_{k}\hat{\mu}_{k}^{*}\ (*=\textup{reg},\textup{ht},\textup{dr}).
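
The R sketch below illustrates the doubly robust estimator μ̂kdr; the multinomial logistic model for the generalized propensity score is one convenient choice, eps is a K×K matrix of constant sensitivity parameters with unit diagonal, and all names are illustrative.

# A minimal sketch of the doubly robust estimator of E{Y(k)} with a
# multi-valued treatment Z in 1:K (for K >= 3, predict() returns an n x K
# matrix of generalized propensity scores); all names are illustrative.
library(nnet)
mu_k_dr <- function(data, k, K, eps) {
  gps_fit <- multinom(factor(Z) ~ X1 + X2, data = data, trace = FALSE)
  e_hat   <- predict(gps_fit, type = "probs")
  out_fit <- lm(Y ~ X1 + X2, data = subset(data, Z == k))
  muk_hat <- predict(out_fit, newdata = data)
  n   <- nrow(data)
  est <- 0
  for (l in 1:K) {
    est <- est +
      sum((data$Z == l) * muk_hat / eps[k, l]) / n +            # regression part
      sum(e_hat[, l] * (data$Z == k) *
            (data$Y - muk_hat) / (eps[k, l] * e_hat[, k])) / n  # augmentation
  }
  est
}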

S1.4 Extensions to the controlled direct effect

With an intermediate variable SS between the treatment ZZ and the outcome YY, we use the notation Y(z,s)Y(z,s) to denote the potential outcome under treatment assignment Z=zZ=z and intermediate value S=sS=s. When ZZ is binary, for a fixed level ss, we define the controlled direct effect as

CDE(s)=E{Y(1,s)Y(0,s)},\textsc{CDE}(s)=E\{Y(1,s)-Y(0,s)\},

which measures the average treatment effect of Z when the intermediate variable is fixed at level s, and thus the direct effect of the treatment on the outcome with the mediator controlled at s. As an extension, if S is discrete, we can view the problem of identifying and estimating the controlled direct effect as an observational study with multiple treatment levels (VanderWeele 2015; Ding 2024, Chapter 28). Therefore, we can apply the framework in Section S1.3 to conduct sensitivity analysis for the controlled direct effect. We omit the details.

S1.5 Formulas for the Hajek-type estimators

Under our sensitivity analysis framework, Theorem 2 motivates the following Hajek estimator of the average causal effect

τ^haj=i=1nw^1(Xi)ZiYie^(Xi)/i=1nZie^(Xi)i=1nw^0(Xi)(1Zi)Yi1e^(Xi)/i=1n1Zi1e^(Xi).\hat{\tau}^{\textup{haj}}=\sum_{i=1}^{n}\hat{w}_{1}(X_{i})\frac{Z_{i}Y_{i}}{\hat{e}(X_{i})}\Big{/}\sum_{i=1}^{n}\frac{Z_{i}}{\hat{e}(X_{i})}-\sum_{i=1}^{n}\hat{w}_{0}(X_{i})\frac{(1-Z_{i})Y_{i}}{1-\hat{e}(X_{i})}\Big{/}\sum_{i=1}^{n}\frac{1-Z_{i}}{1-\hat{e}(X_{i})}.

The denominators in the Hajek estimator take the same form as the canonical ones, which do not depend on ε1(Xi)\varepsilon_{1}(X_{i}) or ε0(Xi)\varepsilon_{0}(X_{i}). We also have the following alternative form of the doubly robust estimator based on the Hajek estimator

τ^dr2=τ^proj+i=1nw^1(Xi)ZiYˇie^(Xi)/i=1nZie^(Xi)i=1nw^0(Xi)(1Zi)Yˇi1e^(Xi)/i=1n1Zi1e^(Xi).\hat{\tau}^{\textup{dr2}}=\hat{\tau}^{\textup{proj}}+\sum_{i=1}^{n}\hat{w}_{1}(X_{i})\frac{Z_{i}\check{Y}_{i}}{\hat{e}(X_{i})}\Big{/}\sum_{i=1}^{n}\frac{Z_{i}}{\hat{e}(X_{i})}-\sum_{i=1}^{n}\hat{w}_{0}(X_{i})\frac{(1-Z_{i})\check{Y}_{i}}{1-\hat{e}(X_{i})}\Big{/}\sum_{i=1}^{n}\frac{1-Z_{i}}{1-\hat{e}(X_{i})}.

In the special case with the unconfoundedness assumption, where ε1(Xi)=ε0(Xi)=1, they reduce to the standard Hajek and doubly robust estimators for τ.
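
For reference, here is a minimal R sketch of τ̂haj with constant sensitivity parameters and fitted propensity scores e_hat (all names are illustrative):

# A minimal sketch of the Hajek estimator under the sensitivity analysis
# framework, with constant sensitivity parameters eps1 and eps0.
hajek_sa <- function(Y, Z, e_hat, eps1, eps0) {
  w1 <- e_hat + (1 - e_hat) / eps1               # weight w_1(X)
  w0 <- e_hat * eps0 + 1 - e_hat                 # weight w_0(X)
  sum(w1 * Z * Y / e_hat) / sum(Z / e_hat) -
    sum(w0 * (1 - Z) * Y / (1 - e_hat)) / sum((1 - Z) / (1 - e_hat))
}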

Similarly, Theorem S1 motivates the following Hajek estimator of the average treatment effect on the treated units, τ^thaj=μ^t1μ^t0haj\hat{\tau}_{\textsc{t}}^{\textup{haj}}=\hat{\mu}_{\textsc{t}1}-\hat{\mu}_{\textsc{t}0}^{\textup{haj}} where

μ^t0haj=i=1nε0(Xi)o^(Xi)(1Zi)Yi/i=1no^(Xi)(1Zi).\hat{\mu}_{\textsc{t}0}^{\textup{haj}}=\sum_{i=1}^{n}\varepsilon_{0}(X_{i})\hat{o}(X_{i})(1-Z_{i})Y_{i}\Big{/}\sum_{i=1}^{n}\hat{o}(X_{i})(1-Z_{i}).

The doubly robust estimator is τ^tdr2=μ^t1μ^t0dr2\hat{\tau}_{\textsc{t}}^{\textup{dr2}}=\hat{\mu}_{\textsc{t}1}-\hat{\mu}_{\textsc{t}0}^{\textup{dr2}} with

μ^t0dr2=μ^t0reg+i=1nε0(Xi)o^(Xi)(1Zi)Yˇi/i=1no^(Xi)(1Zi).\hat{\mu}_{\textsc{t}0}^{\textup{dr2}}=\hat{\mu}_{\textsc{t}0}^{\text{reg}}+\sum_{i=1}^{n}\varepsilon_{0}(X_{i})\hat{o}(X_{i})(1-Z_{i})\check{Y}_{i}\Big{/}\sum_{i=1}^{n}\hat{o}(X_{i})(1-Z_{i}).

Again, in the special case when ε0(Xi)=1\varepsilon_{0}(X_{i})=1, they reduce to the standard Hajek estimator and doubly robust estimator for τt\tau_{\textsc{t}}.

S1.6 Formulas for the bias-corrected matching estimator

We also provide formulas for the bias-corrected matching estimator under our sensitivity analysis framework. Matching estimators impute the missing counterfactual Yi(1−z) for an individual i in treatment group Zi=z by finding the M nearest neighbors of i in treatment group Zi=1−z and using the average of their observed outcomes as the imputed value of Yi(1−z), for z∈{0,1}. The matching-based estimator of τ is then constructed by averaging the imputed individual treatment effects (Abadie and Imbens 2006). This estimator is generally not consistent when the nearest neighbor matching is based on multi-dimensional observed covariates. Abadie and Imbens (2011) proposed a bias-corrected version of the matching estimator that estimates the conditional outcome models μz(X) and combines them with the original matching estimator. As shown in Lemma 5.1 of Lin et al. (2023), under Assumption 1, viewing matching as a nonparametric method of estimating the propensity score, we can rewrite the bias-corrected matching estimator of Abadie and Imbens (2011) as

τ^Mbc\displaystyle\hat{\tau}^{\textup{bc}}_{M} =\displaystyle= τ^reg+n1i=1n[{1+KM1(Xi)M}ZiYˇi{1+KM0(Xi)M}(1Zi)Yˇi],\displaystyle\hat{\tau}^{\textup{reg}}+n^{-1}\sum_{i=1}^{n}\left[\left\{1+\frac{K_{M}^{1}(X_{i})}{M}\right\}Z_{i}\check{Y}_{i}-\left\{1+\frac{K_{M}^{0}(X_{i})}{M}\right\}(1-Z_{i})\check{Y}_{i}\right],

where M is the fixed number of matches for each observation and KMz(Xi) is the number of times unit i in treatment group z is used as a match, for z∈{0,1}.

Under our sensitivity analysis framework, we can rewrite the bias-corrected matching estimator as

τ^Mbc\displaystyle\hat{\tau}^{\textup{bc}}_{M} =\displaystyle= τ^pred+n1i=1n{KM1(Xi)M1ε1(Xi)ZiYˇiKM0(Xi)Mε0(Xi)(1Zi)Yˇi}.\displaystyle\hat{\tau}^{\textup{pred}}+n^{-1}\sum_{i=1}^{n}\left\{\frac{K_{M}^{1}(X_{i})}{M}\frac{1}{\varepsilon_{1}(X_{i})}Z_{i}\check{Y}_{i}-\frac{K_{M}^{0}(X_{i})}{M}\varepsilon_{0}(X_{i})(1-Z_{i})\check{Y}_{i}\right\}.

Lin et al. (2023) show that, under some regularity conditions, nzKMz(Xi)/(n1−zM) consistently estimates the density ratio fX∣Z=1−z(Xi)/fX∣Z=z(Xi) for z∈{0,1}, where nz is the total number of observations in treatment group Z=z. Moreover, n1/n0 is a consistent estimator of the ratio of the marginal probabilities pr(Z=1)/pr(Z=0); thus we essentially use 1+KM1(Xi)/M and 1+KM0(Xi)/M to estimate 1/e(Xi) and 1/{1−e(Xi)}, respectively.
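
For completeness, a minimal R sketch of the matched-count ingredient KMz(Xi): for each unit it finds the M nearest neighbors, in Euclidean distance on the covariates, within the opposite treatment group, and accumulates how many times each unit serves as a match. The implementation details (distance metric, tie handling) are our illustrative choices.

# A minimal sketch computing K_M^z(X_i), the number of times each unit is
# used as a match, via Euclidean nearest neighbors (names illustrative).
matched_counts <- function(X, Z, M = 1) {
  X <- as.matrix(X)
  n <- nrow(X)
  K <- numeric(n)
  for (i in 1:n) {
    pool <- which(Z != Z[i])  # candidate matches: opposite treatment group
    d <- sqrt(colSums((t(X[pool, , drop = FALSE]) - X[i, ])^2))
    idx <- pool[order(d)[seq_len(M)]]
    K[idx] <- K[idx] + 1      # unit idx is matched one more time
  }
  K                           # K[i] equals K_M^{Z_i}(X_i)
}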

S1.7 Semiparametric efficiency bounds for τ\tau and τT\tau_{\mathrm{\scriptscriptstyle T}}

In this subsection, we provide the semiparametric efficiency bounds for τ and τT under our sensitivity analysis framework in the following proposition.

Proposition S2.

Under Definition 1, the semiparametric efficiency bounds for τ\tau and τT\tau_{\mathrm{\scriptscriptstyle T}} are

var(ϕ)\displaystyle\text{var}(\phi) =\displaystyle= E{w12(X)e(X)var(YX,Z=1)+w02(X)1e(X)var(YX,Z=0)}\displaystyle E\left\{\frac{w_{1}^{2}(X)}{e(X)}\text{var}(Y\mid X,Z=1)+\frac{w_{0}^{2}(X)}{1-e(X)}\text{var}(Y\mid X,Z=0)\right\}
+E[e(X){μ1(X)μ0(X)ε0(X)}2+{1e(X)}{μ1(X)ε1(X)μ0(X)}2τ2],\displaystyle+E\left[e(X)\left\{\mu_{1}(X)-\mu_{0}(X)\varepsilon_{0}(X)\right\}^{2}+\left\{1-e(X)\right\}\left\{\frac{\mu_{1}(X)}{\varepsilon_{1}(X)}-\mu_{0}(X)\right\}^{2}-\tau^{2}\right],

and

var(ϕT)\displaystyle\text{var}(\phi_{\mathrm{\scriptscriptstyle T}}) =\displaystyle= E[e(X)var(YX,Z=1)e2+e2(X)ε02(X)var(YX,Z=0)e2{1e(X)}]\displaystyle E\left[\frac{e(X)\text{var}(Y\mid X,Z=1)}{e^{2}}+\frac{e^{2}(X)\varepsilon_{0}^{2}(X)\text{var}(Y\mid X,Z=0)}{e^{2}\left\{1-e(X)\right\}}\right]
+E[e(X){μ1(X)ε0(X)μ0(X)τT}2e2],\displaystyle+E\left[\frac{e(X)\left\{\mu_{1}(X)-\varepsilon_{0}(X)\mu_{0}(X)-\tau_{\mathrm{\scriptscriptstyle T}}\right\}^{2}}{e^{2}}\right],

respectively.

Proposition S2 shows that the semiparametric efficiency bound for τ\tau is not monotonic in ε1(X)\varepsilon_{1}(X) and ε0(X)\varepsilon_{0}(X) even when they are constant, and similarly that for τT\tau_{\mathrm{\scriptscriptstyle T}} is not monotonic in ε0(X)\varepsilon_{0}(X) either. This echoes our discussion in Section 5.1.

S1.8 Connection to the treatment selection model

In this subsection, we show that our proposed sensitivity parameters are also related to the treatment selection model, although their definitions are based on the outcome models.

Take the treatment selection model pr{Z=1X,Y(1)}\textup{pr}\{Z=1\mid X,Y(1)\} for example. Under Definition 1, when ε1(X)1\varepsilon_{1}(X)\neq 1, pr{Z=1X,Y(1)}\textup{pr}\{Z=1\mid X,Y(1)\} is generally not equal to e(X)e(X) and does depend on Y(1)Y(1). We next derive the relationship between pr{Z=1X,Y(1)}\textup{pr}\{Z=1\mid X,Y(1)\} and e(X)e(X) to provide the connection between our sensitivity parameters and the selection model.

We start with the special case when YY is binary. In this case, we have

e(X)\displaystyle e(X) =\displaystyle= e(X,1)w1(X)μ1(X)+e(X,0){1w1(X)μ1(X)},\displaystyle e(X,1)w_{1}(X)\mu_{1}(X)+e(X,0)\left\{1-w_{1}(X)\mu_{1}(X)\right\}, (S7)

where e(X,y)=E{Z∣X,Y(1)=y} is the propensity score given the potential outcome Y(1)=y, for y=0,1. The term w1(X)=e(X)+{1−e(X)}/ε1(X) on the right-hand side depends on ε1(X), so (S7) shows the connection between the sensitivity parameter and the selection model. Previous sensitivity analysis frameworks bound the ratio

pr{Z=1X,Y(1)}/[1pr{Z=1X,Y(1)}]e(X)/{1e(X)}\frac{\textup{pr}\{Z=1\mid X,Y(1)\}/[1-\textup{pr}\{Z=1\mid X,Y(1)\}]}{e(X)/\{1-e(X)\}}

to estimate E{Y(1)}, which works better with inverse probability weighting estimators (Zhao et al. 2019; Dorn and Guo 2022).

For a general YY, we first introduce a set of ε1(X,y)\varepsilon_{1}(X,y) to denote the ratio between the observed and counterfactual conditional densities,

pY(1)X,Z=1(yX,Z=1)pY(1)X,Z=0(yX,Z=0)\displaystyle\frac{p_{Y(1)\mid X,Z=1}(y\mid X,Z=1)}{p_{Y(1)\mid X,Z=0}(y\mid X,Z=0)} =\displaystyle= ε1(X,y),\displaystyle\varepsilon_{1}(X,y),

where pY(1)∣X,Z=z(y∣X,Z=z) denotes the conditional density of Y(1) in treatment group Z=z for z=0,1. The constraint ∫y{1−ε1(X)/ε1(X,y)}pY(1)∣X,Z=1(y∣X,Z=1)dy=0 ensures that the density ratios ε1(X,y) are compatible with the mean ratio ε1(X) in Definition 1. We have the following equality

e(X)\displaystyle e(X) =\displaystyle= e(X,y){e(X)+1e(X)ε1(X,y)}pY(1)X,Z=1(yX,Z=1)dy.\displaystyle\int e(X,y)\left\{e(X)+\frac{1-e(X)}{\varepsilon_{1}(X,y)}\right\}p_{Y(1)\mid X,Z=1}(y\mid X,Z=1)\textup{d}y.

This provides the relation between the treatment selection model pr{Z=1∣X,Y(1)} and our sensitivity parameters.

Appendix S2 R package and implementation

We provide an R package saci to conduct the proposed sensitivity analysis. The package can be installed from GitHub using the command: devtools::install_github("sizhu-lu/saci"). We provide the following functions in this package:

sa_ate(X, Z, Y, eps1_list, eps0_list,
pscore_family="binomial", outcome_family="gaussian",
n_boot=500, truncpscore=c(0,1))

sa_ate is used to conduct sensitivity analysis of the average treatment effect. The inputs include the observed data, the user-specified lists of ε1 and ε0 values, the “family” options of the generalized linear models for the propensity score and the outcome, the total number of bootstrap samples used to compute the estimated standard errors, and the truncation cutoffs for the estimated propensity score. This function outputs the point estimate, estimated standard error, p-value, and 95% confidence interval for each pair of sensitivity parameters (ε1,ε0) in the user-specified lists and for each method (proj, ht, hajek, and dr). We use the nonparametric bootstrap for standard error estimation.
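
For example, a call on simulated data might look as follows; the data-generating model and the choices of sensitivity parameter lists are purely illustrative.

# Illustrative call to sa_ate with simulated data; the interface follows
# the description above, and the data-generating model is arbitrary.
set.seed(1)
n <- 500
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
Z <- rbinom(n, 1, plogis(0.5 * X[, 1] - 0.5 * X[, 2]))
Y <- 1 + Z + X[, 1] + X[, 2] + rnorm(n)
ate_result <- sa_ate(X, Z, Y,
                     eps1_list = c(0.9, 1, 1.1),
                     eps0_list = c(0.9, 1, 1.1),
                     n_boot = 200, truncpscore = c(0.05, 0.95))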

sa_att(X, Z, Y, eps0_list,
pscore_family="binomial", outcome_family="gaussian",
n_boot=500, truncpscore=c(0,1))

sa_att is used to conduct sensitivity analysis of the average treatment effect on the treated. The inputs and outputs are similar to sa_ate, except that the estimation procedure only depends on ε0\varepsilon_{0}.

sa_rr(X, Z, Y, eps1_list, eps0_list,
pscore_family="binomial", outcome_family="binomial",
n_boot=500, truncpscore=c(0,1), log="FALSE")

For binary potential outcomes, the function sa_rr is used to conduct sensitivity analysis of the causal risk ratio or the log causal risk ratio. The inputs include the observed data, the user-specified lists of ε1 and ε0 values, the “family” options of the generalized linear models for the propensity score and the outcome, the total number of bootstrap samples used to compute the estimated standard errors, the truncation cutoffs for the estimated propensity score, and an option to report the causal risk ratio on the logarithmic scale. This function outputs the point estimate, estimated standard error, p-value, and 95% confidence interval for each pair of sensitivity parameters (ε1,ε0) in the user-specified lists and for each method (proj, ht, hajek, and dr). Again, we use the nonparametric bootstrap for standard error estimation.
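
Continuing the simulated example above, an illustrative call with a binary outcome might be:

# Illustrative call to sa_rr, reusing X and Z from the sa_ate example
# above; log="TRUE" requests the log causal risk ratio.
Yb <- rbinom(n, 1, plogis(-1 + Z + X[, 1]))
rr_result <- sa_rr(X, Z, Yb,
                   eps1_list = c(0.9, 1, 1.1),
                   eps0_list = c(0.9, 1, 1.1),
                   n_boot = 200, log = "TRUE")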

sa_or(X, Z, Y, eps1_list, eps0_list,
pscore_family="binomial", outcome_family="binomial",
n_boot=500, truncpscore=c(0,1), log="FALSE")

For binary potential outcomes, the function sa_or is used to conduct sensitivity analysis of the causal odds ratio or log causal odds ratio. The inputs and outputs are similar to sa_rr.

We also provide a function plot_contour to generate a contour plot of the estimated average treatment effects with different values of sensitivity parameters. The usage is

plot_contour(ate_result, caption="", estimator="dr", value="est")

where ate_result is the output of the sa_ate function and the default value to plot is the doubly robust point estimator. Figure 1 shows a sample plot for the observational study in Section 5.1. Users are also free to change the estimation method to other estimators and the plotted values to lower or upper confidence bounds by specifying, e.g., estimator="ht" and value="ci_lb".

Appendix S3 Proofs

S3.1 Proof of Theorem 1

We have

E{Y(1)Z=0}\displaystyle E\{Y(1)\mid Z=0\} =\displaystyle= E[E{Y(1)Z=0,X}Z=0]\displaystyle E\left[E\{Y(1)\mid Z=0,X\}\mid Z=0\right]
=\displaystyle= E[E{Y(1)Z=1,X}/ε1(X)Z=0]\displaystyle E\left[E\{Y(1)\mid Z=1,X\}/\varepsilon_{1}(X)\mid Z=0\right]
=\displaystyle= E{μ1(X)/ε1(X)Z=0}\displaystyle E\left\{\mu_{1}(X)/\varepsilon_{1}(X)\mid Z=0\right\}

and

E{Y(0)Z=1}\displaystyle E\{Y(0)\mid Z=1\} =\displaystyle= E[E{Y(0)Z=1,X}Z=1]\displaystyle E\left[E\{Y(0)\mid Z=1,X\}\mid Z=1\right]
=\displaystyle= E[E{Y(0)Z=0,X}ε0(X)Z=1]\displaystyle E\left[E\{Y(0)\mid Z=0,X\}\varepsilon_{0}(X)\mid Z=1\right]
=\displaystyle= E{μ0(X)ε0(X)Z=1}.\displaystyle E\left\{\mu_{0}(X)\varepsilon_{0}(X)\mid Z=1\right\}.

Therefore, we can write τ\tau as the difference between

E{Y(1)}\displaystyle E\{Y(1)\} =\displaystyle= E(YZ=1)pr(Z=1)+E{μ1(X)/ε1(X)Z=0}pr(Z=0),\displaystyle E(Y\mid Z=1)\textup{pr}(Z=1)+E\left\{\mu_{1}(X)/\varepsilon_{1}(X)\mid Z=0\right\}\textup{pr}(Z=0),
E{Y(0)}\displaystyle E\{Y(0)\} =\displaystyle= E{μ0(X)ε0(X)Z=1}pr(Z=1)+E(YZ=0)pr(Z=0),\displaystyle E\left\{\mu_{0}(X)\varepsilon_{0}(X)\mid Z=1\right\}\textup{pr}(Z=1)+E(Y\mid Z=0)\textup{pr}(Z=0),

which can be simplified as the difference between

E{Y(1)}=E{ZY+(1Z)μ1(X)/ε1(X)},E{Y(0)}=E{Zμ0(X)ε0(X)+(1Z)Y}.\displaystyle E\{Y(1)\}=E\{ZY+(1-Z)\mu_{1}(X)/\varepsilon_{1}(X)\},\quad E\{Y(0)\}=E\{Z\mu_{0}(X)\varepsilon_{0}(X)+(1-Z)Y\}.

This gives (7). The identification formula (6) follows from

E(ZY)=E(YZ=1)pr(Z=1)=E{μ1(X)Z=1}pr(Z=1)E(ZY)=E(Y\mid Z=1)\textup{pr}(Z=1)=E\{\mu_{1}(X)\mid Z=1\}\textup{pr}(Z=1)

and

E{(1Z)Y}=E(YZ=0)pr(Z=0)=E{μ0(X)Z=0}pr(Z=0).E\{(1-Z)Y\}=E(Y\mid Z=0)\textup{pr}(Z=0)=E\{\mu_{0}(X)\mid Z=0\}\textup{pr}(Z=0).

S3.2 Proof of Theorem 2

We only prove the result under treatment because the proof for the result under control is similar. On the one hand, we can use (6) to obtain

E{Y(1)}\displaystyle E\{Y(1)\} =\displaystyle= E{ZY+(1Z)μ1(X)/ε1(X)}\displaystyle E\{ZY+(1-Z)\mu_{1}(X)/\varepsilon_{1}(X)\}
=\displaystyle= E[e(X)μ1(X)+{1e(X)}μ1(X)/ε1(X)]\displaystyle E\left[e(X)\mu_{1}(X)+\{1-e(X)\}\mu_{1}(X)/\varepsilon_{1}(X)\right]
=\displaystyle= E{w1(X)μ1(X)}.\displaystyle E\{w_{1}(X)\mu_{1}(X)\}.

On the other hand, we can condition on XX to obtain

E{w1(X)Ze(X)Y}\displaystyle E\left\{w_{1}(X)\frac{Z}{e(X)}Y\right\} =\displaystyle= E{w1(X)E(ZX)e(X)E(YZ=1,X)}\displaystyle E\left\{w_{1}(X)\frac{E(Z\mid X)}{e(X)}E(Y\mid Z=1,X)\right\}
=\displaystyle= E{w1(X)μ1(X)}.\displaystyle E\{w_{1}(X)\mu_{1}(X)\}.

The conclusion for Y(1)Y(1) follows.

S3.3 Proof of Theorem 3

We first present a short but informal proof. Based on the form E{Y(1)}=E{w1(X)μ1(X)}E\{Y(1)\}=E\{w_{1}(X)\mu_{1}(X)\} as a byproduct in the proof of Theorem 2, we can derive the efficient influence function for E{Y(1)}E\{Y(1)\} by extending the result of Hahn (1998). The key difference is that the weight depends on the propensity score, so we must include an additional term due to the derivative of w1(X)w_{1}(X) with respect to e(X)e(X). This additional term comes from the path-wise derivative e˙θ(X)|θ=0=E[{Ze(X)}s(ZX)X]\dot{e}_{\theta}(X)|_{\theta=0}=E[\{Z-e(X)\}s(Z\mid X)\mid X], where θ\theta is the parameter for the sub-model and s(ZX)s(Z\mid X) is the score function for E(ZX)E(Z\mid X). Including this additional term yields the following efficient influence function for E{Y(1)}E\{Y(1)\}:

w1(X){Ze(X)YZe(X)e(X)μ1(X)}+ε1(X)1ε1(X){Ze(X)}μ1(X)\displaystyle w_{1}(X)\left\{\frac{Z}{e(X)}Y-\frac{Z-e(X)}{e(X)}\mu_{1}(X)\right\}+\frac{\varepsilon_{1}(X)-1}{\varepsilon_{1}(X)}\{Z-e(X)\}\mu_{1}(X)
=\displaystyle= w1(X)Ze(X)Y{w1(X)e(X)ε1(X)1ε1(X)}{Ze(X)}μ1(X),\displaystyle w_{1}(X)\frac{Z}{e(X)}Y-\left\{\frac{w_{1}(X)}{e(X)}-\frac{\varepsilon_{1}(X)-1}{\varepsilon_{1}(X)}\right\}\{Z-e(X)\}\mu_{1}(X),

which reduces to ϕ1\phi_{1} by simple algebra. The efficient influence function for E{Y(0)}E\{Y(0)\} follows from a similar calculation.

We then provide a longer but formal proof. We follow the semiparametric theory in Bickel et al. (1993) and Hahn (1998) to derive the efficient influence function for E{Y(1)}E\left\{Y(1)\right\}. We denote the vector of all observed variables V=(X,Z,Y)V=(X,Z,Y) and factorize the likelihood as

P(V)=P(X)P(ZX)P(YZ,X).P(V)=P(X)P(Z\mid X)P(Y\mid Z,X).

To derive the efficient influence function for E{Y(1)}, we consider a one-dimensional parametric submodel Pθ(V) which contains the true model P(V) at θ=0. We use θ in the subscript to denote a quantity under the submodel, e.g., eθ(X) is the value of e(X) under the submodel. We use a dot to denote the partial derivative with respect to θ, e.g., ėθ(X)=∂eθ(X)/∂θ, and use sθ(·) to denote the score function under the submodel. We decompose the score function under the submodel as

sθ(V)=sθ(X)+sθ(ZX)+sθ(YZ,X)s_{\theta}(V)=s_{\theta}(X)+s_{\theta}(Z\mid X)+s_{\theta}(Y\mid Z,X)

where sθ(X)=logPθ(X)/θs_{\theta}(X)=\partial\log P_{\theta}(X)/\partial\theta, sθ(ZX)=logPθ(ZX)/θs_{\theta}(Z\mid X)=\partial\log P_{\theta}(Z\mid X)/\partial\theta, and sθ(YZ,X)=logPθ(YZ,X)/θs_{\theta}(Y\mid Z,X)=\partial\log P_{\theta}(Y\mid Z,X)/\partial\theta are the score functions corresponding to the three components of the likelihood function. We write sθ()|θ=0\left.s_{\theta}(\cdot)\right|_{\theta=0} as s()s(\cdot), which is the score function evaluated at the true model θ=0\theta=0 under the one-dimensional submodel. Let ϕ1(V)\phi_{1}\left(V\right) denote the efficient influence function for E{Y(1)}E\left\{Y(1)\right\}. It must satisfy the equation

Eθ{Y(1)}θ|θ=0=E{ϕ1(V)s(V)}.\left.\frac{\partial E_{\theta}\left\{Y(1)\right\}}{\partial\theta}\right|_{\theta=0}=E\left\{\phi_{1}(V)s(V)\right\}.

To find such ϕ1(V)\phi_{1}(V), we need to calculate the derivative of

Eθ{Y(1)}\displaystyle E_{\theta}\left\{Y\left(1\right)\right\} =\displaystyle= Eθ{w1(X)μ1(X)}\displaystyle E_{\theta}\left\{w_{1}\left(X\right)\mu_{1}\left(X\right)\right\}
=\displaystyle= eθ(x)ε1(x)+1eθ(x)ε1(x)μ1,θ(x)𝑑Pθ(x).\displaystyle\int\frac{e_{\theta}\left(x\right)\varepsilon_{1}\left(x\right)+1-e_{\theta}\left(x\right)}{\varepsilon_{1}\left(x\right)}\mu_{1,\theta}\left(x\right)dP_{\theta}\left(x\right).

Therefore, the Gateaux derivative of our parameter of interest under the parametric submodel with respect to θ\theta is

Eθ{Y(1)}θ|θ=0\displaystyle\left.\frac{\partial E_{\theta}\left\{Y(1)\right\}}{\partial\theta}\right|_{\theta=0} =\displaystyle= e˙θ(x)|θ=0ε1(x)e˙θ(x)|θ=0ε1(x)μ1(x)𝑑P(x)\displaystyle\int\frac{\left.\dot{e}_{\theta}\left(x\right)\right|_{\theta=0}\varepsilon_{1}\left(x\right)-\left.\dot{e}_{\theta}\left(x\right)\right|_{\theta=0}}{\varepsilon_{1}\left(x\right)}\mu_{1}\left(x\right)dP\left(x\right)
+w1(x)μ˙1,θ(x)|θ=0dP(x)\displaystyle+\int w_{1}\left(x\right)\left.\dot{\mu}_{1,\theta}\left(x\right)\right|_{\theta=0}dP\left(x\right)
+{w1(x)μ1(x)μ1}s(x)𝑑P(x)\displaystyle+\int\left\{w_{1}\left(x\right)\mu_{1}\left(x\right)-\mu_{1}\right\}s\left(x\right)dP\left(x\right)
=\displaystyle= E[e˙θ(X)|θ=0{11ε1(X)}μ1(X)]\displaystyle E\left[\left.\dot{e}_{\theta}\left(X\right)\right|_{\theta=0}\left\{1-\frac{1}{\varepsilon_{1}\left(X\right)}\right\}\mu_{1}\left(X\right)\right]
+E{w1(X)μ˙1,θ(X)|θ=0}\displaystyle+E\left\{w_{1}\left(X\right)\left.\dot{\mu}_{1,\theta}\left(X\right)\right|_{\theta=0}\right\}
+E[{w1(X)μ1(X)μ1}s(X)].\displaystyle+E\left[\left\{w_{1}\left(X\right)\mu_{1}\left(X\right)-\mu_{1}\right\}s\left(X\right)\right].

The calculations of the two building blocks ėθ(X)|θ=0 and μ̇1,θ(X)|θ=0 are standard (Hahn 1998):

e˙θ(X)|θ=0\displaystyle\left.\dot{e}_{\theta}\left(X\right)\right|_{\theta=0} =\displaystyle= Eθ(ZX)θ|θ=0=E[{Ze(X)}s(ZX)X],\displaystyle\left.\frac{\partial E_{\theta}\left(Z\mid X\right)}{\partial\theta}\right|_{\theta=0}=E\left[\left\{Z-e\left(X\right)\right\}s\left(Z\mid X\right)\mid X\right],
μ˙1,θ(X)|θ=0\displaystyle\left.\dot{\mu}_{1,\theta}\left(X\right)\right|_{\theta=0} =\displaystyle= Eθ(YZ=1,X)θ|θ=0\displaystyle\left.\frac{\partial E_{\theta}\left(Y\mid Z=1,X\right)}{\partial\theta}\right|_{\theta=0}
=\displaystyle= E[{Yμ1(X)}s(YZ=1,X)Z=1,X]\displaystyle E\left[\left\{Y-\mu_{1}\left(X\right)\right\}s\left(Y\mid Z=1,X\right)\mid Z=1,X\right]
=\displaystyle= E[Z{Yμ1(X)}e(X)s(YZ,X)X].\displaystyle E\left[\frac{Z\left\{Y-\mu_{1}\left(X\right)\right\}}{e\left(X\right)}s\left(Y\mid Z,X\right)\mid X\right].

Plugging these back into the previous equation, we have

Eθ{Y(1)}θ|θ=0\displaystyle\left.\frac{\partial E_{\theta}\left\{Y(1)\right\}}{\partial\theta}\right|_{\theta=0}
=\displaystyle= E[{Ze(X)}{11ε1(X)}μ1(X)s(ZX)]\displaystyle E\left[\left\{Z-e\left(X\right)\right\}\left\{1-\frac{1}{\varepsilon_{1}\left(X\right)}\right\}\mu_{1}\left(X\right)s\left(Z\mid X\right)\right]
+E[w1(X)Z{Yμ1(X)}e(X)s(YZ,X)]\displaystyle+E\left[w_{1}\left(X\right)\frac{Z\left\{Y-\mu_{1}\left(X\right)\right\}}{e\left(X\right)}s\left(Y\mid Z,X\right)\right]
+E[{w1(X)μ1(X)μ1}s(X)]\displaystyle+E\left[\left\{w_{1}\left(X\right)\mu_{1}\left(X\right)-\mu_{1}\right\}s\left(X\right)\right]
=\displaystyle= E([{Ze(X)}{11ε1(X)}μ1(X)+w1(X)Z{Yμ1(X)}e(X)+w1(X)μ1(X)μ1]s(V))\displaystyle E\left(\left[\left\{Z-e\left(X\right)\right\}\left\{1-\frac{1}{\varepsilon_{1}\left(X\right)}\right\}\mu_{1}\left(X\right)+w_{1}\left(X\right)\frac{Z\left\{Y-\mu_{1}\left(X\right)\right\}}{e\left(X\right)}+w_{1}\left(X\right)\mu_{1}\left(X\right)-\mu_{1}\right]s\left(V\right)\right)
=\displaystyle= E([w1(X)ZYe(X){Ze(X)}μ1(X)e(X)ε1(X)μ1]s(V)),\displaystyle E\left(\left[w_{1}\left(X\right)\frac{ZY}{e\left(X\right)}-\frac{\left\{Z-e\left(X\right)\right\}\mu_{1}\left(X\right)}{e\left(X\right)\varepsilon_{1}\left(X\right)}-\mu_{1}\right]s\left(V\right)\right),

where the second equality is by the properties of score functions and the last equality is by rearranging terms. Thus,

ϕ1(V)=w1(X)ZYe(X){Ze(X)}μ1(X)e(X)ε1(X)μ1.\phi_{1}\left(V\right)=w_{1}\left(X\right)\frac{ZY}{e\left(X\right)}-\frac{\left\{Z-e\left(X\right)\right\}\mu_{1}\left(X\right)}{e\left(X\right)\varepsilon_{1}\left(X\right)}-\mu_{1}.

We can prove

ϕ0(V)=w0(X)(1Z)Y1e(X){e(X)Z}μ0(X)ε0(X)1e(X)μ0\phi_{0}\left(V\right)=w_{0}(X)\frac{(1-Z)Y}{1-e(X)}-\frac{\{e(X)-Z\}\mu_{0}(X)\varepsilon_{0}(X)}{1-e(X)}-\mu_{0}

using a similar calculation.

S3.4 Proof of Theorem 4

If the fitted propensity score model converges to e(X)e^{*}(X) and the fitted outcome models converge to μ1(X)\mu_{1}^{*}(X) and μ0(X)\mu_{0}^{*}(X), the doubly robust estimator has limit τdr=μ1drμ0dr\tau^{\textup{dr}}=\mu_{1}^{\textup{dr}}-\mu_{0}^{\textup{dr}}, where

μ1dr=E[{e(X)+1e(X)ε1(X)}ZYe(X)]E{Ze(X)e(X)μ1(X)ε1(X)},\displaystyle\mu_{1}^{\textup{dr}}=E\left[\left\{e^{*}(X)+\frac{1-e^{*}(X)}{\varepsilon_{1}(X)}\right\}\frac{ZY}{e^{*}(X)}\right]-E\left\{\frac{Z-e^{*}(X)}{e^{*}(X)}\frac{\mu_{1}^{*}(X)}{\varepsilon_{1}(X)}\right\},

and the formula for μ0dr\mu_{0}^{\textup{dr}} is omitted due to similarity. We will only show that μ1dr=E{Y(1)}\mu_{1}^{\textup{dr}}=E\{Y(1)\} if either the propensity score or the outcome model is correctly specified. The proof for the control potential outcome is analogous.

If e(X)=e(X)e^{*}(X)=e(X), the conclusion is obvious since the inverse probability weighting estimator is consistent and the additional term has mean zero. If μ1(X)=μ1(X)\mu_{1}^{*}(X)=\mu_{1}(X), then by definition and the law of iterated expectations, μ1drE{Y(1)}\mu_{1}^{\textup{dr}}-E\{Y(1)\} equals the expectation of

μ1(X)ε1(X)e(X)Δ(X)\frac{\mu_{1}(X)}{\varepsilon_{1}(X)e^{*}(X)}\Delta(X)

where

Δ(X)\displaystyle\Delta(X) =\displaystyle= {e(X)ε1(X)+1e(X)}e(X){e(X)e(X)}{e(X)ε1(X)+1e(X)}e(X)\displaystyle\{e^{*}(X)\varepsilon_{1}(X)+1-e^{*}(X)\}e(X)-\{e(X)-e^{*}(X)\}-\{e(X)\varepsilon_{1}(X)+1-e(X)\}e^{*}(X)
=\displaystyle= e(X)ε1(X)e(X)+e(X)e(X)e(X)e(X)+e(X)\displaystyle e^{*}(X)\varepsilon_{1}(X)e(X)+e(X)-e^{*}(X)e(X)-e(X)+e^{*}(X)
e(X)ε1(X)e(X)e(X)+e(X)e(X)\displaystyle-e(X)\varepsilon_{1}(X)e^{*}(X)-e^{*}(X)+e(X)e^{*}(X)
=\displaystyle= 0.\displaystyle 0.

Therefore, if μ1(X)=μ1(X)\mu_{1}^{*}(X)=\mu_{1}(X), then μ1dr=E{Y(1)}\mu_{1}^{\textup{dr}}=E\{Y(1)\}.
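
The identity Δ(X)=0 holds for any values of e*(X), e(X), and ε1(X), which is also easy to confirm numerically; a quick R sketch:

# Quick numerical check that Delta = 0 for arbitrary e*, e, and eps1.
set.seed(1)
e_star <- runif(5); e <- runif(5); eps1 <- runif(5, 0.5, 2)
Delta <- (e_star * eps1 + 1 - e_star) * e - (e - e_star) -
  (e * eps1 + 1 - e) * e_star
max(abs(Delta))  # zero up to floating point error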

S3.5 Proof of Proposition 1

Recall that εz,l and εz,u are the lower and upper bounds of εz(X), i.e., εz(X)∈[εz,l,εz,u] for z=0,1. First, when Y≥0, we have μz(X)=E(Y∣X,Z=z)≥0 for z=0,1. Therefore, (1−Z)μ1(X)≥0 and Zμ0(X)≥0. By the identification formula in Theorem 1, we have

τ\displaystyle\tau \displaystyle\geq E{Zμ1(X)+(1Z)μ1(X)ε1,uZμ0(X)ε0,u(1Z)μ0(X)}.\displaystyle E\left\{Z\mu_{1}(X)+\frac{(1-Z)\mu_{1}(X)}{\varepsilon_{1,\textsc{u}}}-Z\mu_{0}(X)\varepsilon_{0,\textsc{u}}-(1-Z)\mu_{0}(X)\right\}.

Second, when Y0Y\geq 0, we have {1e(X)}ZY/e(X)0\{1-e(X)\}ZY/e(X)\geq 0 and e(X)(1Z)Y/{1e(X)}0e(X)(1-Z)Y/\{1-e(X)\}\geq 0. Therefore, by the identification formula in Theorem 2, we have

\tau\ \geq\ E\left[\left\{e(X)+\frac{1-e(X)}{\varepsilon_{1,\textsc{u}}}\right\}\frac{ZY}{e(X)}-\left\{\varepsilon_{0,\textsc{u}}e(X)+1-e(X)\right\}\frac{(1-Z)Y}{1-e(X)}\right].

We prove the third formula by showing the third lower bound is equal to the previous two bounds if either the propensity score or the outcome model is correctly specified. If the propensity score model is correct, we have

E[{Ze(X)}μ1(X)e(X)]=E[{Ze(X)}μ0(X)1e(X)]=0.\displaystyle E\left[\frac{\left\{Z-e(X)\right\}\mu_{1}(X)}{e(X)}\right]=E\left[\frac{\left\{Z-e(X)\right\}\mu_{0}(X)}{1-e(X)}\right]=0.

If the outcome model is correct, we have

E[{e(X)+1e(X)ε1,u}ZYe(X){Ze(X)}μ1(X)ε1,ue(X)]\displaystyle E\left[\left\{e(X)+\frac{1-e(X)}{\varepsilon_{1,\textsc{u}}}\right\}\frac{ZY}{e(X)}-\frac{\left\{Z-e(X)\right\}\mu_{1}(X)}{\varepsilon_{1,\textsc{u}}e(X)}\right]
=\displaystyle= E[ZY+{1e(X)}ZYε1,ue(X){Ze(X)}μ1(X)ε1,ue(X)]\displaystyle E\left[ZY+\frac{\left\{1-e(X)\right\}ZY}{\varepsilon_{1,\textsc{u}}e(X)}-\frac{\left\{Z-e(X)\right\}\mu_{1}(X)}{\varepsilon_{1,\textsc{u}}e(X)}\right]
=\displaystyle= E[Zμ1(X)+(1Z)μ1(X)ε1,u]\displaystyle E\left[Z\mu_{1}(X)+\frac{(1-Z)\mu_{1}(X)}{\varepsilon_{1,\textsc{u}}}\right]
+E[Z{Yμ1(X)}(1Z)μ1(X)e(X){1e(X)}ZY+{Ze(X)}μ1(X)ε1,ue(X)]\displaystyle+E\left[Z\left\{Y-\mu_{1}(X)\right\}-\frac{(1-Z)\mu_{1}(X)e(X)-\left\{1-e(X)\right\}ZY+\left\{Z-e(X)\right\}\mu_{1}(X)}{\varepsilon_{1,\textsc{u}}e(X)}\right]
=\displaystyle= E[Zμ1(X)+(1Z)μ1(X)ε1,u]+E[{1+1e(X)ε1,ue(X)}Z{Yμ1(X)}]\displaystyle E\left[Z\mu_{1}(X)+\frac{(1-Z)\mu_{1}(X)}{\varepsilon_{1,\textsc{u}}}\right]+E\left[\left\{1+\frac{1-e(X)}{\varepsilon_{1,\textsc{u}}e(X)}\right\}Z\left\{Y-\mu_{1}(X)\right\}\right]
=\displaystyle= E[Zμ1(X)+(1Z)μ1(X)ε1,u],\displaystyle E\left[Z\mu_{1}(X)+\frac{(1-Z)\mu_{1}(X)}{\varepsilon_{1,\textsc{u}}}\right],

where the last equality follows from the fact that μ1(X)=E(YZ=1,X)\mu_{1}(X)=E(Y\mid Z=1,X). Similarly, we have

E\left[\left\{\varepsilon_{0,\textsc{u}}e(X)+1-e(X)\right\}\frac{(1-Z)Y}{1-e(X)}-\frac{\left\{e(X)-Z\right\}\mu_{0}(X)\varepsilon_{0,\textsc{u}}}{1-e(X)}\right]
=\displaystyle= E{Zμ0(X)ε0,u+(1Z)μ0(X)}.\displaystyle E\left\{Z\mu_{0}(X)\varepsilon_{0,\textsc{u}}+(1-Z)\mu_{0}(X)\right\}.

Therefore, the third formula of the lower bound is equal to the first two formulas if either the propensity score or the outcome model is correct.

The upper bound for τ\tau can be derived similarly.

S3.6 Proof of Theorem S1

On the one hand, we have

E{Y(0)Z=1}\displaystyle E\{Y(0)\mid Z=1\} =\displaystyle= E[E{Y(0)Z=1,X}Z=1]\displaystyle E\big{[}E\{Y(0)\mid Z=1,X\}\mid Z=1\Big{]}
=\displaystyle= E{μ0(X)ε0(X)Z=1}\displaystyle E\left\{\mu_{0}(X)\varepsilon_{0}(X)\mid Z=1\right\}
=\displaystyle= E{Zμ0(X)ε0(X)}/e.\displaystyle E\left\{Z\mu_{0}(X)\varepsilon_{0}(X)\right\}/e.

Conditioning on XX, we can simplify it to E{Y(0)Z=1}=E{e(X)μ0(X)ε0(X)}/e.E\{Y(0)\mid Z=1\}=E\left\{e(X)\mu_{0}(X)\varepsilon_{0}(X)\right\}/e.

On the other hand, we condition on XX to obtain

E{e(X)ε0(X)1Z1e(X)Y}=E{e(X)ε0(X)E(YZ=0,X)}=E{e(X)ε0(X)μ0(X)}.E\left\{e(X)\varepsilon_{0}(X)\frac{1-Z}{1-e(X)}Y\right\}=E\left\{e(X)\varepsilon_{0}(X)E(Y\mid Z=0,X)\right\}=E\left\{e(X)\varepsilon_{0}(X)\mu_{0}(X)\right\}.

Then the conclusion follows.

S3.7 Proof of Theorem S2

We first present a short but informal proof. As a byproduct of the proof of Theorem S1, we have E{ZY(0)}=E{ε0(X)e(X)μ0(X)}, which is a weighted average of μ0(X). The result follows from Hahn (1998) with an additional term due to the derivative of the weight with respect to e(X):

ε0(X)e(X){1Z1e(X)Ye(X)Z1e(X)μ0(X)}+ε0(X){Ze(X)}μ0(X)\displaystyle\varepsilon_{0}(X)e(X)\left\{\frac{1-Z}{1-e(X)}Y-\frac{e(X)-Z}{1-e(X)}\mu_{0}(X)\right\}+\varepsilon_{0}(X)\{Z-e(X)\}\mu_{0}(X)
=\displaystyle= ε0(X)e(X)1Z1e(X)Yε0(X){e(X)Z}μ0(X){e(X)1e(X)+1},\displaystyle\varepsilon_{0}(X)e(X)\frac{1-Z}{1-e(X)}Y-\varepsilon_{0}(X)\{e(X)-Z\}\mu_{0}(X)\left\{\frac{e(X)}{1-e(X)}+1\right\},

which reduces to the formula of ϕt0\phi_{\textsc{t}0} by simple algebra.

We then present a longer but formal proof. We follow the same framework and notation as in the proof of Theorem 3. Again, we have μt0=E{ZY(0)}/e=E{ε0(X)e(X)μ0(X)}/e as a byproduct of the proof of Theorem S1. For a parametric submodel with parameter θ, we have

θEθ{ZY(0)}|θ=0\displaystyle\left.\frac{\partial}{\partial\theta}E_{\theta}\left\{ZY(0)\right\}\right|_{\theta=0} =\displaystyle= θ{ε0(x)eθ(x)μ0,θ(x)pθ(x)𝑑x}|θ=0\displaystyle\left.\frac{\partial}{\partial\theta}\left\{\int\varepsilon_{0}(x)e_{\theta}(x)\mu_{0,\theta}(x)p_{\theta}(x)dx\right\}\right|_{\theta=0}
=\displaystyle= E{ε0(X)e˙θ(X)|θ=0μ0(X)}\displaystyle E\left\{\varepsilon_{0}(X)\left.\dot{e}_{\theta}(X)\right|_{\theta=0}\mu_{0}(X)\right\}
+E{ε0(X)e(X)μ˙0,θ(X)|θ=0}\displaystyle+E\left\{\varepsilon_{0}(X)e(X)\left.\dot{\mu}_{0,\theta}(X)\right|_{\theta=0}\right\}
+E{ε0(X)e(X)μ0(X)s(X)}\displaystyle+E\left\{\varepsilon_{0}(X)e(X)\mu_{0}(X)s(X)\right\}
=\displaystyle= E[ε0(X){Ze(X)}μ0(X)s(ZX)]\displaystyle E\left[\varepsilon_{0}(X)\left\{Z-e\left(X\right)\right\}\mu_{0}(X)s\left(Z\mid X\right)\right]
+E[ε0(X)e(X)(1Z){Yμ0(X)}1e(X)s(YZ,X)]\displaystyle+E\left[\varepsilon_{0}(X)e(X)\frac{\left(1-Z\right)\left\{Y-\mu_{0}\left(X\right)\right\}}{1-e\left(X\right)}s\left(Y\mid Z,X\right)\right]
+E[{ε0(X)e(X)μ0(X)eμt0}s(X)]\displaystyle+E\left[\left\{\varepsilon_{0}(X)e(X)\mu_{0}(X)-e\mu_{\textsc{t}0}\right\}s(X)\right]
=\displaystyle= E([Zε0(X)μ0(X)+ε0(X)e(X)(1Z){Yμ0(X)}1e(X)eμt0]s(V)),\displaystyle E\left(\left[Z\varepsilon_{0}(X)\mu_{0}(X)+\varepsilon_{0}(X)e(X)\frac{\left(1-Z\right)\left\{Y-\mu_{0}\left(X\right)\right\}}{1-e(X)}-e\mu_{\textsc{t}0}\right]s(V)\right),

by the facts that

e˙θ(X)|θ=0\displaystyle\left.\dot{e}_{\theta}\left(X\right)\right|_{\theta=0} =\displaystyle= E{(Ze(X))s(ZX)X},\displaystyle E\left\{\left(Z-e\left(X\right)\right)s\left(Z\mid X\right)\mid X\right\},
μ˙0,θ(X)|θ=0\displaystyle\left.\dot{\mu}_{0,\theta}(X)\right|_{\theta=0} =\displaystyle= Eθ(YZ=0,X)θ|θ=0\displaystyle\left.\frac{\partial E_{\theta}\left(Y\mid Z=0,X\right)}{\partial\theta}\right|_{\theta=0}
=\displaystyle= E{(Yμ0(X))s(YZ=0,X)Z=0,X}\displaystyle E\left\{\left(Y-\mu_{0}\left(X\right)\right)s\left(Y\mid Z=0,X\right)\mid Z=0,X\right\}
=\displaystyle= E[(1Z){Yμ0(X)}1e(X)s(YZ,X)X],\displaystyle E\left[\frac{\left(1-Z\right)\left\{Y-\mu_{0}\left(X\right)\right\}}{1-e\left(X\right)}s\left(Y\mid Z,X\right)\mid X\right],

and the properties of the score functions.

We also have that ėθ|θ=0=E{(Z−e)s(V)}. Therefore, the efficient influence function for μt0 is

ϕt0\displaystyle\phi_{\textsc{t}0} =\displaystyle= e1[Zε0(X)μ0(X)+ε0(X)e(X)(1Z){Yμ0(X)}1e(X)eμt0]e2{eμt0(Ze)}\displaystyle e^{-1}\left[Z\varepsilon_{0}(X)\mu_{0}(X)+\varepsilon_{0}(X)e(X)\frac{\left(1-Z\right)\left\{Y-\mu_{0}\left(X\right)\right\}}{1-e(X)}-e\mu_{\textsc{t}0}\right]-e^{-2}\left\{e\mu_{\textsc{t}0}\left(Z-e\right)\right\}
=\displaystyle= e1[Z{ε0(X)μ0(X)μt0}+ε0(X)e(X)(1Z){Yμ0(X)}1e(X)].\displaystyle e^{-1}\left[Z\left\{\varepsilon_{0}(X)\mu_{0}(X)-\mu_{\textsc{t}0}\right\}+\varepsilon_{0}(X)e(X)\frac{\left(1-Z\right)\left\{Y-\mu_{0}\left(X\right)\right\}}{1-e(X)}\right].

This concludes the proof.

S3.8 Proof of Theorem S3

If the fitted propensity score model converges to e*(X) and the fitted outcome model under control converges to μ0*(X), the doubly robust estimator has limit τtdr=E{Y(1)∣Z=1}−μt0dr, where

μt0dr=E[ε0(X){e(X)1Z1e(X)Ye(X)Z1e(X)μ0(X)}]/e.\displaystyle\mu_{\textsc{t}0}^{\textup{dr}}=E\left[\varepsilon_{0}(X)\left\{e^{*}(X)\frac{1-Z}{1-e^{*}(X)}Y-\frac{e^{*}(X)-Z}{1-e^{*}(X)}\mu_{0}^{*}(X)\right\}\right]/e.

If e(X)=e(X)e^{*}(X)=e(X), the conclusion is obvious since the inverse probability weighting estimator is consistent and the additional term has mean zero. If μ0(X)=μ0(X)\mu_{0}^{*}(X)=\mu_{0}(X), then by definition, e[μt0drE{Y(0)Z=1}]e\left[\mu_{\textsc{t}0}^{\textup{dr}}-E\{Y(0)\mid Z=1\}\right] equals the expectation of

ε0(X)μ0(X)1e(X)Δt(X)\frac{\varepsilon_{0}(X)\mu_{0}(X)}{1-e^{*}(X)}\Delta_{\textsc{t}}(X)

where

Δt(X)=e(X){1e(X)}{e(X)e(X)}e(X){1e(X)}=0.\Delta_{\textsc{t}}(X)=e^{*}(X)\{1-e(X)\}-\{e^{*}(X)-e(X)\}-e(X)\{1-e^{*}(X)\}=0.

Therefore, if μ0*(X)=μ0(X), then μt0dr equals E{Y(0)∣Z=1}.

S3.9 Proof of Proposition S1

Recall that ε0,l and ε0,u are the lower and upper bounds of ε0(X), i.e., ε0(X)∈[ε0,l,ε0,u].

First, when Y0Y\geq 0, we have μ0(X)=E(YX,Z=0)0\mu_{0}(X)=E(Y\mid X,Z=0)\geq 0. Therefore, Zμ0(X)0Z\mu_{0}(X)\geq 0. By the first identification formula in Theorem S1, we have

τT\displaystyle\tau_{\mathrm{\scriptscriptstyle T}} \displaystyle\geq E(YZ=1)ε0,uE{Zμ0(X)}e.\displaystyle E(Y\mid Z=1)-\varepsilon_{0,\textsc{u}}\frac{E\left\{Z\mu_{0}(X)\right\}}{e}.

Second, when Y0Y\geq 0, we have e(X)(1Z)Y/[e{1e(X)}]0e(X)(1-Z)Y/[e\{1-e(X)\}]\geq 0. Therefore, by the second identification formula for μT0\mu_{{\mathrm{\scriptscriptstyle T}}0} in Theorem S1, we have

τT\displaystyle\tau_{\mathrm{\scriptscriptstyle T}} \displaystyle\geq E(YZ=1)ε0,uE[e(X)(1Z)Ye{1e(X)}].\displaystyle E(Y\mid Z=1)-\varepsilon_{0,\textsc{u}}E\left[\frac{e(X)(1-Z)Y}{e\left\{1-e(X)\right\}}\right].

Again, we prove the third formula by showing the third lower bound is equal to the previous two bounds if either the propensity score or the outcome model is correctly specified. If the propensity score model is correct, we have

E[{Ze(X)}μ0(X)1e(X)]\displaystyle E\left[\frac{\left\{Z-e(X)\right\}\mu_{0}(X)}{1-e(X)}\right] =\displaystyle= 0.\displaystyle 0.

If the outcome model is correct, we have

E[e(X)(1Z){Yμ0(X)}1e(X)]\displaystyle E\left[e(X)\frac{(1-Z)\left\{Y-\mu_{0}(X)\right\}}{1-e(X)}\right] =\displaystyle= 0.\displaystyle 0.

Therefore, the third formula of the lower bound is equal to the first two formulas if either the propensity score or outcome model is correct.

The upper bound for τT can be derived similarly.

S3.10 Proof of Proposition S2

We compute the semiparametric efficiency bounds for τ and τT in the following two parts, respectively.

S3.10.1 The semiparametric efficiency bound for τ\tau

With

var(ϕτ)\displaystyle\text{var}(\phi_{\tau}) =\displaystyle= var(ϕ1ϕ0)=var(ϕ1)+var(ϕ0)2cov(ϕ1,ϕ0),\displaystyle\text{var}(\phi_{1}-\phi_{0})\ =\ \text{var}(\phi_{1})+\text{var}(\phi_{0})-2\text{cov}(\phi_{1},\phi_{0}), (S8)

we compute the three terms on the right-hand side separately. Let μz denote E{Y(z)} for z=0,1. We can rewrite the first term in (S8) as

var(ϕ1)\displaystyle\text{var}(\phi_{1}) =\displaystyle= var(𝒯1)+var(𝒯2)2cov(𝒯1,𝒯2),\displaystyle\text{var}(\mathcal{T}_{1})+\text{var}(\mathcal{T}_{2})-2\text{cov}(\mathcal{T}_{1},\mathcal{T}_{2}),

where 𝒯1=w1(X)ZY/e(X)μ1\mathcal{T}_{1}=w_{1}(X)ZY/e(X)-\mu_{1} and 𝒯2={Ze(X)}μ1(X)/{e(X)ε1(X)}\mathcal{T}_{2}=\{Z-e(X)\}\mu_{1}(X)/\{e(X)\varepsilon_{1}(X)\}. We have

\begin{align*}
\text{var}(\mathcal{T}_{1}) &= E\left\{w_{1}^{2}(X)\frac{ZY^{2}}{e^{2}(X)}+\mu_{1}^{2}-2w_{1}(X)\frac{ZY}{e(X)}\mu_{1}\right\} \tag{S9} \\
&= E\left[\frac{w_{1}^{2}(X)}{e(X)}E\left\{Y^{2}(1)\mid X,Z=1\right\}\right]+\mu_{1}^{2}-2\mu_{1}E\left\{w_{1}(X)\frac{ZY}{e(X)}\right\} \\
&= E\left[\frac{w_{1}^{2}(X)}{e(X)}\left\{\text{var}(Y\mid X,Z=1)+\mu_{1}^{2}(X)\right\}\right]-\mu_{1}^{2}, \\
\text{var}(\mathcal{T}_{2}) &= E\left[\frac{\left\{Z-e(X)\right\}^{2}\mu_{1}^{2}(X)}{e^{2}(X)\varepsilon_{1}^{2}(X)}\right] = E\left\{\mu_{1}^{2}(X)\frac{1-e(X)}{e(X)\varepsilon_{1}^{2}(X)}\right\}, \tag{S10} \\
\text{cov}(\mathcal{T}_{1},\mathcal{T}_{2}) &= E\left[\frac{w_{1}(X)\mu_{1}(X)}{e^{2}(X)\varepsilon_{1}(X)}Z\left\{1-e(X)\right\}Y\right] \tag{S11} \\
&= E\left[\frac{w_{1}(X)\mu_{1}^{2}(X)\left\{1-e(X)\right\}}{e(X)\varepsilon_{1}(X)}\right],
\end{align*}

where (S10) uses $E[\{Z-e(X)\}^{2}\mid X]=e(X)\{1-e(X)\}$ and (S11) uses $Z\{Z-e(X)\}=Z\{1-e(X)\}$.

Combining (S9)–(S11), we have

\[
\text{var}(\phi_{1}) = E\left[\frac{w_{1}^{2}(X)}{e(X)}\text{var}(Y\mid X,Z=1)-\mu_{1}^{2}+\left\{e(X)+\frac{1-e(X)}{\varepsilon_{1}^{2}(X)}\right\}\mu_{1}^{2}(X)\right].
\]
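As a check on the coefficient of $\mu_{1}^{2}(X)$, this display is consistent with the weight form $w_{1}(X)=e(X)+\{1-e(X)\}/\varepsilon_{1}(X)$, under which $w_{1}(X)-\{1-e(X)\}/\varepsilon_{1}(X)=e(X)$ and hence

\[
\frac{w_{1}^{2}(X)}{e(X)}+\frac{1-e(X)}{e(X)\varepsilon_{1}^{2}(X)}-\frac{2w_{1}(X)\{1-e(X)\}}{e(X)\varepsilon_{1}(X)}
=\frac{1}{e(X)}\left[\left\{w_{1}(X)-\frac{1-e(X)}{\varepsilon_{1}(X)}\right\}^{2}+\frac{e(X)\{1-e(X)\}}{\varepsilon_{1}^{2}(X)}\right]
=e(X)+\frac{1-e(X)}{\varepsilon_{1}^{2}(X)}.
\]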

Similarly, the second term in (S8) equals

\[
\text{var}(\phi_{0}) = E\left[\frac{w_{0}^{2}(X)}{1-e(X)}\text{var}(Y\mid X,Z=0)-\mu_{0}^{2}+\left\{\varepsilon_{0}^{2}(X)e(X)+1-e(X)\right\}\mu_{0}^{2}(X)\right].
\]
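The coefficient of $\mu_{0}^{2}(X)$ admits the symmetric check with the weight form $w_{0}(X)=1-e(X)+e(X)\varepsilon_{0}(X)$, under which $w_{0}(X)-e(X)\varepsilon_{0}(X)=1-e(X)$ and hence

\[
\frac{w_{0}^{2}(X)}{1-e(X)}+\frac{e(X)\varepsilon_{0}^{2}(X)}{1-e(X)}-\frac{2w_{0}(X)e(X)\varepsilon_{0}(X)}{1-e(X)}
=\frac{\left\{w_{0}(X)-e(X)\varepsilon_{0}(X)\right\}^{2}+e(X)\{1-e(X)\}\varepsilon_{0}^{2}(X)}{1-e(X)}
=1-e(X)+e(X)\varepsilon_{0}^{2}(X).
\]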

The third term in (S8) is $\text{cov}(\phi_{1},\phi_{0})=E\{(\mathcal{T}_{1}-\mathcal{T}_{2})(\mathcal{T}_{3}-\mathcal{T}_{4})\}$, where similarly $\mathcal{T}_{3}=w_{0}(X)(1-Z)Y/\{1-e(X)\}-\mu_{0}$ and $\mathcal{T}_{4}=\{e(X)-Z\}\mu_{0}(X)\varepsilon_{0}(X)/\{1-e(X)\}$. We have

\begin{align*}
E(\mathcal{T}_{1}\mathcal{T}_{3}) &= 0-\mu_{1}E\left\{w_{0}(X)\frac{(1-Z)Y}{1-e(X)}\right\}-\mu_{0}E\left\{w_{1}(X)\frac{ZY}{e(X)}\right\}+\mu_{1}\mu_{0} = -\mu_{1}\mu_{0}, \tag{S12} \\
E(\mathcal{T}_{2}\mathcal{T}_{3}) &= E\left\{w_{0}(X)\frac{1-Z}{1-e(X)}Y\frac{Z-e(X)}{e(X)\varepsilon_{1}(X)}\mu_{1}(X)\right\} \tag{S13} \\
&= -E\left[\frac{w_{0}(X)\mu_{1}(X)}{\left\{1-e(X)\right\}\varepsilon_{1}(X)}(1-Z)Y\right] \\
&= -E\left\{\frac{w_{0}(X)\mu_{1}(X)\mu_{0}(X)}{\varepsilon_{1}(X)}\right\}, \\
E(\mathcal{T}_{1}\mathcal{T}_{4}) &= -E\left[\frac{w_{1}(X)\mu_{0}(X)\varepsilon_{0}(X)}{e(X)}ZY\right] = -E\left\{w_{1}(X)\mu_{1}(X)\mu_{0}(X)\varepsilon_{0}(X)\right\}, \tag{S14} \\
E(\mathcal{T}_{2}\mathcal{T}_{4}) &= E\left[\frac{-\left\{Z-e(X)\right\}^{2}\mu_{1}(X)\mu_{0}(X)\varepsilon_{0}(X)}{e(X)\left\{1-e(X)\right\}\varepsilon_{1}(X)}\right] = -E\left\{\frac{\varepsilon_{0}(X)}{\varepsilon_{1}(X)}\mu_{1}(X)\mu_{0}(X)\right\}, \tag{S15}
\end{align*}

where (S12) uses $Z(1-Z)=0$, (S13) uses $(1-Z)\{Z-e(X)\}=-e(X)(1-Z)$, and (S14) uses $Z\{e(X)-Z\}=-Z\{1-e(X)\}$.

Combining (S12)–(S15), we have

\[
\text{cov}(\phi_{1},\phi_{0}) = -\mu_{1}\mu_{0}+E\left[\left\{\frac{1-e(X)}{\varepsilon_{1}(X)}+\varepsilon_{0}(X)e(X)\right\}\mu_{1}(X)\mu_{0}(X)\right].
\]
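In more detail, combining (S12)–(S15) gives $\text{cov}(\phi_{1},\phi_{0})=-\mu_{1}\mu_{0}+E[\{w_{1}(X)\varepsilon_{0}(X)+w_{0}(X)/\varepsilon_{1}(X)-\varepsilon_{0}(X)/\varepsilon_{1}(X)\}\mu_{1}(X)\mu_{0}(X)]$, and with the weight forms $w_{1}(X)=e(X)+\{1-e(X)\}/\varepsilon_{1}(X)$ and $w_{0}(X)=1-e(X)+e(X)\varepsilon_{0}(X)$, which are the forms consistent with the displays in this proof, direct algebra yields

\[
w_{1}(X)\varepsilon_{0}(X)+\frac{w_{0}(X)}{\varepsilon_{1}(X)}-\frac{\varepsilon_{0}(X)}{\varepsilon_{1}(X)}
=\varepsilon_{0}(X)e(X)+\frac{1-e(X)}{\varepsilon_{1}(X)}.
\]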

Finally, plugging in the three terms on the right-hand side of (S8), we have

\begin{align*}
\text{var}(\phi_{\tau}) &= E\left[\frac{w_{1}^{2}(X)}{e(X)}\text{var}(Y\mid X,Z=1)-\mu_{1}^{2}+\left\{e(X)+\frac{1-e(X)}{\varepsilon_{1}^{2}(X)}\right\}\mu_{1}^{2}(X)\right] \\
&\quad+E\left[\frac{w_{0}^{2}(X)}{1-e(X)}\text{var}(Y\mid X,Z=0)-\mu_{0}^{2}+\left\{\varepsilon_{0}^{2}(X)e(X)+1-e(X)\right\}\mu_{0}^{2}(X)\right] \\
&\quad+2\mu_{1}\mu_{0}-2E\left[\left\{\frac{1-e(X)}{\varepsilon_{1}(X)}+\varepsilon_{0}(X)e(X)\right\}\mu_{1}(X)\mu_{0}(X)\right] \\
&= E\left\{\frac{w_{1}^{2}(X)}{e(X)}\text{var}(Y\mid X,Z=1)+\frac{w_{0}^{2}(X)}{1-e(X)}\text{var}(Y\mid X,Z=0)\right\}-\left(\mu_{1}-\mu_{0}\right)^{2} \\
&\quad+E\left[e(X)\left\{\mu_{1}(X)-\mu_{0}(X)\varepsilon_{0}(X)\right\}^{2}+\left\{1-e(X)\right\}\left\{\frac{\mu_{1}(X)}{\varepsilon_{1}(X)}-\mu_{0}(X)\right\}^{2}\right].
\end{align*}
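The second equality groups the quadratic terms: $-\mu_{1}^{2}-\mu_{0}^{2}+2\mu_{1}\mu_{0}=-(\mu_{1}-\mu_{0})^{2}$, and, pointwise in $X$,

\[
\left\{e+\frac{1-e}{\varepsilon_{1}^{2}}\right\}\mu_{1}^{2}(X)+\left\{\varepsilon_{0}^{2}e+1-e\right\}\mu_{0}^{2}(X)-2\left\{\frac{1-e}{\varepsilon_{1}}+\varepsilon_{0}e\right\}\mu_{1}(X)\mu_{0}(X)
=e\left\{\mu_{1}(X)-\varepsilon_{0}\mu_{0}(X)\right\}^{2}+(1-e)\left\{\frac{\mu_{1}(X)}{\varepsilon_{1}}-\mu_{0}(X)\right\}^{2},
\]

writing $e=e(X)$, $\varepsilon_{0}=\varepsilon_{0}(X)$, and $\varepsilon_{1}=\varepsilon_{1}(X)$ for brevity.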

S3.10.2 The semiparametric efficiency bound for $\tau_{\mathrm{\scriptscriptstyle T}}$

In the second part, we derive the semiparametric efficiency bound for $\tau_{\mathrm{\scriptscriptstyle T}}$. The efficient influence function for $\tau_{\mathrm{\scriptscriptstyle T}}$ is

\[
\phi_{\tau_{\mathrm{\scriptscriptstyle T}}} = \frac{Z}{e}\left\{Y-\mu_{{\mathrm{\scriptscriptstyle T}}1}\right\}-\phi_{{\mathrm{\scriptscriptstyle T}}0} = \frac{Z}{e}\left\{Y-\varepsilon_{0}(X)\mu_{0}(X)-\tau_{\mathrm{\scriptscriptstyle T}}\right\}+\frac{\varepsilon_{0}(X)e(X)}{e}\frac{(1-Z)\left\{Y-\mu_{0}(X)\right\}}{1-e(X)},
\]

therefore, we can write the semiparametric efficiency bound for $\tau_{\mathrm{\scriptscriptstyle T}}$ as

\[
\text{var}(\phi_{\tau_{\mathrm{\scriptscriptstyle T}}}) = \text{var}(\mathcal{T}_{5})+\text{var}(\mathcal{T}_{6})+2\,\text{cov}(\mathcal{T}_{5},\mathcal{T}_{6}),
\]

where $\mathcal{T}_{5}=Z\{Y-\varepsilon_{0}(X)\mu_{0}(X)-\tau_{\mathrm{\scriptscriptstyle T}}\}/e$ and $\mathcal{T}_{6}=\varepsilon_{0}(X)e(X)(1-Z)\{Y-\mu_{0}(X)\}/[e\{1-e(X)\}]$. Both terms have mean zero, the former by the identification of $\tau_{\mathrm{\scriptscriptstyle T}}$ and the latter because $E[(1-Z)\{Y-\mu_{0}(X)\}\mid X]=0$, so their variances equal their second moments. We have

\begin{align*}
\text{var}(\mathcal{T}_{5}) &= E\left[\frac{Z}{e^{2}}\left\{Y-\varepsilon_{0}(X)\mu_{0}(X)-\tau_{\mathrm{\scriptscriptstyle T}}\right\}^{2}\right] \tag{S16} \\
&= E\left(E\left[\frac{Z}{e^{2}}\left\{Y-\varepsilon_{0}(X)\mu_{0}(X)-\tau_{\mathrm{\scriptscriptstyle T}}\right\}^{2}\mid X\right]\right) \\
&= E\left(\frac{e(X)}{e^{2}}E\left[\left\{Y-\varepsilon_{0}(X)\mu_{0}(X)-\tau_{\mathrm{\scriptscriptstyle T}}\right\}^{2}\mid X,Z=1\right]\right) \\
&= E\left(\frac{e(X)}{e^{2}}E\left[\left\{Y-\mu_{1}(X)+\mu_{1}(X)-\varepsilon_{0}(X)\mu_{0}(X)-\tau_{\mathrm{\scriptscriptstyle T}}\right\}^{2}\mid X,Z=1\right]\right) \\
&= E\left(\frac{e(X)}{e^{2}}\left[\text{var}(Y\mid X,Z=1)+\left\{\mu_{1}(X)-\varepsilon_{0}(X)\mu_{0}(X)-\tau_{\mathrm{\scriptscriptstyle T}}\right\}^{2}\right]\right) \\
&= E\left[\frac{e(X)\text{var}(Y\mid X,Z=1)}{e^{2}}+\frac{e(X)\left\{\mu_{1}(X)-\varepsilon_{0}(X)\mu_{0}(X)-\tau_{\mathrm{\scriptscriptstyle T}}\right\}^{2}}{e^{2}}\right], \\
\text{var}(\mathcal{T}_{6}) &= E\left[\frac{\varepsilon_{0}^{2}(X)e^{2}(X)}{e^{2}}\frac{(1-Z)\left\{Y-\mu_{0}(X)\right\}^{2}}{\left\{1-e(X)\right\}^{2}}\right] \tag{S17} \\
&= E\left(E\left[\frac{\varepsilon_{0}^{2}(X)e^{2}(X)}{e^{2}}\frac{(1-Z)\left\{Y-\mu_{0}(X)\right\}^{2}}{\left\{1-e(X)\right\}^{2}}\mid X\right]\right) \\
&= E\left(\frac{\varepsilon_{0}^{2}(X)e^{2}(X)}{e^{2}\left\{1-e(X)\right\}}E\left[\left\{Y-\mu_{0}(X)\right\}^{2}\mid X,Z=0\right]\right) \\
&= E\left[\frac{\varepsilon_{0}^{2}(X)e^{2}(X)\text{var}(Y\mid X,Z=0)}{e^{2}\left\{1-e(X)\right\}}\right], \\
\text{cov}(\mathcal{T}_{5},\mathcal{T}_{6}) &= E(\mathcal{T}_{5}\mathcal{T}_{6}) = 0, \tag{S18}
\end{align*}

where the cross term vanishes in the fifth line of (S16) because $E\{Y-\mu_{1}(X)\mid X,Z=1\}=0$, and (S18) holds because the product $\mathcal{T}_{5}\mathcal{T}_{6}$ contains the factor $Z(1-Z)=0$.

Combining (S16)–(S18), we have the result in Proposition S2.
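As an informal numerical check of this result, the following R snippet simulates data from a toy model (the logistic propensity score, the linear conditional means, the constant $\varepsilon_{0}$, and the normal errors are illustrative assumptions, not models from the paper) and compares the empirical variance of $\mathcal{T}_{5}+\mathcal{T}_{6}$ with the closed-form bound implied by (S16)–(S18).

set.seed(1)
n     <- 1e6
X     <- runif(n)
eX    <- plogis(-0.5 + X)          # propensity score e(X)
mu1   <- 1 + X                     # mu_1(X) = E(Y | X, Z = 1)
mu0   <- 0.5 + X                   # mu_0(X) = E(Y | X, Z = 0)
eps0  <- 1.2                       # constant sensitivity parameter eps_0
sd_y  <- 0.3                       # conditional standard deviation of Y
Z     <- rbinom(n, 1, eX)
Y     <- ifelse(Z == 1, mu1, mu0) + rnorm(n, sd = sd_y)
e     <- mean(eX)                  # e = pr(Z = 1)
tauT  <- mean(eX * (mu1 - eps0 * mu0)) / e
# Influence-function terms T5 and T6 from the proof.
T5    <- Z * (Y - eps0 * mu0 - tauT) / e
T6    <- eps0 * eX * (1 - Z) * (Y - mu0) / (e * (1 - eX))
# Empirical variance of the influence function versus the bound.
var_emp   <- var(T5 + T6)
var_bound <- mean(eX * (sd_y^2 + (mu1 - eps0 * mu0 - tauT)^2)) / e^2 +
  mean(eps0^2 * eX^2 * sd_y^2 / (1 - eX)) / e^2
c(var_emp, var_bound)              # the two values should roughly agree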