Evaluating the Conservativeness of Robust Sandwich Variance Estimator in Weighted Average Treatment Effects

Shunichiro Orihara (corresponding author)
Department of Health Data Science, Tokyo Medical University, Tokyo, Japan
Address: 6-1-1 Shinjuku, Shinjuku-ku, Tokyo 160-8402, Japan
Email: [email protected]

Abstract

In causal inference, the Inverse Probability Weighting (IPW) estimator is commonly used to estimate causal effects for estimands within the class of Weighted Average Treatment Effect (WATE). When constructing confidence intervals (CIs), robust sandwich variance estimators are frequently used for practical reasons. Although these estimators are easy to calculate using widely used statistical software, they often yield overly narrow CIs for commonly applied estimands, such as the Average Treatment Effect on the Treated and the Average Treatment Effect for the Overlap Population. In this manuscript, we reexamine the asymptotic variance of the IPW estimator and clarify the conditions under which CIs derived from the sandwich variance estimator are conservative. Additionally, we propose new criteria to assess the conservativeness of CIs. The results of this investigation are validated through simulation experiments and real data analysis.

Keywords: Balancing weight, Causal estimands, Causal null hypothesis, Inverse probability weighting, Propensity score

1 Introduction

In causal inference, selecting a valid causal estimand that supports a causal interpretation is a crucial step in addressing clinical questions. The Average Treatment Effect (ATE) is a widely recognized causal estimand, comparing outcomes if all subjects receive an active treatment versus if they all receive a control treatment. The Average Treatment Effect on the Treated (ATT) is also widely considered, comparing outcomes if treated subjects retain their treatment versus if they switch to a control treatment. Recently, the Average Treatment Effect for the Overlap Population (ATO), an estimand proposed by Li et al.[1], has gained attention. It offers several advantages over the ATE. For instance, the ATO assigns larger weights to subjects more likely to switch from their actual treatment to the ‘counterfactual’ treatment, making it a more robust measure of causal effect. These estimands belong to the class of ‘Weighted Average Treatment Effect’ (WATE[2, 1]), as each can be viewed as a specific weighted version of the ATE. Further details can be found in Li et al.[1].

The estimands can be estimated as a weighted average of the outcome, with weights defined by the (estimated) propensity score [3]. A well-known example is the Inverse Probability (Treatment) Weighting (IP(T)W) estimator for the ATE. Since other weighted estimators for the WATE can be constructed in the same manner, we refer to these weighted estimators as the ‘IPW estimator’ in this manuscript[4]. The confidence interval (CI) for the IPW estimator is commonly derived from the asymptotic variance, taking into account estimating equations for both the weighted average of the outcome and the propensity score. However, in practice, the CI is often derived based only on the former; the uncertainty of the propensity score is sometimes overlooked. This is because implementing the ‘robust sandwich variance’ estimator for the weighted average of the outcome is straightforward. For example, it is readily available through the sandwich package in R or the WHITE option of the REG procedure in SAS [5].

For the IPW estimator of the ATE, it is well known that a CI that ignores the uncertainty of the propensity score tends to be more conservative than one that accounts for it[6]. We refer to the former as the ‘simple CI’ and the latter as the ‘exact CI’ in this manuscript. However, recent findings suggest that the simple CI for the ATT estimator might not always be conservative[5]. Consequently, the exact CI is considered more suitable for the IPW estimator for the ATT. From a practical standpoint, it is important to identify specific conditions under which the simple CI is conservative. Unfortunately, Reifeis and Hudgens [5] did not describe these situations clearly.

In this manuscript, we reexamine the asymptotic variance of the IPW estimator and clarify the conditions under which the simple CI is conservative. We focus on binary outcomes, which are commonly encountered in epidemiologic studies. In Section 2, we detail the form of the asymptotic variance for the IPW estimator within the WATE class, extending the works of Mao et al.[4] and Reifeis and Hudgens [5]. Section 3 investigates scenarios where the simple CI is conservative. Our findings show that under the Fisher sharp null hypothesis[7], or under the null hypothesis concerning the conditional ATE[8], the simple CI is indeed conservative, thereby reducing concerns about $\alpha$ -error inflation. Moreover, we explore broader situations and propose new criteria to assess whether the simple CI remains conservative. We then validate these findings through simple simulation settings and a real-world data example on bladder cancer[9].

2 Methods

2.1 Definition of the WATE

Let $n$ be the sample size. $T_{i}\in\{0,1\}$ , $\boldsymbol{X}_{i}\in\mathcal{X}\subset\mathbb{R}^{p}$ and $(Y_{1i},Y_{0i})$ represent the treatment, a vector of covariates measured prior to treatment, and potential outcomes, respectively. In the main manuscript, we consider the binary outcome; $Y_{ti}\in\{0,1\}$ . Based on the stable unit treatment value assumption[3], the observed outcome is defined as $Y_{i}:=T_{i}Y_{1i}+(1-T_{i})Y_{0i}$ . Under these settings, we assume that i.i.d. copies $(T_{i},\boldsymbol{X}_{i},Y_{i})$ , $i=1,\,2,\,\dots,\,n$ are obtained. We further assume the strongly ignorable treatment assignment $(Y_{1i},Y_{0i})\mathop{\perp\!\!\!\perp}T_{i}|\boldsymbol{X_{i}}$ [3] for the subsequent discussions.

We now introduce the WATE. The ATE conditional on $\boldsymbol{x}$ is defined as $\tau(\boldsymbol{x}):={\rm E}[Y_{1}-Y_{0}|\boldsymbol{x}]$ , and the WATE is defined as

\tau_{w}=\mu_{w1}-\mu_{w0}:=\frac{{\rm E}[\tau(\boldsymbol{X})w(e(\boldsymbol{X}))]}{{\rm E}[w(e(\boldsymbol{X}))]},

where $e\equiv e(\boldsymbol{X}):={\rm Pr}\left(T=1|\boldsymbol{X}\right)$ is the propensity score[3], and $w(\cdot)$ represents the weight function for $e$ . When $w(e)\equiv 1$ , the WATE becomes the ATE: $\tau_{ATE}:={\rm E}[Y_{1}-Y_{0}]$ . When $w(e)=e$ , the WATE becomes the ATT

\displaystyle\tau_{ATT}:=\frac{{\rm E}[\tau(\boldsymbol{X})e(\boldsymbol{X})]}{{\rm E}[e(\boldsymbol{X})]}={\rm E}[\tau(\boldsymbol{X})|T=1]={\rm E}[Y_{1}-Y_{0}|T=1].

When $w(e)=e(1-e)$ , the WATE becomes the ATO[1]

\displaystyle\tau_{ATO}:=\frac{{\rm E}[\tau(\boldsymbol{X})e(\boldsymbol{X})(1-e(\boldsymbol{X}))]}{{\rm E}[e(\boldsymbol{X})(1-e(\boldsymbol{X}))]}.

Given that the function $e(1-e)$ , with $e\in(0,1)$ , is concave and symmetric around $0.5$ , subjects with a propensity score close to $0.5$ (meaning they could easily change their actual treatment to the “counterfactual” treatment) are weighted more heavily than those near $0$ or $1$ .
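As a minimal sketch of the definitions above, the following Python snippet evaluates the three weight functions and a sample analogue of $\tau_{w}$ . The helper names `tilting_weight` and `wate`, the use of NumPy, and the numerical inputs are illustrative assumptions and are not part of the original analysis.

```python
import numpy as np

def tilting_weight(e, estimand):
    """Weight function w(e) for the ATE, ATT, and ATO (hypothetical helper)."""
    e = np.asarray(e, dtype=float)
    if estimand == "ATE":
        return np.ones_like(e)      # w(e) = 1
    if estimand == "ATT":
        return e                    # w(e) = e
    if estimand == "ATO":
        return e * (1.0 - e)        # w(e) = e(1 - e)
    raise ValueError(f"unknown estimand: {estimand}")

def wate(tau_x, e, estimand):
    """Sample analogue of E[tau(X) w(e(X))] / E[w(e(X))]."""
    w = tilting_weight(e, estimand)
    return float(np.sum(np.asarray(tau_x, dtype=float) * w) / np.sum(w))

# Hypothetical conditional effects tau(x) and propensity scores e(x)
# for four covariate patterns (made-up numbers, for illustration only).
e = np.array([0.1, 0.3, 0.5, 0.9])
tau_x = np.array([0.00, 0.10, 0.20, 0.40])
estimates = {name: wate(tau_x, e, name) for name in ("ATE", "ATT", "ATO")}
```

With heterogeneous $\tau(\boldsymbol{x})$ the three estimands generally differ, whereas a constant $\tau(\boldsymbol{x})$ gives the same value under every weight function.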

2.2 IPW Estimator and the Asymptotic Property of the WATE

As discussed in Mao et al.[4], the WATE can be estimated using the IPW estimator

\displaystyle\hat{\tau}_{w}=\hat{\mu}_{w1}-\hat{\mu}_{w0}=\frac{\sum_{i=1}^{n}W_{i}T_{i}Y_{i}}{\sum_{i=1}^{n}W_{i}T_{i}}-\frac{\sum_{i=1}^{n}W_{i}(1-T_{i})Y_{i}}{\sum_{i=1}^{n}W_{i}(1-T_{i})},

where

W_{i}=\frac{w(\hat{e}_{i})}{T_{i}\hat{e}_{i}+(1-T_{i})(1-\hat{e}_{i})},

and $\hat{e}_{i}\equiv e_{i}(\hat{\boldsymbol{\alpha}})=expit\left\{\boldsymbol{X}_{i}^{\top}\hat{\boldsymbol{\alpha}}\right\}$ denotes the estimated propensity score. In this manuscript, $\hat{\boldsymbol{\alpha}}$ is a Maximum Likelihood Estimator (MLE). Under the settings, the asymptotic distribution of the IPW estimator becomes

\sqrt{n}\left(\hat{\tau}_{w}-\tau_{w}^{0}\right)\stackrel{{\scriptstyle L}}{{\to}}N\left(\boldsymbol{0},\sigma^{2}\right),

where the asymptotic variance $\sigma^{2}$ is described as

\displaystyle\sigma^{2}=\frac{1}{a_{22}^{2}}(1,-1)\left(B_{22}+\delta A_{11}^{-1}\delta^{\top}-B_{12}A_{11}^{-1}B_{12}^{\top}\right)\left(\begin{array}[]{c}1\\ -1\end{array}\right),

(2.3)

with

A_{11}={\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right],\ \ a_{22}={\rm E}[w(e)],

B_{12}=\left(\begin{array}[]{c}b_{1}^{\top}\\ b_{2}^{\top}\end{array}\right):=\left(\begin{array}[]{c}{\rm E}\left[w(e)(Y_{1}-\mu_{w1})(1-e)\boldsymbol{X}^{\top}\right]\\ -{\rm E}\left[w(e)(Y_{0}-\mu_{w0})e\boldsymbol{X}^{\top}\right]\end{array}\right),

B_{22}=\left(\begin{array}[]{cc}{\rm E}\left[\frac{w(e)^{2}(Y_{1}-\mu_{w1})^{2}}{e}\right]&0\\ 0&{\rm E}\left[\frac{w(e)^{2}(Y_{0}-\mu_{w0})^{2}}{1-e}\right]\end{array}\right),

and

\delta=\left(\begin{array}[]{c}\delta_{1}^{\top}\\ \delta_{2}^{\top}\end{array}\right):=\left(\begin{array}[]{c}{\rm E}\left[w^{\prime}(e)(Y_{1}-\mu_{w1})e(1-e)\boldsymbol{X^{\top}}\right]\\ {\rm E}\left[w^{\prime}(e)(Y_{0}-\mu_{w0})e(1-e)\boldsymbol{X^{\top}}\right]\end{array}\right).

For more details on the mathematical computations, please see Appendix A.

The first term of (2.3) forms the primary component of the asymptotic variance of the IPW estimator. Standard sandwich variance calculation functions, such as sandwich in R, compute only this term. The second and third terms of (2.3) are related to the variability of the propensity score estimation and are typically ignored when using the sandwich variance estimator. The CI constructed from the first term of (2.3) is referred to as the ‘simple CI’, whereas the CI constructed from all terms of (2.3) is termed the ‘exact CI’ in this manuscript.
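The simple CI can be sketched numerically as follows. This is a plug-in illustration of the IPW estimator and of the first term of (2.3) only, assuming a logistic propensity model fitted by Newton-Raphson; the function names, the toy data, and the normal quantile 1.96 are assumptions made for the example, and the code is not meant to reproduce the output of any particular software package.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_propensity_mle(X, t, n_iter=100, tol=1e-10):
    """Newton-Raphson solution of sum_i X_i (T_i - e_i(alpha)) = 0."""
    alpha = np.zeros(X.shape[1])
    for _ in range(n_iter):
        e = expit(X @ alpha)
        score = X.T @ (t - e)
        info = (X * (e * (1.0 - e))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        alpha += step
        if np.max(np.abs(step)) < tol:
            break
    return alpha

def ipw_simple(X, t, y, w):
    """IPW estimate of tau_w and the 'simple' standard error."""
    n = len(y)
    e_hat = expit(X @ fit_propensity_mle(X, t))
    W = w(e_hat) / (t * e_hat + (1 - t) * (1.0 - e_hat))
    mu1 = np.sum(W * t * y) / np.sum(W * t)
    mu0 = np.sum(W * (1 - t) * y) / np.sum(W * (1 - t))
    # Plug-in versions of a_22 and the diagonal of B_22 (first term of (2.3)).
    a22 = np.mean(w(e_hat))
    b11 = np.mean((W * t * (y - mu1)) ** 2)
    b22 = np.mean((W * (1 - t) * (y - mu0)) ** 2)
    se = np.sqrt((b11 + b22) / a22**2 / n)
    return mu1 - mu0, se

# Toy data (assumption): one covariate plus intercept, no treatment effect.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
t = rng.binomial(1, expit(0.5 * X[:, 1]))
y = rng.binomial(1, expit(-0.3 * X[:, 1]))
tau_att, se_att = ipw_simple(X, t, y, w=lambda e: e)   # w(e) = e gives the ATT
ci = (tau_att - 1.96 * se_att, tau_att + 1.96 * se_att)
```

Passing `w=lambda e: e * (1 - e)` instead would target the ATO; the second and third terms of (2.3) are deliberately omitted here, which is what distinguishes the simple CI from the exact CI.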

2.3 Definition of Null Hypotheses and Homogeneous Effects

In this manuscript, several null hypotheses are considered. The Fisher sharp null, the Neyman null[7], and the null hypothesis concerning the conditional ATE[8] are defined as follows.

Definition 1.

Fisher sharp null hypothesis (SN)

Y_{1i}=Y_{0i},\ \ \ \ \ ^{\forall}i=1,2,\dots,n

Definition 2.

Neyman null hypothesis (NN)

\tau_{w}=0\ \ \Leftrightarrow\ \ \frac{{\rm E}[w(e(\boldsymbol{X})){\rm E}\left[Y_{1}|\boldsymbol{X}\right]]}{{\rm E}[w(e(\boldsymbol{X}))]}=\frac{{\rm E}[w(e(\boldsymbol{X})){\rm E}\left[Y_{0}|\boldsymbol{X}\right]]}{{\rm E}[w(e(\boldsymbol{X}))]}

Definition 3.

Null hypothesis concerning the conditional ATE (CN)

\tau(\boldsymbol{x})=0,\ \ \ \ \ ^{\forall}\boldsymbol{x}\in\mathcal{X}

From the definitions, $SN\Rightarrow CN\Rightarrow NN$ .

Additionally, we define the homogeneous treatment effects as the complement of treatment effects heterogeneity[10].

Definition 4.

Homogeneous treatment effects

\tau(\boldsymbol{x})=\tau,\ \ \ \ \ ^{\forall}\boldsymbol{x}\in\mathcal{X}

Under the homogeneous treatment effects, $\tau_{w}\equiv\tau_{ATE}$ for all $w(\cdot)$ . Also, $NN$ becomes equivalent to $CN$ .

3 Results

3.1 Properties of the Simple CIs

In this section, we examine scenarios where simple CIs yield conservative results. In other words, we clarify the conditions under which the second and third terms of (2.3) can be ignored, thus preventing misinterpretation of statistical analysis results in practical applications.

3.1.1 For the ATE

From (2.3), a well-known conclusion can be simply derived.

Theorem 1.

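For the ATE, the weight function is given by $w(e)\equiv 1$ . Therefore, $w^{\prime}(e)=0$ . This implies that $\delta=O$ , so the second term of (2.3) vanishes and the remaining third term, $-B_{12}A_{11}^{-1}B_{12}^{\top}$ , is negative semi-definite. Consequently, the contribution of the second and third terms of (2.3) is non-positive.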

This theorem indicates that the simple CI always yields a conservative CI when we are interested in the ATE.

3.1.2 For the ATT

As mentioned by Reifeis and Hudgens [5], from the form of (2.3), it is not clear whether the standard sandwich variance for the ATT, the ATO, or other estimands is conservative. From this point onward, to understand the asymptotic variance more clearly, we calculate the second and third terms of (2.3) more precisely.

For the ATT, where $w(e)=e$ and $w^{\prime}(e)=1$ , the values of $\delta$ and $B_{12}$ become the following, respectively:

	$\displaystyle\delta$	$\displaystyle=\left(\begin{array}[]{c}\delta_{1}^{\top}\\ \delta_{2}^{\top}\end{array}\right)=\left(\begin{array}[]{c}{\rm E}\left[(Y_{1}-\mu_{w1})e(1-e)\boldsymbol{X^{\top}}\right]\\ {\rm E}\left[(Y_{0}-\mu_{w0})e(1-e)\boldsymbol{X^{\top}}\right]\end{array}\right),$
	$\displaystyle B_{12}$	$\displaystyle=\left(\begin{array}[]{c}b_{1}^{\top}\\ b_{2}^{\top}\end{array}\right)=\left(\begin{array}[]{c}{\rm E}\left[(Y_{1}-\mu_{w1})e(1-e)\boldsymbol{X}^{\top}\right]\\ -{\rm E}\left[(Y_{0}-\mu_{w0})e^{2}\boldsymbol{X}^{\top}\right]\end{array}\right)=\left(\begin{array}[]{c}\delta_{1}^{\top}\\ \delta_{2}^{\top}-\alpha^{\top}\end{array}\right),$

where $\alpha={\rm E}\left[(Y_{0}-\mu_{w0})e\boldsymbol{X}\right]$ . Therefore, the second and third terms of (2.3) become

$\displaystyle(1,-1)\left(\delta A_{11}^{-1}\delta^{\top}-B_{12}A_{11}^{-1}B_{12}^{\top}\right)\left(\begin{array}[]{c}1\\ -1\end{array}\right)$		(3.3)
	$\displaystyle=\delta_{1}^{\top}A_{11}^{-1}\delta_{1}-2\delta_{1}^{\top}A_{11}^{-1}\delta_{2}+\delta_{2}^{\top}A_{11}^{-1}\delta_{2}-b_{1}^{\top}A_{11}^{-1}b_{1}+2b_{1}^{\top}A_{11}^{-1}b_{2}-b_{2}^{\top}A_{11}^{-1}b_{2}$
	$\displaystyle=-\alpha^{\top}A_{11}^{-1}\alpha+2(\delta_{2}-\delta_{1})^{\top}A_{11}^{-1}\alpha$	(3.4)
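The step from (3.3) to (3.4) uses only $b_{1}=\delta_{1}$ and $b_{2}=\delta_{2}-\alpha$ , so it can be checked numerically with arbitrary vectors. The sketch below draws random stand-ins for $\delta_{1}$ , $\delta_{2}$ , $\alpha$ and a positive definite matrix in place of $A_{11}$ ; all values are synthetic assumptions used solely to verify the algebra.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
M = rng.normal(size=(p, p))
A11 = M @ M.T + p * np.eye(p)       # arbitrary positive definite stand-in for A_11
A_inv = np.linalg.inv(A11)
delta1, delta2, alpha = rng.normal(size=(3, p))

delta = np.vstack([delta1, delta2])          # rows delta_1^T and delta_2^T
B12 = np.vstack([delta1, delta2 - alpha])    # ATT: b_1 = delta_1, b_2 = delta_2 - alpha
c = np.array([1.0, -1.0])

# Left-hand side of (3.3) and the closed form (3.4).
lhs = c @ (delta @ A_inv @ delta.T - B12 @ A_inv @ B12.T) @ c
rhs = -alpha @ A_inv @ alpha + 2.0 * (delta2 - delta1) @ A_inv @ alpha
```

The two quantities agree up to floating-point error, which reflects that the terms quadratic in $\delta_{1}-\delta_{2}$ cancel.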

From the results, the following conclusions can be derived.

Theorem 2.

For the ATT, when $SN$ , $CN$ , or the treatment effects are homogeneous, the second and third terms of (2.3) are precisely negative. Outside of these situations, when covariates have the positive support: $\mathcal{X}\subset\mathbb{R}^{p}_{\geq 0}$ , a sufficient condition for the second and third terms of (2.3) to be precisely negative is that the following inequality holds:

	$\displaystyle\gamma{\rm E}\left[e(1-e)\boldsymbol{X}^{\top}\right]{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e(1-e)\boldsymbol{X}\right]$
		$\displaystyle\leq{\rm E}\left[e(Y_{0}-\mu_{w0})\boldsymbol{X}^{\top}\right]{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e(Y_{0}-\mu_{w0})\boldsymbol{X}\right].$		(3.5)

When considering $NN$ , $\gamma$ is set to 4. Otherwise, $\gamma$ is set to 16.

The proof of Theorem 2 is in Appendix B. Note that while the former statement can be proven in the broader context of WATE, not limited to ATT, ATO, or binary outcome situations, we focus solely on these specific cases to simplify the proof.

This theorem suggests that the simple CI is conservative when focusing on certain null hypotheses or in the presence of a homogeneous treatment effect. Moreover, since the right-hand side of (3.5) is non-negative, we propose a new criterion to determine whether the simple CI is conservative: verifying whether the left-hand side of (3.5) is approximately $0$ . If the left-hand side of (3.5) is sufficiently close to $0$ , it is expected that the inequality (3.5) will be satisfied.
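A sample version of the left-hand side of the criterion can be sketched as below. The function `criterion_lhs` is a hypothetical helper that replaces expectations by sample means; `f` is the function of the propensity score appearing in the criterion ( $e(1-e)$ for the ATT), and `gamma=4.0` corresponds to the NN case stated in Theorem 2. The small inputs are made-up values chosen so that the result is easy to verify by hand.

```python
import numpy as np

def criterion_lhs(e, X, f, gamma=4.0):
    """gamma * m^T A^{-1} m, with m = mean(f(e) X) and A = mean(e(1-e) X X^T)."""
    e = np.asarray(e, dtype=float)
    X = np.asarray(X, dtype=float)
    m = (f(e)[:, None] * X).mean(axis=0)
    A = (X * (e * (1.0 - e))[:, None]).T @ X / len(e)
    return float(gamma * m @ np.linalg.solve(A, m))

# Toy input (assumption): four propensity scores symmetric around 0.5,
# with an intercept as the only covariate.
e_demo = np.array([0.2, 0.4, 0.6, 0.8])
X_demo = np.ones((4, 1))

lhs_att = criterion_lhs(e_demo, X_demo, f=lambda e: e * (1.0 - e))
lhs_ato = criterion_lhs(e_demo, X_demo, f=lambda e: e * (1.0 - e) * (1.0 - 2.0 * e))
```

In this toy input the ATT version stays well away from $0$ , while the version with $f(e)=e(1-e)(1-2e)$ cancels exactly because the propensity scores are symmetric around $0.5$ .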

3.1.3 For the ATO

Next, we consider the asymptotic variance of the ATO. In the same manner as for the ATT, the values of $\delta$ and $B_{12}$ become the following, respectively:

	$\displaystyle\delta$	$\displaystyle=\left(\begin{array}[]{c}\delta_{1}^{\top}\\ \delta_{2}^{\top}\end{array}\right)=\left(\begin{array}[]{c}{\rm E}\left[(Y_{1}-\mu_{w1})e(1-e)(1-2e)\boldsymbol{X^{\top}}\right]\\ {\rm E}\left[(Y_{0}-\mu_{w0})e(1-e)(1-2e)\boldsymbol{X^{\top}}\right]\end{array}\right),$
	$\displaystyle B_{12}$	$\displaystyle=\left(\begin{array}[]{c}b_{1}^{\top}\\ b_{2}^{\top}\end{array}\right)=\left(\begin{array}[]{c}{\rm E}\left[(Y_{1}-\mu_{w1})e(1-e)^{2}\boldsymbol{X}^{\top}\right]\\ -{\rm E}\left[(Y_{0}-\mu_{w0})e^{2}(1-e)\boldsymbol{X}^{\top}\right]\end{array}\right)=\left(\begin{array}[]{c}\delta_{1}^{\top}+\alpha_{1}^{\top}\\ \delta_{2}^{\top}-\alpha_{2}^{\top}\end{array}\right),$

where $\alpha_{1}={\rm E}\left[(Y_{1}-\mu_{w1})e^{2}(1-e)\boldsymbol{X}\right]$ and $\alpha_{2}={\rm E}\left[(Y_{0}-\mu_{w0})e(1-e)^{2}\boldsymbol{X}\right]$ . Through a calculation similar to that for the ATT, (3.3) and (3.4), the second and third terms of (2.3) become

(1,-1)\left(\delta A_{11}^{-1}\delta^{\top}-B_{12}A_{11}^{-1}B_{12}^{\top}\right)\left(\begin{array}[]{c}1\\ -1\end{array}\right)=-(\alpha_{1}+\alpha_{2})^{\top}A_{11}^{-1}(\alpha_{1}+\alpha_{2})+2(\delta_{2}-\delta_{1})^{\top}A_{11}^{-1}(\alpha_{1}+\alpha_{2}).

From the results, the following conclusions can be derived.

Theorem 3.

For the ATO, when $SN$ , $CN$ , or the treatment effects are homogeneous, the second and third terms of (2.3) are precisely negative. Outside of these situations, when covariates have the positive support: $\mathcal{X}\subset\mathbb{R}^{p}_{\geq 0}$ , a sufficient condition for the second and third terms of (2.3) to be precisely negative is that the following inequality holds:

$\displaystyle\gamma{\rm E}\left[e(1-e)(1-2e)\boldsymbol{X}^{\top}\right]{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e(1-e)(1-2e)\boldsymbol{X}\right]$
	$\displaystyle\leq{\rm E}\left[e(1-e)\left\{e(Y_{1}-\mu_{w1})+(1-e)(Y_{0}-\mu_{w0})\right\}\boldsymbol{X}^{\top}\right]{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}$
	$\displaystyle\times{\rm E}\left[e(1-e)\left\{e(Y_{1}-\mu_{w1})+(1-e)(Y_{0}-\mu_{w0})\right\}\boldsymbol{X}\right].$	(3.6)

When considering $NN$ , $\gamma$ is set to 4. Otherwise, $\gamma$ is set to 16.

Since the proof follows the same procedure as for the ATT situation, we omit it here.

Following the same discussion as for the ATT, we also propose a new criterion to determine whether the simple CI is conservative: verifying whether the left-hand side of (3.6) is approximately $0$ . If the left-hand side of (3.6) is sufficiently close to $0$ , it is expected that the inequality (3.6) will be satisfied.

3.2 Comparison of the Proposed Criteria

We compare the proposed criteria to determine whether the simple CI is conservative: assessing if the left-hand side of (3.5) or (3.6) is sufficiently close to $0$ . Given that ${\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]>O$ , it is essential to assess whether ${\rm E}\left[e(1-e)\boldsymbol{X}\right]\approx\boldsymbol{0}$ for the ATT or ${\rm E}\left[e(1-e)(1-2e)\boldsymbol{X}\right]\approx\boldsymbol{0}$ for the ATO. To evaluate these conditions, we consider the functions $f(e)=e(1-e)$ and $f(e)=e(1-e)(1-2e)$ for $e\in(0,1)$ . As shown in Figure 1, the function for the ATT forms a parabolic shape with a maximum value at 0.5, while the function for the ATO resembles a sine-like curve, taking positive values below 0.5 and negative values above 0.5. This suggests that meeting the condition for the ATT is more challenging, whereas the condition for the ATO is likely easier to satisfy because positive and negative contributions can cancel. In other words, the simple CI for the ATO tends to be more conservative compared to that for the ATT. To further explore these properties, we conduct simulation experiments in the following section.

Figure 1: Function forms of the component of the proposed criteria
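The shapes of the two functions can also be inspected numerically. The following sketch evaluates $f(e)=e(1-e)$ and $f(e)=e(1-e)(1-2e)$ on an evenly spaced grid that is symmetric around $0.5$ ; the grid is an assumption chosen for illustration and does not represent any estimated propensity score distribution.

```python
import numpy as np

e = np.linspace(0.01, 0.99, 99)          # symmetric grid around 0.5
f_att = e * (1.0 - e)                    # component for the ATT criterion
f_ato = e * (1.0 - e) * (1.0 - 2.0 * e)  # component for the ATO criterion

mean_att = f_att.mean()          # strictly positive: every grid point contributes with the same sign
mean_ato = f_ato.mean()          # close to zero: values below and above 0.5 cancel
peak_att = e[np.argmax(f_att)]   # grid location of the ATT maximum
peak_ato = e[np.argmax(f_ato)]   # grid location of the positive ATO peak
```

On such a symmetric grid the average of the ATO component cancels, whereas the ATT component cannot average to $0$ because it is positive everywhere on $(0,1)$ .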

3.3 Simulation Experiments

In this section, we demonstrate that under CN conditions, the simple CI for the IPW estimator of both ATT and ATO yields conservative results. Furthermore, we illustrate that the proposed criteria can effectively determine whether the simple CI is conservative under NN conditions.

3.3.1 Data-Generating Mechanism

First, we will describe the Data-Generating Mechanism (DGM) used in this simulation. We consider two confounders, denoted as $X_{i1}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}N(50,5^{2})$ and $X_{i2}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{Bin}(0.5)$ . The assignment mechanism (true propensity score) related to these confounders is defined by ${\rm Pr}\left(T=1|\boldsymbol{X}_{i}\right)=expit\left\{\varepsilon\times(0.03(X_{i1}-50)+0.3(X_{i2}-0.5))\right\}$ . Here, $\varepsilon$ adjusts the relationship between the assignment and the confounders, influencing the variance of the propensity score. The outcome model is specified as ${\rm Pr}\left(Y_{0,i}=1|\boldsymbol{X}_{i}\right)=expit\left\{(1,X_{i1}-50,X_{i2}-0.5)\boldsymbol{\beta}_{0}\right\}$ and ${\rm Pr}\left(Y_{1,i}=1|\boldsymbol{X}_{i}\right)=expit\left\{(1,X_{i1}-50,X_{i2}-0.5)\boldsymbol{\beta}_{1}\right\}$ .

In subsequent simulation experiments, the parameters $\varepsilon$ , $\boldsymbol{\beta}_{0}$ , and $\boldsymbol{\beta}_{1}$ will be varied. Under CN conditions, we consider two scenarios; #1: $\varepsilon=1$ and $\boldsymbol{\beta}_{0}=\boldsymbol{\beta}_{1}=(0,-0.2,0.1)^{\top}$ , and #2: $\varepsilon=3$ and $\boldsymbol{\beta}_{0}=\boldsymbol{\beta}_{1}=(0,-0.2,0.1)^{\top}$ . Under NN conditions, we also consider two scenarios; #3: $\varepsilon=1$ , $\boldsymbol{\beta}_{0}=(0,-0.25,0.1)^{\top}$ , and $\boldsymbol{\beta}_{1}=(0.01,-0.4,4)^{\top}$ , and #4: $\varepsilon=3$ , $\boldsymbol{\beta}_{0}=(1,-0.5,-0.5)^{\top}$ , and $\boldsymbol{\beta}_{1}=(1,0.5,0.5)^{\top}$ . In the former scenarios, the CN condition is exactly satisfied, whereas in the latter scenarios, the NN condition is approximately met.
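The DGM above can be sketched in Python as follows. Two points are assumptions rather than statements from the text: $\text{Bin}(0.5)$ is read as a Bernoulli distribution with success probability 0.5, and the two potential outcomes are drawn independently given the covariates. The function name `generate_data`, the seed, and the dictionary layout are likewise illustrative.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def generate_data(n, eps, beta0, beta1, rng):
    """One simulated data set following the DGM of Section 3.3.1 (sketch)."""
    x1 = rng.normal(50.0, 5.0, size=n)
    x2 = rng.binomial(1, 0.5, size=n)        # assumption: Bernoulli(0.5)
    e = expit(eps * (0.03 * (x1 - 50.0) + 0.3 * (x2 - 0.5)))
    t = rng.binomial(1, e)
    Z = np.column_stack([np.ones(n), x1 - 50.0, x2 - 0.5])
    p0 = expit(Z @ np.asarray(beta0, dtype=float))
    p1 = expit(Z @ np.asarray(beta1, dtype=float))
    y0 = rng.binomial(1, p0)                 # assumption: Y0 and Y1 drawn independently
    y1 = rng.binomial(1, p1)
    y = t * y1 + (1 - t) * y0                # observed outcome
    return {"x1": x1, "x2": x2, "t": t, "y": y, "e": e, "p0": p0, "p1": p1}

# Parameter settings of scenarios #1 to #4 as listed in the text.
scenarios = {
    1: dict(eps=1, beta0=(0, -0.2, 0.1), beta1=(0, -0.2, 0.1)),
    2: dict(eps=3, beta0=(0, -0.2, 0.1), beta1=(0, -0.2, 0.1)),
    3: dict(eps=1, beta0=(0, -0.25, 0.1), beta1=(0.01, -0.4, 4)),
    4: dict(eps=3, beta0=(1, -0.5, -0.5), beta1=(1, 0.5, 0.5)),
}
data = generate_data(2000, rng=np.random.default_rng(2024), **scenarios[1])
```

In scenarios #1 and #2 the two outcome models coincide, so `p0` and `p1` are identical for every subject, which is the CN condition.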

3.3.2 Performance Metrics

Simulation experiments are assessed by calculating the mean, empirical standard error (ESE), and the $\alpha$ -error across 1000 iterations. The $\alpha$ -error is calculated using the simple CI with both the true and estimated propensity scores, and using the exact CI with only the estimated propensity score. For each type of CI, the $\alpha$ -error is the proportion of iterations in which the CI does not include 0.
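These metrics can be computed as in the sketch below. The function `summarize` is a hypothetical helper; the use of the normal quantile 1.96 for a 95% CI and of the sample standard deviation (denominator $n-1$ ) for the ESE are assumptions, and the four input values are made up to keep the example verifiable by hand.

```python
import numpy as np

def summarize(estimates, std_errors, z=1.96):
    """Mean, empirical standard error, and alpha-error (%) over replications."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower, upper = est - z * se, est + z * se
    rejected = (lower > 0.0) | (upper < 0.0)   # CI does not include 0
    return {
        "mean": est.mean(),
        "ese": est.std(ddof=1),
        "alpha_error_pct": 100.0 * rejected.mean(),
    }

# Toy input (assumption): four hypothetical replications.
summary = summarize([0.00, 0.05, -0.05, 0.10], [0.03, 0.03, 0.02, 0.03])
```

The same function would be applied separately to the standard errors underlying the simple CI and the exact CI.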

3.3.3 Simulation Results

All simulation results are summarized in Tables 1 and 2. Under the CN condition, both ATT and ATO exhibit conservative simple CIs, compared with the exact CIs. This tendency also holds when the variance of the propensity score is large ( $\varepsilon=3$ ).

Under the NN condition, in scenario #3, the left-hand side of (3.6) is approximately $1.07\times 10^{-2}$ , while the right-hand side is $1.84\times 10^{-2}$ . In this scenario, the simple CI yields a conservative result. However, in scenario #4, the left-hand side of (3.6) is approximately $7.04\times 10^{-2}$ , whereas the right-hand side is $0.09\times 10^{-2}$ . This indicates that the proposed criterion is not satisfied. In this case, the simple CI exhibits $\alpha$ -error inflation relative to the exact CI (5.5% vs. 4.9%).

Table 1: Summary Statistics Under the Null Hypothesis Concerning the Conditional ATE (CN): The sample size is 2000, and the number of iterations is 1000. The $\alpha$ -errors are calculated using both the simple CI, which ignores the variability of the estimated propensity score, and the exact CI, which considers the variability.

Scenario #	Estimand	Propensity score	Mean ( $\times 10^{-2}$ )	ESE ( $\times 10^{-2}$ )	$\alpha$ -error, simple CI (%)	$\alpha$ -error, exact CI (%)
1	ATT	True	0.080	2.224	4.1	–
1	ATT	Estimated	0.094	2.036	2.9	4.4
1	ATO	True	0.068	2.210	4.4	–
1	ATO	Estimated	0.085	2.018	2.3	4.4
2	ATT	True	0.086	2.418	4.7	–
2	ATT	Estimated	0.119	2.223	3.2	4.5
2	ATO	True	0.053	2.313	4.2	–
2	ATO	Estimated	0.084	2.101	2.5	4.4

ESE: empirical standard error; CI: confidence interval (calculated as 95%CI)
Scenario #1: $\varepsilon=1$ and $\boldsymbol{\beta}_{0}=\boldsymbol{\beta}_{1}=(0,-0.2,0.1)^{\top}$
Scenario #2: $\varepsilon=3$ and $\boldsymbol{\beta}_{0}=\boldsymbol{\beta}_{1}=(0,-0.2,0.1)^{\top}$

Table 2: Summary Statistics Under the Neyman Null (NN) for the ATO: The sample size is 2000, and the number of iterations is 1000. The $\alpha$ -errors are calculated using both the simple CI, which ignores the variability of the estimated propensity score, and the exact CI, which considers the variability.

Scenario #	Propensity score	Mean ( $\times 10^{-2}$ )	ESE ( $\times 10^{-2}$ )	$\alpha$ -error, simple CI (%)	$\alpha$ -error, exact CI (%)
3	True	0.162	2.250	5.3	–
3	Estimated	0.160	1.878	2.4	4.8
4	True	-0.045	2.383	5.8	–
4	Estimated	-0.049	2.428	5.5	4.9

ESE: empirical standard error; CI: confidence interval (calculated as 95%CI)
Scenario #3: $\varepsilon=1$ , $\boldsymbol{\beta}_{0}=(0,-0.25,0.1)^{\top}$ , and $\boldsymbol{\beta}_{1}=(0.01,-0.4,4)^{\top}$
Scenario #4: $\varepsilon=3$ , $\boldsymbol{\beta}_{0}=(1,-0.5,-0.5)^{\top}$ , and $\boldsymbol{\beta}_{1}=(1,0.5,0.5)^{\top}$

3.4 Illustrative Real Data Example

In this section, we provide an illustrative example of a real data analysis conducted on muscle-invasive bladder cancer[9]. Specifically, we examined the causal effects of adjuvant therapies on muscle-invasive bladder cancer-related deaths. We considered cancer-specific death as a binary endpoint, which differed from the outcome in the original manuscript, that is, we disregarded the ‘time information’ from the survival data. The dataset consisted of 322 patients: 74 in the adjuvant chemotherapy (treatment) group and 248 in the radical cystectomy (control) group. Cancer-specific deaths occurred in 23 (31.1 %) and 45 patients (18.2 %) in the treatment and control groups, respectively. For more information on this study, please refer to Shimizu et al.[9].

All analysis results are summarized in Table 3. Given that the proposed criteria under NN for the ATT and ATO are 0.535 and 0.125, respectively, it is expected that the simple CI will not yield a conservative CI. In this analysis, we assume interest in either SN or CN conditions. Under this assumption, the null hypotheses, which propose no causal effects, are not rejected using the simple CI.

Table 3: Summary Statistics of Real Data Analysis: The CIs are calculated using both the simple CI, which ignores the variability of the estimated propensity score, and the exact CI, which considers the variability.

	Summary statistics
Estimand	Point Estimate	95%CI (simple CI)	95%CI (exact CI)
ATT	-0.0478	(-0.2004, 0.1049)	(-0.2001, 0.1045)
ATO	0.0017	(-0.1394, 0.1428)	(-0.1392, 0.1426)

CI: confidence interval

4 Discussion

In this manuscript, we investigate scenarios where a confidence interval that ignores the variability of the estimated propensity score (referred to as ‘simple CI’) for the IPW estimator is conservative compared to one that considers this variability (termed ‘exact CI’) in weighted ATE classes. This consideration is important from a practical standpoint, as the sandwich variance, which is commonly used to derive the simple CI, is widely employed. Specifically, under the Fisher sharp null, conditional ATE null, or homogeneous treatment effect scenarios, the simple CI yields a conservative CI. However, in other scenarios, by evaluating terms related to the variability of the estimated propensity score in the asymptotic variance of the IPW estimator, the behavior of the simple CI can be forecasted. Using this property, we propose simple criteria to assess whether the simple CI is conservative. The results of this investigation are validated through simulation experiments and real data analysis.

To the best of our knowledge, this is the first study to identify situations where the simple CI becomes conservative. It is well known that the sandwich variance of the IPW estimator for the ATE is conservative, whereas for the ATT, it is not[5]. However, from a practical standpoint, the sandwich variance is commonly applied to the Weighted Average Treatment Effect estimands, including the ATT and ATO[11, 12]. According to the results of this manuscript, statistical hypothesis tests considering certain causal null hypotheses are appropriate. In other scenarios, it is indeed necessary to address this issue by using the exact CI or the proposed criteria.

The results of the simulation experiments suggest that the proposed criteria, especially for the ATO, are effective in determining the behavior of the simple CI. However, it should be noted that the right-hand side of (3.6) tends to yield a small value. Therefore, additional caution is needed when practically applying these criteria. Specifically, we believe the following two points are important: 1) the distribution of the estimated propensity score, and 2) its variance. As for the first point, this is evident from Figure 1. The blue dashed line indicates that if the distribution is symmetrical around 0.5, the criterion is likely to be satisfied. Regarding the second point, a large variance in the estimated propensity score can lead to a significant difference in the value of ${\rm E}\left[e(1-e)(1-2e)\boldsymbol{X}\right]$ , which is the fundamental term of the criterion for the ATO. Conversely, if the variance is small, this term tends to yield similar values since the estimated propensity scores are concentrated around 0.5. This trend is observed in simulation scenarios #3 and #4. In summary, when the estimated propensity score distribution is centered around 0.5, the criterion for the ATO is more likely to be effective.

In future work, it will be important to examine the behavior of other estimands, such as matching weight, entropy weight, and others[13, 14]. For the matching weight, it may be necessary to employ approximation techniques, as this weight includes the non-differentiable point[13, 15]. Additionally, while some considerations for continuous outcomes are presented in Appendix C, addressing time-to-event outcomes is also crucial. Mao et al.[16] and Shu et al.[17] have proposed asymptotic variance estimators for various causal estimands, with the latter focusing particularly on the Cox proportional hazard model. As mentioned in Shu et al.[17], the variance estimator for the ATE is conservative, while those for other estimands are not. Therefore, from a practical standpoint, it is also important to determine when CIs derived from these variance estimators are conservative.

References

1. Li, F., Morgan, K. L., & Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
2. Hirano, K., Imbens, G. W., & Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71(4), 1161-1189.
3. Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
4. Mao, H., Li, L., & Greene, T. (2019). Propensity score weighting analysis and treatment effect discovery. Statistical methods in medical research, 28(8), 2439-2454.
5. Reifeis, S. A., & Hudgens, M. G. (2022). On variance of the treatment effect in the treated when estimated by inverse probability weighting. American Journal of Epidemiology, 191(6), 1092-1097.
6. Lunceford, J. K., & Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in medicine, 23(19), 2937-2960.
7. Imbens, G. W., & Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
8. Crump, R. K., Hotz, V. J., Imbens, G. W., & Mitnik, O. A. (2008). Nonparametric tests for treatment effect heterogeneity. The Review of Economics and Statistics, 90(3), 389-405.
9. Shimizu, F., Muto, S., Taguri, M., Ieda, T., Tsujimura, A., Sakamoto, Y., … & Horie, S. (2017). Effectiveness of platinum‐based adjuvant chemotherapy for muscle‐invasive bladder cancer: A weighted propensity score analysis. International Journal of Urology, 24(5), 367-372.
10. Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228-1242.
11. Merz, M., Goldschmidt, H., Hari, P., Agha, M., Diels, J., Ghilotti, F., … & Jagannath, S. (2021). Adjusted comparison of outcomes between patients from CARTITUDE-1 versus multiple myeloma patients with prior exposure to PI, Imid and Anti-CD-38 from a German Registry. Cancers, 13(23), 5996.
12. Costa, L. J., Lin, Y., Cornell, R. F., Martin, T., Chhabra, S., Usmani, S. Z., … & Hari, P. (2022). Comparison of cilta-cel, an anti-BCMA CAR-T cell therapy, versus conventional treatment in patients with relapsed/refractory multiple myeloma. Clinical Lymphoma Myeloma and Leukemia, 22(5), 326-335.
13. Li, L., & Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The international journal of biostatistics, 9(2), 215-234.
14. Matsouaka, R. A., & Zhou, Y. (2020). A framework for causal inference in the presence of extreme inverse probability weights: the role of overlap weights. arXiv preprint arXiv:2011.01388.
15. Orihara, S., Kawamura, T., & Taguri, M. (2022). Comments on ‘A weighting analogue to pair matching in propensity score analysis’ by L. Li and T. Greene. The International Journal of Biostatistics, 19(1), 53-60.
16. Mao, H., Li, L., Yang, W., & Shen, Y. (2018). On the propensity score weighting analysis with survival outcome: Estimands, estimation, and inference. Statistics in medicine, 37(26), 3745-3763.
17. Shu, D., Young, J. G., Toh, S., & Wang, R. (2021). Variance estimation in inverse probability weighted Cox models. Biometrics, 77(3), 1101-1117.

Appendix A Details of the IPW Estimator and the Asymptotic Property of the WATE

In this manuscript, the propensity score is estimated by maximum likelihood, that is, by solving the score equation $\sum_{i=1}^{n}\boldsymbol{X}_{i}\left(T_{i}-e_{i}(\boldsymbol{\alpha})\right)=\boldsymbol{0}$ . As discussed in the main manuscript, the IPW estimator and the propensity score estimator can be expressed jointly through the following single estimating equation[4]

\displaystyle\sum_{i=1}^{n}\left(\begin{array}[]{c}\boldsymbol{X}_{i}\left(T_{i}-e_{i}(\boldsymbol{\alpha})\right)\\ W_{i}T_{i}(Y_{i}-\mu_{w1})\\ W_{i}(1-T_{i})(Y_{i}-\mu_{w0})\end{array}\right)=\sum_{i=1}^{n}\psi_{i}(\boldsymbol{\theta})=\boldsymbol{0},

(A.4)

where $\boldsymbol{\theta}:=\left(\boldsymbol{\alpha}^{\top},\mu_{w1},\mu_{w0}\right)^{\top}$ , and $\hat{\boldsymbol{\theta}}$ is the solution of (A.4). Using the standard theory for the M-estimator, the asymptotic distribution of $\hat{\boldsymbol{\theta}}$ becomes

\sqrt{n}\left(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{0}\right)\stackrel{{\scriptstyle L}}{{\to}}N\left(\boldsymbol{0},{\rm E}\left[\frac{\partial\psi(\boldsymbol{\theta}^{0})}{\partial\boldsymbol{\theta}^{\top}}\right]^{-1}{\rm E}\left[\psi(\boldsymbol{\theta}^{0})^{\otimes 2}\right]{\rm E}\left[\frac{\partial\psi(\boldsymbol{\theta}^{0})^{\top}}{\partial\boldsymbol{\theta}}\right]^{-1}\right),

where $\boldsymbol{\theta}^{0}$ represents the true value of $\boldsymbol{\theta}$ , that is, the unique value at which the expectation of the estimating function in (A.4) equals zero. Also, applying the delta method,

\displaystyle\sqrt{n}\left(\hat{\tau}_{w}-\tau_{w}^{0}\right)\stackrel{{\scriptstyle L}}{{\to}}N\left(0,\sigma^{2}\right),

(A.5)

where

\sigma^{2}=\boldsymbol{c}^{\top}{\rm E}\left[\frac{\partial\psi(\boldsymbol{\theta}^{0})}{\partial\boldsymbol{\theta}^{\top}}\right]^{-1}{\rm E}\left[\psi(\boldsymbol{\theta}^{0})^{\otimes 2}\right]{\rm E}\left[\frac{\partial\psi(\boldsymbol{\theta}^{0})^{\top}}{\partial\boldsymbol{\theta}}\right]^{-1}\boldsymbol{c},

and $\boldsymbol{c}=(\boldsymbol{0}^{\top},1,-1)^{\top}$ .
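As a rough illustration of how (A.4) and the variance formula above could be evaluated on data, the following Python sketch computes the empirical sandwich variance for a generic tilting function $w(e)$. It assumes a logistic propensity score model, which is consistent with the score equation above. The function names `fit_logistic` and `wate_sandwich`, the Newton-Raphson solver, and the simulated design are illustrative assumptions and are not taken from the original analysis.

```python
import numpy as np


def fit_logistic(X, T, n_iter=50):
    """Solve sum_i X_i (T_i - e_i(alpha)) = 0 by Newton-Raphson (assumed logistic model)."""
    alpha = np.zeros(X.shape[1])
    for _ in range(n_iter):
        e = 1.0 / (1.0 + np.exp(-X @ alpha))
        score = X.T @ (T - e)
        info = (X * (e * (1.0 - e))[:, None]).T @ X
        alpha = alpha + np.linalg.solve(info, score)
    return alpha


def wate_sandwich(X, T, Y, w, dw):
    """Return (tau_hat, estimated variance of tau_hat) from the stacked equation (A.4).

    w and dw are the tilting function w(e) and its derivative w'(e).
    """
    n, p = X.shape
    alpha = fit_logistic(X, T)
    e = 1.0 / (1.0 + np.exp(-X @ alpha))
    we, dwe = w(e), dw(e)
    W = we / (T * e + (1.0 - T) * (1.0 - e))
    mu1 = np.sum(W * T * Y) / np.sum(W * T)
    mu0 = np.sum(W * (1.0 - T) * Y) / np.sum(W * (1.0 - T))

    # Empirical E[psi psi^T] ("meat" of the sandwich).
    psi = np.column_stack([
        X * (T - e)[:, None],
        W * T * (Y - mu1),
        W * (1.0 - T) * (Y - mu0),
    ])
    B = psi.T @ psi / n

    # Empirical E[d psi / d theta^T] ("bread"), using the derivatives given in this appendix.
    A = np.zeros((p + 2, p + 2))
    A[:p, :p] = -(X * (e * (1.0 - e))[:, None]).T @ X / n
    g1 = T * (dwe * e - we) / e * (1.0 - e) * (Y - mu1)
    g0 = (1.0 - T) * (dwe * (1.0 - e) + we) / (1.0 - e) * e * (Y - mu0)
    A[p, :p] = g1 @ X / n
    A[p + 1, :p] = g0 @ X / n
    A[p, p] = -np.mean(W * T)
    A[p + 1, p + 1] = -np.mean(W * (1.0 - T))

    A_inv = np.linalg.inv(A)
    V = A_inv @ B @ A_inv.T
    c = np.concatenate([np.zeros(p), [1.0, -1.0]])
    return mu1 - mu0, float(c @ V @ c) / n


# Hypothetical simulated data with a homogeneous treatment effect of 2.
rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x))).astype(float)
Y = 1.0 + 2.0 * T + x + rng.normal(size=n)

tau_att, var_att = wate_sandwich(X, T, Y, w=lambda e: e, dw=lambda e: np.ones_like(e))
tau_ato, var_ato = wate_sandwich(X, T, Y, w=lambda e: e * (1 - e), dw=lambda e: 1 - 2 * e)
```

Passing `w` and `dw` as callables keeps one routine for the ATE ($w=1$), the ATT ($w=e$), and the ATO ($w=e(1-e)$); the returned variance is $\hat{\sigma}^{2}/n$, so a Wald-type interval would be `tau_hat ± 1.96 * sqrt(var)`.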

From here, we consider each component of the asymptotic variance of (A.5). Since

{\rm E}\left[\frac{Tw(e)}{Te+(1-T)(1-e)}\right]={\rm E}\left[\frac{(1-T)w(e)}{Te+(1-T)(1-e)}\right]={\rm E}[w(e)],

\frac{\partial}{\partial\boldsymbol{\alpha}}\frac{Tw(e)}{Te+(1-T)(1-e)}(Y-\mu_{w1})=\frac{T(w^{\prime}(e)e-w(e))}{e}(1-e)(Y-\mu_{w1})\boldsymbol{X},

and

\frac{\partial}{\partial\boldsymbol{\alpha}}\frac{(1-T)w(e)}{Te+(1-T)(1-e)}(Y-\mu_{w0})=\frac{(1-T)(w^{\prime}(e)(1-e)+w(e))}{1-e}e(Y-\mu_{w0})\boldsymbol{X},

we obtain

\displaystyle{\rm E}\left[\frac{\partial\psi(\boldsymbol{\theta}^{0})}{\partial\boldsymbol{\theta}^{\top}}\right]=\left(\begin{array}[]{ccc}-{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]&\boldsymbol{0}&\boldsymbol{0}\\ {\rm E}\left[(w^{\prime}(e)e-w(e))(Y_{1}-\mu_{w1})(1-e)\boldsymbol{X}^{\top}\right]&-{\rm E}[w(e)]&0\\ {\rm E}\left[(w^{\prime}(e)(1-e)+w(e))(Y_{0}-\mu_{w0})e\boldsymbol{X}^{\top}\right]&0&-{\rm E}[w(e)]\end{array}\right),

(A.9)

where $A^{\otimes 2}=AA^{\top}$ . Also,

\displaystyle{\rm E}\left[\psi(\boldsymbol{\theta}^{0})^{\otimes 2}\right]

\displaystyle=\left(\begin{array}[]{ccc}{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]&&\\ {\rm E}\left[w(e)(Y_{1}-\mu_{w1})(1-e)\boldsymbol{X}^{\top}\right]&{\rm E}\left[\frac{w(e)^{2}(Y_{1}-\mu_{w1})^{2}}{e}\right]&\\ -{\rm E}\left[w(e)(Y_{0}-\mu_{w0})e\boldsymbol{X}^{\top}\right]&0&{\rm E}\left[\frac{w(e)^{2}(Y_{0}-\mu_{w0})^{2}}{1-e}\right]\end{array}\right).

(A.13)

Regarding (A.9) and (A.13), the symbols used in the subsequent discussions are

({\rm A.9})=\left(\begin{array}[]{cc}-A_{11}&O^{\top}\\ A_{12}&a_{22}{\rm I}\end{array}\right),\ \ \ \ \ ({\rm A.13})=\left(\begin{array}[]{cc}A_{11}&B_{12}^{\top}\\ B_{12}&B_{22}\end{array}\right),

where ${\rm I}$ represents the identity matrix. Using the relationship

A_{12}=-B_{12}+\left(\begin{array}[]{c}{\rm E}\left[w^{\prime}(e)(Y_{1}-\mu_{w1})e(1-e)\boldsymbol{X^{\top}}\right]\\ {\rm E}\left[w^{\prime}(e)(Y_{0}-\mu_{w0})e(1-e)\boldsymbol{X^{\top}}\right]\end{array}\right)=-B_{12}+\delta=:-\left(\begin{array}[]{c}b_{1}^{\top}\\ b_{2}^{\top}\end{array}\right)+\left(\begin{array}[]{c}\delta_{1}^{\top}\\ \delta_{2}^{\top}\end{array}\right),

the asymptotic variance of (A.5) becomes

\sigma^{2}=\frac{1}{a_{22}^{2}}(1,-1)\left(B_{22}+\delta A_{11}^{-1}\delta^{\top}-B_{12}A_{11}^{-1}B_{12}^{\top}\right)\left(\begin{array}[]{c}1\\ -1\end{array}\right).

Therefore, the result of the main manuscript (2.3) is obtained.
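The block-matrix reduction above is purely algebraic, so it can be sanity-checked numerically. The following sketch, which uses arbitrary random matrices rather than any real data, compares the full sandwich form $\boldsymbol{c}^{\top}A^{-1}BA^{-\top}\boldsymbol{c}$ with the reduced expression; the dimension `p`, the value of `a22`, and the random draws are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3

# Random symmetric positive definite matrix playing the role of E[psi psi^T] in (A.13).
M = rng.normal(size=(p + 2, p + 2))
B = M @ M.T + np.eye(p + 2)
A11, B12, B22 = B[:p, :p], B[p:, :p], B[p:, p:]

# Arbitrary delta and a22; A12 is then fixed by the relationship A12 = -B12 + delta.
delta = rng.normal(size=(2, p))
a22 = -0.7
A12 = -B12 + delta

# Full derivative matrix with the block structure of (A.9).
A = np.block([[-A11, np.zeros((p, 2))], [A12, a22 * np.eye(2)]])
c = np.concatenate([np.zeros(p), [1.0, -1.0]])

A_inv = np.linalg.inv(A)
sigma2_full = c @ A_inv @ B @ A_inv.T @ c

# Reduced expression derived in this appendix.
d = np.array([1.0, -1.0])
A11_inv = np.linalg.inv(A11)
inner = B22 + delta @ A11_inv @ delta.T - B12 @ A11_inv @ B12.T
sigma2_block = d @ inner @ d / a22 ** 2
```

Both quantities agree up to floating-point error, which reflects the cancellation of the cross terms involving $\delta A_{11}^{-1}B_{12}^{\top}$.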

Appendix B Proof of Theorem 2

For the former statement, a direct calculation shows that $\delta_{2}-\delta_{1}=O$ . Given that ${\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]>O$ , the statement is proven.

For the latter statement, the Cauchy-Schwarz inequality gives

(\delta_{2}-\delta_{1})^{\top}A_{11}^{-1}\alpha\leq|(\delta_{2}-\delta_{1})^{\top}A_{11}^{-1}\alpha|\leq\left\{(\delta_{2}-\delta_{1})^{\top}A_{11}^{-1}(\delta_{2}-\delta_{1})\alpha^{\top}A_{11}^{-1}\alpha\right\}^{\frac{1}{2}}.

Therefore,

\begin{aligned}
-\alpha^{\top}A_{11}^{-1}\alpha+2(\delta_{2}-\delta_{1})^{\top}A_{11}^{-1}\alpha&\leq-\alpha^{\top}A_{11}^{-1}\alpha+2\left\{(\delta_{2}-\delta_{1})^{\top}A_{11}^{-1}(\delta_{2}-\delta_{1})\alpha^{\top}A_{11}^{-1}\alpha\right\}^{\frac{1}{2}}\\
&=\left\{\alpha^{\top}A_{11}^{-1}\alpha\right\}^{\frac{1}{2}}\left[-\left\{\alpha^{\top}A_{11}^{-1}\alpha\right\}^{\frac{1}{2}}+2\left\{(\delta_{2}-\delta_{1})^{\top}A_{11}^{-1}(\delta_{2}-\delta_{1})\right\}^{\frac{1}{2}}\right].
\end{aligned}

(B.1)

Since $\alpha^{\top}A_{11}^{-1}\alpha>0$ , a sufficient condition for (B.1) to be negative is that the term within $[\cdot]$ is negative:

4(\delta_{2}-\delta_{1})^{\top}A_{11}^{-1}(\delta_{2}-\delta_{1})<\alpha^{\top}A_{11}^{-1}\alpha.

We specifically calculate $\delta_{2}-\delta_{1}$ as follows:

\delta_{2}-\delta_{1}={\rm E}\left[(Y_{0}-Y_{1}-(\mu_{w0}-\mu_{w1}))e(1-e)\boldsymbol{X}\right].

When considering $NN$ , $\mu_{w0}-\mu_{w1}=0$ . Therefore, since $Y_{0}-Y_{1}<1$ , the statement is verified. In other cases, given that $Y_{0}-Y_{1}-(\mu_{w0}-\mu_{w1})<2$ , the statement is similarly verified.

Appendix C Continuous Outcomes Situations

In this section, we assume the following linear outcome models

\displaystyle Y_{t}=\beta_{0t}^{\prime}+{\boldsymbol{X}}^{\top}\boldsymbol{\beta}_{xt}+\varepsilon_{t},

(C.1)

where ${\rm E}[\varepsilon_{t}]=0$ , $Var(\varepsilon_{t})<\infty$ , and $\varepsilon_{t}\mathop{\perp\!\!\!\perp}(T,\boldsymbol{X},Y)$ ( $t=0,1$ ). Based on this assumption, the WATE is expressed as

\tau_{w}=\mu_{w1}-\mu_{w0}=\beta_{01}^{\prime}-\beta_{00}^{\prime}+\frac{{\rm E}\left[w(e)\boldsymbol{X}^{\top}\right]}{{\rm E}[w(e)]}\left(\boldsymbol{\beta}_{x1}-\boldsymbol{\beta}_{x0}\right).

In subsequent discussions, we will employ the following reparametrization to simplify the discussion: $\mu_{wt}=\beta_{0t}^{\prime}$ . This reparametrization is consistent with the centering of the outcome models (C.1) with respect to the covariates

Y_{t}=\mu_{wt}+\left({\boldsymbol{X}}-\frac{{\rm E}\left[w(e)\boldsymbol{X}\right]}{{\rm E}[w(e)]}\right)^{\top}\boldsymbol{\beta}_{xt}+\varepsilon_{t}=\mu_{wt}+{\boldsymbol{X}^{\prime}}^{\top}\boldsymbol{\beta}_{xt}+\varepsilon_{t}.

Note that when treatment effects are homogeneous (i.e., $\boldsymbol{\beta}_{x1}=\boldsymbol{\beta}_{x0}$ ), $\tau_{w}=\beta_{01}^{\prime}-\beta_{00}^{\prime}$ for all estimands. Note also that the centering does not affect the estimated propensity score since its effect is absorbed into the intercept term. Under the potential outcome models (C.1), the observed outcome model becomes

Y=TY_{1}+(1-T)Y_{0}=\mu_{w0}+T(\mu_{w1}-\mu_{w0})+{\boldsymbol{X}^{\prime}}^{\top}\boldsymbol{\beta}_{x0}+T{\boldsymbol{X}^{\prime}}^{\top}\left(\boldsymbol{\beta}_{x1}-\boldsymbol{\beta}_{x0}\right)+\varepsilon

First, we consider the asymptotic variance of the ATT. The values of $\delta$ and $B_{12}$ become the following, respectively:

\begin{aligned}
\delta&=\left(\begin{array}[]{c}\delta_{1}^{\top}\\ \delta_{2}^{\top}\end{array}\right)=\left(\begin{array}[]{c}{\rm E}\left[(Y_{1}-\mu_{w1})e(1-e)\boldsymbol{X}^{\top}\right]\\ {\rm E}\left[(Y_{0}-\mu_{w0})e(1-e)\boldsymbol{X}^{\top}\right]\end{array}\right)=\left(\begin{array}[]{c}\boldsymbol{\beta}_{x1}^{\top}{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]\\ \boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]\end{array}\right),\\
B_{12}&=\left(\begin{array}[]{c}b_{1}^{\top}\\ b_{2}^{\top}\end{array}\right)=\left(\begin{array}[]{c}{\rm E}\left[(Y_{1}-\mu_{w1})e(1-e)\boldsymbol{X}^{\top}\right]\\ -{\rm E}\left[(Y_{0}-\mu_{w0})e^{2}\boldsymbol{X}^{\top}\right]\end{array}\right)=\left(\begin{array}[]{c}\boldsymbol{\beta}_{x1}^{\top}{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]\\ -\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e^{2}\boldsymbol{X}^{\otimes 2}\right]\end{array}\right).
\end{aligned}

Since $b_{1}=\delta_{1}$ and $b_{2}=\delta_{2}-\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]$ , the second and third terms of (2.3) become

\begin{aligned}
&(1,-1)\left(\delta A_{11}^{-1}\delta^{\top}-B_{12}A_{11}^{-1}B_{12}^{\top}\right)\left(\begin{array}[]{c}1\\ -1\end{array}\right)\\
&=\delta_{1}^{\top}A_{11}^{-1}\delta_{1}-2\delta_{1}^{\top}A_{11}^{-1}\delta_{2}+\delta_{2}^{\top}A_{11}^{-1}\delta_{2}-b_{1}^{\top}A_{11}^{-1}b_{1}+2b_{1}^{\top}A_{11}^{-1}b_{2}-b_{2}^{\top}A_{11}^{-1}b_{2}\\
&=-2\boldsymbol{\beta}_{x1}^{\top}{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]A_{11}^{-1}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}+2\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]A_{11}^{-1}{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}\\
&\quad-\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]A_{11}^{-1}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}\\
&=2\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]\left(\boldsymbol{\beta}_{x0}-\boldsymbol{\beta}_{x1}\right)-\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]A_{11}^{-1}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}.
\end{aligned}
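The simplification for the ATT can be checked numerically. The sketch below builds the moment matrices from hypothetical covariates and propensity scores (random draws that stand in for the centered covariates and $e$; they are assumptions for illustration only), forms $\delta$ and $B_{12}$ as defined above, and compares the quadratic form with the final closed-form expression.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.normal(size=(n, p))              # stand-in for the centered covariates
e = rng.uniform(0.1, 0.9, size=n)        # stand-in for the propensity scores
beta0, beta1 = rng.normal(size=p), rng.normal(size=p)


def moment(weight):
    """Empirical version of E[weight * X X^T]."""
    return (X * weight[:, None]).T @ X / n


A11 = moment(e * (1 - e))                # E[e(1-e) X X^T]
D = moment(e)                            # E[e X X^T]
E2 = moment(e ** 2)                      # E[e^2 X X^T]

# Column-vector versions of delta_1, delta_2, b_1, b_2 for the ATT (w(e) = e).
delta1, delta2 = A11 @ beta1, A11 @ beta0
b1, b2 = A11 @ beta1, -E2 @ beta0
A11_inv = np.linalg.inv(A11)


def quad(v):
    return v @ A11_inv @ v


# (1,-1)(delta A11^{-1} delta^T - B12 A11^{-1} B12^T)(1,-1)^T
lhs = quad(delta1 - delta2) - quad(b1 - b2)
# Closed form from the last line of the display above.
rhs = 2 * beta0 @ D @ (beta0 - beta1) - beta0 @ D @ A11_inv @ D @ beta0
# Value of the closed form when beta1 = beta0 (homogeneous treatment effects).
homogeneous = -beta0 @ D @ A11_inv @ D @ beta0
```

In this sketch `lhs` and `rhs` coincide up to rounding, and `homogeneous` is negative because $A_{11}^{-1}$ is positive definite.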

From the result, the following relationship can be proved.

Theorem 4.

For the ATT, when treatment effects are homogeneous (i.e., $\boldsymbol{\beta}_{x1}=\boldsymbol{\beta}_{x0}$ ), the second and third terms of (2.3) are strictly negative. When there is only constant heterogeneity (i.e., $\boldsymbol{\beta}_{x1}=\gamma\boldsymbol{\beta}_{x0}$ ), a sufficient condition for the second and third terms of (2.3) to be strictly negative is the existence of a value $\gamma\in\mathbb{R}$ such that

{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]-2(1-\gamma){\rm I}>O.

This is clearly satisfied when $\gamma\geq 1$ .

Proof.

Since the former statement is obvious, we will only prove the latter statement. When $\boldsymbol{\beta}_{x1}=\gamma\boldsymbol{\beta}_{x0}$ ,

\begin{aligned}
&2(1-\gamma)\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}-\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]A_{11}^{-1}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}\\
&=\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]A_{11}^{-1}\left\{2(1-\gamma){\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]-{\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]\right\}\boldsymbol{\beta}_{x0}.
\end{aligned}

By focusing on the term within $\{\cdot\}$ , the statement can be derived. ∎

This theorem suggests that the simple CI is conservative when the treatment effect is homogeneous and the ATT is the estimand of interest. Additionally, if the interaction between the treatment and all confounders is governed by a common factor $\gamma$ (i.e., $\boldsymbol{\beta}_{x1}=\gamma\boldsymbol{\beta}_{x0}$ ), the simple CI is also conservative when $\gamma\geq 1$ . This scenario arises when the risk factors of interest exhibit significant interaction effects with a treatment.
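The sufficient condition in Theorem 4 could be examined on data along the following lines. The sketch assumes that the matrix inequality is read as positive definiteness and therefore checks the equivalent symmetric form ${\rm E}\left[e\boldsymbol{X}^{\otimes 2}\right]-2(1-\gamma){\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]>O$. The function name `att_condition_holds` and the simulated propensity scores are hypothetical.

```python
import numpy as np


def att_condition_holds(X, e, gamma):
    """Check E[e X X^T] - 2(1 - gamma) E[e(1-e) X X^T] > O via its smallest eigenvalue.

    This symmetric form has the same sign pattern as
    E[e(1-e) X X^T]^{-1} E[e X X^T] - 2(1 - gamma) I, because the eigenvalues of
    the latter are the generalized eigenvalues of the two moment matrices shifted
    by 2(1 - gamma).
    """
    n = X.shape[0]
    A11 = (X * (e * (1 - e))[:, None]).T @ X / n
    D = (X * e[:, None]).T @ X / n
    S = D - 2.0 * (1.0 - gamma) * A11
    return bool(np.linalg.eigvalsh(S).min() > 0)


# Hypothetical design: one covariate plus intercept, rare treatment (all e well below 0.5).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
e = 1.0 / (1.0 + np.exp(-(-3.0 + 0.5 * x)))

results = {g: att_condition_holds(X, e, g) for g in (0.0, 0.6, 1.0, 2.0)}
```

In this particular simulated design the condition fails for $\gamma=0$ and holds for the larger values of $\gamma$ tried; other designs may behave differently.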

Next, we consider the asymptotic variance of the ATO. In the same manner as for the ATT, the values of $\delta$ and $B_{12}$ become the following, respectively:

\begin{aligned}
\delta&=\left(\begin{array}[]{c}\delta_{1}^{\top}\\ \delta_{2}^{\top}\end{array}\right)=\left(\begin{array}[]{c}{\rm E}\left[(Y_{1}-\mu_{w1})e(1-e)(1-2e)\boldsymbol{X}^{\top}\right]\\ {\rm E}\left[(Y_{0}-\mu_{w0})e(1-e)(1-2e)\boldsymbol{X}^{\top}\right]\end{array}\right)=\left(\begin{array}[]{c}\boldsymbol{\beta}_{x1}^{\top}{\rm E}\left[e(1-e)(1-2e)\boldsymbol{X}^{\otimes 2}\right]\\ \boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e(1-e)(1-2e)\boldsymbol{X}^{\otimes 2}\right]\end{array}\right),\\
B_{12}&=\left(\begin{array}[]{c}b_{1}^{\top}\\ b_{2}^{\top}\end{array}\right)=\left(\begin{array}[]{c}{\rm E}\left[(Y_{1}-\mu_{w1})e(1-e)^{2}\boldsymbol{X}^{\top}\right]\\ -{\rm E}\left[(Y_{0}-\mu_{w0})e^{2}(1-e)\boldsymbol{X}^{\top}\right]\end{array}\right)=\left(\begin{array}[]{c}\boldsymbol{\beta}_{x1}^{\top}{\rm E}\left[e(1-e)^{2}\boldsymbol{X}^{\otimes 2}\right]\\ -\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]\end{array}\right).
\end{aligned}

Through a calculation similar to that for the ATT (3.3), the second and third terms of (2.3) become

\begin{aligned}
&(1,-1)\left(\delta A_{11}^{-1}\delta^{\top}-B_{12}A_{11}^{-1}B_{12}^{\top}\right)\left(\begin{array}[]{c}1\\ -1\end{array}\right)\\
&=\left(\boldsymbol{\beta}_{x0}-2\boldsymbol{\beta}_{x1}\right)^{\top}{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}-2\left(\boldsymbol{\beta}_{x1}-2\boldsymbol{\beta}_{x0}\right)^{\top}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]\left(\boldsymbol{\beta}_{x1}-\boldsymbol{\beta}_{x0}\right)\\
&\quad+3\left(\boldsymbol{\beta}_{x1}-\boldsymbol{\beta}_{x0}\right)^{\top}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]\left(\boldsymbol{\beta}_{x1}-\boldsymbol{\beta}_{x0}\right).
\end{aligned}
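For readers who wish to follow the intermediate steps of the ATO calculation, one possible route is sketched below. The shorthand $M_{1}$, $M_{2}$, and $\Delta$ is introduced here only for illustration and does not appear in the original text; $\delta_{t}$ and $b_{t}$ are treated as column vectors.

```latex
% Shorthand (illustrative only):
%   M_1 = E[e(1-e) X^{\otimes 2}] = A_{11},   M_2 = E[e^2(1-e) X^{\otimes 2}],
%   \Delta = \beta_{x1} - \beta_{x0}.
% Note that E[e(1-e)(1-2e) X^{\otimes 2}] = M_1 - 2 M_2 and E[e(1-e)^2 X^{\otimes 2}] = M_1 - M_2.
\begin{aligned}
\delta_{1}-\delta_{2} &= (M_{1}-2M_{2})\Delta,
\qquad
b_{1}-b_{2} = (M_{1}-M_{2})\boldsymbol{\beta}_{x1}+M_{2}\boldsymbol{\beta}_{x0}
            = M_{1}\boldsymbol{\beta}_{x1}-M_{2}\Delta,\\
(\delta_{1}-\delta_{2})^{\top}M_{1}^{-1}(\delta_{1}-\delta_{2})
  &= \Delta^{\top}M_{1}\Delta-4\Delta^{\top}M_{2}\Delta
     +4\Delta^{\top}M_{2}M_{1}^{-1}M_{2}\Delta,\\
(b_{1}-b_{2})^{\top}M_{1}^{-1}(b_{1}-b_{2})
  &= \boldsymbol{\beta}_{x1}^{\top}M_{1}\boldsymbol{\beta}_{x1}
     -2\boldsymbol{\beta}_{x1}^{\top}M_{2}\Delta
     +\Delta^{\top}M_{2}M_{1}^{-1}M_{2}\Delta,\\
\Delta^{\top}M_{1}\Delta-\boldsymbol{\beta}_{x1}^{\top}M_{1}\boldsymbol{\beta}_{x1}
  &= (\boldsymbol{\beta}_{x0}-2\boldsymbol{\beta}_{x1})^{\top}M_{1}\boldsymbol{\beta}_{x0},
\qquad
-4\Delta^{\top}M_{2}\Delta+2\boldsymbol{\beta}_{x1}^{\top}M_{2}\Delta
   = -2(\boldsymbol{\beta}_{x1}-2\boldsymbol{\beta}_{x0})^{\top}M_{2}\Delta.
\end{aligned}
```

Subtracting the third line from the second and applying the two identities in the last line gives the three terms in the display above, with coefficient $4-1=3$ on $\Delta^{\top}M_{2}M_{1}^{-1}M_{2}\Delta$. When $\Delta=\boldsymbol{0}$, only $-\boldsymbol{\beta}_{x0}^{\top}M_{1}\boldsymbol{\beta}_{x0}$ remains.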

From the result, the following relationship can be proved.

Theorem 5.

For the ATO, when treatment effects are homogeneous (i.e., $\boldsymbol{\beta}_{x1}=\boldsymbol{\beta}_{x0}$ ), the second and third terms of (2.3) are strictly negative. When there is only constant heterogeneity (i.e., $\boldsymbol{\beta}_{x1}=\gamma\boldsymbol{\beta}_{x0}$ ), a sufficient condition for the second and third terms of (2.3) to be strictly negative is the existence of a value $\gamma>1$ such that

\displaystyle\frac{2(\gamma-2)}{3(\gamma-1)}{\rm I}-{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]>O.

(C.2)

Proof.

Since the former statement is obvious, we will only prove the latter statement. When $\boldsymbol{\beta}_{x1}=\gamma\boldsymbol{\beta}_{x0}$ ,

\begin{aligned}
&(1-2\gamma)\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}-2(\gamma-1)(\gamma-2)\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}\\
&\quad+3(\gamma-1)^{2}\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}\\
&=(1-2\gamma)\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]\boldsymbol{\beta}_{x0}+(\gamma-1)\boldsymbol{\beta}_{x0}^{\top}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}\\
&\quad\times\left\{-2(\gamma-2){\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]+3(\gamma-1){\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]\right\}\boldsymbol{\beta}_{x0}.
\end{aligned}

When $\gamma\geq 0.5$ , the first term is negative definite. When $\gamma>1$ , the second term is negative definite under the condition (C.2). When $1>\gamma\geq 0.5$ the second term is negative definite under the condition

\frac{2(\gamma-2)}{3(\gamma-1)}{\rm I}-{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]<O.

However, from the relationship

\displaystyle{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]={\rm Pr}(T=1){\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}|T=1\right]<{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right],

(C.3)

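the matrix ${\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]$ is smaller than ${\rm I}$ , whereas $\frac{2(\gamma-2)}{3(\gamma-1)}\geq 2$ for $1>\gamma\geq 0.5$ . Hence, there are no situations in which this condition can be satisfied. ∎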

Note that when $\gamma<0.5$ , simple sufficient conditions cannot be derived because the first term of the above formula is positive definite. This implies that when there is a small or negative interaction effect between the treatment and all confounders, the simple CI may not yield a conservative CI. When $\gamma>1$ , the relatively straightforward condition (C.2) can be derived; however, its interpretation remains challenging.

To address this, we use the relationship (C.3). When ${\rm Pr}(T=1)\approx 0$ , it is expected that ${\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]\approx O$ . In this situation, (C.2) does not hold for $2\geq\gamma>1$ , whereas it holds for $\gamma>2$ . This means that if there are only a few members in the treatment group, and there are large interaction effects between the treatment and all confounders, the simple CI yields a conservative CI. When ${\rm Pr}(T=1)\approx 1$ , it is expected that ${\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]^{-1}{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]\approx{\rm I}$ . In this situation, the left-hand side of (C.2) is approximately

\left(\frac{2(\gamma-2)}{3(\gamma-1)}-1\right){\rm I}=\frac{-1-\gamma}{3(\gamma-1)}{\rm I}.

Since this quantity is negative for every $\gamma>1$ , (C.2) is not satisfied. Note that when ${\rm Pr}(T=1)\approx 0$ and $\gamma>0.5$ , (C.2) may not always hold, as discussed above; however, the first term of (2.3) dominates. Therefore, the simple CI also yields a conservative CI in this situation.
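The two limiting regimes discussed above can be illustrated numerically. The following sketch evaluates the scalar coefficient in (C.2) and checks the condition through the equivalent symmetric form $\frac{2(\gamma-2)}{3(\gamma-1)}{\rm E}\left[e(1-e)\boldsymbol{X}^{\otimes 2}\right]-{\rm E}\left[e^{2}(1-e)\boldsymbol{X}^{\otimes 2}\right]>O$. The function names, the logistic designs, and the intercept values $\pm 4$ are assumptions chosen only to mimic ${\rm Pr}(T=1)\approx 0$ and ${\rm Pr}(T=1)\approx 1$.

```python
import numpy as np


def c2_coefficient(gamma):
    """Scalar multiplier of the identity matrix in (C.2); defined for gamma != 1."""
    return 2.0 * (gamma - 2.0) / (3.0 * (gamma - 1.0))


def ato_condition_holds(X, e, gamma):
    """Check (C.2) through k * E[e(1-e) X X^T] - E[e^2(1-e) X X^T] > O."""
    n = X.shape[0]
    M1 = (X * (e * (1 - e))[:, None]).T @ X / n
    M2 = (X * (e ** 2 * (1 - e))[:, None]).T @ X / n
    S = c2_coefficient(gamma) * M1 - M2
    return bool(np.linalg.eigvalsh(S).min() > 0)


rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Hypothetical propensity scores for a rarely treated and a mostly treated population.
e_rare = 1.0 / (1.0 + np.exp(-(-4.0 + 0.5 * x)))
e_common = 1.0 / (1.0 + np.exp(-(4.0 + 0.5 * x)))

rare_large_gamma = ato_condition_holds(X, e_rare, gamma=3.0)
rare_small_gamma = ato_condition_holds(X, e_rare, gamma=1.5)
common_large_gamma = ato_condition_holds(X, e_common, gamma=3.0)
```

In these simulated designs the condition holds only in the rare-treatment case with $\gamma>2$, in line with the discussion above; the coefficient is non-positive for $2\geq\gamma>1$ and always below one for $\gamma>1$.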

Summarizing the discussions above, for the ATO, there is no universal scenario where the simple CI is applicable; the exact CI is more appropriate. However, in certain situations, such as when there are homogeneous treatment effects, when there exist significant treatment-confounder interactions, or when there are many members in the control group, the simple CI might work effectively.