Outlier-Resistant Estimators for Average Treatment Effect in Causal Inference

Kazuharu Harada Tokyo Medical University, Japan. e-mail: [email protected]    Hironori Fujisawa The Institute of Statistical Mathematics, Japan. e-mail: [email protected]
Abstract

The inverse probability weighting (IPW) and doubly robust (DR) estimators are often used to estimate the average treatment effect (ATE), but they are vulnerable to outliers. The IPW/DR median can be used for outlier-resistant estimation of the ATE, but the outlier resistance of the median is limited, and it is not resistant enough under heavy contamination. We propose extensions of the IPW/DR estimators with density power weighting, which can eliminate the influence of outliers almost completely. The outlier resistance of the proposed estimators is evaluated through the unbiasedness of the estimating equations. Unlike the median-based methods, our estimators are resistant to outliers even under heavy contamination. Interestingly, the naive extension of the DR estimator requires bias correction to keep the double robustness even under the most tractable form of contamination. In addition, the proposed estimators are found to be highly resistant to outliers in more difficult settings where the contamination ratio depends on the covariates. The outlier resistance of our estimators from the viewpoint of the influence function is also favorable. Our theoretical results are verified via Monte Carlo simulations and real data analysis. The proposed methods were found to be more outlier-resistant than the median-based methods and to estimate the potential mean with a smaller error.

1 Introduction

Statistical causal inference provides various estimators for causal quantities like the average treatment effect (ATE). To estimate such quantities, the propensity score is widely applied in various ways, such as stratification, matching, inverse probability weighting (IPW), and the doubly robust (DR) estimator (Robins, Rotnitzky, and Zhao, 1994; Rosenbaum and Rubin, 1983; Bang and Robins, 2005). These estimators are designed to control confounding, and they are consistent with the target quantity under some assumptions.

As discussed in a later section, the IPW and DR estimators are vulnerable to outliers since they partially use the sample mean. Outliers in a multivariate setting are classified into three types: vertical outliers, good leverage points, and bad leverage points (Rousseeuw and van Zomeren, 1990). Figure 1 illustrates the three types of outliers. Canavire-Bacarreza et al. (2021) investigated the influence of these types of outliers on estimators of the ATE, including IPW, through exhaustive Monte Carlo simulations; they pointed out that vertical outliers in the outcome variable lead to a serious bias in the ATE estimation. In this paper, we are interested in reducing the bias caused by vertical outliers.

Figure 1: Three types of outliers.

Outlier-resistant statistics have been studied for a long time; however, most of the literature does not consider a causal setting (Huber, 2004; Hampel et al., 2011; Maronna et al., 2019). The established methods of outlier-resistant statistics are not directly applicable to causal settings. The median-based estimators are the only examples that are applicable to the estimation of the ATE under outlier contamination (Firpo, 2007; Zhang et al., 2012; Díaz, 2017; Sued, Valdora, and Yohai, 2020). It is well known that the sample median is more resistant to outliers than the sample mean, but it is still affected; in particular, when the contamination ratio is not small and the outliers lie on one side of the data-generating density, the influence becomes so large that it cannot be ignored (Fujisawa and Eguchi, 2008).

In this paper, we propose extensions of the IPW and DR estimators for the mean of the potential outcome with more outlier resistance than the median-based methods. We discuss the outlier resistance of these estimators from the viewpoint of the unbiasedness of the estimating equation and the influence function (IF). In most of the literature on outlier-resistant statistics, the contamination ratio is assumed to be small and independent of covariates; however, we discuss the outlier resistance of the proposed estimators under more general assumptions, including the case where the contamination ratio is not small and depends on covariates. Interestingly, a straightforward extension of the DR estimator loses the robustness to model misspecification under contamination. We therefore also propose a bias-corrected version of the extended DR estimator, which retains the double robustness under contamination. Furthermore, the theoretical advantages of our estimators are verified through Monte Carlo simulations and real data analysis.

The remainder of this paper is organized as follows. In Section 2, we introduce the potential outcome framework for causal inference and the basic concept of outliers. In Section 3, we propose novel estimators and discuss the outlier resistance from the viewpoint of the unbiasedness of the estimating equations. In Section 4, we evaluate the outlier resistance in terms of the IF. In Section 5, we discuss asymptotic properties. Finally, in Sections 6 and 7, we present the experimental results.

2 Preliminaries

2.1 Potential Outcome and Treatment Effect

Let $(Y,T,X)$ be the observable random variables: $Y$ is the outcome, $T$ is the treatment, and $X$ is the confounder. We assume that $Y$ is continuous and $T$ is binary; it is easy to extend $T$ to multiple discrete treatments. We have the observations $(Y_{i},T_{i},X_{i})_{i=1}^{n}$ drawn from the distribution of $(Y,T,X)$ in an i.i.d. manner. Denote the potential outcome under $T=t$ by $Y^{(t)}$ and let $\mu^{(t)}=\mathbb{E}[Y^{(t)}]$. $Y^{(t)}$ is uniquely defined for every treatment as a random variable; that is, it is well-defined. Note that i.i.d. sampling and well-definedness of the potential outcome are collectively called the stable unit treatment value assumption (SUTVA; Imbens and Rubin, 2015). The ATE is defined as $\mu^{(1)}-\mu^{(0)}$. The ATE cannot be estimated directly since we cannot observe $Y^{(1)}$ and $Y^{(0)}$ simultaneously; instead, we use the observed variables under the common assumptions (e.g. Imbens and Rubin, 2015):

  1. Conditional Unconfoundedness: $Y^{(t)}\perp\!\!\!\perp T|X$ for all $t\in\{0,1\}$,

  2. Consistency: $Y=Y^{(t)}$ if $T=t$,

  3. Positivity: $P(T=1|X)>c$ for some constant $c>0$.

Under these assumptions, the ATE can be estimated from the observed variables; that is, it is identifiable. Hereafter, we assume that the triple assumption holds and focus on the estimation of $\mu^{(1)}$ for simplicity. $\mu^{(0)}$ is estimated in a similar way, and the ATE is estimated by the difference between the estimates of $\mu^{(1)}$ and $\mu^{(0)}$.

We introduce three consistent estimators of the potential mean. The IPW estimator (Rosenbaum and Rubin, 1983) is based on the propensity score (PS). Let $\pi(x;\alpha)\in(0,1)$ be the PS, which models $P(T=1|x)$. We assume the PS is correctly specified; in other words, there exists $\alpha^{*}$ such that $\pi(x;\alpha^{*})=P(T=1|x)$ for every $x$. The IPW estimator has several forms (Lunceford and Davidian, 2004), but we use the weighted average form: $\hat{\mu}^{(1)}_{IPW}=\left.\left(\sum_{i=1}^{n}T_{i}Y_{i}/\pi(X_{i};\hat{\alpha})\right)\right/\left(\sum_{i=1}^{n}T_{i}/\pi(X_{i};\hat{\alpha})\right)$, where $\hat{\alpha}$ is an estimate of $\alpha$ obtained by a consistent method such as maximum likelihood estimation (MLE). The IPW estimator is consistent with $\mu^{(1)}$ if the PS model is correctly specified. The IPW estimator can be seen as the root of the following estimating equation:

$$\sum_{i=1}^{n}\frac{T_{i}}{\pi(X_{i};\hat{\alpha})}(Y_{i}-\mu)=0. \tag{1}$$
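As a minimal sketch of the weighted-average form, the snippet below solves equation (1) in closed form on simulated data. The data-generating process, the function name `ipw_mean`, and the use of the true PS in place of an estimate are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def ipw_mean(y, t, ps):
    """Weighted-average IPW estimate of mu^(1), i.e. the root of equation (1)."""
    w = t / ps                               # T_i / pi(X_i; alpha_hat)
    return np.sum(w * y) / np.sum(w)

# Hypothetical toy data: X confounds both the treatment and the outcome.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
ps_true = 1.0 / (1.0 + np.exp(-x))           # P(T=1|X), assumed known here
t = rng.binomial(1, ps_true)
# Y^(1) has mean 2. Values for T=0 units receive zero weight, so they are irrelevant.
y = 2.0 + x + rng.normal(size=n)

mu_ipw = ipw_mean(y, t, ps_true)             # adjusts for confounding
mu_naive = y[t == 1].mean()                  # treated-only mean, ignores confounding
print(mu_ipw, mu_naive)
```

In this setup the treated-only mean is pulled upward because units with large $X$ are more likely to be treated, whereas the IPW estimate stays close to the assumed value $\mu^{(1)}=2$.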

Outcome regression (OR) is also popular. To construct the OR estimator, we model $\mathbb{E}[Y|T=1,X]$ by some function $m_{1}(X;\beta)$. Then, the OR estimator is obtained as $n^{-1}\sum_{i=1}^{n}m_{1}(X_{i};\hat{\beta})$, where $\hat{\beta}$ is a consistent estimate of $\beta$. The IPW and OR estimators are consistent with $\mu^{(1)}$ when the model used in each estimator is correctly specified; the consistency is not assured if the model is misspecified. The DR estimator (Scharfstein, Rotnitzky, and Robins, 1999; Bang and Robins, 2005) combines the IPW and OR estimators. Since the DR estimator is consistent with $\mu^{(1)}$ if either the PS or OR model is correctly specified, it is said to be “doubly robust.” Besides, if both models are correctly specified, the DR estimator is semiparametrically efficient (Robins and Rotnitzky, 1995; Tsiatis, 2006). Although there are many estimators equipped with double robustness, we refer to the root of the following estimating equation as the DR estimator $\hat{\mu}^{(1)}_{DR}$, which is a special case of the augmented IPW estimator:

$$\sum_{i=1}^{n}\left[\frac{T_{i}}{\pi(X_{i};\hat{\alpha})}(Y_{i}-\mu)-\frac{T_{i}-\pi(X_{i};\hat{\alpha})}{\pi(X_{i};\hat{\alpha})}\{m_{1}(X_{i};\hat{\beta})-\mu\}\right]=0. \tag{2}$$
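Equation (2) is linear in $\mu$, and the coefficients of $\mu$ sum to $n$, so its root has a closed form. The sketch below computes that root and checks double robustness in one direction, with a deliberately misspecified constant PS and a correct OR model. The simulated data and the names `dr_mean`, `ps_wrong`, and `m1_true` are illustrative assumptions.

```python
import numpy as np

def dr_mean(y, t, ps, m1):
    """Closed-form root of the DR estimating equation (2).

    Collecting the terms in mu gives sum_i [T_i/pi_i - (T_i - pi_i)/pi_i] = n,
    so the root is the average of T_i Y_i / pi_i - (T_i - pi_i) / pi_i * m1_i.
    """
    return np.mean(t * y / ps - (t - ps) / ps * m1)

# Hypothetical toy data with true mu^(1) = 2.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
y = 2.0 + x + rng.normal(size=n)

m1_true = 2.0 + x                  # correctly specified OR model
ps_wrong = np.full(n, 0.5)         # deliberately misspecified PS model
mu_dr = dr_mean(y, t, ps_wrong, m1_true)

# Residual of equation (2) at the closed-form root; zero up to rounding error.
residual = np.sum(t / ps_wrong * (y - mu_dr)
                  - (t - ps_wrong) / ps_wrong * (m1_true - mu_dr))
print(mu_dr, residual)
```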

2.2 IPW/DR M-estimators

Let $\sum_{i=1}^{n}\psi(Y_{i},\theta)=0$ be an estimating equation, where $\psi$ is a known vector-valued map and $\theta$ is the parameter of interest. An estimator $\hat{\theta}$ solving the estimating equation is called an M-estimator. M-estimators form a large class of estimators that includes MLE, IPW, OR, and DR. If the estimating equation is unbiased, that is, $\mathbb{E}_{\theta}[\psi(Y,\theta)]=0$, the M-estimator is consistent with the truth under some regularity conditions (e.g. Chap. 5 of Van der Vaart, 2000).

By replacing $Y_{i}-\mu$ in (2) with an estimating function $\psi(Y_{i};\theta)$, the IPW and DR estimators can be extended to general M-estimators. If we are interested in the same parameter $\theta$ with respect to $Y^{(1)}$, the IPW and DR M-estimators (Tsiatis, 2006) are available:

$$\sum_{i=1}^{n}\frac{T_{i}}{\pi(X_{i};\hat{\alpha})}\psi(Y_{i};\theta)=0, \tag{3}$$

$$\sum_{i=1}^{n}\left[\frac{T_{i}}{\pi(X_{i};\hat{\alpha})}\psi(Y_{i};\theta)-\frac{T_{i}-\pi(X_{i};\hat{\alpha})}{\pi(X_{i};\hat{\alpha})}\mathbb{E}_{\hat{q}}[\psi(Y_{i};\theta)|T=1,X_{i}]\right]=0. \tag{4}$$

The conditional expectation $\mathbb{E}_{\hat{q}}[\psi(Y_{i};\theta)|T=1,X_{i}]$ is calculated using the parametric OR model $q(y|T=1,x;\hat{\beta})$ via direct calculation or Monte Carlo approximation (Hoshino, 2007). When the original M-estimating equation is unbiased, the IPW/DR estimating equations are unbiased under the proper model specification. The asymptotic properties of the IPW and DR M-estimators follow from the standard theory of M-estimators.

2.3 Outlier-resistant Estimation

This section provides a brief review of the outlier-resistant estimation of the mean in a one-variable and non-causal setting. Let $\tilde{g}$ be the density function of a random variable $Z\in\mathbb{R}$. Assume that the density is contaminated as $\tilde{g}(z)=(1-\varepsilon)f_{\mu^{*}}(z)+\varepsilon\delta(z)$, where $f_{\mu^{*}}$ is the density of $Z$ without contamination, with mean $\mu^{*}$, $\varepsilon$ is the contamination ratio, and $\delta$ is the density of outliers. Our goal is to estimate $\mu^{*}$ from i.i.d. observations $\{Z_{1},...,Z_{n}\}$ drawn from $\tilde{g}$. If we model the contamination in this way, the sample mean converges to $(1-\varepsilon)\mu^{*}+\varepsilon\mathbb{E}_{\delta}[Z]$; if the mean of outliers is far from $\mu^{*}$, the sample mean is asymptotically biased. To deal with contamination, many types of M-estimators are applied. The unbiasedness of the estimating equation does not usually hold under contamination because

$$\mathbb{E}_{\tilde{g}}[\psi(Z,\mu^{*})]=(1-\varepsilon)\underbrace{\mathbb{E}_{f_{\mu^{*}}}[\psi(Z,\mu^{*})]}_{=0}+\varepsilon\mathbb{E}_{\delta}[\psi(Z,\mu^{*})]\neq 0. \tag{5}$$

By designing $\psi$ to eliminate or bound $\mathbb{E}_{\delta}[\psi(Z,\mu^{*})]$, the influence of outliers can be reduced. Let $\theta_{\psi}^{*}$ denote a root of $\mathbb{E}_{\tilde{g}}[\psi(Z,\theta)]=0$; then, the latent bias is defined as $\theta_{\psi}^{*}-\theta^{*}$. If $\delta$ is Dirac’s delta and $\varepsilon$ is sufficiently small, the latent bias is approximated by the IF. The IF-based discussion in Section 4 provides some insights into the outlier resistance of the estimators when the contamination ratio is small. The latent bias and M-estimators are discussed in detail elsewhere (e.g. Huber, 2004; Fujisawa, 2013; Fujisawa and Eguchi, 2008).
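The contamination model of this section can be illustrated numerically. The sketch below draws from $\tilde{g}=(1-\varepsilon)f_{\mu^{*}}+\varepsilon\delta$ with Gaussian $f_{\mu^{*}}$ and Gaussian $\delta$ (both choices, and all parameter values, are assumptions for illustration) and compares the sample mean with its limit $(1-\varepsilon)\mu^{*}+\varepsilon\mathbb{E}_{\delta}[Z]$. The sample median is included to show that one-sided contamination shifts it as well, although far less.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 100_000, 0.2
mu_star, outlier_mean = 0.0, 10.0            # hypothetical values

is_outlier = rng.random(n) < eps
z = np.where(is_outlier,
             rng.normal(outlier_mean, 1.0, size=n),   # draws from delta
             rng.normal(mu_star, 1.0, size=n))        # draws from f_{mu*}

limit = (1 - eps) * mu_star + eps * outlier_mean      # limit of the sample mean
sample_mean = z.mean()
sample_median = np.median(z)
print(limit, sample_mean, sample_median)
```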

2.4 IPW and DR Under Contamination

Next, we move to a causal setting. In this paper, we consider vertical outliers. In other words, we assume that only the outcome $Y$ may be contaminated, and that the contamination does not affect the causal mechanism among $(Y,T,X)$. A typical example in medical research is laboratory values contaminated by foreign substances. Let $\delta_{Y|TX}$ be the conditional density of outliers given $(T,X)$, and let $\varepsilon_{t}(x)$ be the contamination ratio. Then, the contaminated conditional density given $(T,X)$ is defined as

$$\tilde{g}_{Y|TX}(y|t,x)=(1-\varepsilon_{t}(x))g_{Y|TX}(y|t,x)+\varepsilon_{t}(x)\delta_{Y|TX}(y|t,x), \tag{6}$$

where $g$ without a tilde denotes the density without contamination; the tilde indicates that the distribution is contaminated. To simplify the notation, we often drop the subscripts of density functions as long as there is no risk of confusion, and we write $\delta_{t}(y|x)=\delta_{Y|TX}(y|t,x)$ below. The contamination ratio and the outlier density depend on the treatment $T$ and the confounder $X$. Since we estimate $\mu^{(t)}$ for each treatment separately, the dependence on $T$ is tractable. In contrast, the dependence on $X$ is critical in our analysis. The $X$-dependent contamination is referred to as heterogeneous contamination. We also discuss the special case in which $\varepsilon$ and $\delta$ do not depend on $X$, called homogeneous contamination. Note that we do not assume $\varepsilon_{t}(x)$ to be small enough to be negligible, except in Section 4.

We are interested in the marginal mean of $Y^{(1)}$. Let $f_{Y^{(1)}}(y;\mu^{(1)})$ be the true marginal density of $Y^{(1)}$. It is obtained by integrating $X$ out from $g_{Y|TX}(y|T,X)$ under $T=1$:

$$f_{Y^{(1)}}(y;\mu^{(1)})=\int g_{Y^{(1)}|X}(y|x)g_{X}(x)dx=\int g_{Y|TX}(y|1,x)g_{X}(x)dx. \tag{7}$$

The second equality holds from the triple assumption in Section 2.1. We often write $f_{Y^{(1)}}(y;\mu^{(1)})$ as $f_{1}(y)$ for simplicity.

Under contamination, the IPW estimating equation is severely biased even if the true PS is obtained as $\pi(X|\alpha^{*})=P(T=1|X)$:

$$\mathbb{E}_{\tilde{g}}\left[\frac{T}{\pi(X|\alpha^{*})}(Y-\mu^{(1)})\right]=\mathbb{E}_{g}\left[\varepsilon_{1}(X)\mathbb{E}_{-g+\delta}\left[(Y-\mu^{(1)})|X\right]\right]\neq 0. \tag{8}$$

It is found that the remaining term contains the expectation of $Y$ with respect to $\delta$, which implies the estimating equation is severely affected by outliers. The DR estimating equation is also biased. To estimate $\mu^{(1)}$ accurately, we have to remove the influence of contamination.
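The intermediate steps behind (8) can be sketched as follows. The derivation uses only the contamination model (6), the triple assumption of Section 2.1, and the true PS; $\mathbb{E}_{g}[\cdot|T=1,X]$ and $\mathbb{E}_{\delta}[\cdot|T=1,X]$ denote conditional expectations under the clean and outlier densities, respectively.

$$
\begin{aligned}
\mathbb{E}_{\tilde{g}}\left[\frac{T}{\pi(X|\alpha^{*})}(Y-\mu^{(1)})\right]
&=\mathbb{E}_{g}\left[\mathbb{E}_{\tilde{g}}\left[Y-\mu^{(1)}\,\middle|\,T=1,X\right]\right]\\
&=\mathbb{E}_{g}\left[(1-\varepsilon_{1}(X))\,\mathbb{E}_{g}[Y-\mu^{(1)}|T=1,X]+\varepsilon_{1}(X)\,\mathbb{E}_{\delta}[Y-\mu^{(1)}|T=1,X]\right]\\
&=\underbrace{\mathbb{E}_{g}[Y^{(1)}-\mu^{(1)}]}_{=0}+\mathbb{E}_{g}\left[\varepsilon_{1}(X)\left\{\mathbb{E}_{\delta}[Y-\mu^{(1)}|T=1,X]-\mathbb{E}_{g}[Y-\mu^{(1)}|T=1,X]\right\}\right].
\end{aligned}
$$

The first equality holds because $\mathbb{E}[T|X]=\pi(X|\alpha^{*})$, the second applies (6) with $t=1$, and the third uses conditional unconfoundedness and consistency. The difference in braces is what (8) abbreviates as $\mathbb{E}_{-g+\delta}$; it grows with the distance of the outliers from $\mu^{(1)}$.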

3 Outlier-Resistant Extensions of IPW and DR

Before we propose novel estimators, we introduce an assumption on outliers. Intuitively, we assume that the outliers are sufficiently far from the main outcome density. Figure 2 shows real examples of outliers that satisfy this assumption. It is found that these outliers are far from the main body of the density both conditionally and marginally.

Figure 2: Real examples of outliers which satisfy Assumption 1. All datasets are included in the R package “robustbase” (Maechler et al., 2021): airmay (left), condroz (center), education (right).

To formulate this assumption, we introduce density power weighting. The density power weight is used to enhance the outlier resistance in non-causal settings (Windham, 1995; Basu et al., 1998; Jones et al., 2001; Fujisawa and Eguchi, 2008). Let $h(y;\mu)^{\gamma}~(\gamma>0)$ be a density power weight for $Y^{(1)}$, where $h(y;\mu)$ is a symmetric density function with the location parameter $\mu$. The density $h(y;\mu^{(1)})$ is not necessarily equal to the true marginal density $f_{1}(y)$, but we assume that both $h$ and the true density $f_{1}(y)$ are symmetric about $\mu^{(1)}$. The assumption of symmetry is common in outlier-resistant estimation, and it is a prerequisite to use the sample median as an estimator of the population mean. Any symmetric density is available for $h$ as long as it satisfies Assumption 1 below. Typically, we assume $h$ is Gaussian. The tuning parameter $\gamma$ controls the variability of the weight; this leads to the trade-off between outlier resistance and asymptotic efficiency. Assumption 1 formally describes the assumption on outliers.

Assumption 1.

Let $h(y;\mu)$ be a weighting density symmetric about $\mu$. Then, there exists $\gamma>0$ such that

$$\xi_{1}(X,\gamma)=\int\delta_{1}(y|X)h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})dy\approx 0\quad a.e. \tag{9}$$

Denote an arbitrary bounded function by $\phi(x)$. Assumption 1 implies

$$\nu_{1}(\phi):=\mathbb{E}[\phi(X)\xi_{1}(X,\gamma)]=\int\phi(x)\xi_{1}(x,\gamma)g_{X}(x)dx\approx 0. \tag{10}$$

In particular, let $\phi(x)=1$; then, the outliers are marginally negligible:

$$\nu_{1}(1)=\mathbb{E}[\xi_{1}(X,\gamma)]=\int\delta_{1}(y)h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})dy\approx 0. \tag{11}$$

Throughout this paper, we assume that $\gamma$ is sufficiently large so that Assumption 1 holds. Assumption 1 is reduced to a simpler form when $\delta_{1}(y|X)$ is Dirac’s delta at $y_{0}$; this is one of the core assumptions in Section 4.

Assumption 1’.

Let $h(y;\mu)$ be a weighting density that is symmetric about $\mu$, and assume that the density of outliers is Dirac’s delta at $y_{0}~(\neq\mu^{(1)})$, say $\delta_{y_{0}}(y)$. Then, there exists $\gamma>0$ such that

$$\int\delta_{y_{0}}(y)h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})dy=h(y_{0};\mu^{(1)})^{\gamma}(y_{0}-\mu^{(1)})\approx 0. \tag{12}$$

For example, if $h$ is a Gaussian density with mean $\mu^{(1)}$ and fixed variance $\sigma^{2}$, (12) tends to $0$ as $|y_{0}|\rightarrow\infty$ for fixed $\gamma>0$ since $h(y_{0};\mu^{(1)})^{\gamma}(y_{0}-\mu^{(1)})\propto\exp(-\gamma(y_{0}-\mu^{(1)})^{2}/(2\sigma^{2}))(y_{0}-\mu^{(1)})$.
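The Gaussian example can be checked numerically. The sketch below evaluates $h(y_{0};\mu^{(1)})^{\gamma}(y_{0}-\mu^{(1)})$ for increasingly distant $y_{0}$; the function names, the choice $\gamma=0.5$, and the unit variance are assumptions for illustration only.

```python
import numpy as np

def gaussian_density(y, mu, sigma=1.0):
    """Gaussian density h(y; mu) with fixed standard deviation sigma."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def dp_psi(y, mu, gamma, sigma=1.0):
    """Density power estimating function h(y; mu)^gamma * (y - mu)."""
    return gaussian_density(y, mu, sigma) ** gamma * (y - mu)

mu1, gamma = 0.0, 0.5                        # hypothetical values
y0 = np.array([1.0, 3.0, 5.0, 10.0, 20.0])   # outlier locations, moving away from mu1
values = dp_psi(y0, mu1, gamma)
for loc, val in zip(y0, values):
    print(f"y0 = {loc:5.1f}  h^gamma * (y0 - mu) = {val:.3e}")
```

With a Gaussian $h$, the function $|y_{0}-\mu^{(1)}|\mapsto h^{\gamma}\,|y_{0}-\mu^{(1)}|$ peaks at $\sigma/\sqrt{\gamma}$ and decays rapidly beyond that point, so a larger $\gamma$ downweights moderately distant outliers more aggressively.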

3.1 IPW-type Estimator

First, we introduce an extension of the IPW estimator, called the density power inverse probability weighting (DP-IPW) estimator. The DP-IPW estimator is defined as a root of the following estimating equation:

$$\sum_{i=1}^{n}\frac{T_{i}}{\pi(X_{i};\hat{\alpha})}h(Y_{i};\mu)^{\gamma}(Y_{i}-\mu)=0. \tag{13}$$
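One way to solve (13) numerically is to rewrite it as $\mu=\sum_{i}w_{i}(\mu)Y_{i}/\sum_{i}w_{i}(\mu)$ with $w_{i}(\mu)=T_{i}h(Y_{i};\mu)^{\gamma}/\pi(X_{i};\hat{\alpha})$ and iterate. The sketch below does this for a Gaussian $h$ with fixed standard deviation. The fixed-point scheme, the median starting value, the simulated data, and the name `dp_ipw_mean` are assumptions made here for illustration; the source does not specify a solver.

```python
import numpy as np

def dp_ipw_mean(y, t, ps, gamma=0.5, sigma=1.0, n_iter=200, tol=1e-10):
    """Fixed-point iteration for the DP-IPW estimating equation (13), Gaussian h."""
    mu = np.median(y[t == 1])                # outlier-resistant starting value
    for _ in range(n_iter):
        # h(Y_i; mu)^gamma up to a constant factor, which cancels in the ratio.
        w = (t / ps) * np.exp(-0.5 * gamma * ((y - mu) / sigma) ** 2)
        mu_new = np.sum(w * y) / np.sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

# Hypothetical data: true mu^(1) = 2, homogeneous contamination with ratio 0.2.
rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-x))                # true PS, assumed known
t = rng.binomial(1, ps)
y_clean = 2.0 + x + rng.normal(size=n)
outlier = rng.random(n) < 0.2
y = np.where(outlier, rng.normal(20.0, 1.0, size=n), y_clean)

mu_dp = dp_ipw_mean(y, t, ps)                        # density power weighted
mu_ipw = np.sum(t / ps * y) / np.sum(t / ps)         # ordinary IPW, for comparison
print(mu_dp, mu_ipw)
```

Because the estimating function is redescending, the equation can have more than one root under contamination, so the starting value matters; this sketch starts from the median of the treated outcomes.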

Under no contamination, the DP-IPW estimating equation is unbiased.

Theorem 1.

Assume that the true propensity score $\pi(X;\alpha^{*})$ is given. Then, under no contamination, we have

$$\mathbb{E}_{g}\left[\frac{T}{\pi(X;\alpha^{*})}h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right]=0. \tag{14}$$

Only an estimate $\pi(X;\hat{\alpha})$ is available in practice, but the asymptotic consistency of (DP-)IPW still holds if the model $\pi(X;\alpha)$ is correctly specified.

Now we consider the contaminated case. The bias of the DP-IPW estimating equation takes a different form from (8).

Theorem 2.

Assume $Y$ is contaminated as (6). Under the same assumptions as those in Theorem 1, the expectation of the DP-IPW estimating equation is expressed as

$$\mathbb{E}_{\tilde{g}}\left[\frac{T}{\pi(X;\alpha^{*})}h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right]=B_{1}+\nu_{1}(\varepsilon_{1}), \tag{15}$$

$$\text{where}\quad B_{1}=-\int\varepsilon_{1}(x)\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})g(y|x)dy~g(x)dx.$$

In particular, under homogeneous contamination, $B_{1}$ reduces to $0$.

The DP-IPW estimating equation is still biased even if $\nu_{1}(\varepsilon_{1})$ is small. Since we assume that $\nu_{1}(\varepsilon_{1})$ is negligible, $B_{1}$ is dominant. However, compared to (8), the dominant bias of DP-IPW does not contain $\delta_{1}$. This implies that the bias of DP-IPW is not strongly affected by the absolute value of outliers. Under homogeneous contamination, the dominant term disappears, so the bias is negligible.

3.2 DR-type Estimator

Next, we introduce the density power doubly robust (DP-DR) estimator. The DP-DR estimator is a direct application of the DR M-estimator and is defined as a root of the following estimating equation:

$$\sum_{i=1}^{n}\left[\frac{T_{i}h(Y_{i};\mu)^{\gamma}}{\pi(X_{i};\hat{\alpha})}(Y_{i}-\mu)-\frac{T_{i}-\pi(X_{i};\hat{\alpha})}{\pi(X_{i};\hat{\alpha})}\mathbb{E}_{\hat{q}}\left[h(Y;\mu)^{\gamma}(Y-\mu)|T=1,X\right]\right]=0. \tag{16}$$

As we have discussed in Section 2.2, $\mathbb{E}_{\hat{q}}\left[h(Y;\mu)^{\gamma}(Y-\mu)|T=1,X\right]$ is obtained by direct calculation or Monte Carlo approximation based on the parametric OR model $\hat{q}:=q(y|T=1,X;\hat{\beta})$. In the Appendix, we present the explicit forms of $\mathbb{E}_{\hat{q}}\left[h(Y;\mu)^{\gamma}(Y-\mu)|T=1,X\right]$ when $h$ and $q$ are assumed to be Gaussian. The parameter $\beta$ is usually estimated in an outlier-resistant manner, for example by Huber regression (Huber, 2004, Chap.7), the MM estimator (Yohai, 1987), density power regression (Basu et al., 1998; Kanamori and Fujisawa, 2015), or $\gamma$-regression (Fujisawa and Eguchi, 2008; Kawashima and Fujisawa, 2017).
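For the Gaussian case, the conditional expectation can be worked out by completing the square. The sketch below is our own derivation and is not copied from the Appendix: it assumes $h(\cdot;\mu)$ is $N(\mu,s^{2})$ and $\hat{q}(\cdot|T=1,X)$ is $N(m,\tau^{2})$, where $s$, $m$, and $\tau$ are illustrative names. Writing $v=s^{2}/\gamma$, the result is $(2\pi s^{2})^{-\gamma/2}\sqrt{v/(v+\tau^{2})}\exp\{-(m-\mu)^{2}/(2(v+\tau^{2}))\}\,v(m-\mu)/(v+\tau^{2})$, and the code compares it with a direct Riemann sum.

```python
import numpy as np

def dp_conditional_expectation(mu, m, tau, gamma, s=1.0):
    """E_q[h(Y; mu)^gamma (Y - mu)] for h = N(mu, s^2) and q = N(m, tau^2)."""
    v = s ** 2 / gamma                           # variance of the Gaussian kernel h^gamma
    const = (2 * np.pi * s ** 2) ** (-gamma / 2)
    scale = np.sqrt(v / (v + tau ** 2))
    return (const * scale * np.exp(-0.5 * (m - mu) ** 2 / (v + tau ** 2))
            * v * (m - mu) / (v + tau ** 2))

def dp_conditional_expectation_numeric(mu, m, tau, gamma, s=1.0):
    """The same quantity approximated by a Riemann sum, used only as a check."""
    y = np.linspace(m - 12 * tau, m + 12 * tau, 200_001)
    dy = y[1] - y[0]
    h = np.exp(-0.5 * ((y - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    q = np.exp(-0.5 * ((y - m) / tau) ** 2) / (tau * np.sqrt(2 * np.pi))
    return np.sum(h ** gamma * (y - mu) * q) * dy

# Hypothetical parameter values.
closed = dp_conditional_expectation(0.3, 1.2, 0.8, 0.5, s=1.5)
numeric = dp_conditional_expectation_numeric(0.3, 1.2, 0.8, 0.5, s=1.5)
print(closed, numeric)
```

The expectation vanishes when $m=\mu$, as expected from symmetry, and for non-Gaussian choices of $h$ or $q$ the Riemann-sum function indicates how a numerical approximation could replace the closed form.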

The DP-DR estimator is doubly robust under no contamination as with the general DR M-estimator.

Theorem 3.

Assume either the true PS or the true OR model is given. Then, if there is no contamination, the DP-DR estimating equation is unbiased.

Now, we evaluate the bias of the DP-DR estimating equation under contamination.

Theorem 4.

Assume that $Y$ is contaminated as (6). If the true PS model is given, the expectation of the DP-DR estimating equation is expressed as

$$-\int\varepsilon_{1}(x)\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})g(y|x)dy~g(x)dx+\nu_{1}(\varepsilon_{1}). \tag{17}$$

In particular, under homogeneous contamination, (17) reduces to $\nu_{1}(\varepsilon_{1})$. If the true OR model is given, the expectation of the DP-DR estimating equation is expressed as

$$-\int\varepsilon_{1}(x)\frac{P(T=1|x)}{\pi(x;\alpha)}\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})g(y|x)dy~g(x)dx+\nu_{1}(\varepsilon_{1}P(T=1|\cdot)/\pi(\cdot;\alpha)). \tag{18}$$

Under homogeneous contamination, (18) becomes

$$-\varepsilon_{1}\int\frac{P(T=1|x)}{\pi(x;\alpha)}\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})g(y|x)dy~g(x)dx+\nu_{1}(\varepsilon_{1}P(T=1|\cdot)/\pi(\cdot;\alpha)). \tag{19}$$

Assuming that $\pi(\cdot;\alpha)$ is bounded away from 0 and 1, we find that $P(T=1|\cdot)/\pi(\cdot;\alpha)$ is bounded. Then, from Assumption 1, $\nu_{1}(\varepsilon_{1}P(T=1|\cdot)/\pi(\cdot;\alpha))$ is negligible. As with DP-IPW, the dominant bias is independent of $\delta$, indicating that the influence of outliers is reduced. Unfortunately, DP-DR is still biased in the PS-incorrect and OR-correct case even under homogeneous contamination because the dominant term of (19) is not eliminated.

In the OR-correct case, the reason why DP-DR is biased under homogeneous contamination is as follows. Under Assumption 1, the expectation of the DP-DR estimating function becomes

$$
\begin{aligned}
&\mathbb{E}_{g}\left[\frac{P(T=1|X)}{\pi(X;\alpha)}\left\{\mathbb{E}_{\tilde{g}}[\psi(Y^{(1)};\mu^{(1)})|X]-\mathbb{E}_{g}[\psi(Y^{(1)};\mu^{(1)})|X]\right\}\right]\\
\approx{}&\mathbb{E}_{g}\left[\frac{P(T=1|X)}{\pi(X;\alpha)}\left\{(1-\varepsilon_{1})\mathbb{E}_{g}[\psi(Y^{(1)};\mu^{(1)})|X]-\mathbb{E}_{g}[\psi(Y^{(1)};\mu^{(1)})|X]\right\}\right],
\end{aligned}
$$

where we denote the density power estimating function by $\psi$. In the last formula, it is found that the terms in the curly brackets do not cancel because the first term is reduced by the factor $1-\varepsilon_{1}$. Based on this consideration, we propose a bias-corrected version of DP-DR, called the $\varepsilon$DP-DR estimator. $\varepsilon$DP-DR is designed to cancel the dominant bias under homogeneous contamination. The $\varepsilon$DP-DR estimator is a root of the following estimating equation:

$$\sum_{i=1}^{n}\left[\frac{T_{i}h(Y_{i};\mu)^{\gamma}}{\pi(X_{i};\hat{\alpha})}(Y_{i}-\mu)-\frac{T_{i}-\pi(X_{i};\hat{\alpha})}{\pi(X_{i};\hat{\alpha})}(1-\hat{\varepsilon}_{1})\mathbb{E}_{\hat{q}}\left[h(Y;\mu)^{\gamma}(Y-\mu)|T=1,X\right]\right]=0, \tag{20}$$

where $\hat{\varepsilon}_{1}$ is a consistent estimator of the expected contamination ratio $\overline{\varepsilon}_{1}=\int\varepsilon_{1}(x)g(x)dx$. $\hat{\varepsilon}_{1}$ can be obtained simultaneously with the parametric OR model by the unnormalized modeling with the density power score (Kanamori and Fujisawa, 2015), for example. While DP-DR is a special case of the DR M-estimator, $\varepsilon$DP-DR goes beyond this framework by the bias correction. Under no contamination, the $\varepsilon$DP-DR estimating equation is asymptotically identical to the DP-DR estimating equation. The $\varepsilon$DP-DR estimating equation is also biased under heterogeneous contamination; however, the bias takes a different form.

Corollary 1.

If the true PS model is given, the expectation of the $\varepsilon$DP-DR estimating equation is equal to (17). If the true OR model is given, the expectation of the $\varepsilon$DP-DR estimating equation is expressed as

$$\mathbb{E}_{g}\left[(\overline{\varepsilon}_{1}-\varepsilon_{1}(X))\frac{P(T=1|X)}{\pi(X;\alpha)}\mathbb{E}_{g}[h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})|X]\right]+\nu_{1}(\varepsilon_{1}P(T=1|\cdot)/\pi(\cdot;\alpha)). \tag{21}$$

The first term disappears under homogeneous contamination.

Proof.

The derivation is the same as that of Theorem 4. If $\varepsilon_{1}(X)$ is a constant $\varepsilon_{1}$, the first term disappears because $\overline{\varepsilon}_{1}=\varepsilon_{1}\int g(x)dx=\varepsilon_{1}$. ∎

Similar to (19), the second term of (21) is approximately zero if we assume that $\pi(\cdot;\alpha)$ is bounded away from 0 and 1.

Remark

One may expect that “$\varepsilon(X)$”DP-DR would work better than $\varepsilon$DP-DR under heterogeneous contamination. In fact, the bias (21) will disappear if we replace $\overline{\varepsilon}$ with $\varepsilon(X)$. However, it is necessary to model $\varepsilon(X)$ correctly for consistent estimation of “$\varepsilon(X)$”DP-DR. To the best of our knowledge, no easy method is available for this purpose.

3.3 Summary

We have proposed three types of outlier-resistant semiparametric estimators: DP-IPW, DP-DR, and $\varepsilon$DP-DR. Table 1 shows the bias of the estimating equations under the conditions discussed above. Under heterogeneous contamination, all estimators are biased, but the biases are hardly influenced by the absolute value of outliers. Furthermore, as discussed in Section 4, outliers have negligible influence if the contamination ratio is sufficiently small. $\varepsilon$DP-DR improves on DP-DR in the OR-correct case under homogeneous contamination, but we continue to discuss DP-DR for three reasons: the contamination ratio is sometimes hard to estimate, the bias (19) is not serious if $\pi(X;\alpha)$ is close to $P(T=1|X)$, and the simulation results presented in Section 6 indicate that DP-DR remains better than the existing methods even in the OR-correct case.

| Contamination | Model | DP-IPW | DP-DR | $\varepsilon$DP-DR |
|---|---|---|---|---|
| No contamination | PS-correct | $0$ | $0$ | $0$ |
| No contamination | OR-correct | - | $0$ | $0$ |
| Homogeneous: $\varepsilon$ | PS-correct | $\approx 0$ | $\approx 0$ | $\approx 0$ |
| Homogeneous: $\varepsilon$ | OR-correct | - | $\approx\varepsilon\mathbb{E}[\phi(X)]$ | $\approx 0$ |
| Heterogeneous: $\varepsilon(X)$ | PS-correct | $\approx\mathbb{E}[\varepsilon(X)\phi(X)]$ | $\approx\mathbb{E}[\varepsilon(X)\phi(X)]$ | $\approx\mathbb{E}[\varepsilon(X)\phi(X)]$ |
| Heterogeneous: $\varepsilon(X)$ | OR-correct | - | $\approx\mathbb{E}[\varepsilon(X)\phi(X)]$ | $\approx\mathbb{E}[(\overline{\varepsilon}-\varepsilon(X))\phi(X)]$ |

Table 1: Summary of the biases of the proposed estimating equations. The function $\phi(X)$ differs cell-by-cell. PS-correct means that the PS model is correctly specified and the OR model may not be; OR-correct means the opposite.

4 Influence-function-based Analysis of Outlier Resistance

As discussed in the previous section, the proposed estimators suffer less from outliers compared with ordinary estimators from the viewpoint of the unbiasedness of the estimating equation. In this section, we demonstrate that they are outlier-resistant from the viewpoint of the IF.

Here, we briefly review the IF for the univariate M-estimator and expand it to evaluate our estimators. Let GG be the distribution of ZZ\in\mathbb{R}, and let T(G)T(G) be a functional of G, which is the parameter of interest. The IF of T(G)T(G) is defined as

IF(z0;G):=limε0T((1ε)G+εΔz0)T(G)ε=ε{T((1ε)G+εΔz0)T(G)}|ε=0,\displaystyle IF(z_{0};G):=\lim_{\varepsilon\rightarrow 0}\frac{T((1-\varepsilon)G+\varepsilon\Delta_{z_{0}})-T(G)}{\varepsilon}=\left.\frac{\partial}{\partial\varepsilon}\{T((1-\varepsilon)G+\varepsilon\Delta_{z_{0}})-T(G)\}\right|_{\varepsilon=0}, (22)

where Δz0\Delta_{z_{0}} is a degenerate distribution at z0z_{0}. We also see that the latent bias T((1ε)G+εΔz0)T(G)T((1-\varepsilon)G+\varepsilon\Delta_{z_{0}})-T(G) can be approximated by εIF(z0;G)\varepsilon IF(z_{0};G). Therefore, the behavior of the IF approximates that of the latent bias. In the population, the M-estimator TM(G)T_{M}(G) satisfies ψ(z,TM(G))𝑑G(z)=0\int\psi(z,T_{M}(G))dG(z)=0. Then, the IF for TM(G)T_{M}(G) is obtained by differentiating ψ(z,TM((1ε)G+εΔz0)d{(1ε)G+εΔz0}(z)=0\int\psi(z,T_{M}((1-\varepsilon)G+\varepsilon\Delta_{z_{0}})d\{(1-\varepsilon)G+\varepsilon\Delta_{z_{0}}\}(z)=0 with respect to ε\varepsilon. This yields

IF(z0;G)=𝔼[ηψ(Z,η)|η=TM(G)]1ψ(z0,TM(G)).\displaystyle IF(z_{0};G)=-\mathbb{E}\left[\left.\frac{\partial}{\partial\eta}\psi(Z,\eta)\right|_{\eta=T_{M}(G)}\right]^{-1}\psi(z_{0},T_{M}(G)). (23)

The function $\psi$ is said to have a redescending property if $\psi(z_{0},T_{M}(G))$ approaches zero as $|z_{0}|$ increases. Therefore, when $\psi$ has the redescending property and $z_{0}$ is an outlier, the latent bias is sufficiently small. This is favorable for outlier resistance.
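The redescending property can be illustrated numerically. The sketch below compares the $\psi$ function of the sample mean, $z-\mu$, with a density-power-weighted $\psi$ of the form $h(z;\mu)^{\gamma}(z-\mu)$. It assumes a Gaussian $h$ with a fixed, known scale; the function names and the chosen values of $z_{0}$ and $\gamma$ are illustrative only.

```python
import math


def gaussian_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def psi_mean(z, mu):
    """psi of the sample mean: grows linearly in z, so it is not redescending."""
    return z - mu


def psi_density_power(z, mu, gamma, sigma=1.0):
    """Density-power-weighted psi: h(z; mu)^gamma * (z - mu), with Gaussian h (assumed)."""
    return gaussian_pdf(z, mu, sigma) ** gamma * (z - mu)


mu = 0.0
for z0 in (1.0, 5.0, 10.0, 20.0):
    # The second column keeps growing; the third shrinks towards zero.
    print(z0, psi_mean(z0, mu), psi_density_power(z0, mu, gamma=0.5))
```

Because the IF in (23) is proportional to $\psi(z_{0},T_{M}(G))$, the weighted version has an influence that vanishes for gross outliers, whereas the influence on the mean is unbounded.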

Since $\varepsilon_{1}$ is dependent on $X$, we cannot apply the IF directly to our estimators. To overcome this issue, we consider the IF with fixed covariates $\{X_{i}\}_{i=1}^{n}$; this approach is similar to the fixed carrier model in Hampel et al. (2011, Chap. 6). Consider the following estimating equation:

$$
\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{\tilde{g}}\left[\left.\psi(Y,T,X_{i};\mu)\right|X_{i}\right]=0. \tag{24}
$$

If the fixed sample $\{X_{i}\}_{i=1}^{n}$ consists of i.i.d. observations, then the left-hand side of (24) converges to $\mathbb{E}_{\tilde{g}}[\psi(Y,T,X;\mu)]$ as $n\rightarrow\infty$. Let $\tilde{\mu}^{(1)}_{n}$ denote a root of (24), and let $\tilde{\mu}^{(1)}$ be a root of $\mathbb{E}_{\tilde{g}}[\psi(Y,T,X;\mu)]$. Then, $\tilde{\mu}^{(1)}_{n}$ also converges to $\tilde{\mu}^{(1)}$. Therefore, $\tilde{\mu}^{(1)}_{n}$ shows roughly the same behavior as that of the target estimator $\tilde{\mu}^{(1)}$. The contaminated density $\tilde{g}$ is defined as in (6), and $\delta_{1}(y|X_{i})$ is assumed to be Dirac's delta at $y_{0}$. The IF of $T_{n}(\tilde{G})$ at $X_{i}$ is obtained by differentiating (24) with respect to $\varepsilon_{1}(X_{i})$ at $\varepsilon_{1}(X_{i})=0$.

Because of space limitations, we discuss only $\varepsilon$DP-DR. Let $\overline{\varepsilon}_{1}=\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{1}(X_{i})$; then the IF of $\varepsilon$DP-DR is

$$
\begin{aligned}
&-\mathbb{E}_{g}\left[\left.\left.\frac{\partial\psi}{\partial\mu}\right|_{\mu=\mu^{(1)}_{n}}\right|X_{i}\right]^{-1}\left[\frac{P(T=1|X_{i})}{\pi(X_{i};\alpha)}h(y_{0};\mu_{n}^{(1)})^{\gamma}(y_{0}-\mu^{(1)}_{n})\right.\\
&\qquad\left.-\frac{n-1}{n}\frac{P(T=1|X_{i})-\pi(X_{i};\alpha)}{\pi(X_{i};\alpha)}\mathbb{E}_{\hat{q}}\left[\left.h(Y;\mu_{n}^{(1)})^{\gamma}(Y-\mu_{n}^{(1)})\right|T=1,X_{i}\right]\right].
\end{aligned} \tag{25}
$$

In the PS-correct case, the second term in square brackets is equal to zero, and the IF then tends to zero as $|y_{0}|\rightarrow\infty$. In the OR-correct case, the second term does not disappear. In the limit $|y_{0}|\rightarrow\infty$, the IF converges to

$$
\frac{n-1}{n}\mathbb{E}_{g}\left[\left.\left.\frac{\partial\psi}{\partial\mu}\right|_{\mu=\mu^{(1)}_{n}}\right|X_{i}\right]^{-1}\left[\frac{P(T=1|X_{i})-\pi(X_{i};\alpha)}{\pi(X_{i};\alpha)}\mathbb{E}_{\hat{q}}\left[\left.h(Y;\mu_{n}^{(1)})^{\gamma}(Y-\mu^{(1)}_{n})\right|T=1,X_{i}\right]\right]. \tag{26}
$$

Thus, the $\varepsilon$DP-DR estimator has the redescending property only in the PS-correct case. In the OR-correct case, the influence cannot be eliminated, but the IF tends to a constant as $|y_{0}|$ tends to infinity, implying that the influence of the outlier is not serious. DP-DR has an IF similar to that of $\varepsilon$DP-DR, and DP-IPW has an IF similar to that of $\varepsilon$DP-DR with a correct PS model. The derivations of all IFs are presented in the Appendix.

Under homogeneous contamination, the ordinary IF is applicable, and we can see that the proposed estimators have the redescending property in the PS-correct case. Moreover, $\varepsilon$DP-DR has the redescending property even in the OR-correct case; this result is consistent with Corollary 1. The IF-based analysis under homogeneous contamination is presented in the Appendix.

5 Asymptotic Properties

We discuss the asymptotic properties of the $\varepsilon$DP-DR estimator. For the other proposed estimators, similar results are obtained with small changes. The asymptotic properties can be obtained in a manner similar to that described in Hoshino (2007). Assume that the PS and OR models are regular and are estimated consistently if the models are correctly specified. Furthermore, assume that the contamination ratio $\varepsilon_{1}$ is known. Note that when the contamination ratio is consistently estimated simultaneously with the OR model by the method of Kanamori and Fujisawa (2015), we can replace $\beta$ with $(\varepsilon_{1},\beta^{T})^{T}$ in the following discussion.

We write (20) as $\frac{1}{n}\sum_{i=1}^{n}\psi_{i}(\mu;\hat{\alpha},\hat{\beta})$, and let $\frac{1}{n}\sum_{i=1}^{n}s^{PS}_{i}(\alpha)=0$ and $\frac{1}{n}\sum_{i=1}^{n}s^{OR}_{i}(\beta)=0$ be the estimating equations for the PS and OR models, respectively. Let $\lambda=(\mu,\alpha^{T},\beta^{T})^{T}$ be the parameter vector, and let the full estimating equation be defined as

$$
\sum_{i=1}^{n}S_{i}(\lambda)=\sum_{i=1}^{n}\begin{pmatrix}\psi_{i}(\mu;\alpha,\beta)\\ s^{PS}_{i}(\alpha)\\ s^{OR}_{i}(\beta)\end{pmatrix}=\mathbf{0}. \tag{30}
$$

Let $\lambda^{*}=(\mu^{*},\alpha^{*T},\beta^{*T})^{T}$ be a root of the population version of (30). Note that, in this section, the superscript $*$ does not necessarily mean that the model is correctly specified. With the results presented in Van der Vaart (2000, Chap. 5), the following theorem holds under some regularity conditions.

Theorem 5.

Under the regularity conditions presented in the Appendix, the following asymptotic properties hold:

$$
\begin{aligned}
\hat{\lambda} &\overset{p}{\rightarrow}\lambda^{*}, &\quad (31)\\
\sqrt{n}(\hat{\lambda}-\lambda^{*}) &\overset{d}{\rightarrow}\mathcal{N}\left(\mathbf{0},\mathbf{V}^{\tilde{g}}(\lambda^{*})\right), &\quad (32)
\end{aligned}
$$

where $\mathbf{V}^{\tilde{g}}(\lambda^{*})=\mathbf{J}^{\tilde{g}}(\lambda^{*})^{-1}\mathbf{K}^{\tilde{g}}(\lambda^{*})\{\mathbf{J}^{\tilde{g}}(\lambda^{*})^{T}\}^{-1}$, $\mathbf{J}^{\tilde{g}}(\lambda^{*})=\mathbb{E}_{\tilde{g}}\left[\partial S_{i}(\lambda^{*})/\partial\lambda^{T}\right]$, and $\mathbf{K}^{\tilde{g}}(\lambda^{*})=\mathbb{E}_{\tilde{g}}\left[S_{i}(\lambda^{*})S_{i}(\lambda^{*})^{T}\right]$.

By using this and applying the results presented in Section 3.2, we find that the limit $\mu^{*}$ is in the neighborhood of $\mu^{(1)}$.

Theorem 6.

Let $\lambda^{**}=(\mu^{(1)},\alpha^{*T},\beta^{*T})^{T}$ and assume that $\mathbf{J}^{\tilde{g}}_{11}(\lambda)$ is nonzero within the interval $[\lambda^{*},\lambda^{**}]$. Under Assumption 1 and homogeneous contamination, if either the PS or the OR model is correct, it then holds that

$$
\mu^{*}=\mu^{(1)}+\mathcal{O}(\nu_{1}(\phi)), \tag{33}
$$

where $\phi(\cdot)=\varepsilon_{1}$ (constant) in the PS-correct case and $\phi(\cdot)=\varepsilon_{1}P(T=1|\cdot)/\pi(\cdot;\alpha)$ in the OR-correct case.

The proof of Theorem 6 and further discussions on the asymptotic variance are available in the Appendix.
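The sandwich form $\mathbf{J}^{-1}\mathbf{K}\{\mathbf{J}^{T}\}^{-1}$ in Theorem 5 can be estimated by replacing the expectations with sample averages. The following is a minimal sketch of that plug-in computation, not the authors' implementation; the function name `sandwich_variance` and its array layout are assumptions made for illustration. The toy check uses the sample mean, for which $S_{i}=y_{i}-\mu$ and $\partial S_{i}/\partial\mu=-1$.

```python
import numpy as np


def sandwich_variance(S, dS):
    """Plug-in estimate of J^{-1} K J^{-T}.

    S  : (n, p) array, estimating functions S_i evaluated at the estimate.
    dS : (n, p, p) array, Jacobians dS_i / d lambda^T at the estimate.
    """
    J = dS.mean(axis=0)                                  # sample version of E[dS_i / d lambda^T]
    K = (S[:, :, None] * S[:, None, :]).mean(axis=0)     # sample version of E[S_i S_i^T]
    J_inv = np.linalg.inv(J)
    return J_inv @ K @ J_inv.T


# Toy example: the sample mean as an M-estimator.
rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.5, size=200)
mu_hat = y.mean()
S = (y - mu_hat)[:, None]            # shape (n, 1)
dS = -np.ones((y.size, 1, 1))        # derivative of (y_i - mu) with respect to mu
V = sandwich_variance(S, dS)
std_error = np.sqrt(V[0, 0] / y.size)
print(V[0, 0], std_error)
```

For the stacked equation (30), `S` would hold $(\psi_{i}, s^{PS}_{i}, s^{OR}_{i})$ and the first diagonal element of the result would be the asymptotic variance of $\hat{\mu}$.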

6 Monte Carlo Simulation

We conduct Monte Carlo simulations to evaluate the performance of the proposed estimators. Our methods are compared with the naive IPW and DR estimators and some existing outlier-resistant methods (Firpo, 2007; Zhang et al., 2012; Díaz, 2017; Sued, Valdora, and Yohai, 2020). Since these methods focus on the median of the potential outcome, they are resistant to outliers to a certain degree; however, they are not very resistant to heavy contamination. To the best of our knowledge, no method other than the proposed method has more outlier resistance than the median. Firpo's IPW estimator (Firpo, 2007) is defined as

$$
\hat{\mu}_{\mathrm{Firpo}}=\mathrm{arg}\min_{\mu}\sum_{i=1}^{n}\frac{T_{i}}{\pi(X_{i};\hat{\alpha})}(Y_{i}-\mu)(0.5-\mathbb{I}(Y_{i}\leq\mu)), \tag{34}
$$

where $\mathbb{I}$ is the indicator function. Zhang's IPW median (Zhang et al., 2012) is based on the IPW-empirical distribution. Firpo's IPW and Zhang's IPW are almost equivalent except for a slight difference in their computation. Zhang's and Sued's DR methods (Zhang et al., 2012; Sued, Valdora, and Yohai, 2020) estimate the empirical distribution in a doubly robust way. They incorporate an IPW-type estimator into the first term. The remaining term of Zhang's DR is based on the Gaussian cumulative distribution function of $Y$ given $X$. In contrast, Sued's DR constructs the remaining term in a nonparametric manner. Díaz's DR median (Díaz, 2017) is a different approach; it employs the targeted maximum likelihood estimator (TMLE) (Van Der Laan and Rubin, 2006). We implemented our methods, Zhang's IPW/DR, and Sued's DR in R. For Firpo's IPW and TMLE, we used the causalquantile package (https://github.com/idiazst/causalquantile; Updated on 31 Aug 2017).
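The objective in (34) equals $0.5\sum_{i}w_{i}|Y_{i}-\mu|$ with $w_{i}=T_{i}/\pi(X_{i};\hat{\alpha})$, so one of its minimizers is a weighted median of the treated outcomes. The sketch below computes it directly; it is an illustration in Python rather than the R code or the causalquantile package used in the paper, and the function names and toy data are hypothetical.

```python
import numpy as np


def firpo_objective(mu, y, t, ps):
    """Weighted check loss of (34) at mu, with weights T_i / pi_i."""
    w = t / ps
    return np.sum(w * (y - mu) * (0.5 - (y <= mu)))


def ipw_median(y, t, ps):
    """Weighted median of the treated outcomes (a minimizer of the objective above)."""
    y, t, ps = (np.asarray(a, dtype=float) for a in (y, t, ps))
    treated = t == 1
    order = np.argsort(y[treated])
    y_sorted = y[treated][order]
    w_sorted = (1.0 / ps[treated])[order]
    cum = np.cumsum(w_sorted)
    # First sorted outcome at which the cumulative weight reaches half of the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return y_sorted[idx]


# Toy data: the value 100 is a gross outlier among the treated units.
y_toy = np.array([1.0, 2.0, 3.0, 100.0, 50.0])
t_toy = np.array([1, 1, 1, 1, 0])
ps_toy = np.array([0.5, 0.25, 0.5, 0.8, 0.5])
m = ipw_median(y_toy, t_toy, ps_toy)
print(m, firpo_objective(m, y_toy, t_toy, ps_toy))
```

In this toy example the outlier changes the estimate only through its weight, not through its magnitude, which is the source of the median's (limited) outlier resistance.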

6.1 Numerical Algorithm for the Proposed Methods

Since the proposed estimating equations cannot be solved explicitly, we develop an iterative algorithm. Various algorithms are available, but we propose a standard algorithm for M-estimators (Huber, 2004; Hampel et al., 2011). The detailed algorithm is available in the Appendix. Hereafter, we suppose that $h$ and $q$ are Gaussian. We also provide explicit updating formulae in this case. Note that some additional parameters of $h$ should be estimated in a roughly unbiased and outlier-resistant way.
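As an illustration of what such an iteration can look like, the sketch below solves the DP-IPW-type equation $\sum_{i}\frac{T_{i}}{\pi_{i}}h(Y_{i};\mu)^{\gamma}(Y_{i}-\mu)=0$ by iteratively reweighted averaging with a Gaussian $h$. It is not the algorithm from the Appendix: the function name `dp_ipw`, the median starting value, and the MAD-based scale held fixed during the iteration are assumptions made here, and the propensity scores are treated as given.

```python
import numpy as np


def dp_ipw(y, t, ps, gamma, tol=1e-8, max_iter=500):
    """Fixed-point iteration for sum_i (T_i/pi_i) h(Y_i; mu)^gamma (Y_i - mu) = 0."""
    y, t, ps = (np.asarray(a, dtype=float) for a in (y, t, ps))
    y_treated = y[t == 1]
    # Outlier-resistant starting value and scale (assumed choice: median and MAD).
    mu = np.median(y_treated)
    sigma = 1.4826 * np.median(np.abs(y_treated - mu))
    ipw = t / ps
    for _ in range(max_iter):
        # Gaussian density raised to gamma; the normalizing constant cancels in the ratio.
        w = ipw * np.exp(-0.5 * gamma * ((y - mu) / sigma) ** 2)
        mu_new = np.sum(w * y) / np.sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu


# Usage on synthetic data with 10% gross outliers (true mean of Y^(1) is 3).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
ps = 1.0 / (1.0 + np.exp(-0.8 * x))   # propensity score, assumed known here
t = rng.binomial(1, ps)
y = 3.0 + 1.2 * x + rng.normal(size=500)
y[rng.random(500) < 0.1] = 18.0
print(dp_ipw(y, t, ps, gamma=0.0), dp_ipw(y, t, ps, gamma=0.5))
```

With $\gamma=0$ the weights reduce to $T_{i}/\pi_{i}$ and the iteration returns the normalized IPW mean; with $\gamma>0$ observations far from the current $\mu$ receive nearly zero weight.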

6.2 Simulation Model

We simulated random observations based on a simple causal setting. The confounders $(X_{1},X_{2})$ were independently drawn from a Gaussian or uniform distribution with mean zero and unit variance. The treatment $T$ was assigned according to the conditional probability $P(T=1|X_{1},X_{2})$, which was defined as a sigmoid function of $0.8X_{1}+0.2X_{2}$. The potential outcomes $(Y^{(1)},Y^{(0)})$ were generated according to a linear function of $(X_{1},X_{2})$ with Gaussian error: $Y^{(1)}=\mu^{(1)}+1.2X_{1}+0.3X_{2}+e$ and $Y^{(0)}=\mu^{(0)}+1.2X_{1}+0.3X_{2}+e$. We set $\mu^{(1)}=3$ and $\mu^{(0)}=0$. The standard deviation (SD) of $e$ was set to $\sqrt{0.72}$, so that $\mathrm{SD}[Y^{(1)}]=\mathrm{SD}[Y^{(0)}]=\sqrt{1.2^{2}+0.3^{2}+0.72}=1.5$. When the confounders were not Gaussian, the potential outcomes were not Gaussian. The observed outcome $Y$ was defined as $Y=TY^{(1)}+(1-T)Y^{(0)}$ under no contamination. Outliers were drawn from $\mathcal{N}(\mu^{(t)}+10\sigma^{(t)},1)$, with $\sigma^{(t)}=\mathrm{SD}[Y^{(t)}]=1.5$. For the homogeneous contamination settings, the contamination ratio was set to a constant $\varepsilon_{t}$. For the heterogeneous contamination settings, the contamination ratio was set to $1.5\varepsilon_{t}$ if $X_{1}+X_{2}\leq 0$ and $0.5\varepsilon_{t}$ if $X_{1}+X_{2}>0$. The average contamination ratio was set to $\varepsilon_{t}\in\{0,0.05,0.1,0.2\}$. Then, the observations of $Y$ were randomly replaced with outliers according to the contamination ratio. The sample size was fixed at $n=100$ throughout the Monte Carlo simulations. Furthermore, we generated datasets in which the outcome follows a symmetric and heavy-tailed distribution. In this case, we drew the error term of $Y^{(t)}$ from the standard Cauchy distribution instead of inserting outliers.
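The data-generating process described above can be sketched as follows for the Gaussian-covariate case. This is an illustrative Python reconstruction, not the authors' R code; the function name `simulate`, the seed handling, and the use of a single error term $e$ shared by both potential outcomes (as the notation suggests) are assumptions.

```python
import numpy as np


def simulate(n=100, eps=0.1, heterogeneous=False, mu1=3.0, mu0=0.0, seed=0):
    """Draw one contaminated dataset (x, t, y, is_outlier) with Gaussian confounders."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))                                   # confounders (X1, X2)
    p = 1.0 / (1.0 + np.exp(-(0.8 * x[:, 0] + 0.2 * x[:, 1])))   # P(T=1 | X1, X2)
    t = rng.binomial(1, p)
    e = rng.normal(scale=np.sqrt(0.72), size=n)                   # shared error term (assumption)
    linear = 1.2 * x[:, 0] + 0.3 * x[:, 1]
    y1, y0 = mu1 + linear + e, mu0 + linear + e
    y = t * y1 + (1 - t) * y0                                     # observed outcome, no contamination
    if heterogeneous:
        ratio = np.where(x.sum(axis=1) <= 0, 1.5 * eps, 0.5 * eps)
    else:
        ratio = np.full(n, eps)
    is_outlier = rng.random(n) < ratio
    sd_y = 1.5                                                    # SD of Y^(t)
    centre = np.where(t == 1, mu1, mu0) + 10 * sd_y
    y = np.where(is_outlier, rng.normal(centre, 1.0), y)          # replace by N(mu^(t) + 10 sd, 1)
    return x, t, y, is_outlier


x, t, y, is_outlier = simulate(n=100, eps=0.1, heterogeneous=True, seed=1)
print(is_outlier.mean(), y[t == 1].mean())
```

For uniform confounders with mean zero and unit variance, `rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2))` could replace the Gaussian draw.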

6.3 Results

First, we performed a comparative study. The potential mean $\mu^{(1)}$ was estimated using the proposed and comparative methods. In this experiment, we used all settings described in the previous section. The propensity score was estimated by logistic regression. The parametric OR was fitted in two ways: Gaussian MLE with non-outliers or unnormalized Gaussian modeling (the tuning parameter was set to 0.5) (Kanamori and Fujisawa, 2015). For the DR estimators, we investigated three patterns of model specification: PS-correct/OR-correct, PS-correct/OR-incorrect, and PS-incorrect/OR-correct. For the model-correct case, we included an intercept and $(X_{1},X_{2})$ as covariates. For the model-incorrect case, we included only an intercept and $X_{2}$. We performed 10,000 simulations for every setting and method. Tables 2 and 3 show the results of the comparative study when the covariates were Gaussian and the OR for the DR-type estimators was the Gaussian MLE with non-outliers. The estimation error was measured by the root mean square error (RMSE). The mean and SD of all estimates, the mean computation time, and the results for the other settings are provided in the Appendix. In Table 2, the naive IPW estimator had a significantly larger RMSE under contamination. Both the median-based methods and DP-IPW dramatically reduced the RMSE. As the contamination ratio increased, the RMSE increased. The RMSE tended to be larger for heterogeneous contamination than for homogeneous contamination. When the optimal $\gamma$ was chosen, the proposed method outperformed the comparative methods and had the smallest RMSE for all settings. In Table 3, the results for the DR-type estimators were similar to those for the IPW estimators. The proposed method with a proper $\gamma$ outperformed the comparative methods and had the smallest RMSE in almost all settings; the median (TMLE) was slightly better only in the PS-correct/OR-incorrect case with $\varepsilon=0.05$. DP-DR and $\varepsilon$DP-DR performed similarly, although $\varepsilon$DP-DR was slightly superior in many settings. Among the median-based methods, TMLE performed better, but it took much more time than the other methods, including the proposed methods, and occasionally ($<1\%$) failed to converge.

| Type | Method | No contam. | Homogeneous, $\varepsilon=0.05$ | Homogeneous, $\varepsilon=0.10$ | Homogeneous, $\varepsilon=0.20$ | Heterogeneous, $\varepsilon=0.05$ | Heterogeneous, $\varepsilon=0.10$ | Heterogeneous, $\varepsilon=0.20$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IPW | Naive | 0.222 | 0.957 | 1.683 | 3.153 | 0.993 | 1.752 | 3.253 |
| | median (Firpo) | 0.257 | 0.294 | 0.367 | 0.649 | 0.306 | 0.409 | 0.769 |
| | median (Zhang-IPW) | 0.257 | 0.294 | 0.367 | 0.649 | 0.306 | 0.409 | 0.769 |
| | DP-IPW ($\gamma=0.1$) | 0.218 | 0.276 | 0.531 | 2.263 | 0.293 | 0.609 | 2.377 |
| | DP-IPW ($\gamma=0.5$) | 0.227 | 0.249 | 0.272 | 0.639 | 0.245 | 0.287 | 0.726 |
| | DP-IPW ($\gamma=1.0$) | 0.261 | 0.271 | 0.275 | 0.413 | 0.262 | 0.281 | 0.498 |

Table 2: Results of the comparative study of the IPW-type estimators. Each figure is the RMSE between the estimates and the true value. The covariates $X$ were generated from Gaussian distributions.

| Type (PS/OR) | Method | No contam. | Homogeneous, $\varepsilon=0.05$ | Homogeneous, $\varepsilon=0.10$ | Homogeneous, $\varepsilon=0.20$ | Heterogeneous, $\varepsilon=0.05$ | Heterogeneous, $\varepsilon=0.10$ | Heterogeneous, $\varepsilon=0.20$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DR(T/T) | Naive | 0.184 | 0.957 | 1.684 | 3.154 | 0.997 | 1.758 | 3.265 |
| | median (Zhang-DR) | 0.239 | 0.317 | 0.391 | 0.733 | 0.330 | 0.452 | 0.905 |
| | median (Sued) | 0.238 | 0.316 | 0.388 | 0.693 | 0.329 | 0.450 | 0.869 |
| | median (TMLE) | 0.237 | 0.280 | 0.359 | 0.603 | 0.295 | 0.402 | 0.701 |
| | DP-DR ($\gamma=0.1$) | 0.183 | 0.302 | 0.564 | 2.262 | 0.318 | 0.649 | 2.394 |
| | DP-DR ($\gamma=0.5$) | 0.202 | 0.285 | 0.326 | 0.697 | 0.274 | 0.349 | 0.834 |
| | DP-DR ($\gamma=1.0$) | 0.240 | 0.288 | 0.307 | 0.524 | 0.287 | 0.336 | 0.669 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 0.183 | 0.296 | 0.554 | 2.255 | 0.314 | 0.636 | 2.385 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 0.202 | 0.264 | 0.302 | 0.669 | 0.271 | 0.323 | 0.793 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 0.240 | 0.287 | 0.299 | 0.513 | 0.286 | 0.335 | 0.648 |
| DR(T/F) | Naive | 0.237 | 0.963 | 1.686 | 3.156 | 1.001 | 1.758 | 3.262 |
| | median (Zhang-DR) | 0.275 | 0.342 | 0.408 | 0.741 | 0.350 | 0.465 | 0.912 |
| | median (Sued) | 0.275 | 0.342 | 0.407 | 0.699 | 0.350 | 0.464 | 0.872 |
| | median (TMLE) | 0.242 | 0.284 | 0.363 | 0.622 | 0.297 | 0.404 | 0.719 |
| | DP-DR ($\gamma=0.1$) | 0.237 | 0.314 | 0.561 | 2.267 | 0.330 | 0.644 | 2.393 |
| | DP-DR ($\gamma=0.5$) | 0.247 | 0.319 | 0.349 | 0.714 | 0.319 | 0.361 | 0.839 |
| | DP-DR ($\gamma=1.0$) | 0.280 | 0.334 | 0.347 | 0.581 | 0.329 | 0.372 | 0.709 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 0.237 | 0.311 | 0.557 | 2.264 | 0.328 | 0.640 | 2.388 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 0.247 | 0.317 | 0.344 | 0.694 | 0.313 | 0.356 | 0.817 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 0.280 | 0.333 | 0.338 | 0.551 | 0.327 | 0.369 | 0.708 |
| DR(F/T) | Naive | 0.181 | 0.879 | 1.591 | 3.026 | 0.826 | 1.490 | 2.813 |
| | median (Zhang-DR) | 0.237 | 0.263 | 0.316 | 0.503 | 0.269 | 0.337 | 0.548 |
| | median (Sued) | 0.236 | 0.272 | 0.346 | 0.599 | 0.277 | 0.364 | 0.627 |
| | median (TMLE) | 0.234 | 0.260 | 0.309 | 0.478 | 0.265 | 0.328 | 0.522 |
| | DP-DR ($\gamma=0.1$) | 0.182 | 0.192 | 0.345 | 2.057 | 0.191 | 0.299 | 1.681 |
| | DP-DR ($\gamma=0.5$) | 0.199 | 0.206 | 0.218 | 0.366 | 0.203 | 0.209 | 0.283 |
| | DP-DR ($\gamma=1.0$) | 0.230 | 0.232 | 0.239 | 0.273 | 0.230 | 0.233 | 0.242 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 0.182 | 0.193 | 0.381 | 2.207 | 0.194 | 0.335 | 1.839 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 0.199 | 0.203 | 0.208 | 0.376 | 0.203 | 0.212 | 0.318 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 0.230 | 0.230 | 0.231 | 0.243 | 0.231 | 0.237 | 0.260 |

Table 3: Results of the comparative study of the DR-type estimators. Each figure is the RMSE between the estimates and the true value. The covariates $X$ were generated from Gaussian distributions, and the outcome regression was obtained by the Gaussian MLE using non-outliers. The characters "T" and "F" denote correct and incorrect modeling, respectively.

Table 4 shows the RMSE of each method on the data with Cauchy error. As in the above experiments, the proposed method performed better than the comparative methods. In this setting, we used only the unnormalized Gaussian modeling as the OR for the DR-type estimators. Only in the PS-correct/OR-incorrect case did the median (TMLE) perform slightly better than the proposed method.

| Type (PS/OR) | Method | $X$ Gaussian | $X$ Uniform |
| --- | --- | --- | --- |
| IPW(T/-) | Naive | 274.024 | 246.118 |
| | median (Firpo) | 0.414 | 0.438 |
| | median (Zhang-IPW) | 0.414 | 0.438 |
| | DP-IPW ($\gamma=0.1$) | 0.443 | 0.425 |
| | DP-IPW ($\gamma=0.5$) | 0.367 | 0.363 |
| | DP-IPW ($\gamma=1.0$) | 0.380 | 0.383 |
| DR(T/T) | Naive | 275.447 | 247.011 |
| | median (Zhang-DR) | 0.415 | 0.431 |
| | median (Sued) | 0.408 | 0.430 |
| | median (TMLE) | 0.392 | 0.425 |
| | DP-DR ($\gamma=0.1$) | 0.501 | 0.420 |
| | DP-DR ($\gamma=0.5$) | 0.363 | 0.356 |
| | DP-DR ($\gamma=1.0$) | 0.372 | 0.374 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 0.487 | 0.420 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 0.361 | 0.355 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 0.370 | 0.374 |
| DR(T/F) | Naive | 275.446 | 247.011 |
| | median (Zhang-DR) | 0.456 | 0.443 |
| | median (Sued) | 0.436 | 0.441 |
| | median (TMLE) | 0.394 | 0.427 |
| | DP-DR ($\gamma=0.1$) | 0.514 | 0.431 |
| | DP-DR ($\gamma=0.5$) | 0.404 | 0.369 |
| | DP-DR ($\gamma=1.0$) | 0.418 | 0.389 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 0.503 | 0.430 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 0.399 | 0.368 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 0.412 | 0.388 |
| DR(F/T) | Naive | 263.629 | 177.037 |
| | median (Zhang-DR) | 0.390 | 0.429 |
| | median (Sued) | 0.373 | 0.400 |
| | median (TMLE) | 0.389 | 0.429 |
| | DP-DR ($\gamma=0.1$) | 0.390 | 0.401 |
| | DP-DR ($\gamma=0.5$) | 0.358 | 0.376 |
| | DP-DR ($\gamma=1.0$) | 0.364 | 0.393 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 0.377 | 0.385 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 0.328 | 0.338 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 0.334 | 0.351 |

Table 4: Results of the comparative study using the heavy-tailed data. Each figure is the RMSE between the estimates and the true value. The covariates $X$ were generated from Gaussian or uniform distributions. The OR model for the DR-type estimators was obtained by the unnormalized Gaussian modeling. The characters "T" and "F" denote correct and incorrect modeling, respectively.

Next, we conducted a $\gamma$-sensitivity study. $\mu^{(1)}$ was estimated by the proposed methods with different values of $\gamma$. $X$ had a Gaussian distribution, and the contamination ratio varied in $\{0,0.05,0.1,0.2\}$ under homogeneous contamination. For the DR-type estimators, the outcome regression was performed by the Gaussian MLE using non-outliers. We simulated 10,000 datasets for every setting and method. Table 5 shows the results of the $\gamma$-sensitivity study. As in the comparative study, when the ratio of outliers increased, the bias increased. A larger $\gamma$ resulted in increased variance. When the contamination ratio was small, it was sufficient to use a small $\gamma$ such as $\gamma=0.1$ or $0.2$ to remove the adverse effect of outliers. Even in highly contaminated cases, $\gamma>1.0$ was not needed. Comparing the estimates of DP-DR and $\varepsilon$DP-DR in the PS-incorrect/OR-correct case, we see that the DP-DR estimates were biased, especially when $\varepsilon$ was large, whereas the $\varepsilon$DP-DR estimates were almost equal to the true value 3. This result shows that the bias correction by $1-\hat{\varepsilon}$ worked well in our experiments.

As in many other outlier-resistant statistical methods, parameter tuning is challenging. We suggest a possible policy on this issue based on the solution paths of the proposed estimators, which are provided in the Appendix. Looking at the paths, the influence of outliers decreased as $\gamma$ increased, and the paths became stable around the true value after reaching a certain $\gamma$. Thus, we suggest using the smallest $\gamma$ for which the estimate is stable.
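One way to make the "smallest stable $\gamma$" suggestion concrete is sketched below. The stability rule (all subsequent steps along the path change the estimate by less than a tolerance) and the tolerance value are hypothetical choices made here for illustration; the paper itself inspects the solution paths in the Appendix. The example path uses the DP-IPW row for $\varepsilon=0.10$ in Table 5, which contains means over simulations rather than a single solution path.

```python
def smallest_stable_gamma(gammas, estimates, tol=0.05):
    """Return the smallest gamma after which every step along the path moves by less than tol.

    Returns None if the path never stabilizes on the supplied grid.
    """
    steps = [abs(b - a) for a, b in zip(estimates, estimates[1:])]
    for k in range(len(steps)):
        if all(step < tol for step in steps[k:]):
            return gammas[k]
    return None


gammas = [0.0, 0.1, 0.2, 0.5, 1.0, 1.5, 2.0]
# DP-IPW, eps = 0.10 (means reported in Table 5), used only as an example path.
path = [4.493, 3.142, 3.015, 2.989, 2.977, 2.969, 2.963]
chosen = smallest_stable_gamma(gammas, path, tol=0.05)
print(chosen)  # 0.2
```

In practice the grid of $\gamma$ values and the tolerance would need to be adapted to the scale of the outcome.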

| Estimator | PS/OR | $\varepsilon$ | $\gamma=0.0$ | $\gamma=0.1$ | $\gamma=0.2$ | $\gamma=0.5$ | $\gamma=1.0$ | $\gamma=1.5$ | $\gamma=2.0$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DP-IPW | T/- | 0.00 | 3.004 (0.22) | 2.998 (0.22) | 2.994 (0.22) | 2.986 (0.23) | 2.980 (0.26) | 2.974 (0.30) | 2.970 (0.34) |
| | | 0.05 | 3.749 (0.59) | 3.030 (0.27) | 2.999 (0.26) | 2.987 (0.25) | 2.978 (0.27) | 2.970 (0.30) | 2.963 (0.33) |
| | | 0.10 | 4.493 (0.78) | 3.142 (0.51) | 3.015 (0.32) | 2.989 (0.27) | 2.977 (0.27) | 2.969 (0.30) | 2.963 (0.33) |
| | | 0.20 | 5.983 (1.02) | 4.492 (1.70) | 3.536 (1.39) | 3.052 (0.64) | 2.990 (0.41) | 2.978 (0.39) | 2.971 (0.40) |
| DP-DR | T/T | 0.00 | 2.999 (0.18) | 2.998 (0.18) | 2.997 (0.19) | 2.996 (0.20) | 2.992 (0.24) | 2.989 (0.28) | 2.985 (0.31) |
| | | 0.05 | 3.745 (0.60) | 3.029 (0.30) | 3.002 (0.27) | 2.997 (0.29) | 2.991 (0.29) | 2.985 (0.31) | 2.980 (0.34) |
| | | 0.10 | 4.489 (0.79) | 3.140 (0.55) | 3.017 (0.36) | 3.000 (0.33) | 2.992 (0.31) | 2.986 (0.32) | 2.981 (0.33) |
| | | 0.20 | 5.979 (1.04) | 4.465 (1.72) | 3.532 (1.41) | 3.060 (0.69) | 3.009 (0.52) | 2.999 (0.51) | 2.994 (0.51) |
| | T/F | 0.00 | 3.004 (0.24) | 2.998 (0.24) | 2.994 (0.24) | 2.986 (0.25) | 2.979 (0.28) | 2.974 (0.32) | 2.968 (0.36) |
| | | 0.05 | 3.750 (0.60) | 3.033 (0.31) | 3.001 (0.29) | 2.989 (0.32) | 2.978 (0.33) | 2.970 (0.36) | 2.963 (0.39) |
| | | 0.10 | 4.494 (0.78) | 3.150 (0.54) | 3.020 (0.37) | 2.992 (0.35) | 2.979 (0.35) | 2.970 (0.37) | 2.963 (0.39) |
| | | 0.20 | 5.984 (1.03) | 4.490 (1.71) | 3.546 (1.41) | 3.059 (0.71) | 3.001 (0.58) | 2.985 (0.55) | 2.975 (0.54) |
| | F/T | 0.00 | 2.999 (0.18) | 2.999 (0.18) | 2.999 (0.18) | 3.001 (0.20) | 3.005 (0.23) | 3.010 (0.26) | 3.014 (0.29) |
| | | 0.05 | 3.725 (0.50) | 2.997 (0.19) | 2.976 (0.19) | 2.975 (0.20) | 2.978 (0.23) | 2.982 (0.26) | 2.986 (0.29) |
| | | 0.10 | 4.451 (0.65) | 3.051 (0.34) | 2.956 (0.21) | 2.950 (0.21) | 2.953 (0.23) | 2.956 (0.26) | 2.960 (0.28) |
| | | 0.20 | 5.902 (0.86) | 4.326 (1.57) | 3.301 (1.15) | 2.907 (0.35) | 2.895 (0.25) | 2.897 (0.26) | 2.900 (0.28) |
| $\varepsilon$DP-DR | T/T | 0.00 | 2.999 (0.18) | 2.998 (0.18) | 2.997 (0.19) | 2.996 (0.20) | 2.992 (0.24) | 2.989 (0.28) | 2.985 (0.31) |
| | | 0.05 | 3.745 (0.60) | 3.028 (0.29) | 3.002 (0.27) | 2.997 (0.26) | 2.991 (0.29) | 2.985 (0.31) | 2.980 (0.34) |
| | | 0.10 | 4.489 (0.78) | 3.138 (0.54) | 3.017 (0.35) | 2.999 (0.30) | 2.991 (0.30) | 2.985 (0.32) | 2.980 (0.33) |
| | | 0.20 | 5.978 (1.03) | 4.464 (1.72) | 3.531 (1.40) | 3.058 (0.67) | 3.007 (0.51) | 2.998 (0.50) | 2.993 (0.51) |
| | T/F | 0.00 | 3.004 (0.24) | 2.998 (0.24) | 2.994 (0.24) | 2.986 (0.25) | 2.979 (0.28) | 2.974 (0.32) | 2.968 (0.36) |
| | | 0.05 | 3.750 (0.60) | 3.033 (0.31) | 3.001 (0.29) | 2.989 (0.32) | 2.978 (0.33) | 2.970 (0.36) | 2.963 (0.39) |
| | | 0.10 | 4.493 (0.78) | 3.149 (0.54) | 3.020 (0.36) | 2.992 (0.34) | 2.978 (0.34) | 2.970 (0.37) | 2.963 (0.39) |
| | | 0.20 | 5.983 (1.02) | 4.489 (1.71) | 3.543 (1.40) | 3.057 (0.69) | 2.998 (0.55) | 2.984 (0.54) | 2.976 (0.54) |
| | F/T | 0.00 | 2.999 (0.18) | 2.999 (0.18) | 2.999 (0.18) | 3.001 (0.20) | 3.005 (0.23) | 3.010 (0.26) | 3.014 (0.29) |
| | | 0.05 | 3.746 (0.50) | 3.020 (0.19) | 2.998 (0.19) | 2.998 (0.20) | 3.001 (0.23) | 3.005 (0.26) | 3.009 (0.29) |
| | | 0.10 | 4.493 (0.66) | 3.108 (0.37) | 3.004 (0.20) | 2.998 (0.21) | 3.001 (0.23) | 3.004 (0.26) | 3.007 (0.28) |
| | | 0.20 | 5.986 (0.87) | 4.541 (1.58) | 3.486 (1.24) | 3.020 (0.38) | 3.003 (0.24) | 3.005 (0.25) | 3.008 (0.27) |

Table 5: Results of the $\gamma$-sensitivity study. Each figure displays the mean (SD) of 10,000 simulations for each setting. In the second column, "T" and "F" denote correct and incorrect modeling, respectively.

7 Real Data Analysis

In this section, we demonstrate the estimation of the ATE on a real dataset. We use the data of the National Health and Nutrition Examination Survey Data I Epidemiologic Follow-up Study (NHEFS). The NHEFS is a national longitudinal study that was performed by U.S. public agencies. We use the processed dataset available online (Hernán and Robins, 2020, https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/). The NHEFS dataset contains 1,566 observations of smokers who were enrolled in the study in 1971–75. By the follow-up visit in 1982, 403 (25.7%) participants had quit smoking. The study goal was to evaluate the treatment effect of smoking cessation ($T=1$) on weight gain ($Y$). Other than the treatment and outcome, several baseline variables were collected, including sex, age, race, education level, intensity and duration of smoking, physical activity in daily life, recreational exercise, and baseline weight. We used all of them to control for confounding in a manner similar to that of Hernán and Robins (2020). We included linear and quadratic terms for all continuous covariates (age, intensity and duration of smoking, and baseline weight) and dummy terms for the discrete covariates. The propensity score was estimated by logistic regression, and outcome regression was performed by unnormalized Gaussian modeling (the tuning parameter was set to 0.2). The original dataset does not contain obvious outliers; therefore, we randomly replaced 10% of the observations with outliers drawn from $\mathcal{N}(100,5^{2})$. Then, we estimated $\mu^{(1)}$, $\mu^{(0)}$, and the ATE by the same methods as in the Monte Carlo simulations. This process was repeated 10,000 times, and we summarized the results in Table 6. For reference, we estimated every target quantity with the naive IPW/DR estimators on the original data.

For the IPW-type estimators, the median-based methods gave larger estimates of $\mu^{(1)}$ and $\mu^{(0)}$ than IPW (no outliers). In particular, $\mu^{(0)}$ was estimated to be much larger. As a result, when using the median-based methods, the ATE was estimated to be smaller than with IPW (no outliers). By contrast, DP-IPW overestimated $\mu^{(1)}$ with $\gamma=0.05$ and underestimated $\mu^{(1)}$ with $\gamma\geq 0.10$. It overestimated $\mu^{(0)}$ compared with IPW (no outliers), and this tendency was strengthened by increasing $\gamma$. However, because the extent of overestimation of $\mu^{(0)}$ was smaller than that of the median-based methods, the estimate of the ATE by DP-IPW was closer to that obtained using IPW (no outliers) than the estimates by the median-based methods. The DR-type estimators showed similar results. The median-based methods overestimated $\mu^{(1)}$ and $\mu^{(0)}$. DP-DR and $\varepsilon$DP-DR underestimated $\mu^{(1)}$ and overestimated $\mu^{(0)}$. The ATE was estimated better by DP-DR and $\varepsilon$DP-DR than by the median-based methods. DP-DR and $\varepsilon$DP-DR showed the same relationship between the estimation bias and $\gamma$: a larger $\gamma$ increased the bias.

| Method | $\mu^{(1)}$ | $\mu^{(0)}$ | ATE |
| --- | --- | --- | --- |
| IPW (no outliers) | 5.221 (-) | 1.780 (-) | 3.441 (-) |
| IPW | 14.718 (1.57) | 11.607 (0.87) | 3.111 (1.78) |
| median (Firpo) | 5.439 (0.21) | 2.753 (0.10) | 2.686 (0.24) |
| median (Zhang-IPW) | 5.439 (0.21) | 2.753 (0.10) | 2.686 (0.24) |
| DP-IPW ($\gamma=0.05$) | 5.597 (0.30) | 1.851 (0.07) | 3.746 (0.31) |
| DP-IPW ($\gamma=0.10$) | 5.157 (0.15) | 1.819 (0.07) | 3.338 (0.17) |
| DP-IPW ($\gamma=0.20$) | 5.089 (0.15) | 1.875 (0.06) | 3.215 (0.16) |
| DP-IPW ($\gamma=0.50$) | 4.949 (0.15) | 2.007 (0.06) | 2.941 (0.16) |
| DR (no outliers) | 5.136 (-) | 1.772 (-) | 3.364 (-) |
| DR | 14.574 (1.57) | 11.589 (0.90) | 2.985 (1.81) |
| median (Zhang-DR) | 5.352 (0.20) | 2.743 (0.10) | 2.609 (0.22) |
| median (Sued) | 5.353 (0.20) | 2.744 (0.10) | 2.609 (0.23) |
| median (TMLE) | 5.363 (0.21) | 2.739 (0.10) | 2.624 (0.23) |
| DP-DR ($\gamma=0.05$) | 5.478 (0.27) | 1.842 (0.07) | 3.636 (0.28) |
| DP-DR ($\gamma=0.10$) | 5.057 (0.16) | 1.810 (0.07) | 3.248 (0.17) |
| DP-DR ($\gamma=0.20$) | 4.983 (0.16) | 1.865 (0.06) | 3.119 (0.17) |
| DP-DR ($\gamma=0.50$) | 4.834 (0.16) | 1.997 (0.06) | 2.837 (0.17) |
| $\varepsilon$DP-DR ($\gamma=0.05$) | 5.574 (0.29) | 1.851 (0.07) | 3.723 (0.30) |
| $\varepsilon$DP-DR ($\gamma=0.10$) | 5.148 (0.15) | 1.819 (0.07) | 3.330 (0.17) |
| $\varepsilon$DP-DR ($\gamma=0.20$) | 5.080 (0.15) | 1.874 (0.06) | 3.206 (0.17) |
| $\varepsilon$DP-DR ($\gamma=0.50$) | 4.937 (0.15) | 2.007 (0.06) | 2.930 (0.16) |

Table 6: Results of the NHEFS data analysis. Mean and SD are computed on 2,000 bootstrap samples.

Acknowledgements

This work was partially supported by JSPS KAKENHI Grant Number 17K00065.

References

  • Robins et al. [1994] James M Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866, 1994.
  • Rosenbaum and Rubin [1983] Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
  • Bang and Robins [2005] Heejung Bang and James M Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4):962–973, 2005.
  • Rousseeuw and van Zomeren [1990] Peter J Rousseeuw and Bert C van Zomeren. Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85(411):633–639, 1990.
  • Canavire-Bacarreza et al. [2021] Gustavo Canavire-Bacarreza, Luis Castro Peñarrieta, and Darwin Ugarte Ontiveros. Outliers in Semi-Parametric estimation of treatment effects. Econometrics, 9(2):19, April 2021.
  • Huber [2004] Peter J Huber. Robust statistics, volume 523. John Wiley & Sons, 2004.
  • Hampel et al. [2011] Frank R Hampel, Elvezio M Ronchetti, Peter J Rousseeuw, and Werner A Stahel. Robust statistics: the approach based on influence functions, volume 196. John Wiley & Sons, 2011.
  • Maronna et al. [2019] Ricardo A Maronna, R Douglas Martin, Victor J Yohai, and Matías Salibián-Barrera. Robust statistics: theory and methods (with R). John Wiley & Sons, 2019.
  • Firpo [2007] Sergio Firpo. Efficient semiparametric estimation of quantile treatment effects. Econometrica, 75(1):259–276, 2007.
  • Zhang et al. [2012] Zhiwei Zhang, Zhen Chen, James F Troendle, and Jun Zhang. Causal inference on quantiles with an obstetric application. Biometrics, 68(3):697–706, 2012.
  • Díaz [2017] Iván Díaz. Efficient estimation of quantiles in missing data models. Journal of Statistical Planning and Inference, 190:39–51, 2017.
  • Sued et al. [2020] Mariela Sued, Marina Valdora, and Víctor Yohai. Robust doubly protected estimators for quantiles with missing data. TEST, 63(3):819–843, 2020.
  • Fujisawa and Eguchi [2008] Hironori Fujisawa and Shinto Eguchi. Robust parameter estimation with a small bias against heavy contamination. Journal of Multivariate Analysis, 99(9):2053–2081, 2008.
  • Imbens and Rubin [2015] Guido W Imbens and Donald B Rubin. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, April 2015.
  • Lunceford and Davidian [2004] Jared K Lunceford and Marie Davidian. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in medicine, 23(19):2937–2960, 2004.
  • Scharfstein et al. [1999] Daniel O Scharfstein, Andrea Rotnitzky, and James M Robins. Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448):1096–1120, 1999.
  • Robins and Rotnitzky [1995] James M Robins and Andrea Rotnitzky. Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90(429):122–129, 1995.
  • Tsiatis [2006] Anastasios Tsiatis. Semiparametric Theory and Missing Data. Springer New York, June 2006.
  • Van der Vaart [2000] Aad W Van der Vaart. Asymptotic statistics, volume 3. Cambridge university press, 2000.
  • Hoshino [2007] Takahiro Hoshino. Doubly robust-type estimation for covariate adjustment in latent variable modeling. Psychometrika, 72(4):535–549, 2007.
  • Fujisawa [2013] Hironori Fujisawa. Normalized estimating equation for robust parameter estimation. Electronic Journal of Statistics, 7:1587–1606, 2013.
  • Maechler et al. [2021] Martin Maechler, Peter Rousseeuw, Christophe Croux, Valentin Todorov, Andreas Ruckstuhl, Matias Salibian-Barrera, Tobias Verbeke, Manuel Koller, Eduardo LT Conceicao, and Maria Anna di Palma. robustbase: Basic Robust Statistics, 2021. R package version 0.93.9.
  • Windham [1995] Michael P Windham. Robustifying model fitting. Journal of the Royal Statistical Society. Series B (Methodological), pages 599–609, 1995.
  • Basu et al. [1998] Ayanendranath Basu, Ian R Harris, Nils L Hjort, and MC Jones. Robust and efficient estimation by minimising a density power divergence. Biometrika, 85(3):549–559, 1998.
  • Jones et al. [2001] MC Jones, Nils Lid Hjort, Ian R Harris, and Ayanendranath Basu. A comparison of related density-based minimum divergence estimators. Biometrika, 88(3):865–873, 2001.
  • Yohai [1987] Victor J Yohai. High breakdown-point and high efficiency robust estimates for regression. The Annals of Statistics, pages 642–656, 1987.
  • Kanamori and Fujisawa [2015] Takafumi Kanamori and Hironori Fujisawa. Robust estimation under heavy contamination using unnormalized models. Biometrika, 102(3):559–572, 2015.
  • Kawashima and Fujisawa [2017] Takayuki Kawashima and Hironori Fujisawa. Robust and sparse regression via $\gamma$-divergence. Entropy, 19(11):608, 2017.
  • Van Der Laan and Rubin [2006] Mark J Van Der Laan and Daniel Rubin. Targeted maximum likelihood learning. The international journal of biostatistics, 2(1), 2006.
  • Hernán and Robins [2020] Miguel A Hernán and James M Robins. Causal inference: what if. Boca Raton: Chapman & Hall/CRC, 2020.

Appendix A Proofs for Unbiasedness of Estimating Equations

Appendix A.1 Proof of Theorem 1

Proof.
\begin{align*}
\mathbb{E}_{g}\left[\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\mu^{(1)})\right]
&=\mathbb{E}_{g}\left[\mathbb{E}_{g}\left[\left.\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\mu^{(1)})\right|X\right]\right]\\
&=\mathbb{E}_{g}\left[\frac{P(T=1|X)}{\pi(X;\alpha^{*})}\mathbb{E}_{g}\left[\left.h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|T=1,X\right]\right]\\
&=\mathbb{E}_{f_{1}}\left[h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right].
\end{align*}

The third equality follows from causal consistency, conditional unconfoundedness, and the fact that $\pi(X;\alpha^{*})$ is the true propensity score $P(T=1|X)$. Since $h(y;\mu^{(1)})$ and $f_{1}(y)$ are symmetric about $\mu^{(1)}$, this expectation is equal to zero:

\begin{align*}
\mathbb{E}_{f_{1}}\left[h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right]=\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})f_{1}(y)\,dy=0.
\end{align*}

Appendix A.2 Proof of Theorem 2

Proof.
\begin{align*}
&\mathbb{E}_{\tilde{g}}\left[\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\mu^{(1)})\right]\\
&=\mathbb{E}_{g}\left[\mathbb{E}_{\tilde{g}}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]\right]\\
&=\int\left\{(1-\varepsilon_{1}(x))\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})g(y|x)\,dy+\varepsilon_{1}(x)\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})\delta_{1}(y|x)\,dy\right\}g(x)\,dx\\
&=\int(1-\varepsilon_{1}(x))\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})g(y|x)\,dy\,g(x)\,dx+\nu_{1}(\varepsilon_{1}) \tag{A1}\\
&=-\int\varepsilon_{1}(x)\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})g(y|x)\,dy\,g(x)\,dx+\nu_{1}(\varepsilon_{1}). \tag{A2}
\end{align*}

If $\varepsilon_{1}(x)=\varepsilon_{1}$, that is, if the contamination ratio does not depend on $x$, the first term vanishes:

\begin{align*}
-\varepsilon_{1}\iint h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})g(y|x)g(x)\,dy\,dx=-\varepsilon_{1}\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})f_{1}(y)\,dy=0.
\end{align*}
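
The two ingredients of this proof can be checked numerically. The sketch below is only an illustration under assumptions that are not part of the paper: $h(\cdot;\mu)$ is taken to be the $N(\mu,1)$ density, the clean outcome density $f_{1}$ is $N(1,1)$, and the outlier density is a hypothetical $N(10,0.5^{2})$ that does not depend on $x$. It evaluates the symmetric integral (which should vanish) and the inner integral of $\nu_{1}$ per unit of contamination, with and without density power weighting.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def integrate(func, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * width)
    return total * width

def dp_score_mean(gamma, mu, density, lo, hi):
    """Integral of h(y; mu)^gamma * (y - mu) * density(y) dy.

    Assumption: h(.; mu) is the N(mu, 1) density.
    """
    def integrand(y):
        return normal_pdf(y, mu, 1.0) ** gamma * (y - mu) * density(y)
    return integrate(integrand, lo, hi)

mu1 = 1.0                                        # true mean mu^(1) (illustrative)
f1 = lambda y: normal_pdf(y, mu1, 1.0)           # clean density, symmetric about mu1
delta = lambda y: normal_pdf(y, 10.0, 0.5)       # hypothetical outlier density

# Clean part of the estimating equation: vanishes by symmetry.
clean_term = dp_score_mean(0.5, mu1, f1, -20.0, 30.0)
# Outlier part (nu_1 per unit of contamination) with gamma = 0.5.
nu_weighted = dp_score_mean(0.5, mu1, delta, -20.0, 30.0)
# Same outlier part with gamma = 0, i.e. the ordinary IPW score.
nu_unweighted = dp_score_mean(0.0, mu1, delta, -20.0, 30.0)

print(clean_term, nu_weighted, nu_unweighted)
```

In this toy setting the unweighted outlier term equals the distance between the outlier mean and $\mu^{(1)}$, whereas the weighted term is smaller by many orders of magnitude, which is the sense in which $\nu_{1}$ is negligible when outliers lie far from the bulk of the data.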

Appendix A.3 Proof of Theorem 3

Proof.

First, we assume that the true PS is given.

\begin{align*}
&\mathbb{E}_{g}\left[\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\mu^{(1)})-\frac{T-\pi(X;\alpha^{*})}{\pi(X;\alpha^{*})}\mathbb{E}_{\hat{q}}\left[\left.h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|T=1,X\right]\right]\\
&=\mathbb{E}_{g}\left[\mathbb{E}_{g}\left[\left.\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\mu^{(1)})-\frac{T-\pi(X;\alpha^{*})}{\pi(X;\alpha^{*})}\mathbb{E}_{\hat{q}}\left[\left.h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|T=1,X\right]\right|X\right]\right]\\
&=\mathbb{E}_{g}\left[-\mathbb{E}_{g}\left[\left.\frac{T-\pi(X;\alpha^{*})}{\pi(X;\alpha^{*})}\right|X\right]\mathbb{E}_{\hat{q}}\left[\left.h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|T=1,X\right]\right]\\
&=0.
\end{align*}

In the third line, the first term has been dropped because it vanishes by Theorem 1, and the remaining inner expectation is zero because $\mathbb{E}_{g}[T|X]=\pi(X;\alpha^{*})$ under the true PS.

Next, we assume that the true OR model is given.

\begin{align*}
&\mathbb{E}_{g}\left[\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha)}(Y-\mu^{(1)})-\frac{T-\pi(X;\alpha)}{\pi(X;\alpha)}\mathbb{E}_{g}\left[\left.h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|T=1,X\right]\right]\\
&=\mathbb{E}_{g}\left[\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha)}(Y-\mu^{(1)})-\frac{T-\pi(X;\alpha)}{\pi(X;\alpha)}\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]\right]\\
&=\mathbb{E}_{g}\left[\mathbb{E}_{g}\left[\left.\frac{T}{\pi(X;\alpha)}h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|X\right]
-\mathbb{E}_{g}\left[\left.\frac{T}{\pi(X;\alpha)}-1\right|X\right]\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]\right]\\
&=\mathbb{E}_{g}\left[\frac{P(T=1|X)}{\pi(X;\alpha)}\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]
-\left(\frac{P(T=1|X)}{\pi(X;\alpha)}-1\right)\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]\right]\\
&=\mathbb{E}_{g}\left[\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]\right]\\
&=0.
\end{align*}

Thus, the DP-DR estimating equation has double robustness under no contamination. ∎
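
The cancellation in the last lines of this proof can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the paper: a single covariate $X\sim N(0,1)$, true propensity score $\mathrm{expit}(x)$, $Y^{(1)}|X=x\sim N(1+x,1)$ so that $\mu^{(1)}=1$, $h(\cdot;\mu)$ equal to the $N(\mu,1)$ density, $\gamma=0.5$, a misspecified constant PS model $\pi(x;\alpha)=0.3$, and a misspecified OR model with constant mean $0$. The conditional expectation of $h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})$ under a Gaussian outcome has a closed form, so only the outer expectation over $X$ is computed by quadrature.

```python
import math

GAMMA = 0.5                              # density-power exponent (illustrative)
MU1 = 1.0                                # true mean mu^(1) in this toy model
C_H = 1.0 / math.sqrt(2.0 * math.pi)     # normalising constant of the N(mu, 1) density

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def weighted_residual_mean(cond_mean):
    """Closed form of E[h(Y; MU1)^GAMMA (Y - MU1)] for Y ~ N(cond_mean, 1),
    assuming h(.; mu) is the N(mu, 1) density."""
    scale = 1.0 + GAMMA
    shift = cond_mean - MU1
    return (C_H ** GAMMA) * scale ** -1.5 * shift * math.exp(
        -GAMMA * shift ** 2 / (2.0 * scale))

def dp_dr_expectation(ps_model, or_mean, grid=4001, half_width=8.0):
    """Expectation over X ~ N(0, 1) of the conditional DP-DR estimating
    function at mu = MU1, via the trapezoidal rule on a symmetric grid."""
    step = 2.0 * half_width / (grid - 1)
    total = 0.0
    for k in range(grid):
        x = -half_width + k * step
        weight = 0.5 if k in (0, grid - 1) else 1.0
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        ratio = expit(x) / ps_model(x)                    # P(T=1|x) / pi(x; alpha)
        true_term = weighted_residual_mean(MU1 + x)       # under the true outcome law
        model_term = weighted_residual_mean(or_mean(x))   # under the working OR model
        total += weight * density * (ratio * true_term - (ratio - 1.0) * model_term)
    return total * step

ps_wrong = lambda x: 0.3          # hypothetical misspecified PS model
or_true = lambda x: MU1 + x       # true outcome regression mean
or_wrong = lambda x: 0.0          # hypothetical misspecified OR mean

ps_correct = dp_dr_expectation(expit, or_wrong)     # true PS, wrong OR
or_correct = dp_dr_expectation(ps_wrong, or_true)   # wrong PS, true OR
both_wrong = dp_dr_expectation(ps_wrong, or_wrong)  # both misspecified

print(ps_correct, or_correct, both_wrong)
```

In this toy model the expectation is numerically zero whenever at least one of the two working models is correct and clearly non-zero when both are wrong, in line with the double robustness shown above.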

Appendix A.4 Proof of Theorem 4

Proof.

If the true PS model is given, the DP-DR estimating equation yields

\begin{align*}
&\mathbb{E}_{\tilde{g}}\left[\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\mu^{(1)})-\frac{T-\pi(X;\alpha^{*})}{\pi(X;\alpha^{*})}\mathbb{E}_{\hat{q}}\left[\left.h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|T=1,X\right]\right]\\
&=\mathbb{E}_{g}\left[\mathbb{E}_{\tilde{g}}\left[\left.\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\mu^{(1)})-\frac{T-\pi(X;\alpha^{*})}{\pi(X;\alpha^{*})}\mathbb{E}_{\hat{q}}\left[\left.h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|T=1,X\right]\right|X\right]\right]\\
&=\mathbb{E}_{g}\left[\mathbb{E}_{\tilde{g}}\left[\left.\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\mu^{(1)})\right|X\right]\right]
-\underbrace{\mathbb{E}_{g}\left[\mathbb{E}_{g}\left[\left.\frac{T-\pi(X;\alpha^{*})}{\pi(X;\alpha^{*})}\right|X\right]\mathbb{E}_{\hat{q}}\left[\left.h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|T=1,X\right]\right]}_{=0}\\
&=-\int\varepsilon_{1}(x)\int h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})g(y|x)\,dy\,g(x)\,dx+\nu_{1}(\varepsilon_{1}). \tag{A3}
\end{align*}

If the contamination ratio is independent of $X$, it holds that

\begin{align*}
-\varepsilon_{1}\iint h(y;\mu^{(1)})^{\gamma}(y-\mu^{(1)})g(y|x)\,dy\,g(x)\,dx=0,
\end{align*}

which is the same result as the DP-IPW estimating equation.

If the true OR model is given, the DP-DR estimating equation yields

\begin{align*}
&\mathbb{E}_{\tilde{g}}\left[\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha)}(Y-\mu^{(1)})-\frac{T-\pi(X;\alpha)}{\pi(X;\alpha)}\mathbb{E}_{g}\left[\left.h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|T=1,X\right]\right]\\
&=\mathbb{E}_{\tilde{g}}\left[\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha)}(Y-\mu^{(1)})-\frac{T-\pi(X;\alpha)}{\pi(X;\alpha)}\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]\right]\\
&=\mathbb{E}_{g}\left[\frac{P(T=1|X)}{\pi(X;\alpha)}\left((1-\varepsilon_{1}(X))\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]+\varepsilon_{1}(X)\mathbb{E}_{\delta}\left[\left.h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\right|X\right]\right)\right.\\
&\qquad\left.-\left(\frac{P(T=1|X)}{\pi(X;\alpha)}-1\right)\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]\right]\\
&=\mathbb{E}_{g}\left[-\varepsilon_{1}(X)\frac{P(T=1|X)}{\pi(X;\alpha)}\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]\right]+\nu_{1}\bigl(\varepsilon_{1}(\cdot)P(T=1|\cdot)/\pi(\cdot;\alpha)\bigr). \tag{A4}
\end{align*}

When the contamination ratio is independent of $X$, the first term becomes

\begin{align*}
-\varepsilon_{1}\mathbb{E}_{g}\left[\frac{P(T=1|X)}{\pi(X;\alpha)}\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|X\right]\right]. \tag{A5}
\end{align*}

Thus, we have Theorem 4. ∎

Appendix B Derivation of Influence Functions in Section 4

Appendix B.1 DP-IPW

Let $\tilde{\mu}_{n}^{(1)}$ denote the root of the DP-IPW estimating equation under contamination, and let $\mu_{n}^{(1)}$ denote its value at $\varepsilon_{1}(X_{i})=0$.

\begin{align*}
0&=\left.\frac{\partial}{\partial\varepsilon_{1}(X_{i})}\left\{\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{\tilde{g}}\left[\left.\frac{Th(Y;\tilde{\mu}_{n}^{(1)})^{\gamma}}{\pi(X_{i};\alpha^{*})}(Y-\tilde{\mu}_{n}^{(1)})\right|X_{i}\right]\right\}\right|_{\varepsilon_{1}(X_{i})=0}\\
&=\left.\frac{\partial}{\partial\varepsilon_{1}(X_{i})}\iint\frac{t}{\pi(X_{i};\alpha^{*})}h(y;\tilde{\mu}_{n}^{(1)})^{\gamma}(y-\tilde{\mu}_{n}^{(1)})\{(1-\varepsilon_{1}(X_{i}))g(y|X_{i})+\varepsilon_{1}(X_{i})\delta_{y_{0}}(y)\}g(t|X_{i})\,dy\,dt\right|_{\varepsilon_{1}(X_{i})=0}\\
&=\iint\left.\frac{\partial\psi}{\partial\mu}\right|_{\mu=\mu^{(1)}_{n}}g(y|X_{i})g(t|X_{i})\,dy\,dt\cdot IF_{\mathrm{DP\text{-}IPW}}(y_{0})
+\iint\frac{t}{\pi(X_{i};\alpha^{*})}h(y;\mu_{n}^{(1)})^{\gamma}(y-\mu_{n}^{(1)})\delta_{y_{0}}(y)g(t|X_{i})\,dy\,dt\\
&=\mathbb{E}_{g}\left[\left.\left.\frac{\partial\psi}{\partial\mu}\right|_{\mu=\mu^{(1)}_{n}}\right|X_{i}\right]\cdot IF_{\mathrm{DP\text{-}IPW}}(y_{0})+h(y_{0};\mu_{n}^{(1)})^{\gamma}(y_{0}-\mu^{(1)}_{n}).
\end{align*}

If $\mathbb{E}_{g}\left[\left.\left.\frac{\partial\psi}{\partial\mu}\right|_{\mu=\mu^{(1)}_{n}}\right|X_{i}\right]$ is invertible, we obtain the IF as

\begin{align*}
-\mathbb{E}_{g}\left[\left.\left.\frac{\partial\psi}{\partial\mu}\right|_{\mu=\mu^{(1)}_{n}}\right|X_{i}\right]^{-1}h(y_{0};\mu_{n}^{(1)})^{\gamma}(y_{0}-\mu^{(1)}_{n}). \tag{A6}
\end{align*}

Appendix B.2 DP-DR

\begin{align*}
0&=\frac{\partial}{\partial\varepsilon_{1}(X_{i})}\left\{\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{\tilde{g}}\left[\left.\frac{Th(Y;\tilde{\mu}_{n}^{(1)})^{\gamma}}{\pi(X_{i};\alpha^{*})}(Y-\tilde{\mu}_{n}^{(1)})\right|X_{i}\right]\right.\\
&\qquad\left.\left.-\mathbb{E}\left[\left.\frac{T-\pi(X_{i};\alpha^{*})}{\pi(X_{i};\alpha^{*})}\right|X_{i}\right]\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\tilde{\mu}_{n}^{(1)})^{\gamma}(Y^{(1)}-\tilde{\mu}_{n}^{(1)})\right|X_{i}\right]\right\}\right|_{\varepsilon_{1}(X_{i})=0}\\
&=\frac{\partial}{\partial\varepsilon_{1}(X_{i})}\left\{\iint\frac{t}{\pi(X_{i};\alpha^{*})}h(y;\tilde{\mu}_{n}^{(1)})^{\gamma}(y-\tilde{\mu}_{n}^{(1)})\{(1-\varepsilon_{1}(X_{i}))g(y|X_{i})+\varepsilon_{1}(X_{i})\delta_{y_{0}}(y)\}g(t|X_{i})\right.\\
&\qquad\left.\left.-\frac{t-\pi(X_{i};\alpha^{*})}{\pi(X_{i};\alpha^{*})}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\tilde{\mu}_{n}^{(1)})^{\gamma}(Y^{(1)}-\tilde{\mu}_{n}^{(1)})\right|X_{i}\right]g(t|X_{i})\,dy\,dt\right\}\right|_{\varepsilon_{1}(X_{i})=0}\\
&=\mathbb{E}_{g}\left[\left.\left.\frac{\partial\psi}{\partial\mu}\right|_{\mu=\mu^{(1)}_{n}}\right|X_{i}\right]\cdot IF(y_{0})+\frac{P(T=1|X_{i})}{\pi(X_{i};\alpha^{*})}h(y_{0};\mu_{n}^{(1)})^{\gamma}(y_{0}-\mu_{n}^{(1)})\\
&\qquad-\frac{P(T=1|X_{i})-\pi(X_{i};\alpha^{*})}{\pi(X_{i};\alpha^{*})}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\mu_{n}^{(1)})^{\gamma}(Y^{(1)}-\mu_{n}^{(1)})\right|X_{i}\right].
\end{align*}

Then, we obtain the IF as

\begin{align*}
-\mathbb{E}_{g}\left[\left.\left.\frac{\partial\psi}{\partial\mu}\right|_{\mu=\mu^{(1)}_{n}}\right|X_{i}\right]^{-1}
&\left\{\frac{P(T=1|X_{i})}{\pi(X_{i};\alpha)}h(y_{0};\mu_{n}^{(1)})^{\gamma}(y_{0}-\mu^{(1)}_{n})\right.\\
&\left.-\frac{P(T=1|X_{i})-\pi(X_{i};\alpha)}{\pi(X_{i};\alpha)}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\mu_{n}^{(1)})^{\gamma}(Y^{(1)}-\mu_{n}^{(1)})\right|X_{i}\right]\right\}. \tag{A7}
\end{align*}

Appendix B.3 $\varepsilon$DP-DR

Suppose the expected contamination ratio is correctly specified as $\overline{\varepsilon}_{1}=(1-\sum\varepsilon_{1}(X_{i})/n)$.

\begin{align*}
0&=\frac{\partial}{\partial\varepsilon_{1}(X_{i})}\left\{\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{\tilde{g}}\left[\left.\frac{Th(Y;\tilde{\mu}_{n}^{(1)})^{\gamma}}{\pi(X_{i};\alpha^{*})}(Y-\tilde{\mu}_{n}^{(1)})\right|X_{i}\right]\right.\\
&\qquad\left.\left.-\left(1-\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{1}(X_{i})\right)\mathbb{E}\left[\left.\frac{T-\pi(X_{i};\alpha^{*})}{\pi(X_{i};\alpha^{*})}\right|X_{i}\right]\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\tilde{\mu}_{n}^{(1)})^{\gamma}(Y^{(1)}-\tilde{\mu}_{n}^{(1)})\right|X_{i}\right]\right\}\right|_{\varepsilon_{1}(X_{i})=0}\\
&=\frac{\partial}{\partial\varepsilon_{1}(X_{i})}\left\{\iint\frac{t}{\pi(X_{i};\alpha^{*})}h(y;\tilde{\mu}_{n}^{(1)})^{\gamma}(y-\tilde{\mu}_{n}^{(1)})\{(1-\varepsilon_{1}(X_{i}))g(y|X_{i})+\varepsilon_{1}(X_{i})\delta_{y_{0}}(y)\}g(t|X_{i})\right.\\
&\qquad\left.\left.-\left(1-\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{1}(X_{i})\right)\frac{t-\pi(X_{i};\alpha^{*})}{\pi(X_{i};\alpha^{*})}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\tilde{\mu}_{n}^{(1)})^{\gamma}(Y^{(1)}-\tilde{\mu}_{n}^{(1)})\right|X_{i}\right]g(t|X_{i})\,dy\,dt\right\}\right|_{\varepsilon_{1}(X_{i})=0}\\
&=\mathbb{E}_{g}\left[\left.\left.\frac{\partial\psi}{\partial\mu}\right|_{\mu=\mu^{(1)}_{n}}\right|X_{i}\right]\cdot IF(y_{0})+\frac{P(T=1|X_{i})}{\pi(X_{i};\alpha^{*})}h(y_{0};\mu_{n}^{(1)})^{\gamma}(y_{0}-\mu_{n}^{(1)})\\
&\qquad-\frac{n-1}{n}\frac{P(T=1|X_{i})-\pi(X_{i};\alpha^{*})}{\pi(X_{i};\alpha^{*})}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\mu_{n}^{(1)})^{\gamma}(Y^{(1)}-\mu_{n}^{(1)})\right|X_{i}\right].
\end{align*}

Thus, we obtain (4.25).

Appendix C Influence Functions Under Homogeneous Contamination

Under homogeneous contamination, we can apply the ordinary IF analysis. By differentiating the estimating equations with respect to $\varepsilon_{1}$ at $\varepsilon_{1}=0$, we obtain the following results.

Appendix C.1 DP-IPW

\begin{align*}
0&=\left.\frac{\partial}{\partial\varepsilon_{1}}\mathbb{E}_{\tilde{g}}\left[\frac{Th(Y;\tilde{\mu}^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\tilde{\mu}^{(1)})\right]\right|_{\varepsilon_{1}=0}\\
&=\left.\frac{\partial}{\partial\varepsilon_{1}}\iiint\frac{th(y;\tilde{\mu}^{(1)})^{\gamma}}{\pi(x;\alpha^{*})}(y-\tilde{\mu}^{(1)})\{(1-\varepsilon_{1})g(y|t,x)+\varepsilon_{1}\delta_{y_{0}}(y|x)\}g(t|x)g(x)\,dy\,dt\,dx\right|_{\varepsilon_{1}=0}\\
&=\mathbb{E}_{g}\left[\left.\frac{\partial\psi}{\partial\varepsilon_{1}}\right|_{\mu=\mu^{(1)}}\right]\cdot IF(y_{0})+\iiint\frac{th(y;\mu^{(1)})^{\gamma}}{\pi(x;\alpha^{*})}(y-\mu^{(1)})\delta_{y_{0}}(y|x)g(t|x)g(x)\,dy\,dt\,dx,\\
IF(y_{0})&=\mathbb{E}_{g}\left[\left.\frac{\partial\psi}{\partial\varepsilon_{1}}\right|_{\mu=\mu^{(1)}}\right]^{-1}h(y_{0};\mu^{(1)})^{\gamma}(y_{0}-\mu^{(1)}). \tag{A8}
\end{align*}

Thus, DP-IPW has a redescending property under homogeneous contamination.
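
The redescending behaviour can be seen by evaluating the $y_{0}$-dependent factor $h(y_{0};\mu^{(1)})^{\gamma}(y_{0}-\mu^{(1)})$ of (A8) for increasingly distant outliers. The sketch below assumes, purely for illustration, that $h(\cdot;\mu)$ is the $N(\mu,1)$ density, $\mu^{(1)}=0$, and $\gamma=0.5$; the constant factor in front is ignored, and $\gamma=0$ is included as the unweighted reference.

```python
import math

def dp_score(y0, mu, gamma, sd=1.0):
    """h(y0; mu)^gamma * (y0 - mu), with h assumed to be the N(mu, sd^2) density."""
    z = (y0 - mu) / sd
    h = math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return (h ** gamma) * (y0 - mu)

mu1, gamma = 0.0, 0.5                 # illustrative values, not from the paper
outliers = [1.0, 2.0, 5.0, 10.0, 20.0]

# Density-power-weighted factor: rises, peaks, then decays towards zero.
weighted = [dp_score(y0, mu1, gamma) for y0 in outliers]
# gamma = 0 removes the weight, leaving the unbounded factor (y0 - mu).
unweighted = [dp_score(y0, mu1, 0.0) for y0 in outliers]
# For sd = 1 the weighted factor is largest at |y0 - mu| = 1 / sqrt(gamma).
peak = 1.0 / math.sqrt(gamma)

for y0, w, u in zip(outliers, weighted, unweighted):
    print(y0, w, u)
```

Under these assumptions the weighted factor decays to zero beyond $|y_{0}-\mu^{(1)}|=1/\sqrt{\gamma}$, while the unweighted factor grows linearly in $y_{0}$.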

Appendix C.2 DP-DR

\begin{align*}
0&=\left.\frac{\partial}{\partial\varepsilon_{1}}\mathbb{E}_{\tilde{g}}\left[\frac{Th(Y;\tilde{\mu}^{(1)})^{\gamma}}{\pi(X;\alpha)}(Y-\tilde{\mu}^{(1)})-\frac{T-\pi(X;\alpha)}{\pi(X;\alpha)}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\tilde{\mu}^{(1)})^{\gamma}(Y^{(1)}-\tilde{\mu}^{(1)})\right|X\right]\right]\right|_{\varepsilon_{1}=0}\\
&=\frac{\partial}{\partial\varepsilon_{1}}\iiint\left(\frac{th(y;\tilde{\mu}^{(1)})^{\gamma}}{\pi(x;\alpha)}(y-\tilde{\mu}^{(1)})-\frac{t-\pi(x;\alpha)}{\pi(x;\alpha)}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\tilde{\mu}^{(1)})^{\gamma}(Y^{(1)}-\tilde{\mu}^{(1)})\right|x\right]\right)\\
&\qquad\times\left.\{(1-\varepsilon_{1})g(y|t,x)+\varepsilon_{1}\delta_{y_{0}}(y|x)\}g(t|x)g(x)\,dy\,dt\,dx\right|_{\varepsilon_{1}=0}\\
&=\mathbb{E}_{g}\left[\left.\frac{\partial\psi}{\partial\varepsilon_{1}}\right|_{\mu=\mu^{(1)}}\right]\cdot IF(y_{0})+\iiint\left(\frac{th(y;\mu^{(1)})^{\gamma}}{\pi(x;\alpha)}(y-\mu^{(1)})\right.\\
&\qquad\left.-\frac{t-\pi(x;\alpha)}{\pi(x;\alpha)}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|x\right]\right)\delta_{y_{0}}(y|x)g(t|x)g(x)\,dy\,dt\,dx,\\
IF(y_{0})&=\mathbb{E}_{g}\left[\left.\frac{\partial\psi}{\partial\varepsilon_{1}}\right|_{\mu=\mu^{(1)}}\right]^{-1}\iint\left(\frac{th(y_{0};\mu^{(1)})^{\gamma}}{\pi(x;\alpha)}(y_{0}-\mu^{(1)})\right.\\
&\qquad\left.-\frac{t-\pi(x;\alpha)}{\pi(x;\alpha)}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|x\right]\right)g(t|x)g(x)\,dt\,dx. \tag{A9}
\end{align*}

If the true PS model is given, this IF reduces to

\begin{align*}
IF(y_{0})=\mathbb{E}_{g}\left[\left.\frac{\partial\psi}{\partial\varepsilon_{1}}\right|_{\mu=\mu^{(1)}}\right]^{-1}h(y_{0};\mu^{(1)})^{\gamma}(y_{0}-\mu^{(1)}). \tag{A10}
\end{align*}

If the true OR model is given, this IF reduces to

\begin{align*}
IF(y_{0})=\mathbb{E}_{g}\left[\left.\frac{\partial\psi}{\partial\varepsilon_{1}}\right|_{\mu=\mu^{(1)}}\right]^{-1}\int&\left\{\frac{P(T=1|x)}{\pi(x;\alpha)}h(y_{0};\mu^{(1)})^{\gamma}(y_{0}-\mu^{(1)})\right.\\
&\left.-\frac{P(T=1|x)-\pi(x;\alpha)}{\pi(x;\alpha)}\mathbb{E}_{g}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|x\right]\right\}g(x)\,dx. \tag{A11}
\end{align*}

Thus, DP-DR has a redescending property under homogeneous contamination in the PS-correct case.

Appendix C.3 $\varepsilon$DP-DR

\begin{align*}
0&=\frac{\partial}{\partial\varepsilon_{1}}\mathbb{E}_{\tilde{g}}\left[\frac{Th(Y;\tilde{\mu}^{(1)})^{\gamma}}{\pi(X;\alpha)}(Y-\tilde{\mu}^{(1)})\right.\\
&\qquad\left.\left.-\frac{T-\pi(X;\alpha)}{\pi(X;\alpha)}(1-\varepsilon_{1})\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\tilde{\mu}^{(1)})^{\gamma}(Y^{(1)}-\tilde{\mu}^{(1)})\right|X\right]\right]\right|_{\varepsilon_{1}=0}\\
&=\mathbb{E}_{g}\left[\left.\frac{\partial\psi}{\partial\varepsilon_{1}}\right|_{\mu=\mu^{(1)}}\right]\cdot IF(y_{0})\\
&\qquad+\iiint\left(\frac{th(y;\mu^{(1)})^{\gamma}}{\pi(x;\alpha)}(y-\mu^{(1)})-\frac{t-\pi(x;\alpha)}{\pi(x;\alpha)}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|x\right]\right.\\
&\qquad\qquad\left.+\frac{t-\pi(x;\alpha)}{\pi(x;\alpha)}\mathbb{E}_{\hat{q}}\left[\left.h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right|x\right]\right)\delta_{y_{0}}(y|x)g(t|x)g(x)\,dy\,dt\,dx,\\
IF(y_{0})&=\mathbb{E}_{g}\left[\left.\frac{\partial\psi}{\partial\varepsilon_{1}}\right|_{\mu=\mu^{(1)}}\right]^{-1}\int\frac{P(T=1|x)}{\pi(x;\alpha)}h(y_{0};\mu^{(1)})^{\gamma}(y_{0}-\mu^{(1)})g(x)\,dx. \tag{A12}
\end{align*}

Thus, under homogeneous contamination, $\varepsilon$DP-DR has a redescending property in either the PS-correct case or the OR-correct case.

Appendix D Further Discussion on Asymptotic Properties

Appendix D.1 Regularity Conditions for Theorem 5

Detailed discussion is available in Chapter 5 of Van der Vaart (2000), for example.

  (a) The function $S(\lambda)$ is twice continuously differentiable with respect to $\lambda$.

  (b) There exists a root $\lambda^{*}$ of $\mathbb{E}_{\tilde{g}}[S(\lambda)]=0$.

  (c) $\mathbb{E}_{\tilde{g}}[\|S(\lambda^{*})\|^{2}]<\infty$.

  (d) $\mathbb{E}_{\tilde{g}}[\partial S(\lambda^{*})/\partial\lambda^{T}]$ exists and is nonsingular.

  (e) The second-order differentials of $S(\lambda)$ with respect to $\mu$ are dominated by a fixed integrable function $h$ in a neighborhood of $\lambda^{*}$.

Appendix D.2 Proof of Theorem 6

Under homogeneous contamination, we see that simpler properties hold. The matrix $\mathbf{J}^{\tilde{g}}(\lambda^{*})$ is partitioned as

\begin{align*}
\mathbf{J}^{\tilde{g}}(\lambda^{*})
&=\left(\begin{array}{ccc}
\mathbb{E}_{\tilde{g}}\left[\frac{\partial}{\partial\mu}\psi_{i}(\mu^{*};\alpha^{*},\beta^{*})\right]&\mathbb{E}_{\tilde{g}}\left[\frac{\partial}{\partial\alpha^{T}}\psi_{i}(\mu^{*};\alpha^{*},\beta^{*})\right]&\mathbb{E}_{\tilde{g}}\left[\frac{\partial}{\partial\beta^{T}}\psi_{i}(\mu^{*};\alpha^{*},\beta^{*})\right]\\
\mathbf{0}&\mathbb{E}_{g}\left[\frac{\partial}{\partial\alpha^{T}}s^{PS}_{i}(\alpha^{*})\right]&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\mathbb{E}_{\tilde{g}}\left[\frac{\partial}{\partial\beta^{T}}s^{OR}_{i}(\beta^{*})\right]
\end{array}\right) \tag{A16}\\
&=\left(\begin{array}{ccc}
\mathbf{J}^{\tilde{g}}_{11}(\lambda^{*})&\mathbf{J}^{\tilde{g}}_{12}(\lambda^{*})&\mathbf{J}^{\tilde{g}}_{13}(\lambda^{*})\\
\mathbf{0}&\mathbf{J}^{g}_{22}(\lambda^{*})&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\mathbf{J}^{\tilde{g}}_{33}(\lambda^{*})
\end{array}\right). \tag{A20}
\end{align*}

If it is nonsingular, the inverse is obtained as

\begin{align*}
\mathbf{J}^{\tilde{g}}(\lambda^{*})^{-1}
=\left(\begin{array}{ccc}
\mathbf{J}^{\tilde{g}}_{11}(\lambda^{*})^{-1}&-\mathbf{J}^{\tilde{g}}_{11}(\lambda^{*})^{-1}\mathbf{J}^{\tilde{g}}_{12}(\lambda^{*})\mathbf{J}^{g}_{22}(\lambda^{*})^{-1}&-\mathbf{J}^{\tilde{g}}_{11}(\lambda^{*})^{-1}\mathbf{J}^{\tilde{g}}_{13}(\lambda^{*})\mathbf{J}^{\tilde{g}}_{33}(\lambda^{*})^{-1}\\
\mathbf{0}&\mathbf{J}^{g}_{22}(\lambda^{*})^{-1}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\mathbf{J}^{\tilde{g}}_{33}(\lambda^{*})^{-1}
\end{array}\right). \tag{A24}
\end{align*}

Note that $\mathbf{J}^{\tilde{g}}_{11}(\cdot)$ is a scalar value.
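
The closed-form inverse (A24) can be sanity-checked by multiplying it with the partitioned matrix. The sketch below uses hypothetical numeric values and, for simplicity, treats every block as $1\times 1$; in the paper only $\mathbf{J}^{\tilde{g}}_{11}$ is guaranteed to be scalar, so this is an illustration of the pattern rather than a general verification.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical scalar blocks (each block is 1 x 1 purely for illustration).
j11, j12, j13, j22, j33 = 2.0, 0.5, -1.5, 4.0, 0.8

# Partitioned matrix with the zero pattern of (A20).
J = [[j11, j12, j13],
     [0.0, j22, 0.0],
     [0.0, 0.0, j33]]

# Candidate inverse following the block formula (A24).
J_inv = [[1.0 / j11, -j12 / (j11 * j22), -j13 / (j11 * j33)],
         [0.0, 1.0 / j22, 0.0],
         [0.0, 0.0, 1.0 / j33]]

product = matmul(J, J_inv)   # should be the 3 x 3 identity up to rounding
for row in product:
    print(row)
```

The first row of the inverse is the only one that mixes blocks, which is why only the first element of the expansion in the proof below carries information about $\mu^{*}$.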

Then, Theorem 6 is proved as follows.

Proof.

By Taylor’s theorem, the expectation of the estimating equation (5.26) is expressed as

\begin{align*}
0=\mathbb{E}[S_{i}(\lambda^{*})]=\mathbb{E}[S_{i}(\lambda^{**})]+\mathbf{J}^{\tilde{g}}(\lambda^{\dagger})(\lambda^{*}-\lambda^{**}),
\end{align*}

where $\lambda^{\dagger}$ is an intermediate value between $\lambda^{**}$ and $\lambda^{*}$. Since $\mathbb{E}[s^{PS}_{i}(\alpha^{*})]=\mathbb{E}[s^{OR}_{i}(\beta^{*})]=0$ and $(\lambda^{*}-\lambda^{**})=(\mu^{*}-\mu^{(1)},\mathbf{0}^{T},\mathbf{0}^{T})^{T}$, only the first element is meaningful:

\begin{align*}
0=\mathbb{E}[\psi_{i}(\mu^{(1)};\alpha^{*},\beta^{*})]+\mathbf{J}^{\tilde{g}}_{11}(\lambda^{\dagger})(\mu^{*}-\mu^{(1)}).
\end{align*}

Then, since $\mathbf{J}^{\tilde{g}}_{11}(\lambda^{\dagger})$ is non-zero, the latent bias of $\mu^{*}$ reduces to

\begin{align*}
\mu^{*}-\mu^{(1)}=-\mathbf{J}^{\tilde{g}}_{11}(\lambda^{\dagger})^{-1}\mathbb{E}[\psi_{i}(\mu^{(1)};\alpha^{*},\beta^{*})]. \tag{A25}
\end{align*}

From Corollary 1, if either the PS or the OR model is correct, we have

\begin{align*}
\mathbb{E}[\psi_{i}(\mu^{(1)};\alpha^{*},\beta^{*})]=\nu_{1}(\phi).
\end{align*}

Upon substituting it into (A25), the statement holds. ∎

Appendix D.3 Further Discussion on Asymptotic Variance

Considering the structure of the full estimating equation, the asymptotic variance can be expressed in a more explicit form. The discussion about the asymptotic variance is provided in the next section.

The matrix $\mathbf{K}^{\tilde{g}}(\lambda^{*})$ is also partitioned as

\begin{align*}
\mathbf{K}^{\tilde{g}}(\lambda^{*})
=\left(\begin{array}{ccc}
\mathbf{K}^{\tilde{g}}_{11}(\lambda^{*})&\mathbf{K}^{\tilde{g}}_{12}(\lambda^{*})&\mathbf{K}^{\tilde{g}}_{13}(\lambda^{*})\\
\mathbf{K}^{\tilde{g}\,T}_{12}(\lambda^{*})&\mathbf{K}^{g}_{22}(\lambda^{*})&\mathbf{K}^{\tilde{g}}_{23}(\lambda^{*})\\
\mathbf{K}^{\tilde{g}\,T}_{13}(\lambda^{*})&\mathbf{K}^{\tilde{g}\,T}_{23}(\lambda^{*})&\mathbf{K}^{\tilde{g}}_{33}(\lambda^{*})
\end{array}\right). \tag{A29}
\end{align*}

The asymptotic variance is also affected by outliers. However, under Assumption 1, the asymptotic variance can be approximated by the asymptotic variance under no contamination and contamination ratio $\varepsilon_{1}$.

Theorem A7.

In addition to Assumption 1, assume that $\mathbf{J}^{\delta}_{1m}(\lambda^{**})\approx\mathbf{0}$ and $\mathbf{K}^{\delta}_{1m}(\lambda^{**})\approx\mathbf{0}$ hold for $m=1,2,3$. Then, under homogeneous contamination,

\begin{align*}
\mathbf{V}^{\tilde{g}}(\lambda^{*})\approx
\mathbf{J}^{\check{g}}(\lambda^{**})^{-1}
\left(\begin{array}{ccc}
\frac{1}{(1-\varepsilon_{1})}\mathbf{K}^{g}_{11}(\lambda^{**})&\mathbf{K}^{g}_{12}(\lambda^{**})&\mathbf{K}^{g}_{13}(\lambda^{**})\\
\mathbf{K}^{g}_{12}(\lambda^{**})^{T}&\mathbf{K}^{g}_{22}(\lambda^{**})&\mathbf{K}^{\tilde{g}}_{23}(\lambda^{**})\\
\mathbf{K}^{g}_{13}(\lambda^{**})^{T}&\mathbf{K}^{\tilde{g}}_{23}(\lambda^{**})^{T}&\mathbf{K}^{\tilde{g}}_{33}(\lambda^{**})
\end{array}\right)
\{\mathbf{J}^{\check{g}}(\lambda^{**})^{T}\}^{-1}, \tag{A33}
\end{align*}

where

\begin{align*}
\mathbf{J}^{\check{g}}(\lambda^{**})
=\left(\begin{array}{ccc}
\mathbf{J}^{g}_{11}(\lambda^{**})&\mathbf{J}^{g}_{12}(\lambda^{**})&\mathbf{J}^{g}_{13}(\lambda^{**})\\
\mathbf{0}&\mathbf{J}^{g}_{22}(\lambda^{**})&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\mathbf{J}^{\tilde{g}}_{33}(\lambda^{**})
\end{array}\right). \tag{A37}
\end{align*}

If both the PS and the OR models are correct, the asymptotic variance of $\hat{\mu}$ has a simpler expression. By an argument similar to that of Section 3, the following lemma holds:

Lemma A1.

If the PS model is correct, $\mathbf{J}^{g}_{13}(\lambda^{**})=\mathbf{0}$. If the OR model is correct, $\mathbf{J}^{g}_{12}(\lambda^{**})=\mathbf{0}$.

Using Lemma A1, the asymptotic variance of $\hat{\mu}$ takes a simple form.

Theorem A8.

Under the same assumptions as Theorem A7, if the PS and the OR models are both correct, then

\begin{align*}
\mathbf{V}^{\tilde{g}}(\mu^{*})\approx\frac{1}{1-\varepsilon_{1}}\mathbf{J}^{g}_{11}(\lambda^{**})^{-1}\mathbf{K}^{g}_{11}(\lambda^{**})\left\{\mathbf{J}^{g}_{11}(\lambda^{**})^{T}\right\}^{-1}.
\end{align*}
Proof.

By applying Lemma A1 to Theorem A7, the statement holds. ∎

This implies that the $\varepsilon$DP-DR appropriately ignores outliers.

Appendix D.3.1 Proof of Theorem A7

If either the PS or the OR model is correct, we can say $\mu^{*}\approx\mu^{(1)}$. Note that the PS model is not related to the contamination distribution $\delta$, and the contamination on the OR model cannot be removed in general. From the assumptions,

\begin{align*}
\mathbf{J}^{\tilde{g}}(\lambda^{*})
&\approx\mathbf{J}^{\tilde{g}}(\lambda^{**})\\
&=\left(\begin{array}{ccc}
(1-\varepsilon_{1})\mathbf{J}^{g}_{11}&(1-\varepsilon_{1})\mathbf{J}^{g}_{12}&(1-\varepsilon_{1})\mathbf{J}^{g}_{13}\\
\mathbf{0}&\mathbf{J}^{g}_{22}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\mathbf{J}^{\tilde{g}}_{33}
\end{array}\right)
+\left(\begin{array}{ccc}
\varepsilon_{1}\mathbf{J}^{\delta}_{11}&\varepsilon_{1}\mathbf{J}^{\delta}_{12}&\varepsilon_{1}\mathbf{J}^{\delta}_{13}\\
\mathbf{0}&\mathbf{0}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\mathbf{0}
\end{array}\right) \tag{A44}\\
&\approx\left(\begin{array}{ccc}
(1-\varepsilon_{1})\mathbf{J}^{g}_{11}&(1-\varepsilon_{1})\mathbf{J}^{g}_{12}&(1-\varepsilon_{1})\mathbf{J}^{g}_{13}\\
\mathbf{0}&\mathbf{J}^{g}_{22}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\mathbf{J}^{\tilde{g}}_{33}
\end{array}\right) \tag{A48}\\
&=\left(\begin{array}{ccc}
1-\varepsilon_{1}&\mathbf{0}&\mathbf{0}\\
\mathbf{0}&\mathbf{I}_{\alpha}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\mathbf{I}_{\beta}
\end{array}\right)\mathbf{J}^{\check{g}}(\lambda^{**}), \tag{A52}
\end{align*}
𝐊g~(λ)\displaystyle\mathbf{K}^{\tilde{g}}(\lambda^{*})\approx 𝐊g~(λ)\displaystyle~{}\mathbf{K}^{\tilde{g}}(\lambda^{**})
=\displaystyle= ((1ε1)𝐊11g(1ε1)𝐊12g(1ε1)𝐊13g(1ε1)𝐊12gT𝐊22g𝐊23g~(1ε1)𝐊13gT𝐊23g~T𝐊33g~)+(ε1𝐊11δε1𝐊12δε1𝐊13δTε1𝐊12δT𝟎𝟎ε1𝐊13δT𝟎𝟎)\displaystyle~{}\left(\begin{array}[]{ccc}(1-\varepsilon_{1})\mathbf{K}^{g}_{11}&(1-\varepsilon_{1})\mathbf{K}^{g}_{12}&(1-\varepsilon_{1})\mathbf{K}^{g}_{13}\\ (1-\varepsilon_{1})\mathbf{K}^{g~{}T}_{12}&\mathbf{K}^{g}_{22}&\mathbf{K}^{\tilde{g}}_{23}\\ (1-\varepsilon_{1})\mathbf{K}^{g~{}T}_{13}&\mathbf{K}^{\tilde{g}~{}T}_{23}&\mathbf{K}^{\tilde{g}}_{33}\end{array}\right)+\left(\begin{array}[]{ccc}\varepsilon_{1}\mathbf{K}^{\delta}_{11}&\varepsilon_{1}\mathbf{K}^{\delta}_{12}&\varepsilon_{1}\mathbf{K}^{\delta T}_{13}\\ \varepsilon_{1}\mathbf{K}^{\delta~{}T}_{12}&\mathbf{0}&\mathbf{0}\\ \varepsilon_{1}\mathbf{K}^{\delta~{}T}_{13}&\mathbf{0}&\mathbf{0}\end{array}\right) (A59)
\displaystyle\approx ((1ε1)𝐊11g(1ε1)𝐊12g(1ε1)𝐊13g(1ε1)𝐊12gT𝐊22g𝐊23g~(1ε1)𝐊13gT𝐊23g~T𝐊33g~)\displaystyle~{}\left(\begin{array}[]{ccc}(1-\varepsilon_{1})\mathbf{K}^{g}_{11}&(1-\varepsilon_{1})\mathbf{K}^{g}_{12}&(1-\varepsilon_{1})\mathbf{K}^{g}_{13}\\ (1-\varepsilon_{1})\mathbf{K}^{g~{}T}_{12}&\mathbf{K}^{g}_{22}&\mathbf{K}^{\tilde{g}}_{23}\\ (1-\varepsilon_{1})\mathbf{K}^{g~{}T}_{13}&\mathbf{K}^{\tilde{g}~{}T}_{23}&\mathbf{K}^{\tilde{g}}_{33}\end{array}\right) (A63)

The input (λ)(\lambda^{**}) is dropped for notation simplicity. Thus, we have

𝐕g~(λ)\displaystyle\mathbf{V}^{\tilde{g}}(\lambda^{*})\approx 𝐉gˇ(λ)1(1(1ε1)𝐊11g(λ)𝐊12g(λ)𝐊13g(λ)𝐊12g(λ)T𝐊22g(λ)𝐊23g~(λ)𝐊13g(λ)T𝐊23g~(λ)T𝐊33g~(λ)){𝐉gˇ(λ)T}1,\displaystyle~{}\mathbf{J}^{\check{g}}(\lambda^{**})^{-1}\left(\begin{array}[]{ccc}\frac{1}{(1-\varepsilon_{1})}\mathbf{K}^{g}_{11}(\lambda^{**})&\mathbf{K}^{g}_{12}(\lambda^{**})&\mathbf{K}^{g}_{13}(\lambda^{**})\\ \mathbf{K}^{g}_{12}(\lambda^{**})^{T}&\mathbf{K}^{g}_{22}(\lambda^{**})&\mathbf{K}^{\tilde{g}}_{23}(\lambda^{**})\\ \mathbf{K}^{g}_{13}(\lambda^{**})^{T}&\mathbf{K}^{\tilde{g}}_{23}(\lambda^{**})^{T}&\mathbf{K}^{\tilde{g}}_{33}(\lambda^{**})\end{array}\right)\{\mathbf{J}^{\check{g}}(\lambda^{**})^{T}\}^{-1}, (A67)

The proof is complete.
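
The passage from (A52) and (A63) to (A67) can be made explicit. As a sketch, write $\mathbf{D}=\mathrm{diag}(1-\varepsilon_{1},\mathbf{I}_{\alpha},\mathbf{I}_{\beta})$ (a shorthand introduced here) and assume that the asymptotic variance has the usual sandwich form $\mathbf{J}^{-1}\mathbf{K}(\mathbf{J}^{T})^{-1}$. Since $\mathbf{D}$ is symmetric,

\[
\mathbf{V}^{\tilde{g}}(\lambda^{*})
\approx\{\mathbf{D}\,\mathbf{J}^{\check{g}}\}^{-1}\,\mathbf{K}^{\tilde{g}}\,\{(\mathbf{D}\,\mathbf{J}^{\check{g}})^{T}\}^{-1}
=(\mathbf{J}^{\check{g}})^{-1}\,\mathbf{D}^{-1}\mathbf{K}^{\tilde{g}}\mathbf{D}^{-1}\,\{(\mathbf{J}^{\check{g}})^{T}\}^{-1}.
\]

Multiplying (A63) by $\mathbf{D}^{-1}$ on both sides divides the first block row and the first block column by $1-\varepsilon_{1}$. The $(1,1)$ block becomes $\mathbf{K}^{g}_{11}/(1-\varepsilon_{1})$, the factor $1-\varepsilon_{1}$ cancels in the remaining blocks of the first row and column, and all other blocks are unchanged, which gives the middle matrix in (A67).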

Appendix D.3.2 Proof of Lemma A1

Proof.

In the PS correct case,

\begin{align*}
\mathbf{J}^{g}_{13}(\lambda^{**})={}&\mathbb{E}_{g}\left[\frac{\partial}{\partial\beta^{T}}\left\{\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\mu^{(1)})-\frac{T-\pi(X;\alpha^{*})}{\pi(X;\alpha^{*})}\mathbb{E}_{q^{*}}\left[h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\mid T=1,X\right]\right\}\right]\\
={}&\mathbb{E}_{g}\left[\frac{P(T=1\mid X)}{\pi(X;\alpha^{*})}h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right.\\
&\qquad\left.-\frac{P(T=1\mid X)-\pi(X;\alpha^{*})}{\pi(X;\alpha^{*})}\frac{\partial}{\partial\beta^{T}}\mathbb{E}_{q^{*}}\left[h(Y;\mu^{(1)})^{\gamma}(Y-\mu^{(1)})\mid T=1,X\right]\right]\\
={}&\mathbb{E}_{g}\left[h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right]\\
={}&0,
\end{align*}

where $\mathbb{E}_{q^{*}}$ denotes the expectation with respect to $q(y\mid T=1,x;\beta^{*})$. In the OR correct case,

\begin{align*}
\mathbf{J}^{g}_{12}(\lambda^{**})={}&\mathbb{E}_{g}\left[\frac{\partial}{\partial\alpha^{T}}\left\{\frac{Th(Y;\mu^{(1)})^{\gamma}}{\pi(X;\alpha^{*})}(Y-\mu^{(1)})-\frac{T-\pi(X;\alpha^{*})}{\pi(X;\alpha^{*})}\mathbb{E}_{g}\left[h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\mid X\right]\right\}\right]\\
={}&\mathbb{E}_{g}\left[\frac{\partial}{\partial\alpha^{T}}\left(\frac{P(T=1\mid X)}{\pi(X;\alpha^{*})}-\frac{P(T=1\mid X)-\pi(X;\alpha^{*})}{\pi(X;\alpha^{*})}\right)\mathbb{E}_{g}\left[h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\mid X\right]\right]\\
={}&\mathbb{E}_{g}\left[h(Y^{(1)};\mu^{(1)})^{\gamma}(Y^{(1)}-\mu^{(1)})\right]\\
={}&0.
\end{align*}

Appendix E Details of Numerical Algorithm

Appendix E.1 General Form

Because the proposed estimating equations cannot be solved explicitly, we use an iterative algorithm. Various algorithms are available; here, we adopt a standard algorithm for M-estimators. The algorithm for the DP-IPW estimator is given by the following updates:

\begin{align}
\hat{\mu}^{[a+1]}&=\left\{\sum_{i=1}^{n}w^{[a]}_{i}Y_{i}\right\}\left\{\sum_{i=1}^{n}w^{[a]}_{i}\right\}^{-1}, \tag{A68}\\
w^{[a+1]}_{i}&=\frac{T_{i}h(Y_{i};\hat{\mu}^{[a+1]})^{\gamma}}{\pi(X_{i};\hat{\alpha})}\quad\text{for all } i. \tag{A69}
\end{align}

We recommend obtaining the initial values $(\mu^{[0]},w_{i}^{[0]})$ in an outlier-resistant manner. For example, $\mu^{[0]}$ can be obtained by the IPW median (Firpo, 2007; Zhang et al., 2012), and $w_{i}^{[0]}$ is then obtained using (A69). If the weighting density is indexed by other parameters, they must be estimated in advance or updated simultaneously with $\mu$ and $w$. In the next section, we present an algorithm in which we assume that $h$ is Gaussian.
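
The updates (A68) and (A69) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `gaussian_pdf` and `dp_ipw`, the stopping rule, and the choice of a Gaussian weighting density with a fixed scale `sigma` are all assumptions made for the example.

```python
import math


def gaussian_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y; plays the role of the weight h (assumed Gaussian)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def dp_ipw(y, t, ps, gamma, mu0, sigma, tol=1e-10, max_iter=500):
    """Fixed-point iteration (A68)-(A69) for the DP-IPW estimate of mu^(1).

    y, t, ps : outcomes, treatment indicators (0/1), fitted propensity scores
    gamma    : tuning parameter of the density power weight
    mu0      : starting value, ideally outlier-resistant (e.g. an IPW median)
    sigma    : scale of the Gaussian weight, held fixed here (an assumption)
    """
    mu = mu0
    for _ in range(max_iter):
        # (A69): w_i = T_i * h(Y_i; mu)^gamma / pi(X_i; alpha_hat)
        w = [ti * gaussian_pdf(yi, mu, sigma) ** gamma / pi
             for yi, ti, pi in zip(y, t, ps)]
        # (A68): weighted mean of the outcomes
        mu_new = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu
```

With `gamma=0.0` every weight reduces to $T_i/\pi(X_i;\hat{\alpha})$ and the iteration returns the normalized IPW mean after one step; larger `gamma` shrinks the weight of observations far from the current `mu`.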

The ($\varepsilon$)DP-DR estimator is obtained in a similar manner. Let $h(\cdot;\mu^{(1)})$ be fixed and solve the estimating equation of $\varepsilon$DP-DR with respect to $\mu$:

\begin{align}
\mu={}&\left[\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{T_{i}h(Y_{i};\mu^{(1)})^{\gamma}Y_{i}}{\pi(X_{i};\hat{\alpha})}-\frac{T_{i}-\pi(X_{i};\hat{\alpha})}{\pi(X_{i};\hat{\alpha})}(1-\hat{\varepsilon}_{1})m_{1,\mu^{(1)}}(X_{i};\hat{\beta})\right\}\right] \tag{A70}\\
&\times\left[\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{T_{i}h(Y_{i};\mu^{(1)})^{\gamma}}{\pi(X_{i};\hat{\alpha})}-\frac{T_{i}-\pi(X_{i};\hat{\alpha})}{\pi(X_{i};\hat{\alpha})}(1-\hat{\varepsilon}_{1})m_{0,\mu^{(1)}}(X_{i};\hat{\beta})\right\}\right]^{-1}, \tag{A71}
\end{align}

where $m_{0,\mu^{(1)}}(X_{i};\hat{\beta})=\mathbb{E}_{g}[h(Y^{(1)};\mu^{(1)})^{\gamma}\mid X_{i}]$ and $m_{1,\mu^{(1)}}(X_{i};\hat{\beta})=\mathbb{E}_{g}[h(Y^{(1)};\mu^{(1)})^{\gamma}Y^{(1)}\mid X_{i}]$. Then, the following algorithm is obtained:

\begin{align}
\hat{\mu}^{[a+1]}&=\left\{\sum_{i=1}^{n}\left(w^{[a]}_{1,i}Y_{i}-w_{2,i}\hat{m}_{1,\mu^{[a]}}(X_{i};\hat{\beta})\right)\right\}\left\{\sum_{i=1}^{n}\left(w^{[a]}_{1,i}-w_{2,i}\hat{m}_{0,\mu^{[a]}}(X_{i};\hat{\beta})\right)\right\}^{-1}, \tag{A72}\\
w^{[a+1]}_{1,i}&=\frac{T_{i}h(Y_{i};\hat{\mu}^{[a+1]})^{\gamma}}{\pi(X_{i};\hat{\alpha})},\qquad w_{2,i}=\frac{T_{i}-\pi(X_{i};\hat{\alpha})}{\pi(X_{i};\hat{\alpha})}(1-\hat{\varepsilon}_{1})\quad\text{for all } i. \tag{A73}
\end{align}

Note that $w_{2,i}$ does not need to be updated once it has been computed. The initial values should be obtained in an outlier-resistant manner, as in DP-IPW. Recall that $\hat{m}_{1,\mu}$ and $\hat{m}_{0,\mu}$ are the estimates of the conditional expectations $\mathbb{E}_{g}[h(Y^{(1)};\mu)^{\gamma}Y^{(1)}\mid X]$ and $\mathbb{E}_{g}[h(Y^{(1)};\mu)^{\gamma}\mid X]$ given $\mu$. These updates can be obtained from the estimated conditional density $q(y\mid X;\hat{\beta})$ through Monte Carlo approximation (Hoshino, 2007) or direct calculation.
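
The iteration (A72) and (A73) can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' code: the function name `eps_dp_dr`, the callables `m0(mu, i)` and `m1(mu, i)` that return the estimated conditional expectations for unit `i`, the Gaussian weight with fixed `sigma`, and the stopping rule are all hypothetical choices for the example.

```python
import math


def gaussian_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y; used as the weight h (assumed Gaussian)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def eps_dp_dr(y, t, ps, m0, m1, gamma, mu0, sigma, eps1=0.0,
              tol=1e-10, max_iter=500):
    """Fixed-point iteration (A72)-(A73) for the (eps)DP-DR estimate.

    m0(mu, i), m1(mu, i) : user-supplied estimates of E[h^gamma | X_i] and
                           E[h^gamma * Y | X_i] at the current mu
    eps1                 : estimated contamination ratio (0.0 gives DP-DR)
    """
    n = len(y)
    # w_{2,i} does not depend on mu, so it is computed once.
    w2 = [(ti - pi) / pi * (1.0 - eps1) for ti, pi in zip(t, ps)]
    mu = mu0
    for _ in range(max_iter):
        # w_{1,i} = T_i * h(Y_i; mu)^gamma / pi(X_i; alpha_hat)
        w1 = [ti * gaussian_pdf(yi, mu, sigma) ** gamma / pi
              for yi, ti, pi in zip(y, t, ps)]
        num = sum(w1[i] * y[i] - w2[i] * m1(mu, i) for i in range(n))
        den = sum(w1[i] - w2[i] * m0(mu, i) for i in range(n))
        mu_new = num / den
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu
```

As a sanity check on the design, setting `gamma=0.0`, `m0` identically one, and `m1` equal to a fitted outcome regression recovers the usual doubly robust (augmented IPW) estimate of the mean.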

Appendix E.2 Gaussian Weight

When the weighting density $h$ is assumed to be Gaussian, a value must be assigned to the standard deviation $\sigma$. Under contamination, we suggest estimating $\sigma$ in an outlier-resistant manner, for example, by using the normalized median absolute deviation (MADN). The MADN is a consistent estimator of the standard deviation of a Gaussian random variable. For DP-IPW, $\sigma$ can be obtained by the following updating formula:

\[
\hat{\sigma}^{[a+1]}=\text{IPW-MADN}\left(\{Y_{i}\}_{i=1}^{n},\hat{\mu}^{[a+1]}\right). \tag{A74}
\]

The IPW-MADN is defined as

\[
\text{IPW-MADN}\left(\{Y_{i}\}_{i=1}^{n},\mu\right)=1.483\cdot\text{IPW-median}\left(\left\{|Y_{i}-\mu|\right\}_{i=1}^{n}\right), \tag{A75}
\]

where $1.483$ is a normalization constant.
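
A small sketch of (A75) is given below. It assumes that the IPW median is computed as a weighted median with weights $T_i/\pi(X_i;\hat{\alpha})$, taking the smallest value whose cumulative weight reaches one half; this convention and the helper names `weighted_median` and `ipw_madn` are assumptions for illustration, and the definitions in Firpo (2007) or Zhang et al. (2012) may differ in details such as tie handling.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    total = sum(w for _, w in pairs)
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= 0.5 * total:
            return v
    raise ValueError("no observation has positive weight")


def ipw_madn(y, t, ps, mu):
    """IPW-MADN of (A75): 1.483 times the IPW median of |Y_i - mu|."""
    weights = [ti / pi for ti, pi in zip(t, ps)]   # untreated units get weight 0
    abs_dev = [abs(yi - mu) for yi in y]
    return 1.483 * weighted_median(abs_dev, weights)
```

In the iteration, `ipw_madn(y, t, ps, mu)` would be called after each update of `mu`, which corresponds to (A74).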

Similarly, $\sigma$ of the ($\varepsilon$)DP-DR estimator under the Gaussian weight is obtained by

\[
\hat{\sigma}^{[a+1]}=\text{DR-MADN}\left(\{Y_{i}\}_{i=1}^{n},\hat{\mu}^{[a+1]}\right). \tag{A76}
\]

The DR-MADN is obtained by using the DR-median, which is discussed in Section 7:

\[
\text{DR-MADN}\left(\{Y_{i}\}_{i=1}^{n},\mu\right)=1.483\cdot\text{DR-median}\left(\left\{|Y_{i}-\mu|\right\}_{i=1}^{n}\right). \tag{A77}
\]

Further, the updates of $\hat{m}_{1,\mu}$ and $\hat{m}_{0,\mu}$ can be expressed explicitly when $q(y\mid X;\hat{\beta})$ is assumed to be a conditional Gaussian distribution given $X$. Let $u(X)=\mathbb{E}_{q}[Y\mid X]$ and $v^{2}(X)=\mathrm{Var}_{q}[Y\mid X]$. Then, we obtain

\begin{align}
m_{0,\mu^{[a]}}(X)&=(2\pi)^{-\frac{\gamma}{2}}\frac{({\sigma^{[a]}}^{2})^{\frac{1-\gamma}{2}}}{\sqrt{{\sigma^{[a]}}^{2}+\gamma v^{2}(X)}}\cdot\exp\left\{-\frac{\gamma(\mu^{[a]}-u(X))^{2}}{2({\sigma^{[a]}}^{2}+\gamma v^{2}(X))}\right\}, \tag{A78}\\
m_{1,\mu^{[a]}}(X)&=(2\pi)^{-\frac{\gamma}{2}}\frac{({\sigma^{[a]}}^{2})^{\frac{1-\gamma}{2}}}{\sqrt{{\sigma^{[a]}}^{2}+\gamma v^{2}(X)}}\cdot\frac{u(X){\sigma^{[a]}}^{2}+\gamma\mu^{[a]}v^{2}(X)}{{\sigma^{[a]}}^{2}+\gamma v^{2}(X)}\cdot\exp\left\{-\frac{\gamma(\mu^{[a]}-u(X))^{2}}{2({\sigma^{[a]}}^{2}+\gamma v^{2}(X))}\right\}. \tag{A79}
\end{align}

Notably, the conditional variance can be easily estimated because many general outlier-resistant methods can be applied for this purpose.
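
The closed forms (A78) and (A79) can be checked numerically. The sketch below evaluates them with the squared difference $(\mu-u(X))^{2}$ in the exponent, which is what the Gaussian integral yields, and compares them with a trapezoidal approximation of the same expectations under $Y\mid X\sim N(u,v^{2})$. The function names, the integration grid, and the scalar arguments `u` and `v` (the values of $u(X)$ and $v(X)$ at one covariate value) are assumptions for illustration.

```python
import math


def gaussian_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def m0_m1_closed_form(mu, sigma, u, v, gamma):
    """Closed forms (A78)-(A79) for E[h^gamma | X] and E[h^gamma * Y | X]."""
    s2, v2 = sigma * sigma, v * v
    denom = s2 + gamma * v2
    m0 = ((2.0 * math.pi) ** (-gamma / 2.0) * s2 ** ((1.0 - gamma) / 2.0)
          / math.sqrt(denom)
          * math.exp(-gamma * (mu - u) ** 2 / (2.0 * denom)))
    m1 = m0 * (u * s2 + gamma * mu * v2) / denom
    return m0, m1


def m0_m1_numeric(mu, sigma, u, v, gamma, n_grid=20001, half_width=12.0):
    """Trapezoidal approximation of the same two expectations."""
    lo, hi = u - half_width * v, u + half_width * v
    step = (hi - lo) / (n_grid - 1)
    m0 = m1 = 0.0
    for k in range(n_grid):
        yk = lo + k * step
        end_weight = 0.5 if k in (0, n_grid - 1) else 1.0
        f = (gaussian_pdf(yk, mu, sigma) ** gamma
             * gaussian_pdf(yk, u, v) * end_weight * step)
        m0 += f
        m1 += f * yk
    return m0, m1
```

For `gamma=0.0` the closed forms reduce to $m_{0}=1$ and $m_{1}=u(X)$, as expected when no density power weighting is applied.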

Appendix F Solution Paths of $\gamma$-sensitivity Study

Figure A1: Solution paths of the first 100 simulations. The x-axis represents the tuning parameter $\gamma$, and the y-axis represents the estimates of $\mu^{(1)}$.

Appendix G Remaining Results of Monte-Carlo Simulation

The remaining results of the Monte Carlo simulation are presented in the following tables.

  • Tables A1 and A2: Gaussian covariates and Gaussian MLE on non-outliers. The RMSE is presented in Tables 2 and 3 in the main text.

  • Tables A3 to A5: Gaussian covariates and unnormalized Gaussian modeling.

  • Tables A6 to A8: Uniform covariates and Gaussian MLE on non-outliers.

  • Tables A9 to A11: Uniform covariates and unnormalized Gaussian modeling.

| Estimator | Method | No contam. | Homogeneous, $\varepsilon=0.05$ | Homogeneous, $\varepsilon=0.10$ | Homogeneous, $\varepsilon=0.20$ | Heterogeneous, $\varepsilon=0.05$ | Heterogeneous, $\varepsilon=0.10$ | Heterogeneous, $\varepsilon=0.20$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IPW(T/-) | Naive | 3.004 (0.22) | 3.749 (0.59) | 4.493 (0.78) | 5.983 (1.02) | 3.766 (0.63) | 4.536 (0.84) | 6.070 (1.08) |
| | median (Firpo) | 2.990 (0.26) | 3.091 (0.28) | 3.205 (0.30) | 3.490 (0.43) | 3.116 (0.28) | 3.259 (0.32) | 3.605 (0.47) |
| | median (Zhang-IPW) | 2.990 (0.26) | 3.091 (0.28) | 3.205 (0.30) | 3.490 (0.43) | 3.116 (0.28) | 3.259 (0.32) | 3.605 (0.47) |
| | DP-IPW ($\gamma=0.1$) | 2.998 (0.22) | 3.030 (0.27) | 3.142 (0.51) | 4.492 (1.70) | 3.056 (0.29) | 3.209 (0.57) | 4.620 (1.74) |
| | DP-IPW ($\gamma=0.5$) | 2.986 (0.23) | 2.987 (0.25) | 2.989 (0.27) | 3.052 (0.64) | 3.011 (0.24) | 3.042 (0.28) | 3.173 (0.70) |
| | DP-IPW ($\gamma=1.0$) | 2.980 (0.26) | 2.978 (0.27) | 2.977 (0.27) | 2.990 (0.41) | 3.003 (0.26) | 3.033 (0.28) | 3.114 (0.48) |
| DR(T/T) | Naive | 2.999 (0.18) | 3.745 (0.60) | 4.489 (0.79) | 5.979 (1.04) | 3.762 (0.64) | 4.533 (0.86) | 6.069 (1.12) |
| | median (Zhang-DR) | 2.994 (0.24) | 3.096 (0.30) | 3.210 (0.33) | 3.499 (0.54) | 3.121 (0.31) | 3.264 (0.37) | 3.620 (0.66) |
| | median (Sued) | 2.994 (0.24) | 3.096 (0.30) | 3.209 (0.33) | 3.496 (0.48) | 3.121 (0.31) | 3.264 (0.36) | 3.616 (0.61) |
| | median (TMLE) | 2.994 (0.24) | 3.095 (0.26) | 3.208 (0.29) | 3.479 (0.37) | 3.120 (0.27) | 3.260 (0.31) | 3.587 (0.38) |
| | DP-DR ($\gamma=0.1$) | 2.998 (0.18) | 3.029 (0.30) | 3.140 (0.55) | 4.465 (1.72) | 3.054 (0.31) | 3.207 (0.62) | 4.604 (1.78) |
| | DP-DR ($\gamma=0.5$) | 2.996 (0.20) | 2.997 (0.29) | 3.000 (0.33) | 3.060 (0.69) | 3.022 (0.27) | 3.053 (0.34) | 3.195 (0.81) |
| | DP-DR ($\gamma=1.0$) | 2.992 (0.24) | 2.991 (0.29) | 2.992 (0.31) | 3.009 (0.52) | 3.017 (0.29) | 3.047 (0.33) | 3.137 (0.66) |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 2.998 (0.18) | 3.028 (0.29) | 3.138 (0.54) | 4.464 (1.72) | 3.054 (0.31) | 3.204 (0.60) | 4.604 (1.77) |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 2.996 (0.20) | 2.997 (0.26) | 2.999 (0.30) | 3.058 (0.67) | 3.022 (0.27) | 3.052 (0.32) | 3.190 (0.77) |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 2.992 (0.24) | 2.991 (0.29) | 2.991 (0.30) | 3.007 (0.51) | 3.017 (0.29) | 3.047 (0.33) | 3.134 (0.63) |
| DR(T/F) | Naive | 3.004 (0.24) | 3.750 (0.60) | 4.494 (0.78) | 5.984 (1.03) | 3.767 (0.64) | 4.537 (0.85) | 6.073 (1.09) |
| | median (Zhang-DR) | 2.990 (0.27) | 3.092 (0.33) | 3.208 (0.35) | 3.499 (0.55) | 3.118 (0.33) | 3.262 (0.38) | 3.623 (0.67) |
| | median (Sued) | 2.989 (0.27) | 3.091 (0.33) | 3.206 (0.35) | 3.493 (0.50) | 3.117 (0.33) | 3.261 (0.38) | 3.616 (0.62) |
| | median (TMLE) | 2.999 (0.24) | 3.100 (0.27) | 3.214 (0.29) | 3.496 (0.38) | 3.125 (0.27) | 3.267 (0.30) | 3.607 (0.39) |
| | DP-DR ($\gamma=0.1$) | 2.998 (0.24) | 3.033 (0.31) | 3.150 (0.54) | 4.490 (1.71) | 3.060 (0.32) | 3.218 (0.61) | 4.624 (1.76) |
| | DP-DR ($\gamma=0.5$) | 2.986 (0.25) | 2.989 (0.32) | 2.992 (0.35) | 3.059 (0.71) | 3.014 (0.32) | 3.044 (0.36) | 3.196 (0.82) |
| | DP-DR ($\gamma=1.0$) | 2.979 (0.28) | 2.978 (0.33) | 2.979 (0.35) | 3.001 (0.58) | 3.004 (0.33) | 3.035 (0.37) | 3.133 (0.70) |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 2.998 (0.24) | 3.033 (0.31) | 3.149 (0.54) | 4.489 (1.71) | 3.059 (0.32) | 3.218 (0.60) | 4.623 (1.75) |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 2.986 (0.25) | 2.989 (0.32) | 2.992 (0.34) | 3.057 (0.69) | 3.013 (0.31) | 3.044 (0.35) | 3.192 (0.79) |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 2.979 (0.28) | 2.978 (0.33) | 2.978 (0.34) | 2.998 (0.55) | 3.004 (0.33) | 3.035 (0.37) | 3.132 (0.70) |
| DR(F/T) | Naive | 2.999 (0.18) | 3.725 (0.50) | 4.451 (0.65) | 5.902 (0.86) | 3.667 (0.49) | 4.341 (0.65) | 5.685 (0.84) |
| | median (Zhang-DR) | 3.003 (0.24) | 3.081 (0.25) | 3.169 (0.27) | 3.388 (0.32) | 3.096 (0.25) | 3.200 (0.27) | 3.445 (0.32) |
| | median (Sued) | 2.999 (0.24) | 3.101 (0.25) | 3.213 (0.27) | 3.494 (0.34) | 3.111 (0.25) | 3.237 (0.28) | 3.532 (0.33) |
| | median (TMLE) | 3.003 (0.23) | 3.079 (0.25) | 3.164 (0.26) | 3.368 (0.30) | 3.093 (0.25) | 3.194 (0.26) | 3.424 (0.30) |
| | DP-DR ($\gamma=0.1$) | 2.999 (0.18) | 2.997 (0.19) | 3.051 (0.34) | 4.326 (1.57) | 3.018 (0.19) | 3.077 (0.29) | 4.000 (1.35) |
| | DP-DR ($\gamma=0.5$) | 3.001 (0.20) | 2.975 (0.20) | 2.950 (0.21) | 2.907 (0.35) | 3.000 (0.20) | 3.000 (0.21) | 3.008 (0.28) |
| | DP-DR ($\gamma=1.0$) | 3.005 (0.23) | 2.978 (0.23) | 2.953 (0.23) | 2.895 (0.25) | 3.004 (0.23) | 3.006 (0.23) | 3.008 (0.24) |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 2.999 (0.18) | 3.020 (0.19) | 3.108 (0.37) | 4.541 (1.58) | 3.040 (0.19) | 3.131 (0.31) | 4.209 (1.39) |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 3.001 (0.20) | 2.998 (0.20) | 2.998 (0.21) | 3.020 (0.38) | 3.022 (0.20) | 3.048 (0.21) | 3.117 (0.30) |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 3.005 (0.23) | 3.001 (0.23) | 3.001 (0.23) | 3.003 (0.24) | 3.027 (0.23) | 3.054 (0.23) | 3.116 (0.23) |

Table A1: Mean and SD of 10,000 simulated estimates of $\mu^{(1)}$. The covariates $X$ were generated from Gaussian distributions, and the outcome regression was obtained by the Gaussian MLE using non-outliers. The characters "T" and "F" denote the correct and the incorrect modeling, respectively.

| Estimator | Method | No contam. | Homogeneous, $\varepsilon=0.05$ | Homogeneous, $\varepsilon=0.10$ | Homogeneous, $\varepsilon=0.20$ | Heterogeneous, $\varepsilon=0.05$ | Heterogeneous, $\varepsilon=0.10$ | Heterogeneous, $\varepsilon=0.20$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IPW(T/-) | Naive | 0.009 | 0.010 | 0.012 | 0.009 | 0.010 | 0.010 | 0.012 |
| | median (Firpo) | 2.019 | 2.035 | 2.052 | 2.040 | 2.049 | 2.049 | 2.044 |
| | median (Zhang-IPW) | 0.153 | 0.166 | 0.165 | 0.162 | 0.194 | 0.195 | 0.203 |
| | DP-IPW ($\gamma=0.1$) | 33.713 | 44.646 | 62.069 | 80.462 | 44.256 | 60.934 | 78.925 |
| | DP-IPW ($\gamma=0.5$) | 48.822 | 50.308 | 48.108 | 46.894 | 49.748 | 48.500 | 47.246 |
| | DP-IPW ($\gamma=1.0$) | 65.508 | 64.401 | 62.285 | 55.019 | 63.856 | 61.479 | 54.070 |
| DR(T/T) | Naive | 0.014 | 0.015 | 0.014 | 0.013 | 0.014 | 0.020 | 0.014 |
| | median (Zhang-DR) | 0.416 | 0.417 | 0.422 | 0.412 | 0.406 | 0.424 | 0.416 |
| | median (Sued) | 1.720 | 1.893 | 1.689 | 1.825 | 1.806 | 1.710 | 1.723 |
| | median (TMLE) | 1067.234 | 1050.107 | 1021.848 | 1010.238 | 1046.995 | 1015.752 | 1000.908 |
| | DP-DR ($\gamma=0.1$) | 20.316 | 55.933 | 131.469 | 209.206 | 57.353 | 141.999 | 202.042 |
| | DP-DR ($\gamma=0.5$) | 54.524 | 47.142 | 47.741 | 39.424 | 50.581 | 43.799 | 37.860 |
| | DP-DR ($\gamma=1.0$) | 83.912 | 81.759 | 74.653 | 58.289 | 78.730 | 73.391 | 60.144 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 19.997 | 59.061 | 132.762 | 212.343 | 57.435 | 137.271 | 205.093 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 54.806 | 50.210 | 44.774 | 39.729 | 50.144 | 46.977 | 37.970 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 84.375 | 79.237 | 73.203 | 56.916 | 78.328 | 76.474 | 60.338 |
| DR(T/F) | Naive | 0.014 | 0.015 | 0.013 | 0.014 | 0.016 | 0.013 | 0.014 |
| | median (Zhang-DR) | 0.337 | 0.370 | 0.358 | 0.354 | 0.375 | 0.371 | 0.364 |
| | median (Sued) | 1.646 | 1.829 | 1.705 | 1.754 | 1.753 | 1.893 | 3.771 |
| | median (TMLE) | 999.657 | 991.569 | 983.834 | 991.174 | 997.403 | 994.754 | 991.656 |
| | DP-DR ($\gamma=0.1$) | 18.851 | 63.892 | 137.795 | 198.849 | 57.978 | 138.945 | 195.932 |
| | DP-DR ($\gamma=0.5$) | 52.547 | 49.129 | 45.949 | 40.204 | 53.554 | 47.560 | 39.319 |
| | DP-DR ($\gamma=1.0$) | 80.465 | 81.757 | 72.929 | 60.123 | 79.196 | 73.844 | 56.672 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 18.603 | 59.010 | 133.449 | 203.896 | 60.482 | 138.275 | 190.823 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 52.892 | 53.978 | 49.416 | 39.773 | 53.920 | 49.583 | 37.347 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 80.870 | 81.030 | 73.614 | 59.259 | 78.651 | 72.015 | 59.778 |
| DR(F/T) | Naive | 0.013 | 0.014 | 0.014 | 0.014 | 0.014 | 0.015 | 0.015 |
| | median (Zhang-DR) | 0.399 | 0.417 | 0.415 | 2.122 | 0.428 | 0.419 | 0.440 |
| | median (Sued) | 1.607 | 1.876 | 1.766 | 1.766 | 1.751 | 1.902 | 1.836 |
| | median (TMLE) | 951.348 | 975.007 | 970.882 | 986.397 | 970.804 | 979.606 | 988.658 |
| | DP-DR ($\gamma=0.1$) | 20.537 | 58.989 | 143.063 | 218.114 | 54.900 | 123.918 | 248.367 |
| | DP-DR ($\gamma=0.5$) | 52.096 | 54.095 | 45.947 | 39.811 | 56.126 | 48.108 | 40.617 |
| | DP-DR ($\gamma=1.0$) | 85.026 | 84.556 | 74.602 | 59.838 | 84.897 | 80.306 | 65.958 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 20.352 | 60.387 | 147.921 | 201.602 | 54.002 | 131.749 | 226.009 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 52.473 | 50.362 | 47.613 | 41.028 | 51.504 | 45.609 | 41.540 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 85.557 | 83.889 | 75.631 | 64.173 | 84.836 | 77.640 | 64.876 |

Table A2: Mean computation time (ms) of 10,000 simulations. The covariates $X$ were generated from Gaussian distributions, and the outcome regression was obtained by the Gaussian MLE using non-outliers. The characters "T" and "F" denote the correct and the incorrect modeling, respectively.

| Estimator | Method | No contam. | Homogeneous, $\varepsilon=0.05$ | Homogeneous, $\varepsilon=0.10$ | Homogeneous, $\varepsilon=0.20$ | Heterogeneous, $\varepsilon=0.05$ | Heterogeneous, $\varepsilon=0.10$ | Heterogeneous, $\varepsilon=0.20$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IPW(T/-) | Naive | 0.222 | 0.957 | 1.683 | 3.153 | 0.993 | 1.752 | 3.253 |
| | median (Firpo) | 0.257 | 0.294 | 0.367 | 0.649 | 0.306 | 0.409 | 0.769 |
| | median (Zhang-IPW) | 0.257 | 0.294 | 0.367 | 0.649 | 0.306 | 0.409 | 0.769 |
| | DP-IPW ($\gamma=0.1$) | 0.218 | 0.276 | 0.531 | 2.263 | 0.293 | 0.609 | 2.377 |
| | DP-IPW ($\gamma=0.5$) | 0.227 | 0.249 | 0.272 | 0.639 | 0.245 | 0.287 | 0.726 |
| | DP-IPW ($\gamma=1.0$) | 0.261 | 0.271 | 0.275 | 0.413 | 0.262 | 0.281 | 0.498 |
| DR(T/T) | Naive | 0.185 | 0.957 | 1.683 | 3.154 | 0.997 | 1.758 | 3.265 |
| | median (Zhang-DR) | 0.242 | 0.317 | 0.391 | 0.733 | 0.330 | 0.452 | 0.905 |
| | median (Sued) | 0.241 | 0.316 | 0.388 | 0.692 | 0.328 | 0.450 | 0.866 |
| | median (TMLE) | 0.237 | 0.280 | 0.359 | 0.600 | 0.295 | 0.401 | 0.696 |
| | DP-DR ($\gamma=0.1$) | 0.183 | 0.301 | 0.563 | 2.262 | 0.317 | 0.649 | 2.395 |
| | DP-DR ($\gamma=0.5$) | 0.202 | 0.290 | 0.326 | 0.692 | 0.274 | 0.349 | 0.839 |
| | DP-DR ($\gamma=1.0$) | 0.239 | 0.287 | 0.307 | 0.530 | 0.287 | 0.335 | 0.669 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 0.183 | 0.294 | 0.550 | 2.256 | 0.312 | 0.637 | 2.388 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 0.202 | 0.263 | 0.301 | 0.659 | 0.269 | 0.321 | 0.800 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 0.239 | 0.287 | 0.298 | 0.515 | 0.286 | 0.334 | 0.648 |
| DR(T/F) | Naive | 0.239 | 0.963 | 1.685 | 3.155 | 1.001 | 1.757 | 3.260 |
| | median (Zhang-DR) | 0.277 | 0.344 | 0.409 | 0.743 | 0.351 | 0.466 | 0.911 |
| | median (Sued) | 0.275 | 0.343 | 0.407 | 0.700 | 0.351 | 0.465 | 0.871 |
| | median (TMLE) | 0.242 | 0.285 | 0.364 | 0.624 | 0.297 | 0.406 | 0.725 |
| | DP-DR ($\gamma=0.1$) | 0.240 | 0.315 | 0.563 | 2.267 | 0.331 | 0.645 | 2.391 |
| | DP-DR ($\gamma=0.5$) | 0.251 | 0.322 | 0.353 | 0.720 | 0.321 | 0.365 | 0.840 |
| | DP-DR ($\gamma=1.0$) | 0.284 | 0.337 | 0.349 | 0.588 | 0.332 | 0.380 | 0.709 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 0.240 | 0.312 | 0.558 | 2.263 | 0.329 | 0.640 | 2.387 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 0.250 | 0.318 | 0.345 | 0.696 | 0.315 | 0.358 | 0.812 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 0.282 | 0.335 | 0.344 | 0.555 | 0.326 | 0.372 | 0.698 |
| DR(F/T) | Naive | 0.182 | 0.880 | 1.592 | 3.026 | 0.827 | 1.490 | 2.814 |
| | median (Zhang-DR) | 0.237 | 0.262 | 0.313 | 0.499 | 0.267 | 0.333 | 0.543 |
| | median (Sued) | 0.236 | 0.272 | 0.346 | 0.600 | 0.278 | 0.364 | 0.627 |
| | median (TMLE) | 0.235 | 0.259 | 0.306 | 0.470 | 0.264 | 0.324 | 0.513 |
| | DP-DR ($\gamma=0.1$) | 0.183 | 0.193 | 0.346 | 2.063 | 0.192 | 0.302 | 1.687 |
| | DP-DR ($\gamma=0.5$) | 0.200 | 0.209 | 0.221 | 0.365 | 0.205 | 0.211 | 0.282 |
| | DP-DR ($\gamma=1.0$) | 0.230 | 0.234 | 0.242 | 0.278 | 0.231 | 0.234 | 0.244 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 0.183 | 0.195 | 0.396 | 2.227 | 0.196 | 0.345 | 1.843 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 0.199 | 0.204 | 0.208 | 0.401 | 0.204 | 0.211 | 0.323 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 0.230 | 0.230 | 0.231 | 0.246 | 0.231 | 0.235 | 0.255 |

Table A3: Results of the comparative study. Each figure is RMSE between each method and the true value. The covariates $X$ were generated from Gaussian distributions, and the outcome regression was obtained by the unnormalized Gaussian modeling. The characters "T" and "F" denote the correct and the incorrect modeling, respectively.

| Estimator | Method | No contam. | Homogeneous, $\varepsilon=0.05$ | Homogeneous, $\varepsilon=0.10$ | Homogeneous, $\varepsilon=0.20$ | Heterogeneous, $\varepsilon=0.05$ | Heterogeneous, $\varepsilon=0.10$ | Heterogeneous, $\varepsilon=0.20$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IPW(T/-) | Naive | 3.004 (0.22) | 3.749 (0.59) | 4.493 (0.78) | 5.983 (1.02) | 3.766 (0.63) | 4.536 (0.84) | 6.070 (1.08) |
| | median (Firpo) | 2.990 (0.26) | 3.091 (0.28) | 3.205 (0.30) | 3.490 (0.43) | 3.116 (0.28) | 3.259 (0.32) | 3.605 (0.47) |
| | median (Zhang-IPW) | 2.990 (0.26) | 3.091 (0.28) | 3.205 (0.30) | 3.490 (0.43) | 3.116 (0.28) | 3.259 (0.32) | 3.605 (0.47) |
| | DP-IPW ($\gamma=0.1$) | 2.998 (0.22) | 3.030 (0.27) | 3.142 (0.51) | 4.492 (1.70) | 3.056 (0.29) | 3.209 (0.57) | 4.620 (1.74) |
| | DP-IPW ($\gamma=0.5$) | 2.986 (0.23) | 2.987 (0.25) | 2.989 (0.27) | 3.052 (0.64) | 3.011 (0.24) | 3.042 (0.28) | 3.173 (0.70) |
| | DP-IPW ($\gamma=1.0$) | 2.980 (0.26) | 2.978 (0.27) | 2.977 (0.27) | 2.990 (0.41) | 3.003 (0.26) | 3.033 (0.28) | 3.114 (0.48) |
| DR(T/T) | Naive | 2.999 (0.18) | 3.745 (0.60) | 4.489 (0.79) | 5.979 (1.04) | 3.762 (0.64) | 4.533 (0.86) | 6.069 (1.11) |
| | median (Zhang-DR) | 2.994 (0.24) | 3.096 (0.30) | 3.210 (0.33) | 3.500 (0.54) | 3.121 (0.31) | 3.265 (0.37) | 3.620 (0.66) |
| | median (Sued) | 2.994 (0.24) | 3.096 (0.30) | 3.209 (0.33) | 3.495 (0.48) | 3.121 (0.31) | 3.265 (0.36) | 3.616 (0.61) |
| | median (TMLE) | 2.995 (0.24) | 3.094 (0.26) | 3.208 (0.29) | 3.476 (0.36) | 3.120 (0.27) | 3.260 (0.31) | 3.583 (0.38) |
| | DP-DR ($\gamma=0.1$) | 2.998 (0.18) | 3.029 (0.30) | 3.140 (0.55) | 4.466 (1.72) | 3.054 (0.31) | 3.207 (0.62) | 4.605 (1.78) |
| | DP-DR ($\gamma=0.5$) | 2.996 (0.20) | 2.998 (0.29) | 3.000 (0.33) | 3.060 (0.69) | 3.022 (0.27) | 3.053 (0.34) | 3.197 (0.82) |
| | DP-DR ($\gamma=1.0$) | 2.993 (0.24) | 2.991 (0.29) | 2.992 (0.31) | 3.010 (0.53) | 3.017 (0.29) | 3.047 (0.33) | 3.137 (0.66) |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 2.998 (0.18) | 3.028 (0.29) | 3.137 (0.53) | 4.466 (1.72) | 3.054 (0.31) | 3.205 (0.60) | 4.607 (1.77) |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 2.996 (0.20) | 2.997 (0.26) | 2.999 (0.30) | 3.057 (0.66) | 3.022 (0.27) | 3.052 (0.32) | 3.192 (0.78) |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 2.993 (0.24) | 2.991 (0.29) | 2.991 (0.30) | 3.008 (0.52) | 3.017 (0.29) | 3.047 (0.33) | 3.134 (0.63) |
| DR(T/F) | Naive | 3.002 (0.24) | 3.748 (0.61) | 4.492 (0.78) | 5.982 (1.03) | 3.766 (0.64) | 4.536 (0.85) | 6.071 (1.09) |
| | median (Zhang-DR) | 2.988 (0.28) | 3.090 (0.33) | 3.206 (0.35) | 3.497 (0.55) | 3.116 (0.33) | 3.260 (0.39) | 3.621 (0.67) |
| | median (Sued) | 2.988 (0.28) | 3.090 (0.33) | 3.205 (0.35) | 3.491 (0.50) | 3.116 (0.33) | 3.259 (0.39) | 3.615 (0.62) |
| | median (TMLE) | 3.000 (0.24) | 3.101 (0.27) | 3.216 (0.29) | 3.499 (0.37) | 3.126 (0.27) | 3.268 (0.30) | 3.610 (0.39) |
| | DP-DR ($\gamma=0.1$) | 2.996 (0.24) | 3.032 (0.31) | 3.149 (0.54) | 4.490 (1.71) | 3.058 (0.33) | 3.217 (0.61) | 4.621 (1.76) |
| | DP-DR ($\gamma=0.5$) | 2.984 (0.25) | 2.987 (0.32) | 2.990 (0.35) | 3.058 (0.72) | 3.012 (0.32) | 3.042 (0.36) | 3.194 (0.82) |
| | DP-DR ($\gamma=1.0$) | 2.977 (0.28) | 2.976 (0.34) | 2.976 (0.35) | 2.999 (0.59) | 3.002 (0.33) | 3.033 (0.38) | 3.130 (0.70) |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 2.996 (0.24) | 3.032 (0.31) | 3.149 (0.54) | 4.488 (1.71) | 3.058 (0.32) | 3.216 (0.60) | 4.621 (1.75) |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 2.984 (0.25) | 2.987 (0.32) | 2.990 (0.34) | 3.056 (0.69) | 3.011 (0.31) | 3.042 (0.36) | 3.190 (0.79) |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 2.977 (0.28) | 2.976 (0.33) | 2.977 (0.34) | 2.997 (0.55) | 3.001 (0.33) | 3.033 (0.37) | 3.130 (0.69) |
| DR(F/T) | Naive | 2.999 (0.18) | 3.726 (0.50) | 4.451 (0.65) | 5.902 (0.86) | 3.667 (0.49) | 4.341 (0.65) | 5.685 (0.84) |
| | median (Zhang-DR) | 2.997 (0.24) | 3.074 (0.25) | 3.162 (0.27) | 3.380 (0.32) | 3.088 (0.25) | 3.193 (0.27) | 3.438 (0.32) |
| | median (Sued) | 2.999 (0.24) | 3.101 (0.25) | 3.214 (0.27) | 3.495 (0.34) | 3.112 (0.25) | 3.238 (0.28) | 3.532 (0.33) |
| | median (TMLE) | 2.997 (0.24) | 3.072 (0.25) | 3.157 (0.26) | 3.358 (0.30) | 3.086 (0.25) | 3.186 (0.27) | 3.414 (0.30) |
| | DP-DR ($\gamma=0.1$) | 2.998 (0.18) | 2.996 (0.19) | 3.051 (0.34) | 4.333 (1.57) | 3.016 (0.19) | 3.077 (0.29) | 4.007 (1.35) |
| | DP-DR ($\gamma=0.5$) | 2.995 (0.20) | 2.970 (0.21) | 2.944 (0.21) | 2.901 (0.35) | 2.994 (0.20) | 2.995 (0.21) | 3.003 (0.28) |
| | DP-DR ($\gamma=1.0$) | 2.996 (0.23) | 2.969 (0.23) | 2.943 (0.24) | 2.885 (0.25) | 2.994 (0.23) | 2.996 (0.23) | 2.999 (0.24) |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 3.002 (0.18) | 3.023 (0.19) | 3.115 (0.38) | 4.556 (1.59) | 3.040 (0.19) | 3.130 (0.32) | 4.201 (1.40) |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 2.999 (0.20) | 2.996 (0.20) | 2.997 (0.21) | 3.024 (0.40) | 3.017 (0.20) | 3.040 (0.21) | 3.104 (0.31) |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 3.000 (0.23) | 2.996 (0.23) | 2.996 (0.23) | 3.000 (0.25) | 3.018 (0.23) | 3.043 (0.23) | 3.098 (0.24) |

Table A4: Mean and SD of 10,000 simulated estimates of $\mu^{(1)}$. The covariates $X$ were generated from Gaussian distributions, and the outcome regression was obtained by the unnormalized Gaussian modeling. The characters "T" and "F" denote the correct and the incorrect modeling, respectively.

| Estimator | Method | No contam. | Homogeneous, $\varepsilon=0.05$ | Homogeneous, $\varepsilon=0.10$ | Homogeneous, $\varepsilon=0.20$ | Heterogeneous, $\varepsilon=0.05$ | Heterogeneous, $\varepsilon=0.10$ | Heterogeneous, $\varepsilon=0.20$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IPW(T/-) | Naive | 0.008 | 0.011 | 0.009 | 0.010 | 0.009 | 0.008 | 0.006 |
| | median (Firpo) | 2.028 | 2.038 | 2.009 | 1.992 | 1.591 | 1.604 | 1.610 |
| | median (Zhang-IPW) | 0.164 | 0.184 | 0.159 | 0.167 | 0.172 | 0.166 | 0.168 |
| | DP-IPW ($\gamma=0.1$) | 33.735 | 44.282 | 60.503 | 79.176 | 39.488 | 54.135 | 70.099 |
| | DP-IPW ($\gamma=0.5$) | 48.650 | 49.775 | 47.834 | 46.162 | 43.983 | 43.577 | 41.231 |
| | DP-IPW ($\gamma=1.0$) | 65.725 | 64.715 | 61.072 | 53.993 | 56.336 | 54.092 | 48.227 |
| DR(T/T) | Naive | 0.028 | 0.013 | 0.015 | 0.015 | 0.014 | 0.017 | 0.015 |
| | median (Zhang-DR) | 0.412 | 0.442 | 0.429 | 0.435 | 0.359 | 0.371 | 0.370 |
| | median (Sued) | 1.736 | 1.806 | 1.758 | 1.836 | 1.579 | 1.624 | 1.605 |
| | median (TMLE) | 1055.657 | 1033.144 | 1008.369 | 993.124 | 943.045 | 934.975 | 921.709 |
| | DP-DR ($\gamma=0.1$) | 18.983 | 57.100 | 136.473 | 211.645 | 52.968 | 125.900 | 186.668 |
| | DP-DR ($\gamma=0.5$) | 56.355 | 48.725 | 48.968 | 39.064 | 43.681 | 43.085 | 36.285 |
| | DP-DR ($\gamma=1.0$) | 87.300 | 82.622 | 74.404 | 62.842 | 73.313 | 69.041 | 55.489 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 19.912 | 56.479 | 135.810 | 216.944 | 51.907 | 128.889 | 187.006 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 57.311 | 52.397 | 50.270 | 39.846 | 44.903 | 46.369 | 35.150 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 86.517 | 79.409 | 74.941 | 60.813 | 73.977 | 68.664 | 54.090 |
| DR(T/F) | Naive | 0.016 | 0.016 | 0.029 | 0.017 | 0.013 | 0.012 | 0.014 |
| | median (Zhang-DR) | 0.339 | 0.368 | 0.362 | 0.366 | 0.312 | 0.312 | 0.319 |
| | median (Sued) | 1.689 | 1.873 | 3.724 | 3.779 | 1.445 | 1.501 | 1.683 |
| | median (TMLE) | 981.209 | 994.937 | 968.273 | 974.131 | 887.897 | 895.797 | 915.279 |
| | DP-DR ($\gamma=0.1$) | 18.828 | 59.230 | 132.483 | 197.779 | 53.452 | 124.888 | 179.470 |
| | DP-DR ($\gamma=0.5$) | 55.630 | 51.825 | 46.049 | 36.762 | 48.728 | 44.227 | 36.664 |
| | DP-DR ($\gamma=1.0$) | 81.111 | 80.063 | 71.932 | 58.312 | 70.519 | 67.554 | 54.077 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 18.391 | 60.712 | 130.125 | 199.546 | 54.833 | 126.829 | 180.125 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 56.661 | 52.369 | 47.067 | 40.297 | 49.611 | 42.699 | 37.208 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 83.076 | 80.920 | 73.754 | 62.673 | 71.937 | 67.024 | 56.490 |
| DR(F/T) | Naive | 0.013 | 0.016 | 0.014 | 0.015 | 0.014 | 0.016 | 0.014 |
| | median (Zhang-DR) | 0.404 | 2.159 | 0.415 | 0.423 | 0.372 | 0.363 | 0.369 |
| | median (Sued) | 1.652 | 1.849 | 1.657 | 1.781 | 1.505 | 1.487 | 1.564 |
| | median (TMLE) | 940.593 | 961.615 | 929.511 | 967.157 | 861.292 | 867.211 | 893.940 |
| | DP-DR ($\gamma=0.1$) | 19.768 | 63.865 | 144.046 | 222.438 | 50.837 | 116.164 | 225.446 |
| | DP-DR ($\gamma=0.5$) | 51.923 | 49.406 | 49.673 | 42.449 | 45.033 | 45.493 | 36.107 |
| | DP-DR ($\gamma=1.0$) | 90.062 | 85.686 | 74.602 | 64.006 | 76.707 | 72.257 | 57.556 |
| | $\varepsilon$DP-DR ($\gamma=0.1$) | 19.023 | 63.164 | 147.321 | 202.361 | 50.513 | 115.868 | 210.642 |
| | $\varepsilon$DP-DR ($\gamma=0.5$) | 57.110 | 50.739 | 46.918 | 42.489 | 47.041 | 43.003 | 36.828 |
| | $\varepsilon$DP-DR ($\gamma=1.0$) | 87.439 | 87.305 | 75.546 | 62.424 | 76.284 | 69.237 | 58.958 |

Table A5: Mean computation time (ms) of 10,000 simulations. The covariates $X$ were generated from Gaussian distributions, and the outcome regression was obtained by the unnormalized Gaussian modeling. The characters "T" and "F" denote the correct and the incorrect modeling, respectively.
No contam. Homogeneous Heterogeneous
ε=0.05\varepsilon=0.05 ε=0.10\varepsilon=0.10 ε=0.20\varepsilon=0.20 ε=0.05\varepsilon=0.05 ε=0.10\varepsilon=0.10 ε=0.20\varepsilon=0.20
IPW(T/-) Naive 0.197 0.952 1.711 3.174 1.000 1.762 3.269
median (Firpo) 0.278 0.319 0.407 0.705 0.334 0.447 0.785
median (Zhang-IPW) 0.278 0.319 0.407 0.705 0.334 0.447 0.785
DP-IPW (γ=0.1\gamma=0.1) 0.199 0.230 0.567 2.329 0.253 0.609 2.429
DP-IPW (γ=0.5\gamma=0.5) 0.223 0.230 0.239 0.703 0.230 0.247 0.690
DP-IPW (γ=1.0\gamma=1.0) 0.273 0.273 0.273 0.432 0.272 0.279 0.394
DR(T/T) Naive 0.182 0.948 1.711 3.176 1.000 1.766 3.275
median (Zhang-DR) 0.265 0.308 0.404 0.732 0.327 0.446 0.800
median (Sued) 0.264 0.308 0.402 0.704 0.327 0.445 0.786
median (TMLE) 0.263 0.308 0.401 0.664 0.326 0.442 0.768
DP-DR (γ=0.1\gamma=0.1) 0.184 0.219 0.582 2.332 0.254 0.624 2.438
DP-DR (γ=0.5\gamma=0.5) 0.208 0.216 0.243 0.745 0.218 0.246 0.726
DP-DR (γ=1.0\gamma=1.0) 0.257 0.258 0.260 0.498 0.259 0.272 0.480
ε\varepsilonDP-DR (γ=0.1\gamma=0.1) 0.184 0.218 0.579 2.330 0.251 0.618 2.435
ε\varepsilonDP-DR (γ=0.5\gamma=0.5) 0.208 0.216 0.241 0.730 0.218 0.245 0.710
ε\varepsilonDP-DR (γ=1.0\gamma=1.0) 0.257 0.258 0.260 0.481 0.260 0.266 0.437
DR(T/F) Naive 0.202 0.954 1.715 3.179 1.004 1.767 3.276
median (Zhang-DR) 0.285 0.324 0.413 0.738 0.340 0.453 0.806
median (Sued) 0.285 0.325 0.412 0.710 0.340 0.453 0.792
median (TMLE) 0.267 0.313 0.406 0.682 0.331 0.448 0.786
DP-DR (γ=0.1\gamma=0.1) 0.204 0.235 0.579 2.332 0.261 0.625 2.436
DP-DR (γ=0.5\gamma=0.5) 0.229 0.235 0.257 0.752 0.236 0.268 0.754
DP-DR (γ=1.0\gamma=1.0) 0.279 0.279 0.280 0.514 0.279 0.287 0.487
ε\varepsilonDP-DR (γ=0.1\gamma=0.1) 0.204 0.234 0.577 2.329 0.262 0.623 2.433
ε\varepsilonDP-DR (γ=0.5\gamma=0.5) 0.229 0.235 0.256 0.742 0.235 0.266 0.743
ε\varepsilonDP-DR (γ=1.0\gamma=1.0) 0.279 0.279 0.279 0.500 0.278 0.286 0.476
DR(F/T) Naive 0.180 0.887 1.611 3.037 0.832 1.488 2.805
median (Zhang-DR) 0.259 0.288 0.349 0.544 0.295 0.365 0.591
median (Sued) 0.259 0.299 0.387 0.658 0.306 0.399 0.683
median (TMLE) 0.256 0.282 0.338 0.508 0.289 0.354 0.554
DP-DR ($\gamma=0.1$) 0.183 0.195 0.381 2.121 0.196 0.315 1.734
DP-DR ($\gamma=0.5$) 0.206 0.216 0.229 0.467 0.213 0.219 0.292
DP-DR ($\gamma=1.0$) 0.247 0.251 0.259 0.301 0.249 0.251 0.258
$\varepsilon$DP-DR ($\gamma=0.1$) 0.183 0.198 0.426 2.273 0.202 0.358 1.888
$\varepsilon$DP-DR ($\gamma=0.5$) 0.206 0.212 0.217 0.491 0.212 0.220 0.318
$\varepsilon$DP-DR ($\gamma=1.0$) 0.247 0.247 0.247 0.264 0.248 0.253 0.278
Table A6: Results of the comparative study. Each entry is the RMSE of the corresponding method's estimates relative to the true value. The covariates $X$ were generated from uniform distributions, and the outcome regression was fitted by the Gaussian MLE using non-outliers. The labels "T" and "F" denote correct and incorrect model specification, respectively.
No contam. Homogeneous Heterogeneous
$\varepsilon=0.05$ $\varepsilon=0.10$ $\varepsilon=0.20$ $\varepsilon=0.05$ $\varepsilon=0.10$ $\varepsilon=0.20$
IPW(T/-) Naive 3.003 (0.20) 3.759 (0.57) 4.521 (0.79) 6.004 (1.02) 3.783 (0.62) 4.556 (0.83) 6.090 (1.07)
median (Firpo) 2.985 (0.28) 3.099 (0.30) 3.233 (0.33) 3.541 (0.45) 3.130 (0.31) 3.292 (0.34) 3.666 (0.42)
median (Zhang-IPW) 2.985 (0.28) 3.099 (0.30) 3.233 (0.33) 3.541 (0.45) 3.130 (0.31) 3.292 (0.34) 3.666 (0.42)
DP-IPW ($\gamma=0.1$) 3.000 (0.20) 3.038 (0.23) 3.186 (0.54) 4.590 (1.70) 3.066 (0.24) 3.241 (0.56) 4.713 (1.72)
DP-IPW ($\gamma=0.5$) 2.991 (0.22) 2.990 (0.23) 2.994 (0.24) 3.075 (0.70) 3.017 (0.23) 3.048 (0.24) 3.185 (0.66)
DP-IPW ($\gamma=1.0$) 2.979 (0.27) 2.978 (0.27) 2.982 (0.27) 2.998 (0.43) 3.009 (0.27) 3.043 (0.28) 3.123 (0.37)
DR(T/T) Naive 3.001 (0.18) 3.757 (0.57) 4.519 (0.79) 6.003 (1.03) 3.781 (0.62) 4.555 (0.84) 6.090 (1.08)
median (Zhang-DR) 2.990 (0.27) 3.104 (0.29) 3.237 (0.33) 3.547 (0.49) 3.135 (0.30) 3.295 (0.33) 3.669 (0.44)
median (Sued) 2.990 (0.26) 3.103 (0.29) 3.237 (0.32) 3.544 (0.45) 3.135 (0.30) 3.295 (0.33) 3.668 (0.41)
median (TMLE) 2.991 (0.26) 3.104 (0.29) 3.236 (0.32) 3.530 (0.40) 3.134 (0.30) 3.292 (0.33) 3.652 (0.41)
DP-DR ($\gamma=0.1$) 2.999 (0.18) 3.035 (0.22) 3.185 (0.55) 4.582 (1.71) 3.064 (0.25) 3.238 (0.58) 4.704 (1.74)
DP-DR ($\gamma=0.5$) 2.995 (0.21) 2.992 (0.22) 2.997 (0.24) 3.083 (0.74) 3.020 (0.22) 3.051 (0.24) 3.190 (0.70)
DP-DR ($\gamma=1.0$) 2.988 (0.26) 2.985 (0.26) 2.988 (0.26) 3.008 (0.50) 3.016 (0.26) 3.049 (0.27) 3.136 (0.46)
$\varepsilon$DP-DR ($\gamma=0.1$) 2.999 (0.18) 3.035 (0.22) 3.184 (0.55) 4.582 (1.71) 3.064 (0.24) 3.237 (0.57) 4.706 (1.74)
$\varepsilon$DP-DR ($\gamma=0.5$) 2.995 (0.21) 2.992 (0.22) 2.996 (0.24) 3.081 (0.73) 3.020 (0.22) 3.051 (0.24) 3.188 (0.68)
$\varepsilon$DP-DR ($\gamma=1.0$) 2.988 (0.26) 2.985 (0.26) 2.988 (0.26) 3.005 (0.48) 3.015 (0.26) 3.049 (0.26) 3.132 (0.42)
DR(T/F) Naive 3.005 (0.20) 3.762 (0.57) 4.524 (0.79) 6.008 (1.03) 3.786 (0.62) 4.560 (0.83) 6.095 (1.07)
median (Zhang-DR) 2.987 (0.28) 3.103 (0.31) 3.237 (0.34) 3.550 (0.49) 3.134 (0.31) 3.296 (0.34) 3.673 (0.44)
median (Sued) 2.987 (0.28) 3.102 (0.31) 3.236 (0.34) 3.544 (0.46) 3.133 (0.31) 3.295 (0.34) 3.670 (0.42)
median (TMLE) 2.996 (0.27) 3.110 (0.29) 3.243 (0.33) 3.547 (0.41) 3.140 (0.30) 3.300 (0.33) 3.671 (0.41)
DP-DR ($\gamma=0.1$) 3.003 (0.20) 3.042 (0.23) 3.192 (0.55) 4.590 (1.71) 3.070 (0.25) 3.249 (0.57) 4.714 (1.73)
DP-DR ($\gamma=0.5$) 2.994 (0.23) 2.993 (0.23) 2.997 (0.26) 3.090 (0.75) 3.020 (0.23) 3.053 (0.26) 3.204 (0.73)
DP-DR ($\gamma=1.0$) 2.982 (0.28) 2.981 (0.28) 2.985 (0.28) 3.008 (0.51) 3.011 (0.28) 3.046 (0.28) 3.136 (0.47)
$\varepsilon$DP-DR ($\gamma=0.1$) 3.003 (0.20) 3.042 (0.23) 3.191 (0.54) 4.587 (1.70) 3.070 (0.25) 3.248 (0.57) 4.713 (1.73)
$\varepsilon$DP-DR ($\gamma=0.5$) 2.994 (0.23) 2.992 (0.23) 2.997 (0.26) 3.087 (0.74) 3.020 (0.23) 3.052 (0.26) 3.201 (0.72)
$\varepsilon$DP-DR ($\gamma=1.0$) 2.982 (0.28) 2.980 (0.28) 2.984 (0.28) 3.006 (0.50) 3.011 (0.28) 3.046 (0.28) 3.134 (0.46)
DR(F/T) Naive 3.000 (0.18) 3.737 (0.49) 4.469 (0.66) 5.911 (0.87) 3.676 (0.49) 4.344 (0.64) 5.676 (0.84)
median (Zhang-DR) 3.005 (0.26) 3.088 (0.27) 3.187 (0.29) 3.417 (0.35) 3.104 (0.28) 3.217 (0.29) 3.482 (0.34)
median (Sued) 2.999 (0.26) 3.113 (0.28) 3.243 (0.30) 3.546 (0.37) 3.125 (0.28) 3.264 (0.30) 3.587 (0.35)
median (TMLE) 3.004 (0.26) 3.085 (0.27) 3.180 (0.29) 3.391 (0.32) 3.101 (0.27) 3.210 (0.29) 3.454 (0.32)
DP-DR ($\gamma=0.1$) 3.000 (0.18) 3.007 (0.20) 3.088 (0.37) 4.422 (1.57) 3.027 (0.19) 3.104 (0.30) 4.082 (1.36)
DP-DR ($\gamma=0.5$) 3.000 (0.21) 2.970 (0.21) 2.943 (0.22) 2.907 (0.46) 2.997 (0.21) 2.996 (0.22) 2.999 (0.29)
DP-DR ($\gamma=1.0$) 3.003 (0.25) 2.970 (0.25) 2.939 (0.25) 2.868 (0.27) 2.999 (0.25) 2.998 (0.25) 2.994 (0.26)
$\varepsilon$DP-DR ($\gamma=0.1$) 3.000 (0.18) 3.032 (0.20) 3.150 (0.40) 4.641 (1.57) 3.052 (0.20) 3.162 (0.32) 4.286 (1.38)
$\varepsilon$DP-DR ($\gamma=0.5$) 3.000 (0.21) 2.997 (0.21) 2.999 (0.22) 3.036 (0.49) 3.024 (0.21) 3.052 (0.21) 3.121 (0.29)
$\varepsilon$DP-DR ($\gamma=1.0$) 3.003 (0.25) 2.999 (0.25) 3.001 (0.25) 3.000 (0.26) 3.029 (0.25) 3.060 (0.25) 3.125 (0.25)
Table A7: Mean (SD) of 10,000 simulated estimates of $\mu^{(1)}$. The covariates $X$ were generated from uniform distributions, and the outcome regression was fitted by the Gaussian MLE using non-outliers. The labels "T" and "F" denote correct and incorrect model specification, respectively.
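The RMSE table and the mean (SD) table can be cross-checked against each other through the standard decomposition RMSE^2 = bias^2 + SD^2. The sketch below is only illustrative: it assumes the true value of the potential mean is 3.0, which the no-contamination columns suggest but which is not stated in these tables, and the function name `rmse_from_mean_sd` is hypothetical. The inputs are the first two entries of the IPW(T/-) Naive row of Table A7.

```python
import math

# Assumption: the true potential mean is 3.0 (suggested by the
# no-contamination columns, not stated in the tables themselves).
TRUTH = 3.0


def rmse_from_mean_sd(mean, sd, truth):
    """RMSE implied by the decomposition sqrt(bias^2 + SD^2)."""
    bias = mean - truth
    return math.sqrt(bias ** 2 + sd ** 2)


# (mean, SD) pairs copied from Table A7, row "IPW(T/-) Naive":
# no contamination, and homogeneous contamination with epsilon = 0.05.
naive_ipw = [(3.003, 0.20), (3.759, 0.57)]

implied = [round(rmse_from_mean_sd(m, s, TRUTH), 3) for m, s in naive_ipw]
print(implied)
```

The implied values should land close to the corresponding entries of Table A6 (0.197 and 0.952); small gaps are expected because the SDs are reported to only two decimals.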
No contam. Homogeneous Heterogeneous
$\varepsilon=0.05$ $\varepsilon=0.10$ $\varepsilon=0.20$ $\varepsilon=0.05$ $\varepsilon=0.10$ $\varepsilon=0.20$
IPW(T/-) Naive 0.006 0.008 0.008 0.010 0.009 0.008 0.011
median (Firpo) 1.624 1.554 1.598 1.596 1.640 1.599 1.595
median (Zhang-IPW) 0.144 0.155 0.146 0.149 0.163 0.166 0.166
DP-IPW ($\gamma=0.1$) 28.899 40.705 57.891 72.877 41.438 58.884 72.039
DP-IPW ($\gamma=0.5$) 41.124 43.104 41.639 40.541 43.319 42.599 40.527
DP-IPW ($\gamma=1.0$) 58.435 57.102 56.009 48.982 57.823 54.624 48.146
DR(T/T) Naive 0.013 0.017 0.014 0.014 0.011 0.013 0.012
median (Zhang-DR) 0.354 0.380 0.429 0.452 0.391 0.387 0.388
median (Sued) 1.470 1.427 1.650 1.733 1.660 1.535 1.645
median (TMLE) 1019.177 987.574 1046.361 1019.792 1012.613 973.752 953.734
DP-DR ($\gamma=0.1$) 14.030 65.830 156.014 221.084 65.235 150.543 206.717
DP-DR ($\gamma=0.5$) 42.618 39.445 42.180 33.461 39.311 39.022 30.945
DP-DR ($\gamma=1.0$) 80.107 73.122 75.633 54.222 74.230 66.690 52.706
$\varepsilon$DP-DR ($\gamma=0.1$) 13.830 61.746 159.505 222.713 64.413 146.996 203.859
$\varepsilon$DP-DR ($\gamma=0.5$) 42.832 43.615 42.286 36.057 43.612 37.242 31.352
$\varepsilon$DP-DR ($\gamma=1.0$) 80.584 75.842 71.612 55.196 77.477 64.020 52.287
DR(T/F) Naive 0.011 0.013 0.018 0.014 0.013 0.013 0.018
median (Zhang-DR) 0.298 2.320 0.365 0.371 0.316 0.320 0.334
median (Sued) 1.497 4.069 1.717 1.870 1.548 1.509 1.679
median (TMLE) 908.448 921.784 963.501 973.878 904.782 896.085 914.020
DP-DR ($\gamma=0.1$) 12.200 61.403 166.398 223.529 62.714 154.528 201.548
DP-DR ($\gamma=0.5$) 42.066 38.529 41.758 32.706 40.717 34.629 31.154
DP-DR ($\gamma=1.0$) 76.853 73.440 75.127 59.673 74.651 63.882 52.368
$\varepsilon$DP-DR ($\gamma=0.1$) 12.005 64.193 164.242 216.765 61.661 152.694 206.549
$\varepsilon$DP-DR ($\gamma=0.5$) 42.311 40.010 39.141 34.312 40.385 37.031 31.584
$\varepsilon$DP-DR ($\gamma=1.0$) 77.362 74.394 73.187 54.911 75.535 67.096 51.224
DR(F/T) Naive 0.014 0.014 0.017 0.017 0.012 0.013 0.011
median (Zhang-DR) 0.368 0.431 0.412 0.431 0.390 0.379 2.279
median (Sued) 1.586 3.853 1.855 1.805 1.589 1.626 4.012
median (TMLE) 896.351 970.473 976.782 987.069 909.770 907.980 930.365
DP-DR ($\gamma=0.1$) 13.761 74.393 177.183 239.704 60.244 143.986 246.342
DP-DR ($\gamma=0.5$) 44.319 46.824 40.203 37.486 42.798 40.716 34.311
DP-DR ($\gamma=1.0$) 85.614 85.866 75.166 60.665 80.635 71.475 59.946
$\varepsilon$DP-DR ($\gamma=0.1$) 13.569 72.655 175.524 216.075 58.960 144.481 227.731
$\varepsilon$DP-DR ($\gamma=0.5$) 44.594 50.019 45.107 37.493 41.513 42.198 33.163
$\varepsilon$DP-DR ($\gamma=1.0$) 85.994 84.480 77.364 62.210 80.407 74.459 57.817
Table A8: Mean computation time (ms) over 10,000 simulations. The covariates $X$ were generated from uniform distributions, and the outcome regression was fitted by the Gaussian MLE using non-outliers. The labels "T" and "F" denote correct and incorrect model specification, respectively.
No contam. Homogeneous Heterogeneous
$\varepsilon=0.05$ $\varepsilon=0.10$ $\varepsilon=0.20$ $\varepsilon=0.05$ $\varepsilon=0.10$ $\varepsilon=0.20$
IPW(T/-) Naive 0.197 0.952 1.711 3.174 1.000 1.762 3.269
median (Firpo) 0.278 0.319 0.407 0.705 0.334 0.447 0.785
median (Zhang-IPW) 0.278 0.319 0.407 0.705 0.334 0.447 0.785
DP-IPW ($\gamma=0.1$) 0.199 0.230 0.567 2.329 0.253 0.609 2.429
DP-IPW ($\gamma=0.5$) 0.223 0.230 0.239 0.703 0.230 0.247 0.690
DP-IPW ($\gamma=1.0$) 0.273 0.273 0.273 0.432 0.272 0.279 0.394
DR(T/T) Naive 0.182 0.948 1.711 3.176 1.000 1.766 3.275
median (Zhang-DR) 0.265 0.309 0.404 0.732 0.328 0.446 0.799
median (Sued) 0.264 0.308 0.403 0.705 0.327 0.445 0.785
median (TMLE) 0.263 0.307 0.402 0.659 0.327 0.443 0.761
DP-DR ($\gamma=0.1$) 0.184 0.219 0.583 2.333 0.254 0.623 2.439
DP-DR ($\gamma=0.5$) 0.208 0.216 0.243 0.746 0.218 0.246 0.726
DP-DR ($\gamma=1.0$) 0.257 0.258 0.260 0.504 0.259 0.272 0.480
$\varepsilon$DP-DR ($\gamma=0.1$) 0.184 0.218 0.578 2.329 0.251 0.618 2.434
$\varepsilon$DP-DR ($\gamma=0.5$) 0.208 0.216 0.241 0.726 0.218 0.245 0.708
$\varepsilon$DP-DR ($\gamma=1.0$) 0.257 0.258 0.260 0.482 0.260 0.266 0.434
DR(T/F) Naive 0.203 0.954 1.714 3.178 1.003 1.767 3.275
median (Zhang-DR) 0.285 0.325 0.413 0.725 0.341 0.453 0.806
median (Sued) 0.284 0.324 0.412 0.710 0.340 0.453 0.792
median (TMLE) 0.266 0.313 0.408 0.697 0.331 0.449 0.787
DP-DR ($\gamma=0.1$) 0.205 0.236 0.580 2.330 0.262 0.625 2.437
DP-DR ($\gamma=0.5$) 0.230 0.236 0.258 0.750 0.237 0.269 0.755
DP-DR ($\gamma=1.0$) 0.280 0.280 0.281 0.501 0.279 0.288 0.489
$\varepsilon$DP-DR ($\gamma=0.1$) 0.205 0.235 0.577 2.328 0.261 0.622 2.434
$\varepsilon$DP-DR ($\gamma=0.5$) 0.230 0.236 0.256 0.738 0.236 0.267 0.743
$\varepsilon$DP-DR ($\gamma=1.0$) 0.280 0.279 0.280 0.488 0.279 0.287 0.475
DR(F/T) Naive 0.182 0.887 1.611 3.037 0.832 1.488 2.805
median (Zhang-DR) 0.260 0.285 0.344 0.535 0.292 0.359 0.582
median (Sued) 0.259 0.300 0.388 0.658 0.306 0.399 0.683
median (TMLE) 0.258 0.280 0.333 0.498 0.287 0.348 0.543
DP-DR ($\gamma=0.1$) 0.185 0.197 0.385 2.128 0.197 0.319 1.741
DP-DR ($\gamma=0.5$) 0.208 0.219 0.233 0.474 0.215 0.221 0.290
DP-DR ($\gamma=1.0$) 0.249 0.254 0.265 0.304 0.251 0.253 0.261
$\varepsilon$DP-DR ($\gamma=0.1$) 0.185 0.202 0.449 2.294 0.204 0.368 1.897
$\varepsilon$DP-DR ($\gamma=0.5$) 0.207 0.212 0.217 0.516 0.213 0.220 0.320
$\varepsilon$DP-DR ($\gamma=1.0$) 0.248 0.247 0.247 0.267 0.248 0.251 0.272
Table A9: Results of the comparative study. Each entry is the RMSE of the corresponding method's estimates relative to the true value. The covariates $X$ were generated from uniform distributions, and the outcome regression was fitted by the unnormalized Gaussian modeling. The labels "T" and "F" denote correct and incorrect model specification, respectively.
No contam. Homogeneous Heterogeneous
$\varepsilon=0.05$ $\varepsilon=0.10$ $\varepsilon=0.20$ $\varepsilon=0.05$ $\varepsilon=0.10$ $\varepsilon=0.20$
IPW(T/-) Naive 3.003 (0.20) 3.759 (0.57) 4.521 (0.79) 6.004 (1.02) 3.783 (0.62) 4.556 (0.83) 6.090 (1.07)
median (Firpo) 2.985 (0.28) 3.099 (0.30) 3.233 (0.33) 3.541 (0.45) 3.130 (0.31) 3.292 (0.34) 3.666 (0.42)
median (Zhang-IPW) 2.985 (0.28) 3.099 (0.30) 3.233 (0.33) 3.541 (0.45) 3.130 (0.31) 3.292 (0.34) 3.666 (0.42)
DP-IPW ($\gamma=0.1$) 3.000 (0.20) 3.038 (0.23) 3.186 (0.54) 4.590 (1.70) 3.066 (0.24) 3.241 (0.56) 4.713 (1.72)
DP-IPW ($\gamma=0.5$) 2.991 (0.22) 2.990 (0.23) 2.994 (0.24) 3.075 (0.70) 3.017 (0.23) 3.048 (0.24) 3.185 (0.66)
DP-IPW ($\gamma=1.0$) 2.979 (0.27) 2.978 (0.27) 2.982 (0.27) 2.998 (0.43) 3.009 (0.27) 3.043 (0.28) 3.123 (0.37)
DR(T/T) Naive 3.001 (0.18) 3.757 (0.57) 4.519 (0.79) 6.003 (1.03) 3.781 (0.62) 4.555 (0.84) 6.090 (1.08)
median (Zhang-DR) 2.990 (0.27) 3.104 (0.29) 3.238 (0.33) 3.548 (0.49) 3.135 (0.30) 3.295 (0.33) 3.668 (0.44)
median (Sued) 2.990 (0.26) 3.104 (0.29) 3.237 (0.33) 3.544 (0.45) 3.135 (0.30) 3.295 (0.33) 3.667 (0.41)
median (TMLE) 2.991 (0.26) 3.104 (0.29) 3.236 (0.33) 3.527 (0.40) 3.134 (0.30) 3.292 (0.33) 3.647 (0.40)
DP-DR ($\gamma=0.1$) 2.999 (0.18) 3.035 (0.22) 3.185 (0.55) 4.582 (1.71) 3.064 (0.25) 3.238 (0.58) 4.706 (1.74)
DP-DR ($\gamma=0.5$) 2.995 (0.21) 2.992 (0.22) 2.996 (0.24) 3.083 (0.74) 3.020 (0.22) 3.051 (0.24) 3.190 (0.70)
DP-DR ($\gamma=1.0$) 2.988 (0.26) 2.985 (0.26) 2.988 (0.26) 3.009 (0.50) 3.016 (0.26) 3.049 (0.27) 3.136 (0.46)
$\varepsilon$DP-DR ($\gamma=0.1$) 2.999 (0.18) 3.035 (0.22) 3.184 (0.55) 4.581 (1.71) 3.064 (0.24) 3.238 (0.57) 4.705 (1.74)
$\varepsilon$DP-DR ($\gamma=0.5$) 2.995 (0.21) 2.992 (0.22) 2.996 (0.24) 3.080 (0.72) 3.020 (0.22) 3.051 (0.24) 3.187 (0.68)
$\varepsilon$DP-DR ($\gamma=1.0$) 2.988 (0.26) 2.985 (0.26) 2.988 (0.26) 3.006 (0.48) 3.016 (0.26) 3.049 (0.26) 3.132 (0.41)
DR(T/F) Naive 3.005 (0.20) 3.761 (0.57) 4.523 (0.79) 6.007 (1.03) 3.785 (0.62) 4.559 (0.83) 6.094 (1.07)
median (Zhang-DR) 2.986 (0.29) 3.101 (0.31) 3.237 (0.34) 3.547 (0.48) 3.133 (0.31) 3.295 (0.34) 3.672 (0.44)
median (Sued) 2.987 (0.28) 3.101 (0.31) 3.236 (0.34) 3.544 (0.46) 3.133 (0.31) 3.295 (0.34) 3.670 (0.42)
median (TMLE) 2.997 (0.27) 3.111 (0.29) 3.245 (0.33) 3.550 (0.43) 3.141 (0.30) 3.302 (0.33) 3.672 (0.41)
DP-DR ($\gamma=0.1$) 3.002 (0.21) 3.041 (0.23) 3.192 (0.55) 4.588 (1.71) 3.070 (0.25) 3.248 (0.57) 4.715 (1.73)
DP-DR ($\gamma=0.5$) 2.993 (0.23) 2.992 (0.24) 2.997 (0.26) 3.088 (0.74) 3.019 (0.24) 3.052 (0.26) 3.202 (0.73)
DP-DR ($\gamma=1.0$) 2.981 (0.28) 2.980 (0.28) 2.984 (0.28) 3.007 (0.50) 3.010 (0.28) 3.045 (0.28) 3.134 (0.47)
$\varepsilon$DP-DR ($\gamma=0.1$) 3.002 (0.20) 3.041 (0.23) 3.191 (0.54) 4.588 (1.70) 3.069 (0.25) 3.247 (0.57) 4.714 (1.73)
$\varepsilon$DP-DR ($\gamma=0.5$) 2.993 (0.23) 2.992 (0.24) 2.997 (0.26) 3.085 (0.73) 3.019 (0.24) 3.052 (0.26) 3.200 (0.72)
$\varepsilon$DP-DR ($\gamma=1.0$) 2.981 (0.28) 2.980 (0.28) 2.983 (0.28) 3.005 (0.49) 3.010 (0.28) 3.045 (0.28) 3.133 (0.46)
DR(F/T) Naive 3.000 (0.18) 3.737 (0.49) 4.469 (0.66) 5.911 (0.87) 3.676 (0.49) 4.344 (0.64) 5.676 (0.84)
median (Zhang-DR) 2.997 (0.26) 3.079 (0.27) 3.177 (0.29) 3.405 (0.35) 3.095 (0.28) 3.207 (0.29) 3.471 (0.34)
median (Sued) 3.000 (0.26) 3.113 (0.28) 3.244 (0.30) 3.547 (0.37) 3.125 (0.28) 3.265 (0.30) 3.587 (0.35)
median (TMLE) 2.996 (0.26) 3.076 (0.27) 3.170 (0.29) 3.378 (0.32) 3.092 (0.27) 3.200 (0.29) 3.440 (0.32)
DP-DR ($\gamma=0.1$) 2.999 (0.19) 3.006 (0.20) 3.088 (0.38) 4.435 (1.57) 3.026 (0.20) 3.104 (0.30) 4.092 (1.36)
DP-DR ($\gamma=0.5$) 2.993 (0.21) 2.964 (0.22) 2.936 (0.22) 2.901 (0.46) 2.990 (0.21) 2.990 (0.22) 2.993 (0.29)
DP-DR ($\gamma=1.0$) 2.991 (0.25) 2.957 (0.25) 2.927 (0.25) 2.856 (0.27) 2.987 (0.25) 2.986 (0.25) 2.982 (0.26)
$\varepsilon$DP-DR ($\gamma=0.1$) 3.003 (0.18) 3.035 (0.20) 3.160 (0.42) 4.658 (1.58) 3.051 (0.20) 3.160 (0.33) 4.284 (1.40)
$\varepsilon$DP-DR ($\gamma=0.5$) 2.998 (0.21) 2.995 (0.21) 2.997 (0.22) 3.041 (0.51) 3.017 (0.21) 3.042 (0.22) 3.105 (0.30)
$\varepsilon$DP-DR ($\gamma=1.0$) 2.996 (0.25) 2.992 (0.25) 2.994 (0.25) 2.996 (0.27) 3.017 (0.25) 3.044 (0.25) 3.102 (0.25)
Table A10: Mean (SD) of 10,000 simulated estimates of $\mu^{(1)}$. The covariates $X$ were generated from uniform distributions, and the outcome regression was fitted by the unnormalized Gaussian modeling. The labels "T" and "F" denote correct and incorrect model specification, respectively.
No contam. Homogeneous Heterogeneous
$\varepsilon=0.05$ $\varepsilon=0.10$ $\varepsilon=0.20$ $\varepsilon=0.05$ $\varepsilon=0.10$ $\varepsilon=0.20$
IPW(T/-) Naive 0.012 0.009 0.011 0.008 0.009 0.008 0.008
median (Firpo) 2.061 2.018 2.001 2.014 1.593 1.588 1.591
median (Zhang-IPW) 0.159 0.158 0.155 0.174 0.160 0.162 0.162
DP-IPW ($\gamma=0.1$) 32.502 46.715 64.496 80.991 41.595 58.327 72.898
DP-IPW ($\gamma=0.5$) 45.786 48.372 46.799 45.370 43.191 41.859 40.298
DP-IPW ($\gamma=1.0$) 67.400 64.207 61.061 54.756 57.049 54.186 47.922
DR(T/T) Naive 0.020 0.017 0.015 0.014 0.015 0.014 0.016
median (Zhang-DR) 0.426 0.435 0.413 0.432 0.371 0.372 0.372
median (Sued) 1.769 1.883 1.684 1.763 1.524 1.521 1.567
median (TMLE) 1120.595 1076.392 1039.786 1029.243 991.954 961.859 947.842
DP-DR ($\gamma=0.1$) 15.718 71.203 165.047 228.545 66.563 152.820 207.025
DP-DR ($\gamma=0.5$) 50.227 46.273 41.325 36.313 43.227 37.408 29.929
DP-DR ($\gamma=1.0$) 92.921 85.411 73.518 56.090 74.702 65.099 53.264
$\varepsilon$DP-DR ($\gamma=0.1$) 15.929 72.355 166.803 229.476 66.591 151.084 203.590
$\varepsilon$DP-DR ($\gamma=0.5$) 50.950 46.989 43.582 33.812 45.355 37.982 32.001
$\varepsilon$DP-DR ($\gamma=1.0$) 93.544 82.191 72.519 58.983 74.507 67.979 54.172
DR(T/F) Naive 0.015 0.014 0.014 0.017 0.012 0.013 0.012
median (Zhang-DR) 0.344 0.369 0.379 0.348 0.310 0.316 0.313
median (Sued) 1.674 1.913 1.710 1.805 1.645 1.543 1.579
median (TMLE) 1004.081 999.738 972.337 971.670 918.025 905.992 907.104
DP-DR ($\gamma=0.1$) 13.940 71.371 157.230 220.831 63.078 148.782 198.135
DP-DR ($\gamma=0.5$) 46.461 45.985 38.650 34.250 40.784 36.915 30.849
DP-DR ($\gamma=1.0$) 82.531 80.473 72.655 57.973 74.487 64.093 50.140
$\varepsilon$DP-DR ($\gamma=0.1$) 13.889 68.556 163.008 221.689 63.626 148.044 204.685
$\varepsilon$DP-DR ($\gamma=0.5$) 46.726 44.754 39.936 35.078 40.930 37.054 29.266
$\varepsilon$DP-DR ($\gamma=1.0$) 83.223 81.341 72.481 56.218 73.313 64.716 49.878
DR(F/T) Naive 0.013 0.014 0.014 0.018 0.012 0.017 0.012
median (Zhang-DR) 0.411 0.429 0.435 0.435 0.383 0.366 0.374
median (Sued) 1.719 1.848 1.776 1.807 1.503 1.684 1.580
median (TMLE) 974.546 977.795 969.097 988.810 892.363 913.748 924.284
DP-DR ($\gamma=0.1$) 16.894 76.806 185.199 241.377 61.562 142.078 254.437
DP-DR ($\gamma=0.5$) 56.429 50.977 44.908 40.217 46.906 41.295 34.944
DP-DR ($\gamma=1.0$) 92.542 89.220 78.322 64.042 80.361 74.055 58.842
$\varepsilon$DP-DR ($\gamma=0.1$) 17.194 76.893 177.770 220.029 61.670 147.734 233.473
$\varepsilon$DP-DR ($\gamma=0.5$) 55.339 51.744 45.734 37.134 46.247 41.614 32.412
$\varepsilon$DP-DR ($\gamma=1.0$) 93.373 88.307 80.103 64.488 80.508 74.952 61.334
Table A11: Mean computation time (ms) over 10,000 simulations. The covariates $X$ were generated from uniform distributions, and the outcome regression was fitted by the unnormalized Gaussian modeling. The labels "T" and "F" denote correct and incorrect model specification, respectively.