Causal Mediation Analysis with Multiple Treatments and Latent Confounders

Wei Li1, 2, Chunchen Liu2, Zhi Geng1, John Murray3
1School of Mathematical Sciences, Peking University, Beijing, China
2 Department of Data Mining, NEC Laboratory, China
3 Department of Computer Science, San Jose State University, USA
Abstract

Causal mediation analysis is used to evaluate direct and indirect causal effects of a treatment on an outcome of interest through an intermediate variable or a mediator. It is difficult to identify the direct and indirect causal effects because the mediator cannot be randomly assigned in many real applications. In this article, we consider a causal model including latent confounders between the mediator and the outcome. We present sufficient conditions for identifying the direct and indirect effects and propose an approach for estimating them. The performance of the proposed approach is evaluated by simulation studies. Finally, we apply the approach to data from a customer loyalty survey conducted by a telecom company.

Introduction

Randomized experiments are typically seen as a gold standard for evaluating the causal effect of a treatment on an outcome. Although the estimation of a causal effect allows researchers to examine whether the treatment causally affects the outcome, it provides only a black-box view of causality and cannot tell us how and why such an effect occurs. Mediation analysis seeks to open up the black box and helps us to understand how the treatment impacts the outcome. In particular, mediation analysis is an important tool for evaluating direct and indirect causal effects of the treatment on the outcome through an intermediate variable or a mediator. A traditional approach to mediation analysis, which was commonly used in social psychological research (Baron and Kenny, 1986; MacKinnon, Fairchild, and Fritz, 2007), involves three regression models: a regression equation of mediator on treatment, a regression equation of outcome on both treatment and mediator, and a regression equation of outcome on treatment. This regression-based traditional approach is often known as the linear structural equation modeling (LSEM) method. Usually, the LSEM framework does not consider latent confounders which affect both the mediator and the outcome in the models. However, in real applications, the mediator cannot be randomly assigned to individuals, and there may exist latent variables confounding the mediator-outcome relationship. The presence of such latent confounders often makes the direct and indirect causal effects of treatment on outcome non-identifiable. Apart from this, another drawback of the LSEM framework is that it cannot offer a general definition of these causal effects that is applicable beyond specific statistical models.
Alternatively, a large number of scholars adopted the potential outcome framework to define the direct and indirect causal effects in causal mediation analysis (Robins and Greenland, 1992; VanderWeele, 2009; Imai, Keele, and Yamamoto, 2010; VanderWeele and Vansteelandt, 2013; Li and Zhou, 2017).

Causal mediation analysis distinguishes between natural and controlled effects, which are defined for different purposes (Pearl, 2001). For example, the natural direct effect captures the effect of the treatment when one intervenes to set the mediator to its naturally occurring level, while the controlled direct effect arises after intervening to fix the mediator at a prescribed level; the latter is particularly relevant for policy making and requires that both the treatment and the mediator can be directly manipulated. The natural direct and indirect effects are more useful for understanding the underlying mechanism by which the treatment operates, because the total causal effect can be decomposed into the sum of these two natural effects.

The identifiability of direct and indirect causal effects requires the sequential ignorability assumption (Imai, Keele, and Tingley, 2010) or some other similar assumptions (Pearl, 2001; VanderWeele and Vansteelandt, 2009). The sequential ignorability assumption means that the treatment is randomly assigned and the mediator is also randomly assigned conditional on the assigned treatment and the measured covariates. Under the sequential ignorability assumption, the parameters in the LSEM approach can also have causal interpretations. However, this assumption is too stringent and may not hold even in randomized experiments. This is because there may exist some latent confounders between the mediator and outcome variables. For example, blood pressure as a mediator between treatment and heart disease cannot be randomly assigned to patients, and there may be latent confounders (e.g., diets, habits and genes) affecting both blood pressure and heart disease. To address such latent confounding problems, one possible way is to perform a sensitivity analysis to evaluate how sensitive the result is to the violation of the sequential ignorability assumption (VanderWeele, 2010; Imai, Keele, and Tingley, 2010; Li and Zhou, 2017). In order to obtain identifiability results for the case with latent confounders, Ten Have et al. (2007) proposed a linear rank preserving model approach for assessment of causal mediation effects, but they made some no-interaction assumptions which are untestable. Following this line, Zheng and Zhou (2015) extended this model to a more general setting but still required some other untestable assumptions. Another way of dealing with latent confounders is to use baseline covariates interacted with the random treatment assignment as an instrumental variable.
Specifically, this approach assumes that there exists a baseline covariate which interacts with the treatment in predicting the mediator but is not predictive of the outcome (Dunn and Bentall, 2007; Albert, 2008; Small, 2012).

In this article, we focus on the identification and estimation of natural direct and indirect causal effects in causal mediation analysis. We propose an approach to dealing with multiple and correlated treatments for causal mediation analysis. We give the formal definitions of causal mediation effects for each treatment while accounting for possible correlations with other treatments. Moreover, our approach is also applicable to the case with latent confounders between mediator and outcome variables. We allow for interactions between treatments and covariates in both mediator and outcome models, and we utilize the information from multiple treatments to identify the direct and indirect causal effects and obtain consistent estimates of them.

Preliminaries

Observed random variables

Let $\mathbf{Z}_{i}$ denote the observed treatments assigned to individual $i$, a vector of $J\geq 2$ treatment variables, i.e., $\mathbf{Z}_{i}=(Z_{i1},Z_{i2},\ldots,Z_{iJ})^{\top}$. Let $Y_{i}$ denote the observed outcome for individual $i$ and $M_{i}$ denote some observed intermediate variable on the causal path from the treatments to the outcome. To streamline the notation for the random variables, we suppress the individual subscript $i$ below.

The components of $\mathbf{Z}$ can be correlated with each other through an unobserved common cause. For each $j=1,\ldots,J$, assume that $\Pr(Z_{j}=z_{j})>0$ for every treatment level $z_{j}\in\{1,\ldots,K_{j}\}$. Both $Y$ and $M$ are assumed to be continuous, and there may exist a latent confounder $U$ confounding the relationship between these two variables. In general, we may also consider $M$ as a vector of mediator variables, and our results in this article can be straightforwardly generalized from the setting of a single mediator to the setting of multiple mediators.

Potential outcomes and assumptions

To formally define causal effects in causal mediation analysis, we make use of the concept of potential outcomes. Potential outcomes represent the values of an outcome variable for each individual under varying levels of a treatment variable. We can observe only one of these potential outcomes and can never observe all of them, because it is impossible to turn back time and assign the same individual to other treatment levels.

We first make the stable unit treatment value assumption (SUTVA). This assumption requires that the value of the outcome should not be affected by the manner of manipulations providing the same value for the treatment variable, that is, there is only one version of the potential outcomes and there is no interference between individuals (Rubin, 1980). The SUTVA allows us to uniquely define the potential values for the mediator $M(\mathbf{z})$ and the potential outcome $Y(\mathbf{z})$ if an individual were to receive treatment $\mathbf{Z}=\mathbf{z}$. Let $Y(\mathbf{z},m)$ denote the potential outcome for an individual that would occur if the treatment $\mathbf{Z}$ were set to level $\mathbf{z}$, and if the mediator $M$ were manipulated to level $m$. In contrast, let $Y(\mathbf{z},M(\mathbf{z}^{*}))$ denote the potential outcome for an individual, where we do not specify the actual level of $M$, but set it to what it would have been if treatment had been $\mathbf{Z}=\mathbf{z}^{*}$. To connect the observed random variables with corresponding potential outcomes, we also make the consistency assumption, namely that $M(\mathbf{z})=M$, $Y(\mathbf{z})=Y$ if $\mathbf{Z}=\mathbf{z}$, and $Y(\mathbf{z},m)=Y$ if $\mathbf{Z}=\mathbf{z}$, $M=m$. According to this assumption, we note that the observed outcome $Y$ is just one realization of the potential outcome $Y(\mathbf{z},m)$ with observed treatment level $\mathbf{Z}=\mathbf{z}$ and mediator level $M=m$.

We also assume that the underlying causal model corresponds to a directed acyclic graph (DAG; Pearl, 2000). The DAG is a useful tool for visual representations of qualitative causal relationships between the variables of interest. We show the corresponding DAG of this context in Figure 1. Note that there are no causal relationships between the treatment variables in $\mathbf{Z}$, but they may be associated with each other through some unobserved variable. We use $C$ to denote this unobserved common cause of the treatment variables in $\mathbf{Z}$, and it is assumed to be independent of the other variables conditional on $\mathbf{Z}$. When the treatment variables in $\mathbf{Z}$ are randomly assigned, $C$ is an empty set, and this assumption automatically holds. The symbol '$\vdots$' in the DAG represents other undisplayed treatment variables $Z_{j}$, each of which has the directed edges $C\rightarrow Z_{j}$, $Z_{j}\rightarrow M$, and $Z_{j}\rightarrow Y$.

[Figure 1 omitted: DAG over the nodes $C$, $Z_{1}$, $\ldots$, $Z_{J}$, $M$, $Y$, $U$.]
Figure 1: A DAG depicting the causal relationships for mediation analysis, where the variables in circles are unobserved.

Definitions of direct and indirect effects

Using the nested potential outcomes notation, we can define the causal parameters of interest in this multiple-treatment model. We first describe the definition of the average causal effect in a single-treatment setting. We use the difference $\textnormal{E}\{Y(z)-Y(z^{*})\}$ to represent the average causal effect of treatment level $z$ versus treatment level $z^{*}$. We now extend this definition to the setting with multiple treatment variables. For notational simplicity, we let $\mathbf{Z}_{-j}=(Z_{1},\ldots,Z_{j-1},Z_{j+1},\ldots,Z_{J})$ and $\mathbf{z}_{-j}=(z_{1},\ldots,z_{j-1},z_{j+1},\ldots,z_{J})$. Then we let $CTE(z_{j},z_{j}^{*}\mid\mathbf{z}_{-j})$ denote the conditional total causal effect of $Z_{j}$ on $Y$ under two different levels $z_{j}$ and $z_{j}^{*}$ of $Z_{j}$ while conditioning on the interventions $\mathbf{z}_{-j}$ for the other treatments. It is given by

$$CTE(z_{j},z_{j}^{*}\mid\mathbf{z}_{-j})=\textnormal{E}\{Y(z_{j},\mathbf{z}_{-j})\}-\textnormal{E}\{Y(z_{j}^{*},\mathbf{z}_{-j})\}.$$

In addition, we can also define the average total causal effect, $TE(z_{j},z_{j}^{*})$, which is free of the other treatment variables, by taking the expectation of $CTE(z_{j},z_{j}^{*}\mid\mathbf{Z}_{-j})$ with respect to $\mathbf{Z}_{-j}$, i.e.,

$$TE(z_{j},z_{j}^{*})=\textnormal{E}\{CTE(z_{j},z_{j}^{*}\mid\mathbf{Z}_{-j})\}.$$

We next define the conditional natural direct and indirect effects of $Z_{j}$ on $Y$. Let $CNDE(z_{j},z_{j}^{*}\mid\mathbf{z}_{-j})$ denote the conditional natural direct effect of $Z_{j}$ under two different levels $z_{j}$ and $z_{j}^{*}$ while setting the other treatments to $\mathbf{z}_{-j}$ and setting $M$ to the value attained under the fixed treatment levels $\mathbf{z}=(z_{j},\mathbf{z}_{-j})$. We use $CNIE(z_{j},z_{j}^{*}\mid\mathbf{z}_{-j})$ to denote the conditional natural indirect effect, which is defined as the difference between two averaged potential outcomes with treatments set to $(z_{j}^{*},\mathbf{z}_{-j})$ and the mediator $M$ set to the values attained under the two treatment levels $z_{j}$ and $z_{j}^{*}$ for $Z_{j}$, with $\mathbf{z}_{-j}$ for the others. Formally,

$$\begin{aligned}
CNDE(z_{j},z_{j}^{*}\mid\mathbf{z}_{-j})={}&\textnormal{E}\{Y(z_{j},\mathbf{z}_{-j},M(\mathbf{z}))\}-\textnormal{E}\{Y(z_{j}^{*},\mathbf{z}_{-j},M(\mathbf{z}))\},\\
CNIE(z_{j},z_{j}^{*}\mid\mathbf{z}_{-j})={}&\textnormal{E}\{Y(z_{j}^{*},\mathbf{z}_{-j},M(z_{j},\mathbf{z}_{-j}))\}-\textnormal{E}\{Y(z_{j}^{*},\mathbf{z}_{-j},M(z_{j}^{*},\mathbf{z}_{-j}))\}.
\end{aligned}$$

Similarly, we also give the definitions of the average natural direct and indirect effects of $Z_{j}$ on $Y$, which do not depend on the other treatment variables:

$$\begin{aligned}
NDE(z_{j},z_{j}^{*})&=\textnormal{E}\{CNDE(z_{j},z_{j}^{*}\mid\mathbf{Z}_{-j})\},\\
NIE(z_{j},z_{j}^{*})&=\textnormal{E}\{CNIE(z_{j},z_{j}^{*}\mid\mathbf{Z}_{-j})\}.
\end{aligned}$$

Under the composition assumption (Pearl, 2000) that $Y(\mathbf{z})=Y(\mathbf{z},M(\mathbf{z}))$, we immediately have the following decompositions:

$$\begin{aligned}
CTE(z_{j},z_{j}^{*}\mid\mathbf{z}_{-j})&=CNDE(z_{j},z_{j}^{*}\mid\mathbf{z}_{-j})+CNIE(z_{j},z_{j}^{*}\mid\mathbf{z}_{-j}),\\
TE(z_{j},z_{j}^{*})&=NDE(z_{j},z_{j}^{*})+NIE(z_{j},z_{j}^{*}).
\end{aligned}$$
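The first decomposition can be checked directly from the definitions. The short derivation below is a sketch that uses only the composition assumption and reads the mediator argument in the CNDE as $\mathbf{z}=(z_{j},\mathbf{z}_{-j})$ (this reading is an assumption here, chosen because it makes the cross term cancel). The second decomposition then follows by taking expectations over $\mathbf{Z}_{-j}$.

```latex
\begin{aligned}
CNDE + CNIE
 ={}& \big[\mathrm{E}\{Y(z_j,\mathbf{z}_{-j},M(z_j,\mathbf{z}_{-j}))\}
        - \mathrm{E}\{Y(z_j^{*},\mathbf{z}_{-j},M(z_j,\mathbf{z}_{-j}))\}\big] \\
    & + \big[\mathrm{E}\{Y(z_j^{*},\mathbf{z}_{-j},M(z_j,\mathbf{z}_{-j}))\}
        - \mathrm{E}\{Y(z_j^{*},\mathbf{z}_{-j},M(z_j^{*},\mathbf{z}_{-j}))\}\big] \\
 ={}& \mathrm{E}\{Y(z_j,\mathbf{z}_{-j},M(z_j,\mathbf{z}_{-j}))\}
      - \mathrm{E}\{Y(z_j^{*},\mathbf{z}_{-j},M(z_j^{*},\mathbf{z}_{-j}))\} \\
 ={}& \mathrm{E}\{Y(z_j,\mathbf{z}_{-j})\} - \mathrm{E}\{Y(z_j^{*},\mathbf{z}_{-j})\}
 \;=\; CTE(z_j,z_j^{*}\mid\mathbf{z}_{-j}).
\end{aligned}
```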

Methods

In this section, we first study the identification of the direct and indirect causal effects for some commonly-used models. We then provide an approach for estimation and also discuss inference procedures via resampling techniques.

Causal models

We use potential outcomes notation to construct causal models that allow us to directly specify the causal effects of interest as functions of parameters in the models. We consider the following causal models including a latent confounder $U$ for the potential outcomes $M(\mathbf{z})$ and $Y(\mathbf{z},m)$, respectively:

$$\begin{aligned}
M(\mathbf{z})&=g_{M}^{c}(\mathbf{z})+U+\epsilon(\mathbf{z}),\\
Y(\mathbf{z},m)&=g_{Y}^{c}(\mathbf{z})+\beta^{c}m+h(U)+\eta(\mathbf{z},m),
\end{aligned}\tag{1}$$

where $\textnormal{E}(U)=\textnormal{E}\{h(U)\}=0$, and $g_{M}^{c}(\cdot)$, $g_{Y}^{c}(\cdot)$ and $h(\cdot)$ are unknown functions. Note that the causal models in (1) allow the individual natural direct, indirect and total causal effects to vary from individual to individual. Similar models were also proposed in Lindquist (2012). Here we use the superscript '$c$' to indicate causal parameters in models (1) for potential outcomes, to distinguish them from the parameters in the SEMs for observed variables.

We assume that $\textnormal{E}\{\epsilon(\mathbf{z})\}=0$ and $\textnormal{E}\{\eta(\mathbf{z},m)\}=0$, and that $\epsilon(\mathbf{z})$ and $\eta(\mathbf{z},m)$ are mutually independent and are also independent of $(\mathbf{Z},M,U)$ for all values of $\mathbf{z}$ and of the pair $(\mathbf{z},m)$. Then we can obtain $\textnormal{E}\{\eta(\mathbf{z},M(\mathbf{z}^{*}))\}=0$ for all values of $\mathbf{z}$ and $\mathbf{z}^{*}$. With these conditions, we can directly write the average natural direct, indirect and total causal effects of $Z_{j}$ on $Y$ respectively as

$$\begin{aligned}
NDE(z_{j},z_{j}^{*})&=\textnormal{E}\{g_{Y}^{c}(z_{j},\mathbf{Z}_{-j})\}-\textnormal{E}\{g_{Y}^{c}(z_{j}^{*},\mathbf{Z}_{-j})\},\\
NIE(z_{j},z_{j}^{*})&=\beta^{c}\big[\textnormal{E}\{g_{M}^{c}(z_{j},\mathbf{Z}_{-j})\}-\textnormal{E}\{g_{M}^{c}(z_{j}^{*},\mathbf{Z}_{-j})\}\big],\\
TE(z_{j},z_{j}^{*})&=NDE(z_{j},z_{j}^{*})+NIE(z_{j},z_{j}^{*}).
\end{aligned}\tag{2}$$
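As a sketch of why (2) holds, one can substitute the mediator model into the outcome model in (1) and take expectations, using $\textnormal{E}(U)=\textnormal{E}\{h(U)\}=0$ and the zero-mean error terms. The intermediate steps below are a reconstruction from the stated assumptions rather than a derivation given in the text.

```latex
\begin{aligned}
\mathrm{E}\{Y(\mathbf{z},M(\mathbf{z}^{*}))\}
 &= g_Y^{c}(\mathbf{z})
    + \beta^{c}\,\mathrm{E}\{g_M^{c}(\mathbf{z}^{*}) + U + \epsilon(\mathbf{z}^{*})\}
    + \mathrm{E}\{h(U)\}
    + \mathrm{E}\{\eta(\mathbf{z},M(\mathbf{z}^{*}))\} \\
 &= g_Y^{c}(\mathbf{z}) + \beta^{c} g_M^{c}(\mathbf{z}^{*}), \\[4pt]
CNDE(z_j,z_j^{*}\mid\mathbf{z}_{-j})
 &= g_Y^{c}(z_j,\mathbf{z}_{-j}) - g_Y^{c}(z_j^{*},\mathbf{z}_{-j}), \\
CNIE(z_j,z_j^{*}\mid\mathbf{z}_{-j})
 &= \beta^{c}\big\{g_M^{c}(z_j,\mathbf{z}_{-j}) - g_M^{c}(z_j^{*},\mathbf{z}_{-j})\big\}.
\end{aligned}
```

Averaging the last two lines over $\mathbf{Z}_{-j}$ gives the expressions for $NDE$ and $NIE$ in (2).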

The identifiability of these causal effects relies on the identifiability of $g_{M}^{c}(\cdot)$, $g_{Y}^{c}(\cdot)$ and $\beta^{c}$. These parameters encode the causal relationships in models (1) for the potential outcomes. Since the variables in the models are not observed, these parameters cannot be estimated by the ordinary approaches for models with observed variables. Hence, additional assumptions and/or more tractable models are required to identify and estimate these parameters.

SEMs for observed variables

According to the path diagram in Figure 1, we consider the SEMs for observed variables and a latent confounder $U$ as follows:

$$\begin{aligned}
M(\mathbf{Z})&=g_{M}^{s}(\mathbf{Z})+U+\epsilon,\\
Y(\mathbf{Z},M)&=g_{Y}^{s}(\mathbf{Z})+\beta^{s}M+h(U)+\eta,
\end{aligned}\tag{3}$$

where $\textnormal{E}(\epsilon\mid\mathbf{Z},U)=0$, $\textnormal{E}(\eta\mid\mathbf{Z},M,U)=0$, $g_{M}^{s}(\cdot)$ and $g_{Y}^{s}(\cdot)$ are unknown functions, and the variables $M(\mathbf{Z})$ and $Y(\mathbf{Z},M)$ are observed because of the consistency assumption: $M(\mathbf{Z})=M$ and $Y(\mathbf{Z},M)=Y$. The SEMs are built upon observed variables and thus differ from the models for the potential outcomes defined in the previous subsection. The superscript '$s$' denotes the parameters occurring in the SEMs, and these parameters are in principle estimable from observed data.

The SEMs in (3) assume constant individual natural direct, indirect and total causal effects, which is more restrictive than the causal models (1). However, we are interested in the average versions of these causal effects, and the average causal effects can be expressed in terms of the causal parameters $\{g_{M}^{c}(\cdot),g_{Y}^{c}(\cdot),\beta^{c}\}$ in models (1) according to (2). In general, the parameters $\{g_{M}^{s}(\cdot),g_{Y}^{s}(\cdot),\beta^{s}\}$ in the SEMs (3) are not equal to their counterparts $\{g_{M}^{c}(\cdot),g_{Y}^{c}(\cdot),\beta^{c}\}$ in the causal models. However, under the assumptions encoded by the DAG in Figure 1, we can establish the equality between these two sets of parameters.

Equivalence between parameters in causal models and SEMs

Under the models in (3), we can write the parameters $\{g_{M}^{s}(\cdot),g_{Y}^{s}(\cdot),\beta^{s}\}$ as

$$\begin{aligned}
g_{M}^{s}(\mathbf{z})&=\textnormal{E}\{M(\mathbf{Z})\mid\mathbf{Z}=\mathbf{z}\},\\
\beta^{s}&=\frac{1}{m-m^{*}}\big[\textnormal{E}\{Y(\mathbf{Z},M(\mathbf{Z}))\mid M(\mathbf{Z})=m,\mathbf{Z}=\mathbf{z},U\}-\textnormal{E}\{Y(\mathbf{Z},M(\mathbf{Z}))\mid M(\mathbf{Z})=m^{*},\mathbf{Z}=\mathbf{z},U\}\big],\\
g_{Y}^{s}(\mathbf{z})&=\textnormal{E}\{Y(\mathbf{Z},M(\mathbf{Z}))\mid M(\mathbf{Z})=m,\mathbf{Z}=\mathbf{z},U\}-\beta^{s}m-h(U).
\end{aligned}$$

Noting first from the DAG that $M(\mathbf{z})\perp\!\!\!\perp\mathbf{Z}$, and under the assumption that $\textnormal{E}(U+\epsilon\mid\mathbf{Z}=\mathbf{z})=0$ for any value of $\mathbf{z}$, we immediately have the following result:

$$g_{M}^{s}(\mathbf{z})=\textnormal{E}\{M(\mathbf{Z})\mid\mathbf{Z}=\mathbf{z}\}=\textnormal{E}\{M(\mathbf{z})\}=g_{M}^{c}(\mathbf{z}).$$

We also note that the potential outcomes for the mediator and outcome are conditionally independent given the latent confounder $U$, i.e., $Y(\mathbf{z},m)\perp\!\!\!\perp M(\mathbf{z})\mid U$ for any value of the pair $(\mathbf{z},m)$. Combining this with the ignorable treatment assignment assumption, we have

$$\textnormal{E}\{Y(\mathbf{Z},M(\mathbf{Z}))\mid M(\mathbf{Z})=m,\mathbf{Z}=\mathbf{z},U\}=\textnormal{E}\{Y(\mathbf{z},m)\mid M(\mathbf{z})=m,U\}=\textnormal{E}\{Y(\mathbf{z},m)\mid U\}.$$

Consequently,

$$\begin{aligned}
\beta^{s}&=\frac{1}{m-m^{*}}\big[\textnormal{E}\{Y(\mathbf{z},m)\mid M(\mathbf{z})=m,U\}-\textnormal{E}\{Y(\mathbf{z},m^{*})\mid M(\mathbf{z})=m^{*},U\}\big]\\
&=\frac{1}{m-m^{*}}\big[\textnormal{E}\{Y(\mathbf{z},m)\mid U\}-\textnormal{E}\{Y(\mathbf{z},m^{*})\mid U\}\big]\\
&=\beta^{c}.
\end{aligned}$$

In addition, based on the previous results, we can also show the equality of the parameter $g_{Y}^{s}(\cdot)$ with the parameter $g_{Y}^{c}(\cdot)$ as follows:

$$g_{Y}^{s}(\mathbf{z})=\textnormal{E}\{Y(\mathbf{z},m)\mid U\}-\beta^{s}m-h(U)=g_{Y}^{c}(\mathbf{z}).$$

So far, we have shown the equivalence between the parameters of the SEMs in (3) and the corresponding parameters of the causal models in (1). Since the SEMs include the unobserved variable $U$, the parameters are still unidentifiable without additional conditions.

Identification of parameters

In this subsection, we give conditions under which the parameters of the SEMs are identifiable from observed data. Clearly, $g_{M}^{s}(\cdot)$ is identifiable due to the condition that $\textnormal{E}(U+\epsilon\mid\mathbf{Z})=0$ and can be written as follows:

$$g_{M}^{s}(\mathbf{z})=\textnormal{E}(M\mid\mathbf{Z}=\mathbf{z}).$$

In order to guarantee the identifiability of $\beta^{s}$ and $g_{Y}^{s}(\cdot)$, we impose the following condition.

Condition 1.

$\mathscr{G}_{M}^{s}:=\{g_{M}^{s}(\cdot)\}$ is a function space with finite dimension, and the space $\mathscr{G}_{Y}^{s}:=\{g_{Y}^{s}(\cdot)\}$ is a proper subspace of $\mathscr{G}_{M}^{s}$.

Condition 1 implies that the number of basis functions of $\mathscr{G}_{M}^{s}$ is finite and that a proper subset of these basis functions generates the subspace $\mathscr{G}_{Y}^{s}$. This may not be a stringent condition and can be satisfied in a variety of cases. For example, suppose that $g_{M}^{s}(\mathbf{z})$ is a polynomial function of degree 2 in each component of $\mathbf{z}$. Then, a linear function $g_{Y}^{s}(\mathbf{z})$ of the components satisfies Condition 1.
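The polynomial example can be checked numerically by evaluating candidate basis functions on the finite support of the treatments and comparing matrix ranks. The sketch below is illustrative only: the two-treatment setting with levels {1, 2, 3} and the additive quadratic basis (no cross terms) are assumptions made for the example, not choices taken from the text.

```python
import itertools
import numpy as np

# Hypothetical setting: two treatments, each with levels {1, 2, 3}.
levels = [1.0, 2.0, 3.0]
grid = np.array(list(itertools.product(levels, repeat=2)))
z1, z2 = grid[:, 0], grid[:, 1]

# Assumed basis of G_M: additive quadratic functions of (z1, z2).
basis_M = np.column_stack([np.ones(len(grid)), z1, z2, z1**2, z2**2])
# Assumed basis of G_Y: linear functions, i.e. the first three columns.
basis_Y = basis_M[:, :3]

rank_M = np.linalg.matrix_rank(basis_M)
rank_Y = np.linalg.matrix_rank(basis_Y)
# If G_Y is a subspace of G_M, stacking both bases must not raise the rank.
rank_joint = np.linalg.matrix_rank(np.column_stack([basis_M, basis_Y]))

print(rank_M, rank_Y, rank_joint)  # 5 3 5
```

Here the dimension of the linear space (3) is strictly smaller than that of the quadratic space (5), and the joint rank stays at 5, so the linear space is a proper subspace in this example.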

Theorem 1.

Given the specified DAG in Figure 1 and models in (3), the parameters $\beta^{s}$ and $g_{Y}^{s}(\cdot)$ are identifiable under Condition 1.

Proof.

Suppose that the functions $\phi_{1}(\mathbf{z}),\ldots,\phi_{L}(\mathbf{z})$ form a basis of $\mathscr{G}_{M}^{s}$. Since $\mathscr{G}_{Y}^{s}$ is a proper subspace of $\mathscr{G}_{M}^{s}$, we assume without loss of generality that the basis functions of $\mathscr{G}_{Y}^{s}$ are $\phi_{1}(\mathbf{z}),\ldots,\phi_{L_{1}}(\mathbf{z})$, where $L_{1}<L$.

By definition, there exist two sequences of real numbers $\{\alpha_{l}\}_{l=1}^{L}$ and $\{\gamma_{l}\}_{l=1}^{L_{1}}$ such that

$$g_{M}^{s}(\mathbf{z})=\sum_{l=1}^{L}\alpha_{l}\phi_{l}(\mathbf{z}),\quad\text{and}\quad g_{Y}^{s}(\mathbf{z})=\sum_{l=1}^{L_{1}}\gamma_{l}\phi_{l}(\mathbf{z}).\tag{4}$$

As discussed earlier, $g_{M}^{s}(\mathbf{z})$ is identifiable, which implies that the parameters $\{\alpha_{l}\}_{l=1}^{L}$ are also identifiable. Substituting the equation for $M$ into the equation for $Y$ in (3), and replacing $g_{M}^{s}(\mathbf{z})$ and $g_{Y}^{s}(\mathbf{z})$ with the corresponding linear combinations shown above, we obtain

$$\begin{aligned}
Y={}&\sum_{l=1}^{L_{1}}(\gamma_{l}+\beta^{s}\alpha_{l})\phi_{l}(\mathbf{Z})+\beta^{s}\sum_{l=L_{1}+1}^{L}\alpha_{l}\phi_{l}(\mathbf{Z})\\
&+\beta^{s}U+h(U)+\beta^{s}\epsilon+\eta.
\end{aligned}\tag{5}$$

Because $\textnormal{E}\{\beta^{s}U+h(U)+\beta^{s}\epsilon+\eta\mid\mathbf{Z}\}=0$ and $\phi_{1}(\mathbf{z}),\ldots,\phi_{L}(\mathbf{z})$ are linearly independent, we find that $\gamma_{l}+\beta^{s}\alpha_{l}$ ($l=1,\ldots,L_{1}$) and $\beta^{s}\alpha_{l}$ ($l=L_{1}+1,\ldots,L$) are identifiable. This, combined with the identifiability of $\{\alpha_{l}\}_{l=1}^{L}$ and provided that $\alpha_{l}\neq 0$ for at least one $l>L_{1}$, implies that $\beta^{s}$ and $\{\gamma_{l}\}_{l=1}^{L_{1}}$ are also identifiable. Thus, $g_{Y}^{s}(\mathbf{z})$ is identifiable. ∎
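The argument in the proof can be mimicked numerically: regress $M$ on the full basis to recover $\alpha$, regress $Y$ on the full basis to recover the reduced-form coefficients in (5), and then read off $\beta^{s}$ from an excluded basis function. The following is a minimal sketch under assumed values; the design (two treatments with levels {1, 2, 3}), the basis $(1, z_1, z_2, z_1 z_2)$ with $L_1 = 3$, the coefficients, and the choice $h(U) = 2U$ are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical design: two treatments with levels {1, 2, 3}.
Z = rng.integers(1, 4, size=(n, 2)).astype(float)
z1, z2 = Z[:, 0], Z[:, 1]

# Assumed basis of G_M: (1, z1, z2, z1*z2); G_Y is spanned by the first L1 = 3.
Phi = np.column_stack([np.ones(n), z1, z2, z1 * z2])
L1 = 3

# Assumed true parameters (illustrative values only).
alpha = np.array([0.5, 1.0, 1.0, 1.0])
gamma = np.array([0.2, 1.0, -1.0])
beta = 1.5

U = rng.normal(size=n)                      # latent confounder
M = Phi @ alpha + U + rng.normal(size=n)    # mediator equation in (3)
Y = Phi[:, :L1] @ gamma + beta * M + 2.0 * U + rng.normal(size=n)  # h(U) = 2U

# Step 1: alpha from the regression of M on the full basis.
alpha_hat = np.linalg.lstsq(Phi, M, rcond=None)[0]
# Step 2: reduced-form coefficients of Y on the full basis, as in (5).
pi_hat = np.linalg.lstsq(Phi, Y, rcond=None)[0]
# Step 3: the excluded basis function z1*z2 has coefficient beta * alpha_4.
beta_hat = pi_hat[L1] / alpha_hat[L1]
gamma_hat = pi_hat[:L1] - beta_hat * alpha_hat[:L1]

# For contrast: regression of Y on (Phi_1, M) that ignores U.
naive = np.linalg.lstsq(np.column_stack([Phi[:, :L1], M]), Y, rcond=None)[0][-1]

print(beta_hat, gamma_hat, naive)
```

In this sketch the ratio estimate should land close to the assumed $\beta = 1.5$, whereas the regression that ignores $U$ is pulled upward by the confounding through $U$.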

Based on the discussion of the equality between the parameters of the SEMs and the analogous parameters of the causal models in the previous subsection, we conclude from Theorem 1 that the parameters $g_{M}^{c}(\cdot)$, $g_{Y}^{c}(\cdot)$ and $\beta^{c}$ are also identifiable, which in turn implies the identifiability of the average natural direct, indirect and total causal effects of $Z_{j}$ on $Y$ according to (2).

Estimation and inference

In this subsection, we provide an approach for estimating the parameters of the SEMs in (3). We also discuss the resampling-based procedures for inference.

Given the known basis functions $\phi_{1}(\mathbf{z}),\ldots,\phi_{L}(\mathbf{z})$ of $\mathscr{G}_{M}^{s}$, we can express $g_{M}^{s}(\mathbf{z})$ and $g_{Y}^{s}(\mathbf{z})$ as linear combinations of them, as shown in (4). Denote $\bm{\Phi}(\mathbf{z})=(\phi_{1}(\mathbf{z}),\ldots,\phi_{L}(\mathbf{z}))$, $\bm{\Phi}_{1}(\mathbf{z})=(\phi_{1}(\mathbf{z}),\ldots,\phi_{L_{1}}(\mathbf{z}))$, $\bm{\alpha}=(\alpha_{1},\ldots,\alpha_{L})^{\top}$, and $\bm{\gamma}=(\gamma_{1},\ldots,\gamma_{L_{1}})^{\top}$. We then rewrite $g_{M}^{s}(\mathbf{z})$ and $g_{Y}^{s}(\mathbf{z})$ as

$$g_{M}^{s}(\mathbf{z})=\bm{\Phi}(\mathbf{z})\bm{\alpha},\quad\text{and}\quad g_{Y}^{s}(\mathbf{z})=\bm{\Phi}_{1}(\mathbf{z})\bm{\gamma}.$$

By the least-squares criterion, the coefficient vector $\bm{\alpha}$ of the expansion of $g_{M}^{s}(\mathbf{z})$ is estimated by

$$\hat{\bm{\alpha}}=\arg\min_{\bm{\alpha}}\sum_{i=1}^{n}\big\{M_{i}-\bm{\Phi}(\mathbf{Z}_{i})\bm{\alpha}\big\}^{2}.$$

The solution is given by

$$\hat{\bm{\alpha}}=\bigg\{\sum_{i=1}^{n}\bm{\Phi}(\mathbf{Z}_{i})^{\top}\bm{\Phi}(\mathbf{Z}_{i})\bigg\}^{-1}\bigg\{\sum_{i=1}^{n}\bm{\Phi}(\mathbf{Z}_{i})^{\top}M_{i}\bigg\},$$

which provides an estimate of the coefficients of $g_{M}^{s}(\mathbf{z})$.
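The closed-form solution above amounts to solving the normal equations. A minimal NumPy sketch follows; the helper name `fit_alpha`, the simulated design and the basis $(z_1, z_2, z_1 z_2)$ are hypothetical and serve only to show that the formula agrees with a generic least-squares solver.

```python
import numpy as np

def fit_alpha(Phi, M):
    """Closed-form least squares: solve (Phi^T Phi) alpha = Phi^T M."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ M)

rng = np.random.default_rng(1)
n = 5000

# Hypothetical design: two treatments with levels {1, 2, 3}.
Z = rng.integers(1, 4, size=(n, 2)).astype(float)
# Row i of Phi is the basis vector Phi(Z_i); the basis here is an assumption.
Phi = np.column_stack([Z[:, 0], Z[:, 1], Z[:, 0] * Z[:, 1]])

alpha_true = np.array([1.0, 1.0, 1.0])
U = rng.normal(size=n)                       # latent confounder, mean zero given Z
M = Phi @ alpha_true + U + rng.normal(size=n)

alpha_hat = fit_alpha(Phi, M)
alpha_lstsq = np.linalg.lstsq(Phi, M, rcond=None)[0]  # same fit via a generic solver
print(alpha_hat)
```

Because $U+\epsilon$ has mean zero given $\mathbf{Z}$, the latent confounder only inflates the residual variance of this first-stage fit and does not bias $\hat{\bm{\alpha}}$.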

To derive estimates of $\beta^{s}$ and $\bm{\gamma}$, we utilize the idea of the generalized method of moments (GMM; Hall, 2005) and introduce some more notation. Let $\bm{\alpha}_{1}=(\alpha_{1},\ldots,\alpha_{L_{1}})^{\top}$, and let $\bm{\alpha}_{2}$ denote the subvector composed of the remaining components of $\bm{\alpha}$. Let $\bm{\Phi}_{2}(\mathbf{z})$ represent the subvector of $\bm{\Phi}(\mathbf{z})$ with components not contained in $\bm{\Phi}_{1}(\mathbf{z})$, i.e., $\bm{\Phi}(\mathbf{z})=\{\bm{\Phi}_{1}(\mathbf{z}),\bm{\Phi}_{2}(\mathbf{z})\}$. Then, we denote $\bm{Y}=(Y_{1},\ldots,Y_{n})^{\top}$, $\bm{\Phi}=\{\bm{\Phi}(\mathbf{Z}_{1})^{\top},\ldots,\bm{\Phi}(\mathbf{Z}_{n})^{\top}\}^{\top}$, $\bm{\Phi}_{1}=\{\bm{\Phi}_{1}(\mathbf{Z}_{1})^{\top},\ldots,\bm{\Phi}_{1}(\mathbf{Z}_{n})^{\top}\}^{\top}$, and $\bm{\Phi}_{2}=\{\bm{\Phi}_{2}(\mathbf{Z}_{1})^{\top},\ldots,\bm{\Phi}_{2}(\mathbf{Z}_{n})^{\top}\}^{\top}$. For notational simplicity, we also let $\bm{\delta}=\bm{\gamma}+\beta^{s}\bm{\alpha}_{1}$. Using this notation, we now rewrite (5) as

$$Y=\bm{\Phi}_{1}(\mathbf{Z})\bm{\delta}+\beta^{s}\bm{\Phi}_{2}(\mathbf{Z})\bm{\alpha}_{2}+\beta^{s}U+h(U)+\beta^{s}\epsilon+\eta.$$

In view of $\textnormal{E}\{\beta^{s}U+h(U)+\beta^{s}\epsilon+\eta\mid\mathbf{Z}\}=0$, it follows immediately that

$$\textnormal{E}\big[\bm{\Phi}(\mathbf{Z})^{\top}\{Y-\bm{\Phi}_{1}(\mathbf{Z})\bm{\delta}-\beta^{s}\bm{\Phi}_{2}(\mathbf{Z})\bm{\alpha}_{2}\}\big]=\bm{0}.\tag{6}$$

We then estimate $(\bm{\delta}^{\top},\beta^{s})$ by minimizing the sum of squares of the sample analogues of (6):

$$(\hat{\bm{\delta}}^{\top},\hat{\beta}^{s})=\arg\min_{(\bm{\delta}^{\top},\beta^{s})}(\bm{Y}-\bm{\Phi}_{1}\bm{\delta}-\beta^{s}\bm{\Phi}_{2}\bm{\alpha}_{2})^{\top}\bm{\Phi}\bm{\Phi}^{\top}(\bm{Y}-\bm{\Phi}_{1}\bm{\delta}-\beta^{s}\bm{\Phi}_{2}\bm{\alpha}_{2}).$$

After solving the above optimization problem and substituting the estimator $\hat{\bm{\alpha}}_{2}$ into the solutions, we obtain the estimates of $(\bm{\delta}^{\top},\beta^{s})$ in explicit form. Because the expressions are lengthy, we do not display them here. Let $\hat{\bm{\gamma}}=\hat{\bm{\delta}}-\hat{\beta}^{s}\hat{\bm{\alpha}}_{1}$, which gives us an estimate of $\bm{\gamma}$. In addition, the estimators $\hat{\bm{\alpha}}$, $\hat{\beta}^{s}$ and $\hat{\bm{\gamma}}$ are all consistent. Under some regularity conditions, they are also asymptotically normal. Substituting these estimators into (2) and taking averages over samples, we obtain consistent estimators of the average causal effects of interest. Furthermore, by the Delta method, these consistent estimators are also asymptotically normal. Since analytical calculations of the asymptotic variances are difficult, we use a nonparametric bootstrap method to conduct inference.
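The estimation steps in this subsection can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the authors' code: the function names `fit_mediation` and `bootstrap_se_beta`, the simulated design, the centred basis (chosen for numerical convenience), the parameter values and $h(U)=2U$ are all hypothetical. With $\hat{\bm{\alpha}}_2$ plugged in, the objective is a linear least-squares problem in $(\bm{\delta}^{\top},\beta^{s})$, so it is solved by regressing $\bm{\Phi}^{\top}\bm{Y}$ on $\bm{\Phi}^{\top}[\bm{\Phi}_1,\ \bm{\Phi}_2\hat{\bm{\alpha}}_2]$.

```python
import numpy as np

def fit_mediation(Phi, L1, M, Y):
    """Least squares for alpha, then identity-weighted GMM for (delta, beta)."""
    alpha = np.linalg.lstsq(Phi, M, rcond=None)[0]
    # Regressors of the rewritten outcome equation: [Phi_1, Phi_2 @ alpha_2].
    X = np.column_stack([Phi[:, :L1], Phi[:, L1:] @ alpha[L1:]])
    # Minimise || Phi^T (Y - X theta) ||^2, the sum of squared sample moments.
    theta = np.linalg.lstsq(Phi.T @ X, Phi.T @ Y, rcond=None)[0]
    delta, beta = theta[:L1], theta[L1]
    gamma = delta - beta * alpha[:L1]
    return alpha, beta, gamma

def bootstrap_se_beta(Phi, L1, M, Y, n_boot=100, seed=0):
    """Nonparametric bootstrap standard error of beta (resampling individuals)."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = fit_mediation(Phi[idx], L1, M[idx], Y[idx])[1]
    return draws.std(ddof=1)

# Hypothetical data: two treatments with levels {1, 2, 3}, centred basis.
rng = np.random.default_rng(42)
n = 20000
Z = rng.integers(1, 4, size=(n, 2)).astype(float)
c1, c2 = Z[:, 0] - 2.0, Z[:, 1] - 2.0
Phi = np.column_stack([np.ones(n), c1, c2, c1 * c2, c1**2 - 2.0 / 3.0])
L1 = 3  # g_Y is assumed to lie in the span of the first three basis functions

alpha_true = np.array([0.5, 1.0, 1.0, 1.0, 0.5])
gamma_true = np.array([0.2, 1.0, -1.0])
beta_true = 1.5
U = rng.normal(size=n)
M = Phi @ alpha_true + U + rng.normal(size=n)
Y = Phi[:, :L1] @ gamma_true + beta_true * M + 2.0 * U + rng.normal(size=n)

alpha_hat, beta_hat, gamma_hat = fit_mediation(Phi, L1, M, Y)
se_beta = bootstrap_se_beta(Phi, L1, M, Y)
print(beta_hat, se_beta, gamma_hat)
```

The sketch uses the identity weighting matrix implied by the objective above; a different GMM weighting matrix would change efficiency but not the moment conditions in (6). Bootstrap standard errors for the average effects in (2) could be obtained the same way, by recomputing the effects on each resample.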

Simulation studies and application

In this section, we first conduct simulation studies to evaluate the performance of the proposed estimators in finite samples. We then apply the proposed approach to a real customer loyalty data set.

Simulation studies

We consider two simulation studies in which the data are generated with one and with multiple (more than one) available instrumental variables, respectively. Each simulation is repeated 1000 times at sample sizes $n=500$, 1000, and 2000. We report the estimated causal effects of interest averaged over the 1000 replications.

In both simulation studies, we consider three treatment variables: $Z_{1}$, $Z_{2}$, and $Z_{3}$. Each is drawn uniformly from $\{1,2,3\}$. The latent confounder $U$ is generated from $N(0,1)$. The mediator $M$ is generated from the following equation:

$$M=Z_{1}+Z_{2}+Z_{3}+Z_{1}*Z_{2}+Z_{1}*Z_{3}+Z_{2}*Z_{3}+U+\epsilon_{M},$$

where $\epsilon_{M}\sim N(0,1)$. The only difference between the two simulation studies is the generation of the outcome $Y$, which is described as follows.

Simulation Study 1.

The outcome $Y$ is generated from

$$Y=Z_{1}+Z_{2}+Z_{3}+M+Z_{1}*Z_{2}+Z_{1}*Z_{3}+2*U+\epsilon_{Y},$$

where $\epsilon_{Y}\sim N(0,1)$ is independent of $\epsilon_{M}$. Note that $Z_{2}*Z_{3}$ is independent of $U$, is associated with $M$, and affects $Y$ only through $M$. Thus, for this simulation, $Z_{2}*Z_{3}$ can be viewed as an instrumental variable.

We use both the proposed approach and the traditional regression-based approach, which does not consider the latent confounder $U$, for estimation. To compare the results of the two approaches, we report the bias and standard error (SE) of the estimators of $\textnormal{NDE}_{j}(2,1)$, $\textnormal{NIE}_{j}(2,1)$ and $\textnormal{TE}_{j}(2,1)$ based on 1000 replications for the sample sizes $n=500$, 1000, and 2000. Here, the subscript $j$ indicates the causal effects of the $j$th treatment variable, $j=1,2,3$. Table 1 displays the simulation results.
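The true values against which the biases are measured can be worked out by hand. The calculation below assumes that $\textnormal{NDE}_{j}(2,1)$ and $\textnormal{NIE}_{j}(2,1)$ contrast $Z_{j}=2$ with $Z_{j}=1$ while averaging over the other two treatments, each of which has mean 2. This reading of the definitions is an assumption made here for illustration, although it agrees with the true values listed in Table 1. The coefficient of $M$ in the outcome equation is 1.

$$
\begin{aligned}
\textnormal{NDE}_{1}(2,1) &= \textnormal{E}(1+Z_{2}+Z_{3}) = 1+2+2 = 5, &
\textnormal{NIE}_{1}(2,1) &= 1\cdot\textnormal{E}(1+Z_{2}+Z_{3}) = 5, &
\textnormal{TE}_{1}(2,1) &= 10,\\
\textnormal{NDE}_{2}(2,1) &= \textnormal{E}(1+Z_{1}) = 1+2 = 3, &
\textnormal{NIE}_{2}(2,1) &= 1\cdot\textnormal{E}(1+Z_{1}+Z_{3}) = 5, &
\textnormal{TE}_{2}(2,1) &= 8,\\
\textnormal{NDE}_{3}(2,1) &= \textnormal{E}(1+Z_{1}) = 1+2 = 3, &
\textnormal{NIE}_{3}(2,1) &= 1\cdot\textnormal{E}(1+Z_{1}+Z_{2}) = 5, &
\textnormal{TE}_{3}(2,1) &= 8.
\end{aligned}
$$

The direct effects of $Z_{2}$ and $Z_{3}$ are smaller than that of $Z_{1}$ because the term $Z_{2}*Z_{3}$ enters the mediator equation but not the outcome equation.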

From Table 1, we can see that the estimates from our approach all have small biases even at the smallest sample size, $n=500$. As the sample size increases, both the biases and the standard errors become smaller. In contrast, the estimates of the average natural direct and indirect effects obtained by the traditional approach all have large biases, even for large sample sizes. This is because the traditional approach ignores the latent confounder $U$, which confounds the mediator-outcome relationship. However, since $U$ does not confound the relationship between the treatments and the outcome, the traditional approach performs much better when estimating the average total effects.

| Treatment | Effect (true value) | $n$ | Proposed Bias | Proposed SE | Traditional Bias | Traditional SE |
|---|---|---|---|---|---|---|
| $Z_{1}$ | $\textnormal{NDE}_{1}(2,1)=5$ | 500 | 0.057 | 0.643 | -3.724 | 0.280 |
| | | 1000 | 0.010 | 0.464 | -3.719 | 0.189 |
| | | 2000 | 0.008 | 0.326 | -3.720 | 0.133 |
| | $\textnormal{NIE}_{1}(2,1)=5$ | 500 | -0.062 | 0.649 | 3.820 | 0.297 |
| | | 1000 | -0.014 | 0.479 | 3.815 | 0.208 |
| | | 2000 | -0.005 | 0.335 | 3.824 | 0.149 |
| | $\textnormal{TE}_{1}(2,1)=10$ | 500 | -0.006 | 0.210 | 0.096 | 0.212 |
| | | 1000 | -0.004 | 0.146 | 0.095 | 0.148 |
| | | 2000 | 0.004 | 0.101 | 0.104 | 0.101 |
| $Z_{2}$ | $\textnormal{NDE}_{2}(2,1)=3$ | 500 | 0.067 | 0.632 | -3.718 | 0.284 |
| | | 1000 | 0.012 | 0.467 | -3.720 | 0.188 |
| | | 2000 | 0.003 | 0.324 | -3.723 | 0.130 |
| | $\textnormal{NIE}_{2}(2,1)=5$ | 500 | -0.057 | 0.653 | 3.830 | 0.306 |
| | | 1000 | -0.010 | 0.479 | 3.815 | 0.218 |
| | | 2000 | -0.008 | 0.334 | 3.819 | 0.144 |
| | $\textnormal{TE}_{2}(2,1)=8$ | 500 | 0.011 | 0.204 | 0.111 | 0.203 |
| | | 1000 | 0.002 | 0.138 | 0.103 | 0.138 |
| | | 2000 | -0.004 | 0.096 | 0.097 | 0.095 |
| $Z_{3}$ | $\textnormal{NDE}_{3}(2,1)=3$ | 500 | -0.052 | 0.636 | -3.728 | 0.272 |
| | | 1000 | 0.010 | 0.466 | -3.718 | 0.191 |
| | | 2000 | 0.009 | 0.327 | -3.717 | 0.133 |
| | $\textnormal{NIE}_{3}(2,1)=5$ | 500 | -0.064 | 0.649 | 3.818 | 0.300 |
| | | 1000 | -0.014 | 0.478 | 3.815 | 0.216 |
| | | 2000 | -0.007 | 0.334 | 3.820 | 0.149 |
| | $\textnormal{TE}_{3}(2,1)=8$ | 500 | -0.012 | 0.190 | 0.090 | 0.189 |
| | | 1000 | -0.004 | 0.138 | 0.097 | 0.139 |
| | | 2000 | 0.002 | 0.097 | 0.103 | 0.096 |

Table 1: Results for the proposed approach and traditional approach with different sample sizes in Simulation Study 1.
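A small Monte Carlo experiment along the lines of Simulation Study 1 can be sketched as follows. Several choices here are assumptions rather than details taken from the text: the helper names, the intercept in the basis, the plug-in formulas for $\textnormal{NDE}_{1}(2,1)$ and $\textnormal{NIE}_{1}(2,1)$ (contrasting $Z_{1}=2$ with $Z_{1}=1$ and averaging over the observed $Z_{2}$, $Z_{3}$), and the reading of the "traditional" approach as an ordinary regression of $Y$ on $\bm{\Phi}_{1}(\mathbf{Z})$ and $M$. The sketch uses 200 replications to keep it quick, and its traditional-approach numbers need not coincide with those in Table 1.

```python
import numpy as np

def basis(z1, z2, z3):
    """Phi1: terms with a direct effect on Y (intercept assumed); Phi2: instrument Z2*Z3."""
    Phi1 = np.column_stack([np.ones_like(z1), z1, z2, z3, z1 * z2, z1 * z3])
    Phi2 = (z2 * z3)[:, None]
    return Phi1, Phi2

def simulate_study1(n, rng):
    z1, z2, z3 = rng.integers(1, 4, size=(3, n)).astype(float)
    U = rng.standard_normal(n)                       # latent confounder
    M = z1 + z2 + z3 + z1 * z2 + z1 * z3 + z2 * z3 + U + rng.standard_normal(n)
    Y = z1 + z2 + z3 + M + z1 * z2 + z1 * z3 + 2 * U + rng.standard_normal(n)
    return z1, z2, z3, M, Y

def effects_z1(z1, z2, z3, M, Y, use_instrument):
    """Plug-in estimates of (NDE_1(2,1), NIE_1(2,1)) under the assumed definitions."""
    Phi1, Phi2 = basis(z1, z2, z3)
    Phi = np.column_stack([Phi1, Phi2])
    L1 = Phi1.shape[1]
    alpha = np.linalg.lstsq(Phi, M, rcond=None)[0]
    if use_instrument:
        # Moment-based fit: minimise ||Phi'(Y - X theta)||^2 with X = [Phi1, Phi2 alpha2].
        X = np.column_stack([Phi1, Phi2 @ alpha[L1:]])
        theta = np.linalg.lstsq(Phi.T @ X, Phi.T @ Y, rcond=None)[0]
        beta = theta[-1]
        gamma = theta[:-1] - beta * alpha[:L1]
    else:
        # Regression of Y on Phi1 and M, ignoring U (one reading of "traditional").
        coef = np.linalg.lstsq(np.column_stack([Phi1, M]), Y, rcond=None)[0]
        gamma, beta = coef[:-1], coef[-1]
    P1_hi, P2_hi = basis(np.full_like(z1, 2.0), z2, z3)
    P1_lo, P2_lo = basis(np.full_like(z1, 1.0), z2, z3)
    d1 = (P1_hi - P1_lo).mean(axis=0)
    d = np.concatenate([d1, (P2_hi - P2_lo).mean(axis=0)])
    return d1 @ gamma, beta * (d @ alpha)

rng = np.random.default_rng(1)
reps, n = 200, 500
estimates = {True: [], False: []}
for _ in range(reps):
    data = simulate_study1(n, rng)
    for flag in (True, False):
        estimates[flag].append(effects_z1(*data, use_instrument=flag))

true_values = np.array([5.0, 5.0])                   # NDE_1(2,1), NIE_1(2,1)
bias_proposed = np.mean(estimates[True], axis=0) - true_values
bias_traditional = np.mean(estimates[False], axis=0) - true_values
print("proposed    bias (NDE, NIE):", np.round(bias_proposed, 3))
print("traditional bias (NDE, NIE):", np.round(bias_traditional, 3))
```

The qualitative pattern to look for is the one discussed in the text: the instrument-based estimates are centred near the true values, while the regression that ignores $U$ understates the direct effect and overstates the indirect effect by roughly offsetting amounts.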

Simulation Study 2.

In this simulation study, we generate the outcome $Y$ from

$$Y=Z_{1}+Z_{2}+Z_{3}+M+2*U+\epsilon_{Y},$$

where $\epsilon_{Y}\sim N(0,1)$ is independent of $\epsilon_{M}$. For this simulation, $Z_{1}*Z_{2}$, $Z_{1}*Z_{3}$ and $Z_{2}*Z_{3}$ can be treated as instrumental variables. We use both the proposed approach and the traditional approach for estimation, and the corresponding results are shown in Table 2. The results are similar to those in Table 1; both sets of results support the consistency of the proposed estimators and demonstrate the advantage of the proposed approach over the traditional one.

| Treatment | Effect (true value) | $n$ | Proposed Bias | Proposed SE | Traditional Bias | Traditional SE |
|---|---|---|---|---|---|---|
| $Z_{1}$ | $\textnormal{NDE}_{1}(2,1)=1$ | 500 | 0.001 | 0.154 | -0.572 | 0.134 |
| | | 1000 | 0.000 | 0.107 | -0.569 | 0.096 |
| | | 2000 | -0.000 | 0.073 | -0.570 | 0.065 |
| | $\textnormal{NIE}_{1}(2,1)=5$ | 500 | -0.002 | 0.213 | 0.918 | 0.188 |
| | | 1000 | -0.003 | 0.151 | 0.915 | 0.138 |
| | | 2000 | 0.004 | 0.106 | 0.921 | 0.095 |
| | $\textnormal{TE}_{1}(2,1)=6$ | 500 | -0.001 | 0.213 | 0.346 | 0.188 |
| | | 1000 | -0.003 | 0.134 | 0.346 | 0.129 |
| | | 2000 | 0.004 | 0.093 | 0.351 | 0.089 |
| $Z_{2}$ | $\textnormal{NDE}_{2}(2,1)=1$ | 500 | 0.006 | 0.150 | -0.565 | 0.133 |
| | | 1000 | 0.002 | 0.107 | -0.570 | 0.095 |
| | | 2000 | -0.004 | 0.075 | -0.574 | 0.066 |
| | $\textnormal{NIE}_{2}(2,1)=5$ | 500 | 0.004 | 0.216 | 0.924 | 0.193 |
| | | 1000 | 0.001 | 0.149 | 0.920 | 0.139 |
| | | 2000 | 0.001 | 0.105 | 0.918 | 0.092 |
| | $\textnormal{TE}_{2}(2,1)=6$ | 500 | 0.010 | 0.197 | 0.359 | 0.187 |
| | | 1000 | 0.003 | 0.134 | 0.350 | 0.129 |
| | | 2000 | -0.003 | 0.092 | 0.344 | 0.087 |
| $Z_{3}$ | $\textnormal{NDE}_{3}(2,1)=1$ | 500 | 0.006 | 0.150 | -0.565 | 0.133 |
| | | 1000 | 0.002 | 0.107 | -0.570 | 0.095 |
| | | 2000 | -0.004 | 0.075 | -0.574 | 0.066 |
| | $\textnormal{NIE}_{3}(2,1)=5$ | 500 | -0.004 | 0.209 | 0.916 | 0.187 |
| | | 1000 | -0.003 | 0.149 | 0.915 | 0.138 |
| | | 2000 | 0.001 | 0.106 | 0.919 | 0.095 |
| | $\textnormal{TE}_{3}(2,1)=6$ | 500 | -0.011 | 0.182 | 0.337 | 0.176 |
| | | 1000 | -0.003 | 0.133 | 0.346 | 0.128 |
| | | 2000 | 0.004 | 0.093 | 0.351 | 0.089 |

Table 2: Results for the proposed approach and traditional approach with different sample sizes in Simulation Study 2.

Application to real data

In this section, we apply the proposed approach to a real customer loyalty data set from a telecom company in China. Customer loyalty analysis seeks to identify the key factors that affect loyalty, and the process through which they do so, in order to choose proper actions for retaining customers. In this study, 18553 randomly chosen customers answered a questionnaire via personal interview or an online survey. We drop the participants with incomplete observations, leaving a total of 9833 participants. In the questionnaire, customers scored their satisfaction with each of 13 specific factors, such as network quality, tariff plan, voice quality, and service quality. The scores are integers ranging from 1 to 10, with 1 denoting “not satisfied at all” and 10 denoting “satisfied”. For simplicity, we choose two important treatment factors, network quality and tariff plan, to illustrate our approach; they are denoted by $Z_{1}$ and $Z_{2}$, respectively. For each customer, a score indicating loyalty to the company was also obtained, and we use this score as the outcome variable $Y$. Investigators also collected each customer’s general satisfaction with the company, which is used as the mediator variable $M$. We aim to assess whether the treatment factors have significant effects on customers’ loyalty. In particular, we wish to determine whether the effects of the factors on loyalty are mediated by general satisfaction with the company.

We use the proposed approach for evaluation and, for comparison, also consider the traditional approach that does not account for possible latent confounders. For both approaches, we calculate estimates of the average natural direct effect $\textnormal{NDE}_{j}(2,1)$, the average natural indirect effect $\textnormal{NIE}_{j}(2,1)$, and the average total effect $\textnormal{TE}_{j}(2,1)$, together with their 95% confidence intervals. Of note, the average total causal effect $\textnormal{TE}_{j}(2,1)$ is estimated by the sum of the estimated average natural direct effect $\textnormal{NDE}_{j}(2,1)$ and indirect effect $\textnormal{NIE}_{j}(2,1)$. The subscript $j$ indicates the causal effects of network quality ($j=1$) and tariff plan ($j=2$). We report these results in Table 3.

| Treatment | Effect | Proposed Estimate | Proposed 95% CI | Traditional Estimate | Traditional 95% CI |
|---|---|---|---|---|---|
| Network quality | $\widehat{\textnormal{NDE}}_{1}(2,1)$ | 0.426 | (0.392, 0.458) | 0.267 | (0.244, 0.291) |
| | $\widehat{\textnormal{NIE}}_{1}(2,1)$ | 0.039 | (0.028, 0.050) | 0.101 | (0.089, 0.113) |
| | $\widehat{\textnormal{TE}}_{1}(2,1)$ | 0.465 | (0.437, 0.492) | 0.369 | (0.345, 0.391) |
| Tariff plan | $\widehat{\textnormal{NDE}}_{2}(2,1)$ | 0.401 | (0.373, 0.430) | 0.271 | (0.250, 0.293) |
| | $\widehat{\textnormal{NIE}}_{2}(2,1)$ | 0.042 | (0.031, 0.053) | 0.110 | (0.099, 0.120) |
| | $\widehat{\textnormal{TE}}_{2}(2,1)$ | 0.443 | (0.419, 0.466) | 0.381 | (0.361, 0.403) |

  • CI: confidence interval.

Table 3: Results for the proposed approach and traditional approach on the analysis of the real customer loyalty data, respectively.

From Table 3, we can see that both the proposed approach and the traditional approach indicate positive and statistically significant causal effects of network quality and tariff plan on loyalty. Since, by the randomization of this study, there are no latent confounders between the treatments (network quality and tariff plan) and the outcome variable (loyalty), a simple linear regression of the outcome on the treatments yields good estimates of the average total causal effects. These estimates are exactly the same as the corresponding ones obtained from the proposed approach, but differ from those obtained from the traditional approach. From this point of view, the results of the proposed approach in Table 3 are more reliable than those of the traditional approach, which does not consider possible latent confounders between general satisfaction and loyalty. According to the results, the estimated average natural direct effects of network quality and tariff plan on loyalty are both positive and larger than the corresponding indirect effects through the mediator (general satisfaction with the company). This means that the causal effects of satisfaction with network quality and tariff plan on loyalty operate mostly directly, with only a small fraction mediated by general satisfaction with the company. Note also that the estimated total causal effect of network quality is slightly larger than that of tariff plan, which to some extent suggests that network quality may play a more important role in affecting loyalty than tariff plan.
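To illustrate the regression-based estimate of the total effects and the bootstrap intervals used for inference, the sketch below fits a linear regression of a synthetic outcome on two synthetic treatment scores and computes percentile bootstrap confidence intervals. Everything about the data is hypothetical: the survey data are not reproduced here, the slopes 0.45 and 0.40 are arbitrary, and the choice of a percentile interval is an assumption, since the text specifies only a nonparametric bootstrap.

```python
import numpy as np

def total_effects_ols(Y, Z):
    """Slopes from a linear regression of Y on the treatments (with intercept)."""
    X = np.column_stack([np.ones(len(Y)), Z])
    return np.linalg.lstsq(X, Y, rcond=None)[0][1:]

def bootstrap_ci(Y, Z, estimator, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap: resample units with replacement and re-estimate."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # indices of one bootstrap sample
        draws.append(estimator(Y[idx], Z[idx]))
    tail = 100 * (1 - level) / 2
    return np.percentile(np.array(draws), [tail, 100 - tail], axis=0)

# Hypothetical data: two satisfaction scores on 1..10 and a linear outcome.
rng = np.random.default_rng(42)
n = 2000
Z = rng.integers(1, 11, size=(n, 2)).astype(float)
Y = 0.45 * Z[:, 0] + 0.40 * Z[:, 1] + rng.standard_normal(n)

est = total_effects_ols(Y, Z)
ci = bootstrap_ci(Y, Z, total_effects_ols)    # row 0: lower limits, row 1: upper limits
for j in range(2):
    print(f"treatment {j + 1}: estimate {est[j]:.3f}, "
          f"95% CI ({ci[0, j]:.3f}, {ci[1, j]:.3f})")
```

The same resampling loop could wrap any estimator that maps a sample to a vector of effect estimates, including estimates of the direct and indirect effects.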

Conclusion

In this article, we have proposed an approach for causal mediation analysis with multiple treatments and latent confounders between the mediator and outcome variables. We give formal definitions of the causal mediation effects in the multiple-treatment setting. For the identification of these causal effects, we first postulate a causal model for potential outcomes and express the causal effects as functions of the parameters of the causal model. We then show that the parameters of the causal model are equal to the analogous parameters of more tractable SEMs, which are built upon observed variables. Using the idea of instrumental variable methods, we provide sufficient conditions for identifying the parameters of the SEMs, which in turn gives the identification of the causal effects of interest. Finally, we develop an approach for estimation and inference.

References

  • Albert, J. M. 2008. Mediation analysis via potential outcomes models. Statistics in Medicine 27(8):1282–1304.
  • Baron, R. M., and Kenny, D. A. 1986. The moderator–mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology 51:1173–1182.
  • Dunn, G., and Bentall, R. 2007. Modelling treatment-effect heterogeneity in randomized controlled trials of complex interventions (psychological treatments). Statistics in Medicine 26(26):4719–4745.
  • Hall, A. R. 2005. Generalized Method of Moments. Oxford: Oxford University Press.
  • Imai, K.; Keele, L.; and Tingley, D. 2010. A general approach to causal mediation analysis. Psychological Methods 15:309–334.
  • Imai, K.; Keele, L.; and Yamamoto, T. 2010. Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science 25:51–71.
  • Li, W., and Zhou, X.-H. 2017. Identifiability and estimation of causal mediation effects with missing data. Statistics in Medicine 36(25):3948–3965.
  • Lindquist, M. A. 2012. Functional causal mediation analysis with an application to brain connectivity. Journal of the American Statistical Association 107(500):1297–1309.
  • MacKinnon, D. P.; Fairchild, A. J.; and Fritz, M. S. 2007. Mediation analysis. Annual Review of Psychology 58:593–614.
  • Pearl, J. 2000. Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press.
  • Pearl, J. 2001. Direct and indirect effects. In Proceedings of the 17th Annual Conference on Uncertainty in Artificial Intelligence, 411–420. San Francisco, CA: Morgan Kaufmann.
  • Robins, J. M., and Greenland, S. 1992. Identifiability and exchangeability for direct and indirect effects. Epidemiology 3:143–155.
  • Rubin, D. B. 1980. Comment on ‘Randomization analysis of experimental data: the Fisher randomization test’. Journal of the American Statistical Association 75:591–593.
  • Small, D. S. 2012. Mediation analysis without sequential ignorability: Using baseline covariates interacted with random assignment as instrumental variables. Journal of Statistical Research 46:91–103.
  • Ten Have, T. R.; Joffe, M. M.; Lynch, K. G.; Brown, G. K.; Maisto, S. A.; and Beck, A. T. 2007. Causal mediation analyses with rank preserving models. Biometrics 63:926–934.
  • VanderWeele, T., and Vansteelandt, S. 2009. Conceptual issues concerning mediation, interventions and composition. Statistics and its Interface 2:457–468.
  • VanderWeele, T., and Vansteelandt, S. 2013. Mediation analysis with multiple mediators. Epidemiologic Methods 2:95–115.
  • VanderWeele, T. J. 2009. Marginal structural models for the estimation of direct and indirect effects. Epidemiology 20(1):18–26.
  • VanderWeele, T. J. 2010. Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology 21(4):540–551.
  • Zheng, C., and Zhou, X.-H. 2015. Causal mediation analysis in the multilevel intervention and multicomponent mediator case. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 77:581–615.