
Alternative Measures of Direct and Indirect Effects

Jose M. Peña, Linköping University, Sweden.
Abstract.

There are a number of measures of direct and indirect effects in the literature. They are suitable in some cases and unsuitable in others. We describe a case where the existing measures are unsuitable and propose new suitable ones. We also show that the new measures can partially handle unmeasured treatment-outcome confounding, and bound long-term effects by combining experimental and observational data.

1. Introduction

Consider the following causal graph studied in [1, 2], where $X$, $Z$ and $Y$ represent an applicant’s gender, qualifications and hiring, respectively.

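[Figure: causal graph over $X$, $Z$ and $Y$ with edges $X \rightarrow Z$, $Z \rightarrow Y$ and $X \rightarrow Y$.]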

The edge $X \rightarrow Y$ represents that the hirer questions applicants about their gender, and that their answers may have an effect on hiring them. Pearl imagines a policy maker who may be interested in predicting the gender mix in the workforce if it were illegal for the hirer to question applicants about their gender. This quantity corresponds to the effect of gender on hiring mediated by qualifications. Pearl argues that the answer to this question lies in deactivating the direct path $X \rightarrow Y$. He also argues that the answer can be realized by computing the average natural (or pure) indirect effect:

\[ NIE(X,Y) = E[Y_{\overline{x},Z_{x}}] - E[Y_{\overline{x}}] \]

which is the difference between the expected outcomes under no exposure when the mediator takes the value it would under exposure and non-exposure, respectively. We agree with the answer to the question (i.e., deactivating $X \rightarrow Y$) but not with its realization (i.e., deactivating $X \rightarrow Y$ by computing $NIE(X,Y)$), because the reference value $\overline{x}$ affects the outcome in this realization of the answer. This is problematic because it means that the direct path $X \rightarrow Y$ is not really deactivated and, moreover, the answer depends on the level chosen as reference. In other words, this realization of the answer does not really correspond to the no-questioning policy being evaluated. The problems discussed here are shared by other classical measures of indirect effect such as the average total and controlled indirect effects, as well as by the interventional indirect effect measure proposed by Geneletti [3]:

\[ IIE(X,Y) = E[Y_{\overline{x},\mathcal{Z}_{x}}] - E[Y_{\overline{x},\mathcal{Z}_{\overline{x}}}] \]

which compares the expected outcome under no exposure when $Z$ is drawn from the distributions $\mathcal{Z}_{x}$ and $\mathcal{Z}_{\overline{x}}$ of $Z_{x}$ and $Z_{\overline{x}}$. (Although $NIE(X,Y)$ and $IIE(X,Y)$ do not coincide in general, they coincide for the causal graph under study [4].) However, this does not mean that these measures should be abandoned. Quite the opposite: they are informative when the reference value is clear from the context. For instance, if women suspect being discriminated against by the hirer, then they may want to know if the probability of a woman getting hired would remain unchanged had she a man’s qualifications. This is measured by $NIE(X,Y)$ with reference value $\overline{x}$ set to “woman”. In summary, the existing measures of indirect effect are suitable in some cases and unsuitable in others. In this paper, we propose a new measure that does not require selecting a reference value.

More recently, [5] have introduced the population intervention indirect effect to measure the indirect effect of $X$ on $Y$ through the mediator $Z$:

\[ PIIE(\overline{x}) = E[Y_{X,Z_{X}}] - E[Y_{X,Z_{\overline{x}}}] \]

which is the difference between the expected outcomes when the exposure and mediator take natural (observed) values and when the exposure takes natural value but the mediator takes the value it would under no exposure. Therefore, this measure is suitable when the exposure is harmful (e.g., smoking) and, thus, one may be more interested in elucidating the effect (e.g., disease prevalence) of eliminating the exposure rather than in contrasting the effects of exposure and non-exposure. The latter is considered irrelevant, because it is inconceivable that everyone will be exposed. In this paper, though, we are interested in the latter because it may be informative even when the interventions are inconceivable. For instance, the two interventions being contrasted in the gender discrimination example above (everyone is male and everyone is female) are both inconceivable, but their contrast is instrumental to decide whether the no-questioning policy should be introduced or not, as argued in [1, 2] (see also the previous paragraph).

The rest of the paper is organized as follows. We present our new measure of indirect effect in Section 2. We also present a new measure of direct effect. We illustrate them with an example. Moreover, we show that they can partially handle unmeasured treatment-outcome confounding, and bound long-term effects by combining experimental and observational data. Finally, we close with some discussion in Section 3.

2. Alternative Measures

Consider again the causal graph below, where $X$, $Z$ and $Y$ represent an applicant’s gender, qualifications and hiring, respectively.

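[Figure: causal graph over $X$, $Z$ and $Y$ with edges $X \rightarrow Z$, $Z \rightarrow Y$ and $X \rightarrow Y$.]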

We assume that the direct path $X \rightarrow Y$ is actually mediated by an unmeasured random variable $U$ that is left unmodelled. This arguably holds in most domains. In the example above, $U$ may represent the hirer’s predisposition to hire the applicant. However, the identity of $U$ is irrelevant. Let $G$ denote the causal graph below, i.e. the original causal graph refined with the addition of $U$.

[Figure: causal graph $G$ with paths $X \rightarrow Z \rightarrow Y$ and $X \rightarrow U \rightarrow Y$.]

Now, deactivating the direct path $X \rightarrow Y$ in the original causal graph can be achieved by adjusting for $U$ in $G$, i.e. $\sum_{u} E[Y|x,u] p(u)$. Unfortunately, $U$ is unmeasured. We propose an alternative way of deactivating $X \rightarrow Y$. Let $H$ denote the causal graph below, i.e. the result of reversing the edge $X \rightarrow U$ in $G$.

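[Figure: causal graph $H$ with the path $X \rightarrow Z \rightarrow Y$ and edges $U \rightarrow X$ and $U \rightarrow Y$.]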

The average total effect of $X$ on $Y$ in $H$ can be computed by the front-door criterion [2]:

\begin{align}
TE(X,Y) &= E[Y_{x}] - E[Y_{\overline{x}}] \nonumber \\
&= \sum_{z} p(z|x) \sum_{x'} E[Y|x',z] p(x') - \sum_{z} p(z|\overline{x}) \sum_{x'} E[Y|x',z] p(x'). \tag{1}
\end{align}

Note that $G$ and $H$ are distribution equivalent, i.e. every probability distribution that is representable by $G$ is representable by $H$ and vice versa [2]. Then, evaluating the second line of the equation above in $G$ or $H$ gives the same result. If we evaluate it in $H$, then it corresponds to the part of the association between $X$ and $Y$ that is attributable to the path $X \rightarrow Z \rightarrow Y$. If we evaluate it in $G$, then it corresponds to the part of $TE(X,Y)$ in $G$ that is attributable to the path $X \rightarrow Z \rightarrow Y$, because $TE(X,Y)$ in $G$ equals the association between $X$ and $Y$, since $G$ has only directed paths from $X$ to $Y$. Therefore, the second line in the equation above corresponds to the part of $TE(X,Y)$ in the original causal graph that is attributable to the path $X \rightarrow Z \rightarrow Y$, thereby deactivating the direct path $X \rightarrow Y$. We propose to use the second line in the equation above as a measure of the indirect effect of $X$ on $Y$ in the original causal graph:

\[ IE(X,Y) = \sum_{z} p(z|x) \sum_{x'} E[Y|x',z] p(x') - \sum_{z} p(z|\overline{x}) \sum_{x'} E[Y|x',z] p(x'). \]

$IE(X,Y)$ only considers the path $X \rightarrow Z \rightarrow Y$ to propagate the value of $X$. This is unlike $NIE(X,Y)$, which considers both paths from $X$ to $Y$: the path $X \rightarrow Y$ propagates the value $X=\overline{x}$, whereas the path $X \rightarrow Z \rightarrow Y$ propagates the value that $Z$ takes under $X=x$ and $X=\overline{x}$. As shown in Section 2.3, provided that $Z$ is binary, $IE(X,Y)$ can be written as $TE(X,Z) \cdot TE(Z,Y)$, which some may find natural. It moreover coincides with the indirect effect in linear structural equation models.
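To illustrate the claim about linear models, consider a sketch under an assumed linear structural equation model. The coefficients $a$, $b$, $c$ and the noise terms are illustrative notation, not taken from the paper: $Z = aX + \epsilon_Z$ and $Y = bZ + cX + \epsilon_Y$, with zero-mean noise terms that are independent of each other and of $X$. Then $E[Z|x] = ax$ and $E[Y|x',z] = bz + cx'$, so that

```latex
\begin{align*}
IE(X,Y) &= \sum_{z} [p(z|x) - p(z|\overline{x})] \sum_{x'} (bz + cx') p(x') \\
&= b \, (E[Z|x] - E[Z|\overline{x}]) + c \, E[X] \sum_{z} [p(z|x) - p(z|\overline{x})] \\
&= ab \, (x - \overline{x}),
\end{align*}
```

because the last sum in the second line is zero. This is the usual product-of-coefficients indirect effect. For continuous $Z$, the sums over $z$ are to be read as integrals.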

Likewise, we propose to measure the direct effect of $X$ on $Y$ as the part of $TE(X,Y)$ in the original causal graph that remains after deactivating the path $X \rightarrow Z \rightarrow Y$. This is achieved by simply adjusting for $Z$:

\[ DE(X,Y) = \sum_{z} E[Y|x,z] p(z) - \sum_{z} E[Y|\overline{x},z] p(z). \]

For the same reasons as above, this is unlike the measure proposed in [1, 2], namely the average natural (or pure) direct effect $NDE(X,Y) = E[Y_{x,Z_{\overline{x}}}] - E[Y_{\overline{x}}]$.

Finally, note that $DE(X,Y)$ and $IE(X,Y)$ can be computed from the observed data distribution $p(X,Z,Y)$. This is also true for $NDE(X,Y)$ and $NIE(X,Y)$ [1, 2]. Note also that the sum of $DE(X,Y)$ and $IE(X,Y)$ does not equal $TE(X,Y)$ in the original causal graph, due to interactions in the outcome model. This is also true for the sum of $NDE(X,Y)$ and $NIE(X,Y)$ [1, 2, 6, 7].
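Since both measures are functionals of $p(X,Z,Y)$, they are straightforward to evaluate for discrete variables. The following is a minimal Python sketch, not part of the original paper; the function names, the dictionary layout and the toy numbers are illustrative assumptions.

```python
def indirect_effect(p_x, p_z_given_x, e_y, x1, x0):
    """IE(X,Y) = sum_z [p(z|x1) - p(z|x0)] * sum_x' E[Y|x',z] p(x')."""
    total = 0.0
    for z in p_z_given_x[x1]:
        # Inner sum averages E[Y|x',z] over the marginal p(x').
        inner = sum(e_y[(xp, z)] * p_x[xp] for xp in p_x)
        total += (p_z_given_x[x1][z] - p_z_given_x[x0][z]) * inner
    return total


def direct_effect(p_x, p_z_given_x, e_y, x1, x0):
    """DE(X,Y) = sum_z (E[Y|x1,z] - E[Y|x0,z]) p(z)."""
    total = 0.0
    for z in p_z_given_x[x1]:
        # Marginal p(z) = sum_x' p(z|x') p(x').
        p_z = sum(p_z_given_x[xp][z] * p_x[xp] for xp in p_x)
        total += (e_y[(x1, z)] - e_y[(x0, z)]) * p_z
    return total


# Hypothetical toy distribution over binary X and Z (not taken from the paper).
p_x = {1: 0.5, 0: 0.5}                                          # p(x)
p_z_given_x = {1: {1: 0.75, 0: 0.25}, 0: {1: 0.25, 0: 0.75}}    # p(z|x)
e_y = {(1, 1): 1.0, (1, 0): 0.5, (0, 1): 0.5, (0, 0): 0.0}      # E[Y|x,z]

ie = indirect_effect(p_x, p_z_given_x, e_y, x1=1, x0=0)
de = direct_effect(p_x, p_z_given_x, e_y, x1=1, x0=0)
print(ie, de)
```

Both functions take the two exposure levels being contrasted as arguments, but neither singles out one of them as a reference level, which mirrors the motivation for the new measures.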

2.1. Example

Consider the following example from [8], where the causal graph is

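[Figure: causal graph over $X$, $Z$ and $Y$ with edges $X \rightarrow Z$, $Z \rightarrow Y$ and $X \rightarrow Y$.]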

such that $X$ represents a drug treatment, $Z$ the presence of a certain enzyme in a patient’s blood, and $Y$ recovery. Moreover, we have that

\begin{align*}
p(z|x) &= 0.75 & p(y|x,z) &= 0.8 \\
 & & p(y|x,\overline{z}) &= 0.4 \\
p(z|\overline{x}) &= 0.4 & p(y|\overline{x},z) &= 0.3 \\
 & & p(y|\overline{x},\overline{z}) &= 0.2.
\end{align*}

Pearl imagines a scenario where someone proposes developing a cheaper drug that is equal to the existing one except for the lack of effect on enzyme production. To evaluate the new drug’s performance, he computes $TE(X,Y)=0.46$ and $NDE(X,Y)=0.32$, and concludes that the new drug will reduce the probability of recovery by 30%, i.e. $1 - NDE(X,Y)/TE(X,Y) = 0.3$. We can repeat the analysis using $DE(X,Y)$ instead of $NDE(X,Y)$:

\begin{align*}
DE(X,Y) &= p(y|x,z) p(z) + p(y|x,\overline{z}) p(\overline{z}) - p(y|\overline{x},z) p(z) - p(y|\overline{x},\overline{z}) p(\overline{z}) \\
&= 0.8 p(z) + 0.4 (1 - p(z)) - 0.3 p(z) - 0.2 (1 - p(z)) \\
&= 0.2 + 0.3 p(z) \\
&= 0.2 + 0.3 [p(z|x) p(x) + p(z|\overline{x}) p(\overline{x})] \\
&= 0.2 + 0.3 [0.75 p(x) + 0.4 (1 - p(x))] \\
&= 0.32 + 0.11 p(x)
\end{align*}

which implies that $0.32 \leq DE(X,Y) \leq 0.43$. An interval is returned because $p(X)$ is not given in the original example (it is not needed to compute $NDE(X,Y)$ or $NIE(X,Y)$). Therefore, we conclude that the new drug will reduce the probability of recovery by between 7% and 30%, depending on $p(X)$.
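The numbers in this first scenario can be reproduced with a few lines of Python. This is a minimal sketch under the example's probabilities; the variable and function names are ours. $NDE(X,Y)$ is evaluated with the standard mediation formula, which applies here on the assumption that the example graph has no unmeasured confounding.

```python
# Conditional probabilities from the example; 1 denotes x or z, 0 their complements.
p_z_given_x = {1: 0.75, 0: 0.4}                               # p(z | X)
p_y = {(1, 1): 0.8, (1, 0): 0.4, (0, 1): 0.3, (0, 0): 0.2}    # p(y | X, Z)


def mean_y(x_direct, x_mediator):
    """E[Y] when Y sees X = x_direct and Z is drawn from p(Z | x_mediator)."""
    pz = p_z_given_x[x_mediator]
    return p_y[(x_direct, 1)] * pz + p_y[(x_direct, 0)] * (1 - pz)


te = mean_y(1, 1) - mean_y(0, 0)     # total effect
nde = mean_y(1, 0) - mean_y(0, 0)    # natural direct effect


def de(p_x):
    """DE(X,Y) as a function of the unknown marginal p(x)."""
    p_z = p_z_given_x[1] * p_x + p_z_given_x[0] * (1 - p_x)
    return ((p_y[(1, 1)] - p_y[(0, 1)]) * p_z
            + (p_y[(1, 0)] - p_y[(0, 0)]) * (1 - p_z))


print(round(te, 3), round(nde, 3))
print(round(de(0.0), 3), round(de(1.0), 3))
```

Without intermediate rounding, the slope in $p(x)$ is 0.105, so the upper endpoint evaluates to 0.425, which the text reports as 0.43.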

The new drug development scenario described above corresponds to the following alternative causal graph:

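[Figure: causal graph over $X$, $Z$ and $Y$ with edges $X \leftarrow\!\!\!\!\!\multimap Z$, $Z \rightarrow Y$ and $X \rightarrow Y$.]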

where the edge $X \leftarrow\!\!\!\!\!\multimap Z$ means that there is an edge $X \leftarrow Z$ or $X \leftrightarrow Z$. The former represents that the presence of enzyme may have an effect on the patient taking the treatment, and the latter represents the potential existence of an unmeasured confounder between them. The drug performance in this alternative causal graph is simply $TE(X,Y)$, which can be computed by adjusting for $Z$, and thus it coincides with $DE(X,Y)$ in the original causal graph, since the two graphs are distribution equivalent. In other words, it is $DE(X,Y)$ rather than $NDE(X,Y)$ that should be used to answer the original question. Note that $DE(X,Y)=NDE(X,Y)=0.32$ if and only if $p(x)=0$, i.e. everyone is untreated. This is no coincidence because $NDE(X,Y)$ in the original causal graph coincides with the average effect of the treatment among the untreated in the alternative graph [9] ([9] prove the equivalence when $X \leftarrow Z$, but the proof also applies when $X \leftrightarrow Z$), rather than with $TE(X,Y)$, which is the correct answer to the original question.

Pearl also imagines a scenario where someone proposes developing a cheaper drug that is equal to the existing one except for the lack of direct effect on recovery, i.e. it just stimulates enzyme production as much as the existing drug. To evaluate the new drug’s performance, he computes $TE(X,Y)=0.46$ and $NIE(X,Y)=0.04$, and concludes that the new drug will reduce the probability of recovery by 91%, i.e. $1 - NIE(X,Y)/TE(X,Y) = 0.91$. (The small disagreements with the results in [8] are due to rounding.) We can repeat the analysis using $IE(X,Y)$ instead of $NIE(X,Y)$:

\begin{align*}
IE(X,Y) &= p(z|x) [p(y|x,z) p(x) + p(y|\overline{x},z) p(\overline{x})] \\
&\quad + p(\overline{z}|x) [p(y|x,\overline{z}) p(x) + p(y|\overline{x},\overline{z}) p(\overline{x})] \\
&\quad - p(z|\overline{x}) [p(y|x,z) p(x) + p(y|\overline{x},z) p(\overline{x})] \\
&\quad - p(\overline{z}|\overline{x}) [p(y|x,\overline{z}) p(x) + p(y|\overline{x},\overline{z}) p(\overline{x})] \\
&= 0.75 [0.8 p(x) + 0.3 (1 - p(x))] + 0.25 [0.4 p(x) + 0.2 (1 - p(x))] \\
&\quad - 0.4 [0.8 p(x) + 0.3 (1 - p(x))] - 0.6 [0.4 p(x) + 0.2 (1 - p(x))] \\
&= 0.04 + 0.11 p(x)
\end{align*}

which implies that $0.04 \leq IE(X,Y) \leq 0.15$. Therefore, we conclude that the new drug will reduce the probability of recovery by between 67% and 91%, depending on $p(X)$.
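The second scenario can be checked in the same way. The sketch below is ours, not the paper's: it evaluates $NIE(X,Y)$ with the standard mediation formula (assuming no unmeasured confounding in the example graph) and $IE(X,Y)$ with the four-term expression above, for a marginal $p(x)$ supplied by the caller.

```python
# Conditional probabilities from the example; 1 denotes x or z, 0 their complements.
p_z_given_x = {1: 0.75, 0: 0.4}                               # p(z | X)
p_y = {(1, 1): 0.8, (1, 0): 0.4, (0, 1): 0.3, (0, 0): 0.2}    # p(y | X, Z)


def prob_z(z, x):
    """p(Z = z | X = x) for binary Z."""
    return p_z_given_x[x] if z == 1 else 1 - p_z_given_x[x]


def nie():
    """NIE(X,Y) = sum_z p(y|x_bar,z) [p(z|x) - p(z|x_bar)]."""
    return sum(p_y[(0, z)] * (prob_z(z, 1) - prob_z(z, 0)) for z in (0, 1))


def ie(p_x):
    """IE(X,Y) = sum_z [p(z|x) - p(z|x_bar)] sum_x' p(y|x',z) p(x')."""
    total = 0.0
    for z in (0, 1):
        inner = p_y[(1, z)] * p_x + p_y[(0, z)] * (1 - p_x)
        total += (prob_z(z, 1) - prob_z(z, 0)) * inner
    return total


print(round(nie(), 3))
print(round(ie(0.0), 3), round(ie(1.0), 3))
```

Without intermediate rounding, the expression is $0.035 + 0.105 p(x)$, so it ranges from 0.035 to 0.14; the endpoints 0.04 and 0.15 in the text follow from rounding the intercept and slope to 0.04 and 0.11 before adding them.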

The latest new drug development scenario corresponds to the following alternative causal graph:

[Figure: causal graph with the path $X \rightarrow Z \rightarrow Y$ and the edge $X \leftrightarrow Y$.]

where the edge $X \leftrightarrow Y$ represents the potential existence of an unmeasured treatment-outcome confounder. The drug performance in this alternative causal graph is simply $TE(X,Y)$, which can be computed by the front-door criterion, and thus it coincides with $IE(X,Y)$ in the original causal graph, under our assumption that the direct path $X \rightarrow Y$ in the original graph is mediated by an unmeasured random variable (recall the previous section). In other words, it is $IE(X,Y)$ rather than $NIE(X,Y)$ that should be used to answer the original question. Note that $IE(X,Y)=NIE(X,Y)=0.04$ if and only if $p(x)=0$. Again, this is no coincidence because $NIE(X,Y)$ in the original causal graph coincides with the average effect of the treatment among the untreated in the alternative graph [1, 10], rather than with $TE(X,Y)$, which is the correct answer to the original question.

2.2. Unmeasured Confounding

In this section, we study the following extension of the original causal graph with an unmeasured treatment-outcome confounder $V$.

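[Figure: the original causal graph over $X$, $Z$ and $Y$ extended with the unmeasured confounder $V$ of $X$ and $Y$.]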

Now, neither $NDE(X,Y)$ nor $NIE(X,Y)$ nor their total and controlled counterparts are identifiable from the observed data distribution $p(X,Z,Y)$ [1, 2]. However, $IE(X,Y)$ can be computed much as before. First, we add the unmeasured mediator $U$. The edge $V \mathrel{\reflectbox{$\leftarrow\!\!\!\!\!\multimap$}} U$ means that there may be an edge $V \rightarrow U$ or $V \leftrightarrow U$.

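[Figure: the previous causal graph with the unmeasured mediator $U$ added on the direct path, i.e. $X \rightarrow U \rightarrow Y$, and the edge $V \mathrel{\reflectbox{$\leftarrow\!\!\!\!\!\multimap$}} U$.]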

Then, we group $U$ and $V$. Note that every probability distribution that is representable by the graph above is representable by the graph below, since all the independencies entailed by the latter hold in the former.

[Figure: causal graph over $X$, $Z$, $Y$ and the grouped node $\{U,V\}$.]

Finally, we apply the front-door criterion.

Like $NDE(X,Y)$ and its total counterpart, $DE(X,Y)$ is not identifiable from the observed data distribution $p(X,Z,Y)$ in the extended causal graph under consideration. However, it may be bounded if $V$ is binary and a binary proxy $W$ of $V$ is measured. The causal graph under consideration is then as follows.

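[Figure: the extended causal graph over $X$, $Z$, $Y$ and the unmeasured confounder $V$, together with the measured proxy $W$ of $V$.]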

In the literature, there are many cautionary tales about the bias that adjusting for the proxy of an unmeasured confounder introduces to the estimation of a causal effect [11, 12, 13]. For instance, [14] constructs an example where adjusting for the proxy is worse than not adjusting at all. However, there are conditions under which the opposite is true [15, 16, 17, 18]. We use some of these conditions here.

Recall that $DE(X,Y)$ is the part of $TE(X,Y)$ in the causal graph that remains after deactivating the path $X \rightarrow Z \rightarrow Y$. This is achieved by simply adjusting for $Z$:

\[ DE(X,Y) = \sum_{z} TE(X,Y|z) p(z) \]

where $TE(X,Y|z)$ is the average total effect of $X$ on $Y$ in the stratum $Z=z$. Let us define the observed or partially adjusted average total effect of $X$ on $Y$ in the stratum $Z=z$ as

\[ TE_{obs}(X,Y|z) = \sum_{w} E[Y|x,z,w] p(w|z) - \sum_{w} E[Y|\overline{x},z,w] p(w|z). \]

Note that it can be computed from the observed data distribution $p(X,W,Z,Y)$. Rephrasing the results in [16, 17] to our scenario, if $E[Y|x',z',W]$ and $E[X|z',W]$ are one nonincreasing and the other nondecreasing in $W$ for all $x' \in \{x,\overline{x}\}$ and $z' \in \{z,\overline{z}\}$, then $TE(X,Y|z') \geq TE_{obs}(X,Y|z')$ for all $z' \in \{z,\overline{z}\}$. On the other hand, if $E[Y|x',z',W]$ and $E[X|z',W]$ are both nonincreasing or both nondecreasing in $W$ for all $x' \in \{x,\overline{x}\}$ and $z' \in \{z,\overline{z}\}$, then $TE_{obs}(X,Y|z') \geq TE(X,Y|z')$ for all $z' \in \{z,\overline{z}\}$. Note that the antecedents of these rules are testable from the observed data distribution $p(X,W,Z,Y)$. In many cases, though not in all, these rules enable us to bound $DE(X,Y)$ and even determine its sign. Specifically,

\[ DE(X,Y) \cdot (2 \cdot \boldsymbol{1}_{\neq} - 1) \geq \Big[ \sum_{z} TE_{obs}(X,Y|z) p(z) \Big] \cdot (2 \cdot \boldsymbol{1}_{\neq} - 1) \]

where $\boldsymbol{1}_{\neq}$ is 1 (respectively, 0) if $E[Y|x',z',W]$ and $E[X|z',W]$ are one nonincreasing and the other nondecreasing (respectively, both nonincreasing or both nondecreasing) in $W$ for all $x' \in \{x,\overline{x}\}$ and $z' \in \{z,\overline{z}\}$.
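The rule above can be sketched in Python for binary $X$, $Z$ and $W$. This is an illustration under our own assumptions rather than the paper's implementation: the function names, the dictionary layout and the numbers are hypothetical, and the monotonicity condition is checked separately for every pair $(x', z')$, which is our reading of "for all $x'$ and $z'$".

```python
def opposite(f, g):
    """One of f, g is nonincreasing and the other nondecreasing in binary W.

    f and g are pairs (value at W=0, value at W=1).
    """
    return (f[1] <= f[0] and g[1] >= g[0]) or (f[1] >= f[0] and g[1] <= g[0])


def same(f, g):
    """f and g are both nonincreasing or both nondecreasing in binary W."""
    return (f[1] <= f[0] and g[1] <= g[0]) or (f[1] >= f[0] and g[1] >= g[0])


def bound_direct_effect(e_y, e_x, p_w_given_z, p_z):
    """Return (kind, value), where value = sum_z TE_obs(X,Y|z) p(z).

    kind is "lower" if DE >= value, "upper" if DE <= value, and None if
    neither antecedent holds. If every function is constant in W, both
    antecedents hold and "lower" is returned.
    """
    pairs = [
        ((e_y[(x, z, 0)], e_y[(x, z, 1)]), (e_x[(z, 0)], e_x[(z, 1)]))
        for x in (0, 1) for z in (0, 1)
    ]
    value = sum(
        (e_y[(1, z, w)] - e_y[(0, z, w)]) * p_w_given_z[z][w] * p_z[z]
        for z in (0, 1) for w in (0, 1)
    )
    if all(opposite(f, g) for f, g in pairs):
        return "lower", value
    if all(same(f, g) for f, g in pairs):
        return "upper", value
    return None, value


# Hypothetical inputs: E[Y|x,z,w] increases in w, E[X|z,w] decreases in w.
e_y = {(x, z, w): 0.25 + 0.25 * x + 0.25 * w
       for x in (0, 1) for z in (0, 1) for w in (0, 1)}
e_x = {(z, w): 0.75 - 0.5 * w for z in (0, 1) for w in (0, 1)}
p_w_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
p_z = {0: 0.5, 1: 0.5}

kind, value = bound_direct_effect(e_y, e_x, p_w_given_z, p_z)
print(kind, value)
```

In this toy setting the two functions move in opposite directions in $W$, so $\boldsymbol{1}_{\neq} = 1$ and the partially adjusted quantity is a lower bound on $DE(X,Y)$. Since that lower bound is positive here, the sign of $DE(X,Y)$ is determined as well.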

2.3. Long-Term Effects

This section addresses a problem of randomized controlled trials, namely long-term effect estimation from typically short-lived trials. Consider the following causal graph, where $X$ and $V$ are unmeasured.

[Figure: causal graph over $X$, $Z$, $Y$, $V$ and $W$, where $X$ and $V$ are unmeasured.]

We assume that the mediator $Z$ is a short-term effect of the treatment $X$, whereas $Y$ is a long-term effect of $X$. Randomized controlled trials are typically short-lived due to cost considerations and, thus, they are typically conducted to estimate short-term effects but not longer ones. Observational data, on the other hand, is much cheaper to obtain and, thus, it may include long-term outcome observations. Unfortunately, observational data is typically subject to unmeasured confounding and to mismeasurements due to self-reporting. Therefore, we assume that a randomized controlled trial was conducted to estimate $TE(X,Z)$, but not $TE(Z,Y)$ or $TE(X,Y)$. We also assume that the probability distribution $p(W,Z,Y)$ was estimated from observational data, where $W$ represents the self-reported treatment, which may differ from the actual unmeasured treatment $X$. Our goal is computing $TE(X,Y)$. Unfortunately, this cannot be done from the information available. However, the fact that $TE(X,Y)=IE(X,Y)$ implies, as we show below, that $TE(X,Y)$ can sometimes be bounded.

Our setup above is similar to the one by [19] with the difference that they assume no unmeasured confounding. Our setup is also close to the one by [20] with the difference that they consider linear and partial linear structural equation models, while we consider non-parametric models. Moreover, unlike us, these two works assume that the true treatment is available in the observational data.

Provided that $Z$ is binary, we have that

\begin{align*}
IE(X,Y) &= p(z|x) \sum_{x'} E[Y|x',z] p(x') \\
&\quad + p(\overline{z}|x) \sum_{x'} E[Y|x',\overline{z}] p(x') \\
&\quad - p(z|\overline{x}) \sum_{x'} E[Y|x',z] p(x') \\
&\quad - p(\overline{z}|\overline{x}) \sum_{x'} E[Y|x',\overline{z}] p(x') \\
&= [p(z|x) - p(z|\overline{x})] [\sum_{x'} E[Y|x',z] p(x')] \\
&\quad + [p(\overline{z}|x) - p(\overline{z}|\overline{x})] [\sum_{x'} E[Y|x',\overline{z}] p(x')] \\
&= [p(z|x) - p(z|\overline{x})] [\sum_{x'} E[Y|x',z] p(x')] \\
&\quad + [-p(z|x) + p(z|\overline{x})] [\sum_{x'} E[Y|x',\overline{z}] p(x')] \\
&= [p(z|x) - p(z|\overline{x})] [\sum_{x'} E[Y|x',z] p(x') - \sum_{x'} E[Y|x',\overline{z}] p(x')] \\
&= [E[Z_{x}] - E[Z_{\overline{x}}]] [E[Y_{z}] - E[Y_{\overline{z}}]] \\
&= TE(X,Z) \cdot TE(Z,Y).
\end{align*}
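The factorization can be checked numerically. The following Python sketch draws hypothetical random parameters for binary $X$ and $Z$ (the seed and all names are arbitrary choices of ours) and compares the four-term expression for $IE(X,Y)$ with the product of the two adjusted contrasts from the derivation above.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility

# Hypothetical random parameters; 1 denotes x or z, 0 their complements.
p_x = random.random()                                              # p(x)
p_z = {1: random.random(), 0: random.random()}                     # p(z | X)
e_y = {(x, z): random.random() for x in (0, 1) for z in (0, 1)}    # E[Y | X, Z]


def avg_y(z):
    """sum_x' E[Y|x',z] p(x'), the adjusted mean used in the derivation."""
    return e_y[(1, z)] * p_x + e_y[(0, z)] * (1 - p_x)


def prob_z(z, x):
    """p(Z = z | X = x) for binary Z."""
    return p_z[x] if z == 1 else 1 - p_z[x]


# Four-term expression: sum_z [p(z|x) - p(z|x_bar)] * avg_y(z).
ie = sum((prob_z(z, 1) - prob_z(z, 0)) * avg_y(z) for z in (0, 1))

te_xz = p_z[1] - p_z[0]          # p(z|x) - p(z|x_bar)
te_zy = avg_y(1) - avg_y(0)      # adjusted contrast between z and z_bar

print(abs(ie - te_xz * te_zy) < 1e-12)
```

The check relies only on $p(\overline{z}|x) = 1 - p(z|x)$, which is why the factorization is stated for binary $Z$.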

Let us define the observed or partially adjusted average total effect of $Z$ on $Y$ as

\[ TE_{obs}(Z,Y) = \sum_{w} E[Y|z,w] p(w) - \sum_{w} E[Y|\overline{z},w] p(w). \]

Note that it can be computed from the observed data distribution $p(W,Z,Y)$. If $E[Y|z',W]$ and $E[Z|W]$ are one nonincreasing and the other nondecreasing in $W$ for all $z' \in \{z,\overline{z}\}$, then $TE(Z,Y) \geq TE_{obs}(Z,Y)$ [16, 17]. On the other hand, if $E[Y|z',W]$ and $E[Z|W]$ are both nonincreasing or both nondecreasing in $W$ for all $z' \in \{z,\overline{z}\}$, then $TE_{obs}(Z,Y) \geq TE(Z,Y)$ [16, 17]. (In the proofs of these results, $X$ is a parent of $V$. The results also hold when $X$ is a child of $V$, since (i) every probability distribution that is representable when $X$ is a child of $V$ is also representable when $X$ is a parent of $V$ and vice versa [2], and (ii) $TE(Z,Y)$ and $TE_{obs}(Z,Y)$ each give the same result in both cases.) Note that the antecedents of these rules are testable from the observed data distribution $p(W,Z,Y)$. In many cases, though not in all, these rules together with the knowledge of $TE(X,Z)$ enable us to bound $IE(X,Y)$ and even determine its sign. Specifically,

\begin{align*}
& IE(X,Y) \cdot (2 \cdot \boldsymbol{1}_{\neq} - 1) \cdot (2 \cdot \boldsymbol{1}_{\geq} - 1) \\
& \quad \geq TE(X,Z) \cdot TE_{obs}(Z,Y) \cdot (2 \cdot \boldsymbol{1}_{\neq} - 1) \cdot (2 \cdot \boldsymbol{1}_{\geq} - 1)
\end{align*}

where $\boldsymbol{1}_{\neq}$ is 1 (respectively, 0) if $E[Y|z',W]$ and $E[Z|W]$ are one nonincreasing and the other nondecreasing (respectively, both nonincreasing or both nondecreasing) in $W$ for all $z' \in \{z,\overline{z}\}$, and $\boldsymbol{1}_{\geq}$ is 1 if $TE(X,Z) \geq 0$ and 0 otherwise.

Finally, note that the last equation also holds if we add the edge $X \rightarrow Y$ to the causal graph under study. To see this, simply pre-process the graph with the three transformations at the beginning of Section 2.2.
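The bound on $IE(X,Y)$ in this section can be sketched as follows. The function names and numbers are hypothetical, and the sketch assumes that the caller has already checked which monotonicity antecedent holds in the observational data and passes the result as a flag.

```python
def te_obs_zy(e_y, p_w):
    """TE_obs(Z,Y) = sum_w (E[Y|z,w] - E[Y|z_bar,w]) p(w), with e_y[(z, w)]."""
    return sum((e_y[(1, w)] - e_y[(0, w)]) * p_w[w] for w in p_w)


def bound_indirect_effect(te_xz, te_obs, opposite_monotone):
    """Return (kind, value) with value = TE(X,Z) * TE_obs(Z,Y).

    kind is "lower" if IE >= value and "upper" if IE <= value.
    opposite_monotone is True when E[Y|z',W] and E[Z|W] are one nonincreasing
    and the other nondecreasing in W, and False when both move the same way.
    """
    s_neq = 1 if opposite_monotone else -1    # 2 * 1_neq - 1
    s_geq = 1 if te_xz >= 0 else -1           # 2 * 1_geq - 1
    kind = "lower" if s_neq * s_geq == 1 else "upper"
    return kind, te_xz * te_obs


# Hypothetical inputs: TE(X,Z) from a trial, E[Y|z,w] and p(w) from observational data.
e_y = {(1, 0): 0.5, (1, 1): 0.75, (0, 0): 0.25, (0, 1): 0.5}
p_w = {0: 0.5, 1: 0.5}
te_obs = te_obs_zy(e_y, p_w)

print(bound_indirect_effect(0.5, te_obs, opposite_monotone=True))
print(bound_indirect_effect(-0.5, te_obs, opposite_monotone=True))
```

Multiplying both sides of the inequality by the product of the two sign factors is what turns it into a lower or an upper bound, so the sketch only needs the sign of that product.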

3. Discussion

We have proposed new measures of direct and indirect effects. They are based on contrasting the effects of exposure and non-exposure and they do not require selecting a reference value. This makes them unlike the existing measures in the literature and, thus, suitable in cases where the existing ones are unsuitable. The opposite is also true. The new measures assume that the direct path from exposure to outcome is mediated by an unmeasured random variable. Its identity is irrelevant. This arguably holds in most domains.

When there is unmeasured treatment-outcome confounding, we have shown that the new measure of indirect effect still applies, whereas the new measure of direct effect can be bounded in some cases. This also sets them apart from the existing measures. Finally, we have shown how the new measure of indirect effect can be used to sometimes bound long-term effects by combining experimental and observational data.

As future work, we note that the bounds presented here require some random variables being binary. It would be worth studying the possibility of relaxing this requirement by making use of the results in [18, 21].

References

[1] J. Pearl. Direct and Indirect Effects. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, pages 411–420, 2001.
[2] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2009.
[3] S. Geneletti. Identifying Direct and Indirect Effects in a Non-Counterfactual Framework. Journal of the Royal Statistical Society Series B, 69:199–215, 2007.
[4] T. J. VanderWeele, S. Vansteelandt, and J. M. Robins. Effect Decomposition in the Presence of an Exposure-Induced Mediator-Outcome Confounder. Epidemiology, 25:300–306, 2014.
[5] I. R. Fulcher, I. Shpitser, S. Marealle, and E. J. Tchetgen Tchetgen. Robust Inference on Population Indirect Causal Effects: The Generalized Front Door Criterion. Journal of the Royal Statistical Society Series B, 82:199–214, 2020.
[6] T. J. VanderWeele. A Three-Way Decomposition of a Total Effect into Direct, Indirect, and Interactive Effects. Epidemiology, 24:224–232, 2013.
[7] T. J. VanderWeele. A Unification of Mediation and Interaction: A Four-Way Decomposition. Epidemiology, 25:749–761, 2014.
[8] J. Pearl. The Causal Mediation Formula - A Guide to the Assessment of Pathways and Mechanisms. Prevention Science, 13:426–436, 2012.
[9] E. L. Ogburn and T. J. VanderWeele. Analytic Results on the Bias Due to Nondifferential Misclassification of a Binary Mediator. American Journal of Epidemiology, 176:555–561, 2012.
[10] I. Shpitser and J. Pearl. Effects of Treatment on the Treated: Identification and Generalization. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, pages 514–521, 2009.
[11] P. C. Austin and L. J. Brunner. Inflation of the Type I Error Rate when a Continuous Confounding Variable is Categorized in Logistic Regression Analyses. Statistics in Medicine, 23:1159–1178, 2004.
[12] D. G. Altman and P. Royston. The Cost of Dichotomising Continuous Variables. BMJ, 332:1080, 2006.
[13] H. Chen, P. Cohen, and S. Chen. Biased Odds Ratios from Dichotomization of Age. Statistics in Medicine, 26:3487–3497, 2007.
[14] H. Brenner. A Potential Pitfall in Control of Covariates in Epidemiologic Studies. Epidemiology, 9:68–71, 1997.
[15] E. E. Gabriel, J. M. Peña, and A. Sjölander. Bias Attenuation Results for Dichotomization of a Continuous Confounder. Journal of Causal Inference, 10:515–526, 2022.
[16] E. L. Ogburn and T. J. VanderWeele. On the Nondifferential Misclassification of a Binary Confounder. Epidemiology, 23:433–439, 2012.
[17] J. M. Peña. On the Monotonicity of a Nondifferentially Mismeasured Binary Confounder. Journal of Causal Inference, 8:150–163, 2020.
[18] A. Sjölander, J. M. Peña, and E. E. Gabriel. Bias Results for Nondifferential Mismeasurement of a Binary Confounder. Statistics & Probability Letters, 186:109474, 2022.
[19] S. Athey, R. Chetty, G. W. Imbens, and H. Kang. The Surrogate Index: Combining Short-Term Proxies to Estimate Long-Term Treatment Effects More Rapidly and Precisely. Technical Report 26463, National Bureau of Economic Research, 2019.
[20] G. Van Goffrier, L. Maystre, and C. M. Gilligan-Lee. Estimating Long-Term Causal Effects from Short-Term Experiments and Long-Term Observational Data with Unobserved Confounding. In Proceedings of the 2nd Conference on Causal Learning and Reasoning, 2023.
[21] J. M. Peña, S. Balgi, A. Sjölander, and E. E. Gabriel. On the Bias of Adjusting for a Non-Differentially Mismeasured Discrete Confounder. Journal of Causal Inference, 9:229–249, 2021.