
Bounds on Causal Effects and Application to High Dimensional Data

Ang Li
University of California, Los Angeles
Computer Science Department
[email protected]
Judea Pearl
University of California, Los Angeles
Computer Science Department
[email protected]
Abstract

This paper addresses the problem of estimating causal effects when the adjustment variables in the back-door or front-door criterion are partially observed. For such scenarios, we derive bounds on the causal effects by solving two non-linear optimization problems, and demonstrate that the resulting bounds are sufficient for estimating the causal effects. Using this optimization method, we propose a framework for dimensionality reduction that allows one to trade bias for estimation power, and demonstrate its performance using simulation studies.

1 Introduction

The problem of estimating causal effects arises in many areas, including industry, marketing, and health science, and it is among the most critical problems in causal inference. Pearl's back-door and front-door criteria, along with the adjustment formula [Pearl(1995)], are powerful tools for estimating causal effects. In this paper, we address the problem of estimating causal effects when the adjustment variables in the back-door or front-door criterion are partially observable, or when the adjustment variables have high dimensionality.

Consider the problem of estimating the causal effects of X on Y when a sufficient set W ∪ U of confounders is partially observable (see Figure 1(a)). Because W ∪ U is assumed to be sufficient, the causal effects are identified from measurements on X, Y, W, and U and can be written as

P(y|do(x)) = \sum_{w,u} P(y|x,w,u)\, P(w,u) = \sum_{w,u} \frac{P(x,y,w,u)\, P(w,u)}{P(x,w,u)}.

However, if U is unobserved, d-separation tells us immediately that adjusting for W alone is inadequate, because it leaves the back-door path X ← U → Y unblocked. Therefore, regardless of sample size, the causal effects of X on Y cannot be estimated without bias. It turns out, however, that when a prior distribution P(U) is given, we can obtain bounds on the causal effects. We will demonstrate later that the midpoints of these bounds are sufficient for estimating the causal effects.

Bounding has proven useful in causal inference. [Balke and Pearl(1997a)] provided bounds on causal effects under imperfect compliance, [Tian and Pearl(2000)] proposed bounds on probabilities of causation, [Cai et al.(2008)Cai, Kuroki, Pearl, and Tian] provided bounds on causal effects in the presence of confounded intermediate variables, and [Li and Pearl(2019)] proposed bounds on the benefit function of a unit selection problem.

Although P(U) is assumed to be given, this is not a demanding requirement: P(U) is usually known independently of the model itself (e.g., when U stands for gender, gene type, blood type, or age). Alternatively, if costs permit, one can estimate P(U) by re-testing within a small sampled sub-population.

Figure 1: Causal diagrams for the problems considered; the causal effects of X on Y are needed. (a) U is unobserved. (b) Z has high dimensionality. (c) Causal diagram of an equivalent problem. (d) U is unobserved and independent of W.

A second problem considered in this paper is that of estimating causal effects when a sufficient set Z of confounders is fully observable (see Figure 1(b)) but has high dimensionality (e.g., Z has 1024 states). In such a case, a prohibitively large sample size would be required, which is generally recognized to be impractical. We propose a new framework that transforms the problem associated with Figure 1(b) into an equivalent problem associated with Figure 1(c), containing W and U, which have much smaller dimensionalities (e.g., W and U each have 32 states). We then estimate bounds on the causal effects of the equivalent problem and take the midpoints as the effect estimates. We demonstrate through a simulation that this method can deliver good estimates of the causal effects of the original problem.

2 Preliminaries & Related Works

In this section, we review the back-door and front-door criteria and their associated adjustment formulas [Pearl(1995)]. We use the causal diagrams in [Pearl(1995), Spirtes et al.(2000)Spirtes, Glymour, Scheines, and Heckerman, Pearl(2009), Koller and Friedman(2009)].

One key concept of a causal diagram is d-separation [Pearl(2014)].

Definition 1 (d-separation).

In a causal diagram G, a path p is blocked by a set of nodes Z if and only if

  1. p contains a chain of nodes A → B → C or a fork A ← B → C such that the middle node B is in Z (i.e., B is conditioned on), or

  2. p contains a collider A → B ← C such that the collision node B is not in Z, and no descendant of B is in Z.

If Z blocks every path between two nodes X and Y, then X and Y are d-separated conditional on Z, and thus are independent conditional on Z, denoted X ⊥⊥ Y | Z.

With the concept of d-separation in a causal diagram, Pearl proposed the back-door and front-door criteria as follows:

Definition 2 (Back-Door Criterion).

Given an ordered pair of variables (X, Y) in a directed acyclic graph G, a set of variables Z satisfies the back-door criterion relative to (X, Y) if no node in Z is a descendant of X, and Z blocks every path between X and Y that contains an arrow into X.

If a set of variables Z satisfies the back-door criterion for X and Y, the causal effects of X on Y are given by the adjustment formula:

P(y|do(x)) = \sum_{z} P(y|x,z)\, P(z). (1)
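As a concrete illustration of Equation 1, the adjustment sum can be sketched in a few lines of Python; the dictionary layout of the joint distribution and the function name below are our own illustrative choices, not part of the paper:

```python
# Back-door adjustment (Equation 1) for discrete variables.
# `joint` is a dict mapping (x, y, z) tuples to P(x, y, z); this
# representation is illustrative, not prescribed by the paper.

def backdoor_effect(joint, x, y, z_states):
    """Compute P(y | do(x)) = sum_z P(y | x, z) P(z)."""
    effect = 0.0
    for z in z_states:
        p_z = sum(p for (xv, yv, zv), p in joint.items() if zv == z)
        p_xz = sum(p for (xv, yv, zv), p in joint.items()
                   if xv == x and zv == z)
        if p_xz > 0:  # P(y | x, z) is undefined when P(x, z) = 0
            effect += (joint.get((x, y, z), 0.0) / p_xz) * p_z
    return effect
```

Running it on a joint distribution built from the CPTs of Table 3 (Section 5), for example, reproduces the causal effect 0.47 reported there.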
Definition 3 (Front-Door Criterion).

A set of variables Z is said to satisfy the front-door criterion relative to an ordered pair of variables (X, Y) if

  • Z intercepts all directed paths from X to Y;

  • there is no back-door path from X to Z; and

  • all back-door paths from Z to Y are blocked by X.

If a set of variables Z satisfies the front-door criterion for X and Y, and if P(x, Z) > 0, then the causal effects of X on Y are given by the adjustment formula:

P(y|do(x)) = \sum_{z} P(z|x) \sum_{x'} P(y|x',z)\, P(x'). (2)
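Equation 2 admits the same kind of sketch; again, the data layout and function names are our own illustrative assumptions:

```python
# Front-door adjustment (Equation 2) for discrete variables.
# `joint` maps (x, y, z) tuples to P(x, y, z); layout is illustrative.

def frontdoor_effect(joint, x, y, x_states, z_states):
    """Compute P(y | do(x)) = sum_z P(z | x) sum_{x'} P(y | x', z) P(x')."""
    def prob(pred):
        return sum(p for key, p in joint.items() if pred(key))

    p_x = prob(lambda k: k[0] == x)
    effect = 0.0
    for z in z_states:
        p_z_given_x = prob(lambda k: k[0] == x and k[2] == z) / p_x
        inner = 0.0
        for xp in x_states:
            p_xpz = prob(lambda k: k[0] == xp and k[2] == z)
            if p_xpz > 0:  # requires P(x, z) > 0, as in the criterion
                inner += (joint.get((xp, y, z), 0.0) / p_xpz) \
                         * prob(lambda k: k[0] == xp)
        effect += p_z_given_x * inner
    return effect
```

On any model that satisfies the front-door criterion (e.g., X ← U → Y with mediator X → Z → Y), this observational computation agrees with the interventional effect obtained directly from the model.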

The back-door and front-door criteria are two powerful tools for estimating causal effects; however, the causal effects are not identifiable if the set of adjustment variables Z is not fully observable. [Tian and Pearl(2000)] provided the most naive bounds on causal effects (Equation 3), which hold regardless of the causal diagram.

P(x,y) \leq P(y|do(x)) \leq 1 - P(x,y'). (3)

As the first contribution of this study, we obtain narrower bounds on the causal effects by leveraging another source of knowledge: a causal diagram behind the data, combined with measurements of a set W of covariates (the observable part of Z) and prior information about a set U (the unobservable part of Z). The bounds are the solutions to two non-linear optimization problems. We illustrate that the midpoints of the bounds are sufficient for estimating the causal effects.

Using this optimization method, our second contribution is a new framework for estimating causal effects when a set of fully observable adjustment variables Z has high dimensionality, without any assumption regarding the data-generating process. [Maathuis et al.(2009)Maathuis, Kalisch, Bühlmann, et al.] proposed a method for estimating causal effects when the number of covariates is larger than the sample size. However, it relies on several assumptions, including that the distribution of the covariates is multivariate normal; the method is of limited use when the distribution of the covariates is unknown or cannot be estimated accurately owing to the limited sample size.

3 Bounds on Causal Effects

In this section, we demonstrate how bounds on causal effects with partially observable back-door or front-door variables can be obtained through non-linear optimizations.

3.1 Partially Observable Back-Door Variables

Theorem 4.

Given a causal diagram G and a distribution compatible with G, let W ∪ U be a set of variables satisfying the back-door criterion in G relative to an ordered pair (X, Y), where W ∪ U is only partially observable, i.e., only the probabilities P(X, Y, W) and P(U) are given. The causal effects of X on Y are then bounded as follows:

LB \leq P(y|do(x)) \leq UB,

where LB is the solution to the non-linear optimization problem in Equation 4 and UB is the solution to the non-linear optimization problem in Equation 5:

LB = \min \sum_{w,u} \frac{a_{w,u}\, b_{w,u}}{c_{w,u}}, (4)

UB = \max \sum_{w,u} \frac{a_{w,u}\, b_{w,u}}{c_{w,u}}, (5)

subject to

\sum_{u} a_{w,u} = P(x,y,w), \quad \sum_{u} b_{w,u} = P(w), \quad \sum_{u} c_{w,u} = P(x,w) \quad for all w ∈ W;

and, for all w ∈ W and u ∈ U,

b_{w,u} \geq c_{w,u} \geq a_{w,u},
\max\{0,\, P(x,y,w) + P(u) - 1\} \leq a_{w,u} \leq \min\{P(x,y,w),\, P(u)\},
\max\{0,\, P(w) + P(u) - 1\} \leq b_{w,u} \leq \min\{P(w),\, P(u)\},
\max\{0,\, P(x,w) + P(u) - 1\} \leq c_{w,u} \leq \min\{P(x,w),\, P(u)\}.

3.2 Partially Observable Front-Door Variables

Theorem 5.

Given a causal diagram G and a distribution compatible with G, let W ∪ U be a set of variables satisfying the front-door criterion in G relative to an ordered pair (X, Y), where W ∪ U is only partially observable, i.e., only the probabilities P(X, Y, W) and P(U) are given and P(x, W, U) > 0. The causal effects of X on Y are then bounded as follows:

LB \leq P(y|do(x)) \leq UB,

where LB is the solution to the non-linear optimization problem in Equation 6 and UB is the solution to the non-linear optimization problem in Equation 7:

LB = \min \sum_{w,u} \frac{b_{x,w,u}}{P(x)} \sum_{x'} \frac{a_{x',w,u}\, P(x')}{b_{x',w,u}}, (6)

UB = \max \sum_{w,u} \frac{b_{x,w,u}}{P(x)} \sum_{x'} \frac{a_{x',w,u}\, P(x')}{b_{x',w,u}}, (7)

subject to

\sum_{u} a_{x,w,u} = P(x,y,w), \quad \sum_{u} b_{x,w,u} = P(x,w) \quad for all x ∈ X and w ∈ W;

and, for all x ∈ X, w ∈ W, and u ∈ U,

b_{x,w,u} \geq a_{x,w,u},
\max\{0,\, P(x,y,w) + P(u) - 1\} \leq a_{x,w,u} \leq \min\{P(x,y,w),\, P(u)\},
\max\{0,\, P(x,w) + P(u) - 1\} \leq b_{x,w,u} \leq \min\{P(x,w),\, P(u)\}.

Notably, if part of the observational data (e.g., P(U)) is unavailable in the above theorems, we can remove the corresponding constraints, and the remaining non-linear optimization problems still provide valid bounds on the causal effects. In general, the midpoints of the bounds are effective estimates of the causal effects. However, the lower (upper) bounds are also informative in their own right, as they can be interpreted as the minimal (maximal) causal effects. The proofs of Theorems 4 and 5 are provided in the appendix.

4 Example

Herein, we present a simulated example to demonstrate that the midpoints of the bounds on the causal effects given by Theorem 4 are adequate for estimating the causal effects.

4.1 Causal Effect of a Drug

Drug manufacturers want to know the causal effect of taking a drug on recovery, so they conduct an observational study in which the recovery rates of 700 patients were recorded. A total of 192 patients chose to take the drug and 508 patients did not. The results of the study are shown in Table 1. Blood type (type O or not) is not the only confounder of taking the drug and recovery; another confounder is age (below the age of 70 or not). The manufacturers have no data associated with age; they only know that 85.43% of people in their region are below the age of 70.

Because both age and blood type are confounders of taking the drug and recovery, and the data associated with age are unavailable, the causal effect is not identifiable.

Let X = x denote the event that a patient took the drug, and X = x' the event that a patient did not take the drug. Let Y = y denote the event that a patient recovered, and Y = y' the event that a patient did not recover. Let W = w represent a patient with blood type O, and W = w' a patient without blood type O. Let U = u represent a patient below the age of 70, and U = u' a patient aged 70 or above. The causal diagram is shown in Figure 1(d).

Table 1: Results of an observational study considering blood type.

                      Drug                              No Drug
  Blood type O        23 out of 36 recovered (63.9%)    145 out of 225 recovered (64.4%)
  Not blood type O    135 out of 156 recovered (86.5%)  152 out of 283 recovered (53.7%)
  Overall             158 out of 192 recovered (82.3%)  297 out of 508 recovered (58.5%)
Table 2: Informer view of the observational data considering blood type and age.

                                   Drug                              No Drug
  Blood type O, age below 70       3 out of 4 recovered (75.0%)      141 out of 219 recovered (64.4%)
  Blood type O, age above 70       20 out of 32 recovered (62.5%)    4 out of 6 recovered (66.7%)
  Not blood type O, age below 70   135 out of 151 recovered (89.4%)  117 out of 224 recovered (52.2%)
  Not blood type O, age above 70   0 out of 5 recovered (0.0%)       35 out of 59 recovered (59.3%)
  Overall                          158 out of 192 recovered (82.3%)  297 out of 508 recovered (58.5%)

One option for the manufacturers is to estimate the causal effect through the Tian-Pearl bounds in Equation 3 and the observational data from Table 1, where

P(x,y) = \sum_{w} P(y|x,w)\, P(x|w)\, P(w) = 0.2257,

1 - P(x,y') = 1 - \sum_{w} P(y'|x,w)\, P(x|w)\, P(w) = 0.9514.

Therefore, the bounds on the causal effect estimated using Equation 3 are 0.2257 ≤ P(y|do(x)) ≤ 0.9514, where neither the causal information about the covariate W nor the prior information P(U) is used. These bounds are not informative enough to pin down the actual causal effect. Although one might use the midpoint of the bounds (i.e., 0.5886) as an estimate, the gap between the bounds (i.e., 0.9514 - 0.2257 = 0.7257) is not small; hence, this point estimate is unconvincing.
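These two quantities can be checked mechanically from the counts in Table 1; the short script below (variable names are ours) reproduces them:

```python
# Tian-Pearl bounds (Equation 3) from the Table 1 counts.
n = 700.0
strata = {                # w -> (recovered on drug, took drug, stratum total)
    'O':     (23, 36, 261),
    'not O': (135, 156, 439),
}
# P(x, y) = sum_w P(y | x, w) P(x | w) P(w), and similarly for P(x, y').
p_xy = sum((rec / took) * (took / tot) * (tot / n)
           for rec, took, tot in strata.values())
p_xyp = sum((1 - rec / took) * (took / tot) * (tot / n)
            for rec, took, tot in strata.values())
lower, upper = p_xy, 1 - p_xyp   # 0.2257 <= P(y | do(x)) <= 0.9514
```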

Now consider the proposed bounds in Theorem 4 with the observational data from Table 1. W ∪ U satisfies the back-door criterion, and P(X, Y, W) and P(U) are available. Because W and U are binary, each objective function has 12 optimization variables. With the help of the "SLSQP" solver [Kraft(1988)] in the scipy package [SciPyCommunity(2020)], we obtain the bounds 0.4728 ≤ P(y|do(x)) ≤ 0.9514. The lower bound has increased substantially, to nearly 0.5, which can help in decision making. The midpoint is 0.7121, so we conclude that the causal effect of taking the drug on recovery is 0.7121. We show in the following section that this estimate is extremely close to the actual causal effect.

4.2 Informer View of the Causal Effect

An informer with access to the fully observed data, as summarized in Table 2 (note that although the data in Table 2 can be verified to be compatible with those in Table 1, we will never know these numbers in reality), could easily calculate the causal effect of taking the drug on recovery using the adjustment formula in Equation 1 (shown in Equation 8). The error of the estimate obtained through Theorem 4 is only (0.7518 - 0.7121)/0.7518 ≈ 5.28%.

P(y|do(x)) = \sum_{w,u} P(y|x,w,u)\, P(w,u) = 0.7518. (8)

4.3 Simulation Results

Here, we further illustrate, through a random simulation, that the midpoints of the proposed bounds on the causal effects are sufficient for estimating the causal effects, and that the midpoints of the proposed bounds in Theorem 4 are better estimates than the midpoints of the Tian-Pearl bounds in Equation 3.

We employ the simplest causal diagram, that of Figure 1(a), with binary W and U, such that W ∪ U satisfies the back-door criterion. We randomly generated 1000 sample distributions compatible with the causal diagram (the algorithm for generating the sample distributions is given in the appendix). The average gap (upper bound minus lower bound) of the Tian-Pearl bounds among the 1000 samples is 0.487, and the average gap of the proposed bounds is 0.383. We then randomly picked 100 of the 1000 sample distributions to plot the actual causal effects, the midpoints of the Tian-Pearl bounds, and the midpoints of the proposed bounds. The results are shown in Figure 2(a).

Figure 2: Bounds on the causal effects of 100 sample distributions, where the Tian-Pearl bounds are obtained through Equation 3 and the proposed bounds are obtained through Theorem 4. (a) Estimates of causal effects with partially observed confounders. (b) Estimates of causal effects with high-dimensional data.

From Figure 2(a), although the midpoints of both sets of bounds are good estimates of the actual causal effects, the midpoints of the proposed bounds are much closer to the actual causal effects, particularly when the causal effects are close to 0 or 1. The average gap of the proposed bounds among the 1000 samples, 0.383, is much smaller than that of the Tian-Pearl bounds, 0.487. This means that the midpoints of the proposed bounds are more convincing, because the bounds are narrower.

5 Application to High Dimensionality of Adjustment Variables

Consider the problem of estimating the causal effects of X on Y when a sufficient set Z, satisfying the back-door or front-door criterion, is fully observable in a causal diagram G (e.g., see Figure 1(b)) but has high dimensionality (e.g., Z has 1024 states). A prohibitively large sample size would then be required to estimate the causal effects, which is generally recognized to be impractical. Herein, we propose a new framework to achieve dimensionality reduction.

5.1 Equivalent Causal Diagram with Observational Data

Definition 6 (Equivalent causal diagram with observational data).

Let G and G' be causal diagrams both containing the nodes X and Y, let O be observational data compatible with G, and let O' be observational data compatible with G'. We say that (G, O) is equivalent to (G', O') if the causal effects of X on Y in (G, O) are equal to the causal effects of X on Y in (G', O').

Such an equivalent tuple (G', O') is easy to obtain. We can simply add two new nodes, W and U, and remove the node Z in G to obtain G'. Let the arrows entering Z in G now enter both W and U in G', and let the arrows exiting Z in G now exit both W and U in G'. Finally, add an arrow from U to W. It is easy to show that (G, O) and (G', O') are equivalent if the states of Z are the Cartesian product of the states of W and the states of U. Formally, we have the following theorem (the proof is provided in the appendix).

Theorem 7.

Let G be a causal diagram containing nodes {V_1, ..., V_{n-3}, X, Y, Z}, and let O be any observational data compatible with G. Suppose there exists a set of variables that satisfies the back-door or front-door criterion relative to (X, Y) in G. Then (G, O) is equivalent to (G', O'), where G' contains nodes {V_1, ..., V_{n-3}, X, Y, W, U}, O' is observational data compatible with G', the number of states of W times the number of states of U is equal to the number of states of Z, and the structure of G' and the observational data O' are obtained as follows.

Structure of G':
Let Parents_G(H) denote the parents of H in causal diagram G. Then
Parents_G'(U) = Parents_G(Z),
Parents_G'(W) = Parents_G(Z) ∪ {U},
Parents_G'(H) = Parents_G(H) if Z ∉ Parents_G(H), for H ∈ {V_1, ..., V_{n-3}, X, Y},
Parents_G'(H) = Parents_G(H) \ {Z} ∪ {W, U} if Z ∈ Parents_G(H), for H ∈ {V_1, ..., V_{n-3}, X, Y}.

Note that if Q is a set of variables in G that satisfies the back-door or front-door criterion relative to (X, Y), then Q' satisfies the same criterion relative to (X, Y) in G', where
Q' = Q if Z ∉ Q,
Q' = Q \ {Z} ∪ {W, U} if Z ∈ Q.

Observational data:
Let p be the number of states of W and q the number of states of U. The states of Z are the Cartesian product of the states of W and the states of U. In detail, (w_j, u_k) is equivalent to z_{(j-1)q+k}, w_j is equivalent to ∨_{k=1}^{q} (w_j, u_k) = ∨_{k=1}^{q} z_{(j-1)q+k}, and u_k is equivalent to ∨_{j=1}^{p} (w_j, u_k) = ∨_{j=1}^{p} z_{(j-1)q+k}; that is, P(w_j, u_k, V) = P(z_{(j-1)q+k}, V) for any V ⊆ {V_1, ..., V_{n-3}, X, Y}.
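The Cartesian-product bookkeeping in Theorem 7 reduces to simple index arithmetic; a small sketch with 1-indexed states as in the theorem (the function names are ours):

```python
# State mapping of Theorem 7: (w_j, u_k) <-> z_{(j-1)*q + k},
# with p states for W and q states for U, all 1-indexed.

def z_of(j, k, q):
    """Index of the Z state that encodes (w_j, u_k)."""
    return (j - 1) * q + k

def wu_of(i, q):
    """Inverse map: Z state z_i -> (j, k)."""
    j, k = divmod(i - 1, q)
    return j + 1, k + 1

def p_w(j, p_z, q):
    """P(w_j) = sum_k P(z_{(j-1)q+k}); p_z maps state index -> probability."""
    return sum(p_z[z_of(j, k, q)] for k in range(1, q + 1))
```

The same aggregation gives P(u_k) by summing over j, which is how the reduced observational data P(X, Y, W) and P(U) are read off from P(X, Y, Z).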

For example, consider the causal diagram in Figure 1(b) and the observational data (in the form of conditional probability tables (CPTs), where X and Y are binary and Z has 4 states) in Table 3. The causal effect P(y|do(x)), computed through the adjustment formula in Equation 1, is 0.47. Based on the construction in Theorem 7 (see the appendix for details), the causal diagram in Figure 1(b) with the observational data in Table 3 is equivalent to the causal diagram in Figure 1(c) with the observational data in Table 4 (all nodes binary), and we can verify that the causal effect P(y|do(x)) in the causal diagram in Figure 1(c) with the observational data in Table 4 is also 0.47.

Table 3: Observational data in CPTs compatible with the causal diagram in Figure 1(b).

  P(z1) = 0.3    P(x|z1) = 0.1    P(y|x,z1) = 0.2    P(y|x',z1) = 0.3
  P(z2) = 0.2    P(x|z2) = 0.4    P(y|x,z2) = 0.7    P(y|x',z2) = 0.1
  P(z3) = 0.2    P(x|z3) = 0.5    P(y|x,z3) = 0.6    P(y|x',z3) = 0.5
  P(z4) = 0.3    P(x|z4) = 0.7    P(y|x,z4) = 0.5    P(y|x',z4) = 0.4
Table 4: Observational data in CPTs compatible with the causal diagram in Figure 1(c).

  P(u) = 0.5       P(x|u,w) = 0.1       P(y|x,u,w) = 0.2       P(y|x',u,w) = 0.3
  P(w|u) = 0.6     P(x|u,w') = 0.4      P(y|x,u,w') = 0.7      P(y|x',u,w') = 0.1
  P(w|u') = 0.4    P(x|u',w) = 0.5      P(y|x,u',w) = 0.6      P(y|x',u',w) = 0.5
                   P(x|u',w') = 0.7     P(y|x,u',w') = 0.5     P(y|x',u',w') = 0.4

Notably, the equivalent tuple is not unique, and equivalence is transitive (i.e., if (G, O) is equivalent to (G', O'), and (G', O') is equivalent to (G'', O''), then (G, O) is equivalent to (G'', O'')).

5.2 Dimensionality Reduction

Now consider the problem posed at the beginning of Section 5. First, we transform the causal diagram G with compatible observational data O into an equivalent tuple (G', O') using Algorithm 1, which is based on the construction in Theorem 7 (note that the algorithm only constructs the structure of G' and assigns the meaning of the states of W and U; the corresponding observational data O' are then easy to obtain). The new problem (G', O') has the same causal effects of X on Y as (G, O). By picking the dimensionality of W (p in Algorithm 1), we can control the dimensionality of the new problem.

Note that if Z = (Z_1, Z_2, ..., Z_m) in G is a set of variables, we can apply Algorithm 1 to each variable in Z and finally obtain W = (W_1, W_2, ..., W_m) and U = (U_1, U_2, ..., U_m), where the product of the numbers of states of the W_i is equal to p.

We then treat the new problem (G', O') as a partially observable back-door or front-door variables problem, as in Sections 3.1 and 3.2, where P(X, Y, W) and P(U) are given, and we obtain bounds on the causal effects through Theorems 4 and 5. We claim that the midpoints of these bounds are good estimates of the original causal effects. In addition, the bounds themselves can help in decision making.

5.3 Example

Consider the problem in Figure 1(b), where X and Y are binary and Z has 256 states. We randomly generated a distribution P(X, Y, Z) compatible with the causal diagram (the algorithm for generating the distribution is given in the appendix). Because we know the exact distribution, we can easily obtain the causal effect through Equation 1: P(y|do(x)) = 0.5527.

Now, we transform the causal diagram with the observational data into an equivalent tuple (G', O') (G' is shown in Figure 1(c)) using Algorithm 1 with p = 16. We obtain a variable W with 16 states and a variable U with 16 states in G' ((w_j, u_k) is equivalent to z_{(j-1)*16+k}). We then restrict ourselves to the observational data P(X, Y, W) and P(U) (their construction is given in the appendix), and based on Theorem 4, with the "SLSQP" solver, we obtain the bounds 0.4595 ≤ P(y|do(x)) ≤ 0.7012. The midpoint, 0.5804, is extremely close to the actual causal effect, 0.5527.

input : An n-node causal diagram G over (X_1, X_2, ..., X_{n-3}, X, Y, Z) and compatible observational data O;
p, the number of states of W in G' of the equivalent tuple (G', O').
output : An (n+1)-node causal diagram G' over (X_1, X_2, ..., X_{n-3}, X, Y, W, U);
Mapping relation M_1: states of W → states of Z;
Mapping relation M_2: states of U → states of Z.
begin
       m ← num_states_in_G(Z);
       if m mod p = 0 then
             q ← m / p;
       else
             q ← ⌊m / p⌋ + 1;
       end if
       // Pad Z with virtual states whose probability is 0.
       num_states_in_G(Z) ← p × q;
       for H in {X_1, ..., X_{n-3}, X, Y} do
             num_states_in_G'(H) ← num_states_in_G(H);
             if Z ∈ Parents_in_G(H) then
                   Parents_in_G'(H) ← Parents_in_G(H) \ {Z} ∪ {W, U};
             else
                   Parents_in_G'(H) ← Parents_in_G(H);
             end if
       end for
       num_states_in_G'(W) ← p;
       num_states_in_G'(U) ← q;
       Parents_in_G'(W) ← Parents_in_G(Z) ∪ {U};
       Parents_in_G'(U) ← Parents_in_G(Z);
       for i ← 1 to p do
             M_1(w_i) ← ∨_{k=1}^{q} z_{(i-1)q+k};
       end for
       for i ← 1 to q do
             M_2(u_i) ← ∨_{j=1}^{p} z_{(j-1)q+i};
       end for
end
Algorithm 1: Generate Equivalent Tuple

Finally, let us consider how many samples each method requires. According to [Roscoe(1975)], each state needs at least 30 samples; therefore, the exact solution by Equation 1 requires 2 × 2 × 256 × 30 = 30720 samples, whereas the proposed bounds based on Theorem 4 require only max(2 × 2 × 16, 16) × 30 = 1920 samples. If this sample size is still unacceptable, we can use another equivalent tuple in which W has 8 states and U has 32 states; we then require only max(2 × 2 × 8, 32) × 30 = 960 samples to obtain the bounds on the causal effects.
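This sample-size arithmetic can be captured in a few lines; the helper below (our own naming) simply multiplies state counts by the 30-samples-per-state rule of thumb from [Roscoe(1975)]:

```python
# Rule-of-thumb sample sizes: at least 30 samples per state.
PER_STATE = 30

def samples_needed(*state_counts):
    """Samples needed to estimate a distribution over the given variables."""
    n = 1
    for s in state_counts:
        n *= s
    return n * PER_STATE

exact_adjustment = samples_needed(2, 2, 256)                      # P(X, Y, Z)
bounds_16_16 = max(samples_needed(2, 2, 16), samples_needed(16))  # P(X,Y,W), P(U)
bounds_8_32 = max(samples_needed(2, 2, 8), samples_needed(32))
```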

5.4 Simulation Results

As in the previous simulation, we further illustrate that the bounds on the causal effects produced by the proposed framework are sufficient for estimating the original causal effects.

Once again, we employ the simplest causal diagram, that of Figure 1(b), where X and Y are binary and Z has 256 states. We randomly generated 100 sample distributions compatible with the causal diagram (the algorithm for generating the distributions is given in the appendix). The average gap (upper bound minus lower bound) of the Tian-Pearl bounds among the 100 samples is 0.5102, and the average gap of the proposed bounds, through Theorems 7 and 4, is 0.0676. We then plotted the actual causal effects, the midpoints of the Tian-Pearl bounds, and the midpoints of the proposed bounds through Theorems 7 and 4. The results are shown in Figure 2(b).

From Figure 2(b), the midpoints of both sets of bounds are good estimates of the actual causal effects, while the midpoints of the proposed bounds are slightly closer to the actual causal effects, particularly when the causal effects are close to 0 or 1. Although the trend of the Tian-Pearl bounds also follows the actual causal effects, the Tian-Pearl bounds are more likely to run parallel to the x-axis. The Tian-Pearl bounds perform well here because, in high-dimensional cases, the randomly generated distributions are likely to yield causal effects of approximately 0.5. However, the average gap of the proposed bounds among the 100 samples, 0.0676, is much smaller than that of the Tian-Pearl bounds, 0.5102. This means that the midpoints of the proposed bounds are more convincing, because the bounds are narrower.

6 Discussion

Here, we discuss additional features of bounds on causal effects.

First, if a whole set of back-door or front-door variables is unobserved, the causal effects have only the naive bounds in Equation 3. As the back-door or front-door variables become gradually observed, the bounds on the causal effects become increasingly narrow. Finally, when the back-door or front-door variables are fully observed, the bounds shrink to point estimates, and the causal effects are identifiable. This also tells us that, when we pick p in Algorithm 1, we should pick the largest p for which the sample size is sufficient to estimate the observational distributions.

Next, the bounds in Theorems 4 and 5 are given by non-linear optimizations; therefore, the quality of the bounds also depends on the optimization solver. The examples and simulation results in this paper were all obtained with the basic "SLSQP" solver [Kraft(1988)] provided by SciPy [SciPyCommunity(2020)]. The quality of the bounds can be improved if more advanced solvers are applied. Inspired by Balke's linear programming [Balke and Pearl(1997b)], we may also obtain parametric solutions to the non-linear optimizations in Theorems 4 and 5, in which case no non-linear optimization solver is needed. However, the choice of non-linear optimization solver is beyond the scope of this paper.
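As a concrete illustration, the optimization in Theorem 4 can be set up directly with SciPy's SLSQP solver. The sketch below uses our own variable names; the observed quantities are computed from the full specification in Appendix E, under which the true causal effect is 0.47. Since SLSQP is a local solver, the resulting bounds are approximate:

```python
import numpy as np
from scipy.optimize import minimize

# Observed quantities from the Appendix E example (W and U binary):
# P(x,y,w), P(w), P(x,w) per state of W, and the prior P(u).
Pxyw = np.array([0.066, 0.161])   # P(x,y,w), P(x,y,w')
Pw   = np.array([0.5, 0.5])       # P(w), P(w')
Pxw  = np.array([0.13, 0.29])     # P(x,w), P(x,w')
Pu   = np.array([0.5, 0.5])       # P(u), P(u')

nW, nU = len(Pw), len(Pu)
n = nW * nU  # decision vector is [a, b, c], each an nW x nU block

def blocks(v):
    return tuple(v[i * n:(i + 1) * n].reshape(nW, nU) for i in range(3))

def objective(v):
    a, b, c = blocks(v)
    return np.sum(a * b / np.maximum(c, 1e-12))  # guard against c = 0

cons = []
# Equality constraints: marginalizing out U recovers the observed terms.
for w in range(nW):
    for i, t in enumerate((Pxyw, Pw, Pxw)):
        cons.append({'type': 'eq',
                     'fun': lambda v, w=w, i=i, t=t: blocks(v)[i][w].sum() - t[w]})
# Inequality constraints: b_{w,u} >= c_{w,u} >= a_{w,u}.
cons.append({'type': 'ineq', 'fun': lambda v: (blocks(v)[1] - blocks(v)[2]).ravel()})
cons.append({'type': 'ineq', 'fun': lambda v: (blocks(v)[2] - blocks(v)[0]).ravel()})

# Frechet-style box constraints on each entry.
bnds = [(max(0.0, t[w] + Pu[u] - 1.0), min(t[w], Pu[u]))
        for t in (Pxyw, Pw, Pxw) for w in range(nW) for u in range(nU)]

# A feasible start: split each observed term proportionally to P(u).
x0 = np.concatenate([np.outer(t, Pu).ravel() for t in (Pxyw, Pw, Pxw)])

lb = minimize(objective, x0, method='SLSQP', bounds=bnds, constraints=cons).fun
ub = -minimize(lambda v: -objective(v), x0, method='SLSQP',
               bounds=bnds, constraints=cons).fun
print(lb, ub)  # the causal effect computed from Appendix E is 0.47
```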

In addition, the constraints in Theorems 4 and 5 are based only on the basic back-door or front-door criterion. We can also add constraints encoding the independencies of a specific graph. For instance, W and U are independent in the causal diagram of Figure 1(d); we can then add constraints reflecting that P(W,U)=P(W)P(U). The more constraints we add to the optimizations, the narrower the bounds we can obtain.

Moreover, if one believes that the sample size is sufficient to estimate causal effects with high-dimensional adjustment variables, the framework in Section 5 can serve as evidence for validating whether the sample size is indeed sufficient.

Next, in Section 5, we transformed (G,O) into (G^{\prime},O^{\prime}) to obtain bounds on causal effects with high-dimensional adjustment variables. However, for a tuple (G,O), multiple equivalent tuples exist, obtained by picking different p in Algorithm 1, and each equivalent tuple yields bounds on the original causal effects. We can compute bounds for as many equivalent tuples as we want and take the maximum of the lower bounds and the minimum of the upper bounds.
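This combination rule can be sketched as follows (the bound pairs below are illustrative values, not results from the paper):

```python
# Combine bounds from several equivalent tuples: each tuple yields a valid
# (LB, UB) pair for the same causal effect, so the tightest combined
# interval is (max of lower bounds, min of upper bounds).
def combine_bounds(bounds):
    lbs, ubs = zip(*bounds)
    return max(lbs), min(ubs)

# Hypothetical bounds from three equivalent tuples.
print(combine_bounds([(0.31, 0.52), (0.35, 0.58), (0.28, 0.49)]))  # (0.35, 0.49)
```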

Finally, based on numerous experiments, we observed that when P(U) or P(W) is extreme (i.e., close to 0 or 1), the proposed bounds are almost identified (i.e., the bounds shrink toward point estimates). Therefore, in practice, we should always pick an equivalent tuple to transform in which P(U) or P(W) is close to 0 or 1.

7 Conclusion

We demonstrated how to estimate causal effects when adjustment variables in the back-door or front-door criterion are partially observable, by bounding the causal effects using solutions to non-linear optimizations. We provided examples and simulation results illustrating that the proposed method is sufficient for estimating the causal effects. We also proposed a framework for estimating causal effects when the adjustment variables have high dimensionality. In summary, we analyzed and demonstrated how causal effects can be estimated in practice using a causal diagram.

References

  • [Balke and Pearl(1997a)] Alexander Balke and Judea Pearl. Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439):1171–1176, 1997a.
  • [Balke and Pearl(1997b)] Alexander A Balke and Judea Pearl. Probabilistic counterfactuals: Semantics, computation, and applications. Technical report, UCLA Dept. of Computer Science, 1997b.
  • [Cai et al.(2008)Cai, Kuroki, Pearl, and Tian] Zhihong Cai, Manabu Kuroki, Judea Pearl, and Jin Tian. Bounds on direct effect in the presence of confounded intermediate variables. Biometrics, 64:695–701, 2008.
  • [Koller and Friedman(2009)] Daphne Koller and Nir Friedman. Probabilistic graphical models: Principles and techniques. MIT press, 2009.
  • [Kraft(1988)] Dieter Kraft. A software package for sequential quadratic programming. Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt Köln: Forschungsbericht. Wiss. Berichtswesen d. DFVLR, 1988. URL https://books.google.com/books?id=4rKaGwAACAAJ.
  • [Li and Pearl(2019)] Ang Li and Judea Pearl. Unit selection based on counterfactual logic. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 1793–1799. AAAI Press, 2019.
  • [Maathuis et al.(2009)Maathuis, Kalisch, Bühlmann, et al.] Marloes H Maathuis, Markus Kalisch, Peter Bühlmann, et al. Estimating high-dimensional intervention effects from observational data. The Annals of Statistics, 37(6A):3133–3164, 2009.
  • [Pearl(1995)] Judea Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–688, 1995.
  • [Pearl(2009)] Judea Pearl. Causality. Cambridge university press, 2nd edition, 2009.
  • [Pearl(2014)] Judea Pearl. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann, 2014.
  • [Roscoe(1975)] John T. Roscoe. Fundamental research statistics for the behavioral sciences. Number v. 2 in Editors’ Series in Marketing. Holt, Rinehart and Winston, 1975. ISBN 9780030919343. URL https://books.google.com/books?id=Fe8vAAAAMAAJ.
  • [SciPyCommunity(2020)] SciPyCommunity. Scipy reference guide, 2020. URL https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html#rdd2e1855725e-12.
  • [Spirtes et al.(2000)Spirtes, Glymour, Scheines, and Heckerman] Peter Spirtes, Clark N Glymour, Richard Scheines, and David Heckerman. Causation, prediction, and search. MIT press, 2000.
  • [Tian and Pearl(2000)] Jin Tian and Judea Pearl. Probabilities of causation: Bounds and identification. Annals of Mathematics and Artificial Intelligence, 28(1-4):287–313, 2000.

Appendix A Proof of Theorem 4

Theorem 4.

Given a causal diagram G and a distribution compatible with G, let W\cup U be a set of variables satisfying the back-door criterion in G relative to an ordered pair (X,Y), where W\cup U is partially observable, i.e., only the probabilities P(X,Y,W) and P(U) are given. The causal effects of X on Y are then bounded as follows:

\displaystyle\text{LB}\leq P(y|do(x))\leq\text{UB}

where LB is the solution to the non-linear optimization problem in Equation 9 and UB is the solution to the non-linear optimization problem in Equation 10.

\displaystyle LB=\min\sum_{w,u}\frac{a_{w,u}b_{w,u}}{c_{w,u}}, (9)
\displaystyle UB=\max\sum_{w,u}\frac{a_{w,u}b_{w,u}}{c_{w,u}}, (10)
where,
\displaystyle\sum_{u}a_{w,u}=P(x,y,w),\sum_{u}b_{w,u}=P(w),\sum_{u}c_{w,u}=P(x,w)\text{ for all }w\in W;
\displaystyle\text{and for all }w\in W\text{ and }u\in U,
\displaystyle b_{w,u}\geq c_{w,u}\geq a_{w,u},
\displaystyle\max\{0,P(x,y,w)+P(u)-1\}\leq a_{w,u}\leq\min\{P(x,y,w),P(u)\},
\displaystyle\max\{0,P(w)+P(u)-1\}\leq b_{w,u}\leq\min\{P(w),P(u)\},
\displaystyle\max\{0,P(x,w)+P(u)-1\}\leq c_{w,u}\leq\min\{P(x,w),P(u)\}.
Proof.

To show that LB and UB bound the actual causal effects, we only need to show that there exists a point in the feasible space of the non-linear optimization at which \sum_{w,u}\frac{a_{w,u}b_{w,u}}{c_{w,u}} is equal to the actual causal effects.
Since W\cup U satisfies the back-door criterion, by the adjustment formula in Equation 1, we have,

\displaystyle P(y|do(x))=\sum_{w,u}P(y|x,w,u)P(w,u)=\sum_{w,u}\frac{P(x,y,w,u)P(w,u)}{P(x,w,u)}

Let

\displaystyle a_{w,u}=P(x,y,w,u)
\displaystyle b_{w,u}=P(w,u)
\displaystyle c_{w,u}=P(x,w,u)

We now show that the above set of a_{w,u},b_{w,u},c_{w,u} is in the feasible space.
We have,

\displaystyle\text{for all }w\in W,
\displaystyle\sum_{u}a_{w,u}=\sum_{u}P(x,y,w,u)=P(x,y,w)
\displaystyle\sum_{u}b_{w,u}=\sum_{u}P(w,u)=P(w)
\displaystyle\sum_{u}c_{w,u}=\sum_{u}P(x,w,u)=P(x,w)

and,

\displaystyle\text{for all }w\in W\text{ and }u\in U,
\displaystyle b_{w,u}=P(w,u)\geq P(x,w,u)=c_{w,u}
\displaystyle c_{w,u}=P(x,w,u)\geq P(x,y,w,u)=a_{w,u}
\displaystyle a_{w,u}=P(x,y,w,u)\leq\min\{P(x,y,w),P(u)\}
\displaystyle b_{w,u}=P(w,u)\leq\min\{P(w),P(u)\}
\displaystyle c_{w,u}=P(x,w,u)\leq\min\{P(x,w),P(u)\}
\displaystyle a_{w,u}=P(x,y,w,u)\geq\max\{0,P(x,y,w)+P(u)-1\}
\displaystyle b_{w,u}=P(w,u)\geq\max\{0,P(w)+P(u)-1\}
\displaystyle c_{w,u}=P(x,w,u)\geq\max\{0,P(x,w)+P(u)-1\}

Therefore, the above set of a_{w,u},b_{w,u},c_{w,u} is in the feasible space, and thus, LB and UB bound the actual causal effects. ∎

Appendix B Proof of Theorem 5

Theorem 5.

Given a causal diagram G and a distribution compatible with G, let W\cup U be a set of variables satisfying the front-door criterion in G relative to an ordered pair (X,Y), where W\cup U is partially observable, i.e., only the probabilities P(X,Y,W) and P(U) are given and P(x,W,U)>0. The causal effects of X on Y are then bounded as follows:

\displaystyle\text{LB}\leq P(y|do(x))\leq\text{UB}

where LB is the solution to the non-linear optimization problem in Equation 11 and UB is the solution to the non-linear optimization problem in Equation 12.

\displaystyle LB=\min\sum_{w,u}\frac{b_{x,w,u}}{P(x)}\sum_{x^{\prime}}\frac{a_{x^{\prime},w,u}P(x^{\prime})}{b_{x^{\prime},w,u}}, (11)
\displaystyle UB=\max\sum_{w,u}\frac{b_{x,w,u}}{P(x)}\sum_{x^{\prime}}\frac{a_{x^{\prime},w,u}P(x^{\prime})}{b_{x^{\prime},w,u}}, (12)
where,
\displaystyle\sum_{u}a_{x,w,u}=P(x,y,w),\sum_{u}b_{x,w,u}=P(x,w)\text{ for all }x\in X\text{ and }w\in W;
\displaystyle\text{and for all }x\in X\text{, }w\in W\text{, and }u\in U,
\displaystyle b_{x,w,u}\geq a_{x,w,u},
\displaystyle\max\{0,P(x,y,w)+P(u)-1\}\leq a_{x,w,u}\leq\min\{P(x,y,w),P(u)\},
\displaystyle\max\{0,P(x,w)+P(u)-1\}\leq b_{x,w,u}\leq\min\{P(x,w),P(u)\}.
Proof.

To show that LB and UB bound the actual causal effects, we only need to show that there exists a point in the feasible space of the non-linear optimization at which \sum_{w,u}\frac{b_{x,w,u}}{P(x)}\sum_{x^{\prime}}\frac{a_{x^{\prime},w,u}P(x^{\prime})}{b_{x^{\prime},w,u}} is equal to the actual causal effects.
Since W\cup U satisfies the front-door criterion and P(x,W,U)>0, by the adjustment formula in Equation 2, we have,

\displaystyle P(y|do(x))=\sum_{w,u}P(w,u|x)\sum_{x^{\prime}}P(y|x^{\prime},w,u)P(x^{\prime})=\sum_{w,u}\frac{P(x,w,u)}{P(x)}\sum_{x^{\prime}}\frac{P(x^{\prime},y,w,u)P(x^{\prime})}{P(x^{\prime},w,u)}

Let

\displaystyle a_{x,w,u}=P(x,y,w,u)
\displaystyle b_{x,w,u}=P(x,w,u)

Similarly to the proof of Theorem 4, it is easy to show that the above set of a_{x,w,u},b_{x,w,u} is in the feasible space, and therefore, LB and UB bound the actual causal effects. ∎

Appendix C Proof of Theorem 7

Theorem 7.

Let G be a causal diagram containing nodes \{V_{1},...,V_{n-3},X,Y,Z\}. Let O be any observational data compatible with G. Suppose there exists a set of variables that satisfies the back-door or front-door criterion relative to (X,Y) in G. Then, (G,O) is equivalent to (G^{\prime},O^{\prime}) (G^{\prime} contains nodes \{V_{1},...,V_{n-3},X,Y,W,U\}; O^{\prime} is observational data compatible with G^{\prime}), where the number of states in W times the number of states in U is equal to the number of states in Z, and the structure of G^{\prime} and the observational data O^{\prime} are obtained as follows:

Structure of G^{\prime}:
Let Parents_{G}(H) be the parents of H in causal diagram G.
Parents_{G^{\prime}}(U)=Parents_{G}(Z), Parents_{G^{\prime}}(W)=Parents_{G}(Z)\cup\{U\},
Parents_{G^{\prime}}(H)=Parents_{G}(H) if Z\notin Parents_{G}(H) for H\in\{V_{1},...,V_{n-3},X,Y\},
Parents_{G^{\prime}}(H)=Parents_{G}(H)\setminus\{Z\}\cup\{W,U\}
                           if Z\in Parents_{G}(H) for H\in\{V_{1},...,V_{n-3},X,Y\}.

Note that, letting Q be the set of variables in G that satisfies the back-door or front-door criterion relative to (X,Y), Q^{\prime} satisfies the back-door or front-door criterion relative to (X,Y) in G^{\prime}, where
Q^{\prime}=Q if Z\notin Q,
Q^{\prime}=Q\setminus\{Z\}\cup\{W,U\} if Z\in Q.

Observational data:
Let the number of states in W be p, and let the number of states in U be q.
The states of Z are the Cartesian product of the states of W and the states of U.
In detail, (w_{j},u_{k}) is equivalent to z_{(j-1)*q+k}, w_{j} is equivalent to \lor_{k=1}^{q}(w_{j},u_{k})=\lor_{k=1}^{q}z_{(j-1)*q+k}, and u_{k} is equivalent to \lor_{j=1}^{p}(w_{j},u_{k})=\lor_{j=1}^{p}z_{(j-1)*q+k}, i.e., P(w_{j},u_{k},V)=P(z_{(j-1)*q+k},V) for any V\subseteq\{V_{1},...,V_{n-3},X,Y\}.

Proof.

First, we show that Q^{\prime} satisfies the back-door or front-door criterion relative to (X,Y) in G^{\prime}.

If Q satisfies the back-door criterion relative to (X,Y) in G, we need to show that,

  • no node in Q^{\prime} is a descendant of X;

  • Q^{\prime} blocks every path between X and Y that contains an arrow into X.

It is easy to show that if a node in Q^{\prime} is a descendant of X in G^{\prime}, then the corresponding node in Q is a descendant of X in G; and if a path between X and Y that contains an arrow into X is not blocked by Q^{\prime} in G^{\prime}, then the corresponding path between X and Y that contains an arrow into X is not blocked by Q in G. Thus, Q^{\prime} satisfies the back-door criterion relative to (X,Y) in G^{\prime}. Similarly, we can show that if Q satisfies the front-door criterion relative to (X,Y) in G, then Q^{\prime} satisfies the front-door criterion relative to (X,Y) in G^{\prime}.

Now, we show that (G,O) is equivalent to (G^{\prime},O^{\prime}), i.e., that P(y|do(x)) is the same under (G,O) and (G^{\prime},O^{\prime}). Suppose Q satisfies the back-door criterion relative to (X,Y) in G. By the adjustment formula in Equation 1, we have,
P(y|do(x))=\sum_{q\in Q}P(y|x,q)\times P(q)=\sum_{q\in Q}\frac{P(x,y,q)\times P(q)}{P(x,q)}.
And in G^{\prime},
P(y|do(x))=\sum_{q\in Q^{\prime}}P(y|x,q)\times P(q)=\sum_{q\in Q^{\prime}}\frac{P(x,y,q)\times P(q)}{P(x,q)}.
These two causal effects are obviously the same, because P(w_{j},u_{k},V)=P(z_{(j-1)*q+k},V) for any V\subseteq\{V_{1},...,V_{n-3},X,Y\}.
Similarly, we can show that if Q satisfies the front-door criterion relative to (X,Y) in G, then (G,O) is equivalent to (G^{\prime},O^{\prime}). ∎
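The Cartesian-product state mapping in Theorem 7 can be sketched as follows (1-based indices, as in the theorem; the helper names are ours):

```python
# With W having p states and U having q states, the pair (w_j, u_k)
# corresponds to the Z state z_{(j-1)*q + k}.
def z_index(j, k, q):
    """Map the pair (w_j, u_k) to the index of the equivalent Z state."""
    return (j - 1) * q + k

def wu_pair(i, q):
    """Inverse map: recover (j, k) from the Z-state index i."""
    return ((i - 1) // q + 1, (i - 1) % q + 1)

p, q = 16, 16
# Round trip over all p*q = 256 states of Z.
assert all(z_index(*wu_pair(i, q), q) == i for i in range(1, p * q + 1))
print(wu_pair(z_index(3, 5, q), q))  # -> (3, 5)
```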

Appendix D Simulation Algorithm for Generating Sample Distributions in Sections 4.3, 5.3, and 5.4

input : n causal diagram nodes (X_{1},...,X_{n})
        Distribution D
output : n conditional probability tables for P(X_{i}|Parents(X_{i}))
begin
    for i \leftarrow 1 to n do
        s \leftarrow num-instantiates(X_{i})
        p \leftarrow num-instantiates(Parents(X_{i}))
        for k \leftarrow 1 to p do
            sum \leftarrow 0
            for j \leftarrow 1 to s do
                a_{j} \leftarrow sample(D)
                sum \leftarrow sum + a_{j}
            end for
            for j \leftarrow 1 to s do
                P(x_{i_{j}}|Parents(X_{i})_{k}) \leftarrow a_{j}/sum
            end for
        end for
    end for
end
Algorithm 2 Generate-cpt()

In our simulation studies, we set D in Algorithm 2 to the uniform distribution.
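A minimal Python sketch of Algorithm 2, assuming D is the uniform distribution on [0,1] as in our simulations (function names are ours):

```python
import random

def generate_cpt(num_states, num_parent_states, sample=random.random):
    """Return a CPT as a list of rows, one per parent instantiation;
    each row is a normalized distribution over the node's states."""
    cpt = []
    for _ in range(num_parent_states):
        raw = [sample() for _ in range(num_states)]  # a_j <- sample(D)
        total = sum(raw)
        cpt.append([a / total for a in raw])         # normalize the row
    return cpt

# Example: a CPT for a 256-state Z with no parents (a single row).
cpt = generate_cpt(256, 1)
print(sum(cpt[0]))  # each row sums to 1
```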

Appendix E Construction of the Data in Table 4 of Section 5.1

\displaystyle P(u,w)=P(z_{1}),
\displaystyle P(u,w^{\prime})=P(z_{2}),
\displaystyle P(u^{\prime},w)=P(z_{3}),
\displaystyle P(u^{\prime},w^{\prime})=P(z_{4}),
\displaystyle P(u)=P(u,w)+P(u,w^{\prime})=P(z_{1})+P(z_{2})=0.5,
\displaystyle P(w|u)=P(u,w)/P(u)=P(z_{1})/P(u)=0.3/0.5=0.6,
\displaystyle P(w|u^{\prime})=P(u^{\prime},w)/P(u^{\prime})=P(z_{3})/(1-P(u))=0.2/0.5=0.4,
\displaystyle P(x|u,w)=P(x|z_{1})=0.1,
\displaystyle P(x|u,w^{\prime})=P(x|z_{2})=0.4,
\displaystyle P(x|u^{\prime},w)=P(x|z_{3})=0.5,
\displaystyle P(x|u^{\prime},w^{\prime})=P(x|z_{4})=0.7,
\displaystyle P(y|x,u,w)=P(y|x,z_{1})=0.2,
\displaystyle P(y|x^{\prime},u,w)=P(y|x^{\prime},z_{1})=0.3,
\displaystyle P(y|x,u,w^{\prime})=P(y|x,z_{2})=0.7,
\displaystyle P(y|x^{\prime},u,w^{\prime})=P(y|x^{\prime},z_{2})=0.1,
\displaystyle P(y|x,u^{\prime},w)=P(y|x,z_{3})=0.6,
\displaystyle P(y|x^{\prime},u^{\prime},w)=P(y|x^{\prime},z_{3})=0.5,
\displaystyle P(y|x,u^{\prime},w^{\prime})=P(y|x,z_{4})=0.5,
\displaystyle P(y|x^{\prime},u^{\prime},w^{\prime})=P(y|x^{\prime},z_{4})=0.4.

Appendix F Construction of the Distribution in Section 5.3

Instead of providing the resulting 1024 rows of the observational data, we provide the details for regenerating the observational data in the following steps.

  • Generate P(X,Y,Z) using Algorithm 2.

  • Let P(X,Y,w_{j},u_{k})=P(X,Y,z_{(j-1)*16+k}).

  • Let P(X,Y,w_{j})=\sum_{k=1}^{q}P(X,Y,w_{j},u_{k}).

  • Let P(X,Y,u_{k})=\sum_{j=1}^{p}P(X,Y,w_{j},u_{k}).

  • Let P(u_{k})=\sum_{X,Y}P(X,Y,u_{k}).

For example,

\displaystyle P(u_{1})=\sum_{X,Y}P(X,Y,u_{1})
\displaystyle=P(x,y,u_{1})+P(x,y^{\prime},u_{1})+P(x^{\prime},y,u_{1})+P(x^{\prime},y^{\prime},u_{1})
\displaystyle=\sum_{j=1}^{16}P(x,y,w_{j},u_{1})+\sum_{j=1}^{16}P(x,y^{\prime},w_{j},u_{1})+\sum_{j=1}^{16}P(x^{\prime},y,w_{j},u_{1})+\sum_{j=1}^{16}P(x^{\prime},y^{\prime},w_{j},u_{1})
\displaystyle=\sum_{j=1}^{16}P(x,y,z_{(j-1)*16+1})+\sum_{j=1}^{16}P(x,y^{\prime},z_{(j-1)*16+1})+\sum_{j=1}^{16}P(x^{\prime},y,z_{(j-1)*16+1})+\sum_{j=1}^{16}P(x^{\prime},y^{\prime},z_{(j-1)*16+1}),
\displaystyle P(x,y,w_{1})=\sum_{k=1}^{16}P(x,y,w_{1},u_{k})=\sum_{k=1}^{16}P(x,y,z_{k}).
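The steps above can be sketched as follows (we generate a random joint P(X,Y,Z) in place of Algorithm 2's output; the array names are ours):

```python
import numpy as np

# Derive the partially observable quantities P(X,Y,W) and P(U) from a
# full joint P(X,Y,Z) with Z having 256 states, for p = q = 16.
rng = np.random.default_rng(0)
p = q = 16

# A random joint P(X,Y,Z) (X, Y binary; Z has p*q states), normalized.
joint = rng.random((2, 2, p * q))
joint /= joint.sum()

# P(X,Y,w_j,u_k) = P(X,Y,z_{(j-1)*q+k}): reshaping the Z axis into a
# (W, U) grid realizes the Cartesian-product state mapping.
joint_wu = joint.reshape(2, 2, p, q)

Pxyw = joint_wu.sum(axis=3)       # P(X,Y,w_j): marginalize out U
Pxyu = joint_wu.sum(axis=2)       # P(X,Y,u_k): marginalize out W
Pu = Pxyu.sum(axis=(0, 1))        # P(u_k): marginalize out X and Y

print(Pu.sum())  # marginals of a proper joint sum to 1
```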