
Bounds on Causal Effects and Application to High Dimensional Data

Ang Li
University of California, Los Angeles
Computer Science Department
[email protected]
Judea Pearl
University of California, Los Angeles
Computer Science Department
[email protected]
Abstract

This paper addresses the problem of estimating causal effects when the adjustment variables in the back-door or front-door criterion are partially observed. For such scenarios, we derive bounds on the causal effects by solving two non-linear optimization problems, and demonstrate that the resulting bounds are sufficient for estimating the causal effects. Using this optimization method, we propose a framework for dimensionality reduction that allows one to trade bias for estimation power, and demonstrate its performance using simulation studies.

1 Introduction

The problem of estimating causal effects arises in many areas, including industry, marketing, and health science, and it is among the most critical problems in causal inference. Pearl's back-door and front-door criteria, along with the adjustment formula [Pearl(1995)], are powerful tools for estimating causal effects. In this paper, we address the problem of estimating causal effects when the adjustment variables in the back-door or front-door criterion are partially observable, or when the adjustment variables have high dimensionality.

Consider the problem of estimating the causal effects of X on Y when a sufficient set W ∪ U of confounders is partially observable (see Figure 1(a)). Because W ∪ U is assumed to be sufficient, the causal effects are identified from measurements on X, Y, W, and U and can be written as

P(y|do(x)) = \sum_{w,u} P(y|x,w,u)\, P(w,u) = \sum_{w,u} \frac{P(x,y,w,u)\, P(w,u)}{P(x,w,u)}.

However, if U is unobserved, d-separation tells us immediately that adjusting for W alone is inadequate, because it leaves the back-door path X ← U → Y unblocked. Therefore, regardless of sample size, the causal effects of X on Y cannot be estimated without bias. It turns out, however, that when a prior distribution P(U) is given, we can obtain bounds on the causal effects. We will demonstrate later that the midpoints of these bounds are sufficient for estimating the causal effects.

Bounding has proven useful in causal inference. [Balke and Pearl(1997a)] provided bounds on causal effects under imperfect compliance, [Tian and Pearl(2000)] proposed bounds on probabilities of causation, [Cai et al.(2008)Cai, Kuroki, Pearl, and Tian] provided bounds on causal effects in the presence of confounded intermediate variables, and [Li and Pearl(2019)] proposed bounds on the benefit function of a unit selection problem.

Although P(U) is assumed to be given, this is not a demanding requirement: P(U) is usually known independently of the model itself (e.g., when U stands for gender, gene type, blood type, or age). Alternatively, if costs permit, one can estimate P(U) by re-testing within a small sampled sub-population.

Figure 1: Causal diagrams for the problems considered; the causal effects of X on Y are needed. (a) U is unobserved. (b) Z has high dimensionality. (c) Causal diagram of an equivalent problem. (d) U is unobserved and independent of W.

A second problem considered in this paper is that of estimating causal effects when a sufficient set Z of confounders is fully observable (see Figure 1(b)) but has high dimensionality (e.g., Z has 1024 states). In such a case, a prohibitively large sample size would be required, which is generally recognized to be impractical. We propose a new framework that transforms the problem associated with Figure 1(b) into an equivalent problem associated with Figure 1(c), containing W and U, which have much smaller dimensionalities (e.g., W and U each have 32 states). We then estimate bounds on the causal effects of the equivalent problem and take the midpoints as the effect estimates. We demonstrate through a simulation that this method can deliver good estimates of the causal effects of the original problem.

2 Preliminaries & Related Works

In this section, we review the back-door and front-door criteria and their associated adjustment formulas [Pearl(1995)]. We use the causal diagrams in [Pearl(1995), Spirtes et al.(2000)Spirtes, Glymour, Scheines, and Heckerman, Pearl(2009), Koller and Friedman(2009)].

One key concept of a causal diagram is d-separation [Pearl(2014)].

Definition 1 (d-separation).

In a causal diagram G, a path p is blocked by a set of nodes Z if and only if

  1. p contains a chain of nodes A → B → C or a fork A ← B → C such that the middle node B is in Z (i.e., B is conditioned on), or

  2. p contains a collider A → B ← C such that the collision node B is not in Z, and no descendant of B is in Z.

If Z blocks every path between two nodes X and Y, then X and Y are d-separated conditional on Z, and thus are independent conditional on Z, denoted X ⊥⊥ Y | Z.

With the concept of d-separation in a causal diagram, Pearl proposed the back-door and front-door criteria as follows:

Definition 2 (Back-Door Criterion).

Given an ordered pair of variables (X, Y) in a directed acyclic graph G, a set of variables Z satisfies the back-door criterion relative to (X, Y) if no node in Z is a descendant of X, and Z blocks every path between X and Y that contains an arrow into X.

If a set of variables Z satisfies the back-door criterion for X and Y, the causal effects of X on Y are given by the adjustment formula:

P(y|do(x)) = \sum_{z} P(y|x,z)\, P(z). (1)
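As a concrete illustration of Equation 1, the adjustment sum can be sketched in a few lines of Python; the dictionary layout of the joint distribution and the function name below are our own illustrative choices, not part of the paper:

```python
# Back-door adjustment (Equation 1) for discrete variables.
# `joint` is a dict mapping (x, y, z) tuples to P(x, y, z); this
# representation is illustrative, not prescribed by the paper.

def backdoor_effect(joint, x, y, z_states):
    """Compute P(y | do(x)) = sum_z P(y | x, z) P(z)."""
    effect = 0.0
    for z in z_states:
        p_z = sum(p for (xv, yv, zv), p in joint.items() if zv == z)
        p_xz = sum(p for (xv, yv, zv), p in joint.items()
                   if xv == x and zv == z)
        if p_xz > 0:  # P(y | x, z) is undefined when P(x, z) = 0
            effect += (joint.get((x, y, z), 0.0) / p_xz) * p_z
    return effect
```

Running it on a joint distribution built from the CPTs of Table 3 (Section 5), for example, reproduces the causal effect 0.47 reported there.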
Definition 3 (Front-Door Criterion).

A set of variables Z is said to satisfy the front-door criterion relative to an ordered pair of variables (X, Y) if

  • Z intercepts all directed paths from X to Y;

  • there is no back-door path from X to Z; and

  • all back-door paths from Z to Y are blocked by X.

If a set of variables Z satisfies the front-door criterion for X and Y, and if P(x, Z) > 0, then the causal effects of X on Y are given by the adjustment formula:

P(y|do(x)) = \sum_{z} P(z|x) \sum_{x'} P(y|x',z)\, P(x'). (2)
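Equation 2 admits the same kind of sketch; again, the data layout and function names are our own illustrative assumptions:

```python
# Front-door adjustment (Equation 2) for discrete variables.
# `joint` maps (x, y, z) tuples to P(x, y, z); layout is illustrative.

def frontdoor_effect(joint, x, y, x_states, z_states):
    """Compute P(y | do(x)) = sum_z P(z | x) sum_{x'} P(y | x', z) P(x')."""
    def prob(pred):
        return sum(p for key, p in joint.items() if pred(key))

    p_x = prob(lambda k: k[0] == x)
    effect = 0.0
    for z in z_states:
        p_z_given_x = prob(lambda k: k[0] == x and k[2] == z) / p_x
        inner = 0.0
        for xp in x_states:
            p_xpz = prob(lambda k: k[0] == xp and k[2] == z)
            if p_xpz > 0:  # requires P(x, z) > 0, as in the criterion
                inner += (joint.get((xp, y, z), 0.0) / p_xpz) \
                         * prob(lambda k: k[0] == xp)
        effect += p_z_given_x * inner
    return effect
```

On any model that satisfies the front-door criterion (e.g., X ← U → Y with mediator X → Z → Y), this observational computation agrees with the interventional effect obtained directly from the model.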

The back-door and front-door criteria are two powerful tools for estimating causal effects; however, the causal effects are not identifiable if the set of adjustment variables Z is not fully observable. [Tian and Pearl(2000)] provided the most naive bounds on causal effects (Equation 3), which hold regardless of the causal diagram.

P(x,y) \leq P(y|do(x)) \leq 1 - P(x,y'). (3)

As the first contribution of this study, we obtain narrower bounds on the causal effects by leveraging another source of knowledge: a causal diagram behind the data, combined with measurements of a set W of covariates (the observable part of Z) and prior information about a set U (the unobservable part of Z). The bounds are the solutions to two non-linear optimization problems. We illustrate that the midpoints of the bounds are sufficient for estimating the causal effects.

Using this optimization method, our second contribution is a new framework for estimating causal effects when a set of fully observable adjustment variables Z has high dimensionality, without any assumption regarding the data-generating process. [Maathuis et al.(2009)Maathuis, Kalisch, Bühlmann, et al.] proposed a method for estimating causal effects when the number of covariates is larger than the sample size. However, it relies on several assumptions, including that the distribution of the covariates is multivariate normal; the method is of limited use when the distribution of the covariates is unknown or cannot be estimated accurately owing to the limited sample size.

3 Bounds on Causal Effects

In this section, we demonstrate how bounds on causal effects with partially observable back-door or front-door variables can be obtained through non-linear optimizations.

3.1 Partially Observable Back-Door Variables

Theorem 4.

Given a causal diagram G and a distribution compatible with G, let W ∪ U be a set of variables satisfying the back-door criterion in G relative to an ordered pair (X, Y), where W ∪ U is only partially observable, i.e., only the probabilities P(X, Y, W) and P(U) are given. The causal effects of X on Y are then bounded as follows:

LB \leq P(y|do(x)) \leq UB,

where LB is the solution to the non-linear optimization problem in Equation 4 and UB is the solution to the non-linear optimization problem in Equation 5:

LB = \min \sum_{w,u} \frac{a_{w,u}\, b_{w,u}}{c_{w,u}}, (4)

UB = \max \sum_{w,u} \frac{a_{w,u}\, b_{w,u}}{c_{w,u}}, (5)

subject to

\sum_{u} a_{w,u} = P(x,y,w), \quad \sum_{u} b_{w,u} = P(w), \quad \sum_{u} c_{w,u} = P(x,w) \quad for all w ∈ W;

and, for all w ∈ W and u ∈ U,

b_{w,u} \geq c_{w,u} \geq a_{w,u},
\max\{0,\, P(x,y,w) + P(u) - 1\} \leq a_{w,u} \leq \min\{P(x,y,w),\, P(u)\},
\max\{0,\, P(w) + P(u) - 1\} \leq b_{w,u} \leq \min\{P(w),\, P(u)\},
\max\{0,\, P(x,w) + P(u) - 1\} \leq c_{w,u} \leq \min\{P(x,w),\, P(u)\}.

3.2 Partially Observable Front-Door Variables

Theorem 5.

Given a causal diagram G and a distribution compatible with G, let W ∪ U be a set of variables satisfying the front-door criterion in G relative to an ordered pair (X, Y), where W ∪ U is only partially observable, i.e., only the probabilities P(X, Y, W) and P(U) are given and P(x, W, U) > 0. The causal effects of X on Y are then bounded as follows:

LB \leq P(y|do(x)) \leq UB,

where LB is the solution to the non-linear optimization problem in Equation 6 and UB is the solution to the non-linear optimization problem in Equation 7:

LB = \min \sum_{w,u} \frac{b_{x,w,u}}{P(x)} \sum_{x'} \frac{a_{x',w,u}\, P(x')}{b_{x',w,u}}, (6)

UB = \max \sum_{w,u} \frac{b_{x,w,u}}{P(x)} \sum_{x'} \frac{a_{x',w,u}\, P(x')}{b_{x',w,u}}, (7)

subject to

\sum_{u} a_{x,w,u} = P(x,y,w), \quad \sum_{u} b_{x,w,u} = P(x,w) \quad for all x ∈ X and w ∈ W;

and, for all x ∈ X, w ∈ W, and u ∈ U,

b_{x,w,u} \geq a_{x,w,u},
\max\{0,\, P(x,y,w) + P(u) - 1\} \leq a_{x,w,u} \leq \min\{P(x,y,w),\, P(u)\},
\max\{0,\, P(x,w) + P(u) - 1\} \leq b_{x,w,u} \leq \min\{P(x,w),\, P(u)\}.

Notably, if part of the observational data (e.g., P(U)) is unavailable in the above theorems, we can remove the corresponding constraints, and the remaining non-linear optimization problems still provide valid bounds on the causal effects. In general, the midpoints of the bounds are effective estimates of the causal effects. However, the lower (upper) bounds are also informative in their own right, as they can be interpreted as the minimal (maximal) causal effects. The proofs of Theorems 4 and 5 are provided in the appendix.

4 Example

Herein, we present a simulated example to demonstrate that the midpoints of the bounds on the causal effects given by Theorem 4 are adequate for estimating the causal effects.

4.1 Causal Effect of a Drug

Drug manufacturers want to know the causal effect of taking a drug on recovery, so they conduct an observational study in which the recovery rates of 700 patients were recorded. A total of 192 patients chose to take the drug and 508 patients did not. The results of the study are shown in Table 1. Blood type (type O or not) is not the only confounder of taking the drug and recovery; another confounder is age (below the age of 70 or not). The manufacturers have no data associated with age; they only know that 85.43% of people in their region are below the age of 70.

Because both age and blood type are confounders of taking the drug and recovery, and the data associated with age are unavailable, the causal effect is not identifiable.

Let X = x denote the event that a patient took the drug, and X = x' the event that a patient did not take the drug. Let Y = y denote the event that a patient recovered, and Y = y' the event that a patient did not recover. Let W = w represent a patient with blood type O, and W = w' a patient without blood type O. Let U = u represent a patient below the age of 70, and U = u' a patient aged 70 or above. The causal diagram is shown in Figure 1(d).

Table 1: Results of an observational study considering blood type.

                      Drug                              No Drug
  Blood type O        23 out of 36 recovered (63.9%)    145 out of 225 recovered (64.4%)
  Not blood type O    135 out of 156 recovered (86.5%)  152 out of 283 recovered (53.7%)
  Overall             158 out of 192 recovered (82.3%)  297 out of 508 recovered (58.5%)
Table 2: Informer view of the observational data considering blood type and age.

                                   Drug                              No Drug
  Blood type O, age below 70       3 out of 4 recovered (75.0%)      141 out of 219 recovered (64.4%)
  Blood type O, age above 70       20 out of 32 recovered (62.5%)    4 out of 6 recovered (66.7%)
  Not blood type O, age below 70   135 out of 151 recovered (89.4%)  117 out of 224 recovered (52.2%)
  Not blood type O, age above 70   0 out of 5 recovered (0.0%)       35 out of 59 recovered (59.3%)
  Overall                          158 out of 192 recovered (82.3%)  297 out of 508 recovered (58.5%)

One option for the manufacturers is to estimate the causal effect through the Tian-Pearl bounds in Equation 3 and the observational data from Table 1, where

P(x,y) = \sum_{w} P(y|x,w)\, P(x|w)\, P(w) = 0.2257,

1 - P(x,y') = 1 - \sum_{w} P(y'|x,w)\, P(x|w)\, P(w) = 0.9514.

Therefore, the bounds on the causal effect estimated using Equation 3 are 0.2257 ≤ P(y|do(x)) ≤ 0.9514, where neither the causal information about the covariate W nor the prior information P(U) is used. These bounds are not informative enough to pin down the actual causal effect. Although one might use the midpoint of the bounds (i.e., 0.5886) as an estimate, the gap between the bounds (i.e., 0.9514 - 0.2257 = 0.7257) is not small; hence, this point estimate is unconvincing.
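These two quantities can be checked mechanically from the counts in Table 1; the short script below (variable names are ours) reproduces them:

```python
# Tian-Pearl bounds (Equation 3) from the Table 1 counts.
n = 700.0
strata = {                # w -> (recovered on drug, took drug, stratum total)
    'O':     (23, 36, 261),
    'not O': (135, 156, 439),
}
# P(x, y) = sum_w P(y | x, w) P(x | w) P(w), and similarly for P(x, y').
p_xy = sum((rec / took) * (took / tot) * (tot / n)
           for rec, took, tot in strata.values())
p_xyp = sum((1 - rec / took) * (took / tot) * (tot / n)
            for rec, took, tot in strata.values())
lower, upper = p_xy, 1 - p_xyp   # 0.2257 <= P(y | do(x)) <= 0.9514
```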

Now consider the proposed bounds in Theorem 4 with the observational data from Table 1. W ∪ U satisfies the back-door criterion, and P(X, Y, W) and P(U) are available. Because W and U are binary, each objective function has 12 optimization variables. With the help of the "SLSQP" solver [Kraft(1988)] in the scipy package [SciPyCommunity(2020)], we obtain the bounds 0.4728 ≤ P(y|do(x)) ≤ 0.9514. The lower bound has increased substantially, to nearly 0.5, which can help in decision making. The midpoint is 0.7121, so we conclude that the causal effect of taking the drug on recovery is 0.7121. We show in the following section that this estimate is extremely close to the actual causal effect.

4.2 Informer View of the Causal Effect

An informer with access to the fully observed data, as summarized in Table 2 (note that although the data in Table 2 can be verified to be compatible with those in Table 1, we will never know these numbers in reality), could easily calculate the causal effect of taking the drug on recovery using the adjustment formula in Equation 1 (shown in Equation 8). The error of the estimate obtained through Theorem 4 is only (0.7518 - 0.7121)/0.7518 ≈ 5.28%.

P(y|do(x)) = \sum_{w,u} P(y|x,w,u)\, P(w,u) = 0.7518. (8)

4.3 Simulation Results

Here, we further illustrate, through a random simulation, that the midpoints of the proposed bounds on the causal effects are sufficient for estimating the causal effects, and that the midpoints of the proposed bounds in Theorem 4 are better estimates than the midpoints of the Tian-Pearl bounds in Equation 3.

We employ the simplest causal diagram, that of Figure 1(a), with binary W and U, such that W ∪ U satisfies the back-door criterion. We randomly generated 1000 sample distributions compatible with the causal diagram (the algorithm for generating the sample distributions is given in the appendix). The average gap (upper bound minus lower bound) of the Tian-Pearl bounds among the 1000 samples is 0.487, and the average gap of the proposed bounds is 0.383. We then randomly picked 100 of the 1000 sample distributions to plot the actual causal effects, the midpoints of the Tian-Pearl bounds, and the midpoints of the proposed bounds. The results are shown in Figure 2(a).

Figure 2: Bounds on the causal effects of 100 sample distributions, where the Tian-Pearl bounds are obtained through Equation 3 and the proposed bounds are obtained through Theorem 4. (a) Estimates of causal effects with partially observed confounders. (b) Estimates of causal effects with high-dimensional data.

From Figure 2(a), although the midpoints of both sets of bounds are good estimates of the actual causal effects, the midpoints of the proposed bounds are much closer to the actual causal effects, particularly when the causal effects are close to 0 or 1. The average gap of the proposed bounds among the 1000 samples, 0.383, is much smaller than that of the Tian-Pearl bounds, 0.487. This means that the midpoints of the proposed bounds are more convincing, because the bounds are narrower.

5 Application to High Dimensionality of Adjustment Variables

Consider the problem of estimating the causal effects of X on Y when a sufficient set Z, satisfying the back-door or front-door criterion, is fully observable in a causal diagram G (e.g., see Figure 1(b)) but has high dimensionality (e.g., Z has 1024 states). A prohibitively large sample size would then be required to estimate the causal effects, which is generally recognized to be impractical. Herein, we propose a new framework to achieve dimensionality reduction.

5.1 Equivalent Causal Diagram with Observational Data

Definition 6 (Equivalent causal diagram with observational data).

Let G and G' be causal diagrams both containing the nodes X and Y, let O be observational data compatible with G, and let O' be observational data compatible with G'. We say that (G, O) is equivalent to (G', O') if the causal effects of X on Y in (G, O) are equal to the causal effects of X on Y in (G', O').

Such an equivalent tuple (G', O') is easy to obtain. We can simply add two new nodes, W and U, and remove the node Z in G to obtain G'. Let the arrows entering Z in G now enter both W and U in G', and let the arrows exiting Z in G now exit both W and U in G'. Finally, add an arrow from U to W. It is easy to show that (G, O) and (G', O') are equivalent if the states of Z are the Cartesian product of the states of W and the states of U. Formally, we have the following theorem (the proof is provided in the appendix).

Theorem 7.

Let G be a causal diagram containing nodes {V_1, ..., V_{n-3}, X, Y, Z}, and let O be any observational data compatible with G. Suppose there exists a set of variables that satisfies the back-door or front-door criterion relative to (X, Y) in G. Then (G, O) is equivalent to (G', O'), where G' contains nodes {V_1, ..., V_{n-3}, X, Y, W, U}, O' is observational data compatible with G', the number of states of W times the number of states of U is equal to the number of states of Z, and the structure of G' and the observational data O' are obtained as follows.

Structure of G':
Let Parents_G(H) denote the parents of H in causal diagram G. Then
Parents_G'(U) = Parents_G(Z),
Parents_G'(W) = Parents_G(Z) ∪ {U},
Parents_G'(H) = Parents_G(H) if Z ∉ Parents_G(H), for H ∈ {V_1, ..., V_{n-3}, X, Y},
Parents_G'(H) = Parents_G(H) \ {Z} ∪ {W, U} if Z ∈ Parents_G(H), for H ∈ {V_1, ..., V_{n-3}, X, Y}.

Note that if Q is a set of variables in G that satisfies the back-door or front-door criterion relative to (X, Y), then Q' satisfies the same criterion relative to (X, Y) in G', where
Q' = Q if Z ∉ Q,
Q' = Q \ {Z} ∪ {W, U} if Z ∈ Q.

Observational data:
Let p be the number of states of W and q the number of states of U. The states of Z are the Cartesian product of the states of W and the states of U. In detail, (w_j, u_k) is equivalent to z_{(j-1)q+k}, w_j is equivalent to ∨_{k=1}^{q} (w_j, u_k) = ∨_{k=1}^{q} z_{(j-1)q+k}, and u_k is equivalent to ∨_{j=1}^{p} (w_j, u_k) = ∨_{j=1}^{p} z_{(j-1)q+k}; that is, P(w_j, u_k, V) = P(z_{(j-1)q+k}, V) for any V ⊆ {V_1, ..., V_{n-3}, X, Y}.
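The Cartesian-product bookkeeping in Theorem 7 reduces to simple index arithmetic; a small sketch with 1-indexed states as in the theorem (the function names are ours):

```python
# State mapping of Theorem 7: (w_j, u_k) <-> z_{(j-1)*q + k},
# with p states for W and q states for U, all 1-indexed.

def z_of(j, k, q):
    """Index of the Z state that encodes (w_j, u_k)."""
    return (j - 1) * q + k

def wu_of(i, q):
    """Inverse map: Z state z_i -> (j, k)."""
    j, k = divmod(i - 1, q)
    return j + 1, k + 1

def p_w(j, p_z, q):
    """P(w_j) = sum_k P(z_{(j-1)q+k}); p_z maps state index -> probability."""
    return sum(p_z[z_of(j, k, q)] for k in range(1, q + 1))
```

The same aggregation gives P(u_k) by summing over j, which is how the reduced observational data P(X, Y, W) and P(U) are read off from P(X, Y, Z).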

For example, consider the causal diagram in Figure 1(b) and the observational data (in the form of conditional probability tables (CPTs), where X and Y are binary and Z has 4 states) in Table 3. The causal effect P(y|do(x)), computed through the adjustment formula in Equation 1, is 0.47. Based on the construction in Theorem 7 (see the appendix for details), the causal diagram in Figure 1(b) with the observational data in Table 3 is equivalent to the causal diagram in Figure 1(c) with the observational data in Table 4 (all nodes binary), and we can verify that the causal effect P(y|do(x)) in the causal diagram in Figure 1(c) with the observational data in Table 4 is also 0.47.

Table 3: Observational data in CPTs compatible with the causal diagram in Figure 1(b).

  P(z1) = 0.3    P(x|z1) = 0.1    P(y|x,z1) = 0.2    P(y|x',z1) = 0.3
  P(z2) = 0.2    P(x|z2) = 0.4    P(y|x,z2) = 0.7    P(y|x',z2) = 0.1
  P(z3) = 0.2    P(x|z3) = 0.5    P(y|x,z3) = 0.6    P(y|x',z3) = 0.5
  P(z4) = 0.3    P(x|z4) = 0.7    P(y|x,z4) = 0.5    P(y|x',z4) = 0.4
Table 4: Observational data in CPTs compatible with the causal diagram in Figure 1(c).

  P(u) = 0.5       P(x|u,w) = 0.1       P(y|x,u,w) = 0.2       P(y|x',u,w) = 0.3
  P(w|u) = 0.6     P(x|u,w') = 0.4      P(y|x,u,w') = 0.7      P(y|x',u,w') = 0.1
  P(w|u') = 0.4    P(x|u',w) = 0.5      P(y|x,u',w) = 0.6      P(y|x',u',w) = 0.5
                   P(x|u',w') = 0.7     P(y|x,u',w') = 0.5     P(y|x',u',w') = 0.4

Notably, the equivalent tuple is not unique, and equivalence is transitive (i.e., if (G, O) is equivalent to (G', O'), and (G', O') is equivalent to (G'', O''), then (G, O) is equivalent to (G'', O'')).

5.2 Dimensionality Reduction

Now consider the problem posed at the beginning of Section 5. First, we transform the causal diagram G with compatible observational data O into an equivalent tuple (G', O') using Algorithm 1, which is based on the construction in Theorem 7 (note that the algorithm only constructs the structure of G' and assigns the meaning of the states of W and U; the corresponding observational data O' are then easy to obtain). The new problem (G', O') has the same causal effects of X on Y as (G, O). By picking the dimensionality of W (p in Algorithm 1), we can control the dimensionality of the new problem.

Note that if Z = (Z_1, Z_2, ..., Z_m) in G is a set of variables, we can apply Algorithm 1 to each variable in Z and finally obtain W = (W_1, W_2, ..., W_m) and U = (U_1, U_2, ..., U_m), where the product of the numbers of states of the W_i is equal to p.

We then treat the new problem (G', O') as a partially observable back-door or front-door variables problem, as in Sections 3.1 and 3.2, where P(X, Y, W) and P(U) are given, and we obtain bounds on the causal effects through Theorems 4 and 5. We claim that the midpoints of these bounds are good estimates of the original causal effects. In addition, the bounds themselves can help in decision making.

5.3 Example

Consider the problem in Figure 1(b), where X and Y are binary and Z has 256 states. We randomly generated a distribution P(X, Y, Z) compatible with the causal diagram (the algorithm for generating the distribution is given in the appendix). Because we know the exact distribution, we can easily obtain the causal effect through Equation 1: P(y|do(x)) = 0.5527.

Now, we transform the causal diagram with the observational data into an equivalent tuple (G', O') (G' is shown in Figure 1(c)) using Algorithm 1 with p = 16. We obtain a variable W with 16 states and a variable U with 16 states in G' ((w_j, u_k) is equivalent to z_{(j-1)*16+k}). We then restrict ourselves to the observational data P(X, Y, W) and P(U) (their construction is given in the appendix), and based on Theorem 4, with the "SLSQP" solver, we obtain the bounds 0.4595 ≤ P(y|do(x)) ≤ 0.7012. The midpoint, 0.5804, is extremely close to the actual causal effect, 0.5527.

input : An n-node causal diagram G over (X_1, X_2, ..., X_{n-3}, X, Y, Z) and compatible observational data O;
p, the number of states of W in G' of the equivalent tuple (G', O').
output : An (n+1)-node causal diagram G' over (X_1, X_2, ..., X_{n-3}, X, Y, W, U);
Mapping relation M_1: states of W → states of Z;
Mapping relation M_2: states of U → states of Z.
begin
       m ← num_states_in_G(Z);
       if m mod p = 0 then
             q ← m / p;
       else
             q ← ⌊m / p⌋ + 1;
       end if
       // Pad Z with virtual states whose probability is 0.
       num_states_in_G(Z) ← p × q;
       for H in {X_1, ..., X_{n-3}, X, Y} do
             num_states_in_G'(H) ← num_states_in_G(H);
             if Z ∈ Parents_in_G(H) then
                   Parents_in_G'(H) ← Parents_in_G(H) \ {Z} ∪ {W, U};
             else
                   Parents_in_G'(H) ← Parents_in_G(H);
             end if
       end for
       num_states_in_G'(W) ← p;
       num_states_in_G'(U) ← q;
       Parents_in_G'(W) ← Parents_in_G(Z) ∪ {U};
       Parents_in_G'(U) ← Parents_in_G(Z);
       for i ← 1 to p do
             M_1(w_i) ← ∨_{k=1}^{q} z_{(i-1)q+k};
       end for
       for i ← 1 to q do
             M_2(u_i) ← ∨_{j=1}^{p} z_{(j-1)q+i};
       end for
end
Algorithm 1: Generate Equivalent Tuple

Finally, let us consider how many samples each method requires. According to [Roscoe(1975)], each state needs at least 30 samples; therefore, the exact solution by Equation 1 requires 2 × 2 × 256 × 30 = 30720 samples, whereas the proposed bounds based on Theorem 4 require only max(2 × 2 × 16, 16) × 30 = 1920 samples. If this sample size is still unacceptable, we can use another equivalent tuple in which W has 8 states and U has 32 states; we then require only max(2 × 2 × 8, 32) × 30 = 960 samples to obtain the bounds on the causal effects.
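This sample-size arithmetic can be captured in a few lines; the helper below (our own naming) simply multiplies state counts by the 30-samples-per-state rule of thumb from [Roscoe(1975)]:

```python
# Rule-of-thumb sample sizes: at least 30 samples per state.
PER_STATE = 30

def samples_needed(*state_counts):
    """Samples needed to estimate a distribution over the given variables."""
    n = 1
    for s in state_counts:
        n *= s
    return n * PER_STATE

exact_adjustment = samples_needed(2, 2, 256)                      # P(X, Y, Z)
bounds_16_16 = max(samples_needed(2, 2, 16), samples_needed(16))  # P(X,Y,W), P(U)
bounds_8_32 = max(samples_needed(2, 2, 8), samples_needed(32))
```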

5.4 Simulation Results

As in the previous simulation, we further illustrate that the bounds on the causal effects produced by the proposed framework are sufficient for estimating the original causal effects.

Once again, we employ the simplest causal diagram, that of Figure 1(b), where X and Y are binary and Z has 256 states. We randomly generated 100 sample distributions compatible with the causal diagram (the algorithm for generating the distributions is given in the appendix). The average gap (upper bound minus lower bound) of the Tian-Pearl bounds among the 100 samples is 0.5102, and the average gap of the proposed bounds, through Theorems 7 and 4, is 0.0676. We then plotted the actual causal effects, the midpoints of the Tian-Pearl bounds, and the midpoints of the proposed bounds through Theorems 7 and 4. The results are shown in Figure 2(b).

From Figure 2(b), the midpoints of both sets of bounds are good estimates of the actual causal effects, while the midpoints of the proposed bounds are slightly closer to the actual causal effects, particularly when the causal effects are close to 0 or 1. Although the trend of the Tian-Pearl bounds also follows the actual causal effects, the Tian-Pearl bounds are more likely to run parallel to the x-axis. The Tian-Pearl bounds perform well here because, in high-dimensional cases, the randomly generated distributions are likely to yield causal effects of approximately 0.5. However, the average gap of the proposed bounds among the 100 samples, 0.0676, is much smaller than that of the Tian-Pearl bounds, 0.5102. This means that the midpoints of the proposed bounds are more convincing, because the bounds are narrower.

6 Discussion

Here, we discuss additional features of bounds on causal effects.

First, if a whole set of back-door or front-door variables is unobserved, the causal effects have only the naive bounds in Equation 3. As the back-door or front-door variables become gradually observed, the bounds on the causal effects become increasingly narrow. Finally, when the back-door or front-door variables are fully observed, the bounds shrink to point estimates, and the causal effects are identifiable. This also tells us that, when we pick p in Algorithm 1, we should pick the largest p for which the sample size is sufficient to estimate the observational distributions.

Next, the bounds in Theorems 4 and 5 are given by non-linear optimizations; therefore, the quality of the bounds also depends on the optimization solver. The examples and simulation results in this paper were all obtained with the basic "SLSQP" solver [Kraft(1988)] provided by SciPy [SciPyCommunity(2020)]. The quality of the bounds can be improved if more advanced solvers are applied. Inspired by Balke's linear programming [Balke and Pearl(1997b)], we may also obtain parametric solutions to the non-linear optimizations in Theorems 4 and 5, in which case no non-linear optimization solver is needed. However, the choice of non-linear optimization solver is beyond the scope of this paper.
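As a concrete illustration, the optimization in Theorem 4 can be set up directly with SciPy's SLSQP solver. The sketch below uses our own variable names; the observed quantities are computed from the full specification in Appendix E, under which the true causal effect is 0.47. Since SLSQP is a local solver, the resulting bounds are approximate:

```python
import numpy as np
from scipy.optimize import minimize

# Observed quantities from the Appendix E example (W and U binary):
# P(x,y,w), P(w), P(x,w) per state of W, and the prior P(u).
Pxyw = np.array([0.066, 0.161])   # P(x,y,w), P(x,y,w')
Pw   = np.array([0.5, 0.5])       # P(w), P(w')
Pxw  = np.array([0.13, 0.29])     # P(x,w), P(x,w')
Pu   = np.array([0.5, 0.5])       # P(u), P(u')

nW, nU = len(Pw), len(Pu)
n = nW * nU  # decision vector is [a, b, c], each an nW x nU block

def blocks(v):
    return tuple(v[i * n:(i + 1) * n].reshape(nW, nU) for i in range(3))

def objective(v):
    a, b, c = blocks(v)
    return np.sum(a * b / np.maximum(c, 1e-12))  # guard against c = 0

cons = []
# Equality constraints: marginalizing out U recovers the observed terms.
for w in range(nW):
    for i, t in enumerate((Pxyw, Pw, Pxw)):
        cons.append({'type': 'eq',
                     'fun': lambda v, w=w, i=i, t=t: blocks(v)[i][w].sum() - t[w]})
# Inequality constraints: b_{w,u} >= c_{w,u} >= a_{w,u}.
cons.append({'type': 'ineq', 'fun': lambda v: (blocks(v)[1] - blocks(v)[2]).ravel()})
cons.append({'type': 'ineq', 'fun': lambda v: (blocks(v)[2] - blocks(v)[0]).ravel()})

# Frechet-style box constraints on each entry.
bnds = [(max(0.0, t[w] + Pu[u] - 1.0), min(t[w], Pu[u]))
        for t in (Pxyw, Pw, Pxw) for w in range(nW) for u in range(nU)]

# A feasible start: split each observed term proportionally to P(u).
x0 = np.concatenate([np.outer(t, Pu).ravel() for t in (Pxyw, Pw, Pxw)])

lb = minimize(objective, x0, method='SLSQP', bounds=bnds, constraints=cons).fun
ub = -minimize(lambda v: -objective(v), x0, method='SLSQP',
               bounds=bnds, constraints=cons).fun
print(lb, ub)  # the causal effect computed from Appendix E is 0.47
```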

In addition, the constraints in Theorems 4 and 5 are based only on the basic back-door or front-door criterion. We can also add constraints encoding the independencies of a specific graph. For instance, W and U are independent in the causal diagram of Figure 1(d); we can then add constraints reflecting that P(W,U)=P(W)P(U). The more constraints we add to the optimizations, the narrower the bounds we can obtain.

Moreover, if one believes that the sample size is sufficient to estimate causal effects with high-dimensional adjustment variables, the framework in Section 5 can serve as evidence for validating whether the sample size is indeed sufficient.

Next, in Section 5, we transformed (G,O) into (G^{\prime},O^{\prime}) to obtain bounds on causal effects with high-dimensional adjustment variables. However, for a tuple (G,O), multiple equivalent tuples exist, obtained by picking different p in Algorithm 1, and each equivalent tuple yields bounds on the original causal effects. We can compute bounds for as many equivalent tuples as we want and take the maximum of the lower bounds and the minimum of the upper bounds.
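This combination rule can be sketched as follows (the bound pairs below are illustrative values, not results from the paper):

```python
# Combine bounds from several equivalent tuples: each tuple yields a valid
# (LB, UB) pair for the same causal effect, so the tightest combined
# interval is (max of lower bounds, min of upper bounds).
def combine_bounds(bounds):
    lbs, ubs = zip(*bounds)
    return max(lbs), min(ubs)

# Hypothetical bounds from three equivalent tuples.
print(combine_bounds([(0.31, 0.52), (0.35, 0.58), (0.28, 0.49)]))  # (0.35, 0.49)
```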

Finally, based on numerous experiments, we observed that when P(U) or P(W) is extreme (i.e., close to 0 or 1), the proposed bounds are almost identified (i.e., the bounds shrink toward point estimates). Therefore, in practice, we should always pick an equivalent tuple to transform in which P(U) or P(W) is close to 0 or 1.

7 Conclusion

We demonstrated how to estimate causal effects when adjustment variables in the back-door or front-door criterion are partially observable, by bounding the causal effects using solutions to non-linear optimizations. We provided examples and simulation results illustrating that the proposed method is sufficient for estimating the causal effects. We also proposed a framework for estimating causal effects when the adjustment variables have high dimensionality. In summary, we analyzed and demonstrated how causal effects can be estimated in practice using a causal diagram.

References

  • [Balke and Pearl(1997a)] Alexander Balke and Judea Pearl. Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439):1171–1176, 1997a.
  • [Balke and Pearl(1997b)] Alexander A Balke and Judea Pearl. Probabilistic counterfactuals: Semantics, computation, and applications. Technical report, UCLA Dept. of Computer Science, 1997b.
  • [Cai et al.(2008)Cai, Kuroki, Pearl, and Tian] Zhihong Cai, Manabu Kuroki, Judea Pearl, and Jin Tian. Bounds on direct effect in the presence of confounded intermediate variables. Biometrics, 64:695–701, 2008.
  • [Koller and Friedman(2009)] Daphne Koller and Nir Friedman. Probabilistic graphical models: Principles and techniques. MIT press, 2009.
  • [Kraft(1988)] Dieter Kraft. A software package for sequential quadratic programming. Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt Köln: Forschungsbericht. Wiss. Berichtswesen d. DFVLR, 1988. URL https://books.google.com/books?id=4rKaGwAACAAJ.
  • [Li and Pearl(2019)] Ang Li and Judea Pearl. Unit selection based on counterfactual logic. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 1793–1799. AAAI Press, 2019.
  • [Maathuis et al.(2009)Maathuis, Kalisch, Bühlmann, et al.] Marloes H Maathuis, Markus Kalisch, Peter Bühlmann, et al. Estimating high-dimensional intervention effects from observational data. The Annals of Statistics, 37(6A):3133–3164, 2009.
  • [Pearl(1995)] Judea Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–688, 1995.
  • [Pearl(2009)] Judea Pearl. Causality. Cambridge university press, 2nd edition, 2009.
  • [Pearl(2014)] Judea Pearl. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann, 2014.
  • [Roscoe(1975)] John T. Roscoe. Fundamental research statistics for the behavioral sciences. Number v. 2 in Editors’ Series in Marketing. Holt, Rinehart and Winston, 1975. ISBN 9780030919343. URL https://books.google.com/books?id=Fe8vAAAAMAAJ.
  • [SciPyCommunity(2020)] SciPyCommunity. Scipy reference guide, 2020. URL https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html#rdd2e1855725e-12.
  • [Spirtes et al.(2000)Spirtes, Glymour, Scheines, and Heckerman] Peter Spirtes, Clark N Glymour, Richard Scheines, and David Heckerman. Causation, prediction, and search. MIT press, 2000.
  • [Tian and Pearl(2000)] Jin Tian and Judea Pearl. Probabilities of causation: Bounds and identification. Annals of Mathematics and Artificial Intelligence, 28(1-4):287–313, 2000.

Appendix A Proof of Theorem 4

Theorem 4.

Given a causal diagram G and a distribution compatible with G, let W\cup U be a set of variables satisfying the back-door criterion in G relative to an ordered pair (X,Y), where W\cup U is partially observable, i.e., only the probabilities P(X,Y,W) and P(U) are given. The causal effects of X on Y are then bounded as follows:

\displaystyle\text{LB}\leq P(y|do(x))\leq\text{UB}

where LB is the solution to the non-linear optimization problem in Equation 9 and UB is the solution to the non-linear optimization problem in Equation 10.

\displaystyle LB=\min\sum_{w,u}\frac{a_{w,u}b_{w,u}}{c_{w,u}}, (9)
\displaystyle UB=\max\sum_{w,u}\frac{a_{w,u}b_{w,u}}{c_{w,u}}, (10)
where,
\displaystyle\sum_{u}a_{w,u}=P(x,y,w),\sum_{u}b_{w,u}=P(w),\sum_{u}c_{w,u}=P(x,w)\text{ for all }w\in W;
\displaystyle\text{and for all }w\in W\text{ and }u\in U,
\displaystyle b_{w,u}\geq c_{w,u}\geq a_{w,u},
\displaystyle\max\{0,P(x,y,w)+P(u)-1\}\leq a_{w,u}\leq\min\{P(x,y,w),P(u)\},
\displaystyle\max\{0,P(w)+P(u)-1\}\leq b_{w,u}\leq\min\{P(w),P(u)\},
\displaystyle\max\{0,P(x,w)+P(u)-1\}\leq c_{w,u}\leq\min\{P(x,w),P(u)\}.
Proof.

To show that LB and UB bound the actual causal effects, we only need to show that there exists a point in the feasible space of the non-linear optimization at which \sum_{w,u}\frac{a_{w,u}b_{w,u}}{c_{w,u}} is equal to the actual causal effects.
Since W\cup U satisfies the back-door criterion, by the adjustment formula in Equation 1, we have,

\displaystyle P(y|do(x))=\sum_{w,u}P(y|x,w,u)P(w,u)=\sum_{w,u}\frac{P(x,y,w,u)P(w,u)}{P(x,w,u)}

Let

\displaystyle a_{w,u}=P(x,y,w,u)
\displaystyle b_{w,u}=P(w,u)
\displaystyle c_{w,u}=P(x,w,u)

We now show that the above set of a_{w,u},b_{w,u},c_{w,u} is in the feasible space.
We have,

\displaystyle\text{for all }w\in W,
\displaystyle\sum_{u}a_{w,u}=\sum_{u}P(x,y,w,u)=P(x,y,w)
\displaystyle\sum_{u}b_{w,u}=\sum_{u}P(w,u)=P(w)
\displaystyle\sum_{u}c_{w,u}=\sum_{u}P(x,w,u)=P(x,w)

and,

\displaystyle\text{for all }w\in W\text{ and }u\in U,
\displaystyle b_{w,u}=P(w,u)\geq P(x,w,u)=c_{w,u}
\displaystyle c_{w,u}=P(x,w,u)\geq P(x,y,w,u)=a_{w,u}
\displaystyle a_{w,u}=P(x,y,w,u)\leq\min\{P(x,y,w),P(u)\}
\displaystyle b_{w,u}=P(w,u)\leq\min\{P(w),P(u)\}
\displaystyle c_{w,u}=P(x,w,u)\leq\min\{P(x,w),P(u)\}
\displaystyle a_{w,u}=P(x,y,w,u)\geq\max\{0,P(x,y,w)+P(u)-1\}
\displaystyle b_{w,u}=P(w,u)\geq\max\{0,P(w)+P(u)-1\}
\displaystyle c_{w,u}=P(x,w,u)\geq\max\{0,P(x,w)+P(u)-1\}

Therefore, the above set of a_{w,u},b_{w,u},c_{w,u} is in the feasible space, and thus, LB and UB bound the actual causal effects. ∎

Appendix B Proof of Theorem 5

Theorem 5.

Given a causal diagram G and a distribution compatible with G, let W\cup U be a set of variables satisfying the front-door criterion in G relative to an ordered pair (X,Y), where W\cup U is partially observable, i.e., only the probabilities P(X,Y,W) and P(U) are given and P(x,W,U)>0. The causal effects of X on Y are then bounded as follows:

\displaystyle\text{LB}\leq P(y|do(x))\leq\text{UB}

where LB is the solution to the non-linear optimization problem in Equation 11 and UB is the solution to the non-linear optimization problem in Equation 12.

\displaystyle LB=\min\sum_{w,u}\frac{b_{x,w,u}}{P(x)}\sum_{x^{\prime}}\frac{a_{x^{\prime},w,u}P(x^{\prime})}{b_{x^{\prime},w,u}}, (11)
\displaystyle UB=\max\sum_{w,u}\frac{b_{x,w,u}}{P(x)}\sum_{x^{\prime}}\frac{a_{x^{\prime},w,u}P(x^{\prime})}{b_{x^{\prime},w,u}}, (12)
where,
\displaystyle\sum_{u}a_{x,w,u}=P(x,y,w),\sum_{u}b_{x,w,u}=P(x,w)\text{ for all }x\in X\text{ and }w\in W;
\displaystyle\text{and for all }x\in X\text{, }w\in W\text{, and }u\in U,
\displaystyle b_{x,w,u}\geq a_{x,w,u},
\displaystyle\max\{0,P(x,y,w)+P(u)-1\}\leq a_{x,w,u}\leq\min\{P(x,y,w),P(u)\},
\displaystyle\max\{0,P(x,w)+P(u)-1\}\leq b_{x,w,u}\leq\min\{P(x,w),P(u)\}.
Proof.

To show that LB and UB bound the actual causal effects, we only need to show that there exists a point in the feasible space of the non-linear optimization at which \sum_{w,u}\frac{b_{x,w,u}}{P(x)}\sum_{x^{\prime}}\frac{a_{x^{\prime},w,u}P(x^{\prime})}{b_{x^{\prime},w,u}} is equal to the actual causal effects.
Since W\cup U satisfies the front-door criterion and P(x,W,U)>0, by the adjustment formula in Equation 2, we have,

\displaystyle P(y|do(x))=\sum_{w,u}P(w,u|x)\sum_{x^{\prime}}P(y|x^{\prime},w,u)P(x^{\prime})=\sum_{w,u}\frac{P(x,w,u)}{P(x)}\sum_{x^{\prime}}\frac{P(x^{\prime},y,w,u)P(x^{\prime})}{P(x^{\prime},w,u)}

Let

\displaystyle a_{x,w,u}=P(x,y,w,u)
\displaystyle b_{x,w,u}=P(x,w,u)

Similarly to the proof of Theorem 4, it is easy to show that the above set of a_{x,w,u},b_{x,w,u} is in the feasible space, and therefore, LB and UB bound the actual causal effects. ∎

Appendix C Proof of Theorem 7

Theorem 7.

Let G be a causal diagram containing nodes \{V_{1},...,V_{n-3},X,Y,Z\}. Let O be any observational data compatible with G. Suppose there exists a set of variables that satisfies the back-door or front-door criterion relative to (X,Y) in G. Then, (G,O) is equivalent to (G^{\prime},O^{\prime}) (G^{\prime} contains nodes \{V_{1},...,V_{n-3},X,Y,W,U\}; O^{\prime} is observational data compatible with G^{\prime}), where the number of states in W times the number of states in U is equal to the number of states in Z, and the structure of G^{\prime} and the observational data O^{\prime} are obtained as follows:

Structure of G^{\prime}:
Let Parents_{G}(H) be the parents of H in causal diagram G.
Parents_{G^{\prime}}(U)=Parents_{G}(Z), Parents_{G^{\prime}}(W)=Parents_{G}(Z)\cup\{U\},
Parents_{G^{\prime}}(H)=Parents_{G}(H) if Z\notin Parents_{G}(H) for H\in\{V_{1},...,V_{n-3},X,Y\},
Parents_{G^{\prime}}(H)=Parents_{G}(H)\setminus\{Z\}\cup\{W,U\}
                           if Z\in Parents_{G}(H) for H\in\{V_{1},...,V_{n-3},X,Y\}.

Note that, letting Q be the set of variables in G that satisfies the back-door or front-door criterion relative to (X,Y), Q^{\prime} satisfies the back-door or front-door criterion relative to (X,Y) in G^{\prime}, where
Q^{\prime}=Q if Z\notin Q,
Q^{\prime}=Q\setminus\{Z\}\cup\{W,U\} if Z\in Q.

Observational data:
Let the number of states in W be p, and let the number of states in U be q.
The states of Z are the Cartesian product of the states of W and the states of U.
In detail, (w_{j},u_{k}) is equivalent to z_{(j-1)*q+k}, w_{j} is equivalent to \lor_{k=1}^{q}(w_{j},u_{k})=\lor_{k=1}^{q}z_{(j-1)*q+k}, and u_{k} is equivalent to \lor_{j=1}^{p}(w_{j},u_{k})=\lor_{j=1}^{p}z_{(j-1)*q+k}, i.e., P(w_{j},u_{k},V)=P(z_{(j-1)*q+k},V) for any V\subseteq\{V_{1},...,V_{n-3},X,Y\}.

Proof.

First, we show that Q^{\prime} satisfies the back-door or front-door criterion relative to (X,Y) in G^{\prime}.

If Q satisfies the back-door criterion relative to (X,Y) in G, we need to show that,

  • no node in Q^{\prime} is a descendant of X;

  • Q^{\prime} blocks every path between X and Y that contains an arrow into X.

It is easy to show that if a node in Q^{\prime} is a descendant of X in G^{\prime}, then the corresponding node in Q is a descendant of X in G; and if a path between X and Y that contains an arrow into X is not blocked by Q^{\prime} in G^{\prime}, then the corresponding path between X and Y that contains an arrow into X is not blocked by Q in G. Thus, Q^{\prime} satisfies the back-door criterion relative to (X,Y) in G^{\prime}. Similarly, we can show that if Q satisfies the front-door criterion relative to (X,Y) in G, then Q^{\prime} satisfies the front-door criterion relative to (X,Y) in G^{\prime}.

Now, we show that (G,O) is equivalent to (G^{\prime},O^{\prime}), i.e., that P(y|do(x)) is the same under (G,O) and (G^{\prime},O^{\prime}). Suppose Q satisfies the back-door criterion relative to (X,Y) in G. By the adjustment formula in Equation 1, we have,
P(y|do(x))=\sum_{q\in Q}P(y|x,q)\times P(q)=\sum_{q\in Q}\frac{P(x,y,q)\times P(q)}{P(x,q)}.
And in G^{\prime},
P(y|do(x))=\sum_{q\in Q^{\prime}}P(y|x,q)\times P(q)=\sum_{q\in Q^{\prime}}\frac{P(x,y,q)\times P(q)}{P(x,q)}.
These two causal effects are obviously the same, because P(w_{j},u_{k},V)=P(z_{(j-1)*q+k},V) for any V\subseteq\{V_{1},...,V_{n-3},X,Y\}.
Similarly, we can show that if Q satisfies the front-door criterion relative to (X,Y) in G, then (G,O) is equivalent to (G^{\prime},O^{\prime}). ∎
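The Cartesian-product state mapping in Theorem 7 can be sketched as follows (1-based indices, as in the theorem; the helper names are ours):

```python
# With W having p states and U having q states, the pair (w_j, u_k)
# corresponds to the Z state z_{(j-1)*q + k}.
def z_index(j, k, q):
    """Map the pair (w_j, u_k) to the index of the equivalent Z state."""
    return (j - 1) * q + k

def wu_pair(i, q):
    """Inverse map: recover (j, k) from the Z-state index i."""
    return ((i - 1) // q + 1, (i - 1) % q + 1)

p, q = 16, 16
# Round trip over all p*q = 256 states of Z.
assert all(z_index(*wu_pair(i, q), q) == i for i in range(1, p * q + 1))
print(wu_pair(z_index(3, 5, q), q))  # -> (3, 5)
```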

Appendix D Simulation Algorithm for Generating Sample Distributions in Sections 4.3, 5.3, and 5.4

input : n causal diagram nodes (X_{1},...,X_{n})
        Distribution D
output : n conditional probability tables for P(X_{i}|Parents(X_{i}))
begin
    for i \leftarrow 1 to n do
        s \leftarrow num-instantiates(X_{i})
        p \leftarrow num-instantiates(Parents(X_{i}))
        for k \leftarrow 1 to p do
            sum \leftarrow 0
            for j \leftarrow 1 to s do
                a_{j} \leftarrow sample(D)
                sum \leftarrow sum + a_{j}
            end for
            for j \leftarrow 1 to s do
                P(x_{i_{j}}|Parents(X_{i})_{k}) \leftarrow a_{j}/sum
            end for
        end for
    end for
end
Algorithm 2 Generate-cpt()

In our simulation studies, we set D in Algorithm 2 to the uniform distribution.
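A minimal Python sketch of Algorithm 2, assuming D is the uniform distribution on [0,1] as in our simulations (function names are ours):

```python
import random

def generate_cpt(num_states, num_parent_states, sample=random.random):
    """Return a CPT as a list of rows, one per parent instantiation;
    each row is a normalized distribution over the node's states."""
    cpt = []
    for _ in range(num_parent_states):
        raw = [sample() for _ in range(num_states)]  # a_j <- sample(D)
        total = sum(raw)
        cpt.append([a / total for a in raw])         # normalize the row
    return cpt

# Example: a CPT for a 256-state Z with no parents (a single row).
cpt = generate_cpt(256, 1)
print(sum(cpt[0]))  # each row sums to 1
```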

Appendix E Construction of the Data in Table 4 of Section 5.1

\displaystyle P(u,w)=P(z_{1}),
\displaystyle P(u,w^{\prime})=P(z_{2}),
\displaystyle P(u^{\prime},w)=P(z_{3}),
\displaystyle P(u^{\prime},w^{\prime})=P(z_{4}),
\displaystyle P(u)=P(u,w)+P(u,w^{\prime})=P(z_{1})+P(z_{2})=0.5,
\displaystyle P(w|u)=P(u,w)/P(u)=P(z_{1})/P(u)=0.3/0.5=0.6,
\displaystyle P(w|u^{\prime})=P(u^{\prime},w)/P(u^{\prime})=P(z_{3})/(1-P(u))=0.2/0.5=0.4,
\displaystyle P(x|u,w)=P(x|z_{1})=0.1,
\displaystyle P(x|u,w^{\prime})=P(x|z_{2})=0.4,
\displaystyle P(x|u^{\prime},w)=P(x|z_{3})=0.5,
\displaystyle P(x|u^{\prime},w^{\prime})=P(x|z_{4})=0.7,
\displaystyle P(y|x,u,w)=P(y|x,z_{1})=0.2,
\displaystyle P(y|x^{\prime},u,w)=P(y|x^{\prime},z_{1})=0.3,
\displaystyle P(y|x,u,w^{\prime})=P(y|x,z_{2})=0.7,
\displaystyle P(y|x^{\prime},u,w^{\prime})=P(y|x^{\prime},z_{2})=0.1,
\displaystyle P(y|x,u^{\prime},w)=P(y|x,z_{3})=0.6,
\displaystyle P(y|x^{\prime},u^{\prime},w)=P(y|x^{\prime},z_{3})=0.5,
\displaystyle P(y|x,u^{\prime},w^{\prime})=P(y|x,z_{4})=0.5,
\displaystyle P(y|x^{\prime},u^{\prime},w^{\prime})=P(y|x^{\prime},z_{4})=0.4.

Appendix F Construction of the Distribution in Section 5.3

Instead of providing the resulting 1024 rows of the observational data, we provide the details for regenerating the observational data in the following steps.

  • Generate P(X,Y,Z) using Algorithm 2.

  • Let P(X,Y,w_{j},u_{k})=P(X,Y,z_{(j-1)*16+k}).

  • Let P(X,Y,w_{j})=\sum_{k=1}^{q}P(X,Y,w_{j},u_{k}).

  • Let P(X,Y,u_{k})=\sum_{j=1}^{p}P(X,Y,w_{j},u_{k}).

  • Let P(u_{k})=\sum_{X,Y}P(X,Y,u_{k}).

For example,

\displaystyle P(u_{1})=\sum_{X,Y}P(X,Y,u_{1})
\displaystyle=P(x,y,u_{1})+P(x,y^{\prime},u_{1})+P(x^{\prime},y,u_{1})+P(x^{\prime},y^{\prime},u_{1})
\displaystyle=\sum_{j=1}^{16}P(x,y,w_{j},u_{1})+\sum_{j=1}^{16}P(x,y^{\prime},w_{j},u_{1})+\sum_{j=1}^{16}P(x^{\prime},y,w_{j},u_{1})+\sum_{j=1}^{16}P(x^{\prime},y^{\prime},w_{j},u_{1})
\displaystyle=\sum_{j=1}^{16}P(x,y,z_{(j-1)*16+1})+\sum_{j=1}^{16}P(x,y^{\prime},z_{(j-1)*16+1})+\sum_{j=1}^{16}P(x^{\prime},y,z_{(j-1)*16+1})+\sum_{j=1}^{16}P(x^{\prime},y^{\prime},z_{(j-1)*16+1}),
\displaystyle P(x,y,w_{1})=\sum_{k=1}^{16}P(x,y,w_{1},u_{k})=\sum_{k=1}^{16}P(x,y,z_{k}).
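The steps above can be sketched as follows (we generate a random joint P(X,Y,Z) in place of Algorithm 2's output; the array names are ours):

```python
import numpy as np

# Derive the partially observable quantities P(X,Y,W) and P(U) from a
# full joint P(X,Y,Z) with Z having 256 states, for p = q = 16.
rng = np.random.default_rng(0)
p = q = 16

# A random joint P(X,Y,Z) (X, Y binary; Z has p*q states), normalized.
joint = rng.random((2, 2, p * q))
joint /= joint.sum()

# P(X,Y,w_j,u_k) = P(X,Y,z_{(j-1)*q+k}): reshaping the Z axis into a
# (W, U) grid realizes the Cartesian-product state mapping.
joint_wu = joint.reshape(2, 2, p, q)

Pxyw = joint_wu.sum(axis=3)       # P(X,Y,w_j): marginalize out U
Pxyu = joint_wu.sum(axis=2)       # P(X,Y,u_k): marginalize out W
Pu = Pxyu.sum(axis=(0, 1))        # P(u_k): marginalize out X and Y

print(Pu.sum())  # marginals of a proper joint sum to 1
```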