Conditional Adjustment in a Markov Equivalence Class
Sara LaPlante Emilija Perković University of Washington University of Washington
Abstract
We consider the problem of identifying a conditional causal effect through covariate adjustment. We focus on the setting where the causal graph is known up to one of two types of graphs: a maximally oriented partially directed acyclic graph (MPDAG) or a partial ancestral graph (PAG). Both MPDAGs and PAGs represent equivalence classes of possible underlying causal models. After defining adjustment sets in this setting, we provide a necessary and sufficient graphical criterion – the conditional adjustment criterion – for finding these sets under conditioning on variables unaffected by treatment. We further provide explicit sets from the graph that satisfy the conditional adjustment criterion, and therefore, can be used as adjustment sets for conditional causal effect identification.
1 INTRODUCTION
Many scientific disciplines have an interest in identifying and estimating causal effects for specific subgroups of a population. For instance, researchers may want to know if a medical treatment is beneficial for people with heart disease or if the treatment will harm older patients (Brand and Xie, 2010; Health, 2010). Such causal effects are referred to as conditional causal effects or heterogeneous causal effects. The identification of these conditional causal effects from observational data is the subject of this work.
Much of the literature on estimating conditional causal effects from observational data focuses on the conditional average treatment effect (CATE; Athey and Imbens, 2016; Wager and Athey, 2018; Künzel et al., 2019; Nie and Wager, 2021; Kennedy et al., 2022). The CATE is represented as a contrast of means for a response $Y$ under different do-interventions (see Section 2 for definition) of a treatment $X$ when conditioning on a set of covariate values $\mathbf{z}$. These means take the form $E[Y \mid do(X = x), \mathbf{Z} = \mathbf{z}]$.
Some results on CATE estimation assume that the conditioning set $\mathbf{Z}$ is rich enough to capture all relevant common causes of $X$ and $Y$ – meaning that $X$ and $Y$ are unconfounded given $\mathbf{Z}$. This implies
$$E[Y \mid do(X = x), \mathbf{Z} = \mathbf{z}] = E[Y \mid X = x, \mathbf{Z} = \mathbf{z}], \tag{1}$$
which allows the CATE to be estimated as a difference of means from observational data.
However, this assumption does not hold in all applications. Consider, for example, the setting depicted in the causal directed acyclic graph (DAG) of Figure 1, where we want to compute a causal effect of $X$ on $Y$ given some set $\mathbf{Z}$. In this setting, age and smoking status are common causes of $X$ and $Y$, and therefore, $X$ and $Y$ are confounded unless we condition on both age and smoking status ($\mathbf{Z} = \{\text{age}, \text{smoking status}\}$). But we may want to know the causal effect of $X$ on $Y$ conditional on age alone ($\mathbf{Z} = \{\text{age}\}$).
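To allow for estimation of the CATE in such cases, various recent works (Abrevaya et al., 2015; Fan et al., 2022; Chernozhukov et al., 2023; Smucler et al., 2020) have proposed estimation methods that rely on knowing an additional set of covariates $\mathbf{S}$ that – together with $\mathbf{Z}$ – leads to $X$ and $Y$ being unconfounded. We refer to this set of variables as a conditional adjustment set (Definition 1). For such a set $\mathbf{S}$,
$$E[Y \mid do(X = x), \mathbf{Z} = \mathbf{z}] = \int_{\mathbf{s}} E[Y \mid X = x, \mathbf{Z} = \mathbf{z}, \mathbf{S} = \mathbf{s}] \, f(\mathbf{s} \mid \mathbf{z}) \, d\mathbf{s}. \tag{2}$$
In the example above, if $\mathbf{Z} = \{\text{age}\}$, then $\mathbf{S} = \{\text{smoking status}\}$.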
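As a numerical illustration of the adjustment functional in Equation (2), the sketch below builds a small hypothetical binary model with age group `Z`, smoking status `S`, treatment `X`, and response `Y`. The edge structure (Z → S, Z → X, S → X, Z → Y, S → Y, X → Y), all probabilities, and all function names are assumptions made for illustration only; they are not taken from Figure 1. The code compares the adjusted mean, computed from observational quantities alone, with the interventional mean obtained by dropping the factor for `X` from the joint distribution.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
p_z = {0: 0.6, 1: 0.4}                                   # P(Z = z)
p_s1 = {0: 0.2, 1: 0.5}                                  # P(S = 1 | Z = z)
p_x1 = {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.8}  # P(X = 1 | z, s)
p_y1 = {(x, z, s): 0.1 + 0.3 * x + 0.2 * z + 0.25 * s        # P(Y = 1 | x, z, s)
        for x, z, s in product((0, 1), repeat=3)}


def bern(p, v):
    """Probability that a Bernoulli(p) variable takes the value v."""
    return p if v == 1 else 1 - p


def joint(z, s, x, y):
    """Observational joint density implied by the assumed DAG."""
    return (p_z[z] * bern(p_s1[z], s) * bern(p_x1[(z, s)], x)
            * bern(p_y1[(x, z, s)], y))


def prob(**fixed):
    """Observational probability of the event that fixes the given variables."""
    total = 0.0
    for z, s, x, y in product((0, 1), repeat=4):
        values = {"z": z, "s": s, "x": x, "y": y}
        if all(values[k] == v for k, v in fixed.items()):
            total += joint(z, s, x, y)
    return total


def adjusted_mean(x, z):
    """sum_s E[Y | x, z, s] P(s | z), using observational quantities only."""
    out = 0.0
    for s in (0, 1):
        e_y = prob(y=1, x=x, z=z, s=s) / prob(x=x, z=z, s=s)
        out += e_y * prob(s=s, z=z) / prob(z=z)
    return out


def interventional_mean(x, z):
    """E[Y | do(x), z] from the model with the factor for X removed."""
    return sum(bern(p_s1[z], s) * p_y1[(x, z, s)] for s in (0, 1))


def naive_mean(x, z):
    """E[Y | x, z], which ignores the confounder S."""
    return prob(y=1, x=x, z=z) / prob(x=x, z=z)
```

In this toy model, `adjusted_mean` agrees with `interventional_mean` for every combination of `x` and `z`, while `naive_mean` does not, because `S` remains an open common cause of `X` and `Y` after conditioning on `Z` alone.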
Of course, not all conditional causal effect research focuses on estimation through the functional in Equation (2). Notably, other work has explored identifiability without limiting focus to a particular functional. For example, Shpitser and Pearl (2008) and Jaber et al. (2019, 2022) focus on the conditions under which the interventional distribution $f(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z})$ is identifiable given a causal graph. Though these results broaden the options for identification, estimators based on these results would have to rely on functionals that may prove difficult to estimate (Shpitser and Pearl, 2008; Jaber et al., 2019, 2022). Our work addresses this by focusing on identification of the same interventional distribution given a causal graph – but through the use of conditional adjustment sets, which may lead to more desirable estimators. To the best of our knowledge, this area of research is largely unexplored.
Our main contribution is the conditional adjustment criterion (Definitions 2 and 7), a graphical criterion that we show is necessary and sufficient for identifying a conditional adjustment set (Theorems 3 and 9). We additionally provide explicit sets that satisfy this criterion when any such set exists. We note, however, that these results are restricted to a setting where the conditioning set consists of variables known to be unaffected by treatment. While this restricted setting produces limitations (see the second example in the discussion, Section 5), our results are broadly applicable to a variety of research questions. For example, the restriction is met when the conditioning set includes exclusively pre-treatment variables.
In considering the problem of identifying a conditional adjustment set, we assume that the underlying causal system can be represented by a causal DAG. When we collect observational data on all variables in the system, we can attempt to learn this causal DAG by relying on the constraints present in the data (Spirtes et al., 1999; Chickering, 2002; Zhang, 2008b; Hauser and Bühlmann, 2012; Mooij et al., 2020; Squires and Uhler, 2022). However, this task is often impossible from observational data alone, regardless of the available sample size. And further, we cannot always observe every variable.
Thus, our work focuses on causal models that represent Markov equivalence classes of graphs that can be learned from observational data: a maximally oriented partially directed acyclic graph (MPDAG; Meek, 1995) and a maximally oriented partial ancestral graph (PAG; Richardson and Spirtes, 2002). An MPDAG represents a restriction of the Markov equivalence class of DAGs that can be learned from observational data and background knowledge when all variables are observed (Andersson et al., 1997; Meek, 1995; Chickering, 2002). A PAG represents a Markov equivalence class of maximal ancestral graphs (MAGs; Richardson and Spirtes, 2002), which can be learned from observational data and which allows for unobserved variables (Spirtes et al., 2000; Zhang, 2008b; Ali et al., 2009). A MAG, in turn, can be seen as a marginalization of a DAG containing only the observed variables (Richardson and Spirtes, 2002). See Section 2 and Supp. A for further definitions.
The structure of this paper is as follows: Section 2 provides preliminary definitions, with the remaining definitions given in Supp. A. Section 3 contains all results for the MPDAG setting. In particular, we introduce our conditional adjustment criterion in Section 3.1; Section 3.2 illustrates applications of our criterion with examples; Section 3.3 provides several methods for constructing conditional adjustment sets; and Section 3.4 includes a discussion of the similarities of our conditional adjustment criterion with both the adjustment criterion of Perković et al. (2017) and the $\mathbf{Z}$-dependent dynamic adjustment criterion of Smucler et al. (2020). We present some analogous results for PAGs in Section 4, and we discuss some limitations of our results and areas for future work in Section 5.
2 PRELIMINARIES
We use capital letters (e.g. $X$) to denote nodes in a graph as well as random variables that these nodes represent. Similarly, bold capital letters (e.g. $\mathbf{X}$) are used to denote node sets and random vectors.
Nodes, Edges, and Subgraphs. A graph $\mathcal{G} = (\mathbf{V}, \mathbf{E})$ consists of a set of nodes (variables) $\mathbf{V}$ and a set of edges $\mathbf{E}$. Edges can be directed ($\to$), bi-directed ($\leftrightarrow$), undirected ($-$ or $\circ\!-\!\circ$), or partially directed ($\circ\!\!\to$). We use $\bullet$ as a stand in for any of the allowed edge marks. An edge is into (out of) a node $X$ if the edge has an arrowhead (tail) at $X$. An induced subgraph $\mathcal{G}_{\mathbf{V'}} = (\mathbf{V'}, \mathbf{E'})$ of $\mathcal{G}$ consists of $\mathbf{V'} \subseteq \mathbf{V}$ and $\mathbf{E'} \subseteq \mathbf{E}$, where $\mathbf{E'}$ are all edges in $\mathbf{E}$ between nodes in $\mathbf{V'}$.
Directed and Partially Directed Graphs. A directed graph contains only directed edges ($\to$). A partially directed graph may contain undirected edges ($-$) and directed edges ($\to$).
Mixed and Partially Directed Mixed Graphs. A mixed graph may contain directed and bi-directed edges. The partially directed mixed graphs we consider can contain any of the following edge types: $\circ\!-\!\circ$, $\circ\!\!\to$, $\to$, and $\leftrightarrow$. Hence, an edge $\bullet\!\!\to$ in a partially directed graph can only refer to edge $\to$, whereas in a partially directed mixed graph, $\bullet\!\!\to$ can represent $\to$, $\leftrightarrow$, or $\circ\!\!\to$.
Paths and Cycles. For disjoint node sets and , a path from to is a sequence of distinct nodes from some to some for which every pair of successive nodes is adjacent. A path consisting of undirected edges ( or ) is an undirected path. A directed path from to is a path of the form . A directed path from to and the edge form a directed cycle. A directed path from to and the edge form an almost directed cycle. A path , , in a graph is a possibly directed path if no edge , is in (Perković et al., 2017, Zhang, 2008a).
A path from to is proper (w.r.t. ) if only its first node is in . A path from to is a back-door path if does not begin with a visible edge out of (see definition of visible below; Pearl, 2009, Maathuis and Colombo, 2015). For a path and such that , we define the subpath of from to as the path .
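Colliders, Shields, and Definite Status Paths. If a path $p$ contains $X_i \bullet\!\!\to X_j \leftarrow\!\!\bullet X_k$ as a subpath, then $X_j$ is a collider on $p$. A path $\langle X_i, X_j, X_k \rangle$ is an unshielded triple if $X_i$ and $X_k$ are not adjacent. A path is unshielded if all successive triples on the path are unshielded. A node $X_j$ is a definite non-collider on a path $p$ if the edge $X_i \leftarrow X_j$ or $X_j \to X_k$ is on $p$, or if $X_i - X_j - X_k$ is an undirected subpath of $p$ and $X_i$ is not adjacent to $X_k$. A node is of definite status on a path if it is a collider, a definite non-collider, or an endpoint on the path. A path $p$ is of definite status if every node on $p$ is of definite status.
Blocking, D-separation, and M-separation. Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a directed or partially directed graph $\mathcal{G}$. A definite-status path $p$ from $\mathbf{X}$ to $\mathbf{Y}$ is d-connecting given $\mathbf{Z}$ if every definite non-collider on $p$ is not in $\mathbf{Z}$ and every collider on $p$ has a descendant in $\mathbf{Z}$. Otherwise, $\mathbf{Z}$ blocks $p$. If $\mathbf{Z}$ blocks all definite status paths between $\mathbf{X}$ and $\mathbf{Y}$ in $\mathcal{G}$, then $\mathbf{X}$ is d-separated from $\mathbf{Y}$ given $\mathbf{Z}$ in $\mathcal{G}$ and we write $(\mathbf{X} \perp_d \mathbf{Y} \mid \mathbf{Z})_{\mathcal{G}}$ (Pearl, 2009).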
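For the special case of a fully directed acyclic graph, d-separation can be checked without enumerating paths. The sketch below uses the standard moralization approach: restrict the graph to the ancestors of the three node sets, connect parents that share a child, drop edge directions, delete the conditioning set, and test reachability. The representation (a dictionary mapping each child to its set of parents) and the function names are assumptions for illustration, and the code does not handle undirected or partially directed edges.

```python
def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG `parents`."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)

    # Moralize the ancestral subgraph: link each child to its parents
    # and link every pair of parents that share a child.
    nbrs = {v: set() for v in keep}
    for child in keep:
        pas = sorted(parents.get(child, ()))
        for p in pas:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(pas):
            for q in pas[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # Search from xs in the undirected graph with zs removed.
    frontier = [x for x in xs if x not in zs]
    seen = set(frontier)
    while frontier:
        v = frontier.pop()
        if v in ys:
            return False
        for w in nbrs[v]:
            if w not in seen and w not in zs:
                seen.add(w)
                frontier.append(w)
    return True


# Collider A -> C <- B with a descendant C -> D.
collider_dag = {"C": {"A", "B"}, "D": {"C"}}
# Chain A -> M -> B.
chain_dag = {"M": {"A"}, "B": {"M"}}
```

With these two graphs, `A` and `B` are d-separated by the empty set in `collider_dag` but become d-connected once the collider's descendant `D` is conditioned on, whereas in `chain_dag` conditioning on `M` is what separates them.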
If $\mathcal{G}$ is a mixed or partially directed mixed graph, the analogous terms to d-connection and d-separation are called m-connection and m-separation (Richardson and Spirtes, 2002). If a path is not m-connecting in such a graph we will also call it blocked. We will also use the same notation to denote m-separation in a mixed or partially directed mixed graph $\mathcal{G}$.
Ancestral Relationships. If , then is a parent of . If , , , or , then is a possible parent of . If there is a directed path from to , such as , , , then is an ancestor of , is a descendant of , and are mediators for and . We use the convention that if is a descendant of , then is also a mediator for and . If there is a possibly directed path from to , then is a possible ancestor of , is a possible descendant of , and any node on this path that is not is a possible mediator of and . We use the convention that if is a possible descendant of , then is also a possible mediator for and . We also use the convention that every node is an ancestor, descendant, possible ancestor, and possible descendant of itself. The sets of parents, possible parents, ancestors, descendants, possible ancestors, and possible descendants of in are denoted by , , , , , and , respectively. Similarly, we denote the sets of mediators and possible mediators for and in by and .
We let , with analogous definitions for , , and . For disjoint node sets and , we let be the union of all mediators of and that lie on a proper causal path from to , with an analogous definition for . Unconventionally, we define . We denote that is adjacent to in by .
DAGs and PDAGs. A directed graph without directed cycles is a directed acyclic graph (DAG). A partially directed acyclic graph (PDAG) is a partially directed graph without directed cycles.
MAGs. A mixed graph without directed or almost directed cycles is called ancestral. Note that we do not consider ancestral graphs that represent selection bias (see Zhang, 2008a, for details). A maximal ancestral graph (MAG) is an ancestral graph where every pair of non-adjacent nodes and in can be m-separated by a set . A DAG with unobserved variables can be uniquely represented by a MAG , which preserves the ancestry and m-separations among the observed variables (Richardson and Spirtes, 2002).
MPDAGs and Markov Equivalence. All DAGs over a node set $\mathbf{V}$ with the same adjacencies and unshielded colliders can be uniquely represented by a completed PDAG (CPDAG). These DAGs form a Markov equivalence class with the same set of d-separations. A maximally oriented PDAG (MPDAG) is formed by taking a CPDAG, adding background knowledge (by directing undirected edges), and completing the orientation rules of Meek (1995). We say a DAG is represented by an MPDAG $\mathcal{G}$ if it has the same nodes, adjacencies, and directed edges as $\mathcal{G}$. The set of such DAGs – denoted by $[\mathcal{G}]$ – forms a restriction of the Markov equivalence class so that all DAGs in $[\mathcal{G}]$ have the same set of d-separations. Note that if $\mathcal{G}$ has the edge $A - B$, then $[\mathcal{G}]$ contains at least one DAG with $A \to B$ and one DAG with $A \leftarrow B$ (Meek, 1995). Further, note that all DAGs and CPDAGs are MPDAGs.
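The characterization of Markov equivalence by adjacencies and unshielded colliders translates directly into a small check. The sketch below assumes both DAGs are given over the same node set as dictionaries mapping each child to its set of parents; the function names are hypothetical.

```python
def skeleton(parents):
    """Set of adjacencies, ignoring edge direction."""
    return {frozenset((p, c)) for c, pas in parents.items() for p in pas}


def unshielded_colliders(parents):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(parents)
    found = set()
    for c, pas in parents.items():
        pas = sorted(pas)
        for i, a in enumerate(pas):
            for b in pas[i + 1:]:
                if frozenset((a, b)) not in skel:
                    found.add((a, c, b))
    return found


def markov_equivalent(dag1, dag2):
    """True if the two DAGs share adjacencies and unshielded colliders."""
    return (skeleton(dag1) == skeleton(dag2)
            and unshielded_colliders(dag1) == unshielded_colliders(dag2))


chain = {"B": {"A"}, "C": {"B"}}       # A -> B -> C
reverse_chain = {"A": {"B"}, "B": {"C"}}  # A <- B <- C
collider = {"B": {"A", "C"}}           # A -> B <- C
```

Here `chain` and `reverse_chain` fall in the same Markov equivalence class, while `collider` has the same adjacencies but a different unshielded collider and so does not.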
PAGs and Markov Equivalence. All MAGs that encode the same set of m-separations form a Markov equivalence class, which can be uniquely represented by a partial ancestral graph (PAG; Richardson and Spirtes, 2002; Ali et al., 2009). denotes all MAGs represented by a PAG . We say a DAG is represented by a PAG if there is a MAG such that is represented by .
We do not consider PAGs that represent selection bias (see Zhang, 2008b). Further, we only consider maximally informative PAGs (Zhang, 2008b). That is, if a PAG has the edge , then contains a MAG with and a MAG with . (We preclude MAGs with by assuming no selection bias.) Any arrowhead or tail edge mark in a PAG corresponds to that same arrowhead or tail edge mark in every MAG in . The edge orientations in every PAG we consider are completed with respect to orientation rules and of Zhang (2008b).
Visible and Invisible Edges. Given a MAG or PAG , a directed edge is visible in if there is a node such that contains either or , where and (Zhang, 2006). A directed edge that is not visible in a MAG or PAG is said to be invisible.
Markov Compatibility and Positivity. An observational density $f(\mathbf{v})$ is Markov compatible with a DAG $\mathcal{D} = (\mathbf{V}, \mathbf{E})$ if $f(\mathbf{v}) = \prod_{V_i \in \mathbf{V}} f(v_i \mid pa(v_i, \mathcal{D}))$. If $f(\mathbf{v})$ is Markov compatible with a DAG $\mathcal{D}$, then it is Markov compatible with every DAG that is Markov equivalent to $\mathcal{D}$ (Pearl, 2009). Hence, we say that a density is Markov compatible with an MPDAG, MAG, or PAG $\mathcal{G}$ if it is Markov compatible with a DAG represented by $\mathcal{G}$. Throughout, we assume positivity. That is, we only consider distributions that satisfy $f(\mathbf{v}) > 0$ for all valid values of $\mathbf{V}$ (Kivva et al., 2023).
Probabilistic Implications of Graph Separation. Let , , and be pairwise disjoint node sets in a DAG, MPDAG, MAG, or PAG . If and are d-separated or m-separated given in , then and are conditionally independent given in any observational density that is Markov compatible with (Lauritzen et al., 1990; Zhang, 2008a; Henckel et al., 2022).
Causal Graphs. Let be a graph with nodes and . When is an MPDAG, it is a causal MPDAG if every edge represents a direct causal effect of on and if every edge represents a direct causal effect of unknown direction (either affects or affects ). Note that all DAGs are MPDAGs.
When is a MAG or PAG, it is a causal MAG or causal PAG, respectively, if every edge represents the presence of a causal path from to ; every edge represents the absence of a causal path from to ; and every edge represents the presence of a causal path of unknown direction or a common cause in the underlying causal DAG.
Causal and Non-causal Paths. Note that any directed or possibly directed path in a causal graph is causal or possibly causal, respectively. However, since we focus on causal graphs, we will use this causal terminology for paths in any of our graphs. We will say a path is non-causal if it is not possibly causal.
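Consistency. Let $f(\mathbf{v})$ be an observational density over $\mathbf{V}$. The notation $do(\mathbf{X} = \mathbf{x})$, or $do(\mathbf{x})$ for short, represents an outside intervention that sets $\mathbf{X} \subseteq \mathbf{V}$ to fixed values $\mathbf{x}$. An interventional density $f(\mathbf{v} \mid do(\mathbf{x}))$ is a density resulting from such an intervention.
Let $\mathbf{F}^*$ denote the set of all interventional densities $f(\mathbf{v} \mid do(\mathbf{x}))$ such that $\mathbf{X} \subseteq \mathbf{V}$ (including $\mathbf{X} = \emptyset$). A causal DAG $\mathcal{D} = (\mathbf{V}, \mathbf{E})$ is a causal Bayesian network compatible with $\mathbf{F}^*$ if and only if for all $f(\mathbf{v} \mid do(\mathbf{x})) \in \mathbf{F}^*$, the following truncated factorization holds:
$$f(\mathbf{v} \mid do(\mathbf{x})) = \prod_{V_i \in \mathbf{V} \setminus \mathbf{X}} f(v_i \mid pa(v_i, \mathcal{D})) \, \mathbb{1}(\mathbf{X} = \mathbf{x}) \tag{3}$$
(Pearl, 2009; Bareinboim et al., 2012). We say an interventional density is consistent with a causal DAG $\mathcal{D}$ if it belongs to a set of interventional densities $\mathbf{F}^*$ such that $\mathcal{D}$ is compatible with $\mathbf{F}^*$. Note that any observational density that is Markov compatible with $\mathcal{D}$ is consistent with $\mathcal{D}$. We say an interventional density is consistent with a causal MPDAG, MAG, or PAG $\mathcal{G}$ if it is consistent with each DAG represented by $\mathcal{G}$ – were the DAG to be causal.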
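Identifiability. Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a causal MPDAG or PAG $\mathcal{G}$, and let $\mathbf{F}^*_i$ be a set with which a DAG $\mathcal{D}_i$ represented by $\mathcal{G}$ is compatible – were $\mathcal{D}_i$ to be causal. We say the conditional causal effect of $\mathbf{X}$ on $\mathbf{Y}$ given $\mathbf{Z}$ is identifiable in $\mathcal{G}$ if for any $\mathbf{F}^*_1$, $\mathbf{F}^*_2$ where $f_1(\mathbf{v}) = f_2(\mathbf{v})$, we have $f_1(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}) = f_2(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z})$ (Pearl, 2009).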
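Forbidden Set. Let $\mathbf{X}$ and $\mathbf{Y}$ be disjoint node sets in an MPDAG or PAG $\mathcal{G}$. Then the forbidden set relative to $(\mathbf{X}, \mathbf{Y})$ in $\mathcal{G}$ is
$$\text{Forb}(\mathbf{X}, \mathbf{Y}, \mathcal{G}) = \{ W' \in \mathbf{V} : W' \in \text{PossDe}(W, \mathcal{G}) \text{ for some } W \notin \mathbf{X} \text{ that lies on a proper possibly causal path from } \mathbf{X} \text{ to } \mathbf{Y} \text{ in } \mathcal{G} \}. \tag{5}$$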
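In a fully directed DAG, possibly causal paths are simply directed paths, so the forbidden set reduces to the descendants of every node outside the treatment set that lies on a proper causal path from treatment to response. The sketch below computes this for a DAG stored as a dictionary from each node to its set of children. The helper names `reach` and `forbidden_set` are hypothetical, and the code does not cover undirected or circle edge marks.

```python
def reach(adj, starts, banned=frozenset()):
    """Nodes reachable from `starts` along `adj` without entering `banned`."""
    seen = {s for s in starts if s not in banned}
    stack = list(seen)
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in seen and w not in banned:
                seen.add(w)
                stack.append(w)
    return seen


def forbidden_set(children, xs, ys):
    """Forbidden set relative to (xs, ys) in a DAG given as node -> children."""
    xs, ys = set(xs), set(ys)
    parents = {}
    for p, cs in children.items():
        for c in cs:
            parents.setdefault(c, set()).add(p)

    # Nodes reachable from a child of xs without passing through xs again.
    first_steps = {c for x in xs for c in children.get(x, ())}
    below_x = reach(children, first_steps, banned=xs)
    # Nodes that reach ys (or are in ys) without passing through xs.
    above_y = reach(parents, ys, banned=xs)
    # Non-treatment nodes on some proper causal path from xs to ys.
    on_causal_path = below_x & above_y
    return reach(children, on_causal_path)


# Hypothetical DAG: C -> X, C -> Y, X -> M, M -> Y, M -> D, Y -> E.
example_dag = {"C": {"X", "Y"}, "X": {"M"}, "M": {"Y", "D"}, "Y": {"E"}}
```

For `example_dag` with treatment `X` and response `Y`, the forbidden set consists of the mediator `M`, the response `Y`, and their descendants `D` and `E`; the common cause `C` is not forbidden.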
3 RESULTS - MPDAGS
In this section, we present our results on identifying a conditional causal effect via our conditional adjustment criterion in the setting of an MPDAG (Definition 2). Examples of how to use our criterion and explicit conditional adjustment sets based on our criterion follow these results. We remark here that our criterion shares similarities with the adjustment criterion for total effect identification of Perković et al. (2017) and with the $\mathbf{Z}$-dependent dynamic adjustment criterion of Smucler et al. (2020), but we save these results and reflections for Section 3.4.
Note that the results of this section hold when a fully oriented DAG is known, since all DAGs are MPDAGs. Throughout, our goal is to identify the conditional causal effect of treatments $\mathbf{X}$ on responses $\mathbf{Y}$ conditional on covariates $\mathbf{Z}$, given a known graph $\mathcal{G}$.
3.1 Conditional Adjustment Criterion
We include our definition of a conditional adjustment set below (Definition 1). Note that, while this section focuses on MPDAGs, we write Definition 1 broadly for further use in Section 4. Our goal in this section is to find an equivalent graphical characterization of a conditional adjustment set. Theorem 3 establishes that Definition 2 provides such a graphical characterization, which we call the conditional adjustment criterion, under the assumption that the conditioning set does not contain variables affected by treatment ($\mathbf{Z} \cap \text{PossDe}(\mathbf{X}, \mathcal{G}) = \emptyset$).
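Definition 1
(Conditional Adjustment Set for MPDAGs, PAGs) Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in a causal MPDAG or PAG $\mathcal{G}$. Then $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in $\mathcal{G}$ if for any density $f$ consistent with $\mathcal{G}$
$$f(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z}) = \int_{\mathbf{s}} f(\mathbf{y} \mid \mathbf{x}, \mathbf{z}, \mathbf{s}) \, f(\mathbf{s} \mid \mathbf{z}) \, d\mathbf{s}. \tag{6}$$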
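Definition 2
(Conditional Adjustment Criterion for MPDAGs) Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in an MPDAG $\mathcal{G}$, where $\mathbf{Z} \cap \text{PossDe}(\mathbf{X}, \mathcal{G}) = \emptyset$ and where every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed edge. Then $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in $\mathcal{G}$ if
(a) $\mathbf{S} \cap \text{Forb}(\mathbf{X}, \mathbf{Y}, \mathcal{G}) = \emptyset$, and
(b) $\mathbf{S} \cup \mathbf{Z}$ blocks all proper non-causal definite status paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$.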
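The two conditions of the criterion can be checked mechanically when the graph is a fully directed DAG, where every path is of definite status and the requirement on the first edge holds automatically. The sketch below does so by brute-force path enumeration, which is only practical for small graphs. It is a minimal illustration under stated assumptions: the DAG is a dictionary from each node to its set of children, all function names are hypothetical, and undirected edges of a general MPDAG are not supported.

```python
def descendants(children, nodes):
    """Return `nodes` together with everything reachable along directed edges."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for c in children.get(v, ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def proper_paths(children, xs, ys):
    """Yield each path from xs to ys whose only node in xs is its first node."""
    nbrs = {}
    for p, cs in children.items():
        for c in cs:
            nbrs.setdefault(p, set()).add(c)
            nbrs.setdefault(c, set()).add(p)

    def extend(path):
        if path[-1] in ys:
            yield path
        for w in nbrs.get(path[-1], ()):
            if w not in path and w not in xs:
                yield from extend(path + [w])

    for x in xs:
        yield from extend([x])


def is_causal(children, path):
    """True if every edge on the path points toward the path's last node."""
    return all(b in children.get(a, ()) for a, b in zip(path, path[1:]))


def is_blocked(children, path, cond):
    """True if the conditioning set `cond` blocks the path."""
    for a, v, b in zip(path, path[1:], path[2:]):
        collider = v in children.get(a, ()) and v in children.get(b, ())
        if collider and not (descendants(children, {v}) & cond):
            return True
        if not collider and v in cond:
            return True
    return False


def satisfies_criterion(children, xs, ys, zs, s):
    """Check conditions (a) and (b) for S relative to (X, Y, Z) in a DAG."""
    xs, ys, zs, s = set(xs), set(ys), set(zs), set(s)
    if zs & descendants(children, xs):
        raise ValueError("Z must not contain descendants of X")
    paths = list(proper_paths(children, xs, ys))
    on_causal = {v for p in paths if is_causal(children, p) for v in p[1:]}
    forb = descendants(children, on_causal)
    non_causal = [p for p in paths if not is_causal(children, p)]
    return not (s & forb) and all(
        is_blocked(children, p, s | zs) for p in non_causal)


# Hypothetical DAG: Age -> X, Age -> Y, Smoking -> X, Smoking -> Y, X -> M -> Y.
dag = {"Age": {"X", "Y"}, "Smoking": {"X", "Y"}, "X": {"M"}, "M": {"Y"}}
```

In this hypothetical graph with conditioning set `{"Age"}`, the set `{"Smoking"}` satisfies both conditions, the empty set fails (b) because the path through `Smoking` stays open, and `{"Smoking", "M"}` fails (a) because `M` lies in the forbidden set.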
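Theorem 3
(Completeness, Soundness of Conditional Adjustment Criterion for MPDAGs) Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in a causal MPDAG $\mathcal{G}$, where $\mathbf{Z} \cap \text{PossDe}(\mathbf{X}, \mathcal{G}) = \emptyset$. Then $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in $\mathcal{G}$ (Definition 1) if and only if $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in $\mathcal{G}$ (Definition 2).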
Proof of Theorem 3.
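First note the following facts.
(i) Every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed edge.
(ii) $\mathbf{Z} \cap \text{De}(\mathbf{X}, \mathcal{D}) = \emptyset$ in every DAG $\mathcal{D}$ in $[\mathcal{G}]$.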
-
(iii)
.
We have that (i) holds in either direction – by definition ($\Leftarrow$) or by Proposition 36 (Supp. C) ($\Rightarrow$). Then Lemmas 20 and 26 (Supp. B) imply (ii) and (iii), respectively, given $\mathbf{Z} \cap \text{PossDe}(\mathbf{X}, \mathcal{G}) = \emptyset$ and (i).
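Now consider the following statements.
(a) $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in $\mathcal{G}$.
(b) $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in each DAG in $[\mathcal{G}]$ – were the DAG to be causal.
(c) $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in each DAG in $[\mathcal{G}]$.
(d) $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in $\mathcal{G}$.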
3.2 Examples
To illustrate the usefulness of the results above, we provide examples below where we aim to find when . Theorem 3 allows us to use the conditional adjustment criterion to (a) check whether a set can be used for conditional adjustment (Examples 1-3) or (b) determine if no such set exists (Example 4).
Example 1
(Empty Conditional Adjustment Set.) Let be the causal MPDAG in Figure 2(a) (compare to Figure 5(a) of Perković (2020)), and let , , and . Note that and that every possibly causal path from to in starts with a directed edge.
Let . Note that , , and blocks all non-causal definite status paths from to . Thus, satisfies the conditional adjustment criterion relative to in , and by Theorem 3, .
Example 2
(Only Nonempty Conditional Adjustment Sets.) Again let be the causal MPDAG in Figure 2(a), where and . But now let . We still have that and that every possibly causal path from to in starts with a directed edge.
Note that if we let , does not block the path , which is a proper non-causal definite status path from to . Thus, the empty set is not a conditional adjustment set relative to in .
Consider, instead, the set . Note that , , and blocks all non-causal definite status paths from to . Thus, satisfies the conditional adjustment criterion relative to in , and by Theorem 3, .
Example 3
(Conditional Adjustment Set Contains Descendants of .) Let be the causal DAG (and therefore, MPDAG) in Figure 2(b) (compare to Figure 6(a) of Perković et al. (2018)), where we assume is a variable that cannot be measured. Define , , and . Note that .
Consider the set . Note that , , and blocks all proper non-causal paths from to in . Hence, satisfies the conditional adjustment criterion relative to in , and by Theorem 3, .
Example 4
(No Conditional Adjustment Set, Effect Non-identifiable.) Let be the causal MPDAG in Figure 2(c), and let , , and . Note that . However, is a proper possibly causal path from to in that starts with an undirected edge. Thus, by Theorem 3, there can be no conditional adjustment set relative to in . In fact, by Proposition 36 (Supp. C), is not identifiable in using any method.
3.3 Constructing Adjustment Sets
The conditional adjustment criterion provides a way to check if a set can be used for conditional adjustment given an MPDAG , but it does not provide a way to construct a conditional adjustment set – a task that may be difficult when is large. The results in this section provide such a roadmap under certain assumptions. The proofs can be found in Supp. F.
Lemma 4
Let , , and be pairwise disjoint node sets in a causal MPDAG , where and where every possibly causal path from to in starts with a directed edge. If , then the following is a conditional adjustment set relative to in :
(7) |
Theorem 5
Let , , and be pairwise disjoint node sets in a causal MPDAG , where and where every proper possibly causal path from to in starts with a directed edge.
-
(a)
If there is any conditional adjustment set relative to in , then the following set is one:
(8) -
(b)
Suppose . If there is any conditional adjustment set relative to in , then the following set is one:
(9)
3.4 Comparison of Contexts
In this section, we point out a bridge between our conditional adjustment results and prior literature on unconditional adjustment and adjustment under dynamic treatment. We begin by presenting Lemma 6, which provides an equivalence between our criterion and the criterion of Perković et al. (2017) used for unconditional adjustment given an MPDAG. Note that this lemma is used to prove Theorem 3 (see Figure 5 in Supp. D). See Supp. D for the lemma’s proof.
Lemma 6
Let , , , and be pairwise disjoint node sets in an MPDAG , where . Then we have the following.
- (a)
- (b)
Next we turn to the work of Smucler et al. (2020), where the authors consider causal effect estimation under a dynamic treatment. For this purpose, Smucler et al. (2020) define a dynamic adjustment set, which they then relate to the set used by Maathuis and Colombo (2015) for unconditional adjustment (Definition 11, Supp. A). Lemma 6 allows us to connect this dynamic adjustment to our work.
Before making this connection, we briefly describe the context of these authors’ work. Unlike a do-intervention that sets to fixed values , a dynamic intervention sets to values with probability . However, a do-intervention can be seen as a special case of a dynamic intervention where . Dynamic interventions are often of interest in personalized medicine (Robins, 1993; Murphy et al., 2001; Chakraborty and Moodie, 2013).
Smucler et al. (2020) refer to a causal effect under a dynamic intervention, whose assignment probability depends on , as a -dependent dynamic causal effect (also called a single stage dynamic treatment effect in Chakraborty and Moodie (2013)). They consider these causal effects in the setting where and are nodes, the given graph is a DAG, and the following assumption holds: . They then define a -dependent dynamic adjustment set as a set that satisfies
To compare these sets to our conditional adjustment sets, we reference Proposition 1 of Smucler et al. (2020). This result states that, under their assumptions, is a -dependent dynamic adjustment set if and only if is an adjustment set relative to in (Definition 11, Supp. A). It follows from Lemma 6 that is a -dependent dynamic adjustment set if and only if is a conditional adjustment set relative to in – when is a DAG such that . Thus, our results can be seen as generalizations of Smucler et al. (2020) for and, therefore, can be used for -dependent dynamic causal effect identification.
4 RESULTS - PAGS
We now extend our results on conditional adjustment to the setting of a PAG.
4.1 Conditional Adjustment Criterion
We first introduce our conditional adjustment criterion for PAGs (Definition 7). Note that the difference between this criterion and the analogous criterion for MPDAGs is the use of a visible as opposed to a directed edge. Visibility is a stronger condition introduced by Zhang (2008a) (see Supp. A for definition).
Following this, Lemma 8 provides an equivalence between our criterion and the criterion of Perković et al. (2018) used for unconditional adjustment given a PAG. Theorem 9 is our main result in this section. It establishes that, under restrictions on , the conditional adjustment criterion is an equivalent graphical characterization of a conditional adjustment set in causal PAGs. Proofs of these results are given in Supp. G.
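Definition 7
(Conditional Adjustment Criterion for PAGs) Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in a PAG $\mathcal{G}$, where $\mathbf{Z} \cap \text{PossDe}(\mathbf{X}, \mathcal{G}) = \emptyset$ and where every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a visible edge out of $\mathbf{X}$. Then $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in $\mathcal{G}$ if
(a) $\mathbf{S} \cap \text{Forb}(\mathbf{X}, \mathbf{Y}, \mathcal{G}) = \emptyset$, and
(b) $\mathbf{S} \cup \mathbf{Z}$ blocks all proper non-causal definite status paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$.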
Lemma 8
Let , , , and be pairwise disjoint node sets in a PAG , where . Then we have the following.
- (a)
- (b)
-
Proof of Lemma 8.
(a) Follows from the fact that .
(b) We start by noting the following fact. Since , then in every DAG represented by (Lemma 49, Supp. G). Then consider the following statements.
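(a) $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in $\mathcal{G}$.
(b) $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in each DAG represented by $\mathcal{G}$ – were the DAG to be causal.
(c) $\mathbf{S} \cup \mathbf{Z}$ is an adjustment set relative to $(\mathbf{X}, \mathbf{Y})$ in each DAG represented by $\mathcal{G}$ – were the DAG to be causal.
(d) $\mathbf{S} \cup \mathbf{Z}$ is an adjustment set relative to $(\mathbf{X}, \mathbf{Y})$ in $\mathcal{G}$.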
Theorem 9
4.2 Constructing Adjustment Sets
We now provide a method for constructing conditional adjustment sets given a causal PAG (Theorem 10). We illustrate this result in Example 6. The proof of Theorem 10 can be found in Supp. H.
Theorem 10
Let , , and be pairwise disjoint node sets in a causal PAG , where and where every proper possibly causal path from to in starts with a visible edge out of . If there is any conditional adjustment set relative to in , then the following set is one:
(10) | ||||
Example 6
Let be the causal PAG in Figure 3, and let , , and . Note that . Furthermore, the only possibly causal path from to is the edge , which is visible due to the presence of , where . If there is any conditional adjustment set relative to in , then the conditions of Theorem 10 are met. We consider the set from Equation (10).
To see that this is a conditional adjustment set relative to in , we note that it fulfills the requirements of Definition 7. That is, and blocks all proper non-causal definite status paths from to in .
5 DISCUSSION
This paper defines a conditional adjustment set that can be used to identify a causal effect in a setting where a causal MPDAG or PAG is known (Definition 1). We give necessary and sufficient graphical conditions for identifying such a set when $\mathbf{Z} \cap \text{PossDe}(\mathbf{X}, \mathcal{G}) = \emptyset$ (Theorems 3 and 9). Further, we provide multiple methods for constructing these sets (Sections 3.3 and 4.2). While our results can be used to identify a broad class of conditional causal effects, we discuss some limitations below.
One such limitation is that there are conditional causal effects that can be identified but cannot be identified using conditional adjustment sets. As an example, consider the causal DAG (and therefore, MPDAG) in Figure 4, and let , , and . Note that the conditional causal effect of on given is identifiable using do calculus rules (Pearl, 2009, see Equations (14)-(16) in Supp. B):
(11) | ||||
(12) | ||||
(13) |
The first two equalities follow from basic probability rules. Equation (11) follows from Rule 1 of the do calculus, since in . Equation (12) follows from Rule 3 of the do calculus, since in and in . Equation (13) follows from Rule 2 of the do calculus, since in and in .
However, we can show that there is no conditional adjustment set relative to in that could have been used to identify the effect above. To see this, note that since , we can use Theorem 3 to state the following. A set must satisfy the conditional adjustment criterion relative to in (Definition 2) in order to be a conditional adjustment set. Definition 2 requires that block the path , since it is a proper non-causal definite status path from to . It follows that must contain , but this contradicts Definition 2’s requirement that .
Adding to the limitation above, there are conditional causal effects that can be identified using conditional adjustment sets but where these conditional adjustment sets cannot be identified using our criterion. This can occur when , since our graphical criterion requires this restriction but our conditional adjustment set definition does not. As an example, consider again the causal DAG given in Figure 2(b), and let , , and . Since , no set satisfies the conditional adjustment criterion. However, using do calculus rules (Pearl, 2009), we can show that is a conditional adjustment set relative to in :
The first and second equality follow from basic probability rules. The third follows by Rules 2 and 3 of the do calculus, since in and in . Future work could address identification in this setting by expanding our graphical criterion to allow for arbitrary conditioning.
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant No. 2210210.
References
- Abrevaya et al. (2015) J. Abrevaya, Y.-C. Hsu, and R. P. Lieli. Estimating conditional average treatment effects. Journal of Business & Economic Statistics, 33(4):485–505, 2015.
- Ali et al. (2009) R. A. Ali, T. S. Richardson, and P. Spirtes. Markov equivalence for ancestral graphs. Annals of Statistics, 37:2808–2837, 2009.
- Andersson et al. (1997) S. A. Andersson, D. Madigan, and M. D. Perlman. A characterization of Markov equivalence classes for acyclic digraphs. Annals of Statistics, 25:505–541, 1997.
- Athey and Imbens (2016) S. Athey and G. Imbens. Recursive partitioning for heterogeneous causal effects. In Proceedings of the National Academy of Sciences, volume 113, pages 7353–7360, 2016.
- Bareinboim et al. (2012) E. Bareinboim, C. Brito, and J. Pearl. Local characterizations of causal Bayesian networks. In Graph Structures for Knowledge Representation and Reasoning: Second International Workshop, GKR 2011, Barcelona, Spain, July 16, 2011. Revised Selected Papers, pages 1–17. Springer, 2012.
- Brand and Xie (2010) J. E. Brand and Y. Xie. Who benefits most from college? Evidence for negative selection in heterogeneous economic returns to higher education. American Sociological Review, 75(2):273–302, 2010.
- Chakraborty and Moodie (2013) B. Chakraborty and E. E. Moodie. Statistical Methods for Dynamic Treatment Regimes. Springer, 2013.
- Chernozhukov et al. (2023) V. Chernozhukov, W. K. Newey, and R. Singh. A simple and general debiased machine learning theorem with finite-sample guarantees. Biometrika, 110(1):257–264, 2023.
- Chickering (2002) D. M. Chickering. Learning equivalence classes of Bayesian-network structures. Journal of Machine Learning Research, 2:445–498, 2002.
- Fan et al. (2022) Q. Fan, Y.-C. Hsu, R. P. Lieli, and Y. Zhang. Estimation of conditional average treatment effects with high-dimensional data. Journal of Business & Economic Statistics, 40(1):313–327, 2022.
- Hauser and Bühlmann (2012) A. Hauser and P. Bühlmann. Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research, 13:2409–2464, 2012.
- Health (2010) W. H. O. R. Health. Medical eligibility criteria for contraceptive use. World Health Organization, 2010.
- Henckel et al. (2022) L. Henckel, E. Perković, and M. H. Maathuis. Graphical criteria for efficient total effect estimation via adjustment in causal linear models. Journal of the Royal Statistical Society: Series B, pages 579–599, 2022.
- Jaber et al. (2019) A. Jaber, J. Zhang, and E. Bareinboim. Identification of conditional causal effects under Markov equivalence. In Proceedings of NeurIPS, pages 11516–11524, 2019.
- Jaber et al. (2022) A. Jaber, A. Ribeiro, J. Zhang, and E. Bareinboim. Causal identification under Markov equivalence: Calculus, algorithm, and completeness. In Proceedings of NeurIPS, volume 35, pages 3679–3690, 2022.
- Kalisch et al. (2012) M. Kalisch, M. Mächler, D. Colombo, M. H. Maathuis, and P. Bühlmann. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, 47(11):1–26, 2012.
- Kennedy et al. (2022) E. H. Kennedy, S. Balakrishnan, J. M. Robins, and L. Wasserman. Minimax rates for heterogeneous causal effect estimation. arXiv preprint arXiv:2203.00837, 2022.
- Kivva et al. (2023) Y. Kivva, J. Etesami, and N. Kiyavash. On identifiability of conditional causal effects. arXiv preprint arXiv:2306.11755, 2023.
- Künzel et al. (2019) S. R. Künzel, J. S. Sekhon, P. J. Bickel, and B. Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. In Proceedings of the National Academy of Sciences, volume 116, pages 4156–4165, 2019.
- Lauritzen and Spiegelhalter (1988) S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society: Series B, pages 157–224, 1988.
- Lauritzen et al. (1990) S. L. Lauritzen, A. P. Dawid, B. N. Larsen, and H.-G. Leimer. Independence properties of directed Markov fields. Networks, 20(5):491–505, 1990.
- Maathuis and Colombo (2015) M. H. Maathuis and D. Colombo. A generalized back-door criterion. Annals of Statistics, 43:1060–1088, 2015.
- Mardia et al. (1980) K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis (Probability and Mathematical Statistics). Academic Press London, 1980.
- Meek (1995) C. Meek. Causal inference and causal explanation with background knowledge. In Proceedings of UAI, pages 403–410, 1995.
- Mooij et al. (2020) J. M. Mooij, S. Magliacane, and T. Claassen. Joint causal inference from multiple contexts. The Journal of Machine Learning Research, 21(1):3919–4026, 2020.
- Murphy et al. (2001) S. A. Murphy, M. J. van der Laan, J. M. Robins, and Conduct Problems Prevention Research Group. Marginal mean models for dynamic regimes. Journal of the American Statistical Association, 96(456):1410–1423, 2001.
- Nie and Wager (2021) X. Nie and S. Wager. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299–319, 2021.
- Pearl (2009) J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2009.
- Perković (2020) E. Perković. Identifying causal effects in maximally oriented partially directed acyclic graphs. In Proceedings of UAI, pages 530–539, 2020.
- Perković et al. (2015) E. Perković, J. Textor, M. Kalisch, and M. H. Maathuis. A complete generalized adjustment criterion. In Proceedings of UAI, pages 682–691, 2015.
- Perković et al. (2017) E. Perković, M. Kalisch, and M. H. Maathuis. Interpreting and using CPDAGs with background knowledge. In Proceedings of UAI, 2017.
- Perković et al. (2018) E. Perković, J. Textor, M. Kalisch, and M. H. Maathuis. Complete graphical characterization and construction of adjustment sets in Markov equivalence classes of ancestral graphs. Journal of Machine Learning Research, 18, 2018.
- Richardson (2003) T. S. Richardson. Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30:145–157, 2003.
- Richardson and Spirtes (2002) T. S. Richardson and P. Spirtes. Ancestral graph Markov models. Annals of Statistics, 30:962–1030, 2002.
- Robins (1993) J. M. Robins. Analytic methods for estimating HIV-treatment and cofactor effects. Methodological Issues in AIDS Behavioral Research, pages 213–288, 1993.
- Rothenhäusler et al. (2018) D. Rothenhäusler, J. Ernest, and P. Bühlmann. Causal inference in partially linear structural equation models: identifiability and estimation. Annals of Statistics, 46:2904–2938, 2018.
- Shpitser and Pearl (2008) I. Shpitser and J. Pearl. Complete identification methods for the causal hierarchy. Journal of Machine Learning Research, 9:1941–1979, 2008.
- Smucler et al. (2020) E. Smucler, F. Sapienza, and A. Rotnitzky. Efficient adjustment sets in causal graphical models with hidden variables. Biometrika, 2020.
- Spirtes et al. (1999) P. Spirtes, C. Meek, and T. S. Richardson. Computation, Causation and Discovery, chapter An algorithm for causal inference in the presence of latent variables and selection bias, pages 211–252. MIT Press, 1999.
- Spirtes et al. (2000) P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, second edition, 2000.
- Squires and Uhler (2022) C. Squires and C. Uhler. Causal structure learning: A combinatorial perspective. Foundations of Computational Mathematics, pages 1–35, 2022.
- Textor et al. (2016) J. Textor, B. Van der Zander, M. S. Gilthorpe, M. Liśkiewicz, and G. T. Ellison. Robust causal inference using directed acyclic graphs: the R package ‘dagitty’. International Journal of Epidemiology, 45(6):1887–1894, 2016.
- Wager and Athey (2018) S. Wager and S. Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242, 2018.
- Wright (1921) S. Wright. Correlation and causation. Journal of Agricultural Research, 20(7):557–585, 1921.
- Zhang (2006) J. Zhang. Causal Inference and Reasoning in Causally Insufficient Systems. PhD thesis, Carnegie Mellon University, 2006.
- Zhang (2008a) J. Zhang. Causal reasoning with ancestral graphs. Journal of Machine Learning Research, 9:1437–1474, 2008a.
- Zhang (2008b) J. Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172:1873–1896, 2008b.
Checklist
1. For all models and algorithms presented, check if you include:
   (a) A clear description of the mathematical setting, assumptions, algorithm, and/or model. [Yes/No/Not Applicable] Not Applicable
   (b) An analysis of the properties and complexity (time, space, sample size) of any algorithm. [Yes/No/Not Applicable] Not Applicable
   (c) (Optional) Anonymized source code, with specification of all dependencies, including external libraries. [Yes/No/Not Applicable] Not Applicable
2. For any theoretical claim, check if you include:
   (a) Statements of the full set of assumptions of all theoretical results. [Yes/No/Not Applicable] Yes
   (b) Complete proofs of all theoretical results. [Yes/No/Not Applicable] Yes
   (c) Clear explanations of any assumptions. [Yes/No/Not Applicable] Yes
3. For all figures and tables that present empirical results, check if you include:
   (a) The code, data, and instructions needed to reproduce the main experimental results (either in the Supplemental material or as a URL). [Yes/No/Not Applicable] Not Applicable
   (b) All the training details (e.g., data splits, hyperparameters, how they were chosen). [Yes/No/Not Applicable] Not Applicable
   (c) A clear definition of the specific measure or statistics and error bars (e.g., with respect to the random seed after running experiments multiple times). [Yes/No/Not Applicable] Not Applicable
   (d) A description of the computing infrastructure used. (e.g., type of GPUs, internal cluster, or cloud provider). [Yes/No/Not Applicable] Not Applicable
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets, check if you include:
   (a) Citations of the creator If your work uses existing assets. [Yes/No/Not Applicable] Not Applicable
   (b) The license information of the assets, if applicable. [Yes/No/Not Applicable] Not Applicable
   (c) New assets either in the Supplemental material or as a URL, if applicable. [Yes/No/Not Applicable] Not Applicable
   (d) Information about consent from data providers/curators. [Yes/No/Not Applicable] Not Applicable
   (e) Discussion of sensible content if applicable, e.g., personally identifiable information or offensive content. [Yes/No/Not Applicable] Not Applicable
5. If you used crowdsourcing or conducted research with human subjects, check if you include:
   (a) The full text of instructions given to participants and screenshots. [Yes/No/Not Applicable] Not Applicable
   (b) Descriptions of potential participant risks, with links to Institutional Review Board (IRB) approvals if applicable. [Yes/No/Not Applicable] Not Applicable
   (c) The estimated hourly wage paid to participants and the total amount spent on participant compensation. [Yes/No/Not Applicable] Not Applicable
Appendix A FURTHER PRELIMINARIES AND DEFINITIONS
A.1 Preliminaries
Path Construction. A subsequence of a path is a path obtained by deleting non-endpoint nodes from without changing the order of the remaining nodes. Let and such that . We denote the concatenation of paths by the symbol , so that . We use the notation to denote the path .
A.2 Definitions
Definition 11
Definition 12
(Adjustment Criterion for MPDAGs (PAGs); Perković et al., 2017, 2018) Let , , and be pairwise disjoint node sets in an MPDAG (PAG) , where every proper possibly causal path from to in starts with a directed (visible) edge out of . Then satisfies the adjustment criterion relative to in if
-
(a)
, and
-
(b)
blocks all proper non-causal definite status paths from to in .
Definition 13
(Generalized Back-Door Criterion for DAGs; cf. Maathuis and Colombo, 2015) Let , , and be pairwise disjoint node sets in a DAG . Then satisfies the generalized back-door criterion relative to in if
-
(a)
, and
-
(b)
blocks all back-door paths from to in , for every .
Definition 14
(Proper Back-Door Graph for DAGs; cf. Perković et al., 2018) Let and be disjoint node sets in a DAG . The proper back-door graph is obtained from by removing all edges out of that are on proper causal paths from to in .
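As a rough illustration of Definition 14, the following Python sketch builds a proper back-door graph from an edge list. The function name `proper_backdoor_graph`, the edge-list representation, and the node labels in the usage example are assumptions made for this sketch and do not come from the paper.

```python
def proper_backdoor_graph(edges, X, Y):
    """Return the edges of the proper back-door graph of a DAG.

    edges: iterable of (parent, child) pairs describing a DAG.
    X, Y:  disjoint collections of treatment and response nodes.
    """
    X, Y = set(X), set(Y)
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)

    def reaches_y_avoiding_x(start):
        # True if `start` is in Y, or has a directed path to Y, without touching X.
        stack, seen = [start], set()
        while stack:
            v = stack.pop()
            if v in X or v in seen:
                continue
            if v in Y:
                return True
            seen.add(v)
            stack.extend(children.get(v, ()))
        return False

    # Drop exactly the edges x -> v (x in X) that begin a proper causal path to Y.
    return [(a, b) for a, b in edges
            if not (a in X and reaches_y_avoiding_x(b))]


# Hypothetical DAG: C confounds X and Y, M mediates, W is a side effect of X.
dag = [("C", "X"), ("C", "Y"), ("X", "M"), ("M", "Y"), ("X", "W")]
pbd = proper_backdoor_graph(dag, X={"X"}, Y={"Y"})
```

In this hypothetical graph only the edge from X to M starts a proper causal path to Y, so it is the only edge removed; the edge from X to W is kept because W has no directed path to Y.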
Definition 15
Definition 16
(Distance to ; Zhang, 2006; Perković et al., 2017) Let and be pairwise disjoint node sets in an MPDAG or PAG . Let be a path between and in such that every collider on has a possibly directed path (possibly of length ) to . Define the distance to of to be the length of a shortest possibly directed path (possibly of length ) from to , and define the distance to of to be the sum of the distances from of the colliders on .
Appendix B EXISTING RESULTS
Rules of the Do Calculus (Pearl, 2009). Let and be pairwise disjoint (possibly empty) node sets in a causal DAG . Let denote the graph obtained by deleting all edges into from . Similarly, let denote the graph obtained by deleting all edges out of in , and let denote the graph obtained by deleting all edges into and all edges out of in . The following rules hold for all densities consistent with .
Rule 1. If , then
(14)
Rule 2. If , then
(15)
Rule 3. If , then
(16)
where .
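For reference, the three rules can be written in the standard notation of Pearl (2009). The symbols below are assumptions of this sketch: $X$, $Y$, $Z$, $W$ for the pairwise disjoint node sets, $D$ for the causal DAG, $f$ for a density consistent with $D$, and $\perp_d$ for d-separation.

```latex
% Assumed notation: X, Y, Z, W are pairwise disjoint node sets in the causal DAG D,
% f is a density consistent with D, and \perp_d denotes d-separation.
\begin{align*}
\text{Rule 1: } & (Y \perp_d Z \mid X, W)_{D_{\overline{X}}}
  \;\Rightarrow\; f(y \mid do(x), z, w) = f(y \mid do(x), w), \\
\text{Rule 2: } & (Y \perp_d Z \mid X, W)_{D_{\overline{X}\underline{Z}}}
  \;\Rightarrow\; f(y \mid do(x), do(z), w) = f(y \mid do(x), z, w), \\
\text{Rule 3: } & (Y \perp_d Z \mid X, W)_{D_{\overline{X}\,\overline{Z(W)}}}
  \;\Rightarrow\; f(y \mid do(x), do(z), w) = f(y \mid do(x), w),
\end{align*}
% where Z(W) = Z \setminus \mathrm{An}(W, D_{\overline{X}}).
```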
Lemma 17
(Wright’s Rule of Wright, 1921) Let , where , and is a vector of mutually independent errors with means zero. Moreover, let . Let , be the corresponding DAG such that is in if and only if . A non-zero entry is called the edge coefficient of . For two distinct nodes , , let be all paths between and in that do not contain a collider. Then , where is the product of all edge coefficients along path , .
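A small numerical check can make Wright's rule concrete. The sketch below computes the covariance of a unit-variance linear SEM directly from its structural equations and compares it with the sum over collider-free paths. The helper names `unit_variance_cov` and `wright_sum`, the three-node DAG, and its coefficients are all hypothetical choices for illustration.

```python
def unit_variance_cov(order, coef):
    """Covariance of a linear SEM whose error variances make every node variance 1.

    order: nodes in a topological order of the DAG.
    coef:  dict mapping an edge (k, j), meaning k -> j, to its edge coefficient.
    """
    cov = {}
    for idx, j in enumerate(order):
        earlier = order[:idx]
        parents = [k for k in earlier if (k, j) in coef]
        for i in earlier:
            c = sum(coef[(k, j)] * cov[(i, k)] for k in parents)
            cov[(i, j)] = cov[(j, i)] = c
        explained = sum(coef[(k, j)] * coef[(l, j)] * cov[(k, l)]
                        for k in parents for l in parents)
        assert explained < 1, "coefficients too large for unit variances"
        cov[(j, j)] = 1.0  # the error variance is implicitly 1 - explained
    return cov


def wright_sum(coef, a, b):
    """Sum, over collider-free paths between a and b, of the product of edge coefficients."""
    def neighbours(v):
        for (k, j), w in coef.items():
            if k == v:
                yield j, w, "out"  # traverse v -> j
            if j == v:
                yield k, w, "in"   # traverse v <- k

    def walk(v, visited, last_move):
        if v == b:
            return 1.0
        total = 0.0
        for u, w, move in neighbours(v):
            # v is a collider when we arrived along an edge into v and leave along another edge into v
            if u in visited or (last_move == "out" and move == "in"):
                continue
            total += w * walk(u, visited | {u}, move)
        return total

    return walk(a, {a}, None)


# Hypothetical DAG A -> B, A -> C, B -> C with illustrative coefficients.
coef = {("A", "B"): 0.5, ("A", "C"): 0.3, ("B", "C"): 0.4}
cov = unit_variance_cov(["A", "B", "C"], coef)
```

For this example the two collider-free paths between B and C are the edge from B to C and the path through the common parent A, so both computations give 0.4 + 0.5 × 0.3 = 0.55.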
Lemma 18
(Theorem 3.2.4 of Mardia et al., 1980) Let be a -dimensional multivariate Gaussian random vector with mean vector and covariance matrix , so that is a -dimensional multivariate Gaussian random vector with mean vector and covariance matrix and is a -dimensional multivariate Gaussian random vector with mean vector and covariance matrix . Then .
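The standard conditional-mean formula for a partitioned Gaussian vector is sketched below. The block labels ($X_1$, $X_2$, $\mu_1$, $\mu_2$, $\Sigma_{11}$, $\Sigma_{12}$, $\Sigma_{22}$) and the choice of which block is conditioned on are assumptions of this sketch; the formula also assumes that $\Sigma_{22}$ is invertible.

```latex
% Assumed notation: X = (X_1^T, X_2^T)^T is multivariate Gaussian with mean
% (\mu_1^T, \mu_2^T)^T and covariance blocks \Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}.
\begin{align*}
E[X_1 \mid X_2 = x_2] &= \mu_1 + \Sigma_{12}\,\Sigma_{22}^{-1}\,(x_2 - \mu_2).
\end{align*}
% Scalar special case with zero means and unit variances:
% E[X_1 \mid X_2 = 1] = \mathrm{Cov}(X_1, X_2).
```

The scalar special case is the form used when a conditional expectation is equated with a covariance in a zero-mean, unit-variance linear Gaussian model.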
Lemma 19
(cf. Theorem 1 and Proposition 3 of Lauritzen et al., 1990) Let be a DAG, and let be an observational density over . Then is Markov compatible with if and only if
for all , where indicates independence with respect to .
Lemma 20
(cf. Lemma 3.2 of Perković et al., 2017) Let and be disjoint node sets in an MPDAG . If , then in every DAG in .
Lemma 21
(Lemma C.2 of Perković et al., 2017, Lemma 9 of Perković et al., 2018) Let , , and be pairwise disjoint node sets in an MPDAG (PAG) , where every proper possibly causal path from to in starts with a directed (visible) edge out of . Then the following statements are equivalent.
-
(i)
.
-
(ii)
in every DAG (MAG) in .
Lemma 22
(cf. Lemma C.3 of Perković et al., 2017, Lemma 10 of Perković et al., 2018) Let and be pairwise disjoint node sets in an MPDAG (PAG) , where every proper possibly causal path from to in starts with a directed (visible) edge out of and where . Then the following statements are equivalent.
-
(i)
blocks all proper non-causal definite status paths from to in .
-
(ii)
blocks all proper non-causal definite status paths from to in for every DAG (MAG) in .
Theorem 23
(cf. Proposition 3 of Lauritzen et al. (1990), cf. Corollary 2 of Richardson (2003)) Let , and be pairwise disjoint node sets in a DAG . Further let be the moral induced subgraph of on nodes (see Definition 15). Then d-separates and in if and only if all paths between and in contain at least one node in .
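The moralization test of Theorem 23 translates into a short procedure: restrict the DAG to the ancestors of the three sets, marry parents that share a child, drop edge directions, and look for a path that avoids the separating set. The Python sketch below follows these steps; the function name `d_separated` and the edge-list representation are assumptions of this sketch, and the ancestor set is taken to include the nodes themselves.

```python
def d_separated(edges, X, Y, Z):
    """Check whether Z d-separates X and Y in a DAG via the moral ancestral graph.

    edges:   iterable of (parent, child) pairs describing a DAG.
    X, Y, Z: pairwise disjoint collections of nodes.
    """
    X, Y, Z = set(X), set(Y), set(Z)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)

    # Ancestors of X, Y, and Z, including the nodes themselves.
    anc, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, ()))

    # Moral graph of the induced subgraph: undirected edges plus "married" parents.
    adj = {v: set() for v in anc}
    for v in anc:
        ps = parents.get(v, set())  # parents of an ancestor are ancestors too
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
            for q in ps:
                if p != q:
                    adj[p].add(q)

    # Z separates X and Y if no path from X reaches Y without passing through Z.
    seen, stack = set(), list(X)
    while stack:
        v = stack.pop()
        if v in seen or v in Z:
            continue
        if v in Y:
            return False
        seen.add(v)
        stack.extend(adj[v])
    return True


# Hypothetical collider A -> C <- B with a descendant D of the collider.
collider = [("A", "C"), ("B", "C"), ("C", "D")]
```

With this hypothetical graph, A and B are d-separated by the empty set but not by {C} or {D}, since conditioning on the collider or on its descendant brings C into the ancestral set and marries A and B.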
Theorem 24
(cf. Theorem 7 of Perković et al., 2018) Consider the definition of the adjustment criterion for MPDAGs (Definition 12) in the specific setting of a DAG. In this setting, replacing condition (b) in Definition 12 with
-
(b)
d-separates and in (see Definition 14)
results in a criterion that is equivalent to Definition 12 applied to a DAG.
Theorem 25
Lemma 26
(cf. Lemma E.6 of Henckel et al., 2022) Let be disjoint node sets in an MPDAG . If there is no proper possibly causal path from to that starts with an undirected edge in , then .
Lemma 27
(cf. Lemma 3.5 of Perković et al., 2017) Let , be a definite status path in MPDAG . Then p is a possibly causal path in if and only if there is no edge , in .
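The condition in Lemma 27 is easy to check mechanically. The sketch below assumes the condition is that no edge on the path points from a node back to its predecessor; the function name `is_possibly_causal` and the representation of the MPDAG by its set of directed edges are hypothetical, and the function does not itself verify that the path is of definite status.

```python
def is_possibly_causal(path, directed):
    """Check the (assumed) condition of Lemma 27 for a definite status path.

    path:     list of nodes [V1, ..., Vk] forming a path in the MPDAG.
    directed: set of (a, b) pairs, each meaning a directed edge a -> b;
              any other adjacency on the path is taken to be undirected.
    """
    # The path is possibly causal iff no consecutive pair has an edge pointing backwards.
    return all((nxt, cur) not in directed for cur, nxt in zip(path, path[1:]))


# Hypothetical path A - B -> C: one undirected edge followed by a directed edge.
forward = is_possibly_causal(["A", "B", "C"], directed={("B", "C")})
# Hypothetical path A <- B -> C: the first edge points back towards A.
backward = is_possibly_causal(["A", "B", "C"], directed={("B", "A"), ("B", "C")})
```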
Lemma 28
(cf. Lemma 3.3.1 of Zhang, 2006) Let , , and be distinct nodes in a PAG . If , then there is an edge between and with an arrowhead at . Furthermore, if the edge between and is , then the edge between and is either or (that is, not ).
Lemma 29
(cf. Lemma 7.5 of Maathuis and Colombo, 2015) Let and be two distinct nodes in a MAG or PAG . Then cannot have both an edge and a path where each edge , is of one of these forms: or .
Lemma 30
(cf. Lemma 17 of Perković et al., 2018) Let and be pairwise disjoint node sets in a MAG or PAG . Suppose that every proper possibly causal path from to in starts with a visible edge out of and that . Suppose furthermore that there is a path from to in such that
-
(i)
is a proper definite status non-causal path from to in ,
-
(ii)
all colliders on are in , and
-
(iii)
no definite non-collider on is in .
Then there is a proper definite status non-causal path from to that is m-connecting given in .
Theorem 31
Lemma 32
Lemma 33
Lemma 34
(cf. Lemma 59 of Perković et al., 2018) Let and be pairwise disjoint node sets in a DAG such that satisfies the adjustment criterion relative to in (Definition 12). Let and . Then the following statements hold:
-
(i)
satisfies the adjustment criterion relative to in , and
-
(ii)
, for any density consistent with .
Lemma 35
(Lemma 60 of Perković et al., 2018) Let , , and be pairwise disjoint node sets in a causal DAG such that satisfies the adjustment criterion relative to in . Let and . Additionally, let , , and . Then the following statements hold:
-
(i)
,
-
(ii)
if is a non-causal path from to , then is blocked by in ,
-
(iii)
in , where is allowed,
-
(iv)
if then satisfies the generalized back-door criterion relative to in (Definition 13),
-
(v)
the empty set satisfies the generalized back-door criterion relative to in ,
-
(vi)
in , and
-
(vii)
in .
Appendix C A NECESSARY CONDITION FOR IDENTIFIABILITY
This section includes the proof of Proposition 36, which provides a necessary condition for the identifiability of the conditional causal effect given an MPDAG. This result is needed twice – once for the proof of Theorem 3 in Section 3.1 and once for Example 4 in Section 3.2. Below we also provide two supporting results for the proof of Proposition 36 – namely, Lemmas 37 and 38.
C.1 Main Result
Proposition 36
Let , , and be pairwise disjoint node sets in a causal MPDAG . If there is a proper possibly causal path from to in that starts with an undirected edge and does not contain any element of , then the conditional causal effect of on given is not identifiable in .
-
Proof of Proposition 36.
This proposition extends Proposition 3.2 of Perković (2020), and its proof follows similar logic.
Suppose that there is a proper possibly causal path from to in that starts with an undirected edge and does not contain any element of . Then by Lemma 37, there is one such path – call it , , – where the corresponding paths in two DAGs in take the forms and ( when ). Call these DAGs and with paths and , respectively.
To prove that the conditional causal effect of on given is not identifiable in , it suffices to show that there are two families of interventional densities over – call them and , where for , we define – such that the following properties hold.
-
(i)
and are compatible with and , respectively. (Footnote 3: For brevity, we say a DAG is “compatible with” a set of interventional densities and an interventional density is “consistent with” a DAG as shorthand for these claims holding only if the DAG were causal.)
-
(ii)
.
-
(iii)
.
To define such families, we start by introducing an additional DAG and an observational density . That is, let be a DAG constructed by removing every edge from except for the edges on . Then let be the multivariate normal distribution under the following linear structural equation model (SEM). Each random variable has mean zero and is a linear combination of its parents in and , where are mutually independent. The coefficients in this linear combination are defined by the edge coefficients of . We pick these edge coefficients in conjunction with in such a way that each coefficient is in and for all .
From this, we define such that is compatible with and such that . Note that is Markov compatible with by construction, and we build the interventional densities in by replacing the intervening random variables in the SEM with their interventional values (Pearl, 2009).
To construct the second family of interventional densities, we introduce the DAG , which we form by removing every edge from except for the edges on . Then note that we could have defined using a linear SEM based on the parents in . In this case, the resulting observational density would again be a multivariate normal with mean vector zero and a covariance matrix with ones on the diagonal. The off-diagonal entries would be the covariances between the variables in . But note that by Lemma 17, these values will equal the product of all edge coefficients between the relevant nodes in . Since and contain no paths with colliders, the observational density built using will be an identical distribution to that built under . Thus, in an analogous way to , we define such that is compatible with and such that .
Having defined and , we check that their desired properties hold. Note that by construction, and are compatible with and , respectively. Thus (i) holds by Lemma 38. Similarly by construction, (ii) holds. To show that (iii) holds, it suffices to show that is not the same under and .
To calculate these expectations, we first want to apply Rules 1-3 of the do calculus (Equations (14)-(16)). Since , , is consistent with , we apply these rules using graphical relationships in . Because the path in corresponding to , , does not contain nodes in or , then and in . Further, in and in . Thus by Rules 1-3 of the do calculus (Equations (14)-(16)), the following hold.
where is the expectation under . To calculate and , we rely on the observational density , which was constructed using . By Lemma 18, equals the covariance of and under , and by Lemma 17, equals the product of all edge coefficients in , which were chosen to be in . Therefore, . But by definition of , .
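The construction in this proof can be illustrated with a toy example: two linear Gaussian models on a three-node path that share the same observational covariance but disagree on the mean of the response under an intervention on the treatment. The node names X, V, Y, the coefficients 0.5 and 0.4, and the helper names are illustrative assumptions, and the sketch omits the conditioning set, which is disconnected from the path in the proof's construction.

```python
def unit_variance_cov(order, coef):
    """Covariance of a linear SEM with error variances chosen so every node has variance 1."""
    cov = {}
    for idx, j in enumerate(order):
        earlier = order[:idx]
        parents = [k for k in earlier if (k, j) in coef]
        for i in earlier:
            cov[(i, j)] = cov[(j, i)] = sum(coef[(k, j)] * cov[(i, k)] for k in parents)
        cov[(j, j)] = 1.0
    return cov


def mean_under_do(order, coef, do):
    """Mean of every node after the intervention `do`, given mean-zero errors."""
    mean = {}
    for j in order:  # topological order, so parent means are already available
        if j in do:
            mean[j] = do[j]  # intervened node: its incoming edges are cut
        else:
            mean[j] = sum(w * mean[k] for (k, t), w in coef.items() if t == j)
    return mean


# First DAG: X -> V -> Y.  Second DAG: X <- V -> Y.  Same skeleton, same coefficients.
order1, coef1 = ["X", "V", "Y"], {("X", "V"): 0.5, ("V", "Y"): 0.4}
order2, coef2 = ["V", "X", "Y"], {("V", "X"): 0.5, ("V", "Y"): 0.4}

cov1 = unit_variance_cov(order1, coef1)
cov2 = unit_variance_cov(order2, coef2)
mean1 = mean_under_do(order1, coef1, do={"X": 1.0})
mean2 = mean_under_do(order2, coef2, do={"X": 1.0})
```

Both models give the same covariance between X and Y (the product of the two coefficients), so they cannot be told apart observationally. Under the intervention setting X to 1, however, the mean of Y equals that product in the first model and zero in the second, which is the kind of disagreement the proof relies on.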
C.2 Supporting Results
Lemma 37
Let , , and be pairwise disjoint node sets in an MPDAG . Suppose that there is a proper possibly causal path from to in that starts with an undirected edge and does not contain nodes in . Then there is one such path , , , , where the corresponding paths in two DAGs in take the forms and ( when ), respectively.
-
Proof of Lemma 37.
This lemma is similar to Lemma A.3 of Perković (2020) and its proof borrows from the proof strategy of Lemma C.1 of Perković et al. (2017).
Let be an arbitrary proper possibly causal path from to in that starts with an undirected edge and does not contain nodes in . Then let , , , , be a shortest subsequence of in that also starts with an undirected edge. Note that is a proper possibly causal path from to in that starts with an undirected edge and does not contain nodes in .
Consider when is of definite status. Since is possibly causal, all non-endpoints of are definite non-colliders. Let be a DAG in that contains . Then since is either or a definite non-collider on , the path corresponding to in takes the form by induction. Let be a DAG in with no additional edges into compared to (Lemma 33). Since contains , contains . When , contains either or , and so contains . Thus by the same inductive reasoning as above, the path corresponding to in takes the form (or simply when ).
Consider instead when is not of definite status. Note that . To see that contains , note that by the choice of and the fact that is possibly causal, is unshielded and possibly causal. Thus, is of definite status. However, is not of definite status, so must not be of definite status on , which implies that cannot contain . Since is possibly causal, it also cannot contain .
To find two DAGs in with paths corresponding to that fit our desired forms, we narrow our search to , where we let be an MPDAG constructed from by adding and completing R1-R4 of Meek (1995). We show below that the path corresponding to in takes the form , and thus, there must be two DAGs in with corresponding paths of the forms and .
We first show that contains by the contraposition of Lemma 32. Note that we have already shown that contains , that is formed by adding to , and that contains . It remains to show that . To see this, note that must contain an edge , because is not of definite status on . This edge must take the form by the choice of and the fact that is possibly causal. Thus, contains and . Therefore, . Finally, note that contains by R1 of Meek (1995), since we constructed by adding to a path that is unshielded and possibly causal.
Lemma 38
Let , , and be pairwise disjoint node sets in a causal DAG . Then let be a causal DAG constructed by removing edges from , and let be an interventional density over . If is consistent with , then it is consistent with .
-
Proof of Lemma 38.
Suppose that is consistent with . Then by definition, there exists a set of interventional densities such that is compatible with . Let be the density in under a null intervention. Note that by the truncated factorization in Equation (3), is Markov compatible with . Thus by Lemma 19,
(17)
for all , where indicates independence with respect to . Further, since , then and thus . Therefore it follows from (17) that
(18)
Let , , be an arbitrary density in . Then by definition and (18)
Since was arbitrary, this holds for all densities in . Thus, is compatible with . Since , then by definition, it is consistent with .
Appendix D PROOFS FOR SECTION 3.1: MPDAGS - CONDITIONAL ADJUSTMENT CRITERION
The following results show the completeness and soundness of the conditional adjustment criterion for identifying conditional adjustment sets in DAGs. We rely on these results to show the analogous results for MPDAGs in Theorem 3 of Section 3.1. Figure 5 shows how the results in this paper fit together to prove Theorem 3. Two supporting results needed for the proof of soundness in DAGs follow the main results below.
D.1 Main Results
Theorem 39
-
Proof of Theorem 39.
Let be a conditional adjustment set relative to in , and let be a density consistent with . We start by showing that is an adjustment set relative to in . To do this, we calculate the following. (Justification for the numbered equations is below.)
(19)
(20)
Equation (19) follows from Rule 3 of the do calculus (Equation (16)). To show that this rule holds, let be an arbitrary path from to in . Note that must begin with an edge out of . Since , cannot be causal and, therefore, must have colliders. Thus, is blocked, and so . Equation (20) follows from the fact that is a conditional adjustment set relative to . This shows that is an adjustment set relative to in .
Theorem 40
-
Proof of Theorem 40.
This theorem is analogous to Theorem 58 of Perković et al. (2018) for the adjustment criterion. We use the same proof strategy and adapt the arguments to suit our needs.
Suppose that satisfies the conditional adjustment criterion relative to in and let be a density consistent with . Our goal is to prove that
(21)
We consider three cases below. Before this, we prove an equality that holds in all cases. Let and . Then in , since does not contain edges into and since all paths from to that start with an edge out of in contain a collider – a collider that cannot be an element of since . Rule 3 of the do calculus (Equation (16)) then implies
(22)
Case 1: Assume that so that . Then we have the following. (Justification for the numbered equations is below.)
(23)
(24)
Equation (23) follows from Equation (22) and . Equation (24) follows from the following logic. Since satisfies the conditional adjustment criterion relative to () in and since , it holds that blocks all paths from to in . Thus, in , which implies the analogous independence statement.
Case 2: Assume so that . Define , , , and . Then we have the following. (Justification for the numbered equations is below.)
(25)
(26)
(27)
(28)
Equation (25) holds since by Lemma 42(iv), is a conditional adjustment set relative to in . Equation (26) holds since is disjoint from . Equation (27) holds since by Lemma 42(iii), we have in , where the analogous independence statement follows. Finally, Equation (28) results from applying Lemma 41(ii).
Case 3: Assume and and define , and as in Case 2 above. We start by showing two equalities that rely on the do calculus. First note that by Lemma 42(vi), in . Thus by Rule 2 of the do calculus (Equation (15)), we have that
(29)
Second, note by Lemma 42(vii), in . Thus by Rule 3 of the do calculus (Equation (16)), we have that
(30)
Then we have the following. (Justification for the numbered equations is below.)
(31)
(32)
(33)
(34)
Equation (31) holds by applying Equations (29), (30), and (22). Equation (32) holds by the following logic. By Lemma 42(v), the empty set is an adjustment set relative to in . Then by Lemma 41(i), satisfies the conditional adjustment criterion relative to in , and so blocks all paths from to in . Thus, in , where the analogous independence statement follows.
D.2 Supporting Results
Lemma 41
Let , , , and be pairwise disjoint node sets in a causal DAG , where and where satisfies the conditional adjustment criterion relative to in (Definition 2). Let and . Then:
-
(i)
satisfies the conditional adjustment criterion relative to in , and
-
(ii)
, for any density consistent with .
-
Proof of Lemma 41.
This lemma is analogous to Lemma 59 of Perković et al. (2018) (Lemma 34). We use the same proof strategy and adapt the arguments to suit our needs.
(i) By Lemma 6(a), since satisfies the conditional adjustment criterion relative to in , then satisfies the adjustment criterion relative to in . Then by Lemma 34, satisfies the adjustment criterion relative to . The statement follows by a second use of Lemma 6(a).
(ii) Let be an arbitrary density consistent with . We proceed with a proof by induction.
Base case: Suppose so that . When , the claim clearly holds. Thus, we let . Note that the claim holds if either or in . To see this, we calculate the following.
-
(a)
When , then
where the second equality holds since .
-
(b)
When , then
where the second equality holds since .
We use the remainder of the base case to show that (a) or (b) must hold. For sake of contradiction, suppose that neither holds. This implies that there are two paths in : one from to that is d-connecting given and one from to that is d-connecting given . Let , , and , , be such paths, respectively, where is proper. In the arguments below, we use paths related to and – in the proper back-door graph (see Definition 14) and in four of its moral induced subgraphs (see Definition 15) – before applying Theorems 23 and 24 to reach our final contradiction (that cannot satisfy the conditional adjustment criterion relative to in ).
First, we claim that both and are d-connecting given . This holds for by definition. For sake of contradiction, suppose that is blocked by . Since is d-connecting given , it must contain a collider in . Let be the closest collider to on such that , and let , be a shortest causal path in from to . Then let be the node closest to on that is also on , and define the path . Note that is non-causal since either is of non-zero length or , so that is a path into . Further, by the definitions of , , and , we have that is a proper non-causal path from to that is d-connecting given . But this contradicts that satisfies the conditional adjustment criterion relative to in .
Next, we prove that the sequence of nodes in corresponding to forms a path. Note that since is proper, we only need to show that does not start with an edge , where is a node that lies on a proper causal path in from to . For sake of contradiction, suppose that starts with for such a . Note that cannot be causal from to , since by the definition of . Thus, is non-causal and there is a collider on such that . Since is d-connecting given and , then . Further, since , this implies that . But this contradicts that satisfies the conditional adjustment criterion relative to in .
Similarly, we prove that the sequence of nodes in corresponding to also forms a path. For this, note that all nodes in on must be colliders on , since is d-connecting given . Thus, removing edges out of from in order to form will not affect the edges on .
Let and be the paths in corresponding to and , respectively. Then for sake of contradiction, suppose either or is blocked given . Since and are d-connecting given , then there must be a node on or where is a collider on or and every causal path in from to contains the first edge of a proper causal path from to in . Let be an arbitrary such causal path in from to . Note that is a path from to , since must contain a node in and since . But since contains the first edge of a proper causal path from to in , this implies that , which contradicts that satisfies the conditional adjustment criterion relative to in .
We continue the base case by reasoning with four moral induced subgraphs of (see Definition 15). Start by defining the following.
Then define , , , and to be the moral induced subgraphs of on nodes , , , and , respectively. In order to use Theorem 24, we want to show that contains a path from to that does not contain a node in .
Since and are d-connecting given , then by Theorem 23, the following two paths must exist in : path from to and path from to , where neither path contains a node in . Note that since and , any path in or will also be in . Further, since by definition and since we form by removing edges out of from , then . Therefore, and . Thus, and are both paths in .
We complete the base case by applying Theorems 23 and 24 to show our necessary contradiction. Since we can combine subpaths of and to form a path in from to that does not contain a node in , then by Theorem 23, and are d-connecting given in . By Theorem 24, this implies that does not satisfy the adjustment criterion relative to in (see Definition 12). Therefore, by the contraposition of Lemma 6(a), does not satisfy the conditional adjustment criterion relative to in , which is a contradiction.
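The base case above moves between d-connection in a DAG and ordinary separation in moral induced subgraphs. As a hedged illustration of that general technique, the following is a minimal sketch of the classical moralization test for d-separation. It is not claimed to be the exact statement of Theorem 23; the function names `ancestors_of` and `d_separated` and the edge-list representation are assumptions made for this example.

```python
from collections import deque

def ancestors_of(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(edges, xs, ys, zs):
    """Moralization test: True iff xs and ys are d-separated given zs.

    `edges` is an iterable of (parent, child) pairs describing a DAG.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    # Step 1: restrict to the ancestral set of xs, ys and zs.
    keep = ancestors_of(parents, xs | ys | zs)
    # Step 2: moralize, i.e. drop directions and "marry" co-parents.
    nbrs = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))  # all parents of a kept node are kept
        for p in pa:
            nbrs[p].add(child)
            nbrs[child].add(p)
        for i, p in enumerate(pa):
            for q in pa[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # Step 3: search from xs without ever entering a node of zs.
    queue = deque(xs - zs)
    seen = set(queue)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in nbrs[v] - zs:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return True
```

For the collider structure A → C ← B with C → D, the sketch reports that A and B are d-separated given the empty set, but not given C or given its descendant D.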
Induction step: Assume that the result holds for , , and let . Take an arbitrary , and define and . Since the base case holds and since , then
(35) |
Further, by part (i), satisfies the conditional adjustment criterion relative to in . Since and , then by the induction assumption,
(36) |
Lemma 42
Let , and be pairwise disjoint node sets in a causal DAG , where and where satisfies the conditional adjustment criterion relative to in (Definition 2). Let and . Additionally, let , , , and . Then the following statements hold:
-
Proof of Lemma 42.
This lemma is analogous to Lemma 60 of Perković et al. (2018) (Lemma 35), which is needed for adjustment in total effect identification. We rely on this result in the proof below.
Note that , and are pairwise disjoint node sets in , where by Lemma 6(a), satisfies the adjustment criterion relative to in . Results (i)-(iii) and (vi) follow directly from Lemma 35. Result (v) follows additionally from Theorem 25. Result (iv) follows additionally from Theorem 25 and Lemma 6(a).
(vii) Let be an arbitrary path from to in . By definition of , begins with an edge out of . Since, by definition, , where , then must contain at least one collider. Let be the set containing the closest collider to on and its descendants in . Note that . By definition of and by assumption, , and thus, is blocked by .
Appendix E CONDITIONAL BACK-DOOR CRITERION
This section extends Pearl’s back-door criterion (2009) to the context of estimating a conditional causal effect in a DAG. Definition 43 provides the extended criterion, and Lemma 44 establishes that this criterion is sufficient for conditional adjustment. Lemma 45 makes a comparison between this criterion and the generalized back-door criterion of Maathuis and Colombo (2015) (Definition 13).
Definition 43
(Conditional Back-door Criterion for DAGs) Let , , , and be pairwise disjoint node sets in a DAG , where . Then satisfies the conditional back-door criterion relative to in if
(a) , and
(b) blocks all proper back-door paths from to .
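Condition (b) can be checked by brute force on small DAGs. The sketch below enumerates every proper back-door path from a treatment set to an outcome set and tests whether a candidate blocking set leaves any of them d-connecting. It covers condition (b) only; condition (a) is not checked. The function name `blocks_all_proper_backdoor_paths`, its arguments, and the edge-list representation are assumptions introduced for this illustration, and the enumeration is exponential in the worst case.

```python
def blocks_all_proper_backdoor_paths(edges, xs, ys, ws):
    """True iff `ws` blocks every proper back-door path from xs to ys.

    `edges` is an iterable of (parent, child) pairs describing a DAG;
    xs and ys are assumed to be disjoint.
    """
    xs, ys, ws = set(xs), set(ys), set(ws)
    edge_set = set(edges)
    nbrs, parents = {}, {}
    for a, b in edge_set:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
        parents.setdefault(b, set()).add(a)

    # ws together with its ancestors: a collider is open iff it lies in here.
    an_w, stack = set(ws), list(ws)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in an_w:
                an_w.add(p)
                stack.append(p)

    def is_open(path):
        """d-connection check for one path, node by node."""
        for a, v, b in zip(path, path[1:], path[2:]):
            collider = (a, v) in edge_set and (b, v) in edge_set
            if collider and v not in an_w:
                return False
            if not collider and v in ws:
                return False
        return True

    def walk(path):
        """Yield simple paths that end at the first node of ys reached."""
        if path[-1] in ys:
            yield path
            return
        for nxt in nbrs.get(path[-1], ()):
            # `nxt not in xs` keeps the path proper.
            if nxt not in path and nxt not in xs:
                yield from walk(path + [nxt])

    # A back-door path starts with an edge pointing into its first node.
    return not any(
        is_open(p)
        for x in xs
        for pa in parents.get(x, ())
        if pa not in xs
        for p in walk([x, pa])
    )
```

Stopping each path at the first outcome node it reaches loses nothing here: if a longer path through that node were d-connecting, its initial segment would be as well.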
Lemma 44
-
Proof of Lemma 44.
Let be a set that satisfies the conditional back-door criterion relative to in , and let be a density consistent with . Then
The first two equalities follow from the law of total probability and the chain rule. The third equality follows from Rules 2 and 3 of the do calculus (Equations (15) and (16)) and the d-separations shown below.
In order to use Rule 2 to conclude that , we show that . Note that only contains back-door paths from to . So every path from to in contains a proper back-door path from to as a subpath. Since blocks all proper back-door paths from to in , the d-separation holds.
In order to use Rule 3 to conclude that , we show that . This follows from the assumptions that and .
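The three steps described in this proof can be written out as follows. This is a sketch under assumed notation: $\mathbf{S}$ denotes the set satisfying the criterion, $\mathbf{Z}$ the conditioning set, and $f(\cdot \mid do(\mathbf{x}), \cdot)$ the interventional density. These symbols are assumptions for the illustration and may differ from the notation of the main text.

```latex
\begin{align*}
f(\mathbf{y} \mid do(\mathbf{x}), \mathbf{z})
  &= \int f(\mathbf{y}, \mathbf{s} \mid do(\mathbf{x}), \mathbf{z}) \, d\mathbf{s}
     && \text{(law of total probability)} \\
  &= \int f(\mathbf{y} \mid do(\mathbf{x}), \mathbf{s}, \mathbf{z}) \,
          f(\mathbf{s} \mid do(\mathbf{x}), \mathbf{z}) \, d\mathbf{s}
     && \text{(chain rule)} \\
  &= \int f(\mathbf{y} \mid \mathbf{x}, \mathbf{s}, \mathbf{z}) \,
          f(\mathbf{s} \mid \mathbf{z}) \, d\mathbf{s}
     && \text{(Rules 2 and 3).}
\end{align*}
```

In the last step, Rule 2 replaces $do(\mathbf{x})$ by conditioning on $\mathbf{x}$ in the first factor, and Rule 3 removes $do(\mathbf{x})$ from the second factor.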
Lemma 45
-
Proof of Lemma 45.
Follows immediately.
Since satisfies the conditional back-door criterion relative to in , then . Combining this with our assumptions gives us that . In the remainder of the proof, we show that blocks all back-door paths from to . The result follows by Definition 13.
Let be an arbitrary back-door path from to in . For sake of contradiction, suppose that is d-connecting given . Let be the node in closest to on , and let . Note that is proper. When , then is a back-door path. When , then because is d-connecting given , we have that is a collider on , and therefore, is again a back-door path. Thus, is a proper back-door path from to that, by assumption, must be blocked by .
Let be the node on immediately following . That is, contains . Note that since is blocked given , then . Thus, we consider the path . Since is d-connecting given , where is a non-collider on , then and thus, is also d-connecting given . Similarly, since is blocked given , where is not a collider on and , then is also blocked by .
Since is d-connecting given and blocked given , then must contain at least one collider in . Let be the closest such collider to on and let , be a shortest causal path from to in . While there must be a causal path from to in , note that need not be one, and thus, we allow for the possibility that .
Let be the node closest to on that is also on , and define the path . Note that since is proper, is at least of length one, and therefore, is a back-door path. Further, since is d-connecting given and by the definition of and , we have that is a proper back-door path from to that is d-connecting given . But this contradicts that satisfies the conditional back-door criterion relative to in .
Appendix F PROOFS FOR SECTION 3.3: MPDAGS - CONSTRUCTING CONDITIONAL ADJUSTMENT SETS
This section includes the proofs of two results from Section 3.3: Lemma 4 and Theorem 5. We also provide three supporting results needed for these proofs.
F.1 Main Results
-
Proof of Lemma 4.
By Lemma 26, must satisfy condition (a) of Definition 2, so it suffices to show that blocks all non-causal definite status paths from to in . Note that since , any definite status path from to in that starts with an edge into is blocked by .
Further, any non-causal definite status path from to in that starts with an edge out of or an undirected edge must contain a collider. Additionally, the closest collider to on any such path and all of its descendants in must be in by Lemma 48. Then since , these paths are also blocked by .
-
Proof of Theorem 5.
By Theorem 3, it suffices to show that and separately satisfy the conditional adjustment criterion relative to in (Definition 2). We start by noting that and are both disjoint from , so it suffices to prove that (a) and (b) block all proper non-causal definite status paths from to in . We prove (a) and (b) below. For these proofs, note that by the assumption that and by Lemma 26.
(a) Suppose for sake of contradiction that there is a proper non-causal definite status path from to in that is d-connecting given . Let be a shortest such path. Since is proper, no non-endpoint on is in . Suppose for sake of contradiction that there exists that is a non-endpoint on . By choice of , this implies that is possibly causal. Then by Lemma 27, since is non-causal, must contain a collider on . Let be the closest such collider to (possibly ). Note that by Lemma 27, , so by Lemma 48, , where . Thus, . However, this contradicts that is d-connecting given . Therefore, no non-endpoint on is in .
We now consider cases (1) and (2) below.
-
(1)
Consider when there is no collider on . Since is d-connecting given , no node on is in . Then by Equation (8), no node on is in . However, note that by Lemma 27, every non-endpoint on is a possible ancestor of an endpoint on and thus is in . Combining these, we have that all non-endpoints on are in . But this implies that there is no set that is both disjoint from and can block . By Theorem 3, this contradicts our assumption that there is a conditional adjustment set relative to in .
-
(2)
Consider when there is at least one collider on . For sake of contradiction, suppose that there are more than three nodes on . Then there is a non-collider such that or is on . Since is d-connecting given , then and . By Equation (8) and Lemma 48, . Additionally, since is d-connecting given , then . Combining these, we have that . Since there is a causal path in from to every node in , by Lemma 48, . However, this would contradict that is d-connecting given .
Hence, must be of the form , where and thus by Equation (8) and Lemma 48, . Note that , since otherwise, . Further, , because otherwise by Lemma 48, , which would imply which we have shown is a contradiction. Therefore, .
Let , be a shortest possibly causal path in from to . Further, define the node , as follows. When has no directed edges, let . When has at least one directed edge, let be the node on closest to such that is on . Note that by Lemma 46, is unshielded. Thus by R1 of Meek (1995), takes the form .
Pause to consider the path . Note that cannot be in , because no set can block this proper non-causal definite status path from to in . By Theorem 3, this would contradict our assumption that there is a conditional adjustment set relative to in . Similarly, and are not in , because this would imply , which we have shown is a contradiction. Thus, is an unshielded collider in .
We complete this case by showing that contains . If , we are done. If instead , then consider the node . Since and are in , so is a path by R1 of Meek (1995). The unshielded paths and contradict that R1 of Meek (1995) is completed in . Further, the path or contradicts that R2 of Meek (1995) is completed in , and the path contradicts that R3 of Meek (1995) is completed in . This leaves only one option for , and that is .
If , we are done. If instead , then we consider the node . By identical logic to that above, we can show that contains . Continuing in this way, we have that contains .
With this shown, we derive our final contradictions. When , then contains . But this is a proper non-causal definite status path from to that no set can block, which we have shown is a contradiction. When , then contains the following two paths: and . These paths are proper non-causal definite status paths from to that cannot both be blocked by the same set, which again is a contradiction.
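Several steps in this case appeal to orientation rule R1 of Meek (1995): if a → b and b − c are in the graph and a and c are not adjacent, then b − c is oriented as b → c. A minimal sketch of applying R1 until nothing changes is given below. The function name `apply_meek_r1` and the representation (directed edges as ordered pairs, undirected edges as two-element sets) are assumptions for this illustration. Rules R2 to R4 are not implemented, so the output is in general not a full MPDAG.

```python
def apply_meek_r1(directed, undirected):
    """Close a partially directed graph under Meek's rule R1 only.

    directed:   iterable of (tail, head) pairs
    undirected: iterable of two-element collections {u, v}
    Returns the updated (directed, undirected) edge sets.
    """
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}

    def adjacent(u, v):
        return ((u, v) in directed or (v, u) in directed
                or frozenset((u, v)) in undirected)

    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for e in list(undirected):
                if b not in e:
                    continue
                (c,) = e - {b}
                # R1 needs a -> b - c with a and c non-adjacent (unshielded).
                if c != a and not adjacent(a, c):
                    undirected.discard(e)
                    directed.add((b, c))
                    changed = True
    return directed, undirected
```

On the chain a → b − c − d with no other adjacencies, repeated application orients the whole chain away from a, which mirrors the step-by-step orientation argument used in this case.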
-
(1)
We now consider cases (1) and (2) below. In both cases, we show that – and therefore – is blocked by .
-
(1)
Suppose that ends with or . If has no colliders, then by Lemma 27, is a possibly causal path from to . Since , this implies that . But then there is no set that is both disjoint from and can block . By Theorem 3, this contradicts our assumption that there is a conditional adjustment set relative to in . Hence, there must be a collider on .
-
(2)
Suppose that ends with . Note that is not a possibly causal path from to , so by Lemma 27, there must be an edge , , on . In particular, let be the closest node to on such that is on .
In order to complete this proof, we want to show that either or . In both cases, we will show that is blocked by . To do this, we briefly note that by the choice of , the path is possibly causal and every node in is a non-collider on . Further by the choice of , no node in is in . We turn to consider each node in , working backward through the set.
Consider the node . If , then since is a non-collider on , is blocked by , and we are done. Consider when . Since and since is in , then either or . We show the latter is impossible. If and , then by Equation (9), we have that . But by Lemma 26, this implies that . Since contains , then . But this contradicts that by the definition of a parent set. Therefore, either and we are done, or .
In the latter case, we turn to consider if such a node exists. If contains , then since , we can use the same logic as above to show that either and we are done, or . If contains , then since , we have that . Because is possibly causal, then by Lemma 48, .
Working backward in this way, either a node on is in and we are done, or for all . In the latter case, we have that and that every node in is a non-collider on . We can now apply the same argument as in (1) above to show that – and therefore – is blocked given .
F.2 Supporting Results
Lemma 46
Let and be distinct nodes in an MPDAG and let be a possibly causal path from to in . Then any shortest subsequence of forms an unshielded, possibly causal path from to .
-
Proof of Lemma 46.
This result is similar to Lemma 3.6 of Perković et al. (2017), but we derive a slightly more general statement.
Let be the number of nodes on . Pick an arbitrary shortest subsequence of and call it , where , . Note that there is no edge in , since this would contradict that is possibly causal. Thus, is also possibly causal by definition. Further note that is unshielded, since if any triple on the path is shielded, it either contradicts that is possibly causal (i.e. cannot be in ) or that is a shortest subsequence of (i.e. and cannot be in ).
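To make the notion of a shortest subsequence concrete, the sketch below finds one by breadth-first search over the positions of the path, only ever jumping forward to a later node that is adjacent in the graph. It then checks that the result is unshielded. The function names `shortest_subsequence` and `is_unshielded` and the `adjacent` callback are assumptions for this illustration; edge orientations are ignored, since only adjacency matters for this step.

```python
from collections import deque

def shortest_subsequence(path, adjacent):
    """Return a shortest subsequence of `path` that is itself a path.

    `path` is a list of nodes whose consecutive entries are adjacent;
    `adjacent(u, v)` returns True iff u and v share an edge of any type.
    """
    k = len(path)
    prev = {0: None}          # position -> predecessor position
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == k - 1:
            break
        for j in range(i + 1, k):   # forward jumps only, to keep node order
            if j not in prev and adjacent(path[i], path[j]):
                prev[j] = i
                queue.append(j)
    out, i = [], k - 1
    while i is not None:
        out.append(path[i])
        i = prev[i]
    return out[::-1]

def is_unshielded(path, adjacent):
    """True iff no consecutive triple on `path` has adjacent outer nodes."""
    return all(not adjacent(path[i], path[i + 2])
               for i in range(len(path) - 2))
```

A shielded triple on the output would allow a further shortcut, which is why a shortest subsequence is always unshielded.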
Lemma 47
Let be a path in an MPDAG . Then is possibly causal if and only if does not contain any path , .
-
Proof of Lemma 47.
Suppose that does not contain any path , . Then does not contain any edge , . Therefore, by definition, is possibly causal in .
Now suppose is possibly causal in . For sake of contradiction, suppose contains a path from to , , of the form .
Consider the subpath of from to . Note that this subpath is a possibly causal path. Let be a shortest subsequence of this subpath. By Lemma 46, is an unshielded, possibly causal path.
Consider the edge . cannot be in , since is possibly causal. Neither is in since being unshielded would imply, by R1 of Meek (1995), that contains the cycle . Thus contains .
However, note that no DAG in can contain the edge , since being unshielded would imply, by R1 of Meek (1995), that the DAG contains the cycle . This contradicts that contains . Thus we conclude that does not contain any path , .
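The definition used in this proof, that a path is possibly causal when the graph contains no edge pointing from a later node on the path back to an earlier one, translates directly into a check. The following is a minimal sketch; the function name `is_possibly_causal` and the representation of directed edges as (tail, head) pairs are assumptions for this illustration. Undirected edges never violate the condition, so they do not need to be passed in.

```python
def is_possibly_causal(path, directed):
    """True iff no directed edge points from a later node to an earlier one.

    path:     list of nodes, assumed to form a path in the graph
    directed: iterable of (tail, head) pairs (the directed edges only)
    """
    directed = set(directed)
    return not any(
        (path[j], path[i]) in directed
        for i in range(len(path))
        for j in range(i + 1, len(path))
    )
```

The check ranges over all pairs of nodes on the path, not just consecutive ones, so an edge between non-consecutive nodes that points backward also rules the path out.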
Lemma 48
Let , , and be distinct nodes in an MPDAG .
(i) If is a possibly causal path from to and is a causal path from to , then is a possibly causal path from to .
(ii) If is a causal path from to and is a possibly causal path from to , then is a possibly causal path from to .
-
Proof of Lemma 48.
Let and let . Before beginning the main arguments, we note that and cannot share any nodes other than , and thus, we can define a path . To see this, for sake of contradiction, suppose and share at least one node other than . Let denote the collection of such nodes, and consider the node in with the lowest index on . That is, consider such that for all . Let for some on . Note that since or is causal, contains either or . By Lemma 47, the first option contradicts that is possibly causal and the second contradicts that is possibly causal. Thus we conclude that and cannot share any nodes other than .
For to be possibly causal in we only need to show that there is no backward edge between any two nodes on . Note that there is no edge for , or for in , by choice of and .
(i) Assume for sake of contradiction that there exists an edge in for and . Note that is on and not , and analogously, is on and not , since we have shown and cannot share nodes other than . Also note that since is causal, it contains .
Consider the subpath . Since is possibly causal, so is this subpath. Pick an arbitrary shortest subsequence of and call it , where , . By Lemma 46, forms an unshielded, possibly causal path from to .
Consider the edge . Edge cannot be on , since is possibly causal. Then or must be in . However, note that no DAG in can contain the edge , since being unshielded would imply, by R1 of Meek (1995), that the DAG contains the cycle . This contradicts that contains or . Thus, there does not exist an edge in .
(ii) Assume for sake of contradiction that there exists an edge in for and . Note that is on and not , and analogously, is on and not , since we have shown and cannot share nodes other than . Also note that since is causal, it contains .
Consider the subpath . Since is possibly causal, so is this subpath. Pick an arbitrary shortest subsequence of and call it , where , . By Lemma 46, forms an unshielded, possibly causal path from to .
Consider the edge . Edge cannot be on , since is possibly causal. Then or must be in . However, note that no DAG in can contain the edge , since being unshielded would imply, by R1 of Meek (1995), that the DAG contains the cycle . This contradicts that contains or . Thus, there does not exist an edge in .
Appendix G PROOF FOR SECTION 4.1: PAGS - CONDITIONAL ADJUSTMENT CRITERION
This section includes the proof of Theorem 9 and one result (Lemma 49) needed for the proof of Lemma 8. The statements of Theorem 9 and Lemma 8 can be found in Section 4.1.
Figure 6 shows how the results in this paper fit together to prove Theorem 9. Note that Theorem 9 is an analogous result to Theorem 3 (Section 3.1), where the former applies to PAGs and the latter to MPDAGs. However, while the proof of Theorem 3 relies directly on completeness and soundness proofs for DAGs (see Figure 5 in Supplement D), the proof of Theorem 9 relies on them indirectly through Theorem 3.
Lemma 49
Let and be disjoint node sets in a PAG . Then the following statements are equivalent.
(i) .
(ii) in every DAG represented by .
-
Proof of Lemma 49.
Let be a possibly causal path from to in and let , , , , be an unshielded possibly causal subsequence of in .
Since contains , or , there must be some MAG in with the edge . Let be the path in corresponding to in . Then since is unshielded, so is , and so takes the form . Let be a DAG created from , by retaining all the nodes in and all the directed edges in and by adding a node and edges and for each bidirected edge in (this DAG is titled the canonical DAG by Richardson and Spirtes, 2002). Now, DAG contains a causal path from to .
If there is a DAG represented by with a causal path from to , then any MAG of that contains and will contain a causal path from to . This is due to the fact that a MAG of a DAG will preserve ancestral relationships between observed variables. Then the path in that corresponds to in cannot have any arrowheads pointing in the direction of , and so it must be possibly causal.
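The canonical DAG construction used in this proof can be sketched in a few lines: keep every node and directed edge of the MAG, and replace each bidirected edge by a fresh latent parent of its two endpoints. The function names `canonical_dag` and `has_causal_path`, the tuple used to name latent nodes, and the assumption that the MAG has no undirected edges are all choices made for this illustration.

```python
def canonical_dag(nodes, directed, bidirected):
    """Build the canonical DAG of a MAG with directed and bidirected edges.

    nodes:      iterable of observed nodes
    directed:   iterable of (tail, head) pairs
    bidirected: iterable of (a, b) pairs, one per bidirected edge a <-> b
    Returns (dag_nodes, dag_edges).
    """
    dag_nodes = list(nodes)
    dag_edges = list(directed)
    for a, b in bidirected:
        latent = ("L", a, b)          # hypothetical name for the added node
        dag_nodes.append(latent)
        dag_edges += [(latent, a), (latent, b)]
    return dag_nodes, dag_edges

def has_causal_path(edges, source, target):
    """True iff there is a directed path from `source` to `target`."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = {source}, [source]
    while stack:
        v = stack.pop()
        if v == target:
            return True
        for c in children.get(v, ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return False
```

Because the added latent nodes have no parents, the construction creates no new directed path between observed nodes, so a causal path in the MAG remains a causal path in the resulting DAG.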
Appendix H PROOFS FOR SECTION 4.2: PAGS - CONSTRUCTING CONDITIONAL ADJUSTMENT SETS
This section includes the proof of Theorem 10, which can be found in Section 4.2. We provide one supporting result needed for the proof of this theorem.
We make an important remark here on R software. Note that by Lemmas 6 and 8, any algorithms developed for checking the existence of an unconditional adjustment set (Definition 11) also apply to conditional adjustment sets – provided that . First consider the R package dagitty (Textor et al., 2016). Suppose the condition on is satisfied and let be a set such that . Then, one can apply the function isAdjustmentSet of the package dagitty to a PAG , set , exposure , and outcome to learn whether is a conditional adjustment set relative to in . Next consider the R package pcalg (Kalisch et al., 2012). Suppose the condition on is satisfied and let be a set such that . Then, one could apply the function gac of the package pcalg to the MPDAG or PAG and to the node sets , , and . These functions will return TRUE if and only if is a conditional adjustment set relative to in , and FALSE otherwise.
H.1 Main Result
-
Proof of Theorem 10.
Suppose that does not satisfy the conditional adjustment criterion relative to in . Since by construction, it must be that there is a proper definite status non-causal path from to that is m-connecting given . By Lemma 50, there is then a proper definite status non-causal path from to in such that all definite non-colliders on are in (case (ii) of Lemma 50) and all colliders on are in (cases (iii) and (vi) of Lemma 50). Since , for any set that satisfies , Lemma 30 implies that there is also a proper definite status non-causal path from to in that is open given . Since this is true for an arbitrary set that satisfies condition (a) of Definition 7, it follows that there cannot be any set that satisfies the conditional adjustment criterion relative to in .
H.2 Supporting Result
Lemma 50
Let , , and , be pairwise disjoint node sets in a PAG , where and where every proper possibly causal path from to in starts with a visible edge out of . Suppose furthermore, that there exists a set that satisfies the conditional adjustment criterion for in . If there is a proper definite status non-causal path from to in that is m-connecting given (see definition in Theorem 10), then there is a path from to in such that the following hold.
(i) Path is a proper definite status non-causal path from to in .
(ii) All definite non-colliders on are in .
(iii) There is at least one collider on , and all colliders on are in , where and are disjoint sets such that
(iv) None of the colliders on can be possible descendants of a non-collider on .
(v) For any collider on there is an unshielded possibly directed path from to that does not start with .
(vi) , that is, for any collider on there is an unshielded directed path from to .
-
Proof of Lemma 50.
Consider the set of all proper definite status non-causal paths from to in that are m-connecting given and choose among them a shortest path with a shortest distance to (Definition 16). Let this path be called , where , , . By choice of , (i) is satisfied. We will now show that also satisfies properties (ii)-(vi) above.
Also, since is proper, a node in cannot be a non-endpoint node on . Now, since is additionally chosen as a shortest proper non-causal definite status path from to that is m-connecting given , it holds that either a node in is not a non-endpoint node on , or there is a node on such that is a possibly causal path from to . Moreover, in this case must be a causal path in (because must start with a visible edge and because cannot be a subpath of a definite status path). Since itself is a non-causal path in , there is a collider on that is a descendant of . But since , this collider would then also have to be in , which we have ruled out as an option in the previous paragraph. Hence, a node on is also not a non-endpoint node on .
Then all colliders on are in . Also, any definite non-collider on is a possible ancestor of a collider on or of an endpoint on . Hence, every definite non-collider on is in . But, since is m-connecting given , none of the definite non-colliders on are in . Therefore, any definite non-collider on is in . This proves property (ii).
Next, consider property (iii). We have already shown that any collider on is in . So it is only left to show that at least one collider is on . Since we know that must be blocked by for some set , where , and since all definite non-colliders on are in , there is at least one collider on .
Property (iv) follows almost directly now, since by (ii), all definite non-colliders on are in and by (iii), none of the colliders can be in . The claim then holds since by definition of the in a PAG, .
Next, we show properties (v) and (vi). Let be a collider on . Then and there is an unshielded possibly directed path from to a node .
(v) Suppose for a contradiction that edge on is of type (possibly ). We derive a contradiction by constructing a proper definite status non-causal path from to that is m-connecting given and shorter than , or of the same length as but with a shorter distance to (Definition 16).
Let and be nodes on such that is a subpath of (possibly , ). Then paths and together with Lemma 28 imply that is in .
Suppose first that , and . Note that by property (iv) above, if , then is in . Moreover, if is in , then is in , otherwise path and edge contradict Lemma 29. Hence, if , the collider/definite non-collider status of is the same on and on . Analogous reasoning can be employed in the case when , to show that , that is, the collider/definite non-collider status of is the same on and on .
Now, we return to the general case where we allow and . In each of the cases below we will derive the contradiction by finding a path from to in that is a proper non-causal definite status path in and m-connecting given . Additionally, the path will either be shorter than or of the same length as , but with a shorter distance to (Definition 16) which implies a contradiction with our choice of .
Suppose first that is not a node on .
-
–
If , then
-
*
if and , then let . By the reasoning above, this path transformation amounts to replacing on with on thereby creating a path with the same properties as but with a shorter distance to (Definition 16).
-
*
If , and , then let . This path transformation amounts to replacing on , with on , thereby creating a path with the same properties as and of the same length as but with a shorter distance to (Definition 16).
-
*
If , and , then let . This path transformation amounts to replacing on , with on , thereby creating a path with the same properties as , that is of the same length, but with a shorter distance to (Definition 16).
-
*
If , and , then let . Now is of the form and clearly satisfies all the same properties as while being of the same length, but with a shorter distance to (Definition 16).
-
–
If , , then:
-
*
if , let . This path transformation amounts to replacing on , with on , thereby creating a shorter path with the same properties as .
-
*
If , then let . Due to the discussion above, is of the form in .
-
–
Otherwise, . If , this would imply that , which contradicts (iii). So must be in . Then:
-
*
if , then let . This path transformation amounts to replacing on , with on , thereby creating a shorter path with the same properties as .
-
*
If , then let . Due to the discussion above, is of the form in .
Otherwise, is on . Therefore, . Also, is a collider on , otherwise and , because of .
-
–
Suppose first that is on . Then:
-
*
if , then let . This path transformation amounts to replacing on , with on , thereby creating a shorter path with the same properties as .
-
*
If , then let . This path transformation amounts to replacing on , with on , thereby creating a shorter path with the same properties as .
-
–
Next, suppose that is on . Then depending on whether , we can choose one of the following paths as the path :
-
*
if , then let . This path transformation amounts to replacing on , with on , thereby creating a shorter path with the same properties as .
-
*
If , then let . Similarly to above, this path transformation amounts to replacing on , with on , thereby creating a shorter path with the same properties as .
(vi) Since we showed above that the starting edge on is not of the form , and since is an unshielded possibly directed path from to , in order to prove property (vi) it is enough to show that is also not of the form (since cannot be a subpath of any unshielded possibly directed path in , Zhang, 2008b). Suppose for a contradiction that is exactly of that form. Since and are in , by Lemma 28, is in .
Now, our goal is to identify nodes and on that satisfy the following. Node is on , and edge is in . Additionally, or is a non-endpoint node on that has the same definite non-collider/collider status on and on . Similarly, is on , and edge is in . Additionally, or is a non-endpoint node on that has the same definite non-collider/collider status on and on . We only show how to find node on , since the argument for finding on is exactly symmetric.
-
–
Consider the path . Note that by (iv) and the properties of unshielded paths, is of the form or , for some , .
Hence, if there is any non-endpoint node on such that , this node has the same definite collider / non-collider status on both and on . Then we choose . Otherwise, if there is a non-endpoint node on such that is of the form , and an edge or is in , then is a definite non-collider on both and and we choose .
We will now show that if neither of the above choices for are possible in , then is of the form , and for every node , on , the edge or is in . In this case, we choose .
Hence, consider first node on . By above is in . Also, by our assumption is not in , so we must have either or is in . Similarly, by the assumption above we now know that edge is not of the form , so we can conclude that is in .
Now, and either or is in . If is in , then of Zhang (2008b) would imply that . Moreover, since is in , of Zhang (2008b) would imply that is in , and our assumption further lets us conclude that , or is in
If is in , then and Lemma 28 imply that, is in . Hence, as above either , or is in
If we are done. Otherwise, we can repeat the same argument as in the preceding three paragraphs to conclude that is in , and either or are in . If , we can keep applying the same argument, until we reach .
Now that we have chosen the appropriate and the remaining argument is very similar to case (v). In each of the cases below we will derive the contradiction by finding a path from to in that is a proper non-causal definite status path in and m-connecting given . Additionally, the path will either be shorter than or of the same length as , but with a shorter distance to (Definition 16) which implies a contradiction with our choice of .
Suppose first that is not on :
-
–
If , then
-
*
if and , then let . By the reasoning above, this path transformation amounts to replacing on with on such that the collider / definite non-collider status of and is the same on both paths. Therefore, is a path with the same properties as , but either shorter than or of the same length but with a shorter distance to (Definition 16).
-
*
If , and , then let . By the reasoning above, this path transformation amounts to replacing on with on such that the collider / definite non-collider status of is the same on both paths, and is a non-causal path because of edge. Therefore, is a path with the same properties as but either shorter than or of the same length but with a shorter distance to (Definition 16).
-
*
If , and , then let . This path transformation amounts to replacing on with on such that the collider / definite non-collider status of is the same on both paths, and is a non-causal path because of edge. Therefore, is a path with the same properties as but either shorter than or of the same length but with a shorter distance to (Definition 16).
-
*
If and , . Then is of the form and and has a shorter distance to than .
-
–
If , , then:
-
*
if , then let . This path transformation amounts to replacing on with on such that the collider / definite non-collider status of is the same on both paths, and is a non-causal path because of edge. Therefore, is a path with the same properties as but shorter than .
-
*
If , then let , where based on the reasoning above, is of the form .
-
–
Otherwise, , . Then
-
*
if , then . Note that in this case is of the form , or , or . In all cases, is a proper non-causal definite status path from to that is m-connecting given
-
*
If , then let . We now discuss why is of the form in .
Note that cannot be in , since there exists a set that can satisfy the conditional adjustment criterion relative to in . If instead is a visible edge in , then there is either a node such that is in or there is a collection of nodes , such that , , and is in . Without loss of generality we will assume that we are in the fist case, that is is in and , since the latter case has an analogous proof to what follows.
By the above, the only way that is if is in and if for all nodes , or is in . Now, since is also in , and , we can use of Zhang (2008b) iteratively to conclude that is in for all . However, as , this contradicts our assumption that is in , for .
-
*
Otherwise, is on . Therefore, .
-
–
Suppose first that is on . By (iii), (iv), and the definition of , we have that is of one of the following forms:
-
*
for , or
-
*
, for some on , or
-
*
, for some on , or
-
*
.
Then
-
*
If , then . Note that by the above forms of is always a proper non-causal path from to . Additionally, by the above listed options for we know that has the same collider / definite non-collider status on both and . Hence, is also an m-connecting path given . Since is also shorter than , we obtain our contradiction.
-
*
If , we let . Path is proper, since itself is proper and . Furthermore, by the above listed options for we know that has the same collider / definite non-collider status on both and and that is a definite status path. Hence, is also an m-connecting path given . If is a non-causal path in , we obtain a contradiction with the choice of .
Hence, suppose for a contradiction that is a possibly causal path from to in . By assumption, it must be that is a visible edge in . Now, similarly to the previous case, since is a visible edge in , there is either a node such that is in or there is a collection of nodes such that , , and is in . We again assume without loss of generality that we are in the former case, that is, is in and .
Since , by the same reasoning as in the previous case above, we know that is in and that for all nodes , or is in . Now, since is in , and since , we can use of Zhang (2008b) iteratively to conclude that is in for all . However, as , this contradicts our assumption that is in .
-
–
Lastly, suppose that is on . Analogously to above, by (iii), (iv), and the definition of , we have that is of one of the following forms:
-
*
, or
-
*
, for some on , or
-
*
, for some on , or
-
*
.
Then
-
*
If , we have that is a proper non-causal path from to that is shorter than . Additionally, is of the same collider / definite non-collider status on both and , and therefore is not only of definite status but also m-connecting given in , which leads to a contradiction.
-
*
If , then is a proper definite status non-causal path that is m-connecting given in and shorter than .
-
*