
 

Conditional Adjustment in a Markov Equivalence Class


 


Sara LaPlante (University of Washington), Emilija Perković (University of Washington)


Abstract

We consider the problem of identifying a conditional causal effect through covariate adjustment. We focus on the setting where the causal graph is known up to one of two types of graphs: a maximally oriented partially directed acyclic graph (MPDAG) or a partial ancestral graph (PAG). Both MPDAGs and PAGs represent equivalence classes of possible underlying causal models. After defining adjustment sets in this setting, we provide a necessary and sufficient graphical criterion – the conditional adjustment criterion – for finding these sets under conditioning on variables unaffected by treatment. We further provide explicit sets from the graph that satisfy the conditional adjustment criterion, and therefore, can be used as adjustment sets for conditional causal effect identification.

1 INTRODUCTION

Many scientific disciplines have an interest in identifying and estimating causal effects for specific subgroups of a population. For instance, researchers may want to know if a medical treatment is beneficial for people with heart disease or if the treatment will harm older patients (Brand and Xie, 2010; Health, 2010). Such causal effects are referred to as conditional causal effects or heterogeneous causal effects. The identification of these conditional causal effects from observational data is the subject of this work.

Much of the literature on estimating conditional causal effects from observational data focuses on the conditional average treatment effect (CATE; Athey and Imbens, 2016; Wager and Athey, 2018; Künzel et al., 2019; Nie and Wager, 2021; Kennedy et al., 2022). The CATE is represented as a contrast of means for a response $Y$ under different do-interventions (see Section 2 for definition) of a treatment $X$ when conditioning on a set of covariate values $\mathbf{z}$. These means take the form $\mathbb{E}[Y|do(X=x),\mathbf{Z}=\mathbf{z}]$.

Some results on CATE estimation assume that the conditioning set $\mathbf{Z}$ is rich enough to capture all relevant common causes of $X$ and $Y$ – meaning that $X$ and $Y$ are unconfounded given $\mathbf{Z}$. This implies

$$\mathbb{E}[Y|do(X=x),\mathbf{Z}=\mathbf{z}]=\mathbb{E}[Y|X=x,\mathbf{Z}=\mathbf{z}], \tag{1}$$

which allows the CATE to be estimated as a difference of means from observational data.

However, this assumption does not hold in all applications. Consider, for example, the setting depicted in the causal directed acyclic graph (DAG) of Figure 1, where we want to compute a causal effect of $X$ on $Y$ given some set $\mathbf{Z}$. In this setting, age and smoking status are common causes of $X$ and $Y$, and therefore, $X$ and $Y$ are confounded unless we condition on both age and smoking status ($\mathbf{Z}=\{Age,Smoking\}$). But we may want to know the causal effect of $X$ on $Y$ conditional on age alone ($\mathbf{Z}=\{Age\}$).

Figure 1: A causal DAG used in Section 1 (nodes: $X$, $Age$, $Smoking$, $Y$).

To allow for estimation of the CATE in such cases, various recent works (Abrevaya et al., 2015; Fan et al., 2022; Chernozhukov et al., 2023; Smucler et al., 2020) have proposed estimation methods that rely on knowing an additional set of covariates $\mathbf{S}$ that – together with $\mathbf{Z}$ – leads to $X$ and $Y$ being unconfounded. We refer to this set of variables as a conditional adjustment set (Definition 1). For such a set $\mathbf{S}$,

$$\mathbb{E}[Y|do(X=x),\mathbf{Z}=\mathbf{z}]=\mathbb{E}_{\mathbf{S}}\Big[\mathbb{E}[Y|X=x,\mathbf{Z},\mathbf{S}]\ \Big|\ \mathbf{Z}=\mathbf{z}\Big]. \tag{2}$$

In the example above, if $\mathbf{Z}=\{Smoking\}$, then $\mathbf{S}=\{Age\}$.
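As a rough illustration of the functional in Equation (2) for discrete variables, the following Python sketch evaluates the iterated expectation directly from a joint probability table. The function names, the table layout, and the toy probabilities are assumptions made for this illustration only; they are not taken from the works cited above.

```python
from collections import defaultdict
from itertools import product


def conditional_adjustment_mean(joint, x, z):
    """Evaluate E_S[ E[Y | X=x, Z, S] | Z=z ] from a discrete joint pmf.

    `joint` maps (x, z, s, y) tuples to probabilities (hypothetical layout).
    """
    p_s = defaultdict(float)    # P(S=s, Z=z)
    p_xs = defaultdict(float)   # P(X=x, Z=z, S=s)
    ey_xs = defaultdict(float)  # sum over y of y * P(X=x, Z=z, S=s, Y=y)
    for (xi, zi, si, yi), p in joint.items():
        if zi != z:
            continue
        p_s[si] += p
        if xi == x:
            p_xs[si] += p
            ey_xs[si] += yi * p
    p_z = sum(p_s.values())
    total = 0.0
    for s, p in p_s.items():
        inner = ey_xs[s] / p_xs[s]  # E[Y | X=x, Z=z, S=s]; relies on positivity
        total += inner * p / p_z    # weighted by f(s | z)
    return total


def toy_joint():
    """Toy pmf: Z and S are independent fair coins that both affect X and Y."""
    joint = {}
    for x, z, s, y in product((0, 1), repeat=4):
        p_x1 = 0.2 + 0.3 * s + 0.3 * z
        p_y1 = 0.1 + 0.4 * x + 0.2 * s + 0.1 * z
        px = p_x1 if x == 1 else 1.0 - p_x1
        py = p_y1 if y == 1 else 1.0 - p_y1
        joint[(x, z, s, y)] = 0.25 * px * py
    return joint


print(conditional_adjustment_mean(toy_joint(), x=1, z=1))
```

In this toy table, $\mathbb{E}[Y|X=1,Z=1,S=s]=0.6+0.2s$ and $S$ is a fair coin independent of $Z$, so the printed value should be close to 0.7. In practice the inner regression and the conditional distribution of $\mathbf{S}$ given $\mathbf{Z}$ would be estimated from data rather than read off a known table.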

Of course, not all conditional causal effect research focuses on estimation through the functional in Equation (2). Notably, other work has explored identifiability without limiting focus to a particular functional. For example, Shpitser and Pearl (2008) and Jaber et al. (2019, 2022) focus on the conditions under which the interventional distribution $f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})$ is identifiable given a causal graph. Though these results broaden the options for identification, estimators based on these results would have to rely on functionals that may prove difficult to estimate, such as $\frac{f(\mathbf{y},\mathbf{z}|do(\mathbf{x}))}{f(\mathbf{z}|do(\mathbf{x}))}$ (Shpitser and Pearl, 2008; Jaber et al., 2019, 2022). Our work addresses this by focusing on identification of the same interventional distribution given a causal graph – but through the use of conditional adjustment sets, which may lead to more desirable estimators. To the best of our knowledge, this area of research is largely unexplored.

Our main contribution is the conditional adjustment criterion (Definitions 2 and 7), a graphical criterion that we show is necessary and sufficient for identifying a conditional adjustment set (Theorems 3 and 9). We additionally provide explicit sets that satisfy this criterion when any such set exists. We note, however, that these results are restricted to a setting where the conditioning set $\mathbf{Z}$ consists of variables known to be unaffected by treatment. While this restricted setting produces limitations (see the second example in the discussion, Section 5), our results are broadly applicable to a variety of research questions. For example, the restriction is met when the conditioning set includes exclusively pre-treatment variables.

In considering the problem of identifying a conditional adjustment set, we assume that the underlying causal system can be represented by a causal DAG. When we collect observational data on all variables in the system, we can attempt to learn this causal DAG by relying on the constraints present in the data (Spirtes et al., 1999; Chickering, 2002; Zhang, 2008b; Hauser and Bühlmann, 2012; Mooij et al., 2020; Squires and Uhler, 2022). However, this task is often impossible from observational data alone, regardless of the available sample size. And further, we cannot always observe every variable.

Thus, our work focuses on causal models that represent Markov equivalence classes of graphs that can be learned from observational data: a maximally oriented partially directed acyclic graph (MPDAG; Meek, 1995) and a maximally informative partial ancestral graph (PAG; Richardson and Spirtes, 2002). An MPDAG represents a restriction of the Markov equivalence class of DAGs that can be learned from observational data and background knowledge when all variables are observed (Andersson et al., 1997; Meek, 1995; Chickering, 2002). A PAG represents a Markov equivalence class of maximal ancestral graphs (MAGs; Richardson and Spirtes, 2002), which can be learned from observational data and which allows for unobserved variables (Spirtes et al., 2000; Zhang, 2008b; Ali et al., 2009). A MAG, in turn, can be seen as a marginalization of a DAG containing only the observed variables (Richardson and Spirtes, 2002). See Section 2 and Supp. A for further definitions.

The structure of this paper is as follows: Section 2 provides preliminary definitions, with the remaining definitions given in Supp. A. Section 3 contains all results for the MPDAG setting. In particular, we introduce our conditional adjustment criterion in Section 3.1; Section 3.2 illustrates applications of our criterion with examples; Section 3.3 provides several methods for constructing conditional adjustment sets; and Section 3.4 includes a discussion of the similarities of our conditional adjustment criterion with both the adjustment criterion of Perković et al. (2017) and the $\mathbf{Z}$-dependent dynamic adjustment criterion of Smucler et al. (2020). We present some analogous results for PAGs in Section 4, and we discuss some limitations of our results and areas for future work in Section 5.

2 PRELIMINARIES

We use capital letters (e.g. $X$) to denote nodes in a graph as well as random variables that these nodes represent. Similarly, bold capital letters (e.g. $\mathbf{X}$) are used to denote node sets and random vectors.

Nodes, Edges, and Subgraphs. A graph $\mathcal{G}=(\mathbf{V},\mathbf{E})$ consists of a set of nodes (variables) $\mathbf{V}=\{V_1,\dots,V_p\}$, $p\geq 1$, and a set of edges $\mathbf{E}$. Edges can be directed ($\rightarrow$), bi-directed ($\leftrightarrow$), undirected ($\circ\!\!-\!\!\circ$ or $-$), or partially directed ($\circ\!\!\rightarrow$). We use $\bullet$ as a stand-in for any of the allowed edge marks. An edge is into (out of) a node $X$ if the edge has an arrowhead (tail) at $X$. An induced subgraph $\mathcal{G}_{\mathbf{V'}}=(\mathbf{V'},\mathbf{E'})$ of $\mathcal{G}$ consists of $\mathbf{V'}\subseteq\mathbf{V}$ and $\mathbf{E'}\subseteq\mathbf{E}$, where $\mathbf{E'}$ contains all edges in $\mathbf{E}$ between nodes in $\mathbf{V'}$.

Directed and Partially Directed Graphs. A directed graph contains only directed edges ($\to$). A partially directed graph may contain undirected edges ($-$) and directed edges ($\to$).

Mixed and Partially Directed Mixed Graphs. A mixed graph may contain directed and bi-directed edges. The partially directed mixed graphs we consider can contain any of the following edge types: $\circ\!\!-\!\!\circ$, $\circ\!\!\rightarrow$, $\to$, and $\leftrightarrow$. Hence, an edge $\bullet\!\!\rightarrow$ in a partially directed graph can only refer to the edge $\to$, whereas in a partially directed mixed graph, $\bullet\!\!\rightarrow$ can represent $\to$, $\leftrightarrow$, or $\circ\!\!\rightarrow$.

Paths and Cycles. For disjoint node sets $\mathbf{X}$ and $\mathbf{Y}$, a path from $\mathbf{X}$ to $\mathbf{Y}$ is a sequence of distinct nodes $\langle X,\dots,Y\rangle$ from some $X\in\mathbf{X}$ to some $Y\in\mathbf{Y}$ for which every pair of successive nodes is adjacent. A path consisting of undirected edges ($-$ or $\circ\!\!-\!\!\circ$) is an undirected path. A directed path from $X$ to $Y$ is a path of the form $X\to\dots\to Y$. A directed path from $X$ to $Y$ and the edge $Y\to X$ form a directed cycle. A directed path from $X$ to $Y$ and the edge $X\leftrightarrow Y$ form an almost directed cycle. A path $\langle V_1,\dots,V_k\rangle$, $k>1$, in a graph $\mathcal{G}$ is a possibly directed path if no edge $V_i\leftarrow\!\!\bullet V_j$, $1\leq i<j\leq k$, is in $\mathcal{G}$ (Perković et al., 2017; Zhang, 2008a).

A path from $\mathbf{X}$ to $\mathbf{Y}$ is proper (w.r.t. $\mathbf{X}$) if only its first node is in $\mathbf{X}$. A path from $X$ to $Y$ is a back-door path if it does not begin with a visible edge out of $X$ (see definition of visible below; Pearl, 2009; Maathuis and Colombo, 2015). For a path $p=\langle X_1,X_2,\dots,X_k\rangle$ and $i,j,k$ such that $1\leq i<j\leq k$, we define the subpath of $p$ from $X_i$ to $X_j$ as the path $p(X_i,X_j)=\langle X_i,X_{i+1},\dots,X_j\rangle$.

Colliders, Shields, and Definite Status Paths. If a path $p$ contains $X_i\bullet\!\!\rightarrow X_j\leftarrow\!\!\bullet X_k$ as a subpath, then $X_j$ is a collider on $p$. A path $\langle X_i,X_j,X_k\rangle$ is an unshielded triple if $X_i$ and $X_k$ are not adjacent. A path is unshielded if all successive triples on the path are unshielded. A node $X_j$ is a definite non-collider on a path $p$ if the edge $X_i\leftarrow X_j$ or $X_j\rightarrow X_k$ is on $p$, or if $\langle X_i,X_j,X_k\rangle$ is an undirected subpath of $p$ and $X_i$ is not adjacent to $X_k$. A node is of definite status on a path if it is a collider, a definite non-collider, or an endpoint on the path. A path $p$ is of definite status if every node on $p$ is of definite status.

Blocking, D-separation, and M-separation. Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a directed or partially directed graph $\mathcal{G}$. A definite status path $p$ from $\mathbf{X}$ to $\mathbf{Y}$ is d-connecting given $\mathbf{Z}$ if every definite non-collider on $p$ is not in $\mathbf{Z}$ and every collider on $p$ has a descendant in $\mathbf{Z}$. Otherwise, $\mathbf{Z}$ blocks $p$. If $\mathbf{Z}$ blocks all definite status paths between $\mathbf{X}$ and $\mathbf{Y}$ in $\mathcal{G}$, then $\mathbf{X}$ is d-separated from $\mathbf{Y}$ given $\mathbf{Z}$ in $\mathcal{G}$ and we write $(\mathbf{X}\perp_{d}\mathbf{Y}|\mathbf{Z})_{\mathcal{G}}$ (Pearl, 2009).

If $\mathcal{G}$ is a mixed or partially directed mixed graph, the analogous terms to d-connection and d-separation are called m-connection and m-separation (Richardson and Spirtes, 2002). If a path is not m-connecting in such a graph $\mathcal{G}$, we will also call it blocked. We will also use the same notation $\perp_{d}$ to denote m-separation in a mixed or partially directed mixed graph $\mathcal{G}$.
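For the special case of a fully directed acyclic graph, d-separation can be checked mechanically. The sketch below does not enumerate paths; it uses the well-known equivalent moralization check (separation in the moral graph of the smallest ancestral set containing the three node sets). The function name `d_separated` and the encoding of a DAG as a dictionary from each node to its children are assumptions made for this illustration, and the sketch does not cover partially directed or mixed graphs.

```python
def d_separated(dag, xs, ys, zs):
    """Check whether xs and ys are d-separated given zs in a DAG.

    `dag` maps each node to an iterable of its children (hypothetical encoding).
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # Parent lookup built from the child lists.
    parents = {}
    for a, children in dag.items():
        parents.setdefault(a, set())
        for b in children:
            parents.setdefault(b, set()).add(a)

    # Step 1: smallest ancestral set containing xs, ys, and zs.
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, ()))

    # Step 2: moralize, i.e. connect each node to its parents, "marry" the
    # parents of every node, and drop edge directions.
    nbrs = {v: set() for v in anc}
    for v in anc:
        pa = parents.get(v, set())
        for p in pa:
            nbrs[v].add(p)
            nbrs[p].add(v)
            for q in pa:
                if p != q:
                    nbrs[p].add(q)

    # Step 3: xs and ys are d-separated iff no undirected path avoids zs.
    seen, stack = set(xs), list(xs)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in nbrs[v] - zs - seen:
            seen.add(w)
            stack.append(w)
    return True


# Toy DAG with two common causes of X and Y and no edge between X and Y.
confounded = {"Age": ["X", "Y"], "Smoking": ["X", "Y"]}
print(d_separated(confounded, {"X"}, {"Y"}, {"Age", "Smoking"}))
print(d_separated(confounded, {"X"}, {"Y"}, {"Age"}))
```

In the toy graph, conditioning on both common causes separates $X$ and $Y$, while conditioning on one of them alone leaves the path through the other open.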

Ancestral Relationships. If $X\to Y$, then $X$ is a parent of $Y$. If $X-Y$, $X\circ\!\!-\!\!\circ Y$, $X\circ\!\!\rightarrow Y$, or $X\to Y$, then $X$ is a possible parent of $Y$. If there is a directed path from $X$ to $Y$, such as $X\to M_1\to\dots\to M_k$, $M_k=Y$, $k\geq 1$, then $X$ is an ancestor of $Y$, $Y$ is a descendant of $X$, and $M_1,\dots,M_k$ are mediators for $X$ and $Y$. We use the convention that if $Y$ is a descendant of $X$, then $Y$ is also a mediator for $X$ and $Y$. If there is a possibly directed path from $X$ to $Y$, then $X$ is a possible ancestor of $Y$, $Y$ is a possible descendant of $X$, and any node on this path that is not $X$ is a possible mediator of $X$ and $Y$. We use the convention that if $Y$ is a possible descendant of $X$, then $Y$ is also a possible mediator for $X$ and $Y$. We also use the convention that every node is an ancestor, descendant, possible ancestor, and possible descendant of itself. The sets of parents, possible parents, ancestors, descendants, possible ancestors, and possible descendants of $X$ in $\mathcal{G}$ are denoted by $\operatorname{Pa}(X,\mathcal{G})$, $\operatorname{PossPa}(X,\mathcal{G})$, $\operatorname{An}(X,\mathcal{G})$, $\operatorname{De}(X,\mathcal{G})$, $\operatorname{PossAn}(X,\mathcal{G})$, and $\operatorname{PossDe}(X,\mathcal{G})$, respectively. Similarly, we denote the sets of mediators and possible mediators for $X$ and $Y$ in $\mathcal{G}$ by $\operatorname{Med}(X,Y,\mathcal{G})$ and $\operatorname{PossMed}(X,Y,\mathcal{G})$.

We let $\operatorname{An}(\mathbf{X},\mathcal{G})=\cup_{X\in\mathbf{X}}\operatorname{An}(X,\mathcal{G})$, with analogous definitions for $\operatorname{De}(\mathbf{X},\mathcal{G})$, $\operatorname{PossAn}(\mathbf{X},\mathcal{G})$, and $\operatorname{PossDe}(\mathbf{X},\mathcal{G})$. For disjoint node sets $\mathbf{X}$ and $\mathbf{Y}$, we let $\operatorname{Med}(\mathbf{X},\mathbf{Y},\mathcal{G})$ be the union of all mediators of $X\in\mathbf{X}$ and $Y\in\mathbf{Y}$ that lie on a proper causal path from $\mathbf{X}$ to $\mathbf{Y}$, with an analogous definition for $\operatorname{PossMed}(\mathbf{X},\mathbf{Y},\mathcal{G})$. Unconventionally, we define $\operatorname{Pa}(\mathbf{X},\mathcal{G})=(\cup_{X\in\mathbf{X}}\operatorname{Pa}(X,\mathcal{G}))\setminus\mathbf{X}$. We denote that $X$ is adjacent to $Y$ in $\mathcal{G}$ by $X\in\operatorname{Adj}(Y,\mathcal{G})$.

DAGs and PDAGs. A directed graph without directed cycles is a directed acyclic graph (DAG). A partially directed acyclic graph (PDAG) is a partially directed graph without directed cycles.

MAGs. A mixed graph without directed or almost directed cycles is called ancestral. Note that we do not consider ancestral graphs that represent selection bias (see Zhang, 2008a, for details). A maximal ancestral graph (MAG) is an ancestral graph $\mathcal{M}=(\mathbf{V},\mathbf{E})$ where every pair of non-adjacent nodes $X$ and $Y$ in $\mathcal{M}$ can be m-separated by a set $\mathbf{Z}\subseteq\mathbf{V}\setminus\{X,Y\}$. A DAG $\mathcal{D}=(\mathbf{V},\mathbf{E})$ with unobserved variables $\mathbf{U}\subseteq\mathbf{V}$ can be uniquely represented by a MAG $\mathcal{M}=(\mathbf{V}\setminus\mathbf{U},\mathbf{E'})$, which preserves the ancestry and m-separations among the observed variables (Richardson and Spirtes, 2002).

MPDAGs and Markov Equivalence. All DAGs over a node set $\mathbf{V}$ with the same adjacencies and unshielded colliders can be uniquely represented by a completed PDAG (CPDAG). These DAGs form a Markov equivalence class with the same set of d-separations. A maximally oriented PDAG (MPDAG) is formed by taking a CPDAG, adding background knowledge (by directing undirected edges), and completing the orientation rules of Meek (1995). We say a DAG is represented by an MPDAG $\mathcal{G}$ if it has the same nodes, adjacencies, and directed edges as $\mathcal{G}$. The set of such DAGs – denoted by $[\mathcal{G}]$ – forms a restriction of the Markov equivalence class so that all DAGs in $[\mathcal{G}]$ have the same set of d-separations. Note that if $\mathcal{G}$ has the edge $A-B$, then $[\mathcal{G}]$ contains at least one DAG with $A\to B$ and one DAG with $A\leftarrow B$ (Meek, 1995). Further, note that all DAGs and CPDAGs are MPDAGs.

PAGs and Markov Equivalence. All MAGs that encode the same set of m-separations form a Markov equivalence class, which can be uniquely represented by a partial ancestral graph (PAG; Richardson and Spirtes, 2002; Ali et al., 2009). $[\mathcal{G}]$ denotes all MAGs represented by a PAG $\mathcal{G}$. We say a DAG $\mathcal{D}$ is represented by a PAG $\mathcal{G}$ if there is a MAG $\mathcal{M}\in[\mathcal{G}]$ such that $\mathcal{D}$ is represented by $\mathcal{M}$.

We do not consider PAGs that represent selection bias (see Zhang, 2008b). Further, we only consider maximally informative PAGs (Zhang, 2008b). That is, if a PAG $\mathcal{G}$ has the edge $A\bullet\!\!-\!\!\circ B$, then $[\mathcal{G}]$ contains a MAG with $A\bullet\!\!\rightarrow B$ and a MAG with $A\leftarrow B$. (We preclude MAGs with $A-B$ by assuming no selection bias.) Any arrowhead or tail edge mark in a PAG $\mathcal{G}$ corresponds to that same arrowhead or tail edge mark in every MAG in $[\mathcal{G}]$. The edge orientations in every PAG we consider are completed with respect to orientation rules $R1$-$R4$ and $R8$-$R10$ of Zhang (2008b).

Visible and Invisible Edges. Given a MAG or PAG $\mathcal{G}$, a directed edge $X\rightarrow Y$ is visible in $\mathcal{G}$ if there is a node $V\notin\operatorname{Adj}(Y,\mathcal{G})$ such that $\mathcal{G}$ contains either $V\bullet\!\!\rightarrow X$ or $V\bullet\!\!\rightarrow V_1\leftrightarrow\dots\leftrightarrow V_k\leftrightarrow X$, where $k\geq 1$ and $V_1,\dots,V_k\in\operatorname{Pa}(Y,\mathcal{G})\setminus\{V,X,Y\}$ (Zhang, 2006). A directed edge that is not visible in a MAG or PAG is said to be invisible.

Markov Compatibility and Positivity. An observational density $f(\mathbf{v})$ is Markov compatible with a DAG $\mathcal{D}=(\mathbf{V},\mathbf{E})$ if $f(\mathbf{v})=\prod_{V_i\in\mathbf{V}}f(v_i|\operatorname{pa}(v_i,\mathcal{D}))$. If $f(\mathbf{v})$ is Markov compatible with a DAG $\mathcal{D}$, then it is Markov compatible with every DAG that is Markov equivalent to $\mathcal{D}$ (Pearl, 2009). Hence, we say that a density is Markov compatible with an MPDAG, MAG, or PAG $\mathcal{G}$ if it is Markov compatible with a DAG represented by $\mathcal{G}$. Throughout, we assume positivity. That is, we only consider distributions that satisfy $f(\mathbf{v})>0$ for all valid values of $\mathbf{V}$ (Kivva et al., 2023).

Probabilistic Implications of Graph Separation. Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a DAG, MPDAG, MAG, or PAG $\mathcal{G}$. If $\mathbf{X}$ and $\mathbf{Y}$ are d-separated or m-separated given $\mathbf{Z}$ in $\mathcal{G}$, then $\mathbf{X}$ and $\mathbf{Y}$ are conditionally independent given $\mathbf{Z}$ in any observational density that is Markov compatible with $\mathcal{G}$ (Lauritzen et al., 1990; Zhang, 2008a; Henckel et al., 2022).

Causal Graphs. Let $\mathcal{G}$ be a graph with nodes $V_i$ and $V_j$. When $\mathcal{G}$ is an MPDAG, it is a causal MPDAG if every edge $V_i\to V_j$ represents a direct causal effect of $V_i$ on $V_j$ and if every edge $V_i-V_j$ represents a direct causal effect of unknown direction (either $V_i$ affects $V_j$ or $V_j$ affects $V_i$). Note that all DAGs are MPDAGs.

When $\mathcal{G}$ is a MAG or PAG, it is a causal MAG or causal PAG, respectively, if every edge $V_i\to V_j$ represents the presence of a causal path from $V_i$ to $V_j$; every edge $V_i\leftarrow\!\!\bullet V_j$ represents the absence of a causal path from $V_i$ to $V_j$; and every edge $V_i\circ\!\!-\!\!\circ V_j$ represents the presence of a causal path of unknown direction or a common cause in the underlying causal DAG.

Causal and Non-causal Paths. Note that any directed or possibly directed path in a causal graph is causal or possibly causal, respectively. However, since we focus on causal graphs, we will use this causal terminology for paths in any of our graphs. We will say a path is non-causal if it is not possibly causal.

Consistency. Let $f(\mathbf{v})$ be an observational density over $\mathbf{V}$. The notation $do(\mathbf{X}=\mathbf{x})$, or $do(\mathbf{x})$ for short, represents an outside intervention that sets $\mathbf{X}\subseteq\mathbf{V}$ to fixed values $\mathbf{x}$. An interventional density $f(\mathbf{v}|do(\mathbf{x}))$ is a density resulting from such an intervention.

Let $\mathbf{F^{*}}$ denote the set of all interventional densities $f(\mathbf{v}|do(\mathbf{x}))$ such that $\mathbf{X}\subseteq\mathbf{V}$ (including $\mathbf{X}=\emptyset$). A causal DAG $\mathcal{D}=(\mathbf{V},\mathbf{E})$ is a causal Bayesian network compatible with $\mathbf{F^{*}}$ if and only if for all $f(\mathbf{v}|do(\mathbf{x}))\in\mathbf{F^{*}}$, the following truncated factorization holds:

$$f(\mathbf{v}|do(\mathbf{x}))=\prod_{V_i\in\mathbf{V}\setminus\mathbf{X}}f(v_i|\operatorname{pa}(v_i,\mathcal{D}))\,\mathbb{1}(\mathbf{X}=\mathbf{x}) \tag{3}$$

(Pearl, 2009; Bareinboim et al., 2012). We say an interventional density is consistent with a causal DAG $\mathcal{D}$ if it belongs to a set of interventional densities $\mathbf{F^{*}}$ such that $\mathcal{D}$ is compatible with $\mathbf{F^{*}}$. Note that any observational density that is Markov compatible with $\mathcal{D}$ is consistent with $\mathcal{D}$. We say an interventional density is consistent with a causal MPDAG, MAG, or PAG $\mathcal{G}$ if it is consistent with each DAG represented by $\mathcal{G}$ – were the DAG to be causal.
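To make the truncated factorization concrete, the following Python sketch evaluates it by brute force for binary variables. The DAG (two common causes `Age` and `Smoking` of `X` and `Y`, plus an edge from `X` to `Y`), the conditional probabilities, and all function names are hypothetical choices made for this illustration. The sketch then forms a conditional interventional probability as a ratio of sums of the truncated factorization.

```python
from itertools import product


def joint_prob(cpts, parents, assignment, do=None):
    """Probability of a full assignment under the truncated factorization.

    cpts[v](value, parent_values) returns f(v | pa(v)); `do` maps intervened
    nodes to their fixed values (an empty `do` gives the observational density).
    """
    do = do or {}
    prob = 1.0
    for v, pa in parents.items():
        if v in do:
            # Indicator term: the assignment must agree with the intervention.
            if assignment[v] != do[v]:
                return 0.0
            continue  # the factor f(v | pa(v)) of an intervened node is dropped
        prob *= cpts[v](assignment[v], tuple(assignment[p] for p in pa))
    return prob


def interventional_conditional(cpts, parents, y, do, given):
    """f(y | do(x), z) for binary nodes, by summing over all assignments."""
    nodes = list(parents)
    num = den = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        a = dict(zip(nodes, values))
        if any(a[k] != v for k, v in given.items()):
            continue
        p = joint_prob(cpts, parents, a, do)
        den += p
        if all(a[k] == v for k, v in y.items()):
            num += p
    return num / den


def bern(p_one):
    """Turn a function parent_values -> P(V=1) into a conditional pmf."""
    return lambda value, pa: p_one(pa) if value == 1 else 1.0 - p_one(pa)


# Hypothetical toy DAG and conditional probabilities.
parents = {"Age": (), "Smoking": (), "X": ("Age", "Smoking"),
           "Y": ("X", "Age", "Smoking")}
cpts = {
    "Age": bern(lambda pa: 0.4),
    "Smoking": bern(lambda pa: 0.3),
    "X": bern(lambda pa: 0.2 + 0.3 * pa[0] + 0.4 * pa[1]),
    "Y": bern(lambda pa: 0.1 + 0.4 * pa[0] + 0.2 * pa[1] + 0.1 * pa[2]),
}

causal = interventional_conditional(cpts, parents, {"Y": 1}, {"X": 1}, {"Age": 1})
naive = interventional_conditional(cpts, parents, {"Y": 1}, {}, {"X": 1, "Age": 1})
print(causal, naive)
```

With these toy numbers, the interventional quantity is $0.7+0.1\cdot 0.3=0.73$, while conditioning on $X=1$ without intervening gives a different value because `Smoking` remains an uncontrolled common cause.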

Identifiability. Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a causal MPDAG or PAG $\mathcal{G}=(\mathbf{V},\mathbf{E})$, and let $\mathbf{F^{*}_{i}}=\{f_i(\mathbf{v}|do(\mathbf{x'})):\mathbf{X'}\subseteq\mathbf{V}\}$ be a set with which a DAG $\mathcal{D}_i$ represented by $\mathcal{G}$ is compatible – were $\mathcal{D}_i$ to be causal. We say the conditional causal effect of $\mathbf{X}$ on $\mathbf{Y}$ given $\mathbf{Z}$ is identifiable in $\mathcal{G}$ if for any $\mathbf{F^{*}_{1}},\mathbf{F^{*}_{2}}$ where $f_1(\mathbf{v})=f_2(\mathbf{v})$, we have $f_1(\mathbf{y}|do(\mathbf{x}),\mathbf{z})=f_2(\mathbf{y}|do(\mathbf{x}),\mathbf{z})$ (Pearl, 2009).

Forbidden Set. Let $\mathbf{X}$ and $\mathbf{Y}$ be disjoint node sets in an MPDAG or PAG $\mathcal{G}$. Then the forbidden set relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{G}$ is

$$\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{G})=\{\text{nodes in }\operatorname{PossDe}(W,\mathcal{G})\text{, where }W\in\operatorname{PossMed}(\mathbf{X},\mathbf{Y},\mathcal{G})\}. \tag{5}$$

3 RESULTS - MPDAGS

In this section, we present our results on identifying a conditional causal effect via our conditional adjustment criterion in the setting of an MPDAG (Definition 2). Examples of how to use our criterion and explicit conditional adjustment sets based on our criterion follow these results. We remark here that our criterion shares similarities with the adjustment criterion for total effect identification of Perković et al. (2017) and with the $\mathbf{Z}$-dependent dynamic adjustment criterion of Smucler et al. (2020), but we save these results and reflections for Section 3.4.

Note that the results of this section hold when a fully oriented DAG is known, since all DAGs are MPDAGs. Throughout, our goal is to identify the conditional causal effect of treatments $\mathbf{X}$ on responses $\mathbf{Y}$ conditional on covariates $\mathbf{Z}$ and given a known graph $\mathcal{G}$.

3.1 Conditional Adjustment Criterion

We include our definition of a conditional adjustment set below (Definition 1). Note that, while this section focuses on MPDAGs, we write Definition 1 broadly for further use in Section 4. Our goal in this section is to find an equivalent graphical characterization of a conditional adjustment set. Theorem 3 establishes that Definition 2 provides such a graphical characterization, which we call the conditional adjustment criterion, under the assumption that the conditioning set does not contain variables affected by treatment ($\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$).

Definition 1

(Conditional Adjustment Set for MPDAGs, PAGs) Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in a causal MPDAG or PAG $\mathcal{G}$. Then $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$ if for any density $f$ consistent with $\mathcal{G}$

$$f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})=\begin{cases}f(\mathbf{y}|\mathbf{x},\mathbf{z})&\mathbf{S}=\emptyset\\ \int f(\mathbf{y}|\mathbf{x},\mathbf{z},\mathbf{s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s}&\mathbf{S}\neq\emptyset.\end{cases} \tag{6}$$

Definition 2

(Conditional Adjustment Criterion for MPDAGs) Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in an MPDAG $\mathcal{G}$, where $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$ and where every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed edge. Then $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$ if

  (a) $\mathbf{S}\cap\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{G})=\emptyset$, and

  (b) $\mathbf{S}\cup\mathbf{Z}$ blocks all proper non-causal definite status paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$.
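For intuition, conditions (a) and (b) can be checked by brute force in the special case where the MPDAG is a fully directed DAG. In that case possible descendants and possibly causal paths reduce to descendants and directed paths, and every path is of definite status. The Python sketch below enumerates all proper paths, so it is only suitable for small graphs. The function names and the encoding of a DAG as a dictionary from each node to its children are assumptions made for this illustration, not an algorithm from this paper.

```python
def descendants(dag, node):
    """Descendants of `node` (including itself); `dag` maps node -> children."""
    seen, stack = {node}, [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def proper_paths(dag, xs, ys):
    """Paths in the skeleton from xs to ys whose only node in xs is the first."""
    nbrs = {}
    for a, children in dag.items():
        for b in children:
            nbrs.setdefault(a, set()).add(b)
            nbrs.setdefault(b, set()).add(a)

    def extend(path):
        if len(path) > 1 and path[-1] in ys:
            yield path
        for n in nbrs.get(path[-1], ()):
            if n not in path and n not in xs:
                yield from extend(path + [n])

    for x in xs:
        yield from extend([x])


def is_causal(dag, path):
    """True if every edge on the path points forward (a directed path)."""
    return all(b in dag.get(a, ()) for a, b in zip(path, path[1:]))


def is_open(dag, path, cond):
    """True if `path` is d-connecting given the node set `cond`."""
    for a, b, c in zip(path, path[1:], path[2:]):
        if b in dag.get(a, ()) and b in dag.get(c, ()):  # b is a collider
            if not descendants(dag, b) & cond:
                return False
        elif b in cond:  # b is a non-collider
            return False
    return True


def satisfies_criterion(dag, xs, ys, zs, ss):
    """Check conditions (a) and (b) for S in a DAG, given Z unaffected by X."""
    xs, ys, zs, ss = set(xs), set(ys), set(zs), set(ss)
    if any(zs & descendants(dag, x) for x in xs):
        raise ValueError("Z must not contain descendants of X")
    paths = list(proper_paths(dag, xs, ys))
    mediators = {v for p in paths if is_causal(dag, p) for v in p[1:]}
    forb = set().union(*(descendants(dag, w) for w in mediators))
    if ss & forb:  # condition (a) fails
        return False
    cond = ss | zs  # condition (b): S and Z together must block
    return not any(is_open(dag, p, cond) for p in paths if not is_causal(dag, p))


# Hypothetical toy DAG: two common causes of X and Y, and a mediator M.
toy = {"Age": ["X", "Y"], "Smoking": ["X", "Y"], "X": ["M"], "M": ["Y"]}
print(satisfies_criterion(toy, {"X"}, {"Y"}, {"Age"}, {"Smoking"}))
print(satisfies_criterion(toy, {"X"}, {"Y"}, {"Age"}, set()))
```

In the toy graph with $\mathbf{Z}=\{Age\}$, the set $\{Smoking\}$ passes both conditions, the empty set fails condition (b), and any set containing the mediator `M` fails condition (a).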

Theorem 3

(Completeness, Soundness of Conditional Adjustment Criterion for MPDAGs) Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in a causal MPDAG $\mathcal{G}$, where $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$. Then $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$ (Definition 1) if and only if $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$ (Definition 2).

Proof of Theorem 3. First note the following facts.

  (i) Every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed edge.

  (ii) $\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset$ in every DAG $\mathcal{D}$ in $[\mathcal{G}]$.

  (iii) $\mathbf{Z}\cap\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{G})=\emptyset$.

We have that (i) holds in either direction – by definition ($\Leftarrow$) or by Proposition 36 (Supp. C) ($\Rightarrow$). Then Lemmas 20 and 26 (Supp. B) imply (ii) and (iii), respectively, given $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$ and (i).

Now consider the following statements.

  (a) $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$.

  (b) $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in each DAG in $[\mathcal{G}]$ – were the DAG to be causal.

  (c) $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in each DAG in $[\mathcal{G}]$.

  (d) $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$.

By definition, (a) $\Leftrightarrow$ (b). Then (b) $\Leftrightarrow$ (c) by Theorems 39 and 40 (Supp. D) and the fact that the conditional adjustment criterion does not require a causal DAG. Lastly, by the facts above and by applying Lemmas 21 and 22 (Supp. B) in turn, (c) $\Leftrightarrow$ (d). $\square$

3.2 Examples

To illustrate the usefulness of the results above, we provide examples below where we aim to find $f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})$ when $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$. Theorem 3 allows us to use the conditional adjustment criterion to (a) check whether a set can be used for conditional adjustment (Examples 1-3) or (b) determine if no such set exists (Example 4).

Figure 2: Causal MPDAGs used in Examples 1-4. Panel (a): nodes $X$, $V_1$, $V_2$, $V_3$, $Y$, $V_4$. Panel (b): nodes $X_1$, $Z$, $W$, $S$, $X_2$, $Y$, $L$. Panel (c): nodes $X$, $V_1$, $V_2$, $V_3$, $Y$.

Example 1

(Empty Conditional Adjustment Set.) Let $\mathcal{G}$ be the causal MPDAG in Figure 2(a) (compare to Figure 5(a) of Perković (2020)), and let $\mathbf{X}=\{X\}$, $\mathbf{Y}=\{Y\}$, and $\mathbf{Z}=\{V_1,V_2\}$. Note that $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$ and that every possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed edge.

Let $\mathbf{S}=\emptyset$. Note that $\mathbf{S}\cap(\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z})=\emptyset$, $\mathbf{S}\cap\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{G})=\emptyset$, and $\mathbf{S}\cup\mathbf{Z}$ blocks all non-causal definite status paths from $\mathbf{X}$ to $\mathbf{Y}$. Thus, $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$, and by Theorem 3, $f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})=f(y|x,v_1,v_2)$.

Example 2

(Only Nonempty Conditional Adjustment Sets.) Again let $\mathcal{G}$ be the causal MPDAG in Figure 2(a), where $\mathbf{X}=\{X\}$ and $\mathbf{Y}=\{Y\}$. But now let $\mathbf{Z}=\{V_{1}\}$. We still have that $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$ and that every possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed edge.

Note that if we let $\mathbf{S}=\emptyset$, then $\mathbf{S}\cup\mathbf{Z}$ does not block the path $X\leftarrow V_{2}\to Y$, which is a proper non-causal definite status path from $\mathbf{X}$ to $\mathbf{Y}$. Thus, the empty set is not a conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$.

Consider, instead, the set $\mathbf{S}=\{V_{2}\}$. Note that $\mathbf{S}\cap(\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z})=\emptyset$, $\mathbf{S}\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})=\emptyset$, and $\mathbf{S}\cup\mathbf{Z}$ blocks all non-causal definite status paths from $\mathbf{X}$ to $\mathbf{Y}$. Thus, $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$, and by Theorem 3, $f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})=\int f(y|x,v_{1},v_{2})f(v_{2}|v_{1})\mathop{}\!\mathrm{d}v_{2}$.
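The adjustment formula of Example 2 can be checked numerically on a toy model. The sketch below is an illustration only: it uses a hypothetical binary DAG ($V_{1}\to V_{2}$, $V_{1}\to X$, $V_{2}\to X$, $V_{1}\to Y$, $V_{2}\to Y$, $X\to Y$) with made-up probabilities, not the MPDAG of Figure 2(a). With $\mathbf{Z}=\{V_{1}\}$ and $\mathbf{S}=\{V_{2}\}$, it compares the interventional density obtained from the truncated factorization with the right-hand side $\sum_{v_{2}}f(y|x,v_{1},v_{2})f(v_{2}|v_{1})$ computed from the observational joint, and with the unadjusted conditional $f(y|x,v_{1})$. All function names are assumptions introduced for this example.

```python
from itertools import product

# Hypothetical binary DAG (NOT Figure 2(a)); all numbers are made up.
# Edges: V1 -> V2, V1 -> X, V2 -> X, V1 -> Y, V2 -> Y, X -> Y.
P_V1 = {0: 0.6, 1: 0.4}                                   # P(V1 = v1)
P_V2_1 = {0: 0.3, 1: 0.7}                                 # P(V2 = 1 | v1)
P_X_1 = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(X = 1 | v1, v2)
P_Y_1 = {(x, v1, v2): 0.1 + 0.3 * x + 0.2 * v1 + 0.3 * v2
         for x, v1, v2 in product((0, 1), repeat=3)}      # P(Y = 1 | x, v1, v2)


def bern(p_one, value):
    """Probability that a Bernoulli(p_one) variable equals `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(v1, v2, x, y):
    """Observational joint density from the DAG factorization."""
    return (P_V1[v1] * bern(P_V2_1[v1], v2)
            * bern(P_X_1[(v1, v2)], x) * bern(P_Y_1[(x, v1, v2)], y))


def prob(**fixed):
    """Observational probability of the event given by keyword arguments."""
    total = 0.0
    for v1, v2, x, y in product((0, 1), repeat=4):
        point = {"v1": v1, "v2": v2, "x": x, "y": y}
        if all(point[name] == val for name, val in fixed.items()):
            total += joint(v1, v2, x, y)
    return total


def cond(target, given):
    """Observational conditional probability P(target | given)."""
    return prob(**target, **given) / prob(**given)


def interventional(y, x, v1):
    """f(y | do(x), v1): drop the factor of X, then condition on V1."""
    numerator = sum(P_V1[v1] * bern(P_V2_1[v1], v2) * bern(P_Y_1[(x, v1, v2)], y)
                    for v2 in (0, 1))
    return numerator / P_V1[v1]


def adjusted(y, x, v1):
    """Adjustment formula with S = {V2} and Z = {V1}."""
    return sum(cond({"y": y}, {"x": x, "v1": v1, "v2": v2})
               * cond({"v2": v2}, {"v1": v1}) for v2 in (0, 1))


def naive(y, x, v1):
    """Unadjusted conditional f(y | x, v1), i.e. S taken to be empty."""
    return cond({"y": y}, {"x": x, "v1": v1})


for x in (0, 1):
    print(x, interventional(1, x, 0), adjusted(1, x, 0), naive(1, x, 0))
```

In this toy model the adjusted quantity agrees with the interventional one, while the unadjusted conditional does not, which mirrors the role of the open path $X\leftarrow V_{2}\to Y$ in the example.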

Example 3

(Conditional Adjustment Set Contains Descendants of $\mathbf{X}$.) Let $\mathcal{G}$ be the causal DAG (and therefore, MPDAG) in Figure 2(b) (compare to Figure 6(a) of Perković et al. (2018)), where we assume $L$ is a variable that cannot be measured. Define $\mathbf{X}=\{X_{1},X_{2}\}$, $\mathbf{Y}=\{Y\}$, and $\mathbf{Z}=\{Z\}$. Note that $\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{G})=\emptyset$.

Consider the set $\mathbf{S}=\{S,W\}$. Note that $\mathbf{S}\cap(\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z})=\emptyset$, $\mathbf{S}\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})=\emptyset$, and $\mathbf{S}\cup\mathbf{Z}$ blocks all proper non-causal paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$. Hence, $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$, and by Theorem 3, $f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})=\int f(y|x_{1},x_{2},z,s,w)f(s,w|z)\mathop{}\!\mathrm{d}s\mathop{}\!\mathrm{d}w$.

Example 4

(No Conditional Adjustment Set, Effect Non-identifiable.) Let $\mathcal{G}$ be the causal MPDAG in Figure 2(c), and let $\mathbf{X}=\{X\}$, $\mathbf{Y}=\{Y\}$, and $\mathbf{Z}=\{V_{3}\}$. Note that $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$. However, $X-V_{1}\to V_{2}\to Y$ is a proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that starts with an undirected edge. Thus, by Theorem 3, there can be no conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$. In fact, by Proposition 36 (Supp. C), $f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})$ is not identifiable in $\mathcal{G}$ using any method.

3.3 Constructing Adjustment Sets

The conditional adjustment criterion provides a way to check if a set can be used for conditional adjustment given an MPDAG $\mathcal{G}$, but it does not provide a way to construct a conditional adjustment set – a task that may be difficult when $\mathcal{G}$ is large. The results in this section provide such a roadmap under certain assumptions. The proofs can be found in Supp. F.

Lemma 4

Let $\mathbf{X}=\{X\}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a causal MPDAG $\mathcal{G}$, where $\mathbf{Z}\cap\operatorname{PossDe}(X,\mathcal{G})=\emptyset$ and where every possibly causal path from $X$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed edge. If $\mathbf{Y}\cap\operatorname{Pa}(X,\mathcal{G})=\emptyset$, then the following is a conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$:

\begin{equation}
\operatorname{Pa}(X,\mathcal{G})\setminus\mathbf{Z}. \tag{7}
\end{equation}

Theorem 5

Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a causal MPDAG $\mathcal{G}$, where $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$ and where every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed edge.

  (a) If there is any conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$, then the following set is one:

    \begin{align}
    \operatorname{Adjust}(\mathbf{X},\mathbf{Y},\mathbf{Z},\mathcal{G}) &= \big[\operatorname{PossAn}(\mathbf{X\cup Y},\mathcal{G})\cup\operatorname{An}(\mathbf{Z},\mathcal{G})\big] \nonumber\\
    &\quad\setminus\big[\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}\big]. \tag{8}
    \end{align}

  (b) Suppose $\mathbf{Y}\subseteq\operatorname{PossDe}(\mathbf{X},\mathcal{G})$. If there is any conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$, then the following set is one:

    \begin{align}
    \operatorname{O}(\mathbf{X,Y},\mathcal{G}) &= \operatorname{Pa}\Big(\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}),\mathcal{G}\Big) \nonumber\\
    &\quad\setminus\Big[\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}\Big]. \tag{9}
    \end{align}

Example 5

Consider again the causal MPDAG $\mathcal{G}$ in Figure 2(a), where $\mathbf{X}=\{X\}$, $\mathbf{Y}=\{Y\}$, and $\mathbf{Z}=\{V_{1}\}$. Note that the conditions of Lemma 4 and Theorem 5 are met, so we can construct three valid conditional adjustment sets using Equations (7), (8), and (9).

\begin{align*}
\operatorname{Pa}(X,\mathcal{G})\setminus\mathbf{Z} &= \{V_{1},V_{2},V_{3}\}\setminus\{V_{1}\} = \{V_{2},V_{3}\}.\\
\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) &= \{X,Y,V_{1},V_{2},V_{3},V_{4}\}\setminus\{X,Y,V_{1}\} = \{V_{2},V_{3},V_{4}\}.\\
\operatorname{O}(\mathbf{X,Y},\mathcal{G}) &= \{X,V_{1},V_{2},V_{4}\}\setminus\{X,Y,V_{1}\} = \{V_{2},V_{4}\}.
\end{align*}
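The set differences in Example 5 can be reproduced directly. The short sketch below takes the intermediate sets exactly as reported in the example (nothing is recomputed from the graph in Figure 2(a)); the variable names are assumptions made for illustration.

```python
# Sets as reported in Example 5; they are copied, not derived from the graph.
Z = {"V1"}
excluded = {"X", "Y", "V1"}          # Forb(X,Y,G) ∪ X ∪ Y ∪ Z in the example

pa_x = {"V1", "V2", "V3"}                              # Pa(X, G)
poss_an = {"X", "Y", "V1", "V2", "V3", "V4"}           # PossAn(X ∪ Y, G) ∪ An(Z, G)
pa_poss_med = {"X", "V1", "V2", "V4"}                  # Pa(PossMed(X,Y,G), G)

eq7 = pa_x - Z                       # Equation (7)
eq8 = poss_an - excluded             # Equation (8)
eq9 = pa_poss_med - excluded         # Equation (9)

print(sorted(eq7), sorted(eq8), sorted(eq9))
```

In this example the set from Equation (8) contains both of the other two sets.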

3.4 Comparison of Contexts

In this section, we point out a bridge between our conditional adjustment results and prior literature on unconditional adjustment and adjustment under dynamic treatment. We begin by presenting Lemma 6, which provides an equivalence between our criterion and the criterion of Perković et al. (2017) used for unconditional adjustment given an MPDAG. Note that this lemma is used to prove Theorem 3 (see Figure 5 in Supp. D). See Supp. D for the lemma’s proof.

Lemma 6

Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in an MPDAG $\mathcal{G}$, where $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$. Then we have the following.

  (a) Comparison of Adjustment Criteria: $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$ (Definition 2) if and only if $\mathbf{S}\cup\mathbf{Z}$ satisfies the adjustment criterion relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{G}$ (Definition 12, Supp. A).

  (b) Comparison of Adjustment Sets: $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$ (Definition 1) if and only if $\mathbf{S}\cup\mathbf{Z}$ is an adjustment set relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{G}$ (Definition 11, Supp. A).

Next we turn to the work of Smucler et al. (2020), where the authors consider causal effect estimation under a dynamic treatment. For this purpose, Smucler et al. (2020) define a dynamic adjustment set, which they then relate to the set used by Maathuis and Colombo (2015) for unconditional adjustment (Definition 11, Supp. A). Lemma 6 allows us to connect this dynamic adjustment to our work.

Before making this connection, we briefly describe the context of these authors’ work. Unlike a do-intervention that sets $\mathbf{X}$ to fixed values $\mathbf{x}$, a dynamic intervention sets $\mathbf{X}$ to values $\mathbf{x}$ with probability $\pi(\mathbf{x}|\mathbf{Z}=\mathbf{z})$. However, a do-intervention can be seen as a special case of a dynamic intervention where $\pi(\mathbf{x}|\mathbf{Z}=\mathbf{z})=\mathds{1}(\mathbf{X}=\mathbf{x})$. Dynamic interventions are often of interest in personalized medicine (Robins, 1993; Murphy et al., 2001; Chakraborty and Moodie, 2013).

Smucler et al. (2020) refer to a causal effect under a dynamic intervention, whose assignment probability depends on $\mathbf{Z}$, as a $\mathbf{Z}$-dependent dynamic causal effect (also called a single stage dynamic treatment effect in Chakraborty and Moodie (2013)). They consider these causal effects in the setting where $\mathbf{X}$ and $\mathbf{Y}$ are nodes, the given graph $\mathcal{G}$ is a DAG, and the following assumption holds: $\mathbf{Z}\cap\operatorname{De}(X,\mathcal{G})=\emptyset$. They then define a $\mathbf{Z}$-dependent dynamic adjustment set as a set $\mathbf{S}$ that satisfies

\begin{equation*}
f(y|\pi(x|\mathbf{z}))=\begin{cases}\pi(x|\mathbf{z})f(y|x,\mathbf{z})&\mathbf{S}=\emptyset,\\ \pi(x|\mathbf{z})\int f(y|x,\mathbf{z},\mathbf{s})f(\mathbf{s}|\mathbf{z})\mathop{}\!\mathrm{d}\mathbf{s}&\mathbf{S}\neq\emptyset.\end{cases}
\end{equation*}

To compare these sets to our conditional adjustment sets, we reference Proposition 1 of Smucler et al. (2020). This result states that, under their assumptions, $\mathbf{S}\cup\mathbf{Z}$ is a $\mathbf{Z}$-dependent dynamic adjustment set if and only if $\mathbf{S}\cup\mathbf{Z}$ is an adjustment set relative to $(X,Y)$ in $\mathcal{G}$ (Definition 11, Supp. A). It follows from Lemma 6 that $\mathbf{S}\cup\mathbf{Z}$ is a $\mathbf{Z}$-dependent dynamic adjustment set if and only if $\mathbf{S}$ is a conditional adjustment set relative to $(X,Y,\mathbf{Z})$ in $\mathcal{G}$ – when $\mathcal{G}$ is a DAG such that $\mathbf{Z}\cap\operatorname{De}(X,\mathcal{G})=\emptyset$. Thus, our results can be seen as generalizations of Smucler et al. (2020) for $|\mathbf{X}|>1$ and, therefore, can be used for $\mathbf{Z}$-dependent dynamic causal effect identification.

4 RESULTS - PAGS

We now extend our results on conditional adjustment to the setting of a PAG.

4.1 Conditional Adjustment Criterion

We first introduce our conditional adjustment criterion for PAGs (Definition 7). Note that the difference between this criterion and the analogous criterion for MPDAGs is the use of a visible as opposed to a directed edge. Visibility is a stronger condition introduced by Zhang (2008a) (see Supp. A for definition).

Following this, Lemma 8 provides an equivalence between our criterion and the criterion of Perković et al. (2018) used for unconditional adjustment given a PAG. Theorem 9 is our main result in this section. It establishes that, under restrictions on $\mathbf{Z}$, the conditional adjustment criterion is an equivalent graphical characterization of a conditional adjustment set in causal PAGs. Proofs of these results are given in Supp. G.

Definition 7

(Conditional Adjustment Criterion for PAGs) Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in a PAG $\mathcal{G}$, where $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$ and where every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a visible edge out of $\mathbf{X}$. Then $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$ if

  (a) $\mathbf{S}\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})=\emptyset$, and

  (b) $\mathbf{S}\cup\mathbf{Z}$ blocks all proper non-causal definite status paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$.

Lemma 8

Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in a PAG $\mathcal{G}$, where $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$. Then we have the following.

  (a) Comparison of Adjustment Criteria: $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$ (Definition 7) if and only if $\mathbf{S}\cup\mathbf{Z}$ satisfies the adjustment criterion relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{G}$ (Definition 12, Supp. A).

  (b) Comparison of Adjustment Sets: $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$ (Definition 1) if and only if $\mathbf{S}\cup\mathbf{Z}$ is an adjustment set relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{G}$ (Definition 11, Supp. A).

  Proof of Lemma 8.

    (a) Follows from the fact that $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})\subseteq\operatorname{PossDe}(\mathbf{X},\mathcal{G})$.

    (b) We start by noting the following fact. Since $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$, we have $\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset$ in every DAG $\mathcal{D}$ represented by $\mathcal{G}$ (Lemma 49, Supp. G). Then consider the following statements.

      (a) $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$.

      (b) $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in each DAG represented by $\mathcal{G}$ – were the DAG to be causal.

      (c) $\mathbf{S}\cup\mathbf{Z}$ is an adjustment set relative to $(\mathbf{X},\mathbf{Y})$ in each DAG represented by $\mathcal{G}$ – were the DAG to be causal.

      (d) $\mathbf{S}\cup\mathbf{Z}$ is an adjustment set relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{G}$.

    By definition, (a) $\Leftrightarrow$ (b). Then by Lemma 6(b) and the fact above, we have (b) $\Leftrightarrow$ (c). The statement (c) $\Leftrightarrow$ (d) follows again by definition.

Theorem 9

(Completeness, Soundness of Conditional Adjustment Criterion for PAGs) Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in a causal PAG $\mathcal{G}$, where $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$. Then $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$ (Definition 1) if and only if $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$ (Definition 7).

4.2 Constructing Adjustment Sets

We now provide a method for constructing conditional adjustment sets given a causal PAG (Theorem 10). We illustrate this result in Example 6. The proof of Theorem 10 can be found in Supp. H.

Theorem 10

Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a causal PAG $\mathcal{G}$, where $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$ and where every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a visible edge out of $\mathbf{X}$. If there is any conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$, then the following set is one:

\begin{align}
\operatorname{Adjust}(\mathbf{X},\mathbf{Y},\mathbf{Z},\mathcal{G}) &= \big[\operatorname{PossAn}(\mathbf{X\cup Y},\mathcal{G})\cup\operatorname{PossAn}(\mathbf{Z},\mathcal{G})\big] \nonumber\\
&\quad\setminus\Big[\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}\Big]. \tag{10}
\end{align}

Figure 3: A causal PAG used in Example 6, with nodes $X$, $V_{1}$, $V_{2}$, $V_{3}$, $V_{5}$, $Y$, $V_{4}$.

Example 6

Let $\mathcal{G}$ be the causal PAG in Figure 3, and let $\mathbf{X}=\{X\}$, $\mathbf{Y}=\{Y\}$, and $\mathbf{Z}=\{V_{1}\}$. Note that $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$. Furthermore, the only possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ is the edge $X\to Y$, which is visible due to the presence of $V_{3}\leftrightarrow X$, where $V_{3}\notin\operatorname{Adj}(Y,\mathcal{G})$. Thus, the conditions of Theorem 10 are met, and if there is any conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$, then the set from Equation (10) is one. We compute this set below.

\begin{align*}
\operatorname{Adjust}(\mathbf{X},\mathbf{Y},\mathbf{Z},\mathcal{G}) &= \{X,Y,V_{1},V_{2},V_{4}\}\setminus\{X,Y,V_{1}\} = \{V_{2},V_{4}\}.
\end{align*}

To see that this is a conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$, we note that it fulfills the requirements of Definition 7. That is, $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})=\emptyset$ and $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}=\{V_{1},V_{2},V_{4}\}$ blocks all proper non-causal definite status paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$.

5 DISCUSSION

This paper defines a conditional adjustment set that can be used to identify a causal effect in a setting where a causal MPDAG or PAG is known (Definition 1). We give necessary and sufficient graphical conditions for identifying such a set when $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$ (Theorems 3 and 9). Further, we provide multiple methods for constructing these sets (Sections 3.3 and 4.2). While our results can be used to identify a broad class of conditional causal effects, we discuss some limitations below.

One such limitation is that there are conditional causal effects that can be identified but cannot be identified using conditional adjustment sets. As an example, consider the causal DAG (and therefore, MPDAG) $\mathcal{G}$ in Figure 4, and let $\mathbf{X}=\{X_{1},X_{2}\}$, $\mathbf{Y}=\{Y\}$, and $\mathbf{Z}=\{V_{2}\}$. Note that the conditional causal effect of $\mathbf{X}$ on $\mathbf{Y}$ given $\mathbf{Z}$ is identifiable using do calculus rules (Pearl, 2009, see Equations (14)-(16) in Supp. B):

\begin{align}
f(\mathbf{y}|do(\mathbf{x}),\mathbf{z}) &= \int_{v_{1}}f(y,v_{1}|do(\mathbf{x}),v_{2})\mathop{}\!\mathrm{d}v_{1} \nonumber\\
&= \int_{v_{1}}f(y|do(\mathbf{x}),v_{1},v_{2})f(v_{1}|do(\mathbf{x}),v_{2})\mathop{}\!\mathrm{d}v_{1} \nonumber\\
&= \int_{v_{1}}f(y|do(\mathbf{x}),v_{1},v_{2})f(v_{1}|do(\mathbf{x}))\mathop{}\!\mathrm{d}v_{1} \tag{11}\\
&= \int_{v_{1}}f(y|do(x_{2}),v_{1},v_{2})f(v_{1}|do(x_{1}))\mathop{}\!\mathrm{d}v_{1} \tag{12}\\
&= \int_{v_{1}}f(y|x_{2},v_{1},v_{2})f(v_{1}|x_{1})\mathop{}\!\mathrm{d}v_{1}. \tag{13}
\end{align}

The first two equalities follow from basic probability rules. Equation (11) follows from Rule 1 of the do calculus, since $V_{1}\perp_{d}V_{2}|X_{1},X_{2}$ in $\mathcal{G}_{\overline{\{X_{1},X_{2}\}}}$. Equation (12) follows from Rule 3 of the do calculus, since $Y\perp_{d}X_{1}|V_{1},V_{2},X_{2}$ in $\mathcal{G}_{\overline{X_{2}}}$ and $V_{1}\perp_{d}X_{2}|X_{1}$ in $\mathcal{G}_{\overline{\{X_{1},X_{2}\}}}$. Equation (13) follows from Rule 2 of the do calculus, since $Y\perp_{d}X_{2}|V_{1},V_{2}$ in $\mathcal{G}_{\underline{X_{2}}}$ and $V_{1}\perp_{d}X_{1}$ in $\mathcal{G}_{\underline{X_{1}}}$.

However, we can show that there is no conditional adjustment set relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$ that could have been used to identify the effect above. To see this, note that since $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$, we can use Theorem 3 to state the following. A set $\mathbf{S}$ must satisfy the conditional adjustment criterion relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{G}$ (Definition 2) in order to be a conditional adjustment set. Definition 2 requires that $\mathbf{S}$ block the path $X_{2}\leftarrow V_{1}\to Y$, since it is a proper non-causal definite status path from $\mathbf{X}$ to $\mathbf{Y}$. It follows that $\mathbf{S}$ must contain $V_{1}\in\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$, but this contradicts Definition 2’s requirement that $\mathbf{S}\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})=\emptyset$.

Adding to the limitation above, there are conditional causal effects that can be identified using conditional adjustment sets but where these conditional adjustment sets cannot be identified using our criterion. This can occur when $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})\neq\emptyset$, since our graphical criterion requires this restriction but our conditional adjustment set definition does not. As an example, consider again the causal DAG $\mathcal{G}$ given in Figure 2(b), and let $\mathbf{X}=\{X_{1},X_{2}\}$, $\mathbf{Y}=\{Y\}$, and $\mathbf{Z}=\{Z,W\}$. Since $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})\neq\emptyset$, no set satisfies the conditional adjustment criterion. However, using do calculus rules (Pearl, 2009), we can show that $\mathbf{S}=\{S\}$ is a conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$:

\begin{align*}
f(\mathbf{y}|do(\mathbf{x}),\mathbf{z}) &= \int_{\mathbf{s}}f(\mathbf{y},\mathbf{s}|do(\mathbf{x}),\mathbf{z})\mathop{}\!\mathrm{d}\mathbf{s}\\
&= \int_{\mathbf{s}}f(\mathbf{y}|do(\mathbf{x}),\mathbf{z},\mathbf{s})f(\mathbf{s}|do(\mathbf{x}),\mathbf{z})\mathop{}\!\mathrm{d}\mathbf{s}\\
&= \int_{\mathbf{s}}f(\mathbf{y}|\mathbf{x},\mathbf{z},\mathbf{s})f(\mathbf{s}|\mathbf{z})\mathop{}\!\mathrm{d}\mathbf{s}.
\end{align*}

The first and second equalities follow from basic probability rules. The third follows by Rules 2 and 3 of the do calculus, since $\mathbf{Y}\perp_{d}\mathbf{X}\>|\>\mathbf{Z}\cup\mathbf{S}$ in $\mathcal{G}_{\underline{\mathbf{X}}}$ and $\mathbf{S}\perp_{d}\mathbf{X}\>|\>\mathbf{Z}$ in $\mathcal{G}_{\overline{\mathbf{X}(\mathbf{Z})}}$. Future work could address identification in this setting by expanding our graphical criterion to allow for arbitrary conditioning.

Figure 4: A causal DAG used in Section 5, with nodes $X_{1}$, $V_{1}$, $Y$, $X_{2}$, $V_{2}$.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 2210210.

References

  • Abrevaya et al. (2015) J. Abrevaya, Y.-C. Hsu, and R. P. Lieli. Estimating conditional average treatment effects. Journal of Business & Economic Statistics, 33(4):485–505, 2015.
  • Ali et al. (2009) R. A. Ali, T. S. Richardson, and P. Spirtes. Markov equivalence for ancestral graphs. Annals of Statistics, 37:2808–2837, 2009.
  • Andersson et al. (1997) S. A. Andersson, D. Madigan, and M. D. Perlman. A characterization of Markov equivalence classes for acyclic digraphs. Annals of Statistics, 25:505–541, 1997.
  • Athey and Imbens (2016) S. Athey and G. Imbens. Recursive partitioning for heterogeneous causal effects. In Proceedings of the National Academy of Sciences, volume 113, pages 7353–7360, 2016.
  • Bareinboim et al. (2012) E. Bareinboim, C. Brito, and J. Pearl. Local characterizations of causal Bayesian networks. In Graph Structures for Knowledge Representation and Reasoning: Second International Workshop, GKR 2011, Barcelona, Spain, July 16, 2011. Revised Selected Papers, pages 1–17. Springer, 2012.
  • Brand and Xie (2010) J. E. Brand and Y. Xie. Who benefits most from college? Evidence for negative selection in heterogeneous economic returns to higher education. American Sociological Review, 75(2):273–302, 2010.
  • Chakraborty and Moodie (2013) B. Chakraborty and E. E. Moodie. Statistical Methods for Dynamic Treatment Regimes. Springer, 2013.
  • Chernozhukov et al. (2023) V. Chernozhukov, W. K. Newey, and R. Singh. A simple and general debiased machine learning theorem with finite-sample guarantees. Biometrika, 110(1):257–264, 2023.
  • Chickering (2002) D. M. Chickering. Learning equivalence classes of Bayesian-network structures. Journal of Machine Learning Research, 2:445–498, 2002.
  • Fan et al. (2022) Q. Fan, Y.-C. Hsu, R. P. Lieli, and Y. Zhang. Estimation of conditional average treatment effects with high-dimensional data. Journal of Business & Economic Statistics, 40(1):313–327, 2022.
  • Hauser and Bühlmann (2012) A. Hauser and P. Bühlmann. Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research, 13:2409–2464, 2012.
  • Health (2010) W. H. O. R. Health. Medical eligibility criteria for contraceptive use. World Health Organization, 2010.
  • Henckel et al. (2022) L. Henckel, E. Perković, and M. H. Maathuis. Graphical criteria for efficient total effect estimation via adjustment in causal linear models. Journal of the Royal Statistical Society: Series B, pages 579–599, 2022.
  • Jaber et al. (2019) A. Jaber, J. Zhang, and E. Bareinboim. Identification of conditional causal effects under Markov equivalence. In Proceedings of NeurIPS, pages 11516–11524, 2019.
  • Jaber et al. (2022) A. Jaber, A. Ribeiro, J. Zhang, and E. Bareinboim. Causal identification under Markov equivalence: Calculus, algorithm, and completeness. In Proceedings of NeurIPS, volume 35, pages 3679–3690, 2022.
  • Kalisch et al. (2012) M. Kalisch, M. Mächler, D. Colombo, M. H. Maathuis, and P. Bühlmann. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, 47(11):1–26, 2012.
  • Kennedy et al. (2022) E. H. Kennedy, S. Balakrishnan, J. M. Robins, and L. Wasserman. Minimax rates for heterogeneous causal effect estimation. arXiv preprint arXiv:2203.00837, 2022.
  • Kivva et al. (2023) Y. Kivva, J. Etesami, and N. Kiyavash. On identifiability of conditional causal effects. arXiv preprint arXiv:2306.11755, 2023.
  • Künzel et al. (2019) S. R. Künzel, J. S. Sekhon, P. J. Bickel, and B. Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. In Proceedings of the National Academy of Sciences, volume 116, pages 4156–4165, 2019.
  • Lauritzen and Spiegelhalter (1988) S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society: Series B, pages 157–224, 1988.
  • Lauritzen et al. (1990) S. L. Lauritzen, A. P. Dawid, B. N. Larsen, and H.-G. Leimer. Independence properties of directed Markov fields. Networks, 20(5):491–505, 1990.
  • Maathuis and Colombo (2015) M. H. Maathuis and D. Colombo. A generalized back-door criterion. Annals of Statistics, 43:1060–1088, 2015.
  • Mardia et al. (1980) K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis (Probability and Mathematical Statistics). Academic Press London, 1980.
  • Meek (1995) C. Meek. Causal inference and causal explanation with background knowledge. In Proceedings of UAI, pages 403–410, 1995.
  • Mooij et al. (2020) J. M. Mooij, S. Magliacane, and T. Claassen. Joint causal inference from multiple contexts. The Journal of Machine Learning Research, 21(1):3919–4026, 2020.
  • Murphy et al. (2001) S. A. Murphy, M. J. van der Laan, J. M. Robins, and Conduct Problems Prevention Research Group. Marginal mean models for dynamic regimes. Journal of the American Statistical Association, 96(456):1410–1423, 2001.
  • Nie and Wager (2021) X. Nie and S. Wager. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299–319, 2021.
  • Pearl (2009) J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2009.
  • Perković (2020) E. Perković. Identifying causal effects in maximally oriented partially directed acyclic graphs. In Proceedings of UAI, pages 530–539, 2020.
  • Perković et al. (2015) E. Perković, J. Textor, M. Kalisch, and M. H. Maathuis. A complete generalized adjustment criterion. In Proceedings of UAI, pages 682–691, 2015.
  • Perković et al. (2017) E. Perković, M. Kalisch, and M. H. Maathuis. Interpreting and using CPDAGs with background knowledge. In Proceedings of UAI, 2017.
  • Perković et al. (2018) E. Perković, J. Textor, M. Kalisch, and M. H. Maathuis. Complete graphical characterization and construction of adjustment sets in Markov equivalence classes of ancestral graphs. Journal of Machine Learning Research, 18, 2018.
  • Richardson (2003) T. S. Richardson. Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30:145–157, 2003.
  • Richardson and Spirtes (2002) T. S. Richardson and P. Spirtes. Ancestral graph Markov models. Annals of Statistics, 30:962–1030, 2002.
  • Robins (1993) J. M. Robins. Analytic methods for estimating HIV-treatment and cofactor effects. Methodological Issues in AIDS Behavioral Research, pages 213–288, 1993.
  • Rothenhäusler et al. (2018) D. Rothenhäusler, J. Ernest, and P. Bühlmann. Causal inference in partially linear structural equation models: identifiability and estimation. Annals of Statistics, 46:2904–2938, 2018.
  • Shpitser and Pearl (2008) I. Shpitser and J. Pearl. Complete identification methods for the causal hierarchy. Journal of Machine Learning Research, 9:1941–1979, 2008.
  • Smucler et al. (2020) E. Smucler, F. Sapienza, and A. Rotnitzky. Efficient adjustment sets in causal graphical models with hidden variables. Biometrika, 2020.
  • Spirtes et al. (1999) P. Spirtes, C. Meek, and T. S. Richardson. Computation, Causation and Discovery, chapter An algorithm for causal inference in the presence of latent variables and selection bias, pages 211–252. MIT Press, 1999.
  • Spirtes et al. (2000) P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, second edition, 2000.
  • Squires and Uhler (2022) C. Squires and C. Uhler. Causal structure learning: A combinatorial perspective. Foundations of Computational Mathematics, pages 1–35, 2022.
  • Textor et al. (2016) J. Textor, B. Van der Zander, M. S. Gilthorpe, M. Liśkiewicz, and G. T. Ellison. Robust causal inference using directed acyclic graphs: the R package ‘dagitty’. International Journal of Epidemiology, 45(6):1887–1894, 2016.
  • Wager and Athey (2018) S. Wager and S. Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242, 2018.
  • Wright (1921) S. Wright. Correlation and causation. Journal of Agricultural Research, 20(7):557–585, 1921.
  • Zhang (2006) J. Zhang. Causal Inference and Reasoning in Causally Insufficient Systems. PhD thesis, Carnegie Mellon University, 2006.
  • Zhang (2008a) J. Zhang. Causal reasoning with ancestral graphs. Journal of Machine Learning Research, 9:1437–1474, 2008a.
  • Zhang (2008b) J. Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172:1873–1896, 2008b.

Checklist

  1. For all models and algorithms presented, check if you include:

    (a) A clear description of the mathematical setting, assumptions, algorithm, and/or model. [Not Applicable]

    (b) An analysis of the properties and complexity (time, space, sample size) of any algorithm. [Not Applicable]

    (c) (Optional) Anonymized source code, with specification of all dependencies, including external libraries. [Not Applicable]

  2. For any theoretical claim, check if you include:

    (a) Statements of the full set of assumptions of all theoretical results. [Yes]

    (b) Complete proofs of all theoretical results. [Yes]

    (c) Clear explanations of any assumptions. [Yes]

  3. For all figures and tables that present empirical results, check if you include:

    (a) The code, data, and instructions needed to reproduce the main experimental results (either in the Supplemental material or as a URL). [Not Applicable]

    (b) All the training details (e.g., data splits, hyperparameters, how they were chosen). [Not Applicable]

    (c) A clear definition of the specific measure or statistics and error bars (e.g., with respect to the random seed after running experiments multiple times). [Not Applicable]

    (d) A description of the computing infrastructure used (e.g., type of GPUs, internal cluster, or cloud provider). [Not Applicable]

  4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets, check if you include:

    (a) Citations of the creator, if your work uses existing assets. [Not Applicable]

    (b) The license information of the assets, if applicable. [Not Applicable]

    (c) New assets either in the Supplemental material or as a URL, if applicable. [Not Applicable]

    (d) Information about consent from data providers/curators. [Not Applicable]

    (e) Discussion of sensitive content if applicable, e.g., personally identifiable information or offensive content. [Not Applicable]

  5. If you used crowdsourcing or conducted research with human subjects, check if you include:

    (a) The full text of instructions given to participants and screenshots. [Not Applicable]

    (b) Descriptions of potential participant risks, with links to Institutional Review Board (IRB) approvals if applicable. [Not Applicable]

    (c) The estimated hourly wage paid to participants and the total amount spent on participant compensation. [Not Applicable]

Appendix A FURTHER PRELIMINARIES AND DEFINITIONS

A.1 Preliminaries

Path Construction. A subsequence of a path $p$ is a path obtained by deleting non-endpoint nodes from $p$ without changing the order of the remaining nodes. Let $p=\langle X_{1},X_{2},\dots,X_{k}\rangle$ and let $i$, $j$ satisfy $1\leq i<j\leq k$. We denote the concatenation of paths by the symbol $\oplus$, so that $p=p(X_{1},X_{i})\oplus p(X_{i},X_{k})$. We use the notation $(-p)(X_{j},X_{i})$ to denote the path $\langle X_{j},X_{j-1},\dots,X_{i}\rangle$.

A.2 Definitions

Definition 11

(Adjustment Set for MPDAGs (PAGs); Perković et al., 2017, 2018, 2015; cf. Maathuis and Colombo, 2015) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S}$ be pairwise disjoint node sets in a causal MPDAG (PAG) $\mathcal{G}$. Then $\mathbf{S}$ is an adjustment set relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{G}$ if for any density $f$ consistent with $\mathcal{G}$

\[
f(\mathbf{y}|do(\mathbf{x}))=\begin{cases}f(\mathbf{y}|\mathbf{x})&\mathbf{S}=\emptyset\\ \int f(\mathbf{y}|\mathbf{x},\mathbf{s})f(\mathbf{s})\,\mathrm{d}\mathbf{s}&\mathbf{S}\neq\emptyset.\end{cases}
\]

Definition 12

(Adjustment Criterion for MPDAGs (PAGs); Perković et al., 2017, 2018) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S}$ be pairwise disjoint node sets in an MPDAG (PAG) $\mathcal{G}$, where every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed (visible) edge out of $\mathbf{X}$. Then $\mathbf{S}$ satisfies the adjustment criterion relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{G}$ if

  (a) $\mathbf{S}\cap\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{G})=\emptyset$, and

  (b) $\mathbf{S}$ blocks all proper non-causal definite status paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$.

Definition 13

(Generalized Back-Door Criterion for DAGs; cf. Maathuis and Colombo, 2015) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S}$ be pairwise disjoint node sets in a DAG $\mathcal{D}$. Then $\mathbf{S}$ satisfies the generalized back-door criterion relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{D}$ if

  (a) $\mathbf{S}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset$, and

  (b) $\mathbf{S}\cup\mathbf{X}\setminus\{X\}$ blocks all back-door paths from $X$ to $\mathbf{Y}$ in $\mathcal{D}$, for every $X\in\mathbf{X}$.

Definition 14

(Proper Back-Door Graph for DAGs; cf. Perković et al., 2018) Let $\mathbf{X}$ and $\mathbf{Y}$ be disjoint node sets in a DAG $\mathcal{D}$. The proper back-door graph $\mathcal{D}_{\mathbf{XY}}^{pbd}$ is obtained from $\mathcal{D}$ by removing all edges out of $\mathbf{X}$ that are on proper causal paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{D}$.
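The construction in Definition 14 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the DAG is encoded as a set of `(parent, child)` pairs, and the function name `proper_backdoor_graph` and the example graph are hypothetical, not taken from the paper. An edge $X\to V$ with $X\in\mathbf{X}$ lies on a proper causal path to $\mathbf{Y}$ exactly when $V\notin\mathbf{X}$ and $V$ reaches $\mathbf{Y}$ along a directed path that avoids $\mathbf{X}$.

```python
def proper_backdoor_graph(edges, X, Y):
    """Return the edge set of the proper back-door graph of a DAG.

    edges: set of (parent, child) pairs (hypothetical encoding).
    X, Y:  disjoint collections of node names.
    """
    X, Y = set(X), set(Y)
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)

    def reaches_y_avoiding_x(start):
        # Depth-first search along directed edges that never enters X.
        stack, seen = [start], set()
        while stack:
            v = stack.pop()
            if v in seen or v in X:
                continue
            seen.add(v)
            if v in Y:
                return True
            stack.extend(children.get(v, ()))
        return False

    # Edges out of X whose head starts a directed, X-avoiding path to Y.
    removed = {(a, b) for a, b in edges if a in X and reaches_y_avoiding_x(b)}
    return set(edges) - removed


# Hypothetical example: Z confounds X and Y, M mediates, W is a dead end.
example_edges = {("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y"), ("X", "W")}
pbd_edges = proper_backdoor_graph(example_edges, {"X"}, {"Y"})
```

In this example only `("X", "M")` is removed, because `W` has no directed path to `Y`.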

Definition 15

(Moral Graph for DAGs; cf. Lauritzen and Spiegelhalter, 1988; cf. Perković et al., 2018) Let $\mathcal{D}=(\mathbf{V},\mathbf{E})$ be a DAG. The moral graph $\mathcal{D}^{m}$ is formed by adding the edge $A-B$ to any structure of the form $A\to C\leftarrow B$ for any $A,B,C\in\mathbf{V}$, with $A\notin\operatorname{Adj}(B,\mathcal{D})$ (marrying unmarried parents) and subsequently making all edges in the resulting graph undirected.

Definition 16

(Distance to $\mathbf{Z}$; Zhang, 2006; Perković et al., 2017) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in an MPDAG or PAG $\mathcal{G}$. Let $p$ be a path between $\mathbf{X}$ and $\mathbf{Y}$ in $\mathcal{G}$ such that every collider $C$ on $p$ has a possibly directed path (possibly of length 0) to $\mathbf{Z}$. Define the distance to $\mathbf{Z}$ of $C$ to be the length of a shortest possibly directed path (possibly of length 0) from $C$ to $\mathbf{Z}$, and define the distance to $\mathbf{Z}$ of $p$ to be the sum of the distances to $\mathbf{Z}$ of the colliders on $p$.

Appendix B EXISTING RESULTS

Rules of the Do Calculus (Pearl, 2009). Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{W}$ be pairwise disjoint (possibly empty) node sets in a causal DAG $\mathcal{D}$. Let $\mathcal{D}_{\overline{\mathbf{X}}}$ denote the graph obtained by deleting all edges into $\mathbf{X}$ from $\mathcal{D}$. Similarly, let $\mathcal{D}_{\underline{\mathbf{X}}}$ denote the graph obtained by deleting all edges out of $\mathbf{X}$ in $\mathcal{D}$, and let $\mathcal{D}_{\overline{\mathbf{X}}\underline{\mathbf{Z}}}$ denote the graph obtained by deleting all edges into $\mathbf{X}$ and all edges out of $\mathbf{Z}$ in $\mathcal{D}$. The following rules hold for all densities consistent with $\mathcal{D}$.

Rule 1. If $(\mathbf{Y}\perp_{d}\mathbf{Z}\,|\,\mathbf{X}\cup\mathbf{W})_{\mathcal{D}_{\overline{\mathbf{X}}}}$, then

\[
f(\mathbf{y}|do(\mathbf{x}),\mathbf{z},\mathbf{w})=f(\mathbf{y}|do(\mathbf{x}),\mathbf{w}). \tag{14}
\]

Rule 2. If $(\mathbf{Y}\perp_{d}\mathbf{X}\,|\,\mathbf{Z}\cup\mathbf{W})_{\mathcal{D}_{\underline{\mathbf{X}}\overline{\mathbf{W}}}}$, then

\[
f(\mathbf{y}|do(\mathbf{x}),\mathbf{z},do(\mathbf{w}))=f(\mathbf{y}|\mathbf{x},\mathbf{z},do(\mathbf{w})). \tag{15}
\]

Rule 3. If $(\mathbf{Y}\perp_{d}\mathbf{X}\,|\,\mathbf{Z}\cup\mathbf{W})_{\mathcal{D}_{\overline{\mathbf{X}(\mathbf{Z})\cup\mathbf{W}}}}$, then

\[
f(\mathbf{y}|do(\mathbf{x}),\mathbf{z},do(\mathbf{w}))=f(\mathbf{y}|\mathbf{z},do(\mathbf{w})), \tag{16}
\]

where $\mathbf{X}(\mathbf{Z})=\mathbf{X}\setminus\operatorname{An}(\mathbf{Z},\mathcal{D}_{\overline{\mathbf{W}}})$.
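The graph manipulations that the three rules rely on are simple edge deletions. The following Python sketch illustrates them under our own assumptions: a DAG is a set of `(parent, child)` pairs, every node counts as its own ancestor, and the helper names (`delete_edges_into`, `delete_edges_out_of`, `ancestors`, `rule3_graph`) are hypothetical.

```python
def delete_edges_into(edges, nodes):
    """Overline operation: drop every edge whose head is in `nodes`."""
    nodes = set(nodes)
    return {(a, b) for a, b in edges if b not in nodes}


def delete_edges_out_of(edges, nodes):
    """Underline operation: drop every edge whose tail is in `nodes`."""
    nodes = set(nodes)
    return {(a, b) for a, b in edges if a not in nodes}


def ancestors(edges, targets):
    """Nodes with a directed path (possibly of length 0) to `targets`."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    seen, stack = set(), list(targets)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def rule3_graph(edges, X, Z, W):
    """Graph used in Rule 3: delete edges into X(Z) and into W."""
    x_of_z = set(X) - ancestors(delete_edges_into(edges, W), Z)
    return delete_edges_into(edges, x_of_z | set(W))


# Hypothetical example: U -> X -> Y and Z -> Y, so X is not an ancestor of Z.
g = {("U", "X"), ("X", "Y"), ("Z", "Y")}
g_rule3 = rule3_graph(g, {"X"}, {"Z"}, set())
```

Here $\mathbf{X}(\mathbf{Z})=\{X\}$, so the only edge removed is `("U", "X")`. If `X` were an ancestor of `Z`, the graph would be left unchanged.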

Lemma 17

(Wright’s Rule of Wright, 1921) Let $\mathbf{X}=\mathbf{A}\mathbf{X}+\mathbf{\epsilon}$, where $\mathbf{A}\in\mathbb{R}^{k\times k}$, $\mathbf{X}=(X_{1},\dots,X_{k})^{T}$ and $\mathbf{\epsilon}=(\epsilon_{1},\dots,\epsilon_{k})^{T}$ is a vector of mutually independent errors with means zero. Moreover, let $\operatorname{Var}(X_{i})=1$ for all $i\in\{1,\dots,k\}$. Let $\mathcal{D}=(\mathbf{X},\mathbf{E})$ be the corresponding DAG such that $X_{i}\to X_{j}$ is in $\mathcal{D}$ if and only if $A_{ji}\neq 0$. A non-zero entry $A_{ji}$ is called the edge coefficient of $X_{i}\to X_{j}$. For two distinct nodes $X_{i},X_{j}\in\mathbf{X}$, let $p_{1},\dots,p_{r}$ be all paths between $X_{i}$ and $X_{j}$ in $\mathcal{D}$ that do not contain a collider. Then $\operatorname{Cov}(X_{i},X_{j})=\sum_{s=1}^{r}\pi_{s}$, where $\pi_{s}$ is the product of all edge coefficients along path $p_{s}$, $s\in\{1,\dots,r\}$.
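As a sanity check of Wright's rule, the sketch below computes the covariances implied by a small linear SEM and compares them with the path sums. It is an illustration under our own assumptions: nodes are listed in topological order so that the coefficient matrix is strictly lower triangular, the error variances are implicitly chosen so that every variable has variance one, and the function name `implied_covariance` and the coefficients are hypothetical.

```python
def implied_covariance(A):
    """Covariance matrix of X = A X + eps when every Var(X_j) equals 1.

    A: strictly lower-triangular list of lists, with A[j][i] the edge
       coefficient of X_i -> X_j (nodes in topological order).
    """
    k = len(A)
    S = [[0.0] * k for _ in range(k)]
    for j in range(k):
        S[j][j] = 1.0
        for i in range(j):
            # eps_j is independent of X_i, so only the parents of X_j contribute.
            S[j][i] = S[i][j] = sum(A[j][p] * S[p][i] for p in range(j))
        # Error variance needed for Var(X_j) = 1; it must be positive.
        explained = sum(A[j][p] * A[j][q] * S[p][q]
                        for p in range(j) for q in range(j))
        assert explained < 1.0, "coefficients too large for unit variances"
    return S


# Hypothetical DAG: X1 -> X2, X1 -> X3, X2 -> X3.
a21, a31, a32 = 0.5, 0.3, 0.4
S = implied_covariance([[0, 0, 0], [a21, 0, 0], [a31, a32, 0]])

# Collider-free paths: X1 -> X3 and X1 -> X2 -> X3.
wright_13 = a31 + a21 * a32
# Collider-free paths: X2 -> X3 and X2 <- X1 -> X3.
wright_23 = a32 + a21 * a31
```

Both `S[0][2]` and `wright_13` equal 0.5 up to floating-point error, and both `S[1][2]` and `wright_23` equal 0.55.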

Lemma 18

(Theorem 3.2.4 of Mardia et al., 1980) Let $\mathbf{X}=(\mathbf{X_{1}}^{T},\mathbf{X_{2}}^{T})^{T}$ be a $p$-dimensional multivariate Gaussian random vector with mean vector $\mathbf{\mu}=(\mathbf{\mu_{1}}^{T},\mathbf{\mu_{2}}^{T})^{T}$ and covariance matrix $\mathbf{\Sigma}=\begin{bmatrix}\mathbf{\Sigma_{11}}&\mathbf{\Sigma_{12}}\\ \mathbf{\Sigma_{21}}&\mathbf{\Sigma_{22}}\end{bmatrix}$, so that $\mathbf{X_{1}}$ is a $q$-dimensional multivariate Gaussian random vector with mean vector $\mathbf{\mu_{1}}$ and covariance matrix $\mathbf{\Sigma_{11}}$ and $\mathbf{X_{2}}$ is a $(p-q)$-dimensional multivariate Gaussian random vector with mean vector $\mathbf{\mu_{2}}$ and covariance matrix $\mathbf{\Sigma_{22}}$. Then $E[\mathbf{X_{2}}|\mathbf{X_{1}}=\mathbf{x_{1}}]=\mathbf{\mu_{2}}+\mathbf{\Sigma_{21}}\mathbf{\Sigma_{11}}^{-1}(\mathbf{x_{1}}-\mathbf{\mu_{1}})$.
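For orientation, the special case used later in this supplement is the bivariate one. The following worked instance is our own illustration of Lemma 18, with $p=2$, $q=1$, zero means, unit variances, and the symbol $\rho$ introduced here for $\operatorname{Cov}(X_{1},X_{2})$:

```latex
\[
E[X_{2}\,|\,X_{1}=x_{1}]
  = \mu_{2}+\Sigma_{21}\Sigma_{11}^{-1}(x_{1}-\mu_{1})
  = 0+\rho\cdot 1^{-1}\cdot(x_{1}-0)
  = \rho\,x_{1},
\qquad\text{so}\qquad
E[X_{2}\,|\,X_{1}=1]=\rho=\operatorname{Cov}(X_{1},X_{2}).
\]
```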

Lemma 19

(cf. Theorem 1 and Proposition 3 of Lauritzen et al., 1990) Let $\mathcal{D}=(\mathbf{V},\mathbf{E})$ be a DAG, and let $f$ be an observational density over $\mathbf{V}$. Then $f$ is Markov compatible with $\mathcal{D}$ if and only if

\[
V_{i}\perp\!\!\!\perp\Big[\mathbf{V}\setminus\big(\operatorname{De}(V_{i},\mathcal{D})\cup\operatorname{Pa}(V_{i},\mathcal{D})\big)\Big]\;\Big|\;\operatorname{Pa}(V_{i},\mathcal{D})
\]

for all $V_{i}\in\mathbf{V}$, where $\perp\!\!\!\perp$ indicates independence with respect to $f$.

Lemma 20

(cf. Lemma 3.2 of Perković et al., 2017) Let $\mathbf{X}$ and $\mathbf{Z}$ be disjoint node sets in an MPDAG $\mathcal{G}$. If $\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset$, then $\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset$ in every DAG $\mathcal{D}$ in $[\mathcal{G}]$.

Lemma 21

(Lemma C.2 of Perković et al., 2017, Lemma 9 of Perković et al., 2018) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S}$ be pairwise disjoint node sets in an MPDAG (PAG) $\mathcal{G}$, where every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed (visible) edge out of $\mathbf{X}$. Then the following statements are equivalent.

  (i) $\mathbf{S}\cap\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{G})=\emptyset$.

  (ii) $\mathbf{S}\cap\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{D})=\emptyset$ in every DAG (MAG) $\mathcal{D}$ in $[\mathcal{G}]$.

Lemma 22

(cf. Lemma C.3 of Perković et al., 2017, Lemma 10 of Perković et al., 2018) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S}$ be pairwise disjoint node sets in an MPDAG (PAG) $\mathcal{G}$, where every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a directed (visible) edge out of $\mathbf{X}$ and where $\mathbf{S}\cap\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{G})=\emptyset$. Then the following statements are equivalent.

  (i) $\mathbf{S}$ blocks all proper non-causal definite status paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$.

  (ii) $\mathbf{S}$ blocks all proper non-causal definite status paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{D}$ for every DAG (MAG) $\mathcal{D}$ in $[\mathcal{G}]$.

Theorem 23

(cf. Proposition 3 of Lauritzen et al. (1990), cf. Corollary 2 of Richardson (2003)) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a DAG $\mathcal{D}$. Further let $(\mathcal{D}_{\operatorname{An}(\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z},\mathcal{D})})^{m}$ be the moral induced subgraph of $\mathcal{D}$ on nodes $\operatorname{An}(\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z},\mathcal{D})$ (see Definition 15). Then $\mathbf{Z}$ d-separates $\mathbf{X}$ and $\mathbf{Y}$ in $\mathcal{D}$ if and only if all paths between $\mathbf{X}$ and $\mathbf{Y}$ in $(\mathcal{D}_{\operatorname{An}(\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z},\mathcal{D})})^{m}$ contain at least one node in $\mathbf{Z}$.
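Theorem 23 translates directly into a d-separation test: restrict to the ancestral set, moralize, and check whether $\mathbf{Z}$ separates $\mathbf{X}$ from $\mathbf{Y}$ in the resulting undirected graph. The Python sketch below is a minimal illustration under our own assumptions (a DAG given as a set of `(parent, child)` pairs, every node counted as its own ancestor, and a hypothetical function name `d_separated`).

```python
def d_separated(edges, X, Y, Z):
    """Check whether Z d-separates X and Y via the moral ancestral graph."""
    X, Y, Z = set(X), set(Y), set(Z)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)

    # Step 1: ancestral set An(X u Y u Z); it is closed under taking parents.
    anc, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, ()))

    # Step 2: moralize the induced subgraph (marry parents, drop directions).
    nbrs = {v: set() for v in anc}
    for child in anc:
        pa = list(parents.get(child, ()))
        for p in pa:
            nbrs[p].add(child)
            nbrs[child].add(p)
        for i, p in enumerate(pa):
            for q in pa[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # Step 3: look for an undirected path from X to Y that avoids Z.
    stack, seen = list(X), set()
    while stack:
        v = stack.pop()
        if v in seen or v in Z:
            continue
        seen.add(v)
        if v in Y:
            return False
        stack.extend(nbrs[v])
    return True


# Hypothetical examples: a collider X -> C <- Y and a chain X -> M -> Y.
collider = {("X", "C"), ("Y", "C")}
chain = {("X", "M"), ("M", "Y")}
```

With these graphs, `d_separated(collider, {"X"}, {"Y"}, set())` holds, while conditioning on the collider `C` opens the path. The chain behaves the other way around.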

Theorem 24

(cf. Theorem 7 of Perković et al., 2018) Consider the definition of the adjustment criterion for MPDAGs (Definition 12) in the specific setting of a DAG. In this setting, replacing condition (b) in Definition 12 with

  (b) $\mathbf{S}$ d-separates $\mathbf{X}$ and $\mathbf{Y}$ in $\mathcal{D}_{\mathbf{XY}}^{pbd}$ (see Definition 14)

results in a criterion that is equivalent to Definition 12 applied to a DAG.

Theorem 25

(cf. Theorem 3.1 of Maathuis and Colombo, 2015) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S}$ be pairwise disjoint node sets in a causal DAG $\mathcal{D}$. If $\mathbf{S}$ satisfies the generalized back-door criterion relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{D}$ (Definition 13), then $\mathbf{S}$ is an adjustment set relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{D}$ (Definition 11).

Lemma 26

(cf. Lemma E.6 of Henckel et al., 2022) Let $\mathbf{X}$, $\mathbf{Y}$ be disjoint node sets in an MPDAG $\mathcal{G}$. If there is no proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ that starts with an undirected edge in $\mathcal{G}$, then $\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{G})\subseteq\operatorname{De}(\mathbf{X},\mathcal{G})$.

Lemma 27

(cf. Lemma 3.5 of Perković et al., 2017) Let $p=\langle V_{1},\dots,V_{k}\rangle$, $k>1$, be a definite status path in an MPDAG $\mathcal{G}$. Then $p$ is a possibly causal path in $\mathcal{G}$ if and only if there is no edge $V_{i}\leftarrow V_{i+1}$, $i\in\{1,\dots,k-1\}$, in $\mathcal{G}$.

Lemma 28

(cf. Lemma 3.3.1 of Zhang, 2006) Let $X$, $Y$, and $Z$ be distinct nodes in a PAG $\mathcal{G}$. If $X\mathrel{\bullet\!\!\rightarrow}Y\mathrel{\circ\!\!-\!\!\bullet}Z$, then there is an edge between $X$ and $Z$ with an arrowhead at $Z$. Furthermore, if the edge between $X$ and $Y$ is $X\rightarrow Y$, then the edge between $X$ and $Z$ is either $X\mathrel{\circ\!\!\rightarrow}Z$ or $X\rightarrow Z$ (that is, not $X\leftrightarrow Z$).

Lemma 29

(cf. Lemma 7.5 of Maathuis and Colombo, 2015) Let $X$ and $Y$ be two distinct nodes in a MAG or PAG $\mathcal{G}$. Then $\mathcal{G}$ cannot have both an edge $Y\mathrel{\bullet\!\!\rightarrow}X$ and a path $\langle X=V_{1},\dots,V_{k}=Y\rangle$, $k>2$, where each edge $\langle V_{i},V_{i+1}\rangle$, $i\in\{1,\dots,k-1\}$, is of one of these forms: $V_{i}\to V_{i+1}$ or $V_{i}\mathrel{\circ\!\!-\!\!\bullet}V_{i+1}$.

Lemma 30

(cf. Lemma 17 of Perković et al., 2018) Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, and $\mathbf{S}$ be pairwise disjoint node sets in a MAG or PAG $\mathcal{G}$. Suppose that every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a visible edge out of $\mathbf{X}$ and that $\big[\mathbf{S}\cup\mathbf{Z}\big]\cap\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{G})=\emptyset$. Suppose furthermore that there is a path $p$ from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ such that

  (i) $p$ is a proper definite status non-causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$,

  (ii) all colliders on $p$ are in $\operatorname{An}(\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}\cup\mathbf{S},\mathcal{G})\setminus\big[\mathbf{X}\cup\mathbf{Y}\cup\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{G})\big]$, and

  (iii) no definite non-collider on $p$ is in $\mathbf{S}\cup\mathbf{Z}$.

Then there is a proper definite status non-causal path from $\mathbf{X}$ to $\mathbf{Y}$ that is m-connecting given $\mathbf{S}\cup\mathbf{Z}$ in $\mathcal{G}$.

Theorem 31

(cf. Theorem 4.4 of Perković et al., 2017, Theorems 5 and 56 of Perković et al., 2018) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S}$ be pairwise disjoint node sets in a causal MPDAG (PAG) $\mathcal{G}$. Then $\mathbf{S}$ is an adjustment set relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{G}$ (Definition 11) if and only if $\mathbf{S}$ satisfies the adjustment criterion relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{G}$ (Definition 12).

Lemma 32

(cf. Lemma F.1 of Rothenhäusler et al., 2018) Let $X$ and $Y$ be nodes in an MPDAG $\mathcal{G}=(\mathbf{V},\mathbf{E})$ such that $X-Y$ is in $\mathcal{G}$. Let $\mathcal{G}^{\prime}$ be an MPDAG constructed from $\mathcal{G}$ by adding $X\to Y$ and completing the orientation rules R1-R4 of Meek (1995). For any $Z,W\in\mathbf{V}$, if $Z-W$ is in $\mathcal{G}$ and $Z\rightarrow W$ is in $\mathcal{G}^{\prime}$, then $W\in\operatorname{De}(Y,\mathcal{G}^{\prime})$.

Lemma 33

(cf. Lemma F.2 of Rothenhäusler et al., 2018) Let $X$ be a node in an MPDAG $\mathcal{G}=(\mathbf{V},\mathbf{E})$, and let $\mathbf{S}$ be a set such that for all $S\in\mathbf{S}$, $X-S$ is in $\mathcal{G}$. Then there is an MPDAG $\mathcal{G}^{\prime}=(\mathbf{V},\mathbf{E^{\prime}})$ that is formed by taking $\mathcal{G}$, orienting $X\to S$ for all $S\in\mathbf{S}$, and completing R1-R4 of Meek (1995).

Lemma 34

(cf. Lemma 59 of Perković et al., 2018) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S}$ be pairwise disjoint node sets in a DAG $\mathcal{D}$ such that $\mathbf{S}$ satisfies the adjustment criterion relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{D}$ (Definition 12). Let $\mathbf{J}\subseteq\operatorname{An}(\mathbf{X}\cup\mathbf{Y},\mathcal{D})\setminus(\operatorname{De}(\mathbf{X},\mathcal{D})\cup\mathbf{Y})$ and $\mathbf{\tilde{S}}=\mathbf{S}\cup\mathbf{J}$. Then the following statements hold:

  (i) $\mathbf{\tilde{S}}$ satisfies the adjustment criterion relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{D}$, and

  (ii) $\int_{\mathbf{s}}f(\mathbf{y}\mid\mathbf{x},\mathbf{s})f(\mathbf{s})\,d\mathbf{s}=\int_{\mathbf{\tilde{s}}}f(\mathbf{y}\mid\mathbf{x},\mathbf{\tilde{s}})f(\mathbf{\tilde{s}})\,d\mathbf{\tilde{s}}$, for any density $f$ consistent with $\mathcal{D}$.

Lemma 35

(Lemma 60 of Perković et al., 2018) Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S}$ be pairwise disjoint node sets in a causal DAG $\mathcal{D}$ such that $\mathbf{S}$ satisfies the adjustment criterion relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{D}$. Let $\mathbf{J}=\operatorname{An}(\mathbf{X}\cup\mathbf{Y},\mathcal{D})\setminus\big(\operatorname{De}(\mathbf{X},\mathcal{D})\cup\mathbf{Y}\big)$ and $\mathbf{\tilde{S}}=\mathbf{S}\cup\mathbf{J}$. Additionally, let $\mathbf{\tilde{S}_{D}}=\mathbf{\tilde{S}}\cap\operatorname{De}(\mathbf{X},\mathcal{D})$, $\mathbf{\tilde{S}_{N}}=\mathbf{\tilde{S}}\setminus\operatorname{De}(\mathbf{X},\mathcal{D})$, $\mathbf{Y_{D}}=\mathbf{Y}\cap\operatorname{De}(\mathbf{X},\mathcal{D})$, and $\mathbf{Y_{N}}=\mathbf{Y}\setminus\operatorname{De}(\mathbf{X},\mathcal{D})$. Then the following statements hold:

  (i) $(\mathbf{X}\cup\mathbf{Y_{N}}\cup\mathbf{\tilde{S}})\cap\operatorname{Forb}(\mathbf{X},\mathbf{Y},\mathcal{D})=\emptyset$,

  (ii) if $p=\langle H,\dots,Y_{D}\rangle$ is a non-causal path from $H\in\mathbf{X}\cup\mathbf{Y_{N}}\cup\mathbf{\tilde{S}}$ to $Y_{D}\in\mathbf{Y_{D}}$, then $p$ is blocked by $(\mathbf{X}\cup\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}})\setminus\{H\}$ in $\mathcal{D}$,

  (iii) $\mathbf{Y_{D}}\perp_{d}\mathbf{\tilde{S}_{D}}\,|\,\mathbf{Y_{N}}\cup\mathbf{X}\cup\mathbf{\tilde{S}_{N}}$ in $\mathcal{D}$, where $\mathbf{Y_{N}}=\emptyset$ is allowed,

  (iv) if $\mathbf{Y_{N}}=\emptyset$ then $\mathbf{\tilde{S}_{N}}$ satisfies the generalized back-door criterion relative to $(\mathbf{X},\mathbf{Y})$ in $\mathcal{D}$ (Definition 13),

  (v) the empty set satisfies the generalized back-door criterion relative to $(\mathbf{X}\cup\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}},\mathbf{Y_{D}})$ in $\mathcal{D}$,

  (vi) $\mathbf{Y_{D}}\perp_{d}(\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}})\,|\,\mathbf{X}$ in $\mathcal{D}_{\overline{\mathbf{X}}\underline{\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}}}}$, and

  (vii) $\mathbf{\tilde{S}_{N}}\perp_{d}\mathbf{X}\,|\,\mathbf{Y_{N}}$ in $\mathcal{D}_{\overline{\mathbf{X}}}$.

Appendix C A NECESSARY CONDITION FOR IDENTIFIABILITY

This section includes the proof of Proposition 36, which provides a necessary condition for the identifiability of the conditional causal effect given an MPDAG. This result is needed twice – once for the proof of Theorem 3 in Section 3.1 and once for Example 4 in Section 3.2. Below we also provide two supporting results for the proof of Proposition 36 – namely, Lemmas 37 and 38.

C.1 Main Result

Proposition 36

Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a causal MPDAG $\mathcal{G}$. If there is a proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that starts with an undirected edge and does not contain any element of $\mathbf{Z}$, then the conditional causal effect of $\mathbf{X}$ on $\mathbf{Y}$ given $\mathbf{Z}$ is not identifiable in $\mathcal{G}$.

  •   Proof of Proposition 36.

    This proposition extends Proposition 3.2 of Perković (2020), and its proof follows similar logic to that of Perković (2020).

    Suppose that there is a proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}=(\mathbf{V},\mathbf{E})$ that starts with an undirected edge and does not contain any element of $\mathbf{Z}$. Then by Lemma 37, there is one such path, call it $q=\langle X=V_{0},\dots,V_{k}=Y\rangle$ with $X\in\mathbf{X}$, $Y\in\mathbf{Y}$, $k\geq 1$, where the corresponding paths in two DAGs in $[\mathcal{G}]$ take the forms $X\to\dots\to Y$ and $X\leftarrow V_{1}\to\dots\to Y$ ($X\leftarrow Y$ when $k=1$). Call these DAGs $\mathcal{D}^{1}$ and $\mathcal{D}^{2}$ with paths $q_{1}$ and $q_{2}$, respectively.

    To prove that the conditional causal effect of $\mathbf{X}$ on $\mathbf{Y}$ given $\mathbf{Z}$ is not identifiable in $\mathcal{G}$, it suffices to show that there are two families of interventional densities over $\mathbf{V}$, call them $\mathbf{F^{*}_{1}}$ and $\mathbf{F^{*}_{2}}$, where for $i\in\{1,2\}$ we define $\mathbf{F^{*}_{i}}=\{f_{i}(\mathbf{v}|do(\mathbf{x^{\prime}})):\mathbf{X^{\prime}}\subseteq\mathbf{V}\}$, such that the following properties hold.

    (i) $\mathcal{D}^{1}$ and $\mathcal{D}^{2}$ are compatible with $\mathbf{F^{*}_{1}}$ and $\mathbf{F^{*}_{2}}$, respectively. (Footnote 3: For brevity, we say that a DAG is “compatible with” a set of interventional densities, and that an interventional density is “consistent with” a DAG, as shorthand for these statements holding under the supposition that the DAG is causal.)

    (ii) $f_{1}(\mathbf{v})=f_{2}(\mathbf{v})$.

    (iii) $f_{1}(\mathbf{y}|do(\mathbf{x}),\mathbf{z})\neq f_{2}(\mathbf{y}|do(\mathbf{x}),\mathbf{z})$.

    To define such families, we start by introducing an additional DAG and an observational density $f(\mathbf{v})$. That is, let $\mathcal{D}^{1^{\prime}}$ be a DAG constructed by removing every edge from $\mathcal{D}^{1}$ except for the edges on $q_{1}$. Then let $f(\mathbf{v})$ be the multivariate normal distribution under the following linear structural equation model (SEM). Each random variable $A\in\mathbf{V}$ has mean zero and is a linear combination of its parents in $\mathcal{D}^{1^{\prime}}$ and $\epsilon_{A}\sim N(0,\sigma^{2}_{A})$, where $\{\epsilon_{A}:A\in\mathbf{V}\}$ are mutually independent. The coefficients in this linear combination are defined by the edge coefficients of $\mathcal{D}^{1^{\prime}}$. We pick these edge coefficients in conjunction with $\{\sigma^{2}_{A}:A\in\mathbf{V}\}$ in such a way that each coefficient is in $(0,1)$ and $\operatorname{Var}(A)=1$ for all $A\in\mathbf{V}$.

    From this, we define $\mathbf{F^{*}_{1}}=\{f_{1}(\mathbf{v}|do(\mathbf{x^{\prime}})):\mathbf{X^{\prime}}\subseteq\mathbf{V}\}$ such that $\mathcal{D}^{1^{\prime}}$ is compatible with $\mathbf{F^{*}_{1}}$ and such that $f_{1}(\mathbf{v})=f(\mathbf{v})$. Note that $f(\mathbf{v})$ is Markov compatible with $\mathcal{D}^{1^{\prime}}$ by construction, and we build the interventional densities in $\mathbf{F^{*}_{1}}$ by replacing the intervening random variables in the SEM with their interventional values (Pearl, 2009).

    To construct the second family of interventional densities, we introduce the DAG $\mathcal{D}^{2^{\prime}}$, which we form by removing every edge from $\mathcal{D}^{2}$ except for the edges on $q_{2}$. Then note that we could have defined $f(\mathbf{v})$ using a linear SEM based on the parents in $\mathcal{D}^{2^{\prime}}$. In this case, the resulting observational density would again be a multivariate normal with mean vector zero and a covariance matrix with ones on the diagonal. The off-diagonal entries would be the covariances between the variables in $\mathcal{D}^{2^{\prime}}$. But note that by Lemma 17, these values will equal the product of all edge coefficients between the relevant nodes in $\mathcal{D}^{2^{\prime}}$. Since $\mathcal{D}^{1^{\prime}}$ and $\mathcal{D}^{2^{\prime}}$ contain no paths with colliders, the observational density $f(\mathbf{v})$ built using $\mathcal{D}^{2^{\prime}}$ will be identical to that built under $\mathcal{D}^{1^{\prime}}$. Thus, in an analogous way to $\mathbf{F^{*}_{1}}$, we define $\mathbf{F^{*}_{2}}=\{f_{2}(\mathbf{v}|do(\mathbf{x^{\prime}})):\mathbf{X^{\prime}}\subseteq\mathbf{V}\}$ such that $\mathcal{D}^{2^{\prime}}$ is compatible with $\mathbf{F^{*}_{2}}$ and such that $f_{2}(\mathbf{v})=f(\mathbf{v})$.

    Having defined $\mathbf{F^{*}_{1}}$ and $\mathbf{F^{*}_{2}}$, we check that their desired properties hold. Note that by construction, $\mathcal{D}^{1^{\prime}}$ and $\mathcal{D}^{2^{\prime}}$ are compatible with $\mathbf{F^{*}_{1}}$ and $\mathbf{F^{*}_{2}}$, respectively. Thus (i) holds by Lemma 38. Similarly by construction, (ii) holds. To show that (iii) holds, it suffices to show that $E[Y|do(\mathbf{X}=\mathbf{1}),\mathbf{Z}]$ is not the same under $f_{1}$ and $f_{2}$.

    To calculate these expectations, we first want to apply Rules 1-3 of the do calculus (Equations (14)-(16)). Since $f_{i}(\mathbf{v}|do(\mathbf{x}))$, $i\in\{1,2\}$, is consistent with $\mathcal{D}^{i^{\prime}}$, we apply these rules using graphical relationships in $\mathcal{D}^{i^{\prime}}$. Because the path in $\mathcal{D}^{i^{\prime}}$ corresponding to $q_{i}$, $i\in\{1,2\}$, does not contain nodes in $\mathbf{Z}$ or $\mathbf{X}\setminus\{X\}$, we have $Y\perp_{d}\mathbf{Z}\,|\,\mathbf{X}$ and $Y\perp_{d}\mathbf{X}\setminus\{X\}\,|\,X$ in $\mathcal{D}^{i^{\prime}}_{\overline{\mathbf{X}}}$. Further, $Y\perp_{d}X$ in $\mathcal{D}^{1^{\prime}}_{\underline{X}}$ and $Y\perp_{d}X$ in $\mathcal{D}^{2^{\prime}}_{\overline{X}}$. Thus by Rules 1-3 of the do calculus (Equations (14)-(16)), the following hold.

    \begin{align*}
    E_{1}[Y|do(\mathbf{X}=\mathbf{1}),\mathbf{Z}] &= E_{1}[Y|do(X=1)] = E_{1}[Y|X=1] := a, \\
    E_{2}[Y|do(\mathbf{X}=\mathbf{1}),\mathbf{Z}] &= E_{2}[Y|do(X=1)] = E_{2}[Y] := b,
    \end{align*}

    where $E_{i}$, $i\in\{1,2\}$, is the expectation under $f_{i}$. To calculate $a$ and $b$, we rely on the observational density $f(\mathbf{v})$, which was constructed using $\mathcal{D}^{1^{\prime}}$. By Lemma 18, $a$ equals the covariance of $X$ and $Y$ under $f(\mathbf{v})$, and by Lemma 17, $\operatorname{Cov}(X,Y)$ equals the product of all edge coefficients along $q_{1}$ in $\mathcal{D}^{1^{\prime}}$, which were chosen to be in $(0,1)$. Therefore, $a\neq 0$. But every variable has mean zero under $f(\mathbf{v})$, so $b=0$.  
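The construction in this proof can be checked numerically on the smallest non-trivial case, a path $\langle X,V_{1},Y\rangle$. The Python sketch below is our own illustration, not part of the original argument: the coefficients `0.5` and `0.8`, the dictionary encoding of the SEMs, and the helper names `covariance` and `mean_under_do` are all hypothetical. It builds the two linear SEMs ($X\to V_{1}\to Y$ and $X\leftarrow V_{1}\to Y$) with unit variances, confirms that they imply the same covariances, and computes $E[Y|do(X=1)]$ under each.

```python
def covariance(sem, order):
    """Covariances of a linear SEM whose variables all have variance 1.

    sem:   {node: {parent: edge coefficient}}
    order: nodes in topological order.
    """
    cov = {}
    for j, v in enumerate(order):
        cov[v, v] = 1.0
        for u in order[:j]:
            # The error of v is independent of the earlier node u.
            c = sum(w * cov[p, u] for p, w in sem[v].items())
            cov[v, u] = cov[u, v] = c
    return cov


def mean_under_do(sem, order, do):
    """Means after an intervention: intervened nodes are set, errors have mean 0."""
    mean = {}
    for v in order:
        if v in do:
            mean[v] = do[v]
        else:
            mean[v] = sum(w * mean[p] for p, w in sem[v].items())
    return mean


# D^{1'}: X -> V1 -> Y   and   D^{2'}: X <- V1 -> Y, same edge coefficients.
sem1, order1 = {"X": {}, "V1": {"X": 0.5}, "Y": {"V1": 0.8}}, ["X", "V1", "Y"]
sem2, order2 = {"V1": {}, "X": {"V1": 0.5}, "Y": {"V1": 0.8}}, ["V1", "X", "Y"]

cov1, cov2 = covariance(sem1, order1), covariance(sem2, order2)
a = mean_under_do(sem1, order1, {"X": 1.0})["Y"]  # effect flows along X -> V1 -> Y
b = mean_under_do(sem2, order2, {"X": 1.0})["Y"]  # X has no descendants, so no effect
```

Both models give the same covariance for every pair of variables, yet `a` is the product of the edge coefficients while `b` is zero, which mirrors properties (ii) and (iii).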

C.2 Supporting Result

Lemma 37

Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in an MPDAG $\mathcal{G}=(\mathbf{V},\mathbf{E})$. Suppose that there is a proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that starts with an undirected edge and does not contain nodes in $\mathbf{Z}$. Then there is one such path $\langle X=V_{0},\dots,V_{k}=Y\rangle$, $X\in\mathbf{X}$, $Y\in\mathbf{Y}$, $k\geq 1$, where the corresponding paths in two DAGs in $[\mathcal{G}]$ take the forms $X\to\dots\to Y$ and $X\leftarrow V_{1}\to\dots\to Y$ ($X\leftarrow Y$ when $k=1$), respectively.

  •   Proof of Lemma 37.

    This lemma is similar to Lemma A.3 of Perković (2020) and its proof borrows from the proof strategy of Lemma C.1 of Perković et al. (2017).

    Let $q^{*}$ be an arbitrary proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that starts with an undirected edge and does not contain nodes in $\mathbf{Z}$. Then let $q=\langle X=V_{0},\dots,V_{k}=Y\rangle$, $X\in\mathbf{X}$, $Y\in\mathbf{Y}$, $k\geq 1$, be a shortest subsequence of $q^{*}$ in $\mathcal{G}$ that also starts with an undirected edge. Note that $q$ is a proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that starts with an undirected edge and does not contain nodes in $\mathbf{Z}$.

    Consider when $q$ is of definite status. Since $q$ is possibly causal, all non-endpoints of $q$ are definite non-colliders. Let $\mathcal{D}^{1}$ be a DAG in $[\mathcal{G}]$ that contains $X\to V_{1}$. Then since $V_{1}$ is either $Y$ or a definite non-collider on $q$, the path corresponding to $q$ in $\mathcal{D}^{1}$ takes the form $X\to\dots\to Y$ by induction. Let $\mathcal{D}^{2}$ be a DAG in $[\mathcal{G}]$ with no additional edges into $V_{1}$ compared to $\mathcal{G}$ (Lemma 33). Since $\mathcal{G}$ contains $X-V_{1}$, $\mathcal{D}^{2}$ contains $X\leftarrow V_{1}$. When $k>1$, $\mathcal{G}$ contains either $V_{1}-V_{2}$ or $V_{1}\to V_{2}$, and so $\mathcal{D}^{2}$ contains $X\leftarrow V_{1}\to V_{2}$. Thus by the same inductive reasoning as above, the path corresponding to $q$ in $\mathcal{D}^{2}$ takes the form $X\leftarrow V_{1}\to\dots\to Y$ (or simply $X\leftarrow Y$ when $k=1$).

    Consider instead when qq is not of definite status. Note that k>1k>1. To see that qq contains V1V2V_{1}-V_{2}, note that by the choice of qq and the fact that qq is possibly causal, q(V1,Y)q(V_{1},Y) is unshielded and possibly causal. Thus, q(V1,Y)q(V_{1},Y) is of definite status. However, qq is not of definite status, so V1V_{1} must not be of definite status on qq, which implies that qq cannot contain V1V2V_{1}\to V_{2}. Since qq is possibly causal, it also cannot contain V1V2V_{1}\leftarrow V_{2}.

    To find two DAGs in [𝒢][\mathcal{G}] with paths corresponding to qq that fit our desired forms, we narrow our search to [𝒢][\mathcal{G}^{\prime}], where we let 𝒢\mathcal{G}^{\prime} be an MPDAG constructed from 𝒢\mathcal{G} by adding V1V2V_{1}\to V_{2} and completing R1-R4 of Meek (1995). We show below that the path corresponding to qq in 𝒢\mathcal{G}^{\prime} takes the form XV1YX-V_{1}\to\dots\to Y, and thus, there must be two DAGs in [𝒢][𝒢][\mathcal{G}^{\prime}]\subseteq[\mathcal{G}] with corresponding paths of the forms XYX\to\dots\to Y and XV1YX\leftarrow V_{1}\to\dots\to Y.

    We first show that $\mathcal{G}^{\prime}$ contains $X-V_{1}$ by the contraposition of Lemma 32. Note that we have already shown that $\mathcal{G}$ contains $V_{1}-V_{2}$, that $\mathcal{G}^{\prime}$ is formed by adding $V_{1}\to V_{2}$ to $\mathcal{G}$, and that $\mathcal{G}$ contains $X-V_{1}$. It remains to show that $X,V_{1}\notin\operatorname{De}(V_{2},\mathcal{G}^{\prime})$. To see this, note that $\mathcal{G}$ must contain an edge $\langle X,V_{2}\rangle$, because $V_{1}$ is not of definite status on $q$. This edge must take the form $X\to V_{2}$ by the choice of $q$ and the fact that $q$ is possibly causal. Thus, $\mathcal{G}^{\prime}$ contains $X\to V_{2}$ and $V_{1}\to V_{2}$. Therefore, $X,V_{1}\notin\operatorname{De}(V_{2},\mathcal{G}^{\prime})$. Finally, note that $\mathcal{G}^{\prime}$ contains $V_{1}\to\dots\to Y$ by R1 of Meek (1995), since we constructed $\mathcal{G}^{\prime}$ by adding $V_{1}\to V_{2}$ to a path $q(V_{1},Y)$ that is unshielded and possibly causal.
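
As a small illustration of the lemma (a three-node example chosen here, not taken from the paper), take $\mathbf{X}=\{X\}$, $\mathbf{Y}=\{Y\}$, $\mathbf{Z}=\emptyset$, and the path $\langle X,V_{1},Y\rangle$ with $k=2$. The path starts with an undirected edge and is of definite status, so the first case of the proof applies:

```latex
\mathcal{G}:\quad X - V_{1} \to Y,
\qquad
\mathcal{D}^{1}:\quad X \to V_{1} \to Y,
\qquad
\mathcal{D}^{2}:\quad X \leftarrow V_{1} \to Y .
```

Either orientation of $X-V_{1}$ is acyclic and creates no unshielded collider, so both $\mathcal{D}^{1}$ and $\mathcal{D}^{2}$ belong to $[\mathcal{G}]$, and the corresponding paths have the two forms stated in the lemma.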

Lemma 38

Let 𝐗\mathbf{X}, 𝐘\mathbf{Y}, and 𝐙\mathbf{Z} be pairwise disjoint node sets in a causal DAG 𝒟=(𝐕,𝐄)\mathcal{D}=(\mathbf{V},\mathbf{E}). Then let 𝒟=(𝐕,𝐄)\mathcal{D}^{*}=(\mathbf{V},\mathbf{E^{\prime}}) be a causal DAG constructed by removing edges from 𝒟\mathcal{D}, and let f(𝐯|do(𝐱))f(\mathbf{v}|do(\mathbf{x})) be an interventional density over 𝐕\mathbf{V}. If f(𝐯|do(𝐱))f(\mathbf{v}|do(\mathbf{x})) is consistent with 𝒟\mathcal{D}^{*}, then it is consistent with 𝒟\mathcal{D}.

  •   Proof of Lemma 38.

    Suppose that f(𝐯|do(𝐱))f(\mathbf{v}|do(\mathbf{x})) is consistent with 𝒟\mathcal{D}^{*}. Then by definition, there exists a set of interventional densities 𝐅\mathbf{F^{*}} such that 𝒟\mathcal{D}^{*} is compatible with 𝐅\mathbf{F^{*}}. Let f(𝐯)f(\mathbf{v}) be the density in 𝐅\mathbf{F^{*}} under a null intervention. Note that by the truncated factorization in Equation (3), f(𝐯)f(\mathbf{v}) is Markov compatible with 𝒟\mathcal{D}^{*}. Thus by Lemma 19,

    \begin{equation}
    V_{i}\perp\!\!\!\perp\Big[\mathbf{V}\setminus\big(\operatorname{De}(V_{i},\mathcal{D}^{*})\cup\operatorname{Pa}(V_{i},\mathcal{D}^{*})\big)\Big]\,\Big|\,\operatorname{Pa}(V_{i},\mathcal{D}^{*}) \tag{17}
    \end{equation}

    for all $V_{i}\in\mathbf{V}$, where $\perp\!\!\!\perp$ indicates independence with respect to $f(\mathbf{v})$. Further, since $\operatorname{De}(V_{i},\mathcal{D}^{*})\subseteq\operatorname{De}(V_{i},\mathcal{D})$ and $\mathcal{D}$ is acyclic, then $\operatorname{De}(V_{i},\mathcal{D}^{*})\cap\operatorname{Pa}(V_{i},\mathcal{D})=\emptyset$ and thus $\operatorname{Pa}(V_{i},\mathcal{D})\subseteq\mathbf{V}\setminus\operatorname{De}(V_{i},\mathcal{D}^{*})$. Therefore it follows from (17) that

    \begin{equation}
    V_{i}\perp\!\!\!\perp\Big[\operatorname{Pa}(V_{i},\mathcal{D})\setminus\operatorname{Pa}(V_{i},\mathcal{D}^{*})\Big]\,\Big|\,\operatorname{Pa}(V_{i},\mathcal{D}^{*}). \tag{18}
    \end{equation}

    Let f(𝐯|do(𝐱))f(\mathbf{v}|do(\mathbf{x^{\prime}})), 𝐗𝐕\mathbf{X^{\prime}}\subseteq\mathbf{V}, be an arbitrary density in 𝐅\mathbf{F^{*}}. Then by definition and (18)

    \begin{align*}
    f(\mathbf{v}|do(\mathbf{x^{\prime}})) &= \prod_{V_{i}\in\mathbf{V}\setminus\mathbf{X^{\prime}}}f(v_{i}|\operatorname{pa}(v_{i},\mathcal{D}^{*}))\,\mathds{1}(\mathbf{X^{\prime}}=\mathbf{x^{\prime}})\\
    &= \prod_{V_{i}\in\mathbf{V}\setminus\mathbf{X^{\prime}}}f(v_{i}|\operatorname{pa}(v_{i},\mathcal{D}))\,\mathds{1}(\mathbf{X^{\prime}}=\mathbf{x^{\prime}}).
    \end{align*}

    Since f(𝐯|do(𝐱))f(\mathbf{v}|do(\mathbf{x^{\prime}})) was arbitrary, this holds for all densities in 𝐅\mathbf{F^{*}}. Thus, 𝒟\mathcal{D} is compatible with 𝐅\mathbf{F^{*}}. Since f(𝐯|do(𝐱))𝐅f(\mathbf{v}|do(\mathbf{x}))\in\mathbf{F^{*}}, then by definition, it is consistent with 𝒟\mathcal{D}.  
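
To make the argument concrete, here is a small worked instance. The three-node graphs are an assumed example, not part of the paper. Let $\mathcal{D}$ contain $A\to C$ and $B\to C$, and let $\mathcal{D}^{*}$ be obtained by removing $B\to C$. For a density $f$ that is Markov compatible with $\mathcal{D}^{*}$, statement (18) for $V_{i}=C$ reads $C\perp\!\!\!\perp B\,|\,A$, so the factorizations according to $\mathcal{D}^{*}$ and $\mathcal{D}$ coincide, both observationally and under $do(A=a^{\prime})$:

```latex
\begin{align*}
f(a,b,c) &= f(a)\,f(b)\,f(c\mid a)
          = f(a)\,f(b)\,f(c\mid a,b),\\
f(a,b,c\mid do(a^{\prime})) &= f(b)\,f(c\mid a)\,\mathds{1}(a=a^{\prime})
          = f(b)\,f(c\mid a,b)\,\mathds{1}(a=a^{\prime}).
\end{align*}
```

In each line the first expression is the factorization according to $\mathcal{D}^{*}$ and the second is the factorization according to $\mathcal{D}$; they agree because $f(c\mid a)=f(c\mid a,b)$.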

Appendix D PROOFS FOR SECTION 3.1: MPDAGS - CONDITIONAL ADJUSTMENT CRITERION

The following results show the completeness and soundness of the conditional adjustment criterion for identifying conditional adjustment sets in DAGs. We rely on these results to show the analogous results for MPDAGs in Theorem 3 of Section 3.1. Figure 5 shows how the results in this paper fit together to prove Theorem 3. Two supporting results needed for the proof of soundness in DAGs follow the main results below.

Figure 5: Proof structure of Theorem 3. (Diagram nodes: Theorem 3, Theorem 39, Theorem 40, Lemma 41, Lemma 42, Lemma 6(a).)

D.1 Main Results

  •   Proof of Lemma 6.

    (a) Follows from Lemma 26. (b) Holds by Theorem 3, Lemma 6(a), and Theorem 31.  

Theorem 39

(Completeness of the Conditional Adjustment Criterion for DAGs) Let 𝐗\mathbf{X}, 𝐘\mathbf{Y}, 𝐙\mathbf{Z}, and 𝐒\mathbf{S} be pairwise disjoint node sets in a causal DAG 𝒟\mathcal{D}, where 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset. If 𝐒\mathbf{S} is a conditional adjustment set relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D} (Definition 1), then 𝐒\mathbf{S} satisfies the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D} (Definition 2).

  •   Proof of Theorem 39.

    Let 𝐒\mathbf{S} be a conditional adjustment set relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D}, and let ff be a density consistent with 𝒟\mathcal{D}. We start by showing that 𝐒𝐙\mathbf{S}\cup\mathbf{Z} is an adjustment set relative to (𝐗,𝐘)(\mathbf{X,Y}) in 𝒟\mathcal{D}. To do this, we calculate the following. (Justification for the numbered equations is below.)

    \begin{align}
    f(\mathbf{y}|do(\mathbf{x})) &= \int_{\mathbf{z}}f(\mathbf{y,z}|do(\mathbf{x}))\,\mathrm{d}\mathbf{z} \nonumber\\
    &= \int_{\mathbf{z}}f(\mathbf{z}|do(\mathbf{x}))f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})\,\mathrm{d}\mathbf{z} \nonumber\\
    &= \int_{\mathbf{z}}f(\mathbf{z})f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})\,\mathrm{d}\mathbf{z} \tag{19}\\
    &= \int_{\mathbf{z}}f(\mathbf{z})\int_{\mathbf{s}}f(\mathbf{y}|\mathbf{x,z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s}\,\mathrm{d}\mathbf{z} \tag{20}\\
    &= \int_{\mathbf{s,z}}f(\mathbf{y}|\mathbf{x,s,z})f(\mathbf{s,z})\,\mathrm{d}\mathbf{s}\,\mathrm{d}\mathbf{z}. \nonumber
    \end{align}

    Equation (19) follows from Rule 3 of the do calculus (Equation (16)). To show that this rule holds, let $p$ be an arbitrary path from $\mathbf{X}$ to $\mathbf{Z}$ in $\mathcal{D}_{\overline{\mathbf{X}}}$. Note that $p$ must begin with an edge out of $\mathbf{X}$. Since $\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset$, $p$ cannot be causal and, therefore, must contain a collider. Thus, $p$ is blocked given the empty set, and so $(\mathbf{Z}\perp_{d}\mathbf{X})_{\mathcal{D}_{\overline{\mathbf{X}}}}$. Equation (20) follows from the fact that $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X,Y,Z})$. This shows that $\mathbf{S}\cup\mathbf{Z}$ is an adjustment set relative to $(\mathbf{X,Y})$ in $\mathcal{D}$.

    By Theorem 31, 𝐒𝐙\mathbf{S}\cup\mathbf{Z} satisfies the adjustment criterion relative to (𝐗,𝐘)(\mathbf{X,Y}) in 𝒟\mathcal{D}. Then by Lemma 6(a), 𝐒\mathbf{S} satisfies the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D}.  

Theorem 40

(Soundness of the Conditional Adjustment Criterion for DAGs) Let 𝐗,𝐘,𝐙\mathbf{X,Y,Z}, and 𝐒\mathbf{S} be pairwise disjoint node sets in a causal DAG 𝒟\mathcal{D}, where 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset. If 𝐒\mathbf{S} satisfies the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D} (Definition 2), then 𝐒\mathbf{S} is a conditional adjustment set relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D} (Definition 1).

  •   Proof of Theorem 40.

    This theorem is analogous to Theorem 58 of Perković et al. (2018) for the adjustment criterion. We use the same proof strategy and adapt the arguments to suit our needs.

    Suppose that 𝐒\mathbf{S} satisfies the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D} and let ff be a density consistent with 𝒟\mathcal{D}. Our goal is to prove that

    \begin{equation}
    f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})=\int_{\mathbf{s}}f(\mathbf{y}|\mathbf{x,z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s}. \tag{21}
    \end{equation}

    We consider three cases below. Before this, we prove an equality that holds in all cases. Let 𝐘𝐃=𝐘De(𝐗,𝒟)\mathbf{Y_{D}}=\mathbf{Y}\cap\operatorname{De}(\mathbf{X},\mathcal{D}) and 𝐘𝐍=𝐘De(𝐗,𝒟)\mathbf{Y_{N}}=\mathbf{Y}\setminus\operatorname{De}(\mathbf{X},\mathcal{D}). Then 𝐘𝐍d𝐗|𝐙\mathbf{Y_{N}}\perp_{d}\mathbf{X}\>|\>\mathbf{Z} in 𝒟𝐗¯\mathcal{D}_{\overline{\mathbf{X}}}, since 𝒟𝐗¯\mathcal{D}_{\overline{\mathbf{X}}} does not contain edges into 𝐗\mathbf{X} and since all paths from 𝐗\mathbf{X} to 𝐘𝐍\mathbf{Y_{N}} that start with an edge out of 𝐗\mathbf{X} in 𝒟𝐗¯\mathcal{D}_{\overline{\mathbf{X}}} contain a collider – a collider that cannot be an element of An(𝐙,𝒟)\operatorname{An}(\mathbf{Z},\mathcal{D}) since 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset. Rule 3 of the do calculus (Equation (16)) then implies

    \begin{equation}
    f(\mathbf{y_{N}}|do(\mathbf{x}),\mathbf{z})=f(\mathbf{y_{N}}|\mathbf{z}). \tag{22}
    \end{equation}

Case 1: Assume that 𝐘𝐃=\mathbf{Y_{D}}=\emptyset so that 𝐘=𝐘𝐍\mathbf{Y}=\mathbf{Y_{N}}. Then we have the following. (Justification for the numbered equations is below.)

\begin{align}
f(\mathbf{y}|do(\mathbf{x}),\mathbf{z}) &= f(\mathbf{y}|\mathbf{z}) \tag{23}\\
&= \int_{\mathbf{s}}f(\mathbf{y}|\mathbf{z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s} \nonumber\\
&= \int_{\mathbf{s}}f(\mathbf{y}|\mathbf{x,z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s}. \tag{24}
\end{align}

Equation (23) follows from Equation (22) and $\mathbf{Y}=\mathbf{Y_{N}}$. Equation (24) holds by the following logic. Since $\mathbf{Y}=\mathbf{Y_{N}}$, no path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{D}$ is causal. Since $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{D}$, it therefore holds that $\mathbf{S}\cup\mathbf{Z}$ blocks all paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{D}$. Thus, $\mathbf{X}\perp_{d}\mathbf{Y}\,|\,\mathbf{S}\cup\mathbf{Z}$ in $\mathcal{D}$, which implies the analogous independence statement.

Case 2: Assume 𝐘𝐍=\mathbf{Y_{N}}=\emptyset so that 𝐘=𝐘𝐃\mathbf{Y}=\mathbf{Y_{D}}. Define 𝐇=An(𝐗𝐘,𝒟)(De(𝐗,𝒟)𝐘𝐙)\mathbf{H}=\operatorname{An}(\mathbf{X}\cup\mathbf{Y},\mathcal{D})\setminus(\operatorname{De}(\mathbf{X},\mathcal{D})\cup\mathbf{Y}\cup\mathbf{Z}), 𝐒~=𝐒𝐇\mathbf{\tilde{S}}=\mathbf{S}\cup\mathbf{H}, 𝐒~𝐃=𝐒~De(𝐗,𝒟)\mathbf{\tilde{S}_{D}}=\mathbf{\tilde{S}}\cap\operatorname{De}(\mathbf{X},\mathcal{D}), and 𝐒~𝐍=𝐒~De(𝐗,𝒟)\mathbf{\tilde{S}_{N}}=\mathbf{\tilde{S}}\setminus\operatorname{De}(\mathbf{X},\mathcal{D}). Then we have the following. (Justification for the numbered equations is below.)

\begin{align}
f(\mathbf{y}|do(\mathbf{x}),\mathbf{z}) &= \int_{\mathbf{\tilde{s}_{N}}}f(\mathbf{y}|\mathbf{x,z,\tilde{s}_{N}})f(\mathbf{\tilde{s}_{N}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}_{N}} \tag{25}\\
&= \int_{\mathbf{\tilde{s}_{N}}}f(\mathbf{y}|\mathbf{x,z,\tilde{s}_{N}})\int_{\mathbf{\tilde{s}_{D}}}f(\mathbf{\tilde{s}_{D},\tilde{s}_{N}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}_{D}}\,\mathrm{d}\mathbf{\tilde{s}_{N}} \nonumber\\
&= \int_{\mathbf{\tilde{s}_{D},\tilde{s}_{N}}}f(\mathbf{y}|\mathbf{x,z,\tilde{s}_{N}})f(\mathbf{\tilde{s}_{D},\tilde{s}_{N}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}_{D}}\,\mathrm{d}\mathbf{\tilde{s}_{N}} \tag{26}\\
&= \int_{\mathbf{\tilde{s}}}f(\mathbf{y}|\mathbf{x,z,\tilde{s}})f(\mathbf{\tilde{s}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}} \tag{27}\\
&= \int_{\mathbf{s}}f(\mathbf{y}|\mathbf{x,z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s}. \tag{28}
\end{align}

Equation (25) holds since by Lemma 42(iv), 𝐒~𝐍\mathbf{\tilde{S}_{N}} is a conditional adjustment set relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D}. Equation (26) holds since 𝐒~𝐃\mathbf{\tilde{S}_{D}} is disjoint from 𝐘𝐗𝐒~𝐍𝐙\mathbf{Y}\cup\mathbf{X}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z}. Equation (27) holds since by Lemma 42(iii), we have 𝐘d𝐒~𝐃|𝐗𝐒~𝐍𝐙\mathbf{Y}\perp_{d}\mathbf{\tilde{S}_{D}}\>|\>\mathbf{X}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z} in 𝒟\mathcal{D}, where the analogous independence statement follows. Finally, Equation (28) results from applying Lemma 41(ii).
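
The sets used in Case 2 can be computed mechanically from a DAG. The following is a minimal sketch in Python on a hypothetical seven-node DAG (the graph, the node names, and the helper functions `ancestors` and `descendants` are illustrative assumptions, not objects from the paper). It assumes the convention that every node is an ancestor and a descendant of itself.

```python
# Hypothetical example DAG, stored as child -> set of parents:
# Z -> X, A -> X, B -> A, A -> Y, X -> M, M -> Y, M -> C.
PARENTS = {
    "X": {"Z", "A"},
    "A": {"B"},
    "M": {"X"},
    "Y": {"A", "M"},
    "C": {"M"},
}


def ancestors(parents, nodes):
    """An(nodes); each node counts as its own ancestor (assumed convention)."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def descendants(parents, nodes):
    """De(nodes); each node counts as its own descendant (assumed convention)."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    # Descendants are the "ancestors" in the edge-reversed graph.
    return ancestors(children, nodes)


X, Y, Z, S = {"X"}, {"Y"}, {"Z"}, {"A"}
de_x = descendants(PARENTS, X)

# H = An(X u Y) \ (De(X) u Y u Z), and S~ = S u H.
H = ancestors(PARENTS, X | Y) - (de_x | Y | Z)
S_tilde = S | H

# Split S~ and Y into descendants and non-descendants of X.
S_tilde_D, S_tilde_N = S_tilde & de_x, S_tilde - de_x
Y_D, Y_N = Y & de_x, Y - de_x

print("H =", sorted(H), "| S~_N =", sorted(S_tilde_N), "| Y_D =", sorted(Y_D))
```

In this example $\mathbf{Y_{N}}=\emptyset$, so the graph falls under Case 2, and $\mathbf{\tilde{S}_{D}}=\emptyset$ because $\mathbf{H}$ excludes descendants of $\mathbf{X}$ by construction and $\mathbf{S}$ happens to contain none.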

Case 3: Assume 𝐘𝐃\mathbf{Y_{D}}\neq\emptyset and 𝐘𝐍\mathbf{Y_{N}}\neq\emptyset and define 𝐇,𝐒~,𝐒~𝐃\mathbf{H},\mathbf{\tilde{S}},\mathbf{\tilde{S}_{D}}, and 𝐒~𝐍\mathbf{\tilde{S}_{N}} as in Case 2 above. We start by showing two equalities that rely on the do calculus. First note that by Lemma 42(vi), 𝐘𝐃d𝐘𝐍𝐒~𝐍𝐙|𝐗\mathbf{Y_{D}}\perp_{d}\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z}\>|\>\mathbf{X} in 𝒟𝐗¯𝐘𝐍𝐒~𝐍𝐙¯\mathcal{D}_{\overline{\mathbf{X}}\underline{\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z}}}. Thus by Rule 2 of the do calculus (Equation (15)), we have that

\begin{equation}
f(\mathbf{y_{D}}|do(\mathbf{x}),\mathbf{y_{N},z,\tilde{s}_{N}})=f(\mathbf{y_{D}}|do(\mathbf{x,y_{N},z,\tilde{s}_{N}})). \tag{29}
\end{equation}

Second, note by Lemma 42(vii), 𝐒~𝐍d𝐗|𝐘𝐍𝐙\mathbf{\tilde{S}_{N}}\perp_{d}\mathbf{X}\>|\>\mathbf{Y_{N}}\cup\mathbf{Z} in 𝒟𝐗¯\mathcal{D}_{\overline{\mathbf{X}}}. Thus by Rule 3 of the do calculus (Equation (16)), we have that

\begin{equation}
f(\mathbf{\tilde{s}_{N}}|do(\mathbf{x}),\mathbf{y_{N},z})=f(\mathbf{\tilde{s}_{N}}|\mathbf{y_{N},z}). \tag{30}
\end{equation}

Then we have the following. (Justification for the numbered equations is below.)

\begin{align}
f(\mathbf{y}|do(\mathbf{x}),\mathbf{z}) &= \int_{\mathbf{\tilde{s}_{N}}}f(\mathbf{y,\tilde{s}_{N}}|do(\mathbf{x}),\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}_{N}} \nonumber\\
&= \int_{\mathbf{\tilde{s}_{N}}}f(\mathbf{y_{D}}|\mathbf{\tilde{s}_{N},y_{N}},do(\mathbf{x}),\mathbf{z})f(\mathbf{\tilde{s}_{N}}|\mathbf{y_{N}},do(\mathbf{x}),\mathbf{z})f(\mathbf{y_{N}}|do(\mathbf{x}),\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}_{N}} \nonumber\\
&= \int_{\mathbf{\tilde{s}_{N}}}f(\mathbf{y_{D}}|do(\mathbf{x,y_{N},z,\tilde{s}_{N}}))f(\mathbf{\tilde{s}_{N}}|\mathbf{y_{N},z})f(\mathbf{y_{N}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}_{N}} \tag{31}\\
&= \int_{\mathbf{\tilde{s}_{N}}}f(\mathbf{y_{D}}|do(\mathbf{x,y_{N},z,\tilde{s}_{N}}))\int_{\mathbf{\tilde{s}_{D}}}f(\mathbf{\tilde{s}_{N},y_{N},\tilde{s}_{D}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}_{D}}\,\mathrm{d}\mathbf{\tilde{s}_{N}} \nonumber\\
&= \int_{\mathbf{\tilde{s}_{N}}}f(\mathbf{y_{D}}|do(\mathbf{x,y_{N},z,\tilde{s}_{N}}))\int_{\mathbf{\tilde{s}_{D}}}f(\mathbf{y_{N}}|\mathbf{\tilde{s},z})f(\mathbf{\tilde{s}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}_{D}}\,\mathrm{d}\mathbf{\tilde{s}_{N}} \nonumber\\
&= \int_{\mathbf{\tilde{s}_{N}}}f(\mathbf{y_{D}}|\mathbf{y_{N},x,z,\tilde{s}_{N}})\int_{\mathbf{\tilde{s}_{D}}}f(\mathbf{y_{N}}|\mathbf{x,z,\tilde{s}})f(\mathbf{\tilde{s}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}_{D}}\,\mathrm{d}\mathbf{\tilde{s}_{N}} \tag{32}\\
&= \int_{\mathbf{\tilde{s}}}f(\mathbf{y_{D}}|\mathbf{y_{N},x,z,\tilde{s}})f(\mathbf{y_{N}}|\mathbf{x,z,\tilde{s}})f(\mathbf{\tilde{s}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}} \tag{33}\\
&= \int_{\mathbf{\tilde{s}}}f(\mathbf{y}|\mathbf{x,z,\tilde{s}})f(\mathbf{\tilde{s}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}} \nonumber\\
&= \int_{\mathbf{s}}f(\mathbf{y}|\mathbf{x,z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s}. \tag{34}
\end{align}

Equation (31) holds by applying Equations (29), (30), and (22). Equation (32) holds by the following logic. By Lemma 42(v), the empty set is an adjustment set relative to $(\mathbf{X}\cup\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z},\mathbf{Y_{D}})$ in $\mathcal{D}$, which justifies replacing $f(\mathbf{y_{D}}|do(\mathbf{x,y_{N},z,\tilde{s}_{N}}))$ with $f(\mathbf{y_{D}}|\mathbf{y_{N},x,z,\tilde{s}_{N}})$. Next, by Lemma 41(i), $\mathbf{\tilde{S}}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{D}$, and so $\mathbf{\tilde{S}}\cup\mathbf{Z}$ blocks all paths from $\mathbf{X}$ to $\mathbf{Y_{N}}$ in $\mathcal{D}$. Thus, $\mathbf{Y_{N}}\perp_{d}\mathbf{X}\,|\,\mathbf{\tilde{S}}\cup\mathbf{Z}$ in $\mathcal{D}$, which implies the analogous independence statement and justifies replacing $f(\mathbf{y_{N}}|\mathbf{\tilde{s},z})$ with $f(\mathbf{y_{N}}|\mathbf{x,z,\tilde{s}})$.

Equation (33) holds since 𝐒~𝐃\mathbf{\tilde{S}_{D}} is disjoint from 𝐘𝐗𝐒~𝐍𝐙\mathbf{Y}\cup\mathbf{X}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z} and since by Lemma 42(iii), we have that 𝐘𝐃d𝐒~𝐃|𝐘𝐍𝐗𝐒~𝐍𝐙\mathbf{Y_{D}}\perp_{d}\mathbf{\tilde{S}_{D}}\>|\>\mathbf{Y_{N}}\cup\mathbf{X}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z} in 𝒟\mathcal{D}, where the analogous independence statement follows. Finally, Equation (34) results from applying Lemma 41(ii).  
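
Equation (21) can be sanity-checked numerically on a small discrete model. The sketch below assumes a hypothetical DAG with binary nodes and edges $Z\to S$, $Z\to X$, $S\to X$, $Z\to Y$, $S\to Y$, $X\to Y$, with made-up conditional probabilities; none of this comes from the paper. Here $\mathbf{S}=\{S\}$ satisfies the conditional adjustment criterion relative to $(\{X\},\{Y\},\{Z\})$, so the interventional density obtained from the truncated factorization should match the adjustment formula computed from the observational joint alone.

```python
from itertools import product

# Hypothetical P(node = 1 | parents) for binary nodes in the assumed DAG
# Z -> S, Z -> X, S -> X, Z -> Y, S -> Y, X -> Y.
P_Z1 = 0.4
P_S1 = {0: 0.3, 1: 0.8}                                      # keyed by z
P_X1 = {(0, 0): 0.2, (0, 1): 0.7, (1, 0): 0.5, (1, 1): 0.9}  # keyed by (z, s)
P_Y1 = {(x, z, s): 0.1 + 0.5 * x + 0.2 * z + 0.15 * s
        for x, z, s in product((0, 1), repeat=3)}            # keyed by (x, z, s)


def bern(p1, value):
    """Probability of `value` for a binary variable with P(1) = p1."""
    return p1 if value == 1 else 1.0 - p1


def joint(z, s, x, y):
    """Observational joint density f(z, s, x, y)."""
    return (bern(P_Z1, z) * bern(P_S1[z], s)
            * bern(P_X1[(z, s)], x) * bern(P_Y1[(x, z, s)], y))


def f_do(y, x, z):
    """f(y | do(x), z) via the truncated factorization f(z) f(s|z) f(y|x,z,s)."""
    def trunc(s, yy):
        return bern(P_Z1, z) * bern(P_S1[z], s) * bern(P_Y1[(x, z, s)], yy)
    num = sum(trunc(s, y) for s in (0, 1))
    den = sum(trunc(s, yy) for s in (0, 1) for yy in (0, 1))
    return num / den


def f_adjust(y, x, z):
    """Right-hand side of (21), using only the observational joint."""
    f_z = sum(joint(z, s, xx, yy) for s, xx, yy in product((0, 1), repeat=3))
    total = 0.0
    for s in (0, 1):
        f_y_given_xzs = joint(z, s, x, y) / sum(joint(z, s, x, yy) for yy in (0, 1))
        f_s_given_z = sum(joint(z, s, xx, yy) for xx, yy in product((0, 1), repeat=2)) / f_z
        total += f_y_given_xzs * f_s_given_z
    return total


def f_naive(y, x, z):
    """Plain conditioning f(y | x, z), i.e. adjusting for nothing beyond z."""
    num = sum(joint(z, s, x, y) for s in (0, 1))
    den = sum(joint(z, s, x, yy) for s in (0, 1) for yy in (0, 1))
    return num / den


gap = max(abs(f_do(y, x, z) - f_adjust(y, x, z))
          for y, x, z in product((0, 1), repeat=3))
print("max |f_do - f_adjust| =", gap)
print("f_do vs f_naive at y=1, x=1, z=0:", f_do(1, 1, 0), f_naive(1, 1, 0))
```

The comparison with `f_naive` illustrates why adjustment for $\mathbf{S}$ is needed in this model: conditioning on $x$ and $z$ alone leaves the path $X\leftarrow S\to Y$ open.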

D.2 Supporting Results

Lemma 41

Let 𝐗\mathbf{X}, 𝐘\mathbf{Y}, 𝐙\mathbf{Z}, and 𝐒\mathbf{S} be pairwise disjoint node sets in a causal DAG 𝒟\mathcal{D}, where 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset and where 𝐒\mathbf{S} satisfies the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D} (Definition 2). Let 𝐇An(𝐗𝐘,𝒟)(De(𝐗,𝒟)𝐘𝐙)\mathbf{H}\subseteq\operatorname{An}(\mathbf{X}\cup\mathbf{Y},\mathcal{D})\setminus(\operatorname{De}(\mathbf{X},\mathcal{D})\cup\mathbf{Y}\cup\mathbf{Z}) and 𝐒~=𝐒𝐇\mathbf{\tilde{S}}=\mathbf{S}\cup\mathbf{H}. Then:

  (i) $\mathbf{\tilde{S}}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{D}$, and

  (ii) $\int_{\mathbf{s}}f(\mathbf{y}|\mathbf{x,z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s}=\int_{\mathbf{\tilde{s}}}f(\mathbf{y}|\mathbf{x,z,\tilde{s}})f(\mathbf{\tilde{s}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}}$, for any density $f$ consistent with $\mathcal{D}$.

  •   Proof of Lemma 41.

    This lemma is analogous to Lemma 59 of Perković et al. (2018) (Lemma 34). We use the same proof strategy and adapt the arguments to suit our needs.

    (i) By Lemma 6(a), since 𝐒\mathbf{S} satisfies the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D}, then 𝐒𝐙\mathbf{S}\cup\mathbf{Z} satisfies the adjustment criterion relative to (𝐗,𝐘)(\mathbf{X},\mathbf{Y}) in 𝒟\mathcal{D}. Then by Lemma 34, 𝐒~𝐙\mathbf{\tilde{S}}\cup\mathbf{Z} satisfies the adjustment criterion relative to (𝐗,𝐘)(\mathbf{X},\mathbf{Y}). The statement follows by a second use of Lemma 6(a).

    (ii) Let ff be an arbitrary density consistent with 𝒟\mathcal{D}. We proceed with a proof by induction.

Base case: Suppose 𝐇={H}\mathbf{H}=\{H\} so that |𝐇|=1|\mathbf{H}|=1. When H𝐒H\in\mathbf{S}, the claim clearly holds. Thus, we let H𝐒H\notin\mathbf{S}. Note that the claim holds if either 𝐘dH|𝐗𝐒𝐙\mathbf{Y}\perp_{d}H\>|\>\mathbf{X}\cup\mathbf{S}\cup\mathbf{Z} or 𝐗dH|𝐒𝐙\mathbf{X}\perp_{d}H\>|\>\mathbf{S}\cup\mathbf{Z} in 𝒟\mathcal{D}. To see this, we calculate the following.

  (a) When $(\mathbf{Y}\perp_{d}H\,|\,\mathbf{X}\cup\mathbf{S}\cup\mathbf{Z})_{\mathcal{D}}$, then

    \begin{align*}
    \int_{\mathbf{s}}f(\mathbf{y}|\mathbf{x,z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s} &= \int_{\mathbf{s}}f(\mathbf{y}|\mathbf{x,z,s})\int_{h}f(\mathbf{s},h|\mathbf{z})\,\mathrm{d}h\,\mathrm{d}\mathbf{s}\\
    &= \int_{\mathbf{s},h}f(\mathbf{y}|\mathbf{x,z,s})f(\mathbf{s},h|\mathbf{z})\,\mathrm{d}\mathbf{s}\,\mathrm{d}h\\
    &= \int_{\mathbf{\tilde{s}}}f(\mathbf{y}|\mathbf{x,z,\tilde{s}})f(\mathbf{\tilde{s}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}},
    \end{align*}

    where the second equality holds since $H\notin\mathbf{Y}\cup\mathbf{X}\cup\mathbf{S}\cup\mathbf{Z}$.

  (b) When $(\mathbf{X}\perp_{d}H\,|\,\mathbf{S}\cup\mathbf{Z})_{\mathcal{D}}$, then

    \begin{align*}
    \int_{\mathbf{s}}f(\mathbf{y}|\mathbf{x,z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s} &= \int_{\mathbf{s}}f(\mathbf{s}|\mathbf{z})\int_{h}f(\mathbf{y},h|\mathbf{x,z,s})\,\mathrm{d}h\,\mathrm{d}\mathbf{s}\\
    &= \int_{\mathbf{s},h}f(\mathbf{y},h|\mathbf{x,z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s}\,\mathrm{d}h\\
    &= \int_{\mathbf{s},h}f(\mathbf{y}|\mathbf{x,z,s},h)f(h|\mathbf{x,z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s}\,\mathrm{d}h\\
    &= \int_{\mathbf{s},h}f(\mathbf{y}|\mathbf{x,z,s},h)f(h|\mathbf{z,s})f(\mathbf{s}|\mathbf{z})\,\mathrm{d}\mathbf{s}\,\mathrm{d}h\\
    &= \int_{\mathbf{\tilde{s}}}f(\mathbf{y}|\mathbf{x,z,\tilde{s}})f(\mathbf{\tilde{s}}|\mathbf{z})\,\mathrm{d}\mathbf{\tilde{s}},
    \end{align*}

    where the second equality holds since $H\notin\mathbf{S}\cup\mathbf{Z}$.

We use the remainder of the base case to show that (a) or (b) must hold. For sake of contradiction, suppose that neither holds. This implies that there are two paths in $\mathcal{D}$: one from $\mathbf{X}$ to $H$ that is d-connecting given $\mathbf{S}\cup\mathbf{Z}$ and one from $\mathbf{Y}$ to $H$ that is d-connecting given $\mathbf{X}\cup\mathbf{S}\cup\mathbf{Z}$. Let $p=\langle X,\ldots,H\rangle$, $X\in\mathbf{X}$, and $q=\langle H,\ldots,Y\rangle$, $Y\in\mathbf{Y}$, be such paths, respectively, where $p$ is proper. In the arguments below, we use paths related to $p$ and $q$, both in the proper back-door graph $\mathcal{D}_{\mathbf{XY}}^{pbd}$ (see Definition 14) and in four of its moral induced subgraphs (see Definition 15), before applying Theorems 23 and 24 to reach our final contradiction (that $\mathbf{S}$ cannot satisfy the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{D}$).

First, we claim that both $p$ and $q$ are d-connecting given $\mathbf{S}\cup\mathbf{Z}$. This holds for $p$ by definition. For sake of contradiction, suppose that $q$ is blocked by $\mathbf{S}\cup\mathbf{Z}$. Since $q$ is d-connecting given $\mathbf{X}\cup\mathbf{S}\cup\mathbf{Z}$, it must contain a collider in $\operatorname{An}(\mathbf{X},\mathcal{D})\setminus\operatorname{An}(\mathbf{S}\cup\mathbf{Z},\mathcal{D})$. Let $C$ be the closest collider to $Y$ on $q$ such that $C\in(\operatorname{An}(\mathbf{X},\mathcal{D})\setminus\operatorname{An}(\mathbf{S}\cup\mathbf{Z},\mathcal{D}))\cup\mathbf{X}$, and let $r=\langle C,\ldots,X^{\prime}\rangle$, $X^{\prime}\in\mathbf{X}$, be a shortest causal path in $\mathcal{D}$ from $C$ to $\mathbf{X}$. Then let $V$ be the node closest to $X^{\prime}$ on $r$ that is also on $q(C,Y)$, and define the path $t=(-r)(X^{\prime},V)\oplus q(V,Y)$. Note that $t$ is non-causal since either $(-r)(X^{\prime},V)$ is of non-zero length or $X^{\prime}=V=C$, so that $t$ is a path into $X^{\prime}$. Further, by the definitions of $q$, $C$, and $r$, we have that $t$ is a proper non-causal path from $\mathbf{X}$ to $\mathbf{Y}$ that is d-connecting given $\mathbf{S}\cup\mathbf{Z}$. But this contradicts that $\mathbf{S}$ satisfies the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{D}$.

Next, we prove that the sequence of nodes in 𝒟𝐗𝐘pbd\mathcal{D}_{\mathbf{XY}}^{pbd} corresponding to pp forms a path. Note that since pp is proper, we only need to show that pp does not start with an edge XWX\rightarrow W, where WW is a node that lies on a proper causal path in 𝒟\mathcal{D} from XX to 𝐘\mathbf{Y}. For sake of contradiction, suppose that pp starts with XWX\rightarrow W for such a WForb(𝐗,𝐘,𝒟)W\in\operatorname{Forb}(\mathbf{X,Y},\mathcal{D}). Note that pp cannot be causal from XX to HH, since HDe(𝐗,𝒟)H\notin\operatorname{De}(\mathbf{X},\mathcal{D}) by the definition of 𝐇\mathbf{H}. Thus, pp is non-causal and there is a collider CC^{\prime} on pp such that CDe(W,𝒟)C^{\prime}\in\operatorname{De}(W,\mathcal{D}). Since pp is d-connecting given 𝐒𝐙\mathbf{S}\cup\mathbf{Z} and 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset, then 𝐒De(C,𝒟)\mathbf{S}\cap\operatorname{De}(C^{\prime},\mathcal{D})\neq\emptyset. Further, since De(C,𝒟)Forb(𝐗,𝐘,𝒟)\operatorname{De}(C^{\prime},\mathcal{D})\subseteq\operatorname{Forb}(\mathbf{X,Y},\mathcal{D}), this implies that 𝐒Forb(𝐗,𝐘,𝒟)\mathbf{S}\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{D})\neq\emptyset. But this contradicts that 𝐒\mathbf{S} satisfies the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D}.
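
The forbidden set that appears in this argument can be computed directly in a DAG. The sketch below is an illustration under an assumed definition, since the definition is not restated in this appendix: $\operatorname{Forb}(\mathbf{X,Y},\mathcal{D})$ is taken to be the set of descendants of all nodes outside $\mathbf{X}$ that lie on a proper causal path from $\mathbf{X}$ to $\mathbf{Y}$, with each node counted as its own descendant. The example graph and the helper names `reach` and `forbidden_set` are hypothetical.

```python
def reach(graph, start, blocked):
    """Nodes reachable from `start` along `graph` edges without entering `blocked`."""
    seen, stack = set(start), list(start)
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def forbidden_set(children, xs, ys):
    """Forb(X, Y, D) for a DAG given as parent -> set of children (assumed definition)."""
    parents = {}
    for p, cs in children.items():
        for c in cs:
            parents.setdefault(c, set()).add(p)
    # Nodes outside X reached from X by directed paths that never re-enter X.
    after_x = reach(children, xs, blocked=xs) - xs
    # Nodes with a directed path to Y that avoids X (Y itself included).
    before_y = reach(parents, ys, blocked=xs)
    # Non-X nodes on proper causal paths from X to Y.
    on_causal_paths = after_x & before_y
    # Their descendants, including the nodes themselves.
    return reach(children, on_causal_paths, blocked=set())


# Hypothetical DAG: Z -> X, B -> A, A -> X, A -> Y, X -> M, M -> Y, M -> C.
CHILDREN = {
    "Z": {"X"},
    "B": {"A"},
    "A": {"X", "Y"},
    "X": {"M"},
    "M": {"Y", "C"},
}

forb = forbidden_set(CHILDREN, {"X"}, {"Y"})
print(sorted(forb))
```

In this example a set such as $\{A\}$ avoids the forbidden set, whereas any set containing $C$ does not, because $C$ is a descendant of the mediator $M$.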

Similarly, we prove that the sequence of nodes in $\mathcal{D}_{\mathbf{XY}}^{pbd}$ corresponding to $q$ also forms a path. For this, note that all nodes in $\mathbf{X}$ on $q$ must be colliders on $q$, since $q$ is d-connecting given $\mathbf{X}\cup\mathbf{S}\cup\mathbf{Z}$. Thus, removing edges out of $\mathbf{X}$ from $\mathcal{D}$ in order to form $\mathcal{D}_{\mathbf{XY}}^{pbd}$ will not affect the edges on $q$.

Let p~\tilde{p} and q~\tilde{q} be the paths in 𝒟𝐗𝐘pbd\mathcal{D}_{\mathbf{XY}}^{pbd} corresponding to pp and qq, respectively. Then for sake of contradiction, suppose either p~\tilde{p} or q~\tilde{q} is blocked given 𝐒𝐙\mathbf{S}\cup\mathbf{Z}. Since pp and qq are d-connecting given 𝐒𝐙\mathbf{S}\cup\mathbf{Z}, then there must be a node CC on pp or qq where CC is a collider on pp or qq and every causal path in 𝒟\mathcal{D} from CC to 𝐒𝐙\mathbf{S}\cup\mathbf{Z} contains the first edge of a proper causal path from 𝐗\mathbf{X} to 𝐘\mathbf{Y} in 𝒟\mathcal{D}. Let dd be an arbitrary such causal path in 𝒟\mathcal{D} from CC to 𝐒𝐙\mathbf{S}\cup\mathbf{Z}. Note that dd is a path from CC to 𝐒\mathbf{S}, since dd must contain a node in 𝐗\mathbf{X} and since 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset. But since dd contains the first edge of a proper causal path from 𝐗\mathbf{X} to 𝐘\mathbf{Y} in 𝒟\mathcal{D}, this implies that 𝐒Forb(𝐗,𝐘,𝒟)\mathbf{S}\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{D})\neq\emptyset, which contradicts that 𝐒\mathbf{S} satisfies the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D}.

We continue the base case by reasoning with four moral induced subgraphs of $\mathcal{D}_{\mathbf{XY}}^{pbd}$ (see Definition 15). Start by defining the following.

\begin{align*}
\mathbf{A}_{\mathbf{XHYSZ}} &= \operatorname{An}(\mathbf{X}\cup\mathbf{H}\cup\mathbf{Y}\cup\mathbf{S}\cup\mathbf{Z},\mathcal{D}_{\mathbf{XY}}^{pbd}).\\
\mathbf{A}_{\mathbf{XYSZ}} &= \operatorname{An}(\mathbf{X}\cup\mathbf{Y}\cup\mathbf{S}\cup\mathbf{Z},\mathcal{D}_{\mathbf{XY}}^{pbd}).\\
\mathbf{A}_{\mathbf{XHSZ}} &= \operatorname{An}(\mathbf{X}\cup\mathbf{H}\cup\mathbf{S}\cup\mathbf{Z},\mathcal{D}_{\mathbf{XY}}^{pbd}).\\
\mathbf{A}_{\mathbf{HYSZ}} &= \operatorname{An}(\mathbf{H}\cup\mathbf{Y}\cup\mathbf{S}\cup\mathbf{Z},\mathcal{D}_{\mathbf{XY}}^{pbd}).
\end{align*}

Then define 𝒟𝐗𝐇𝐘𝐒𝐙\mathcal{D}_{\mathbf{XHYSZ}}, 𝒟𝐗𝐘𝐒𝐙\mathcal{D}_{\mathbf{XYSZ}}, 𝒟𝐗𝐇𝐒𝐙\mathcal{D}_{\mathbf{XHSZ}}, and 𝒟𝐇𝐘𝐒𝐙\mathcal{D}_{\mathbf{HYSZ}} to be the moral induced subgraphs of 𝒟𝐗𝐘pbd\mathcal{D}_{\mathbf{XY}}^{pbd} on nodes 𝐀𝐗𝐇𝐘𝐒𝐙\mathbf{A}_{\mathbf{XHYSZ}}, 𝐀𝐗𝐘𝐒𝐙\mathbf{A}_{\mathbf{XYSZ}}, 𝐀𝐗𝐇𝐒𝐙\mathbf{A}_{\mathbf{XHSZ}}, and 𝐀𝐇𝐘𝐒𝐙\mathbf{A}_{\mathbf{HYSZ}}, respectively. In order to use Theorem 24, we want to show that 𝒟𝐗𝐘𝐒𝐙\mathcal{D}_{\mathbf{XYSZ}} contains a path from 𝐗\mathbf{X} to 𝐘\mathbf{Y} that does not contain a node in 𝐒𝐙\mathbf{S}\cup\mathbf{Z}.

Since $\tilde{p}$ and $\tilde{q}$ are d-connecting given $\mathbf{S}\cup\mathbf{Z}$, then by Theorem 23, the following two paths must exist: a path $a$ from $X$ to $H$ in $\mathcal{D}_{\mathbf{XHSZ}}$ and a path $b$ from $H$ to $Y$ in $\mathcal{D}_{\mathbf{HYSZ}}$, where neither path contains a node in $\mathbf{S}\cup\mathbf{Z}$. Note that since $\mathbf{A}_{\mathbf{XHSZ}}\subseteq\mathbf{A}_{\mathbf{XHYSZ}}$ and $\mathbf{A}_{\mathbf{HYSZ}}\subseteq\mathbf{A}_{\mathbf{XHYSZ}}$, any path in $\mathcal{D}_{\mathbf{XHSZ}}$ or $\mathcal{D}_{\mathbf{HYSZ}}$ will also be in $\mathcal{D}_{\mathbf{XHYSZ}}$. Further, since $H\in\operatorname{An}(\mathbf{X}\cup\mathbf{Y},\mathcal{D})$ by definition and since we form $\mathcal{D}_{\mathbf{XY}}^{pbd}$ by removing edges out of $\mathbf{X}$ from $\mathcal{D}$, then $H\in\operatorname{An}(\mathbf{X}\cup\mathbf{Y},\mathcal{D}_{\mathbf{XY}}^{pbd})$. Therefore, $\mathbf{A}_{\mathbf{XHYSZ}}=\mathbf{A}_{\mathbf{XYSZ}}$ and $\mathcal{D}_{\mathbf{XHYSZ}}=\mathcal{D}_{\mathbf{XYSZ}}$. Thus, $a$ and $b$ are both paths in $\mathcal{D}_{\mathbf{XYSZ}}$.

We complete the base case by applying Theorems 23 and 24 to show our necessary contradiction. Since we can combine subpaths of aa and bb to form a path cc in 𝒟𝐗𝐘𝐒𝐙\mathcal{D}_{\mathbf{XYSZ}} from 𝐗\mathbf{X} to 𝐘\mathbf{Y} that does not contain a node in 𝐒𝐙\mathbf{S}\cup\mathbf{Z}, then by Theorem 23, 𝐗\mathbf{X} and 𝐘\mathbf{Y} are d-connecting given 𝐒𝐙\mathbf{S}\cup\mathbf{Z} in 𝒟𝐗𝐘pbd\mathcal{D}_{\mathbf{XY}}^{pbd}. By Theorem 24, this implies that 𝐒𝐙\mathbf{S}\cup\mathbf{Z} does not satisfy the adjustment criterion relative to (𝐗,𝐘)(\mathbf{X,Y}) in 𝒟\mathcal{D} (see Definition 12). Therefore, by the contraposition of Lemma 6(a), 𝐒\mathbf{S} does not satisfy the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D}, which is a contradiction.
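
The graphical tools used in this step can be sketched in code. The following Python sketch assumes that Theorem 23 is the standard moralization criterion (two sets are d-separated given a third exactly when the third separates them in the moral graph of the ancestral subgraph) and that Theorem 24 characterizes the path-blocking part of the adjustment criterion through d-separation in the proper back-door graph. Neither theorem is restated in this appendix, so both readings are assumptions, and the example DAG and function names are hypothetical.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """An(nodes) in a DAG given as child -> set of parents (nodes included)."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def proper_backdoor_graph(parents, xs, ys):
    """Remove the first edge of every proper causal path from xs to ys."""
    # Nodes outside xs that have a directed path to ys avoiding xs.
    reach_y, stack = set(ys), list(ys)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in xs and p not in reach_y:
                reach_y.add(p)
                stack.append(p)
    return {child: {p for p in ps if not (p in xs and child in reach_y)}
            for child, ps in parents.items()}


def d_separated(parents, xs, ys, ws):
    """Moralization test for xs _||_d ys | ws (xs, ys, ws pairwise disjoint)."""
    keep = ancestors(parents, xs | ys | ws)
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:                       # drop edge directions
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):   # marry parents of a common child
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = set(xs), list(xs)        # search from xs, never entering ws
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in ws and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return not (seen & ys)


# Hypothetical DAG: Z -> S, Z -> X, S -> X, Z -> Y, S -> Y, X -> Y.
DAG = {"S": {"Z"}, "X": {"Z", "S"}, "Y": {"Z", "S", "X"}}
PBD = proper_backdoor_graph(DAG, {"X"}, {"Y"})

print(d_separated(PBD, {"X"}, {"Y"}, {"S", "Z"}))
print(d_separated(PBD, {"X"}, {"Y"}, {"Z"}))
```

In this example $\{S,Z\}$ separates $X$ from $Y$ in the proper back-door graph, while $\{Z\}$ alone leaves the moral-graph path through $S$ open, which mirrors the way the path $c$ is used in the proof to contradict the adjustment criterion.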

Induction step: Assume that the result holds for |𝐇|=k|\mathbf{H}|=k, kk\in\mathbb{N}, and let |𝐇|=k+1|\mathbf{H}|=k+1. Take an arbitrary H𝐇H\in\mathbf{H}, and define 𝐒=𝐒{H}\mathbf{S^{\prime}}=\mathbf{S}\cup\{H\} and 𝐇=𝐇{H}\mathbf{H^{\prime}}=\mathbf{H}\setminus\{H\}. Since the base case holds and since {H}An(𝐗𝐘,𝒟)(De(𝐗,𝒟)𝐘𝐙)\{H\}\subseteq\operatorname{An}(\mathbf{X}\cup\mathbf{Y},\mathcal{D})\setminus(\operatorname{De}(\mathbf{X},\mathcal{D})\cup\mathbf{Y}\cup\mathbf{Z}), then

$$\begin{aligned}
\int_{\mathbf{s}} f(\mathbf{y}|\mathbf{x,z,s})\, f(\mathbf{s}|\mathbf{z}) \,\mathrm{d}\mathbf{s}
&= \int_{\mathbf{s},h} f(\mathbf{y}|\mathbf{x,z,s},h)\, f(\mathbf{s},h|\mathbf{z}) \,\mathrm{d}\mathbf{s}\,\mathrm{d}h \\
&= \int_{\mathbf{s^{\prime}}} f(\mathbf{y}|\mathbf{x,z,s^{\prime}})\, f(\mathbf{s^{\prime}}|\mathbf{z}) \,\mathrm{d}\mathbf{s^{\prime}}.
\end{aligned} \tag{35}$$

Further, by part (i), 𝐒\mathbf{S^{\prime}} satisfies the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D}. Since 𝐇An(𝐗𝐘,𝒟)(De(𝐗,𝒟)𝐘𝐙)\mathbf{H^{\prime}}\subseteq\operatorname{An}(\mathbf{X}\cup\mathbf{Y},\mathcal{D})\setminus(\operatorname{De}(\mathbf{X},\mathcal{D})\cup\mathbf{Y}\cup\mathbf{Z}) and |𝐇|=k|\mathbf{H^{\prime}}|=k, then by the induction assumption,

$$\begin{aligned}
\int_{\mathbf{s^{\prime}}} f(\mathbf{y}|\mathbf{x,z,s^{\prime}})\, f(\mathbf{s^{\prime}}|\mathbf{z}) \,\mathrm{d}\mathbf{s^{\prime}}
&= \int_{\mathbf{s^{\prime},h^{\prime}}} f(\mathbf{y}|\mathbf{x,z,s^{\prime},h^{\prime}})\, f(\mathbf{s^{\prime},h^{\prime}}|\mathbf{z}) \,\mathrm{d}\mathbf{s^{\prime}}\,\mathrm{d}\mathbf{h^{\prime}} \\
&= \int_{\mathbf{\tilde{s}}} f(\mathbf{y}|\mathbf{x,z,\tilde{s}})\, f(\mathbf{\tilde{s}}|\mathbf{z}) \,\mathrm{d}\mathbf{\tilde{s}}.
\end{aligned} \tag{36}$$

Combining (35) and (36) yields the desired result.  

Lemma 42

Let 𝐗,𝐘,𝐙\mathbf{X},\mathbf{Y},\mathbf{Z}, and 𝐒\mathbf{S} be pairwise disjoint node sets in a causal DAG 𝒟\mathcal{D}, where 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset and where 𝐒\mathbf{S} satisfies the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D} (Definition 2). Let 𝐇=An(𝐗𝐘,𝒟)(De(𝐗,𝒟)𝐘𝐙)\mathbf{H}=\operatorname{An}(\mathbf{X}\cup\mathbf{Y},\mathcal{D})\setminus(\operatorname{De}(\mathbf{X},\mathcal{D})\cup\mathbf{Y}\cup\mathbf{Z}) and 𝐒~=𝐒𝐇\mathbf{\tilde{S}}=\mathbf{S}\cup\mathbf{H}. Additionally, let 𝐒~𝐃=𝐒~De(𝐗,𝒟)\mathbf{\tilde{S}_{D}}=\mathbf{\tilde{S}}\cap\operatorname{De}(\mathbf{X},\mathcal{D}), 𝐒~𝐍=𝐒~De(𝐗,𝒟)\mathbf{\tilde{S}_{N}}=\mathbf{\tilde{S}}\setminus\operatorname{De}(\mathbf{X},\mathcal{D}), 𝐘𝐃=𝐘De(𝐗,𝒟)\mathbf{Y_{D}}=\mathbf{Y}\cap\operatorname{De}(\mathbf{X},\mathcal{D}), and 𝐘𝐍=𝐘De(𝐗,𝒟)\mathbf{Y_{N}}=\mathbf{Y}\setminus\operatorname{De}(\mathbf{X},\mathcal{D}). Then the following statements hold:

  (i) $(\mathbf{X}\cup\mathbf{Y_{N}}\cup\mathbf{\tilde{S}}\cup\mathbf{Z})\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{D})=\emptyset$,

  (ii) if $p=\langle H,\dots,Y_{D}\rangle$ is a non-causal path in $\mathcal{D}$ from $H\in\mathbf{X}\cup\mathbf{Y_{N}}\cup\mathbf{\tilde{S}}\cup\mathbf{Z}$ to a node $Y_{D}\in\mathbf{Y_{D}}$, then $p$ is blocked by $(\mathbf{X}\cup\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z})\setminus\{H\}$,

  (iii) $\mathbf{Y_{D}}\perp_{d}\mathbf{\tilde{S}_{D}}\>|\>\mathbf{Y_{N}}\cup\mathbf{X}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z}$ in $\mathcal{D}$,

  (iv) if $\mathbf{Y_{N}}=\emptyset$, then $\mathbf{\tilde{S}_{N}}$ is a conditional adjustment set relative to $(\mathbf{X,Y,Z})$ in $\mathcal{D}$ (Definition 1),

  (v) the empty set is an adjustment set relative to $(\mathbf{X}\cup\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z},\mathbf{Y_{D}})$ in $\mathcal{D}$ (Definition 11),

  (vi) $\mathbf{Y_{D}}\perp_{d}(\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z})\>|\>\mathbf{X}$ in $\mathcal{D}_{\overline{\mathbf{X}}\underline{\mathbf{Y_{N}}\cup\mathbf{\tilde{S}_{N}}\cup\mathbf{Z}}}$, and

  (vii) $\mathbf{\tilde{S}_{N}}\perp_{d}\mathbf{X}\>|\>\mathbf{Y_{N}}\cup\mathbf{Z}$ in $\mathcal{D}_{\overline{\mathbf{X}}}$.

  •   Proof of Lemma 42.

    This lemma is analogous to Lemma 60 of Perković et al. (2018) (Lemma 35), which is needed for adjustment in total effect identification. We rely on this result in the proof below.

    Note that 𝐗,𝐘\mathbf{X,Y}, and 𝐒𝐙\mathbf{S}\cup\mathbf{Z} are pairwise disjoint node sets in 𝒟\mathcal{D}, where by Lemma 6(a), 𝐒𝐙\mathbf{S}\cup\mathbf{Z} satisfies the adjustment criterion relative to (𝐗,𝐘)(\mathbf{X},\mathbf{Y}) in 𝒟\mathcal{D}. Results (i)-(iii) and (vi) follow directly from Lemma 35. Result (v) follows additionally from Theorem 25. Result (iv) follows additionally from Theorem 25 and Lemma 6(a).

    (vii) Let pp be an arbitrary path from X𝐗X\in\mathbf{X} to 𝐒~𝐍\mathbf{\tilde{S}_{N}} in 𝒟𝐗¯\mathcal{D}_{\overline{\mathbf{X}}}. By definition of 𝒟𝐗¯\mathcal{D}_{\overline{\mathbf{X}}}, pp begins with an edge out of XX. Since, by definition, 𝐒~𝐍De(𝐗,𝒟)=\mathbf{\tilde{S}_{N}}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset, where De(𝐗,𝒟𝐗¯)De(𝐗,𝒟)\operatorname{De}(\mathbf{X},\mathcal{D}_{\overline{\mathbf{X}}})\subseteq\operatorname{De}(\mathbf{X},\mathcal{D}), then pp must contain at least one collider. Let 𝐂\mathbf{C} be the set containing the closest collider to XX on pp and its descendants in 𝒟𝐗¯\mathcal{D}_{\overline{\mathbf{X}}}. Note that 𝐂De(𝐗,𝒟𝐗¯)De(𝐗,𝒟)\mathbf{C}\subseteq\operatorname{De}(\mathbf{X},\mathcal{D}_{\overline{\mathbf{X}}})\subseteq\operatorname{De}(\mathbf{X},\mathcal{D}). By definition of 𝐘𝐍\mathbf{Y_{N}} and by assumption, (𝐘𝐍𝐙)De(𝐗,𝒟)=(\mathbf{Y_{N}}\cup\mathbf{Z})\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset, and thus, pp is blocked by 𝐘𝐍𝐙\mathbf{Y_{N}}\cup\mathbf{Z}.  

Appendix E CONDITIONAL BACK-DOOR CRITERION

This section extends the back-door criterion of Pearl (2009) to the context of estimating a conditional causal effect in a DAG. Definition 43 provides the extended criterion, and Lemma 44 establishes that this criterion is sufficient for conditional adjustment. Lemma 45 compares this criterion with the generalized back-door criterion of Maathuis and Colombo (2015) (Definition 13).

Definition 43

(Conditional Back-door Criterion for DAGs) Let 𝐗\mathbf{X}, 𝐘\mathbf{Y}, 𝐙\mathbf{Z}, and 𝐒\mathbf{S} be pairwise disjoint node sets in a DAG 𝒟\mathcal{D}, where 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset. Then 𝐒\mathbf{S} satisfies the conditional back-door criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X},\mathbf{Y},\mathbf{Z}) in 𝒟\mathcal{D} if

  (a) $\mathbf{S}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset$, and

  (b) $\mathbf{S}\cup\mathbf{Z}$ blocks all proper back-door paths from $\mathbf{X}$ to $\mathbf{Y}$.
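The two conditions of Definition 43 can be checked mechanically on a small DAG. The following is a minimal brute-force sketch, not an implementation from the paper: it enumerates all simple paths, so it is only practical for toy graphs. The function names, the edge-set encoding, and the example graph (with nodes `X`, `Y`, `Z`, `S`, `M`) are all hypothetical and chosen for illustration.

```python
def descendants(edges, nodes):
    """Descendants of `nodes` in the DAG; every node counts as its own descendant."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for a, b in edges:
            if a == v and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def paths_from(edges, start, targets):
    """Yield every simple path (ignoring edge direction) from `start` to a node in `targets`."""
    def extend(path):
        last = path[-1]
        if len(path) > 1 and last in targets:
            yield path
        for a, b in edges:
            if last in (a, b):
                nxt = b if a == last else a
                if nxt not in path:
                    yield from extend(path + [nxt])
    yield from extend([start])


def is_blocked(edges, path, given):
    """d-separation along a single path: blocked by a conditioned non-collider
    or by a collider with no descendant in `given`."""
    for prev, v, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, v) in edges and (nxt, v) in edges
        if collider and not (descendants(edges, {v}) & given):
            return True
        if not collider and v in given:
            return True
    return False


def satisfies_conditional_backdoor(edges, X, Y, Z, S):
    de_x = descendants(edges, X)
    if Z & de_x:
        raise ValueError("Definition 43 assumes Z contains no descendants of X")
    if S & de_x:  # condition (a)
        return False
    given = S | Z
    for x in X:  # condition (b)
        for path in paths_from(edges, x, Y):
            proper = not (set(path[1:]) & X)      # only the first node is in X
            back_door = (path[1], x) in edges     # path starts with an edge into x
            if proper and back_door and not is_blocked(edges, path, given):
                return False
    return True


# Hypothetical example DAG: Z -> X, Z -> Y, S -> X, S -> Y, X -> M -> Y.
edges = {("Z", "X"), ("Z", "Y"), ("S", "X"), ("S", "Y"), ("X", "M"), ("M", "Y")}
ok = satisfies_conditional_backdoor(edges, {"X"}, {"Y"}, {"Z"}, {"S"})
missing_s = satisfies_conditional_backdoor(edges, {"X"}, {"Y"}, {"Z"}, set())
mediator = satisfies_conditional_backdoor(edges, {"X"}, {"Y"}, {"Z"}, {"M"})
```

In this example, `{"S"}` passes, the empty set fails because the path X ← S → Y stays open given Z alone, and `{"M"}` fails condition (a) because M is a descendant of X.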

Lemma 44

Let 𝐗\mathbf{X}, 𝐘\mathbf{Y}, 𝐙\mathbf{Z}, and 𝐒\mathbf{S} be pairwise disjoint node sets in a causal DAG 𝒟\mathcal{D}, where 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset. If 𝐒\mathbf{S} satisfies the conditional back-door criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X},\mathbf{Y},\mathbf{Z}) in 𝒟\mathcal{D} (Definition 43), then 𝐒\mathbf{S} is a conditional adjustment set relative to (𝐗,𝐘,𝐙)(\mathbf{X},\mathbf{Y},\mathbf{Z}) in 𝒟\mathcal{D} (Definition 1).

  •   Proof of Lemma 44.

    Let 𝐒\mathbf{S} be a set that satisfies the conditional back-door criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X},\mathbf{Y},\mathbf{Z}) in 𝒟\mathcal{D}, and let ff be a density consistent with 𝒟\mathcal{D}. Then

    $$\begin{aligned}
    f(\mathbf{y}|do(\mathbf{x}),\mathbf{z})
    &= \int_{\mathbf{s}} f(\mathbf{y},\mathbf{s}|do(\mathbf{x}),\mathbf{z}) \,\mathrm{d}\mathbf{s} \\
    &= \int_{\mathbf{s}} f(\mathbf{y}|\mathbf{s},do(\mathbf{x}),\mathbf{z})\, f(\mathbf{s}|do(\mathbf{x}),\mathbf{z}) \,\mathrm{d}\mathbf{s} \\
    &= \int_{\mathbf{s}} f(\mathbf{y}|\mathbf{s},\mathbf{x},\mathbf{z})\, f(\mathbf{s}|\mathbf{z}) \,\mathrm{d}\mathbf{s}.
    \end{aligned}$$

    The first equality follows from the law of total probability, and the second from the chain rule. The third equality follows from Rules 2 and 3 of the do calculus (Equations (15) and (16)) and the d-separations shown below.

    In order to use Rule 2 to conclude that f(𝐲|𝐬,do(𝐱),𝐳)=f(𝐲|𝐬,𝐱,𝐳)f(\mathbf{y}|\mathbf{s},do(\mathbf{x}),\mathbf{z})=f(\mathbf{y}|\mathbf{s},\mathbf{x},\mathbf{z}), we show that (𝐘d𝐗|𝐒𝐙)𝒟𝐗¯(\mathbf{Y}\perp_{d}\mathbf{X}\>|\>\mathbf{S}\cup\mathbf{Z})_{\mathcal{D}_{\underline{\mathbf{X}}}}. Note that 𝒟𝐗¯\mathcal{D}_{\underline{\mathbf{X}}} only contains back-door paths from 𝐗\mathbf{X} to 𝐘\mathbf{Y}. So every path from 𝐗\mathbf{X} to 𝐘\mathbf{Y} in 𝒟𝐗¯\mathcal{D}_{\underline{\mathbf{X}}} contains a proper back-door path from 𝐗\mathbf{X} to 𝐘\mathbf{Y} as a subpath. Since 𝐒𝐙\mathbf{S}\cup\mathbf{Z} blocks all proper back-door paths from 𝐗\mathbf{X} to 𝐘\mathbf{Y} in 𝒟\mathcal{D}, the d-separation holds.

    In order to use Rule 3 to conclude that f(𝐬|do(𝐱),𝐳)=f(𝐬|𝐳)f(\mathbf{s}|do(\mathbf{x}),\mathbf{z})=f(\mathbf{s}|\mathbf{z}), we show that (𝐒d𝐗|𝐙)𝒟𝐗(𝐙)¯(\mathbf{S}\perp_{d}\mathbf{X}\>|\>\mathbf{Z})_{\mathcal{D}_{\overline{\mathbf{X}(\mathbf{Z})}}}. This follows from the assumptions that 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset and 𝐒De(𝐗,𝒟)=\mathbf{S}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset.  
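As a numerical sanity check of the identity in Lemma 44, the sketch below builds a small binary model and compares the adjustment formula, computed from the observational joint only, with the interventional density obtained from the truncated factorization. The model is an illustrative assumption and does not come from the paper: it uses the DAG Z → S, Z → X, S → X, Z → Y, S → Y, X → Y, in which {S} satisfies Definition 43 relative to ({X}, {Y}, {Z}), and the conditional probability tables are arbitrary.

```python
from itertools import product


def bern(p, v):
    """Probability that a Bernoulli(p) variable takes value v."""
    return p if v == 1 else 1.0 - p


# Hypothetical conditional probabilities P(V = 1 | parents).
P_Z = 0.5
def p_s(z): return 0.3 + 0.4 * z
def p_x(z, s): return 0.2 + 0.5 * s + 0.1 * z
def p_y(x, s, z): return 0.1 + 0.3 * x + 0.4 * s + 0.1 * z

NAMES = ("z", "s", "x", "y")
joint = {
    (z, s, x, y): bern(P_Z, z) * bern(p_s(z), s) * bern(p_x(z, s), x) * bern(p_y(x, s, z), y)
    for z, s, x, y in product((0, 1), repeat=4)
}


def prob(**fixed):
    """Marginal probability of the event described by `fixed` under the observational joint."""
    return sum(p for k, p in joint.items()
               if all(k[NAMES.index(n)] == v for n, v in fixed.items()))


def adjusted(y, x, z):
    """sum_s f(y | s, x, z) f(s | z), using only the observational joint."""
    return sum(prob(y=y, x=x, z=z, s=s) / prob(x=x, z=z, s=s)
               * prob(s=s, z=z) / prob(z=z) for s in (0, 1))


def truth(y, x, z):
    """f(y | do(x), z) from the truncated factorization (the factor for X is dropped)."""
    return sum(bern(p_s(z), s) * bern(p_y(x, s, z), y) for s in (0, 1))


def naive(y, x, z):
    """f(y | x, z): conditioning on Z alone, without adjusting for S."""
    return prob(y=y, x=x, z=z) / prob(x=x, z=z)


max_gap = max(abs(adjusted(y, x, z) - truth(y, x, z))
              for y, x, z in product((0, 1), repeat=3))
```

Under these assumed tables, `max_gap` is zero up to floating-point error, whereas `naive(1, 1, 0)` evaluates to 0.64 against `truth(1, 1, 0)` of 0.52, because S remains an open back-door route when only Z is conditioned on.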

Lemma 45

(Comparison of Back-door Criteria for DAGs) Let 𝐗\mathbf{X}, 𝐘\mathbf{Y}, 𝐙\mathbf{Z}, and 𝐒\mathbf{S} be pairwise disjoint node sets in a DAG 𝒟\mathcal{D}, where 𝐙De(𝐗,𝒟)=\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset. Then 𝐒\mathbf{S} satisfies the conditional back-door criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒟\mathcal{D} (Definition 43) if and only if 𝐒𝐙\mathbf{S}\cup\mathbf{Z} satisfies the generalized back-door criterion relative to (𝐗,𝐘)(\mathbf{X},\mathbf{Y}) in 𝒟\mathcal{D} (Definition 13).

  •   Proof of Lemma 45.

    $\Leftarrow$: Follows immediately.

    $\Rightarrow$: Since $\mathbf{S}$ satisfies the conditional back-door criterion relative to $(\mathbf{X},\mathbf{Y},\mathbf{Z})$ in $\mathcal{D}$, we have $\mathbf{S}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset$. Combining this with the assumption that $\mathbf{Z}\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset$ gives $(\mathbf{S}\cup\mathbf{Z})\cap\operatorname{De}(\mathbf{X},\mathcal{D})=\emptyset$. In the remainder of the proof, we show that, for every $X\in\mathbf{X}$, the set $\mathbf{S}\cup\mathbf{Z}\cup\mathbf{X}\setminus\{X\}$ blocks all back-door paths from $X$ to $\mathbf{Y}$. The result then follows by Definition 13.

    Let p1p_{1} be an arbitrary back-door path from X𝐗X\in\mathbf{X} to Y𝐘Y\in\mathbf{Y} in 𝒟\mathcal{D}. For sake of contradiction, suppose that p1p_{1} is d-connecting given 𝐒𝐙𝐗{X}\mathbf{S}\cup\mathbf{Z}\cup\mathbf{X}\setminus\{X\}. Let XCX_{C} be the node in 𝐗\mathbf{X} closest to YY on p1p_{1}, and let p2=p1(XC,Y)p_{2}=p_{1}(X_{C},Y). Note that p2p_{2} is proper. When XC=XX_{C}=X, then p2=p1p_{2}=p_{1} is a back-door path. When XCXX_{C}\neq X, then because p1p_{1} is d-connecting given 𝐒𝐙𝐗{X}\mathbf{S}\cup\mathbf{Z}\cup\mathbf{X}\setminus\{X\}, we have that XCX_{C} is a collider on p1p_{1}, and therefore, p2p_{2} is again a back-door path. Thus, p2p_{2} is a proper back-door path from 𝐗\mathbf{X} to 𝐘\mathbf{Y} that, by assumption, must be blocked by 𝐒𝐙\mathbf{S}\cup\mathbf{Z}.

    Let AA be the node on p2p_{2} immediately following XCX_{C}. That is, p2p_{2} contains XCAX_{C}\leftarrow A. Note that since p2p_{2} is blocked given 𝐒𝐙\mathbf{S}\cup\mathbf{Z}, then AYA\neq Y. Thus, we consider the path p3=p2(A,Y)p_{3}=p_{2}(A,Y). Since p1p_{1} is d-connecting given 𝐒𝐙𝐗{X}\mathbf{S}\cup\mathbf{Z}\cup\mathbf{X}\setminus\{X\}, where AA is a non-collider on p1p_{1}, then A𝐒𝐙𝐗{X}A\notin\mathbf{S}\cup\mathbf{Z}\cup\mathbf{X}\setminus\{X\} and thus, p3p_{3} is also d-connecting given 𝐒𝐙𝐗{X}\mathbf{S}\cup\mathbf{Z}\cup\mathbf{X}\setminus\{X\}. Similarly, since p2p_{2} is blocked given 𝐒𝐙\mathbf{S}\cup\mathbf{Z}, where AA is not a collider on p2p_{2} and A𝐒𝐙𝐗{X}A\notin\mathbf{S}\cup\mathbf{Z}\cup\mathbf{X}\setminus\{X\}, then p3p_{3} is also blocked by 𝐒𝐙\mathbf{S}\cup\mathbf{Z}.

    Since p3p_{3} is d-connecting given 𝐒𝐙𝐗{X}\mathbf{S}\cup\mathbf{Z}\cup\mathbf{X}\setminus\{X\} and blocked given 𝐒𝐙\mathbf{S}\cup\mathbf{Z}, then p3p_{3} must contain at least one collider in An(𝐗{X},𝒟)An(𝐒𝐙,𝒟)\operatorname{An}(\mathbf{X}\setminus\{X\},\mathcal{D})\setminus\operatorname{An}(\mathbf{S}\cup\mathbf{Z},\mathcal{D}). Let CC be the closest such collider to YY on p3p_{3} and let r=C,,X,X𝐗r=\langle C,\ldots,X^{\prime}\rangle,X^{\prime}\in\mathbf{X}, be a shortest causal path from CC to 𝐗\mathbf{X} in 𝒟\mathcal{D}. While there must be a causal path from CC to 𝐗{X}\mathbf{X}\setminus\{X\} in 𝒟\mathcal{D}, note that rr need not be one, and thus, we allow for the possibility that X=XX^{\prime}=X.

    Let $B$ be the node closest to $X^{\prime}$ on $r$ that is also on $p_{3}(C,Y)$, and define the path $t=(-r)(X^{\prime},B)\oplus p_{3}(B,Y)$. Note that since $p_{2}$ is proper, $(-r)(X^{\prime},B)$ has length at least one, and therefore, $t$ is a back-door path. Further, since $p_{3}$ is d-connecting given $\mathbf{S}\cup\mathbf{Z}\cup\mathbf{X}\setminus\{X\}$ and by the definition of $C$ and $r$, we have that $t$ is a proper back-door path from $\mathbf{X}$ to $\mathbf{Y}$ that is d-connecting given $\mathbf{S}\cup\mathbf{Z}$. But this contradicts that $\mathbf{S}$ satisfies the conditional back-door criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{D}$.  

Appendix F PROOFS FOR SECTION 3.3: MPDAGS - CONSTRUCTING CONDITIONAL ADJUSTMENT SETS

This section includes the proofs of two results from Section 3.3: Lemma 4 and Theorem 5. We also provide three supporting results needed for these proofs.

F.1 Main Results

  •   Proof of Lemma 4.

    By Lemma 26, Pa(X,𝒢)\operatorname{Pa}(X,\mathcal{G}) must satisfy condition (a) of Definition 2, so it suffices to show that Pa(X,𝒢)𝐙\operatorname{Pa}(X,\mathcal{G})\cup\mathbf{Z} blocks all non-causal definite status paths from XX to 𝐘\mathbf{Y} in 𝒢\mathcal{G}. Note that since 𝐘Pa(X,𝒢)=\mathbf{Y}\cap\operatorname{Pa}(X,\mathcal{G})=\emptyset, any definite status path from XX to 𝐘\mathbf{Y} in 𝒢\mathcal{G} that starts with an edge into XX is blocked by Pa(X,𝒢)𝐙\operatorname{Pa}(X,\mathcal{G})\cup\mathbf{Z}.

    Further, any non-causal definite status path from XX to 𝐘\mathbf{Y} in 𝒢\mathcal{G} that starts with an edge out of XX or an undirected edge must contain a collider. Additionally, the closest collider to XX on any such path and all of its descendants in 𝒢\mathcal{G} must be in PossDe(X,𝒢)\operatorname{PossDe}(X,\mathcal{G}) by Lemma 48. Then since [Pa(X,𝒢)𝐙]PossDe(X,𝒢)=\big{[}\operatorname{Pa}(X,\mathcal{G})\cup\mathbf{Z}\big{]}\cap\operatorname{PossDe}(X,\mathcal{G})=\emptyset, these paths are also blocked by Pa(X,𝒢)𝐙\operatorname{Pa}(X,\mathcal{G})\cup\mathbf{Z}.  

  •   Proof of Theorem 5.

    By Theorem 3, it suffices to show that Adjust(𝐗,𝐘,𝐙,𝒢)\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) and O(𝐗,𝐘,𝒢)\operatorname{O}(\mathbf{X,Y},\mathcal{G}) separately satisfy the conditional adjustment criterion relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒢\mathcal{G} (Definition 2). We start by noting that Adjust(𝐗,𝐘,𝐙,𝒢)\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) and O(𝐗,𝐘,𝒢)\operatorname{O}(\mathbf{X,Y},\mathcal{G}) are both disjoint from Forb(𝐗,𝐘,𝒢)𝐗𝐘𝐙\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}, so it suffices to prove that (a) Adjust(𝐗,𝐘,𝐙,𝒢)𝐙\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z} and (b) O(𝐗,𝐘,𝒢)𝐙\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z} block all proper non-causal definite status paths from 𝐗\mathbf{X} to 𝐘\mathbf{Y} in 𝒢\mathcal{G}. We prove (a) and (b) below. For these proofs, note that 𝐙Forb(𝐗,𝐘,𝒢)=\mathbf{Z}\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})=\emptyset by the assumption that 𝐙PossDe(𝐗,𝒢)=\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})=\emptyset and by Lemma 26.

    (a) $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}$: Suppose for sake of contradiction that there is a proper non-causal definite status path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that is d-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}$. Let $p=\langle X,\dots,Y\rangle$ be a shortest such path.

    Since pp is proper, no non-endpoint on pp is in 𝐗\mathbf{X}. Suppose for sake of contradiction that there exists Y𝐘Y^{\prime}\in\mathbf{Y} that is a non-endpoint on pp. By choice of pp, this implies that p(X,Y)p(X,Y^{\prime}) is possibly causal. Then by Lemma 27, since pp is non-causal, p(Y,Y)p(Y^{\prime},Y) must contain a collider on pp. Let CC be the closest such collider to YY^{\prime} (possibly C=YC=Y^{\prime}). Note that by Lemma 27, CPossDe(Y,𝒢)C\in\operatorname{PossDe}(Y^{\prime},\mathcal{G}), so by Lemma 48, De(C,𝒢)PossDe(Y,𝒢)\operatorname{De}(C,\mathcal{G})\subseteq\operatorname{PossDe}(Y^{\prime},\mathcal{G}), where YPossDe(X,𝒢)Y^{\prime}\in\operatorname{PossDe}(X,\mathcal{G}). Thus, De(C,𝒢)Forb(𝐗,𝐘,𝒢)\operatorname{De}(C,\mathcal{G})\subseteq\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}). However, this contradicts that pp is d-connecting given Adjust(𝐗,𝐘,𝐙,𝒢)𝐙\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}. Therefore, no non-endpoint on pp is in 𝐗𝐘\mathbf{X}\cup\mathbf{Y}.

    We now consider cases (1) and (2) below.

    1. (1)

      Consider when there is no collider on pp. Since pp is d-connecting given Adjust(𝐗,𝐘,𝐙,𝒢)𝐙\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}, no node on pp is in Adjust(𝐗,𝐘,𝐙,𝒢)𝐙\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}. Then by Equation (8), no node on pp is in PossAn(𝐗𝐘,𝒢)[Forb(𝐗,𝐘,𝒢)𝐗𝐘𝐙]\operatorname{PossAn}(\mathbf{X\cup Y},\mathcal{G})\setminus[\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}]. However, note that by Lemma 27, every non-endpoint on pp is a possible ancestor of an endpoint on pp and thus is in PossAn(𝐗𝐘,𝒢)(𝐗𝐘𝐙)\operatorname{PossAn}(\mathbf{X\cup Y},\mathcal{G})\setminus(\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}). Combining these, we have that all non-endpoints on pp are in Forb(𝐗,𝐘,𝒢)\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}). But this implies that there is no set that is both disjoint from Forb(𝐗,𝐘,𝒢)\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) and can block pp. By Theorem 3, this contradicts our assumption that there is a conditional adjustment set relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒢\mathcal{G}.

    2. (2)

      Consider when there is at least one collider CC on pp. For sake of contradiction, suppose that there are more than three nodes on pp. Then there is a non-collider B𝐗𝐘B\notin\mathbf{X}\cup\mathbf{Y} such that CBC\leftarrow B or BCB\to C is on pp. Since pp is d-connecting given Adjust(𝐗,𝐘,𝐙,𝒢)𝐙\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}, then B𝐙B\notin\mathbf{Z} and BAn(Adjust(𝐗,𝐘,𝐙,𝒢)𝐙,𝒢)B\in\operatorname{An}(\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z},\mathcal{G}). By Equation (8) and Lemma 48, B[PossAn(𝐗𝐘,𝒢)An(𝐙,𝒢)](𝐗𝐘𝐙)B\in[\operatorname{PossAn}(\mathbf{X\cup Y},\mathcal{G})\cup\operatorname{An}(\mathbf{Z},\mathcal{G})]\setminus(\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}). Additionally, since pp is d-connecting given Adjust(𝐗,𝐘,𝐙,𝒢)𝐙\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}, then BAdjust(𝐗,𝐘,𝐙,𝒢)[PossAn(𝐗𝐘,𝒢)An(𝐙,𝒢)](Forb(𝐗,𝐘,𝒢)𝐗𝐘𝐙)B\notin\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\equiv[\operatorname{PossAn}(\mathbf{X\cup Y},\mathcal{G})\cup\operatorname{An}(\mathbf{Z},\mathcal{G})]\setminus(\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}). Combining these, we have that BForb(𝐗,𝐘,𝒢)B\in\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}). Since there is a causal path in 𝒢\mathcal{G} from BB to every node in De(C,𝒢)\operatorname{De}(C,\mathcal{G}), by Lemma 48, De(C,𝒢)Forb(𝐗,𝐘,𝒢)\operatorname{De}(C,\mathcal{G})\subseteq\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}). However, this would contradict that pp is d-connecting given Adjust(𝐗,𝐘,𝐙,𝒢)𝐙\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}.

      Hence, pp must be of the form XCYX\to C\leftarrow Y, where CAn(Adjust(𝐗,𝐘,𝐙,𝒢)𝐙,𝒢)C\in\operatorname{An}(\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z},\mathcal{G}) and thus by Equation (8) and Lemma 48, CPossAn(𝐗𝐘,𝒢)An(𝐙,𝒢)C\in\operatorname{PossAn}(\mathbf{X\cup Y},\mathcal{G})\cup\operatorname{An}(\mathbf{Z},\mathcal{G}). Note that CAn(𝐙,𝒢)C\notin\operatorname{An}(\mathbf{Z},\mathcal{G}), since otherwise, 𝐙PossDe(𝐗,𝒢)\mathbf{Z}\cap\operatorname{PossDe}(\mathbf{X},\mathcal{G})\neq\emptyset. Further, CPossAn(𝐘,𝒢)C\notin\operatorname{PossAn}(\mathbf{Y},\mathcal{G}), because otherwise by Lemma 48, CPossMed(𝐗,𝐘,𝒢)C\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}), which would imply De(C,𝒢)Forb(𝐗,𝐘,𝒢)\operatorname{De}(C,\mathcal{G})\subseteq\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) which we have shown is a contradiction. Therefore, CPossAn(𝐗,𝒢)C\in\operatorname{PossAn}(\mathbf{X},\mathcal{G}).

      Let q=C=Q1,,Qm=X,m2q=\langle C=Q_{1},\dots,Q_{m}=X^{\prime}\rangle,m\geq 2, be a shortest possibly causal path in 𝒢\mathcal{G} from CC to 𝐗\mathbf{X}. Further, define the node Qj,j{1,,m}Q_{j},j\in\{1,\dots,m\}, as follows. When qq has no directed edges, let Qj=QmQ_{j}=Q_{m}. When qq has at least one directed edge, let QjQ_{j} be the node on qq closest to Q1Q_{1} such that QjQj+1Q_{j}\to Q_{j+1} is on qq. Note that by Lemma 46, qq is unshielded. Thus by R1 of Meek (1995), qq takes the form Q1QjQmQ_{1}-\dots-Q_{j}\to\dots\to Q_{m}.

      Pause to consider the path XQ1YX\to Q_{1}\leftarrow Y. Note that XYX\leftarrow Y cannot be in 𝒢\mathcal{G}, because no set can block this proper non-causal definite status path from 𝐗\mathbf{X} to 𝐘\mathbf{Y} in 𝒢\mathcal{G}. By Theorem 3, this would contradict our assumption that there is a conditional adjustment set relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒢\mathcal{G}. Similarly, XYX\to Y and XYX-Y are not in 𝒢\mathcal{G}, because this would imply De(C,𝒢)Forb(𝐗,𝐘,𝒢)\operatorname{De}(C,\mathcal{G})\subseteq\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}), which we have shown is a contradiction. Thus, XQ1YX\to Q_{1}\leftarrow Y is an unshielded collider in 𝒢\mathcal{G}.

      We complete this case by showing that 𝒢\mathcal{G} contains XQjYX\to Q_{j}\leftarrow Y. If j=1j=1, we are done. If instead j>1j>1, then consider the node Q2Q_{2}. Since XQ1Q2X\to Q_{1}-Q_{2} and YQ1Q2Y\to Q_{1}-Q_{2} are in 𝒢\mathcal{G}, so is a path X,Q2,Y\langle X,Q_{2},Y\rangle by R1 of Meek (1995). The unshielded paths XQ2YX\to Q_{2}-Y and XQ2YX-Q_{2}\leftarrow Y contradict that R1 of Meek (1995) is completed in 𝒢\mathcal{G}. Further, the path Q2YQ1Q2Q_{2}\to Y\to Q_{1}-Q_{2} or Q2XQ1Q2Q_{2}\to X\to Q_{1}-Q_{2} contradicts that R2 of Meek (1995) is completed in 𝒢\mathcal{G}, and the path XQ2YX-Q_{2}-Y contradicts that R3 of Meek (1995) is completed in 𝒢\mathcal{G}. This leaves only one option for X,Q2,Y\langle X,Q_{2},Y\rangle, and that is XQ2YX\to Q_{2}\leftarrow Y.

      If j=2j=2, we are done. If instead j>2j>2, then we consider the node Q3Q_{3}. By identical logic to that above, we can show that 𝒢\mathcal{G} contains XQ3YX\to Q_{3}\leftarrow Y. Continuing in this way, we have that 𝒢\mathcal{G} contains XQjYX\to Q_{j}\leftarrow Y.

      With this shown, we derive our final contradictions. When j=mj=m, then 𝒢\mathcal{G} contains XYX^{\prime}\leftarrow Y. But this is a proper non-causal definite status path from 𝐗\mathbf{X} to 𝐘\mathbf{Y} that no set can block, which we have shown is a contradiction. When j<mj<m, then 𝒢\mathcal{G} contains the following two paths: XQjYX^{\prime}\leftarrow\dots\leftarrow Q_{j}\leftarrow Y and XQjYX\to Q_{j}\leftarrow Y. These paths are proper non-causal definite status paths from 𝐗\mathbf{X} to 𝐘\mathbf{Y} that cannot both be blocked by the same set, which again is a contradiction.

(b) $\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}$: Let $p^{\prime}$ be an arbitrary proper non-causal definite status path from $X\in\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$, and let $Y$ be the node in $\mathbf{Y}$ closest to $X$ on $p^{\prime}$ such that $p^{\prime}(X,Y)$ is still a proper non-causal definite status path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$. Then let $p=p^{\prime}(X,Y)$, where $p=\langle X=V_{1},\dots,V_{k}=Y\rangle$, $k\geq 2$. Additionally, note that by assumption, $Y\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G})$.

We now consider cases (1) and (2) below. In both cases, we show that pp – and therefore pp^{\prime} – is blocked by O(𝐗,𝐘,𝒢)𝐙\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}.

  1. (1)

    Suppose that pp ends with Vk1YV_{k-1}\leftarrow Y or Vk1YV_{k-1}-Y. If pp has no colliders, then by Lemma 27, (p)(-p) is a possibly causal path from YY to XX. Since YPossMed(𝐗,𝐘,𝒢)Y\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}), this implies that V2,,Vk1Forb(𝐗,𝐘,𝒢)V_{2},\dots,V_{k-1}\in\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}). But then there is no set that is both disjoint from Forb(𝐗,𝐘,𝒢)\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) and can block pp. By Theorem 3, this contradicts our assumption that there is a conditional adjustment set relative to (𝐗,𝐘,𝐙)(\mathbf{X,Y,Z}) in 𝒢\mathcal{G}. Hence, there must be a collider on pp.

    Let CC be the closest collider to YY on pp. By Lemma 27, CPossDe(Y,𝒢)C\in\operatorname{PossDe}(Y,\mathcal{G}). Thus by Lemma 48, De(C,𝒢)PossDe(Y,𝒢)\operatorname{De}(C,\mathcal{G})\subseteq\operatorname{PossDe}(Y,\mathcal{G}). By assumption, YPossMed(𝐗,𝐘,𝒢)Y\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}), which implies that De(C,𝒢)Forb(𝐗,𝐘,𝒢)\operatorname{De}(C,\mathcal{G})\subseteq\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}). Since [O(𝐗,𝐘,𝒢)𝐙]Forb(𝐗,𝐘,𝒢)=\big{[}\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}\big{]}\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})=\emptyset, pp is blocked by O(𝐗,𝐘,𝒢)𝐙\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}.

  2. (2)

    Suppose that pp ends with Vk1YV_{k-1}\to Y. Note that pp is not a possibly causal path from XX to YY, so by Lemma 27, there must be an edge Vi1ViV_{i-1}\leftarrow V_{i}, i{2,,k1}i\in\{2,\dots,k-1\}, on pp. In particular, let ViV_{i} be the closest node to YY on pp such that Vi1ViV_{i-1}\leftarrow V_{i} is on pp.

    In order to complete this proof, we want to show that either {Vi,,Vk1}[O(𝐗,𝐘,𝒢)𝐙]\{V_{i},\dots,V_{k-1}\}\cap[\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}]\neq\emptyset or {Vi,,Vk1}PossMed(𝐗,𝐘,𝒢)\{V_{i},\dots,V_{k-1}\}\subset\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}). In both cases, we will show that pp is blocked by O(𝐗,𝐘,𝒢)𝐙\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}. To do this, we briefly note that by the choice of ViV_{i}, the path p(Vi,Y)p(V_{i},Y) is possibly causal and every node in {Vi,,Vk1}\{V_{i},\dots,V_{k-1}\} is a non-collider on pp. Further by the choice of pp, no node in {Vi,,Vk1}\{V_{i},\dots,V_{k-1}\} is in 𝐗𝐘\mathbf{X}\cup\mathbf{Y}. We turn to consider each node in {Vi,,Vk1}\{V_{i},\dots,V_{k-1}\}, working backward through the set.

    Consider the node Vk1V_{k-1}. If Vk1O(𝐗,𝐘,𝒢)𝐙V_{k-1}\in\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}, then since Vk1V_{k-1} is a non-collider on pp, pp is blocked by O(𝐗,𝐘,𝒢)𝐙\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}, and we are done. Consider when Vk1O(𝐗,𝐘,𝒢)𝐙V_{k-1}\notin\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}. Since YPossMed(𝐗,𝐘,𝒢)Y\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}) and since Vk1YV_{k-1}\to Y is in 𝒢\mathcal{G}, then either Vk1PossMed(𝐗,𝐘,𝒢)V_{k-1}\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}) or Vk1Pa(PossMed(𝐗,𝐘,𝒢),𝒢)V_{k-1}\in\operatorname{Pa}(\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}),\mathcal{G}). We show the latter is impossible. If Vk1Pa(PossMed(𝐗,𝐘,𝒢),𝒢)V_{k-1}\in\operatorname{Pa}(\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}),\mathcal{G}) and Vk1O(𝐗,𝐘,𝒢)𝐙V_{k-1}\notin\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}, then by Equation (9), we have that Vk1Forb(𝐗,𝐘,𝒢)V_{k-1}\in\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}). But by Lemma 26, this implies that Vk1De(𝐗,𝒢)V_{k-1}\in\operatorname{De}(\mathbf{X},\mathcal{G}). Since 𝒢\mathcal{G} contains Vk1YV_{k-1}\to Y, then Vk1PossMed(𝐗,𝐘,𝒢)V_{k-1}\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}). But this contradicts that Vk1Pa(PossMed(𝐗,𝐘,𝒢),𝒢)V_{k-1}\in\operatorname{Pa}(\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}),\mathcal{G}) by the definition of a parent set. Therefore, either Vk1O(𝐗,𝐘,𝒢)𝐙V_{k-1}\in\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z} and we are done, or Vk1PossMed(𝐗,𝐘,𝒢)V_{k-1}\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}).

    In the latter case, we turn to consider Vk2V_{k-2} if such a node exists. If pp contains Vk2Vk1V_{k-2}\to V_{k-1}, then since Vk1PossMed(𝐗,𝐘,𝒢)V_{k-1}\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}), we can use the same logic as above to show that either Vk2O(𝐗,𝐘,𝒢)𝐙V_{k-2}\in\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z} and we are done, or Vk2PossMed(𝐗,𝐘,𝒢)V_{k-2}\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}). If pp contains Vk2Vk1V_{k-2}-V_{k-1}, then since Vk1PossMed(𝐗,𝐘,𝒢)V_{k-1}\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}), we have that Vk2Forb(𝐗,𝐘,𝒢)De(𝐗,𝒢)V_{k-2}\in\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})\subseteq\operatorname{De}(\mathbf{X},\mathcal{G}). Because p(Vk2,Y)p(V_{k-2},Y) is possibly causal, then by Lemma 48, Vk2PossMed(𝐗,𝐘,𝒢)V_{k-2}\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}).

    Working backward in this way, either a node on p(Vi,Y)p(V_{i},Y) is in O(𝐗,𝐘,𝒢)𝐙\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z} and we are done, or VjPossMed(𝐗,𝐘,𝒢)V_{j}\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}) for all j{i,,k1}j\in\{i,\dots,k-1\}. In the latter case, we have that ViPossMed(𝐗,𝐘,𝒢)V_{i}\in\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G}) and that every node in {Vi,,Vk1}PossMed(𝐗,𝐘,𝒢)Forb(𝐗,𝐘,𝒢)\{V_{i},\dots,V_{k-1}\}\subseteq\operatorname{PossMed}(\mathbf{X,Y},\mathcal{G})\subseteq\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) is a non-collider on pp. We can now apply the same argument as in (1) above to show that p(X,Vi)p(X,V_{i}) – and therefore pp – is blocked given O(𝐗,𝐘,𝒢)𝐙\operatorname{O}(\mathbf{X,Y},\mathcal{G})\cup\mathbf{Z}.

 

F.2 Supporting Results

Lemma 46

Let XX and YY be distinct nodes in an MPDAG 𝒢=(𝐕,𝐄)\mathcal{G}=(\mathbf{V},\mathbf{E}) and let pp be a possibly causal path from XX to YY in 𝒢\mathcal{G}. Then any shortest subsequence of pp forms an unshielded, possibly causal path from XX to YY.

  •   Proof of Lemma 46.

    This result is similar to Lemma 3.6 of Perković et al. (2017), but we derive a slightly more general statement.

    Let kk be the number of nodes on pp. Pick an arbitrary shortest subsequence of pp and call it pp^{*}, where p=X=V0,,V=Yp^{*}=\langle X=V_{0},\dots,V_{\ell}=Y\rangle, 0<k0<\ell\leq k. Note that there is no edge ViVj,0i<jkV_{i}\leftarrow V_{j},0\leq i<j\leq k in 𝒢\mathcal{G}, since this would contradict that pp is possibly causal. Thus, pp^{*} is also possibly causal by definition. Further note that pp^{*} is unshielded, since if any triple on the path is shielded, it either contradicts that pp^{*} is possibly causal (i.e. ViVi+2V_{i}\leftarrow V_{i+2} cannot be in pp^{*}) or that pp^{*} is a shortest subsequence of pp (i.e. ViVi+2V_{i}\to V_{i+2} and ViVi+2V_{i}-V_{i+2} cannot be in pp^{*}).  
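The construction in Lemma 46 can be illustrated on a toy graph. The sketch below finds one fewest-node subsequence of a path by breadth-first search over forward "shortcuts" between path nodes, then tests whether the result is unshielded. The graph encoding (a set of directed pairs plus a set of `frozenset` undirected edges), the function names, and the example are hypothetical; the example graph is an undirected chordal graph, which is assumed here to be a valid MPDAG. A shortest subsequence need not be unique, and this sketch returns only one.

```python
from collections import deque


def adjacent(directed, undirected, a, b):
    """True if a and b share an edge of any type."""
    return (a, b) in directed or (b, a) in directed or frozenset((a, b)) in undirected


def shortest_subsequence(path, directed, undirected):
    """One fewest-node subsequence of `path` that is itself a path
    from path[0] to path[-1] in the graph."""
    n = len(path)
    parent = {0: None}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n - 1:
            break
        for j in range(i + 1, n):  # only move forward along the original path
            if j not in parent and adjacent(directed, undirected, path[i], path[j]):
                parent[j] = i
                queue.append(j)
    out, i = [], n - 1
    while i is not None:
        out.append(path[i])
        i = parent[i]
    return out[::-1]


def is_unshielded(path, directed, undirected):
    """True if no consecutive triple on the path has adjacent endpoints."""
    return not any(adjacent(directed, undirected, a, c)
                   for a, _, c in zip(path, path[1:], path[2:]))


# Hypothetical example: undirected edges A-B, B-C, C-D, A-C and no directed edges.
directed = set()
undirected = {frozenset(e) for e in [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]}
p = ["A", "B", "C", "D"]
p_star = shortest_subsequence(p, directed, undirected)
```

Here `p` contains the shielded triple ⟨A, B, C⟩, while `p_star` skips B via the edge A − C and is unshielded, in line with the lemma.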

Lemma 47

Let p=P0,,Pkp=\langle P_{0},\dots,P_{k}\rangle be a path in an MPDAG 𝒢\mathcal{G}. Then pp is possibly causal if and only if 𝒢\mathcal{G} does not contain any path PiPjP_{i}\leftarrow\dots\leftarrow P_{j}, 0i<jk0\leq i<j\leq k.

  •   Proof of Lemma 47.

    Suppose that $\mathcal{G}$ does not contain any path $P_i \leftarrow \dots \leftarrow P_j$, $0 \le i < j \le k$. Then $\mathcal{G}$ does not contain any edge $P_i \leftarrow P_j$, $0 \le i < j \le k$. Therefore, by definition, $p$ is possibly causal in $\mathcal{G}$.

    Now suppose $p$ is possibly causal in $\mathcal{G}$. For sake of contradiction, suppose $\mathcal{G}$ contains a path $q$ from $P_i$ to $P_j$, $0 \le i < j \le k$, of the form $P_i = Q_0 \leftarrow Q_1 \leftarrow \dots \leftarrow Q_{\ell-1} \leftarrow Q_\ell = P_j$.

    Consider the subpath of $p$ from $P_i$ to $P_j$. Note that this subpath is a possibly causal path. Let $r = \langle P_i = R_0, R_1, \dots, R_m = P_j \rangle$ be a shortest subsequence of this subpath. By Lemma 46, $r$ is an unshielded, possibly causal path.

    Consider the edge $r(R_0, R_1)$. The edge $R_0 \leftarrow R_1$ cannot be on $r$, since $r$ is possibly causal. Neither can $R_0 \to R_1$ be on $r$, since $r$ being unshielded would imply, by R1 of Meek (1995), that $\mathcal{G}$ contains the cycle $P_i = R_0 \to R_1 \to \dots \to R_m = P_j = Q_\ell \to Q_{\ell-1} \to \dots \to Q_0 = P_i$. Thus $r$ contains $R_0 - R_1$.

    However, note that no DAG in $[\mathcal{G}]$ can contain the edge $R_0 \to R_1$, since $r$ being unshielded would imply, by R1 of Meek (1995), that the DAG contains the cycle $P_i = R_0 \to R_1 \to \dots \to R_m = P_j = Q_\ell \to Q_{\ell-1} \to \dots \to Q_0 = P_i$. This contradicts that $r$ contains $R_0 - R_1$, because an undirected edge in an MPDAG is oriented in each direction by at least one DAG in $[\mathcal{G}]$. Thus we conclude that $\mathcal{G}$ does not contain any path $P_i \leftarrow \dots \leftarrow P_j$, $0 \le i < j \le k$.
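The two characterisations compared in Lemma 47 can be checked side by side on a small example. The following Python sketch is illustrative only: the function names and the toy MPDAG (directed edges $X \to A$ and $X \to B$, undirected edge $A - B$) are assumptions, and only the directed edges are stored because undirected edges never form a backward edge or a directed path.

```python
def has_directed_path(directed, source, target):
    """True if directed contains a path source -> ... -> target (one edge or more)."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        for a, b in directed:
            if a == node and b not in seen:
                if b == target:
                    return True
                seen.add(b)
                stack.append(b)
    return False

def possibly_causal_by_edges(directed, path):
    """Definition: no edge path[i] <- path[j] with i < j."""
    return not any((path[j], path[i]) in directed
                   for i in range(len(path))
                   for j in range(i + 1, len(path)))

def possibly_causal_by_paths(directed, path):
    """Lemma 47 condition: no directed path path[j] -> ... -> path[i] with i < j."""
    return not any(has_directed_path(directed, path[j], path[i])
                   for i in range(len(path))
                   for j in range(i + 1, len(path)))

# Hypothetical MPDAG: X -> A, X -> B, A - B (the undirected edge is not stored).
directed = {("X", "A"), ("X", "B")}
paths = [["X", "A", "B"], ["X", "B", "A"], ["A", "B", "X"], ["B", "A"]]
agreement = [possibly_causal_by_edges(directed, q) == possibly_causal_by_paths(directed, q)
             for q in paths]
```

On this toy graph the two checks agree for every listed path, as the lemma states for MPDAGs; the sketch does not verify that an arbitrary input graph is a valid MPDAG.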

Lemma 48

Let XX, YY, and ZZ be distinct nodes in an MPDAG 𝒢\mathcal{G}.

  1. (i)

    If pp is a possibly causal path from XX to YY and qq is a causal path from YY to ZZ, then pqp\oplus q is a possibly causal path from XX to ZZ.

  2. (ii)

    If pp is a causal path from XX to YY and qq is a possibly causal path from YY to ZZ, then pqp\oplus q is a possibly causal path from XX to ZZ.

  •   Proof of Lemma 48.

    Let p=X=P0,P1,,Pk=Yp=\langle X=P_{0},P_{1},\dots,P_{k}=Y\rangle and let q=Y=Q0,Q1,,Qr=Zq=\langle Y=Q_{0},Q_{1},\dots,Q_{r}=Z\rangle. Before beginning the main arguments, we note that pp and qq cannot share any nodes other than YY, and thus, we can define a path pqp\oplus q. To see this, for sake of contradiction, suppose pp and qq share at least one node other than YY. Let 𝐒\mathbf{S} denote the collection of such nodes, and consider the node in 𝐒\mathbf{S} with the lowest index on qq. That is, consider Qj𝐒Q_{j}\in\mathbf{S} such that jj\leq\ell for all Q𝐒Q_{\ell}\in\mathbf{S}. Let Qj=PiQ_{j}=P_{i} for some PiYP_{i}\neq Y on pp. Note that since qq or pp is causal, 𝒢\mathcal{G} contains either Pk=Q0Q1Qj=PiP_{k}=Q_{0}\to Q_{1}\to\dots\to Q_{j}=P_{i} or Qj=PiPi+1Y=Q0Q_{j}=P_{i}\to P_{i+1}\to\dots\to Y=Q_{0}. By Lemma 47, the first option contradicts that pp is possibly causal and the second contradicts that qq is possibly causal. Thus we conclude that pp and qq cannot share any nodes other than YY.

    For pqp\oplus q to be possibly causal in 𝒢\mathcal{G} we only need to show that there is no backward edge between any two nodes on pqp\oplus q. Note that there is no edge Pi1Pj1P_{i_{1}}\leftarrow P_{j_{1}} for 0i1<j1k0\leq i_{1}<j_{1}\leq k, or Qi2Qj2Q_{i_{2}}\leftarrow Q_{j_{2}} for 0i2<j2r0\leq i_{2}<j_{2}\leq r in 𝒢\mathcal{G}, by choice of pp and qq.

    (i) Assume for sake of contradiction that there exists an edge PiQjP_{i}\leftarrow Q_{j} in 𝒢\mathcal{G} for i{0,,k1}i\in\{0,\dots,k-1\} and j{1,,r}j\in\{1,\dots,r\}. Note that PiP_{i} is on pp and not qq, and analogously, QjQ_{j} is on qq and not pp, since we have shown pp and qq cannot share nodes other than YY. Also note that since qq is causal, it contains YQ1QjY\to Q_{1}\to\dots\to Q_{j}.

    Consider the subpath p(Pi,Y)p(P_{i},Y). Since pp is possibly causal, so is this subpath. Pick an arbitrary shortest subsequence of p(Pi,Y)p(P_{i},Y) and call it tt, where t=Pi=T0,,Tm=Yt=\langle P_{i}=T_{0},\dots,T_{m}=Y\rangle, m1m\geq 1. By Lemma 46, tt forms an unshielded, possibly causal path from PiP_{i} to YY.

    Consider the edge t(Pi,T1)t(P_{i},T_{1}). Edge PiT1P_{i}\leftarrow T_{1} cannot be on tt, since tt is possibly causal. Then PiT1P_{i}\to T_{1} or PiT1P_{i}-T_{1} must be in 𝒢\mathcal{G}. However, note that no DAG in [𝒢][\mathcal{G}] can contain the edge PiT1P_{i}\to T_{1}, since tt being unshielded would imply, by R1 of Meek (1995), that the DAG contains the cycle PiT1YQjPiP_{i}\to T_{1}\to\dots\to Y\to\dots\to Q_{j}\to P_{i}. This contradicts that tt contains PiT1P_{i}-T_{1} or PiT1P_{i}\to T_{1}. Thus, there does not exist an edge PiQjP_{i}\leftarrow Q_{j} in 𝒢\mathcal{G}.

    (ii) Assume for sake of contradiction that there exists an edge PiQjP_{i}\leftarrow Q_{j} in 𝒢\mathcal{G} for i{0,,k1}i\in\{0,\dots,k-1\} and j{1,,r}j\in\{1,\dots,r\}. Note that PiP_{i} is on pp and not qq, and analogously, QjQ_{j} is on qq and not pp, since we have shown pp and qq cannot share nodes other than YY. Also note that since pp is causal, it contains PiPi+1YP_{i}\to P_{i+1}\to\dots\to Y.

    Consider the subpath q(Y,Qj)q(Y,Q_{j}). Since qq is possibly causal, so is this subpath. Pick an arbitrary shortest subsequence of q(Y,Qj)q(Y,Q_{j}) and call it tt, where t=Y=T0,,Tm=Qjt=\langle Y=T_{0},\dots,T_{m}=Q_{j}\rangle, m1m\geq 1. By Lemma 46, tt forms an unshielded, possibly causal path from YY to QjQ_{j}.

    Consider the edge t(Y,T1)t(Y,T_{1}). Edge YT1Y\leftarrow T_{1} cannot be on tt, since tt is possibly causal. Then YT1Y\to T_{1} or YT1Y-T_{1} must be in 𝒢\mathcal{G}. However, note that no DAG in [𝒢][\mathcal{G}] can contain the edge YT1Y\to T_{1}, since tt being unshielded would imply, by R1 of Meek (1995), that the DAG contains the cycle YT1QjPiPi+1YY\to T_{1}\to\dots\to Q_{j}\to P_{i}\to P_{i+1}\to\dots\to Y. This contradicts that tt contains YT1Y-T_{1} or YT1Y\to T_{1}. Thus, there does not exist an edge PiQjP_{i}\leftarrow Q_{j} in 𝒢\mathcal{G}.  

Appendix G PROOF FOR SECTION 4.1: PAGS - CONDITIONAL ADJUSTMENT CRITERION

This section includes the proof of Theorem 9 and one result (Lemma 49) needed for the proof of Lemma 8. The statements of Theorem 9 and Lemma 8 can be found in Section 4.1.

Figure 6 shows how the results in this paper fit together to prove Theorem 9. Theorem 9 is the analogue of Theorem 3 (Section 3.1): the former applies to PAGs and the latter to MPDAGs. However, while the proof of Theorem 3 relies directly on completeness and soundness proofs for DAGs (see Figure 5 in Supplement D), the proof of Theorem 9 relies on them indirectly through Theorem 3.

[Diagram nodes: Theorem 9, Lemma 8, Lemma 6(b), Lemma 49, Theorem 3, Lemma 6(a)]
Figure 6: Proof structure of Theorem 9.
  •   Proof of Theorem 9.

    Follows from Lemma 8 and Theorem 31.  

Lemma 49

Let $\mathbf{X}$ and $\mathbf{Z}$ be disjoint node sets in a PAG $\mathcal{G}$. Then the following statements are equivalent.

  1. (i)

    $\mathbf{Z} \cap \operatorname{PossDe}(\mathbf{X}, \mathcal{G}) = \emptyset$.

  2. (ii)

    $\mathbf{Z} \cap \operatorname{De}(\mathbf{X}, \mathcal{D}) = \emptyset$ in every DAG $\mathcal{D}$ represented by $\mathcal{G}$.

  •   Proof of Lemma 49.

    $\neg$(i) $\Rightarrow$ $\neg$(ii): Let $p$ be a possibly causal path from $\mathbf{X}$ to $\mathbf{Z}$ in $\mathcal{G} = (\mathbf{V}, \mathbf{E})$ and let $p^{*} = \langle X = V_0, \dots, V_k = Z \rangle$, $k \ge 1$, $X \in \mathbf{X}$, $Z \in \mathbf{Z}$, be an unshielded possibly causal subsequence of $p$ in $\mathcal{G}$.

    Since $p^{*}$ contains $X \circ\!\!-\!\!\circ V_1$, $X \circ\!\!\to V_1$, or $X \to V_1$, there must be some MAG $\mathcal{M}$ in $[\mathcal{G}]$ with the edge $X \to V_1$. Let $p^{**}$ be the path in $\mathcal{M}$ corresponding to $p^{*}$ in $\mathcal{G}$. Then since $p^{*}$ is unshielded, so is $p^{**}$, and so $p^{**}$ takes the form $X \to V_1 \to \dots \to V_k$. Let $\mathcal{D}$ be a DAG created from $\mathcal{M}$ by retaining all the nodes and all the directed edges in $\mathcal{M}$ and by adding a node $L_{AB}$ and edges $L_{AB} \to A$ and $L_{AB} \to B$ for each bidirected edge $A \leftrightarrow B$ in $\mathcal{M}$ (Richardson and Spirtes, 2002, call this DAG the canonical DAG). Now, DAG $\mathcal{D}$ contains a causal path from $X$ to $Z$.

    $\neg$(ii) $\Rightarrow$ $\neg$(i): If there is a DAG $\mathcal{D}$ represented by $\mathcal{G}$ with a causal path from $X \in \mathbf{X}$ to $Z \in \mathbf{Z}$, then any MAG $\mathcal{M}$ of $\mathcal{D}$ that contains $X$ and $Z$ will contain a causal path, say $q$, from $X$ to $Z$. This is due to the fact that a MAG of a DAG will preserve ancestral relationships between observed variables. Then the path in $\mathcal{G}$ that corresponds to $q$ in $\mathcal{M}$ cannot have any arrowheads pointing in the direction of $X$, and so it must be possibly causal.
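The canonical DAG construction used in the first direction of this proof can be sketched in a few lines of Python. The edge-set encoding, the latent node naming scheme `L_AB`, the helper `descendants`, and the toy MAG below are assumptions made for illustration; the sketch does not check that its input is a valid MAG.

```python
def canonical_dag(nodes, directed, bidirected):
    """Keep all nodes and directed edges; replace each A <-> B by L_AB -> A, L_AB -> B."""
    dag_nodes = list(nodes)
    dag_edges = set(directed)
    for a, b in bidirected:
        latent = f"L_{a}{b}"  # hypothetical naming scheme for the added latent node
        dag_nodes.append(latent)
        dag_edges.update({(latent, a), (latent, b)})
    return dag_nodes, dag_edges

def descendants(edges, source):
    """Nodes reachable from `source` along directed edges (including `source`)."""
    found, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in found:
                found.add(b)
                stack.append(b)
    return found

# Hypothetical toy MAG: X -> V1 -> Z and V1 <-> W.
nodes = ["X", "V1", "Z", "W"]
directed = {("X", "V1"), ("V1", "Z")}
bidirected = {("V1", "W")}
dag_nodes, dag_edges = canonical_dag(nodes, directed, bidirected)
de_x = descendants(dag_edges, "X")
```

Because the directed edges of the MAG are retained unchanged, the causal path from `X` to `Z` survives in the canonical DAG, which is the property the proof relies on.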

Appendix H PROOFS FOR SECTION 4.2: PAGS - CONSTRUCTING CONDITIONAL ADJUSTMENT SETS

This section includes the proof of Theorem 10, which can be found in Section 4.2. We provide one supporting result needed for the proof of this theorem.

We make an important remark here on R software. Note that by Lemmas 6 and 8, any algorithms developed for checking the existence of an unconditional adjustment set (Definition 11) also apply to conditional adjustment sets, provided that $\mathbf{Z} \cap \operatorname{PossDe}(\mathbf{X}, \mathcal{G}) = \emptyset$. First consider the R package dagitty (Textor et al., 2016). Suppose the condition on $\mathbf{Z}$ is satisfied and let $\mathbf{S}$ be a set such that $\mathbf{S} \cap (\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}) = \emptyset$. Then one can apply the function isAdjustmentSet of the package dagitty to a PAG $\mathcal{G}$, set $\mathbf{S} \cup \mathbf{Z}$, exposure $\mathbf{X}$, and outcome $\mathbf{Y}$ to learn whether $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in $\mathcal{G}$. Next consider the R package pcalg (Kalisch et al., 2012). Under the same conditions on $\mathbf{Z}$ and $\mathbf{S}$, one can apply the function gac of the package pcalg to the MPDAG or PAG $\mathcal{G}$ and to the node sets $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{S} \cup \mathbf{Z}$. These functions will return TRUE if $\mathbf{S}$ is a conditional adjustment set relative to $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ in $\mathcal{G}$, and FALSE otherwise.
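The reduction described in this remark can be summarised in a short language-agnostic sketch, written here in Python. The callables `poss_de` and `is_adjustment_set` are hypothetical stand-ins for a routine computing $\operatorname{PossDe}(\mathbf{X}, \mathcal{G})$ and for an unconditional checker such as those mentioned above; they are not real Python APIs, and the toy stand-ins below exist only to make the sketch runnable.

```python
def is_conditional_adjustment_set(graph, x, y, z, s, poss_de, is_adjustment_set):
    """Check S as a conditional adjustment set by testing S ∪ Z unconditionally.

    poss_de(graph, x) and is_adjustment_set(graph, x, y, adj) are assumed,
    user-supplied callables (hypothetical, not part of any library).
    """
    x, y, z, s = set(x), set(y), set(z), set(s)
    if s & (x | y | z):
        raise ValueError("S must be disjoint from X, Y, and Z")
    if z & set(poss_de(graph, x)):
        raise ValueError("Z contains a possible descendant of X; the reduction does not apply")
    return is_adjustment_set(graph, x, y, s | z)

# Toy stand-ins (assumptions for illustration only).
def toy_poss_de(graph, x):
    return set(x) | graph["poss_de_of_x"]

def toy_checker(graph, x, y, adj):
    return adj == graph["valid_set"]

toy_graph = {"poss_de_of_x": {"M", "Y"}, "valid_set": {"S1", "Z1"}}
result = is_conditional_adjustment_set(
    toy_graph, {"X"}, {"Y"}, {"Z1"}, {"S1"}, toy_poss_de, toy_checker
)
```

The design choice here is to fail loudly when the condition on $\mathbf{Z}$ is violated, since in that case Lemmas 6 and 8 do not license passing $\mathbf{S} \cup \mathbf{Z}$ to an unconditional checker.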

H.1 Main Result

  •   Proof of Theorem 10.

    Suppose that $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})$ does not satisfy the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$. Since $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cap \operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) = \emptyset$ by construction, it must be that there is a proper definite status non-causal path from $\mathbf{X}$ to $\mathbf{Y}$ that is m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cup \mathbf{Z}$. By Lemma 50, there is then a proper definite status non-causal path $p$ from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ such that all definite non-colliders on $p$ are in $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$ (case (ii) of Lemma 50) and all colliders on $p$ are in $\operatorname{An}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G})$ (cases (iii) and (vi) of Lemma 50). Since $\operatorname{An}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G}) \subseteq \operatorname{An}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z} \cup \mathbf{S}, \mathcal{G})$ for any set $\mathbf{S}$ that satisfies $[\mathbf{S} \cup \mathbf{Z}] \cap [\mathbf{X} \cup \mathbf{Y} \cup \operatorname{Forb}(\mathbf{X,Y},\mathcal{G})] = \emptyset$, Lemma 30 implies that there is also a proper definite status non-causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that is open given $\mathbf{S} \cup \mathbf{Z}$. Since this is true for an arbitrary set $\mathbf{S}$ that satisfies condition (a) of Definition 7, it follows that there cannot be any set that satisfies the conditional adjustment criterion relative to $(\mathbf{X,Y,Z})$ in $\mathcal{G}$.

H.2 Supporting Result

Lemma 50

Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be pairwise disjoint node sets in a PAG $\mathcal{G}$, where $\mathbf{Z} \cap \operatorname{PossDe}(\mathbf{X}, \mathcal{G}) = \emptyset$ and where every proper possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ starts with a visible edge out of $\mathbf{X}$. Suppose furthermore that there exists a set $\mathbf{S}$ that satisfies the conditional adjustment criterion for $(\mathbf{X,Y,Z})$ in $\mathcal{G}$. If there is a proper definite status non-causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that is m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cup \mathbf{Z}$ (see definition in Theorem 10), then there is a path $p$ from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ such that the following hold.

  1. (i)

    Path $p$ is a proper definite status non-causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$.

  2. (ii)

    All definite non-colliders on $p$ are in $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$.

  3. (iii)

    There is at least one collider on $p$, and all colliders on $p$ are in $\mathbf{C_1} \cup \mathbf{C_2}$, where $\mathbf{C_1}$ and $\mathbf{C_2}$ are disjoint sets such that

    $$\begin{aligned}
    \mathbf{C_1} &\subseteq \operatorname{PossAn}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G}) \setminus \big[ \operatorname{An}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G}) \cup \mathbf{X} \cup \mathbf{Y} \cup \operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) \big] \quad \text{and} \\
    \mathbf{C_2} &\subseteq \operatorname{An}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G}) \setminus \big[ \mathbf{X} \cup \mathbf{Y} \cup \operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) \big].
    \end{aligned}$$
  4. (iv)

    None of the colliders on $p$ can be possible descendants of a non-collider on $p$.

  5. (v)

    For any collider $C \in \mathbf{C_1}$ on $p$ there is an unshielded possibly directed path from $C$ to $\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$ that does not start with an edge of the form $\circ\!\!-\!\!\circ$.

  6. (vi)

    $\mathbf{C_1} = \emptyset$, that is, for any collider $C \in \mathbf{C_1}$ on $p$ there is an unshielded directed path from $C$ to $\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$.

  •   Proof of Lemma 50.

    Consider the set of all proper definite status non-causal paths from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that are m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cup \mathbf{Z}$ and choose among them a shortest path with a shortest distance to $\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$ (Definition 16). Let this path be called $p$, where $p = \langle X = V_1, V_2, \dots, V_k = Y \rangle$, $X \in \mathbf{X}$, $Y \in \mathbf{Y}$, $k \ge 2$. By choice of $p$, (i) is satisfied. We will now show that $p$ also satisfies properties (ii)-(vi) above.

    First, consider properties (ii) and (iii). Since $p$ is m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cup \mathbf{Z}$, any collider on $p$ is in $\operatorname{An}(\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cup \mathbf{Z}, \mathcal{G})$. Furthermore, since $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cup \mathbf{Z} = \operatorname{PossAn}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G}) \setminus \big[ \mathbf{X} \cup \mathbf{Y} \cup \operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) \big]$, and since in a PAG $\mathcal{G}$, for any set $\mathbf{W}$, $\operatorname{An}(\operatorname{PossAn}(\mathbf{W}, \mathcal{G}), \mathcal{G}) = \operatorname{PossAn}(\mathbf{W}, \mathcal{G})$, we have that any collider on $p$ is in $\operatorname{PossAn}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G})$. Furthermore, since by definition $\operatorname{De}(\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}), \mathcal{G}) = \operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$, we have that no collider on $p$ can be in $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$. Hence, all colliders on $p$ are in $\operatorname{PossAn}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G}) \setminus \operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$.

Also, since $p$ is proper, a node in $\mathbf{X}$ cannot be a non-endpoint node on $p$. Now, since $p$ is additionally chosen as a shortest proper non-causal definite status path from $\mathbf{X}$ to $\mathbf{Y}$ that is m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cup \mathbf{Z}$, either no node in $\mathbf{Y}$ is a non-endpoint node on $p$, or there is a node $Y' \in \mathbf{Y} \setminus \{Y\}$ on $p$ such that $p(X,Y')$ is a possibly causal path from $X$ to $Y'$. In the latter case, $p(X,Y')$ must be a causal path in $\mathcal{G}$ (because $p$ must start with a visible edge and because $A \bullet\!\!\to B \circ\!\!-\!\!\bullet C$ cannot be a subpath of a definite status path). Since $p$ itself is a non-causal path in $\mathcal{G}$, there is a collider on $p$ that is a descendant of $Y'$. But since $Y' \in \operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$, this collider would then also have to be in $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$, which we have ruled out in the previous paragraph. Hence, no node in $\mathbf{Y}$ is a non-endpoint node on $p$ either.

Then all colliders on $p$ are in $\operatorname{PossAn}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G}) \setminus [\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) \cup \mathbf{X} \cup \mathbf{Y}]$. Also, any definite non-collider on $p$ is a possible ancestor of a collider on $p$ or of an endpoint on $p$. Hence, every definite non-collider on $p$ is in $\operatorname{PossAn}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G}) \setminus [\mathbf{X} \cup \mathbf{Y}]$. But, since $p$ is m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cup \mathbf{Z}$, none of the definite non-colliders on $p$ are in $\operatorname{PossAn}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G}) \setminus \big[ \mathbf{X} \cup \mathbf{Y} \cup \operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) \big]$. Therefore, any definite non-collider on $p$ is in $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$. This proves property (ii).

Next, consider property (iii). We have already shown that any collider on $p$ is in $\operatorname{PossAn}(\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}, \mathcal{G}) \setminus [\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) \cup \mathbf{X} \cup \mathbf{Y}]$. So it is only left to show that at least one collider is on $p$. Since we know that $p$ must be blocked by $\mathbf{S} \cup \mathbf{Z}$ for some set $\mathbf{S}$, where $\mathbf{S} \cap \big[ \mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z} \cup \operatorname{Forb}(\mathbf{X,Y},\mathcal{G}) \big] = \emptyset$, and since all definite non-colliders on $p$ are in $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$, there is at least one collider $C$ on $p$.

Property (iv) follows almost directly now, since by (ii), all definite non-colliders on $p$ are in $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$ and by (iii), none of the colliders can be in $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$. The claim then holds since, by the definition of $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$ in a PAG, $\operatorname{PossDe}(\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}), \mathcal{G}) = \operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$.

Next, we show properties (v) and (vi). Let $C \in \mathbf{C_1}$ be a collider on $p$. Then $C \notin \mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$ and there is an unshielded possibly directed path $r = \langle C, Q, \dots, V \rangle$ from $C$ to a node $V \in \mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$.

(v) Suppose for a contradiction that edge $\langle C, Q \rangle$ on $r$ is of type $C \circ\!\!-\!\!\circ Q$ (possibly $Q = V$). We derive a contradiction by constructing a proper definite status non-causal path from $\mathbf{X}$ to $\mathbf{Y}$ that is m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cup \mathbf{Z}$ and shorter than $p$, or of the same length as $p$ but with a shorter distance to $\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$ (Definition 16).

Let $A$ and $B$ be nodes on $p$ such that $A \bullet\!\!\to C \leftarrow\!\!\bullet\, B$ is a subpath of $p$ (possibly $A = X$, $B = Y$). Then paths $A \bullet\!\!\to C \circ\!\!-\!\!\circ Q$ and $B \bullet\!\!\to C \circ\!\!-\!\!\circ Q$ together with Lemma 28 imply that $A \bullet\!\!\to Q \leftarrow\!\!\bullet\, B$ is in $\mathcal{G}$.

Suppose first that $A \neq X$ and $B \neq Y$. Note that by property (iv) above, if $A \neq X$, then $A \leftrightarrow C$ is in $\mathcal{G}$. Moreover, if $A \leftrightarrow C$ is in $\mathcal{G}$, then $A \leftrightarrow Q$ is in $\mathcal{G}$, otherwise path $\langle A, Q, C \rangle$ and edge $A \leftrightarrow C$ contradict Lemma 29. Hence, if $A \neq X$, the collider/definite non-collider status of $A$ is the same on $p$ and on $p(X,A) \oplus \langle A, Q \rangle$. Analogous reasoning can be employed in the case when $B \neq Y$ to show that $B \leftrightarrow Q$ is in $\mathcal{G}$, that is, the collider/definite non-collider status of $B$ is the same on $p$ and on $\langle Q, B \rangle \oplus p(B,Y)$.

Now, we return to the general case where we allow $A = X$ and $B = Y$. In each of the cases below we will derive the contradiction by finding a path $s$ from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that is a proper non-causal definite status path in $\mathcal{G}$ and m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G}) \cup \mathbf{Z}$. Additionally, the path $s$ will either be shorter than $p$ or of the same length as $p$ but with a shorter distance to $\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$ (Definition 16), which contradicts our choice of $p$.

Suppose first that QQ is not a node on pp.

  • If $Q \notin \mathbf{X} \cup \mathbf{Y}$, then:

    • If $A \neq X$ and $B \neq Y$, then let $s = p(X,A) \oplus \langle A, Q, B \rangle \oplus p(B,Y)$. By the reasoning above, this path transformation amounts to replacing $A \leftrightarrow C \leftrightarrow B$ on $p$ with $A \leftrightarrow Q \leftrightarrow B$ on $s$, thereby creating a path with the same properties as $p$ but with a shorter distance to $\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$ (Definition 16).

    • If $A = X$ and $B \neq Y$, then let $s = \langle A, Q, B \rangle \oplus p(B,Y)$. This path transformation amounts to replacing $X \bullet\!\!\to C \leftrightarrow B$ on $p$ with $X \bullet\!\!\to Q \leftrightarrow B$ on $s$, thereby creating a path with the same properties as $p$ and of the same length as $p$ but with a shorter distance to $\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$ (Definition 16).

    • If $A \neq X$ and $B = Y$, then let $s = p(X,A) \oplus \langle A, Q, B \rangle$. This path transformation amounts to replacing $A \leftrightarrow C \leftarrow\!\!\bullet\, Y$ on $p$ with $A \leftrightarrow Q \leftarrow\!\!\bullet\, Y$ on $s$, thereby creating a path with the same properties as $p$ and of the same length as $p$ but with a shorter distance to $\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$ (Definition 16).

    • If $A = X$ and $B = Y$, then let $s = \langle A, Q, B \rangle$. Now $s$ is of the form $X \bullet\!\!\to Q \leftarrow\!\!\bullet\, Y$ and clearly satisfies all the same properties as $p$ while being of the same length, but with a shorter distance to $\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$ (Definition 16).

  • If QXQ\equiv X^{\prime}, X𝐗X^{\prime}\in\mathbf{X}, then:

    • *

      if BYB\neq Y, let s=Q,Bp(B,Y)s=\langle Q,B\rangle\oplus p(B,Y). This path transformation amounts to replacing XCBX\dots C\leftrightarrow B on pp, with XBX^{\prime}\leftrightarrow B on ss, thereby creating a shorter path with the same properties as pp.

    • *

      If $B = Y$, then let $s = \langle Q, B \rangle$. Due to the discussion above, $s$ is of the form $X' \leftarrow\!\!\bullet\, Y$ in $\mathcal{G}$.

  • Otherwise, QY,Y𝐘Q\equiv Y^{\prime},Y^{\prime}\in\mathbf{Y}. If Q𝐘Forb(𝐗,𝐘,𝒢)Q\in\mathbf{Y}\cap\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}), this would imply that CForb(𝐗,𝐘,𝒢)C\in\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}), which contradicts (iii). So QQ must be in 𝐘Forb(𝐗,𝐘,𝒢)\mathbf{Y}\setminus\operatorname{Forb}(\mathbf{X,Y},\mathcal{G}). Then:

    • *

      if AXA\neq X, then let s=p(X,A)A,Qs=p(X,A)\oplus\langle A,Q\rangle. This path transformation amounts to replacing ACYA\leftrightarrow C\dots Y on pp, with AYA\leftrightarrow Y^{\prime} on ss, thereby creating a shorter path with the same properties as pp.

    • *

      If $A = X$, then let $s = \langle A, Q \rangle$. Due to the discussion above, $s$ is of the form $X \bullet\!\!\to Y'$ in $\mathcal{G}$.

Otherwise, $Q$ is on $p$. Therefore, $Q \notin \mathbf{X} \cup \mathbf{Y}$. Also, $Q$ is a collider on $p$, otherwise $Q \in \operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$ and $C \in \operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$, because of $C \circ\!\!-\!\!\circ Q$.

  • Suppose first that QQ is on p(C,Y)p(C,Y). Then:

    • *

      if AXA\neq X, then let s=p(X,A)A,Qp(Q,Y)s=p(X,A)\oplus\langle A,Q\rangle\oplus p(Q,Y). This path transformation amounts to replacing ACQA\leftrightarrow C\leftrightarrow\dots\leftrightarrow Q on pp, with AQA\leftrightarrow Q on ss, thereby creating a shorter path with the same properties as pp.

    • *

      If A=XA=X, then let s=A,Qp(Q,Y)s=\langle A,Q\rangle\oplus p(Q,Y). This path transformation amounts to replacing XCQX\leftrightarrow C\leftrightarrow\dots\leftrightarrow Q on pp, with XQX\leftrightarrow Q on ss, thereby creating a shorter path with the same properties as pp.

  • Next, suppose that QQ is on p(X,C)p(X,C). Then depending on whether B=YB=Y, we can choose one of the following paths as the path ss:

    • *

      if BYB\neq Y, then let s=p(X,Q)Q,Bp(B,Y)s=p(X,Q)\oplus\langle Q,B\rangle\oplus p(B,Y). This path transformation amounts to replacing QCBQ\leftrightarrow\dots\leftrightarrow C\leftrightarrow B on pp, with QBQ\leftrightarrow B on ss, thereby creating a shorter path with the same properties as pp.

    • *

      If B=YB=Y, then let s=p(X,Q)Q,Bs=p(X,Q)\oplus\langle Q,B\rangle. Similarly to above, this path transformation amounts to replacing QCYQ\leftrightarrow\dots\leftrightarrow C\leftrightarrow Y on pp, with QYQ\leftrightarrow Y on ss, thereby creating a shorter path with the same properties as pp.

(vi) Since we showed above that the starting edge $\langle C, Q \rangle$ on $r = \langle C, Q, \dots, V \rangle$ is not of the form $C \circ\!\!-\!\!\circ Q$, and since $r$ is an unshielded possibly directed path from $C$ to $V \in \mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$, in order to prove property (vi) it is enough to show that $\langle C, Q \rangle$ is also not of the form $C \circ\!\!\to Q$ (since $P_1 \bullet\!\!\to P_2 \circ\!\!-\!\!\bullet P_3$ cannot be a subpath of any unshielded possibly directed path in $\mathcal{G}$; Zhang, 2008b). Suppose for a contradiction that $\langle C, Q \rangle$ is exactly of that form. Since $A \bullet\!\!\to C \leftarrow\!\!\bullet\, B$ and $C \circ\!\!\to Q$ are in $\mathcal{G}$, by Lemma 28, $A \bullet\!\!\to Q \leftarrow\!\!\bullet\, B$ is in $\mathcal{G}$.

Now, our goal is to identify nodes $A'$ and $B'$ on $p$ that satisfy the following. Node $A'$ is on $p(X,A)$, and edge $A' \bullet\!\!\to Q$ is in $\mathcal{G}$. Additionally, $A' = X$ or $A'$ is a non-endpoint node on $p$ that has the same definite non-collider/collider status on $p$ and on $p(X,A') \oplus \langle A', Q \rangle$. Similarly, $B'$ is on $p(B,Y)$, and edge $B' \bullet\!\!\to Q$ is in $\mathcal{G}$. Additionally, $B' = Y$ or $B'$ is a non-endpoint node on $p$ that has the same definite non-collider/collider status on $p$ and on $\langle Q, B' \rangle \oplus p(B',Y)$. We only show how to find node $A'$ on $p(X,A)$, since the argument for finding $B'$ on $p(B,Y)$ is exactly symmetric.

  • Consider the path $p(X,C)=\langle X=V_{1},V_{2},\dots,V_{i-1}=A,V_{i}=C\rangle$. Note that by (iv) and the properties of unshielded paths, $p(X,C)$ is of the form $X \bullet\!\!\rightarrow V_{2}\leftrightarrow\dots\leftrightarrow A\leftrightarrow C$ or $X\leftarrow V_{2}\leftarrow\dots\leftarrow V_{j}\leftrightarrow\dots\leftrightarrow C$, for some $V_{j}$, $j\in\{2,\dots,i-1\}$.

    Hence, if there is any non-endpoint node $W$ on $p(X,A)$ such that $W\leftrightarrow Q$, this node has the same definite collider / non-collider status on both $p$ and on $p(X,W)\oplus\langle W,Q\rangle$. Then we choose $A^{\prime}\equiv W$. Otherwise, if there is a non-endpoint node $W$ on $p(X,A)$ such that $-p(W,X)$ is of the form $W\to\dots\to X$, and an edge $W\to Q$ or $W \circ\!\!\rightarrow Q$ is in $\mathcal{G}$, then $W$ is a definite non-collider on both $p$ and $p(X,W)\oplus\langle W,Q\rangle$ and we choose $A^{\prime}\equiv W$.

    We will now show that if neither of the above choices for $A^{\prime}$ is possible in $\mathcal{G}$, then $p(X,C)$ is of the form $X\leftrightarrow V_{2}\leftrightarrow\dots\leftrightarrow C$, and for every node $V_{j}$, $j\in\{1,\dots,i\}$, on $p(X,C)$, the edge $V_{j}\to Q$ or $V_{j} \circ\!\!\rightarrow Q$ is in $\mathcal{G}$. In this case, we choose $A^{\prime}\equiv X$.

    Hence, consider first node $V_{i-1}=A$ on $p$. By the above, $V_{i-1} \bullet\!\!\rightarrow Q$ is in $\mathcal{G}$. Also, by our assumption, $V_{i-1}\leftrightarrow Q$ is not in $\mathcal{G}$, so either $V_{i-1}\to Q$ or $V_{i-1} \circ\!\!\rightarrow Q$ is in $\mathcal{G}$. Similarly, by the assumption above we now know that edge $\langle V_{i-2},V_{i-1}\rangle$ is not of the form $V_{i-2}\leftarrow V_{i-1}$, so we can conclude that $V_{i-2}\leftrightarrow V_{i-1}$ is in $\mathcal{G}$.

    Now, $V_{i-2}\leftrightarrow V_{i-1}\leftrightarrow C \circ\!\!\rightarrow Q$ and either $V_{i-1}\to Q$ or $V_{i-1} \circ\!\!\rightarrow Q$ is in $\mathcal{G}$. If $V_{i-1}\to Q$ is in $\mathcal{G}$, then $R4$ of Zhang (2008b) would imply that $V_{i-2}\in\operatorname{Adj}(Q,\mathcal{G})$. Moreover, since $V_{i-2}\leftrightarrow V_{i-1}\to Q$ is in $\mathcal{G}$, $R2$ of Zhang (2008b) would imply that $V_{i-2} \bullet\!\!\rightarrow Q$ is in $\mathcal{G}$, and our assumption further lets us conclude that $V_{i-2}\to Q$ or $V_{i-2} \circ\!\!\rightarrow Q$ is in $\mathcal{G}$.

    If $V_{i-1} \circ\!\!\rightarrow Q$ is in $\mathcal{G}$, then $V_{i-2}\leftrightarrow V_{i-1} \circ\!\!\rightarrow Q$ and Lemma 28 imply that $V_{i-2} \bullet\!\!\rightarrow Q$ is in $\mathcal{G}$. Hence, as above, either $V_{i-2}\to Q$ or $V_{i-2} \circ\!\!\rightarrow Q$ is in $\mathcal{G}$.

    If $V_{i-2}=X$ we are done. Otherwise, we can repeat the same argument as in the preceding three paragraphs to conclude that $V_{i-3}\leftrightarrow V_{i-2}\leftrightarrow V_{i-1}\leftrightarrow C$ is in $\mathcal{G}$, and either $V_{i-3}\to Q$ or $V_{i-3} \circ\!\!\rightarrow Q$ is in $\mathcal{G}$. If $X\neq V_{i-3}$, we can keep applying the same argument until we reach $X$.
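The three-way choice of $A^{\prime}$ described above can be sketched in code. The following Python sketch is illustrative only and rests on assumptions that are not part of the proof: the PAG is stored in a hypothetical dictionary `marks`, where `marks[(u, v)]` is the edge mark at `v` on the edge between `u` and `v` (`"arrow"`, `"tail"`, or `"circle"`), and all function names are invented for this example. When several nodes qualify, the sketch returns the first one in path order, which is an arbitrary choice; the argument only needs one such node to exist.

```python
def add_edge(marks, u, v, mark_at_u, mark_at_v):
    """Store the edge between u and v; marks[(a, b)] is the mark at b (assumed encoding)."""
    marks[(v, u)] = mark_at_u
    marks[(u, v)] = mark_at_v


def is_bidirected(marks, u, v):
    """True if u <-> v is in the graph."""
    return marks.get((u, v)) == "arrow" and marks.get((v, u)) == "arrow"


def is_directed(marks, u, v):
    """True if u -> v is in the graph."""
    return marks.get((u, v)) == "arrow" and marks.get((v, u)) == "tail"


def is_circle_arrow(marks, u, v):
    """True if u o-> v is in the graph."""
    return marks.get((u, v)) == "arrow" and marks.get((v, u)) == "circle"


def choose_a_prime(p_xa, q, marks):
    """Pick A' on p(X, A) = [X, V2, ..., A] following the three rules in the text."""
    x, interior = p_xa[0], p_xa[1:]  # V2, ..., A are non-endpoint nodes of p

    # Rule 1: a non-endpoint node W with W <-> Q.
    for w in interior:
        if is_bidirected(marks, w, q):
            return w

    # Rule 2: a non-endpoint node W with W -> ... -> X along p,
    # and W -> Q or W o-> Q.
    for k, w in enumerate(interior, start=1):
        directed_back_to_x = all(
            is_directed(marks, p_xa[t], p_xa[t - 1]) for t in range(k, 0, -1)
        )
        if directed_back_to_x and (
            is_directed(marks, w, q) or is_circle_arrow(marks, w, q)
        ):
            return w

    # Rule 3: otherwise the text argues that A' = X is a valid choice.
    return x


# Example: X <-> V2 <-> A, with V2 <-> Q and A -> Q, so rule 1 selects V2.
example = {}
add_edge(example, "X", "V2", "arrow", "arrow")
add_edge(example, "V2", "A", "arrow", "arrow")
add_edge(example, "V2", "Q", "arrow", "arrow")
add_edge(example, "A", "Q", "tail", "arrow")
a_prime = choose_a_prime(["X", "V2", "A"], "Q", example)
```

The fallback in rule 3 is only justified by the argument in the text (it shows that every $V_{j}$ then has an edge into $Q$); the sketch does not verify that property.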

Now that we have chosen the appropriate $A^{\prime}$ and $B^{\prime}$, the remaining argument is very similar to case (v). In each of the cases below we will derive the contradiction by finding a path $s$ from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$ that is a proper non-causal definite status path in $\mathcal{G}$ and m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}$. Additionally, the path $s$ will either be shorter than $p$ or of the same length as $p$, but with a shorter distance to $\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}$ (Definition 16), which contradicts our choice of $p$.

Suppose first that $Q$ is not on $p$:

  • If $Q\notin\mathbf{X}\cup\mathbf{Y}$, then

    • if $A^{\prime}\neq X$ and $B^{\prime}\neq Y$, then let $s=p(X,A^{\prime})\oplus\langle A^{\prime},Q,B^{\prime}\rangle\oplus p(B^{\prime},Y)$. By the reasoning above, this path transformation amounts to replacing $p(A^{\prime},B^{\prime})$ on $p$ with $\langle A^{\prime},Q,B^{\prime}\rangle$ on $s$ such that the collider / definite non-collider status of $A^{\prime}$ and $B^{\prime}$ is the same on both paths. Therefore, $s$ is a path with the same properties as $p$, but either shorter than $p$ or of the same length but with a shorter distance to $\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}$ (Definition 16).

    • If $A^{\prime}=X$ and $B^{\prime}\neq Y$, then let $s=\langle A^{\prime},Q,B^{\prime}\rangle\oplus p(B^{\prime},Y)$. By the reasoning above, this path transformation amounts to replacing $p(X,B^{\prime})$ on $p$ with $\langle X,Q,B^{\prime}\rangle$ on $s$ such that the collider / definite non-collider status of $B^{\prime}$ is the same on both paths, and $s$ is a non-causal path because of the edge $Q \leftarrow\!\!\bullet B^{\prime}$. Therefore, $s$ is a path with the same properties as $p$, but either shorter than $p$ or of the same length but with a shorter distance to $\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}$ (Definition 16).

    • If $A^{\prime}\neq X$ and $B^{\prime}=Y$, then let $s=p(X,A^{\prime})\oplus\langle A^{\prime},Q,B^{\prime}\rangle$. This path transformation amounts to replacing $p(A^{\prime},Y)$ on $p$ with $\langle A^{\prime},Q,Y\rangle$ on $s$ such that the collider / definite non-collider status of $A^{\prime}$ is the same on both paths, and $s$ is a non-causal path because of the edge $Q \leftarrow\!\!\bullet Y$. Therefore, $s$ is a path with the same properties as $p$, but either shorter than $p$ or of the same length but with a shorter distance to $\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}$ (Definition 16).

    • If $A^{\prime}=X$ and $B^{\prime}=Y$, then let $s=\langle A^{\prime},Q,B^{\prime}\rangle$. Then $s$ is of the form $X \bullet\!\!\rightarrow Q \leftarrow\!\!\bullet Y$, $Q\in\operatorname{An}(\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z},\mathcal{G})$, and $Q$ has a shorter distance to $\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}$ than $C$.

  • If $Q\equiv X^{\prime}$, $X^{\prime}\in\mathbf{X}$, then:

    • if $B^{\prime}\neq Y$, then let $s=\langle Q,B^{\prime}\rangle\oplus p(B^{\prime},Y)$. This path transformation amounts to replacing $p(X,B^{\prime})$ on $p$ with $\langle X^{\prime},B^{\prime}\rangle$ on $s$ such that the collider / definite non-collider status of $B^{\prime}$ is the same on both paths, and $s$ is a non-causal path because of the edge $X^{\prime} \leftarrow\!\!\bullet B^{\prime}$. Therefore, $s$ is a path with the same properties as $p$, but shorter than $p$.

    • If $B^{\prime}=Y$, then let $s=\langle Q,B^{\prime}\rangle$, where based on the reasoning above, $s$ is of the form $X^{\prime} \leftarrow\!\!\bullet Y$.

  • Otherwise, $Q\equiv Y^{\prime}$, $Y^{\prime}\in\mathbf{Y}$. Then

    • if $A^{\prime}\neq X$, then $s=p(X,A^{\prime})\oplus\langle A^{\prime},Q\rangle$. Note that in this case $s$ is of the form $X\leftrightarrow\dots\leftrightarrow A^{\prime}\leftrightarrow Y^{\prime}$, or $X\leftarrow\dots\leftarrow A^{\prime} \circ\!\!\rightarrow Y^{\prime}$, or $X\leftarrow\dots\leftarrow A^{\prime}\to Y^{\prime}$. In all cases, $s$ is a proper non-causal definite status path from $\mathbf{X}$ to $\mathbf{Y}$ that is m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}$.

    • If $A^{\prime}=X$, then let $s=\langle A^{\prime},Q\rangle$. We now discuss why $s$ is of the form $X\leftrightarrow Y^{\prime}$ in $\mathcal{G}$.

      Note that $X \circ\!\!\rightarrow Y^{\prime}$ cannot be in $\mathcal{G}$, since there exists a set $\mathbf{S}$ that can satisfy the conditional adjustment criterion relative to $\mathbf{X,Y,Z}$ in $\mathcal{G}$. If instead $X\to Y^{\prime}$ is a visible edge in $\mathcal{G}^{\prime}$, then there is either a node $D\notin\operatorname{Adj}(Y^{\prime},\mathcal{G})$ such that $D \bullet\!\!\rightarrow X$ is in $\mathcal{G}$ or there is a collection of nodes $D_{1},\dots,D_{k}$ such that $D_{1}\notin\operatorname{Adj}(Y^{\prime},\mathcal{G})$, $D_{2},\dots,D_{k}\in\operatorname{Pa}(Y^{\prime},\mathcal{G})$, and $D_{1} \bullet\!\!\rightarrow D_{2}\leftrightarrow\dots\leftrightarrow D_{k}\leftrightarrow X$ is in $\mathcal{G}$. Without loss of generality we will assume that we are in the first case, that is, $D \bullet\!\!\rightarrow X$ is in $\mathcal{G}$ and $D\notin\operatorname{Adj}(Y^{\prime},\mathcal{G})$, since the latter case has an analogous proof to what follows.

      By the above, the only way that $A^{\prime}\equiv X$ is if $X\leftrightarrow V_{2}\leftrightarrow\dots\leftrightarrow C$ is in $\mathcal{G}$ and if for all nodes $V_{j}\in\{V_{2},\dots,V_{i-2},V_{i-1},V_{i}\}$, $V_{j}\to Y^{\prime}$ or $V_{j} \circ\!\!\rightarrow Y^{\prime}$ is in $\mathcal{G}$. Now, since $D \bullet\!\!\rightarrow X\leftrightarrow V_{2}\leftrightarrow\dots\leftrightarrow V_{i-1}\leftrightarrow C$ is also in $\mathcal{G}$, and $D\notin\operatorname{Adj}(Y^{\prime},\mathcal{G})$, we can use $R4$ of Zhang (2008b) iteratively to conclude that $V_{j}\to Y^{\prime}$ is in $\mathcal{G}$ for all $j\in\{1,\dots,i\}$. However, as $V_{i}\equiv C$, this contradicts our assumption that $C \circ\!\!\rightarrow Y^{\prime}$ is in $\mathcal{G}$, for $Y^{\prime}=Q$.
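The path surgery used in the cases with $Q\notin\mathbf{X}\cup\mathbf{Y}$ and $Q$ not on $p$ can be illustrated with a short sketch. It is a toy illustration under our own assumptions: paths are plain Python lists of node names, and `splice_through_q` and `num_edges` are hypothetical helper names. Because $A^{\prime}$ lies on $p(X,A)$ and $B^{\prime}$ lies on $p(B,Y)$, with $C$ between $A$ and $B$, the replaced segment $p(A^{\prime},B^{\prime})$ has at least two edges, so $s$ is never longer than $p$. The sketch checks lengths only; the distance of Definition 16 is not modeled.

```python
def splice_through_q(p, a_prime, q, b_prime):
    """Return s = p(X, A') + <A', Q, B'> + p(B', Y) as a list of nodes."""
    i, j = p.index(a_prime), p.index(b_prime)
    if q in p or j - i < 2:
        raise ValueError("expects Q off p and A' at least two edges before B' on p")
    return p[: i + 1] + [q] + p[j:]


def num_edges(path):
    """Length of a path, counted in edges."""
    return len(path) - 1


# p = <X, V2, A, C, B, V6, Y>, with C the node being replaced.
p = ["X", "V2", "A", "C", "B", "V6", "Y"]

s_same = splice_through_q(p, "A", "Q", "B")   # A' = A, B' = B: same length as p
s_left = splice_through_q(p, "X", "Q", "B")   # A' = X: s = <X, Q, B> + p(B, Y)
s_right = splice_through_q(p, "V2", "Q", "Y") # B' = Y: s = p(X, V2) + <V2, Q, Y>
s_both = splice_through_q(p, "X", "Q", "Y")   # A' = X and B' = Y: s = <X, Q, Y>
```

In the equal-length case (`s_same`), the contradiction in the text comes from $Q$ having a shorter distance to $\mathbf{X}\cup\mathbf{Y}\cup\mathbf{Z}$ than $C$, not from the length of $s$.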

Otherwise, $Q$ is on $p$. Therefore, $Q\notin\mathbf{X}\cup\mathbf{Y}$.

  • Suppose first that $Q$ is on $p(C,Y)$. By (iii), (iv), and the definition of $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$, we have that $p(C,Y)$ is of one of the following forms:

    • $C\leftrightarrow\dots\leftrightarrow Q\leftrightarrow\dots\leftrightarrow V_{k} \leftarrow\!\!\bullet Y$ for $k>i$, or

    • $C\leftrightarrow\dots\leftrightarrow Q\leftrightarrow\dots\leftrightarrow T_{1}\to\dots\to Y$, for some $T_{1}$ on $p(C,Y)$, or

    • $C\leftrightarrow\dots\leftrightarrow T_{2}\to\dots\to Q\to\dots\to Y$, for some $T_{2}$ on $p(C,Y)$, or

    • $C\leftrightarrow\dots\leftrightarrow Q\to\dots\to Y$.

    Then

    • If $A^{\prime}\neq X$, then $s=p(X,A^{\prime})\oplus\langle A^{\prime},Q\rangle\oplus p(Q,Y)$. Note that by the above forms of $p(C,Y)$, $s$ is always a proper non-causal path from $\mathbf{X}$ to $\mathbf{Y}$. Additionally, by the above listed options for $p(C,Y)$ we know that $Q$ has the same collider / definite non-collider status on both $p$ and $s$. Hence, $s$ is also an m-connecting path given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}$. Since $s$ is also shorter than $p$ we obtain our contradiction.

    • If $A^{\prime}\equiv X$, we let $s=\langle X,Q\rangle\oplus p(Q,Y)$. Path $s$ is proper, since $p$ itself is proper and $Q\notin\mathbf{X}\cup\mathbf{Y}$. Furthermore, by the above listed options for $p(C,Y)$ we know that $Q$ has the same collider / definite non-collider status on both $p$ and $s$ and that $s$ is a definite status path. Hence, $s$ is also an m-connecting path given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}$. If $s$ is a non-causal path in $\mathcal{G}$, we obtain a contradiction with the choice of $p$.

      Hence, suppose for a contradiction that $s$ is a possibly causal path from $\mathbf{X}$ to $\mathbf{Y}$ in $\mathcal{G}$. By assumption, it must be that $X\to Q$ is a visible edge in $\mathcal{G}$. Now, similarly to the previous case, since $X\to Q$ is a visible edge in $\mathcal{G}$, there is either a node $D\notin\operatorname{Adj}(Q,\mathcal{G})$ such that $D \bullet\!\!\rightarrow X$ is in $\mathcal{G}$ or there is a collection of nodes $D_{1},\dots,D_{k}$ such that $D_{1}\notin\operatorname{Adj}(Q,\mathcal{G})$, $D_{2},\dots,D_{k}\in\operatorname{Pa}(Q,\mathcal{G})$, and $D_{1} \bullet\!\!\rightarrow D_{2}\leftrightarrow\dots\leftrightarrow D_{k}\leftrightarrow X$ is in $\mathcal{G}$. We again assume without loss of generality that we are in the former case, that is, $D \bullet\!\!\rightarrow X$ is in $\mathcal{G}$ and $D\notin\operatorname{Adj}(Q,\mathcal{G})$.

      Since $A^{\prime}\equiv X$, by the same reasoning as in the previous case above we know that $X\leftrightarrow V_{2}\leftrightarrow\dots\leftrightarrow C$ is in $\mathcal{G}$ and that for all nodes $V_{j}\in\{V_{1},\dots,V_{i-1},V_{i}\}$, $V_{j}\to Q$ or $V_{j} \circ\!\!\rightarrow Q$ is in $\mathcal{G}$. Now, since $D \bullet\!\!\rightarrow X\leftrightarrow V_{2}\leftrightarrow\dots\leftrightarrow C$ is in $\mathcal{G}$, and since $D\notin\operatorname{Adj}(Q,\mathcal{G})$, we can use $R4$ of Zhang (2008b) iteratively to conclude that $V_{j}\to Q$ is in $\mathcal{G}$ for all $j\in\{1,\dots,i\}$. However, as $V_{i}\equiv C$, this contradicts our assumption that $C \circ\!\!\rightarrow Q$ is in $\mathcal{G}$.

  • Lastly, suppose that $Q$ is on $p(X,C)$. Analogously to above, by (iii), (iv), and the definition of $\operatorname{Forb}(\mathbf{X,Y},\mathcal{G})$, we have that $p(X,C)$ is of one of the following forms:

    • $X \bullet\!\!\rightarrow V_{2}\leftrightarrow\dots\leftrightarrow Q\leftrightarrow\dots\leftrightarrow C$, or

    • $X\leftarrow\dots\leftarrow T_{1}\leftrightarrow\dots\leftrightarrow Q\leftrightarrow\dots\leftrightarrow C$, for some $T_{1}$ on $p(X,C)$, or

    • $X\leftarrow\dots\leftarrow Q\leftarrow\dots\leftarrow T_{2}\leftrightarrow\dots\leftrightarrow C$, for some $T_{2}$ on $p(X,C)$, or

    • $X\leftarrow\dots\leftarrow Q\leftrightarrow\dots\leftrightarrow C$.

    Then

    • If $B^{\prime}\neq Y$, we have that $s=p(X,Q)\oplus\langle Q,B^{\prime}\rangle\oplus p(B^{\prime},Y)$ is a proper non-causal path from $\mathbf{X}$ to $\mathbf{Y}$ that is shorter than $p$. Additionally, $Q$ is of the same collider / definite non-collider status on both $p$ and $s$ and therefore, $s$ is not only of definite status, but also m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}$ in $\mathcal{G}$, which leads to a contradiction.

    • If $B^{\prime}\equiv Y$, then $s=p(X,Q)\oplus\langle Q,B^{\prime}\rangle$ is a proper definite status non-causal path that is m-connecting given $\operatorname{Adjust}(\mathbf{X,Y,Z},\mathcal{G})\cup\mathbf{Z}$ in $\mathcal{G}$ and shorter than $p$.
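Several steps in this case rely on a node having the same collider / definite non-collider status on $p$ and on $s$. The following Python sketch shows one way such a comparison could be checked mechanically. It assumes the standard definition of definite status (a collider has arrowheads at the node on both path edges; a definite non-collider has a tail at the node on at least one path edge, or circle marks on both with non-adjacent neighbors), and it uses a hypothetical encoding in which `marks[(u, v)]` is the edge mark at `v` on the edge between `u` and `v`. The function name and the toy graph are ours, not part of the proof.

```python
def status_on_path(path, k, marks):
    """Classify the non-endpoint node path[k] on the given path."""
    if not 0 < k < len(path) - 1:
        raise ValueError("path[k] must be a non-endpoint node")
    prev, v, nxt = path[k - 1], path[k], path[k + 1]
    left, right = marks[(prev, v)], marks[(nxt, v)]  # edge marks at v

    if left == "arrow" and right == "arrow":
        return "collider"
    if left == "tail" or right == "tail":
        return "definite non-collider"
    # Circle marks at v on both edges: definite non-collider only if unshielded.
    if left == "circle" and right == "circle" and (prev, nxt) not in marks:
        return "definite non-collider"
    return "not of definite status"


# Toy graph: X <- Q <-> C on p, and B -> Q used on s.
marks = {
    ("Q", "X"): "arrow", ("X", "Q"): "tail",   # Q -> X
    ("Q", "C"): "arrow", ("C", "Q"): "arrow",  # Q <-> C
    ("B", "Q"): "arrow", ("Q", "B"): "tail",   # B -> Q
}

status_on_p = status_on_path(["X", "Q", "C"], 1, marks)  # Q on p(X, C)
status_on_s = status_on_path(["X", "Q", "B"], 1, marks)  # Q on p(X, Q) + <Q, B>
```

In this toy graph the tail at $Q$ on the edge $X\leftarrow Q$ fixes $Q$ as a definite non-collider on both paths, which mirrors the forms of $p(X,C)$ that start with $X\leftarrow\dots\leftarrow Q$.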