Axiomatization of Interventional Probability Distributions

Kayvan Sadeghi (ORCID 0000-0001-7314-744X) and Terry Soo (ORCID 0000-0003-1744-1876)
Department of Statistical Science, University College London
Abstract

Causal intervention is an essential tool in causal inference. It is axiomatized under the rules of do-calculus in the case of structural causal models. We provide simple axiomatizations for families of probability distributions to be different types of interventional distributions. Our axiomatizations neatly lead to a simple and clear theory of causality that has several advantages: it does not need to make use of any modeling assumptions such as those imposed by structural causal models; it only relies on interventions on single variables; it includes most cases with latent variables and causal cycles; and more importantly, it does not assume the existence of an underlying true causal graph, as we do not take it as the primitive object (in fact, a causal graph is derived as a by-product of our theory). We show that, under our axiomatizations, the intervened distributions are Markovian to the defined intervened causal graphs, and an observed joint probability distribution is Markovian to the obtained causal graph; these results are consistent with the case of structural causal models, and as a result, the existing theory of causal inference applies. We also show that a large class of natural structural causal models satisfy the theory presented here. We note that the aim of this paper is axiomatization of interventional families, which is subtly different from “causal modeling.”

MSC subject classifications: 62H22, 62A01.

Keywords: ancestral graphs, causal graphs, directed acyclic graphs, do-calculus, interventional distributions, Markov properties, structural causal models.

1 Introduction

A popular approach to infer causal relationships is to use the concept of intervention as opposed to observation. For example, as described in [24], it can be observed that there is a correlation between smoking and the colour of the teeth, but no matter how much one whitens somebody’s teeth, it would not affect their smoking habits. On the other hand, forcing someone to smoke would affect the colour of their teeth. Hence, smoking has a causal effect on the colour of the teeth, but not vice versa.

Interventions have generally been embedded in the setting of structural causal models (SCMs), also known as structural equation models [21, 32]. These are a system of assignments for a set of random variables ordered by an associated “true causal graph,” which is generally assumed to be unknown. SCMs utilise the theory of graphical (Markov) models, which are statistical models over graphs with nodes as random variables and edges that indicate some types of conditional dependencies; see [17].

An axiomatic approach to interventions for SCMs, known as Pearl’s do-calculus [22], has been developed for identifiability of interventional distributions from the observational ones; see also [13, 31] for some further theoretical developments. There has also been a substantial amount of work on generalizing the concept of intervention from the case of directed acyclic graphs (DAGs) (i.e., Bayesian networks) to more general graphs containing bidirected edges (which indicate the existence of latent variables) (see, e.g., [36]), and directed cycles [3]. However, most of these attempts stay within the setting of SCMs or, at least, under the assumption that there exists an underlying “true causal graph” that somehow captures the causal relationships [35].

Interventions in the literature (i.e. on SCMs) have been defined to be of different forms; see [16, 10]. The type of intervention with which we are dealing here is hard in the sense that it destroys all the causes of the intervened variable, and is stochastic in the sense that it replaces the marginal distribution of the intervened variable with a new distribution; although, we will show in Remark 8 that an atomic (also called surgical) intervention (which forces the variable to have a specific value) can be easily adapted in this setting.

In this paper, without making any modeling assumptions such as those given in the setting of SCMs, we give simple conditions for a family of joint distributions $\mathcal{P}_{\mathrm{do}}=\{P_{\mathrm{do}(1)},\ldots,P_{\mathrm{do}(N)}\}$ to act as a well-behaved interventional family, so that one can think of $P_{\mathrm{do}(i)}$ as an interventional distribution on a single variable $X_{i}$, for each $i\in V=\{1,\ldots,N\}$ in a random vector $X_{V}$. As is apparent from the context, our approach here is aligned with the interventional approach to causality rather than the counterfactual approach.

Here, we are not providing an alternative to the current mainstream setting (sometimes called the Pearlian setting), in the case where it uses the interventional approach, and in most cases, SCMs, and which has led to extensive work on causal learning and estimation. We simply provide theoretical backing for this approach and generalize it beyond SCMs, by providing certain axioms in order to derive as results some of the assumptions that have been used in the literature (such as the existence of the causal graph and the global Markov property w.r.t. it).

This paper carries certain important messages: (1) There is no need to take the true causal graph as the primitive object: causal graph(s) can be formally defined and derived from interventional families (rather than posited). (2) The causal structure (and graph) can be solely derived from the family of interventional distributions; in other words, there is no need for an initial state, i.e., an underlying joint observational distribution $P$ of $X_{V}$, to be known for this purpose. However, we provide axioms such that the required consistency between the interventional family and the “underlying” observational distribution is satisfied when indeed the observational distribution is available, and such that one can measure the causal relationships. (3) To derive the causal structure or graph, (in most situations) one needs to rely only on single interventions, one at a time. This is an advantage, as much less information is used by relying only on single interventions. Admittedly, there are real-world situations in which one would like to consider intervening on several variables simultaneously; we believe a similar theory can be proposed in such cases.

We must emphasize that the work presented here is about axioms that interventional distributions should satisfy for the purpose of causal reasoning. These axioms should not be confused with a causal model whose goal is to provide “correct” interpretation of causal relationships and measuring their effects. This difference is quite subtle and could lead to confusion. Similarly, one should distinguish the “causal graphs” defined and derived here from a graph learned by structure learning from observational and, potentially, interventional data (as in, e.g., [32, 5]). The goal here is not structure learning.

1.1 Key results

One of our central assumptions, Axiom 1, is that cause is transitive; see [12] for a philosophical discussion on the transitivity of the cause. Under the condition of singleton-transitivity and simple assumptions on conditional independence structure of $\mathcal{P}_{\mathrm{do}}$, we show that the causal relations are transitive; see Theorem 10. We provide a definition of causation similar to that in [24]. The concept of direct cause is defined in terms of the conditional independence properties of the interventional family (which is a departure from the widely-known definition [35, page 55]), and from this we define the intervened causal graph, and using these, we define the causal graph; see Section 4.2; this is a major relaxation of assumptions from the current paradigm where it is assumed that such a causal graph exists. The obtained causal graph allows bidirected edges and directed cycles without double edges consisting of a bidirected edge and an arrow. We call this family of graphs bowless directed mixed graphs (BDMGs). The generated graph is the “true” causal graph under the axiomatization, and we show that the definitions related to causal relationships and the graphical notions on the graph are interchangeable (Theorem 20).

One of our main theorems is that, under some additional assumptions, namely intersection and composition, intervened distributions $P_{\mathrm{do}(i)}$ in the interventional family are Markovian to the defined intervened graphs; see Theorem 26. We provide additional axioms (Axioms 2 and 3) to relate $\mathcal{P}_{\mathrm{do}}$ to an observed distribution $P$, and call the interventional family (strongly) observable. We show that the underlying distribution $P$ for an observable interventional family is Markovian to the defined causal graph; see Theorem 28. Therefore, the established theory of causality using SCMs, which mainly relies on the Markov property of the joint distribution of the SCM, can be recovered from our theory.

We later provide additional axioms (Axioms 4 and 5) for the case of ancestral causal graphs, to define what we call quantifiable interventional families that allow for measuring causal effects. We show that the quantifiable interventional families are strongly observable; see Theorem 47.

We also compare and contrast our theory with the SCM setting. We show that, for SCMs with certain simple properties (which include transitivity of the cause and are implied by faithfulness), the family of interventions on each node constitutes a strongly observable interventional family, and (even without the transitivity assumption in the case of ancestral graphs) the causal graph generated by the theory presented in this paper is the same as the causal graph associated to the SCM; see Theorem 60.

Our theory is based only on intervening on single variables, one at a time. We clearly identify cases (which can only be non-maximal and non-ancestral) where this theory may misidentify some direct causes; see Section 9.

1.2 Related works

To the best of our knowledge, most of the attempts at abstracting intervention, or causality based on intervention in general, are substantially different from our approach; see for example [28] for a category-theory approach, and the more recent [20] for a measure-theoretic approach.

One such attempt is the seminal work of Dawid on the decision theoretic framework for causal inference; for example, see [8, 9]. Our approach shares the same concerns and spirit as Dawid’s in focusing on interventions rather than counterfactuals, as well as trying to justify the existence of the causal graph rather than assuming it, as we have heeded Dawid’s caution [6]. However, the mechanics of the two works are different. First of all, we have not provided any statistical model as Dawid does. In addition, the goal of Dawid’s work is mainly to enable “transfer of probabilistic information from an observational to an interventional setting,” whereas, here, our starting point is interventions. Finally, our approach also covers a much more general class of causal graphs than the DAGs considered by Dawid. We have not included influence diagrams, as proposed by Dawid, in our setting, but believe that it should be possible to derive them together with their conditional-independence constraints using our causal graphs and Markov properties.

A more similar approach to ours is that of Bareinboim, Brito, and Pearl [1]. Like us, the authors start with a family of interventional distributions with interventions defined on single variables. A difference is that they only work with atomic interventions, which requires alternative definitions to conditional independence (phrased as invariances); we believe this can be adapted to use stochastic interventions and regular conditional independence. Using this, they define the concept of direct cause, which is different from, but similar to, how we define this concept. A major difference is that, in their paper, they provide different notions of compatibility of the interventional family and causal graphs by assuming certain conditions that include the global Markov property. In our work, we do not assume the global Markov property; we generate a graph directly from the interventional family by using direct cause, and prove the Markov property under certain axioms and conditions. Another important difference is that their work relies on the notion of an observed initial state distribution to define interventions, whereas, as mentioned before, we do not need to rely on an observational distribution to derive the causal graph.

The mentioned paper is purely on DAGs, but in a more recent work [2] the method was generalized to include arcs representing latent variables. The notion of Markov property has been replaced by semi-Markov property to deal with this generalization, but the difference between the two methods remains the same as described for the original paper.

1.3 Structure of the manuscript

In the next section, we provide some preliminary material including novel results on the equivalence of the pairwise and global Markov property in our newly defined class of BDMGs, which are used in the subsequent sections. In Section 3, we define foundational concepts related to interventional families, and provide conditions for the axiom of transitivity of the cause. In Section 4, we define the concepts of direct cause, intervened causal graphs, and causal graphs, and prove the Markov property of the intervened distribution to the intervened causal graph. In Section 5, we prove the Markov property of the underlying distribution to the causal graph under the observable interventional axioms. In Section 6, we specialize the definitions and results for directed ancestral graphs. In Section 7, we provide additional axioms of quantifiable interventional families for the purpose of measuring causal effects. In Section 8, we show how intervention on SCMs fits within the framework of this paper. In Section 9, we provide cases where only single interventions may misidentify certain direct causes. We conclude the paper in Section 10. In the Appendix, we provide certain proofs of results presented in Section 2.4, including the proof of equivalence of the pairwise and global Markov property for BDMGs.

2 Preliminaries

In this section, we provide the basic concepts of graphical and causal modeling needed in the paper.

2.1 Conditional measures and independence

We will work in the following setting. Let $V$ be a finite set of size $N$. Let $P$ be a probability measure on the product measurable space $\mathcal{X}=\prod_{i\in V}\mathcal{X}_{i}$. For $A\subseteq V$, we let $\mathcal{X}_{A}=\prod_{i\in A}\mathcal{X}_{i}$ and $P^{A}$ be the marginal measure of $P$ on $\mathcal{X}_{A}$ given by

$$P^{A}(W)=P(W\times\mathcal{X}_{V\setminus A})$$

for all measurable $W\subseteq\mathcal{X}_{A}$. We will use the notation $i\perp\!\!\!\perp_{P}j$ to mean that the marginal $P^{\{i,j\}}=P^{i,j}$ is the product measure $P^{i}\otimes P^{j}$ on $\mathcal{X}_{i}\times\mathcal{X}_{j}$, so that if $X=(X_{1},\ldots,X_{N})$ is a random vector (defined on some probability space $(\Omega,\mathcal{F},\mathbb{P})$) taking values on $\mathcal{X}$ with law $P$, then $X_{i}$ is independent of $X_{j}$ [7, 17].

For $x_{A}\in\mathcal{X}_{A}$, we let $P(\cdot\,|\,x_{A})=\mathbb{P}(X\in\cdot\,|\,X_{A}=x_{A})$ denote a regular conditional probability [4] so that in particular, we have the disintegration

$$P(F)=\int_{\mathcal{X}_{A}}P(F\,|\,x_{A})\,dP^{A}(x_{A}),$$

for $F\subseteq\mathcal{X}$ measurable. More generally, consider disjoint subsets $A$, $B$, and $C$ of $V$. We will often consider the marginal of a conditional measure and, with a slight abuse of notation, write

$$P^{A}(\cdot\,|\,x_{C})=[P(\cdot\,|\,x_{C})]^{A}.$$

We write $A\perp\!\!\!\perp_{P}B\,|\,C$ to denote that the measure $P^{A,B}(\cdot\,|\,x_{C})=P^{A\cup B}(\cdot\,|\,x_{C})$ is a product measure on $\mathcal{X}_{A}\times\mathcal{X}_{B}$ for $P^{C}$-almost all $x_{C}\in\mathcal{X}_{C}$, so that if $X$ has law $P$, then $X_{A}$ is conditionally independent of $X_{B}$ given $X_{C}$. Sometimes, we will simply say that $A$ is conditionally independent of $B$ given $C$ in $P$. In addition, when independence fails, we write $A\not\perp\!\!\!\perp_{P}B\,|\,C$ and say that $A$ and $B$ are conditionally dependent given $C$ in $P$.
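As a concrete reading of this definition, consider the special case where every $\mathcal{X}_{i}$ is finite. This is an assumption made only for illustration and is not required by the text. Writing $p$ for the probability mass function of $P$ (a symbol introduced here), conditional independence reduces to the familiar factorization:

```latex
A \perp\!\!\!\perp_{P} B \,|\, C
\iff
p(x_A, x_B \,|\, x_C) = p(x_A \,|\, x_C)\, p(x_B \,|\, x_C)
\quad \text{for all } x_A,\ x_B \text{ and all } x_C \text{ with } p(x_C) > 0.
```

Equivalently, $p(x_A,x_B,x_C)\,p(x_C)=p(x_A,x_C)\,p(x_B,x_C)$ for all $x_A$, $x_B$, $x_C$, a form that avoids dividing by zero.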

2.2 Structural independence properties of a distribution

A probability distribution $P$ is always a semi-graphoid [21], i.e., it satisfies the following four properties for disjoint subsets $A$, $B$, $C$, and $D$ of $V$:

  1. $A\perp\!\!\!\perp_{P}B\,|\,C$ if and only if $B\perp\!\!\!\perp_{P}A\,|\,C$ (symmetry);

  2. if $A\perp\!\!\!\perp_{P}B\cup D\,|\,C$, then $A\perp\!\!\!\perp_{P}B\,|\,C$ and $A\perp\!\!\!\perp_{P}D\,|\,C$ (decomposition);

  3. if $A\perp\!\!\!\perp_{P}B\cup D\,|\,C$, then $A\perp\!\!\!\perp_{P}B\,|\,C\cup D$ and $A\perp\!\!\!\perp_{P}D\,|\,C\cup B$ (weak union);

  4. if $A\perp\!\!\!\perp_{P}B\,|\,C\cup D$ and $A\perp\!\!\!\perp_{P}D\,|\,C$, then $A\perp\!\!\!\perp_{P}B\cup D\,|\,C$ (contraction).

Notice that the reverse implication of contraction clearly holds by decomposition and weak union. We also use three further properties of conditional independence that are not always satisfied by probability distributions:

  5. if $A\perp\!\!\!\perp_{P}B\,|\,C\cup D$ and $A\perp\!\!\!\perp_{P}D\,|\,C\cup B$, then $A\perp\!\!\!\perp_{P}B\cup D\,|\,C$ (intersection);

  6. if $A\perp\!\!\!\perp_{P}B\,|\,C$ and $A\perp\!\!\!\perp_{P}D\,|\,C$, then $A\perp\!\!\!\perp_{P}B\cup D\,|\,C$ (composition);

  7. if $i\perp\!\!\!\perp_{P}j\,|\,C$ and $i\perp\!\!\!\perp_{P}j\,|\,C\cup\{k\}$, then $i\perp\!\!\!\perp_{P}k\,|\,C$ or $j\perp\!\!\!\perp_{P}k\,|\,C$ (singleton-transitivity),

where $i$, $j$, and $k$ are single elements. A semi-graphoid distribution that satisfies intersection is called graphoid. If the distribution $P$ is a regular multivariate Gaussian distribution, then $P$ is a singleton-transitive compositional graphoid; for example see [33] and [21]. If $P$ has strictly positive density, it is always a graphoid; see, for example, Proposition 3.1 in [17].
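To see concretely that composition is not automatic, the following minimal sketch checks conditional independence statements for a finite distribution by testing the product identity $p(a,b,c)\,p(c)=p(a,c)\,p(b,c)$. The helper names `marginal` and `is_cond_indep`, and the exclusive-or example (two independent fair bits and their XOR), are illustrative choices and do not come from the paper.

```python
from collections import defaultdict
from itertools import product


def marginal(pmf, idx):
    """Marginalize a joint pmf {outcome tuple: prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[tuple(x[i] for i in idx)] += p
    return dict(out)


def is_cond_indep(pmf, A, B, C=(), tol=1e-12):
    """Check A _||_ B | C for a finite pmf via p(a,b,c) * p(c) == p(a,c) * p(b,c)."""
    A, B, C = tuple(A), tuple(B), tuple(C)
    p_abc = marginal(pmf, A + B + C)
    p_ac = marginal(pmf, A + C)
    p_bc = marginal(pmf, B + C)
    p_c = marginal(pmf, C)
    a_vals = {k[:len(A)] for k in p_ac}
    b_vals = {k[:len(B)] for k in p_bc}
    for a, b, c in product(a_vals, b_vals, p_c):
        lhs = p_abc.get(a + b + c, 0.0) * p_c[c]
        rhs = p_ac.get(a + c, 0.0) * p_bc.get(b + c, 0.0)
        if abs(lhs - rhs) > tol:
            return False
    return True


# Coordinates: 0 = X, 1 = Y (independent fair bits), 2 = Z = X xor Y.
xor_pmf = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

z_indep_x = is_cond_indep(xor_pmf, [2], [0])       # Z _||_ X holds
z_indep_y = is_cond_indep(xor_pmf, [2], [1])       # Z _||_ Y holds
z_indep_xy = is_cond_indep(xor_pmf, [2], [0, 1])   # Z _||_ {X, Y} fails
print(z_indep_x, z_indep_y, z_indep_xy)
```

Here both premises of composition hold with $C=\emptyset$, yet the conclusion fails because $Z$ is a deterministic function of $(X,Y)$.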

Remark 1.

We note that if $P$ has full support over its state space, then it satisfies the intersection property; for a comprehensive discussion and necessary and sufficient conditions, see [23]. $\Diamond$

Finally, we define the concept of ordered stabilities [29]. We say that $P$ satisfies ordered upward- and downward-stability w.r.t. an order $\leq$ of $V$ if the following hold:

  • if $i\perp\!\!\!\perp_{P}j\,|\,C$, then $i\perp\!\!\!\perp_{P}j\,|\,C\cup\{k\}$ for every $k\in V\setminus\{i,j\}$ such that $i<k$ or $j<k$ (ordered upward-stability);

  • if $i\perp\!\!\!\perp_{P}j\,|\,C$, then $i\perp\!\!\!\perp_{P}j\,|\,C\setminus\{k\}$ for every $k\in V\setminus\{i,j\}$ such that $i\nless k$, $j\nless k$, and $\ell\nless k$ for every $\ell\in C\setminus\{k\}$ (ordered downward-stability).

2.3 Graphs and their properties

We usually refer to a graph as an ordered pair $G=(V,E)$, where $V$ is the node set and $E$ is the edge set. When nodes $i$ and $j$ are the endpoints of an edge, we call them adjacent, and write $i\sim j$, and otherwise $i\nsim j$.

We consider two types of edges: arrows ($i\to j$) and bidirected edges or arcs ($i\leftrightarrow j$). We do not consider graphs that also contain the third type of edge: undirected edges or lines ($i\mathrel{-}j$). We only allow for the possibility of multiple edges between nodes when they are arrows in two different directions between $i$ and $j$, i.e., $i\to j$ and $i\leftarrow j$, which we call parallel arrows. This means that we do not allow bows, i.e., a multiple edge consisting of an arrow and an arc, to appear in the graph.

A subgraph of a graph $G_{1}$ is a graph $G_{2}$ such that $V(G_{2})\subseteq V(G_{1})$ and $E(G_{2})\subseteq E(G_{1})$ and the assignment of endpoints to edges in $G_{2}$ is the same as in $G_{1}$. An induced subgraph by nodes $A\subseteq V$ is the subgraph that contains all and only nodes in $A$ and all edges between two nodes in $A$.

A walk is a list $\langle v_{0},e_{1},v_{1},\dots,e_{k},v_{k}\rangle$ of nodes and edges such that for $1\leq i\leq k$, the edge $e_{i}$ has endpoints $v_{i-1}$ and $v_{i}$. A path is a walk with no repeated node or edge. When we define a path, we only write the nodes (and not the edges). A maximal set of nodes in a graph whose members are connected by some paths constitutes a connected component of the graph. A cycle is a walk with no repeated nodes or edges except for $v_{0}=v_{k}$.

We call the first and the last nodes endpoints of the path and all other nodes inner nodes. A path can also be seen as a certain type of connected subgraph of $G$; a subpath of a path $\pi$ is an induced connected subgraph of $\pi$. For an arrow $j\to i$, we say that the arrow is from $j$ to $i$. We also call $j$ a parent of $i$, $i$ a child of $j$ and we use the notation $\mathrm{pa}(i)$ for the set of all parents of $i$ in the graph. In the cases of $i\to j$ or $i\leftrightarrow j$ we say that there is an arrowhead at $j$ or pointing to $j$. A path $\langle i=i_{0},i_{1},\dots,i_{n}=j\rangle$ (or a cycle where $i=j$) is directed from $i$ to $j$ if all $i_{k}i_{k+1}$ edges are arrows pointing from $i_{k}$ to $i_{k+1}$. If there is a directed path from $i$ to $j$, then node $i$ is an ancestor of $j$ and $j$ is a descendant of $i$. We denote the set of ancestors of $j$ by $\mathrm{an}(j)$; unlike some authors, we do not allow $j\in\mathrm{an}(j)$. Similarly, we define the set of ancestors of a set of nodes $A\subset V$ by $\mathrm{an}(A):=[\bigcup_{j\in A}\mathrm{an}(j)]\setminus A$. If necessary, we might write $\mathrm{an}_{G}$ to specify that this is the set of ancestors in $G$.

A strongly connected component of a graph is the set of nodes that are mutually ancestors of each other, or it is a single node if that node does not belong to any directed cycle. It can be observed that nodes of the graph are partitioned into strongly connected components. We denote the members of the strongly connected component containing node $i$ by $\mathrm{sc}(i)$.
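The ancestor and strongly-connected-component notions above can be computed directly from the list of arrows. The sketch below is a minimal illustration under the conventions of this section (a node is not its own ancestor, and ancestors of a set exclude the set itself). The function names and the small example graph, with arrows $1\to 2$, $2\to 3$, $3\to 2$, $3\to 4$, are hypothetical; arcs are omitted because they play no role in ancestry.

```python
def ancestors(arrows, j):
    """Nodes with a directed path to j, excluding j itself."""
    parents = {}
    for a, b in arrows:
        parents.setdefault(b, set()).add(a)
    seen, stack = set(), [j]
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    seen.discard(j)  # j may be reached again through a directed cycle
    return seen


def ancestors_of_set(arrows, A):
    """an(A): union of an(j) over j in A, with A itself removed."""
    A = set(A)
    return set().union(*(ancestors(arrows, j) for j in A)) - A


def strongly_connected(arrows, i):
    """sc(i): i together with all nodes that are mutually ancestors with i."""
    return {i} | {k for k in ancestors(arrows, i) if i in ancestors(arrows, k)}


# Hypothetical example with a directed cycle between nodes 2 and 3.
example_arrows = [(1, 2), (2, 3), (3, 2), (3, 4)]
print(ancestors(example_arrows, 4))
print(strongly_connected(example_arrows, 2))
print(ancestors_of_set(example_arrows, {2, 3}))
```

Recomputing ancestors for each candidate node is quadratic in the worst case, which is acceptable for small illustrative graphs; a dedicated strongly-connected-components algorithm would be preferable at scale.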

A tripath is a path with three nodes. The inner node $t$ in each of the three tripaths

$$i\to t\leftarrow j,\quad i\leftrightarrow t\leftarrow j,\quad i\leftrightarrow t\leftrightarrow j$$

is a collider (or a collider node) and the inner node of any other tripath is a non-collider (or a non-collider node) on the tripath or, more generally, on any path of which the tripath is a subpath; i.e. a node is a collider if two arrowheads meet. A path is called a collider path if all its inner nodes are colliders.

The most general class of graphs that naturally arises from the theory presented in this paper is what we call the bowless directed mixed graph (BDMG), which consists of arrows and arcs, and the only multiple edges are parallel arrows (i.e., they are bowless).

Ancestral graphs [27] are graphs with arcs and arrows with no directed cycles and no arcs $ij$ such that $i\in\mathrm{an}(j)$. Acyclic directed mixed graphs (ADMGs) [26] are graphs with arcs and arrows with no directed cycles. In other words, BDMGs unify ADMGs without bows and directed cycles. BDMGs also trivially contain the class of directed ancestral graphs, i.e., ancestral graphs [27] without lines. These all also contain directed acyclic graphs (DAGs) [14], which are graphs with only arrows and no directed cycles.

The class of BDMGs is a subclass of directed mixed graphs, introduced in [3], which is a very general class of graphs with arrows and arcs that allow for directed cycles. Later on, we will use some definitions and results originally defined for directed mixed graphs.

2.4 Markov properties

In this paper, we will need global and pairwise Markov properties for BDMGs. In order to introduce the global Markov property, we need to define the concept of $\sigma$-separation for directed mixed graphs (in fact, originally defined for the larger class of directed graphs with hyperedges in [11]).

A path $\pi=\langle i=i_{0},i_{1},\cdots,i_{n}=j\rangle$ is said to be $\sigma$-connecting given $C$, which is disjoint from $i,j$, if all its collider nodes are in $C\cup\mathrm{an}(C)$; and all its non-collider nodes $i_{r}$ are either outside $C$, or if there is an arrowhead at $i_{r-1}$, then $i_{r-1}\in\mathrm{sc}(i_{r})$ and if there is an arrowhead at $i_{r+1}$ on $\pi$ then $i_{r+1}\in\mathrm{sc}(i_{r})$. For disjoint subsets $A,B,C$ of $V$, we say that $A$ and $B$ are $\sigma$-separated given $C$, and write $A\perp_{\sigma}B\,|\,C$, if there are no $\sigma$-connecting paths between $A$ and $B$ given $C$.

In the case where there are no directed cycles in the graph, $\sigma$-separation reduces to the $m$-separation of [27]; recall that $\pi$ is $m$-connecting given $C$ if all its collider nodes are in $C\cup\mathrm{an}(C)$; and all its non-collider nodes are outside $C$. In addition, if there are no arcs in the graph, i.e., the graph is a DAG, it reduces to the well-known $d$-separation [21].
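For small graphs without directed cycles, the $m$-connecting condition recalled above can be checked by brute force: enumerate paths and test each inner node as a collider or non-collider. The sketch below does this under the assumption that the input graph is acyclic and has at most one edge between any two nodes; it does not implement $\sigma$-separation for cyclic graphs. The function name `m_separated` and the example graph ($1\to 3\leftarrow 2$, $3\to 4$, $4\leftrightarrow 5$) are illustrative, not taken from the paper.

```python
def m_separated(arrows, arcs, A, B, C):
    """Brute-force m-separation check for a small graph without directed cycles."""
    arrow_set = set(arrows)
    arc_set = {frozenset(e) for e in arcs}
    C = set(C)

    nbrs = {}
    for u, v in list(arrows) + list(arcs):
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    def head_at(u, v):
        """True if the edge between u and v has an arrowhead at v."""
        return (u, v) in arrow_set or frozenset((u, v)) in arc_set

    # C together with all ancestors of its members.
    c_anc, stack = set(C), list(C)
    while stack:
        v = stack.pop()
        for p, q in arrow_set:
            if q == v and p not in c_anc:
                c_anc.add(p)
                stack.append(p)

    def open_at(p, t, n):
        """Is inner node t, entered from p and left towards n, unblocked?"""
        collider = head_at(p, t) and head_at(n, t)
        return (t in c_anc) if collider else (t not in C)

    def connects(path, target):
        """Depth-first search for an m-connecting path extending `path`."""
        last = path[-1]
        if last == target:
            return True
        for n in nbrs.get(last, ()):
            if n in path:
                continue
            if len(path) >= 2 and not open_at(path[-2], last, n):
                continue
            if connects(path + [n], target):
                return True
        return False

    return not any(connects([a], b) for a in A for b in B)


# Hypothetical example graph.
g_arrows = [(1, 3), (2, 3), (3, 4)]
g_arcs = [(4, 5)]
print(m_separated(g_arrows, g_arcs, {1}, {2}, set()))  # collider 3 blocks the path
print(m_separated(g_arrows, g_arcs, {1}, {2}, {4}))    # conditioning on a descendant of 3
```

Pruning during the search is exact here because whether an inner node is open depends only on its two adjacent edges on the path.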

We call two graphs Markov equivalent if they induce the same set of conditional separations.

A probability distribution $P$ defined over $V$ satisfies the global Markov property w.r.t. a bowless directed mixed graph $G$, or is simply Markovian to $G$, if for disjoint subsets $A$, $B$, and $C$ of $V$, we have

$$A\perp_{\sigma}B\,|\,C\implies A\perp\!\!\!\perp B\,|\,C.$$

If $G$ is an ancestral graph (or a DAG), then $\perp_{\sigma}$ will be replaced by $\perp_{m}$ (or $\perp_{d}$) in the definition of the global Markov property.

If, in addition to the global Markov property, the other direction of the implication holds, i.e., $A\perp_{\sigma}B\,|\,C\iff A\perp\!\!\!\perp B\,|\,C$, then we say that $P$ and $G$ are faithful. A weaker condition of adjacency-faithfulness [25, 37] states that for every edge between $k$ and $j$ in $G$, there are no independence statements $k\perp\!\!\!\perp_{P}j\,|\,C$ for any $C$.

We say that a distribution $P$ satisfies the pairwise Markov property (PMP) w.r.t. a bowless directed mixed graph $G$, if for every pair of non-adjacent nodes $i,j$ in $G$, we have

$$i\perp\!\!\!\perp_{P}j\,|\,\mathrm{an}(\{i,j\}).\tag{PMP}$$

This is the same wording as that of the pairwise Markov property for the subclass of ancestral graphs; see [19].
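The statements required by the pairwise Markov property can be listed mechanically: for each non-adjacent pair, the conditioning set is the set of ancestors of the pair. The following sketch enumerates them for a small hypothetical BDMG with arrows $1\to 2$, $2\to 3$, $3\to 2$, $3\to 4$ and arc $1\leftrightarrow 4$. The function names are illustrative, and the sketch only lists the statements; it does not test them against a distribution.

```python
from itertools import combinations


def ancestors(arrows, j):
    """Nodes with a directed path to j, excluding j itself."""
    parents = {}
    for a, b in arrows:
        parents.setdefault(b, set()).add(a)
    seen, stack = set(), [j]
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    seen.discard(j)
    return seen


def pmp_statements(nodes, arrows, arcs):
    """Return (i, j, an({i, j})) for every non-adjacent pair i < j."""
    adjacent = {frozenset(e) for e in list(arrows) + list(arcs)}
    out = []
    for i, j in combinations(sorted(nodes), 2):
        if frozenset((i, j)) not in adjacent:
            cond = (ancestors(arrows, i) | ancestors(arrows, j)) - {i, j}
            out.append((i, j, sorted(cond)))
    return out


# Hypothetical bowless directed mixed graph with one directed cycle.
pmp_nodes = [1, 2, 3, 4]
pmp_arrows = [(1, 2), (2, 3), (3, 2), (3, 4)]
pmp_arcs = [(1, 4)]
for i, j, cond in pmp_statements(pmp_nodes, pmp_arrows, pmp_arcs):
    print(f"{i} indep {j} given {cond}")
```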

We prove the equivalence of the pairwise and global Markov properties, which shall be used later for causal graphs and Theorem 28. The proofs are presented in the Appendix as they are not the main focus of this manuscript.

Theorem 2.

Let $G$ be a BDMG, and $P$ satisfy the intersection and composition properties. If $P$ satisfies the pairwise Markov property (PMP) with respect to $G$, then $P$ is Markovian to $G$.

We also define the converse of the pairwise Markov property. We say that $P$ satisfies the converse pairwise Markov property w.r.t. $G$ if an edge between $i$ and $j$ in $G$ implies that

$$i\not\perp\!\!\!\perp_{P}j\,|\,\mathrm{an}(\{i,j\}).$$

Notice that faithfulness and adjacency-faithfulness of $P_{\mathcal{C}}$ and $G_{\mathcal{C}}$ imply the converse pairwise Markov property; see [30].

A graph is called maximal if the absence of an edge between $i$ and $j$ corresponds to a conditional separation statement for $i$ and $j$, i.e., there exists some $C$ for which a statement of the form $i\perp j\,|\,C$ holds. Notice from the definition of $\sigma$-separation that graphs with chordless directed cycles (i.e., having two non-adjacent nodes in a cycle) are not maximal.

We have the following corresponding converse to Theorem 2.

Proposition 3.

Let $G$ be a maximal BDMG. If $P$ is Markovian to $G$, then $P$ satisfies the pairwise Markov property (PMP) with respect to $G$.

We call a non-adjacent pair of nodes which cannot be $\sigma$-separated in a non-maximal graph, regardless of what to condition on, an inseparable pair, and a non-adjacent pair of nodes which can be $\sigma$-separated in a maximal or non-maximal graph a separable pair.

For BDMGs, we define a primitive inducing path (PIP) to be a path $\langle i,q_{1},\cdots,q_{r},j\rangle$ (with at least $3$ nodes) between $i$ and $j$, where

  (i) all edges $q_{m}q_{m+1}$ are either arcs or an arrow where $q_{m}\in\mathrm{sc}(q_{m+1})$ except for the first and last edges, which may be $i\to q_{1}$ or $q_{r}\leftarrow j$ (without being in the same connected component);

  (ii) for all inner nodes, we have $q_{m}\in\mathrm{an}(\{i,j\})$, i.e., they are ancestors of $i$ or $j$.

PIPs were originally defined for the case of ancestral graphs (where they were allowed to be an edge) [27]. We show in the Appendix the result below:

Proposition 4.

In a BDMG, inseparable pairs are connected by PIPs.

In addition, notice from Proposition 3 that if $P$ is Markovian to $G$, then a pair $i,j$ being a separable pair is equivalent to the separation $i\perp_{\sigma}j\,|\,\mathrm{an}(\{i,j\})$.

We say that a graph $G=(V,E)$ admits a valid order $\leq$ if for nodes $i$ and $j$ of $G$, $i\to j$ implies that $i>j$; and $i\leftrightarrow j$ implies that $i$ and $j$ are incomparable. Notice that this specifies the partial order via its cover relations. In fact, this order is used as the order w.r.t. which ordered upward- and downward-stability hold for graph separations; see [29].

Finally, for ancestral graphs, the mm-separation holds for every set “between” parents and ancestors. We will use this fact later:

Lemma 5.

For an ancestral graph and for separable nodes i,ji,j, we have

imj|A,i\,\mbox{$\perp$}\,_{m}j\,|\,A,

for every AA such that pa({i,j})Aan({i,j})\mathrm{pa}(\{i,j\})\subseteq A\subseteq\mathrm{an}(\{i,j\}).

Proof.

That the set $\mathrm{pa}(i)$ separates $i$ and $j$ follows from the fact that it is the Markov blanket; see [26]. The fact that $m$-separation satisfies ordered upward-stability w.r.t. the orderings associated with ancestral graphs [29] implies that $i \perp_{m} j\,|\,\mathrm{pa}(\{i,j\})$, and consequently, $i \perp_{m} j\,|\,A$. ∎

2.5 Structural causal models

The theory in this paper does not use or assume structural causal models (SCMs), also known as structural equation models [22, 32]. We define SCMs here because they are an interesting special case of our theory, for which intervention can be easily conceptualized.

Here, we define SCMs for the class of BDMGs as a simplified version of SCMs defined for directed (mixed) graphs in [3]. Consider a graph GG with NN nodes, which in the context of causal inference is often referred to as the “true causal graph.” A structural causal model \mathfrak{C} associated with GG is defined as a collection of NN equations

Xi=ϕi(XpaG(i),ϵi),i{1,,N},X_{i}=\phi_{i}(X_{\mathrm{pa}_{G}(i)},\epsilon_{i}),\hskip 6.99997pti\in\{1,\dots,N\},

where paG(i)\mathrm{pa}_{G}(i) is defined on GG and ϵi\epsilon_{i} might be called noises; for any subsets AA and BB, we require that ϵA ϵB\epsilon_{A}\mbox{$\>\perp\perp$ }\epsilon_{B} if and only if, in GG, there is no arc between any node in AA and any node in BB. In this paper, we usually refer to an SCM as 𝒞\mathcal{C} and its joint distribution as P𝒞P_{\mathcal{C}}.

In the more widely used case where $G$ is a DAG, all the $\epsilon_{i}$ are jointly independent. For both mathematical and causal discussions on SCMs with DAGs, see [24]. When directed cycles exist, some solvability conditions are required in order for the theory of SCMs to work properly; for this and for a more general discussion, see [3].

Standard interventions are defined quite naturally when functional equations are specified, as in the case of SCMs: By intervening on XiX_{i} we replace the equation associated to XiX_{i} by Xi=X~iX_{i}=\tilde{X}_{i}, where X~i\tilde{X}_{i} is independent of all other noises; it is not necessary that X~i\tilde{X}_{i} has the same distribution as XiX_{i}. We are concerned with a similar type of intervention in this paper – this is a special case of the so-called stochastic intervention [16], where some parental set of XiX_{i} might still exist after intervention on XiX_{i}; see also [24]. A more special type of intervention is called perfect intervention (or surgical intervention), where it puts a point mass on a real value aa – this is the original idea of do-calculus [22], and is often denoted by do(Xi=a)\mathrm{do}(X_{i}=a).
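As an illustration of the standard intervention described above, the following minimal sketch enumerates the exact joint distribution of a small acyclic SCM with finite noises, before and after replacing one structural equation by an independent variable. The chain $X_1 \to X_2 \to X_3$, the noise distributions, and the helper names `scm_joint` and `intervene` are hypothetical choices made for this example only; they are not part of the paper.

```python
from fractions import Fraction
from itertools import product

def scm_joint(equations, noises):
    """Exact joint pmf of an acyclic SCM with finite, jointly independent noises.

    equations: list of (name, parents, phi) in a topological order;
               phi(parent_values, eps) returns the value of X_name.
    noises:    dict name -> {eps_value: probability}.
    """
    names = [name for name, _, _ in equations]
    pmf = {}
    for draw in product(*(noises[n].items() for n in names)):
        prob, values = Fraction(1), {}
        for (name, parents, phi), (eps, p) in zip(equations, draw):
            prob *= p
            values[name] = phi(tuple(values[q] for q in parents), eps)
        x = tuple(values[n] for n in names)
        pmf[x] = pmf.get(x, Fraction(0)) + prob
    return pmf

def intervene(equations, noises, target, new_dist):
    """Standard intervention: replace the equation of `target` by X = X~,
    where X~ has distribution `new_dist` and is independent of other noises."""
    new_eqs = [(n, (), lambda _pa, eps: eps) if n == target else (n, pa, phi)
               for n, pa, phi in equations]
    return new_eqs, {**noises, target: new_dist}

# Hypothetical binary chain X1 -> X2 -> X3 (an assumption for illustration).
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
flip = {0: Fraction(3, 4), 1: Fraction(1, 4)}  # noise that may flip the parent
equations = [
    ("X1", (), lambda pa, eps: eps),
    ("X2", ("X1",), lambda pa, eps: pa[0] ^ eps),
    ("X3", ("X2",), lambda pa, eps: pa[0] ^ eps),
]
noises = {"X1": coin, "X2": flip, "X3": flip}

P = scm_joint(equations, noises)                              # observational
P_do2 = scm_joint(*intervene(equations, noises, "X2", coin))  # after do(2)
```

In `P_do2` the variables $X_1$ and $X_2$ are independent, while $X_3$ still depends on $X_2$ through its unchanged equation.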

An important result for SCMs, which facilitates causal inference using them, is that the joint distribution of an SCM is Markovian to its associated graph; see [34, 22] for the case of DAGs, [30] for directed ancestral graphs, and [3] for directed mixed graphs.

3 Interventional family of distributions

3.1 Interventional families and the cause

Again, let VV be a finite set of size NN. Consider a family of distributions 𝒫do={Pdo(i)}iV\mathcal{P}_{\mathrm{do}}=\{P_{\mathrm{do}(i)}\}_{i\in V}, where each Pdo(i)P_{\mathrm{do}(i)} is defined over the same state space 𝒳=iV𝒳i\mathcal{X}=\prod_{i\in V}\mathcal{X}_{i}. We refer to 𝒫do\mathcal{P}_{\mathrm{do}} as an interventional family (of distributions). For (X~j)jV(\tilde{X}_{j})_{j\in V} a random vector with distribution Pdo(i)P_{\mathrm{do}(i)}, we think of Pdo(i)P_{\mathrm{do}(i)} as the interventional distribution after intervening on some variable XiX_{i}.

For kVk\in V, we define the set of the causes of kk as

cause𝒫do(k)=cause(k):={i:ik,i Pdo(i)k};{\rm{cause}}^{\mathcal{P}_{\mathrm{do}}}(k)=\mathrm{cause}({k}):=\{i:\hskip 3.50006pti\neq k,\hskip 3.50006pti\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k\};

we rarely have to consider two different interventional families at once, so the superscript is usually dropped. Thus, if $i\in\mathrm{cause}(k)$, then, for $(\tilde{X}_{j})_{j\in V}$ with distribution $P_{\mathrm{do}(i)}$, we have that $\tilde{X}_{k}$ is dependent on $\tilde{X}_{i}$. Notice that, by convention, $k\notin\mathrm{cause}(k)$. For a subset $A\subseteq V$, we define $\mathrm{cause}(A)=(\bigcup_{k\in A}\mathrm{cause}(k))\setminus A$.

The definition of the cause is identical to what is known in the literature as the “existence of the total causal effect” (see, e.g., Definition 6.12 in [24]). Its combination with $\mathcal{P}_{\mathrm{do}}$ meets the intuition behind cause and intervention: after intervention on a variable $X_{i}$, it is dependent on a variable $X_{j}$ if and only if it is a cause of that variable.

The above setting is compatible with the well-known intervention for SCMs; for a comprehensive discussion on this, see Section 8. We will often illustrate our theory with simple examples of SCMs and standard intervention on a single node.

We can also define the set of effects of ii denoted by eff(i)\mathrm{eff}({i}) by icause(k)keff(i)i\in\mathrm{cause}({k})\iff k\in\mathrm{eff}({i}). We take note of the following useful fact.

Remark 6.

From the definition of cause, we have eff(i)={k:ik,i Pdo(i)k}.\mathrm{eff}({i})=\{k:i\neq k,\hskip 3.50006pti\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k\}. \Diamond
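For finite state spaces, the sets $\mathrm{cause}(k)$ and $\mathrm{eff}(i)$ can be read off directly from the family by exact pairwise independence checks. The sketch below assumes each $P_{\mathrm{do}(i)}$ is stored as a dictionary from outcome tuples to exact probabilities; the function names and the two-variable family are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def marginal(pmf, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[t] for t in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out

def independent(pmf, i, k):
    """Exact check that coordinates i and k are independent under pmf."""
    pi, pk, pik = marginal(pmf, (i,)), marginal(pmf, (k,)), marginal(pmf, (i, k))
    return all(pik.get((a, b), 0) == pa * pb
               for (a,), pa in pi.items() for (b,), pb in pk.items())

def causes(family, k):
    """cause(k): all i != k that are dependent on k under P_do(i)."""
    return {i for i, pmf in family.items()
            if i != k and not independent(pmf, i, k)}

def effects(family, i):
    """eff(i), using the characterization in Remark 6."""
    return {k for k in family if k != i and not independent(family[i], i, k)}

# Hypothetical family on V = {0, 1}: under do(0), X1 copies X0;
# under do(1), the two coordinates are independent fair coins.
half, quarter = Fraction(1, 2), Fraction(1, 4)
family = {
    0: {(0, 0): half, (1, 1): half},
    1: {(0, 0): quarter, (0, 1): quarter, (1, 0): quarter, (1, 1): quarter},
}
```

Here `causes(family, 1)` contains `0`, while `causes(family, 0)` is empty, so node 0 is a cause of node 1 but not conversely.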

Remark 7 (Interventional families with the same cause).

Interventional families are defined with one joint distribution per intervention. This can be considered an advantage, as not all interventions have to follow a single causal graph. Here, we provide an immediate condition for interventional families to define the same set of causes. After causal graphs have been defined, in Section 5.3, we provide conditions for families of distributions that lead to the same causal graph.

Consider the interventional families 𝒫do={Pdo(j)}jV\mathcal{P}_{\mathrm{do}}=\{P_{\mathrm{do}(j)}\}_{j\in V} and 𝒬do={Qdo(j)}jV\mathcal{Q}_{\mathrm{do}}=\{Q_{\mathrm{do}(j)}\}_{j\in V} over the same state space 𝒳\mathcal{X}. Causes and effects depend on the interventional family, and, by Remark 6, it follows that, for all i,kVi,k\in V,

[i Pdo(i)ki Qdo(i)k] if and only if cause𝒫do()=cause𝒬do(),\Big{[}i\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k\iff i\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{Q_{\mathrm{do}(i)}}k\Big{]}\text{ if and only if }{\rm{cause}}^{\mathcal{P}_{\mathrm{do}}}(\ell)={\rm{cause}}^{\mathcal{Q}_{\mathrm{do}}}(\ell),

for all V\ell\in V, in which case we say they have the same causes. In the case of SCMs, under standard interventions, the causes are usually invariant with respect to the choice of distribution, except for technical counterexamples; see Remark 52 and Example 42.

Consider a fixed iVi\in V, and suppose that measures PiP_{i} and QiQ_{i} have the same null sets. Notice that the equality

Qdo(i)(|xi)=Pdo(i)(|xi)Q_{\mathrm{do}(i)}(\cdot\,|\,x_{i})=P_{\mathrm{do}(i)}(\cdot\,|\,x_{i})

for almost every $x_{i}\in\mathcal{X}_{i}$, is sufficient for the effects of $i$ to be the same in both families. Moreover, with the disintegration

dPdo(i)(x)=dPdo(i)(xV{i}|xi)dPdo(i)i(xi)dP_{\mathrm{do}(i)}(x)=dP_{\mathrm{do}(i)}(x_{V\setminus\left\{{i}\right\}}\,|\,x_{i})dP_{\mathrm{do}(i)}^{i}(x_{i})

we see that effects of ii depend only on the corresponding conditional distribution, and are invariant under the marginal distributions on ii with the same null sets. \Diamond

Remark 8 (Atomic interventions).

Observe that the dependence

i Pdo(i)ki\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k

is equivalent to the existence of disjoint measurable subsets W,W𝒳iW^{*},W^{**}\subset\mathcal{X}_{i} of positive measures under Pdo(i)iP_{\mathrm{do}(i)}^{i} satisfying the inequality

Pdo(i)k(|xi)Pdo(i)k(|xi),P_{\mathrm{do}(i)}^{k}(\cdot\,|\,x_{i}^{*})\neq P_{\mathrm{do}(i)}^{k}(\cdot\,|\,x_{i}^{**}), (1)

for all $x_{i}^{*}\in W^{*}$ and all $x_{i}^{**}\in W^{**}$. Since $P_{\mathrm{do}(i)}(\cdot\,|\,x_{i}^{*})$ and $P_{\mathrm{do}(i)}(\cdot\,|\,x_{i}^{**})$ are probability measures on $\mathcal{X}_{V\setminus\{i\}}$, they can be thought of as atomic interventions on $i$, where the values at $i$ are fixed at $x_{i}^{*}$ and $x_{i}^{**}$, respectively. Thus inequality (1) has the interpretation that $i$ is a cause of $k$ if and only if there exist atomic interventions that witness an effect on $k$; that is, as a function of $x_{i}$, the conditional probability $P_{\mathrm{do}(i)}^{k}(\cdot\,|\,x_{i})$ is non-constant.

From Remark 7, without loss of generality, given atomic interventions, Ado(xi)A_{\mathrm{do}(x_{i})}, which are measures on 𝒳V{i}\mathcal{X}_{V\setminus\left\{{i}\right\}} indexed by xi𝒳ix_{i}\in\mathcal{X}_{i}, we can extend these to an intervention, defined on the complete space, 𝒳\mathcal{X}, via the disintegration

dPdo(i)(x):=dAdo(xi)(xV{i})dR(xi),d{P}_{\mathrm{do}(i)}(x):=dA_{\mathrm{do}(x_{i})}(x_{V\setminus\left\{{i}\right\}})dR(x_{i}),

where $R$ is a suitably chosen probability measure on $\mathcal{X}_{i}$. Specifically, in the case where $\mathcal{X}_{i}$ is finite, $R$ can be taken to be a uniform measure on $\mathcal{X}_{i}$. \Diamond
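In the finite case, the extension of atomic interventions to a single $P_{\mathrm{do}(i)}$ via a uniform $R$ can be sketched as follows. The data layout (a dictionary from each value $x_i$ to a pmf over the remaining coordinates) and the names `extend_atomic` and `witnesses_effect` are assumptions made for this illustration.

```python
from fractions import Fraction

def extend_atomic(atomic, i):
    """Glue atomic interventions A_do(x_i) into one pmf P_do(i) on the full space.

    atomic: dict x_i -> {rest: probability}, where `rest` lists the values of
            the other coordinates in increasing index order.
    R is taken to be uniform on the (finite) set of keys of `atomic`.
    """
    r = Fraction(1, len(atomic))
    pmf = {}
    for xi, a in atomic.items():
        for rest, p in a.items():
            x = rest[:i] + (xi,) + rest[i:]
            pmf[x] = pmf.get(x, Fraction(0)) + r * p
    return pmf

def witnesses_effect(atomic, i, k):
    """True iff the k-marginal of A_do(x_i) is non-constant in x_i."""
    pos = k if k < i else k - 1  # position of coordinate k inside `rest`
    marginals = []
    for a in atomic.values():
        m = {}
        for rest, p in a.items():
            if p:  # ignore zero-probability entries
                m[rest[pos]] = m.get(rest[pos], Fraction(0)) + p
        marginals.append(m)
    return any(m != marginals[0] for m in marginals)

# Hypothetical atomic interventions on node 0 with three binary variables:
# X1 copies the fixed value of X0, and X2 is an independent fair coin.
half = Fraction(1, 2)
atomic = {
    0: {(0, 0): half, (0, 1): half},
    1: {(1, 0): half, (1, 1): half},
}
P_do0 = extend_atomic(atomic, 0)
```

With this input, `witnesses_effect(atomic, 0, 1)` holds and `witnesses_effect(atomic, 0, 2)` does not, matching the reading of inequality (1) as a non-constant conditional distribution.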

We call a subset SVS\subseteq V a causal cycle if for every i,kSi,k\in S, we have kcause(i)eff(i)k\in\mathrm{cause}({i})\cap\mathrm{eff}({i}). We write cc(i)\mathrm{cc}({i}) to denote the causal cycle containing ii.

Under the composition property, we have the following independence. Let neff(i):=Veff(i)\mathrm{neff}({i}):=V\setminus\mathrm{eff}({i}) denote the subset of VV that contains members that are not an effect of ii.

Proposition 9 (Non-effects under composition).

Let 𝒫do\mathcal{P}_{\mathrm{do}} be a family of distributions. If Pdo(i)P_{\mathrm{do}(i)} satisfies the composition property, then

i Pdo(i)neff(i).i\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}\mathrm{neff}({i}).
Proof.

Let kneff(i)k\in\mathrm{neff}({i}). Since ii is not a cause of kk, by definition, i Pdo(i)ki\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}k; iteratively applying the composition property, we obtain the desired result. ∎

3.2 Transitive interventional families

We now say that 𝒫do\mathcal{P}_{\mathrm{do}} is a transitive interventional family if the following axiom holds.

Axiom 1 (Transitivity of cause).

For distinct i,j,kVi,j,k\in V, if icause(j)i\in\mathrm{cause}({j}) and jcause(k)j\in\mathrm{cause}({k}), then icause(k)i\in\mathrm{cause}({k}).

Notice also that Pdo(i)P_{\mathrm{do}(i)} in transitive interventional families are restricted: Axiom 1 places constraints between different Pdo(i)P_{\mathrm{do}(i)} since cause(k)\mathrm{cause}({k}) depends on all Pdo(i)P_{\mathrm{do}(i)}.

Under singleton-transitivity, we have sufficient conditions for Axiom 1 to hold. Notice that these conditions are not satisfied in general, even in the case of an SCM with standard interventions.

Theorem 10 (Transitivity of cause under singleton-transitivity).

Let 𝒫do\mathcal{P}_{\mathrm{do}} be an interventional family whose members Pdo(i)P_{\mathrm{do}(i)} satisfy singleton-transitivity. Assume, for distinct i,j,kVi,j,k\in V such that icause(k)i\notin\mathrm{cause}({k}) and jcause(k)j\in\mathrm{cause}({k}), we have:

  1. (a)

    i Pdo(i)k|ji\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}k\,|\,j; and

  2. (b)

    j Pdo(i)k.j\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k.

Then 𝒫do\mathcal{P}_{\mathrm{do}} is a transitive interventional family.

Proof.

Towards a contradiction, assume that icause(j)i\in\mathrm{cause}({j}) and jcause(k)j\in\mathrm{cause}({k}), but icause(k)i\notin\mathrm{cause}({k}). Since jcause(k)j\in\mathrm{cause}({k}), we have, by (b), that j Pdo(i)kj\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k. Also, since icause(j)i\in\mathrm{cause}({j}), we have i Pdo(i)ji\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}j.

The dependencies j Pdo(i)kj\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k and i Pdo(i)ji\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}j, along with singleton-transitivity, in its contrapositive form, imply that i Pdo(i)ki\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k or i Pdo(i)k|ji\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k\,|\,j; however, the former is ruled out by assumption, and the latter contradicts (a). ∎

Proposition 11.

Transitivity of the cause induces a strict preordering \lesssim on VV by

$$\left\{\begin{array}{l}i<k\iff k\in\mathrm{cause}(i)\text{ and }i\notin\mathrm{cause}(k);\\ i\sim k\iff k\in\mathrm{cause}(i)\cap\mathrm{eff}(i).\end{array}\right.\qquad(2)$$
Proof.

Irreflexivity is implied by the convention that icause(i)i\notin\mathrm{cause}({i}). Transitivity is Axiom 1. ∎
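Axiom 1 and the induced preordering can be checked mechanically once the cause sets are known. The sketch below assumes the cause relation is given as a dictionary $k \mapsto \mathrm{cause}(k)$, and it reads $i<k$ as "$k$ is a cause of $i$ while $i$ is not a cause of $k$", which is our interpretation of (2); the function names and the example relation are hypothetical.

```python
def is_transitive(cause):
    """Check Axiom 1: i in cause(j) and j in cause(k) imply i in cause(k),
    for distinct i, j, k. `cause` maps each node k to the set cause(k)."""
    return all(i == k or i in cause[k]
               for k in cause for j in cause[k] for i in cause[j])

def compare(cause, i, k):
    """Return '<', '>', '~', or None (incomparable) for distinct nodes i, k."""
    k_causes_i, i_causes_k = k in cause[i], i in cause[k]
    if k_causes_i and i_causes_k:
        return "~"   # i and k lie on a common causal cycle
    if k_causes_i:
        return "<"   # k is a cause of i, but not conversely
    if i_causes_k:
        return ">"
    return None

# Hypothetical cause sets: 1 causes 2 and 3, 2 causes 3, and 4 is isolated.
cause = {1: set(), 2: {1}, 3: {1, 2}, 4: set()}
```

For this relation, `is_transitive(cause)` holds, `compare(cause, 3, 1)` returns `"<"`, and `compare(cause, 1, 4)` returns `None`. Removing `1` from `cause[3]` would violate Axiom 1.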

Corollary 12.

Let 𝒫do\mathcal{P}_{\mathrm{do}} be a transitive interventional family that satisfies the composition property. If icause(k)i\notin\mathrm{cause}({k}), then

i Pdo(i)k|cause(k).i\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{cause}({k}).
Proof.

Transitivity implies that ii is not a cause of members of cause(k)\mathrm{cause}({k}), so that cause(k)neff(i)\mathrm{cause}({k})\subseteq\mathrm{neff}({i}). Thus Proposition 9, and the weak union property yield the result. ∎

In the next example, we show that we cannot drop the singleton transitivity assumption in Theorem 10.

Example 13 (Failure of transitivity without singleton transitivity).

Suppose $1$ is a cause of $2$, and $2$ is a cause of $3$. It may not be the case that $1$ is a cause of $3$. Consider the SCM with $X_{1} \to X_{2} \to X_{3}$, where $X_{1}$ is Bernoulli with parameter $p\in(0,1)$; conditional on $X_{1}=x_{1}$, we sample a Poisson random variable $N=n$ with mean $x_{1}+1$, and then we sample $X_{2}=(X_{2}^{0},\ldots,X_{2}^{n})$ as an i.i.d. sequence of $n+1$ Bernoulli random variables with parameter $\tfrac{1}{2}$; finally, we set $X_{3}:=X_{2}^{0}$. The first random variable $X_{1}$ is independent of the final result $X_{3}$. The standard interventions, where we simply substitute a distributional copy of $X_{i}$ for each $i$, give that $1$ is clearly a cause of $2$, and $2$ a cause of $3$. However, since $P_{\mathrm{do}(1)}=P$, singleton-transitivity fails: $1 \perp\!\!\!\perp_{P} 3$ and $1 \perp\!\!\!\perp_{P} 3\,|\,2$, but we have neither $1 \perp\!\!\!\perp_{P} 2$ nor $3 \perp\!\!\!\perp_{P} 2$. \Diamond

4 Causal graphs and intervened Markov properties

4.1 Intervened and direct cause

Notice that only knowing the causal ordering cannot yield a graph, since, for example, for $i \to j \to k$, there is no way to distinguish the two graphs corresponding to whether an additional $i \to k$ exists in the graph or not. For this reason, we need to define the concept of direct cause.

In order to define direct and intervened cause in the general case, we need an iterative procedure:

  1. For each $i,k\in V$, start with $\mathrm{dcause}(k):=\mathrm{cause}(k)$, $\mathrm{\iota cause}_{i}(k):=\mathrm{cause}(k)$, and $S(\mathcal{P}_{\mathrm{do}})$ an empty graph with node set $V$;

  2. Redefine

     $$\mathrm{dcause}(k):=\{i:\ i\in\mathrm{dcause}(k),\ i \not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k\,|\,\mathrm{\iota cause}_{i}(k)\setminus\{i\}\};$$

  3. Generate $S(\mathcal{P}_{\mathrm{do}})$ by setting arrows from $i$ to $k$, i.e., $i \to k$, if $i\in\mathrm{dcause}(k)$;

  4. Generate the graph $S_{i}(\mathcal{P}_{\mathrm{do}})$ by removing all arrows pointing to $i$ from $S(\mathcal{P}_{\mathrm{do}})$;

  5. Redefine $\mathrm{\iota cause}_{i}(k):=\mathrm{an}_{S_{i}(\mathcal{P}_{\mathrm{do}})}(k)$;

  6. If $S(\mathcal{P}_{\mathrm{do}})$ is modified by Step 3, then go to Step 2; otherwise, output $\mathrm{dcause}(k)$, $\mathrm{\iota cause}_{i}(k)$, $S(\mathcal{P}_{\mathrm{do}})$, and $S_{i}(\mathcal{P}_{\mathrm{do}})$.

We call dcause(k)\mathrm{dcause}({k}) the set of the direct causes of kk, and ιcausei(k)\mathrm{\iota cause}_{i}(k) the set of intervened causes of kk after intervention on ii. We also call S(𝒫do)S(\mathcal{P}_{\mathrm{do}}) the causal structure, and Si(𝒫do)S_{i}(\mathcal{P}_{\mathrm{do}}) the ii-intervened causal structure.

Notice that, since in the iteration dcause(k)\mathrm{dcause}({k}) is getting smaller, the procedure will stop.
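The iterative procedure can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: it assumes that the cause sets are already known and that a conditional-dependence oracle `dependent(i, k, Z)` is available, returning whether $i$ and $k$ are dependent given $Z$ under $P_{\mathrm{do}(i)}$. The oracle used in the example encodes a hypothetical chain $0 \to 1 \to 2$ under standard interventions.

```python
def ancestors(arrows, k):
    """Nodes with a directed path to k in the arrow set; k itself is excluded."""
    found, stack = set(), [k]
    while stack:
        b = stack.pop()
        for a, c in arrows:
            if c == b and a not in found:
                found.add(a)
                stack.append(a)
    found.discard(k)
    return found

def direct_causes(V, cause, dependent):
    """Follow Steps 1-6; returns (dcause, icause, S, S_int)."""
    # Step 1: initialise with the cause sets and an empty arrow set S.
    dcause = {k: set(cause[k]) for k in V}
    icause = {i: {k: set(cause[k]) for k in V} for i in V}
    S = set()
    while True:
        # Step 2: keep i only if it stays dependent on k given icause_i(k) \ {i}.
        dcause = {k: {i for i in dcause[k]
                      if dependent(i, k, frozenset(icause[i][k] - {i}))}
                  for k in V}
        # Step 3: arrows i -> k for every direct-cause candidate.
        new_S = {(i, k) for k in V for i in dcause[k]}
        # Step 4: the i-intervened structure drops arrows pointing to i.
        S_int = {i: {(a, b) for (a, b) in new_S if b != i} for i in V}
        # Step 5: intervened causes are ancestors in the intervened structure.
        icause = {i: {k: ancestors(S_int[i], k) for k in V} for i in V}
        # Step 6: stop once Step 3 no longer modifies S.
        if new_S == S:
            return dcause, icause, new_S, S_int
        S = new_S

# Hypothetical oracle for the chain 0 -> 1 -> 2: i and k are dependent under
# P_do(i) iff i precedes k and Z contains no node strictly between them.
def chain_oracle(i, k, Z):
    return i < k and not any(i < m < k for m in Z)

V = [0, 1, 2]
cause = {0: set(), 1: {0}, 2: {0, 1}}
dcause, icause, S, S_int = direct_causes(V, cause, chain_oracle)
```

In this example the procedure removes node 0 from the candidate direct causes of node 2 in the first round and stabilizes in the second, leaving the arrows $0 \to 1$ and $1 \to 2$.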

Notice also that, by convention, kdcause(k)ιcausei(k)k\notin\mathrm{dcause}({k})\cup\mathrm{\iota cause}_{i}(k), for any iVi\in V. We also let dcause(A)=(kAdcause(k))A\mathrm{dcause}({A})=(\bigcup_{k\in A}\mathrm{dcause}({k}))\setminus A, and ιcausei(A)=(kAιcausei(k))A\mathrm{\iota cause}_{i}(A)=(\bigcup_{k\in A}\mathrm{\iota cause}_{i}(k))\setminus A.

As seen by definition, “cause” is a universal concept: no matter how large the system of random variables is, as long as it contains the two investigated random variables, the marginal dependence of those variables stays intact. On the other hand, “direct cause” depends on the system of variables in which the two investigated variables lie.

The below example shows why we need ιcausei(k)\mathrm{\iota cause}_{i}(k) as opposed to cause(k)\mathrm{cause}({k}) in the definition of dcause(k)\mathrm{dcause}({k}); see also Section 7.2.

Example 14.

Let the graph of Figure 1, below, be the graph associated to an SCM with standard interventions. Under faithfulness, notice that, i Pdo(i)k|cause(k){i}i\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{cause}({k})\setminus\{i\}, where cause(k){i}={j,}\mathrm{cause}({k})\setminus\{i\}=\{j,\ell\}. However, ii is clearly not a direct cause of kk. On the other hand, ιcausei(k){i}={j}\mathrm{\iota cause}_{i}(k)\setminus\{i\}=\{j\}, and i Pdo(i)k|ιcausei(k){i}i\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{\iota cause}_{i}(k)\setminus\{i\}.

In the iterative procedure above, in the first round, there will be an arrow from ii to kk in S(𝒫do)S(\mathcal{P}_{\mathrm{do}}). In the second round, ii will be removed.

Figure 1: A graph for which detecting the direct cause requires an iterative procedure. (Nodes: $i$, $j$, $k$, $\ell$.)

\Diamond

Remark 15.

For causal graphs (as defined in the next subsection) that are ancestral, dcause(k)\mathrm{dcause}({k}) can simply be defined by

i Pdo(i)k|cause(k){i}i\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{cause}({k})\setminus\{i\}

rather than using the iterative procedure and ιcausei(k)\mathrm{\iota cause}_{i}(k) in the conditioning set; see Section 6 for the equivalence of the two methods under certain conditions. \Diamond

Remark 16.

As will be seen in Section 9, there may still be arrows generated here that arguably should not be considered direct causes in the case of non-maximal non-ancestral graphs. We study and identify these cases and offer adjustments in that section. \Diamond

The next proposition can be thought of as a proxy for the pairwise Markov property, and will be used in our proofs of the Markov property for our causal graphs.

Proposition 17.

Let 𝒫do\mathcal{P}_{\mathrm{do}} be a transitive interventional family, and Pdo(i)P_{\mathrm{do}(i)} satisfy the composition property. For distinct i,kVi,k\in V, if idcause(k)i\notin\mathrm{dcause}({k}), then

i Pdo(i)k|ιcausei(k){i}.i\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{\iota cause}_{i}(k)\setminus\{i\}.
Proof.

If icause(k)i\in\mathrm{cause}({k}), then the result follows directly from the definition of dcause(k)\mathrm{dcause}({k}). If icause(k)i\notin\mathrm{cause}({k}), then by Corollary 12, we have i Pdo(i)k|cause(k){i}i\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{cause}({k})\setminus{\left\{{i}\right\}}; moreover, observe that transitivity also implies that ιcausei(k)=cause(k)\mathrm{\iota cause}_{i}(k)=\mathrm{cause}({k}). ∎

4.2 Intervened causal and causal graphs

We are now ready to define a graph that demonstrates the causal relationships by capturing the direct causes as well as non-causal dependencies due to latent variables.

Given an interventional family 𝒫do\mathcal{P}_{\mathrm{do}}, and iVi\in V, we define the ii-intervened graph, denoted by Gi(𝒫do)G_{i}(\mathcal{P}_{\mathrm{do}}) to be the ii-intervened causal structure, where, in addition, for each pair of non-adjacent nodes j,kVj,k\in V that are distinct from ii, we place an arc between jj and kk, i.e., jkj\mbox{$\hskip 0.59998pt\prec\!\!\!\!\!\frac{\hskip 4.89998pt\hskip 4.89998pt}{\hskip 4.89998pt}\!\!\!\!\!\succ\!\hskip 1.07639pt$}k, if

j Pdo(i)k|ιcausei({j,k}),j\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{\iota cause}_{i}(\{j,k\}), (3)

Thus with (3), we put an arc if one of the interventions Pdo(i)P_{\mathrm{do}(i)} suggests the presence of a latent variable.

We also define the causal graph, denoted by G(𝒫do)G(\mathcal{P}_{\mathrm{do}}), to be the causal structure, where, in addition, for each pair of nodes j,kj,k that are not adjacent by an arrow, we place an arc between them if the jkjk-arc exists in Gi(𝒫do)G_{i}(\mathcal{P}_{\mathrm{do}}) for every iVi\in V that is distinct from jj and kk.
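The placement of arcs in the intervened graphs and in the causal graph can be sketched as below. The sketch assumes that the arrow sets and intervened cause sets have already been computed, and that an oracle `dependent_in(i, j, k, Z)` reports whether $j$ and $k$ are dependent given $Z$ under $P_{\mathrm{do}(i)}$. As a further assumption of this sketch, a pair receives an arc in the causal graph only when at least one third node $i$ exists. All names and the example oracle are hypothetical.

```python
from itertools import combinations

def intervened_arcs(V, i, S_i, icause_i, dependent_in):
    """Arcs of G_i: non-adjacent pairs j, k (both distinct from i) that are
    dependent under P_do(i) given icause_i({j, k}), as in condition (3)."""
    arcs = set()
    for j, k in combinations(sorted(n for n in V if n != i), 2):
        if (j, k) in S_i or (k, j) in S_i:
            continue  # adjacent by an arrow in the i-intervened structure
        Z = (icause_i[j] | icause_i[k]) - {j, k}
        if dependent_in(i, j, k, frozenset(Z)):
            arcs.add(frozenset((j, k)))
    return arcs

def causal_graph_arcs(V, S, arcs_by_i):
    """Arcs of G: pairs not adjacent by an arrow whose arc appears in G_i
    for every i distinct from both endpoints."""
    arcs = set()
    for j, k in combinations(sorted(V), 2):
        if (j, k) in S or (k, j) in S:
            continue
        others = [i for i in V if i not in (j, k)]
        # Assumption of this sketch: require at least one such i.
        if others and all(frozenset((j, k)) in arcs_by_i[i] for i in others):
            arcs.add(frozenset((j, k)))
    return arcs

# Hypothetical family on V = {1, 2, 3} with no causes at all, in which only
# the intervention on node 3 leaves nodes 1 and 2 dependent.
V = {1, 2, 3}
S = set()
no_causes = {v: set() for v in V}

def dependent_in(i, j, k, Z):
    return i == 3 and {j, k} == {1, 2}

arcs_by_i = {i: intervened_arcs(V, i, S, no_causes, dependent_in) for i in V}
arcs = causal_graph_arcs(V, S, arcs_by_i)
```

Here the only arc produced is the one between nodes 1 and 2, since node 3 is the sole node distinct from both and its intervened graph contains that arc.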

Remark 18.

We note that $G_{i}(\mathcal{P}_{\mathrm{do}})$ contains neither the arrows pointing to $i$ in $G(\mathcal{P}_{\mathrm{do}})$ nor the arcs with $i$ as an endpoint in $G(\mathcal{P}_{\mathrm{do}})$.

We also note that the existence of the jkjk-arc in two different intervened graphs Gi(𝒫do)G_{i}(\mathcal{P}_{\mathrm{do}}) and G(𝒫do)G_{\ell}(\mathcal{P}_{\mathrm{do}}) may not coincide when there is a PIP as the below example shows. \Diamond

Example 19.

Consider an SCM with the graph presented in Figure 2, below, with standard intervention. Assume that the joint distribution of the SCM is faithful to this graph. It is easy to observe that j Pdo(i)k|ιcausei({j,k})j\nolinebreak{\not\mbox{$\>\perp\perp$ }}_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{\iota cause}_{i}(\{j,k\}), but j Pdo()k|ιcausei({j,k})j\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(\ell)}}k\,|\,\mathrm{\iota cause}_{i}(\{j,k\}). This implies that there is a jkjk-arc in Gi(𝒫do)G_{i}(\mathcal{P}_{\mathrm{do}}), but not in G(𝒫do)G_{\ell}(\mathcal{P}_{\mathrm{do}}).

Figure 2: A (non-maximal) graph associated to an SCM. (Nodes: $\ell$, $h$, $k$, $j$, $i$.)

\Diamond

For an interventional family 𝒫do\mathcal{P}_{\mathrm{do}}, our definitions now allow us to use the notions paG(𝒫do)(k)\mathrm{pa}_{G(\mathcal{P}_{\mathrm{do}})}(k) and dcause(k)\mathrm{dcause}({k}) interchangeably. For a transitive 𝒫do\mathcal{P}_{\mathrm{do}}, we have that, moreover, other causal terminologies on 𝒫do\mathcal{P}_{\mathrm{do}} and the graph terminologies on G(𝒫do)G(\mathcal{P}_{\mathrm{do}}) can be used interchangeably:

Theorem 20 (Interchangeable terminology).

For a transitive interventional family $\mathcal{P}_{\mathrm{do}}$ where each $P_{\mathrm{do}(i)}$ satisfies the composition property, we have the following:

  1. (i)

    ianG(𝒫do)(k)icause(k) in 𝒫do.i\in\mathrm{an}_{G(\mathcal{P}_{\mathrm{do}})}(k)\Longleftrightarrow i\in\mathrm{cause}({k})\text{ in }\mathcal{P}_{\mathrm{do}}.

  2. (ii)

    iscG(𝒫do)(k)kcc(i) in 𝒫doi\in\mathrm{sc}_{G(\mathcal{P}_{\mathrm{do}})}(k)\Longleftrightarrow k\in\mathrm{cc}({i})\text{ in }\mathcal{P}_{\mathrm{do}}.

Proof.
  1. (i)

    The direction ianG(𝒫do)(k)icause(k)i\in\mathrm{an}_{G(\mathcal{P}_{\mathrm{do}})}(k)\Rightarrow i\in\mathrm{cause}({k}) is implied by the transitivity of ancestors and Axiom 1.

    Assume that icause(k)i\in\mathrm{cause}({k}). We prove the result by induction on the cardinality of ιcausei(k)eff(i)\mathrm{\iota cause}_{i}(k)\cap\mathrm{eff}({i}). For the basis, if |ιcausei(k)eff(i)|=0|\mathrm{\iota cause}_{i}(k)\cap\mathrm{eff}({i})|=0, then by the definition of eff(i)\mathrm{eff}({i}), for each ιcausei(k){i}\ell\in\mathrm{\iota cause}_{i}(k)\setminus{\left\{{i}\right\}}, we must have i Pdo(i)i\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}\ell; furthermore, composition implies

    i Pdo(i)ιcausei(k){i}.i\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}\mathrm{\iota cause}_{i}(k)\setminus{\left\{{i}\right\}}.

    Towards a contradiction, suppose that idcause(k)i\notin\mathrm{dcause}({k}), so that

    i Pdo(i)k|ιcausei(k){i}.i\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{\iota cause}_{i}(k)\setminus{\left\{{i}\right\}}.

    Contraction and decomposition imply i Pdo(i)ki\mbox{$\>\perp\perp$ }_{P_{\mathrm{do}(i)}}k, so that we have icause(k)i\not\in\mathrm{cause}({k}), contrary to our original assumption.

    For the inductive step, assume that |ιcausei(k)eff(i)|=r1|\mathrm{\iota cause}_{i}(k)\cap\mathrm{eff}({i})|=r\geq 1. Choose an ιcausei(k)eff(i)\ell\in\mathrm{\iota cause}_{i}(k)\cap\mathrm{eff}({i}). The transitivity of the cause, assumed in Axiom 1, implies that ιcausei()ιcausei(k)\mathrm{\iota cause}_{i}(\ell)\subset\mathrm{\iota cause}_{i}(k). This implies that |ιcausei()eff(i)|<r|\mathrm{\iota cause}_{i}(\ell)\cap\mathrm{eff}({i})|<r, which, by inductive hypothesis, implies there is a directed path from ii to \ell.

    Similarly, eff()ιcausei(k)eff(i)ιcausei(k)\mathrm{eff}({\ell})\cap\mathrm{\iota cause}_{i}(k)\subset\mathrm{eff}({i})\cap\mathrm{\iota cause}_{i}(k), so that |eff()ιcausei(k)|<r|\mathrm{eff}({\ell})\cap\mathrm{\iota cause}_{i}(k)|<r and again the inductive hypothesis gives a directed path from \ell to kk. Hence, there is a directed path from ii to kk, as desired.

  2. (ii)

    This equivalence now follows directly from (i) using the definition of scG(𝒫do)(k)\mathrm{sc}_{G(\mathcal{P}_{\mathrm{do}})}(k) and cc(i)\mathrm{cc}({i}). ∎

As an immediate consequence of the above result, we see that if the set of causes of a variable is non-empty, then at least one of the causes must act as the direct cause.

Corollary 21.

Let 𝒫do\mathcal{P}_{\mathrm{do}} be a transitive interventional family where Pdo(i)P_{\mathrm{do}(i)} satisfy the composition property. For kVk\in V, if cause(k)\mathrm{cause}({k})\neq\emptyset, then dcause(k)\mathrm{dcause}({k})\neq\emptyset.

Proof.

Let icause(k)i\in\mathrm{cause}({k}). We know that there is at least one directed path from ii to kk. The node adjacent to kk on the path is a direct cause by definition. ∎

In the next example, we illustrate how all the different edges can easily occur. We remark that we can generate cycles quite easily, but in the standard SCM setting, their existence is non-trivial and requires solvability conditions [3].

Example 22 (A simple example of an arrow, cycle, and arc).

Consider a fixed joint distribution $P$ for random variables $(X_{1},X_{2},X_{3})$, where $X_{1}$ is not independent of $X_{2}$, and $X_{3}$ is independent of $(X_{1},X_{2})$. Notice we may think of the joint distribution of $(X_{1},X_{2})$ as generated via the following functional equations: $(X_{1},X_{2}=\phi(X_{1},U))$ or $(X_{1}=\psi(X_{2},V),X_{2})$, where $\phi$ and $\psi$ are deterministic functions, and $U$ and $V$ are uniformly distributed on $[0,1]$.

Thus, corresponding to $\phi$, we have an SCM where $X_{1} \to X_{2}$ and $X_{3}$ is isolated, and similarly, corresponding to $\psi$, we have an SCM where $X_{2} \to X_{1}$ and $X_{3}$ is isolated. Next, we see how these SCMs interact with various interventions, and see the corresponding causal graphs that can be defined.

Via the function $\phi$, we consider standard interventions, where $X_{1}$, $X_{2}$, and $X_{3}$ are replaced with distributional copies of themselves, giving the family $(P=P^{\phi}_{\mathrm{do}(1)},P^{\phi}_{\mathrm{do}(2)},P=P^{\phi}_{\mathrm{do}(3)})$, and via the function $\psi$ we consider standard interventions giving the family $(P^{\psi}_{\mathrm{do}(1)},P=P^{\psi}_{\mathrm{do}(2)},P=P^{\psi}_{\mathrm{do}(3)})$; in both cases, we place an arrow between $1$ and $2$ in the expected direction. Note that

Pind:=P1P2P3=Pdo(2)ϕ=Pdo(1)ψ.P^{\rm{ind}}:=P^{1}\otimes P^{2}\otimes P^{3}=P^{\phi}_{\mathrm{do}(2)}=P^{\psi}_{\mathrm{do}(1)}.

Consider also the non-standard interventional family $(P^{\phi}_{\mathrm{do}(1)},P^{\psi}_{\mathrm{do}(2)},P)$; $1$ is a direct cause of $2$, and $2$ is also a direct cause of $1$, so we obtain a parallel edge.

Finally, to have an arc, we consider the interventional family

(Pdo(1),Pdo(2),Pdo(3))=(Pind,Pind,P);(P_{\mathrm{do}(1)},P_{\mathrm{do}(2)},P_{\mathrm{do}(3)})=(P^{\mathrm{ind}},P^{\mathrm{ind}},P);

there are no causes, and no arrows are placed, but Pdo(3)P_{\mathrm{do}(3)} detects the dependence in X1X_{1} and X2X_{2}, and places an arc between them. \Diamond

In the following example, we stress that the generated causal graph heavily depends on the interventional family that is considered, even under standard interventions arising from a simple SCM.

Example 23 (Two different graphs, one underlying distribution).

Let ϵ1,ϵ2,ϵ3\epsilon_{1},\epsilon_{2},\epsilon_{3} be independent Bernoulli random variables with parameter 12\tfrac{1}{2}. Consider the random variables X1=ϵ1X_{1}=\epsilon_{1}, X2=X1+ϵ2X_{2}=X_{1}+\epsilon_{2}, and X3=X1+ϵ3X_{3}=X_{1}+\epsilon_{3}; they have a joint distribution PP, and can be thought of as an SCM 𝒞\mathcal{C} with X1X2X_{1}\mbox{$\hskip 0.50003pt\frac{\hskip 4.89998pt\hskip 4.89998pt}{\hskip 4.89998pt}\!\!\!\!\!\succ\!\hskip 1.07639pt$}X_{2} and X1X3X_{1}\mbox{$\hskip 0.50003pt\frac{\hskip 4.89998pt\hskip 4.89998pt}{\hskip 4.89998pt}\!\!\!\!\!\succ\!\hskip 1.07639pt$}X_{3}. Thus we can define a intervention family 𝒫do\mathcal{P}_{\mathrm{do}} corresponding to standard interventions on 𝒞\mathcal{C}; it is not difficult to verify in this case that the resulting casual graph will be the same as the graph for 𝒞\mathcal{C}.

However, it is not difficult to construct another SCM 𝒞\mathcal{C}^{\prime} with X3X2X_{3}^{\prime}\mbox{$\hskip 0.50003pt\frac{\hskip 4.89998pt\hskip 4.89998pt}{\hskip 4.89998pt}\!\!\!\!\!\succ\!\hskip 1.07639pt$}X_{2}^{\prime}, X3X1X_{3}^{\prime}\mbox{$\hskip 0.50003pt\frac{\hskip 4.89998pt\hskip 4.89998pt}{\hskip 4.89998pt}\!\!\!\!\!\succ\!\hskip 1.07639pt$}X_{1}^{\prime}, and X2X1X_{2}^{\prime}\mbox{$\hskip 0.50003pt\frac{\hskip 4.89998pt\hskip 4.89998pt}{\hskip 4.89998pt}\!\!\!\!\!\succ\!\hskip 1.07639pt$}X_{1}^{\prime}, where (X1,X2,X3)(X_{1}^{\prime},X_{2}^{\prime},X_{3}^{\prime}) also have the joint distribution PP. Thus we can define another intervention family 𝒫do\mathcal{P}_{\mathrm{do}}^{\prime} corresponding to standard interventions on 𝒞\mathcal{C}^{\prime}; again it is not difficult to verify in this case that the resulting casual graph will be the same as the graph for 𝒞\mathcal{C}^{\prime}.

We exploited the simple fact that the joint distribution $P$ does not uniquely determine a corresponding SCM, not even up to adjacency. Notice that $\mathcal{C}$ and $\mathcal{C}^{\prime}$ do not have the same number of edges. $\Diamond$
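The contrast in Example 23 can be checked numerically. The following is a minimal sketch, not the authors' construction: it assumes that $\mathcal{C}^{\prime}$ is obtained from the factorization $P(x_3)P(x_2\mid x_3)P(x_1\mid x_2,x_3)$ and that a standard intervention on $X_3$ simply fixes its value. The helper names (`joint`, `prob`, `x1_given_do_x3_in_C`, `x1_given_do_x3_in_C_prime`) are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def joint():
    """Joint law P of (X1, X2, X3) from Example 23, by enumerating the noise."""
    P = defaultdict(Fraction)
    for e1, e2, e3 in product((0, 1), repeat=3):
        x1 = e1
        P[(x1, x1 + e2, x1 + e3)] += HALF ** 3
    return dict(P)

def prob(P, **fixed):
    """Probability of an event such as prob(P, x1=1, x3=0)."""
    total = Fraction(0)
    for x, p in P.items():
        if all(x[int(name[1:]) - 1] == v for name, v in fixed.items()):
            total += p
    return total

P = joint()

def x1_given_do_x3_in_C(x3):
    # In C, X3 is a child of X1, so fixing X3 leaves X1 ~ Bernoulli(1/2).
    return prob(P, x1=1)

def x1_given_do_x3_in_C_prime(x3):
    # In C' (assumed factorization above), X1 is a descendant of X3;
    # summing out X2 gives P(X1 = 1 | X3 = x3).
    return prob(P, x1=1, x3=x3) / prob(P, x3=x3)

for x3 in (0, 1, 2):
    print(x3, x1_given_do_x3_in_C(x3), x1_given_do_x3_in_C_prime(x3))
```

Under these assumptions, the law of $X_1$ after intervening on $X_3$ is constant in $\mathcal{C}$ but varies with the intervened value in $\mathcal{C}^{\prime}$, even though both models share the observational distribution $P$.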

The generated graph is indeed a bowless directed mixed graph (BDMG).

Proposition 24 (The causal graph is BDMG).

The causal graph and intervened graphs generated from a transitive interventional family are BDMG.

Proof.

The proof is immediate from the definition of intervened causal and causal graphs. ∎

Remark 25.

If we assume that $\mathrm{cc}(k)=\{k\}$ for every $k\in V$, so that there is no causal cycle, then the causal graph is a bowless ADMG. If we assume that for every $j,k\in V$ that are not direct causes of each other and every $i$, we have

\[ j \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(\{j,k\}), \]

then the causal graph does not contain arcs; in this case the interventional family does not detect any latent variable that causes both $j$ and $k$, since they are not dependent. Assuming both of these conditions results in a DAG. $\Diamond$

4.3 The Markov property with respect to the intervened causal graphs

We can now present a main result of this paper.

Theorem 26 (Interventional distributions are Markovian to intervened graphs).

Let $\mathcal{P}_{\mathrm{do}}=\{P_{\mathrm{do}(i)}\}_{i\in V}$ be a transitive interventional family. For each $i\in V$, if $P_{\mathrm{do}(i)}$ satisfies the intersection property and the composition property, then $P_{\mathrm{do}(i)}$ is Markovian to the $i$-intervened graph, $G_{i}(\mathcal{P}_{\mathrm{do}})$.

Proof.

By Theorem 2, it suffices to show that $P_{\mathrm{do}(i)}$ satisfies PMP w.r.t. $G_{i}(\mathcal{P}_{\mathrm{do}})$. Consider first the case where $i,k\in V$ are non-adjacent in $G_{i}(\mathcal{P}_{\mathrm{do}})$. We have that $i\notin\mathrm{pa}_{G_{i}(\mathcal{P}_{\mathrm{do}})}(k)$; moreover, by the definition of $G_{i}(\mathcal{P}_{\mathrm{do}})$, we have that $i\notin\mathrm{dcause}(k)$. Hence, Proposition 17 implies PMP.

Now we look at the case where $j,k\in V$ are distinct from $i$ and are non-adjacent in $G_{i}(\mathcal{P}_{\mathrm{do}})$; in $G_{i}(\mathcal{P}_{\mathrm{do}})$, there are no arrows between $j$ and $k$, and furthermore, from (3), the absence of an arc immediately gives PMP. ∎

4.4 Alternative intervened causal graph

We defined intervened causal graphs $G_{i}(\mathcal{P}_{\mathrm{do}})$ by deciding to place arcs in the graph in places where arrows are missing. We used $j \not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(\{j,k\})$, which immediately implies parts of the global Markov property as expressed in Theorem 26. If a latent variable that causes both $j$ and $k$ should always induce a dependence between the two regardless of the conditioning set, then, in order to obtain $G_{i}(\mathcal{P}_{\mathrm{do}})$, one can instead place an arc in the causal structure when, for every $C\subset V$ such that $i,j,k\notin C$, we have

\[ j \not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid C. \tag{4} \]

We note that the above definition removes any assumptions related to Markov properties. In order to obtain the same graph, and consequently the Markov property, we will need ordered upward- and downward-stability, as the following result shows. Let us denote the intervened causal graph generated in this way by $\tilde{G}_{i}(\mathcal{P}_{\mathrm{do}})$.

Proposition 27.

Let $\mathcal{P}_{\mathrm{do}}$ be an interventional family. For each $i\in V$, if $P_{\mathrm{do}(i)}$ satisfies ordered upward- and downward-stability w.r.t. the order induced by $i$-intervened cause, then $\tilde{G}_{i}(\mathcal{P}_{\mathrm{do}})=G_{i}(\mathcal{P}_{\mathrm{do}})$, where we use (4) in lieu of (3) for arcs in $\tilde{G}_{i}(\mathcal{P}_{\mathrm{do}})$.

Proof.

By definition, the arrows are the same in both graphs. We need to show that they have the same arcs. One direction is trivial. We prove the other direction in contrapositive form by assuming $j \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid C$ for some $C$. Using ordered upward-stability, we can add $\mathrm{\iota cause}_{i}(\{j,k\})$ to the conditioning set, and by applying ordered downward-stability we can remove the other nodes to obtain $j \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(\{j,k\})$. ∎

5 Observable interventional families

In Section 4, the causal structures are completely defined by interventional families; here we provide additional assumptions for the causal graph to be Markovian with respect to an underlying distribution that can be observed.

5.1 Markov property w.r.t. the causal graph

We say that an interventional family of distributions $\mathcal{P}_{\mathrm{do}}$ on the state space $\mathcal{X}=\prod_{i\in V}\mathcal{X}_{i}$ is observable with respect to an underlying distribution $P$ on $\mathcal{X}$ if the following axiom holds.

Axiom 2.

  (a) For every separable pair $j,k\in V$ in $G(\mathcal{P}_{\mathrm{do}})$, and every distinct $i\in V$, we have

    \[ j \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(\{j,k\}) \;\Rightarrow\; j \perp\!\!\!\perp_{P} k \mid \mathrm{cause}(\{j,k\}). \]

  (b) For every $k$ and every $i\in\mathrm{cause}(k)$, we have

    \[ i \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(\{i,k\}) \;\Rightarrow\; i \perp\!\!\!\perp_{P} k \mid \mathrm{cause}(\{i,k\}). \]

Let us remark that if $P$ is a product measure on $\mathcal{X}$, then every interventional family will be observable with respect to $P$; in practice we want to consider underlying distributions that have some relation to the interventional family, such as when $P$ is the distribution of an SCM.
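The remark on product measures can be illustrated for finite state spaces: under a product measure, the right-hand side of each implication in Axiom 2 holds for every conditioning set, so the implication is satisfied trivially. The sketch below assumes discrete distributions stored as dictionaries with 0-indexed coordinates; the helpers `marginal`, `cond_indep`, and `product_measure` are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def marginal(P, idx):
    """Marginal of a finite joint law P (outcome tuple -> probability) on coordinates idx."""
    M = defaultdict(Fraction)
    for x, p in P.items():
        M[tuple(x[a] for a in idx)] += p
    return M

def cond_indep(P, j, k, C=()):
    """Check j _||_ k | C via P(j,k,c) P(c) == P(j,c) P(k,c) for all values."""
    C = tuple(C)
    p_jkc = marginal(P, (j, k) + C)
    p_jc = marginal(P, (j,) + C)
    p_kc = marginal(P, (k,) + C)
    p_c = marginal(P, C)
    vals_j = {x[j] for x in P}
    vals_k = {x[k] for x in P}
    for c in list(p_c):
        for a, b in product(vals_j, vals_k):
            if p_jkc[(a, b) + c] * p_c[c] != p_jc[(a,) + c] * p_kc[(b,) + c]:
                return False
    return True

def product_measure(params):
    """Law of independent Bernoulli variables with the given success probabilities."""
    P = {}
    for x in product((0, 1), repeat=len(params)):
        p = Fraction(1)
        for xi, q in zip(x, params):
            p *= q if xi == 1 else 1 - q
        P[x] = p
    return P

P = product_measure([Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)])
# A dependent law for contrast: the first two coordinates are always equal.
Q = {(0, 0, 0): Fraction(1, 2), (1, 1, 0): Fraction(1, 2)}

print(cond_indep(P, 1, 2, (0,)), cond_indep(P, 0, 1), cond_indep(Q, 0, 1))
```

Exact rational arithmetic is used so that the factorization test is not affected by floating-point rounding.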

Theorem 28 (The underlying distribution is Markovian to the causal graph).

Let $\mathcal{P}_{\mathrm{do}}$ be a transitive observable interventional family, with underlying joint distribution $P$ that satisfies the intersection property and the composition property. Then $P$ is Markovian to the causal graph, $G(\mathcal{P}_{\mathrm{do}})$.

Proof.

By Theorem 2, it suffices to verify that $P$ satisfies the pairwise Markov property (PMP) w.r.t. $G(\mathcal{P}_{\mathrm{do}})$; moreover, we can assume, without loss of generality, that $G(\mathcal{P}_{\mathrm{do}})$ is maximal since separable pairs do not correspond to any separation statements. If $j,k\in V$ are non-adjacent nodes, then there is no arc between $j$ and $k$ in $G(\mathcal{P}_{\mathrm{do}})$. Hence, by definition, there exists an $i$ such that $j \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(\{j,k\})$, from which Axiom 2 (a) gives the desired result. ∎

5.2 Causal graphs for observational distributions

The arcs in the causal graph $G(\mathcal{P}_{\mathrm{do}})$ can be generated from the observational distribution $P$ and the causes, if the converse of Axiom 2 holds. Assume $\mathcal{P}_{\mathrm{do}}$ is an observable interventional family of distributions with respect to an underlying observational measure $P$. We say that $\mathcal{P}_{\mathrm{do}}$ is a strongly-observable interventional family if the following additional axiom holds.

Axiom 3.

  (a) For every distinct $i,j,k\in V$, we have

    \[ j \perp\!\!\!\perp_{P} k \mid \mathrm{cause}(\{j,k\}) \;\Rightarrow\; j \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(\{j,k\}). \]

  (b) For every $k$ and every $i\in\mathrm{cause}(k)$, we have

    \[ i \perp\!\!\!\perp_{P} k \mid \mathrm{cause}(\{i,k\}) \;\Rightarrow\; i \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(\{i,k\}). \]

Our next example shows that there are univariate observable interventional families that are not given by standard interventions and that univariate observable interventional families may not satisfy Axiom 3.

Example 29 (Interventions on joint distributions).

Suppose $X_{1}$, $X_{2}$, and $X_{3}$ are jointly independent (Bernoulli) random variables, with law $P$, which will serve as an underlying distribution. Now consider the (non-standard) intervention where $P_{\mathrm{do}(1)}$ changes the joint distribution of $X_{2}$ and $X_{3}$ to one of dependence, but leaves the marginal distributions of $X_{1}$, $X_{2}$, and $X_{3}$ unchanged; furthermore, we leave $X_{1}$ independent of $(X_{2},X_{3})$. Although we normally think of $P_{\mathrm{do}(1)}$ as an intervention on $X_{1}$, our general definition allows for somewhat counter-intuitive constructions.

Let $P_{\mathrm{do}(2)}$ and $P_{\mathrm{do}(3)}$ be standard interventions on $X_{2}$ and $X_{3}$, respectively, that simply leave the original independent distribution unchanged. The family $\{P_{\mathrm{do}(1)},P_{\mathrm{do}(2)},P_{\mathrm{do}(3)}\}$ satisfies Axioms 1 and 2 trivially, but Axiom 3 is not satisfied. $\Diamond$
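One concrete instance of Example 29 can be written down explicitly. The sketch below is only an illustration under stated assumptions: the variables are Bernoulli with parameter $\tfrac{1}{2}$, the non-standard $P_{\mathrm{do}(1)}$ forces $X_{3}=X_{2}$, and the conditioning sets in Axiom 3 (a) are taken to be empty, which is our reading of the example since no variable affects another. The helpers `marg` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

# Underlying law P: three independent Bernoulli(1/2) variables (X1, X2, X3).
P = {x: HALF ** 3 for x in product((0, 1), repeat=3)}

# Hypothetical non-standard P_do(1): X1 stays Bernoulli(1/2) and independent of (X2, X3),
# while X3 is forced to equal X2, so each one-dimensional marginal is unchanged.
P_do1 = {(x1, x2, x2): HALF ** 2 for x1, x2 in product((0, 1), repeat=2)}

def marg(Q, idx):
    """Marginal of Q on the 1-indexed coordinates in idx."""
    out = {}
    for x, p in Q.items():
        key = tuple(x[a - 1] for a in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out

def independent(Q, a, b):
    """Marginal independence of coordinates a and b under Q."""
    pa, pb, pab = marg(Q, (a,)), marg(Q, (b,)), marg(Q, (a, b))
    return all(pab.get((u, v), Fraction(0)) == pa[(u,)] * pb[(v,)]
               for (u,) in pa for (v,) in pb)

# Independence of X2 and X3 holds under P but fails under P_do(1),
# which is the failure of the implication in Axiom 3 (a) for (i, j, k) = (1, 2, 3).
print(independent(P, 2, 3), independent(P_do1, 2, 3))
```

The marginals of `P_do1` agree with those of `P`, so the dependence between $X_{2}$ and $X_{3}$ is the only feature that the intervention changes.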

Define $G(P)$ by adding to the causal structure $S(\mathcal{P}_{\mathrm{do}})$ arcs between $j$ and $k$, i.e., $j\leftrightarrow k$, for nodes $j$ and $k$ not adjacent by an arrow, if

\[ j \not\perp\!\!\!\perp_{P} k \mid \mathrm{cause}(\{j,k\}). \tag{5} \]

Clearly, (5) suggests the presence of a latent variable; compare with (3).

Proposition 30.

Suppose that $\mathcal{P}_{\mathrm{do}}$ is a strongly-observable interventional family with the underlying distribution $P$, and $N\geq 3$, so that there are at least three nodes. Then $G(P)=G(\mathcal{P}_{\mathrm{do}})$ when these graphs are maximal.

Proof.

By definition, the arrows are the same in both graphs. If there is an arc between $j,k$ in $G(\mathcal{P}_{\mathrm{do}})$, then by definition of arcs, for all $i\in V$, we have $j \not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(\{j,k\})$; hence the contrapositive of Axiom 3 (a) implies that $j \not\perp\!\!\!\perp_{P} k \mid \mathrm{cause}(\{j,k\})$, and thus there is an arc between $j,k$ in $G(P)$. Conversely, if $j \not\perp\!\!\!\perp_{P} k \mid \mathrm{cause}(\{j,k\})$, then the contrapositive of Axiom 2 (a) implies the existence of the necessary arc in $G(\mathcal{P}_{\mathrm{do}})$ for maximal graphs. ∎

Note that for strongly-observable interventional families, the existence of the $jk$-arc in $G_{i}(\mathcal{P}_{\mathrm{do}})$ and $G(\mathcal{P}_{\mathrm{do}})$ may differ for non-maximal graphs only when $j,k$ is an inseparable pair. This implies that these two graphs are Markov equivalent.

5.3 Congruent interventional families

Consider the interventional families $\mathcal{P}_{\mathrm{do}}=\{P_{\mathrm{do}(j)}\}_{j\in V}$ and $\mathcal{Q}_{\mathrm{do}}=\{Q_{\mathrm{do}(j)}\}_{j\in V}$ over the same state space $\mathcal{X}$. It is immediate that if both families are strongly observable with respect to a single underlying distribution $P$, and if the causal graphs $G(\mathcal{P}_{\mathrm{do}})$ and $G(\mathcal{Q}_{\mathrm{do}})$ are maximal, then they have the same adjacencies. Motivated by Axioms 2 and 3, we say that the families are congruent if they have the same causes (see Remark 7) and

  1.) For every $k$ and every $i\in\mathrm{cause}^{\mathcal{P}_{\mathrm{do}}}(k)=\mathrm{cause}^{\mathcal{Q}_{\mathrm{do}}}(k)$, we have

    \[ i \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}^{\mathcal{P}_{\mathrm{do}}}(\{i,k\}) \iff i \perp\!\!\!\perp_{Q_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}^{\mathcal{Q}_{\mathrm{do}}}(\{i,k\}). \]

  2.) For every distinct $i,j,k\in V$, we have

    \[ j \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}^{\mathcal{P}_{\mathrm{do}}}(\{j,k\}) \iff j \perp\!\!\!\perp_{Q_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}^{\mathcal{Q}_{\mathrm{do}}}(\{j,k\}). \]

We collect our observations in the following proposition.

Proposition 31.

Consider two interventional families over the same state space.

  1.) The families are congruent if and only if their causal graphs are the same.

  2.) If the families are strongly observable with respect to the same underlying distribution, and their causal graphs are maximal, then the graphs have the same adjacencies; furthermore, if the families have the same causes, and if the causal graphs are ancestral, then the graphs are the same, and the families are congruent.

Proof.

The first claim is immediate. For the second claim, it is immediate from Axioms 2 and 3, that the causal graphs have the same adjacencies; furthermore, under the ancestral assumption, we can also direct the graph using the causes, which are also the same. Furthermore, if two nodes are adjacent and not causes of each other, we deduce the presence of an arc. Hence the graphs are the same, and the families are congruent. ∎

5.4 Alternative causal graphs

As another alternative for causal graphs, one can consider a setting where we relax the criterion for an arc to exist in $G(\mathcal{P}_{\mathrm{do}})$ by only requiring that a corresponding arc exists in $G_{i}(\mathcal{P}_{\mathrm{do}})$ for some $i$ (as opposed to every $i$).

This graph has at least as many arcs as the graph in the previous setting. The new setting yields something beyond the Markov property above regarding conditional independencies in causal graphs: for a separable pair $(j,k)$, Axiom 2 always implies that $j \perp\!\!\!\perp_{P} k \mid \mathrm{cause}(\{j,k\})$. On the other hand, the original setting is more consistent with the idea that the presence of an arc indicates the existence of a latent random variable causing the endpoints.

This setting also ensures that Axiom 3 holds under transitivity, composition, and the converse Markov property:

Proposition 32.

Suppose that $\mathcal{P}_{\mathrm{do}}$ is a transitive observable interventional family with the underlying distribution $P$. If each $P_{\mathrm{do}(i)}$ satisfies the composition property, and $P$ satisfies the converse Markov property w.r.t. $G(\mathcal{P}_{\mathrm{do}})$, then Axiom 3 holds.

Proof.

We consider two cases. If $j,k$ (where $j$ may be $i$) are non-adjacent in $G(\mathcal{P}_{\mathrm{do}})$, then they are non-adjacent in every $G_{i}(\mathcal{P}_{\mathrm{do}})$. The right-hand side of the axiom then holds by the Markov property of Theorem 26 under transitivity (which ensures, by Theorem 20, that ancestors and causes are the same).

If $j,k$ (where $j$ may be $i$) are adjacent in $G(\mathcal{P}_{\mathrm{do}})$, then the left-hand side never holds under the converse pairwise Markov property. ∎

6 Specialization to directed ancestral graphs

When we assume that the causal graph is ancestral (in fact, directed ancestral), in all the definitions and results $\mathrm{\iota cause}_{i}(\cdot)$ can be replaced by $\mathrm{cause}(\cdot)$ under intersection and composition. In particular, for the definition of $\mathrm{dcause}(\cdot)$, we have the following:

Proposition 33.

Let $\mathcal{P}_{\mathrm{do}}=\{P_{\mathrm{do}(i)}\}_{i\in V}$ be a transitive interventional family, and suppose that $P_{\mathrm{do}(i)}$ satisfies the intersection and composition properties for every $i\in V$. If $G(\mathcal{P}_{\mathrm{do}})$ is ancestral, then for each $i\in\mathrm{cause}(k)$, we have that

\[ i\in\mathrm{dcause}(k) \text{ if and only if } i \not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{cause}(k)\setminus\{i\}. \tag{6} \]
Proof.

Let $C=\mathrm{cause}(k)\setminus\mathrm{\iota cause}_{i}(k)$. Since $G(\mathcal{P}_{\mathrm{do}})$ is ancestral, there are no arcs between members of $C$ and $k$ and no directed paths from $k$ to members of $C$ in $G(\mathcal{P}_{\mathrm{do}})$. We have the separation $i\perp_{m}C \mid \mathrm{\iota cause}_{i}(k)\setminus\{i\}$ in $G_{i}(\mathcal{P}_{\mathrm{do}})$, and from the Markov property given in Theorem 26, we deduce that

\[ i \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} C \mid \mathrm{\iota cause}_{i}(k)\setminus\{i\}. \tag{7} \]

  ($\Leftarrow$) If $i \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{cause}(k)\setminus\{i\}$, then by (7) and the contraction property, we obtain

    \[ i \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} C\cup\{k\} \mid \mathrm{\iota cause}_{i}(k)\setminus\{i\}, \]

    which by decomposition implies $i\notin\mathrm{dcause}(k)$.

  ($\Rightarrow$) If $i\notin\mathrm{dcause}(k)$, then $i \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(k)\setminus\{i\}$; then from the composition property and (7), we obtain $i \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} C\cup\{k\} \mid \mathrm{\iota cause}_{i}(k)\setminus\{i\}$, and finally by weak union we have $i \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{cause}(k)\setminus\{i\}$. ∎

We note that the equivalence above uses only the global Markov property in one direction, and the global Markov property together with composition in the other.

In addition to this new definition of direct cause, and consequently causal structure, we can define a $jk$-arc in the intervened graph $G(\mathcal{P}_{\mathrm{do}})$ to exist if

\[ j \not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{cause}(\{j,k\}), \tag{8} \]

as the following result shows.

Proposition 34.

Let $\mathcal{P}_{\mathrm{do}}=\{P_{\mathrm{do}(i)}\}_{i\in V}$ be a transitive interventional family, and suppose that $P_{\mathrm{do}(i)}$ satisfies the intersection and composition properties for every $i\in V$. If $G(\mathcal{P}_{\mathrm{do}})$ is ancestral, then for each $i,j,k$, we have that

\[ j \not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{cause}(\{j,k\}) \text{ if and only if } j \not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{\iota cause}_{i}(\{j,k\}). \]
Proof.

The proof is a routine variation of the proof of Proposition 33, where instead of (7), we use

\[ k \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} C \mid \mathrm{\iota cause}_{i}(\{j,k\}). \] ∎
Remark 35.

Consequently, for ancestral causal graphs, Axioms 2 and 3, for strongly observable interventional families, can be written with $\mathrm{cause}(\cdot)$ instead of $\mathrm{\iota cause}_{i}(\cdot)$. $\Diamond$

7 Quantifiable interventional families

Although transitive observable interventional families lead to the Markov property of the observational distribution with respect to the causal graph, they do not allow us to measure causal effects from observed data. Measuring causal effects from observed data is an important consequence of the theory presented here, although it is beyond the scope of this paper. Here we provide a more restricted axiom that allows this possibility.

For technical simplicity, we focus on the case where the graphs are directed ancestral. We recall from Section 6 that, under the additional assumptions that the interventional family is transitive with each member satisfying the intersection and composition properties, all occurrences of $\mathrm{\iota cause}(\cdot)$ in the definitions and axioms can be replaced with $\mathrm{cause}(\cdot)$ and $\mathrm{dcause}(\cdot)$ using (6).

7.1 Axiomatization

Two measures $P$ and $Q$ are said to be equivalent if they have the same null sets, in which case we write $P\sim Q$. For a distribution $P$ with state space $\mathcal{X}=\prod_{i\in V}\mathcal{X}_{i}$, we say that the family $\{\mathcal{P}_{\mathrm{do}},P\}$ is compatible if for all distinct $i,k\in V$, we have $P^{\mathrm{cause}(k)\cup\{k\}}_{\mathrm{do}(i)}\sim P^{\mathrm{cause}(k)\cup\{k\}}$.
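For a finite state space, equivalence of measures reduces to a comparison of supports, which gives a simple way to test compatibility of given marginals. The sketch below covers only this finite case (Example 43 shows that common support is not sufficient in general); the function `equivalent` and the three toy laws are hypothetical.

```python
from fractions import Fraction

def equivalent(P, Q):
    """On a finite state space, P ~ Q iff they charge exactly the same outcomes."""
    support_p = {x for x, p in P.items() if p > 0}
    support_q = {x for x, q in Q.items() if q > 0}
    return support_p == support_q

# Hypothetical laws on the three-point space {"a", "b", "c"}.
P = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
Q = {"a": Fraction(1, 10), "b": Fraction(9, 10)}
R = {"a": Fraction(1, 3), "b": Fraction(1, 3), "c": Fraction(1, 3)}

print(equivalent(P, Q), equivalent(P, R))
```

Here `P` and `Q` are equivalent although their probabilities differ, while `R` charges the outcome `"c"`, which is null under `P`.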

We now say that an interventional family (of distributions) $\mathcal{P}_{\mathrm{do}}$ is quantifiable interventional if there exists an underlying distribution $P$ with state space $\mathcal{X}$ such that the family $\{\mathcal{P}_{\mathrm{do}},P\}$ is compatible and the following axiom holds.

Axiom 4.

For all distinct $i,k\in V$, there exist regular conditional probabilities such that their marginals on $\mathcal{X}_{k}$ satisfy

\[
P^{k}_{\mathrm{do}(i)}(\cdot \mid x_{\mathrm{cause}(k)\setminus\{i\}},x_{i})=
\begin{cases}
P^{k}(\cdot \mid x_{\mathrm{cause}(k)\setminus\{i\}}), & \text{if } i\notin\mathrm{cause}(k),\\
P^{k}(\cdot \mid x_{\mathrm{cause}(k)\setminus\{i\}},x_{i}), & \text{if } i\in\mathrm{cause}(k),
\end{cases}
\]

for all $x_{\mathrm{cause}(k)\setminus\{i\}}\in\mathcal{X}_{\mathrm{cause}(k)\setminus\{i\}}$ and all $x_{i}\in\mathcal{X}_{i}$.

In addition, if $X=(X_{j})_{j\in V}$ is a random vector with distribution $P$, then we say that $X_{i}$ is a cause of $X_{k}$ if $i\in\mathrm{cause}(k)$, and we write $\mathrm{cause}(X_{k})$ to denote the set of the causes of $X_{k}$; and similarly for $\mathrm{dcause}(X_{k})$.

When Axiom 4 holds, we can drop the occurrence of $x_{i}$ in the conditioning set of the distribution.

Proposition 36.

Let $\mathcal{P}_{\mathrm{do}}$ be a quantifiable interventional family with the underlying distribution $P$. For all distinct $i,k\in V$, there exist regular conditional probabilities such that their marginals on $\mathcal{X}_{k}$ satisfy

\[ P^{k}_{\mathrm{do}(i)}(\cdot \mid x_{\mathrm{cause}(k)})=P^{k}(\cdot \mid x_{\mathrm{cause}(k)}) \]

for all $x_{\mathrm{cause}(k)}\in\mathcal{X}_{\mathrm{cause}(k)}$.

Proof.

If $i\in\mathrm{cause}(k)$, then the result is obvious by Axiom 4; otherwise, by elementary manipulations and Axiom 4, we have

\begin{align*}
P^{k}_{\mathrm{do}(i)}(\cdot \mid x_{\mathrm{cause}(k)})
&= \int P^{k}_{\mathrm{do}(i)}(\cdot \mid x_{\mathrm{cause}(k)},x_{i})\,dP^{i}_{\mathrm{do}(i)}(x_{i} \mid x_{\mathrm{cause}(k)})\\
&= \int P^{k}(\cdot \mid x_{\mathrm{cause}(k)})\,dP^{i}_{\mathrm{do}(i)}(x_{i} \mid x_{\mathrm{cause}(k)})\\
&= P^{k}(\cdot \mid x_{\mathrm{cause}(k)})\int dP^{i}_{\mathrm{do}(i)}(x_{i} \mid x_{\mathrm{cause}(k)})\\
&= P^{k}(\cdot \mid x_{\mathrm{cause}(k)}).
\end{align*} ∎
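The identity in Proposition 36 can be checked by direct enumeration on a toy model. The following sketch rests on several assumptions that are not taken from the paper: a hypothetical binary chain SCM $1\rightarrow 2\rightarrow 3$ given by conditional probability tables, standard interventions that replace the mechanism of the intervened node by a fixed law, and causes identified with ancestors (as under transitivity). Only the interventions on nodes 1 and 3 are examined.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical binary chain SCM 1 -> 2 -> 3, written as conditional probability tables.
P1 = {0: F(1, 2), 1: F(1, 2)}            # law of X1

def p2(x2, x1):                          # P(X2 = x2 | X1 = x1)
    return F(3, 4) if x2 == x1 else F(1, 4)

def p3(x3, x2):                          # P(X3 = x3 | X2 = x2)
    return F(2, 3) if x3 == x2 else F(1, 3)

def joint(law1=None, law3=None):
    """Joint law of (X1, X2, X3); law1 or law3 overrides the mechanism of node 1 or 3."""
    J = {}
    for x1, x2, x3 in product((0, 1), repeat=3):
        f1 = (law1 or P1)[x1]
        f3 = law3[x3] if law3 else p3(x3, x2)
        J[(x1, x2, x3)] = f1 * p2(x2, x1) * f3
    return J

def cond(J, k, given):
    """P(X_k = 1 | X_a = v for each (a, v) in given); nodes are 1-indexed."""
    rows = [(x, p) for x, p in J.items()
            if all(x[a - 1] == v for a, v in given.items())]
    return sum(p for x, p in rows if x[k - 1] == 1) / sum(p for _, p in rows)

P = joint()
P_do1 = joint(law1={0: F(1, 10), 1: F(9, 10)})
P_do3 = joint(law3={0: F(1, 5), 1: F(4, 5)})

# k = 3 with cause(3) = {1, 2}, and k = 2 with cause(2) = {1} (causes taken as ancestors).
same_k3 = all(cond(P_do1, 3, {1: a, 2: b}) == cond(P, 3, {1: a, 2: b})
              for a, b in product((0, 1), repeat=2))
same_k2 = all(cond(P_do3, 2, {1: a}) == cond(P, 2, {1: a}) for a in (0, 1))
print(same_k3, same_k2)
```

In this toy model the conditional law of each node given its causes is the same under the intervened and underlying distributions, in line with the statement of the proposition.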

In the corollary below, we see that Axiom 4 imposes strong relationships among the different $P_{\mathrm{do}(i)}$.

Corollary 37.

Let $\mathcal{P}_{\mathrm{do}}$ be a quantifiable interventional family. For every distinct $i,j,k\in V$, we have

\[ P^{k}_{\mathrm{do}(j)}(\cdot \mid x_{\mathrm{cause}(k)})=P^{k}_{\mathrm{do}(i)}(\cdot \mid x_{\mathrm{cause}(k)})=P^{k}(\cdot \mid x_{\mathrm{cause}(k)}). \]
Proof.

The first equality is a direct consequence of Axiom 4. The second equality is Proposition 36. ∎

Remark 38.

Thus, given $\mathcal{P}_{\mathrm{do}}$ and Axiom 4, all $P^{k}(\cdot \mid x_{\mathrm{cause}(k)})$ are uniquely determined; under certain conditions, this will lead to the whole distribution $P$ being uniquely determined in the case where there are no latent variables; see Proposition 50. $\Diamond$

Remark 39.

By Corollary 37, Axiom 4 imposes strong relationships between different $P_{\mathrm{do}(i)}$. On the other hand, Axioms 2 and 3 only deal with conditional independencies implied by interventional and underlying distributions; thus, they clearly do not impose the restrictions that appear in Corollary 37. This shows that there are many interventional families that are strongly observable but not quantifiable. $\Diamond$

Lemma 40.

Let $\mathcal{P}_{\mathrm{do}}$ be a quantifiable interventional family with the underlying distribution $P$ and directed ancestral causal graph $G(\mathcal{P}_{\mathrm{do}})$. For $i\in\mathrm{cause}(k)$, we have

\[ i \perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{cause}(k)\setminus\{i\} \iff i \perp\!\!\!\perp_{P} k \mid \mathrm{cause}(k)\setminus\{i\}. \tag{9} \]
Proof.

Observe that the dependence

\[ i \not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}} k \mid \mathrm{cause}(k)\setminus\{i\} \tag{10} \]

is equivalent to the existence of a measurable subset $F$ of positive measure under $P_{\mathrm{do}(i)}^{\mathrm{cause}(k)\setminus\{i\}}$ such that for every $z\in F$ there exist disjoint measurable subsets $W^{*},W^{**}\subset\mathcal{X}_{i}$ of positive measure under the conditional probability $P_{\mathrm{do}(i)}^{k}(\cdot \mid z)$ satisfying the inequality

\[ P_{\mathrm{do}(i)}^{k}(\cdot \mid z,x_{i}^{*})\neq P_{\mathrm{do}(i)}^{k}(\cdot \mid z,x_{i}^{**}), \]

for all $x_{i}^{*}\in W^{*}$ and all $x_{i}^{**}\in W^{**}$; in the case that $\mathrm{cause}(k)\setminus\{i\}$ is empty, we simply omit reference to the variable $z\in F$.

Axiom 4 allows us to interchange $P$ and $P_{\mathrm{do}(i)}$ and infer that the dependence (10) is equivalent to

\[ P^{k}(\cdot \mid z,x_{i}^{*})\neq P^{k}(\cdot \mid z,x_{i}^{**}), \]

which, with compatibility, is itself equivalent to the dependence

\[ i \not\perp\!\!\!\perp_{P} k \mid \mathrm{cause}(k)\setminus\{i\}. \] ∎
Proposition 41.

Let $\mathcal{P}_{\mathrm{do}}$ be a transitive quantifiable interventional family with the underlying distribution $P$ and directed ancestral causal graph $G(\mathcal{P}_{\mathrm{do}})$. Assume that $j$ is a direct cause of $k$. Then

\[ j \not\perp\!\!\!\perp_{P} k \mid \mathrm{cause}(k)\setminus\{j\}. \]
Proof.

Assume $j\in\mathrm{dcause}(k)$. Then, by (6), the left-hand side of the equivalence (9) in Lemma 40 does not hold with $i$ replaced by $j$. Hence, we have the result. ∎

The following examples may help to illustrate the need for compatibility in our proof of Lemma 40.

Example 42 (Supports).

Consider the random variables $X_{1}$ and $X_{2}$, where $X_{1}$ is a random isometry of $\mathbb{R}^{2}$, chosen uniformly from a set of rotations, and $X_{2}=X_{1}\epsilon_{2}$, where $\epsilon_{2}$ is a standard bivariate normal random variable that is independent of $X_{1}$, and $X_{1}$ acts on $\epsilon_{2}$ in the natural way. It is well known that $X_{1}$ is independent of $X_{2}$. However, under a standard intervention, if the possibility of translations is included, then $\tilde{X}_{1}$ and $X_{2}$ would be dependent, and $1\in\mathrm{dcause}(2)$. Notice that compatibility is not satisfied, since the intervention allows for translations, which form a null set under the underlying distribution for $X_{1}$, which was restricted to rotations. $\Diamond$

With regard to compatibility, it is not enough for two measures to have common support.

Example 43 (Mutually singular measures with common support).

Let $U$ be uniformly distributed on the unit interval. Note that the bits of the binary expansion of $U$, given by the sequence $b(U)$, are iid Bernoulli random variables with parameter $p=\tfrac{1}{2}$. Working in reverse, we let $V$ be a random variable on the unit interval such that $b(V)$ is a sequence of iid Bernoulli random variables with parameter $p=\tfrac{1}{3}$. Note that although $U$ and $V$ both have the unit interval as their support, their associated laws are mutually singular.

Consider random variables $X_{1}$ and $X_{2}=\phi(X_{1},\epsilon_{2})$, where $\epsilon_{2}$ is uniformly distributed in the unit interval and independent of $X_{1}$, and the procedure for generating $X_{2}$ is as follows. We examine the input $X_{1}$ by taking $b(X_{1})$ and then taking the (infinite) sample average; once the sample average $q$ is obtained, we generate a single Bernoulli random variable with parameter $q$. If the distribution of $X_{1}$ is fixed to that of either $U$ or $V$ above, then clearly $X_{2}$ is independent of $X_{1}$. However, with an intervention on $X_{1}$ that allows for a mixture of $U$ and $V$, we will have $1\in\mathrm{dcause}(2)$. $\Diamond$
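The mechanism behind Example 43 is the strong law of large numbers: the digit average of $b(U)$ is almost surely $\tfrac{1}{2}$, while that of $b(V)$ is almost surely $\tfrac{1}{3}$, so the two laws concentrate on disjoint sets. The simulation below is only a finite-sample sketch of this fact; the truncation level `N_BITS`, the seeds, and the function `bit_average` are assumptions made for illustration, whereas the example itself uses the full infinite expansion.

```python
import random

def bit_average(p, n_bits, seed):
    """Average of the first n_bits binary digits when the digits are iid Bernoulli(p)."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n_bits)) / n_bits

N_BITS = 20_000  # truncation level; the example uses the infinite expansion

avg_u = bit_average(1 / 2, N_BITS, seed=0)  # digits of U
avg_v = bit_average(1 / 3, N_BITS, seed=1)  # digits of V

# The two averages settle near 1/2 and 1/3, which is the value q fed into phi.
print(round(avg_u, 3), round(avg_v, 3))
```

When $X_{1}$ is drawn from a mixture of the two laws, the value of $q$ reveals which component was used, which is why the intervention in the example creates a dependence between $X_{1}$ and $X_{2}$.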

7.2 Bivariate-quantifiable interventional families

As before, let $V=\{1,\ldots,N\}$. Let $\mathcal{P}_{\mathrm{do}}$ be a quantifiable interventional family. We say that $\mathcal{P}_{\mathrm{do}}$ is a bivariate-quantifiable interventional family (of distributions) if the underlying distribution $P$ satisfies the following:

Axiom 5.

For all distinct $i,j,k\in V$ such that $j\notin\mathrm{cause}(k)$ and $k\notin\mathrm{cause}(j)$, there exist regular conditional probabilities such that their marginals on $\mathcal{X}_{\{j,k\}}$ satisfy

\[
P^{\{j,k\}}_{\mathrm{do}(i)}(\cdot \mid x_{\mathrm{cause}(\{j,k\})\setminus\{i\}},x_{i})=
\begin{cases}
P^{\{j,k\}}(\cdot \mid x_{\mathrm{cause}(\{j,k\})}), & \text{if } i\notin\mathrm{cause}(\{j,k\}),\\
P^{\{j,k\}}(\cdot \mid x_{\mathrm{cause}(\{j,k\})\setminus\{i\}},x_{i}), & \text{if } i\in\mathrm{cause}(\{j,k\}),
\end{cases}
\]

for all $x_{\mathrm{cause}(\{j,k\})\setminus\{i\}}\in\mathcal{X}_{\mathrm{cause}(\{j,k\})\setminus\{i\}}$ and all $x_{i}\in\mathcal{X}_{i}$.

We note that the bivariate version above does not hold in general; in particular, it may fail when the intervened node $i$ lies between $j$ and $k$.

Example 44.

Consider again the simple SCM, where $1\rightarrow 2\rightarrow 3$ and $1\rightarrow 3$. In a discrete setting, we have

\[ P^{1,3}_{\mathrm{do}(2)}(x_{1},x_{3} \mid x_{2})=\mathbb{P}(X_{1}=x_{1},\phi_{3}(X_{1},x_{2},\epsilon_{3})=x_{3}), \]

but

\[ P_{\mathcal{C}}^{1,3}(x_{1},x_{3} \mid x_{2})=\mathbb{P}(X_{1}=x_{1},\phi_{3}(X_{1},x_{2},\epsilon_{3})=x_{3} \mid X_{2}=x_{2}), \]

and in general, the expressions will not be equal. $\Diamond$
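A concrete instance makes the gap between the two expressions in Example 44 visible. The sketch below assumes a hypothetical discrete SCM that is not specified in the paper: $X_{1}=\epsilon_{1}$, $X_{2}=X_{1}\oplus\epsilon_{2}$ (a noisy copy of $X_{1}$), and $X_{3}=X_{1}+X_{2}+\epsilon_{3}$, with the noise laws given in `NOISE`.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical noise laws: e1, e3 ~ Bernoulli(1/2), e2 ~ Bernoulli(1/4).
NOISE = {1: {0: F(1, 2), 1: F(1, 2)},
         2: {0: F(3, 4), 1: F(1, 4)},
         3: {0: F(1, 2), 1: F(1, 2)}}

def phi2(x1, e2):
    return x1 ^ e2            # X2 is a noisy copy of X1

def phi3(x1, x2, e3):
    return x1 + x2 + e3       # X3 depends on both X1 and X2

def observational(x1, x3, x2):
    """P^{1,3}(x1, x3 | x2): condition on the event X2 = x2."""
    num = den = F(0)
    for e1, e2, e3 in product((0, 1), repeat=3):
        w = NOISE[1][e1] * NOISE[2][e2] * NOISE[3][e3]
        if phi2(e1, e2) == x2:
            den += w
            if e1 == x1 and phi3(e1, x2, e3) == x3:
                num += w
    return num / den

def interventional(x1, x3, x2):
    """P_do(2)^{1,3}(x1, x3 | x2): set X2 = x2 and ignore the mechanism phi2."""
    total = F(0)
    for e1, e3 in product((0, 1), repeat=2):
        if e1 == x1 and phi3(e1, x2, e3) == x3:
            total += NOISE[1][e1] * NOISE[3][e3]
    return total

print(interventional(1, 2, 1), observational(1, 2, 1))  # prints: 1/4 3/8
```

The two values differ because conditioning on $X_{2}=1$ changes the law of $X_{1}$, whereas setting $X_{2}=1$ by intervention does not.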

We have a result similar to Proposition 36 for bivariate-quantifiable interventional families.

Proposition 45.

Let $\mathcal{P}_{\mathrm{do}}$ be a bivariate-quantifiable interventional family with the underlying distribution $P$. Then for all distinct $i,j,k\in V$ such that $j\notin\mathrm{cause}(k)$ and $k\notin\mathrm{cause}(j)$, there exist regular conditional probabilities such that their marginals on $\mathcal{X}_{\{j,k\}}$ satisfy

\[ P^{\{j,k\}}_{\mathrm{do}(i)}(\cdot \mid x_{\mathrm{cause}(\{j,k\})})=P^{\{j,k\}}(\cdot \mid x_{\mathrm{cause}(\{j,k\})}) \]

for all $x_{\mathrm{cause}(\{j,k\})}\in\mathcal{X}_{\mathrm{cause}(\{j,k\})}$.

Proof.

The proof is a simple routine modification of the proof of Proposition 36 for the univariate case. ∎

For bivariate-quantifiable interventional families, we have the following important conditional independence result.

Corollary 46 (Conditional independence of a disjoint pair on intervention).

Let $\mathcal{P}_{\mathrm{do}}$ be a bivariate-quantifiable interventional family. Then, for distinct $i,j,k\in V$ such that $j\notin\mathrm{cause}(k)$ and $k\notin\mathrm{cause}(j)$, we have that

\[
j\perp\!\!\!\perp_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{cause}(\{j,k\})\Longleftrightarrow j\perp\!\!\!\perp_{P}k\,|\,\mathrm{cause}(\{j,k\}).
\]
Proof.

The result is an immediate consequence of Proposition 45. ∎

We now provide conditions under which quantifiable families are observable.

Theorem 47 (Bivariate-quantifiable interventional families are strongly-observable interventional).

Let $\mathcal{P}_{\mathrm{do}}$ be a transitive bivariate-quantifiable interventional family with the underlying distribution $P$ and a directed ancestral causal graph $G(\mathcal{P}_{\mathrm{do}})$. If $P_{\mathrm{do}(i)}$ satisfies the intersection and composition properties for every $i\in V$, then $\mathcal{P}_{\mathrm{do}}$ is strongly observable.

Proof.

Note that by Remark 35, we may replace $\mathrm{\iota cause}(\cdot)$ with $\mathrm{cause}(\cdot)$ in the axioms.

Axioms 2 (b) and 3 (b) now follow from Lemma 40 and transitivity.

We prove Axioms 2 (a) and 3 (a). If $j\notin\mathrm{cause}(k)$ and $k\notin\mathrm{cause}(j)$, then the result follows directly from Corollary 46. Hence, assume, without loss of generality, that $j\in\mathrm{cause}(k)$. In this case, $\mathrm{cause}(\{j,k\})=\mathrm{cause}(k)\setminus\{j\}$. By conditioning, for measurable $J\subseteq\mathcal{X}_{j}$ and $K\subseteq\mathcal{X}_{k}$, we have

\[
P^{\{j,k\}}_{\mathrm{do}(i)}(J,K\,|\,x_{\mathrm{cause}(k)\setminus\{j\}})=\int_{K}\int_{J}dP^{k}_{\mathrm{do}(i)}(x_{k}\,|\,x_{\mathrm{cause}(k)\setminus\{j\}},x_{j})\,dP^{j}_{\mathrm{do}(i)}(x_{j}\,|\,x_{\mathrm{cause}(k)\setminus\{j\}})
\]

and

\[
P^{\{j,k\}}(J,K\,|\,x_{\mathrm{cause}(k)\setminus\{j\}})=\int_{K}\int_{J}dP^{k}(x_{k}\,|\,x_{\mathrm{cause}(k)\setminus\{j\}},x_{j})\,dP^{j}(x_{j}\,|\,x_{\mathrm{cause}(k)\setminus\{j\}}).
\]

By Proposition 36, we have

\[
P^{k}_{\mathrm{do}(i)}(\cdot\,|\,x_{\mathrm{cause}(k)\setminus\{j\}},x_{j})=P^{k}(\cdot\,|\,x_{\mathrm{cause}(k)\setminus\{j\}},x_{j}).
\]

Hence, together with the conditional independence $j\perp\!\!\!\perp_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{cause}(k)\setminus\{j\}$, we have the disintegration

\begin{align*}
P^{\{j,k\}}(J,K\,|\,x_{\mathrm{cause}(k)\setminus\{j\}}) &= \int_{K}\int_{J}dP^{k}_{\mathrm{do}(i)}(x_{k}\,|\,x_{\mathrm{cause}(k)\setminus\{j\}},x_{j})\,dP^{j}(x_{j}\,|\,x_{\mathrm{cause}(k)\setminus\{j\}})\\
&= \int_{K}dP^{k}_{\mathrm{do}(i)}(x_{k}\,|\,x_{\mathrm{cause}(k)\setminus\{j\}})\int_{J}dP^{j}(x_{j}\,|\,x_{\mathrm{cause}(k)\setminus\{j\}})\\
&= P^{k}_{\mathrm{do}(i)}(K\,|\,x_{\mathrm{cause}(k)\setminus\{j\}})\,P^{j}(J\,|\,x_{\mathrm{cause}(k)\setminus\{j\}}),
\end{align*}

and setting $J=\mathcal{X}_{j}$ we have

\[
P^{k}(K\,|\,x_{\mathrm{cause}(k)\setminus\{j\}})=P^{k}_{\mathrm{do}(i)}(K\,|\,x_{\mathrm{cause}(k)\setminus\{j\}}),
\]

from which we have the conditional independence $j\perp\!\!\!\perp_{P}k\,|\,\mathrm{cause}(k)\setminus\{j\}$; the other required direction of the proof is similar. ∎

As a direct consequence of Theorem 28, we have the following Markov property.

Corollary 48.

Let $\mathcal{P}_{\mathrm{do}}$ be a transitive bivariate-quantifiable interventional family, with underlying joint distribution $P$ and directed ancestral causal graph $G(\mathcal{P}_{\mathrm{do}})$. Assume that $P$ and $P_{\mathrm{do}(i)}$, for every $i\in V$, satisfy the intersection property and the composition property. It then holds that $P$ is Markovian to the causal graph $G(\mathcal{P}_{\mathrm{do}})$.

Corollary 49.

Let $\mathcal{P}_{\mathrm{do}}$ be a transitive quantifiable interventional family with the underlying distribution $P$ and directed ancestral causal graph $G(\mathcal{P}_{\mathrm{do}})$. If $P_{\mathrm{do}(i)}$ satisfies the intersection and composition properties for every $i\in V$, then $P$ satisfies the converse pairwise Markov property w.r.t. $G(\mathcal{P}_{\mathrm{do}})$.

Proof.

The case where there is an arrow from $j$ to $k$ is Proposition 41.

The required dependence, $j\not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{cause}(\{j,k\})$, where there is an arc between $j$ and $k$ in $G(\mathcal{P}_{\mathrm{do}})$, follows from Theorem 47 and Proposition 30. ∎

7.3 Uniqueness of observable distribution

We can now prove the uniqueness of $P$ under Axiom 4 for DAGs.

Proposition 50.

Let $\mathcal{P}_{\mathrm{do}}$ be a transitive bivariate-quantifiable interventional family, with underlying joint distribution $P$. Assume that $P$ and $P_{\mathrm{do}(i)}$, for every $i\in V$, satisfy the intersection property and the composition property. If $G(\mathcal{P}_{\mathrm{do}})$ is a DAG, then $P$ is unique.

Proof.

Notice that the Markov property in Corollary 48 leads to a factorization of $P$ for DAGs. For measurable $F\subset\mathcal{X}=\prod_{i\in V}\mathcal{X}_{i}$, we have

\begin{align*}
P(F) &= \int_{F}dP(x)\\
&= \int_{F}\prod_{k\in V}dP^{k}(x_{k}\,|\,x_{>k})\\
&= \int_{F}\prod_{k\in V}dP^{k}(x_{k}\,|\,x_{\mathrm{cause}(k)}),
\end{align*}

where $>k$ is the set of all nodes larger than $k$, which is obtained from a valid ordering of nodes (see Corollary 11), and the last equality uses the independence $k\perp\!\!\!\perp_{P}(>k)\setminus\mathrm{cause}(k)\,|\,\mathrm{cause}(k)$, implied by the Markov property. The desired uniqueness now follows from Remark 38. ∎
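
As a small worked instance of this factorization, consider a hypothetical three-node DAG with arrows $3\to 1$ and $3\to 2$ (an illustration, not an example from the paper), so that $\mathrm{cause}(1)=\mathrm{cause}(2)=\{3\}$ and $\mathrm{cause}(3)=\emptyset$. Assuming, as the display above suggests, that a valid ordering places causes after their effects, the ordering $1<2<3$ gives:

```latex
\begin{align*}
dP(x_1,x_2,x_3)
  &= dP^{1}(x_1\,|\,x_2,x_3)\,dP^{2}(x_2\,|\,x_3)\,dP^{3}(x_3)\\
  &= dP^{1}(x_1\,|\,x_3)\,dP^{2}(x_2\,|\,x_3)\,dP^{3}(x_3).
\end{align*}
```

The second line uses the independence $1\perp\!\!\!\perp_{P}2\,|\,3$, which is the case $k=1$ with $(>1)\setminus\mathrm{cause}(1)=\{2\}$; every factor is then a conditional of a node given its causes only.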

Notice that, in the case where there is an arc, the uniqueness does not hold. One can simply consider an $ij$-arc to observe that, using Axiom 4, only the marginal distributions of $X_{i}$ and $X_{j}$ are determined.

8 Specialization to structural causal models

In this section, we relate the standard intervention on SCMs to the setting presented in this paper.

Let $\mathcal{C}$ be an SCM with random vector $X$ taking values on $\mathcal{X}=\prod_{i\in V}\mathcal{X}_{i}$ with joint distribution $P_{\mathcal{C}}$, and associated graph $G_{\mathcal{C}}$. Consider again the standard intervention in SCMs, where intervention on $i\in V$ replaces the equation $X_{i}=\phi_{i}(X_{\mathrm{pa}_{G}(i)},\epsilon_{i})$ by $X_{i}=\tilde{X}_{i}$, where $\tilde{X}_{i}$ is independent of all other noises. In the setting of this paper, the new system of equations after intervening on $i$ yields the joint distribution $P_{\mathcal{C}}{\scriptstyle[\mathrm{do}(i)=\tilde{X}_{i}]}$, and consequently one obtains the family of distributions $\mathcal{P}_{\mathcal{C}}{\scriptstyle[\mathrm{do}=\tilde{X}]}:=\{P_{\mathcal{C}}{\scriptstyle[\mathrm{do}=\tilde{X}_{1}]},\ldots,P_{\mathcal{C}}{\scriptstyle[\mathrm{do}=\tilde{X}_{N}]}\}$.

Remark 51.

The definition of the set $\mathrm{cause}(i)$, defined for $\mathcal{P}_{\mathcal{C}}{\scriptstyle[\mathrm{do}=\tilde{X}]}$, in the setting of this paper is identical to the definition of the set of “causes” of $i$ in the SCM setting [22, 24]. $\Diamond$

Remark 52.

Note that under some weak, but technical, assumptions (in the sense of compatibility), we suspect it is possible to show that $\mathrm{cause}(A)$ is invariant under the choice of $\tilde{X}$; see also Proposition 6.13 in [24], which has technical counterexamples.

Therefore, under invariance, if we are only interested in the causal structure, we can simply refer to some canonical family, where the intervention $\tilde{X}_{i}$ has the same distribution as $X_{i}$, which we denote by $\mathcal{P}_{\mathrm{do}}(\mathcal{C})=\{P_{\mathrm{do}(1)},\ldots,P_{\mathrm{do}(N)}\}$. $\Diamond$

8.1 SCMs and interventional families

The first question that needs to be addressed is when $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ satisfies the different axioms and key assumptions related to interventional families. It is well known that cancellations may occur in SCMs so that cause is not transitive (as required in Axiom 1). We do not provide conditions on SCMs under which cause is transitive; the main results in this section do not require transitivity of cause. The following example shows that standard interventions on SCMs need not lead to quantifiable interventional families either.

Example 53.

Consider the collider $X_{1}\to X_{3}\leftarrow X_{2}$, where $X_{3}=X_{1}\oplus X_{2}\bmod 2$. Consider the underlying joint distribution $P_{\mathcal{C}}$ where $X_{1}$ is Bernoulli with parameter $p_{1}=\tfrac{1}{100}$ and $X_{2}$ is Bernoulli with parameter $p_{2}=\tfrac{1}{2}$. Consider the standard interventions where $p_{1}\to\tfrac{1}{2}$ and $p_{2}\to\tfrac{1}{100}$. Although these are standard interventions, the resulting family does not satisfy Axiom 4. Observe that $X_{2}$ is a direct cause of $X_{3}$, but $X_{1}$ is not a cause of $X_{3}$. However, $P_{\mathrm{do}(1)}(x_{3}=1\,|\,x_{2}=1,x_{1}=1)=0$ and $P(x_{3}=1\,|\,x_{2}=1)=P(x_{1}=0)=\tfrac{99}{100}$. We also see that the conditions of Theorem 10 are not satisfied. $\Diamond$
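
The two probabilities quoted in Example 53 can be reproduced by exact enumeration of the XOR collider. This is a minimal sketch for checking the arithmetic; the helper names `joint`, `prob`, and `cond` are hypothetical and not part of the paper.

```python
from fractions import Fraction
from itertools import product


def joint(p1, p2):
    """Law of (X1, X2, X3) with X3 = X1 XOR X2, P(X1=1)=p1, P(X2=1)=p2."""
    law = {}
    for x1, x2 in product((0, 1), repeat=2):
        w = (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)
        law[(x1, x2, x1 ^ x2)] = w
    return law


def prob(law, **fixed):
    """Probability that the named coordinates (x1, x2, x3) take the given values."""
    idx = {"x1": 0, "x2": 1, "x3": 2}
    return sum(w for xs, w in law.items()
               if all(xs[idx[n]] == v for n, v in fixed.items()))


def cond(law, event, given):
    """Conditional probability P(event | given); both are dicts of coordinates."""
    return prob(law, **event, **given) / prob(law, **given)


P = joint(Fraction(1, 100), Fraction(1, 2))          # observational law
P_do1 = joint(Fraction(1, 2), Fraction(1, 2))        # intervention p1 -> 1/2
P_do2 = joint(Fraction(1, 100), Fraction(1, 100))    # intervention p2 -> 1/100

lhs = cond(P_do1, {"x3": 1}, {"x2": 1, "x1": 1})
rhs = cond(P, {"x3": 1}, {"x2": 1})
print(lhs, rhs)
```

The same helpers also show why $X_{1}$ is not a cause of $X_{3}$ here: under `P_do1` the conditional law of $X_{3}$ given $X_{1}$ does not depend on the value of $X_{1}$, whereas under `P_do2` the law of $X_{3}$ given $X_{2}$ does depend on $X_{2}$.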

We will need to introduce a concept related to faithfulness on the edge level. We say that $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ satisfies the edge-cause condition w.r.t. $G_{\mathcal{C}}$ if an arrow from $i$ to $j$ in $G_{\mathcal{C}}$ implies that $i\in\mathrm{dcause}(j)$, i.e., $i\not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}}j$ and $i\not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}}j\,|\,\mathrm{\iota cause}_{i}(j)\setminus\{i\}$.

In Section 8.4, we will discuss simple conditions on the SCM that imply the edge-cause condition. In particular, it is easy to see that if the $P_{\mathrm{do}(i)}$ are faithful to the intervened graphs $(G_{\mathcal{C}})_{i}$, where the incoming arrows and arcs to $i$ are removed, then the edge-cause condition is satisfied. The same can be said under the weaker condition of adjacency-faithfulness of $P_{\mathrm{do}(i)}$ and $(G_{\mathcal{C}})_{i}$.

Proposition 54 (Ancestors and causes).

For an SCM $\mathcal{C}$ and the family $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$, we have that

\[
\mathrm{dcause}(k)\subseteq\mathrm{pa}_{G_{\mathcal{C}}}(k)\text{ and }\mathrm{cause}(k)\subseteq\mathrm{an}_{G_{\mathcal{C}}}(k), \tag{11}
\]

for every $k\in V$. In addition, if $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ satisfies the edge-cause condition w.r.t. $G_{\mathcal{C}}$, then

\[
\mathrm{dcause}(k)=\mathrm{pa}_{G_{\mathcal{C}}}(k)\text{ and }\mathrm{pa}_{G_{\mathcal{C}}}(k)\subseteq\mathrm{cause}(k). \tag{12}
\]

Moreover, $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ is transitive if and only if

\[
\mathrm{cause}(k)=\mathrm{an}_{G_{\mathcal{C}}}(k). \tag{13}
\]
Proof.

By the structure of the SCM, if $j\notin\mathrm{an}_{G_{\mathcal{C}}}(k)$, then an intervention $\tilde{X}_{j}$ on $X_{j}$ does not affect the distribution of $X_{k}$, so that $\tilde{X}_{j}$ would be independent of $X_{k}$, and $j$ would not be a cause of $k$. Similarly, if $j\notin\mathrm{pa}_{G_{\mathcal{C}}}(k)$, then given $\mathrm{\iota cause}_{j}(k)\setminus\{j\}$ an intervention $\tilde{X}_{j}$ on $X_{j}$ does not affect the distribution of $X_{k}$, and thus $j$ cannot be a direct cause of $k$. Hence we have established (11), and (12) follows from the definition of the edge-cause condition.

To prove (13), we note that we already have $\mathrm{dcause}(k)=\mathrm{pa}_{G_{\mathcal{C}}}(k)$. Transitivity of the interventional family and $\mathrm{cause}(k)\subseteq\mathrm{an}_{G_{\mathcal{C}}}(k)$ imply that $\mathrm{cause}(k)=\mathrm{an}_{G_{\mathcal{C}}}(k)$; the other direction follows from the fact that the set of ancestors is transitive. ∎

The following example shows that the inclusions in (11) may be strict, and that (12) does not hold without the edge-cause condition.

Example 55 (Independence and cause in an SCM).

Consider the collider

\[
X_{1}\to X_{3}\leftarrow X_{2},
\]

where $X_{3}=X_{1}\oplus X_{2}\bmod 2$. Consider the underlying joint distribution $P_{\mathcal{C}}$ where $X_{1}$ is Bernoulli with parameter $p_{1}=\tfrac{1}{2}$ and $X_{2}$ is Bernoulli with parameter $p_{2}=\tfrac{1}{2}$. Consider the standard interventions where nothing happens: $p_{1}\to\tfrac{1}{2}$ and $p_{2}\to\tfrac{1}{2}$. Then $X_{3}$ has no causes, and thus its set of ancestors is not equal to its set of causes. However, it is easy to verify that Axiom 4 is satisfied. $\Diamond$
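
The claim that $X_{3}$ has no causes in Example 55 can be checked mechanically. The sketch below assumes, following the usage in the proof of Proposition 54, that $i$ is a cause of $3$ exactly when $X_{i}$ and $X_{3}$ are dependent under $P_{\mathrm{do}(i)}$; the function names and the parametrisation `(p1, p2, q1, q2)` (observational and interventional Bernoulli parameters) are hypothetical.

```python
from fractions import Fraction
from itertools import product


def xor_law(a, b):
    """Law of (X1, X2, X3) with X3 = X1 XOR X2, P(X1=1)=a, P(X2=1)=b."""
    return {(x1, x2, x1 ^ x2): (a if x1 else 1 - a) * (b if x2 else 1 - b)
            for x1, x2 in product((0, 1), repeat=2)}


def independent(law, i, k):
    """Exact test that coordinates i and k (0-based) are independent under law."""
    def marg(coords, vals):
        return sum(w for xs, w in law.items()
                   if all(xs[c] == v for c, v in zip(coords, vals)))
    return all(marg((i, k), (u, v)) == marg((i,), (u,)) * marg((k,), (v,))
               for u, v in product((0, 1), repeat=2))


def causes_of_x3(p1, p2, q1, q2):
    """Nodes i in {1, 2} such that X_i and X_3 are dependent under P_do(i).

    p1, p2 are the observational parameters; q1, q2 are the parameters of the
    intervened variables (assumed notation, for illustration only).
    """
    do_laws = {1: xor_law(q1, p2), 2: xor_law(p1, q2)}
    return {i for i, law in do_laws.items() if not independent(law, i - 1, 2)}


half, small = Fraction(1, 2), Fraction(1, 100)
print(causes_of_x3(half, half, half, half))    # parameters of Example 55
print(causes_of_x3(small, half, half, small))  # parameters of Example 53
```

With the parameters of Example 55 the set of causes is empty although both nodes are parents of $X_{3}$; with those of Example 53 only node 2 is detected, since the cancellation depends on the other parent being uniform.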

We also recall that the joint distribution $P_{\mathcal{C}}$ of a structural causal model $\mathcal{C}$ is Markovian to $G_{\mathcal{C}}$. We need a corresponding Markov property for the intervened distributions.

Lemma 56.

Let $\mathcal{C}$ be an SCM. For each $i\in V$, its intervened distribution $P_{\mathrm{do}(i)}$ is Markovian to $(G_{\mathcal{C}})_{i}$ and $G_{\mathcal{C}}$.

Proof.

Notice that intervention on $i$ yields another SCM with joint distribution $P_{\mathrm{do}(i)}$ and intervened graph $G_{i}=(G_{\mathcal{C}})_{i}$, where $G_{i}$ is obtained from $G_{\mathcal{C}}$ by removing all arrows from the parents of $i$ to $i$ and all arcs with one endpoint $i$. Therefore, $P_{\mathrm{do}(i)}$ is Markovian to $G_{i}$. Since $G_{i}$ is a subgraph of $G_{\mathcal{C}}$ (with the same node set), a separation in the original graph is also a separation in the intervened graph; we conclude that $P_{\mathrm{do}(i)}$ is Markovian to $G_{\mathcal{C}}$. ∎
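
The graph operation used in this proof can be sketched in a few lines. The representation below (a set of `(parent, child)` arrows and a set of two-element `frozenset` arcs) and the function names `intervene` and `is_subgraph` are assumptions made for illustration, as is the example graph.

```python
def intervene(arrows, arcs, i):
    """Return the intervened graph after intervening on node i.

    arrows: set of (parent, child) pairs; arcs: set of frozenset({u, v}).
    All arrows pointing to i and all arcs with endpoint i are removed;
    the node set is left unchanged.
    """
    kept_arrows = {(u, v) for (u, v) in arrows if v != i}
    kept_arcs = {a for a in arcs if i not in a}
    return kept_arrows, kept_arcs


def is_subgraph(small, big):
    """Check edge-set inclusion for two graphs on the same node set."""
    return small[0] <= big[0] and small[1] <= big[1]


# Hypothetical example: 1 -> 2 -> 3, 1 -> 3, and an arc between 2 and 4.
arrows = {(1, 2), (2, 3), (1, 3)}
arcs = {frozenset({2, 4})}

g2 = intervene(arrows, arcs, 2)   # drops 1 -> 2 and the arc 2 <-> 4
g1 = intervene(arrows, arcs, 1)   # node 1 has no incoming edges
print(g2, g1)
```

Since edges are only ever removed, every intervened graph is a subgraph of the original one, which is the property the Markov argument relies on. Intervening on a node with no incoming arrows or arcs, such as node 1 here, leaves the graph unchanged.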

Theorem 57 (Strongly observable SCMs).

Let $\mathcal{C}$ be an SCM associated to graph $G_{\mathcal{C}}$. Assume that $P_{\mathcal{C}}$ satisfies the converse pairwise Markov property w.r.t. $G_{\mathcal{C}}$. Also assume that $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ satisfies the edge-cause condition w.r.t. $G_{\mathcal{C}}$. Then $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ is a strongly observable interventional family if $G_{\mathcal{C}}$ is ancestral. In addition, if $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ is transitive, then the result holds for BDMGs.

Proof.

To prove Axiom 2 (a) in the ancestral case, by Lemma 5, for a separable pair $(j,k)$, we have that $j\perp_{m}k\,|\,A$, for any subset $A$ of the ancestors of $j$ and $k$ containing their parents. Under the edge-cause condition, $\mathrm{cause}(\{j,k\})$ contains the parents. Hence, by the Markov property of SCMs, we have that $j\perp\!\!\!\perp_{P}k\,|\,\mathrm{cause}(\{j,k\})$.

To prove Axiom 2 (b) in the ancestral case, suppose that $i\in\mathrm{cause}(k)$. By Proposition 54, we have that $i\in\mathrm{an}_{G_{\mathcal{C}}}(k)$. By the edge-cause condition, we may assume that there is no arrow from $i$ to $k$. There is also no arc between $i$ and $k$, since causes are ancestors. Since in ancestral graphs, inseparable pairs of nodes cannot be ancestors of each other, we conclude that $i,k$ are separable. Again by using Lemma 5 and the same argument as for Axiom 2 (a), the result follows from the Markov property for SCMs.

If we further assume that $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ is transitive then, by Proposition 54, we have that $\mathrm{cause}(\{i,k\})=\mathrm{an}(\{i,k\})$, and the results follow without the assumption of the graph being ancestral.

To prove Axioms 3 (a) and (b), we consider two cases. If $j,k\in V$ are not adjacent in $G_{\mathcal{C}}$, then the result follows from the Markov property of $P_{\mathrm{do}(i)}$ for SCMs, given in Lemma 56. If $j$ and $k$ are adjacent, then the result follows from the converse pairwise Markov property. ∎

8.2 Causal graphs and graphs associated to SCMs

In this subsection, we present the ultimate relationship between interventions in the SCM setting and the setting in this paper, which relates the “true causal graph” $G_{\mathcal{C}}$ with the causal graph $G(\mathcal{P}_{\mathrm{do}})$ defined in this paper. We first need some lemmas.

Lemma 58.

Let $\mathcal{C}$ be an SCM. We have that $\mathrm{pa}_{G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))}(k)\subseteq\mathrm{pa}_{G_{\mathcal{C}}}(k)$, for every $k\in V$. In addition, if $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ satisfies the edge-cause condition w.r.t. its associated graph $G_{\mathcal{C}}$, then $\mathrm{pa}_{G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))}(k)=\mathrm{pa}_{G_{\mathcal{C}}}(k)$.

Proof.

Let $j\in\mathrm{pa}_{G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))}(k)$. This arrow corresponds to a direct cause by construction; hence, by Proposition 54, it follows that $j\in\mathrm{pa}_{G_{\mathcal{C}}}(k)$.

To prove the second statement, if $i\in\mathrm{pa}_{G_{\mathcal{C}}}(k)$, then by the edge-cause condition, we have $i\in\mathrm{dcause}(k)$. Hence, by construction, $i\in\mathrm{pa}_{G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))}(k)$. ∎

To prove a part of the next main result that deals with directed ancestral graphs, we need the following lemma.

Lemma 59.

Let $\mathcal{C}$ be an SCM, and assume that $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ satisfies the edge-cause condition w.r.t. its associated directed ancestral graph $G_{\mathcal{C}}$. Then for every $i,j\in V$, we have

\[
i\perp\!\!\!\perp_{P_{\mathcal{C}}}j\,|\,\mathrm{cause}(\{i,j\})\Rightarrow i\perp\!\!\!\perp_{P_{\mathcal{C}}}j\,|\,\mathrm{an}_{G_{\mathcal{C}}}(\{i,j\}).
\]
Proof.

Assume that $i\perp\!\!\!\perp_{P_{\mathcal{C}}}j\,|\,\mathrm{cause}(\{i,j\})$; in addition, without loss of generality, choose $j$ not to be a descendant of $i$. Let $K:=\mathrm{an}_{G_{\mathcal{C}}}(\{i,j\})\setminus\mathrm{cause}(\{i,j\})$, and $k\in K$. By the edge-cause condition, $k\notin\mathrm{pa}(i)\cup\mathrm{pa}(j)$, and since $G_{\mathcal{C}}$ is ancestral, we know that there is no arc between $k$ and $\{i,j\}$. Hence we have the separation $i\perp_{m}k\,|\,\mathrm{cause}(\{i,j\})\cup\{j\}$; moreover, since $P_{\mathcal{C}}$ is Markovian to $G_{\mathcal{C}}$, we have the independence $i\perp\!\!\!\perp_{P_{\mathcal{C}}}k\,|\,\mathrm{cause}(\{i,j\})\cup\{j\}$.

The contraction property implies that $i\perp\!\!\!\perp_{P_{\mathcal{C}}}\{j,k\}\,|\,\mathrm{cause}(\{i,j\})$, and, by an inductive argument with similar reasoning, we obtain that

\[
i\perp\!\!\!\perp_{P_{\mathcal{C}}}[\{j\}\cup K]\,|\,\mathrm{cause}(\{i,j\}).
\]

Finally, by weak union, we have $i\perp\!\!\!\perp_{P_{\mathcal{C}}}j\,|\,\mathrm{an}(\{i,j\})$. ∎
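
For reference, the two semi-graphoid rules used in this proof can be written out explicitly. The following restates them with the shorthand $C:=\mathrm{cause}(\{i,j\})$, which is introduced here only for readability:

```latex
\begin{align*}
&\text{Contraction:} &&
  i \perp\!\!\!\perp j \,|\, C
  \ \text{ and }\
  i \perp\!\!\!\perp k \,|\, C\cup\{j\}
  \ \Longrightarrow\
  i \perp\!\!\!\perp \{j,k\} \,|\, C,\\
&\text{Weak union:} &&
  i \perp\!\!\!\perp \{j\}\cup K \,|\, C
  \ \Longrightarrow\
  i \perp\!\!\!\perp j \,|\, C\cup K.
\end{align*}
```

Since $K=\mathrm{an}_{G_{\mathcal{C}}}(\{i,j\})\setminus C$ and $C\subseteq\mathrm{an}_{G_{\mathcal{C}}}(\{i,j\})$ by (11), the conditioning set $C\cup K$ in the last step equals $\mathrm{an}_{G_{\mathcal{C}}}(\{i,j\})$.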

Theorem 60 (Equality of causal and SCM graphs).

Consider a structural causal model $\mathcal{C}$ with the joint distribution $P_{\mathcal{C}}$. If $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ satisfies the edge-cause condition, and $P_{\mathcal{C}}$ satisfies the converse pairwise Markov property w.r.t. the maximal directed ancestral graph $G_{\mathcal{C}}$, then $G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))=G_{\mathcal{C}}$. In addition, if $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ is transitive, then the result holds for maximal BDMGs. Moreover, without the maximality assumption, it holds that $G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))$ and $G_{\mathcal{C}}$ are Markov equivalent.

Proof.

By Lemma 58, we have that $\mathrm{pa}_{G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))}(k)=\mathrm{pa}_{G_{\mathcal{C}}}(k)$. Therefore, in order to show $G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))=G_{\mathcal{C}}$, it is enough to prove that $G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))$ and $G_{\mathcal{C}}$ have the same arcs. We will check pairs of nodes with no edge between them.

Since $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ is strongly observable (Theorem 57), by Proposition 30, there is no edge between $i$ and $j$ in $G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))$ if and only if $i\perp\!\!\!\perp_{P_{\mathcal{C}}}j\,|\,\mathrm{cause}(\{i,j\})$; assume that this conditional independence statement holds. If $G_{\mathcal{C}}$ is ancestral, then by Lemma 59, we have that $i\perp\!\!\!\perp_{P_{\mathcal{C}}}j\,|\,\mathrm{an}(\{i,j\})$.

If $G_{\mathcal{C}}$ is not ancestral, then using transitivity and Proposition 54, $\mathrm{an}(\cdot)$ is replaced with $\mathrm{cause}(\cdot)$. Now, in both cases, the converse pairwise Markov property implies that there is no edge between $i$ and $j$ in $G_{\mathcal{C}}$.

Now consider the other direction: let $i\not\sim j$ in $G_{\mathcal{C}}$. If $G_{\mathcal{C}}$ is ancestral, then, since it is also maximal, and $\mathrm{cause}(\{i,j\})\subseteq\mathrm{an}_{G_{\mathcal{C}}}(\{i,j\})$, by Lemma 5, we have $i\perp_{m}j\,|\,\mathrm{cause}(\{i,j\})$. Then the global Markov property for SCMs implies $i\perp\!\!\!\perp_{P_{\mathcal{C}}}j\,|\,\mathrm{cause}(\{i,j\})$.

If $G_{\mathcal{C}}$ is not ancestral, then by Proposition 3, with $\mathrm{an}(\cdot)$ replaced with $\mathrm{cause}(\cdot)$ (using Proposition 54), it holds that $i\perp\!\!\!\perp_{P_{\mathcal{C}}}j\,|\,\mathrm{cause}(\{i,j\})$. This implies that there is no arc between $i$ and $j$ in $G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))$.

Finally, we note that if the graph is not maximal, the only pairs of nodes $(i,j)$ for which $i\perp\!\!\!\perp_{P_{\mathcal{C}}}j\,|\,\mathrm{cause}(\{i,j\})$ may not hold are inseparable pairs. However, the lack of an edge between inseparable pairs in $G_{\mathcal{C}}$, and simultaneously the existence of it in $G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))$, do not violate Markov equivalence. ∎

The following example shows that we do indeed need to assume that $G_{\mathcal{C}}$ is maximal.

Example 61.

Consider the non-maximal graph of Figure 3 to be $G_{\mathcal{C}}$. Assume standard intervention and also faithfulness of $P_{\mathcal{C}}$ and $G_{\mathcal{C}}$. In $G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))$, there exists an arc between $j$ and $k$ since no matter what one intervenes on, $j$ and $k$ always stay dependent given any conditioning set. Notice that here we require two discriminating paths between $j$ and $k$. If there were only one discriminating path between $j$ and $k$ (for example with no $h'$ and $\ell'$), then by intervening on any node on the discriminating path (e.g., $h$), one obtains the required independence $j\perp\!\!\!\perp_{P_{\mathrm{do}(h)}}k\,|\,\mathrm{\iota cause}_{h}(\{j,k\})$. $\Diamond$

[Figure 3: A (non-maximal) graph associated to an SCM, with nodes $\ell$, $h$, $k$, $j$, $\ell'$, $h'$.]

Corollary 62.

Let $\mathcal{C}$ be an SCM. If its joint distribution $P_{\mathcal{C}}$ and its associated bowless directed mixed graph $G_{\mathcal{C}}$ are faithful, and its intervened distributions $P_{\mathrm{do}(i)}$ and the associated intervened graphs $(G_{\mathcal{C}})_{i}$ are faithful, for every $i$, then $G(\mathcal{P}_{\mathrm{do}}(\mathcal{C}))=G_{\mathcal{C}}$.

Proof.

The result follows from the fact that faithfulness of intervened distributions and graphs implies the edge-cause condition and faithfulness of the underlying distribution and graph implies the converse pairwise Markov property. ∎

8.3 SCMs and quantifiable interventional families

In this subsection, we assume that the graph $G_{\mathcal{C}}$ associated to an SCM is a directed ancestral graph in order to follow the theory presented in Section 7.

Although Axiom 4 may not in general be satisfied, we do have the following observations.

Lemma 63.

Let $\mathcal{C}$ be an SCM associated to a directed ancestral graph $G_{\mathcal{C}}$. Let $i\in V$, $k\in V\setminus\{i\}$, and $A\subseteq V$ be such that $\mathrm{pa}(k)\subseteq A\subseteq\mathrm{an}_{G_{\mathcal{C}}}(k)$. There exist regular conditional probabilities such that their marginals on $\mathcal{X}_{k}$ satisfy

\[
P^{k}(\cdot\,|\,x_{A})=\mathbb{P}(\phi_{k}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(k)},\epsilon_{k})\in\cdot)\quad\text{for all } x_{A}\in\mathcal{X}_{A},
\]
\[
P^{k}_{\mathrm{do}(i)}(\cdot\,|\,x_{A})=P^{k}(\cdot\,|\,x_{A})\quad\text{for all } i\in\mathrm{an}_{G_{\mathcal{C}}}(k),\ x_{A}\in\mathcal{X}_{A},
\]

and

\[
P^{k}_{\mathrm{do}(i)}(\cdot\,|\,x_{A},x_{i})=P^{k}(\cdot\,|\,x_{A})\quad\text{for all } i\notin\mathrm{an}_{G_{\mathcal{C}}}(k),\ x_{i}\in\mathcal{X}_{i},\ x_{A}\in\mathcal{X}_{A}.
\]
Proof.

Let $K\subseteq\mathcal{X}_{k}$ be measurable. Then for $x_{A}\in\mathcal{X}_{A}$, we have

\begin{align*}
\mathbb{P}(X_{k}\in K\,|\,X_{A}=x_{A}) &= \mathbb{P}(\phi_{k}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(k)},\epsilon_{k})\in K\,|\,X_{A}=x_{A})\\
&= \mathbb{P}(\phi_{k}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(k)},\epsilon_{k})\in K),
\end{align*}

where the last equality follows from the fact that the graph is ancestral, and thus $\epsilon_{k}$ is independent of $X_{A}$.

Suppose that $i\in\mathrm{an}_{G_{\mathcal{C}}}(k)$. Similarly, upon standard intervention on $i$, since all the parents are given, we still have

\[
\mathbb{P}(\tilde{X}_{k}\in K\,|\,\tilde{X}_{A}=x_{A})=\mathbb{P}(\phi_{k}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(k)},\epsilon_{k})\in K).
\]

If $i\notin\mathrm{an}_{G_{\mathcal{C}}}(k)$, then upon standard intervention on $i$, we have that $(\epsilon_{k},\tilde{X}_{\mathrm{an}_{G_{\mathcal{C}}}(k)},\tilde{X}_{i})$ are independent, so that

\[
\mathbb{P}(\tilde{X}_{k}\in K\,|\,\tilde{X}_{A}=x_{A},\tilde{X}_{i}=x_{i})=\mathbb{P}(\phi_{k}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(k)},\epsilon_{k})\in K).
\]
∎
Lemma 64.

Let $\mathcal{C}$ be an SCM associated to a directed ancestral graph $G_{\mathcal{C}}$. Consider distinct $i,j,k\in V$ such that $j\notin\mathrm{pa}_{G_{\mathcal{C}}}(k)$ and $k\notin\mathrm{pa}_{G_{\mathcal{C}}}(j)$, and let $A\subseteq V$ be such that $\mathrm{pa}(j,k)\subseteq A\subseteq\mathrm{an}_{G_{\mathcal{C}}}(j,k)$. There exist regular conditional probabilities such that their marginals on $\mathcal{X}_{j,k}$ satisfy

\[
P^{j,k}(J,K\,|\,x_{A})=\mathbb{P}\big(\phi_{j}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(j)},\epsilon_{j})\in J,\ \phi_{k}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(k)},\epsilon_{k})\in K\big),
\]
\[
P^{j,k}_{\mathrm{do}(i)}(J,K\,|\,x_{A})=P^{j,k}(J,K\,|\,x_{A})\quad\text{for all } i\in\mathrm{an}_{G_{\mathcal{C}}}(j,k),\ x_{A}\in\mathcal{X}_{A},
\]

and

\[
P^{j,k}_{\mathrm{do}(i)}(J,K\,|\,x_{A},x_{i})=P^{j,k}(J,K\,|\,x_{A})\quad\text{for all } i\notin\mathrm{an}_{G_{\mathcal{C}}}(j,k),\ x_{i}\in\mathcal{X}_{i},\ x_{A}\in\mathcal{X}_{A},
\]

for all measurable $J\subseteq\mathcal{X}_{j}$ and $K\subseteq\mathcal{X}_{k}$.

Proof.

We have

\begin{align*}
\mathbb{P}(X_{j}\in J,X_{k}\in K\,|\,X_{A}=x_{A}) &= \mathbb{P}\big(\phi_{j}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(j)},\epsilon_{j})\in J,\ \phi_{k}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(k)},\epsilon_{k})\in K\,|\,X_{A}=x_{A}\big)\\
&= \mathbb{P}\big(\phi_{j}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(j)},\epsilon_{j})\in J,\ \phi_{k}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(k)},\epsilon_{k})\in K\big),
\end{align*}

since by definition $A$ contains neither $j$ nor $k$, and thus $X_{A}$ and $(\epsilon_{j},\epsilon_{k})$ are independent.

Suppose that $i\in\mathrm{an}_{G_{\mathcal{C}}}(j,k)$. Similarly, upon standard intervention on $i$, since we assumed $j\notin\mathrm{pa}_{G_{\mathcal{C}}}(k)$ and $k\notin\mathrm{pa}_{G_{\mathcal{C}}}(j)$, all the parents are given, and we still have

\[
\mathbb{P}(\tilde{X}_{j}\in J,\tilde{X}_{k}\in K\,|\,\tilde{X}_{A}=x_{A})=\mathbb{P}\big(\phi_{j}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(j)},\epsilon_{j})\in J,\ \phi_{k}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(k)},\epsilon_{k})\in K\big).
\]

If $i\notin\mathrm{an}_{G_{\mathcal{C}}}(j,k)$, then upon standard intervention on $i$, we have that $((\epsilon_{j},\epsilon_{k}),\tilde{X}_{\mathrm{an}_{G_{\mathcal{C}}}(j,k)},\tilde{X}_{i})$ are independent, so that

\[
\mathbb{P}(\tilde{X}_{j}\in J,\tilde{X}_{k}\in K\,|\,\tilde{X}_{A}=x_{A},\tilde{X}_{i}=x_{i})=\mathbb{P}\big(\phi_{j}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(j)},\epsilon_{j})\in J,\ \phi_{k}(x_{\mathrm{pa}_{G_{\mathcal{C}}}(k)},\epsilon_{k})\in K\big).
\]
∎
Theorem 65.

Let $\mathcal{C}$ be an SCM associated to a directed ancestral graph $G_{\mathcal{C}}$. Assume that $\{\mathcal{P}_{\mathrm{do}}(\mathcal{C}),P_{\mathcal{C}}\}$ is compatible, and $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ satisfies the edge-cause condition w.r.t. $G_{\mathcal{C}}$. Then $\mathcal{P}_{\mathrm{do}}(\mathcal{C})$ is a bivariate-quantifiable interventional family.

Proof.

Using Proposition 54, we have that Lemmas 63 and 64 imply Axioms 4 and 5. ∎
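
The identities of Lemma 63 that feed into this proof can be checked by exact enumeration on a toy model. The sketch below uses a hypothetical binary chain $1\to 2\to 3$ with $X_{2}=X_{1}\oplus\epsilon_{2}$ and $X_{3}=X_{2}\oplus\epsilon_{3}$; the noise parameters `Q2`, `Q3`, the intervention parameter `q`, and the helper names are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical noise parameters: P(eps2 = 1) and P(eps3 = 1).
Q2, Q3 = Fraction(1, 10), Fraction(1, 5)


def bern(p, v):
    """Probability that a Bernoulli(p) variable equals v."""
    return p if v == 1 else 1 - p


def law(p1, do=None, q=Fraction(1, 2)):
    """Joint law of (X1, X2, X3) for the chain 1 -> 2 -> 3.

    Observationally X1 ~ Bern(p1), X2 = X1 XOR eps2, X3 = X2 XOR eps3.
    With do in {1, 2, 3}, that variable is replaced by an independent
    Bern(q) variable (a standard intervention).
    """
    out = {}
    for x1, e2, e3, t in product((0, 1), repeat=4):
        # t is the intervention variable; it is summed out when unused.
        w = (bern(q if do == 1 else p1, x1)
             * bern(Q2, e2) * bern(Q3, e3) * bern(q, t))
        x2 = t if do == 2 else x1 ^ e2
        x3 = t if do == 3 else x2 ^ e3
        out[(x1, x2, x3)] = out.get((x1, x2, x3), 0) + w
    return out


def cond(law_, k, vk, given):
    """P(X_k = vk | X_a = v for (a, v) in given), with 1-based node labels."""
    match = [(xs, w) for xs, w in law_.items()
             if all(xs[a - 1] == v for a, v in given.items())]
    den = sum(w for _, w in match)
    num = sum(w for xs, w in match if xs[k - 1] == vk)
    return num / den


P = law(Fraction(3, 10))
P_do1 = law(Fraction(3, 10), do=1)
P_do2 = law(Fraction(3, 10), do=2)
P_do3 = law(Fraction(3, 10), do=3)
print(cond(P, 3, 1, {2: 0}), cond(P_do1, 3, 1, {2: 0}), cond(P_do2, 3, 1, {2: 0}))
```

Here $k=3$ with $A=\mathrm{pa}(3)=\{2\}$ illustrates the case $i\in\mathrm{an}(k)$ (interventions on 1 or 2 leave the conditional of $X_{3}$ given $X_{2}$ unchanged), while $k=2$, $A=\{1\}$, $i=3$ illustrates the case $i\notin\mathrm{an}(k)$.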

8.4 Characterizing the edge-cause condition

Consider an SCM, where $i$ is a parent of $j$. Previously [30, Proposition 35], we characterized completely which functions $\phi_{j}$ would result in the independence $i\perp\!\!\!\perp j\,|\,\mathrm{an}(i,j)$ and thus the converse pairwise Markov property. Using similar ideas, we give an analogous characterization of when the edge-cause condition holds using the following two remarks.

Proposition 66.

Consider an SCM with $i\in\mathrm{pa}(j)$. Write $X_{j}=\phi_{j}(X_{\mathrm{pa}(j)\setminus\{i\}},X_{i},\epsilon_{j})$. For almost all $x_{i}\in\mathcal{X}_{i}$, let $D_{x_{i}}$ be a random variable with law $Q_{x_{i}}(\cdot)=\mathbb{P}(X_{\mathrm{pa}(j)\setminus\{i\}}\in\cdot\,|\,X_{i}=x_{i})$ that is independent of $\epsilon_{j}$. Then $X_{i}$ is independent of $X_{j}$ if and only if, for almost all $x_{i}\in\mathcal{X}_{i}$, we have that $X_{j}$ has the same law as $\phi_{j}(D_{x_{i}},x_{i},\epsilon_{j})$.

Proof.

Let $\mu_{i}$ denote the law of $X_{i}$ and let $E_{\epsilon_{j}}$ denote the law of $\epsilon_{j}$. For measurable $F_{i}\subset\mathcal{X}_{i}$ and $F_{j}\subset\mathcal{X}_{j}$, we have

\[
\begin{aligned}
\mathbb{P}(X_{i}\in F_{i},X_{j}\in F_{j}) &= \int\!\!\int\!\!\int\mathbf{1}[\phi_{j}(x_{\mathrm{pa}(j)\setminus\{i\}},x_{i},e_{j})\in F_{j}]\,\mathbf{1}[x_{i}\in F_{i}]\,dE_{\epsilon_{j}}(e_{j})\,dQ_{x_{i}}(x_{\mathrm{pa}(j)\setminus\{i\}})\,d\mu_{i}(x_{i}) \\
&= \int\mathbf{1}[x_{i}\in F_{i}]\,\mathbb{P}\big(\phi_{j}(D_{x_{i}},x_{i},\epsilon_{j})\in F_{j}\big)\,d\mu_{i}(x_{i}).
\end{aligned}
\tag{14}
\]

Thus if $X_{j}$ and $X_{i}$ are independent, then

\[ \mathbb{P}(X_{j}\in F_{j})=\int\frac{\mathbf{1}[x_{i}\in F_{i}]}{\mathbb{P}(X_{i}\in F_{i})}\,\mathbb{P}\big(\phi_{j}(D_{x_{i}},x_{i},\epsilon_{j})\in F_{j}\big)\,d\mu_{i}(x_{i}), \]

for all measurable $F_{j}\subset\mathcal{X}_{j}$ and all measurable $F_{i}\subset\mathcal{X}_{i}$ with nonzero probability. Since this holds for all such $F_{i}$, independence implies that, for almost every $x_{i}\in\mathcal{X}_{i}$, we have

\[ \mathbb{P}(X_{j}\in F_{j})=\mathbb{P}\big(\phi_{j}(D_{x_{i}},x_{i},\epsilon_{j})\in F_{j}\big), \tag{15} \]

so the two have the same law, as desired.

On the other hand, substituting (15) in (14), we obtain the independence of $X_{i}$ and $X_{j}$. ∎
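As an informal sanity check of Proposition 66 (an illustration, not part of the original argument), the following Python sketch enumerates a small discrete SCM exactly. The binary variables, the second parent `h`, the joint law `p_hi`, the noise laws, and all function names are assumptions made for this example. In the discrete case, "almost all $x_{i}$" reduces to all $x_{i}$ with positive probability.

```python
from fractions import Fraction as Fr
from itertools import product


def joint_ij(p_hi, p_eps, phi):
    """Exact joint law of (X_i, X_j), where X_j = phi(X_h, X_i, eps_j)."""
    joint = {}
    for ((h, i), p), (e, pe) in product(p_hi.items(), p_eps.items()):
        key = (i, phi(h, i, e))
        joint[key] = joint.get(key, 0) + p * pe
    return joint


def is_independent(p_hi, p_eps, phi):
    """Check P(X_i, X_j) = P(X_i) P(X_j) on every pair of values."""
    joint = joint_ij(p_hi, p_eps, phi)
    p_i, p_j = {}, {}
    for (i, j), p in joint.items():
        p_i[i] = p_i.get(i, 0) + p
        p_j[j] = p_j.get(j, 0) + p
    return all(joint.get((i, j), 0) == p_i[i] * p_j[j] for i in p_i for j in p_j)


def law_given(p_hi, p_eps, phi, xi):
    """Law of phi(D_xi, xi, eps_j), with D_xi ~ P(X_h | X_i = xi) independent of eps_j."""
    p_xi = sum(p for (_, i), p in p_hi.items() if i == xi)
    law = {}
    for (h, i), p in p_hi.items():
        if i == xi:
            for e, pe in p_eps.items():
                xj = phi(h, xi, e)
                law[xj] = law.get(xj, 0) + (p / p_xi) * pe
    return law


def criterion(p_hi, p_eps, phi):
    """Criterion of Proposition 66: every law above equals the marginal law of X_j."""
    marginal = {}
    for (_, j), p in joint_ij(p_hi, p_eps, phi).items():
        marginal[j] = marginal.get(j, 0) + p
    return all(law_given(p_hi, p_eps, phi, xi) == marginal
               for xi in {i for (_, i) in p_hi})


# Hypothetical joint law of the parents (X_h, X_i); keys are (h, i).
p_hi = {(0, 0): Fr(1, 4), (0, 1): Fr(1, 4), (1, 0): Fr(1, 8), (1, 1): Fr(3, 8)}

# (a) X_j = X_i XOR eps_j with a fair coin eps_j: the edge i -> j is masked.
fair = {0: Fr(1, 2), 1: Fr(1, 2)}
xor_case = (is_independent(p_hi, fair, lambda h, i, e: i ^ e),
            criterion(p_hi, fair, lambda h, i, e: i ^ e))

# (b) X_j = (X_h AND X_i) XOR eps_j with a biased eps_j: X_i and X_j are dependent.
biased = {0: Fr(3, 4), 1: Fr(1, 4)}
and_case = (is_independent(p_hi, biased, lambda h, i, e: (h & i) ^ e),
            criterion(p_hi, biased, lambda h, i, e: (h & i) ^ e))
```

In both toy cases the independence check and the criterion agree: they both hold in case (a) and both fail in case (b), as the proposition requires.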

Remark 67 (Characterizing the edge-cause condition in $P_{\mathrm{do}(i)}$: cause).

Notice that Proposition 66 remains valid for the associated SCM that is defined by intervention on $i$. In particular, with the intervention $X_{i}=\tilde{X}_{i}$, we replace $Q_{x_{i}}$ by

\[ \tilde{Q}_{x_{i}}(\cdot)=\mathbb{P}(\tilde{X}_{\mathrm{pa}(j)\setminus\{i\}}\in\cdot\,|\,\tilde{X}_{i}=x_{i}). \]

We remark that it is possible that, upon intervention, $\tilde{X}_{\mathrm{pa}(j)\setminus\{i\}}$ is independent of $\tilde{X}_{i}$. Thus $\tilde{Q}_{x_{i}}$ and $Q_{x_{i}}$ could behave quite differently. ◇

Proposition 68.

Consider an SCM with $i\in\mathrm{pa}(j)$. Write $X_{j}=\phi_{j}(X_{\mathrm{pa}(j)\setminus\{i\}},X_{i},\epsilon_{j})$. Suppose that $S\supseteq\mathrm{pa}(j)\setminus\{i\}$. Then $i\perp\!\!\!\perp j\,|\,S$ if and only if, for almost all $x_{S}\in\mathcal{X}_{S}$, the conditional law of $X_{j}$ given $X_{S}=x_{S}$ is the same as the law of $\phi_{j}(x_{\mathrm{pa}(j)\setminus\{i\}},x_{i},\epsilon_{j})$ for almost all $x_{i}\in\mathcal{X}_{i}$ in the support of the conditional distribution of $X_{i}$ given $x_{S}$.

Proof.

The proof is similar to that of Proposition 66 and is a routine variation of [30, Proposition 35]. ∎

Remark 69 (Characterizing the edge-cause condition in $P_{\mathrm{do}(i)}$: direct cause).

As in Remark 67, we apply Proposition 68 to the associated SCM that is defined by intervention on $i$; we set $S=\iota\mathrm{cause}_{i}(j)\setminus\{i\}$. ◇

9 Identifying cases that need extra or multiple concurrent interventions

Our theory is based on the interventional family $\mathcal{P}_{\mathrm{do}}=\{P_{\mathrm{do}(i)}\}_{i\in V}$, which only allows single interventions.

We note again that, for ancestral causal graphs, under certain conditions, direct cause can be simply defined using $i\not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{cause}(k)\setminus\{i\}$; see Section 6. The following example shows that, for SCMs with non-ancestral graphs, this definition might misidentify some direct causes.

Example 70.

In Example 14 and the graph of Figure 1, we observed that an iterative procedure is needed to obtain $i\not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}}k\,|\,\iota\mathrm{cause}_{i}(k)\setminus\{i\}$, which does not coincide with $i\not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{cause}(k)\setminus\{i\}$ in this case.

In an SCM associated to the graph of Figure 4 with standard intervention, we see that even $i\not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}}k\,|\,\iota\mathrm{cause}_{i}(k)\setminus\{i\}$ misidentifies $i$ as a direct cause of $k$, since $i$ has no parents in the graph. We observe that, in this case, the independence $i\perp\!\!\!\perp_{P_{\mathrm{do}(j)}}k\,|\,\mathrm{cause}(k)\setminus\{i\}$ holds.

[Graph on the nodes $j$, $h$, $k$, $i$]
Figure 4: A (non-maximal) graph associated to an SCM

◇

Below, we classify the cases for non-ancestral graphs, where direct cause cannot be defined in the way described above.

The result below shows that all cases where $\mathrm{cause}(k)$ is not sufficient to define $\mathrm{dcause}(k)$ result in PIPs:

Proposition 71.

For a transitive interventional family $\mathcal{P}_{\mathrm{do}}$, assume $P_{\mathrm{do}(i)}$ is Markovian to the $i$-intervened graph, $G_{i}(\mathcal{P}_{\mathrm{do}})$ (e.g., by satisfying the conditions of Theorem 26). For a non-adjacent pair of nodes $i$ and $k$, let $i\in\mathrm{cause}(k)\setminus\mathrm{dcause}(k)$. If $i\not\perp\!\!\!\perp_{P_{\mathrm{do}(i)}}k\,|\,\mathrm{cause}(k)\setminus\{i\}$, then there is a PIP between $i$ and $k$ in $G(\mathcal{P}_{\mathrm{do}})$.

Proof.

Since $P_{\mathrm{do}(i)}$ is Markovian to $G_{i}(\mathcal{P}_{\mathrm{do}})$, there is a $\sigma$-connecting path between $i$ and $k$ given $\mathrm{cause}(k)\setminus\{i\}$; this path is also connecting in $G(\mathcal{P}_{\mathrm{do}})$. Since $\mathrm{cause}(k)=\mathrm{an}_{G(\mathcal{P}_{\mathrm{do}})}(k)$, and the global Markov property fails to imply the pairwise Markov property only for inseparable pairs, $i,k$ must be an inseparable pair. By Proposition 4, there is a PIP in $G(\mathcal{P}_{\mathrm{do}})$ between $i$ and $k$. ∎

For $i\in\mathrm{cause}(k)$, there could be two types of PIPs between $i$ and $k$, as described in the above proposition. Notice again that such cases can happen only for non-ancestral graphs and when the true causal graph is non-maximal.

  1. (1)

    If this PIP is not a PIP in $G_{i}(\mathcal{P}_{\mathrm{do}})$ (as in the case of Figure 1), i.e., an inner node of the PIP is only an ancestor of $k$ via $i$, then the iterative procedure to define direct cause works, as it is designed to ensure that a direct cause is not placed in the causal graph incorrectly.

  2. (2)

    If the PIP is a PIP in $G_{i}(\mathcal{P}_{\mathrm{do}})$ (as is, e.g., the case in Figure 4), then the current theory is incomplete in the sense that some direct causes may be misidentified: our theory considers $i\in\mathrm{dcause}(k)$ and always places an arrow from $i$ to $k$ in $G(\mathcal{P}_{\mathrm{do}})$. We have two sub-cases here:

    1. (a)

      If there exists at most one such PIP between each pair of nodes, then we propose the following adjustment to the definition of direct cause, which fixes this issue:

      We define $i$ to be a direct cause of $k$ if for every $j$ (that may be $i$ but not $k$), it holds that $i\not\perp\!\!\!\perp_{P_{\mathrm{do}(j)}}k\,|\,\mathrm{cause}(k)\setminus\{i\}$.

      Hence, as a procedure, one can generate the intervened causal structure $S_{i}(\mathcal{P}_{\mathrm{do}})$ and, if there is a PIP between $i$ and $k$, perform the extra test

      \[ i\perp\!\!\!\perp_{P_{\mathrm{do}(j)}}k\,|\,\mathrm{cause}(k)\setminus\{i\}, \]

      for a $j$ on the PIP. If it holds, then the arrow from $i$ to $k$ is removed.

      We note that, for observable interventional families and for separable pairs, by Axioms 2 and 5, the original definition of direct cause is equivalent to this new one. For inseparable pairs, however, this definition extends the original one.

      Notice also that in such cases, similar to bows, by only knowing $\mathcal{P}_{\mathrm{do}}$, it is not possible to distinguish an arrow from an arc between $i$ and $k$. We treat such cases as a direct cause from $i$ to $k$.

    2. (b)

      If there is more than one such PIP between $i$ and $k$, then, in the case of faithful SCMs, no matter which $j$ one intervenes on, $i\not\perp_{\sigma}k\,|\,\mathrm{cause}(k)\setminus\{i\}$. Hence, the Markov property does not imply independence of $i$ and $k$ given $\mathrm{cause}(k)\setminus\{i\}$.

      In such cases, one should intervene concurrently on one node on each of these PIPs to determine whether $i$ and $k$ become separated given $\mathrm{cause}(k)\setminus\{i\}$. Hence our theory, being based on single interventions, cannot identify such direct causes. To be more precise, we have the following remark.

Remark 72.

Let $\mathcal{P}_{\mathrm{do}}$ be a transitive interventional family. Assume $i\in\mathrm{cause}(k)$. Suppose that there are $r$ PIPs $\pi_{1},\dots,\pi_{r}$ between $i$ and $k$. In order to identify whether $i\in\mathrm{dcause}(k)$, the existence of a concurrent intervention $P_{\mathrm{do}(j_{1},\dots,j_{r})}$ of size $r$ is necessary, where each $j_{s}$, $1\leq s\leq r$, is an inner node of $\pi_{s}$. ◇
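The adjusted definition of direct cause in sub-case (a) can be read as a filter over single interventions. The following Python sketch illustrates this reading under stated assumptions: it supposes access to a conditional-independence oracle `indep(j, i, k, cond)` that reports whether $i\perp\!\!\!\perp k\,|\,\mathrm{cond}$ holds in $P_{\mathrm{do}(j)}$. The oracle, the function name, the set `cause_k`, and the toy independence statement are all hypothetical and are not taken from the paper.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

Node = Hashable
# indep(j, i, k, cond) answers: does "i independent of k given cond" hold in P_do(j)?
Oracle = Callable[[Node, Node, Node, FrozenSet[Node]], bool]


def is_direct_cause(i: Node, k: Node, cause_k: Iterable[Node],
                    interventions: Iterable[Node], indep: Oracle) -> bool:
    """Adjusted definition: i is a direct cause of k if i and k are dependent
    given cause(k) minus {i} under every single intervention j considered (j != k)."""
    cond = frozenset(cause_k) - {i}
    return all(not indep(j, i, k, cond) for j in interventions if j != k)


# Toy oracle (hypothetical): the only independence that holds is
# i _||_ k | {j, h} in P_do(j), mimicking the situation described for Figure 4.
HOLDS = {("j", "i", "k", frozenset({"j", "h"}))}


def toy_oracle(j, i, k, cond):
    return (j, i, k, cond) in HOLDS


# Testing only under do(i) keeps the arrow i -> k ...
only_i = is_direct_cause("i", "k", {"i", "j", "h"}, ["i"], toy_oracle)
# ... while the extra test under do(j), for j on the PIP, removes it.
with_j = is_direct_cause("i", "k", {"i", "j", "h"}, ["i", "j"], toy_oracle)
```

With several PIPs between $i$ and $k$, Remark 72 indicates that no such single-intervention filter suffices, so this sketch covers only sub-case (a).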

10 Summary and discussion

For a given family of distributions for a finite number of variables with the same state space (which we refer to as an interventional family), we have defined the set of causes of each variable. This is equivalent to the definition of cause provided in the literature for SCMs. We have taken transitivity of cause as the first axiom on the given family, as it leads to reasonable causal graphs. We have provided weaker conditions for interventional families under which the family is transitive if the interventional distributions satisfy the property of singleton-transitivity. Although, in general, causation does not seem to be transitive [12], it seems to us that examples for which causation is not transitive mostly follow the counterfactual interpretation of causation rather than the conditional independence one; and possible examples based on conditional independence interpretations do not satisfy singleton-transitivity. We refrain from philosophical discussions here, but note that, in our opinion, representing causal structure using graphs implicitly means that one is focusing on the cases where cause is indeed transitive, as directed paths in directed graphs are transitive.

We have defined the concept of direct cause using the definition of the cause and pairwise independencies given the joint causes in $P_{\mathrm{do}(i)}$, with an iterative procedure in the general case. However, under the assumption that the causal graph is directed ancestral, causal graphs can be defined without an iterative procedure. The major departure here from the literature is that the direct cause of a variable is defined using single interventions and by conditioning on other causes of the variable. As opposed to the defined causal relationships, which stand true in a larger system of variables, the direct causal relationship clearly depends on the system: it seems that one can always add a new variable to the system that breaks the direct causal relationship by sitting between the two variables as an intermediary.

Our original motivation to write this paper was to relax the common assumption that there exists a true causal graph and that, thereby, the goal of causal inference is solely to learn or estimate this graph. We do not need any such assumption under our axiomatization, as we define causal graphs using intervened graphs, which themselves are defined using the concept of the direct cause for arrows and the pairwise dependencies given the joint causes in the interventional distributions for arcs. Arcs represent latent variables, and our generated graphs also allow for causal cycles. Transitivity ensures that the causal relationships in the interventional family and the ancestral relationships in the graph are interchangeable.

We believe this setting can be extended to causal graphs that unify anterial graphs [19] with cyclic graphs. Such a graph represents, in addition, symmetric causal relationships implied by feedback loops; see [18]. In order to do so, some extension of Markov properties, presented here for BDMGs, for this larger class of graphs is needed.

For the case where causal cycles exist, one advantage of the setting presented here is that it is easy to provide examples for cyclic graphs under our axiomatization; see Example 22. This is in contrast with the case of SCMs with cycles, where, for this purpose, strong solvability assumptions must be satisfied [3].

We notice again that there is no need for $P$ to define the causal graph. We provide a minimalist and a maximalist approach to placing an arc in the causal graph, based on whether the arc exists in the intervened graphs. These approaches only lead to different causal graphs in which arcs between inseparable pairs of nodes are or are not present.

We have also axiomatized interventional families with an observed distribution by relating certain conditional independence properties in $P$ and $P_{\mathrm{do}(i)}$, and called families that satisfy these axioms (strongly) observable interventional families. Although we have defined the causal graph using only $P_{\mathrm{do}(i)}$, we show that, under these axioms, the arcs in the causal graph can be directly defined using the observed distribution $P$.

There are two main results in this paper: firstly, under compositional graphoids with transitive families, any interventional distribution $P_{\mathrm{do}(i)}$ is Markovian to the intervened causal graph over node $i$; secondly, under observable interventional families, $P$ is Markovian to the generated causal graph. We note that our definition of the arc partly ensures automatically that the Markov properties are satisfied. However, under the alternative definition of the arc that we provided, which places an arc when the endpoint variables are always dependent regardless of what we condition on, no assumptions related to the Markov property are in place; in this case, we need the extra assumption of ordered upward- and downward-stability to obtain the Markov property. These Markov-property results are analogous to the case of SCMs, and, consequently, this allows the theory of causality developed for SCMs to be embedded in the general setting of this paper.

We mostly work on definitions and axioms for interventional families that are only related to the conditional independence structure of interventional and observed distributions. We note that although they are sufficient for generating and making sense of causal graphs, they are not sufficient for “measuring” causal effects (which we do not discuss). For that purpose, for the case of directed ancestral causal graphs, we provide the axiom of (bivariate) quantifiable interventional families, which relates the univariate (and bivariate) conditional-marginals of the distributions in the family to those of an underlying distribution $P$. In principle, $P$ could be learned via observation, and in the case of DAGs, it is determined uniquely by the interventional family. The extension of this axiom to BDMGs seems quite technical, and requires further study.

Note that satisfaction of the axioms for a family of distributions does not mean that the family provides the correct interventions: refer again to Example 22 to observe that all three types of edges can occur as the causal graph for different interventional families of two variables with the same underlying distribution $P$. Finding the correct interventional families is a question for mathematical and statistical modeling. One can think of this as being analogous to the Kolmogorov probability axioms [15]: a measurable space satisfying the Kolmogorov axioms need not provide the correct probability for the experiment at hand. This is not the case in the SCM setting, as in the presence of densities, interventional distributions with full support are equivalent [24]; this is because the causal graph in this setting is assumed to exist and to be already set in place. Example 23 shows that even the skeleton of the causal graph could change by the change of interventions with the same underlying distribution.

When we relate SCMs to the setting of this paper, we find that if the SCM satisfies some weaker version of faithfulness given by the edge-cause and converse pairwise Markov property, then, in the case where the natural graph associated with the SCM is ancestral, the causal graph, given by standard interventions, is the same as the SCM graph; if the graph is not ancestral, then if the interventions are transitive, we can recover this result for maximal BDMGs. These results demonstrate that our theory is compatible with the standard theory for a large class of SCMs.

We have not provided conditions on SCMs under which the cause is transitive, although it is not used for the main results related to SCMs being embedded in the setting of this paper. Our initial investigation revealed that this is quite a technical problem. This is nevertheless beyond the scope of this paper, and is a subject of future work.

Finally, although an advantage of our theory is that it only relies on single interventions, our theory might misidentify direct causes between pairs connected by primitive inducing paths in the intervened graphs for non-maximal non-ancestral causal graphs. We provide an adjustment to deal with this when only one PIP exists between a pair. If there are more PIPs between a pair, we showed that we need multiple concurrent interventions, of size equal to the number of PIPs between the pair.

Similarly, our theory does not include some cases where multiple concurrent interventions could act as the cause of a random variable whereas none of them individually acts as the cause; see, for example, Example 53. Understanding these cases and developing a similar theory for them is a subject of future work.

11 Appendix

Pairwise Markov properties for bowless directed mixed graphs (BDMGs)

Here, we prove the equivalence of the pairwise Markov property (PMP) and the global Markov property by defining an intermediary pairwise Markov property for acyclic directed mixed graphs (ADMGs).

First, we need to define the concept of acyclification from [3, 11], which generates a Markov-equivalent acyclic graph from a graph that contains directed cycles by adding and replacing some edges in the original graph: For a directed mixed graph $G$ (which contains directed cycles), the acyclification of $G$ is the acyclic graph $G^{\mathrm{acy}}$ with the same node set, and the following edge set:

  • There is an arrow $j\to i$ in $G^{\mathrm{acy}}$ if $j\in\mathrm{pa}(\mathrm{sc}(i))\setminus\mathrm{sc}(i)$ in $G$;

  • and there is an arc $i\leftrightarrow j$ in $G^{\mathrm{acy}}$ if and only if there exist $i^{\prime}\in\mathrm{sc}(i)$ and $j^{\prime}\in\mathrm{sc}(j)$ with $i^{\prime}=j^{\prime}$ or $i^{\prime}\leftrightarrow j^{\prime}$ in $G$.
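The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not the construction of [3, 11] verbatim: it assumes that the two rules determine the edge set of $G^{\mathrm{acy}}$ exactly, that arrows are given as ordered pairs `(parent, child)` and arcs as unordered pairs, and the function names and the toy graph are hypothetical.

```python
from itertools import combinations


def reachable(nodes, arrows):
    """Map each node v to the set of nodes reachable from v along arrows (v included)."""
    children = {v: set() for v in nodes}
    for a, b in arrows:
        children[a].add(b)
    desc = {}
    for v in nodes:
        seen, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for w in children[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        desc[v] = seen
    return desc


def acyclification(nodes, arrows, arcs):
    """Return (arrows, arcs) of the acyclification; arcs are frozensets of two nodes."""
    desc = reachable(nodes, arrows)
    # sc(v): strongly connected component of v (mutual reachability).
    sc = {v: frozenset(u for u in desc[v] if v in desc[u]) for v in nodes}
    arcs = {frozenset(e) for e in arcs}

    # Arrow j -> i when j is a parent of some node of sc(i) and j lies outside sc(i).
    new_arrows = {(j, i) for i in nodes for (j, c) in arrows
                  if c in sc[i] and j not in sc[i]}

    # Arc i <-> j when sc(i) and sc(j) coincide, or an arc of G joins the two components.
    new_arcs = set()
    for i, j in combinations(nodes, 2):
        linked = any(frozenset((a, b)) in arcs for a in sc[i] for b in sc[j])
        if sc[i] == sc[j] or linked:
            new_arcs.add(frozenset((i, j)))
    return new_arrows, new_arcs


# Toy BDMG: 0 -> 1, a directed cycle 1 -> 2 -> 3 -> 1, and an arc 4 <-> 2.
acy_arrows, acy_arcs = acyclification(
    [0, 1, 2, 3, 4], {(0, 1), (1, 2), (2, 3), (3, 1)}, {(4, 2)})
```

In the toy graph the cycle on $\{1,2,3\}$ is replaced by arcs, the parent $0$ points to every node of the former cycle, and the arc at node $4$ is copied to the whole component.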

Lemma 73.

Let $G$ be a BDMG. For distinct nodes $i$ and $j$, we have $i\in\mathrm{an}_{G}(j)$ and $i\notin\mathrm{sc}_{G}(j)$ if and only if $i\in\mathrm{an}_{G^{\mathrm{acy}}}(j)$.

Proof.

($\Rightarrow$) Suppose that there is a directed path $\pi=\langle i=i_{0},i_{1},\dots,i_{n}=j\rangle$ from $i$ to $j$ in $G$. This path is a directed path in $G^{\mathrm{acy}}$ unless an arrow $i_{r}i_{r+1}$ turns into an arc. However, in this case, $i_{r+1}\in\mathrm{sc}(i_{r})$ and, hence, $i_{r-1}\in\mathrm{pa}(i_{r+1})$. An inductive argument implies the result since $i\notin\mathrm{sc}(j)$.

($\Leftarrow$) Conversely, if $\pi=\langle i=i_{0},i_{1},\dots,i_{n}=j\rangle$ is a directed path in $G^{\mathrm{acy}}$ from $i$ to $j$, then an arrow $i_{r}i_{r+1}$ on $\pi$ exists in $G$ unless $i_{r}$ is a parent of another node $k$ such that $k\in\mathrm{sc}(i_{r+1})$. This implies that $i_{r}\in\mathrm{an}(i_{r+1})$ in $G$, and, again, an inductive argument implies that $i\in\mathrm{an}(j)\setminus\mathrm{pa}(j)$.

In addition, $i\notin\mathrm{sc}(j)$ in $G$ since, if, for contradiction, this is not the case, then all the arrows in the strongly connected component containing $i$ and $j$ turn into arcs. If there is an arrow $k\ell$ generated in $G^{\mathrm{acy}}$ that makes the directed path from $i$ to $j$, then by the construction of acyclification, it can be seen that $k$ is an ancestor of $\ell$, and hence they are still a part of the same strongly connected component, and they must turn into arcs, which is a contradiction. ∎

Proposition 74.

If $G$ is a BDMG, then $G^{\mathrm{acy}}$ is an ADMG.

Proof.

It is easy to see from the definition of acyclification that $G^{\mathrm{acy}}$ is acyclic; see also [3]. ∎

It was shown in [3, Proposition A.19] that the global Markov property could be read off equivalently from the acyclification of a directed mixed graph; see also [11].

Proposition 75 (Equivalence of the separation criteria [3]).

Let $G$ be a BDMG. For disjoint subsets of nodes $A,B,C$, we have

\[ A\perp_{\sigma}B\,|\,C\text{ in }G\iff A\perp_{m}B\,|\,C\text{ in }G^{\mathrm{acy}}. \]

Before proceeding to provide the required results for proving Theorem 2, we prove Proposition 4. It was proven in [19] that every inseparable pair is connected by a PIP for a generalization of ADMGs; see Section 4 of this paper. The definition of a PIP for ADMGs is as follows: a path is a PIP if every inner node is a collider node and an ancestor of one of the endpoints.
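The definition of a PIP for ADMGs just stated can be checked mechanically on a given path. The following Python sketch does so under stated assumptions: a path is encoded as a list of steps `(u, mark, v)`, where `mark` is one of `"->"`, `"<-"`, `"<->"` read from `u` to `v`, and the path is assumed to consist of edges of the graph. The encoding, the function names, and the toy graph are hypothetical choices for illustration.

```python
def ancestors(target, arrows):
    """Nodes with a directed path to `target` along arrows (target itself included)."""
    parents = {}
    for a, b in arrows:
        parents.setdefault(b, set()).add(a)
    seen, stack = {target}, [target]
    while stack:
        u = stack.pop()
        for p in parents.get(u, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def is_pip(steps, arrows):
    """Check that every inner node is a collider and an ancestor of an endpoint."""
    i, k = steps[0][0], steps[-1][2]
    anc = ancestors(i, arrows) | ancestors(k, arrows)
    for (_, mark_in, v), (_, mark_out, _) in zip(steps, steps[1:]):
        head_in = mark_in in ("->", "<->")    # arrowhead at v on the incoming edge
        head_out = mark_out in ("<-", "<->")  # arrowhead at v on the outgoing edge
        if not (head_in and head_out and v in anc):
            return False
    return True


# Toy ADMG: arcs i <-> h and h <-> k, arrows h -> m -> k.
arrows = {("h", "m"), ("m", "k")}
collider_path = [("i", "<->", "h"), ("h", "<->", "k")]

pip_ok = is_pip(collider_path, arrows)   # h is a collider and an ancestor of k
pip_no_anc = is_pip(collider_path, set())  # h is a collider but no longer an ancestor
pip_no_col = is_pip([("i", "->", "h"), ("h", "->", "k")], {("i", "h"), ("h", "k")})
```

A path with no inner nodes, i.e. a single edge, passes the check trivially, which matches the convention that adjacent nodes are inseparable.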

Proof of Proposition 4.

We suppose that $(i,k)$ is an inseparable pair in $G$. In $G^{\mathrm{acy}}$, by Propositions 74 and 75, this is an inseparable pair or there is an edge between $i$ and $k$.

First we show that if there is an edge between $h$ and $\ell$ in $G^{\mathrm{acy}}$, then this is an edge in $G$ or there is a PIP in $G$. Assume it is not an edge in $G$. There are two cases: If it is an arrow, then $h$ and $\ell$ are part of the same strongly connected component and a path on this component is a PIP in $G$ by definition. If $h\ell$ is an arc, then $h$ and $\ell$ are part of the same strongly connected component again, or $h,h^{\prime}$ and $\ell,\ell^{\prime}$ are parts of the same strongly connected component, respectively, where there is an arc between $h^{\prime}$ and $\ell^{\prime}$. In this case, a path consisting of a directed path on the component between $h$ and $h^{\prime}$, the $h^{\prime}\ell^{\prime}$-arc, and a directed path on the component between $\ell$ and $\ell^{\prime}$ is a PIP in $G$.

The above resolves the case where there is an edge between $i$ and $k$. Now consider the case where $i$ and $k$ are inseparable in $G^{\mathrm{acy}}$. It was proven in [19] that every inseparable pair is connected by a PIP for a generalization of ADMGs; see Section 4 of this paper.

Hence, we need to show that if there is a primitive inducing path $\pi$ in $G^{\mathrm{acy}}$ between $i$ and $k$, then there is a PIP in $G$ between $i$ and $k$. From the previous argument about edges in $G^{\mathrm{acy}}$, there are edges or PIPs corresponding to every edge of $\pi$. We put all these together to form a walk between $i$ and $k$. We then look at a subpath of this walk between $i$ and $k$, and call it $\rho$.

Every edge on $\rho$ is either an arrow whose endpoints are in the same strongly connected component, or an arc, as required.

Hence, we only need to show that every inner node is in $\mathrm{an}_{G}(\{i,k\})$. We know that they are all in $\mathrm{an}_{G^{\mathrm{acy}}}(\{i,k\})$; thus, the result follows from Lemma 73. ∎

We now recall that a pairwise Markov property for ADMGs is the same as in the BDMGs as defined by (PMP). The proof of the equivalence of the pairwise Markov property and the global Markov property for ADMGs is a trivial special case of [19, Corollary 5].

Proposition 76 (Equivalence of pairwise and global Markov properties for ancestral graphs [19]).

Let $G$ be an ADMG, and assume that the distribution $P$ satisfies the intersection property and the composition property. Then $P$ satisfies the pairwise Markov property w.r.t. $G$ if and only if it is Markovian to $G$.

Proposition 77.

Let $G$ be a BDMG. If $P$ satisfies the pairwise Markov property w.r.t. $G$, then it satisfies the pairwise Markov property w.r.t. its acyclification, $G^{\mathrm{acy}}$.

Proof.

Consider two arbitrary non-adjacent nodes $i$ and $j$ in $G^{\mathrm{acy}}$; they are not adjacent in $G$, and also $i\notin\mathrm{sc}_{G}(j)$, since otherwise they would have been made adjacent by acyclification. Thus, by the pairwise Markov property for $G$, we have

\[ i\perp\!\!\!\perp_{P}j\,|\,[\mathrm{an}_{G}(i)\cup\mathrm{an}_{G}(j)]\setminus\{i,j\}. \]

Moreover, by Lemma 73, $\mathrm{an}_{G}(i)=\mathrm{an}_{G^{\mathrm{acy}}}(i)$, and the same equality holds for the node $j$. Thus, we obtain the pairwise Markov property for $G^{\mathrm{acy}}$. ∎

Proof of Theorem 2.

By Proposition 77, the pairwise Markov property w.r.t. $G$ carries over to its acyclification $G^{\mathrm{acy}}$, which by Proposition 76 implies the global Markov property w.r.t. $G^{\mathrm{acy}}$, since by Proposition 74, $G^{\mathrm{acy}}$ is an ADMG. Finally, by Proposition 75, we recover the global Markov property w.r.t. the original graph $G$. ∎

Proof of Proposition 3.

It is enough to prove, for two arbitrary nodes $i$ and $j$ in a bowless directed mixed graph $G$, that $i\perp_{\sigma}j\,|\,\mathrm{an}(\{i,j\})$. By Proposition 75, this is equivalent to $i\perp_{m}j\,|\,\mathrm{an}(\{i,j\})$ in $G^{\mathrm{acy}}$. Notice that, by Proposition 74, $G^{\mathrm{acy}}$ is an ADMG. In addition, by Proposition 75, for every separation statement between two nodes $k$ and $l$ in $G$, there is a separation statement between $k$ and $l$ in $G^{\mathrm{acy}}$; hence, maximality of $G$ implies that $G^{\mathrm{acy}}$ is maximal. The result now follows from the fact that, for maximal ADMGs, the separation $i\perp_{m}j\,|\,\mathrm{an}(\{i,j\})$ always holds; see [19]. ∎

12 Acknowledgments

The authors are grateful to Patrick Forré for a helpful discussion on pairwise Markov properties for graphs with directed cycles, and to Philip Dawid, Thomas Richardson, and Jiji Zhang for helpful general discussions related to this manuscript. Work of the first author is supported in part by the EPSRC grant EP/W015684/1.

References

  • [1] Bareinboim, E., Brito, C. and Pearl, J. (2011). Local Characterizations of Causal Bayesian Networks. Springer-Verlag, Berlin, Heidelberg.
  • [2] Bareinboim, E., Correa, J. D., Ibeling, D. and Icard, T. (2022). On Pearl’s Hierarchy and the Foundations of Causal Inference. In Probabilistic and Causal Inference: The Works of Judea Pearl, 1 ed. 507–556. Association for Computing Machinery, New York, NY, USA.
  • [3] Bongers, S., Forré, P., Peters, J. and Mooij, J. M. (2021). Foundations of structural causal models with cycles and latent variables. Ann. Statist. 49 2885–2915.
  • [4] Chang, J. T. and Pollard, D. (1997). Conditioning as disintegration. Statist. Neerlandica 51 287–317.
  • [5] Colombo, D. and Maathuis, M. H. (2014). Order-independent constraint-based causal structure learning. J. Mach. Learn. Res. 15 3741–3782.
  • [6] Dawid, A. (2010). Beware of the DAG! Journal of Machine Learning Research - Proceedings Track 6 59–86.
  • [7] Dawid, A. P. (1979). Conditional independence in statistical theory (with discussion). J. R. Stat. Soc. Ser. B. Stat. Methodol. 41 1–31.
  • [8] Dawid, A. P. (2002). Influence Diagrams for Causal Modelling and Inference. International Statistical Review 70 161–189.
  • [9] Dawid, P. (2021). Decision-theoretic foundations for statistical causality. Journal of Causal Inference 9 39–77.
  • [10] Eberhardt, F. and Scheines, R. (2007). Interventions and causal inference. Philosophy of Science 74 981–995.
  • [11] Forre, P. and Mooij, J. M. (2017). Markov properties for graphical models with cycles and latent variables. arXiv:1710.08775.
  • [12] Hall, N. (2000). Causation and the price of transitivity. Journal of Philosophy 97 198.
  • [13] Huang, Y. and Valtorta, M. (2006). Pearl’s calculus of intervention is complete. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence. UAI’06 217–224. AUAI Press, Arlington, Virginia, USA.
  • [14] Kiiveri, H., Speed, T. P. and Carlin, J. B. (1984). Recursive causal models. J. Aust. Math. Soc., Ser. A 36 30–52.
  • [15] Kolmogorov, A. N. (1960). Foundations of the Theory of Probability, 2 ed. Chelsea Pub Co.
  • [16] Korb, K. B., Hope, L. R., Nicholson, A. E. and Axnick, K. (2004). Varieties of causal intervention. In PRICAI 2004: Trends in Artificial Intelligence (C. Zhang, H. W. Guesgen and W.-K. Yeap, eds.) 322–331. Springer Berlin Heidelberg, Berlin, Heidelberg.
  • [17] Lauritzen, S. L. (1996). Graphical Models. Clarendon Press, Oxford, United Kingdom.
  • [18] Lauritzen, S. L. and Richardson, T. S. (2002). Chain graph models and their causal interpretations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 321–348.
  • [19] Lauritzen, S. L. and Sadeghi, K. (2018). Unifying Markov properties for graphical models. Ann. Statist. 46 2251–2278.
  • [20] Park, J., Buchholz, S., Schölkopf, B. and Muandet, K. (2023). A Measure-Theoretic Axiomatisation of Causality.
  • [21] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo, CA, USA.
  • [22] Pearl, J. (2009). Causality: Models, Reasoning and Inference, 2nd ed. Cambridge University Press, New York, NY, USA.
  • [23] {barticle}[author] \bauthor\bsnmPeters, \bfnmJonas\binitsJ. (\byear2015). \btitleOn the intersection property of conditional independence and its application to causal discovery. \bjournalJ. Causal Inference \bvolume3 \bpages97–108. \endbibitem
  • [24] {bbook}[author] \bauthor\bsnmPeters, \bfnmJ.\binitsJ., \bauthor\bsnmJanzing, \bfnmD.\binitsD. and \bauthor\bsnmSchölkopf, \bfnmB.\binitsB. (\byear2017). \btitleElements of Causal Inference - Foundations and Learning Algorithms. \bseriesAdaptive Computation and Machine Learning Series. \bpublisherThe MIT Press, \baddressCambridge, MA, USA. \endbibitem
  • [25] {binproceedings}[author] \bauthor\bsnmRamsey, \bfnmJoseph\binitsJ., \bauthor\bsnmSpirtes, \bfnmPeter\binitsP. and \bauthor\bsnmZhang, \bfnmJiji\binitsJ. (\byear2006). \btitleAdjacency-faithfulness and conservative causal inference. In \bbooktitleProceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence. \bseriesUAI’06 \bpages401–408. \bpublisherAUAI Press, \baddressArlington, Virginia, USA. \endbibitem
  • [26] {barticle}[author] \bauthor\bsnmRichardson, \bfnmT. S.\binitsT. S. (\byear2003). \btitleMarkov properties for acyclic directed mixed graphs. \bjournalScand. J. Stat. \bvolume30 \bpages145-157. \endbibitem
  • [27] {barticle}[author] \bauthor\bsnmRichardson, \bfnmT. S.\binitsT. S. and \bauthor\bsnmSpirtes, \bfnmP.\binitsP. (\byear2002). \btitleAncestral graph Markov models. \bjournalAnn. Statist. \bvolume30 \bpages962–1030. \endbibitem
  • [28] {binproceedings}[author] \bauthor\bsnmRischel, \bfnmEigil F. \binitsE. and \bauthor\bsnmWeichwald, \bfnmSebastian\binitsS. (\byear2021). \btitleCompositional Abstraction Error and a Category of Causal Models. In \bbooktitleProceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence,. \bseriesProceedings of Machine Learning Research \bpages1013–1023. \bpublisherPMLR. \endbibitem
  • [29] {barticle}[author] \bauthor\bsnmSadeghi, \bfnmKayvan\binitsK. (\byear2017). \btitleFaithfulness of probability distributions and graphs. \bjournalJ. Mach. Learn. Res. \bvolume18 \bpages1–29. \endbibitem
  • [30] {barticle}[author] \bauthor\bsnmSadeghi, \bfnmKayvan\binitsK. and \bauthor\bsnmSoo, \bfnmTerry\binitsT. (\byear2022). \btitleConditions and assumptions for constraint-based causal structure learning. \bjournalJournal of Machine Learning Research \bvolume23 \bpages1–34. \endbibitem
  • [31] {binproceedings}[author] \bauthor\bsnmShpitser, \bfnmIlya\binitsI. and \bauthor\bsnmPearl, \bfnmJudea\binitsJ. (\byear2006). \btitleIdentification of conditional interventional distributions. In \bbooktitleProceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence. \bseriesUAI’06 \bpages437–444. \bpublisherAUAI Press, \baddressArlington, Virginia, USA. \endbibitem
  • [32] {bbook}[author] \bauthor\bsnmSpirtes, \bfnmP.\binitsP., \bauthor\bsnmGlymour, \bfnmC.\binitsC. and \bauthor\bsnmScheines, \bfnmR.\binitsR. (\byear2000). \btitleCausation, Prediction, and Search, \bedition2nd ed. \bpublisherMIT press. \endbibitem
  • [33] {bbook}[author] \bauthor\bsnmStudený, \bfnmM.\binitsM. (\byear2005). \btitleProbabilistic Conditional Independence Structures. \bpublisherSpringer-Verlag, \baddressLondon, United Kingdom. \endbibitem
  • [34] {barticle}[author] \bauthor\bsnmVerma, \bfnmTom\binitsT. and \bauthor\bsnmPearl, \bfnmJudea\binitsJ. (\byear1988). \btitleCausal networks: semantics and expressiveness. \bjournalProceedings of the Fourth Workshop on Uncertainty in Artificial Intelligence (UAI) \bvolume4 \bpages352–359. \endbibitem
  • [35] {bbook}[author] \bauthor\bsnmWoodward, \bfnmJames\binitsJ. (\byear2004). \btitleMaking Things Happen: A Theory of Causal Explanation. \bpublisherOxford University Press. \endbibitem
  • [36] {barticle}[author] \bauthor\bsnmZhang, \bfnmJiji\binitsJ. (\byear2008). \btitleOn the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. \bjournalArtif. Intell. \bvolume172 \bpages1873–1896. \endbibitem
  • [37] {barticle}[author] \bauthor\bsnmZhang, \bfnmJiji\binitsJ. and \bauthor\bsnmSpirtes, \bfnmPeter\binitsP. (\byear2008). \btitleDetection of unfaithfulness and robust causal inference. \bjournalMinds and Machines \bvolume18 \bpages239–271. \endbibitem