Axiomatization of Interventional Probability Distributions
Abstract
Causal intervention is an essential tool in causal inference. It is axiomatized under the rules of do-calculus in the case of structural causal models. We provide simple axiomatizations for families of probability distributions to be different types of interventional distributions. Our axiomatizations neatly lead to a simple and clear theory of causality that has several advantages: it does not need to make use of any modeling assumptions such as those imposed by structural causal models; it only relies on interventions on single variables; it includes most cases with latent variables and causal cycles; and more importantly, it does not assume the existence of an underlying true causal graph as we do not take it as the primitive object—in fact, a causal graph is derived as a by-product of our theory. We show that, under our axiomatizations, the intervened distributions are Markovian to the defined intervened causal graphs, and an observed joint probability distribution is Markovian to the obtained causal graph; these results are consistent with the case of structural causal models, and as a result, the existing theory of causal inference applies. We also show that a large class of natural structural causal models satisfies the theory presented here. We note that the aim of this paper is axiomatization of interventional families, which is subtly different from “causal modeling.”
1 Introduction
A popular approach to infer causal relationships is to use the concept of intervention as opposed to observation. For example, as described in [24], it can be observed that there is a correlation between smoking and the colour of the teeth, but no matter how much one whitens somebody’s teeth, it would not affect their smoking habits. On the other hand, forcing someone to smoke would affect the colour of their teeth. Hence, smoking has a causal effect on the colour of the teeth, but not vice versa.
Interventions have generally been embedded in the setting of structural causal models (SCMs), also known as structural equation models [21, 32]. These are systems of assignments for a set of random variables, ordered by an associated “true causal graph,” which is generally assumed to be unknown. SCMs utilise the theory of graphical (Markov) models, which are statistical models over graphs with nodes as random variables and edges that indicate some types of conditional dependencies; see [17].
An axiomatic approach to interventions for SCMs, known as Pearl’s do-calculus [22], has been developed for identifiability of interventional distributions from the observational ones; see also [13, 31] for some further theoretical developments. There has also been a substantial amount of work on generalizing the concept of intervention from the case of directed acyclic graphs (DAGs) (i.e., Bayesian networks) to more general graphs containing bidirected edges (which indicate the existence of latent variables) (see, e.g., [36]), and directed cycles [3]. However, most of these attempts stay within the setting of SCMs or, at least, under the assumption that there exists an underlying “true causal graph” that somehow captures the causal relationships [35].
Interventions in the literature (i.e. on SCMs) have been defined to be of different forms; see [16, 10]. The type of intervention with which we are dealing here is hard in the sense that it destroys all the causes of the intervened variable, and is stochastic in the sense that it replaces the marginal distribution of the intervened variable with a new distribution; although, we will show in Remark 8 that an atomic (also called surgical) intervention (which forces the variable to have a specific value) can be easily adapted in this setting.
In this paper, without making any modeling assumptions such as those given in the setting of SCMs, we give simple conditions for a family of joint distributions to act as a well-behaved interventional family, so that one can think of as an interventional distribution on a single variable , for each in a random vector . As apparent from the context, our approach here is aligned with the interventional approach to causality rather than the counterfactual approach.
Here, we are not providing an alternative to the current mainstream setting (sometimes called the Pearlian setting), in the case where it uses the interventional approach, and in most cases, SCMs, and which has led to extensive work on causal learning and estimation. We simply provide theoretical backing for this approach and generalize it beyond SCMs, by providing certain axioms in order to derive as results some of the assumptions that have been used in the literature (such as the existence of the causal graph and the global Markov property w.r.t. it).
This paper carries certain important messages: (1) There is no need to take the true causal graph as the primitive object: causal graph(s) can then be formally defined and derived from interventional families (rather than posited). (2) The causal structure (and graph) can be solely derived from the family of interventional distributions; in other words, there is no need for an initial state, i.e., an underlying joint observational distribution of , to be known for this purpose. However, we provide axioms such that the required consistency between the interventional family and the “underlying” observational distribution is satisfied when indeed the observational distribution is available, and such that one can measure the causal relationships. (3) To derive the causal structure or graph, (in most situations) one needs to rely only on single interventions once at a time. This is an advantage as much less information is used by only relying on single interventions. Indeed, there are real-world situations in which one would like to consider intervening on several variables simultaneously; we believe a similar theory can be proposed in such cases.
We must emphasize that the work presented here is about axioms that interventional distributions should satisfy for the purpose of causal reasoning. These axioms should not be confused with a causal model whose goal is to provide “correct” interpretation of causal relationships and measuring their effects. This difference is quite subtle and could lead to confusion. Similarly, one should distinguish the “causal graphs” defined and derived here from a graph learned by structure learning from observational and, potentially, interventional data (as in, e.g., [32, 5]). The goal here is not structure learning.
1.1 Key results
One of our central assumptions, Axiom 1, is that cause is transitive; see [12] for a philosophical discussion on the transitivity of the cause. Under the condition of singleton-transitivity and simple assumptions on the conditional independence structure of , we show that the causal relations are transitive; see Theorem 10. We provide a definition of causation similar to that in [24]. The concept of direct cause is defined in terms of the conditional independence properties of the interventional family (which is a departure from the widely-known definition [35, page 55]); from this we define the intervened causal graph, and using these, we define the causal graph; see Section 4.2. This is a major relaxation of assumptions from the current paradigm, where it is assumed that such a causal graph exists. The obtained causal graph allows bidirected edges and directed cycles without double edges consisting of a bidirected edge and an arrow. We call this family of graphs bowless directed mixed graphs (BDMGs). The generated graph is the “true” causal graph under the axiomatization, and we show that the definitions related to causal relationships and the graphical notions on the graph are interchangeable (Theorem 20).
One of our main theorems is that, under some additional assumptions, namely intersection and composition, intervened distributions in the interventional family are Markovian to the defined intervened graphs; see Theorem 26. We provide additional axioms (Axioms 2 and 3) to relate to an observed distribution , and call the interventional family (strongly) observable. We show that the underlying distribution for an observable interventional family is Markovian to the defined causal graph; see Theorem 28. Therefore, the established theory of causality using SCMs, which mainly relies on the Markov property of the joint distribution of the SCM, could be followed from our theory.
We later provide additional axioms (Axioms 4 and 5) for the case of ancestral causal graphs, to define what we call quantifiable interventional families that allow for measuring causal effects. We show that the quantifiable interventional families are strongly observable; see Theorem 47.
We also compare and contrast our theory with the SCM setting. We show that, for SCMs with certain simple properties (which include transitivity of the cause and are implied by faithfulness), the family of interventions on each node constitutes a strongly observable interventional family, and (even without the transitivity assumption in the case of ancestral graphs) the causal graph generated by the theory presented in this paper is the same as the causal graph associated to the SCM; see Theorem 60.
Our theory is based only on intervening on single variables once at a time. We clearly identify the cases (which can only be non-maximal and non-ancestral) where this theory may misidentify some direct causes; see Section 9.
1.2 Related works
To the best of our knowledge, most attempts at abstracting intervention, or causality based on intervention in general, are substantially different from our approach; see, for example, [28] for a category-theory approach, and the more recent [20] for a measure-theoretic approach.
One such attempt is the seminal work of Dawid on the decision theoretic framework for causal inference; for example, see [8, 9]. Our approach shares the same concerns and spirit as Dawid’s in focusing on interventions rather than counterfactuals, as well as in trying to justify the existence of the causal graph rather than assuming it, as we have heeded Dawid’s caution [6]. However, the mechanics of the two works are different. First of all, we have not provided any statistical model as Dawid does. In addition, the goal of Dawid’s work is mainly to enable “transfer of probabilistic information from an observational to an interventional setting,” whereas, here, our starting point is interventions. Finally, our approach also covers a much more general class of causal graphs than the DAGs considered by Dawid. We have not included influence diagrams, as proposed by Dawid, in our setting, but believe that it should be possible to derive them together with their conditional-independence constraints using our causal graphs and Markov properties.
A more similar approach to ours is that of Bareinboim, Brito, and Pearl [1]. Like us, the authors start with a family of interventional distributions with interventions defined on single variables. A difference is that they only work with atomic interventions, which requires alternative definitions of conditional independence (phrased as invariances)—we believe this can be adapted to use stochastic interventions and regular conditional independence. Using this, they define the concept of direct cause, which is different from, but similar in spirit to, how we define this concept. A major difference is that, in their paper, they provide different notions of compatibility of the interventional family and causal graphs by assuming certain conditions that include the global Markov property—in our work, we do not assume the global Markov property, and instead generate a graph directly from the interventional family by using direct cause, and prove the Markov property under certain axioms and conditions. Another important difference is that their work relies on the notion of an observed initial state distribution to define interventions—as mentioned before, we do not need to rely on an observational distribution to derive the causal graph.
That paper deals purely with DAGs, but in more recent work [2] the method was generalized to include arcs representing latent variables. The notion of Markov property has been replaced by the semi-Markov property to deal with this generalization, but the difference between the two methods remains the same as described for the original paper.
1.3 Structure of the manuscript
In the next section, we provide some preliminary material, including novel results on the equivalence of the pairwise and global Markov properties in our newly defined class of BDMGs, which are used in the subsequent sections. In Section 3, we define foundational concepts related to interventional families, and provide conditions for the axiom of transitivity of the cause. In Section 4, we define the concepts of direct cause as well as intervened causal and causal graphs, and prove the Markov property of the intervened distribution to the intervened causal graph. In Section 5, we prove the Markov property of the underlying distribution to the causal graph under the observable interventional axioms. In Section 6, we specialize the definitions and results for directed ancestral graphs. In Section 7, we provide additional axioms of quantifiable interventional families for the purpose of measuring causal effects. In Section 8, we show how intervention on SCMs fits within the framework of this paper. In Section 9, we provide cases where relying only on single interventions may misidentify certain direct causes. We conclude the paper in Section 10. In the Appendix, we provide certain proofs of results presented in Section 2.4, including the proof of equivalence of the pairwise and global Markov properties for BDMGs.
2 Preliminaries
In this section, we provide the basic concepts of graphical and causal modeling needed in the paper.
2.1 Conditional measures and independence
We will work in the following setting. Let be a finite set of size . Let be a probability measure on the product measurable space . For , we let and be the marginal measure of on given by
for all measurable . We will use the notation to mean that the marginal is the product measure on , so that if is a random vector (defined on some probability space ) taking values on with law , then is independent of [7, 17].
For , we let denote a regular conditional probability [4] so that in particular, we have the disintegration
for measurable. More generally, consider disjoint subsets , , and of . We will often consider the marginal of a conditional measure, and have a slight abuse of notation that:
We write to denote that the measure is a product measure on for -almost all , so that if has law , then is conditionally independent of given . Sometimes, we will simply say that is conditionally independent of given in . In addition, when independence fails, we write and say that and are conditionally dependent given in .
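In display form, and with generic symbols of our own (A, B, C for the disjoint index sets), the definition of conditional independence used here is the standard one via the disintegration above:

\[
X_A \perp\!\!\!\perp X_B \mid X_C \ [P]
\quad\Longleftrightarrow\quad
P_{A\cup B\mid C}(\,\cdot \mid x_C) \;=\; P_{A\mid C}(\,\cdot \mid x_C)\otimes P_{B\mid C}(\,\cdot \mid x_C)
\quad\text{for } P_C\text{-almost every } x_C.
\]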
2.2 Structural independence properties of a distribution
A probability distribution is always a semi-graphoid [21], i.e., it satisfies the four following properties for disjoint subsets , , , and of :
-
1.
if and only if (symmetry);
-
2.
if , then and (decomposition);
-
3.
if , then and (weak union);
-
4.
if and , then (contraction).
Notice that the reverse implication of contraction clearly holds by decomposition and weak union. We also use three further properties of conditional independence that are not always satisfied by probability distributions (all seven properties are written out in display form after this list):
-
5.
if and , then (intersection);
-
6.
if and , then (composition);
-
7.
if and , then or (singleton-transitivity),
where , , and are single elements. A semi-graphoid distribution that satisfies intersection is called graphoid. If the distribution is a regular multivariate Gaussian distribution, then is a singleton-transitive compositional graphoid; for example see [33] and [21]. If has strictly positive density, it is always a graphoid; see, for example, Proposition 3.1 in [17].
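For reference, the seven properties above can be written out in standard conditional-independence notation; the symbols below (disjoint subsets A, B, C, D and single elements i, j, k) are our generic choices and may differ from the paper's own.

\begin{align*}
&\text{(symmetry)}       && A \perp\!\!\!\perp B \mid C \iff B \perp\!\!\!\perp A \mid C;\\
&\text{(decomposition)}  && A \perp\!\!\!\perp B\cup D \mid C \implies A \perp\!\!\!\perp B \mid C \ \text{and}\ A \perp\!\!\!\perp D \mid C;\\
&\text{(weak union)}     && A \perp\!\!\!\perp B\cup D \mid C \implies A \perp\!\!\!\perp B \mid C\cup D \ \text{and}\ A \perp\!\!\!\perp D \mid C\cup B;\\
&\text{(contraction)}    && A \perp\!\!\!\perp B \mid C\cup D \ \text{and}\ A \perp\!\!\!\perp D \mid C \implies A \perp\!\!\!\perp B\cup D \mid C;\\
&\text{(intersection)}   && A \perp\!\!\!\perp B \mid C\cup D \ \text{and}\ A \perp\!\!\!\perp D \mid C\cup B \implies A \perp\!\!\!\perp B\cup D \mid C;\\
&\text{(composition)}    && A \perp\!\!\!\perp B \mid C \ \text{and}\ A \perp\!\!\!\perp D \mid C \implies A \perp\!\!\!\perp B\cup D \mid C;\\
&\text{(singleton-transitivity)} && i \perp\!\!\!\perp j \mid C \ \text{and}\ i \perp\!\!\!\perp j \mid C\cup\{k\} \implies i \perp\!\!\!\perp k \mid C \ \text{or}\ k \perp\!\!\!\perp j \mid C.
\end{align*}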
Remark 1.
We note that if has full support over its state space, then it satisfies the intersection property; for a comprehensive discussion and necessary and sufficient conditions, see [23].
Finally, we define the concept of ordered stabilities [29]. We say that satisfies ordered upward- and downward-stability w.r.t. an order of if the following hold:
-
if , then for every such that or (ordered upward-stability);
-
if , then for every such that , , and for every (ordered downward-stability).
2.3 Graphs and their properties
We usually refer to a graph as an ordered pair , where is the node set and is the edge set. When nodes and are the endpoints of an edge, we call them adjacent, and write , and otherwise .
We consider two types of edges: arrows () and bidirected edges or arcs (). We do not consider graphs that, in addition, have a third type of edge: undirected edges or lines (). We only allow for the possibility of multiple edges between nodes when they are arrows in two different directions between and , i.e., and , which we call parallel arrows. This means that we do not allow bows, i.e., a multiple edge consisting of an arrow and an arc, to appear in the graph.
A subgraph of a graph is a graph such that and and the assignment of endpoints to edges in is the same as in . An induced subgraph by nodes is the subgraph that contains all and only the nodes in and all edges between two nodes in .
A walk is a list of nodes and edges such that for , the edge has endpoints and . A path is a walk with no repeated node or edge. When we define a path, we only write the nodes (and not the edges). A maximal set of nodes in a graph whose members are connected by some paths constitutes a connected component of the graph. A cycle is a walk with no repeated nodes or edges except for .
We call the first and the last nodes endpoints of the path and all other nodes inner nodes. A path can also be seen as a certain type of connected subgraph of ; a subpath of a path is an induced connected subgraph of . For an arrow , we say that the arrow is from to . We also call a parent of , a child of and we use the notation for the set of all parents of in the graph. In the cases of or we say that there is an arrowhead at or pointing to . A path (or a cycle where ) is directed from to if all edges are arrows pointing from to . If there is a directed path from to , then node is an ancestor of and is a descendant of . We denote the set of ancestors of by ; unlike some authors, we do not allow . Similarly, we define an ancestor of a set of nodes given by . If necessary, we might write to specify that this is the set of ancestors in .
A strongly connected component of a graph is the set of nodes that are mutually ancestors of each other, or it is a single node if that node does not belong to any directed cycle. It can be observed that nodes of the graph are partitioned into strongly connected components. We denote the members of the strongly connected component containing node by .
A tripath is a path with three nodes. The inner node in each of the three tripaths
is a collider (or a collider node) and the inner node of any other tripath is a non-collider (or a non-collider node) on the tripath or, more generally, on any path of which the tripath is a subpath; i.e., a node is a collider if two arrowheads meet at it. A path is called a collider path if all its inner nodes are colliders.
The most general class of graphs that naturally arises from the theory presented in this paper is what we call the bowless directed mixed graph (BDMG), which consists of arrows and arcs, and the only multiple edges are parallel arrows (i.e., they are bowless).
Ancestral graphs [27] are graphs with arcs and arrows with no directed cycles and no arcs such that . Acyclic directed mixed graphs (ADMGs) [26] are graphs with arcs and arrows with no directed cycles. In other words, BDMGs generalize bowless ADMGs by allowing directed cycles. BDMGs also trivially contain the class of directed ancestral graphs, i.e., ancestral graphs [27] without lines. These all also contain directed acyclic graphs (DAGs) [14], which are graphs with only arrows and no directed cycles.
The class of BDMGs is a subclass of directed mixed graphs, introduced in [3], which is a very general class of graphs with arrows and arcs that allow for directed cycles. Later on, we will use some definitions and results originally defined for directed mixed graphs.
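To make these graph-theoretic notions concrete, the following minimal Python sketch (ours, not part of the paper) represents a directed mixed graph with arrows and arcs and computes parents, ancestors, and strongly connected components, together with a check that the graph is bowless.

class DirectedMixedGraph:
    """A graph with arrows (i -> j) and arcs (i <-> j); a minimal illustrative sketch."""

    def __init__(self, nodes, arrows=(), arcs=()):
        self.nodes = set(nodes)
        self.arrows = set(arrows)                 # ordered pairs (i, j), read as i -> j
        self.arcs = {frozenset(e) for e in arcs}  # unordered pairs {i, j}, read as i <-> j

    def parents(self, j):
        return {i for (i, k) in self.arrows if k == j}

    def ancestors(self, j):
        """All nodes i != j with a directed path from i to j."""
        found, stack = set(), list(self.parents(j))
        while stack:
            i = stack.pop()
            if i not in found:
                found.add(i)
                stack.extend(self.parents(i))
        found.discard(j)
        return found

    def strongly_connected_component(self, j):
        """j together with all nodes that are mutual ancestors with j."""
        return {j} | {i for i in self.ancestors(j) if j in self.ancestors(i)}

    def is_bowless(self):
        """No pair of nodes is joined by both an arc and an arrow (in either direction)."""
        return all(frozenset(e) not in self.arcs for e in self.arrows)

# A small BDMG: a directed cycle 1 -> 2 -> 3 -> 1 together with an arc 2 <-> 4.
G = DirectedMixedGraph(nodes={1, 2, 3, 4}, arrows={(1, 2), (2, 3), (3, 1)}, arcs={(2, 4)})
assert G.is_bowless()
assert G.strongly_connected_component(1) == {1, 2, 3}
assert G.ancestors(4) == set()   # arcs do not contribute to ancestry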
2.4 Markov properties
In this paper, we will need global and pairwise Markov properties for BDMGs. In order to introduce the global Markov property, we need to define the concept of -separation for directed mixed graphs (in fact, originally defined for the larger class of directed graphs with hyperedges in [11]).
A path is said to be -connecting given , which is disjoint from , if all its collider nodes are in ; and all its non-collider nodes are either outside , or if there is an arrowhead at , then and if there is an arrowhead at on then . For disjoint subsets of , we say that and are -separated given , and write , if there are no -connecting paths between and given .
In the case where there are no directed cycles in the graph, -separation reduces to the -separation of [27]; recall that is -connecting given if all its collider nodes are in ; and all its non-collider nodes are outside . In addition, if there are no arcs in the graph, i.e., the graph is a DAG, it reduces to the well-known -separation [21].
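For the special case of DAGs, the following self-contained Python sketch (again ours) checks d-separation by the standard route of moralizing the ancestral subgraph and testing separation in the resulting undirected graph; it is intended only to make the separation criterion concrete.

def d_separated(parents, A, B, C):
    """Check whether node sets A and B are d-separated given C in a DAG.
    `parents` maps every node to the set of its parents; A, B, C are disjoint."""
    A, B, C = set(A), set(B), set(C)

    # 1. Keep only A, B, C and their ancestors.
    relevant, stack = set(), list(A | B | C)
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents.get(v, ()))

    # 2. Moralize: drop directions and marry parents of a common child.
    neighbours = {v: set() for v in relevant}
    for v in relevant:
        pa = [p for p in parents.get(v, ()) if p in relevant]
        for p in pa:
            neighbours[v].add(p)
            neighbours[p].add(v)
        for p in pa:
            for q in pa:
                if p != q:
                    neighbours[p].add(q)

    # 3. Remove C and test whether A can reach B in the remaining undirected graph.
    reached, stack = set(), list(A)
    while stack:
        v = stack.pop()
        if v in reached or v in C:
            continue
        reached.add(v)
        stack.extend(neighbours[v])
    return reached.isdisjoint(B)

# Collider 1 -> 3 <- 2: marginally d-separated, but d-connected once we condition on 3.
pa = {1: set(), 2: set(), 3: {1, 2}}
assert d_separated(pa, {1}, {2}, set())
assert not d_separated(pa, {1}, {2}, {3})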
We call two graphs Markov equivalent if they induce the same set of conditional separations.
A probability distribution defined over satisfies the global Markov property w.r.t. a bowless directed mixed graph , or is simply Markovian to , if for disjoint subsets , , and of , we have
If is an ancestral graph (or a DAG), then will be replaced by (or ) in the definition of the global Markov property.
If, in addition to the global Markov property, the other direction of the implication holds, i.e., , then we say that and are faithful. A weaker condition of adjacency-faithfulness [25, 37] states that for every edge between and in , there are no independence statements for any .
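In display form, and with generic symbols of our own choosing (the paper's notation for the separation and independence relations is not reproduced here), the global Markov property reads

\[
A \perp_\sigma B \mid C \ \text{ in } G
\quad\Longrightarrow\quad
X_A \perp\!\!\!\perp X_B \mid X_C \ \text{ in } P
\]

for all disjoint subsets A, B, C of the node set; faithfulness requires the reverse implication as well, and adjacency-faithfulness requires only that the endpoints of every edge remain conditionally dependent given any conditioning set.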
We now define that a distribution satisfies the pairwise Markov property (PMP) w.r.t. a bowless directed mixed graph , if for every pair of non-adjacent nodes in , we have
(PMP)
This is the same wording as that of the pairwise Markov property for the subclass of ancestral graphs; see [19].
We prove the equivalence of the pairwise and global Markov properties, which shall be used later for causal graphs and Theorem 28. The proofs are presented in the Appendix as they are not the main focus of this manuscript.
Theorem 2.
Let be a BDMG, and satisfy the intersection and composition properties. If satisfies the pairwise Markov property (PMP) with respect to , then is Markovian to .
We also define the converse of the pairwise Markov property. We say that satisfies the converse pairwise Markov property w.r.t. if an edge between and in implies that
Notice that faithfulness and adjacency-faithfulness of and imply the converse pairwise Markov property; see [30].
A graph is called maximal if the absence of an edge between and corresponds to a conditional separation statement for and , i.e., there exists, for some , a statement of the form . Notice from the definition of -separation that graphs with chordless directed cycles (i.e., having two non-adjacent nodes in a cycle) are not maximal.
We have the following corresponding converse to Theorem 2.
Proposition 3.
Let be a maximal BDMG. If is Markovian to , then satisfies the pairwise Markov property (PMP) with respect to .
We call a non-adjacent pair of nodes which cannot be -separated in a non-maximal graph, regardless of what to condition on, an inseparable pair, and a non-adjacent pair of nodes which can be -separated in a maximal or non-maximal graph a separable pair.
For BDMGs, we define a primitive inducing path (PIP) to be a path (with at least nodes) between and , where
-
(i)
all edges are either arcs or an arrow where except for the first and last edges, which may be or (without being in the same connected component);
-
(ii)
for all inner nodes, we have , i.e., they are in ancestors of or .
PIPs were originally defined for the case of ancestral graphs (where they were allowed to be an edge) [27]. We prove the result below in the Appendix:
Proposition 4.
In a BDMG, inseparable pairs are connected by PIPs.
In addition, notice from Proposition 3 that if is Markovian to , then a pair being a separable pair is equivalent to the separation .
We say that a graph admits a valid order if for nodes and of , implies that ; and implies that and are incomparable. Notice that this specifies the partial order via its cover relations. In fact, this order can be used as the order w.r.t. which ordered upward- and downward-stability hold for graph separations; see [29].
Finally, for ancestral graphs, the -separation holds for every set “between” parents and ancestors. We will use this fact later:
Lemma 5.
For an ancestral graph and for separable nodes , we have
for every such that .
2.5 Structural causal models
The theory in this paper does not use or assume structural causal models (SCMs) (also known as structural equation models) [22, 32]. We define SCMs here as they are an interesting special case of our theory, for which intervention can be easily conceptualized.
Here, we define SCMs for the class of BDMGs as a simplified version of SCMs defined for directed (mixed) graphs in [3]. Consider a graph with nodes, which in the context of causal inference is often referred to as the “true causal graph.” A structural causal model associated with is defined as a collection of equations
where is defined on and might be called noises; for any subsets and , we require that if and only if, in , there is no arc between any node in and any node in . In this paper, we usually refer to an SCM as and its joint distribution as .
In the more widely-used case where is a DAG, all the are jointly independent. For both mathematical and causal discussions on SCMs with DAGs, see [24]. When directed cycles are present, some solvability conditions are required in order for the theory of SCMs to work properly; for this and for a more general discussion, see [3].
Standard interventions are defined quite naturally when functional equations are specified, as in the case of SCMs: By intervening on we replace the equation associated to by , where is independent of all other noises; it is not necessary that has the same distribution as . We are concerned with a similar type of intervention in this paper – this is a special case of the so-called stochastic intervention [16], where some parental set of might still exist after intervention on ; see also [24]. A more special type of intervention is called perfect intervention (or surgical intervention), where it puts a point mass on a real value – this is the original idea of do-calculus [22], and is often denoted by .
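To make the notion of a standard stochastic intervention concrete, here is a small simulation sketch in Python; the three-variable chain SCM and all numerical choices are ours and purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def simulate(intervene_on_x2=False):
    """Chain SCM X1 -> X2 -> X3; optionally replace the assignment of X2 by fresh noise."""
    x1 = rng.normal(size=n)
    if intervene_on_x2:
        x2 = rng.normal(size=n)            # stochastic intervention: a new, independent marginal
    else:
        x2 = 0.8 * x1 + rng.normal(size=n)
    x3 = 0.5 * x2 + rng.normal(size=n)
    return x1, x2, x3

x1, x2, x3 = simulate(intervene_on_x2=True)
# After intervening on X2, it is no longer associated with its former cause X1 ...
print(round(float(np.corrcoef(x1, x2)[0, 1]), 2))   # approximately 0.0
# ... but it still affects its effect X3.
print(round(float(np.corrcoef(x2, x3)[0, 1]), 2))   # clearly non-zero, about 0.45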
3 Interventional family of distributions
3.1 Interventional families and the cause
Again, let be a finite set of size . Consider a family of distributions , where each is defined over the same state space . We refer to as an interventional family (of distributions). For a random vector with distribution , we think of as the interventional distribution after intervening on some variable .
For , we define the set of the causes of as
we rarely have to consider two different interventional families at once. Thus, if , then, for , we have that is dependent on . Notice that, by convention, . For a subset , we define .
The definition of the cause is identical to what is known in the literature as the “existence of the total causal effect” (see, e.g., Definition 6.12 in [24]). Its combination with meets the intuition behind cause and intervention: after intervention on a variable , it is dependent on a variable if and only if it is a cause of that variable.
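In symbols, writing P_i for the distribution obtained by intervening on the variable indexed by i (our notation, which may differ from the paper's), the definition above can be summarized as

\[
\mathrm{cause}(j) \;=\; \bigl\{\, i \in V\setminus\{j\} \;:\; X_i \not\perp\!\!\!\perp X_j \ \text{in } P_i \,\bigr\},
\]

so that i is a cause of j precisely when, after intervening on i, the intervened variable and X_j remain dependent.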
The above setting is compatible with the well-known intervention for SCMs; for a comprehensive discussion on this, see Section 8. We will often illustrate our theory with simple examples of SCMs and standard intervention on a single node.
We can also define the set of effects of denoted by by . We take note of the following useful fact.
Remark 6.
From the definition of cause, we have
Remark 7 (Interventional families with the same cause).
Interventional families are defined with one joint distribution per intervention. This can be considered an advantage as not all interventions have to follow a single causal graph. Here, we provide an immediate condition for interventional families to define the same set of causes. After causal graphs have been defined, in Section 5.3, we provide conditions for families of distributions that lead to the same causal graph.
Consider the interventional families and over the same state space . Causes and effects depend on the interventional family, and, by Remark 6, it follows that, for all ,
for all , in which case we say they have the same causes. In the case of SCMs, under standard interventions, the causes are usually invariant with respect to the choice of distribution, except for technical counterexamples; see Remark 52 and Example 42.
Consider a fixed , and suppose that measures and have the same null sets. Notice that the equality
for almost every , is sufficient for the effects of to be same in both families. Moreover, with the disintegration
we see that effects of depend only on the corresponding conditional distribution, and are invariant under the marginal distributions on with the same null sets.
Remark 8 (Atomic interventions).
Observe that the dependence
is equivalent to the existence of disjoint measurable subsets of positive measures under satisfying the inequality
(1)
for all and all . Since and are probability measures on , they can be thought of as atomic interventions on , where the values at are fixed, at and , respectively. Thus inequality (1) has the interpretation that is a cause of if and only if there exist atomic interventions that witness an effect on ; that is, as a function of , the conditional probability, , is non-constant.
From Remark 7, without loss of generality, given atomic interventions, , which are measures on indexed by , we can extend these to an intervention, defined on the complete space, , via the disintegration
where is a suitably chosen probability measure on . Specifically, in the case where is finite, can be taken to be the uniform measure on .
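Written out with a mixing measure ν (our symbol) on the state space of the intervened coordinate, the extension described above takes the form

\[
P_j(\,\cdot\,) \;=\; \int P_{j,x}(\,\cdot\,)\, \nu(\mathrm{d}x),
\]

where P_{j,x} denotes the atomic intervention that fixes the value x at coordinate j; when that state space is finite, ν can be taken to be the uniform measure.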
We call a subset a causal cycle if for every , we have . We write to denote the causal cycle containing .
Under the composition property, we have the following independence. Let denote the subset of that contains members that are not an effect of .
Proposition 9 (Non-effects under composition).
Let be a family of distributions. If satisfies the composition property, then
Proof.
Let . Since is not a cause of , by definition, ; iteratively applying the composition property, we obtain the desired result. ∎
3.2 Transitive interventional families
We now say that is a transitive interventional family if the following axiom holds.
Axiom 1 (Transitivity of cause).
For distinct , if and , then .
Notice also that transitive interventional families are restricted: Axiom 1 places constraints between different since depends on all .
Under singleton-transitivity, we have sufficient conditions for Axiom 1 to hold. Notice that these conditions are not satisfied in general, even in the case of an SCM with standard interventions.
Theorem 10 (Transitivity of cause under singleton-transitivity).
Let be an interventional family whose members satisfy singleton-transitivity. Assume, for distinct such that and , we have:
-
(a)
; and
-
(b)
Then is a transitive interventional family.
Proof.
Towards a contradiction, assume that and , but . Since , we have, by (b), that . Also, since , we have .
The dependencies and , along with singleton-transitivity, in its contrapositive form, imply that or ; however, the former is ruled out by assumption, and the latter contradicts (a). ∎
Proposition 11.
Transitivity of the cause induces a strict preordering on by
(2) |
Proof.
Irreflexivity is implied by the convention that . Transitivity is Axiom 1. ∎
Corollary 12.
Let be a transitive interventional family that satisfies the composition property. If , then
Proof.
Transitivity implies that is not a cause of members of , so that . Thus Proposition 9, and the weak union property yield the result. ∎
In the next example, we show that we cannot drop the singleton transitivity assumption in Theorem 10.
Example 13 (Failure of transitivity without singleton transitivity).
Suppose is the cause of , and is a cause of . It may not be the case that is a cause of . Consider the SCM, with , where is Bernoulli , conditional on , we sample a Poisson random variable with mean , and then we sample as an i.i.d. sequence of Bernoulli random variable(s) with parameter , and finally set . The first random variable is independent of the final result . The standard interventions where we simply substitute a distributional copy of for each gives that is clearly the cause of , and a cause of . However, since , singleton-transitivity fails: and , but we have neither nor .
4 Causal graphs and intervened Markov properties
4.1 Intervened and direct cause
Notice that only knowing the causal ordering cannot yield a graph, since, for example, for , there is no way to distinguish the two graphs corresponding to whether an additional exists in the graph or not. For this reason, we need to define the concept of direct cause.
In order to define direct and intervened cause in the general case, we need an iterative procedure:
-
1.
For each , start with , , and an empty graph with node set ;
-
2.
Redefine
-
3.
Generate by setting arrows from to , i.e., if ;
-
4.
Generate the graph by removing all arrows pointing to from ;
-
5.
Redefine ;
-
6.
If is modified by Step 3, then go to Step 2; otherwise, output , , , and .
We call the set of the direct causes of , and the set of intervened causes of after intervention on . We also call the causal structure, and the -intervened causal structure.
Notice that, since in the iteration is getting smaller, the procedure will stop.
Notice also that, by convention, , for any . We also let , and .
As seen from the definition, “cause” is a universal concept: no matter how large the system of random variables is, as long as it contains the two investigated random variables, the marginal dependence of those variables stays intact. On the other hand, “direct cause” depends on the system of variables in which the two investigated variables lie.
The example below shows why we need as opposed to in the definition of ; see also Section 7.2.
Example 14.
Let the graph of Figure 1, below, be the graph associated to an SCM with standard interventions. Under faithfulness, notice that, , where . However, is clearly not a direct cause of . On the other hand, , and .
In the iterative procedure above, in the first round, there will be an arrow from to in . In the second round, will be removed.
Remark 15.
For causal graphs (as defined in the next subsection) that are ancestral, can simply be defined by
rather than using the iterative procedure and in the conditioning set; see Section 6 for the equivalence of the two methods under certain conditions.
Remark 16.
As it is seen in Section 9, there may still be arrows generated here that arguably should not be considered direct causes in the case of non-maximal non-ancestral graphs. We study and identify these cases and offer adjustments in that section.
The next proposition can be thought of as a proxy for the pairwise Markov property, and will be used in our proofs of the Markov property for our causal graphs.
Proposition 17.
Let be a transitive interventional family, and satisfy the composition property. For distinct , if , then
Proof.
If , then the result follows directly from the definition of . If , then by Corollary 12, we have ; moreover, observe that transitivity also implies that . ∎
4.2 Intervened causal and causal graphs
We are now ready to define a graph that demonstrates the causal relationships by capturing the direct causes as well as non-causal dependencies due to latent variables.
Given an interventional family , and , we define the -intervened graph, denoted by to be the -intervened causal structure, where, in addition, for each pair of non-adjacent nodes that are distinct from , we place an arc between and , i.e., , if
(3) |
Thus with (3), we put an arc if one of the interventions suggests the presence of a latent variable.
We also define the causal graph, denoted by , to be the causal structure, where, in addition, for each pair of nodes that are not adjacent by an arrow, we place an arc between them if the -arc exists in for every that is distinct from and .
Remark 18.
We note that does not contain arrows pointing to in and all arcs with as an endpoint in .
We also note that the existence of the -arc in two different intervened graphs and may not coincide when there is a PIP, as the example below shows.
Example 19.
Consider an SCM with the graph presented in Figure 2, below, with standard intervention. Assume that the joint distribution of the SCM is faithful to this graph. It is easy to observe that , but . This implies that there is a -arc in , but not in .
For an interventional family , our definitions now allow us to use the notions and interchangeably. For a transitive , we have that, moreover, other causal terminologies on and the graph terminologies on can be used interchangeably:
Theorem 20 (Interchangeable terminology).
For a transitive interventional family where satisfy the composition property, we have the following:
-
(i)
-
(ii)
.
Proof.
-
(i)
The direction is implied by the transitivity of ancestors and Axiom 1.
Assume that . We prove the result by induction on the cardinality of . For the basis, if , then by the definition of , for each , we must have ; furthermore, composition implies
Towards a contradiction, suppose that , so that
Contraction and decomposition imply , so that we have , contrary to our original assumption.
For the inductive step, assume that . Choose an . The transitivity of the cause, assumed in Axiom 1, implies that . This implies that , which, by inductive hypothesis, implies there is a directed path from to .
Similarly, , so that and again the inductive hypothesis gives a directed path from to . Hence, there is a directed path from to , as desired.
-
(ii)
This equivalence now follows directly from (i) using the definition of and . ∎
As an immediate consequence of the above result, we see that if the set of causes of a variable is non-empty, then at least one of the causes must act as the direct cause.
Corollary 21.
Let be a transitive interventional family where satisfy the composition property. For , if , then .
Proof.
Let . We know that there is at least one directed path from to . The node adjacent to on the path is a direct cause by definition. ∎
In the next example, we illustrate how all the different edges can easily occur. We remark that we can generate cycles quite easily, but in the standard SCM setting, their existence is non-trivial and requires solvability conditions [3].
Example 22 (A simple example of an arrow, cycle, and arc).
Consider a fixed joint distribution for random variables , where is not independent of , and is independent of . Notice that we may think of the joint distribution as generated via the following functional equations: or , where and are deterministic functions, and are uniformly distributed on .
Thus, corresponding to , we have an SCM, where , and is isolated, and similarly, corresponding to , we have an SCM, where , and is isolated. Next, we see how these SCMs interact with various interventions, and see the corresponding causal graphs that can be defined.
Via the function , we consider standard interventions, where , , and are replaced with distributional copies of themselves, giving the family , and via the function we consider standard interventions giving the family ; in both cases, we place an arrow between and in the expected direction. Note that
Consider also the non-standard interventional family ; is a direct cause of , and is also a direct cause of , so we obtain a parallel edge.
Finally, to have an arc, we consider the interventional family
there are no causes, and no arrows are placed, but detects the dependence in and , and places an arc between them.
In the following example, we stress that the generated causal graph heavily depends on the interventional family that is considered, even under standard interventions arising from simple SCM.
Example 23 (Two different graphs, one underlying distribution).
Let be independent Bernoulli random variables with parameter . Consider the random variables , , and ; they have a joint distribution , and can be thought of as an SCM with and . Thus we can define an interventional family corresponding to standard interventions on ; it is not difficult to verify in this case that the resulting causal graph will be the same as the graph for .
However, it is not difficult to construct another SCM with , , and , where also have the joint distribution . Thus we can define another interventional family corresponding to standard interventions on ; again it is not difficult to verify in this case that the resulting causal graph will be the same as the graph for .
We exploited the simple fact that the joint distribution does not uniquely determine a corresponding SCM—not even up to adjacency. Notice that and do not have the same number of edges.
The generated graph is indeed a bowless directed mixed graph (BDMG).
Proposition 24 (The causal graph is a BDMG).
The causal graph and intervened graphs generated from a transitive interventional family are BDMGs.
Proof.
The proof is immediate from the definition of intervened causal and causal graphs. ∎
Remark 25.
If we assume that for every , so that there is no causal cycle, then the causal graph is a bowless ADMG. If we assume that for every that are not direct causes of each other and every , we have
then the causal graph does not contain arcs – in this case it is seen that the interventional family does not detect any latent variables that cause both and , since they are not dependent. Assuming both of these conditions results in a DAG.
4.3 The Markov property with respect to the intervened causal graphs
We can now present a main result of this paper.
Theorem 26 (Interventional distributions are Markovian to intervened graphs).
Let be a transitive interventional family. For each , if satisfies the intersection property and the composition property, then is Markovian to the -intervened graph, .
4.4 Alternative intervened causal graph
We defined intervened causal graphs by deciding to place arcs in the graph in places where arrows are missing. We used , which immediately implies parts of the global Markov property as expressed in Theorem 26. If a latent variable that causes both and should always induce dependencies between the two regardless of what to condition on, in order to obtain , one can instead place an arc in the causal structure when, for every such that , we have
(4) |
We note that the above definition removes any assumptions related to Markov properties. In order to obtain the same graph, and consequently the Markov property, we will need ordered upward- and downward-stability, as the following result shows. Let us denote the intervened causal graph generated in this way by .
Proposition 27.
Proof.
By definition, the arrows are the same in both graphs. We need to show that they have the same arcs. One direction is trivial. We prove the other direction in contrapositive form by assuming for some . Using ordered upward-stability, we can add to the conditioning set, and by applying ordered downward-stability we can remove the other nodes to obtain . ∎
5 Observable interventional families
In Section 4, the causal structures are completely defined by interventional families; here we provide additional assumptions under which an underlying distribution that can be observed is Markovian with respect to the causal graph.
5.1 Markov property w.r.t. the causal graph
We say that an interventional family of distributions on the state space is observable with respect to an underlying distribution on if the following axiom holds.
Axiom 2.
-
(a)
For every separable pair , in , and every distinct , we have
-
(b)
For every and every , we have
Let us remark that if is a product measure on , then every interventional family will be observable with respect to ; in practice, we want to consider underlying distributions that have some relation to the interventional family, such as when is the distribution of an SCM.
Theorem 28 (The underlying distribution is Markovian to the causal graph).
Let be a transitive observable interventional family, with underlying joint distribution that satisfies the intersection property and the composition property. Then is Markovian to the causal graph, .
Proof.
By Theorem 2, it suffices to verify that satisfies the pairwise Markov property (PMP) w.r.t. ; moreover, we can assume, without loss of generality, that is maximal since inseparable pairs do not correspond to any separation statements. If are non-adjacent nodes, then there is no arc between and in . Hence, by definition, there exists an such that , from which Axiom 2 (a) gives the desired result. ∎
5.2 Causal graphs for observational distributions
The arcs in the causal graph can be generated from the observational distribution and the causes, if the converse of Axiom 2 holds. Assume is an observable interventional family of distributions with respect to an underlying observational measure . We say that is a strongly-observable interventional family if the following additional axiom holds.
Axiom 3.
-
(a)
For every distinct , we have
-
(b)
For every and every , we have
Our next example shows that there are univariate observable interventional families that are not given by standard interventions and that univariate observable interventional families may not satisfy Axiom 3.
Example 29 (Interventions on joint distributions).
Suppose , , and are jointly independent (Bernoulli) random variables, with law , which will serve as an underlying distribution. Now consider the (non-standard) intervention, where changes the joint distribution of and to one of dependence, but leaves the marginal distributions of and , and alone; furthermore, we leave independent of . Although we normally think of as an intervention on , our general definition allows for somewhat counter-intuitive constructions.
Define by adding to the causal structure , arcs between and , i.e., , for nodes and not adjacent by an arrow, if
(5) |
Clearly, (5) suggests the presence of a latent variable; compare with (3).
Proposition 30.
Suppose that is a strongly-observable interventional family with the underlying distribution , and , so that there are at least three nodes. Then when these graphs are maximal.
Proof.
By definition, the arrows are the same in both graphs. If there is an arc between in , then by definition of arcs, for all , we have , hence Axiom 3 (a) implies that , and thus there is an arc between in . Conversely, if , then Axiom 2 (a) implies the existence of the necessary arc in for maximal graphs. ∎
Note that for strongly-observable interventional families, the existence of the -arc in and may differ for non-maximal graphs only when is an inseparable pair. This implies that these two graphs are Markov equivalent.
5.3 Congruent interventional families
Consider the interventional families and over the same state space . It is immediate that if both families are strongly observable with respect to a single underlying distribution , and if the causal graphs and are maximal, then they have the same adjacencies. Motivated by Axioms 2 and 3, we say that the families are congruent if they have the same causes (see Remark 7) and
-
1.)
For every and every , we have
-
2.)
For every distinct , we have
We collect our observations in the following proposition.
Proposition 31.
Consider two interventional families over the same state space.
-
1.)
The families are congruent if and only if their causal graphs are the same.
-
2.)
If the families are strongly observable with respect to the same underlying distribution, and their causal graphs are maximal, then the graphs have the same adjacencies; furthermore, if the families have the same causes, and if the causal graphs are ancestral, then the graphs are the same, and the families are congruent.
Proof.
The first claim is immediate. For the second claim, it is immediate from Axioms 2 and 3 that the causal graphs have the same adjacencies; furthermore, under the ancestral assumption, we can also direct the graph using the causes, which are also the same. Moreover, if two nodes are adjacent and not causes of each other, we deduce the presence of an arc. Hence the graphs are the same, and the families are congruent. ∎
5.4 Alternative causal graphs
As another alternative for causal graphs, one can consider a setting where we relax the criterion for an arc to exist in by only requiring that a corresponding arc exists in for some (as opposed to every ).
This graph clearly has more arcs than in the previous setting. This new setting adds something beyond the Markov property above when it comes to conditional independencies related to causal graphs: For a separable pair , Axiom 2 always implies that . On the other hand, the original setting is more consistent with the idea that the presence of an arc indicates the existence of a latent random variable causing the endpoints.
This also leads to Axiom 3 holding under transitivity, composition, and the converse pairwise Markov property:
Proposition 32.
Suppose that is a transitive observable interventional family with the underlying distribution . If satisfy the composition property, and satisfies the converse pairwise Markov property w.r.t. , then Axiom 3 holds.
Proof.
We consider two cases. If (where may be ) are non-adjacent in all then these are not adjacent in every . The right-hand side of the axiom then holds by the Markov property of Theorem 26 under transitivity (which ensures, by Theorem 20, that ancestors and causes are the same).
If (where may be ) are adjacent in , then the left-hand side never holds under the converse pairwise Markov property. ∎
6 Specialization to directed ancestral graphs
When we assume that the causal graph is ancestral (in fact, directed ancestral), in all the definitions and results can be replaced by under intersection and composition. In particular, for the definition of , we have the following:
Proposition 33.
Let be a transitive interventional family, and suppose that satisfies the intersection and composition properties for every . If is ancestral, then for each , we have that
(6) |
Proof.
Let . Since is ancestral, there are no arcs between members of and and no directed paths from to members of in . We have the separation, in , and from Markov property given in Theorem 26, we deduce that
(7) |
- ()
-
()
If , then ; then from the composition property and (7), we obtain , and finally by weak union we have . ∎
We note that the implication above, in fact, uses only the global Markov property in one direction, and, in addition, composition in the other.
In addition to this new definition of direct cause, and consequently causal structure, we can define a -arc in the intervened graph to exist if
(8) |
as the following result shows.
Proposition 34.
Let be a transitive interventional family, and suppose that satisfies the intersection and composition properties for every . If is ancestral, then for each , we have that
Proof.
7 Quantifiable interventional families
Although transitive observable interventional families lead to Markov property of the observational distribution to the causal graph, they do not allow us to measure causal effects from observed data. Measuring causal effects from observed data is an important consequence of the theory presented here, although it is beyond the scope of this paper. Here we provide a more restricted axiom that allows this possibility.
For technical simplicity, we focus on the case where the graphs are directed ancestral. We recall from Section 6 that under the additional assumptions that the interventional family is transitive with each member satisfying the intersection and composition properties, all occurrences in the definitions and axioms can be replaced with and using (6).
7.1 Axiomatization
If two measures and have the same null sets, then they are equivalent, and we write . For a distribution with state space , we say that the family is compatible if for all distinct , we have .
We now say that an interventional family (of distributions) is a quantifiable interventional family if there exists an underlying distribution with state space such that the family is compatible and the following axiom holds.
Axiom 4.
For all distinct , there exist regular conditional probabilities such that their marginals on satisfy
for all and all .
In addition, if is a random vector with distribution , then we say that is a cause of , if , and we write to denote the set of the causes of ; and similarly for .
When Axiom 4 holds, we can drop the occurrence of the in the conditioning set of the distribution.
Proposition 36.
Let be a quantifiable interventional family with the underlying distribution . For all distinct , there exist regular conditional probabilities such that their marginals on satisfy
for all .
Proof.
In the corollary below, we see that Axiom 4 imposes strong relationships among different .
Corollary 37.
Let be a quantifiable interventional family. For every distinct , we have
Proof.
Remark 38.
Remark 39.
By Corollary 37, Axiom 4 imposes strong relationships between different . On the other hand, Axioms 2 and 3 only deal with conditional independencies implied by interventional and underlying distributions; thus, they clearly do not impose the restrictions that appear in Corollary 37. This shows that there are many interventional families that are strongly observable but not quantifiable.
Lemma 40.
Let be a quantifiable interventional family with the underlying distribution and directed ancestral causal graph . For , we have
(9) |
Proof.
Observe that the dependence
(10) |
is equivalent to the existence of a measurable subset of positive measure under such that for every there exist disjoint measurable subsets of positive measures under the conditional probability satisfying the inequality
for all and all ; in the case that is empty, we simply omit reference to the variable .
Proposition 41.
Let be a transitive quantifiable interventional family with the underlying distribution and directed ancestral causal graph . Assume that is a direct cause of . Then
Proof.
The following examples may help to illustrate the need for compatibility in our proof of Lemma 40.
Example 42 (Supports).
Consider the random variables and , where is a random isometry of , chosen uniformly from a set of rotations, and , where is a standard bivariate normal random variable that is independent of , and acts on in the natural way. It is well-known that is independent of . However, under a standard intervention, if the possibility of translations is included, then would be dependent on , and . Notice that compatibility is not satisfied, since it allows for translations, which were a null set under the underlying distribution for , which was restricted to rotations.
With regard to compatibility, it is not enough for two measures to have a common support.
Example 43 (Mutually singular measures with common support).
Let be uniformly distributed on the unit interval. Note that the bits of the binary expansion of given by sequence are iid Bernoulli random variables with parameter . Working in reverse, we let be a random variable on the unit interval such that is a sequence of iid Bernoulli random variables with parameter . Note that although and both have the unit interval as their support, their associated laws are mutually singular.
Consider random variables, and , where is uniformly distributed in the unit interval, and independent of , and the procedure for generating is as follows. We examine the input by taking and then taking the (infinite) sample average; once the sample average is obtained, we generate a single Bernoulli random variable with parameter . If the distribution of is fixed to that of either or above, then clearly is independent of . However, with an intervention on that allows for a mixture of and , we will have .
7.2 Bivariate-quantifiable interventional families
As before, let . Let be a quantifiable interventional family. We say that is a bivariate-quantifiable interventional family (of distributions) if the underlying distribution satisfies the following:
Axiom 5.
For all distinct such that and , there exist regular conditional probabilities such that their marginals on satisfy
for all and all .
We note that the bivariate version above does not hold in general; in particular, it may not hold when the intervention takes place in between and .
Example 44.
Consider again the simple SCM, where and . In a discrete setting, we have
but
and in general, the expressions will not be equal.
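To make this concrete, here is a hedged reconstruction of the computation, assuming for illustration a chain $X_1 \rightarrow X_2 \rightarrow X_3$ with a stochastic intervention on $X_2$ that assigns it a new marginal $\tilde q$; the concrete SCM in the example above is partially elided, so these symbols are ours and not the paper's notation.
\[
P_{\mathrm{do}(X_2)}(x_3 \mid x_1) \;=\; \sum_{x_2} \tilde q(x_2)\, P(x_3 \mid x_2),
\qquad
P(x_3 \mid x_1) \;=\; \sum_{x_2} P(x_2 \mid x_1)\, P(x_3 \mid x_2).
\]
The intervened expression no longer involves $x_1$, so the two sides agree only in degenerate cases, e.g., when $X_2$ does not actually depend on $X_1$.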
We have a similar result to that of Proposition 36 for bivariate-quantifiable interventional families.
Proposition 45.
Let be a bivariate-quantifiable interventional family with the underlying distribution . Then for all distinct such that and , there exist regular conditional probabilities such that their marginals on satisfy
for all .
Proof.
The proof is a simple routine modification of the proof of Proposition 36 for the univariate case. ∎
For bivariate-quantifiable interventional families, we have the following important conditional independence result.
Corollary 46 (Conditional independence of a disjoint pair on intervention).
Let be a bivariate-quantifiable interventional family. Then, for distinct such that and , we have that
Proof.
The result is an immediate consequence of Proposition 45. ∎
We now provide conditions under which quantifiable families are observable.
Theorem 47 (Bivariate-quantifiable interventional families are strongly-observable interventional).
Let be a transitive bivariate-quantifiable interventional family with the underlying distribution and a directed ancestral causal graph . If satisfies the intersection and composition properties for every , then is strongly observable.
Proof.
Note that by Remark 35, we may replace with in the axioms.
We prove Axioms 2 (a) and 3 (a). If and , then the result follows directly from Corollary 46. Hence, assume, without loss of generality, that . In this case, . By conditioning, for measurable and , we have
and
By Proposition 36, we have
Hence, together with conditional independence , we have the disintegration
and setting we have
from which we have the conditional independence ; the other required direction of the proof is similar. ∎
As a direct consequence of Theorem 28, we have the following Markov property.
Corollary 48.
Let be a transitive bivariate-quantifiable interventional family, with underlying joint distribution and directed ancestral causal graph . Assume that and , for every , satisfy the intersection property and the composition property. It then holds that is Markovian to the causal graph, .
Corollary 49.
Let be a transitive quantifiable interventional family with the underlying distribution and directed ancestral causal graph . If satisfies the intersection and composition properties for every , then satisfies the converse pairwise Markov property w.r.t. .
7.3 Uniqueness of observable distribution
We can now prove the uniqueness of under Axiom 4 for DAGs.
Proposition 50.
Let be a transitive bivariate-quantifiable interventional family, with underlying joint distribution . Assume that and , for every , satisfy the intersection property and the composition property. If is a DAG then is unique.
Proof.
Notice that the Markov property in Corollary 48 leads to a factorization of for DAGs. For measurable , we have
where is the set of all nodes larger than , which is obtained from a valid ordering of nodes (see Corollary 11), and the last equality uses the independence , implied by the Markov property. The desired uniqueness now follows from Remark 38. ∎
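In generic symbols (assuming densities exist, and writing $\mathrm{pa}(v)$ for the parents of $v$ and $\mathrm{su}(v)$ for the nodes that come after $v$ in the valid ordering, chosen so that $\mathrm{pa}(v)\subseteq\mathrm{su}(v)$; this notation is ours and is not tied to the elided display above), the factorization used in the proof takes the familiar form
\[
p(x_1,\dots,x_d) \;=\; \prod_{v} p\big(x_v \mid x_{\mathrm{su}(v)}\big) \;=\; \prod_{v} p\big(x_v \mid x_{\mathrm{pa}(v)}\big),
\]
where the second equality uses the independence $X_v \perp\!\!\!\perp X_{\mathrm{su}(v)\setminus\mathrm{pa}(v)} \mid X_{\mathrm{pa}(v)}$ implied by the Markov property; each factor on the right-hand side is determined by Axiom 4, which is what yields uniqueness.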
Notice that, in the case where there is an arc, the uniqueness does not hold. One can simply consider an -arc to observe that, using Axiom 4, only the marginal distributions of and are determined.
8 Specialization to structural causal models
In this section, we relate the standard intervention on SCMs to the setting presented in this paper.
Let be an SCM with random vector taking values on with joint distribution , and associated graph . Consider again the standard intervention in SCMs, where intervention on replaces the equation by , where is independent of all other noises. In the setting of this paper, the new system of equations after intervening on yields the joint distribution , and consequently one obtains the family of distributions
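As a minimal computational sketch, the following code simulates a small invented SCM and a standard stochastic intervention on one variable; the assignments, noise laws, and replacement marginal are all our own illustrative choices and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

def simulate(intervene_on_x2=None):
    """Simulate an illustrative SCM X1 -> X2 -> X3.

    If `intervene_on_x2` is a callable, the assignment of X2 is replaced by an
    independent draw from it (a hard, stochastic intervention); the other
    assignments are left untouched.
    """
    n1, n2, n3 = rng.standard_normal((3, n))      # independent exogenous noises
    x1 = n1
    x2 = 0.8 * x1 + n2 if intervene_on_x2 is None else intervene_on_x2(n)
    x3 = -0.5 * x2 + n3
    return x1, x2, x3

# Observational distribution versus the distribution after intervening on X2.
obs = simulate()
post = simulate(intervene_on_x2=lambda size: rng.uniform(-1.0, 1.0, size=size))

print("corr(X1, X2) observational     :", round(np.corrcoef(obs[0], obs[1])[0, 1], 2))
print("corr(X1, X2) after intervention:", round(np.corrcoef(post[0], post[1])[0, 1], 2))
print("corr(X2, X3) after intervention:", round(np.corrcoef(post[1], post[2])[0, 1], 2))
```

The intervention destroys the dependence of the intervened variable on its causes while preserving its effect on downstream variables, which is the behavior the hard stochastic intervention described above is meant to capture.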
Remark 51.
Remark 52.
Note that under some weak, but technical assumptions (in the sense of compatibility), we suspect it is possible to show that is invariant under the choice of ; see also Proposition 6.13 in [24], which has technical counterexamples.
Therefore, under invariance, if we are only interested in the causal structure, we can simply refer to some canonical family, where intervention has the same distribution as , which we denote by .
8.1 SCMs and interventional families
The first question that needs to be addressed is when satisfies different axioms and key assumptions related to interventional families. It is well known that cancellations may occur in SCMs so that cause is not transitive (as required in Axiom 1). We do not provide conditions for SCMs such that cause is transitive; the main results in this section do not require transitivity of cause. The following example shows that standard interventions on SCMs do not lead to quantifiable interventional families either.
Example 53.
Consider the collider , where . Consider the underlying joint distribution where is Bernoulli with parameter and is Bernoulli with parameter . Consider the standard interventions where and . Although these are standard interventions, the resulting family does not satisfy Axiom 4. Observe that is a direct cause of , but is not a cause of . However, and . We also see that the conditions of Theorem 10 are not satisfied.
We will need to introduce a concept related to faithfulness on the edge level. We say that satisfies the edge-cause condition w.r.t. if an arrow from to in implies that , i.e., and .
In Section 8.4, we will discuss simple conditions on the SCM that imply the edge-cause condition. In particular, it is easy to see that if are faithful to the intervened graphs , where the incoming arrows and arcs to are removed, the edge-cause condition is satisfied. The same can be said with the weaker condition of adjacency-faithfulness of and .
Proposition 54 (Ancestors and causes).
For an SCM and the family , we have that
(11)
for every . In addition, if satisfies the edge-cause condition w.r.t. then
(12)
Moreover, is transitive if and only if
(13)
Proof.
By the structure of the SCM, if , then an intervention on does not affect the distribution of , so that would be independent of , and would not be a cause of . Similarly, if , then given an intervention on does not affect the distribution of , and thus cannot be a direct cause of . Hence we have established (11), and (12) follows from the definition of the edge-cause condition.
To prove (13), we note that we already have . Transitivity of the interventional family and imply that ; the other direction follows from the fact that the set of ancestors is transitive. ∎
The following example shows that the inequality in (11) may be strict, and that (12) does not hold without the edge-cause condition.
Example 55 (Independence and cause in an SCM).
Consider the collider
where . Consider the underlying joint distribution where is Bernoulli with parameter and is Bernoulli with parameter . Consider the standard interventions where nothing happens: and . Then has no causes, and thus its set of ancestors is not equal to its causes. However, it is easy to verify that Axiom 4 is satisfied.
We also recall that the joint distribution of a structural causal model is Markovian to . We need a corresponding result to the Markov property of the joint distribution of the SCMs for the intervened distribution.
Lemma 56.
Let be an SCM. For each , its intervened distribution is Markovian to and .
Proof.
Notice that intervention on yields another SCM with joint distribution and intervened graph , where is obtained from by removing all the parents of and all the arcs with one endpoint . Therefore, is Markovian to . Since is a strict subgraph of (with the same node set), a separation in the original graph is a separation in the intervened graph, and we conclude that is Markovian to . ∎
Theorem 57 (Strongly observable SCMs).
Let be an SCM associated to graph . Assume that satisfies the converse pairwise Markov property w.r.t. . Also assume that satisfies the edge-cause condition w.r.t. . Then is a strongly observable interventional family if is ancestral. In addition, if is transitive, then the result holds for BDMGs.
Proof.
To prove Axiom 2 (a) in the ancestral case, by Lemma 5, for a separable pair , we have that , for any subset of the ancestors of and containing their parents. Under the edge-cause condition, contains the parents. Hence, by the Markov property of SCMs, we have that .
To prove Axiom 2 (b), in the ancestral case, suppose that . By Proposition 54, we have that . By the edge-cause condition, we may assume that there is no arrow from to . There is also no arc between and , since causes are ancestors. Since in ancestral graphs, inseparable pairs of nodes cannot be ancestors of each other, we conclude that are separable. Again by using Lemma 5 and the same argument as for Axiom 2 (a), the result follows from the Markov property for SCMs.
If we further assume that is transitive then, by Proposition 54, we have that , and the results follow without the assumption of the graph being ancestral. ∎
8.2 Causal graphs and graphs associated to SCMs
In this subsection, we present the ultimate relationship between interventions in the SCM setting and the setting in this paper, which relates the “true causal graph” with the causal graph defined in this paper. We first need some lemmas.
Lemma 58.
Let be an SCM. We have that for every . In addition, if satisfies the edge-cause condition w.r.t. its associated graph , then
Proof.
Let . This arrow is a direct cause by construction; hence, by Proposition 54, it follows that .
To prove the second statement, if , then by the edge-cause condition, we have . Hence, by construction, . ∎
To prove a part of the next main result that deals with directed ancestral graphs, we need the following lemma.
Lemma 59.
Let be an SCM, and assume that satisfies the edge-cause condition w.r.t. its associated directed ancestral graph . Then for every , we have
Proof.
Assume that ; in addition, without loss of generality, choose not to be a descendant of . Let , and . By the edge-cause condition, , and since is ancestral, we know that there is no arc between and . Hence we have the separation ; moreover, since is Markovian to , we have the independence .
The contraction property implies that , and by an inductive argument with similar reasoning, we obtain that
Finally, by weak union, we have ∎
Theorem 60 (Equality of causal and SCM graphs).
Consider a structural causal model with the joint distribution . If satisfies the edge-cause condition, and satisfies the converse pairwise Markov property w.r.t. the maximal directed ancestral graph , then . In addition, if is transitive, then the result holds for maximal BDMGs. Moreover, without the maximality assumption, it holds that and are Markov equivalent.
Proof.
By Lemma 58, we have that . Therefore, in order to show , it is enough to prove that and have the same arcs. We will check pairs of nodes with no edge between them.
Since is strongly observable (Theorem 57), by Proposition 30, there is no edge between and in if and only if ; assume that this conditional independence statement holds. If is ancestral then by Lemma 59, we have that .
If is not ancestral, then, using transitivity and Proposition 54, is replaced with . Now, in both cases, the converse pairwise Markov property implies that there is no edge between and in .
Now consider the other direction: Let in . If is ancestral, then, since it is also maximal, and , by Lemma 5, we have . Then the global Markov property for SCMs implies .
If is not ancestral, then by Proposition 3, with replaced with (using Proposition 54), it holds that . This implies that there is no arc between and in .
Finally, we note that if the graph is not maximal, the only pairs of nodes for which may not hold are inseparable pairs. However, the lack of an edge between inseparable pairs in , and simultaneously the existence of it in , do not violate Markov equivalence. ∎
The following example shows that we indeed require the assumption that is maximal.
Example 61.
Consider the non-maximal graph of Figure 3 to be . Assume standard intervention and also faithfulness of and . In , there exists an arc between and since no matter what one intervenes on, and always stay dependent given any conditioning set. Notice that here we require two discriminating paths between and . If there were only one discriminating path between and (for example with no and ) then by intervening on any node on the discriminating path (e.g., ), one obtains the required independence .
Corollary 62.
Let be an SCM. If its joint distribution and its associated bowless directed mixed graph are faithful, and its intervened distribution and its associated intervened graphs are faithful, for every , then .
Proof.
The result follows from the fact that faithfulness of intervened distributions and graphs implies the edge-cause condition and faithfulness of the underlying distribution and graph implies the converse pairwise Markov property. ∎
8.3 SCMs and quantifiable interventional families
In this subsection, we assume that the graph associated to an SCM is a directed ancestral graph in order to follow the theory presented in Section 7.
Although Axiom 4 may not in general be satisfied, we do have the following observations.
Lemma 63.
Let be an SCM associated to a directed ancestral graph . Let , , and be such that . There exist regular conditional probabilities such that their marginals on satisfy
and
Proof.
Let be measurable. Then for , we have
where the last equality follows from the fact that the graph is ancestral, and thus is independent of .
Suppose that . Similarly, upon standard intervention on , since all the parents are given, we still have
If , then upon standard intervention on , we have that are independent, so that
Lemma 64.
Let be an SCM associated to a directed ancestral graph . Consider distinct such that and , and be such that . There exist regular conditional probabilities such that their marginals on satisfy
and
for all and measurable.
Proof.
We have
since, by definition, contains neither nor , and thus and are independent.
Suppose that . Similarly, upon standard intervention on , since we assumed and , all the parents are given, so we still have
If , then upon standard intervention on , we have that are independent, so that
Theorem 65.
Let be an SCM associated to a directed ancestral graph . Assume that is compatible, and satisfies the edge-cause condition w.r.t. . Then is a bivariate-quantifiable interventional family.
8.4 Characterizing the edge-cause condition
Consider an SCM, where is a parent of . Previously [30, Proposition 35], we characterized completely which functions would result in the independence and thus the converse pairwise Markov property. Using similar ideas, we give a similar characterization of when the edge-cause condition holds, using the results below.
Proposition 66.
Consider an SCM, with . Write . For almost all , let be a random variable with law that is independent of . Then is independent of if and only if for almost all , we have that has the same law as
Proof.
Let denote the law of and let denote the law of . For measurable and , we have
(14)
Thus if and are independent, then
for all measurable and all with nonzero probability; since this holds for all measurable subsets with nonzero probability, independence implies that, for almost every , we have
(15)
and they have the same law, as desired.
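In generic symbols, writing $Y=f(X,N)$ with $N$ independent of $X$ and $\mu$ for the law of $X$ (again our own notation, since the displays above are partially elided), the disintegration behind this argument reads
\[
\Pr(X\in A,\; Y\in B) \;=\; \int_A \Pr\big(f(x,N)\in B\big)\,\mu(\mathrm dx)
\qquad\text{for all measurable } A,\, B,
\]
so independence of $X$ and $Y$ holds exactly when $\Pr\big(f(x,N)\in B\big)=\Pr(Y\in B)$ for $\mu$-almost every $x$ and every measurable $B$, i.e., when $f(x,N)$ has the same law for almost all $x$.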
Remark 67 (Characterizing the edge-cause condition in : cause).
Notice that Proposition 66 remains valid for the associated SCM that is defined by intervention on . In particular, with the intervention , we replace by
We remark that it is possible that, upon intervention, is independent of . Thus and could behave quite differently.
Proposition 68.
Consider an SCM, with . Write . Suppose that . Then if and only if for almost all the conditional law of given is the same as the law of , for almost all in the support of the conditional distribution of given .
Proof.
9 Identifying cases that need extra or multiple concurrent interventions
Our theory is based on the interventional family , which only allows single interventions.
We note again that, for ancestral causal graphs, under certain conditions, direct cause can be simply defined using ; see Section 6. The following example shows that for the case of SCMs for non-ancestral graphs, this definition might misidentify some direct causes.
Example 70.
In Example 14 and the graph of Figure 1, we observed that an iterative procedure is needed to obtain , which does not coincide with in this case.
In an SCM associated to the graph below with standard interventions, we see that even misidentifies as the direct cause of , since has no parents in the graph. We observe that, in this case, the independence holds.
Below, we classify the cases for non-ancestral graphs, where direct cause cannot be defined in the way described above.
The result below shows that all cases where is not sufficient to define result in PIPs:
Proposition 71.
For a transitive interventional family , assume is Markovian to the -intervened graph (e.g., by satisfying the conditions of Theorem 26). For a non-adjacent pair of nodes and , let . If , then there is a PIP between and in .
Proof.
Since is Markovian to , there is a -connecting path between and given ; this path is also connecting in . Since , and since the global Markov property can fail to imply the pairwise Markov property only for inseparable pairs, must be an inseparable pair. By Proposition 4, there is a PIP in between and . ∎
For , there could be two types of PIPs between and , described in the above proposition. Notice again that such cases can happen only for non-ancestral graphs and when the true causal graph is non-maximal.
(1) If this PIP is not a PIP in (as in the case of Figure 1), i.e., an inner node of the PIP is only an ancestor of via , then the iterative procedure to define direct cause works as it is designed to ensure that a direct cause is not placed in the causal graph incorrectly.
(2) If the PIP is a PIP in (as is, e.g., the case in Figure 4), then the current theory is incomplete in the sense that some direct causes may be misidentified, as our theory considers , and it always places an arrow from to in . We have two sub-cases here:
(a) If there exists at most one such PIP between each pair of nodes, then we propose the following adjustment to the definition of direct cause, which fixes this issue:
We define to be a direct cause of if for every (that may be but not ), it holds that .
Hence, as a procedure, one can generate the intervened causal structure , and if there is a PIP between and , then one performs the extra test
for a on the PIP. If it holds, then the arrow from to is removed.
We note that, for observable interventional families, and for separable pairs, by Axioms 2 and 5, the original definition of direct cause is equivalent to this new one. However, this definition is an extension of the original definition for inseparable pairs.
Notice also that in such cases, similar to bows, by only knowing , it is not possible to distinguish an arrow from an arc between and . We treat such cases as a direct cause from to .
(b) If there is more than one such PIP between and , then, in the case of faithful SCMs, no matter which one intervenes on, . Hence, the Markov property does not imply independence of and given .
In such cases, one should intervene concurrently on one node on each of these PIPs to determine whether and become separated given . Hence, our theory based on single interventions does not identify such direct causes. To be more precise, we have the following remark.
Remark 72.
Let be a transitive interventional family. Assume . Suppose that there are PIPs between and . In order to identify whether , the existence of a concurrent intervention of size is necessary, where each , , is an inner node of .
10 Summary and discussion
For a given family of distributions for a finite number of variables with the same state space (which we refer to as an interventional family), we have defined the set of causes of each variable. This is equivalent to the definition of the cause provided in the literature for SCMs. We have provided transitivity of the cause as the first axiom on the given family, as it leads to reasonable causal graphs. We have provided weaker conditions for interventional families under which the family is transitive if the interventional distributions satisfy the property of singleton-transitivity. Although, in general, causation does not seem to be transitive [12], it seems to us that examples for which causation is not transitive mostly follow the counterfactual interpretation of causation rather than the conditional independence one; and possible examples based on conditional independence interpretations do not satisfy singleton-transitivity. We refrain from philosophical discussions here, but note that, in our opinion, representing causal structure using graphs implicitly implies that one is focusing on the cases where cause is indeed transitive, as directed paths in directed graphs are transitive.
We have defined the concept of direct cause using the definition of the cause and pairwise independencies given the joint causes in with an iterative procedure in the general case. However, under the assumption that the causal graph is directed ancestral, causal graphs can be defined without an iterative procedure. The major departure here from the literature is that the direct cause of a variable is defined using single interventions and by conditioning on other causes of the variable. As opposed to the defined causal relationships, which stand true in a larger system of variables, the direct causal relationship clearly depends on the system—it seems that one can always add a new variable to the system that breaks the direct causal relationship by sitting between the two variables as an intermediary.
Our original motivation to write this paper was to relax the common assumption that there exists a true causal graph, and, thereby, the goal of causal inference is solely to learn or estimate this graph. We do not need any such assumption under our axiomatization as we define causal graphs using intervened graphs, which themselves are defined using the concept of the direct cause for arrows and the pairwise dependencies given the joint causes in the interventional distributions for arcs. Arcs represent latent variables, and our generated graphs also allow for causal cycles. Transitivity ensures that the causal relationships in the interventional family and the ancestral relationships in graph are interchangeable.
We believe this setting can be extended to causal graphs that unify anterial graphs [19] with cyclic graphs. Such a graph represents, in addition, symmetric causal relationships implied by feedback loops; see [18]. In order to do so, some extension of Markov properties, presented here for BDMGs, for this larger class of graphs is needed.
For the case where causal cycles exist, one advantage of the setting presented here is that it is easy to provide examples for cyclic graphs under our axiomatization; see Example 22. This is in contrast with the case of SCMs with cycles, where, for this purpose, strong solvability assumptions must be satisfied [3].
We notice again that there is no need for to define the causal graph. We provide a minimalist and a maximalist approach to place an arc in the causal graph based on whether the arc exists in intervened graphs. This will only lead to different causal graphs where arcs between inseparable pairs of nodes are or are not present.
We have also axiomatized interventional families with an observed distribution relating certain conditional independence properties in and , and called a family that satisfies these axioms (strongly) observable interventional families. Although we have defined the causal graph only using , we show that under these axioms, the arcs in causal graph can be directly defined using the observed distribution .
There are two main results in this paper: firstly, under compositional graphoids with transitive families, any interventional distribution is Markovian to the intervened causal graph over node ; secondly, under observable interventional families, is Markovian to the generated causal graph. We note that our definition of the arc partly ensures automatically that the Markov properties are satisfied. However, under an alternative definition of the arc we provided—which places an arc when the endpoint variables are always dependent regardless of what we condition on—no assumption related to the Markov property is automatic; in this case, we need the extra assumption of ordered upward- and downward-stability to obtain the Markov property. These Markov-property results are analogous to the case of SCMs, and, consequently, this allows the developed theory for causality in SCMs to be embedded in the general setting of this paper.
We mostly work on definitions and axioms for interventional families that are only related to conditional independence structure of interventional and observed distributions. We note that although they are sufficient for generating and making sense of causal graphs, for “measuring” causal effects (which we do not discuss) they are not sufficient. For that purpose, for the case of directed ancestral causal graphs, we provide the axiom of (bivariate) quantifiable interventional families, which relates the univariate (and bivariate) conditional-marginals of the distributions in the family to those of an underlying distribution . In principle, could be learned via observation, and in the case of DAGs, it is determined uniquely by the interventional family. The extension of this axiom to BDMGs seems quite technical, and requires further study.
Note that satisfaction of the axioms for a family of distributions does not mean that the family provides the correct interventions—refer again to Example 22 to observe that all three types of edges can occur as the causal graph for different interventional families of two variables with the same underlying distribution . Finding the correct interventional families is a question for mathematical and statistical modeling. One can think of this as being analogous to the Kolmogorov probability axioms [15]: the fact that a measurable space satisfies the Kolmogorov axioms does not mean that it provides the correct probability for the experiment at hand. This is not the case in the SCM setting, as in the presence of densities, interventional distributions with full support are equivalent [24]—this is because the causal graph in this setting is assumed to exist and already set in place. Example 23 shows that even the skeleton of the causal graph could change by the change of interventions with the same underlying distribution.
When we relate SCMs to the setting of this paper, we find that if the SCM satisfies some weaker versions of faithfulness, given by the edge-cause condition and the converse pairwise Markov property, then, in the case where the natural graph associated with the SCM is ancestral, the causal graph given by standard interventions is the same as the SCM graph; if the graph is not ancestral but the interventions are transitive, we recover this result for maximal BDMGs. These results demonstrate that our theory is compatible with the standard theory for a large class of SCMs.
We have not provided conditions on SCMs under which the cause is transitive, although it is not used for the main results related to SCMs being embedded in the setting of this paper. Our initial investigation revealed that this is quite a technical problem. This is nevertheless beyond the scope of this paper, and is a subject of future work.
Finally, although an advantage of our theory is that it only relies on single interventions, our theory might misidentify direct causes between the endpoints of primitive inducing paths in the intervened graphs for non-maximal non-ancestral causal graphs. We provide an adjustment to deal with this when only one PIP exists between a pair. If there are more PIPs between a pair, we showed that we need a concurrent intervention of size equal to the number of PIPs between the pair.
Similarly, our theory does not include some cases where multiple concurrent interventions could act as the cause of a random variable whereas none of them individually acts as the cause; see, for example, Example 53. Understanding these cases, and developing a similar theory for such cases, is a subject of future work.
11 Appendix
Pairwise Markov properties for bowless directed mixed graphs (BDMGs)
Here, we prove the equivalence of the pairwise Markov property (PMP) and the global Markov property by defining an intermediary pairwise Markov property for acyclic directed mixed graphs (ADMGs).
First, we need to define the concept of acyclification from [3, 11], which generates a Markov-equivalent acyclic graph from a graph that contains directed cycles by adding and replacing some edges in the original graph: For a directed mixed graph (which contains directed cycles), the acyclification of is the acyclic graph with the same node set, and the following edge set:
- There is an arrow in if in ;
- and there is an arc in if and only if there exist and with or in .
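The construction lends itself to a short computational sketch; the following code follows the definition of acyclification as we read it from [3, 11] (arrows point from parents of a strongly connected component into it, and arcs join nodes of a common component or components linked by an arc), with our own data layout and an invented toy graph.

```python
import networkx as nx

def acyclification(nodes, directed_edges, bidirected_edges):
    """A sketch of the acyclification of a directed mixed graph.

    nodes            : list of node labels
    directed_edges   : set of ordered pairs (u, v), meaning u -> v
    bidirected_edges : set of frozensets {u, v}, meaning u <-> v
    """
    D = nx.DiGraph()
    D.add_nodes_from(nodes)
    D.add_edges_from(directed_edges)

    # sc[v] = strongly connected component of v (a singleton when v lies on no cycle)
    sc = {}
    for comp in nx.strongly_connected_components(D):
        for v in comp:
            sc[v] = frozenset(comp)

    arrows, arcs = set(), set()
    for v in nodes:
        for w in nodes:
            if v == w:
                continue
            # arrow v -> w: v points into the component of w but lies outside it
            if v not in sc[w] and any((v, u) in directed_edges for u in sc[w]):
                arrows.add((v, w))
    for i, v in enumerate(nodes):
        for w in nodes[i + 1:]:
            # arc v <-> w: same component, or an arc in G between their components
            same = v in sc[w]
            linked = any(frozenset((a, b)) in bidirected_edges
                         for a in sc[v] for b in sc[w])
            if same or linked:
                arcs.add(frozenset((v, w)))
    return arrows, arcs

# A 2-cycle with a latent confounder: 1 -> 2, 2 -> 1, 2 -> 3, 2 <-> 3.
arrows, arcs = acyclification([1, 2, 3],
                              {(1, 2), (2, 1), (2, 3)},
                              {frozenset((2, 3))})
print("arrows:", sorted(arrows), "arcs:", sorted(map(sorted, arcs)))
```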
Lemma 73.
Let be a BDMG. For distinct nodes and , we have and if and only if .
Proof.
() Suppose that there is a directed path from to in . This path is a directed path in unless an arrow turns into an arc. However, in this case, and, hence, . An inductive argument implies the result since .
() Conversely, if is a directed path in from to then an arrow on exists in unless is a parent of another node such that . This implies that in , and, again, an inductive argument implies that .
In addition, in , since, if, for contradiction, this is not the case, then all the arrows in the strongly connected component containing and turn into arcs. If there is an arrow generated in that makes the directed path from to , then by the construction of acyclification, it can be seen that is an ancestor of , and hence they are still a part of the same strongly connected component, and they must turn into arcs, which is a contradiction. ∎
Proposition 74.
If is a BDMG, then is an ADMG.
Proof.
It is easy to see from the definition of acyclification that is acyclic; see also [3].∎
It was shown in [3, Proposition A.19] that the global Markov property could be read off equivalently from the acyclification of a directed mixed graph; see also [11].
Proposition 75 (Equivalence of the separation criteria [3]).
Let be a BDMG. For disjoint subsets of nodes , we have
Before proceeding to provide the required results for proving Theorem 2, we prove Proposition 4. It was proven in [19] that every inseparable pair is connected by a PIP for a generalization of ADMGs; see Section 4 of this paper. The definition of a PIP for ADMGs is as follows: a path is a PIP if every inner node is a collider node and an ancestor of one of the endpoints.
Proof of Proposition 4.
We suppose that is an inseparable pair in . In , by Propositions 74 and 75, this is an inseparable pair or there is an edge between and .
First we show that if there is an edge between and in then this is an edge in or there is a PIP in . Assume it is not an edge in . There are two cases: If it is an arrow, then and are part of the same strongly connected component and a path on this is a PIP in by definition. If is an arc, then and are part of the same strongly connected component again or and are parts of the same strongly connected component, respectively, where there is an arc between and . In this case, a path consisting of a directed path on the component between and , the -arc, and a directed path on the component between and is a PIP in .
The above resolves the case where there is an edge between and . Now consider the case where and are inseparable in . By the result of [19] recalled above, and are then connected by a PIP in this graph.
Hence, we need to show that if there is a primitive inducing path in between and then there is a PIP in between and . From the previous argument about edges in , there are edges or PIPs corresponding to every edge of . We put all these together to form a walk between and . We then look at a subpath of this walk between and , and call it .
Every edge on is either an arrow whose endpoints are in the same strongly connected component, or an arc, as required.
Hence, we only need to show that every inner node is in . We know that they are all in ; thus, the result follows from Lemma 73. ∎
We now recall that a pairwise Markov property for ADMGs is the same as in the BDMGs as defined by (PMP). The proof of the equivalence of the pairwise Markov property and the global Markov property for ADMGs is a trivial special case of [19, Corollary 5].
Proposition 76 (Equivalence of pairwise and global Markov properties for ancestral graphs [19]).
Let be an ADMG, and assume that the distribution satisfies the intersection property and the composition property. Then satisfies the pairwise Markov property w.r.t. if and only if it is Markovian to .
Proposition 77.
Let be a BDMG. If satisfies the pairwise Markov property w.r.t. , then it satisfies the pairwise Markov property w.r.t. its acyclification, .
Proof.
Consider two arbitrary non-adjacent nodes and in ; they are not adjacent in , and also , since otherwise they would have been made adjacent after acyclification. Thus, by the pairwise Markov property for , we have
Moreover, by Lemma 73, , and the same equality holds for the node . Thus, we obtain the pairwise Markov property for . ∎
Proof of Theorem 2.
Proof of Proposition 3.
It is enough to prove, for two arbitrary nodes and in a bowless directed mixed graph , that . By Proposition 75, this is equivalent to in . Notice that, by Proposition 74, is an ADMG. In addition, by Proposition 75, for every separation statement between two nodes and in , there is a separation statement between and in ; hence, maximality of implies that is maximal. The result now follows from the fact that, for maximal ADMGs, the separation always holds; see [19]. ∎
12 Acknowledgments
The authors are grateful to Patrick Forré for a helpful discussion on pairwise Markov properties for graphs with directed cycles, and to Philip Dawid, Thomas Richardson, and Jiji Zhang for helpful general discussions related to this manuscript. Work of the first author is supported in part by the EPSRC grant EP/W015684/1.
References
- [1] Bareinboim, E., Brito, C. and Pearl, J. (2011). Local Characterizations of Causal Bayesian Networks. Springer-Verlag, Berlin, Heidelberg.
- [2] Bareinboim, E., Correa, J. D., Ibeling, D. and Icard, T. (2022). On Pearl's Hierarchy and the Foundations of Causal Inference. In Probabilistic and Causal Inference: The Works of Judea Pearl, 1st ed., 507–556. Association for Computing Machinery, New York, NY, USA.
- [3] Bongers, S., Forré, P., Peters, J. and Mooij, J. M. (2021). Foundations of structural causal models with cycles and latent variables. Ann. Statist. 49, 2885–2915.
- [4] Chang, J. T. and Pollard, D. (1997). Conditioning as disintegration. Statist. Neerlandica 51, 287–317.
- [5] Colombo, D. and Maathuis, M. H. (2014). Order-independent constraint-based causal structure learning. J. Mach. Learn. Res. 15, 3741–3782.
- [6] Dawid, A. (2010). Beware of the DAG! Journal of Machine Learning Research - Proceedings Track 6, 59–86.
- [7] Dawid, A. P. (1979). Conditional independence in statistical theory (with discussion). J. R. Stat. Soc. Ser. B. Stat. Methodol. 41, 1–31.
- [8] Dawid, A. P. (2002). Influence Diagrams for Causal Modelling and Inference. International Statistical Review 70, 161–189.
- [9] Dawid, P. (2021). Decision-theoretic foundations for statistical causality. Journal of Causal Inference 9, 39–77.
- [10] Eberhardt, F. and Scheines, R. (2007). Interventions and causal inference. Philosophy of Science 74, 981–995.
- [11] Forré, P. and Mooij, J. M. (2017). Markov properties for graphical models with cycles and latent variables. arXiv:1710.08775.
- [12] Hall, N. (2000). Causation and the price of transitivity. Journal of Philosophy 97, 198.
- [13] Huang, Y. and Valtorta, M. (2006). Pearl's calculus of intervention is complete. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, UAI'06, 217–224. AUAI Press, Arlington, Virginia, USA.
- [14] Kiiveri, H., Speed, T. P. and Carlin, J. B. (1984). Recursive causal models. J. Aust. Math. Soc., Ser. A 36, 30–52.
- [15] Kolmogorov, A. N. (1960). Foundations of the Theory of Probability, 2nd ed. Chelsea Pub Co.
- [16] Korb, K. B., Hope, L. R., Nicholson, A. E. and Axnick, K. (2004). Varieties of causal intervention. In PRICAI 2004: Trends in Artificial Intelligence (C. Zhang, H. W. Guesgen and W.-K. Yeap, eds.), 322–331. Springer Berlin Heidelberg, Berlin, Heidelberg.
- [17] Lauritzen, S. L. (1996). Graphical Models. Clarendon Press, Oxford, United Kingdom.
- [18] Lauritzen, S. L. and Richardson, T. S. (2002). Chain graph models and their causal interpretations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64, 321–348.
- [19] Lauritzen, S. L. and Sadeghi, K. (2018). Unifying Markov properties for graphical models. Ann. Statist. 46, 2251–2278.
- [20] Park, J., Buchholz, S., Schölkopf, B. and Muandet, K. (2023). A Measure-Theoretic Axiomatisation of Causality.
- [21] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo, CA, USA.
- [22] Pearl, J. (2009). Causality: Models, Reasoning and Inference, 2nd ed. Cambridge University Press, New York, NY, USA.
- [23] Peters, J. (2015). On the intersection property of conditional independence and its application to causal discovery. J. Causal Inference 3, 97–108.
- [24] Peters, J., Janzing, D. and Schölkopf, B. (2017). Elements of Causal Inference - Foundations and Learning Algorithms. Adaptive Computation and Machine Learning Series. The MIT Press, Cambridge, MA, USA.
- [25] Ramsey, J., Spirtes, P. and Zhang, J. (2006). Adjacency-faithfulness and conservative causal inference. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, UAI'06, 401–408. AUAI Press, Arlington, Virginia, USA.
- [26] Richardson, T. S. (2003). Markov properties for acyclic directed mixed graphs. Scand. J. Stat. 30, 145–157.
- [27] Richardson, T. S. and Spirtes, P. (2002). Ancestral graph Markov models. Ann. Statist. 30, 962–1030.
- [28] Rischel, E. F. and Weichwald, S. (2021). Compositional Abstraction Error and a Category of Causal Models. In Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, Proceedings of Machine Learning Research, 1013–1023. PMLR.
- [29] Sadeghi, K. (2017). Faithfulness of probability distributions and graphs. J. Mach. Learn. Res. 18, 1–29.
- [30] Sadeghi, K. and Soo, T. (2022). Conditions and assumptions for constraint-based causal structure learning. Journal of Machine Learning Research 23, 1–34.
- [31] Shpitser, I. and Pearl, J. (2006). Identification of conditional interventional distributions. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, UAI'06, 437–444. AUAI Press, Arlington, Virginia, USA.
- [32] Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction, and Search, 2nd ed. MIT Press.
- [33] Studený, M. (2005). Probabilistic Conditional Independence Structures. Springer-Verlag, London, United Kingdom.
- [34] Verma, T. and Pearl, J. (1988). Causal networks: semantics and expressiveness. Proceedings of the Fourth Workshop on Uncertainty in Artificial Intelligence (UAI) 4, 352–359.
- [35] Woodward, J. (2004). Making Things Happen: A Theory of Causal Explanation. Oxford University Press.
- [36] Zhang, J. (2008). On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artif. Intell. 172, 1873–1896.
- [37] Zhang, J. and Spirtes, P. (2008). Detection of unfaithfulness and robust causal inference. Minds and Machines 18, 239–271.