A Semantics for Counterfactuals
in Quantum Causal Models

Ardra Kooderi Suresh, Markus Frembs, Eric G. Cavalcanti
Centre for Quantum Dynamics, Griffith University, Yugambeh Country, Gold Coast, QLD 4222, Australia

Abstract

We introduce a formalism for the evaluation of counterfactual queries in the framework of quantum causal models, generalising Pearl’s semantics for counterfactuals in classical causal models [1], thus completing the last rung in the quantum analogue of Pearl’s “ladder of causation”. To this end, we define a suitable extension of Pearl’s notion of a ‘classical structural causal model’, which we denote analogously by ‘quantum structural causal model’, and a corresponding extension of Pearl’s three-step procedure of abduction, action, and prediction. We show that every classical (probabilistic) structural causal model can be extended to a quantum structural causal model, and prove that counterfactual queries that can be formulated within a classical structural causal model agree with their corresponding queries in the quantum extension – but the latter is more expressive. Counterfactuals in quantum causal models come in different forms: we distinguish between active and passive counterfactual queries, depending on whether or not an intervention is to be performed in the action step. This is in contrast to the classical case, where counterfactuals are always interpreted in the active sense. Another distinctive feature of our formalism is that it breaks the connection between causal and counterfactual dependence that exists in the classical case: quantum counterfactuals allow for counterfactual dependence without causal dependence. This distinction between classical and quantum causal models may shed light on how the latter can reproduce quantum correlations that violate Bell inequalities while being faithful to the relativistic causal structure.

1 Introduction

The world of alternative possibilities has been pondered upon and analyzed routinely, in many fields of study including but not limited to social [2] and public policy [3], psychiatry [4], economy [5], weather and climate change [6], artificial intelligence [7], philosophy and causality [8, 9]. For example, questions involving counterfactuals can have important social and legal implications, such as “Given that the patient has died after treatment, would they have survived had they been given a different treatment?”, or “how many lives could the US have saved had it authorized booster vaccines sooner?”[10].

The status of counterfactual questions also figures centrally in debates about quantum mechanics [11], where results such as Bell’s theorem [12] and the Kochen-Specker theorem [13] have been interpreted as requiring the abandonment of “counterfactual definiteness”[14], encapsulated in Peres’ famous dictum “unperformed experiments have no results” [15]. Could this assertion be used by a lawyer in an argument to dismiss a medical malpractice lawsuit as meaningless? Presumably not. Dismissing all counterfactual questions as meaningless due to quantum theory thus seems too strong. Here, we seek to delineate what counterfactual questions involving quantum systems can be unambiguously answered when unambiguously formulated, and to provide some direction for resolving the ambiguity that is inherent in counterfactual questions that are not so carefully constructed.

The semantics of counterfactuals has a controversial history. In one of the early accounts, David Lewis [16] proposed to evaluate counterfactuals via a similarity analysis of possible worlds, where “a counterfactual ‘If it were that A, then it would be that C’ is (non-vacuously) true if and only if some (accessible) world where both A and C are true is more similar to our actual world, overall, than is any world where A is true but C is false” [16]. This analysis is inevitably vague, as it requires an account of “similarity” among possible worlds, which Lewis attempts to resolve via a system of priorities. The goal is to identify closest worlds as possible worlds in which things are kept more or less the same as in our actual world, except for some ‘minimal changes’, required to make the antecedent of a given counterfactual true.

A recent approach, due to Judea Pearl, proposes to define counterfactuals in terms of a sufficiently well-specified causal model for a given situation, denoted by a (classical) structural causal model [1]. In Pearl’s approach, the ‘minimal changes’ required to make the antecedent of a counterfactual true are conceptualised in terms of an intervention, which breaks the causal connections into the variable being intervened upon while fixing it to the required counterfactual value. Structural causal models feature at the top of a hierarchy of progressively sophisticated models that can answer progressively sophisticated questions, which Pearl has dubbed the “ladder of causation” [17] (see Fig. 1).

As is well known, however, the classical causal model-framework of Pearl fails to reproduce quantum correlations while maintaining faithfulness to the relativistic causal structure—as vividly expressed by Bell’s theorem [12] and recent ‘fine-tuning theorems’ [18, 19, 20]. The program of quantum causal models [21, 22, 23, 24, 25, 26, 27, 28, 29] aims to resolve this tension by extending the classical causal model framework, while maintaining compatibility with relativistic causality. One of the aims of our work is to complete the last rung in the quantum analogue of Pearl’s “ladder of causation”, by proposing a framework to answer counterfactual queries in quantum causal models.

Pearl argues that the levels of interventions and counterfactuals are particularly important for human intelligence and understanding, as they are crucial for our internal modeling of the world and of the effects of our actions. In contrast, he argues that current artificial intelligence (AI) models —however impressive— are still restricted to level 1 of his causal hierarchy [17]. Another motivation of our extension of Pearl’s analysis to the framework of quantum causal models is thus its potential applications for quantum AI.

A key distinction from the classical case is that, due to the indeterminism inherent in quantum causal models, counterfactual queries do not always have truth values (unlike in Lewis’ and Pearl’s accounts). Another difference is that an intervention is not always required in order to make the antecedent of a counterfactual true. This leads to a richer semantics for counterfactuals in the quantum case, which contains Pearl’s classical structural causal model as a special case, as we show.

Finally, an important distinction regards the connection between counterfactual dependence and causal dependence. In Pearl’s account, counterfactual dependence requires causal dependence. Similarly, Lewis [30] proposed an analysis of causal dependence based on his own notion of counterfactual dependence. In contrast, in quantum causal models there can be counterfactual dependence among events without causal dependence. This fact sheds new light on the nature of the compatibility with relativistic causality that is offered by quantum causal models. It can be thought of as a clarification and generalisation of Shimony’s notion of “passion at a distance” [31].

The rest of the paper is organised as follows. In Sec. 2, we review the basic ingredients of Pearl’s ladder of causation (see Fig. 1), as well as his three-step procedure for evaluating counterfactuals based on the notion of (classical) structural causal models. In Sec. 3, we highlight the issues in accommodating quantum theory within this framework, in the light of Bell’s theorem and the assumption of “no fine-tuning” [18, 19]. The framework of quantum causal models aims to resolve this discrepancy. We introduce some key notions and notation of the latter in Sec. 4, which will set the stage for our definition of quantum counterfactuals and their semantics based on a novel notion of quantum structural causal models in Sec. 5. In Sec. 6, we show that Pearl’s classical formalism of counterfactuals naturally embeds into our framework; conversely, in Sec. 7 we elaborate on how our framework generalizes Pearl’s formalism, by distinguishing passive from active counterfactuals in quantum causal models. This results in a difference between causal and counterfactual dependence in quantum causal models, which pinpoints a remarkable departure from classical counterfactual reasoning. We discuss this using the pertinent example of the Bell scenario in Sec. 7.3. In Sec. 8, we briefly review the debate surrounding the notion of counterfactual definiteness in quantum mechanics, and argue that, although counterfactual definiteness fails in our formalism, this failure is not the most distinctive feature of quantum counterfactuals, and rejecting it cannot by itself resolve Bell’s theorem. Sec. 9 reflects on some of the key assumptions underlying our notion of quantum counterfactuals in the context of recent developments on quantum statistical inference. Sec. 10 concludes with a brief summary and discussion of the results, and questions for further work.

Figure 1: A depiction of “The Ladder of Causation”. [Republished with permission from Ref. [17].]

2 The Classical Causal Model Framework

This section contains the minimal background on classical causal models and the evaluation of counterfactuals required for the generalization to the quantum case in Sec. 5. We will review a small fraction of the framework outlined in much more detail in Ref. [1]; readers familiar with the latter may readily skip this section.

In his book on causality [1], Judea Pearl identifies a hierarchy of progressively sophisticated models that are capable of answering progressively sophisticated causal queries. This hierarchy is often depicted as the three-rung ‘Ladder of Causation’ (see Fig. 1).

At the bottom of the ladder is the level of association (‘level 1’), related to observations and statistical relations. It answers questions such as “How would seeing $X$ change my belief in $Y$?” The second rung is the level of intervention (‘level 2’), which considers questions such as “If I take aspirin, will my headache be cured?” The final rung in the ladder of causation is the level of counterfactuals (‘level 3’), associated with activities such as imagining, retrospecting, and understanding. It considers questions such as “Was it the aspirin that stopped my headache?” or “Had I not taken the aspirin, would my headache not have been cured?” In other words, counterfactuals deal with ‘why’-questions. Formally, levels 1, 2 and 3 are related to Bayesian networks, causal Bayesian networks, and structural causal models, respectively. We will formally define these in the coming subsections.

2.1 Level 1 - Bayesian networks

In Pearl’s framework, level 1 of the ladder of causation (Fig. 1) is the level of association, which encodes statistical data in the form of a probability distribution $P(\mathbf{v})=P(v_{1},\cdots,v_{n})$ over random variables $\mathbf{V}=\{V_{i}\}_{i=1}^{n}$. [Footnote 1: Throughout, we will use boldface notation to indicate tuples of variables.] The random variables are assumed to take values in a finite set, whose elements are denoted by the corresponding lowercase $v_{i}$. The proposition ‘$V_{i}=v_{i}$’ represents an event where the random variable $V_{i}$ takes the value $v_{i}$, and $P(v_{i}):=P(V_{i}=v_{i})$ denotes the probability that this event occurs.

Statistical independence conditions in a probability distribution can be conveniently represented graphically using directed acyclic graphs (DAGs), which in this context are also known as Bayesian networks. The nodes in a Bayesian network $G$ represent the random variables $\mathbf{V}=\{V_{i}\}_{i=1}^{n}$, while arrows (‘$V_{j}\rightarrow V_{k}$’) in $G$ impose a ‘kinship’ relation: we call $\mathrm{Pa}(V_{i})=\mathrm{Pa}_{i}:=\{V_{j}\in\mathbf{V}\mid(V_{j}\rightarrow V_{i})\in G\}$ the “parents” and $\mathrm{Ch}(V_{i})=\mathrm{Ch}_{i}:=\{V_{j}\in\mathbf{V}\mid(V_{i}\rightarrow V_{j})\in G\}$ the “children” of the node $V_{i}$. For example, in Fig. 2, $V_{1}$ is the parent node of $V_{2}$ and $V_{3}$; $V_{4}$ is a child node of $V_{2}$, $V_{3}$ and $V_{6}$.

Figure 2: A directed acyclic graph (DAG) with nodes $\mathbf{V}=\{V_{1},\cdots,V_{6}\}$ representing random variables, and arrows representing (causal) statistical dependencies.

Definition 1 (Classical Markov condition).

A joint probability distribution $P(\mathbf{v})=P(v_{1},\cdots,v_{n})$ is said to be Markov relative to a DAG $G$ with nodes $\mathbf{V}=\{V_{i}\}_{i=1}^{n}$ if and only if there exist conditional probability distributions $P(v_{i}|pa_{i})$ for each $V_{i}\in\mathbf{V}$ such that

$$P(\mathbf{v})=\prod_{i=1}^{n}P(v_{i}|pa_{i})\;. \tag{1}$$

In general, a probability distribution may be Markov relative to many Bayesian networks, corresponding to different ways it can be decomposed into conditional distributions. Moreover, a Bayesian network will have many distributions which are Markov with respect to it. Note that at this level (level 1), the DAG $G$ representing a Bayesian network does not carry causal meaning, but is merely a convenient representation of statistical conditional independences.
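As a minimal sketch of the factorization in Eq. (1), the following Python snippet builds a joint distribution from conditional probability tables. The three-node fork ($V_1\rightarrow V_2$, $V_1\rightarrow V_3$) and all numerical values are illustrative assumptions, not taken from the paper.

```python
import itertools

# Hypothetical DAG (an assumption for illustration): V1 -> V2 and V1 -> V3.
# All variables are binary.
parents = {"V1": (), "V2": ("V1",), "V3": ("V1",)}

# Illustrative conditional probability tables: cpt[node][parent_values][value].
cpt = {
    "V1": {(): {0: 0.6, 1: 0.4}},
    "V2": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
    "V3": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.5, 1: 0.5}},
}

def joint(assignment):
    """Return P(v) as the product of P(v_i | pa_i) over all nodes, as in Eq. (1)."""
    p = 1.0
    for node, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)
        p *= cpt[node][pa_vals][assignment[node]]
    return p

nodes = list(parents)
# Tabulate the joint distribution over all 2^3 value assignments.
table = {
    vals: joint(dict(zip(nodes, vals)))
    for vals in itertools.product([0, 1], repeat=len(nodes))
}
total = sum(table.values())  # a valid joint distribution sums to 1
```

Any other choice of tables on the same graph yields another distribution that is Markov relative to it, which illustrates the remark that a Bayesian network admits many compatible distributions.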

2.2 Level 2 - Causal Bayesian networks and classical causal models

At level 2 of the hierarchy are causal (Bayesian) networks. In contrast to Bayesian networks, the arrows between nodes $\mathbf{V}=\{V_{i}\}_{i=1}^{n}$ in a causal Bayesian network do encode causal relationships. In particular, the parents $\mathrm{Pa}(V_{i})$ of a node $V_{i}$ are now interpreted as direct causes of $V_{i}$. Moreover, a causal network is an oracle for interventions. The effect of an intervention is modeled as a “mini-surgery” in the graph that cuts all incoming arrows into the node being intervened upon and sets it to a specified value. We define the do-intervention $\mathrm{do}(\mathbf{X}=\mathbf{x})$ on a subset of nodes $\mathbf{X}\subset\mathbf{V}$ as the submodel $\langle G_{\mathbf{x}},P_{\mathbf{x}}\rangle$, where $G_{\mathbf{x}}$ is the modified DAG with the same nodes as $G$, but with all incoming arrows $V_{j}\rightarrow V_{i}$ for $V_{i}\in\mathbf{X}$ removed from $G$, and where $P_{\mathbf{x}}$ arises from $P$ by setting the values at $\mathbf{X}$ to $\mathbf{x}$. More precisely, letting $\mathbf{V}_{\mathbf{x}}=\mathbf{V}\backslash\mathbf{X}$,

$$P_{\mathbf{x}}(\mathbf{v})=\prod_{V_{i}\in\mathbf{X}}\delta_{v_{i},x_{i}}\prod_{V_{i}\in\mathbf{V}_{\mathbf{x}}}P(v_{i}|pa_{i})\;. \tag{2}$$

Definition 2 (Classical Causal Model).

A classical causal model is a pair $\langle G,P\rangle$, consisting of a directed acyclic graph $G$ with nodes $\mathbf{V}=\{V_{i}\}_{i=1}^{n}$ and a probability distribution $P$ that is Markov with respect to $G$ according to Def. 1, together with all its submodels $\langle G_{\mathbf{x}},P_{\mathbf{x}}\rangle$ arising from do-interventions $\mathrm{do}(\mathbf{X}=\mathbf{x})$ with $\mathbf{X}\subset\mathbf{V}$.

For example, if we perform a do-intervention $\mathrm{do}(V_{2}=v_{2})$ on the classical causal model $\langle G,P\rangle$ with DAG $G$ in Fig. 2, then $G_{v_{2}}$ is the DAG shown in Fig. 3, and the truncated factorization formula for the remaining variables reads

$$P_{v_{2}}(v_{1},v_{3},v_{4},v_{5},v_{6})=P(v_{1})P(v_{3}|v_{1})P(v_{4}|V_{2}=v_{2},v_{3},v_{6})P(v_{5}|v_{4})P(v_{6})\;. \tag{3}$$

Figure 3: The directed acyclic graph from Fig. 2 after a do-intervention on node $V_{2}$. The effect of this do-intervention is graphically represented by removing all the arrows into $V_{2}$.
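The difference between intervening and conditioning can be made concrete with a small numerical sketch of the truncated factorization in Eq. (2). The confounded three-node graph ($V_1\rightarrow V_2$, $V_1\rightarrow V_3$, $V_2\rightarrow V_3$) and the probability tables below are hypothetical choices made only for illustration.

```python
# Hypothetical causal model (assumption): V1 -> V2, V1 -> V3, V2 -> V3, all binary.
p_v1 = {0: 0.5, 1: 0.5}                                    # P(v1)
p_v2_given_v1 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}  # P(v2 | v1), indexed [v1][v2]
p_v3_is_1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(V3=1 | v1, v2)

def p_v3(v3, v1, v2):
    """P(v3 | v1, v2) for binary v3."""
    p1 = p_v3_is_1[(v1, v2)]
    return p1 if v3 == 1 else 1.0 - p1

def joint(v1, v2, v3):
    """Observational joint distribution, factorized as in Eq. (1)."""
    return p_v1[v1] * p_v2_given_v1[v1][v2] * p_v3(v3, v1, v2)

def joint_do_v2(x2, v1, v2, v3):
    """Interventional distribution for do(V2 = x2), as in Eq. (2):
    the factor P(v2 | v1) is replaced by a Kronecker delta."""
    delta = 1.0 if v2 == x2 else 0.0
    return delta * p_v1[v1] * p_v3(v3, v1, v2)

bits = (0, 1)
# P(V3 = 1) after the intervention do(V2 = 1).
p_do = sum(joint_do_v2(1, v1, v2, 1) for v1 in bits for v2 in bits)
# P(V3 = 1 | V2 = 1) obtained by ordinary conditioning.
p_obs = sum(joint(v1, 1, 1) for v1 in bits) / sum(
    joint(v1, 1, v3) for v1 in bits for v3 in bits
)
```

With these tables the two quantities differ (0.7 versus 0.82), because conditioning on $V_2$ also carries information about the common cause $V_1$, whereas the intervention cuts the arrow $V_1\rightarrow V_2$.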

2.3 Level 3 - Structural causal models and the evaluation of counterfactuals

At level 3 of the hierarchy are (classical) structural causal models. Such models consist of a set of nodes $(\mathbf{V},\mathbf{U})$, distinguished into endogenous variables $\mathbf{V}$ and exogenous variables $\mathbf{U}$, together with a set of functions $\mathbf{F}$ that encode structural relations between the variables. The term “exogenous” indicates that any causes of such variables lie outside the model; they can be thought of as local ‘noise variables’.

Definition 3 (Classical Structural Causal Model).

A (classical) structural causal model (CSM) $M$ is a triple $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$, where $\mathbf{V}=\{V_{1},\dots,V_{n}\}$ is a set of endogenous variables, $\mathbf{U}=\{U_{1},\dots,U_{n}\}$ is a set of exogenous variables and $\mathbf{F}=\{f_{1},\dots,f_{n}\}$ is a set of functions such that $v_{i}=f_{i}(pa_{i},u_{i})$ for some $\mathrm{Pa}_{i}\subseteq\mathbf{V}$.

Every structural causal model $M$ is associated with a directed graph $G(M)$, which represents the causal structure of the model as specified by the relations $G(M)\ni(V_{j}\stackrel{f_{i}}{\rightarrow}V_{i})\Leftrightarrow V_{j}\in\mathrm{Pa}_{i}$. Here, we will restrict CSMs to those defining directed acyclic graphs. For example, the causal model of Fig. 2 can be extended to a CSM with causal relations as depicted in Fig. 4.

Figure 4: A classical structural causal model (CSM) with endogenous nodes $\mathbf{V}=\{V_{1},\cdots,V_{6}\}$ and exogenous nodes $\mathbf{U}=\{U_{1},\cdots,U_{6}\}$.

In analogy with the do-interventions for causal Bayesian networks in Sec. 2.2, we define do-interventions in a CSM $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$. Let $\mathbf{X}\subset\mathbf{V}$ with corresponding exogenous variables $\mathbf{U}(\mathbf{X})\subset\mathbf{U}$ and functions $\mathbf{F}(\mathbf{X})\subset\mathbf{F}$, and let $\mathbf{V}_{\mathbf{x}}=\mathbf{V}\backslash\mathbf{X}$, $\mathbf{U}_{\mathbf{x}}=\mathbf{U}\backslash\mathbf{U}(\mathbf{X})$ and $\mathbf{F}_{\mathbf{x}}=\mathbf{F}\backslash\mathbf{F}(\mathbf{X})$. Then the do-intervention $\mathrm{do}(\mathbf{X}=\mathbf{x})$ defines a submodel $M_{\mathbf{x}}=\langle\mathbf{U}_{\mathbf{x}},(\mathbf{V}_{\mathbf{x}},\mathbf{X}=\mathbf{x}),\mathbf{F}_{\mathbf{x}}\rangle$. In terms of the causal graph $G(M)$, the action $\mathrm{do}(\mathbf{X}=\mathbf{x})$ removes all incoming arrows to the nodes $X_{i}$, thus generating a new graph $G(M_{\mathbf{x}})$.

The submodel $M_{\mathbf{x}}$ represents a minimal change to the original model $M$ such that $\mathbf{X}=\mathbf{x}$ is true while keeping the values of the exogenous variables fixed – which are thought of as “background conditions”. In turn, we can use $M_{\mathbf{x}}$ to analyze counterfactual statements with antecedent $\mathbf{X}=\mathbf{x}$.

Definition 4 (Counterfactual).

Let $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$ be a structural causal model, and let $\mathbf{X},\mathbf{Y}\subseteq\mathbf{V}$. The counterfactual statement “$\mathbf{Y}$ would have been $\mathbf{y}$, had $\mathbf{X}$ been $\mathbf{x}$, in a situation specified by the background variables $\mathbf{U}=\mathbf{u}$” is denoted by $\mathbf{Y}_{\mathbf{x}}(\mathbf{u})=\mathbf{y}$, where $\mathbf{Y}_{\mathbf{x}}(\mathbf{u})$ is the potential response of $\mathbf{Y}$ to the action $\mathrm{do}(\mathbf{X}=\mathbf{x})$, that is, the solution for $\mathbf{Y}$ of the modified set of equations $\mathbf{F}_{\mathbf{x}}$ in the submodel $M_{\mathbf{x}}$. $\mathbf{X}=\mathbf{x}$ is called the antecedent and $\mathbf{Y}=\mathbf{y}$ is the consequent of the counterfactual.

Note that given any complete specification $\mathbf{U}=\mathbf{u}$ of the exogenous variables, every counterfactual statement of the form above has a truth value. [Footnote 2: Here, by “truth values” we mean Boolean truth values ‘true’ and ‘false’. We do not consider more general truth values such as ‘indefinite’ which may arise e.g. in intuitionistic logic.] Denoting a “causal world” by the pair $\langle M,\mathbf{u}\rangle$, we can say that a counterfactual has a truth value in every causal world where it can be defined. This is the case even when the model $M$ with $\mathbf{U}=\mathbf{u}$ determines $\mathbf{X}$ to have a value different from that specified in the antecedent, because the counterfactual is evaluated relative to the modified submodel $M_{\mathbf{x}}$.

Definition 5 (Probabilistic structural causal model).

A probabilistic structural causal model (PSM) is defined by a pair $\langle M,P(\mathbf{u})\rangle$, where $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$ is a structural causal model (see Def. 3) and $P(\mathbf{u})$ is a probability distribution defined over the exogenous variables $\mathbf{U}$ of $M$.

Since every endogenous variable $V_{i}\in\mathbf{V}$ is a function of $U_{i}$ and its parent nodes, $f_{i}:U_{i}\times\mathrm{Pa}_{i}\rightarrow V_{i}$, the distribution $P(\mathbf{u})$ in a PSM $\langle M,P(\mathbf{u})\rangle$ defines a probability distribution over every subset $\mathbf{Y}\subseteq\mathbf{V}$ by

$$P(\mathbf{y}):=P(\mathbf{Y}=\mathbf{y})=\sum_{\mathbf{u}|\mathbf{Y}(\mathbf{u})=\mathbf{y}}P(\mathbf{u})\;. \tag{4}$$

In particular, the probability of the counterfactual “$\mathbf{Y}$ would have been $\mathbf{y}$, had $\mathbf{X}$ been $\mathbf{x}$” can be computed using the submodel $M_{\mathbf{x}}$ as

$$P(\mathbf{Y}_{\mathbf{x}}=\mathbf{y})=\sum_{\mathbf{u}|\mathbf{Y}_{\mathbf{x}}(\mathbf{u})=\mathbf{y}}P(\mathbf{u})\;. \tag{5}$$

More generally, the probability of a counterfactual query might be conditioned on prior observations ‘$\mathbf{e}$’. In this case, we first update the probability distribution $P(\mathbf{u})$ in the PSM to obtain a modified probability distribution $P(\mathbf{u}\mid\mathbf{e})$ conditioned on observed data $\mathbf{e}$ and then use this updated probability distribution to evaluate the probability for the counterfactual as in Eq. (5). Combining the above steps, one arrives at the following theorem, proved in Ref. [1]:

Theorem 1 (Pearl [1]).

Given a probabilistic structural causal model (PSM) $\langle M,P(\mathbf{u})\rangle$ (see Def. 5), and subsets $\mathbf{X},\mathbf{Y},\mathbf{E}\subset\mathbf{V}$, the probability for the counterfactual “$\mathbf{Y}$ would have been $\mathbf{y}$, had $\mathbf{X}$ been $\mathbf{x}$”, given the observation of $\mathbf{E}=\mathbf{e}$, is denoted by $P(\mathbf{Y}_{\mathbf{x}}|\mathbf{e})$ and can be evaluated systematically by a three-step procedure:

  • Step 1: Abduction: using the observed data $\mathbf{E}=\mathbf{e}$, use Bayesian inference to update the probability distribution $P(\mathbf{u})$ in the PSM $\langle M,P(\mathbf{u})\rangle$ to obtain $P(\mathbf{u}|\mathbf{e})$.

  • Step 2: Action: perform a do-intervention $\mathrm{do}(\mathbf{X}=\mathbf{x})$, by which the values of $\mathbf{X}\subset\mathbf{V}$ are specified independently of their parent nodes. The resultant model is denoted as $M_{\mathbf{x}}$.

  • Step 3: Prediction: in the modified model $\langle M_{\mathbf{x}},P(\mathbf{u}|\mathbf{e})\rangle$, compute the probability of $\mathbf{Y}$ via Eq. (5).

As an example, consider the situation where $\mathbf{X}=\mathbf{x}$ and $\mathbf{Y}=\mathbf{y}$ are observed, that is, $\mathbf{E}=(\mathbf{X},\mathbf{Y})$. [Footnote 3: Note that $\mathbf{X},\mathbf{E}$ and $\mathbf{Y},\mathbf{E}$ in Thm. 1 are not necessarily disjoint.] We evaluate the probability of the counterfactual “$\mathbf{Y}$ would have been $\mathbf{y}'$, had $\mathbf{X}$ been $\mathbf{x}'$” as:

$$\begin{aligned}
P(\mathbf{Y}_{\mathbf{x}'}=\mathbf{y}'|\mathbf{X}=\mathbf{x},\mathbf{Y}=\mathbf{y})
&=\sum_{\mathbf{u}|\mathbf{Y}_{\mathbf{x}'}(\mathbf{u})=\mathbf{y}'}P(\mathbf{u}|\mathbf{X}=\mathbf{x},\mathbf{Y}=\mathbf{y})\\
&=\sum_{\mathbf{u}|\mathbf{Y}_{\mathbf{x}'}(\mathbf{u})=\mathbf{y}'}\frac{P(\mathbf{X}=\mathbf{x},\mathbf{Y}=\mathbf{y}|\mathbf{u})P(\mathbf{u})}{P(\mathbf{X}=\mathbf{x},\mathbf{Y}=\mathbf{y})}\;,
\end{aligned} \tag{6}$$

where we used Eq. (5) in the first and Bayes’ theorem in the second step.

[Footnote 4: Note that a probabilistic structural causal model implies the existence of a joint probability distribution over all variables. In this case, an alternative expression for the probability of the counterfactual in Eq. (6) reads

$$P(\mathbf{Y}_{\mathbf{x}'}=\mathbf{y}'|\mathbf{X}=\mathbf{x},\mathbf{Y}=\mathbf{y})=\frac{P(\mathbf{Y}_{\mathbf{x}'}=\mathbf{y}',\mathbf{X}=\mathbf{x},\mathbf{Y}=\mathbf{y})}{P(\mathbf{X}=\mathbf{x},\mathbf{Y}=\mathbf{y})}\;. \tag{7}$$

In the quantum case such a distribution does not generally exist [13].]

In temporal metaphors, step 1 explains the past (the exogenous variables $\mathbf{U}$) in light of the current evidence $\mathbf{e}$; step 2 minimally bends the course of history to comply with the hypothetical antecedent; and step 3 predicts the future based on our new understanding of the past and our newly established condition.
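The three-step procedure of Thm. 1 can be sketched in a few lines of Python. The model below is a hypothetical toy PSM loosely inspired by the aspirin example: $X$ (aspirin taken) is set by $U_X$, and $Y$ (headache cured) is given by $Y=(X\wedge U_E)\vee U_S$, where the exogenous variable of $Y$ is the pair $(U_E,U_S)$ ("aspirin is effective", "spontaneous recovery"). The structural equations and all probabilities are assumptions chosen for illustration only.

```python
import itertools

# Hypothetical prior over independent binary exogenous variables (assumption).
p_ux = {0: 0.5, 1: 0.5}   # whether aspirin is taken
p_ue = {0: 0.3, 1: 0.7}   # whether aspirin would be effective
p_us = {0: 0.8, 1: 0.2}   # whether the headache resolves spontaneously

def prior(u):
    ux, ue, us = u
    return p_ux[ux] * p_ue[ue] * p_us[us]

def solve(u, do_x=None):
    """Solve the structural equations X := U_X, Y := (X and U_E) or U_S.
    If do_x is given, the equation for X is replaced by X := do_x."""
    ux, ue, us = u
    x = ux if do_x is None else do_x
    y = int((x and ue) or us)
    return x, y

def counterfactual(y_cf, x_cf, x_obs, y_obs):
    """P(Y_{x_cf} = y_cf | X = x_obs, Y = y_obs) via abduction, action, prediction."""
    worlds = itertools.product([0, 1], repeat=3)
    # Step 1 (abduction): keep only the u compatible with the evidence;
    # normalising the kept weights yields P(u | e).
    weights = {u: prior(u) for u in worlds if solve(u) == (x_obs, y_obs)}
    norm = sum(weights.values())
    # Steps 2 and 3 (action, prediction): evaluate Y in the submodel M_x
    # and sum the updated weights as in Eq. (5).
    hits = sum(w for u, w in weights.items() if solve(u, do_x=x_cf)[1] == y_cf)
    return hits / norm

# "Given that I took aspirin and was cured, would I have remained
#  uncured had I not taken it?"
p = counterfactual(y_cf=0, x_cf=0, x_obs=1, y_obs=1)
```

In this toy model the answer is $0.56/0.76\approx 0.74$: the evidence rules out the worlds in which the aspirin was ineffective and no spontaneous recovery occurred, and the counterfactual outcome is then determined by $U_S$ alone.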

3 Quantum violations of classical causality

Classical causal models face notorious difficulties in explaining quantum correlations. Firstly, Bell’s theorem [12, 32, 33] can be interpreted in terms of classical causal models, as proving that such models cannot reproduce all quantum correlations (in particular, those that violate a Bell inequality) while maintaining relativistic causal structure and the assumption of “free choice”. The latter is the assumption that experimentally controllable parameters like measurement settings can always be chosen via “free variables”, which can be understood as variables that have no relevant causes in a causal model for the experiment. That is, they share no common causes with, nor are caused by, any other variables in the model. Thus, “free variables” can be modeled as exogenous variables.

For concreteness, consider the standard Bell scenario with a causal structure represented in the DAG in Fig. 5, where variables $A$ and $B$ denote the outcomes of experiments performed by two agents, Alice and Bob. Variables $X$ and $Y$ denote their choices of experiment, which are assumed to be “free variables” and thus have no incoming arrows. Since Alice and Bob perform measurements in space-like separated regions, no relativistic causal connection is allowed between $X$ and $B$ nor between $Y$ and $A$. In this scenario, Reichenbach’s principle of common cause [34, 22] – which is a consequence of the classical causal Markov condition – implies the existence of common causes underlying any correlations between the two sides of the experiment. $\Lambda$ denotes a complete specification of any such common causes. As we are assuming a relativistic causal structure, those must be in the common past light cone of Alice’s and Bob’s experiments.

Figure 5: A Directed Acyclic Graph (DAG) depicting the standard Bell scenario.

Marginalizing over the common cause variable $\Lambda$, the classical causal Markov condition applied to the DAG in Fig. 5 implies the factorization:

$$P(AB|XY)=\sum_{\Lambda}P(\Lambda)P(A|X\Lambda)P(B|Y\Lambda)\;. \tag{8}$$

A model satisfying Eq. (8) is also called a local hidden variable model. Importantly, local hidden variable models satisfy the Bell inequalities [12, 32], which have been experimentally violated by quantum correlations [35, 36, 37, 38]. [Footnote 5: The 2022 Nobel Prize in Physics was awarded in part for the demonstration of Bell inequality violations.] It follows that no classical causal model can explain quantum correlations under the above assumptions.
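To make the gap between Eq. (8) and quantum correlations tangible, the following sketch evaluates one particular Bell inequality, the CHSH inequality, which is our choice of example and is not singled out in the text. It relies on the standard fact, taken for granted here, that any model of the form of Eq. (8) is a mixture of deterministic assignments $A=a(X)$, $B=b(Y)$, so the maximum over such assignments bounds all local hidden variable models. The quantum state and observables are the textbook choices for a maximal violation; the helper functions are written out by hand to keep the example dependency-free.

```python
import itertools
import math

def chsh(E):
    """CHSH expression S = E(0,0) + E(0,1) + E(1,0) - E(1,1) for correlators E[x][y]."""
    return E[0][0] + E[0][1] + E[1][0] - E[1][1]

# Deterministic local strategies: outcomes a(x), b(y) in {-1, +1}.
local_max = max(
    chsh([[a[x] * b[y] for y in (0, 1)] for x in (0, 1)])
    for a in itertools.product([-1, 1], repeat=2)
    for b in itertools.product([-1, 1], repeat=2)
)

def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def expectation(op, psi):
    """<psi| op |psi> for a real state vector psi."""
    n = len(psi)
    return sum(psi[i] * op[i][j] * psi[j] for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
Z = [[1, 0], [0, -1]]
X = [[0, 1], [1, 0]]
# Bob's observables (Z + X)/sqrt(2) and (Z - X)/sqrt(2).
B0 = [[s * (Z[i][j] + X[i][j]) for j in range(2)] for i in range(2)]
B1 = [[s * (Z[i][j] - X[i][j]) for j in range(2)] for i in range(2)]
alice, bob = [Z, X], [B0, B1]

# Maximally entangled state (|00> + |11>)/sqrt(2).
phi = [s, 0.0, 0.0, s]
E_quantum = [[expectation(kron(alice[x], bob[y]), phi) for y in (0, 1)] for x in (0, 1)]
quantum_value = chsh(E_quantum)
```

Here `local_max` evaluates to 2, while `quantum_value` equals $2\sqrt{2}$ up to floating-point error, so these quantum correlations cannot arise from any model of the form of Eq. (8).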

More recently, Wood and Spekkens [18] showed that certain Bell inequality violations cannot be reproduced by any classical causal model that satisfies the assumption of “no fine-tuning”. This is the requirement that any conditional independence between variables in the model be explained as arising from the structure of the causal graph, rather than from fine-tuned model parameters. This assumption is essential for causal discovery – without it, it is generally not possible to experimentally determine which of a number of candidate graphs faithfully represents a given situation. This result was later generalized to arbitrary Bell and Kochen-Specker inequality violations in Refs. [19, 20].

These results motivate the search for a generalization of classical causal models that accommodates quantum correlations and allows for causal discovery, while maintaining faithfulness to relativistic causal structure. Ref. [22] considers modifications of Reichenbach’s principle of common cause [34], which is implied by the causal Markov condition in the special case of the common cause scenario in Fig. 5, as assumed in Bell’s theorem [12, 32]. The authors of Ref. [22] argue that one could maintain the principle of common cause (the requirement that correlations between two causally disconnected events should be explained via common causes) by relaxing the condition that a full specification of those common causes factorizes the probabilities for the events in question, as in Eq. (8). Using the Leifer-Spekkens formalism for quantum conditional states, they instead propose that Eq. (8) should be replaced by the requirement that the channels between the common cause and Alice and Bob’s labs factorize or, more precisely, that the Choi-Jamiołkowski operators corresponding to those channels factorize. This is essentially the type of resolution of Bell’s theorem that is provided by quantum causal models, to which we now turn. After introducing structural quantum causal models in Sec. 4.1 and quantum counterfactual queries in Sec. 5, in Sec. 7.3 we will revisit the Bell scenario from the perspective of counterfactuals in quantum causal models.

4 Quantum causal models

In recent years a growing number of papers have addressed the problem of generalizing the classical causal models formalism to accommodate quantum correlations, in a way that is compatible with relativistic causality and faithfulness. This has led to the development of various frameworks for quantum causal models. The more developed of those are the frameworks by Costa and Shrapnel [26] and Barrett, Lorenz and Oreshkov [29]. In this work, we use a combination of the notation and features of both of these formalisms.

Quantum nodes and quantum interventions. Recall that in a classical causal model, a node represents a locus for potential interventions. In order to generalize this to the quantum case, we start by introducing a quantum node $A$, which is associated with two Hilbert spaces $\mathcal{H}_{A^{\mathrm{in}}}$ and $\mathcal{H}_{A^{\mathrm{out}}}$, corresponding to the incoming system and the outgoing system, respectively. An intervention at a quantum node $A$ is represented by a quantum instrument $\mathcal{I}^{z}_{A}$ (see Fig. 6). This is a set of trace-non-increasing completely positive (CP) maps from the space of linear operators on $\mathcal{H}_{A^{\mathrm{in}}}$ to the space of linear operators on $\mathcal{H}_{A^{\mathrm{out}}}$,

$$\mathcal{I}_{A}^{z}=\{\mathcal{M}_{A}^{a|z}:\mathcal{L}(\mathcal{H}_{A^{\mathrm{in}}})\rightarrow\mathcal{L}(\mathcal{H}_{A^{\mathrm{out}}})\}_{a}\;, \tag{9}$$

such that $\mathcal{M}_{A}=\sum_{a}\mathcal{M}_{A}^{a|z}$ is a completely positive, trace-preserving (CPTP) map, i.e. a quantum channel. [Footnote 6: We sometimes write $\mathcal{M}_{A}^{|z}$ for this CPTP map to indicate that it is associated with the instrument $\mathcal{I}_{A}^{z}$. Note however that a given CPTP map will in general be associated with many different instruments.] Here, $z$ is a label for the (choice of) instrument, and $a$ labels the classical outcome of the instrument, which occurs with probability $P_{z}(a)=\mathrm{Tr}[\mathcal{M}_{A}^{a|z}(\rho_{A^{\mathrm{in}}})]$ for an input state $\rho_{A^{\mathrm{in}}}\in\mathcal{L}(\mathcal{H}_{A^{\mathrm{in}}})$; consequently, the state on the output system conditioned on the outcome of the intervention is given by $\mathcal{M}_{A}^{a|z}(\rho_{A^{\mathrm{in}}})/P_{z}(a)$. For simplicity, we consider finite-dimensional systems only.
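As an illustration of these definitions, the following sketch computes the outcome probabilities $P_{z}(a)$ and the conditional output states for a simple instrument. The specific instrument (a projective qubit measurement in the computational basis) and the input state $|+\rangle\langle+|$ are assumptions chosen for the example, as are the helper name `apply_cp` and the restriction to CP maps with a single Kraus operator.

```python
import numpy as np

def apply_cp(kraus, rho):
    """Apply the CP map X -> K X K^dagger defined by a single Kraus operator K."""
    return kraus @ rho @ kraus.conj().T

# Hypothetical instrument z: projective measurement in the computational basis.
instrument = {
    0: np.array([[1, 0], [0, 0]], dtype=complex),  # Kraus operator for outcome a = 0
    1: np.array([[0, 0], [0, 1]], dtype=complex),  # Kraus operator for outcome a = 1
}

# Hypothetical input state rho_{A^in} = |+><+|.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_in = np.outer(plus, plus.conj())

# Outcome probabilities P_z(a) = Tr[M^{a|z}(rho_in)].
probs = {a: np.trace(apply_cp(K, rho_in)).real for a, K in instrument.items()}

# Conditional output states M^{a|z}(rho_in) / P_z(a), defined when P_z(a) > 0.
post_states = {a: apply_cp(K, rho_in) / probs[a]
               for a, K in instrument.items() if probs[a] > 0}

# Summing over outcomes gives the CPTP map M_A, which preserves the trace.
channel_output = sum(apply_cp(K, rho_in) for K in instrument.values())

print(probs)
print(np.trace(channel_output).real)
```

Each individual map is trace-non-increasing, and only the sum over outcomes is trace-preserving, which is what distinguishes an instrument from a channel.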

Figure 6: A quantum node $A$ is associated with an incoming Hilbert space $\mathcal{H}_{A^{\mathrm{in}}}$ and an outgoing Hilbert space $\mathcal{H}_{A^{\mathrm{out}}}$. It can be intervened on via a quantum instrument $\mathcal{I}_{A}^{z}=\{\mathcal{M}_{A}^{a|z}\}_{a}$, resulting in an outcome ‘$a$’ corresponding to a completely positive (CP) map $\mathcal{M}_{A}^{a|z}$.

Using the Choi-Jamiołkowski (CJ) isomorphism, we represent a quantum instrument $\mathcal{I}_{A}^{z}=\{\mathcal{M}_{A}^{a|z}\}_{a}$ in terms of a positive operator-valued measure $a\mapsto\tau_{A}^{a|z}$. [Footnote 7: Here, we follow the notation in Ref. [26]. This differs from the one used in Refs. [27, 39, 29], which applies a basis-independent version of the Choi-Jamiołkowski isomorphism, by identifying the Hilbert space associated with outgoing systems with its dual (see also Ref. [40]).] More precisely, every completely positive map $\mathcal{M}_{A}^{a|z}$ is represented by a positive semi-definite operator $\tau_{A}^{a|z}\in\mathcal{L}(\mathcal{H}_{A^{\mathrm{out}}}\otimes\mathcal{H}_{A^{\mathrm{in}}})$ given by

$$\tau^{a|z}_{A}=\sum_{i,j}\mathcal{M}_{A}^{a|z}(|i\rangle\langle j|)^{T}_{A^{\mathrm{out}}}\otimes|j\rangle\langle i|_{A^{\mathrm{in}}}\;. \tag{10}$$

In a slight abuse of notation, we will write $\mathcal{I}^{z}_{A}=\{\mathcal{M}^{a|z}_{A}\}_{a}\stackrel{CJ}{\leftrightarrow}\{\tau^{a|z}_{A}\}_{a}$ also for the representation of an instrument in terms of positive operators under the Choi-Jamiołkowski isomorphism. Note that the fact that $\mathcal{M}_{A}=\sum_{a}\mathcal{M}_{A}^{a|z}$ is trace-preserving imposes the following trace condition on $\tau^{|z}_{A}=\sum_{a}\tau^{a|z}_{A}$ (cf. Ref. [41]),

$$\mathrm{Tr}_{A^{\mathrm{out}}}[\tau^{|z}_{A}]=\mathbb{I}_{A^{\mathrm{in}}}\;. \tag{11}$$
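The construction in Eq. (10) and the trace condition in Eq. (11) can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the helper names `cj_operator` and `trace_out` are hypothetical, each CP map is assumed to have a single Kraus operator, and the example instrument (a computational-basis measurement followed by a Hadamard rotation) is chosen purely for illustration.

```python
import numpy as np

def cj_operator(kraus):
    """CJ operator of X -> K X K^dagger following Eq. (10), ordered as out (x) in."""
    d_out, d_in = kraus.shape
    tau = np.zeros((d_out * d_in, d_out * d_in), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            e_ij = np.zeros((d_in, d_in), dtype=complex)
            e_ij[i, j] = 1.0                         # |i><j|
            mapped = kraus @ e_ij @ kraus.conj().T   # M(|i><j|)
            tau += np.kron(mapped.T, e_ij.T)         # M(|i><j|)^T (x) |j><i|
    return tau

def trace_out(tau, d_out, d_in):
    """Partial trace over the outgoing (first) tensor factor."""
    return np.einsum('oiok->ik', tau.reshape(d_out, d_in, d_out, d_in))

# Hypothetical instrument: measure in the computational basis, then apply a Hadamard.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
P0 = np.diag([1, 0]).astype(complex)
P1 = np.diag([0, 1]).astype(complex)
kraus_ops = [H @ P0, H @ P1]

taus = [cj_operator(K) for K in kraus_ops]
tau_total = sum(taus)                                # CJ operator of the channel

# Eq. (11): tracing out the output of the channel operator yields the identity.
print(np.allclose(trace_out(tau_total, 2, 2), np.eye(2)))
# Each element of the instrument is positive semi-definite.
print(all(np.linalg.eigvalsh(t).min() > -1e-12 for t in taus))
```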

Quantum process operators. In a quantum causal model we will distinguish between two types of quantum operations: quantum interventions, which are local to a quantum node, and a quantum process operator, which acts between quantum nodes and contains information about the causal (influence) relations between the nodes in the model.

To motivate the general definition (Def. 6 below), we first consider the simplest case: for a single quantum node $A$, a quantum process operator is any operator $\sigma_{A}\in\mathcal{L}(\mathcal{H}_{A^{\mathrm{in}}}\otimes\mathcal{H}_{A^{\mathrm{out}}})$ such that the pairing

$$\mathrm{Tr}_{A}[\sigma_{A}\tau^{a|z}_{A}]=\mathrm{Tr}_{A^{\mathrm{in}}A^{\mathrm{out}}}[\sigma_{A}\tau^{a|z}_{A}]=:P_{z}(a)\in[0,1]\;, \tag{12}$$

defines a probability for every positive semi-definite operator $\tau^{a|z}_{A}$, and satisfies the normalisation condition

$$\sum_{a}P_{z}(a)=\mathrm{Tr}_{A}[\sigma_{A}\tau^{|z}_{A}]=1\;, \tag{13}$$

for every quantum channel (CPTP map) $\tau^{|z}_{A}$. [Footnote 8: With Ref. [29], we will adopt the shorthand $\mathrm{Tr}_{A}[\cdots]:=\mathrm{Tr}_{A^{\mathrm{in}}A^{\mathrm{out}}}[\cdots]$.] Consequently, given a process operator $\sigma_{A}$, we may interpret $P_{z}(a)$ as the probability to obtain outcome $a$ when performing an instrument $z$.
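To see the pairing of Eq. (12) at work, the sketch below takes one simple process operator that satisfies Eq. (13), namely $\sigma_{A}=\rho_{A^{\mathrm{in}}}\otimes\mathbb{I}_{A^{\mathrm{out}}}$ (a state arriving at the node, with the output discarded), and checks that the pairing reproduces the usual Born rule. The choice of $\sigma_{A}$, the state, the $Y$-basis instrument and the helper names are assumptions made for illustration.

```python
import numpy as np

def cj_operator(kraus):
    """CJ operator of X -> K X K^dagger following Eq. (10), ordered as out (x) in."""
    d = kraus.shape[1]
    tau = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            e_ij = np.zeros((d, d), dtype=complex)
            e_ij[i, j] = 1.0
            tau += np.kron((kraus @ e_ij @ kraus.conj().T).T, e_ij.T)
    return tau

def pairing(rho_in, tau, d=2):
    """Tr_A[sigma_A tau_A] for the assumed sigma_A = rho_in (x) I_out."""
    # Reorder tau from out (x) in to in (x) out so that it matches sigma_A.
    tau_io = tau.reshape(d, d, d, d).transpose(1, 0, 3, 2).reshape(d * d, d * d)
    sigma = np.kron(rho_in, np.eye(d))
    return np.trace(sigma @ tau_io).real

# Hypothetical state rho = (I + 0.6 Y) / 2 and a projective Y-basis instrument.
rho = np.array([[0.5, -0.3j], [0.3j, 0.5]])
ket_pi = np.array([1, 1j]) / np.sqrt(2)
ket_mi = np.array([1, -1j]) / np.sqrt(2)
kraus_ops = [np.outer(k, k.conj()) for k in (ket_pi, ket_mi)]

probs = [pairing(rho, cj_operator(K)) for K in kraus_ops]
born = [np.trace(K @ rho @ K.conj().T).real for K in kraus_ops]
print(probs, born)
```

The complex-valued example is deliberate: it is sensitive to the transposition in Eq. (10), so agreement with the Born rule is a useful sanity check on the conventions.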

As a generalisation of the Born rule (on the composite system $\mathcal{H}_{A^{\mathrm{in}}}\otimes\mathcal{H}_{A^{\mathrm{out}}}$), Eq. (12) in particular implies that $\sigma_{A}$ is positive and hence corresponds to a completely positive map $\mathcal{E}:\mathcal{L}(\mathcal{H}_{A^{\mathrm{out}}})\rightarrow\mathcal{L}(\mathcal{H}_{A^{\mathrm{in}}})$.

More generally, it will be useful to introduce a notation for the positive semi-definite operator $\rho^{\mathcal{E}}_{B|A}$ corresponding to a bipartite channel of the form $\mathcal{E}:\mathcal{L}(\mathcal{H}_{A^{\mathrm{out}}})\rightarrow\mathcal{L}(\mathcal{H}_{B^{\mathrm{in}}})$:

$$\rho^{\mathcal{E}}_{B|A}=\rho^{\mathcal{E}}_{B^{\mathrm{in}}|A^{\mathrm{out}}}:=\sum_{i,j}\mathcal{E}(|i\rangle\langle j|)_{B^{\mathrm{in}}}\otimes|i\rangle_{A^{\mathrm{out}}}\langle j|\;. \tag{14}$$

Note that $\rho^{\mathcal{E}}_{B|A}$ is distinguished from the representation of the Choi matrices corresponding to quantum instruments in Eq. (10) by an overall transposition, indicating the different roles played by instruments and processes in the inner product of Eq. (12). In particular, we have $\sigma_{A}=\rho^{\mathcal{E}}_{A|A}=\rho^{\mathcal{E}}_{A^{\mathrm{in}}|A^{\mathrm{out}}}$ for some channel satisfying the normalisation condition in Eq. (13).

Generalizing this idea to finitely many quantum nodes, a quantum process operator is defined as follows.

Definition 6 (Process operator).

A (quantum) process operator over quantum nodes $\mathbf{A}=\{A_{1},\cdots,A_{n}\}$ is a positive semi-definite operator $\sigma_{\mathbf{A}}=\sigma_{A_{1},\cdots,A_{n}}\in\mathcal{L}(\bigotimes_{i=1}^{n}\mathcal{H}_{A_{i}^{\mathrm{in}}}\otimes\mathcal{H}_{A_{i}^{\mathrm{out}}})_{+}$, which satisfies the normalisation condition,

$$\mathrm{Tr}_{A_{1}\cdots A_{n}}[\sigma_{A_{1},\cdots,A_{n}}(\tau^{|z_{1}}_{A_{1}}\otimes\cdots\otimes\tau^{|z_{n}}_{A_{n}})]=1\;, \tag{15}$$

for any choice of quantum channels $\tau^{|z_{1}}_{A_{1}},\cdots,\tau^{|z_{n}}_{A_{n}}$ at nodes $A_{1},\cdots,A_{n}$. [Footnote 9: Every process operator satisfies a trace condition analogous to Eq. (11): $\mathrm{Tr}_{A_{1}^{\mathrm{in}}\cdots A_{n}^{\mathrm{in}}}[\sigma_{A_{1},\cdots,A_{n}}]=\mathbb{I}_{A_{1}^{\mathrm{out}}}\otimes\cdots\otimes\mathbb{I}_{A_{n}^{\mathrm{out}}}$; hence, $\sigma_{A_{1}\cdots A_{n}}$ defines a CPTP map $\mathcal{L}(\mathcal{H}_{A^{\mathrm{out}}_{1}}\otimes\cdots\otimes\mathcal{H}_{A^{\mathrm{out}}_{n}})\rightarrow\mathcal{L}(\mathcal{H}_{A^{\mathrm{in}}_{1}}\otimes\cdots\otimes\mathcal{H}_{A^{\mathrm{in}}_{n}})$. Yet, the converse is generally not true.]

Comparing with Eq. (12), we define the probability of obtaining outcomes $\mathbf{a}=\{a_{1},\cdots,a_{n}\}$ when performing interventions $\{\{\tau_{A_{1}}^{a_{1}|z_{1}}\}_{a_{1}},\cdots,\{\tau_{A_{n}}^{a_{n}|z_{n}}\}_{a_{n}}\}$ at quantum nodes $\mathbf{A}=\{A_{1},\cdots,A_{n}\}$ by

$$P_{\mathbf{z}}(\mathbf{a})=\mathrm{Tr}_{A_{1}\cdots A_{n}}[\sigma_{A_{1},\cdots,A_{n}}(\tau_{A_{1}}^{a_{1}|z_{1}}\otimes\cdots\otimes\tau_{A_{n}}^{a_{n}|z_{n}})]\;. \tag{16}$$

Eq. (16) defines a generalization of the Born rule (on the composite system $\bigotimes_{i=1}^{n}\mathcal{H}_{A_{i}^{\mathrm{in}}}\otimes\mathcal{H}_{A_{i}^{\mathrm{out}}}$) [42, 28].

Quantum causal models. With the above ingredients, we obtain quantum generalizations of the causal Markov condition in Def. 1 and thereby of classical causal models (causal networks) in Def. 2.

Definition 7 (Quantum causal Markov condition).

A quantum process operator $\sigma_{\mathbf{A}}=\sigma_{A_{1},\cdots,A_{n}}$ is Markov for a given DAG $G$ if and only if there exist positive operators $\rho_{A_{i}|\mathrm{Pa}(A_{i})}$ such that $\mathrm{Tr}_{\mathrm{Pa}_{i}^{\mathrm{out}}}[\rho_{A_{i}|\mathrm{Pa}(A_{i})}]=\mathbb{I}_{A_{i}^{\mathrm{in}}}$ (corresponding to quantum channels $\mathcal{E}_{i}:\mathcal{L}(\mathcal{H}_{\mathrm{Pa}_{i}^{\mathrm{out}}})\rightarrow\mathcal{L}(\mathcal{H}_{A_{i}^{\mathrm{in}}})$) for each quantum node $A_{i}$ of $G$ such that

$$\sigma_{\mathbf{A}}=\prod_{i=1}^{n}\rho_{A_{i}|\mathrm{Pa}(A_{i})}\;, \tag{17}$$

and $[\rho_{A_{i}|\mathrm{Pa}(A_{i})},\rho_{A_{j}|\mathrm{Pa}(A_{j})}]=0$ for all $i,j\in\{1,\cdots,n\}$. [Footnote 10: Here and below, we implicitly assume the individual operators $\rho_{A_{i}|\mathrm{Pa}(A_{i})}$ to be ‘padded’ with identities on all nodes not explicitly involved in $\rho_{A_{i}|\mathrm{Pa}(A_{i})}$ such that the multiplication of operators is well-defined.]

Figure 7: Example of a quantum causal model, (a) with four endogenous nodes $\mathbf{A}=\{A_{1},A_{2},A_{3},A_{4}\}$, (b) including exogenous nodes $\bm{\Lambda}=\{\Lambda_{1},\Lambda_{2},\Lambda_{3},\Lambda_{4}\}$ in a quantum structural causal model.

Definition 8 (Quantum causal model).

A quantum causal model is a pair $\langle G,\sigma_{\mathbf{A}}\rangle$, consisting of a DAG $G$, whose vertices represent quantum nodes $\mathbf{A}=\{A_{1},\cdots,A_{n}\}$, and a quantum process operator that is Markov with respect to $G$, according to Def. 7.
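A minimal numerical sketch may help to connect Defs. 6 to 8 with Eq. (16). It assumes a two-node chain $A\rightarrow B$ on qubits with process operator $\sigma_{AB}=\rho_{A^{\mathrm{in}}}\otimes\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}\otimes\mathbb{I}_{B^{\mathrm{out}}}$, which has the product form of Eq. (17). The input state, the Hadamard channel between the nodes, the measurement instruments and all helper names are hypothetical choices for illustration.

```python
import numpy as np

def unit(i, j, d=2):
    """Matrix unit |i><j|."""
    e = np.zeros((d, d), dtype=complex)
    e[i, j] = 1.0
    return e

def tau_from_kraus(K, d=2):
    """Instrument element following Eq. (10), ordered as out (x) in."""
    return sum(np.kron((K @ unit(i, j) @ K.conj().T).T, unit(j, i))
               for i in range(d) for j in range(d))

def rho_from_unitary(U, d=2):
    """Channel operator following Eq. (14), ordered as B_in (x) A_out."""
    return sum(np.kron(U @ unit(i, j) @ U.conj().T, unit(i, j))
               for i in range(d) for j in range(d))

def joint_prob(rho_A, rho_BA, tau_A, tau_B, d=2):
    """Eq. (16) for sigma = rho_A (x) rho_{B|A} (x) I_{B_out}, via index contraction."""
    ch = rho_BA.reshape(d, d, d, d)   # [b_in, a_out, b_in', a_out']
    tA = tau_A.reshape(d, d, d, d)    # [a_out, a_in, a_out', a_in']
    tB = tau_B.reshape(d, d, d, d)    # [b_out, b_in, b_out', b_in']
    # Tr[sigma tau]: rows of sigma contract with columns of tau and vice versa.
    return np.einsum('ab,ecfd,gh,dbca,hfge->', rho_A, ch, np.eye(d), tA, tB).real

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
rho_A = unit(0, 0)                              # state |0><0| arriving at node A
Z = [tau_from_kraus(unit(a, a)) for a in (0, 1)]  # computational-basis instrument

P = np.array([[joint_prob(rho_A, rho_from_unitary(H), Z[a], Z[b])
               for b in (0, 1)] for a in (0, 1)])
print(P)
```

In this example, node $A$ always records outcome 0 and passes $|0\rangle$ on, the Hadamard channel maps it to $|+\rangle$, and node $B$ then sees both outcomes with equal probability.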

4.1 Quantum structural causal models

Recall that in the classical case, counterfactuals are evaluated relative to a classical structural causal model (CSM) $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$ (see Def. 3), which associates an exogenous variable $U_{i}\in\mathbf{U}$ and a function $\mathbf{F}\ni f_{i}:\mathrm{Pa}(A_{i})\times U_{i}\rightarrow A_{i}$, to every node $V_{i}\in\mathbf{V}$. Given a CSM, we thus have full information about the underlying process and any uncertainty arises solely from our lack of knowledge about the values of the variables at exogenous nodes, which is encoded in the probability distribution $P(\mathbf{u})$ of the probabilistic structural causal model (PSM) $\langle M,P(\mathbf{u})\rangle$.

In order to define a notion of quantum structural causal models, we find it useful to introduce the lack of knowledge on exogenous nodes directly in terms of a special type of quantum instruments,

$$\{\tau^{\lambda}_{\Lambda}\}_{\lambda}:=\{P(\lambda)(\rho^{\lambda}_{\Lambda^{\mathrm{out}}})^{T}\otimes\mathbb{I}_{\Lambda^{\mathrm{in}}}\}_{\lambda}\;. \tag{18}$$

[Footnote 11: Here, our formalism diverges from the one in Ref. [29], which assigns the lack of knowledge about exogenous degrees of freedom as part of the process operator $\sigma$, and which does not distinguish between different state preparations. This is a change in perspective in so far as we will place our lack of knowledge as a lack of knowledge about events at the exogenous nodes, rather than a lack of knowledge about the process.]

Quantum instruments of this form discard the input to the node $\Lambda$ and with probability $P(\lambda)$ prepare the state $\rho^{\lambda}$ in the output. In other words, $\{\tau^{\lambda}_{\Lambda}\}_{\lambda}$ is a discard-and-prepare instrument. Ignoring the outcome of this instrument, one obtains the channel $\tau^{\rho}_{\Lambda}=\sum_{\lambda}\tau^{\lambda}_{\Lambda}$, corresponding to the preparation of state $\rho=\sum_{\lambda}P(\lambda)\rho^{\lambda}$ in the output of node $\Lambda$.

Note that the outcome and output of a discard-and-prepare instrument are independent of the input state $\rho_{\Lambda^{\mathrm{in}}}$. In order to avoid carrying around arbitrary input states in formulas below (as required for normalization), we will therefore adopt the convention,

$$\{\widetilde{\tau}^{\lambda}_{\Lambda}\}_{\lambda}:=\left\{P(\lambda)(\rho^{\lambda}_{\Lambda^{\mathrm{out}}})^{T}\otimes\frac{1}{\mathrm{dim}(\mathcal{H}_{\Lambda^{\mathrm{in}}})}\mathbb{I}_{\Lambda^{\mathrm{in}}}\right\}_{\lambda}\;, \tag{19}$$

such that $\mathrm{Tr}_{\Lambda^{\mathrm{in}}}[\widetilde{\tau}^{\lambda}_{\Lambda}]=\mathrm{Tr}_{\Lambda^{\mathrm{in}}}[\tau^{\lambda}_{\Lambda}\rho_{\Lambda^{\mathrm{in}}}]$ for any state $\rho_{\Lambda^{\mathrm{in}}}$.
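The properties of discard-and-prepare instruments stated above can be verified directly. The following sketch builds the operators of Eqs. (18) and (19) for a hypothetical two-outcome qubit instrument and checks (a) the trace condition of Eq. (11) for the summed channel and (b) the identity relating the normalised convention to an arbitrary input state. The probabilities, states and function names are assumptions made for the example.

```python
import numpy as np

def discard_and_prepare(probs, states, d_in):
    """Eq. (18): tau^lambda = P(lambda) (rho^lambda)^T (x) I_in, ordered as out (x) in."""
    return [p * np.kron(rho.T, np.eye(d_in)) for p, rho in zip(probs, states)]

def trace_out_output(tau, d_out, d_in):
    return np.einsum('oiok->ik', tau.reshape(d_out, d_in, d_out, d_in))

def trace_out_input(tau, d_out, d_in):
    return np.einsum('oiqi->oq', tau.reshape(d_out, d_in, d_out, d_in))

# Hypothetical exogenous instrument: prepare |0> with P = 0.25, |+i> with P = 0.75.
ket0 = np.array([1, 0], dtype=complex)
ket_pi = np.array([1, 1j]) / np.sqrt(2)
probs = [0.25, 0.75]
states = [np.outer(k, k.conj()) for k in (ket0, ket_pi)]

taus = discard_and_prepare(probs, states, d_in=2)
taus_tilde = [t / 2 for t in taus]                  # Eq. (19) with dim(H_in) = 2

# (a) Ignoring the outcome gives a channel: Tr_out[sum_lambda tau^lambda] = I_in.
channel_ok = np.allclose(trace_out_output(sum(taus), 2, 2), np.eye(2))

# (b) Tr_in[tau_tilde^lambda] = Tr_in[tau^lambda rho_in] for an arbitrary input state.
rho_in = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
lhs = [trace_out_input(t, 2, 2) for t in taus_tilde]
rhs = [trace_out_input(t @ np.kron(np.eye(2), rho_in), 2, 2) for t in taus]
print(channel_ok, all(np.allclose(l, r) for l, r in zip(lhs, rhs)))
```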

Definition 9 (No-influence condition).

Let $\rho_{CD|AB}^{U}$ be the Choi-Jamiołkowski (CJ) representation of the channel corresponding to the unitary transformation $U:\mathcal{H}_{A}\otimes\mathcal{H}_{B}\rightarrow\mathcal{H}_{C}\otimes\mathcal{H}_{D}$. We say that system $A$ does not influence system $D$ (denoted as $A\nrightarrow D$) if and only if there exists a quantum channel $\mathcal{M}:\mathcal{L}(\mathcal{H}_{B})\rightarrow\mathcal{L}(\mathcal{H}_{D})$ with corresponding CJ representation $\rho_{D|B}^{\mathcal{M}}$ such that $\mathrm{Tr}_{C}[\rho_{CD|AB}^{U}]=\rho_{D|B}^{\mathcal{M}}\otimes\mathbb{I}_{A}$. [Footnote 12: We remark that the labels $A,B,C,D$ refer to arbitrary systems, not necessarily nodes in a quantum causal model. Within a quantum causal model, two of those labels, say $A$ and $C$, may refer to output and input Hilbert spaces of the same node.]
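The no-influence condition can be tested numerically for small unitaries. The sketch below assumes two-qubit unitaries, uses the convention of Eq. (14) for $\rho^{U}_{CD|AB}$, and checks whether the reduced operator factorises as $\rho_{D|B}\otimes\mathbb{I}_{A}$. The function name `no_influence` and the relabelling trick used to cover other source/target pairs are illustrative choices, not part of the original formalism.

```python
import numpy as np

def no_influence(U, source, target, d=2):
    """Check Def. 9 for U: H_A (x) H_B -> H_C (x) H_D.

    source is 'A' or 'B' (an input), target is 'C' or 'D' (an output).
    Returns True if the source does not influence the target.
    """
    Ut = U.reshape(d, d, d, d)            # indices [c, d, a, b]
    if target == 'C':
        Ut = Ut.transpose(1, 0, 2, 3)     # relabel so the target sits in the D slot
    if source == 'B':
        Ut = Ut.transpose(0, 1, 3, 2)     # relabel so the source sits in the A slot
    # rho^U has entries U[x, i] conj(U[x', j]); trace out the non-target output.
    T = np.einsum('cdab,cDAB->dabDAB', Ut, Ut.conj())
    # Candidate rho_{D|B}: trace out the source input and divide by its dimension.
    X = np.einsum('dabDaB->dbDB', T) / d
    return np.allclose(T, np.einsum('dbDB,aA->dabDAB', X, np.eye(d)))

IDENTITY = np.eye(4, dtype=complex)
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

print(no_influence(IDENTITY, 'A', 'D'))  # A passes straight to C
print(no_influence(SWAP, 'A', 'D'))      # A is routed to D
print(no_influence(CNOT, 'B', 'C'))      # target acts back on control (phase kickback)
```

The CNOT case is a reminder that a unitary which looks one-directional in the computational basis can still carry influence in both directions.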

Given these preliminaries, we define a quantum version of the structural causal models in Def. 3.

Definition 10 (Quantum structural causal model).

A quantum structural causal model (QSM) is a triple $M_{Q}=\langle(\mathbf{A},\mathbf{\Lambda},S),\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U},\{\tau^{\lambda_{i}}_{\Lambda_{i}}\}_{\lambda_{i}}\rangle$, specified by:

  • (i) a set of quantum nodes, which are split into

    • a set of endogenous nodes $\mathbf{A}=\{A_{1},\cdots,A_{n}\}$,

    • a set of exogenous nodes $\mathbf{\Lambda}=\{\Lambda_{1},\cdots,\Lambda_{n}\}$,

    • and a sink node $S$;

  • (ii) a unitary $\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U}$ that satisfies the no-influence conditions

    $$\{\Lambda_{j}\nrightarrow A_{i}\}_{j\neq i} \tag{20}$$

    according to Def. 9; and

  • (iii) a set of discard-and-prepare instruments $\{\tau^{\lambda_{i}}_{\Lambda_{i}}\}_{\lambda_{i}}$ for every exogenous node $\Lambda_{i}\in\mathbf{\Lambda}$.

Note that in general we need to include an additional sink node $S$, in order for the process operator $\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U}$ to be unitary. $S$ contains any excess information that is discarded in the process (cf. Ref. [29]).

We emphasize the subtle, but conceptually crucial difference between Def. 4.5 in Ref. [29] and our Def. 10. The former specifies the input states on ancillary nodes directly, as part of a ‘unitary process with inputs’, while the latter encodes input states in terms of discard-and-prepare instruments, acting on an arbitrary input state. This will enable us to use classical Bayesian inference on the outcomes of instruments at exogenous nodes in the abduction step of the evaluation of quantum counterfactuals, as we will see in Sec. 5 below. In contrast, this is not possible using Def. 4.5 in Ref. [29], which may instead require a generalisation of Bayesian inference to the quantum case (see Sec. 9).

Following Ref. [29], we define a notion of structural compatibility of a process operator $\sigma_{\mathbf{A}}$ with a graph $G$.

Definition 11 (Compatibility of a quantum process operator with a DAG).

A quantum process operator $\sigma_{\mathbf{A}}=\sigma_{A_{1}\cdots A_{n}}$ over nodes $\mathbf{A}=\{A_{1},\cdots,A_{n}\}$ is said to be structurally compatible with a DAG $G$ if and only if there exists a quantum structural causal model (QSM) $M_{Q}=\langle(\mathbf{A},\mathbf{\Lambda},S),\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U},\{\tau^{\lambda_{i}}_{\Lambda_{i}}\}_{\lambda_{i}}\rangle$ that recovers $\sigma_{\mathbf{A}}$ as a marginal,

$$\sigma_{\mathbf{A}}=\mathrm{Tr}_{S^{\mathrm{in}}\mathbf{\Lambda}}[\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U}(\widetilde{\tau}^{\rho_{1}}_{\Lambda_{1}}\otimes\cdots\otimes\widetilde{\tau}^{\rho_{n}}_{\Lambda_{n}})]\;, \tag{21}$$

where $\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U}$ satisfies the no-influence relations

$$\{A_{j}\nrightarrow A_{i}\}_{A_{j}\notin\mathrm{Pa}(A_{i})}\;, \tag{22}$$

with $\mathrm{Pa}(A_{i})$ defined by $G$.

Similar to Thm. 4.10 in Ref. [29], one shows that a process operator $\sigma_{\mathbf{A}}$ is structurally compatible with $G$ if and only if it is Markov for $G$.

Theorem 2 (Equivalence of quantum compatibility and Markovianity).

For a DAG $G$ with nodes $\mathbf{A}=\{A_{1},\cdots,A_{n}\}$ and a quantum process operator $\sigma_{\mathbf{A}}$, the following are equivalent:

  1. $\sigma_{\mathbf{A}}$ is structurally compatible with $G$.

  2. $\sigma_{\mathbf{A}}$ is Markov for $G$.

Proof.

The difference between our definition of ‘structural compatibility’ in Def. 11 and that of ‘compatibility’ in Def. 4.8 in Ref. [29] is that the latter applies to a “unitary process with inputs” (see Def. 4.5 in Ref. [29]), while Def. 11 applies to a QSM as defined in Def. 10. Yet, we show that $\sigma_{\mathbf{A}}$ is compatible with $G$ if and only if it is structurally compatible with $G$. The result then follows from the proof of Thm. 4.10 in Ref. [29].

First, let $\sigma_{\mathbf{A}}$ be compatible with $G$. Then by Def. 4.8 in Ref. [29] there exists a unitary process $\rho^{U}_{\mathbf{A}S|\mathbf{A}\bm{\Lambda}}$ that satisfies the no-influence conditions $\{A_{j}\nrightarrow A_{i}\}_{A_{j}\notin\mathrm{Pa}(A_{i})}$ and $\{\Lambda_{j}\nrightarrow A_{i}\}_{j\neq i}$, and states $\rho_{\Lambda_{1}}\otimes\dots\otimes\rho_{\Lambda_{n}}$ such that $\sigma_{\mathbf{A}}$ is recovered as a marginal,

$$\sigma_{\mathbf{A}}=\mathrm{Tr}_{S^{\mathrm{in}}\mathbf{\Lambda}^{\mathrm{out}}}\left[\rho^{U}_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}(\rho^{T}_{\Lambda_{1}}\otimes\cdots\otimes\rho^{T}_{\Lambda_{n}})\right]\;, \tag{23}$$

where we traced over the inputs of exogenous nodes $\Lambda_{i}$. Choosing discard-and-prepare instruments $\{\tau^{\lambda_{i}}_{\Lambda_{i}}\}_{\lambda_{i}}$ such that $\tau^{\rho_{\Lambda_{i}}}_{\Lambda_{i}}:=\sum_{\lambda_{i}}\tau^{\lambda_{i}}_{\Lambda_{i}}$ (cf. Eq. (19)), $\langle(\mathbf{A},\bm{\Lambda},S),\rho^{U}_{\mathbf{A}S|\mathbf{A}\bm{\Lambda}},\{\tau^{\lambda_{i}}_{\Lambda_{i}}\}_{\lambda_{i}}\rangle$ defines a QSM (cf. Def. 10): in particular, $\rho^{U}_{\mathbf{A}S|\mathbf{A}\bm{\Lambda}}$ satisfies Eq. (20). Moreover, $\rho^{U}_{\mathbf{A}S|\mathbf{A}\bm{\Lambda}}$ also satisfies Eq. (22), and Eq. (23) implies Eq. (21). From this it follows that $\sigma_{\mathbf{A}}$ is structurally compatible with $G$.

Conversely, if $\sigma_{\mathbf{A}}$ is structurally compatible with $G$, it admits a QSM $\langle(\mathbf{A},\bm{\Lambda},S),\rho^{U}_{\mathbf{A}S|\mathbf{A}\bm{\Lambda}},\{\tau^{\lambda_{i}}_{\Lambda_{i}}\}_{\lambda_{i}}\rangle$, from which we extract the unitary process operator $\rho^{U}_{\mathbf{A}S|\mathbf{A}\bm{\Lambda}}$ satisfying the no-influence conditions in Eq. (20) and Eq. (22), and which recovers $\sigma_{\mathbf{A}}$ as a marginal in Eq. (23) for inputs $\rho_{\Lambda_{i}}=\mathrm{Tr}_{\Lambda^{\mathrm{in}}}[\widetilde{\tau}^{\rho_{\Lambda_{i}}}_{\Lambda_{i}}]=\sum_{\lambda_{i}}\mathrm{Tr}_{\Lambda^{\mathrm{in}}}[\widetilde{\tau}^{\lambda_{i}}_{\Lambda_{i}}]$, as a consequence of Eq. (21). It then follows that $\sigma_{\mathbf{A}}$ is compatible with $G$. ∎

Theorem 2 establishes that for every process operator that is Markov for a graph $G$, there exists a QSM over $G$ that reproduces that process. Note however that this does not necessarily give us information about which QSM correctly describes a given physical process. This requires that the outcomes of instruments at the exogenous nodes correspond to “stable events” (cf. Ref. [43]), e.g. due to decoherence. That is, for a QSM to be taken to correctly describe a physical process, the events represented at the exogenous nodes must be effectively classical events, in line with their treatment as fixed background events. The evaluation of counterfactuals will be relative to a QSM, and different QSMs compatible with the same process $\sigma_{\mathbf{A}}$ will in general give different answers to the same counterfactual query. This situation is analogous to the classical case. Determining which (classical or quantum) structural causal model correctly describes a given physical realisation of a process is an important question, but beyond the scope of this work.

Finally, we need the following notion (cf. Eq. (19) in Ref. [28]). Given a particular set of outcomes $\bm{\lambda}=(\lambda_{1},\cdots,\lambda_{n})$ at the exogenous instruments, we define a conditional process operator as follows,

$$\sigma_{\mathbf{A}}^{\bm{\lambda}}=\frac{\mathrm{Tr}_{S^{\mathrm{in}}\mathbf{\Lambda}}\left[\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U}(\widetilde{\tau}^{\lambda_{1}}_{\Lambda_{1}}\otimes\cdots\otimes\widetilde{\tau}^{\lambda_{n}}_{\Lambda_{n}})\right]}{P(\lambda_{1},\cdots,\lambda_{n})}\;. \tag{24}$$

This allows us to calculate the conditional probability ‘$P_{\mathbf{z}}(\mathbf{a}|\bm{\lambda})$’ to obtain a set of outcomes $\mathbf{a}=(a_{1},\cdots,a_{n})$ for a set of instruments $\mathbf{z}=(z_{1},\cdots,z_{n})$ at endogenous nodes, given a set of outcomes $\bm{\lambda}$ for the exogenous instruments:

$$P_{\mathbf{z}}(\mathbf{a}|\bm{\lambda})=\mathrm{Tr}_{\mathbf{A}}[\sigma_{\mathbf{A}}^{\bm{\lambda}}\tau^{\mathbf{a}|\mathbf{z}}_{\mathbf{A}}]\quad\mathrm{with}\quad\tau^{\mathbf{a}|\mathbf{z}}_{\mathbf{A}}=\tau_{A_{1}}^{a_{1}|z_{1}}\otimes\cdots\otimes\tau_{A_{n}}^{a_{n}|z_{n}}\;. \tag{25}$$

Assuming that a QSM correctly describes a given physical scenario, and in particular that the events associated with $\bm{\lambda}$ can be thought of as well-decohered, stable events, we can think of Eq. (24) as representing the actual process realised in a given run of the experiment, where our (prior) ignorance about which process is actually realised is encoded in the subjective probabilities $P(\lambda_{1},\cdots,\lambda_{n})$.

5 Counterfactuals in Quantum Causal Models

Classically, a counterfactual query has the form “Given evidence $\mathbf{e}$, would $\mathbf{Y}$ have been $\mathbf{y}$ had $\mathbf{Z}$ been $\mathbf{z}$?”. In Pearl’s formalism, the corresponding counterfactual statement can be assigned a truth value given a full specification $\mathbf{U}=\mathbf{u}$ of the background conditions in a structural causal model. In that formalism, probabilities only arise out of our lack of knowledge about exogenous variables, and one can define the probability for the counterfactual to be true as the probability that $\mathbf{u}$ lies in the range of values where the counterfactual is evaluated as true. In contrast, in quantum causal models, a counterfactual statement will in general not have a truth value! This is the case even if we are given maximal information about the process (represented as a unitary process) and maximal information about the events at the exogenous nodes (represented as a full specification of the exogenous variables ‘$\bm{\Lambda}=\bm{\lambda}$’ in a quantum structural causal model). [Footnote 13: Here we are assuming that maximal information about an event corresponding to the preparation of a quantum state is given by a (pure) quantum state. This of course assumes that quantum mechanics is “complete” in the sense that there are no hidden variables that would further specify the outcomes of instruments. While this is admittedly an important assumption, it is the natural assumption to make in the context of quantum causal models, which aim to maintain compatibility with relativistic causality [44].]

In order to avoid the implicit assumption of ‘counterfactual definiteness’ inherent to the notion of a probability of a counterfactual as in the classical case (see Def. 4), we seek a notion of counterfactual probability in the quantum case.

Definition 12 (Counterfactual probability).

Let $M_{Q}=\langle(\mathbf{A},\mathbf{\Lambda},S),\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U},\{\tau^{\lambda_{i}}_{\Lambda_{i}}\}_{\lambda_{i}}\rangle$ be a quantum structural causal model. Then the counterfactual probability that outcomes $\mathbf{c}^{\prime}$ would have obtained for a subset of nodes $\mathbf{C}$, had instruments $\mathbf{z}^{\prime}=(z_{1}^{\prime},\cdots,z_{n}^{\prime})$ been implemented and outcomes $\mathbf{b}^{\prime}$ obtained at a set of nodes $\mathbf{B}$ (disjoint from $\mathbf{C}$), in the situation specified by the background variables $\bm{\Lambda}=\bm{\lambda}$, is denoted by $P^{\bm{\lambda}}_{\mathbf{z}^{\prime}}(\mathbf{c}^{\prime}|\mathbf{b}^{\prime})$ and given by

$$P^{\bm{\lambda}}_{\mathbf{z}^{\prime}}(\mathbf{c}^{\prime}|\mathbf{b}^{\prime})=\frac{P^{\bm{\lambda}}_{\mathbf{z}^{\prime}}(\mathbf{c}^{\prime},\mathbf{b}^{\prime})}{P^{\bm{\lambda}}_{\mathbf{z}^{\prime}}(\mathbf{b}^{\prime})}=\frac{\mathrm{Tr}_{\mathbf{A}}\left[\sigma^{\bm{\lambda}}_{\mathbf{A}}(\tau^{\mathbf{b}^{\prime}|\mathbf{z}^{\prime}_{\mathbf{B}}}_{\mathbf{B}}\otimes\tau^{\mathbf{c}^{\prime}|\mathbf{z}^{\prime}_{\mathbf{C}}}_{\mathbf{C}}\otimes\tau^{|\mathbf{z}^{\prime}}_{\mathbf{A}\setminus(\mathbf{B}\cup\mathbf{C})})\right]}{\mathrm{Tr}_{\mathbf{A}}\left[\sigma^{\bm{\lambda}}_{\mathbf{A}}(\tau^{\mathbf{b}^{\prime}|\mathbf{z}^{\prime}_{\mathbf{B}}}_{\mathbf{B}}\otimes\tau^{|\mathbf{z}^{\prime}}_{\mathbf{A}\setminus\mathbf{B}})\right]}\;, \tag{26}$$

where $\tau^{\mathbf{b}^{\prime}|\mathbf{z}^{\prime}_{\mathbf{B}}}_{\mathbf{B}}=\bigotimes_{B_{j}\in\mathbf{B}}\,\tau^{b^{\prime}_{j}|z^{\prime}_{j}}_{B_{j}}$, $\tau^{\mathbf{c}^{\prime}|\mathbf{z}^{\prime}_{\mathbf{C}}}_{\mathbf{C}}=\bigotimes_{C_{k}\in\mathbf{C}}\,\tau^{c^{\prime}_{k}|z^{\prime}_{k}}_{C_{k}}$, $\tau^{|\mathbf{z}^{\prime}}_{\mathbf{A}\setminus(\mathbf{B}\cup\mathbf{C})}=\bigotimes_{A_{i}\notin\mathbf{B}\cup\mathbf{C}}\tau^{|z^{\prime}_{i}}_{A_{i}}$ and $\tau^{|\mathbf{z}^{\prime}}_{\mathbf{A}\setminus\mathbf{B}}=\bigotimes_{A_{i}\notin\mathbf{B}}\tau^{|z^{\prime}_{i}}_{A_{i}}$. For $P_{\mathbf{z}^{\prime}}^{\bm{\lambda}}(\mathbf{b}^{\prime})=0$, we set $P_{\mathbf{z}^{\prime}}^{\bm{\lambda}}(\mathbf{c}^{\prime}|\mathbf{b}^{\prime})=*$ for counterfactuals with impossible antecedent (‘counterpossibles’).

More generally, we want to calculate the expected value of the counterfactual probability given some evidence, for which we define a standard quantum counterfactual query as follows.

Definition 13 (Standard quantum counterfactual query).

Let $M_{Q}=\langle(\mathbf{A},\mathbf{\Lambda},S),\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U},\{\tau^{\lambda_{i}}_{\Lambda_{i}}\}_{\lambda_{i}}\rangle$ be a quantum structural causal model. Then a standard quantum counterfactual query, denoted by $P_{\mathbf{b}^{\prime}|\mathbf{z}^{\prime}}^{\mathbf{a}|\mathbf{z}}(\mathbf{c}^{\prime})$, is the expected probability that outcomes $\mathbf{c}^{\prime}$ would have obtained for a subset of nodes $\mathbf{C}$, had instruments $\mathbf{z}^{\prime}=(z_{1}^{\prime},\cdots,z_{n}^{\prime})$ been implemented and outcomes $\mathbf{b}^{\prime}$ obtained at a set of nodes $\mathbf{B}$ (disjoint from $\mathbf{C}$), given the evidence that a set of instruments $\mathbf{z}=(z_{1},\cdots,z_{n})$ has been implemented and outcomes $\mathbf{a}=(a_{1},\cdots,a_{n})$ obtained.

Note that to obtain an unambiguous answer, one needs to specify all the instruments in all the nodes, both actual and counterfactual. Def. 13 may not look general enough to accommodate all types of counterfactuals one can envisage, but we will discuss later how the answer to seemingly different types of counterfactual queries can be obtained from the answer to a standard query after suitable interpretation. At times there will be ambiguity in how to interpret some counterfactual queries, and the task of interpretation will be to reduce any counterfactual query to the appropriate standard query—we will return to this later. We now proceed to show how we can answer a quantum counterfactual query.

5.1 Evaluation of counterfactuals

The evaluation of a standard counterfactual query within a quantum structural causal model proceeds through a three-step process of abduction, action and prediction, in analogy with the classical case.

Abduction. We infer what the past must have been, given information we have at present; that is, we want to update our information about the instrument outcomes $\lambda_{i}$ at the exogenous nodes $\Lambda_{i}$, given that outcomes $a_{i}$ have been observed upon performing instruments $z_{i}$ at nodes $A_{i}$. [Footnote 14: In the language of Ref. [43], we treat the outcomes $\lambda_{i}$ at exogenous nodes $\Lambda_{i}$ as “stable facts”. In taking this stance we set aside the question of when an instrument outcome can be said to be a stable fact, i.e. we set aside the measurement problem, which applies to the quantum causal model framework in the same way as to standard quantum theory [45, 44].] Since we are talking about jointly measured variables, we can perform Bayesian update to calculate the conditional probability

$$P_{\mathbf{z}}(\bm{\lambda}|\mathbf{a})=\frac{P_{\mathbf{z}}(\mathbf{a}|\bm{\lambda})P(\bm{\lambda})}{P_{\mathbf{z}}(\mathbf{a})}=\frac{\mathrm{Tr}_{\mathbf{A}}\left[\sigma_{\mathbf{A}}^{\bm{\lambda}}\tau^{\mathbf{a}|\mathbf{z}}_{\mathbf{A}}\right]P(\lambda_{1},\cdots,\lambda_{n})}{\mathrm{Tr}_{\mathbf{A}}\left[\sigma_{\mathbf{A}}\tau^{\mathbf{a}|\mathbf{z}}_{\mathbf{A}}\right]}\;. \tag{27}$$

[Footnote 15: Here, we assume that $P_{\mathbf{z}}(\mathbf{a})>0$ since $\mathbf{a}|\mathbf{z}$ is an actually observed event.]

Action. Next, we modify the instruments at endogenous nodes to $\{\tau^{a^{\prime}_{i}|z^{\prime}_{i}}_{A_{i}}\}_{a^{\prime}_{i}}$, as required by the antecedent of the counterfactual query. We highlight an important distinction from the classical case: unlike in Pearl’s formalism, we do not need to modify the process itself, since an ‘arrow-breaking’ intervention at a node $A$ can always be emulated via some appropriate discard-and-prepare instrument, for example, by the instrument

$$\tau^{\mathrm{do}(\rho)}_{A}:=(\rho_{A^{\mathrm{out}}})^{T}\otimes\mathbb{I}_{A^{\mathrm{in}}}\;. \tag{28}$$

Deciding what instruments are appropriate for a given counterfactual query not in standard form is part of the interpretational task we will return to in Sec. 7 below. For a standard quantum counterfactual query, this is unambiguous since the counterfactual instruments are defined as part of the query (see Def. 13).

Prediction. Finally, we calculate the expected value of the counterfactual probability

$$P_{\mathbf{b}'|\mathbf{z}'}^{\mathbf{a}|\mathbf{z}}(\mathbf{c}')=\sum_{\bm{\lambda}\in\bm{\Lambda}}P_{\mathbf{z}}(\bm{\lambda}|\mathbf{a})P_{\mathbf{z}'}^{\bm{\lambda}}(\mathbf{c}'|\mathbf{b}')\;. \tag{29}$$

Whenever the counterfactual has an impossible antecedent for some values of the background variables with nonzero probability, that is, whenever $P_{\mathbf{z}'}^{\bm{\lambda}}(\mathbf{c}'|\mathbf{b}')=*$ for some $\bm{\lambda}\in\bm{\Lambda}$ with $P_{\mathbf{z}}(\bm{\lambda}|\mathbf{a})\neq 0$, we set $P_{\mathbf{b}'|\mathbf{z}'}^{\mathbf{a}|\mathbf{z}}(\mathbf{c}')=*$.
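The three steps can be summarised in a short numerical sketch. The code below is only an illustration under stated assumptions: the names `abduction`, `prediction` and `UNDEFINED` are hypothetical, the exogenous outcomes are labelled by dictionary keys, and the probabilities $P_{\mathbf{z}}(\mathbf{a}|\bm{\lambda})$ and $P_{\mathbf{z}'}^{\bm{\lambda}}(\mathbf{c}'|\mathbf{b}')$ are assumed to have been computed beforehand from the process operator and the chosen instruments. The action step does not appear as a separate function, since it only amounts to choosing the counterfactual instruments that enter those probabilities.

```python
from fractions import Fraction

UNDEFINED = "*"  # marker for a counterpossible, mirroring the symbol * in the text


def abduction(prior, likelihood):
    """Bayesian update of Eq. (27): P_z(lambda | a) from P(lambda) and P_z(a | lambda)."""
    evidence = sum(prior[lam] * likelihood[lam] for lam in prior)
    if evidence == 0:
        raise ValueError("the observed evidence must have nonzero probability")
    return {lam: prior[lam] * likelihood[lam] / evidence for lam in prior}


def prediction(posterior, counterfactual):
    """Expected counterfactual probability of Eq. (29).

    counterfactual[lam] is P_{z'}^{lam}(c' | b'), or UNDEFINED when the
    antecedent b' is impossible for that value of lam.
    """
    total = 0
    for lam, weight in posterior.items():
        if weight == 0:
            continue  # values ruled out by the evidence do not contribute
        if counterfactual[lam] is UNDEFINED:
            return UNDEFINED  # impossible antecedent with nonzero weight
        total += weight * counterfactual[lam]
    return total


# Hypothetical numbers for a binary exogenous variable.
prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}
likelihood = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # P_z(a | lambda)
posterior = abduction(prior, likelihood)
answer = prediction(posterior, {0: Fraction(1, 3), 1: Fraction(2, 3)})
```

Exact rational arithmetic is used here so that the zero tests in `prediction` are unambiguous; with floating-point inputs a tolerance would be needed instead.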

If a counterfactual query can be interpreted as a standard quantum counterfactual query, then it will have an unambiguous answer as above. In Sec. 7, we will discuss the task of interpreting a general quantum counterfactual query that is not already in standard form. Before doing so, we proceed by proving that the present formalism extends Pearl’s classical formalism.

6 From classical to quantum structural causal models

Having defined a notion of quantum structural causal models (QSM) in Def. 10, it is an important question to ask in what sense this definition extends that of a probabilistic structural causal model (PSM) in Def. 5 and, in particular, that of a classical structural causal model (CSM) in Def. 3. In this section, we show that QSMs indeed provide a generalization of PSMs, by extending an arbitrary PSM $\langle M,P(\mathbf{u})\rangle$ to a QSM $M_{Q}$. In order to do so, we need to take care of two crucial physical differences between Def. 3 and Def. 10.

First, note that the structural relations $\mathbf{F}$ in a CSM $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$ are generally not reversible, while unitary evolution in QSMs postulates an underlying reversible process. We therefore need to lift a generic CSM to a reversible CSM, whose structural relations are given in terms of bijective functions, yet whose independence conditions coincide with those of the original CSM. Second, while classical information (in a CSM) can be copied, quantum information famously cannot. We therefore need to find a mechanism to encode classical copy operations into a QSM. This will require us to introduce auxiliary systems, which also need to preserve the no-influence conditions required between exogenous variables in Def. 10, (ii).

The next theorem asserts that an extension of a CSM to a QSM satisfying these constraints always exists.

Theorem 3.

Every PSM $\langle M,P(\mathbf{u})\rangle$, consisting of a CSM $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$ and a probability distribution $P(\mathbf{u})$ over exogenous variables, can be extended to a QSM $M_{Q}=\langle(\mathbf{V}'',\bm{\Lambda}'',S''),\rho_{\mathbf{V}''S''|\mathbf{V}''\bm{\Lambda}''}^{W},\{\tau^{u_{i}}_{\Lambda''_{i}}\}_{u_{i}}\rangle$ such that

\begin{align}
P(\mathbf{v}) &=\sum_{u_{i}}\prod_{i=1}^{n}\delta_{v_{i},f_{i}(pa_{i},u_{i})}P(u_{i}) \tag{30}\\
&=\sum_{\mathbf{u}}\mathrm{Tr}_{S''^{\mathrm{in}}\mathbf{V}''}\left[\rho_{\mathbf{V}''S''\mid\mathbf{V}''\bm{\Lambda}''}^{W}(\tau^{v_{1}}_{V''_{1}}\otimes\cdots\otimes\tau^{v_{n}}_{V''_{n}})\otimes(\widetilde{\tau}^{u_{1}}_{\Lambda''_{1}}\otimes\cdots\otimes\widetilde{\tau}^{u_{n}}_{\Lambda''_{n}})\right]\;. \tag{31}
\end{align}

In particular, $M_{Q}$ preserves the independence conditions between variables $\mathbf{V}$ in $M$ (as defined by $\mathbf{F}$),

$$\rho_{\mathbf{V}''S''\mid\mathbf{V}''\bm{\Lambda}''}^{W}=\prod_{i=1}^{n}\rho_{V''_{i}S''_{i}\mid Pa''_{i}\Lambda''_{i}}^{W_{i}}\;. \tag{32}$$

Proof.

(Sketch) The proof consists of several parts:

  • (i) we find a binary extension of the CSM $M$,

  • (ii) we extend the binary CSM to a binary, reversible CSM, where all functional relations are bijective,

  • (iii) we encode classical copy operations in a QSM using CNOT-gates,

  • (iv) by promoting classical variables to quantum nodes, and by linearly extending bijective functions between classical variables to isometries, we construct a QSM $M_{Q}$, which extends the PSM $\langle M,P(\mathbf{u})\rangle$ as desired.

For details of the proof, see App. A. ∎
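To give some intuition for steps (ii) and (iii), the following sketch shows one standard way of making a Boolean function reversible, by adding an ancilla bit, and how a CNOT gate copies a classical bit onto an ancilla initialised to 0. This is a generic textbook construction with hypothetical names (`reversible_lift`, `is_bijection`, `cnot`, `and_gate`); it is not claimed to coincide with the specific construction in App. A.

```python
from itertools import product


def reversible_lift(f):
    """Return the map (x, t) -> (x, t XOR f(x)) for a Boolean function f."""
    def lifted(x, t):
        return x, t ^ f(x)
    return lifted


def is_bijection(g, n_inputs):
    """Check that g permutes the set {0,1}^n_inputs x {0,1}."""
    domain = [(x, t) for x in product((0, 1), repeat=n_inputs) for t in (0, 1)]
    return len({g(x, t) for x, t in domain}) == len(domain)


def cnot(control, target):
    """Classical action of a CNOT gate on bit values."""
    return control, target ^ control


def and_gate(x):
    """A non-invertible structural relation: two parents, one child."""
    return x[0] & x[1]


lifted_and = reversible_lift(and_gate)
```

Here `lifted_and` is its own inverse, and setting the ancilla to `t = 0` recovers the value of the original function in the second output. Since a bijection on bit strings acts as a permutation of computational basis states, its linear extension is a unitary, which is the idea behind step (iv).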

We will see in Sec. 7 that a QSM admits different types of counterfactual queries, some of which are genuinely quantum, that is, they do not arise in a CSM. Nevertheless, Thm. 3 implies that counterfactual queries arising in a PSM $\langle M,P(\mathbf{u})\rangle$ coincide with the corresponding queries in its quantum extension $M_{Q}$.

Corollary 1.

The evaluation of a counterfactual in a PSM $\langle M,P(\mathbf{u})\rangle$ coincides with the evaluation of the corresponding do-interventional counterfactual (see also Sec. 7) in its quantum extension $M_{Q}$.

Proof.

Given a distribution over exogenous nodes, Thm. 3 assures that do-interventions in Eq. (2) yield the same prediction, whether evaluated via Eq. (6) in $M$ or as a do-interventional counterfactual via Eq. (29) in $M_{Q}$. This leaves us with the update step in Pearl’s analysis of counterfactuals (cf. Thm. 1). More precisely, we need to show that the Bayesian update in Eq. (27) does not affect the distribution over the space of additional ancillae $\mathbf{T}'$ and $\bm{\Lambda}'$ in the proof of Thm. 3. This is a simple consequence of the way distributions $P(\mathbf{u})$ over exogenous nodes in $M$ are encoded in $M_{Q}$.

First, the distribution over copy ancillae $\Lambda'_{i}$ is given by a $\delta$-distribution peaked on the state $|0\rangle\langle 0|_{\Lambda_{i}}$ (see Eq. (90) in App. A). In other words, we have full knowledge of the initialization of the copy ancillae, hence, the update step in Eq. (27) is trivial in this case.

Second, let $P(\mathbf{u}')=P(\mathbf{u},\mathbf{t}')$ be any distribution over exogenous nodes in the binary, reversible extension $M'$ of $M$ (see (i) and (ii) in App. A) such that $P(\mathbf{u})=\sum_{\mathbf{t}'\in\mathbf{T}'}P(\mathbf{u},\mathbf{t}')$, that is, $P(\mathbf{u})$ arises from $P(\mathbf{u}')$ by marginalisation under the discarding operation $\pi$ (see (ii) in App. A).^16 But since the variables $T'_{i}$ in $U'_{i}=T'_{i}\times U_{i}$ are related only to the sink node $S'_{i}$ via $f'_{i}$ (see Eq. (79) in App. A), we have $P_{\mathbf{z}}(\mathbf{a}|\mathbf{u}')=P_{\mathbf{z}}(\mathbf{a}|\mathbf{u},\mathbf{t}')=P_{\mathbf{z}}(\mathbf{a}|\mathbf{u})$. The marginalised updated distribution thus reads

$$\sum_{\mathbf{t}'\in\mathbf{T}'}P_{\mathbf{z}}(\mathbf{u},\mathbf{t}'|\mathbf{a})=\sum_{\mathbf{t}'\in\mathbf{T}'}\frac{P_{\mathbf{z}}(\mathbf{a}|\mathbf{u},\mathbf{t}')P(\mathbf{u},\mathbf{t}')}{P_{\mathbf{z}}(\mathbf{a})}=\sum_{\mathbf{t}'\in\mathbf{T}'}\frac{P_{\mathbf{z}}(\mathbf{a}|\mathbf{u})P(\mathbf{u},\mathbf{t}')}{P_{\mathbf{z}}(\mathbf{a})}=\frac{P_{\mathbf{z}}(\mathbf{a}|\mathbf{u})P(\mathbf{u})}{P_{\mathbf{z}}(\mathbf{a})}=P_{\mathbf{z}}(\mathbf{u}|\mathbf{a})\;. \tag{33}$$

^16 A canonical choice for $P(\mathbf{u}')$ is the product distribution of $P(\mathbf{u})$ and the uniform distribution over $\mathbf{T}'$, $P(\mathbf{u}')=\frac{1}{|\mathbf{T}'|}P(\mathbf{u})$.

In other words, Bayesian inference in Eq. (27) commutes with marginalisation. ∎
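The statement that the update commutes with marginalisation can be checked on a toy example. The sketch below uses made-up numbers and hypothetical names (`bayes`, `p_u`, `p_t_given_u`, `lik_u`); the only structural assumption, taken from the proof above, is that the likelihood of the evidence does not depend on the ancilla value $\mathbf{t}'$.

```python
from fractions import Fraction


def bayes(prior, likelihood):
    """Posterior over the keys of `prior`, given the likelihood of the evidence."""
    evidence = sum(prior[k] * likelihood[k] for k in prior)
    return {k: prior[k] * likelihood[k] / evidence for k in prior}


# Hypothetical toy model: binary u and a three-valued ancilla t'.
p_u = {0: Fraction(1, 4), 1: Fraction(3, 4)}
p_t_given_u = {
    0: {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)},
    1: {0: Fraction(1, 6), 1: Fraction(1, 3), 2: Fraction(1, 2)},
}
joint = {(u, t): p_u[u] * p_t_given_u[u][t] for u in p_u for t in p_t_given_u[u]}

# Key assumption, as in the proof: P_z(a | u, t') does not depend on t'.
lik_u = {0: Fraction(1, 3), 1: Fraction(2, 3)}
lik_joint = {(u, t): lik_u[u] for (u, t) in joint}

# Route 1: update the joint distribution, then sum out the ancilla.
posterior_joint = bayes(joint, lik_joint)
update_then_marginalise = {
    u: sum(p for (u2, _t), p in posterior_joint.items() if u2 == u) for u in p_u
}

# Route 2: sum out the ancilla first, then update.
marginalise_then_update = bayes(p_u, lik_u)
```

The joint distribution is deliberately not a product distribution, to illustrate that Eq. (33) only relies on the likelihood being independent of $\mathbf{t}'$.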

Thm. 3 and Cor. 1 show that our definition of QSMs in Def. 10 generalizes that of CSMs in Def. 3. What is more, this generalization is proper: a QSM cannot generally be thought of as a CSM, while also keeping the relevant independence conditions between the variables of the model. Indeed, casting a QSM to a CSM is to specify a local hidden variable model for the QSM, yet a general QSM will not admit a local hidden variable model.

In short, the counterfactual probabilities defined by a QSM cannot, in general, be interpreted as probabilities that a counterfactual is true. In Sec. 7, we will further analyse the distinctions between classical and quantum counterfactuals, and see some instances of counterfactual queries in the quantum case that do not have an analog in the classical case.

7 Interpretation of quantum counterfactual queries

In this section, we emphasize some crucial differences between the semantics of counterfactuals in classical and quantum causal models. Recall that in order to compute the probability of a counterfactual in a classical structural causal model (CSM), a do-intervention has to be considered in at least one of the nodes. Indeed, there is no way for the antecedent of the counterfactual query to be true without some modification in the model, since a complete specification of the values of exogenous variables determines the values of endogenous variables, and thus determines the antecedent to have its actual value. CSMs are inherently deterministic.

In contrast, in a quantum structural causal model (QSM) the probability that a different outcome would have been obtained can be nonzero even without a do-intervention, since even maximal knowledge of the events at the exogenous nodes does not, in general, determine the outcomes of endogenous instruments. QSMs are inherently probabilistic.

As a consequence, we will distinguish between two kinds of counterfactuals in the quantum case, namely, passive and active counterfactuals, which we define and discuss examples of in Sec. 7.1. In Sec. 7.2, we provide an argument for the disambiguation between passive and active counterfactuals, when faced with an ambiguous (classical) counterfactual query. Moreover, as a consequence of the richer semantics of quantum counterfactuals, in Sec. 7.3 we show how (passive) quantum counterfactuals break the link between causal and counterfactual dependence that exists in the classical setting. We discuss this explicitly in the case of the Bell scenario.

7.1 Passive and active counterfactuals

In Sec. 5.1 we outlined a three-step procedure to evaluate counterfactual probabilities in quantum causal models. Note that, unlike in its classical counterpart (Thm. 1), an arrow-breaking do-intervention is not necessary in order to make the antecedent of the counterfactual true. Counterfactual queries can therefore be evaluated without a do-intervention on the underlying causal graph, and, in particular, without changing the instruments performed at quantum nodes at all. Indeed, according to Def. 13, the expected counterfactual probability has a well-defined numerical value whenever the antecedent has a nonzero probability $P_{\mathbf{z}'}^{\bm{\lambda}}(\mathbf{b}')$ of occurring for all values of the background variables $\bm{\lambda}$ that are compatible with the evidence $\mathbf{a}|\mathbf{z}$, that is,

$$\forall\bm{\lambda}\in\bm{\Lambda}\;\left(P_{\mathbf{z}}(\bm{\lambda}|\mathbf{a})>0\implies P_{\mathbf{z}'}^{\bm{\lambda}}(\mathbf{b}')>0\right)\,. \tag{34}$$

It is possible for Eq. (34) to be satisfied even while keeping the endogenous instruments fixed, i.e. even if $\mathbf{z}'=\mathbf{z}$. Crucially, unlike in the classical case, we will see that this may be the case even if the antecedent $\mathbf{b}'|\mathbf{z}'$ is incompatible with the observed values $\mathbf{a}|\mathbf{z}$. This motivates the following distinction for quantum counterfactuals.

Definition 14.

Let $M_{Q}=\langle(\mathbf{A},\mathbf{\Lambda},S),\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U},\{\tau^{\lambda_{i}}_{\Lambda_{i}}\}_{\lambda_{i}}\rangle$ be a quantum structural causal model. A counterfactual query (following Def. 13) is called a passive counterfactual if $\mathbf{z}'_{\mathbf{B}}\equiv\mathbf{z}'|_{\mathbf{B}}=\mathbf{z}|_{\mathbf{B}}\equiv\mathbf{z}_{\mathbf{B}}$, that is, if no intervention is performed on the nodes specified by the antecedent; otherwise it is called an active counterfactual.

The special case of an active counterfactual where $\mathbf{z}'$ specifies a do-intervention, $\tau_{A}^{\mathrm{do}(\bm{\rho})}=\{(\bm{\rho}_{\mathbf{A}^{\mathrm{out}}})^{T}\otimes\mathbb{I}_{\mathbf{A}^{\mathrm{in}}}\}$ (see Eq. (28)), will also be called a do-interventional counterfactual.
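As a bookkeeping aid, the distinction in Def. 14 can be phrased as a comparison of instrument labels on the antecedent nodes. The sketch below is one possible reading, with hypothetical names: instruments are identified by arbitrary labels, and `do_instruments` is an assumed set of `(node, label)` pairs marking which labels denote discard-and-prepare instruments of the form in Eq. (28).

```python
def classify_query(z, z_prime, antecedent_nodes, do_instruments=frozenset()):
    """Classify a counterfactual query following Def. 14.

    z, z_prime: dicts mapping node names to actual / counterfactual instrument labels.
    antecedent_nodes: the nodes B on which the antecedent is specified.
    """
    changed = [node for node in antecedent_nodes if z_prime[node] != z[node]]
    if not changed:
        return "passive"  # z'|_B = z|_B: no intervention on the antecedent nodes
    if all((node, z_prime[node]) in do_instruments for node in changed):
        return "do-interventional"  # special case of an active counterfactual
    return "active"


z_actual = {"A": 1, "B": 1}
do_labels = {("A", 2)}  # assumption: label 2 at node A denotes a do-intervention
```

Note that only the antecedent nodes matter for the classification: changing the instrument at a node that appears only in the consequent leaves a query passive.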

In the following, we discuss two examples of passive, active and do-interventional counterfactuals.

Figure 8: A graphical representation of a quantum causal model with endogenous nodes $\bm{A}=\{A,B\}$, and exogenous node $\Lambda$. Here we also represent the choice of instruments ($z_{A}$, $z_{B}$) as inputs to a node, and their corresponding outcomes ($a$, $b$, $\lambda$) as outputs.

Example 1.

Consider the causal graph in Fig. 8 and a compatible QSM $M_{Q}=\langle(\mathbf{A},\Lambda),\rho^{U}_{AB|A\Lambda},\{\tau^{\lambda}\}_{\lambda}\rangle$, where $\mathbf{A}=\{A,B\}$ represent endogenous nodes, and $\Lambda$ represents an exogenous node with the following discard-and-prepare instrument,

$$\{\tau^{\lambda}\}_{\lambda=0,1}=\Bigl\{\frac{1}{2}(([0]_{\Lambda^{\mathrm{out}}})^{T}\otimes\mathbb{I}_{\Lambda^{\mathrm{in}}}),\frac{1}{2}(([1]_{\Lambda^{\mathrm{out}}})^{T}\otimes\mathbb{I}_{\Lambda^{\mathrm{in}}})\Bigr\}\;, \tag{35}$$

such that $\tau_{\Lambda}^{\frac{1}{2}\mathbb{I}}=\sum_{\lambda=0,1}\tau^{\lambda}_{\Lambda}$ prepares the maximally mixed state, and we assume identity channels between pairs of nodes,

$$\rho_{AB|A\Lambda}^{U}=\rho_{B|A}^{\mathrm{id}}\rho_{A|\Lambda}^{\mathrm{id}}=\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\rho_{A^{\mathrm{in}}|\Lambda^{\mathrm{out}}}^{\mathrm{id}}\;. \tag{36}$$

With respect to the model $M_{Q}$, we will calculate expected counterfactual probabilities of the form $P^{a=+|\mathbf{z}}_{a'=-|\mathbf{z}'}(b')$, where we fix the actual instruments $\mathbf{z}=(z_{A}=1,z_{B})$ at endogenous nodes with

$$\mathcal{I}_{A}^{z_{A}=1}=\{([+]_{A^{\mathrm{out}}})^{T}\otimes[+]_{A^{\mathrm{in}}},([-]_{A^{\mathrm{out}}})^{T}\otimes[-]_{A^{\mathrm{in}}}\}\;,\qquad\mathcal{I}_{B}^{z_{B}}=\{\tau_{B}^{b|z_{B}}\}_{b}\;, \tag{37}$$

but consider different counterfactual instruments $\mathbf{z}'=(z'_{A},z'_{B})$, corresponding to (i) passive, (ii) do-interventional, and (iii) active counterfactual queries. To this end, we first calculate the conditional process operators (cf. Eq. (24)), conditioned on outcomes $\lambda\in\{0,1\}$ of the instrument in Eq. (35):

\begin{align}
\sigma_{AB}^{\lambda=0} &=\frac{\mathrm{Tr}_{\Lambda}\left[\rho_{AB|A\Lambda}^{U}\widetilde{\tau}^{\lambda=0}_{\Lambda}\right]}{P(\lambda=0)}=\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes[0]_{A^{\mathrm{in}}}\;, \nonumber\\
\sigma_{AB}^{\lambda=1} &=\frac{\mathrm{Tr}_{\Lambda}\left[\rho_{AB|A\Lambda}^{U}\widetilde{\tau}^{\lambda=1}_{\Lambda}\right]}{P(\lambda=1)}=\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes[1]_{A^{\mathrm{in}}}\;. \tag{38}
\end{align}
  • (i)

    Passive case: “Given that $a=+$ occurred in the actually performed instrument $\mathcal{I}^{1}_{A}$, what is the probability that $b'$ would have obtained using the instrument $\mathcal{I}^{z'_{B}=1}_{B}$, had it been that $a'=-$, using the instrument $\mathcal{I}^{1}_{A}$?”

    In the abduction step, we update our information about the exogenous node, given the evidence. Note that observing the outcome $a=+$ at node $A$ in this particular case gives us no information about the outcome at the exogenous node, since both outcomes at $A$ occur with equal probability for both of the possible values of the exogenous variable. This is expressed in the following conditional probabilities (cf. Eq. (27) and the denominator of Eq. (26)), noting that here the counterfactual antecedent is $a'$, in place of $b'$ in Eq. (26):

    \begin{align}
    P_{\mathbf{z}}(\lambda=0|a=+) &=\frac{1}{2}\quad\wedge\quad P_{\mathbf{z}'}^{\lambda=0}(a'=-)=\frac{1}{2}\;, \tag{39}\\
    P_{\mathbf{z}}(\lambda=1|a=+) &=\frac{1}{2}\quad\wedge\quad P_{\mathbf{z}'}^{\lambda=1}(a'=-)=\frac{1}{2}\;. \tag{40}
    \end{align}

    From the above, we see that the condition in Eq. (34) for a passive counterfactual to have a well-defined numerical value is satisfied. Since $z'_{A}=z_{A}=1$, this is a passive counterfactual; hence no action step, that is, no intervention, is needed. For the prediction step, we first compute the required counterfactual probabilities from Eq. (26),

    $$P_{\mathbf{z}'}^{\lambda}(b'|a'=-)=\frac{P_{\mathbf{z}'}^{\lambda}(b',a'=-)}{P_{\mathbf{z}'}^{\lambda}(a'=-)}=\frac{\mathrm{Tr}_{AB}[\sigma_{AB}^{\lambda}(\tau_{A}^{a'=-|1}\otimes\tau_{B}^{b'|z'_{B}=1})]}{\mathrm{Tr}_{AB}[\sigma_{AB}^{\lambda}(\tau_{A}^{a'=-|1}\otimes\tau_{B}^{|z'_{B}=1})]}\;. \tag{41}$$

    Using Eq. (1), the counterfactual probability for the different values of $\lambda$ can be written as

    \begin{align}
    P_{\mathbf{z}'}^{\lambda=0}(b'|a'=-) &=2\;\mathrm{Tr}_{AB}[\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes[0]_{A^{\mathrm{in}}}(\tau_{A}^{a'=-|1}\otimes\tau_{B}^{b'|z'_{B}=1})]\;, \tag{42}\\
    P_{\mathbf{z}'}^{\lambda=1}(b'|a'=-) &=2\;\mathrm{Tr}_{AB}[\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes[1]_{A^{\mathrm{in}}}(\tau_{A}^{a'=-|1}\otimes\tau_{B}^{b'|z'_{B}=1})]\;. \tag{43}
    \end{align}

    The expected counterfactual probability (cf. Eq. (29)) is then

    \begin{align}
    P_{a'=-|z'_{A}=1}^{a=+|\mathbf{z}}(b') &=\sum_{\lambda\in\Lambda}P_{\mathbf{z}}(\lambda|a=+)P_{\mathbf{z}'}^{\lambda}(b'|a'=-) \tag{44}\\
    &=\mathrm{Tr}_{AB}[\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes[0]_{A^{\mathrm{in}}}(\tau_{A}^{a'=-|1}\otimes\tau_{B}^{b'|z'_{B}=1})+\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes[1]_{A^{\mathrm{in}}}(\tau_{A}^{a'=-|1}\otimes\tau_{B}^{b'|z'_{B}=1})] \tag{45}\\
    &=\mathrm{Tr}_{AB}[\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes\mathbb{I}_{A^{\mathrm{in}}}(([-]_{A^{\mathrm{out}}})^{T}\otimes[-]_{A^{\mathrm{in}}}\otimes\tau_{B}^{b'|z'_{B}=1})] \tag{46}\\
    &=\mathrm{Tr}_{B}[[-]_{B^{\mathrm{in}}}\tau_{B}^{b'|z'_{B}=1}]\;. \tag{47}
    \end{align}

    The expected counterfactual probability thus equals the probability of obtaining outcome $b'$ for the instrument specified by $z'_{B}=1$, given the preparation of state $[-]$ at the input of node $B$. In other words, the counterfactual probability is simply dictated by the counterfactual outcome at $A$. This is a sensible result: in this case the actual outcome $a$ gives us no information about the exogenous variable $\lambda$, the counterfactual outcome is possible for either value of $\lambda$, and the effect of the exogenous variable on $B$ is screened off by the outcome of the projective counterfactual instrument considered for node $A$.

  • (ii)

    Do-interventional case: “Given that $a=+$ occurred in the actually performed instrument $\mathcal{I}^{1}_{A}$, what is the probability that $b'$ would have obtained using the instrument $\mathcal{I}^{z'_{B}=2}_{B}$, had it been that $a'=-$, using the instrument $\tau^{\mathrm{do}([-])}_{A}$?”

    Here, instead of $\mathcal{I}^{z'_{A}=1}_{A}$, we perform the single-element instrument corresponding to a do-intervention,

    $$\mathcal{I}^{z'_{A}=2}_{A}=\{\tau_{A}^{a'=-|2}\}=\{\tau_{A}^{\mathrm{do}([-])}\}=\{([-]_{A^{\mathrm{out}}})^{T}\otimes\mathbb{I}_{A^{\mathrm{in}}}\}\;, \tag{48}$$

    which discards the input and prepares the state $[-]$ at the output of $A$. As before, the actual instruments are denoted by $\mathbf{z}=(z_{A}=1,z_{B})$. Since the actual instruments and evidence are the same as in case (i), the abduction step yields the same result, given on the left-hand side of Eqs. (39) and (40). In the action step, however, we modify the instrument at node $A$, and the counterfactual instruments are now denoted by $\mathbf{z}'=(z'_{A}=2,z'_{B})$.

    For the prediction step, we again first compute the required counterfactual probabilities from Eq. (26),

    $$P_{\mathbf{z}'}^{\lambda}(b'|a'=-)=\frac{P_{\mathbf{z}'}^{\lambda}(b',a'=-)}{P_{\mathbf{z}'}^{\lambda}(a'=-)}=\frac{\mathrm{Tr}_{AB}[\sigma_{AB}^{\lambda}(\tau_{A}^{a'=-|2}\otimes\tau_{B}^{b'|z'_{B}=2})]}{\mathrm{Tr}_{AB}[\sigma_{AB}^{\lambda}(\tau_{A}^{a'=-|2}\otimes\tau_{B}^{|z'_{B}=2})]}\;. \tag{49}$$

    The counterfactual probabilities for the different values of $\lambda$ thus take the same form as in Eqs. (42) and (43), but with the instrument element $\tau_{A}^{a'=-|2}$ in place of $\tau_{A}^{a'=-|1}$. The expected counterfactual probability corresponding to query (ii) can then be computed as

    \begin{align}
    P_{a'=-|z'_{A}=2}^{a=+|\mathbf{z}}(b') &=\sum_{\lambda\in\Lambda}P_{\mathbf{z}}(\lambda|a=+)P_{\mathbf{z}'}^{\lambda}(b'|a'=-) \tag{50}\\
    &=\frac{1}{2}\;\mathrm{Tr}_{AB}[\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes\mathbb{I}_{A^{\mathrm{in}}}(([-]_{A^{\mathrm{out}}})^{T}\otimes\mathbb{I}_{A^{\mathrm{in}}}\otimes\tau_{B}^{b'|z'_{B}=2})] \tag{51}\\
    &=\mathrm{Tr}_{B}[[-]_{B^{\mathrm{in}}}\tau_{B}^{b'|z'_{B}=2}]\;. \tag{52}
    \end{align}

    As the instrument $z'_{B}=2$ is arbitrary, we obtain the same result for the do-interventional as for the passive counterfactual in this case. Thus, although the do-intervention guarantees the conditions for the counterfactual antecedent to have occurred, it is not necessary in situations where the antecedent could have occurred passively, as in case (i).

  • (iii)

    Active case: “Given that $a=+$ occurred in the actually performed instrument $\mathcal{I}^{1}_{A}$, what is the probability that $b'$ would have obtained using the instrument $\mathcal{I}^{z'_{B}=3}_{B}$, had it been that $a'=-$, using the instrument $\mathcal{I}^{3}_{A}$?”

    Again, the abduction step yields the same results as in cases (i) and (ii). For the action step, this is an active counterfactual whenever $\mathcal{I}^{1}_{A}\neq\mathcal{I}^{3}_{A}$. Specifically, let’s consider

    $$\mathcal{I}^{3}_{A}=\{([+]_{A^{\mathrm{out}}})^{T}\otimes[\phi]_{A^{\mathrm{in}}},([-]_{A^{\mathrm{out}}})^{T}\otimes[\overline{\phi}]_{A^{\mathrm{in}}}\}\;, \tag{53}$$

    where $[\phi]$ is an arbitrary projector and $[\overline{\phi}]$ is its orthogonal complement. This instrument performs a projective measurement in the $\{[\phi],[\overline{\phi}]\}$ basis on the input of $A$, and prepares the corresponding state $[+]/[-]$ at the output. Considering the counterfactual probabilities from Eq. (26), we find them to have the same value for the two possible values of $\lambda$, largely independent of $[\phi]$,

    $$P_{\mathbf{z}'}^{\lambda}(b'|a'=-)=\frac{P_{\mathbf{z}'}^{\lambda}(b',a'=-)}{P_{\mathbf{z}'}^{\lambda}(a'=-)}=\frac{\mathrm{Tr}_{AB}[\sigma_{AB}^{\lambda}(\tau_{A}^{a'=-|3}\otimes\tau_{B}^{b'|z'_{B}=3})]}{\mathrm{Tr}_{AB}[\sigma_{AB}^{\lambda}(\tau_{A}^{a'=-|3}\otimes\tau_{B}^{|z'_{B}=3})]}=\mathrm{Tr}_{B}[[-]_{B^{\mathrm{in}}}\tau_{B}^{b'|z'_{B}=3}]\;, \tag{54}$$

    provided the denominator of Eq. (54) is nonzero (which does depend on $[\phi]$). Since the abducted probabilities for the two values of $\lambda$ are equal in this case, the expected counterfactual probability also has the same numerical value whenever the denominator of Eq. (54) is nonzero for both values of $\lambda$, namely,

    $$P_{a'=-|z'_{A}=3}^{a=+|\mathbf{z}}(b')=\sum_{\lambda\in\Lambda}P_{\mathbf{z}}(\lambda|a=+)P_{\mathbf{z}'}^{\lambda}(b'|a'=-)=\mathrm{Tr}_{B}[[-]_{B^{\mathrm{in}}}\tau_{B}^{b'|z'_{B}=3}]\;. \tag{55}$$

    However, there are exceptions to this for some specific values of $[\phi]$. The denominator of Eq. (54) will be zero (in other words, the counterfactual antecedent will be impossible) for $\lambda=0$ when $[\phi]=[0]$ and for $\lambda=1$ when $[\phi]=[1]$. In these cases we set the value of the corresponding counterfactual probability in Eq. (54), and of the expected counterfactual probability in Eq. (55), to $*$, by definition, to indicate a counterpossible (see Section 5.1). Whenever it is numerically well-defined, on the other hand, the expected counterfactual probability has the same value as in cases (i) and (ii), considering that the instrument denoted by $z'_{B}=3$ is again arbitrary.
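The three cases of Ex. 1 can be cross-checked numerically. The sketch below is a simplified model rather than a full process-operator calculation: it assumes real qubit amplitudes, represents each instrument element at $A$ as a pair (measured effect, prepared state), and takes the outcome $b'$ at $B$ to correspond to a hypothetical rank-one projective effect `B_EFFECT`, so that $\mathrm{Tr}_{B}[[-]_{B^{\mathrm{in}}}\tau_{B}^{b'|z'_{B}}]$ reduces to a Born probability. All function and variable names are illustrative.

```python
import math

S = 1 / math.sqrt(2)
ZERO, ONE = (1.0, 0.0), (0.0, 1.0)
PLUS, MINUS = (S, S), (S, -S)
TOL = 1e-12
UNDEFINED = "*"  # marker for a counterpossible


def born(effect, state):
    """Probability |<effect|state>|^2 for real qubit amplitudes."""
    return (effect[0] * state[0] + effect[1] * state[1]) ** 2


def rotated(theta):
    """The state cos(theta)|0> + sin(theta)|1> and its orthogonal complement."""
    return (math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))


# Instruments at A as {outcome: (measured effect, or None for "discard", prepared state)}.
I1 = {"+": (PLUS, PLUS), "-": (MINUS, MINUS)}  # Eq. (37)
DO_MINUS = {"-": (None, MINUS)}                # Eq. (48)


def I3(phi, phi_bar):                          # Eq. (53)
    return {"+": (phi, PLUS), "-": (phi_bar, MINUS)}


def p_outcome(instrument, a, lam_state):
    """Probability of outcome a at A when the exogenous node prepared lam_state."""
    effect, _prepared = instrument[a]
    return 1.0 if effect is None else born(effect, lam_state)


def expected_counterfactual(lam_states, actual, a, counterfactual, a_prime, b_effect):
    # Abduction, Eq. (27), with a uniform prior over the exogenous outcomes.
    weights = {lam: p_outcome(actual, a, s) / len(lam_states) for lam, s in lam_states.items()}
    evidence = sum(weights.values())
    total = 0.0
    for lam, s in lam_states.items():
        posterior = weights[lam] / evidence
        if posterior < TOL:
            continue
        p_antecedent = p_outcome(counterfactual, a_prime, s)
        if p_antecedent < TOL:
            return UNDEFINED                             # impossible antecedent
        prepared = counterfactual[a_prime][1]
        joint = p_antecedent * born(b_effect, prepared)  # identity channel from A to B
        total += posterior * joint / p_antecedent        # Eqs. (26) and (29)
    return total


EX1 = {0: ZERO, 1: ONE}
B_EFFECT = rotated(math.pi / 8)[0]  # hypothetical effect for outcome b' at B
passive = expected_counterfactual(EX1, I1, "+", I1, "-", B_EFFECT)
do_case = expected_counterfactual(EX1, I1, "+", DO_MINUS, "-", B_EFFECT)
active = expected_counterfactual(EX1, I1, "+", I3(*rotated(math.pi / 3)), "-", B_EFFECT)
counterpossible = expected_counterfactual(EX1, I1, "+", I3(ZERO, ONE), "-", B_EFFECT)
```

In this model, `passive`, `do_case` and `active` all agree with `born(B_EFFECT, MINUS)`, in line with Eqs. (47), (52) and (55), while the choice $[\phi]=[0]$ yields the marker `*`.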

Example 2.

Consider the same setup as in Ex. 1, but with a different instrument at the exogenous node $\Lambda$,

$$\mathcal{I}_{\Lambda}^{2}=\{\tau^{\lambda}\}_{\lambda=+,-}=\Bigl\{\frac{1}{2}(([+]_{\Lambda^{\mathrm{out}}})^{T}\otimes\mathbb{I}_{\Lambda^{\mathrm{in}}}),\frac{1}{2}(([-]_{\Lambda^{\mathrm{out}}})^{T}\otimes\mathbb{I}_{\Lambda^{\mathrm{in}}})\Bigr\}\;. \tag{56}$$

This instrument also prepares, on average, the maximally mixed state at the output of the exogenous node, that is, $\tau_{\Lambda}^{\frac{1}{2}\mathbb{I}}=\sum_{\lambda=+,-}\tau^{\lambda}_{\Lambda}$. This implies that, averaging over the values of the exogenous variable, we obtain the same process operator $\sigma_{AB}$ for the endogenous nodes, and thus the same quantum causal model as per Def. 7. However, it implies a distinct quantum structural causal model, as per Def. 10. Let’s analyse some of the consequences of this fact for the evaluation of counterfactuals.

In contrast to Ex. 1, the conditional process operators for outcomes $\lambda\in\{+,-\}$ now become

\begin{align}
\sigma_{AB}^{\lambda=+} &=\frac{\mathrm{Tr}_{\Lambda}\left[\rho_{AB|A\Lambda}^{U}\widetilde{\tau}^{\lambda=+}_{\Lambda}\right]}{P(\lambda=+)}=\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes[+]_{A^{\mathrm{in}}}\;, \tag{57}\\
\sigma_{AB}^{\lambda=-} &=\frac{\mathrm{Tr}_{\Lambda}\left[\rho_{AB|A\Lambda}^{U}\widetilde{\tau}^{\lambda=-}_{\Lambda}\right]}{P(\lambda=-)}=\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes[-]_{A^{\mathrm{in}}}\;. \tag{58}
\end{align}

Again, we compute the (i) passive, (ii) do-interventional, and (iii) active counterfactual probabilities for the same counterfactual queries as in Ex. 1.

The required abducted probabilities (Eq. (27)) will again be the same for all queries (which share the same actual instruments and evidence) and are calculated to be

\begin{align}
P_{\mathbf{z}}(\lambda=+|a=+) &=1\;, \tag{59}\\
P_{\mathbf{z}}(\lambda=-|a=+) &=0\;. \tag{60}
\end{align}
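The contrast between the abduction steps of Ex. 1 and Ex. 2 can be made explicit with a few lines of code. The sketch below is again a simplified model with hypothetical names (`STATES`, `born`, `abduct`): it assumes a uniform preparation of the listed exogenous states, an identity channel to node $A$, and the projective instrument of Eq. (37) at $A$, so that $P_{\mathbf{z}}(a|\lambda)$ is a Born probability.

```python
import math

S = 1 / math.sqrt(2)
STATES = {"0": (1.0, 0.0), "1": (0.0, 1.0), "+": (S, S), "-": (S, -S)}


def born(effect, state):
    """Probability |<effect|state>|^2 for real qubit amplitudes."""
    return (effect[0] * state[0] + effect[1] * state[1]) ** 2


def abduct(exogenous_labels, observed="+"):
    """Return (P_z(a), posterior over lambda) for a uniform exogenous preparation."""
    weights = {
        lam: born(STATES[observed], STATES[lam]) / len(exogenous_labels)
        for lam in exogenous_labels
    }
    evidence = sum(weights.values())
    return evidence, {lam: w / evidence for lam, w in weights.items()}


evidence_ex1, posterior_ex1 = abduct(["0", "1"])  # exogenous instrument of Eq. (35)
evidence_ex2, posterior_ex2 = abduct(["+", "-"])  # exogenous instrument of Eq. (56)
```

Both models assign the same probability to the evidence $a=+$, as expected from the common averaged process operator, yet the posteriors over $\lambda$ differ: uniform in Ex. 1, and concentrated on $\lambda=+$ in Ex. 2.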
  • (i)

    Passive case: “Given that $a=+$ occurred in the actually performed instrument $\mathcal{I}^{1}_{A}$, what is the probability that $b'$ would have obtained using the instrument $\mathcal{I}^{z'_{B}=1}_{B}$, had it been that $a'=-$, using the instrument $\mathcal{I}^{1}_{A}$?”

    Now the antecedent becomes a counterpossible for $\lambda=+$, and the corresponding counterfactual probability is set to

    $$P_{\mathbf{z}'}^{\lambda=+}(b'|a'=-)=\frac{P_{\mathbf{z}'}^{\lambda=+}(b',a'=-)}{P_{\mathbf{z}'}^{\lambda=+}(a'=-)}=*\;. \tag{61}$$

    Since the abducted probability for $\lambda=+$ is nonzero, the expected counterfactual probability is also set to $*$, by definition,

    $$P_{a'=-|z'_{A}=1}^{a=+|\mathbf{z}}(b')=\sum_{\lambda\in\Lambda}P_{\mathbf{z}}(\lambda|a=+)P_{\mathbf{z}'}^{\lambda}(b'|a'=-)=*\;. \tag{62}$$

    We thus see that the change in the preparation instrument at the exogenous node leads to a different result from Ex. 1, even though both cases lead to the same process operator for the endogenous nodes when averaged over the background variables. This illustrates the dependence of counterfactual questions on the quantum structural causal model rather than on the quantum causal model for the endogenous nodes alone.

  • (ii)

    Do-interventional case: “Given that $a=+$ occurred in the actually performed instrument $\mathcal{I}^{1}_{A}$, what is the probability that $b'$ would have obtained using the instrument $\mathcal{I}^{z'_{B}=2}_{B}$, had it been that $a'=-$, using the instrument $\mathcal{I}^{z'_{A}=2}_{A}=\tau^{\mathrm{do}([-])}_{A}$?”

    The do-intervention in Eq. (48) now guarantees that the denominator of the counterfactual probability is nonzero (indeed 1),

    $$P_{\mathbf{z}'}^{\lambda}(b'|a'=-)=\frac{P_{\mathbf{z}'}^{\lambda}(b',a'=-)}{P_{\mathbf{z}'}^{\lambda}(a'=-)}=P_{\mathbf{z}'}^{\lambda}(b',a'=-)\;. \tag{63}$$

    Since the abducted probability for $\lambda=-$, given the observed evidence, is zero, the expected counterfactual probability (Eq. (29)) is equal to the counterfactual probability corresponding to $\lambda=+$, which is computed as

    \begin{align}
    P_{a'=-|z'_{A}=2}^{a=+|\mathbf{z}}(b') &=P_{\mathbf{z}'}^{\lambda=+}(b'|a'=-) \tag{64}\\
    &=\mathrm{Tr}_{AB}[\rho_{B^{\mathrm{in}}|A^{\mathrm{out}}}^{\mathrm{id}}\otimes[+]_{A^{\mathrm{in}}}(\tau_{A}^{a'=-|2}\otimes\tau_{B}^{b'|z'_{B}=2})] \tag{65}\\
    &=\mathrm{Tr}_{B}[[-]_{B^{\mathrm{in}}}\tau_{B}^{b'|z'_{B}=2}]\;. \tag{66}
    \end{align}

    Note that this is the same result as for the same query in Example 1, since the do-intervention breaks the causal dependence from the exogenous variables in this particular case.

  • (iii)

    Active case: “Given that $a=+$ occurred in the actually performed instrument $\mathcal{I}^{1}_{A}$, what is the probability that $b^{\prime}$ would have obtained using the instrument $\mathcal{I}^{z^{\prime}_{B}=3}_{B}$, had it been that $a^{\prime}=-$, using the instrument $\mathcal{I}^{3}_{A}$?”

    Using the instrument in Eq. (53), we find the counterfactual probability to have the same form for the two values of $\lambda$, and largely independent of $[\phi]$ (provided the denominator is nonzero, which again depends on $[\phi]$),

    \[
    P_{\mathbf{z}^{\prime}}^{\lambda}(b^{\prime}|a^{\prime}=-)=\frac{P_{\mathbf{z}^{\prime}}^{\lambda}(b^{\prime},a^{\prime}=-)}{P_{\mathbf{z}^{\prime}}^{\lambda}(a^{\prime}=-)}=\mathrm{Tr}_{B}[[-]_{B^{\mathrm{in}}}\tau_{B}^{b^{\prime}|z^{\prime}_{B}=3}]\;. \tag{67}
    \]

    Distinct from Example 1, however, the denominator of Eq. (67) will be zero (in other words, the counterfactual antecedent will be impossible) for $\lambda=+$ when $[\phi]=[+]$ and for $\lambda=-$ when $[\phi]=[-]$. However, since the only nonzero abducted probability is $P_{\mathbf{z}}(\lambda=+|a=+)=1$, the expected counterfactual probability is numerically well-defined for all $[\phi]\neq[+]$,

    \[
    P_{a^{\prime}=-|z_{A}^{\prime}=3}^{a=+|\mathbf{z}}(b^{\prime})=P_{\mathbf{z}^{\prime}}^{\lambda=+}(b^{\prime}|a^{\prime}=-)=\mathrm{Tr}_{B}[[-]_{B^{\mathrm{in}}}\tau_{B}^{b^{\prime}|z^{\prime}_{B}=3}]\;. \tag{68}
    \]

Comparing the two examples, we see that while the passive counterfactual in Ex. 2 is always a counterpossible, that is, it has an impossible antecedent (and is thus assigned a conventional value *), the same passive counterfactual evaluated in Ex. 1 yields an expected counterfactual probability that is always numerically well-defined. This is in stark contrast to the classical case, where - as a consequence of the intrinsic determinism of classical structural causal models - a passive interpretation of a counterfactual query would always result in a counterpossible. Note also that both examples have the same average state (a maximally mixed state) prepared at the exogenous node, showing that different contexts for the state preparations of the same mixed state, and thus different quantum structural models, can result in different evaluations for a quantum counterfactual. We also see that in both examples the do-interventional counterfactual is numerically well-defined and has the same value, whereas the counterfactual probabilities in active cases are counterpossibles for different counterfactual instruments.
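The bookkeeping behind the expected counterfactual probabilities in the two examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `expected_counterfactual`, the use of `None` to stand for the conventional value $*$, and the numerical inputs are all hypothetical and not taken from the paper. The rule encoded here is the one used in the examples: a value of $\lambda$ with zero abducted probability is ignored, while an impossible antecedent for a value of $\lambda$ with nonzero abducted probability makes the whole query a counterpossible.

```python
from typing import Dict, Optional, Tuple

# Assumption: None stands in for the conventional value '*' of a counterpossible.
COUNTERPOSSIBLE = None


def expected_counterfactual(
    posterior: Dict[str, float],
    joint_and_marginal: Dict[str, Tuple[float, float]],
) -> Optional[float]:
    """Average the per-lambda counterfactual probabilities over the abducted posterior.

    posterior[lam]          : abducted probability P_z(lam | evidence)
    joint_and_marginal[lam] : pair (P^lam_{z'}(b', a'), P^lam_{z'}(a'))
    """
    total = 0.0
    for lam, weight in posterior.items():
        if weight == 0.0:
            continue  # lambda values ruled out by the evidence do not contribute
        joint, marginal = joint_and_marginal[lam]
        if marginal == 0.0:
            return COUNTERPOSSIBLE  # antecedent impossible for a lambda that matters
        total += weight * joint / marginal
    return total


# Hypothetical numbers mimicking the passive case of Ex. 2: the antecedent is
# impossible for the only lambda compatible with the evidence.
print(expected_counterfactual({"+": 1.0, "-": 0.0}, {"+": (0.0, 0.0), "-": (0.3, 1.0)}))

# Hypothetical numbers mimicking an active case: the zero denominator for
# lambda = '-' is harmless because its abducted probability vanishes.
print(expected_counterfactual({"+": 1.0, "-": 0.0}, {"+": (0.2, 0.5), "-": (0.0, 0.0)}))
```

The first call returns the counterpossible marker, the second a number; which of the two occurs depends only on the values of $\lambda$ that survive abduction.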

These distinctions then lead to the question: what should we do when a counterfactual statement is not already in standard form, and thus it is unclear whether it should be interpreted as a passive or active counterfactual?

7.2 Disambiguation of counterfactual queries: the principle of minimality

Note that classical counterfactuals evaluated with respect to a probabilistic structural causal model $\langle M,P(\mathbf{u})\rangle$ correspond to do-interventional (quantum) counterfactual queries when evaluated with respect to the quantum extension $M_{Q}$ (cf. Cor. 1). In fact, a classical counterfactual query in Def. 4 is always defined in terms of a do-intervention, since this is the only way to make the antecedent true. In this sense, we may say that classical counterfactual queries naturally embed into our formalism as do-interventional counterfactuals.

Yet, the richer structure of quantum counterfactuals, as seen in the previous section, may sometimes allow for a different interpretation of a classical counterfactual query: in particular, the antecedent of a quantum counterfactual can sometimes be true without an intervention. This leaves a certain ambiguity if we want to interpret a classical counterfactual as a quantum counterfactual query according to Def. 13. For the latter, one must specify a counterfactual instrument, and in particular decide whether to interpret the classical counterfactual query passively or actively (do-interventionally). For example, again referring to the scenario represented in Fig. 8, consider the query:

Given that $a=+$, what is the probability that $b^{\prime}$, had it been that $a^{\prime}=-$?

Note that all of the counterfactuals in the previous section are of this form, until we specify what instruments those outcomes correspond to. And in the model of Example 1, both the passive and do-interventional interpretations of this query are always numerically well-defined.

This ambiguity does not occur in a classical structural causal model (CSM), since in that case all the variables are determined by a complete specification of the exogenous variables. Consequently, the only way the antecedent of a counterfactual like the one above could be realized while keeping the background variables fixed is via some modification of the model. (Footnote 17: We remark that, contrary to Pearl, a counterfactual may also be interpreted as a backtracking counterfactual, where the background conditions are not necessarily kept fixed. A semantics for backtracking counterfactuals within a classical SCM has recently been proposed in Ref. [46].) Pearl justifies the do-intervention as “the minimal change (to a model) necessary for establishing the antecedent” [1]. In our case, due to the split-node structure, a do-intervention is reflected not as a change in the model itself, but as a change in the instrument used at the antecedent nodes, that is, via the use of a do-instrument.

To decide whether a counterfactual query not in standard form should be analyzed as passive or active when interpreted with respect to a QSM, we thus propose a principle of minimality, motivated by the minimal changes from actuality required in Pearl’s analysis. If the antecedent of a counterfactual can be established with no change to (the instruments applied to) a model – that is, as in a passive reading of the counterfactual – this is by definition the minimal change.

Definition 15 (Principle of Minimality).

Whenever it is ambiguous whether a counterfactual query should be interpreted as a passive or active counterfactual in a QSM (as by Def. 14), it should be interpreted passively if it is not a counterpossible, that is, if its antecedent is not impossible (as by Eq. (34)).
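As a rough illustration, the decision rule of Def. 15 can be written as a small Python function. The impossibility criterion of Eq. (34) is not reproduced in this section, so the sketch assumes that the antecedent counts as impossible when its probability under the unchanged instruments vanishes for some value of $\lambda$ with nonzero abducted probability, which is how the examples of Sec. 7.1 treat it. The name `choose_reading` and the dictionary inputs are hypothetical.

```python
from typing import Dict


def choose_reading(posterior: Dict[str, float], antecedent_prob: Dict[str, float]) -> str:
    """Return 'passive' if the antecedent can occur with the actual instruments, else 'active'.

    posterior[lam]       : abducted probability of lam given the evidence
    antecedent_prob[lam] : probability of the antecedent for that lam, instruments unchanged
    Assumption: 'impossible' means zero antecedent probability for a lam that
    has nonzero abducted probability.
    """
    for lam, weight in posterior.items():
        if weight > 0.0 and antecedent_prob[lam] == 0.0:
            return "active"  # passive reading would be a counterpossible
    return "passive"  # no change of instrument is needed: the minimal change


# Hypothetical inputs: a single background value and an antecedent of probability 1/2.
print(choose_reading({"only": 1.0}, {"only": 0.5}))           # passive
# Hypothetical inputs: antecedent impossible for the only lam compatible with the evidence.
print(choose_reading({"+": 1.0, "-": 0.0}, {"+": 0.0, "-": 1.0}))  # active
```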

Lewis’ account of counterfactuals invokes a notion of similarity among possible worlds [16]. For Lewis, one should order the closest possible worlds by some measure of similarity, based on which a counterfactual is declared true in a world $w$ if the consequent of the counterfactual is true in all the closest worlds to $w$ where the antecedent of the counterfactual is true. Arguably, a world in which both the model and instruments are the same, but where the counterfactual antecedent occurs, is closer to the actual world than any world where a different instrument is used. Thus the Principle of Minimality is also justified by a form of Lewis’ analysis applied to our case.

In Example 1, however, where both the passive and do-interventional readings are numerically well-defined, they also produce the same (expected) counterfactual probability, rendering the ambiguity essentially irrelevant. We next turn to an important example where this is not the case.

7.3 Causal dependence and counterfactual dependence in the Bell scenario

A conceptually important consequence of our semantics for counterfactuals (and relevant for the disambiguation of passive from active counterfactual queries) is that, unlike in the case of Pearl’s framework, counterfactual dependence does not in general imply causal dependence. We establish this claim using the pertinent example of a common cause scenario, as shown in Fig. 9. This is essentially the causal structure of a Bell scenario, as in Fig. 5, although here we omit the nodes associated with the choices of setting $X$ and $Y$, which are now the choices of instrument for the quantum nodes $A$ and $B$ and are left implicit. (Footnote 18: Similarly, the labels $A$ and $B$ of the quantum nodes themselves are not to be confused with the outcomes of instruments that may be used at those nodes.)

Figure 9: Causal graph of a quantum causal model where $C$ is a common cause of $A$ and $B$ (equivalent to the Bell scenario in Fig. 5, but with choices of instruments at the quantum nodes $A$ and $B$ left implicit).
Example 3 (Bell scenario).

Consider the causal scenario in Fig. 9, with instruments

\begin{align}
\mathcal{I}_{A} &=\{([0]_{A^{\mathrm{out}}})^{T}\otimes[0]_{A^{\mathrm{in}}},\;([1]_{A^{\mathrm{out}}})^{T}\otimes[1]_{A^{\mathrm{in}}}\}\;, \tag{69}\\
\mathcal{I}_{B} &=\{([0]_{B^{\mathrm{out}}})^{T}\otimes[0]_{B^{\mathrm{in}}},\;([1]_{B^{\mathrm{out}}})^{T}\otimes[1]_{B^{\mathrm{in}}}\}\;, \tag{70}\\
\mathcal{I}_{C} &=\bigl\{([\Phi_{+}]_{C^{\mathrm{out}}})^{T}\otimes\mathbb{I}_{C^{\mathrm{in}}}\bigr\}\;, \tag{71}
\end{align}

where the output of $C$ factorises as $C^{\mathrm{out}}=C_{A}^{\mathrm{out}}\otimes C_{B}^{\mathrm{out}}$ and where $|\Phi_{+}\rangle=\frac{1}{\sqrt{2}}(|0\rangle_{C_{A}^{\mathrm{out}}}|0\rangle_{C_{B}^{\mathrm{out}}}+|1\rangle_{C_{A}^{\mathrm{out}}}|1\rangle_{C_{B}^{\mathrm{out}}})$ is a Bell state. Let the unitary channel $\rho_{AB|C}^{U}$ be given by identities,

\[
\rho_{AB|C}^{U}=\rho_{A|C_{A}^{\mathrm{out}}}^{\mathrm{id}}\rho_{B|C_{B}^{\mathrm{out}}}^{\mathrm{id}}\;. \tag{72}
\]

Here for simplicity we consider a case where we have complete information about the event at the common cause node $C$ (that is, the single-outcome instrument that prepares the Bell state), and thus there is no exogenous node, and no abduction step is necessary. Now consider the counterfactual query

$Q_{1}$: “Given that $a=b=0$, what is the probability that $b^{\prime}=1$ had it been that $a^{\prime}=1$?”.

This query is ambiguous until we specify what instruments those counterfactual outcomes correspond to. On the one hand, we can interpret it as a do-interventional counterfactual with $\tau_{A}^{a^{\prime}=1}=\tau_{A}^{\mathrm{do}([1])}$, while keeping the instrument at $B$ fixed (the most parsimonious interpretation, since the consequent of a counterfactual does not call for an intervention even in the classical case). We then obtain the expected counterfactual probability, which here coincides with the counterfactual probability because no abduction is involved,

\[
P_{\mathrm{do}(a^{\prime}=1)}^{a=b=0}(b^{\prime}=1)=P_{\mathrm{do}(a^{\prime}=1)}(b^{\prime}=1|a^{\prime}=1)=\frac{1}{2}\;. \tag{73}
\]

Similarly, consider the query

$Q_{2}$: “Given that $a=b=0$, what is the probability that $b^{\prime}=1$ had it been that $a^{\prime}=0$?”.

Interpreted do-interventionally, the answer to this query is

\[
P_{\mathrm{do}(a^{\prime}=0)}^{a=b=0}(b^{\prime}=1)=P_{\mathrm{do}(a^{\prime}=0)}(b^{\prime}=1|a^{\prime}=0)=\frac{1}{2}\;. \tag{74}
\]

In other words, there is no counterfactual dependence between $b^{\prime}$ and $a^{\prime}$ in this case, when the counterfactual antecedent is interpreted as the outcome of a do-intervention.

On the other hand, both $Q_{1}$ and $Q_{2}$ can be interpreted as passive counterfactual queries (with $\mathbf{z}^{\prime}=\mathbf{z}$), since the antecedents of both queries have nonzero model probabilities $P_{\mathbf{z}^{\prime}=\mathbf{z}}(a^{\prime}=1)=P_{\mathbf{z}^{\prime}=\mathbf{z}}(a^{\prime}=0)=\frac{1}{2}$. In this passive reading, we obtain

\[
P^{a=b=0|\mathbf{z}}_{a^{\prime}=1|\mathbf{z}^{\prime}=\mathbf{z}}(b^{\prime}=1)=P_{\mathbf{z}^{\prime}=\mathbf{z}}(b^{\prime}=1|a^{\prime}=1)=1 \tag{75}
\]

for $Q_{1}$, and

\[
P^{a=b=0|\mathbf{z}}_{a^{\prime}=0|\mathbf{z}^{\prime}=\mathbf{z}}(b^{\prime}=1)=P_{\mathbf{z}^{\prime}=\mathbf{z}}(b^{\prime}=1|a^{\prime}=0)=0 \tag{76}
\]

for $Q_{2}$. According to the above equations, it would have been the case that $b^{\prime}=1$ with certainty had it been the case that $a^{\prime}=1$, and it would have been the case that $b^{\prime}=0$ with certainty had it been the case that $a^{\prime}=0$ (which, in fact, is the actual case). In other words, the outcomes at $A$ and $B$ are counterfactually dependent.

In Pearl’s classical semantics, counterfactual dependence of the type in Eqs. (75) and (76) would imply that $A$ is a cause of $B$. (Footnote 19: In Lewis’s account [30], such counterfactual dependence also implies causal dependence. The difference is that Pearl analyzes counterfactuals in terms of causation, which he takes to be more fundamental, whereas Lewis analyzes causation in terms of counterfactuals, which he takes as more fundamental.) Nevertheless, the quantum structural causal model we used to derive this result has by construction no causal dependence from $A$ to $B$. This shows that in quantum causal models, (passive) counterfactual dependence does not imply causal dependence.
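The numbers in Eqs. (73) to (76) can be checked with a short Python sketch of the Bell scenario of Example 3. It is a simplified calculation rather than the process-operator formalism of the paper. It assumes that the computational-basis instruments at $A$ and $B$ reduce to the Born rule on $|\Phi_{+}\rangle$, and that a do-instrument at $A$ discards its input and prepares a fixed output that never reaches $B$, so that only the marginal at $B$ matters. The function names are hypothetical.

```python
from math import isclose, sqrt

# Amplitudes of |Phi+> = (|00> + |11>)/sqrt(2), indexed by the outcomes (a, b).
PHI_PLUS = {(0, 0): 1 / sqrt(2), (1, 1): 1 / sqrt(2)}


def joint(a: int, b: int) -> float:
    """Born-rule probability of (a, b) for computational-basis instruments at A and B."""
    return abs(PHI_PLUS.get((a, b), 0.0)) ** 2


def passive(b: int, a: int) -> float:
    """P(b' | a') with the actual instruments kept: ordinary conditioning on a'."""
    marginal_a = sum(joint(a, y) for y in (0, 1))
    return joint(a, b) / marginal_a


def do_interventional(b: int, a: int) -> float:
    """P(b' | do(a')). Assumption: the do-instrument at A ignores its input, so the
    value of a is irrelevant and the result is the marginal probability of b."""
    return sum(joint(x, b) for x in (0, 1))


# Passive reading: b' follows a' with certainty.
assert isclose(passive(1, 1), 1.0)
assert isclose(passive(1, 0), 0.0, abs_tol=1e-12)
# Do-interventional reading: no dependence of b' on a'.
assert isclose(do_interventional(1, 1), 0.5)
assert isclose(do_interventional(1, 0), 0.5)
```

The passive values differ for the two antecedents while the do-interventional values coincide, which is the asymmetry the example relies on.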

Note also that in the passive reading, the counterfactual antecedent corresponds to an event (in the technical sense of an instrument element, that is, a CP map) that was one of the potential outcomes of the actual instrument. A counterfactual antecedent interpreted as a do-intervention, on the other hand, is a different event altogether, technically distinct from any event in the actual instrument. This fact is obscured in the classical case, since in Pearl’s formalism we identify the incoming and outgoing systems, and it is implicitly assumed that we can always (at least in principle) perform ideal non-disturbing measurements of the variables involved. Classically, the event ‘$X=x$’ can ambiguously correspond to “an ideal non-disturbing measurement of $X$ has produced the outcome $x$” or “the variable $X$ was set to the value $X=x$”. The distinction between those interpretations, in Pearl, is attributed to the structural relations in the model; the second interpretation is represented as a surgical excision of causal arrows, while leaving the variables themselves otherwise intact. In a quantum causal model, on the other hand, a do-intervention corresponds to a related but technically distinct event in an otherwise intact model.

In our view, counterfactual dependence without causal dependence is a much more distinctively quantum feature than the failure of “counterfactual definiteness”, which we turn to next.

8 A Note on Counterfactual Definiteness and Bell’s theorem

As we have seen in Sec. 7, one of the features of our formalism is that a counterfactual proposition is not always either true or false, unlike in the classical semantics. In the classical case, a structural causal model underpins the deterministic nature of the system by defining functional dependencies among the nodes. This is not so in the quantum case, even with full knowledge about a quantum structural model. Whereas in the classical case we can define the probability of a counterfactual (to be true), in the quantum case we can in general only define a counterfactual probability.

The lack of definite truth values for counterfactuals in quantum mechanics can be thought of as a failure of “counterfactual definiteness”, a concept that has a long and controversial history in discussions of Bell’s theorem. Skyrms [14] defines counterfactual definiteness (CFD) as follows (attributing it to Stapp [47], who expressed the idea that Bell’s theorem requires it as an underlying assumption):

“Counterfactual definiteness; essentially the assumption that subjunctive conditionals of the form: ‘If measurement $M$ had been performed, result $R$ would have been obtained’ always have a definite truth value (even for measurements that were not carried out because incompatible measurements were being made) and that the quantum mechanical statistics are the probabilities of such conditionals.”

Skyrms [14] argues, contrary to Stapp, that some forms of Bell’s theorem can be proved using conditional probabilities rather than probabilities of subjunctive conditionals, and hence that CFD is not a necessary assumption in its derivation. Since then there has been a long debate about the status of CFD as an assumption underlying the derivation of Bell’s theorem (see e.g. [48, 49, 50, 51, 52, 53]). Analysing the details of this long and nuanced literature is beyond our scope, but from the perspective of our framework, we can make some (hopefully) clarifying remarks.

Firstly, Bell inequalities can be derived from many different sets of propositions [33, 44]. Disagreements often arise due to different “camps” using the same terms (most notably ‘locality’) to refer to different concepts. Bell’s 1964 theorem [12] explicitly used a notion of ‘locality’ and an assumption of determinism (as well as an implicit “free choice” assumption). Bell inequalities can also be derived [33], however, by replacing Locality and determinism with Bell’s 1976 notion of Local Causality [54] (stronger than Locality).

In the causal language we are using here, Bell’s 1964 propositions, using determinism, are analogous to assuming the existence of a classical structural causal model (over the common-cause causal structure of a Bell scenario). As we have seen above, in Pearl’s semantics, given a classical structural causal model, counterfactuals always have well-defined truth values. When one argues that CFD is necessary for derivations of Bell inequalities, rather than ‘locality’ alone, one is (we surmise) implicitly assuming something like Bell’s 1964 notion of Locality, rather than Local Causality. To derive a Bell inequality, the notion of Locality indeed needs to be supplemented with something else, and the upshot is that CFD carries the same effect in this context as assuming determinism, as Bell did in 1964.

However, one may also assume Local Causality instead, as Bell did in 1976 (or indeed other assumptions [33, 44]), without assuming determinism. In this case, in causal language, one can effectively assume a Classical Causal Model, without further assuming that there exists an underlying Structural Causal Model. In this case, CFD may in general fail, or at least we fail to have the structure necessary to define Pearl’s semantics of counterfactuals. Thus, Bell inequalities can be derived without assuming CFD.

Nevertheless, one may argue (essentially as in [49]), that determinism can be derived from Local Causality plus the perfect correlations of a pure entangled state. This would then make CFD true after all. However, perfect correlations are not observable in practice in real experiments, and in any case this does not change the fact that the derivation of Bell inequalities does not require perfect correlations (and indeed this is what makes them experimentally testable!).

Clearly, CFD as defined by Skyrms [14] does indeed fail in the quantum framework presented here. This however does not imply that Bell’s theorem can be resolved merely by rejecting CFD. Indeed, CFD may fail even in a completely classical but indeterministic causal model. The violation of Bell inequalities is explained within quantum causal models not simply as due to the indeterminism of quantum theory, but something deeper. One way of thinking about it is that it is due to the failure of Reichenbach’s principle of common cause, which is in turn a consequence of the Classical Causal Markov Condition: in quantum causal models, a complete specification of the causes of an event (in the case of a Bell scenario, the preparation of the entangled state) does not in general render it uncorrelated with its non-effects – even if those non-effects are space-like separated, like the outcome of a distant instrument. Another way of thinking about it, we argue here, is that instead of the failure of CFD, the “quantumness” of quantum causal models is better captured by the fact that, as discussed in Sec. 7.3, quantum causal models allow for counterfactual dependence without causal dependence. A more detailed discussion of this point, and how it does not arise merely as an artefact of the split-node structure of quantum causal models, will be left for future work [55].

9 Generalisations and related work: quantum Bayes’ theorem

Thm. 3 and Cor. 1 show that our formalism for counterfactuals in quantum causal models (see Sec. 5) is a valid generalization of Pearl’s formalism in the classical case (see Sec. 2). In this section, we review the key assumptions of our formalism, discuss possible generalizations, and draw parallels with related work on quantum Bayesian inference.

Recall that our notion of a ‘quantum counterfactual’ in Def. 13 is evaluated with respect to a quantum structural causal model (QSM) (see Def. 10). A QSM $M_{Q}$ reproduces a given physical process operator $\sigma_{\mathbf{A}}$ over observed nodes $\mathbf{A}$, that is, $\sigma_{\mathbf{A}}$ arises from coarse-graining of ancillary (environmental) degrees of freedom in $M_{Q}$ (cf. Eq. (21)). As such, $M_{Q}$ encodes additional information that is not present in $\sigma_{\mathbf{A}}$: namely, (i) it assumes an underlying unitary process $\rho_{\mathbf{A}S|\mathbf{A}\mathbf{\Lambda}}^{U}$, and (ii) it incorporates partial knowledge about the preparation of ancillary states at exogenous nodes in the form of preparation instruments $\{\tau^{\lambda_{i}}_{\Lambda_{i}}\}_{\lambda_{i}}$ (cf. Eq. (19)), acting on an arbitrary input state. Together, this allowed us to reduce the abduction step in our formalism to classical Bayesian inference.

We remark that this situation (of a unitary background process with ancillas prepared in a fixed basis) arises naturally in the context of quantum circuits, future quantum computers, and thus supposedly in the context of future quantum AI. Nevertheless, for other use cases it might be less clear how to model our background knowledge on a physical process $\sigma_{\mathbf{A}}$ in terms of a QSM, thus prompting relaxations of the assumptions baked into Def. 10. First, one may want to drop our assumption of a unitary background process. This assumption closely resembles Pearl’s classical formalism, which models any uncertainty about a stochastic physical process as a probabilistic mixture of deterministic processes. Yet, one might argue that assuming a unitary background process is too restrictive (or perhaps even in general fundamentally unwarranted) and that one should allow for arbitrary convex decompositions of a quantum stochastic process (CPTP map). To this end, note that knowledge about stable facts [43] that lead to a preferred convex decomposition of the process operator $\sigma_{\mathbf{A}}=\sum_{\bm{\lambda}}P(\bm{\lambda})\sigma^{\bm{\lambda}}_{\mathbf{A}}$ (into valid process operators $\sigma^{\bm{\lambda}}_{\mathbf{A}}$) is all that is necessary to perform (classical) Bayesian inference (cf. Eq. (27)).
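To make the last point concrete, the following Python sketch performs the classical Bayesian update over the labels $\bm{\lambda}$ of a convex decomposition. It takes as given the prior weights $P(\bm{\lambda})$ and the likelihood of the observed evidence under each component $\sigma^{\bm{\lambda}}_{\mathbf{A}}$; how those likelihoods are obtained from the process operators is not shown here. The function name `abduct` and the example numbers are hypothetical.

```python
from typing import Dict


def abduct(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior P(lam | evidence) from the prior P(lam) and likelihood P(evidence | lam)."""
    evidence = sum(prior[lam] * likelihood[lam] for lam in prior)
    if evidence == 0.0:
        raise ValueError("observed evidence has zero probability under the model")
    return {lam: prior[lam] * likelihood[lam] / evidence for lam in prior}


# Hypothetical numbers: two equally weighted components, only one of which
# can produce the observed outcome.
posterior = abduct({"+": 0.5, "-": 0.5}, {"+": 1.0, "-": 0.0})
print(posterior)
```

With these inputs all posterior weight falls on the component compatible with the evidence, mirroring the abducted probabilities used in the examples of Sec. 7.1.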

A more radical generalization could arise by taking our information about exogenous variables to be inherently quantum. That is, without information in the form of stable facts regarding the distribution over outcomes of preparation instruments in a QSM, our knowledge about exogenous variables merely takes the form of a generic quantum state $\rho_{\bm{\Lambda}}=\rho_{\Lambda_{1}}\otimes\cdots\otimes\rho_{\Lambda_{n}}$. (Footnote 20: Note that without the extra information about exogenous instruments in a QSM, we reduce to the situation described by a “unitary process operator with inputs”, as defined in Def. 4.5 of Ref. [29] (see also Thm. 2).) In this case, inference can no longer be described by (classical) Bayes’ theorem but requires a quantum generalization. Much recent work has been devoted to finding a generalization of Bayes’ theorem to the quantum case, which has given rise to various different proposals for a quantum Bayesian inverse; see Refs. [56, 57, 58] for a recent categorical (process-diagrammatic) definition and Ref. [59] for an attempt at an axiomatic derivation. Once a definition for the Bayesian inverse has been fixed, and provided it exists (Footnote 21: Ref. [58] characterises the existence of a Bayesian inverse in the categorical setting for finite-dimensional $C^{*}$-algebras.), we can perform a generalized abduction step in Sec. 5.1 and, consequently, obtain a generalised formalism for counterfactuals. We leave a more careful analysis of counterfactuals arising in this way and their comparison to our formalism for future study.

10 Conclusion

We defined a notion of counterfactual in quantum causal models and provided a semantics to evaluate counterfactual probabilities, generalizing the three-step procedure of evaluating probabilities of counterfactuals in classical causal models due to Pearl [1]. The third level in Pearl’s ladder of causality (see Fig. 1) had thus far remained an open gap in the generalization of Pearl’s formalism to quantum causal models; here, we fill this gap.

To this end, we introduce the notion of a quantum structural causal model, which takes inspiration from Pearl’s notion of a classical structural causal model, yet differs from the latter in several ways. A quantum structural model is fundamentally probabilistic; it does not assign truth values to all counterfactuals, and in this sense violates “counterfactual definiteness”.

Despite these differences, we prove that every classical structural causal model admits an extension to a quantum structural causal model, which preserves the relevant independence conditions and yields the same probabilities for counterfactual queries arising in the classical case. Thus, quantum structural causal models and the evaluation of counterfactuals therein subsume Pearl’s classical formalism.

On the other hand, quantum structural causal models have a richer structure than their classical counterparts. We identify different types of counterfactual queries arising in quantum causal models, and explain how they are distinguished from counterfactual queries in classical causal models. Based on this distinction, we evaluate these different types of quantum counterfactual queries in the Bell scenario and show that counterfactual dependence does not generally imply causal dependence in this case. In this way, our analysis provides a new way of understanding how quantum causal models generalize Reichenbach’s principle of common cause to the quantum case [34, 21, 22]: a quantum common cause allows for counterfactual dependence without causal dependence, unlike a classical common cause.

Our work opens up several avenues for future study. Of practical importance are applications of counterfactuals in quantum technologies. For example, questions such as “Given that certain outcomes were observed at receiver nodes in a quantum network, what is the probability that different outcomes would have been observed, had there been an eavesdropper in the network?” can be relevant for security applications.

It is well-known that quantum theory violates “counterfactual definiteness” in the sense of the phenomenon often referred to as ‘quantum contextuality’ [13]. The latter has been identified as a key distinguishing feature between classical and quantum physics, as well as a resource for quantum computation [60, 61, 62, 63]. It would thus be interesting to study contextuality from the perspective of the counterfactual semantics spelled out here.

Finally, our analysis hinges on the classicality (‘stable facts’ in Ref. [43]) of background (exogenous) variables in the model, as it allows us to apply (classical) Bayes’ inference on our classical knowledge about exogenous variables. In turn, considering the possibility of our ‘prior’ knowledge about exogenous variables to be genuinely quantum motivates a generalization of Bayes’ theorem to the quantum case (see Sec. 9). We expect that combining our ideas with recent progress along those lines will constitute a fruitful direction for future research.

Acknowledgements.

The authors acknowledge financial support through grant number FQXi-RFP-1807 from the Foundational Questions Institute and Fetzer Franklin Fund, a donor advised fund of Silicon Valley Community Foundation, and ARC Future Fellowship FT180100317.

References

  • [1] J. Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.
  • [2] J. D. Fearon. Causes and Counterfactuals in Social Science: Exploring an analogy between cellular automata and historical processes, pages 39–68. Princeton University Press, 1996.
  • [3] M. Loi and M. Rodrigues. A note on the impact evaluation of public policies: the counterfactual analysis. Nov 2012. https://mpra.ub.uni-muenchen.de/id/eprint/42444.
  • [4] S. Tagini et al. Counterfactual thinking in psychiatric and neurological diseases: A scoping review. PLOS ONE, 16(2):e0246388, Feb 2021. https://doi.org/10.1371/journal.pone.0246388.
  • [5] M. Ravallion. Poverty in China since 1950: A Counterfactual Perspective. Technical report, National Bureau of Economic Research, Jan 2021. 10.3386/w28370.
  • [6] G. Woo. A counterfactual perspective on compound weather risk. Weather and Climate Extremes, 32:100314, Jun 2021. https://doi.org/10.1016/j.wace.2021.100314.
  • [7] K. Holtman. Counterfactual Planning in AGI systems. arXiv e-prints, Jan 2021. https://doi.org/10.48550/arXiv.2102.00834.
  • [8] C. Hoerl et al. Understanding Counterfactuals, Understanding Causation: Issues in Philosophy and Psychology. Oxford University Press, 2011.
  • [9] J. Collins et al. Causation and Counterfactuals. MIT Press, 2004.
  • [10] B. Black et al. COVID-19 boosters: If the US had matched Israel’s speed and take-up, an estimated 29,000 US lives would have been saved: Study compares Israel’s COVID-19 booster performance to the US. Health Affairs, 42(12):1747–1757, 2023. https://doi.org/10.1377/hlthaff.2023.00718.
  • [11] L. Vaidman. Counterfactuals in Quantum Mechanics. In Compendium of Quantum Physics, pages 132–136. Springer, 2009.
  • [12] J. S. Bell. On the Einstein-Podolsky-Rosen paradox. Physics, 1:195, Nov 1964. https://doi.org/10.1103/PhysicsPhysiqueFizika.1.195.
  • [13] S. Kochen and E. P. Specker. The problem of hidden variables in quantum mechanics. Journal of Mathematics and Mechanics, 17:59–87, 1967.
  • [14] B. Skyrms. Counterfactual definiteness and local causation. Philosophy of Science, 49(1):43–50, 1982. https://doi.org/10.1086/289033.
  • [15] A. Peres. Unperformed experiments have no results. American Journal of Physics, 46(7):745–747, 1978. https://doi.org/10.1119/1.11393.
  • [16] D. Lewis. Counterfactuals and Comparative Possibility. In IFS, pages 57–85. Springer, 1973.
  • [17] J. Pearl and D. Mackenzie. The Book of Why: the New Science of Cause and Effect. Basic Books, 2018.
  • [18] C. J. Wood and R. W. Spekkens. The lesson of causal discovery algorithms for quantum correlations: causal explanations of Bell-inequality violations require fine-tuning. New J. Phys., 17(3):033002, Mar 2015. 10.1088/1367-2630/17/3/033002.
  • [19] E. G. Cavalcanti. Classical causal models for Bell and Kochen-Specker inequality violations require fine-tuning. Phys. Rev. X, 8(2):021018, Apr 2018. https://doi.org/10.1103/PhysRevX.8.021018.
  • [20] J. C. Pearl and E. G. Cavalcanti. Classical causal models cannot faithfully explain Bell nonlocality or Kochen-Specker contextuality in arbitrary scenarios. Quantum, 5:518, Aug 2021. https://doi.org/10.22331/q-2021-08-05-518.
  • [21] M. S. Leifer and R. W. Spekkens. Towards a formulation of quantum theory as a causally neutral theory of Bayesian inference. Phys. Rev. A, 88(5), Nov 2013. https://doi.org/10.1103/PhysRevA.88.052130.
  • [22] E. G. Cavalcanti and R. Lal. On modifications of Reichenbach’s principle of common cause in light of Bell’s theorem. J. Phys. A, 47(42):424018, Oct 2014. https://iopscience.iop.org/article/10.1088/1751-8113/47/42/424018.
  • [23] Joe Henson, Raymond Lal, and Matthew F. Pusey. Theory-independent limits on correlations from generalized Bayesian networks. New Journal of Physics, 16(11):113043, November 2014. arXiv: 1405.2572 Publisher: IOP Publishing.
  • [24] Rafael Chaves, Christian Majenz, and David Gross. Information–theoretic implications of quantum causal structures. Nature Communications, 6:5766, January 2015. arXiv: 1407.3800 Publisher: Nature Publishing Group.
  • [25] Jacques Pienaar and Caslav Brukner. A graph-separation theorem for quantum causal models. New Journal of Physics, 17(7):073020, July 2015. arXiv: 1406.0430.
  • [26] F. Costa and S. Shrapnel. Quantum causal modelling. New J. Phys., 18(6):063032, Jun 2016. https://doi.org/10.1088/1367-2630/18/6/063032.
  • [27] J. M. A. Allen et al. Quantum common causes and quantum causal models. Phys. Rev. X, 7:031021, Jul 2017. https://doi.org/10.1103/PhysRevX.7.031021.
  • [28] S. Shrapnel et al. Updating the Born rule. New J. Phys., 20(5):053010, May 2018. https://doi.org/10.1088/1367-2630/aabe12.
  • [29] J. Barrett, R. Lorenz, and O. Oreshkov. Quantum causal models, Jun 2019. https://doi.org/10.48550/arXiv.1906.10726.
  • [30] D. Lewis. Causation. J. Philos., 70(17):556, Oct 1973. https://doi.org/10.2307/2025310.
  • [31] A. Shimony. Controllable and uncontrollable non-locality. In S. Kamefuchi, editor, Foundations of Quantum Mechanics in the Light of New Technology, pages 225–230. Physical Society of Japan, Tokyo, 1984.
  • [32] J. S. Bell. The theory of local beables. Epistemol. Lett., 9:11–24, Nov 1975.
  • [33] H. M. Wiseman and E. G. Cavalcanti. Causarum Investigatio and the Two Bell’s Theorems of John Bell, pages 119–142. Springer, Cham, 2017.
  • [34] H. Reichenbach. The Direction of Time, volume 65. University of California Press, 1991.
  • [35] J. F. Clauser et al. Proposed experiment to test local hidden-variable theories. Phys. Rev. Lett., 23:880–884, Oct 1969. https://doi.org/10.1103/PhysRevLett.23.880.
  • [36] A. Aspect et al. Experimental tests of realistic local theories via Bell’s theorem. Phys. Rev. Lett., 47:460–463, Aug 1981. https://doi.org/10.1103/PhysRevLett.47.460.
  • [37] M. Giustina et al. Significant-loophole-free test of Bell’s theorem with entangled photons. Phys. Rev. Lett., 115:250401, Dec 2015. https://doi.org/10.1103/PhysRevLett.115.250401.
  • [38] L. K. Shalm et al. Strong loophole-free test of local realism. Phys. Rev. Lett., 115:250402, Dec 2015. https://doi.org/10.1103/PhysRevLett.115.250402.
  • [39] T. Hoffreumon and O. Oreshkov. The multi-round process matrix. Quantum, 5:384, Jan 2021. https://doi.org/10.22331/q-2021-01-20-384.
  • [40] M. Frembs and E. G. Cavalcanti. Variations on the Choi-Jamiołkowski isomorphism. arXiv e-prints, Nov 2022. https://doi.org/10.48550/arXiv.2211.16533.
  • [41] A. Jamiołkowski. Linear transformations which preserve trace and positive semidefiniteness of operators. Rep. Math. Phys., 3(4):275–278, Dec 1972. https://doi.org/10.1016/0034-4877(72)90011-0.
  • [42] M. Araújo et al. Witnessing causal nonseparability. New J. Phys., 17(10):102001, Oct 2015. https://doi.org/10.1088/1367-2630/17/10/102001.
  • [43] A. Di Biagio and C. Rovelli. Stable facts, relative facts. Found. Phys., 51(1):30, Feb 2021. https://doi.org/10.1007/s10701-021-00429-w.
  • [44] E. G. Cavalcanti and H. M. Wiseman. Implications of local friendliness violation for quantum causality. Entropy, 23(8):925, 2021. https://doi.org/10.3390/e23080925.
  • [45] Eric G. Cavalcanti. Bell’s theorem and the measurement problem: reducing two mysteries to one? Journal of Physics: Conference Series, 701:012002, March 2016. arXiv: 1602.07404.
  • [46] J. von Kügelgen et al. Backtracking counterfactuals. arXiv e-prints, Nov 2022. https://doi.org/10.48550/arXiv.2211.00472.
  • [47] Henry Pierce Stapp. S-matrix interpretation of quantum theory. Physical Review D, 3(6):1303, 1971. https://doi.org/10.1103/PhysRevD.3.1303.
  • [48] Guy Blaylock. The EPR paradox, Bell’s inequality, and the question of locality. American Journal of Physics, 78(1):111–120, 2010. https://doi.org/10.1119/1.3243279.
  • [49] Tim Maudlin. What Bell proved: A reply to Blaylock. American Journal of Physics, 78(1):121–125, 2010. https://doi.org/10.1119/1.3243280.
  • [50] Robert B Griffiths. EPR, Bell, and quantum locality. American Journal of Physics, 79(9):954–965, 2011. https://doi.org/10.1119/1.3606371.
  • [51] Tim Maudlin. How Bell reasoned: A reply to Griffiths. American Journal of Physics, 79(9):966–970, 2011. https://doi.org/10.1119/1.3606476.
  • [52] Justo Pastor Lambare and Rodney Franco. A note on Bell’s theorem logical consistency. Foundations of Physics, 51(4):84, 2021. https://doi.org/10.1007/s10701-021-00488-z.
  • [53] Marek Żukowski and Časlav Brukner. Quantum non-locality-it ain’t necessarily so… Journal of Physics A: Mathematical and Theoretical, 47(42):424009, 2014. https://doi.org/10.1088/1751-8113/47/42/424009.
  • [54] John Stewart Bell. Epistemological Lett. 9. pages 11–24, 1976.
  • [55] Ardra Kooderi Suresh and Eric G. Cavalcanti. Counterfactuals in classical split-node causal models (in preparation).
  • [56] K. Cho and B. Jacobs. Disintegration and Bayesian inversion via string diagrams. Math. Struct. Comput. Sci., 29(7):938–971, 2019. https://doi.org/10.1017/S0960129518000488.
  • [57] T. Fritz. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Adv. Math., 370:107239, 2020. https://doi.org/10.1016/j.aim.2020.107239.
  • [58] A.J. Parzygnat and B. P. Russo. A non-commutative Bayes’ theorem. Lin. Alg. Appl., 644:28–94, Jul 2022. https://doi.org/10.1016/j.laa.2022.02.030.
  • [59] A. J. Parzygnat and F. Buscemi. Axioms for retrodiction: achieving time-reversal symmetry with a prior. arXiv e-prints, Oct 2022. https://doi.org/10.48550/arXiv.2210.13531.
  • [60] R. Raussendorf. Contextuality in measurement-based quantum computation. Phys. Rev. A, 88(2):022322, Aug 2013. https://doi.org/10.1103/PhysRevA.88.022322.
  • [61] M. Howard et al. Contextuality supplies the ‘magic’ for quantum computation. Nature, 510(7505):351–355, Jun 2014. https://doi.org/10.1038/nature13460.
  • [62] M. Frembs et al. Contextuality as a resource for measurement-based quantum computation beyond qubits. New J. Phys., 20(10):103011, Oct 2018. https://iopscience.iop.org/article/10.1088/1367-2630/aae3ad.
  • [63] M. Frembs et al. Hierarchies of resources for measurement-based quantum computation. New J. Phys., 25(1):013002, Jan 2023. https://iopscience.iop.org/article/10.1088/1367-2630/acaee2.
  • [64] M. Bataille. Quantum circuits of CNOT gates. arXiv preprint arXiv:2009.13247, Oct 2020. https://doi.org/10.48550/arXiv.2009.13247.

Appendix A Proof of Thm. 3

The proof consists of several parts: (i) we find a binary extension of a classical structural causal model (CSM), (ii) we provide a protocol that extends a (binary) CSM to a reversible one, where all functional relations are bijective, (iii) we encode classical copy operations in a quantum structural causal model (QSM) using CNOT gates. In the final part (iv), we combine (i)-(iii) to construct a QSM $M_{Q}$, which (linearly) extends a PSM $\langle M,P(\mathbf{u})\rangle$ as desired.

(i) binary encoding: every CSM has a binary extension. Let $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$ be a CSM (see Def. 3). First, we enlarge the sets $\mathbf{V},\mathbf{U}$ in $M$ to sets of cardinality a power of two. For all $i\in\{1,\cdots,n\}$, let $V^{(b)}_{i}\supseteq V_{i}$ denote a set of cardinality $|V^{(b)}_{i}|=2^{N_{i}}$, where $N_{i}=\lceil\log_{2}|V_{i}|\rceil$, and analogously for $U^{(b)}_{i}\supseteq U_{i}$. Second, we extend $\mathbf{F}$ to the enlarged sets $U^{(b)}_{i}$ and $V^{(b)}_{i}$ as follows. For every $\mathbf{F}\ni f_{i}:U_{i}\times\mathrm{Pa}_{i}\rightarrow V_{i}$, let $f^{(b)}_{i}:U^{(b)}_{i}\times\mathrm{Pa}^{(b)}_{i}\rightarrow V_{i}^{(b)}$ be the function given by

$$f^{(b)}_{i}(u_{i},p_{1},\cdots,p_{J_{i}})=\begin{cases}f_{i}(u_{i},p_{1},\cdots,p_{J_{i}})&\text{whenever }u_{i}\in U_{i},\ p_{j}\in\mathrm{Pa}_{i}\;,\\ f_{i}(*,*)&\text{otherwise.}\end{cases} \tag{77}$$

In words, $f_{i}^{(b)}$ identifies all elements in $U^{(b)}_{i}\backslash U_{i}$ and all elements in $\mathrm{Pa}^{(b)}_{i}\backslash\mathrm{Pa}_{i}$ with the same (arbitrary) element $(*,*)\in U_{i}\times\mathrm{Pa}_{i}$. It is easy to see that the causal structure of the CSM $M^{(b)}=\langle\mathbf{U}^{(b)},\mathbf{V}^{(b)},\mathbf{F}^{(b)}\rangle$ is the same as that of $M$: the extensions $U_{i}^{(b)}$ and $V_{i}^{(b)}$ of $U_{i}$ and $V_{i}$ are defined locally, and $f^{(b)}_{i}$ has the same functional form as $f_{i}$. Consequently, $M^{(b)}$ admits a (Markov) factorization of the same form as $M$ in Eq. (30).
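As an illustration of Eq. (77), the following Python sketch builds a binary extension of a toy function. The names `bits_needed`, `binary_extension` and the toy function `f` are hypothetical and not part of the formalism; for simplicity only the domain is relabelled by bit strings, and the fallback element $(*,*)$ is passed in as `default`.

```python
from itertools import product


def bits_needed(n):
    """Smallest N with 2**N >= n, i.e. ceil(log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def binary_extension(f, U, Pa, default):
    """Extend f: U x Pa -> V to bit-string domains, in the spirit of Eq. (77)."""
    U_b = list(product((0, 1), repeat=bits_needed(len(U))))
    Pa_b = list(product((0, 1), repeat=bits_needed(len(Pa))))
    # Only the first len(U) (resp. len(Pa)) bit strings label original elements.
    dec_u = dict(zip(U_b, U))
    dec_p = dict(zip(Pa_b, Pa))

    def f_b(u_bits, p_bits):
        if u_bits in dec_u and p_bits in dec_p:
            return f(dec_u[u_bits], dec_p[p_bits])
        # Every added element is identified with the fixed element (*, *).
        return f(*default)

    return f_b, U_b, Pa_b


# Toy example (assumed for illustration): three-valued U and Pa.
U = [0, 1, 2]
Pa = ["x", "y", "z"]


def f(u, p):
    return (u + Pa.index(p)) % 3


f_b, U_b, Pa_b = binary_extension(f, U, Pa, default=(0, "x"))
```

Here `U_b` and `Pa_b` each contain four bit strings, of which one is an added element that `f_b` maps to `f(0, "x")`.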

(ii) reversibility: every (binary) CSM has a (binary) reversible extension. Let $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$ be a CSM. For every $f_{i}\in\mathbf{F}$, we decompose its domain according to its pre-images $f_{i}^{-1}(v_{i})$ for all $v_{i}\in V_{i}$. In particular, let $\tilde{S}_{i}$ be a set of cardinality $|\tilde{S}_{i}|:=\max_{v_{i}\in V_{i}}\{|f_{i}^{-1}(v_{i})|\}$. Next, we label the elements in $f^{-1}_{i}(v_{i})$. We will use the lexicographic order $l_{i}:U_{i}\times\mathrm{Pa}_{i}\rightarrow\{1,\cdots,|U_{i}||\mathrm{Pa}_{i}|\}$ on (any choice of) alphabets labeling the elements in the sets $U_{i}$ and $\mathrm{Pa}_{i}$, respectively. For every $v_{i}\in V_{i}$, let $l_{v_{i}}:f^{-1}_{i}(v_{i})\rightarrow\{1,\cdots,|f^{-1}_{i}(v_{i})|\}$ be any order on $f^{-1}_{i}(v_{i})$ such that $l_{v_{i}}(x)\leq l_{v_{i}}(y):\Leftrightarrow l_{i}(x)\leq l_{i}(y)$ for all $x,y\in f^{-1}_{i}(v_{i})$. Combining orders $l_{v_{i}}$ on $f_{i}^{-1}(v_{i})$, we obtain an order on $\dot{\bigcup}_{v_{i}\in V_{i}}f^{-1}_{i}(v_{i})=U_{i}\times\mathrm{Pa}_{i}$, denoted by $l_{f_{i}}:U_{i}\times\mathrm{Pa}_{i}\rightarrow\{1,\cdots,|V_{i}||\tilde{S}_{i}|\}$. With this order, we define an extension $\tilde{f}_{i}:U_{i}\times\mathrm{Pa}_{i}\rightarrow V_{i}\times\tilde{S}_{i}$ of the form

$$\tilde{f}_{i}(u_{i},p_{1},\cdots,p_{J_{i}}):=(f_{i}(u_{i},p_{1},\cdots,p_{J_{i}}),l_{f_{i}}(u_{i},p_{1},\cdots,p_{J_{i}}))\,.$$

Let $\pi_{V_{i}}:V_{i}\times\tilde{S}_{i}\rightarrow V_{i}$ denote the projection onto the first factor $V_{i}$, i.e., $\pi_{V_{i}}(v_{i},s_{i}):=v_{i}$; clearly, $f_{i}\cong\pi_{V_{i}}\circ\tilde{f}_{i}$. (Footnote 22: $\pi_{V_{i}}$ can be interpreted as a ‘discarding operation’.)
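The labelling of pre-images can be made concrete with a small Python sketch. It assumes a toy function (logical AND) and hypothetical names `injective_extension` and `f_and`; the second output component is the position of an input within its pre-image, taken in lexicographic order, which is one possible reading of the order used above.

```python
from collections import defaultdict
from itertools import product


def injective_extension(f, U, Pa):
    """Map each input x to (f(x), label of x within the pre-image of f(x))."""
    counts = defaultdict(int)
    f_tilde = {}
    # itertools.product enumerates U x Pa in lexicographic order.
    for x in product(U, Pa):
        v = f(*x)
        counts[v] += 1
        f_tilde[x] = (v, counts[v])  # labels start at 1
    size_S = max(counts.values())  # cardinality of the largest pre-image
    return f_tilde, size_S


def f_and(u, p):
    return u & p


f_tilde, size_S = injective_extension(f_and, [0, 1], [0, 1])

# Injectivity: distinct inputs receive distinct (value, label) pairs.
is_injective = len(set(f_tilde.values())) == len(f_tilde)
# Discarding the label (projection onto the first factor) recovers f.
recovers_f = all(out[0] == f_and(*x) for x, out in f_tilde.items())
```

For AND the largest pre-image is that of 0, so `size_S` is 3, while the image of `f_tilde` has only four elements out of the six in $V_i\times\tilde{S}_i$; the extension is injective but not surjective.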

Now, $\tilde{f}_{i}$ is injective; however, it is not surjective in general. In order to obtain a bijective map, we also need to extend the domain of $f_{i}$. Let $S^{\prime}_{i}$, $T^{\prime}_{i}$ and $U^{\prime}_{i}$ be sets of cardinalities such that

$$|U^{\prime}_{i}||\mathrm{Pa}_{i}|=|T^{\prime}_{i}||U_{i}||\mathrm{Pa}_{i}|=|V_{i}||S^{\prime}_{i}|\;. \tag{78}$$

Similar to before, we denote the lexicographic order corresponding to (some choice of) alphabets on $U^{\prime}_{i}:=T^{\prime}_{i}\times U_{i}$ and $\mathrm{Pa}_{i}$ by $l_{i}$. Let $\dot{\bigcup}_{v_{i}\in V_{i}}S^{\prime}_{i}(v_{i})=U^{\prime}_{i}\times\mathrm{Pa}_{i}$ be any decomposition such that $f^{-1}_{i}(v_{i})\subseteq S^{\prime}_{i}(v_{i})$ and $|S^{\prime}_{i}(v_{i})|=|S^{\prime}_{i}|$ for all $v_{i}\in V_{i}$. Moreover, let $l_{f_{i}}:U^{\prime}_{i}\times\mathrm{Pa}_{i}\rightarrow\{1,\cdots,|V_{i}||S^{\prime}_{i}|\}$ be any order such that $l_{f_{i}}(x)\leq l_{f_{i}}(y)\Leftrightarrow l_{i}(x)\leq l_{i}(y)$ for all $x,y\in S^{\prime}_{i}(v_{i})$ and $v_{i}\in V_{i}$. Then we define $f^{\prime}_{i}:U^{\prime}_{i}\times\mathrm{Pa}_{i}\rightarrow V_{i}\times S^{\prime}_{i}$ by

$$f^{\prime}_{i}(u^{\prime}_{i},p_{1},\cdots,p_{J_{i}}):=(f_{i}(u_{i},p_{1},\cdots,p_{J_{i}}),l_{f_{i}}(u^{\prime}_{i},p_{1},\cdots,p_{J_{i}}))\;. \tag{79}$$

In this case, we have $f_{i}\circ\pi_{U_{i}}=\pi_{V_{i}}\circ f^{\prime}_{i}$. Moreover, $f^{\prime}_{i}$ is a bijection by construction.

Finally, we note that the extensions $f_{i}\rightarrow f^{\prime}_{i}$ are local: they only add degrees of freedom $T^{\prime}_{i}$ and $S^{\prime}_{i}$, which are local to the node $V_{i}$; $T^{\prime}_{i}$ is a cause of $S^{\prime}_{i}$ only, and the $S^{\prime}_{i}$ are all discarded. It follows that, up to an additional ‘sink’ node $S^{\prime}=\times_{i=1}^{n}S^{\prime}_{i}$, the CSM $M^{\prime}=\langle\mathbf{U}^{\prime},\mathbf{V}^{\prime},\mathbf{F}^{\prime}\rangle$ for $\mathbf{U}^{\prime}=\{T^{\prime}_{1}\times U_{1},\cdots,T^{\prime}_{n}\times U_{n}\}$, $\mathbf{V}^{\prime}=\{\mathbf{V},S^{\prime}\}$ and $\mathbf{F}^{\prime}=\{f^{\prime}_{1},\cdots,f^{\prime}_{n}\}$ has the same causal structure as $M$, that is, $M^{\prime}$ admits a (Markov) factorisation of the same form as $M$ in Eq. (30). Moreover, note that we can apply the above protocol to the binary extension $M^{(b)}$ of a CSM $M$, resulting in a binary, reversible CSM, which we will denote by $M^{\prime}$ also. (Footnote 23: Note that the binary encoding in Eq. (77) will generally increase the cardinality of the sink nodes $S^{\prime}_{i}$.)

(iii) copy operation. Since classical information can be copied and processed to various nodes in a CSM $M$, we need a way to encode such copying in a quantum extension $M_{Q}$. Indeed, classical information is copied in the CSM $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$ whenever there exist $f_{j},f_{j^{\prime}}\in\mathbf{F}$, $j\neq j^{\prime}$ such that $f_{j}^{-1}(V_{j})\cap f_{j^{\prime}}^{-1}(V_{j^{\prime}})\neq\emptyset$. Let $\mathrm{Ch}(V_{i}):=\{V_{j}\in\mathbf{V}\mid V_{i}\in f^{-1}_{j}(V_{j})\}\subset\mathbf{V}$ be the set containing all nodes $V_{j}\in\mathbf{V}$ that are related to $V_{i}$ via $\mathbf{F}$. Since we are only interested in proving the existence of a quantum extension $M_{Q}$ of $M$, we will content ourselves with an inefficient copy protocol: namely, we will copy all of $V_{i}$ to every node $V_{j}\in\mathrm{Ch}(V_{i})$. (Footnote 24: This copy protocol will generally increase the cardinality of variables $\mathbf{V}$ in the CSM, and is inefficient in this regard.)

The idea is to use CNOT gates to implement classical copying in a fixed basis. This requires additional ancillary (quantum) nodes in the quantum extension $M_{Q}$ of a CSM $M$, which we will denote by $\bm{\Lambda}^{\prime}$. More precisely, assume that the $V_{i}$ are quantum nodes with associated Hilbert spaces $\mathcal{H}_{V^{\mathrm{in}}_{i}},\mathcal{H}_{V^{\mathrm{out}}_{i}}$ of respective dimension $\mathrm{dim}(\mathcal{H}_{V^{\mathrm{in}}_{i}})=\mathrm{dim}(\mathcal{H}_{V^{\mathrm{out}}_{i}})=2^{N_{i}}$. (Footnote 25: By (i) and (ii), we may assume that $M$ is a binary, reversible CSM, in particular, $|V_{i}|=2^{N_{i}}$.) Further, let $\Lambda^{\prime}_{i}:=\otimes_{V_{j}\in\mathrm{Ch}(V_{i})}\Lambda^{\prime}_{ji}$, where $\Lambda^{\prime}_{ji}$ denotes a quantum node with associated Hilbert spaces $\mathcal{H}_{\Lambda_{ji}^{\prime\mathrm{in}}},\mathcal{H}_{\Lambda_{ji}^{\prime\mathrm{out}}}$ of the same dimension as $V_{i}$, that is, $\mathrm{dim}(\mathcal{H}_{\Lambda_{ji}^{\prime\mathrm{in}}})=\mathrm{dim}(\mathcal{H}_{\Lambda_{ji}^{\prime\mathrm{out}}})=2^{N_{i}}$. Then we devise a copy protocol involving a total of $|\mathrm{Ch}(V_{i})|N_{i}$ CNOT gates as follows, (Footnote 26: Here and below, we implicitly assume the individual CNOT gates in $C_{i}$ to be ‘padded’ with identities on ancillary systems not involved in the present CNOT gate. As an example, see the copy operation between three-qubit nodes in Eq. (84).)

$$C_{i}:=\prod_{V_{j}\in\mathrm{Ch}(V_{i})}\left(\prod_{k=1}^{N_{i}}\mathrm{CNOT}_{\Lambda^{\prime}_{ji,k}V_{i,k}}\right)\;, \tag{80}$$

where $\mathrm{CNOT}_{\Lambda^{\prime}_{ji,k}V_{i,k}}$ denotes the CNOT gate with control and target given by the $k$-th qubit in $V_{i}$ and $\Lambda^{\prime}_{ji}$, respectively; that is,

$$\mathrm{CNOT}_{\Lambda^{\prime}_{ji,k}V_{i,k}}(|0\rangle\otimes|q\rangle)=|q\rangle\otimes|q\rangle\;,$$

whenever $|q\rangle\in\{|0\rangle,|1\rangle\}$. We will adopt the convention that the Hilbert space $\mathcal{H}_{V_{i}^{\mathrm{in}}}$ after the operation $C_{i}$ is discarded. Note that Eq. (80) is then valid for $\mathrm{Ch}(V_{i})=\emptyset$, namely we have $C_{i}=\mathbb{I}_{V_{i}^{\mathrm{in}}}$ in this case.
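The action of a single CNOT gate in this convention (ancilla as first tensor factor and target, node qubit as second factor and control) can be checked numerically. The following NumPy sketch is only an illustration; the names `CNOT_TC`, `ket` and `copy_bit` are hypothetical.

```python
import numpy as np

# Computational basis states |0> and |1>.
ket = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}

# CNOT on (ancilla, node): the node qubit (second factor) is the control,
# the ancilla (first factor) is the target, i.e. |a, q> -> |a XOR q, q>.
CNOT_TC = np.array(
    [[1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 0]],
    dtype=float,
)


def copy_bit(q):
    """Apply the gate to |0> (ancilla) tensor |q> (node)."""
    return CNOT_TC @ np.kron(ket[0], ket[q])


# Basis states are copied: |0>|q> -> |q>|q>.
copies_basis = all(
    np.allclose(copy_bit(q), np.kron(ket[q], ket[q])) for q in (0, 1)
)

# A superposition is not cloned; the output is entangled instead.
plus = (ket[0] + ket[1]) / np.sqrt(2)
out_plus = CNOT_TC @ np.kron(ket[0], plus)
clones_plus = np.allclose(out_plus, np.kron(plus, plus))
```

The second check illustrates why the copying is classical, that is, tied to a fixed basis: for the state `plus` the gate produces a Bell-type state rather than two copies.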

We are left to show that the encoding in Eq. (80) satisfies the relations $\{\Lambda^{\prime}_{j}\nrightarrow V_{i}\}$ whenever $j\neq i$ in Def. 10 for the additional ancillary nodes $\Lambda^{\prime}_{j}$. This follows immediately from the commutation relations of CNOT gates (see Prop. 1 in [64]), together with Thm. 2.

Below, we explicitly analyze the case of two CNOT gates, copying a single qubit from a node $A$ to two children $B,C\in\mathrm{Ch}(A)$. The situation is depicted in Fig. 10.

Figure 10: Example of copy protocol as described in (iii): (a) a quantum causal model involving three single-qubit quantum nodes $\mathbf{A}=\{A,B,C\}$, (b) the qubit at node $A$ is copied onto ancillary systems $\Lambda^{\prime}_{BA}$ and $\Lambda^{\prime}_{CA}$, using two CNOT gates $\mathrm{CNOT}_{\Lambda^{\prime}_{BA}A}$ and $\mathrm{CNOT}_{\Lambda^{\prime}_{CA}A}$, (c) the copy ancillae $\Lambda^{\prime}_{BA}$ and $\Lambda^{\prime}_{CA}$ enter as additional endogenous variables in the resulting causal model.

We want to show that $\Lambda^{\prime}_{BA}$ and $\Lambda^{\prime}_{CA}$ are local influences only, that is, $\Lambda^{\prime}_{BA}$ does not signal to $C$ and $\Lambda^{\prime}_{CA}$ does not signal to $B$. Let

$$\rho_{\Lambda^{\prime}_{BA}}=\begin{bmatrix}\alpha&\beta\\ \beta^{\ast}&1-\alpha\end{bmatrix} \tag{81}$$

be a state at $\Lambda^{\prime}_{BA}$ and let

$$\rho_{A\Lambda^{\prime}_{CA}}=\begin{bmatrix}a&b&c&d\\ b^{\ast}&f&g&h\\ c^{\ast}&g^{\ast}&k&l\\ d^{\ast}&h^{\ast}&l^{\ast}&1-a-f-k\end{bmatrix} \tag{82}$$

be any bipartite state over the joint system $A$ and $\Lambda^{\prime}_{CA}$. (Footnote 27: We remark that in the QSM $M_{Q}$ constructed below, $\rho_{A\Lambda^{\prime}_{CA}}$ is in fact always a product state.) In order to check the no-influence relation ‘$\Lambda^{\prime}_{BA}\nrightarrow C$’ we evaluate

$$\rho_{C}=\mathrm{Tr}_{AB}[U(\rho_{\Lambda^{\prime}_{BA}}\otimes\rho_{A\Lambda^{\prime}_{CA}})U^{\dagger}]\;, \tag{83}$$

where $U=C_{A}$ is the copy operation in Eq. (80); in our case, it is the product of two CNOT gates,

$$C_{A}=(\mathrm{CNOT}_{\Lambda^{\prime}_{BA}A}\otimes\mathbb{I}_{\Lambda^{\prime}_{CA}})(\mathbb{I}_{\Lambda^{\prime}_{BA}}\otimes\mathrm{CNOT}_{\Lambda^{\prime}_{CA}A})\;, \tag{84}$$

and is given by

$$C_{A}=\begin{bmatrix}1&0&0&0&0&0&0&0\\ 0&1&0&0&0&0&0&0\\ 0&0&0&0&0&0&0&1\\ 0&0&0&0&0&0&1&0\\ 0&0&0&0&1&0&0&0\\ 0&0&0&0&0&1&0&0\\ 0&0&0&1&0&0&0&0\\ 0&0&1&0&0&0&0&0\end{bmatrix}\;. \tag{85}$$

By straightforward computation we obtain

$$\rho_{C}=\begin{bmatrix}a+k&b^{\ast}+l^{\ast}\\ b+l&1-a-k\end{bmatrix}\;, \tag{86}$$

which shows that $\rho_{C}$ is independent of $\rho_{\Lambda^{\prime}_{BA}}$. Similarly, one shows that $\rho_{B}$ is not affected by the matrix elements of $\rho_{\Lambda^{\prime}_{CA}}$. The argument can be extended to more than two (blocks of) CNOT gates at a node $V_{i}$ (cf. Eq. (80)) and for nodes with more than one child node.
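The independence claim can also be checked numerically. The NumPy sketch below assumes the tensor ordering $(\Lambda'_{BA}, A, \Lambda'_{CA})$, builds $C_A$ as the product in Eq. (84), and traces out the first two factors. Random input states are used, and all function names are hypothetical. The sketch tests only that the output on the third factor does not depend on $\rho_{\Lambda'_{BA}}$; it does not attempt to reproduce the individual entries of Eq. (86), which depend on ordering and labelling conventions that are assumed here.

```python
import numpy as np

I2 = np.eye(2)
# Target on the first factor, control on the second: |a, q> -> |a XOR q, q>.
CNOT_TC = np.array(
    [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]], dtype=float
)
# Control on the first factor, target on the second: |q, a> -> |q, a XOR q>.
CNOT_CT = np.array(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float
)

# Copy operation on (Lambda_BA, A, Lambda_CA), cf. Eq. (84).
C_A = np.kron(CNOT_TC, I2) @ np.kron(I2, CNOT_CT)


def random_density(dim, rng):
    """Random density matrix: positive semidefinite with unit trace."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real


def rho_C(rho_L, rho_AL):
    """State on the third factor after the copy operation, cf. Eq. (83)."""
    joint = C_A @ np.kron(rho_L, rho_AL) @ C_A.conj().T
    # Partial trace over the first two qubits.
    return np.einsum("abcabd->cd", joint.reshape(2, 2, 2, 2, 2, 2))


rng = np.random.default_rng(0)
rho_AL = random_density(4, rng)  # arbitrary state on (A, Lambda_CA)
out1 = rho_C(random_density(2, rng), rho_AL)
out2 = rho_C(random_density(2, rng), rho_AL)
independent = np.allclose(out1, out2)
```

With this ordering, `C_A` coincides with the permutation matrix in Eq. (85), and `independent` evaluates to true for any pair of input states at $\Lambda'_{BA}$.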

(iv) quantum extension: every PSM has a quantum extension. Let $\langle M^{\prime},P(\mathbf{u}^{\prime})\rangle$ be a PSM (see Def. 5) with CSM $M^{\prime}=\langle\mathbf{U}^{\prime},\mathbf{V}^{\prime},\mathbf{F}^{\prime}\rangle$ and $P(\mathbf{u}^{\prime})$ a probability distribution over exogenous variables of $M^{\prime}$. The structural relations $\mathbf{F}^{\prime}$ define independence conditions between variables $\mathbf{V}^{\prime}$. Representing these by a DAG, let $(V^{\prime}_{1},\cdots,V^{\prime}_{n})$ be an order compatible with the order of nodes in the DAG, that is, $V^{\prime}_{j}\in\mathrm{Pa}^{\prime}(V^{\prime}_{i})\Rightarrow j<i$. Compatibility of $M^{\prime}$ with the DAG, equivalently the causal Markov condition in Def. 1, reads

$$P(\mathbf{v}^{\prime})=\prod_{i=1}^{n}P(v^{\prime}_{i}\mid pa^{\prime}_{i})=\sum_{u^{\prime}_{i}}\prod_{i=1}^{n}\delta_{v^{\prime}_{i},f^{\prime}_{i}(pa^{\prime}_{i},u^{\prime}_{i})}P(u^{\prime}_{i})\;.$$

By steps (i) and (ii), we may assume that $U^{\prime}_{i},V^{\prime}_{i},S^{\prime}_{i}$ are sets of cardinality a power of two and that the $f^{\prime}_{i}:U^{\prime}_{i}\times\mathrm{Pa}^{\prime}_{i}\rightarrow V^{\prime}_{i}\times S^{\prime}_{i}$ are bijective functions. In order to construct a QSM from $\langle M^{\prime},P(\mathbf{u}^{\prime})\rangle$, we promote these classical variables to quantum nodes, by associating them with the (free) Hilbert spaces $\mathcal{H}_{U_{i}^{\prime\mathrm{in}}}$, $\mathcal{H}_{U_{i}^{\prime\mathrm{out}}}$, $\mathcal{H}_{V_{i}^{\prime\mathrm{in}}}$, $\mathcal{H}_{V_{i}^{\prime\mathrm{out}}}$, and $\mathcal{H}_{S_{i}^{\prime\mathrm{in}}}$, $\mathcal{H}_{S_{i}^{\prime\mathrm{out}}}$ over the outcome sets associated to $U^{\prime}_{i},V^{\prime}_{i},S^{\prime}_{i}$, together with a respective choice of orthonormal basis, as well as an identification of bases between input and output spaces, e.g. $\{|v^{\prime}_{i}\rangle\}_{v^{\prime}_{i}}$ in $\mathcal{H}_{V_{i}^{\prime\mathrm{in}}}\cong\mathcal{H}_{V_{i}^{\prime\mathrm{out}}}$. Moreover, since the $f^{\prime}_{i}$ are bijective, complex-linear extension yields isometries

$$\widetilde{W}_{i}:\mathcal{H}_{U_{i}^{\prime\mathrm{out}}}\otimes\mathcal{H}_{\mathrm{Pa}_{i}^{\prime\mathrm{out}}}\rightarrow\mathcal{H}_{V_{i}^{\prime\mathrm{in}}}\otimes\mathcal{H}_{S_{i}^{\prime\mathrm{in}}}\;. \tag{87}$$

Part (iii) further yields unitaries implementing local classical copy operations,

$$C_{i}:\left(\bigotimes_{V^{\prime}_{j}\in\mathrm{Ch}^{\prime}(V^{\prime}_{i})}\mathcal{H}_{\Lambda_{ji}^{\prime\mathrm{out}}}\right)\otimes\mathcal{H}_{V_{i}^{\prime\mathrm{in}}}\longrightarrow\left(\bigotimes_{V^{\prime}_{j}\in\mathrm{Ch}^{\prime}(V^{\prime}_{i})}\mathcal{H}_{\Lambda_{ji}^{\prime\mathrm{out}}}\right)\otimes\mathcal{H}_{V_{i}^{\prime\mathrm{in}}}\;, \tag{88}$$

where by our above convention, $\mathcal{H}_{S_{i}^{\prime\prime\mathrm{in}}}:=\mathcal{H}_{V_{i}^{\prime\mathrm{in}}}$. Setting $W_{i}:=C_{i}\widetilde{W}_{i}$ (see Fig. 11) and composing the $W_{i}$’s for all nodes $V^{\prime}_{i}\in V^{\prime}$ in order, we define a global isometry by (cf. footnote 26)

$$W=\prod_{i=1}^{n}W_{i}=\prod_{i=1}^{n}C_{i}\widetilde{W}_{i}\;. \tag{89}$$

Recall that, by our convention that $\mathcal{H}_{V_{i}^{\prime\mathrm{in}}}$ is discarded after the copy process, we have $C_{i}=\mathbb{I}_{V_{i}^{\prime\mathrm{in}}}$ for $\mathrm{Ch}^{\prime}(V^{\prime}_{i})=\emptyset$. Moreover, adopting the notation of the copy protocol from Eq. (80), we set $\mathrm{Pa}^{\prime\prime}_{i}:=\otimes_{V^{\prime}_{j}\in\mathrm{Pa}^{\prime}_{i}}\Lambda^{\prime}_{ij}$ as well as $V^{\prime\prime}_{i}:=\otimes_{V^{\prime}_{j}\in\mathrm{Ch}^{\prime}(V^{\prime}_{i})}\Lambda^{\prime}_{ji}$ for all $V^{\prime}_{i}$ with $\mathrm{Pa}^{\prime}(V^{\prime}_{i})\neq\emptyset$ in the input and output quantum nodes of the isometry $\widetilde{W}_{i}:\mathcal{H}_{U^{\prime\mathrm{out}}_{i}}\otimes\mathcal{H}_{\mathrm{Pa}^{\prime\prime\mathrm{out}}_{i}}\rightarrow\mathcal{H}_{V^{\prime\prime\mathrm{in}}_{i}}\otimes\mathcal{H}_{S^{\prime\mathrm{in}}_{i}}$ and consequently for $W_{i}:\mathcal{H}_{\Lambda^{\prime\prime\mathrm{out}}_{i}}\otimes\mathcal{H}_{\mathrm{Pa}^{\prime\prime\mathrm{out}}_{i}}\rightarrow\mathcal{H}_{V^{\prime\prime\mathrm{in}}_{i}}\otimes\mathcal{H}_{S^{\prime\prime\mathrm{in}}_{i}}$, where $\Lambda^{\prime\prime}_{i}$ consists of exogenous variables $U^{\prime}_{i}$ and $\Lambda^{\prime}_{i}$, and $S^{\prime\prime}_{i}$ consists of $S^{\prime}_{i}$ and $V^{\prime}_{i}$ (see Fig. 11).
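The step from a bijective function to an isometry by linear extension on basis states, as used for Eq. (87), can be sketched in NumPy as follows. The toy bijection `f_prime` and the helper `permutation_matrix` are hypothetical and chosen only for illustration; a non-injective map is included to show where the construction fails.

```python
import numpy as np
from itertools import product


def permutation_matrix(func, domain, codomain):
    """Linear extension of func on basis states: |x> -> |func(x)>."""
    col = {x: j for j, x in enumerate(domain)}
    row = {y: i for i, y in enumerate(codomain)}
    W = np.zeros((len(codomain), len(domain)))
    for x in domain:
        W[row[func(x)], col[x]] = 1.0
    return W


# Basis labels for U x Pa and V x S (two bits each in this toy example).
pairs = list(product((0, 1), repeat=2))


def f_prime(x):
    """Toy bijection (u, p) -> (u XOR p, p); the second output is the 'sink'."""
    u, p = x
    return (u ^ p, p)


def f_and(x):
    """Non-injective map (u, p) -> (u AND p, 0), for comparison."""
    u, p = x
    return (u & p, 0)


W_bij = permutation_matrix(f_prime, pairs, pairs)
W_and = permutation_matrix(f_and, pairs, pairs)

# A bijection gives an isometry (here a unitary); a non-injective map does not.
bij_is_isometry = np.allclose(W_bij.T @ W_bij, np.eye(4))
and_is_isometry = np.allclose(W_and.T @ W_and, np.eye(4))
```

In this example `bij_is_isometry` is true and `and_is_isometry` is false, which is the reason the functions are first made reversible in step (ii).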

Note that the ordering of unitaries in Eq. (89) maintains the partial ordering of the directed acyclic graph corresponding to $M^{\prime}$. Consequently, $\rho^{W}_{\mathbf{V}^{\prime\prime}S^{\prime\prime}|\mathbf{V}^{\prime\prime}\bm{\Lambda}^{\prime\prime}}$ admits the factorisation in Eq. (32).

[Diagram (cf. Fig. 11): the isometry $\widetilde{W}_{i}$ takes as inputs $U^{\prime}_{i}$ (composed of $U_{i}$ and $T^{\prime}_{i}$) and $\mathrm{Pa}^{\prime\prime}_{i}:=\bigotimes_{V^{\prime}_{j}\in\mathrm{Pa}^{\prime}(V^{\prime}_{i})}\Lambda^{\prime}_{ij}$, and outputs $V_{i}^{\prime\mathrm{in}}$ and $S^{\prime}_{i}$; the copy operation $C_{i}$ then acts on $V_{i}^{\prime\mathrm{in}}$ together with the ancillae $\Lambda^{\prime}_{i}=\bigotimes_{V^{\prime}_{j}\in\mathrm{Ch}^{\prime}(V^{\prime}_{i})}\Lambda^{\prime}_{ji}$, with outputs $V_{i}^{\prime\mathrm{in}}$ and $V^{\prime\prime}_{i}$ (the ancillae $\Lambda^{\prime}_{i}$).]

WiW_{i}Λi′′\Lambda^{\prime\prime}_{i}UiU^{\prime}_{i}Λi\Lambda^{\prime}_{i}Pai′′\mathrm{Pa}^{\prime\prime}_{i}Vi′′V^{\prime\prime}_{i}Si′′S^{\prime\prime}_{i}ViinV^{\prime\mathrm{in}}_{i}SiS^{\prime}_{i}
Figure 11: By Eq. (89), the global isometry $W$ in the quantum extension $M_Q$ of a PSM $\langle M,P(\mathbf{u})\rangle$ factorises, according to the causal Markov condition for $M$, into local isometries $W_i:\mathcal{H}_{\mathrm{Pa}''_i}\otimes\mathcal{H}_{\Lambda''_i}\rightarrow\mathcal{H}_{V''_i}\otimes\mathcal{H}_{S''_i}$ for every quantum node $V''_i$, of the form on the right. Every local isometry $W_i$ further decomposes as an isometry $\widetilde{W}_i$, given as the complex linear extension of the bijective functions in the reversible, binary CSM $M'$ (see steps (i) and (ii)), followed by a unitary $C_i$ copying the node $V'_i$ to all its children nodes (see step (iii)). In terms of the copy ancillae $\Lambda'_{ji}$, we therefore have $V''_i:=\bigotimes_{V'_j\in\mathrm{Ch}'(V'_i)}\Lambda'_{ji}$ and $\mathrm{Pa}''_i:=\bigotimes_{V'_j\in\mathrm{Pa}'(V'_i)}\Lambda'_{ij}$.

Finally, we encode the distribution $P(\mathbf{u})$ over exogenous nodes $U_i\in\mathbf{U}$ of $M$. First, for step (i), we trivially extend $P(\mathbf{u})$ to $P^{(b)}(\mathbf{u}^{(b)})$ by setting $P^{(b)}(\mathbf{u})=P(\mathbf{u})$ for $\mathbf{u}\in\mathbf{U}$ and $P^{(b)}(\mathbf{u})=0$ for $\mathbf{u}\in\mathbf{U}^{(b)}\backslash\mathbf{U}$. Second, for step (ii), we define $P(\mathbf{u}')$ as the product distribution of $P^{(b)}(\mathbf{u}^{(b)})$ and the uniform distribution over $\mathbf{T}'=\times_{i=1}^{n}T'_i$, i.e., $P(\mathbf{u}')=P(\mathbf{u},\mathbf{t}')=\frac{1}{|\mathbf{T}'|}P^{(b)}(\mathbf{u})$. Third, step (iii) requires us to initialise the input state of the copy ancillae $\Lambda'_i$. Taken together, we define instruments

$$\{\tau^{u_i}_{\Lambda''_i}\}_{u_i}=\left\{\left(|0\rangle\langle 0|_{\Lambda'_i}\otimes P(u^{(b)}_i)\,|u^{(b)}_i\rangle\langle u^{(b)}_i|_{U^{(b)}_i}\otimes\frac{1}{|T'_i|}\,|t'_i\rangle\langle t'_i|_{T'_i}\right)^{T}\otimes\mathbb{I}_{\Lambda_i^{in}}\right\}_{u_i}\;. \tag{90}$$
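As a rough numerical illustration of the three encoding steps for a single exogenous node, the following sketch zero-pads a distribution to a binary-sized alphabet, takes its product with a uniform distribution over $T'_i$, and assembles the bracketed operator of Eq. (90). The function names, the array layout, and the choice of a qubit copy ancilla are assumptions made for illustration only; the identity factor on $\Lambda_i^{in}$ is omitted.

```python
import numpy as np

def extend_to_binary(p):
    """Step (i): zero-pad a distribution so its alphabet size is a power of two."""
    p = np.asarray(p, dtype=float)
    n_bits = max(1, (len(p) - 1).bit_length())
    p_b = np.zeros(2 ** n_bits)
    p_b[: len(p)] = p  # P^(b)(u) = P(u) on U, and 0 on the added values
    return p_b

def product_with_uniform(p_b, t_size):
    """Step (ii): joint P(u, t') = P^(b)(u) / |T'|, indexed as [u, t']."""
    return np.outer(p_b, np.full(t_size, 1.0 / t_size))

def projector(dim, k):
    """Rank-one projector |k><k| in the preferred (computational) basis."""
    e = np.zeros((dim, dim))
    e[k, k] = 1.0
    return e

def instrument_element(p_b, u, t, t_size, anc_dim=2):
    """Bracketed operator of Eq. (90) for fixed labels (u, t).

    Hypothetical layout: copy ancilla (assumed dimension anc_dim) ⊗ U^(b) ⊗ T'.
    The operator is diagonal, so the transpose acts trivially here.
    """
    op = np.kron(
        projector(anc_dim, 0),
        np.kron(p_b[u] * projector(len(p_b), u), projector(t_size, t) / t_size),
    )
    return op.T

p_b = extend_to_binary([0.5, 0.3, 0.2])
joint = product_with_uniform(p_b, t_size=2)
total = sum(
    np.trace(instrument_element(p_b, u, t, 2)) for u in range(4) for t in range(2)
)
```

In this toy case the traces of the elements sum to one, which reflects that the weights $P(u^{(b)}_i)/|T'_i|$ form a normalised distribution.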

In summary, we thus obtain a QSM $M_Q=\langle(\mathbf{V}'',\bm{\Lambda}'',S''),\rho^{W}_{\mathbf{V}''S''\mid\mathbf{V}''\bm{\Lambda}''},(\{\tau^{u_1}_{\Lambda''_1}\}_{u_1},\cdots,\{\tau^{u_n}_{\Lambda''_n}\}_{u_n})\rangle$, which by construction reproduces the classical probability distribution defined by $M$ and $P(\mathbf{u})$ in Eq. (31), when evaluated on instruments corresponding to projective measurements in the ($|\mathrm{Ch}(V_i)|$-times copied) preferred bases, $\{\tau^{v_i}_{V''_i}\}_{v_i}=\{|v_i\rangle\langle v_i|\otimes|v_i\rangle\langle v_i|\}_{v_i}$. This completes the proof.
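The role of the copied preferred-basis measurement can be checked in a minimal toy case. The sketch below is not the construction of the proof: it assumes a single binary node with two children, and it assumes that the copy unitary acts like a pair of CNOT gates onto ancillae initialised in $|0\rangle$. Under these assumptions, measuring $|v\rangle\langle v|\otimes|v\rangle\langle v|$ on the two copies returns the classical probability of $v$.

```python
import numpy as np

def copy_and_measure(p):
    """Copy a classical bit with distribution p onto two |0> ancillae, then
    return the probabilities of the outcomes |v><v| ⊗ |v><v| on the copies."""
    p = np.asarray(p, dtype=float)
    zero = np.array([[1.0, 0.0], [0.0, 0.0]])
    # Register order (hypothetical): source node ⊗ ancilla 1 ⊗ ancilla 2.
    rho = np.kron(np.diag(p), np.kron(zero, zero))

    # Assumed copy unitary: |v, a, b> -> |v, a XOR v, b XOR v> (two CNOTs).
    copy = np.zeros((8, 8))
    for v in range(2):
        for a in range(2):
            for b in range(2):
                src = 4 * v + 2 * a + b
                dst = 4 * v + 2 * (a ^ v) + (b ^ v)
                copy[dst, src] = 1.0
    rho_out = copy @ rho @ copy.T

    probs = []
    for v in range(2):
        proj_v = np.zeros((2, 2))
        proj_v[v, v] = 1.0
        # Identity on the source, projectors on the two copies.
        effect = np.kron(np.eye(2), np.kron(proj_v, proj_v))
        probs.append(float(np.trace(effect @ rho_out)))
    return probs

probs = copy_and_measure([0.7, 0.3])
```

Mismatched outcomes on the two copies have probability zero in this toy model, so the two listed outcomes already exhaust the distribution.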