
Covariance Decomposition as a Universal Limit on Correlations in Networks

Salman Beigi1 and Marc-Olivier Renou2
1School of Mathematics, Institute for Research in Fundamental Sciences (IPM), P.O. Box 19395-5746, Tehran, Iran
2ICFO-Institut de Ciencies Fotoniques, 08860 Castelldefels (Barcelona), Spain
Abstract

Parties connected to independent sources through a network can generate correlations among themselves. Notably, the space of feasible correlations for a given network depends on the physical nature of the sources and the measurements performed by the parties. In particular, quantum sources give access to nonlocal correlations that cannot be generated classically. In this paper, we derive a universal limit on correlations in networks in terms of their covariance matrix. We show that in a network satisfying a certain condition, the covariance matrix of any feasible correlation can be decomposed as a sum of positive semidefinite matrices, each term of which corresponds to a source in the network. Our result is universal in the sense that it holds in any physical theory of correlations in networks, including the classical and quantum theories and all generalized probabilistic theories.

1 Introduction

The causal inference problem, i.e., determining the causes of observed phenomena, is a fundamental problem in many disciplines. Given two categories of variables, observed and latent, it asks to deduce the causation patterns between the two sets, where the former can be measured while the latter are hidden. More specifically, the problem is whether the statistics of the observed variables can be modeled with a given causal pattern [29, 18].

In this paper, we restrict ourselves to network causation patterns, which are determined by a bipartite graph with one part for observed variables and one part for latent variables [10, 6]. We can think of the vertices associated to latent variables as sources that produce signals independent of each other. We can think of the vertices associated to observed variables as parties who receive signals from their adjacent sources. After receiving these signals, each party performs a measurement whose output determines the value of the corresponding observed variable. For instance, if the signals are classical random variables, the measurement is a function of these variables. Here, we emphasize that the distributions of the signals (latent variables) are mutually independent, and that an observed variable equals the outcome of the measurement applied to the adjacent sources. See Figure 1 for an example of a network. Now, given such a network with n parties and a joint probability distribution p(a_{1},\dots,a_{n}) on their observed variables, the question is: can p(a_{1},\dots,a_{n}) be modeled as the output distribution of measurements of sources distributed in the network?

Quantum physics brings fundamentally new aspects to the causal inference problem [13, 11, 2]. The point is that the sources may produce quantum signals, i.e., multipartite entangled states, and parties may perform quantum measurements on the received signals. In this case, the answer to the causal inference problem may differ from its answer in the completely classical (local) setting. Indeed, already in the primary example of Bell’s scenario we observe nonlocal correlations in the quantum setting that cannot be modeled classically [5]; this is Bell’s nonlocality. It is the elementary manifestation of a broader phenomenon, called network nonlocality, in which quantum nonlocal correlations admitting a model in some network cannot be modeled classically in the same network.

The existence of quantum models, which give access to more correlations than classical ones, raises the question of the ultimate limits on correlations in networks. In the case of Bell’s scenario, this ultimate limit is given by the no-signaling principle [21]. We imagine that any correlation satisfying this principle can be obtained through ‘measurements’ of ‘states’ in some hypothetical Generalised Probabilistic Theory (GPT) [4, 26, 14, 28]. Such theories generalize quantum physics and formalize the Popescu-Rohrlich box [20, 25]. However, the no-signaling principle alone is insufficient to define the ultimate limits of correlations in networks [12, 3]. In the following, we define these limits through the causality principle (see Figure 1). This resolution, combined with the inflation technique [32], allows us to derive a universal limit on correlations in networks.

Refer to caption
Figure 1: This is an example of a network with two sources S_{\alpha},S_{\beta} and three parties A_{1},A_{2},A_{3}. Each party, after receiving the signal from its adjacent sources, performs a measurement to determine her output. We denote the output of party A_{i} by a_{i}, and the joint distribution of outputs by p(a_{1},a_{2},a_{3}).
First causality law: Imagine that this network is part of a larger network, yet the structure of the network in the neighborhood of A_{2} is the same. Then the marginal distribution p(a_{2}) in the larger network remains the same.
Second causality law: Since in this network A_{1} and A_{3} do not have a common source, we have that p(a_{1},a_{3})=p(a_{1})p(a_{3}).

Recently, Kale et al. [15] proved a limit on the set of distributions p(a_{1},\dots,a_{n}) that can be modeled in a network in the classical setting. They showed that the covariance matrix of functions f_{1}(a_{1}),\dots,f_{n}(a_{n}) of the output variables associated to such a distribution p(a_{1},\dots,a_{n}) can be decomposed as a sum of positive semidefinite matrices whose terms correspond to sources in the network. Later, the same result was proven by Åberg et al. [1] in the quantum setting, and discussed in Kraft et al. [16]. The main contribution of our paper is that this decomposition continues to hold, in a large class of networks, in any underlying physical theory satisfying the aforementioned causality principle. More precisely, we prove that in any such physical theory, for local scalar functions f_{1},\dots,f_{n} of the outputs of the parties, their covariance matrix can be decomposed into a sum of positive semidefinite terms that are associated to the sources of the network.

In the rest of this section, after giving some preliminary definitions, we present the statement of our main result and explain its proof techniques.

1.1 Main result

In this paper, we adopt Dirac’s ket-bra notation. We let [d]:=\{1,\dots,d\} and denote the computational basis of the d-dimensional complex vector space \mathbb{C}^{d} by \{|1\rangle,\dots,|d\rangle\}=\{|i\rangle:\,i\in[d]\}. The complex conjugate of x\in\mathbb{C} is \bar{x}. Capital letters represent matrices, and the (i,j)-th entry of a matrix M is denoted by M_{ij}. For two matrices M,N of the same size, M\circ N is the Schur or Hadamard product of M and N, that is, the matrix with entries

(M\circ N)_{ij}=M_{ij}\cdot N_{ij}.

M\succeq 0 indicates that M is positive semidefinite. We represent the Kronecker delta function by \delta_{k,\ell}.
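A fact about this notation worth recalling is the Schur product theorem: the Schur product of two positive semidefinite matrices is again positive semidefinite. A quick numerical illustration (the matrices below are arbitrary random choices, not from the paper):

```python
import numpy as np

# Two small PSD matrices: each is of the form G^T G, hence positive semidefinite.
rng = np.random.default_rng(0)
G1, G2 = rng.normal(size=(2, 3, 3))
M = G1.T @ G1
N = G2.T @ G2

# Schur (Hadamard) product: entrywise multiplication, (M ∘ N)_{ij} = M_{ij} N_{ij}.
S = M * N

# Schur product theorem: the Hadamard product of PSD matrices is PSD.
eigvals = np.linalg.eigvalsh(S)
print(eigvals.min() >= -1e-10)  # True
```

Here `*` is numpy's entrywise product, matching the definition of M\circ N above.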

Networks.

A network \mathcal{N} is a bipartite graph whose vertex set consists of sources and parties. The sources are denoted by S_{\alpha} for \alpha=1,\dots,m, and the parties are denoted by A_{i} for i=1,\dots,n. S_{\alpha}\to A_{i} (and sometimes \alpha\to i) indicates that the source S_{\alpha} is connected to the party A_{i}. We assume throughout this paper that the networks do not have isolated vertices, meaning that each source is adjacent to at least one party, and each party is adjacent to at least one source. Moreover, we assume that there are no two sources in the network whose sets of adjacent parties are comparable under inclusion, since otherwise one of them is redundant and can be merged with the other one.

Output distribution.

In the network \mathcal{N}, each source S_{\alpha} sends signals to its adjacent parties, and each party A_{i}, after receiving signals from its adjacent sources, performs a measurement and outputs the result a_{i}. The joint distribution of these outputs is denoted by p(a_{1},\dots,a_{n}).

Underlying physical theory.

In this paper, we do not make any assumption about the nature of signals and measurements. Hence, we consider any correlation which can be obtained in any Generalized Probabilistic Theory (GPT), i.e., from any hypothetical model for correlations in networks beyond classical and quantum physics.

The causal structure of the Bell scenario can be derived from the no-signaling principle alone. This approach fails in general networks. Hence, we need to accept the network causal structure. Here, we assume the causality principle, from which two laws follow. By the first causality law, the marginal distribution of the outputs of a group of parties depends only on the sources adjacent to them, and not on the structure of the network beyond those sources. By the second causality law, two groups of parties who do not share a common source are uncorrelated, and their marginal distribution factorizes. We think of these two causality laws as the only generic restrictions on GPTs in networks.¹ See Figure 1.

¹Our first causality law follows from the no-signaling principle, and our second causality law is called the independence principle in [12, 3]. Thus, as an alternative, we can also take the No-Signaling and Independence principles (called NSI [12, 3]) instead of the causality principle. See [19] for a discussion of these different approaches.

We also assume that we can make independent and identical copies of the sources and parties of a network \mathcal{N}. That is, we assume that we can repeat the process (experiment) under which the sources produce their signals and the parties perform their measurements. For instance, for a source S_{\alpha} of \mathcal{N} that emits some signal, we can make an independent copy S_{\alpha^{\prime}} of it that emits the same signal. We emphasize that we do not clone this signal; we only repeat the process under which it is produced. We give more details on these assumptions and their consequences in Subsection 2.1.

Covariance matrix.

Suppose that each party A_{i} applies a real or complex function f_{i}(a_{i}) to her output. Associated to the distribution p(a_{1},\dots,a_{n}) and these functions, we consider the covariance matrix \mathcal{C}=\mathcal{C}(f_{1},\dots,f_{n}). This is an n\times n matrix whose (i,j)-th entry equals

\mathcal{C}_{ij}=\mathsf{Cov}(f_{i},f_{j})=\mathsf{E}[\bar{f}_{i}f_{j}]-\mathsf{E}[\bar{f}_{i}]\cdot\mathsf{E}[f_{j}]=\mathsf{E}\big{[}(\bar{f}_{i}-\mathsf{E}[\bar{f}_{i}])\cdot(f_{j}-\mathsf{E}[f_{j}])\big{]},

where the expectations are with respect to the output distribution p(a1,,an)p(a_{1},\dots,a_{n}). It is well-known and can easily be verified that the covariance matrix is always positive semidefinite.
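As a sanity check of this definition, the following sketch computes \mathcal{C} for a hypothetical joint distribution on three binary outputs and complex-valued local functions, and verifies positive semidefiniteness; the distribution and the functions are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
# An arbitrary (hypothetical) joint distribution p(a1, a2, a3) on binary outputs.
p = rng.random((2, 2, 2))
p /= p.sum()

# Local functions f_i(a_i), complex-valued for illustration.
f = [np.array([1.0, -1.0]), np.array([0.0, 1.0j]), np.array([2.0, 3.0])]

outcomes = [(a1, a2, a3) for a1 in (0, 1) for a2 in (0, 1) for a3 in (0, 1)]
E = lambda g: sum(p[a] * g(a) for a in outcomes)  # expectation under p

n = 3
C = np.empty((n, n), dtype=complex)
for i in range(n):
    for j in range(n):
        # C_ij = E[conj(f_i) f_j] - E[conj(f_i)] E[f_j]
        C[i, j] = (E(lambda a: np.conj(f[i][a[i]]) * f[j][a[j]])
                   - E(lambda a: np.conj(f[i][a[i]])) * E(lambda a: f[j][a[j]]))

# A covariance matrix is always Hermitian and positive semidefinite.
print(np.linalg.eigvalsh(C).min() >= -1e-9)  # True
```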

In this paper, we are interested in certain decompositions of covariance matrices.

Definition 1 (\mathcal{N}-compatible matrix decomposition).

Let \mathcal{N} be a network with sources S_{\alpha}, \alpha=1,\dots,m, and parties A_{i}, i=1,\dots,n. For any source S_{\alpha}, let \mathcal{L}_{\alpha} be the set of n\times n matrices M_{\alpha} such that

  • M_{\alpha} is positive semidefinite,

  • The (i,j)-th entry of M_{\alpha} is non-zero only if \alpha\to i and \alpha\to j.

Then we say that a positive semidefinite n\times n matrix M admits an \mathcal{N}-compatible matrix decomposition if there are M_{\alpha}\in\mathcal{L}_{\alpha}, \alpha=1,\dots,m, such that M=\sum_{\alpha}M_{\alpha}.

For an example of a network-compatible matrix decomposition, consider the network \mathcal{N} of Figure 1. As mentioned in the caption of this figure, by the independence principle p(a_{1},a_{3})=p(a_{1})p(a_{3}). This means that \mathcal{C}_{13}=\bar{\mathcal{C}}_{31}=\mathsf{Cov}(f_{1},f_{3})=0. Then an \mathcal{N}-compatible matrix decomposition for the covariance matrix \mathcal{C} takes the form

\mathcal{C}=\begin{pmatrix}\mathsf{Var}(f_{1})&\mathsf{Cov}(f_{1},f_{2})&0\\ \mathsf{Cov}(f_{2},f_{1})&\mathsf{Var}(f_{2})&\mathsf{Cov}(f_{2},f_{3})\\ 0&\mathsf{Cov}(f_{3},f_{2})&\mathsf{Var}(f_{3})\end{pmatrix}=\begin{pmatrix}\ast&\ast&0\\ \ast&\ast&0\\ 0&0&0\end{pmatrix}+\begin{pmatrix}0&0&0\\ 0&\ast&\ast\\ 0&\ast&\ast\end{pmatrix},

where the two matrices on the right-hand side are positive semidefinite and belong to \mathcal{L}_{\alpha} and \mathcal{L}_{\beta}, respectively.
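For concreteness, the following sketch exhibits such a decomposition for a hypothetical covariance matrix of the Figure 1 network (the numbers are illustrative): the variance of f_{2} is split between the two sources so that each block is positive semidefinite.

```python
import numpy as np

# Hypothetical covariance matrix for the network of Figure 1 (C_13 = 0).
C = np.array([[1.0, 0.5, 0.0],
              [0.5, 2.0, 0.8],
              [0.0, 0.8, 1.0]])

# Split Var(f_2) between the two sources: the alpha block is PSD iff its
# (2,2) entry is at least C_12^2 / C_11, and similarly for the beta block.
v_a = C[0, 1] ** 2 / C[0, 0]
v_b = C[1, 2] ** 2 / C[2, 2]
assert v_a + v_b <= C[1, 1]  # a decomposition exists for this particular C

M_alpha = np.array([[C[0, 0], C[0, 1], 0.0],
                    [C[1, 0], v_a,     0.0],
                    [0.0,     0.0,     0.0]])
M_beta = C - M_alpha  # supported on parties {2, 3}, as required

print(np.allclose(M_alpha + M_beta, C))            # True
print(np.linalg.eigvalsh(M_alpha).min() >= -1e-9)  # True
print(np.linalg.eigvalsh(M_beta).min() >= -1e-9)   # True
```

Note that not every split of \mathsf{Var}(f_{2}) works; the choice above makes the \alpha block singular and leaves the rest to the \beta block.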

In this paper, we prove that the covariance matrix \mathcal{C}(f_{1},\dots,f_{n}) defined above admits an \mathcal{N}-compatible matrix decomposition under the minimal aforementioned assumptions on the underlying physical theory. To prove our result we need to restrict to a subclass of networks first introduced in [22].

Definition 2 (No double common-source network).

A network \mathcal{N} is called a No Double Common-Source (NDCS) network if for any two parties A_{i}\neq A_{j}, there is at most one source S_{\alpha} adjacent to both of them.

Observe that a network in which all sources are bipartite (i.e., each source is adjacent to exactly two parties) is automatically NDCS. (Recall that we assume there is no redundant source in the network.) We can now formally state the main result of this paper.

Theorem 3 (Main result).

Let \mathcal{N} be an NDCS network with sources S_{\alpha}, \alpha=1,\dots,m, and parties A_{i}, i=1,\dots,n. Consider signals emitted by the sources and measurements performed by the parties in an arbitrary underlying physical theory that satisfies the no-signaling and independence principles. Suppose that this results in an output joint distribution p(a_{1},\dots,a_{n}). Let f_{i}(a_{i}) be a function of the output of party A_{i}. Then the covariance matrix \mathcal{C}(f_{1},\dots,f_{n}) admits an \mathcal{N}-compatible matrix decomposition.

In [15] and [1] the above theorem was proven for the classical and quantum theories, without the NDCS assumption and for vector-valued functions beyond scalar ones. Nevertheless, Theorem 3 establishes the validity of the covariance decomposition in the box world [28, 26] as well as in any GPT [4, 14].

Note that this theorem is tight, in the sense that given an NDCS network \mathcal{N} and a positive semidefinite matrix M that admits an \mathcal{N}-compatible matrix decomposition, there exists an output distribution of the network whose covariance matrix is M. Indeed, letting M=\sum_{\alpha}M_{\alpha} be an \mathcal{N}-compatible matrix decomposition of M, assume that source \alpha sends a multivariate Gaussian distribution with covariance matrix M_{\alpha} to its adjacent parties (each coordinate to its associated party). Then, if each party outputs the sum of all received (one-dimensional) Gaussian signals, the joint output distribution is a Gaussian distribution whose covariance matrix is M.
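This Gaussian construction is easy to simulate. The sketch below uses a hypothetical decomposition M=M_{\alpha}+M_{\beta} for a network with sources \alpha\to\{A_{1},A_{2}\} and \beta\to\{A_{2},A_{3}\} (the matrices are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical N-compatible decomposition M = M_alpha + M_beta for a network
# with sources alpha -> {A1, A2} and beta -> {A2, A3}.
M_alpha = np.array([[1.0, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 0.0]])
M_beta = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.7], [0.0, 0.7, 1.0]])
M = M_alpha + M_beta

# Each source emits a centered Gaussian with covariance M_alpha (resp. M_beta),
# and each party outputs the sum of the signals it receives.
N = 200_000
signals = (rng.multivariate_normal(np.zeros(3), M_alpha, size=N)
           + rng.multivariate_normal(np.zeros(3), M_beta, size=N))

# The empirical covariance of the outputs approaches M.
print(np.allclose(np.cov(signals.T), M, atol=0.05))  # True
```

The singular covariance matrices are handled by sampling a degenerate Gaussian supported on the coordinates of the source's adjacent parties.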

Proof ideas.

Our main conceptual idea in proving Theorem 3 is the so-called inflation technique [32] (see Subsection 2.1). Using non-fanout inflations, we consider certain expansions of the network \mathcal{N} obtained by adding copies of the sources and parties and connecting them based on rules similar to those in \mathcal{N}. Then we consider the covariance matrix associated to the inflated network. As a covariance matrix, it is again positive semidefinite. This fact imposes some conditions on the original covariance matrix. We then show that these conditions imply that the covariance matrix admits an \mathcal{N}-compatible matrix decomposition.

From a more technical point of view, we first note that, fixing a network \mathcal{N}, the space of covariance matrices associated to \mathcal{N} forms a convex cone (see Definition 4). On the other hand, the space of matrices that admit an \mathcal{N}-compatible matrix decomposition is also a convex cone. To prove Theorem 3 we need to show that these two convex cones coincide. To this end, based on the theory of dual cones (see Subsection 2.2), it suffices to show that the associated dual cones are equal. Next, to prove equality in the dual picture, we use the inflation technique as discussed above. Putting these together, the theorem is proven when \mathcal{N} contains only bipartite sources (see Theorem 8). For general NDCS networks we also need the theory of embezzlement, first introduced in [30] in the context of entanglement theory (see Subsection 2.3).

2 Main tools

In this section we explain the three main tools we use in the proof of Theorem 3, namely, non-fanout inflations, the theory of dual cones, and the theory of embezzlement.

2.1 Non-fanout inflation

Refer to caption
Figure 2: The network on the right is a non-fanout inflation of the triangle network on the left. Here, parties A_{i}^{(1)},A_{i}^{(2)} are two copies of A_{i}, for i=1,2,3. Moreover, sources \alpha^{(1)},\alpha^{(2)} are two independent copies of \alpha, and similarly for \beta,\gamma. We note that, e.g., the party A_{1}^{(1)} is adjacent to copies \beta^{(1)},\gamma^{(1)} of \beta,\gamma, respectively, which are adjacent to A_{1} in the original network. Thus, A_{1}^{(1)} receives signals of the same types as the ones received by A_{1} in the triangle, and can behave exactly the same as A_{1}; she can perform the same measurement on the received signals. According to the setup of the proof of Lemma 10, this inflation corresponds to the sign function \epsilon(\alpha)=\epsilon(\gamma)=+1 and \epsilon(\beta)=-1.

The inflation technique [32] is a method to obtain constraints over the output correlations produced in a network 𝒩\mathcal{N}. Here, we give a formal definition of a restricted class of inflations called non-fanout inflations.

Consider a network \mathcal{N} with sources S_{\alpha}, \alpha=1,\dots,m, and parties A_{i}, i=1,\dots,n, outputting a_{1},\dots,a_{n} with joint distribution p(a_{1},\dots,a_{n}). In the inflated network of order d\geq 2, which we denote by \widetilde{\mathcal{N}}, we consider d copies of each source S_{\alpha}, which we denote by S_{\alpha}^{(1)},\dots,S_{\alpha}^{(d)}, and d copies of each party A_{i}, which we denote by A_{i}^{(1)},\dots,A_{i}^{(d)}. Let S_{\alpha} and A_{i} be an adjacent source and party in \mathcal{N}. We assume that their associated copies are connected in \widetilde{\mathcal{N}} as follows. Fix some permutation \pi_{i}^{\alpha} on \{1,\dots,d\}. Then we assume that S_{\alpha}^{(k)} is adjacent to A_{i}^{(\pi_{i}^{\alpha})^{-1}(k)} for any 1\leq k\leq d. Observe that in \widetilde{\mathcal{N}}, any party A_{i}^{(k)} is adjacent to the same number of sources, and of the same types, as in \mathcal{N}. Similarly, any source S_{\alpha}^{(k)} in \widetilde{\mathcal{N}} is adjacent to the same number of parties, and of the same types, as in \mathcal{N}. Therefore, sources and parties in \widetilde{\mathcal{N}} can behave similarly to the sources and parties in the original network by producing the same signals and performing the same measurements. The output of A_{i}^{(k)} is denoted by a_{i}^{(k)}, and the joint distribution of the outputs is denoted by

p\Big{(}a_{1}^{(1)},\dots,a_{1}^{(d)},a_{2}^{(1)},\dots,a_{2}^{(d)},\dots\dots,a_{n}^{(1)},\dots,a_{n}^{(d)}\Big{)}. (1)

For an example of a non-fanout inflation see Figure 2.
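The bookkeeping of a non-fanout inflation can be sketched in code. The triangle network and the random choice of permutations below are illustrative; the check at the end confirms that every party copy sees exactly the same source types as its original, as described above.

```python
import random

# Triangle network: three bipartite sources (an illustrative example).
parties = [1, 2, 3]
sources = {"alpha": (2, 3), "beta": (1, 3), "gamma": (1, 2)}

def inflate(d, seed=0):
    """Order-d non-fanout inflation: for each adjacent (source, party) pair pick
    a permutation pi of {0,...,d-1}; copy S^(k) connects to copy A^(pi^{-1}(k))."""
    rng = random.Random(seed)
    edges = []  # ((source, k), (party, copy)) adjacencies of the inflated network
    for name, adj in sources.items():
        perms = {i: rng.sample(range(d), d) for i in adj}   # pi_i^alpha
        for i in adj:
            inv = {perms[i][l]: l for l in range(d)}        # (pi_i^alpha)^{-1}
            for k in range(d):
                edges.append(((name, k), (i, inv[k])))
    return edges

edges = inflate(d=2)

# Every party copy is adjacent to exactly one copy of each of its original sources.
for i in parties:
    for c in range(2):
        types = sorted(s for (s, _), (j, cc) in edges if (j, cc) == (i, c))
        assert types == sorted(s for s, adj in sources.items() if i in adj)
print("degrees and source types preserved")
```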

In this paper, we assume that the underlying physical theory satisfies causality in the network, which is defined through the following two laws:

First causality law.

The marginal distribution of the outputs of a group of parties depends only on the structure of the network near those parties. For instance, suppose that A_{i}^{(k)},A_{j}^{(k^{\prime})} share a source S_{\alpha}^{(\ell)} in \widetilde{\mathcal{N}} and that S_{\alpha}^{(\ell)} is the unique such source. This means that S_{\alpha}\to A_{i},A_{j} in \mathcal{N}. Then the first law states that the marginal distribution of the outputs of A_{i}^{(k)},A_{j}^{(k^{\prime})} in \widetilde{\mathcal{N}} coincides with that of A_{i},A_{j} in \mathcal{N}, i.e., p\big{(}a_{i}^{(k)},a_{j}^{(k^{\prime})}\big{)}=p(a_{i},a_{j}). The point is that the correlation between parties A_{i}^{(k)},A_{j}^{(k^{\prime})} in \widetilde{\mathcal{N}} is created in the very same way as the one between parties A_{i},A_{j} in \mathcal{N}.

As a consequence of this law, if A_{i},A_{j} compute functions f_{i},f_{j} on their outputs, and similarly A_{i}^{(k)},A_{j}^{(k^{\prime})} compute the identical functions f_{i}^{(k)},f_{j}^{(k^{\prime})}, respectively, then we have \mathsf{Cov}(f_{i}^{(k)},f_{j}^{(k^{\prime})})=\mathsf{Cov}(f_{i},f_{j}).

Second causality law.

If two groups of parties do not share any common source, then their marginal distribution factorizes. For instance, if A_{i}^{(k)},A_{j}^{(k^{\prime})} do not share any source in \widetilde{\mathcal{N}} (which may be the case even if A_{i},A_{j} share a source in \mathcal{N}), then the marginal distribution p\big{(}a_{i}^{(k)},a_{j}^{(k^{\prime})}\big{)} factorizes as p\big{(}a_{i}^{(k)},a_{j}^{(k^{\prime})}\big{)}=p\big{(}a_{i}^{(k)}\big{)}p\big{(}a_{j}^{(k^{\prime})}\big{)}=p(a_{i})p(a_{j}), where the second equality follows from the first law. This, in particular, means that if f_{i}^{(k)} and f_{j}^{(k^{\prime})} are functions of the outputs of A_{i}^{(k)},A_{j}^{(k^{\prime})}, then we have \mathsf{Cov}(f_{i}^{(k)},f_{j}^{(k^{\prime})})=0.

Importantly, these two laws and inflation are more than a method to find limitations on correlations. They provide the definition of the ultimate limits of correlations in networks, and hence should be satisfied by any GPT in a network (see [7, 19]). We emphasize that we only consider non-fanout inflations, meaning that we do not clone signals. In the classical theory signals can be cloned, so, e.g., a bipartite source in \mathcal{N} may become tripartite in \widetilde{\mathcal{N}}. Indeed, the inflation technique with unbounded fanout exactly characterizes distributions in classical networks [17]. Here, as we consider arbitrary theories, we restrict only to non-fanout inflations. Such inflations have also been considered in [12].

2.2 Cones and duality

One of the main tools that we use to prove our main theorem is the theory of dual cones [8]. Here, restricting to cones of real or complex self-adjoint (Hermitian) matrices, we review the basic definitions and properties. Let us first introduce the Hilbert-Schmidt inner product on the space of matrices, given by \langle X,Y\rangle:=\text{\rm tr}(X^{\dagger}Y), where X^{\dagger} is the adjoint of X. Cones and dual cones are defined as follows:

Definition 4.

A subset \mathcal{K} of self-adjoint n\times n matrices is called a convex cone if

  • For X\in\mathcal{K} and r\geq 0 we have rX\in\mathcal{K},

  • For any X_{1},X_{2}\in\mathcal{K} we have X_{1}+X_{2}\in\mathcal{K}.

Moreover, the dual cone of \mathcal{K} is defined by \mathcal{K}^{*}:=\{Y\,|\,\langle X,Y\rangle\geq 0,~\forall X\in\mathcal{K}\}.

An important example of a convex cone is the set of positive semidefinite matrices, which is self-dual.
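Self-duality of the PSD cone can be probed numerically: \text{tr}(XY)\geq 0 whenever X,Y\succeq 0, while any non-PSD Y is separated from the cone by a rank-one PSD witness built from an eigenvector with negative eigenvalue. A small sketch (the matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def rand_psd(n):
    G = rng.normal(size=(n, n))
    return G @ G.T  # G G^T is always positive semidefinite

# One direction of self-duality: <X, Y> = tr(X Y) >= 0 for all PSD X, Y.
ok = all(np.trace(rand_psd(4) @ rand_psd(4)) >= 0 for _ in range(1000))
print(ok)  # True

# Conversely, a non-PSD Y is separated from the cone by some PSD X:
# take X = v v^T for an eigenvector v of Y with negative eigenvalue.
Y = np.diag([1.0, -2.0])
v = np.array([[0.0], [1.0]])
print(np.trace((v @ v.T) @ Y))  # -2.0 < 0, witnessing that Y is not PSD
```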

In the proof of Theorem 3 we will use the following basic properties of dual cones, whose proofs are left for Appendix A.

Lemma 5.

[8] The following hold for both the real and complex scalar fields:

  • (i) For any cone \mathcal{K}, we have \mathcal{K}^{*}=(\,\overline{\mathcal{K}}\,)^{*}, where \overline{\mathcal{K}} is the closure of \mathcal{K}. Moreover, the dual cone \mathcal{K}^{*} is convex and closed.

  • (ii) For any two cones \mathcal{L}\subseteq\mathcal{K} we have \mathcal{K}^{*}\subseteq\mathcal{L}^{*}.

  • (iii) For any closed convex cone \mathcal{K} we have (\mathcal{K}^{*})^{*}=\mathcal{K}.

  • (iv) For any closed convex cones \mathcal{K}_{1},\mathcal{K}_{2}, the intersection \mathcal{K}_{1}\cap\mathcal{K}_{2} is a closed convex cone.

  • (v) For any closed convex cones \mathcal{K}_{1},\mathcal{K}_{2}, their sum \mathcal{K}_{1}+\mathcal{K}_{2}=\big{\{}X_{1}+X_{2}\,|\,X_{1}\in\mathcal{K}_{1},X_{2}\in\mathcal{K}_{2}\big{\}} is a convex cone (but not necessarily closed).

  • (vi) For any closed convex cones \mathcal{K}_{1},\mathcal{K}_{2} we have (\mathcal{K}_{1}\cap\mathcal{K}_{2})^{*}=\overline{\mathcal{K}_{1}^{*}+\mathcal{K}_{2}^{*}}.

  • (vii) For any closed convex cones \mathcal{K}_{1},\mathcal{K}_{2} we have (\mathcal{K}_{1}+\mathcal{K}_{2})^{*}=\mathcal{K}_{1}^{*}\cap\mathcal{K}_{2}^{*}.

2.3 Embezzlement

Embezzlement is another pivotal idea in our proof of Theorem 3. Entanglement embezzlement is a compelling notion in quantum information theory, first introduced in [30]. In the following we present the construction of the universal embezzling family used in this paper, but without referring to entanglement. For any integer R define

|\mu_{R}\rangle:=\frac{1}{\sqrt{\chi_{R}}}\sum_{r=1}^{R}\frac{1}{\sqrt{r}}|r\rangle, (2)

where \chi_{R}=\sum_{r=1}^{R}r^{-1} is the R-th harmonic number, which serves as a normalization factor.

Lemma 6.

[30] For any \varepsilon>0, integer d, and sufficiently large R the following holds. Let |\phi\rangle=\sum_{j=1}^{d}c_{j}|j\rangle be a normalized d-dimensional vector satisfying c_{j}\geq 0 for all j. Then there exists a permutation \pi:[dR]\to[dR] such that

\langle\mu_{R}|P_{\pi}\,(|\phi\rangle\otimes|\mu_{R}\rangle)\geq 1-\varepsilon.

Here we identified [d]\times[R] with [dR], e.g., via (j,r)\mapsto(j-1)R+r, and P_{\pi} is the permutation matrix associated with \pi, given by P_{\pi}|j\rangle\otimes|r\rangle=|\pi((j-1)R+r)\rangle. Note that by abuse of notation we consider |\mu_{R}\rangle as a vector in both \mathbb{C}^{R} and \mathbb{C}^{dR} (by padding it with zero coordinates).

For the sake of completeness, we give the proof of this lemma in Appendix B.
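The content of Lemma 6 can also be checked numerically for small cases. The sketch below uses the sorting permutation: arranging the coordinates of |\phi\rangle\otimes|\mu_{R}\rangle in decreasing order to match the decreasing coordinates of |\mu_{R}\rangle, which by the rearrangement inequality maximizes the overlap over all permutations. The vector |\phi\rangle is an arbitrary example.

```python
import numpy as np

def mu(R):
    """|mu_R>: coordinates proportional to 1/sqrt(r), r = 1..R, as in Eq. (2)."""
    v = 1.0 / np.sqrt(np.arange(1.0, R + 1))
    return v / np.linalg.norm(v)

phi = np.array([0.6, 0.8])  # an arbitrary normalized vector with c_j >= 0

overlaps = []
for R in (10, 1000, 100_000):
    prod = np.kron(phi, mu(R))                  # |phi> ⊗ |mu_R>
    best = np.sort(prod)[::-1]                  # the sorting permutation P_pi
    target = np.concatenate([mu(R), np.zeros(len(prod) - R)])  # padded |mu_R>
    overlaps.append(float(target @ best))
    print(R, overlaps[-1])                      # overlap approaches 1 with R
```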

We now generalize the above lemma to the case where the coordinates of |\phi\rangle are not necessarily non-negative. For this generalization we will use another class of states. For an integer T, let \omega=e^{2\pi i/T} be a T-th root of unity and define

|\vartheta_{T}\rangle=\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\omega^{t}|t\rangle. (3)
Lemma 7.

For any \varepsilon>0, integer d, and sufficiently large R,T the following holds. Let |\phi\rangle=\sum_{j=1}^{d}c_{j}|j\rangle be an arbitrary normalized d-dimensional vector. Then there exists a permutation \pi:[TdR]\to[TdR] such that

(\langle\vartheta_{T}|\otimes\langle\mu_{R}|)P_{\pi}\,(|\vartheta_{T}\rangle\otimes|\phi\rangle\otimes|\mu_{R}\rangle)\geq 1-\varepsilon.

Here, as in the previous lemma, we identify [T]\times[d]\times[R] with [TdR] via a fixed map, and by abuse of notation we consider |\vartheta_{T}\rangle\otimes|\mu_{R}\rangle as a vector in both \mathbb{C}^{T}\otimes\mathbb{C}^{R} and \mathbb{C}^{TdR} (by padding it with zero coordinates).

Proof.

Let c_{j}=b_{j}e^{2\pi i\theta_{j}} with 0\leq\theta_{j}<1 and b_{j}\geq 0. Then for any j there exists an integer 0\leq r_{j}<T such that

\big{|}\theta_{j}-\frac{r_{j}}{T}\big{|}\leq\frac{1}{T}. (4)

Let Q be the permutation matrix acting on \mathbb{C}^{T} that shifts the basis vectors by 1:

Q|t\rangle=|(t+1)\mod T\rangle,\qquad 1\leq t\leq T.

Then we have Q|\vartheta_{T}\rangle=\omega^{-1}|\vartheta_{T}\rangle. Next, let

\widetilde{Q}=\sum_{j=1}^{d}Q^{r_{j}}\otimes|j\rangle\langle j|.

Note that \widetilde{Q} is a permutation matrix acting on \mathbb{C}^{T}\otimes\mathbb{C}^{d}. Then we have

\widetilde{Q}\,|\vartheta_{T}\rangle\otimes|\phi\rangle=\sum_{j=1}^{d}c_{j}Q^{r_{j}}|\vartheta_{T}\rangle\otimes|j\rangle=\sum_{j=1}^{d}\omega^{-r_{j}}c_{j}|\vartheta_{T}\rangle\otimes|j\rangle=\sum_{j=1}^{d}b_{j}e^{2\pi i(\theta_{j}-r_{j}/T)}|\vartheta_{T}\rangle\otimes|j\rangle.

Therefore, letting |\hat{\phi}\rangle=\sum_{j=1}^{d}b_{j}|j\rangle and using (4), for sufficiently large T, the vector \widetilde{Q}\,|\vartheta_{T}\rangle\otimes|\phi\rangle is arbitrarily close to |\vartheta_{T}\rangle\otimes|\hat{\phi}\rangle. On the other hand, by Lemma 6, there is a permutation matrix P acting on |\hat{\phi}\rangle\otimes|\mu_{R}\rangle such that for sufficiently large R, the vector P|\hat{\phi}\rangle\otimes|\mu_{R}\rangle is arbitrarily close to |\mu_{R}\rangle. Putting these together, we find that for sufficiently large R,T, the vector (I\otimes P)(\widetilde{Q}\otimes I)|\vartheta_{T}\rangle\otimes|\phi\rangle\otimes|\mu_{R}\rangle is arbitrarily close to |\vartheta_{T}\rangle\otimes|\mu_{R}\rangle. We are done since (I\otimes P)(\widetilde{Q}\otimes I) is a product of permutation matrices, and hence a permutation matrix itself.

3 Bipartite-source networks

In this section, as a warm-up and to convey some of our ideas, we prove Theorem 3 for networks all of whose sources are bipartite and whose functions f_{i} are real. Such a network can be visualized as a graph with nodes representing parties and edges representing sources. Thus, for such networks, a source S_{\alpha} adjacent to two parties A_{i},A_{j} is denoted by S_{ij}, and sometimes by \alpha=(i,j).

In the following we restate Theorem 3 for bipartite-source networks and real functions.

Theorem 8 (Main result, bipartite-source networks).

Let \mathcal{N} be a network with parties A_{i}, i=1,\dots,n, and with sources S_{\alpha}, \alpha=1,\dots,m, that are all bipartite. Consider signals emitted by the sources and measurements performed by the parties in an arbitrary underlying physical theory that satisfies the no-signaling and independence principles. Suppose that this results in an output joint distribution p(a_{1},\dots,a_{n}). Let f_{i}(a_{i}) be a real function of the output of party A_{i}. Then the covariance matrix \mathcal{C}(f_{1},\dots,f_{n}) admits an \mathcal{N}-compatible matrix decomposition.

Our proof of this theorem proceeds in two steps. First, based on non-fanout inflations, we show in Lemma 10 that the Schur product of the covariance matrix \mathcal{C}(f_{1},\dots,f_{n}) with any sign matrix (to be defined) is positive semidefinite. Then, using the theory of dual cones, we show in Lemma 11 that any matrix with the aforementioned property admits an \mathcal{N}-compatible matrix decomposition.

Definition 9 (Sign matrix).

Consider a network 𝒩\mathcal{N} all of whose sources are bipartite. A sign function ϵ\epsilon on 𝒩\mathcal{N} is a function that assigns ϵ(α){±1}\epsilon(\alpha)\in\{\pm 1\} to any source SαS_{\alpha} of 𝒩\mathcal{N}. Then a sign matrix Γϵ=(γij)\Gamma_{\epsilon}=(\gamma_{ij}) associated to ϵ\epsilon is an n×nn\times n symmetric matrix such that

γij={1i=j,ϵ(α)ij&αi,j.\displaystyle\gamma_{ij}=\begin{cases}1&\qquad i=j,\\ \epsilon(\alpha)&\qquad i\neq j~{}\&~{}\alpha\to i,j.\end{cases} (5)

Note that if parties iji\neq j do not have a common source, then the entry γij\gamma_{ij} is unconstrained and can take any value.

We can now state our first lemma.

Lemma 10.

Let 𝒩\mathcal{N} be a network with parties A1,,AnA_{1},\dots,A_{n} and with bipartite sources. Let fi(ai)f_{i}(a_{i}) be an arbitrary real function of the output of AiA_{i}. Then for any sign matrix Γϵ\Gamma_{\epsilon}, the Schur product 𝒞(f1,,fn)Γϵ\mathcal{C}(f_{1},\dots,f_{n})\circ\Gamma_{\epsilon} is positive semidefinite.

Note that although the entries γij\gamma_{ij} of a sign matrix Γϵ\Gamma_{\epsilon} for which i,ji,j do not have a common source are not uniquely determined in terms of ϵ\epsilon, the Schur product 𝒞(f1,,fn)Γϵ\mathcal{C}(f_{1},\dots,f_{n})\circ\Gamma_{\epsilon} depends only on ϵ\epsilon (by the independence principle, the corresponding entries in 𝒞(f1,,fn)\mathcal{C}(f_{1},\dots,f_{n}) vanish, forcing those entries in 𝒞(f1,,fn)Γϵ\mathcal{C}(f_{1},\dots,f_{n})\circ\Gamma_{\epsilon} to be zero).

Proof.

To prove this lemma we use the inflation technique discussed in Subsection 2.1. Consider the following non-fanout inflation of 𝒩\mathcal{N}. For any party AiA_{i} in 𝒩\mathcal{N} consider two parties Ai(1),Ai(2)A_{i}^{(1)},A_{i}^{(2)} in the inflated network 𝒩~\widetilde{\mathcal{N}}. Then for any source α=(i,j)\alpha=(i,j) in 𝒩\mathcal{N} consider two sources α(1),α(2)\alpha^{(1)},\alpha^{(2)} in 𝒩~\widetilde{\mathcal{N}}. If ϵ(α)=+1\epsilon(\alpha)=+1, we connect one of them to Ai(1),Aj(1)A_{i}^{(1)},A_{j}^{(1)} and the other one to Ai(2),Aj(2)A_{i}^{(2)},A_{j}^{(2)}. If ϵ(α)=1\epsilon(\alpha)=-1, then we connect one of them to Ai(1),Aj(2)A_{i}^{(1)},A_{j}^{(2)} and the other one to Ai(2),Aj(1)A_{i}^{(2)},A_{j}^{(1)}. See Figure 2 for an example of a sign function and its associated inflation.

Now suppose that party Ai(k)A_{i}^{(k)} computes function fif_{i} of her output, and let 𝒞~\widetilde{\mathcal{C}} be the covariance matrix of these 2n2n functions, that is a (2n)×(2n)(2n)\times(2n) positive semidefinite matrix. Suppose that we index rows and columns of 𝒞~\widetilde{\mathcal{C}} by iki_{k}, 1in1\leq i\leq n and k=1,2k=1,2, representing party Ai(k)A_{i}^{(k)}. Then as mentioned in Subsection 2.1 entries of 𝒞~\widetilde{\mathcal{C}} can be computed as follows:

  • 𝒞~ik,i=δk,𝖵𝖺𝗋(fi)\widetilde{\mathcal{C}}_{i_{k},i_{\ell}}=\delta_{k,\ell}\mathsf{Var}(f_{i}),

  • 𝒞~ik,j=0\widetilde{\mathcal{C}}_{i_{k},j_{\ell}}=0 if AiAjA_{i}\neq A_{j} do not have a common source in 𝒩\mathcal{N},

  • 𝒞~ik,j=δk,𝖢𝗈𝗏(fi,fj)\widetilde{\mathcal{C}}_{i_{k},j_{\ell}}=\delta_{k,\ell}\mathsf{Cov}(f_{i},f_{j}) if α=(i,j)\alpha=(i,j) is a source in 𝒩\mathcal{N} and ϵ(α)=+1\epsilon(\alpha)=+1,

  • 𝒞~ik,j=δ3k,𝖢𝗈𝗏(fi,fj)\widetilde{\mathcal{C}}_{i_{k},j_{\ell}}=\delta_{3-k,\ell}\mathsf{Cov}(f_{i},f_{j}) if α=(i,j)\alpha=(i,j) is a source in 𝒩\mathcal{N} and ϵ(α)=1\epsilon(\alpha)=-1.

Indeed, with the convention that the summation below runs over all ordered pairs (i,j)(i,j) sharing a source, as well as the diagonal pairs i=ji=j for which we set 𝖢𝗈𝗏(fi,fi)=𝖵𝖺𝗋(fi)\mathsf{Cov}(f_{i},f_{i})=\mathsf{Var}(f_{i}) and ϵ(i,i)=+1\epsilon(i,i)=+1, the covariance matrix 𝒞~\widetilde{\mathcal{C}} can be written as

𝒞~=α=(i,j)𝖢𝗈𝗏(fi,fj)|ij|Tϵ(α),\widetilde{\mathcal{C}}=\sum_{\alpha=(i,j)}\mathsf{Cov}(f_{i},f_{j})|i\rangle\langle j|\otimes T_{\epsilon(\alpha)}, (6)

where

T+1=(1001),T1=(0110).T_{+1}=\begin{pmatrix}1&0\\ 0&1\end{pmatrix},\qquad\qquad T_{-1}=\begin{pmatrix}0&1\\ 1&0\end{pmatrix}.

Next, introducing the Hadamard matrix

H=12(1111),H=\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\ 1&-1\end{pmatrix},

we find that

(IH)𝒞~(IH)=α=(i,j)𝖢𝗈𝗏(fi,fj)|ij|(100ϵ(α)).(I\otimes H)\cdot\widetilde{\mathcal{C}}\cdot(I\otimes H^{\dagger})=\sum_{\alpha=(i,j)}\mathsf{Cov}(f_{i},f_{j})|i\rangle\langle j|\otimes\begin{pmatrix}1&0\\ 0&\epsilon(\alpha)\end{pmatrix}. (7)

Recall that 𝒞~\widetilde{\mathcal{C}} is positive semidefinite. Then the matrix on the right-hand side of (7), and any of its principal submatrices, is also positive semidefinite. This means that the submatrix consisting of rows and columns indexed by i2i_{2}, 1in1\leq i\leq n, which is equal to

α=(i,j)𝖢𝗈𝗏(fi,fj)ϵ(α)|ij|=𝒞Γϵ,\sum_{{\alpha}=(i,j)}\mathsf{Cov}(f_{i},f_{j})\epsilon(\alpha)|i\rangle\langle j|=\mathcal{C}\circ\Gamma_{\epsilon},

is positive semidefinite. ∎
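
To illustrate Lemma 10, consider the triangle network with classical ±1 sources, where each party outputs the sum of the two signals it receives; then Var(f_i)=2 and Cov(f_i,f_j)=1. The following numpy sketch (a toy check of ours, not part of the proof) verifies that the Schur product of this covariance matrix with every sign matrix is positive semidefinite.

```python
import numpy as np
from itertools import product

# Triangle network: parties A1, A2, A3; one source S_ij per pair (Ai, Aj).
# Classical model: each source emits an independent +-1 bit s_ij, and party
# Ai outputs f_i = sum of its two received bits. Exact covariances:
# Var(f_i) = 2 and Cov(f_i, f_j) = Var(s_ij) = 1.
C = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

def sign_matrix(e12, e13, e23):
    """Sign matrix Gamma_eps of Definition 9 for the triangle network."""
    return np.array([[1.0, e12, e13],
                     [e12, 1.0, e23],
                     [e13, e23, 1.0]])

# Lemma 10: C o Gamma_eps is positive semidefinite for every sign function.
for eps in product([+1, -1], repeat=3):
    schur = C * sign_matrix(*eps)      # Schur (entrywise) product
    assert np.linalg.eigvalsh(schur).min() > -1e-12, eps
print("all 8 sign matrices pass")
```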

We now show that any matrix satisfying the condition of Lemma 10 admits an 𝒩\mathcal{N}-compatible matrix decomposition.

Lemma 11.

Let 𝒩\mathcal{N} be a network with parties A1,,AnA_{1},\dots,A_{n} and with bipartite sources. Consider an n×nn\times n positive semidefinite matrix MM such that Mij=0M_{ij}=0 if AiAjA_{i}\neq A_{j} do not share a source. Then MM admits an 𝒩\mathcal{N}-compatible matrix decomposition if and only if M^\widehat{M} is positive semidefinite where

M^ij={Miii=j,|Mij|ij,\displaystyle\widehat{M}_{ij}=\begin{cases}M_{ii}&\quad i=j,\\ -|M_{ij}|&\quad i\neq j,\end{cases}

is the comparison matrix of MM.

We note that this lemma was first proven in [24] in the context of coherence theory, in the special case where the underlying graph is complete (i.e., every two parties share a source). Moreover, the relevance of this lemma to the causal inference problem was pointed out in [16]. This lemma also has applications in entanglement theory [27]. We generalize it in Lemma 14. Here we present a proof different from those of [24, 27].
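
As a concrete illustration of Lemma 11 (our own numerical sketch), the all-ones 3×3 matrix is positive semidefinite, yet its comparison matrix has a negative eigenvalue; by the lemma it therefore admits no triangle-compatible decomposition. This recovers the fact that three pairwise perfectly correlated, nonconstant outputs cannot be produced in the triangle network.

```python
import numpy as np

def comparison_matrix(M):
    """M-hat: keep the diagonal, negate absolute values off the diagonal."""
    H = -np.abs(M).astype(float)
    np.fill_diagonal(H, np.diag(M).real)
    return H

# The all-ones matrix is positive semidefinite ...
M = np.ones((3, 3))
assert np.linalg.eigvalsh(M).min() > -1e-12

# ... but its comparison matrix (= 2I - J) has eigenvalues {-1, 2, 2}, so by
# Lemma 11 it admits no triangle-compatible decomposition.
print(np.linalg.eigvalsh(comparison_matrix(M)).min())   # close to -1.0
```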

Proof.

We prove this lemma using the theory of dual cones. We first introduce two sets of positive semidefinite matrices as follows:

  • We let 𝒦\mathcal{K} be the set of n×nn\times n positive semidefinite matrices MM such that Mij=0M_{ij}=0 if AiAjA_{i}\neq A_{j} do not have a common source in 𝒩\mathcal{N}, and M^0\widehat{M}\succeq 0.

  • We let \mathcal{L} be the set of positive semidefinite n×nn\times n matrices that admit an 𝒩\mathcal{N}-compatible matrix decomposition. Note that using the notation we introduced in Definition 1 we have =αα\mathcal{L}=\sum_{\alpha}\mathcal{L}_{\alpha}.

The statement of the lemma says that 𝒦=\mathcal{K}=\mathcal{L}. As shown in Appendix C it is not hard to verify that both 𝒦,\mathcal{K},\mathcal{L} are closed convex cones. Then using Lemma 5, to establish 𝒦=\mathcal{K}=\mathcal{L} it suffices to prove that 𝒦\mathcal{L}\subseteq\mathcal{K} and 𝒦\mathcal{L}^{*}\subseteq\mathcal{K}^{*}.

𝒦\mathcal{L}\subseteq\mathcal{K}:

For any MM\in\mathcal{L} there is a decomposition M=αMαM=\sum_{\alpha}M_{\alpha} with MααM_{\alpha}\in\mathcal{L}_{\alpha}. We note that for any 2×22\times 2 positive semidefinite matrix XX we have X^0\widehat{X}\succeq 0. This means that M^α0\widehat{M}_{\alpha}\succeq 0 for any α\alpha. Therefore, M^=αM^α\widehat{M}=\sum_{\alpha}\widehat{M}_{\alpha} is positive semidefinite and M𝒦M\in\mathcal{K}.

𝒦\mathcal{L}^{*}\subseteq\mathcal{K}^{*}:

We need to show that for any XX\in\mathcal{L}^{*} and M𝒦M\in\mathcal{K} we have X,M=tr(XM)0\langle X,M\rangle=\text{\rm tr}(X^{\dagger}M)\geq 0. To this end, first remark that by Lemma 5, we have =αα\mathcal{L}^{*}=\bigcap_{\alpha}\mathcal{L}_{\alpha}^{*}. On the other hand, for α=(i,j)\alpha=(i,j), the cone α\mathcal{L}_{\alpha}^{*} is the set of matrices whose α\alpha-block, i.e., the principal submatrix consisting of rows and columns indexed by i,ji,j, is positive semidefinite. Therefore, XX\in\mathcal{L}^{*} means that for any α=(i,j)\alpha=(i,j) we have |Xij|XiiXjj|X_{ij}|\leq\sqrt{X_{ii}X_{jj}}. Then we have

tr(XM)\displaystyle\text{\rm tr}\left(X^{\dagger}M\right) =iMiiXii+α=(i,j)(MijX¯ij+MjiX¯ji)\displaystyle=\sum_{i}M_{ii}X_{ii}+\sum_{\alpha=(i,j)}\big{(}M_{ij}\bar{X}_{ij}+M_{ji}\bar{X}_{ji}\big{)}
iMiiXii2α=(i,j)|MijXij|\displaystyle\geq\sum_{i}M_{ii}X_{ii}-2\sum_{\alpha=(i,j)}|M_{ij}X_{ij}|
iMiiXii2α=(i,j)|Mij|XiiXjj\displaystyle\geq\sum_{i}M_{ii}X_{ii}-2\sum_{\alpha=(i,j)}|M_{ij}|\sqrt{X_{ii}X_{jj}}
v|M^|v\displaystyle\geq\langle v|\,\widehat{M}\,|v\rangle
0,\displaystyle\geq 0,

where |v|v\rangle is the vector with coordinates Xii\sqrt{X_{ii}}, 1in1\leq i\leq n. Here the last inequality holds since by assumption M𝒦M\in\mathcal{K} and M^0\widehat{M}\succeq 0. ∎
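
The dual step above can also be checked numerically. The sketch below (ours, for the triangle network, where every pair of parties shares a source) samples M∈𝒦 by making its comparison matrix diagonally dominant, hence positive semidefinite by Gershgorin's circle theorem, samples X∈ℒ* by enforcing |X_ij| ≤ √(X_ii X_jj), and verifies tr(XM) ≥ 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # triangle network: every pair of parties shares a source

for _ in range(200):
    # M in K: build M so that its comparison matrix c*I - |A| is diagonally
    # dominant, hence positive semidefinite by Gershgorin's circle theorem.
    A = rng.standard_normal((n, n)); A = A + A.T; np.fill_diagonal(A, 0.0)
    c = np.abs(A).sum(axis=1).max() + rng.random()
    M = c * np.eye(n) + A

    # X in L*: every 2x2 alpha-block PSD, i.e. |X_ij| <= sqrt(X_ii X_jj).
    d = rng.random(n) + 0.1
    s = rng.uniform(-1.0, 1.0, (n, n)); s = (s + s.T) / 2
    X = s * np.sqrt(np.outer(d, d)); np.fill_diagonal(X, d)

    assert np.trace(X @ M) >= -1e-10  # the inequality chain of Lemma 11
print("ok")
```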

We can now give the proof of Theorem 8.

Proof of Theorem 8.

By Lemma 11 we need to show that 𝒞^\widehat{\mathcal{C}} is positive semidefinite where 𝒞=𝒞(f1,,fn)\mathcal{C}=\mathcal{C}(f_{1},\dots,f_{n}). Since by assumption the entries of 𝒞\mathcal{C} are real, there is a sign matrix Γϵ\Gamma_{\epsilon} such that 𝒞^=𝒞Γϵ\widehat{\mathcal{C}}=\mathcal{C}\circ\Gamma_{\epsilon}; indeed, for any source α=(i,j)\alpha=(i,j) it suffices to choose ϵ(α)\epsilon(\alpha) opposite to the sign of 𝖢𝗈𝗏(fi,fj)\mathsf{Cov}(f_{i},f_{j}). Next, by Lemma 10 we know that 𝒞Γϵ\mathcal{C}\circ\Gamma_{\epsilon} is positive semidefinite. We are done. ∎
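
In other words, for real covariance matrices the comparison matrix is itself a Schur product with a sign matrix. A minimal numpy check of this observation (the sample matrix is ours, for a triangle network where any sign choice is a valid sign matrix):

```python
import numpy as np

def comparison_matrix(M):
    H = -np.abs(M); np.fill_diagonal(H, np.diag(M)); return H

# Choosing eps(alpha) = -sign(Cov(f_i, f_j)) for the source alpha = (i, j)
# turns the Schur product C o Gamma_eps into the comparison matrix of C.
C = np.array([[2.0,  1.0, -0.5],
              [1.0,  2.0,  1.0],
              [-0.5, 1.0,  2.0]])
Gamma = np.where(C >= 0, -1.0, 1.0)  # eps = -sign on the off-diagonal
np.fill_diagonal(Gamma, 1.0)         # diagonal of a sign matrix is +1
assert np.allclose(C * Gamma, comparison_matrix(C))
```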

We remark that in Theorem 8, for simplicity of presentation and to convey some of our ideas, we restricted to real functions fif_{i}. In the next section, where we prove Theorem 3 in its most general form, we cover complex functions fif_{i} as well. Nevertheless, here we briefly explain the changes that should be made in the above proofs in order to include complex functions.

Lemma 11 already works for complex matrices, so we only need to generalize Lemma 10. To this end, let ϵ\epsilon be a function that associates a norm-1 complex number ϵ(α)\epsilon(\alpha) to any source α\alpha. Then we define the generalised sign matrix Γϵ=(γij)\Gamma_{\epsilon}=(\gamma_{ij}) to be an n×nn\times n matrix with entries

γij={1i=j,ϵ(α)i<j&αi,j,ϵ¯(α)i>j&αi,j.\displaystyle\gamma_{ij}=\begin{cases}1&\qquad i=j,\\ \epsilon(\alpha)&\qquad i<j~{}\&~{}\alpha\to i,j,\\ \bar{\epsilon}(\alpha)&\qquad i>j~{}\&~{}\alpha\to i,j.\end{cases} (8)

We note that for any positive semidefinite matrix MM, there is an appropriate ϵ\epsilon such that M^=MΓϵ\widehat{M}=M\circ\Gamma_{\epsilon}. Thus for complex functions fif_{i}, we need to generalize Lemma 10 and show that 𝒞(f1,,fn)Γϵ0\mathcal{C}(f_{1},\dots,f_{n})\circ\Gamma_{\epsilon}\succeq 0 for any function ϵ(α)\epsilon(\alpha) with |ϵ(α)|=1|\epsilon(\alpha)|=1. To prove such a lemma, observe that for sufficiently large dd, there are 0tα<d0\leq t_{\alpha}<d such that |ϵ(α)ωtα||\epsilon(\alpha)-\omega^{t_{\alpha}}| is arbitrarily small, where ω=e2πi/d\omega=e^{2\pi i/d} is a primitive dd-th root of unity. Then by a continuity argument it suffices to show that 𝒞(f1,,fn)Γϵ0\mathcal{C}(f_{1},\dots,f_{n})\circ\Gamma_{\epsilon}\succeq 0 in the special case where ϵ(α)=ωtα\epsilon(\alpha)=\omega^{t_{\alpha}}. To prove this, we consider an inflation of 𝒩\mathcal{N} with dd copies per source and party. Letting α1,,αd\alpha_{1},\dots,\alpha_{d} be copies of α=(i,j)\alpha=(i,j), we connect αk\alpha_{k} to parties Ai(k)A_{i}^{(k)} and Aj(k+tα)A_{j}^{(k+t_{\alpha})}, the kk-th and (k+tα)(k+t_{\alpha})-th copies of AiA_{i} and AjA_{j} respectively, with indices taken modulo dd. Next, we consider the covariance matrix associated to this inflated network, which is an (nd)×(nd)(nd)\times(nd) positive semidefinite matrix. The d×dd\times d block of this matrix associated to the source α=(i,j)\alpha=(i,j) equals 𝖢𝗈𝗏(fi,fj)\mathsf{Cov}(f_{i},f_{j}) times a permutation matrix that is a cyclic shift by tαt_{\alpha}. Then, conjugating this (nd)×(nd)(nd)\times(nd) matrix with the block-diagonal matrix all of whose diagonal blocks are the dd-dimensional Fourier transform, we can simultaneously diagonalize all blocks of the (nd)×(nd)(nd)\times(nd) covariance matrix. Next, putting the second diagonal element of these blocks together we obtain an n×nn\times n matrix that is equal to 𝒞(f1,,fn)Γϵ\mathcal{C}(f_{1},\dots,f_{n})\circ\Gamma_{\epsilon}.
Thus, this matrix is positive semidefinite since it is a principal submatrix of a positive semidefinite matrix. This gives the proof of Lemma 10 for complex functions.
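
The Fourier-diagonalization step can be verified directly in numpy (our sketch; the sign of the exponent in the resulting eigenvalues depends on the shift and Fourier conventions, so the second diagonal entry below comes out as ω^(-t) rather than ω^t):

```python
import numpy as np

d, t = 5, 2
omega = np.exp(2j * np.pi / d)          # primitive d-th root of unity

S = np.roll(np.eye(d), t, axis=0)       # cyclic shift: S|k> = |k + t mod d>
F = omega ** np.outer(np.arange(d), np.arange(d)) / np.sqrt(d)  # Fourier

# The Fourier transform diagonalizes every cyclic shift simultaneously;
# the m-th diagonal entry of F^dag S F is omega^(-m*t).
D = F.conj().T @ S @ F
assert np.allclose(F @ F.conj().T, np.eye(d))
assert np.allclose(D, np.diag(omega ** (-t * np.arange(d))))
print(D[1, 1])   # second diagonal entry: omega^(-t)
```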

In the next section, building on the above ideas, we generalize Theorem 8 for arbitrary NDCS networks and for arbitrary complex functions.

4 NDCS networks

In this section we prove Theorem 3 in its most general form. As in the previous section, the proof is in two steps. We first show in Lemma 13 (which generalizes Lemma 10) that the Schur product of the covariance matrix 𝒞(f1,,fn)\mathcal{C}(f_{1},\dots,f_{n}) with any twisted Gram matrix (to be defined) is positive semidefinite. We prove this using non-fanout inflations. Then in Lemma 14 (which generalizes Lemma 11) we show that any matrix with the aforementioned property admits an 𝒩\mathcal{N}-compatible matrix decomposition. To prove this result we use the theory of dual cones and the idea of embezzlement discussed in Subsection 2.3.

Let us first generalize the notion of sign matrices. Here, the sign function in Definition 9 is replaced by a collection of permutations and a set of vectors.

Definition 12 (Twisted Gram matrix).

Let 𝒩\mathcal{N} be an arbitrary NDCS network with nn parties A1,,AnA_{1},\dots,A_{n}. Let |ψ1,,|ψnd|\psi_{1}\rangle,\dots,|\psi_{n}\rangle\in\mathbb{C}^{d} be arbitrary dd-dimensional vectors. Also for any source α\alpha and parties AiA_{i} with αi\alpha\to i, let πiα\pi_{i}^{\alpha} be an arbitrary permutation on {1,,d}\{1,\dots,d\}. Let PπiαP_{\pi_{i}^{\alpha}} be the associated permutation matrix acting on d\mathbb{C}^{d} (i.e., Pπiα|x=|πiα(x)P_{\pi_{i}^{\alpha}}|x\rangle=|\pi_{i}^{\alpha}(x)\rangle for any 1xd1\leq x\leq d). Then a twisted Gram matrix associated to these vectors and permutations is an n×nn\times n matrix WW such that

Wij={ψi|ψii=j,ψi|PπiαPπjα|ψjij&αi,j.\displaystyle W_{ij}=\begin{cases}\langle\psi_{i}|\psi_{i}\rangle&\qquad i=j,\\ \langle\psi_{i}|P^{\dagger}_{\pi_{i}^{\alpha}}P_{\pi_{j}^{\alpha}}|\psi_{j}\rangle&\qquad i\neq j~{}\&~{}\alpha\to i,j.\end{cases} (9)

Note that, when parties iji\neq j do not share any common source, WijW_{ij} is unconstrained and can take any value. Also, note that WW is well-defined since by the NDCS assumption if Ai,AjA_{i},A_{j} have a common source, then it is unique.
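
A twisted Gram matrix is straightforward to construct numerically. The following sketch (ours, for the triangle network, with random vectors and permutations) builds WW from Definition 12 and confirms that every α-block is a 2×2 Gram matrix and hence positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 3
# Triangle network: one source alpha = (i, j) per pair of parties, so the
# NDCS condition holds (each pair shares exactly one source).
sources = {(0, 1), (0, 2), (1, 2)}

psi = [rng.standard_normal(d) + 1j * rng.standard_normal(d) for _ in range(n)]
# one permutation matrix per (party, source) incidence
perm = {(i, a): np.eye(d)[rng.permutation(d)] for a in sources for i in a}

W = np.zeros((n, n), dtype=complex)
for i in range(n):
    W[i, i] = np.vdot(psi[i], psi[i])            # <psi_i|psi_i>
for (i, j) in sources:
    Pi, Pj = perm[(i, (i, j))], perm[(j, (i, j))]
    W[i, j] = np.vdot(Pi @ psi[i], Pj @ psi[j])  # <psi_i|Pi^dag Pj|psi_j>
    W[j, i] = np.conj(W[i, j])

# Each alpha-block of W is a 2x2 Gram matrix, hence positive semidefinite.
for (i, j) in sources:
    block = W[np.ix_([i, j], [i, j])]
    assert np.linalg.eigvalsh(block).min() > -1e-9
print("all alpha-blocks PSD")
```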

We can now generalize Lemma 10.

Lemma 13.

Let 𝒩\mathcal{N} be an arbitrary NDCS network with nn parties A1,,AnA_{1},\dots,A_{n}. Let fi(ai)f_{i}(a_{i}), i=1,,ni=1,\dots,n, be an arbitrary scalar function of the output of AiA_{i}. Then for any twisted Gram matrix WW, the Schur product 𝒞(f1,,fn)W\mathcal{C}(f_{1},\dots,f_{n})\circ W is positive semidefinite.

Proof.

Let WW be a twisted Gram matrix given by (9). Note that the entries of WW that are not specified by (9) are not important since by the independence principle the corresponding entries in 𝒞(f1,,fn)\mathcal{C}(f_{1},\dots,f_{n}) vanish, so 𝒞(f1,,fn)W\mathcal{C}(f_{1},\dots,f_{n})\circ W is independent of those entries of WW. We consider a non-fanout inflation of the network as follows. For any party AiA_{i} we consider dd copies Ai(1),,Ai(d)A_{i}^{(1)},\dots,A_{i}^{(d)}, and for any source SαS_{\alpha} we consider dd copies Sα(1),,Sα(d)S_{\alpha}^{(1)},\dots,S_{\alpha}^{(d)}. If source SαS_{\alpha} is adjacent to party AiA_{i} in 𝒩\mathcal{N}, in the inflated network we connect source Sα(k)S_{\alpha}^{(k)} to the party Ai((πiα)1(k))A_{i}^{((\pi_{i}^{\alpha})^{-1}(k))}. This fully describes the inflated network. Next, we assume that party Ai(k)A_{i}^{(k)} computes fi(k)f_{i}^{(k)} that is identical to fif_{i}. Then the covariance matrix

𝒞~=𝒞(f1(1),,f1(d),f2(1),,f2(d),,fn(1),,fn(d)),\widetilde{\mathcal{C}}=\mathcal{C}\big{(}f_{1}^{(1)},\dots,f_{1}^{(d)},f_{2}^{(1)},\dots,f_{2}^{(d)},\dots\dots,f_{n}^{(1)},\dots,f_{n}^{(d)}\big{)},

is positive semidefinite. Observe that 𝒞~\widetilde{\mathcal{C}} is an (nd)×(nd)(nd)\times(nd) matrix consisting of d2d^{2} blocks of size n×nn\times n. These blocks labeled by pairs (i,j)(i,j) with 1i,jn1\leq i,j\leq n and denoted by 𝒞~ij\widetilde{\mathcal{C}}_{ij} are described as follows:

  • The (i,i)(i,i)-block is diagonal with 𝖵𝖺𝗋(fi)=𝖢𝗈𝗏(fi,fi)\mathsf{Var}(f_{i})=\mathsf{Cov}(f_{i},f_{i}) on the diagonal, i.e., 𝒞~ii=𝖵𝖺𝗋(fi)I\widetilde{\mathcal{C}}_{ii}=\mathsf{Var}(f_{i})I.

  • If iji\neq j and parties Ai,AjA_{i},A_{j} do not share any source, then the (i,j)(i,j)-block is 𝒞~ij=0\widetilde{\mathcal{C}}_{ij}=0.

  • If iji\neq j and parties Ai,AjA_{i},A_{j} share source SαS_{\alpha}, then the (i,j)(i,j)-block equals 𝒞~ij=𝖢𝗈𝗏(fi,fj)PπiαPπjα\widetilde{\mathcal{C}}_{ij}=\mathsf{Cov}(f_{i},f_{j})P_{\pi_{i}^{\alpha}}^{\dagger}P_{\pi_{j}^{\alpha}}. See Figure 3 to verify this.

Figure 3: If αAi,Aj\alpha\to A_{i},A_{j} in 𝒩\mathcal{N}, then in the inflated network Sα(k)S_{\alpha}^{(k)} is adjacent to Ai((πiα)1(k))A_{i}^{((\pi_{i}^{\alpha})^{-1}(k))} and Aj((πjα)1(k))A_{j}^{((\pi_{j}^{\alpha})^{-1}(k))} for any kk. This means that Sα(πjα(k))S_{\alpha}^{(\pi_{j}^{\alpha}(k))} is adjacent to Aj(k)A_{j}^{(k)} and Ai((πiα)1πjα(k))A_{i}^{((\pi_{i}^{\alpha})^{-1}\circ\pi_{j}^{\alpha}(k))}.

Let RiR_{i} be a d×dd\times d matrix such that

R_{i}R_{i}^{\dagger}=R_{i}^{\dagger}R_{i}=\|\psi_{i}\|^{2}I\qquad\quad R_{i}|1\rangle=|\psi_{i}\rangle.

Such a matrix can be constructed as follows. If |ψi=0|\psi_{i}\rangle=0, then let Ri=0R_i=0. Otherwise, start with 1ψi|ψi\frac{1}{\|\psi_i\|}|\psi_{i}\rangle and extend it to an orthonormal basis; put those basis vectors in the columns of a matrix, with 1ψi|ψi\frac{1}{\|\psi_i\|}|\psi_{i}\rangle as the first column, to get a unitary matrix; finally multiply this unitary by ψi\|\psi_i\| to obtain RiR_i. Next, let RR be the (nd)×(nd)(nd)\times(nd) block diagonal matrix whose ii-th block on the diagonal is RiR_i. Then R𝒞~RR^{\dagger}\widetilde{\mathcal{C}}R is a positive semidefinite matrix consisting of blocks Ri𝒞~ijRjR_i^{\dagger}\widetilde{\mathcal{C}}_{ij}R_j. Therefore, the principal submatrix of R𝒞~RR^{\dagger}\widetilde{\mathcal{C}}R consisting of the (1,1)(1,1)-entry of all these blocks is also a positive semidefinite matrix. Denoting this n×nn\times n matrix by MM, the entries of MM are computed as follows:

  • Mii=(Ri𝒞~iiRi)11=𝖵𝖺𝗋(fi)ψi2M_{ii}=(R_{i}^{\dagger}\,\widetilde{\mathcal{C}}_{ii}\,R_{i})_{11}=\mathsf{Var}(f_{i})\|\psi_{i}\|^{2}.

  • If iji\neq j and parties Ai,AjA_{i},A_{j} do not share any source, then Mij=0M_{ij}=0.

  • If iji\neq j and parties Ai,AjA_{i},A_{j} share source SαS_{\alpha}, then Mij=𝖢𝗈𝗏(fi,fj)ψi|PπiαPπjα|ψjM_{ij}=\mathsf{Cov}(f_{i},f_{j})\langle\psi_{i}|P_{\pi_{i}^{\alpha}}^{\dagger}P_{\pi_{j}^{\alpha}}|\psi_{j}\rangle.

Therefore,

M=𝒞(f1,,fn)W,M=\mathcal{C}(f_{1},\dots,f_{n})\circ W,

and it is a positive semidefinite matrix. We are done. ∎
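
The matrices R_i used above can be constructed explicitly, for instance via a QR decomposition. The helper below (ours) completes ψ_i/∥ψ_i∥ to an orthonormal basis, so that R_i is ∥ψ_i∥ times a unitary whose first column is ψ_i/∥ψ_i∥.

```python
import numpy as np

def scaled_unitary_with_first_column(psi):
    """Build R with R R^dag = R^dag R = ||psi||^2 I and R|1> = |psi>:
    complete psi/||psi|| to an orthonormal basis via QR, then rescale."""
    d = len(psi); norm = np.linalg.norm(psi)
    if norm == 0:
        return np.zeros((d, d))
    A = np.column_stack([psi, np.random.randn(d, d - 1)])
    Q, _ = np.linalg.qr(A)
    # QR fixes the first column only up to a phase; correct it.
    ov = np.vdot(Q[:, 0], psi)
    Q[:, 0] *= ov / abs(ov)
    return norm * Q

psi = np.array([3.0, 4.0, 0.0]) / 5 * 2.0          # ||psi|| = 2
R = scaled_unitary_with_first_column(psi)
assert np.allclose(R @ R.conj().T, np.linalg.norm(psi) ** 2 * np.eye(3))
assert np.allclose(R[:, 0], psi)                   # R|1> = |psi>
```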

In the following lemma we present a description of the dual cone of positive semidefinite matrices that admit a network-compatible matrix decomposition. This lemma may be of independent interest, notably in coherence and entanglement theory.

Lemma 14.

Let 𝒩\mathcal{N} be an arbitrary NDCS network with nn parties AiA_{i}, i=1,,ni=1,\dots,n and sources SαS_{\alpha}, α=1,,m\alpha=1,\dots,m. Let \mathcal{L} be the cone of positive semidefinite matrices that admit an 𝒩\mathcal{N}-compatible matrix decomposition. Then

=𝒲¯,\mathcal{L}^{*}=\overline{\mathcal{W}},

where 𝒲\mathcal{W} is the set of twisted Gram matrices of Definition 12, and 𝒲¯\overline{\mathcal{W}} is its closure.

Note that this lemma is a generalization of Lemma 11, expressed in the dual picture. In the bipartite case, twisted Gram matrices reduce to (generalized) sign matrices, which have a much simpler description. Then the dual cone 𝒲\mathcal{W}^{*}, which is the cone of interest, is easily characterized: it consists of all matrices whose comparison matrix is positive semidefinite. This provides the explicit characterization of \mathcal{L} in Lemma 11. In the multipartite case, the set 𝒲\mathcal{W} has a much more complicated structure, so characterizing the dual cone 𝒲\mathcal{W}^{*} is not easy. To clarify this point, note that when all sources are bipartite, the permutations that we need to use are either transpositions (in the real case) or shifts (in the complex case). Such permutations can be simultaneously diagonalized using the Fourier transform, as we did in the proof of Lemma 10 and mentioned at the end of Section 3. In the multipartite case, however, we need to use arbitrary permutations, which cannot in general be simultaneously diagonalized. This makes it difficult to obtain a simple direct description of the cone 𝒲\mathcal{W}.

Proof.

For any n×nn\times n matrix MM we let M(α)M^{(\alpha)} be the principal submatrix of MM consisting of rows and columns indexed by parties ii with αi\alpha\to i. We call M(α)M^{(\alpha)} the α\alpha-block of MM. Let α\mathcal{L}_{\alpha} be the cone of positive semidefinite matrices MM whose only non-zero entries are in their α\alpha-block, i.e., Mij0M_{ij}\neq 0 only if αi,j\alpha\to i,j. We note that as the dual of the cone of positive semidefinite matrices is itself, α\mathcal{L}_{\alpha}^{*} is the space of matrices whose α\alpha-block is positive semidefinite. On the other hand, by definition =αα\mathcal{L}=\sum_{\alpha}\mathcal{L}_{\alpha}, so by Lemma 5 we have

=αα.\mathcal{L}^{*}=\bigcap_{\alpha}\mathcal{L}_{\alpha}^{*}.

That is, \mathcal{L}^{*} consists of matrices whose α\alpha-block for any source SαS_{\alpha} is positive semidefinite.

𝒲¯\overline{\mathcal{W}}\subseteq\mathcal{L}^{*}:

By definition, any α\alpha-block of any matrix W𝒲W\in\mathcal{W} is a Gram matrix and is positive semidefinite. Then by the above discussion, we have 𝒲\mathcal{W}\subseteq\mathcal{L}^{*}. On the other hand \mathcal{L}^{*}, as a dual cone, is closed. Therefore, 𝒲¯\overline{\mathcal{W}}\subseteq\mathcal{L}^{*}.

𝒲¯\mathcal{L}^{*}\subseteq\overline{\mathcal{W}}:

Let MM\in\mathcal{L}^{*}. In the following we show that MM can be approximated by elements of 𝒲\mathcal{W} with arbitrary precision, which shows that M𝒲¯M\in\overline{\mathcal{W}}. To this end, we note that if iji\neq j do not share any source, the (i,j)(i,j)-entries of matrices in 𝒲\mathcal{W} can take any value. So such entries of MM can be ignored in approximating MM with elements of 𝒲\mathcal{W}. Indeed, it suffices to consider entries of MM that belong to an α\alpha-block for some source α\alpha.

Since by the characterization of \mathcal{L}^{*}, the α\alpha-block of MM is positive semidefinite, there are vectors |ϕiα|\phi_{i}^{\alpha}\rangle, for any ii with αi\alpha\to i, such that

Mij=(M(α))ij=ϕiα|ϕjα.M_{ij}=(M^{(\alpha)})_{ij}=\langle\phi_{i}^{\alpha}|\phi_{j}^{\alpha}\rangle.

By padding these vectors with zero coordinates if necessary, we assume that |ϕiαd|\phi_{i}^{\alpha}\rangle\in\mathbb{C}^{d} for all i,αi,\alpha, for some dd. To prove the lemma we need to show that for any ε>0\varepsilon>0 there are vectors |ψi|\psi_{i}\rangle and permutations πiα\pi_{i}^{\alpha} such that

|ψi|PπiαPπjα|ψjϕiα|ϕjα|ε.\big{|}\langle\psi_{i}|P_{\pi_{i}^{\alpha}}^{\dagger}P_{\pi_{j}^{\alpha}}|\psi_{j}\rangle-\langle\phi_{i}^{\alpha}|\phi_{j}^{\alpha}\rangle\big{|}\leq\varepsilon.

To construct these vectors we use Lemma 7. Let

|\psi_{i}\rangle=\|\phi_{i}^{\alpha}\|\cdot|\vartheta_{T}\rangle\otimes|\mu_{R}\rangle=\sqrt{M_{ii}}\,|\vartheta_{T}\rangle\otimes|\mu_{R}\rangle,

where |μR|\mu_{R}\rangle and |ϑT|\vartheta_{T}\rangle are defined in (2) and (3) respectively. Note that ϕiα=Mii\|\phi_{i}^{\alpha}\|=\sqrt{M_{ii}} does not depend on α\alpha, so |ψi|\psi_{i}\rangle is well-defined as a single vector per party. By Lemma 7 there exists a permutation πiα\pi_{i}^{\alpha} such that

\langle\psi_{i}|P^{\dagger}_{\pi_{i}^{\alpha}}\big(|\vartheta_{T}\rangle\otimes|\phi_{i}^{\alpha}\rangle\otimes|\mu_{R}\rangle\big)\geq M_{ii}\,\kappa_{T,R},

where κT,R1\kappa_{T,R}\to 1 as T,RT,R\to\infty. This means that for sufficiently large T,RT,R, the vector Pπiα|ψiP_{\pi_{i}^{\alpha}}|\psi_{i}\rangle is arbitrarily close to |ϑT|ϕiα|μR|\vartheta_{T}\rangle\otimes|\phi_{i}^{\alpha}\rangle\otimes|\mu_{R}\rangle. Therefore, for sufficiently large T,RT,R we have

ε|ψi|PπiαPπjα|ψjϑT|ϑTϕiα|ϕjαμR|μR|=|ψi|PπiαPπjα|ψjϕiα|ϕjα|.\varepsilon\geq\big{|}\langle\psi_{i}|P_{\pi_{i}^{\alpha}}^{\dagger}P_{\pi_{j}^{\alpha}}|\psi_{j}\rangle-\langle\vartheta_{T}|\vartheta_{T}\rangle\cdot\langle\phi_{i}^{\alpha}|\phi_{j}^{\alpha}\rangle\cdot\langle\mu_{R}|\mu_{R}\rangle\big{|}=\big{|}\langle\psi_{i}|P_{\pi_{i}^{\alpha}}^{\dagger}P_{\pi_{j}^{\alpha}}|\psi_{j}\rangle-\langle\phi_{i}^{\alpha}|\phi_{j}^{\alpha}\rangle\big{|}.

We are done. ∎

Putting Lemma 13 and Lemma 14 together, we can now prove Theorem 3.

Proof of Theorem 3.

Using the notation of Lemma 14 we need to show that

𝒞=𝒞(f1,,fn).\mathcal{C}=\mathcal{C}(f_{1},\dots,f_{n})\in\mathcal{L}.

To this end, using ()=(\mathcal{L}^{*})^{*}=\mathcal{L}, it suffices to show that for any WW\in\mathcal{L}^{*} we have

𝒞,W=tr(𝒞W)0.\langle\mathcal{C},W\rangle=\text{\rm tr}(\mathcal{C}W)\geq 0.

Then using =𝒲¯\mathcal{L}^{*}=\overline{\mathcal{W}} established in Lemma 14 and by a continuity argument, we may assume that W𝒲W\in\mathcal{W} and takes the form (9). Now letting |J=j=1n|j|J\rangle=\sum_{j=1}^{n}|j\rangle, we have

𝒞,W=J|𝒞W|J.\langle\mathcal{C},W\rangle=\langle J|\mathcal{C}\circ W|J\rangle.

On the other hand, by Lemma 13 the matrix 𝒞W\mathcal{C}\circ W is positive semidefinite. Therefore, 𝒞,W0\langle\mathcal{C},W\rangle\geq 0. ∎
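
The identity ⟨𝒞,W⟩ = ⟨J|𝒞∘W|J⟩ used above is elementary; in the real symmetric case it reads tr(CW) = Σ_ij C_ij W_ij, and can be checked directly (random sample matrices ours; in the complex Hermitian case the analogous identity holds with the appropriate conjugations).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
C = rng.standard_normal((n, n)); C = C + C.T     # real symmetric
W = rng.standard_normal((n, n)); W = W + W.T

J = np.ones(n)                                   # |J> = sum_j |j>
# For real symmetric matrices, tr(C W) = <J| C o W |J> = sum_ij C_ij W_ij.
assert np.isclose(np.trace(C @ W), J @ (C * W) @ J)
```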

5 Conclusion

In this paper we showed that the covariance matrix of any output distribution of NDCS networks can be written as a summation of certain positive semidefinite matrices. This result holds in the local classical theory, the quantum theory, the box world [28, 26] and more generally in any GPT [4, 14, 31] compatible with the network structure. To our knowledge, our result provides the first universal limit on correlations in generic networks, beyond constraints derived in specific ones (see [12, 3]). As also mentioned in [15, 1] this covariance decomposition condition can be stated as a semidefinite program and can be verified efficiently.

Our proof technique is valid only for the subclass of NDCS networks, while the covariance decomposition is known to hold for all networks in the case of local classical and quantum models. Hence, the necessity of the NDCS assumption in the case of generic GPTs is an open question: does the network-compatible matrix decomposition hold in all GPTs for networks that do not satisfy the NDCS assumption? Another open problem is to generalize our results from scalar functions to vector-valued ones.

The main conceptual tool in our proof is the inflation technique. It is proven in [17] that inflations with unbounded fanout characterize correlations in the local classical model. On the other hand, non-fanout inflation allows for a characterization of the ultimate limits on correlations in networks [12, 7, 19]. Our results show that even non-fanout inflations induce quite powerful limits on correlations in networks. To our knowledge, our work contains the first proof exploiting inflated networks of arbitrary sizes.

Note that the covariance matrix of a correlation depends only on its bipartite marginals. Yet, even the simple hexagon inflation of the triangle network (see Figure 2) allows for proving restrictions on multipartite marginals [12]. Thus, it would be interesting to find generic limits imposed on multipartite correlation functions by considering well-chosen families of non-fanout inflations, possibly of unbounded size. For instance, it would be interesting to see if Finner's inequality [9], as a constraint on multipartite correlations, holds for arbitrary GPTs. This inequality is proven in [23] for the quantum as well as the box world when the underlying network is the triangle.

Finally, let us mention that our results may be of interest beyond causal inference in networks, particularly in coherence and entanglement theory. In Lemma 11 and Lemma 14 we characterized the space of matrices that admit a network-compatible matrix decomposition. Lemma 11 has applications in coherence and entanglement theory. Thus, Lemma 14, as a generalization of this lemma, may find applications in these theories or elsewhere. Moreover, the proof of this lemma shows that the idea of embezzlement may find applications in matrix analysis beyond entanglement theory.

Acknowledgements.

The authors are thankful to Amin Gohari for bringing the example of Gaussian sources to their attention to show the tightness of Theorem 3; and to Elie Wolfe, Stefano Pironio and Nicolas Gisin for clarifying the different ways to define the ultimate limits of correlations in networks. M.-O.R. is supported by the Swiss National Fund Early Mobility Grant P2GEP2_191444 and acknowledges the Government of Spain (FIS2020-TRANQI and Severo Ochoa CEX2019-000910-S), Fundació Cellex, Fundació Mir-Puig, Generalitat de Catalunya (CERCA, AGAUR SGR 1381) and the ERC AdG CERQUTE.

References

  • [1] J. Åberg, R. Nery, C. Duarte, and R. Chaves. Semidefinite tests for quantum network topologies. Physical Review Letters, 125(11):110505, 2020.
  • [2] J.-M. A. Allen, J. Barrett, D. C. Horsman, C. M. Lee, and R. W. Spekkens. Quantum common causes and quantum causal models. Phys. Rev. X, 7:031021, Jul 2017.
  • [3] J.-D. Bancal and N. Gisin. Non-local boxes for networks. arXiv preprint arXiv:2102.03597, 2021.
  • [4] J. Barrett. Information processing in generalized probabilistic theories. Phys. Rev. A, 75:032304, Mar 2007.
  • [5] J. S. Bell. On the Einstein Podolsky Rosen paradox. Physics Physique Fizika, 1(3):195, 1964.
  • [6] C. Branciard, N. Gisin, and S. Pironio. Characterizing the Nonlocal Correlations Created via Entanglement Swapping. Phys. Rev. Lett., 104:170401, Apr 2010.
  • [7] X. Coiteux-Roy, E. Wolfe, and M.-O. Renou. In preparation. 2021.
  • [8] J. Dattorro. Convex optimization & Euclidean distance geometry. Meboo Publishing, 2010.
  • [9] H. Finner. A Generalization of Hölder's Inequality and Some Probability Inequalities. The Annals of Probability, 20:1893–1901, 1992.
  • [10] T. Fritz. Beyond Bell’s theorem: correlation scenarios. New Journal of Physics, 14(10):103001, oct 2012.
  • [11] T. Fritz. Beyond Bell's theorem II: Scenarios with arbitrary causal structure. Communications in Mathematical Physics, 341:391–434, 2016.
  • [12] N. Gisin, J.-D. Bancal, Y. Cai, P. Remy, A. Tavakoli, E. Z. Cruzeiro, S. Popescu, and N. Brunner. Constraints on nonlocality in networks from no-signaling and independence. Nature Communications, 11(1):1–6, 2020.
  • [13] J. Henson, R. Lal, and M. F. Pusey. Theory-independent limits on correlations from generalized Bayesian networks. New Journal of Physics, 16:113043, 2014.
  • [14] P. Janotta and R. Lal. Generalized probabilistic theories without the no-restriction hypothesis. Phys. Rev. A, 87:052131, May 2013.
  • [15] A. Kela, K. Von Prillwitz, J. Åberg, R. Chaves, and D. Gross. Semidefinite tests for latent causal structures. IEEE Transactions on Information Theory, 66(1):339–349, 2019.
  • [16] T. Kraft, C. Spee, X.-D. Yu, and O. Gühne. Characterizing quantum networks: Insights from coherence theory. arXiv preprint arXiv:2006.06693, 2020.
  • [17] M. Navascués and E. Wolfe. The inflation technique completely solves the causal compatibility problem. Journal of Causal Inference, 8(1):70–91, 2020.
  • [18] J. Pearl. Causality. Cambridge University Press, 2009.
  • [19] S. Pironio. In preparation. 2021.
  • [20] S. Popescu and D. Rohrlich. Quantum Nonlocality as an Axiom. Found. Phys., 24(3):379–385, 1994.
  • [21] S. Popescu and D. Rohrlich. Causality and nonlocality as axioms for quantum mechanics. In Causality and Locality in Modern Physics, pages 383–389. Springer, 1998.
  • [22] M.-O. Renou and S. Beigi. Network nonlocality via rigidity of token-counting and color-matching. arXiv preprint arXiv:2011.02769, 2020.
  • [23] M.-O. Renou, Y. Wang, S. Boreiri, S. Beigi, N. Gisin, and N. Brunner. Limits on correlations in networks for quantum and no-signaling resources. Physical Review Letters, 123(7):070403, 2019.
  • [24] M. Ringbauer, T. R. Bromley, M. Cianciaruso, L. Lami, W. S. Lau, G. Adesso, A. G. White, A. Fedrizzi, and M. Piani. Certification and quantification of multilevel quantum coherence. Physical Review X, 8(4):041007, 2018.
  • [25] V. Scarani. Feats, Features and Failures of the PR-box. In AIP Conference Proceedings. AIP, 2006.
  • [26] A. J. Short and J. Barrett. Strong nonlocality: a trade-off between states and measurements. New J. Phys., 12(3):033034, 2010.
  • [27] S. Singh and I. Nechita. The ppt2 conjecture holds for all choi-type maps. arXiv preprint arXiv:2011.03809, 2020.
  • [28] P. Skrzypczyk and N. Brunner. Couplers for non-locality swapping. New J. Phys., 11(7):073014, 2009.
  • [29] P. Spirtes, C. Glymour, R. Scheines, D. Heckerman, C. Meek, G. Cooper, and T. Richardson. Causation, prediction, and search. MIT Press, 2000.
  • [30] W. van Dam and P. Hayden. Universal entanglement transformations without communication. Physical Review A, 67(6):060302, 2003.
  • [31] M. Weilenmann and R. Colbeck. Self-testing of physical theories, or, is quantum theory optimal with respect to some information-processing task? Phys. Rev. Lett., 125:060406, Aug 2020.
  • [32] E. Wolfe, R. W. Spekkens, and T. Fritz. The inflation technique for causal inference with latent variables. Journal of Causal Inference, 7(2), 2019.

Appendix A Proof of Lemma 5

Items (i), (ii), (iv) and (v) follow from the definition of dual cones.

To prove (iii), note that by definition $\mathcal{K}\subseteq(\mathcal{K}^*)^*$. To prove the inclusion in the other direction, let $Z\notin\mathcal{K}$. Since $\mathcal{K}$ is closed and convex, by the Hahn-Banach separation theorem there exist $Y$ and $c\in\mathbb{R}$ such that $\langle Y,X\rangle\geq c$ for all $X\in\mathcal{K}$ and $\langle Y,Z\rangle<c$. We note that by the definition of cones, $X=0$ belongs to $\mathcal{K}$. Therefore, $c\leq 0=\langle Y,0\rangle$. On the other hand, suppose that $X\in\mathcal{K}$ is such that $x=\langle Y,X\rangle<0$. Then letting $r=(c-1)/x>0$, we have $rX\in\mathcal{K}$, so $c\leq\langle Y,rX\rangle=r\langle Y,X\rangle=c-1$, which is a contradiction. This means that $\langle Y,X\rangle\geq 0$ for any $X\in\mathcal{K}$. As a result, $Y\in\mathcal{K}^*$, while $\langle Y,Z\rangle<c\leq 0$. This means that $Z$ does not belong to $(\mathcal{K}^*)^*$. Therefore, the complement of $\mathcal{K}$ is a subset of the complement of $(\mathcal{K}^*)^*$, which means that $(\mathcal{K}^*)^*\subseteq\mathcal{K}$.
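As a numerical sanity check of the bipolar identity in (iii) (an illustration only, not part of the proof), one can verify $(\mathcal{K}^*)^*=\mathcal{K}$ for a simple polyhedral cone in $\mathbb{R}^2$; the particular cone and its explicit dual generators below are hypothetical choices made for the example:

```python
import numpy as np

# K is the cone in R^2 generated by the rays (1,0) and (1,1), i.e.
#   K  = {x : x_2 >= 0 and x_1 >= x_2}.
# Its dual under the Euclidean inner product is
#   K* = {y : y_1 >= 0 and y_1 + y_2 >= 0},
# which is generated by the rays (0,1) and (1,-1).
K_gens = np.array([[1.0, 0.0], [1.0, 1.0]])
K_dual_gens = np.array([[0.0, 1.0], [1.0, -1.0]])

def in_K(x, tol=1e-9):
    # membership via the explicit inequality description of K
    return x[1] >= -tol and x[0] - x[1] >= -tol

def in_bidual(x, tol=1e-9):
    # x lies in (K*)* iff <x,h> >= 0 for every generator h of K*
    return all(x @ h >= -tol for h in K_dual_gens)

# the dual generators indeed have nonnegative inner product with K
assert all(h @ g >= -1e-9 for h in K_dual_gens for g in K_gens)

# (K*)* and K agree on random sample points
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=2)
    assert in_K(x) == in_bidual(x)
print("bipolar identity verified on 1000 samples")
```

For this polyhedral cone both memberships reduce to the same two linear inequalities, which is exactly the content of $(\mathcal{K}^*)^*=\mathcal{K}$.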

To prove (vi), note that by (ii) we have $\mathcal{K}_i^*\subseteq(\mathcal{K}_1\cap\mathcal{K}_2)^*$ for $i=1,2$. Then since $(\mathcal{K}_1\cap\mathcal{K}_2)^*$ is a closed convex cone and closed under summation, we have $\overline{\mathcal{K}_1^*+\mathcal{K}_2^*}\subseteq(\mathcal{K}_1\cap\mathcal{K}_2)^*$. To prove the other direction, suppose that $Z\notin\overline{\mathcal{K}_1^*+\mathcal{K}_2^*}$. Then by the Hahn-Banach separation theorem, following similar steps as in the proof of (iii), we find that there exists $Y$ such that $\langle Y,X\rangle\geq 0$ for all $X\in\overline{\mathcal{K}_1^*+\mathcal{K}_2^*}$, and $\langle Y,Z\rangle<0$. Since $\mathcal{K}_i^*\subseteq\overline{\mathcal{K}_1^*+\mathcal{K}_2^*}$ for $i=1,2$, by definition $Y\in(\mathcal{K}_i^*)^*$, and then by (iii) we have $Y\in\mathcal{K}_i$. This means that $Y\in\mathcal{K}_1\cap\mathcal{K}_2$. On the other hand, $\langle Y,Z\rangle<0$, so $Z\notin(\mathcal{K}_1\cap\mathcal{K}_2)^*$. We conclude that the complement of $\overline{\mathcal{K}_1^*+\mathcal{K}_2^*}$ is a subset of the complement of $(\mathcal{K}_1\cap\mathcal{K}_2)^*$, which means that $(\mathcal{K}_1\cap\mathcal{K}_2)^*\subseteq\overline{\mathcal{K}_1^*+\mathcal{K}_2^*}$.
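Item (vi) can also be illustrated numerically for two polyhedral halfplane cones in $\mathbb{R}^2$, where no closure is needed; the specific cones below are hypothetical choices for the example:

```python
import numpy as np

# Illustration of (vi): take K1 = {x : x_1 >= 0} and
# K2 = {x : x_1 + x_2 >= 0}. Then K1* is the ray through (1,0) and
# K2* the ray through (1,1). The intersection K1 ∩ K2 is generated by
# the rays (0,1) and (1,-1), and both (K1 ∩ K2)* and K1* + K2* equal
# {y : y_2 >= 0 and y_1 >= y_2}.

def in_dual_of_intersection(y, tol=1e-9):
    # y lies in (K1 ∩ K2)* iff it has nonnegative inner product with
    # both generators (0,1) and (1,-1) of K1 ∩ K2
    return y @ np.array([0.0, 1.0]) >= -tol and y @ np.array([1.0, -1.0]) >= -tol

def in_sum_of_duals(y, tol=1e-9):
    # y = a*(1,0) + b*(1,1) with a,b >= 0 forces b = y_2, a = y_1 - y_2
    a, b = y[0] - y[1], y[1]
    return a >= -tol and b >= -tol

rng = np.random.default_rng(1)
for _ in range(1000):
    y = rng.normal(size=2)
    assert in_dual_of_intersection(y) == in_sum_of_duals(y)
print("(K1 ∩ K2)* = K1* + K2* verified on 1000 samples")
```

For polyhedral cones the sum $\mathcal{K}_1^*+\mathcal{K}_2^*$ is automatically closed, which is why the closure in (vi) plays no role in this toy case.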

For (vii), take the dual of both sides of (iv) and use (iii).

Appendix B Proof of Lemma 6

Let $|\lambda_R\rangle\in\mathbb{C}^{dR}$ be the state obtained from $|\phi\rangle\otimes|\mu_R\rangle$ via the identification of $[d]\times[R]$ with $[dR]$, so the coordinates of $|\lambda_R\rangle$ are $\frac{c_j}{\sqrt{r\chi_R}}$ for $(j,r)\in[d]\times[R]$. Let $\pi$ be the permutation that sorts these coordinates in decreasing order. Therefore,

$$|\tilde{\lambda}_R\rangle=P_\pi|\lambda_R\rangle=\sum_{s=1}^{dR}\tilde{\lambda}_s|s\rangle,$$

with the multi-set of the $\tilde{\lambda}_s$’s being the same as the multi-set of the $\frac{c_j}{\sqrt{r\chi_R}}$’s, and $\tilde{\lambda}_1\geq\tilde{\lambda}_2\geq\cdots\geq\tilde{\lambda}_{dR}$. In the following we show that $\langle\mu_R|\tilde{\lambda}_R\rangle$ is close to $1$ when $R$ is large.

Let

$$N_j^t=\Big|\Big\{r:\,\frac{c_j}{\sqrt{r\chi_R}}>\frac{1}{\sqrt{t\chi_R}}\Big\}\Big|=\big|\big\{r:\,c_j^2 t>r\big\}\big|.$$

Then we have $N_j^t<c_j^2 t$. Moreover, by the normalization of $|\phi\rangle$ we have

$$\sum_{j=1}^d N_j^t<t\sum_{j=1}^d c_j^2=t.$$

This means that

$$\Big|\Big\{s:\,\tilde{\lambda}_s>\frac{1}{\sqrt{t\chi_R}}\Big\}\Big|<t.$$

Next, using the fact that $\tilde{\lambda}_1\geq\cdots\geq\tilde{\lambda}_t$, we conclude that

$$\frac{1}{\sqrt{t\chi_R}}\geq\tilde{\lambda}_t,\qquad\forall t.$$

Therefore,

$$\langle\mu_R|\tilde{\lambda}_R\rangle=\sum_{r=1}^R\frac{\tilde{\lambda}_r}{\sqrt{r\chi_R}}\geq\sum_{r=1}^R\tilde{\lambda}_r^2\geq\Big(\sum_{j=1}^d c_j^2\Big)\cdot\Big(\sum_{s=1}^{\lfloor R/d\rfloor}\frac{1}{s\chi_R}\Big)=\frac{\chi_{\lfloor R/d\rfloor}}{\chi_R}\geq\frac{\ln R-\ln d}{\ln R+1}.$$

Thus, $\langle\mu_R|\tilde{\lambda}_R\rangle$ is arbitrarily close to $1$ for sufficiently large $R$. We are done since $|\tilde{\lambda}_R\rangle$ is obtained from $|\phi\rangle\otimes|\mu_R\rangle$ by applying a permutation.
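The chain of inequalities above can be checked numerically. The sketch below (with a hypothetical choice of amplitude vector $c=(c_1,\dots,c_d)$) builds the coordinates $c_j/\sqrt{r\chi_R}$, sorts them, and verifies the lower bound $(\ln R-\ln d)/(\ln R+1)$ on the overlap:

```python
import numpy as np

# chi_R is the harmonic sum sum_{r=1}^R 1/r, and |mu_R> has coordinates
# 1/sqrt(r * chi_R) for r = 1,...,R (a unit vector).
def overlap(c, R):
    chi_R = np.sum(1.0 / np.arange(1, R + 1))
    mu = 1.0 / np.sqrt(np.arange(1, R + 1) * chi_R)   # |mu_R>
    lam = np.outer(c, mu).ravel()                     # coordinates c_j / sqrt(r chi_R)
    lam_sorted = np.sort(lam)[::-1]                   # |lambda~_R>, sorted decreasingly
    return float(mu @ lam_sorted[:R])                 # <mu_R | lambda~_R>

# hypothetical normalized amplitude vector |phi> with d = 3
c = np.array([0.8, 0.5, np.sqrt(1 - 0.8**2 - 0.5**2)])
d = len(c)
for R in [10, 100, 1000, 10000]:
    bound = (np.log(R) - np.log(d)) / (np.log(R) + 1)
    ov = overlap(c, R)
    assert bound <= ov <= 1.0 + 1e-9   # the bound derived in the text
print("overlap at R = 10000:", overlap(c, 10000))
```

As the proof predicts, the printed overlap creeps toward $1$ as $R$ grows, albeit only logarithmically.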

Appendix C 𝒦\mathcal{K} and \mathcal{L} are closed convex cones

In this appendix we show that the two sets $\mathcal{K},\mathcal{L}$ defined in the proof of Lemma 11 are closed convex cones.

From the definition it is clear that $\mathcal{K}$ is closed and that $rM\in\mathcal{K}$ for $M\in\mathcal{K}$ and $r\geq 0$. Thus it remains to show that for $P,Q\in\mathcal{K}$, we have $M=P+Q\in\mathcal{K}$. Let $|v\rangle=\sum_i v_i|i\rangle$ be an arbitrary vector. We compute

\begin{align*}
\langle v|\widehat{M}|v\rangle &\geq \sum_i |v_i|^2 M_{ii}-\sum_{i\neq j}|v_iv_j|\cdot|M_{ij}|\\
&=\sum_i |v_i|^2 (P_{ii}+Q_{ii})-\sum_{i\neq j}|v_iv_j|\cdot|P_{ij}+Q_{ij}|\\
&\geq \sum_i |v_i|^2 (P_{ii}+Q_{ii})-\sum_{i\neq j}|v_iv_j|\cdot\big(|P_{ij}|+|Q_{ij}|\big)\\
&=\langle\hat v|\widehat{P}|\hat v\rangle+\langle\hat v|\widehat{Q}|\hat v\rangle\\
&\geq 0,
\end{align*}

where $|\hat v\rangle$ is the vector whose coordinates are the $|v_i|$’s, and in the last line we use $\widehat{P},\widehat{Q}\succeq 0$. As a result, $\widehat{M}\succeq 0$ and $M\in\mathcal{K}$. Therefore, $\mathcal{K}$ is a closed convex cone.
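This closure under addition can be probed numerically. The sketch below assumes, as the computation above suggests, that $\widehat{M}$ keeps the diagonal of $M$ and replaces each off-diagonal entry $M_{ij}$ by $-|M_{ij}|$; the sampling scheme is a hypothetical way of producing members of $\mathcal{K}$:

```python
import numpy as np

def hat(M):
    # M_hat has the same diagonal as M and off-diagonal entries -|M_ij|
    # (assumed form of the hat map from the proof of Lemma 11)
    H = -np.abs(M)
    np.fill_diagonal(H, np.diag(M))
    return H

def psd(M, tol=1e-9):
    return np.min(np.linalg.eigvalsh((M + M.T) / 2)) >= -tol

rng = np.random.default_rng(2)
n, found = 4, 0
while found < 100:
    # rejection-sample symmetric P, Q with hat(P), hat(Q) PSD; the large
    # diagonal shift makes the comparison matrices diagonally dominant
    P = rng.normal(size=(n, n)); P = (P + P.T) / 2 + 2 * n * np.eye(n)
    Q = rng.normal(size=(n, n)); Q = (Q + Q.T) / 2 + 2 * n * np.eye(n)
    if psd(hat(P)) and psd(hat(Q)):
        assert psd(hat(P + Q))   # the sum stays in K, as proven above
        found += 1
print("closure under addition verified on", found, "random pairs")
```

Note that $\widehat{P+Q}$ is generally not $\widehat{P}+\widehat{Q}$; the inequality chain in the proof is exactly what makes the assertion hold anyway.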

From the definition it is clear that $\mathcal{L}$ is a convex cone. Thus we need to show that $\mathcal{L}$ is closed. Indeed, the sum of closed convex cones is not necessarily closed, yet this holds for $\mathcal{L}=\sum_\alpha\mathcal{L}_\alpha$. To prove this, suppose that the sequence $\{X^{(j)}:j\geq 1\}\subset\mathcal{L}$ tends to $X$ as $j\to\infty$. Since the $X^{(j)}$’s and $X$ are positive semidefinite, for sufficiently large $j$ we have $0\preceq X^{(j)}\preceq X+I$. On the other hand, since $X^{(j)}$ belongs to $\mathcal{L}$, there are $X^{(j)}_\alpha\in\mathcal{L}_\alpha$ such that $X^{(j)}=\sum_\alpha X^{(j)}_\alpha$. Again using the fact that the $X^{(j)}_\alpha$’s are positive semidefinite, for sufficiently large $j$ we have

$$0\preceq X_\alpha^{(j)}\preceq X^{(j)}\preceq X+I.$$

Then by a compactness argument, there is a subsequence $\{j_k:\,k\geq 1\}$ such that $\lim_{k\to\infty}X_\alpha^{(j_k)}=X_\alpha$ exists for all $\alpha$. Now since $\mathcal{L}_\alpha$ is closed, we have $X_\alpha\in\mathcal{L}_\alpha$. On the other hand,

$$X=\lim_{k\to\infty}X^{(j_k)}=\lim_{k\to\infty}\sum_\alpha X^{(j_k)}_\alpha=\sum_\alpha X_\alpha.$$

This means that $X\in\mathcal{L}$. Therefore, $\mathcal{L}$ is also a closed convex cone.
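The remark above that a sum of closed convex cones need not be closed (so that the uniform bound $0\preceq X^{(j)}_\alpha\preceq X+I$ really is needed) can be illustrated by the classic counterexample below; the cones chosen are hypothetical and unrelated to the $\mathcal{L}_\alpha$ in the text:

```python
import numpy as np

# K1 = second-order cone {(x,y,z) : sqrt(x^2+y^2) <= z} in R^3,
# K2 = ray through (1,0,-1). The points
#   p_t = (-t, 1, sqrt(t^2+1)) + t*(1,0,-1) = (0, 1, sqrt(t^2+1) - t)
# all lie in K1 + K2 and converge to (0,1,0) as t -> infinity, yet
# (0,1,0) is NOT in K1 + K2: a decomposition would force the K1 part
# to be (-t, 1, t), contradicting sqrt(t^2+1) > t.

for t in [1.0, 10.0, 100.0, 1000.0]:
    k1 = np.array([-t, 1.0, np.hypot(t, 1.0)])
    assert np.hypot(k1[0], k1[1]) <= k1[2] + 1e-12     # k1 is in K1
    p = k1 + t * np.array([1.0, 0.0, -1.0])            # p is in K1 + K2
    assert np.allclose(p[:2], [0.0, 1.0])              # p -> (0,1,0)

# no feasible decomposition of (0,1,0): sqrt(t^2+1) <= t never holds
assert all(np.hypot(t, 1.0) > t for t in np.linspace(0.0, 100.0, 1001))
print("limit point (0,1,0) lies in the closure but not in the sum")
```

By contrast, the compactness argument above rules out exactly this kind of "escape to infinity" of the summands $X^{(j)}_\alpha$.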