
Marginal Independence and Partial Set Partitions

Francisco Ponce Carrión and Seth Sullivant Department of Mathematics
North Carolina State University, Raleigh, NC, 27695
[email protected] [email protected]
Abstract.

We establish a bijection between marginal independence models on $n$ random variables and split closed order ideals in the poset of partial set partitions. We also establish that every discrete marginal independence model is toric in cdf coordinates. This generalizes results of Boege, Petrovic, and Sturmfels [1] and Drton and Richardson [2], and provides a unified framework for discussing marginal independence models.

1. Introduction

Marginal independence models are often studied by associating a combinatorial structure to describe the independence relations between random variables. For instance, Drton and Richardson [2] used bidirected graphs to develop a family of marginal independence models for discrete random variables, and Boege, Petrovic, and Sturmfels [1] used simplicial complexes. The marginal independence models of [2] fit more broadly into the class of graphical models [7, 9]. Each of these structures describes a very different family of models, with very little overlap between them. In both cases it had been shown that when we restrict to discrete random variables, we obtain toric models after a linear change of coordinates. However, there are collections of marginal independence statements that cannot be expressed either as graphs or simplicial complexes. This motivates the proposal of a more general combinatorial structure which we use in our treatment of marginal independence.

In this paper, we study marginal independence using the poset of partial set partitions, $P\Pi_{n}$. In Section 2, we give a combinatorial description of marginal independence models on $n$ random variables by showing a correspondence between the models and certain order ideals of $P\Pi_{n}$.

In Section 3, we then restrict to discrete random variables and show that the resulting models are toric and smooth after a linear change of coordinates. In [1, 2, 9], the proofs that the families of marginal independence models considered there are toric use a linear change of coordinates into Möbius coordinates, obtained by applying Möbius inversion on an appropriately defined poset. In this paper, we describe a change of coordinates using the cumulative distribution function, which is classically used to describe marginal independence. This description has the advantage of tying the results to the classical description of marginal independence. We then move from the affine description of our models to the projective description by giving a parametrization of the homogeneous ideal corresponding to the model.

Finally, in Section 4 we discuss how marginal independence models associated to graphs and simplicial complexes can fit in the framework proposed in this paper. We then compare both classes of models and give an explicit characterization of the models that can be described using both simplicial complexes and graphs. We also count the number of distinct models in these three classes for up to $4$ random variables.

2. Marginal Independence and the Poset of Partial Set Partitions

In this section we discuss general marginal independence models, for completely general types of random variables. Marginal independence is naturally described via constraints on the joint cumulative distribution function of a collection of random variables. We will also see a combinatorial structure which can be used to capture all marginal independence models. This structure is called the poset of partial set partitions, and we will show how certain order ideals of this poset correspond to marginal independence models. We achieve this by introducing a closure operation on order ideals which is based on translating certain implications of marginal independence statements to the language of this poset.

Definition 2.1.

Let $X_{1},\ldots,X_{n}$ be random variables in $\mathbb{R}$. The joint cumulative distribution function (cdf) of $X_{1},\ldots,X_{n}$ is the function on $(\mathbb{R}\cup\{\pm\infty\})^{n}$ defined by the rule

$F(a_{1},\ldots,a_{n})={\rm P}(X_{1}\leq a_{1},\ldots,X_{n}\leq a_{n}).$

Note that for a function $F$ to be a joint cumulative distribution function it must satisfy a few properties.

  • For all $i$, $\lim_{x_{i}\rightarrow-\infty}F(x_{1},\ldots,x_{n})=0$.

  • $\lim_{x_{1},\ldots,x_{n}\rightarrow\infty}F(x_{1},\ldots,x_{n})=1$.

  • $F$ should be nondecreasing in each coordinate: that is, if $a\leq b$ coordinate-wise, then $F(a)\leq F(b)$.

  • $F$ should be right continuous in each coordinate: that is, for each $i$ and all $a_{1},\ldots,a_{n}$,

    $\lim_{x_{i}\rightarrow a_{i}^{+}}F(a_{1},\ldots,a_{i-1},x_{i},a_{i+1},\ldots,a_{n})=F(a_{1},\ldots,a_{n}).$

  • For all $a,b$, $F(\max(a,b))+F(\min(a,b))\geq F(a)+F(b)$.

These conditions are not exhaustive; that is, there are functions that satisfy all of these properties but are not cdfs of a collection of random variables. However, we do not need to know a precise characterization of functions which are joint cdfs to derive our results.

Note that the joint cdf contains all the marginal cdfs as well. For example, with three random variables, we have

$F(a_{1},a_{2},\infty)={\rm P}(X_{1}\leq a_{1},X_{2}\leq a_{2},X_{3}\leq\infty)={\rm P}(X_{1}\leq a_{1},X_{2}\leq a_{2}).$

Marginal independence of collections of random variables can be expressed in terms of the joint cdf. Let $X=(X_{1},\ldots,X_{n})$ denote the random vector, and let $X_{A}=(X_{i}:i\in A)$ be a subvector.

Definition 2.2.

Let $A,B\subseteq[n]$ be disjoint subsets. We say that $X_{A}$ and $X_{B}$ are marginally independent (denoted $X_{A}\mbox{$\perp\kern-5.5pt\perp$}X_{B}$, or just $A\mbox{$\perp\kern-5.5pt\perp$}B$ for short), if

$F(x)=F(y)F(z)$

for all $x$ such that $x_{i}=\infty$ for all $i\in[n]\setminus(A\cup B)$, and where

$y_{i}=\begin{cases}x_{i}&\mbox{if }i\in A\\ \infty&\mbox{otherwise}\end{cases}\qquad\mbox{ and }\qquad z_{i}=\begin{cases}x_{i}&\mbox{if }i\in B\\ \infty&\mbox{otherwise}\end{cases}$
Example 2.3.

Suppose $n=4$, and consider the marginal independence statement $X_{1}\mbox{$\perp\kern-5.5pt\perp$}(X_{2},X_{4})$ (or $\{1\}\mbox{$\perp\kern-5.5pt\perp$}\{2,4\}$, or $1\mbox{$\perp\kern-5.5pt\perp$}24$). This translates into the condition:

$F(x_{1},x_{2},\infty,x_{4})=F(x_{1},\infty,\infty,\infty)F(\infty,x_{2},\infty,x_{4}),\qquad\forall\;x_{1},x_{2},x_{4}\in\mathbb{R}\cup\{\infty\}.$

Said another way, we have, for all $x_{1},x_{2},x_{4}$,

${\rm P}(X_{1}\leq x_{1},X_{2}\leq x_{2},X_{4}\leq x_{4})={\rm P}(X_{1}\leq x_{1})\,{\rm P}(X_{2}\leq x_{2},X_{4}\leq x_{4}).$

More generally, we can define marginal independence for a collection of sets.

Definition 2.4.

Let $A_{1},\ldots,A_{k}\subseteq[n]$ with $A_{i}\cap A_{j}=\emptyset$ for all $i\neq j$. Then $X_{A_{1}}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}X_{A_{k}}$ if, for all $x$ with $x_{i}=\infty$ for all $i\in[n]\setminus(A_{1}\cup\cdots\cup A_{k})$, we have

$F(x)=F(y^{(1)})\cdots F(y^{(k)})$

where

$y^{(j)}_{i}=\begin{cases}x_{i}&\mbox{if }i\in A_{j}\\ \infty&\mbox{otherwise}.\end{cases}$

Now that we have the notion of generalized marginal independence, we can discuss marginal independence models. A marginal independence model is a collection of marginal independence statements and the set of probability distributions that are compatible with those statements. We will describe a completely general combinatorial structure for representing all such marginal independence models. This depends on the structure of a certain poset called the poset of partial set partitions.

Definition 2.5.

A partial set partition of $[n]$ is an unordered list of nonempty sets $\pi=\pi_{1}|\cdots|\pi_{k}$ such that each $\pi_{i}\subseteq[n]$, and $\pi_{i}\cap\pi_{j}=\emptyset$ for all $i\neq j$. The set of all partial set partitions of $[n]$ is denoted $P\Pi_{n}$. A set $\pi_{i}$ in the partial set partition $\pi$ is called a block. The ground set $|\pi|$ of $\pi$ is defined as $|\pi|=\cup_{i=1}^{k}\pi_{i}$. The set of partial set partitions where each $\pi$ has at least $2$ blocks is denoted $P\Pi_{n,2}$.

Note that the elements of $P\Pi_{n,2}$ correspond to marginal independence statements. Specifically, $\pi=\pi_{1}|\cdots|\pi_{k}$ corresponds to the independence statement $X_{\pi_{1}}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}X_{\pi_{k}}$.
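The combinatorics here is easy to experiment with. The following short Python sketch (our own illustration; the function name and encoding are not from the paper) enumerates partial set partitions by brute force and counts the elements of $P\Pi_{n}$ and $P\Pi_{n,2}$ for small $n$.

```python
from itertools import product

def partial_set_partitions(n, min_blocks=0):
    """Enumerate partial set partitions of [n] = {1,...,n} as frozensets of blocks.
    Brute force over block-label assignments (label 0 means the element is left out)."""
    out = set()
    for labels in product(range(n + 1), repeat=n):
        blocks = {}
        for elem, lab in zip(range(1, n + 1), labels):
            if lab:
                blocks.setdefault(lab, set()).add(elem)
        if len(blocks) >= min_blocks:
            out.add(frozenset(frozenset(b) for b in blocks.values()))
    return out

for n in (2, 3, 4):
    print(n, len(partial_set_partitions(n)), len(partial_set_partitions(n, min_blocks=2)))
# prints n, |PΠ_n|, |PΠ_{n,2}|:  2 5 1  /  3 15 7  /  4 52 36
```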

We want to turn $P\Pi_{n}$ into a poset that is compatible with marginal independence implications. We will explore some implications between marginal independence statements that will lead naturally to a poset structure on $P\Pi_{n}$.

Lemma 2.6.

If a cdf $F$ satisfies $A_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}A_{k}$, then:

  1. $F$ satisfies $\cup_{s\in S}A_{s}\mbox{$\perp\kern-5.5pt\perp$}\cup_{t\in T}A_{t}$ for all $S,T\subseteq[k]$ such that $S\cap T=\emptyset$.

  2. $F$ satisfies $B_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}B_{k}$ where $B_{i}\subseteq A_{i}$ for all $i$.

Proof.

First, we will show that for $S,T\subseteq[k]$ such that $S\cap T=\emptyset$, if $F$ satisfies $A_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}A_{k}$, then it satisfies $\cup_{s\in S}A_{s}\mbox{$\perp\kern-5.5pt\perp$}\cup_{t\in T}A_{t}$. Let $x$ be such that $x_{i}=\infty$ if $i\in[n]\setminus\cup_{s\in S}A_{s}$; in particular, $x_{i}=\infty$ if $i\in[n]\setminus(A_{1}\cup\cdots\cup A_{k})$ as well. Thus,

$F(x)=F(y^{(1)})\cdots F(y^{(k)})$

where

$y^{(j)}_{i}=\begin{cases}x_{i}&\mbox{if }i\in A_{j}\mbox{ and }j\in S\\ \infty&\mbox{otherwise}.\end{cases}$

Hence, since $F(y^{(j)})=1$ whenever $j\notin S$,

$F(x)=\prod_{s\in S}F\left(y^{(s)}\right).$

We get an analogous result for $\cup_{t\in T}A_{t}$.

Now, let $x$ be such that $x_{i}=\infty$ if $i\in[n]\setminus\cup_{t\in S\cup T}A_{t}$. Then, it holds that

$F(x)=F(y^{(1)})\cdots F(y^{(k)})$

where

$y^{(j)}_{i}=\begin{cases}x_{i}&\mbox{if }i\in A_{j}\mbox{ and }j\in S\cup T\\ \infty&\mbox{otherwise}.\end{cases}$

Hence,

$F(x)=\prod_{s\in S}F\left(y^{(s)}\right)\prod_{t\in T}F\left(y^{(t)}\right)=F(w)F(z)$

where

$w_{i}=\begin{cases}x_{i}&\mbox{if }i\in\cup_{s\in S}A_{s}\\ \infty&\mbox{otherwise}\end{cases}\qquad\mbox{ and }\qquad z_{i}=\begin{cases}x_{i}&\mbox{if }i\in\cup_{t\in T}A_{t}\\ \infty&\mbox{otherwise}\end{cases}$

Now we will show that if $F$ satisfies $A_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}A_{k}$, then it satisfies $B_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}B_{k}$ whenever $B_{i}\subseteq A_{i}$ for all $i$. Assume $F$ satisfies $A_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}A_{k}$. Let $x$ be such that $x_{i}=\infty$ for all $i\in[n]\setminus(B_{1}\cup\ldots\cup B_{k})$; then it follows that

$F(x)=F(y^{(1)})\cdots F(y^{(k)})$

where

$y^{(j)}_{i}=\begin{cases}x_{i}&\mbox{if }i\in B_{j}\subseteq A_{j}\\ \infty&\mbox{otherwise},\end{cases}$

which proves $F$ also satisfies $B_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}B_{k}$. ∎

Proposition 2.7.

Suppose that the joint cdf $F$ satisfies the marginal independence statement $A_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}A_{k}$. Let $B_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}B_{l}$ be another marginal independence statement such that for each $i\in[l]$, there is a set $I_{i}\subseteq[k]$ such that $I_{i}\cap I_{j}=\emptyset$ for all $i\neq j$, and $B_{i}\subseteq\cup_{t\in I_{i}}A_{t}$. Then $F$ also satisfies $B_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}B_{l}$.

Proof.

Assume $F$ satisfies the marginal independence statement $A_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}A_{k}$. Let $B_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}B_{l}$ be another marginal independence statement such that for each $i\in[l]$, there is a set $I_{i}\subseteq[k]$ such that $I_{i}\cap I_{j}=\emptyset$ for all $i\neq j$, and $B_{i}\subseteq\cup_{t\in I_{i}}A_{t}$. By Lemma 2.6 (1), we know that $F$ satisfies $\cup_{t\in I_{1}}A_{t}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}\cup_{t\in I_{l}}A_{t}$. Finally, $F$ satisfies $B_{1}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}B_{l}$ by Lemma 2.6 (2), since $B_{i}\subseteq\cup_{t\in I_{i}}A_{t}$ for all $i\in[l]$. ∎

This yields a partial order on $P\Pi_{n}$ by the same rule as in Proposition 2.7.

Definition 2.8.

Let $\pi=\pi_{1}|\cdots|\pi_{k}$ and $\tau=\tau_{1}|\cdots|\tau_{l}$ be two partial set partitions. Define a partial order on $P\Pi_{n}$ by declaring $\pi>\tau$ if for each $i\in[l]$, there is a set $I_{i}\subseteq[k]$ such that $I_{i}\cap I_{j}=\emptyset$ for all $i\neq j$, and $\tau_{i}\subseteq\cup_{t\in I_{i}}\pi_{t}$.

Note that the covering relations in $P\Pi_{n}$ are of two types. We have that $\pi$ covers $\tau$ if either

  1. $\pi$ can be obtained from $\tau$ by adding a single element to one of the blocks of $\tau$, or

  2. $\pi$ can be obtained from $\tau$ by splitting a block of $\tau$ into two blocks.

Proposition 2.9.

In $P\Pi_{n}$, $\pi<\tau$ if and only if $S=|\pi|\subseteq|\tau|$ and $\pi\succ_{S}\tau\cap S$, where $\succ_{S}$ is the standard order in the lattice of set partitions of $S$ and $\tau\cap S=\tau_{1}\cap S|\cdots|\tau_{l}\cap S$.

Proof.

This follows directly from the covering relations. ∎
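The criterion of Proposition 2.9 also gives a convenient way to test the order relation computationally. Here is a minimal Python sketch (our own, with partial set partitions encoded as lists of frozensets; the function names are illustrative) implementing that test.

```python
def ground(pi):
    """Ground set |pi| of a partial set partition."""
    return frozenset(x for block in pi for x in block)

def leq(pi, tau):
    """pi <= tau in PΠ_n: |pi| ⊆ |tau| and tau ∩ |pi| refines pi (Proposition 2.9)."""
    S = ground(pi)
    if not S <= ground(tau):
        return False
    restricted = [block & S for block in tau if block & S]
    return all(any(small <= big for big in pi) for small in restricted)

PP = lambda *blocks: [frozenset(b) for b in blocks]
print(leq(PP({1}, {2}), PP({1, 3}, {2, 4})))   # True:  1|2 <= 13|24
print(leq(PP({1}, {2}), PP({1, 2, 3})))        # False: 1|2 and 123 are incomparable
```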

Remark.

Note that $P\Pi_{n}$ is not, in general, a lattice. Indeed, the elements $1|2$ and $3|4$ have two minimal upper bounds, namely $13|24$ and $14|23$, and hence no least upper bound. This shows a key difference with the poset of set partitions $\Pi_{n}$, which is well known to be a lattice [8].

On the other hand, $P\Pi_{n}$ is a graded poset, with

$\rho(\pi_{1}|\cdots|\pi_{k})=\#(\cup_{i}\pi_{i})+k-1.$

This formula follows by counting the number of covering relations needed to get from the bottom element $\emptyset$ to a partial set partition $\pi$.

Remark.

Note that $P\Pi_{n}$ contains the Boolean lattice $B_{n}$ as a subposet. The lattice of set partitions $\Pi_{n}$ is not a subposet of $P\Pi_{n}$, but rather its opposite $\Pi_{n}^{opp}$ is a subposet. In fact, $P\Pi_{n}$ contains the opposite of the lattice of set partitions of any $S\subseteq[n]$.

For studying marginal independence models, it is more natural to consider the poset $P\Pi_{n,2}$ of partial set partitions where each partition has at least two blocks. We will see that marginal independence models correspond to certain types of order ideals in $P\Pi_{n,2}$.

Definition 2.10.

Let $\pi^{(1)},\ldots,\pi^{(k)}\in P\Pi_{n,2}$. Then

$\mathcal{C}=\left\{\pi_{1}^{(1)}|\cdots|\pi_{i_{1}}^{(1)},\ldots,\pi_{1}^{(k)}|\cdots|\pi_{i_{k}}^{(k)}\right\}$

or, equivalently,

$\mathcal{C}=\left\{X_{\pi_{1}^{(1)}}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}X_{\pi_{i_{1}}^{(1)}},\ldots,X_{\pi_{1}^{(k)}}\mbox{$\perp\kern-5.5pt\perp$}\cdots\mbox{$\perp\kern-5.5pt\perp$}X_{\pi_{i_{k}}^{(k)}}\right\}$

is a list of independence statements on random variables $X_{1},\ldots,X_{n}$. A marginal independence model, $\mathcal{M}_{\mathcal{C}}$, on these random variables is the set of distributions $P$ that satisfy all the statements in $\mathcal{C}$.

Definition 2.11.

Let $\pi=\pi_{1}|\cdots|\pi_{k}\in P\Pi_{n,2}$ and $\tau=\tau_{1}|\tau_{2}\in P\Pi_{n,2}$. We say $\tau$ splits $\pi$ if $\pi_{i}=\tau_{1}\cup\tau_{2}$ for some $i$. We denote the splitting of $\pi$ by $\tau$ with $\pi^{\tau}=\pi_{1}|\cdots|\pi_{i-1}|\tau_{1}|\tau_{2}|\pi_{i+1}|\cdots|\pi_{k}$.

Definition 2.12.

Let $\mathcal{C}$ be an order ideal of $P\Pi_{n,2}$. We say $\mathcal{C}$ is split closed if $\pi^{\tau}\in\mathcal{C}$ for all $\pi,\tau\in\mathcal{C}$ such that $\tau$ splits $\pi$.

Definition 2.13.

The split closure $\overline{\mathcal{C}}$ of $\mathcal{C}$ is the smallest split closed order ideal containing $\mathcal{C}$.

Remark.

This allows us to make a precise connection between collections of marginal independence statements $\mathcal{C}$ and certain order ideals in $P\Pi_{n,2}$, namely, their split closure $\overline{\mathcal{C}}$.
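As an illustration of Definitions 2.11–2.13, the following Python sketch (our own; the encoding of partial set partitions as frozensets of frozensets and all function names are illustrative) computes the split closure of a collection of statements by alternately closing downward in the partial order and closing under the splitting operation.

```python
from itertools import product

def ground(pi):
    return frozenset(x for block in pi for x in block)

def leq(pi, tau):
    """pi <= tau in PΠ_n, via the criterion of Proposition 2.9."""
    S = ground(pi)
    if not S <= ground(tau):
        return False
    restricted = [b & S for b in tau if b & S]
    return all(any(small <= big for big in pi) for small in restricted)

def all_pp2(n):
    """All partial set partitions of [n] with at least 2 blocks (brute force, small n)."""
    out = set()
    for labels in product(range(n + 1), repeat=n):   # label 0 = element left out
        blocks = {}
        for elem, lab in zip(range(1, n + 1), labels):
            if lab:
                blocks.setdefault(lab, set()).add(elem)
        if len(blocks) >= 2:
            out.add(frozenset(frozenset(b) for b in blocks.values()))
    return out

def split_closure(C, n):
    """Smallest split closed order ideal of PΠ_{n,2} containing C."""
    universe = all_pp2(n)
    closure = set(C)
    changed = True
    while changed:
        changed = False
        for sigma in universe:                       # close downward in the order
            if sigma not in closure and any(leq(sigma, pi) for pi in closure):
                closure.add(sigma)
                changed = True
        for pi in list(closure):                     # close under splitting (Definition 2.12)
            for tau in list(closure):
                if len(tau) != 2:
                    continue
                t1, t2 = tuple(tau)
                if (t1 | t2) in pi:
                    new = frozenset((pi - {t1 | t2}) | {t1, t2})
                    if new not in closure:
                        closure.add(new)
                        changed = True
    return closure

# Example: the split closure of {1|23} on [3] is {1|23, 1|2, 1|3}
PP = lambda *blocks: frozenset(frozenset(b) for b in blocks)
print(sorted(sorted(sorted(b) for b in pi) for pi in split_closure({PP({1}, {2, 3})}, 3)))
# [[[1], [2]], [[1], [2, 3]], [[1], [3]]]
```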

Proposition 2.14.

Let $\mathcal{C}$ be a split closed order ideal. If $\pi,\mu\in\mathcal{C}$ are such that $S=|\pi|=|\mu|$ for some $S\subseteq[n]$, then $\pi\wedge\mu\in\mathcal{C}$, where $\wedge$ is the common refinement in the lattice of set partitions of $S$.

Proof.

Let $\pi,\mu\in\mathcal{C}$ be such that $S=|\pi|=|\mu|$ for some $S\subseteq[n]$. Observe that $\pi\wedge\mu=\tau$ where, for each $i$, $\tau_{i}=\pi_{r}\cap\mu_{t}$ for some $r,t$. Now consider the partition $\gamma^{i}=\mu\cap\pi_{i}=\mu_{1}\cap\pi_{i}|\cdots|\mu_{l}\cap\pi_{i}$; by definition, either $\gamma^{i}=\pi_{i}\subseteq\mu_{j}$ for some $j$, or $\gamma^{i}\leq\mu$. Observe further that we can write $\tau=\gamma^{1}|\cdots|\gamma^{k}$. If $\gamma^{i}=\pi_{i}$ for all $i$, then $\tau=\pi\in\mathcal{C}$. On the other hand, if $\gamma^{i}\leq\mu$, then clearly $\gamma^{i}\in\mathcal{C}$, and since $\mathcal{C}$ is a split closed order ideal, $\pi^{\gamma^{i}}\in\mathcal{C}$ as well. Thus, $\tau=\pi^{\gamma^{1}\cdots\gamma^{k}}\in\mathcal{C}$. ∎

We finish this section by stating a general theorem about marginal independence models that makes the correspondence between split closed order ideals and models precise. We prove this theorem in Section 3.

Theorem 2.15.

Marginal independence models $\mathcal{M}_{\mathcal{C}}$ on $n$ random variables are in bijective correspondence with split closed order ideals $\mathcal{C}\subseteq P\Pi_{n,2}$.

At this level of generality, it is not yet clear how to prove the correspondence. One of the directions is straightforward, based on results we have already proven: for any marginal independence model, there exists a split closed order ideal obtained by taking all implied marginal independence statements. For the other direction, it suffices to show the existence of probability distributions that satisfy exactly the independence statements in a given split closed order ideal, and no others. We will derive this result as a consequence of our study of marginal independence for discrete random variables in the following section.

3. Discrete marginal independence models

In this section, we discuss marginal independence models for random variables with finitely many states, which allows us to translate the independence statements into polynomial constraints. Past work has been done on certain classes of these models where the marginal independence statements are given by graphs or simplicial complexes [1, 2]. However, there are marginal independence models that cannot be represented with either of these structures, which is why we present a more general approach using split closed order ideals in the poset of partial set partitions. In previous work [1, 2, 9], it has been shown that these models are toric after a certain linear change of coordinates whose inverse can be found by applying Möbius inversion on a specific poset. In our treatment of marginal independence, we will use the cumulative distribution function as our coordinate system. This allows us to express marginal independence constraints in a very natural way while preserving the structure of the inverse map as a Möbius inversion on a poset isomorphic to the one used in [1].

We begin by defining what a single marginal independence statement looks like in terms of the probability tensor and then we will generalize it further to define our models. For this, we need to first discuss some notation that will be used.

Definition 3.1.

For finite random variables $X_{1},\ldots,X_{n}$, with respective state spaces $[r_{i}]$, $r_{i}\in\mathbb{N}$, we use the shorthand $p_{i_{1}\ldots i_{n}}$ to denote $P(X_{1}=i_{1},\ldots,X_{n}=i_{n})$. Furthermore, when we discuss marginalizations of the probability tensor, we will allow $i_{k}=+$ to symbolize adding over all the states of the corresponding variable, i.e., if $i_{k}=+$, then

$p_{i_{1},\ldots,i_{k},\ldots,i_{n}}=p_{i_{1},\ldots,+,\ldots,i_{n}}=\sum_{j_{k}=1}^{r_{k}}p_{i_{1},\ldots,j_{k},\ldots,i_{n}}.$

For an index vector $i=i_{1}\ldots i_{n}$, we define the support of $i$ as $\mathrm{supp}(i)=\{l\in[n]\;|\;i_{l}\neq+\}$.

Definition 3.2.

Let $X_{1},\ldots,X_{n}$ be finite random variables. We say that, for disjoint $A,B\subseteq[n]$, $X_{A}\mbox{$\perp\kern-5.5pt\perp$}X_{B}$ holds if and only if for a probability distribution $P$ indexed by $p_{i}=p_{i_{1}\ldots i_{n}}$ we have that

$p_{i_{1}\ldots i_{n}}p_{j_{1}\ldots j_{n}}-p_{k_{1}\ldots k_{n}}p_{l_{1}\ldots l_{n}}=0,$

for all $i,j,k,l$ with $\mathrm{supp}(i)=\mathrm{supp}(j)=\mathrm{supp}(k)=\mathrm{supp}(l)=A\cup B$, $k_{A}=i_{A}$, $k_{B}=j_{B}$, $l_{A}=j_{A}$, and $l_{B}=i_{B}$.

Equivalently, these constraints can be obtained from all the $2\times 2$ minors of the matrix resulting from marginalizing the probability tensor $P$ by summing out the indices not in $A\cup B$, whose columns are indexed by the states of $X_{A}$ and whose rows are indexed by the states of $X_{B}$, where $X_{A}$ and $X_{B}$ are seen as random vectors.

Definition 3.3.

Let $P$ be a probability distribution on random variables $X_{1},\ldots,X_{n}$ with respective state spaces $[r_{1}],\ldots,[r_{n}]$, with $r_{i}\in\mathbb{N}$ for $i\in[n]$. Then, we define $\mathcal{R}=[r_{1}]\times\cdots\times[r_{n}]$.

Definition 3.4.

We define the ring of polynomials in probability coordinates, $\mathbb{R}[P]$, for the probability tensor of format $\mathcal{R}=[r_{1}]\times\cdots\times[r_{n}]$ as

$\mathbb{R}[P]=\mathbb{R}[p_{i}:\;i\in\mathcal{R}].$
Definition 3.5.

The marginal independence ideal $MI_{\mathcal{C}}$ associated to the split closed order ideal $\mathcal{C}$ on discrete random variables $X_{1},\ldots,X_{n}$ is

$MI_{\mathcal{C}}=\left\langle 2\times 2\mbox{ minors of all flattenings of }P_{\pi}:\;\pi\in\mathcal{C}\right\rangle,$

where $\pi=\pi_{1}|\cdots|\pi_{k}$ and $P_{\pi}$ is the $k$-way tensor which is obtained by first marginalizing the tensor $P$ by summing out the indices not in $|\pi|$, and then flattening the resulting tensor according to the set partition $\pi$.

Alternatively, $MI_{\mathcal{C}}$ is the ideal resulting from imposing rank $1$ conditions on the tensors $P_{\pi}$, for all $\pi\in\mathcal{C}$. This definition is a slight modification of the one in [1] to fit our class of models.

Example 3.6.

Let $X_{1}$ be a binary random variable, let $X_{2},X_{3}$ be ternary random variables, and consider the marginal independence statement $X_{2}\mbox{$\perp\kern-5.5pt\perp$}X_{3}$. Then the equations that define $MI_{2\mbox{$\perp\kern-5.5pt\perp$}3}$ are

$p_{+i_{2}i_{3}}p_{+j_{2}j_{3}}-p_{+i_{2}j_{3}}p_{+j_{2}i_{3}}=(p_{1i_{2}i_{3}}+p_{2i_{2}i_{3}})(p_{1j_{2}j_{3}}+p_{2j_{2}j_{3}})-(p_{1i_{2}j_{3}}+p_{2i_{2}j_{3}})(p_{1j_{2}i_{3}}+p_{2j_{2}i_{3}})=0,$

for all $i_{2},i_{3},j_{2},j_{3}\in[3]$. We can also obtain these equations from the $2\times 2$ minors of the matrix

$\begin{pmatrix}p_{+11}&p_{+12}&p_{+13}\\ p_{+21}&p_{+22}&p_{+23}\\ p_{+31}&p_{+32}&p_{+33}\end{pmatrix}.$
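As a numerical sanity check of Example 3.6, the following sketch (our own illustration, using numpy; all variable names are ours) builds a random joint distribution satisfying $X_{2}\mbox{$\perp\kern-5.5pt\perp$}X_{3}$ and verifies that the marginalized $3\times 3$ matrix above has rank one.

```python
import numpy as np

rng = np.random.default_rng(0)

# A joint distribution p(x1, x2, x3) with X2 ⊥⊥ X3: draw p(x1 | x2, x3)
# arbitrarily and set p(x2, x3) = p(x2) p(x3).
p2 = rng.dirichlet(np.ones(3))                    # marginal of X2
p3 = rng.dirichlet(np.ones(3))                    # marginal of X3
cond1 = rng.dirichlet(np.ones(2), size=(3, 3))    # p(x1 | x2, x3), shape (3, 3, 2)
P = np.einsum('j,k,jki->ijk', p2, p3, cond1)      # joint tensor, shape (2, 3, 3)

M = P.sum(axis=0)                                 # the matrix (p_{+ x2 x3}) above
print(np.linalg.matrix_rank(M, tol=1e-10))                      # 1: all 2x2 minors vanish
print(np.allclose(np.outer(M.sum(axis=1), M.sum(axis=0)), M))   # True
```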
Example 3.7.

Let $n=4$ and consider the model given by binary random variables such that $X_{1,2}\mbox{$\perp\kern-5.5pt\perp$}X_{3}$. Then, the equations that define $MI_{12\mbox{$\perp\kern-5.5pt\perp$}3}$ are

$p_{i_{1}i_{2}i_{3}+}p_{j_{1}j_{2}j_{3}+}-p_{i_{1}i_{2}j_{3}+}p_{j_{1}j_{2}i_{3}+}=(p_{i_{1}i_{2}i_{3}1}+p_{i_{1}i_{2}i_{3}2})(p_{j_{1}j_{2}j_{3}1}+p_{j_{1}j_{2}j_{3}2})-(p_{i_{1}i_{2}j_{3}1}+p_{i_{1}i_{2}j_{3}2})(p_{j_{1}j_{2}i_{3}1}+p_{j_{1}j_{2}i_{3}2})=0,$

for all $i_{1},i_{2},i_{3},j_{1},j_{2},j_{3}\in[2]$. These equations can again be obtained from the $2\times 2$ minors of the matrix

$\begin{pmatrix}p_{111+}&p_{112+}\\ p_{121+}&p_{122+}\\ p_{211+}&p_{212+}\\ p_{221+}&p_{222+}\end{pmatrix}.$

The ideal $MI_{\mathcal{C}}$ tends not to be prime when $\mathcal{C}$ has more than one maximal element, which represents a challenge when studying these models. We get around this issue by using the cumulative distribution function as our new system of coordinates, since marginal independence constraints can be expressed in a simpler way using the cdf.

Definition 3.8 (cdf coordinates).

We define the linear change of coordinates $\varphi:\mathbb{R}^{\mathcal{R}}\rightarrow\mathbb{R}^{\mathcal{R}}$, $P\mapsto Q$, by transforming the tensor $P$ so that the entries of $Q$ are cumulative distributions as follows:

$q_{j_{1}\ldots j_{n}}=P\left\{X_{1}\leq j_{1},\ldots,X_{n}\leq j_{n}\right\}=\sum_{i_{1}\leq j_{1},\ldots,i_{n}\leq j_{n}}p_{i_{1}\ldots i_{n}}.$
Remark.

In the case of binary random variables, cdf coordinates become much simpler. We have that for any $q_{i_{1}\ldots i_{n}}$, either $i_{l}=1$ or $i_{l}=2$, thus we can represent each cdf coordinate by indexing it with its support.

We can discuss this transformation, and in particular its inverse, in terms of the poset of the index vectors of the tensors. Let $\mathcal{P}=[r_{1}]\times\cdots\times[r_{n}]$ be the usual direct product poset, where two index vectors satisfy $j\leq i$ if $j_{k}\leq i_{k}$ for all $k\in[n]$. Then, we can state the equation for cdf coordinates in Definition 3.8 as follows: for all $i\in\mathcal{P}$,

$q_{i}:=\sum_{j\leq i}p_{j}.$

Using the Möbius inversion formula we have that, for all $i\in\mathcal{P}$,

$p_{i}=\sum_{j\leq i}\mu(j,i)q_{j}=\sum_{j\leq i}\prod_{k=1}^{n}\mu(j_{k},i_{k})q_{j},$

where

$\mu(j_{k},i_{k})=\begin{cases}1&\mbox{ if }i_{k}=j_{k}\\ -1&\mbox{ if }i_{k}=j_{k}+1\\ 0&\mbox{ otherwise.}\end{cases}$
Example 3.9.

In the case when $r_{1}=r_{2}=3$, the poset $\mathcal{P}$ is as in Figure 1.

Figure 1. Index poset $\mathcal{P}$ from Example 3.9.

Then, we have that

$q_{22}=\sum_{(i_{1}i_{2})\leq(22)}p_{i_{1}i_{2}}.$

Hence, using Möbius inversion on this poset yields

$p_{22}=\sum_{(i_{1}i_{2})\leq(22)}\mu(i_{1}i_{2},22)q_{i_{1}i_{2}}=q_{22}-q_{12}-q_{21}+q_{11}.$

Marginal independence is usually stated in terms of the cumulative probability distribution, which is why this is a natural transformation to consider as it simplifies the equations for marginal independence.
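The change to cdf coordinates and its Möbius inverse are easy to verify numerically: the transform is an iterated cumulative sum along each axis, and the inverse is an iterated finite difference. The following numpy sketch (our own illustration; all names are ours) checks this on a random $2\times 3\times 3$ probability tensor.

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (2, 3, 3)                                   # state space sizes r_1, r_2, r_3
P = rng.dirichlet(np.ones(np.prod(shape))).reshape(shape)

def cdf_coordinates(P):
    """q_{j_1...j_n} = sum_{i <= j} p_{i_1...i_n}: iterated cumulative sums."""
    Q = P.copy()
    for axis in range(P.ndim):
        Q = np.cumsum(Q, axis=axis)
    return Q

def inverse_cdf(Q):
    """Recover P from Q by Möbius inversion, i.e. iterated finite differences."""
    P = Q.copy()
    for axis in range(Q.ndim):
        P = np.diff(P, axis=axis, prepend=0)
    return P

Q = cdf_coordinates(P)
print(np.isclose(Q[-1, -1, -1], 1.0))      # q_{r_1 r_2 r_3} = 1
print(np.allclose(inverse_cdf(Q), P))      # Möbius inversion recovers P
```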

Our goal now is to explore the ideals of discrete marginal independence models in the cdf coordinates. To do this, we need to explore some properties of split closed order ideals.

Definition 3.10.

Let $\mathcal{C}$ be a split closed order ideal of $P\Pi_{n,2}$. Then:

  1. We say that a nonempty set $D\subseteq[n]$ is disconnected (with respect to $\mathcal{C}$) if there exists $\pi\in\mathcal{C}$ such that $D=|\pi|$.

  2. A connected set is a set that is not disconnected.

  3. A connected set $C$ is maximal with respect to some disconnected set $D$ if any set $C^{\prime}\subseteq D$ such that $C\subset C^{\prime}$ is disconnected.

  4. We define the set $\mathrm{Con}(\mathcal{C})=\left\{C\subseteq[n]:\;C\mbox{ is connected}\right\}$.

Proposition 3.11.

For any disconnected set $D$ with respect to $\mathcal{C}$, there exists a unique maximal $\pi\in\mathcal{C}$ such that

$D=|\pi|=\cup_{i=1}^{k}\pi_{i},$

where all $\pi_{i}$ are maximal connected sets.

Proof.

Since $D$ is disconnected, there exists $\gamma\in\mathcal{C}$ such that $D=|\gamma|$. If some block $\gamma_{k}$ is disconnected, then there is some $\tau\in\mathcal{C}$ such that $\gamma_{k}=|\tau|$. Then, $D=|\gamma^{\tau}|$ is the new decomposition; additionally, $\gamma^{\tau}\in\mathcal{C}$ and $\gamma^{\tau}>\gamma$. We can repeat this process on $\gamma^{\tau}$ if any of its blocks are disconnected. This process terminates since every chain of a finite poset is finite.

To show uniqueness of $\pi$, let us consider a different maximal decomposition $\mu$. Since $D=|\pi|=|\mu|$ and, by assumption, $\pi\neq\mu$, we have $\mu,\pi<\pi\wedge\mu\in\mathcal{C}$, thus contradicting maximality of $\pi$ and $\mu$. ∎

Let $i\in\mathcal{R}$. Define the support of $i$ to be $\mathrm{supp}(i)=\{l\in[n]\mid i_{l}\neq r_{l}\}$. Note that for cdf coordinates this agrees with our previous notion of support, since if $i_{l}=r_{l}$ in $q_{i}$, it means we are assuming $X_{l}\leq\infty$ in our cdf calculations, a condition denoted by $+$ in $p_{i}$.

Corollary 3.12.

Let $\mathcal{C}$ be a split closed order ideal. A probability tensor $P$ lies in the model $\mathcal{M}_{\mathcal{C}}$ if and only if its cdf coordinates satisfy the following condition: for any $D\subseteq[n]$ that is disconnected with respect to $\mathcal{C}$, and any index vectors $i,j^{(l)}\in\mathcal{R}$ such that

  • $D=\pi_{1}\cup\ldots\cup\pi_{k}$ is the decomposition of $D$ into maximal connected sets,

  • $D=\mathrm{supp}(i)$, $\pi_{l}=\mathrm{supp}(j^{(l)})$, and

  • $i_{\mathrm{supp}(j^{(l)})}=j^{(l)}_{\mathrm{supp}(j^{(l)})}$,

we have

(1) $q_{i}=q_{j^{(1)}}\cdots q_{j^{(k)}}.$
Proof.

We know $P$ lies in the model $\mathcal{M}_{\mathcal{C}}$ if and only if its corresponding cdf satisfies all of the marginal independence constraints. We know that the cdf must satisfy, for all $\pi\in\mathcal{C}$,

$q_{i}=q_{j^{(1)}}\cdots q_{j^{(k)}}$

where

  • $|\pi|=\mathrm{supp}(i)$, $\pi_{l}=\mathrm{supp}(j^{(l)})$, and

  • $i_{\mathrm{supp}(j^{(l)})}=j^{(l)}_{\mathrm{supp}(j^{(l)})}$.

If we have a non-maximal $\pi$, then we can decompose it further using Proposition 3.11. Hence, the non-maximal $\pi$ yield equations that are implied by maximal set partitions in the sense of Proposition 3.11. ∎

Remark.

Observe that in the case of binary random variables, we get that for any disconnected set $D\subseteq[n]$ with respect to $\mathcal{C}$ with maximal connected sets $\pi_{1},\ldots,\pi_{k}$,

$q_{D}=q_{\pi_{1}}\cdots q_{\pi_{k}}.$
Definition 3.13.

We define the ring of polynomials in cdf coordinates, $\mathbb{R}[Q]$, for the cdf tensor $Q$ of format $\mathcal{R}=[r_{1}]\times\cdots\times[r_{n}]$ as

$\mathbb{R}[Q]=\mathbb{R}[q_{i}:\;i\in\mathcal{R}].$
Definition 3.14.

Let $\mathcal{C}$ be a split closed order ideal on $X_{1},\ldots,X_{n}$ and $i\in\mathcal{R}$ such that $\mathrm{supp}(i)$ is disconnected with respect to $\mathcal{C}$. Then, the associated maximal equation $f_{i}$ is given by

$f_{i}=q_{i}-q_{j^{(1)}}\cdots q_{j^{(k)}}$

where

  • $\pi_{l}=\mathrm{supp}(j^{(l)})$,

  • $i_{\mathrm{supp}(j^{(l)})}=j^{(l)}_{\mathrm{supp}(j^{(l)})}$, and

  • $\mathrm{supp}(i)=\pi_{1}\cup\ldots\cup\pi_{k}$ is the decomposition of $\mathrm{supp}(i)$ into maximal connected sets.

Definition 3.15.

The inhomogeneous marginal independence ideal $J_{\mathcal{C}}$ associated to the split closed order ideal $\mathcal{C}$ on discrete random variables $X_{1},\ldots,X_{n}$ is

$J_{\mathcal{C}}=\left\langle f_{i}\mbox{ maximal}:\;i\in\mathcal{R},\;\mathrm{supp}(i)\mbox{ is disconnected}\right\rangle.$
Example 3.16.

In Example 3.7, there are 16 cdf coordinates indexed by subsets of $[4]$. The equations that define the ideal $J_{\mathcal{C}}$, without indexing them by their support, are

$f_{1112}=q_{1112}-q_{1122}q_{2212}=0,\qquad f_{1212}=q_{1212}-q_{1222}q_{2212}=0,\qquad f_{2112}=q_{2112}-q_{2122}q_{2212}=0.$

However, since all random variables are binary, we can instead index them by their support as follows:

$f_{123}=q_{123}-q_{12}q_{3}=0,\qquad f_{13}=q_{13}-q_{1}q_{3}=0,\qquad f_{23}=q_{23}-q_{2}q_{3}=0.$

This ideal is prime and therefore toric in cdf coordinates.
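The passage from a split closed order ideal to the maximal equations $f_{D}$ in the binary case can be automated directly from Proposition 3.11 and Corollary 3.12. The following Python sketch (our own; the names are illustrative) reproduces the support-indexed generators of Example 3.16.

```python
from itertools import chain

def ground(pi):
    return frozenset(chain.from_iterable(pi))

def max_connected_decomposition(D, C):
    """Decompose a disconnected set D into maximal connected sets, following the
    refinement procedure in the proof of Proposition 3.11."""
    disconnected = {ground(pi) for pi in C}
    by_ground = {ground(pi): pi for pi in C}   # any representative works; the result is unique
    blocks = {D}
    while any(b in disconnected for b in blocks):
        b = next(b for b in blocks if b in disconnected)
        blocks = (blocks - {b}) | set(by_ground[b])
    return blocks

def label(S):
    return "".join(str(x) for x in sorted(S))

def binary_generators(C):
    """Maximal equations f_D = q_D - q_{pi_1}...q_{pi_k} for binary variables,
    with cdf coordinates indexed by their support."""
    eqs = []
    for D in sorted({ground(pi) for pi in C}, key=lambda S: (len(S), sorted(S))):
        comps = sorted(max_connected_decomposition(D, C), key=sorted)
        eqs.append("q%s - %s" % (label(D), "*".join("q" + label(c) for c in comps)))
    return eqs

# Example 3.16: the split closure of X_{1,2} ⊥⊥ X_3 is {12|3, 1|3, 2|3}
PP = lambda *blocks: frozenset(frozenset(b) for b in blocks)
C = {PP({1, 2}, {3}), PP({1}, {3}), PP({2}, {3})}
print(binary_generators(C))   # ['q13 - q1*q3', 'q23 - q2*q3', 'q123 - q12*q3']
```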

Definition 3.17.

The marginal independence ideal in cdf coordinates, $MJ_{\mathcal{C}}$, associated to the split closed order ideal $\mathcal{C}$ on discrete random variables $X_{1},\ldots,X_{n}$ is

$MJ_{\mathcal{C}}=\left\langle 2\times 2\mbox{ minors of all flattenings of }Q_{\pi}:\;\pi\in\mathcal{C}\right\rangle,$

where $\pi=\pi_{1}|\cdots|\pi_{k}$ and $Q_{\pi}$ is the $k$-way tensor which is obtained by first marginalizing the tensor $Q$ by making all the indices not in $|\pi|$ equal to their maximum value (this corresponds to summing out indices in Definition 3.5, as we are in cdf coordinates), and then flattening the resulting tensor according to the set partition $\pi$.

Alternatively, $MJ_{\mathcal{C}}$ is the ideal resulting from imposing rank $1$ conditions on the tensors $Q_{\pi}$, for all $\pi\in\mathcal{C}$.

Example 3.18.

Let us consider the same model as in Example 3.7. Then, $MJ_{12\mbox{$\perp\kern-5.5pt\perp$}3}$ is given by imposing rank $1$ conditions on the tensors $Q_{12\mbox{$\perp\kern-5.5pt\perp$}3}$, $Q_{1\mbox{$\perp\kern-5.5pt\perp$}3}$ and $Q_{2\mbox{$\perp\kern-5.5pt\perp$}3}$, all of which are matrices since they come from partial set partitions with 2 blocks. Following the construction in Definition 3.17, we obtain

$Q_{12\mbox{$\perp\kern-5.5pt\perp$}3}=\begin{pmatrix}q_{1112}&q_{1122}\\ q_{1212}&q_{1222}\\ q_{2112}&q_{2122}\\ q_{2212}&q_{2222}\end{pmatrix}.$

Furthermore, since the random variables are binary, we can reindex the cdf coordinates by their support as follows:

$Q_{12\mbox{$\perp\kern-5.5pt\perp$}3}=\begin{pmatrix}q_{123}&q_{12}\\ q_{13}&q_{1}\\ q_{23}&q_{2}\\ q_{3}&q_{\emptyset}\end{pmatrix}.$

Observe that all of the rank 1 conditions on $Q_{1\mbox{$\perp\kern-5.5pt\perp$}3}$ and $Q_{2\mbox{$\perp\kern-5.5pt\perp$}3}$ are found among the $2\times 2$ minors of $Q_{12\mbox{$\perp\kern-5.5pt\perp$}3}$. Thus, $MJ_{12\mbox{$\perp\kern-5.5pt\perp$}3}$ is given by the $2\times 2$ minors of $Q_{12\mbox{$\perp\kern-5.5pt\perp$}3}$.

This ideal and its generators generally have a more complicated structure than $J_{\mathcal{C}}$. However, once we add a constraint that is imposed by the probability simplex, both ideals have the same generators, which allows us to work with the simpler ideal $J_{\mathcal{C}}$ to study these models. To prove this result, we first need to prove the following lemma.

Lemma 3.19.

Let $A=(a_{ij})_{(i,j)\in[m]\times[n]}\in\mathbb{C}^{m\times n}$ be such that $a_{kl}=1$ for some $(k,l)\in[m]\times[n]$. Then, $\mathrm{rank}(A)=1$ if and only if the $2\times 2$ minors involving $a_{kl}$ vanish.

Proof.

The forward direction is immediate. For the converse, without loss of generality (since rank is invariant under row and column permutations), assume $a_{11}=1$ and that the $2\times 2$ minors involving $a_{11}$ vanish. Let $a_{i}$ denote the $i$-th row of $A$. To show $A$ is rank 1, it suffices to show $a_{i}$ is a scalar multiple of $a_{1}$, for any $i\in[m]$. We claim that scalar is $a_{i1}$. Let $i\in[m]$; note

$a_{i}=a_{i1}a_{1}\iff a_{ij}=a_{i1}a_{1j}\;\;\forall j\in[n],$

which is true as these correspond to the vanishing minors involving $a_{11}$. ∎

Proposition 3.20.

For a split closed order ideal $\mathcal{C}$, we have that $MJ_{\mathcal{C}}+\langle q_{r_{1}\cdots r_{n}}-1\rangle=J_{\mathcal{C}}+\langle q_{r_{1}\cdots r_{n}}-1\rangle$.

Proof.

First observe that for any $\pi=\pi_{1}|\cdots|\pi_{k}\in\mathcal{C}$, there exist partial set partitions $\tau_{1},\ldots,\tau_{k-1}\in\mathcal{C}$ such that

$\pi=((\tau_{1}^{\tau_{2}})^{\tau_{3}}\cdots)^{\tau_{k-1}};$

these are given by

$\tau_{i}=\pi_{i}|\bigcup_{j=i+1}^{k}\pi_{j}.$

In other words, we can get any partial set partition by taking partitions with 2 blocks and applying the splitting operation to them. On the algebraic side, this means the maximal equations $f_{i}$ that generate $J_{\mathcal{C}}$ can be obtained if we have all the equations corresponding to set partitions with 2 blocks in $\mathcal{C}$.

Note that the matrices $Q_{\pi}$ with $\pi=\pi_{1}|\pi_{2}\in\mathcal{C}$ always have $q_{r_{1}\cdots r_{n}}$ appearing. Furthermore, the $2\times 2$ minors that involve $q_{r_{1}\cdots r_{n}}$ are of the form

$g_{i}=q_{r_{1}\cdots r_{n}}q_{i}-q_{j}q_{k}\in MJ_{\mathcal{C}}$

where $i,j,k\in\mathcal{R}$ are such that

  • $\mathrm{supp}(i)=|\pi|$,

  • $\mathrm{supp}(j)=\pi_{1}$,

  • $\mathrm{supp}(k)=\pi_{2}$,

  • $i_{\mathrm{supp}(j)}=j_{\mathrm{supp}(j)}$, and

  • $i_{\mathrm{supp}(k)}=k_{\mathrm{supp}(k)}$.

Once we add the constraint $q_{r_{1}\cdots r_{n}}-1$, we observe that

$\hat{g_{i}}=q_{i}-q_{j}q_{k}\in MJ_{\mathcal{C}}+\langle q_{r_{1}\cdots r_{n}}-1\rangle.$

Since any set partition in $\mathcal{C}$ can be obtained by splitting set partitions with 2 blocks, we can keep splitting $q_{j}$ and $q_{k}$ until we get the maximal equation $f_{i}\in J_{\mathcal{C}}$. This shows $MJ_{\mathcal{C}}+\langle q_{r_{1}\cdots r_{n}}-1\rangle\supseteq J_{\mathcal{C}}+\langle q_{r_{1}\cdots r_{n}}-1\rangle$.

For the other direction, observe that for any $\pi\in\mathcal{C}$, any flattening of the tensor $Q_{\pi}$ contains $q_{r_{1}\cdots r_{n}}$. Additionally, under the constraint $q_{r_{1}\cdots r_{n}}-1=0$, any such matrix contains an entry equal to $1$. Thus, by Lemma 3.19, a flattening of $Q_{\pi}$ has rank 1 if and only if the minors involving that entry vanish. However, these $2\times 2$ minors are all equations of the same form as $\hat{g_{i}}$ above, all of which are in $J_{\mathcal{C}}$ by construction. This shows the tensor $Q_{\pi}$ has rank 1 if and only if its entries lie in $V(J_{\mathcal{C}}+\langle q_{r_{1}\cdots r_{n}}-1\rangle)$, which in turn shows that $J_{\mathcal{C}}+\langle q_{r_{1}\cdots r_{n}}-1\rangle$ generates $MJ_{\mathcal{C}}+\langle q_{r_{1}\cdots r_{n}}-1\rangle$. ∎

Now we will exploit the combinatorial structure of the equations in $J_{\mathcal{C}}$ to study these models. Observe that Corollary 3.12 gives us a unique factorization for all cdf coordinates and, in particular, it tells us that the cdf coordinates that appear in the defining equations of the variety are precisely those with disconnected support.

Theorem 3.21.

Every discrete marginal independence model $\mathcal{M}_{\mathcal{C}}$ is toric in cdf coordinates.

Proof.

By Corollary 3.12, we know that the ideal $J_{\mathcal{C}}$ that generates the model is given by equations of the form

$f_{i}=q_{i}-q_{j^{(1)}}\cdots q_{j^{(k)}}$

where

  • $D$ is disconnected,

  • $D=\mathrm{supp}(i)$, $\pi_{l}=\mathrm{supp}(j^{(l)})$,

  • $i_{\mathrm{supp}(j^{(l)})}=j^{(l)}_{\mathrm{supp}(j^{(l)})}$, and

  • $D=\pi_{1}\cup\ldots\cup\pi_{k}$ is the decomposition of $D$ into maximal connected sets.

Now, let us consider the term order $\prec$ on $\mathbb{R}[Q]$ defined by a lexicographic order with the condition that if $|\mathrm{supp}(i)|<|\mathrm{supp}(j)|$, then $q_{i}\prec q_{j}$. Therefore, for each generator $f_{i}$ of $J_{\mathcal{C}}$, we have that $\mbox{lt}_{\prec}(f_{i})=q_{i}$.

This is a Gröbner basis for $J_{\mathcal{C}}$ since the leading terms of any pair of generators are coprime, so any $S$-pair reduces to 0 modulo this basis of binomials. Thus, the initial ideal of $J_{\mathcal{C}}$, ${\rm in}_{\prec}(J_{\mathcal{C}})$, is generated by linear terms, so it is prime. This implies that $J_{\mathcal{C}}$ is a prime ideal generated by differences of monomials, hence it is toric. ∎

The explicit description of the generators of $J_{\mathcal{C}}$ also allows us to show that its associated affine variety is smooth.

Example 3.22.

Consider the split closed order ideal $\mathcal{C}$ on 3 binary random variables generated by the independence statement $X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{2,3}$. Then, by Corollary 3.12,

$J_{\mathcal{C}}=\left\langle q_{12}-q_{1}q_{2},\,q_{13}-q_{1}q_{3},\,q_{123}-q_{1}q_{23}\right\rangle.$

Hence, the Jacobian of $J_{\mathcal{C}}$ is

$\mbox{Jac}(J_{\mathcal{C}})=\begin{bmatrix}0&-q_{2}&-q_{1}&0&1&0&0&0\\ 0&-q_{3}&0&-q_{1}&0&1&0&0\\ 0&-q_{23}&0&0&0&0&-q_{1}&1\end{bmatrix}.$

Observe that $V(J_{\mathcal{C}})$ is smooth everywhere since the Jacobian has full rank independent of the point of evaluation.
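The Jacobian computation in Example 3.22 can be reproduced symbolically. The sketch below (our own illustration, using sympy; the symbol q0 stands for $q_{\emptyset}$) recovers the matrix above and exhibits the identity block in the columns of $q_{12},q_{13},q_{123}$ that forces full rank at every point.

```python
import sympy as sp

q0, q1, q2, q3, q12, q13, q23, q123 = sp.symbols('q0 q1 q2 q3 q12 q13 q23 q123')
gens = sp.Matrix([q12 - q1*q2, q13 - q1*q3, q123 - q1*q23])
coords = sp.Matrix([q0, q1, q2, q3, q12, q13, q23, q123])

J = gens.jacobian(coords)
print(J)                     # matches Jac(J_C) in Example 3.22
print(J[:, [4, 5, 7]])       # identity block in the columns of q12, q13, q123
print(J.rank())              # 3
```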

Theorem 3.23.

For a split closed order ideal $\mathcal{C}$, the affine variety $V(J_{\mathcal{C}})$ that defines the model is smooth.

Proof.

It suffices to show that the Jacobian of $J_{\mathcal{C}}$ is full rank. Observe that $J_{\mathcal{C}}$ is generated by equations of the form

(2) $f_{i}=q_{i}-q_{j^{(1)}}\cdots q_{j^{(k)}}$

where

  • $D=\mathrm{supp}(i)$, $\pi_{l}=\mathrm{supp}(j^{(l)})$,

  • $i_{\mathrm{supp}(j^{(l)})}=j^{(l)}_{\mathrm{supp}(j^{(l)})}$, and

  • $D=\pi_{1}\cup\ldots\cup\pi_{k}$ is the decomposition of $D$ into maximal connected sets,

for all disconnected $D$. Hence, the equations that appear in the generating set are precisely those indexed by $i$ such that $\mathrm{supp}(i)$ is disconnected. In addition, observe that for all index vectors $i$ and $j$ with disconnected support,

$\frac{\partial}{\partial q_{j}}f_{i}=\begin{cases}1&\mbox{if }j=i\\ 0&\mbox{else}.\end{cases}$

This follows because of the uniqueness of the decomposition of $q_{i}$. Hence, the columns of the Jacobian contain some permutation of the columns of the identity matrix of size $\#\{i\in[r_{1}]\times\cdots\times[r_{n}]:\mathrm{supp}(i)\mbox{ is disconnected}\}$, which shows that $\mbox{Jac}(J_{\mathcal{C}})$ has full rank everywhere. ∎

We are now ready to prove a result about existence of distributions in the binary case.

Proposition 3.24.

For each split closed order ideal $\mathcal{C}\subseteq P\Pi_{n,2}$, there exists a probability distribution $P$ in the binary model $\mathcal{M}_{\mathcal{C}}$ such that $P\in\Delta_{2^{n}}\setminus\partial(\Delta_{2^{n}})$ and $P$ satisfies all the marginal independence statements in $\mathcal{C}$ and no other marginal independence statements.

Proof.

We want to show there is a point in the interior of the probability simplex that satisfies exactly the statements in 𝒞\mathcal{C} and no others.

Since we are using binary random variables, the description of our generators becomes much simpler,

$J_{\mathcal{C}}=\langle f_{D}\;|\;D\mbox{ disconnected}\rangle.$

Take $\pi\notin\mathcal{C}$ to be minimal; this corresponds to an equation $g=q_{\pi}-q_{\pi_{1}}\cdots q_{\pi_{k}}$ where $\pi$ is connected. Let

$I=J_{\mathcal{C}}+\langle q_{\pi}-q_{\pi_{1}}\cdots q_{\pi_{k}}\rangle.$

Observe that ${\rm in}_{\prec}(I)={\rm in}_{\prec}(J_{\mathcal{C}})+\langle q_{\pi}\rangle$, so we still get a Gröbner basis for $I$ whose leading terms are linear, and therefore $\dim(I)<\dim(J_{\mathcal{C}})$. Hence, we have a strict inclusion $V(I)\subset V(J_{\mathcal{C}})$, showing that $V(J_{\mathcal{C}})\setminus V(I)\neq\emptyset$. Since the uniform distribution $U$ lies in $V(J_{\mathcal{C}})$, the variety intersects the interior of the probability simplex (in cdf coordinates). Finally, since we additionally know $V(J_{\mathcal{C}})$ is smooth, we conclude that $\dim(V(J_{\mathcal{C}}))=\dim(V(J_{\mathcal{C}})\cap\Delta_{2^{n}})$. ∎

With Proposition 3.24 in hand, we are ready to prove Theorem 2.15.

Proof of Theorem 2.15.

Given a model $\mathcal{M}_{\mathcal{C}}$, we obtain a unique split closed order ideal given by the split closure of $\mathcal{C}$. Now let us start with a split closed order ideal $\mathcal{C}\subseteq P\Pi_{n,2}$. It suffices to show that for any $\pi\notin\mathcal{C}$, there exists a distribution $P$ that satisfies the statements in $\mathcal{C}$ and does not satisfy $\pi$; this follows directly from Proposition 3.24. ∎

We have now proven the bijective correspondence between split closed order ideals and marginal independence models, which provides us with a combinatorial description of this class of models.

So far, we have been describing our models and varieties in affine space. However, Proposition 3.11 and Corollary 3.12 allow us to give an explicit homogeneous parametrization of the ideals defining our models.

Corollary 3.25.

Let $\mathbb{R}[\Theta]:=\mathbb{R}\left[t,\theta_{l}^{(C)}:\;C\in\mathrm{Con}(\mathcal{C}),\;\mathrm{supp}(l)=C,\;l\in\prod_{i=1}^{n}[r_{i}]\right]$. A homogeneous parametrization of the model in cdf coordinates is given by the homomorphism between polynomial rings $\phi_{\mathcal{C}}:\mathbb{R}[Q]\rightarrow\mathbb{R}[\Theta]$ defined by

$\phi_{\mathcal{C}}(q_{i_{1}\ldots i_{n}})=t\prod_{j=1}^{k}\theta_{i_{\pi_{j}}}^{(\pi_{j})},$

where $\mathrm{supp}(i)=\pi_{1}\cup\ldots\cup\pi_{k}$ is the decomposition of $\mathrm{supp}(i)$ into maximal connected sets.

Remark.

In the binary case, we can lose the subscript on our parameters $\theta$ as a consequence of the reindexing of binary cdf coordinates with their support.

Definition 3.26.

The homogeneous marginal independence ideal $I_{\mathcal{C}}$ associated to the split closed order ideal $\mathcal{C}$ on discrete random variables $X_{1},\ldots,X_{n}$ is given by

$I_{\mathcal{C}}=\ker\phi_{\mathcal{C}},$

where $\phi_{\mathcal{C}}$ is the parametrization given in Corollary 3.25.

Proposition 3.27.

For a split closed order ideal $\mathcal{C}$, we have

$I_{\mathcal{C}}=\overline{J_{\mathcal{C}}},$

where $\overline{J_{\mathcal{C}}}$ is the homogenization of $J_{\mathcal{C}}$.

Proof.

This follows from the construction of the parametrization in Corollary 3.25 and the definition of $I_{\mathcal{C}}$. ∎

In general, toric varieties are defined by a monomial map with an associated integer matrix $A_{\mathcal{C}}$. This parametrization allows us to compute this integer matrix and the associated polytope $\mathfrak{P}_{\mathcal{C}}$ given by the convex hull of the columns of $A_{\mathcal{C}}$.
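For binary variables, the matrix $A_{\mathcal{C}}$ can be assembled directly from the decomposition of each support into maximal connected sets. The following sketch (our own, using numpy) hard-codes that decomposition for the model of Example 3.28 below and reproduces its $6\times 16$ matrix, up to the ordering of the rows.

```python
from itertools import combinations
import numpy as np

# Model of Example 3.28: X1 ⊥⊥ X2 ⊥⊥ X_{3,4} on binary variables.
# For this model, the maximal connected components of a support D are
# D ∩ {1}, D ∩ {2} and D ∩ {3,4} (whichever are nonempty).
def components(D):
    return [frozenset(c & D) for c in ({1}, {2}, {3, 4}) if c & D]

subsets = [frozenset(S) for r in range(5) for S in combinations(range(1, 5), r)]
connected = sorted({c for D in subsets for c in components(D)}, key=sorted)

A = np.zeros((1 + len(connected), len(subsets)), dtype=int)
A[0, :] = 1                        # the homogenizing parameter t divides every monomial
for col, D in enumerate(subsets):
    for c in components(D):
        A[1 + connected.index(c), col] = 1

print(["t"] + ["theta" + "".join(map(str, sorted(c))) for c in connected])
print(A)   # a 6 x 16 matrix A_C (rows: t, theta1, theta2, theta3, theta34, theta4)
```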

Example 3.28.

Consider the model on 4 binary variables given by $X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{2}\mbox{$\perp\kern-5.5pt\perp$}X_{3,4}$. Then, we have 16 cdf coordinates, and the equations that define $J_{\mathcal{C}}$ are:

$q_{12}-q_{1}q_{2}=0,\quad q_{13}-q_{1}q_{3}=0,\quad q_{14}-q_{1}q_{4}=0,\quad q_{23}-q_{2}q_{3}=0,\quad q_{24}-q_{2}q_{4}=0,$
$q_{134}-q_{1}q_{34}=0,\quad q_{234}-q_{2}q_{34}=0,\quad q_{123}-q_{1}q_{2}q_{3}=0,\quad q_{124}-q_{1}q_{2}q_{4}=0,\quad q_{1234}-q_{1}q_{2}q_{34}=0.$

From this description, we get the generators for $J_{\mathcal{C}}$ and consequently all coordinates with disconnected support. Hence, the parametrization described in Corollary 3.25 yields the following integer matrix:

$A_{\mathcal{C}}=\begin{array}{c|cccccccccccccccc}
 & q & q_{1} & q_{2} & q_{3} & q_{4} & q_{12} & q_{13} & q_{14} & q_{23} & q_{24} & q_{34} & q_{123} & q_{124} & q_{134} & q_{234} & q_{1234}\\
\hline
t & 1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1\\
\theta^{(1)} & 0&1&0&0&0&1&1&1&0&0&0&1&1&1&0&1\\
\theta^{(2)} & 0&0&1&0&0&1&0&0&1&1&0&1&1&0&1&1\\
\theta^{(3)} & 0&0&0&1&0&0&1&0&1&0&0&1&0&0&0&0\\
\theta^{(4)} & 0&0&0&0&1&0&0&1&0&1&0&0&1&0&0&0\\
\theta^{(34)} & 0&0&0&0&0&0&0&0&0&0&1&0&0&1&1&1
\end{array}$

Furthermore, we know that the ideal I𝒞I_{\mathcal{C}} is going to be the kernel of the monomial map given by A𝒞A_{\mathcal{C}}, which, in this case we compute using Macaulay2 [3] and find that there are 46 degree 2 generators of the ideal. They include homogenized equations corresponding to the generators of J𝒞J_{\mathcal{C}}

q1q2qq12q1q3qq13q1q4qq14q2q3qq23q2q4qq24q1q34qq134q2q34qq234q1q23qq123q2q13qq123q3q12qq123q1q24qq124q2q14qq124q4q12qq124q1q234qq1234q2q134qq1234q12q34qq1234.\begin{matrix}q_{1}q_{2}-qq_{12}\;&q_{1}q_{3}-qq_{13}\;&q_{1}q_{4}-qq_{14}\;&q_{2}q_{3}-qq_{23}\;&q_{2}q_{4}-qq_{24}\;&q_{1}q_{34}-qq_{134}\\ q_{2}q_{34}-qq_{234}\;&q_{1}q_{23}-qq_{123}\;&q_{2}q_{13}-qq_{123}\;&q_{3}q_{12}-qq_{123}\;&q_{1}q_{24}-qq_{124}\;&q_{2}q_{14}-qq_{124}\\ q_{4}q_{12}-qq_{124}&q_{1}q_{234}-qq_{1234}\;&q_{2}q_{134}-qq_{1234}\;&q_{12}q_{34}-qq_{1234}.\end{matrix}

However, we find more equations in the minimal generators of the homogeneous ideal, such as

q134q234q34q1234q124q234q24q1234q123q234q23q1234q14q234q4q1234\begin{matrix}&q_{134}q_{234}-q_{34}q_{1234}\;&q_{124}q_{234}-q_{24}q_{1234}\;&q_{123}q_{234}-q_{23}q_{1234}\;&q_{14}q_{234}-q_{4}q_{1234}\\ \end{matrix}

which are not necessary in the de-homogenized cdf coordinates since they are implied by the original equations (when q=1q=1).
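
As a quick sanity check on this computation (this is our own illustration in Python with sympy, not the Macaulay2 code behind [3]; the variable names are ours), one can substitute the monomial parametrization read off from the columns of A_{\mathcal{C}} and verify symbolically that the binomials above vanish:

import sympy as sp

# homogenizing parameter t and one parameter theta^(C) per connected set of the model
t, a1, a2, a3, a4, a34 = sp.symbols('t theta1 theta2 theta3 theta4 theta34')

# q_S = t * (product of theta^(C) over the connected components C of S), read off from A_C
q = {
    '': t,
    '1': t*a1, '2': t*a2, '3': t*a3, '4': t*a4,
    '12': t*a1*a2, '13': t*a1*a3, '14': t*a1*a4,
    '23': t*a2*a3, '24': t*a2*a4, '34': t*a34,
    '123': t*a1*a2*a3, '124': t*a1*a2*a4,
    '134': t*a1*a34, '234': t*a2*a34, '1234': t*a1*a2*a34,
}

binomials = [
    q['1']*q['2'] - q['']*q['12'],          # generators coming from J_C ...
    q['1']*q['34'] - q['']*q['134'],
    q['12']*q['34'] - q['']*q['1234'],
    q['134']*q['234'] - q['34']*q['1234'],  # ... and the extra homogeneous generators
    q['14']*q['234'] - q['4']*q['1234'],
]

assert all(sp.expand(b) == 0 for b in binomials)
print('all listed binomials vanish under the parametrization')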

4. Graphical and simplicial marginal independence models

Two particularly interesting classes of marginal independence models are discussed in [1] and in [2]. They are graphical models and what we will refer to as simplicial models. We aim to contrast these different classes of models and show how they fit in the framework presented above.

4.1. Graphical models

Graphical models have been studied extensively because of their many applications and because of the rich structure that comes with a graphical representation of the independence relations. Different types of graphs are used to express different relationships between variables: directed acyclic graphs are used to study causality in Bayesian networks, while undirected graphs are used to study conditional independence models. Following the work of Drton and Richardson in [2], we use bidirected graphs as a framework to discuss marginal independence. Hence, for this section, G=(V,E) will be a bidirected graph with vertex set V=[n] and edge set E.

Definition 4.1.

Let G=(V,E) be a bidirected graph. For a vertex v\in V we define the spouse of v as the set of all vertices that share an edge with v, together with v itself, i.e., \mathrm{Sp}(v):=\{w\in V:\;(w,v)\in E\}\cup\{v\}. For a subset A\subseteq V, we define \mathrm{Sp}(A)=\cup_{v\in A}\mathrm{Sp}(v).

There are several ways to read independence statements off of a graph. The rules for reading a graph are called Markov properties; we are particularly interested in two of them: the global Markov property and the connected set Markov property.

Definition 4.2.

Let G=([n],E) be a bidirected graph. A probability distribution is said to satisfy the global Markov property if for all pairwise disjoint A,B,C\subseteq[n] such that [n]\setminus(A\cup B\cup C) separates A and B, we have that

XAXB|XC.X_{A}\mbox{$\perp\kern-5.5pt\perp$}X_{B}|X_{C}.
Definition 4.3.

A probability distribution PP is said to satisfy the connected set Markov property associated to a bidirected graph GG if

XCXVSp(C)X_{C}\mbox{$\perp\kern-5.5pt\perp$}X_{V\setminus\mathrm{Sp}(C)}

for every subset C\subseteq V that induces a connected subgraph of G.

The global Markov property contains all of the marginal statements given by the connected set Markov property, since for any connected set C, the set \mathrm{Sp}(C)\setminus C=[n]\setminus(C\cup([n]\setminus\mathrm{Sp}(C))) separates C from V\setminus\mathrm{Sp}(C); this is the statement from Definition 4.2 in which the conditioning set is empty. Even though the global Markov property contains conditional statements that are not given by the connected set property, a probability distribution P satisfies the global Markov property if and only if it satisfies the connected set property.

Additionally, since every single vertex v\in V is connected, the connected set Markov rules imply that if u,v\in V do not share an edge, then X_{v}\mbox{$\perp\kern-5.5pt\perp$}X_{u}. This is known as the pairwise Markov property; hence, both the global and connected set Markov properties imply all of the statements of the pairwise Markov property.

Example 4.4.

Consider the bidirected chain with 4 vertices and edges 1\leftrightarrow 2, 2\leftrightarrow 3, 3\leftrightarrow 4. The connected set Markov property yields X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{3,4} and X_{4}\mbox{$\perp\kern-5.5pt\perp$}X_{1,2}, and the global Markov property gives us X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{4}|X_{2,3} as well.
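
The connected set Markov statements of a small bidirected graph are easy to enumerate by brute force. The following sketch (our own Python encoding; the function names and the representation of the graph as a list of edge pairs are not from the paper) lists, for the 4-chain above, every statement of the form C | (V\setminus\mathrm{Sp}(C)) with C connected. Besides the two connected set statements of Example 4.4 it also prints the statements lying below them in the order ideal, such as X_{2}\mbox{$\perp\kern-5.5pt\perp$}X_{4}, some of them read from either side.

from itertools import combinations

def Sp(edges, A):
    # Sp(A) as in Definition 4.1: A together with all neighbors of vertices in A
    return set(A) | {w for u, v in edges for w in (u, v) if u in A or v in A}

def is_connected(edges, S):
    # is the subgraph induced on the nonempty vertex set S connected?
    S = set(S)
    adj = {v: set() for v in S}
    for u, v in edges:
        if u in S and v in S:
            adj[u].add(v)
            adj[v].add(u)
    seen, stack = set(), [next(iter(S))]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return seen == S

def connected_set_statements(vertices, edges):
    # all statements C | (V \ Sp(C)) with C connected and V \ Sp(C) nonempty
    out = []
    for k in range(1, len(vertices) + 1):
        for C in combinations(vertices, k):
            if is_connected(edges, C):
                rest = set(vertices) - Sp(edges, C)
                if rest:
                    out.append((set(C), rest))
    return out

# the 4-chain of Example 4.4
for C, rest in connected_set_statements([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]):
    print(sorted(C), '_||_', sorted(rest))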

Definition 4.5 (Graphical model).

The marginal independence graphical model associated to G, denoted \mathcal{M}_{G}, is the set of distributions P=(p_{i_{1},\ldots,i_{n}}) of discrete random variables X_{1},\ldots,X_{n} that satisfy the marginal independence statements given by the connected set Markov property of G.

Example 4.6.

Let G be the graph on 3 vertices whose only edge is (1,2), and let the random variables be binary. Then the marginal independence statements in this model are given by X_{1,2}\mbox{$\perp\kern-5.5pt\perp$}X_{3} and the equations defining J_{\mathcal{C}} are

q123q12q3=0,\displaystyle q_{123}-q_{12}q_{3}=0, q13q1q3=0,\displaystyle q_{13}-q_{1}q_{3}=0,\;\; q23q2q3=0.\displaystyle q_{23}-q_{2}q_{3}=0.

These are the same defining equations as the ones for the model in Example 3.7. However, the two models are slightly different, since here we have 3 random variables instead of 4.
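
For a concrete numerical check of these equations, one can generate a random distribution in which (X_{1},X_{2}) is independent of X_{3} and evaluate the cdf coordinates. The sketch below (our own Python code) uses the convention that, for binary variables, q_{S} is the probability that every variable indexed by S equals 0; this is one natural reading of the cdf coordinates and is meant only as an illustration.

import numpy as np

rng = np.random.default_rng(0)

# a random joint distribution with (X1, X2) independent of X3, all binary
p12 = rng.dirichlet(np.ones(4)).reshape(2, 2)   # joint distribution of (X1, X2)
p3 = rng.dirichlet(np.ones(2))                  # marginal distribution of X3
p = np.einsum('ij,k->ijk', p12, p3)             # joint distribution of (X1, X2, X3)

def q(S):
    # q_S = P(X_i = 0 for all i in S), our reading of the binary cdf coordinates
    idx = tuple(0 if i in S else slice(None) for i in (1, 2, 3))
    return p[idx].sum()

assert np.isclose(q({1, 2, 3}), q({1, 2}) * q({3}))
assert np.isclose(q({1, 3}), q({1}) * q({3}))
assert np.isclose(q({2, 3}), q({2}) * q({3}))
print('the defining equations of Example 4.6 hold for this distribution')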

In general, given a graph GG, the collection 𝒞\mathcal{C} of the marginal independence statements is defined by all the connected set Markov statements associated to the graph. This means that we can apply the results from Section 3 to this special class of models. Furthermore, the notion of connectedness relative to a collection of statements 𝒞\mathcal{C} agrees with the notion of connectedness of subgraphs of the corresponding graph GG.

Proposition 4.7.

Let G be a bidirected graph and \mathcal{C} the split closed order ideal defined by the connected set Markov statements. Then a set D\subseteq[n] is disconnected with respect to \mathcal{C} if and only if the subgraph of G induced by D is disconnected.

Proof.

If D is disconnected with respect to \mathcal{C}, then there exists \pi\in\mathcal{C} such that D=|\pi|. Without loss of generality, we may assume that \pi has exactly two blocks and that \pi_{1} is connected with respect to \mathcal{C}, so that \pi_{2}\subseteq V\setminus\mathrm{Sp}(\pi_{1}). Hence no vertex of \pi_{2} is adjacent to a vertex of \pi_{1}, and the subgraph induced by D is disconnected.

Conversely, if the subgraph induced by D is disconnected, let \pi be the vertex set of a connected component of this subgraph. Then D\setminus\pi\subseteq V\setminus\mathrm{Sp}(\pi), so the partial set partition \pi|(D\setminus\pi) lies below the connected set Markov statement \pi|(V\setminus\mathrm{Sp}(\pi)) in \mathcal{C}, and hence D is disconnected with respect to our collection \mathcal{C}. ∎

Proposition 4.7 ensures that for this particular class of models we can restate the results of Section 3 in terms of the graph-theoretic notion of connectedness. This agrees with the results presented in [2] in the binary case and allows us to extend them to the general case.

Example 4.8.

Let \mathcal{M}_{\mathcal{C}} be the model on 3 binary variables given by the marginal independence statements X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{2}, X_{2}\mbox{$\perp\kern-5.5pt\perp$}X_{3}, X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{3}. The only graphical model on 3 binary variables that contains these marginal independence statements is given by the graph on 3 vertices that has no edges. However, because of the connected set Markov rule, this graphical model also has the statement X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{2}\mbox{$\perp\kern-5.5pt\perp$}X_{3}, which cannot be inferred from \mathcal{C}. We will show in the following subsection that this model is a simplicial complex model.

4.2. Simplicial complex models

This class of models was proposed in [1] and is of a rather different nature. We obtain these models from a simplicial complex instead of a graph, and the way we read statements from the complex is that all of the random variables indexed by a face are completely independent. Furthermore, this can be achieved by imposing rank 1 constraints on the marginal probability tensor indexed by each face, which consequently makes the factorization in cdf coordinates and the homogeneous parametrization much simpler.
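
As a quick numerical illustration of the rank 1 description (our own sketch with numpy, not code from [1]): a completely independent joint distribution of binary variables is an outer product of its univariate marginals, so the corresponding probability tensor and all of its flattenings have rank 1.

import numpy as np

rng = np.random.default_rng(1)

# a completely independent binary triple: the joint tensor is an outer product
p1, p2, p3 = (rng.dirichlet(np.ones(2)) for _ in range(3))
P = np.einsum('i,j,k->ijk', p1, p2, p3)

# every flattening of P into a 2 x 4 matrix has rank 1
for ax in range(3):
    flat = np.moveaxis(P, ax, 0).reshape(2, 4)
    print(np.linalg.matrix_rank(flat))   # prints 1 three times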

Definition 4.9 (Simplicial complex model).

Let \Sigma\subseteq 2^{[n]} be a simplicial complex. Then the model \mathcal{M}_{\Sigma} is defined by the distributions P=(p_{i_{1}\ldots i_{n}}) of finite random variables such that the random variables \{X_{i}:i\in\sigma\} are completely independent for every face \sigma\in\Sigma.

Example 4.10.

Let \Sigma be the simplicial complex on 3 vertices given by the 3-cycle, with binary random variables; the marginal independence relations are then exactly the same as in Example 4.8. This shows that there are simplicial models that are not graphical models.

Remark.

We can discuss this class of models in terms of general marginal independence models by taking the collection of marginal independence statements generated by \mathcal{C}=\left\{i_{1}|i_{2}|\cdots|i_{k}:\;\sigma=\{i_{1},\ldots,i_{k}\}\in\Sigma\right\}, i.e., each face is split into singleton blocks. Thus, we can apply all of the results from Section 3 to simplicial complex models; in particular, we get a toric model in cdf coordinates and a parametrization in terms of the connected sets. By construction, the disconnected sets of \mathcal{C} are the faces of \Sigma with at least two elements, and the connected sets are the non-faces together with the vertices.
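
Concretely, this generating collection can be written down directly from the face list. The following small sketch (our own encoding, representing a partial set partition as a tuple of blocks) prints the generators for the 3-cycle of Example 4.10.

def generators_from_complex(faces):
    # each face with at least two elements is split into singleton blocks
    return [tuple(frozenset({i}) for i in sorted(sigma)) for sigma in faces if len(sigma) >= 2]

Sigma = [{1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}]   # the 3-cycle of Example 4.10
for pi in generators_from_complex(Sigma):
    print('|'.join(str(next(iter(b))) for b in pi))   # prints 1|2, 1|3 and 2|3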

Next, we wish to compare graphical models and simplicial complex models. We need to be careful when comparing the two classes, since the way we read marginal independence statements from the corresponding structures is very different: in a simplicial complex, every face gives a marginal independence statement (in particular, edges represent marginal independence relations), whereas in a graph, every non-edge gives a marginal independence statement (and we get more from the Markov rules). For this reason, when comparing them, it is useful to talk about the complementary graph G^{\prime} of G.

In general, we can associate a simplicial complex model to any graphical model such that the graphical model is a submodel of the simplicial complex model.

Proposition 4.11.

Let G=(V,E) be a graph and let G^{\prime}=(V,E^{\prime}) be its complementary graph, where E^{\prime} is the usual complement of E. Then the associated simplicial complex \Sigma(G), given by

Σ(G)={FV|eFeE}\Sigma(G)=\left\{F\subseteq V\;\big{|}\;e\nsubseteq F\;\forall e\in E\right\}

satisfies \mathcal{M}_{G}\subseteq\mathcal{M}_{\Sigma(G)}.

Proof.

We need to show the containment of ideals I_{\Sigma(G)}\subseteq I_{G}, which in turn requires us to show that every marginal independence statement coming from \Sigma(G) can also be obtained from G. The edges (1-dimensional faces) of \Sigma(G) are exactly the non-edges of G, and the faces of \Sigma(G) are the vertex sets of the totally disconnected induced subgraphs of G. Hence every marginal independence statement from \Sigma(G) also holds in the model associated to G, which proves that \mathcal{M}_{G} is a submodel of \mathcal{M}_{\Sigma(G)}. ∎

Remark.

Observe that the construction in Proposition 4.11 implies that the 1-skeleton of \Sigma(G) is the complementary graph G^{\prime}. The existence of a simplicial complex model that contains a given graphical model raises the question of when the simplicial complex model and the graphical model are isomorphic. Consider the following examples:

Example 4.12.

Let G\mathcal{M}_{G} be the model associated to the binary 4-cycle.

Figure 2. Graph and simplicial complex in Example 4.12: (a) the 4-cycle graph G; (b) its associated simplicial complex \Sigma_{G}.

This gives us the marginal independence relations X1X3X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{3} and X2X4X_{2}\mbox{$\perp\kern-5.5pt\perp$}X_{4}. As we can see in Figure 2, the simplicial complex ΣG\Sigma_{G} associated to this graph gives us the same marginal independence statements as GG, i.e. G=Σ(G)\mathcal{M}_{G}=\mathcal{M}_{\Sigma(G)}.

Example 4.13.

Let G\mathcal{M}_{G} be the model associated to the binary 4-chain.

Figure 3. Graph and simplicial complex in Example 4.13: (a) the 4-chain graph G; (b) its associated simplicial complex \Sigma_{G}.

This graph gives us the following marginal independence statements: X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{3,4} and X_{4}\mbox{$\perp\kern-5.5pt\perp$}X_{1,2}. Its associated simplicial complex \Sigma_{G} is again a 4-chain, which implies X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{3}, X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{4} and X_{2}\mbox{$\perp\kern-5.5pt\perp$}X_{4}. We cannot derive the marginal independence statements given by G from these. Thus, in this case the models are not isomorphic (even though the 1-skeleton of \Sigma(G) and G are isomorphic as graphs).
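
The complexes in Figures 2 and 3 can be recomputed directly from Proposition 4.11: the faces of \Sigma(G) are exactly the subsets of V that contain no edge of G. A short sketch (our own Python encoding) reproduces the faces with at least two elements for both examples.

from itertools import combinations

def faces_of_Sigma(vertices, edges):
    # faces of Sigma(G): subsets of V containing no edge of G (Proposition 4.11)
    E = [set(e) for e in edges]
    return [set(F) for k in range(len(vertices) + 1)
            for F in combinations(vertices, k)
            if not any(e <= set(F) for e in E)]

V = [1, 2, 3, 4]
cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]   # Example 4.12
chain = [(1, 2), (2, 3), (3, 4)]           # Example 4.13
print([F for F in faces_of_Sigma(V, cycle) if len(F) >= 2])   # [{1, 3}, {2, 4}]
print([F for F in faces_of_Sigma(V, chain) if len(F) >= 2])   # [{1, 3}, {1, 4}, {2, 4}]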

For a graphical model \mathcal{M}_{G} to be isomorphic to its simplicial complex counterpart, we need the marginal independence statements of G to consist only of statements of complete marginal independence of individual variables, since those are the only type of marginal independence statements that can be obtained from a simplicial complex model.

Proposition 4.14.

For a graph G, \mathcal{M}_{G}=\mathcal{M}_{\Sigma(G)} if the complementary graph G^{\prime} of G is a disjoint union of complete graphs.

Proof.

Let G^{\prime}=G_{1}^{\prime}\cup\ldots\cup G_{k}^{\prime}, where the G_{i}^{\prime}, i\in[k], are pairwise disjoint complete graphs. Then each G_{i}^{\prime} corresponds to a totally disconnected induced subgraph of G whose vertices are connected to all vertices not in G_{i}^{\prime}. Hence, each G_{i}^{\prime} yields a complete marginal independence statement on the random variables indexed by the vertices of G_{i}^{\prime}, and by construction these are all of the statements that we obtain from the graph G. These are precisely the statements that we get from \Sigma(G), which shows that the models are isomorphic. ∎

Remark.

We can restate Proposition 4.14 without using the complementary graph, since that condition on G^{\prime} is equivalent to G being a complete multipartite graph. Additionally, we can see from this proposition that the number of models that are isomorphic in this way is, up to relabeling of the variables, the number of partitions of \#V.
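
The condition in Proposition 4.14 is easy to test directly: the complement G^{\prime} is a disjoint union of complete graphs exactly when non-adjacency in G is transitive, i.e., no triple of vertices has exactly two non-edges among its three pairs. A small sketch (our own code; the test graphs are those of Examples 4.12 and 4.13):

from itertools import combinations

def complement_is_union_of_cliques(vertices, edges):
    # G' is a disjoint union of complete graphs iff non-adjacency in G is transitive
    E = {frozenset(e) for e in edges}
    for triple in combinations(vertices, 3):
        non_edges = sum(1 for pair in combinations(triple, 2) if frozenset(pair) not in E)
        if non_edges == 2:
            return False
    return True

V = [1, 2, 3, 4]
print(complement_is_union_of_cliques(V, [(1, 2), (2, 3), (3, 4), (4, 1)]))   # True: the 4-cycle is complete bipartite
print(complement_is_union_of_cliques(V, [(1, 2), (2, 3), (3, 4)]))           # False: the 4-chain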

Example 4.15.

Let X_{1},\ldots,X_{4} be finite random variables, and consider the model given by the marginal independence statements X_{1}\mbox{$\perp\kern-5.5pt\perp$}X_{2,3,4} and X_{2}\mbox{$\perp\kern-5.5pt\perp$}X_{3}. This model is neither simplicial nor graphical, but it is still toric.

4.3. Counting models and computational results.

We counted all models on 3 and 4 random variables, both in total and up to relabeling of the random variables, and obtained the following:

Table 1. Number of models on 33 and 44 binary random variables
                           Total (n=3)   Up to symmetry (n=3)   Total (n=4)   Up to symmetry (n=4)
Graphical                        8               4                   64              11
Simplicial                       9               5                  114              20
Graphical and simplicial         5               3                   18               5
General                         12               6                  496              53

For the simplicial and graphical models, we used data from the OEIS [4, 5, 6]. For the general case, we do not yet have an efficient way of counting split closed order ideals, so we counted them by exhaustive computation, checking the split closure of every order ideal.

Additionally, we computed the homogeneous ideals of all models on n=4 random variables up to labeling symmetry, and recorded the highest degree appearing among the generators, the degree and dimension of the projective variety defined by I_{\mathcal{C}}, and whether or not each model is graphical or simplicial.

Table 2. Computations for all models on 4 random variables.
Split order ideal            Max degree of gens   Degree   Dimension   Graphical   Simplicial
∅                            –    1   15   x   x
1|2                          2    2   14   x   x
1|2,1|3                      2    3   13   x
1|2,3|4                      2    4   13   x   x
1|23                         2    4   12   x
1|2,1|3,1|4                  2    4   12   x
1|2,1|3,2|3                  3    5   12   x   x
1|2,1|3,2|4                  2    5   12   x
1|2,1|3,1|4,2|3              3    7   11   x
1|2,1|3,2|4,3|4              2    6   11   x
1|2,1|34                     2    5   11
1|2,13|4                     2    7   11
1|2|3                        2    6   11   x
1|2,1|3,1|4,2|3,2|4          3    9   10   x
1|2,1|3,2|34                 3    9   10
1|2,1|3,23|4                 2    9   10
1|2,1|3|4                    2    8   10   x
1|23,1|24                    2    6   10
1|23,14|2                    2   10   10   x
1|2,1|3,1|4,2|3,2|4,3|4      3   12    9   x
1|2,1|3,1|4,2|34             3   12    9
1|2,1|3,2|3|4                3   11    9   x
1|2,13|4,23|4                3   11    9
1|2,13|4,24|3                2   14    9
1|23,1|24,1|34               2    7    9
1|23,23|4                    2   10    9
1|2|3,1|24                   2   10    9
1|2,1|3,1|4,2|3|4            3   16    8   x
1|2,1|3,123|4                3   15    8
1|2,1|3,123|4,2|3            3   17    8
1|2,123|4                    3   13    8   x
1|2,1|3|4,23|4               3   14    8
1|2,1|34,2|34                3   14    8
1|23,14|2,14|3               2   16    8
1|234                        2    8    8   x
1|2|3,1|24,1|34              2   12    8
1|2|3,1|2|4                  2   12    8   x
1|2,1|3,123|4,2|3|4          3   21    7
1|2,1|3|4,2|3|4              3   18    7   x
1|2,1|3|4,123|4              3   17    7
1|23,14|2,14|3,23|4          2   19    7
1|2|3,1|234                  2   14    7
1|2|3,1|24,24|3              3   17    7
1|2|3,1|2|4,1|34             2   15    7
1|2,1|3|4,123|4,2|3|4        3   23    6
12|34                        2   20    6   x
1|2|3,1|2|4,1|34,2|34        3   19    6
1|2|3,1|2|4,1|234            2   18    6
1|2|3,1|2|4,1|3|4            3   20    6   x
1|2|34                       2   20    5   x
1|2|3,1|2|4,1|3|4,2|3|4      3   23    5   x
1|2|3,1|2|4,1|234,1|3|4      3   25    5
1|2|3|4                      2   24    4   x   x

References

  • [1] Tobias Boege, Sonja Petrović, and Bernd Sturmfels. Marginal independence models. Proceedings of the 2022 International Symposium on Symbolic and Algebraic Computation, 2022.
  • [2] Mathias Drton and Thomas S. Richardson. Binary models for marginal independence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(2):287–309, 2008.
  • [3] Daniel R. Grayson and Michael E. Stillman. Macaulay2, a software system for research in algebraic geometry. Available at http://www2.macaulay2.com.
  • [4] OEIS Foundation Inc. Entry A000088 in The On-Line Encyclopedia of Integer Sequences. https://oeis.org/A000088, 2023.
  • [5] OEIS Foundation Inc. Entry A261005 in The On-Line Encyclopedia of Integer Sequences. https://oeis.org/A261005, 2023.
  • [6] OEIS Foundation Inc. Entry A307249 in The On-Line Encyclopedia of Integer Sequences. https://oeis.org/A307249, 2023.
  • [7] S.L. Lauritzen. Graphical Models. Oxford Statistical Science Series. Clarendon Press, 1996.
  • [8] Richard P. Stanley. Enumerative Combinatorics, Volume 1. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2nd edition, 2011.
  • [9] Seth Sullivant. Algebraic Statistics, volume 194 of Graduate Studies in Mathematics. American Mathematical Society, 2018.