This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Compatibility of Partitions with Trees, Hierarchies, and Split Systems

Marc Hellmuth Department of Mathematics, Faculty of Science, Stockholm University, SE - 106 91 Stockholm, Sweden
[email protected]
David Schaller Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstraße 16–18, D-04107 Leipzig, Germany
[email protected]\cdot[email protected]
Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
Peter F. Stadler Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstraße 16–18, D-04107 Leipzig, Germany
[email protected]\cdot[email protected]
Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany Department of Theoretical Chemistry University of Vienna, Währingerstraße 17, A-1090 Wien, Austria Facultad de Ciencias, Universidad National de Colombia, Sede Bogotá, Colombia Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
( )
Abstract

The question whether a partition 𝒫\mathcal{P} and a hierarchy \mathcal{H} or a tree-like split system 𝔖\mathfrak{S} are compatible naturally arises in a wide range of classification problems. In the setting of phylogenetic trees, one asks whether the sets of 𝒫\mathcal{P} coincide with leaf sets of connected components obtained by deleting some edges from the tree TT that represents \mathcal{H} or 𝔖\mathfrak{S}, respectively. More generally, we ask whether a refinement TT^{*} of TT exists such that TT^{*} and 𝒫\mathcal{P} are compatible in this sense. The latter is closely related to the question as to whether there exists a tree at all that is compatible with 𝒫\mathcal{P}. We report several characterizations for (refinements of) hierarchies and split systems that are compatible with (systems of) partitions. In addition, we provide a linear-time algorithm to check whether refinements of trees and a given partition are compatible. The latter problem becomes NP-complete but fixed-parameter tractable if a system of partitions is considered instead of a single partition. In this context, we also explore the close relationship of the concept of compatibility and so-called Fitch maps.


Keywords: hierarchy, split system, phylogenetic tree, partition, compatibility, recognition algorithm, Fitch map

1 Introduction

The selection of a partition 𝒫\mathcal{P} from a hierarchy \mathcal{H} (or its equivalent rooted tree TT) on a finite set XX is often the final step in applications of hierarchical clustering procedures [16, 21]. In this case, 𝒫\mathcal{P}, often called a “representative partition”, is composed of the pairwise disjoint leaf sets L(T(u))L(T(u)) of subtrees T(u)T(u) rooted at some vertices uu of TT. Equivalently, each set (class) of 𝒫\mathcal{P} is a set in \mathcal{H}, i.e., 𝒫\mathcal{P}\subseteq\mathcal{H}. Cutting a hierarchy at different “levels” leads to partitions 𝒫i\mathcal{P}_{i} of XX that are ordered by refinement, i.e., every set A𝒫iA\in\mathcal{P}_{i} at the lower level is contained in a set B𝒫jB\in\mathcal{P}_{j} at the higher level. In the image processing literature, braids of partitions have been introduced as systems {𝒫ii=1,,k}\{\mathcal{P}_{i}\mid i=1,\dots,k\} of partitions of XX that generalize such hierarchically ordered partitions. In a braid, all pairwise refinement suprema 𝒫i𝒫ji\mathcal{P}_{i}\vee\mathcal{P}_{j\neq i} (i.e., the finest partition that is refined by both 𝒫i\mathcal{P}_{i} and 𝒫j\mathcal{P}_{j}) must be hierarchically organized w.r.t. refinement, and moreover satisfy {X}𝒫i𝒫j\{X\}\neq\mathcal{P}_{i}\vee\mathcal{P}_{j} [15, 19]. Considering the distribution of attributes of species, biologists have been interested in systems of partitions {𝒫1,,𝒫k}\{\mathcal{P}_{1},\dots,\mathcal{P}_{k}\} and the associated system of splits A|(XA)A|(X\setminus A) with A𝒫iA\in\mathcal{P}_{i} for some ii. In [14], the compatible split systems that are generated by partition systems are characterized.

Instead of considering systems of partitions, one can ask whether a single partition 𝒫\mathcal{P}, or – equivalently – its associated split system 𝔖𝒫{A|(XA):A𝒫}\mathfrak{S}_{\mathcal{P}}\coloneqq\left\{A|(X\setminus A)\,\colon A\in\mathcal{P}\right\} can be obtained from a rooted or unrooted tree TT by cutting a subset HE(T)H\subseteq E(T) of the tree edges and considering the leaf sets of the resulting connected components. For rooted trees, this question arises naturally in mathematical phylogenetics. The removal of those edges from the rooted gene tree that correspond to horizontal gene transfer (HGT) then leaves subtrees of the gene phylogeny that can be analyzed independently [8, 7, 10]. If the tree TT and the partition 𝒫\mathcal{P} of the leaf set into HGT-free subsets are inferred independently, it is important to recognize whether the data are compatible with each other, i.e., whether 𝒫\mathcal{P} can be obtained from the tree TT by cutting some of its edges. If this is not possible for a tree that contains multifurcations, it may still be possible to achieve compatibility by refining some of the multifurcations in TT. Here, we address these two main questions.

The split system 𝔖𝒫\mathfrak{S}_{\mathcal{P}} introduced above suggests a different notion of compatibility with trees. Consider the partition 𝒫={{a},{b},{c,d}}\mathcal{P}=\{\{a\},\{b\},\{c,d\}\}. Clearly, 𝒫\mathcal{P} is compatible with the star tree S4S_{4} on X={a,b,c,d}X=\{a,b,c,d\}. However, S4S_{4} (more precisely, its unrooted version S4¯\overline{S_{4}}) does not display the split {c,d}(X{c,d})\{c,d\}\mid(X\setminus\{c,d\}). In fact, the condition that 𝔖𝒫\mathfrak{S}_{\mathcal{P}} is displayed by the unrooted tree T¯\overline{T} is closely related to the idea that 𝒫\mathcal{P} is a representative partition (of a suitably rooted version) of the tree T¯\overline{T}. This example shows that the notion of compatibility considered here is more general than the concepts that have appeared in the literature so far.

In this contribution, we characterize compatibility of partitions with rooted and unrooted trees (or equivalently their hierarchies and split systems, respectively). After introducing the notation and some preliminary results, we give an overview of the main concepts and results in Section 3.

2 Preliminaries

Basics

We denote the power set of a set XX by 2X2^{X}. A set system 𝒫2X\mathcal{P}\subseteq 2^{X} is a partition of XX if (P0) 𝒫\emptyset\notin\mathcal{P}, (P1) A𝒫A=X\bigcup_{A\in\mathcal{P}}A=X, and (P2) if A,B𝒫A,B\in\mathcal{P} and ABA\cap B\neq\emptyset then A=BA=B. We will interchangeably use the equivalent terms: set of partitions, collection of partitions and partition systems. Two sets AA and BB overlap if ABA\cap B\neq\emptyset, ABA\setminus B\neq\emptyset, and BAB\setminus A\neq\emptyset.

In this contribution, we consider both rooted and unrooted phylogenetic trees TT with vertex set V(T)V(T), edge set E(T)E(T), and leaf set L(T)=XL(T)=X. In this case, we also say that TT is a tree on XX. A star tree TT is a tree for which |V(T)L(T)|=1|V(T)\setminus L(T)|=1.

Rooted Trees and Hierarchies

A rooted tree TT has a distinguished vertex ρT\rho_{T} called the root of TT. For uV(T)u\in V(T), we write childT(u)\operatorname{child}_{T}(u) for the set of its children, and parT(u)\operatorname{par}_{T}(u) for the parent of uρTu\neq\rho_{T}. In both cases, we may omit the index TT whenever there is no risk of confusion. The subtree of TT rooted at uu is denoted by T(u)T(u). Furthermore, we write T\preceq_{T} for the ancestor partial order on TT, that is, uTvu\preceq_{T}v if vv lies on the path from ρT\rho_{T} to uu. If uTvu\preceq_{T}v or vTuv\preceq_{T}u, then uu and vv are comparable and, otherwise, incomparable. For a nonempty subset of AXA\subseteq X, we denote by lcaT(A)\operatorname{lca}_{T}(A) the last common ancestor of AA in TT. A rooted tree is phylogenetic if all its inner vertices V(T)L(T)V(T)\setminus L(T) have at least two children. Hence, a rooted phylogenetic tree may contain one vertex with degree 22, namely the root ρT\rho_{T}.

A set system 2X\mathcal{H}\subseteq 2^{X} is a hierarchy (on XX) if (H0) \emptyset\notin\mathcal{H}, (H1) XX\in\mathcal{H}, (H2) A,BA,B\in\mathcal{H} implies AB{A,B,}A\cap B\in\{A,B,\emptyset\}, i.e., AA and BB do not overlap, and (H3) {x}\{x\}\in\mathcal{H} for all xXx\in X. For a given non-empty set AXA\subseteq X and a hierarchy 2X\mathcal{H}\subseteq 2^{X}, we define AA_{\mathcal{H}} as the inclusion-minimal element in \mathcal{H} that contains AA.

For a hierarchy 2X\mathcal{H}\subseteq 2^{X} we can define the closure as the function cl:2X2X\operatorname{cl}_{\mathcal{H}}:2^{X}\to 2^{X} satisfying

cl(A)B,ABB\operatorname{cl}_{\mathcal{H}}(A)\coloneqq\bigcap_{B\in\mathcal{H},\,A\subseteq B}B (1)

for all subsets AXA\subseteq X. Where there is no danger of confusion, we will drop the explicit reference to \mathcal{H} and simply write cl(A)\operatorname{cl}(A) instead of cl(A)\operatorname{cl}_{\mathcal{H}}(A).

Lemma 2.1.

Let AXA\subseteq X be non-empty and \mathcal{H} be a hierarchy on XX. Then cl(A)=A\operatorname{cl}(A)=A_{\mathcal{H}} for all AA\neq\emptyset.

Proof.

Since AXA\subseteq X, XX\in\mathcal{H}, and no two elements in \mathcal{H} overlap, there is a unique inclusion-minimal element AA_{\mathcal{H}} in \mathcal{H} that contains AA, i.e., every AA^{\prime}\in\mathcal{H} that contains AA also contains AA_{\mathcal{H}}. Thus cl(A)={BAB}={BAB}=A\operatorname{cl}(A)=\bigcap\{B\in\mathcal{H}\mid A\subseteq B\}=\bigcap\{B\in\mathcal{H}\mid A_{\mathcal{H}}\subseteq B\}=A_{\mathcal{H}}. ∎

As an immediate consequence, we observe that for a hierarchy 2X\mathcal{H}\subseteq 2^{X}, cl\operatorname{cl} satisfies the classical properties of a closure operator: (C1) Acl(A)A\subseteq\operatorname{cl}(A) (enlarging); (C2) cl(cl(A))=cl(A)\operatorname{cl}(\operatorname{cl}(A))=\operatorname{cl}(A) (idempotent); (C3) If ABA\subseteq B, then cl(A)cl(B)\operatorname{cl}(A)\subseteq\operatorname{cl}(B) (isotone). For |X|2|X|\geq 2, \mathcal{H} contains at least two distinct singletons {x}\{x\} and {y}\{y\} and thus cl(){x}{y}=\operatorname{cl}(\emptyset)\subseteq\{x\}\cap\{y\}=\emptyset. Only in the special case |X|=1|X|=1, i.e., ={{x}}\mathcal{H}=\{\{x\}\}, we get cl()={x}\operatorname{cl}(\emptyset)=\{x\}. This is the consequence of the usual practice of excluding \emptyset from hierarchies to have the singleton as inclusion-minimal elements, which in turn is motivated by the 1-to-1 correspondence between rooted trees and hierarchies. We note that in the context of abstract convexities, on the other hand, it is customary to enforce \emptyset\in\mathcal{H} so that \mathcal{H} is closed under intersection [20], in which case cl()=\operatorname{cl}(\emptyset)=\emptyset also for |X|1|X|\leq 1. This subtlety is irrelevant for our discussion, however, since we are not interested in the trivial cases |X|1|X|\leq 1.

The following result shows that there is a 1-to-1 correspondence between hierarchies and rooted phylogenetic trees.

Theorem 2.2 ([18]).

Let \mathcal{H} be a collection of non-empty subsets of XX. Then, \mathcal{H} is a hierarchy on XX if and only if there is a rooted phylogenetic tree TT on XX with {L(T(v))cV(T)}\mathcal{H}\coloneqq\{L(T(v))\mid c\in V(T)\}.

The sets L(T(u))L(T(u)) for uV(T)u\in V(T) or, equivalently, the sets of a hierarchy are commonly referred to as clusters. In view of Thm. 2.2, every set AA\in\mathcal{H} corresponds to a vertex uAV(T)u_{A}\in V(T) such that L(T(uA))=AL(T(u_{A}))=A and uATuBu_{A}\preceq_{T}u_{B} is equivalent to ABA\subseteq B. The hierarchy (T)\mathcal{H}(T) of a rooted phylogenetic tree TT is (T){L(T(v))vV(T)}\mathcal{H}(T)\coloneqq\{L(T(v))\mid v\in V(T)\}. Moreover, we say that a rooted tree TT corresponds to \mathcal{H} if =(T)\mathcal{H}=\mathcal{H}(T). In particular, Thm. 2.2 ensures that for all hierarchies there is a corresponding tree and, moreover, that all results established here for hierarchies do also hold for rooted phylogenetic trees and vice versa. As an immediate consequence, we can express the closure as

cl(A)=L(T(lcaT(A))).\operatorname{cl}_{\mathcal{H}}(A)=L(T(\operatorname{lca}_{T}(A))). (2)

We therefore have lcaT(A)TlcaT(B)\operatorname{lca}_{T}(A)\preceq_{T}\operatorname{lca}_{T}(B) if and only if cl(A)cl(B)\operatorname{cl}(A)\subseteq\operatorname{cl}(B) for all A,BA,B\neq\emptyset. Given a set AA\in\mathcal{H} with |A|>1|A|>1 (and corresponding vertex uV(T)u\in V(T) that satisfies A=L(T(u))A=L(T(u))), we call BB\in\mathcal{H} a child cluster of AA if B=L(T(v))B=L(T(v)) for some vchildT(u)v\in\operatorname{child}_{T}(u). Hence, the child clusters BB of AA are exactly the inclusion-maximal sets in \mathcal{H} that satisfy BAB\subsetneq A. For simplicity of notation, we will think of the leaves of TT in this case as the xXx\in X, i.e., L(T)=XL(T)=X. A phylogenetic tree TT is a refinement of the phylogenetic tree TT^{\prime} on the same set XX if the corresponding hierarchy (T)\mathcal{H}(T) is a refinement of (T)\mathcal{H}(T^{\prime}), i.e., if (T)(T)\mathcal{H}(T^{\prime})\subseteq\mathcal{H}(T).

Unrooted Trees and Split Systems

An unrooted tree is phylogenetic if all non-leaf vertices have degree at least three. A split A1|A2{A1,A2}A_{1}|A_{2}\coloneqq\{A_{1},A_{2}\} on XX is a partition of the set XX into two disjoint non-empty subsets A1A_{1} and A2=XA1A_{2}=X\setminus A_{1}. In every tree TT on XX, we can associate with every edge eE(T)e\in E(T) the split 𝒮e=L(T1)|L(T2)\mathcal{S}_{e}=L(T_{1})|L(T_{2}) where L(T1)L(T_{1}) and L(T2)L(T_{2}) are the leaf sets of the two (not necessarily phylogenetic) trees T1T_{1} and T2T_{2}, respectively, obtained from TT by deletion of ee. An unrooted phylogenetic tree T¯\overline{T} is determined by its split system 𝔖(T¯)={𝒮e:eE(T¯)}\mathfrak{S}(\overline{T})=\{\mathcal{S}_{e}\colon e\in E(\overline{T})\}. To be more precise, there is a 1-to-1 correspondence between unrooted phylogenetic trees T¯\overline{T} with leaf set XX and split systems 𝔖\mathfrak{S} that (i) contain all “singleton splits” {x}|(X{x})\{x\}|(X\setminus\{x\}) and (ii) are “compatible” in the sense that, for any two splits A1|A2,B1|B2𝔖A_{1}|A_{2},B_{1}|B_{2}\in\mathfrak{S}, at least one of the four intersections A1B1A_{1}\cap B_{1}, A1B2A_{1}\cap B_{2}, A2B1A_{2}\cap B_{1}, or A2B2A_{2}\cap B_{2} is empty [4]. In this case, there is a unique (up to isomorphism) tree T¯\overline{T} with 𝔖(T¯)=𝔖\mathfrak{S}(\overline{T})=\mathfrak{S}. We call such split systems tree-like. An unrooted tree T¯\overline{T}^{*} is a refinement of T¯\overline{T} if 𝔖(T¯)𝔖(T¯)\mathfrak{S}(\overline{T})\subseteq\mathfrak{S}(\overline{T}^{*}).

We note that this is a special case of the analogous result for so-called XX-trees, see e.g. [18, Prop. 3.5.4]. In an XX-tree, a set of “taxa” XX is mapped (not necessarily injectively) to the vertex set V(T)V(T) of a rooted or unrooted tree TT [18]. The phylogenetic (rooted or unrooted) trees considered here are a slightly less general construction that identifies the taxa set XX with the leaf set L(T)L(T) and insists that distinct taxa are represented by distinct vertices in the underlying trees. That is, they are equivalent to XX-trees TT for which XX is bijectively mapped to L(T)L(T).

Remark.

Throughout this contribution, we assume that XX is finite and |X|2|X|\geq 2. Moreover, all rooted and unrooted trees are phylogenetic unless explicitly specified otherwise.

3 Main Ideas and Results

Let TT be a (rooted or unrooted) tree with leaf set L(T)=XL(T)=X and HE(T)H\subseteq E(T) be a subset of edges. Removal of HH disconnects TT into a forest whose connected components induce the partition (T,H)\mathcal{F}(T,H) on the leaf set XX. Of course, it may be possible that removal of the edges HH separates inner vertices, e.g., if all incident edges to an inner vertex are in HH. This, however, does not change the fact that we still obtain a partition of XX after removal of the edges in HH. We will refer to the edges eHe\in H as separating edges. Fig. 1 shows two examples for sets of separating edges, H1H_{1} and H2H_{2} as indicated by the dashed lines, for a given tree TT. The partition 𝒫1=(T,H1)\mathcal{P}_{1}=\mathcal{F}(T,H_{1}) is a “representative partition” for TT, i.e., all sets in 𝒫1\mathcal{P}_{1} appear as clusters in TT: {a,b,c}=L(T(u))\{a,b,c\}=L(T(u)), {d,e,f}=L(T(v))\{d,e,f\}=L(T(v)), and {g}=L(T(g))\{g\}=L(T(g)). In contrast, the partition 𝒫2=(T,H2)\mathcal{P}_{2}=\mathcal{F}(T,H_{2}) contains the set {a,g}\{a,g\} which is not a cluster in TT, and thus, 𝒫2\mathcal{P}_{2} is not a representative partition for TT.

Refer to caption
Figure 1: A tree TT on X={a,b,c,d,e,f,g}X=\{a,b,c,d,e,f,g\} with two examples for sets of separating edges H1H_{1} (dashed edges in the middle panel) and H2H_{2} (dashed edges in the right panel). Removal of H1H_{1} and H2H_{2} induces the partitions 𝒫1={{a,b,c},{d,e,f},{g}}\mathcal{P}_{1}=\{\{a,b,c\},\{d,e,f\},\{g\}\} and 𝒫2={{a,g},{b,c},{d},{e,f}}\mathcal{P}_{2}=\{\{a,g\},\{b,c\},\{d\},\{e,f\}\} of XX, respectively.

The notion of compatibility of partitions and trees used here is defined in terms of separating edges and the partitions that they induce:

Definition 3.1.

Let 𝒫\mathcal{P} be a partition of XX and let TT be a rooted or unrooted tree with leaf set XX. Then 𝒫\mathcal{P} and TT are compatible if there is a set of separating edges HE(T)H\subseteq E(T) such that 𝒫=(T,H)\mathcal{P}=\mathcal{F}(T,H).

In case TT is rooted (or unrooted) and compatible with 𝒫\mathcal{P}, the corresponding hierarchy (T)\mathcal{H}(T) (or split system 𝔖(T)\mathfrak{S}(T), resp.) are said to be compatible with 𝒫\mathcal{P}.

As we shall see in Lemma 4.1, we can always find a tree on XX that is compatible with 𝒫\mathcal{P} for a given partition 𝒫\mathcal{P} of XX. In particular, the tree corresponding to the hierarchy 𝒫𝒫{{x}xX}{X}\mathcal{H}_{\mathcal{P}}\coloneqq\mathcal{P}\cup\{\{x\}\mid x\in X\}\cup\{X\} is always compatible with 𝒫\mathcal{P}, see 𝒫\mathcal{P} and T1T_{1} in Fig. 2 for an illustrative example.

Refer to caption
Figure 2: Separating edges are indicated as dashed lines in the three trees with leaf set X={a,b,c,d,e}X=\{a,b,c,d,e\}, which explain the same partition 𝒫={{a},{b,c},{d,e}}\mathcal{P}=\{\{a\},\{b,c\},\{d,e\}\}. T1T_{1} corresponds to the hierarchy 𝒫𝒫{{x}xX}{X}\mathcal{H}_{\mathcal{P}}\coloneqq\mathcal{P}\cup\{\{x\}\mid x\in X\}\cup\{X\}. The two trees T2T_{2} and T3T_{3} can be obtained from T1T_{1} by contracting a single inner edge and re-rooting of the resulting tree. In particular, T2T_{2} and T3T_{3} are minimally-resolved, that is, they have the fewest number of vertices among all trees that are compatible with 𝒫\mathcal{P}. The last statement is easy to verify, since any tree with fewer vertices must be a star tree which is not compatible with 𝒫\mathcal{P}.

However, Fig. 2 also shows that there can be multiple different trees on XX that are compatible with 𝒫\mathcal{P}. The main result of Section 4, Thm. 4.5, is a characterization of compatibility of partitions and hierarchies (and thus rooted trees).

As it turns out, not all hierarchies are compatible with a given partition 𝒫\mathcal{P}. In many applications, hierarchies (and their corresponding rooted trees) are not necessarily fully resolved, even though it is often assumed that the “ground truth” is a binary tree. In situations where \mathcal{H} and 𝒫\mathcal{P} are not compatible, it is therefore of interest to ask whether it is possible to find a refinement of \mathcal{H} that is compatible with 𝒫\mathcal{P}.

Refer to caption
Figure 3: The tree TT is not compatible with the partition 𝒫{A{a,b},B{c,d}}\mathcal{P}\coloneqq\{A\coloneqq\{a,b\},B\coloneqq\{c,d\}\} but admits a compatible refinement TT^{*}. In contrast, the tree TT^{\prime} does not admit a compatible refinement. See text for more details.

An example of a tree TT that is not compatible with a partition 𝒫\mathcal{P} but that admits a compatible refinement TT^{*} is shown in Fig. 3. To see that 𝒫={A,B}\mathcal{P}=\{A,B\} and TT as in Fig. 3 are not compatible, the edges in TT have been colored in orange and cyan if they lie on a path connecting two elements from AA or BB, respectively. Clearly, none of these edges can be a separating edge. In particular, since all edges in TT are colored, AA and BB cannot be separated by any subset HE(T)H\subseteq E(T). For similar reasons the tree TT^{\prime} as in Fig. 3 is not compatible with 𝒫\mathcal{P}. Since TT^{\prime} is already fully-resolved, it does not admit a compatible refinement.

Definition 3.2.

A tree TT and a partition 𝒫\mathcal{P} are refinement-compatible (r-compatible for short) if there exists a refinement TT^{*} of TT that is compatible with 𝒫\mathcal{P}.

In case TT is rooted (unrooted) and refinement-compatible with 𝒫\mathcal{P}, the corresponding hierarchy (T)\mathcal{H}(T) (split system 𝔖(T)\mathfrak{S}(T), resp.) are said to be refinement-compatible with 𝒫\mathcal{P}.

By definition, compatibility implies r-compatibility since every tree is a refinement of itself. In Section 5, we show that refining a hierarchy \mathcal{H} or a rooted tree TT that is already compatible with a partition 𝒫\mathcal{P} never destroys compatibility. As a main result of Section 5, we obtain a characterization of r-compatibility in terms of the simple condition that no set YY\in\mathcal{H} overlaps with two distinct sets A,B𝒫A,B\in\mathcal{P} (cf. Thm. 5.7). We later utilize these results to derive simple linear-time algorithms for both recognition of compatibility of a partition and a tree, as well as the construction of a compatible refinement if one exists in Section 7.

Even though compatibility of partitions and rooted trees is linear-time-decidable, the situation appears substantially more complicated when systems 𝔓={𝒫1,𝒫2,,𝒫k}\mathfrak{P}=\{\mathcal{P}_{1},\mathcal{P}_{2},\dots,\mathcal{P}_{k}\} of partitions of XX are considered rather than single partitions. By definition, TT (or equivalently, the hierarchy, (T)\mathcal{H}(T)) and each 𝒫i\mathcal{P}_{i} are compatible if and only if 𝒫i=(T,Hi)\mathcal{P}_{i}=\mathcal{F}(T,H_{i}) for some subset HiE(T)H_{i}\subseteq E(T), 1ik1\leq i\leq k. In this case, we say that TT (equiv. (T)\mathcal{H}(T)) and 𝔓\mathfrak{P} are compatible. It is natural, then, to ask whether for a given system of partitions 𝔓\mathfrak{P}, there exists a tree TT such that 𝔓\mathfrak{P} and TT are compatible:

Problem (Existence of Tree compatible with Partition System (ExistTP)).
Input: A partition system 𝔓\mathfrak{P} on XX.
Question: Is there a tree TT on XX such that TT and 𝔓\mathfrak{P} are compatible?

Since every tree on XX is a refinement of the star tree on XX, ExistTP is a special case of the following more general problem:

Problem (Compatibility of Tree and Partition System (CompaTP)).
Input: A tree TT with leaf set XX and a partition system 𝔓\mathfrak{P} on XX.
Question: Is there a refinement TT^{*} of TT such that TT^{*} and 𝔓\mathfrak{P} are compatible?
Refer to caption
Figure 4: The hierarchy (T)\mathcal{H}(T) corresponding to the tree TT with leaf set X={a,b,c,d,e}X=\{a,b,c,d,e\} that admits the refinements T1T^{*}_{1} and T2T^{*}_{2} of TT that are compatible with 𝒫1={{a,b},{c},{d,e}}\mathcal{P}_{1}=\{\{a,b\},\{c\},\{d,e\}\} and 𝒫2={{a},{b,c},{d,e}}\mathcal{P}_{2}=\{\{a\},\{b,c\},\{d,e\}\}, respectively. The union (T1)(T2)\mathcal{H}(T^{*}_{1})\cup\mathcal{H}(T^{*}_{2}), however, does not form a hierarchy because the two elements {a,b}\{a,b\} and {b,c}\{b,c\} overlap. Thus, there is no common refinement of T1T^{*}_{1} and T2T^{*}_{2} that is compatible with 𝔓={𝒫1,𝒫2}\mathfrak{P}=\{\mathcal{P}_{1},\mathcal{P}_{2}\}. In this example, (T)\mathcal{H}(T) has another refinement \mathcal{H^{*}} with corresponding tree TT^{*} that is compatible with both 𝒫1\mathcal{P}_{1} and 𝒫2\mathcal{P}_{2}.

The difficulty of both ExistTP and CompaTP stems from the fact that refinements of the underlying tree that are necessary to obtain compatibility with individual partitions in 𝔓\mathfrak{P} may contradict one another. Consider the tree TT and three of its possible refinements T1,T2T_{1}^{*},T_{2}^{*} and TT^{*} as shown in Fig. 4. In this example, TT is not compatible with 𝔓={𝒫1,𝒫2}\mathfrak{P}=\{\mathcal{P}_{1},\mathcal{P}_{2}\}, and the refinement TiT_{i}^{*} is compatible with 𝒫i\mathcal{P}_{i} but not with 𝒫j\mathcal{P}_{j}, {i,j}={1,2}\{i,j\}=\{1,2\}. However, there is no common refinement of T1T_{1}^{*} and T2T_{2}^{*} that is compatible with 𝔓\mathfrak{P}. On the other hand, TT^{*} is compatible with both 𝒫1\mathcal{P}_{1} and 𝒫2\mathcal{P}_{2}. Hence, TT admits a refinement TT^{*} that is compatible with 𝔓\mathfrak{P}. An example of a tree TT for which every partition 𝒫𝔓\mathcal{P}\in\mathfrak{P} is r-compatible with TT but there is no refinement of TT that is compatible with 𝔓\mathfrak{P} is provided in Fig. 7. Hence, ExistTP and CompaTP seem to be inherently difficult and indeed both are NP-complete decision problems, see Thm. 7.10.

In Section 6, compatibility of partitions with splits systems and their equivalent representations as unrooted phylogenetic trees is considered. As shown in Prop. 6.2, compatibility of partitions with unrooted and rooted trees are closely related. In fact, a partition is compatible with an unrooted tree if and only if it is compatible with any rooted version of this tree. Further characterization of (refinements of) split systems and unrooted trees being compatible with partitions will be established (cf. Thm. 6.4 as well as Lemma 6.7 and 6.8).

Section 7 is dedicated to algorithmic considerations and the complexity of deciding whether (systems of) partitions are compatible (with refinements) of hierarchies and split systems, represented by rooted and unrooted trees, respectively. As we have seen above, there are edges ee of a tree TT that can never be separating edges since their removal would break down some set A𝒫A\in\mathcal{P}. In order to identify such edges, we provide an edge coloring that we have already used in an informal way to demonstrate incompatibility in the example in Fig. 3:

Definition 3.3.

Let TT be a tree on XX and 𝒫\mathcal{P} a partition of XX. The 𝒫\mathcal{P}-(edge-)coloring of TT is the map γT,𝒫:E(T)2𝒫\gamma_{T,\mathcal{P}}\colon E(T)\to 2^{\mathcal{P}} that is given by

AγT,𝒫(e)e lies on the unique path connecting two x,xA.A\in\gamma_{T,\mathcal{P}}(e)\iff e\text{ lies on the unique path connecting two }x,x^{\prime}\in A.

The key property of this edge coloring γT,𝒫\gamma_{T,\mathcal{P}} is that any edge eE(T)e\in E(T) with γT,𝒫(e)\gamma_{T,\mathcal{P}}(e)\neq\emptyset lies on some path between two vertices a,aXa,a^{\prime}\in X that are contained in the same set of 𝒫\mathcal{P}, and thus, ee cannot be a separating edge. In contrast, all edges ee for which γT,𝒫(e)=\gamma_{T,\mathcal{P}}(e)=\emptyset do not separate any two leaves that are in the same set of 𝒫\mathcal{P}, and thus can be safely added to the set of separating edges. A key result, proven in Section 7, is that 𝒫\mathcal{P} and TT are r-compatible if and only if |γT,𝒫(e)|1|\gamma_{T,\mathcal{P}}(e)|\leq 1 for every eE(T)e\in E(T) (Prop. 7.3). In this case, the coloring γT,𝒫\gamma_{T,\mathcal{P}} can be computed in linear time and, based on this, deciding the existence of and finding a (refinement of) a tree that is compatible with a partition 𝒫\mathcal{P} can be in done in linear time as well. To establish compatibility of 𝒫\mathcal{P} and TT, it suffices to rule out the existence of a vertex uV(T)u\in V(T) that is incident with two differently colored edges (cf. Thm. 7.6). If 𝒫\mathcal{P} and TT are r-compatible but not compatible, the vertices uV(T)u\in V(T) that violate the latter condition coincide with lcaT(A)\operatorname{lca}_{T}(A) for some A𝒫A\in\mathcal{P} and can be refined by collecting all children vchildT(u)v\in\operatorname{child}_{T}(u) for which L(T(v))AL(T(v))\cap A\neq\emptyset under a newly created vertex (cf. Thm. 7.5).

4 Compatibility of Partitions and Hierarchies

We first show that, for every partition, there is a compatible hierarchy.

Lemma 4.1.

For every partition 𝒫\mathcal{P} of XX, the set system 𝒫𝒫{{x}xX}{X}\mathcal{H}_{\mathcal{P}}\coloneqq\mathcal{P}\cup\{\{x\}\mid x\in X\}\cup\{X\} is a hierarchy that is compatible with 𝒫\mathcal{P}. In particular, every partition 𝒫\mathcal{P} of XX is compatible with a rooted tree on XX.

Proof.

Since 𝒫\mathcal{P} is a partition of XX, no two sets overlap. This remains true for 𝒫\mathcal{H}_{\mathcal{P}} and thus 𝒫\mathcal{H}_{\mathcal{P}} is a hierarchy. Moreover, it is easy to see that if all edges incident to the root of the tree TT corresponding to \mathcal{H} are added to HH, then 𝒫=(T,H)\mathcal{P}=\mathcal{F}(T,H) whenever 𝒫{X}\mathcal{P}\neq\{X\}. In case 𝒫={X}\mathcal{P}=\{X\}, TT is a star tree and we have 𝒫=(T,)\mathcal{P}=\mathcal{F}(T,\emptyset). ∎

The following result is a key step for the characterization of compatible partitions and hierarchies. In particular, it shows that, for any two elements A,B𝒫A,B\in\mathcal{P}, the set BB can only intersect with at most one child of lcaT(A)\operatorname{lca}_{T}(A).

Lemma 4.2.

If a partition 𝒫\mathcal{P} of XX and a rooted tree TT on XX are compatible, then, for all A,B𝒫A,B\in\mathcal{P}, there are no two distinct children u,uchild(lcaT(A))u,u^{\prime}\in\operatorname{child}(\operatorname{lca}_{T}(A)) such that BL(T(u))B\cap L(T(u))\neq\emptyset and BL(T(u))B\cap L(T(u^{\prime}))\neq\emptyset.

Proof.

Let A𝒫A\in\mathcal{P} and put vAlcaT(A)v_{A}\coloneqq\operatorname{lca}_{T}(A). Since 𝒫\mathcal{P} is compatible with \mathcal{H}, there is a set of separating edges HE(T)H\subseteq E(T) such that P=(T,H)P=\mathcal{F}(T,H). Assume, for contradiction, that there are two children u,uchild(vA)u,u^{\prime}\in\operatorname{child}(v_{A}) and a set B𝒫{A}B\in\mathcal{P}\setminus\{A\} such that BL(T(u))B\cap L(T(u))\neq\emptyset and BL(T(u))B\cap L(T(u^{\prime}))\neq\emptyset. Since vA=lcaT(A)v_{A}=\operatorname{lca}_{T}(A), there are children w,wchild(vA)w,w^{\prime}\in\operatorname{child}(v_{A}) such that AL(T(w))A\cap L(T(w))\neq\emptyset and AL(T(w))A\cap L(T(w^{\prime}))\neq\emptyset. The vertices w,ww,w^{\prime} are not necessarily distinct from u,uu,u^{\prime}. However, we can assume w.l.o.g. that uwu\neq w. Since 𝒫\mathcal{P} is compatible with \mathcal{H}, there is no separating edge on the path from aa to aa^{\prime} for any aAL(T(w))a\in A\cap L(T(w))\neq\emptyset and aAL(T(w))a^{\prime}\in A\cap L(T(w^{\prime}))\neq\emptyset. In particular, the path from aa to vAv_{A} does not contain a separating edge. Similarly, for all vertices bBL(T(u))b\in B\cap L(T(u))\neq\emptyset and bBL(T(u))b^{\prime}\in B\cap L(T(u^{\prime}))\neq\emptyset, there is no separating edge on the path from bb to bb^{\prime}, and the path from any such bb to vAv_{A} does not contain a separating edge. Since uwu\neq w and both uu and ww are children of vAv_{A}, the path from aa to bb is the concatenation of the path from aa to vAv_{A} and the path from bb to vAv_{A}, and thus, does not contain a separating edge. Hence aa and bb are in the same connected component of (T,H)\mathcal{F}(T,H); a contradiction since AA and BB are disjoint by assumption. ∎

Refer to caption
Figure 5: The tree TT (on X={ABC1C2C3D{e}}X=\{A\cup B\cup C_{1}\cup C_{2}\cup C_{3}\cup D\cup\{e\}\}) and the partition 𝒫={A,B,CC1C2C3,D,E{e}}\mathcal{P}=\{A,B,C\coloneqq C_{1}\cup C_{2}\cup C_{3},D,E\coloneqq\{e\}\} are compatible since removal of the separating edges (dashed-lined) that are chosen according to Eq. (3) induces 𝒫\mathcal{P}. To emphasize its connectedness after removal of the separating edges, the minimal subtree connecting all elements in CC is highlighted in cyan. Let \mathcal{H} be the hierarchy corresponding to TT. For some Y𝒫Y\in\mathcal{P}, we have Y=YY=Y_{\mathcal{H}}, that is, L(T(v))=AL(T(v))=A_{\mathcal{H}} (i.e. the vertex vv corresponds to AA_{\mathcal{H}}\in\mathcal{H}), L(T(w))=BL(T(w))=B_{\mathcal{H}}, L(T(u))=CL(T(u))=C_{\mathcal{H}}, and L(T(e))={e}=EL(T(e))=\{e\}=E_{\mathcal{H}}. For CC it holds that CC=L(T(u))C\subsetneq C_{\mathcal{H}}=L(T(u)). Moreover, some elements in 𝒫\mathcal{P} overlap with elements in \mathcal{H}, e.g., C𝒫C\in\mathcal{P} overlaps with DC3D\cup C_{3}\in\mathcal{H}. Note that it suffices to have only one of {ρT,u}\{\rho_{T},u\} and {ρT,e}\{\rho_{T},e\} as a separating edge.

Using the correspondence between phylogenetic trees and hierarchies (cf. Thm. 2.2), one can translate Lemma 4.2 to the language of hierarchies:

Corollary 4.3.

Suppose a partition 𝒫\mathcal{P} of XX and a hierarchy \mathcal{H} on XX (with corresponding tree TT) are compatible and A,B𝒫A,B\in\mathcal{P} are distinct. Then lcaT(A)lcaT(B)\operatorname{lca}_{T}(A)\neq\operatorname{lca}_{T}(B) and A=cl(A)cl(B)=BA_{\mathcal{H}}=\operatorname{cl}(A)\neq\operatorname{cl}(B)=B_{\mathcal{H}}.

Lemma 4.4.

Suppose a partition 𝒫\mathcal{P} of XX and a hierarchy \mathcal{H} on XX are compatible, and A𝒫A\in\mathcal{P}. The set AA_{\mathcal{H}} does not overlap with any B𝒫B\in\mathcal{P}.

Proof.

Assume, for contradiction, that AA_{\mathcal{H}} and BB overlap for some A,B𝒫A,B\in\mathcal{P}. Since AAA\subseteq A_{\mathcal{H}} and BAB\not\subseteq A_{\mathcal{H}}, we have ABA\neq B and, by Lemma 4.2, ABA_{\mathcal{H}}\neq B_{\mathcal{H}}. Since BB overlaps with AA_{\mathcal{H}} and \mathcal{H} is a hierarchy, we have ABA_{\mathcal{H}}\subsetneq B_{\mathcal{H}}. Now let TT be the tree of \mathcal{H} and set vAlcaT(A)TlcaT(B)vBv_{A}\coloneqq\operatorname{lca}_{T}(A_{\mathcal{H}})\prec_{T}\operatorname{lca}_{T}(B_{\mathcal{H}})\eqqcolon v_{B}. Since AA_{\mathcal{H}} is the unique inclusion-minimal element in \mathcal{H} that contains AA, there are two distinct children u,uchild(vA)u,u^{\prime}\in\operatorname{child}(v_{A}) such that AL(T(u))A\cap L(T(u))\neq\emptyset and AL(T(u))A\cap L(T(u^{\prime}))\neq\emptyset. Let aAL(T(u))a\in A\cap L(T(u)) and aAL(T(u))a^{\prime}\in A\cap L(T(u^{\prime})). Since 𝒫\mathcal{P} and \mathcal{H} are compatible, there is a set of separating edges HE(T)H\subseteq E(T) such that 𝒫=(T,H)\mathcal{P}=\mathcal{F}(T,H). Since a,aAa,a^{\prime}\in A, the unique path Pa,aP_{a,a^{\prime}} from aa to aa^{\prime} in TT, and in particular the path Pa,vAP_{a,v_{A}} from aa to vAv_{A}, cannot contain any separating edge. Furthermore, since AA_{\mathcal{H}} and BB overlap, there is an element bABb\in A_{\mathcal{H}}\cap B. Since AA and BB are disjoint, we have ba,ab\neq a,a^{\prime}. Hence, bTu′′b\preceq_{T}u^{\prime\prime} for some child u′′child(vA)u^{\prime\prime}\in\operatorname{child}(v_{A}). Since uu and uu^{\prime} are distinct children of vAv_{A}, we can assume w.l.o.g. that u′′uu^{\prime\prime}\neq u. Moreover, vATvBv_{A}\prec_{T}v_{B} together with the fact that BB_{\mathcal{H}} is the unique inclusion-minimal element in \mathcal{H} that contains BB, implies that there are two distinct children w,wchild(vB)w,w^{\prime}\in\operatorname{child}(v_{B}) that satisfy bTu′′TvATwb\preceq_{T}u^{\prime\prime}\prec_{T}v_{A}\preceq_{T}w and bBL(T(w))b^{\prime}\in B\cap L(T(w^{\prime})). Since b,bBb,b^{\prime}\in B, the unique path Pb,bP_{b,b^{\prime}} from bb to bb^{\prime} in TT, and in particular the path Pb,vAP_{b,v_{A}} from bb to vAv_{A}, cannot contain a separating edge. Since uu′′u\neq u^{\prime\prime} and both uu and u′′u^{\prime\prime} are children of vAv_{A}, the path from aa to bb is the concatenation of the paths Pa,vAP_{a,v_{A}} and Pb,vAP_{b,v_{A}}, and thus, does not contain a separating edge. Therefore, there is a set C(T,H)C\in\mathcal{F}(T,H) with a,bCa,b\in C. Now, aAa\in A and bBb\in B implies that 𝒫(T,H)\mathcal{P}\neq\mathcal{F}(T,H); a contradiction. ∎

Theorem 4.5.

Let \mathcal{H} be a hierarchy on XX and 𝒫\mathcal{P} be a partition of XX. Then, 𝒫\mathcal{P} and \mathcal{H} are compatible if and only if the following two conditions are satisfied for all A,B𝒫A,B\in\mathcal{P}:

  1. (i)

    AA_{\mathcal{H}} is a union of sets of 𝒫\mathcal{P}.

  2. (ii)

    If A=BA_{\mathcal{H}}=B_{\mathcal{H}}, then A=BA=B.

Proof.

Let TT be the tree with (T)=\mathcal{H}(T)=\mathcal{H}.

Assume that 𝒫\mathcal{P} is compatible with \mathcal{H}. First observe that, for all A𝒫A\in\mathcal{P}, AAA\subseteq A_{\mathcal{H}} implies that AA_{\mathcal{H}} is the union of sets of 𝒫\mathcal{P} if and only if AA_{\mathcal{H}} does not overlap with any B𝒫B\in\mathcal{P}. Hence, Condition (i) follows immediately from Lemma 4.4. Condition (ii) follows immediately from the fact that A=BA_{\mathcal{H}}=B_{\mathcal{H}} implies lcaT(A)=lcaT(B)\operatorname{lca}_{T}(A)=\operatorname{lca}_{T}(B) and Cor. 4.3.

Now suppose (i) and (ii) holds. Consider the tree TT and the following set of separating edges

H{{par(lcaT(A)),lcaT(A)}A𝒫,lcaT(A)ρT},H\coloneqq\left\{\{\operatorname{par}(\operatorname{lca}_{T}(A)),\operatorname{lca}_{T}(A)\}\mid A\in\mathcal{P},\ \operatorname{lca}_{T}(A)\neq\rho_{T}\right\}, (3)

Thus, an edge e={u,v}E(T)e=\{u,v\}\in E(T) is a separating edge if and only if L(T(v))=AL(T(v))=A_{\mathcal{H}} and thus v=lcaT(A)v=\operatorname{lca}_{T}(A) for some A𝒫A\in\mathcal{P}.

We first show that any two distinct A,B𝒫A,B\in\mathcal{P} are separated by at least one separating edge in HH, i.e., there is no path in THT-H connecting any vertex aAa\in A with any vertex bBb\in B. To this end, let A,B𝒫A,B\in\mathcal{P} be chosen arbitrarily but distinct. By contraposition of Condition (ii), we have ABA_{\mathcal{H}}\neq B_{\mathcal{H}}. Therefore and since \mathcal{H} is a hierarchy, we have to consider the two cases (a) AB=A_{\mathcal{H}}\cap B_{\mathcal{H}}=\emptyset, and (b) ABA_{\mathcal{H}}\subsetneq B_{\mathcal{H}} or BAB_{\mathcal{H}}\subsetneq A_{\mathcal{H}}. Case (a) corresponds to the situation in which vAlcaT(A)v_{A}\coloneqq\operatorname{lca}_{T}(A) and vBlcaT(B)v_{B}\coloneqq\operatorname{lca}_{T}(B) are incomparable in TT, which is, moreover, only possible if neither vAv_{A} nor vBv_{B} are the root. Hence, the edges {par(vA),vA}\{\operatorname{par}(v_{A}),v_{A}\} and {par(vB),vB}\{\operatorname{par}(v_{B}),v_{B}\} are contained in HH and every path from some aAa\in A to some bBb\in B contains these two edges. Thus, the two sets AA and BB are separated by separating edges in TT. In case (b), we assume w.l.o.g. that ABA_{\mathcal{H}}\subsetneq B_{\mathcal{H}} which corresponds to the situation in which vAlcaT(A)TlcaT(B)vBv_{A}\coloneqq\operatorname{lca}_{T}(A)\prec_{T}\operatorname{lca}_{T}(B)\eqqcolon v_{B}. Hence, vAv_{A} is not the root of TT, and we have {par(vA),vA}H\{\operatorname{par}(v_{A}),v_{A}\}\in H. Since L(T(vA))=AL(T(v_{A}))=A_{\mathcal{H}} contains all elements in AA, the two sets AA and BB are completely separated by this separating edge if AA_{\mathcal{H}} does not contain any element of BB. Thus, assume for contradiction that ABA_{\mathcal{H}}\cap B\neq\emptyset. This together with the facts that AABA\subseteq A_{\mathcal{H}}\subsetneq B_{\mathcal{H}}, AB=A\cap B=\emptyset and BB_{\mathcal{H}} is inclusion-minimal for BB implies that AA_{\mathcal{H}} and BB overlap. Therefore, AA_{\mathcal{H}} is not the union of sets of 𝒫\mathcal{P}; a contradiction to Condition (i).

It remains to show that no set APA\in P contains two elements a,aAa,a^{\prime}\in A which are separated by a separating edge in TT. Thus, assume for contradiction that there is such an edge e={u,v}He=\{u,v\}\in H lying on the path that connects two a,aAa,a^{\prime}\in A for some APA\in P. We can assume w.l.o.g. that aL(T(v))a\in L(T(v)) and aL(T)L(T(v))a^{\prime}\in L(T)\setminus L(T(v)). Since eHe\in H, we have that v=lcaT(B)v=\operatorname{lca}_{T}(B) corresponds to BB_{\mathcal{H}} for some B𝒫B\in\mathcal{P}. Since aB=L(T(v))a\in B_{\mathcal{H}}=L(T(v)) but aBa^{\prime}\notin B_{\mathcal{H}} and by similar arguments as above, the sets AA and BB_{\mathcal{H}} overlap which is again a contradiction to Condition (i).

In summary, we have A(T,H)=𝒫A\in\mathcal{F}(T,H)=\mathcal{P} for all A𝒫A\in\mathcal{P} and thus, 𝒫=(T,H)\mathcal{P}=\mathcal{F}(T,H). Therefore, 𝒫\mathcal{P} is compatible with \mathcal{H}. ∎

Refer to caption
Figure 6: The tree TT admits several possible choices for the set HH of separating edges (dashed-lined) such that (T,H)=𝒫={A,B,C}\mathcal{F}(T,H)=\mathcal{P}=\{A,B,C\}. The separating edges in Panel (a) are the ones specified in Eq. (3). The two sets of separating edges in Panel (b) and (c) are both of minimum size, while the set of separating edges in Panel (d) is of maximum size. The set of separating edges of minimum size thus is not unique in general. In contrast, Cor. 7.7 implies that the maximum-sized set of separating edges is always unique. The set HH defined by Eq. (3) is neither of minimum nor of maximum size.

The second part of the proof of Theorem 4.5 implies a simple algorithm to determine whether 𝒫\mathcal{P} and \mathcal{H} are compatible and to construct a (minimal) set HH of separating edges that realizes the partition 𝒫\mathcal{P} on TT: In Section 7, we derive a linear-time compatibility test algorithm. The set HH of separating edges in Eq. (3) can also be constructed in polynomial time. Moreover, if there is an element A𝒫A\in\mathcal{P} with A=XA_{\mathcal{H}}=X, then the set HH as in Eq. (3) is minimal because it contains, by construction, |𝒫|1|\mathcal{P}|-1 separating edges, i.e., the minimal number of splits required to decompose a tree into |𝒫||\mathcal{P}| connected components. If there is no A𝒫A\in\mathcal{P} with A=XA_{\mathcal{H}}=X, however, then there is one edge too many. A minimal set of separating edges can easily obtained in this case by omitting the separating edge {par(lcaT(A)),lcaT(A)}\{\operatorname{par}(\operatorname{lca}_{T}(A)),\operatorname{lca}_{T}(A)\} for one of the sets that are inclusion-maximal among the inclusion-minimal sets AA_{\mathcal{H}}, i.e., those which correspond to vertices that are closest to the root (see Fig. 5 and 6 for further examples). We summarize the latter discussion in the following

Lemma 4.6.

Suppose that a partition 𝒫\mathcal{P} of XX and a rooted tree TT on XX are compatible. Then, there always exists a minimum-sized set of separating edges HH^{*} such that 𝒫=(T,H)\mathcal{P}=\mathcal{F}(T,H^{*}) and |H|=|𝒫|1|H^{*}|=|\mathcal{P}|-1. In particular, if HH is chosen as in Eq. (3), then |H|{|𝒫|1,|𝒫|}|H|\in\{|\mathcal{P}|-1,|\mathcal{P}|\} and |H|=|𝒫|1|H|=|\mathcal{P}|-1 if and only if there is no A𝒫A\in\mathcal{P} with A=XA_{\mathcal{H}}=X.

The following result is a simple consequence of Thm. 4.5 and the discussion above.

Corollary 4.7.

Let 𝒫\mathcal{P} be a partition of XX, \mathcal{H} a hierarchy on XX with corresponding tree TT, and let HH be the edge set defined in Eq. (3). Then 𝒫\mathcal{P} and \mathcal{H} are compatible if and only if 𝒫=(T,H)\mathcal{P}=\mathcal{F}(T,H).

Lemma 4.8.

If the partition 𝒫\mathcal{P} and the hierarchy \mathcal{H} on XX are compatible, then the following conditions hold for all A,B𝒫A,B\in\mathcal{P}: If BAB\subseteq A_{\mathcal{H}} and BAB\neq A, then BA=B_{\mathcal{H}}\cap A=\emptyset.

Proof.

By Property (i) of Thm. 4.5, BB_{\mathcal{H}} is a union of sets of 𝒫\mathcal{P}, and thus either (a) ABA\subseteq B_{\mathcal{H}} or (b) AB=A\cap B_{\mathcal{H}}=\emptyset. In case (a), we have A=cl(A)cl(B)=BA_{\mathcal{H}}=\operatorname{cl}(A)\subseteq\operatorname{cl}(B_{\mathcal{H}})=B_{\mathcal{H}} by isotony and idempotence of the closure. Similarly, we obtain BAB_{\mathcal{H}}\subseteq A_{\mathcal{H}} from the assumption BAB\subseteq A_{\mathcal{H}}. Therefore, A=BA_{\mathcal{H}}=B_{\mathcal{H}}. By Property (ii) of Thm. 4.5 this is a contradiction to ABA\neq B. Thus, case (a) is impossible and we always have AB=A\cap B_{\mathcal{H}}=\emptyset. ∎

Thm. 4.5 and Lemma 4.8 together can be rephrased as

Corollary 4.9.

The partition 𝒫\mathcal{P} and the hierarchy \mathcal{H} are compatible if and only if

A=AB𝒫BABA=A_{\mathcal{H}}\setminus\bigcup_{\begin{subarray}{c}B\in\mathcal{P}\\ B_{\mathcal{H}}\subsetneq A_{\mathcal{H}}\end{subarray}}B_{\mathcal{H}} (4)

holds for all A𝒫A\in\mathcal{P}.

5 Compatibility of (Systems of) Partitions and Refinements of Hierarchies

In many applications hierarchies (and their associated trees) are not necessarily fully resolved, even though it is often assumed that the “ground truth” is a binary tree. We show first that refining a hierarchy \mathcal{H} or a tree TT that is already compatible with a partition 𝒫\mathcal{P} never destroys compatibility.

Proposition 5.1.

A hierarchy \mathcal{H} on XX and a partition 𝒫\mathcal{P} of XX are compatible if and only if 𝒫\mathcal{P} is compatible with every refinement \mathcal{H}^{*} of \mathcal{H}.

Proof.

The if-direction immediately follows from the fact that =\mathcal{H}^{*}=\mathcal{H} is a refinement of \mathcal{H}. For the only-if-direction, let TT be the tree corresponding to \mathcal{H}, denote by HE(T)H\subseteq E(T) the set of separating edges as defined in Eq. (3). Thus, we have e={parT(v),v}E(T)He=\{\operatorname{par}_{T}(v),v\}\in E(T)\cap H if and only if v=lcaT(A)ρTv=\operatorname{lca}_{T}(A)\neq\rho_{T}. Since \mathcal{H} and 𝒫\mathcal{P} are compatible, we can apply Cor. 4.7 to conclude that 𝒫=(T,H)\mathcal{P}=\mathcal{F}(T,H). Therefore, the path connecting any two vertices aAa\in A and bBb\in B from distinct A,B𝒫A,B\in\mathcal{P} contains at least one edge in HH. Let 𝒴\mathcal{Y} be the set of all YY\in\mathcal{H} with v=lcaT(Y)v=\operatorname{lca}_{T}(Y) and {parT(v),v}H\{\operatorname{par}_{T}(v),v\}\in H. For all Y𝒴{X}Y\in\mathcal{Y}\subseteq\mathcal{H}\setminus\{X\}, therefore, there is an A𝒫A\in\mathcal{P} such that Y=AY=A_{\mathcal{H}}.

Now consider an arbitrary refinement \mathcal{H}^{*} of \mathcal{H} and the corresponding refinement TT^{*} of TT. By construction, we have 𝒴\mathcal{Y}\subset\mathcal{H}\subseteq\mathcal{H}^{*}. For each Y𝒴Y\in\mathcal{Y}, we set vylcaT(Y)v_{y}\coloneqq\operatorname{lca}_{T^{*}}(Y) and set

H{{parT(vY),vY}Y𝒴}.H^{*}\coloneqq\left\{\{\operatorname{par}_{T^{*}}(v_{Y}),v_{Y}\}\mid Y\in\mathcal{Y}\right\}.

Since X𝒴X\notin\mathcal{Y} by construction, we have vYρTv_{Y}\neq\rho_{T^{*}} and thus parT(vY)\operatorname{par}_{T^{*}}(v_{Y}) and, in particular, HH^{*} is well-defined.

Now consider two arbitrary two vertices aAa\in A and bBb\in B in distinct A,B𝒫A,B\in\mathcal{P}. As argued above, the path connecting them in TT contains an edge {parT(v),v}H\{\operatorname{par}_{T}(v),v\}\in H. By construction, we have YL(T(v))𝒴Y\coloneqq L(T(v))\in\mathcal{Y}. We can assume w.l.o.g. that aYa\in Y and bXYb\in X\setminus Y. This together with YY\in\mathcal{H}^{*} implies that the path connecting aa and bb in TT^{*} contains the edge {parT(vY),vY}\{\operatorname{par}_{T^{*}}(v_{Y}),v_{Y}\}. By construction of HH^{*} and since Y𝒴Y\in\mathcal{Y}, we have {parT(vY),vY}H\{\operatorname{par}_{T^{*}}(v_{Y}),v_{Y}\}\in H^{*}.

It remains to show that the path in TT^{*} connecting any two a,aAa,a^{\prime}\in A for some A𝒫A\in\mathcal{P} never contains an edge that is in HH^{*}. Assume, for contradiction, that this is the case, i.e., there is an edge {parT(v),v}H\{\operatorname{par}_{T^{*}}(v),v\}\in H^{*} such that w.l.o.g. aL(T(v))Ya\in L(T(v))\eqqcolon Y^{\prime} and aXYa^{\prime}\in X\setminus Y^{\prime}. By construction of HH^{*}, Y𝒴Y^{\prime}\in\mathcal{Y} and Y=BY^{\prime}=B_{\mathcal{H}} for some B𝒫B\in\mathcal{P}. Since \mathcal{H} and 𝒫\mathcal{P} are compatible, Thm. 4.5(i) implies that Y=BY^{\prime}=B_{\mathcal{H}} is the union of sets of 𝒫\mathcal{P}; a contradiction to the fact that aYa\in Y^{\prime} and aYa^{\prime}\notin Y^{\prime}.

In summary, we conclude that 𝒫=(T,H)\mathcal{P}=\mathcal{F}(T^{*},H^{*}), and, therefore, 𝒫\mathcal{P} is compatible with \mathcal{H}^{*}. ∎

We next provide a necessary condition for r-compatibility. We will show later that this condition is also sufficient.

Lemma 5.2.

Let \mathcal{H} be a hierarchy on XX and 𝒫\mathcal{P} a partition of XX. If 𝒫\mathcal{P} is compatible with a refinement \mathcal{H}^{*} of \mathcal{H}, then there is no set YY\in\mathcal{H} that overlaps with two distinct sets A,B𝒫A,B\in\mathcal{P}.

Proof.

Let 𝒫\mathcal{P} and \mathcal{H}^{*} with \mathcal{H}\subseteq\mathcal{H}^{*} be compatible. Assume, for contradiction, that some YY\in\mathcal{H} overlaps with two distinct A,B𝒫A,B\in\mathcal{P}. First observe that YY\in\mathcal{H}^{*}. Now consider AA_{\mathcal{H}^{*}} and BB_{\mathcal{H}^{*}}. Since ABA\neq B and 𝒫\mathcal{P} is compatible with \mathcal{H}^{*}, we have ABA_{\mathcal{H}^{*}}\neq B_{\mathcal{H}^{*}} by contraposition of Thm. 4.5(ii). Since YY overlaps with AA and BB, both AA and BB, and thus AA_{\mathcal{H}^{*}} and BB_{\mathcal{H}^{*}}, contain elements that are in YY and as well as elements that are not in YY. This together with the fact that YY, AA_{\mathcal{H}^{*}}, and BB_{\mathcal{H}^{*}} are all sets in \mathcal{H}^{*} implies that YAY\subsetneq A_{\mathcal{H}^{*}} and YBY\subsetneq B_{\mathcal{H}^{*}}. Therefore we have YABY\subseteq A_{\mathcal{H}^{*}}\cap B_{\mathcal{H}^{*}}. Since YY\neq\emptyset, \mathcal{H}^{*} is a hierarchy, and ABA_{\mathcal{H}^{*}}\neq B_{\mathcal{H}^{*}}, this implies that either ABA_{\mathcal{H}^{*}}\subsetneq B_{\mathcal{H}^{*}} or BAB_{\mathcal{H}^{*}}\subsetneq A_{\mathcal{H}^{*}}. Assume w.l.o.g. that ABA_{\mathcal{H}^{*}}\subsetneq B_{\mathcal{H}^{*}}. Since YY overlaps with BB and YAY\subsetneq A_{\mathcal{H}^{*}}, we have ABA_{\mathcal{H}^{*}}\cap B\neq\emptyset. However, since ABA_{\mathcal{H}^{*}}\subsetneq B_{\mathcal{H}^{*}} and BB_{\mathcal{H}^{*}} is inclusion-minimal for BB in the hierarchy \mathcal{H}^{*}, we conclude that BB and AA_{\mathcal{H}^{*}} overlap. Hence, AA_{\mathcal{H}^{*}} is not the union of sets of 𝒫\mathcal{P}. This violates Condition (i) for compatible partitions in Thm. 4.5; a contradiction. ∎

To show that the converse of Lemma 5.2 is satisfied as well, we will explicitly construct a compatible refinement of \mathcal{H}. To this end, we introduce the following subset of a partition 𝒫\mathcal{P}:

𝔜(,𝒫){A𝒫B𝒫{A} with BA and AB}\mathfrak{Y}(\mathcal{H},\mathcal{P})\coloneqq\{A\in\mathcal{P}\mid\exists B\in\mathcal{P}\setminus\{A\}\text{ with }B\cap A_{\mathcal{H}}\neq\emptyset\text{ and }A_{\mathcal{H}}\subseteq B_{\mathcal{H}}\} (5)

The set 𝔜(,𝒫)\mathfrak{Y}(\mathcal{H},\mathcal{P}) contains the sets A𝒫A\in\mathcal{P} for which AA_{\mathcal{H}}\in\mathcal{H} or, equivalently, the vertex u=lcaT(A)u=\operatorname{lca}_{T}(A) is “not resolved enough”:

Proposition 5.3.

A hierarchy \mathcal{H} on XX and a partition 𝒫\mathcal{P} of XX are compatible if and only if 𝔜(,𝒫)\mathfrak{Y}(\mathcal{H},\mathcal{P}) is empty.

Proof.

By Thm. 4.5, \mathcal{H} and 𝒫\mathcal{P} are compatible if and only if, for all A,B𝒫A,B\in\mathcal{P}, the following two conditions are satisfied: (i) AA_{\mathcal{H}} is a union of sets of 𝒫\mathcal{P}, and (ii) A=BA_{\mathcal{H}}=B_{\mathcal{H}} implies A=BA=B. Hence, it suffices to show that 𝔜(,𝒫)=\mathfrak{Y}(\mathcal{H},\mathcal{P})=\emptyset is equivalent to these two conditions.

First assume, for contraposition, that 𝔜(,𝒫)\mathfrak{Y}(\mathcal{H},\mathcal{P}) is not empty. Hence, there are distinct A,B𝒫A,B\in\mathcal{P} such that BAB\cap A_{\mathcal{H}}\neq\emptyset and ABA_{\mathcal{H}}\subseteq B_{\mathcal{H}}. If A=BA_{\mathcal{H}}=B_{\mathcal{H}}, then Condition (ii) is violated. If on the other hand ABA_{\mathcal{H}}\subsetneq B_{\mathcal{H}}, then the fact that BB_{\mathcal{H}} is inclusion-minimal for BB implies that BAB\setminus A_{\mathcal{H}}\neq\emptyset. This together with BAB\cap A_{\mathcal{H}}\neq\emptyset in turn implies that AA_{\mathcal{H}} is not a union of sets of 𝒫\mathcal{P}; a violation of Condition (i).

The converse can be shown by very similar arguments starting with the assumption that Condition (i) or (ii) is not satisfied. If Condition (i) does not hold, then AXA_{\mathcal{H}}\neq X and, in particular, there is a B𝒫B\in\mathcal{P} such that AA_{\mathcal{H}} and BB overlap. Since HH is a hierarchy, it holds that ABXA_{\mathcal{H}}\subsetneq B_{\mathcal{H}}\subseteq X. The latter two arguments imply 𝔜(,𝒫)\mathfrak{Y}(\mathcal{H},\mathcal{P})\neq\emptyset since it contains AA. If Condition (ii) does not hold, then there are two distinct elements A,B𝒫A,B\in\mathcal{P} with A=BA_{\mathcal{H}}=B_{\mathcal{H}}. Hence, BBB\subseteq B_{\mathcal{H}} implies BAB\cap A_{\mathcal{H}}\neq\emptyset. Thus, A𝔜(,𝒫)A\in\mathfrak{Y}(\mathcal{H},\mathcal{P}). ∎

In particular, the set 𝔜(,𝒫)\mathfrak{Y}(\mathcal{H},\mathcal{P}) can be used to characterize the cases in which a compatible refinement of \mathcal{H} exists and, if this is the case, to construct such a refinement.

Definition 5.4.

Let \mathcal{H} be a hierarchy on XX and 𝒫\mathcal{P} a partition of XX such that no set YY\in\mathcal{H} overlaps with two distinct sets A,B𝒫A,B\in\mathcal{P}. We define, for A𝔜(,𝒫)A\in\mathfrak{Y}(\mathcal{H},\mathcal{P}), the subset YAY_{A} of AA_{\mathcal{H}} as

YAW1W2WkY_{A}\coloneqq W_{1}\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}W_{2}\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}\dots\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}W_{k}

where W1,W2,WkW_{1},W_{2},\dots W_{k}\in\mathcal{H} are the child clusters of AA_{\mathcal{H}} for which WiAW_{i}\cap A\neq\emptyset. Moreover, the subset 𝒫\mathcal{H}^{*}_{\mathcal{P}} of 2X2^{X} is given by

𝒫{YAA𝔜(,𝒫)}.\mathcal{H}^{*}_{\mathcal{P}}\coloneqq\mathcal{H}\cup\{Y_{A}\mid A\in\mathfrak{Y}(\mathcal{H},\mathcal{P})\}.

Note that, by Eq. (5), for every A𝔜(,𝒫)A\in\mathfrak{Y}(\mathcal{H},\mathcal{P}), the set AA_{\mathcal{H}} cannot be a singleton. Since AA_{\mathcal{H}} is inclusion-minimal w.r.t. AA by definition, we immediately conclude that k2k\geq 2 and thus the WiW_{i} are proper subsets of YAY_{A}. In terms of the tree TT corresponding to \mathcal{H}, the vertices wilcaT(Wi)w_{i}\coloneqq\operatorname{lca}_{T}(W_{i}) are the children of ylcaT(A)y\coloneqq\operatorname{lca}_{T}(A_{\mathcal{H}}) with AL(T(wi))A\cap L(T(w_{i}))\neq\emptyset.

Lemma 5.5.

Let \mathcal{H} be a hierarchy on XX and 𝒫\mathcal{P} a partition of XX such that no set YY\in\mathcal{H} overlaps with two distinct sets A,B𝒫A,B\in\mathcal{P}. Then the following two statements are satisfied:

  1. 1.

    For each A𝔜(,𝒫)A\in\mathfrak{Y}(\mathcal{H},\mathcal{P}), it holds YAY_{A}\notin\mathcal{H} and, in particular, YAAY_{A}\subsetneq A_{\mathcal{H}}.

  2. 2.

    The set 𝒫\mathcal{H}^{*}_{\mathcal{P}} is a hierarchy.

Proof.

To show (1), set A𝔜𝔜(,𝒫)A\in\mathfrak{Y}\coloneqq\mathfrak{Y}(\mathcal{H},\mathcal{P}). By construction, we have YAAY_{A}\subseteq A_{\mathcal{H}}. Assume for contradiction that YA=AY_{A}=A_{\mathcal{H}}, i.e., all child clusters of AA_{\mathcal{H}} in \mathcal{H} have a non-empty intersection with AA. By definition of 𝔜\mathfrak{Y}, we must have |A|>1|A_{\mathcal{H}}|>1 and, in particular, AA_{\mathcal{H}} has a child cluster YAY^{\prime}\subsetneq A_{\mathcal{H}} satisfying BYB\cap Y^{\prime}\neq\emptyset for some B𝒫{A}B\in\mathcal{P}\setminus\{A\} such that ABA_{\mathcal{H}}\subseteq B_{\mathcal{H}}. Thus YAY^{\prime}\setminus A\neq\emptyset and AYA\setminus Y^{\prime}\neq\emptyset, i.e., AA and YY^{\prime} overlap. Since YABY^{\prime}\subsetneq A_{\mathcal{H}}\subseteq B_{\mathcal{H}} and BB_{\mathcal{H}} is inclusion-minimal w.r.t. BB, we conclude that BYB\setminus Y^{\prime}\neq\emptyset. On the other hand, since YAY^{\prime}\cap A\neq\emptyset and A,B𝒫A,B\in\mathcal{P} are disjoint, we have YBY^{\prime}\setminus B\neq\emptyset. Thus BB and YY^{\prime} overlap. Thus there are two distinct sets A,B𝒫A,B\in\mathcal{P} that overlap with YY^{\prime}\in\mathcal{H}. This contradicts the assumption that no such pair of sets exists, hence YAAY_{A}\neq A_{\mathcal{H}}. Since YAYY_{A}\subseteq Y by construction, YAY_{A} is a proper subset of AA_{\mathcal{H}}. This together with the fact that \mathcal{H} is a hierarchy and YAY_{A} is the union of at least two child clusters of AA_{\mathcal{H}} in \mathcal{H} implies that YAY_{A}\notin\mathcal{H}.

We proceed by showing that the set system 𝒫={YAA𝔜}\mathcal{H}^{*}_{\mathcal{P}}=\mathcal{H}\cup\{Y_{A}\mid A\in\mathfrak{Y}\} is again a hierarchy and thus, that (2) is satisfied. To this end, consider first one of the newly-created sets YA=W1W2WkY_{A}=W_{1}\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}W_{2}\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}\dots\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}W_{k} and an arbitrary set YY^{\prime}\in\mathcal{H}. If WiY=W_{i}\cap Y^{\prime}=\emptyset for all 1ik1\leq i\leq k, then YAY=Y_{A}\cap Y^{\prime}=\emptyset. Otherwise, there is some WiW_{i} such that WiYW_{i}\cap Y^{\prime}\neq\emptyset. Since both WiW_{i} and YY^{\prime} are sets of the hierarchy \mathcal{H}, this implies that YWiYAY^{\prime}\subseteq W_{i}\subsetneq Y_{A}, or WiYW_{i}\subsetneq Y^{\prime}. In the latter case, we have AYA_{\mathcal{H}}\subseteq Y^{\prime} by the hierarchy property of \mathcal{H} and the fact that WiW_{i} is a child cluster of A()A_{\mathcal{H}}(\in\mathcal{H}). Hence, we have YAY{,YA,Y}Y_{A}\cap Y^{\prime}\in\{\emptyset,Y_{A},Y^{\prime}\} for all YY^{\prime}\in\mathcal{H}. Now consider two newly-created sets YAY_{A} and YBY_{B} for distinct A,B𝔜A,B\in\mathfrak{Y}, and assume first that ABA_{\mathcal{H}}\neq B_{\mathcal{H}}. If AB=A_{\mathcal{H}}\cap B_{\mathcal{H}}=\emptyset, then YAAY_{A}\subsetneq A_{\mathcal{H}} and YBBY_{B}\subsetneq B_{\mathcal{H}} immediately imply that YAYB=Y_{A}\cap Y_{B}=\emptyset. Otherwise, we can assume w.l.o.g. that BAB_{\mathcal{H}}\subsetneq A_{\mathcal{H}} since AA_{\mathcal{H}} and BB_{\mathcal{H}} are both sets in the hierarchy \mathcal{H}. This together with the fact that YAY_{A} is the union of child clusters W1,W2,,WkW_{1},W_{2},\dots,W_{k}\in\mathcal{H} of AA_{\mathcal{H}} implies that either BWiB_{\mathcal{H}}\subseteq W_{i} for some 1ik1\leq i\leq k, and thus YBBWiYAY_{B}\subsetneq B_{\mathcal{H}}\subseteq W_{i}\subsetneq Y_{A}, or, if no such WiW_{i} exists, YBYA=Y_{B}\cap Y_{A}=\emptyset. It remains to consider two distinct sets A,B𝔜A,B\in\mathfrak{Y} with A=BA_{\mathcal{H}}=B_{\mathcal{H}}. If YAY_{A} and YBY_{B} overlap, then there is by construction a child cluster WW^{\prime} of A=BA_{\mathcal{H}}=B_{\mathcal{H}} in \mathcal{H} such that WAW^{\prime}\cap A\neq\emptyset and WBW^{\prime}\cap B\neq\emptyset. Since, moreover, A=BA_{\mathcal{H}}=B_{\mathcal{H}} is inclusion-minimal w.r.t. AA and BB, this implies that WW^{\prime}\in\mathcal{H} overlaps with both AA and BB; a contradiction to the assumption. Thus YAY_{A} and YBY_{B} are disjoint. In summary, no two sets in 𝒫\mathcal{H}^{*}_{\mathcal{P}} overlap. Since 𝒫\mathcal{H}\subseteq\mathcal{H}^{*}_{\mathcal{P}}, XX\in\mathcal{H} and {x}\{x\}\in\mathcal{H} for all xXx\in X, we conclude that 𝒫\mathcal{H}^{*}_{\mathcal{P}} is a hierarchy that refines \mathcal{H}. ∎

The final step towards characterizing r-compatibility is a sufficient condition for 𝒫\mathcal{H}^{*}_{\mathcal{P}} to be compatible with 𝒫\mathcal{P}.

Lemma 5.6.

Let \mathcal{H} be a hierarchy on XX and 𝒫\mathcal{P} a partition of XX. If there is no set YY\in\mathcal{H} that overlaps with two distinct sets A,B𝒫A,B\in\mathcal{P}, then the hierarchy 𝒫\mathcal{H}^{*}_{\mathcal{P}} is compatible with 𝒫\mathcal{P}.

Proof.

Recall that 𝒫\mathcal{H}^{*}_{\mathcal{P}} is a hierarchy by Lemma 5.5(2). To prove that 𝒫\mathcal{H}^{*}_{\mathcal{P}} is compatible with 𝒫\mathcal{P}, we show that the Condition (i) and (ii) in Thm. 4.5 are satisfied for 𝒫\mathcal{H}^{*}_{\mathcal{P}} and 𝒫\mathcal{P}.

To show Condition (i) in Thm. 4.5, we assume, for contradiction, that there is a set A𝒫A\in\mathcal{P} for which A𝒫A_{\mathcal{H}^{*}_{\mathcal{P}}} is not the union of sets of 𝒫\mathcal{P}. Thus there is a set B𝒫{A}B\in\mathcal{P}\setminus\{A\} such that A𝒫BA_{\mathcal{H}^{*}_{\mathcal{P}}}\cap B\neq\emptyset and BA𝒫B\setminus A_{\mathcal{H}^{*}_{\mathcal{P}}}\neq\emptyset. We distinguish the two cases (1) A𝒫A_{\mathcal{H}^{*}_{\mathcal{P}}}\notin\mathcal{H} and (2) A𝒫A_{\mathcal{H}^{*}_{\mathcal{P}}}\in\mathcal{H}.
In case (1), we have A𝒫=YAA_{\mathcal{H}^{*}_{\mathcal{P}}}=Y_{A^{\prime}} for some A𝔜A^{\prime}\in\mathfrak{Y}. Thus A𝒫A_{\mathcal{H}^{*}_{\mathcal{P}}} is the union W1W2WkW_{1}\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}W_{2}\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}\dots\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}W_{k} of k2k\geq 2 sets W1,W2,,WkW_{1},W_{2},\dots,W_{k}\in\mathcal{H} all satisfying WiAW_{i}\cap A^{\prime}\neq\emptyset. Since BA𝒫B\setminus A_{\mathcal{H}^{*}_{\mathcal{P}}}\neq\emptyset and by construction AYA=A𝒫A^{\prime}\subseteq Y_{A^{\prime}}=A_{\mathcal{H}^{*}_{\mathcal{P}}}, we have ABA^{\prime}\neq B. Since A𝒫BA_{\mathcal{H}^{*}_{\mathcal{P}}}\cap B\neq\emptyset, there must be some W{W1,W2,,Wk}W^{\prime}\in\{W_{1},W_{2},\dots,W_{k}\}\subset\mathcal{H} such that BWB\cap W^{\prime}\neq\emptyset. Since WA𝒫W^{\prime}\subsetneq A_{\mathcal{H}^{*}_{\mathcal{P}}} and BA𝒫B\setminus A_{\mathcal{H}^{*}_{\mathcal{P}}}\neq\emptyset, we have BWB\setminus W^{\prime}\neq\emptyset. On the other hand, we also have AWA^{\prime}\setminus W^{\prime}\neq\emptyset because k2k\geq 2 and WiAW_{i}\cap A^{\prime}\neq\emptyset for all 1ik1\leq i\leq k. Since AA^{\prime} and BB are disjoint and both have a non-empty intersection with WW^{\prime}, we also obtain WAW^{\prime}\setminus A^{\prime}\neq\emptyset and WBW^{\prime}\setminus B\neq\emptyset. In summary, the set WW^{\prime}\in\mathcal{H} overlaps with the two distinct sets A,B𝒫A^{\prime},B\in\mathcal{P}; a contradiction to the assumption.
In case (2), we have A𝒫A_{\mathcal{H}^{*}_{\mathcal{P}}}\in\mathcal{H}. Since A𝒫A_{\mathcal{H}^{*}_{\mathcal{P}}} is inclusion-minimal for AA in 𝒫\mathcal{H}^{*}_{\mathcal{P}} and 𝒫\mathcal{H}\subseteq\mathcal{H}^{*}_{\mathcal{P}} we conclude that A𝒫A_{\mathcal{H}^{*}_{\mathcal{P}}} is also inclusion-minimal for AA in \mathcal{H}, and thus A𝒫=AA_{\mathcal{H}^{*}_{\mathcal{P}}}=A_{\mathcal{H}}. Since A𝒫BA_{\mathcal{H}^{*}_{\mathcal{P}}}\cap B\neq\emptyset, BA𝒫B\setminus A_{\mathcal{H}^{*}_{\mathcal{P}}}\neq\emptyset, and \mathcal{H} is a hierarchy, we conclude that ABA_{\mathcal{H}}\subsetneq B_{\mathcal{H}}. In summary, we obtain A𝔜A\in\mathfrak{Y}. Hence, we have added a set YAY_{A} that satisfies AYAA\subseteq Y_{A} and, by the arguments above, YAA𝒫Y_{A}\subsetneq A_{\mathcal{H}^{*}_{\mathcal{P}}}. Therefore, A𝒫A_{\mathcal{H}^{*}_{\mathcal{P}}} is not inclusion-minimal for AA in 𝒫\mathcal{H}^{*}_{\mathcal{P}}; a contradiction.

To show Condition (ii) in Thm. 4.5, we assume, for contradiction, that YA𝒫=B𝒫Y^{*}\coloneqq A_{\mathcal{H}^{*}_{\mathcal{P}}}=B_{\mathcal{H}^{*}_{\mathcal{P}}} for two distinct A,B𝒫A,B\in\mathcal{P}. As above, we distinguish the two cases (1’) YY^{*}\notin\mathcal{H} and (2’) YY^{*}\in\mathcal{H}.
In case (1’), we have Y=YAY^{*}=Y_{A^{\prime}} for some A𝔜A^{\prime}\in\mathfrak{Y}. Thus YY^{*} is the union W1W2WkW_{1}\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}W_{2}\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}\dots\mathbin{\mathchoice{\leavevmode\vtop{\halign{\hfil$\m@th\displaystyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\textstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}{\leavevmode\vtop{\halign{\hfil$\m@th\scriptscriptstyle#$\hfil\cr\cup\cr\cdot\crcr}}}}W_{k} of k2k\geq 2 sets W1,W2,,WkW_{1},W_{2},\dots,W_{k}\in\mathcal{H} all satisfying WiAW_{i}\cap A^{\prime}\neq\emptyset. Since AA and BB are distinct, we can assume w.l.o.g. that ABA^{\prime}\neq B. Since YBY^{*}\cap B\neq\emptyset, there must be some W{W1,W2,,Wk}W^{\prime}\in\{W_{1},W_{2},\dots,W_{k}\}\subset\mathcal{H} such that BWB\cap W^{\prime}\neq\emptyset. Since WYW^{\prime}\subsetneq Y^{*} and YY^{*} is inclusion-minimal for BB in 𝒫\mathcal{H}^{*}_{\mathcal{P}}, we have BWB\setminus W^{\prime}\neq\emptyset. On the other hand, we also have AWA^{\prime}\setminus W^{\prime}\neq\emptyset since k2k\geq 2 and WiAW_{i}\cap A^{\prime}\neq\emptyset for all 1ik1\leq i\leq k. Since AA^{\prime} and BB are disjoint and both have a non-empty intersection with WW^{\prime}, we also obtain WAW^{\prime}\setminus A^{\prime}\neq\emptyset and WBW^{\prime}\setminus B\neq\emptyset. In summary, the set WW^{\prime}\in\mathcal{H} overlaps with the two distinct sets A,B𝒫A^{\prime},B\in\mathcal{P}; a contradiction to the assumption.
In case (2’), we have YY^{*}\in\mathcal{H}. Together with the facts that YY^{*} is inclusion-minimal for AA in 𝒫\mathcal{H}^{*}_{\mathcal{P}} and that 𝒫\mathcal{H}\subseteq\mathcal{H}^{*}_{\mathcal{P}}, YY^{*}\in\mathcal{H} implies that YY^{*} is also inclusion-minimal for AA in \mathcal{H}, i.e. Y=AY^{*}=A_{\mathcal{H}}. Analogous arguments imply Y=BY^{*}=B_{\mathcal{H}}. Since YBY^{*}\cap B\neq\emptyset we conclude that A𝔜A\in\mathfrak{Y}. Therefore, we have added a set YAY_{A} that satisfies both AYAA\subseteq Y_{A} and, by the arguments above, YAYY_{A}\subsetneq Y^{*}. Therefore, YY^{*} is not inclusion-minimal for AA in 𝒫\mathcal{H}^{*}_{\mathcal{P}}; a contradiction.

In summary, 𝒫\mathcal{H}^{*}_{\mathcal{P}} is a hierarchy on XX such that Conditions (i) and (ii) in Thm. 4.5 are satisfied for all A,B𝒫A,B\in\mathcal{P}. Thus 𝒫\mathcal{P} is compatible with the refinement 𝒫\mathcal{H}^{*}_{\mathcal{P}} of \mathcal{H}. ∎

Theorem 5.7.

A hierarchy \mathcal{H} and a partition 𝒫\mathcal{P} on XX are r-compatible if and only if no set YY\in\mathcal{H} overlaps with two distinct sets A,B𝒫A,B\in\mathcal{P}.

Proof.

The only-if-direction follows from Lemma 5.2. Conversely, if no set YY\in\mathcal{H} overlaps with two distinct sets A,B𝒫A,B\in\mathcal{P}, then Lemma 5.5(2) and 5.6 imply that the hierarchy 𝒫\mathcal{H}^{*}_{\mathcal{P}} is compatible with 𝒫\mathcal{P}. By construction, 𝒫\mathcal{H}\subseteq\mathcal{H}^{*}_{\mathcal{P}} and thus, 𝒫\mathcal{H}^{*}_{\mathcal{P}} is a refinement of \mathcal{H}, which completes the proof. ∎

It is worth noting that the characterization of r-compatibility in Thm. 5.7 implies neither Property (i) nor Property (ii) in Thm 4.5. As a counterexample to (i), consider the hierarchy \mathcal{H} on X={a,a,b,b}X=\{a,a^{\prime},b,b^{\prime}\} that comprises in addition to XX and the singletons only the set Y={a,a,b}Y=\{a,a^{\prime},b\}, and the partition 𝒫={A,B}\mathcal{P}=\{A,B\} with A={a,a}A=\{a,a^{\prime}\} and B={b,b}B=\{b,b^{\prime}\}. We have A=YA_{\mathcal{H}}=Y and B=XB_{\mathcal{H}}=X. Clearly, Y=AY=A_{\mathcal{H}} overlaps BB and thus violates (i). On the other hand, it admits the refinement ={A}\mathcal{H}^{*}=\mathcal{H}\cup\{A\}, which is compatible with 𝒫\mathcal{P}. As a counterexample to (ii), consider \mathcal{H}^{\prime} comprising only XX and the singletons. Here, we have A=B=XA_{\mathcal{H}^{\prime}}=B_{\mathcal{H}^{\prime}}=X, while both refinements {A}\mathcal{H}^{\prime}\cup\{A\} and {B}\mathcal{H}^{\prime}\cup\{B\} are compatible with 𝒫\mathcal{P}.

We continue with considering systems 𝔓={𝒫1,𝒫2,,𝒫k}\mathfrak{P}=\{\mathcal{P}_{1},\mathcal{P}_{2},\dots,\mathcal{P}_{k}\} of partitions of XX rather then single partitions. By definition, TT (or equivalently the hierarchy (T)\mathcal{H}(T)) and each 𝒫i\mathcal{P}_{i} are compatible if and only if 𝒫i=(T,Hi)\mathcal{P}_{i}=\mathcal{F}(T,H_{i}) for some subset HiE(T)H_{i}\subseteq E(T), 1ik1\leq i\leq k. In this case, we say that TT (equiv. (T)\mathcal{H}(T)) and 𝔓\mathfrak{P} are compatible. It is natural, then, to ask whether for a given system of partitions 𝔓\mathfrak{P}, there exists a tree TT such that 𝔓\mathfrak{P} and TT are compatible.

Proposition 5.8.

Let \mathcal{H} be a hierarchy on XX and 𝔓={𝒫1,𝒫2,,𝒫k}\mathfrak{P}=\{\mathcal{P}_{1},\mathcal{P}_{2},\dots,\mathcal{P}_{k}\} be a collection of partitions of XX. The following two statements are equivalent

  1. (1)

    There is a refinement \mathcal{H}^{*} of \mathcal{H} that is compatible with 𝔓\mathfrak{P}.

  2. (2)

    Each 𝒫i𝔓\mathcal{P}_{i}\in\mathfrak{P} admits a compatible refinement i\mathcal{H}_{i}^{*} of \mathcal{H} such that i=1ki\bigcup_{i=1}^{k}\mathcal{H}_{i}^{*} is a hierarchy.

Proof.

Let \mathcal{H}^{*} be a refinement of \mathcal{H} that is compatible with 𝔓\mathfrak{P} and thus, with every 𝒫i𝔓\mathcal{P}_{i}\in\mathfrak{P}. Now, put i=\mathcal{H}_{i}^{*}=\mathcal{H}^{*}, 1ik1\leq i\leq k. Hence, =i=1ki\mathcal{H}^{*}=\bigcup_{i=1}^{k}\mathcal{H}_{i}^{*} is a hierarchy that is compatible with 𝔓\mathfrak{P}. Conversely, if i=1ki\bigcup_{i=1}^{k}\mathcal{H}_{i}^{*} is a hierarchy, then it is, in particular, a refinement of every i\mathcal{H}_{i}^{*}, 1ik1\leq i\leq k. Now set =i=1ki\mathcal{H}^{*}=\bigcup_{i=1}^{k}\mathcal{H}_{i}^{*}. By Prop. 5.1, \mathcal{H}^{*} is compatible with with every 𝒫i𝔓\mathcal{P}_{i}\in\mathfrak{P}. ∎

Prop. 5.8 immediately implies

Corollary 5.9.

There is a tree TT that is compatible with a collection 𝔓={𝒫1,𝒫2,,𝒫k}\mathfrak{P}=\{\mathcal{P}_{1},\mathcal{P}_{2},\dots,\mathcal{P}_{k}\} of partitions of XX if and only if, for every i{1,,k}i\in\{1,\dots,k\}, there is a tree TiT_{i} that is compatible with 𝒫i\mathcal{P}_{i} such that i=1k(Ti)\bigcup_{i=1}^{k}\mathcal{H}(T_{i}) is a hierarchy.

Proof.

Since every tree on XX is a refinement of the star tree TT^{\prime} on XX, every hierarchy \mathcal{H}^{*} on XX is a refinement of (T)\mathcal{H}(T^{\prime}). Hence, the existence of some hierarchy or, equivalently, some tree that is compatible with 𝔓\mathfrak{P} is equivalent to the existence of a refinement of \mathcal{H}^{*} of (T)\mathcal{H}(T^{\prime}) that is compatible with 𝔓\mathfrak{P}. ∎

As illustrated in Fig. 4, \mathcal{H}^{*} and 𝔓={𝒫1,𝒫2,,𝒫k}\mathfrak{P}=\{\mathcal{P}_{1},\mathcal{P}_{2},\dots,\mathcal{P}_{k}\} might be compatible, although there are refinements i\mathcal{H}_{i}^{*} of \mathcal{H} compatible with Pi𝔓P_{i}\in\mathfrak{P}, whose union i=1ki\bigcup_{i=1}^{k}\mathcal{H}_{i}^{*} does not form a hierarchy. Fig. 7, furthermore, shows an example of a partition system 𝔓\mathfrak{P} that is not compatible with any refinement of \mathcal{H}. We will show in Section 7 that deciding whether or not such a common refinement exists is an NP-complete problem.

Refer to caption
Figure 7: The tree TT with leaf set X={a,b,c,d}X=\{a,b,c,d\} is compatible with neither 𝒫1={{a,b},{c,d}}\mathcal{P}_{1}=\{\{a,b\},\{c,d\}\} nor 𝒫2={{a,c},{b,d}}\mathcal{P}_{2}=\{\{a,c\},\{b,d\}\}. The refinements T1T^{*}_{1} and T2T^{*}_{2} are compatible with P1P_{1} and P2P_{2}, respectively. However, there is no refinement (T)\mathcal{H}^{*}(T) of (T)\mathcal{H}(T) (and thus of (T1)\mathcal{H}(T^{*}_{1}) and (T2)\mathcal{H}(T^{*}_{2})) that is compatible with both 𝒫1\mathcal{P}_{1} and 𝒫2\mathcal{P}_{2} as a consequence of Prop. 6.2 and the fact that the unrooted versions T¯1\overline{T}_{1}^{*} and T¯2\overline{T}_{2}^{*} of T1T^{*}_{1} and T2T^{*}_{2}, resp., are determined by the splits {a,b}|{c,d}\{a,b\}|\{c,d\} and {a,d}|{b,c}\{a,d\}|\{b,c\}, resp., which cannot be represented in a common unrooted tree (cf. “Splits-Equivalence Theorem” [18, Thm. 3.1.4]).

The partitions of a set XX form a complete lattice [2, Sect. 4.9]. The common refinement 𝒫1𝒫2\mathcal{P}_{1}\wedge\mathcal{P}_{2} of two partitions 𝒫1\mathcal{P}_{1} and 𝒫2\mathcal{P}_{2} of XX is

𝒫1𝒫2{A1A2A1𝒫1,A2𝒫2,A1A2}.\mathcal{P}_{1}\wedge\mathcal{P}_{2}\coloneqq\{A_{1}\cap A_{2}\mid A_{1}\in\mathcal{P}_{1},\,A_{2}\in\mathcal{P}_{2},\,A_{1}\cap A_{2}\neq\emptyset\}. (6)

The common refinement operation is associative and commutative. The common refinement of an arbitrary system 𝔓\mathfrak{P} of partitions, therefore, consists of all distinct sets Px=A𝒫𝔓,xAAP_{x}=\bigcap_{A\in\mathcal{P}\in\mathfrak{P},\,x\in A}A. The following results shows that the common refinement of partitions that are compatible with TT is compatible with TT as well.

Proposition 5.10.

Let TT be a tree with leaf set XX and 𝔓={𝒫1,𝒫2,,𝒫k}\mathfrak{P}=\{\mathcal{P}_{1},\mathcal{P}_{2},\dots,\mathcal{P}_{k}\} a collection of partitions on XX that are all compatible with TT. Then, i=1k𝒫i\bigwedge_{i=1}^{k}\mathcal{P}_{i} is compatible with TT.

Proof.

Let TT be a tree with leaf set XX. We show first that for all subsets H1,H2E(T)H_{1},H_{2}\in E(T) it holds that

(T,H1H2)=(T,H1)(T,H2).\mathcal{F}(T,H_{1}\cup H_{2})=\mathcal{F}(T,H_{1})\wedge\mathcal{F}(T,H_{2}). (7)

To see this, let A(T,H1)(T,H2)A\in\mathcal{F}(T,H_{1})\wedge\mathcal{F}(T,H_{2}). For all a,aAa,a^{\prime}\in A the following statements are equivalent

  1. 1.

    a,aA=A1A2a,a^{\prime}\in A=A_{1}\cap A_{2} for some A1(T,H1),A2(T,H2)A_{1}\in\mathcal{F}(T,H_{1}),\,A_{2}\in\mathcal{F}(T,H_{2}),

  2. 2.

    there is no separating edge in H1H_{1} and in H2H_{2} that is on the path between aa and aa^{\prime} in TT,

  3. 3.

    there is no separating edge in H1H2H_{1}\cup H_{2} that is on the path between aa and aa^{\prime} in TT, and

  4. 4.

    a,aB(T,H1H2)a,a^{\prime}\in B\in\mathcal{F}(T,H_{1}\cup H_{2}).

Hence, AB(T,H1H2)A\subseteq B\in\mathcal{F}(T,H_{1}\cup H_{2}). Similarly, the latter equivalent statements hold for all a,aB(T,H1H2)a,a^{\prime}\in B\in\mathcal{F}(T,H_{1}\cup H_{2}) and thus, A=BA=B.

Now assume that 𝒫1,𝒫2,,𝒫k\mathcal{P}_{1},\mathcal{P}_{2},\dots,\mathcal{P}_{k} are all compatible with TT. Hence, for each 𝒫i\mathcal{P}_{i} there is a set HiE(T)H_{i}\subseteq E(T) such that 𝒫i=(T,Hi)\mathcal{P}_{i}=\mathcal{F}(T,H_{i}). By the latter arguments and since \wedge is commutative and associative, we can conclude that i=1k𝒫i=i=1k(T,Hi)=(T,i=1kHi)\bigwedge_{i=1}^{k}\mathcal{P}_{i}=\bigwedge_{i=1}^{k}\mathcal{F}(T,H_{i})=\mathcal{F}(T,\cup_{i=1}^{k}H_{i}) and thus, i=1k𝒫i\bigwedge_{i=1}^{k}\mathcal{P}_{i} is compatible with TT. ∎

The converse of Prop. 5.10 is not true in general. As an example, consider the tree TT as shown in Fig. 7 and the two partitions 𝒫1={{a,b},{c,d}}\mathcal{P}_{1}=\{\{a,b\},\{c,d\}\} and 𝒫2={{a,c},{b,d}}\mathcal{P}_{2}=\{\{a,c\},\{b,d\}\}. Both 𝒫1\mathcal{P}_{1} and 𝒫2\mathcal{P}_{2} are not compatible with TT, however, their common refinement 𝒫1𝒫2={{a},{b},{c},{d}}=(T,E(T))\mathcal{P}_{1}\wedge\mathcal{P}_{2}=\{\{a\},\{b\},\{c\},\{d\}\}=\mathcal{F}(T,E(T)) is.

The refinement supremum or join 𝒫1𝒫2\mathcal{P}_{1}\vee\mathcal{P}_{2} is obtained by recursively unifying any two sets A1,A1𝒫1A_{1},A^{\prime}_{1}\in\mathcal{P}_{1} whenever there is A2𝒫2A_{2}\in\mathcal{P}_{2} such that A1A2A_{1}\cap A_{2}\neq\emptyset and A1A2A_{1}^{\prime}\cap A_{2}\neq\emptyset. The analogue to Prop. 5.10 does not hold for the refinement supremum. To see this, consider the tree TT on X={a,b,c,d}X=\{a,b,c,d\} with hierarchy (T)=X{{x}xX}{{a,b},{a,b}}\mathcal{H}(T)=X\cup\{\{x\}\mid x\in X\}\cup\{\{a,b\},\{a^{\prime},b^{\prime}\}\} and the partitions 𝒫1={{a,a},{b},{b}}\mathcal{P}_{1}=\{\{a,a^{\prime}\},\{b\},\{b^{\prime}\}\} and 𝒫2={{a},{a},{b,b}}\mathcal{P}_{2}=\{\{a\},\{a^{\prime}\},\{b,b^{\prime}\}\}. Both 𝒫1\mathcal{P}_{1} and 𝒫2\mathcal{P}_{2} are compatible with TT (just define the edges incident to the xx for respective singletons {x}\{x\} as separating edges). However, 𝒫1𝒫2={A{a,a},B{b,b}}\mathcal{P}_{1}\vee\mathcal{P}_{2}=\{A\coloneqq\{a,a^{\prime}\},B\coloneqq\{b,b^{\prime}\}\} is not compatible since, for the hierarchy \mathcal{H} corresponding to TT, we have A=BA_{\mathcal{H}}=B_{\mathcal{H}}; a contradiction to Condition (ii) in Thm. 4.5. In [19], a notion of local comparability of partitions is considered: 𝒫1𝒫2\mathcal{P}_{1}\simeq\mathcal{P}_{2} iff A1A2{,A1,A2}A_{1}\cap A_{2}\in\{\emptyset,A_{1},A_{2}\} for all A1𝒫1A_{1}\in\mathcal{P}_{1} and A2𝒫2A_{2}\in\mathcal{P}_{2}. The example above satisfies 𝒫1𝒫2\mathcal{P}_{1}\simeq\mathcal{P}_{2}. Hence, local comparability of partitions 𝒫1\mathcal{P}_{1} and 𝒫2\mathcal{P}_{2} compatible with a given hierarchy \mathcal{H} is also not sufficient to imply compatibility of 𝒫1𝒫2\mathcal{P}_{1}\vee\mathcal{P}_{2} and \mathcal{H}.

6 Compatibility of Partitions with Split Systems and Unrooted Trees

Throughout this section, we will assume that all unrooted trees are phylogenetic as well and have at least three leaves. In particular, therefore, they have at least one inner vertex. Not surprisingly, there is a very close connection between the case of rooted and unrooted phylogenetic trees.

Proposition 6.1.

Let T¯\overline{T} be an unrooted tree with leaf set XX and let 𝒫\mathcal{P} be a partition of XX. Then 𝒫\mathcal{P} and T¯\overline{T} are compatible if and only if 𝒫\mathcal{P} and the rooted tree TT are compatible, where TT is obtained by rooting T¯\overline{T} at an arbitrary inner vertex.

Proof.

Note that rooting T¯\overline{T} at an arbitrary inner vertex results in a phylogenetic rooted tree TT, since T¯\overline{T} does not contain vertices of degree two. The equivalence now follows immediately from the definition and the fact that, viewed as pair of a set vertices and a set of edges, T=T¯T=\overline{T} and therefore, 𝒫=(T¯,H)=(T,H)\mathcal{P}=\mathcal{F}(\overline{T},H)=\mathcal{F}(T,H) for some subset HE(T¯)H\subseteq E(\overline{T}). ∎

In fact, it is not necessary to root T¯\overline{T} at an inner vertex, we may as well place the root as a subdivision of any edge. This allows us to connect unrooted trees directly to hierarchies:

Proposition 6.2.

Let \mathcal{H} be a hierarchy with corresponding rooted tree TT with leaf set XX and let 𝒫\mathcal{P} be a partition of XX. Then \mathcal{H} and 𝒫\mathcal{P} are compatible if and only if the unrooted tree T¯\overline{T} obtained from TT (by suppressing a possible degree-two root) is compatible with 𝒫\mathcal{P}.

Proof.

If the root ρ\rho of TT has degree greater than two, then the tree T¯\overline{T} is phylogenetic and thus, can we apply the same arguments as in the proof of Prop. 6.1 to establish the equivalence. If ρ\rho has degree 22, then the tree T¯\overline{T} is not phylogenetic and both edges e1={ρ,v1}e_{1}=\{\rho,v_{1}\} and e2={ρ,v2}e_{2}=\{\rho,v_{2}\} that are incident to ρ\rho define the same split Se1=Se2S_{e_{1}}=S_{e_{2}}. Since 𝔖(T¯)\mathfrak{S}(\overline{T}) contains a split only once, the unique phylogenetic tree T¯\overline{T} defined by the split system 𝔖(T¯)\mathfrak{S}(\overline{T}) is obtained from T¯\overline{T} by suppressing the root ρ\rho, i.e., e{v1,v2}E(T¯)e^{*}\coloneqq\{v_{1},v_{2}\}\in E(\overline{T}). Now let HE(T)H\subseteq E(T) be such that (T,H)=𝒫\mathcal{F}(T,H)=\mathcal{P}. If e1e_{1} or e2e_{2} is contained in HH, we add the edge ee^{*} to H{e1,e2}H\setminus\{e_{1},e_{2}\} to obtain the set HE(T¯)H^{*}\subseteq E(\overline{T}) and, otherwise, we put H=HH^{*}=H. It is now easy to see that (T,H)=(T¯,H)\mathcal{F}(T,H)=\mathcal{F}(\overline{T},H^{*}) and thus, T¯\overline{T} and 𝒫\mathcal{P} are compatible. ∎

As outlined in Section 2, every unrooted phylogenetic tree T¯\overline{T} is determined by its split system 𝔖(T¯)={𝒮e:eE(T¯)}\mathfrak{S}(\overline{T})=\{\mathcal{S}_{e}\colon e\in E(\overline{T})\} and for a tree-like split system 𝔖\mathfrak{S} there is a (unique) unrooted tree T¯\overline{T} with 𝔖(T¯)=𝔖\mathfrak{S}(\overline{T})=\mathfrak{S}. Since unrooted trees are so intimately related with split systems, it is interesting, therefore, to ask whether the compatibility of rooted trees or hierarchies and partitions can also be expressed in an interesting way in terms of split systems.

Corollary 6.3.

For every partition 𝒫\mathcal{P} of a non-empty set XX, the split system 𝔖𝒫𝔖𝒫{x|(X{x}):xX})\mathfrak{S}_{\mathcal{P}}^{*}\coloneqq\mathfrak{S}_{\mathcal{P}}\cup\{x|(X\setminus\{x\})\colon x\in X\}) with

𝔖𝒫{A|(XA):A𝒫}\mathfrak{S}_{\mathcal{P}}\coloneqq\left\{A|(X\setminus A)\colon A\in\mathcal{P}\right\} (8)

is always tree-like and compatible with 𝒫\mathcal{P}.

Proof.

It is easy to see that the hierarchy 𝒫\mathcal{H}_{\mathcal{P}} (as specified in Lemma 4.1) yields a rooted tree TT for which the unrooted version T¯\overline{T} (by suppressing a possible degree-two root of TT) satisfies 𝔖(T¯)=𝔖𝒫\mathfrak{S}(\overline{T})=\mathfrak{S}_{\mathcal{P}}^{*}. Now apply Prop. 6.2. ∎

Refer to caption
Figure 8: The unrooted tree TT and the partition 𝒫={{a,b,c},{d,e}}\mathcal{P}=\{\{a,b,c\},\{d,e\}\} are compatible. However, 𝔖(T¯)𝔖𝒫\mathfrak{S}(\overline{T})\neq\mathfrak{S}_{\mathcal{P}}^{*}, because for 𝒮={a,b}|{c,d,e}𝔖(T¯)\mathcal{S}=\{a,b\}|\{c,d,e\}\in\mathfrak{S}(\overline{T}) we have 𝒮𝔖𝒫={{a,b,c}|{d,e}}\mathcal{S}\notin\mathfrak{S}_{\mathcal{P}}=\{\{a,b,c\}|\{d,e\}\}.

Compatibility of a partition 𝒫\mathcal{P} with a split system 𝔖(T¯)\mathfrak{S}(\overline{T}) of some tree, however, does not imply that 𝔖(T¯)=𝔖𝒫\mathfrak{S}(\overline{T})=\mathfrak{S}_{\mathcal{P}}^{*}, see Fig. 8. Suppose that T¯\overline{T} and 𝒫\mathcal{P} are compatible, i.e., that 𝒫=(T¯,H)\mathcal{P}=\mathcal{F}(\overline{T},H). The set HH of separating edges then corresponds to the set

H,T¯{𝒮e:eH}𝔖(T¯).\mathfrak{H}_{H,\overline{T}}\coloneqq\{\mathcal{S}_{e}\colon e\in H\}\subseteq\mathfrak{S}(\overline{T})\,.

of splits. Then we have either

BA or BXAfor every A|(XA)H,T¯ and every B𝒫B\subseteq A\text{ or }B\subseteq X\setminus A\qquad\text{for every }A|(X\setminus A)\in\mathfrak{H}_{H,\overline{T}}\text{ and every }B\in\mathcal{P} (9)

since none of the edges eHe\in H separates two vertices of BB. Furthermore, for any two distinct sets A,A𝒫A,A^{\prime}\in\mathcal{P} there is a split B|BH,T¯B|B^{\prime}\in\mathfrak{H}_{H,\overline{T}} such that, w.l.o.g., ABA\subseteq B and ABA^{\prime}\subseteq B^{\prime} because there must be an edge in HH separating AA and AA^{\prime} in T¯\overline{T}. Taken together, therefore, we observe that every set B𝒫B\in\mathcal{P} satisfies

B={A2X:A|(XA)H,T¯,BA}.B=\bigcap\left\{A\in 2^{X}\,:\,A|(X\setminus A)\in\mathfrak{H}_{H,\overline{T}},\,B\subseteq A\right\}. (10)

Conversely, suppose that an arbitrary split system \mathfrak{H} satisfies Eqs. (9) and (10) (replace H,T¯\mathfrak{H}_{H,\overline{T}} by \mathfrak{H} in the equations). By Eq. (10), for every B𝒫B^{\prime}\in\mathcal{P} with BBB\neq B^{\prime} and thus BB=B\cap B^{\prime}=\emptyset, there is a split A|(XA)A|(X\setminus A) such that BAB\subseteq A and BXAB^{\prime}\subseteq X\setminus A and thus there is an edge eBBE(T¯))e_{BB^{\prime}}\in E(\overline{T})) that separates BB and BB^{\prime} in T¯\overline{T}. Add all these edges to the set HH. Eq. (9) ensures that no two elements in BB are separated by an edge in HH. Thus, B(T¯,H)B\in\mathcal{F}(\overline{T},H) for every B𝒫B\in\mathcal{P}. Hence, 𝒫\mathcal{P} and T¯\overline{T} are compatible if and only if 𝔖(T¯)\mathfrak{H}\subseteq\mathfrak{S}(\overline{T}) satisfies Eqs. (9) and (10). Recall that splits on XX are partitions of XX and that the common refinement of a set 𝔓\mathfrak{P} of partitions of XX is the partition 𝔓\bigwedge\mathfrak{P} whose sets are the intersections Bx{B𝒫𝔓 s.t. xB}B_{x}\coloneqq\bigcap\{B\in\mathcal{P}\in\mathfrak{P}\text{ s.t.\ }x\in B\} of sets appearing in any of the partitions in the system 𝔓\mathfrak{P} that have a least one point xXx\in X in common. Thus, 𝔖(T¯)\mathfrak{H}\subseteq\mathfrak{S}(\overline{T}) satisfies Eqs. (9) and (10) if and only if 𝒫=\mathcal{P}=\bigwedge\mathfrak{H}. We summarize this discussion as

Theorem 6.4.

Let 𝒫\mathcal{P} be a partition and 𝔖\mathfrak{S} be a tree-like split system. Then 𝒫\mathcal{P} and 𝔖\mathfrak{S} are compatible if and only if there is subset 𝔖\mathfrak{H}\subseteq\mathfrak{S} such that 𝒫\mathcal{P} is the common refinement of \mathfrak{H}. In this case, 𝒫=(T¯,H)\mathcal{P}=\mathcal{F}(\overline{T},H) for the tree T¯\overline{T} with 𝔖(T¯)=𝔖\mathfrak{S}(\overline{T})=\mathfrak{S} and H={eeE(T¯),Se}H=\{e\mid e\in E(\overline{T}),S_{e}\in\mathfrak{H}\}.

Since every refinement T¯\overline{T}^{*} of a tree T¯\overline{T} corresponds to a tree-like split system 𝔖(T¯)\mathfrak{S}(\overline{T}^{*}) that satisfies 𝔖(T¯)𝔖(T¯)\mathfrak{S}(\overline{T})\subseteq\mathfrak{S}(\overline{T}^{*}), we immediately obtain a characterization of tree-like split systems that admit a refinement that is compatible with a partition 𝒫\mathcal{P}.

Corollary 6.5.

Let 𝒫\mathcal{P} be a partition and 𝔖(T¯)\mathfrak{S}(\overline{T}) be the split system associated with a tree T¯\overline{T}. Then 𝒫\mathcal{P} is compatible with a refinement of T¯\overline{T} if and only if there is a set of splits \mathfrak{H} such that (i) 𝒫\mathcal{P} is the common refinement of \mathfrak{H} and (ii) 𝔖(T¯)\mathfrak{S}(\overline{T})\cup\mathfrak{H} is tree-like.

The characterizations in Thm. 6.4 and Cor. 6.5 are not constructive, i.e., they do not provide recipes to construct \mathfrak{H}. Clearly, we can directly employ Prop. 6.1 and the linear-time algorithm provided in Section 7 to check whether a split system and a partition are compatible or not. Nevertheless, we provide the following two lemmas to provide a further constructive characterization that makes use of a step-wise decomposition and might be of further theoretical interest.

Definition 6.6 ([18, Def. 6.1.1]).

Let T¯\overline{T} be an unrooted phylogenetic tree with leaf set XX and corresponding split system 𝔖(T¯)\mathfrak{S}(\overline{T}). Then the restriction T¯|Y\overline{T}_{|Y} of T¯\overline{T} to a non-empty subset YXY\subseteq X of leaves is the tree for which

𝔖(T¯|Y)={(AY)|(YA):A|(XA)𝔖(T¯),AY,YA}.\mathfrak{S}(\overline{T}_{|Y})=\{(A\cap Y)|(Y\setminus A)\,:\,A|(X\setminus A)\in\mathfrak{S}(\overline{T}),\,A\cap Y\neq\emptyset,\,Y\setminus A\neq\emptyset\,\}.

Somewhat surprisingly, it suffices to check whether there is a single element A𝒫A\in\mathcal{P} such that 𝒫{A}\mathcal{P}\setminus\{A\} and T¯|XA\overline{T}_{|X\setminus A} are compatible, to determine whether 𝒫\mathcal{P} and T¯\overline{T} are compatible as shown in the next

Lemma 6.7.

Let T¯\overline{T} be an unrooted tree with leaf set XX and let 𝒫\mathcal{P} be a partition of XX. Then 𝒫\mathcal{P} and T¯\overline{T} are compatible if and only if |𝒫|=1|\mathcal{P}|=1, or there is a set A𝒫A\in\mathcal{P} such that (i) A|(XA)𝔖(T¯)A|(X\setminus A)\in\mathfrak{S}(\overline{T}) and (ii) 𝒫{A}\mathcal{P}\setminus\{A\} and T¯|XA\overline{T}_{|X\setminus A} are compatible.

Proof.

Clearly, every tree on XX is compatible with the partition 𝒫={X}\mathcal{P}=\{X\} since {X}=(T¯,)\{X\}=\mathcal{F}(\overline{T},\emptyset). Thus, assume |𝒫|>1|\mathcal{P}|>1 in the following. First suppose that T¯\overline{T} and 𝒫\mathcal{P} are compatible. We first show that there is a set A𝒫A\in\mathcal{P} such that A|(XA)𝔖(T¯)A|(X\setminus A)\in\mathfrak{S}(\overline{T}). Since T¯\overline{T} and 𝒫\mathcal{P} are compatible, there is an edge set HE(T¯)H\subseteq E(\overline{T}) such that (T¯,H)=𝒫\mathcal{F}(\overline{T},H)=\mathcal{P} and a corresponding set of splits H,T¯𝔖(T¯)\mathfrak{H}\coloneqq\mathfrak{H}_{H,\overline{T}}\subseteq\mathfrak{S}(\overline{T}). Since |𝒫|>1|\mathcal{P}|>1 by assumption, both sets HH and \mathfrak{H} are non-empty. Therefore, we can pick an arbitrary split S1|(XS1)S_{1}|(X\setminus S_{1})\in\mathfrak{H}. By Eq. (9), every A𝒫A\in\mathcal{P} satisfies either AS1A\subseteq S_{1} or AXS1A\subseteq X\setminus S_{1}. In particular, there must be some A𝒫A\in\mathcal{P} with AS1A\subseteq S_{1}. If there is no other split S2|(XS2)S_{2}|(X\setminus S_{2})\in\mathfrak{H} with S2S1S_{2}\subsetneq S_{1}, then S1𝒫S_{1}\in\mathcal{P} and we are done. Otherwise consider the split S2|(XS2)S_{2}|(X\setminus S_{2})\in\mathfrak{H} with S2S1S_{2}\subsetneq S_{1}. Then either there is another split S3|(XS3)S_{3}|(X\setminus S_{3})\in\mathfrak{H} with S3S2S_{3}\subsetneq S_{2} or S2𝒫S_{2}\in\mathcal{P}. Since \mathfrak{H} is finite, we eventually reach a split Sj|(XSj)S_{j}|(X\setminus S_{j})\in\mathcal{H} with Sj𝒫S_{j}\in\mathcal{P}, i.e., we can choose A=SjA=S_{j} in Condition (i). Now let eAHe_{A}\in H be the edge in T¯\overline{T} with Se=A|(XA)S_{e}=A|(X\setminus A). The restriction T¯|Y\overline{T}_{|Y} of T¯\overline{T} to Y=XAY=X\setminus A is therefore simply the connected component T¯\overline{T}^{\prime} of T¯eA\overline{T}-e_{A} that does not intersect AA. Thus (T¯|XA,H{eA})=𝒫{A}\mathcal{F}(\overline{T}_{|X\setminus A},H\setminus\{e_{A}\})=\mathcal{P}\setminus\{A\}, i.e., Condition (ii) is satisfied.

Conversely, suppose there is an A𝒫A\in\mathcal{P} such that 𝒫{A}\mathcal{P}\setminus\{A\} and T¯|XA\overline{T}_{|X\setminus A} is compatible and A|(XA)A|(X\setminus A) is a split corresponding to an edge in T¯\overline{T}. Then there is an edge set HE(T¯|XA)H^{\prime}\subseteq E(\overline{T}_{|X\setminus A}) such that (T¯|XA,H)=𝒫=𝒫{A}\mathcal{F}(\overline{T}_{|X\setminus A},H^{\prime})=\mathcal{P}^{\prime}=\mathcal{P}\setminus\{A\}. Since A|(XA)𝔖(T¯)A|(X\setminus A)\in\mathfrak{S}(\overline{T}), there is an edge eae_{a} in E(T¯)E(T¯|XA)E(\overline{T})\setminus E(\overline{T}_{|X\setminus A}) connecting the subtrees T¯|XA\overline{T}_{|X\setminus A} and T¯|A\overline{T}_{|A}. In particular eAHe_{A}\notin H^{\prime}. Therefore, we have (T¯,H{eA})=(T¯|XA,H){A}=𝒫{A}=𝒫\mathcal{F}(\overline{T},H^{\prime}\cup\{e_{A}\})=\mathcal{F}(\overline{T}_{|X\setminus A},H^{\prime})\cup\{A\}=\mathcal{P}^{\prime}\cup\{A\}=\mathcal{P}, i.e., T¯\overline{T} and 𝒫\mathcal{P} are compatible. ∎

Lemma 6.8.

Let 𝒫\mathcal{P} be a partition of XX, and T¯\overline{T} an unrooted tree with leaf set XX. Then T¯\overline{T} admits a refinement T¯\overline{T}^{*} compatible with 𝒫\mathcal{P} if and only if |𝒫|=1|\mathcal{P}|=1, or there is an A𝒫A\in\mathcal{P} such that (i) for every B1|B2𝔖(T¯)B_{1}|B_{2}\in\mathfrak{S}(\overline{T}) we have at least one of AB1A\subseteq B_{1}, AB2A\subseteq B_{2}, B1AB_{1}\subseteq A, or B2AB_{2}\subseteq A and (ii) the restriction T¯|XA\overline{T}_{|X\setminus A} admits a refinement T¯|XA\overline{T}^{*}_{|X\setminus A} that is compatible with 𝒫{A}\mathcal{P}\setminus\{A\}.

Proof.

For brevity, we write 𝔗𝔖(T¯)\mathfrak{T}\coloneqq\mathfrak{S}(\overline{T}) and 𝔗𝔖(T¯)\mathfrak{T}^{*}\coloneqq\mathfrak{S}(\overline{T}^{*}) for the split systems of the two trees TT and TT^{*}. Clearly, every tree on XX is compatible with the partition 𝒫={X}\mathcal{P}=\{X\} since {X}=(T¯,)\{X\}=\mathcal{F}(\overline{T},\emptyset). Thus, assume |𝒫|>1|\mathcal{P}|>1 in the following. Suppose the refinement T¯\overline{T}^{*} (with corresponding split system 𝔗\mathfrak{T}^{*}) of T¯\overline{T} is compatible with 𝒫\mathcal{P}. By Lemma 6.7, there is an A𝒫A\in\mathcal{P} such that A|(XA)A|(X\setminus A) is a split in 𝔗\mathfrak{T}^{*} and T¯|XA\overline{T}^{*}_{|X\setminus A} is compatible with 𝒫{A}\mathcal{P}\setminus\{A\}. Clearly, T¯|XA\overline{T}^{*}_{|X\setminus A} is a refinement of T¯|XA\overline{T}_{|X\setminus A}, i.e., T¯|XA\overline{T}_{|X\setminus A} admits a refinement that is compatible with 𝒫{A}\mathcal{P}\setminus\{A\}. Since 𝔗\mathfrak{T}^{*} identifies the tree T¯\overline{T}^{*}, it is in particular a tree-like split system. Since T¯\overline{T}^{*} is a refinement of T¯\overline{T}, we have 𝔗𝔗\mathfrak{T}\subseteq\mathfrak{T}^{*}. In particular, therefore A|(XA)A|(X\setminus A) and every split B1|B2𝔗B_{1}|B_{2}\in\mathfrak{T} must have at least one empty intersection AB1A\cap B_{1}, AB2A\cap B_{2}, (XA)B1(X\setminus A)\cap B_{1} or (XA)B2(X\setminus A)\cap B_{2}. Depending on which of the four intersections is empty, we have one of the following situations AB2A\subseteq B_{2}, AB1A\subseteq B_{1}, B1AB_{1}\subseteq A or B2AB_{2}\subseteq A, respectively.

Conversely, suppose there is an A𝒫A\in\mathcal{P} satisfying conditions (i) and (ii). Then 𝔗𝔗{A|(XA)}\mathfrak{T}^{\prime}\coloneqq\mathfrak{T}\cup\{A|(X\setminus A)\} is a tree-like split system because 𝔗\mathfrak{T} has this property and, for any B1|B2𝔗{A|(XA)}B_{1}|B_{2}\in\mathfrak{T}\setminus\{A|(X\setminus A)\}, the alternatives in (i) amount to AB2=A\cap B_{2}=\emptyset, AB1=A\cap B_{1}=\emptyset, (XA)B1=(X\setminus A)\cap B_{1}=\emptyset, or (XA)B2=(X\setminus A)\cap B_{2}=\emptyset, respectively. Moreover, 𝔗\mathfrak{T} and thus 𝔗\mathfrak{T}^{\prime} contains the singleton splits {x}|(X{x})\{x\}|(X\setminus\{x\}) for all xXx\in X. The split system 𝔗\mathfrak{T}^{\prime} therefore defines a refinement T¯\overline{T}^{\prime} of T¯\overline{T}. Furthermore, we have T¯|XA=T¯|XA\overline{T}_{|X\setminus A}=\overline{T}^{\prime}_{|X\setminus A} and T¯|A=T¯|A\overline{T}_{|A}=\overline{T}^{\prime}_{|A} since either T¯=T¯\overline{T}=\overline{T}^{\prime} or the difference between T¯\overline{T} and T¯\overline{T}^{\prime} is only the expansion or contraction of the edge eAe_{A} identified by the additional split A|(XA)A|(X\setminus A). By condition (ii), there is a refinement T¯|XA\overline{T}^{*}_{|X\setminus A} of the restriction T¯|XA=T¯|XA\overline{T}_{|X\setminus A}=\overline{T}^{\prime}_{|X\setminus A} that is compatible with the partition 𝒫{A}\mathcal{P}\setminus\{A\} of XAX\setminus A. Thus there is also a refinement T¯{\overline{T}^{\prime}}^{*} of T¯\overline{T} such that the restriction T¯|XA=T¯|XA{\overline{T}^{\prime}}^{*}_{|X\setminus A}=\overline{T}^{*}_{|X\setminus A} is compatible with 𝒫{A}\mathcal{P}\setminus\{A\}. Let HH^{*} be the corresponding set of separating edges. Then (T¯,H{eA})=(T¯|X{A},H){A}=𝒫\mathcal{F}({\overline{T}^{\prime}}^{*},H^{*}\cup\{e_{A}\})=\mathcal{F}(\overline{T}^{*}_{|X\setminus\{A\}},H^{*})\cup\{A\}=\mathcal{P}. Thus T¯\overline{T} and 𝒫\mathcal{P} are compatible. ∎

Lemma 6.7 and Lemma 6.8 are associated with a simple algorithmic intuition. Among the sets A𝒫A\in\mathcal{P}, at least one corresponds (at least in a refinement) to a connected component in the forest obtained by deletion of a single edge, and thus to a split A|(XA)A|(X\setminus A) that is either already contained in 𝔖(T¯)\mathfrak{S}(\overline{T}) or that can be used to refine the tree T¯\overline{T}. This immediately yields an algorithm on the split systems that, in each step, finds a set A𝒫A\in\mathcal{P} that satisfies condition (i) and then proceeds to checking the restriction to XAX\setminus A. While the formulation in terms of tree-like split systems is of some theoretical interest, it seems to be of little practical use compared to the linear-time algorithms described in the following section.

7 Algorithms and Complexity

In the following, we will first derive results for the complexity of checking whether 𝒫\mathcal{P} and TT are r-compatible and the construction of the edge coloring γT,𝒫\gamma_{T,\mathcal{P}} which we then use for the special case of compatibility of 𝒫\mathcal{P} and TT. Finally, we investigate the complexity of finding a refinement that is compatible with a system 𝔓\mathfrak{P} of partitions. In view of Prop. 6.2, we will assume that TT is a rooted tree in this section unless explicitly stated otherwise.

Recall that, for a tree TT on XX and a partition 𝒫\mathcal{P} of XX, the map γT,𝒫\gamma_{T,\mathcal{P}} assigns to eE(T)e\in E(T) as “colors” all sets A𝒫A\in\mathcal{P} for which ee lies on a path connecting two elements x,xAx,x^{\prime}\in A.

Observation 7.1.

Let TT be a tree on XX and 𝒫\mathcal{P} a partition of XX, vρTv\neq\rho_{T}, and e={par(v),v}E(T)e=\{\operatorname{par}(v),v\}\in E(T). Then

AγT,𝒫(e)AL(T(v)) and AL(T(v)).A\in\gamma_{T,\mathcal{P}}(e)\iff A\cap L(T(v))\neq\emptyset\text{ and }A\setminus L(T(v))\neq\emptyset.
Lemma 7.2.

Let \mathcal{H} be a hierarchy on XX, TT the corresponding tree on XX, and 𝒫\mathcal{P} a partition of XX. Moreover, let YY\in\mathcal{H}. Then, YY overlaps with two distinct sets A,B𝒫A,B\in\mathcal{P} if and only if A,BγT,𝒫({par(lcaT(Y)),lcaT(Y)})A,B\in\gamma_{T,\mathcal{P}}(\{\operatorname{par}(\operatorname{lca}_{T}(Y)),\operatorname{lca}_{T}(Y)\}) with ABA\neq B.

Proof.

Since YY\in\mathcal{H}, there is a unique vertex vV(T)v\in V(T) with Y=L(T(v))Y=L(T(v)) and thus, v=lcaT(Y)v=\operatorname{lca}_{T}(Y). First assume that YY\in\mathcal{H} overlaps with two distinct sets A,B𝒫A,B\in\mathcal{P}. Thus, we have CL(T(v))C\cap L(T(v))\neq\emptyset and CL(T(v))C\setminus L(T(v))\neq\emptyset for C{A,B}C\in\{A,B\}. By Obs. 7.1, this implies A,BγT,𝒫({par(v),v})A,B\in\gamma_{T,\mathcal{P}}(\{\operatorname{par}(v),v\}). For the converse, assume that A,BγT,𝒫({par(v),v})A,B\in\gamma_{T,\mathcal{P}}(\{\operatorname{par}(v),v\}) with ABA\neq B. By Obs. 7.1, we have CYC\cap Y\neq\emptyset and CYC\setminus Y\neq\emptyset for C{A,B}C\in\{A,B\}. Since, in addition, AA and BB are disjoint, we have YCY\setminus C\neq\emptyset for C{A,B}C\in\{A,B\}. In summary, YY overlaps with the two distinct sets AA and BB. ∎

Lemma 7.2 together with Thm. 5.7 immediately implies

Proposition 7.3.

Let TT be a tree on XX and 𝒫\mathcal{P} a partition of XX. 𝒫\mathcal{P} and TT are r-compatible if and only if |γT,𝒫(e)|1|\gamma_{T,\mathcal{P}}(e)|\leq 1 for every eE(T)e\in E(T).

To find a refinement of a hierarchy \mathcal{H} that is compatible with 𝒫\mathcal{P}, it is crucial to know the set 𝔜(,𝒫)\mathfrak{Y}(\mathcal{H},\mathcal{P}), and, in particular, the sets AA_{\mathcal{H}} for the A𝔜(,𝒫)A\in\mathfrak{Y}(\mathcal{H},\mathcal{P}). However, an explicit construction of the latter is not needed since the property of YY\in\mathcal{H} being equal to AA_{\mathcal{H}} for some A𝔜(,𝒫)A\in\mathfrak{Y}(\mathcal{H},\mathcal{P}) or not is entirely determined by the colored edges incident to lcaT(Y)\operatorname{lca}_{T}(Y).

Lemma 7.4.

Let \mathcal{H} be a hierarchy on XX, TT the corresponding tree on XX, 𝒫\mathcal{P} a partition of XX, and YY\in\mathcal{H}. Then, Y=AY=A_{\mathcal{H}} for some A𝔜(,𝒫)A\in\mathfrak{Y}(\mathcal{H},\mathcal{P}) if and only if the following two conditions are satisfied for ulcaT(Y)u\coloneqq\operatorname{lca}_{T}(Y):

  1. (a’)

    AγT,𝒫({u,v})A\in\gamma_{T,\mathcal{P}}(\{u,v\}) for some vchildT(u)v\in\operatorname{child}_{T}(u) and either u=ρTu=\rho_{T} or AγT,𝒫({parT(u),u})A\notin\gamma_{T,\mathcal{P}}(\{\operatorname{par}_{T}(u),u\}).

  2. (b’)

    BγT,𝒫({u,v})B\in\gamma_{T,\mathcal{P}}(\{u,v^{\prime}\}) for some vchildT(u)v^{\prime}\in\operatorname{child}_{T}(u) and some color BAB\neq A.

Proof.

Let Y=AY=A_{\mathcal{H}} for some A𝔜(,𝒫)𝒫A\in\mathfrak{Y}(\mathcal{H},\mathcal{P})\subseteq\mathcal{P}. By definition, this is, if and only if, (a) Y=AY=A_{\mathcal{H}} with A𝒫A\in\mathcal{P} and (b) there is some B𝒫{A}B\in\mathcal{P}\setminus\{A\} satisfying BYB\cap Y\neq\emptyset and YBY\subseteq B_{\mathcal{H}}. In particular, u=lcaT(Y)u=\operatorname{lca}_{T}(Y) is an inner vertex in this case. Thus, we have L(T(u))=Y=AL(T(u))=Y=A_{{\mathcal{H}}}. The definition of γT,𝒫\gamma_{T,\mathcal{P}} directly implies Condition (a’). Now, let B𝒫{A}B\in\mathcal{P}\setminus\{A\} such that BYB\cap Y\neq\emptyset and YBY\subseteq B_{\mathcal{H}}. Hence, BL(T(u))B\cap L(T(u))\neq\emptyset, and thus, there must be a child vchildT(u)v^{\prime}\in\operatorname{child}_{T}(u) with BL(T(v))B\cap L(T(v^{\prime}))\neq\emptyset. However, since L(T(v))L(T(u))=YBL(T(v^{\prime}))\subsetneq L(T(u))=Y\subseteq B_{\mathcal{H}} and since BB_{\mathcal{H}} is inclusion-minimal for BB, we also have BL(T(v))B\setminus L(T(v^{\prime}))\neq\emptyset. Taken together, we obtain BγT,𝒫({u,v})B\in\gamma_{T,\mathcal{P}}(\{u,v^{\prime}\}) by definition of γT,𝒫\gamma_{T,\mathcal{P}} and thus Conditions (b’).

Now assume that Conditions (a’) and (b’) are satisfied. Let AγT,𝒫({u,v})A\in\gamma_{T,\mathcal{P}}(\{u,v\}) for some vchildT(u)v\in\operatorname{child}_{T}(u). Hence, YAY\subseteq A_{\mathcal{H}}. Moreover, the two possible cases u=ρTu=\rho_{T} or AγT,𝒫({parT(u),u})A\notin\gamma_{T,\mathcal{P}}(\{\operatorname{par}_{T}(u),u\}) imply that there must be a second edge {u,v′′}\{u,v^{\prime\prime}\} for some v′′childT(u){v}v^{\prime\prime}\in\operatorname{child}_{T}(u)\setminus\{v\} that is colored with AA due to the definition of γT,𝒫\gamma_{T,\mathcal{P}} and the fact that u=lcaT(Y)u=\operatorname{lca}_{T}(Y). Therefore, u=lcaT(A)=lcaT(Y)u=\operatorname{lca}_{T}(A_{\mathcal{H}})=\operatorname{lca}_{T}(Y). This together with YY\in\mathcal{H} and YAY\subseteq A_{\mathcal{H}} implies Condition (a) Y=AY=A_{\mathcal{H}}. Condition (b’) implies that BL(T(v))B\cap L(T(v^{\prime}))\neq\emptyset for some vchildT(u)v^{\prime}\in\operatorname{child}_{T}(u) (and thus BL(T(u))B\cap L(T(u))\neq\emptyset) and BL(T(v))B\setminus L(T(v^{\prime}))\neq\emptyset. Together with vchildT(u)v^{\prime}\in\operatorname{child}_{T}(u), these two arguments imply uTlcaT(B)u\preceq_{T}\operatorname{lca}_{T}(B) which is equivalent to YBY\subseteq B_{\mathcal{H}}. In summary, Condition (b) is satisfied. ∎

We are now in the position to show that r-compatibility can be decided in linear time.

Theorem 7.5.

Given a rooted tree TT on XX and a partition 𝒫\mathcal{P} of XX, it can be decided in O(|X|)O(|X|) time whether 𝒫\mathcal{P} and TT are r-compatible. In this case, the edge coloring γT,𝒫\gamma_{T,\mathcal{P}} and a compatible refinement can also be constructed in O(|X|)O(|X|) time.

Proof.

We employ the sparse-table algorithm described in [1], which, following an O(|X|)O(|X|)-preprocessing step, enables constant-time look up of lcaT(u,v)\operatorname{lca}_{T}(u,v) for any u,vV(T)u,v\in V(T). We represent γT,𝒫\gamma_{T,\mathcal{P}} by a (hash-based) map data structure that contains the O(|X|)O(|X|) edges eE(T)e\in E(T) as keys, and (hash-based, initially-empty) sets as values, which will be filled with the elements in γT,𝒫(e)\gamma_{T,\mathcal{P}}(e). The sets in 𝒫\mathcal{P} can be represented by pointers to these sets or by integer indices when used as colors. We next show that γT,𝒫\gamma_{T,\mathcal{P}} can be constructed in O(|X|)O(|X|) time. When an edge ee is colored with AA (i.e., AA is added to γT,𝒫(e)\gamma_{T,\mathcal{P}}(e)), we check in constant time whether ee still has at most one color. If this is not the case, we stop the algorithm since 𝒫\mathcal{P} and TT are not r-compatible by Prop. 7.3. Conversely, Prop. 7.3 implies that if we color each edge at most once, then 𝒫\mathcal{P} and TT are r-compatible.

We process every A={x1,,xk}𝒫A=\{x_{1},\dots,x_{k}\}\in\mathcal{P} as follows. First, we initialize the set of previously visited vertices of V(T)V(T) as visited\texttt{visited}\leftarrow\emptyset. Moreover, we initialize the current last common ancestor as curLCAx1\texttt{curLCA}\leftarrow x_{1}, which we will update stepwise until it equals lcaT(A)\operatorname{lca}_{T}(A) in the end. To this end, for each leaf x{x2,,xk}x\in\{x_{2},\dots,x_{k}\} (if any), we query newLCA=lcaT(x,curLCA)\texttt{newLCA}=\operatorname{lca}_{T}(x,\texttt{curLCA}) and move from xx upwards along the tree. Each edge e={parT(v),v}e=\{\operatorname{par}_{T}(v),v\} encountered during the traversal is colored with AA, and vv is added to visited. The traversal stops as soon as parT(v)\operatorname{par}_{T}(v) is in visited or equals newLCA. In case we have curLCATnewLCA\texttt{curLCA}\prec_{T}\texttt{newLCA}, which by definition of newLCA holds if curLCAnewLCA\texttt{curLCA}\neq\texttt{newLCA}, we perform the same bottom-up traversal starting from curLCA. As a final step in the processing of xx, we set curLCAnewLCA\texttt{curLCA}\leftarrow\texttt{newLCA}. One easily verifies that, after processing all vertices in AA, we have exactly colored the edges in the minimal subtree of TT that connects all leaves in AA. Moreover, each edge considered in the bottom-up traversals is colored with AA and required only a constant number of constant-time queries and operations. Similarly, the additional operations needed for each xAx\in A (i.e., set initialization, query, comparison, and update of the last common ancestor) are performed in constant time. Since the algorithm stops as soon as an edge would be colored with two colors, at most one edge is considered twice in the bottom-up traversal. Since TT is a phylogenetic rooted tree, we have |E(T)|2|X|2|E(T)|\leq 2|X|-2. In total, therefore, the traversals of the tree require O(|X|)O(|X|) operations. In addition, a constant effort is required for each of the O(|X|)O(|X|) vertices in the disjoint sets in 𝒫\mathcal{P}. Thus γT,𝒫(e)\gamma_{T,\mathcal{P}}(e) can be constructed in O(|X|)O(|X|) time.

It remains to show how a compatible refinement of TT can be constructed. Put (T)\mathcal{H}\coloneqq\mathcal{H}(T) and 𝔜𝔜(,𝒫)\mathfrak{Y}\coloneqq\mathfrak{Y}(\mathcal{H},\mathcal{P}). First note that all inner vertices uu that need to be resolved correspond to some YY (i.e., u=lcaT(Y)u=\operatorname{lca}_{T}(Y)) such that Y=AY=A_{\mathcal{H}} for some A𝔜A\in\mathfrak{Y}. By Lemma 7.4, it suffices to solely check the colorings for the edges incident to uu according the two conditions in Lemma 7.4. Therefore, we do not need to consider the sets Y=L(T(u))Y=L(T(u)) explicitly. In this way, each edge in TT and its set of colors must be checked at most twice. Since |γT,𝒫(e)|1|\gamma_{T,\mathcal{P}}(e)|\leq 1 for every edge eE(T)e\in E(T), this can be done in O(|X|)O(|X|) time. By Lemma 5.6, 𝒫\mathcal{H}^{*}_{\mathcal{P}} is a refinement of \mathcal{H} that is compatible with 𝒫\mathcal{P}. Instead of operating on \mathcal{H} and 𝔜\mathfrak{Y}, we directly construct the tree TT^{*} corresponding to 𝒫\mathcal{H}^{*}_{\mathcal{P}} from TT in O(|X|)O(|X|)-time as follows. If a vertex u=lcaT(Y)u=\operatorname{lca}_{T}(Y) satisfies Conditions (a’) and (b’) in Lemma 7.4, then the respective coloring of its edges imply that there is an A𝔜A\in\mathfrak{Y} with A=YA_{\mathcal{H}}=Y for some color A𝒫A\in\mathcal{P}. We refine TT at vertex uu as follows: By definition, the sets WjW_{j}\in\mathcal{H} whose disjoint union gives the newly-created sets YAY_{A} are child clusters of YY in 𝒫\mathcal{H}^{*}_{\mathcal{P}}. Hence, they correspond to the children vjchildT(u)v_{j}\in\operatorname{child}_{T}(u) for which the edge {u,vj}\{u,v_{j}\} is colored with AA. Therefore, we remove all of these edges {u,vj}\{u,v_{j}\}, and instead add the edge {u,vA}\{u,v_{A}\} and the edges {vA,vj}\{v_{A},v_{j}\}, where vAv_{A} is a newly-created vertex. In particular, we have YA=L(T(vA))Y_{A}=L(T(v_{A})). Since this is true for all sets YA𝒫Y_{A}\in\mathcal{H}^{*}_{\mathcal{P}}\setminus\mathcal{H}, the resulting tree TT^{*} corresponds to the hierarchy 𝒫\mathcal{H}^{*}_{\mathcal{P}}. Clearly, we introduce no more than O(|X|)O(|X|) new vertices. Since each edge has at most one color, at most O(|X|)O(|X|) operations are required. ∎

We next characterize compatibility of \mathcal{H} and 𝒫\mathcal{P} in terms of the edge coloring γT,𝒫\gamma_{T,\mathcal{P}} and show that compatibility of \mathcal{H} and 𝒫\mathcal{P} can be tested in linear time.

Theorem 7.6.

Let \mathcal{H} be a hierarchy on XX, TT the corresponding tree on XX, and 𝒫\mathcal{P} a partition of XX. Then \mathcal{H} and 𝒫\mathcal{P} are compatible if and only if there is no vertex uV(T)u\in V(T) and distinct A,B𝒫A,B\in\mathcal{P} such that AγT,𝒫({u,v})A\in\gamma_{T,\mathcal{P}}(\{u,v\}) and BγT,𝒫({u,v})B\in\gamma_{T,\mathcal{P}}(\{u,v^{\prime}\}) for (not necessarily distinct) children v,vchildT(u)v,v^{\prime}\in\operatorname{child}_{T}(u). In particular, it can be decided in O(|X|)O(|X|) time whether 𝒫\mathcal{P} and TT are compatible.

Proof.

First suppose, for contraposition, that there is a vertex uV(T)u\in V(T) with distinct A,B𝒫A,B\in\mathcal{P} such that AγT,𝒫({u,v})A\in\gamma_{T,\mathcal{P}}(\{u,v\}) and BγT,𝒫({u,v})B\in\gamma_{T,\mathcal{P}}(\{u,v^{\prime}\}) for children v,vchildT(u)v,v^{\prime}\in\operatorname{child}_{T}(u). If v=vv=v^{\prime}, we can apply Prop. 7.3 to conclude that there is no refinement of \mathcal{H} that is compatible with 𝒫\mathcal{P}. Thus, in particular, \mathcal{H} is not compatible with 𝒫\mathcal{P}. Now assume that vv and vv^{\prime} are distinct. We distinguish the three cases (a) uρTu\neq\rho_{T} and A,BγT,𝒫({parT(u),u})A,B\in\gamma_{T,\mathcal{P}}(\{\operatorname{par}_{T}(u),u\}), (b) uρTu\neq\rho_{T} and exactly one of AA and BB is in γT,𝒫({parT(u),u})\gamma_{T,\mathcal{P}}(\{\operatorname{par}_{T}(u),u\}), and (c) u=ρTu=\rho_{T} or A,BγT,𝒫({parT(u),u})A,B\notin\gamma_{T,\mathcal{P}}(\{\operatorname{par}_{T}(u),u\}). In case (a), γT,𝒫({parT(u),u})\gamma_{T,\mathcal{P}}(\{\operatorname{par}_{T}(u),u\}) contains two colors and we again obtain incompatibility of \mathcal{H} and 𝒫\mathcal{P} by Prop. 7.3. In case (b), we assume w.l.o.g. that AγT,𝒫({parT(u),u})A\notin\gamma_{T,\mathcal{P}}(\{\operatorname{par}_{T}(u),u\}) and BγT,𝒫({parT(u),u})B\in\gamma_{T,\mathcal{P}}(\{\operatorname{par}_{T}(u),u\}). Since AγT,𝒫({u,v})A\in\gamma_{T,\mathcal{P}}(\{u,v\}), we have |A|>1|A|>1 and since AγT,𝒫({parT(u),u})A\notin\gamma_{T,\mathcal{P}}(\{\operatorname{par}_{T}(u),u\}), it must hold that AL(T(u))A\subseteq L(T(u)). The latter two arguments imply that there is a second edge {u,w}\{u,w\}, wchildT(u)w\in\operatorname{child}_{T}(u) with AγT,𝒫({u,w})A\in\gamma_{T,\mathcal{P}}(\{u,w\}) and, in particular, u=lcaT(A)u=\operatorname{lca}_{T}(A). Moreover, since BγT,𝒫({parT(u),u})B\in\gamma_{T,\mathcal{P}}(\{\operatorname{par}_{T}(u),u\}), we have BL(T(u))B\cap L(T(u))\neq\emptyset and BL(T(u))B\setminus L(T(u))\neq\emptyset. In particular, therefore, uu corresponds to A=L(T(u))A_{\mathcal{H}}=L(T(u)) and AA_{\mathcal{H}} is not the union of sets in 𝒫\mathcal{P}. Together with Thm. 4.5, this implies that \mathcal{H} and 𝒫\mathcal{P} are not compatible. In case (c), we have, by similar arguments as before, that u=lcaT(A)=lcaT(B)u=\operatorname{lca}_{T}(A)=\operatorname{lca}_{T}(B). Hence, uu corresponds to both AA_{\mathcal{H}} and BB_{\mathcal{H}} for distinct A,B𝒫A,B\in\mathcal{P}, and thus, Condition (ii) in Thm. 4.5 is not satisfied. Therefore, \mathcal{H} and 𝒫\mathcal{P} are not compatible.

To prove the converse, suppose, for contraposition, that \mathcal{H} and 𝒫\mathcal{P} are not compatible. Hence, Condition (i) or (ii) in Thm. 4.5 is not satisfied. If Condition (i) is not satisfied, then there is some AA\in\mathcal{H} such that AA_{\mathcal{H}} is not the union of sets in 𝒫\mathcal{P}. Since AAA\subseteq A_{\mathcal{H}} by definition, the latter implies that there must be some B𝒫{A}B\in\mathcal{P}\setminus\{A\} such that BAB\cap A_{\mathcal{H}}\neq\emptyset and BAB\setminus A_{\mathcal{H}}\neq\emptyset. Moreover, since AA and BB are disjoint and non-empty and both contain elements that are in AA_{\mathcal{H}}, the vertex ulcaT(A)u\coloneqq\operatorname{lca}_{T}(A) corresponding to AA_{\mathcal{H}} is an inner vertex and has (not necessarily distinct) children v,vchildT(u)v,v^{\prime}\in\operatorname{child}_{T}(u) such that AγT,𝒫({u,v})A\in\gamma_{T,\mathcal{P}}(\{u,v\}) and BγT,𝒫({u,v})B\in\gamma_{T,\mathcal{P}}(\{u,v^{\prime}\}). If Condition (ii) is not satisfied, i.e., A=BA_{\mathcal{H}}=B_{\mathcal{H}} for two distinct A,B𝒫A,B\in\mathcal{P}, we can apply similar arguments to conclude that ulcaT(A)=lcaT(B)u\coloneqq\operatorname{lca}_{T}(A)=\operatorname{lca}_{T}(B) has children v,vchildT(u)v,v^{\prime}\in\operatorname{child}_{T}(u) such that AγT,𝒫({u,v})A\in\gamma_{T,\mathcal{P}}(\{u,v\}) and BγT,𝒫({u,v})B\in\gamma_{T,\mathcal{P}}(\{u,v^{\prime}\}).

It remains to show that compatibility of 𝒫\mathcal{P} and TT can be decided in O(|X|)O(|X|). Compatibility implies r-compatibility which can be checked in O(|X|)O(|X|) by Thm. 7.5. In particular, the edge coloring γT,𝒫\gamma_{T,\mathcal{P}} can be constructed with the same complexity in this case. The condition whether or not there is a vertex uV(T)u\in V(T) and distinct A,B𝒫A,B\in\mathcal{P} such that AγT,𝒫({u,v})A\in\gamma_{T,\mathcal{P}}(\{u,v\}) and BγT,𝒫({u,v})B\in\gamma_{T,\mathcal{P}}(\{u,v^{\prime}\}) for (not necessarily distinct) children v,vchildT(u)v,v^{\prime}\in\operatorname{child}_{T}(u) can easily be checked in O(|X|)O(|X|) time by counting, for each vertex in an arbitrary traversal, the number of colors appearing on the edges leading to its children. ∎

Similar to Lemma 4.6, we obtain here a result for maximum-sized sets of separating edges. Since any edge eE(T)e\in E(T) with γT,𝒫(e)\gamma_{T,\mathcal{P}}(e)\neq\emptyset cannot be a separating edge, and any edge ee for which γT,𝒫(e)=\gamma_{T,\mathcal{P}}(e)=\emptyset can always be added as a separating edge, we obtain

Corollary 7.7.

Suppose that 𝒫\mathcal{P} and TT are compatible. Then, there is a unique maximum-sized set of separating edges HH^{*}, which is given by the set of edges eE(T)e\in E(T) for which γT,𝒫(e)=\gamma_{T,\mathcal{P}}(e)=\emptyset. This maximum-sized set HH^{*} can be computed in O(|X|)O(|X|) time.

For a minimum-sized set of separating edges HH^{*} for compatible 𝒫\mathcal{P} and TT, the cardinality of HH^{*} can be expressed as a function of |𝒫||\mathcal{P}| alone (cf. Lemma 4.6), and is thus independent of TT. However, the latter is not the case for a maximum-sized HH^{*} set of separating edges. To see this, consider a partition 𝒫\mathcal{P} of XX consisting of all singletons. Clearly, we have H=E(T)H^{*}=E(T) for any tree on XX where |E(T)||E(T)| varies depending on how resolved a specific tree TT is.

In what follows, we investigate the complexity of the problem of recognizing compatibility of systems 𝔓={𝒫1,𝒫2,,𝒫k}\mathfrak{P}=\{\mathcal{P}_{1},\mathcal{P}_{2},\dots,\mathcal{P}_{k}\} of partitions of XX with (refinements of) trees. In Section 3, we have introduced the two closely related problems asking whether a tree TT admits a refinement that is compatible with all partitions in 𝔓\mathfrak{P}, CompaTP, and whether a compatible tree exists at all, ExistTP. To this end, we first show that ExistTP is a simple translation of the Symm-Fitch Recognition problem, which is NP-complete [12, Thm.4.2]. In particular, this discussion will yield NP-completeness of CompaTP and ExistTP in Thm. 7.10 below.

The concept of compatible partitions and trees is intimately related to so-called undirected Fitch graphs GG, that is, complete multipartite graphs whose maximal independent sets form a partition 𝒫\mathcal{P} of V(G)V(G) [10]. To be more precise, for a given tree TT with leaf set XX and subset HE(T)H\subseteq E(T), an undirected Fitch graph G=(X,E)G=(X,E) has an edge {x,y}E\{x,y\}\in E if and only if there is an edge in HH that lies on the path between xx and yy in TT. Therefore, {x,y}E\{x,y\}\notin E if and only if xx and yy are contained in the same set B𝒫(T,H)B\in\mathcal{P}\coloneqq\mathcal{F}(T,H). This construction was generalized in [9, 11] to Fitch maps that allow multiple colors. In the following paragraph we briefly summarize the construction of symmetrized Fitch maps [12]. We then show that a symmetrized Fitch map ε\varepsilon can be interpreted as a partition system 𝔓\mathfrak{P} on XX that is compatible with a suitably chosen tree TT.

Let M{1,,k}M\coloneqq\{1,\dots,k\} be a set of colors for some kk\in\mathbb{N}. Moreover, for a set XX, we write [X×X]irr(X×X){(x,x)xX}[X\times X]_{\textrm{irr}}\coloneqq(X\times X)\setminus\{(x,x)\mid x\in X\}. An edge-colored tree (T,λ)(T,\lambda) is a tree TT together with a map λ:E(T)2M\lambda:E(T)\to 2^{M}. Note that λ\lambda can be chosen arbitrarily in contrast to the 𝒫\mathcal{P}-coloring γT,𝒫\gamma_{T,\mathcal{P}} of TT as in Def. 3.3. An edge eE(T)e\in E(T) is an mm-edge if mλ(e)m\in\lambda(e) for some mMm\in M.

A map ε:[X×X]irr2M\varepsilon\colon[X\times X]_{\textrm{irr}}\to 2^{M} is a symmetrized Fitch map if there is an edge-colored tree (T,λ)(T,\lambda) with leaf set XX and edge coloring λ:E(T)2M\lambda:E(T)\to 2^{M} such that for every pair (x,y)[X×X]irr(x,y)\in[X\times X]_{\textrm{irr}} it holds that

mε(x,y) there is an m-edge on the path from x to y.m\in\varepsilon(x,y)\iff\textnormal{ there is an }m\textnormal{-edge on the path from }x\textnormal{ to }y.

In this case, we say that ε:[X×X]irr2M\varepsilon\colon[X\times X]_{\textrm{irr}}\to 2^{M} is explained by (T,λ)(T,\lambda).

For an arbitrary map ε:[X×X]irr2M\varepsilon\colon[X\times X]_{\textrm{irr}}\to 2^{M} and each mMm\in M, the monochromatic map (induced by mm) is given by εm(x,y)ε(x,y)(M{m})\varepsilon_{m}(x,y)\coloneqq\varepsilon(x,y)\setminus(M\setminus\{m\}). By definition, we have εm(x,y){,{m}}\varepsilon_{m}(x,y)\in\{\emptyset,\{m\}\} and, in particular, εm(x,y)={m}\varepsilon_{m}(x,y)=\{m\} if and only if mε(x,y)m\in\varepsilon(x,y). If moreover ε\varepsilon (and thus εm\varepsilon_{m}) is a symmetrized Fitch map, then there is a tree (T,λ)(T,\lambda) such that εm(x,y)={m}\varepsilon_{m}(x,y)=\{m\} if and only if there is an mm-edge on the path from xx to yy. In [10], it was shown that the graph representations Gm=(X,Em)G_{m}=(X,E_{m}) of (monochromatic) symmetrized Fitch relations, given by {x,y}E\{x,y\}\in E if and only if εm(x,y)={m}\varepsilon_{m}(x,y)=\{m\}, coincide with the class of complete multipartite graphs. Therefore, they can uniquely be represented by a partition 𝒫mε\mathcal{P}_{m}^{\varepsilon} of XX in a way that each set A𝒫mεA\in\mathcal{P}_{m}^{\varepsilon} corresponds to a maximal independent subset of XX, i.e. {x,y}Em\{x,y\}\notin E_{m} for all x,yAx,y\in A. In particular, it holds that x,yXx,y\in X are elements of distinct sets of 𝒫mε\mathcal{P}_{m}^{\varepsilon} if and only if mε(x,y)m\in\varepsilon(x,y). Thus, if all monochromatic maps are symmetrized Fitch maps, 𝔓ε{𝒫mεmM}\mathfrak{P}^{\varepsilon}\coloneqq\{\mathcal{P}_{m}^{\varepsilon}\mid m\in M\} is a partition system on XX.

Lemma 7.8.

Let ε:[X×X]irr2M\varepsilon\colon[X\times X]_{\textrm{irr}}\to 2^{M} be a map such that the monochromatic maps εm\varepsilon_{m} are symmetrized Fitch maps for all mMm\in M, and T¯\overline{T} be an unrooted tree with leaf set XX. Then, there is an edge-coloring λ\lambda such that (T¯,λ)(\overline{T},\lambda) explains ε\varepsilon if and only if the partition 𝒫mε\mathcal{P}_{m}^{\varepsilon} of XX is compatible with T¯\overline{T} for all mMm\in M.

Proof.

First note that, since the monochromatic maps εm\varepsilon_{m} are symmetrized Fitch maps for all mMm\in M, the partitions 𝒫mε\mathcal{P}_{m}^{\varepsilon} of XX are all well-defined. For the case |X|{1,2}|X|\in\{1,2\}, the statement is trivially true since, in this case, every of the (at most two) possible partitions of XX are compatible with a unique tree on XX. Thus, assume that |X|3|X|\geq 3. Let TT be any rooted version of T¯\overline{T}. Since |X|3|X|\geq 3, we can apply Prop. 6.1, to conclude that T¯\overline{T} is compatible with some partition 𝒫\mathcal{P} of XX if and only if TT is compatible with 𝒫\mathcal{P}. Hence, it suffices to show the statements for TT.

Assume there is an edge-coloring λ\lambda such that (T,λ)(T,\lambda) explains ε\varepsilon. Put Hm{eE(T)mλ(e)}H_{m}\coloneqq\{e\in E(T)\mid m\in\lambda(e)\}, mMm\in M. By construction, we have for all distinct x,yXx,y\in X that

  1. xx and yy are in distinct sets of 𝒫mε\mathcal{P}_{m}^{\varepsilon}

  2. \iff mε(x,y)m\in\varepsilon(x,y)

  3. \iff mλ(e)m\in\lambda(e) for some eE(T)e\in E(T) on the path connecting xx and yy

  4. \iff there is an edge in HmH_{m} on the path connecting xx and yy

  5. \iff xx and yy are in distinct sets of (T,Hm)\mathcal{F}(T,H_{m}).

Hence, 𝒫mε=(T,Hm)\mathcal{P}_{m}^{\varepsilon}=\mathcal{F}(T,H_{m}) and 𝒫mε\mathcal{P}_{m}^{\varepsilon} is compatible with TT for all mMm\in M.

Now assume that the partition 𝒫mε\mathcal{P}_{m}^{\varepsilon} of XX is compatible with TT for all mMm\in M. For each mMm\in M, define λm(e)={m}\lambda_{m}(e)=\{m\} for all edges e={par(lcaT(B)),lcaT(B)}e=\{\operatorname{par}(\operatorname{lca}_{T}(B)),\operatorname{lca}_{T}(B)\} for some B𝒫mεB\in\mathcal{P}_{m}^{\varepsilon} with lcaT(B)ρT\operatorname{lca}_{T}(B)\neq\rho_{T} and, for all remaining edges ee put λm(e)=\lambda_{m}(e)=\emptyset. By construction, we have for eE(T)e\in E(T) that λm(e)={m}\lambda_{m}(e)=\{m\} if and only if ee is contained the set HmH_{m} as specified in Eq. (3). By Cor. 4.7, we have 𝒫mε=(T,Hm)\mathcal{P}_{m}^{\varepsilon}=\mathcal{F}(T,H_{m}). Hence, mε(x,y)m\in\varepsilon(x,y) if and only if there is an edge ee along the path between xx and yy with λm(e)={m}\lambda_{m}(e)=\{m\}. Now set λ(e)=mMλm(e)\lambda(e)=\cup_{m\in M}\lambda_{m}(e) for all eTe\in T to obtain the final coloring such that (T,λ)(T,\lambda) explains ε\varepsilon. ∎

Lemma 7.9.

Symm-Fitch Recognition remains NP-hard if the monochromatic maps εm\varepsilon_{m} are symmetrized Fitch maps for all mMm\in M.

Proof.

As shown in [12, Thm. 4.2], Symm-Fitch Recognition is NP-complete. Note, if for a map ε:[X×X]irr2M\varepsilon\colon[X\times X]_{\textrm{irr}}\to 2^{M} there is an mMm\in M such that εm\varepsilon_{m} is not a monochromatic Fitch match, then ε\varepsilon cannot be a symmetrized Fitch map; a property that can be checked in polynomial-time for all mMm\in M [10]. Hence, under the assumption that PNPP\neq NP, the NP-hard instances must, in particular, be included within the instances of Symm-Fitch Recognition for which εm\varepsilon_{m} is a monochromatic Fitch match for all mMm\in M. ∎

We are now in the position to establish NP-completeness of ExistTP and CompaTP.

Theorem 7.10.

ExistTP and CompaTP are NP-complete.

Proof.

By Thm. 7.6 and the fact that one can test in polynomial time whether TT^{*} is a refinement of TT by comparing their hierarchies, ExistTP and CompaTP are contained in the class NP. By Lemma 7.8, ExistTP is equivalent to the Symm-Fitch Recognition problem restricted to a certain set of instances for which, by Lemma 7.9, the problem remains NP-hard. Asking for a tree that is compatible with 𝔓\mathfrak{P} is equivalent to asking whether there exists a refinement TT^{*} of the star tree that is compatible with 𝔓\mathfrak{P}. Therefore, ExistTP is a special instance of CompaTP and NP-hardness of ExistTP implies that CompaTP is also NP-hard. Since both problems are in class NP, ExistTP and CompaTP are NP-complete. ∎

The latter problems become fixed-parameter tractable and thus, easier, if TT is “almost binary”. In [17], the resolution of a rooted tree is quantified by the normalized parameter res(T)(|V||X|1)/(|X|2)\operatorname{res}(T)\coloneqq(|V|-|X|-1)/(|X|-2), which varies between 0 (star tree) and 11 (binary tree). The quantity h(T)2|X||E(T)|2=(|X|2)(1res(T))h(T)\coloneqq 2|X|-|E(T)|-2=(|X|-2)(1-\operatorname{res}(T)) correspondingly measures how much TT deviates from being binary. Now let VV^{*} be the set of non-binary inner vertices of TT. Writing hv|childT(v)|2h_{v}\coloneqq|\operatorname{child}_{T}(v)|-2 for the number of “excess children” at the inner vertex vVv\in V^{*}, one easily checks that h(T)=vVhvh(T)=\sum_{v\in V^{*}}h_{v}. Now suppose h(T)hh(T)\leq h. Then all possible binary refinements of TT are obtained by inserting an arbitrary binary tree between each non-binary vertex vv of TT and its children child(v)\operatorname{child}(v). It is well known the that the number of binary rooted leaf-labeled trees on dv=|child(v)|=hv+2d_{v}=|\operatorname{child}(v)|=h_{v}+2 leaves is (2dv3)!!=(2hv+1)!!(2d_{v}-3)!!=(2h_{v}+1)!! [6]. The total number of binary refinements is therefore vV(2hv+1)!!\prod_{v\in V^{*}}(2h_{v}+1)!!. From the definition of the double factorial (2n+1)!!=13(2n+1)(2n+1)!!=1\cdot 3\cdot\dots...\cdot(2n+1) we see that, after omitting the leading factors 11, (2hv+1)!!(2h_{v}+1)!! has exactly hvh_{v} factors for each vVv\in V^{*}, and thus, vV(2hv+1)!!\prod_{v\in V^{*}}(2h_{v}+1)!! has exactly vVhv\sum_{v\in V^{*}}h_{v} factors. Similarly, (2h+1)!!(2h+1)!! has hh contributing factors greater than 11. By ordering these factors in vV(2hv+1)!!\prod_{v\in V^{*}}(2h_{v}+1)!! and (2h+1)!!(2h+1)!!, resp., one easily verifies that, since vVhvh\sum_{v\in V^{*}}h_{v}\leq h, the second product has at least as many factors as the first, and moreover, each of them is not smaller than the corresponding factor (w.r.t. the ordering) in the first product (if existent). Note that equality holds if and only if VV^{*} comprises a single vertex with h+2h+2 children. Each binary refinement can be checked in O(|𝔓||X|)O(|\mathfrak{P}|\,|X|) time for consistency with 𝔓\mathfrak{P} since the consistency check for a single partition can be performed in O(|X|)O(|X|) time by Thm. 7.5 below. Thus there is an O((2h+1)!!|𝔓||X|)O((2h+1)!!\,|\mathfrak{P}|\,|X|) algorithm and thus, CompaTP is FPT for the parameter hh.

8 Concluding Remarks

We have characterized the compatibility of a partition 𝒫\mathcal{P} with a hierarchy \mathcal{H}. The concept of compatibility considered here is much more general than that of a “representative partition”, i.e., the cutting of hierarchy \mathcal{H} at a particular aggregation level. Instead, it amounts to disconnecting the corresponding tree TT at an arbitrary set of edges HE(T)H\subseteq E(T), i.e., at a subset of the splits 𝔖(T)\mathfrak{S}(T). In Section 5, we have characterized when a refinement of TT^{*} of a tree TT exists such that 𝒫\mathcal{P} and TT^{*} are compatible. For practical application, it may be relevant to allow more general operations on the tree TT. A natural generalization is to allow not only refinements but also edge contraction while editing TT into a tree TT^{\prime} that is compatible with a partition 𝒫\mathcal{P} of interest. This amounts to minimizing the cardinality |𝔖(T¯)𝔖(T¯)||\mathfrak{S}(\overline{T})\operatorname{\triangle}\mathfrak{S}(\overline{T^{\prime}})| of the symmetric difference of the corresponding split systems, i.e., the Robinson-Foulds distance of TT and TT^{\prime}.

In Section 6, we have considered tree-like split systems, i.e., split systems that can be represented by unrooted trees. Thm. 6.4 and Cor. 6.5, however, suggest to consider the compatibility of a partition 𝒫\mathcal{P} and a split system 𝔖\mathfrak{S} in a more general setting. These characterizations may then serve as convenient definitions in a more general context: We may say that a partition 𝒫\mathcal{P} and a split system 𝔖\mathfrak{S} are compatible if there is a set of splits \mathfrak{H} such that =𝒫\bigwedge\mathfrak{H}=\mathcal{P} and 𝔖\mathfrak{H}\subseteq\mathfrak{S}. In order to handle refinements in this setting, one would ask whether there is a set of splits \mathfrak{H} such that =𝒫\bigwedge\mathfrak{H}=\mathcal{P}. Without further restrictions, =𝔖𝒫\mathfrak{H}=\mathfrak{S}_{\mathcal{P}} always provides a positive answer to this question. In analogy with our discussion above, it therefore seems natural to consider only split systems 𝔖\mathfrak{S} that belong to a certain class of interest. A refinement will be feasible only if the split system 𝔖\mathfrak{H}\cup\mathfrak{S} again belongs to the desired class. Natural generalizations of tree-like split systems to which the notion of compatibility may be applied include circular and weakly compatible split systems [18, 5], or the even more general Teutoburgan split systems considered in [13]. The suggested definition of compatibility in terms of split systems also provides the natural generalization to the framework of X-trees [18, 5], i.e., to trees in which the set of taxa XX is not restricted to the leaves of TT but may also appear as inner vertices of TT. This amounts to lifting our requirement that the trivial splits {x}|(X{x})\{x\}|(X\setminus\{x\}) must be included in 𝔖(T¯)\mathfrak{S}(\overline{T}).

We have seen in Section 7 that compatibility of 𝒫\mathcal{P} and \mathcal{H} and the existence of a refinement \mathcal{H}^{*} can be decided in linear time, while the extension to arbitrary partition systems 𝔓\mathfrak{P} is NP-complete. Several interesting open questions remain concerning the computational complexity of the Compatibility of Tree and Partition System problem and the Existence of Tree compatible with Partition System problem. Do these problems remain NP-complete if the tree corresponding to \mathcal{H} has bounded degree? What if the number |𝔓||\mathfrak{P}| of input partitions is kept constant? Since the related Symm-Fitch Recognition problem [12] is in turn closely related to the problem of Unrooted Tree Compatibility [3], which is known to be FPT in number of input trees, it is not unlikely that Compatibility of Tree and Partition System is FPT in the number of partitions. Furthermore, it is interesting to ask whether there are (easily recognizable) subclasses of partition systems for which CompaTP and ExistTP become tractable. Interesting candidates are the braids of partitions appearing in image analysis [15, 19], or the hierarchical partition systems considered in [14], for which xX{{x}}𝒫𝔓𝒫{X}\bigcup_{x\in X}\{\{x\}\}\cup\bigcup_{\mathcal{P}\in\mathfrak{P}}\mathcal{P}\cup\{X\} forms a hierarchy.

Acknowledgments

This work was funded in part by the Deutsche Forschungsgemeinschaft. We thank the anonymous referees for the constructive comments and recommendations which helped to significantly improve the readability and quality of the paper.

References

  • Bender et al. [2005] M. A. Bender, M. Farach-Colton, G. Pemmasani, S. Skiena, and P. Sumazin. Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms, 57(2):75–94, 2005. doi: 10.1016/j.jalgor.2005.08.001.
  • Birkhoff [1967] G. Birkhoff. Lattice Theory. Amer. Math. Soc., Providence, RI, 3rd edition, 1967.
  • Bryant and Lagergren [2006] D. Bryant and J. Lagergren. Compatibility of unrooted phylogenetic trees in FPT. Theor. Comp. Sci., 351:296–302, 2006. doi: 10.1016/j.tcs.2005.10.033.
  • Buneman [1971] P. Buneman. The recovery of trees from measures of dissimilarity. In F. R. Hodson, D. G. Kendall, and P. Tautu, editors, Mathematics in the Archaeological and Historical Sciences, pages 387–385. Edinburgh University Press, Edinburgh, 1971.
  • Dress et al. [2012] A. W. M. Dress, K. T. Huber, J. Koolen, V. Moulton, and A. Spillner. Basic Phylogenetic Combinatorics. Cambridge University Press, Cambridge, UK, 2012.
  • Felsenstein [1978] J. Felsenstein. The number of evolutionary trees. Syst. Biol., 27:27–33, 1978. doi: 10.2307/2412810.
  • Geiß et al. [2018] M. Geiß, J. Anders, P. F. Stadler, N. Wieseke, and M. Hellmuth. Reconstructing gene trees from Fitch’s xenology relation. J. Math. Biol., 77:1459–1491, 2018. doi: 10.1007/s00285-018-1260-8.
  • Hellmuth [2017] M. Hellmuth. Biologically feasible gene trees, reconciliation maps and informative triples. Alg Mol Biol, 12:23, 2017. doi: 10.1186/s13015-017-0114-z.
  • Hellmuth [2019] M. Hellmuth. Generalized Fitch graphs: Edge-labeled graphs that are explained by edge-labeled trees. Disc. Appl. Math., 267:1–11, 2019. doi: 10.1016/j.dam.2019.06.015.
  • Hellmuth et al. [2018] M. Hellmuth, Y. Long, M. Geiß, and P. F. Stadler. A short note on undirected Fitch graphs. Art Discr. Appl. Math., 1:P1.08, 2018. doi: 10.26493/2590-9770.1245.98c.
  • Hellmuth et al. [2020] M. Hellmuth, C. R. Seemann, and P. F. Stadler. Generalized fitch graphs II: Sets of binary relations that are explained by edge-labeled trees. Discr. Appl. Math., 283:495–511, 2020. doi: 10.1016/j.dam.2020.01.036.
  • Hellmuth et al. [2021] M. Hellmuth, C. R. Seemann, and P. F. Stadler. Generalized Fitch graphs III: Symmetrized Fitch maps and sets of symmetric binary relations that are explained by unrooted edge-labeled trees. Discr. Math. Theor. Comp. Sci., 23, 2021. doi: 10.46298/dmtcs.6040.
  • Huber et al. [2006] K. T. Huber, J. H. Koolen, and V. Moulton. On the structure of the tight-span of a totally split-decomposable metric. Eur. J. Comb., 27:461–479, 2006. doi: 10.1016/j.ejc.2004.05.007.
  • Huber et al. [2014] K. T. Huber, V. Moulton, C. Semple, and T. Wu. Representing partitions on trees. SIAM J. Discr. Math., 28:1152–1172, 2014. doi: 10.1137/130906192.
  • Kiran and Serra [2015] B. R. Kiran and J. Serra. Braids of partitions. In J. Benediktsson, J. Chanussot, L. Najman, and T. H., editors, Mathematical Morphology and Its Applications to Signal and Image Processing, volume 9082 of Lecture Notes Comp. Sci, pages 217–228, Cham, 2015. Springer. doi: 10.1007/978-3-319-18720-4˙19.
  • Milligan and Cooper [1985] G. W. Milligan and M. C. Cooper. An examination of procedures for determing the number of clusters in a data set. Psychometrika, 50:159–179, 1985. doi: 10.1007/BF02294245.
  • Schaller et al. [2021] D. Schaller, M. Geiß, M. Hellmuth, and P. F. Stadler. Best match graphs with binary trees. In C. Martín-Vide, M. A. Vega-Rodríguez, and T. Wheeler, editors, Algorithms for Computational Biology, pages 82–93, Cham, 2021. Springer International Publishing. doi: 10.1007/978-3-030-74432-8˙6.
  • Semple and Steel [2003] C. Semple and M. Steel. Phylogenetics. Oxford University Press, Oxford UK, 2003.
  • Tochon et al. [2019] G. Tochon, M. Dalla Mura, M. A. Veganzones, T. Géraud, and J. Chanussot. Braids of partitions for the hierarchical representation and segmentation of multimodal images. Pattern Recognition, 95:162–172, 2019. doi: 10.1016/j.patcog.2019.05.029.
  • van de Vel [1993] M. van de Vel. Theory of convex structures. North Holland, Amsterdam, 1993.
  • Vega-Pons and Ruiz-Shulcloper [2010] S. Vega-Pons and J. Ruiz-Shulcloper. Partition selection approach for hierarchical clustering based on clustering ensemble. In I. Bloch and R. M. Cesar Jr., editors, CIARP 2010, volume 6419 of Lecture Notes Comp. Sci., pages 525–532, Berlin, Heidelberg, 2010. Springer-Verlag. doi: 10.1007/978-3-642-16687-7˙69.