Compatibility of Partitions with Trees, Hierarchies, and Split Systems
Abstract
The question whether a partition and a hierarchy or a tree-like split system are compatible naturally arises in a wide range of classification problems. In the setting of phylogenetic trees, one asks whether the sets of coincide with leaf sets of connected components obtained by deleting some edges from the tree that represents or , respectively. More generally, we ask whether a refinement of exists such that and are compatible in this sense. The latter is closely related to the question as to whether there exists a tree at all that is compatible with . We report several characterizations for (refinements of) hierarchies and split systems that are compatible with (systems of) partitions. In addition, we provide a linear-time algorithm to check whether refinements of trees and a given partition are compatible. The latter problem becomes NP-complete but fixed-parameter tractable if a system of partitions is considered instead of a single partition. In this context, we also explore the close relationship of the concept of compatibility and so-called Fitch maps.
Keywords: hierarchy, split system, phylogenetic tree, partition, compatibility, recognition algorithm, Fitch map
1 Introduction
The selection of a partition from a hierarchy (or its equivalent rooted tree ) on a finite set is often the final step in applications of hierarchical clustering procedures [16, 21]. In this case, , often called a “representative partition”, is composed of the pairwise disjoint leaf sets of subtrees rooted at some vertices of . Equivalently, each set (class) of is a set in , i.e., . Cutting a hierarchy at different “levels” leads to partitions of that are ordered by refinement, i.e., every set at the lower level is contained in a set at the higher level. In the image processing literature, braids of partitions have been introduced as systems of partitions of that generalize such hierarchically ordered partitions. In a braid, all pairwise refinement suprema (i.e., the finest partition that is refined by both and ) must be hierarchically organized w.r.t. refinement, and moreover satisfy [15, 19]. Considering the distribution of attributes of species, biologists have been interested in systems of partitions and the associated system of splits with for some . In [14], the compatible split systems that are generated by partition systems are characterized.
Instead of considering systems of partitions, one can ask whether a single partition , or – equivalently – its associated split system can be obtained from a rooted or unrooted tree by cutting a subset of the tree edges and considering the leaf sets of the resulting connected components. For rooted trees, this question arises naturally in mathematical phylogenetics. The removal of those edges from the rooted gene tree that correspond to horizontal gene transfer (HGT) then leaves subtrees of the gene phylogeny that can be analyzed independently [8, 7, 10]. If the tree and the partition of the leaf set into HGT-free subsets are inferred independently, it is important to recognize whether the data are compatible with each other, i.e., whether can be obtained from the tree by cutting some of its edges. If this is not possible for a tree that contains multifurcations, it may still be possible to achieve compatibility by refining some of the multifurcations in . Here, we address these two main questions.
The split system introduced above suggests a different notion of compatibility with trees. Consider the partition . Clearly, is compatible with the star tree on . However, (more precisely, its unrooted version ) does not display the split . In fact, the condition that is displayed by the unrooted tree is closely related to the idea that is a representative partition (of a suitably rooted version) of the tree . This example shows that the notion of compatibility considered here is more general than the concepts that have appeared in the literature so far.
In this contribution, we characterize compatibility of partitions with rooted and unrooted trees (or equivalently their hierarchies and split systems, respectively). After introducing the notation and some preliminary results, we give an overview of the main concepts and results in Section 3.
2 Preliminaries
Basics
We denote the power set of a set by . A set system is a partition of if (P0) , (P1) , and (P2) if and then . We will interchangeably use the equivalent terms: set of partitions, collection of partitions and partition systems. Two sets and overlap if , , and .
In this contribution, we consider both rooted and unrooted phylogenetic trees with vertex set , edge set , and leaf set . In this case, we also say that is a tree on . A star tree is a tree for which .
Rooted Trees and Hierarchies
A rooted tree has a distinguished vertex called the root of . For , we write for the set of its children, and for the parent of . In both cases, we may omit the index whenever there is no risk of confusion. The subtree of rooted at is denoted by . Furthermore, we write for the ancestor partial order on , that is, if lies on the path from to . If or , then and are comparable and, otherwise, incomparable. For a nonempty subset of , we denote by the last common ancestor of in . A rooted tree is phylogenetic if all its inner vertices have at least two children. Hence, a rooted phylogenetic tree may contain one vertex with degree , namely the root .
A set system is a hierarchy (on ) if (H0) , (H1) , (H2) implies , i.e., and do not overlap, and (H3) for all . For a given non-empty set and a hierarchy , we define as the inclusion-minimal element in that contains .
For a hierarchy we can define the closure as the function satisfying
(1) |
for all subsets . Where there is no danger of confusion, we will drop the explicit reference to and simply write instead of .
Lemma 2.1.
Let be non-empty and be a hierarchy on . Then for all .
Proof.
Since , , and no two elements in overlap, there is a unique inclusion-minimal element in that contains , i.e., every that contains also contains . Thus . ∎
As an immediate consequence, we observe that for a hierarchy , satisfies the classical properties of a closure operator: (C1) (enlarging); (C2) (idempotent); (C3) If , then (isotone). For , contains at least two distinct singletons and and thus . Only in the special case , i.e., , we get . This is the consequence of the usual practice of excluding from hierarchies to have the singleton as inclusion-minimal elements, which in turn is motivated by the 1-to-1 correspondence between rooted trees and hierarchies. We note that in the context of abstract convexities, on the other hand, it is customary to enforce so that is closed under intersection [20], in which case also for . This subtlety is irrelevant for our discussion, however, since we are not interested in the trivial cases .
The following result shows that there is a 1-to-1 correspondence between hierarchies and rooted phylogenetic trees.
Theorem 2.2 ([18]).
Let be a collection of non-empty subsets of . Then, is a hierarchy on if and only if there is a rooted phylogenetic tree on with .
The sets for or, equivalently, the sets of a hierarchy are commonly referred to as clusters. In view of Thm. 2.2, every set corresponds to a vertex such that and is equivalent to . The hierarchy of a rooted phylogenetic tree is . Moreover, we say that a rooted tree corresponds to if . In particular, Thm. 2.2 ensures that for all hierarchies there is a corresponding tree and, moreover, that all results established here for hierarchies do also hold for rooted phylogenetic trees and vice versa. As an immediate consequence, we can express the closure as
(2) |
We therefore have if and only if for all . Given a set with (and corresponding vertex that satisfies ), we call a child cluster of if for some . Hence, the child clusters of are exactly the inclusion-maximal sets in that satisfy . For simplicity of notation, we will think of the leaves of in this case as the , i.e., . A phylogenetic tree is a refinement of the phylogenetic tree on the same set if the corresponding hierarchy is a refinement of , i.e., if .
Unrooted Trees and Split Systems
An unrooted tree is phylogenetic if all non-leaf vertices have degree at least three. A split on is a partition of the set into two disjoint non-empty subsets and . In every tree on , we can associate with every edge the split where and are the leaf sets of the two (not necessarily phylogenetic) trees and , respectively, obtained from by deletion of . An unrooted phylogenetic tree is determined by its split system . To be more precise, there is a 1-to-1 correspondence between unrooted phylogenetic trees with leaf set and split systems that (i) contain all “singleton splits” and (ii) are “compatible” in the sense that, for any two splits , at least one of the four intersections , , , or is empty [4]. In this case, there is a unique (up to isomorphism) tree with . We call such split systems tree-like. An unrooted tree is a refinement of if .
We note that this is a special case of the analogous result for so-called -trees, see e.g. [18, Prop. 3.5.4]. In an -tree, a set of “taxa” is mapped (not necessarily injectively) to the vertex set of a rooted or unrooted tree [18]. The phylogenetic (rooted or unrooted) trees considered here are a slightly less general construction that identifies the taxa set with the leaf set and insists that distinct taxa are represented by distinct vertices in the underlying trees. That is, they are equivalent to -trees for which is bijectively mapped to .
Remark.
Throughout this contribution, we assume that is finite and . Moreover, all rooted and unrooted trees are phylogenetic unless explicitly specified otherwise.
3 Main Ideas and Results
Let be a (rooted or unrooted) tree with leaf set and be a subset of edges. Removal of disconnects into a forest whose connected components induce the partition on the leaf set . Of course, it may be possible that removal of the edges separates inner vertices, e.g., if all incident edges to an inner vertex are in . This, however, does not change the fact that we still obtain a partition of after removal of the edges in . We will refer to the edges as separating edges. Fig. 1 shows two examples for sets of separating edges, and as indicated by the dashed lines, for a given tree . The partition is a “representative partition” for , i.e., all sets in appear as clusters in : , , and . In contrast, the partition contains the set which is not a cluster in , and thus, is not a representative partition for .

The notion of compatibility of partitions and trees used here is defined in terms of separating edges and the partitions that they induce:
Definition 3.1.
Let be a partition of and let be a rooted or unrooted tree with leaf set . Then and are compatible if there is a set of separating edges such that .
In case is rooted (or unrooted) and compatible with , the corresponding hierarchy (or split system , resp.) are said to be compatible with .
As we shall see in Lemma 4.1, we can always find a tree on that is compatible with for a given partition of . In particular, the tree corresponding to the hierarchy is always compatible with , see and in Fig. 2 for an illustrative example.

However, Fig. 2 also shows that there can be multiple different trees on that are compatible with . The main result of Section 4, Thm. 4.5, is a characterization of compatibility of partitions and hierarchies (and thus rooted trees).
As it turns out, not all hierarchies are compatible with a given partition . In many applications, hierarchies (and their corresponding rooted trees) are not necessarily fully resolved, even though it is often assumed that the “ground truth” is a binary tree. In situations where and are not compatible, it is therefore of interest to ask whether it is possible to find a refinement of that is compatible with .

An example of a tree that is not compatible with a partition but that admits a compatible refinement is shown in Fig. 3. To see that and as in Fig. 3 are not compatible, the edges in have been colored in orange and cyan if they lie on a path connecting two elements from or , respectively. Clearly, none of these edges can be a separating edge. In particular, since all edges in are colored, and cannot be separated by any subset . For similar reasons the tree as in Fig. 3 is not compatible with . Since is already fully-resolved, it does not admit a compatible refinement.
Definition 3.2.
A tree and a partition are refinement-compatible (r-compatible for short) if there exists a refinement of that is compatible with .
In case is rooted (unrooted) and refinement-compatible with , the corresponding hierarchy (split system , resp.) are said to be refinement-compatible with .
By definition, compatibility implies r-compatibility since every tree is a refinement of itself. In Section 5, we show that refining a hierarchy or a rooted tree that is already compatible with a partition never destroys compatibility. As a main result of Section 5, we obtain a characterization of r-compatibility in terms of the simple condition that no set overlaps with two distinct sets (cf. Thm. 5.7). We later utilize these results to derive simple linear-time algorithms for both recognition of compatibility of a partition and a tree, as well as the construction of a compatible refinement if one exists in Section 7.
Even though compatibility of partitions and rooted trees is linear-time-decidable, the situation appears substantially more complicated when systems of partitions of are considered rather than single partitions. By definition, (or equivalently, the hierarchy, ) and each are compatible if and only if for some subset , . In this case, we say that (equiv. ) and are compatible. It is natural, then, to ask whether for a given system of partitions , there exists a tree such that and are compatible:
Problem (Existence of Tree compatible with Partition System (ExistTP)).
Input: | A partition system on . |
---|---|
Question: | Is there a tree on such that and are compatible? |
Since every tree on is a refinement of the star tree on , ExistTP is a special case of the following more general problem:
Problem (Compatibility of Tree and Partition System (CompaTP)).
Input: | A tree with leaf set and a partition system on . |
---|---|
Question: | Is there a refinement of such that and are compatible? |

The difficulty of both ExistTP and CompaTP stems from the fact that refinements of the underlying tree that are necessary to obtain compatibility with individual partitions in may contradict one another. Consider the tree and three of its possible refinements and as shown in Fig. 4. In this example, is not compatible with , and the refinement is compatible with but not with , . However, there is no common refinement of and that is compatible with . On the other hand, is compatible with both and . Hence, admits a refinement that is compatible with . An example of a tree for which every partition is r-compatible with but there is no refinement of that is compatible with is provided in Fig. 7. Hence, ExistTP and CompaTP seem to be inherently difficult and indeed both are NP-complete decision problems, see Thm. 7.10.
In Section 6, compatibility of partitions with splits systems and their equivalent representations as unrooted phylogenetic trees is considered. As shown in Prop. 6.2, compatibility of partitions with unrooted and rooted trees are closely related. In fact, a partition is compatible with an unrooted tree if and only if it is compatible with any rooted version of this tree. Further characterization of (refinements of) split systems and unrooted trees being compatible with partitions will be established (cf. Thm. 6.4 as well as Lemma 6.7 and 6.8).
Section 7 is dedicated to algorithmic considerations and the complexity of deciding whether (systems of) partitions are compatible (with refinements) of hierarchies and split systems, represented by rooted and unrooted trees, respectively. As we have seen above, there are edges of a tree that can never be separating edges since their removal would break down some set . In order to identify such edges, we provide an edge coloring that we have already used in an informal way to demonstrate incompatibility in the example in Fig. 3:
Definition 3.3.
Let be a tree on and a partition of . The -(edge-)coloring of is the map that is given by
The key property of this edge coloring is that any edge with lies on some path between two vertices that are contained in the same set of , and thus, cannot be a separating edge. In contrast, all edges for which do not separate any two leaves that are in the same set of , and thus can be safely added to the set of separating edges. A key result, proven in Section 7, is that and are r-compatible if and only if for every (Prop. 7.3). In this case, the coloring can be computed in linear time and, based on this, deciding the existence of and finding a (refinement of) a tree that is compatible with a partition can be in done in linear time as well. To establish compatibility of and , it suffices to rule out the existence of a vertex that is incident with two differently colored edges (cf. Thm. 7.6). If and are r-compatible but not compatible, the vertices that violate the latter condition coincide with for some and can be refined by collecting all children for which under a newly created vertex (cf. Thm. 7.5).
4 Compatibility of Partitions and Hierarchies
We first show that, for every partition, there is a compatible hierarchy.
Lemma 4.1.
For every partition of , the set system is a hierarchy that is compatible with . In particular, every partition of is compatible with a rooted tree on .
Proof.
Since is a partition of , no two sets overlap. This remains true for and thus is a hierarchy. Moreover, it is easy to see that if all edges incident to the root of the tree corresponding to are added to , then whenever . In case , is a star tree and we have . ∎
The following result is a key step for the characterization of compatible partitions and hierarchies. In particular, it shows that, for any two elements , the set can only intersect with at most one child of .
Lemma 4.2.
If a partition of and a rooted tree on are compatible, then, for all , there are no two distinct children such that and .
Proof.
Let and put . Since is compatible with , there is a set of separating edges such that . Assume, for contradiction, that there are two children and a set such that and . Since , there are children such that and . The vertices are not necessarily distinct from . However, we can assume w.l.o.g. that . Since is compatible with , there is no separating edge on the path from to for any and . In particular, the path from to does not contain a separating edge. Similarly, for all vertices and , there is no separating edge on the path from to , and the path from any such to does not contain a separating edge. Since and both and are children of , the path from to is the concatenation of the path from to and the path from to , and thus, does not contain a separating edge. Hence and are in the same connected component of ; a contradiction since and are disjoint by assumption. ∎

Using the correspondence between phylogenetic trees and hierarchies (cf. Thm. 2.2), one can translate Lemma 4.2 to the language of hierarchies:
Corollary 4.3.
Suppose a partition of and a hierarchy on (with corresponding tree ) are compatible and are distinct. Then and .
Lemma 4.4.
Suppose a partition of and a hierarchy on are compatible, and . The set does not overlap with any .
Proof.
Assume, for contradiction, that and overlap for some . Since and , we have and, by Lemma 4.2, . Since overlaps with and is a hierarchy, we have . Now let be the tree of and set . Since is the unique inclusion-minimal element in that contains , there are two distinct children such that and . Let and . Since and are compatible, there is a set of separating edges such that . Since , the unique path from to in , and in particular the path from to , cannot contain any separating edge. Furthermore, since and overlap, there is an element . Since and are disjoint, we have . Hence, for some child . Since and are distinct children of , we can assume w.l.o.g. that . Moreover, together with the fact that is the unique inclusion-minimal element in that contains , implies that there are two distinct children that satisfy and . Since , the unique path from to in , and in particular the path from to , cannot contain a separating edge. Since and both and are children of , the path from to is the concatenation of the paths and , and thus, does not contain a separating edge. Therefore, there is a set with . Now, and implies that ; a contradiction. ∎
Theorem 4.5.
Let be a hierarchy on and be a partition of . Then, and are compatible if and only if the following two conditions are satisfied for all :
-
(i)
is a union of sets of .
-
(ii)
If , then .
Proof.
Let be the tree with .
Assume that is compatible with . First observe that, for all , implies that is the union of sets of if and only if does not overlap with any . Hence, Condition (i) follows immediately from Lemma 4.4. Condition (ii) follows immediately from the fact that implies and Cor. 4.3.
Now suppose (i) and (ii) holds. Consider the tree and the following set of separating edges
(3) |
Thus, an edge is a separating edge if and only if and thus for some .
We first show that any two distinct are separated by at least one separating edge in , i.e., there is no path in connecting any vertex with any vertex . To this end, let be chosen arbitrarily but distinct. By contraposition of Condition (ii), we have . Therefore and since is a hierarchy, we have to consider the two cases (a) , and (b) or . Case (a) corresponds to the situation in which and are incomparable in , which is, moreover, only possible if neither nor are the root. Hence, the edges and are contained in and every path from some to some contains these two edges. Thus, the two sets and are separated by separating edges in . In case (b), we assume w.l.o.g. that which corresponds to the situation in which . Hence, is not the root of , and we have . Since contains all elements in , the two sets and are completely separated by this separating edge if does not contain any element of . Thus, assume for contradiction that . This together with the facts that , and is inclusion-minimal for implies that and overlap. Therefore, is not the union of sets of ; a contradiction to Condition (i).
It remains to show that no set contains two elements which are separated by a separating edge in . Thus, assume for contradiction that there is such an edge lying on the path that connects two for some . We can assume w.l.o.g. that and . Since , we have that corresponds to for some . Since but and by similar arguments as above, the sets and overlap which is again a contradiction to Condition (i).
In summary, we have for all and thus, . Therefore, is compatible with . ∎

The second part of the proof of Theorem 4.5 implies a simple algorithm to determine whether and are compatible and to construct a (minimal) set of separating edges that realizes the partition on : In Section 7, we derive a linear-time compatibility test algorithm. The set of separating edges in Eq. (3) can also be constructed in polynomial time. Moreover, if there is an element with , then the set as in Eq. (3) is minimal because it contains, by construction, separating edges, i.e., the minimal number of splits required to decompose a tree into connected components. If there is no with , however, then there is one edge too many. A minimal set of separating edges can easily obtained in this case by omitting the separating edge for one of the sets that are inclusion-maximal among the inclusion-minimal sets , i.e., those which correspond to vertices that are closest to the root (see Fig. 5 and 6 for further examples). We summarize the latter discussion in the following
Lemma 4.6.
Suppose that a partition of and a rooted tree on are compatible. Then, there always exists a minimum-sized set of separating edges such that and . In particular, if is chosen as in Eq. (3), then and if and only if there is no with .
The following result is a simple consequence of Thm. 4.5 and the discussion above.
Corollary 4.7.
Let be a partition of , a hierarchy on with corresponding tree , and let be the edge set defined in Eq. (3). Then and are compatible if and only if .
Lemma 4.8.
If the partition and the hierarchy on are compatible, then the following conditions hold for all : If and , then .
Proof.
By Property (i) of Thm. 4.5, is a union of sets of , and thus either (a) or (b) . In case (a), we have by isotony and idempotence of the closure. Similarly, we obtain from the assumption . Therefore, . By Property (ii) of Thm. 4.5 this is a contradiction to . Thus, case (a) is impossible and we always have . ∎
Corollary 4.9.
The partition and the hierarchy are compatible if and only if
(4) |
holds for all .
5 Compatibility of (Systems of) Partitions and Refinements of Hierarchies
In many applications hierarchies (and their associated trees) are not necessarily fully resolved, even though it is often assumed that the “ground truth” is a binary tree. We show first that refining a hierarchy or a tree that is already compatible with a partition never destroys compatibility.
Proposition 5.1.
A hierarchy on and a partition of are compatible if and only if is compatible with every refinement of .
Proof.
The if-direction immediately follows from the fact that is a refinement of . For the only-if-direction, let be the tree corresponding to , denote by the set of separating edges as defined in Eq. (3). Thus, we have if and only if . Since and are compatible, we can apply Cor. 4.7 to conclude that . Therefore, the path connecting any two vertices and from distinct contains at least one edge in . Let be the set of all with and . For all , therefore, there is an such that .
Now consider an arbitrary refinement of and the corresponding refinement of . By construction, we have . For each , we set and set
Since by construction, we have and thus and, in particular, is well-defined.
Now consider two arbitrary two vertices and in distinct . As argued above, the path connecting them in contains an edge . By construction, we have . We can assume w.l.o.g. that and . This together with implies that the path connecting and in contains the edge . By construction of and since , we have .
It remains to show that the path in connecting any two for some never contains an edge that is in . Assume, for contradiction, that this is the case, i.e., there is an edge such that w.l.o.g. and . By construction of , and for some . Since and are compatible, Thm. 4.5(i) implies that is the union of sets of ; a contradiction to the fact that and .
In summary, we conclude that , and, therefore, is compatible with . ∎
We next provide a necessary condition for r-compatibility. We will show later that this condition is also sufficient.
Lemma 5.2.
Let be a hierarchy on and a partition of . If is compatible with a refinement of , then there is no set that overlaps with two distinct sets .
Proof.
Let and with be compatible. Assume, for contradiction, that some overlaps with two distinct . First observe that . Now consider and . Since and is compatible with , we have by contraposition of Thm. 4.5(ii). Since overlaps with and , both and , and thus and , contain elements that are in and as well as elements that are not in . This together with the fact that , , and are all sets in implies that and . Therefore we have . Since , is a hierarchy, and , this implies that either or . Assume w.l.o.g. that . Since overlaps with and , we have . However, since and is inclusion-minimal for in the hierarchy , we conclude that and overlap. Hence, is not the union of sets of . This violates Condition (i) for compatible partitions in Thm. 4.5; a contradiction. ∎
To show that the converse of Lemma 5.2 is satisfied as well, we will explicitly construct a compatible refinement of . To this end, we introduce the following subset of a partition :
(5) |
The set contains the sets for which or, equivalently, the vertex is “not resolved enough”:
Proposition 5.3.
A hierarchy on and a partition of are compatible if and only if is empty.
Proof.
By Thm. 4.5, and are compatible if and only if, for all , the following two conditions are satisfied: (i) is a union of sets of , and (ii) implies . Hence, it suffices to show that is equivalent to these two conditions.
First assume, for contraposition, that is not empty. Hence, there are distinct such that and . If , then Condition (ii) is violated. If on the other hand , then the fact that is inclusion-minimal for implies that . This together with in turn implies that is not a union of sets of ; a violation of Condition (i).
The converse can be shown by very similar arguments starting with the assumption that Condition (i) or (ii) is not satisfied. If Condition (i) does not hold, then and, in particular, there is a such that and overlap. Since is a hierarchy, it holds that . The latter two arguments imply since it contains . If Condition (ii) does not hold, then there are two distinct elements with . Hence, implies . Thus, . ∎
In particular, the set can be used to characterize the cases in which a compatible refinement of exists and, if this is the case, to construct such a refinement.
Definition 5.4.
Let be a hierarchy on and a partition of such that no set overlaps with two distinct sets . We define, for , the subset of as
where are the child clusters of for which . Moreover, the subset of is given by
Note that, by Eq. (5), for every , the set cannot be a singleton. Since is inclusion-minimal w.r.t. by definition, we immediately conclude that and thus the are proper subsets of . In terms of the tree corresponding to , the vertices are the children of with .
Lemma 5.5.
Let be a hierarchy on and a partition of such that no set overlaps with two distinct sets . Then the following two statements are satisfied:
-
1.
For each , it holds and, in particular, .
-
2.
The set is a hierarchy.
Proof.
To show (1), set . By construction, we have . Assume for contradiction that , i.e., all child clusters of in have a non-empty intersection with . By definition of , we must have and, in particular, has a child cluster satisfying for some such that . Thus and , i.e., and overlap. Since and is inclusion-minimal w.r.t. , we conclude that . On the other hand, since and are disjoint, we have . Thus and overlap. Thus there are two distinct sets that overlap with . This contradicts the assumption that no such pair of sets exists, hence . Since by construction, is a proper subset of . This together with the fact that is a hierarchy and is the union of at least two child clusters of in implies that .
We proceed by showing that the set system is again a hierarchy and thus, that (2) is satisfied. To this end, consider first one of the newly-created sets and an arbitrary set . If for all , then . Otherwise, there is some such that . Since both and are sets of the hierarchy , this implies that , or . In the latter case, we have by the hierarchy property of and the fact that is a child cluster of . Hence, we have for all . Now consider two newly-created sets and for distinct , and assume first that . If , then and immediately imply that . Otherwise, we can assume w.l.o.g. that since and are both sets in the hierarchy . This together with the fact that is the union of child clusters of implies that either for some , and thus , or, if no such exists, . It remains to consider two distinct sets with . If and overlap, then there is by construction a child cluster of in such that and . Since, moreover, is inclusion-minimal w.r.t. and , this implies that overlaps with both and ; a contradiction to the assumption. Thus and are disjoint. In summary, no two sets in overlap. Since , and for all , we conclude that is a hierarchy that refines . ∎
The final step towards characterizing r-compatibility is a sufficient condition for to be compatible with .
Lemma 5.6.
Let be a hierarchy on and a partition of . If there is no set that overlaps with two distinct sets , then the hierarchy is compatible with .
Proof.
Recall that is a hierarchy by Lemma 5.5(2). To prove that is compatible with , we show that the Condition (i) and (ii) in Thm. 4.5 are satisfied for and .
To show Condition (i) in Thm. 4.5, we assume, for
contradiction, that there is a set for which
is not the union of sets of . Thus
there is a set such that
and
. We distinguish the two
cases (1) and
(2) .
In case (1), we have for some
. Thus is the union
of sets
all satisfying
. Since
and by construction
, we have . Since
, there must be some
such that
. Since and
, we have
. On the other hand, we also have
because and
for all . Since and are
disjoint and both have a
non-empty intersection with , we also obtain and
. In summary, the set
overlaps with the two distinct sets ; a contradiction
to the assumption.
In case (2), we have . Since
is inclusion-minimal for in and
we conclude that
is also inclusion-minimal for in , and thus
. Since
,
, and is a
hierarchy, we conclude that .
In summary, we obtain . Hence, we
have added a set that satisfies and, by the
arguments above, . Therefore,
is not inclusion-minimal for in ;
a contradiction.
To show Condition (ii) in Thm. 4.5, we assume, for
contradiction, that
for two distinct . As above, we distinguish the two
cases (1’) and
(2’) .
In case (1’), we have for some .
Thus is the union of
sets all satisfying
. Since and are distinct, we can assume
w.l.o.g. that . Since , there must be
some such that
. Since and is
inclusion-minimal for in , we have
. On the other hand, we also have
since and for all . Since and are disjoint
and both have a
non-empty intersection with , we also obtain and
. In summary, the set
overlaps with the two distinct sets ; a contradiction
to the assumption.
In case (2’), we have . Together with the facts that
is inclusion-minimal for in and that
, implies that
is also inclusion-minimal for in , i.e. . Analogous arguments imply .
Since we conclude that . Therefore, we have added a set that satisfies both
and, by the arguments above, .
Therefore, is not inclusion-minimal for in ; a
contradiction.
In summary, is a hierarchy on such that Conditions (i) and (ii) in Thm. 4.5 are satisfied for all . Thus is compatible with the refinement of . ∎
Theorem 5.7.
A hierarchy and a partition on are r-compatible if and only if no set overlaps with two distinct sets .
Proof.
It is worth noting that the characterization of r-compatibility in Thm. 5.7 implies neither Property (i) nor Property (ii) in Thm 4.5. As a counterexample to (i), consider the hierarchy on that comprises in addition to and the singletons only the set , and the partition with and . We have and . Clearly, overlaps and thus violates (i). On the other hand, it admits the refinement , which is compatible with . As a counterexample to (ii), consider comprising only and the singletons. Here, we have , while both refinements and are compatible with .
We continue with considering systems of partitions of rather then single partitions. By definition, (or equivalently the hierarchy ) and each are compatible if and only if for some subset , . In this case, we say that (equiv. ) and are compatible. It is natural, then, to ask whether for a given system of partitions , there exists a tree such that and are compatible.
Proposition 5.8.
Let be a hierarchy on and be a collection of partitions of . The following two statements are equivalent
-
(1)
There is a refinement of that is compatible with .
-
(2)
Each admits a compatible refinement of such that is a hierarchy.
Proof.
Let be a refinement of that is compatible with and thus, with every . Now, put , . Hence, is a hierarchy that is compatible with . Conversely, if is a hierarchy, then it is, in particular, a refinement of every , . Now set . By Prop. 5.1, is compatible with with every . ∎
Prop. 5.8 immediately implies
Corollary 5.9.
There is a tree that is compatible with a collection of partitions of if and only if, for every , there is a tree that is compatible with such that is a hierarchy.
Proof.
Since every tree on is a refinement of the star tree on , every hierarchy on is a refinement of . Hence, the existence of some hierarchy or, equivalently, some tree that is compatible with is equivalent to the existence of a refinement of of that is compatible with . ∎
As illustrated in Fig. 4, and might be compatible, although there are refinements of compatible with , whose union does not form a hierarchy. Fig. 7, furthermore, shows an example of a partition system that is not compatible with any refinement of . We will show in Section 7 that deciding whether or not such a common refinement exists is an NP-complete problem.

The partitions of a set form a complete lattice [2, Sect. 4.9]. The common refinement of two partitions and of is
(6) |
The common refinement operation is associative and commutative. The common refinement of an arbitrary system of partitions, therefore, consists of all distinct sets . The following results shows that the common refinement of partitions that are compatible with is compatible with as well.
Proposition 5.10.
Let be a tree with leaf set and a collection of partitions on that are all compatible with . Then, is compatible with .
Proof.
Let be a tree with leaf set . We show first that for all subsets it holds that
(7) |
To see this, let . For all the following statements are equivalent
-
1.
for some ,
-
2.
there is no separating edge in and in that is on the path between and in ,
-
3.
there is no separating edge in that is on the path between and in , and
-
4.
.
Hence, . Similarly, the latter equivalent statements hold for all and thus, .
Now assume that are all compatible with . Hence, for each there is a set such that . By the latter arguments and since is commutative and associative, we can conclude that and thus, is compatible with . ∎
The converse of Prop. 5.10 is not true in general. As an example, consider the tree as shown in Fig. 7 and the two partitions and . Both and are not compatible with , however, their common refinement is.
The refinement supremum or join is obtained by recursively unifying any two sets whenever there is such that and . The analogue to Prop. 5.10 does not hold for the refinement supremum. To see this, consider the tree on with hierarchy and the partitions and . Both and are compatible with (just define the edges incident to the for respective singletons as separating edges). However, is not compatible since, for the hierarchy corresponding to , we have ; a contradiction to Condition (ii) in Thm. 4.5. In [19], a notion of local comparability of partitions is considered: iff for all and . The example above satisfies . Hence, local comparability of partitions and compatible with a given hierarchy is also not sufficient to imply compatibility of and .
6 Compatibility of Partitions with Split Systems and Unrooted Trees
Throughout this section, we will assume that all unrooted trees are phylogenetic as well and have at least three leaves. In particular, therefore, they have at least one inner vertex. Not surprisingly, there is a very close connection between the case of rooted and unrooted phylogenetic trees.
Proposition 6.1.
Let be an unrooted tree with leaf set and let be a partition of . Then and are compatible if and only if and the rooted tree are compatible, where is obtained by rooting at an arbitrary inner vertex.
Proof.
Note that rooting at an arbitrary inner vertex results in a phylogenetic rooted tree , since does not contain vertices of degree two. The equivalence now follows immediately from the definition and the fact that, viewed as pair of a set vertices and a set of edges, and therefore, for some subset . ∎
In fact, it is not necessary to root at an inner vertex, we may as well place the root as a subdivision of any edge. This allows us to connect unrooted trees directly to hierarchies:
Proposition 6.2.
Let be a hierarchy with corresponding rooted tree with leaf set and let be a partition of . Then and are compatible if and only if the unrooted tree obtained from (by suppressing a possible degree-two root) is compatible with .
Proof.
If the root of has degree greater than two, then the tree is phylogenetic and thus, can we apply the same arguments as in the proof of Prop. 6.1 to establish the equivalence. If has degree , then the tree is not phylogenetic and both edges and that are incident to define the same split . Since contains a split only once, the unique phylogenetic tree defined by the split system is obtained from by suppressing the root , i.e., . Now let be such that . If or is contained in , we add the edge to to obtain the set and, otherwise, we put . It is now easy to see that and thus, and are compatible. ∎
As outlined in Section 2, every unrooted phylogenetic tree is determined by its split system and for a tree-like split system there is a (unique) unrooted tree with . Since unrooted trees are so intimately related with split systems, it is interesting, therefore, to ask whether the compatibility of rooted trees or hierarchies and partitions can also be expressed in an interesting way in terms of split systems.
Corollary 6.3.
For every partition of a non-empty set , the split system with
(8) |
is always tree-like and compatible with .
Proof.

Compatibility of a partition with a split system of some tree, however, does not imply that , see Fig. 8. Suppose that and are compatible, i.e., that . The set of separating edges then corresponds to the set
of splits. Then we have either
(9) |
since none of the edges separates two vertices of . Furthermore, for any two distinct sets there is a split such that, w.l.o.g., and because there must be an edge in separating and in . Taken together, therefore, we observe that every set satisfies
(10) |
Conversely, suppose that an arbitrary split system satisfies Eqs. (9) and (10) (replace by in the equations). By Eq. (10), for every with and thus , there is a split such that and and thus there is an edge that separates and in . Add all these edges to the set . Eq. (9) ensures that no two elements in are separated by an edge in . Thus, for every . Hence, and are compatible if and only if satisfies Eqs. (9) and (10). Recall that splits on are partitions of and that the common refinement of a set of partitions of is the partition whose sets are the intersections of sets appearing in any of the partitions in the system that have a least one point in common. Thus, satisfies Eqs. (9) and (10) if and only if . We summarize this discussion as
Theorem 6.4.
Let be a partition and be a tree-like split system. Then and are compatible if and only if there is subset such that is the common refinement of . In this case, for the tree with and .
Since every refinement of a tree corresponds to a tree-like split system that satisfies , we immediately obtain a characterization of tree-like split systems that admit a refinement that is compatible with a partition .
Corollary 6.5.
Let be a partition and be the split system associated with a tree . Then is compatible with a refinement of if and only if there is a set of splits such that (i) is the common refinement of and (ii) is tree-like.
The characterizations in Thm. 6.4 and Cor. 6.5 are not constructive, i.e., they do not provide recipes to construct . Clearly, we can directly employ Prop. 6.1 and the linear-time algorithm provided in Section 7 to check whether a split system and a partition are compatible or not. Nevertheless, we provide the following two lemmas to provide a further constructive characterization that makes use of a step-wise decomposition and might be of further theoretical interest.
Definition 6.6 ([18, Def. 6.1.1]).
Let be an unrooted phylogenetic tree with leaf set and corresponding split system . Then the restriction of to a non-empty subset of leaves is the tree for which
Somewhat surprisingly, it suffices to check whether there is a single element such that and are compatible, to determine whether and are compatible as shown in the next
Lemma 6.7.
Let be an unrooted tree with leaf set and let be a partition of . Then and are compatible if and only if , or there is a set such that (i) and (ii) and are compatible.
Proof.
Clearly, every tree on is compatible with the partition since . Thus, assume in the following. First suppose that and are compatible. We first show that there is a set such that . Since and are compatible, there is an edge set such that and a corresponding set of splits . Since by assumption, both sets and are non-empty. Therefore, we can pick an arbitrary split . By Eq. (9), every satisfies either or . In particular, there must be some with . If there is no other split with , then and we are done. Otherwise consider the split with . Then either there is another split with or . Since is finite, we eventually reach a split with , i.e., we can choose in Condition (i). Now let be the edge in with . The restriction of to is therefore simply the connected component of that does not intersect . Thus , i.e., Condition (ii) is satisfied.
Conversely, suppose there is an such that and is compatible and is a split corresponding to an edge in . Then there is an edge set such that . Since , there is an edge in connecting the subtrees and . In particular . Therefore, we have , i.e., and are compatible. ∎
Lemma 6.8.
Let be a partition of , and an unrooted tree with leaf set . Then admits a refinement compatible with if and only if , or there is an such that (i) for every we have at least one of , , , or and (ii) the restriction admits a refinement that is compatible with .
Proof.
For brevity, we write and for the split systems of the two trees and . Clearly, every tree on is compatible with the partition since . Thus, assume in the following. Suppose the refinement (with corresponding split system ) of is compatible with . By Lemma 6.7, there is an such that is a split in and is compatible with . Clearly, is a refinement of , i.e., admits a refinement that is compatible with . Since identifies the tree , it is in particular a tree-like split system. Since is a refinement of , we have . In particular, therefore and every split must have at least one empty intersection , , or . Depending on which of the four intersections is empty, we have one of the following situations , , or , respectively.
Conversely, suppose there is an satisfying conditions (i) and (ii). Then is a tree-like split system because has this property and, for any , the alternatives in (i) amount to , , , or , respectively. Moreover, and thus contains the singleton splits for all . The split system therefore defines a refinement of . Furthermore, we have and since either or the difference between and is only the expansion or contraction of the edge identified by the additional split . By condition (ii), there is a refinement of the restriction that is compatible with the partition of . Thus there is also a refinement of such that the restriction is compatible with . Let be the corresponding set of separating edges. Then . Thus and are compatible. ∎
Lemma 6.7 and Lemma 6.8 are associated with a simple algorithmic intuition. Among the sets , at least one corresponds (at least in a refinement) to a connected component in the forest obtained by deletion of a single edge, and thus to a split that is either already contained in or that can be used to refine the tree . This immediately yields an algorithm on the split systems that, in each step, finds a set that satisfies condition (i) and then proceeds to checking the restriction to . While the formulation in terms of tree-like split systems is of some theoretical interest, it seems to be of little practical use compared to the linear-time algorithms described in the following section.
7 Algorithms and Complexity
In the following, we will first derive results for the complexity of checking whether and are r-compatible and the construction of the edge coloring which we then use for the special case of compatibility of and . Finally, we investigate the complexity of finding a refinement that is compatible with a system of partitions. In view of Prop. 6.2, we will assume that is a rooted tree in this section unless explicitly stated otherwise.
Recall that, for a tree on and a partition of , the map assigns to as “colors” all sets for which lies on a path connecting two elements .
Observation 7.1.
Let be a tree on and a partition of , , and . Then
Lemma 7.2.
Let be a hierarchy on , the corresponding tree on , and a partition of . Moreover, let . Then, overlaps with two distinct sets if and only if with .
Proof.
Since , there is a unique vertex with and thus, . First assume that overlaps with two distinct sets . Thus, we have and for . By Obs. 7.1, this implies . For the converse, assume that with . By Obs. 7.1, we have and for . Since, in addition, and are disjoint, we have for . In summary, overlaps with the two distinct sets and . ∎
Proposition 7.3.
Let be a tree on and a partition of . and are r-compatible if and only if for every .
To find a refinement of a hierarchy that is compatible with , it is crucial to know the set , and, in particular, the sets for the . However, an explicit construction of the latter is not needed since the property of being equal to for some or not is entirely determined by the colored edges incident to .
Lemma 7.4.
Let be a hierarchy on , the corresponding tree on , a partition of , and . Then, for some if and only if the following two conditions are satisfied for :
-
(a’)
for some and either or .
-
(b’)
for some and some color .
Proof.
Let for some . By definition, this is, if and only if, (a) with and (b) there is some satisfying and . In particular, is an inner vertex in this case. Thus, we have . The definition of directly implies Condition (a’). Now, let such that and . Hence, , and thus, there must be a child with . However, since and since is inclusion-minimal for , we also have . Taken together, we obtain by definition of and thus Conditions (b’).
Now assume that Conditions (a’) and (b’) are satisfied. Let for some . Hence, . Moreover, the two possible cases or imply that there must be a second edge for some that is colored with due to the definition of and the fact that . Therefore, . This together with and implies Condition (a) . Condition (b’) implies that for some (and thus ) and . Together with , these two arguments imply which is equivalent to . In summary, Condition (b) is satisfied. ∎
We are now in the position to show that r-compatibility can be decided in linear time.
Theorem 7.5.
Given a rooted tree on and a partition of , it can be decided in time whether and are r-compatible. In this case, the edge coloring and a compatible refinement can also be constructed in time.
Proof.
We employ the sparse-table algorithm described in [1], which, following an -preprocessing step, enables constant-time look up of for any . We represent by a (hash-based) map data structure that contains the edges as keys, and (hash-based, initially-empty) sets as values, which will be filled with the elements in . The sets in can be represented by pointers to these sets or by integer indices when used as colors. We next show that can be constructed in time. When an edge is colored with (i.e., is added to ), we check in constant time whether still has at most one color. If this is not the case, we stop the algorithm since and are not r-compatible by Prop. 7.3. Conversely, Prop. 7.3 implies that if we color each edge at most once, then and are r-compatible.
We process every as follows. First, we initialize the set of previously visited vertices of as . Moreover, we initialize the current last common ancestor as , which we will update stepwise until it equals in the end. To this end, for each leaf (if any), we query and move from upwards along the tree. Each edge encountered during the traversal is colored with , and is added to visited. The traversal stops as soon as is in visited or equals newLCA. In case we have , which by definition of newLCA holds if , we perform the same bottom-up traversal starting from curLCA. As a final step in the processing of , we set . One easily verifies that, after processing all vertices in , we have exactly colored the edges in the minimal subtree of that connects all leaves in . Moreover, each edge considered in the bottom-up traversals is colored with and required only a constant number of constant-time queries and operations. Similarly, the additional operations needed for each (i.e., set initialization, query, comparison, and update of the last common ancestor) are performed in constant time. Since the algorithm stops as soon as an edge would be colored with two colors, at most one edge is considered twice in the bottom-up traversal. Since is a phylogenetic rooted tree, we have . In total, therefore, the traversals of the tree require operations. In addition, a constant effort is required for each of the vertices in the disjoint sets in . Thus can be constructed in time.
It remains to show how a compatible refinement of can be constructed. Put and . First note that all inner vertices that need to be resolved correspond to some (i.e., ) such that for some . By Lemma 7.4, it suffices to solely check the colorings for the edges incident to according the two conditions in Lemma 7.4. Therefore, we do not need to consider the sets explicitly. In this way, each edge in and its set of colors must be checked at most twice. Since for every edge , this can be done in time. By Lemma 5.6, is a refinement of that is compatible with . Instead of operating on and , we directly construct the tree corresponding to from in -time as follows. If a vertex satisfies Conditions (a’) and (b’) in Lemma 7.4, then the respective coloring of its edges imply that there is an with for some color . We refine at vertex as follows: By definition, the sets whose disjoint union gives the newly-created sets are child clusters of in . Hence, they correspond to the children for which the edge is colored with . Therefore, we remove all of these edges , and instead add the edge and the edges , where is a newly-created vertex. In particular, we have . Since this is true for all sets , the resulting tree corresponds to the hierarchy . Clearly, we introduce no more than new vertices. Since each edge has at most one color, at most operations are required. ∎
We next characterize compatibility of and in terms of the edge coloring and show that compatibility of and can be tested in linear time.
Theorem 7.6.
Let be a hierarchy on , the corresponding tree on , and a partition of . Then and are compatible if and only if there is no vertex and distinct such that and for (not necessarily distinct) children . In particular, it can be decided in time whether and are compatible.
Proof.
First suppose, for contraposition, that there is a vertex with distinct such that and for children . If , we can apply Prop. 7.3 to conclude that there is no refinement of that is compatible with . Thus, in particular, is not compatible with . Now assume that and are distinct. We distinguish the three cases (a) and , (b) and exactly one of and is in , and (c) or . In case (a), contains two colors and we again obtain incompatibility of and by Prop. 7.3. In case (b), we assume w.l.o.g. that and . Since , we have and since , it must hold that . The latter two arguments imply that there is a second edge , with and, in particular, . Moreover, since , we have and . In particular, therefore, corresponds to and is not the union of sets in . Together with Thm. 4.5, this implies that and are not compatible. In case (c), we have, by similar arguments as before, that . Hence, corresponds to both and for distinct , and thus, Condition (ii) in Thm. 4.5 is not satisfied. Therefore, and are not compatible.
To prove the converse, suppose, for contraposition, that and are not compatible. Hence, Condition (i) or (ii) in Thm. 4.5 is not satisfied. If Condition (i) is not satisfied, then there is some such that is not the union of sets in . Since by definition, the latter implies that there must be some such that and . Moreover, since and are disjoint and non-empty and both contain elements that are in , the vertex corresponding to is an inner vertex and has (not necessarily distinct) children such that and . If Condition (ii) is not satisfied, i.e., for two distinct , we can apply similar arguments to conclude that has children such that and .
It remains to show that compatibility of and can be decided in . Compatibility implies r-compatibility which can be checked in by Thm. 7.5. In particular, the edge coloring can be constructed with the same complexity in this case. The condition whether or not there is a vertex and distinct such that and for (not necessarily distinct) children can easily be checked in time by counting, for each vertex in an arbitrary traversal, the number of colors appearing on the edges leading to its children. ∎
Similar to Lemma 4.6, we obtain here a result for maximum-sized sets of separating edges. Since any edge with cannot be a separating edge, and any edge for which can always be added as a separating edge, we obtain
Corollary 7.7.
Suppose that and are compatible. Then, there is a unique maximum-sized set of separating edges , which is given by the set of edges for which . This maximum-sized set can be computed in time.
For a minimum-sized set of separating edges for compatible and , the cardinality of can be expressed as a function of alone (cf. Lemma 4.6), and is thus independent of . However, the latter is not the case for a maximum-sized set of separating edges. To see this, consider a partition of consisting of all singletons. Clearly, we have for any tree on where varies depending on how resolved a specific tree is.
In what follows, we investigate the complexity of the problem of recognizing compatibility of systems of partitions of with (refinements of) trees. In Section 3, we have introduced the two closely related problems asking whether a tree admits a refinement that is compatible with all partitions in , CompaTP, and whether a compatible tree exists at all, ExistTP. To this end, we first show that ExistTP is a simple translation of the Symm-Fitch Recognition problem, which is NP-complete [12, Thm.4.2]. In particular, this discussion will yield NP-completeness of CompaTP and ExistTP in Thm. 7.10 below.
The concept of compatible partitions and trees is intimately related to so-called undirected Fitch graphs , that is, complete multipartite graphs whose maximal independent sets form a partition of [10]. To be more precise, for a given tree with leaf set and subset , an undirected Fitch graph has an edge if and only if there is an edge in that lies on the path between and in . Therefore, if and only if and are contained in the same set . This construction was generalized in [9, 11] to Fitch maps that allow multiple colors. In the following paragraph we briefly summarize the construction of symmetrized Fitch maps [12]. We then show that a symmetrized Fitch map can be interpreted as a partition system on that is compatible with a suitably chosen tree .
Let be a set of colors for some . Moreover, for a set , we write . An edge-colored tree is a tree together with a map . Note that can be chosen arbitrarily in contrast to the -coloring of as in Def. 3.3. An edge is an -edge if for some .
A map is a symmetrized Fitch map if there is an edge-colored tree with leaf set and edge coloring such that for every pair it holds that
In this case, we say that is explained by .
For an arbitrary map and each , the monochromatic map (induced by ) is given by . By definition, we have and, in particular, if and only if . If moreover (and thus ) is a symmetrized Fitch map, then there is a tree such that if and only if there is an -edge on the path from to . In [10], it was shown that the graph representations of (monochromatic) symmetrized Fitch relations, given by if and only if , coincide with the class of complete multipartite graphs. Therefore, they can uniquely be represented by a partition of in a way that each set corresponds to a maximal independent subset of , i.e. for all . In particular, it holds that are elements of distinct sets of if and only if . Thus, if all monochromatic maps are symmetrized Fitch maps, is a partition system on .
Lemma 7.8.
Let be a map such that the monochromatic maps are symmetrized Fitch maps for all , and be an unrooted tree with leaf set . Then, there is an edge-coloring such that explains if and only if the partition of is compatible with for all .
Proof.
First note that, since the monochromatic maps are symmetrized Fitch maps for all , the partitions of are all well-defined. For the case , the statement is trivially true since, in this case, every of the (at most two) possible partitions of are compatible with a unique tree on . Thus, assume that . Let be any rooted version of . Since , we can apply Prop. 6.1, to conclude that is compatible with some partition of if and only if is compatible with . Hence, it suffices to show the statements for .
Assume there is an edge-coloring such that explains . Put , . By construction, we have for all distinct that
-
and are in distinct sets of
-
-
for some on the path connecting and
-
there is an edge in on the path connecting and
-
and are in distinct sets of .
Hence, and is compatible with for all .
Now assume that the partition of is compatible with for all . For each , define for all edges for some with and, for all remaining edges put . By construction, we have for that if and only if is contained the set as specified in Eq. (3). By Cor. 4.7, we have . Hence, if and only if there is an edge along the path between and with . Now set for all to obtain the final coloring such that explains . ∎
Lemma 7.9.
Symm-Fitch Recognition remains NP-hard if the monochromatic maps are symmetrized Fitch maps for all .
Proof.
As shown in [12, Thm. 4.2], Symm-Fitch Recognition is NP-complete. Note, if for a map there is an such that is not a monochromatic Fitch match, then cannot be a symmetrized Fitch map; a property that can be checked in polynomial-time for all [10]. Hence, under the assumption that , the NP-hard instances must, in particular, be included within the instances of Symm-Fitch Recognition for which is a monochromatic Fitch match for all . ∎
We are now in the position to establish NP-completeness of ExistTP and CompaTP.
Theorem 7.10.
ExistTP and CompaTP are NP-complete.
Proof.
By Thm. 7.6 and the fact that one can test in polynomial time whether is a refinement of by comparing their hierarchies, ExistTP and CompaTP are contained in the class NP. By Lemma 7.8, ExistTP is equivalent to the Symm-Fitch Recognition problem restricted to a certain set of instances for which, by Lemma 7.9, the problem remains NP-hard. Asking for a tree that is compatible with is equivalent to asking whether there exists a refinement of the star tree that is compatible with . Therefore, ExistTP is a special instance of CompaTP and NP-hardness of ExistTP implies that CompaTP is also NP-hard. Since both problems are in class NP, ExistTP and CompaTP are NP-complete. ∎
The latter problems become fixed-parameter tractable and thus, easier, if is “almost binary”. In [17], the resolution of a rooted tree is quantified by the normalized parameter , which varies between (star tree) and (binary tree). The quantity correspondingly measures how much deviates from being binary. Now let be the set of non-binary inner vertices of . Writing for the number of “excess children” at the inner vertex , one easily checks that . Now suppose . Then all possible binary refinements of are obtained by inserting an arbitrary binary tree between each non-binary vertex of and its children . It is well known the that the number of binary rooted leaf-labeled trees on leaves is [6]. The total number of binary refinements is therefore . From the definition of the double factorial we see that, after omitting the leading factors , has exactly factors for each , and thus, has exactly factors. Similarly, has contributing factors greater than . By ordering these factors in and , resp., one easily verifies that, since , the second product has at least as many factors as the first, and moreover, each of them is not smaller than the corresponding factor (w.r.t. the ordering) in the first product (if existent). Note that equality holds if and only if comprises a single vertex with children. Each binary refinement can be checked in time for consistency with since the consistency check for a single partition can be performed in time by Thm. 7.5 below. Thus there is an algorithm and thus, CompaTP is FPT for the parameter .
8 Concluding Remarks
We have characterized the compatibility of a partition with a hierarchy . The concept of compatibility considered here is much more general than that of a “representative partition”, i.e., the cutting of hierarchy at a particular aggregation level. Instead, it amounts to disconnecting the corresponding tree at an arbitrary set of edges , i.e., at a subset of the splits . In Section 5, we have characterized when a refinement of of a tree exists such that and are compatible. For practical application, it may be relevant to allow more general operations on the tree . A natural generalization is to allow not only refinements but also edge contraction while editing into a tree that is compatible with a partition of interest. This amounts to minimizing the cardinality of the symmetric difference of the corresponding split systems, i.e., the Robinson-Foulds distance of and .
In Section 6, we have considered tree-like split systems, i.e., split systems that can be represented by unrooted trees. Thm. 6.4 and Cor. 6.5, however, suggest to consider the compatibility of a partition and a split system in a more general setting. These characterizations may then serve as convenient definitions in a more general context: We may say that a partition and a split system are compatible if there is a set of splits such that and . In order to handle refinements in this setting, one would ask whether there is a set of splits such that . Without further restrictions, always provides a positive answer to this question. In analogy with our discussion above, it therefore seems natural to consider only split systems that belong to a certain class of interest. A refinement will be feasible only if the split system again belongs to the desired class. Natural generalizations of tree-like split systems to which the notion of compatibility may be applied include circular and weakly compatible split systems [18, 5], or the even more general Teutoburgan split systems considered in [13]. The suggested definition of compatibility in terms of split systems also provides the natural generalization to the framework of X-trees [18, 5], i.e., to trees in which the set of taxa is not restricted to the leaves of but may also appear as inner vertices of . This amounts to lifting our requirement that the trivial splits must be included in .
We have seen in Section 7 that compatibility of and and the existence of a refinement can be decided in linear time, while the extension to arbitrary partition systems is NP-complete. Several interesting open questions remain concerning the computational complexity of the Compatibility of Tree and Partition System problem and the Existence of Tree compatible with Partition System problem. Do these problems remain NP-complete if the tree corresponding to has bounded degree? What if the number of input partitions is kept constant? Since the related Symm-Fitch Recognition problem [12] is in turn closely related to the problem of Unrooted Tree Compatibility [3], which is known to be FPT in number of input trees, it is not unlikely that Compatibility of Tree and Partition System is FPT in the number of partitions. Furthermore, it is interesting to ask whether there are (easily recognizable) subclasses of partition systems for which CompaTP and ExistTP become tractable. Interesting candidates are the braids of partitions appearing in image analysis [15, 19], or the hierarchical partition systems considered in [14], for which forms a hierarchy.
Acknowledgments
This work was funded in part by the Deutsche Forschungsgemeinschaft. We thank the anonymous referees for the constructive comments and recommendations which helped to significantly improve the readability and quality of the paper.
References
- Bender et al. [2005] M. A. Bender, M. Farach-Colton, G. Pemmasani, S. Skiena, and P. Sumazin. Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms, 57(2):75–94, 2005. doi: 10.1016/j.jalgor.2005.08.001.
- Birkhoff [1967] G. Birkhoff. Lattice Theory. Amer. Math. Soc., Providence, RI, 3rd edition, 1967.
- Bryant and Lagergren [2006] D. Bryant and J. Lagergren. Compatibility of unrooted phylogenetic trees in FPT. Theor. Comp. Sci., 351:296–302, 2006. doi: 10.1016/j.tcs.2005.10.033.
- Buneman [1971] P. Buneman. The recovery of trees from measures of dissimilarity. In F. R. Hodson, D. G. Kendall, and P. Tautu, editors, Mathematics in the Archaeological and Historical Sciences, pages 387–385. Edinburgh University Press, Edinburgh, 1971.
- Dress et al. [2012] A. W. M. Dress, K. T. Huber, J. Koolen, V. Moulton, and A. Spillner. Basic Phylogenetic Combinatorics. Cambridge University Press, Cambridge, UK, 2012.
- Felsenstein [1978] J. Felsenstein. The number of evolutionary trees. Syst. Biol., 27:27–33, 1978. doi: 10.2307/2412810.
- Geiß et al. [2018] M. Geiß, J. Anders, P. F. Stadler, N. Wieseke, and M. Hellmuth. Reconstructing gene trees from Fitch’s xenology relation. J. Math. Biol., 77:1459–1491, 2018. doi: 10.1007/s00285-018-1260-8.
- Hellmuth [2017] M. Hellmuth. Biologically feasible gene trees, reconciliation maps and informative triples. Alg Mol Biol, 12:23, 2017. doi: 10.1186/s13015-017-0114-z.
- Hellmuth [2019] M. Hellmuth. Generalized Fitch graphs: Edge-labeled graphs that are explained by edge-labeled trees. Disc. Appl. Math., 267:1–11, 2019. doi: 10.1016/j.dam.2019.06.015.
- Hellmuth et al. [2018] M. Hellmuth, Y. Long, M. Geiß, and P. F. Stadler. A short note on undirected Fitch graphs. Art Discr. Appl. Math., 1:P1.08, 2018. doi: 10.26493/2590-9770.1245.98c.
- Hellmuth et al. [2020] M. Hellmuth, C. R. Seemann, and P. F. Stadler. Generalized fitch graphs II: Sets of binary relations that are explained by edge-labeled trees. Discr. Appl. Math., 283:495–511, 2020. doi: 10.1016/j.dam.2020.01.036.
- Hellmuth et al. [2021] M. Hellmuth, C. R. Seemann, and P. F. Stadler. Generalized Fitch graphs III: Symmetrized Fitch maps and sets of symmetric binary relations that are explained by unrooted edge-labeled trees. Discr. Math. Theor. Comp. Sci., 23, 2021. doi: 10.46298/dmtcs.6040.
- Huber et al. [2006] K. T. Huber, J. H. Koolen, and V. Moulton. On the structure of the tight-span of a totally split-decomposable metric. Eur. J. Comb., 27:461–479, 2006. doi: 10.1016/j.ejc.2004.05.007.
- Huber et al. [2014] K. T. Huber, V. Moulton, C. Semple, and T. Wu. Representing partitions on trees. SIAM J. Discr. Math., 28:1152–1172, 2014. doi: 10.1137/130906192.
- Kiran and Serra [2015] B. R. Kiran and J. Serra. Braids of partitions. In J. Benediktsson, J. Chanussot, L. Najman, and T. H., editors, Mathematical Morphology and Its Applications to Signal and Image Processing, volume 9082 of Lecture Notes Comp. Sci, pages 217–228, Cham, 2015. Springer. doi: 10.1007/978-3-319-18720-4˙19.
- Milligan and Cooper [1985] G. W. Milligan and M. C. Cooper. An examination of procedures for determing the number of clusters in a data set. Psychometrika, 50:159–179, 1985. doi: 10.1007/BF02294245.
- Schaller et al. [2021] D. Schaller, M. Geiß, M. Hellmuth, and P. F. Stadler. Best match graphs with binary trees. In C. Martín-Vide, M. A. Vega-Rodríguez, and T. Wheeler, editors, Algorithms for Computational Biology, pages 82–93, Cham, 2021. Springer International Publishing. doi: 10.1007/978-3-030-74432-8˙6.
- Semple and Steel [2003] C. Semple and M. Steel. Phylogenetics. Oxford University Press, Oxford UK, 2003.
- Tochon et al. [2019] G. Tochon, M. Dalla Mura, M. A. Veganzones, T. Géraud, and J. Chanussot. Braids of partitions for the hierarchical representation and segmentation of multimodal images. Pattern Recognition, 95:162–172, 2019. doi: 10.1016/j.patcog.2019.05.029.
- van de Vel [1993] M. van de Vel. Theory of convex structures. North Holland, Amsterdam, 1993.
- Vega-Pons and Ruiz-Shulcloper [2010] S. Vega-Pons and J. Ruiz-Shulcloper. Partition selection approach for hierarchical clustering based on clustering ensemble. In I. Bloch and R. M. Cesar Jr., editors, CIARP 2010, volume 6419 of Lecture Notes Comp. Sci., pages 525–532, Berlin, Heidelberg, 2010. Springer-Verlag. doi: 10.1007/978-3-642-16687-7˙69.