Ordinally Consensus Subset over Multiple Metrics
Abstract
In this paper, we propose to study the following maximum ordinal consensus problem: Suppose we are given a metric system , which contains metrics defined on the same point set . We aim to find a maximum subset such that all metrics in are “consistent” when restricted on the subset . In particular, our definition of consistency will rely only on the ordering between pairwise distances, and thus we call a “consistent” subset an ordinal consensus of w.r.t. . We will introduce two concepts of “consistency” in the ordinal sense: a strong one and a weak one. Specifically, a subset is strongly consistent means that the ordering of their pairwise distances is the same under each of the input metric . The weak consistency, on the other hand, relaxes this exact ordering condition, and intuitively allows us to take the plurality of ordering relation between two pairwise distances.
We show in this paper that the maximum consensus problems over both the strong and the weak consistency notions are NP-complete, even when there are only 2 or 3 simple metrics, such as line metrics and ultrametrics. We also develop constant-factor approximation algorithms for the dual version, the minimum inconsistent subset problem of a metric system , – note that optimizing these two dual problems are equivalent.
1 Introduction
In recent years, there have been many studies on data sets with multiple views, which can contain different sets of features from multiple sources carrying different types of information. For example, consider neuron cells in the field of neuroscience [15]. A single neuron cell could have both morphology features and RNA-sequencing information available. Simply concatenating these two types of feature sets and applying a classical single-view method may not produce meaningful results – The types of features may be different, and it is not clear how to properly weigh them when combined. Instead, there have been many approaches developed to handle multi-view data. For example, Lashkari and Golland [14], Bickel and Scheffer [4] considered using EM algorithm and (convex) mixture model on multi-view clustering; Kumar, Rai and Hal [13] and Cai et al. [5] extended spectral clustering algorithm for multi-view data. See also surveys on multi-view clustering [6], and more broadly, on multi-view learning [16].
Very often in applications, multiple views give rise to multiple metrics over the same data set . Our goal is to study whether these metrics are “consistent” over , and identify a largest subset of , called consensus, over which these input metrics are “consistent”. However, when comparing these metrics, note that the precise distance values between points in induced by different s may not have the same meaning, two metrics may not have simple, say, linear relation between them, and thus the distance values are not readily comparable (even after normalization). For example, the distance between two neuron cells based on their tree morphology can be very different from that based on their gene expression profiles. Hence in this paper, we will compute consensus under multiple metrics based on ordinal information, namely the order of pairwise distances under each metric.
More specifically, given a metric system , consisting of a set of input metrics on a discrete data set with cardinality , we propose to study the problem of finding maximum ordinal consensus of w.r.t. . Specifically, we aim to find a maximum subset , such that all metrics will have consistent pairwise distances if restricted on node set . We also call as outliers, while is our targeted ordinal consensus. The dual problem is to find the minimum inconsistent (outlier) set such that all metrics are consistent when restricted to the subset .
Our contributions. We propose two notions to measure the “ordinal consistency”, which we call strong consistency and weak consistency, respectively. Intuitively, a strong consensus means that the order of all pairwise distances among must be the same w.r.t. all input metrics. Under the weak consistency notion, roughly speaking for each pair of pairwise distances, only a plurality of input metrics (instead of all of them) need to agree on that. The formal definitions of these consistency notions are in Section 2.
Note that the maximum (ordinal) consensus subset and the minimum inconsistent subset are equivalent. In Section 3 and 4, we will show for both the strong and weak consistency definitions, finding the subset over an input of a constant number of (two for the strong case, and three for the weak case) ultrametrics or Euclidean metrics on the real line are NP-hard. These special cases imply that the problems are NP-hard if the inputs are arbitrary metrics. We also study the approximation algorithms for both the strongly and weakly minimum inconsistent subset problems. In particular, for the strongly inconsistent subset, we propose a 4-approximation algorithm with time complexity . For the weak case, we have a -time 6-approximation algorithm. See Appendix A.10 for a table summarizing our hardness results and approximation algorithms.
All missing technical details can be found in the appendix.
Some related work. We note that this maximum consensus problem has been considered before when input metrics are tree-metrics. In particular, in the scenario where the inputs are multiple leaf-labeled phylogenetic (rooted) trees, one aims to find a maximum subset of labels that are “consistent” among all inputs. In [2], Amir and Keselman proposed Maximum Agreement Subtree problem (MAST): given a set of rooted binary trees with the same set of taxa (leaf labels), find the maximum subset, such that all the given trees restricted on the subset are isomorphic. This can be considered as a special case of tree consistency [1]. The Maximum Agreement Supertree problem (SMAST) problem is studied in [3]: Here for the given trees , the leaf label set for input trees may not be same. The goal is to find a tree with such that is maximized and for each tree , the subtree is isomorphic to (where is the subtree of restricted on leaf set ).
These definitions of consistency over trees however are not identical from the ordinal consistency we propose. These problems are related to, but still different from, our maximum ordinal consensus problem if the input metrics are ultrametrics. An ultrametric can be represented by a corresponding representing tree where each tree node has a height value. Finding a strong consensus is equivalent to finding a subset of leaf nodes such that the restricted subtrees are not only isomorphic, but also the heights of all internal nodes must have the same order – This height condition appears to make the problem much harder: While the MAST problem on two trees can be solved in polynomial time via dynamic programming, the ordinal consensus problem is NP-hard even for only two ultrametrics, as we will show in Section 3.
2 Preliminaries and problem setup
The input is a metric system , consisting of a set of metrics over the point set . For any , is the distance between point w.r.t. . Our goal is to find a minimum subset , such that the order of all pairwise distance restricted on are consistent under a certain definition. Below we first introduce two notions of consistency. The two optimization problems we will study are given in Definition 2.6.
Definition 2.1 (Strong Consistency).
Given a metric system , we say that the set of metrics is strongly consistent w.r.t. if for any quartet , we have that (i) and (ii) for any . In this case, we say that is a strongly consistent set, or a strong consensus, over .
We also say that two pairs and are strongly consistent w.r.t. , if the order between these two pairwise distances is the same w.r.t. any metric in .
Other than strong consistency, we also consider a weaker notion of consistency: In particular, we now only require that the order constructed by taking the plurality voting111In plurality voting, a candidate wins if it has the most votes than the other candidates. It does not have to get a majority (more than 50%) of the votes. over all input metrics is valid. To define the weak consistency formally, we will first define the so-called relation set and the auxiliary graph.
Definition 2.2 (Relation Set of Pairwise Distances).
Given an input set of metrics over point set , the relation set of pairwise distances w.r.t. is the set of relations over all distinct pairs defined as follows: For any two pairs and , among all three possible relations between and , namely, , , and , the one induced by most number of metrics in will be included in the relation set . If there is any tie, we will choose the relation with appearance in the metric of smaller index.
For example, the relation if and only if this relation appears in more (or the same number of) metrics than the other two relations (, ).
The relation set constructed above may be not “valid” in the sense that no single metric can generate those relations. To check the validity of this relation set, we now define a specific auxiliary graph , whose nodes correspond to pairs of points from . There are three different connections between two graph nodes, which correspond to the three possible relations between their corresponding pairs in the relation set . We will use this graph later to decide whether is valid or not.
Definition 2.3 (Auxiliary Graph for Relation Set).
Given the relation set of pairwise distances over point set and metrics , the auxiliary graph , where , is a mixed graph (meaning it contains both directed and undirected edges): There is a directed edge from to if ; there is a undirected edge between and if .
We will use (corresponding to all pairs in ) to represent nodes in graph , where if . The auxiliary graph is a fully connected mixed graph (i.e., every pair of distinct vertices is connected by a unique edge, and the edge can be directed or undirected) with one edge between every two nodes. We say (or or ) if there is a directed edge from to (or an undirected edge between them, or a directed edge from to ), which means the pair, say , represented by has a larger pairwise distance compared with the pair represented by . A cycle in a mixed graph can be formed by a mixture of directed edges and undirected edges. Intuitively, suppose we have a completely directed cycle like the one shown in Figure 1 (a), then there is a conflict in the relations for pairwise distances, as . Hence the relation set will not be valid in this case. There are more types of “directed” cycles and they can cause such conflict. In particular:
Definition 2.4 (Directed cycle of mixed graphs).
A cycle is a fully-directed cycle if all edges in it are directed, and the directions of all edges inside are consistent (namely, Figure 1 (a)). A directed cycle of a mixed graph is a cycle such that (i) it consists of at least one directed edge, and (ii) one can assign a direction for each undirected edges in to make it into a fully directed cycle. See Figure 1 for examples.
![]() |
![]() |
![]() |
![]() |
(a) A full-directed | (b) A directed cycle | (c) This is not | (d) This is not |
(and directed) cycle | (of mixed graphs) with | a directed cycle since | a directed cycle since |
with length 5. | an undirected edge. | none of the | conflicts between |
edges is directed. | directed edges. |
We say that a metric generates a relation set , if the order of any two pairwise distances induced by are the same as in . In this case, we say the relation set is valid. The following result provides a simple characterization for a valid relation set by using its corresponding auxiliary graph.
Lemma 2.1.
There exists a metric generating a relation set , or equivalently, is valid, if and only if there is no directed cycle in the auxiliary graph for the relation set .
Proof.
The “” direction. This direction is relatively easier and we prove it by contradiction. Suppose the relation set is defined over a point set and there is a metric over set that generates all relations in . Then assume there is a directed cycle of graph nodes in the auxiliary graph . Now, one can derive that the distance between the pair of nodes in will be larger than itself by following the edges for , which is a contradiction.
The “” direction. For any two nodes and connected by an undirected edge (i.e., , represents equal distance), we can always contract them together without causing any problems if there is no directed cycle in graph . It is because that for another arbitrary , the directions of edges and are consistent. These two edges will be both going to , coming out from or undirected.
Therefore, we can iteratively contract all node pairs connected by undirected edges until there is no such pair left. The produced graph will be a directed graph, and each node corresponds to a set of nodes in the original graph . It is clear that is acyclic, because any directed cycle in will map back to a directed cycle (of a mixed graph) in simply by selecting an arbitrary representative for all these supernodes in .
A linear order over those supernodes can be constructed by, e.g., topological sort, such that the orders among pairwise distances are consistent with the directed acyclic graph (DAG) . Nodes in represented by the same supernode in share the same value. This provides a way to assign values over all pairwise distances such that the ordinal relations are consistent with . To make it a metric, one can force that the minimum distance is larger than one half of the maximum distance. This way, no matter which three nodes we consider, the sum of two pairwise distances will always be larger than the third pairwise distance. ∎
We are now ready to define the weak consistency.
Definition 2.5 (Weak Consistency).
Given a set of metrics over the same point set , we say that the set of metrics is weakly consistent with if there is no directed cycle in the auxiliary graph for the relation set as specified in Definitions 2.2 and 2.3. In this case, we may also say that is a weakly consistent set, or a weak consensus, over .
The optimization problems we aim to study in this paper are defined as follows.
Definition 2.6 (Strong-MIS and Weak-MIS Problems).
Given a set of metrics on point set , the Strong-MIS problem (resp. Weak-MIS problem) aims to find a minimum subset of , such that all metrics restricted on are strongly consistent (resp. weakly consistent).
The set is called the minimum (strong/weak) inconsistent set, while is called the maximum (strong/weak) consensus set w.r.t. input metrics .
Note that minimizing the inconsistent set is equivalent to maximizing the consensus (although their approximation may not be equivalent).
The decision version of Strong-MIS (and Weak-MIS) problem is as follows: Given and also an integer . Is there a subset with such that restricted on are strongly consistent (resp. weakly consistent)?
We will show in Section 3 and 4 that the decision versions of Strong-MIS and Weak-MIS are in NP (see lemma 3.1 and lemma 4.1). Thus in most proofs, we will only show NP-hardness via reductions from NP-hard problems. The NP-completeness naturally follows by the fact that both decision problems are in NP.
Some specific metrics.
Later in Sections 3 and 4, we will show that the decision version of the minimum inconsistent set (equivalently, maximum consensus set) problem is NP-complete even when input metrics are restricted to two common choices: the Euclidean metric on the line (and thus in any ), and the ultrametrics.
Definition 2.7 (Line metric).
A line metric is a metric , where the distance function , and (Note, this is simply the Euclidean metric on ).
Definition 2.8 (Ultrametric).
An ultrametric is a metric defined on a set , which satisfies the following strong triangle inequality: for any ,
Any finite ultrametric has a corresponding representing tree [8] such that:
-
1.
is a rooted tree with the set of leaf nodes being . is equipped with a height function , where is the set of internal nodes of such that (i) all leaves have the same height; and (ii) is non-increasing along any root to leaf path.
-
2.
For any two leaf nodes and , their distance equals to , namely, the height of their lowest common ancestor (LCA ()).
An example of an ultrametric and its representing tree is given in Appendix A.1.
3 Strong-MIS problem
In this section, we will study the Strong-MIS problem. Specifically, we show in Theorem 3.1 and 3.2 that the decision version of the Strong-MIS problem is NP-complete even when the input metric spaces are restricted to two very simple cases: the line metrics and the ultrametrics. Corollary 3.1 gives an inapproximability result. To complement that, in Theorem 3.4, we provide a -approximation algorithm for the general case.
Intuitively, finding the minimum strongly inconsistent set has a similar flavor as Minimum Vertex Cover or Minimum Hitting Set [10]. Intuitively, in Strong-MIS problem, given data points , a quartet will be a target set if they induce a conflict between any two input metrics, and the goal is to find a Minimum Hitting Set such that for all target sets, at least one element is in . However, to show that Strong-MIS remains hard even for special simple metrics, we need to construct reductions carefully, and sometimes need to use different NP-complete problems to reduce from. We include the list of NP-complete problems used for reductions in Appendix B.
To study Strong-MIS, we first define the so-called Conflict Set, which will be used frequently in NP-hardness proofs.
Definition 3.1 (Conflict Set).
Given metrics on node set , the conflict set induced by is defined as , . Each element in this conflict set is called a conflict quartet.
It is clear that the decision version of Strong-MIS is in NP as stated in the following lemma.
Lemma 3.1.
The decision version of Strong-MIS is in NP.
Proof.
Given a set with size , one can check whether metrics in are strongly consistent on by simply iterating over all possible quartets and comparing their pairwise distances in different metrics. This process takes polynomial time. ∎
It turns out that the decision version of Strong-MIS is weakly NP-complete (which allows the magnitude of data involved to be exponential) even if we restrict the input metrics to be only two line metrics. The proof is in Appendix A.2.
Theorem 3.1.
The decision version of Strong-MIS is weakly NP-complete even when one only considers metric systems where contains only two line metrics.
The following theorem shows that even with only 2 ultrametrics, finding a Strong-MIS remains NP-hard. The proof uses a reduction from Max 2-SAT problem; it is non-trivial and can be found in Appendix A.3.
Theorem 3.2.
Given a metric system , where contains two ultrametrics, The decision version of Strong-MIS is NP-complete.
The hardness result on 2 ultrametrics implies that finding Strong-MIS is also NP-hard when the input is two arbitrary metrics. The result is stated below. This theorem can also be proven directly via a reduction from Minimum Vertex Cover, which for completeness we include the simple details in Appendix A.4.
Theorem 3.3.
Given a metric system , where contains two arbitrary metrics, the decision version of Strong-MIS is NP-complete.
In fact, the proof in Appendix A.4 gives a size-preserving reduction. Hence the Corollary below follows directly from the inapproximability result [12] for Minimum Vertex Cover.
Corollary 3.1.
Strong-MIS with 2 metrics is Unique Games-hard to approximate within a factor , where is an arbitrarily small positive number.
Approximation algorithm for Strong-MIS.
As we mention above, one can consider a collection of all conflict quartet as the target set. The goal is to find a minimum set from such that it intersects (hits) every quartet in . This is actually a special case of -hitting set problem, and it is easy to obtain a 4-approximation algorithm in time time by checking all quartets. However, below we show we can improve this to time complexity.
Theorem 3.4.
Given a metric system where , there is an 4-approximation algorithm for the Strong-MIS problem.
Proof.
Let denote the minimum inconsistent set so that is the maximum consensus for metric system . We propose Algorithm 1, which will compute a set to be removed as inconsistent set.
On the high level, Algorithm 1 starts with sorting all pairs of points based on their distances (If two pairs have the same distance, then they are sorted in lexicographical order). With sorted lists of pairs , we set pointers to the heads of these lists: Let be the pointer to the list of pairwise distances of metric , and denote the pair of data points in that is pointing to. As pointers move down these lists, it checks whether all pointers are pointing to the same pair of points from . If that is the case, it moves on. Otherwise, if (line-14), it means a conflict quartet is discovered, formed by the two pairs and . In this case, we will remove all these points (by adding them to ), and move on. The code from lines 6-8 and 11-13 is to skip all pairs which contain at least one point from the outlier set . The procedure ends when any pointer moves out of bound (i.e., beyond the last element of the list). An illustration of how the algorithm works is given in Figure 9 in Appendix A.5.
Time complexity. There are metrics and pairs. The sorting process takes time. Since the procedure ends when each pointer reaches the tail, the pointers will be moved for at most total times. During each pointer move, the most expensive step is line-11, which can be done in with an array of length indicating if a node is in or not.
Putting everything together, the total time complexity for our algorithm is .
Correctness of algorithm. We consider the points ever added to set . Note that is only updated in line-12, where there is a conflict (i.e, the pair pointed by in list is different from the pair by the pointer for list ). Assume that is smaller than for lexicographical order. We claim that form a conflict quartet. To prove this, first, note that as we skip all pairs that contain any element from (lines 6-8, 11-13), this means that . Hence in list , we have not yet seen (scanned) the pair – as otherwise, at the time when the pointer in reaches , if at that moment the pointer in is not pointing to , we would have already seen a conflict and added to . It then follows that w.r.t. metric , we have that . On the other hand, in list , it must be that we have not yet seen by the same reasoning, meaning that w.r.t. metric . Hence these two pairs form a conflict quartet. Obviously, for any conflict quartet, the minimum inconsistent set has to contain at least one element from it. Furthermore, since all the conflict quartet the algorithm ever identifies are disjoint. This means that the , that is, .
Finally, consider the ordered sublist of , obtained by removing from all pairs that intersect . Then it is easy to see that by construction of the algorithm, all s are the same. In other words, after removing all elements in , the remaining points form a consensus subset for the metrics . Hence is a valid inconsistent set and . It follows that is a 4-approximation of the minimum inconsistent subset for the metric system .
∎
4 Weak-MIS problem
We now focus on the Weak-MIS problem of finding a minimum weakly inconsistent subset. In Theorems 4.1 and 4.2, we show that it is NP-complete for the special case of only three input line metrics or ultrametrics. We provide a straightforward 6-approximation algorithm at the end.

By definition 2.5 and 2.6, if is a consensus set, then the auxiliary graph restricted on must contain no directed cycle. It is well known that a tournament (fully connected directed graph) has a directed cycle if and only if it has a directed triangle [7]. It turns out that a similar result holds for auxiliary graphs, which are mixed graphs: see the claim below. See Figure 2 for the 3 possible cases of directed triangles. The simple proof of this claim can be found in Appendix A.6.
Claim 4.1.
An auxiliary graph has no directed cycle if and only if it has no directed triangle.
Hence to see whether there is any directed cycle in the auxiliary graph, one only needs to check if there is any directed triangle.
Followed by claim 4.1, the decision version of Weak-MIS is in NP.
Lemma 4.1.
The decision version of Weak-MIS is in NP.
Proof.
When there are two metrics, with the tie-breaking rule defined in Definition 2.2, it is clear that we would always prefer the first metric (once the order of input metrics is fixed). Thus in this case, the minimum inconsistent set is simply . The problem of Weak-MIS becomes non-trivial when there are three metrics.
Our first main result is as follows, with proof in Appendix A.7.
Theorem 4.1.
Given a metric system , where contains three line metrics. The decision version of Weak-MIS is weakly NP-complete.
Our second main result is the hardness for ultrametrics.
Theorem 4.2.
Given a metric system , where contains three ultrametrics. The decision version of Weak-MIS is NP-complete.
Proof of Theorem 4.2.
We prove this theorem via a reduction from the so-called 3-dimensional Matching problem. In particular, instead of the problem of finding the minimum inconsistent set, we will consider the equivalent dual version of finding a maximum consensus for a set of 3 ultrametrics.
Description of the reduction. Suppose we are given an instance of 3-dimensional Matching problem (), where . Assume that , while where each relation is of the form with and . A matching is such that each element in can appear at most once in all relations in . The decision version of the 3-dimensional Matching problem is that, given and an integer , does there exist a matching such that ?
From this instance of the 3-dimensional Matching problem, we will now construct an instance of the Weak-MIS problem , where the node set is , and consists of 3 ultrametrics, , and over node set . (Note that we do not use as the node set as is already used in the instance of 3-dimensional Matching problem.) Recall that any ultrametric over node set corresponds to a representing tree, which is a rooted tree where all nodes have a height value, and all leaves (corresponding to node set ) have the same height. In what follows, we will describe the three representing trees and , generating and , respectively. In particular, these three representing trees and all have the same tree shape. However, the height of those internal nodes will be different.
We will first describe the representing tree for . The root has 5 children, represented by and . Each of has exactly children, which are , , and , respectively. Note that these children are leaves, corresponding to the first nodes in the node set (See Figure 3). The node has children, , corresponding to the points in input set of the 3-dimensional Matching instance . The child(ren) of each is defined as: ; all children of s are all leaves. See Figure 3.
Next, we assign height values for nodes in . All leaves (corresponding to elements in the node set where ultrametrics are defined on) have height . The height values for the internal nodes are listed in the row corresponding to in Table 1.
The representing tree (resp. ) has the same tree shape as , and the only difference is that the node and s are replaced by and s (resp. by and s). See Figure 3. The height values of all leaves nodes are still , and the height values of internal nodes are listed in the last two rows of Table 1. Also see Figure 3 where the height of each node is listed in the parenthesis next to each node.
10 | 5 | 4 | 3 | 1 | 2 | 0 | |
10 | 3 | 0 | 2 | 5 | 4 | 1 | |
10 | 2 | 4 | 0 | 1 | 5 | 3 |

This finishes setting up all three representing trees (thus also the ultrametrics and ). Recall that for each ultrametric say , the distance , with , corresponds to the height value of the lowest common ancestor (LCA) of leaves and .
In what follows, we will first prove some properties of the constructed ultrametrics. Specifically, consider the auxiliary graph constructed for the metric system . Recall that each graph node in corresponds to a pair of points from , . For simplicity, we use to represent the set , and similarly for , , and . Given a graph node of , we say that this pair splits if and are from two different sets in (e.g, and ). Now consider a triple ; we refer to (resp, , ) as the -coordinate (resp. - or -coordinate) of . Given a graph node of the auxiliary graph of the form , we say that this pair has shared coordinate, if . This means that shares either -, - or -coordinate. This is the key lemma to guarantee the correctness of our reduction. The proof of this lemma can be found in Appendix A.8. We remark that the height values of all nodes in the three representing trees are chosen carefully so that the lemma below holds. To compute these height values, we in fact write a computer program testing all possible permutations of heights over internal nodes to make sure all conditions in Lemma 4.2 are satisfied.
Lemma 4.2.
Consider a graph node of the auxiliary graph .
-
(i)
If splits, then this graph node cannot appear in any directed triangle in the auxiliary graph .
-
(ii)
Any directed triangle in must contain at least one graph node of the form where this pair have shared coordinate.
-
(iii)
If is of the form and this pair has shared coordinate, then this graph node must participate in at least one directed triangle.
Note that the correctness of the reduction then follows easily from the above key lemma. In particular, we now show that has a matching of size if and only if the metric system has a consensus subset of size .
“” direction: Suppose has a matching of size . Then we claim that the set
forms a consensus subset of w.r.t. the metric system . Specifically, by Claim 4.1, we just need to show that the subgraph of the auxiliary graph spanned by nodes coming from contains no directed triangle. As is a valid matching, no two and can have shared coordinates. It then follows from Lemma 4.2 (ii) that there cannot be any directed triangle in the subgraph .
“” direction: Suppose we have a consensus subset for the metric system such that . First, consider . We know that the subgraph spanned by nodes from contains no directed triangle. By Lemma 4.2 (iii), it then follows that no two , , could have shared coordinate. In other words, the set forms a valid 3D matching for of size . On the other hand, we know that (as the largest possible choise for is ). Since , it then follows that , and thus there exists a 3D matching of of size at least .
As 3-dimensional Matching is NP-complete, it then follows that the decision problem of Weak-MIS is NP-complete. This finishes the proof of Theorem 4.2. ∎
The following theorem is an implication from the previous proof. Similarly as Strong-MIS, we also provide a direct proof for arbitrary metrics in Appendix A.9. In the proof, we construct a size-preserving reduction from Minimum Vertex Cover which again leads to an -inapproximability result.
Theorem 4.3.
Given a metric system , where contains three arbitrary metrics, the decision version of Weak-MIS is NP-complete.
Furthermore, Weak-MIS with 3 metrics is Unique Games-hard to approximate within a factor for an arbitrarily small positive constant .
Finally, there is a simple 6-approximation algorithm with running time : Specifically, given a metric system with , we first build auxiliary graph as described earlier in time. We want to construct an outlier set so that it “hits” all directed triangles in : note that this will then guarantee that is a consensus set w.r.t. . To this end, we simply enumerate all directed triangles in in time. We then initialize and go through the list of directed triangles one by one. For each directed triangle (where each is a pair of points in ), if does not intersect with any of the points included in , then we simply add all these points (at most 6 distinct points) to . Otherwise, this triangle is already “hit” by and we do nothing. Let be the minimum weakly inconsistent set (i,e, the optimal solution for Weak-MIS). It is easy to see that , as all the 6-tuples (at most 6) we ever added to are all disjoint, and for each such 6-tuple, must contain at least one point from it. Furthermore, it is also easy to see that after removing all points , the resulting auxiliary graph restricted to only pairs not containing points in is free of directed triangles, and thus free of directed cycles by Lemma 2.1 and Claim 4.1. Hence is a consensus set w.r.t. , and . Hence:
Theorem 4.4.
There is a 6-approximation algorithm for the Weak-MIS problem that runs in time for metrics defined on a point set of size .
5 Conclusion
In this paper, we proposed to study the maximum ordinal consensus problem over a set of input metrics. We developed two concepts of “consistency” that only rely on ordinal information of pairwise distances. We proved several hardness results for both definitions with different input metrics. We also developed constant-factor approximation algorithms for the minimum inconsistent set problem under both definitions.
There are still some open directions for future work. For example, can we close the gap between the inapproximability and the approximation algorithm we developed? Can we improve the time complexity, especially for the Weak-MIS problem? Can we find better approximation algorithms for special cases such as Euclidean metrics or ultrametrics? We also note that the current approximation algorithms target the minimum (ordinally) inconsistent subset problems – how about the dual maximum ordinal consensus problem?
6 Acknowledgment
This work is partially supported by National Science Foundation (NSF) under grant IIS-1815697, as well as National Institute of Health (NIH) under grant R01EB022899.
References
- [1] A. V. Aho, Y. Sagiv, T. G. Szymanski, and J. D. Ullman. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM Journal on Computing, 10(3):405–421, 1981. URL: https://doi.org/10.1137/0210030, arXiv:https://doi.org/10.1137/0210030, doi:10.1137/0210030.
- [2] A. Amir and D. Keselman. Maximum agreement subtree in a set of evolutionary trees: Metrics and efficient algorithms. SIAM Journal on Computing, 26(6):1656–1669, 1997. URL: https://doi.org/10.1137/S0097539794269461, arXiv:https://doi.org/10.1137/S0097539794269461, doi:10.1137/S0097539794269461.
- [3] Vincent Berry and François Nicolas. Maximum agreement and compatible supertrees. Journal of Discrete Algorithms, 5(3):564 – 591, 2007. Selected papers from Ad Hoc Now 2005. URL: http://www.sciencedirect.com/science/article/pii/S1570866706000785, doi:https://doi.org/10.1016/j.jda.2006.08.005.
- [4] Steffen Bickel and Tobias Scheffer. Multi-view clustering. In Proceedings of the Fourth IEEE International Conference on Data Mining, ICDM ’04, pages 19–26, Washington, DC, USA, 2004. IEEE Computer Society. URL: http://dl.acm.org/citation.cfm?id=1032649.1033432.
- [5] X. Cai, F. Nie, H. Huang, and F. Kamangar. Heterogeneous image feature integration via multi-modal spectral clustering. In CVPR 2011, pages 1977–1984, June 2011. doi:10.1109/CVPR.2011.5995740.
- [6] Guoqing Chao, Shiliang Sun, and Jinbo Bi. A survey on multi-view clustering. CoRR, abs/1712.06246, 2017. URL: http://arxiv.org/abs/1712.06246, arXiv:1712.06246.
- [7] Michael Dom, Jiong Guo, Falk Hüffner, Rolf Niedermeier, and Anke Truss. Fixed-parameter tractability results for feedback set problems in tournaments. J. of Discrete Algorithms, 8(1):76–86, March 2010. URL: http://dx.doi.org/10.1016/j.jda.2009.08.001, doi:10.1016/j.jda.2009.08.001.
- [8] Oleksiy Dovgoshey and Evgeniy Petrov. Properties and morphisms of finite ultrametric spaces and their representing trees. p-Adic Numbers, Ultrametric Analysis and Applications, 11(1):1–20, Jan 2019. URL: https://doi.org/10.1134/S2070046619010011, doi:10.1134/S2070046619010011.
- [9] M. R. Garey, D. S. Johnson, and L. Stockmeyer. Some simplified np-complete problems. In Proceedings of the Sixth Annual ACM Symposium on Theory of Computing, STOC ’74, page 47–63, New York, NY, USA, 1974. Association for Computing Machinery. URL: https://doi.org/10.1145/800119.803884, doi:10.1145/800119.803884.
- [10] Michael R. Garey and David S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1990.
- [11] Richard M. Karp. Reducibility among Combinatorial Problems, pages 85–103. Springer US, Boston, MA, 1972. URL: https://doi.org/10.1007/978-1-4684-2001-2_9, doi:10.1007/978-1-4684-2001-2_9.
- [12] Subhash Khot and Oded Regev. Vertex cover might be hard to approximate to within 2-. Journal of Computer and System Sciences, 74(3):335 – 349, 2008. Computational Complexity 2003. URL: http://www.sciencedirect.com/science/article/pii/S0022000007000864, doi:https://doi.org/10.1016/j.jcss.2007.06.019.
- [13] Abhishek Kumar, Piyush Rai, and Hal Daume. Co-regularized multi-view spectral clustering. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 1413–1421. Curran Associates, Inc., 2011. URL: http://papers.nips.cc/paper/4360-co-regularized-multi-view-spectral-clustering.pdf.
- [14] Danial Lashkari and Polina Golland. Convex clustering with exemplar-based models. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 825–832. Curran Associates, Inc., 2008. URL: http://papers.nips.cc/paper/3181-convex-clustering-with-exemplar-based-models.pdf.
- [15] Arpiar Saunders, Evan Z. Macosko, Alec Wysoker, Melissa Goldman, Fenna M. Krienen, Heather de Rivera, Elizabeth Bien, Matthew Baum, Laura Bortolin, Shuyu Wang, Aleksandrina Goeva, James Nemesh, Nolan Kamitaki, Sara Brumbaugh, David Kulp, and Steven A. McCarroll. Molecular diversity and specializations among the cells of the adult mouse brain. Cell, 174(4):1015 – 1030.e16, 2018. URL: http://www.sciencedirect.com/science/article/pii/S0092867418309553, doi:https://doi.org/10.1016/j.cell.2018.07.028.
- [16] Chang Xu, Dacheng Tao, and Chao Xu. A survey on multi-view learning. CoRR, abs/1304.5634, 2013. URL: http://arxiv.org/abs/1304.5634, arXiv:1304.5634.
Appendix A Missing Details
A.1 An Ultrametric and its Representing Tree
Figure 4 shows an example of an ultrametric and its representing tree. Values on internal nodes are their heights, and all leaf nodes are with height 0. The distance between any two leaf nodes is the height function value of their LCA. For example, the distance from to is 3.

A.2 Proof of Theorem 3.1
Proof.
We prove this theorem via a reduction from the Minimum Vertex Cover problem.
Description of the reduction. Suppose we are given an instance of Minimum Vertex Cover, where and . A vertex cover is such that every edge in has at least one endpoint in . The decision version of the Minimum Vertex Cover problem is that, given and an integer , is there a vertex cover such that ? From the instance of Minimum Vertex Cover, we construct an instance of the Strong-MIS problem , where are two line metrics. Here , and . The and are two “pivots” for th edge .
Since all these points are on real line , we will use and to represent coordinates. When the coordinate is the same for both metrics or there is no ambiguity, we will omit the subscript and use for the coordinate. For example, is the coordinate of point in . Instead of constructing directly, we construct the coordinates of each point as follows (see Figure 5 for an example):
-
1.
For both , .
-
2.
For both , the coordinate of .
-
3.
For , and , its coordinate in is: . Similarly, its coordinate in is: . The intuition is the distance from to is close to the distance from to , but orders are different in two metrics.

Check comparable pairwise distances. The distance between any pair of nodes under metrics and can be easily calculated based on coordinates and . We now show that any conflict quartet of the above constructed instance must have the form of , where . We first consider the necessary conditions for two pairs and to cause a conflict. W.l.o.g., we can assume that and in both metrics.
-
(1)
To cause a conflict, the distances and should be close so that the relation between and can be different. One can notice that the distance can be distorted by at most from one metric to the other, thus in a more formal way, close means “differ by at most .
-
(2)
Another necessary condition for and to cause a conflict is that is not identical to in any one of the two metrics. For example, assume that (equivalently and . Here and are some integers. When comes to coordinates in the second metric, one can show that as only the polarity of term is changed across two metrics.
Combining those two conditions, we define two pairs are comparable if they are different and differ by at most . A conflict is possible only when two pairs in the quartet are comparable. We then iterate over all pairs and check whether there is any other pair with a comparable pairwise distance. We will also use the leading term (the largest power of two) of coordinates to find comparable pairs. The leading term of and s are of different scales, and has the same leading term of . Moreover, if has a larger leading term than , then .
The necessary condition of two pairs and (where we assume , and ) having comparable distances is either 1). nodes and have the leading terms of the same scale. or 2). nodes and have the leading term of the same scale. The reason is (assuming has a larger leading term than and ) that and implies if none of the conditions are satisfied. In either case where the necessary condition is satisfied, there must be two nodes of the form for some edge . This observation largely reduced the possibilities of comparable pairs. All possible cases are considered below (we use “” to denote unknown elements and “” to denote comparable distances):
-
1.
and for some edge . We have . Assume , and we have , this is true only when and by checking the leading term of .
-
2.
and . By similar calculation, we should easily get and assuming .
And indeed, is a conflict quartet (where ) since but . Now, we can conclude that the quartets in conflict set have one-to-one correspondence to edges in , i.e., . Now it is not hard to prove there is a size-preserving reduction from Minimum Vertex Cover to Strong-MIS (with two line metrics).
There is a vertex cover of size for there is an inconsistent set of size for : Assume there is a vertex cover of with size denoted by . Then the corresponding set is a solution for , since this set covers all edges and thus all possible conflict quartets. The remaining nodes will have no conflict quartet.
direction: Conversely, assume there is an optimal inconsistent set with size . It is clear that there is always an optimal solution to which only includes s. The reason is that including can potentially cover multiple conflict quartets, but (or ) will only cover at most one conflict. For any solution containing (or ), including one of the endpoints of will also cover the conflict (covered by ). And it can potentially cover other conflicts and thus reduce the size of the solution. Assume one optimal solution is which covers all conflict quartets, then the corresponding node set is a vertex cover covering all edges due to the one-to-one correspondence between edges in and quartets in . ∎
A.3 Proof of Theorem 3.2
Proof.
We prove this theorem via a reduction from the Max 2-SAT problem.
Description of reduction. Suppose we are given an instance of the Max 2-SAT problem, where is a set of clauses. There are clauses with one literal and clauses with two literals. And is a set of variables. The decision version of the Max 2-SAT problem is that, given and an integer , is there an assignment to variables such that clauses are satisfied?
From the instance of Max 2-SAT problem, we construct the following instance of Strong-MIS where and are ultrametrics. In particular, we will set up each metric via its representing tree as introduced in Definition 2.8. As we will see below, both representing trees of will have the same structure; while the only difference is the height functions over internal nodes.
First, we will describe the tree structure (which will be common for both and ). The root of has children, , where s correspond to th clauses in . The node has children denoted by (here for simplicity, we use to denote the children set of ). The size of is designed to be larger than the total number of literals (allowing duplicates) appeared in the clause set . We show later that one can assume a maximum consensus always contain all nodes from . Each has at most two children corresponding to the literals in clause , and each has two leaf nodes s as its children. For example, if clause has two literals, then will have two children ; And has two children (leaves) , has two children (leaves) . If clause has only one literal, then will have one child which has two children (leaves) , . See an illustration of the tree structure in Figure 6.
Recall that the leaf set of this representing tree corresponds to the note set of the metric. Hence, in the constructed Strong-MIS instance, we have that the note set is with size .

Now we equip this tree structure with two height functions and mapping internal nodes of to real values. It generates metrics and , respectively. In particular, recall that given any two leaves and , their distance . We set up so that and . Note that the precise value of each height does not matter as only the ordering of pairwise distances matters. We use and ( and have the same tree structure as ) to denote the two representing trees for and , respectively.
To see how the maximum consensus problem for relates to the maximum satisfiability of , note the following: (1) Intuitively, the height functions and guarantee that, for a consensus , we cannot include both children of and , the heights of and have opposite orders and thus inconsistent w.r.t. and . For instance, in the example of Figure 6, one cannot include (children of ) and (children of ) in . (2) The heights of the internal node guarantee that for each , one can only choose either one of its leaf nodes or two of its leaf nodes which are the children of the same literal (See Figure 8). For the instance shown in Figure 6, we can include both and in since they are the children of the same literal . However, one cannot include both and in because and .
Basically, for the construction, the consensus can include two leaf nodes when the corresponding clause is satisfied; Otherwise it can only include one leaf node. Below we show that there is an assignment of satisfying clauses in if and only if there is a consensus of of cardinality .
direction:
Assume there is an assignment to variables of such that clauses are satisfied. Then we have a consensus of size as follows. We first include all s. For each satisfied clause, we randomly selected one of its literal with true value and keep both of its children. For the clauses not satisfied, we keep any one of its four leaves. The selected node set is strongly consistent, because (1): there are no leaf nodes from different literals of a clause; 2). we do not keep both leaf nodes from two literals (say, and ) of the same variable as we only keep both leaves of true literals. The size of the set is .
More specifically, w.l.o.g., assume nodes in are (See Figure 7). We can also assume that the corresponding literals for are . Then all pairwise distances in form the set (and for ). One can check the order of those heights are the same in two metrics, and thus is a consensus.

direction: Assume there is maximum consensus of with nodes. Then we want to show that there is an assignment satisfying at least clauses based on the following observations.
-
1.
There is always an optimal maximum consensus that keeps all s. Simply including all s will produce a consensus with nodes. If one optimal solution does not include all s, it must have removed at least s to avoid any conflict. In this case, there are only at most nodes left.
-
2.
For the restricted tree on the nodes from the maximum consensus (including all s), there are only two possible options for a subtree rooted at s (See Figure 8). The reason is that we cannot keep leaf nodes of both literals (e.g., we cannot include both and for the case in Figure 6), which will cause a conflict since but .
-
3.
For the literal with both children included, there is no conflict. Therefore, e.g., if has both children included, then cannot. This is because the heights of and are not consistent, their four leaf nodes form a conflict quartet.

Based on these observations, there is an optimal solution having exactly subtrees rooted at s with two leaves selected. The two leaves must be the children of one literal. Also, there will not be any contradiction from those literals with two children selected (i.e., , cannot both have two children selected) We can assign true value to these literals, and the corresponding clauses are satisfied. ∎
A.4 Proof of Theorem 3.3
Proof.
We prove this theorem via a reduction from Minimum Vertex Cover.
Description of reduction. Given an instance of Minimum Vertex Cover, , where , . We construct an instance of Strong-MIS, where ( are two arbitrary metrics) and . Here correspond to nodes in .
For these two metrics, , and , for . Those pairwise distances are used as standards which will be compared with pairwise distances of and , for those pairwise distances,
Clearly, for a fixed edge , is a conflict quartet, since but . We remark that there is an optimal inconsistent set of which does not have any point from . It is because by removing all ’s except one; And to cover any conflict quartet, one has to remove at least s.
Now, we want to show that there is an optimal vertex cover of size if and only if there is an inconsistent subset of size .
direction: If there is a vertex cover of size for , denoted by . Then the corresponding set is a inconsistent set, because for the remaining nodes , their pairwise distances are all 1 in both metrics, therefore they form a consensus together with s.
direction: Conversely, if there is a minimum inconsistent set with size (notice that is always smaller than and only contains nodes from ), denoted as , then there is a vertex cover . That is because, by removing , the remaining elements form a consensus set. Thus for any edge , there is at least one of in .
∎
A.5 Missing Figure in Theorem 3.4
The following figure shows an illustration for the 4-approximation algorithm for Strong-MIS problem.

A.6 Proof of Claim 4.1
Proof.
Given a mixed graph , assume is the directed cycle with smallest length and . For any edge (addition modulo ), it is either undirected or from to . W.l.o.g., let be a directed edge. Consider the edge connecting and , there are three cases. If the edge connecting and is undirected or from to , form a directed triangle. Otherwise (i.e., the edge is from to ), then forms a smaller directed cycle. Both cases contradict with the assumption of smallest length. ∎
A.7 Missing proof of Theorem 4.1
Proof.
We prove this theorem via a reduction from the Minimum Vertex Cover problem. Here we propose a similar construction of Theorem 3.1.
Description of the reduction. Suppose we are given a Minimum Vertex Cover instance , , . We construct an instance ( of the Weak-MIS problem, where are line metrics and . Here, we also use , , and to represent the coordinates in , and , respectively. When the coordinate is the same across all three metrics or there is no ambiguity, we will omit the subscript and use instead.
The coordinates are constructed as follow, and the intuition is that only triples like (where ) can potentially cause a directed triangle (See Figure 10 for an example):
-
1.
For all of these line metrics, the coordinates .
-
2.
For all of these metrics, , and .
-
3.
For each edge , in , , ; in , , ; in , , .

Comparable pairwise distances. Similarly as Theorem 3.1, we will use the concept of comparable distances. Here two distances are comparable if they are different and differ by at most . The constant factor for is changed accordingly since the difference between the coordinates of the same point in two metrics is at most (instead of in Theorem 3.1). The leading terms of s, (or ) and (or ) are in different scales. For any edge (here we overuse for simplicity), the comparable pairs are (recall that are comparable only when either or have the same leading term):
-
1.
, and . The distances are around .
-
2.
and . The distances are around .
-
3.
and . The distances are around .
-
4.
and . The distances are around .
A triangle can potentially be directed only when all three underlying edges are mutually comparable.
For any triangle, there are three cases in total depending on the scale of distances of three underlying edges .
-
1.
If the distances of these three pairs all are not comparable, the -distortion will not affect the orders of the distances at all. Therefore, the three metrics should agree on the ordering of these pairwise distances. And clearly, the triangle is not directed.
-
2.
If two of them have comparable distance, e.g., and . Then w.l.o.g., assume has much larger distance compared with those two. Then the edge in the auxiliary graph from to the other two are all outgoing, thus the triangle is not directed. Otherwise, if has a smaller distance, the edges are all incoming and the triangle is also not directed.
-
3.
Only when all three edges have comparable distances, there will be possibility that the triangle is directed. If we look at the triangle formed by . We have relations , and by plurality vote. And they indeed form a directed triangle from to to , and back to .
Therefore, the directed triangles are . Including s in the inconsistent set is better than including or since it can possibly cover multiple directed triangles. Thus, there is always an optimal solution consisting of only s. One can always replace and s by s (where ) and still cover the directed triangles. Now we are ready to show that there is a vertex cover of size for if and only if there is a weak inconsistent set of size for .
direction: Assume is a vertex cover for . Then clearly is a weak inconsistent set. This is because covers all potential directed triangles.
“” direction: We showed that there is always an optimal solution only consisting of s. Assume the optimal solution is . To cover all directed triangles, for any edge , there must be at least one corresponding node or contained in the inconsistent set. Therefore, is a vertex cover of .
∎
A.8 Proof of Lemma 4.2
Proving statement (i) of Lemma 4.2.
To prove (i), note that if splits, then is the root in each of the three representing trees, and thus , for . Now take any two other graph nodes . A simple case analysis shows that no matter these two other pairs split or not, the three graph nodes cannot form a directed triangle. For example, if neither , then there will be two directed edges and in the auxiliary graph , and thus no matter what direction the edge between and is, the triangle cannot be directed. This proves (i).
Proving statement (ii) of Lemma 4.2.
Note that by (i), to check whether auxiliary graph has directed triangles or not, we only need to consider the subgraph of spanned by nodes which do not split. Hence each node will be such that where could be or ; in this case, we say that is of type-. Two nodes are of the same type if they are both of type- (i.e, both of type-, , etc).
For such a generic non-splitting node , we list its distance under ultrametric , and , respectively, which is also the height of the in representing trees and , respectively. See Figure 11. In particular, for a non-splitting node , we first only consider the case that do not share coordinate. (The only non-splitting nodes not covered by the table in Figure 11 is of the form where and have shared coordinate.)
non-splitting graph node | |||
5 | 3 | 2 | |
4 | 0 | 4 | |
3 | 2 | 0 | |
1 | 5 | 1 | |
with no shared coordinate | 2 | 4 | 5 |

Based on these distances, the subgraphs induced by such nodes are shown in the right picture of Figure 11. Note that this subgraph does not contain any directed triangle. Combining with (i), this means that any directed triangles , where s are all of different types, has to contain a node such that have shared coordinate(s).
To prove (ii), what remains is to consider a triangle where at least two of them, say and are of the same type. Suppose all s are of types listed in the table of Figure 11, then it is easy to see that in this case, the resulting triangle can only be of the shapes in Figure 12, and thus cannot be a directed triangle. Hence at least one of has to be non-splitting (by statement (i)), yet not included in the types covered in the table of Figure 11 – in other words, at least one is of the form where have shared coordinate(s).

Putting the above two paragraphs together, statement (ii) then follows.
Proving statement (iii) of Lemma 4.2.
Finally, consider a node of type-, but such that and have shared coordinates. In particular, and could share -coordinate, share -coordinate, share -coordinate, share both - and -coordinates, share both - and -coordinates, or share both - and -coordinates.
non-splitting graph node | |||
5 | 3 | 2 | |
4 | 0 | 4 | |
3 | 2 | 0 | |
1 | 5 | 1 | |
with shared and coordinates | 2 | 1 | 3 |

By a simple but tedious case analysis, one can verify that in each of these 6 cases, will form a directed with some pair of nodes from . For example, suppose and share both - and - coordinates. Then the subgraph induced by and is shown in Figure 13, and the triangle formed by and is a directed triangle. This thus proves statement (iii).
As commented in the main text, the heights of these nodes are computed by a computer program to guarantee the three statements.
A.9 Proof of Theorem 4.3
Proof.
We prove the theorem via a reduction from Minimum Vertex Cover to Weak-MIS.
Description of the reduction. Given an instance of Minimum Vertex Cover, , and . We construct an instance of Weak-MIS with 3 metrics on node set . We will assign an index number for each edge, and will use that index to assign distances between nodes incident to that edge.
For any edge , we construct the following gadget (Figure 14), where and .

All the other pairwise distances are . One can check that, in the auxiliary graph , a directed triangle (if exists) always corresponds to an edge , and the triangle is consist of and . It is because that the only comparable distances are triples of form . Now we prove that there is an vertex cover of size for if and only if there is a (weakly) inconsistent set of size for .
direction: Assume for graph , there is a vertex cover . Then is a Weak-MIS, since each directed triangle corresponds to an edge in , and has at least one relevant node in ().
direction: If we have a Weak-MIS of size , denoted as (there is always an optimal solution only consisting of s). There is no directed triangle means that for each edge, at least one endpoint is in , which is a vertex cover of size .
∎
A.10 Summary of Hardness Results
We end this section with a summary table showing hardness results in different cases.
Problem | Input | Hardness | Inapprox. factor | Approx. algorithm |
Strong-MIS | 2 Arbitrary metrics | NP-complete (Theorem 3.3) | 2 UGC (Corollary 3.1) | 4 (Theorem 3.4) |
Strong-MIS | 2 Line metrics | Weakly NP-complete (Theorem 3.1) | ? | 4 |
Strong-MIS | 2 Ultrametrics | NP-complete (Theorem 3.2) | ? | 4 |
Weak-MIS | 3 Arbitrary metrics | NP-complete | 2 UGC (Theorem 4.3) | 6 (Theorem 4.4) |
Weak-MIS | 3 Line metrics | Weakly NP-complete (Theorem 4.1) | ? | 6 |
Weak-MIS | 3 Ultrametrics | NP-complete (Theorem 4.2) | ? | 6 |
Appendix B NP-complete Problem Repository
Here we list the NP-complete problems that are used in the hardness proofs.
Definition B.1 (Minimum Vertex Cover).
[11]
Instance: Graph and a positive integer .
Question: Does have a vertex cover of size at most ?
A vertex cover is a set of nodes that every edge has at least one endpoint in . This is a classical problem mentioned in Karp’s 21 np-complete problems.
Definition B.2 (Max 2-SAT).
[9]
Instance: Given a boolean expression of variables in conjunctive normal form (CNF) that is the
conjunction of clauses over variables, each of which is the disjunction of at most two distinct literals. An integer .
Question: Is there an assignment to variables such that clauses are satisfied?
Definition B.3 (Tournament Feedback Vertex Set).
[7]
Instance: A tournament (fully connected directed graph) and an integer .
Question: Is there a vertex set with at most nodes whose deletion will result in an acyclic directed graph.
In [7], Dom showed that Feedback Vertex Set problem is NP-complete.
Definition B.4 (3-dimensional Matching).
[11]
Instance: Let be three disjoint sets with the same size. And consists of triples where . Given and an integer .
Question: Is there a 3-dimensional matching with size at least ?
This is also a problem in the list of Karp’s 21 np-complete problems.