This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

\newaliascnt

proptheorem \aliascntresettheprop \newaliascntlemmatheorem \aliascntresetthelemma \newaliascntobservationtheorem \aliascntresettheobservation \newaliascntcorollarytheorem \aliascntresetthecorollary \newaliascntconjecturetheorem \aliascntresettheconjecture \newaliascntclaimtheorem \aliascntresettheclaim

A Tight Approximation Algorithm for the
Cluster Vertex Deletion Problem

Manuel Aprile Matthew Drescher Samuel Fiorini  and  Tony Huynh
Département de Mathématique
Université libre de Bruxelles
Brussels, Belgium
[email protected], [email protected], [email protected]
School of Mathematics
Monash University
Melbourne, Australia
[email protected]
Abstract.

We give the first 22-approximation algorithm for the cluster vertex deletion problem. This is tight, since approximating the problem within any constant factor smaller than 22 is UGC-hard. Our algorithm combines the previous approaches, based on the local ratio technique and the management of true twins, with a novel construction of a “good” cost function on the vertices at distance at most 22 from any vertex of the input graph.

As an additional contribution, we also study cluster vertex deletion from the polyhedral perspective, where we prove almost matching upper and lower bounds on how well linear programming relaxations can approximate the problem.

Key words and phrases:
Approximation algorithm and Cluster vertex deletion and Linear programming relaxation and Sherali-Adams hierarchy.
This project was supported by ERC Consolidator Grant 615640-ForEFront. Samuel Fiorini and Manuel Aprile are also supported by FNRS grant T008720F-35293308-BD-OCP. Tony Huynh is also supported by the Australian Research Council.

1. Introduction

A cluster graph is a graph that is a disjoint union of complete graphs. Let GG be any graph. A set XV(G)X\subseteq V(G) is called a hitting set if GXG-X is a cluster graph. Given a graph GG and (vertex) cost function c:V(G)0c:V(G)\to\mathbb{Q}_{\geqslant 0}, the cluster vertex deletion problem (Cluster-VD) asks to find a hitting set XX whose cost c(X):=vXc(v)c(X):=\sum_{v\in X}c(v) is minimum. We denote by OPT(G,c)\operatorname{\mathrm{OPT}}(G,c) the minimum cost of a hitting set.

If GG and HH are two graphs, we say that GG contains HH if some induced subgraph of GG is isomorphic to HH. Otherwise, GG is said to be HH-free. Denoting by PkP_{k} the path on kk vertices, we easily see that a graph is a cluster graph if and only if it is P3P_{3}-free. Hence, XV(G)X\subseteq V(G) is a hitting set if and only if XX contains a vertex from each induced P3P_{3}.

Cluster-VD has applications in graph modeled data clustering in which an unknown set of samples may be contaminated. An optimal solution for Cluster-VD can recover a clustered data model, retaining as much of the original data as possible [24]. Vertex deletion problems such as Cluster-VD, where one seeks to locate vertices whose removal leaves a graph with desirable properties, often arise when measuring robustness and attack tolerance of real-life networks [1, 4, 25].

From what precedes, Cluster-VD is a hitting set problem in a 33-uniform hypergraph, and as such has a “textbook” 33-approximation algorithm (see for instance the introduction of [18]). Moreover, the problem has an approximation-preserving reduction from Vertex Cover. By adding a pendant edge to each vertex of GG, one checks that solving Cluster-VD on the new graph is equivalent to solving Vertex Cover on the original graph (see Proposition 4 for more details). Hence, obtaining a (2ε)(2-\varepsilon)-approximation algorithm for some ε>0\varepsilon>0 would contradict either the Unique Games Conjecture or P \neq NP.

The first non-trivial approximation algorithm for Cluster-VD was a 5/25/2-approximation due to You, Wang and Cao [39]. Shortly afterward, Fiorini, Joret and Schaudt gave a 7/37/3-approximation [18], and subsequently a 9/49/4-approximation [19].

1.1. Our contribution

In this paper, we close the gap between 22 and 9/4=2.259/4=2.25 and prove the following tight result.

Theorem 1.

Cluster-VD has a 22-approximation algorithm.

All previous approximation algorithms for Cluster-VD are based on the local ratio technique. See the survey of Bar-Yehuda, Bendel, Freund, and Rawitz [22] for background on this standard algorithmic technique. Our algorithm is no exception, see Algorithm 1 below. However, it significantly differs from previous algorithms in its crucial step (see Step 14 below). In fact, almost all our efforts in this paper focus on that particular step of the algorithm (see Theorem 2), which searches for a special type of induced subgraph of GG, which we now describe.

Let HH be an induced subgraph of GG, and let cH:V(H)0c_{H}:V(H)\to\mathbb{Q}_{\geqslant 0}. The weighted graph (H,cH)(H,c_{H}) is said to be α\alpha-good in GG (for some factor α1\alpha\geqslant 1) if cHc_{H} is not identically 0 and cH(XV(H))αOPT(H,cH)c_{H}(X\cap V(H))\leqslant\alpha\cdot\operatorname{\mathrm{OPT}}(H,c_{H}) holds for every (inclusionwise) minimal hitting set XX of GG. We overload terminology and say that an induced subgraph HH is α\alpha-good in GG if there exists a cost function cHc_{H} such that (H,cH)(H,c_{H}) is α\alpha-good in GG. We stress that the local cost function cHc_{H} is defined obliviously of the global cost function c:V(G)0c:V(G)\to\mathbb{Q}_{\geqslant 0}.

A pair of vertices u,uu,u^{\prime} of GG are called twins111We warn the reader that, in other papers, twins are usually called true twins, whereas two vertices which have the same set of neighbours are called false twins. Since we have no need of false twins in this paper, we have chosen to use twins in place of true twins. if uuE(G)uu^{\prime}\in E(G), and for all vV(Guu)v\in V(G-u-u^{\prime}), uvE(G)uv\in E(G) if and only if uvE(G)u^{\prime}v\in E(G). We say that GG is twin-free if GG has no twins. As in [18, 19], if GG has a pair of twins u,uu,u^{\prime}, then Cluster-VD admits an easy reduction step (see Steps 812). The idea is simply to add the cost of uu^{\prime} to that of uu and delete uu^{\prime}. This works since uu^{\prime} belongs to a minimal hitting set of GG if and only if uu does (see [18] for a complete proof). Therefore, when searching for α\alpha-good induced subgraphs, we may assume that GG is twin-free, which is crucial for our proofs.

Algorithm 1 Cluster-VD-apx(G,c)\textsc{Cluster-VD-apx}(G,c)
0:  (G,c)(G,c) a weighted graph
0:  XX a minimal hitting set of GG
1:  if GG is a cluster graph then
2:     XX\leftarrow\varnothing
3:  else if there exists uV(G)u\in V(G) with c(u)=0c(u)=0 then
4:     GGuG^{\prime}\leftarrow G-u
5:     c(v)c(v)c^{\prime}(v)\leftarrow c(v) for vV(G)v\in V(G^{\prime})
6:     XCluster-VD-apx(G,c)X^{\prime}\leftarrow\textsc{Cluster-VD-apx}(G^{\prime},c^{\prime})
7:     XXX\leftarrow X^{\prime} if XX^{\prime} is a hitting set of GG; XX{u}X\leftarrow X^{\prime}\cup\{u\} otherwise
8:  else if there exist twins u,uV(G)u,u^{\prime}\in V(G)  then
9:     GGuG^{\prime}\leftarrow G-u^{\prime}
10:     c(u)c(u)+c(u)c^{\prime}(u)\leftarrow c(u)+c(u^{\prime}); c(v)c(v)c^{\prime}(v)\leftarrow c(v) for vV(Gu)v\in V(G^{\prime}-u)
11:     XCluster-VD-apx(G,c)X^{\prime}\leftarrow\textsc{Cluster-VD-apx}(G^{\prime},c^{\prime})
12:     XXX\leftarrow X^{\prime} if XX^{\prime} does not contain uu; XX{u}X\leftarrow X^{\prime}\cup\{u^{\prime}\} otherwise
13:  else
14:     find a weighted induced subgraph (H,cH)(H,c_{H}) that is 22-good in GG
15:     λmax{λvV(H):c(v)λcH(v)0}\lambda^{*}\leftarrow\max\{\lambda\mid\forall v\in V(H):c(v)-\lambda c_{H}(v)\geqslant 0\}
16:     GGG^{\prime}\leftarrow G
17:     c(v)c(v)λcH(v)c^{\prime}(v)\leftarrow c(v)-\lambda^{*}c_{H}(v) for vV(H)v\in V(H); c(v)c(v)c^{\prime}(v)\leftarrow c(v) for vV(G)V(H)v\in V(G)\setminus V(H)
18:     XCluster-VD-apx(G,c)X\leftarrow\textsc{Cluster-VD-apx}(G^{\prime},c^{\prime})
19:  end if
20:  return XX

We will use two methods to establish α\alpha-goodness of induced subgraphs. We say that (H,cH)(H,c_{H}) is strongly α\alpha-good if cHc_{H} is not identically 0 and cH(V(H))αOPT(H,cH)c_{H}(V(H))\leqslant\alpha\cdot\operatorname{\mathrm{OPT}}(H,c_{H}). Clearly, if (H,cH)(H,c_{H}) is strongly α\alpha-good then (H,cH)(H,c_{H}) is α\alpha-good in GG, for every graph GG which contains HH. We say that HH itself is strongly α\alpha-good if (H,cH)(H,c_{H}) is strongly α\alpha-good for some cost function cHc_{H}.

Let Ni[v]N_{\leqslant i}[v] (resp. Ni(v)N_{i}(v)) be the set of vertices at distance at most (resp. equal to) ii from vertex vv. We abbreviate N(v):=N1(v)N(v):=N_{1}(v) and N[v]:=N1[v]N[v]:=N_{\leqslant 1}[v]. If we cannot find a strongly α\alpha-good induced subgraph in GG, we will find an induced subgraph HH that has a special vertex v0v_{0} such that N[v0]N[v_{0}] is entirely contained in HH, and a cost function cH:V(H)0c_{H}:V(H)\to\mathbb{Z}_{\geqslant 0} such that cH(v)1c_{H}(v)\geqslant 1 for all vertices vN[v0]v\in N[v_{0}] and cH(V(H))αOPT(H,cH)+1c_{H}(V(H))\leqslant\alpha\cdot\operatorname{\mathrm{OPT}}(H,c_{H})+1. Notice that no minimal hitting set XX can contain all the vertices of N[v0]N[v_{0}], since if XX contains N(v0)N(v_{0}), then v0v_{0} is an isolated clique. Hence, cH(XV(H))cH(V(H))1αOPT(H,cH)c_{H}(X\cap V(H))\leqslant c_{H}(V(H))-1\leqslant\alpha\cdot\operatorname{\mathrm{OPT}}(H,c_{H}) and so (H,cH)(H,c_{H}) is α\alpha-good in GG. We say that (H,cH)(H,c_{H}) (sometimes simply HH) is centrally α\alpha-good (in GG) with respect to v0v_{0}. Moreover, we call v0v_{0} the root vertex.

In order to illustrate these ideas, consider the following two examples (see Figure 1). First, let HH be a C4C_{4} (that is, a 44-cycle) contained in GG and 𝟏H\mathbf{1}_{H} denote the unit cost function on V(H)V(H). Then (H,𝟏H)(H,\mathbf{1}_{H}) is strongly 22-good, since vV(H)𝟏H(v)=4=2OPT(H,𝟏H)\sum_{v\in V(H)}\mathbf{1}_{H}(v)=4=2\operatorname{\mathrm{OPT}}(H,\mathbf{1}_{H}). Second, let HH be a P3P_{3} contained in GG, starting at a vertex v0v_{0} that has degree-11 in GG. Then (H,𝟏H)(H,\mathbf{1}_{H}) is centrally 22-good with respect to v0v_{0}, but it is not strongly 22-good.

11111111
111111
Figure 1. (C4,𝟏C4)(C_{4},\mathbf{1}_{C_{4}}) on the left is strongly 22-good. (P3,𝟏P3)(P_{3},\mathbf{1}_{P_{3}}), on the right, is centrally 22-good in GG with respect to the gray vertex, which has degree 1 in GG.

Each time we find a 22-good weighted induced subgraph HH in GG, the local ratio technique allows us to recurse on an induced subgraph GG^{\prime} of GG in which at least one vertex of HH is deleted from GG. For example, the 22-good induced subgraphs mentioned above allow us to reduce to input graphs GG that are C4C_{4}-free and have minimum degree at least 22.

The crux of our algorithm, Step 14, relies on the following structural result.

Theorem 2.

Let GG be a twin-free graph, let v0v_{0} be any vertex of GG, and let HH be the subgraph of GG induced by N2[v0]N_{\leqslant 2}[v_{0}]. There exists a cost function cH:V(H)0c_{H}:V(H)\to\mathbb{Z}_{\geqslant 0} such that (H,cH)(H,c_{H}) is either strongly 22-good, or centrally 22-good in GG with respect to v0v_{0}. Moreover, cHc_{H} can be constructed in polynomial time.

We also study Cluster-VD from the polyhedral point of view. In particular we investigate how well linear programming (LP) relaxations can approximate the optimal value of Cluster-VD. As in [16, 13, 9], we use the following notion of LP relaxations which, by design, allows for extended formulations.

Fix a graph GG. Let d0d\in\mathbb{Z}_{\geqslant 0} be an arbitrary dimension. A system of linear inequalities AxbAx\geqslant b in d\mathbb{R}^{d} defines an LP relaxation of Cluster-VD on GG if the following hold: (i) For every hitting set XV(G)X\subseteq V(G), we have a point πXd\pi^{X}\in\mathbb{R}^{d} satisfying AπXbA\pi^{X}\geqslant b; (ii) For every cost function c:V(G)0c:V(G)\to\mathbb{Q}_{\geqslant 0}, we have an affine function fc:df_{c}:\mathbb{R}^{d}\to\mathbb{R}; (iii) For all hitting sets XV(G)X\subseteq V(G) and cost functions c:V(G)0c:V(G)\to\mathbb{Q}_{\geqslant 0}, the condition fc(πX)=c(X)f_{c}(\pi^{X})=c(X) holds.

The size of the LP relaxation AxbAx\geqslant b is defined as the number of rows of AA. For every cost function cc, the quantity LP(G,c):=min{fc(x)Axb}\operatorname{\mathrm{LP}}(G,c):=\min\{f_{c}(x)\mid Ax\geqslant b\} gives a lower bound on OPT(G,c)\operatorname{\mathrm{OPT}}(G,c). The integrality gap of the LP relaxation AxbAx\geqslant b is defined as sup{OPT(G,c)/LP(G,c)c0V(G)}\sup\{\operatorname{\mathrm{OPT}}(G,c)/\operatorname{\mathrm{LP}}(G,c)\mid c\in\mathbb{Q}_{\geqslant 0}^{V(G)}\}.

Letting 𝒫3(G)\mathcal{P}_{3}(G) denote the collection of all vertex sets {u,v,w}\{u,v,w\} that induce a P3P_{3} in GG, we define

P(G):={x[0,1]V(G){u,v,w}𝒫3(G):xu+xv+xw1}.P(G):=\{x\in[0,1]^{V(G)}\mid\forall\{u,v,w\}\in\mathcal{P}_{3}(G):x_{u}+x_{v}+x_{w}\geqslant 1\}.

We let 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G) denote the relaxation obtained from P(G)P(G) by applying rr rounds of the Sherali-Adams hierarchy [33], a standard procedure to derive strengthened LP relaxations of binary linear programming problems. If a cost function c:V(G)0c:V(G)\to\mathbb{Q}_{\geqslant 0} is provided, we let

𝖲𝖠r(G,c):=min{vV(G)c(v)xvx𝖲𝖠r(G)}\operatorname{\mathsf{SA}}_{r}(G,c):=\min\{\sum_{v\in V(G)}c(v)x_{v}\mid x\in\operatorname{\mathsf{SA}}_{r}(G)\}

denote the optimum value of the corresponding linear programming relaxation.

It is not hard to see that the straightforward LP relaxation P(G)P(G) has worst case integrality gap equal to 33 (by worst case, we mean that we take the supremum over all graphs GG). Indeed, for a random nn-vertex graph, OPT(G,𝟏G)=nO(log2n)\operatorname{\mathrm{OPT}}(G,\mathbf{1}_{G})=n-O(\log^{2}n) with high probability. This can be easily proved via the probabilistic method. A similar proof can be found, for instance, in the introduction of [2]). On the other hand, LP(G,𝟏G)n/3\operatorname{\mathrm{LP}}(G,\mathbf{1}_{G})\leqslant n/3, since the vector with all coordinates equal to 13\frac{1}{3} is feasible for P(G)P(G).

On the positive side, we show how applying one round of the Sherali-Adams hierarchy gives a relaxation with integrality gap at most 5/2=2.55/2=2.5, see Theorem 3. To complement this, we prove that the worst case integrality gap of the relaxation is precisely 5/25/2, see Theorem 4. We then show that the integrality gap decreases to 2+ε2+\varepsilon after applying poly(1/ε)\mathrm{poly}(1/\varepsilon) rounds, see Theorem 5.

On the negative side, applying known results on Vertex Cover [9], we show that no polynomial-size LP relaxation of Cluster-VD can have integrality gap at most 2ε2-\varepsilon for some ε>0\varepsilon>0. As is the case for similar lower bounds (see [31, 8]), this result is unconditional: it does not rely on P \neq NP nor the Unique Games Conjecture.

1.2. Comparison to previous works

We now revisit all previous approximation algorithms for Cluster-VD [39, 18, 19]. The presentation given here slightly departs from [39, 18], and explains in a unified manner what is the bottleneck in each of the algorithms.

Fix k{3,4,5}k\in\{3,4,5\}, and let α:=(2k1)/(k1)\alpha:=(2k-1)/(k-1). Notice that α=5/2\alpha=5/2 if k=3k=3, α=7/3\alpha=7/3 if k=4k=4 and α=9/4\alpha=9/4 if k=5k=5. In [19, Lemma 3], it is shown that if a twin-free graph GG contains a kk-clique, then one can find an induced subgraph HH containing the kk-clique and a cost function cHc_{H} such that (H,cH)(H,c_{H}) is strongly α\alpha-good.

Therefore, in order to derive an α\alpha-approximation for Cluster-VD, one may assume without loss of generality that the input graph GG is twin-free and has no kk-clique. Let v0v_{0} be a maximum degree vertex in GG, and let HH denote the subgraph of GG induced by N2[v0]N_{\leqslant 2}[v_{0}]. In [19], it is shown by a tedious case analysis that one can construct a cost function cHc_{H} such that (H,cH)(H,c_{H}) is 22-good in GG, using the fact that GG has no kk-clique.

The simplest case occurs when k=3k=3. Then N(v0)N(v_{0}) is a stable set. Letting cH(v0):=d(v0)1c_{H}(v_{0}):=d(v_{0})-1, cH(v):=1c_{H}(v):=1 for vN(v0)v\in N(v_{0}) and cH(v):=0c_{H}(v):=0 for the other vertices of HH, one easily sees that (H,cH)(H,c_{H}) is (centrally) 22-good in GG . For higher values of kk, one has to work harder.

In this paper, we show that one can always, and in polynomial time, construct a cost function cHc_{H} on the vertices at distance at most 22 from v0v_{0} that makes (H,cH)(H,c_{H}) 22-good in GG, provided that GG is twin-free, see Theorem 2. This result was the main missing ingredient in previous approaches, and single-handedly closes the approximability status of Cluster-VD.

1.3. Other related works

Cluster-VD has also been widely studied from the perspective of fixed parameter tractability. Given a graph GG and parameter kk as input, the task is to decide if GG has a hitting set XX of size at most kk. A 2kn𝒪(1)2^{k}n^{\mathcal{O}(1)}-time algorithm for this problem was given by Hüffner, Komusiewicz, Moser, and Niedermeier [24]. This was subsequently improved to a 1.911kn𝒪(1)1.911^{k}n^{\mathcal{O}(1)}-time algorithm by Boral, Cygan, Kociumaka, and Pilipczuk [11], and a 1.811kn𝒪(1)1.811^{k}n^{\mathcal{O}(1)}-time algorithm by Tsur [37]. By the general framework of Fomin, Gaspers, Lokshtanov, and Saurabh [20], these parametrized algorithms can be transformed into exponential algorithms which compute the size of a minimum hitting set for GG exactly, the fastest of which runs in time 𝒪(1.488n)\mathcal{O}(1.488^{n}).

For polyhedral results, Hosseinian and Butenko [23] gives some facet-defining inequalities of the Cluster-VD polytope, as well as complete linear descriptions for special classes of graphs.

Another related problem is the feedback vertex set problem in tournaments (FVST). Given a tournament TT with costs on the vertices, the task is to find a minimum cost set of vertices XX such that TXT-X does not contain a directed cycle.

For unit costs, note that Cluster-VD is equivalent to the problem of deleting as few elements as possible from a symmetric relation to obtain a transitive relation, while FVST is equivalent to the problem of deleting as few elements as possible from an antisymmetric and complete relation to obtain a transitive relation.

In a tournament, hitting all directed cycles is equivalent to hitting all directed triangles, so FVST is also a hitting set problem in a 33-uniform hypergraph. Moreover, FVST is also UGC-hard to approximate to a constant factor smaller than 22. Cai, Deng, and Zang [14] gave a 5/25/2-approximation algorithm for FVST, which was later improved to a 7/37/3-approximation algorithm by Mnich, Williams, and Végh [30]. Lokshtanov, Misra, Mukherjee, Panolan, Philip, and Saurabh [29] recently gave a randomized 22-approximation algorithm, but no deterministic (polynomial-time) 22-approximation algorithm is known. For FVST, one round of the Sherali-Adams hierarchy actually provides a 7/37/3-approximation [5].

Among other related covering and packing problems, Fomin, Le, Lokshtanov, Saurabh, Thomassé, and Zehavi [21] studied both Cluster-VD and FVST from the kernelization perspective. They proved that the unweighted versions of both problems admit subquadratic kernels: 𝒪(k53)\mathcal{O}(k^{\frac{5}{3}}) for Cluster-VD and 𝒪(k32)\mathcal{O}(k^{\frac{3}{2}}) for FVST.

1.4. Overview of the proof

We give a sketch of the proof of Theorem 2. Recall that H=G[N2[v0]]H=G[N_{\leqslant 2}[v_{0}]]. If the subgraph induced by N(v0)N(v_{0}) contains a hole (that is, an induced cycle of length at least 44), then HH is strongly 22-good by Lemma 2.1. If the subgraph induced by N(v0)N(v_{0}) contains an induced 2P32P_{3} (that is, two disjoint copies of P3P_{3} with no edges between each other), then HH is strongly 22-good by Lemma 2.1. This allows us to reduce to the case where the subgraph induced by N(v0)N(v_{0}) is chordal and 2P32P_{3}-free.

Lemma 2.2 then gives a direct construction of a cost function cHc_{H} which certifies that (H,cH)(H,c_{H}) is centrally 22-good, provided that the subgraph induced by N[v0]N[v_{0}] is twin-free. This is the crucial step of the proof. It serves as the base case of the induction. Here, we use a slick observation due to Lokshtanov [28]: since the subgraph induced by N(v0)N(v_{0}) is chordal and 2P32P_{3}-free, it has a hitting set that is a clique. In a previous version, our proof of Theorem 2 was slightly more complicated.

We show inductively that we can reduce to the case where the subgraph induced by N[v0]N[v_{0}] is twin-free. The idea is to delete vertices from HH to obtain a smaller graph HH^{\prime}, while preserving certain properties, and then compute a suitable cost function cHc_{H} for HH, given a suitable cost function cHc_{H^{\prime}} for HH^{\prime}. We delete vertices at distance 22 from v0v_{0}. When this creates twins in HH, we delete one vertex from each pair of twins. At the end, we obtain a twin-free induced subgraph of H[N[v0]]H[N[v_{0}]], which corresponds to our base case.

We conclude the introduction with a brief description of the different sections of the paper. Section 2 is entirely devoted to the proof of Theorem 2. The proof of Theorem 1 is given in Section 3, together with a complexity analysis of Algorithm 1. Section 4 presents our polyhedral results. A conclusion is given in Section 5. There, we state a few open problems for future research.

Conference version. An abstract of this paper appeared in the proceedings of IPCO 2021 [6]. The current paper is an extended version with full proofs and detailed discussions. In particular, the running time analysis of our algorithm in Section 3 is new, and the polyhedral results of Section 4 appear without proof in the conference version.

2. Finding 22-good induced subgraphs

The goal of this section is to prove Theorem 2. Our proof is by induction on the number of vertices in H:=G[N2[v0]]H:=G[N_{\leq 2}[v_{0}]]. First, we quickly show that we can assume that the subgraph induced by N(v0)N(v_{0}) is chordal and 2P32P_{3}-free. Using this, we prove the theorem in the particular case where the subgraph induced by N[v0]N[v_{0}] is twin-free. Finally, we prove the theorem in the general case by showing how to deal with twins.

2.1. Restricting to chordal, 2P32P_{3}-free neighborhoods

A vertex of GG is apex if it is adjacent to all the other vertices of GG. A wheel is a graph obtained from a cycle by adding an apex vertex (called the center).

As pointed out earlier in the introduction, 44-cycles are strongly 22-good. This implies that the wheel on 55 vertices is strongly 22-good (putting a zero cost on the center). We now show that all wheels on at least 55 vertices are strongly 22-good. This allows our algorithm to restrict to input graphs such that the subgraph induced on each neighborhood is chordal. In a similar way, we show that we can further restrict such neighborhoods to be 2P32P_{3}-free.

Lemma \thelemma.

Let H:=WkH:=W_{k} be a wheel on k5k\geqslant 5 vertices and center v0v_{0}, let cH(v0):=k5c_{H}(v_{0}):=k-5 and cH(v):=1c_{H}(v):=1 for vV(Hv0)v\in V(H-v_{0}). Then (H,cH)(H,c_{H}) is strongly 22-good.

Proof.

Notice that OPT(H,cH)k3\operatorname{\mathrm{OPT}}(H,c_{H})\geqslant k-3 since a hitting set either contains v0v_{0} and at least 22 more vertices, or does not contain v0v_{0} but contains k3k-3 other vertices. Hence, vV(H)cH(v)=k5+k1=2(k3)2OPT(H,cH)\sum_{v\in V(H)}c_{H}(v)=k-5+k-1=2(k-3)\leqslant 2\operatorname{\mathrm{OPT}}(H,c_{H}). ∎

Lemma \thelemma.

Let HH be the graph obtained from 2P32P_{3} by adding an apex vertex v0v_{0}. Let cH(v0):=2c_{H}(v_{0}):=2 and cH(v):=1c_{H}(v):=1 for vV(Hv0)v\in V(H-v_{0}). Then (H,cH)(H,c_{H}) is strongly 22-good.

Proof.

It is easy to check that OPT(H,cH)4\operatorname{\mathrm{OPT}}(H,c_{H})\geqslant 4. Thus, vV(H)cH(v)=82OPT(H,cH)\sum_{v\in V(H)}c_{H}(v)=8\leqslant 2\operatorname{\mathrm{OPT}}(H,c_{H}). ∎

2.2. When HH is twin-free

Throughout this section, we assume that HH is a twin-free graph with an apex vertex v0v_{0} such that Hv0H-v_{0} is chordal and 2P32P_{3}-free. Our goal is to construct a cost function cHc_{H} that certifies that HH is centrally 22-good.

It turns out to be easier to define the cost function on V(Hv0)=N(v0)V(H-v_{0})=N(v_{0}) first, and then adjust the cost of v0v_{0}. This is the purpose of the next lemma. Below, ω(G,c)\omega(G,c) denotes the maximum weight of a clique in weighted graph (G,c)(G,c).

Lemma \thelemma.

Let HH be a graph with an apex vertex v0v_{0} and H:=Hv0H^{\prime}:=H-v_{0}. Let cH:V(H)1c_{H^{\prime}}:V(H^{\prime})\rightarrow\mathbb{Z}_{\geqslant 1} be a cost function such that

  1. (i)

    cH(V(H))2ω(H,cH)c_{H^{\prime}}(V(H^{\prime}))\geqslant 2\omega(H^{\prime},c_{H^{\prime}}) and

  2. (ii)

    OPT(H,cH)ω(H,cH)1\operatorname{\mathrm{OPT}}(H^{\prime},c_{H^{\prime}})\geqslant\omega(H^{\prime},c_{H^{\prime}})-1.

Then we can extend cHc_{H^{\prime}} to a function cH:V(H)1c_{H}:V(H)\rightarrow\mathbb{Z}_{\geqslant 1} such that cH(V(H))2OPT(H,cH)+1c_{H}(V(H))\leqslant 2\operatorname{\mathrm{OPT}}(H,c_{H})+1. In other words, (H,cH)(H,c_{H}) is centrally 22-good with respect to v0v_{0}.

Proof.

Notice that

OPT(H,cH)=min(cH(v0)+OPT(H,cH),cH(V(H))ω(H,cH)),\operatorname{\mathrm{OPT}}(H,c_{H})=\min(c_{H}(v_{0})+\operatorname{\mathrm{OPT}}(H^{\prime},c_{H^{\prime}}),c_{H^{\prime}}(V(H^{\prime}))-\omega(H^{\prime},c_{H^{\prime}})),

since if XX is hitting set of HH that does not contain v0v_{0}, then HXH-X is a clique.

Let a=max(1,cH(V(H))2OPT(H,cH)1)a=\max(1,c_{H^{\prime}}(V(H^{\prime}))-2\operatorname{\mathrm{OPT}}(H^{\prime},c_{H^{\prime}})-1) and b=cH(V(H))2ω(H,cH)+1b=c_{H^{\prime}}(V(H^{\prime}))-2\omega(H^{\prime},c_{H^{\prime}})+1. Choose cH(v0)1c_{H}(v_{0})\in\mathbb{Z}_{\geqslant 1} such that acH(v0)ba\leqslant c_{H}(v_{0})\leqslant b. Note that cH(v0)c_{H}(v_{0}) exists since aba\leqslant b by conditions (i) and (ii).

Suppose OPT(H,cH)=cH(v0)+OPT(H,cH)\operatorname{\mathrm{OPT}}(H,c_{H})=c_{H}(v_{0})+\operatorname{\mathrm{OPT}}(H^{\prime},c_{H^{\prime}}). Since acH(v0)a\leqslant c_{H}(v_{0}),

cH(V(H))2cH(v0)+2OPT(H,cH)+1=2OPT(H,cH)+1.c_{H}(V(H))\leqslant 2c_{H}(v_{0})+2\operatorname{\mathrm{OPT}}(H^{\prime},c_{H^{\prime}})+1=2\operatorname{\mathrm{OPT}}(H,c_{H})+1.

Suppose OPT(H,cH)=cH(V(H))ω(H,cH)\operatorname{\mathrm{OPT}}(H,c_{H})=c_{H^{\prime}}(V(H^{\prime}))-\omega(H^{\prime},c_{H^{\prime}}). Since cH(v0)bc_{H}(v_{0})\leqslant b,

cH(V(H))2cH(V(H))2ω(H,cH)+1=2OPT(H,cH)+1.c_{H}(V(H))\leqslant 2c_{H^{\prime}}(V(H^{\prime}))-2\omega(H^{\prime},c_{H^{\prime}})+1=2\operatorname{\mathrm{OPT}}(H,c_{H})+1.

In either case, cH(V(H))2OPT(H,cH)+1c_{H}(V(H))\leqslant 2\operatorname{\mathrm{OPT}}(H,c_{H})+1, as required. ∎

We abuse notation and regard a clique XX of a graph as both a set of vertices and a subgraph. We call a hitting set XX of a graph GG a hitting clique if XX is also a clique.

Lemma \thelemma.

Every chordal, 2P32P_{3}-free graph contains a hitting clique.

Proof.

Let GG be a chordal, 2P32P_{3}-free graph. Since GG is chordal, GG admits a clique tree TT (see [10]). In TT, the vertices are the maximal cliques of GG and, for every two maximal cliques KK, KK^{\prime}, the intersection KKK\cap K^{\prime} is contained in every clique of the KKKK^{\prime} path in TT. For an edge e:=KKe:=KK^{\prime} of TT, Let T1T_{1} and T2T_{2} be the components of TeT-e and G1G_{1} and G2G_{2} be the subgraphs of GG induced by the union of all the cliques in T1T_{1} and T2T_{2}, respectively. It is easy to see that deleting KKK\cap K^{\prime} separates G1G_{1} from G2G_{2} in GG. Now, since GG is 2P32P_{3}-free, at least one of G1:=G1(KK)G_{1}^{\prime}:=G_{1}-(K\cap K^{\prime}) or G2:=G2(KK)G_{2}^{\prime}:=G_{2}-(K\cap K^{\prime}) is a cluster graph. If both G1G_{1}^{\prime} and G2G_{2}^{\prime} are cluster graphs, we are done since KKK\cap K^{\prime} is the desired hitting clique. Otherwise, if GiG_{i}^{\prime} is not a cluster graph, then we can orient ee towards TiT_{i}. Applying this argument on each edge, we define an orientation of TT, which must have a sink K0K_{0}. But then removing K0K_{0} from GG leaves a cluster graph, and we are done. Since the clique tree of a chordal graph can be constructed in polynomial time [10], the hitting clique can be found in polynomial time. ∎

HHHv0H-v_{0}vvSvS_{v}
Figure 2. Here HH is twin-free, v0v_{0} is the gray vertex and the blue vertices form a hitting clique K0K_{0} for Hv0H-v_{0}, which is chordal and 2P32P_{3}-free. For vK0v\in K_{0}, the set SvS_{v} defined as in the proof of Lemma 2.2 consists of the unique maximal independent set containing vv. We obtain cH=(𝟔,1,1,1,1,3,3,3)c_{H}=(\mathbf{6},{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}{1}},{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}{1}},{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}{1}},{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}{1}},3,3,3), which is easily seen to be centrally 2-good with respect to v0v_{0}.

We are ready to prove the base case for Theorem 2. For a graph HH, we let |H||H| denote the number of vertices of HH.

Lemma \thelemma.

Let HH be a twin-free graph with an apex vertex v0v_{0} such that Hv0H-v_{0} is chordal and 2P32P_{3}-free. There exists a cost function cHc_{H} such that (H,cH)(H,c_{H}) is centrally 22-good with respect to v0v_{0}. Moreover, cHc_{H} can be found in time 𝒪(|H|3)\mathcal{O}(|H|^{3}).

Proof.

By Lemma 2.2, some maximal clique of Hv0H-v_{0}, say K0K_{0}, is a hitting set.

We claim that there is a family of stable sets 𝒮={SvvK0}\mathcal{S}=\{S_{v}\mid v\in K_{0}\} of Hv0H-v_{0} satisfying the following properties:

  1. (P1)

    every vertex of Hv0H-v_{0} is contained in some SvS_{v};

  2. (P2)

    for each vK0v\in K_{0}, SvS_{v} contains vv and at least one other vertex;

  3. (P3)

    for every two distinct vertices v,vK0v,v^{\prime}\in K_{0}, H[SvSv]H[S_{v}\cup S_{v^{\prime}}] contains a P3P_{3}.

Before proving the claim, we prove that it implies the lemma. Consider the cost function cH:=vK0χSvc_{H^{\prime}}:=\sum_{v\in K_{0}}\chi^{S_{v}} on the vertices of H:=Hv0H^{\prime}:=H-v_{0} defined by giving to each vertex uu a cost equal to the number of stable sets SvS_{v} that contain uu (see Figure 2). It suffices to show that cHc_{H^{\prime}} satisfies the conditions of Lemma 2.2 and can therefore be extended to a cost function cHc_{H} on V(H)V(H) such that (H,cH)(H,c_{H}) is centrally 22-good with respect to v0v_{0}.

First, by (P1), we have cH(u)1c_{H^{\prime}}(u)\in\mathbb{Z}_{\geqslant 1} for all uV(H)u\in V(H^{\prime}). Second, condition (i) of Lemma 2.2 follows from (P2) since each stable set SvS_{v} contributes at least two units to cH(V(H))c_{H^{\prime}}(V(H^{\prime})) and at most one unit to ω(H,cH)\omega(H^{\prime},c_{H^{\prime}}). Third, (P3) implies that every hitting set of HH^{\prime} meets every stable set SvS_{v}, except possibly one. Hence, OPT(H,cH)|K0|1\operatorname{\mathrm{OPT}}(H^{\prime},c_{H^{\prime}})\geqslant|K_{0}|-1. Also, every clique of HH^{\prime} meets every stable set SvS_{v} in at most one vertex, implying that ω(H,cH)|K0|\omega(H^{\prime},c_{H^{\prime}})\leqslant|K_{0}|, and equality holds since cH(K0)=|K0|c_{H^{\prime}}(K_{0})=|K_{0}|. Putting the last two observations together, we see that OPT(H,cH)|K0|1=ω(H,cH)1\operatorname{\mathrm{OPT}}(H^{\prime},c_{H^{\prime}})\geqslant|K_{0}|-1=\omega(H^{\prime},c_{H^{\prime}})-1 and hence condition (ii) of Lemma 2.2 holds.

Now, we prove that our claim holds. Let K1,,KtK_{1},\dots,K_{t} denote the clusters (maximal cliques) of cluster graph Hv0K0H-v_{0}-K_{0}. For i[t]i\in[t], consider the submatrix AiA_{i} of the adjacency matrix A(H)A(H) with rows indexed by the vertices of K0K_{0} and columns indexed by the vertices of KiK_{i}.

Notice that AiA_{i} contains neither (1001)\begin{pmatrix}1&0\\ 0&1\end{pmatrix} nor (0110)\begin{pmatrix}0&1\\ 1&0\end{pmatrix} as a submatrix, as this would give a C4C_{4} contained in Hv0H-v_{0}, contradicting the chordality of Hv0H-v_{0}. Hence, after permuting its rows and columns if necessary, AiA_{i} can be assumed to be staircase-shaped. That is, every row of AiA_{i} is nonincreasing and every column nondecreasing. Notice also that AiA_{i} does not have two equal columns, since these would correspond to two vertices of KiK_{i} that are twins.

For each KiK_{i} that is not complete to vK0v\in K_{0}, define φi(v)\varphi_{i}(v) as the vertex uKiu\in K_{i} whose corresponding column in AiA_{i} is the first containing a 0 in row vv. Now, for each vv, let SvS_{v} be the set including vv, and φi(v)\varphi_{i}(v), for each KiK_{i} that is not complete to vv.

Because KK is maximal, no vertex uKiu\in K_{i} is complete to K0K_{0}. Since no two columns of AiA_{i} are identical, we must have u=φi(v)u=\varphi_{i}(v) for some vK0v\in K_{0}. This proves (P1).

Notice that vSvv\in S_{v} by construction and that |Sv|2|S_{v}|\geqslant 2 since otherwise, vv would be apex in HH and thus a twin of v0v_{0}. Hence, (P2) holds.

Finally, consider any two distinct vertices v,vK0v,v^{\prime}\in K_{0}. Since v,vv,v^{\prime} are not twins, the edge vvvv^{\prime} must be in a P3P_{3} contained in Hv0H-v_{0}. Assume, without loss of generality, that there is a vertex uKiu\in K_{i} adjacent to vv and not to vv^{\prime} for some i[t]i\in[t]. Then {v,v,φi(v)}\{v,v^{\prime},\varphi_{i}(v^{\prime})\} induces a P3P_{3} contained in H[SvSv]H[S_{v}\cup S_{v^{\prime}}], proving (P3). This concludes the proof of the claim.

We observe that the collection 𝒮\mathcal{S} can be computed in 𝒪(|H|3)\mathcal{O}(|H|^{3}) time. This yields the restriction cHc_{H} to HH^{\prime}. Since HH^{\prime} is chordal, ω(H,cH)\omega(H^{\prime},c_{H^{\prime}}) can be computed in 𝒪(|H|2)\mathcal{O}(|H^{\prime}|^{2}) time. We then just let cH(v0):=cH(V(H))2ω(H,cH)+1=cH(V(H))2|K0|+1c_{H}(v_{0}):=c_{H^{\prime}}(V(H^{\prime}))-2\omega(H^{\prime},c_{H^{\prime}})+1=c_{H^{\prime}}(V(H^{\prime}))-2|K_{0}|+1. This sets cH(v0)c_{H}(v_{0}) equal to the value bb in the proof of Lemma 2.2. Therefore, cHc_{H} can be constructed in 𝒪(|H|3)\mathcal{O}(|H|^{3}) time. ∎

2.3. Handling twins in G[N[v0]]G[N[v_{0}]]

We now deal with the general case where G[N[v0]]G[N[v_{0}]] contains twins. We start with an extra bit of terminology relative to twins. Let GG be a twin-free graph, and v0V(G)v_{0}\in V(G). Suppose that u,uu,u^{\prime} are twins in G[N[v0]]G[N[v_{0}]]. Since GG is twin-free, there exists a vertex vv that is adjacent to exactly one of uu, uu^{\prime} in GG. We say that vv is a distinguisher for the edge uuuu^{\prime} (or for the pair {u,u}\{u,u^{\prime}\}). Notice that either uuvuu^{\prime}v or uuvu^{\prime}uv is an induced P3P_{3}. Notice also that vv is at distance 22 from v0v_{0}.

Now, consider a graph HH with a special vertex v0V(H)v_{0}\in V(H) (the root vertex) such that

  1. (H1)

    every vertex is at distance at most 22 from v0v_{0}, and

  2. (H2)

    every pair of vertices that are twins in H[N[v0]]H[N[v_{0}]] has a distinguisher.

Let vv be any vertex that is at distance 22 from v0v_{0}. Consider the equivalence relation \equiv on N[v0]N[v_{0}] with uuu\equiv u^{\prime} whenever u=uu=u^{\prime} or u,uu,u^{\prime} are twins in HvH-v. Observe that the equivalence classes of \equiv are of size at most 22 since, if u,u,u′′u,u^{\prime},u^{\prime\prime} are distinct vertices with uuu′′u\equiv u^{\prime}\equiv u^{\prime\prime}, then vv cannot distinguish every edge of the triangle on uu, uu^{\prime} and u′′u^{\prime\prime}. Hence, two of these vertices are twins in HH, which contradicts (H2).

From what precedes, the edges contained in N[v0]N[v_{0}] that do not have a distinguisher in HvH-v form a matching M:={u1u1,,ukuk}M:=\{u_{1}u^{\prime}_{1},\ldots,u_{k}u^{\prime}_{k}\} (possibly, k=0k=0). Let HH^{\prime} denote the graph obtained from HH by deleting vv and exactly one endpoint from each edge of MM. Notice that the resulting subgraph is the same, up to isomorphism, no matter which endpoints are chosen.

The lemma below states how we can obtain a cost function cHc_{H} that certifies that HH is centrally 22-good from a cost function cHc_{H^{\prime}} that certifies that HH^{\prime} is centrally 22-good. It is inspired by [19, Lemma 3]. See Figure 3 for an example.

Lemma \thelemma.

Let HH be any graph satisfying (H1) and (H2) for some v0V(H)v_{0}\in V(H). Let vN2(v0)v\in N_{2}(v_{0}). Let M:={u1u1,,ukuk}M:=\{u_{1}u^{\prime}_{1},\ldots,u_{k}u^{\prime}_{k}\} be the matching formed by the edges in N[v0]N[v_{0}] whose unique distinguisher is vv, where uiv0u^{\prime}_{i}\neq v_{0} for all ii (we allow the case k=0k=0). Let H:=Hu1ukvH^{\prime}:=H-u^{\prime}_{1}-\dots-u^{\prime}_{k}-v. Given a cost function cHc_{H^{\prime}} on V(H)V(H^{\prime}), define a cost function cHc_{H} on V(H)V(H) by letting cH(ui):=cH(ui)c_{H}(u^{\prime}_{i}):=c_{H^{\prime}}(u_{i}) for i[k]i\in[k], cH(v):=i=1kcH(ui)=i=1kcH(ui)c_{H}(v):=\sum_{i=1}^{k}c_{H^{\prime}}(u_{i})=\sum_{i=1}^{k}c_{H}(u^{\prime}_{i}), and cH(u):=cH(u)c_{H}(u):=c_{H^{\prime}}(u) otherwise. First, HH^{\prime} satisfies (H1) and (H2). Second, if (H,cH)(H^{\prime},c_{H^{\prime}}) is centrally 22-good, then (H,cH)(H,c_{H}) is centrally 22-good.

Proof.

For the first part, notice that HH^{\prime} satisfies (H1) by our choice of vv. Indeed, deleting vv does not change the distance of the remaining vertices from v0v_{0}.

We now prove that HH^{\prime} satisfies (H2). First notice that in HvH-v the twins in N[v0]N[v_{0}] are exactly (u1,u1),,(uk,uk)(u_{1},u^{\prime}_{1}),\ldots,(u_{k},u^{\prime}_{k}). Next, for each edge eE(H[N[v0]])e\in E(H[N[v_{0}]]), ee has at least one distinguisher different than vv unless eMe\in M. Moreover, each uiu_{i}^{\prime} is a distinguisher for ee if and only if uiu_{i} is a distinguisher for ee. Thus, each edge of H[N[v0]]H^{\prime}[N[v_{0}]] still has at least one distinguisher, which proves (H2).

For the second part, notice that cH(u)1c_{H}(u)\geqslant 1 for all uN[v0]u\in N[v_{0}] since cH(u)1c_{H^{\prime}}(u)\geqslant 1 for all uN[v0]{v,u1,,uk}u\in N[v_{0}]\setminus\{v,u^{\prime}_{1},\ldots,u^{\prime}_{k}\}. To argue that cH(V(H))2OPT(H,cH)+1c_{H}(V(H))\leqslant 2\operatorname{\mathrm{OPT}}(H,c_{H})+1, one can check that any hitting set of HH must either contain vv or at least one endpoint of each edge uiuiMu_{i}u_{i}^{\prime}\in M. Hence OPT(H,cH)i=1kcH(ui)+OPT(H,cH)\operatorname{\mathrm{OPT}}(H,c_{H})\geqslant\sum_{i=1}^{k}c_{H^{\prime}}(u_{i})+\operatorname{\mathrm{OPT}}(H^{\prime},c_{H^{\prime}}).

Since (H,cH)(H,c_{H^{\prime}}) is centrally 22-good, cH(V(H))2OPT(H,cH)+1c_{H^{\prime}}(V(H^{\prime}))\leqslant 2\operatorname{\mathrm{OPT}}(H^{\prime},c_{H^{\prime}})+1. It follows that

cH(V(H))\displaystyle c_{H}(V(H)) =cH(v)+i=1kcH(ui)=2i=1kcH(ui)+cH(V(H))2OPT(H,cH)+1\displaystyle=\overbrace{c_{H}(v)+\sum_{i=1}^{k}c_{H}(u^{\prime}_{i})}^{=2\sum_{i=1}^{k}c_{H^{\prime}}(u_{i})}+\overbrace{c_{H^{\prime}}(V(H^{\prime}))}^{\leqslant 2\operatorname{\mathrm{OPT}}(H^{\prime},c_{H^{\prime}})+1}
2OPT(H,cH)+1.\displaystyle\leqslant 2\operatorname{\mathrm{OPT}}(H,c_{H})+1\,.
vvHHHvH-vHH^{\prime}
Figure 3. Here v0v_{0} is the gray vertex and vv is the red vertex. HvH-v violates (H2), and contains two pairs of twins, indicated by the red edges. Lemma 2.3 applies. We see that HH^{\prime} is a P3P_{3}, for which Lemma 2.2 gives cH=𝟏Hc_{H^{\prime}}=\mathbf{1}_{H^{\prime}}. In (H,cH)(H,c_{H}), all vertices get a unit cost except vv, which gets a cost of 22, since there are 22 pairs of twins in HvH-v. Thus, cH=(𝟏,2,1,1,1,1)c_{H}=(\mathbf{1},{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}{2}},1,1,1,1), where the entries corresponding to v0v_{0} and vv are bold and red, respectively.

2.4. Proof of Theorem 2

We are ready to prove Theorem 2.

Proof.

We can decide in polynomial time (see for instance [35]) if H[N(v0)]H[N(v_{0})] is chordal, and if not, output a hole of H[N(v0)]H[N(v_{0})]. If the latter holds, we are done by Lemma 2.1. If the former holds, we can decide in polynomial time (see [34], and the proof of Lemma 2.2) whether HH contains a 2P32P_{3}. If it does, we are done by Lemma 2.1.

From now on, assume that the subgraph induced by N(v0)N(v_{0}) is chordal and 2P32P_{3}-free. This is done without loss of generality. Notice that hypotheses (H1) and (H2) from Section 2.3 hold for HH. This is obvious for (H1). To see why (H2) holds, remember that GG is twin-free. Hence, every edge uuuu^{\prime} contained in N[v0]N[v_{0}] must have a distinguisher in GG, which is in N2[v0]N_{\leqslant 2}[v_{0}]. (In fact, notice that if uu and uu^{\prime} are twins in H[N[v0]]H[N[v_{0}]] then the distinguisher is necessarily in N2(v0)N_{2}(v_{0}).)

We repeatedly apply Lemma 2.3 in order to delete each vertex of N2(v0)N_{2}(v_{0}) one after the other and reduce to the case where HH is a twin-free graph for which v0v_{0} is apex. We can then apply Lemma 2.2. The whole process takes polynomial time. ∎

3. Running-time Analysis

We now analyse the running-time of Algorithm 1. We assume that input graphs are given by their adjacency matrix. We need the following easy lemma, whose proof we include for completeness.

Lemma \thelemma.

Given a matrix N{0,1}r×cN\in\{0,1\}^{r\times c}, the set of all equivalence classes of equal rows of NN can be found in time 𝒪(rc)\mathcal{O}(rc).

Proof.

Let R0R_{0} and R1R_{1} be the set of rows of NN whose first entry is 0 and 11, respectively. We can determine R0R_{0} and R1R_{1} by reading the first column of NN, which takes time O(r)O(r). We then recurse on N0N_{0}^{\prime} and N1N_{1}^{\prime}, where NiN_{i}^{\prime} is the submatrix of NN induced by RiR_{i} and the last c1c-1 columns of NN. ∎

Before proving the next lemma we remark that, given a graph HH with nn vertices and mm edges, one can check whether HH is a cluster graph by checking that each of its components is a clique, which takes 𝒪(n2)\mathcal{O}(n^{2}) time.

Lemma \thelemma.

Let GG be an nn-vertex, twin-free graph. In 𝒪(n3)\mathcal{O}(n^{3}) time, we can find an induced subgraph HH of GG and a cost function cHc_{H} on V(H)V(H) such that (H,cH)(H,c_{H}) is 22-good.

Proof.

We fix any vertex v0V(G)v_{0}\in V(G), and let H=G[N2[v0]]H=G[N_{\leqslant 2}[v_{0}]]. We can check in 𝒪(n2)\mathcal{O}(n^{2}) time whether H[N(v0)]H[N(v_{0})] is chordal by using the algorithm from [35]. If H[N(v0)]H[N(v_{0})] is not chordal this algorithm returns, as a certificate, a hole CC. By Lemma 2.1, H[V(C)+v0]H[V(C)+v_{0}] is strongly 2-good and the corresponding function cHc_{H} can be computed straightforwardly, hence we are done. Suppose now that H[N(v0)]H[N(v_{0})] is chordal. We can construct the clique tree of H[N(v0)]H[N(v_{0})] (see for instance [34]) in 𝒪(n2)\mathcal{O}(n^{2}) time. Each edge of the tree induces a separation of H[N(v0)]H[N(v_{0})], and we can check if each side is a cluster graph in 𝒪(n2)\mathcal{O}(n^{2}) time. If neither side is a cluster graph, then we have found a 2P32P_{3} in H[N(v0)]H[N(v_{0})]. Hence, H[V(2P3)+v0]H[V(2P_{3})+v_{0}] is strongly 2-good and the corresponding function cHc_{H} can be computed straightforwardly. Since the clique tree has at most |H|n|H|\leq n vertices, by orienting its edges as in the proof of Lemma 2.2 we find, in 𝒪(n3)\mathcal{O}(n^{3}) time, a hitting clique. By applying Lemma 2.3 to get rid of twins, which can be done in 𝒪(n2)\mathcal{O}(n^{2}) time, we obtain a subgraph satisfying the hypotheses of Lemma 2.2. Finally, by Lemma 2.2, the cost function in the statement of Lemma 2.2 can be constructed in 𝒪(n3)\mathcal{O}(n^{3}) time. ∎

Lemma \thelemma.

Algorithm 1 runs in 𝒪(n4)\mathcal{O}(n^{4})-time.

Proof.

By Lemma 3, finding all twins in GG can be done in time 𝒪(n2)\mathcal{O}(n^{2}). Therefore, the most expensive recursive call of the algorithm is the construction of the 22-good weighted induced subgraph (H,cH)(H,c_{H}) from Lemma 3, which can be done in time 𝒪(n3)\mathcal{O}(n^{3}). Therefore, the running-time T(n)T(n) of the algorithm satisfies T(n)T(n1)+𝒪(n3)T(n)\leqslant T(n-1)+\mathcal{O}(n^{3}), which gives T(n)=𝒪(n4)T(n)=\mathcal{O}(n^{4}). ∎

of Theorem 1.

The proof is identical to [19, Proof of Theorem 1, pages 365–366], except that factor 9/49/4 needs to be replaced everywhere by 22. One easily proves by induction that the vertex set XX output by the algorithm on input (G,c)(G,c) is a minimal hitting set with c(X)2OPT(G,c)c(X)\leqslant 2\operatorname{\mathrm{OPT}}(G,c). We do not include more details here, and instead refer the reader to [19]. Theorem 2 guarantees that Algorithm 1 runs in polynomial time. ∎

4. Polyhedral results

In this section we study how well LP-relaxtions can approximate Cluster-VD, as already described in Section 1. We begin with a brief description of the Sherali-Adams hierarchy [33], which is a standard procedure to obtain strengthened LP relaxations for binary linear programs. For a more thorough introduction, we refer the reader to [27]. Throughout the section we closely follow the exposition given in [5], where the Sherali-Adams hierarchy and a very similar concept of diagonal (defined below) are used to approach the FVST problem, as mentioned in Section 1.3. In particular, our Lemma 4 is similar to another lemma in [5].

Let P={xnAxb}P=\{x\in\mathbb{R}^{n}\mid Ax\geqslant b\} be a polytope contained in [0,1]n[0,1]^{n} and PI:=conv(Pn)P_{I}:=\mathrm{conv}(P\cap\mathbb{Z}^{n}). For each rr\in\mathbb{N}, we define a polytope P𝖲𝖠1(P)𝖲𝖠r(P)PIP\supseteq\operatorname{\mathsf{SA}}_{1}(P)\supseteq\dots\supseteq\operatorname{\mathsf{SA}}_{r}(P)\supseteq P_{I} as follows. Let NrN_{r} be the nonlinear system obtained from PP by multiplying each constraint by iIxijJ(1xj)\prod_{i\in I}x_{i}\prod_{j\in J}(1-x_{j}) for all disjoint subsets I,JI,J of [n][n] such that 1|I|+|J|r1\leqslant|I|+|J|\leqslant r. Note that if xi{0,1}x_{i}\in\{0,1\}, then xi2=xix_{i}^{2}=x_{i}. Therefore, we can obtain a linear system LrL_{r} from NrN_{r} by setting xi2:=xix_{i}^{2}:=x_{i} for all i[n]i\in[n] and then xI:=iIxix_{I}:=\prod_{i\in I}x_{i} for all I[n]I\subseteq[n] with |I|2|I|\geqslant 2. We then let 𝖲𝖠r(P)\operatorname{\mathsf{SA}}_{r}(P) be the projection of LrL_{r} onto the variables xix_{i}, i[n]i\in[n].

Let 𝒫3(G)\mathcal{P}_{3}(G) denote the collection of all vertex sets {u,v,w}\{u,v,w\} that induce a P3P_{3} in GG and let 𝖲𝖠r(G):=𝖲𝖠r(P(G))\operatorname{\mathsf{SA}}_{r}(G):=\operatorname{\mathsf{SA}}_{r}(P(G)), where

P(G):={x[0,1]V(G){u,v,w}𝒫3(G):xu+xv+xw1}.P(G):=\{x\in[0,1]^{V(G)}\mid\forall\{u,v,w\}\in\mathcal{P}_{3}(G):x_{u}+x_{v}+x_{w}\geqslant 1\}\,.

If a cost function c:V(G)0c:V(G)\to\mathbb{R}_{\geqslant 0} is provided, we let

𝖲𝖠r(G,c):=min{vV(G)c(v)xvx𝖲𝖠r(G)}\operatorname{\mathsf{SA}}_{r}(G,c):=\min\left\{\sum_{v\in V(G)}c(v)x_{v}\mid x\in\operatorname{\mathsf{SA}}_{r}(G)\right\}

denote the optimum value of the corresponding linear programming relaxation. For the sake of simplicity, we sometimes denote by 𝖲𝖠r(G,c)\operatorname{\mathsf{SA}}_{r}(G,c) the above linear program itself.

We say vertices aa and bb form a diagonal if there are vertices u,vu,v such that {u,v,a}𝒫3(G)\{u,v,a\}\in\mathcal{P}_{3}(G) and {u,v,b}𝒫3(G)\{u,v,b\}\in\mathcal{P}_{3}(G). We say that a path contains a diagonal if any of its pairs of vertices are diagonals. Note that a diagonal pair in a path need not be an edge in the path.

Our first results concern 𝖲𝖠1(G)\operatorname{\mathsf{SA}}_{1}(G). For later use, we list here the inequalities defining 𝖲𝖠1(G)\operatorname{\mathsf{SA}}_{1}(G). For all {u,v,w}𝒫3(G)\{u,v,w\}\in\mathcal{P}_{3}(G) and zV(Guvw)z\in V(G-u-v-w), we have the inequalities

(1) xu+xv+xw\displaystyle x_{u}+x_{v}+x_{w} 1+xuv+xvw,\displaystyle\geqslant 1+x_{uv}+x_{vw}\,,
(2) xuz+xvz+xwz\displaystyle x_{uz}+x_{vz}+x_{wz} xzand\displaystyle\geqslant x_{z}\quad\text{and}
(3) xu+xv+xw+xz\displaystyle x_{u}+x_{v}+x_{w}+x_{z} 1+xuz+xvz+xwz.\displaystyle\geqslant 1+x_{uz}+x_{vz}+x_{wz}\,.

In addition, there are the inequalities

(4) 1xvxvu01\geqslant x_{v}\geqslant x_{vu}\geqslant 0

for all distinct u,vV(G)u,v\in V(G). The polytope 𝖲𝖠1(G)\operatorname{\mathsf{SA}}_{1}(G) is the set of all (xv)(x_{v}) such that there exists (xuv)(x_{uv}) such that inequalities (1)–(4) are satisfied. Note that by definition, xuvx_{uv} and xvux_{vu} are the same variable.

In order to establish the integrality gap of 𝖲𝖠1(G)\operatorname{\mathsf{SA}}_{1}(G), we need two preliminary lemmas.

Lemma \thelemma.

Let x𝖲𝖠1(G)x\in\operatorname{\mathsf{SA}}_{1}(G). If GG contains a P3P_{3} which has a diagonal, then xv2/5x_{v}\geqslant 2/5 for some vertex vv of GG.

Proof.

Assume by contradiction that aa, bb form a diagonal and all components of xx are less than 2/5. By the definition of diagonal, there exist u,vV(G)u,v\in V(G) with {u,v,a},{u,v,b}𝒫3(G)\{u,v,a\},\{u,v,b\}\in\mathcal{P}_{3}(G). In particular, from (1) we have xa+xu+xv1+xau+xavx_{a}+x_{u}+x_{v}\geqslant 1+x_{au}+x_{av} and from (2) xab+xau+xavxax_{ab}+x_{au}+x_{av}\geqslant x_{a}.

Adding these two inequalities, we obtain xu+xv+xab1x_{u}+x_{v}+x_{ab}\geqslant 1. We must have xab1/5x_{ab}\geqslant 1/5 since otherwise max(xu,xv)2/5\max(x_{u},x_{v})\geqslant 2/5. Now let cc be the third vertex of the P3P_{3} containing a,ba,b (notice that cc can be the middle vertex of the P3P_{3}). By (1) and (4), we have xa+xb+xc1+xab+xac6/5x_{a}+x_{b}+x_{c}\geqslant 1+x_{ab}+x_{ac}\geqslant 6/5 which means that max(xa,xb,xc)2/5\max(x_{a},x_{b},x_{c})\geqslant 2/5, a contradiction which concludes the proof. ∎

Lemma \thelemma.

Let GG be a path or a cycle, and c:V(G)0c:V(G)\to\mathbb{R}_{\geqslant 0}. Then:

  1. (i)

    there is an efficient algorithm that solves Cluster-VD for (G,c)(G,c);

  2. (ii)

    the basic LP 𝖲𝖠0(G)\operatorname{\mathsf{SA}}_{0}(G) has integrality gap at most 22.

Proof.

First, let GG be a path. We notice that the coefficient matrix of the basic LP is totally unimodular, by the consecutive ones property [32]. Hence solving the LP yields an integral optimal solution, which corresponds to a hitting set XX of GG of cost equal to OPT(G,c)=𝖲𝖠0(G,c)\operatorname{\mathrm{OPT}}(G,c)=\operatorname{\mathsf{SA}}_{0}(G,c). This proves (i), (ii) when GG is a path.

Now, let GG be a cycle, and let vV(G)v\in V(G). Suppose that vv belongs to a minimum cost hitting set XX of GG. Then X{v}X\setminus\{v\} is a minimum cost hitting set of GvG-v, hence it can be found efficiently since GvG-v is a path. By iterating over all vV(G)v\in V(G) and taking the hitting set of minimum cost, we efficiently solve Cluster-VD for (G,c)(G,c). This concludes the proof of (i).

Finally, let x~\tilde{x} be an optimal solution of 𝖲𝖠0(G,c)\operatorname{\mathsf{SA}}_{0}(G,c). First, assume that there is some vV(G)v\in V(G) such that x~v1/2\tilde{x}_{v}\geqslant 1/2. Since GvG-v is a path, the optimal hitting set XX^{\prime} in GvG-v has cost c(X)uvc(u)x~uc(X^{\prime})\leqslant\sum_{u\neq v}c(u)\tilde{x}_{u}. Hence, we see that X:=X+vX:=X^{\prime}+v is a hitting set of HH with c(X)=c(v)+c(X)c(v)+uvc(u)x~u2uc(u)x~u=2𝖲𝖠0(G,c)c(X)=c(v)+c(X^{\prime})\leqslant c(v)+\sum_{u\neq v}c(u)\tilde{x}_{u}\leqslant 2\sum_{u}c(u)\tilde{x}_{u}=2\operatorname{\mathsf{SA}}_{0}(G,c). On the other hand, if x~v<1/2\tilde{x}_{v}<1/2 for all vertices vV(G)v\in V(G), then the constraint x~u+x~v+x~w1\tilde{x}_{u}+\tilde{x}_{v}+\tilde{x}_{w}\geqslant 1 (where u,wu,w are the neighbors of vv) implies that there can be no vertex vv with x~v=0\tilde{x}_{v}=0. So 0<x~v<1/20<\tilde{x}_{v}<1/2 for all vV(G)v\in V(G). Therefore, extreme point x~\tilde{x} is the unique solution of |G||G| equations of the form xu+xv+xw=1x_{u}+x_{v}+x_{w}=1 for {u,v,w}𝒫3(G)\{u,v,w\}\in\mathcal{P}_{3}(G). Hence x~v=1/3\tilde{x}_{v}=1/3 for all vertices. Thus 𝖲𝖠0(G,c)=1/3c(V(G))\operatorname{\mathsf{SA}}_{0}(G,c)=1/3\cdot c(V(G)). Now notice that since GG is a cycle we can partition its vertices into two disjoint hitting sets XX and YY. Without loss of generality assume that c(X)1/2c(V(G))c(X)\leqslant 1/2\cdot c(V(G)). Then c(X)3/2𝖲𝖠0(G,c)c(X)\leqslant 3/2\cdot\operatorname{\mathsf{SA}}_{0}(G,c). This concludes the proof of (ii). ∎

Theorem 3.

There is a polynomial-time algorithm that, given a graph GG and c:V(G)0c:V(G)\to\mathbb{R}_{\geqslant 0}, outputs a hitting set XX of GG such that c(X)5/2𝖲𝖠1(G,c)c(X)\leqslant 5/2\cdot\operatorname{\mathsf{SA}}_{1}(G,c). In particular, the integrality gap of 𝖲𝖠1(G)\operatorname{\mathsf{SA}}_{1}(G) is at most 5/25/2.

Proof.

Let x¯\bar{x} be an optimal solution of 𝖲𝖠1(G,c)\operatorname{\mathsf{SA}}_{1}(G,c). Let U={vV(G):x¯v2/5}U=\{v\in V(G):\bar{x}_{v}\geq 2/5\}, and H=GUH=G\setminus U. Notice that the restriction of x¯\bar{x} to V(H)V(H) is a feasible solution for 𝖲𝖠1(H,c)\operatorname{\mathsf{SA}}_{1}(H,c) whose components are all strictly less than 2/52/5. Hence, by Lemma 4, HH cannot contain a P3P_{3} which has a diagonal. We will now show that, after possibly getting rid of twins, HH is a disjoint union of paths and cycles. Then, by applying Lemma 4, we will obtain a minimum cost hitting set of HH, which, together with UU, will form our desired hitting set.

First, if HH contains twins uu and vv, we can delete vv, and set c(u):=c(u)+c(v)c^{\prime}(u):=c(u)+c(v), c(w):=c(w)c^{\prime}(w):=c(w) for wV(H){u,v}w\in V(H)\setminus\{u,v\} to obtain a smaller weighted graph (H,c)(H^{\prime},c^{\prime}). We claim that 𝖲𝖠1(H,c)𝖲𝖠1(H,c)\operatorname{\mathsf{SA}}_{1}(H^{\prime},c^{\prime})\leqslant\operatorname{\mathsf{SA}}_{1}(H,c). To see this, we show that one can turn any feasible solution xx of 𝖲𝖠1(H,c)\operatorname{\mathsf{SA}}_{1}(H,c) into a feasible solution of 𝖲𝖠1(H,c)\operatorname{\mathsf{SA}}_{1}(H^{\prime},c^{\prime}) without increasing the cost. Let xu:=min(xu,xv)x^{\prime}_{u}:=\min(x_{u},x_{v}) and xw:=xwx^{\prime}_{w}:=x_{w} for wu,vw\neq u,v. It is easy to check that this defines a feasible solution xx^{\prime} to 𝖲𝖠1(H,c)\operatorname{\mathsf{SA}}_{1}(H^{\prime},c^{\prime}), whose cost is

wvc(w)xw=(c(u)+c(v))min(xu,xv)+wu,vc(w)xwwc(w)xw.\sum_{w\neq v}c^{\prime}(w)x^{\prime}_{w}=(c(u)+c(v))\min(x_{u},x_{v})+\sum_{w\neq u,v}c(w)x_{w}\leqslant\sum_{w}c(w)x_{w}\,.

This proves the claim. Moreover, a hitting set XX for HH can be immediately obtained from a hitting set XX^{\prime} of HH^{\prime} by adding vv if and only if uXu\in X^{\prime}. Finally, notice that there is a feasible solution for 𝖲𝖠1(H,c)\operatorname{\mathsf{SA}}_{1}(H^{\prime},c^{\prime}) obtained from the restriction of x¯\bar{x} whose components are all strictly less than 2/52/5. Hence, from now on we assume that HH is twin-free.

Now, we claim that HH is triangle-free and claw-free. Suppose that HH contains a triangle with vertices uu, vv and ww. Since HH is twin-free, every edge of the triangle has a distinguisher. Without loss of generality, 𝒫3(H)\mathcal{P}_{3}(H) contains {u,v,y}\{u,v,y\}, {u,w,y}\{u,w,y\}, {v,w,z}\{v,w,z\} and {u,v,z}\{u,v,z\} where y,zy,z are distinct vertices outside the triangle. It is easy to see that, for instance, edge uwuw is a diagonal contained in a P3P_{3}, a contradiction. A similar argument shows that HH cannot contain a claw. This proves the claim, and implies that HH has maximum degree at most 2. That is, HH is a disjoint union of paths and cycles. By part (i) of Lemma 4, (applied to each component of HH), we obtain a minimum cost hitting set XX of HH which, thanks to (ii) of Lemma 4, satisfies c(X)2𝖲𝖠0(H,c)52𝖲𝖠1(H,c)c(X)\leq 2\operatorname{\mathsf{SA}}_{0}(H,c)\leq\frac{5}{2}\operatorname{\mathsf{SA}}_{1}(H,c).

Finally, consider XUX\cup U. Clearly, it is a hitting set of GG. Moreover, one has

c(XU)=c(X)+c(U)52(𝖲𝖠1(H,c)+25c(U))c(X\cup U)=c(X)+c(U)\leqslant\frac{5}{2}\left(\operatorname{\mathsf{SA}}_{1}(H,c)+\frac{2}{5}c(U)\right)\leqslant
52(𝖲𝖠1(G,c)uUc(u)x¯u+25c(U))52𝖲𝖠1(G,c).\frac{5}{2}\left(\operatorname{\mathsf{SA}}_{1}(G,c)-\sum_{u\in U}c(u)\bar{x}_{u}+\frac{2}{5}c(U)\right)\leqslant\frac{5}{2}\cdot\operatorname{\mathsf{SA}}_{1}(G,c).

It follows that the integrality gap of 𝖲𝖠1(G)\operatorname{\mathsf{SA}}_{1}(G) is at most 5/25/2, concluding the proof. ∎

We now complement the result above by showing a lower bound on the integrality gap of 𝖲𝖠1(G,c)\operatorname{\mathsf{SA}}_{1}(G,c).

Theorem 4.

For every ε>0\varepsilon>0 there is some instance (G,c)(G,c) of Cluster-VD such that OPT(G,c)(5/2ε)𝖲𝖠1(G,c)\operatorname{\mathrm{OPT}}(G,c)\geqslant(5/2-\varepsilon)\operatorname{\mathsf{SA}}_{1}(G,c).

Proof.

We show there is a graph GG for which every hitting set XX has c(X)(5/2ε)𝖲𝖠1(G,c)c(X)\geqslant(5/2-\varepsilon)\operatorname{\mathsf{SA}}_{1}(G,c) for c:=𝟏Gc:=\mathbf{1}_{G}. Let GG be a graph whose girth is at least kk for some constant k5k\geqslant 5 and with the independence number α(G)n/k\alpha(G)\leqslant n/k where n:=|G|n:=|G|. It can be shown via the probabilistic method that such a GG exists, see [2]. Set c(v):=1c(v):=1 for all vV(G)v\in V(G). We have c(X)n(12/k)c(X)\geqslant n(1-2/k) for every hitting set XX. To see this observe that since GG is triangle-free and α(G)n/k\alpha(G)\leqslant n/k, when we remove XX we will get at most n/kn/k components each of size at most 2. Thus there are at most 2n/k2n/k vertices in GXG-X, so |X|n2n/k|X|\geqslant n-2n/k. Therefore, OPT(G,c)(12/k)n\operatorname{\mathrm{OPT}}(G,c)\geqslant(1-2/k)n.

In order to show 𝖲𝖠1(G,c)2n/5\operatorname{\mathsf{SA}}_{1}(G,c)\leqslant 2n/5, we construct the following feasible solution xx to 𝖲𝖠1(G,c)\operatorname{\mathsf{SA}}_{1}(G,c). Set xv:=2/5x_{v}:=2/5 for all vV(G)v\in V(G) and xvw:=0x_{vw}:=0 if vwE(G)vw\in E(G) and xvw:=1/5x_{vw}:=1/5 if vwE(G)vw\notin E(G). The inequalities defining 𝖲𝖠1(G)\operatorname{\mathsf{SA}}_{1}(G) are all satisfied by xx. This is obvious for inequalities (1), (3) and (4). For inequality (2), notice that at most one of uzuz, vzvz, wzwz can be an edge of GG, since otherwise GG would have a cycle of length at most 44. Thus (2) is satisfied too, x𝖲𝖠1(G)x\in\operatorname{\mathsf{SA}}_{1}(G) and 𝖲𝖠1(G,c)2n/5\operatorname{\mathsf{SA}}_{1}(G,c)\leqslant 2n/5.

This completes the proof since, by taking k5/εk\geqslant 5/\varepsilon, we have OPT(G,c)n(12/k)(5/2ε)2n/5(5/2ε)𝖲𝖠1(G,c)\operatorname{\mathrm{OPT}}(G,c)\geqslant n(1-2/k)\geqslant(5/2-\varepsilon)2n/5\geqslant(5/2-\varepsilon)\operatorname{\mathsf{SA}}_{1}(G,c). ∎

We now show that the integrality gap decreases to 2+ε2+\varepsilon after applying poly(1/ε)\mathrm{poly}(1/\varepsilon) rounds of Sherali-Adams. We first need the following lemma.

Lemma \thelemma.

Fix α1\alpha\geqslant 1 and r0r\in\mathbb{Z}_{\geqslant 0}. Let (G,c)(G,c) be a minimum order weighted graph such that OPT(G,c)>α𝖲𝖠r(G,c)\operatorname{\mathrm{OPT}}(G,c)>\alpha\cdot\operatorname{\mathsf{SA}}_{r}(G,c). The following two assertions hold:

  1. (i)

    if xx is an optimal solution to 𝖲𝖠r(G,c)\operatorname{\mathsf{SA}}_{r}(G,c), then xv<1/αx_{v}<1/\alpha for all vV(G)v\in V(G);

  2. (ii)

    GG is connected and twin-free.

Proof.

(i) Suppose for a contradiction that xv1/αx_{v}\geqslant 1/\alpha, for some vV(G)v\in V(G). Note that xx restricted to V(G){v}V(G)\setminus\{v\} is a feasible solution to 𝖲𝖠r(Gv,c)\operatorname{\mathsf{SA}}_{r}(G-v,c). Thus 𝖲𝖠r(Gv,c)𝖲𝖠r(G,c)c(v)xv\operatorname{\mathsf{SA}}_{r}(G-v,c)\leqslant\operatorname{\mathsf{SA}}_{r}(G,c)-c(v)x_{v}. By the minimality of GG, there is a hitting set XX^{\prime} of GvG-v such that c(X)α𝖲𝖠r(Gv,c)c(X^{\prime})\leqslant\alpha\cdot\operatorname{\mathsf{SA}}_{r}(G-v,c). Therefore X:=X+vX:=X^{\prime}+v is a hitting set of GG with c(X)=c(v)+c(X)c(v)+α𝖲𝖠r(Gv,c)αc(v)xv+α𝖲𝖠r(Gv,c)α𝖲𝖠r(G,c)c(X)=c(v)+c(X^{\prime})\leqslant c(v)+\alpha\cdot\operatorname{\mathsf{SA}}_{r}(G-v,c)\leqslant\alpha\cdot c(v)x_{v}+\alpha\cdot\operatorname{\mathsf{SA}}_{r}(G-v,c)\leqslant\alpha\cdot\operatorname{\mathsf{SA}}_{r}(G,c), a contradiction.

(ii) Note that GG is connected, otherwise there exists a connected component HH of GG such that OPT(H,cH)>α𝖲𝖠r(H,cH)\operatorname{\mathrm{OPT}}(H,c_{H})>\alpha\cdot\operatorname{\mathsf{SA}}_{r}(H,c_{H}), where cHc_{H} is the restriction of cc to V(H)V(H), contradicting the minimality of GG. To show that GG is twin-free, we proceed exactly as in the proof of Theorem 3. ∎

Theorem 5.

For every fixed ε>0\varepsilon>0, performing r=poly(1/ε)r=\mathrm{poly}(1/\varepsilon) rounds of the Sherali-Adams hierarchy produces an LP relaxation of Cluster-VD whose integrality gap is at most 2+ε2+\varepsilon. That is, OPT(G,c)(2+ε)𝖲𝖠r(G,c)\operatorname{\mathrm{OPT}}(G,c)\leqslant(2+\varepsilon)\operatorname{\mathsf{SA}}_{r}(G,c) for all weighted graphs (G,c)(G,c).

Proof.

In order to simplify the notation below, let us assume that 2/ε2/\varepsilon is integer. For instance, we could restrict to ε=2l\varepsilon=2^{-l} for some l1l\in\mathbb{Z}_{\geqslant 1}. This does not hurt the generality of the argument. We take r:=1+(2/ε)4r:=1+(2/\varepsilon)^{4}. We may assume that ε<1/2\varepsilon<1/2 since otherwise we invoke Theorem 3 (taking r=1r=1 suffices in this case).

Let (G,c)(G,c) be a counterexample to the theorem, with |G||G| minimum. By Lemma  4.(i), for every optimal solution xx to 𝖲𝖠r(G,c)\operatorname{\mathsf{SA}}_{r}(G,c), every vertex vV(G)v\in V(G) has xv<1/(2+ε)x_{v}<1/(2+\varepsilon). By Lemma 4.(ii), GG is twin-free (and connected).

We will use the following fact several times in the proof: for all RV(G)R\subseteq V(G) with |R|r|R|\leqslant r and every x𝖲𝖠r(G)x\in\operatorname{\mathsf{SA}}_{r}(G), the restriction of xx to the variables in RR is a convex combination of hitting sets of G[R]G[R]. This is easy to see since, denoting by xRx_{R} the restriction of xx, we get xR𝖲𝖠r(G[R])x_{R}\in\operatorname{\mathsf{SA}}_{r}(G[R]) and the Sherali-Adams hierarchy is known to converge in at most “dimension-many” rounds, see for instance [17].

First, we claim that GG has no clique of size at least 2/ε2/\varepsilon. Suppose otherwise. Let CC be a clique of size k:=2/εk:=2/\varepsilon and let DD be a minimal set such that each edge of CC has a distinguisher in DD. Let H:=G[CD]H:=G[C\cup D]. Then, following the construction from Section 2.3, one can obtain a cost function cHc_{H} such that cH(H)=2k1c_{H}(H)=2k-1, and cH(X)k1c_{H}(X)\geqslant k-1 for any hitting set XX of HH. See [19, Lemma 3] for the full construction, whose proof also shows that |H|2k1r|H|\leqslant 2k-1\leqslant r. Since every valid inequality supported on at most rr vertices is valid for 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G), the inequality vcH(v)xvk1\sum_{v}c_{H}(v)x_{v}\geqslant k-1 is valid for 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G). Since cH(H)=2k1c_{H}(H)=2k-1, this implies that for all x𝖲𝖠r(G)x\in\operatorname{\mathsf{SA}}_{r}(G), there is some vertex aV(H)a\in V(H) with xa(k1)/(2k1)x_{a}\geqslant(k-1)/(2k-1). Since (k1)/(2k1)1/(2+ε)(k-1)/(2k-1)\geqslant 1/(2+\varepsilon), we get a contradiction. This proves our first claim.

Second, we claim that for every v0V(G)v_{0}\in V(G), the subgraph of GG induced by the neighborhood N(v0)N(v_{0}) has no stable set of size at least 2/ε2/\varepsilon. The proof is similar to that for cliques given above, except that this time we let HH be the induced star K1,kK_{1,k} with apex v0v_{0} and k=2/εk=2/\varepsilon. The cost function cHc_{H} given by Lemma 2.2 has cH(v0)=k1c_{H}(v_{0})=k-1 and cH(v)=1c_{H}(v)=1 for all vSv\in S. Notice that once again cH(H)=2k1c_{H}(H)=2k-1. The star inequality vcH(v)xvk1\sum_{v}c_{H}(v)x_{v}\geqslant k-1 is valid for 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G), which guarantees that for every x𝖲𝖠r(G)x\in\operatorname{\mathsf{SA}}_{r}(G) there is some aV(H)a\in V(H) which has xa(k1)/(2k1)1/(2+ε)x_{a}\geqslant(k-1)/(2k-1)\geqslant 1/(2+\varepsilon). This establishes our second claim.

Third, we claim that the neighborhood of every vertex v0v_{0} induces a chordal subgraph of GG. Suppose that CC is a hole in G[N(v0)]G[N(v_{0})]. We first deal with the case |C|r1=(2/ε)4|C|\leqslant r-1=(2/\varepsilon)^{4}. We can repeat the same proof as above, letting HH be the induced wheel on V(C)+v0V(C)+v_{0} and using the cost function cHc_{H} defined in the proof of Lemma 2.1. Consider the wheel inequality vcH(v)xvk3\sum_{v}c_{H}(v)x_{v}\geqslant k-3, where k:=|H|=|C|+1k:=|H|=|C|+1. Since the wheel has at most rr vertices, the wheel inequality is valid for 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G). Since cH(H)=2k6=2(k3)c_{H}(H)=2k-6=2(k-3), for every x𝖲𝖠r(G)x\in\operatorname{\mathsf{SA}}_{r}(G), there is some aV(H)a\in V(H) which has xa1/21/(2+ε)x_{a}\geqslant 1/2\geqslant 1/(2+\varepsilon). This concludes the case where |C||C| is “small”.

Now, assume that |C|r|C|\geqslant r, and consider the wheel inequality with right-hand side scaled by 2/(2+ε)2/(2+\varepsilon). Suppose this inequality is valid for 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G). This still implies that some vertex aa of HH has xa1/(2+ε)x_{a}\geqslant 1/(2+\varepsilon), for all x𝖲𝖠r(G)x\in\operatorname{\mathsf{SA}}_{r}(G), which produces the desired contradiction. It remains to prove that the scaled wheel inequality is valid for 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G).

Let FF denote any rr-vertex induced subgraph of HH that is a fan.222A fan is a graph obtained from a path by adding an apex vertex. Hence, FF contains v0v_{0} as an apex vertex, plus a path on r1r-1 vertices. Letting cF(v0):=r3(r1)/3c_{F}(v_{0}):=r-3-\lfloor(r-1)/3\rfloor and cF(v):=1c_{F}(v):=1 for vV(Fv0)v\in V(F-v_{0}), we get the inequality vcF(v)xvr3\sum_{v}c_{F}(v)x_{v}\geqslant r-3, which is valid for 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G). By taking all possible choices for FF, and averaging the corresponding inequalities, we see that the inequality

(r3r13)xv0+r1k1vV(Hv0)xvr3\displaystyle\left(r-3-\left\lfloor\frac{r-1}{3}\right\rfloor\right)x_{v_{0}}+\frac{r-1}{k-1}\sum_{v\in V(H-v_{0})}x_{v}\geqslant r-3
\displaystyle\iff k1r1(r3r13)xv0+vV(Hv0)xvk1r1(r3)\displaystyle\frac{k-1}{r-1}\left(r-3-\left\lfloor\frac{r-1}{3}\right\rfloor\right)x_{v_{0}}+\sum_{v\in V(H-v_{0})}x_{v}\geqslant\frac{k-1}{r-1}(r-3)

is valid for 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G). It can be seen that this inequality dominates the scaled wheel inequality, in the sense that each left-hand side coefficient is not larger than the corresponding coefficient in the scaled wheel inequality, while the right-hand side is not smaller than the right-hand side of the scaled wheel inequality. Therefore, the scaled wheel inequality is valid for 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G). This concludes the proof of our third claim.

By the first, second and third claim333The first inequality follows since |H|α(H)ω(H)|H|\leqslant\alpha(H)\cdot\omega(H), for every perfect graph HH., |N(v0)|ω(G[N(v0)])α(G[N(v0)])4/ε2|N(v_{0})|\leqslant\omega(G[N(v_{0})])\cdot\alpha(G[N(v_{0})])\leqslant 4/\varepsilon^{2} for all choices of v0v_{0}. This implies in particular that |N2[v0]|1+16/ε4=r|N_{\leqslant 2}[v_{0}]|\leqslant 1+16/\varepsilon^{4}=r. Now let H:=G[N2[v0]]H:=G[N_{\leqslant 2}[v_{0}]]. Theorem 2 applies since GG is twin-free, by our second claim. Let cHc_{H} be the corresponding cost function. The inequality vcH(v)xvOPT(H,cH)\sum_{v}c_{H}(v)x_{v}\geqslant\operatorname{\mathrm{OPT}}(H,c_{H}) is valid for 𝖲𝖠r(G)\operatorname{\mathsf{SA}}_{r}(G).

Let λ\lambda^{*} be defined as in Step 15 of Algorithm 1, and let aV(G)a\in V(G) denote any vertex such that (cλcH)(a)=0(c-\lambda^{*}c_{H})(a)=0. By minimality of GG, there exists in (G,c):=(Ga,cλcH)(G^{\prime},c^{\prime}):=(G-a,c-\lambda^{*}c_{H}) a minimal hitting set XX^{\prime} of cost c(X)(2+ε)𝖲𝖠r(G,c)c^{\prime}(X^{\prime})\leqslant(2+\varepsilon)\operatorname{\mathsf{SA}}_{r}(G^{\prime},c^{\prime}). We let X:=XX:=X^{\prime} in case XX^{\prime} is a hitting set of GG, and X:=X+aX:=X^{\prime}+a otherwise. Assume that X=X+aX=X^{\prime}+a, the other case is easier. We have

c(X)\displaystyle c(X) =c(X)+λcH(X)\displaystyle=c^{\prime}(X^{\prime})+\lambda^{*}c_{H}(X)
(2+ε)𝖲𝖠r(G,c)+λ(cH(H)1)\displaystyle\leqslant(2+\varepsilon)\operatorname{\mathsf{SA}}_{r}(G^{\prime},c^{\prime})+\lambda^{*}(c_{H}(H)-1)
(2+ε)𝖲𝖠r(G,c)+2λOPT(H,cH)\displaystyle\leqslant(2+\varepsilon)\operatorname{\mathsf{SA}}_{r}(G^{\prime},c^{\prime})+2\lambda^{*}\operatorname{\mathrm{OPT}}(H,c_{H})
(2+ε)(𝖲𝖠r(G,c)+λOPT(H,cH)).\displaystyle\leqslant(2+\varepsilon)\left(\operatorname{\mathsf{SA}}_{r}(G^{\prime},c^{\prime})+\lambda^{*}\operatorname{\mathrm{OPT}}(H,c_{H})\right)\,.

By LP duality, we have 𝖲𝖠r(G,c)𝖲𝖠r(G,c)+λOPT(H,cH)\operatorname{\mathsf{SA}}_{r}(G,c)\geqslant\operatorname{\mathsf{SA}}_{r}(G^{\prime},c^{\prime})+\lambda^{*}\operatorname{\mathrm{OPT}}(H,c_{H}). This implies that c(X)(2+ε)𝖲𝖠r(G,c)c(X)\leqslant(2+\varepsilon)\operatorname{\mathsf{SA}}_{r}(G,c), contradicting the fact that (G,c)(G,c) is a counterexample. This concludes the proof. ∎

We now complement the result above by showing that every LP relaxation of Cluster-VD with (worst case) integrality gap at most 2ε2-\varepsilon must have super-polynomial size. The result is a simple consequence of an analogous result of [9] on the integrality gap of Vertex Cover, and of the straightforward reduction from Vertex Cover to Cluster-VD.

Proposition \theprop.

For infinitely many values of nn, there is a graph GG on nn vertices such that every size-no(logn/loglogn)n^{o(\log n/\log\log n)} LP relaxation of Cluster-VD on GG has integrality gap 2o(1)2-o(1).

Proof.

In [9] a similar result is proved for LP-relaxations of Vertex Cover: for infinitely many values of nn, there is a graph GG on nn vertices such that every size-no(logn/loglogn)n^{o(\log n/\log\log n)} LP relaxation of Vertex Cover on GG has integrality gap at least 2ε2-\varepsilon, where ε=ε(n)=o(1)\varepsilon=\varepsilon(n)=o(1) is a non-negative function.

Let GG be such a graph, and let G+G^{+} be the graph obtained from GG by attaching a pendant edge to every vertex. It is easy to see that UV(G)U\subseteq V(G) is a hitting set for G+G^{+} if and only if UU is a vertex cover of GG.

Toward a contradiction, suppose that AxbAx\geqslant b is a size-no(logn/loglogn)n^{o(\log n/\log\log n)} LP relaxation of Cluster-VD on G+G^{+} with integrality gap at most 2δ2-\delta, for a fixed δ>ε\delta>\varepsilon (where xdx\in\mathbb{R}^{d} for some dimension dd depending on GG). For every c+0V(G+)c^{+}\in\mathbb{Q}^{V(G^{+})}_{\geqslant 0} there exists a hitting set XX of G+G^{+} such that c+(X)(2δ)LP(G+,c+)c^{+}(X)\leqslant(2-\delta)\operatorname{\mathrm{LP}}(G^{+},c^{+}).

We can easily turn AxbAx\geqslant b into an LP relaxation for Vertex Cover. For every vertex cover UU of GG, we let the corresponding point be the point πUd\pi^{U}\in\mathbb{R}^{d} for UU seen as a hitting set in G+G^{+}. For every c0V(G)c\in\mathbb{Q}_{\geqslant 0}^{V(G)}, we define c+0V(G+)c^{+}\in\mathbb{Q}^{V(G^{+})}_{\geqslant 0} via c+(v):=c(v)c^{+}(v):=c(v) for vV(G)v\in V(G), and c+(v):=uV(G)c(u)c^{+}(v):=\sum_{u\in V(G)}c(u) for vV(G+)V(G)v\in V(G^{+})\setminus V(G). Then, we let the affine function fcf_{c} for cc be the affine function fc+f_{c^{+}} for c+c^{+}.

Since the integrality gap of AxbAx\geqslant b, seen as an LP relaxation of Cluster-VD, is at most 2δ2-\delta, for every c0V(G)c\in\mathbb{Q}_{\geqslant 0}^{V(G)} there exists a hitting set XX of G+G^{+} whose cost is at most (2δ)LP(G+,c+)(2-\delta)\operatorname{\mathrm{LP}}(G^{+},c^{+}), where c+c^{+} is the cost function corresponding to cc. If XX contains any vertex of V(G+)V(G)V(G^{+})\setminus V(G), we can replace this vertex by its unique neighbor in V(G)V(G), without any increase in cost. In this way, we can find a vertex cover UU of GG whose cost satisfies c(U)c+(X)(2δ)LP(G+,c+)=(2δ)LP(G,c)c(U)\leqslant c^{+}(X)\leqslant(2-\delta)\operatorname{\mathrm{LP}}(G^{+},c^{+})=(2-\delta)\operatorname{\mathrm{LP}}(G,c). Hence, the integrality gap of AxbAx\geqslant b as an LP relaxation of Vertex Cover is also at most 2δ<2ε2-\delta<2-\varepsilon. As the size of AxbAx\geqslant b is no(logn/loglogn)n^{o(\log n/\log\log n)}, this provides the desired contradiction. ∎

We point out that the size bound in the previous result can be improved. Kothari, Meka and Raghavendra [26] have shown that for every ε>0\varepsilon>0 there is a constant δ=δ(ε)>0\delta=\delta(\varepsilon)>0 such that no LP relaxation of size less than 2nδ2^{n^{\delta}} has integrality gap less than 2ε2-\varepsilon for Max-CUT. Since Max-CUT acts as the source problem in [9], one gets a 2nδ2^{n^{\delta}} size lower bound for Vertex Cover in order to achieve integrality gap 2ε2-\varepsilon. This also follows in a black-box manner from [26] and [12]. The proof of Proposition 4 shows that the same bound applies to Cluster-VD.

5. Conclusion

In this paper we provide a tight approximation algorithm for the cluster vertex deletion problem (Cluster-VD). Our main contribution is the efficient construction of a local cost function on the vertices at distance at most 22 from any vertex v0v_{0} such that every minimal hitting set of the input graph has local cost at most twice the local optimum. If the subgraph induced by N(v0)N(v_{0}) (the first neighborhood of v0v_{0}) contains a hole, or a 2P32P_{3}, then this turns out to be straightforward. The most interesting case arises when the local subgraph HH is twin-free, has radius 11, and moreover H[N(v0)]=Hv0H[N(v_{0})]=H-v_{0} is chordal and 2P32P_{3}-free. Such graphs are very structured, which we crucially exploit.

Lemma 2.2 allows us to define the local cost function on the vertices distinct from v0v_{0} and then later adjust the cost of v0v_{0}. We point out that condition (ii) basically says that the local cost function should define a hyperplane that “almost” separates the hitting set polytope and the clique polytope of the chordal, 2P32P_{3}-free graph Hv0H-v_{0}. This was a key intuition which led us to the proof of Theorem 2. If these polytopes were disjoint, this would be easy. But actually it is not the case since they have a common vertex (as we show, Hv0H-v_{0} has a hitting clique).

One natural question arising from our approach of Cluster-VD in general graphs is the following: is the problem polynomial-time solvable on chordal graphs? This seems to be a non-trivial open question, also mentioned in [15], where similar vertex deletion problems are studied for chordal graphs. It could well be that Cluster-VD in general chordal graphs is hard. Now, what about chordal, 2P32P_{3}-free graphs? We propose this last question as our first open question.

Our second contribution are polyhedral results for the Cluster-VD problem, in particular with respect to the tightness of the Sherali-Adams hierarchy. Our results on Sherali-Adams fail to match the 2-approximation factor of our algorithm (by epsilon), and we suspect this is not by chance. We believe that, already for certain classes of triangle-free graphs, the LP relaxation given by a bounded number of rounds of the Sherali-Adams hierarchy has an integrality gap strictly larger than 22. Our intuition goes as follows. Consider the star inequality (k1)xv0+i=1kxvik1(k-1)x_{v_{0}}+\sum_{i=1}^{k}x_{v_{i}}\geqslant k-1, valid when N(v0)={v1,,vk}N(v_{0})=\{v_{1},\dots,v_{k}\} is a stable set. Capturing all star inequalities is sufficient to achieve an integrality gap of at most 2 for all triangle-free graphs  [19, Algorithm 1]. However, we suspect that Sherali-Adams will have a hard time recovering these in a constant number of rounds. The star inequality is very similar to the clique inequality i=1kxvik1\sum_{i=1}^{k}x_{v_{i}}\geqslant k-1, which is valid for Vertex Cover when {v1,,vk}\{v_{1},\dots,v_{k}\} is a clique. It is known that Sherali-Adams is unable to capture all clique inequalities in a constant number of rounds of the Vertex Cover relaxation (see  [27, Section 6.1] for an equivalent statement on clique inequalities for the stable set polytope). Whether this intuition is accurate is our second open question.

As mentioned already in the introduction, we do not know any polynomial-size LP or SDP relaxation with integrality gap at most 22 for Cluster-VD. In order to obtain such a relaxation, it suffices to derive each valid inequality implied by Lemmas 2.1, 2.1, 2.2 and also somehow simulate Lemma 2.3. Here, different techniques to construct extended formulations (see for instance [7, 3, 36]) could be used. A partial result in this direction is that the star inequality has a bounded-degree sum-of-squares proof. This implies that a bounded number of rounds of the Lasserre hierarchy provides an SDP relaxation for Cluster-VD with integrality gap at most 22, whenever the input graph is triangle-free. This should readily generalize to the wheel inequalities of Lemma 2.1 and of course to the inequality of Lemma 2.1 (since the underlying graph has bounded size). However, we do not know how for instance to derive the inequalities of Lemma 2.2. We leave this for future work as our third open question.

Our fourth open question: what is the best running time for Algorithm 1? We think that it is possible to improve on our 𝒪(n4)\mathcal{O}(n^{4}) upper bound.

Another intriguing problem is to what extent our methods can be adapted to hitting set problems in other 33-uniform hypergraphs. We mention an open question due to László Végh [38]: for which classes of 33-uniform hypergraphs and which ε>0\varepsilon>0 does the hitting set problem admit a (3ε)(3-\varepsilon)-approximation algorithm?

As mentioned in the introduction, FVST (feedback vertex set in tournaments) is another hitting set problem in a 33-uniform hypergraph, which is also UGC-hard to approximate to a factor smaller than 22. There is a recent randomized 22-approximation algorithm [29], but no deterministic (polynomial-time) algorithm is known. Let us repeat here the relevant open question from [29]: does FVST admit a deterministic 22-approximation algorithm?

Acknowledgements

We are grateful to Daniel Lokshtanov for suggesting Lemma 2.2, which allowed us to simplify our algorithm and its proof. We also thank two anonymous referees for their helpful comments, which improved the presentation of the paper.

References

  • [1] R. Albert, H. Jeong, and A.-L. Barabási. Error and attack tolerance of complex networks. Nature, 406(6794):378–382, 2000.
  • [2] N. Alon and J. H. Spencer. The Probabilistic Method. Wiley Publishing, 4th edition, 2016.
  • [3] M. Aprile. Extended formulations for matroid polytopes through randomized protocols. arXiv preprint arXiv:2106.12453, 2021.
  • [4] M. Aprile, N. Castro, G. Ferreira, J. Piccini, F. Robledo, and P. Romero. Graph fragmentation problem: analysis and synthesis. International Transactions in Operational Research, 26(1):41–53, 2019.
  • [5] M. Aprile, M. Drescher, S. Fiorini, and T. Huynh. A simple 7/3-approximation algorithm for feedback vertex set in tournaments. arXiv preprint arXiv:2008.08779, 2020.
  • [6] M. Aprile, M. Drescher, S. Fiorini, and T. Huynh. A tight approximation algorithm for the cluster vertex deletion problem. In International Conference on Integer Programming and Combinatorial Optimization, pages 340–353. Springer, 2021.
  • [7] M. Aprile and Y. Faenza. Extended formulations from communication protocols in output-efficient time. Mathematical Programming, 183(1):41–59, 2020.
  • [8] M. Aprile, Y. Faenza, S. Fiorini, T. Huynh, and M. Macchia. Extension complexity of stable set polytopes of bipartite graphs. volume 10520, pages 75–87. Springer, Cham, 2017.
  • [9] A. Bazzi, S. Fiorini, S. Pokutta, and O. Svensson. No small linear program approximates vertex cover within a factor 2ϵ2-\epsilon. Mathematics of Operations Research, 44(1):147–172, 2019.
  • [10] J. R. Blair and B. Peyton. An introduction to chordal graphs and clique trees. In Graph theory and sparse matrix computation, pages 1–29. Springer, 1993.
  • [11] A. Boral, M. Cygan, T. Kociumaka, and M. Pilipczuk. A fast branching algorithm for cluster vertex deletion. Theory Comput. Syst., 58(2):357–376, 2016.
  • [12] G. Braun, S. Pokutta, and A. Roy. Strong reductions for extended formulations. Math. Program., 172(1–2):591–620, 2018.
  • [13] G. Braun, S. Pokutta, and D. Zink. Inapproximability of combinatorial problems via small LPs and SDPs. In Proceedings of STOC 2015, pages 107–116, New York, NY, USA, 2015. ACM.
  • [14] M.-C. Cai, X. Deng, and W. Zang. An approximation algorithm for feedback vertex sets in tournaments. SIAM J. Comput., 30(6):1993–2007, 2001.
  • [15] Y. Cao, Y. Ke, Y. Otachi, and J. You. Vertex deletion problems on chordal graphs. Theoretical Computer Science, 745:75–86, 2018.
  • [16] S. O. Chan, J. R. Lee, P. Raghavendra, and D. Steurer. Approximate constraint satisfaction requires large lp relaxations. Journal of the ACM (JACM), 63(4):1–22, 2016.
  • [17] M. Conforti, G. Cornuéjols, G. Zambelli, et al. Integer programming, volume 271. Springer, 2014.
  • [18] S. Fiorini, G. Joret, and O. Schaudt. Improved approximation algorithms for hitting 3-vertex paths. In International Conference on Integer Programming and Combinatorial Optimization, pages 238–249. Springer, 2016.
  • [19] S. Fiorini, G. Joret, and O. Schaudt. Improved approximation algorithms for hitting 3-vertex paths. Math. Program., 182(1-2, Ser. A):355–367, 2020.
  • [20] F. V. Fomin, S. Gaspers, D. Lokshtanov, and S. Saurabh. Exact algorithms via monotone local search. J. ACM, 66(2):Art. 8, 23, 2019.
  • [21] F. V. Fomin, T. Le, D. Lokshtanov, S. Saurabh, S. Thomassé, and M. Zehavi. Subquadratic kernels for implicit 3-hitting set and 3-set packing problems. ACM Trans. Algorithms, 15(1):13:1–13:44, 2019.
  • [22] A. Freund, R. Bar-Yehuda, and K. Bendel. Local ratio: a unified framework for approximation algorithms. ACM Computing Surveys, 36:422–463, 2005.
  • [23] S. Hosseinian and S. Butenko. Polyhedral properties of the induced cluster subgraphs. Discrete Applied Mathematics, 297:80–96, 2021.
  • [24] F. Hüffner, C. Komusiewicz, H. Moser, and R. Niedermeier. Fixed-parameter algorithms for cluster vertex deletion. Theory Comput. Syst., 47(1):196–217, 2010.
  • [25] E. Jahanpour and X. Chen. Analysis of complex network performance and heuristic node removal strategies. Communications in Nonlinear Science and Numerical Simulation, 18(12):3458–3468, 2013.
  • [26] P. K. Kothari, R. Meka, and P. Raghavendra. Approximating rectangles by juntas and weakly-exponential lower bounds for LP relaxations of CSPs. In H. Hatami, P. McKenzie, and V. King, editors, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 590–603. ACM, 2017.
  • [27] M. Laurent. A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre relaxations for 0-1 programming. Math. Oper. Res., 28(3):470–496, 2003.
  • [28] D. Lokshtanov. Personal communication.
  • [29] D. Lokshtanov, P. Misra, J. Mukherjee, F. Panolan, G. Philip, and S. Saurabh. 22-approximating feedback vertex set in tournaments. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1010–1018. SIAM, 2020.
  • [30] M. Mnich, V. V. Williams, and L. A. Végh. A 7/3-approximation for feedback vertex sets in tournaments. In P. Sankowski and C. D. Zaroliagis, editors, 24th Annual European Symposium on Algorithms, ESA 2016, August 22-24, 2016, Aarhus, Denmark, volume 57 of LIPIcs, pages 67:1–67:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
  • [31] T. Rothvoß. The Lasserre hierarchy in approximation algorithms. Lecture Notes for the MAPSP, pages 1–25, 2013.
  • [32] A. Schrijver. Theory of Linear and Integer Programming. Wiley, 1998.
  • [33] H. D. Sherali and W. P. Adams. A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM J. Discrete Math., 3(3):411–430, 1990.
  • [34] R. E. Tarjan. Decomposition by clique separators. Discrete mathematics, 55(2):221–232, 1985.
  • [35] R. E. Tarjan and M. Yannakakis. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM Journal on Computing, 13(3):566–579, 1984.
  • [36] H. R. Tiwary, M. Kouteckỳ, and P. Kolman. Extension complexity, mso logic, and treewidth. Discrete Mathematics & Theoretical Computer Science, 22, 2020.
  • [37] D. Tsur. Faster parameterized algorithm for cluster vertex deletion. CoRR, abs/1901.07609, 2019.
  • [38] L. A. Végh. Personal communication.
  • [39] J. You, J. Wang, and Y. Cao. Approximate association via dissociation. Discret. Appl. Math., 219:202–209, 2017.