The Lovász Theta Function for Recovering Planted Clique Covers and Graph Colorings
Abstract
The problems of computing graph colorings and clique covers are central challenges in combinatorial optimization. Both are known to be NP-hard, and thus computationally intractable in the worst case. A prominent approach for computing approximate solutions to these problems is the celebrated Lovász theta function, which is specified as the solution of a semidefinite program (SDP), and hence tractable to compute. In this work, we move beyond worst-case analysis and set out to understand whether the Lovász theta function recovers clique covers for random instances that have a latent clique cover structure, possibly obscured by noise. We answer this question in the affirmative, and show that for graphs generated from the planted clique cover model we introduce in this work, the SDP formulation has a unique solution that reveals the underlying clique cover structure with high probability. The main technical step is an intermediate result in which we prove a deterministic sufficient condition for recovery based on an appropriate notion of sparsity.
Keywords: clique cover, graph coloring, clustering, community detection, Lovász theta function, beyond worst-case analysis
1 Introduction
Graph colorings and clique covers are central problems in combinatorial optimization. Given an undirected graph G, the objective of the graph coloring problem is to partition the vertices into independent subsets; that is, subsets in which no two vertices are adjacent. The minimum number of independent sets required is known as the chromatic number of G and is denoted by χ(G). The objective of the clique covering problem is to partition the vertices such that each subset is a clique in G; that is, all pairs of vertices from the same subset are adjacent. The minimum number of cliques required is known as the clique cover number of G.
Graph colorings and clique covers are complementary notions – an independent set in a graph is a clique in its complement. In particular, a partition of the vertices into independent sets corresponds to a cover by cliques in the complement graph. Both clique covers and graph colorings have applications in various fields such as scheduling [21] and clustering.
The computational tasks of finding clique covers as well as graph colorings are not known to be tractable. For instance, the problem of deciding whether a graph admits a coloring with k colors is NP-complete; in fact, it is one of Karp’s original 21 problems. However, the notion of NP-hardness is rooted in a worst-case analysis perspective, which may not be representative of the instances that are typically encountered in practice. As such, it is natural to move beyond worst-case analysis [28] and study these problems in a suitably defined average-case setting [6]. In fact, a number of prior works study problems known to be hard in the worst-case sense that nevertheless admit polynomial-time solutions for average-case instances that align with the problem’s inherent structure – prominent examples include community detection [1], clustering [5, 16], planted cliques and bicliques [3, 13], planted k-disjoint-cliques [4], and graph bisection [7].
This is the viewpoint we adopt in this work. We focus on the clique cover problem, which by complementation, is equivalent to graph coloring. The central question we aim to address is:
Can the clique cover problem be efficiently solved for randomly generated instances where a clique cover structure naturally emerges?
We study the clique cover problem in instances generated from the planted clique cover model, which is specified by a set of vertices, a corresponding partition of the vertices into subsets, and a noise parameter p. We generate a graph from the planted clique cover model by creating a clique on every subset of the partition, and by subsequently introducing edges between vertices belonging to different cliques, independently with probability p across all such pairs; see Figure 1 for an example.
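As an illustration, the sampling procedure just described can be sketched as follows (a minimal sketch; the function name and the edge-set representation are our own, not the paper's notation):

```python
import random

def planted_clique_cover(cliques, p, seed=None):
    """Sample an edge set from the planted clique cover model.

    cliques -- list of disjoint vertex lists V_1, ..., V_k
    p       -- probability of each cross-clique edge, independently
    """
    rng = random.Random(seed)
    edges = set()
    # all pairs of vertices inside a clique are connected
    for V in cliques:
        for i, u in enumerate(V):
            for v in V[i + 1:]:
                edges.add((min(u, v), max(u, v)))
    # cross-clique pairs are connected independently with probability p
    for a in range(len(cliques)):
        for b in range(a + 1, len(cliques)):
            for u in cliques[a]:
                for v in cliques[b]:
                    if rng.random() < p:
                        edges.add((min(u, v), max(u, v)))
    return edges
```

With p = 0 the model produces a disjoint union of cliques, and with p = 1 the complete graph; intermediate values interpolate between the two.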


Graphs generated from the planted clique cover model have a natural clique cover structure obscured by noise. Our goal is to understand whether this latent structure can be revealed using a convex relaxation of the integer programming formulation of the clique cover number. Specifically, we focus on the Lovász theta function of an undirected graph, given by
(P)
where J is the matrix of all ones and I is the identity matrix. We point out that the specific formulation (P) is simply one among a number of equivalent formulations, and we refer the interested reader to [17] for a list of alternative definitions.
An important property of the Lovász theta function – often referred to as the “sandwich theorem” – is that it is a lower bound on the clique cover number and an upper bound on the stability number [20]:
(1)
The Lovász theta function is the solution of a semidefinite program (SDP), and hence computable in polynomial time [22, 27] – this stands in sharp contrast with the stability number and the chromatic number of a graph, both of which are NP-hard to compute. In particular, the sandwich theorem has important consequences for perfect graphs – these are graphs in which, for every induced subgraph, the chromatic number equals the size of the maximum clique. For such graphs the inequalities in (1) collapse to equalities, and thus the clique number and the chromatic number of any perfect graph can be computed in polynomial time. One can also tractably obtain the minimal clique covers and colorings for such graphs – see, e.g., [18, Section 6.3.2].
Recovery of planted clique covers via .
The goal of this work is to show that the Lovász theta function recovers the planted clique cover with high probability. What does such a claim entail? First, note that the latter inequality in (1) relies on the fact that any clique cover corresponds to a feasible solution of the Lovász theta formulation (P). Concretely, given a graph, let any clique cover be fixed, and consider the following matrix
(2)
It is evident that this matrix is a feasible solution to (P), with objective value equal to the number of cliques in the cover. In what follows, we define the corresponding pair with respect to the planted clique cover:
(3)
When we say that we wish to show that the Lovász theta function recovers the planted clique cover, we specifically mean to prove that this pair is the unique optimal solution to (P) for graphs generated according to the above planted clique cover model, with high probability.
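For intuition, the clique-cover feasible matrix can be instantiated for a small cover. The explicit form below, the normalized sum of indicator outer products X = Σ_l (1/|V_l|) 1_{V_l} 1_{V_l}^T, is an assumption on our part (one standard construction in the spirit of (2)); it is PSD by construction, supported within cliques, and has trace equal to the number of cliques:

```python
from fractions import Fraction

def cover_matrix(cliques, n):
    """Assumed form: X = sum_l (1/|V_l|) * 1_{V_l} 1_{V_l}^T,
    a PSD matrix (sum of outer products) with trace k."""
    X = [[Fraction(0)] * n for _ in range(n)]
    for V in cliques:
        w = Fraction(1, len(V))
        for i in V:
            for j in V:
                X[i][j] += w
    return X
```

For the cover {0, 1, 2}, {3, 4}, the resulting matrix is block diagonal with constant 1/3 and 1/2 blocks and has trace 2, the number of cliques in the cover.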
Our first main result is as follows.
Result 1.
Let G be a random planted clique cover instance defined on the given cliques, where we introduce edges between cliques with probability p. Then the planted solution is the unique optimal solution to the Lovász theta formulation (P) with probability greater than
The simplest type of graph in the context of the clique cover problem is a disjoint union of cliques. An important intermediate step of our proof is to show that the Lovász theta function succeeds at recovering clique covers for graphs that resemble disjoint unions of cliques. More concretely, we say that a graph with a given clique cover satisfies the c-sparse clique cover (c-SCC) property if, for every vertex v and every clique V_l of the cover that does not contain v, one has
(c-SCC)
Note that if the graph satisfies the property for some c < 1, then no vertex is fully connected to a clique to which it does not belong.
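Reading the c-SCC property as the requirement that no vertex is adjacent to more than a c-fraction of any clique it does not belong to (our interpretation of the displayed condition), the smallest admissible c for a given graph can be computed directly:

```python
def scc_constant(edges, cliques):
    """Smallest c for which the c-SCC property (as we read it) holds:
    the largest fraction |N(v) ∩ V_l| / |V_l| over vertices v and
    cliques V_l not containing v."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    worst = 0.0
    for l, V in enumerate(cliques):
        members = set(V)
        for m, W in enumerate(cliques):
            if m == l:
                continue
            for v in W:
                # fraction of clique V adjacent to the outside vertex v
                frac = len(adj.get(v, set()) & members) / len(V)
                worst = max(worst, frac)
    return worst
```

A disjoint union of cliques has constant 0, and each added cross-clique edge can only increase the constant.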
Result 2.
Suppose is a graph that satisfies the -SCC property for some . Then is the unique optimal solution to the Lovász theta formulation (P).
If the cliques are of equal size, then the above condition takes a particularly simple form. An important observation from Result 2 is that it reveals the types of graphs for which the Lovász theta function is most effective at uncovering the underlying clique cover – these are graphs for which the neighborly parameter is small. The parameter p in our model is related to the neighborly parameter in that small choices of p generate graphs with a small neighborly parameter. In our numerical experiments in Section 7, we see that the neighborly parameter (and hence, by extension, p) also reveals something about the difficulty of computing the clique cover number of a particular graph – the ‘hard’ instances (defined as those requiring a large number of simplex solves by a MILP solver) correspond to graphs generated using intermediate values of p.
1.1 Proof Outline
In this section, we give an outline of the proof of our main results. The key part of our argument is to produce a suitable dual optimal solution that certifies both the optimality and the uniqueness of the planted solution in (P). First, as a reminder, the dual program to (P) is the following semidefinite program:
(D)
A basic consequence of weak duality is that, in order to show that a candidate is an optimal solution to (P), it suffices to produce a dual feasible solution that attains the same objective; i.e., one would need to exhibit a solution that is feasible for (D) and matches the objective value. To show that the candidate is the unique optimal solution to (P), it is necessary to appeal to a strict complementarity-type result for SDPs; see, e.g., [2, 19]. In what follows, we specialize a set of conditions from [19] that guarantee unique solutions in SDPs to the Lovász theta function.
Theorem 1.1.
In view of Theorem 1.1, to show that the planted solution is the unique optimal solution of (P), it suffices to show that it is an extreme point of the feasible region of (P), and to produce a dual optimal solution that satisfies strict complementarity.
The first task has a simple answer: in Section 2, we show that the planted solution is an extreme point of the feasible region of (P) whenever the graph satisfies the sparsity condition for an appropriate parameter. Our proof relies on results that characterize extreme points of spectrahedra, and the specific version we use is found in Corollary 3 of [25]:
Theorem 1.2.
Let be a spectrahedron specified in linear matrix inequality (LMI) form
(4)
where are symmetric matrices. Let , and be an matrix whose columns span the kernel of the matrix . Then, is an extreme point of if and only if the vectors are linearly independent.
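Theorem 1.2 can be tested numerically. The sketch below uses our own phrasing of the criterion (a point x is extreme if and only if the matrices A_i V, vectorized, are linearly independent, with V spanning the kernel of the LMI matrix at x), verified on the unit disk, which is a spectrahedron:

```python
import numpy as np

def is_extreme(A0, As, x, tol=1e-9):
    """Extreme-point test for S = {x : A0 + sum_i x_i A_i >= 0}, in the
    form we read Theorem 1.2: x is extreme iff the matrices A_i V are
    linearly independent, where the columns of V span ker(A(x))."""
    A = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
    w, U = np.linalg.eigh(A)
    V = U[:, np.abs(w) < tol]          # columns span the kernel of A(x)
    if V.shape[1] == 0:
        # A(x) positive definite: x is an interior point in our examples
        return False
    M = np.column_stack([(Ai @ V).ravel() for Ai in As])
    return np.linalg.matrix_rank(M) == len(As)
```

For A0 = I, A1 = diag(1, -1), A2 the off-diagonal swap matrix, the spectrahedron is the unit disk: boundary points pass the test, interior points fail it.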
The answer to the second task is considerably more involved. As a reminder, a matrix is optimal for (D) if it satisfies:
(5)
(6)
(7)
(8) |
Here we use the notation . We call a matrix satisfying these conditions a dual certificate. Furthermore, we say that a dual certificate and satisfy strict complementarity if:
(9)
Note that we can omit condition (8), as we can always scale a matrix that satisfies conditions (5), (6), (7) and (9) to make it trace-one. The proof of our two main results is given in three steps.
Step 1: Exact recovery in the case of disjoint cliques.
In the first step, we consider the simplest setting where the graph is the disjoint union of the cliques. With a bit of guesswork, we construct the following matrix:
(10)
Here, the -th block has dimension .
Step 2: Exact recovery for graphs with the property.
The matrix that served as our dual certificate for graphs formed by disjoint cliques does not work in the more general setting as it is non-zero on every edge between distinct cliques; i.e., it violates (6). The key idea for constructing a suitable dual certificate in this case begins by recognizing that conditions (6) and (7) define subspaces:
In view of this, we consider a dual certificate obtained by projecting the matrix with respect to the Frobenius norm onto the intersection of these subspaces:
The remainder of the proof is a matrix perturbation argument in which we show that the perturbation is quite small, which allows us to conclude the remaining conditions (5) and (9). We now provide some geometric intuition for why the projection onto the intersection succeeds.
The graphs considered in this work are either disjoint unions of cliques or have the property. In both cases, we have that is a subspace of
which we treat as the ambient space, rather than the space of all symmetric matrices. Setting
can be equivalently defined as the projection of onto the intersection of .
(11)
In Section 4 we give the main technical result of this work, where we establish that the subspaces and are “almost orthogonal”. For this, it is now crucial that our ambient space is .
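The role of near-orthogonality can be illustrated numerically: von Neumann's alternating projections compute the projection onto an intersection of subspaces, and they converge quickly precisely when the subspaces are close to orthogonal. This is a generic sketch of the mechanism, not the paper's specific subspaces:

```python
import numpy as np

def project_onto_intersection(v, B1, B2, iters=50):
    """Von Neumann alternating projections: for subspaces T1 = range(B1)
    and T2 = range(B2), the iterates (P1 P2)^k v converge to the
    projection of v onto T1 ∩ T2, faster the closer T1 and T2 are to
    orthogonal (the rate is governed by the Friedrichs angle)."""
    Q1, _ = np.linalg.qr(B1)           # orthonormal basis for T1
    Q2, _ = np.linalg.qr(B2)           # orthonormal basis for T2
    x = np.asarray(v, dtype=float)
    for _ in range(iters):
        x = Q2 @ (Q2.T @ x)            # project onto T2
        x = Q1 @ (Q1.T @ x)            # project onto T1
    return x
```

In the fully orthogonal case, e.g. T1 = span{e1, e2} and T2 = span{e2, e3} in three dimensions, a single round of projections already lands on the intersection span{e2}.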
Step 3: Recovery for planted clique cover instances.
Our third step shows that planted clique cover instances satisfy the required sparsity property with high probability. The proof is a direct application of Hoeffding’s inequality combined with a union bound, and is provided in Section 5.
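The concentration step can be sanity-checked in isolation: the number of cross edges from a fixed vertex into a foreign clique of size m is Binomial(m, p), and Hoeffding's inequality bounds its upper tail. The parameters below are illustrative, not the paper's:

```python
import math
import random

# The number of cross edges from a fixed vertex into a foreign clique of
# size m is Binomial(m, p); Hoeffding bounds its upper tail:
#     P(X >= (p + t) * m) <= exp(-2 * t**2 * m).
m, p, t = 50, 0.1, 0.2
hoeffding_bound = math.exp(-2 * t ** 2 * m)

rng = random.Random(0)
trials = 20000
hits = sum(
    sum(rng.random() < p for _ in range(m)) >= (p + t) * m
    for _ in range(trials)
)
empirical_tail = hits / trials
```

Here the bound is exp(-4) ≈ 0.018, while the empirical tail is far smaller; a union bound over all vertex/clique pairs then controls the whole instance.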
2 Extremality of
The goal of this section is to provide a simple sufficient condition guaranteeing that the candidate solution is an extreme point of the feasible region of (P). As we noted in Section 1.1, we rely on a general result that describes extreme points of spectrahedra specified by a linear matrix inequality (LMI), stated in the form of Theorem 1.2. We begin by expressing the feasible region of (P) as the following LMI:
This suggests we should take the matrices to be , and to identify the matrix with in Theorem 1.2. Our next task is to characterize the kernel of .
2.1 Computing the kernel of
Proposition 2.1.
Setting
we have that
Proof.
The first equality follows immediately from definition since . We focus on the second equality. To do so, it suffices to show that every vector of the form , , is in the span of , , and vice versa.
First, we have . Then, note that . Hence the vectors lie in the linear span of vectors of the form , .
In the reverse direction, note that . In other words, every vector of the form , , lies in the linear span of vectors of the form , . This establishes the second equality. ∎
As an immediate consequence we have that:
Proposition 2.2.
The kernel of is equal to
Proof.
By definition of we have that Thus,
∎
Proposition 2.3.
The following statements are equivalent for a PSD matrix :
(1) .
(2) for all .
(3) .
(4) for all – here, are rows of .
Moreover, we have that if and only if .
Proof.
We have that
where the last equivalence holds as .
Note that (2) is equivalent to
Taking orthogonal complements, this is in turn equivalent to
Finally since . ∎
Our next result describes the spectrum of the matrix and shows that the columns of the matrix span the kernel of . Recall that
(12)
This matrix can be alternatively expressed in a more convenient form:
To be clear, this is precisely the same matrix (10) that we use as our dual certificate whenever the graph is a disjoint union of cliques. However, we will not require any information about its being a dual certificate at this juncture.
Proposition 2.4.
The eigenvalues of are (with multiplicity one), (with multiplicity ), and with multiplicity . Moreover, we have that .
Proof.
First, note that is an eigenvector whose eigenvalue is . Second, note that the vector is an eigenvector with eigenvalue , for all . Third, fix a subset . Let be a vector with entries in the coordinates corresponding to satisfying . One can check that , and hence . The dimension of this eigenspace is . Finally, we have that
and using Proposition 2.1 we get that
∎
2.2 Establishing extremality
Lemma 2.5.
Consider a graph with a clique cover . Let be a subset of vertices that, for every clique except possibly one, leaves out at least one vertex of that clique. Then, the columns of corresponding to are linearly independent.
Proof.
Without loss of generality, we assume that for all cliques except , the set of vertices contains at most vertices from clique , for all .
We index the columns of by , where the index refers to the clique while the index refers to the vertex within the clique. Let be the columns of the matrix and consider a linear combination:
(13)
Fix a cluster , and let be the projection of onto the coordinates corresponding to . By applying to (13) we get:
(14)
The rightmost equality follows by noting that for all . On the other hand, are standard basis vectors. Since leaves out at least one vertex in , the span of these vectors cannot include . So it follows from (13) that , and in particular, for all . We repeat this argument for all and conclude that for all , and all .
Finally, we consider the vectors and project these to the rows in . The vectors are standard basis vectors. Since , it must be that for all . Hence all the coefficients are zero. This proves that the columns in are linearly independent. ∎
Lemma 2.6.
Let be a graph that satisfies the property for some . Then the collection of matrices are linearly independent.
Proof.
Consider a linear combination
Fix a vertex , and for all , we let denote the neighbors of vertex in . Then the -th row of the above expression is given by:
where denotes the -th row of . Now, since the vertex is not fully connected to clusters for all , the collection of vertices satisfy the conditions of the subset in Lemma 2.5. By Lemma 2.5, the corresponding columns of are linearly independent, which means that for all . The result follows by repeating this argument for all . ∎
Theorem 2.7.
Let be a graph, and let be a clique cover for . Suppose satisfies the property for some . Then is an extreme point of the feasible region of (P).
Proof of Theorem 2.7.
Suppose satisfies the property for some . By Lemma 2.6, the collection of matrices are linearly independent. Hence the vectors in are also linearly independent. This means that the following matrix, formed by combining all the (vectorized) matrix products , ranging over all edges , as well as , has full column rank
Hence, by Theorem 1.2, is an extreme point of the feasible region of (P). ∎
3 Exact recovery in the case of disjoint cliques
In this section, we show that the planted solution is the unique optimal solution to the Lovász theta formulation (P) when the graph is a disjoint union of cliques. In this simple case, the graph satisfies the sparsity condition trivially. Following the conclusions of Theorem 2.7, the planted solution is an extreme point of the feasible region of (P). Based on Theorem 1.1, it remains to produce a suitable dual witness satisfying the requirements listed in Theorem 1.1.
As it turns out, the matrix satisfies all the requirements we need! In fact, in the process of computing the spectrum of the matrix in Proposition 2.4, we have also verified that does indeed satisfy the requirements of Theorem 1.1. We collect these conclusions below.
Theorem 3.1.
Let be the graph formed by the disjoint union of cliques . Then the matrix is the unique solution of (P).
Proof.
As we noted above, we simply need to check that the matrix satisfies conditions (5) to (9). First, from Proposition 2.4, the matrix is PSD, which is (5). Second, all the edges of are within cliques. The matrix , when restricted to each clique , is the identity matrix, and thus satisfies the support condition (6). It is easy to verify that all the rows of belong to . Hence, by Proposition 2.3, , which is (7). It is easy to see that and , which shows (8). Last, by Proposition 2.4, we have . Hence, by Proposition 2.3, we have , which is (9). ∎
4 Exact recovery for graphs with the property
In this section, let be a graph that satisfies the property. Our goal is to extend the results in Section 3, and show that is also the unique solution to (P) for graphs satisfying the condition, provided is appropriately small. Following our discussion in Section 3, the matrix remains an extreme point of the feasible region of (P) if . Hence, by Theorem 1.1, it remains to produce a suitable dual witness satisfying the requirements listed in Theorem 1.1.
4.1 Incoherence-type result
The certificate we will use for graphs with the property is , defined as the projection of onto with respect to the Frobenius norm, i.e.,
Using the first-order optimality conditions we have that
(15)
where lies in the normal cone of at and lies in the normal cone of at . Recalling that the normal cone of a subspace is its orthogonal complement we have that and .
Our main result can be viewed as extending the orthogonality established for disjoint unions of cliques to graphs with the property. The proof is broken up into a sequence of results that follow.
Theorem 4.1.
Let be a graph satisfying the property. Then,
(a) For any and we have
(b) For any we have
A direct sum decomposition for .
The first step is to decompose the space into simpler subspaces on which it is easier to prove the near-orthogonality property. We then use these subspaces as basic ingredients to build up to the full near-orthogonality result.
Define the matrices so that (i) the -th block is a diagonal matrix whose -th entry is set equal to and all remaining entries equal to , and (ii) the -th block (-th block) is such that entries in the -th row (column) are equal to and all other entries equal to , (iii) all other entries are zero. As an example, in the case where the matrix is given by:
where omitted entries are zero. Second, consider the following subspaces:
Name | Description | Dimension |
Proposition 4.2.
We have that
Proof of Proposition 4.2.
The fact that and are orthogonal is easy to check. Define the matrix , where , and such that (i) the -th entry of the -th block is equal to , and (ii) the entries of the -th row (column) of the -th block (-th block) are all equal to , and (iii) all other entries are .
As an example, in the case of two cliques (so ) the matrix is given by:
where all omitted entries are zero.
First, observe that the matrices specify all the linear equalities in the subspace , and thus, is precisely the span of . As such, to show that and span , it suffices to show that every matrix is expressible as the sum of matrices belonging to each of these subspaces. As a concrete example, we show this is true for – the constructions for the other matrices are similar. Indeed, one checks that is the linear sum of these matrices
This completes the proof. ∎
In view of Proposition 4.2, to bound the inner product between vectors in the two subspaces of interest, it suffices to bound the inner products between the corresponding pairs of constituent subspaces.
Next, we describe a result that shows how incoherence computations for (a small number of) orthogonal subspaces can be combined to obtain an incoherence bound for their span.
Lemma 4.3.
Let be orthogonal subspaces of . Consider such that
Then, for we have that
Proof of Lemma 4.3.
For any we have
where the second-to-last inequality follows from Cauchy-Schwarz, while the last equality uses the fact that . ∎
Bounding inner product between and .
Define the column vectors by
Note that
With a slight abuse of notation, we define the subspace of matrices in
The abuse of notation arises because the vectors in the definitions of and are different. Note that because , the subspaces and have dimensions and respectively, and are orthogonal. These subspaces are relevant for our problem as the block off-diagonal entries of any matrix in belong to
Lemma 4.4.
Let such that each row has at most non-zero entries for some . Then, for any we have that
The same conclusion holds if each column of has at most non-zero entries and .
Proof of Lemma 4.4.
Since , it has constant columns. Suppose its column entries are , i.e., for all . We then have
where the last inequality follows from Cauchy-Schwarz, and since there are at most non-zero entries in each column of . Finally, we have
where in the last equality we use that . ∎
Corollary 4.5.
Consider where each column has at most non-zero entries, and each row has at most non-zero entries for some . For any we have:
Proof of Corollary 4.5.
Lemma 4.6.
Suppose is a graph satisfying the property. Then, for any and we have that
Proof.
As , its diagonal blocks are zero, and also, entries corresponding to non-edges of are zero. Moreover, as has the -SCC property, each row (and column) of has at most non-zero entries. Let the block matrices be indexed by , and let and denote the -th block. Recall that the block off-diagonal entries of any matrix in belong to By Corollary 4.5 we have . By summing over the blocks and by applying Cauchy-Schwarz, we have
∎
Bounding the inner product between and .
This case is easier and the required result is given in the next lemma.
Lemma 4.7.
Suppose is a graph satisfying the property. Then, for any and we have that
Proof of Lemma 4.7.
Note that has no non-zero entries on the block diagonal. Consider the -th block matrix where . We denote the coordinates in this block by . Then
The last inequality follows from Cauchy-Schwarz, and by noting that has at most non-zero entries within the block . Then by summing over the blocks we have
The last inequality follows from Cauchy-Schwarz. Now note that , and that , from which the result follows. ∎
Finally, we are ready to prove Theorem 4.1.
Proof of Theorem 4.1.
Consider and . By Proposition 4.2 we have that and let , where .
Part . By Lemma 4.7 we have that
Let be the block off-diagonal component of and be the block diagonal component of so that . By definition, the matrix is zero on the diagonal blocks and each block belongs to . Hence by Lemma 4.6, we have
Moreover, as the diagonal blocks of are zero we have that
Putting everything together,
where the last inequality follows by noting that .
Finally, by Proposition 4.2, the subspaces and are orthogonal. Hence by Lemma 4.3 applied to the subspaces and we have .
Part . Note that
where the second and third equalities follow from the fact that orthogonal projections are self-adjoint and idempotent. Since , by part (a) of Theorem 4.1, we have
from which the result follows.∎
4.2 Proof of Result 2
Theorem 4.8.
Suppose satisfies the property for some . Then is the unique optimal solution to (P).
Following Proposition 2.4, the largest singular value of is , while the smallest non-zero singular value of is . Hence the bound in Theorem 4.8 can alternatively be expressed in terms of a condition-number-type quantity
Proof.
As a reminder, our candidate dual certificate is as defined earlier in (11). By Theorem 2.7, the planted solution is an extreme point of the feasible region. Thus, by Theorem 1.1, we need to show that the certificate satisfies conditions (5)–(9). Conditions (6) and (7) are satisfied by construction. As noted earlier, condition (8) and the objective value condition are taken care of by scaling. As such, it remains to show that the certificate is PSD and that .
Simplification:
We begin by showing that it suffices to prove the inequality (as a reminder, is the smallest non-zero singular value of ). To see why this is sufficient, let and consider its direct sum decomposition with respect to ; i.e., . Since , we have by Proposition 2.3, part (4) that
By Proposition 2.4, restricted to is positive definite with smallest eigenvalue at least , and hence
On the other hand, the inequality implies
This means that for all , which means that is PSD.
Finally, we need to show that . By definition, we have that , so by Proposition 2.3, we have that
For the converse inequality, we show that . For this, take . Let and assume that . Then, we would have
leading to a contradiction.
Expressing via the KKT conditions.
The first-order optimality condition corresponding to (11) gives
(16)
where lies in the normal cone of at and lies in the normal cone of at . Here we used the fact that the normal cone to an intersection is the sum of the normal cones; see, e.g., Rockafellar, Corollary 23.8.1. Moreover, as the normal cone to a subspace is the orthogonal subspace, we have that and . First, by projecting (16) onto we have
(17)
This follows as , . Second, by projecting (16) onto the subspace we have
(18)
Combining (17) and (18), we have
(19)
where the first equality follows from (17), and the last inequality follows from the triangle inequality.
We next proceed to bound the terms and .
Bound the term .
We show that
(20)
The -th entry of the matrix is equal to if , and is equal to zero otherwise. Consider the block corresponding to , where . Each non-zero entry is and there are at most entries. Hence the sum of squares of the entries in this block is at most . We sum over all indices and to obtain , from which the result follows. (In the first inequality, recall that the block-diagonal entries of are zero and thus do not contribute to the sum.)
Bound the term .
Concluding the proof.
5 Recovery for planted clique cover instances
Theorem 5.1.
Let be a random planted clique cover instance defined on the cliques where we introduce edges between cliques with probability . If then is -neighborly with probability greater than
Proof.
Let be a binary variable equal to 1 if vertices and are connected, and equal to 0 otherwise. First we have . By Hoeffding’s inequality we have
(24)
Using a union bound we get that
Finally, given that , we set to get the desired result. ∎
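A quick Monte Carlo experiment illustrates Theorem 5.1: for small p, random planted instances satisfy the neighborliness condition, read here as every vertex having at most a c-fraction of neighbors in each clique it does not belong to (our interpretation), in the vast majority of draws. The clique sizes and parameters below are illustrative:

```python
import random

def neighborly_ok(cliques, p, c, rng):
    """One draw of the cross-clique noise from the planted model; check
    that every vertex has at most a c-fraction of neighbors in each
    clique it does not belong to (our reading of neighborliness)."""
    counts = {}   # counts[(v, l)] = neighbors of vertex v inside clique l
    for a in range(len(cliques)):
        for b in range(a + 1, len(cliques)):
            for u in cliques[a]:
                for v in cliques[b]:
                    if rng.random() < p:
                        counts[(u, b)] = counts.get((u, b), 0) + 1
                        counts[(v, a)] = counts.get((v, a), 0) + 1
    return all(
        counts.get((u, b), 0) <= c * len(cliques[b])
        for a in range(len(cliques))
        for b in range(len(cliques)) if b != a
        for u in cliques[a]
    )

rng = random.Random(1)
cliques = [list(range(12 * l, 12 * (l + 1))) for l in range(4)]
successes = sum(neighborly_ok(cliques, p=0.1, c=0.6, rng=rng) for _ in range(50))
```

With p = 0.1 and c = 0.6, a violation requires a Binomial(12, 0.1) variable to reach 8, which is extremely unlikely, so essentially all 50 instances pass.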
6 Numerical comparison to alternative techniques
In this section, we assess the effectiveness of the Lovász theta function for recovering planted clique cover instances by comparing it to alternative techniques in the literature. Specifically, we compare our method with the following approaches:
• Community Detection approach: We frame the clique cover problem as an instance of community detection, and use the method proposed by Oymak and Hassibi [24].
• k-Disjoint-Clique approach: We consider the clique cover problem as an instance of the k-disjoint-clique problem, and apply the method presented by Ames and Vavasis [4].
• Subgraph Isomorphism approach: We view the problem as an instance of subgraph isomorphism, and apply the framework proposed by Candogan and Chandrasekaran [9].
Community Detection/Graph clustering.
Consider a network where the presence of an edge (or possibly the size of its edge weight) models the strength of association between two nodes. The graph clustering task concerns grouping these nodes into subsets – often called communities – such that the strength of association between pairs of nodes from the same subset is significantly stronger than the strength of association between pairs belonging to different subsets. The task of community detection finds application in various fields such as sociology, physics, epidemiology, and more [14, 23]. The term community detection is often used interchangeably with graph clustering.
Oymak and Hassibi [24] (see also [11]) present a method for detecting communities in an undirected graph by solving the following SDP:
(25)
Here is the adjacency matrix of , is a tuning parameter, is the L1-norm over the matrix entries, while is the matrix nuclear norm.
Let be the optimal solution to this SDP. The intuition behind the SDP (25) is that it performs a matrix deconvolution in which the adjacency matrix is separated as the sum of a low-rank component and a sparse matrix . The hope is that (possibly by tuning the parameter ) the low-rank component takes the form , where are the indicator vectors for disjoint communities, while the sparse component captures noise that is (hopefully) sparse.
The specific form in (25) builds upon a long sequence of ideas, originating from the observation that the L1-norm is effective at inducing sparse structure, while the matrix nuclear norm is effective at inducing low-rank structure [8, 26]. Subsequent work proposed the formulation (25) for deconvolving matrices as the linear sum of a sparse matrix and a low-rank matrix [10, 32]; in particular, Chandrasekaran et al. show that penalization with the L1-norm plus the nuclear norm succeeds at recovering both components so long as these components satisfy a mutual incoherence condition. The formulation in (25) is a reasonable approach for obtaining clique covers because one can view cliques as clusters.
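While solving (25) requires an SDP solver, the modeling premise behind it can be checked directly: for a planted instance, the identity-adjusted adjacency matrix splits exactly into a rank-k block component plus a component supported only on the cross-clique noise edges. The A + I = L + S split below is our illustration, not the paper's notation:

```python
import random

# A small planted instance: three cliques plus Bernoulli(p) cross-clique noise.
rng = random.Random(2)
cliques = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
n, p = 11, 0.2
member = {v: l for l, V in enumerate(cliques) for v in V}

# Low-rank part: L = sum_l 1_{V_l} 1_{V_l}^T has rank k = 3
# (the indicator vectors of disjoint cliques are linearly independent).
L = [[1 if member[i] == member[j] else 0 for j in range(n)] for i in range(n)]

# AI = adjacency matrix of the planted draw, plus the identity.
AI = [[0] * n for _ in range(n)]
for i in range(n):
    AI[i][i] = 1
    for j in range(i + 1, n):
        if member[i] == member[j] or rng.random() < p:
            AI[i][j] = AI[j][i] = 1

# Sparse part: S = AI - L is supported only on cross-clique noise edges.
S = [[AI[i][j] - L[i][j] for j in range(n)] for i in range(n)]
```

When p is small, S is sparse, which is exactly the regime where sparse-plus-low-rank deconvolution is expected to succeed.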
-Disjoint-Clique Approach.
In the -disjoint-clique problem, the goal is to find disjoint cliques (of any size) that cover as many nodes of the graph as possible. It is worth noting that in this particular setting, the number of cliques is constant and is determined by the user as a parameter. Ames and Vavasis [4] introduce the following SDP for this problem:
(26)
Subgraph isomorphism.
Candogan and Chandrasekaran [9] investigate the problem of finding any subgraph embedded within a larger graph, known as the subgraph isomorphism problem. A natural approach is to search over matrices obtained by conjugating the adjacency matrix of the subgraph with the permutation group, leading to a quadratic assignment problem instance. The basic idea in [9] is to relax this set by conjugating with the orthogonal group instead. The resulting convex hull is known as the Schur-Horn orbitope [29], and it admits a tractable description via semidefinite programming [12]. In our context, the adjacency matrix of the subgraph (technically not a proper subgraph) is , and the resulting SDP is
(27)
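The relaxation rests on the fact that conjugation by an orthogonal matrix preserves the spectrum, so the set being convexified in the Schur-Horn orbitope consists of symmetric matrices sharing the eigenvalues of the target subgraph. A minimal numerical check of this invariance (the 4-cycle is an illustrative choice of ours, not the graph used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# adjacency matrix of a small target graph (a 4-cycle, illustrative only)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

# a random orthogonal matrix via QR factorization
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# Q A Q^T lies on the orbit whose convex hull is the Schur-Horn orbitope;
# it has exactly the same eigenvalues as A
B = Q @ A @ Q.T
```

Permutation matrices are orthogonal, so the original quadratic assignment search space sits inside this orbit, which is what makes the relaxation valid.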
6.1 Experimental setup and results
In our experimental setup, we generate instances from the planted clique cover model, holding the number of cliques and the clique size fixed. For each value of the noise parameter, we generate a batch of random clique cover instances. We compare the Lovász theta function against the three methods described in the previous section. Our results are summarized in Figure 2, where we illustrate the average success rate of each method in recovering the underlying clique cover across the entire range of noise values. In the reported experiments we use two notions of recovery:
1. We declare strong recovery when the solver outputs an optimal solution that is equal to the planted solution (up to a small tolerance).
2. We declare weak recovery when the optimal value still equals the number of planted cliques, but the output solution is not equal to the planted one.
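For concreteness, the two recovery notions can be operationalized as a small classifier on the solver output. The planted matrix `X_planted` and the convention that the optimal value should equal the number of planted cliques `k` are our assumptions for illustration:

```python
import numpy as np

def classify_recovery(X_opt, val_opt, X_planted, k, tol=1e-6):
    """Classify an SDP output under the two notions above:
    'strong' -- solution matches the planted matrix (up to tol);
    'weak'   -- optimal value matches the planted clique count k,
                but the solution matrix differs;
    'fail'   -- otherwise."""
    scale = max(1.0, np.linalg.norm(X_planted))
    if np.linalg.norm(X_opt - X_planted) <= tol * scale:
        return "strong"
    if abs(val_opt - k) <= tol:
        return "weak"
    return "fail"
```

This is the classification applied per instance before averaging success rates over the random trials.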
First, we compare against the SDP (25) for community detection. We sweep over a grid of choices for the regularization parameter; this sweep makes the method the most expensive to implement among all methods listed here. We say that the method succeeds if the optimal solution recovers the planted clique cover for some choice of the parameter.
Second, we compare against the SDP (26) for the k-disjoint-clique problem. We say that the method succeeds if the optimal solution coincides with the planted solution. It is worth noting that when applying this method to uncover clique covers, there is a minor constraint: we must specify the size of the cliques, or alternatively sweep over all feasible clique sizes.
Our third basis of comparison is the SDP (27) proposed by Candogan and Chandrasekaran for finding subgraphs based on the Schur-Horn orbitope of a graph [9]. We say that the method succeeds if the optimal solution coincides with the planted solution. A limitation of using this approach to find a clique cover is that we implicitly assume that we already know what we are searching for (to be precise, the spectrum of the target graph is required as an input). This assumption is somewhat unrealistic, but one should bear in mind that the work in [9] was intended to solve a different problem in the first place.
Our experiments are summarized in Figure 2. First, when examining the performance of the Lovász theta function, we observe that the outcomes of our numerical experiments are congruent with our theoretical predictions, which guarantee strong recovery for small noise values. Furthermore, it is noteworthy that this strong recovery pattern persists over a substantial range of noise values.
In the context of our comparisons with the other three methods, the Lovász theta function emerges as the most successful, with only a slight edge over the SDP (26) proposed by Ames and Vavasis [4] for finding disjoint cliques, as well as the Schur-Horn orbitope-based relaxation (27) presented by Candogan and Chandrasekaran. The method that exhibited the weakest performance in our experiments was the matrix deconvolution method by Oymak and Hassibi [24].

7 Future directions
In this section, we discuss additional numerical experiments involving the Lovász theta function. Our objective is to report interesting behavior of the Lovász theta function in the context of recovering clique covers that is not explained by our theoretical findings, and to suggest future directions to investigate.
7.1 Comparison with ILP
In our first example, we examine how tight the Lovász theta function is compared with the clique cover number. To investigate this question, we generate graphs from the planted clique cover model for varying choices of the noise parameter. We compute the clique cover number exactly by solving the following Integer Linear Program (ILP):
\[
\begin{array}{rll}
\min & \displaystyle\sum_{c=1}^{n} y_c & \\[2pt]
\text{s.t.} & \displaystyle\sum_{c=1}^{n} x_{v,c} = 1 & \forall\, v \in V,\\[2pt]
& x_{u,c} + x_{v,c} \le 1 & \forall\, c,\ \forall\, \{u,v\} \notin E,\ u \neq v,\\[2pt]
& x_{v,c} \le y_c & \forall\, v,\ \forall\, c,\\[2pt]
& x_{v,c},\, y_c \in \{0,1\} & \forall\, v,\ \forall\, c. \qquad (28)
\end{array}
\]
Here, the binary variable \(x_{v,c}\) indicates whether vertex \(v\) is contained in clique \(c\), while the binary variable \(y_c\) indicates whether clique \(c\) is non-empty. The pairwise constraint ensures that no non-adjacent pair of vertices is assigned to the same clique, while the assignment constraint ensures that every vertex belongs to exactly one clique.
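As a sanity check on the ILP (28), the clique cover number of a very small graph can also be computed by exhaustive search over partitions into cliques. This brute-force sketch is our own, independent of the ILP, and is practical only for a handful of vertices:

```python
import itertools
import numpy as np

def is_clique(A, block):
    """Check that every pair of vertices in `block` is adjacent in A."""
    return all(A[u, v] for u, v in itertools.combinations(block, 2))

def clique_cover_number(A):
    """Exact clique cover number by exhaustive branch-and-prune search --
    usable only for very small graphs, but solver-free."""
    n = A.shape[0]
    best = n  # singleton blocks always give a valid cover of size n

    def rec(remaining, count):
        nonlocal best
        if count >= best:
            return  # prune: cannot improve on the best cover found so far
        if not remaining:
            best = count
            return
        v, rest = remaining[0], remaining[1:]
        # try every clique containing v built from the remaining vertices
        for r in range(len(rest), -1, -1):
            for extra in itertools.combinations(rest, r):
                if is_clique(A, (v,) + extra):
                    rec([u for u in rest if u not in extra], count + 1)

    rec(list(range(n)), 0)
    return best
```

Agreement between this routine and (28) on tiny instances gives some confidence that the ILP is encoded correctly before it is run at scale.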
Our motivation for studying this question is as follows. In our experiments in Section 6, we generally take the clique cover number to be equal to the number of planted cliques. However, short of computing the clique cover number outright (say, via (28)), there is no simple way of verifying that this is indeed the case. The concern grows as the noise level increases, since the added edges may inadvertently lower the clique cover number. As such, one objective of this experiment is to understand whether our assumption that the clique cover number is well approximated by the number of planted cliques is sound.
In the left sub-plot of Figure 3 we compare the clique cover number with the Lovász theta function on a planted clique cover instance with fixed clique number and clique size, for varying choices of the noise parameter. Based on our results, we notice that the deviation between these quantities remains small, and is greatest at an intermediate value of the noise parameter.
In the right sub-plot of Figure 3 we compare the time taken by the ILP solver Gurobi [15] to solve (28) with the time taken by the SDP solver SDPT3 [30, 31] to compute (P). The SDP solve time is approximately equal across all problem instances, as one would expect. The ILP solve time is quite small for low noise levels, but becomes substantially longer at intermediate noise levels. In the same plot we also track the number of simplex solves required by Gurobi; the trend of this curve largely mirrors the previous one. These observations suggest that the hardest regime for computing the clique cover number under the Erdős-Rényi noise model occurs at intermediate noise levels. There are a number of possible explanations. First, our theoretical findings suggest that it is generally easier for the Lovász theta function to discover minimal clique covers at small noise levels, and these curves also suggest that small noise levels correspond to 'easier' instances. At the other extreme, when the graph is complete, there is only one clique. For large noise values, it may be that large cliques are easy to find, and hence the ILP solver is able to certify optimality relatively quickly. It would be interesting to investigate the behavior of these curves for increasing problem sizes, though one obvious difficulty is that the ILP instances may become impractical to solve.


7.2 Phase Transition
In the second experiment, we investigate the probability that the Lovász theta function correctly recovers the planted cliques as certain problem parameters are taken to infinity. Our objective is to see whether a phase transition arises.
To investigate this, we consider an experimental set-up in which all the cliques are of equal size. We set the number of cliques equal to the common clique size, and let this value range over a grid. For each such value, we generate random graphs from the planted clique cover model with the noise parameter ranging over a grid of values. In Figure 4 we plot the success probabilities corresponding to each parameter setting. We plot the curve corresponding to strong recovery in black, and the curve corresponding to weak recovery in red.
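For reproducibility, a sampler matching our reading of this set-up looks as follows. The convention that noise edges are added only between distinct planted cliques with probability `p` is our interpretation and may differ in detail from the model's formal definition:

```python
import numpy as np

def planted_clique_cover(k, s, p, rng):
    """Sample from the planted clique cover model as we read it: k disjoint
    cliques of size s, plus independent Erdos-Renyi noise edges added
    between distinct cliques with probability p (convention assumed)."""
    n = k * s
    labels = np.repeat(np.arange(k), s)          # clique label of each vertex
    same = labels[:, None] == labels[None, :]    # same-clique indicator
    A = same.astype(int)                         # planted cliques (plus diagonal)
    noise = rng.random((n, n)) < p               # candidate noise edges
    noise = np.triu(noise, 1)
    noise = noise | noise.T                      # symmetrize the noise
    A[~same & noise] = 1
    np.fill_diagonal(A, 0)                       # simple graph: no self-loops
    return A
```

With `p = 0` this produces a disjoint union of k cliques, and with `p = 1` the complete graph, the two extremes bracketing the phase-transition experiment.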
We make a number of observations. First, we observe that the transition from success to failure becomes more abrupt as we increase the number and size of the cliques. This observation holds for both types of recovery conditions. In particular, our results suggest that a phase transition occurs as these values are taken to infinity, and we conjecture that this is indeed the case, as is typical of many phenomena in random graphs. Second, and quite interestingly, the gap between the two types of recovery shrinks as the problem parameters increase. We conjecture that both curves coincide in the limit.

References
- [1] E. Abbe, A. S. Bandeira, and G. Hall. Exact Recovery in the Stochastic Block Model. IEEE Transactions on Information Theory, 62(1):471–487, 2016.
- [2] F. Alizadeh, J. A. Haeberly, and M. L. Overton. Complementarity and Nondegeneracy in Semidefinite Programming. Mathematical Programming, 77:111–128, 1997.
- [3] B. P. W. Ames and S. A. Vavasis. Nuclear Norm Minimization for the Planted Clique and Biclique Problems. Mathematical Programming, 129:68–89, 2011.
- [4] B. P. W. Ames and S. A. Vavasis. Convex Optimization for the Planted k-disjoint-clique Problem. Mathematical Programming, 143:299–337, 2014.
- [5] P. Awasthi, A. S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, and R. Ward. Relax, No Need to Round: Integrality of Clustering Formulations. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, 2015.
- [6] A. S. Bandeira. A Note on Probably Certifiably Correct Algorithms. Comptes Rendus Mathematique, 354(3), 2016.
- [7] Ravi B. Boppana. Eigenvalues and graph bisection: An average-case analysis. In 28th Annual Symposium on Foundations of Computer Science, 1987.
- [8] E. J. Candès and Y. Plan. Tight Oracle Inequalities for Low-Rank Matrix Recovery From a Minimal Number of Noisy Random Measurements. IEEE Transactions on Information Theory, 57(4):2342–2359, 2011.
- [9] U. O. Candogan and V. Chandrasekaran. Finding Planted Subgraphs with Few Eigenvalues using the Schur–Horn Relaxation. SIAM Journal on Optimization, 28:735–759, 2018.
- [10] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-Sparsity Incoherence for Matrix Decomposition. SIAM Journal on Optimization, 21(2), 2011.
- [11] Y. Chen, A. Jalali, S. Sanghavi, and H. Xu. Clustering Partially Observed Graphs via Convex Optimization. Journal of Machine Learning Research, 2014.
- [12] Y. Ding and H. Wolkowicz. A Low-Dimensional Semidefinite Relaxation for the Quadratic Assignment Problem. Mathematics of Operations Research, 34:769–1024, 2009.
- [13] Uriel Feige and Robert Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph. Random Structures & Algorithms, 16(2), 2000.
- [14] Santo Fortunato. Community Detection in Graphs. Physics Reports, 486(3–5):75–174, 2010.
- [15] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023.
- [16] Takayuki Iguchi, Dustin G. Mixon, Jesse Peterson, and Soledad Villar. On the tightness of an sdp relaxation of k-means. CoRR, abs/1505.04778, 2015.
- [17] D. E. Knuth. The Sandwich Theorem. The Electronic Journal of Combinatorics, 1:A1, 1994.
- [18] M. Laurent and F. Vallentin. Semidefinite Optimization. Lecture notes, 2012.
- [19] M. Laurent and A. Varvitsiotis. Positive Semidefinite Matrix Completion, Universal Rigidity and the Strong Arnold Property. Linear Algebra and its Applications, 452:292–317, 2014.
- [20] L. Lovász. On the Shannon Capacity of a Graph. IEEE Transactions on Information Theory, 25:1–7, 1979.
- [21] D. Marx. Graph colouring problems and their applications in scheduling. Periodica Polytechnica, Electrical Engineering, 48:11–16, 2004.
- [22] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. SIAM Studies in Applied and Numerical Mathematics, 1994.
- [23] M. E. J. Newman. Modularity and Community Structure in Networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006.
- [24] S. Oymak and B. Hassibi. Finding Dense Clusters via Low Rank + Sparse Decomposition. CoRR, abs/1104.5186, 2011.
- [25] M. Ramana and A. J. Goldman. Some Geometric Results in Semidefinite Programming. Journal of Global Optimization, 7:33–50, 1995.
- [26] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization. SIAM Review, 52(3):471–501, 2010.
- [27] J. Renegar. A Mathematical View of Interior-Point Methods in Convex Optimization. MOS-SIAM Series on Optimization, 2001.
- [28] Tim Roughgarden. Beyond the worst-case analysis of algorithms. Cambridge University Press, 2020.
- [29] R. Sanyal, F. Sottile, and B. Sturmfels. Orbitopes. Mathematika, 57(2):275–314, 2011.
- [30] K. C. Toh, M. J. Todd, and R. H. Tütüncü. SDPT3 – a MATLAB Software Package for Semidefinite Programming. Optimization Methods and Software, 11:545–581, 1999.
- [31] R. H. Tütüncü, K. C. Toh, and M. J. Todd. Solving Semidefinite-Quadratic-Linear Programs using SDPT3. Mathematical Programming, 95:189–217, 2003.
- [32] J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao. Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices by Convex Optimization. In Proceedings of the 22nd International Conference on Neural Information Processing Systems, 2009.