This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

The Lovász Theta Function for Recovering Planted Clique Covers and Graph Colorings

Jiaxin Hou, Yong Sheng Soh, and Antonios Varvitsiotis
Abstract

The problems of computing graph colorings and clique covers are central challenges in combinatorial optimization. Both of these are known to be NP-hard, and thus computationally intractable in the worst-case instance. A prominent approach for computing approximate solutions to these problems is the celebrated Lovász theta function ϑ(G)\vartheta(G), which is specified as the solution of a semidefinite program (SDP), and hence tractable to compute. In this work, we move beyond the worst-case analysis and set out to understand whether the Lovász theta function recovers clique covers for random instances that have a latent clique cover structure, possibly obscured by noise. We answer this question in the affirmative, and show that for graphs generated from the planted clique model we introduce in this work, the SDP formulation of ϑ(G)\vartheta(G) has a unique solution that reveals the underlying clique-cover structure with high-probability. The main technical step is an intermediate result where we prove a deterministic condition of recovery based on an appropriate notion of sparsity.

Keywords: clique cover, graph coloring, clustering, community detection, Lovász theta function, beyond worst-case analysis

1 Introduction

Graph colorings and clique covers are central problems in combinatorial optimization. Given an undirected graph G=(𝒱,)G=(\mathcal{V},\mathcal{E}), the objective of the graph coloring problem is to partition the vertices into independent subsets; that is, all the vertices from the same subset are not adjacent. The minimum number of independent sets required is known as the chromatic number of GG and is denoted by χ(G)\chi(G). The objective of the clique covering problem is to partition the vertices 𝒱\mathcal{V} such that each subset is a clique in GG; that is, all pairs of vertices from the same set are adjacent. The minimum number of cliques required is known as the clique cover number of GG and is denoted by χ¯(G)\overline{\chi}(G).

Graph colorings and clique covers are complementary notions – the complement of an independent set is a clique. In particular, a partition of GG into kk independent sets corresponds to a cover with kk cliques in the complementary graph G¯\overline{G}. Both clique covers and graph colorings have applications in various fields such as scheduling [21] and clustering.

The computational tasks of finding clique covers as well as graph colorings are not known to be tractable. For instance, the problem of deciding whether a graph admits a coloring with kk colors is NP-complete; in fact, it is one of Karp’s original 21 problems. However, the notion of NP-hardness is rooted in a worst-case analysis perspective, which may not be representative of the instances that are typically encountered in practice. As such, it is natural to move beyond the worst-case analysis [28] and study these problems in a suitably defined average-case instance [6]. In fact, there are a number of prior works that study problems known to be hard in the worst-case sense, but admit polynomial-time solutions for average-case instances that align with the problem’s inherent characteristics – prominent examples include community detection [1], clustering [5, 16], planted cliques and bicliques [3, 13], planted kk-disjoint-cliques [4], and graph bisection [7].

This is the viewpoint we adopt in this work. We focus on the clique cover problem, which by complementation, is equivalent to graph coloring. The central question we aim to address is:

Can the clique cover problem be efficiently solved for randomly generated instances where a clique cover structure naturally emerges?

We study the clique cover problem in instances generated from the planted clique cover model, which is specified by a set of vertices 𝒱\mathcal{V}, a corresponding partition of the vertices {𝒞l}l=1k𝒱\{\mathcal{C}^{\star}_{l}\}_{l=1}^{k^{\star}}\subset\mathcal{V}, and a parameter p[0,1]p\in[0,1]. We generate a graph from the planted clique model by creating cliques for every subset 𝒞l\mathcal{C}^{\star}_{l}, 1lk1\leq l\leq k^{\star}, and by subsequently introducing edges between vertices i,ji,j belonging to different cliques, with probability pp independently across all edges; see Figure 1 for an example.

Refer to caption
Refer to caption
Figure 1: A planted clique instance with 88 cliques of size 88 each, and with parameter p=0.2p=0.2.

Graphs generated from the planted clique cover model have a natural clique cover structure obscured by noise. Our goal is to understand whether this latent structure can be revealed using a convex relaxation of the integer programming formulation of the clique cover number. Specifically, we focus on the Lovász theta function of an undirected graph G=(𝒱,)G=(\mathcal{V},\mathcal{E}) is given by

ϑ(G):=mint,A{t:tI+AJ0,Ai,i=0,Ai,j=0 for all (i,j)},\vartheta(G)~{}~{}:=~{}~{}\min_{t,A}\Big{\{}t:tI+A-J\succeq 0,\ A_{i,i}=0,\ A_{i,j}=0\text{ for all }(i,j)\not\in\mathcal{E}\Big{\}}, (P)

where JJ is the matrix of all ones and II is the identity matrix of size |𝒱||\mathcal{V}|. We point out that the specific formulation (P) is simply one among a number of equivalent formulations, and we refer the interested reader to [17] for a list of alternative definitions.

An important property of ϑ(G)\vartheta(G) – often referred to as the “sandwich theorem” – is that it is a lower bound to the clique cover number and an upper bound to the stability number [20]

α(G)ϑ(G)χ¯(G).\alpha(G)\leq\vartheta(G)\leq\overline{\chi}(G). (1)

The Lovász theta function is the solution of a semidefinite program (SDP), and hence computable in polynomial time [22, 27] – this stands in sharp contrast with the stability number and the chromatic number of a graph, both of which are NP-hard to compute. In particular, the sandwich theorem has important consequences for perfect graphs – these are graphs for which, in every induced subgraph of GG, the chromatic number equals the size of the maximum clique. For such graphs, one has ω(G)=ϑ(G)=χ(G)\omega(G)=\vartheta(G)=\chi(G), and thus the clique number and the chromatic number of any perfect graph can be computed efficiently in polynomial time. One can also obtain the minimal clique covers and colorings for such graphs tractably – see, e.g., [18, Section 6.3.2].

Recovery of planted clique covers via ϑ(G)\vartheta(G).

The goal of this work is to show that the Lovász theta recovers the planted clique cover model with high probability. What does such a claim entail? First, note that the latter inequality in (1) – namely ϑ(G)χ¯(G)\vartheta(G)\leq\overline{\chi}(G) – relies on the fact that any clique cover corresponds to a feasible solution of the Lovász theta formulation (P). Concretely, given a graph G=(𝒱,)G=(\mathcal{V},\mathcal{E}), let 𝒞={𝒞l}l=1k\mathcal{C}=\{\mathcal{C}_{l}\}_{l=1}^{k} be any clique cover. Consider the following matrix

X=kI+(l=1kk𝟏𝒞l𝟏𝒞lTkI)J=1kl=1k(k𝟏𝒞l𝐞)(k𝟏𝒞l𝐞)T.X=kI+\Big{(}\sum_{l=1}^{k}k\mathbf{1}_{\mathcal{C}_{l}}\mathbf{1}_{\mathcal{C}_{l}}^{T}-kI\Big{)}-J={1\over k}\sum_{l=1}^{k}(k\mathbf{1}_{\mathcal{C}_{l}}-\mathbf{e})(k\mathbf{1}_{\mathcal{C}_{l}}-\mathbf{e})^{T}. (2)

It is evident that X0X\succeq 0, and hence A=l=1kk𝟏𝒞l𝟏𝒞lTkIA=\sum_{l=1}^{k}k\mathbf{1}_{\mathcal{C}_{l}}\mathbf{1}_{\mathcal{C}_{l}}^{T}-kI is a feasible solution to (P) with objective value t=kt=k. In what follows, we define AA^{\star} and XX^{\star} with respect to the planted clique covering {𝒞l}l=1k\{\mathcal{C}^{\star}_{l}\}_{l=1}^{k^{\star}}

A:=l=1kk𝟏𝒞l𝟏𝒞lTkI and X:=kI+AJ=1kl=1k(k𝟏𝒞l𝐞)(k𝟏𝒞l𝐞)T.A^{\star}:=\sum_{l=1}^{k^{\star}}k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}\mathbf{1}_{\mathcal{C}^{\star}_{l}}^{T}-k^{\star}I\quad\text{ and }\quad X^{\star}:=k^{\star}I+A^{\star}-J={1\over k^{\star}}\sum_{l=1}^{k^{\star}}(k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e})(k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e})^{T}. (3)

When we say that we wish to show that the Lovász theta recovers the planted clique, we specifically mean to prove that the pair (A,k)(A^{\star},k^{\star}) is the unique optimal solution to (P) for graphs generated according to the above planted clique cover model, with high probability.

Our first main result is as follows.

Result 1.

Let GG be a random planted clique cover instance defined on the cliques {𝒞l}l=1k\{\mathcal{C}^{\star}_{l}\}_{l=1}^{k^{\star}}, and where we introduce edges between cliques with probability p<c:=min{14(minl1/|𝒞l|l1/|𝒞l|)2,1100}p<c:=\min\big{\{}\frac{1}{4}(\frac{\min_{l}1/|\mathcal{C}^{\star}_{l}|}{\sum_{l}1/|\mathcal{C}^{\star}_{l}|})^{2},{1\over 100}\big{\}}. Then (A,k)(A^{\star},k^{\star}) is the unique optimal solution to the Lovász theta formulation (P) with probability greater than

1i=1k|𝒞i|jiexp(2|𝒞j|(cp)2).1-\sum_{i=1}^{k^{\star}}{|\mathcal{C}^{\star}_{i}|\sum_{j\neq i}\exp(-2|\mathcal{C}^{\star}_{j}|(c-p)^{2}}).

The simplest type of graphs in the context of the clique cover problem are disjoint unions of cliques. An important intermediate step of our proof is to show that the Lovász theta function succeeds at recovering clique covers for graphs that resemble disjoint union of cliques. More concretely, we say that a graph GG with clique cover {𝒞l}l=1k\{\mathcal{C}^{\star}_{l}\}_{l=1}^{k^{\star}} satisfies the cc-sparse clique cover (c-SCC) property if, for every vertex vv and every clique 𝒞l\mathcal{C}^{\star}_{l} that does not contain vv, one has

|e(v,𝒞l)|c|𝒞l|.|e(v,\mathcal{C}^{\star}_{l})|\leq c|\mathcal{C}^{\star}_{l}|. (cc-SCC)

Note that if the graph GG satisfies the c-SCCc\text{-}{\rm SCC} property for some c<1c<1, then every vertex v𝒱v\in\mathcal{V} cannot form a clique with any other clique 𝒞l\mathcal{C}^{\star}_{l} for which it does not belong to.

Result 2.

Suppose GG is a graph that satisfies the cc-SCC property for some c<min{14(minl1/|𝒞l|l1/|𝒞l|)2,1100}c<\min\{\frac{1}{4}(\frac{\min_{l}1/|\mathcal{C}^{\star}_{l}|}{\sum_{l}1/|\mathcal{C}^{\star}_{l}|})^{2},\frac{1}{100}\}. Then XX^{\star} is the unique optimal solution to the Lovász theta formulation (P).

If the cliques {𝒞l}l=1k\{\mathcal{C}^{\star}_{l}\}_{l=1}^{k^{\star}} are of equal size, then the above condition requires c(1/k)2c\lesssim(1/k^{\star})^{2}. An important observation from Result 2 is that it reveals the types of graphs for which the Lovász theta function is most effective at uncovering the underlying clique cover – these are graphs for which the cc-neighborly parameter is small. The parameter pp in our model is related to the neighborly parameter cc in that small choices of pp will generate graphs with small neighborly parameter. In our numerical experiments in Section 7, we see that the parameter pp (and hence, by extension cc) also reveal something about the difficulty of computing the clique covering number of a particular graph – the ‘hard’ instances (these are defined by those requiring a large number of simplex solves using a MILP solver) correspond to graphs generated using intermediate values of pp (say p[0.6,0.7]p\in[0.6,0.7]).

1.1 Proof Outline

In this section, we give an outline of the proof of our main results. The key part of our argument is to produce a suitable dual optimal solution that certifies the optimality as well as the uniqueness of (A,k)(A^{\star},k^{\star}) in (P). First, as a reminder, the dual program to (P) is the following semidefinite program

max{J,Z:tr(Z)=1,Zi,j=0 for all (i,j),Z0}.\max\Big{\{}\,\langle J,Z\rangle\,:\,\mathrm{tr}(Z)=1,Z_{i,j}=0\text{ for all }{(i,j)\in\mathcal{E}},\ Z\succeq 0\Big{\}}. (D)

A basic consequence of weak duality is that, in order to show (A,k)(A^{\star},k^{\star}) is an optimal solution to (P) (note that this is equivalent to ϑ(G)=k\vartheta(G)=k^{\star}), it suffices to produce a dual feasible solution that attains the same objective; i.e., one would need to exhibit Z^\hat{Z} that is a feasible solution to (D) and satisfies J,Z^=k\langle J,\hat{Z}\rangle=k^{\star}. To show that (A,k)(A^{\star},k^{\star}) is the unique optimal solution to (P), it is necessary to appeal to a strict complementarity-type of result for SDPs; see, e.g., [2, 19]. In what follows, we specialize a set of conditions from [19] that guarantee unique solutions in SDPs to the Lovász theta function.

Theorem 1.1.

Let (t,A)(t,A) and Z{Z} be a pair of strict complementary primal and dual optimal solutions to (P) and (D) respectively, i.e., they satisfy:

X,Z=0 and rank(X)+rank(Z)=|𝒱|, where X=tI+AJ.\langle X,Z\rangle=0\quad\text{ and }\quad\mathrm{rank}(X)+\mathrm{rank}(Z)=|\mathcal{V}|,\qquad\text{ where }\qquad X=tI+A-J.

Then AA is the unique primal optimal solution to (P) if and only if AA is an extreme point of the feasible region of (P).

In view of Theorem 1.1, to show that (A,k)(A^{\star},k^{\star}) is the unique optimal solution of (P), it suffices to (i)(i) show that (A,k)(A^{\star},k^{\star}) is an extreme point of the feasible region of the feasible region of (P), and (ii)(ii) produce a dual optimal Z^|V|×|V|\hat{Z}\in\mathbb{R}^{|V|\times|V|} that satisfies strict complementarity.

The answer to the first task has simple answer: in Section 2, we show that (A,k)(A^{\star},k^{\star}) is an extreme point of the feasible region of (P) whenever the graph GG satisfies the c-SCCc\text{-}{\rm SCC} condition for some c<1c<1. Our proof relies on results that characterize extreme points of spectrahedra, and the specific version we use is found in Corollary 3 of [25]:

Theorem 1.2.

Let 𝒮\mathcal{S} be a spectrahedron specified in linear matrix inequality(LMI) form

𝒮={(x1,,xn)Tn:Q0+i=1nxiQi0},\mathcal{S}=\Big{\{}(x_{1},\ldots,x_{n})^{T}\in\mathbb{R}^{n}:Q_{0}+\sum_{i=1}^{n}x_{i}Q_{i}\succeq 0\Big{\}}, (4)

where Q0,Q1,,Qnm×mQ_{0},Q_{1},\ldots,Q_{n}\in\mathbb{R}^{m\times m} are symmetric matrices. Let 𝐲=(y1,,yn)Tn\mathbf{y}=(y_{1},\ldots,y_{n})^{T}\in\mathbb{R}^{n}, and UU be an m×km\times k matrix whose columns span the kernel of the matrix Q0+iyiQiQ_{0}+\sum_{i}y_{i}Q_{i}. Then, 𝐲\mathbf{y} is an extreme point of 𝒮\mathcal{S} if and only if the vectors vec(Q1U),,vec(QnU){\rm vec}(Q_{1}U),\ldots,{\rm vec}(Q_{n}U) are linearly independent.

The answer to the second task is considerably more involved. As a reminder, a matrix Z|V|×|V|Z\in~{}\mathbb{R}^{|V|\times|V|} is optimal for (D) if it satisfies:

Z0,\displaystyle Z\succeq 0, (5)
Zspan{Ei,j:(i,j)},\displaystyle Z\in{\rm span}\{E_{i,j}:\ (i,j)\in\mathcal{E}\}^{\perp}, (6)
Z,X=0(i.e. Zspan(X)),\displaystyle\langle Z,X^{\star}\rangle=0~{}(\text{i.e. }Z\in{\rm span}(X^{\star})^{\perp}), (7)
tr(Z)=1.\displaystyle\mathrm{tr}(Z)=1. (8)

Here we use the notation Ei,j=12(𝐞i𝐞jT+𝐞j𝐞iT)E_{i,j}={1\over 2}(\mathbf{e}_{i}\mathbf{e}_{j}^{T}+\mathbf{e}_{j}\mathbf{e}_{i}^{T}). We call a matrix ZZ satisfying these conditions a dual certificate. Furthermore, we say that a dual certificate ZZ and XX^{\star} satisfy strict complementarity if:

rank(Z)=|𝒱|rank(X).{\rm rank}(Z)=|\mathcal{V}|-{\rm rank}(X^{\star}). (9)

Note that we can omit condition (8) as we can always scale a matrix that satisfies conditions (5), (6), (7) and (9) to make it trace-one. The proof our two main results is given in three steps.

Step 1: Exact recovery in the case of disjoint cliques.

In the first step, we consider the simplest setting where GG is the disjoint union of cliques {𝒞l}l=1k\{\mathcal{C}^{\star}_{l}\}_{l=1}^{k^{\star}}. With a bit of guesswork, we construct the following matrix ZZ^{\star} defined as follows:

Z:=(I|𝒞1|1|𝒞1||𝒞2|E1|𝒞1||𝒞2|EI|𝒞2|).Z^{\star}:=\left(\begin{array}[]{ccc}\frac{I}{|\mathcal{C}^{\star}_{1}|}&\frac{1}{|\mathcal{C}^{\star}_{1}||\mathcal{C}^{\star}_{2}|}E&\ldots\\ \frac{1}{|\mathcal{C}^{\star}_{1}||\mathcal{C}^{\star}_{2}|}E&\frac{I}{|\mathcal{C}^{\star}_{2}|}&\ldots\\ \vdots&\vdots&\ddots\end{array}\right). (10)

Here, the (i,j)(i,j)-th block has dimension |𝒞i|×|𝒞j||\mathcal{C}^{\star}_{i}|\times|\mathcal{C}^{\star}_{j}|.

It is relatively straightforward to verify that the conditions (6) and (7) are satisfied. In Proposition 2.4 we characterize the spectrum of the matrix ZZ^{\star}, and in so doing, show that ZZ^{\star} satisfies the conditions (5) and (9). We explain these steps in Sections 2 and 3.

Step 2: Exact recovery for graphs with the c-SCCc\text{-}{\rm SCC} property.

The matrix ZZ^{\star} that served as our dual certificate for graphs formed by disjoint cliques does not work in the more general setting as it is non-zero on every edge between distinct cliques; i.e., it violates (6). The key idea for constructing a suitable dual certificate in this case begins by recognizing that conditions (6) and (7) define subspaces:

𝒦:=span{Ei,j:(i,j)} and :=span(X).\mathcal{K}:={\rm span}\{E_{i,j}:\ (i,j)\in\mathcal{E}\}^{\perp}\ \text{ and }\ \mathcal{L}:={\rm span}(X^{\star})^{\perp}.

In view of this, we consider a dual certificate ZZ^{\prime} obtained by projecting the matrix ZZ^{\star} with respect to the Frobenious norm onto the intersection of these spaces:

Z:=argmin𝑋XZF2s.t.X𝒦.Z^{\prime}~{}~{}:=~{}~{}\underset{X}{\arg\min}~{}\|X-Z^{\star}\|_{F}^{2}\quad\mathrm{s.t.}\quad X\in\mathcal{K}\cap\mathcal{L}.

The remainder of proof is a matrix pertubation argument in which we show that ZZ\|Z^{\prime}-Z^{\star}\| is quite small, which allows us to conclude the remaining conditions (5) and (9). We now provide some geometric intuition why the projection onto the intersection 𝒦\mathcal{K}\cap\mathcal{L} succeeds.

The graphs considered in this work are either disjoint union of cliques or have the c-SCCc\text{-}{\rm SCC} property. In both cases, we have that 𝒦\mathcal{K} is a subspace of

:={X|𝒱|×|𝒱|:X=XT,(𝒞l,𝒞l)-th block is diagonal for all 1lk},\mathcal{M}:=\{X\in\mathbb{R}^{|\mathcal{V}|\times|\mathcal{V}|}:\ X=X^{T},(\mathcal{C}^{\star}_{l},\mathcal{C}^{\star}_{l})\text{-th block is diagonal for all }1\leq l\leq k\},

which we treat as the ambient space, rather than the space of all symmetric matrices. Setting

𝒦~:=span{Ei,j:(i,j)} and ~:=span(X),\tilde{\mathcal{K}}:={\rm span}\{E_{i,j}:\ (i,j)\in\mathcal{E}\}^{\perp}\cap\mathcal{M}\ \text{ and }\ \tilde{\mathcal{L}}:={\rm span}(X^{\star})^{\perp}\cap\mathcal{M},

ZZ^{\prime} can be equivalently defined as the projection of ZZ^{\star} onto the intersection of 𝒦~~\tilde{\mathcal{K}}\cap\tilde{\mathcal{L}}.

Z:=argmin𝑋XZF2s.t.X𝒦~~.Z^{\prime}~{}~{}:=~{}~{}\underset{X}{\arg\min}~{}\|X-Z^{\star}\|_{F}^{2}\quad\mathrm{s.t.}\quad X\in\tilde{\mathcal{K}}\cap\tilde{\mathcal{L}}. (11)

In Section 4 we give the main technical result of this work, where we establish that the subspaces 𝒦~\tilde{\mathcal{K}}^{\perp} and ~\tilde{\mathcal{L}}^{\perp} are “almost orthogonal”. For this, it is now crucial that our ambient space is \mathcal{M}.

Step 3: Recovery for planted clique cover instances.

Our third step shows that planted clique instances are c-SCCc\text{-}{\rm SCC}, with high probability. The proof is a direction application of Hoeffding’s inequality combined with a union bound, and is provided in Section 5.

2 Extremality of (A,k)(A^{\star},k^{\star})

The goal of this section is to provide a simple sufficient condition that guarantees when (A,k)(A^{\star},k^{\star}) is an extreme point of the feasible region of (P). As we noted in Section 1.1, we rely on a general result that describes extreme points of spectrahedra specified by a linear matrix inequality (LMI), stated in the form of Theorem 1.2. We begin by expressing the feasible region of (P) as the following LMI

{(t,ai,j)(i,j)||+1:J+tI+(i,j)ai,jEi,j0}.\Big{\{}(t,a_{i,j})_{(i,j)\in\mathcal{E}}\in\mathbb{R}^{|\mathcal{E}|+1}:-J+tI+\sum_{(i,j)\in\mathcal{E}}a_{i,j}E_{i,j}\succeq 0\Big{\}}.

This suggests we should take the matrices {I}{Ei,j:(i,j)}\{I\}\cup\{E_{i,j}:(i,j)\in\mathcal{E}\} to be {Qi}i=1n\{Q_{i}\}_{i=1}^{n}, and to identify the matrix XX^{\star} with Q0+ixiQiQ_{0}+\sum_{i}x_{i}Q_{i} in Theorem 1.2. Our next task is to characterize the kernel of XX^{\star}.

2.1 Computing the kernel of XX^{\star}

Proposition 2.1.

Setting

𝒥:={𝐱|𝒱|:𝐱,𝟏𝒞1==𝐱,𝟏𝒞k},\mathcal{J}:=\big{\{}\mathbf{x}\in\mathbb{R}^{|\mathcal{V}|}:\langle\mathbf{x},\mathbf{1}_{\mathcal{C}^{\star}_{1}}\rangle=\ldots=\langle\mathbf{x},\mathbf{1}_{\mathcal{C}^{\star}_{k^{\star}}}\rangle\big{\}},

we have that

𝒥=span{𝟏𝒞1𝟏𝒞l:2lk}=span{k𝟏𝒞l𝐞:l[k]}.\mathcal{J}=\mathrm{span}\{\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{l}}:2\leq l\leq k^{\star}\}^{\perp}=\mathrm{span}\{k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e}:l\in[k^{\star}]\}^{\perp}.
Proof.

The first equality follows immediately from definition since 𝐱,𝟏𝒞1=𝐱,𝟏𝒞l𝐱,𝟏𝒞1𝟏𝒞l\langle\mathbf{x},\mathbf{1}_{\mathcal{C}^{\star}_{1}}\rangle=\langle\mathbf{x},\mathbf{1}_{\mathcal{C}^{\star}_{l}}\rangle\Leftrightarrow\langle\mathbf{x},\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{l}}\rangle. We focus on the second equality. To do so, it suffices to show that every vector of the form k𝟏𝒞l𝐞k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e}, l[k]l\in[k^{\star}], is in the span of 𝟏𝒞1𝟏𝒞l\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{l}}, 2lk2\leq l\leq k^{\star}, and vice versa.

First, we have 𝟏𝒞i𝟏𝒞j=(𝟏𝒞1𝟏𝒞j)(𝟏𝒞1𝟏𝒞i)\mathbf{1}_{\mathcal{C}^{\star}_{i}}-\mathbf{1}_{\mathcal{C}^{\star}_{j}}=(\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{j}})-(\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{i}}). Then, note that k𝟏𝒞l𝐞=i=1k(𝟏𝒞l𝟏𝒞i)k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e}=\sum_{i=1}^{k^{\star}}(\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{1}_{\mathcal{C}^{\star}_{i}}). Hence the vectors k𝟏𝒞l𝐞k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e} lies in the linear span of vectors of the form 𝟏𝒞1𝟏𝒞l\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{l}}, 2lk2\leq l\leq k^{\star}.

In the reverse direction, note that 𝟏𝒞1𝟏𝒞l=1k(k𝟏𝒞1𝐞)1k(k𝟏𝒞l𝐞)\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{l}}=\frac{1}{k^{\star}}(k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{e})-\frac{1}{k^{\star}}(k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e}). In other words, every vector of the form 𝟏𝒞1𝟏𝒞l\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{l}}, 2lk2\leq l\leq k^{\star}, lies in the linear span of vectors of the form k𝟏𝒞l𝐞k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e}, l[k]l\in[k^{\star}]. This establishes the second equality. ∎

As an immediate consequence we have that:

Proposition 2.2.

The kernel of XX^{\star} is equal to 𝒥.\mathcal{J}.

Proof.

By definition of XX^{\star} we have that range(X)=span{k𝟏𝒞l𝐞:l[k]}.{\rm range}(X^{\star})={\rm span}\{k^{\star}\mathbf{1}_{\mathcal{C}_{l}}-\mathbf{e}:l\in[k]\}. Thus,

ker(X)=span{k𝟏𝒞l𝐞:l[k]}.{\rm ker}(X^{\star})={\rm span}\{k^{\star}\mathbf{1}_{\mathcal{C}_{l}}-\mathbf{e}:l\in[k]\}^{\perp}.

Proposition 2.3.

The following statements are equivalent for a PSD matrix Z|𝒱|×|𝒱|Z\in\mathbb{R}^{|\mathcal{V}|\times|\mathcal{V}|}:

  • (1)

    Zspan(X)Z\in{\rm span}(X^{\star})^{\perp}.

  • (2)

    Z(k𝟏𝒞l𝐞)=0,Z(k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e})=0, for all l[k].l\in[k^{\star}].

  • (3)

    range(Z)𝒥\mathrm{range}(Z)\subseteq\mathcal{J}.

  • (4)

    𝐳i𝒥\mathbf{z}_{i}\in\mathcal{J} for all i𝒱i\in\mathcal{V} – here, 𝐳i\mathbf{z}_{i} are rows of ZZ.

Moreover, we have that rank(X)+rank(Z)=|𝒱|{\rm rank}(X^{\star})+{\rm rank}(Z)=|\mathcal{V}| if and only if range(Z)=𝒥\mathrm{range}(Z)=\mathcal{J}.

Proof.

(1)(2)(1)\Longleftrightarrow(2) We have that

Z,X=0l(k𝟏𝒞l𝐞)TZ(k𝟏𝒞l𝐞)=0Z(k𝟏𝒞l𝐞)=0,l[k]\langle Z,X^{\star}\rangle=0\iff\sum_{l}(k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e})^{T}Z(k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e})=0\iff Z(k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e})=0,\ \forall l\in[k^{\star}]

where the last equivalence holds as Z0Z\succeq 0.

(2)(3)(2)\Longleftrightarrow(3) Note that (2) is equivalent to

ker(Z)range(X)=span{k𝟏𝒞l𝐞:l[k]}.\ker(Z)\supseteq\mathrm{range}(X^{\star})=\mathrm{span}\{k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e}:l\in[k^{\star}]\}.

Taking orthogonal complements, this is in turn equivalent to

range(Z)span{k𝟏𝒞l𝐞:l[k]}=𝒥.\mathrm{range}(Z)\subseteq\mathrm{span}\{k^{\star}\mathbf{1}_{\mathcal{C}^{\star}_{l}}-\mathbf{e}:l\in[k^{\star}]\}^{\perp}=\mathcal{J}.

Finally (3)(4)(3)\iff(4) since range(Z)=range(ZT)\mathrm{range}(Z)=\mathrm{range}(Z^{T}). ∎

Our next result describes the spectrum of the matrix ZZ^{\star} and show that the columns of the matrix ZZ^{\star} span the kernel of XX^{\star}. Recall that

Z:=(I|𝒞1|1|𝒞1||𝒞2|E1|𝒞1||𝒞2|EI|𝒞2|),Z^{\star}:=\left(\begin{array}[]{ccc}\frac{I}{|\mathcal{C}^{\star}_{1}|}&\frac{1}{|\mathcal{C}^{\star}_{1}||\mathcal{C}^{\star}_{2}|}E&\ldots\\ \frac{1}{|\mathcal{C}^{\star}_{1}||\mathcal{C}^{\star}_{2}|}E&\frac{I}{|\mathcal{C}^{\star}_{2}|}&\ldots\\ \vdots&\vdots&\ddots\end{array}\right), (12)

This matrix can be alternatively expressed in a more convenient form:

Z=diag(𝐠)+𝐠𝐠Tl𝟏𝒞l𝟏𝒞lT|𝒞l|2where𝐠=(1|𝒞1|,|𝒞1|,,,1|𝒞k||𝒞k|)T.Z^{\star}=\mathrm{diag}(\mathbf{g})+\mathbf{g}\mathbf{g}^{T}-\sum_{l}\frac{\mathbf{1}_{\mathcal{C}^{\star}_{l}}\mathbf{1}_{\mathcal{C}^{\star}_{l}}^{T}}{|\mathcal{C}^{\star}_{l}|^{2}}\quad\text{where}\quad\mathbf{g}=\Big{(}\underbrace{\frac{1}{|\mathcal{C}^{\star}_{1}|},\ldots}_{|\mathcal{C}^{\star}_{1}|},\ldots,\underbrace{\ldots,\frac{1}{|\mathcal{C}^{\star}_{k^{\star}}|}}_{|\mathcal{C}^{\star}_{k^{\star}}|}\Big{)}^{T}.

To be clear, this is precisely the same matrix (10) we use as our dual certificate whenever GG is a disjoint union of cliques. However, we will not require any information about ZZ^{\star} being a dual certificate at this juncture.

Proposition 2.4.

The eigenvalues of ZZ^{\star} are l=1k1/|𝒞l|\sum_{l=1}^{k^{\star}}1/|\mathcal{C}^{\star}_{l}| (with multiplicity one), 0 (with multiplicity k1k^{\star}-1), and 1/|𝒞l|1/|\mathcal{C}^{\star}_{l}| with multiplicity |𝒞l|1|\mathcal{C}^{\star}_{l}|-1. Moreover, we have that range(Z)=𝒥\mathrm{range}(Z^{\star})=\mathcal{J}.

Proof.

First, note that the 𝐠\mathbf{g} is an eigenvector whose eigenvalue is i=1k1/|𝒞i|\sum_{i=1}^{k^{\star}}1/|\mathcal{C}^{\star}_{i}|. Second, note that the vector 𝟏𝒞1𝟏𝒞l\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{l}} is an eigenvector with eigenvalue 0, for all 2lk2\leq l\leq k^{\star}. Third, fix a subset 𝒞l\mathcal{C}^{\star}_{l}. Let 𝐱|V|\mathbf{x}\in\mathbb{R}^{|V|} be a vector with entries in the coordinates corresponding to 𝒞l\mathcal{C}^{\star}_{l} satisfying 𝐞T𝐱=0\mathbf{e}^{T}\mathbf{x}=0. One can check that (𝐠𝐠Tl(𝟏𝒞l𝟏𝒞lT)/|𝒞l|2)𝐱=0(\mathbf{g}\mathbf{g}^{T}-\sum_{l}(\mathbf{1}_{\mathcal{C}^{\star}_{l}}\mathbf{1}_{\mathcal{C}^{\star}_{l}}^{T})/|\mathcal{C}^{\star}_{l}|^{2})\mathbf{x}=0, and hence Z𝐱=diag(𝐠)𝐱=𝐱/|𝒞l|Z^{\star}\mathbf{x}=\mathrm{diag}(\mathbf{g})\mathbf{x}=\mathbf{x}/|\mathcal{C}^{\star}_{l}|. The dimension of this eigenspace is |𝒞l|1|\mathcal{C}^{\star}_{l}|-1. Finally, we have that

ker(Z)=span{𝟏𝒞1𝟏𝒞l:2lk}\ker(Z^{\star})=\mathrm{span}\{\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{l}}:2\leq l\leq k^{\star}\}

and using Proposition 2.1 we get that

range(Z)=span{𝟏𝒞1𝟏𝒞l:2lk}=𝒥.\mathrm{range}(Z^{\star})=\mathrm{span}\{\mathbf{1}_{\mathcal{C}^{\star}_{1}}-\mathbf{1}_{\mathcal{C}^{\star}_{l}}:2\leq l\leq k^{\star}\}^{\perp}=\mathcal{J}.

2.2 Establishing extremality

Lemma 2.5.

Consider a graph GG with a clique cover {𝒞l}l=1k\{\mathcal{C}^{\star}_{l}\}_{l=1}^{k^{\star}}. Let 𝒮𝒱\mathcal{S}\subset\mathcal{V} be a subset of vertices such that, for all cliques except one, 𝒮\mathcal{S} leaves out at least one vertex. Then, the columns of ZZ^{\star} corresponding to 𝒮\mathcal{S} are linearly independent.

Proof.

Without loss of generality, we assume that for all cliques except 𝒞1\mathcal{C}^{\star}_{1}, the set of vertices 𝒮\mathcal{S} contains at most |𝒞l|1|\mathcal{C}^{\star}_{l}|-1 vertices from clique 𝒞l\mathcal{C}^{\star}_{l}, for all l1l\neq 1.

We index the columns of ZZ^{\star} by (l,j)(l,j), where the index 1lk1\leq l\leq k^{\star} refers to the clique while the index j𝒞lj\in\mathcal{C}^{\star}_{l} refers to the vertex within the clique. Let {𝐯l,j}\{\mathbf{v}_{l,j}\} be the columns of the matrix ZZ^{\star} and consider a linear combination:

(l,j)𝒮θl,j𝐯l,j=0.\sum_{(l,j)\in\mathcal{S}}\theta_{l,j}\mathbf{v}_{l,j}=0. (13)

Fix a cluster 𝒞l\mathcal{C}^{\star}_{l}, and let PlP_{l} be the projection of |𝒱|\mathbb{R}^{|\mathcal{V}|} onto the coordinates corresponding to 𝒞l\mathcal{C}^{\star}_{l}. By applying PlP_{l} to (13) we get:

j:(l,j)𝒮θl,jPl(𝐯l,j)=(l,j)𝒮,1llkθl,jPl(𝐯l,j)=c(1,,1)T.\sum_{j:(l,j)\in\mathcal{S}}\theta_{l,j}P_{l}(\mathbf{v}_{l,j})=-\sum_{(l^{\prime},j)\in\mathcal{S},1\leq l^{\prime}\neq l\leq k^{\star}}\theta_{l^{\prime},j}P_{l}(\mathbf{v}_{l^{\prime},j})=c(1,\ldots,1)^{T}. (14)

The rightmost equality follows by noting that Pl(𝐯l,j)=𝐞|𝒞l|P_{l}(\mathbf{v}_{l^{\prime},j})=\mathbf{e}\in\mathbb{R}^{|\mathcal{C}^{\star}_{l}|} for all jj. On the other hand, Pl(𝐯l,j)=𝐞j|𝒞l|P_{l}(\mathbf{v}_{l,j})=\mathbf{e}_{j}\in\mathbb{R}^{|\mathcal{C}^{\star}_{l}|} are standard basis vectors. Since 𝒮\mathcal{S} leaves out at least one vector in 𝒞l\mathcal{C}^{\star}_{l}, the span of these vectors cannot include 𝐞\mathbf{e}. So it follows from (13) that c=0c=0, and in particular, θl,j=0\theta_{l,j}=0 for all jj. We repeat this argument for all 2lk2\leq l\leq k^{\star} and conclude that θl,j=0\theta_{l,j}=0 for all l2l\geq 2, and all jj.

Finally, we consider the vectors 𝐯1,j\mathbf{v}_{1,j} and project these to the rows in 𝒞1\mathcal{C}^{\star}_{1}. The vectors {Pl(𝐯1,j)}\{P_{l}(\mathbf{v}_{1,j})\} are standard basis vectors. Since (1,j)𝒮θ1,jPl(𝐯1,j)=0\sum_{(1,j)\in\mathcal{S}}\theta_{1,j}P_{l}(\mathbf{v}_{1,j})=0, it must be that θ1,j=0\theta_{1,j}=0 for all jj. Hence all the coefficients are zero. This proves that the columns in 𝒮\mathcal{S} are linearly independent. ∎

Lemma 2.6.

Let G=(𝒱,)G=(\mathcal{V},\mathcal{E}) be a graph that satisfies the c-SCCc\text{-}{\rm SCC} property for some c<1c<1. Then the collection of matrices {Ei,jZ:(i,j)}{Z}\{E_{i,j}Z^{\star}:(i,j)\in\mathcal{E}\}\cup\{Z^{\star}\} are linearly independent.

Proof.

Consider a linear combination

(i,j)θi,jEi,jZ+θZ=0.\sum_{(i,j)\in\mathcal{E}}\theta_{i,j}E_{i,j}Z^{\star}+\theta Z^{\star}=0.

Fix a vertex m𝒞lm\in\mathcal{C}^{\star}_{l}, and for all lll^{\prime}\neq l, we let Nl(m)𝒞lN_{l^{\prime}}(m)\subset\mathcal{C}^{\star}_{l^{\prime}} denote the neighbors of vertex mm in 𝒞l\mathcal{C}^{\star}_{l^{\prime}}. Then the mm-th row of the above expression is given by:

jNl(m),lmθm,j𝐳j+θ𝐳m=0,\sum_{\begin{subarray}{c}j\in N_{l^{\prime}}(m),l^{\prime}\neq m\end{subarray}}\theta_{m,j}\mathbf{z}^{\star}_{j}+\theta\mathbf{z}^{\star}_{m}=0,

where 𝐳j\mathbf{z}_{j}^{\star} denotes the jj-th row of ZZ^{\star}. Now, since the vertex mm is not fully connected to clusters 𝒞l\mathcal{C}^{\star}_{l^{\prime}} for all lll^{\prime}\neq l, the collection of vertices {Nl(m)}lm{m}\{N_{l^{\prime}}(m)\}_{l^{\prime}\neq m}\cup\{m\} satisfy the conditions of the subset 𝒮\mathcal{S} in Lemma 2.5. By Lemma 2.5, the corresponding columns of ZZ^{\star} are linearly independent, which means that θ=θm,j=0\theta=\theta_{m,j}=0 for all jj. The result follows by repeating this argument for all mm. ∎

Theorem 2.7.

Let G=(𝒱,)G=(\mathcal{V},\mathcal{E}) be a graph, and let {𝒞l}l=1k\{\mathcal{C}^{\star}_{l}\}_{l=1}^{k} be a clique cover for GG. Suppose GG satisfies the c-SCCc\text{-}{\rm SCC} property for some c<1c<1. Then (A,k)(A^{\star},k^{\star}) is an extreme point of the feasible region of (P).

Proof of Theorem 2.7.

Suppose GG satisfies the c-SCCc\text{-}{\rm SCC} property for some c<1c<1. By Lemma 2.6, the collection of matrices {Ei,jZ:(i,j)}{Z}\{E_{i,j}Z^{\star}:(i,j)\in\mathcal{E}\}\cup\{Z^{\star}\} are linearly independent. Hence the vectors in {vec(Ei,jZ):(i,j)}{vec(Z)}\{{\rm vec}(E_{i,j}Z^{\star}):(i,j)\in\mathcal{E}\}\cup\{{\rm vec}(Z^{\star})\} are also linearly independent. This means that the following matrix formed by combining all the (vectorized) matrix product Ei,jZE_{i,j}Z^{\star}, ranging over all edges (i,j)(i,j)\in\mathcal{E}, as well as ZZ^{\star}, has full column rank

(vec(Ei,jZ)vec(Z))(i,j).\left(\begin{array}[]{c|c|c|c}\ldots&\mathrm{vec}(E_{i,j}Z^{\star})&\ldots&\mathrm{vec}(Z^{\star})\end{array}\right)_{(i,j)\in\mathcal{E}}.

Hence, by Theorem 1.1, (A,k)(A^{\star},k^{\star}) is an extreme point of the feasible region of (P). ∎

3 Exact recovery in the case of disjoint cliques

In this section, we show that (A,k)(A^{\star},k^{\star}) is the unique optimal solution to the Lovász theta formulation (P) when GG is a disjoint union of cliques. In this simple case, GG satisfies the c-SCCc\text{-}{\rm SCC} condition with c=0c=0. Following the conclusions of Theorem 2.7, (A,k)(A^{\star},k^{\star}) is an extreme point of the feasible region of (P). Based on Theorem 1.1, it remains to produce a suitable dual witness ZZ satisfying the requirements listed in Theorem 1.1.

As it turns out, the matrix ZZ^{\star} satisfies all the requirements we need! In fact, in the process of computing the spectrum of the matrix ZZ^{\star} in Proposition 2.4, we have also verified that ZZ^{\star} does indeed satisfy the requirements of Theorem 1.1. We collect these conclusions below.

Theorem 3.1.

Let GG be the graph formed by the disjoint union of cliques {𝒞l}l=1k\{\mathcal{C}^{\star}_{l}\}_{l=1}^{k^{\star}}. Then the matrix (A,k)(A^{\star},k^{\star}) is the unique solution of (P).

Proof.

As we noted above, we simply need to check that the matrix (1/k)Z(1/k^{\star})Z^{\star} satisfies conditions (5) to (9). First, from Proposition 2.4, the matrix ZZ^{\star} is PSD, which is (5). Second, all the edges of GG are within cliques. The matrix ZZ^{\star}, when restricted to each clique 𝒞l\mathcal{C}^{\star}_{l}, is the identity matrix, and thus satisfies the support condition (6). It is easy to verify that all the rows of ZZ^{\star} belong to 𝒥\mathcal{J}. Hence, by Proposition 2.3, Zspan(X)Z^{\star}\in{\rm span}(X^{\star})^{\perp}, which is (7). It is easy to see that tr(Z)=k\mathrm{tr}(Z^{\star})=k^{\star} and J,Z=(k)2\langle J,Z^{\star}\rangle=(k^{\star})^{2}, which shows (8). Last, by Proposition 2.4, we have range(Z)=𝒥\mathrm{range}(Z^{\star})=\mathcal{J}. Hence, by Proposition 2.3, we have rank(X)+rank(Z)=|𝒱|{\rm rank}(X^{\star})+{\rm rank}(Z^{\star})=|\mathcal{V}|, which is (9). ∎

4 Exact recovery for graphs with the c-SCCc\text{-}{\rm SCC} proeprty

In this section, let GG be a graph that satisfies the c-SCCc\text{-}{\rm SCC} property. Our goal is to extend the results in Section 3, and show that (A,k)(A^{\star},k^{\star}) is also the unique solution to (P) for graphs satisfying the c-SCCc\text{-}{\rm SCC} condition, provided cc is appropriately small. Following our discussion in Section 3, the matrix XX^{\star} remains an extreme point of the feasible region of (P) if c<1c<1. Hence, by Theorem 1.1, it remains to produce a suitable dual witness ZZ satisfying the requirements listed in Theorem 1.1.

4.1 Incoherence-type result

The certificate we will use for graphs with the c-SCCc\text{-}{\rm SCC} property is Z=P𝒦~~(Z)Z^{\prime}=P_{\tilde{\mathcal{K}}\cap\tilde{\mathcal{L}}}(Z^{\star}), defined as the projection of ZZ^{\star} onto 𝒦~~{\tilde{\mathcal{K}}\cap\tilde{\mathcal{L}}} with respect to the Frobenius norm, i.e.,

Z:=argmin𝑋XZF2s.t.X𝒦~~.Z^{\prime}~{}~{}:=~{}~{}\underset{X}{\arg\min}~{}\|X-Z^{\star}\|_{F}^{2}\quad\mathrm{s.t.}\quad X\in\tilde{\mathcal{K}}\cap\tilde{\mathcal{L}}.

Using the first-order optimality conditions we have that

ZZ=L~+K~,Z^{\star}-Z^{\prime}=\tilde{L}+\tilde{K}, (15)

where L~\tilde{L} lies in the normal cone of \mathcal{L} at ZZ^{\prime} and K~\tilde{K} lies in the normal cone of 𝒦\mathcal{K} at ZZ^{\prime}. Recalling that the normal cone of a subspace is its orthogonal complement we have that L~~\tilde{L}\in\tilde{\mathcal{L}}^{\perp} and K~𝒦~\tilde{K}\in\tilde{\mathcal{K}}^{\perp}.

Our main result can be viewed as the extension of orthogonality for graphs that are disjoint unions of cliques to graphs with the c-SCCc\text{-}{\rm SCC} property. The proof is broken up into a sequence of results that follow.

Theorem 4.1.

Let GG be a graph satisfying the c-SCCc\text{-}{\rm SCC} property. Then,

  • (i)(i)

    For any K~𝒦~\tilde{K}\in\tilde{\mathcal{K}}^{\perp} and L~~\tilde{L}\in\tilde{\mathcal{L}}^{\perp} we have

    |K~,L~|2cK~FL~F.|\langle\tilde{K},\tilde{L}\rangle|\leq 2\sqrt{c}\|\tilde{K}\|_{F}\|\tilde{L}\|_{F}.
  • (ii)(ii)

    For any L~~\tilde{L}\in\tilde{\mathcal{L}}^{\perp} we have

    P𝒦(L~)F(2c)1/2L~F.\|P_{\mathcal{K}^{\perp}}(\tilde{L})\|_{F}\leq(2\sqrt{c})^{1/2}\|\tilde{L}\|_{F}.

A direct sum decomposition for ~\tilde{\mathcal{L}}^{\perp}.

The first step is to provide a decomposition of the space ~\tilde{\mathcal{L}}^{\perp} into simpler subspaces, on which it is easier to prove the near orthogonality property. We use these results as basic ingredients to build up to our near orthogonality property later.

Define the matrices {Fx,y,z:1xyk,1z|𝒞x|}\{F_{x,y,z}:1\leq x\neq y\leq k,1\leq z\leq|\mathcal{C}^{\star}_{x}|\} so that (i) the (𝒞x,𝒞x)(\mathcal{C}^{\star}_{x},\mathcal{C}^{\star}_{x})-th block is a diagonal matrix whose zz-th entry is set equal to 2(|𝒞x|1)-2(|\mathcal{C}^{\star}_{x}|-1) and all remaining entries equal to 22, and (ii) the (𝒞x,𝒞y)(\mathcal{C}^{\star}_{x},\mathcal{C}^{\star}_{y})-th block ((𝒞y,𝒞x)(\mathcal{C}^{\star}_{y},\mathcal{C}^{\star}_{x})-th block) is such that entries in the zz-th row (column) are equal to |𝒞x|1|\mathcal{C}^{\star}_{x}|-1 and all other entries equal to 1-1, (iii) all other entries are zero. As an example, in the case where k=2k=2 the matrix F1,2,1F_{1,2,1} is given by:

F1,2,1:=(2(|𝒞1|1)|𝒞1|1|𝒞1|1211211|𝒞1|111|𝒞1|111),F_{1,2,1}:=\left(\begin{array}[]{cccc|ccc}-2(|\mathcal{C}^{\star}_{1}|-1)&&&&|\mathcal{C}^{\star}_{1}|-1&\ldots&|\mathcal{C}^{\star}_{1}|-1\\ &2&&&-1&\ldots&-1\\ &&\ddots&&\vdots&&\vdots\\ &&&2&-1&\ldots&-1\\ \hline\cr|\mathcal{C}^{\star}_{1}|-1&-1&\ldots&-1&&&\\ \vdots&\vdots&&\vdots&&&\\ |\mathcal{C}^{\star}_{1}|-1&-1&\ldots&-1&&&\\ \end{array}\right),

where omitted entries are zero. Second, consider the following subspaces:

Name Description Dimension
𝒯1\mathcal{T}_{1} {(γ1Iθ1,2Eθ1,3Eθ1,nEθ2,1Eγ2Iθ2,3Eθ2,nEθ3,1Eθ3,2Eθn,1Eθn,2EγnI):γi+θi,j=0}\left\{\left(\begin{array}[]{c|c|c|c|c}\gamma_{1}I&\theta_{1,2}E&\theta_{1,3}E&\ldots&\theta_{1,n}E\\ \hline\cr\theta_{2,1}E&\gamma_{2}I&\theta_{2,3}E&\ldots&\theta_{2,n}E\\ \hline\cr\theta_{3,1}E&\theta_{3,2}E&&&\\ \hline\cr\vdots&\vdots&&&\\ \hline\cr\theta_{n,1}E&\theta_{n,2}E&&&\gamma_{n}I\end{array}\right):\sum\gamma_{i}+\sum\theta_{i,j}=0\right\} (k+12)1{k+1\choose 2}-1
𝒯2\mathcal{T}_{2} Span{Fx,y,z:1xyk,1zn}\mathrm{Span}\left\{F_{x,y,z}:1\leq x\neq y\leq k,1\leq z\leq n\right\} (|V|k)×(k2)(|V|-k)\times(k-2)
Proposition 4.2.

We have that

~=𝒯1𝒯2.\tilde{\mathcal{L}}^{\perp}=\mathcal{T}_{1}\oplus\mathcal{T}_{2}.
Proof of Proposition 4.2.

The fact that 𝒯1\mathcal{T}_{1} and 𝒯2\mathcal{T}_{2} are orthogonal is easy to check. Define the matrix Lx,y,z|𝒱|×|𝒱|L_{x,y,z}\in\mathbb{R}^{|\mathcal{V}|\times|\mathcal{V}|}, where 1xyk1\leq x\neq y\leq k, and 1z|𝒞x|1\leq z\leq|\mathcal{C}^{\star}_{x}| such that (i) the zz-th entry of the (𝒞x,𝒞x)(\mathcal{C}^{\star}_{x},\mathcal{C}^{\star}_{x})-th block is equal to 22, and (ii) the entries of the zz-th row (column) of the (𝒞x,𝒞y)(\mathcal{C}^{\star}_{x},\mathcal{C}^{\star}_{y})-th block ((𝒞y,𝒞x)(\mathcal{C}^{\star}_{y},\mathcal{C}^{\star}_{x})-th block) are all equal to 1-1, and (iii) all other entries are 0.

As an example, in the case of two cliques (so k=2k=2) the matrix L1,2,1L_{1,2,1} is given by:

L1,2,1:=(21111),L_{1,2,1}:=\left(\begin{array}[]{cc|ccc}2&&-1&\ldots&-1\\ &&&&\\ \hline\cr-1&&&&\\ \vdots&&&&\\ -1&&&&\\ \end{array}\right),

where all omitted entries are zero.

First, observe that the matrices {Lx,y,z}\{L_{x,y,z}\} specify all the linear equalities in the subspace ~\tilde{\mathcal{L}}, and thus, ~\tilde{\mathcal{L}}^{\perp} is precisely the span of {Lx,y,z}\{L_{x,y,z}\}. As such, to show that 𝒯1\mathcal{T}_{1} and 𝒯2\mathcal{T}_{2} span ~\tilde{\mathcal{L}}^{\perp}, it suffices to show that every matrix Lx,y,zL_{x,y,z} is expressible as the sum of matrices belonging to each of these subspaces. As a concrete example, we show this is true for L1,2,1L_{1,2,1} – the construction for other matrices are similar. Indeed, one checks that L1,2,1L_{1,2,1} is the linear sum of these matrices

L1,2,1=1|𝒞1|(2IE0E00)𝒯11|𝒞1|F1,2,1𝒯2.L_{1,2,1}=\underbrace{\frac{1}{|\mathcal{C}^{\star}_{1}|}\left(\begin{array}[]{c|c|c|c}2I&-E&0&\ldots\\ \hline\cr-E&0&\ldots&\\ \hline\cr 0&\vdots&\ddots&\\ \hline\cr\vdots&&&\end{array}\right)}_{\mathcal{T}_{1}}-\frac{1}{|\mathcal{C}^{\star}_{1}|}\underbrace{F_{1,2,1}}_{\mathcal{T}_{2}}.

This completes the proof. ∎

In view of Proposition 4.2, to bound the inner product between vectors in 𝒦~\tilde{\mathcal{K}}^{\perp} and ~\tilde{\mathcal{L}}^{\perp} respectively, it suffices to bound inner product between (i)𝒦~(i)\ \tilde{\mathcal{K}}^{\perp} and 𝒯1\mathcal{T}_{1} and (ii)𝒦~(ii)\ \tilde{\mathcal{K}}^{\perp} and 𝒯2\mathcal{T}_{2}.

Next, we describe a result that shows how incoherence computation for (a small number of) orthogonal subspaces can be put together to obtain incoherence computations.

Lemma 4.3.

Let {𝒯i}i=1r\{\mathcal{T}_{i}\}_{i=1}^{r} be orthogonal subspaces of d\mathbb{R}^{d}. Consider sds\in\mathbb{R}^{d} such that

|s,ti|ϵs2ti2 for all ti𝒯i.|\langle s,t_{i}\rangle|\leq\epsilon\|s\|_{2}\|t_{i}\|_{2}\quad\text{ for all }t_{i}\in\mathcal{T}_{i}.

Then, for ti𝒯it\in\oplus_{i}\mathcal{T}_{i} we have that

|s,t|ϵrs2t2.|\langle s,t\rangle|\leq\epsilon\sqrt{r}\|s\|_{2}\|t\|_{2}.
Proof of Lemma 4.3.

For any t=ti𝒯it=\sum t_{i}\in\oplus\mathcal{T}_{i} we have

|s,t|=|s,i=1rti|i=1r|s,ti|ϵi=1rs2ti2ϵrs2(i=1rti22)1/2=ϵrs2t2,|\langle s,t\rangle|=|\langle s,\sum_{i=1}^{r}t_{i}\rangle|\leq\sum_{i=1}^{r}|\langle s,t_{i}\rangle|\leq\epsilon\sum_{i=1}^{r}\|s\|_{2}\|t_{i}\|_{2}\leq\epsilon\sqrt{r}\|s\|_{2}(\sum_{i=1}^{r}\|t_{i}\|_{2}^{2})^{1/2}=\epsilon\sqrt{r}\|s\|_{2}\|t\|_{2},

where the second last inequality follows from Cauchy-Schwarz, while the last equality uses the fact that 𝒯=𝒯i\mathcal{T}=\oplus\mathcal{T}_{i}. ∎

Bounding inner product between 𝒦~\tilde{\mathcal{K}}^{\perp} and 𝒯2\mathcal{T}_{2}.

Define the column vectors {𝐟i}i=1n\{\mathbf{f}_{i}\}_{i=1}^{n} by

𝐟i=n𝐞i𝐞=(,1i1,n1i,1i+1,)Tn.\mathbf{f}_{i}=n\mathbf{e}_{i}-\mathbf{e}=(\ldots,\underbrace{-1}_{i-1},\underbrace{n-1}_{i},\underbrace{-1}_{i+1},\ldots)^{T}\in\mathbb{R}^{n}.

Note that

𝐟i𝐞T=(fifi) and 𝐞𝐟iT=(fiTfiT).\mathbf{f}_{i}\mathbf{e}^{T}=\begin{pmatrix}f_{i}&\ldots&f_{i}\end{pmatrix}\text{ and }\mathbf{e}\mathbf{f}_{i}^{T}=\begin{pmatrix}f_{i}^{T}\\ \vdots\\ f_{i}^{T}\end{pmatrix}.

With a slight abuse of notation, we define the subspace of matrices in n1×n2\mathbb{R}^{n_{1}\times n_{2}}

𝒩C=span(𝐟1𝐞T,,𝐟n𝐞T) and 𝒩R=span(𝐞𝐟1T,,𝐞𝐟nT).\mathcal{N}_{C}={\rm span}(\mathbf{f}_{1}\mathbf{e}^{T},\ldots,\mathbf{f}_{n}\mathbf{e}^{T})\qquad\text{ and }\qquad\mathcal{N}_{R}={\rm span}(\mathbf{e}\mathbf{f}_{1}^{T},\ldots,\mathbf{e}\mathbf{f}_{n}^{T}).

The abuse of notation arises the vectors 𝐟i\mathbf{f}_{i} in the definitions of 𝒩C\mathcal{N}_{C} and 𝒩R\mathcal{N}_{R} are different. Note that because 𝐟iT𝐞=0\mathbf{f}_{i}^{T}\mathbf{e}=0, the subspaces 𝒩C\mathcal{N}_{C} and 𝒩R\mathcal{N}_{R} have dimensions n11n_{1}-1 and n21n_{2}-1 respectively and are orthogonal. 𝒩C\mathcal{N}_{C} and 𝒩R\mathcal{N}_{R} are relevant for our problem as the block off-diagonal entries of any matrix in 𝒯2\mathcal{T}_{2} belong to 𝒩C𝒩R.\mathcal{N}_{C}\oplus\mathcal{N}_{R}.

Lemma 4.4.

Let Kn1×n2K\in\mathbb{R}^{n_{1}\times n_{2}} such that each row has at most cn2cn_{2} non-zero entries for some 0c10\leq c\leq 1. Then, for any L𝒩CL\in\mathcal{N}_{C} we have that

|K,L|cLFKF.|\langle K,L\rangle|\leq\sqrt{c}\|L\|_{F}\|K\|_{F}.

The same conclusion holds if each column of KK has at most cn1cn_{1} non-zero entries and L𝒩RL\in\mathcal{N}_{R}.

Proof of Lemma 4.4.

Since L𝒩CL\in\mathcal{N}_{C}, it has constant columns. Suppose its column entries are (θ1,,θn1)(\theta_{1},\ldots,\theta_{n_{1}}), i.e., Lx,y=θxL_{x,y}=\theta_{x} for all yy. We then have

K,L=tr(KLT)=x(yKx,yLx,y)=xθxyKx,yxθxcn2(yKx,y2)1/2,\langle K,L\rangle=\mathrm{tr}(KL^{T})=\sum_{x}(\sum_{y}K_{x,y}L_{x,y})=\sum_{x}\theta_{x}\sum_{y}K_{x,y}{\leq}\sum_{x}\theta_{x}\sqrt{cn_{2}}(\sum_{y}K_{x,y}^{2})^{1/2},

where the last inequality follows from Cauchy-Schwarz, and since there are at most cn2cn_{2} non-zero entries in each column of KK. Finally, we have

cn2xθx(yKx,y2)1/2cn2(xθx2)1/2(x,yKx,y2)1/2=cLFKF,\sqrt{cn_{2}}\sum_{x}\theta_{x}(\sum_{y}K_{x,y}^{2})^{1/2}\leq\sqrt{cn_{2}}(\sum_{x}\theta_{x}^{2})^{1/2}(\sum_{x,y}K_{x,y}^{2})^{1/2}=\sqrt{c}\|L\|_{F}\|K\|_{F},

where in the last equality we use that LF=n2(θi2)1/2\|L\|_{F}=\sqrt{n_{2}}(\sum\theta_{i}^{2})^{1/2}. ∎

Corollary 4.5.

Consider Kn1×n2K\in\mathbb{R}^{n_{1}\times n_{2}} where each column has at most cn1cn_{1} non-zero entries, and each row has at most cn2cn_{2} non-zero entries for some 0c10\leq c\leq 1. For any L𝒩C𝒩RL\in\mathcal{N}_{C}\oplus\mathcal{N}_{R} we have:

|K,L|2cLFKF.|\langle K,L\rangle|\leq\sqrt{2c}\|L\|_{F}\|K\|_{F}.
Proof of Corollary 4.5.

By Lemma 4.4 for L𝒩C,L\in\mathcal{N}_{C}, or L𝒩RL\in\mathcal{N}_{R} we have that

|K,L|cLFKF.|\langle K,L\rangle|\leq\sqrt{c}\|L\|_{F}\|K\|_{F}.

As 𝐞T𝐟j=0\mathbf{e}^{T}\mathbf{f}_{j}=0, 𝒩C\mathcal{N}_{C} and 𝒩R\mathcal{N}_{R} are orthogonal subspaces. The result follows by Lemma 4.3. ∎

Lemma 4.6.

Suppose GG is a graph satisfying the c-SCCc\text{-}{\rm SCC} property. Then, for any K𝒦~K\in\tilde{\mathcal{K}}^{\perp} and L𝒯2L\in\mathcal{T}_{2} we have that

|K,L|2cKFLF.|\langle K,L\rangle|\leq\sqrt{2c}\|K\|_{F}\|L\|_{F}.
Proof.

As K𝒦~K\in\tilde{\mathcal{K}}^{\perp}, its diagonal blocks are zero, and also, entries corresponding to non-edges of GG are zero. Moreover, as GG has the cc-SCC property, each row (and column) of KK has at most cncn non-zero entries. Let the block matrices be indexed by (x,y)(x,y), and let LxyL_{xy} and KxyK_{xy} denote the xyxy-th block. Recall that the block off-diagonal entries of any matrix in 𝒯2\mathcal{T}_{2} belong to 𝒩C𝒩R.\mathcal{N}_{C}\oplus\mathcal{N}_{R}. By Corollary 4.5 we have |Kxy,Lxy|2cLxyFKxyF|\langle K_{xy},L_{xy}\rangle|\leq\sqrt{2c}\|L_{xy}\|_{F}\|K_{xy}\|_{F}. By summing over the blocks and by applying Cauchy-Schwarz, we have

|K,L|\displaystyle|\langle K,L\rangle| x,y|Kxy,Lxy|=xy|Kxy,Lxy|\displaystyle\leq\sum_{x,y}|\langle K_{xy},L_{xy}\rangle|=\sum_{x\neq y}|\langle K_{xy},L_{xy}\rangle|
2cxyKxyFLxyF2c(x,yKxyF2)1/2(x,yLxyF2)1/2=2cKFLF.\displaystyle\leq\sqrt{2c}\sum_{x\neq y}\|K_{xy}\|_{F}\|L_{xy}\|_{F}\leq\sqrt{2c}(\sum_{x,y}\|K_{xy}\|_{F}^{2})^{1/2}(\sum_{x,y}\|L_{xy}\|_{F}^{2})^{1/2}=\sqrt{2c}\|K\|_{F}\|L\|_{F}.

Bounding the inner product between 𝒦~\tilde{\mathcal{K}}^{\perp} and 𝒯1\mathcal{T}_{1}.

This case is easier and the required result is given in the next lemma.

Lemma 4.7.

Suppose GG is a graph satisfying the c-SCCc\text{-}{\rm SCC} property. Then, for any K𝒦~K\in\tilde{\mathcal{K}}^{\perp} and L𝒯1L\in\mathcal{T}_{1} we have that

|K,L|cKFLF.|\langle K,L\rangle|\leq\sqrt{c}\|K\|_{F}\|L\|_{F}.
Proof of Lemma 4.7.

Note that KK has no entries in each block diagonal. Consider the (i,j)(i,j)-th block matrix where iji\neq j. We denote the coordinates in this block by i,j\mathcal{B}_{i,j}. Then

x,yi,jKx,yLx,y=θi,j(x,yi,jKx,y)c|𝒞i||𝒞j|θi,j(x,yi,jKx,y2)1/2.\sum_{x,y\in\mathcal{B}_{i,j}}K_{x,y}L_{x,y}=\theta_{i,j}\left(\sum_{x,y\in\mathcal{B}_{i,j}}K_{x,y}\right)\leq\sqrt{c|\mathcal{C}^{\star}_{i}||\mathcal{C}^{\star}_{j}|}\theta_{i,j}\left(\sum_{x,y\in\mathcal{B}_{i,j}}K_{x,y}^{2}\right)^{1/2}.

The last inequality follows from Cauchy-Schwarz, and by noting that KK has at most c|𝒞i||𝒞j|c|\mathcal{C}^{\star}_{i}||\mathcal{C}^{\star}_{j}| non-zero entries within the block i,j\mathcal{B}_{i,j}. Then by summing over the blocks (i,j)(i,j) we have

|K,L|i,jc|𝒞i||𝒞j|θi,j(x,yi,jKx,y2)1/2c(i,j|𝒞i||𝒞j|θi,j2)1/2(x,yKx,y2)1/2.|\langle K,L\rangle|\leq\sum_{i,j}\sqrt{c|\mathcal{C}^{\star}_{i}||\mathcal{C}^{\star}_{j}|}\theta_{i,j}(\sum_{x,y\in\mathcal{B}_{i,j}}K_{x,y}^{2})^{1/2}\leq\sqrt{c}(\sum_{i,j}|\mathcal{C}^{\star}_{i}||\mathcal{C}^{\star}_{j}|\theta_{i,j}^{2})^{1/2}(\sum_{x,y}K_{x,y}^{2})^{1/2}.

The last inequality follows from Cauchy-Schwarz. Now note that (x,yKx,y2)1/2=KF(\sum_{x,y}K_{x,y}^{2})^{1/2}=\|K\|_{F}, and that (i,j|𝒞i||𝒞j|θi,j2)1/2=LF(\sum_{i,j}|\mathcal{C}^{\star}_{i}||\mathcal{C}^{\star}_{j}|\theta_{i,j}^{2})^{1/2}=\|L\|_{F}, from which the result follows. ∎

Finally, we are ready to prove Theorem 4.1.

Proof of Theorem 4.1.

Consider K~𝒦~\tilde{K}\in\tilde{\mathcal{K}}^{\perp} and L~~\tilde{L}\in\tilde{\mathcal{L}}^{\perp}. By Proposition 4.2 we have that ~=𝒯1𝒯2\tilde{\mathcal{L}}^{\perp}=\mathcal{T}_{1}\oplus\mathcal{T}_{2} and let L~=L~1+L~2\tilde{L}=\tilde{L}_{1}+\tilde{L}_{2}, where L~i𝒯i\tilde{L}_{i}\in\mathcal{T}_{i}.

Part (i)(i). By Lemma 4.7 we have that

|K~,L~1|cK~FL~1F.|\langle\tilde{K},\tilde{L}_{1}\rangle|\leq\sqrt{c}\|\tilde{K}\|_{F}\|\tilde{L}_{1}\|_{F}.

Let L~2,off\tilde{L}_{2,\mathrm{off}} be the block off-diagonal component of L~2\tilde{L}_{2} and L~2,diag\tilde{L}_{2,\mathrm{diag}} be the block diagonal component of L~2\tilde{L}_{2} so that L~2=L~2,off+L~2,diag\tilde{L}_{2}=\tilde{L}_{2,\mathrm{off}}+\tilde{L}_{2,\mathrm{diag}}. By definition, the matrix L~2,off\tilde{L}_{2,\mathrm{off}} is zero on the diagonal blocks and each block belongs to 𝒩C𝒩R\mathcal{N}_{C}\oplus\mathcal{N}_{R}. Hence by Lemma 4.6, we have

|K~,L~2,off|2cK~FL~2,offF.|\langle\tilde{K},\tilde{L}_{2,\mathrm{off}}\rangle|\leq\sqrt{2c}\|\tilde{K}\|_{F}\|\tilde{L}_{2,\mathrm{off}}\|_{F}.

Moreover, as the diagonal blocks of K~\tilde{K} are zero we have that

K~,L~2,diag=0.\langle\tilde{K},\tilde{L}_{2,\mathrm{diag}}\rangle=0.

Putting everything together,

|K~,L~2|=|K,L~2,off|2cK~FL~2,offF2cK~FL~F,|\langle\tilde{K},\tilde{L}_{2}\rangle|=|\langle K,\tilde{L}_{2,\mathrm{off}}\rangle|\leq\sqrt{2c}\|\tilde{K}\|_{F}\|\tilde{L}_{\mathrm{2,off}}\|_{F}\leq\sqrt{2c}\|\tilde{K}\|_{F}\|\tilde{L}\|_{F},

where the last inequality follows by noting that LF2=LoffF2+LdiagF2\|L\|_{F}^{2}=\|L_{\mathrm{off}}\|_{F}^{2}+\|L_{\mathrm{diag}}\|_{F}^{2}.

Finally, by Proposition 4.2, the subspaces 𝒯1\mathcal{T}_{1} and 𝒯2\mathcal{T}_{2} are orthogonal. Hence by Lemma 4.3 applied to the subspaces 𝒯1\mathcal{T}_{1} and 𝒯2\mathcal{T}_{2} we have |K~,L~|2×2cK~FL~F|\langle\tilde{K},\tilde{L}\rangle|\leq\sqrt{2}\times\sqrt{2c}\|\tilde{K}\|_{F}\|\tilde{L}\|_{F}.

Part (ii)(ii). Note that

P𝒦(L~)F2=P𝒦(L~),P𝒦(L~)=P𝒦(P𝒦(L~)),L~=P𝒦(L~),L~,\|P_{\mathcal{K}^{\perp}}(\tilde{L})\|_{F}^{2}=\langle P_{\mathcal{K}^{\perp}}(\tilde{L}),P_{\mathcal{K}^{\perp}}(\tilde{L})\rangle=\langle P_{\mathcal{K}^{\perp}}(P_{\mathcal{K}^{\perp}}(\tilde{L})),\tilde{L}\rangle=\langle P_{\mathcal{K}^{\perp}}(\tilde{L}),\tilde{L}\rangle,

where the second and third equalities follow from the fact that orthogonal projections are self-adjoint and idempotent. Since P𝒦(L~)𝒦P_{\mathcal{K}^{\perp}}(\tilde{L})\in\mathcal{K}^{\perp}, by Proposition 4.1, we have

|P𝒦(L~),L~|2cP𝒦(L~)FL~F2cL~F2,|\langle P_{\mathcal{K}^{\perp}}(\tilde{L}),\tilde{L}\rangle|\leq 2\sqrt{c}\|P_{\mathcal{K}^{\perp}}(\tilde{L})\|_{F}\|\tilde{L}\|_{F}\leq 2\sqrt{c}\|\tilde{L}\|_{F}^{2},

from which the result follows.∎

4.2 Proof of Result 2

Theorem 4.8.

Suppose GG satisfies the c-SCCc\text{-}{\rm SCC} property for some c<min{14(minl1/|𝒞l|l1/|𝒞l|)2,1100}c<\min\{\frac{1}{4}(\frac{\min_{l}1/|\mathcal{C}^{\star}_{l}|}{\sum_{l}1/|\mathcal{C}^{\star}_{l}|})^{2},\frac{1}{100}\}. Then AA^{\star} is the unique optimal solution to (P).

Following Proposition 2.4, the largest singular value of ZZ^{\star} is σmax(Z)=l1/|𝒞l|\sigma_{\max}(Z^{\star})=\sum_{l}1/|\mathcal{C}^{\star}_{l}|, while the smallest non-zero singular value of ZZ^{\star} is σk(Z)=min1/|𝒞l|\sigma_{k^{\star}}(Z^{\star})=\min 1/|\mathcal{C}^{\star}_{l}|. Hence an alternative way to express the lower bound in Theorem 4.8 in terms of a condition number-type of quantity

minl1/|𝒞l|l1/|𝒞l|=σk(Z)σmax(Z).\frac{\min_{l}1/|\mathcal{C}^{\star}_{l}|}{\sum_{l}1/|\mathcal{C}^{\star}_{l}|}=\frac{\sigma_{k^{\star}}(Z^{\star})}{\sigma_{\max}(Z^{\star})}.
Proof.

As a reminder, our candidate dual certificate is Z=P𝒦~~(Z)Z^{\prime}=P_{\tilde{\mathcal{K}}\cap\tilde{\mathcal{L}}}(Z^{\star}) as defined earlier in (11). By Theorem 2.7, AA^{\star} is an extreme point of the feasible region. Thus, by Theorem 1.1, we need to show that ZZ^{\prime} satisfies conditions (5)–(9). Conditions (6) and (7) are satisfied by construction. As we noted in the proof of Theorem 3.1, conditions (8) and (LABEL:eq:objvalks) are taken care of by scaling. As such, it remains to show that ZZ^{\prime} is PSD and that range(Z)=ker(X){\rm range}(Z^{\prime})={\rm ker}(X^{\star}).

Simplification:

We begin by showing that it suffices to prove the inequality σmax(ZZ)<σk(Z)\sigma_{\max}(Z^{\prime}-Z^{\star})<\sigma_{k^{\star}}(Z^{\star}) (as a reminder, σk(Z)\sigma_{k^{\star}}(Z^{\star}) is the smallest non-zero singular value of ZZ^{\star}). To see why this is sufficient, let 𝐯|𝒱|\mathbf{v}\in\mathbb{R}^{|\mathcal{V}|} and consider its direct sum decomposition with respect to 𝒥\mathcal{J}; i.e., 𝐯=𝐯𝒥+𝐯𝒥\mathbf{v}=\mathbf{v}_{\mathcal{J}}+\mathbf{v}_{\mathcal{J}^{\perp}}. Since Z~Z^{\prime}\in\tilde{\mathcal{L}}, we have by Lemma 2.3, part (4) that

𝐯TZ𝐯=𝐯𝒥TZ𝐯𝒥=𝐯𝒥TZ𝐯𝒥+𝐯𝒥T(ZZ)𝐯𝒥.\mathbf{v}^{T}Z^{\prime}\mathbf{v}=\mathbf{v}^{T}_{\mathcal{J}}Z^{\prime}\mathbf{v}_{\mathcal{J}}=\mathbf{v}^{T}_{\mathcal{J}}Z^{\star}\mathbf{v}_{\mathcal{J}}+\mathbf{v}^{T}_{\mathcal{J}}(Z^{\prime}-Z^{\star})\mathbf{v}_{\mathcal{J}}.

By Lemma 2.4, ZZ^{\star} restricted on 𝒥\mathcal{J} is positive definite with smallest eigenvalue at least σk(Z)\sigma_{k^{\star}}(Z^{\star}), and hence

𝐯𝒥TZ𝐯𝒥σk(Z)𝐯𝒥22.\mathbf{v}^{T}_{\mathcal{J}}Z^{\star}\mathbf{v}_{\mathcal{J}}\geq\sigma_{k^{\star}}(Z^{\star})\|\mathbf{v}_{\mathcal{J}}\|_{2}^{2}.

On the other hand, the inequality σmax(ZZ)<σk(Z)\sigma_{\max}(Z^{\prime}-Z^{\star})<\sigma_{k^{\star}}(Z^{\star}) implies

𝐯𝒥T(ZZ)𝐯𝒥<σk(Z)𝐯𝒥22.\mathbf{v}^{T}_{\mathcal{J}}(Z^{\prime}-Z^{\star})\mathbf{v}_{\mathcal{J}}<\sigma_{k^{\star}}(Z^{\star})\|\mathbf{v}_{\mathcal{J}}\|_{2}^{2}.

This means that 𝐯TZ𝐯0\mathbf{v}^{T}Z^{\prime}\mathbf{v}\geq 0 for all 𝐯|𝒱|\mathbf{v}\in\mathbb{R}^{|\mathcal{V}|}, which means that ZZ^{\prime} is PSD.

Finally, we need to show that range(Z)=ker(X){\rm range}(Z^{\prime})={\rm ker}(X^{\star}). By definition, we have that Z~Z^{\prime}\in\tilde{\mathcal{L}}, so by Proposition 2.3, we have that

range(Z)𝒥=ker(X).{\rm range}(Z^{\prime})\subseteq\mathcal{J}={\rm ker}(X^{\star}).

For the converse inequality, we show that ker(Z)𝒥{\rm ker}(Z^{\prime})\subseteq\mathcal{J}^{\perp}. For this, take 𝐯ker(Z)\mathbf{v}\in{\rm ker}(Z^{\prime}). Let 𝐯=𝐯𝒥+𝐯𝒥\mathbf{v}=\mathbf{v}_{\mathcal{J}}+\mathbf{v}_{\mathcal{J}^{\perp}} and assume that 𝐯𝒥0\mathbf{v}_{\mathcal{J}}\neq 0. Then, we would have

0=𝐯TZ𝐯=𝐯𝒥TZ𝐯𝒥>0,0=\mathbf{v}^{T}Z^{\prime}\mathbf{v}=\mathbf{v}^{T}_{\mathcal{J}}Z^{\prime}\mathbf{v}_{\mathcal{J}}>0,

leading to a contradiction.

Expressing ZZZ^{\star}-Z^{\prime} via the KKT conditions.

The first-order optimality condition corresponding to (11) gives

ZZ=L~+K~,Z^{\star}-Z^{\prime}=\tilde{L}+\tilde{K}, (16)

where L~\tilde{L} lies in the normal cone of ~\tilde{\mathcal{L}} at ZZ^{\prime} and K~\tilde{K} lies in the normal cone of 𝒦\mathcal{K} at ZZ^{\prime}. Here we used the fact that the normal cone to an intersection is the sum of the normal cones, e.g. see [rockafellar, Corollary 23.8.1]. Moreover, as the normal cone to a subspace is the orthogonal subspace, we have that L~~\tilde{L}\in{\tilde{\mathcal{L}}}^{\perp} and K~𝒦~\tilde{K}\in\tilde{\mathcal{K}}^{\perp}. First, by projecting (16) onto ~\tilde{\mathcal{L}} we have

ZZ=P~(K~).Z^{\star}-Z^{\prime}=P_{\tilde{\mathcal{L}}}(\tilde{K}). (17)

This follows as Z~Z^{\star}\in\tilde{\mathcal{L}}, Z𝒦~~Z^{\prime}\in\tilde{\mathcal{K}}\cap\tilde{\mathcal{L}}. Second, by projecting (16) onto the subspace 𝒦~\tilde{\mathcal{K}}^{\perp} we have

P𝒦~(Z)=P𝒦~(L~)+K~.P_{\tilde{\mathcal{K}}^{\perp}}(Z^{\star})=P_{\tilde{\mathcal{K}}^{\perp}}(\tilde{L})+\tilde{K}. (18)

Combining (17) and (18), we have

ZZF=P~(K~)FK~F=P𝒦~(Z)P𝒦~(L~)FP𝒦~(Z)F+P𝒦~(L~)F,\|Z^{\star}-Z^{\prime}\|_{F}=\|P_{\tilde{\mathcal{L}}}(\tilde{K})\|_{F}\leq\|\tilde{K}\|_{F}=\|P_{\tilde{\mathcal{K}}^{\perp}}(Z^{\star})-P_{\tilde{\mathcal{K}}^{\perp}}(\tilde{L})\|_{F}\leq\|P_{\tilde{\mathcal{K}}^{\perp}}(Z^{\star})\|_{F}+\|P_{\tilde{\mathcal{K}}^{\perp}}(\tilde{L})\|_{F}, (19)

where the first equality follows from (17), and the last inequality follows from the triangle inequality.

We next proceed to bound the terms P𝒦~(Z)F\|P_{\tilde{\mathcal{K}}^{\perp}}(Z^{\star})\|_{F} and P𝒦~(L~)F\|P_{\tilde{\mathcal{K}}^{\perp}}(\tilde{L})\|_{F}.

Bound the term P𝒦~(Z)F\|P_{\tilde{\mathcal{K}}^{\perp}}(Z^{\star})\|_{F}.

We show that

P𝒦~(Z)Fc(l1/|𝒞l|).\|P_{\tilde{\mathcal{K}}^{\perp}}(Z^{*})\|_{F}\leq\sqrt{c}\left(\sum_{l}1/|\mathcal{C}^{\star}_{l}|\right). (20)

The (i,j)(i,j)-th entry of the matrix P𝒦~(Z)P_{\tilde{\mathcal{K}}^{\perp}}(Z) is equal to Zi,jZ_{i,j} if (i,j)(i,j)\in\mathcal{E}, and is equal to zero otherwise. Consider the block corresponding to (𝒞x,𝒞y)(\mathcal{C}^{\star}_{x},\mathcal{C}^{\star}_{y}), where xyx\neq y. Each non-zero entry is 1/(|𝒞x||𝒞y|)1/(|\mathcal{C}^{\star}_{x}||\mathcal{C}^{\star}_{y}|) and there are at most c|𝒞x||𝒞y|c|\mathcal{C}^{\star}_{x}||\mathcal{C}^{\star}_{y}| entries. Hence the sum of squares of the entries in this block is at most c/(|𝒞x||𝒞y|)c/(|\mathcal{C}^{\star}_{x}||\mathcal{C}^{\star}_{y}|). We sum over the all indices xx and yy to obtain P𝒦~(Z)F2x,yc/(|𝒞x||𝒞y|)c(l1/|𝒞l|)2\|P_{\tilde{\mathcal{K}}^{\perp}}(Z)\|_{F}^{2}\leq\sum_{x,y}c/(|\mathcal{C}^{\star}_{x}||\mathcal{C}^{\star}_{y}|)\leq c(\sum_{l}1/|\mathcal{C}^{\star}_{l}|)^{2}, from which the result follows. (In the first inequality, recall that the block-diagonal entries of P𝒦~(Z)P_{\tilde{\mathcal{K}}^{\perp}}(Z) are zero and thus do not contribute to the sum.)

Bound the term P𝒦~(L~)F\|P_{\tilde{\mathcal{K}}^{\perp}}(\tilde{L})\|_{F}.

Setting

ϵ=L~/L~F,K~/K~F,\epsilon=\langle\tilde{L}/\|\tilde{L}\|_{F},\tilde{K}/\|\tilde{K}\|_{F}\rangle,

it follows from (16) that

ZZF2=L~F2+K~F2+2ϵL~FK~F.\|Z^{\star}-Z^{\prime}\|_{F}^{2}=\|\tilde{L}\|_{F}^{2}+\|\tilde{K}\|_{F}^{2}+2\epsilon\|\tilde{L}\|_{F}\|\tilde{K}\|_{F}.

By the AM-GM inequality we have that

2L~FK~FL~F2K~F2.2\|\tilde{L}\|_{F}\|\tilde{K}\|_{F}\geq-\|\tilde{L}\|_{F}^{2}-\|\tilde{K}\|_{F}^{2}.

Consequently, we get

ZZF2(1|ϵ|)(L~F2+K~F2)\|Z^{\star}-Z^{\prime}\|_{F}^{2}\geq(1-|\epsilon|)(\|\tilde{L}\|_{F}^{2}+\|\tilde{K}\|_{F}^{2})

and since |ϵ|1|\epsilon|\leq 1 we have

L~F2L~F2+K~F2(11|ϵ|)ZZF2.\|\tilde{L}\|_{F}^{2}\leq\|\tilde{L}\|_{F}^{2}+\|\tilde{K}\|_{F}^{2}\leq\left(\frac{1}{1-|\epsilon|}\right)\|Z^{\star}-Z^{\prime}\|_{F}^{2}. (21)

Combining (21) with Theorem 4.1 (ii)(ii) we have

P𝒦~(L~)F(2c)1/2L~F(2c)1/21|ϵ|ZZF.\|P_{\tilde{\mathcal{K}}^{\perp}}(\tilde{L})\|_{F}\leq(2\sqrt{c})^{1/2}\|\tilde{L}\|_{F}\leq{(2\sqrt{c})^{1/2}\over\sqrt{1-|\epsilon|}}\|Z^{\star}-Z^{\prime}\|_{F}. (22)

Concluding the proof.

We have established in (19) that

ZZFP𝒦~(Z)F+P𝒦~(L~)F,\|Z^{\star}-Z^{\prime}\|_{F}\leq\|P_{\tilde{\mathcal{K}}^{\perp}}(Z^{\star})\|_{F}+\|P_{\tilde{\mathcal{K}}^{\perp}}(\tilde{L})\|_{F},

which combined with (20) and (22) shows that

ZZFc(l1/|𝒞l|)+(2c)1/21|ϵ|ZZF.\|Z^{\star}-Z^{\prime}\|_{F}\leq\sqrt{c}(\sum_{l}1/|\mathcal{C}^{\star}_{l}|)+{(2\sqrt{c})^{1/2}\over\sqrt{1-|\epsilon|}}\|Z^{\star}-Z^{\prime}\|_{F}. (23)

By Theorem 4.1 (i)(i) we have that

|ϵ|=|L~/L~F,K~/K~F|2c,|\epsilon|=|\langle\tilde{L}/\|\tilde{L}\|_{F},\tilde{K}/\|\tilde{K}\|_{F}\rangle|\leq 2\sqrt{c},

so for c1/100c\leq 1/100 we get

(2c)1/21|ϵ|12.{(2\sqrt{c})^{1/2}\over\sqrt{1-|\epsilon|}}\leq{1\over 2}.

Then, (23) implies that

ZZF2c(l1/|𝒞l|).\|Z^{\star}-Z^{\prime}\|_{F}\leq 2\sqrt{c}(\sum_{l}1/|\mathcal{C}^{\star}_{l}|).

Finally, using that the Frobenius norm is always greater than the spectral norm, we have

ZZ2ZZF2c(l1/|𝒞l|),\|Z^{\star}-Z^{\prime}\|_{2}\leq\|Z^{\star}-Z^{\prime}\|_{F}\leq 2\sqrt{c}(\sum_{l}1/|\mathcal{C}^{\star}_{l}|),

which is strictly smaller than σk(Z)\sigma_{k^{\star}}(Z^{\star}) whenever c<142(σk(Z)/σmax(Z))2c<\frac{1}{4}^{2}(\sigma_{k^{\star}}(Z^{\star})/\sigma_{\max}(Z^{\star}))^{2}. ∎

5 Recovery for planted clique cover instances

We conclude Result 1 by applying Hoeffding’s inequality onto the conclusions of Result 2 directly.

Theorem 5.1.

Let GG be a random planted clique cover instance defined on the cliques {𝒞l}l=1k\{\mathcal{C}^{\star}_{l}\}_{l=1}^{k^{\star}} where we introduce edges between cliques with probability pp. If p<cp<c then GG is cc-neighborly with probability greater than

1i=1k|𝒞i|jiexp(2|𝒞j|(cp)2).1-\sum_{i=1}^{k^{\star}}{|\mathcal{C}^{\star}_{i}|\sum_{j\neq i}\exp(-2|\mathcal{C}^{\star}_{j}|(c-p)^{2}}).
Proof.

Let Xia,jbX_{ia,jb} be a binary variable equal to 1 if vertices a𝒞ia\in\mathcal{C}^{\star}_{i} and b𝒞jb\in\mathcal{C}^{\star}_{j} are connected, and equal to 0 otherwise. First we have 𝔼[b𝒞jXia,jb]=b𝒞j𝔼[Xia,jb]=|𝒞j|p\mathbb{E}[\sum_{b\in\mathcal{C}^{\star}_{j}}{X_{ia,jb}}]=\sum_{b\in\mathcal{C}^{\star}_{j}}{\mathbb{E}[X_{ia,jb}]}=|\mathcal{C}^{\star}_{j}|p. By Hoeffding’s inequality we have

(b𝒞jXia,jbtj+|𝒞j|p)=(b𝒞jXia,jb𝔼[b𝒞jXia,jb]tj)exp(2tj2/|𝒞j|).\mathbb{P}\Big{(}\sum_{b\in\mathcal{C}^{\star}_{j}}{X_{ia,jb}}\geq t_{j}+|\mathcal{C}^{\star}_{j}|p\Big{)}=\mathbb{P}\Big{(}\sum_{b\in\mathcal{C}^{\star}_{j}}{X_{ia,jb}}-\mathbb{E}[\sum_{b\in\mathcal{C}^{\star}_{j}}{X_{ia,jb}}]\geq t_{j}\Big{)}\leq\exp(-2t_{j}^{2}/|\mathcal{C}^{\star}_{j}|). (24)

Using a union bound we get that

(i,a𝒞i,ji(bXia,jb<tj+|𝒞j|p))1i|𝒞i|jiexp(2tj2/|𝒞j|).\mathbb{P}\Big{(}\bigcap_{i,a\in\mathcal{C}^{\star}_{i},j\neq i}\Big{(}\sum_{b}{X_{ia,jb}}<t_{j}+|\mathcal{C}^{\star}_{j}|p\Big{)}\Big{)}\geq 1-\sum_{i}|\mathcal{C}^{\star}_{i}|\sum_{j\neq i}\exp(-2t_{j}^{2}/|\mathcal{C}^{\star}_{j}|).

Finally, given that p<cp<c, we set tj=|𝒞j|(cp)>0t_{j}=|\mathcal{C}^{\star}_{j}|(c-p)>0 to get the desired result. ∎

6 Numerical comparison to alternative techniques

In this section, we assess the effectiveness of the Lovász theta function for recovering planted clique cover instances by comparing it to alternative techniques in the literature. Specifically, we compare our method with the following approaches:

  • Community Detection approach: We frame the clique cover problem as an instance of community detection, and use the method proposed by Oymak and Hassibi [24].

  • kk-Disjoint-Clique approach: We consider the clique cover problem as an instance of the kk-disjoint-clique problem, and apply the method presented by Ames and Vavasis [4].

  • Subgraph Isomorphism approach: We view the problem as an instance of subgraph isomorphism, and apply the framework proposed by Candogan and Chandrasekaran [9].

Community Detection/Graph clustering.

Consider a network where the presence of an edge (or possibly the size of its edge weight) models the strength of association between two nodes. The graph clustering task concerns grouping these nodes into subsets – often called communities – such that the strength of association between pairs of nodes from the same subset is significantly stronger that the strength of association between pairs belonging to different subsets. The task of community detection finds application in various fields such as sociology, physics, epidemiology, and more [14, 23]. The term community detection is often used interchangeably with graph clustering.

Oymak and Hassibi [24] (see also [11]) present a method for detecting communities in an undirected graph 𝒢=(𝒱,)\mathcal{G}=(\mathcal{V},\mathcal{E}) by solving the following SDP:

minS,LλS1+Ls.t.S+L=A,0LJ.\displaystyle\min_{S,L}~{}~{}\lambda\|S\|_{1}+\|L\|_{*}\quad\mathrm{s.t.}\quad S+L=A,\quad 0\leq L\leq J. (25)

Here AA is the adjacency matrix of GG, λ>0\lambda>0 is a tuning parameter, X1=i,j|Xi,j|\|X\|_{1}=\sum_{i,j}|X_{i,j}| is the L1-norm over the matrix entries, while X=iσi(X)\|X\|_{*}=\sum_{i}\sigma_{i}(X) is the matrix nuclear norm.

Let (S^,L^)(\hat{S},\hat{L}) be the optimal solution to this SDP. The intuition behind the SDP (25) is that it performs a matrix deconvolution in which the adjacency matrix AA is separated as the sum of a low-rank component L^\hat{L} and a sparse matrix S^\hat{S}. The hope is that (possibly by tuning the parameter λ\lambda) the low-rank component takes the form L^=l𝟏𝒞l𝟏𝒞lT\hat{L}=\sum_{l}\mathbf{1}_{\mathcal{C}^{\star}_{l}}\mathbf{1}_{\mathcal{C}^{\star}_{l}}^{T}, where 𝒞l\mathcal{C}^{\star}_{l} are the indicator vectors for disjoint communities, while the sparse component S^\hat{S} captures noise that is (hopefully) sparse.

The specific form in (25) builds upon a long sequence of ideas, originating from the observation that the L1-norm is effective at inducing sparse structure, while the matrix nuclear norm is effective at inducing low-rank structure [8, 26]. Subsequent work proposed the formulation (25) for deconvolving matrices as the linear sum of a sparse matrix and a low-rank matrix [10, 32]; in particular, Chandarasekaran et. al. (25) show that penalization with the L1-norm plus the nuclear-norm succeeds at recovering both components so long as these components satisfy a mutual incoherence condition. The formulation in (25) is a reasonable approach for obtaining clique covers because one can view cliques as clusters.

kk-Disjoint-Clique Approach.

In the kk-disjoint-clique problem, the goal is to find kk disjoint cliques (of any size) that cover as many nodes of the graph as possible. It is worth noting that in this particular setting, the number of cliques kk is constant and is determined by the user as a parameter. Ames and Vavasis [4] introduce the following SDP for this problem:

max\displaystyle\max{}{} X,J\displaystyle~{}~{}\langle X,J\rangle (26)
s.t.\displaystyle\mathrm{s.t.}{}{} X𝟏𝟏\displaystyle~{}~{}X\mathbf{1}\leq\mathbf{1}
Xi,j=0(i,j),ij\displaystyle~{}~{}X_{i,j}=0\quad(i,j)\notin\mathcal{E},i\neq j
X,I=k\displaystyle~{}~{}\langle X,I\rangle=k
X0.\displaystyle~{}~{}X\succeq 0.

Subgraph isomorphism.

Candogan and Chandrasekaran [9] investigate the problem of finding any subgraph embedded within a larger graph, known as the subgraph isomorphism problem. A natural approach is to search over matrices obtained by conjugating the adjacency matrix of the subgraph with the permutation group, leading to a quadratic assignment problem instance. The basic idea in [9] is to relax this set by conjugating with the orthogonal group O(|V|)O(|V|) instead. The resulting convex hull is known as the Schur-Horn orbitope [29], and it admits a tractable description via semidefinite programming [12]. In our context, the adjacency matrix of the subgraph (technically not a proper subgraph) is l𝟏𝒞l𝟏𝒞lT\sum_{l}\mathbf{1}_{\mathcal{C}^{\star}_{l}}\mathbf{1}_{\mathcal{C}^{\star}_{l}}^{T}, and the resulting SDP is

max\displaystyle\max{}{} A,X\displaystyle~{}~{}\langle A,X\rangle (27)
s.t.\displaystyle\mathrm{s.t.}{}{} Xi,j=0(i,j),ij\displaystyle~{}~{}X_{i,j}=0\quad(i,j)\notin\mathcal{E},i\neq j
X=nZ1+0Z0\displaystyle~{}~{}X=n\cdot Z_{1}+0\cdot Z_{0}
Z0,I=nkk\displaystyle~{}~{}\langle Z_{0},I\rangle=nk-k
Z1,I=k\displaystyle~{}~{}\langle Z_{1},I\rangle=k
Z0+Z1=I\displaystyle~{}~{}Z_{0}+Z_{1}=I
Z1,Z20.\displaystyle~{}~{}Z_{1},Z_{2}\succeq 0.

6.1 Experimental setup and results

In our experimental setup, we generate clique cover instances from the planted clique cover model with k=10k^{\star}=10 cliques, each of size n=10n=10. For each value of p{0.00,0.05,0.10,,1.00}p\in\{0.00,0.05,0.10,\ldots,1.00\}, we generate 2020 random clique cover instances. We compare the Lovász theta against the three methods described in the previous section. Our result are summarized in Figure 2, where we illustrate the average success rates of each method in recovering the underlying clique cover instances across the entire spectrum of pp values. In the reported experiments we use two notions of recovery for ϑ\vartheta:

  1. 1.

    We declare strong recovery when the solver outputs an optimal solution X^\hat{X} that is equal to XX^{\star} (up to a small tolerance).

  2. 2.

    We declare weak recovery when ϑ(G)=k\vartheta(G)=k^{\star} but the output solution X^\hat{X} is not equal to XX^{\star}.

First, we compare against the SDP (25) for community detection. We sweep over the choices regularization parameters λ{0.0,0.1,,1.0}\lambda\in\{0.0,0.1,\ldots,1.0\}. This sweeping process makes their method the most expensive to implement among all methods listed here. For a given choice of λ\lambda, let (L^,S^)(\hat{L},\hat{S}) denote the optimal solution. We say that the method succeeds if L^=𝟏𝒞l𝟏𝒞lT\hat{L}=\sum\mathbf{1}_{\mathcal{C}^{\star}_{l}}\mathbf{1}_{\mathcal{C}^{\star}_{l}}^{T} for some choice of parameter λ\lambda.

Second, we compare against the SDP (26) for the kk-disjoint clique problem. We say that the method succeeds if the optimal solution satisfies X^=𝟏𝒞l𝟏𝒞lT\hat{X}=\sum\mathbf{1}_{\mathcal{C}^{\star}_{l}}\mathbf{1}_{\mathcal{C}^{\star}_{l}}^{T}. It’s worth noting that when applying this method to uncover clique covers, there is a minor constraint in that we must specify the size of cliques k=10k^{\star}=10, or alternatively, sweep over all feasible clique sizes.

Our third basis of comparison is the SDP (27) proposed by Candogan and Chandrasekaran for finding subgraphs based on the Schur-Horn orbitope of a graph [9]. We say that the method succeeds if the optimal solution satisfies X^=𝟏𝒞l𝟏𝒞lT\hat{X}=\sum\mathbf{1}_{\mathcal{C}^{\star}_{l}}\mathbf{1}_{\mathcal{C}^{\star}_{l}}^{T}. A limitation of using this approach for finding a clique cover is that we implicitly assume that we know what we are searching for in the first place (to be precise, it is the spectrum of the target graph we require as an input). This assumption is somewhat unrealistic, but one should bear in mind that Candogan and Chandrasekaran’s work in [9] was intended to solve a completely different problem in the first place.

Our experiments are summarized in Figure 2. First, when examining the performance of the Lovász theta function ϑ\vartheta, we observe a congruence between our theoretical predictions, which predict strong recovery for small pp values, and the outcomes derived from our numerical experiments. Furthermore, it is noteworthy that this strong recovery pattern persists until approximately p0.4p\approx 0.4.

In the context of our comparisons with the other three methods, the Lovász theta function emerges as the most successful, with only a slight edge over the SDP (26) proposed by Ames and Vavasis [4] for finding disjoint cliques, as well as the Schur-Horn orbitope-based relaxation (27) presented by Candogan and Chandrasekaran. The method that exhibited the weakest performance in our experiments was the matrix deconvolution method by Oymak and Hassibi [24].

Refer to caption
Figure 2: Comparison of Lovász theta function with other methods.

7 Future directions

In this section, we conduct and discuss additional numerical experiments involving the Lovász theta function ϑ(G)\vartheta(G). Our objective is to report interesting behavior of the Lovász theta function in the context of recovering clique covers that are not supported or apparent from our theoretical findings, and suggest future directions to investigate.

7.1 Comparison with ILP

In our first example, we compare how tight the Lovász theta function is in comparison with the clique cover number. To investigate this question, we generate a graph from the planted clique cover model for varying choices of p[0,1]p\in[0,1]. We compute the clique cover number by solving the following Integer Linear Program (ILP)

min\displaystyle\min{}{} cyc\displaystyle~{}~{}\sum_{c}y_{c} (28)
s.t.\displaystyle\mathrm{s.t.}{}{} xi,cyc for all i,c\displaystyle~{}~{}x_{i,c}\leq y_{c}\quad\text{ for all }\quad i,c
xi,c+xj,c1(i,j)\displaystyle~{}~{}x_{i,c}+x_{j,c}\leq 1\quad(i,j)\in\mathcal{E}
cxi,c=1 for all i\displaystyle~{}~{}\sum_{c}x_{i,c}=1\quad\text{ for all }\quad i
xi,c,yc{0,1}.\displaystyle~{}~{}x_{i,c},y_{c}\in\{0,1\}.

Here, the binary variable xi,cx_{i,c} indicates whether vertex ii is contained in clique cc, while the binary variable ycy_{c} indicates whether clique cc is non-empty. The constraint xi,c+xj,c1x_{i,c}+x_{j,c}\leq 1 ensures no adjacent pair of vertices are in the same clique, while the constraint cxi,c=1\sum_{c}x_{i,c}=1 ensures that every vertex belongs to a unique clique.

Our motivation for studying this question is as follows: In our experiments in Section 6, we generally take the clique covering number to be equal to kk^{\star}. However, short of solving computing the clique covering number outright (say, via (28)), there is no simple way of verifying this is indeed the case. The concern increases as pp increases, as we may inadvertently lower the clique covering number when more edges are added. As such, one objective of this experiment is to understand if our assumption that the clique covering number is well approximated by kk^{\star} is sound.

In the left sub-plot of Figure 3 we compare the clique covering number with the Lovász theta function on a planted clique cover instance with 88 cliques, each of size 88, and with varying choices of parameter pp. Based on our results we notice that the deviation between these quantities is at most 2\approx 2, and is greatest at p0.7p\approx 0.7.

In the right sub-plot of Figure 3 we compare the time taken for the ILP solver Gurobi [15] to compute (28) with the SDP solver SDPT3 [30, 31] to compute (P). The time taken for SDP solver is approximately equal across all problem instances, as one would expect. The time taken for the ILP solver is quite small for p0.4p\leq 0.4, but becomes substantially longer for p=0.6p=0.6. In the same plot we also track the number of simplex solves required by the ILP solver Gurobi – the trend of this curve is largely similar to the previous curve. These observations suggest that the most complex regime for computing the clique covering number for the Erdős-Rényi noise model occurs around p0.6p\approx 0.6. There are a number of explanations: First, our theoretical findings suggest that it is generally easier for the Lovász theta function to discover minimal clique covers for small values of pp, and these curves do also suggest that small values of pp correspond to ‘easier’ instances. On the other extreme, at p=1p=1, there is only one clique. For values of p=1ϵp=1-\epsilon, it may be that large cliques are easy to find, and hence ILP solvers are able to certify optimality relatively quickly. It would be interesting to investigate the behavior of these curves for increasing problem sizes, though one obvious difficulty is that the ILP instances may become impractical to compute.

Refer to caption
Refer to caption
Figure 3: Comparison of the clique covering number with Lovász theta for random clique cover model (left). Comparison of time taken for ILP solver to compute clique covering number with time taken for SDP solver to compute Lovász theta function (right). The number of simplex iterations taken by ILP solver is included as a reference.

7.2 Phase Transition

In the second experiment, we investigate the probability that the Lovász theta function correctly recovers the planted cliques as certain parameters are taken to ++\infty. Our objective is to see if a phase transition arises.

To investigate this problem, we consider an experimental set-up where all the cliques are of equal size. We set the number of cliques kk^{\star} to be equal to the size of each clique nn, with the values n=kn=k^{\star} ranging over {5,,15}\{5,\ldots,15\}. For each value of nn and kk^{\star}, we generate 1010 random graphs from the planted clique cover model with pp taking values from {0.00,0.05,0.10,,1.00}\{0.00,0.05,0.10,\ldots,1.00\}. In Figure 4 we plot the success probabilities corresponding to each value of pp. We plot the curve corresponding to strong recovery in black, and the curve corresponding to weak recovery in red.

We make a number of observations. First, we observe that the transition from success to failure becomes more abrupt as we increase the values of nn and kk^{\star}. This observation holds for the both types of recovery conditions. In particular, our results suggest that a phase transition occurs as these values are taken to ++\infty, and we conjecture that this is indeed the case, as is typical with many phenomena in random graphs. Second, and quite interestingly, the gap between both types of recovery shrink as the problem parameters nn and kk^{\star} increase. We conjecture that both curves coincide in the limit.

Refer to caption
Figure 4: Probability of correctly recovering a planted clique instance with increasing clique size and increasing number of cliques. The black curves correspond to strong recovery while the red curves correspond to weak recovery. The thickness of the lines increase with the problem parameters nn and kk^{\star}.

References

  • [1] E. Abbe, A. S. Bandeira, and G. Hall. Exact Recovery in the Stochastic Block Model. IEEE Transactions on Information Theory, 62(1):471–487, 2016.
  • [2] F. Alizadeh, J. A. Haeberly, and M. L. Overton. Complementarity and nondegeneracy in semidefinite programming.
  • [3] B. P. W. Ames and S. A. Vavasis. Nuclear Norm Minimization for the Planted Clique and Biclique Problems. Mathematical Programming, 129:68–89, 2011.
  • [4] B. P. W. Ames and S. A. Vavasis. Convex Optimization for the Planted k-disjoint-clique Problem. Mathematical Programming, 143:299–337, 2014.
  • [5] P. Awasthi, A. S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, and R. Ward. Relax, No Need to Round: Integrality of Clustering Formulations. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, 2015.
  • [6] A. S. Bandeira. A Note on Probably Certifiably Correct Algorithms. Comptes Rendus Mathematique, 354(3), 2016.
  • [7] Ravi B. Boppana. Eigenvalues and graph bisection: An average-case analysis. In 28th Annual Symposium on Foundations of Computer Science, 1987.
  • [8] E. J. Candès and Y. Plan. Tight Oracle Inequalities for Low-Rank Matrix Recovery From a Minimal Number of Noisy Random Measurements. IEEE Transactions on Information Theory, 57(4):2342–2359.
  • [9] U. O. Candogan and V. Chandrasekaran. Finding Planted Subgraphs with Few Eigenvalues using the Schur–Horn Relaxation. SIAM Journal on Optimization, 28:735–759, 2018.
  • [10] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-Sparsity Incoherence for Matrix Decomposition. SIAM Journal on Optimization, 21(2), 2011.
  • [11] Y. Chen, A. Jalali, S. Sanghavi, and H. Xu. Clustering Partially Observed Graphs via Convex Optimization. Journal of Machine Learning, 2014.
  • [12] Y. Ding and H. Wolkowicz. A Low-Dimensional Semidefinite Relaxation for the Quadratic Assignment Problem. Mathematics of Operations Research, 34:769–1024, 2009.
  • [13] Uriel Feige and Robert Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph. Random Structures & Algorithms, 16(2), 2000.
  • [14] Santo Fortunato. Community Detection in Graphs. Physics Reports, 486(3–5):75–174, 2010.
  • [15] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023.
  • [16] Takayuki Iguchi, Dustin G. Mixon, Jesse Peterson, and Soledad Villar. On the tightness of an sdp relaxation of k-means. CoRR, abs/1505.04778, 2015.
  • [17] D. E. Knuth. The Sandwich Theorem.
  • [18] M. Laurent and F. Vallentin. Semidefinite Optimization. online, 2012.
  • [19] M. Laurent and A. Varvitsiotis. Positive Semidefinite Matrix Completion, Universal Rigidity and the Strong Arnold Property. Linear Algebra and its Applications, 452:292–317, 2014.
  • [20] L. Lovász. On the Shannon Capacity of a Graph. IEEE Transactions on Information Theory, 25:1–7, 1979.
  • [21] D. Marx. Graph colouring problems and their applications in scheduling. Periodica Polytechnica, Electrical Engineering, 48:11–16, 2004.
  • [22] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. SIAM Studies in Applied and Numerical Mathematics, 1994.
  • [23] M. E. J. Newman. Modularity and Community Structure in Networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006.
  • [24] S. Oymak and B. Hassibi. Finding Dense Clusters via Low Rank + Sparse Decomposition. CoRR, abs/1104.5186, 2011.
  • [25] M. Ramana and A. J. Goldman. Some Geometric Results in Semidefinite Programming. Journal of Global Optimization, 7:33–50, 1995.
  • [26] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization. SIAM Review, 52(3):471–501, 2010.
  • [27] J. Renegar. A Mathematical View of Interior-Point Methods in Convex Optimization. MOS-SIAM Series on Optimization, 2001.
  • [28] Tim Roughgarden. Beyond the worst-case analysis of algorithms. Cambridge University Press, 2020.
  • [29] R. Sanyal, F. Sottile, and B. Sturmfels. Orbitopes. Mathematika, 57(2):275–314, 2011.
  • [30] K. C. Toh, M. J. Todd, and R. H. Tütüncü. SDPT3 – a MATLAB Software Package for Semidefinite Programming. Optimization Methods and Software, 11:545–581, 1999.
  • [31] R. H. Tütüncü, K. C. Toh, and M. J. Todd. Solving Semidefinite-Quadratic-Linear Programs using SDPT3. Mathematical Programming, 95:189–217, 2003.
  • [32] J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao. Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices by Convex Optimization. In Proceedings of the 22nd International Conference on Neural Information Processing Systems, 2009.