
Memory-Efficient Approximation Algorithms for Max-k-Cut and Correlation Clustering

Nimita Shinde (IITB-Monash Research Academy; Industrial Engineering and Operations Research, IIT Bombay; Electrical and Computer Systems Engineering, Monash University), Vishnu Narayanan (Industrial Engineering and Operations Research, IIT Bombay), James Saunderson (Electrical and Computer Systems Engineering, Monash University)
Abstract

Max-k-Cut and correlation clustering are fundamental graph partitioning problems. For a graph $G=(V,E)$ with $n$ vertices, the methods with the best approximation guarantees for Max-k-Cut and the Max-Agree variant of correlation clustering involve solving SDPs with $\mathcal{O}(n^{2})$ constraints and variables. Large-scale instances of SDPs, thus, present a memory bottleneck. In this paper, we develop simple polynomial-time Gaussian sampling-based algorithms for these two problems that use $\mathcal{O}(n+|E|)$ memory and nearly achieve the best existing approximation guarantees. For dense graphs arriving in a stream, we eliminate the dependence on $|E|$ in the storage complexity at the cost of a slightly worse approximation ratio by combining our approach with sparsification.

1 Introduction

Semidefinite programs (SDPs) arise naturally as a relaxation of a variety of problems such as $k$-means clustering [5], correlation clustering [6] and Max-k-Cut [14]. In each case, the decision variable is an $n\times n$ matrix and there are $d=\Omega(n^{2})$ constraints. While reducing the memory bottleneck for large-scale SDPs has been studied quite extensively in the literature [9, 19, 11, 37], all these methods use memory that scales linearly with the number of constraints and also depends on either the rank of the optimal solution or an approximation parameter. A recent Gaussian sampling-based technique to generate a near-optimal, near-feasible solution to SDPs with smooth objective functions involves replacing the decision variable $X$ with a zero-mean random vector whose covariance is $X$ [28]. This method uses at most $\mathcal{O}(n+d)$ memory, independent of the rank of the optimal solution. However, for SDPs with $d=\Omega(n^{2})$ constraints, these algorithms still use $\Omega(n^{2})$ memory and provide no advantage in storage reduction. In this paper, we show how to adapt the Gaussian sampling-based approach of [28] to generate an approximate solution with provable approximation guarantees to Max-k-Cut, and to the Max-Agree variant of correlation clustering, on a graph $G=(V,E)$ with arbitrary edge weights using only $\mathcal{O}(|V|+|E|)$ memory.

1.1 Max-k-Cut

Max-k-Cut is the problem of partitioning the vertices of a weighted undirected graph $G=(V,E)$ into $k$ distinct parts such that the total weight of the edges across the parts is maximized. If $w_{ij}$ is the edge weight corresponding to edge $(i,j)\in E$, then the cut value of a partition is $\texttt{CUT}=\sum_{i\ \text{and}\ j\ \text{in different partitions}}w_{ij}$. Consider the standard SDP relaxation of Max-k-Cut

\max_{X\succeq 0}\quad\langle C,X\rangle\quad\textup{subject to}\quad\begin{cases}\textup{diag}(X)=\mathbbm{1}\\ X_{ij}\geq-\frac{1}{k-1}\quad i\neq j,\end{cases} \qquad (k-Cut-SDP)

where $C=\frac{k-1}{2k}L_{G}$ is a scaled Laplacian. Frieze and Jerrum [14] developed a randomized rounding scheme that takes an optimal solution $X^{\star}$ of (k-Cut-SDP) and produces a random partitioning satisfying

\mathbb{E}[\texttt{CUT}]=\sum_{ij\in E,\,i<j}w_{ij}\,\textup{Pr}(i\ \textup{and}\ j\ \textup{are in different partitions})\geq\alpha_{k}\langle C,X^{\star}\rangle\geq\alpha_{k}\,\textup{opt}_{k}^{G}, \qquad (1.1)

where $\textup{opt}_{k}^{G}$ is the optimal $k$-cut value and $\alpha_{k}=\min_{-1/(k-1)\leq\rho\leq 1}\frac{kp(\rho)}{(k-1)(1-\rho)}$, where $p(\rho)$ is the probability that $i$ and $j$ are in different partitions given that $X_{ij}=\rho$. The rounding scheme proposed in [14], referred to as the FJ rounding scheme in the rest of the paper, generates $k$ i.i.d. samples $z_{1},\dotsc,z_{k}\sim\mathcal{N}(0,X^{\star})$ and assigns vertex $i$ to part $p$ if $[z_{p}]_{i}\geq[z_{l}]_{i}$ for all $l=1,\dotsc,k$.
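The FJ rounding step itself is simple to implement once the $k$ samples are available; the following is a minimal NumPy sketch (function and variable names are ours, not from [14]):

import numpy as np

def fj_round(samples):
    # samples: a k x n array whose rows are the i.i.d. draws z_1, ..., z_k ~ N(0, X).
    # Vertex i is assigned to the part p whose sample has the largest i-th coordinate.
    return np.argmax(samples, axis=0)   # length-n array of part labels in {0, ..., k-1}

# Example (hypothetical covariance X of size n x n):
# z = np.random.default_rng().multivariate_normal(np.zeros(n), X, size=k)
# labels = fj_round(z)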

1.2 Correlation clustering

In correlation clustering, we are given a set of $|V|$ vertices together with information indicating whether pairs of vertices are similar or dissimilar, modeled by the edges in the sets $E^{+}$ and $E^{-}$ respectively. The Max-Agree variant of correlation clustering seeks to maximize

\mathcal{C}=\sum_{ij\in E^{-}}w_{ij}^{-}\,\mathbbm{1}_{[i,j\textup{ in different clusters}]}+\sum_{ij\in E^{+}}w_{ij}^{+}\,\mathbbm{1}_{[i,j\textup{ in the same cluster}]}.
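For reference, the objective $\mathcal{C}$ of a given clustering can be evaluated directly from the signed edge lists; a small sketch, assuming edges are stored as (i, j, weight) triples (names ours):

def max_agree_value(labels, E_plus, E_minus):
    # labels[i] is the cluster of vertex i; E_plus and E_minus hold (i, j, w) triples.
    agree_plus = sum(w for i, j, w in E_plus if labels[i] == labels[j])
    agree_minus = sum(w for i, j, w in E_minus if labels[i] != labels[j])
    return agree_plus + agree_minus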

Define $G^{+}=(V,E^{+})$ and $G^{-}=(V,E^{-})$. A natural SDP relaxation of Max-Agree [6] is

\max_{X\succeq 0}\quad\langle C,X\rangle\quad\textup{subject to}\quad\begin{cases}\textup{diag}(X)=\mathbbm{1}\\ X_{ij}\geq 0\quad i\neq j,\end{cases} \qquad (MA-SDP)

where $C=L_{G^{-}}+W^{+}$, $L_{G^{-}}$ is the Laplacian of the graph $G^{-}$ and $W^{+}$ is the weighted adjacency matrix of the graph $G^{+}$. Charikar et al. [10] (see also Swamy [31]) propose a rounding scheme that takes an optimal solution $X^{\star}_{G}$ of (MA-SDP) and produces a random clustering $\mathcal{C}$ satisfying

\mathbb{E}[\mathcal{C}]\geq 0.766\,\langle C,X^{\star}_{G}\rangle\geq 0.766\,\textup{opt}^{G}_{CC}, \qquad (1.2)

where $\textup{opt}_{CC}^{G}$ is the optimal clustering value. The rounding scheme proposed in [10], referred to as the CGW rounding scheme in the rest of the paper, generates either $k=2$ or $k=3$ i.i.d. zero-mean Gaussian samples with covariance $X^{\star}_{G}$ and uses them to define $2^{k}$ clusters.
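The full CGW analysis involves a careful choice of rounding functions; purely as an illustration of how $k$ samples can define $2^{k}$ clusters, one can group vertices by the sign pattern of their coordinates. The sketch below is ours and omits the specific thresholds and case analysis used in [10, 31]:

import numpy as np

def sign_pattern_clusters(samples):
    # samples: a k x n array of i.i.d. draws from N(0, X). Each vertex is mapped to one of
    # 2^k clusters according to the signs of its k sample coordinates.
    bits = (samples > 0).astype(int)          # k x n binary matrix
    weights = 2 ** np.arange(bits.shape[0])   # 1, 2, 4, ...
    return bits.T @ weights                   # length-n labels in {0, ..., 2^k - 1}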

1.3 Contributions

We now summarize the key contributions of the paper.

Gaussian sampling for Max-k-Cut. Applying the Gaussian sampling-based Frank-Wolfe method of [28] directly to (k-Cut-SDP) uses $\Omega(n^{2})$ memory. We, however, show how to extend the approach from [28] to Max-k-Cut by proposing an alternate SDP relaxation for the problem, and combining it with the FJ rounding scheme to generate a $k$-cut with nearly the same approximation guarantee as stated in (1.1) (see Proposition 1.1) using $\mathcal{O}(n+|E|)$ memory. A key highlight of our approach is that while the approximation ratio remains close to the state-of-the-art result in (1.1), degrading only by a factor of $1-5\epsilon$ for $\epsilon\in(0,1/5)$, the memory used is independent of $\epsilon$. We summarize our result as follows.

Proposition 1.1.

For $\epsilon\in(0,1/5)$, our $\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)$-time method outlined in Section 3 uses $\mathcal{O}(n+|E|)$ memory and generates a $k$-cut for the graph $G=(V,E)$ whose expected value satisfies $\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-5\epsilon)\textup{opt}_{k}^{G}$, where $\textup{opt}_{k}^{G}$ is the optimal $k$-cut value.

Gaussian sampling for Max-Agree. The structure of (MA-SDP) is similar to that of (k-Cut-SDP); however, the cost matrix in (MA-SDP) is no longer PSD or diagonally dominant, a property that plays an important role in our analysis in the case of Max-k-Cut. Despite this, we show how to generate a $(1-7\epsilon)0.766$-optimal clustering using $\mathcal{O}(n+|E|)$ memory. Our approach makes a small sacrifice in the approximation ratio (as compared to (1.2)); however, the memory used remains independent of $\epsilon$.

Proposition 1.2.

For $\epsilon\in(0,1/7)$, our $\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)$-time method outlined in Section 4 uses $\mathcal{O}(n+|E|)$ memory and generates a clustering of the graph $G=(V,E)$ whose expected value satisfies $\mathbb{E}[\mathcal{C}]\geq 0.766(1-7\epsilon)\textup{opt}^{G}_{CC}$, where $\textup{opt}^{G}_{CC}$ is the optimal clustering value.

The constructive proofs of Propositions 1.1 and 1.2 are outlined in Sections 3 and 4, respectively.

Memory reduction using graph sparsification. Propositions 1.1 and 1.2 state that the memory used by our approach is $\mathcal{O}(n+|E|)$. However, for dense graphs, the memory used by our method becomes $\Theta(n^{2})$. In this setting, to reduce the memory used, we first need to change the way we access the problem instance. We assume that the input (weighted) graph $G$ arrives edge-by-edge, eliminating the need to store the entire dense graph. We then replace it with a $\tau$-spectrally close graph $\tilde{G}$ (see Definition 5.1) with $\mathcal{O}(n\log n/\tau^{2})$ edges. Next, we compute an approximate solution to the new problem defined on the sparse graph using $\mathcal{O}(n\log n/\tau^{2})$ memory. For Max-k-Cut and Max-Agree, we show that this method generates a solution with provable approximation guarantees.

1.4 Literature review

We first review key low memory algorithms for linearly constrained SDPs.

Burer and Monteiro [9] proposed a nonlinear programming approach which replaces the PSD decision variable with its low-rank factorization in SDPs with $d$ linear constraints. If the selected rank $r$ satisfies $r(r+1)\geq 2d$ and the constraint set is a smooth manifold, then any second-order critical point of the nonconvex problem is a global optimum [8]. Another approach, which requires $\Theta(d+nr)$ working memory, is to first determine (approximately) the subspace in which the (low) rank-$r$ solution to an SDP lies and then solve the problem over the (low) $r$-dimensional subspace [11].

Alternatively, randomized sketching to a low-dimensional subspace is often used as a low-memory alternative to storing a matrix decision variable [35, 32]. Recently, such sketched variables have been used to generate a low-rank approximation of a near-optimal solution to SDPs [38]. The working memory required to compute a near-optimal solution and generate its rank-$r$ approximation using the algorithmic framework proposed by Yurtsever et al. [38] is $\mathcal{O}(d+rn/\zeta)$ for some sketching parameter $\zeta\in(0,1)$. Gaussian sampling-based Frank-Wolfe [28] uses $\mathcal{O}(n+d)$ memory to generate zero-mean Gaussian samples whose covariance represents a near-optimal solution to SDPs with $d$ linear constraints. This eliminates the dependence on the rank of the near-optimal solution or the accuracy to which its low-rank approximation is computed.

However, the two problems considered in this paper have SDP relaxations with $n^{2}$ constraints, for which applying the existing low-memory techniques provides no benefit, since the memory requirement of these techniques depends on the number of constraints in the problem. These problems have been studied extensively in the literature, as we see below.

Max-k-Cut.

Max-k-Cut and its dual Min-k-Partition have applications in frequency allocation [12] and in generating lower bounds on co-channel interference in cellular networks [7]. These problems have been studied extensively in the literature [20, 27, 30]. The SDP-based rounding scheme given in [14] has also been adapted for the similar problems of capacitated Max-k-Cut [16] and approximate graph coloring [21]. In each case, however, the SDP relaxation has $\Omega(n^{2})$ constraints. Alternative heuristic methods have been proposed in [26, 24, 13, 17]; however, these methods generate a feasible cut which only provides a lower bound on the optimal cut value.

Correlation clustering.

Swamy [31] and Charikar et al. [10] each provide 0.766-approximation schemes for Max-Agree which involve solving (MA-SDP). For large-scale applications, data streaming techniques have been studied quite extensively for various clustering problems, such as $k$-means and $k$-median [3, 25]. Ahn et al. [2] propose a single-pass, $\mathcal{\tilde{O}}(|E|+n\epsilon^{-10})$-time $0.766(1-\epsilon)$-approximation algorithm for Max-Agree that uses $\mathcal{\tilde{O}}(n/\epsilon^{2})$ memory. In contrast, to achieve the same approximation guarantee, our approach uses $\mathcal{O}\left(n+\min\left\{|E|,\frac{n\log n}{\epsilon^{2}}\right\}\right)$ memory, which is equal to $\mathcal{O}(n+|E|)$ for sparse graphs and is independent of $\epsilon$. Furthermore, the computational complexity of our approach has a better dependence on $\epsilon$, given by $\mathcal{O}\left(\frac{n^{2.5}}{\epsilon^{2.5}}\min\{|E|,\frac{n\log n}{\epsilon^{2}}\}^{1.25}\log(n/\epsilon)\log(|E|)\right)$, which is at most $\mathcal{O}\left(\frac{n^{3.75}}{\epsilon^{5}}(\log n)^{1.25}\log(n/\epsilon)\log(|E|)\right)$. Moreover, our approach is algorithmically simple to implement.

1.5 Outline

In Section 2, we review the Gaussian sampling-based Frank-Wolfe method [28] for computing a near-feasible, near-optimal solution to SDPs with linear equality and inequality constraints. In Sections 3 and 4, we adapt the Gaussian sampling-based approach to give approximation algorithms for Max-k-Cut and Max-Agree, respectively, that use only $\mathcal{O}(n+|E|)$ memory, proving Propositions 1.1 and 1.2. In Section 5, we show how to combine our methods with streaming spectral sparsification to reduce the memory required to $\mathcal{O}(n\log n/\epsilon^{2})$ for dense graphs presented edge-by-edge in a stream. We provide some preliminary computational results for Max-Agree in Section 6, and conclude our work and discuss possible future directions in Section 7. All proofs are deferred to the appendix.

Notations.

The matrix inner product is denoted by $\left\langle A,B\right\rangle=\textup{Tr}\left(A^{T}B\right)$. The vector of diagonal entries of a matrix $X$ is $\textup{diag}(X)$, and $\textup{diag}^{*}(x)$ is a diagonal matrix with the vector $x$ on the diagonal. The notations $\mathcal{O},\Omega,\Theta$ have the usual complexity interpretation and $\tilde{\mathcal{O}}$ suppresses the dependence on $\log n$. An undirected edge $(i,j)$ in the set $E$ is denoted by $(i,j)\in E$ and $ij\in E$ interchangeably.

2 Gaussian Sampling-based Frank-Wolfe

Consider a smooth, concave function $g$ and define the trace-constrained SDP

\max_{X\in\mathcal{S}}\quad g(\mathcal{B}(X)), \qquad (BoundedSDP)

where $\mathcal{S}=\{X:\textup{Tr}(X)\leq\alpha,\ X\succeq 0\}$ and $\mathcal{B}(\cdot):\mathbb{S}^{n}\rightarrow\mathbb{R}^{d}$ is a linear mapping that projects the variable from the $\binom{n+1}{2}$-dimensional space to a $d$-dimensional space. One algorithmic approach to solving (BoundedSDP) is to use the Frank-Wolfe algorithm [18] which, in this case, computes an $\epsilon$-optimal solution by taking steps of the form $X_{t+1}=(1-\gamma_{t})X_{t}+\gamma_{t}\alpha h_{t}h_{t}^{T}$, where $\gamma_{t}\in[0,1]$ and the unit vectors $h_{t}$ arise from approximately solving a symmetric eigenvalue problem that depends only on $\mathcal{B}(X_{t})$ and $g(\cdot)$. Standard convergence results show that an $\epsilon$-optimal solution is reached after $\mathcal{O}(C^{u}_{g}/\epsilon)$ iterations, where $C_{g}^{u}$ is an upper bound on the curvature constant of $g$ [18].
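The only nontrivial subproblem is the approximate eigenvalue computation producing $h_{t}$. Below is a hedged sketch using plain power iteration; the complexity results in this paper are stated for Lanczos-type solvers [22], and in general a shift of the matrix may be needed so that the largest eigenvalue dominates in magnitude:

import numpy as np

def approx_top_eigvec(matvec, n, iters=200, rng=None):
    # Approximates the leading eigenvector of a symmetric matrix J that is available only
    # through matvec(x) = J @ x, so J is never formed or stored densely.
    rng = np.random.default_rng() if rng is None else rng
    h = rng.standard_normal(n)
    h /= np.linalg.norm(h)
    for _ in range(iters):
        h = matvec(h)
        h /= np.linalg.norm(h)
    return h, h @ matvec(h)   # unit vector and its Rayleigh quotient <h h^T, J>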

Frank-Wolfe with Gaussian sampling.

The Gaussian sampling technique of [28] replaces the matrix-valued iterates $X_{t}$ with Gaussian random vectors $z_{t}\sim\mathcal{N}(0,X_{t})$. The update, at the level of samples, is then $z_{t+1}=\sqrt{1-\gamma_{t}}\,z_{t}+\sqrt{\gamma_{t}\alpha}\,\zeta_{t}h_{t}$, where $\zeta_{t}\sim\mathcal{N}(0,1)$. Note that $z_{t+1}$ is also a zero-mean Gaussian random vector with covariance equal to $X_{t+1}=(1-\gamma_{t})X_{t}+\gamma_{t}\alpha h_{t}h_{t}^{T}$. Furthermore, to track the change in the objective function value, it is sufficient to track the value $v_{t}=\mathcal{B}(X_{t})$ and compute $v_{t+1}=(1-\gamma_{t})v_{t}+\gamma_{t}\mathcal{B}(\alpha h_{t}h_{t}^{T})$. Thus, computing the updates to the decision variable and tracking the objective function value only requires knowledge of $z_{t}\sim\mathcal{N}(0,X_{t})$ and $\mathcal{B}(X_{t})$, which can be updated without explicitly storing $X_{t}$, thereby reducing the memory used.
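One Frank-Wolfe step, expressed at the level of a single Gaussian sample and of $v_{t}=\mathcal{B}(X_{t})$, then looks as follows (a minimal sketch; computing $q=\mathcal{B}(\alpha hh^{T})$ is problem-specific):

import numpy as np

def update_sample(z, v, h, q, gamma, alpha, rng=None):
    # z ~ N(0, X_t), v = B(X_t); h is the unit vector returned by the LMO, q = B(alpha*h*h^T).
    rng = np.random.default_rng() if rng is None else rng
    zeta = rng.standard_normal()
    z_new = np.sqrt(1.0 - gamma) * z + np.sqrt(gamma * alpha) * zeta * h
    v_new = (1.0 - gamma) * v + gamma * q
    return z_new, v_new   # z_new ~ N(0, (1 - gamma) X_t + gamma * alpha * h h^T)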

Algorithm 1 [28] describes, in detail, the Frank-Wolfe algorithm with Gaussian sampling when applied to (BoundedSDP). It uses at most $\mathcal{O}(n+d)$ memory at each iteration, and after at most $\mathcal{O}(C_{g}^{u}/\epsilon)$ iterations, returns a sample $\widehat{z}_{\epsilon}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$, where $\widehat{X}_{\epsilon}$ is an $\epsilon$-optimal solution to (BoundedSDP).

Input: Input data for (BoundedSDP), stopping criterion $\epsilon$, accuracy parameter $\eta$, upper bound $C_{g}^{u}$ on the curvature constant, probability $p$ for the subproblem LMO
Output: $z\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$ and $v=\mathcal{B}(\widehat{X}_{\epsilon})$, where $\widehat{X}_{\epsilon}$ is an $\epsilon$-optimal solution of (BoundedSDP)

Function FWGaussian:
    Select initial point $X_{0}\in\mathcal{S}$; set $v_{0}\leftarrow\mathcal{B}(X_{0})$ and sample $z_{0}\sim\mathcal{N}(0,X_{0})$
    $t\leftarrow 0$, $\gamma\leftarrow 2/(t+2)$
    $(h_{t},q_{t})\leftarrow\textnormal{LMO}(\mathcal{B}^{*}(\nabla g(v_{t})),\ \tfrac{1}{2}\eta\gamma C_{g}^{u},\ p)$
    while $\left\langle q_{t}-v_{t},\nabla g(v_{t})\right\rangle>\epsilon$ do
        $(z_{t+1},v_{t+1})\leftarrow\textnormal{UpdateVariable}(z_{t},v_{t},h_{t},q_{t},\gamma)$
        $t\leftarrow t+1$, $\gamma\leftarrow 2/(t+2)$
        $(h_{t},q_{t})\leftarrow\textnormal{LMO}(\mathcal{B}^{*}(\nabla g(v_{t})),\ \tfrac{1}{2}\eta\gamma C_{g}^{u},\ p)$
    end while
    return $(z_{t},v_{t})$

Function LMO($J$, $\delta$, $p$):
    Find a unit vector $h$ such that, with probability at least $1-p$, $\alpha\lambda=\alpha\langle hh^{T},J\rangle\geq\max_{d\in\mathcal{S}}\alpha\langle d,J\rangle-\delta$
    if $\lambda\geq 0$ then $q\leftarrow\mathcal{B}(\alpha hh^{T})$ else $q\leftarrow 0$, $h\leftarrow 0$
    return $(h,q)$

Function UpdateVariable($z,v,h,q,\gamma$):
    $z\leftarrow\sqrt{1-\gamma}\,z+\sqrt{\gamma\alpha}\,h\zeta$, where $\zeta\sim\mathcal{N}(0,1)$
    $v\leftarrow(1-\gamma)v+\gamma q$
    return $(z,v)$

Algorithm 1 (FWGaussian): Frank-Wolfe Algorithm with Gaussian Sampling [28]

2.1 SDP with linear equality and inequality constraints

Consider an SDP with linear objective function and a bounded feasible region,

\max_{X\succeq 0}\quad\langle C,X\rangle\quad\textup{subject to}\quad\begin{cases}\mathcal{A}^{(1)}(X)=b^{(1)}\\ \mathcal{A}^{(2)}(X)\geq b^{(2)},\end{cases} \qquad (SDP)

where $\mathcal{A}^{(1)}(\cdot):\mathbb{S}^{n}_{+}\rightarrow\mathbb{R}^{d_{1}}$ and $\mathcal{A}^{(2)}(\cdot):\mathbb{S}^{n}_{+}\rightarrow\mathbb{R}^{d_{2}}$ are linear maps. To use Algorithm 1, the linear constraints are penalized using a smooth penalty function. Let $u_{l}=\langle A^{(1)}_{l},X\rangle-b^{(1)}_{l}$ for $l=1,\dotsc,d_{1}$ and $v_{l}=b^{(2)}_{l}-\langle A^{(2)}_{l},X\rangle$ for $l=1,\dotsc,d_{2}$. For $M>0$, the smooth function $\phi_{M}(\cdot):\mathbb{R}^{d_{1}+d_{2}}\rightarrow\mathbb{R}$,

\phi_{M}(u,v)=\frac{1}{M}\log\left(\sum_{i=1}^{d_{1}}e^{Mu_{i}}+\sum_{i=1}^{d_{1}}e^{-Mu_{i}}+\sum_{j=1}^{d_{2}}e^{Mv_{j}}\right),\quad\textup{satisfies} \qquad (2.1)
\max\left\{\|u\|_{\infty},\max_{i}v_{i}\right\}\leq\phi_{M}(u,v)\leq\frac{\log(2d_{1}+d_{2})}{M}+\max\left\{\|u\|_{\infty},\max_{i}v_{i}\right\}. \qquad (2.2)
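For completeness, the penalty (2.1) can be evaluated in $\mathcal{O}(d_{1}+d_{2})$ time and memory with the usual log-sum-exp stabilization; a short sketch (names ours):

import numpy as np

def phi_M(u, v, M):
    # Smooth penalty (2.1): (1/M) * log( sum exp(M*u_i) + sum exp(-M*u_i) + sum exp(M*v_j) ),
    # computed stably by factoring out the largest exponent.
    t = np.concatenate((M * np.asarray(u), -M * np.asarray(u), M * np.asarray(v)))
    tmax = t.max()
    return (tmax + np.log(np.exp(t - tmax).sum())) / M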

We add this penalty term to the objective of (SDP) and define

\max_{X\succeq 0}\left\{\langle C,X\rangle-\beta\phi_{M}(\mathcal{A}^{(1)}(X)-b^{(1)},\,b^{(2)}-\mathcal{A}^{(2)}(X))\ :\ \textup{Tr}(X)\leq\alpha\right\}, \qquad (SDP-LSE)

where $\alpha$, $\beta$ and $M$ are appropriately chosen parameters. Algorithm 1 then generates a Gaussian sample with covariance $\widehat{X}_{\epsilon}$, which is an $\epsilon$-optimal solution to (SDP-LSE). It is also a near-optimal, near-feasible solution to (SDP). This result is a slight modification of [28, Lemma 3.2], which only provides bounds for SDPs with linear equality constraints.

Lemma 2.1.

For $\epsilon>0$, let $(X^{\star},\vartheta^{\star},\mu^{\star})$ be an optimal primal-dual solution to (SDP) and its dual, and let $\widehat{X}_{\epsilon}$ be an $\epsilon$-optimal solution to (SDP-LSE). If $\beta>\|\vartheta^{\star}\|_{1}+\|\mu^{\star}\|_{1}$ and $M>0$, then

\langle C,X^{\star}\rangle-\frac{\beta\log(2d_{1}+d_{2})}{M}-\epsilon\leq\langle C,\widehat{X}_{\epsilon}\rangle\leq\langle C,X^{\star}\rangle+(\|\vartheta^{\star}\|_{1}+\|\mu^{\star}\|_{1})\frac{\beta\frac{\log(2d_{1}+d_{2})}{M}+\epsilon}{\beta-\|\vartheta^{\star}\|_{1}-\|\mu^{\star}\|_{1}}, \qquad (2.3)
\max\left\{\|\mathcal{A}^{(1)}(\widehat{X}_{\epsilon})-b^{(1)}\|_{\infty},\ \max_{i}\left(b^{(2)}_{i}-\mathcal{A}^{(2)}_{i}(\widehat{X}_{\epsilon})\right)\right\}\leq\frac{\beta\frac{\log(2d_{1}+d_{2})}{M}+\epsilon}{\beta-\|\vartheta^{\star}\|_{1}-\|\mu^{\star}\|_{1}}. \qquad (2.4)

3 Application of Gaussian Sampling to (k-Cut-SDP)

In this section, we look at the application of Gaussian sampling to Max-k-Cut. Since Algorithm 1 uses $\mathcal{O}(n^{2})$ memory when solving (k-Cut-SDP), we define a new SDP relaxation of Max-k-Cut with the same approximation guarantee, but with $\mathcal{O}(|E|)$ constraints. We then apply Algorithm 1 to this new relaxation, and show how to round the solution to achieve nearly the same approximation ratio as given in (1.1). Let

\alpha_{k}=\min_{-1/(k-1)\leq\rho\leq 1}\frac{kp(\rho)}{(k-1)(1-\rho)}, \qquad (3.1)

where $p(X_{ij})$ is the probability that vertices $i$ and $j$ are in different partitions. If $X$ is feasible for (k-Cut-SDP) and CUT is the value of the $k$-cut generated by the FJ rounding scheme, then

\mathbb{E}[\texttt{CUT}]=\sum_{ij\in E,\,i<j}w_{ij}\,p(X_{ij})\geq\sum_{ij\in E,\,i<j}\frac{k-1}{k}w_{ij}(1-X_{ij})\alpha_{k}=\alpha_{k}\langle C,X\rangle. \qquad (3.2)

Frieze and Jerrum [14] derive a lower bound on $\alpha_{k}$, showing that the method gives a nontrivial approximation guarantee. Observe that (3.2) depends only on the values $X_{ij}$ for $(i,j)\in E$.

A new SDP relaxation of Max-k-Cut.

We relax the constraints in (k-Cut-SDP) to define

\max_{X\succeq 0}\quad\langle C,X\rangle\quad\textup{subject to}\quad\begin{cases}\textup{diag}(X)=\mathbbm{1}\\ X_{ij}\geq-\frac{1}{k-1}\quad(i,j)\in E,\,i<j.\end{cases} \qquad (k-Cut-Rel)

Since (k-Cut-Rel) is a relaxation of (k-Cut-SDP), its optimal objective function value provides an upper bound on $\langle C,X^{\star}\rangle$, where $X^{\star}$ is an optimal solution to (k-Cut-SDP), and hence, on the optimal $k$-cut value $\textup{opt}_{k}^{G}$. Note that the bound in (3.2) holds true even if we replace $X^{\star}$ by an optimal solution to (k-Cut-Rel), since it depends on the value of $X_{ij}$ only if $(i,j)\in E$. Furthermore, when the FJ rounding scheme is applied to the solution of (k-Cut-Rel), it satisfies the approximation guarantee on the expected value of the generated $k$-cut given in (1.1), i.e., $\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}\textup{opt}_{k}^{G}$.

Using Algorithm 1.

We now have an SDP relaxation of Max-k-Cut that has $n+|E|$ constraints. Penalizing the linear constraints in (k-Cut-Rel) using the function $\phi_{M}(\cdot)$ (2.1), Algorithm 1 can now be used to generate $k$ samples with covariance $\widehat{X}_{\epsilon}$, which is an $\epsilon$-optimal solution to

\max_{X\succeq 0}\left\{\langle C,X\rangle-\beta\phi_{M}\left(\textup{diag}(X)-\mathbbm{1},\,-\tfrac{1}{k-1}-e_{i}^{T}Xe_{j}\right):(i,j)\in E,\ \textup{Tr}(X)\leq n\right\}. \qquad (k-Cut-LSE)
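The matrix $\mathcal{B}^{*}(\nabla g(v_{t}))$ passed to the LMO for (k-Cut-LSE) is supported only on the diagonal and on the edges in $E$ (the cost $C$ is a scaled Laplacian, and the penalty gradient only touches constrained entries), so the eigenvector computation can be run with matrix-vector products that never form an $n\times n$ matrix. A hedged sketch of such a product (data layout ours):

import numpy as np

def sparse_symmetric_matvec(diag_vals, edges, edge_vals, x):
    # Computes y = J x for a symmetric J with J_ii = diag_vals[i], J_ij = J_ji = edge_vals[e]
    # for the e-th edge (i, j) in `edges`, and zeros elsewhere; uses O(n + |E|) memory.
    y = diag_vals * x
    for (i, j), w in zip(edges, edge_vals):
        y[i] += w * x[j]
        y[j] += w * x[i]
    return y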

Optimality and feasibility results for (k-Cut-Rel).

We now show that an ϵ\epsilon-optimal solution to (k-Cut-LSE) is also a near-optimal, near-feasible solution to (k-Cut-Rel).

Lemma 3.1.

For $\epsilon\in(0,1/2)$, let $X^{\star}_{R}$ be an optimal solution to (k-Cut-Rel) and let $\widehat{X}_{\epsilon}$ be an $\epsilon\textup{Tr}(C)$-optimal solution to (k-Cut-LSE). For $\beta=6\textup{Tr}(C)$ and $M=6\frac{\log(2n+|E|)}{\epsilon}$, we have

(1-2\epsilon)\langle C,X^{\star}_{R}\rangle\leq\langle C,\widehat{X}_{\epsilon}\rangle\leq(1+4\epsilon)\langle C,X^{\star}_{R}\rangle\quad\textrm{and} \qquad (3.3)
\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty}\leq\epsilon,\qquad[\widehat{X}_{\epsilon}]_{ij}\geq-\frac{1}{k-1}-\epsilon,\quad(i,j)\in E,\,i<j. \qquad (3.4)

Generating a feasible solution to Max-k-Cut.

Since $\widehat{X}_{\epsilon}$ might not necessarily be feasible for (k-Cut-Rel), we cannot apply the FJ rounding scheme to the samples $z_{i}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$. We, therefore, generate samples $z^{f}_{i}\sim\mathcal{N}(0,X^{f})$ using the procedure given in Algorithm 2, where $X^{f}$ is a feasible solution to (k-Cut-Rel) and $\langle C,X^{f}\rangle$ is close to $\langle C,\widehat{X}_{\epsilon}\rangle$.

Input: Samples $z_{i}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$ for $i=1,\dotsc,k$ and $\textup{diag}(\widehat{X}_{\epsilon})$
Output: $z^{f}_{i}\sim\mathcal{N}(0,X^{f})$ for $i=1,\dotsc,k$ with $X^{f}$ feasible for (k-Cut-Rel)

1 Function GeneratekSamples:
2     for $i=1,\dotsc,k$ do
3         Set $\textup{err}=\max\{0,\ \max_{(i,j)\in E,\,i<j}\ -1/(k-1)-[\widehat{X}_{\epsilon}]_{ij}\}$
4         Set $\overline{z}_{i}=z_{i}+\sqrt{\textup{err}}\,y\,\mathbbm{1}$, where $y\sim\mathcal{N}(0,1)$
5         Generate $\zeta\sim\mathcal{N}\left(0,\ I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right)\right)$
6         Set $z^{f}_{i}=\frac{\overline{z}_{i}}{\sqrt{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}}+\zeta$
7     end for
8     return $z^{f}_{1},\dotsc,z^{f}_{k}$

Algorithm 2: Generate Gaussian samples with covariance feasible for (k-Cut-Rel)
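A direct NumPy transcription of Algorithm 2 is sketched below; the edge values $[\widehat{X}_{\epsilon}]_{ij}$ for $(i,j)\in E$ and $\textup{diag}(\widehat{X}_{\epsilon})$ are exactly the quantities tracked as $\mathcal{B}(\widehat{X}_{\epsilon})$ by Algorithm 1 (argument names ours):

import numpy as np

def generate_k_samples(Z, diag_X, edge_vals, k_parts, rng=None):
    # Z: k x n array with rows z_i ~ N(0, X_hat); diag_X = diag(X_hat);
    # edge_vals: array of [X_hat]_ij over the edges (i, j) in E; k_parts = k in Max-k-Cut.
    rng = np.random.default_rng() if rng is None else rng
    err = max(0.0, np.max(-1.0 / (k_parts - 1) - np.asarray(edge_vals)))
    m = diag_X.max() + err
    Zf = np.empty_like(Z)
    for i in range(Z.shape[0]):
        z_bar = Z[i] + np.sqrt(err) * rng.standard_normal()   # adds sqrt(err) * y to every coordinate
        var = 1.0 - (diag_X + err) / m                         # diagonal covariance of zeta
        zeta = rng.standard_normal(Z.shape[1]) * np.sqrt(var)
        Zf[i] = z_bar / np.sqrt(m) + zeta
    return Zf   # rows are z_i^f ~ N(0, X^f) with X^f feasible for (k-Cut-Rel)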

We can now apply the FJ rounding scheme to $z^{f}_{1},\dotsc,z^{f}_{k}$, as stated in Lemma 3.2.

Lemma 3.2.

For $G=(V,E)$, let $\textup{opt}_{k}^{G}$ be the optimal $k$-cut value and let $X_{R}^{\star}$ be an optimal solution to (k-Cut-Rel). For $\epsilon\in\left(0,\frac{1}{4}\right)$, let $\widehat{X}_{\epsilon}\succeq 0$ satisfy (3.3) and (3.4). Let $z^{f}_{1},\dotsc,z^{f}_{k}$ be the random vectors generated by Algorithm 2 with input $z_{1},\dotsc,z_{k}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$, and let CUT denote the value of a $k$-cut generated by applying the FJ rounding scheme to $z^{f}_{1},\dotsc,z^{f}_{k}$. For $\alpha_{k}$ as defined in (3.1), we have

\alpha_{k}(1-4\epsilon)\textup{opt}_{k}^{G}\leq\alpha_{k}(1-4\epsilon)\langle C,X_{R}^{\star}\rangle\leq\mathbb{E}[\texttt{CUT}]\leq\textup{opt}_{k}^{G}. \qquad (3.5)

Computational complexity of Algorithm 1 when applied to (k-Cut-LSE).

Finally, in Lemma 3.3, we provide the computational complexity of the method proposed in this section, which concludes the proof of Proposition 1.1.

Lemma 3.3.

When the method proposed in this section (Section 3), with $p=\frac{\epsilon}{T(n,\epsilon)}$ and $T(n,\epsilon)=\frac{144\log(2n+|E|)n^{2}}{\epsilon^{2}}$, is used to generate an approximate $k$-cut for Max-k-Cut, the generated cut satisfies $\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-5\epsilon)\textup{opt}_{k}^{G}$ and the method runs in $\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)$ time.

4 Application of Gaussian Sampling to (MA-SDP)

We now look at the application of our Gaussian sampling-based method to Max-Agree. Algorithm 1 uses $\mathcal{O}(n^{2})$ memory to generate samples whose covariance is an $\epsilon$-optimal solution to (MA-SDP). However, with a similar observation as in the case of Max-k-Cut, we note that for any $X$ feasible for (MA-SDP), the proof of the inequality $\mathbb{E}[\mathcal{C}]\geq 0.766\langle C,X\rangle$, given in [10, Theorem 3], requires $X_{ij}\geq 0$ only if $(i,j)\in E$. We therefore write a new relaxation of (MA-SDP),

\max_{X\succeq 0}\ \langle C,X\rangle=\langle W^{+}+L_{G^{-}},X\rangle\quad\textup{subject to}\quad\begin{cases}X_{ii}=1\quad\forall\ i\in\{1,\dotsc,n\}\\ X_{ij}\geq 0\quad(i,j)\in E,\,i<j,\end{cases} \qquad (MA-Rel)

with only $n+|E|$ constraints. The bound $\mathbb{E}[\mathcal{C}]\geq 0.766\langle C,X^{\star}\rangle\geq 0.766\,\textup{opt}_{CC}^{G}$ on the expected value of the clustering holds even if the clustering is generated by applying the CGW rounding scheme to an optimal solution $X^{\star}$ of (MA-Rel). To use Algorithm 1, we penalize the constraints in (MA-Rel) and define

\max_{X\succeq 0}\left\{\langle C,X\rangle-\beta\phi_{M}(\textup{diag}(X)-\mathbbm{1},\,-e_{i}^{T}Xe_{j}):(i,j)\in E,\ \textup{Tr}(X)\leq n\right\}. \qquad (MA-LSE)

Optimality and feasibility results for (MA-Rel).

Algorithm 1 is now used to generate $z\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$, where $\widehat{X}_{\epsilon}$ is an $\epsilon$-optimal solution to (MA-LSE). We show in Lemma 4.1 that $\widehat{X}_{\epsilon}$ is also a near-optimal, near-feasible solution to (MA-Rel).

Lemma 4.1.

For $\Delta=\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}$ and $\epsilon\in\left(0,\frac{1}{4}\right)$, let $X^{\star}_{G}$ be an optimal solution to (MA-Rel) and $\widehat{X}_{\epsilon}$ be an $\epsilon\Delta$-optimal solution to (MA-LSE). Setting $\beta=4\Delta$ and $M=4\frac{\log(2n+|E|)}{\epsilon}$, we have

(1-4\epsilon)\langle C,X^{\star}_{G}\rangle\leq\langle C,\widehat{X}_{\epsilon}\rangle\leq(1+4\epsilon)\langle C,X^{\star}_{G}\rangle\quad\textup{and} \qquad (4.1)
\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty}\leq\epsilon,\qquad[\widehat{X}_{\epsilon}]_{ij}\geq-\epsilon,\quad(i,j)\in E,\,i<j. \qquad (4.2)

Generating an approximate clustering.

The CGW rounding scheme can only be applied if we have a feasible solution to (MA-Rel). We, therefore, use a modified version of Algorithm 2, with Step 3 replaced by $\textup{err}=\max\{0,\ \max_{(i,j)\in E,\,i<j}\ -[\widehat{X}_{\epsilon}]_{ij}\}$ and input $z_{1},z_{2},z_{3}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$, to generate zero-mean Gaussian samples whose covariance is a feasible solution to (MA-Rel). Finally, we apply the CGW rounding scheme to the output of this modified version of Algorithm 2.

Lemma 4.2.

Let $X^{\star}_{G}$ be an optimal solution to (MA-Rel). For $\epsilon\in\left(0,1/6\right)$, let $\widehat{X}_{\epsilon}\succeq 0$ satisfy (4.1) and (4.2), and let $z^{f}_{1},z^{f}_{2},z^{f}_{3}$ be the random vectors generated by Algorithm 2 with input $z_{1},z_{2},z_{3}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$. Let $\textup{opt}_{CC}^{G}$ denote the optimal clustering value for the graph $G=(V,E)$ and let $\mathcal{C}$ denote the value of the clustering generated from the random vectors $z^{f}_{1},z^{f}_{2},z^{f}_{3}$ using the CGW rounding scheme. Then

\mathbb{E}[\mathcal{C}]\geq 0.766(1-6\epsilon)\langle C,X^{\star}_{G}\rangle\geq 0.766(1-6\epsilon)\textup{opt}_{CC}^{G}. \qquad (4.3)

Computational complexity of Algorithm 1 when applied to (MA-LSE).

Lemma 4.3.

When the method proposed in this section (Section 4), with $p=\frac{\epsilon}{T(n,\epsilon)}$ and $T(n,\epsilon)=\frac{64\log(2n+|E|)n^{2}}{\epsilon^{2}}$, is used to generate an approximate clustering, the value of the clustering satisfies $\mathbb{E}[\mathcal{C}]\geq 0.766(1-7\epsilon)\textup{opt}_{CC}^{G}$ and the method runs in $\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)$ time.

This completes the proof of Proposition 1.2.

5 Sparsifying the Laplacian Cost Matrix

As seen in Sections 3 and 4, the memory requirement for generating and representing an $\epsilon$-optimal solution to (k-Cut-LSE) and (MA-LSE) is bounded by $\mathcal{O}(n+|E|)$. However, if the input graph $G$ is dense, the cost matrix will be dense and the number of inequality constraints will still be high. In this section, we consider the situation in which the dense weighted graph arrives in a stream, and we first build a sparse approximation with similar spectral properties. We refer to this additional step as sparsifying the cost.

Definition 5.1 ($\tau$-spectral closeness).

Two graphs $G$ and $\tilde{G}$ defined on the same set of vertices are said to be $\tau$-spectrally close if, for any $x\in\mathbb{R}^{n}$ and $\tau\in(0,1)$,

(1-\tau)\,x^{T}L_{G}x\leq x^{T}L_{\tilde{G}}x\leq(1+\tau)\,x^{T}L_{G}x. \qquad (5.1)
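Condition (5.1) can be spot-checked numerically on random directions (a necessary but not sufficient test, and only sensible for small graphs since the sketch below forms dense Laplacians); edge-list inputs are assumed:

import numpy as np

def laplacian(n, edges):
    # edges: iterable of (i, j, w) with i != j; returns the weighted graph Laplacian.
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

def looks_tau_spectrally_close(n, edges_G, edges_Gtilde, tau, trials=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    L, Lt = laplacian(n, edges_G), laplacian(n, edges_Gtilde)
    for _ in range(trials):
        x = rng.standard_normal(n)
        a, b = x @ L @ x, x @ Lt @ x
        if not ((1 - tau) * a <= b <= (1 + tau) * a):
            return False
    return True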

Spectral graph sparsification has been studied quite extensively (see, e.g., [29, 1, 15]). Kyng et al. [23] propose an $\mathcal{O}(|E|\log^{2}n)$-time framework to replace a dense graph $G=(V,E)$ by a sparser graph $\tilde{G}=(V,\tilde{E})$ such that $|\tilde{E}|\sim\mathcal{O}(n\log n/\tau^{2})$ and $\tilde{G}$ satisfies (5.1) with probability $1-\frac{1}{\textup{poly}(n)}$. Their algorithm assumes that the edges of the graph arrive one at a time, so that the total memory requirement is $\mathcal{O}(n\log n/\tau^{2})$ rather than $\mathcal{O}(|E|)$. Furthermore, a sparse cost matrix decreases the computation time of the subproblem in Algorithm 1, since it involves matrix-vector multiplication with the gradient of the cost.

Max-k-Cut with sparsification.

Let $\tilde{G}$ be a sparse graph with $\mathcal{O}(n\log n/\tau^{2})$ edges that is $\tau$-spectrally close to the input graph $G$. By applying the method outlined in Section 3, we can generate a $k$-cut for the graph $\tilde{G}$ (using $\mathcal{O}(n\log n/\tau^{2})$ memory) whose expected value satisfies the bound (3.5). Note that this generated cut is also a $k$-cut for the original graph $G$ with a provable approximation guarantee, as shown in Lemma 5.1.

Lemma 5.1.

For $\epsilon,\tau\in(0,1/5)$, let $\widehat{X}_{\epsilon}$ be a near-feasible, near-optimal solution to (k-Cut-Rel) defined on the graph $\tilde{G}$ that satisfies (3.3) and (3.4). Let CUT denote the value of the $k$-cut generated by applying Algorithm 2 followed by the FJ rounding scheme to $\widehat{X}_{\epsilon}$. Then the generated cut satisfies

\alpha_{k}(1-4\epsilon-\tau)\textup{opt}_{k}^{G}\leq\mathbb{E}[\texttt{CUT}]\leq\textup{opt}_{k}^{G}, \qquad (5.2)

where $\textup{opt}_{k}^{G}$ is the value of the optimal $k$-cut for the original graph $G$.

Max-Agree with sparsification.

The numbers of edges $|E^{+}|$ and $|E^{-}|$ in the graphs $G^{+}$ and $G^{-}$ respectively determine the working memory of Algorithm 1. For dense input graphs $G^{+}$ and $G^{-}$, we sparsify them to generate graphs $\tilde{G}^{+}$ and $\tilde{G}^{-}$ with at most $\mathcal{O}(n\log n/\tau^{2})$ edges and define

\max_{X\in\mathcal{S}}\ \tilde{f}(X)=\langle L_{\tilde{G}^{-}}+\tilde{W}^{+},X\rangle-\beta\phi_{M}(\textup{diag}(X)-\mathbbm{1},\,-[e_{i}^{T}Xe_{j}]_{(i,j)\in\tilde{E}}), \qquad (MA-Sparse)

where $\mathcal{S}=\{X:\textup{Tr}(X)\leq n,\ X\succeq 0\}$, $L_{\tilde{G}^{-}}$ is the Laplacian of the graph $\tilde{G}^{-}$, $\tilde{W}^{+}$ is a matrix with nonnegative entries denoting the weight of each edge $(i,j)\in\tilde{E}^{+}$, and $\tilde{E}=\tilde{E}^{+}\cup\tilde{E}^{-}$. Algorithm 1 then generates an $\epsilon(\textup{Tr}(L_{\tilde{G}^{-}})+\sum_{ij\in\tilde{E}^{+}}\tilde{w}^{+}_{ij})$-optimal solution, $\widehat{X}_{\epsilon}$, to (MA-Sparse) using $\mathcal{O}(n\log n/\tau^{2})$ memory. We can now use the method given in Section 4 to generate a clustering of the graph $\tilde{G}$ whose expected value, $\mathbb{E}[\mathcal{C}]$, satisfies (4.3). The following lemma shows that $\mathcal{C}$ also represents a clustering of the original graph $G$ with provable guarantees.

Lemma 5.2.

For $\epsilon,\tau\in(0,1/9)$, let $\widehat{X}_{\epsilon}$ be a near-feasible, near-optimal solution to (MA-Sparse) defined on the graph $\tilde{G}$ that satisfies (4.1) and (4.2). Let $\mathcal{C}$ denote the value of the clustering generated by applying Algorithm 2 followed by the CGW rounding scheme to $\widehat{X}_{\epsilon}$. Then $\mathbb{E}[\mathcal{C}]$ satisfies

0.766(1-6\epsilon-3\tau)(1-\tau^{2})\textup{opt}_{CC}^{G}\leq\mathbb{E}[\mathcal{C}]\leq\textup{opt}_{CC}^{G}, \qquad (5.3)

where $\textup{opt}_{CC}^{G}$ is the value of the optimal clustering of the original graph $G$.

We summarize our results in the following lemma whose proof is given in Appendix A.10.

Lemma 5.3.

Assume that the edges of the input graph $G=(V,E)$ arrive one at a time in a stream. The procedure given in this section uses at most $\mathcal{O}(n\log n/\tau^{2})$ memory and, in $\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)$ time, generates approximate solutions to Max-k-Cut and Max-Agree that satisfy the bounds $\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-5\epsilon-\tau)\textup{opt}_{k}^{G}$ and $\mathbb{E}[\mathcal{C}]\geq 0.766(1-7\epsilon-3\tau)(1-\tau^{2})\textup{opt}_{CC}^{G}$, respectively.

6 Computational Results

We now discuss the results of preliminary computations to cluster the vertices of a graph $G$ using the approach outlined in Section 4. The aim of the numerical experiments was to verify that the bounds given in Lemma 4.2 were satisfied when we used the procedure outlined in Section 4 to generate a clustering for each input graph. We used graphs from the GSet dataset [36], which is a collection of randomly generated graphs. Note that the aim of correlation clustering is to generate a clustering of vertices for graphs where each edge has a label indicating 'similarity' or 'dissimilarity' of the vertices connected by that edge. We, therefore, first converted the undirected, unweighted graphs from the GSet dataset [36] into instances of graphs with labelled edges using an adaptation of the approach used in [34, 33]. This modified approach generated a label and a weight for each edge $(i,j)\in E$, indicating the amount of 'similarity' or 'dissimilarity' between vertices $i$ and $j$.

Generating input graphs for Max-Agree.

In the process of label generation, we first computed the Jaccard coefficient $J_{ij}=|N(i)\cap N(j)|/|N(i)\cup N(j)|$ for each edge $(i,j)\in E$, where $N(i)$ is the set of neighbours of $i$. Next, we computed the quantity $S_{ij}=\log((1-J_{ij}+\delta)/(1+J_{ij}-\delta))$ with $\delta=0.05$ for each edge $(i,j)\in E$, which is a measure of the amount of 'similarity' or 'dissimilarity'. Finally, the edge $(i,j)$ was labelled as 'dissimilar' with $w^{-}_{ij}=-S_{ij}$ if $S_{ij}<0$, and labelled as 'similar' with $w^{+}_{ij}=S_{ij}$ otherwise.
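A sketch of this label-generation step (function names ours), producing the weighted edge lists $E^{+}$ and $E^{-}$ from an unweighted edge list:

import numpy as np

def label_edges(n, edges, delta=0.05):
    # edges: list of (i, j) pairs of the unweighted input graph.
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j); nbrs[j].add(i)
    E_plus, E_minus = [], []
    for i, j in edges:
        jac = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])   # Jaccard coefficient J_ij
        s = np.log((1 - jac + delta) / (1 + jac - delta))       # score S_ij
        if s < 0:
            E_minus.append((i, j, -s))   # 'dissimilar' edge with weight w_ij^- = -S_ij
        else:
            E_plus.append((i, j, s))     # 'similar' edge with weight w_ij^+ = S_ij
    return E_plus, E_minus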

Experimental Setup.

We set the input parameters to $\epsilon=0.05$, $\Delta=\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}$, $\beta=4\Delta$ and $M=4\frac{\log(2n+|E|)}{\epsilon}$. Using Algorithm 1, (MA-LSE) was solved to $\epsilon\Delta$-optimality and we computed feasible samples using Algorithm 2. Finally, we generated two Gaussian samples and created at most four clusters by applying, for simplicity, the 0.75-rounding scheme proposed by Swamy [31, Theorem 2.1]. The computations were performed using MATLAB R2021a on a machine with 8GB RAM. We recorded the peak memory used by the algorithm using the profiler command in MATLAB.

The computational results for some randomly selected instances from the dataset are given in Table 1; results for the remaining graphs from GSet are provided in Appendix C. First, we observed that for each input graph, the number of LMO iterations required for $\epsilon\Delta$-convergence satisfied the bound given in Proposition 1.2, and the infeasibility of the covariance $\widehat{X}_{\epsilon}$ of the generated samples was less than $\epsilon$, satisfying (4.2). We generated 10 pairs of i.i.d. zero-mean Gaussian samples with covariance $\widehat{X}_{\epsilon}$, and each pair in turn was used to generate a clustering of the input graph using the 0.75-rounding scheme proposed by Swamy [31]. Amongst the 10 clusterings generated for each graph, we picked the clustering with the largest value, denoted by $\mathcal{C}_{\textup{best}}$. Note that $\mathcal{C}_{\textup{best}}\geq\mathbb{E}[\mathcal{C}]\geq 0.75(1-6\epsilon)\langle C,X^{\star}_{G}\rangle\geq 0.75\frac{1-6\epsilon}{1+4\epsilon}\langle C,\widehat{X}_{\epsilon}\rangle$, where the last inequality follows from combining (4.3) with (4.1). Since we were able to compute the values $\mathcal{C}_{\textup{best}}$ and $\langle C,\widehat{X}_{\epsilon}\rangle$, we checked that the weaker bound $\mathcal{C}_{\textup{best}}/\langle C,\widehat{X}_{\epsilon}\rangle=\textup{AR}\geq 0.75(1-6\epsilon)/(1+4\epsilon)$ was satisfied by every input graph when $\epsilon=0.05$.

Table 1 also shows the memory used by our method. Consider the dataset G1, for which the memory used by our method was $1526.35\,\textup{kB}\approx 9.8\times(|V|+|E^{+}|+|E^{-}|)\times 8$, where the factor of 8 reflects that MATLAB requires 8 bytes to store a real number. Similarly, we observed that our method used at most $c\times(|V|+|E^{+}|+|E^{-}|)\times 8$ bytes of memory to generate clusters for the other instances from GSet, where $c\leq 33$ for every instance of the input graph, showing that the memory used was linear in the size of the input graph.

Table 1: Results of generating a clustering of graphs from GSet using the method outlined in Section 4. Here, $\textup{infeas}=\max\{\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty},\ \max_{(i,j)\in E}\max\{0,-[\widehat{X}_{\epsilon}]_{ij}\}\}$, $\textup{AR}=\mathcal{C}_{\textup{best}}/\langle C,\widehat{X}_{\epsilon}\rangle$, and $0.75(1-6\epsilon)/(1+4\epsilon)=0.4375$ for $\epsilon=0.05$.
Dataset  $|V|$  $|E^{+}|$  $|E^{-}|$  # Iterations ($\times 10^{3}$)  infeas  $\langle C,\widehat{X}_{\epsilon}\rangle$  $\mathcal{C}_{\textup{best}}$  AR  Memory required (in kB)
G1 800 2453 16627 669.46 10310^{-3} 849.48 643 0.757 1526.35
G11 800 817 783 397.2 6×1046\times 10^{-4} 3000.3 2080 0.693 448.26
G14 800 3861 797 330.02 8×1048\times 10^{-4} 542.55 469.77 0.866 423.45
G22 2000 115 19849 725.66 10310^{-3} 1792.9 1371.1 0.764 1655.09
G32 2000 2011 1989 571.42 9×1049\times 10^{-4} 7370 4488 0.609 1124
G43 1000 248 9704 501.31 10310^{-3} 803.8 616.05 0.766 654.46
G48 3000 0 6000 9806.22 0.004 599.64 461.38 0.769 736.09
G51 1000 4734 1147 1038.99 0.001 676.21 446.29 0.66 517.09
G55 5000 66 12432 2707.07 0.002 1244.2 901.74 0.724 1281.03
G57 5000 4981 5019 574.5 0.005 18195 10292 0.565 812.78

7 Discussion

In this paper, we proposed a Gaussian sampling-based optimization algorithm to generate approximate solutions to Max-k-Cut and the Max-Agree variant of correlation clustering using $\mathcal{O}\left(n+\min\left\{|E|,\frac{n\log n}{\tau^{2}}\right\}\right)$ memory. The approximation guarantees given in [14, 10, 31] for these problems are based on solving SDP relaxations that have $n^{2}$ constraints. The key observation that led to the low-memory method proposed in this paper is that the approximation guarantees from the literature are preserved for both problems even if we solve weaker SDP relaxations with only $\mathcal{O}(n+|E|)$ constraints. We showed that for Max-k-Cut and the Max-Agree variant of correlation clustering, our approach nearly preserves the quality of the solution as given in [14, 10]. We also implemented the method outlined in Section 4 to generate approximate clusterings of random graphs with provable guarantees. The numerical experiments showed that while the method was simple to implement, it was slow in practice. However, there is scope for improving the convergence rate of our method so that it can potentially be applied to large-scale instances of various real-life applications of clustering.

Extending the low-memory method to solve problems with triangle inequalities.

The known nontrivial approximation guarantees for the sparsest cut problem involve solving an SDP relaxation that has $n^{3}$ triangle inequalities [4]. It would be interesting to see whether it is possible to simplify these SDPs in such a way that they can be combined nicely with memory-efficient algorithms while still maintaining good approximation guarantees.

References

  • Ahn and Guha [2009] Kook Jin Ahn and Sudipto Guha. Graph sparsification in the semi-streaming model. In International Colloquium on Automata, Languages, and Programming, pages 328–338. Springer, 2009.
  • Ahn et al. [2015] KookJin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, and Anthony Wirth. Correlation clustering in data streams. In International Conference on Machine Learning, pages 2237–2246. PMLR, 2015.
  • Ailon et al. [2009] Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni. Streaming k-means approximation. In Proceedings of the Twenty-Third Annual Conference on Neural Information Processing Systems, volume 4, page 2, 2009.
  • Arora et al. [2009] Sanjeev Arora, Satish Rao, and Umesh Vazirani. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2):1–37, 2009.
  • Awasthi et al. [2015] Pranjal Awasthi, Afonso S Bandeira, Moses Charikar, Ravishankar Krishnaswamy, Soledad Villar, and Rachel Ward. Relax, no need to round: Integrality of clustering formulations. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pages 191–200, 2015.
  • Bansal et al. [2004] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine Learning, 56(1-3):89–113, 2004.
  • Borndörfer et al. [1998] Ralf Borndörfer, Andreas Eisenblätter, Martin Grötschel, and Alexander Martin. Frequency assignment in cellular phone networks. Annals of Operations Research, 76:73–93, 1998.
  • Boumal et al. [2016] Nicolas Boumal, Vlad Voroninski, and Afonso Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In Advances in Neural Information Processing Systems, pages 2757–2765, 2016.
  • Burer and Monteiro [2003] Samuel Burer and Renato DC Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95(2):329–357, 2003.
  • Charikar et al. [2005] Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360–383, 2005.
  • Ding et al. [2019] Lijun Ding, Alp Yurtsever, Volkan Cevher, Joel A Tropp, and Madeleine Udell. An optimal-storage approach to semidefinite programming using approximate complementarity. arXiv preprint arXiv:1902.03373, 2019.
  • Eisenblätter et al. [2002] Andreas Eisenblätter, Martin Grötschel, and Arie Koster. Frequency planning and ramifications of coloring. Discussiones Mathematicae Graph Theory, 22(1):51–88, 2002.
  • Feo and Resende [1995] Thomas A Feo and Mauricio GC Resende. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6(2):109–133, 1995.
  • Frieze and Jerrum [1997] Alan Frieze and Mark Jerrum. Improved approximation algorithms for Max-k-Cut and max bisection. Algorithmica, 18(1):67–81, 1997.
  • Fung et al. [2019] Wai-Shing Fung, Ramesh Hariharan, Nicholas JA Harvey, and Debmalya Panigrahi. A general framework for graph sparsification. SIAM Journal on Computing, 48(4):1196–1223, 2019.
  • Gaur et al. [2008] Daya Ram Gaur, Ramesh Krishnamurti, and Rajeev Kohli. The capacitated Max-k-Cut problem. Mathematical Programming, 115(1):65–72, 2008.
  • Ghaddar et al. [2011] Bissan Ghaddar, Miguel F Anjos, and Frauke Liers. A branch-and-cut algorithm based on semidefinite programming for the minimum k-partition problem. Annals of Operations Research, 188(1):155–174, 2011.
  • Jaggi [2013] Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In Proceedings of the 30th International Conference on Machine Learning, pages 427–435, 2013.
  • Journée et al. [2010] Michel Journée, Francis Bach, P-A Absil, and Rodolphe Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5):2327–2351, 2010.
  • Kann et al. [1995] Viggo Kann, Sanjeev Khanna, Jens Lagergren, and Alessandro Panconesi. On the Hardness of Approximating Max-k-Cut and Its Dual. Citeseer, 1995.
  • Karger et al. [1998] David Karger, Rajeev Motwani, and Madhu Sudan. Approximate graph coloring by semidefinite programming. Journal of the ACM (JACM), 45(2):246–265, 1998.
  • Kuczyński and Woźniakowski [1992] Jacek Kuczyński and Henryk Woźniakowski. Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start. SIAM Journal on Matrix Analysis and Applications, 13(4):1094–1122, 1992.
  • Kyng et al. [2017] Rasmus Kyng, Jakub Pachocki, Richard Peng, and Sushant Sachdeva. A framework for analyzing resparsification algorithms. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2032–2043. SIAM, 2017.
  • Ma and Hao [2017] Fuda Ma and Jin-Kao Hao. A multiple search operator heuristic for the Max-k-Cut problem. Annals of Operations Research, 248(1-2):365–403, 2017.
  • McCutchen and Khuller [2008] Richard Matthew McCutchen and Samir Khuller. Streaming algorithms for k-center clustering with outliers and with anonymity. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 165–178. Springer, 2008.
  • Mladenović and Hansen [1997] Nenad Mladenović and Pierre Hansen. Variable neighborhood search. Computers & Operations Research, 24(11):1097–1100, 1997.
  • Newman [2018] Alantha Newman. Complex semidefinite programming and Max-k-Cut. arXiv preprint arXiv:1812.10770, 2018.
  • Shinde et al. [2021] Nimita Shinde, Vishnu Narayanan, and James Saunderson. Memory-efficient structured convex optimization via extreme point sampling. SIAM Journal on Mathematics of Data Science, 3(3):787–814, 2021.
  • Spielman and Teng [2011] Daniel A Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011.
  • Subramanian et al. [2008] Anand Prabhu Subramanian, Himanshu Gupta, Samir R Das, and Jing Cao. Minimum interference channel assignment in multiradio wireless mesh networks. IEEE Transactions on Mobile Computing, 7(12):1459–1473, 2008.
  • Swamy [2004] Chaitanya Swamy. Correlation clustering: maximizing agreements via semidefinite programming. In SIAM Symposium on Discrete Algorithms, volume 4, pages 526–527. Citeseer, 2004.
  • Tropp et al. [2019] Joel A Tropp, Alp Yurtsever, Madeleine Udell, and Volkan Cevher. Streaming low-rank matrix approximation with an application to scientific simulation. SIAM Journal on Scientific Computing, 41(4):A2430–A2463, 2019.
  • Veldt et al. [2019] Nate Veldt, David F Gleich, Anthony Wirth, and James Saunderson. Metric-constrained optimization for graph clustering algorithms. SIAM Journal on Mathematics of Data Science, 1(2):333–355, 2019.
  • Wang et al. [2013] Yubo Wang, Linli Xu, Yucheng Chen, and Hao Wang. A scalable approach for general correlation clustering. In International Conference on Advanced Data Mining and Applications, pages 13–24. Springer, 2013.
  • Woodruff [2014] David P Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014.
  • Ye [2003] Yinyu Ye. GSet random graphs. https://www.cise.ufl.edu/research/sparse/matrices/Gset/, 2003. [accessed April-2021].
  • Yurtsever et al. [2017] Alp Yurtsever, Madeleine Udell, Joel A Tropp, and Volkan Cevher. Sketchy decisions: Convex low-rank matrix optimization with optimal storage. In Artificial Intelligence and Statistics, pages 1188–1196, 2017.
  • Yurtsever et al. [2021] Alp Yurtsever, Joel A Tropp, Olivier Fercoq, Madeleine Udell, and Volkan Cevher. Scalable semidefinite programming. SIAM Journal on Mathematics of Data Science, 3(1):171–200, 2021.

Appendix A Proofs

A.1 Proof of Lemma 2.1

Proof.

Let $\vartheta\in\mathbb{R}^{d_{1}}$ and $\mu\in\mathbb{R}^{d_{2}}$ be the dual variables corresponding to the $d_{1}$ equality constraints and the $d_{2}$ inequality constraints respectively. The dual of (SDP) is

\min_{\vartheta,\mu}\quad\sum_{i=1}^{d_{1}}b^{(1)}_{i}\vartheta_{i}+\sum_{j=1}^{d_{2}}b^{(2)}_{j}\mu_{j}\quad\textup{subject to}\quad\begin{cases}\sum\limits_{i=1}^{d_{1}}\vartheta_{i}A^{(1)}_{i}+\sum\limits_{j=1}^{d_{2}}\mu_{j}A^{(2)}_{j}-C\succeq 0\\ \mu\leq 0,\end{cases} \qquad (DSDP)

where the $A^{(2)}_{j}$'s for $j=1,\dotsc,d_{2}$ are assumed to be symmetric.

Lower bound on the objective.

Let $X^{\star}$ be an optimal solution to (SDP) and let $X_{FW}$ be an optimal solution to (SDP-LSE). For ease of notation, let

u=\mathcal{A}^{(1)}(X)-b^{(1)}\quad\textrm{and}\quad v=b^{(2)}-\mathcal{A}^{(2)}(X), \qquad (A.1)

and define $(\widehat{u}_{\epsilon},\widehat{v}_{\epsilon})$, $(u_{FW},v_{FW})$ and $(u^{\star},v^{\star})$ by substituting $\widehat{X}_{\epsilon}$, $X_{FW}$ and $X^{\star}$ respectively in (A.1). Since $\widehat{X}_{\epsilon}$ is an $\epsilon$-optimal solution to (SDP-LSE) and $X^{\star}$ is a feasible solution to (SDP-LSE), the following holds:

\langle C,\widehat{X}_{\epsilon}\rangle-\beta\phi_{M}(\widehat{u}_{\epsilon},\widehat{v}_{\epsilon})\geq\langle C,X_{FW}\rangle-\beta\phi_{M}(u_{FW},v_{FW})-\epsilon\geq\langle C,X^{\star}\rangle-\beta\phi_{M}(u^{\star},v^{\star})-\epsilon. \qquad (A.2)

We know that $(u^{\star},v^{\star})$ corresponds to a point feasible for (SDP), so that $\phi_{M}(u^{\star},v^{\star})\leq\frac{\log(2d_{1}+d_{2})}{M}$. Now, rearranging the terms, and using the upper bound on $\phi_{M}(u^{\star},v^{\star})$ and the fact that $\phi_{M}(\widehat{u}_{\epsilon},\widehat{v}_{\epsilon})\geq 0$,

\langle C,\widehat{X}_{\epsilon}\rangle\geq\langle C,X^{\star}\rangle-\frac{\beta\log(2d_{1}+d_{2})}{M}-\epsilon. \qquad (A.3)

Upper bound on the objective.

The Lagrangian of (SDP) is L(X,ϑ,μ)=C,Xi=1d1uiϑi+j=1d2vjμjL(X,\vartheta,\mu)=\langle C,X\rangle-\sum_{i=1}^{d_{1}}u_{i}\vartheta_{i}+\sum_{j=1}^{d_{2}}v_{j}\mu_{j}. For a primal-dual optimal pair, (X,ϑ,μX^{\star},\vartheta^{\star},\mu^{\star}), and X^ϵ0\widehat{X}_{\epsilon}\succeq 0, we have that L(X^ϵ,ϑ,μ)L(X,ϑ,μ)L(\widehat{X}_{\epsilon},\vartheta^{\star},\mu^{\star})\leq L(X^{\star},\vartheta^{\star},\mu^{\star}), i.e.,

C,X^ϵi=1d1ϑi[u^ϵ]i+j=1d2μj[v^ϵ]jC,Xi=1d1ϑiui+j=1d2μjvjC,X.\begin{split}\langle C,\widehat{X}_{\epsilon}\rangle-\sum_{i=1}^{d_{1}}\vartheta_{i}^{\star}[\widehat{u}_{\epsilon}]_{i}+\sum_{j=1}^{d_{2}}\mu_{j}^{\star}[\widehat{v}_{\epsilon}]_{j}&\leq\langle C,X^{\star}\rangle-\sum_{i=1}^{d_{1}}\vartheta^{\star}_{i}u^{\star}_{i}+\sum_{j=1}^{d_{2}}\mu^{\star}_{j}v^{\star}_{j}\\ &\leq\langle C,X^{\star}\rangle.\end{split}

Rearranging the terms, using the duality of the 1\ell_{1} and \ell_{\infty} norms, and the fact that μ0\mu^{\star}\leq 0, gives

C,X^ϵC,X+i=1d1ϑi[u^ϵ]ij=1d2μj[v^ϵ]jC,X+(i=1d1|ϑi|)u^ϵ+(j=1d2μj)maxj[v^ϵ]jC,X+[ϑ,μ]1max{u^ϵ,maxj[v^ϵ]j}.\begin{split}\langle C,\widehat{X}_{\epsilon}\rangle&\leq\langle C,X^{\star}\rangle+\sum_{i=1}^{d_{1}}\vartheta^{\star}_{i}[\widehat{u}_{\epsilon}]_{i}-\sum_{j=1}^{d_{2}}\mu^{\star}_{j}[\widehat{v}_{\epsilon}]_{j}\\ &\leq\langle C,X^{\star}\rangle+\left(\sum_{i=1}^{d_{1}}|\vartheta^{\star}_{i}|\right)\|\widehat{u}_{\epsilon}\|_{\infty}+\left(\sum_{j=1}^{d_{2}}-\mu^{\star}_{j}\right)\max_{j}[\widehat{v}_{\epsilon}]_{j}\\ &\leq\langle C,X^{\star}\rangle+\|[\vartheta^{\star},\mu^{\star}]\|_{1}\max\left\{\|\widehat{u}_{\epsilon}\|_{\infty},\max_{j}[\widehat{v}_{\epsilon}]_{j}\right\}.\end{split} (A.4)

Bound on infeasibility.

Using (A.4), we rewrite (A.2) as,

βϕM(u^ϵ,v^ϵ)C,X^ϵC,X+βϕM(u,v)+ϵ[ϑ,μ]1max{u^ϵ,maxj[v^ϵ]j}+βlog(2d1+d2)M+ϵ.\begin{split}\beta\phi_{M}(\widehat{u}_{\epsilon},\widehat{v}_{\epsilon})&\leq\langle C,\widehat{X}_{\epsilon}\rangle-\langle C,X^{\star}\rangle+\beta\phi_{M}(u^{\star},v^{\star})+\epsilon\\ &\leq\|[\vartheta^{\star},\mu^{\star}]\|_{1}\max\left\{\|\widehat{u}_{\epsilon}\|_{\infty},\max_{j}[\widehat{v}_{\epsilon}]_{j}\right\}+\beta\frac{\log(2d_{1}+d_{2})}{M}+\epsilon.\end{split} (A.5)

Combining the lower bound on ϕM()\phi_{M}(\cdot) given in (2.2) with (A.5) and since β>[ϑ,μ]1\beta>\|[\vartheta^{\star},\mu^{\star}]\|_{1} by assumption, we have

max{u^ϵ,maxj[v^ϵ]j}βlog(2d1+d2)M+ϵβ[ϑ,μ]1.\max\left\{\|\widehat{u}_{\epsilon}\|_{\infty},\max_{j}[\widehat{v}_{\epsilon}]_{j}\right\}\leq\frac{\beta\frac{\log(2d_{1}+d_{2})}{M}+\epsilon}{\beta-\|[\vartheta^{\star},\mu^{\star}]\|_{1}}. (A.6)

Completing the upper bound on the objective.

Substituting (A.6) into (A.4) gives

C,X^ϵC,X+[ϑ,μ]1βlog(2d1+d2)M+ϵβ[ϑ,μ]1.\langle C,\widehat{X}_{\epsilon}\rangle\leq\langle C,X^{\star}\rangle+\|[\vartheta^{\star},\mu^{\star}]\|_{1}\frac{\beta\frac{\log(2d_{1}+d_{2})}{M}+\epsilon}{\beta-\|[\vartheta^{\star},\mu^{\star}]\|_{1}}. (A.7)

A.2 Proof of Lemma 3.1

Proof.

The proof consists of three parts.

Lower bound on the objective.

Substituting the values of β\beta and MM, and replacing ϵ\epsilon by ϵTr(C)\epsilon\textup{Tr}(C) in (A.3), we have

C,X^ϵC,XR2ϵTr(C).\langle C,\widehat{X}_{\epsilon}\rangle\geq\langle C,X^{\star}_{R}\rangle-2\epsilon\textup{Tr}(C). (A.8)

Since the identity matrix II is strictly feasible for (k-Cut-Rel), Tr(C)C,XR\textup{Tr}(C)\leq\langle C,X^{\star}_{R}\rangle. Combining this fact with (A.8) gives,

C,X^ϵ(12ϵ)C,XR.\langle C,\widehat{X}_{\epsilon}\rangle\geq(1-2\epsilon)\langle C,X^{\star}_{R}\rangle.

Bound on infeasibility.

For (k-Cut-Rel), let ν=[ν(1),ν(2)]n+|E|\nu=[\nu^{(1)},\nu^{(2)}]\in\mathbbm{R}^{n+|E|} be a dual variable such that νi(1)\nu^{(1)}_{i} for i=1,,ni=1,\dotsc,n are the variables corresponding to nn equality constraints and νij(2)\nu^{(2)}_{ij} for (i,j)E,i<j(i,j)\in E,i<j are the dual variables corresponding to |E||E| inequality constraints. Following (DSDP), the dual of (k-Cut-Rel) is

minνi=1nνi(1)1k1ijEi<jνij(2)subject to{diag(ν(1))+ijEi<j[eiejT+ejeiT]νij(2)2C0ν(2)0.\min_{\nu}\sum\limits_{i=1}^{n}\nu^{(1)}_{i}-\frac{1}{k-1}\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)}_{ij}\quad\textup{subject to}\quad\begin{cases}&\textup{diag}^{*}(\nu^{(1)})+\sum\limits_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}[e_{i}e_{j}^{T}+e_{j}e_{i}^{T}]\frac{\nu^{(2)}_{ij}}{2}-C\succeq 0\\ &\nu^{(2)}\leq 0.\end{cases} (Dual-Relax)

Let ν\nu^{\star} be an optimal dual solution. In order to bound the infeasibility using (A.6), we need a bound on ν1\|\nu^{\star}\|_{1} which is given by the following lemma.

Lemma A.1.

The value of ν1\|\nu^{\star}\|_{1} is upper bounded by 4Tr(C)4\textup{Tr}(C).

Proof.

The matrix CC is a scaled Laplacian and so, the only off-diagonal entries that are nonzero correspond to (i,j)E(i,j)\in E and have value less than zero. For (Dual-Relax), a feasible solution is ν(1)=diag(C)\nu^{(1)}=\textup{diag}(C), νij(2)=2Cij\nu^{(2)}_{ij}=2C_{ij} for (i,j)E,i<j(i,j)\in E,i<j. The optimal objective function value of (Dual-Relax) is then upper bounded by

i=1nνi(1)1k1ijEi<jνij(2)Tr(C)+1k1Tr(C)=kk1Tr(C)\displaystyle\sum_{i=1}^{n}\nu^{(1)\star}_{i}-\frac{1}{k-1}\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq\textup{Tr}(C)+\frac{1}{k-1}\textup{Tr}(C)=\frac{k}{k-1}\textup{Tr}(C) (A.9)
\displaystyle\Rightarrow\quad i=1nνi(1)kk1Tr(C)+1k1ijEi<jνij(2)kk1Tr(C),\displaystyle\sum_{i=1}^{n}\nu^{(1)\star}_{i}\leq\frac{k}{k-1}\textup{Tr}(C)+\frac{1}{k-1}\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq\frac{k}{k-1}\textup{Tr}(C), (A.10)

where the last inequality follows since ν(2)0\nu^{(2)\star}\leq 0.

We have diag(ν(1))+ijEi<j[eiejT+ejeiT]νij(2)2,𝟙𝟙TC,𝟙𝟙T0\left\langle\textup{diag}^{*}(\nu^{(1)\star})+\sum\limits_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}[e_{i}e_{j}^{T}+e_{j}e_{i}^{T}]\frac{\nu^{(2)\star}_{ij}}{2},\mathbbm{1}\mathbbm{1}^{T}\right\rangle-\langle C,\mathbbm{1}\mathbbm{1}^{T}\rangle\geq 0 since both matrices are PSD. Using the fact that 𝟙\mathbbm{1} is in the null space of CC, we get

ijEi<jνij(2)i=1nνi(1).-\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq\sum_{i=1}^{n}\nu^{(1)\star}_{i}. (A.11)

Since ν(2)0\nu^{(2)\star}\leq 0, we can write

ν1=i=1n|νi(1)|ijEi<jνij(2)2i=1nνi(1),\|\nu^{\star}\|_{1}=\sum_{i=1}^{n}|\nu^{(1)\star}_{i}|-\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq 2\sum_{i=1}^{n}\nu^{(1)\star}_{i}, (A.12)

which follows from (A.11) and the fact that for the dual to be feasible we have ν(1)0\nu^{(1)}\geq 0 since CC has nonnegative entries on the diagonal. Substituting (A.10) in (A.12),

ν12kk1Tr(C)4Tr(C),\|\nu^{\star}\|_{1}\leq\frac{2k}{k-1}\textup{Tr}(C)\leq 4\textup{Tr}(C), (A.13)

where the last inequality follows since k/(k1)2k/(k-1)\leq 2 for k2k\geq 2. ∎

Since X^ϵ\widehat{X}_{\epsilon} is an ϵTr(C)\epsilon\textup{Tr}(C)-optimal solution to (k-Cut-LSE), we replace ϵ\epsilon by ϵTr(C)\epsilon\textup{Tr}(C) in (A.6). Finally, substituting (A.13) into (A.6), and setting β=6Tr(C)\beta=6\textup{Tr}(C) and M=6log(2n+|E|)ϵM=6\frac{\log(2n+|E|)}{\epsilon},

max{diag(X^ϵ)𝟙,maxijE,i<j1k1[X^ϵ]ij}ϵ.\max\left\{\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty},\max_{ij\in E,i<j}-\frac{1}{k-1}-[\widehat{X}_{\epsilon}]_{ij}\right\}\leq\epsilon. (A.14)

This condition can also be stated as

diag(X^ϵ)𝟙ϵ,[X^ϵ]ij1k1ϵ(i,j)E,i<j.\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty}\leq\epsilon,\quad[\widehat{X}_{\epsilon}]_{ij}\geq-\frac{1}{k-1}-\epsilon\quad(i,j)\in E,i<j.

Upper bound on the objective.

Substituting (A.14) and (A.13) and the values of parameters β\beta and MM into (A.7) gives

C,X^ϵC,XR+4Tr(C)ϵ(1+4ϵ)C,XR,\langle C,\widehat{X}_{\epsilon}\rangle\leq\langle C,X^{\star}_{R}\rangle+4\textup{Tr}(C)\epsilon\leq(1+4\epsilon)\langle C,X^{\star}_{R}\rangle,

where the last inequality follows since Tr(C)C,XR\textup{Tr}(C)\leq\langle C,X^{\star}_{R}\rangle. ∎

A.3 Proof of Lemma 3.2

Proof.

We first show that Algorithm 2 generates random Gaussian samples whose covariance is feasible to (k-Cut-Rel).

Proposition A.1.

Given kk Gaussian random vectors z1,,zk𝒩(0,X^ϵ)z_{1},\dotsc,z_{k}\sim\mathcal{N}(0,\widehat{X}_{\epsilon}), such that their covariance X^ϵ\widehat{X}_{\epsilon} satisfies the inequality (3.4), the Gaussian random vectors z1f,,zkf𝒩(0,Xf)z^{f}_{1},\dotsc,z^{f}_{k}\sim\mathcal{N}(0,X^{f}) generated by Algorithm 2 have covariance XfX^{f} that is a feasible solution to (k-Cut-Rel).

Proof.

Define X¯=X^ϵ+err𝟙𝟙T\overline{X}=\widehat{X}_{\epsilon}+\textup{err}\mathbbm{1}\mathbbm{1}^{T}. Note that, X¯0\overline{X}\succeq 0 and it satisfies the following properties:

  1. 1.

    Since X^ϵ\widehat{X}_{\epsilon} satisfies (3.4), we have errϵ\textup{err}\leq\epsilon. Combining this fact with the definition of X¯\overline{X}, we have X¯jl1k1\overline{X}_{jl}\geq-\frac{1}{k-1} for (j,l)E,j<l(j,l)\in E,j<l.

  2. 2.

    Furthermore, diag(X¯)=diag(X^ϵ)+err\textup{diag}(\overline{X})=\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}, which when combined with (3.4), gives 1diag(X¯)1+2err1\leq\textup{diag}(\overline{X})\leq 1+2\textup{err}.

  3. 3.

    For y𝒩(0,1)y\sim\mathcal{N}(0,1), if z¯i=zi+erry𝟙\overline{z}_{i}=z_{i}+\sqrt{\textup{err}}y\mathbbm{1}, i.e., it is a sum of two Gaussian random vectors, then z¯i𝒩(0,X¯)\overline{z}_{i}\sim\mathcal{N}(0,\overline{X}).

Steps 5 and 6 of Algorithm 2 then generate a zero-mean random vector zfz^{f} whose covariance is

Xf=X¯max(diag(X¯))+(Idiag(diag(X¯)max(diag(X¯)))),X^{f}=\frac{\overline{X}}{\textup{max}(\textup{diag}(\overline{X}))}+\left(I-\textup{diag}^{*}\left(\frac{\textup{diag}(\overline{X})}{\max(\textup{diag}(\overline{X}))}\right)\right), (A.15)

i.e., zf𝒩(0,Xf)z^{f}\sim\mathcal{N}(0,X^{f}). Furthermore, XfX^{f} is feasible to (k-Cut-Rel) since diag(Xf)=𝟙\textup{diag}(X^{f})=\mathbbm{1}, Xjlf1k1X^{f}_{jl}\geq-\frac{1}{k-1} for (j,l)E,j<l(j,l)\in E,j<l, and it is a sum of two PSD matrices so that Xf0X^{f}\succeq 0. ∎
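For concreteness, the repair described in Proposition A.1 can be carried out directly on the samples, without ever forming X^ϵ\widehat{X}_{\epsilon} as a matrix. The following Python sketch is only illustrative: the function name repair_samples and the assumption that the diagonal of X^ϵ\widehat{X}_{\epsilon} and the scalar err are tracked separately (both need only 𝒪(n)\mathcal{O}(n) memory) are ours and are not part of Algorithm 2 as stated.

\begin{verbatim}
import numpy as np

def repair_samples(Z, diag_X, err, rng=None):
    # Z      : (k, n) array whose rows are samples z_1, ..., z_k ~ N(0, X_hat_eps)
    # diag_X : length-n array holding diag(X_hat_eps)
    # err    : scalar infeasibility estimate, err <= eps
    # Returns an array whose rows have covariance X^f, feasible to (k-Cut-Rel).
    rng = np.random.default_rng() if rng is None else rng
    k, n = Z.shape
    # z_bar_i = z_i + sqrt(err) * y_i * 1 has covariance X_bar = X_hat + err * 11^T
    y = rng.standard_normal(k)
    Z_bar = Z + np.sqrt(err) * y[:, None] * np.ones(n)
    diag_Xbar = diag_X + err
    m = diag_Xbar.max()
    # scale so that the largest diagonal entry of the covariance becomes 1 ...
    Z_scaled = Z_bar / np.sqrt(m)
    # ... and add an independent diagonal Gaussian that lifts every diagonal entry
    # back to 1, realizing the term I - diag*(diag(X_bar)/m) in (A.15)
    resid_var = 1.0 - diag_Xbar / m
    G = rng.standard_normal((k, n)) * np.sqrt(resid_var)
    return Z_scaled + G
\end{verbatim}

The covariance of each returned row is X¯/max(diag(X¯))+(Idiag(diag(X¯)/max(diag(X¯))))\overline{X}/\max(\textup{diag}(\overline{X}))+(I-\textup{diag}^{*}(\textup{diag}(\overline{X})/\max(\textup{diag}(\overline{X})))), which is exactly XfX^{f} in (A.15).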

The objective function value of (k-Cut-Rel) at XfX^{f} (defined in (A.15)) is

C,Xf\displaystyle\langle C,X^{f}\rangle =C,X^ϵ+err𝟙𝟙Tmax(diag(X^ϵ))+err+(Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err))\displaystyle=\left\langle C,\frac{\widehat{X}_{\epsilon}+\textup{err}\mathbbm{1}\mathbbm{1}^{T}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}+\left(I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right)\right)\right\rangle (A.16)
(i)C,X^ϵmax(diag(X^ϵ))+err(ii)12ϵ1+2ϵC,XR(iii)(14ϵ)C,XR,\displaystyle\underset{(i)}{\geq}\ \frac{\langle C,\widehat{X}_{\epsilon}\rangle}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\ \underset{(ii)}{\geq}\frac{1-2\epsilon}{1+2\epsilon}\langle C,X^{\star}_{R}\rangle\ \underset{(iii)}{\geq}\ (1-4\epsilon)\langle C,X^{\star}_{R}\rangle, (A.17)

where (i) follows from the fact that both CC and err𝟙𝟙Tmax(diag(X^ϵ))+err+Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err)\frac{\textup{err}\mathbbm{1}\mathbbm{1}^{T}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}+I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right) are PSD, and so their inner product is nonnegative, (ii) follows from Lemma 3.1 and the fact that errϵ\textup{err}\leq\epsilon, and (iii) uses the fact that 12ϵ(1+2ϵ)(14ϵ)1-2\epsilon\geq(1+2\epsilon)(1-4\epsilon). Let 𝔼[CUT]\mathbb{E}[\texttt{CUT}] denote the expected value of the cut generated from the samples zifz^{f}_{i}. Combining (A.17) with the inequality 𝔼[CUT]C,Xfαk\frac{\mathbb{E}[\texttt{CUT}]}{\langle C,X^{f}\rangle}\geq\alpha_{k} (see (3.2)), we have

𝔼[CUT]αkC,Xfαk(14ϵ)C,XRαk(14ϵ)optkG.\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}\langle C,X^{f}\rangle\geq\alpha_{k}(1-4\epsilon)\langle C,X^{\star}_{R}\rangle\geq\alpha_{k}(1-4\epsilon)\textup{opt}_{k}^{G}. (A.18)

A.4 Proof of Lemma 3.3

Proof.

We use Algorithm 1 with p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)} and T(n,ϵ)=144log(2n+|E|)n2ϵ2T(n,\epsilon)=\frac{144\log(2n+|E|)n^{2}}{\epsilon^{2}} to generate an ϵTr(C)\epsilon\textup{Tr}(C)-optimal solution to (k-Cut-LSE). We first bound the outer iteration complexity, i.e., the number of iterations of Algorithm 1 until convergence. This value also denotes the number of times the subproblem LMO is solved.

Upper bound on outer iteration complexity.

Let the objective function of (k-Cut-LSE) be g(X)=C,XβϕM(diag(X)𝟙,[1k1eiTXej](i,j)E)g(X)=\langle C,X\rangle-\beta\phi_{M}\left(\textup{diag}(X)-\mathbbm{1},\left[-\frac{1}{k-1}-e_{i}^{T}Xe_{j}\right]_{(i,j)\in E}\right).
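The penalty ϕM\phi_{M} from (2.1) is not reproduced in this appendix. Assuming the standard log-sum-exp (softmax) form of (2.1), it can be evaluated from the constraint residuals alone, as in the following Python sketch; the function name phi_M is ours and the exact form is an assumption consistent with the bounds used in the proof of Lemma 2.1.

\begin{verbatim}
import numpy as np
from scipy.special import logsumexp

def phi_M(u, v, M):
    # Assumed log-sum-exp form of the smoothed penalty phi_M in (2.1):
    #   phi_M(u, v) = (1/M) * log( sum_i (e^{M u_i} + e^{-M u_i}) + sum_j e^{M v_j} ).
    # It satisfies phi_M(u, v) >= max{ ||u||_inf, max_j v_j }  (the lower bound (2.2))
    # and phi_M(u, v) <= max{ ||u||_inf, max_j v_j } + log(2*d1 + d2)/M.
    terms = np.concatenate([M * np.asarray(u), -M * np.asarray(u), M * np.asarray(v)])
    return logsumexp(terms) / M

# For (k-Cut-LSE), u = diag(X) - 1 and v_{ij} = -1/(k-1) - X_{ij} for (i, j) in E,
# so that g(X) = <C, X> - beta * phi_M(u, v).
\end{verbatim}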

Theorem A.1.

Let g(X)g(X) be a concave and differentiable function and XX^{\star} an optimal solution of (k-Cut-LSE). Let CguC_{g}^{u} be an upper bound on the curvature constant of gg, and let η0\eta\geq 0 be the accuracy parameter for LMO. Then XtX_{t} satisfies

g(Xt)+g(X)2Cgu(1+η)t+2,-g(X_{t})+g(X^{\star})\leq\frac{2C_{g}^{u}(1+\eta)}{t+2}, (A.19)

with probability at least (1p)t1tp(1-p)^{t}\geq 1-tp.

The result follows from [18, Theorem 1] when LMO generates a solution with approximation error at most 12ηγCgu\frac{1}{2}\eta\gamma C_{g}^{u} with probability 1p1-p. Now, η(0,1)\eta\in(0,1) is an appropriately chosen constant, and from [28, Lemma 3.1], an upper bound CguC_{g}^{u} on the curvature constant of g(X)g(X) is βMn2\beta Mn^{2}. Thus, after at most

T=2Cgu(1+η)ϵTr(C)2=2βMn2(1+η)ϵTr(C)2T=\frac{2C_{g}^{u}(1+\eta)}{\epsilon\textup{Tr}(C)}-2=\frac{2\beta Mn^{2}(1+\eta)}{\epsilon\textup{Tr}(C)}-2 (A.20)

iterations, Algorithm 1 generates an ϵTr(C)\epsilon\textup{Tr}(C)-optimal solution to (k-Cut-LSE).

Bound on the approximate kk-cut value.

From Theorem A.1, we see that after at most TT iterations, Algorithm 1 generates a solution X^ϵ\widehat{X}_{\epsilon} that satisfies the bounds in Lemma 3.1 with probability at least 1ϵ1-\epsilon when p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)}. Consequently, the bound given in (A.17) also holds with probability at least 1ϵ1-\epsilon. And so, the expected value of C,Xf\langle C,X^{f}\rangle is 𝔼[C,Xf](14ϵ)C,XR(1ϵ)(15ϵ)C,XR\mathbb{E}[\langle C,X^{f}\rangle]\geq(1-4\epsilon)\langle C,X^{\star}_{R}\rangle(1-\epsilon)\geq(1-5\epsilon)\langle C,X^{\star}_{R}\rangle. Finally, from (A.18), the expected value of the kk-cut, denoted by 𝔼[CUT]\mathbb{E}[\texttt{CUT}], is bounded as

𝔼[CUT]=𝔼L[𝔼G[CUT]]αk𝔼L[C,Xf]αk(15ϵ)C,XRαk(15ϵ)optkG,\mathbb{E}[\texttt{CUT}]=\mathbb{E}_{L}[\mathbb{E}_{G}[\texttt{CUT}]]\geq\alpha_{k}\mathbb{E}_{L}[\langle C,X^{f}\rangle]\geq\alpha_{k}(1-5\epsilon)\langle C,X^{\star}_{R}\rangle\geq\alpha_{k}(1-5\epsilon)\textup{opt}_{k}^{G}, (A.21)

where 𝔼L[]\mathbb{E}_{L}[\cdot] denotes the expectation over the randomness in the subproblem LMO and 𝔼G[]\mathbb{E}_{G}[\cdot] denotes the expectation over random Gaussian samples.

Finally, we compute an upper bound on the complexity of each iteration, i.e., inner iteration complexity, of Algorithm 1.

Upper bound on inner iteration complexity.

At each iteration tt, Algorithm 1 solves the subproblem LMO, which generates a unit vector hh, such that

αhhT,gtmaxd𝒮αd,gt12ηγtCgu,\alpha\langle hh^{T},\nabla g_{t}\rangle\geq\max_{d\in\mathcal{S}}\alpha\langle d,\nabla g_{t}\rangle-\frac{1}{2}\eta\gamma_{t}C_{g}^{u}, (A.22)

where γt=2t+2\gamma_{t}=\frac{2}{t+2}, gt=g(Xt)\nabla g_{t}=\nabla g(X_{t}) and 𝒮={X0:Tr(X)n}\mathcal{S}=\{X\succeq 0:\textup{Tr}(X)\leq n\}. Note that this problem is equivalent to approximately computing the maximum eigenvector of the matrix gt\nabla g_{t}, which can be done using the Lanczos algorithm [22].
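As an illustration of this step, the following Python sketch computes such a vector hh matrix-free. Here scipy's eigsh (an implicitly restarted Lanczos method) is used purely as a stand-in for the randomized Lanczos method of [22], and the helper name lmo_top_eigvec is ours.

\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def lmo_top_eigvec(grad_matvec, n, tol=1e-3):
    # grad_matvec(v) returns (grad g_t) @ v; since grad g_t has O(|E| + n) nonzero
    # entries, each product costs O(|E| + n) time and memory.
    A = LinearOperator((n, n), matvec=grad_matvec, dtype=float)
    vals, vecs = eigsh(A, k=1, which='LA', tol=tol)  # largest algebraic eigenvalue
    h = vecs[:, 0]
    return h / np.linalg.norm(h)

# The Frank-Wolfe update in Algorithm 1 then uses the rank-one direction
# alpha * h h^T with alpha = n, stored implicitly through h.
\end{verbatim}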

Lemma A.2 (Convergence of Lanczos algorithm).

Let ρ(0,1]\rho\in(0,1] and p(0,1/2]p\in(0,1/2]. For gt𝕊n\nabla g_{t}\in\mathbb{S}_{n}, the Lanczos method [22] computes a vector hnh\in\mathbb{R}^{n} that satisfies

hTgthλmax(gt)ρ8gth^{T}\nabla g_{t}h\geq\lambda_{\textup{max}}(\nabla g_{t})-\frac{\rho}{8}\|\nabla g_{t}\| (A.23)

with probability at least 12p1-2p, after q12+1ρlog(n/p2)q\geq\frac{1}{2}+\frac{1}{\sqrt{\rho}}\log(n/p^{2}) iterations.

This result is an adaptation of [22, Theorem 4.2], which establishes the convergence of the Lanczos method for approximately computing the minimum eigenvalue and the corresponding eigenvector of a symmetric matrix. Let N=12+1ρlog(n/p2)N=\frac{1}{2}+\frac{1}{\sqrt{\rho}}\log(n/p^{2}). We now derive an upper bound on NN.

Comparing (A.23) and (A.22), we see that

12ηγtCgu=αρ8gt1ρ=αgt4ηγtCgu\begin{split}&\frac{1}{2}\eta\gamma_{t}C_{g}^{u}=\alpha\frac{\rho}{8}\|\nabla g_{t}\|\\ \Rightarrow&\frac{1}{\rho}=\frac{\alpha\|\nabla g_{t}\|}{4\eta\gamma_{t}C_{g}^{u}}\end{split} (A.24)

Substituting the value of γt\gamma_{t} in the equation above, and noting that γt=2t+22T+2\gamma_{t}=\frac{2}{t+2}\geq\frac{2}{T+2}, we have

1ρ=αgt(t+2)8ηCguαgt(T+2)8ηCgu=αgt(1+η)4ηϵTr(C),\begin{split}\frac{1}{\rho}=\frac{\alpha\|\nabla g_{t}\|(t+2)}{8\eta C_{g}^{u}}\leq\frac{\alpha\|\nabla g_{t}\|(T+2)}{8\eta C_{g}^{u}}=\frac{\alpha\|\nabla g_{t}\|(1+\eta)}{4\eta\epsilon\textup{Tr}(C)},\end{split} (A.25)

where the last equality follows from substituting the value of TT (see (A.20)). We now derive an upper bound on gt\|\nabla g_{t}\|.

Lemma A.3.

Let g(X)=C,XβϕM(diag(X)𝟙,[1k1eiTXej](i,j)E)g(X)=\langle C,X\rangle-\beta\phi_{M}\left(\textup{diag}(X)-\mathbbm{1},\left[-\frac{1}{k-1}-e_{i}^{T}Xe_{j}\right]_{(i,j)\in E}\right), where ϕM()\phi_{M}(\cdot) is defined in (2.1). We have gtTr(C)(1+6(2|E|+n))\|\nabla g_{t}\|\leq\textup{Tr}(C)(1+6(\sqrt{2|E|+n})).

Proof.

For the function g(X)g(X) as defined in the lemma, gt=CβD\nabla g_{t}=C-\beta D, where DD is a matrix such that Dii[1,1]D_{ii}\in[-1,1] for i=1,,ni=1,\dotsc,n, Dij[1,1]D_{ij}\in[-1,1] for (i,j)E(i,j)\in E, and Dij=0D_{ij}=0 for (i,j)E(i,j)\notin E. Thus, we have

maxk|λk(D)|Tr(DTD)=i,j=1n|Dij|22|E|+n,\max_{k}|\lambda_{k}(D)|\leq\sqrt{\textup{Tr}(D^{T}D)}=\sqrt{\sum_{i,j=1}^{n}|D_{ij}|^{2}}\leq\sqrt{2|E|+n}, (A.26)

where the last inequality follows since there are at most 2|E|2|E| off-diagonal and nn diagonal nonzero entries in the matrix DD with each nonzero entry in the range [1,1][-1,1]. Now,

gt=CβD(i)C+βDmaxi|λi(C)|+maxi|λi(βD)|(ii)Tr(C)+β2|E|+n(iii)Tr(C)(1+6(2|E|+n)).\begin{split}\|\nabla g_{t}\|=\|C-\beta D\|&\underset{(i)}{\leq}\|C\|+\|-\beta D\|\\ &\leq\max_{i}|\lambda_{i}(C)|+\max_{i}|\lambda_{i}(-\beta D)|\\ &\underset{(ii)}{\leq}\textup{Tr}(C)+\beta\sqrt{2|E|+n}\\ &\underset{(iii)}{\leq}\textup{Tr}(C)(1+6(\sqrt{2|E|+n})).\end{split} (A.27)

where (i) follows from the triangle inequality for the spectral norm of CβDC-\beta D, (ii) follows from (A.26) and the fact that CC is a scaled graph Laplacian and hence positive semidefinite, so that its spectral norm is at most Tr(C)\textup{Tr}(C), and (iii) follows by substituting β=6Tr(C)\beta=6\textup{Tr}(C) as given in Lemma 3.1. ∎

Substituting α=n\alpha=n, and the bound on gt\|\nabla g_{t}\| in (A.25), we have

1ρ1+η4ηn(1+6(2|E|+n))ϵ,and\displaystyle\frac{1}{\rho}\leq\frac{1+\eta}{4\eta}\frac{n(1+6(\sqrt{2|E|+n}))}{\epsilon},\quad\textup{and} (A.28)
N=12+1ρlog(n/p2)12+1+η4ηn(1+6(2|E|+n))ϵlog(n/p2)=Nu.\displaystyle N=\frac{1}{2}+\frac{1}{\sqrt{\rho}}\log(n/p^{2})\leq\frac{1}{2}+\sqrt{\frac{1+\eta}{4\eta}}\sqrt{\frac{n(1+6(\sqrt{2|E|+n}))}{\epsilon}}\log(n/p^{2})=N^{u}. (A.29)

Finally, each iteration of the Lanczos method performs a matrix-vector multiplication with gt\nabla g_{t}, which has at most 2|E|+n2|E|+n nonzero entries, and 𝒪(n)\mathcal{O}(n) additional arithmetic operations. Thus, the computational complexity of the Lanczos method is 𝒪(Nu(|E|+n))\mathcal{O}(N^{u}(|E|+n)). Moreover, Algorithm 1 performs 𝒪(|E|+n)\mathcal{O}(|E|+n) additional arithmetic operations, so the total inner iteration complexity is 𝒪(Nu(|E|+n))\mathcal{O}(N^{u}(|E|+n)), which can be written as 𝒪(n|E|1.25ϵlog(n/p2))\mathcal{O}\left(\frac{\sqrt{n}|E|^{1.25}}{\sqrt{\epsilon}}\log(n/p^{2})\right).
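A possible realization of this sparse matrix-vector product is sketched below in Python. The representation of DD through its diagonal and edge values is an assumption about how the gradient of the penalty term is stored, and the helper name make_grad_matvec is ours.

\begin{verbatim}
import numpy as np
import scipy.sparse as sp

def make_grad_matvec(C, d_diag, d_edges, edges, beta):
    # C       : scaled Laplacian in CSR form (O(|E| + n) nonzeros)
    # d_diag  : length-n diagonal entries of D, each in [-1, 1]
    # d_edges : one value in [-1, 1] per undirected edge (i, j) in E
    # edges   : (|E|, 2) integer array with i < j
    n = C.shape[0]
    i, j = edges[:, 0], edges[:, 1]
    D = sp.coo_matrix(
        (np.concatenate([d_diag, d_edges, d_edges]),
         (np.concatenate([np.arange(n), i, j]),
          np.concatenate([np.arange(n), j, i]))),
        shape=(n, n)).tocsr()
    G = C - beta * D  # grad g_t, with at most 2|E| + n stored entries
    return lambda v: G @ v
\end{verbatim}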

Computational complexity of Algorithm 1.

Now, substituting p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)}, we have

log(np2)=log((144)2n5(log(2n+|E|))2ϵ6)log((5.3n)6ϵ6)=6log(5.3nϵ),\log\left(\frac{n}{p^{2}}\right)=\log\left(\frac{(144)^{2}n^{5}(\log(2n+|E|))^{2}}{\epsilon^{6}}\right)\leq\log\left(\frac{(5.3n)^{6}}{\epsilon^{6}}\right)=6\log\left(\frac{5.3n}{\epsilon}\right), (A.31)

where the inequality follows since |E|(n12)|E|\leq\binom{n-1}{2}, (log(2n+(n12)))2n\left(\log\left(2n+\binom{n-1}{2}\right)\right)^{2}\leq n for n1n\geq 1 and (5.3)6(144)2(5.3)^{6}\geq(144)^{2}. Substituting the upper bound on log(n/p2)\log(n/p^{2}) into NuN^{u}, and combining the inner iteration complexity, 𝒪(Nu(|E|+n))\mathcal{O}(N^{u}(|E|+n)), with the outer iteration complexity, TT, we see that Algorithm 1 is an 𝒪(n2.5|E|1.25ϵ2.5log(n/ϵ)log(|E|))\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)-time algorithm. ∎

A.5 Proof of Lemma 4.1

Proof.

We need to prove the four inequalities given in Lemma 4.1.

Lower bound on the objective, C,X^ϵ\langle C,\widehat{X}_{\epsilon}\rangle.

Substituting the values of β\beta and MM, and replacing ϵ\epsilon by ϵTr(C)\epsilon\textup{Tr}(C) in (A.3), we have

C,X^ϵC,XG2ϵ(Tr(LG)+ijE+wij+).\langle C,\widehat{X}_{\epsilon}\rangle\geq\langle C,X^{\star}_{G}\rangle-2\epsilon\left(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}\right). (A.32)

Since 0.5I+0.5𝟙𝟙T0.5I+0.5\mathbbm{1}\mathbbm{1}^{T} is feasible for (MA-Rel), 0.5(Tr(LG)+ijE+wij+)C,XG0.5(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij})\leq\langle C,X^{\star}_{G}\rangle. Combining this fact with (A.32), we have

C,X^ϵ(14ϵ)C,XG.\langle C,\widehat{X}_{\epsilon}\rangle\geq(1-4\epsilon)\langle C,X^{\star}_{G}\rangle.

Bound on infeasibility.

Let E=EE+E=E^{-}\cup E^{+} and let ν=[ν(1),ν(2)]n+|E|\nu=[\nu^{(1)},\nu^{(2)}]\in\mathbb{R}^{n+|E|} be the dual variable such that ν(1)\nu^{(1)} is the dual variable corresponding to the nn equality constraints and ν(2)\nu^{(2)} is the dual variable for |E||E| inequality constraints. Following (DSDP), the dual of (MA-Rel) is

minνi=1nνi(1)subject to{diag(ν(1))+ijEi<j[eiejT+ejeiT]νij(2)2C0ν(2)0,\min_{\nu}\quad\sum_{i=1}^{n}\nu^{(1)}_{i}\quad\textup{subject to}\quad\begin{cases}&\textup{diag}^{*}(\nu^{(1)})+\sum\limits_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}[e_{i}e_{j}^{T}+e_{j}e_{i}^{T}]\frac{\nu^{(2)}_{ij}}{2}-C\succeq 0\\ &\nu^{(2)}\leq 0,\end{cases} (Dual-CC)

where C=LG+W+C=L_{G^{-}}+W^{+}. Let ν\nu^{\star} be an optimal dual solution. We derive an upper bound on ν1\|\nu^{\star}\|_{1} in the following lemma, which is then used to bound the infeasibility using (A.6).

Lemma A.4.

The value of ν1\|\nu^{\star}\|_{1} is upper bounded by 2(Tr(LG)+ijE+wij+)2\left(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}\right).

Proof.

For (Dual-CC), νi(1)=[LG]ii+j:ijE+wij+\nu^{(1)}_{i}=[L_{G^{-}}]_{ii}+\sum_{j:ij\in E^{+}}w_{ij}^{+} for i=1,,ni=1,\dotsc,n, and νij(2)=2[LG]ij\nu^{(2)}_{ij}=2[L_{G^{-}}]_{ij} for (i,j)E,i<j(i,j)\in E,i<j is a feasible solution. The optimal objective function value of (Dual-CC) is then upper bounded as

i=1nνi(1)Tr(LG)+ijE+wij+.\sum_{i=1}^{n}\nu^{(1)\star}_{i}\leq\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}. (A.33)

We have diag(ν(1))+ijEi<j[eiejT+ejeiT]νij(2)2C,𝟙𝟙T0\left\langle\textup{diag}^{*}(\nu^{(1)\star})+\sum\limits_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}[e_{i}e_{j}^{T}+e_{j}e_{i}^{T}]\frac{\nu^{(2)\star}_{ij}}{2}-C,\mathbbm{1}\mathbbm{1}^{T}\right\rangle\geq 0 since both matrices are PSD. Using the fact that LG,𝟙𝟙T=0\langle L_{G^{-}},\mathbbm{1}\mathbbm{1}^{T}\rangle=0, and rearranging the terms, we have

ijEi<jνij(2)i=1nνi(1)ijE+wij+.-\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq\sum_{i=1}^{n}\nu^{(1)\star}_{i}-\sum_{ij\in E^{+}}w^{+}_{ij}.

Since ν(2)0\nu^{(2)\star}\leq 0, we can write

ν1=i=1n|νi(1)|ijEi<jνij(2)2i=1nνi(1)ijE+wij+,\|\nu^{\star}\|_{1}=\sum_{i=1}^{n}|\nu^{(1)\star}_{i}|-\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq 2\sum_{i=1}^{n}\nu^{(1)\star}_{i}-\sum_{ij\in E^{+}}w^{+}_{ij}, (A.34)

where we have used the fact that for any dual feasible solution, νi(1)[LG]ii0\nu^{(1)}_{i}\geq[L_{G^{-}}]_{ii}\geq 0 for all i=1,,ni=1,\dotsc,n. Substituting (A.33) in (A.34),

ν12Tr(LG)+ijE+wij+2(Tr(LG)+ijE+wij+).\|\nu^{\star}\|_{1}\leq 2\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}\leq 2\left(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}\right). (A.35)

For Δ=Tr(LG)+ijE+wij+\Delta=\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}, X^ϵ\widehat{X}_{\epsilon} is an ϵΔ\epsilon\Delta-optimal solution to (MA-LSE). And so, we replace ϵ\epsilon by ϵΔ\epsilon\Delta in (A.6). Now, substituting (A.35) and the values of β\beta and MM into (A.6), we get

max{diag(X^ϵ)𝟙,maxijE,i<j[X^ϵ]ij}ϵ.\max\left\{\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty},\max_{ij\in E,i<j}-[\widehat{X}_{\epsilon}]_{ij}\right\}\leq\epsilon. (A.36)

This condition can also be stated as

diag(X^ϵ)𝟙ϵ,[X^ϵ]ijϵ(i,j)E,i<j.\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty}\leq\epsilon,\quad[\widehat{X}_{\epsilon}]_{ij}\geq-\epsilon\quad(i,j)\in E,i<j.

Substituting (A.36),  (A.35) and the values of the parameters β\beta and MM into (A.7) gives

C,X^ϵC,XG+2(Tr(LG)+ijE+wij+)ϵ(1+4ϵ)C,XG,\langle C,\widehat{X}_{\epsilon}\rangle\leq\langle C,X^{\star}_{G}\rangle+2\left(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}\right)\epsilon\leq(1+4\epsilon)\langle C,X^{\star}_{G}\rangle,

where the last inequality follows since 0.5(Tr(LG)+ijE+wij+)C,XG0.5(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij})\leq\langle C,X^{\star}_{G}\rangle, as shown above using the feasibility of 0.5I+0.5𝟙𝟙T0.5I+0.5\mathbbm{1}\mathbbm{1}^{T} for (MA-Rel). ∎

A.6 Proof of Lemma 4.2

Proof.

We first note that Algorithm 2 generates samples whose covariance is feasible to (MA-Rel).

Proposition A.2.

Let z1,z2𝒩(0,X^ϵ)z_{1},z_{2}\sim\mathcal{N}(0,\widehat{X}_{\epsilon}) be Gaussian random vectors such that their covariance X^ϵ\widehat{X}_{\epsilon} satisfies the inequality (4.2). Suppose Step 3 of Algorithm 2 is modified so that err is redefined as err=max{0,max(i,j)E,i<j[X^ϵ]ij}\textup{err}=\max\{0,\max_{(i,j)\in E,i<j}\ -[\widehat{X}_{\epsilon}]_{ij}\}. The Gaussian random vectors z1f,z2f𝒩(0,Xf)z^{f}_{1},z^{f}_{2}\sim\mathcal{N}(0,X^{f}) generated by the modified Algorithm 2 have covariance that is feasible to (MA-Rel).

The proof of Proposition A.2 is the same as the proof of Proposition A.1. Now, let

Xf=X^ϵ+err𝟙𝟙Tmax(diag(X^ϵ))+err+(Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err))X^{f}=\frac{\widehat{X}_{\epsilon}+\textup{err}\mathbbm{1}\mathbbm{1}^{T}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}+\left(I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right)\right) (A.37)

The objective function value of (MA-Rel) at XfX^{f} is

C,Xf\displaystyle\langle C,X^{f}\rangle =C,X^ϵ+err𝟙𝟙Tmax(diag(X^ϵ))+err+(Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err))\displaystyle=\left\langle C,\frac{\widehat{X}_{\epsilon}+\textup{err}\mathbbm{1}\mathbbm{1}^{T}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}+\left(I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right)\right)\right\rangle (A.38)
(i)C,X^ϵmax(diag(X^ϵ))+err+C,(Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err))\displaystyle\underset{(i)}{\geq}\ \frac{\langle C,\widehat{X}_{\epsilon}\rangle}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}+\left\langle C,\left(I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right)\right)\right\rangle (A.39)
(ii)C,X^ϵmax(diag(X^ϵ))+err(iii)14ϵ1+2ϵC,XG(iv)(16ϵ)C,XG\displaystyle\underset{(ii)}{\geq}\ \frac{\langle C,\widehat{X}_{\epsilon}\rangle}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\ \underset{(iii)}{\geq}\ \frac{1-4\epsilon}{1+2\epsilon}\langle C,X^{\star}_{G}\rangle\ \underset{(iv)}{\geq}\ (1-6\epsilon)\langle C,X^{\star}_{G}\rangle (A.40)

where (i) follows from the fact that LG,err𝟙𝟙T=0\langle L_{G^{-}},\textup{err}\mathbbm{1}\mathbbm{1}^{T}\rangle=0 and W+,err𝟙𝟙T0\langle W^{+},\textup{err}\mathbbm{1}\mathbbm{1}^{T}\rangle\geq 0, (ii) follows since LGL_{G^{-}} and Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err)I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right) are PSD, so their inner product is nonnegative, and the diagonal entries of W+W^{+} are 0, (iii) follows from Lemma 4.1 and the fact that errϵ\textup{err}\leq\epsilon, and (iv) follows since 14ϵ(1+2ϵ)(16ϵ)1-4\epsilon\geq(1+2\epsilon)(1-6\epsilon). Combining the fact that C,XGoptCCG\langle C,X^{\star}_{G}\rangle\geq\textup{opt}_{CC}^{G} and 𝔼[𝒞]0.766C,Xf\mathbb{E}[\mathcal{C}]\geq 0.766\langle C,X^{f}\rangle with the above, we have

𝔼[𝒞]0.766(16ϵ)optCCG.\mathbb{E}[\mathcal{C}]\geq 0.766(1-6\epsilon)\textup{opt}_{CC}^{G}. (A.41)

A.7 Proof of Lemma 4.3

Proof.

We use Algorithm 1 with p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)} and T(n,ϵ)=64log(2n+|E|)n2ϵ2T(n,\epsilon)=\frac{64\log(2n+|E|)n^{2}}{\epsilon^{2}} to generate an ϵΔ\epsilon\Delta-optimal solution to (MA-LSE), where Δ=Tr(LG)+ijE+wij+\Delta=\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}.

Upper bound on outer iteration complexity.

The convergence result given in Theorem A.1 also holds when Algorithm 1 is applied to (MA-LSE). Then, the total number of iterations of Algorithm 1, also known as the outer iteration complexity, required to generate an ϵΔ\epsilon\Delta-optimal solution to (MA-LSE) is

T=2Cgu(1+η)ϵΔ2=2βMn2(1+η)ϵΔ2.T=\frac{2C_{g}^{u}(1+\eta)}{\epsilon\Delta}-2=\frac{2\beta Mn^{2}(1+\eta)}{\epsilon\Delta}-2. (A.42)

Bound on the value of generated clustering.

Algorithm 1 with p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)} generates a solution X^ϵ\widehat{X}_{\epsilon} that satisfies the bounds in Lemma 4.1 with probability at least 1ϵ1-\epsilon after at most TT iterations. Thus, the bound given in (A.40) holds with probability at least 1ϵ1-\epsilon and we have

𝔼[C,Xf](16ϵ)C,XG(1ϵ)(17ϵ)C,XG.\mathbb{E}[\langle C,X^{f}\rangle]\geq(1-6\epsilon)\langle C,X^{\star}_{G}\rangle(1-\epsilon)\geq(1-7\epsilon)\langle C,X^{\star}_{G}\rangle. (A.43)

Let 𝔼L[]\mathbb{E}_{L}[\cdot] denote the expectation over the randomness in the subproblem LMO and 𝔼G[]\mathbb{E}_{G}[\cdot] denote the expectation over random Gaussian samples. The expected value of the generated clustering is then bounded as

𝔼[𝒞]=𝔼L[𝔼G[𝒞]](i)0.766𝔼L[C,Xf]0.766(17ϵ)C,XG0.766(17ϵ)optCCG,\mathbb{E}[\mathcal{C}]=\mathbb{E}_{L}[\mathbb{E}_{G}[\mathcal{C}]]\underset{(i)}{\geq}0.766\mathbb{E}_{L}[\langle C,X^{f}\rangle]\geq 0.766(1-7\epsilon)\langle C,X^{\star}_{G}\rangle\geq 0.766(1-7\epsilon)\textup{opt}_{CC}^{G}, (A.44)

where (i) follows from the fact that the value of the clustering generated by the CGW rounding scheme satisfies 𝔼[𝒞]0.766C,Xf\mathbb{E}[\mathcal{C}]\geq 0.766\langle C,X^{f}\rangle.

We now determine the inner iteration complexity of Algorithm 1.

Upper bound on inner iteration complexity.

At each iteration tt of Algorithm 1, the subroutine LMO (see (A.22)) is equivalent to approximately computing the maximum eigenvector of the matrix gt\nabla g_{t}. This is achieved using the Lanczos method, whose convergence is given in Lemma A.2. Now, let N=12+1ρlog(n/p2)N=\frac{1}{2}+\frac{1}{\sqrt{\rho}}\log(n/p^{2}). We see that the bound on 1/ρ1/\rho is

1ραgt(1+η)4ηϵΔ,\frac{1}{\rho}\leq\frac{\alpha\|\nabla g_{t}\|(1+\eta)}{4\eta\epsilon\Delta}, (A.45)

which is similar to (A.25). We now derive an upper bound on gt\|\nabla g_{t}\|.

Lemma A.5.

Let g(X)=LG+W+,XβϕM(diag(X)𝟙,[eiTXej](i,j)E)g(X)=\langle L_{G^{-}}+W^{+},X\rangle-\beta\phi_{M}\left(\textup{diag}(X)-\mathbbm{1},\left[-e_{i}^{T}Xe_{j}\right]_{(i,j)\in E}\right), where ϕM()\phi_{M}(\cdot) is defined in (2.1). We have gtΔ(1+4(2|E|+n))\|\nabla g_{t}\|\leq\Delta(1+4(\sqrt{2|E|+n})), where Δ=Tr(LG)+ijE+wij+\Delta=\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}.

Proof.

For the function g(X)g(X) as defined in the lemma, gt=LG+W+βD\nabla g_{t}=L_{G^{-}}+W^{+}-\beta D, where DD is a matrix such that Dii[1,1]D_{ii}\in[-1,1] for i=1,,ni=1,\dotsc,n, Dij[1,1]D_{ij}\in[-1,1] for (i,j)E(i,j)\in E, and Dij=0D_{ij}=0 for (i,j)E(i,j)\notin E, where E=EE+E=E^{-}\cup E^{+}. We have

maxk|λk(W+)|Tr(W+TW+)=(i,j)E+|wij+|2(i,j)E+wij+,and\displaystyle\max_{k}|\lambda_{k}(W^{+})|\leq\sqrt{\textup{Tr}(W^{+T}W^{+})}=\sqrt{\sum_{(i,j)\in E^{+}}|w^{+}_{ij}|^{2}}\leq\sum_{(i,j)\in E^{+}}w^{+}_{ij},\quad\textup{and} (A.46)
maxk|λk(D)|Tr(DTD)=i,j=1n|Dij|22|E|+n,\displaystyle\max_{k}|\lambda_{k}(D)|\leq\sqrt{\textup{Tr}(D^{T}D)}=\sqrt{\sum_{i,j=1}^{n}|D_{ij}|^{2}}\leq\sqrt{2|E|+n}, (A.47)

where the last inequality follows since DD has at most 2|E|+n2|E|+n nonzero entries in the range [1,1][-1,1]. Now, we have

gt=LG+W+βD(i)LG+W++βDmaxi|λi(LG)|+maxi|λi(W+)|+maxi|λi(βD)|(ii)Tr(LG)+(i,j)E+wij++β2|E|+n(iii)Δ(1+4(2|E|+n)).\begin{split}\|\nabla g_{t}\|=\|L_{G^{-}}+W^{+}-\beta D\|&\underset{(i)}{\leq}\|L_{G^{-}}\|+\|W^{+}\|+\|-\beta D\|\\ &\leq\max_{i}|\lambda_{i}(L_{G^{-}})|+\max_{i}|\lambda_{i}(W^{+})|+\max_{i}|\lambda_{i}(-\beta D)|\\ &\underset{(ii)}{\leq}\textup{Tr}(L_{G^{-}})+\sum_{(i,j)\in E^{+}}w^{+}_{ij}+\beta\sqrt{2|E|+n}\\ &\underset{(iii)}{\leq}\Delta(1+4(\sqrt{2|E|+n})).\end{split} (A.49)

where (i) follows since the spectral norm of LG+W+βDL_{G^{-}}+W^{+}-\beta D satisfies the triangle inequality, (ii) follows from (A.46),  (A.47) and the fact that LGL_{G^{-}} is a positive semidefinite matrix, and (iii) follows by substituting the value of Δ\Delta and β=4Δ\beta=4\Delta as given in Lemma 4.1. ∎

Substituting the bound on gt\|\nabla g_{t}\| in (A.45), we have

1ρ1+η4ηn(1+4(2|E|+n))ϵ,and\displaystyle\frac{1}{\rho}\leq\frac{1+\eta}{4\eta}\frac{n(1+4(\sqrt{2|E|+n}))}{\epsilon},\quad\textup{and} (A.50)
N=12+1ρlog(n/p2)12+1+η4ηn(1+4(2|E|+n))ϵlog(n/p2)=Nu.\displaystyle N=\frac{1}{2}+\frac{1}{\sqrt{\rho}}\log(n/p^{2})\leq\frac{1}{2}+\sqrt{\frac{1+\eta}{4\eta}}\sqrt{\frac{n(1+4(\sqrt{2|E|+n}))}{\epsilon}}\log(n/p^{2})=N^{u}. (A.51)

The computational complexity of the Lanczos method is 𝒪(Nu(|E|+n))\mathcal{O}(N^{u}(|E|+n)), where the term |E|+n|E|+n appears since the Lanczos method performs a matrix-vector multiplication with gt\nabla g_{t}, which has 𝒪(|E|+n)\mathcal{O}(|E|+n) nonzero entries, plus an additional 𝒪(n)\mathcal{O}(n) arithmetic operations at each iteration. We finally write the computational complexity of each iteration of Algorithm 1 as 𝒪(n|E|1.25ϵlog(n/p2))\mathcal{O}\left(\frac{\sqrt{n}|E|^{1.25}}{\sqrt{\epsilon}}\log(n/p^{2})\right).

Total computational complexity of Algorithm 1.

Since p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)}, we have

log(np2)=log((64)2n5(log(2n+|E|))2ϵ6)log(46n6ϵ6)=6log(4nϵ),\log\left(\frac{n}{p^{2}}\right)=\log\left(\frac{(64)^{2}n^{5}(\log(2n+|E|))^{2}}{\epsilon^{6}}\right)\leq\log\left(\frac{4^{6}n^{6}}{\epsilon^{6}}\right)=6\log\left(\frac{4n}{\epsilon}\right), (A.53)

where the inequality follows since |E|(n12)|E|\leq\binom{n-1}{2} and (log(2n+(n12)))2n\left(\log\left(2n+\binom{n-1}{2}\right)\right)^{2}\leq n for n1n\geq 1. Multiplying the outer and inner iteration complexities and substituting the bound on pp, we conclude that Algorithm 1 is an 𝒪(n2.5|E|1.25ϵ2.5log(n/ϵ)log(|E|))\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)-time algorithm. ∎

A.8 Proof of Lemma 5.1

For any positive semidefinite matrix X𝕊nX\in\mathbb{S}^{n}, the definition of τ\tau-spectral closeness (Definition 5.1) implies

(1τ)LG,XLG~,X(1+τ)LG,X.(1-\tau)\langle L_{G},X\rangle\leq\langle L_{\tilde{G}},X\rangle\leq(1+\tau)\langle L_{G},X\rangle. (A.54)

Let CC and C~\tilde{C} be the cost matrices in the objective of (k-Cut-Rel) when the problem is defined on the graphs GG and G~\tilde{G} respectively. Since CC and C~\tilde{C} are scaled Laplacian matrices (with the same scaling factor (k1)/2k(k-1)/2k), from (A.54) we can write

(1τ)C,XC~,X(1+τ)C,X.(1-\tau)\langle C,X\rangle\leq\langle\tilde{C},X\rangle\leq(1+\tau)\langle C,X\rangle. (A.55)

Let XGX_{G}^{\star} and XG~X_{\tilde{G}}^{\star} be optimal solutions to (k-Cut-Rel) defined on the graphs GG and G~\tilde{G} respectively. From (A.55), we can write,

(1τ)C,XGC~,XGC~,XG~,(1-\tau)\langle C,X^{\star}_{G}\rangle\leq\langle\tilde{C},X^{\star}_{G}\rangle\leq\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle, (A.56)

where the last inequality follows since XGX_{G}^{\star} and XG~X_{\tilde{G}}^{\star} are feasible and optimal solutions respectively to (k-Cut-Rel) defined on the graph G~\tilde{G}. Combining this with the bound in Lemma 3.2, i.e., 𝔼[CUT]αk(14ϵ)C~,XG~\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-4\epsilon)\langle\tilde{C},X_{\tilde{G}}^{\star}\rangle, we get

𝔼[CUT]αk(14ϵ)C~,XG~(i)αk(14ϵ)(1τ)C,XG(ii)αk(14ϵτ)C,XG(iii)αk(14ϵτ)optkG,\begin{split}\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-4\epsilon)\langle\tilde{C},X_{\tilde{G}}^{\star}\rangle\underset{(i)}{\geq}\alpha_{k}(1-4\epsilon)(1-\tau)\langle C,X^{\star}_{G}\rangle&\underset{(ii)}{\geq}\alpha_{k}(1-4\epsilon-\tau)\langle C,X^{\star}_{G}\rangle\\ &\underset{(iii)}{\geq}\alpha_{k}(1-4\epsilon-\tau)\textup{opt}_{k}^{G},\end{split} (A.57)

where (i) follows from (A.56), (ii) follows since (14ϵ)(1τ)=14ϵτ+4ϵτ14ϵτ(1-4\epsilon)(1-\tau)=1-4\epsilon-\tau+4\epsilon\tau\geq 1-4\epsilon-\tau for nonnegative ϵ\epsilon and τ\tau, and (iii) follows since C,XGoptkG\langle C,X^{\star}_{G}\rangle\geq\textup{opt}_{k}^{G} for an optimal solution XGX^{\star}_{G} to (k-Cut-Rel) defined on the graph GG.
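To make (A.54) concrete, the following Python sketch estimates the smallest τ\tau for which a given Laplacian pair satisfies (1τ)LGLG~(1+τ)LG(1-\tau)L_{G}\preceq L_{\tilde{G}}\preceq(1+\tau)L_{G}. The dense eigendecomposition is used purely for illustration (a streaming sparsifier would never be checked this way), the helper name spectral_closeness_tau is ours, and the sketch assumes the sparsifier preserves the null space of LGL_{G}.

\begin{verbatim}
import numpy as np

def spectral_closeness_tau(L, L_tilde, tol=1e-9):
    # Smallest tau with (1 - tau) L <= L_tilde <= (1 + tau) L on range(L),
    # which gives (A.54) for every PSD matrix X supported on that range.
    w, V = np.linalg.eigh(L)
    keep = w > tol
    B = V[:, keep] / np.sqrt(w[keep])             # whitening by L^{1/2} on range(L)
    lams = np.linalg.eigvalsh(B.T @ L_tilde @ B)  # generalized eigenvalues of (L_tilde, L)
    return max(1.0 - lams.min(), lams.max() - 1.0)
\end{verbatim}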

A.9 Proof of Lemma 5.2

Proof.

The Laplacian matrices LGL_{G^{-}} and LG~L_{\tilde{G}^{-}} of the graphs GG^{-} and its sparse approximation G~\tilde{G}^{-} respectively satisfy (A.54). Furthermore, let LG+=D+W+L_{G}^{+}=D^{+}-W^{+}, where Dii+=j:(i,j)E+wij+D^{+}_{ii}=\sum_{j:(i,j)\in E^{+}}w^{+}_{ij}, be the Laplacian of the graph G+G^{+} and similarly let LG~+=D~+W~+L_{\tilde{G}}^{+}=\tilde{D}^{+}-\tilde{W}^{+} be the Laplacian of the graph G~+\tilde{G}^{+}. If X=IX=I, from (A.54), we have

(1τ)Tr(D+)Tr(D~+)(1+τ)Tr(D+).(1-\tau)\textup{Tr}(D^{+})\leq\textup{Tr}(\tilde{D}^{+})\leq(1+\tau)\textup{Tr}(D^{+}). (A.58)

Rewriting the second inequality in (A.54) for X=XGX=X^{\star}_{G}, and noting that diag(XG)=𝟙\textup{diag}(X^{\star}_{G})=\mathbbm{1}, we have

W+,XGW~+,XG1+τ+(1+τ)Tr(D+)Tr(D~+)1+τW~+,XG1+τ+2τTr(D+)1+τ,\begin{split}\langle W^{+},X^{\star}_{G}\rangle&\leq\frac{\langle\tilde{W}^{+},X^{\star}_{G}\rangle}{1+\tau}+\frac{(1+\tau)\textup{Tr}(D^{+})-\textup{Tr}(\tilde{D}^{+})}{1+\tau}\\ &\leq\frac{\langle\tilde{W}^{+},X^{\star}_{G}\rangle}{1+\tau}+\frac{2\tau\textup{Tr}(D^{+})}{1+\tau},\end{split} (A.59)

where the second inequality follows from (A.58). Let C=LG+W+C=L_{G^{-}}+W^{+} and C~=LG~+W~+\tilde{C}=L_{\tilde{G}^{-}}+\tilde{W}^{+} represent the cost in (MA-Rel) and (MA-Sparse) respectively. Let XGX^{\star}_{G} be an optimal solution to (MA-Rel). The optimal objective function value of (MA-Rel) at XGX^{\star}_{G} is C,XG\langle C,X^{\star}_{G}\rangle and

(1τ)C,XG=(1τ)LG,XG+(1τ)W+,XG(i)LG~,XG+1τ1+τW~+,XG+2τ(1τ)1+τTr(D+)(ii)C~,XG2τ1+τW~+,XG+2τ1+τTr(D~+)(iii)C~,XG~+2τ1+τC~,XG~,\begin{split}(1-\tau)\langle C,X^{\star}_{G}\rangle&=(1-\tau)\langle L_{G^{-}},X^{\star}_{G}\rangle+(1-\tau)\langle W^{+},X^{\star}_{G}\rangle\\ &\underset{(i)}{\leq}\langle L_{\tilde{G}^{-}},X^{\star}_{G}\rangle+\frac{1-\tau}{1+\tau}\langle\tilde{W}^{+},X^{\star}_{G}\rangle+\frac{2\tau(1-\tau)}{1+\tau}\textup{Tr}(D^{+})\\ &\underset{(ii)}{\leq}\langle\tilde{C},X^{\star}_{G}\rangle-\frac{2\tau}{1+\tau}\langle\tilde{W}^{+},X^{\star}_{G}\rangle+\frac{2\tau}{1+\tau}\textup{Tr}(\tilde{D}^{+})\\ &\underset{(iii)}{\leq}\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle+\frac{2\tau}{1+\tau}\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle,\end{split} (A.60)

where (i) follows from (A.54) and (A.59), (ii) follows from (A.58) after substituting C~=LG~+W~+\tilde{C}=L_{\tilde{G}^{-}}+\tilde{W}^{+} and rearranging the terms, and (iii) holds true since W~+,XG0\langle\tilde{W}^{+},X^{\star}_{G}\rangle\geq 0, and II and XGX^{\star}_{G} are feasible to (MA-Sparse) so that Tr(D~+)C~,XG~\textup{Tr}(\tilde{D}^{+})\leq\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle and C~,XGC~,XG~\langle\tilde{C},X^{\star}_{G}\rangle\leq\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle. Rearranging the terms, we have

C,XG1+3τ1τ2C~,XG~.\langle C,X^{\star}_{G}\rangle\leq\frac{1+3\tau}{1-\tau^{2}}\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle. (A.61)

Combining (A.61) with the fact that the expected value of clustering 𝔼[𝒞]\mathbb{E}[\mathcal{C}] generated for the graph G~\tilde{G} satisfies (4.3), we have

𝔼[𝒞]0.766(16ϵ)C~,XG~0.766(16ϵ)(1τ2)1+3τC,XG0.766(16ϵ3τ)(1τ2)optCCG,\begin{split}\mathbb{E}[\mathcal{C}]\geq 0.766(1-6\epsilon)\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle\geq 0.766\frac{(1-6\epsilon)(1-\tau^{2})}{1+3\tau}\langle C,X^{\star}_{G}\rangle\geq 0.766(1-6\epsilon-3\tau)(1-\tau^{2})\textup{opt}_{CC}^{G},\end{split} (A.62)

where the last inequality follows since (16ϵ3τ)(1+3τ)16ϵ(1-6\epsilon-3\tau)(1+3\tau)\leq 1-6\epsilon. ∎

A.10 Proof of Lemma 5.3

The first step of the procedure given in Section 5 is to sparsify the input graph using the technique proposed in [23], whose computational complexity is 𝒪(|E|log2n)\mathcal{O}(|E|\log^{2}n). The second step, when generating solutions to Max-k-Cut and Max-Agree, is to apply the procedures given in Sections 3 and 4 respectively. The computational complexity of this step is bounded as given in Lemmas 3.3 and 4.3, leading to an 𝒪(n2.5|E|1.25ϵ2.5log(n/ϵ)log(|E|))\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)-time algorithm.

Bound on the value of generated kk-cut.

Let p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)} and T(n,ϵ)=144log(2n+|E|)n2ϵ2T(n,\epsilon)=\frac{144\log(2n+|E|)n^{2}}{\epsilon^{2}} as given in Lemma 3.3. Using the procedure given in Section 3, we have 𝔼[CUT]αk(15ϵ)optkG~\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-5\epsilon)\textup{opt}_{k}^{\tilde{G}}. From the proof of Lemma 5.1, we see that CUT is then an approximate kk-cut for the input graph GG such that 𝔼[CUT]αk(15ϵτ)optkG\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-5\epsilon-\tau)\textup{opt}_{k}^{G}.

Bound on the value of generated clustering.

Let p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)} and T(n,ϵ)=64log(2n+|E|)n2ϵ2T(n,\epsilon)=\frac{64\log(2n+|E|)n^{2}}{\epsilon^{2}} as given in Lemma 4.3 and let the procedure given in Section 4 be applied to the sparse graph G~\tilde{G}. Then, the generated clustering satisfies 𝔼[𝒞]0.766(17ϵ)optCCG~\mathbb{E}[\mathcal{C}]\geq 0.766(1-7\epsilon)\textup{opt}_{CC}^{\tilde{G}}. Combining this with the proof of Lemma 5.2, we have 𝔼[𝒞]0.766(17ϵ3τ)(1τ2)optCCG\mathbb{E}[\mathcal{C}]\geq 0.766(1-7\epsilon-3\tau)(1-\tau^{2})\textup{opt}_{CC}^{G}.

Appendix B Preliminary Computational Results for Max-k-Cut

We provide some preliminary computational results when generating an approximate kk-cut on the graph GG using the approach outlined in Section 3. The aim of these experiments was to verify that the bounds given in Lemma 3.2 were satisfied in practice. First, we solved (k-Cut-LSE) to ϵTr(C)\epsilon\textup{Tr}(C)-optimality using Algorithm 1 with the input parameters set to α=n\alpha=n, ϵ=0.05\epsilon=0.05, β=6Tr(C)\beta=6\textup{Tr}(C), M=6log(2n+|E|)ϵM=6\frac{\log(2n+|E|)}{\epsilon}. We then computed feasible samples using Algorithm 2 and finally applied the FJ rounding scheme to the generated samples. The computations were performed using MATLAB R2021a on a machine with 8GB RAM. The peak memory requirement was noted using the profiler command in MATLAB.

We performed computations on randomly selected graphs from the GSet dataset. In each case, the infeasibility of the covariance of the generated samples was less than ϵ\epsilon, thus satisfying (3.4). The number of iterations of LMO in Algorithm 1 was also within the bounds given in Proposition 1.1. To generate a kk-cut, we generated 10 sets of kk i.i.d. zero-mean Gaussian samples with covariance X^ϵ\widehat{X}_{\epsilon}, and each set was then used to generate a kk-cut for the input graph using the FJ rounding scheme. Let CUTbest\texttt{CUT}_{\textup{best}} denote the value of the best kk-cut amongst the 10 generated cuts. Table 2 shows the results for graphs from the GSet dataset with k=3,4k=3,4. Note that CUTbest𝔼[CUT]αk(14ϵ)C,Xαk14ϵ1+4ϵC,X^ϵ\texttt{CUT}_{\textup{best}}\geq\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-4\epsilon)\langle C,X^{\star}\rangle\geq\alpha_{k}\frac{1-4\epsilon}{1+4\epsilon}\langle C,\widehat{X}_{\epsilon}\rangle, where the last inequality follows from combining (3.5) with (3.3). Since we computed both CUTbest\texttt{CUT}_{\textup{best}} and C,X^ϵ\langle C,\widehat{X}_{\epsilon}\rangle, we verified that the weaker bound CUTbest/C,X^ϵ=ARαk(14ϵ)/(1+4ϵ)\texttt{CUT}_{\textup{best}}/\langle C,\widehat{X}_{\epsilon}\rangle=\textup{AR}\geq\alpha_{k}(1-4\epsilon)/(1+4\epsilon) was satisfied by every input graph when ϵ=0.05\epsilon=0.05.
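A minimal Python sketch of this sample-based rounding step, as we understand the FJ scheme applied to the Gaussian samples of Section 3, is given below; the helper names and the edge-list representation of the graph are ours.

\begin{verbatim}
import numpy as np

def fj_round(Z):
    # Z has shape (k, n); row j is the j-th Gaussian sample with covariance X^f.
    # Vertex i is assigned to the part j maximizing Z[j, i].
    return np.argmax(Z, axis=0)

def cut_value(labels, edges, weights):
    # Total weight of edges whose endpoints fall in different parts.
    i, j = edges[:, 0], edges[:, 1]
    return float(np.sum(weights[labels[i] != labels[j]]))

# Best of 10 independent rounding trials, as reported in Table 2 (CUT_best);
# sample_k_gaussians() is a placeholder for the sampling procedure of Section 3:
# cuts = [cut_value(fj_round(sample_k_gaussians()), edges, weights) for _ in range(10)]
# cut_best = max(cuts)
\end{verbatim}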

Furthermore, Table 2 also shows that the memory used by our method was linear in the size of the input graph. To see this, consider the dataset G1, and note that for k=3k=3, the memory used by our method was 1252.73kB8.02×(|V|+|E|)×81252.73\textup{kB}\approx 8.02\times(|V|+|E|)\times 8 bytes, where the factor of 8 reflects that MATLAB uses 8 bytes to store a real number. Similarly, for the other instances in GSet, the memory used by our method to generate an approximate kk-cut for k=3,4k=3,4 was at most c×(|V|+|E|)×8c\times(|V|+|E|)\times 8 bytes, where for each graph the value of cc was bounded by c82c\leq 82, showing linear dependence of the memory used on the size of the input graph.
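As a quick check of this accounting, the numbers below are taken from Table 2 for G1 with k=3k=3.

\begin{verbatim}
# Memory accounting for G1 (|V| = 800, |E| = 19176, k = 3):
n_plus_m = 800 + 19176                    # |V| + |E|
kilobytes = 8.02 * n_plus_m * 8 / 1024    # 8 bytes per double-precision number in MATLAB
# kilobytes ~ 1251.6, matching the 1252.73 kB reported in Table 2
# (the constant 8.02 is itself rounded from the measured peak memory).
\end{verbatim}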

Table 2: Results of generating a kk-cut for graphs from GSet using the method outlined in Section 3. We have infeas=max{diag(X^ϵ)𝟙,max{0,[X^ϵ]ij1k1}}\textup{infeas}=\max\{\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty},\max\{0,-[\widehat{X}_{\epsilon}]_{ij}-\frac{1}{k-1}\}\} and AR=CUTbest/C,X^ϵ\textup{AR}=\texttt{CUT}_{\textup{best}}/\langle C,\widehat{X}_{\epsilon}\rangle.
Dataset |V||V| |E||E| kk # Iterations (×103\times 10^{3}) infeas C,X^ϵ\langle C,\widehat{X}_{\epsilon}\rangle CUTbest\textup{CUT}_{\textup{best}} AR Memory required (in kB)
G1 800 19176 3 823.94 4×1044\times 10^{-4} 15631 14266 0.9127 1252.73
G1 800 19176 4 891.23 4×1044\times 10^{-4} 17479 15746 0.9 1228.09
G2 800 19176 3 827.61 6×1056\times 10^{-5} 15629 14332 0.917 1243.31
G2 800 19176 4 9268.42 8×1058\times 10^{-5} 17474 15786 0.903 1231.07
G3 800 19176 3 1242.53 7×1057\times 10^{-5} 15493 14912 0.916 1239.57
G3 800 19176 4 1341.37 7×10457\times 10^{-45} 17301 15719 0.908 1240.17
G4 800 19176 3 812.8 9×1059\times 10^{-5} 15660 14227 0.908 1230.59
G4 800 19176 4 9082.74 10410^{-4} 17505 15748 0.899 1223.59
G5 800 19176 3 843.5 10410^{-4} 15633 14341 0.917 1222.09
G5 800 19176 4 9294.32 10410^{-4} 17470 15649 0.895 1227.9
G14 800 4694 3 1240.99 0.002 3917 2533 0.646 3502.64
G14 800 4694 4 3238.42 0.001 4467.9 3775 0.844 519.25
G15 800 4661 3 3400.17 0.001 4018.6 3385 0.842 612
G15 800 4661 4 1603.13 0.001 4475.8 3754 0.838 648.17
G16 800 4672 3 33216.68 0.001 4035.7 3422 0.847 561
G16 800 4672 4 3059.11 0.001 4437.5 3783 0.852 2800
G17 800 4667 3 3526.4 0.001 4031.5 3414 0.846 602.81
G17 800 4667 4 3400.01 0.001 4440 3733 0.84 693.6
G22 2000 19990 3 7402.59 10410^{-4} 17840 11954 0.67 1340.34
G22 2000 19990 4 8103.83 10410^{-4} 19582 16670 0.851 1341.67
G23 2000 19990 3 3597.39 10410^{-4} 17938 15331 0.854 1360.09
G23 2000 19990 4 3588.04 10410^{-4} 19697 16639 0.844 1317.09
G24 2000 19990 3 4304.48 10410^{-4} 17913 15370 0.858 1341.96
G24 2000 19990 4 1994.26 10410^{-4} 19738 16624 0.842 1321.59
G25 2000 19990 3 9774.03 10410^{-4} 18186 15294 0.841 1311.54
G25 2000 19990 4 1540.14 10410^{-4} 19778 16641 0.841 1330.95
G26 2000 19990 3 2069.65 10410^{-4} 18012 15411 0.855 1321.92
G26 2000 19990 4 1841.06 2×1042\times 10^{-4} 19735 16609 0.841 1331.53
G43 1000 9990 3 894.53 10410^{-4} 9029 7785 0.862 661.09
G43 1000 9990 4 9709.68 2×1042\times 10^{-4} 9925 8463 0.852 665.59
G44 1000 9990 3 721.64 10410^{-4} 9059.5 7782 0.859 661.09
G44 1000 9990 4 9294.43 10410^{-4} 9926.1 8448 0.851 765.37
G45 1000 9990 3 794.84 10410^{-4} 9038.4 7773 0.86 661.09
G45 1000 9990 4 9503.74 2×1042\times 10^{-4} 9929.7 8397 0.845 669
G46 1000 9990 3 703.4 10410^{-4} 9068.5 7822 0.862 661.09
G46 1000 9990 4 9684.93 4×1044\times 10^{-4} 9929.9 8333 0.839 657.09
G47 1000 9990 3 777.61 10410^{-4} 9059.4 7825 0.863 679.89
G47 1000 9990 4 9789.55 2×1042\times 10^{-4} 9930.8 8466 0.852 661.09

Appendix C Additional Computational Results for Correlation Clustering

We provide here the computational results for the graphs from the GSet dataset not included in the main article. We performed computations for graphs G1-G57 from the GSet dataset. The instances for which we were able to generate an ϵΔ\epsilon\Delta-optimal solution to (MA-LSE) are given in Table 3, where the parameters ϵ\epsilon and Δ\Delta were set as given in Section 6. For the instances not in the table, we were not able to generate an ϵΔ\epsilon\Delta-optimal solution after 30 hours of runtime.

Table 3: Results of generating a clustering of graphs from GSet using the method outlined in Section 4. We have infeas=max{diag(X^ϵ)𝟙,max{0,[X^ϵ]ij}}\textup{infeas}=\max\{\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty},\max\{0,-[\widehat{X}_{\epsilon}]_{ij}\}\}, AR=𝒞best/C,X^ϵ\textup{AR}=\mathcal{C}_{\textup{best}}/\langle C,\widehat{X}_{\epsilon}\rangle and 0.75(16ϵ)/(1+4ϵ)=0.43750.75(1-6\epsilon)/(1+4\epsilon)=0.4375 for ϵ=0.05\epsilon=0.05.
Dataset |V||V| |E+||E^{+}| |E||E^{-}| # Iterations (×103\times 10^{3}) infeas C,X^ϵ\langle C,\widehat{X}_{\epsilon}\rangle 𝒞best\mathcal{C}_{\textup{best}} AR Memory required (in kB)
G2 800 2501 16576 681.65 8×1048\times 10^{-4} 848.92 643.13 0.757 1520.18
G3 800 2571 16498 677.56 7×1047\times 10^{-4} 835.05 634.83 0.76 1529.59
G4 800 2457 16622 665.93 6×1046\times 10^{-4} 852.18 647.37 0.76 1752
G5 800 2450 16623 646.4 10310^{-3} 840.63 636.21 0.756 1535.92
G6 800 9665 9511 429.9 3×1043\times 10^{-4} 25766 21302 0.826 1664
G7 800 9513 9663 423.58 8×1048\times 10^{-4} 26001 20790 0.799 1535.06
G8 800 9503 9673 421.34 6×1046\times 10^{-4} 26005 21080 0.81 4284
G9 800 9556 9620 426.4 3×1043\times 10^{-4} 25903 21326 0.823 1812
G10 800 9508 9668 426.25 3×1043\times 10^{-4} 25974 21412 0.824 1535.59
G12 800 798 802 393.69 9×1049\times 10^{-4} 3023.4 2034 0.672 444.06
G13 800 817 783 416.29 8×1048\times 10^{-4} 3001.1 2010 0.669 613.03
G15 800 3801 825 284.77 10310^{-3} 529.83 401.19 0.757 460.17
G16 800 3886 749 228.12 8×1048\times 10^{-4} 524.69 417.88 0.796 451.07
G17 800 3899 744 2448.633 9×1049\times 10^{-4} 536.65 369.04 0.687 480.45
G18 800 2379 2315 1919.44 2×1032\times 10^{-3} 7237.6 5074 0.701 434.67
G19 800 2274 2387 2653.79 2×1032\times 10^{-3} 7274.2 5130 0.705 496
G20 800 2313 2359 1881.75 2×1032\times 10^{-3} 7258.1 5186 0.714 406.09
G21 800 2300 2367 1884.97 2×1032\times 10^{-3} 7281.3 5238 0.719 467.26
G23 2000 120 19855 550.77 2×1032\times 10^{-3} 1802.2 1373.2 0.762 1651.54
G24 2000 96 19875 812.16 10310^{-3} 1811.2 1384.6 0.764 1678.04
G25 2000 109 19872 1739.06 6×1046\times 10^{-4} 1801.8 1398.1 0.776 1650.48
G26 2000 117 19855 1125.74 10310^{-3} 1789.9 1356.9 0.758 1650.01
G27 2000 9974 10016 464.93 5×1045\times 10^{-4} 30502 22010 0.721 1647.09
G28 2000 9943 10047 553.65 4×1044\times 10^{-4} 30412 22196 0.729 1317.78
G29 2000 10035 9955 513.97 2×1042\times 10^{-4} 30366 23060 0.759 1310.46
G30 2000 10045 9945 594.09 3×1043\times 10^{-4} 30255 22550 0.745 1310.48
G31 2000 9955 10035 1036.9 2×1042\times 10^{-4} 29965 22808 0.761 1305.05
G33 2000 1985 2015 403.75 10310^{-3} 7442 4404 0.591 634.93
G34 2000 1976 2024 863.53 4×1044\times 10^{-4} 7307.2 4760 0.651 574.12
G44 1000 229 9721 515.18 10310^{-3} 810.82 616.61 0.76 655.09
G45 1000 218 9740 504.91 10310^{-3} 812.21 615.84 0.758 660.51
G46 1000 237 9723 469.6 10310^{-3} 818.39 623.95 0.762 655.09
G47 1000 230 9732 495.24 9×1049\times 10^{-4} 819.63 621.65 0.758 648.32
G49 3000 0 6000 1002.59 0.003 599.64 456.48 0.761 733
G50 3000 0 6000 996.19 0.004 599.64 455.78 0.76 540.26
G52 1000 4750 1127 2041.8 0.001 684.1 441.02 0.644 757.59
G53 1000 4820 1061 785.33 8×1048\times 10^{-4} 695.53 445.03 0.639 417.07
G54 1000 4795 1101 2899.99 7×1047\times 10^{-4} 686.8 482.57 0.702 517.09
G56 5000 6222 6276 1340.35 0.004 22246 12788 0.574 1243.98