
Memory-Efficient Approximation Algorithms for Max-k-Cut and Correlation Clustering

Nimita Shinde (IITB-Monash Research Academy; Industrial Engineering and Operations Research, IIT Bombay; Electrical and Computer Systems Engineering, Monash University), Vishnu Narayanan (Industrial Engineering and Operations Research, IIT Bombay), James Saunderson (Electrical and Computer Systems Engineering, Monash University)
Abstract

Max-k-Cut and correlation clustering are fundamental graph partitioning problems. For a graph $G=(V,E)$ with $n$ vertices, the methods with the best approximation guarantees for Max-k-Cut and the Max-Agree variant of correlation clustering involve solving SDPs with $\mathcal{O}(n^{2})$ constraints and variables. Large-scale instances of SDPs, thus, present a memory bottleneck. In this paper, we develop simple polynomial-time Gaussian sampling-based algorithms for these two problems that use $\mathcal{O}(n+|E|)$ memory and nearly achieve the best existing approximation guarantees. For dense graphs arriving in a stream, we eliminate the dependence on $|E|$ in the storage complexity at the cost of a slightly worse approximation ratio by combining our approach with sparsification.

1 Introduction

Semidefinite programs (SDPs) arise naturally as a relaxation of a variety of problems such as $k$-means clustering [5], correlation clustering [6] and Max-k-Cut [14]. In each case, the decision variable is an $n\times n$ matrix and there are $d=\Omega(n^{2})$ constraints. While reducing the memory bottleneck for large-scale SDPs has been studied quite extensively in the literature [9, 19, 11, 37], all these methods use memory that scales linearly with the number of constraints and also depends on either the rank of the optimal solution or an approximation parameter. A recent Gaussian sampling-based technique to generate a near-optimal, near-feasible solution to SDPs with smooth objective functions involves replacing the decision variable $X$ with a zero-mean random vector whose covariance is $X$ [28]. This method uses at most $\mathcal{O}(n+d)$ memory, independent of the rank of the optimal solution. However, for SDPs with $d=\Omega(n^{2})$ constraints, these algorithms still use $\Omega(n^{2})$ memory and provide no advantage in storage reduction. In this paper, we show how to adapt the Gaussian sampling-based approach of [28] to generate an approximate solution with provable approximation guarantees to Max-k-Cut, and to the Max-Agree variant of correlation clustering, on a graph $G=(V,E)$ with arbitrary edge weights using only $\mathcal{O}(|V|+|E|)$ memory.

1.1 Max-k-Cut

Max-k-Cut is the problem of partitioning the vertices of a weighted undirected graph $G=(V,E)$ into $k$ distinct parts such that the total weight of the edges across the parts is maximized. If $w_{ij}$ is the edge weight corresponding to edge $(i,j)\in E$, then the cut value of a partition is $\texttt{CUT}=\sum_{i\ \text{and}\ j\ \text{in different partitions}}w_{ij}$. Consider the standard SDP relaxation of Max-k-Cut

\max_{X\succeq 0}\quad\langle C,X\rangle\quad\textup{subject to}\quad\begin{cases}\textup{diag}(X)=\mathbbm{1}\\ X_{ij}\geq-\frac{1}{k-1}\quad i\neq j,\end{cases} \qquad (k-Cut-SDP)

where $C=\frac{k-1}{2k}L_{G}$ is a scaled Laplacian. Frieze and Jerrum [14] developed a randomized rounding scheme that takes an optimal solution $X^{\star}$ of (k-Cut-SDP) and produces a random partitioning satisfying

\mathbb{E}[\texttt{CUT}]=\sum_{ij\in E,\,i<j}w_{ij}\,\textup{Pr}(i\ \textup{and}\ j\ \textup{are in different partitions})\geq\alpha_{k}\langle C,X^{\star}\rangle\geq\alpha_{k}\,\textup{opt}_{k}^{G}, \qquad (1.1)

where $\textup{opt}_{k}^{G}$ is the optimal $k$-cut value and $\alpha_{k}=\min_{-1/(k-1)\leq\rho\leq 1}\frac{kp(\rho)}{(k-1)(1-\rho)}$, where $p(\rho)$ is the probability that $i$ and $j$ are in different partitions given that $X_{ij}=\rho$. The rounding scheme proposed in [14], referred to as the FJ rounding scheme in the rest of the paper, generates $k$ i.i.d. samples $z_{1},\dotsc,z_{k}\sim\mathcal{N}(0,X^{\star})$ and assigns vertex $i$ to part $p$ if $[z_{p}]_{i}\geq[z_{l}]_{i}$ for all $l=1,\dotsc,k$.
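The FJ rounding step itself is simple to implement once the $k$ samples are available; the following is a minimal NumPy sketch (function and variable names are ours, not from [14]):

import numpy as np

def fj_round(samples):
    # samples: a k x n array whose rows are the i.i.d. draws z_1, ..., z_k ~ N(0, X).
    # Vertex i is assigned to the part p whose sample has the largest i-th coordinate.
    return np.argmax(samples, axis=0)   # length-n array of part labels in {0, ..., k-1}

# Example (hypothetical covariance X of size n x n):
# z = np.random.default_rng().multivariate_normal(np.zeros(n), X, size=k)
# labels = fj_round(z)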

1.2 Correlation clustering

In correlation clustering, we are given a set of $|V|$ vertices together with information indicating whether pairs of vertices are similar or dissimilar, modeled by the edges in the sets $E^{+}$ and $E^{-}$ respectively. The Max-Agree variant of correlation clustering seeks to maximize

\mathcal{C}=\sum_{ij\in E^{-}}w_{ij}^{-}\,\mathbbm{1}_{[i,j\textup{ in different clusters}]}+\sum_{ij\in E^{+}}w_{ij}^{+}\,\mathbbm{1}_{[i,j\textup{ in the same cluster}]}.
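For reference, the objective $\mathcal{C}$ of a given clustering can be evaluated directly from the signed edge lists; a small sketch, assuming edges are stored as (i, j, weight) triples (names ours):

def max_agree_value(labels, E_plus, E_minus):
    # labels[i] is the cluster of vertex i; E_plus and E_minus hold (i, j, w) triples.
    agree_plus = sum(w for i, j, w in E_plus if labels[i] == labels[j])
    agree_minus = sum(w for i, j, w in E_minus if labels[i] != labels[j])
    return agree_plus + agree_minus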

Define $G^{+}=(V,E^{+})$ and $G^{-}=(V,E^{-})$. A natural SDP relaxation of Max-Agree [6] is

\max_{X\succeq 0}\quad\langle C,X\rangle\quad\textup{subject to}\quad\begin{cases}\textup{diag}(X)=\mathbbm{1}\\ X_{ij}\geq 0\quad i\neq j,\end{cases} \qquad (MA-SDP)

where $C=L_{G^{-}}+W^{+}$, $L_{G^{-}}$ is the Laplacian of the graph $G^{-}$ and $W^{+}$ is the weighted adjacency matrix of the graph $G^{+}$. Charikar et al. [10] (see also Swamy [31]) propose a rounding scheme that takes an optimal solution $X^{\star}_{G}$ of (MA-SDP) and produces a random clustering $\mathcal{C}$ satisfying

\mathbb{E}[\mathcal{C}]\geq 0.766\,\langle C,X^{\star}_{G}\rangle\geq 0.766\,\textup{opt}^{G}_{CC}, \qquad (1.2)

where $\textup{opt}_{CC}^{G}$ is the optimal clustering value. The rounding scheme proposed in [10], referred to as the CGW rounding scheme in the rest of the paper, generates either $k=2$ or $k=3$ i.i.d. zero-mean Gaussian samples with covariance $X^{\star}_{G}$ and uses them to define $2^{k}$ clusters.
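The full CGW analysis involves a careful choice of rounding functions; purely as an illustration of how $k$ samples can define $2^{k}$ clusters, one can group vertices by the sign pattern of their coordinates. The sketch below is ours and omits the specific thresholds and case analysis used in [10, 31]:

import numpy as np

def sign_pattern_clusters(samples):
    # samples: a k x n array of i.i.d. draws from N(0, X). Each vertex is mapped to one of
    # 2^k clusters according to the signs of its k sample coordinates.
    bits = (samples > 0).astype(int)          # k x n binary matrix
    weights = 2 ** np.arange(bits.shape[0])   # 1, 2, 4, ...
    return bits.T @ weights                   # length-n labels in {0, ..., 2^k - 1}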

1.3 Contributions

We now summarize the key contributions of the paper.

Gaussian sampling for Max-k-Cut. Applying the Gaussian sampling-based Frank-Wolfe method of [28] directly to (k-Cut-SDP) uses $\Omega(n^{2})$ memory. We, however, show how to extend the approach from [28] to Max-k-Cut by proposing an alternate SDP relaxation for the problem, and combining it with the FJ rounding scheme to generate a $k$-cut with nearly the same approximation guarantee as stated in (1.1) (see Proposition 1.1) using $\mathcal{O}(n+|E|)$ memory. A key highlight of our approach is that while the approximation ratio remains close to the state-of-the-art result in (1.1), degrading only by a factor of $1-5\epsilon$ for $\epsilon\in(0,1/5)$, the memory used is independent of $\epsilon$. We summarize our result as follows.

Proposition 1.1.

For $\epsilon\in(0,1/5)$, our $\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)$-time method outlined in Section 3 uses $\mathcal{O}(n+|E|)$ memory and generates a $k$-cut for the graph $G=(V,E)$ whose expected value satisfies $\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-5\epsilon)\textup{opt}_{k}^{G}$, where $\textup{opt}_{k}^{G}$ is the optimal $k$-cut value.

Gaussian sampling for Max-Agree. The structure of (MA-SDP) is similar to that of (k-Cut-SDP); however, the cost matrix in (MA-SDP) is no longer PSD or diagonally dominant, a property that plays an important role in our analysis in the case of Max-k-Cut. Despite this, we show how to generate a $(1-7\epsilon)0.766$-optimal clustering using $\mathcal{O}(n+|E|)$ memory. Our approach makes a small sacrifice in the approximation ratio (as compared to (1.2)); however, the memory used remains independent of $\epsilon$.

Proposition 1.2.

For $\epsilon\in(0,1/7)$, our $\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)$-time method outlined in Section 4 uses $\mathcal{O}(n+|E|)$ memory and generates a clustering of the graph $G=(V,E)$ whose expected value satisfies $\mathbb{E}[\mathcal{C}]\geq 0.766(1-7\epsilon)\textup{opt}^{G}_{CC}$, where $\textup{opt}^{G}_{CC}$ is the optimal clustering value.

The constructive proofs of Propositions 1.1 and 1.2 are outlined in Sections 3 and 4, respectively.

Memory reduction using graph sparsification. Propositions 1.1 and 1.2 state that the memory used by our approach is $\mathcal{O}(n+|E|)$. However, for dense graphs, the memory used by our method becomes $\Theta(n^{2})$. In this setting, to reduce the memory used, we first need to change the way we access the problem instance. We assume that the input (weighted) graph $G$ arrives edge-by-edge, eliminating the need to store the entire dense graph. We then replace it with a $\tau$-spectrally close graph $\tilde{G}$ (see Definition 5.1) with $\mathcal{O}(n\log n/\tau^{2})$ edges. Next, we compute an approximate solution to the new problem defined on the sparse graph using $\mathcal{O}(n\log n/\tau^{2})$ memory. For Max-k-Cut and Max-Agree, we show that this method generates a solution with provable approximation guarantees.

1.4 Literature review

We first review key low memory algorithms for linearly constrained SDPs.

Burer and Monteiro [9] proposed a nonlinear programming approach which replaces the PSD decision variable with its low-rank factorization in SDPs with $d$ linear constraints. If the selected rank $r$ satisfies $r(r+1)\geq 2d$ and the constraint set is a smooth manifold, then any second-order critical point of the nonconvex problem is a global optimum [8]. Another approach, which requires $\Theta(d+nr)$ working memory, is to first determine (approximately) the subspace in which the (low) rank-$r$ solution to an SDP lies and then solve the problem over the (low) $r$-dimensional subspace [11].

Alternatively, randomized sketching to a low-dimensional subspace is often used as a low-memory alternative to storing a matrix decision variable [35, 32]. Recently, such sketched variables have been used to generate a low-rank approximation of a near-optimal solution to SDPs [38]. The working memory required to compute a near-optimal solution and generate its rank-$r$ approximation using the algorithmic framework proposed by Yurtsever et al. [38] is $\mathcal{O}(d+rn/\zeta)$ for some sketching parameter $\zeta\in(0,1)$. Gaussian sampling-based Frank-Wolfe [28] uses $\mathcal{O}(n+d)$ memory to generate zero-mean Gaussian samples whose covariance represents a near-optimal solution to SDPs with $d$ linear constraints. This eliminates the dependence on the rank of the near-optimal solution or the accuracy to which its low-rank approximation is computed.

However, the two problems considered in this paper have SDP relaxations with $n^{2}$ constraints, for which applying the existing low-memory techniques provides no benefit, since the memory requirement of these techniques depends on the number of constraints in the problem. These problems have been studied extensively in the literature, as we see below.

Max-k-Cut.

Max-k-Cut and its dual Min-k-Partition have applications in frequency allocation [12] and in generating lower bounds on co-channel interference in cellular networks [7]. These problems have been studied extensively in the literature [20, 27, 30]. The SDP-based rounding scheme given in [14] has also been adapted for the similar problems of capacitated Max-k-Cut [16] and approximate graph coloring [21]. In each case, however, the SDP relaxation has $\Omega(n^{2})$ constraints. Alternative heuristic methods have been proposed in [26, 24, 13, 17]; however, these methods generate a feasible cut which only provides a lower bound on the optimal cut value.

Correlation clustering.

Swamy [31] and Charikar et al. [10] each provide 0.766-approximation schemes for Max-Agree which involve solving (MA-SDP). For large-scale applications, data streaming techniques have been studied quite extensively for various clustering problems, such as $k$-means and $k$-median [3, 25]. Ahn et al. [2] propose a single-pass, $\mathcal{\tilde{O}}(|E|+n\epsilon^{-10})$-time $0.766(1-\epsilon)$-approximation algorithm for Max-Agree that uses $\mathcal{\tilde{O}}(n/\epsilon^{2})$ memory. In contrast, to achieve the same approximation guarantee, our approach uses $\mathcal{O}\left(n+\min\left\{|E|,\frac{n\log n}{\epsilon^{2}}\right\}\right)$ memory, which is equal to $\mathcal{O}(n+|E|)$ for sparse graphs and is independent of $\epsilon$. Furthermore, the computational complexity of our approach has a better dependence on $\epsilon$, given by $\mathcal{O}\left(\frac{n^{2.5}}{\epsilon^{2.5}}\min\{|E|,\frac{n\log n}{\epsilon^{2}}\}^{1.25}\log(n/\epsilon)\log(|E|)\right)$, which is at most $\mathcal{O}\left(\frac{n^{3.75}}{\epsilon^{5}}(\log n)^{1.25}\log(n/\epsilon)\log(|E|)\right)$. Moreover, our approach is algorithmically simple to implement.

1.5 Outline

In Section 2, we review the Gaussian sampling-based Frank-Wolfe method [28] for computing a near-feasible, near-optimal solution to SDPs with linear equality and inequality constraints. In Sections 3 and 4, we adapt the Gaussian sampling-based approach to give approximation algorithms for Max-k-Cut and Max-Agree, respectively, that use only $\mathcal{O}(n+|E|)$ memory, proving Propositions 1.1 and 1.2. In Section 5, we show how to combine our methods with streaming spectral sparsification to reduce the memory required to $\mathcal{O}(n\log n/\epsilon^{2})$ for dense graphs presented edge-by-edge in a stream. We provide some preliminary computational results for Max-Agree in Section 6, and conclude our work and discuss possible future directions in Section 7. All proofs are deferred to the appendix.

Notations.

The matrix inner product is denoted by $\left\langle A,B\right\rangle=\textup{Tr}\left(A^{T}B\right)$. The vector of diagonal entries of a matrix $X$ is $\textup{diag}(X)$, and $\textup{diag}^{*}(x)$ is a diagonal matrix with the vector $x$ on the diagonal. The notations $\mathcal{O},\Omega,\Theta$ have the usual complexity interpretation and $\tilde{\mathcal{O}}$ suppresses the dependence on $\log n$. An undirected edge $(i,j)$ in the set $E$ is denoted by $(i,j)\in E$ and $ij\in E$ interchangeably.

2 Gaussian Sampling-based Frank-Wolfe

Consider a smooth, concave function $g$ and define the trace-constrained SDP

\max_{X\in\mathcal{S}}\quad g(\mathcal{B}(X)), \qquad (BoundedSDP)

where $\mathcal{S}=\{X:\textup{Tr}(X)\leq\alpha,\ X\succeq 0\}$ and $\mathcal{B}(\cdot):\mathbb{S}^{n}\rightarrow\mathbb{R}^{d}$ is a linear mapping that projects the variable from the $\binom{n+1}{2}$-dimensional space to a $d$-dimensional space. One algorithmic approach to solving (BoundedSDP) is to use the Frank-Wolfe algorithm [18] which, in this case, computes an $\epsilon$-optimal solution by taking steps of the form $X_{t+1}=(1-\gamma_{t})X_{t}+\gamma_{t}\alpha h_{t}h_{t}^{T}$, where $\gamma_{t}\in[0,1]$ and the unit vectors $h_{t}$ arise from approximately solving a symmetric eigenvalue problem that depends only on $\mathcal{B}(X_{t})$ and $g(\cdot)$. Standard convergence results show that an $\epsilon$-optimal solution is reached after $\mathcal{O}(C^{u}_{g}/\epsilon)$ iterations, where $C_{g}^{u}$ is an upper bound on the curvature constant of $g$ [18].
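The only nontrivial subproblem is the approximate eigenvalue computation producing $h_{t}$. Below is a hedged sketch using plain power iteration; the complexity results in this paper are stated for Lanczos-type solvers [22], and in general a shift of the matrix may be needed so that the largest eigenvalue dominates in magnitude:

import numpy as np

def approx_top_eigvec(matvec, n, iters=200, rng=None):
    # Approximates the leading eigenvector of a symmetric matrix J that is available only
    # through matvec(x) = J @ x, so J is never formed or stored densely.
    rng = np.random.default_rng() if rng is None else rng
    h = rng.standard_normal(n)
    h /= np.linalg.norm(h)
    for _ in range(iters):
        h = matvec(h)
        h /= np.linalg.norm(h)
    return h, h @ matvec(h)   # unit vector and its Rayleigh quotient <h h^T, J>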

Frank-Wolfe with Gaussian sampling.

The Gaussian sampling technique of [28] replaces the matrix-valued iterates $X_{t}$ with Gaussian random vectors $z_{t}\sim\mathcal{N}(0,X_{t})$. The update, at the level of samples, is then $z_{t+1}=\sqrt{1-\gamma_{t}}\,z_{t}+\sqrt{\gamma_{t}\alpha}\,\zeta_{t}h_{t}$, where $\zeta_{t}\sim\mathcal{N}(0,1)$. Note that $z_{t+1}$ is also a zero-mean Gaussian random vector with covariance equal to $X_{t+1}=(1-\gamma_{t})X_{t}+\gamma_{t}\alpha h_{t}h_{t}^{T}$. Furthermore, to track the change in the objective function value, it is sufficient to track the value $v_{t}=\mathcal{B}(X_{t})$ and compute $v_{t+1}=(1-\gamma_{t})v_{t}+\gamma_{t}\mathcal{B}(\alpha h_{t}h_{t}^{T})$. Thus, computing the updates to the decision variable and tracking the objective function value only requires knowledge of $z_{t}\sim\mathcal{N}(0,X_{t})$ and $\mathcal{B}(X_{t})$, which can be updated without explicitly storing $X_{t}$, thereby reducing the memory used.
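One Frank-Wolfe step, expressed at the level of a single Gaussian sample and of $v_{t}=\mathcal{B}(X_{t})$, then looks as follows (a minimal sketch; computing $q=\mathcal{B}(\alpha hh^{T})$ is problem-specific):

import numpy as np

def update_sample(z, v, h, q, gamma, alpha, rng=None):
    # z ~ N(0, X_t), v = B(X_t); h is the unit vector returned by the LMO, q = B(alpha*h*h^T).
    rng = np.random.default_rng() if rng is None else rng
    zeta = rng.standard_normal()
    z_new = np.sqrt(1.0 - gamma) * z + np.sqrt(gamma * alpha) * zeta * h
    v_new = (1.0 - gamma) * v + gamma * q
    return z_new, v_new   # z_new ~ N(0, (1 - gamma) X_t + gamma * alpha * h h^T)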

Algorithm 1 [28] describes, in detail, the Frank-Wolfe algorithm with Gaussian sampling when applied to (BoundedSDP). It uses at most $\mathcal{O}(n+d)$ memory at each iteration, and after at most $\mathcal{O}(C_{g}^{u}/\epsilon)$ iterations, returns a sample $\widehat{z}_{\epsilon}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$, where $\widehat{X}_{\epsilon}$ is an $\epsilon$-optimal solution to (BoundedSDP).

Input: Input data for (BoundedSDP), stopping criterion $\epsilon$, accuracy parameter $\eta$, upper bound $C_{g}^{u}$ on the curvature constant, probability $p$ for the subproblem LMO
Output: $z\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$ and $v=\mathcal{B}(\widehat{X}_{\epsilon})$, where $\widehat{X}_{\epsilon}$ is an $\epsilon$-optimal solution of (BoundedSDP)

Function FWGaussian:
    Select initial point $X_{0}\in\mathcal{S}$; set $v_{0}\leftarrow\mathcal{B}(X_{0})$ and sample $z_{0}\sim\mathcal{N}(0,X_{0})$
    $t\leftarrow 0$, $\gamma\leftarrow 2/(t+2)$
    $(h_{t},q_{t})\leftarrow\textnormal{LMO}(\mathcal{B}^{*}(\nabla g(v_{t})),\ \tfrac{1}{2}\eta\gamma C_{g}^{u},\ p)$
    while $\left\langle q_{t}-v_{t},\nabla g(v_{t})\right\rangle>\epsilon$ do
        $(z_{t+1},v_{t+1})\leftarrow\textnormal{UpdateVariable}(z_{t},v_{t},h_{t},q_{t},\gamma)$
        $t\leftarrow t+1$, $\gamma\leftarrow 2/(t+2)$
        $(h_{t},q_{t})\leftarrow\textnormal{LMO}(\mathcal{B}^{*}(\nabla g(v_{t})),\ \tfrac{1}{2}\eta\gamma C_{g}^{u},\ p)$
    end while
    return $(z_{t},v_{t})$

Function LMO($J$, $\delta$, $p$):
    Find a unit vector $h$ such that, with probability at least $1-p$, $\alpha\lambda=\alpha\langle hh^{T},J\rangle\geq\max_{d\in\mathcal{S}}\alpha\langle d,J\rangle-\delta$
    if $\lambda\geq 0$ then $q\leftarrow\mathcal{B}(\alpha hh^{T})$ else $q\leftarrow 0$, $h\leftarrow 0$
    return $(h,q)$

Function UpdateVariable($z,v,h,q,\gamma$):
    $z\leftarrow\sqrt{1-\gamma}\,z+\sqrt{\gamma\alpha}\,h\zeta$, where $\zeta\sim\mathcal{N}(0,1)$
    $v\leftarrow(1-\gamma)v+\gamma q$
    return $(z,v)$

Algorithm 1 (FWGaussian): Frank-Wolfe Algorithm with Gaussian Sampling [28]

2.1 SDP with linear equality and inequality constraints

Consider an SDP with linear objective function and a bounded feasible region,

\max_{X\succeq 0}\quad\langle C,X\rangle\quad\textup{subject to}\quad\begin{cases}\mathcal{A}^{(1)}(X)=b^{(1)}\\ \mathcal{A}^{(2)}(X)\geq b^{(2)},\end{cases} \qquad (SDP)

where $\mathcal{A}^{(1)}(\cdot):\mathbb{S}^{n}_{+}\rightarrow\mathbb{R}^{d_{1}}$ and $\mathcal{A}^{(2)}(\cdot):\mathbb{S}^{n}_{+}\rightarrow\mathbb{R}^{d_{2}}$ are linear maps. To use Algorithm 1, the linear constraints are penalized using a smooth penalty function. Let $u_{l}=\langle A^{(1)}_{l},X\rangle-b^{(1)}_{l}$ for $l=1,\dotsc,d_{1}$ and $v_{l}=b^{(2)}_{l}-\langle A^{(2)}_{l},X\rangle$ for $l=1,\dotsc,d_{2}$. For $M>0$, the smooth function $\phi_{M}(\cdot):\mathbb{R}^{d_{1}+d_{2}}\rightarrow\mathbb{R}$,

\phi_{M}(u,v)=\frac{1}{M}\log\left(\sum_{i=1}^{d_{1}}e^{Mu_{i}}+\sum_{i=1}^{d_{1}}e^{-Mu_{i}}+\sum_{j=1}^{d_{2}}e^{Mv_{j}}\right),\quad\textup{satisfies} \qquad (2.1)
\max\left\{\|u\|_{\infty},\max_{i}v_{i}\right\}\leq\phi_{M}(u,v)\leq\frac{\log(2d_{1}+d_{2})}{M}+\max\left\{\|u\|_{\infty},\max_{i}v_{i}\right\}. \qquad (2.2)
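For completeness, the penalty (2.1) can be evaluated in $\mathcal{O}(d_{1}+d_{2})$ time and memory with the usual log-sum-exp stabilization; a short sketch (names ours):

import numpy as np

def phi_M(u, v, M):
    # Smooth penalty (2.1): (1/M) * log( sum exp(M*u_i) + sum exp(-M*u_i) + sum exp(M*v_j) ),
    # computed stably by factoring out the largest exponent.
    t = np.concatenate((M * np.asarray(u), -M * np.asarray(u), M * np.asarray(v)))
    tmax = t.max()
    return (tmax + np.log(np.exp(t - tmax).sum())) / M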

We add this penalty term to the objective of (SDP) and define

\max_{X\succeq 0}\left\{\langle C,X\rangle-\beta\phi_{M}(\mathcal{A}^{(1)}(X)-b^{(1)},\,b^{(2)}-\mathcal{A}^{(2)}(X))\ :\ \textup{Tr}(X)\leq\alpha\right\}, \qquad (SDP-LSE)

where $\alpha$, $\beta$ and $M$ are appropriately chosen parameters. Algorithm 1 then generates a Gaussian sample with covariance $\widehat{X}_{\epsilon}$, which is an $\epsilon$-optimal solution to (SDP-LSE). It is also a near-optimal, near-feasible solution to (SDP). This result is a slight modification of [28, Lemma 3.2], which only provides bounds for SDPs with linear equality constraints.

Lemma 2.1.

For $\epsilon>0$, let $(X^{\star},\vartheta^{\star},\mu^{\star})$ be an optimal primal-dual solution to (SDP) and its dual, and let $\widehat{X}_{\epsilon}$ be an $\epsilon$-optimal solution to (SDP-LSE). If $\beta>\|\vartheta^{\star}\|_{1}+\|\mu^{\star}\|_{1}$ and $M>0$, then

\langle C,X^{\star}\rangle-\frac{\beta\log(2d_{1}+d_{2})}{M}-\epsilon\leq\langle C,\widehat{X}_{\epsilon}\rangle\leq\langle C,X^{\star}\rangle+(\|\vartheta^{\star}\|_{1}+\|\mu^{\star}\|_{1})\frac{\beta\frac{\log(2d_{1}+d_{2})}{M}+\epsilon}{\beta-\|\vartheta^{\star}\|_{1}-\|\mu^{\star}\|_{1}}, \qquad (2.3)
\max\left\{\|\mathcal{A}^{(1)}(\widehat{X}_{\epsilon})-b^{(1)}\|_{\infty},\ \max_{i}\left(b^{(2)}_{i}-\mathcal{A}^{(2)}_{i}(\widehat{X}_{\epsilon})\right)\right\}\leq\frac{\beta\frac{\log(2d_{1}+d_{2})}{M}+\epsilon}{\beta-\|\vartheta^{\star}\|_{1}-\|\mu^{\star}\|_{1}}. \qquad (2.4)

3 Application of Gaussian Sampling to (k-Cut-SDP)

In this section, we look at the application of Gaussian sampling to Max-k-Cut. Since Algorithm 1 uses $\mathcal{O}(n^{2})$ memory when solving (k-Cut-SDP), we define a new SDP relaxation of Max-k-Cut with the same approximation guarantee, but with $\mathcal{O}(|E|)$ constraints. We then apply Algorithm 1 to this new relaxation, and show how to round the solution to achieve nearly the same approximation ratio as given in (1.1). Let

\alpha_{k}=\min_{-1/(k-1)\leq\rho\leq 1}\frac{kp(\rho)}{(k-1)(1-\rho)}, \qquad (3.1)

where $p(X_{ij})$ is the probability that vertices $i$ and $j$ are in different partitions. If $X$ is feasible for (k-Cut-SDP) and CUT is the value of the $k$-cut generated by the FJ rounding scheme, then

\mathbb{E}[\texttt{CUT}]=\sum_{ij\in E,\,i<j}w_{ij}\,p(X_{ij})\geq\sum_{ij\in E,\,i<j}\frac{k-1}{k}w_{ij}(1-X_{ij})\alpha_{k}=\alpha_{k}\langle C,X\rangle. \qquad (3.2)

Frieze and Jerrum [14] derive a lower bound on $\alpha_{k}$, showing that the method gives a nontrivial approximation guarantee. Observe that (3.2) depends only on the values $X_{ij}$ for $(i,j)\in E$.

A new SDP relaxation of Max-k-Cut.

We relax the constraints in (k-Cut-SDP) to define

\max_{X\succeq 0}\quad\langle C,X\rangle\quad\textup{subject to}\quad\begin{cases}\textup{diag}(X)=\mathbbm{1}\\ X_{ij}\geq-\frac{1}{k-1}\quad(i,j)\in E,\,i<j.\end{cases} \qquad (k-Cut-Rel)

Since (k-Cut-Rel) is a relaxation of (k-Cut-SDP), its optimal objective function value provides an upper bound on $\langle C,X^{\star}\rangle$, where $X^{\star}$ is an optimal solution to (k-Cut-SDP), and hence, on the optimal $k$-cut value $\textup{opt}_{k}^{G}$. Note that the bound in (3.2) holds true even if we replace $X^{\star}$ by an optimal solution to (k-Cut-Rel), since it depends on the value of $X_{ij}$ only if $(i,j)\in E$. Furthermore, when the FJ rounding scheme is applied to the solution of (k-Cut-Rel), it satisfies the approximation guarantee on the expected value of the generated $k$-cut given in (1.1), i.e., $\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}\textup{opt}_{k}^{G}$.

Using Algorithm 1.

We now have an SDP relaxation of Max-k-Cut that has $n+|E|$ constraints. Penalizing the linear constraints in (k-Cut-Rel) using the function $\phi_{M}(\cdot)$ (2.1), Algorithm 1 can now be used to generate $k$ samples with covariance $\widehat{X}_{\epsilon}$, which is an $\epsilon$-optimal solution to

\max_{X\succeq 0}\left\{\langle C,X\rangle-\beta\phi_{M}\left(\textup{diag}(X)-\mathbbm{1},\,-\tfrac{1}{k-1}-e_{i}^{T}Xe_{j}\right):(i,j)\in E,\ \textup{Tr}(X)\leq n\right\}. \qquad (k-Cut-LSE)
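The matrix $\mathcal{B}^{*}(\nabla g(v_{t}))$ passed to the LMO for (k-Cut-LSE) is supported only on the diagonal and on the edges in $E$ (the cost $C$ is a scaled Laplacian, and the penalty gradient only touches constrained entries), so the eigenvector computation can be run with matrix-vector products that never form an $n\times n$ matrix. A hedged sketch of such a product (data layout ours):

import numpy as np

def sparse_symmetric_matvec(diag_vals, edges, edge_vals, x):
    # Computes y = J x for a symmetric J with J_ii = diag_vals[i], J_ij = J_ji = edge_vals[e]
    # for the e-th edge (i, j) in `edges`, and zeros elsewhere; uses O(n + |E|) memory.
    y = diag_vals * x
    for (i, j), w in zip(edges, edge_vals):
        y[i] += w * x[j]
        y[j] += w * x[i]
    return y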

Optimality and feasibility results for (k-Cut-Rel).

We now show that an ϵ\epsilon-optimal solution to (k-Cut-LSE) is also a near-optimal, near-feasible solution to (k-Cut-Rel).

Lemma 3.1.

For $\epsilon\in(0,1/2)$, let $X^{\star}_{R}$ be an optimal solution to (k-Cut-Rel) and let $\widehat{X}_{\epsilon}$ be an $\epsilon\textup{Tr}(C)$-optimal solution to (k-Cut-LSE). For $\beta=6\textup{Tr}(C)$ and $M=6\frac{\log(2n+|E|)}{\epsilon}$, we have

(1-2\epsilon)\langle C,X^{\star}_{R}\rangle\leq\langle C,\widehat{X}_{\epsilon}\rangle\leq(1+4\epsilon)\langle C,X^{\star}_{R}\rangle\quad\textrm{and} \qquad (3.3)
\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty}\leq\epsilon,\qquad[\widehat{X}_{\epsilon}]_{ij}\geq-\frac{1}{k-1}-\epsilon,\quad(i,j)\in E,\,i<j. \qquad (3.4)

Generating a feasible solution to Max-k-Cut.

Since $\widehat{X}_{\epsilon}$ might not necessarily be feasible for (k-Cut-Rel), we cannot apply the FJ rounding scheme to the samples $z_{i}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$. We, therefore, generate samples $z^{f}_{i}\sim\mathcal{N}(0,X^{f})$ using the procedure given in Algorithm 2, where $X^{f}$ is a feasible solution to (k-Cut-Rel) and $\langle C,X^{f}\rangle$ is close to $\langle C,\widehat{X}_{\epsilon}\rangle$.

Input: Samples $z_{i}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$ for $i=1,\dotsc,k$ and $\textup{diag}(\widehat{X}_{\epsilon})$
Output: $z^{f}_{i}\sim\mathcal{N}(0,X^{f})$ for $i=1,\dotsc,k$ with $X^{f}$ feasible for (k-Cut-Rel)

1 Function GeneratekSamples:
2     for $i=1,\dotsc,k$ do
3         Set $\textup{err}=\max\{0,\ \max_{(i,j)\in E,\,i<j}\ -1/(k-1)-[\widehat{X}_{\epsilon}]_{ij}\}$
4         Set $\overline{z}_{i}=z_{i}+\sqrt{\textup{err}}\,y\,\mathbbm{1}$, where $y\sim\mathcal{N}(0,1)$
5         Generate $\zeta\sim\mathcal{N}\left(0,\ I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right)\right)$
6         Set $z^{f}_{i}=\frac{\overline{z}_{i}}{\sqrt{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}}+\zeta$
7     end for
8     return $z^{f}_{1},\dotsc,z^{f}_{k}$

Algorithm 2: Generate Gaussian samples with covariance feasible for (k-Cut-Rel)
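A direct NumPy transcription of Algorithm 2 is sketched below; the edge values $[\widehat{X}_{\epsilon}]_{ij}$ for $(i,j)\in E$ and $\textup{diag}(\widehat{X}_{\epsilon})$ are exactly the quantities tracked as $\mathcal{B}(\widehat{X}_{\epsilon})$ by Algorithm 1 (argument names ours):

import numpy as np

def generate_k_samples(Z, diag_X, edge_vals, k_parts, rng=None):
    # Z: k x n array with rows z_i ~ N(0, X_hat); diag_X = diag(X_hat);
    # edge_vals: array of [X_hat]_ij over the edges (i, j) in E; k_parts = k in Max-k-Cut.
    rng = np.random.default_rng() if rng is None else rng
    err = max(0.0, np.max(-1.0 / (k_parts - 1) - np.asarray(edge_vals)))
    m = diag_X.max() + err
    Zf = np.empty_like(Z)
    for i in range(Z.shape[0]):
        z_bar = Z[i] + np.sqrt(err) * rng.standard_normal()   # adds sqrt(err) * y to every coordinate
        var = 1.0 - (diag_X + err) / m                         # diagonal covariance of zeta
        zeta = rng.standard_normal(Z.shape[1]) * np.sqrt(var)
        Zf[i] = z_bar / np.sqrt(m) + zeta
    return Zf   # rows are z_i^f ~ N(0, X^f) with X^f feasible for (k-Cut-Rel)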

We can now apply the FJ rounding scheme to $z^{f}_{1},\dotsc,z^{f}_{k}$, as stated in Lemma 3.2.

Lemma 3.2.

For $G=(V,E)$, let $\textup{opt}_{k}^{G}$ be the optimal $k$-cut value and let $X_{R}^{\star}$ be an optimal solution to (k-Cut-Rel). For $\epsilon\in\left(0,\frac{1}{4}\right)$, let $\widehat{X}_{\epsilon}\succeq 0$ satisfy (3.3) and (3.4). Let $z^{f}_{1},\dotsc,z^{f}_{k}$ be the random vectors generated by Algorithm 2 with input $z_{1},\dotsc,z_{k}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$, and let CUT denote the value of a $k$-cut generated by applying the FJ rounding scheme to $z^{f}_{1},\dotsc,z^{f}_{k}$. For $\alpha_{k}$ as defined in (3.1), we have

\alpha_{k}(1-4\epsilon)\textup{opt}_{k}^{G}\leq\alpha_{k}(1-4\epsilon)\langle C,X_{R}^{\star}\rangle\leq\mathbb{E}[\texttt{CUT}]\leq\textup{opt}_{k}^{G}. \qquad (3.5)

Computational complexity of Algorithm 1 when applied to (k-Cut-LSE).

Finally, in Lemma 3.3, we provide the computational complexity of the method proposed in this section, which concludes the proof of Proposition 1.1.

Lemma 3.3.

When the method proposed in this section (Section 3), with $p=\frac{\epsilon}{T(n,\epsilon)}$ and $T(n,\epsilon)=\frac{144\log(2n+|E|)n^{2}}{\epsilon^{2}}$, is used to generate an approximate $k$-cut for Max-k-Cut, the generated cut satisfies $\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-5\epsilon)\textup{opt}_{k}^{G}$ and the method runs in $\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)$ time.

4 Application of Gaussian Sampling to (MA-SDP)

We now look at the application of our Gaussian sampling-based method to Max-Agree. Algorithm 1 uses $\mathcal{O}(n^{2})$ memory to generate samples whose covariance is an $\epsilon$-optimal solution to (MA-SDP). However, with a similar observation as in the case of Max-k-Cut, we note that for any $X$ feasible for (MA-SDP), the proof of the inequality $\mathbb{E}[\mathcal{C}]\geq 0.766\langle C,X\rangle$, given in [10, Theorem 3], requires $X_{ij}\geq 0$ only if $(i,j)\in E$. We therefore write a new relaxation of (MA-SDP),

\max_{X\succeq 0}\ \langle C,X\rangle=\langle W^{+}+L_{G^{-}},X\rangle\quad\textup{subject to}\quad\begin{cases}X_{ii}=1\quad\forall\ i\in\{1,\dotsc,n\}\\ X_{ij}\geq 0\quad(i,j)\in E,\,i<j,\end{cases} \qquad (MA-Rel)

with only $n+|E|$ constraints. The bound $\mathbb{E}[\mathcal{C}]\geq 0.766\langle C,X^{\star}\rangle\geq 0.766\,\textup{opt}_{CC}^{G}$ on the expected value of the clustering holds even if the clustering is generated by applying the CGW rounding scheme to an optimal solution $X^{\star}$ of (MA-Rel). To use Algorithm 1, we penalize the constraints in (MA-Rel) and define

\max_{X\succeq 0}\left\{\langle C,X\rangle-\beta\phi_{M}(\textup{diag}(X)-\mathbbm{1},\,-e_{i}^{T}Xe_{j}):(i,j)\in E,\ \textup{Tr}(X)\leq n\right\}. \qquad (MA-LSE)

Optimality and feasibility results for (MA-Rel).

Algorithm 1 is now used to generate $z\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$, where $\widehat{X}_{\epsilon}$ is an $\epsilon$-optimal solution to (MA-LSE). We show in Lemma 4.1 that $\widehat{X}_{\epsilon}$ is also a near-optimal, near-feasible solution to (MA-Rel).

Lemma 4.1.

For $\Delta=\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}$ and $\epsilon\in\left(0,\frac{1}{4}\right)$, let $X^{\star}_{G}$ be an optimal solution to (MA-Rel) and $\widehat{X}_{\epsilon}$ be an $\epsilon\Delta$-optimal solution to (MA-LSE). Setting $\beta=4\Delta$ and $M=4\frac{\log(2n+|E|)}{\epsilon}$, we have

(1-4\epsilon)\langle C,X^{\star}_{G}\rangle\leq\langle C,\widehat{X}_{\epsilon}\rangle\leq(1+4\epsilon)\langle C,X^{\star}_{G}\rangle\quad\textup{and} \qquad (4.1)
\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty}\leq\epsilon,\qquad[\widehat{X}_{\epsilon}]_{ij}\geq-\epsilon,\quad(i,j)\in E,\,i<j. \qquad (4.2)

Generating an approximate clustering.

The CGW rounding scheme can only be applied if we have a feasible solution to (MA-Rel). We, therefore, use a modified version of Algorithm 2, with Step 3 replaced by $\textup{err}=\max\{0,\ \max_{(i,j)\in E,\,i<j}\ -[\widehat{X}_{\epsilon}]_{ij}\}$ and input $z_{1},z_{2},z_{3}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$, to generate zero-mean Gaussian samples whose covariance is a feasible solution to (MA-Rel). Finally, we apply the CGW rounding scheme to the output of this modified version of Algorithm 2.

Lemma 4.2.

Let $X^{\star}_{G}$ be an optimal solution to (MA-Rel). For $\epsilon\in\left(0,1/6\right)$, let $\widehat{X}_{\epsilon}\succeq 0$ satisfy (4.1) and (4.2), and let $z^{f}_{1},z^{f}_{2},z^{f}_{3}$ be the random vectors generated by Algorithm 2 with input $z_{1},z_{2},z_{3}\sim\mathcal{N}(0,\widehat{X}_{\epsilon})$. Let $\textup{opt}_{CC}^{G}$ denote the optimal clustering value for the graph $G=(V,E)$ and let $\mathcal{C}$ denote the value of the clustering generated from the random vectors $z^{f}_{1},z^{f}_{2},z^{f}_{3}$ using the CGW rounding scheme. Then

\mathbb{E}[\mathcal{C}]\geq 0.766(1-6\epsilon)\langle C,X^{\star}_{G}\rangle\geq 0.766(1-6\epsilon)\textup{opt}_{CC}^{G}. \qquad (4.3)

Computational complexity of Algorithm 1 when applied to (MA-LSE).

Lemma 4.3.

When the method proposed in this section (Section 4), with $p=\frac{\epsilon}{T(n,\epsilon)}$ and $T(n,\epsilon)=\frac{64\log(2n+|E|)n^{2}}{\epsilon^{2}}$, is used to generate an approximate clustering, the value of the clustering satisfies $\mathbb{E}[\mathcal{C}]\geq 0.766(1-7\epsilon)\textup{opt}_{CC}^{G}$ and the method runs in $\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)$ time.

This completes the proof of Proposition 1.2.

5 Sparsifying the Laplacian Cost Matrix

As seen in Sections 3 and 4, the memory requirement for generating and representing an $\epsilon$-optimal solution to (k-Cut-LSE) and (MA-LSE) is bounded by $\mathcal{O}(n+|E|)$. However, if the input graph $G$ is dense, the cost matrix will be dense and the number of inequality constraints will still be high. In this section, we consider the situation in which the dense weighted graph arrives in a stream, and we first build a sparse approximation with similar spectral properties. We refer to this additional step as sparsifying the cost.

Definition 5.1 ($\tau$-spectral closeness).

Two graphs $G$ and $\tilde{G}$ defined on the same set of vertices are said to be $\tau$-spectrally close if, for any $x\in\mathbb{R}^{n}$ and $\tau\in(0,1)$,

(1-\tau)\,x^{T}L_{G}x\leq x^{T}L_{\tilde{G}}x\leq(1+\tau)\,x^{T}L_{G}x. \qquad (5.1)
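Condition (5.1) can be spot-checked numerically on random directions (a necessary but not sufficient test, and only sensible for small graphs since the sketch below forms dense Laplacians); edge-list inputs are assumed:

import numpy as np

def laplacian(n, edges):
    # edges: iterable of (i, j, w) with i != j; returns the weighted graph Laplacian.
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

def looks_tau_spectrally_close(n, edges_G, edges_Gtilde, tau, trials=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    L, Lt = laplacian(n, edges_G), laplacian(n, edges_Gtilde)
    for _ in range(trials):
        x = rng.standard_normal(n)
        a, b = x @ L @ x, x @ Lt @ x
        if not ((1 - tau) * a <= b <= (1 + tau) * a):
            return False
    return True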

Spectral graph sparsification has been studied quite extensively (see, e.g., [29, 1, 15]). Kyng et al. [23] propose an $\mathcal{O}(|E|\log^{2}n)$-time framework to replace a dense graph $G=(V,E)$ by a sparser graph $\tilde{G}=(V,\tilde{E})$ such that $|\tilde{E}|\sim\mathcal{O}(n\log n/\tau^{2})$ and $\tilde{G}$ satisfies (5.1) with probability $1-\frac{1}{\textup{poly}(n)}$. Their algorithm assumes that the edges of the graph arrive one at a time, so that the total memory requirement is $\mathcal{O}(n\log n/\tau^{2})$ rather than $\mathcal{O}(|E|)$. Furthermore, a sparse cost matrix decreases the computation time of the subproblem in Algorithm 1, since it involves matrix-vector multiplication with the gradient of the cost.

Max-k-Cut with sparsification.

Let $\tilde{G}$ be a sparse graph with $\mathcal{O}(n\log n/\tau^{2})$ edges that is $\tau$-spectrally close to the input graph $G$. By applying the method outlined in Section 3, we can generate a $k$-cut for the graph $\tilde{G}$ (using $\mathcal{O}(n\log n/\tau^{2})$ memory) whose expected value satisfies the bound (3.5). Note that this generated cut is also a $k$-cut for the original graph $G$ with a provable approximation guarantee, as shown in Lemma 5.1.

Lemma 5.1.

For $\epsilon,\tau\in(0,1/5)$, let $\widehat{X}_{\epsilon}$ be a near-feasible, near-optimal solution to (k-Cut-Rel) defined on the graph $\tilde{G}$ that satisfies (3.3) and (3.4). Let CUT denote the value of the $k$-cut generated by applying Algorithm 2 followed by the FJ rounding scheme to $\widehat{X}_{\epsilon}$. Then the generated cut satisfies

\alpha_{k}(1-4\epsilon-\tau)\textup{opt}_{k}^{G}\leq\mathbb{E}[\texttt{CUT}]\leq\textup{opt}_{k}^{G}, \qquad (5.2)

where $\textup{opt}_{k}^{G}$ is the value of the optimal $k$-cut for the original graph $G$.

Max-Agree with sparsification.

The numbers of edges $|E^{+}|$ and $|E^{-}|$ in the graphs $G^{+}$ and $G^{-}$ respectively determine the working memory of Algorithm 1. For dense input graphs $G^{+}$ and $G^{-}$, we sparsify them to generate graphs $\tilde{G}^{+}$ and $\tilde{G}^{-}$ with at most $\mathcal{O}(n\log n/\tau^{2})$ edges and define

\max_{X\in\mathcal{S}}\ \tilde{f}(X)=\langle L_{\tilde{G}^{-}}+\tilde{W}^{+},X\rangle-\beta\phi_{M}(\textup{diag}(X)-\mathbbm{1},\,-[e_{i}^{T}Xe_{j}]_{(i,j)\in\tilde{E}}), \qquad (MA-Sparse)

where $\mathcal{S}=\{X:\textup{Tr}(X)\leq n,\ X\succeq 0\}$, $L_{\tilde{G}^{-}}$ is the Laplacian of the graph $\tilde{G}^{-}$, $\tilde{W}^{+}$ is a matrix with nonnegative entries denoting the weight of each edge $(i,j)\in\tilde{E}^{+}$, and $\tilde{E}=\tilde{E}^{+}\cup\tilde{E}^{-}$. Algorithm 1 then generates an $\epsilon(\textup{Tr}(L_{\tilde{G}^{-}})+\sum_{ij\in\tilde{E}^{+}}\tilde{w}^{+}_{ij})$-optimal solution, $\widehat{X}_{\epsilon}$, to (MA-Sparse) using $\mathcal{O}(n\log n/\tau^{2})$ memory. We can now use the method given in Section 4 to generate a clustering of the graph $\tilde{G}$ whose expected value, $\mathbb{E}[\mathcal{C}]$, satisfies (4.3). The following lemma shows that $\mathcal{C}$ also represents a clustering of the original graph $G$ with provable guarantees.

Lemma 5.2.

For $\epsilon,\tau\in(0,1/9)$, let $\widehat{X}_{\epsilon}$ be a near-feasible, near-optimal solution to (MA-Sparse) defined on the graph $\tilde{G}$ that satisfies (4.1) and (4.2). Let $\mathcal{C}$ denote the value of the clustering generated by applying Algorithm 2 followed by the CGW rounding scheme to $\widehat{X}_{\epsilon}$. Then $\mathbb{E}[\mathcal{C}]$ satisfies

0.766(1-6\epsilon-3\tau)(1-\tau^{2})\textup{opt}_{CC}^{G}\leq\mathbb{E}[\mathcal{C}]\leq\textup{opt}_{CC}^{G}, \qquad (5.3)

where $\textup{opt}_{CC}^{G}$ is the value of the optimal clustering of the original graph $G$.

We summarize our results in the following lemma whose proof is given in Appendix A.10.

Lemma 5.3.

Assume that the edges of the input graph $G=(V,E)$ arrive one at a time in a stream. The procedure given in this section uses at most $\mathcal{O}(n\log n/\tau^{2})$ memory and, in $\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)$ time, generates approximate solutions to Max-k-Cut and Max-Agree that satisfy the bounds $\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-5\epsilon-\tau)\textup{opt}_{k}^{G}$ and $\mathbb{E}[\mathcal{C}]\geq 0.766(1-7\epsilon-3\tau)(1-\tau^{2})\textup{opt}_{CC}^{G}$, respectively.

6 Computational Results

We now discuss the results of preliminary computations to cluster the vertices of a graph $G$ using the approach outlined in Section 4. The aim of the numerical experiments was to verify that the bounds given in Lemma 4.2 were satisfied when we used the procedure outlined in Section 4 to generate a clustering for each input graph. We used graphs from the GSet dataset [36], which is a collection of randomly generated graphs. Note that the aim of correlation clustering is to generate a clustering of vertices for graphs where each edge has a label indicating 'similarity' or 'dissimilarity' of the vertices connected by that edge. We, therefore, first converted the undirected, unweighted graphs from the GSet dataset [36] into instances of graphs with labelled edges using an adaptation of the approach used in [34, 33]. This modified approach generated a label and a weight for each edge $(i,j)\in E$, indicating the amount of 'similarity' or 'dissimilarity' between vertices $i$ and $j$.

Generating input graphs for Max-Agree.

In the process of label generation, we first computed the Jaccard coefficient $J_{ij}=|N(i)\cap N(j)|/|N(i)\cup N(j)|$ for each edge $(i,j)\in E$, where $N(i)$ is the set of neighbours of $i$. Next, we computed the quantity $S_{ij}=\log((1-J_{ij}+\delta)/(1+J_{ij}-\delta))$ with $\delta=0.05$ for each edge $(i,j)\in E$, which is a measure of the amount of 'similarity' or 'dissimilarity'. Finally, the edge $(i,j)$ was labelled as 'dissimilar' with $w^{-}_{ij}=-S_{ij}$ if $S_{ij}<0$, and labelled as 'similar' with $w^{+}_{ij}=S_{ij}$ otherwise.
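A sketch of this label-generation step (function names ours), producing the weighted edge lists $E^{+}$ and $E^{-}$ from an unweighted edge list:

import numpy as np

def label_edges(n, edges, delta=0.05):
    # edges: list of (i, j) pairs of the unweighted input graph.
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j); nbrs[j].add(i)
    E_plus, E_minus = [], []
    for i, j in edges:
        jac = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])   # Jaccard coefficient J_ij
        s = np.log((1 - jac + delta) / (1 + jac - delta))       # score S_ij
        if s < 0:
            E_minus.append((i, j, -s))   # 'dissimilar' edge with weight w_ij^- = -S_ij
        else:
            E_plus.append((i, j, s))     # 'similar' edge with weight w_ij^+ = S_ij
    return E_plus, E_minus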

Experimental Setup.

We set the input parameters to $\epsilon=0.05$, $\Delta=\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}$, $\beta=4\Delta$ and $M=4\frac{\log(2n+|E|)}{\epsilon}$. Using Algorithm 1, (MA-LSE) was solved to $\epsilon\Delta$-optimality and we computed feasible samples using Algorithm 2. Finally, we generated two Gaussian samples and created at most four clusters by applying, for simplicity, the 0.75-rounding scheme proposed by Swamy [31, Theorem 2.1]. The computations were performed using MATLAB R2021a on a machine with 8GB RAM. We recorded the peak memory used by the algorithm using the profiler command in MATLAB.

The computational results for some randomly selected instances from the dataset are given in Table 1; results for the remaining graphs from GSet are provided in Appendix C. First, we observed that for each input graph, the number of LMO iterations required for $\epsilon\Delta$-convergence satisfied the bound given in Proposition 1.2, and the infeasibility of the covariance $\widehat{X}_{\epsilon}$ of the generated samples was less than $\epsilon$, satisfying (4.2). We generated 10 pairs of i.i.d. zero-mean Gaussian samples with covariance $\widehat{X}_{\epsilon}$, and each pair in turn was used to generate a clustering of the input graph using the 0.75-rounding scheme proposed by Swamy [31]. Amongst the 10 clusterings generated for each graph, we picked the clustering with the largest value, denoted by $\mathcal{C}_{\textup{best}}$. Note that $\mathcal{C}_{\textup{best}}\geq\mathbb{E}[\mathcal{C}]\geq 0.75(1-6\epsilon)\langle C,X^{\star}_{G}\rangle\geq 0.75\frac{1-6\epsilon}{1+4\epsilon}\langle C,\widehat{X}_{\epsilon}\rangle$, where the last inequality follows from combining (4.3) with (4.1). Since we were able to compute the values $\mathcal{C}_{\textup{best}}$ and $\langle C,\widehat{X}_{\epsilon}\rangle$, we checked that the weaker bound $\mathcal{C}_{\textup{best}}/\langle C,\widehat{X}_{\epsilon}\rangle=\textup{AR}\geq 0.75(1-6\epsilon)/(1+4\epsilon)$ was satisfied by every input graph when $\epsilon=0.05$.

Table 1 also shows the memory used by our method. Consider the dataset G1, for which the memory used by our method was $1526.35\,\textup{kB}\approx 9.8\times(|V|+|E^{+}|+|E^{-}|)\times 8$, where the factor of 8 reflects that MATLAB requires 8 bytes to store a real number. Similarly, we observed that our method used at most $c\times(|V|+|E^{+}|+|E^{-}|)\times 8$ bytes of memory to generate clusters for the other instances from GSet, where $c\leq 33$ for every instance of the input graph, showing that the memory used was linear in the size of the input graph.

Table 1: Results of generating a clustering of graphs from GSet using the method outlined in Section 4. Here, $\textup{infeas}=\max\{\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty},\ \max_{(i,j)\in E}\max\{0,-[\widehat{X}_{\epsilon}]_{ij}\}\}$, $\textup{AR}=\mathcal{C}_{\textup{best}}/\langle C,\widehat{X}_{\epsilon}\rangle$, and $0.75(1-6\epsilon)/(1+4\epsilon)=0.4375$ for $\epsilon=0.05$.
Dataset  $|V|$  $|E^{+}|$  $|E^{-}|$  # Iterations ($\times 10^{3}$)  infeas  $\langle C,\widehat{X}_{\epsilon}\rangle$  $\mathcal{C}_{\textup{best}}$  AR  Memory required (in kB)
G1 800 2453 16627 669.46 10310^{-3} 849.48 643 0.757 1526.35
G11 800 817 783 397.2 6×1046\times 10^{-4} 3000.3 2080 0.693 448.26
G14 800 3861 797 330.02 8×1048\times 10^{-4} 542.55 469.77 0.866 423.45
G22 2000 115 19849 725.66 10310^{-3} 1792.9 1371.1 0.764 1655.09
G32 2000 2011 1989 571.42 9×1049\times 10^{-4} 7370 4488 0.609 1124
G43 1000 248 9704 501.31 10310^{-3} 803.8 616.05 0.766 654.46
G48 3000 0 6000 9806.22 0.004 599.64 461.38 0.769 736.09
G51 1000 4734 1147 1038.99 0.001 676.21 446.29 0.66 517.09
G55 5000 66 12432 2707.07 0.002 1244.2 901.74 0.724 1281.03
G57 5000 4981 5019 574.5 0.005 18195 10292 0.565 812.78

7 Discussion

In this paper, we proposed a Gaussian sampling-based optimization algorithm to generate approximate solutions to Max-k-Cut and the Max-Agree variant of correlation clustering using $\mathcal{O}\left(n+\min\left\{|E|,\frac{n\log n}{\tau^{2}}\right\}\right)$ memory. The approximation guarantees given in [14, 10, 31] for these problems are based on solving SDP relaxations that have $n^{2}$ constraints. The key observation that led to the low-memory method proposed in this paper is that the approximation guarantees from the literature are preserved for both problems even if we solve weaker SDP relaxations with only $\mathcal{O}(n+|E|)$ constraints. We showed that for Max-k-Cut and the Max-Agree variant of correlation clustering, our approach nearly preserves the quality of the solution as given in [14, 10]. We also implemented the method outlined in Section 4 to generate approximate clusterings of random graphs with provable guarantees. The numerical experiments showed that while the method was simple to implement, it was slow in practice. However, there is scope for improving the convergence rate of our method so that it can potentially be applied to large-scale instances of various real-life applications of clustering.

Extending the low-memory method to solve problems with triangle inequalities.

The known nontrivial approximation guarantees for the sparsest cut problem involve solving an SDP relaxation that has $n^{3}$ triangle inequalities [4]. It would be interesting to see whether it is possible to simplify these SDPs in such a way that they can be combined nicely with memory-efficient algorithms while still maintaining good approximation guarantees.

References

  • Ahn and Guha [2009] Kook Jin Ahn and Sudipto Guha. Graph sparsification in the semi-streaming model. In International Colloquium on Automata, Languages, and Programming, pages 328–338. Springer, 2009.
  • Ahn et al. [2015] KookJin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, and Anthony Wirth. Correlation clustering in data streams. In International Conference on Machine Learning, pages 2237–2246. PMLR, 2015.
  • Ailon et al. [2009] Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni. Streaming k-means approximation. In Proceedings of the Twenty-Third Annual Conference on Neural Information Processing Systems, volume 4, page 2, 2009.
  • Arora et al. [2009] Sanjeev Arora, Satish Rao, and Umesh Vazirani. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2):1–37, 2009.
  • Awasthi et al. [2015] Pranjal Awasthi, Afonso S Bandeira, Moses Charikar, Ravishankar Krishnaswamy, Soledad Villar, and Rachel Ward. Relax, no need to round: Integrality of clustering formulations. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pages 191–200, 2015.
  • Bansal et al. [2004] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine Learning, 56(1-3):89–113, 2004.
  • Borndörfer et al. [1998] Ralf Borndörfer, Andreas Eisenblätter, Martin Grötschel, and Alexander Martin. Frequency assignment in cellular phone networks. Annals of Operations Research, 76:73–93, 1998.
  • Boumal et al. [2016] Nicolas Boumal, Vlad Voroninski, and Afonso Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In Advances in Neural Information Processing Systems, pages 2757–2765, 2016.
  • Burer and Monteiro [2003] Samuel Burer and Renato DC Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95(2):329–357, 2003.
  • Charikar et al. [2005] Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360–383, 2005.
  • Ding et al. [2019] Lijun Ding, Alp Yurtsever, Volkan Cevher, Joel A Tropp, and Madeleine Udell. An optimal-storage approach to semidefinite programming using approximate complementarity. arXiv preprint arXiv:1902.03373, 2019.
  • Eisenblätter et al. [2002] Andreas Eisenblätter, Martin Grötschel, and Arie Koster. Frequency planning and ramifications of coloring. Discussiones Mathematicae Graph Theory, 22(1):51–88, 2002.
  • Feo and Resende [1995] Thomas A Feo and Mauricio GC Resende. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6(2):109–133, 1995.
  • Frieze and Jerrum [1997] Alan Frieze and Mark Jerrum. Improved approximation algorithms for Max-k-Cut and max bisection. Algorithmica, 18(1):67–81, 1997.
  • Fung et al. [2019] Wai-Shing Fung, Ramesh Hariharan, Nicholas JA Harvey, and Debmalya Panigrahi. A general framework for graph sparsification. SIAM Journal on Computing, 48(4):1196–1223, 2019.
  • Gaur et al. [2008] Daya Ram Gaur, Ramesh Krishnamurti, and Rajeev Kohli. The capacitated Max-k-Cut problem. Mathematical Programming, 115(1):65–72, 2008.
  • Ghaddar et al. [2011] Bissan Ghaddar, Miguel F Anjos, and Frauke Liers. A branch-and-cut algorithm based on semidefinite programming for the minimum k-partition problem. Annals of Operations Research, 188(1):155–174, 2011.
  • Jaggi [2013] Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In Proceedings of the 30th International Conference on Machine Learning, pages 427–435, 2013.
  • Journée et al. [2010] Michel Journée, Francis Bach, P-A Absil, and Rodolphe Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5):2327–2351, 2010.
  • Kann et al. [1995] Viggo Kann, Sanjeev Khanna, Jens Lagergren, and Alessandro Panconesi. On the Hardness of Approximating Max-k-Cut and Its Dual. Citeseer, 1995.
  • Karger et al. [1998] David Karger, Rajeev Motwani, and Madhu Sudan. Approximate graph coloring by semidefinite programming. Journal of the ACM (JACM), 45(2):246–265, 1998.
  • Kuczyński and Woźniakowski [1992] Jacek Kuczyński and Henryk Woźniakowski. Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start. SIAM Journal on Matrix Analysis and Applications, 13(4):1094–1122, 1992.
  • Kyng et al. [2017] Rasmus Kyng, Jakub Pachocki, Richard Peng, and Sushant Sachdeva. A framework for analyzing resparsification algorithms. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2032–2043. SIAM, 2017.
  • Ma and Hao [2017] Fuda Ma and Jin-Kao Hao. A multiple search operator heuristic for the Max-k-Cut problem. Annals of Operations Research, 248(1-2):365–403, 2017.
  • McCutchen and Khuller [2008] Richard Matthew McCutchen and Samir Khuller. Streaming algorithms for k-center clustering with outliers and with anonymity. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 165–178. Springer, 2008.
  • Mladenović and Hansen [1997] Nenad Mladenović and Pierre Hansen. Variable neighborhood search. Computers & Operations Research, 24(11):1097–1100, 1997.
  • Newman [2018] Alantha Newman. Complex semidefinite programming and Max-k-Cut. arXiv preprint arXiv:1812.10770, 2018.
  • Shinde et al. [2021] Nimita Shinde, Vishnu Narayanan, and James Saunderson. Memory-efficient structured convex optimization via extreme point sampling. SIAM Journal on Mathematics of Data Science, 3(3):787–814, 2021.
  • Spielman and Teng [2011] Daniel A Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011.
  • Subramanian et al. [2008] Anand Prabhu Subramanian, Himanshu Gupta, Samir R Das, and Jing Cao. Minimum interference channel assignment in multiradio wireless mesh networks. IEEE Transactions on Mobile Computing, 7(12):1459–1473, 2008.
  • Swamy [2004] Chaitanya Swamy. Correlation clustering: maximizing agreements via semidefinite programming. In SIAM Symposium on Discrete Algorithms, volume 4, pages 526–527. Citeseer, 2004.
  • Tropp et al. [2019] Joel A Tropp, Alp Yurtsever, Madeleine Udell, and Volkan Cevher. Streaming low-rank matrix approximation with an application to scientific simulation. SIAM Journal on Scientific Computing, 41(4):A2430–A2463, 2019.
  • Veldt et al. [2019] Nate Veldt, David F Gleich, Anthony Wirth, and James Saunderson. Metric-constrained optimization for graph clustering algorithms. SIAM Journal on Mathematics of Data Science, 1(2):333–355, 2019.
  • Wang et al. [2013] Yubo Wang, Linli Xu, Yucheng Chen, and Hao Wang. A scalable approach for general correlation clustering. In International Conference on Advanced Data Mining and Applications, pages 13–24. Springer, 2013.
  • Woodruff [2014] David P Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014.
  • Ye [2003] Yinyu Ye. GSet random graphs. https://www.cise.ufl.edu/research/sparse/matrices/Gset/, 2003. [accessed April-2021].
  • Yurtsever et al. [2017] Alp Yurtsever, Madeleine Udell, Joel A Tropp, and Volkan Cevher. Sketchy decisions: Convex low-rank matrix optimization with optimal storage. In Artificial Intelligence and Statistics, pages 1188–1196, 2017.
  • Yurtsever et al. [2021] Alp Yurtsever, Joel A Tropp, Olivier Fercoq, Madeleine Udell, and Volkan Cevher. Scalable semidefinite programming. SIAM Journal on Mathematics of Data Science, 3(1):171–200, 2021.

Appendix A Proofs

A.1 Proof of Lemma 2.1

Proof.

Let $\vartheta\in\mathbb{R}^{d_{1}}$ and $\mu\in\mathbb{R}^{d_{2}}$ be the dual variables corresponding to the $d_{1}$ equality constraints and the $d_{2}$ inequality constraints respectively. The dual of (SDP) is

\min_{\vartheta,\mu}\quad\sum_{i=1}^{d_{1}}b^{(1)}_{i}\vartheta_{i}+\sum_{j=1}^{d_{2}}b^{(2)}_{j}\mu_{j}\quad\textup{subject to}\quad\begin{cases}\sum\limits_{i=1}^{d_{1}}\vartheta_{i}A^{(1)}_{i}+\sum\limits_{j=1}^{d_{2}}\mu_{j}A^{(2)}_{j}-C\succeq 0\\ \mu\leq 0,\end{cases} \qquad (DSDP)

where the $A^{(2)}_{j}$'s for $j=1,\dotsc,d_{2}$ are assumed to be symmetric.

Lower bound on the objective.

Let $X^{\star}$ be an optimal solution to (SDP) and let $X_{FW}$ be an optimal solution to (SDP-LSE). For ease of notation, let

u=\mathcal{A}^{(1)}(X)-b^{(1)}\quad\textrm{and}\quad v=b^{(2)}-\mathcal{A}^{(2)}(X), \qquad (A.1)

and define $(\widehat{u}_{\epsilon},\widehat{v}_{\epsilon})$, $(u_{FW},v_{FW})$ and $(u^{\star},v^{\star})$ by substituting $\widehat{X}_{\epsilon}$, $X_{FW}$ and $X^{\star}$ respectively in (A.1). Since $\widehat{X}_{\epsilon}$ is an $\epsilon$-optimal solution to (SDP-LSE) and $X^{\star}$ is a feasible solution to (SDP-LSE), the following holds:

\langle C,\widehat{X}_{\epsilon}\rangle-\beta\phi_{M}(\widehat{u}_{\epsilon},\widehat{v}_{\epsilon})\geq\langle C,X_{FW}\rangle-\beta\phi_{M}(u_{FW},v_{FW})-\epsilon\geq\langle C,X^{\star}\rangle-\beta\phi_{M}(u^{\star},v^{\star})-\epsilon. \qquad (A.2)

We know that $(u^{\star},v^{\star})$ corresponds to a point feasible for (SDP), so that $\phi_{M}(u^{\star},v^{\star})\leq\frac{\log(2d_{1}+d_{2})}{M}$. Now, rearranging the terms, and using the upper bound on $\phi_{M}(u^{\star},v^{\star})$ and the fact that $\phi_{M}(\widehat{u}_{\epsilon},\widehat{v}_{\epsilon})\geq 0$,

\langle C,\widehat{X}_{\epsilon}\rangle\geq\langle C,X^{\star}\rangle-\frac{\beta\log(2d_{1}+d_{2})}{M}-\epsilon. \qquad (A.3)

Upper bound on the objective.

The Lagrangian of (SDP) is L(X,ϑ,μ)=C,Xi=1d1uiϑi+j=1d2vjμjL(X,\vartheta,\mu)=\langle C,X\rangle-\sum_{i=1}^{d_{1}}u_{i}\vartheta_{i}+\sum_{j=1}^{d_{2}}v_{j}\mu_{j}. For a primal-dual optimal pair, (X,ϑ,μX^{\star},\vartheta^{\star},\mu^{\star}), and X^ϵ0\widehat{X}_{\epsilon}\succeq 0, we have that L(X^ϵ,ϑ,μ)L(X,ϑ,μ)L(\widehat{X}_{\epsilon},\vartheta^{\star},\mu^{\star})\leq L(X^{\star},\vartheta^{\star},\mu^{\star}), i.e.,

C,X^ϵi=1d1ϑi[u^ϵ]i+j=1d2μj[v^ϵ]jC,Xi=1d1ϑiui+j=1d2μjvjC,X.\begin{split}\langle C,\widehat{X}_{\epsilon}\rangle-\sum_{i=1}^{d_{1}}\vartheta_{i}^{\star}[\widehat{u}_{\epsilon}]_{i}+\sum_{j=1}^{d_{2}}\mu_{j}^{\star}[\widehat{v}_{\epsilon}]_{j}&\leq\langle C,X^{\star}\rangle-\sum_{i=1}^{d_{1}}\vartheta^{\star}_{i}u^{\star}_{i}+\sum_{j=1}^{d_{2}}\mu^{\star}_{j}v^{\star}_{j}\\ &\leq\langle C,X^{\star}\rangle.\end{split}

Rearranging the terms, using the duality of the 1\ell_{1} and \ell_{\infty} norms, and the fact that μ0\mu^{\star}\leq 0, gives

C,X^ϵC,X+i=1d1ϑi[u^ϵ]ij=1d2μj[v^ϵ]jC,X+(i=1d1|ϑi|)u^ϵ+(j=1d2μj)maxj[v^ϵ]jC,X+[ϑ,μ]1max{u^ϵ,maxj[v^ϵ]j}.\begin{split}\langle C,\widehat{X}_{\epsilon}\rangle&\leq\langle C,X^{\star}\rangle+\sum_{i=1}^{d_{1}}\vartheta^{\star}_{i}[\widehat{u}_{\epsilon}]_{i}-\sum_{j=1}^{d_{2}}\mu^{\star}_{j}[\widehat{v}_{\epsilon}]_{j}\\ &\leq\langle C,X^{\star}\rangle+\left(\sum_{i=1}^{d_{1}}|\vartheta^{\star}_{i}|\right)\|\widehat{u}_{\epsilon}\|_{\infty}+\left(\sum_{j=1}^{d_{2}}-\mu^{\star}_{j}\right)\max_{j}[\widehat{v}_{\epsilon}]_{j}\\ &\leq\langle C,X^{\star}\rangle+\|[\vartheta^{\star},\mu^{\star}]\|_{1}\max\left\{\|\widehat{u}_{\epsilon}\|_{\infty},\max_{j}[\widehat{v}_{\epsilon}]_{j}\right\}.\end{split} (A.4)

Bound on infeasibility.

Using (A.4), we rewrite (A.2) as,

βϕM(u^ϵ,v^ϵ)C,X^ϵC,X+βϕM(u,v)+ϵ[ϑ,μ]1max{u^ϵ,maxj[v^ϵ]j}+βlog(2d1+d2)M+ϵ.\begin{split}\beta\phi_{M}(\widehat{u}_{\epsilon},\widehat{v}_{\epsilon})&\leq\langle C,\widehat{X}_{\epsilon}\rangle-\langle C,X^{\star}\rangle+\beta\phi_{M}(u^{\star},v^{\star})+\epsilon\\ &\leq\|[\vartheta^{\star},\mu^{\star}]\|_{1}\max\left\{\|\widehat{u}_{\epsilon}\|_{\infty},\max_{j}[\widehat{v}_{\epsilon}]_{j}\right\}+\beta\frac{\log(2d_{1}+d_{2})}{M}+\epsilon.\end{split} (A.5)

Combining the lower bound on ϕM()\phi_{M}(\cdot) given in (2.2) with (A.5) and since β>[ϑ,μ]1\beta>\|[\vartheta^{\star},\mu^{\star}]\|_{1} by assumption, we have

max{u^ϵ,maxj[v^ϵ]j}βlog(2d1+d2)M+ϵβ[ϑ,μ]1.\max\left\{\|\widehat{u}_{\epsilon}\|_{\infty},\max_{j}[\widehat{v}_{\epsilon}]_{j}\right\}\leq\frac{\beta\frac{\log(2d_{1}+d_{2})}{M}+\epsilon}{\beta-\|[\vartheta^{\star},\mu^{\star}]\|_{1}}. (A.6)

Completing the upper bound on the objective.

Substituting (A.6) into (A.4) gives

C,X^ϵC,X+[ϑ,μ]1βlog(2d1+d2)M+ϵβ[ϑ,μ]1.\langle C,\widehat{X}_{\epsilon}\rangle\leq\langle C,X^{\star}\rangle+\|[\vartheta^{\star},\mu^{\star}]\|_{1}\frac{\beta\frac{\log(2d_{1}+d_{2})}{M}+\epsilon}{\beta-\|[\vartheta^{\star},\mu^{\star}]\|_{1}}. (A.7)

A.2 Proof of Lemma 3.1

Proof.

The proof consists of three parts.

Lower bound on the objective.

Substituting the values of β\beta and MM, and replacing ϵ\epsilon by ϵTr(C)\epsilon\textup{Tr}(C) in (A.3), we have

C,X^ϵC,XR2ϵTr(C).\langle C,\widehat{X}_{\epsilon}\rangle\geq\langle C,X^{\star}_{R}\rangle-2\epsilon\textup{Tr}(C). (A.8)

Since the identity matrix II is strictly feasible for (k-Cut-Rel), Tr(C)C,XR\textup{Tr}(C)\leq\langle C,X^{\star}_{R}\rangle. Combining this fact with (A.8) gives,

C,X^ϵ(12ϵ)C,XR.\langle C,\widehat{X}_{\epsilon}\rangle\geq(1-2\epsilon)\langle C,X^{\star}_{R}\rangle.

Bound on infeasibility.

For (k-Cut-Rel), let ν=[ν(1),ν(2)]n+|E|\nu=[\nu^{(1)},\nu^{(2)}]\in\mathbbm{R}^{n+|E|} be a dual variable such that νi(1)\nu^{(1)}_{i} for i=1,,ni=1,\dotsc,n are the variables corresponding to nn equality constraints and νij(2)\nu^{(2)}_{ij} for (i,j)E,i<j(i,j)\in E,i<j are the dual variables corresponding to |E||E| inequality constraints. Following (DSDP), the dual of (k-Cut-Rel) is

minνi=1nνi(1)1k1ijEi<jνij(2)subject to{diag(ν(1))+ijEi<j[eiejT+ejeiT]νij(2)2C0ν(2)0.\min_{\nu}\sum\limits_{i=1}^{n}\nu^{(1)}_{i}-\frac{1}{k-1}\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)}_{ij}\quad\textup{subject to}\quad\begin{cases}&\textup{diag}^{*}(\nu^{(1)})+\sum\limits_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}[e_{i}e_{j}^{T}+e_{j}e_{i}^{T}]\frac{\nu^{(2)}_{ij}}{2}-C\succeq 0\\ &\nu^{(2)}\leq 0.\end{cases} (Dual-Relax)

Let ν\nu^{\star} be an optimal dual solution. In order to bound the infeasibility using (A.6), we need a bound on ν1\|\nu^{\star}\|_{1} which is given by the following lemma.

Lemma A.1.

The value of ν1\|\nu^{\star}\|_{1} is upper bounded by 4Tr(C)4\textup{Tr}(C).

Proof.

The matrix CC is a scaled Laplacian and so, the only off-diagonal entries that are nonzero correspond to (i,j)E(i,j)\in E and have value less than zero. For (Dual-Relax), a feasible solution is ν(1)=diag(C)\nu^{(1)}=\textup{diag}(C), νij(2)=2Cij\nu^{(2)}_{ij}=2C_{ij} for (i,j)E,i<j(i,j)\in E,i<j. The optimal objective function value of (Dual-Relax) is then upper bounded by

i=1nνi(1)1k1ijEi<jνij(2)Tr(C)+1k1Tr(C)=kk1Tr(C)\displaystyle\sum_{i=1}^{n}\nu^{(1)\star}_{i}-\frac{1}{k-1}\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq\textup{Tr}(C)+\frac{1}{k-1}\textup{Tr}(C)=\frac{k}{k-1}\textup{Tr}(C) (A.9)
\displaystyle\Rightarrow\quad i=1nνi(1)kk1Tr(C)+1k1ijEi<jνij(2)kk1Tr(C),\displaystyle\sum_{i=1}^{n}\nu^{(1)\star}_{i}\leq\frac{k}{k-1}\textup{Tr}(C)+\frac{1}{k-1}\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq\frac{k}{k-1}\textup{Tr}(C), (A.10)

where the last inequality follows since ν(2)0\nu^{(2)\star}\leq 0.

We have diag(ν(1))+ijEi<j[eiejT+ejeiT]νij(2)2,𝟙𝟙TC,𝟙𝟙T0\left\langle\textup{diag}^{*}(\nu^{(1)\star})+\sum\limits_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}[e_{i}e_{j}^{T}+e_{j}e_{i}^{T}]\frac{\nu^{(2)\star}_{ij}}{2},\mathbbm{1}\mathbbm{1}^{T}\right\rangle-\langle C,\mathbbm{1}\mathbbm{1}^{T}\rangle\geq 0 since both matrices are PSD. Using the fact that 𝟙\mathbbm{1} is in the null space of CC, we get

ijEi<jνij(2)i=1nνi(1).-\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq\sum_{i=1}^{n}\nu^{(1)\star}_{i}. (A.11)

Since ν(2)0\nu^{(2)\star}\leq 0, we can write

ν1=i=1n|νi(1)|ijEi<jνij(2)2i=1nνi(1),\|\nu^{\star}\|_{1}=\sum_{i=1}^{n}|\nu^{(1)\star}_{i}|-\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq 2\sum_{i=1}^{n}\nu^{(1)\star}_{i}, (A.12)

which follows from (A.11) and the fact that for the dual to be feasible we have ν(1)0\nu^{(1)}\geq 0 since CC has nonnegative entries on the diagonal. Substituting (A.10) in (A.12),

ν12kk1Tr(C)4Tr(C),\|\nu^{\star}\|_{1}\leq\frac{2k}{k-1}\textup{Tr}(C)\leq 4\textup{Tr}(C), (A.13)

where the last inequality follows since k/(k1)2k/(k-1)\leq 2 for k2k\geq 2. ∎

Since X^ϵ\widehat{X}_{\epsilon} is an ϵTr(C)\epsilon\textup{Tr}(C)-optimal solution to (k-Cut-LSE), we replace ϵ\epsilon by ϵTr(C)\epsilon\textup{Tr}(C) in (A.6). Finally, substituting (A.13) into (A.6), and setting β=6Tr(C)\beta=6\textup{Tr}(C) and M=6log(2n+|E|)ϵM=6\frac{\log(2n+|E|)}{\epsilon},

max{diag(X^ϵ)𝟙,maxijE,i<j1k1[X^ϵ]ij}ϵ.\max\left\{\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty},\max_{ij\in E,i<j}-\frac{1}{k-1}-[\widehat{X}_{\epsilon}]_{ij}\right\}\leq\epsilon. (A.14)

This condition can also be stated as

diag(X^ϵ)𝟙ϵ,[X^ϵ]ij1k1ϵ(i,j)E,i<j.\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty}\leq\epsilon,\quad[\widehat{X}_{\epsilon}]_{ij}\geq-\frac{1}{k-1}-\epsilon\quad(i,j)\in E,i<j.

Upper bound on the objective.

Substituting (A.14) and (A.13) and the values of parameters β\beta and MM into (A.7) gives

C,X^ϵC,XR+4Tr(C)ϵ(1+4ϵ)C,XR,\langle C,\widehat{X}_{\epsilon}\rangle\leq\langle C,X^{\star}_{R}\rangle+4\textup{Tr}(C)\epsilon\leq(1+4\epsilon)\langle C,X^{\star}_{R}\rangle,

where the last inequality follows since Tr(C)C,XR\textup{Tr}(C)\leq\langle C,X^{\star}_{R}\rangle. ∎

A.3 Proof of Lemma 3.2

Proof.

We first show that Algorithm 2 generates random Gaussian samples whose covariance is feasible to (k-Cut-Rel).

Proposition A.1.

Given kk Gaussian random vectors z1,,zk𝒩(0,X^ϵ)z_{1},\dotsc,z_{k}\sim\mathcal{N}(0,\widehat{X}_{\epsilon}), such that their covariance X^ϵ\widehat{X}_{\epsilon} satisfies the inequality (3.4), the Gaussian random vectors z1f,,zkf𝒩(0,Xf)z^{f}_{1},\dotsc,z^{f}_{k}\sim\mathcal{N}(0,X^{f}) generated by Algorithm 2 have covariance XfX^{f} that is a feasible solution to (k-Cut-Rel).

Proof.

Define X¯=X^ϵ+err𝟙𝟙T\overline{X}=\widehat{X}_{\epsilon}+\textup{err}\mathbbm{1}\mathbbm{1}^{T}. Note that, X¯0\overline{X}\succeq 0 and it satisfies the following properties:

  1. 1.

    Since X^ϵ\widehat{X}_{\epsilon} satisfies (3.4), we have errϵ\textup{err}\leq\epsilon. Combining this fact with the definition of X¯\overline{X}, we have X¯jl1k1\overline{X}_{jl}\geq-\frac{1}{k-1} for (j,l)E,j<l(j,l)\in E,j<l.

  2. 2.

    Furthermore, diag(X¯)=diag(X^ϵ)+err\textup{diag}(\overline{X})=\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}, which when combined with (3.4), gives 1diag(X¯)1+2err1\leq\textup{diag}(\overline{X})\leq 1+2\textup{err}.

  3. 3.

    For y𝒩(0,1)y\sim\mathcal{N}(0,1), if z¯i=zi+erry𝟙\overline{z}_{i}=z_{i}+\sqrt{\textup{err}}y\mathbbm{1}, i.e., it is a sum of two Gaussian random vectors, then z¯i𝒩(0,X¯)\overline{z}_{i}\sim\mathcal{N}(0,\overline{X}).

Steps 5 and 6 of Algorithm 2 then generate a zero-mean random vector zfz^{f} whose covariance is

Xf=X¯max(diag(X¯))+(Idiag(diag(X¯)max(diag(X¯)))),X^{f}=\frac{\overline{X}}{\textup{max}(\textup{diag}(\overline{X}))}+\left(I-\textup{diag}^{*}\left(\frac{\textup{diag}(\overline{X})}{\max(\textup{diag}(\overline{X}))}\right)\right), (A.15)

i.e., zf𝒩(0,Xf)z^{f}\sim\mathcal{N}(0,X^{f}). Furthermore, XfX^{f} is feasible to (k-Cut-Rel) since diag(Xf)=𝟙\textup{diag}(X^{f})=\mathbbm{1}, Xjlf1k1X^{f}_{jl}\geq-\frac{1}{k-1} for (j,l)E,j<l(j,l)\in E,j<l, and it is a sum of two PSD matrices so that Xf0X^{f}\succeq 0. ∎
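For concreteness, the repair described in Proposition A.1 can be carried out directly on the samples, without ever forming X^ϵ\widehat{X}_{\epsilon} as a matrix. The following Python sketch is only illustrative: the function name repair_samples and the assumption that the diagonal of X^ϵ\widehat{X}_{\epsilon} and the scalar err are tracked separately (both need only 𝒪(n)\mathcal{O}(n) memory) are ours and are not part of Algorithm 2 as stated.

\begin{verbatim}
import numpy as np

def repair_samples(Z, diag_X, err, rng=None):
    # Z      : (k, n) array whose rows are samples z_1, ..., z_k ~ N(0, X_hat_eps)
    # diag_X : length-n array holding diag(X_hat_eps)
    # err    : scalar infeasibility estimate, err <= eps
    # Returns an array whose rows have covariance X^f, feasible to (k-Cut-Rel).
    rng = np.random.default_rng() if rng is None else rng
    k, n = Z.shape
    # z_bar_i = z_i + sqrt(err) * y_i * 1 has covariance X_bar = X_hat + err * 11^T
    y = rng.standard_normal(k)
    Z_bar = Z + np.sqrt(err) * y[:, None] * np.ones(n)
    diag_Xbar = diag_X + err
    m = diag_Xbar.max()
    # scale so that the largest diagonal entry of the covariance becomes 1 ...
    Z_scaled = Z_bar / np.sqrt(m)
    # ... and add an independent diagonal Gaussian that lifts every diagonal entry
    # back to 1, realizing the term I - diag*(diag(X_bar)/m) in (A.15)
    resid_var = 1.0 - diag_Xbar / m
    G = rng.standard_normal((k, n)) * np.sqrt(resid_var)
    return Z_scaled + G
\end{verbatim}

The covariance of each returned row is X¯/max(diag(X¯))+(Idiag(diag(X¯)/max(diag(X¯))))\overline{X}/\max(\textup{diag}(\overline{X}))+(I-\textup{diag}^{*}(\textup{diag}(\overline{X})/\max(\textup{diag}(\overline{X})))), which is exactly XfX^{f} in (A.15).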

The objective function value of (k-Cut-Rel) at XfX^{f} (defined in (A.15)) is

C,Xf\displaystyle\langle C,X^{f}\rangle =C,X^ϵ+err𝟙𝟙Tmax(diag(X^ϵ))+err+(Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err))\displaystyle=\left\langle C,\frac{\widehat{X}_{\epsilon}+\textup{err}\mathbbm{1}\mathbbm{1}^{T}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}+\left(I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right)\right)\right\rangle (A.16)
(i)C,X^ϵmax(diag(X^ϵ))+err(ii)12ϵ1+2ϵC,XR(iii)(14ϵ)C,XR,\displaystyle\underset{(i)}{\geq}\ \frac{\langle C,\widehat{X}_{\epsilon}\rangle}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\ \underset{(ii)}{\geq}\frac{1-2\epsilon}{1+2\epsilon}\langle C,X^{\star}_{R}\rangle\ \underset{(iii)}{\geq}\ (1-4\epsilon)\langle C,X^{\star}_{R}\rangle, (A.17)

where (i) follows from the fact that both CC and err𝟙𝟙Tmax(diag(X^ϵ))+err+Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err)\frac{\textup{err}\mathbbm{1}\mathbbm{1}^{T}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}+I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right) are PSD, and so their inner product is nonnegative, (ii) follows from Lemma 3.1 and the fact that errϵ\textup{err}\leq\epsilon, and (iii) uses the fact that 12ϵ(1+2ϵ)(14ϵ)1-2\epsilon\geq(1+2\epsilon)(1-4\epsilon). Let 𝔼[CUT]\mathbb{E}[\texttt{CUT}] denote the expected value of the cut generated from the samples zifz^{f}_{i}. Combining (A.17) with the inequality 𝔼[CUT]C,Xfαk\frac{\mathbb{E}[\texttt{CUT}]}{\langle C,X^{f}\rangle}\geq\alpha_{k} (see (3.2)), we have

𝔼[CUT]αkC,Xfαk(14ϵ)C,XRαk(14ϵ)optkG.\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}\langle C,X^{f}\rangle\geq\alpha_{k}(1-4\epsilon)\langle C,X^{\star}_{R}\rangle\geq\alpha_{k}(1-4\epsilon)\textup{opt}_{k}^{G}. (A.18)

A.4 Proof of Lemma 3.3

Proof.

We use Algorithm 1 with p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)} and T(n,ϵ)=144log(2n+|E|)n2ϵ2T(n,\epsilon)=\frac{144\log(2n+|E|)n^{2}}{\epsilon^{2}} to generate an ϵTr(C)\epsilon\textup{Tr}(C)-optimal solution to (k-Cut-LSE). We first bound the outer iteration complexity, i.e., the number of iterations of Algorithm 1 until convergence. This value also denotes the number of times the subproblem LMO is solved.

Upper bound on outer iteration complexity.

Let the objective function of (k-Cut-LSE) be g(X)=C,XβϕM(diag(X)𝟙,[1k1eiTXej](i,j)E)g(X)=\langle C,X\rangle-\beta\phi_{M}\left(\textup{diag}(X)-\mathbbm{1},\left[-\frac{1}{k-1}-e_{i}^{T}Xe_{j}\right]_{(i,j)\in E}\right).
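The penalty ϕM\phi_{M} from (2.1) is not reproduced in this appendix. Assuming the standard log-sum-exp (softmax) form of (2.1), it can be evaluated from the constraint residuals alone, as in the following Python sketch; the function name phi_M is ours and the exact form is an assumption consistent with the bounds used in the proof of Lemma 2.1.

\begin{verbatim}
import numpy as np
from scipy.special import logsumexp

def phi_M(u, v, M):
    # Assumed log-sum-exp form of the smoothed penalty phi_M in (2.1):
    #   phi_M(u, v) = (1/M) * log( sum_i (e^{M u_i} + e^{-M u_i}) + sum_j e^{M v_j} ).
    # It satisfies phi_M(u, v) >= max{ ||u||_inf, max_j v_j }  (the lower bound (2.2))
    # and phi_M(u, v) <= max{ ||u||_inf, max_j v_j } + log(2*d1 + d2)/M.
    terms = np.concatenate([M * np.asarray(u), -M * np.asarray(u), M * np.asarray(v)])
    return logsumexp(terms) / M

# For (k-Cut-LSE), u = diag(X) - 1 and v_{ij} = -1/(k-1) - X_{ij} for (i, j) in E,
# so that g(X) = <C, X> - beta * phi_M(u, v).
\end{verbatim}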

Theorem A.1.

Let g(X)g(X) be a concave and differentiable function and XX^{\star} an optimal solution of (k-Cut-LSE). Let CguC_{g}^{u} be an upper bound on the curvature constant of gg, and let η0\eta\geq 0 be the accuracy parameter for LMO. Then XtX_{t} satisfies

g(Xt)+g(X)2Cgu(1+η)t+2,-g(X_{t})+g(X^{\star})\leq\frac{2C_{g}^{u}(1+\eta)}{t+2}, (A.19)

with probability at least (1p)t1tp(1-p)^{t}\geq 1-tp.

The result follows from [18, Theorem 1] when LMO generates a solution with approximation error at most 12ηγCgu\frac{1}{2}\eta\gamma C_{g}^{u} with probability 1p1-p. Now, η(0,1)\eta\in(0,1) is an appropriately chosen constant, and from [28, Lemma 3.1], an upper bound CguC_{g}^{u} on the curvature constant of g(X)g(X) is βMn2\beta Mn^{2}. Thus, after at most

T=2Cgu(1+η)ϵTr(C)2=2βMn2(1+η)ϵTr(C)2T=\frac{2C_{g}^{u}(1+\eta)}{\epsilon\textup{Tr}(C)}-2=\frac{2\beta Mn^{2}(1+\eta)}{\epsilon\textup{Tr}(C)}-2 (A.20)

iterations, Algorithm 1 generates an ϵTr(C)\epsilon\textup{Tr}(C)-optimal solution to (k-Cut-LSE).

Bound on the approximate kk-cut value.

From Theorem A.1, we see that after at most TT iterations, Algorithm 1 generates a solution X^ϵ\widehat{X}_{\epsilon} that satisfies the bounds in Lemma 3.1 with probability at least 1ϵ1-\epsilon when p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)}. Consequently, the bound given in (A.17) also holds with probability at least 1ϵ1-\epsilon. And so, the expected value of C,Xf\langle C,X^{f}\rangle is 𝔼[C,Xf](14ϵ)C,XR(1ϵ)(15ϵ)C,XR\mathbb{E}[\langle C,X^{f}\rangle]\geq(1-4\epsilon)\langle C,X^{\star}_{R}\rangle(1-\epsilon)\geq(1-5\epsilon)\langle C,X^{\star}_{R}\rangle. Finally, from (A.18), the expected value of the kk-cut, denoted by 𝔼[CUT]\mathbb{E}[\texttt{CUT}], is bounded as

𝔼[CUT]=𝔼L[𝔼G[CUT]]αk𝔼L[C,Xf]αk(15ϵ)C,XRαk(15ϵ)optkG,\mathbb{E}[\texttt{CUT}]=\mathbb{E}_{L}[\mathbb{E}_{G}[\texttt{CUT}]]\geq\alpha_{k}\mathbb{E}_{L}[\langle C,X^{f}\rangle]\geq\alpha_{k}(1-5\epsilon)\langle C,X^{\star}_{R}\rangle\geq\alpha_{k}(1-5\epsilon)\textup{opt}_{k}^{G}, (A.21)

where 𝔼L[]\mathbb{E}_{L}[\cdot] denotes the expectation over the randomness in the subproblem LMO and 𝔼G[]\mathbb{E}_{G}[\cdot] denotes the expectation over random Gaussian samples.

Finally, we compute an upper bound on the complexity of each iteration, i.e., inner iteration complexity, of Algorithm 1.

Upper bound on inner iteration complexity.

At each iteration tt, Algorithm 1 solves the subproblem LMO, which generates a unit vector hh, such that

αhhT,gtmaxd𝒮αd,gt12ηγtCgu,\alpha\langle hh^{T},\nabla g_{t}\rangle\geq\max_{d\in\mathcal{S}}\alpha\langle d,\nabla g_{t}\rangle-\frac{1}{2}\eta\gamma_{t}C_{g}^{u}, (A.22)

where γt=2t+2\gamma_{t}=\frac{2}{t+2}, gt=g(Xt)\nabla g_{t}=\nabla g(X_{t}) and 𝒮={X0:Tr(X)n}\mathcal{S}=\{X\succeq 0:\textup{Tr}(X)\leq n\}. Note that this problem is equivalent to approximately computing the maximum eigenvector of the matrix gt\nabla g_{t}, which can be done using the Lanczos algorithm [22].
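As an illustration of this step, the following Python sketch computes such a vector hh matrix-free. Here scipy's eigsh (an implicitly restarted Lanczos method) is used purely as a stand-in for the randomized Lanczos method of [22], and the helper name lmo_top_eigvec is ours.

\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def lmo_top_eigvec(grad_matvec, n, tol=1e-3):
    # grad_matvec(v) returns (grad g_t) @ v; since grad g_t has O(|E| + n) nonzero
    # entries, each product costs O(|E| + n) time and memory.
    A = LinearOperator((n, n), matvec=grad_matvec, dtype=float)
    vals, vecs = eigsh(A, k=1, which='LA', tol=tol)  # largest algebraic eigenvalue
    h = vecs[:, 0]
    return h / np.linalg.norm(h)

# The Frank-Wolfe update in Algorithm 1 then uses the rank-one direction
# alpha * h h^T with alpha = n, stored implicitly through h.
\end{verbatim}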

Lemma A.2 (Convergence of Lanczos algorithm).

Let ρ(0,1]\rho\in(0,1] and p(0,1/2]p\in(0,1/2]. For gt𝕊n\nabla g_{t}\in\mathbb{S}_{n}, the Lanczos method [22] computes a vector hnh\in\mathbb{R}^{n} that satisfies

hTgthλmax(gt)ρ8gth^{T}\nabla g_{t}h\geq\lambda_{\textup{max}}(\nabla g_{t})-\frac{\rho}{8}\|\nabla g_{t}\| (A.23)

with probability at least 12p1-2p, after q12+1ρlog(n/p2)q\geq\frac{1}{2}+\frac{1}{\sqrt{\rho}}\log(n/p^{2}) iterations.

This result is an adaptation of [22, Theorem 4.2], which establishes the convergence of the Lanczos method for approximately computing the minimum eigenvalue and the corresponding eigenvector of a symmetric matrix. Let N=12+1ρlog(n/p2)N=\frac{1}{2}+\frac{1}{\sqrt{\rho}}\log(n/p^{2}). We now derive an upper bound on NN.

Comparing (A.23) and (A.22), we see that

12ηγtCgu=αρ8gt1ρ=αgt4ηγtCgu\begin{split}&\frac{1}{2}\eta\gamma_{t}C_{g}^{u}=\alpha\frac{\rho}{8}\|\nabla g_{t}\|\\ \Rightarrow&\frac{1}{\rho}=\frac{\alpha\|\nabla g_{t}\|}{4\eta\gamma_{t}C_{g}^{u}}\end{split} (A.24)

Substituting the value of γt\gamma_{t} in the equation above, and noting that γt=2t+22T+2\gamma_{t}=\frac{2}{t+2}\geq\frac{2}{T+2}, we have

1ρ=αgt(t+2)8ηCguαgt(T+2)8ηCgu=αgt(1+η)4ηϵTr(C),\begin{split}\frac{1}{\rho}=\frac{\alpha\|\nabla g_{t}\|(t+2)}{8\eta C_{g}^{u}}\leq\frac{\alpha\|\nabla g_{t}\|(T+2)}{8\eta C_{g}^{u}}=\frac{\alpha\|\nabla g_{t}\|(1+\eta)}{4\eta\epsilon\textup{Tr}(C)},\end{split} (A.25)

where the last equality follows from substituting the value of TT (see (A.20)). We now derive an upper bound on gt\|\nabla g_{t}\|.

Lemma A.3.

Let g(X)=C,XβϕM(diag(X)𝟙,[1k1eiTXej](i,j)E)g(X)=\langle C,X\rangle-\beta\phi_{M}\left(\textup{diag}(X)-\mathbbm{1},\left[-\frac{1}{k-1}-e_{i}^{T}Xe_{j}\right]_{(i,j)\in E}\right), where ϕM()\phi_{M}(\cdot) is defined in (2.1). We have gtTr(C)(1+6(2|E|+n))\|\nabla g_{t}\|\leq\textup{Tr}(C)(1+6(\sqrt{2|E|+n})).

Proof.

For the function g(X)g(X) as defined in the lemma, gt=CβD\nabla g_{t}=C-\beta D, where DD is a matrix such that Dii[1,1]D_{ii}\in[-1,1] for i=1,,ni=1,\dotsc,n, Dij[1,1]D_{ij}\in[-1,1] for (i,j)E(i,j)\in E, and Dij=0D_{ij}=0 for (i,j)E(i,j)\notin E. Thus, we have

maxk|λk(D)|Tr(DTD)=i,j=1n|Dij|22|E|+n,\max_{k}|\lambda_{k}(D)|\leq\sqrt{\textup{Tr}(D^{T}D)}=\sqrt{\sum_{i,j=1}^{n}|D_{ij}|^{2}}\leq\sqrt{2|E|+n}, (A.26)

where the last inequality follows since there are at most 2|E|2|E| off-diagonal and nn diagonal nonzero entries in the matrix DD with each nonzero entry in the range [1,1][-1,1]. Now,

gt=CβD(i)C+βDmaxi|λi(C)|+maxi|λi(βD)|(ii)Tr(C)+β2|E|+n(iii)Tr(C)(1+6(2|E|+n)).\begin{split}\|\nabla g_{t}\|=\|C-\beta D\|&\underset{(i)}{\leq}\|C\|+\|-\beta D\|\\ &\leq\max_{i}|\lambda_{i}(C)|+\max_{i}|\lambda_{i}(-\beta D)|\\ &\underset{(ii)}{\leq}\textup{Tr}(C)+\beta\sqrt{2|E|+n}\\ &\underset{(iii)}{\leq}\textup{Tr}(C)(1+6(\sqrt{2|E|+n})).\end{split} (A.27)

where (i) follows from the triangle inequality for the spectral norm of CβDC-\beta D, (ii) follows from (A.26) and the fact that CC is a scaled graph Laplacian and hence positive semidefinite, so that its spectral norm is at most Tr(C)\textup{Tr}(C), and (iii) follows by substituting β=6Tr(C)\beta=6\textup{Tr}(C) as given in Lemma 3.1. ∎

Substituting α=n\alpha=n, and the bound on gt\|\nabla g_{t}\| in (A.25), we have

1ρ1+η4ηn(1+6(2|E|+n))ϵ,and\displaystyle\frac{1}{\rho}\leq\frac{1+\eta}{4\eta}\frac{n(1+6(\sqrt{2|E|+n}))}{\epsilon},\quad\textup{and} (A.28)
N=12+1ρlog(n/p2)12+1+η4ηn(1+6(2|E|+n))ϵlog(n/p2)=Nu.\displaystyle N=\frac{1}{2}+\frac{1}{\sqrt{\rho}}\log(n/p^{2})\leq\frac{1}{2}+\sqrt{\frac{1+\eta}{4\eta}}\sqrt{\frac{n(1+6(\sqrt{2|E|+n}))}{\epsilon}}\log(n/p^{2})=N^{u}. (A.29)

Finally, each iteration of the Lanczos method performs a matrix-vector multiplication with gt\nabla g_{t}, which has at most 2|E|+n2|E|+n nonzero entries, and 𝒪(n)\mathcal{O}(n) additional arithmetic operations. Thus, the computational complexity of the Lanczos method is 𝒪(Nu(|E|+n))\mathcal{O}(N^{u}(|E|+n)). Moreover, Algorithm 1 performs 𝒪(|E|+n)\mathcal{O}(|E|+n) additional arithmetic operations, so the total inner iteration complexity is 𝒪(Nu(|E|+n))\mathcal{O}(N^{u}(|E|+n)), which can be written as 𝒪(n|E|1.25ϵlog(n/p2))\mathcal{O}\left(\frac{\sqrt{n}|E|^{1.25}}{\sqrt{\epsilon}}\log(n/p^{2})\right).
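A possible realization of this sparse matrix-vector product is sketched below in Python. The representation of DD through its diagonal and edge values is an assumption about how the gradient of the penalty term is stored, and the helper name make_grad_matvec is ours.

\begin{verbatim}
import numpy as np
import scipy.sparse as sp

def make_grad_matvec(C, d_diag, d_edges, edges, beta):
    # C       : scaled Laplacian in CSR form (O(|E| + n) nonzeros)
    # d_diag  : length-n diagonal entries of D, each in [-1, 1]
    # d_edges : one value in [-1, 1] per undirected edge (i, j) in E
    # edges   : (|E|, 2) integer array with i < j
    n = C.shape[0]
    i, j = edges[:, 0], edges[:, 1]
    D = sp.coo_matrix(
        (np.concatenate([d_diag, d_edges, d_edges]),
         (np.concatenate([np.arange(n), i, j]),
          np.concatenate([np.arange(n), j, i]))),
        shape=(n, n)).tocsr()
    G = C - beta * D  # grad g_t, with at most 2|E| + n stored entries
    return lambda v: G @ v
\end{verbatim}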

Computational complexity of Algorithm 1.

Now, substituting p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)}, we have

log(np2)=log((144)2n5(log(2n+|E|))2ϵ6)log((5.3n)6ϵ6)=6log(5.3nϵ),\log\left(\frac{n}{p^{2}}\right)=\log\left(\frac{(144)^{2}n^{5}(\log(2n+|E|))^{2}}{\epsilon^{6}}\right)\leq\log\left(\frac{(5.3n)^{6}}{\epsilon^{6}}\right)=6\log\left(\frac{5.3n}{\epsilon}\right), (A.31)

where the inequality follows since |E|(n12)|E|\leq\binom{n-1}{2}, (log(2n+(n12)))2n\left(\log\left(2n+\binom{n-1}{2}\right)\right)^{2}\leq n for n1n\geq 1 and (5.3)6(144)2(5.3)^{6}\geq(144)^{2}. Substituting the upper bound on log(n/p2)\log(n/p^{2}) into NuN^{u}, and combining the inner iteration complexity, 𝒪(Nu(|E|+n))\mathcal{O}(N^{u}(|E|+n)), with the outer iteration complexity, TT, we see that Algorithm 1 is an 𝒪(n2.5|E|1.25ϵ2.5log(n/ϵ)log(|E|))\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)-time algorithm. ∎

A.5 Proof of Lemma 4.1

Proof.

We need to prove the four inequalities given in Lemma 4.1.

Lower bound on the objective, C,X^ϵ\langle C,\widehat{X}_{\epsilon}\rangle.

Substituting the values of β\beta and MM, and replacing ϵ\epsilon by ϵTr(C)\epsilon\textup{Tr}(C) in (A.3), we have

C,X^ϵC,XG2ϵ(Tr(LG)+ijE+wij+).\langle C,\widehat{X}_{\epsilon}\rangle\geq\langle C,X^{\star}_{G}\rangle-2\epsilon\left(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}\right). (A.32)

Since 0.5I+0.5𝟙𝟙T0.5I+0.5\mathbbm{1}\mathbbm{1}^{T} is feasible for (MA-Rel), 0.5(Tr(LG)+ijE+wij+)C,XG0.5(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij})\leq\langle C,X^{\star}_{G}\rangle. Combining this fact with (A.32), we have

C,X^ϵ(14ϵ)C,XG.\langle C,\widehat{X}_{\epsilon}\rangle\geq(1-4\epsilon)\langle C,X^{\star}_{G}\rangle.

Bound on infeasibility.

Let E=EE+E=E^{-}\cup E^{+} and let ν=[ν(1),ν(2)]n+|E|\nu=[\nu^{(1)},\nu^{(2)}]\in\mathbb{R}^{n+|E|} be the dual variable such that ν(1)\nu^{(1)} is the dual variable corresponding to the nn equality constraints and ν(2)\nu^{(2)} is the dual variable for |E||E| inequality constraints. Following (DSDP), the dual of (MA-Rel) is

minνi=1nνi(1)subject to{diag(ν(1))+ijEi<j[eiejT+ejeiT]νij(2)2C0ν(2)0,\min_{\nu}\quad\sum_{i=1}^{n}\nu^{(1)}_{i}\quad\textup{subject to}\quad\begin{cases}&\textup{diag}^{*}(\nu^{(1)})+\sum\limits_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}[e_{i}e_{j}^{T}+e_{j}e_{i}^{T}]\frac{\nu^{(2)}_{ij}}{2}-C\succeq 0\\ &\nu^{(2)}\leq 0,\end{cases} (Dual-CC)

where C=LG+W+C=L_{G^{-}}+W^{+}. Let ν\nu^{\star} be an optimal dual solution. We derive an upper bound on ν1\|\nu^{\star}\|_{1} in the following lemma, which is then used to bound the infeasibility using (A.6).

Lemma A.4.

The value of ν1\|\nu^{\star}\|_{1} is upper bounded by 2(Tr(LG)+ijE+wij+)2\left(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}\right).

Proof.

For (Dual-CC), νi(1)=[LG]ii+j:ijE+wij+\nu^{(1)}_{i}=[L_{G^{-}}]_{ii}+\sum_{j:ij\in E^{+}}w_{ij}^{+} for i=1,,ni=1,\dotsc,n, and νij(2)=2[LG]ij\nu^{(2)}_{ij}=2[L_{G^{-}}]_{ij} for (i,j)E,i<j(i,j)\in E,i<j is a feasible solution. The optimal objective function value of (Dual-CC) is then upper bounded as

i=1nνi(1)Tr(LG)+ijE+wij+.\sum_{i=1}^{n}\nu^{(1)\star}_{i}\leq\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}. (A.33)

We have diag(ν(1))+ijEi<j[eiejT+ejeiT]νij(2)2C,𝟙𝟙T0\left\langle\textup{diag}^{*}(\nu^{(1)\star})+\sum\limits_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}[e_{i}e_{j}^{T}+e_{j}e_{i}^{T}]\frac{\nu^{(2)\star}_{ij}}{2}-C,\mathbbm{1}\mathbbm{1}^{T}\right\rangle\geq 0 since both matrices are PSD. Using the fact that LG,𝟙𝟙T=0\langle L_{G^{-}},\mathbbm{1}\mathbbm{1}^{T}\rangle=0, and rearranging the terms, we have

ijEi<jνij(2)i=1nνi(1)ijE+wij+.-\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq\sum_{i=1}^{n}\nu^{(1)\star}_{i}-\sum_{ij\in E^{+}}w^{+}_{ij}.

Since ν(2)0\nu^{(2)\star}\leq 0, we can write

ν1=i=1n|νi(1)|ijEi<jνij(2)2i=1nνi(1)ijE+wij+,\|\nu^{\star}\|_{1}=\sum_{i=1}^{n}|\nu^{(1)\star}_{i}|-\sum_{\begin{subarray}{c}ij\in E\\ i<j\end{subarray}}\nu^{(2)\star}_{ij}\leq 2\sum_{i=1}^{n}\nu^{(1)\star}_{i}-\sum_{ij\in E^{+}}w^{+}_{ij}, (A.34)

where we have used the fact that for any dual feasible solution, νi(1)[LG]ii0\nu^{(1)}_{i}\geq[L_{G^{-}}]_{ii}\geq 0 for all i=1,,ni=1,\dotsc,n. Substituting (A.33) in (A.34),

ν12Tr(LG)+ijE+wij+2(Tr(LG)+ijE+wij+).\|\nu^{\star}\|_{1}\leq 2\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}\leq 2\left(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}\right). (A.35)

For Δ=Tr(LG)+ijE+wij+\Delta=\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}, X^ϵ\widehat{X}_{\epsilon} is an ϵΔ\epsilon\Delta-optimal solution to (MA-LSE). And so, we replace ϵ\epsilon by ϵΔ\epsilon\Delta in (A.6). Now, substituting (A.35) and the values of β\beta and MM into (A.6), we get

max{diag(X^ϵ)𝟙,maxijE,i<j[X^ϵ]ij}ϵ.\max\left\{\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty},\max_{ij\in E,i<j}-[\widehat{X}_{\epsilon}]_{ij}\right\}\leq\epsilon. (A.36)

This condition can also be stated as

diag(X^ϵ)𝟙ϵ,[X^ϵ]ijϵ(i,j)E,i<j.\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty}\leq\epsilon,\quad[\widehat{X}_{\epsilon}]_{ij}\geq-\epsilon\quad(i,j)\in E,i<j.

Substituting (A.36),  (A.35) and the values of the parameters β\beta and MM into (A.7) gives

C,X^ϵC,XG+2(Tr(LG)+ijE+wij+)ϵ(1+4ϵ)C,XG,\langle C,\widehat{X}_{\epsilon}\rangle\leq\langle C,X^{\star}_{G}\rangle+2\left(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}\right)\epsilon\leq(1+4\epsilon)\langle C,X^{\star}_{G}\rangle,

where the last inequality follows since 0.5(Tr(LG)+ijE+wij+)C,XG0.5(\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij})\leq\langle C,X^{\star}_{G}\rangle, as shown above using the feasibility of 0.5I+0.5𝟙𝟙T0.5I+0.5\mathbbm{1}\mathbbm{1}^{T} for (MA-Rel). ∎

A.6 Proof of Lemma 4.2

Proof.

We first note that Algorithm 2 generates samples whose covariance is feasible to (MA-Rel).

Proposition A.2.

Let z1,z2𝒩(0,X^ϵ)z_{1},z_{2}\sim\mathcal{N}(0,\widehat{X}_{\epsilon}) be Gaussian random vectors such that their covariance X^ϵ\widehat{X}_{\epsilon} satisfies the inequality (4.2). Suppose Step 3 of Algorithm 2 is modified so that err is redefined as err=max{0,max(i,j)E,i<j[X^ϵ]ij}\textup{err}=\max\{0,\max_{(i,j)\in E,i<j}\ -[\widehat{X}_{\epsilon}]_{ij}\}. The Gaussian random vectors z1f,z2f𝒩(0,Xf)z^{f}_{1},z^{f}_{2}\sim\mathcal{N}(0,X^{f}) generated by the modified Algorithm 2 have covariance that is feasible to (MA-Rel).

The proof of Proposition A.2 is the same as the proof of Proposition A.1. Now, let

Xf=X^ϵ+err𝟙𝟙Tmax(diag(X^ϵ))+err+(Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err))X^{f}=\frac{\widehat{X}_{\epsilon}+\textup{err}\mathbbm{1}\mathbbm{1}^{T}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}+\left(I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right)\right) (A.37)

The objective function value of (MA-Rel) at XfX^{f} is

C,Xf\displaystyle\langle C,X^{f}\rangle =C,X^ϵ+err𝟙𝟙Tmax(diag(X^ϵ))+err+(Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err))\displaystyle=\left\langle C,\frac{\widehat{X}_{\epsilon}+\textup{err}\mathbbm{1}\mathbbm{1}^{T}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}+\left(I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right)\right)\right\rangle (A.38)
(i)C,X^ϵmax(diag(X^ϵ))+err+C,(Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err))\displaystyle\underset{(i)}{\geq}\ \frac{\langle C,\widehat{X}_{\epsilon}\rangle}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}+\left\langle C,\left(I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right)\right)\right\rangle (A.39)
(ii)C,X^ϵmax(diag(X^ϵ))+err(iii)14ϵ1+2ϵC,XG(iv)(16ϵ)C,XG\displaystyle\underset{(ii)}{\geq}\ \frac{\langle C,\widehat{X}_{\epsilon}\rangle}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\ \underset{(iii)}{\geq}\ \frac{1-4\epsilon}{1+2\epsilon}\langle C,X^{\star}_{G}\rangle\ \underset{(iv)}{\geq}\ (1-6\epsilon)\langle C,X^{\star}_{G}\rangle (A.40)

where (i) follows from the fact that LG,err𝟙𝟙T=0\langle L_{G^{-}},\textup{err}\mathbbm{1}\mathbbm{1}^{T}\rangle=0 and W+,err𝟙𝟙T0\langle W^{+},\textup{err}\mathbbm{1}\mathbbm{1}^{T}\rangle\geq 0, (ii) follows since LGL_{G^{-}} and Idiag(diag(X^ϵ)+errmax(diag(X^ϵ))+err)I-\textup{diag}^{*}\left(\frac{\textup{diag}(\widehat{X}_{\epsilon})+\textup{err}}{\max(\textup{diag}(\widehat{X}_{\epsilon}))+\textup{err}}\right) are PSD, so their inner product is nonnegative, and the diagonal entries of W+W^{+} are 0, (iii) follows from Lemma 4.1 and the fact that errϵ\textup{err}\leq\epsilon, and (iv) follows since 14ϵ(1+2ϵ)(16ϵ)1-4\epsilon\geq(1+2\epsilon)(1-6\epsilon). Combining the fact that C,XGoptCCG\langle C,X^{\star}_{G}\rangle\geq\textup{opt}_{CC}^{G} and 𝔼[𝒞]0.766C,Xf\mathbb{E}[\mathcal{C}]\geq 0.766\langle C,X^{f}\rangle with the above, we have

𝔼[𝒞]0.766(16ϵ)optCCG.\mathbb{E}[\mathcal{C}]\geq 0.766(1-6\epsilon)\textup{opt}_{CC}^{G}. (A.41)

A.7 Proof of Lemma 4.3

Proof.

We use Algorithm 1 with p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)} and T(n,ϵ)=64log(2n+|E|)n2ϵ2T(n,\epsilon)=\frac{64\log(2n+|E|)n^{2}}{\epsilon^{2}} to generate an ϵΔ\epsilon\Delta-optimal solution to (MA-LSE), where Δ=Tr(LG)+ijE+wij+\Delta=\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}.

Upper bound on outer iteration complexity.

The convergence result given in Theorem A.1 also holds when Algorithm 1 is applied to (MA-LSE). Then, the total number of iterations of Algorithm 1, also known as the outer iteration complexity, required to generate an ϵΔ\epsilon\Delta-optimal solution to (MA-LSE) is

T=2Cgu(1+η)ϵΔ2=2βMn2(1+η)ϵΔ2.T=\frac{2C_{g}^{u}(1+\eta)}{\epsilon\Delta}-2=\frac{2\beta Mn^{2}(1+\eta)}{\epsilon\Delta}-2. (A.42)

Bound on the value of generated clustering.

Algorithm 1 with p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)} generates a solution X^ϵ\widehat{X}_{\epsilon} that satisfies the bounds in Lemma 4.1 with probability at least 1ϵ1-\epsilon after at most TT iterations. Thus, the bound given in (A.40) holds with probability at least 1ϵ1-\epsilon and we have

𝔼[C,Xf](16ϵ)C,XG(1ϵ)(17ϵ)C,XG.\mathbb{E}[\langle C,X^{f}\rangle]\geq(1-6\epsilon)\langle C,X^{\star}_{G}\rangle(1-\epsilon)\geq(1-7\epsilon)\langle C,X^{\star}_{G}\rangle. (A.43)

Let 𝔼L[]\mathbb{E}_{L}[\cdot] denote the expectation over the randomness in the subproblem LMO and 𝔼G[]\mathbb{E}_{G}[\cdot] denote the expectation over random Gaussian samples. The expected value of the generated clustering is then bounded as

𝔼[𝒞]=𝔼L[𝔼G[𝒞]](i)0.766𝔼L[C,Xf]0.766(17ϵ)C,XG0.766(17ϵ)optCCG,\mathbb{E}[\mathcal{C}]=\mathbb{E}_{L}[\mathbb{E}_{G}[\mathcal{C}]]\underset{(i)}{\geq}0.766\mathbb{E}_{L}[\langle C,X^{f}\rangle]\geq 0.766(1-7\epsilon)\langle C,X^{\star}_{G}\rangle\geq 0.766(1-7\epsilon)\textup{opt}_{CC}^{G}, (A.44)

where (i) follows from the fact that the value of the clustering generated by the CGW rounding scheme satisfies 𝔼[𝒞]0.766C,Xf\mathbb{E}[\mathcal{C}]\geq 0.766\langle C,X^{f}\rangle.

We now determine the inner iteration complexity of Algorithm 1.

Upper bound on inner iteration complexity.

At each iteration tt of Algorithm 1, the subroutine LMO (see (A.22)) is equivalent to approximately computing the maximum eigenvector of the matrix gt\nabla g_{t}. This is achieved using the Lanczos method, whose convergence is given in Lemma A.2. Now, let N=12+1ρlog(n/p2)N=\frac{1}{2}+\frac{1}{\sqrt{\rho}}\log(n/p^{2}). We see that the bound on 1/ρ1/\rho is

1ραgt(1+η)4ηϵΔ,\frac{1}{\rho}\leq\frac{\alpha\|\nabla g_{t}\|(1+\eta)}{4\eta\epsilon\Delta}, (A.45)

which is similar to (A.25). We now derive an upper bound on gt\|\nabla g_{t}\|.

Lemma A.5.

Let g(X)=LG+W+,XβϕM(diag(X)𝟙,[eiTXej](i,j)E)g(X)=\langle L_{G^{-}}+W^{+},X\rangle-\beta\phi_{M}\left(\textup{diag}(X)-\mathbbm{1},\left[-e_{i}^{T}Xe_{j}\right]_{(i,j)\in E}\right), where ϕM()\phi_{M}(\cdot) is defined in (2.1). We have gtΔ(1+4(2|E|+n))\|\nabla g_{t}\|\leq\Delta(1+4(\sqrt{2|E|+n})), where Δ=Tr(LG)+ijE+wij+\Delta=\textup{Tr}(L_{G^{-}})+\sum_{ij\in E^{+}}w^{+}_{ij}.

Proof.

For the function g(X)g(X) as defined in the lemma, gt=LG+W+βD\nabla g_{t}=L_{G^{-}}+W^{+}-\beta D, where DD is a matrix such that Dii[1,1]D_{ii}\in[-1,1] for i=1,,ni=1,\dotsc,n, Dij[1,1]D_{ij}\in[-1,1] for (i,j)E(i,j)\in E, and Dij=0D_{ij}=0 for (i,j)E(i,j)\notin E, where E=EE+E=E^{-}\cup E^{+}. We have

maxk|λk(W+)|Tr(W+TW+)=(i,j)E+|wij+|2(i,j)E+wij+,and\displaystyle\max_{k}|\lambda_{k}(W^{+})|\leq\sqrt{\textup{Tr}(W^{+T}W^{+})}=\sqrt{\sum_{(i,j)\in E^{+}}|w^{+}_{ij}|^{2}}\leq\sum_{(i,j)\in E^{+}}w^{+}_{ij},\quad\textup{and} (A.46)
maxk|λk(D)|Tr(DTD)=i,j=1n|Dij|22|E|+n,\displaystyle\max_{k}|\lambda_{k}(D)|\leq\sqrt{\textup{Tr}(D^{T}D)}=\sqrt{\sum_{i,j=1}^{n}|D_{ij}|^{2}}\leq\sqrt{2|E|+n}, (A.47)

where the last inequality follows since DD has at most 2|E|+n2|E|+n nonzero entries in the range [1,1][-1,1]. Now, we have

gt=LG+W+βD(i)LG+W++βDmaxi|λi(LG)|+maxi|λi(W+)|+maxi|λi(βD)|(ii)Tr(LG)+(i,j)E+wij++β2|E|+n(iii)Δ(1+4(2|E|+n)).\begin{split}\|\nabla g_{t}\|=\|L_{G^{-}}+W^{+}-\beta D\|&\underset{(i)}{\leq}\|L_{G^{-}}\|+\|W^{+}\|+\|-\beta D\|\\ &\leq\max_{i}|\lambda_{i}(L_{G^{-}})|+\max_{i}|\lambda_{i}(W^{+})|+\max_{i}|\lambda_{i}(-\beta D)|\\ &\underset{(ii)}{\leq}\textup{Tr}(L_{G^{-}})+\sum_{(i,j)\in E^{+}}w^{+}_{ij}+\beta\sqrt{2|E|+n}\\ &\underset{(iii)}{\leq}\Delta(1+4(\sqrt{2|E|+n})).\end{split} (A.49)

where (i) follows since the spectral norm of LG+W+βDL_{G^{-}}+W^{+}-\beta D satisfies the triangle inequality, (ii) follows from (A.46),  (A.47) and the fact that LGL_{G^{-}} is a positive semidefinite matrix, and (iii) follows by substituting the value of Δ\Delta and β=4Δ\beta=4\Delta as given in Lemma 4.1. ∎

Substituting the bound on gt\|\nabla g_{t}\| in (A.45), we have

1ρ1+η4ηn(1+4(2|E|+n))ϵ,and\displaystyle\frac{1}{\rho}\leq\frac{1+\eta}{4\eta}\frac{n(1+4(\sqrt{2|E|+n}))}{\epsilon},\quad\textup{and} (A.50)
N=12+1ρlog(n/p2)12+1+η4ηn(1+4(2|E|+n))ϵlog(n/p2)=Nu.\displaystyle N=\frac{1}{2}+\frac{1}{\sqrt{\rho}}\log(n/p^{2})\leq\frac{1}{2}+\sqrt{\frac{1+\eta}{4\eta}}\sqrt{\frac{n(1+4(\sqrt{2|E|+n}))}{\epsilon}}\log(n/p^{2})=N^{u}. (A.51)

The computational complexity of the Lanczos method is 𝒪(Nu(|E|+n))\mathcal{O}(N^{u}(|E|+n)), where the term |E|+n|E|+n appears since the Lanczos method performs a matrix-vector multiplication with gt\nabla g_{t}, which has 𝒪(|E|+n)\mathcal{O}(|E|+n) nonzero entries, plus an additional 𝒪(n)\mathcal{O}(n) arithmetic operations at each iteration. We finally write the computational complexity of each iteration of Algorithm 1 as 𝒪(n|E|1.25ϵlog(n/p2))\mathcal{O}\left(\frac{\sqrt{n}|E|^{1.25}}{\sqrt{\epsilon}}\log(n/p^{2})\right).

Total computational complexity of Algorithm 1.

Since p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)}, we have

log(np2)=log((64)2n5(log(2n+|E|))2ϵ6)log(46n6ϵ6)=6log(4nϵ),\log\left(\frac{n}{p^{2}}\right)=\log\left(\frac{(64)^{2}n^{5}(\log(2n+|E|))^{2}}{\epsilon^{6}}\right)\leq\log\left(\frac{4^{6}n^{6}}{\epsilon^{6}}\right)=6\log\left(\frac{4n}{\epsilon}\right), (A.53)

where the inequality follows since |E|(n12)|E|\leq\binom{n-1}{2} and (log(2n+(n12)))2n\left(\log\left(2n+\binom{n-1}{2}\right)\right)^{2}\leq n for n1n\geq 1. Multiplying the outer and inner iteration complexities and substituting the bound on pp, we conclude that Algorithm 1 is an 𝒪(n2.5|E|1.25ϵ2.5log(n/ϵ)log(|E|))\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)-time algorithm. ∎

A.8 Proof of Lemma 5.1

For any positive semidefinite matrix X𝕊nX\in\mathbb{S}^{n}, the definition of τ\tau-spectral closeness (Definition 5.1) implies

(1τ)LG,XLG~,X(1+τ)LG,X.(1-\tau)\langle L_{G},X\rangle\leq\langle L_{\tilde{G}},X\rangle\leq(1+\tau)\langle L_{G},X\rangle. (A.54)

Let CC and C~\tilde{C} be the cost matrices in the objective of (k-Cut-Rel) when the problem is defined on the graphs GG and G~\tilde{G} respectively. Since CC and C~\tilde{C} are scaled Laplacian matrices (with the same scaling factor (k1)/2k(k-1)/2k), from (A.54) we can write

(1τ)C,XC~,X(1+τ)C,X.(1-\tau)\langle C,X\rangle\leq\langle\tilde{C},X\rangle\leq(1+\tau)\langle C,X\rangle. (A.55)

Let XGX_{G}^{\star} and XG~X_{\tilde{G}}^{\star} be optimal solutions to (k-Cut-Rel) defined on the graphs GG and G~\tilde{G} respectively. From (A.55), we can write,

(1τ)C,XGC~,XGC~,XG~,(1-\tau)\langle C,X^{\star}_{G}\rangle\leq\langle\tilde{C},X^{\star}_{G}\rangle\leq\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle, (A.56)

where the last inequality follows since XGX_{G}^{\star} and XG~X_{\tilde{G}}^{\star} are feasible and optimal solutions respectively to (k-Cut-Rel) defined on the graph G~\tilde{G}. Combining this with the bound in Lemma 3.2, i.e., 𝔼[CUT]αk(14ϵ)C~,XG~\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-4\epsilon)\langle\tilde{C},X_{\tilde{G}}^{\star}\rangle, we get

𝔼[CUT]αk(14ϵ)C~,XG~(i)αk(14ϵ)(1τ)C,XG(ii)αk(14ϵτ)C,XG(iii)αk(14ϵτ)optkG,\begin{split}\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-4\epsilon)\langle\tilde{C},X_{\tilde{G}}^{\star}\rangle\underset{(i)}{\geq}\alpha_{k}(1-4\epsilon)(1-\tau)\langle C,X^{\star}_{G}\rangle&\underset{(ii)}{\geq}\alpha_{k}(1-4\epsilon-\tau)\langle C,X^{\star}_{G}\rangle\\ &\underset{(iii)}{\geq}\alpha_{k}(1-4\epsilon-\tau)\textup{opt}_{k}^{G},\end{split} (A.57)

where (i) follows from (A.56), (ii) follows since (14ϵ)(1τ)=14ϵτ+4ϵτ14ϵτ(1-4\epsilon)(1-\tau)=1-4\epsilon-\tau+4\epsilon\tau\geq 1-4\epsilon-\tau for nonnegative ϵ\epsilon and τ\tau, and (iii) follows since C,XGoptkG\langle C,X^{\star}_{G}\rangle\geq\textup{opt}_{k}^{G} for an optimal solution XGX^{\star}_{G} to (k-Cut-Rel) defined on the graph GG.
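To make (A.54) concrete, the following Python sketch estimates the smallest τ\tau for which a given Laplacian pair satisfies (1τ)LGLG~(1+τ)LG(1-\tau)L_{G}\preceq L_{\tilde{G}}\preceq(1+\tau)L_{G}. The dense eigendecomposition is used purely for illustration (a streaming sparsifier would never be checked this way), the helper name spectral_closeness_tau is ours, and the sketch assumes the sparsifier preserves the null space of LGL_{G}.

\begin{verbatim}
import numpy as np

def spectral_closeness_tau(L, L_tilde, tol=1e-9):
    # Smallest tau with (1 - tau) L <= L_tilde <= (1 + tau) L on range(L),
    # which gives (A.54) for every PSD matrix X supported on that range.
    w, V = np.linalg.eigh(L)
    keep = w > tol
    B = V[:, keep] / np.sqrt(w[keep])             # whitening by L^{1/2} on range(L)
    lams = np.linalg.eigvalsh(B.T @ L_tilde @ B)  # generalized eigenvalues of (L_tilde, L)
    return max(1.0 - lams.min(), lams.max() - 1.0)
\end{verbatim}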

A.9 Proof of Lemma 5.2

Proof.

The Laplacian matrices LGL_{G^{-}} and LG~L_{\tilde{G}^{-}} of the graphs GG^{-} and its sparse approximation G~\tilde{G}^{-} respectively satisfy (A.54). Furthermore, let LG+=D+W+L_{G}^{+}=D^{+}-W^{+}, where Dii+=j:(i,j)E+wij+D^{+}_{ii}=\sum_{j:(i,j)\in E^{+}}w^{+}_{ij}, be the Laplacian of the graph G+G^{+} and similarly let LG~+=D~+W~+L_{\tilde{G}}^{+}=\tilde{D}^{+}-\tilde{W}^{+} be the Laplacian of the graph G~+\tilde{G}^{+}. If X=IX=I, from (A.54), we have

(1τ)Tr(D+)Tr(D~+)(1+τ)Tr(D+).(1-\tau)\textup{Tr}(D^{+})\leq\textup{Tr}(\tilde{D}^{+})\leq(1+\tau)\textup{Tr}(D^{+}). (A.58)

Rewriting the second inequality in (A.54) for X=XGX=X^{\star}_{G}, and noting that diag(XG)=𝟙\textup{diag}(X^{\star}_{G})=\mathbbm{1}, we have

W+,XGW~+,XG1+τ+(1+τ)Tr(D+)Tr(D~+)1+τW~+,XG1+τ+2τTr(D+)1+τ,\begin{split}\langle W^{+},X^{\star}_{G}\rangle&\leq\frac{\langle\tilde{W}^{+},X^{\star}_{G}\rangle}{1+\tau}+\frac{(1+\tau)\textup{Tr}(D^{+})-\textup{Tr}(\tilde{D}^{+})}{1+\tau}\\ &\leq\frac{\langle\tilde{W}^{+},X^{\star}_{G}\rangle}{1+\tau}+\frac{2\tau\textup{Tr}(D^{+})}{1+\tau},\end{split} (A.59)

where the second inequality follows from (A.58). Let C=LG+W+C=L_{G^{-}}+W^{+} and C~=LG~+W~+\tilde{C}=L_{\tilde{G}^{-}}+\tilde{W}^{+} represent the cost in (MA-Rel) and (MA-Sparse) respectively. Let XGX^{\star}_{G} be an optimal solution to (MA-Rel). The optimal objective function value of (MA-Rel) at XGX^{\star}_{G} is C,XG\langle C,X^{\star}_{G}\rangle and

(1τ)C,XG=(1τ)LG,XG+(1τ)W+,XG(i)LG~,XG+1τ1+τW~+,XG+2τ(1τ)1+τTr(D+)(ii)C~,XG2τ1+τW~+,XG+2τ1+τTr(D~+)(iii)C~,XG~+2τ1+τC~,XG~,\begin{split}(1-\tau)\langle C,X^{\star}_{G}\rangle&=(1-\tau)\langle L_{G^{-}},X^{\star}_{G}\rangle+(1-\tau)\langle W^{+},X^{\star}_{G}\rangle\\ &\underset{(i)}{\leq}\langle L_{\tilde{G}^{-}},X^{\star}_{G}\rangle+\frac{1-\tau}{1+\tau}\langle\tilde{W}^{+},X^{\star}_{G}\rangle+\frac{2\tau(1-\tau)}{1+\tau}\textup{Tr}(D^{+})\\ &\underset{(ii)}{\leq}\langle\tilde{C},X^{\star}_{G}\rangle-\frac{2\tau}{1+\tau}\langle\tilde{W}^{+},X^{\star}_{G}\rangle+\frac{2\tau}{1+\tau}\textup{Tr}(\tilde{D}^{+})\\ &\underset{(iii)}{\leq}\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle+\frac{2\tau}{1+\tau}\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle,\end{split} (A.60)

where (i) follows from (A.54) and (A.59), (ii) follows from (A.58) after substituting C~=LG~+W~+\tilde{C}=L_{\tilde{G}^{-}}+\tilde{W}^{+} and rearranging the terms, and (iii) holds true since W~+,XG0\langle\tilde{W}^{+},X^{\star}_{G}\rangle\geq 0, and II and XGX^{\star}_{G} are feasible to (MA-Sparse) so that Tr(D~+)C~,XG~\textup{Tr}(\tilde{D}^{+})\leq\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle and C~,XGC~,XG~\langle\tilde{C},X^{\star}_{G}\rangle\leq\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle. Rearranging the terms, we have

C,XG1+3τ1τ2C~,XG~.\langle C,X^{\star}_{G}\rangle\leq\frac{1+3\tau}{1-\tau^{2}}\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle. (A.61)

Combining (A.61) with the fact that the expected value of clustering 𝔼[𝒞]\mathbb{E}[\mathcal{C}] generated for the graph G~\tilde{G} satisfies (4.3), we have

𝔼[𝒞]0.766(16ϵ)C~,XG~0.766(16ϵ)(1τ2)1+3τC,XG0.766(16ϵ3τ)(1τ2)optCCG,\begin{split}\mathbb{E}[\mathcal{C}]\geq 0.766(1-6\epsilon)\langle\tilde{C},X^{\star}_{\tilde{G}}\rangle\geq 0.766\frac{(1-6\epsilon)(1-\tau^{2})}{1+3\tau}\langle C,X^{\star}_{G}\rangle\geq 0.766(1-6\epsilon-3\tau)(1-\tau^{2})\textup{opt}_{CC}^{G},\end{split} (A.62)

where the last inequality follows since (16ϵ3τ)(1+3τ)16ϵ(1-6\epsilon-3\tau)(1+3\tau)\leq 1-6\epsilon. ∎

A.10 Proof of Lemma 5.3

The first step of the procedure given in Section 5 is to sparsify the input graph using the technique proposed in [23], whose computational complexity is 𝒪(|E|log2n)\mathcal{O}(|E|\log^{2}n). The second step, when generating solutions to Max-k-Cut and Max-Agree, is to apply the procedures given in Sections 3 and 4 respectively. The computational complexity of this step is bounded as given in Lemmas 3.3 and 4.3, leading to an 𝒪(n2.5|E|1.25ϵ2.5log(n/ϵ)log(|E|))\mathcal{O}\left(\frac{n^{2.5}|E|^{1.25}}{\epsilon^{2.5}}\log(n/\epsilon)\log(|E|)\right)-time algorithm.

Bound on the value of generated kk-cut.

Let p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)} and T(n,ϵ)=144log(2n+|E|)n2ϵ2T(n,\epsilon)=\frac{144\log(2n+|E|)n^{2}}{\epsilon^{2}} as given in Lemma 3.3. Using the procedure given in Section 3, we have 𝔼[CUT]αk(15ϵ)optkG~\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-5\epsilon)\textup{opt}_{k}^{\tilde{G}}. From the proof of Lemma 5.1, we see that CUT is then an approximate kk-cut for the input graph GG such that 𝔼[CUT]αk(15ϵτ)optkG\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-5\epsilon-\tau)\textup{opt}_{k}^{G}.

Bound on the value of generated clustering.

Let p=ϵT(n,ϵ)p=\frac{\epsilon}{T(n,\epsilon)} and T(n,ϵ)=64log(2n+|E|)n2ϵ2T(n,\epsilon)=\frac{64\log(2n+|E|)n^{2}}{\epsilon^{2}} as given in Lemma 4.3 and let the procedure given in Section 4 be applied to the sparse graph G~\tilde{G}. Then, the generated clustering satisfies 𝔼[𝒞]0.766(17ϵ)optCCG~\mathbb{E}[\mathcal{C}]\geq 0.766(1-7\epsilon)\textup{opt}_{CC}^{\tilde{G}}. Combining this with the proof of Lemma 5.2, we have 𝔼[𝒞]0.766(17ϵ3τ)(1τ2)optCCG\mathbb{E}[\mathcal{C}]\geq 0.766(1-7\epsilon-3\tau)(1-\tau^{2})\textup{opt}_{CC}^{G}.

Appendix B Preliminary Computational Results for Max-k-Cut

We provide some preliminary computational results when generating an approximate kk-cut on the graph GG using the approach outlined in Section 3. The aim of these experiments was to verify that the bounds given in Lemma 3.2 were satisfied in practice. First, we solved (k-Cut-LSE) to ϵTr(C)\epsilon\textup{Tr}(C)-optimality using Algorithm 1 with the input parameters set to α=n\alpha=n, ϵ=0.05\epsilon=0.05, β=6Tr(C)\beta=6\textup{Tr}(C), M=6log(2n+|E|)ϵM=6\frac{\log(2n+|E|)}{\epsilon}. We then computed feasible samples using Algorithm 2 and finally applied the FJ rounding scheme to the generated samples. The computations were performed using MATLAB R2021a on a machine with 8GB RAM. The peak memory requirement was noted using the profiler command in MATLAB.

We performed computations on randomly selected graphs from the GSet dataset. In each case, the infeasibility of the covariance of the generated samples was less than ϵ\epsilon, thus satisfying (3.4). The number of iterations of LMO in Algorithm 1 was also within the bounds given in Proposition 1.1. To generate a kk-cut, we generated 10 sets of kk i.i.d. zero-mean Gaussian samples with covariance X^ϵ\widehat{X}_{\epsilon}, and each set was then used to generate a kk-cut for the input graph using the FJ rounding scheme. Let CUTbest\texttt{CUT}_{\textup{best}} denote the value of the best kk-cut amongst the 10 generated cuts. Table 2 shows the results for graphs from the GSet dataset with k=3,4k=3,4. Note that CUTbest𝔼[CUT]αk(14ϵ)C,Xαk14ϵ1+4ϵC,X^ϵ\texttt{CUT}_{\textup{best}}\geq\mathbb{E}[\texttt{CUT}]\geq\alpha_{k}(1-4\epsilon)\langle C,X^{\star}\rangle\geq\alpha_{k}\frac{1-4\epsilon}{1+4\epsilon}\langle C,\widehat{X}_{\epsilon}\rangle, where the last inequality follows from combining (3.5) with (3.3). Since we computed both CUTbest\texttt{CUT}_{\textup{best}} and C,X^ϵ\langle C,\widehat{X}_{\epsilon}\rangle, we verified that the weaker bound CUTbest/C,X^ϵ=ARαk(14ϵ)/(1+4ϵ)\texttt{CUT}_{\textup{best}}/\langle C,\widehat{X}_{\epsilon}\rangle=\textup{AR}\geq\alpha_{k}(1-4\epsilon)/(1+4\epsilon) was satisfied by every input graph when ϵ=0.05\epsilon=0.05.
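A minimal Python sketch of this sample-based rounding step, as we understand the FJ scheme applied to the Gaussian samples of Section 3, is given below; the helper names and the edge-list representation of the graph are ours.

\begin{verbatim}
import numpy as np

def fj_round(Z):
    # Z has shape (k, n); row j is the j-th Gaussian sample with covariance X^f.
    # Vertex i is assigned to the part j maximizing Z[j, i].
    return np.argmax(Z, axis=0)

def cut_value(labels, edges, weights):
    # Total weight of edges whose endpoints fall in different parts.
    i, j = edges[:, 0], edges[:, 1]
    return float(np.sum(weights[labels[i] != labels[j]]))

# Best of 10 independent rounding trials, as reported in Table 2 (CUT_best);
# sample_k_gaussians() is a placeholder for the sampling procedure of Section 3:
# cuts = [cut_value(fj_round(sample_k_gaussians()), edges, weights) for _ in range(10)]
# cut_best = max(cuts)
\end{verbatim}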

Furthermore, Table 2 also shows that the memory used by our method was linear in the size of the input graph. To see this, consider the dataset G1, and note that for k=3k=3, the memory used by our method was 1252.73kB8.02×(|V|+|E|)×81252.73\textup{kB}\approx 8.02\times(|V|+|E|)\times 8 bytes, where the factor of 8 reflects that MATLAB uses 8 bytes to store a real number. Similarly, for the other instances in GSet, the memory used by our method to generate an approximate kk-cut for k=3,4k=3,4 was at most c×(|V|+|E|)×8c\times(|V|+|E|)\times 8 bytes, where for each graph the value of cc was bounded by c82c\leq 82, showing linear dependence of the memory used on the size of the input graph.
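As a quick check of this accounting, the numbers below are taken from Table 2 for G1 with k=3k=3.

\begin{verbatim}
# Memory accounting for G1 (|V| = 800, |E| = 19176, k = 3):
n_plus_m = 800 + 19176                    # |V| + |E|
kilobytes = 8.02 * n_plus_m * 8 / 1024    # 8 bytes per double-precision number in MATLAB
# kilobytes ~ 1251.6, matching the 1252.73 kB reported in Table 2
# (the constant 8.02 is itself rounded from the measured peak memory).
\end{verbatim}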

Table 2: Results of generating a kk-cut for graphs from GSet using the method outlined in Section 3. We have infeas=max{diag(X^ϵ)𝟙,max{0,[X^ϵ]ij1k1}}\textup{infeas}=\max\{\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty},\max\{0,-[\widehat{X}_{\epsilon}]_{ij}-\frac{1}{k-1}\}\} and AR=CUTbest/C,X^ϵ\textup{AR}=\texttt{CUT}_{\textup{best}}/\langle C,\widehat{X}_{\epsilon}\rangle.
Dataset |V||V| |E||E| kk # Iterations (×103\times 10^{3}) infeas C,X^ϵ\langle C,\widehat{X}_{\epsilon}\rangle CUTbest\textup{CUT}_{\textup{best}} AR Memory required (in kB)
G1 800 19176 3 823.94 4×1044\times 10^{-4} 15631 14266 0.9127 1252.73
G1 800 19176 4 891.23 4×1044\times 10^{-4} 17479 15746 0.9 1228.09
G2 800 19176 3 827.61 6×1056\times 10^{-5} 15629 14332 0.917 1243.31
G2 800 19176 4 9268.42 8×1058\times 10^{-5} 17474 15786 0.903 1231.07
G3 800 19176 3 1242.53 7×1057\times 10^{-5} 15493 14912 0.916 1239.57
G3 800 19176 4 1341.37 7×10457\times 10^{-45} 17301 15719 0.908 1240.17
G4 800 19176 3 812.8 9×1059\times 10^{-5} 15660 14227 0.908 1230.59
G4 800 19176 4 9082.74 10410^{-4} 17505 15748 0.899 1223.59
G5 800 19176 3 843.5 10410^{-4} 15633 14341 0.917 1222.09
G5 800 19176 4 9294.32 10410^{-4} 17470 15649 0.895 1227.9
G14 800 4694 3 1240.99 0.002 3917 2533 0.646 3502.64
G14 800 4694 4 3238.42 0.001 4467.9 3775 0.844 519.25
G15 800 4661 3 3400.17 0.001 4018.6 3385 0.842 612
G15 800 4661 4 1603.13 0.001 4475.8 3754 0.838 648.17
G16 800 4672 3 33216.68 0.001 4035.7 3422 0.847 561
G16 800 4672 4 3059.11 0.001 4437.5 3783 0.852 2800
G17 800 4667 3 3526.4 0.001 4031.5 3414 0.846 602.81
G17 800 4667 4 3400.01 0.001 4440 3733 0.84 693.6
G22 2000 19990 3 7402.59 10410^{-4} 17840 11954 0.67 1340.34
G22 2000 19990 4 8103.83 10410^{-4} 19582 16670 0.851 1341.67
G23 2000 19990 3 3597.39 10410^{-4} 17938 15331 0.854 1360.09
G23 2000 19990 4 3588.04 10410^{-4} 19697 16639 0.844 1317.09
G24 2000 19990 3 4304.48 10410^{-4} 17913 15370 0.858 1341.96
G24 2000 19990 4 1994.26 10410^{-4} 19738 16624 0.842 1321.59
G25 2000 19990 3 9774.03 10410^{-4} 18186 15294 0.841 1311.54
G25 2000 19990 4 1540.14 10410^{-4} 19778 16641 0.841 1330.95
G26 2000 19990 3 2069.65 10410^{-4} 18012 15411 0.855 1321.92
G26 2000 19990 4 1841.06 2×1042\times 10^{-4} 19735 16609 0.841 1331.53
G43 1000 9990 3 894.53 10410^{-4} 9029 7785 0.862 661.09
G43 1000 9990 4 9709.68 2×1042\times 10^{-4} 9925 8463 0.852 665.59
G44 1000 9990 3 721.64 10410^{-4} 9059.5 7782 0.859 661.09
G44 1000 9990 4 9294.43 10410^{-4} 9926.1 8448 0.851 765.37
G45 1000 9990 3 794.84 10410^{-4} 9038.4 7773 0.86 661.09
G45 1000 9990 4 9503.74 2×1042\times 10^{-4} 9929.7 8397 0.845 669
G46 1000 9990 3 703.4 10410^{-4} 9068.5 7822 0.862 661.09
G46 1000 9990 4 9684.93 4×1044\times 10^{-4} 9929.9 8333 0.839 657.09
G47 1000 9990 3 777.61 10410^{-4} 9059.4 7825 0.863 679.89
G47 1000 9990 4 9789.55 2×1042\times 10^{-4} 9930.8 8466 0.852 661.09

Appendix C Additional Computational Results for Correlation Clustering

We provide here the computational results for the graphs from the GSet dataset not included in the main article. We performed computations for graphs G1-G57 from the GSet dataset. The instances for which we were able to generate an ϵΔ\epsilon\Delta-optimal solution to (MA-LSE) are given in Table 3, where the parameters ϵ\epsilon and Δ\Delta were set as given in Section 6. For the instances not in the table, we were not able to generate an ϵΔ\epsilon\Delta-optimal solution after 30 hours of runtime.

Table 3: Results of generating a clustering of graphs from GSet using the method outlined in Section 4. We have infeas=max{diag(X^ϵ)𝟙,max{0,[X^ϵ]ij}}\textup{infeas}=\max\{\|\textup{diag}(\widehat{X}_{\epsilon})-\mathbbm{1}\|_{\infty},\max\{0,-[\widehat{X}_{\epsilon}]_{ij}\}\}, AR=𝒞best/C,X^ϵ\textup{AR}=\mathcal{C}_{\textup{best}}/\langle C,\widehat{X}_{\epsilon}\rangle and 0.75(16ϵ)/(1+4ϵ)=0.43750.75(1-6\epsilon)/(1+4\epsilon)=0.4375 for ϵ=0.05\epsilon=0.05.
Dataset |V||V| |E+||E^{+}| |E||E^{-}| # Iterations (×103\times 10^{3}) infeas C,X^ϵ\langle C,\widehat{X}_{\epsilon}\rangle 𝒞best\mathcal{C}_{\textup{best}} AR Memory required (in kB)
G2 800 2501 16576 681.65 8×1048\times 10^{-4} 848.92 643.13 0.757 1520.18
G3 800 2571 16498 677.56 7×1047\times 10^{-4} 835.05 634.83 0.76 1529.59
G4 800 2457 16622 665.93 6×1046\times 10^{-4} 852.18 647.37 0.76 1752
G5 800 2450 16623 646.4 10310^{-3} 840.63 636.21 0.756 1535.92
G6 800 9665 9511 429.9 3×1043\times 10^{-4} 25766 21302 0.826 1664
G7 800 9513 9663 423.58 8×1048\times 10^{-4} 26001 20790 0.799 1535.06
G8 800 9503 9673 421.34 6×1046\times 10^{-4} 26005 21080 0.81 4284
G9 800 9556 9620 426.4 3×1043\times 10^{-4} 25903 21326 0.823 1812
G10 800 9508 9668 426.25 3×1043\times 10^{-4} 25974 21412 0.824 1535.59
G12 800 798 802 393.69 9×1049\times 10^{-4} 3023.4 2034 0.672 444.06
G13 800 817 783 416.29 8×1048\times 10^{-4} 3001.1 2010 0.669 613.03
G15 800 3801 825 284.77 10310^{-3} 529.83 401.19 0.757 460.17
G16 800 3886 749 228.12 8×1048\times 10^{-4} 524.69 417.88 0.796 451.07
G17 800 3899 744 2448.633 9×1049\times 10^{-4} 536.65 369.04 0.687 480.45
G18 800 2379 2315 1919.44 2×1032\times 10^{-3} 7237.6 5074 0.701 434.67
G19 800 2274 2387 2653.79 2×1032\times 10^{-3} 7274.2 5130 0.705 496
G20 800 2313 2359 1881.75 2×1032\times 10^{-3} 7258.1 5186 0.714 406.09
G21 800 2300 2367 1884.97 2×1032\times 10^{-3} 7281.3 5238 0.719 467.26
G23 2000 120 19855 550.77 2×1032\times 10^{-3} 1802.2 1373.2 0.762 1651.54
G24 2000 96 19875 812.16 10310^{-3} 1811.2 1384.6 0.764 1678.04
G25 2000 109 19872 1739.06 6×1046\times 10^{-4} 1801.8 1398.1 0.776 1650.48
G26 2000 117 19855 1125.74 10310^{-3} 1789.9 1356.9 0.758 1650.01
G27 2000 9974 10016 464.93 5×1045\times 10^{-4} 30502 22010 0.721 1647.09
G28 2000 9943 10047 553.65 4×1044\times 10^{-4} 30412 22196 0.729 1317.78
G29 2000 10035 9955 513.97 2×1042\times 10^{-4} 30366 23060 0.759 1310.46
G30 2000 10045 9945 594.09 3×1043\times 10^{-4} 30255 22550 0.745 1310.48
G31 2000 9955 10035 1036.9 2×1042\times 10^{-4} 29965 22808 0.761 1305.05
G33 2000 1985 2015 403.75 10310^{-3} 7442 4404 0.591 634.93
G34 2000 1976 2024 863.53 4×1044\times 10^{-4} 7307.2 4760 0.651 574.12
G44 1000 229 9721 515.18 10310^{-3} 810.82 616.61 0.76 655.09
G45 1000 218 9740 504.91 10310^{-3} 812.21 615.84 0.758 660.51
G46 1000 237 9723 469.6 10310^{-3} 818.39 623.95 0.762 655.09
G47 1000 230 9732 495.24 9×1049\times 10^{-4} 819.63 621.65 0.758 648.32
G49 3000 0 6000 1002.59 0.003 599.64 456.48 0.761 733
G50 3000 0 6000 996.19 0.004 599.64 455.78 0.76 540.26
G52 1000 4750 1127 2041.8 0.001 684.1 441.02 0.644 757.59
G53 1000 4820 1061 785.33 8×1048\times 10^{-4} 695.53 445.03 0.639 417.07
G54 1000 4795 1101 2899.99 7×1047\times 10^{-4} 686.8 482.57 0.702 517.09
G56 5000 6222 6276 1340.35 0.004 22246 12788 0.574 1243.98