
Learning with Semi-Definite Programming: new statistical bounds based on fixed point analysis and excess risk curvature

Stéphane Chrétien, Mihai Cucuringu, Guillaume Lecué and Lucie Neirac
emails: [email protected], [email protected], [email protected], [email protected]
Université Lyon 2; Oxford University; The Alan Turing Institute; CREST, ENSAE, IP Paris
Abstract

Many statistical learning problems have recently been shown to be amenable to Semi-Definite Programming (SDP), with community detection and clustering in Gaussian mixture models as the most striking instances [60]. Given the growing range of applications of SDP-based techniques to machine learning problems, and the rapid progress in the design of efficient algorithms for solving SDPs, an intriguing question is to understand how the recent advances from empirical process theory can be put to work in order to provide a precise statistical analysis of SDP estimators.

In the present paper, we borrow cutting edge techniques and concepts from the learning theory literature, such as fixed point equations and excess risk curvature arguments, which yield general estimation and prediction results for a wide class of SDP estimators. From this perspective, we revisit some classical results in community detection from [54] and [23], and we obtain statistical guarantees for SDP estimators used in signed clustering, group synchronization and MAXCUT.

1 Introduction

Many statistical learning problems have recently been shown to be amenable to Semi-Definite Programming (SDP), with community detection and clustering in Gaussian mixture models as the most striking instances where SDP performs significantly better than other current approaches [60]. SDP is a class of convex optimisation problems generalising linear programming to linear problems over semi-definite matrices [94], [107], [21], and it has proved to be an important tool in the computational approach to difficult challenges in automatic control, combinatorial optimisation, polynomial optimisation, data mining, high-dimensional statistics and the numerical solution of partial differential equations. The goal of the present paper is to introduce a new fixed point approach to the statistical analysis of SDP-based estimators, and to illustrate our method on four current problems of interest, namely MAX-CUT, community detection, signed clustering, and angular group synchronization. The rest of this section gives some historical background and presents the mathematical definition of SDP-based estimators.

1.1 Historical background

SDP is a class of optimisation problems which includes linear programming as a particular case: it consists of problems over symmetric (resp. Hermitian) positive semi-definite matrix variables, with a linear cost function and affine constraints, i.e. optimization problems of the form

\max_{Z\succeq 0}\left(\bigl{<}A,Z\bigr{>}:\bigl{<}B_{j},Z\bigr{>}=b_{j}\mbox{ for }j=1,\ldots,m\right) \qquad (1.1)

where A,B_{1},\ldots,B_{m} are given matrices. SDPs are convex programming problems which can be solved in polynomial time when the constraint set is compact, and they play a paramount role in a large number of convex and nonconvex problems, for which SDP often appears as a convex relaxation [6].
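For concreteness, here is a minimal numerical sketch of the standard form (1.1) using the cvxpy modelling library; the toy cost matrix and the single trace constraint are our own illustrative assumptions, not part of the paper.

```python
import cvxpy as cp
import numpy as np

# Toy instance of the standard form (1.1): maximize <A, Z> subject to
# Z PSD and a single affine constraint <B_1, Z> = b_1 (illustrative data).
n = 4
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
A = (M + M.T) / 2            # a symmetric cost matrix
B1, b1 = np.eye(n), 1.0      # <B_1, Z> = tr(Z) = 1

Z = cp.Variable((n, n), symmetric=True)
problem = cp.Problem(cp.Maximize(cp.trace(A @ Z)),
                     [Z >> 0, cp.trace(B1 @ Z) == b1])
problem.solve(solver=cp.SCS)
print(problem.value)  # equals the largest eigenvalue of A in this instance
```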

We will sometimes use the notation \mathbb{S}_{n,+} (resp. \mathbb{S}_{n,-}) for the cone of positive semi-definite matrices (resp. negative semi-definite matrices).

1.1.1 Early history

Early use of Semi-Definite Programming in statistics can be traced back to [88] and [42]. Around the same time, Shapiro used SDP in factor analysis [89]. The study of the mathematical properties of SDP then gained momentum with the introduction of Linear Matrix Inequalities (LMI) and their numerous applications in control theory, system identification and signal processing. The book [20] is the standard reference for this type of results, mostly obtained in the 90's.

1.1.2 The Goemans-Williamson SDP relaxation of Max-Cut and its legacy

A notable turning point is the publication of [49], where SDP was shown to provide a 0.87 approximation to the NP-hard problem known as Max-Cut. The Max-Cut problem is a clustering problem on graphs which consists in finding two complementary subsets S and S^{c} of nodes such that the sum of the weights of the edges between S and S^{c} is maximal. In [49], the authors approach this difficult combinatorial problem by using what is now known as the Goemans-Williamson SDP relaxation, and use the Cholesky factorization of the optimal solution to this SDP in order to produce a randomized scheme achieving the 0.87 bound in expectation. Moreover, this problem can be seen as a first instance where the Laplacian of a graph is employed in order to provide an optimal bi-clustering in a graph, and it certainly represents the first chapter of a long and fruitful relationship between clustering, embedding and Laplacians. Other SDP schemes for approximating hard combinatorial problems include those for the graph coloring problem [61] and for the satisfiability problem [49, 48]. These results were later surveyed in [70, 47] and [106]. The randomized scheme introduced by Goemans and Williamson was then further improved in order to study more general Quadratically Constrained Quadratic Programmes (QCQP) in various references, most notably [79, 110], and further extended in [56]. Many applications to signal processing are discussed in [81], [74]; one specific reduced-complexity implementation in the form of an eigenvalue minimisation problem, and its application to binary least-squares recovery and denoising, is presented in [27].
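To make the pipeline described above concrete, the following sketch runs the Goemans-Williamson relaxation and the randomized hyperplane rounding on a toy graph; the graph, the cvxpy/SCS solver choice and the seeds are illustrative assumptions, not part of the original references.

```python
import cvxpy as cp
import numpy as np

# Goemans-Williamson on a toy graph: SDP relaxation of Max-Cut, then a
# Cholesky-type factorization and random hyperplane rounding.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)   # toy adjacency matrix
n = W.shape[0]
L = np.diag(W.sum(axis=1)) - W              # graph Laplacian

Z = cp.Variable((n, n), symmetric=True)
cp.Problem(cp.Maximize(cp.trace(L @ Z) / 4),
           [Z >> 0, cp.diag(Z) == 1]).solve(solver=cp.SCS)

w, U = np.linalg.eigh(Z.value)
V = U * np.sqrt(np.clip(w, 0, None))        # rows v_i with <v_i, v_j> = Z_ij
g = np.random.default_rng(1).standard_normal(n)
x = np.sign(V @ g)                          # random hyperplane rounding
cut_weight = W[x[:, None] != x[None, :]].sum() / 2
print(cut_weight)
```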

1.1.3 Relaxation of machine learning and high dimensional statistical estimation problems

Applications of SDP to problems related to machine learning are more recent, and probably started with the SDP relaxation of K-means in [83, 82] and later in [5]. This approach was then further improved using a refined statistical analysis by [87] and [46]. Similar methods have also been applied to community detection [55, 3] and, from the weak recovery viewpoint, in [54]. This last approach was also re-used via the kernel trick for point cloud clustering [28]. Another incarnation of SDP in machine learning is the extensive use of nuclear norm-penalized least-square costs as a surrogate for rank-penalization in low-rank recovery problems such as matrix completion in recommender systems, matrix compressed sensing, natural language processing and quantum state tomography; these topics are surveyed in [35].

The problem of manifold learning was also addressed using SDP, and SDP is often mentioned as one of the most accurate approaches to this problem, setting aside its computational complexity; see [103, 105, 104, 57]. Connections with the design of fast converging Markov chains were also exhibited in [93].

In a different direction, A. Singer and collaborators have recently promoted the use of SDP relaxations for estimation under group invariance, an active area with many applications [92, 8]. SDP-based relaxations have also been considered in [30] in the context of synchronization over \mathbb{Z}_{2} in signed multiplex networks with constraints, and in [31] in the setting of ranking from inconsistent and incomplete pairwise comparisons, where an SDP-based relaxation of angular synchronization over SO(2) outperformed a suite of state-of-the-art algorithms from the literature. Phase recovery using SDP was studied in e.g. [102] and [40]. An extension to multi-partite clustering based on SDP was then proposed in [61]. Other important applications of SDP include information theory [73], estimation in power networks [67], quantum tomography [77], [50] and polynomial optimization via sums-of-squares relaxations [66, 14]. Sums-of-squares relaxations were recently applied to statistical problems in [38, 58, 39]. The extension to the field of complex numbers, with \bigl{<}\cdot,\cdot\bigr{>} denoting the Hermitian inner product, has been less extensively studied, but it has many interesting applications and comes with efficient algorithms [49, 45].

1.2 Mathematical formulation of the problem

The general problem we want to study can be stated as follows. Let A be a random matrix in {\mathbb{R}}^{n\times n} and let {\cal C}\subset{\mathbb{R}}^{n\times n} be a constraint set. The object that we want to recover, for instance the community membership vector in community detection, is related to an oracle defined as

Z^{*}\in\operatorname*{argmax}_{Z\in{\cal C}}\;\bigl{<}\mathbb{E}A,Z\bigr{>} \qquad (1.2)

where \bigl{<}A,B\bigr{>}={\rm Tr}(A\bar{B}^{\top})=\sum A_{ij}\bar{B}_{ij} when A,B\in{\mathbb{C}}^{n\times n}, where \bar{z} denotes the conjugate of z\in{\mathbb{C}}. We would like to estimate Z^{*}, from which we can ultimately retrieve the object that really matters to us (for instance, by considering a singular vector associated with the largest singular value of Z^{*}). To that end, we consider the following estimator

\hat{Z}\in\operatorname*{argmax}_{Z\in{\cal C}}\;\bigl{<}A,Z\bigr{>}, \qquad (1.3)

which is simply obtained by replacing the unobserved quantity \mathbb{E}A by the observation A.

As pointed out above, in many situations Z^{*} is not the object we want to estimate, but there is a straightforward relation between Z^{*} and this object. For instance, consider the community detection problem, where the goal is to recover the community membership vector x^{*}\in\{-1,1\}^{n} of n nodes. Here, when {\cal C} is well chosen, there is a close relation between Z^{*} and x^{*}, given by Z^{*}=x^{*}(x^{*})^{\top}. We therefore need a final step to estimate x^{*} using \hat{Z}, for instance by letting \hat{x} denote a top eigenvector of \hat{Z}, and then using the Davis-Kahan "sin-theta" theorem [36, 108] to control the estimation of x^{*} by \hat{x} in terms of the estimation of Z^{*} by \hat{Z}.
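As an illustration of this final extraction step, here is a minimal sketch; the noisy rank-one toy matrix and the sign rounding are our own assumptions.

```python
import numpy as np

# Once an SDP solution Z_hat is at hand, recover a membership-type vector
# from a top eigenvector, as described above (toy rank-one example).
def top_eigenvector(Z_hat):
    w, U = np.linalg.eigh(Z_hat)   # eigenvalues in ascending order
    return U[:, -1]                # eigenvector of the largest eigenvalue

Z_hat = np.outer([1, 1, -1, -1], [1, 1, -1, -1]) + 0.05  # noisy rank-one toy
x_hat = np.sign(top_eigenvector(Z_hat))
print(x_hat)  # the two blocks, up to a global sign flip
```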

When the constraint set {\cal C} is of the form {\cal C}=\{Z\in{\mathbb{R}}^{n\times n}:Z\succeq 0,\bigl{<}Z,B_{j}\bigr{>}=b_{j},j=1,\ldots,m\}, where B_{1},\ldots,B_{m}\in{\mathbb{R}}^{n\times n} and Z\succeq 0 means that "Z is positive semidefinite", then (1.3) is a semidefinite program (SDP) [21].

1.3 Goal of the paper

The aim of the present work is to present a general approach to the study of the statistical properties of SDP-based estimators defined in (1.3). In particular, using our framework, one is able to obtain new (non-asymptotic) rates of convergence or exact reconstruction properties for a wide class of estimators obtained as solutions of a semidefinite program like (1.3). Specifically, our goal is to show that the solution to (1.3) can be analyzed in a statistical way when \mathbb{E}A is only partially and noisily observed through A. Even though the constraint set {\cal C} may not necessarily be the intersection of the set of PSD (or Hermitian) matrices with linear spaces – such as in the definition of SDP – in the following, a solution \hat{Z} of (1.3) will be called a SDP estimator, because in all our examples \hat{Z} will be the solution of a SDP. For the sake of generality, however, we will only assume minimal requirements on the shape of {\cal C}. We also illustrate our results on a number of specific machine learning problems, such as various forms of clustering problems and angular group synchronization. Three out of the four examples worked out here are concerned with real-valued matrices; only the angular synchronization problem is approached using complex matrices.

2 Main general results for the statistical analysis of SDP estimators

From a statistical point of view, the task is to estimate in the most efficient way the oracle Z^{*}, and to that end \hat{Z} is our candidate estimator. The point of view we adopt to evaluate how far \hat{Z} is from Z^{*} comes from the learning theory literature. We therefore see \hat{Z} as an empirical risk minimization (ERM) procedure built on a single observation A, where the loss function is the linear one Z\in{\cal C}\to\ell_{Z}(A)=-\bigl{<}A,Z\bigr{>}, and the oracle Z^{*} is indeed the one minimizing the risk function Z\in{\cal C}\to\mathbb{E}\ell_{Z}(A) over {\cal C}. Having this setup in mind, we can use all the machinery developed in learning theory (see for instance [99, 63, 76, 97]) to obtain rates of convergence for the ERM (here \hat{Z}) toward the oracle (here Z^{*}).

There is one key quantity driving the rate of convergence of the ERM: a fixed point complexity parameter. This type of parameter carries all the statistical complexity of the problem and, even though it is usually easy to set up, its computation can be tedious, since it requires controlling, with large probability, the supremum of empirical processes indexed by "localized classes". We now define this complexity fixed point for the problem we are considering here.

Definition 1.

Let 0<\Delta<1. The fixed point complexity parameter at deviation 1-\Delta is

r^{*}(\Delta)=\inf\left(r>0:{\mathbb{P}}\left[\sup_{Z\in{\cal C}:\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r}\bigl{<}A-\mathbb{E}A,Z-Z^{*}\bigr{>}\leq(1/2)r\right]\geq 1-\Delta\right). \qquad (2.1)

Fixed point complexity parameters have been extensively used in learning theory since the introduction of the localization argument [76, 64, 97, 13]. When they can be computed, they are preferred to the (global) analysis developed by Chervonenkis and Vapnik [99] to study ERM, since the latter analysis always yields slower rates, given that the Vapnik-Chervonenkis bound is a high probability bound on the non-localized empirical process \sup_{Z\in{\cal C}}\bigl{<}A-\mathbb{E}A,Z-Z^{*}\bigr{>}, which is an upper bound for r^{*}(\Delta) since \{Z\in{\cal C}:\bigl{<}{\mathbb{E}}A,Z^{*}-Z\bigr{>}\leq r\}\subset{\cal C}. The gap between the global and local analyses can be important, since fast rates cannot be obtained using the VC approach, whereas the localization argument resulting in fixed points such as the one in Definition 1 may yield fast rates of convergence or even exact recovery results.

An example of a Vapnik-Chervonenkis type of analysis of SDP estimators can be found in [54] for the community detection problem. An improvement of the latter approach has been obtained in [41] thanks to a localization argument – even though it is not stated in these words (we elaborate more on the two approaches from [54, 41] in Section 3). In essence, a fixed point such as (2.1) is a sharp way to measure the statistical performance of ERM estimators, and in particular of the SDP estimators that we are considering here. They can even be proved to be optimal (in a minimax sense) when the noise A-\mathbb{E}A is Gaussian [69] and under mild conditions on the structure of {\cal C}.
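To make Definition 1 concrete, the following crude Monte Carlo sketch estimates the fixed point on a small toy model; the Gaussian noise, the constraint set, the grid over r and the sample size are all illustrative assumptions, and the inner supremum over the localized class is computed as a convex program.

```python
import cvxpy as cp
import numpy as np

# Toy model: EA = x* x*^T with two blocks, symmetric Gaussian noise, and
# C = {Z PSD, diag(Z) = 1}; all of these choices are illustrative.
n, sigma, Delta = 6, 0.5, 0.1
x_star = np.array([1.0] * 3 + [-1.0] * 3)
EA = np.outer(x_star, x_star)
Z_star = EA  # the oracle in this toy model

def sup_localized(noise, r):
    # sup of <A - EA, Z - Z*> over the localized class {Z in C: <EA, Z*-Z> <= r}
    Z = cp.Variable((n, n), symmetric=True)
    cons = [Z >> 0, cp.diag(Z) == 1, cp.trace(EA @ (Z_star - Z)) <= r]
    prob = cp.Problem(cp.Maximize(cp.trace(noise @ (Z - Z_star))), cons)
    prob.solve(solver=cp.SCS)
    return prob.value

rng = np.random.default_rng(0)
noises = [sigma * (G + G.T) / 2
          for G in (rng.standard_normal((n, n)) for _ in range(20))]
for r in [0.5, 1.0, 2.0, 4.0, 8.0]:              # coarse grid of radii
    if np.mean([sup_localized(N, r) <= r / 2 for N in noises]) >= 1 - Delta:
        print("Monte Carlo estimate: r*(Delta) <=", r)
        break
```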

Before stating our general result, we first recall a definition of a minimal structural assumption on the constraint set {\cal C}.

Definition 2.

We say that the set {\cal C} is star-shaped in Z^{*} when, for all Z\in{\cal C}, the segment [Z,Z^{*}] is in {\cal C}.

This is a fairly mild assumption, satisfied for instance when {\cal C} is convex, which is the setup we will always encounter in practical applications, given that SDP estimators are usually introduced after a "convex relaxation" argument. Our main general statistical bound on SDP estimators is as follows.

Theorem 1.

We assume that the constraint set {\cal C} is star-shaped in Z^{*}. Then, for all 0<\Delta<1, with probability at least 1-\Delta, it holds true that \bigl{<}\mathbb{E}A,Z^{*}-\hat{Z}\bigr{>}\leq r^{*}(\Delta).

Theorem 1 applies to any setup where an oracle Z^{*} is estimated by an estimator \hat{Z} such as in (1.3). Its result shows that \hat{Z} is almost a maximizer of the true objective function Z\to\bigl{<}{\mathbb{E}}A,Z\bigr{>} over {\cal C}, up to r^{*}(\Delta). In particular, when r^{*}(\Delta)=0, \hat{Z} is exactly a maximizer, just like Z^{*}, and in that case we can work with \hat{Z} as if we were working with Z^{*} without any loss.

Theorem 1 may be applied in many different settings; in the following, we study four such instances. We will apply Theorem 1 (or one of its corollaries stated below) to some popular problems in the networks and graph signal processing literatures, namely community detection [43] (we will mostly revisit the results in [54] and [41] from our perspective), signed clustering [32], group synchronization [90] and MAX-CUT.

The proof of Theorem 1 is straightforward (mostly because the loss function is linear). Its importance stems from the fact that it puts forward two important concepts originally introduced in learning theory, namely that the complexity of the problem comes from that of the local subset {\cal C}\cap\{Z:\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r^{*}(\Delta)\}, and that the "radius" r^{*}(\Delta) of the localization is the solution of a fixed point equation. For a setup given by a random matrix A and a constraint set {\cal C}, we should try to understand how these two ideas apply to obtain estimation properties of SDP estimators such as \hat{Z}. That is, we must understand the shape of the local subsets {\cal C}\cap\{Z:\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r\}, r>0, and the maximal oscillations of the empirical process Z\to\bigl{<}A-\mathbb{E}A,Z-Z^{*}\bigr{>} indexed by these local subsets. We will consider this task in three distinct problem instances. We now provide a proof of Theorem 1.

Proof of Theorem 1. Denote r^{*}=r^{*}(\Delta). Assume first that r^{*}>0 (the case r^{*}=0 is analyzed later). Let \Omega^{*} be the event on which, for all Z\in{\cal C}, if \bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r^{*} then \bigl{<}A-\mathbb{E}A,Z-Z^{*}\bigr{>}\leq(1/2)r^{*}. By definition of r^{*}, we have {\mathbb{P}}[\Omega^{*}]\geq 1-\Delta.

Let Z\in{\cal C} be such that \bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}>r^{*} and define Z^{\prime} such that Z^{\prime}-Z^{*}=\left(r^{*}/\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\right)(Z-Z^{*}). We have \bigl{<}\mathbb{E}A,Z^{*}-Z^{\prime}\bigr{>}=r^{*} and Z^{\prime}\in{\cal C} because {\cal C} is star-shaped in Z^{*}. Therefore, on the event \Omega^{*}, \bigl{<}A-\mathbb{E}A,Z^{\prime}-Z^{*}\bigr{>}\leq(1/2)r^{*} and so \bigl{<}A-\mathbb{E}A,Z-Z^{*}\bigr{>}\leq(1/2)\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}. It therefore follows that on the event \Omega^{*}, if Z\in{\cal C} is such that \bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}>r^{*}, then

\bigl{<}A,Z-Z^{*}\bigr{>}\leq(-1/2)\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}<-r^{*}/2

which implies that \bigl{<}A,Z-Z^{*}\bigr{>}<0, and therefore Z does not maximize Z\to\bigl{<}A,Z\bigr{>} over {\cal C}. As a consequence, we necessarily have \bigl{<}\mathbb{E}A,Z^{*}-\hat{Z}\bigr{>}\leq r^{*} on the event \Omega^{*} (which holds with probability at least 1-\Delta).

Let us now assume that r^{*}=0. There exists a decreasing sequence (r_{n})_{n} of positive real numbers tending to r^{*}=0 such that, for all n\geq 0, {\mathbb{P}}[\Omega_{n}]\geq 1-\Delta, where \Omega_{n} is the event \Omega_{n}=\left\{\psi(r_{n})\leq 1/2\right\} and where, for all r>0,

\psi(r)=\frac{1}{r}\sup_{Z\in{\cal C}:\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r}\bigl{<}A-\mathbb{E}A,Z-Z^{*}\bigr{>}.

Since {\cal C} is star-shaped in Z^{*}, \psi is a non-increasing function, and so (\Omega_{n})_{n} is a decreasing sequence (i.e. \Omega_{n+1}\subset\Omega_{n} for all n\geq 0). It follows that {\mathbb{P}}[\cap_{n}\Omega_{n}]=\lim_{n}{\mathbb{P}}[\Omega_{n}]\geq 1-\Delta. Let us now place ourselves on the event \cap_{n}\Omega_{n}. For all n, since \Omega_{n} holds and r_{n}>0, we can use the same argument as in the first case to conclude that \bigl{<}\mathbb{E}A,Z^{*}-\hat{Z}\bigr{>}\leq r_{n}. Since the latter inequality is true for all n (on the event \cap_{n}\Omega_{n}) and (r_{n})_{n} tends to zero, we conclude that \bigl{<}\mathbb{E}A,Z^{*}-\hat{Z}\bigr{>}\leq 0=r^{*}.

The main conclusion of Theorem 1 is that all the information for the problem of estimating Z^{*} via \hat{Z} is contained in the fixed point r^{*}(\Delta). We therefore have to compute, or upper bound, such a fixed point. This might be difficult in full generality, but there are some tools that can help to find upper bounds on r^{*}(\Delta).

A first approach is to understand the shape of the local sets {\cal C}\cap\{Z:\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r\}, r>0, and to that end it is helpful to characterize the curvature of the excess risk Z\to\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>} around its maximizer Z^{*}. This type of local characterization of the excess risk is also a tool used in learning theory, going back to classical conditions such as the margin assumption [96, 75] or the Bernstein condition [11]. The latter condition was initially introduced as an upper bound on the variance term by its expectation: for all Z\in{\cal C}, \mathbb{E}(\ell_{A}(Z)-\ell_{A}(Z^{*}))^{2}\leq c_{0}\mathbb{E}(\ell_{A}(Z)-\ell_{A}(Z^{*})) for some absolute constant c_{0}, but it is now better understood as a way to discriminate the oracle from the other points in the model {\cal C}. These assumptions were global, in the sense that they concern all Z in {\cal C}. It has recently been shown [26] that only the local curvature of the excess risk needs to be understood. We now introduce this tool in our setup.

We characterize the local curvature of the excess risk by some function G:{\mathbb{R}}^{n\times n}\to{\mathbb{R}}. Most of the time, the G function is a norm, like the \ell_{1}-norm, or a power of a norm, such as the square of the \ell_{2}-norm. The radius defining the local subset on which we need to understand the curvature of the excess risk is also the solution of a fixed point equation:

r^{*}_{G}(\Delta)=\inf\left(r>0:{\mathbb{P}}\left[\sup_{Z\in{\cal C}:G(Z^{*}-Z)\leq r}\bigl{<}A-\mathbb{E}A,Z-Z^{*}\bigr{>}\leq(1/2)r\right]\geq 1-\Delta\right). \qquad (2.2)

The difference between the two fixed points r^{*}(\Delta) and r^{*}_{G}(\Delta) is that the local subsets are not defined using the same proximity function to the oracle Z^{*}: the first one uses the excess risk as a proximity function, while the second one uses the G function. The G function should play the role of a simple description of the curvature of the excess risk function locally around Z^{*}; this is formalized in the next assumption.

Assumption 1.

For all Z\in{\cal C}, if \bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r^{*}_{G}(\Delta) then \bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\geq G(Z^{*}-Z).

Typical examples of curvature functions G have the form G(Z^{*}-Z)=\theta\left\|Z^{*}-Z\right\|^{\kappa} for some \kappa\geq 1, \theta>0 and some norm \left\|\cdot\right\|. In that case, the parameter \kappa was initially called the margin parameter [95, 75]. Even though the relation given in Assumption 1 has typically been referred to as a margin condition or Bernstein condition in the learning theory literature, we will rather call it a local curvature assumption, following [54] and [26], since this type of relation describes the behavior of the risk function locally around its oracle. The main advantage of finding a local curvature function G is that r^{*}_{G}(\Delta) should be easier to compute than r^{*}(\Delta), and r^{*}(\Delta)\leq r^{*}_{G}(\Delta) because of the definition of r_{G}^{*}(\Delta) and the inclusion \{Z\in{\cal C}:\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r_{G}^{*}(\Delta)\}\subset\{Z\in{\cal C}:G(Z^{*}-Z)\leq r_{G}^{*}(\Delta)\} (thanks to Assumption 1). We can therefore state the following corollary.

Corollary 1.

We assume that the constraint set {\cal C} is star-shaped in Z^{*} and that the "local curvature" Assumption 1 holds for some 0<\Delta<1. Then, with probability at least 1-\Delta, it holds true that

r^{*}_{G}(\Delta)\geq\bigl{<}\mathbb{E}A,Z^{*}-\hat{Z}\bigr{>}\geq G(Z^{*}-\hat{Z}).

When it is possible to describe the local curvature of the excess risk around its oracle by some G function, and when some estimate of r^{*}_{G}(\Delta) can be obtained, Corollary 1 applies, and estimation results for Z^{*} by \hat{Z} (w.r.t. both the "excess risk" metric \bigl{<}\mathbb{E}A,Z^{*}-\hat{Z}\bigr{>} and the G metric) follow. If not – either because understanding the local curvature of the excess risk or computing r^{*}_{G}(\Delta) is difficult – it is still possible to apply Theorem 1 with the global VC approach, which boils down to simply upper bounding the fixed point r^{*}(\Delta) used in Theorem 1 by a global parameter that is a complexity measure of the entire set {\cal C}:

r^{*}(\Delta)\leq\inf\left(r>0:{\mathbb{P}}\left[\sup_{Z\in{\cal C}}\bigl{<}A-\mathbb{E}A,Z-Z^{*}\bigr{>}\leq(1/2)r\right]\geq 1-\Delta\right). \qquad (2.3)

Interestingly, if this last-resort approach is used then, following the approach from [54], Grothendieck's inequality [51, 85] appears to be a powerful tool to upper bound the right-hand side of (2.3), both in the community detection problem, such as in [53], and in the MAX-CUT problem. Of course, when it is possible to avoid this global approach, one should do so, because the local approach will always provide better results.

Finally, proving a "local curvature" property such as Assumption 1 may be difficult, because it requires understanding the shape of the local subsets {\cal C}\cap\{Z:\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r\}, r>0. It is however possible to simplify this assumption if estimation results for Z^{*} only w.r.t. the G function (and not necessarily an upper bound on the excess risk \bigl{<}\mathbb{E}A,Z^{*}-\hat{Z}\bigr{>}) are enough. In that case, Assumption 1 may be replaced by the following one.

Assumption 2.

For all Z\in{\cal C}, if G(Z^{*}-Z)\leq r^{*}_{G}(\Delta) then \bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\geq G(Z^{*}-Z).

Assumption 2 grants a curvature of the excess risk function in a G-neighborhood of Z^{*}, unlike Assumption 1, which grants this curvature in an "excess risk neighborhood". The shape of a neighborhood defined by the G function may be easier to understand (for instance, when G is a norm, a neighborhood defined by G is the ball of that norm centered at Z^{*} with radius r_{G}^{*}(\Delta)). In general, the latter assumption and Assumption 1 are not comparable. If Assumption 2 holds, then \hat{Z} can estimate Z^{*} w.r.t. the G function.

Theorem 2.

We assume that the constraint set {\cal C} is star-shaped in Z^{*} and that the "local curvature" Assumption 2 holds for some 0<\Delta<1. We also assume that the G function is continuous, that G(0)=0 and that G(\lambda(Z^{*}-Z))\leq\lambda G(Z^{*}-Z) for all \lambda\in[0,1] and Z\in{\cal C}. Then, with probability at least 1-\Delta, it holds true that G(Z^{*}-\hat{Z})\leq r^{*}_{G}(\Delta).

Proof of Theorem 2. Let r^{*}=r^{*}_{G}(\Delta). First assume that r^{*}>0. Let Z\in{\cal C} be such that G(Z^{*}-Z)>r^{*}. Let f:\lambda\in[0,1]\to G(\lambda(Z^{*}-Z)). We have f(0)=G(0)=0, f(1)=G(Z^{*}-Z)>r^{*}, and f is continuous. Therefore, there exists \lambda_{0}\in(0,1) such that f(\lambda_{0})=r^{*}. We let Z^{\prime} be such that Z^{\prime}-Z^{*}=\lambda_{0}(Z-Z^{*}). Since {\cal C} is star-shaped in Z^{*} and \lambda_{0}\in[0,1], we have Z^{\prime}\in{\cal C}. Moreover, G(Z^{*}-Z^{\prime})=r^{*}. As a consequence, on the event \Omega^{*} on which, for all Z\in{\cal C}, G(Z^{*}-Z)\leq r^{*} implies \bigl{<}A-{\mathbb{E}}A,Z-Z^{*}\bigr{>}\leq(1/2)r^{*}, we have \bigl{<}A-{\mathbb{E}}A,Z^{\prime}-Z^{*}\bigr{>}\leq(1/2)r^{*}. The latter and Assumption 2 imply that, on \Omega^{*},

(1/2)r^{*}\geq\bigl{<}A,Z^{\prime}-Z^{*}\bigr{>}+\bigl{<}{\mathbb{E}}A,Z^{*}-Z^{\prime}\bigr{>}\geq\bigl{<}A,Z^{\prime}-Z^{*}\bigr{>}+G(Z^{*}-Z^{\prime})\geq\bigl{<}A,Z^{\prime}-Z^{*}\bigr{>}+r^{*}

and so \bigl{<}A,Z^{\prime}-Z^{*}\bigr{>}\leq-r^{*}/2. Finally, using the definition of Z^{\prime}, we obtain

\bigl{<}A,Z-Z^{*}\bigr{>}=(1/\lambda_{0})\bigl{<}A,Z^{\prime}-Z^{*}\bigr{>}\leq-r^{*}/(2\lambda_{0})<0.

In particular, Z cannot be a maximizer of Z\to\bigl{<}A,Z\bigr{>} over {\cal C}, and so necessarily, on the event \Omega^{*}, G(Z^{*}-\hat{Z})\leq r^{*}.

Let us now consider the case where r^{*}=0. Using the same approach as in the proof of Theorem 1, we only have to check that the function

\psi:r>0\to\frac{1}{r}\sup_{Z\in{\cal C}:G(Z^{*}-Z)\leq r}\bigl{<}A-\mathbb{E}A,Z-Z^{*}\bigr{>}

is non-increasing. Let 0<r_{1}<r_{2}. W.l.o.g. we may assume that there exists some Z_{2}\in{\cal C} such that G(Z^{*}-Z_{2})\leq r_{2} and \psi(r_{2})=\bigl{<}A-\mathbb{E}A,Z_{2}-Z^{*}\bigr{>}/r_{2}. If G(Z^{*}-Z_{2})\leq r_{1}, then \psi(r_{2})\leq(r_{1}/r_{2})\psi(r_{1})\leq\psi(r_{1}). If G(Z^{*}-Z_{2})>r_{1}, then there exists \lambda_{0}\in(0,1) such that, for Z_{1}=Z^{*}+\lambda_{0}(Z_{2}-Z^{*}), we have G(Z^{*}-Z_{1})=r_{1} and Z_{1}\in{\cal C}. Moreover, r_{1}=G(\lambda_{0}(Z^{*}-Z_{2}))\leq\lambda_{0}G(Z^{*}-Z_{2})\leq\lambda_{0}r_{2}, and so \lambda_{0}\geq r_{1}/r_{2}. It follows that

\psi(r_{2})=\frac{1}{r_{2}}\bigl{<}A-\mathbb{E}A,Z_{2}-Z^{*}\bigr{>}=\frac{1}{\lambda_{0}r_{2}}\bigl{<}A-\mathbb{E}A,Z_{1}-Z^{*}\bigr{>}\leq\frac{r_{1}}{\lambda_{0}r_{2}}\psi(r_{1})\leq\psi(r_{1})

where we used that \psi(r)\geq 0 for all r>0, because Z^{*}\in\{Z\in{\cal C}:G(Z^{*}-Z)\leq r\} for all r>0.

As a consequence, Theorem 1, Corollary 1 and Theorem 2 are the three tools at our disposal to study the performance of SDP estimators, depending on the depth of understanding we have of the problem. The best approach is given by Theorem 1, when it is possible to compute the complexity fixed point r^{*}(\Delta) efficiently. If the latter approach is too complicated (likely because understanding the geometry of the local subsets {\cal C}\cap\{Z:\bigl{<}{\mathbb{E}}A,Z^{*}-Z\bigr{>}\leq r\}, r>0, may be difficult), then one may resort to finding a curvature function G of the excess risk locally around Z^{*}. In that case, both Corollary 1 and Theorem 2 may apply, depending on the hardness of finding a local curvature function G on an "excess risk neighborhood" (see Assumption 1) or on a "G-neighborhood" (see Assumption 2). Finally, if no local approach can be handled (likely because describing the curvature of the excess risk in any neighborhood of Z^{*}, or controlling the maximal oscillations of the empirical process Z\to\bigl{<}{\mathbb{E}}A-A,Z^{*}-Z\bigr{>} locally, are too difficult), then one may ultimately resort to a global approach, which follows from Theorem 1 as explained in (2.3). In the following, we will use these tools for various problems.

Results like Theorem 1, Corollary 1 and Theorem 2 have appeared in many papers on ERM in learning theory, such as [64, 11, 76, 69]. In all these results, typical loss functions such as the quadratic or logistic loss functions were not linear, unlike the one we are using here. From that point of view, our problem is easier, as can be seen from the simplicity of the proofs of the three general results of this section. What is much more complicated here than in other, more classical, problems in learning theory is the computation of the fixed point, because 1) the stochastic process Z\to\bigl{<}A-{\mathbb{E}}A,Z-Z^{*}\bigr{>} may be far from being a Gaussian process if the noise matrix A-{\mathbb{E}}A is complicated, and 2) the local sets \{Z\in{\cal C}:\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r\} or \{Z\in{\cal C}:G(Z^{*}-Z)\leq r\}, for r>0, may be very hard to describe in a simple way. Fortunately, we will rely on several previous papers, such as [41], to solve such problems.

3 Revisiting two results from the community detection literature [41, 54]

The rapid growth of social networks on the Internet has led many statisticians and computer scientists to focus their research on data coming from graphs. One important topic that has attracted particular interest during the last decades is that of community detection [43, 86], where the goal is to recover mesoscopic structures in a network, the so-called communities. A community consists of a group of nodes that are relatively densely connected to each other, but sparsely connected to other dense groups present within the network. The motivation for this line of work stems not only from the fact that finding communities in a network is an interesting and challenging problem of its own, as it leads to understanding structural properties of networks, but also from the fact that community detection is used as a data pre-processing step for other statistical inference tasks on large graphs, as it facilitates parallelization and allows one to distribute time consuming processes on several smaller subgraphs (i.e., the extracted communities).

One challenging aspect of the community detection problem arises in the setting of sparse graphs. Many of the existing algorithms which enjoy theoretical guarantees do so in the relatively dense regime for the edge sampling probability, where the expected average degree is of the order \Theta(\log n). The problem becomes challenging in very sparse graphs with bounded average degree. To this end, Guédon and Vershynin proposed a semidefinite relaxation for a discrete optimization problem [54], an instance of which encompasses the community detection problem, and showed that it can recover a solution with any given relative accuracy even in the setting of very sparse graphs with average degree of order O(1).

A subset of the existing literature for community detection and clustering relies on spectral methods, which consider the adjacency matrix associated with a graph, and employ its eigenvalues, and especially eigenvectors, in the analysis process or to propose efficient algorithms to solve the task at hand. Along these lines, Can et al. [68] proposed a general framework for optimizing a general function of the graph adjacency matrix over discrete label assignments by projecting onto a low-dimensional subspace spanned by vectors that approximate the top eigenvectors of the expected adjacency matrix. The authors consider the problem of community detection with k=2 communities, which they frame as an instance of their proposed framework, combined with a regularization step that shifts each entry in the adjacency matrix by a small constant \tau, which renders their methodology applicable in the sparse regime as well.

In the remainder of this section, we focus on the community detection problem on random graphs under the general stochastic block model. We will mostly revisit the work in [54] and [41] from the perspective given by Theorem 1. This simplifies the proof in [41], since the peeling argument is no longer required and, unlike in [41], the upper bound from [54] is not needed either, thanks to the homogeneity argument hidden in Theorem 1 (which underlies the localization argument).

We first recall the definition of the generalized stochastic block model (SBM). We consider a set of vertices V=\{1,\ldots,n\}, and assume it is partitioned into K communities {\cal C}_{1},\ldots,{\cal C}_{K} of arbitrary sizes |{\cal C}_{1}|=l_{1},\ldots,|{\cal C}_{K}|=l_{K}.

Definition 3.

For any pair of nodes i,j\in V, we write i\sim j when i and j belong to the same community (i.e., there exists k\in\{1,\ldots,K\} such that i,j\in{\cal C}_{k}), and we write i\not\sim j when i and j do not belong to the same community.

For each pair (i,j) of nodes from V, we draw an edge between i and j with a fixed probability p_{ij}, independently of the other edges. We assume that there exist numbers p and q satisfying 0<q<p<1, such that

\left\{\begin{array}{ll}p_{ij}>p,&\mbox{ if }i\sim j\mbox{ and }i\neq j,\\ p_{ij}=1,&\mbox{ if }i=j,\\ p_{ij}<q,&\mbox{ otherwise.}\end{array}\right. \qquad (3.1)

We denote by A=(A_{ij})_{1\leq i,j\leq n} the observed symmetric adjacency matrix, such that, for all 1\leq i\leq j\leq n, A_{ij} is distributed according to a Bernoulli of parameter p_{ij}. The community structure of such a graph is captured by the membership matrix \bar{Z}\in\mathbb{R}^{n\times n}, defined by \bar{Z}_{ij}=1 if i\sim j, and \bar{Z}_{ij}=0 otherwise. The main goal in community detection is to reconstruct \bar{Z} from the observation A.

Spectral methods for community detection are very popular in the literature [54, 41, 100, 15, 29]. There are many ways to introduce such methods, one of which is via convex relaxations of certain graph cut problems aiming to minimize a modularity function such as the RatioCut [80]. Such relaxations often lead to SDP estimators, such as those introduced in Section 1.

Considering a random graph distributed according to the generalized stochastic block model, with associated adjacency matrix A (i.e. A=A^{\top} and A_{ij}\sim{\rm Bern}(p_{ij}) for 1\leq i\leq j\leq n, with p_{ij} as defined in (3.1)), we estimate its membership matrix \bar{Z} via the following SDP estimator

\hat{Z}\in\operatorname*{argmax}_{Z\in\mathcal{C}}\bigl{<}A,Z\bigr{>},

where \mathcal{C}=\{Z\in\mathbb{R}^{n\times n}:Z\succeq 0,Z\geq 0,{\rm diag}(Z)\preceq I_{n},\sum_{i,j=1}^{n}Z_{ij}\leq\lambda\} and \lambda=\sum_{i,j=1}^{n}\bar{Z}_{ij}=\sum_{k=1}^{K}|\mathcal{C}_{k}|^{2} denotes the number of nonzero elements in the membership matrix \bar{Z}. The motivation for this approach stems from the fact that the membership matrix \bar{Z} is actually the oracle, i.e., Z^{*}=\bar{Z} (see Lemma 7.1 in [54] or Lemma 1 below), where

Z^{*}\in\operatorname*{argmax}_{Z\in\mathcal{C}}\bigl{<}\mathbb{E}A,Z\bigr{>}.
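As an illustration, the following sketch samples a small two-community SBM and solves the SDP above with cvxpy; the instance sizes, the probabilities and the solver are illustrative assumptions, and \lambda is computed from the ground truth, which is assumed known, as in the text.

```python
import cvxpy as cp
import numpy as np

# Sample a two-community SBM (p within, q across, A_ii = 1 as in the model).
rng = np.random.default_rng(0)
n, p, q = 20, 0.8, 0.2
labels = np.array([0] * 10 + [1] * 10)
same = labels[:, None] == labels[None, :]
U = np.triu(rng.random((n, n)) < np.where(same, p, q), 1).astype(float)
A = U + U.T + np.eye(n)

lam = float(same.sum())  # lambda = sum of the entries of Z_bar, assumed known

# SDP over C = {Z PSD, Z >= 0 entrywise, diag(Z) <= 1, sum(Z) <= lambda}.
Z = cp.Variable((n, n), symmetric=True)
cons = [Z >> 0, Z >= 0, cp.diag(Z) <= 1, cp.sum(Z) <= lam]
cp.Problem(cp.Maximize(cp.trace(A @ Z)), cons).solve(solver=cp.SCS)

Z_bar = same.astype(float)
print(np.abs(Z.value - Z_bar).sum() / n ** 2)  # small when recovery succeeds
```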

Following the strategy of Theorem 1, and from our point of view, the upper bound on r^{*}(\Delta) from [54] is based on the global approach – that is, without localization. Indeed, [54] uses the observation that, for all r>0,

\sup_{Z\in{\cal C}:\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r}\bigl{<}A-{\mathbb{E}}A,Z-Z^{*}\bigr{>}\overset{(a)}{\leq}\sup_{Z\in{\cal C}}\bigl{<}A-{\mathbb{E}}A,Z-Z^{*}\bigr{>}\overset{(b)}{\leq}2K_{G}\left\|A-{\mathbb{E}}A\right\|_{\mathrm{cut}}, \qquad (3.2)

where \left\|\cdot\right\|_{\rm cut} is the cut-norm and K_{G} is the Grothendieck constant (Grothendieck's inequality is used in (b), see [85, 100]). The cut-norm \left\|A\right\|_{\rm cut} of a real matrix A=(a_{ij})_{i\in R,j\in C}, with rows indexed by R and columns indexed by C, is the maximum, over all I\subset R and J\subset C, of the quantity |\sum_{i\in I,j\in J}a_{ij}|; it is also the operator norm of A from \ell_{\infty} to \ell_{1}, and the "injective norm" in the original Grothendieck "résumé" [52, 85]. Therefore, the localization around the oracle Z^{*} by the excess risk "band" B^{*}_{r}:=\{Z:\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\leq r\} is simply removed in inequality (a). As a consequence, the resulting statistical bound is based on the complexity of the entire class {\cal C} whereas, in a localized approach, only the complexity of {\cal C}\cap B^{*}_{r} matters. The next step in the proof of [54] is a high probability upper bound on \left\|A-{\mathbb{E}}A\right\|_{\rm cut}, which follows from Bernstein's inequality and a union bound, since \left\|A-{\mathbb{E}}A\right\|_{\rm cut}=\max_{x,y\in\{-1,1\}^{n}}\bigl{<}A-{\mathbb{E}}A,xy^{\top}\bigr{>}: for all t>0, \left\|A-\mathbb{E}A\right\|_{\rm cut}\leq tn(n-1)/2 with probability at least 1-\exp\left(2n\log 2-(n(n-1)t^{2})/(16\bar{p}+8t/3)\right), where \bar{p}\overset{\rm def}{=}2/[n(n-1)]\sum_{i<j}p_{ij}(1-p_{ij}). The resulting upper bound on the fixed point obtained in [54] is

r^{*}(\Delta)\leq(8/3)K_{G}(2n\log(2)+\log(1/\Delta)). \qquad (3.3)

Finally, under the assumptions of Theorem 1 in [54] (i.e., for some \epsilon\in(0,1), n\geq 5\cdot 10^{4}/\epsilon^{2}, \max(p(1-p),q(1-q))\geq 20/n, p=a/n>b/n=q and (a-b)^{2}\geq 2\cdot 10^{4}\epsilon^{-2}(a+b)), for \Delta=e^{3}5^{-n} we obtain (using the general result in Theorem 1), with probability at least 1-\Delta, the bound \bigl{<}\mathbb{E}A,Z^{*}-\hat{Z}\bigr{>}\leq r^{*}(\Delta)\leq\epsilon n^{2}=\epsilon\left\|Z^{*}\right\|_{2}^{2}, which is the result of Theorem 1 in [54]. Finally, [54] uses a (global) curvature property of the excess risk in its Lemma 7.2:

Lemma 1 (Lemma 7.2 in [54]).

For all Z\in\mathcal{C}, \bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\geq[(p-q)/2]\left\|Z^{*}-Z\right\|_{1}.

Therefore, a (global – that is, for all Z\in{\cal C}) curvature assumption holds for the community detection problem, with a G function which is here the \ell_{1}^{n\times n} norm, a margin parameter \kappa=1 and \theta=(p-q)/2. However, this curvature property is not used to compute a "better" fixed point parameter, but only to derive an \ell_{1}^{n\times n} estimation bound, since

\left\|\hat{Z}-Z^{*}\right\|_{1}\leq\left(\frac{2}{p-q}\right)\bigl{<}\mathbb{E}A,Z^{*}-\hat{Z}\bigr{>}\leq\frac{16K_{G}(2n\log(2)+\log(1/\Delta))}{3(p-q)}.

The latter bound, together with the sin-theta theorem, allows the authors of [54] to obtain an estimation bound for the community membership vector x^{*}.

The approach from [41] improves upon the one in [54] because it uses a localization argument: the curvature property of the excess risk function from Lemma 1 is used to improve the upper bound in (3.3) obtained via the global approach. Indeed, the authors of [41] obtain a high probability upper bound on the quantity

\sup_{Z\in{\cal C}:\left\|Z^{*}-Z\right\|_{1}\leq r}\bigl{<}A-{\mathbb{E}}A,Z-Z^{*}\bigr{>}

depending on r. This yields exact reconstruction results in the "dense" case and exponentially decaying rates of convergence in the "sparse" case. This is a typical example where the localization argument shows its advantage over the global approach. The price to pay is usually a more technical proof for the local approach compared with the global one. However, the argument from [41] also uses an unnecessary peeling argument, together with an unnecessary a priori upper bound on \left\|\hat{Z}-Z^{*}\right\|_{1} (which is actually the one from [54]). It turns out that this peeling argument and this a priori upper bound on \left\|\hat{Z}-Z^{*}\right\|_{1} can be avoided thanks to our approach from Theorem 1. This improves the probability estimate and simplifies the proofs (since neither the result from [54] nor the peeling argument is required anymore). For the signed clustering problem that we consider below as an application of our main results, we will mostly adapt the probabilistic tools from [41] (in the "dense" case) to the methodology associated with Theorem 1 (without these two unnecessary arguments).

4 Contributions of the paper

4.1 Application to signed clustering

Much of the clustering literature, including both spectral and non-spectral methods, has focused on unsigned graphs, where each edge carries a non-negative scalar weight that encodes a measure of affinity (similarity, trust) between pairs of nodes. However, in numerous instances, the above-mentioned affinity takes negative values, and encodes a measure of dissimilarity or distrust. Such applications arise in social networks where users' relationships denote trust-distrust or friendship-enmity, shopping bipartite networks which capture like-dislike relationships between users and products [10], online news and review websites, such as Epinions [1] and Slashdot [2], that allow users to approve or denounce others [71], and clustering financial or economic time series data [4]. Such applications have spurred interest in the analysis of signed networks, which has recently become an increasingly important research topic [72], with relevant lines of work in the context of clustering signed networks including, in chronological order, [65, 24, 32].

The second application of our proposed methodology is an extension of the community detection and clustering problems to the setting of signed graphs, where, for simplicity, we assume that an edge connecting two nodes can take either -1 or +1 values.

4.1.1 A Signed Stochastic Block Model (SSBM)

We focus on the problem of clustering K-weakly balanced graphs (a signed graph is K-weakly balanced if and only if either all its edges are positive, or its nodes can be partitioned into K\in\mathbb{N} disjoint sets such that positive edges exist only within clusters, and negative edges are only present across clusters [37]). We consider a signed stochastic block model (SSBM) similar to the one introduced in [32], where we are given a graph G with n nodes \{1,\ldots,n\} which are divided into K communities, \{{\cal C}_{1},\ldots,{\cal C}_{K}\}, such that, in the noiseless setting, edges within each community are positive and edges between communities are negative.

The only information available to the user is given by an n\times n sparse adjacency matrix A constructed as follows: A is symmetric, with A_{ii}=1 for all i=1,\ldots,n, and, for all 1\leq i<j\leq n, A_{ij}=s_{ij}(2B_{ij}-1), where

B_{ij}\sim\left\{\begin{array}{ll}{\rm Bern}(p)&\mbox{ if }i\sim j\\ {\rm Bern}(q)&\mbox{ if }i\not\sim j\end{array}\right.\;\;\mbox{ and }\;\;s_{ij}\sim{\rm Bern}(\delta),

for some 0\leq q<1/2<p\leq 1 and \delta\in(0,1). Moreover, all the variables B_{ij},s_{ij} for 1\leq i<j\leq n are independent.
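For concreteness, a minimal sampler for this SSBM (the function name, the parameter values and the seed are our own illustrative choices):

```python
import numpy as np

# Sampler for the SSBM above: B_ij ~ Bern(p) if i ~ j, Bern(q) otherwise,
# s_ij ~ Bern(delta), and A_ij = s_ij (2 B_ij - 1) in {-1, 0, +1}, A_ii = 1.
def sample_ssbm(labels, p, q, delta, seed=0):
    rng = np.random.default_rng(seed)
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    B = (rng.random((n, n)) < np.where(same, p, q)).astype(int)
    s = (rng.random((n, n)) < delta).astype(int)
    upper = np.triu(s * (2 * B - 1), 1)   # keep only the i < j entries
    return upper + upper.T + np.eye(n, dtype=int)

labels = np.array([0] * 10 + [1] * 10)
A = sample_ssbm(labels, p=0.9, q=0.1, delta=0.5)
```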

We remark that this SSBM model is similar to the one considered in [32], which was governed by two parameters: the sampling probability \delta, as above, and a noise level \eta, which may flip entries of the adjacency matrix.

Our aim is to recover the community membership matrix, or cluster matrix, \bar{Z}=(\bar{Z}_{ij})_{n\times n}, where \bar{Z}_{ij}=1 when i\sim j and \bar{Z}_{ij}=0 when i\not\sim j, using only the observed censored adjacency matrix A.

Our approach is similar in nature to the one used by spectral methods in community detection. We first observe that, for \alpha:=\delta(p+q-1) and J=(1)_{n\times n}, we have \bar{Z}=Z^{*}, where

Z^{*}\in\operatorname*{argmax}_{Z\in{\cal C}}\bigl{<}\mathbb{E}A-\alpha J,Z\bigr{>} \qquad (4.1)

and {\cal C}=\{Z\in{\mathbb{R}}^{n\times n}:Z\succeq 0,Z_{ij}\in[0,1],Z_{ii}=1,i=1,\ldots,n\}. The proof of (4.1) is recalled in Section 8.

Since we do not know {\mathbb{E}}A and \alpha, we should estimate both of them. We will estimate {\mathbb{E}}A with A but, for simplicity, we will assume that \alpha is known. The resulting estimator of the cluster matrix \bar{Z} is

\hat{Z}\in\operatorname*{argmax}_{Z\in{\cal C}}\bigl{<}A-\alpha J,Z\bigr{>} \qquad (4.2)

which is indeed a SDP estimator; therefore, Theorem 1 (or Corollary 1 and Theorem 2) may be used to obtain statistical bounds for the estimation of Z^{*} from (4.1) by \hat{Z}.
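Here is a sketch of the estimator (4.2) on data drawn from the sampler above; the parameter values and the cvxpy/SCS solver are illustrative assumptions, and \alpha is computed from the known model parameters, as assumed in the text.

```python
import cvxpy as cp
import numpy as np

# Solve (4.2) over C = {Z PSD, Z_ij in [0,1], Z_ii = 1}; A and labels come
# from the SSBM sampler sketched above.
p, q, delta = 0.9, 0.1, 0.5
alpha = delta * (p + q - 1)
n = A.shape[0]
J = np.ones((n, n))

Z = cp.Variable((n, n), symmetric=True)
cons = [Z >> 0, Z >= 0, Z <= 1, cp.diag(Z) == 1]
cp.Problem(cp.Maximize(cp.trace((A - alpha * J) @ Z)),
           cons).solve(solver=cp.SCS)

Z_bar = (labels[:, None] == labels[None, :]).astype(float)
print(np.abs(Z.value - Z_bar).max())  # near 0 when exact recovery holds
```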

We will use the following notation: s:=\delta(p-q)^{2}, \theta:=\delta(p-q), \rho:=\delta\max\{1-\delta(2p-1)^{2},1-\delta(2q-1)^{2}\}, \nu:=\max\{2p-1,1-2q\}, [m]:=\{1,\ldots,m\} for all m\in{\mathbb{N}}, l_{k}:=|{\cal C}_{k}| for all k\in[K], \lambda^{2}:=\sum_{k=1}^{K}l_{k}^{2}, {\cal C}^{+}:=\cup_{k=1}^{K}({\cal C}_{k}\times{\cal C}_{k}) and {\cal C}^{-}:=\cup_{k\neq k^{\prime}}({\cal C}_{k}\times{\cal C}_{k^{\prime}}). We also use c_{0},c_{1},\ldots to denote absolute constants whose values may change from one line to another.

4.1.2 Main result for the estimation of the cluster matrix in signed clustering

Our main result concerns the reconstruction of the K communities from the observation of the matrix A. In order to avoid solutions with communities of degenerate size (too small or too large), we consider the following assumption.

Assumption 3.

The elements of the partition {\cal C}_{1}\sqcup\cdots\sqcup{\cal C}_{K} of \{1,\ldots,n\} are of the same size, up to constants: there exist absolute constants c_{0},c_{1}>0 such that, for any k\in[K], n/(c_{1}K)\leq|{\cal C}_{k}|=l_{k}\leq c_{0}n/K.

We are now ready to state the main result on the estimation of the cluster matrix ZZ^{*} from (4.1) by the SDP estimator Z^\hat{Z} from (4.2).

Theorem 3.

There is an absolute positive constant c0c_{0} such that the following holds. Grant Assumption 3. Assume that

n\nu\delta\geq\log n, \qquad (4.3)
(p-q)^{2}n\delta\geq c_{0}K^{2}\nu, \qquad (4.4)
\frac{K\log(2eKn)}{n}\leq\max\left(\frac{\theta^{2}}{\rho},\frac{9\rho}{32}\right). \qquad (4.5)

Then, with probability at least 1-\exp(-\delta\nu n)-3/(2eKn), exact recovery holds, i.e., \hat{Z}=Z^{*}.

Therefore, we have exact reconstruction in the dense case, that is, under assumption (4.3), provided conditions (4.4) and (4.5) hold. Condition (4.4) is in the same spirit as the one in Theorem 1 of [41]: it measures the SNR (signal-to-noise ratio) of the model, which captures the hardness of the SSBM. As mentioned in [41], it is related to the Kesten-Stigum threshold [78]. The last condition (4.5) basically requires that the number of clusters K is at most n/\log n. If this condition is dropped, then we no longer have exact reconstruction, but only a certified rate of convergence: with probability at least 1-\exp(-\delta\nu n)-3/(2eKn),

\left\|Z^{*}-\hat{Z}\right\|_{1}\leq\frac{n}{c_{1}\delta(p-q)K}.

This shows that there is a phase transition in the dense case: exact reconstruction is possible when K\lesssim n/\log n whereas, when K\gtrsim n/\log n, we only have a control of the estimation error.

4.2 Application to angular group synchronization

In this section, we introduce the group synchronization problem as well as a stochastic model for this problem. We consider a SDP relaxation of the original problem (which is exact) and construct the associated SDP estimator as in (1.3).

The angular synchronization problem consists of estimating n unknown angles \theta_{1},\ldots,\theta_{n} (up to a global shift angle) given a noisy subset of their pairwise offsets (\theta_{i}-\theta_{j})\,[2\pi], where [2\pi] denotes the modulo 2\pi operation. The pairwise measurements can be realized as the edge set of a graph G, typically modeled as an Erdös-Renyi random graph [90].

The aim of this section is to show that the angular synchronization problem can be analyzed using our methodology. In order to keep the presentation as simple as possible, we assume that all pairwise offsets are observed up to some Gaussian noise: we are given (\theta_{i}-\theta_{j}+\sigma g_{ij})\,[2\pi] for all 1\leq i<j\leq n, where (g_{ij}:1\leq i<j\leq n) are n(n-1)/2 i.i.d. standard Gaussian variables and \sigma>0 is the noise level. We may rewrite the problem as follows: we observe the n\times n complex matrix A defined by

A=S\circ[x^{*}(\overline{x^{*}})^{\top}]\mbox{ where }S=(S_{ij})_{n\times n},\quad S_{ij}=\left\{\begin{array}{cc}e^{\iota\sigma g_{ij}}&\mbox{ if }i<j\\ 1&\mbox{ if }i=j\\ e^{-\iota\sigma g_{ij}}&\mbox{ if }i>j\end{array}\right., \qquad (4.6)

where \iota denotes the imaginary unit, \iota^{2}=-1, x^{*}=(x^{*}_{i})_{i=1}^{n}\in{\mathbb{C}}^{n} with x^{*}_{i}=e^{\iota\theta_{i}} for i=1,\ldots,n, \bar{x} denotes the (entrywise) conjugate of x, and S\circ[x^{*}(\overline{x^{*}})^{\top}] is the element-wise product (S_{ij}x^{*}_{i}\bar{x}^{*}_{j})_{n\times n}. In particular, S is a Hermitian matrix (i.e. \bar{S}^{\top}=S), with \mathbb{E}S_{ij}=\exp(-\sigma^{2}/2) for i\neq j and \mathbb{E}S_{ii}=1. We want to estimate (\theta_{1},\ldots,\theta_{n}) (up to a global shift) from the data matrix A. Unlike in classical statistical models, the noise here is multiplicative; we show that our approach covers this type of problem as well.
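For concreteness, a short sketch of this observation model (the dimension, the noise level and the seed are illustrative assumptions):

```python
import numpy as np

# Generate the multiplicative-noise observations (4.6): offsets seen through
# Hermitian phase noise e^{i sigma g_ij}.
rng = np.random.default_rng(0)
n, sigma = 15, 0.2
theta = rng.uniform(0, 2 * np.pi, n)
x_star = np.exp(1j * theta)                  # x*_i = e^{i theta_i}
g = np.triu(rng.standard_normal((n, n)), 1)  # g_ij for i < j
S = np.exp(1j * sigma * (g - g.T))           # Hermitian: S_ji = conj(S_ij)
np.fill_diagonal(S, 1)
A = S * np.outer(x_star, x_star.conj())      # entrywise product, as in (4.6)
```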

The first step is to find a (vectorial) optimization problem whose solutions are given by (\theta_{i})_{i=1}^{n} (up to a global angle shift), or some bijective function of it. Estimating (\theta_{i})_{i=1}^{n} up to a global angle shift is equivalent to estimating the vector x^{*}=(e^{\iota\theta_{i}})_{i=1}^{n}. The latter is, up to a global rotation of its coordinates, the unique solution of the following maximization problem:

\operatorname*{argmax}_{x\in\mathbb{C}^{n}:|x_{i}|=1}\left\{\bar{x}^{\top}\;{\mathbb{E}}A\;x\right\}=\{(e^{\iota(\theta_{i}+\theta_{0})})_{i=1}^{n}:\theta_{0}\in[0,2\pi)\}. \qquad (4.7)

A proof of (4.7) is given in Section 6. Let us now rewrite (4.7) as a SDP problem. For all x\in{\mathbb{C}}^{n}, we have \bar{x}^{\top}\mathbb{E}Ax=\mathrm{tr}(\mathbb{E}AX)=\bigl{<}\mathbb{E}A,X\bigr{>}, where X=x\bar{x}^{\top}, and \{Z\in\mathbb{C}^{n\times n}:Z=x\bar{x}^{\top},|x_{i}|=1\}=\{Z\in\mathbb{H}_{n}:Z\succeq 0,\mathrm{diag}(Z)=\mathbf{1}_{n},\mathrm{rank}(Z)=1\}, where \mathbb{H}_{n} is the set of all n\times n Hermitian matrices and \mathbf{1}_{n}\in{\mathbb{C}}^{n} is the vector with all coordinates equal to 1. It is therefore straightforward to construct a SDP relaxation of (4.7) by dropping the rank constraint. It turns out that this relaxation is exact since, for {\cal C}=\{Z\in\mathbb{H}_{n}:Z\succeq 0,\mathrm{diag}(Z)=\mathbf{1}_{n}\},

\operatorname*{argmax}_{Z\in{\cal C}}\bigl{<}{\mathbb{E}}A,Z\bigr{>}=\{Z^{*}\}, \qquad (4.8)

where Z^{*}=x^{*}(\overline{x^{*}})^{\top}. A proof of (4.8) can be found in Section 6. Finally, as we only observe A, we consider the following SDP estimator of Z^{*}:

\hat{Z}\in\operatorname*{argmax}_{Z\in{\cal C}}\bigl{<}A,Z\bigr{>}. \qquad (4.9)

In the next section, we use the strategy from Corollary 1 to obtain statistical guarantees for the estimation of Z^{*} by \hat{Z}.
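A minimal sketch of the estimator (4.9) using cvxpy's Hermitian matrix variables, reusing the data matrix A generated in the sketch above (the solver choice is an illustrative assumption):

```python
import cvxpy as cp

# Solve (4.9) over C = {Z Hermitian, Z PSD, diag(Z) = 1}; A is the complex
# data matrix from the sketch above.
n = A.shape[0]
Z = cp.Variable((n, n), hermitian=True)
cons = [Z >> 0, cp.diag(Z) == 1]
cp.Problem(cp.Maximize(cp.real(cp.trace(A.conj().T @ Z))),
           cons).solve(solver=cp.SCS)
Z_hat = Z.value
```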

Intuitively, the above maximization problem (4.8) attempts to preserve the given angle offsets as well as possible:

\operatorname*{argmax}_{\theta_{1},\ldots,\theta_{n}\in[0,2\pi)}\sum_{i,j=1}^{n}e^{-\iota\theta_{i}}A_{ij}e^{\iota\theta_{j}}, \qquad (4.10)

where the objective function is incremented by +1 whenever an assignment of angles \theta_{i} and \theta_{j} perfectly satisfies the given edge constraint \delta_{ij}=(\theta_{i}-\theta_{j})\,[2\pi] (i.e., for a clean edge for which \sigma=0), while the contribution of an incorrect assignment (i.e., of a very noisy edge) will be almost uniformly distributed on the unit circle in the complex plane. Due to the non-convexity of the optimization in (4.10), it is difficult to solve computationally [109]. One way to overcome this problem is to consider the SDP relaxation from (4.8); it is also possible to consider a spectral relaxation such as the one proposed by Singer [90], which replaces the individual constraints that all the z_{i} should have unit magnitude by the much weaker single constraint \sum_{i=1}^{n}|z_{i}|^{2}=n, leading to

argmaxz1,,zn;i=1n|zi|2=ni,j=1nzi¯Aijzj.\operatorname*{argmax}_{z_{1},\ldots,z_{n}\in\mathbb{C};\;\;\;\sum_{i=1}^{n}|z_{i}|^{2}=n}\sum_{i,j=1}^{n}\bar{z_{i}}A_{ij}z_{j}. (4.11)

The solution to the resulting maximization problem is simply given by a top eigenvector of the Hermitian matrix A, followed by a normalization step. The main advantage of the SDP relaxation (4.8) is that it explicitly imposes the unit magnitude constraint on e^{\iota\theta_{i}}, which cannot be enforced in the spectral relaxation solved via the eigenvector method in (4.11). The SDP program (4.8) is very similar to the well-known Goemans-Williamson SDP relaxation of the seminal MAX-CUT problem of finding the maximal cut of a graph (MAX-CUT is one of the four applications considered in this work, see Section 4.3), the main difference being that here we optimize over the cone of complex-valued Hermitian positive semi-definite matrices rather than real symmetric ones.
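For comparison, the eigenvector method for (4.11) takes only a few lines of linear algebra; the sketch below (our own illustration) computes a top eigenvector of A and then projects each entry onto the unit circle, which is the normalization step mentioned above:

import numpy as np

def spectral_estimator(A):
    # eigh returns eigenvalues in ascending order, so the last column of
    # the eigenvector matrix is a top eigenvector of the Hermitian A.
    _, eigvecs = np.linalg.eigh(A)
    v = eigvecs[:, -1]
    # Entry-wise projection onto the unit circle (generically v_i != 0);
    # the unit-magnitude constraint is only enforced a posteriori.
    return v / np.abs(v)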

4.2.1 Main results for phase recovery in the synchronization problem

Our main result concerns the estimation of the matrix of offsets Z^{*}=x^{*}(\overline{x^{*}})^{\top} from the observation of the matrix A. This result is then used to estimate the vector of angles x^{*} (up to a global phase shift).

Theorem 4.

Let 0<ϵ<10<\epsilon<1. If σlog(ϵn4)\sigma\leq\sqrt{\log(\epsilon n^{4})} then, with probability at least 1exp(ϵσ4n(n1)/2)1-\exp(-\epsilon\sigma^{4}n(n-1)/2), it holds true that

(e^{-\sigma^{2}/2}/2)\left\|Z^{*}-\hat{Z}\right\|_{2}^{2}\leq\bigl<\mathbb{E}A,Z^{*}-\hat{Z}\bigr>\leq(128/3)\sqrt{\epsilon}\sigma^{4}N, \mbox{ where } N=n(n-1)/2. (4.12)

Once we have an estimator \hat{Z} for the oracle Z^{*}, we can extract an estimator \hat{x} of the vector of phases x^{*} by taking a top eigenvector (i.e. an eigenvector associated with the largest eigenvalue) of \hat{Z}. It is then possible to quantify how well \hat{x} estimates x^{*} using a sin-theta theorem together with Theorem 4.

Corollary 2.

Let \hat{x} be a top eigenvector of \hat{Z} with Euclidean norm \left\|\hat{x}\right\|_{2}=\sqrt{n}. Let 0<\epsilon<1 and assume that \sigma\leq\sqrt{\log(\epsilon n^{4})}. There exists a universal constant c_{0}>0 (the constant in the Davis-Kahan theorem for Hermitian matrices) such that, with probability at least 1-\exp(-\epsilon\sigma^{4}n(n-1)/2),

minz:|z|=1x^zx28c02/3ϵ1/4eσ2/4σ2n.\min_{z\in{\mathbb{C}}:|z|=1}\left\|\hat{x}-zx^{*}\right\|_{2}\leq 8c_{0}\sqrt{2/3}\epsilon^{1/4}e^{\sigma^{2}/4}\sigma^{2}\sqrt{n}. (4.13)

It follows from Corollary 2 that we can estimate x^{*} (up to a global rotation z\in\mathbb{C}, |z|=1) with an \ell_{2}^{n}-estimation error of order \sigma^{2}\sqrt{n}, with exponential deviations. Given that \left\|x^{*}\right\|_{2}=\sqrt{n}, this means that a constant proportion of the entries are well estimated. For values of \epsilon\sim 1/n^{2}, the rate of estimation becomes of order \sigma^{2}; we therefore get a much better estimation of x^{*}, but only with constant probability. It is important to recall that \hat{Z} and \hat{x} can both be computed efficiently, by solving an SDP problem and then taking a top eigenvector of its solution (for instance, via the power method).
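A brief sketch of this extraction step via the power method (the iteration count below is an arbitrary choice of ours; any top-eigenvector routine would do):

import numpy as np

def extract_phases(Z_hat, n_iter=200, seed=0):
    # Power method: v <- Z_hat v / ||Z_hat v||_2 converges to a top
    # eigenvector of the positive semi-definite matrix Z_hat.
    n = Z_hat.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    for _ in range(n_iter):
        v = Z_hat @ v
        v /= np.linalg.norm(v)
    # Normalize to ||x_hat||_2 = sqrt(n), as in Corollary 2.
    return np.sqrt(n) * v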

4.3 Application to the MAX-CUT problem

Let A^{0}\in\{0,1\}^{n\times n} be the (symmetric) adjacency matrix of an undirected graph G=(V,E^{0}), where V:=\{1,\ldots,n\} is the set of vertices and the set of edges is E^{0}:=E\cup E^{\top}\cup\{(i,i):A_{ii}^{0}=1\}, with E:=\{(i,j)\in V^{2}:i<j\mbox{ and }A^{0}_{ij}=1\} and E^{\top}:=\{(j,i):(i,j)\in E\}. We assume that G has no self-loops, so that A_{ii}^{0}=0 for all i\in V. A cut of G is any subset S of vertices of V. For a cut S\subset V, we define its weight by \mathrm{cut}(G,S):=(1/2)\sum_{(i,j)\in S\times\bar{S}}A^{0}_{ij}, that is, the number of edges in E between S and its complement \bar{S}=V\backslash S. The MAX-CUT problem is to find a cut with maximal weight:

SargmaxSVcut(G,S).\displaystyle S^{*}\in\underset{S\subset V}{\mathrm{argmax}}\,\mathrm{cut}(G,S). (4.14)

The MAX-CUT problem is NP-complete, but [49] constructed a 0.878-approximating solution via an SDP relaxation. Indeed, one can write the MAX-CUT problem in the following way. For a cut S\subset V, define the membership vector x\in\{-1,1\}^{n} associated with S by setting x_{i}:=1 if i\in S and x_{i}:=-1 if i\notin S. We have \mathrm{cut}(G,S)=(1/4)\sum_{i,j=1}^{n}A^{0}_{ij}(1-x_{i}x_{j})=:\mathrm{cut}(G,x), and so solving (4.14) is equivalent to solving

xargmaxx{1,1}ncut(G,x).\displaystyle x^{*}\in\underset{x\in\{-1,1\}^{n}}{\mathrm{argmax}}\,\mathrm{cut}(G,x). (4.15)

Since (x_{i}x_{j})_{i,j}=xx^{\top}, the latter problem is in turn equivalent to

max(14i,j=1nAij0(1Zij):rank(Z)=1,Z0,Zii=1)\max\left(\frac{1}{4}\sum_{i,j=1}^{n}A^{0}_{ij}(1-Z_{ij}):{\rm rank}(Z)=1,Z\succeq 0,Z_{ii}=1\right) (4.16)

which admits a SDP relaxation by removing the rank 11 constraint. This yields the following SDP relaxation problem of MAX-CUT from [49]:

ZargminZ𝒞<A0,Z>Z^{*}\in\underset{Z\in\mathcal{C}}{\mathrm{argmin}}\bigl{<}A^{0},Z\bigr{>} (4.17)

where 𝒞:={Zn×n:Z0,Zii=1,i=1,,n}\mathcal{C}:=\{Z\in{\mathbb{R}}^{n\times n}:Z\succeq 0,Z_{ii}=1,\forall i=1,\ldots,n\}.
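In this form the relaxation can be handed to a generic SDP solver; a minimal CVXPY sketch (tooling assumed, as before) is given below. For the partially observed setting introduced next, one would replace A^{0} by the observed surrogate (1/p)A.

import cvxpy as cp

def maxcut_sdp(A0):
    # Z* in argmin { <A0, Z> : Z >= 0, Z_ii = 1 }, cf. (4.17).
    n = A0.shape[0]
    Z = cp.Variable((n, n), symmetric=True)
    constraints = [Z >> 0, cp.diag(Z) == 1]
    cp.Problem(cp.Minimize(cp.trace(A0 @ Z)), constraints).solve()
    return Z.value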

Unlike the other examples from the previous sections, the SDP relaxation in (4.17) is not exact, except for bipartite graphs; see [62, 44] for more details. Nevertheless, thanks to the approximation result from [49], we can use our methodology to estimate Z^{*} and then deduce an approximately optimal cut. But first we introduce a stochastic model: in many situations the adjacency matrix A^{0} is only partially observed, and it is still of interest to find an approximate solution to the MAX-CUT problem.

We observe A=S\circ A^{0}=(s_{ij}A_{ij}^{0})_{1\leq i,j\leq n}, a “masked” version of A^{0}, where S\in\mathbb{R}^{n\times n} is symmetric with an upper triangular part filled with i.i.d. Bernoulli entries: for all i,j\in V such that i\leq j, S_{ij}=S_{ji}=s_{ij}, where (s_{ij})_{i\leq j} is a family of i.i.d. Bernoulli random variables with parameter p\in(1/2,1). Let B:=-(1/p)A, so that \mathbb{E}[B]=-A^{0}. Hence Z^{*} can be seen as an oracle, since Z^{*}\in\operatorname{argmax}_{Z\in{\cal C}}\bigl<\mathbb{E}B,Z\bigr>, and we estimate Z^{*} via the SDP estimator \hat{Z}\in\operatorname{argmax}_{Z\in{\cal C}}\bigl<B,Z\bigr>. Our first aim is to quantify the price we pay for using \hat{Z} instead of Z^{*} in our final choice of cut. It turns out that the fixed point used in Theorem 1 quantifies this loss:

r(Δ)=inf(r>0:[supZ𝒞:<𝔼B,ZZ>r<B𝔼B,ZZ>(1/2)r]1Δ).r^{*}(\Delta)=\inf\left(r>0:{\mathbb{P}}\left[\sup_{Z\in{\cal C}:\bigl{<}\mathbb{E}B,Z^{*}-Z\bigr{>}\leq r}\bigl{<}B-\mathbb{E}B,Z-Z^{*}\bigr{>}\leq(1/2)r\right]\geq 1-\Delta\right). (4.18)

Our second result is an explicit high probability upper bound on the latter fixed point.

4.3.1 Main results for the MAX-CUT problem

In this section, we gather the two results on the estimation of Z^{*} by \hat{Z} and on the approximate optimality of the final cut constructed from \hat{Z}. Let us now explicitly describe the construction of this cut, for which we consider the same strategy as in [49]. Assume that \hat{Z} has been constructed. Let \hat{G} be a centered Gaussian vector with covariance matrix \hat{Z}, and let \hat{x} be the sign vector of \hat{G}. Using the statistical properties of \hat{Z}, it is possible to prove the near-optimality of \hat{x}.
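A sketch of this rounding step (our illustration; we sample from {\cal N}(0,\hat{Z}) via an eigendecomposition since \hat{Z} may be rank deficient):

import numpy as np

def gw_rounding(Z_hat, seed=0):
    rng = np.random.default_rng(seed)
    # Factor Z_hat = L L^T, clipping tiny negative eigenvalues caused by
    # numerical error; then G_hat = L g ~ N(0, Z_hat) for g ~ N(0, I_n).
    eigvals, eigvecs = np.linalg.eigh(Z_hat)
    L = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    G_hat = L @ rng.standard_normal(Z_hat.shape[0])
    # x_hat is the sign vector of G_hat.
    return np.where(G_hat >= 0, 1, -1)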

We denote the optimal values of the MAX-CUT problem and its SDP relaxation by

SDP(G):=(1/4)<A0,JZ>=maxZ𝒞14i,jAi,j0(1Zij) and MAXCUT(G):=cut(G,S)\mathrm{SDP}(G):=(1/4)\bigl{<}A^{0},J-Z^{*}\bigr{>}=\max_{Z\in{\cal C}}\frac{1}{4}\sum_{i,j}A_{i,j}^{0}(1-Z_{ij})\mbox{ and }\mathrm{MAXCUT}(G):=\mathrm{cut}(G,S^{*})

where S^{*} is a solution of (4.14) and J=(1)_{n\times n} is the all-ones matrix. Our first result shows how the 0.878 approximation guarantee from [49] is downgraded by the incomplete information we have on the graph (we only partially observe the adjacency matrix A^{0} via A).

Theorem 5.

Let 0<\Delta<1. With probability at least 1-\Delta (with respect to the mask S),

SDP(G)𝔼[cut(G,x^)|Z^]0.878SDP(G)0.878r(Δ)4.\mathrm{SDP}(G)\geq\mathbb{E}\left[\mathrm{cut}(G,\hat{x})|\hat{Z}\right]\geq 0.878\mathrm{SDP}(G)-\frac{0.878r^{*}(\Delta)}{4}.

To clarify the notation: \hat{x} is the sign vector of \hat{G}, which is a centered Gaussian variable with covariance \hat{Z}. In that context, \mathbb{E}\left[\mathrm{cut}(G,\hat{x})|\hat{Z}\right] is the conditional expectation with respect to \hat{G} for a fixed \hat{Z}. Moreover, the probability at least 1-\Delta that we obtain is with respect to the mask, that is, to the randomness in A.

Let us put Theorem 5 into perspective. If we had known the entire adjacency matrix (which is the case when p=1), then we could have used Z^{*} instead of \hat{Z}. In that case, for x^{\star} the sign vector of G^{\star}\sim{\cal N}(0,Z^{*}), we know from [49] that

SDP(G)𝔼[cut(G,x)]0.878SDP(G).\mathrm{SDP}(G)\geq\mathbb{E}\left[\mathrm{cut}(G,x^{\star})\right]\geq 0.878\mathrm{SDP}(G). (4.19)

Therefore, Theorem 5 characterizes the price we pay for not observing the entire adjacency matrix A^{0} but only a masked version A of it. An interesting output of Theorem 5 is that the fixed point r^{*}(\Delta) measures this loss in a quantitative way. If we were able to identify scenarios of p and E for which r^{*}(\Delta)=0, this would prove that there is no loss in partially observing A^{0} in the MAX-CUT problem. Unfortunately, the approach we use to control r^{*}(\Delta) is the global one, which does not allow for exact reconstruction (that is, for showing that r^{*}(\Delta)=0).

Let us now turn to an estimation result of ZZ^{*} by Z^\hat{Z} via an upper bound on r(Δ)r^{*}(\Delta).

Theorem 6.

With probability at least 14n1-4^{-n}:

<𝔼B,ZZ^>r(4n)2n(2log4)(1p)(n1)p+8nlog43.\displaystyle\bigl{<}\mathbb{E}B,Z^{*}-\hat{Z}\bigr{>}\leq r^{*}(4^{-n})\leq 2n\sqrt{\frac{(2\log 4)(1-p)(n-1)}{p}}+\frac{8n\log 4}{3}.

In particular, it follows from the approximation result from Theorem 5 and the high probability upper bound on r(Δ)r^{*}(\Delta) from Theorem 6 that, with probability at least 14n1-4^{-n},

𝔼[cut(G,x^)|Z^]0.878SDP(G)0.8784(2n(2log4)(1p)(n1)p+8nlog43).\mathbb{E}\left[\mathrm{cut}(G,\hat{x})|\hat{Z}\right]\geq 0.878\mathrm{SDP}(G)-\frac{0.878}{4}\left(2n\sqrt{\frac{(2\log 4)(1-p)(n-1)}{p}}+\frac{8n\log 4}{3}\right). (4.20)

This result is non-trivial only when the right-hand side of (4.20) is strictly larger than (0.5)\mathrm{SDP}(G), which is the performance of a random cut. As a consequence, (4.20) shows that one can still do better than a random cut, even in the incomplete information setup of the MAX-CUT problem, when p, n and \mathrm{SDP}(G) are such that

0.378SDP(G)>0.8784(2n(2log4)(1p)(n1)p+8nlog43).0.378\mathrm{SDP}(G)>\frac{0.878}{4}\left(2n\sqrt{\frac{(2\log 4)(1-p)(n-1)}{p}}+\frac{8n\log 4}{3}\right).

For instance, when p is of constant order, this requires \mathrm{SDP}(G) to be larger than c_{0}n^{3/2} (for some absolute constant c_{0}), and when p=1-1/n, it requires \mathrm{SDP}(G) to be at least c_{0}n (for some absolute constant c_{0}).

5 Proof of Theorem 3 (signed clustering)

The aim of this paper is to put forward a methodology, developed in learning theory, for the study of SDP estimators, and in each example we follow this methodology. For a problem such as signed clustering, where it is possible to characterize the curvature of the excess risk, we first identify this curvature, since the resulting curvature function G defines the local subsets of {\cal C} that drive the complexity of the problem. We then turn to the stochastic part of the proof, which is entirely summarized by the complexity fixed point r^{*}_{G}(\Delta) from (2.2). Finally, we put the two pieces together and apply the main general result, Corollary 1, to obtain estimation results for the SDP estimator (4.2) in the signed clustering problem; this is Theorem 3.

5.1 Curvature equation

In this section, we show that the objective function Z𝒞<Z,𝔼AαJ>Z\in{\cal C}\rightarrow\bigl{<}Z,{\mathbb{E}}A-\alpha J\bigr{>} satisfies a curvature assumption around its maximizer ZZ^{*} with respect to the 1n×n\ell_{1}^{n\times n}-norm given by G(ZZ)=θZZ1G(Z^{*}-Z)=\theta\left\|Z^{*}-Z\right\|_{1} with parameter θ=δ(pq)\theta=\delta(p-q) (and margin exponent κ=1\kappa=1).

Proposition 1.

For θ=δ(pq)\theta=\delta(p-q), we have for all Z𝒞Z\in{\cal C}, <𝔼AαJ,ZZ>=θZZ1\bigl{<}{\mathbb{E}}A-\alpha J,Z^{*}-Z\bigr{>}=\theta\left\|Z^{*}-Z\right\|_{1}.

Proof.

Let ZZ be in 𝒞{\cal C}. We have

<ZZ,𝔼AαJ>\displaystyle\bigl{<}Z^{*}-Z,{\mathbb{E}}A-\alpha J\bigr{>} =i,j=1n(ZZ)ij(𝔼Aijα)\displaystyle=\sum_{i,j=1}^{n}(Z^{*}-Z)_{ij}({\mathbb{E}}A_{ij}-\alpha)
=(i,j)𝒞+(ZijZij)(δ(2p1)α)+(i,j)𝒞(ZijZij)(δ(2q1)α)\displaystyle=\sum_{(i,j)\in{\cal C}^{+}}(Z^{*}_{ij}-Z_{ij})(\delta(2p-1)-\alpha)+\sum_{(i,j)\in{\cal C}^{-}}(Z^{*}_{ij}-Z_{ij})(\delta(2q-1)-\alpha)
=δ(pq)[(i,j)𝒞+(ZZ)ij(i,j)𝒞(ZZ)ij].\displaystyle=\delta(p-q)\left[\sum_{(i,j)\in{\cal C}^{+}}(Z^{*}-Z)_{ij}-\sum_{(i,j)\in{\cal C}^{-}}(Z^{*}-Z)_{ij}\right].

Moreover, for all (i,j)\in{\cal C}^{+}, Z^{*}_{ij}=1 and 0\leq Z_{ij}\leq 1, so (Z^{*}-Z)_{ij}=|(Z^{*}-Z)_{ij}|. We also have, for all (i,j)\in{\cal C}^{-}, (Z^{*}-Z)_{ij}=-Z_{ij}=-|(Z^{*}-Z)_{ij}|, because in that case Z^{*}_{ij}=0 and 0\leq Z_{ij}\leq 1. Hence,

<ZZ,𝔼AαJ>=δ(pq)[(i,j)𝒞+|(ZZ)ij|+(i,j)𝒞|(ZZ)ij|]=θZZ1.\bigl{<}Z^{*}-Z,{\mathbb{E}}A-\alpha J\bigr{>}=\delta(p-q)\left[\sum_{(i,j)\in{\cal C}^{+}}|(Z^{*}-Z)_{ij}|+\sum_{(i,j)\in{\cal C}^{-}}|(Z^{*}-Z)_{ij}|\right]=\theta\left\|Z^{*}-Z\right\|_{1}.

 

5.2 Computation of the complexity fixed point rG(Δ)r^{*}_{G}(\Delta)

Define the noise matrix W:=A-\mathbb{E}A. Since W is symmetric, its entries are not independent. In order to work only with independent random variables, we define the following matrix \Psi\in\mathbb{R}^{n\times n}:

Ψij={Wij if ij0 otherwise,\Psi_{ij}=\left\{\begin{array}[]{l}W_{ij}\mbox{ if }i\leq j\\ 0\mbox{ otherwise,}\end{array}\right. (5.1)

where the 0 entries are considered as independent Bernoulli variables with parameter 0; therefore \Psi has independent entries and satisfies the relation W=\Psi+\Psi^{\top}.

In order to obtain upper bounds on the fixed point complexity parameter rG(Δ)r^{*}_{G}(\Delta) associated with the signed clustering problem, we need to prove a high probability upper bound on the quantity

supZ𝒞:ZZ1r<W,ZZ>,\sup_{Z\in{\cal C}:\left\|Z-Z^{*}\right\|_{1}\leq r}\bigl{<}W,Z-Z^{*}\bigr{>}, (5.2)

and then find a radius rr as small as possible such that the quantity in (5.2) is less than (θ/2)r(\theta/2)r. We denote 𝒞r:=𝒞(Z+rB1n×n)={Z𝒞:ZZ1r}{\cal C}_{r}:={\cal C}\cap(Z^{*}+rB_{1}^{n\times n})=\left\{Z\in{\cal C}:\left\|Z-Z^{*}\right\|_{1}\leq r\right\} where B1n×nB_{1}^{n\times n} is the unit 1n×n\ell_{1}^{n\times n}-ball of n×n{\mathbb{R}}^{n\times n}.

We follow the strategy from [41], decomposing the inner product \bigl<W,Z-Z^{*}\bigr> into two parts according to the SVD of Z^{*}. This observation is a key point in the work of [41] compared with the analysis from [54]: it allows one to perform the localization argument efficiently. Up to a relabeling of the nodes, Z^{*} is a block matrix with K diagonal blocks of 1's. It therefore admits the K singular vectors U_{\bullet k}:=(I(i\in{\cal C}_{k})/\sqrt{|{\cal C}_{k}|})_{i=1}^{n}, associated with the singular values l_{k}:=|{\cal C}_{k}|, for k\in[K]. We can therefore write

Z=k=1KlkUkUk=UDU,Z^{*}=\sum_{k=1}^{K}l_{k}U_{\bullet k}\otimes U_{\bullet k}=UDU^{\top},

where Un×KU\in{\mathbb{R}}^{n\times K} has KK column vectors given by Uk,k=1,,KU_{\bullet k},k=1,\ldots,K and D=diag(l1,,lK)D={\rm diag}(l_{1},\ldots,l_{K}). We define the following projection operator

𝒫:Mn×nUUTM+MUUTUUTMUUT{\cal P}:M\in{\mathbb{R}}^{n\times n}\rightarrow UU^{T}M+MUU^{T}-UU^{T}MUU^{T}

and the complementary projection {\cal P}^{\perp} by

{\cal P}^{\perp}:M\in\mathbb{R}^{n\times n}\rightarrow M-{\cal P}(M)=(\mathrm{I}_{n}-UU^{\top})M(\mathrm{I}_{n}-UU^{\top})=\sum_{k,k^{\prime}=K+1}^{n}\bigl<M,U_{\bullet k}\otimes U_{\bullet k^{\prime}}\bigr>U_{\bullet k}\otimes U_{\bullet k^{\prime}}

where Ukn,k=K+1,,nU_{\bullet k}\in{\mathbb{R}}^{n},k=K+1,\ldots,n are such that (Uk:k=1,,n)(U_{\bullet k}:k=1,\ldots,n) is an orthonormal basis of n{\mathbb{R}}^{n}.

We use the same decomposition as in [41]: for all Z𝒞Z\in{\cal C},

<W,ZZ>=<W,𝒫(ZZ)+𝒫(ZZ)>=<𝒫(ZZ),W>S1(Z)+<𝒫(ZZ),W>S2(Z).\displaystyle\bigl{<}W,Z-Z^{*}\bigr{>}=\bigl{<}W,{\cal P}(Z-Z^{*})+{\cal P}^{\perp}(Z-Z^{*})\bigr{>}=\underset{S_{1}(Z)}{\underbrace{\bigl{<}{\cal P}(Z-Z^{*}),W\bigr{>}}}+\underset{S_{2}(Z)}{\underbrace{\bigl{<}{\cal P}^{\perp}(Z-Z^{*}),W\bigr{>}}}.

The next step is to control, with large probability, the two terms S_{1}(Z) and S_{2}(Z) uniformly over Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n}). To that end we use the two following propositions, where we recall that \rho=\delta\max(1-\delta(2p-1)^{2},1-\delta(2q-1)^{2}) and \nu=\max(2p-1,1-2q). The proofs of Propositions 2 and 3 can be found in Section 8; they are based on [41].

Proposition 2.

There are absolute positive constants c0,c1,c2c_{0},c_{1},c_{2} and c3c_{3} such that the following holds. If c1rK/n2eKnexp((9/32)nρ/K)\lceil c_{1}rK/n\rceil\geq 2eKn\exp(-(9/32)n\rho/K) then we have

[supZ𝒞(Z+rB1n×n)S1(Z)c2rKρnlog(2eKnc1rKn)]13(c1rKn2eKn)c1rKn.{\mathbb{P}}\left[\sup_{Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n})}S_{1}(Z)\leq c_{2}r\sqrt{\frac{K\rho}{n}\log\left(\frac{2eKn}{\lceil\frac{c_{1}rK}{n}\rceil}\right)}\right]\geq 1-3\left(\frac{\lceil\frac{c_{1}rK}{n}\rceil}{2eKn}\right)^{\lceil\frac{c_{1}rK}{n}\rceil}.
Proposition 3.

There exists an absolute constant c0>0c_{0}>0 such that the following holds. When nνδlognn\nu\delta\geq\log n, with probability at least 1exp(δνn)1-\exp(-\delta\nu n),

supZ𝒞(Z+rB1n×n)S2(Z)c0Krδνn.\sup_{Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n})}S_{2}(Z)\leq c_{0}Kr\sqrt{\frac{\delta\nu}{n}}.

It follows from Proposition 2 and Proposition 3 that, when n\nu\delta\geq\log n, for all r such that \lceil c_{1}rK/n\rceil\geq 2eKn\exp(-(9/32)n\rho/K) we have, for \Delta=\Delta(r):=\exp(-\delta\nu n)+3\left(\lceil c_{1}rK/n\rceil/(2eKn)\right)^{\lceil c_{1}rK/n\rceil}, with probability at least 1-\Delta,

supZ𝒞(Z+rB1n×n)<W,ZZ>c0Krδνn+c2rKρnlog(2eKnc1rKn).\sup_{Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n})}\bigl{<}W,Z-Z^{*}\bigr{>}\leq c_{0}Kr\sqrt{\frac{\delta\nu}{n}}+c_{2}r\sqrt{\frac{K\rho}{n}\log\left(\frac{2eKn}{\lceil\frac{c_{1}rK}{n}\rceil}\right)}.

Moreover, we have

c0Krδνn+c2rKρnlog(2eKnc1rKn)θ2rc_{0}Kr\sqrt{\frac{\delta\nu}{n}}+c_{2}r\sqrt{\frac{K\rho}{n}\log\left(\frac{2eKn}{\lceil\frac{c_{1}rK}{n}\rceil}\right)}\leq\frac{\theta}{2}r (5.3)

for \theta=\delta(p-q), when K\sqrt{\nu}\lesssim\sqrt{n\delta}(p-q) and \lceil c_{1}rK/n\rceil\geq 2eKn\exp(-\theta^{2}n/(K\rho)). In particular, when (p-q)^{2}n\delta\geq K^{2}\nu and 1\geq 2eKn\max\big(\exp(-\theta^{2}n/(K\rho)),\exp(-(9/32)n\rho/K)\big), we conclude that (5.3) is true for all 0<r\leq n/(c_{1}K). Therefore, one can take r_{G}^{*}(\Delta)=0 for \Delta=\exp(-\delta\nu n)+3/(2eKn), meaning that we have exact reconstruction of Z^{*}: if (p-q)^{2}n\delta\geq K^{2}\nu and n\gtrsim K\max\big((\rho/\theta^{2})\log(2eK^{2}\rho/\theta^{2}),(1/\rho)\log(2eK^{2}/\rho)\big), then with probability at least 1-\exp(-\delta\nu n)-3/(2eKn), \hat{Z}=Z^{*}.

If (p-q)^{2}n\delta\geq K^{2}\nu and 1<2eKn\max\big(\exp(-\theta^{2}n/(K\rho)),\exp(-(9/32)n\rho/K)\big), then we no longer have exact reconstruction and we have to take r such that c_{1}rK/n\geq 1. In that case, (5.3) holds for r=n/(c_{1}K), and so r_{G}^{*}(\Delta(n/(c_{1}K)))\leq n/(c_{1}K). Therefore, it follows from Corollary 1 that

ZZ^1nc1δ(pq)K.\left\|Z^{*}-\hat{Z}\right\|_{1}\leq\frac{n}{c_{1}\delta(p-q)K}.

6 Proofs of Theorem 4 and Corollary 2 (angular synchronization)

Proof of (4.7): For all γ1,,γn[0,2π)\gamma_{1},\ldots,\gamma_{n}\in[0,2\pi) we have γiγj=δij\gamma_{i}-\gamma_{j}=\delta_{ij} for all ij[n]i\neq j\in[n] if and only if eσ2/2𝔼Aijeιγjeιγi=0e^{\sigma^{2}/2}\mathbb{E}A_{ij}e^{\iota\gamma_{j}}-e^{\iota\gamma_{i}}=0 for all ij[n]i\neq j\in[n]. We therefore have

argminxn:|xi|=1{ij|eσ2/2𝔼Aijxjxi|2}={(eι(θi+θ0))i=1n:θ0[0,2π)}.\operatorname*{argmin}_{x\in{\mathbb{C}}^{n}:|x_{i}|=1}\left\{\sum_{i\neq j}|e^{\sigma^{2}/2}\mathbb{E}A_{ij}x_{j}-x_{i}|^{2}\right\}=\{(e^{\iota(\theta_{i}+\theta_{0})})_{i=1}^{n}:\theta_{0}\in[0,2\pi)\}. (6.1)

Moreover, for all x=(xi)i=1nnx=(x_{i})_{i=1}^{n}\in\mathbb{C}^{n} such that |xi|=1|x_{i}|=1 for i=1,,ni=1,\ldots,n, we have

ij|eσ2/2𝔼Aijxjxi|2=ij|xix¯jxjxi|2=i,j=1n|xix¯jxix¯j|2\displaystyle\sum_{i\neq j}|e^{\sigma^{2}/2}\mathbb{E}A_{ij}x_{j}-x_{i}|^{2}=\sum_{i\neq j}|x^{*}_{i}\bar{x}_{j}^{*}x_{j}-x_{i}|^{2}=\sum_{i,j=1}^{n}|x_{i}^{*}\bar{x}_{j}^{*}-x_{i}\bar{x}_{j}|^{2}
=2n22(i,j=1nxix¯jxjx¯i)=2n22|<x,x>|2\displaystyle=2n^{2}-2\Re\left(\sum_{i,j=1}^{n}x_{i}^{*}\bar{x}_{j}^{*}x_{j}\bar{x}_{i}\right)=2n^{2}-2|\bigl{<}x^{*},x\bigr{>}|^{2}

where \Re(z) denotes the real part of z\in\mathbb{C}. On the other hand, we have

x¯(eσ2/2𝔼A)x=ijx¯ixix¯jxj+i=1nx¯ieσ2/2xi=n(eσ2/21)+|<x,x>|2.\displaystyle\bar{x}^{\top}(e^{\sigma^{2}/2}\mathbb{E}A)x=\sum_{i\neq j}\bar{x}_{i}x_{i}^{*}\bar{x}_{j}^{*}x_{j}+\sum_{i=1}^{n}\bar{x}_{i}e^{\sigma^{2}/2}x_{i}=n(e^{\sigma^{2}/2}-1)+|\bigl{<}x^{*},x\bigr{>}|^{2}.

Hence, minimizing x\mapsto\sum_{i\neq j}|e^{\sigma^{2}/2}\mathbb{E}A_{ij}x_{j}-x_{i}|^{2} over all x=(x_{i})_{i}\in\mathbb{C}^{n} such that |x_{i}|=1 is equivalent to maximizing x\mapsto\bar{x}^{\top}\mathbb{E}Ax over the same set. Together with (6.1), this concludes the proof.

Proof of (4.8): Let 𝒞={Zn×n:|Zij|1,i,j[n]}{\cal C}^{\prime}=\{Z\in\mathbb{C}^{n\times n}:|Z_{ij}|\leq 1,\forall i,j\in[n]\}. We first prove that 𝒞𝒞{\cal C}\subset{\cal C}^{\prime}. Let Z𝒞Z\in{\cal C}. Since Z0Z\succeq 0, there exists Xn×nX\in{\mathbb{C}}^{n\times n} such that Z=XX¯Z=X\bar{X}^{\top}. For all i{1,,n}i\in\{1,\ldots,n\}, denote by XiX_{i\bullet} the ii-th row vector of XX. We have Xi22=<Xi,Xi>=Zii=1\left\|X_{i\bullet}\right\|_{2}^{2}=\bigl{<}X_{i\bullet},X_{i\bullet}\bigr{>}=Z_{ii}=1 since diag(Z)=𝟏n\mathrm{diag}(Z)=\mathbf{1}_{n}. Moreover, for all i,j[n]i,j\in[n], we have |Zij|=|<Xi,Xj>|Xi2Xj21|Z_{ij}|=|\bigl{<}X_{i\bullet},X_{j\bullet}\bigr{>}|\leq\left\|X_{i\bullet}\right\|_{2}\left\|X_{j\bullet}\right\|_{2}\leq 1. This proves that Z𝒞Z\in{\cal C}^{\prime} and so 𝒞𝒞{\cal C}\subset{\cal C}^{\prime}.

Let Z^{\prime}\in\operatorname{argmax}\left\{\Re(\bigl<\mathbb{E}A,Z\bigr>):Z\in{\cal C}^{\prime}\right\}. Since {\cal C}^{\prime} is a compact convex set and the objective function Z\mapsto\Re(\bigl<\mathbb{E}A,Z\bigr>) is linear (over the reals), the maximum is attained at an extreme point of {\cal C}^{\prime}. The extreme points of {\cal C}^{\prime} are the matrices Z\in\mathbb{C}^{n\times n} such that |Z_{ij}|=1 for all i,j\in[n]. We can then write each entry of such a maximizer Z^{\prime} as Z^{\prime}_{ij}=e^{\iota\beta_{ij}} for some 0\leq\beta_{ij}<2\pi, and we obtain

\Re(\bigl<\mathbb{E}A,Z^{\prime}\bigr>)=\Re\left(\sum_{i,j=1}^{n}\mathbb{E}A_{ij}\overline{Z^{\prime}_{ij}}\right)=\Re\left(\sum_{i\neq j}e^{-\sigma^{2}/2}e^{\iota\delta_{ij}}e^{-\iota\beta_{ij}}\right)+\Re\left(\sum_{i=1}^{n}e^{\iota\delta_{ii}}e^{-\iota\beta_{ii}}\right)
=\sum_{i\neq j}e^{-\sigma^{2}/2}\cos(\delta_{ij}-\beta_{ij})+\sum_{i}\cos(\delta_{ii}-\beta_{ii})\leq e^{-\sigma^{2}/2}(n^{2}-n)+n.

The maximal value e^{-\sigma^{2}/2}(n^{2}-n)+n is attained only for \beta_{ij}=\delta_{ij} for all i,j\in[n], that is, for Z^{\prime}=(e^{\iota\delta_{ij}})_{i,j=1,\ldots,n}=Z^{*}. But Z^{*}\in{\cal C} and {\cal C}\subset{\cal C}^{\prime}, so Z^{*} is the only maximizer of Z\mapsto\Re(\bigl<\mathbb{E}A,Z\bigr>) over {\cal C}. Moreover, for all Z\in{\cal C} we have \bigl<\mathbb{E}A,Z\bigr>=e^{-\sigma^{2}/2}\overline{x^{*}}^{\top}Zx^{*}+(1-e^{-\sigma^{2}/2})n\in\mathbb{R}, hence Z^{*} is the only maximizer of Z\mapsto\bigl<\mathbb{E}A,Z\bigr> over {\cal C}.

6.1 Curvature of the objective function

Proposition 4.

For θ=eσ2/2/2\theta=e^{-\sigma^{2}/2}/2, we have for all Z𝒞Z\in{\cal C}, <𝔼A,ZZ>θZZ22\bigl{<}{\mathbb{E}}A,Z^{*}-Z\bigr{>}\geq\theta\left\|Z^{*}-Z\right\|_{2}^{2}.

Proof.

Let Z=(z_{ij}e^{\iota\beta_{ij}})_{i,j=1}^{n}\in{\cal C}, where z_{ij}\in\mathbb{R} and 0\leq\beta_{ij}<2\pi for all i,j\in[n]. Since Z_{ii}^{*}=Z_{ii}=1 for all i\in[n], we have, on the one hand, \bigl<\mathbb{E}A,Z^{*}-Z\bigr>=e^{-\sigma^{2}/2}\overline{x^{*}}^{\top}(Z^{*}-Z)x^{*}\in\mathbb{R}, and so

<𝔼A,ZZ>\displaystyle\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>} =(<𝔼A,ZZ>)=(i,j=1n𝔼Aij(ZZ)ij¯)=(i,j=1neσ2/2eιδij(eιδijzijeιβij))\displaystyle=\Re{\left(\bigl{<}\mathbb{E}A,Z^{*}-Z\bigr{>}\right)}=\Re{\left(\sum_{i,j=1}^{n}\mathbb{E}A_{ij}\overline{(Z^{*}-Z)_{ij}}\right)}=\Re{\left(\sum_{i,j=1}^{n}e^{-\sigma^{2}/2}e^{\iota\delta_{ij}}(e^{-\iota\delta_{ij}}-z_{ij}e^{-\iota\beta_{ij}})\right)}
=eσ2/2(i,j=1n1zijeι(δijβij))=eσ2/2i,j=1n(1zijcos(δijβij)).\displaystyle=e^{-\sigma^{2}/2}\Re{\left(\sum_{i,j=1}^{n}1-z_{ij}e^{\iota(\delta_{ij}-\beta_{ij})}\right)}=e^{-\sigma^{2}/2}\sum_{i,j=1}^{n}(1-z_{ij}\cos(\delta_{ij}-\beta_{ij})). (6.2)

On the other hand, we proved in the proof of (4.8) that {\cal C}\subset\{Z\in\mathbb{C}^{n\times n}:|Z_{ij}|\leq 1,\forall i,j\in[n]\}. So we have |z_{ij}|\leq 1 for all i,j\in[n] and

ZZ22=i,j=1n|(ZZ)ij|2=i,j=1n|eιδijzijeιβij|2=i,j=1n|1zijeι(δij+βij)|2\displaystyle\left\|Z^{*}-Z\right\|_{2}^{2}=\sum_{i,j=1}^{n}|(Z^{*}-Z)_{ij}|^{2}=\sum_{i,j=1}^{n}|e^{\iota\delta_{ij}}-z_{ij}e^{\iota\beta_{ij}}|^{2}=\sum_{i,j=1}^{n}|1-z_{ij}e^{\iota(-\delta_{ij}+\beta_{ij})}|^{2}
=i,j=1n(1zijcos(βijδij))2+zij2sin2(βijδij)=i,j=1n12zijcos(βijδij)+zij2\displaystyle=\sum_{i,j=1}^{n}(1-z_{ij}\cos(\beta_{ij}-\delta_{ij}))^{2}+z_{ij}^{2}\sin^{2}(\beta_{ij}-\delta_{ij})=\sum_{i,j=1}^{n}1-2z_{ij}\cos(\beta_{ij}-\delta_{ij})+z_{ij}^{2}
2i,j=1n(1zijcos(βijδij)).\displaystyle\leq 2\sum_{i,j=1}^{n}(1-z_{ij}\cos(\beta_{ij}-\delta_{ij})). (6.3)

We conclude by combining (6.2) and (6.3).

In fact, it follows from the proof of Proposition 4 that we have the following equality: for all Z𝒞Z\in{\cal C},

<𝔼A,ZZ>=θ(ZZ22+|Z|2|Z|21)\bigl{<}{\mathbb{E}}A,Z^{*}-Z\bigr{>}=\theta\left(\left\|Z^{*}-Z\right\|_{2}^{2}+\left\||Z^{*}|^{2}-|Z|^{2}\right\|_{1}\right)

where |Z|^{2}=(|Z_{ij}|^{2})_{1\leq i,j\leq n} (in particular, |Z^{*}|^{2}=(1)_{n\times n}). We therefore know exactly how to characterize the curvature of the excess risk for the angular synchronization problem, in terms of the squared \ell_{2} norm and the \ell_{1} norm. Nevertheless, we will not use the extra term \left\||Z^{*}|^{2}-|Z|^{2}\right\|_{1} in what follows.

6.2 Computation of the complexity fixed point rG(Δ)r^{*}_{G}(\Delta)

It follows from the (global) curvature property of the excess risk for the angular synchronization problem obtained in Proposition 4 that, for the curvature function G defined by G(Z^{*}-Z)=\theta\left\|Z^{*}-Z\right\|_{2}^{2} for all Z\in{\cal C}, we just have to compute the fixed point r_{G}^{*}(\Delta) and then apply Corollary 1 in order to obtain statistical properties of \hat{Z} (w.r.t. both the excess risk and the G function). In this section, we compute the complexity fixed point r_{G}^{*}(\Delta) for 0<\Delta<1.

Following Proposition 4, the natural “local” subsets of 𝒞{\cal C} around ZZ^{*} which drive the statistical complexity of the synchronization problem are defined for all r>0r>0 by 𝒞r={Z𝒞:ZZ2r}=𝒞(Z+rB2n×n){\cal C}_{r}=\{Z\in{\cal C}:\left\|Z-Z^{*}\right\|_{2}\leq r\}={\cal C}\cap(Z^{*}+rB_{2}^{n\times n}).

Let Z𝒞rZ\in{\cal C}_{r}. Denote by bijRb_{ij}^{R} (resp. bijIb_{ij}^{I}) the real (resp. imaginary) part of bij=ZijZijZij¯b_{ij}=Z_{ij}^{*}\overline{Z_{ij}-Z_{ij}^{*}} for all i,j[n]i,j\in[n]. Since ZZ2r\left\|Z-Z^{*}\right\|_{2}\leq r we also have i,j(bijR)2+(bijI)2r2\sum_{i,j}(b_{ij}^{R})^{2}+(b_{ij}^{I})^{2}\leq r^{2} and so

\bigl<A-\mathbb{E}A,Z-Z^{*}\bigr>=\bigl<(S-\mathbb{E}S)\circ Z^{*},Z-Z^{*}\bigr>=2\Re\left(\sum_{i<j}(S_{ij}-\mathbb{E}S_{ij})b_{ij}\right)
=2i<j(cos(σgij)𝔼cos(σgij))bijRsin(σgij)bijI2ri<j(cos(σgij)𝔼cos(σgij))2+(sin(σgij))2\displaystyle=2\sum_{i<j}(\cos(\sigma g_{ij})-\mathbb{E}\cos(\sigma g_{ij}))b_{ij}^{R}-\sin(\sigma g_{ij})b_{ij}^{I}\leq 2r\sqrt{\sum_{i<j}(\cos(\sigma g_{ij})-\mathbb{E}\cos(\sigma g_{ij}))^{2}+(\sin(\sigma g_{ij}))^{2}}
2r1eσ2+2eσ2/2(i<j𝔼cos(σgij)cos(σgij))\displaystyle\leq 2r\sqrt{1-e^{-\sigma^{2}}+2e^{-\sigma^{2}/2}\left(\sum_{i<j}{\mathbb{E}}\cos(\sigma g_{ij})-\cos(\sigma g_{ij})\right)}

where we used that \mathbb{E}\cos(\sigma g)=\Re(\mathbb{E}e^{\iota\sigma g})=e^{-\sigma^{2}/2} for g\sim{\cal N}(0,1). It now remains to get a high probability upper bound on the sum of the centered cosines of the \sigma g_{ij}. We use Bernstein's inequality (see Equation (7.3) below) to get such a bound. For all t>0, with probability at least 1-\exp(-t),

1Ni<j𝔼cos(σgij)cos(σgij)2Vt+2t3N(1eσ2)t+2t3N\frac{1}{\sqrt{N}}\sum_{i<j}{\mathbb{E}}\cos(\sigma g_{ij})-\cos(\sigma g_{ij})\leq\sqrt{2Vt}+\frac{2t}{3\sqrt{N}}\leq(1-e^{-\sigma^{2}})\sqrt{t}+\frac{2t}{3\sqrt{N}}

for N=n(n1)/2N=n(n-1)/2 and V=𝔼cos2(σg)(𝔼cos(σg))2=(1/2)(1eσ2)2V=\mathbb{E}\cos^{2}(\sigma g)-(\mathbb{E}\cos(\sigma g))^{2}=(1/2)(1-e^{-\sigma^{2}})^{2} (because 𝔼cos2(σg)=(1/2)𝔼(1+cos(2σg))=(1/2)(1+e2σ2)\mathbb{E}\cos^{2}(\sigma g)=(1/2){\mathbb{E}}(1+\cos(2\sigma g))=(1/2)(1+e^{-2\sigma^{2}}) when g𝒩(0,1)g\sim{\cal N}(0,1)).
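The Gaussian moment identities used here, \mathbb{E}\cos(\sigma g)=e^{-\sigma^{2}/2} and \mathbb{E}\cos^{2}(\sigma g)=(1+e^{-2\sigma^{2}})/2, are easy to confirm numerically (a sanity check, not part of the argument):

import numpy as np

sigma = 0.7
g = np.random.default_rng(0).standard_normal(10**6)
# Monte Carlo vs closed form for E cos(sigma g) and E cos^2(sigma g).
print(np.cos(sigma * g).mean(), np.exp(-sigma**2 / 2))
print((np.cos(sigma * g) ** 2).mean(), (1 + np.exp(-2 * sigma**2)) / 2)
# Hence V = Var(cos(sigma g)) = (1/2) * (1 - exp(-sigma^2))^2.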

We now have all the ingredients to compute the fixed point rG(Δ)r_{G}^{*}(\Delta) for 0<Δ<10<\Delta<1: for θ=eσ2/2/2\theta=e^{-\sigma^{2}/2}/2 and t=log(1/Δ)t=\log(1/\Delta),

rG(Δ)4θ(1eσ2+2eσ2/2((1eσ2)tN+2t3))=32t3+8(1eσ2)(eσ2/2+2tN).r^{*}_{G}(\Delta)\leq\frac{4}{\theta}\left(1-e^{-\sigma^{2}}+2e^{-\sigma^{2}/2}\left((1-e^{-\sigma^{2}})\sqrt{tN}+\frac{2t}{3}\right)\right)=\frac{32t}{3}+8(1-e^{-\sigma^{2}})(e^{\sigma^{2}/2}+2\sqrt{tN}).

In particular, using 1eσ2σ21-e^{-\sigma^{2}}\leq\sigma^{2} and for t=ϵσ4Nt=\epsilon\sigma^{4}N (where N=n(n1)/2N=n(n-1)/2) for some 0<ϵ<10<\epsilon<1, if eσ2/22σ2ϵNe^{\sigma^{2}/2}\leq 2\sigma^{2}\sqrt{\epsilon}N then rG(Δ)(128/3)σ4Nϵr^{*}_{G}(\Delta)\leq(128/3)\sigma^{4}N\sqrt{\epsilon}.

6.3 End of the proof of Theorem 4 and Corollary 2: application of Corollary 1

Take \Delta=\exp(-\epsilon\sigma^{4}N) (for N=n(n-1)/2); we have r_{G}^{*}(\Delta)\leq(128/3)\sqrt{\epsilon}\sigma^{4}N when e^{\sigma^{2}/2}\leq 2\sqrt{\epsilon}\sigma^{2}N (which holds for instance when \sigma\leq\sqrt{\log(\epsilon N^{2})}). It then follows from Corollary 1 (together with the curvature property from Section 6.1 and the computation of the fixed point r_{G}^{*}(\Delta) from Section 6.2) that, with probability at least 1-\exp(-\epsilon\sigma^{4}n(n-1)/2), \theta\left\|Z^{*}-\hat{Z}\right\|_{2}^{2}\leq\bigl<\mathbb{E}A,Z^{*}-\hat{Z}\bigr>\leq(128/3)\sqrt{\epsilon}\sigma^{4}N, which is the statement of Theorem 4.

Proof of Corollary 2: The oracle Z^{*} is the rank-one matrix x^{*}\overline{x^{*}}^{\top}, whose largest eigenvalue is n, with associated eigenspace \{\lambda x^{*}:\lambda\in\mathbb{C}\}. In particular, Z^{*} has a spectral gap g=n. Let \hat{x}\in\mathbb{C}^{n} be a top eigenvector of \hat{Z} with norm \left\|\hat{x}\right\|_{2}=\sqrt{n}. It follows from the Davis-Kahan theorem (see, for example, Theorem 4.5.5 in [100] or Theorem 4 in [101]) that there exists a universal constant c_{0}>0 such that

minz:|z|=1x^nzxn2c0gZ^Z2\min_{z\in{\mathbb{C}}:|z|=1}\left\|\frac{\hat{x}}{\sqrt{n}}-z\frac{x^{*}}{\sqrt{n}}\right\|_{2}\leq\frac{c_{0}}{g}\left\|\hat{Z}-Z^{*}\right\|_{2}

where g=ng=n is the spectral gap of ZZ^{*}. We conclude the proof of Corollary 2 using the upper bound on Z^Z2\left\|\hat{Z}-Z^{*}\right\|_{2} from Theorem 4.  

7 Proofs of Theorem 5 and 6 (MAX-CUT)

In this section, we prove the two main results from Section 4.3 using our general methodology for Theorem 6 and the technique from [49] for Theorem 5.

7.1 Proof of Theorem 5

The proof of Theorem 5 follows the one from [49], up to a minor modification due to the fact that we use the SDP estimator \hat{Z} instead of the oracle Z^{*}. It is based on two tools. The first one is Grothendieck's identity: for g\sim{\cal N}(0,\mathrm{I}_{n}) and u,v\in{\cal S}_{2}^{n-1}, we have

\mathbb{E}[\mathrm{sign}(\bigl<g,u\bigr>)\mathrm{sign}(\bigl<g,v\bigr>)]=\frac{2}{\pi}\mathrm{arcsin}(\bigl<u,v\bigr>) (7.1)

and the identity: for all t[1,1]t\in[-1,1]

12πarcsin(t)=2πarccos(t)0.878(1t).1-\frac{2}{\pi}\mathrm{arcsin}(t)=\frac{2}{\pi}\mathrm{arccos}(t)\geq 0.878(1-t). (7.2)
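Both facts are easy to check numerically; the snippet below (a sanity check, not part of the proof) verifies (7.1) by Monte Carlo and (7.2) on a grid:

import numpy as np

rng = np.random.default_rng(0)
n = 5
u = rng.standard_normal(n); u /= np.linalg.norm(u)
v = rng.standard_normal(n); v /= np.linalg.norm(v)
# Monte Carlo estimate of E[sign(<g,u>) sign(<g,v>)] for g ~ N(0, I_n).
g = rng.standard_normal((200000, n))
print(np.mean(np.sign(g @ u) * np.sign(g @ v)), (2 / np.pi) * np.arcsin(u @ v))
# (2/pi) arccos(t) >= 0.878 (1 - t) for all t in [-1, 1].
t = np.linspace(-1.0, 1.0, 10001)
print(np.all((2 / np.pi) * np.arccos(t) >= 0.878 * (1.0 - t)))  # prints True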

We now have enough tools to prove Theorem 5. The upper bound is trivial since \mathrm{cut}(G,\hat{x})\leq\mathrm{MAXCUT}(G)\leq\mathrm{SDP}(G). For the lower bound, we denote by \hat{X}_{1},\ldots,\hat{X}_{n} (resp. X_{1}^{*},\ldots,X_{n}^{*}) the n column vectors in {\cal S}_{2}^{n-1} of a square-root factorization of \hat{Z} (resp. Z^{*}), so that \hat{Z}_{ij}=\bigl<\hat{X}_{i},\hat{X}_{j}\bigr> and Z^{*}_{ij}=\bigl<X_{i}^{*},X_{j}^{*}\bigr> for all i,j\in[n]. We also consider the event \Omega^{*} on which

<𝔼B,ZZ^>r(Δ)\bigl{<}\mathbb{E}B,Z^{*}-\hat{Z}\bigr{>}\leq r^{*}(\Delta)

which holds with probability at least 1-\Delta according to Theorem 1. On the event \Omega^{*}, we have

𝔼[cut(G,x^)|Z^]=𝔼[14i,j=1nAij0(1x^ix^j)]=14i,j=1nAij0(1𝔼[sign(<X^i,g>)sign(<X^j,g>)])\displaystyle{\mathbb{E}}\left[\mathrm{cut}(G,\hat{x})|\hat{Z}\right]={\mathbb{E}}\left[\frac{1}{4}\sum_{i,j=1}^{n}A^{0}_{ij}(1-\hat{x}_{i}\hat{x}_{j})\right]=\frac{1}{4}\sum_{i,j=1}^{n}A^{0}_{ij}\left(1-{\mathbb{E}}[\mathrm{sign}(\bigl{<}\hat{X}_{i},g\bigr{>})\mathrm{sign}(\bigl{<}\hat{X}_{j},g\bigr{>})]\right)
=(i)14i,j=1nAij0(12πarcsin(<X^i,X^j>))=12πi,j=1nAij0arccos(<X^i,X^j>)\displaystyle\overset{(i)}{=}\frac{1}{4}\sum_{i,j=1}^{n}A^{0}_{ij}\left(1-\frac{2}{\pi}\arcsin(\bigl{<}\hat{X}_{i},\hat{X}_{j}\bigr{>})\right)=\frac{1}{2\pi}\sum_{i,j=1}^{n}A^{0}_{ij}\arccos(\bigl{<}\hat{X}_{i},\hat{X}_{j}\bigr{>})
(ii)0.8784i,j=1nAij0(1<X^i,X^j>)=0.8784i,j=1nAij0(1<Xi,Xj>)+0.8784i,j=1nAij0(<Xi,Xj><X^i,X^j>)\displaystyle\overset{(ii)}{\geq}\frac{0.878}{4}\sum_{i,j=1}^{n}A^{0}_{ij}(1-\bigl{<}\hat{X}_{i},\hat{X}_{j}\bigr{>})=\frac{0.878}{4}\sum_{i,j=1}^{n}A^{0}_{ij}(1-\bigl{<}X_{i}^{*},X_{j}^{*}\bigr{>})+\frac{0.878}{4}\sum_{i,j=1}^{n}A^{0}_{ij}(\bigl{<}X_{i}^{*},X_{j}^{*}\bigr{>}-\bigl{<}\hat{X}_{i},\hat{X}_{j}\bigr{>})
=0.8784<A0,JZ>+0.8784<A0,ZZ^>=0.878SDP(G)0.8784<𝔼B,ZZ^>\displaystyle=\frac{0.878}{4}\bigl{<}A^{0},J-Z^{*}\bigr{>}+\frac{0.878}{4}\bigl{<}A^{0},Z^{*}-\hat{Z}\bigr{>}=0.878\mathrm{SDP}(G)-\frac{0.878}{4}\bigl{<}\mathbb{E}B,Z^{*}-\hat{Z}\bigr{>}
0.878SDP(G)0.8784r(Δ)\displaystyle\geq 0.878\ \mathrm{SDP}(G)-\frac{0.878}{4}r^{*}(\Delta)

where we used (7.1) in (i) and (7.2) in (ii).

7.2 Proof of Theorem 6

For the MAX-CUT problem, we do not use any localization argument; we therefore use the (likely sub-optimal) global approach. The methodology is very close to the one used in [54] for the community detection problem. In particular, we use both the Bernstein and Grothendieck inequalities to compute a high probability upper bound on r^{*}(\Delta). We recall these two tools now. First, Bernstein's inequality: if Y_{1},\ldots,Y_{N} are N independent centered random variables such that |Y_{i}|\leq M a.s. for all i=1,\ldots,N, then for all t>0, with probability at least 1-\exp(-t),

1Ni=1NYiσ2t+2Mt3N\frac{1}{\sqrt{N}}\sum_{i=1}^{N}Y_{i}\leq\sigma\sqrt{2t}+\frac{2Mt}{3\sqrt{N}} (7.3)

where \sigma^{2}=(1/N)\sum_{i=1}^{N}\mathrm{var}(Y_{i}). The second tool is Grothendieck's inequality [52] (see also [85] or Theorem 3.4 in [54]): if C\in\mathbb{R}^{n\times n}, then

supZ𝒞<C,Z>KGCcut=KGmaxs,t{1,1}ni,j=1nCijsitj\sup_{Z\in{\cal C}}\bigl{<}C,Z\bigr{>}\leq K_{G}\left\|C\right\|_{cut}=K_{G}\max_{s,t\in\{-1,1\}^{n}}\sum_{i,j=1}^{n}C_{ij}s_{i}t_{j} (7.4)

where 𝒞={Z0:Zii=1,i=1,,n}{\cal C}=\{Z\succeq 0:Z_{ii}=1,i=1,\ldots,n\} and KGK_{G} is an absolute constant, called the Grothendieck constant.

In order to apply Theorem 1, we just have to compute the fixed point r(Δ)r^{*}(\Delta). As announced, we use the global approach and Grothendieck inequality (7.4) to get

supZ𝒞:<𝔼B,ZZ>r<B𝔼B,ZZ>supZ𝒞<B𝔼B,ZZ>2KGB𝔼Bcut\sup_{Z\in{\cal C}:\bigl{<}\mathbb{E}B,Z^{*}-Z\bigr{>}\leq r}\bigl{<}B-\mathbb{E}B,Z-Z^{*}\bigr{>}\leq\sup_{Z\in{\cal C}}\bigl{<}B-\mathbb{E}B,Z-Z^{*}\bigr{>}\leq 2K_{G}\left\|B-\mathbb{E}B\right\|_{cut} (7.5)

because Z𝒞Z^{*}\in{\cal C}. It follows from Bernstein’s inequality (7.3) and a union bound that for all t>0t>0, with probability at least 14nexp(t)1-4^{n}\exp(-t),

B𝔼Bcut=sups,t{±1}n1i<jn(Bij𝔼Bij)(sitj+sjti)2(1p)n(n1)tp+4t3.\left\|B-\mathbb{E}B\right\|_{cut}=\sup_{s,t\in\{\pm 1\}^{n}}\sum_{1\leq i<j\leq n}(B_{ij}-\mathbb{E}B_{ij})(s_{i}t_{j}+s_{j}t_{i})\leq 2\sqrt{\frac{(1-p)n(n-1)t}{p}}+\frac{4t}{3}.

Therefore, for t=2nlog4t=2n\log 4, with probability at least 14n1-4^{-n},

r(Δ)B𝔼Bcut2n(2log4)(1p)(n1)p+8nlog43r^{*}(\Delta)\leq\left\|B-\mathbb{E}B\right\|_{cut}\leq 2n\sqrt{\frac{(2\log 4)(1-p)(n-1)}{p}}+\frac{8n\log 4}{3}

for Δ=4n\Delta=4^{-n}. Then the result follows from Theorem 1.

8 Annex 1: Signed clustering

8.1 Proof of Equation (4.1)

We recall that the cluster matrix \bar{Z}\in\{0,1\}^{n\times n} is defined by \bar{Z}_{ij}=1 if i\sim j and \bar{Z}_{ij}=0 when i\not\sim j, and that \alpha=\delta(p+q-1). For any matrix Z\in[0,1]^{n\times n}, we have

\bigl<Z,\mathbb{E}A-\alpha J\bigr>=\sum_{i,j=1}^{n}Z_{ij}(\mathbb{E}A_{ij}-\alpha)=\sum_{(i,j)\in{\cal C}^{+}}Z_{ij}(\mathbb{E}A_{ij}-\alpha)+\sum_{(i,j)\in{\cal C}^{-}}Z_{ij}(\mathbb{E}A_{ij}-\alpha)
=[\delta(2p-1)-\alpha]\sum_{(i,j)\in{\cal C}^{+}:i\neq j}Z_{ij}+[\delta(2q-1)-\alpha]\sum_{(i,j)\in{\cal C}^{-}}Z_{ij}+(1-\alpha)\sum_{i=1}^{n}Z_{ii}
=\delta(p-q)\left[\sum_{(i,j)\in{\cal C}^{+}:i\neq j}Z_{ij}-\sum_{(i,j)\in{\cal C}^{-}}Z_{ij}\right]+(1-\alpha)\sum_{i=1}^{n}Z_{ii}.

The latter quantity is maximal for Z\in[0,1]^{n\times n} such that Z_{ij}=1 for (i,j)\in{\cal C}^{+} and Z_{ij}=0 for (i,j)\in{\cal C}^{-}, that is, when Z=\bar{Z}. As a consequence, \{\bar{Z}\}=\operatorname{argmax}_{Z\in[0,1]^{n\times n}}\bigl<Z,\mathbb{E}A-\alpha J\bigr>. Moreover, \bar{Z}\in{\cal C}\subset[0,1]^{n\times n}, so \bar{Z} is also the unique solution of the problem \max_{Z\in{\cal C}}\bigl<Z,\mathbb{E}A-\alpha J\bigr>, and hence \bar{Z}=Z^{*}.

8.2 Proof of Proposition 2: control of S1(Z)S_{1}(Z) adapted from [41]

The noise matrix WW is symmetric and has been decomposed as W=Ψ+ΨW=\Psi+\Psi^{\top} where Ψ\Psi has been defined in (5.1). For all Z𝒞(Z+rB1n×n)Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n}), we have

S1(Z)=<𝒫(ZZ),W>=<𝒫(W),ZZ>\displaystyle S_{1}(Z)=\bigl{<}{\cal P}(Z-Z^{*}),W\bigr{>}=\bigl{<}{\cal P}(W),Z-Z^{*}\bigr{>}
=<UUW,ZZ>+<WUU,ZZ><UUWUU,ZZ>\displaystyle=\bigl{<}UU^{\top}W,Z-Z^{*}\bigr{>}+\bigl{<}WUU^{\top},Z-Z^{*}\bigr{>}-\bigl{<}UU^{\top}WUU^{\top},Z-Z^{*}\bigr{>}
=2<UUW,ZZ><UUWUU,ZZ>\displaystyle=2\bigl{<}UU^{\top}W,Z-Z^{*}\bigr{>}-\bigl{<}UU^{\top}WUU^{\top},Z-Z^{*}\bigr{>}
=2<UUΨ,ZZ>+2<UUΨ,ZZ><UU(Ψ+Ψ)UU,ZZ>\displaystyle=2\bigl{<}UU^{\top}\Psi,Z-Z^{*}\bigr{>}+2\bigl{<}UU^{\top}\Psi^{\top},Z-Z^{*}\bigr{>}-\bigl{<}UU^{\top}(\Psi+\Psi^{\top})UU^{\top},Z-Z^{*}\bigr{>}
=2<UUΨ,ZZ>+2<UUΨ,ZZ>2<UUΨUU,ZZ>\displaystyle=2\bigl{<}UU^{\top}\Psi,Z-Z^{*}\bigr{>}+2\bigl{<}UU^{\top}\Psi^{\top},Z-Z^{*}\bigr{>}-2\bigl{<}UU^{\top}\Psi UU^{\top},Z-Z^{*}\bigr{>}
=2<UUΨ,ZZ>+2<UUΨ,ZZ>2<UUΨ,(ZZ)UU>.\displaystyle=2\bigl{<}UU^{\top}\Psi,Z-Z^{*}\bigr{>}+2\bigl{<}UU^{\top}\Psi^{\top},Z-Z^{*}\bigr{>}-2\bigl{<}UU^{\top}\Psi,(Z-Z^{*})UU^{\top}\bigr{>}. (8.1)

An upper bound on S_{1}(Z) follows from upper bounds on the three terms on the right-hand side of (8.1). Let us show how to bound the first term; similar arguments can be used to control the other two.

Let V:=UUTΨV:=UU^{T}\Psi. Let us find a high probability upper bound on the term <UUΨ,ZZ>=<V,ZZ>\bigl{<}UU^{\top}\Psi,Z-Z^{*}\bigr{>}=\bigl{<}V,Z-Z^{*}\bigr{>} uniformly over Z𝒞(Z+rB1n×n)Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n}). For all k[K]k\in[K], i𝒞ki\in{\cal C}_{k} and j[n]j\in[n], we have

V_{ij}=\sum_{t=1}^{n}(UU^{\top})_{it}\Psi_{tj}=\sum_{t\in{\cal C}_{k}}\frac{1}{l_{k}}\Psi_{tj}=\frac{1}{l_{k}}\sum_{t\in{\cal C}_{k}}\Psi_{tj}.

Therefore, for a given j\in[n], the V_{ij}'s are all equal for i\in{\cal C}_{k}. We can thus fix some arbitrary index i_{k}\in{\cal C}_{k} and write V_{ij}=V_{i_{k}j} for all i\in{\cal C}_{k}. Moreover, (V_{i_{k}j}:k\in[K],j\in[n]) is a family of independent random variables. We now have

<V,ZZ>=k[K]i𝒞kj[n]Vij(ZZ)ij=k[K]j[n]lkVikji𝒞k(ZZ)ijlk=k[K]j[n]lkVikjwkj\displaystyle\bigl{<}V,Z-Z^{*}\bigr{>}=\sum_{k\in[K]}\sum_{i\in{\cal C}_{k}}\sum_{j\in[n]}V_{ij}(Z-Z^{*})_{ij}=\sum_{k\in[K]}\sum_{j\in[n]}l_{k}V_{i_{k}j}\sum_{i\in{\cal C}_{k}}\frac{(Z-Z^{*})_{ij}}{l_{k}}=\sum_{k\in[K]}\sum_{j\in[n]}l_{k}V_{i_{k}j}w_{kj}

which is a weighted sum of nK independent centered random variables X_{k,j}:=l_{k}V_{i_{k}j} with weights w_{k,j}=(1/l_{k})\sum_{i\in{\cal C}_{k}}(Z-Z^{*})_{ij} for k\in[K],j\in[n]. We now identify some properties of the weights w_{k,j}.

The weights are such that

k[K]j[n]|wk,j|c1Knk[K]j[n]i𝒞k|(ZZ)ij|=ZZ1c1Knc1rKn\displaystyle\sum_{k\in[K]}\sum_{j\in[n]}|w_{k,j}|\leq\frac{c_{1}K}{n}\sum_{k\in[K]}\sum_{j\in[n]}\sum_{i\in{\cal C}_{k}}|(Z-Z^{*})_{ij}|=\left\|Z-Z^{*}\right\|_{1}\frac{c_{1}K}{n}\leq\frac{c_{1}rK}{n}

which is equivalent to saying that the weight vector w=(w_{k,j}:k\in[K],j\in[n]) lies in the \ell_{1}^{Kn}-ball (c_{1}rK/n)B_{1}^{Kn}. It also lies in the unit \ell_{\infty}^{Kn}-ball since, for all k\in[K] and j\in[n],

|wk,j|i𝒞k|(ZZ)ij|lkZZ1.|w_{k,j}|\leq\sum_{i\in{\cal C}_{k}}\frac{|(Z-Z^{*})_{ij}|}{l_{k}}\leq\left\|Z-Z^{*}\right\|_{\infty}\leq 1.

We therefore have wBKn(c1rK/n)B1Knw\in B_{\infty}^{Kn}\cap(c_{1}rK/n)B_{1}^{Kn} and so

supZ𝒞(Z+rB1n×n)<V,ZZ>supwBKn(c1rK/n)B1Knk[K],j[n]Xk,jwk,j.\sup_{Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n})}\bigl{<}V,Z-Z^{*}\bigr{>}\leq\sup_{w\in B_{\infty}^{Kn}\cap(c_{1}rK/n)B_{1}^{Kn}}\sum_{k\in[K],j\in[n]}X_{k,j}w_{k,j}. (8.2)

It remains to find a high probability upper bound on the latter term. We can use the following lemma to that end.

Lemma 1.

Let Xk,j=t𝒞kΨtjX_{k,j}=\sum_{t\in{\cal C}_{k}}\Psi_{tj} for (k,j)[K]×[n](k,j)\in[K]\times[n]. For all 0RKn0\leq R\leq Kn, if R2eKnexp((9/32)nρ/K)\lceil R\rceil\geq 2eKn\exp(-(9/32)n\rho/K) then with probability at least 1(R/(2eKn))R1-\left(\lceil R\rceil/(2eKn)\right)^{\lceil R\rceil},

supwBKnRB1Kn(k,j)[K]×[n]Xk,jwk,j48c0RnρKlog(2eKnR).\sup_{w\in B_{\infty}^{Kn}\cap RB_{1}^{Kn}}\sum_{(k,j)\in[K]\times[n]}X_{k,j}w_{k,j}\leq 4\sqrt{8c_{0}}R\sqrt{\frac{n\rho}{K}\log\left(\frac{2eKn}{\lceil R\rceil}\right)}.

Proof of Lemma 1. Let N=Kn and assume first that 1\leq R\leq N. We denote by X_{1}^{*}\geq X_{2}^{*}\geq\cdots\geq X_{N}^{*} (resp. w_{1}^{*}\geq\cdots\geq w_{N}^{*}) the non-increasing rearrangement of (|X_{k,j}|)_{(k,j)\in[K]\times[n]} (resp. (|w_{k,j}|)_{(k,j)\in[K]\times[n]}). We have

supwBNRB1N(k,j)[K]×[n]Xk,jwk,jsupwBNRB1Ni=1NXiwisupwBNi=1RXiwi+supwRB1Ni=R+1NXiwi\displaystyle\sup_{w\in B_{\infty}^{N}\cap RB_{1}^{N}}\sum_{(k,j)\in[K]\times[n]}X_{k,j}w_{k,j}\leq\sup_{w\in B_{\infty}^{N}\cap RB_{1}^{N}}\sum_{i=1}^{N}X_{i}^{*}w_{i}^{*}\leq\sup_{w\in B_{\infty}^{N}}\sum_{i=1}^{\lceil R\rceil}X_{i}^{*}w_{i}^{*}+\sup_{w\in RB_{1}^{N}}\sum_{i=\lceil R\rceil+1}^{N}X_{i}^{*}w_{i}^{*}
i=1RXi+RXR+12i=1RXi.\displaystyle\leq\sum_{i=1}^{\lceil R\rceil}X_{i}^{*}+RX_{\lceil R\rceil+1}^{*}\leq 2\sum_{i=1}^{\lceil R\rceil}X_{i}^{*}.

Moreover, for all τ>0\tau>0, using a union bound, we have

(i=1RXi>τ)=(I[K]×[n]:|I|=R and (k,j)I|Xk,j|>τ)\displaystyle{\mathbb{P}}\left(\sum_{i=1}^{\lceil R\rceil}X_{i}^{*}>\tau\right)={\mathbb{P}}\left(\exists I\subset[K]\times[n]:|I|=\lceil R\rceil\mbox{ and }\sum_{(k,j)\in I}|X_{k,j}|>\tau\right)
=(maxI[K]×[n]:|I|=Rmaxuk,j=±1,(k,j)I(k,j)Iuk,jXk,j>τ)\displaystyle={\mathbb{P}}\left(\max_{I\subset[K]\times[n]:|I|=\lceil R\rceil}\max_{u_{k,j}=\pm 1,(k,j)\in I}\sum_{(k,j)\in I}u_{k,j}X_{k,j}>\tau\right)
I[K]×[n]:|I|=Ru{±1}R((k,j)IXk,juk,j>τ)=I[K]×[n]:|I|=Ru{±1}R((k,j)It𝒞kΨt,juk,j>τ).\displaystyle\leq\sum_{I\subset[K]\times[n]:|I|=\lceil R\rceil}\sum_{u\in\{\pm 1\}^{\lceil R\rceil}}{\mathbb{P}}\left(\sum_{(k,j)\in I}X_{k,j}u_{k,j}>\tau\right)=\sum_{I\subset[K]\times[n]:|I|=\lceil R\rceil}\sum_{u\in\{\pm 1\}^{\lceil R\rceil}}{\mathbb{P}}\left(\sum_{(k,j)\in I}\sum_{t\in{\cal C}_{k}}\Psi_{t,j}u_{k,j}>\tau\right).

Let us now control each term of the latter sum thanks to Bernstein's inequality. The random variables (\Psi_{t,j}:t,j\in[n]) are independent with variances at most \rho=\delta\max(1-\delta(2p-1)^{2},1-\delta(2q-1)^{2}), since \mathrm{Var}(\Psi_{ij})=0 when i>j and \mathrm{Var}(\Psi_{ij})=\mathrm{Var}(A_{ij}-\mathbb{E}[A_{ij}])=\mathrm{Var}(A_{ij})\leq\rho for j\geq i. Moreover, |\Psi_{ij}|=0 when j<i and |\Psi_{ij}|=|W_{ij}|=|A_{ij}-\mathbb{E}A_{ij}|\leq 2 for j\geq i, because A_{ij}\in\{-1,0,1\}. It follows from Bernstein's inequality that, for all I\subset[K]\times[n] satisfying |I|=\lceil R\rceil and all u\in\{\pm 1\}^{\lceil R\rceil},

((k,j)It𝒞kΨt,juk,j>τ)exp(τ22Rlkρ+4τ/3)exp(τ22Rc0nρ/K+4τ/3)exp(τ24Rc0nρ/K)\displaystyle{\mathbb{P}}\left(\sum_{(k,j)\in I}\sum_{t\in{\cal C}_{k}}\Psi_{t,j}u_{k,j}>\tau\right)\leq\exp\left(\frac{-\tau^{2}}{2\lceil R\rceil l_{k}\rho+4\tau/3}\right)\leq\exp\left(\frac{-\tau^{2}}{2\lceil R\rceil c_{0}n\rho/K+4\tau/3}\right)\leq\exp\left(\frac{-\tau^{2}}{4\lceil R\rceil c_{0}n\rho/K}\right)

when τ(3/2)Rc0nρ/K\tau\leq(3/2)\lceil R\rceil c_{0}n\rho/K. Therefore, supwBNRB1NXk,jwk,j2τ\sup_{w\in B_{\infty}^{N}\cap RB_{1}^{N}}\sum X_{k,j}w_{k,j}\leq 2\tau with probability at least

1(NR)2Rexp(τ24Rc0nρ/K)1exp(τ28Rc0nρ/K)\displaystyle 1-\binom{N}{\lceil R\rceil}2^{\lceil R\rceil}\exp\left(\frac{-\tau^{2}}{4\lceil R\rceil c_{0}n\rho/K}\right)\geq 1-\exp\left(\frac{-\tau^{2}}{8\lceil R\rceil c_{0}n\rho/K}\right)

when

(3/2)Rc0nρ/Kτ8c0RnρKlog(2eNR)(3/2)\lceil R\rceil c_{0}n\rho/K\geq\tau\geq\sqrt{8c_{0}}\lceil R\rceil\sqrt{\frac{n\rho}{K}\log\left(\frac{2eN}{\lceil R\rceil}\right)}

which is a non-vacuous condition since \lceil R\rceil\geq 2eN\exp(-(9/32)n\rho/K). The result follows, in the case 1\leq R\leq N, by taking \tau=\sqrt{8c_{0}}\lceil R\rceil\sqrt{(n\rho/K)\log\left(2eN/\lceil R\rceil\right)} and using that 2R\geq\lceil R\rceil when R\geq 1.

For 0R10\leq R\leq 1, we have

supwBKnRB1Kn(k,j)[K]×[n]Xk,jwk,j=Rmax(k,j)[K]×[n]|Xk,j|\sup_{w\in B_{\infty}^{Kn}\cap RB_{1}^{Kn}}\sum_{(k,j)\in[K]\times[n]}X_{k,j}w_{k,j}=R\max_{(k,j)\in[K]\times[n]}|X_{k,j}|

and using Bernstein's inequality as above, we get that, with probability at least 1-\exp(-K\tau^{2}/(8c_{0}n\rho)), \max_{(k,j)\in[K]\times[n]}|X_{k,j}|\leq\tau when 3c_{0}n\rho/(2K)\geq\tau\geq\sqrt{8c_{0}n\rho\log(nK)/K}, which is a non-vacuous condition when 9c_{0}n\rho\geq 4K\log(nK). Taking \tau=\sqrt{8c_{0}n\rho\log(nK)/K}, we obtain that, for all 0\leq R\leq 1, if 9c_{0}n\rho\geq 4K\log(nK) then, with probability at least 1-1/(nK),

supwBKnRB1Kn(k,j)[K]×[n]Xk,jwk,jR8c0nρlog(nK)K.\sup_{w\in B_{\infty}^{Kn}\cap RB_{1}^{Kn}}\sum_{(k,j)\in[K]\times[n]}X_{k,j}w_{k,j}\leq R\sqrt{\frac{8c_{0}n\rho\log(nK)}{K}}.

 

We apply Lemma 1 for R=c1rK/nR=c_{1}rK/n to control (8.2):

[supZ𝒞(Z+rB1n×n)<V,ZZ>c2rKρnlog(2eKnc1rKn)]1(c1rKn2eKn)c1rKn{\mathbb{P}}\left[\sup_{Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n})}\bigl{<}V,Z-Z^{*}\bigr{>}\leq c_{2}r\sqrt{\frac{K\rho}{n}\log\left(\frac{2eKn}{\lceil\frac{c_{1}rK}{n}\rceil}\right)}\right]\geq 1-\left(\frac{\lceil\frac{c_{1}rK}{n}\rceil}{2eKn}\right)^{\lceil\frac{c_{1}rK}{n}\rceil}

when c1rK/n2eKnexp((9/32)nρ/K)\lceil c_{1}rK/n\rceil\geq 2eKn\exp(-(9/32)n\rho/K).

Using the same methodology, we can prove exactly the same result for \sup_{Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n})}\bigl<UU^{\top}\Psi^{\top},Z-Z^{*}\bigr>. We can also use the same method to upper bound \sup_{Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n})}\bigl<UU^{\top}\Psi,(Z-Z^{*})UU^{\top}\bigr>; we simply have to check that the weight vector w^{\prime}=(w^{\prime}_{kj}:k\in[K],j\in[n]), where w^{\prime}_{kj}=(1/l_{k})\sum_{i\in{\cal C}_{k}}[(Z-Z^{*})UU^{\top}]_{ij}, is also in B_{\infty}^{Kn}\cap(c_{1}rK/n)B_{1}^{Kn} for any Z\in{\cal C} such that \left\|Z-Z^{*}\right\|_{1}\leq r. This is indeed the case since, for all i\in[n], k^{\prime}\in[K] and j\in{\cal C}_{k^{\prime}}, [(Z-Z^{*})UU^{\top}]_{ij}=\sum_{p=1}^{n}(Z-Z^{*})_{ip}(UU^{\top})_{pj}=\sum_{p\in{\cal C}_{k^{\prime}}}(Z-Z^{*})_{ip}/l_{k^{\prime}}, which is therefore constant for all j\in{\cal C}_{k^{\prime}}. Therefore, we have

k[K]j[n]|wkj|=k[K]k[K]j𝒞k|1lklki𝒞kp𝒞k(ZZ)ip|\displaystyle\sum_{k\in[K]}\sum_{j\in[n]}|w_{kj}^{\prime}|=\sum_{k\in[K]}\sum_{k^{\prime}\in[K]}\sum_{j\in{\cal C}_{k^{\prime}}}\left|\frac{1}{l_{k}l_{k^{\prime}}}\sum_{i\in{\cal C}_{k}}\sum_{p\in{\cal C}_{k^{\prime}}}(Z-Z^{*})_{ip}\right|
k[K]k[K]j𝒞k1lklki𝒞kp𝒞k|(ZZ)ip|ZZ1c1Knc1rKn\displaystyle\leq\sum_{k\in[K]}\sum_{k^{\prime}\in[K]}\sum_{j\in{\cal C}_{k^{\prime}}}\frac{1}{l_{k}l_{k^{\prime}}}\sum_{i\in{\cal C}_{k}}\sum_{p\in{\cal C}_{k^{\prime}}}\left|(Z-Z^{*})_{ip}\right|\leq\left\|Z-Z^{*}\right\|_{1}\frac{c_{1}K}{n}\leq\frac{c_{1}rK}{n}

and for all k[K]k^{\prime}\in[K] and j𝒞kj\in{\cal C}_{k^{\prime}},

|wkj|=|1lklki𝒞kp𝒞k(ZZ)ip|ZZ1.|w_{kj}^{\prime}|=\left|\frac{1}{l_{k}l_{k^{\prime}}}\sum_{i\in{\cal C}_{k}}\sum_{p\in{\cal C}_{k^{\prime}}}(Z-Z^{*})_{ip}\right|\leq\left\|Z-Z^{*}\right\|_{\infty}\leq 1.

Therefore, w^{\prime}\in B_{\infty}^{Kn}\cap(c_{1}rK/n)B_{1}^{Kn} and we obtain exactly the same upper bound for the three terms in (8.1). This concludes the proof of Proposition 2.

8.3 Proof of Proposition 3: control of the S2(Z)S_{2}(Z) term from [41]

In this section, we prove Proposition 3. We follow the proof from [41], but we only consider the “dense case”, that is, when n\delta\nu\geq\log n – we recall that \nu=\max(2p-1,1-2q). For a similar uniform control of S_{2}(Z) in the “sparse case”, when c_{0}\leq n\delta\nu\leq\log n for some absolute constant c_{0}, we refer the reader to [41].

For all Z\in{\cal C}, we have S_{2}(Z)=\bigl<{\cal P}^{\perp}(Z-Z^{*}),W\bigr>=\bigl<{\cal P}^{\perp}(Z),W\bigr> because, by construction of the projection operator, {\cal P}^{\perp}(Z^{*})=0. Therefore, S_{2}(Z)\leq\left\|{\cal P}^{\perp}(Z)\right\|_{*}\left\|W\right\|_{\mathrm{op}}, where \left\|\cdot\right\|_{*} denotes the nuclear norm (i.e. the sum of the singular values) and \left\|\cdot\right\|_{\mathrm{op}} denotes the operator norm (i.e. the maximum singular value). In Lemma 2 below, we prove an upper bound on \left\|{\cal P}^{\perp}(Z)\right\|_{*}; we then obtain a high probability upper bound on \left\|W\right\|_{\mathrm{op}}.

Lemma 2.

For all Z𝒞(Z+rB1n×n)Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n}), we have

\left\|{\cal P}^{\perp}(Z)\right\|_{*}=\mathrm{Tr}({\cal P}^{\perp}(Z))\leq\frac{c_{1}K}{n}\left\|Z-Z^{*}\right\|_{1}\leq\frac{c_{1}Kr}{n}.
Proof.

Since Z\succeq 0, so is ({\mathbb{I}}_{n}-UU^{\top})Z({\mathbb{I}}_{n}-UU^{\top})={\cal P}^{\perp}(Z), and therefore \left\|{\cal P}^{\perp}(Z)\right\|_{*}=\mathrm{Tr}({\cal P}^{\perp}(Z)). Next, we bound the trace of {\cal P}^{\perp}(Z).

Since {\mathbb{I}}_{n}-UU^{\top} is a projection operator, it is symmetric and ({\mathbb{I}}_{n}-UU^{\top})^{2}={\mathbb{I}}_{n}-UU^{\top}; moreover, \mathrm{Tr}(Z)=n=\mathrm{Tr}(Z^{*}) when Z\in{\cal C}, so

Tr(𝒫(Z))=Tr(𝒫(ZZ))=Tr((𝕀nUU)(ZZ)(𝕀nUU))\displaystyle\mathrm{Tr}({\cal P}^{\perp}(Z))=\mathrm{Tr}({\cal P}^{\perp}(Z-Z^{*}))=\mathrm{Tr}(({\mathbb{I}}_{n}-UU^{\top})(Z-Z^{*})({\mathbb{I}}_{n}-UU^{\top}))
=Tr((𝕀nUUT)2(ZZ))=Tr((𝕀nUUT)(ZZ))=Tr(Z)Tr(Z)+Tr(UUT(ZZ))\displaystyle=\mathrm{Tr}(({\mathbb{I}}_{n}-UU^{T})^{2}(Z-Z^{*}))=\mathrm{Tr}(({\mathbb{I}}_{n}-UU^{T})(Z-Z^{*}))=\mathrm{Tr}(Z)-\mathrm{Tr}(Z^{*})+\mathrm{Tr}(UU^{T}(Z^{*}-Z))
=i,j(UUT)ij(ZZ)ij=k[K]i,j𝒞k1lk(ZZ)ij=(i)k[K]1lki,j𝒞k|(ZZ)ij|\displaystyle=\sum_{i,j}(UU^{T})_{ij}(Z^{*}-Z)_{ij}=\sum_{k\in[K]}\sum_{i,j\in{\cal C}_{k}}\frac{1}{l_{k}}(Z^{*}-Z)_{ij}\overset{\mbox{(i)}}{=}\sum_{k\in[K]}\frac{1}{l_{k}}\sum_{i,j\in{\cal C}_{k}}|(Z^{*}-Z)_{ij}|
c1Knk[K]i,j𝒞k|(ZZ)ij|c1KnZZ1\displaystyle\leq\frac{c_{1}K}{n}\sum_{k\in[K]}\sum_{i,j\in{\cal C}_{k}}|(Z^{*}-Z)_{ij}|\leq\frac{c_{1}K}{n}\left\|Z-Z^{*}\right\|_{1}

where we used in (i) that, for i and j in the same community, Z^{*}_{ij}=1 and Z_{ij}\in[0,1], thus (Z^{*}-Z)_{ij}=|(Z^{*}-Z)_{ij}|. Finally, when Z is in the localized set {\cal C}\cap(Z^{*}+rB_{1}^{n\times n}), we have \left\|Z-Z^{*}\right\|_{1}\leq r, which concludes the proof.

We now obtain a high probability upper bound on \left\|W\right\|_{\mathrm{op}}. In what follows, we apply this result in the “dense case” (i.e. n\delta\nu\geq\log n) to get the uniform bound on S_{2}(Z) over Z\in{\cal C}\cap(Z^{*}+rB_{1}^{n\times n}).

Lemma 3 (Lemma 4 in [41]).

With probability at least 1exp(δνn)1-\exp(-\delta\nu n), Wop16δνn+168log(n)\left\|W\right\|_{\mbox{op}}\leq 16\sqrt{\delta\nu n}+168\sqrt{\mathrm{log}(n)}.

Proof.

Let AA^{\prime} be an independent copy of AA and Rn×nR\in{\mathbb{R}}^{n\times n} be a random symmetric matrix independent from both AA and AA^{\prime} whose sub-diagonal entries are independent Rademacher random variables. Using a symmetrization argument (see Chapter 2 in [64] or Chapter 2.3 in [98]), we obtain for W=A𝔼AW=A-{\mathbb{E}}A,

𝔼Wop=𝔼A𝔼Aop(i)𝔼AAop=(ii)𝔼(AA)Rop(iii)2𝔼ARop\displaystyle{\mathbb{E}}\left\|W\right\|_{\mbox{op}}={\mathbb{E}}\left\|A-{\mathbb{E}}A^{\prime}\right\|_{\mbox{op}}\overset{\mbox{(i)}}{\leq}{\mathbb{E}}\left\|A-A^{\prime}\right\|_{\mbox{op}}\overset{\mbox{(ii)}}{=}{\mathbb{E}}\left\|(A-A^{\prime})\circ R\right\|_{\mbox{op}}\overset{\mbox{(iii)}}{\leq}2{\mathbb{E}}\left\|A\circ R\right\|_{\mbox{op}}

where $\circ$ is the entry-wise matrix multiplication operation, (i) comes from Jensen's inequality, (ii) holds since $A-A^{\prime}$ and $(A-A^{\prime})\circ R$ are identically distributed, and (iii) is the triangle inequality. Next, we obtain an upper bound on ${\mathbb{E}}\left\|A\circ R\right\|_{\mbox{op}}$.

We define the family of independent random variables (ξij:1ijn)(\xi_{ij}:1\leq i\leq j\leq n) where for all 1ijn1\leq i\leq j\leq n

$\xi_{ij}=\left\{\begin{array}{ll}\frac{1}{\sqrt{|{\mathbb{E}}A_{ij}|}}&\mbox{ with probability }\frac{|{\mathbb{E}}A_{ij}|}{2}\\ -\frac{1}{\sqrt{|{\mathbb{E}}A_{ij}|}}&\mbox{ with probability }\frac{|{\mathbb{E}}A_{ij}|}{2}\\ 0&\mbox{ with probability }1-|{\mathbb{E}}A_{ij}|.\end{array}\right.$ (8.3)

We also put $b_{ij}:=\sqrt{|{\mathbb{E}}A_{ij}|}$ for all $1\leq i\leq j\leq n$. It is straightforward to see that $(\xi_{ij}b_{ij}:1\leq i\leq j\leq n)$ and $(A_{ij}R_{ij}:1\leq i\leq j\leq n)$ have the same distribution. As a consequence, $\left\|A\circ R\right\|_{\mbox{op}}$ and $\left\|X\right\|_{\mbox{op}}$ have the same distribution, where $X\in{\mathbb{R}}^{n\times n}$ is a symmetric matrix with $X_{ij}=\xi_{ij}b_{ij}$ for $1\leq i\leq j\leq n$. An upper bound on ${\mathbb{E}}\left\|X\right\|_{\mbox{op}}$ follows from the next result due to [9].

Theorem 7 (Corollary 3.6 in [9]).

Let ξij,1ijn\xi_{ij},1\leq i\leq j\leq n be independent symmetric random variables with unit variance and (bij,1ijn)(b_{ij},1\leq i\leq j\leq n) be a family of real numbers. Let Xn×nX\in{\mathbb{R}}^{n\times n} be the random symmetric matrix defined by Xij=ξijbijX_{ij}=\xi_{ij}b_{ij} for all 1ijn1\leq i\leq j\leq n. Let σ:=max1in{j=1nbij2}\sigma:=\max_{1\leq i\leq n}\left\{\sqrt{\sum_{j=1}^{n}b_{ij}^{2}}\right\}. Then, for any α3\alpha\geq 3,

𝔼Xope2α[2σ+14αmax1ijn{ξijbij2αlog(n)}log(n)]\displaystyle{\mathbb{E}}\left\|X\right\|_{\mbox{op}}\leq e^{\frac{2}{\alpha}}\left[2\sigma+14\alpha\max_{1\leq i\leq j\leq n}\left\{\left\|\xi_{ij}b_{ij}\right\|_{2\lceil\alpha\log(n)\rceil}\right\}\sqrt{\log(n)}\right]

where, for any q>0q>0, q\left\|\cdot\right\|_{q} denotes the LqL_{q}-norm.

Since the $(\xi_{ij}:1\leq i\leq j\leq n)$ are independent and symmetric with $\mathrm{Var}(\xi_{ij})={\mathbb{E}}[\xi_{ij}^{2}]=1$, we can apply Theorem 7 to upper bound ${\mathbb{E}}\left\|X\right\|_{\mbox{op}}={\mathbb{E}}\left\|A\circ R\right\|_{\mbox{op}}$. We have $\left\|\xi_{ij}b_{ij}\right\|_{2\lceil\alpha\log(n)\rceil}\leq 1$ for any $\alpha\geq 3$ and $b_{ij}^{2}=|{\mathbb{E}}A_{ij}|\leq\delta\nu$. It therefore follows from Theorem 7 applied with $\alpha=3$ that

𝔼Wop2e23[2nδν+42log(n)]8nδν+168log(n).{\mathbb{E}}\left\|W\right\|_{\mbox{op}}\leq 2e^{\frac{2}{3}}\left[2\sqrt{n\delta\nu}+42\sqrt{\log(n)}\right]\leq 8\sqrt{n\delta\nu}+168\sqrt{\log(n)}. (8.4)

The final step in the proof of Lemma 3 is a concentration argument showing that $\left\|W\right\|_{\mbox{op}}$ is close to its expectation with high probability. To that end, we rely on a general result for Lipschitz and separately convex functions from [16]. We first recall that a real-valued function $f$ of $N$ variables is said to be separately convex when, for every $i=1,\ldots,N$, it is a convex function of its $i$-th variable while the remaining variables are held fixed.

Theorem 8 (Theorem 6.10 in [16]).

Let 𝒳{\cal X} be a convex compact set in {\mathbb{R}} with diameter BB. Let X1,,XNX_{1},\cdots,X_{N} be independent random variables taking values in 𝒳{\cal X}. Let f:𝒳Nf:{\cal X}^{N}\rightarrow{\mathbb{R}} be a separately convex and 11-Lipschitz function, w.r.t. the 2N\ell_{2}^{N}-norm (i.e. |f(x)f(y)|xy2|f(x)-f(y)|\leq\left\|x-y\right\|_{2} for all x,y𝒳Nx,y\in{\cal X}^{N}). Then Z=f(X1,,XN)Z=f(X_{1},\ldots,X_{N}) satisfies, for all t>0t>0, with probability at least 1exp(t2/B2)1-\exp(-t^{2}/B^{2}), Z𝔼[Z]+tZ\leq{\mathbb{E}}[Z]+t.

We apply Theorem 8 to $Z:=f(A_{ij}-{\mathbb{E}}A_{ij},1\leq i<j\leq n)=\frac{1}{\sqrt{2}}\left\|A-{\mathbb{E}}A\right\|_{\mbox{op}}$, where $f$ is $1$-Lipschitz w.r.t. the $\ell_{2}^{N}$-norm for $N=n(n-1)/2$ (the factor $1/\sqrt{2}$ accounts for each off-diagonal entry appearing twice in the symmetric matrix) and separately convex, and $(A_{ij}-{\mathbb{E}}A_{ij},1\leq i<j\leq n)$ is a family of $N$ independent random variables. Moreover, for each $i<j$, $(A-{\mathbb{E}}A)_{ij}\in[-1-\delta(2p-1),1-\delta(2q-1)]$, which is a convex compact set with diameter $B=2(1+\delta(p-q))\leq 4$. Therefore, it follows from Theorem 8 that for all $t>0$, with probability at least $1-\exp(-t^{2}/16)$, $Z\leq{\mathbb{E}}Z+t$, that is, $\left\|W\right\|_{\mbox{op}}\leq{\mathbb{E}}\left\|W\right\|_{\mbox{op}}+\sqrt{2}t$. We finish the proof of Lemma 3 by taking $t=4\sqrt{\delta\nu n}$ and using the bound from (8.4).

It follows from Lemma 3 that when $n\nu\delta\geq\log n$, $\left\|W\right\|_{\mbox{op}}\leq 184\sqrt{n\delta\nu}$ with probability at least $1-\exp(-\delta\nu n)$. Combining this latter result with Lemma 2 concludes the proof of Proposition 3.

9 Annex 2: Solving SDPs in practice

The practical implementation of our approach to the problems of synchronization, signed clustering and MAX-CUT requires solving a convex optimization problem. In the present section, we describe the various algorithms we used for solving these SDPs.

9.1 Pierra’s method

For SDPs with constraints on the entries, we propose a simple modification of the method initially proposed by Pierra in [84]. Let $f:\mathbb{R}^{n\times n}\to\mathbb{R}$ be a convex function. Let $\mathcal{C}$ denote a convex set which can be written as the intersection of convex sets $\mathcal{C}=\mathcal{S}_{1}\cap\cdots\cap\mathcal{S}_{J}$. Let us define $\mathbb{H}=\mathbb{R}^{n\times n}\times\cdots\times\mathbb{R}^{n\times n}$ ($J$ times) and let $\mathbb{D}$ denote the (diagonal) subspace of $\mathbb{H}$ of vectors of the form $(Z,\ldots,Z)$. In this new formalism, the problem can be reformulated as a minimisation problem over the intersection of two sets only, i.e.

min𝖹(1Jj=1Jf(𝖹j):𝖹=(𝖹j)j=1J(𝒮1××𝒮J)𝔻).\min_{{\sf Z}\in\mathbb{H}}\left(\frac{1}{J}\ \sum_{j=1}^{J}\ f({\sf Z}_{j}):{\sf Z}=({\sf Z}_{j})_{j=1}^{J}\in\left(\mathcal{S}_{1}\times\cdots\times\mathcal{S}_{J}\right)\cap\mathbb{D}\right).

Define $F({\sf Z})=\frac{1}{J}\sum_{j=1}^{J}f({\sf Z}_{j})$. The algorithm proposed by Pierra in [84] consists of performing the following iterations

${\sf Z}^{p+1}=\operatorname{Prox}_{I_{\mathcal{S}_{1}\times\cdots\times\mathcal{S}_{J}}+\frac{1}{2}\varepsilon F}\left({\sf B}^{p}\right)\mbox{ and }{\sf B}^{p+1}=\operatorname{Proj}_{\mathbb{D}}({\sf Z}^{p+1}).$
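To make the iteration concrete, here is a minimal Python sketch, assuming, as in the applications below, that the cost is linear, $f(Z)=-\langle A,Z\rangle$, so that the prox step reduces to projecting ${\sf B}^{p}+\frac{\varepsilon}{2J}A$ onto each $S_{j}$ (this reduction is derived in Section 9.1.1). The step size `eps` and iteration count are illustrative choices, not tuned.

```python
import numpy as np

# A minimal sketch of Pierra's product-space iteration for a linear cost
# f(Z) = -<A, Z>; `projections` is a list of J functions, each projecting
# onto a single constraint set S_j.
def pierra(A, projections, eps=1.0, n_iter=500):
    J = len(projections)
    B = np.zeros_like(A)  # common block of the point (B, ..., B) in the diagonal subspace D
    for _ in range(n_iter):
        # prox step: blockwise, Prox_{I_{S_j} + (eps/2J) f}(B) = P_{S_j}(B + (eps/2J) A)
        Zs = [P(B + (eps / (2 * J)) * A) for P in projections]
        # projection onto the diagonal subspace D: average the J blocks
        B = sum(Zs) / J
    return B
```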

9.1.1 Application to community detection

Let us now present the computational details of Pierra's method for the community detection problem. We estimate the membership matrix $\bar{Z}$ via the following SDP estimator

Z^argmaxZ𝒞A,Z,\hat{Z}\in\text{argmax}_{Z\in\mathcal{C}}\langle A,Z\rangle,

where 𝒞={Zn×n,Z0,Z0,diag(Z)In,Zijλ}\mathcal{C}=\{Z\in\mathbb{R}^{n\times n},Z\succeq 0,Z\geq 0,{\rm diag}(Z)\preceq I_{n},\sum Z_{ij}\leq\lambda\} and λ=i,j=1nZ¯ij=k=1K|𝒞k|2\lambda=\sum_{i,j=1}^{n}\bar{Z}_{ij}=\sum_{k=1}^{K}|\mathcal{C}_{k}|^{2} denotes the number of nonzero elements in the membership matrix Z¯\bar{Z}. The motivation for this approach stems from the fact that the membership matrix Z¯\bar{Z} is actually the oracle, i.e., Z=Z¯Z^{*}=\bar{Z} , where ZargmaxZ𝒞𝔼[A],ZZ^{*}\in\text{argmax}_{Z\in\mathcal{C}}\langle\mathbb{E}[A],Z\rangle. The function ff to minimize in the Pierra algorithm is defined as f(Z)=A,Zf(Z)=-\langle A,Z\rangle.

Let us denote by 𝕊+\mathbb{S}_{+} the set of symmetric positive semi-definite matrices in n×n\mathbb{R}^{n\times n}. The set 𝒞\mathcal{C} is the intersection of the sets

$S_{1}=\mathbb{S}_{+}$, $S_{2}=\left\{Z\in\mathbb{R}^{n\times n}\mid Z\geq 0\right\}$, $S_{3}=\left\{Z\in\mathbb{R}^{n\times n}\mid\text{diag}(Z)\preceq I\right\}$
and $S_{4}=\left\{Z\in\mathbb{R}^{n\times n}\mid\sum_{i,j=1}^{n}Z_{ij}\leq\lambda\right\}.$

We now compute, for all ${\sf B}=({\sf B}_{j})_{j=1}^{4}\in({\mathbb{R}}^{n\times n})^{4}$ and $j=1,\ldots,4$ (here $J=4$),

$\operatorname{Prox}_{I_{S_{1}\times\cdots\times S_{4}}+\frac{1}{2}\varepsilon F}\left({\sf B}\right)_{j}=\operatorname{Prox}_{I_{S_{j}}+\frac{1}{2J}\varepsilon f}\left({\sf B}_{j}\right).$

We have for J=4J=4

ProxISj+12Jεf(𝖡𝗃)=argminZSjϵ2JA,Z+12Z𝖡jF2=PSj(𝖡j+ϵ2JA)\operatorname{Prox}_{I_{S_{j}}+\frac{1}{2J}\varepsilon f}\left(\sf{B}_{j}\right)=\text{argmin}_{Z\in S_{j}}\ -\frac{\epsilon}{2J}\ \langle A,Z\rangle+\frac{1}{2}\ \|Z-{\sf B}_{j}\|_{F}^{2}=P_{S_{j}}\left({\sf B}_{j}+\frac{\epsilon}{2J}\ A\right)

On the other hand, the projections operators PSj,j=1,2,3,4P_{S_{j}},j=1,2,3,4 are given by

$P_{S_{1}}({\sf Z}_{1})=U\max\left\{\Sigma,0\right\}U^{\top}$, where ${\sf Z}_{1}$ has SVD ${\sf Z}_{1}=U\Sigma U^{\top}$,
$P_{S_{2}}({\sf Z}_{2})=\max\left\{{\sf Z}_{2},0\right\}$, $P_{S_{3}}({\sf Z}_{3})={\sf Z}_{3}-\text{diag}({\sf Z}_{3})+\min\left\{1,\text{diag}({\sf Z}_{3})\right\}$
and $P_{S_{4}}({\sf Z}_{4})=\frac{\lambda}{\sum_{ij}({\sf Z}_{4})_{ij}}\,{\sf Z}_{4}.$

To sum up, Pierra’s method can be formulated as follows.

At each iteration $k\in\mathbb{N}$, compute the SVD of ${\sf B}_{1}^{k}+\frac{\epsilon}{2\cdot 4}A=U^{k}\Sigma^{k}(U^{k})^{\top}$. Then compute, for all $j=1,\ldots,4$,

${\sf B}^{k+1}_{j}=\frac{1}{4}\Bigg{(}U^{k}\max\left\{\Sigma^{k},0\right\}(U^{k})^{\top}+\max\left\{{\sf B}_{2}^{k}+\frac{\epsilon}{2\cdot 4}A,0\right\}+{\sf B}_{3}^{k}+\frac{\epsilon}{2\cdot 4}A-\text{diag}\left({\sf B}_{3}^{k}+\frac{\epsilon}{2\cdot 4}A\right)$
$\hskip 56.9055pt+\min\left\{1,\text{diag}\left({\sf B}_{3}^{k}+\frac{\epsilon}{2\cdot 4}A\right)\right\}+\frac{\lambda}{\sum_{ij}({\sf B}_{4}^{k}+\frac{\epsilon}{2\cdot 4}A)_{ij}}\left({\sf B}_{4}^{k}+\frac{\epsilon}{2\cdot 4}A\right)\Bigg{)}.$
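For illustration, here is a sketch of the four projections $P_{S_{1}},\ldots,P_{S_{4}}$ in Python, to be plugged into the generic `pierra` sketch above as `projections=[P1, P2, P3, lambda Z: P4(Z, lam)]`. Following the display above, `P4` rescales the matrix; as an assumption on our part, we only apply the rescaling when the budget constraint is actually violated, so that feasible points are left unchanged.

```python
import numpy as np

def P1(Z):
    # projection onto the PSD cone: keep the non-negative part of the spectrum
    w, U = np.linalg.eigh((Z + Z.T) / 2)
    return (U * np.maximum(w, 0)) @ U.T

def P2(Z):
    # entrywise non-negativity constraint Z >= 0
    return np.maximum(Z, 0)

def P3(Z):
    # diag(Z) <= I: replace the diagonal by min(1, diag(Z))
    Z = Z.copy()
    np.fill_diagonal(Z, np.minimum(np.diag(Z), 1.0))
    return Z

def P4(Z, lam):
    # budget constraint sum_ij Z_ij <= lam: rescale as in the display above,
    # applied only when the constraint is violated (our assumption)
    s = Z.sum()
    return (lam / s) * Z if s > lam else Z
```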

9.1.2 Application to signed clustering

Let us now turn to the signed clustering problem. We estimate the membership matrix $\bar{Z}$ via the following SDP estimator $\hat{Z}\in\text{argmax}_{Z\in\mathcal{C}}\ \langle A,Z\rangle$, where $\mathcal{C}=\{Z\in\mathbb{R}^{n\times n}:Z\succeq 0,Z_{ij}\in[0,1],Z_{ii}=1,i=1,\ldots,n\}$. As in the community detection case, the function $f$ to minimize in the Pierra algorithm is defined as $f(Z)=-\langle A,Z\rangle$.

Let us denote by 𝕊+\mathbb{S}_{+} the set of symmetric positive semi-definite matrices in n×n\mathbb{R}^{n\times n}. The set 𝒞\mathcal{C} is the intersection of the sets S1=𝕊+S_{1}=\mathbb{S}_{+}, S2={Zn×nZ[0,1]n×n}S_{2}=\left\{Z\in\mathbb{R}^{n\times n}\mid Z\in[0,1]^{n\times n}\right\} and S3={Zn×nZii=1,i=1,,n}.S_{3}=\left\{Z\in\mathbb{R}^{n\times n}\mid Z_{ii}=1,\ i=1,\ldots,n\right\}.

As before, for j=1,,3j=1,\ldots,3

$\operatorname{Prox}_{I_{S_{j}}+\frac{1}{2\cdot 3}\varepsilon f}\left({\sf B}_{j}\right)=P_{S_{j}}\left({\sf B}_{j}+\frac{\epsilon}{2\cdot 3}A\right)$

and the projection operators $P_{S_{j}}$, $j=1,2,3$, are given by

$P_{S_{1}}({\sf Z}_{1})=U\max\left\{\Sigma,0\right\}U^{\top}$, where ${\sf Z}_{1}$ has SVD ${\sf Z}_{1}=U\Sigma U^{\top}$, $P_{S_{2}}({\sf Z}_{2})=\min\left\{\max\left\{{\sf Z}_{2},0\right\},1\right\}$ and $P_{S_{3}}({\sf Z}_{3})={\sf Z}_{3}-\text{diag}({\sf Z}_{3})+I.$

To sum up, Pierra’s method can be formulated as follows.

At each iteration kk, compute the SVD of 𝖡1k+ϵ23A=UkΣk(Uk){\sf B}_{1}^{k}+\frac{\epsilon}{2\cdot 3}A=U^{k}\Sigma^{k}(U^{k})^{\top}. Then compute for all j=1,,3j=1,\ldots,3

${\sf B}^{k+1}_{j}=\frac{1}{3}\Bigg{(}U^{k}\max\left\{\Sigma^{k},0\right\}(U^{k})^{\top}+\min\left\{\max\left\{{\sf B}_{2}^{k}+\frac{\epsilon}{2\cdot 3}A,0\right\},1\right\}+{\sf B}_{3}^{k}+\frac{\epsilon}{2\cdot 3}A-\text{diag}\left({\sf B}_{3}^{k}+\frac{\epsilon}{2\cdot 3}A\right)+I\Bigg{)}.$
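The only changes relative to the community detection sketch are the last two projections; `P1` (the PSD projection) is as before, and the iteration is again the generic `pierra` sketch, now with $J=3$.

```python
import numpy as np

def P2(Z):
    # entrywise constraint Z_ij in [0, 1]
    return np.clip(Z, 0.0, 1.0)

def P3(Z):
    # unit diagonal: Z - diag(Z) + I
    Z = Z.copy()
    np.fill_diagonal(Z, 1.0)
    return Z
```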

9.2 The Burer-Monteiro approach and the Manopt Solver

To solve the MAX-CUT and Angular Synchronization problems, we rely on Manopt, a freely available Matlab toolbox for optimization on manifolds [17]. Manopt runs the Riemannian Trust-Region method on the corresponding Burer-Monteiro non-convex problem with rank bounded by $p$, as follows. The Burer-Monteiro approach consists of replacing the optimization of a linear function $\bigl{<}A,Z\bigr{>}$ over the convex set $\mathcal{Z}=\{Z\succeq 0:\mathcal{A}(Z)=b\}$ with the optimization of the quadratic function $\bigl{<}AY,Y\bigr{>}$ over the non-convex set $\mathcal{Y}=\{Y\in\mathbb{R}^{n\times p}:\mathcal{A}(YY^{T})=b\}$.

In the context of the MAX-CUT problem, the Burer-Monteiro approach amounts to the following steps. Denoting by $Z$ the rank-one positive semidefinite matrix $Z=zz^{T}$ associated with a cut vector $z\in\{-1,1\}^{n}$, both the cost function and the constraints can be expressed linearly in terms of $Z$. Dropping the NP-hard rank-one constraint on $Z$, we arrive at the well-known convex relaxation of MAX-CUT from [49]

Z^argminZ𝒞<A,Z>\hat{Z}\in\underset{Z\in\mathcal{C}}{\mathrm{argmin}}\bigl{<}A,Z\bigr{>} (9.1)

where 𝒞:={Zn×n:Z0,Zii=1,i=1,,n}\mathcal{C}:=\{Z\in{\mathbb{R}}^{n\times n}:Z\succeq 0,Z_{ii}=1,\forall i=1,\ldots,n\}.

If a solution Z^\hat{Z} of this SDP has rank 1, then Z^=zzT\hat{Z}=z^{*}z^{*^{T}} for some zz^{*}, which then gives the optimal cut. Recall that in the general case of higher rank Z^\hat{Z}, Goemans and Williamson [49] introduced the celebrated rounding scheme that extracts approximately optimal cuts within a ratio of 0.878 from Z^\hat{Z}. The corresponding Burer-Monteiro non-convex problem with rank bounded by pp is given by

$\hat{Y}\in\underset{Y\in\mathcal{B}}{\mathrm{argmin}}\bigl{<}AY,Y\bigr{>}$ (9.2)

where $\mathcal{B}:=\{Y\in{\mathbb{R}}^{n\times p}:\text{diag}(YY^{T})=1\}$. Note that the constraint $\text{diag}(YY^{T})=1$ requires each row of $Y$ to have unit $\ell_{2}^{p}$-norm, rendering $Y$ a point on the Cartesian product of $n$ unit spheres ${\cal S}_{2}^{p-1}$ in ${\mathbb{R}}^{p}$, which is a smooth manifold. Also note that the search space of the SDP is compact, since all $Z$ feasible for the SDP have identical trace equal to $n$.

If the convex set $\mathcal{Z}$ is compact and $m$ denotes the number of constraints, it holds that whenever $p$ satisfies $\frac{p(p+1)}{2}\geq m$, the two problems share the same global optimum [12, 22]. Building on the pioneering work of Burer and Monteiro [22], Boumal et al. [18] showed that if the set $\mathcal{Z}$ is compact and the set $\mathcal{Y}$ is a smooth manifold, then $\frac{p(p+1)}{2}\geq m$ implies that, for almost all cost matrices $A$, global optimality is achieved by any $Y$ satisfying the second-order necessary optimality conditions. Following [18], for $p=\lceil\sqrt{2n}\rceil$ and for almost all matrices $A$, even though (9.2) is non-convex, any local optimum $Y$ is a global optimum (and so is $Z=YY^{T}$), and all saddle points have an escape direction (the Hessian has a negative eigenvalue). Note that for $p>n/2$ the same statement holds for all $A$, as previously established in [19].
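For readers who prefer a self-contained illustration over the Manopt toolbox, here is a minimal Python sketch of the Burer-Monteiro factorization for (9.2). It runs plain Riemannian gradient descent on the product of unit spheres (whereas Manopt uses a Riemannian trust-region method), followed by the hyperplane rounding of [49]; the step size and iteration count are illustrative choices, not tuned.

```python
import numpy as np

def burer_monteiro_maxcut(A, p=None, lr=1e-3, n_iter=2000, seed=0):
    # minimize <AY, Y> over matrices Y with unit-norm rows (the manifold B above)
    n = A.shape[0]
    if p is None:
        p = int(np.ceil(np.sqrt(2 * n)))   # rank bound p = ceil(sqrt(2n)), as in [18]
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n, p))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    for _ in range(n_iter):
        G = 2 * A @ Y                                    # Euclidean gradient of Y -> <AY, Y>
        G -= np.sum(G * Y, axis=1, keepdims=True) * Y    # project onto the tangent spaces
        Y -= lr * G                                      # descent step
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)    # retract back onto the spheres
    return Y

def hyperplane_rounding(Y, seed=0):
    # Goemans-Williamson rounding: cut given by the signs of a random projection
    g = np.random.default_rng(seed).standard_normal(Y.shape[1])
    return np.sign(Y @ g)
```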

10 Numerical experiments

This section contains the outcome of numerical experiments on the three application problems considered: signed clustering, MAX-CUT, and angular synchronization.

10.1 Signed Clustering

To assess the effectiveness of the SDP relaxation, we consider the following experimental setup. We generate synthetic networks following the signed stochastic block model (SSBM) previously described in Section 4.1.1, with $K=5$ communities. To quantify the effectiveness of the SDP relaxation, we compare the accuracy of a suite of algorithms from the signed clustering literature, before (that is, when we run these algorithms directly on $A$) and after the SDP relaxation (that is, when we run the very same algorithms on $\hat{Z}$). To measure the recovery quality of the clustering results, for a given set of indicator vectors $x_{1},\ldots,x_{K}$, we rely on the error rate considered in [24], defined as

γ=c=1KxcTAcomxc+xcTLcom+xcn2,\gamma=\sum_{c=1}^{K}\frac{x_{c}^{T}A_{com}^{-}x_{c}+x_{c}^{T}L_{com}^{+}x_{c}}{n^{2}}, (10.1)

where $x_{c}$ denotes a cluster indicator vector, $A_{com}$ ($={\mathbb{E}}A$) is the complete $K$-weakly balanced ground truth network – with $1$'s on the diagonal blocks corresponding to intra-cluster edges, and $-1$'s otherwise – with $A_{com}=A_{com}^{+}-A_{com}^{-}$, and $L_{com}^{+}$ denotes the combinatorial graph Laplacian corresponding to $A_{com}^{+}$. Note that $x_{c}^{T}A_{com}^{-}x_{c}$ counts the number of violations within the clusters (since negative edges should not be placed within clusters) and $x_{c}^{T}L_{com}^{+}x_{c}$ counts the number of violations across clusters (since positive edges should not belong to the cut). Overall, (10.1) essentially counts the fraction of intra-cluster and inter-cluster edge violations, with respect to the full ground truth matrix. Note that this definition can also easily be adjusted to work on real data sets, where the ground truth matrix $A_{com}$ is not available, in which case one replaces it with the empirical observation $A$.
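As an illustration, the error rate (10.1) can be computed along the following lines, assuming (our notation) that `labels` is a length-$n$ vector of cluster assignments in $\{0,\ldots,K-1\}$ and `A_com` is the $\pm 1$ ground truth matrix.

```python
import numpy as np

def clustering_error(A_com, labels, K):
    # gamma of (10.1): fraction of intra- and inter-cluster edge violations
    n = A_com.shape[0]
    A_plus = np.maximum(A_com, 0)                    # positive part A_com^+
    A_minus = np.maximum(-A_com, 0)                  # negative part A_com^-
    L_plus = np.diag(A_plus.sum(axis=1)) - A_plus    # combinatorial Laplacian of A_com^+
    gamma = 0.0
    for c in range(K):
        x = (labels == c).astype(float)              # cluster indicator vector x_c
        gamma += x @ A_minus @ x + x @ L_plus @ x
    return gamma / n**2
```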

In terms of the signed clustering algorithms compared, we consider the following algorithms from the literature. One straightforward approach is to simply rely on the spectrum of the observed adjacency matrix $A$. Kunegis et al. [65] proposed spectral tools for clustering, link prediction, and visualization of signed graphs, by solving a 2-way “signed” ratio-cut problem based on the combinatorial Signed Laplacian [59] $\bar{L}=\bar{D}-A$, where $\bar{D}$ is the diagonal matrix with $\bar{D}_{ii}=\sum_{j=1}^{n}|A_{ij}|$. The same authors proposed signed extensions for the random-walk Laplacian $\bar{L}_{\text{rw}}=I-\bar{D}^{-1}A$ and the symmetric graph Laplacian $\bar{L}_{\text{sym}}=I-\bar{D}^{-1/2}A\bar{D}^{-1/2}$, the latter being particularly suitable for skewed degree distributions. Finally, the last algorithm we considered is BNC of Chiang et al. [25], who introduced a formulation based on the Balanced Normalized Cut objective

$\operatorname{min}_{\{x_{1},\ldots,x_{K}\}\in\mathcal{I}}\left(\sum_{c=1}^{K}\frac{x_{c}^{T}(D^{+}-A)x_{c}}{x_{c}^{T}\bar{D}x_{c}}\right),$ (10.2)

which, in light of the decomposition D+A=D+(A+A)=D+A++A=L++AD^{+}-A=D^{+}-(A^{+}-A^{-})=D^{+}-A^{+}+A^{-}=L^{+}+A^{-}, is effectively minimizing the number of violations in the clustering procedure.

In our experiments, we first compute the error rate γbefore\gamma_{before} of all algorithms on the original SSBM graph (shown in Column 1 of Figure 1), and then we repeat the procedure but with the input to all signed clustering algorithms being given by the output of the SDP relaxation, and denote the resulting recovery error by γafter\gamma_{after}. The third column of the same Figure 1 shows the difference in errors γδ=γbeforeγafter\gamma_{\delta}=\gamma_{before}-\gamma_{after} between the first and second columns, while the fourth column contains a histogram of the error differences γδ\gamma_{\delta}. This altogether illustrates the fact that the SDP relaxation does improve the performance of all signed clustering algorithms, except L¯\bar{L}, and could effectively be used as a denoising pre-processing step.

[Figure 1: 5×4 grid of panels; rows: $A$, $\bar{L}$, $\bar{L}_{rw}$, $\bar{L}_{sym}$, BNC; columns: Before, After, Delta, Histogram.]
Figure 1: Summary of results for the Signed Clustering problem. The first column denotes the recovery error before the SDP relaxation step, meaning that we consider a number of signed clustering algorithms from the literature which we apply directly to the initial adjacency matrix $A$. The second column contains the results when applying the same suite of algorithms after the SDP relaxation. The third column shows the difference in errors between the first and second columns, while the fourth column contains a histogram of the delta errors. This altogether illustrates the fact that the SDP relaxation does improve the performance of all signed clustering algorithms except $\bar{L}$.

10.2 Max-Cut

For the MAX-CUT problem, we consider two sets of numerical experiments. First, we consider a version of the stochastic block model which essentially perturbs a complete bipartite graph

$\mathbf{B}=\begin{pmatrix}\mbox{\boldmath$0$}_{n_{1}\times n_{1}}&\mbox{\boldmath$1$}_{n_{1}\times n_{2}}\\ \mbox{\boldmath$1$}_{n_{2}\times n_{1}}&\mbox{\boldmath$0$}_{n_{2}\times n_{2}}\end{pmatrix},$ (10.3)

where 𝟏n1×n2\mbox{\boldmath$1$}_{n_{1}\times n_{2}} (respectively, 𝟎n1×n2\mbox{\boldmath$0$}_{n_{1}\times n_{2}}) denotes an n1×n2n_{1}\times n_{2} matrix of all ones, respectively, all zeros. In our experiments, we set n1=n2=n2n_{1}=n_{2}=\frac{n}{2}, and fix n=500n=500. We perturb 𝐁\mathbf{B} by deleting edges across the two partitions, and inserting edges within each partition. More specifically, we generated the full adjacency matrix A0A^{0} from 𝐁\mathbf{B} by adding edges independently with probability η\eta within each partition (i.e., along the diagonal blocks in (10.3)). Finally, we denote by AA the masked version we observe, A=A0SA=A^{0}\circ S, where SS denotes the adjacency matrix of an Erdős-Rényi(nn, δ\delta) graph. The graph shown in Figure 2 is an instance of the above generative model.
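A sketch of this generative model is given below; the function name and default values are ours, chosen to match the description above.

```python
import numpy as np

def perturbed_bipartite(n=500, eta=0.1, delta=0.5, seed=0):
    # start from the complete bipartite graph B, add intra-block edges with
    # probability eta, then observe A = A0 masked entrywise by an
    # Erdos-Renyi(n, delta) sampling graph S
    rng = np.random.default_rng(seed)
    n1 = n // 2
    A0 = np.zeros((n, n))
    A0[:n1, n1:] = 1.0
    A0[n1:, :n1] = 1.0                               # complete bipartite graph B
    noise = np.triu(rng.random((n, n)) < eta, 1).astype(float)
    noise = noise + noise.T                          # symmetric intra-block noise
    A0[:n1, :n1] = noise[:n1, :n1]                   # edges inserted within block 1
    A0[n1:, n1:] = noise[n1:, n1:]                   # edges inserted within block 2
    S = np.triu(rng.random((n, n)) < delta, 1).astype(float)
    S = S + S.T                                      # Erdos-Renyi(n, delta) mask
    return A0 * S                                    # the observed graph A
```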

Figure 2: Illustration of Max-Cut in the setting of a perturbation of a complete bipartite graph.

Note that for small values of $\eta$ we expect the maximum cut to occur across the initial partition $\mathcal{P}_{\mathbf{B}}$ in the clean bipartite graph $\mathbf{B}$, which we aim to recover as we sparsify the observed graph $A$. The heatmap in the left of Figure 3 shows the Adjusted Rand Index (ARI) between the initial partition $\mathcal{P}_{\mathbf{B}}$ and the partition obtained from the Max-Cut SDP relaxation in (9.1), as we vary the noise parameter $\eta$ and the sparsity $\delta$. As expected, for a fixed level of noise $\eta$, we are able to recover the hypothetically optimal Max-Cut for suitable levels of the sparsity parameter. The heatmap in the right of Figure 3 shows the computational running time as we vary the two parameters, showing that the Manopt solver takes the longest to solve dense noisy problems, as one would expect.

[Figure 3 panels: (a) Adjusted Rand Index; (b) Running times (MANOPT).]
Figure 3: Numerical results for MAX-CUT on a perturbed complete bipartite graph, as we vary the noise level η\eta and the sampling sparsity δ\delta. Results are averaged over 20 runs.

In the second set of experiments, shown in Figure 4, we consider a graph $A^{0}$ chosen at random from the Gset collection of benchmark graphs (http://web.stanford.edu/~yyye/yyye/Gset/), where we vary the sparsity level $\delta$, and show the Max-Cut value attained on the original full graph $A^{0}$, but using the Max-Cut partition computed by the SDP relaxation (9.1) on the sparsified graph $A$.

Figure 4: Max-Cut results for the G53 benchmark graph (from the Gset collection) with n=1000n=1000 nodes and average degree 12\approx 12. Results are averaged over 20 runs.

10.3 Angular Synchronization

For the angular synchronization problem, we consider the following experimental setup: we assess the quality of the angular solution recovered from the SDP relaxation as we vary the two parameters of interest. On the $x$-axis of the plots in Figures 5 and 6, we vary the noise level $\sigma$, under two different noise models, Gaussian and outliers. On the $y$-axis, we vary the sparsity of the sampling graph.

We measure the quality of the recovered angles via the Mean Squared Error (MSE), defined as follows. Since a solution can only be recovered up to a global shift, one needs an MSE error that mods out such a degree of freedom. The following MSE is also more broadly applicable for the case when the underlying group is the orthogonal group O(d)O(d), as opposed to SO(2)SO(2), as in the present work, where one can replace the unknown angles θ1,,θn\theta_{1},\ldots,\theta_{n} with their respective representation as 2×22\times 2 rotation matrices h1,,hnO(2)h_{1},\ldots,h_{n}\in\text{O}(2). To that end, we look for an optimal orthogonal transformation O^O(2)\hat{O}\in\text{O}(2) that minimizes the sum of squared distances between the estimated orthogonal transformations and the ground truth measurements

O^=argminOO(2)i=1nhiOh^iF2,\hat{O}=\underset{O\in O(2)}{\operatorname{argmin}}\sum_{i=1}^{n}\|h_{i}-O\hat{h}_{i}\|_{F}^{2}, (10.4)

where h^1,,h^nO(2)\hat{h}_{1},\ldots,\hat{h}_{n}\in\text{O}(2) denote the 2×22\times 2 rotation matrix representation of the estimated angles θ^1,,θ^n\hat{\theta}_{1},\ldots,\hat{\theta}_{n}. In other words, O^\hat{O} is the optimal solution to the alignment problem between two sets of orthogonal transformations, in the least-squares sense. Following the analysis of [91], and making use of properties of the trace, one arrives at

i=1nhiOh^iF2\displaystyle\sum_{i=1}^{n}\|h_{i}-O\hat{h}_{i}\|_{F}^{2} =\displaystyle= i=1nTrace[(hiOh^i)(hiOh^i)T]\displaystyle\sum_{i=1}^{n}\;\mathrm{Trace}\left[\left(h_{i}-O\hat{h}_{i}\right)\left(h_{i}-O\hat{h}_{i}\right)^{T}\right] (10.5)
=\displaystyle= i=1nTrace[2I2Oh^ihiT]=4n2Trace[Oi=1nh^ihiT].\displaystyle\sum_{i=1}^{n}\;\mathrm{Trace}\left[2I-2O\hat{h}_{i}h_{i}^{T}\right]=4n-2\;\mathrm{Trace}\left[O\sum_{i=1}^{n}\hat{h}_{i}h_{i}^{T}\right].

If we let QQ denote the 2×22\times 2 matrix

Q=1ni=1nh^ihiT,Q=\frac{1}{n}\sum_{i=1}^{n}\hat{h}_{i}h_{i}^{T}, (10.6)

it follows from (10.5) that the MSE is given by minimizing

1ni=1nhiOh^iF2=42Tr(OQ).\frac{1}{n}\sum_{i=1}^{n}\|h_{i}-O\hat{h}_{i}\|_{F}^{2}=4-2Tr(OQ). (10.7)

In [7] it is proven that $Tr(OQ)\leq Tr(VU^{T}Q)$ for all orthogonal matrices $O$ (the result is stated there for $O(3)$, but it holds in any dimension), where $Q=U\Sigma V^{T}$ is the singular value decomposition of $Q$. Therefore, the MSE is minimized by the orthogonal matrix $\hat{O}=VU^{T}$ and is given by

MSE=def1ni=1nhiO^h^iF2=42Trace(VUTUΣVT)=42(σ1+σ2),\textsc{MSE}\;\mathrel{\overset{\makebox[0.0pt]{\mbox{\tiny def}}}{=}}\;\frac{1}{n}\sum_{i=1}^{n}\|h_{i}-\hat{O}\hat{h}_{i}\|_{F}^{2}=4-2\;\mathrm{Trace}(VU^{T}U\Sigma V^{T})=4-2(\sigma_{1}+\sigma_{2}), (10.8)

where $\sigma_{1},\sigma_{2}$ are the singular values of $Q$. Therefore, whenever $Q$ is an orthogonal matrix, for which $\sigma_{1}=\sigma_{2}=1$, the MSE vanishes. Indeed, the numerical experiments (on a log scale) in Figures 5 and 6 confirm that for noiseless data the MSE is very close to zero. Furthermore, as one would expect, under favorable noise regimes and sparsity levels, we have almost perfect recovery, by both the SDP and the spectral relaxations, under both noise models.
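In practice, the MSE (10.8) can be computed directly from the singular values of $Q$; here is a sketch, assuming `h` and `h_hat` are arrays of shape $(n,2,2)$ stacking the ground truth and estimated rotation matrices.

```python
import numpy as np

def synchronization_mse(h, h_hat):
    # MSE of (10.8): align via the optimal orthogonal transform O = V U^T,
    # where U S V^T is the SVD of Q from (10.6)
    n = h.shape[0]
    Q = np.einsum('nij,nkj->ik', h_hat, h) / n   # Q = (1/n) sum_i h_hat_i h_i^T
    s = np.linalg.svd(Q, compute_uv=False)       # singular values of Q
    return 4.0 - 2.0 * s.sum()                   # MSE = 4 - 2 (sigma_1 + sigma_2)
```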

[Figure 5 panels: (a) Spectral relaxation; (b) SDP relaxation (solved via MANOPT).]
Figure 5: Recovery rates (MSE (10.8) - the lower the better) for angular synchronization with n=500n=500, under the Gaussian noise model, as we vary the noise level σ\sigma and the sparsity pp of the measurement graph. Averaged over 20 runs.
[Figure 6 panels: (a) Spectral relaxation; (b) SDP relaxation (solved via MANOPT).]
Figure 6: Recovery rates (MSE (10.8) - the lower the better) for angular synchronization with n=500n=500, under the Outlier noise model, as we vary the noise level γ\gamma and the sparsity pp of the measurement graph. Averaged over 20 runs.

11 Conclusions and future work

There are a number of other graph-based problems amenable to SDP relaxations, for which a similar theoretical analysis of their SDP-based estimators could be suitable. For example, the recent work of [34] considered a problem motivated by geosciences and engineering applications of recovering a smooth unknown function f:Gf:G\rightarrow\mathbb{R} (where G=[a,b]G=[a,b] is known) from noisy observations of its mod 1 values, which is also amenable to a solution based on an SDP relaxation solved via a Burer-Monteiro approach. Another potential application concerns the problem of clustering directed graphs, as in the very recent work of [33] that proposed a spectral algorithm based on Hermitian matrices; this problem is also amenable to an SDP relaxation.

Our theoretical and practical findings show that running algorithms (such as spectral methods) directly on $A$ may be improved upon by first computing an SDP estimator $\hat{Z}$ and running the very same algorithms on $\hat{Z}$ instead of $A$. In effect, $\hat{Z}$ acts as a pre-processing de-noising step which improves the recovery of the hidden signal, such as the community membership vectors.

Acknowledgements

Mihai Cucuringu acknowledges support from the EPSRC grant EP/N510129/1 at The Alan Turing Institute. Guillaume Lecué acknowledges support from a grant of the French National Research Agency (ANR), “Investissements d’Avenir” (LabEx Ecodec/ANR-11-LABX-0047).

References

  • [1] Epinions data set. https://snap.stanford.edu/data/soc-Epinions1.html. Accessed: 2010-09-30.
  • [2] Slashdot data set. https://snap.stanford.edu/data/soc-Slashdot0902.html. Accessed: 2010-09-30.
  • [3] Emmanuel Abbe, Afonso S Bandeira, and Georgina Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2015.
  • [4] Saeed Aghabozorgi, Ali Seyed Shirkhorshidi, and Teh Ying Wah. Time-series clustering–a decade review. Information Systems, 53:16–38, 2015.
  • [5] Brendan PW Ames. Guaranteed clustering and biclustering via semidefinite programming. Mathematical Programming, 147(1-2):429–465, 2014.
  • [6] Miguel F Anjos and Jean B Lasserre. Handbook on semidefinite, conic and polynomial optimization, volume 166. Springer Science & Business Media, 2011.
  • [7] K. Arun, T. Huang, and S. Bolstein. Least-squares fitting of two 3-D point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(5):698–700, 1987.
  • [8] Afonso S Bandeira, Moses Charikar, Amit Singer, and Andy Zhu. Multireference alignment using semidefinite programming. In Proceedings of the 5th conference on Innovations in theoretical computer science, pages 459–470. ACM, 2014.
  • [9] Afonso S Bandeira, Ramon Van Handel, et al. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. The Annals of Probability, 44(4):2479–2506, 2016.
  • [10] Sujogya Banerjee, Kaushik Sarkar, Sedat Gokalp, Arunabha Sen, and Hasan Davulcu. Partitioning signed bipartite graphs for classification of individuals and organizations. In International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, pages 196–204. Springer, 2012.
  • [11] Peter L. Bartlett and Shahar Mendelson. Empirical minimization. Probab. Theory Related Fields, 135(3):311–334, 2006.
  • [12] A. I. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete & Computational Geometry, 13(2):189–202, Mar 1995.
  • [13] Lucien Birgé and Pascal Massart. Rates of convergence for minimum contrast estimators. Probab. Theory Related Fields, 97(1-2):113–150, 1993.
  • [14] Grigoriy Blekherman, Pablo A Parrilo, and Rekha R Thomas. Semidefinite optimization and convex algebraic geometry. SIAM, 2012.
  • [15] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008.
  • [16] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013. ISBN 978-0-19-953525-5.
  • [17] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15:1455–1459, 2014.
  • [18] N. Boumal, V. Voroninski, and A.S. Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In Advances in Neural Information Processing Systems 29, pages 2757–2765. 2016.
  • [19] Nicolas Boumal. A Riemannian low-rank method for optimization over semidefinite matrices with block-diagonal constraints. arXiv preprint arXiv:1506.00575, 2015.
  • [20] Stephen Boyd, Laurent El Ghaoui, Eric Feron, and Venkataramanan Balakrishnan. Linear matrix inequalities in system and control theory, volume 15. Siam, 1994.
  • [21] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • [22] S. Burer and R.D.C. Monteiro. Local minima and convergence in low-rank semidefinite programming. Mathematical Programming, 103(3):427–444, 2005.
  • [23] Yudong Chen and Jiaming Xu. Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. The Journal of Machine Learning Research, 17(1):882–938, 2016.
  • [24] K. Chiang, J. Whang, and I. Dhillon. Scalable clustering of signed networks using Balance Normalized Cut. CIKM, 2012.
  • [25] Kai-Yang Chiang, Joyce Whang, and Inderjit S. Dhillon. Scalable Clustering of Signed Networks using Balance Normalized Cut. In ACM Conference on Information and Knowledge Management (CIKM), oct 2012.
  • [26] Geoffrey Chinot, Guillaume Lecué, and Matthieu Lerasle. Statistical learning with Lipschitz and convex loss functions. arXiv preprint arXiv:1810.01090, 2018.
  • [27] Stéphane Chrétien and Franck Corset. Using the eigenvalue relaxation for binary least-squares estimation problems. Signal Processing, 89(11):2079–2091, 2009.
  • [28] Stéphane Chrétien, Clément Dombry, and Adrien Faivre. A semi-definite programming approach to low dimensional embedding for unsupervised clustering. Frontiers in Applied Mathematics and Statistics, to appear.
  • [29] Aaron Clauset, Mark EJ Newman, and Cristopher Moore. Finding community structure in very large networks. Physical review E, 70(6):066111, 2004.
  • [30] M. Cucuringu. Synchronization over Z2 and community detection in multiplex networks with constraints. Journal of Complex Networks, 3:469–506, 2015.
  • [31] M. Cucuringu. Sync-Rank: Robust Ranking, Constrained Ranking and Rank Aggregation via Eigenvector and Semidefinite Programming Synchronization. IEEE Transactions on Network Science and Engineering, 3(1):58–79, 2016.
  • [32] M. Cucuringu, P. Davies, A. Glielmo, and H. Tyagi. SPONGE: A generalized eigenproblem for clustering signed networks. AISTATS 2019.
  • [33] M. Cucuringu, H. Li, H. Sun, and L. Zanetti. Hermitian matrices for clustering directed graphs: insights and applications. to appear in AISTATS 2020.
  • [34] M. Cucuringu and H. Tyagi. Provably robust estimation of modulo 1 samples of a smooth function with applications to phase unwrapping. to appear in Journal of Machine Learning Research (JMLR), 2020.
  • [35] Mark A Davenport and Justin Romberg. An overview of low-rank matrix recovery from incomplete observations. IEEE Journal of Selected Topics in Signal Processing, 10(4):608–622, 2016.
  • [36] Chandler Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970.
  • [37] J. A. Davis. Clustering and structural balance in graphs. Human Relations, 20(2):181–187, 1967.
  • [38] Yohann De Castro, Fabrice Gamboa, Didier Henrion, Roxana Hess, Jean-Bernard Lasserre, et al. Approximate optimal designs for multivariate polynomial regression. The Annals of Statistics, 47(1):127–155, 2019.
  • [39] Yohann de Castro, Fabrice Gamboa, Didier Henrion, and Jean B. Lasserre. Exact solutions to super resolution on semi-algebraic domains in higher dimensions. IEEE Trans. Information Theory, 63(1):621–630, 2017.
  • [40] Laurent Demanet and Paul Hand. Stable optimizationless recovery from phaseless linear measurements. Journal of Fourier Analysis and Applications, 20(1):199–221, 2014.
  • [41] Yingjie Fei and Yudong Chen. Exponential error rates of SDP for block models: beyond Grothendieck’s inequality. IEEE Trans. Inform. Theory, 65(1):551–571, 2019.
  • [42] Roger Fletcher. A nonlinear programming problem in statistics (educational testing). SIAM Journal on Scientific and Statistical Computing, 2(3):257–267, 1981.
  • [43] Santo Fortunato. Community detection in graphs. Physics reports, 486(3-5):75–174, 2010.
  • [44] Bernd Gärtner and Jiří Matoušek. Semidefinite programming. In Approximation Algorithms and Semidefinite Programming, pages 15–25. Springer, 2012.
  • [45] Jean Charles Gilbert and Cédric Josz. Plea for a semidefinite optimization solver in complex numbers. 2017.
  • [46] Christophe Giraud and Nicolas Verzelen. Partial recovery bounds for clustering with the relaxed k-means. arXiv preprint arXiv:1807.07547, 2018.
  • [47] Michel X Goemans. Semidefinite programming in combinatorial optimization. Mathematical Programming, 79(1-3):143–161, 1997.
  • [48] Michel X Goemans and David P Williamson. New 3/4-approximation algorithms for the maximum satisfiability problem. SIAM Journal on Discrete Mathematics, 7(4):656–666, 1994.
  • [49] Michel X Goemans and David P Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115–1145, 1995.
  • [50] David Gross, Yi-Kai Liu, Steven T Flammia, Stephen Becker, and Jens Eisert. Quantum state tomography via compressed sensing. Physical review letters, 105(15):150401, 2010.
  • [51] A. Grothendieck. Résumé de la théorie métrique des produits tensoriels topologiques. Bol. Soc. Mat. São Paulo, 8:1–79, 1953.
  • [52] Alexandre Grothendieck. Résumé de la théorie métrique des produits tensoriels topologiques. Soc. de Matemática de São Paulo, 1956.
  • [53] Olivier Guédon and Roman Vershynin. Community detection in sparse networks via Grothendieck’s inequality. Probab. Theory Related Fields, 165(3-4):1025–1049, 2016.
  • [54] Olivier Guédon and Roman Vershynin. Community detection in sparse networks via Grothendieck's inequality. Probability Theory and Related Fields, 165(3-4):1025–1049, 2016.
  • [55] Bruce Hajek, Yihong Wu, and Jiaming Xu. Achieving exact cluster recovery threshold via semidefinite programming. IEEE Transactions on Information Theory, 62(5):2788–2797, 2016.
  • [56] Simai He, Zhi-Quan Luo, Jiawang Nie, and Shuzhong Zhang. Semidefinite relaxation bounds for indefinite homogeneous quadratic optimization. SIAM Journal on Optimization, 19(2):503–523, 2008.
  • [57] Chinmay Hegde, Aswin C Sankaranarayanan, and Richard G Baraniuk. Near-isometric linear embeddings of manifolds. In 2012 IEEE Statistical Signal Processing Workshop (SSP), pages 728–731. IEEE, 2012.
  • [58] Samuel B. Hopkins. Sub-gaussian mean estimation in polynomial time. CoRR, abs/1809.07425, 2018.
  • [59] Jao Ping Hou. Bounds for the least Laplacian eigenvalue of a signed graph. Acta Mathematica Sinica, 21(4):955–960, 2005.
  • [60] Adel Javanmard, Andrea Montanari, and Federico Ricci-Tersenghi. Phase transitions in semidefinite relaxations. Proceedings of the National Academy of Sciences, 113(16):E2218–E2223, 2016.
  • [61] David Karger, Rajeev Motwani, and Madhu Sudan. Approximate graph coloring by semidefinite programming. Journal of the ACM (JACM), 45(2):246–265, 1998.
  • [62] Subhash Khot and Assaf Naor. Approximate kernel clustering. Mathematika, 55(1-2):129–165, 2009.
  • [63] Vladimir Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist., 34(6):2593–2656, 2006.
  • [64] Vladimir Koltchinskii. Oracle inequalities in empirical risk minimization and sparse recovery problems, volume 2033 of Lecture Notes in Mathematics. Springer, Heidelberg, 2011. Lectures from the 38th Probability Summer School held in Saint-Flour, 2008, École d’Été de Probabilités de Saint-Flour. [Saint-Flour Probability Summer School].
  • [65] Jérôme Kunegis, Stephan Schmidt, Andreas Lommatzsch, Jürgen Lerner, Ernesto William De Luca, and Sahin Albayrak. Spectral analysis of signed graphs for clustering, prediction and visualization. SDM, 10:559–570, 2010.
  • [66] Jean Bernard Lasserre. An Introduction to Polynomial and Semi-Algebraic Optimization. Number 52. Cambridge University Press, 2015.
  • [67] Javad Lavaei and Steven H Low. Zero duality gap in optimal power flow problem. IEEE Transactions on Power Systems, 27(1):92–107, 2011.
  • [68] Can M. Le, Elizaveta Levina, and Roman Vershynin. Optimization via low-rank approximation for community detection in networks. Ann. Statist., 44(1):373–400, 2016.
  • [69] Guillaume Lecué and Shahar Mendelson. Learning subgaussian classes: Upper and minimax bounds. Technical report, CNRS, Ecole polytechnique and Technion, 2013.
  • [70] Claude Lemaréchal, Arkadii Nemirovskii, and Yurii Nesterov. New variants of bundle methods. Mathematical programming, 69(1-3):111–147, 1995.
  • [71] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In WWW, pages 641–650, 2010.
  • [72] Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. Signed Networks in Social Media. In CHI, pages 1361–1370, 2010.
  • [73] László Lovász. On the Shannon capacity of a graph. IEEE Transactions on Information Theory, 25(1):1–7, 1979.
  • [74] Wing-Kin Ken Ma. Semidefinite relaxation of quadratic optimization problems and applications. IEEE Signal Processing Magazine, 1053(5888/10), 2010.
  • [75] Enno Mammen and Alexandre B. Tsybakov. Smooth discrimination analysis. Ann. Statist., 27(6):1808–1829, 1999.
  • [76] Pascal Massart. Concentration inequalities and model selection, volume 1896 of Lecture Notes in Mathematics. Springer, Berlin, 2007. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23, 2003, With a foreword by Jean Picard.
  • [77] David A Mazziotti. Large-scale semidefinite programming for many-electron quantum mechanics. Physical review letters, 106(8):083001, 2011.
  • [78] Elchanan Mossel, Joe Neeman, and Allan Sly. Reconstruction and estimation in the planted partition model. Probab. Theory Related Fields, 162(3-4):431–461, 2015.
  • [79] Y Nesterov. Semidefinite relaxation and non-convex quadratic optimization. Optimization Methods and Software, 12:1–20, 1997.
  • [80] Mark EJ Newman. Finding community structure in networks using the eigenvectors of matrices. Physical review E, 74(3):036104, 2006.
  • [81] Carl Olsson, Anders P Eriksson, and Fredrik Kahl. Solving large scale binary quadratic problems: Spectral methods vs. semidefinite programming. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.
  • [82] Jiming Peng and Yu Wei. Approximating k-means-type clustering via semidefinite programming. SIAM journal on optimization, 18(1):186–205, 2007.
  • [83] Jiming Peng and Yu Xia. A new theoretical framework for k-means-type clustering. In Foundations and advances in data mining, pages 79–96. Springer, 2005.
  • [84] Guy Pierra. Decomposition through formalization in a product space. Mathematical Programming, 28(1):96–115, 1984.
  • [85] Gilles Pisier. Grothendieck’s theorem, past and present. Bulletin of the American Mathematical Society, 49(2):237–323, 2012.
  • [86] Mason A Porter, Jukka-Pekka Onnela, and Peter J Mucha. Communities in networks. Notices of the AMS, 56(9):1082–1097, 2009.
  • [87] Martin Royer. Adaptive clustering through semidefinite programming. In Advances in Neural Information Processing Systems, pages 1795–1803, 2017.
  • [88] P Scobey and DG Kabe. Vector quadratic programming problems and inequality constrained least squares estimation. J. Indust. Math. Soc., 28:37–49, 1978.
  • [89] Alexander Shapiro. Weighted minimum trace factor analysis. Psychometrika, 47(3):243–264, 1982.
  • [90] A. Singer. Angular synchronization by eigenvectors and semidefinite programming. Appl. Comput. Harmon. Anal., 30(1):20–36, 2011.
  • [91] A. Singer and Y. Shkolnisky. Three-dimensional structure determination from common lines in Cryo-EM by eigenvectors and semidefinite programming. SIAM Journal on Imaging Sciences, 4(2):543–572, 2011.
  • [92] Amit Singer. Angular synchronization by eigenvectors and semidefinite programming. Applied and computational harmonic analysis, 30(1):20–36, 2011.
  • [93] Jun Sun, Stephen Boyd, Lin Xiao, and Persi Diaconis. The fastest mixing markov process on a graph and a connection to a maximum variance unfolding problem. SIAM review, 48(4):681–699, 2006.
  • [94] Michael J Todd. Semidefinite optimization. Acta Numerica, 10:515–560, 2001.
  • [95] Alexandre B. Tsybakov. Optimal rate of aggregation. In Computational Learning Theory and Kernel Machines (COLT-2003), volume 2777 of Lecture Notes in Artificial Intelligence, pages 303–313. Springer, Heidelberg, 2003.
  • [96] Alexandre B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Ann. Statist., 32(1):135–166, 2004.
  • [97] Sara A. van de Geer. Applications of empirical process theory, volume 6 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2000.
  • [98] Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to statistics.
  • [99] Vladimir N. Vapnik. Statistical learning theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons, Inc., New York, 1998. A Wiley-Interscience Publication.
  • [100] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2018.
  • [101] V Vu. Singular vectors under random perturbation. Preprint available in arXiv:104.2000, 2010.
  • [102] Irène Waldspurger, Alexandre d’Aspremont, and Stéphane Mallat. Phase recovery, maxcut and complex semidefinite programming. Mathematical Programming, 149(1-2):47–81, 2015.
  • [103] Kilian Q Weinberger, Benjamin Packer, and Lawrence K Saul. Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization. In AISTATS, volume 2, page 6. Citeseer, 2005.
  • [104] Kilian Q Weinberger and Lawrence K Saul. An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In AAAI, volume 6, pages 1683–1686, 2006.
  • [105] Kilian Q Weinberger and Lawrence K Saul. Unsupervised learning of image manifolds by semidefinite programming. International journal of computer vision, 70(1):77–90, 2006.
  • [106] Henry Wolkowicz. Semidefinite and lagrangian relaxations for hard combinatorial problems. In IFIP Conference on System Modeling and Optimization, pages 269–309. Springer, 1999.
  • [107] Henry Wolkowicz, Romesh Saigal, and Lieven Vandenberghe. Handbook of semidefinite programming: theory, algorithms, and applications, volume 27. Springer Science & Business Media, 2012.
  • [108] Y. Yu, T. Wang, and R. J. Samworth. A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323, 2015.
  • [109] S. Zhang and Y. Huang. Complex quadratic optimization and semidefinite programming. SIAM Journal on Optimization, 16(3):871–890, 2006.
  • [110] Shuzhong Zhang. Quadratic maximization and semidefinite relaxation. Mathematical Programming, 87(3):453–465, 2000.