A conditional gradient homotopy method with applications to Semidefinite Programming
Abstract
We propose a new homotopy-based conditional gradient method for solving convex optimization problems with a large number of simple conic constraints. Instances of this template naturally appear in semidefinite programming problems arising as convex relaxations of combinatorial optimization problems. Our method is a double-loop algorithm in which the conic constraint is treated via a self-concordant barrier; the inner loop employs a conditional gradient algorithm to approximate the analytic central path, while the outer loop updates the accuracy imposed on the intermediate solutions and the homotopy parameter. Our theoretical iteration complexity is competitive with that of state-of-the-art SDP solvers, with the decisive advantage of cheap projection-free subroutines. Preliminary numerical experiments illustrate the practical performance of the method.
1 Introduction
In this paper we investigate a new algorithm for solving a large class of convex optimization problems with conic constraints of the following form:
(P) |
The objective function is assumed to be closed, convex, and proper, is a compact convex set, is an affine mapping between two finite-dimensional Euclidean vector spaces and , and is a closed convex pointed cone. Problem (P) is sufficiently generic to cover many optimization settings considered in the literature.
Example 1.1 (Packing SDP).
The model template (P) covers the large class of packing semidefinite programs (SDPs) [24, 14]. In this problem class we have – the space of symmetric matrices, a collection of positive semidefinite input matrices , with constraints given by and , coupled with the objective function and the set . Special instances of packing SDPs include the MaxCut SDP relaxation of [19], the Lovasz- function SDP, among many others; see [23].
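For concreteness, the MaxCut SDP relaxation of [19] mentioned above can be stated in the following standard form (our transcription of the well-known relaxation, with $L$ denoting the graph Laplacian; the identification with the data of (P) follows the description in the example):

```latex
\max_{X \in \mathbb{S}^n} \;\; \tfrac{1}{4}\langle L, X \rangle
\quad \text{s.t.} \quad X_{ii} = 1 \;\; (i = 1,\dots,n), \qquad X \succeq 0.
```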
Example 1.2 (Covering SDP).
Example 1.3 (Relative entropy programs).
Relative entropy programs are conic optimization problems in which a linear functional of a decision variable is minimized subject to linear constraints and conic constraints specified by a relative entropy cone. The relative entropy cone is defined for triples via Cartesian products of the following elementary cone
As the function is the perspective transform of the negative logarithm, we know that it is a convex function [4]. Hence, is a convex cone. The relative entropy cone is accordingly defined as . This set appears in many important applications, containing estimation of dynamical systems and the optimization of quantum channels; see [5] for an overview.
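In the standard notation of the relative entropy programming literature (e.g. [5]; the symbols below are ours, stated as a reminder), the elementary cone referred to above can be written as

```latex
K_{\mathrm{re}} \;=\; \operatorname{cl}\Bigl\{ (x, y, \tau) \in \mathbb{R}_{++} \times \mathbb{R}_{++} \times \mathbb{R} \;:\; x \log\bigl(\tfrac{x}{y}\bigr) \le \tau \Bigr\},
```

and the relative entropy cone is then the Cartesian product of $n$ copies of $K_{\mathrm{re}}$.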
Example 1.4 (Relative Entropy entanglement).
The relative entropy of entanglement (REE) is an important measure for entanglement for a given quantum state in quantum information theory [31]. Obtaining accurate bounds on the REE is a major problem in quantum information theory. In [44] the following problem is used to find a lower bound to the REE:
where is the cone of Hermitian positive semidefinite matrices, and is the so-called partial transpose operator, which is a linear symmetric operator (see [22]); is a given quantum reference state. We obtain a problem of the form (P) upon the identification of with the space of Hermitian matrices, and , and .
The model template (P) can be solved with moderate accuracy using fast proximal gradient methods, like FISTA and related acceleration tricks [12]. In this formalism, the constraints imposed on the optimization problem have to be included in proximal operators, which essentially means that one must be able to compute a projection onto the feasible set. Unfortunately, the computation of proximal operators can impose a significant burden and even lead to intractability of gradient-based methods. This is in particular the case in SDPs with a large number of constraints and decision variables, such as in convex relaxations of NP-hard combinatorial optimization problems [28]. Other prominent examples in machine learning are -means clustering [32], -nearest neighbor classification [36], sparse PCA [8], and kernel learning [27], among many others. In many of these applications, it is not only the size of the problem that challenges the application of off-the-shelf solvers, but also the desire to obtain sparse solutions. As a result, the Conditional Gradient (CndG) method [17] has seen renewed interest in large-scale convex programming, since it only requires computing a Linear Minimization Oracle (LMO) at each iteration. Compared to proximal methods, CndG features significantly reduced computational costs (e.g. when is a spectrahedron), tractability, and interpretability (e.g. it generates solutions as a combination of atoms of ).
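To illustrate why the LMO is cheap on a spectrahedron: minimizing a linear function $\langle C, X\rangle$ over $\{X \succeq 0,\ \operatorname{tr} X = 1\}$ is solved by a rank-one matrix built from an eigenvector for the smallest eigenvalue of $C$. A minimal sketch (function names are ours, not from the paper):

```python
import numpy as np

def lmo_spectrahedron(C):
    """Minimize <C, X> over {X symmetric PSD, trace(X) = 1}.

    Linear objectives over the spectrahedron are minimized at rank-one
    extreme points, so the minimizer is v v^T for a unit eigenvector v
    of the smallest eigenvalue of C.
    """
    C = (C + C.T) / 2          # symmetrize for numerical safety
    w, V = np.linalg.eigh(C)   # eigenvalues in ascending order
    v = V[:, 0]                # eigenvector of the smallest eigenvalue
    return np.outer(v, v)

# Sanity check: for a diagonal C the optimal value is the smallest entry.
C = np.diag([3.0, 1.0, 2.0])
X = lmo_spectrahedron(C)
print(np.trace(C @ X))  # ~1.0, the smallest diagonal entry of C
```

For very large matrices, a full `eigh` would be replaced by an iterative extreme-eigenvector routine, which is exactly the setting that motivates the inexact LMO discussed later in the paper.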
Our approach
In this paper we propose a CndG framework for solving (P) with rigorous iteration complexity guarantees. Our approach retains the simplicity of CndG, but allows us to disentangle the different sources of complexity describing the problem. Specifically, we study the performance of a new homotopy/path-following method, which iteratively approaches a solution of problem (P). To develop the path-following argument we start by decomposing the feasible set into two components. We assume that is a compact convex set which admits efficient applications of CndG. The challenging constraints are embodied by the membership condition . These are conic restrictions on the solution, and in typical applications of our method a large number of them must be handled. Our approach deals with these constraints via a logarithmically homogeneous barrier for the cone . [30] introduced this class of functions in connection with polynomial-time interior point methods. It is a well-known fact that any closed convex cone admits a logarithmically homogeneous barrier (the universal barrier) [30, Thm. 2.5.1]. A central assumption of our approach is that we have a practical representation of a logarithmically homogeneous barrier for the cone . As many restrictions appearing in applications are linear or PSD inequalities, this assumption is rather mild. Exploiting the announced decomposition, we approximate the target problem (P) by a family of parametric penalty-based composite convex optimization problems formulated in terms of the potential function
(1.1) |
This potential function involves a path-following/homotopy parameter , and is a barrier function defined over the set . By minimizing the function over the set for a sequence of increasing values of , we can trace the analytic central path of (1.1) as it converges to a solution of (P). The practical implementation of this path-following scheme is achieved via a double-loop algorithm in which the inner loop performs a number of CndG updating steps on the potential function , and the outer loop readjusts the parameter as well as the desired tolerance imposed on the CndG solver. Importantly, unlike existing primal-dual methods for (P) (e.g. [41]), our approach generates a sequence of feasible points for the original problem (P).
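Reconstructing from the surrounding description (the exact statement of (1.1) may differ in notation; the symbols below are ours), the potential takes the generic penalty-plus-barrier form

```latex
V_t(x) \;=\; t\, f(x) \;+\; F(\mathcal{A}x), \qquad t > 0,
```

where $f$ is the objective, $\mathcal{A}$ the affine map, and $F$ a logarithmically homogeneous barrier associated with the cone constraint.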
Comparison to the literature
Exact homotopy/path-following methods were developed in statistics and signal processing in the context of the -regularized least squares problem (LASSO) [34] to compute the entire regularization path. Relatedly, approximate homotopy methods have been studied in this context as well [21, 37], and superior empirical performance has been reported for seeking sparse solutions. [38] is the first reference to investigate the complexity of a proximal gradient homotopy method for the LASSO problem. To the best of our knowledge, our paper provides the first complexity analysis of a homotopy CndG algorithm to compute an approximate solution close to the analytic central path for the optimization template (P). Our analysis is heavily influenced by the recent papers [43] and [42], which study a generalized CndG algorithm designed for solving a composite convex optimization problem in which the smooth part is given by a logarithmically homogeneous barrier. Unlike [43], we analyze a more complicated dynamic problem in which the objective changes as we update the penalty parameter, and we propose important practical extensions, including line-search and inexact-LMO versions of the CndG algorithm for reducing the potential function (1.1). Furthermore, we analyze the complexity of the whole path-following process in order to approximate a solution to the initial non-penalized problem (P).
Main Results
One of the challenging questions is how to design a penalty parameter updating schedule, and we develop a practical answer to this question. Specifically, our contributions can be summarized as follows:
-
•
We provide two extensions for the inner-loop CndG algorithm solving the potential minimization problem (1.1): a line-search and an inexact-LMO version. For both, we show that the complexity of the inner and the outer loop are the same as in the basic variant up to constant factors.
-
•
We present practically relevant applications of our framework, including instances of SDPs arising from convex relaxations of NP-hard combinatorial optimization problems, and provide promising results of the numerical experiments.
Direct application of existing CndG frameworks [40, 41] for solving (P) would require projection onto , which may be complicated if, e.g., is a PSD cone. Moreover, these algorithms do not produce feasible iterates but rather approximate solutions with primal optimality and dual feasibility guarantees. For specific instances of SDPs, the theoretical complexity achieved by our method matches or even improves the state-of-the-art iteration complexity results reported in the literature. For packing and covering types of SDPs, the state-of-the-art upper complexity bound reported in [14] is worse than ours, since their algorithm has complexity , where are constants depending on the problem data (the notation hides polylogarithmic factors). For SDPs with linear equality constraints and spectrahedron constraints, the primal-dual method CGAL [41] produces approximately optimal and approximately feasible points in iterations. Our method is purely primal, generating feasible points at any time. At the same time, our method shares the same theoretical iteration complexity as CGAL and can deal with additional conic constraints embodied by the membership restriction without the use of projections. We expect that this added feature will allow us to apply our method to challenging scientific computing questions connected to quantum information theory, such as bounding the relative entropy of entanglement of quantum states (cf. Example 1.4 and [44, 15, 16]).
Organization of the paper
This paper is organized as follows: Section 2 introduces the notation and terminology we use throughout this paper. Section 3 presents the basic algorithm under consideration. In Section 4 we describe an inexact version of our basic homotopy method. Section 5 contains a detailed complexity analysis of our scheme when an exact LMO is available. The corresponding statements under an inexact LMO are established in Appendix A. Section 6 reports the performance of our method in solving relevant examples of SDPs and gives some first comparisons with existing approaches. In particular, numerical results on the mixing time problem of a finite Markov chain, and synthetic examples aimed at testing the robustness to scaling, are reported there. Appendix B contains further results on the numerical experiments.
2 Notation and Preliminaries
Let be a finite-dimensional Euclidean vector space and its dual space, which is formed by all linear functions on . The value of a linear function at a point is denoted by . Let be a positive definite self-adjoint operator. Define the primal and dual norms
for and , respectively. If is the identity operator, we simplify notation to and respectively for the primal and the dual norm. For a linear operator we define the operator norm
For a differentiable function with , we denote by its gradient, by its Hessian, and by its third derivative. Accordingly, given directions , the first, second, and third directional derivatives of are denoted by , , and , respectively.
2.1 Self-Concordant Functions
Let be an open and convex subset of . A function is self-concordant (SC) on if and for all we have . In the case where is a closed, convex, and pointed (i.e., containing no line) cone, we have more structure. We call a -canonical barrier for , denoted by , if it is SC and
From [30, Prop. 2.3.4], the following properties are satisfied by a -canonical barrier for each :
(2.1)
(2.2)
(2.3)
(2.4)
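For reference, the standard properties of a $\nu$-logarithmically homogeneous barrier from [30, Prop. 2.3.4], which (2.1)–(2.4) instantiate, read as follows (our transcription; valid for all $x \in \operatorname{int} K$ and $\tau > 0$):

```latex
\begin{aligned}
F(\tau x) &= F(x) - \nu \ln \tau, \\
\nabla F(\tau x) &= \tau^{-1} \nabla F(x), \\
\langle \nabla F(x), x \rangle &= -\nu, \\
\nabla^2 F(x)\, x &= -\nabla F(x).
\end{aligned}
```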
We define the local norm for all and . SC functions are in general not Lipschitz smooth. Still, we have access to a version of a descent lemma of the following form [29, Thm. 5.1.9]:
(2.5) |
for all and such that , where . It also holds true that [29, Thm. 5.1.8]
(2.6) |
for all and such that , where for .
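In the notation of [29], writing $\omega(t) = t - \ln(1+t)$ and $\omega_*(t) = -t - \ln(1-t)$, the bounds (2.5) and (2.6) take the following standard form (our transcription; $\|\cdot\|_x$ is the local norm defined above):

```latex
\begin{aligned}
F(y) &\le F(x) + \langle \nabla F(x),\, y-x \rangle + \omega_*\!\bigl(\|y-x\|_x\bigr), \qquad \|y-x\|_x < 1, \\
F(y) &\ge F(x) + \langle \nabla F(x),\, y-x \rangle + \omega\!\bigl(\|y-x\|_x\bigr).
\end{aligned}
```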
A classical and useful bound is [29, Lemma 5.1.5]:
(2.7) |
2.2 The Optimization Problem
The following assumptions are made for the rest of this paper.
Assumption 1.
The function is closed, convex, and continuous on . The quantity is finite.
Assumption 2.
is a closed convex pointed cone with , admitting a -canonical barrier .
The next assumption transports the barrier setup from the codomain to the domain . This is a common operation in the framework of the "barrier calculus" developed in [30, Section 5.1].
Assumption 3.
The map is linear, and is a -canonical barrier on the cone , i.e. .
Note that .
Example 2.1.
Consider the normalised covering problem presented in Example 1.2. [14] solve this problem via a logarithmic potential function method [30], involving the logarithmically homogeneous barrier for . This is a typical choice in Newton-type methods to impose the semidefiniteness constraint. However, in our projection-free environment, we use the power of the linear minimisation oracle to obtain search directions, which leaves the cone of positive semidefinite matrices invariant. Instead, we employ barrier functions to incorporate the additional linear constraints in (P). Hence, we set to absorb the constraint . In particular, this frees us from matrix inversions and the related computationally intensive steps that come with Newton and interior-point methods.
Assumption 4.
is a nonempty compact convex set in , and .
Let . Thanks to Assumptions 2 and 4, is attained. Our goal is to find an -solution of problem (P), defined as follows.
Definition 2.1.
Given , define
(2.8) |
The following lemma shows that the path traces a trajectory in the interior of the feasible set, which can be used to approximate a solution of the original problem (P), provided the penalty parameter is chosen large enough.
Lemma 2.2.
For all , it holds that . In particular,
(2.9) |
Proof.
Since , it follows immediately that . By Fermat’s principle we have
Convexity and [29, Thm. 5.3.7] implies that for all , we have
Let be a feasible point satisfying . Choosing a sequence with , the continuity on gives
3 Algorithm
In this section we first describe a new CndG method for solving general conic constrained convex optimization problems of the form (2.8), for a fixed . This procedure serves as the inner loop of our homotopy method. The path-following strategy in the outer loop is then explained in Section 3.2.
3.1 The Proposed CndG Solver
The computational efficiency of our approach relies on the availability of a suitably defined minimisation oracle:
Assumption 5.
For any , the auxiliary problem
(3.1) |
is easily solvable.
We can compute the gradient and Hessian of as
This means that we obtain a local norm on given by
Note that in order to evaluate the local norm we do not need to compute the full Hessian ; only a directional derivative is required, which is typically cheap to compute numerically.
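As an illustration of evaluating the local norm through a directional computation only, consider the logarithmic barrier $F(u) = -\sum_i \log u_i$ on the positive orthant (a toy example of ours, not the paper's setting), where the Hessian is diagonal and a Hessian-vector product costs $O(n)$ with no matrix ever formed:

```python
import numpy as np

def local_norm_log_barrier(u, d):
    """Local norm ||d||_u = sqrt(d^T H(u) d) for F(u) = -sum(log(u)).

    Here H(u) = diag(1/u^2), so the Hessian-vector product H(u) d is a
    componentwise operation; no n-by-n matrix is constructed.
    """
    hd = d / u**2                      # H(u) d, computed directionally
    return float(np.sqrt(d @ hd))

u = np.array([1.0, 2.0, 4.0])
d = np.array([1.0, 2.0, 4.0])
print(local_norm_log_barrier(u, d))    # sqrt(3), since here H(u) d = d / u**2
```

The same principle applies to the log-det barrier on the PSD cone, where the directional quantity can be obtained from linear solves against the current iterate rather than forming the full Hessian.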
A Linear Minimization Oracle (LMO) is defined as an oracle producing the vector field
(3.2) |
Note that our analysis does not rely on a specific tie-breaking rule, so any member of the set will be acceptable.
To measure solution accuracy and overall algorithmic progress, we introduce two merit functions:
(3.3)
(3.4)
Note that and for all . Moreover, convexity together with the definition of the point , gives
Hence,
(3.5) |
Define
(3.6) |
Then, for , we get from (2.5)
Together with the convexity of , this implies
(3.7) |
where
Optimising this expression w.r.t. , we obtain the analytic step-size policy
(3.8) |
Equipped with this step-size strategy, procedure , described in Algorithm 1, constructs a sequence which produces an approximately optimal solution in terms of the merit function and the potential function gap . Specifically, our main results on the performance of Algorithm 1 are the following iteration complexity estimates, which are proven in Section 5.
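In analyses of CndG with logarithmically homogeneous barriers (e.g. [43]), the analytic step size typically has the clipped form $\min\{1, G/(D(G+D))\}$ for the current gap $G$ and the local norm $D$ of the search direction. The sketch below assumes that form; it is not a verbatim transcription of (3.8):

```python
def analytic_step_size(gap, local_norm):
    """Clipped analytic step size used in barrier-type CndG analyses.

    gap        : current CndG gap <grad, x - v>  (nonnegative)
    local_norm : ||v - x||_x in the Hessian local norm at x

    Returns a step in (0, 1]: small when the direction is long in the
    local norm, a full step when the gap dominates.
    """
    if local_norm == 0.0:
        return 1.0
    return min(1.0, gap / (local_norm * (gap + local_norm)))

print(analytic_step_size(0.5, 2.0))   # 0.1
```

Note the clipping at 1 keeps the iterate inside the feasible set, while the local-norm denominator keeps it inside the Dikin ellipsoid where the descent bound (2.5) applies.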
Proposition 3.1.
Given , Algorithm requires at most
(3.9) |
iterations in order to reach a point satisfying for the first time.
Proposition 3.2.
Given , Algorithm requires at most
(3.10) |
iterations, in order to reach a point satisfying for the first time.
3.2 Updating the Homotopy Parameter
Our analysis so far focuses on minimisation of the potential function for a fixed . However, in order to solve the initial problem (P), one must trace the sequence of approximate solutions as . The construction of such an increasing sequence of homotopy parameters is the purpose of this section.
Let be a sequence of approximation errors and homotopy parameters. For each run , we activate procedure with the given configuration . For , we assume that an admissible initial point is available. For , we restart Algorithm 1 using a kind of warm-start strategy by choosing , where denotes the first iterate of Algorithm satisfying . Note that is upper bounded by the constant defined in Proposition 3.1. After the -th restart, we call the obtained iterate . In this way we obtain a sequence consisting of candidates for approximately following the central path, as they are -close in terms of the gap (and, hence, in terms of the function gap) to the stage- optimal point . We update the parameters as follows:
-
•
The sequence of homotopy parameters is determined as for until the last round of is reached.
-
•
The sequence of accuracies requested in the algorithm is updated by .
-
•
The Algorithm stops after updates of the accuracy and homotopy parameters and yields an -approximate solution of problem (P).
Our updating strategy for the parameters ensures the "detailed balance" . This equilibrating choice between the increasing homotopy parameter and the decreasing accuracy parameter is sensible, because the iteration complexity of the CndG solver is inversely proportional to (cf. Proposition 3.1). Making the judicious choice yields a compact assessment of the total complexity of method .
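The "detailed balance" between the homotopy parameter and the accuracy can be sketched as a simple geometric schedule in which the homotopy parameter grows while the target accuracy shrinks by the same factor, keeping their product constant (the factor 2 below is our illustrative choice, not the paper's prescription):

```python
def homotopy_schedule(t0, eps0, num_rounds, growth=2.0):
    """Generate (t_k, eps_k) pairs with the product t_k * eps_k held constant.

    Growing t while shrinking eps by the same factor keeps t_k * eps_k
    fixed, balancing the inner-loop cost across outer iterations.
    """
    t, eps = t0, eps0
    schedule = []
    for _ in range(num_rounds):
        schedule.append((t, eps))
        t *= growth
        eps /= growth
    return schedule

sched = homotopy_schedule(t0=1.0, eps0=0.5, num_rounds=4)
print(sched)  # the products t * eps are all equal to 0.5
```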
Theorem 3.3.
Choose and . The total iteration complexity of method to find an -solution of (P) is
(3.11) |
where hides polylogarithmic factors in .
4 Modifications and Extensions
4.1 Line Search
The step-size policy employed in Algorithm 1 is inversely proportional to the local norm . In particular, its derivation is based on the a priori restriction that the iterates stay within a trust region defined by the Dikin ellipsoid. This restriction may force the method to take very small steps and, as a result, display poor performance in practice. A simple remedy is to employ a line search. Given , let
(4.1) |
Thanks to the barrier structure of the potential function , some useful consequences can be drawn from the definition of . First, . This implies . Second, since is also contained in the latter set, we have
Via a comparison principle, this allows us to deduce the analysis of the sequence produced by from the analysis of the sequence induced by . Indeed, if is the sequence constructed by the line search procedure , then we have for all . Hence, we can perform the complexity analysis on the majorising function as in the analysis of procedure . Consequently, all the complexity-related estimates for the method apply verbatim to the sequence induced by the method .
4.2 Algorithm with inexact LMO
A key assumption in our approach so far is that we can obtain an exact answer from the LMO (3.2). In this section we consider the case when our search direction subproblem is solved only with some pre-defined accuracy, a setting that is of particular interest in optimization problems defined over matrix variables. As an illustration, let us consider instances of (P) where is the spectrahedron of symmetric matrices, namely . For these instances solving the subproblem (3.1) with corresponds to computing the leading eigenvector of a symmetric matrix, which when is very large is typically computed inexactly using iterative methods. We therefore relax our definition of the LMO by allowing for an Inexact Linear Minimization Oracle (ILMO), featuring additive and multiplicative errors. To define our ILMO, let
By definition of the point in (3.2) and of the gap function , we have for all .
Definition 4.1.
Given , and , we call an oracle producing a point a -ILMO, such that
(4.2) |
We write for any direction satisfying (4.2).
We note that a -ILMO returns a search point satisfying inclusion (3.2). We therefore refer to the case where we can design a -ILMO as the exact oracle case. Moreover, a -ILMO is an oracle that only features additive errors. This relaxation towards inexact computations has been considered in previous work, such as [10, 25, 18]. The combination of multiplicative and additive errors displayed in (4.2) is similar to the oracle constructed in [26]. The above definition can also be interpreted as follows. The answer of the exact LMO solves the maximization problem , and the optimal value in this problem is nonnegative. Thus, (4.2) can be interpreted as the requirement that the point solves the same problem up to the additive error and the multiplicative error (e.g. because it is itself the output of a convex optimisation routine). We emphasize that our analysis does not depend on the specific choice of the point .
Equipped with these concepts, we can derive an analytic step-size policy as in the exact regime. Let be given, , and small enough so that . Just like in (3.7), we can then obtain the upper bound
Optimizing the upper bound with respect to gives us the analytic step-size criterion
(4.3) |
The -ILMO, and the step size policy (4.3) are enough to define a CndG-update with inexact computations that are executed by a function , defined in Algorithm 4.
To use this method within a bona fide algorithm, we would need to introduce some stopping criterion. Ideally, such a stopping criterion should aim at making the merit functions or small. However, our inexact oracle does not allow us to make direct inferences on the values of these merit functions. Hence, we need to come up with a stopping rule that uses only quantities that are observable and that are coupled with these merit functions. The natural optimality measure for the inexact oracle case is thus . Indeed, by the very definition of our inexact oracle, if, for accuracy , we have arrived at an iterate for which , we know from (4.2) that . Therefore, by making a function of , we would attain essentially the same accuracy as the exact implementation requires when running Algorithm 1. This is the main idea leading to the development of Algorithm 5.
There are two types of iteration complexity guarantees that we can derive for . One is concerned with upper bounding the number of iterations after which our stopping criterion will be satisfied. The other is an upper bound on the number of iterations needed to make the potential function gap smaller than . The derivation of these estimates is rather technical and follows closely the complexity analysis of the exact regime. We therefore only present the statements here and invite the reader to consult the Appendix A for a detailed proof.
Proposition 4.2.
Assume that , , and . Algorithm requires at most
(4.4) |
iterations in order to reach a point satisfying for the first time.
Proposition 4.3.
Assume that , , and . Algorithm requires at most
(4.5) |
iterations, in order to reach a point satisfying for the first time.
Updating the Homotopy Parameter in the Inexact Case
We are now in a position to describe the outer loop of our algorithm with ILMO. The construction is very similar to Algorithm 2.
As before, our aim is to reach an -solution (Definition 2.1). Let be sequences of homotopy parameters, accuracies in the inner loop, and ILMO inexactnesses. For each run , we activate procedure with the given configuration . For , we assume that an admissible initial point is available. For , we restart Algorithm 5 using a kind of warm-start strategy by choosing , where is the first iterate of Algorithm satisfying . Note that is upper bounded by the mentioned number, by means of Proposition 4.2. After the -th restart, we call the obtained iterate . In this way we generate a sequence consisting of candidates for approximately following the central path, as they are -close in terms of (and, hence, in terms of the potential function gap ). We update the parameters as follows:
-
•
The sequence of homotopy parameters is determined as for until the last round of is reached.
-
•
The sequences of accuracies in the inner loop and ILMO inexactnesses requested in procedure are updated by , .
-
•
The Algorithm stops after updates of the accuracy and homotopy parameters, and yields an -approximate solution of problem (P).
We emphasize that it is not necessary to stop the algorithm after a prescribed number of iterations; it is essentially an anytime-convergent algorithm.
Our updating strategy for the parameters ensures that . We will make the judicious choice to obtain an overall assessment of the iteration complexity of method . The proof is given in Appendix A.
Theorem 4.4.
Choose and . The total iteration complexity, i.e. the number of CndG steps, of method to find an -solution of (P) is
5 Complexity Analysis
In this section, we give a detailed presentation of the analysis of the iteration complexity of Algorithm 2.
5.1 Analysis of procedure
Recall and . The following estimate can be established as in [43, Prop. 3]. We therefore omit its simple proof.
Lemma 5.1.
For all , we have
(5.1) |
Observe that
(5.2) |
5.1.1 Proof of Proposition 3.2
The proof mainly follows the steps of the proof of Theorem 1 in [43]. The main technical challenge in our analysis is taking care of the penalty parameter . A main step in the complexity analysis is the identification of two convergence regimes, depending on the magnitude of the merit function .
Lemma 5.2.
Define , and the partition
If we have
(5.3) |
and if we have
(5.4) |
In particular, is monotonically decreasing.
Before proving this lemma, let us explain how we use it to estimate the total number of iterations needed by method until the stopping criterion is satisfied. Motivated by the two different regimes identified in Lemma 5.2, we can bound the number of iterations produced by before the stopping criterion applies in two steps: first, we bound the number of iterations which the algorithm spends in the set ; second, we bound the number of iterations the algorithm spends in . Combining these two upper bounds yields the total iteration bound reported in Proposition 3.2.
Proof of Lemma 5.2.
Assume that iteration is in phase . To reduce notational clutter, we set and . Combining the condition of phase with (5.2), we deduce that . This in turn implies . Hence, at this iteration, algorithm chooses the step sizes . The per-iteration reduction of the potential function can be estimated as
(5.5) |
where and . This readily yields and implies that is monotonically decreasing. For , we also see . Together with (5.1), this implies
(5.6) |
Using (3.5), the monotonicity of , the fact that for (see [43, Prop. 1]), and the fact that the function is strictly increasing for , we arrive at
(5.7) |
Next, consider an integer . We have to distinguish the iterates using large steps () from those using short steps (). Let us first consider the case where . Then , which implies and . Moreover, using (2.7), we readily obtain
Therefore,
In the regime where , we obtain the estimate , by following the same reasoning as in [43]. Using the relation , we conclude
We now exploit this monotonicity of the potential function gap in order to upper bound the number of iterations which Algorithm spends in the disjoint phases and , respectively.
Proof of Proposition 3.2.
Given the pair , denote by the upper bound defined in (3.10). We separate into two parts, corresponding to the stages and defined in the proof of Lemma 5.2. We will show that , where is a bound on the number of iterates within and is a bound on the number of iterates within until is reached.
Note that Lemma 5.2 ensures that for all , it holds that , i.e. the sequence is monotonically decreasing.
We begin by deriving an expression for , the number of iterations which spends in . To that end, let denote the increasing set of indices enumerating the elements of the set . From (5.7) and , we conclude
(5.8) |
Setting (since ), we arrive at the recursion
Furthermore, by definition, all satisfy , so that (5.6) yields
Hence, , and
(5.9) |
yields the upper bound.
We proceed analogously with the analysis on the set . Let denote the increasing set of indices enumerating the elements of the set , and the upper bound on the number of iterations in this set. Using inequality (5.4) for , together with the monotonicity of , results in
(5.10) |
Telescoping the inequality (5.10) together with we get
Thus, , is satisfied for . This demonstrates that at most iterations of are sufficient for achieving
5.1.2 Proof of Proposition 3.1
Using the bound on the number of iterations in from the proof of Proposition 3.2, we are left to bound the number of iterations in until the condition is satisfied.
Let denote the increasing set of indices enumerating the elements of the set . From (5.4) we know that
Set and . We thus obtain the recursion
and we can apply [43, Prop. 2.4] directly to the above recursion to obtain the estimates
Since the algorithm terminates when we obtain an iterate with , it is sufficient to stop once a label satisfying is reached. Solving this for yields
(5.11) |
Combining the bound (5.11) with , we obtain the total complexity estimate postulated in Proposition 3.1.
5.2 Analysis of the outer loop and proof of Theorem 3.3
Let denote the a priori fixed number of updates of the accuracy and homotopy parameter. We set , the last iterate of procedure satisfying . From this, using the definition of the gap function, we deduce that
(5.12) |
Hence, using Lemma 2.2, we observe
(5.13) |
Since , we obtain . We next estimate the initial gap incurred by our warm-start strategy. Observe that
where the last transition follows from . Moreover,
Since , the above two inequalities imply
Whence,
(5.14) |
We are now in the position to estimate the total iteration complexity by using Proposition 3.1, and thereby prove Theorem 3.3. We do so by using the updating regime of the sequences and explained in Section 3. These updating mechanisms imply
In deriving these bounds, we have used the definition of and the following estimates:
and
It remains to bound the complexity at . We have
Since and , we have
Adding the two gives the total complexity bound
6 Experimental results
In this section, we give two examples covered by the problem template (P) and report the results of numerical experiments that illustrate the practical performance and scalability of our method. We implement Algorithm 2 using procedures and as solvers for the inner loops. We compare the results of our algorithm to two benchmarks: (1) an interior point solution of the problem by SDPT3 [35], and (2) the solution obtained by CGAL [41]. Our performance measure for these experiments is the relative optimality gap which, given the objective value of the solution computed by each method at iteration , is computed as , where is the interior point solution. The full description of the setup for these experiments is available in Appendix B. A key takeaway from these numerical experiments is that and are not influenced by the problem’s scaling and are therefore well suited to solving ill-posed problems with differently scaled constraints.
6.1 Estimating the mixing time of a Markov chain
Consider a continuous-time Markov chain on a weighted graph, with edge weights representing the transition rates of the process. Let be the stochastic semigroup of the process, where
is the graph Laplacian. From it follows that is a stationary distribution of the process. From [9] we have the following bound on the total variation distance between the time distribution of the Markov process and the stationary distribution
The quantity is called the mixing time. The constant is the second-largest eigenvalue of the graph Laplacian . To bound this eigenvalue, we follow the SDP strategy laid out in [33], which is described in some detail in Appendix B.
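To make these spectral quantities concrete, here is a minimal NumPy sketch (helper names are ours) that builds the weighted graph Laplacian of a small graph and inspects its spectrum; for a connected graph the smallest Laplacian eigenvalue is zero, and the next one is the spectral gap entering the mixing-time bound.

```python
import numpy as np

def graph_laplacian(W):
    """Weighted graph Laplacian L = D - W for a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

# Path graph on 3 nodes with unit edge weights.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = graph_laplacian(W)
eigvals = np.sort(np.linalg.eigvalsh(L))
# eigvals[0] is 0 (connected graph); eigvals[1] is the spectral gap.
```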
In the experiments, we generated random connected undirected graphs with nodes and edges, and for each edge in the graph we generated a random weight uniformly in . Figure 1 illustrates the relative optimality gap of the results generated by each method (and the dependence of and on the starting accuracy ); only appears in the rightmost graphs (more detailed results are available in Appendix B).
We conclude that, despite being generally costly per iteration, it exhibits the best computational performance both in the number of iterations and in time. Moreover, for all of the datasets, the primal-dual method generated solutions that were far from feasible, and was not able to reduce the feasibility gap even after 1000 seconds or 10000 iterations, resulting in a very poor relative optimality gap.
Our conjecture is that the slow convergence rate of with respect to feasibility is due to the different scaling of the right-hand sides of the constraints, which results in a relative feasibility gap that is very large for constraints with a small right-hand side. To test this conjecture, in the next section we present further experiments testing the performance of the methods as a function of the right-hand-side scaling. (We thank an anonymous referee for suggesting this experiment.)

6.2 Randomly Scaled SDP
In order to further investigate the robustness of , and to constraint scaling, we created a synthetic dataset for a general SDP with inequality constraints. The Synthetic Random Scaling (SRS) problem is formulated as follows:
(SRS) |
where both the objective coefficient matrix and the constraint coefficient matrices , for , are randomly generated PSD matrices with unit Frobenius norm. The coefficients are defined as , where is a uniformly distributed random number, , and is a randomly generated PSD matrix with unit trace. Thus, the parameter controls the scaling of the right-hand sides of the constraints. Specifically, when we expect all to have the same order of magnitude, while for constraint . As we observed for the Mixing problem in Section B.2, our expectation is for and to be impervious to this right-hand-side scaling, and for the performance of to deteriorate as increases. To compare our method with CGAL, the CGAL output was adjusted to obtain a feasible solution. The details of this procedure are provided in Appendix B.3.
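The random problem data described above can be generated along the following lines; this is an illustrative sketch (function names are ours), producing a random PSD matrix normalized to unit Frobenius norm.

```python
import numpy as np

def random_psd_unit_frobenius(n, rng):
    """Random PSD matrix A = G G^T, rescaled to unit Frobenius norm."""
    G = rng.standard_normal((n, n))
    A = G @ G.T                       # PSD by construction
    return A / np.linalg.norm(A, 'fro')

rng = np.random.default_rng(0)
A = random_psd_unit_frobenius(5, rng)
```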


For the numerical experiments, we have chosen parameters and . Figures 2 and 3 show the performance of our methods in terms of the relative optimality gap for problem (SRS), where the “optimal” solution is computed using the SDPT3 solver, as described in Appendix B.3. Since the dataset consists of homogeneous data points, the plots present aggregated results over 30 instances for each value of . Figure 2 illustrates the performance of each algorithm in terms of the relative optimality gap over the iteration count, while Figure 3 presents the performance in terms of runtime, within a time limit of 1000 seconds. The plots indicate that, while and with a smaller value of () are superior in the initial stages of the run, larger values () achieve slightly better overall results. Furthermore, the line-search-based method consistently achieves better iteration and time complexity performance than .
Importantly, the plots confirm the conjecture that the performance of deteriorates as increases, whereas our methods show consistent performance across different values. In fact, has inferior performance for ill-scaled problems () compared to and , leading to an average optimality gap of up to two orders of magnitude larger. Although Figure 2 shows that has iteration complexity comparable to or worse than and across all values of , its low-cost iterations can make it competitive in terms of time complexity, as shown in Figure 3. Specifically, is able to perform more iterations, leading to superior time complexity when . However, when and especially , ’s iterations, while still inexpensive, fail to decrease the solution infeasibility, leading to limited improvement. A detailed analysis of the numerical experiments is provided in Appendix B.3.
7 Conclusions
Solving large-scale conic-constrained convex programming problems is still a challenge in mathematical optimization and machine learning. In particular, SDPs with a massive number of linear constraints appear naturally as convex relaxations of combinatorial optimization problems. For such problems, the development of scalable algorithms is a very active field of research [41, 14]. In this paper, we introduce a new path-following/homotopy strategy to solve such massive conic-constrained problems, building on recent advances in projection-free methods for self-concordant minimization [43, 11, 13]. Our theoretical analysis and numerical experiments show that the method is competitive with state-of-the-art solvers in terms of its iteration complexity and gives promising results in practice. Future work will focus on improvements of our deployed CndG solver to accelerate the subroutines.
References
- [1] Y. Ye. GSet random graphs. https://www.cise.ufl.edu/research/sparse/matrices/Gset/ (accessed May 2022).
- Arora et al. [2005] S. Arora, E. Hazan, and S. Kale. Fast algorithms for approximate semidefinite programming using the multiplicative weights update method. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS’05), pages 339–348, 2005. doi: 10.1109/SFCS.2005.35.
- Bandeira et al. [2016] Afonso S. Bandeira, Nicolas Boumal, and Vladislav Voroninski. On the low-rank approach for semidefinite programs arising in synchronization and community detection. In Vitaly Feldman, Alexander Rakhlin, and Ohad Shamir, editors, 29th Annual Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pages 361–382, Columbia University, New York, New York, USA, 23–26 Jun 2016. PMLR. URL https://proceedings.mlr.press/v49/bandeira16.html.
- Boyd and Vandenberghe [2004] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004. ISBN 1107394007.
- Chandrasekaran and Shah [2017] Venkat Chandrasekaran and Parikshit Shah. Relative entropy optimization and its applications. Mathematical Programming, 161:1–32, 2017.
- Charikar and Wirth [2004] M. Charikar and A. Wirth. Maximizing quadratic programs: extending Grothendieck’s inequality. In 45th Annual IEEE Symposium on Foundations of Computer Science, pages 54–60, 2004. doi: 10.1109/FOCS.2004.39.
- CVX Research [2012] Inc. CVX Research. CVX: Matlab software for disciplined convex programming, version 2.0. http://cvxr.com/cvx, August 2012.
- d’Aspremont et al. [2008] Alexandre d’Aspremont, Francis Bach, and Laurent El Ghaoui. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9(7), 2008.
- Diaconis and Stroock [1991] Persi Diaconis and Daniel W. Stroock. Geometric bounds for eigenvalues of Markov chains. Annals of Applied Probability, 1:36–61, 1991.
- Dunn and Harshbarger [1978] J. C Dunn and S Harshbarger. Conditional gradient algorithms with open loop step size rules. Journal of Mathematical Analysis and Applications, 62(2):432–444, 1978. doi: https://doi.org/10.1016/0022-247X(78)90137-3. URL https://www.sciencedirect.com/science/article/pii/0022247X78901373.
- Dvurechensky et al. [2020] Pavel Dvurechensky, Petr Ostroukhov, Kamil Safin, Shimrit Shtern, and Mathias Staudigl. Self-concordant analysis of Frank-Wolfe algorithms. In Hal Daume III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 2814–2824, Virtual, 13–18 Jul 2020. PMLR. URL http://proceedings.mlr.press/v119/dvurechensky20a.html. arXiv:2002.04320.
- Dvurechensky et al. [2021] Pavel Dvurechensky, Shimrit Shtern, and Mathias Staudigl. First-order methods for convex optimization. EURO Journal on Computational Optimization, 9:100015, 2021. ISSN 2192-4406. doi: https://doi.org/10.1016/j.ejco.2021.100015. URL https://www.sciencedirect.com/science/article/pii/S2192440621001428. arXiv:2101.00935.
- Dvurechensky et al. [2022] Pavel Dvurechensky, Kamil Safin, Shimrit Shtern, and Mathias Staudigl. Generalized self-concordant analysis of Frank–Wolfe algorithms. Mathematical Programming, 2022. doi: 10.1007/s10107-022-01771-1. URL https://doi.org/10.1007/s10107-022-01771-1.
- Elbassioni et al. [2022] Khaled Elbassioni, Kazuhisa Makino, and Waleed Najy. Finding sparse solutions for packing and covering semidefinite programs. SIAM Journal on Optimization, 32(2):321–353, 2022. doi: 10.1137/20M137570X. URL https://doi.org/10.1137/20M137570X.
- Fawzi et al. [2019] Hamza Fawzi, James Saunderson, and Pablo A Parrilo. Semidefinite approximations of the matrix logarithm. Foundations of Computational Mathematics, 19:259–296, 2019.
- Faybusovich and Zhou [2020] Leonid Faybusovich and Cunlu Zhou. Self-concordance and matrix monotonicity with applications to quantum entanglement problems. Applied Mathematics and Computation, 375:125071, 2020. doi: https://doi.org/10.1016/j.amc.2020.125071. URL https://www.sciencedirect.com/science/article/pii/S0096300320300400.
- Frank and Wolfe [1956] Marguerite Frank and Philip Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95–110, 1956. doi: 10.1002/nav.3800030109. URL https://doi.org/10.1002/nav.3800030109.
- Freund and Grigas [2016] Robert M Freund and Paul Grigas. New analysis and results for the Frank–Wolfe method. Mathematical Programming, 155(1-2):199–230, 2016.
- Goemans and Williamson [1995] Michel X Goemans and David P Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115–1145, 1995.
- Göring et al. [2008] Frank Göring, Christoph Helmberg, and Markus Wappler. Embedded in the shadow of the separator. SIAM Journal on Optimization, 19(1):472–501, 2008. doi: 10.1137/050639430. URL https://doi.org/10.1137/050639430.
- Hale et al. [2008] Elaine T. Hale, Wotao Yin, and Yin Zhang. Fixed-point continuation for -minimization: Methodology and convergence. SIAM Journal on Optimization, 19(3):1107–1130, 2008. doi: 10.1137/070698920. URL https://doi.org/10.1137/070698920.
- Horodecki et al. [1996] Michał Horodecki, Paweł Horodecki, and Ryszard Horodecki. Separability of mixed states: necessary and sufficient conditions. Physics Letters A, 223(1):1–8, 1996. doi: https://doi.org/10.1016/S0375-9601(96)00706-2. URL https://www.sciencedirect.com/science/article/pii/S0375960196007062.
- Iyengar et al. [2011] G. Iyengar, D. J. Phillips, and C. Stein. Approximating semidefinite packing programs. SIAM Journal on Optimization, 21(1):231–268, 2011. doi: 10.1137/090762671. URL https://doi.org/10.1137/090762671.
- Iyengar et al. [2010] Garud Iyengar, David J Phillips, and Cliff Stein. Feasible and accurate algorithms for covering semidefinite programs. Scandinavian Workshop on Algorithm Theory, pages 150–162, 2010.
- Jaggi [2013] Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In International Conference on Machine Learning, pages 427–435, 2013.
- Lacoste-Julien et al. [2013] Simon Lacoste-Julien, Martin Jaggi, Mark Schmidt, and Patrick Pletscher. Block-coordinate Frank-Wolfe optimization for structural SVMs. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 53–61, Atlanta, Georgia, USA, 2013. PMLR. URL https://proceedings.mlr.press/v28/lacoste-julien13.html.
- Lanckriet et al. [2004] Gert RG Lanckriet, Nello Cristianini, Peter Bartlett, Laurent El Ghaoui, and Michael I Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine learning research, 5(Jan):27–72, 2004.
- Laurent and Rendl [2005] Monique Laurent and Franz Rendl. Semidefinite programming and integer programming. Handbooks in Operations Research and Management Science, 12:393–514, 2005.
- Nesterov [2018] Yurii Nesterov. Lectures on Convex Optimization, volume 137 of Springer Optimization and Its Applications. Springer International Publishing, 2018.
- Nesterov and Nemirovski [1994] Yurii Nesterov and Arkadi Nemirovski. Interior Point Polynomial methods in Convex programming. SIAM Publications, 1994.
- Nielsen and Chuang [2010] M. Nielsen and I Chuang. Quantum Computation and Quantum Information. Cambridge: Cambridge University Press., 2010.
- Peng and Wei [2007] Jiming Peng and Yu Wei. Approximating k-means-type clustering via semidefinite programming. SIAM Journal on Optimization, 18(1):186–205, 2007. doi: 10.1137/050641983. URL https://doi.org/10.1137/050641983.
- Sun et al. [2006] Jun Sun, Stephen Boyd, Lin Xiao, and Persi Diaconis. The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem. SIAM Review, 48(4):681–699, 2006. URL http://www.jstor.org/stable/20453871.
- Tibshirani and Taylor [2011] Ryan J. Tibshirani and Jonathan Taylor. The solution path of the generalized lasso. Ann. Statist., 39(3):1335–1371, 2011. doi: 10.1214/11-AOS878. URL https://projecteuclid.org:443/euclid.aos/1304514656.
- Toh et al. [1999] K. C. Toh, M. J. Todd, and R. H. Tütüncü. SDPT3 — a MATLAB software package for semidefinite programming, version 1.3. Optimization Methods and Software, 11(1-4):545–581, 1999. doi: 10.1080/10556789908805762. URL https://doi.org/10.1080/10556789908805762.
- Weinberger and Saul [2009] Kilian Q Weinberger and Lawrence K Saul. Distance metric learning for large margin nearest neighbor classification. Journal of machine learning research, 10(2), 2009.
- Wen et al. [2010] Zaiwen Wen, Wotao Yin, Donald Goldfarb, and Yin Zhang. A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation. SIAM Journal on Scientific Computing, 32(4):1832–1857, 2010. doi: 10.1137/090747695. URL https://doi.org/10.1137/090747695.
- Xiao and Zhang [2013] Lin Xiao and Tong Zhang. A proximal-gradient homotopy method for the sparse least-squares problem. SIAM Journal on Optimization, 23(2):1062–1091, 2013. doi: 10.1137/120869997. URL https://doi.org/10.1137/120869997.
- [39] Alp Yurtsever. SketchyCGAL MATLAB code. URL https://github.com/alpyurtsever/SketchyCGAL.
- Yurtsever et al. [2018] Alp Yurtsever, Olivier Fercoq, Francesco Locatello, and Volkan Cevher. A conditional gradient framework for composite convex minimization with applications to semidefinite programming. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5727–5736. PMLR, 2018.
- Yurtsever et al. [2021] Alp Yurtsever, Joel A. Tropp, Olivier Fercoq, Madeleine Udell, and Volkan Cevher. Scalable semidefinite programming. SIAM Journal on Mathematics of Data Science, 3(1):171–200, 2021. doi: 10.1137/19M1305045. URL https://doi.org/10.1137/19M1305045.
- Zhao [2023] Renbo Zhao. An away-step Frank–Wolfe method for minimizing logarithmically-homogeneous barriers. arXiv preprint arXiv:2305.17808, 2023.
- Zhao and Freund [2022] Renbo Zhao and Robert M. Freund. Analysis of the Frank–Wolfe method for convex composite optimization involving a logarithmically-homogeneous barrier. Mathematical Programming, 2022. doi: 10.1007/s10107-022-01820-9. URL https://doi.org/10.1007/s10107-022-01820-9.
- Zinchenko et al. [2010] Yuriy Zinchenko, Shmuel Friedland, and Gilad Gour. Numerical estimation of the relative entropy of entanglement. Physical Review A, 82(5):052336, 2010.
Appendix A Modifications due to inexact oracles
In this appendix we give the missing proofs of the theoretical statements in Sections 4 and 4.2. Consider the function , where is any member of the set (cf. Definition 4.1). It is easy to see that (5.1) translates to the case with inexact oracle feedback as
(A.1) |
Moreover, one can show that
(A.2) |
A.1 Proof of Proposition 4.3
The proof of this proposition mimics the proof of the exact regime (cf. Proposition 3.2), with some important twists. We split the analysis into two phases, bound the maximum number of iterations the algorithms can spend on these phases, and add these numbers up to obtain the worst-case upper bound.
First, for a round of , we use the abbreviations
We further redefine the phases as (phase I) and (phase II), where .
Note that in the analysis we use the inequality since our goal is to estimate the number of iterations to achieve . Moreover, if the stopping condition holds, we obtain by the definition of the ILMO and inequality (3.5) that
(A.3) |
which implies that . Thus, in the analysis we also use the inequality . We start by showing that the sequence is monotonically decreasing.
Monotonicity of
Consider . Combining the condition of this phase with (A.2), we deduce that . This in turn implies . Hence, at this iteration algorithm chooses the step size . The per-iteration reduction of the potential function can then be estimated just as in the derivation of (5.5):
(A.4) |
Note that the r.h.s. is well-defined since as the stopping condition is not yet achieved. The above inequality, using the definition of the potential function gap as , readily yields
(A.5) |
Now, consider , which means that . There are two cases we need to consider, depending on the step size . We first consider the case where . Then , which implies . From there we conclude that , and, by (3.7) and (2.7), that
Therefore, using again that , we have
(A.6) |
where the last inequality follows from .
Consider now the case where , i.e., . Then, using this stepsize, just as in the derivation of (5.5) and (A.4), we get
(A.7) |
Again, since for , we have . Using the monotonicity of , inequality (A.1), and that for [43, Prop. 1], we can develop the bound
This and (A.7), again by , implies that also in this regime we obtain the estimate
(A.8) |
As we see, the above inequality holds on the whole phase II, both when and when . In particular, since , this together with (A.6) implies that is monotonically decreasing.
We now move to the analysis of the iteration complexity.
Analysis for Phase I
Let denote the increasing set of indices enumerating the elements of the set . Since , it holds that and so . Together with (A.1), this implies
Using the monotonicity of and that for [43, Prop. 1], we further obtain
(A.9) |
By our definition of the inexact oracle, inequalities (3.5) and , we see that
Combining this with (A.9) and the fact that the function is strictly increasing for and , we obtain
(A.10) |
where the first and last inequalities follow from being monotonically decreasing and . Rearranging the terms and defining , which we know to be in since and , it follows from the previous display that
(A.11) |
On the other hand, we have
since for . Moreover, we know that , which gives us in combination with (A.11)
(A.12) |
Solving w.r.t. , we see that the number of iterations in is bounded as
(A.13) |
Analysis for Phase II
Let denote the increasing set of indices enumerating the elements of the set . Let be a bound on the number of iterations until belonging to phase II. We will show that setting is such a bound. Indeed, if , then the proof is done; otherwise, for all , by monotonicity of the sequence . Calling and using the monotonicity of this sequence, we see from (A.8) and (A.3) that, for ,
This implies that, for ,
Telescoping this expression together with , which stems from the entire sequence being monotonically decreasing, we see that
is satisfied for .
We can therefore conclude that the number of iterations required to obtain does not exceed which is exactly the value of , thus concluding the proof.
A.2 Proof of Proposition 4.2
Our analysis concentrates on the behaviour of the algorithm in Phase II, since the estimate for the number of iterations in Phase I still holds. Here we split the iterates of the set into two regimes so that with and . By monotonicity of the potential function gap, we know that the indices of (’high states’) precede those of (’low states’). We bound the size of each of these subsets in order to make smaller than . Since the potential function gap is monotonically decreasing, we can organise the iterates on phase II as and accordingly, for some .
Bound on :
For we know that
(A.14) |
Define for , so that . Then, we obtain the recursion
for and and . By [43, Proposition 4], we can conclude that
Since we know that , we obtain the bound
(A.15) |
Bound on :
On this subset we cannot use the same argument as for the iterates in , since we cannot guarantee that the sequence involved in the previous part of the proof is positive. However, we still know that (A.14) applies for the iterates in . Telescoping this expression over the indices and using the fact that , we see that
It follows
(A.16) |
Combining the bounds:
A.3 Analysis of the outer loop and proof of Theorem 4.4
We set the last iterate of procedure , where is at most by Proposition 4.2. Therefore, we know that , and a fortiori
By eq. 5.12 and the definition of the -ILMO, we obtain
Hence, using Lemma 2.2, we observe
Since and , we obtain . The latter, by the choice of the number of restarts implies that
Our next goal is to estimate the total number of inner iterations based on the estimate for the first epoch and summation over epochs . For each epoch we use the estimate from Proposition 4.2, which we repeat here for convenience
(A.17) |
We see that we need to estimate the quantities and , which is our next step. Consider and observe that
Moreover,
Since , the above two inequalities imply
Whence,
where we used our assignments . In the same way, we obtain
where we again used our assignments .
Using these bounds, our assignments , we see that and the bound (A.3) simplifies to
(A.18) | ||||
(A.19) | ||||
(A.20) |
Using the same bounds for , as in the proof of Theorem 3.3 (see Section 5.2), we obtain that
(A.21) |
Finally, we estimate as follows using (A.3) and that , , , :
Combining the latter with (A.21), we obtain that the leading complexity term is
We remark that assuming that the oracle accuracies have geometric decay is crucial in order to obtain the same dependence on the accuracy in the complexity as in the exact case. Indeed, if the oracle accuracy is fixed to be , then the last term in the bound (A.3) contains a term , which after summation will imply that the leading term in the complexity would be .
Appendix B Added Details to the Numerical Experiments
In this section we report outcomes obtained by extensive numerical experiments illustrating the practical performance of our method. We implement Algorithm 2 using procedure and as solvers for the inner loops. We compare the results of our algorithm to two benchmarks: (1) an interior point solution of the problem, and (2) , implemented following the suggestions in [41]. In order to enable a fair comparison with , both and also used the LMO implementation of the random Lanczos algorithm provided as part of the code. The interior point solution was obtained via the MATLAB-based modelling interface CVX [7] (version 2.2) with the SDPT3 solver (version 4.0) [35]. The implementation of CGAL was taken from [39]. All experiments were conducted on an Intel(R) Xeon(R) Gold 6254 CPU @3.10GHz server limited to 4 threads per run and 512 GB total RAM, using MATLAB R2019b.
B.1 MaxCut Problem
The following class of SDPs is studied in [2]. This SDP arises in many algorithms, such as approximating MaxCut, approximating the cut norm of a matrix, and approximating solutions to the little Grothendieck problem [6, 3]. The optimization problem is defined as
(MAXQP) | ||||||
s.t. | ||||||
Let , and consider the conic hull of , defined as . This set admits the -logarithmically homogeneous barrier
We can then reformulate (MAXQP) as
(B.1) | ||||||
s.t. | ||||||
where is a linear homogeneous mapping from to . Set for .
We apply this formulation to the classical MaxCut problem: given an undirected graph with vertex set and edge set , we seek the maximum-weight cut of the graph. The MaxCut problem is formulated as , where is the combinatorial Laplacian matrix of the graph . A convex relaxation of this NP-hard combinatorial optimization problem is given by the SDP [19]
which is the form of (MAXQP) with .
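The identity underlying this relaxation, namely that a ±1 vertex labeling x cuts weight (1/4) xᵀLx, can be verified numerically. The sketch below (with hypothetical helper names) builds the combinatorial Laplacian of a small graph and evaluates one cut.

```python
import numpy as np

def laplacian_from_edges(n, edges):
    """Combinatorial Laplacian of an undirected graph given as (i, j, w) edges."""
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

# Triangle graph with unit weights; the cut {0} vs {1, 2} has weight 2.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0)]
L = laplacian_from_edges(3, edges)
x = np.array([1.0, -1.0, -1.0])   # +-1 labels encode the cut
cut_value = 0.25 * x @ L @ x      # weight of edges crossing the cut
```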
To evaluate the performance of the tested algorithms on this problem, we consider the random graphs G1-G30, published online in [1]. The and methods were applied to all datasets with parameters and . The solutions were compared to the solution obtained from running , using the original code applied to the scaled problem and assuming that all constraints are satisfied with equality. The computations were stopped after either seconds of running time. The benchmark solutions of the interior point method for each dataset are displayed in Table 1. Table 1 also displays the size of each dataset, the value obtained by solving the MaxCut SDP relaxation using CVX with the SDPT3 solver, and , that is, the minimum eigenvalue of the solution. We observe that for larger graphs the interior-point solver SDPT3 returns infeasible solutions, displaying negative eigenvalues. If this occurs, the value obtained by a corrected solution is used as a reference value instead. This corrected solution is constructed as , where is the identity matrix and is the minimal number for which for all . The objective value of this corrected solution is also recorded in Table 1 as the feasible value (Feas. value). We therefore label the optimal solution as the minimum of and the minimal value obtained by any of the methods. The performance measure we employ to benchmark our experimental results is the relative optimality gap, defined as . Moreover, since CGAL does not necessarily generate feasible solutions, its reported solution is corrected to the feasible one , where is the relative feasibility gap of . In particular, for every dataset , method , and parameter choice , after each iteration we record the objective value of the solution after this iteration (or, in the case of , the corrected solution), the corresponding relative optimality gap , the time from the start of the run and, in the case of , the feasibility gap of the current solution.
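The eigenvalue-correction step described above can be sketched as follows: shift the returned matrix by the smallest multiple of the identity that removes its negative eigenvalues. This is an illustrative version of the repair (helper name is ours), not the exact code used in the experiments.

```python
import numpy as np

def psd_correction(X):
    """Shift X by the smallest multiple of the identity that makes it PSD,
    a standard repair for solutions with slightly negative eigenvalues."""
    lam = max(0.0, -np.linalg.eigvalsh(X).min())
    return X + lam * np.eye(X.shape[0]), lam

X = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1
Xc, lam = psd_correction(X)              # Xc is PSD with eigenvalues 4 and 0
```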
Figures 4-7 illustrate the performance of our methods for various parameter values and datasets G1-G30, versus both iteration count and time. More specifically, Figures 4 and 6 display, for each iteration , the best relative optimality gap attained up to that iteration; Figures 5 and 7 display, for each time point , the best relative optimality gap obtained up to time . Additionally, Tables 2 and 3 report the performance of the tested methods for each dataset and method at specific time points of seconds. Specifically, for each of the time points we compute , and then display the relative optimality gap (Opt. Gap) , the relative feasibility gap for (Feas. Gap), and the number of iterations performed until that time point (Iter). The smallest relative optimality gap for each dataset and time is marked in bold.
Figures 4 and 6 show that the performances of and are similar across iterations, with having a slight advantage. Further, the performance of both methods remains nearly unchanged for different values of . Note that, for a low number of iterations (smaller than 1000), and generally perform better than . However, beyond that point, exhibits a sudden improvement in objective value, leading to better performance. When looking at the methods’ performance over time in Figures 5 and 7, the initial advantage of and seems to disappear, due to the lower per-iteration cost of . However, a closer look at the results in Tables 2 and 3 reveals that achieves the lowest optimality gap in 7 out of 30 datasets after 1000 seconds, and in 13 out of 30 datasets after 10000 seconds. The former instances can be identified by the fact that obtains an optimality gap higher than after 100 seconds, and has a very slow improvement rate after that point. In contrast, obtains a higher optimality gap of approximately at that time but manages to achieve better performance due to its superior optimality-gap reduction rate. Additionally, we observe that the costly iterations of result in inferior performance over time compared to , as it can only complete a tenth of the iterations within the same time frame. In summary, while manages to obtain a less than optimality gap quite quickly, in about a third of the instances it gets ‘stuck’ and does not significantly improve the objective function, whereas and are more consistent across the instances: they start from a higher optimality gap but make progress at a constant rate throughout their execution, with being more efficient in terms of time complexity.
Dataset | #Nodes | #Edges | Obj. value | Feas. value | |
G1 | 800 | 19176 | 12083.2 | 0 | - |
G2 | 800 | 19176 | 12089.43 | 0 | - |
G3 | 800 | 19176 | 12084.33 | 0 | - |
G4 | 800 | 19176 | 12111.5 | 0 | - |
G5 | 800 | 19176 | 12100 | 0 | - |
G6 | 800 | 19176 | 2667 | 0 | - |
G7 | 800 | 19176 | 2494.25 | 0 | - |
G8 | 800 | 19176 | 2535.25 | 0 | - |
G9 | 800 | 19176 | 12111.5 | 0 | - |
G10 | 800 | 19176 | 12111.5 | 0 | - |
G11 | 800 | 1600 | 634.83 | 0 | - |
G12 | 800 | 1600 | 628.41 | 0 | - |
G13 | 800 | 1600 | 651.54 | 0 | - |
G14 | 800 | 1600 | 12111.5 | 0 | - |
G15 | 800 | 4661 | 3171.56 | 0 | - |
Dataset | #Nodes | #Edges | Obj. value | Feas. value |
G16 | 800 | 4672 | 3175.02 | 0 | - |
G17 | 800 | 4667 | 3171.25 | 0 | - |
G18 | 800 | 4694 | 1173.75 | 0 | - |
G19 | 800 | 4661 | 1091 | 0 | - |
G20 | 800 | 4672 | 1119 | 0 | - |
G21 | 800 | 4667 | 1112.06 | 0 | - |
G22 | 2000 | 19990 | 18276.89 | -1 | 14135.95 |
G23 | 2000 | 19990 | 18289.22 | -1 | 14142.11 |
G24 | 2000 | 19990 | 18286.75 | -1 | 9997.75 |
G25 | 2000 | 19990 | 18293.5 | -1 | 9996 |
G26 | 2000 | 19990 | 18270.24 | -1 | 14132.87 |
G27 | 2000 | 19990 | 8304.5 | -1 | 4141.75 |
G28 | 2000 | 19990 | 8253.5 | -1 | 4100.75 |
G29 | 2000 | 19990 | 8377.75 | -1 | 4209 |
G30 | 2000 | 19990 | 8381 | -1 | 4215.5 |




Data | Time | CGAL | LCG 0.25 | LCG 0.5 | LCG 0.9 | CG 0.25 | CG 0.5 | CG 0.9 | ||||||||
Set | (sec) | Opt. | Feas. | Iter | Opt. | Iter | Opt. | Iter | Opt. | Iter | Opt. | Iter | Opt. | Iter | Opt. | Iter |
Gap % | Gap % | Gap % | Gap % | Gap % | Gap % | Gap % | Gap % | |||||||||
G1 | 100 | 0.0452 | 0.0427 | 21685 | 5.3256 | 1650 | 5.1013 | 1592 | 6.3882 | 1494 | 1.1139 | 19046 | 1.1743 | 18468 | 1.2960 | 17742 |
1000 | 0.0100 | 0.0092 | 144575 | 1.8670 | 17024 | 1.5269 | 16930 | 1.7090 | 16589 | 0.3782 | 126215 | 0.3774 | 124709 | 0.4113 | 127334 | |
10000 | 0.0010 | 0.0010 | 888035 | 0.4146 | 134546 | 0.3628 | 141906 | 0.3890 | 154157 | 0.1312 | 786346 | 0.1322 | 728109 | 0.1565 | 730861 | |
G2 | 100 | 0.0443 | 0.0416 | 20514 | 5.1990 | 1673 | 4.9872 | 1668 | 6.1180 | 1614 | 1.1186 | 20079 | 1.1583 | 17849 | 1.3328 | 17078 |
1000 | 0.0076 | 0.0074 | 135864 | 1.8878 | 17695 | 1.5429 | 17554 | 1.5702 | 17592 | 0.3857 | 136275 | 0.3675 | 120983 | 0.4254 | 121586 | |
10000 | 0.0013 | 0.0012 | 816010 | 0.3419 | 150126 | 0.3133 | 153430 | 0.4046 | 148095 | 0.1317 | 763621 | 0.1463 | 687932 | 0.1571 | 715246 | |
G3 | 100 | 0.0641 | 0.0622 | 21664 | 5.9692 | 1416 | 5.2227 | 1629 | 6.1860 | 1585 | 1.1050 | 20425 | 1.1555 | 17401 | 1.3028 | 17647 |
1000 | 0.0074 | 0.0071 | 141485 | 1.8837 | 17015 | 1.3351 | 17410 | 1.5635 | 17266 | 0.4115 | 124686 | 0.3536 | 120982 | 0.4347 | 116904 | |
10000 | 0.0008 | 0.0008 | 892189 | 0.3171 | 151626 | 0.3283 | 156878 | 0.3777 | 156813 | 0.1326 | 763597 | 0.1405 | 649211 | 0.1665 | 669949 | |
G4 | 100 | 0.0686 | 0.0666 | 19922 | 5.6793 | 1487 | 5.7132 | 1355 | 6.9583 | 1208 | 1.0994 | 19435 | 1.1341 | 18535 | 1.2752 | 17574 |
1000 | 0.0077 | 0.0073 | 133750 | 2.1315 | 13938 | 1.4965 | 14001 | 1.7269 | 14378 | 0.3918 | 124072 | 0.3522 | 125967 | 0.4050 | 123681 | |
10000 | 0.0008 | 0.0008 | 892189 | 0.3171 | 151626 | 0.3283 | 156878 | 0.3777 | 156813 | 0.1326 | 763597 | 0.1405 | 649211 | 0.1665 | 669949 | |
G5 | 100 | 0.0668 | 0.0644 | 20334 | 5.8090 | 1509 | 5.7594 | 1379 | 6.8983 | 1338 | 1.1164 | 20254 | 1.1668 | 16756 | 1.2883 | 17533 |
1000 | 0.0125 | 0.0118 | 135291 | 2.0338 | 14165 | 1.3668 | 14650 | 1.7768 | 14696 | 0.4197 | 124738 | 0.3487 | 116404 | 0.4222 | 118817 | |
10000 | 0.0008 | 0.0007 | 888095 | 0.3882 | 137871 | 0.3553 | 143148 | 0.3893 | 146200 | 0.1322 | 772666 | 0.1335 | 714684 | 0.1643 | 694540 | |
G6 | 100 | 0.4803 | 0.0682 | 19842 | 8.2189 | 1541 | 7.2658 | 1426 | 7.0960 | 1293 | 2.5433 | 16239 | 2.0011 | 14659 | 2.1106 | 12283 |
1000 | 0.4179 | 0.0100 | 133015 | 1.8849 | 16247 | 1.7304 | 14630 | 1.5365 | 14492 | 0.6772 | 109973 | 0.5733 | 94957 | 0.6847 | 86027 | |
10000 | 0.4078 | 0.0009 | 937170 | 0.4791 | 172357 | 0.4163 | 177925 | 0.3755 | 163790 | 0.2108 | 720098 | 0.2232 | 581993 | 0.2106 | 566692 | |
G7 | 100 | 0.2743 | 0.0735 | 19832 | 9.9632 | 1282 | 7.2801 | 1206 | 8.0242 | 1129 | 2.2855 | 16331 | 1.9467 | 13740 | 2.1709 | 12006 |
1000 | 0.2068 | 0.0110 | 133207 | 2.1909 | 13391 | 1.7102 | 12885 | 1.7527 | 12644 | 0.6705 | 113963 | 0.6946 | 91575 | 0.6064 | 89294 | |
10000 | 0.1959 | 0.0009 | 813946 | 0.4959 | 142268 | 0.4302 | 129157 | 0.4549 | 124502 | 0.2127 | 657896 | 0.2206 | 525425 | 0.2223 | 510794 | |
G8 | 100 | 0.4518 | 0.0627 | 20101 | 8.5605 | 1370 | 7.5897 | 1277 | 7.9211 | 1179 | 2.3751 | 15179 | 2.1037 | 13710 | 2.1617 | 12686 |
1000 | 0.3945 | 0.0102 | 134291 | 2.0238 | 13734 | 1.7576 | 13463 | 1.6682 | 13493 | 0.7010 | 104772 | 0.5873 | 92671 | 0.6569 | 84183 | |
10000 | 0.3839 | 0.0009 | 820939 | 0.5342 | 137709 | 0.4611 | 133831 | 0.4212 | 129272 | 0.2134 | 566425 | 0.2330 | 566242 | 0.2457 | 480991 | |
G9 | 100 | 0.3402 | 0.0741 | 20486 | 8.6602 | 1454 | 6.8707 | 1402 | 6.7527 | 1364 | 2.5177 | 17074 | 2.0543 | 14883 | 2.1253 | 12280 |
1000 | 0.2705 | 0.0105 | 137199 | 1.9809 | 15167 | 1.7249 | 14672 | 1.5419 | 14895 | 0.6765 | 113442 | 0.5708 | 101335 | 0.6687 | 78602 | |
10000 | 0.2603 | 0.0011 | 839573 | 0.5128 | 148527 | 0.4597 | 141307 | 0.4304 | 137229 | 0.2126 | 578638 | 0.2375 | 561210 | 0.2158 | 492155 | |
G10 | 100 | 0.4693 | 0.0931 | 20176 | 9.1327 | 1365 | 7.0820 | 1293 | 7.5663 | 1210 | 2.3524 | 16086 | 2.0193 | 13699 | 2.1163 | 12605 |
1000 | 0.3829 | 0.0118 | 134447 | 2.0525 | 13854 | 1.7098 | 13728 | 1.5771 | 13428 | 0.6489 | 95647 | 0.5940 | 95183 | 0.5856 | 89508 | |
10000 | 0.3711 | 0.0008 | 843504 | 0.5452 | 140556 | 0.4503 | 139367 | 0.4136 | 136075 | 0.2046 | 572185 | 0.2221 | 525312 | 0.2224 | 502739 | |
G11 | 100 | 0.9391 | 0.0293 | 52396 | 7.6742 | 2344 | 6.8577 | 2332 | 7.4084 | 2073 | 2.1040 | 42363 | 1.8304 | 36848 | 1.7089 | 39655 |
1000 | 0.9021 | 0.0045 | 343271 | 2.3392 | 25386 | 1.9903 | 24475 | 2.0117 | 24162 | 0.6135 | 361872 | 0.5343 | 341395 | 0.5732 | 309567 | |
10000 | 0.8946 | 0.0007 | 1823694 | 0.7343 | 196738 | 0.7811 | 193144 | 0.6929 | 187647 | 0.2105 | 2176005 | 0.2228 | 1945371 | 0.2105 | 1813119 | |
G12 | 100 | 0.7673 | 0.0295 | 52859 | 8.1471 | 2120 | 7.0435 | 2302 | 7.3282 | 2236 | 2.1726 | 38919 | 1.8728 | 37346 | 1.7495 | 37614 |
1000 | 0.7294 | 0.0033 | 342496 | 2.4091 | 24045 | 2.1357 | 24512 | 2.3042 | 23901 | 0.6337 | 325029 | 0.5363 | 314001 | 0.5735 | 276203 | |
10000 | 0.7236 | 0.0007 | 1901444 | 0.7065 | 228102 | 0.5723 | 222533 | 0.6089 | 205372 | 0.2002 | 2311377 | 0.2094 | 2029389 | 0.2037 | 1692541 | |
G13 | 100 | 0.7167 | 0.0260 | 50953 | 7.5696 | 2414 | 7.2154 | 1830 | 6.6205 | 2339 | 2.1816 | 42733 | 1.9334 | 37760 | 1.8089 | 36572 |
1000 | 0.6846 | 0.0050 | 332950 | 2.4416 | 24491 | 2.1377 | 23816 | 1.9294 | 25070 | 0.6357 | 343063 | 0.5274 | 312566 | 0.5570 | 279764 | |
10000 | 0.6775 | 0.0005 | 1843586 | 0.7297 | 211955 | 0.7435 | 204609 | 0.6543 | 195695 | 0.1995 | 1887493 | 0.2062 | 1895654 | 0.1948 | 1547687 | |
G14 | 100 | 0.0488 | 0.0401 | 36050 | 21.069 | 1686 | 16.422 | 1764 | 14.932 | 1636 | 2.9552 | 32769 | 2.6901 | 30857 | 2.3423 | 28826 |
1000 | 0.0062 | 0.0051 | 247958 | 4.4529 | 18672 | 3.5753 | 18026 | 3.3863 | 17441 | 0.8246 | 243773 | 0.7049 | 220892 | 0.6569 | 206482 | |
10000 | 0.0010 | 0.0006 | 1653247 | 0.9661 | 175901 | 0.7821 | 173581 | 0.6847 | 186339 | 0.2270 | 1629850 | 0.2462 | 1237850 | 0.2253 | 1212944 | |
G15 | 100 | 0.0495 | 0.0411 | 36866 | 18.219 | 2085 | 15.498 | 2061 | 14.534 | 2039 | 3.2756 | 35930 | 2.7522 | 29819 | 2.5041 | 27217 |
1000 | 0.0083 | 0.0064 | 251117 | 4.1913 | 22101 | 3.2621 | 21712 | 3.0489 | 20963 | 0.8459 | 255448 | 0.7324 | 225802 | 0.6674 | 204773 | |
10000 | 0.0011 | 0.0007 | 1579597 | 0.9704 | 196785 | 0.7993 | 192710 | 0.7211 | 183253 | 0.2433 | 1550812 | 0.2374 | 1314984 | 0.2514 | 1151339 | |
G16 | 100 | 0.0365 | 0.0287 | 41046 | 16.989 | 2178 | 14.670 | 2150 | 13.058 | 2034 | 2.7754 | 33679 | 2.5751 | 31471 | 2.3225 | 30479 |
1000 | 0.0061 | 0.0049 | 271597 | 3.9114 | 22674 | 3.1580 | 22386 | 2.8169 | 21605 | 0.7870 | 260408 | 0.7366 | 228063 | 0.7303 | 222851 | |
10000 | 0.0010 | 0.0008 | 1679075 | 0.8573 | 224493 | 0.6819 | 227033 | 0.6209 | 224707 | 0.2386 | 1499401 | 0.2269 | 1423609 | 0.2134 | 1430100 | |
G17 | 100 | 0.0518 | 0.0434 | 37335 | 20.096 | 1707 | 17.255 | 1757 | 14.974 | 1703 | 3.2235 | 33611 | 2.7927 | 29911 | 2.6083 | 26684 |
1000 | 0.0090 | 0.0075 | 253376 | 4.5791 | 18601 | 3.6161 | 18267 | 3.4578 | 17899 | 0.8749 | 243142 | 0.7551 | 216924 | 0.6898 | 205937 | |
10000 | 0.0009 | 0.0006 | 1650749 | 0.9701 | 232319 | 0.7884 | 226754 | 0.6712 | 220683 | 0.2417 | 1537197 | 0.2402 | 1352397 | 0.2139 | 1371570 | |
G18 | 100 | 0.7302 | 0.0560 | 38999 | 11.727 | 1795 | 10.971 | 1780 | 9.5330 | 1586 | 2.4057 | 24255 | 2.0556 | 22388 | 2.0462 | 20555 |
1000 | 0.6753 | 0.0084 | 260520 | 2.7593 | 18331 | 2.1365 | 18265 | 2.1493 | 17040 | 0.7666 | 185915 | 0.6925 | 159283 | 0.6533 | 141416 | |
10000 | 0.6645 | 0.0006 | 1599182 | 0.6728 | 182254 | 0.4799 | 183154 | 0.5566 | 166432 | 0.2586 | 1061813 | 0.2166 | 962496 | 0.2154 | 879504 |
G19 | 100 | 0.8967 | 0.0492 | 37083 | 12.059 | 1630 | 10.580 | 1541 | 10.506 | 1527 | 2.3693 | 25356 | 2.2978 | 22770 | 2.0119 | 20054 |
1000 | 0.8444 | 0.0066 | 254368 | 2.6314 | 16530 | 2.1796 | 16473 | 2.1604 | 15966 | 0.6124 | 162830 | 0.6152 | 137255 | 0.6637 | 129524 | |
10000 | 0.8351 | 0.0006 | 1723323 | 0.5384 | 205382 | 0.5075 | 194687 | 0.4851 | 186757 | 0.2624 | 1026619 | 0.2506 | 842195 | 0.2320 | 756654 | |
G20 | 100 | 0.7363 | 0.0538 | 36281 | 12.956 | 1486 | 10.386 | 1593 | 10.453 | 1462 | 2.8064 | 23999 | 2.4927 | 21325 | 2.2067 | 19130 |
1000 | 0.6819 | 0.0062 | 251142 | 2.9914 | 15873 | 2.4176 | 15989 | 2.2335 | 16143 | 0.7705 | 162281 | 0.7034 | 144692 | 0.6373 | 132063 | |
10000 | 0.6742 | 0.0006 | 1716273 | 0.6061 | 208638 | 0.5587 | 206001 | 0.4746 | 187323 | 0.2475 | 1068889 | 0.2064 | 924839 | 0.2231 | 790836 | |
G21 | 100 | 0.7840 | 0.0726 | 41830 | 10.288 | 1935 | 7.7700 | 1998 | 8.3024 | 1955 | 2.4996 | 24785 | 2.0761 | 20738 | 2.0545 | 20794 |
1000 | 0.7121 | 0.0102 | 275681 | 2.1759 | 20341 | 1.8593 | 21211 | 1.8629 | 20506 | 0.7668 | 160845 | 0.5995 | 139988 | 0.6157 | 136838 | |
10000 | 0.7005 | 0.0008 | 1610239 | 0.4848 | 188165 | 0.4806 | 187510 | 0.4938 | 178700 | 0.2522 | 1016576 | 0.2235 | 914508 | 0.2284 | 824945 | |
G22 | 100 | 0.1174 | 0.1139 | 12416 | 18.1438 | 300 | 16.5330 | 288 | 18.2411 | 248 | 3.7563 | 7042 | 2.9849 | 6751 | 3.4068 | 5745 |
1000 | 0.0193 | 0.0187 | 90981 | 4.7658 | 3319 | 4.3706 | 3215 | 5.5525 | 2941 | 0.7828 | 64426 | 0.6636 | 59780 | 0.7353 | 58167 | |
10000 | 0.0022 | 0.0021 | 632496 | 1.0548 | 34970 | 1.1614 | 31913 | 1.1801 | 31293 | 0.2095 | 497773 | 0.1931 | 447623 | 0.2247 | 449915 | |
G23 | 100 | 0.1023 | 0.0980 | 12410 | 18.5561 | 292 | 17.3589 | 250 | 17.8255 | 256 | 3.6153 | 6927 | 2.9642 | 6491 | 3.3613 | 5583 |
1000 | 0.0149 | 0.0143 | 90915 | 4.7672 | 3275 | 4.3952 | 2940 | 5.3747 | 3069 | 0.7403 | 60120 | 0.6684 | 57688 | 0.7333 | 57294 | |
10000 | 0.0021 | 0.0020 | 631480 | 1.0797 | 33094 | 1.1388 | 31910 | 1.1669 | 32278 | 0.2062 | 478154 | 0.1923 | 440393 | 0.2229 | 449699 | |
G24 | 100 | 0.1257 | 0.1222 | 12342 | 20.1997 | 251 | 17.6862 | 259 | 18.7144 | 242 | 3.8042 | 6856 | 3.1557 | 6188 | 3.4891 | 5526 |
1000 | 0.0164 | 0.0157 | 90383 | 5.1886 | 2825 | 3.8498 | 2957 | 5.5847 | 2858 | 0.8075 | 61881 | 0.6991 | 55330 | 0.7356 | 57577 | |
10000 | 0.0020 | 0.0018 | 624799 | 1.2467 | 30859 | 1.1084 | 31608 | 1.1785 | 30999 | 0.2195 | 475491 | 0.2035 | 412481 | 0.2309 | 438368 | |
G25 | 100 | 0.1057 | 0.1012 | 12533 | 18.5114 | 300 | 16.3371 | 284 | 17.9667 | 256 | 3.5888 | 6971 | 2.9573 | 6687 | 3.4011 | 5701 |
1000 | 0.0135 | 0.0128 | 91948 | 4.7165 | 3368 | 4.4134 | 3115 | 5.4227 | 3022 | 0.7561 | 63518 | 0.6510 | 60912 | 0.7309 | 58963 | |
10000 | 0.0018 | 0.0017 | 637281 | 1.0292 | 34064 | 1.1919 | 32793 | 1.1593 | 32361 | 0.2028 | 491183 | 0.1930 | 448653 | 0.2257 | 452544 | |
G26 | 100 | 0.1375 | 0.1337 | 12736 | 20.6034 | 248 | 17.1955 | 269 | 18.7351 | 245 | 3.7270 | 6860 | 2.9696 | 6664 | 3.4397 | 5703 |
1000 | 0.0151 | 0.0145 | 92871 | 4.9788 | 3025 | 4.4023 | 3025 | 5.7863 | 2932 | 0.7733 | 62802 | 0.6581 | 60555 | 0.7417 | 59907 | |
10000 | 0.0022 | 0.0021 | 635639 | 1.0644 | 32298 | 0.9407 | 32064 | 1.1557 | 32364 | 0.2074 | 490930 | 0.1957 | 442197 | 0.2317 | 461381 | |
G27 | 100 | 0.3024 | 0.2283 | 11530 | 28.4263 | 195 | 25.0943 | 214 | 27.2575 | 174 | 4.7957 | 6361 | 4.0809 | 6127 | 4.4816 | 4712 |
1000 | 0.0877 | 0.0246 | 82946 | 8.2273 | 2240 | 6.4036 | 2407 | 5.7783 | 2351 | 0.8400 | 55681 | 0.8083 | 52249 | 0.7602 | 47704 | |
10000 | 0.0646 | 0.0031 | 609686 | 1.5543 | 25336 | 0.9938 | 25525 | 0.9926 | 25790 | 0.0102 | 390126 | 0.0000 | 378324 | 0.0409 | 344793 | |
G28 | 100 | 0.3037 | 0.1984 | 12418 | 23.2191 | 281 | 22.2085 | 280 | 23.0712 | 226 | 4.6768 | 6415 | 4.1060 | 6054 | 4.3763 | 4992 |
1000 | 0.1194 | 0.0240 | 91115 | 6.2536 | 3044 | 4.6599 | 3113 | 5.0701 | 3019 | 0.7727 | 54325 | 0.8275 | 54898 | 0.7498 | 50886 | |
10000 | 0.0964 | 0.0026 | 634335 | 1.3236 | 33006 | 0.8704 | 33326 | 0.7890 | 32323 | 0.0216 | 393334 | 0.0000 | 397716 | 0.0190 | 377339 | |
G29 | 100 | 0.1737 | 0.1651 | 12428 | 22.335 | 291 | 22.374 | 276 | 22.594 | 244 | 4.8811 | 6431 | 4.1196 | 6132 | 4.4864 | 4976 |
1000 | 0.0238 | 0.0216 | 91157 | 5.9446 | 3169 | 5.2525 | 3145 | 5.0132 | 3068 | 0.9131 | 57929 | 0.9055 | 50524 | 0.8096 | 51421 | |
10000 | 0.0025 | 0.0022 | 637658 | 1.2430 | 33012 | 0.8851 | 33265 | 0.9036 | 32289 | 0.1865 | 421074 | 0.0965 | 375627 | 0.1135 | 356157 |
G30 | 100 | 0.2294 | 0.2176 | 11582 | 26.6363 | 217 | 22.9201 | 208 | 26.8685 | 190 | 4.8414 | 6122 | 4.0918 | 6027 | 4.4921 | 4685 |
1000 | 0.0300 | 0.0254 | 86515 | 7.8141 | 2482 | 5.3362 | 2403 | 6.1525 | 2295 | 0.8334 | 53756 | 0.7499 | 51787 | 0.7489 | 47761 | |
10000 | 0.0048 | 0.0017 | 616813 | 1.0134 | 27305 | 1.1942 | 24995 | 0.9684 | 24960 | 0.1127 | 388638 | 0.0027 | 364280 | 0.0314 | 353046 |
B.2 Markov chain mixing rate
We next consider the problem of finding the fastest mixing rate of a Markov chain on a graph [33]. In this problem, a symmetric Markov chain is defined on an undirected graph . Given and weights for , we are tasked with finding the transition rates for each , with weighted sum smaller than 1, that result in the fastest mixing rate. We assume that . The mixing rate is given by the second-smallest eigenvalue of the graph’s Laplacian matrix, described as
From it follows that is a stationary distribution of the process. From [9], we have the following bound on the total variation distance between the time distribution of the Markov process and the stationary distribution . The quantity is called the mixing time, and is the second-largest eigenvalue of the graph Laplacian . To bound this eigenvalue, we follow the strategy laid out in [33]. Thus, the problem can be written as
s.t. | |||
This problem can also alternatively be formulated as
s.t. | |||
Due to the properties of the Laplacian, the first constraint can be reformulated as
(B.2) | ||||
s.t. | (B.3) | |||
(B.4) |
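The formulation above is governed by the second-smallest eigenvalue of the weighted graph Laplacian. As an illustration only (not the authors' code, and with our own function names), a minimal numpy sketch assembles the Laplacian from an edge list and extracts that eigenvalue:

```python
import numpy as np

def weighted_laplacian(n, edges, weights):
    """Assemble L = sum over edges (i,j) of w_ij (e_i - e_j)(e_i - e_j)^T."""
    L = np.zeros((n, n))
    for (i, j), w in zip(edges, weights):
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def second_smallest_eigenvalue(L):
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order,
    # so index 1 is the second-smallest (the algebraic connectivity)
    return np.linalg.eigvalsh(L)[1]
```

For a path graph on three nodes with unit weights, the Laplacian eigenvalues are 0, 1, 3, so the routine returns 1.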
The dual problem is then given by
(B.5) |
To obtain an SDP within our convex programming model, we combine arguments from [33] and [2]. Let be a feasible point for (B.5). Then, there exists a matrix such that . It is easy to see that multiplying each row of the matrix by an orthogonal matrix does not change the feasibility of the candidate solution. Moreover, for all , so that . Since shifting each vector by a constant vector would not affect the objective or constraints, we can, without loss of generality, set . This gives the equivalent optimization problem
(B.6) |
This is the geometric dual derived in [33], which is strongly connected to the geometric embedding of a graph in the plane [20]. Thus, setting again , we obtain that for all , thus reducing the dimension of the problem, and (B.6) can be reformulated as follows:
(B.7) |
Moreover, since , it follows that , and so for any such that , . Therefore, since the graph is connected, we can recursively bound each diagonal element of , and therefore the trace, by a constant , and add the trace constraint to the problem formulation without affecting the optimal values. Furthermore, defining by
Let and . This is a closed convex cone with logarithmically homogeneous barrier
This gives a logarithmically homogeneous barrier , where . We thus arrive at a formulation of the form of problem (P), which reads explicitly as
(B.8) |
Dataset | # Nodes | # Edges | SDPT3 Value |
1 | 100 | 1000 | 15.62 |
2 | 100 | 2000 | 7.93 |
3 | 200 | 1000 | 72.32 |
4 | 200 | 4000 | 17.84 |
5 | 400 | 1000 | 388.52 |
6 | 400 | 8000 | 38.37 |
7 | 800 | 4000 | 333.25 |
8 | 800 | 16000 | 80.03 |
We generated random connected undirected graphs of various sizes, and for each edge in the graph we generated a random weight uniformly in . Table 4 provides the size of each dataset and the value obtained by solving the mixing problem SDP using CVX with the SDPT3 solver. In order to compare with the algorithm, we used the normalisation guidelines specified in [41], so that the norms of the coefficient vectors of the constraints are equal to each other, and the norms of the constraint matrix and the objective matrix equal 1.
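The graph generation step can be sketched as follows. Since the exact weight range is elided in the text above, the sketch assumes weights drawn uniformly from [0, 1); connectivity is guaranteed by building a random spanning tree first and then adding extra random edges. All names are ours:

```python
import random

def random_connected_graph(n, m, seed=0):
    """Random connected undirected graph with n nodes, m edges,
    and a uniform random weight per edge (assumed range [0, 1))."""
    rng = random.Random(seed)
    nodes = list(range(n))
    rng.shuffle(nodes)
    edges = set()
    # spanning tree: attach each node to a random earlier node
    for k in range(1, n):
        u, v = nodes[k], nodes[rng.randrange(k)]
        edges.add((min(u, v), max(u, v)))
    # fill up with random extra edges until m distinct edges exist
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    weights = {e: rng.random() for e in edges}
    return edges, weights
```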
All datasets were run for both and options with the following choice of parameters and . All methods were stopped when they either reached iterations or seconds of running time, whichever came first. Since does not generate feasible solutions, we computed the deviation from feasibility of its output solution by
We then corrected the solution to a feasible solution ; however, in our numerical experiments it turned out that was extremely large, which resulted in being very close to the zero matrix, with very poor performance. Therefore, we used the following alternative iterative correction method: we set , and at each iteration we identify which leads to the maximum value for , set , and set
We terminate the algorithm when . The value of the returned is then computed. As in the previous section, one of our performance measures is the relative optimality gap between the (corrected) solution of each method and the solution. Specifically, for each dataset , method and parameter combination , we record the objective value (or, for , the corrected objective value) after iteration , the corresponding relative optimality gap , the time elapsed since the start of the run and, only for , the relative feasibility gap .
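The precise update rule of this correction loop is partially lost in the extracted text, so the sketch below only illustrates the general shape of such a scheme: repeatedly find the most violated inequality constraint ⟨A_i, X⟩ ≤ b_i and mix the iterate toward a strictly feasible point. The function names, step size, and the choice of mixing toward `X_feas` are all hypothetical:

```python
import numpy as np

def correct_to_feasible(X, A_list, b, X_feas, step=0.05, max_iter=1000):
    """Illustrative correction: while some constraint <A_i, X> <= b_i is
    violated, move X a small step toward a strictly feasible X_feas."""
    X = X.copy()
    for _ in range(max_iter):
        # worst (largest) violation over all inequality constraints
        worst = max(float(np.sum(A * X)) - bi for A, bi in zip(A_list, b))
        if worst <= 0.0:
            break
        X = (1.0 - step) * X + step * X_feas
    return X
```

Note that a convex combination of PSD matrices stays PSD, so the correction preserves the conic constraint whenever `X_feas` does.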
Figures 8-9 illustrate the results for the tested and values. Figure 8 displays, for each iteration , the best relative optimality gap up to that iteration. Figure 9 displays, for each time , the best relative optimality gap obtained up to time . Table 5 reports the performance of the tested methods at time points seconds, in order to examine their algorithmic behavior under different time constraints. Specifically, given a data set , a method , parameters and a time point , for and we compute , so the table contains: the parameter values , the corresponding objective function value (Obj. Value) , the relative optimality gap (Opt. Gap) , and the number of iterations (Iter) performed until that time, given by . For , since there is no parameter choice, we retrieve , and the table records the corresponding objective value (Obj. Value) , the relative optimality gap (Opt. Gap) , the relative feasibility gap (Feas. Gap) , as well as the number of iterations (Iter) given by . The lowest relative optimality gap for each data set and time is marked in bold.
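The "best gap up to iteration / time " quantity plotted in Figures 8 and 9 is simply a running minimum of the per-iteration relative optimality gaps. A small helper (our own, not from the paper) makes this precise:

```python
import numpy as np

def best_gap_so_far(gaps):
    """Running minimum of per-iteration relative optimality gaps,
    i.e. the quantity plotted against iteration or elapsed time."""
    return np.minimum.accumulate(np.asarray(gaps, dtype=float))
```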


Data Set | Time (sec) | () | () | Obj. Value | Opt. Gap % | Iter | () | () | Obj. Value | Opt. Gap % | Iter | Obj. Value | Opt. Gap % | Feas. Gap % | Iter
MixData1 | 100 | 2 | 0.5 | 13.03 | 16.6 | 2385 | 2 | 0.9 | 14.39 | 7.9 | 809 | 3.68 | 76.4 | 21930.2 | 3090 |
500 | 2 | 0.9 | 13.96 | 10.7 | 4969 | 2 | 0.25 | 14.97 | 4.1 | 9001 | 3.74 | 76.1 | 12304.4 | 11460 | |
1000 | 2 | 0.9 | 14.20 | 9.1 | 9286 | 0.5 | 0.25 | 15.12 | 3.2 | 13598 | 3.76 | 75.9 | 8873.5 | 19981 | |
MixData2 | 100 | 2 | 0.9 | 6.71 | 15.3 | 957 | 2 | 0.9 | 7.26 | 8.4 | 745 | 1.88 | 76.2 | 19184.7 | 2948 |
500 | 2 | 0.9 | 7.17 | 9.5 | 4099 | 2 | 0.9 | 7.49 | 5.5 | 3873 | 1.90 | 76.0 | 8729.7 | 10864 | |
1000 | 2 | 0.9 | 7.23 | 8.8 | 8024 | 2 | 0.9 | 7.59 | 4.3 | 7758 | 1.91 | 76.0 | 6066.0 | 18977 | |
MixData3 | 100 | 1 | 0.25 | 32.86 | 54.6 | 1865 | 0.5 | 0.25 | 65.70 | 9.2 | 888 | 15.15 | 79.1 | 44622.3 | 1656 |
500 | 1 | 0.25 | 54.34 | 24.9 | 8832 | 1 | 0.25 | 68.88 | 4.8 | 3289 | 15.30 | 78.8 | 13031.6 | 6080 | |
1000 | 1 | 0.25 | 54.87 | 24.1 | 14128 | 1 | 0.25 | 69.34 | 4.1 | 5851 | 15.32 | 78.8 | 7902.7 | 10657 | |
MixData4 | 100 | 2 | 0.25 | 10.28 | 42.4 | 1676 | 2 | 0.9 | 15.42 | 13.6 | 166 | 3.33 | 81.3 | 116332.0 | 1440 |
500 | 0.5 | 0.25 | 14.69 | 17.7 | 4710 | 2 | 0.9 | 16.15 | 9.5 | 921 | 3.36 | 81.2 | 57781.9 | 5203 | |
1000 | 0.5 | 0.25 | 14.72 | 17.5 | 6939 | 2 | 0.9 | 16.47 | 7.7 | 1880 | 3.36 | 81.2 | 41696.7 | 9081 | |
MixData5 | 100 | 2 | 0.25 | 92.61 | 76.2 | 1289 | 1 | 0.25 | 340.02 | 12.5 | 415 | 59.39 | 84.7 | 122971.0 | 946 |
500 | 0.5 | 0.9 | 163.46 | 57.9 | 4788 | 0.5 | 0.5 | 360.70 | 7.2 | 1701 | 59.31 | 84.7 | 126861.0 | 3397 | |
1000 | 1 | 0.25 | 277.79 | 28.5 | 5359 | 2 | 0.25 | 367.26 | 5.5 | 2937 | 59.90 | 84.6 | 64948.0 | 5939 | |
MixData6 | 100 | 1 | 0.5 | 10.52 | 72.6 | 751 | 2 | 0.25 | 30.11 | 21.5 | 256 | 7.44 | 80.6 | 149166.0 | 811 |
500 | 0.5 | 0.25 | 28.98 | 24.5 | 2995 | 2 | 0.9 | 33.59 | 12.5 | 232 | 7.60 | 80.2 | 103077.0 | 3004 | |
1000 | 0.5 | 0.25 | 32.30 | 15.8 | 4661 | 2 | 0.9 | 34.24 | 10.8 | 482 | 7.66 | 80.0 | 68409.9 | 5270 | |
MixData7 | 100 | 0.5 | 0.9 | 35.69 | 89.3 | 496 | 1 | 0.25 | 276.90 | 16.9 | 82 | 59.19 | 82.2 | 702281.0 | 478 |
500 | 1 | 0.25 | 102.57 | 69.2 | 984 | 0.5 | 0.25 | 295.47 | 11.3 | 616 | 62.41 | 81.3 | 239810.0 | 1766 | |
1000 | 1 | 0.25 | 152.56 | 54.2 | 2101 | 1 | 0.25 | 301.48 | 9.5 | 1447 | 62.78 | 81.2 | 185699.0 | 3124 | |
MixData8 | 100 | 0.5 | 0.9 | 4.49 | 94.4 | 306 | 0.5 | 0.25 | 55.80 | 30.3 | 66 | 13.77 | 82.8 | 613444.0 | 431 |
500 | 0.5 | 0.9 | 12.03 | 85.0 | 1455 | 2 | 0.25 | 63.64 | 20.5 | 566 | 13.99 | 82.5 | 282965.0 | 1593 | |
1000 | 0.5 | 0.9 | 15.84 | 80.2 | 2751 | 1 | 0.9 | 66.98 | 16.3 | 133 | 14.05 | 82.4 | 243686.0 | 2817 |
Figures 8 and 9 indicate that the line search version outperforms , even though in most cases it performs fewer iterations. One explanation for this phenomenon is that the step size policy employed in is based on global optimization ideas which do not take into account the local structure of the problem. , instead, captures these local features of the problem more accurately, leading to better numerical performance. By comparison, performs quite poorly, with optimality gaps ranging from to roughly ; this is closely linked to large relative feasibility gaps, exceeding . One possible reason for these results is that ’s convergence guarantees rely on the magnitude of the dual variables, which in turn depend on the size of the Slater condition gap. Since the right-hand side of the constraints may be very small, this gap will generally be small, causing the norm of the dual variables to become large and slowing convergence. This may also explain the contrast with ’s superior performance in the MaxCut example. Additionally, Table 5 does not indicate a preferable choice of parameters for and , since the best configuration differs across data sets and times.
B.3 Randomly Scaled SDP
Finally, we consider a general SDP problem with inequality constraints, which we use to investigate the sensitivity of the compared methods to problem scaling. The synthetically scaled SDP (SRS) takes the form
(B.9) | ||||||
s.t. | ||||||
Let and . This is a closed convex cone with logarithmically homogeneous barrier
i.e., , where . We thus arrive at a formulation of the form of problem (P), which reads explicitly as
(B.10) |
We randomly generate the objective function and constraint coefficients to be PSD matrices with unit Frobenius norm, where, given a matrix , its Frobenius norm is . Specifically, the constraint matrices for every instance and are generated by first sampling with components from a standard Gaussian distribution, rounded to the nearest integer, then setting , and finally normalizing the matrix to have Frobenius norm 1. Similarly, the objective matrix is generated by sampling with components from a standard Gaussian distribution, rounded to the nearest integer, and . Additionally, we generate a PSD reference solution by sampling with components from a standard Gaussian distribution, then setting and, finally, normalizing the result to have trace equal to 1. The right-hand side constraint coefficients are constructed as , where is a random variable uniformly distributed over , and , where is a parameter that measures the scaling of the constraints. We note that was run on the normalized problem, using the normalization guidelines specified in [41].
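The instance generation described above can be sketched as follows. Since the uniform perturbation and the scaling parameter are elided in the extracted text, the sketch constructs the right-hand side simply as ⟨A_i, X_ref⟩; all function names are ours:

```python
import numpy as np

def random_psd_unit_frobenius(n, rng):
    """Integer-rounded Gaussian factor B, then A = B B^T normalized
    to unit Frobenius norm (so A is PSD with ||A||_F = 1)."""
    B = np.round(rng.standard_normal((n, n)))
    A = B @ B.T
    return A / np.linalg.norm(A, 'fro')

def random_instance(n, m, seed=0):
    """Objective C, constraint matrices A_i, right-hand side b,
    and a unit-trace PSD reference solution X_ref."""
    rng = np.random.default_rng(seed)
    A_list = [random_psd_unit_frobenius(n, rng) for _ in range(m)]
    C = random_psd_unit_frobenius(n, rng)
    R = np.round(rng.standard_normal((n, n)))
    X_ref = R @ R.T
    X_ref /= np.trace(X_ref)          # reference solution with trace 1
    b = np.array([float(np.sum(A * X_ref)) for A in A_list])
    return C, A_list, b, X_ref
```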
Similarly to the previous examples, the performance metric used to benchmark our experimental results is the relative optimality gap, defined as , where is the optimal solution computed by the interior-point solver . We compare our solutions with those generated by . However, since does not necessarily produce feasible solutions, its reported solution is corrected to ensure feasibility. Specifically, given a relative feasibility gap , the corrected solution is given by , and the relative optimality gap is obtained by using .
For each value of , we ran , and on 30 realizations of problems of size and . All datasets were run for both and with parameters and , and all methods were stopped after seconds of running time. For every , realization of the data and algorithm , we record the objective value (corrected as explained above for ) obtained after the iteration, the corresponding relative optimality gap , the time since the start of the run and, for , the relative feasibility gap .
Data | Time (sec) | Obj. Value | Opt. Gap % | Iter | Obj. Value | Opt. Gap % | Iter | Obj. Value | Opt. Gap % | Iter | Obj. Value | Opt. Gap % | Iter | Obj. Value | Opt. Gap % | Feas. Gap % | Iter
0 | 100 | -186.45 | 0.671 | 2067 | -186.78 | 0.4978 | 1467 | -186.79 | 0.4908 | 1543 | -186.99 | 0.3875 | 1305 | -187.67 | 0.0059 | 0.0183 | 13349 |
500 | -187.19 | 0.2788 | 6665 | -187.34 | 0.198 | 6120 | -187.28 | 0.2272 | 6095 | -187.4 | 0.1666 | 6201 | -187.7 | 0.0009 | 0.0049 | 50070 | |
1000 | -187.33 | 0.2027 | 13071 | -187.46 | 0.1308 | 11662 | -187.4 | 0.1653 | 11918 | -187.5 | 0.1139 | 12133 | -187.7 | 0.0004 | 0.0041 | 88240 | |
1 | 100 | -143.26 | 1.3039 | 1131 | -143.56 | 1.0994 | 1020 | -143.81 | 0.9277 | 1099 | -144.01 | 0.786 | 783 | -141.99 | 1.7272 | 2.1609 | 13266 |
500 | -144.38 | 0.5311 | 3976 | -144.53 | 0.423 | 3841 | -144.58 | 0.3932 | 3626 | -144.66 | 0.3374 | 3427 | -144.26 | 0.2718 | 0.6039 | 50306 | |
1000 | -144.62 | 0.3621 | 7458 | -144.73 | 0.2838 | 7314 | -144.72 | 0.2955 | 7014 | -144.8 | 0.235 | 6790 | -144.83 | 0.1209 | 0.213 | 87925 | |
2 | 100 | -107.75 | 1.5629 | 767 | -108.18 | 1.1652 | 686 | -108.31 | 1.0371 | 612 | -108.64 | 0.7336 | 569 | -32.44 | 70.2786 | 265.9191 | 14138 |
500 | -108.82 | 0.5716 | 2868 | -109.02 | 0.3879 | 2724 | -108.98 | 0.4199 | 2727 | -109.14 | 0.276 | 2603 | -65.05 | 40.3478 | 74.8299 | 50504 | |
1000 | -108.98 | 0.4222 | 5519 | -109.15 | 0.2695 | 5283 | -108.64 | 0.3322 | 5374 | -109.22 | 0.2037 | 5191 | -82.82 | 24.0898 | 35.2035 | 90162 |
Table 6 displays the performance of , and after time points seconds for different values of the parameter , in order to study algorithmic behaviour under different time constraints. Specifically, for each method , parameter value , realization , and time , we compute and . The table displays the average objective function value (Obj. Value) given by , the average relative optimality gap (Opt. Gap) , as well as the number of iterations (Iter) until that time given by and, for , the average feasibility gap (Feas. Gap) . Table 6 shows that benefits from inexpensive iterations, allowing it to close the optimality gap more quickly than and in favorable settings (). However, as increases to 2, although still performs more iterations than our algorithms, it fails to close the infeasibility gap even after 1000 seconds. This leads to a significantly larger optimality gap, as our calculations penalize infeasible solutions. In contrast, and demonstrate robust performance, with little variation across different values of . In particular, reaches an average optimality gap smaller than after 100 seconds for all values of , while the average optimality gap of increases by a factor of 2-2.5 when moving from to , while still maintaining an average optimality gap of less than after 500 seconds for all values of .
The average relative optimality gap with respect to iteration and time, for each value of , is visually depicted in Figures 10 and 11, respectively. These figures clearly show that the performance of and is robust to the value of , while ’s performance deteriorates significantly as gets larger. In order to create Figure 10 we first compute, for each iteration , algorithm , parameter value and realization , the best relative optimality gap up to iteration . We then plot its average across all realizations, for each iteration . Figure 11, instead, is obtained by first computing, for each time , algorithm , parameter value and realization , the best relative optimality gap obtained up to time (as defined above), and then averaging these values across all realizations. These figures elucidate that while ’s performance deteriorates as increases, the performance of and is robust to the scaling of the problem.

