A Low-Rank Augmented Lagrangian Method
for Large-Scale Semidefinite Programming
Based on a Hybrid Convex-Nonconvex
Approach
Abstract
This paper introduces HALLaR, a new first-order method for solving large-scale semidefinite programs (SDPs) with bounded domain. HALLaR is an inexact augmented Lagrangian (AL) method where the AL subproblems are solved by a novel hybrid low-rank (HLR) method. The recipe behind HLR is based on two key ingredients: 1) an adaptive inexact proximal point method with inner acceleration; 2) Frank-Wolfe steps to escape from spurious local stationary points. In contrast to the low-rank method of Burer and Monteiro, HALLaR finds a near-optimal solution (with provable complexity bounds) of SDP instances satisfying strong duality. Computational results comparing HALLaR to state-of-the-art solvers on several large SDP instances arising from maximum stable set, phase retrieval, and matrix completion show that the former finds highly accurate solutions in substantially less CPU time than the latter. For example, in less than 20 minutes, HALLaR can solve a maximum stable set SDP instance with dimension pair $(n, m) \approx (10^6, 10^7)$ within $10^{-5}$ relative precision.
Keywords: semidefinite programming, augmented Lagrangian, low-rank methods, proximal point method, Frank-Wolfe method, iteration complexity, adaptive method, global convergence rate
1 Introduction
Semidefinite programming (SDP) has many applications in engineering, machine learning, the sciences, and finance, among other areas. However, solving large-scale SDPs is computationally challenging. In particular, interior point methods usually stall on large-scale instances due to lack of memory. This has motivated a recent surge of first-order methods for solving SDPs that scale to larger instances [65, 60, 45, 25, 18, 66, 55, 62, 64, 50].
This paper introduces HALLaR, a new first-order method for solving SDPs with bounded trace. Let $\mathbb{S}^n$ denote the space of $n\times n$ symmetric matrices, endowed with the Frobenius inner product $\langle\cdot,\cdot\rangle$ and the positive semidefinite partial order $\succeq$. HALLaR solves the primal/dual pair of SDPs:
$\min_X\ \{\langle C, X\rangle :\ \mathcal{A}X = b,\ X \in \Delta^n\}$  (P)

$\max_{(p,\theta)}\ \{-\langle b, p\rangle - \theta :\ C + \mathcal{A}^*p + \theta I \succeq 0,\ \theta \ge 0,\ p \in \mathbb{R}^m\}$  (D)

where $C \in \mathbb{S}^n$, $b \in \mathbb{R}^m$, $\mathcal{A}:\mathbb{S}^n \to \mathbb{R}^m$ is a linear map, $\mathcal{A}^*$ is its adjoint, and $\Delta^n$ is the spectraplex

$\Delta^n := \{X \in \mathbb{S}^n :\ \operatorname{tr}(X) \le 1,\ X \succeq 0\}.$  (1)
HALLaR is based on Burer and Monteiro’s low-rank (LR) approach [7, 8] which is described in the next paragraph.
Low-rank approach.
The LR approach is motivated by the fact that SDPs often have optimal solutions with small rank. More specifically, it is known (see [47, 2, 54]) that $r_*(r_*+1)/2 \le m$, where $r_*$ is the smallest among the ranks of all optimal solutions of (P). The LR approach consists of solving subproblems obtained by restricting (P) to matrices of rank at most $r$, for some integer $r \ge 1$, or equivalently, the nonconvex smooth reformulation

$\min_Y\ \{\langle C, YY^T\rangle :\ \mathcal{A}(YY^T) = b,\ \|Y\|_F \le 1,\ Y \in \mathbb{R}^{n\times r}\}$  (P$_r$)

Problems (P) and (P$_r$) are equivalent when $r \ge r_*$, in the sense that if $Y$ is optimal for (P$_r$) then $X = YY^T$ is optimal for (P). The advantage of (P$_r$) compared to (P) is that its matrix variable has significantly fewer entries than that of (P) when $r \ll n$, namely, $nr$ instead of $n(n+1)/2$. However, as (P$_r$) is nonconvex, it may have stationary points which are not globally optimal. For a generic instance, the following results are known: i) if $r \gtrsim \sqrt{2m}$ then all local minima of (P$_r$) are globally optimal (see [14, 15, 4, 5, 3, 48]); and ii) if $r \lesssim \sqrt{2m}$ then (P$_r$) may have local minima which are not globally optimal (see [58]).
Outline of HALLaR.
HALLaR is an inexact augmented Lagrangian (AL) method that generates sequences $\{X_t\}$ and $\{p_t\}$ according to the recursions

$X_t \approx \operatorname*{argmin}_{X \in \Delta^n}\ \mathcal{L}_\beta(X; p_{t-1})$  (2a)

$p_t = p_{t-1} + \beta\,(\mathcal{A}X_t - b)$  (2b)

where $\beta > 0$ is a penalty parameter and $\mathcal{L}_\beta$ is the AL function

$\mathcal{L}_\beta(X; p) := \langle C, X\rangle + \langle p, \mathcal{A}X - b\rangle + \frac{\beta}{2}\,\|\mathcal{A}X - b\|^2.$  (3)
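To make the recursions concrete, the following minimal Julia sketch performs one outer iteration of (2a)-(2b); the routine `solve_subproblem`, the linear-map closure `A`, and all other names are illustrative placeholders, not the actual implementation of Section 4.

```julia
using LinearAlgebra

# One outer AL iteration, assuming a user-supplied linear map `A` acting on X,
# data (C, b), penalty beta, and a routine `solve_subproblem(p)` that
# approximately minimizes L_beta(.; p) over the spectraplex.
al_value(C, A, b, beta, X, p) =
    dot(C, X) + dot(p, A(X) - b) + (beta / 2) * norm(A(X) - b)^2

function al_iteration(solve_subproblem, A, b, beta, p)
    X = solve_subproblem(p)        # inexact solve of the AL subproblem (2a)
    p = p + beta * (A(X) - b)      # full Lagrange multiplier update (2b)
    return X, p
end
```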
The key part of HALLaR is an efficient method, called Hybrid Low-Rank (HLR), for finding an approximate global solution of the AL subproblem (2a). The HLR method solves subproblems of the form
$\min_Y\ \{\mathcal{L}_\beta(YY^T; p_{t-1}) :\ \|Y\|_F \le 1,\ Y \in \mathbb{R}^{n\times r}\}$  ($\mathcal{L}_r$)

for some integer $r \ge 1$. Subproblem ($\mathcal{L}_r$) is equivalent to the subproblem obtained by restricting the variable $X$ in (2a) to matrices with rank at most $r$. Since ($\mathcal{L}_r$) is nonconvex, it may have a spurious (near) stationary point, i.e., a (near) stationary point $Y$ such that $YY^T$ is not (nearly) optimal for (2a).

More specifically, HLR finds an approximate global solution of (2a) by solving a sequence of nonconvex subproblems ($\mathcal{L}_{r_k}$), $k \ge 1$, according to the following steps (a conceptual code sketch is given after this paragraph): i) find a near stationary point $Y_k$ of ($\mathcal{L}_{r_k}$) using an adaptive accelerated inexact proximal point (ADAP-AIPP) method that is based on a combination of ideas developed in [56, 13, 46, 37, 36]; ii) check whether $Y_kY_k^T$ is nearly optimal for (2a) through a minimum eigenvalue computation and terminate the method if so; else iii) use the following escaping strategy to move away from the current spurious near stationary point: perform a Frank-Wolfe (FW) step from $Y_kY_k^T$ to obtain a point $Y_k^+$ with either one column, in which case (the unlikely one) $r_{k+1}$ is set to one, or with $r_k + 1$ columns, in which case $r_{k+1}$ is set to $r_k + 1$, and use $Y_k^+$ as the initial iterate for solving ($\mathcal{L}_{r_{k+1}}$). The initial pair for HLR is chosen by using a warm start strategy, namely, as the pair obtained at the end of the HLR call for solving the previous subproblem (2a). It is worth noting that HALLaR only stores the current iterate $Y$ and never computes the (implicit) iterate $YY^T$ (lying in the $X$-space).
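The sketch below summarizes the flow of one HLR call in the notation above; the five subroutines are passed in as functions standing for the procedures of Sections 2.2-2.3, and the unlikely one-column case of the FW step is omitted for brevity.

```julia
# Conceptual sketch of the HLR loop: i) ADAP-AIPP, ii) optimality check via a
# minimum eigenpair, iii) Frank-Wolfe escaping step that adds one column.
function hlr_sketch(Y, eps; adap_aipp, grad, min_eigpair, near_optimal, fw_stepsize)
    while true
        Y = adap_aipp(Y)                 # i) near stationary point of the low-rank problem
        lam, v = min_eigpair(grad(Y))    # ii) optimality check via minimum eigenvalue
        near_optimal(Y, lam, eps) && return Y
        alpha = fw_stepsize(Y, v)        # iii) escaping Frank-Wolfe step
        Y = hcat(sqrt(1 - alpha) * Y, sqrt(alpha) * v)   # rank grows by one column
    end
end
```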
Computational impact.
Our computational results show that HALLaR performs very well on many large-scale SDPs such as phase retrieval, maximum stable set, and matrix completion. In all these applications, HALLaR efficiently obtains accurate solutions for large-scale instances, largely outperforming other state-of-the-art solvers. For example, HALLaR takes approximately 2 hours (resp. 13 hours) on a personal laptop to solve, within $10^{-5}$ relative precision, a maximum stable set SDP instance for a Hamming graph with $n \approx 4.2\times 10^6$ and $m \approx 4.6\times 10^7$ (resp. $n \approx 1.7\times 10^7$ and $m \approx 2\times 10^8$). Moreover, HALLaR takes approximately 11 hours on a personal laptop to solve within $10^{-5}$ relative precision a phase retrieval SDP instance with $n \approx 3.2\times 10^6$ and $m \approx 3.8\times 10^7$. An important reason for the good computational performance of HALLaR is that the ranks of its iterates remain relatively small throughout the whole algorithm.
Related works.
This part describes other methods for solving large-scale SDPs. SDPNAL+ [65, 60] is an AL based method that solves each AL subproblem using a semismooth Newton-CG method. Algorithms based on spectral bundle methods (more generally bundle methods) have also been proposed and studied for solving large-scale SDPs (e.g., [31, 9, 30, 19, 44]). More recent works (e.g., [45, 25, 18, 41, 66]) propose methods for solving large-scale SDPs based on the alternating direction method of multipliers (ADMM). The remainder of this section discusses in more detail works that rely on the nonconvex LR approach and the FW method, since those works are more closely related to this paper. The reader is referred to the survey paper [42] for additional methods for solving large-scale SDPs.
The nonconvex LR approach of [7, 8] has been successful in solving many relevant classes of SDPs. The SDPLR method developed in these works solves (P$_r$) with an AL method whose AL subproblems are solved by a limited memory BFGS method. Although SDPLR only handles equality constraints, it is possible to modify it to handle inequalities (e.g., [38]). HALLaR is also based on the AL method, but it applies the method directly to (P) instead of (P$_r$). Moreover, in contrast to SDPLR, HALLaR solves the AL subproblems using the HLR method outlined above.

This paragraph describes works that solve (possibly a sequence of) (P$_r$) without using the AL method. Approaches that use interior point methods for solving (P$_r$) have been pursued for example in [51]. In the context of MaxCut SDPs, several specialized methods have been proposed which solve (P$_r$) using optimization techniques that preserve feasibility (e.g., [6, 32, 21, 43, 35]). Finally, Riemannian optimization methods have been used to solve special classes of SDPs where the feasible sets of (P$_r$) are smooth manifolds (e.g., [52, 35, 33, 43]).

The FW method minimizes a convex function over a compact convex domain (e.g., the spectraplex $\Delta^n$). It is appealing when a sparse solution is desired, where the notion of sparsity is broad (e.g., small cardinality and/or rank). The FW method has been used (e.g., [29, 55, 34]) for solving SDP feasibility problems by minimizing a function of the residual $\mathcal{A}X - b$ (e.g., its squared norm) over $\Delta^n$. Several papers (e.g., [23, 28, 49]) introduce variants of the FW method for general convex optimization problems.
Another interesting method for solving (P) is CGAL of [62, 63], which generates its iterates by performing only FW steps with respect to AL subproblems of the same form as (2a). Like HALLaR, the method of [63] only generates iterates in the $X$-space. Its Lagrange multiplier update policy, though, differs from (2b) in that it updates the Lagrange multiplier in a more conservative way, i.e., with $\beta$ in (2b) replaced by a usually much smaller stepsize, and does so only when the size of the new tentative multiplier is not too large. Moreover, instead of using a pure FW method to solve (2a), an iteration of the subroutine HLR invoked by HALLaR to solve (2a) consists of an ADAP-AIPP call applied to ($\mathcal{L}_r$) and, if HLR does not terminate, also a FW step (which generally increases the rank of the iterate by one). As demonstrated by our computational results, the use of ADAP-AIPP calls significantly reduces the number of FW steps performed by HALLaR, and, as a by-product, keeps the ranks of its iterates considerably smaller than those of the CGAL iterates.

The CGAL method was enhanced in [64] to derive a low-storage variant, namely, Sketchy-CGAL. Instead of explicitly storing its most recent $X$-iterate as CGAL does, this variant computes a certain rank-$R$ approximation (a sketch) of the above iterates, where $R$ is a specified threshold value whose purpose is to limit the rank of the stored approximation. It is shown in [64] that Sketchy-CGAL has low memory storage, and that it outputs an approximate solution of (P) (constructed using the sketch) under the assumption that $R$ exceeds $r^{\max}$, where $r^{\max}$ is the largest among the ranks of all optimal solutions of (P). In contrast to either CGAL or HALLaR, a disadvantage of Sketchy-CGAL is that the accuracy of its output primal approximate solution is often low and degrades further as $R$ decreases, and can even be undetermined if $R \le r^{\max}$. Finally, alternative methods for solving SDPs with low memory storage are presented in [20, 55, 59].
Structure of the paper.
This paper is organized into four sections. Section 2 discusses the HLR method for solving the AL subproblem (2a) and, more generally, smooth convex optimization problems over the spectraplex $\Delta^n$. It also presents complexity bounds for HLR, given in Theorem 2.5. Section 3 presents HALLaR for solving the pair of SDPs (P) and (D) and presents the main complexity result of this paper, namely Theorem 3.2, which provides complexity bounds for HALLaR. Finally, Section 4 presents computational experiments comparing HALLaR with various solvers on a large collection of SDPs arising from maximum stable set, phase retrieval, and matrix completion problems.
1.1 Basic Definitions and Notations
Let $\mathbb{R}^n$ denote the space of $n$-dimensional vectors, $\mathbb{R}^{n\times r}$ the space of $n\times r$ matrices, and $\mathbb{S}^n$ the space of $n\times n$ symmetric matrices. Let $\mathbb{R}^n_{++}$ (resp. $\mathbb{R}^n_{+}$) be the convex cone in $\mathbb{R}^n$ of vectors with positive (resp. nonnegative) entries, and let $\mathbb{S}^n_{++}$ (resp. $\mathbb{S}^n_{+}$) be the convex cone in $\mathbb{S}^n$ of positive definite (resp. positive semidefinite) matrices. Let $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ denote the Euclidean inner product and norm on $\mathbb{R}^n$, as well as the Frobenius inner product and norm on $\mathbb{S}^n$ and $\mathbb{R}^{n\times r}$. The minimum eigenvalue of a matrix $Q \in \mathbb{S}^n$ is denoted by $\lambda_{\min}(Q)$, and $v_{\min}(Q)$ denotes a corresponding eigenvector of unit norm. For any $t > 0$ and $b \ge 0$, let $\log^+_b(t) := \max\{\log t,\, b\}$.
For a given closed convex set $\mathcal{X}$, its boundary is denoted by $\partial\mathcal{X}$ and the distance of a point $x$ to $\mathcal{X}$ is denoted by $\operatorname{dist}(x, \mathcal{X})$. The diameter of $\mathcal{X}$, denoted $D_{\mathcal{X}}$, is defined as

$D_{\mathcal{X}} := \sup\{\|x - x'\| : x, x' \in \mathcal{X}\}.$  (4)

The indicator function of $\mathcal{X}$, denoted by $\delta_{\mathcal{X}}$, is defined by $\delta_{\mathcal{X}}(x) = 0$ if $x \in \mathcal{X}$, and $\delta_{\mathcal{X}}(x) = +\infty$ otherwise. The domain of a function $h$ is the set $\operatorname{dom} h := \{x : h(x) < \infty\}$. Moreover, $h$ is said to be proper if $\operatorname{dom} h \neq \emptyset$. The $\varepsilon$-subdifferential of a proper convex function $h$ is defined by

$\partial_\varepsilon h(x) := \{s : h(x') \ge h(x) + \langle s, x' - x\rangle - \varepsilon,\ \forall x'\}$  (5)
for every $x \in \operatorname{dom} h$. The classical subdifferential, denoted by $\partial h(\cdot)$, corresponds to $\partial_0 h(\cdot)$. Recall that, for a given $\varepsilon \ge 0$, the $\varepsilon$-normal cone of a closed convex set $\mathcal{X}$ at $x \in \mathcal{X}$, denoted by $N^{\varepsilon}_{\mathcal{X}}(x)$, is

$N^{\varepsilon}_{\mathcal{X}}(x) := \{s : \langle s, x' - x\rangle \le \varepsilon,\ \forall x' \in \mathcal{X}\}.$

The normal cone of a closed convex set $\mathcal{X}$ at $x \in \mathcal{X}$ is denoted by $N_{\mathcal{X}}(x) := N^{0}_{\mathcal{X}}(x)$.
Given a differentiable function $f$, its affine approximation at a point $y$ is

$\ell_f(x; y) := f(y) + \langle \nabla f(y), x - y\rangle.$  (6)

The function $f$ is $L$-smooth on a set $\mathcal{X}$ if its gradient is $L$-Lipschitz continuous on $\mathcal{X}$, i.e.,

$\|\nabla f(x) - \nabla f(x')\| \le L\,\|x - x'\|, \qquad \forall x, x' \in \mathcal{X}.$  (7)

The set of $L$-smooth functions on $\mathcal{X}$ is denoted by $\mathcal{C}_L(\mathcal{X})$.
2 Hybrid Low-Rank Method
This section introduces the Hybrid Low-Rank (HLR) method which, as outlined in the introduction, uses a combination of the ADAP-AIPP method and Frank-Wolfe steps to approximately solve convex problems of the form (2a). This section consists of three subsections. The first subsection introduces the main problem that the HLR method considers and the type of approximate solution that it aims to find. The second subsection presents the ADAP-AIPP method and its complexity results. The third subsection states the complete HLR method and establishes its total complexity.
2.1 Problem of Interest and Solution Type
Let $g : \mathbb{S}^n \to \mathbb{R}$ be a convex and differentiable function. The HLR method is developed in the context of solving the problem

$\min\ \{g(X) : X \in \Delta^n\}$  (8)

where $\Delta^n$ is the spectraplex as in (1) and $g$ is $L$-smooth on $\Delta^n$, i.e., there exists $L > 0$ such that

$\|\nabla g(X) - \nabla g(X')\| \le L\,\|X - X'\|, \qquad \forall X, X' \in \Delta^n.$  (9)
The goal of the HLR method is to find a near-optimal solution of (8) whose definition is given immediately after the next result.
Lemma 2.1.

Let $\bar X \in \Delta^n$ be given and define

$\bar\theta := \max\{0,\ -\lambda_{\min}(\nabla g(\bar X))\}.$  (10)

Then:

-
a)
there hold

$\bar\theta \ge 0 \quad \text{and} \quad \nabla g(\bar X) + \bar\theta I \succeq 0;$  (11) -

b)
for any $\varepsilon \ge 0$, the inclusion

$0 \in \nabla g(\bar X) + N^{\varepsilon}_{\Delta^n}(\bar X)$  (12) holds if and only if

$\langle \nabla g(\bar X), \bar X\rangle + \bar\theta \le \varepsilon.$  (13)
Proof.

(a) The result is immediate from the definition of $\bar\theta$ in (10).

Relation (13) provides an easily verifiable condition for checking whether $\bar X$ satisfies inclusion (12). Moreover, (13) is equivalent to the “complementary slackness” condition

$\langle \nabla g(\bar X) + \bar\theta I,\ \bar X\rangle + \bar\theta\,(1 - \operatorname{tr}\bar X) \le \varepsilon.$  (14)
The next lemma shows that the objective value of an $\varepsilon$-optimal solution of (8) is within $\varepsilon$ of the optimal value of (8).

Lemma 2.3.

An $\varepsilon$-optimal solution $\bar X$ of (8) satisfies $g(\bar X) \le g_* + \varepsilon$, where $g_*$ denotes the optimal value of (8).
2.2 The ADAP-AIPP Method
As already mentioned in the introduction, one iteration of the HLR method consists of a call to the ADAP-AIPP method followed by a FW step. The purpose of this subsection is to describe the details of the ADAP-AIPP method.
For a given integer $r \ge 1$, consider the subproblem obtained by restricting (8) to matrices of rank at most $r$, or equivalently, the reformulation

$\min\ \{g_r(Y) := g(YY^T) : Y \in B^{n\times r}\}$  (15)

where

$B^{n\times r} := \{Y \in \mathbb{R}^{n\times r} : \|Y\|_F \le 1\}$  (16)

denotes the Frobenius ball of radius one in $\mathbb{R}^{n\times r}$. In this subsection, the above set will be denoted by $B$ since the column dimension $r$ remains constant throughout its presentation.
The goal of the ADAP-AIPP method is to find an approximate stationary solution of (15) as described in Proposition 2.4(a) below. Briefly, ADAP-AIPP is an inexact proximal point method which attempts to solve its (potentially nonconvex) prox subproblems using an accelerated composite gradient method, namely, ADAP-FISTA, whose description is given in Appendix B. A rough description of the $j$-th iteration of ADAP-AIPP is as follows: given $Y_{j-1} \in B$ and a positive prox stepsize $\lambda$, ADAP-AIPP calls the ADAP-FISTA method to attempt to find a suitable approximate solution of the possibly nonconvex proximal subproblem

$\min_Y\ \Big\{\lambda\, g_r(Y) + \frac{1}{2}\,\|Y - Y_{j-1}\|^2 : Y \in B\Big\}$  (17)

where the first call is always performed with $\lambda$ equal to the prox stepsize of the previous iteration. If ADAP-FISTA successfully finds such a solution, ADAP-AIPP sets this solution as its next iterate $Y_j$ and sets $\lambda$ as its next prox stepsize. If ADAP-FISTA is unsuccessful, ADAP-AIPP invokes it again to attempt to solve (17) with $\lambda$ halved. This loop always terminates since ADAP-FISTA is guaranteed to terminate with success when the objective in (17) becomes strongly convex, which occurs when $\lambda$ is sufficiently small.
The formal description of the ADAP-AIPP method is presented below. For the sake of simplifying the input lists of the algorithms stated throughout this paper, the parameters $\sigma$ and $\chi$ appearing in them are considered universal ones (and hence not input parameters).
ADAP-AIPP Method
Universal Parameters: $\sigma \in (0,1)$ and $\chi \in (0,1)$.

Input: quadruple $(g_r, Y_0, \lambda_0, \bar\rho)$.
-
0.
set , , and
(18) -
1.
choose a prox stepsize $\lambda > 0$ and call the ADAP-FISTA method in Appendix B with its universal parameters and with inputs
(19) (20) -
2.
if ADAP-FISTA fails or its output (if it succeeds) does not satisfy the inequality
(21) then set $\lambda = \lambda/2$ and go to step 1; else, set $\lambda_j = \lambda$, let $Y_j$ denote the ADAP-FISTA output, define $V_j$ through
(22) and go to step 3;
-
3.
if $\|V_j\| \le \bar\rho$, then stop with success and output the pair $(Y_j, V_j)$; else, go to step 4;
-
4.
set $j \leftarrow j + 1$ and go to step 1.
Several remarks about ADAP-AIPP are now given. First, at each iteration $j$, steps 1 and 2 successively call the ADAP-FISTA method with inputs given by (19) and (20) to obtain a prox stepsize $\lambda_j$ and a pair $(Y_j, V_j)$ satisfying (21) and

(23)

Such a pair can be viewed as an approximate stationary solution of prox subproblem (17) with $\lambda = \lambda_j$, where the residual is relaxed from being zero to a quantity that is relatively bounded as in (23). Second, it follows immediately from the inclusion in relation (23) and the definition of $V_j$ in (22) that the pair $(Y_j, V_j)$ computed in step 2 of ADAP-AIPP satisfies the inclusion $V_j \in \nabla g_r(Y_j) + N_B(Y_j)$ for every iteration $j$. As a consequence, if ADAP-AIPP terminates in step 3, then the pair $(Y, V)$ output by this step is a $\bar\rho$-approximate stationary solution of (15), i.e., it satisfies

$V \in \nabla g_r(Y) + N_B(Y), \qquad \|V\| \le \bar\rho.$  (24)
Finally, it is interesting to note that ADAP-AIPP is a universal method in that it requires no knowledge of any parameters (such as objective function curvatures) underlying problem (15).
Before stating the main complexity result of the ADAP-AIPP method, the following quantities are introduced:

(25)

(26)

where $\lambda_0$ is the initial prox stepsize of ADAP-AIPP and $L$ and $B^{n\times r}$ are as in (9) and (16), respectively. Observe that the quantity in (26) is well-defined and positive.
The main complexity result of ADAP-AIPP is stated in the proposition below. Its proof is in Appendix C.
Proposition 2.4.
The following statements about ADAP-AIPP hold:
- (a)
-
(b)
the total number of ADAP-FISTA calls performed by ADAP-AIPP is no more than
(28) where $\lambda_0$ is the initial prox stepsize and the remaining quantity is as in (27).
Some remarks about Proposition 2.4 are now in order. First, it follows from statement (a) that ADAP-AIPP terminates with a $\bar\rho$-approximate stationary solution of (15) within a finite number of iterations. Second, recall that each ADAP-AIPP iteration may perform more than a single ADAP-FISTA call. Statement (b) implies that the total number of ADAP-FISTA calls performed is at most the total number of ADAP-AIPP iterations performed plus a logarithmic term.
2.3 HLR Method
The goal of this subsection is to describe the HLR method for solving problem (8). The formal description of the HLR method is presented below.
HLR Method
Input: A quintuple $(g, Y_0, \lambda_0, \bar\rho, \varepsilon)$, where $Y_0 \in B^{n\times r_0}$ for some $r_0 \ge 1$.

Output: $Y \in B^{n\times r}$ for some $r \ge 1$ such that $YY^T$ is an $\varepsilon$-optimal solution of (8).
-
0.
set , , and .
-
1.
call ADAP-AIPP with the quadruple consisting of $g_{r_k}$, the initial point $Y_{k-1}$, the initial prox stepsize $\lambda_0$, and the tolerance $\bar\rho$, and let $(Y_k, V_k)$ denote its output pair;
-
2.
compute
(29) where
(30) and $(\lambda_{\min}, v_{\min})$ is a minimum eigenpair of $\nabla g(Y_kY_k^T)$;
-
3.
set
(31) If
(32) then stop and output the current pair; else go to step 4;
-
4.
compute
(33) and set
(34) -
5.
set $k \leftarrow k + 1$ and go to step 1.
Remarks about each of the steps in HLR are now given. First, step 1 of the $k$-th iteration of HLR calls ADAP-AIPP with initial point $Y_{k-1}$ to produce an iterate $Y_k$ that is a $\bar\rho$-approximate stationary solution of nonconvex problem (15). Second, step 2 performs a minimum eigenvector (MEV) computation to obtain the quantity in (29), which is needed for the termination check performed in step 3. Third, the definition of $\bar\theta$ in (10) and relations (30), (29), and (31), together with Lemma 2.1(b), imply that termination criterion (32) in step 3 is equivalent to checking whether $Y_kY_k^T$ is an $\varepsilon$-optimal solution of (8). Fourth, if the termination criterion in step 3 is not satisfied, a FW step at $Y_kY_k^T$ for (8) is taken in step 4 to produce an iterate as in (33) and (34). The computation of this iterate is entirely performed in the $Y$-space to avoid forming matrices of the form $YY^T$, and hence to save storage space. The reason for performing this FW step is to make HLR escape from the spurious near stationary point of (15) (see the end of the paragraph containing ($\mathcal{L}_r$) in the Introduction). Finally, the quantity $r_{k+1}$ as in (34) keeps track of the column dimension of the most recently generated iterate. It can either increase by one or be set to one after the update (34) is performed.
The complexity of the HLR method is described in the result below whose proof is given in the next subsection.
Theorem 2.5.
The following statements about the HLR method hold:
- (a)
- (b)
Two remarks about Theorem 2.5 are now given. First, statement (a) bounds the number of iterations (and hence MEV computations) performed by HLR in terms of the tolerance $\varepsilon$. Second, statement (b) bounds the total number of ADAP-FISTA calls performed by HLR in terms of $\varepsilon$ and the input tolerance $\bar\rho$ for each ADAP-AIPP call in step 1 of HLR.
Recall from the Introduction that an iteration of the HALLaR method described in (2a) and (2b) has to approximately solve subproblem (2a), and that the HLR method specialized to the case where $g = \mathcal{L}_\beta(\cdot\,; p)$ is the proposed tool for solving it. In the following, we discuss how to specialize some of the steps of HLR to this case. Step 1 of HLR calls the ADAP-AIPP method, whose steps can be easily implemented if it is known how to project a matrix onto the unit ball $B$ and how to compute the gradient $\nabla g_r(Y)$, where $g_r$ is as in (15). Projecting a matrix onto the unit ball is easy, as it just involves dividing the matrix by its norm whenever that norm exceeds one. Define the quantities

$q := p + \beta\,\big(\mathcal{A}(YY^T) - b\big), \qquad \theta := \max\{0,\ -\lambda_{\min}(C + \mathcal{A}^*q)\}.$  (37)

With this notation, the matrix $\nabla g_r(Y)$ can be explicitly computed as

$\nabla g_r(Y) = 2\,\big(C + \mathcal{A}^*q\big)\,Y.$  (38)
Also, the stepsize in step 4 of HLR has a closed form expression, obtained by exactly minimizing over $[0,1]$ the quadratic function

$\alpha \mapsto \mathcal{L}_\beta\big((1-\alpha)\,Y_kY_k^T + \alpha\, G_k;\ p\big),$  (39)

where $G_k$ denotes the Frank-Wolfe direction matrix computed in step 4.
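For concreteness, the two computations above can be sketched as follows in the real case; `Cmul` and `Aadj_mul` stand for the user-supplied oracles of Subsection 4.1, and the line-search formula follows from the fact that the AL objective is quadratic along the Frank-Wolfe segment.

```julia
# Sketch of the gradient formula (38): grad = 2 * (C + A*(q)) * Y with
# q = p + beta * (A(YY') - b); here `A(Y)` returns A(YY') for a factor Y.
function grad_Y(Cmul, A, Aadj_mul, b, beta, p, Y)
    q = p + beta * (A(Y) - b)
    return 2 * (Cmul(Y) + Aadj_mul(q, Y))
end

# Exact minimization over [0,1] of the quadratic
# alpha -> L_beta((1-alpha)X + alpha*G; p), given gap = <grad g(X), X - G>
# and d = A(X) - A(G); this yields the closed-form stepsize discussed above.
fw_stepsize(gap, d, beta) = clamp(gap / (beta * sum(abs2, d)), 0.0, 1.0)
```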
2.4 Proof of Theorem 2.5
This subsection provides the proof of Theorem 2.5. The following proposition establishes important properties of the iterates generated by HLR.
Proposition 2.6.
Proof.
Relation (41) follows immediately from the way is computed in (31), the definitions of and in (40), and Proposition A.1 with and .
To show relation (43), it is first necessary to show that
(47) |
Consider two possible cases. For the first case, suppose that , which in view of the definitions of and in (29) and (40), respectively implies that . Hence, relation (47) immediately follows. For the other case suppose that and hence . This observation and the fact that is an eigenpair of imply that , which in view of the definition of in (40) immediately implies relation (47).
It is now easy to see that the definition of in (29), relation (47), the update rule for in (34), and the definitions of , , and in (40) imply that
Relation (27) and the fact that HLR during its -th iteration calls ADAP-AIPP with initial point and outputs point imply that . This fact together with the definition of in (15) and the definitions of and in (40) imply the first inequality in (45). Now the first inequality in (45) and relations (41), (42), (43), and (44) imply that and can be viewed as iterates of the -th iteration of the RFW method of Appendix D. The second inequality in (45) is then an immediate consequence of this observation and the second relation in (138).
Since step 1 of the HLR method consists of a call to the ADAP-AIPP method developed in Subsection 2.2, the conclusion of Proposition 2.4 applies to this step. The following result, which will be useful in the analysis of the HLR method, translates the conclusion of Proposition 2.4 to the current setting.
Proposition 2.7.
The following statements about step 1 of the -th iteration of the HLR method hold:
- (a)
-
(b)
the number of ADAP-FISTA calls performed by the ADAP-AIPP call in step 1 is no more than where
(50)
Proof.
(a) The first statement is immediate from relation (24) and the fact that ADAP-AIPP outputs pair . To prove the second statement, suppose that the ADAP-AIPP call made in step 1 terminates after performing iterations. It follows immediately from Proposition 2.4(a) and the fact that the HLR method during its -th iteration calls ADAP-AIPP with initial point and outputs point that satisfies
(51) |
The result then follows from the definitions of and in (26) and (25), respectively.
We are now ready to give the proof of Theorem 2.5.
Proof of Theorem 2.5.
(a) Consider the matrix where is the output of the HLR method. The definition of in (30), the fact that the definitions of and in (29) and (10), respectively, imply that , relation (46), and the stopping criterion (32) in step 3 of the HLR method immediately imply that the pair satisfies relation (13) in Lemma 2.1(b) with and hence is an -optimal solution of (8). To show that the number of iterations that the HLR method performs to find such an -optimal solution is at most the quantity in (35), observe that Proposition 2.6 establishes that the HLR method generates iterates and during its -th iteration such that and can be viewed as iterates of the -th iteration of the RFW method of Appendix D. The result then immediately follows from this observation, Theorem D.1, and the fact that the diameter of is at most 2.
(b) Suppose that the HLR method terminates at an iteration index . It follows from relations (50) and (35) and the definition of in (49) that the HLR method performs at most
(52) |
ADAP-FISTA calls. Now using the fact that , it is easy to see that the last term in (52) is summable:
(53) |
The result then follows from relations (52) and (2.4), the facts that , , , , and the definition of in (15). ∎
3 HALLaR
This section presents an inexact AL method for solving the pair of primal-dual SDPs (P) and (D), namely, HALLaR, whose outline is given in the introduction. It contains two subsections. Subsection 3.1 formally states HALLaR and presents its main complexity result, namely Theorem 3.2. Subsection 3.2 is devoted to the proof of Theorem 3.2.
Throughout this section, it is assumed that (P) and (D) have optimal solutions $X^*$ and $(p^*, \theta^*)$, respectively, and that both (P) and (D) have the same optimal value. It is well-known that such an assumption is equivalent to the existence of a triple $(X, p, \theta)$ satisfying the optimality conditions:

$\mathcal{A}X = b, \quad X \in \Delta^n$  (primal feasibility)
$\theta \ge 0, \quad S := C + \mathcal{A}^*p + \theta I \succeq 0$  (dual feasibility)  (54)
$\langle S, X\rangle + \theta\,(1 - \operatorname{tr}X) = 0$  (complementarity)

This section studies the complexity of HALLaR for finding an $(\varepsilon_p, \varepsilon_c)$-solution of (54), i.e., a triple $(X, p, \theta)$ that satisfies

$\|\mathcal{A}X - b\| \le \varepsilon_p, \quad X \in \Delta^n$  ($\varepsilon_p$-primal feasibility)
$\theta \ge 0, \quad S := C + \mathcal{A}^*p + \theta I \succeq 0$  (dual feasibility)  (55)
$\langle S, X\rangle + \theta\,(1 - \operatorname{tr}X) \le \varepsilon_c$  ($\varepsilon_c$-complementarity)
3.1 Description of HALLaR and Main Theorem
The formal description of HALLaR is presented next.
HALLaR Method
Input: Initial points $(Y_0, p_0)$, tolerance pair $(\varepsilon_p, \varepsilon_c)$, penalty parameter $\beta > 0$, and ADAP-AIPP parameters $(\lambda_0, \bar\rho)$.

Output: a triple $(X, p, \theta)$ that is an $(\varepsilon_p, \varepsilon_c)$-solution of (54).
-
0.
set and
(56) -
1.
call HLR with input quintuple $(g, Y_{t-1}, \lambda_0, \bar\rho, \varepsilon_c)$, where $g = \mathcal{L}_\beta(\cdot\,;\, p_{t-1})$, and let $Y_t$ denote its output;
-
2.
set
$p_t = p_{t-1} + \beta\,\big(\mathcal{A}(Y_tY_t^T) - b\big)$  (57)
3.
if $\|\mathcal{A}(Y_tY_t^T) - b\| \le \varepsilon_p$, then set $X_t = Y_tY_t^T$ and return $(X_t, p_t, \theta_t)$, where $\theta_t$ is as in (37);
-
4.
set $t \leftarrow t + 1$ and go to step 1.
Some remarks about each of the steps in HALLaR are now given. First, step 1 invokes HLR to obtain an $\varepsilon_c$-optimal solution $Y_tY_t^T$ of subproblem (2a) using the previous iterate $Y_{t-1}$ as initial point. Second, step 2 updates the multiplier $p_t$ according to a full Lagrange multiplier update. Third, it is shown in Lemma 3.1 below that the triple $(Y_tY_t^T, p_t, \theta_t)$ always satisfies the dual feasibility and $\varepsilon_c$-complementarity conditions in (55), where $\theta_t$ is as in (37). Finally, step 3 checks whether $Y_tY_t^T$ is an $\varepsilon_p$-primal feasible solution. It then follows from the above remarks that if this condition is satisfied, then the triple $(Y_tY_t^T, p_t, \theta_t)$ is an $(\varepsilon_p, \varepsilon_c)$-solution of (54).
Lemma 3.1.

For every iteration $t \ge 1$ of HALLaR, the triple $(Y_tY_t^T, p_t, \theta_t)$, where $\theta_t$ is as in (37), satisfies the dual feasibility and $\varepsilon_c$-complementarity conditions in (55).

Proof.
Before stating the complexity of HALLaR, the following quantities are first introduced:
(58) | |||
(59) | |||
(60) |
where $p^*$ is an optimal dual solution and the remaining quantity is as in (56).
The main result of this paper is now stated.
Theorem 3.2.
The following statements hold:
- a)
- (b)
Two remarks about Theorem 3.2 are now given. First, it follows from statement (a) that HALLaR performs a finite number of outer iterations. Second, statement (b) and the definitions of the quantities in (61), (64), and (58), respectively, imply that HALLaR performs

(65)

and

(66)

total HLR iterations (and hence MEV computations) and ADAP-FISTA calls, respectively. The result below shows that the bounds (65) and (66) can be improved under suitable conditions on the tolerance pair and the penalty parameter.
Corollary 3.3.
If and , then HALLaR performs at most
(67) |
total HLR iterations (and hence MEV computations) and total ADAP-FISTA calls.
Proof.
It can be shown that each ADAP-FISTA call performs at most

(68)

iterations/resolvent evaluations.¹ This is an immediate consequence of Lemma C.2(a) in Appendix C, the definition in (25), the smoothness of the AL function on the spectraplex, and Lemma 3.7 which is developed in the next subsection.

¹A resolvent evaluation of a convex function $h$ is an evaluation of the resolvent $(I + \lambda\,\partial h)^{-1}$ at a point, for some $\lambda > 0$.
3.2 Proof of Theorem 3.2
Since HALLaR calls the HLR method at every iteration, the next proposition specializes Theorem 2.5, which states the complexity of HLR, to the specific case of SDPs. Its statement uses the following quantities associated with an iteration of HALLaR:
(69) | |||
(70) | |||
(71) |
Proposition 3.4.
The following statements about the HLR call in step 1 of the $t$-th iteration of HALLaR hold:
- (a)
- (b)
Proof.
The following lemma establishes key bounds which will be used later to bound the quantities that appear in Proposition 3.4.
Lemma 3.5.
The following relations hold:
(74) | ||||
(75) |
where is the last iteration index of HALLaR, is an optimal Lagrange multiplier, and is as in (56).
Proof.
It follows immediately from Proposition 3.4(a) and the definition of -optimal solution that the HLR call made in step 1 of the -th iteration of HALLaR outputs such that satisfies
(76) |
It is then easy to see that HALLaR is an instance of the AL Framework in Appendix E since the HLR method implements relation (152) in the Blackbox AL with . The proof of relations (74) and (75) now follows immediately from relations (157) and (158) and the fact that . ∎
Lemma 3.6.
Proof.
Let be an iteration index. We first show that
(79) |
holds. It follows immediately from the definition of in (3), the fact that , the Cauchy-Schwarz inequality, and the fact that , that
(80) |
Hence, relation (79) holds with . Suppose now that and let be an iteration index such that . Relation (45), the fact that HALLaR during its -th iteration calls the HLR method with input and initial point , and the fact that imply that and hence that
(81) |
in view of the update rule (57) and the definition of in (3). Summing relation (81) from to and using relation (75) gives
(82) |
Relation (79) now follows by combining relations (80) and (82).
Now, relation (74), the fact that , and the definition of in (3), imply that for any ,
(83) |
where is the last iteration index of HALLaR. Relations (79) and (83) together with the definition of in (58) then imply that for every and iteration index . Relation (77) then follows immediately from this conclusion and the definition of in (69).
Proof.
Let be an iteration index of HALLaR and suppose that . It is easy to see from the definition of in (3) that
(84) |
It then follows from the fact that , Cauchy-Schwarz inequality, triangle inequality, relation (84), and bound (74) that
which immediately implies the result of the lemma in view of the definitions of and in (70) and (59), respectively. ∎
We are now ready to prove Theorem 3.2.
Proof of Theorem 3.2.
(a) It follows immediately from Lemma 3.1 that the output triple satisfies the dual feasibility and $\varepsilon_c$-complementarity conditions in (55). It then remains to show that the triple satisfies the $\varepsilon_p$-primal feasibility condition in (55) within the number of iterations given in (61). To show this, it suffices to show that HALLaR is an instance of the AL framework analyzed in Appendix E. Observe first that (P) is a special case of (148). It is also easy to see that the call to HLR in step 1 of HALLaR is a special way of implementing step 1 of the AL framework, i.e., the Blackbox AL. Indeed, Proposition 3.4(a) implies that the output of HLR satisfies the required inexactness condition, and hence HLR implements relation (152) in the Blackbox AL.
In view of the facts that HALLaR is an instance of the AL framework and , it then follows immediately from Theorem E.1 that HALLaR terminates within the number of iterations in (61) and that the output satisfies the -primal feasibility condition in (55).
(b) Consider the quantity as in (72) where is an iteration index of HALLaR. It follows immediately from relations (56) and (77) and the definition of in (64) that . Hence, it follows from Proposition 3.4(a) that each HLR call made in step 1 of HALLaR performs at most iterations/MEV computations. The result then follows from this conclusion and part (a).
(c) Let be an iteration index of HALLaR. Lemma 3.7 implies that and in view of the definitions of and in (71) and (60), respectively. Hence, it follows from this conclusion, the fact that , Proposition 3.4(b), and part (a) that the total number of ADAP-FISTA calls performed by HALLaR is at most
where is as in (61) and is the last iteration index of HALLaR. The result in (c) then follows immediately from the above relation together with the fact that relation (78) implies that . ∎
4 Computational experiments
In this section, the performance of HALLaR is tested against state-of-the-art SDP solvers. The experiments are performed on a 2019 Macbook Pro with an 8-core CPU and 32 GB of memory. The methods are tested on SDPs arising from the following applications: maximum stable set, phase retrieval, and matrix completion.
This section is organized into five subsections. The first subsection provides details on the implementation of HALLaR. The second subsection explains the SDP solvers considered in the experiments. The remaining subsections describe the results of the computational experiments in each of the applications.
4.1 Implementation details
Our implementation of HALLaR uses the Julia programming language. The implementation applies to a class of SDPs slightly more general than (P). Let $\mathbb{F}$ denote either the field of real numbers or the field of complex numbers. Let $\mathbb{S}^n$ (resp. $\mathbb{H}^n$) be the space of symmetric (resp. complex Hermitian) matrices, with Frobenius inner product $\langle\cdot,\cdot\rangle$ and positive semidefinite partial order $\succeq$. The implementation applies to SDPs of the form

$\min\ \{\langle C, X\rangle : \mathcal{A}X = b,\ \operatorname{tr}(X) \le \tau,\ X \succeq 0\}$

where $C$, $b$, and $\tau > 0$ are the problem data, $\mathcal{A}$ is a linear map, and $\mathcal{A}^*$ is its adjoint. In contrast to (P), the trace of $X$ is bounded by $\tau$ instead of one. The inputs for our implementation are the initial points $Y_0$ and $p_0$, the tolerances $\varepsilon_p$ and $\varepsilon_c$, and the data describing the SDP instance, which is explained below. In the experiments, the primal initial point $Y_0$ is a random matrix with entries generated independently from the Gaussian distribution over $\mathbb{F}$, and the dual initial point $p_0$ is the zero vector. Our computational results below are based on a variant of HALLaR, also referred to as HALLaR in this section, which differs slightly from the one described in Section 3 in that the penalty parameter $\beta$ and the tolerance for the AL subproblem are chosen in an adaptive manner based on some of the ideas of the LANCELOT method [16].

The data defining the SDP instance involves matrices of size $n\times n$, which should not be stored as dense arrays in large-scale settings. Instead of storing such a matrix $M$, it is assumed that a routine evaluating the linear operator $v \mapsto Mv$ is given by the user.
Similar to the Sketchy-CGAL method of [64], our implementation of HALLaR requires the following user inputs to describe the SDP instance:
-
(i)
The vector $b$ and the scalar $\tau$.
-
(ii)
A routine for evaluating the linear operator $v \mapsto Cv$.
-
(iii)
A routine for evaluating linear operators of the form $v \mapsto \mathcal{A}^*(z)\,v$ for any $z$.
-
(iv)
A routine for evaluating the quadratic function

$v \mapsto \mathcal{A}(vv^*).$  (85)
Note that the routine in (iv) allows $\mathcal{A}$ to be evaluated on any matrix in factorized form, since $\mathcal{A}(YY^*) = \sum_i \mathcal{A}(y_iy_i^*)$, where the sum is over the columns $y_i$ of $Y$. In addition, the routines in (ii) and (iii) allow multiplying matrices of the form $C$ and $\mathcal{A}^*(z)$ with a matrix $Y$ by multiplying them by each of the columns of $Y$ separately. It follows that all steps of HALLaR (including the steps of HLR and ADAP-AIPP) can be performed by using only the above inputs. For instance, the eigenvalue computation in Step 2 of HLR is performed using iterative Krylov methods, which only require matrix-vector multiplications.
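As an illustration, a hypothetical implementation of the inputs (i)-(iv) for the maximum stable set SDP (87) of Subsection 4.3 is sketched below, written for the minimization form with $C = -ee^T$; the handling of the trace-one constraint and other details of the actual implementation are omitted.

```julia
# Hypothetical oracles for the Lovász theta SDP (87) on a graph with edge
# list E: A(X)_k = X_{ij} for the k-th edge (i,j), and C = -ee'.
struct ThetaOracles
    E::Vector{Tuple{Int,Int}}
    n::Int
end

data(s::ThetaOracles) = (zeros(length(s.E)), 1.0)   # (i): b = 0, trace bound 1
Cmul(s::ThetaOracles, v) = -sum(v) * ones(s.n)      # (ii): C*v = -(e'v) e
function Aadj_mul(s::ThetaOracles, z, v)            # (iii): A*(z) * v
    out = zeros(s.n)
    for (k, (i, j)) in enumerate(s.E)
        out[i] += (z[k] / 2) * v[j]
        out[j] += (z[k] / 2) * v[i]
    end
    return out
end
A_quad(s::ThetaOracles, v) = [v[i] * v[j] for (i, j) in s.E]   # (iv): A(vv')
```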
4.2 Competing methods
We compare HALLaR against the following SDP solvers:
-
•
CSDP : Open source Julia solver based on interior point methods;
-
•
COSMO: Open source Julia solver based on ADMM/operator splitting;
-
•
SDPLR : Open source MATLAB solver based on Burer and Monteiro’s LR method;
-
•
SDPNAL+ : Open source MATLAB solver based on AL with a semismooth Newton method;
-
•
T-CGAL : Thin variant of the CGAL method [62] that stores the iterates in factored form;
-
•
$r$-Sketchy : Low-rank variant of CGAL that only stores a sketch of the iterates $X_t$.

We use the default parameters in all methods. The $r$-Sketchy method is tested with two possible values of $r$, namely, $r = 10$ and $r = 100$.
Given a tolerance $\varepsilon > 0$, all methods, except SDPLR, stop when a primal solution $X$ and a dual solution $(p, \theta, S)$ satisfying

$\frac{\|\mathcal{A}X - b\|}{1 + \|b\|} \le \varepsilon, \qquad \frac{|p_{\mathrm{val}} - d_{\mathrm{val}}|}{1 + |p_{\mathrm{val}}| + |d_{\mathrm{val}}|} \le \varepsilon, \qquad \frac{\max\{0,\ -\lambda_{\min}(S)\}}{1 + \|C\|} \le \varepsilon$  (86)

is generated, where $p_{\mathrm{val}}$ and $d_{\mathrm{val}}$ denote the primal and dual values of the solution, respectively. SDPLR terminates solely based on primal feasibility, i.e., it terminates when the first condition in (86) is satisfied. The above conditions are standard in the SDP literature, although some solvers use the $\ell_\infty$ norm instead of the Euclidean norm. Given a vector $p$, HALLaR, SDPLR, r-Sketchy, and T-CGAL set $\theta := \max\{0,\ -\lambda_{\min}(C + \mathcal{A}^*p)\}$ and $S := C + \mathcal{A}^*p + \theta I$. The definition of $\theta$ implies that $S \succeq 0$ and that the left-hand side of the last inequality in (86) is zero.
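The construction of $(\theta, S)$ just described can be sketched as follows (dense matrices for illustration only; the actual computation is matrix-free).

```julia
using LinearAlgebra

# Given p, set theta = max(0, -lambda_min(C + A*(p))) and
# S = C + A*(p) + theta*I, so that S is positive semidefinite by construction
# and the dual infeasibility error in (86) vanishes.
function dual_completion(C::Symmetric, Aadj, p)
    S0 = Symmetric(C + Aadj(p))
    theta = max(0.0, -eigmin(S0))
    return S0 + theta * I, theta
end
```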
Recall that a description of r-Sketchy was already given in the paragraph preceding the part titled "Structure of the paper" in the Introduction. We now comment on how this method keeps track of its iterates and how it terminates. First, it never stores the iterate $X_t$ but only a rank-$r$ approximation of it, as already mentioned in the aforementioned paragraph of the Introduction. (We refer to $X_t$ as the implicit iterate, as it is never computed, and to the rank-$r$ approximation as the computed one.) Second, it keeps track of the quantities $\mathcal{A}X_t$ and $\langle C, X_t\rangle$ which, as can be easily verified, allow the first two relative errors in (86) to be easily evaluated. Third, we kept the termination criterion of the r-Sketchy code intact in that it still stops when all three errors in (86), computed with the implicit solution $X_t$, are less than or equal to $\varepsilon$. The fact that r-Sketchy terminates based on the implicit solution does not guarantee, and even makes it unlikely, that it would terminate based on the computed solution. Fourth, if r-Sketchy does not terminate within the time limit specified for each problem class, the maximum of the three errors in (86) at its final computed solution is reported.
The solver COSMO includes an optional chordal decomposition pre-processing step (thoroughly discussed in [57, 24, 26]), which has not been invoked in the computational experiments reported below. This ensures that all solvers are compared over the same set of SDP instances.
The tables in the following three subsections present the results of the computational experiments performed on large collections of maximum stable set, phase retrieval, and matrix completion SDP instances. A relative tolerance of $\varepsilon = 10^{-5}$ is set and a time limit of either 10000 seconds (approximately 3 hours) or 14400 seconds (4 hours) is given. An entry of a table marked with */N (resp., **) means that the corresponding method found an approximate solution with relative accuracy strictly larger than the desired accuracy (resp., crashed), in which case N expresses the maximum of the three final relative accuracies in (86). For r-Sketchy, entries marked as */N mean that it did not terminate within the time limit and that the maximum of the three final relative accuracies in (86) of its final computed solution was N. The bold numbers in the tables indicate the algorithm that had the best runtime for that instance.
4.3 Maximum stable set
Table 1: Maximum stable set SDP, smaller graph instances. Runtimes in seconds.

Graph($n$; $m$) | HALLaR | T-CGAL | 10-Sketchy | 100-Sketchy | CSDP | COSMO | SDPNAL+
---|---|---|---|---|---|---|---
G1(800; 19,176) | 218.23 | */.13e-02 | */.31e-01 | */.31e-01 | 226.01 | 322.91 | 60.30 |
G10(800; 19,176) | 241.51 | */.20e-02 | */.78e-01 | */.31e-01 | 220.44 | 229.22 | 55.30 |
G11(800; 1,600) | 3.01 | 1631.45 | 220.59 | 234.76 | 3.74 | 118.66 | 73.50 |
G12(800; 1,600) | 3.06 | 414.27 | 72.56 | 57.03 | 3.12 | 9531.99 | 70.20 |
G14(800; 4,694) | 66.46 | */.27e-03 | 6642.00 | 7231.30 | 9.31 | 3755.36 | 115.30 |
G20(800; 4,672) | 439.37 | */.37e-03 | */.12e-01 | */.12e-01 | 11.42 | */.69e-04 | 341.20 |
G43(1000; 9,990) | 96.79 | */.25e-04 | */.51e-01 | */.26e-01 | 42.39 | */.10e-02 | 62.10 |
G51(1,000; 5,909) | 190.98 | */.23e-03 | */.98e-02 | */.99e-02 | 16.97 | */.33e-04 | 284.40 |
G23(2,000; 19,990) | 390.34 | */.64e-03 | */.86e-01 | */.19e-01 | 288.77 | 5739.78 | 503.80 |
G31(2,000; 19,990) | 357.75 | */.68e-03 | */.20e-01 | */.19e-01 | 290.72 | 5946.84 | 498.30 |
G32(2,000; 4,000) | 3.62 | 3732.70 | 329.17 | 349.73 | 29.04 | */.58e-03 | 853.90 |
G34(2,000; 4,000) | 3.52 | 1705.60 | 162.85 | 177.84 | 29.04 | 458.11 | 1101.80 |
G35(2,000; 11,778) | 730.54 | */.78e-03 | */.62e-02 | */.62e-02 | 730.54 | 120.60 | 2396.60 |
G41(2,000; 11,785) | 555.02 | */.17e-02 | */.59e-02 | */.59e-02 | 114.73 | */.37e-03 | 2027.20 |
G48(3,000; 6,000) | 3.49 | 4069.30 | 288.64 | 306.91 | 81.97 | 1840.91 | 6347.50 |
G55(5,000; 12,498) | 253.22 | */.33e-02 | */.80e-02 | */.79e-02 | 535.57 | */.11e-02 | */.17e-01 |
G56(5,000; 12,498) | 264.46 | */.38e-02 | */.80e-02 | */.79e-02 | 523.06 | */.27e-02 | */.20e00 |
G57(5,000; 10,000) | 4.14 | 7348.50 | 791.75 | 831.06 | 336.93 | 8951.40 | */.10e00 |
G58(5,000; 29,570) | 2539.83 | */.96e-02 | */.24e-01 | */.43e-02 | 2177.03 | */.20e-03 | */.43e-01 |
G59(5,000; 29,570) | 2625.47 | */.54e-02 | */.24e-01 | */.43e-02 | 2178.33 | */.37e+02 | */.65e-01 |
G60(7,000; 17,148) | 476.65 | */.21e-02 | */.13e-01 | */.67e-02 | 2216.50 | */.39e+02 | */.10e+01 |
G62(7,000; 14,000) | 5.00 | */.28e-03 | 1795.50 | 1474.40 | 1463.18 | */.12e-03 | */.10e+01 |
G64(7,000; 41,459) | 3901.68 | */.16e-01 | */.36e-02 | */.36e-02 | 7127.72 | */.99e+01 | */.10e+01 |
G66(9,000; 18,000) | 5.77 | */.82e-01 | 2788.70 | 3022.70 | 2076.17 | */.77e-03 | */.10e+01 |
G67(10,000; 20,000) | 5.87 | */.13e-01 | 3725.80 | 3941.70 | 7599.80 | ** | */.47e+01 |
G72(10,000; 20,000) | 5.92 | */.11e00 | 3936.30 | 3868.60 | 7450.01 | ** | */.47e+01 |
G77(14,000; 28,000) | 8.08 | */.24e00 | */.60e-02 | */.60e-02 | ** | ** | */.99e00 |
G81(20,000; 40,000) | 10.89 | */.10e00 | */.91e-01 | */.71e-01 | ** | ** | */.10e+01 |
tor(69,192; 138,384) | 40.64 | */.38e00 | */.63e00 | */.29e00 | ** | ** | ** |
Given a graph $G = (V, E)$, the maximum stable set problem consists of finding a subset of vertices of largest cardinality such that no two vertices are connected by an edge. Lovász [40] introduced a constant, the $\vartheta$-function, which upper bounds the value of the maximum stable set. The $\vartheta$-function is the value of the SDP

$\max\ \{\langle ee^T, X\rangle : X_{ij} = 0\ \ \forall (i,j) \in E,\ \operatorname{tr}(X) = 1,\ X \succeq 0\}$  (87)

where $e$ is the all ones vector. It was shown in [27] that the $\vartheta$-function agrees exactly with the stable set number for perfect graphs.
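For example, for the 5-cycle $C_5$ (the smallest imperfect graph), the value of (87) is $\vartheta(C_5) = \sqrt{5} \approx 2.24$, while the maximum stable set has cardinality 2; this illustrates that the upper bound need not be tight for imperfect graphs.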
Table 2: Maximum stable set SDP, large Hamming graph instances. Runtimes in seconds.

Graph($n$; $m$) | HALLaR | T-CGAL | 10-Sketchy | 100-Sketchy
---|---|---|---|---
(8,192; 53,248) | 5.04 | */.23e00 | 1603.80 | 882.03 |
(16,384; 114,688) | 9.09 | */.45e00 | 6058.60 | 6712.20 |
(32,768; 245,760) | 65.22 | */.19e00 | */.19e-01 | */.14e-01 |
(65,536; 524,288) | 104.71 | */.11e-01 | */.24e00 | */.11e-01 |
(131,072; 1,114,112) | 69.63 | */.34e-01 | */.72e00 | */.32e-01 |
(262,144; 2,359,296) | 244.90 | */.99e-02 | */.88e-02 | */.31e00 |
(524,288; 4,980,736) | 786.73 | */.42e00 | */.35e00 | */.24e00 |
(1,048,576; 10,485,760) | 1157.96 | */.47e00 | */.31e-02 | */.31e-02 |
Table 3: Maximum stable set SDP, extremely large Hamming graph instances. Runtimes in seconds.

Graph($n$; $m$) | HALLaR
---|---
(2,097,152; 22,020,096) | 2934.33 |
(4,194,304; 46,137,344) | 6264.50 |
(8,388,608; 96,468,992) | 14188.23 |
(16,777,216; 201,326,592) | 46677.82 |
Table 4: Maximum stable set SDP, real-world graph instances. Runtimes in seconds.

Problem Size ($n$; $m$) | Graph | Dataset | HALLaR
---|---|---|---
10,937; 75,488 | wing_nodal | DIMACS10 | 1918.48 |
16,384; 49,122 | delaunay_n14 | DIMACS10 | 1355.01 |
16,386; 49,152 | fe-sphere | DIMACS10 | 147.93 |
22,499; 43,858 | cs4 | DIMACS10 | 747.66 |
25,016; 62,063 | hi2010 | DIMACS10 | 3438.06 |
25,181; 62,875 | ri2010 | DIMACS10 | 2077.97 |
32,580; 77,799 | vt2010 | DIMACS10 | 2802.37 |
48,837; 117,275 | nh2010 | DIMACS10 | 8530.38 |
24,300; 34,992 | aug3d | GHS_indef | 8.56 |
32,430; 54,397 | ia-email-EU | Network Repo | 530.21 |
11,806; 32,730 | Oregon-2 | SNAP | 2787.19 |
11,380; 39,206 | wiki-RFA_negative | SNAP | 1151.31 |
21,363; 91,286 | ca-CondMat | SNAP | 7354.75 |
31,379; 65,910 | as-caida_G_001 | SNAP | 3237.93 |
26,518; 65,369 | p2p-Gnutella24 | SNAP | 344.83 |
22,687; 54,705 | p2p-Gnutella25 | SNAP | 235.03 |
36,682; 88,328 | p2p-Gnutella30 | SNAP | 542.07 |
62,586; 147,892 | p2p-Gnutella31 | SNAP | 1918.30 |
49,152; 69,632 | cca | AG-Monien | 47.24 |
49,152; 73,728 | ccc | AG-Monien | 12.14 |
49,152; 98,304 | bfly | AG-Monien | 13.15 |
16,384; 32,765 | debr_G_12 | AG-Monien | 818.61 |
32,768; 65,533 | debr_G_13 | AG-Monien | 504.29 |
65,536; 131,069 | debr_G_14 | AG-Monien | 466.67 |
131,072; 262,141 | debr_G_15 | AG-Monien | 488.07 |
262,144; 524,285 | debr_G_16 | AG-Monien | 1266.71 |
524,288; 1,048,573 | debr_G_17 | AG-Monien | 5793.57 |
1,048,576; 2,097,149 | debr_G_18 | AG-Monien | 13679.12 |
Table 5: Maximum stable set SDP, comparison between HALLaR and SDPLR on Hamming graphs. Runtimes in seconds.

Graph($n$; $m$) | HALLaR | SDPLR
---|---|---
(1024; 5120) | 2.90 | 1.28 |
(2048; 11264) | 3.03 | 10.14 |
(4096; 24576) | 3.49 | 56.60/.12e-03 |
(8192; 53248) | 5.04 | 399.89/.38e-03 |
(16384; 114688) | 9.09 | 2469.11/.16e-02 |
(32768; 245760) | 65.22 | */.11e-01/.46e00 |
Tables 1, 2, 3, 4, and 5 present the results of the computational experiments performed on the maximum stable set SDP. Table 1 compares HALLaR against all the methods listed in Subsection 4.2 on smaller graph instances, i.e., with number of vertices not exceeding 70,000. All graph instances considered, except the last instance, are taken from the GSET data set, a curated collection of randomly generated graphs that can be found in [61]. The larger GSET graphs (G66-G81) are all toroidal graphs where every vertex has degree 4. The last graph instance presented in Table 1 is a large toroidal graph with approximately 70,000 vertices that we generated ourselves. A time limit of 10000 seconds (approximately 3 hours) is given. Table 2 compares HALLaR against T-CGAL, 10-Sketchy, and 100-Sketchy on large graph instances with up to 1 million vertices and 10 million edges. CSDP was not included in Table 2 since it crashed on all but one of the instances included in it. All graph instances considered in Table 2 are Hamming graphs $H(k,2)$, a special class of graphs with $2^k$ vertices and $k\,2^{k-1}$ edges. The vertex set of such graphs can be seen as corresponding to binary words of length $k$, and edges correspond to pairs of binary words that differ in exactly one bit; a sketch for generating these graphs is given after this paragraph. A time limit of 14400 seconds (4 hours) is now given.
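```julia
# Edge list of the Hamming graph H(k,2): vertices are binary words of length
# k (encoded as 1-based integers) and edges connect words differing in
# exactly one bit, giving 2^k vertices and k*2^(k-1) edges.
hamming_edges(k::Int) =
    [(v + 1, xor(v, 1 << i) + 1)
     for v in 0:(2^k - 1) for i in 0:(k - 1) if v & (1 << i) == 0]
```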
Tables 3 and 4 solely present the performance of HALLaR on extremely large Hamming instances (i.e., with number of vertices exceeding 2 million) and hard graph instances from real-world datasets, respectively. The graph instances considered in Table 4 are taken from the DIMACS10, Stanford SNAP, AG-Monien, GHS_indef, and Network Repositories [39, 53, 17, 1].
Table 5 displays a special comparison between HALLaR and SDPLR on 6 different Hamming graphs. Recall that SDPLR terminates based only on the first condition in (86) and hence often finds a solution that does not satisfy the second condition in (86) with the desired accuracy of $10^{-5}$. An entry marked with time/N in Table 5 means that the corresponding method finds an approximate solution (within the time limit) that satisfies the first relation in (86) with the desired accuracy but does not satisfy the second relation in (86) with that accuracy, in which case N expresses the final accuracy with which the method satisfies the second relation in (86). An entry marked with */N1/N2 means that SDPLR finds an approximate solution that satisfies the first relation in (86) with relative accuracy strictly larger than the desired accuracy of $10^{-5}$, in which case N1 (resp., N2) expresses the final accuracy with which SDPLR satisfies the first (resp., second) relation in (86).
Remarks about the results presented in Tables 1, 2, 3, 4, and 5 are now given. As seen from Table 1, CSDP and HALLaR are the two best performing methods on these smaller graph instances. HALLaR, however, is the only method that can solve each of the instances to the desired accuracy of $10^{-5}$ within the time limit of approximately 3 hours. On graph instances where the number of vertices exceeds 14,000 (resp., 10,000), CSDP (resp., COSMO) cannot perform a single iteration within 3 hours or crashes. SDPNAL+ crashed with a lack of memory error on the last graph instance with 69,192 vertices. Although T-CGAL, 10-Sketchy, and 100-Sketchy do not perform especially well on the smaller graph instances considered in Table 1, they are included for comparison on the larger graph instances considered in Table 2 since they require considerably less memory than CSDP, COSMO, and SDPNAL+. The results presented in Table 2 demonstrate that HALLaR performs especially well on larger instances as it is the only method that can solve all instances within the time limit of 4 hours. T-CGAL, 10-Sketchy, and 100-Sketchy cannot find a solution with the desired accuracy of $10^{-5}$ on most of the instances considered, often finding solutions with accuracies only in the range of $10^{-3}$ to $10^{0}$. CSDP was tested on the problems considered in Table 2 but not included for comparison since it crashed on every instance except one. COSMO and SDPNAL+ are not included for comparison due to their high memory requirements.
Tables 3 and 4 show that HALLaR can solve extremely large Hamming instances and hard real-world instances, respectively, within a couple of hours. As seen from Table 3, HALLaR can solve a Hamming instance with 4 million vertices and 46 million edges (resp. 16 million vertices and 200 million edges) in under 2 hours (resp., 13 hours). Table 4 shows that HALLaR can solve a huge De Bruijn graph instance (which arises in the context of genome assembly) in just a few hours.
The results presented in Table 5 display the superior performance of HALLaR compared to SDPLR on six different Hamming graphs. HALLaR not only finds more accurate solutions than SDPLR within the time limit of 4 hours but is also at least 80 times faster than SDPLR on the three largest instances.
4.4 Phase retrieval
Given $m$ pairs $(a_i, b_i) \in \mathbb{C}^n \times \mathbb{R}_+$, consider the problem of finding a vector $\chi \in \mathbb{C}^n$ such that

$|\langle a_i, \chi\rangle|^2 = b_i, \qquad i = 1, \ldots, m.$

In other words, the goal is to retrieve $\chi$ from the magnitude of $m$ linear measurements. By creating the complex Hermitian matrix $X = \chi\chi^*$, this problem can be approached by solving the complex-valued SDP relaxation

$\min\ \{\operatorname{tr}(X) : \langle a_ia_i^*, X\rangle = b_i,\ i = 1, \ldots, m,\ X \succeq 0\}.$

The motivation for the trace objective function is that it promotes obtaining a low rank solution. It was shown in [12] that the relaxation is tight (i.e., the vector $\chi$ can be retrieved from the SDP solution $X$) when the vectors $a_i$ are sampled independently and uniformly on the unit sphere. Notice that this class of SDPs does not have a trace bound. However, since the objective function is precisely the trace, any bound on the optimal value can be used as the trace bound. In particular, the squared norm of the vector $\chi$ is a valid trace bound. Even though $\chi$ is unknown, bounds on its norm are known (see for example [63]).

Computational experiments are performed on the synthetic data set from [64] that is based on the coded diffraction pattern model from [11]. Given $n$, the hidden solution vector $\chi \in \mathbb{C}^n$ is generated from the complex standard normal distribution. The $m = Ln$ measurements are indexed by pairs $(j, l)$ with $1 \le j \le L$ and $1 \le l \le n$ (the experiments use $L = 12$, so that $m = 12n$). Consider vectors $d_j \in \mathbb{C}^n$ for $j = 1, \ldots, L$, where the entries of $d_j$ are products of two independent random variables: the first is uniform on $\{1, -1, i, -i\}$, and the second takes the value $\sqrt{2}/2$ with probability $4/5$ and $\sqrt{3}$ with probability $1/5$. The linear measurements correspond to modulating the vector $\chi$ with each of the $d_j$'s and then taking a discrete Fourier transform:

$b_{(j,l)} := \big|\mathrm{DFT}(\bar d_j \odot \chi)_l\big|^2,$

where $\odot$ denotes the Hadamard product and $(\cdot)_l$ denotes the $l$-th entry of the discrete Fourier transform. The vector $b$ is obtained by applying the measurements to $\chi$. The trace bound is set similarly as in [64].
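An illustrative generator for these measurements is sketched below; the octanary pattern values follow the description above, while the exact conventions of the data set in [64] may differ slightly.

```julia
using FFTW

# Coded diffraction measurements: modulate chi with L random octanary
# patterns d_j and record the squared magnitudes of the DFT entries.
function cdp_measurements(chi::Vector{ComplexF64}, L::Int)
    n = length(chi)
    b = Float64[]
    for _ in 1:L
        d = rand([1.0 + 0im, -1.0 + 0im, 1im, -1im], n) .*
            [rand() < 4 / 5 ? sqrt(2) / 2 : sqrt(3) for _ in 1:n]
        append!(b, abs2.(fft(conj.(d) .* chi)))
    end
    return b
end
```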
Tables 6 and 7 present the results of the computational experiments performed on the phase retrieval SDP. As mentioned in the above paragraph, all instances considered are taken from a synthetic dataset that can be found in [64]. Table 6 compares HALLaR against T-CGAL, 10-Sketchy, and 100-Sketchy on medium-sized phase retrieval instances, i.e., with dimension $n$ either 10,000 or 31,623. The ranks of the output solutions of HALLaR and T-CGAL are also reported: for entries corresponding to HALLaR and T-CGAL, the number reported after the last forward slash indicates the rank of that method's output solution. A time limit of 14400 seconds (4 hours) is given. Table 7 solely presents the performance of HALLaR on larger phase retrieval instances, i.e., with dimension greater than or equal to 100,000. The rank of the output solution of HALLaR is again reported.
Table 6: Phase retrieval SDP, medium-sized instances. Runtimes in seconds; for HALLaR and T-CGAL, the number after the last slash is the rank of the output solution.

Problem Size ($n$; $m$) | HALLaR | T-CGAL | 10-Sketchy | 100-Sketchy
---|---|---|---|---
10,000; 120,000 | 69.11/2 | */.18e-01/561 | */.13e-01 | */.25e-01 |
10,000; 120,000 | 66.14/2 | */.54e-01/521 | 11112.00 | */.80e-01 |
10,000; 120,000 | 64.42/2 | */.12e00/224 | */.31e-01 | */.13e00 |
10,000; 120,000 | 99.98/2 | */.28e-01/201 | */.13e00 | */.26e-01 |
31,623; 379,476 | 620.82/3 | */.29e00/1432 | */.77e-01 | */.23e00 |
31,623; 379,476 | 982.34/2 | */.23e00/729 | */.63e-01 | */.93e00 |
31,623; 379,476 | 870.25/2 | */.66e00/794 | */.65e-02 | */.78e-01 |
31,623; 379,476 | 712.09/2 | */.10e+01/1280 | */.10e+01 | */.82e00 |
Table 7: Phase retrieval SDP, large instances. Runtime in seconds/rank of the output solution.

Problem Size ($n$; $m$) | HALLaR
---|---
100,000; 1,200,000 | 1042.92/4 |
100,000; 1,200,000 | 1147.46/3 |
100,000; 1,200,000 | 929.67/5 |
100,000; 1,200,000 | 939.23/5 |
316,228; 3,794,736 | 8426.94/5 |
316,228; 3,794,736 | 2684.83/1 |
316,228; 3,794,736 | 7117.31/6 |
316,228; 3,794,736 | 7489.42/7 |
3,162,278; 37,947,336 | 40569.10/1 |
Table 6 only compares HALLaR against T-CGAL and Sketchy-CGAL since these are the only methods that take advantage of the fact that the linear maps $\mathcal{A}$ and $\mathcal{A}^*$ in the phase retrieval SDP can be evaluated efficiently using the fast Fourier transform (FFT). As seen from Table 6, HALLaR is the best performing method and the only method that can solve each instance to a relative accuracy of $10^{-5}$ within the time limit of 4 hours. T-CGAL and Sketchy-CGAL were unable to solve most instances to the desired accuracy, often finding solutions with accuracies only in the range of $10^{-2}$ to $10^{0}$ in 4 hours. Sketchy-CGAL was also over 150 times slower than HALLaR on the single instance on which it was able to find an accurate solution.
Since T-CGAL and Sketchy-CGAL did not perform well on the medium-sized phase retrieval instances considered in Table 6, computational results for large phase retrieval instances are only presented for HALLaR in Table 7. The results presented in Table 7 show that HALLaR solves a phase retrieval SDP instance with dimension pair $(n; m) = (100{,}000;\ 1{,}200{,}000)$ in approximately 17 minutes and also one with dimension pair $(3{,}162{,}278;\ 37{,}947{,}336)$ in just over 11 hours.
4.5 Matrix completion
Consider the problem of retrieving a low rank matrix $M \in \mathbb{R}^{n_1\times n_2}$ by observing a subset of its entries: $M_{ij}$ for $(i,j) \in \Omega$. A standard approach to tackle this problem is by considering the nuclear norm relaxation:

$\min\ \{\|Z\|_* : Z_{ij} = M_{ij},\ (i,j) \in \Omega\},$

where $\|\cdot\|_*$ denotes the nuclear norm. The above problem can be rephrased as the following SDP:

$\min\ \Big\{\tfrac{1}{2}\big(\operatorname{tr}(W_1) + \operatorname{tr}(W_2)\big) : \begin{pmatrix} W_1 & Z \\ Z^T & W_2 \end{pmatrix} \succeq 0,\ Z_{ij} = M_{ij},\ (i,j) \in \Omega\Big\}.$  (88)
Table 8: Matrix completion SDP, medium-sized instances. Runtimes in seconds.

Problem Size ($n$; $m$) | $r$ | HALLaR | 10-Sketchy
---|---|---|---
10,000; 828,931 | 3 | 321.81 | */.81e00/.89e-02 |
10,000; 828,931 | 3 | 332.54 | */.80e00/.82e-02 |
10,000; 2,302,586 | 5 | 1117.60 | */.92e00/.28e00 |
10,000; 2,302,586 | 5 | 1067.15 | */.11e+01/ .41e00 |
31,623; 2,948,996 | 3 | 1681.03 | */.81e00/.69e-02 |
31,623; 2,948,996 | 3 | 1362.22 | */.81e00/.82e-02 |
31,623; 8,191,654 | 5 | 4740.48 | */.90e00/.43e-01 |
31,623; 8,191,654 | 5 | 5238.57 | */.90e00/.84e-01 |
Table 9: Matrix completion SDP, large instances. Runtimes in seconds.

Problem Size ($n$; $m$) | $r$ | HALLaR
---|---|---
75,000; 3,367,574 | 2 | 3279.85 |
75,000; 7,577,040 | 3 | 5083.68 |
100,000; 4,605,171 | 2 | 2872.44 |
100,000; 10,361,633 | 3 | 6048.63 |
150,000; 7,151,035 | 2 | 10967.74 |
150,000; 16,089,828 | 3 | 14908.08 |
200,000; 9,764,859 | 2 | 13454.12 |
200,000; 21,970,931 | 3 | 28021.56 |
The nuclear norm relaxation was introduced in [22]. It was shown in [10] that it provably completes the matrix when the number of observations is sufficiently large and the indices of the observations are independent and uniform.

Similar to the SDP formulation of phase retrieval in Subsection 4.4, the SDP formulation of matrix completion does not include a trace bound, but the objective function is a multiple of the trace. Hence, any bound on the optimal value leads to a trace bound. In particular, a valid trace bound is $2\|Z^0\|_*$, where $Z^0$ is the trivial completion, which agrees with $M$ in the observed entries and has zeros everywhere else. However, computing the nuclear norm of $Z^0$ is expensive, as it requires an SVD decomposition. In the experiments the inexpensive, though weaker, bound $2\sqrt{\min(n_1, n_2)}\,\|Z^0\|_F$ is used instead.

The matrix completion instances are generated randomly, using the following procedure, sketched in code below. Given the dimensions and a target rank $r$, the hidden solution matrix is the product $UV^T$, where the matrices $U \in \mathbb{R}^{n_1\times r}$ and $V \in \mathbb{R}^{n_2\times r}$ have independent standard Gaussian random variables as entries. Afterwards, independent and uniformly random observations from $UV^T$ are taken. The number of observations is $\lceil \rho\, r\, n \log n\rceil$ with $n = n_1 = n_2$ in the experiments, where $\rho$ is the oversampling ratio.
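```julia
# Random matrix-completion instance: hidden rank-r matrix U*V' observed at
# uniformly random indices; rho is the oversampling ratio (the observation
# count formula is an assumption inferred from the sizes in Tables 8 and 9).
function mc_instance(n1::Int, n2::Int, r::Int, rho::Real)
    U, V = randn(n1, r), randn(n2, r)
    M = U * V'
    nobs = ceil(Int, rho * r * n1 * log(n1))
    Omega = [(rand(1:n1), rand(1:n2)) for _ in 1:nobs]
    return M, Omega, [M[i, j] for (i, j) in Omega]
end
```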
Tables 8 and 9 present the results of the computational experiments performed on the matrix completion SDP. All instances are generated randomly using the procedure described in the previous paragraph. Table 8 compares HALLaR against 10-Sketchy on medium-sized matrix completion instances, i.e., with dimension $n$ either 10,000 or 31,623. A time limit of 14400 seconds (4 hours) is given. On instances where 10-Sketchy did not terminate within the time limit, the relative accuracies of both its computed and implicit solutions are reported. An entry marked with */N1/N2 means that, within 4 hours, the implicit solution corresponding to 10-Sketchy had relative accuracy strictly larger than the desired accuracy, in which case N1 (resp., N2) expresses the maximum of the three relative accuracies in (86) of its computed (resp., implicit) solution. Table 9 solely presents the performance of HALLaR on larger matrix completion instances, i.e., with dimension greater than or equal to 75,000.
Table 8 only compares HALLaR against 10-Sketchy due to 10-Sketchy's low memory requirements and its superior or comparable performance relative to 100-Sketchy and T-CGAL on the previous problem classes. As seen from Table 8, HALLaR is the best performing method and the only one that solves every instance to a relative accuracy of within the time limit of 4 hours. 10-Sketchy is unable to solve a single instance to the desired accuracy, often finding solutions with accuracies in the range of to within 4 hours.
Since 10-Sketchy did not perform well on the medium-sized matrix completion instances considered in Table 8, computational results for large-sized matrix completion instances are only presented for HALLaR in Table 9. The results in Table 9 show that HALLaR solves a matrix completion instance with dimension pair in approximately minutes, and one with dimension pair in just hours.
Appendix A Technical Results
This section states some useful facts about the spectraplex defined in (1). The first result characterizes the optimal solutions of the problem of minimizing a given linear form over the set . The second result characterizes its -normal cone.
A.1 Characterization of Optimal Solution of Linear Form over Spectraplex
Consider the problem
(89) |
where is as in (1). The optimality condition for (89) implies that is an optimal solution of (89) if and only if there exists such that the pair satisfies
(90) |
The following proposition explicitly characterizes solutions of the above problem using the special structure of .
Proposition A.1.
Proof.
Consider the pair as defined in (91). It is immediate from the definition of that . Consider now two cases. For the first case, suppose that and hence . It then follows immediately that the pair satisfies the optimality conditions in (90).
For the second case, suppose that . Thus and . Clearly, then and and hence the pair satisfies the first and fourth relations in (90). The fact that is an eigenpair implies that and hence the pair also satisfies the second relation in (90).
∎
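For concreteness, the following Python sketch implements the minimizer suggested by the case analysis above, assuming the spectraplex in (1) is $\{X \succeq 0 : \operatorname{tr} X \le 1\}$: the minimizer of a linear form $\langle C, X\rangle$ is $X = 0$ when $\lambda_{\min}(C) \ge 0$, and otherwise the rank-one matrix $vv^\top$ built from a unit minimum eigenvector $v$, with multiplier $\theta = -\lambda_{\min}(C)$. The routine and its names are illustrative:

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def min_linear_over_spectraplex(C):
    # smallest algebraic eigenpair of C via Lanczos ('SA' = smallest algebraic)
    lam, v = eigsh(C, k=1, which="SA")
    lam, v = lam[0], v[:, 0]
    if lam >= 0:
        return np.zeros_like(C), 0.0    # first case: X = 0, multiplier theta = 0
    return np.outer(v, v), -lam         # second case: X = v v^T, theta = -lambda_min

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
C = (A + A.T) / 2
X, theta = min_linear_over_spectraplex(C)
# optimal value <C, X> should equal min(lambda_min(C), 0)
print(np.trace(C @ X), min(np.linalg.eigvalsh(C)[0], 0.0))
```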
A.2 Characterization of -Normal Cone of Spectraplex
Proposition A.2.
Let and where is as in (91). Then:
-
a)
there hold
(92) -
b)
for any , the inclusion holds
(93) if and only if
(94)
Appendix B ADAP-FISTA Method
Let denote a finite-dimensional inner product real vector space with inner product and induced norm denoted by and , respectively. Also, let be a proper closed convex function whose domain has finite diameter .
ADAP-FISTA considers the following problem
(95) |
where is assumed to satisfy the following assumption:
-
(B1)
is a real-valued function that is differentiable on and there exists such that
(96) where
(97) and is the closed unit ball centered at with radius .
We now describe the type of approximate solution that ADAP-FISTA aims to find.
Problem: Given satisfying the above assumptions, a point , and a parameter , find a pair such that
(98) |
We are now ready to present the ADAP-FISTA algorithm below.
ADAP-FISTA Method
Universal Parameters: and .
Input: initial point , scalars , , and function pair .
-
0.
set , , , and ;
-
1.
Set ;
-
2.
Compute
(99) (100) If the inequality
(101) holds, go to step 3; else set and repeat step 2;
-
3.
Compute
(102) (103) (104) -
4.
If the inequality
(105) holds, then go to step 5; otherwise, stop with failure;
-
5.
Compute
(106) If the inequality
(107) holds, then stop with success and output ; otherwise, set and go to step 1.
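Since several of the displayed updates (99)–(107) are elided above, the following Python sketch should be read as a schematic stand-in rather than the exact method: it runs a FISTA-type accelerated proximal gradient loop with an adaptive backtracking estimate of the stepsize parameter, and the relative stopping test of (98) is approximated by $\|u\| \le \sigma \|y - x_0\|$.

```python
import numpy as np

def adap_fista_sketch(grad, prox, x0, sigma=0.5, L=1.0, max_iter=1000):
    y, z, t = x0.copy(), x0.copy(), 1.0
    for _ in range(max_iter):
        # backtracking: increase L until a curvature test along the step passes
        # (a stand-in for the elided adaptive test (101))
        while True:
            y_new = prox(z - grad(z) / L, 1.0 / L)
            d = y_new - z
            if np.dot(grad(y_new) - grad(z), d) <= L * np.dot(d, d):
                break
            L *= 2.0
        # by the prox optimality condition, u lies in grad f(y_new) + ∂h(y_new)
        u = L * (z - y_new) + grad(y_new) - grad(z)
        if np.linalg.norm(u) <= sigma * np.linalg.norm(y_new - x0):
            return y_new, u, L                 # success: relative criterion met
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = y_new + ((t - 1.0) / t_new) * (y_new - y)   # Nesterov momentum step
        y, t = y_new, t_new
    return y, None, L                          # budget exhausted: report failure
```

Here `grad` and `prox` are user-supplied callables for the gradient of the smooth part and the proximal mapping of the nonsmooth part, respectively.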
The ADAP-FISTA method was first developed in [56]. The method assumes that the gradient of is Lipschitz continuous on all of since it requires Lipschitz continuity of the gradient at the sequences of points and . This assumption can be relaxed to as in (B1) by showing that the sequence lies in the set defined in (97). The following lemma establishes this result inductively by using several key lemmas which can be found in [56].
Lemma B.1.
Let and suppose ADAP-FISTA generates sequence . Then, the following statements hold:
-
(a)
for any ;
-
(b)
for any , the relation
(108) holds;
-
(c)
.
As a consequence, the entire sequence .
Proof.
(a) Let . Clearly , since the definition of in (97) implies that . Now, using the facts that and , assumption (B1) implies that is Lipschitz continuous at these points. The proof of (a) is then identical to that of Lemma A.3(b) in [56].
(b) Clearly, since ADAP-FISTA generates sequence , its loop in step 2 always terminates during its first iterations. Hence, ADAP-FISTA also generates sequences and . The proof of relation (108) then follows from this observation, (a), and by using similar arguments to the ones developed in Lemmas A.6-A.10 of [56].
(c) It follows from the fact that , relation (108) with , and the definition of that . This relation, the fact that , and the definition of in (97) then imply that
The result then immediately follows from the fact that is a convex combination of and and that is a convex set.
The last statement in Lemma B.1 follows immediately from (c) and a simple induction argument. ∎
We now present the main convergence results of ADAP-FISTA, whose proofs can be found in [56]. Proposition B.2 below gives an iteration complexity bound regardless of whether ADAP-FISTA terminates with success or failure, and shows that if ADAP-FISTA stops successfully, then it obtains a stationary solution of (95) with respect to a relative error criterion. It also shows that ADAP-FISTA always stops successfully whenever is -strongly convex.
Proposition B.2.
The following statements about ADAP-FISTA hold:
-
(a)
if , it always stops (with either success or failure) in at most
iterations/resolvent evaluations;
-
(b)
if it stops successfully, it terminates with a triple satisfying
(109) -
(c)
if is -convex on , then ADAP-FISTA always terminates with success and its output , in addition to satisfying (109) also satisfies the inclusion .
Appendix C Proof of Proposition 2.4
Let . The following lemma establishes that the function in (15) has a Lipschitz continuous gradient over .
Lemma C.1.
Proof.
Let be given. Adding and subtracting and using the triangle inequality, we have
(110) |
This relation, the chain rule, the triangle inequality, the facts that and is -smooth on , and the definition of in (25), then imply that
The conclusion of the lemma now follows from the above inequality and the definition of in (25). ∎
The following lemma establishes key properties about each ADAP-FISTA call made in step 1 of ADAP-AIPP. It is a translation of the results in Proposition B.2.
Lemma C.2.
Let be as in (20). The following statements about each ADAP-FISTA call made in the -th iteration of ADAP-AIPP hold:
-
(a)
if , it always stops (with either success or failure) in at most
iterations/resolvent evaluations where is the initial prox stepsize and is as in (25);
-
(b)
if ADAP-FISTA stops successfully, it terminates with a triple satisfying
(111) (112) where ;
-
(c)
if is -convex on , then ADAP-FISTA always terminates with success and its output always satisfies relation (21).
Proof.
(a) The result follows directly from Proposition B.2 in Appendix B and hence the proof relies on verifying its assumptions. First it is easy to see that , where is as in (20). It also follows immediately from the fact that and from Lemma C.1 that is -smooth on in view of its definition in (20). These two observations imply that satisfies (96). Hence, it follows from this conclusion, the fact that each ADAP-FISTA call is made with , and from Proposition B.2(a) that statement (a) holds.
(b) In view of the definition of in (20), it is easy to see that . Statement (b) then immediately follows from this observation, the fact that each ADAP-FISTA call is made with inputs , , and as in (20), and from Proposition B.2(b) with .
(c) It follows immediately from Proposition B.2(c), and the fact that each ADAP-FISTA call is made with inputs as in (20), that the first conclusion of statement (c) holds and that the output satisfies the inclusion
(113) |
Inclusion (113) and the definition of subdifferential in (5) then immediately imply that output satisfies relation (21). ∎
The lemma below shows that, in every iteration of ADAP-AIPP, the loop within steps 1 and 2 always stops, and it establishes key properties of the resulting output. Its proof (included here for completeness) closely follows that of Proposition 3.1 of [56].
Lemma C.3.
The following statements about ADAP-AIPP hold for every :
-
(a)
the function in (20) has -Lipschitz continuous gradient on ;
-
(b)
the loop within steps 1 and 2 of its -th iteration always ends and the output obtained at the end of step 2 satisfies
(114) (115) (116) (117) (118) where , is the initial prox stepsize, and and are as in (25) and (26), respectively; moreover, every prox stepsize generated within the aforementioned loop is in .
Proof.
(a) It follows that has -Lipschitz continuous gradient on in view of its definition in (20) and Lemma C.1. The result then follows immediately from the fact that .
(b) We first claim that if the loop consisting of steps 1 and 2 of the -th iteration of ADAP-AIPP stops, then relations (114), (115), (116), and (117) hold. Indeed, assume that the loop consisting of steps 1 and 2 of the -th iteration of ADAP-AIPP stops. It then follows from the logic within steps 1 and 2 of ADAP-AIPP that the last ADAP-FISTA call within the loop stops successfully and outputs a triple satisfying (21), which immediately implies that (116) holds. Since (a) implies that satisfies relation (96), it follows from Proposition B.2(b) with as in (20), , and that the triple satisfies inequality (117) and the following two relations
(119) | |||
(120) |
Now, using the definition of in (22), it is easy to see that inclusion (119) is equivalent to (114) and that inequality (120), together with the triangle inequality, implies the two inequalities in (115).
We now claim that if step 1 is performed with a prox stepsize in the -th iteration, then for every , we have that and the -th iteration performs step 1 only once. To show the claim, assume that . Using this assumption and the fact that Lemma C.1 implies that is weakly convex on , it is easy to see that the function in (20) is strongly convex on with modulus . Since each ADAP-FISTA call is performed in step 1 of ADAP-AIPP with , it follows immediately from Proposition B.2(c) with as in (20) that ADAP-FISTA terminates successfully and outputs a pair satisfying . This inclusion, the definitions of , and the definition of subdifferential in (5), then imply that (21) holds. Hence, in view of the termination criteria of step 2 of ADAP-AIPP, it follows that . It is then easy to see, by the way is updated in step 2 of ADAP-AIPP, that is not halved in the -th iteration or any subsequent iteration, hence proving the claim.
It is now straightforward to see that the above two claims, the fact that the initial value of the prox stepsize is equal to , and the way is updated in ADAP-AIPP, imply that the lemma holds. ∎
Lemma C.4.
Proof.
The result follows from a simple induction argument. The inequality with is immediate due to the facts that and . Now suppose inequality (121) holds for . It then follows from relation (117) and the fact that , that
where the equality is due to the assumption that (121) holds for . Hence, Lemma C.4 holds. ∎
Remark C.5.
The following lemma shows that ADAP-AIPP is a descent method.
Lemma C.6.
Proof.
We are now ready to give the proof of Proposition 2.4.
Proof of Proposition 2.4.
(a) The first statement of (a) follows immediately from the fact that relation (114) and the termination criterion of ADAP-AIPP in its step 3 imply that the pair satisfies (24). Assume now by contradiction that (27) does not hold. This implies that there exists an iteration index such that
(124) |
As ADAP-AIPP generates as an iteration index, it does not terminate at the -th iteration. In view of step 3 of ADAP-AIPP, this implies that for every . Using this conclusion and the fact that , and summing inequality (122) from to , we conclude that
which can be easily seen to contradict (124).
(b) The result follows immediately from (a) and the fact that the number of times is divided by in step 2 of ADAP-AIPP is at most . ∎
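To summarize the mechanics analyzed above, the following schematic Python sketch runs an ADAP-AIPP-style outer loop: each iteration calls an ADAP-FISTA-type solver (e.g., the `adap_fista_sketch` from Appendix B) on the proximal subproblem, halves the prox stepsize and retries upon failure, and terminates once the composite residual is small. The exact acceptance and termination tests (21) and (24) are elided above, so simple stand-ins are used here.

```python
import numpy as np

def adap_aipp_sketch(grad_f, prox_h, x0, lam0, rho, sigma=0.5, max_outer=200):
    x, lam, v = x0.copy(), lam0, None
    for _ in range(max_outer):
        # step 1: ADAP-FISTA-type call on z -> f(z) + (1 / (2*lam)) * ||z - x||^2
        grad_sub = lambda z, x=x, lam=lam: grad_f(z) + (z - x) / lam
        y, u, _ = adap_fista_sketch(grad_sub, prox_h, x, sigma=sigma)
        if u is None:                 # failed call: halve the prox stepsize, retry
            lam /= 2.0
            continue
        v = u + (x - y) / lam         # residual for the original problem at y
        x = y
        if np.linalg.norm(v) <= rho:  # termination test in the spirit of (24)
            return x, v, lam
    return x, v, lam
```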
Appendix D Relaxed Frank-Wolfe Method
Let denote a finite-dimensional inner product real vector space with inner product and induced norm denoted by and , respectively. Let be a nonempty compact convex set with diameter . Consider the problem
(125) |
where is a convex function that satisfies the following assumption:
-
(A1)
there exists such that
(126)
The formal description of the Relaxed FW (RFW) method and its main complexity result for finding a near-optimal solution of (125) are presented below. The proof of the main result is given in the next subsection.
RFW Method
Input: tolerance and initial point .
Output: a point .
-
0.
set ;
-
1.
find a point such that
(127) -
2.
compute
(128) -
3.
if , then stop and output the point ; else compute
(129) and set
(130) -
4.
set and go to step 1.
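For concreteness, the sketch below implements an RFW-type loop in Python, with the elided quantities (127)–(130) stood in by the classical choices: an inexact linear minimization oracle, the Frank–Wolfe gap $\langle \nabla f(x), x - z\rangle$ as the termination measure, and the short-step rule $\alpha = \min\{1, \mathrm{gap}/(\bar L\|z-x\|^2)\}$. These stand-ins are assumptions of the sketch, not the method's exact rules.

```python
import numpy as np

def rfw_sketch(grad, lmo, x0, L_bar, eps, max_iter=100_000):
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        z = lmo(g)                                 # step 1: (inexact) linear oracle
        gap = float(np.dot(g, x - z))              # step 2: Frank-Wolfe gap surrogate
        if gap <= eps:                             # step 3: termination test
            return x, gap
        d = z - x
        alpha = min(1.0, gap / (L_bar * float(np.dot(d, d))))  # short-step rule
        x = x + alpha * d                          # convex combination stays feasible
    return x, gap

# example oracle over the unit simplex: the vertex minimizing <g, .>
def simplex_lmo(g):
    z = np.zeros_like(g)
    z[int(np.argmin(g))] = 1.0
    return z
```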
Theorem D.1.
For a given tolerance , the RFW method finds a point such that
(131) |
in at most
(132) |
iterations, where is as in (126) and is the diameter of .
D.1 Proof of Theorem D.1
This subsection is dedicated to proving Theorem D.1.
The following lemma establishes important properties of the iterates and .
Lemma D.2.
Proof.
The following lemma establishes that the RFW method is a descent method.
Lemma D.3.
Define
(136) |
Then the following statements hold for every :
(137) |
(138) |
Proof.
The next proposition establishes the convergence rate of the RFW method.
Proposition D.4.
For every , the following relations hold:
(139) | ||||
(140) |
Proof.
Define and let for any iteration index . It then follows from relations (133) and (137) and relation (138) with that the following two relations hold
(141) | |||
(142) |
Hence, using relations (141) and (142), relation (127) with , and the expression for in (136), it follows that
(143) | ||||
(144) |
It follows from summing the inequality in (143) from to that
(145) |
which, together with the definition of , implies relation (139).
We are now ready to prove Theorem D.1.
Appendix E AL method for linearly-constrained convex optimization
This section is dedicated to analyzing the convergence of the augmented Lagrangian framework for solving linearly-constrained convex optimization problems.
Let denote a Euclidean space, be a linear operator, , be a differentiable convex function, and be a closed proper convex function. Consider the linearly-constrained convex optimization problem
(148) |
where the domain of has finite diameter . The following assumption is also made.
Assumption E.1.
There exists such that
(149) |
Given a previous dual iterate , the AL framework finds the next primal iterate by
(150) |
where
(151) |
is the augmented Lagrangian function and is a fixed penalty parameter. We assume the existence of a blackbox that inexactly solves minimization problems of the form (150).
Blackbox AL.
Given a pair and convex functions and , the blackbox returns a pair satisfying
(152) |
The AL framework is now presented formally below.
AL Framework
Input: , tolerances > 0, , and penalty parameter .
Output: triple .
-
0.
Set and
(153) -
1.
Call the Blackbox AL with tolerance pair and functions and , and let be its output;
-
2.
Set
(154) -
3.
If , then set and return ;
-
4.
Set and go to step 1.
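The following Python sketch mirrors the framework above: step 1 is delegated to a user-supplied blackbox that inexactly minimizes $z \mapsto \mathcal{L}_\beta(z; p)$, step 2 performs the multiplier update of (154), and step 3 stops once the feasibility residual falls below the tolerance. The blackbox signature and the explicit form of the augmented Lagrangian written in the comment are stand-ins for the elided definitions.

```python
import numpy as np

def al_framework_sketch(blackbox, A, b, p0, beta, tol_feas, max_outer=100):
    p, z = p0.copy(), None
    for _ in range(max_outer):
        # step 1: blackbox inexactly minimizes the augmented Lagrangian, here
        #   L_beta(z; p) = f(z) + h(z) + <p, A z - b> + (beta / 2) * ||A z - b||^2
        z = blackbox(p, beta)
        residual = A @ z - b
        p = p + beta * residual                    # step 2: multiplier update (154)
        if np.linalg.norm(residual) <= tol_feas:   # step 3: feasibility test
            return z, p
    return z, p
```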
The following result states the main iteration complexity of the AL framework and establishes the boundedness of its sequence of Lagrange multipliers. The proof of the result is given in the next subsection.
Theorem E.1.
E.1 Proof of Theorem E.1
This subsection is dedicated to proving Theorem E.1. The proof relies on the following two preliminary lemmas.
Lemma E.2.
Proof.
Since is an optimal primal-dual pair, it follows that
(160) |
where is as in (148). Relation (152) implies that
(161) |
It follows from relations (160) and (161) and the definition of -subdifferential that
Adding the two above relations implies that
(162) |
Relations (160) and (162) then imply that
from which the result immediately follows. ∎
Lemma E.3.
For any iteration index of the AL framework, there holds:
(163) |
Proof.
Let be an iteration index of the AL framework and suppose . By completing the square and using relation (154), it follows that
(164) |
Moreover, relation (159), the definition of , the Cauchy-Schwarz inequality, and the fact that the Blackbox is called in step 1 with tolerance , imply that
(165) |
Combining relations (164) and (165), we then conclude that
(166) |
The conclusion of the lemma now follows by summing relation (166) from to . ∎
We are now ready to prove Theorem E.1.
Proof of Theorem E.1.
(a) Let be an iteration index of the AL framework. The fact that the Blackbox AL is called in step 1 with inputs and implies that its output satisfies that and also
Since and , it follows that
Since the above relations hold for any iteration index , the output of the AL framework satisfies the first two relations in (155). It remains to show that the AL framework terminates and that its last iteration index satisfies (156). Suppose by contradiction that the AL framework generates an iteration index satisfying
(167) |
In view of the stopping criterion of step 3 of the AL framework, this implies that for every . Using this conclusion, relation (163) with , and the definitions of and in (153), it follows that
(168) |
which clearly contradicts the bound on in (167). Hence, in view of this conclusion and the termination criterion in step 3, the AL framework must terminate with final iteration index satisfying (156) and output satisfying the third relation in (155).
(b) Let where is the final iteration index of the AL framework. It then follows from taking the square root of relation (163) and the triangle inequality that
(169) |
The fact that satisfies relation (156) and the definitions of and in (153) then imply that
(170) |
Relation (157) then immediately follows from combining relations (169) and (170).
References
- [1] David A. Bader, Henning Meyerhenke, Peter Sanders, and Dorothea Wagner, editors. Graph Partitioning and Graph Clustering, 10th DIMACS Implementation Challenge Workshop, Georgia Institute of Technology, Atlanta, GA, USA, February 13-14, 2012. Proceedings, volume 588 of Contemporary Mathematics. American Mathematical Society, 2013.
- [2] Alexander I. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete & Computational Geometry, 13:189–202, 1995.
- [3] Srinadh Bhojanapalli, Nicolas Boumal, Prateek Jain, and Praneeth Netrapalli. Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form. In Conference On Learning Theory, pages 3243–3270. PMLR, 2018.
- [4] Nicolas Boumal, Vlad Voroninski, and Afonso Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. Advances in Neural Information Processing Systems, 29, 2016.
- [5] Nicolas Boumal, Vladislav Voroninski, and Afonso S Bandeira. Deterministic guarantees for Burer-Monteiro factorizations of smooth semidefinite programs. Communications on Pure and Applied Mathematics, 73(3):581–608, 2020.
- [6] Samuel Burer and Renato DC Monteiro. A projected gradient algorithm for solving the maxcut SDP relaxation. Optimization methods and Software, 15(3-4):175–200, 2001.
- [7] Samuel Burer and Renato DC Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical programming, 95(2):329–357, 2003.
- [8] Samuel Burer and Renato DC Monteiro. Local minima and convergence in low-rank semidefinite programming. Mathematical programming, 103(3):427–444, 2005.
- [9] C. Helmberg, M.L. Overton, and F. Rendl. The spectral bundle method with second-order information. Optimization Methods and Software, 29(4):855–876, 2014.
- [10] Emmanuel Candes and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111–119, 2012.
- [11] Emmanuel J Candes, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval from coded diffraction patterns. Applied and Computational Harmonic Analysis, 39(2):277–299, 2015.
- [12] Emmanuel J Candes, Thomas Strohmer, and Vladislav Voroninski. Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8):1241–1274, 2013.
- [13] Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford. Accelerated methods for nonconvex optimization. SIAM J. Optim., 28(2):1751–1772, 2018.
- [14] Diego Cifuentes. On the Burer–Monteiro method for general semidefinite programs. Optimization Letters, 15(6):2299–2309, 2021.
- [15] Diego Cifuentes and Ankur Moitra. Polynomial time guarantees for the Burer-Monteiro method. Advances in Neural Information Processing Systems, 35:23923–23935, 2022.
- [16] Andrew R Conn, Nicholas IM Gould, and Philippe Toint. A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM Journal on Numerical Analysis, 28(2):545–572, 1991.
- [17] Timothy A. Davis and Yifan Hu. The University of Florida sparse matrix collection. ACM Trans. Math. Softw., 38(1), December 2011.
- [18] Qi Deng, Qing Feng, Wenzhi Gao, Dongdong Ge, Bo Jiang, Yuntian Jiang, Jingsong Liu, Tianhao Liu, Chenyu Xue, Yinyu Ye, et al. New developments of ADMM-based interior point methods for linear programming and conic programming. arXiv preprint arXiv:2209.01793, 2022.
- [19] Lijun Ding and Benjamin Grimmer. Revisiting spectral bundle methods: Primal-dual (sub)linear convergence rates. SIAM Journal on Optimization, 33(2):1305–1332, 2023.
- [20] Lijun Ding, Alp Yurtsever, Volkan Cevher, Joel A Tropp, and Madeleine Udell. An optimal-storage approach to semidefinite programming using approximate complementarity. SIAM Journal on Optimization, 31(4):2695–2725, 2021.
- [21] Murat A Erdogdu, Asuman Ozdaglar, Pablo A Parrilo, and Nuri Denizcan Vanli. Convergence rate of block-coordinate maximization Burer–Monteiro method for solving large SDPs. Mathematical Programming, 195(1-2):243–281, 2022.
- [22] Maryam Fazel. Matrix rank minimization with applications. PhD thesis, Stanford University, 2002.
- [23] Robert M Freund, Paul Grigas, and Rahul Mazumder. An extended Frank–Wolfe method with “in-face” directions, and its application to low-rank matrix completion. SIAM Journal on optimization, 27(1):319–346, 2017.
- [24] Mituhiro Fukuda, Masakazu Kojima, Kazuo Murota, and Kazuhide Nakata. Exploiting sparsity in semidefinite programming via matrix completion i: General framework. SIAM Journal on Optimization, 11(3):647–674, 2001.
- [25] Michael Garstka, Mark Cannon, and Paul Goulart. Cosmo: A conic operator splitting method for convex conic problems. Journal of Optimization Theory and Applications, 190(3):779–810, 2021.
- [26] Robert Grone, Charles R. Johnson, Eduardo M. Sá, and Henry Wolkowicz. Positive definite completions of partial hermitian matrices. Linear Algebra and its Applications, 58:109–124, 1984.
- [27] Martin Grötschel, László Lovász, and Alexander Schrijver. Polynomial algorithms for perfect graphs. In North-Holland mathematics studies, volume 88, pages 325–356. Elsevier, 1984.
- [28] Zaid Harchaoui, Anatoli Juditsky, and Arkadi Nemirovski. Conditional gradient algorithms for norm-regularized smooth convex optimization. Math. Program., 152(1-2):75–112, 2015.
- [29] Elad Hazan. Sparse approximate solutions to semidefinite programs. In Latin American symposium on theoretical informatics, pages 306–316. Springer, 2008.
- [30] C. Helmberg and K.C. Kiwiel. A spectral bundle method with bounds. Math. Program., 93(2):173–194, 2002.
- [31] C. Helmberg and F. Rendl. A spectral bundle method for semidefinite programming. SIAM Journal on Optimization, 10(3):673–696, 2000.
- [32] Steven Homer and Marcus Peinado. Design and performance of parallel and distributed approximation algorithms for maxcut. Journal of Parallel and Distributed Computing, 46(1):48–61, 1997.
- [33] Wen Huang, Kyle A Gallivan, and Xiangxiong Zhang. Solving PhaseLift by low-rank Riemannian optimization methods for complex semidefinite constraints. SIAM Journal on Scientific Computing, 39(5):B840–B859, 2017.
- [34] Martin Jaggi and Marek Sulovský. A simple algorithm for nuclear norm regularized problems. In Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML’10, page 471–478, Madison, WI, USA, 2010. Omnipress.
- [35] Michel Journée, Francis Bach, P-A Absil, and Rodolphe Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5):2327–2351, 2010.
- [36] W. Kong, J.G. Melo, and R.D.C. Monteiro. Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs. SIAM J. Optim., 29(4):2566–2593, 2019.
- [37] W. Kong, J.G. Melo, and R.D.C. Monteiro. An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems. Comput. Optim. Appl., 76(2):305–346, 2019.
- [38] Brian Kulis, Arun C Surendran, and John C Platt. Fast low-rank semidefinite programming for embedding and clustering. In Artificial Intelligence and Statistics, pages 235–242. PMLR, 2007.
- [39] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
- [40] László Lovász. On the Shannon capacity of a graph. IEEE Transactions on Information theory, 25(1):1–7, 1979.
- [41] Ramtin Madani, Abdulrahman Kalbat, and Javad Lavaei. ADMM for sparse semidefinite programming with applications to optimal power flow problem. In 2015 54th IEEE Conference on Decision and Control (CDC), pages 5932–5939. IEEE, 2015.
- [42] Anirudha Majumdar, Georgina Hall, and Amir Ali Ahmadi. Recent scalability improvements for semidefinite programming with applications in machine learning, control, and robotics. Annual Review of Control, Robotics, and Autonomous Systems, 3:331–360, 2020.
- [43] Song Mei, Theodor Misiakiewicz, Andrea Montanari, and Roberto Imbuzeiro Oliveira. Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality. In Conference on Learning Theory, pages 1476–1515. PMLR, 2017.
- [44] F. Oustry. A second-order bundle method to minimize the maximum eigenvalue function. Math. Program., 89(1):1–33, 2000.
- [45] Brendan O’Donoghue, Eric Chu, Neal Parikh, and Stephen Boyd. Conic optimization via operator splitting and homogeneous self-dual embedding. Journal of Optimization Theory and Applications, 169:1042–1068, 2016.
- [46] C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui. Catalyst for gradient-based nonconvex optimization. In AISTATS 2018-21st International Conference on Artificial Intelligence and Statistics, pages 1–10, 2018.
- [47] Gábor Pataki. On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Mathematics of operations research, 23(2):339–358, 1998.
- [48] Thomas Pumir, Samy Jelassi, and Nicolas Boumal. Smoothed analysis of the low-rank approach for smooth semidefinite programs. Advances in Neural Information Processing Systems, 31, 2018.
- [49] Nikhil Rao, Parikshit Shah, and Stephen Wright. Conditional gradient with enhancement and truncation for atomic-norm regularization. In NIPS workshop on Greedy Algorithms. Citeseer, 2013.
- [50] James Renegar. Accelerated first-order methods for hyperbolic programming. Mathematical Programming, 173(1-2):1–35, 2019.
- [51] David M Rosen. Scalable low-rank semidefinite programming for certifiably correct machine perception. In Algorithmic Foundations of Robotics XIV: Proceedings of the Fourteenth Workshop on the Algorithmic Foundations of Robotics 14, pages 551–566. Springer, 2021.
- [52] David M Rosen, Luca Carlone, Afonso S Bandeira, and John J Leonard. SE-Sync: A certifiably correct algorithm for synchronization over the special Euclidean group. The International Journal of Robotics Research, 38(2-3):95–125, 2019.
- [53] Ryan A. Rossi and Nesreen K. Ahmed. The network data repository with interactive graph analytics and visualization. In AAAI, 2015.
- [54] Alexander Shapiro. Rank-reducibility of a symmetric matrix and sampling theory of minimum trace factor analysis. Psychometrika, 47:187–199, 1982.
- [55] Nimita Shinde, Vishnu Narayanan, and James Saunderson. Memory-efficient structured convex optimization via extreme point sampling. SIAM Journal on Mathematics of Data Science, 3(3):787–814, 2021.
- [56] A. Sujanani and R.D.C. Monteiro. An adaptive superfast inexact proximal augmented Lagrangian method for smooth nonconvex composite optimization problems. J. Scientific Computing, 97(2), 2023.
- [57] Lieven Vandenberghe, Martin S Andersen, et al. Chordal graphs and semidefinite optimization. Foundations and Trends® in Optimization, 1(4):241–433, 2015.
- [58] Irene Waldspurger and Alden Waters. Rank optimality for the Burer–Monteiro factorization. SIAM journal on Optimization, 30(3):2577–2602, 2020.
- [59] Alex L Wang and Fatma Kilinc-Karzan. Accelerated first-order methods for a class of semidefinite programs. arXiv preprint arXiv:2206.00224, 2022.
- [60] Liuqin Yang, Defeng Sun, and Kim-Chuan Toh. SDPNAL+: a majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints. Mathematical Programming Computation, 7(3):331–366, 2015.
- [61] Yinyu Ye. Gset dataset of random graphs. https://www.cise.ufl.edu/research/sparse/matrices/Gset, 2003.
- [62] Alp Yurtsever, Olivier Fercoq, and Volkan Cevher. A conditional-gradient-based augmented Lagrangian framework. In International Conference on Machine Learning, pages 7272–7281. PMLR, 2019.
- [63] Alp Yurtsever, Ya-Ping Hsieh, and Volkan Cevher. Scalable convex methods for phase retrieval. In 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 381–384. IEEE, 2015.
- [64] Alp Yurtsever, Joel A Tropp, Olivier Fercoq, Madeleine Udell, and Volkan Cevher. Scalable semidefinite programming. SIAM Journal on Mathematics of Data Science, 3(1):171–200, 2021.
- [65] Xin-Yuan Zhao, Defeng Sun, and Kim-Chuan Toh. A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM Journal on Optimization, 20(4):1737–1765, 2010.
- [66] Yang Zheng, Giovanni Fantuzzi, Antonis Papachristodoulou, Paul Goulart, and Andrew Wynn. Fast ADMM for semidefinite programs with chordal sparsity. In 2017 American Control Conference (ACC), pages 3335–3340. IEEE, 2017.