
Convex Minimization with Integer Minima in $\widetilde{O}(n^{4})$ Time

A preliminary version of this paper appeared at the 35th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2024).

Haotian Jiang [email protected]. University of Washington.    Yin Tat Lee [email protected]. University of Washington.    Zhao Song [email protected]. Adobe Research.    Lichen Zhang [email protected]. Massachusetts Institute of Technology. Work partially done while visiting University of Washington. Supported in part by NSF grant No. 1955217 and NSF grant No. 2022448.

Given a convex function $f$ on $\mathbb{R}^{n}$ with an integer minimizer, we show how to find an exact minimizer of $f$ using $O(n^{2}\log n)$ calls to a separation oracle and $O(n^{4}\log n)$ time. The previous best polynomial time algorithm for this problem, given in [Jiang, SODA 2021, JACM 2022], achieves $O(n^{2}\log\log n/\log n)$ oracle complexity. However, the overall runtime of Jiang’s algorithm is at least $\widetilde{\Omega}(n^{8})$, due to expensive sub-routines such as the Lenstra-Lenstra-Lovász (LLL) algorithm [Lenstra, Lenstra, Lovász, Math. Ann. 1982] and the random walk based cutting plane method [Bertsimas, Vempala, JACM 2004]. Our significant speedup is obtained by a nontrivial combination of a faster version of the LLL algorithm due to [Neumaier, Stehlé, ISSAC 2016] that gives similar guarantees, the volumetric center cutting plane method (CPM) by [Vaidya, FOCS 1989], and its fast implementation given in [Jiang, Lee, Song, Wong, STOC 2020].

For the special case of submodular function minimization (SFM), our result implies a strongly polynomial time algorithm for this problem using $O(n^{3}\log n)$ calls to an evaluation oracle and $O(n^{4}\log n)$ additional arithmetic operations. Both the oracle complexity and the number of arithmetic operations of our more general algorithm improve upon the previous best-known algorithms for this specific problem, given in [Lee, Sidford, Wong, FOCS 2015] and [Dadush, Végh, Zambelli, SODA 2018, MOR 2021].

1 Introduction

The problem of minimizing a convex function $f$ over $\mathbb{R}^{n}$, assuming access to a separation oracle $\mathsf{SO}$, has gained considerable interest since the seminal work of Grötschel, Lovász, and Schrijver [22, 24]. The separation oracle $\mathsf{SO}$ is such that when queried with a point $x\in\mathbb{R}^{n}$, it returns “YES” if $x$ minimizes $f$, or else a hyperplane that separates $x$ from the minimizers of $f$. One popular and successful approach for this problem is the cutting plane method, which dates back to the center of gravity method, independently discovered by Levin [39] and Newman [48] in the 1960s. Since then, cutting plane methods have undergone substantial developments and improvements over the past six decades in terms of their oracle complexity (the oracle complexity of an algorithm is the number of oracle calls made by the algorithm in the worst case) and runtime efficiency [58, 72, 37, 65, 49, 67, 4, 6, 44]. In particular, the current fastest cutting plane method is due to Jiang, Lee, Song, and Wong [34].

Despite outstanding progress on the cutting plane method, direct application of these methods for minimizing a convex function $f$ on $\mathbb{R}^{n}$ via a separation oracle typically results in weakly-polynomial time algorithms, with the oracle complexity and runtime depending logarithmically on the accuracy parameter $\varepsilon>0$ and the “size” of the function $f$ (depending on the specific problem setting, the “size” of the function can be its range, Lipschitzness, condition number, length of binary representation, etc.).

A fundamental but extremely challenging question is to design a strongly-polynomial time algorithm that efficiently computes an exact minimizer of $f$ on $\mathbb{R}^{n}$, with its oracle complexity, number of arithmetic operations, and bit size all being polynomial only in the dimension $n$ of the problem and not dependent on the “size” of the function $f$. For example, it remains a major open problem to design a strongly-polynomial time algorithm for solving linear programs (LPs). This problem is widely known as Smale’s 9th problem in his list of eighteen unsolved mathematical problems for the 21st century [59]. In particular, recent breakthroughs on LP solvers are all weakly-polynomial time algorithms, e.g., [42, 43, 11, 5, 36]. Despite this obstacle, strongly-polynomial time algorithms for LPs are known under additional structures, such as LPs with at most two non-zero entries per row [47, 1, 12] or per column [68, 53] in the constraint matrix, LPs with bounded entries in the constraint matrix [64, 70, 14], and LPs with 0-1 optimal solutions [9, 10].

For minimizing a general convex function $f$ with access to a separation oracle, it is impossible to design a strongly-polynomial time algorithm unless the function $f$ satisfies additional combinatorial properties. One natural and general assumption, satisfied by many fundamental discrete optimization problems such as submodular function minimization (SFM), maximum flow, maximum matching, and shortest path, is that the convex function $f$ has an integer minimizer. Under the integer minimizer assumption, Jiang [31, 32], improving upon the classical work of Grötschel, Lovász, and Schrijver [22, 24], recently gave a strongly-polynomial time algorithm for finding an exact minimizer of $f$ using $O(n(n\log\log n/\log n+\log R))$ calls to $\mathsf{SO}$, where $R=2^{\operatorname{poly}(n)}$ is the $\ell_{\infty}$-norm of the integer minimizer. (In fact, Jiang [31, 32] gave a family of algorithms for this problem via an algorithmic reduction to the shortest vector problem. For instance, he gave an algorithm that achieves a nearly-optimal oracle complexity of $O(n\log(nR))$ but uses exponential time.)

In fact, Jiang’s general result implies significant improvement to the oracle complexity of strongly-polynomial time algorithms even for the special problem of SFM (see Subsection 1.2 for more details). Despite its favorable oracle complexity, Jiang’s algorithm actually requires $\Omega(n^{8})$ additional arithmetic operations to implement. This additional part of the runtime is prohibitively large for its application to SFM; other state-of-the-art strongly-polynomial time algorithms for SFM [44, 34, 17] have worse oracle complexity but use far fewer arithmetic operations.

Unlike the special case of SFM, for the more general problem of minimizing a convex function with an integer minimizer, the $\widetilde{\Omega}(n^{8})$ arithmetic operations in Jiang’s algorithm remain the state-of-the-art for strongly-polynomial time algorithms with near-optimal oracle complexity. The lack of a fast algorithm for the general problem, as well as the tradeoff between smaller oracle complexity and fewer additional arithmetic operations for state-of-the-art SFM algorithms, naturally leads to the following question:

Can we minimize a convex function with an integer minimizer using a separation oracle in a way that is both oracle-efficient and uses far fewer arithmetic operations?

In this paper, we provide an affirmative answer to the above question. Specifically, we give an algorithm that minimizes a convex function with an integer minimizer using only $O(n^{2}\log(nR))$ separation oracle calls and an additional $O(n^{4}\log(nR))$ arithmetic operations. When applied to SFM, our result implies an algorithm that makes $O(n^{3}\log n)$ evaluation oracle queries and uses an additional $O(n^{4}\log n)$ arithmetic operations. This improves, in both the oracle complexity and the additional arithmetic operations, the state-of-the-art strongly-polynomial time SFM algorithms in [44, 17]. Compared with Jiang’s algorithm [32], despite having a slightly larger oracle complexity, the additional number of arithmetic operations used by our algorithm is significantly lower, by a $\widetilde{\Theta}(n^{4})$ factor.

1.1 Our Result

The main result of this paper is a much more efficient algorithm for minimizing convex functions with integer minimizers:

Theorem 1.1 (Main result, informal version of Theorem 5.4).

Given a separation oracle $\mathsf{SO}$ for a convex function $f$ on $\mathbb{R}^{n}$, if the set of minimizers $K^{*}$ of $f$ is contained in a box of radius $R$ and all extreme points of $K^{*}$ are integer vectors, then there exists a randomized algorithm that outputs an integer minimizer of $f$ with high probability using

  • $O(n^{2}\log(nR))$ queries to $\mathsf{SO}$, and

  • $O(n^{4}\log(nR))$ additional arithmetic operations.

The strong assumption that all extreme points of $K^{*}$ are integer vectors guarantees that the algorithm outputs an integer minimizer of $f$. In fact, this assumption is necessary for the algorithm to efficiently compute an integer minimizer (see Remark 1.4 in [32] for an example). Without such an assumption, the algorithm can still output a minimizer, though not necessarily an integer one, with the same guarantees as in Theorem 1.1 (see Remark 1.5 in [32]).

Previously, under the same assumptions as in Theorem 1.1, Jiang [32] gave an algorithm that computes an integer minimizer of $f$ using $O(n(n\log\log n/\log n+\log R))$ separation oracle calls, which is better than the oracle complexity in Theorem 1.1. However, his algorithm uses an enormous $\widetilde{O}(n^{8})$ number of additional arithmetic operations for the following reason: to obtain a subquadratic oracle complexity, [32] resorts to the random walk based cutting plane method by Bertsimas and Vempala [6] to work directly with the polytope $K$ formed by all the separating hyperplanes output by $\mathsf{SO}$. Unfortunately, the sampling approach in [6] requires $\Omega(n^{6})$ arithmetic operations to perform one iteration of the cutting plane method. In addition, to guarantee that the sampling can be performed efficiently, [32] has to maintain two “sandwiching” ellipsoids, contained in and containing the polytope $K$ respectively, which requires additional computational cost to preserve after reducing the dimension of the problem.

Another computational bottleneck in [32] is that the algorithm approximates the length of the shortest vector after every iteration of the cutting plane method. To do so, [32] leverages the sieve algorithm in [2] that computes a $2^{n\log\log n/\log n}$-approximation to the shortest vector using $\widetilde{O}(n^{4})$ arithmetic operations. Combined with the overall $\widetilde{O}(n^{2})$ iterations of the algorithm, this step of approximating the shortest vector takes $\Omega(n^{6})$ arithmetic operations in total.

To get around the aforementioned computational bottlenecks, we utilize an intricate combination of a computationally more efficient cutting plane method based on the volumetric center of $K$ due to Vaidya [67] and a faster version of the LLL algorithm given in [51]. Instead of working directly with the polytope $K$ and performing volume reduction in a step-by-step manner, we run Vaidya’s cutting plane method [67] in blocks of $O(n\log n)$ consecutive steps; within each block, the volume decreases by a constant factor per step only in an amortized sense. In particular, recent work by Jiang, Lee, Song and Wong [34] shows that $O(n\log n)$ steps of Vaidya’s method can be implemented using a total of $O(n^{3}\log n)$ arithmetic operations. Running Vaidya’s method in blocks also enables us to compute only $\widetilde{O}(n)$ approximate shortest vectors in total, while still being able to avoid the appearance of extremely short vectors. Harnessing the faster implementation of the LLL algorithm in [51] allows us to compute the $\widetilde{O}(n)$ approximate shortest vectors using a total of $\widetilde{O}(n^{4})$ arithmetic operations.

1.2 Applications to Submodular Function Minimization

Submodular function minimization (SFM) has been one of the cornerstone problems in combinatorial optimization since the seminal work of Edmonds [19]. Many classic combinatorial problems can be abstracted as optimization over a submodular function, such as graph cut functions, set coverage functions, and economic utility functions. For more comprehensive reviews of SFM, we refer readers to [46, 30].

Papers | Year | Runtime | Remarks | General?
[22, 24] | 1981, 88 | $\widetilde{O}(n^{5}\cdot\mathsf{EO}+n^{7})$ [46] | first strongly | $\checkmark$
[57] | 2000 | $O(n^{8}\cdot\mathsf{EO}+n^{9})$ | first comb. strongly |
[27] | 2000 | $O(n^{7}\log n\cdot\mathsf{EO}+\operatorname{poly}(n))$ | first comb. strongly |
[20] | 2000 | $O(n^{7}\cdot\mathsf{EO}+n^{8})$ | |
[29] | 2002 | $O(n^{6}\log n\cdot\mathsf{EO}+n^{7}\log n)$ | |
[71] | 2003 | $O(n^{7}\cdot\mathsf{EO}+n^{8})$ | |
[52] | 2007 | $O(n^{5}\cdot\mathsf{EO}+n^{6})$ | |
[28] | 2009 | $O(n^{5}\log n\cdot\mathsf{EO}+n^{6}\log n)$ | |
[44] | 2015 | $O(n^{3}\log^{2}n\cdot\mathsf{EO}+n^{4}\log^{O(1)}n)$ | previous best runtime |
[44] | 2015 | $O(n^{3}\log n\cdot\mathsf{EO}+2^{O(n)})$ | exponential time |
[17] | 2018 | $O(n^{3}\log^{2}n\cdot\mathsf{EO}+n^{4}\log^{O(1)}n)$ | previous best runtime |
[31] | 2021 | $O(n^{3}\frac{\log\log n}{\log n}\cdot\mathsf{EO}+n^{8}\log n)$ | best oracle complexity | $\checkmark$
[31] | 2021 | $O(n^{2}\log n\cdot\mathsf{EO}+2^{O(n)})$ | exponential time | $\checkmark$
This paper | 2023 | $O(n^{3}\log n\cdot\mathsf{EO}+n^{4}\log n)$ | best runtime | $\checkmark$

Table 1: Strongly-polynomial algorithms for submodular function minimization. The oracle complexity measures the number of calls to the evaluation oracle $\mathsf{EO}$. In the case where a paper is published in both a conference and a journal, the year we provide is the earlier one. In the column “General?”, $\checkmark$ means that the algorithm works for the more general problem of minimizing convex functions with integer minimizers studied in this paper.

We briefly recall the standard setting of SFM: we are given a submodular function $f:2^{V}\rightarrow\mathbb{Z}$ over an $n$-element ground set $V$, where the function $f$ is accessed via an evaluation oracle, which returns the value $f(S)$ for a query set $S$ in time $\mathsf{EO}$. The goal is to find the minimizer of $f$ using queries to the evaluation oracle and additional arithmetic operations.

An SFM algorithm is called strongly-polynomial time if its runtime depends polynomially only on $\mathsf{EO}$ and the dimension $n$, but not on the range of the function $f$. In their seminal work, Grötschel, Lovász, and Schrijver [23, 24] gave the first strongly-polynomial time algorithm for SFM based on the ellipsoid method. Since then, there has been a long history of efforts in designing faster strongly-polynomial time algorithms for SFM (see Table 1).

The state-of-the-art strongly-polynomial time algorithms for SFM, in terms of the number of arithmetic operations used, were given by Lee, Sidford, and Wong [44] and Dadush, Végh, and Zambelli [17]. Both algorithms have runtime $O(n^{3}\log^{2}n\cdot\mathsf{EO}+n^{4}\log^{O(1)}n)$. The oracle complexity of these algorithms was later improved to sub-cubic in terms of $n$ by Jiang [31, 32], who gave a strongly-polynomial time algorithm with runtime $O(n^{3}\log\log n/\log n\cdot\mathsf{EO}+n^{8}\log^{O(1)}n)$ alongside an exponential time algorithm with a nearly-quadratic $O(n^{2}\log n)$ oracle complexity. It remains a major open problem in the area of SFM whether there exists a strongly-polynomial time algorithm with truly sub-cubic, i.e., $O(n^{3-c})$ for some absolute constant $c>0$, oracle complexity. When the problem has certain structure, such as a $k$-sparse minimizer, Graur, Jiang and Sidford [21] developed an algorithm with $\widetilde{O}(|V|\cdot\operatorname{poly}(k))$ queries. SFM has also been heavily studied in the parallel setting, see e.g., [8].

Unfortunately, while the algorithm in [32] answers major open questions in [44] and makes significant progress on the oracle complexity for SFM, the number of additional arithmetic operations used by Jiang’s algorithm is a factor of $\widetilde{\Theta}(n^{4})$ larger than the algorithms in [44, 17]. In this paper, we complement Jiang’s algorithm by giving the following strongly-polynomial time SFM algorithm that improves both the oracle complexity and the number of arithmetic operations used by the algorithms in [44, 17].

Corollary 1.2 (Submodular function minimization).

Given an evaluation oracle $\mathsf{EO}$ for a submodular function $f$ defined over an $n$-element ground set, there exists a randomized strongly-polynomial time algorithm that computes an exact minimizer of $f$ with high probability using

  • $O(n^{3}\log n)$ queries to $\mathsf{EO}$, and

  • $O(n^{4}\log n)$ additional arithmetic operations.

Corollary 1.2 almost immediately follows from Theorem 1.1 together with the standard fact that a separation oracle for the Lovász extension of the submodular function $f$ can be implemented by making $n$ queries to the evaluation oracle of $f$ and $O(n\log n)$ additional arithmetic operations (e.g., Theorem 6.4 in [32]). The proof of Corollary 1.2 is essentially identical to the proof of Theorem 1.7 in [32], and is therefore omitted.
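To make this reduction concrete, the following is a minimal sketch (in Python, with hypothetical names, not the implementation used in this paper) of the standard greedy construction behind such a separation oracle: sort the coordinates of the query point, evaluate $f$ on the resulting chain of prefix sets, and return the induced subgradient of the Lovász extension.

import numpy as np

def lovasz_separation_oracle(x, f_eval):
    """Sketch: subgradient of the Lovasz extension of a submodular f at x.

    x      : query point in R^n
    f_eval : evaluation oracle; f_eval(S) returns f(S) for a frozenset S

    Under the usual normalization f(emptyset) = 0, the returned g satisfies
    f_L(x) = <g, x> and f_L(y) >= <g, y> for all y (f_L is the Lovasz extension),
    so the halfspace {y : <g, y - x> <= 0} contains all points of smaller value.
    """
    n = len(x)
    order = np.argsort(-x)              # sort coordinates in decreasing order: O(n log n)
    g = np.zeros(n)
    prefix = set()
    prev = f_eval(frozenset())          # = 0 under the usual normalization
    for i in order:                     # n evaluation-oracle queries
        prefix.add(int(i))
        cur = f_eval(frozenset(prefix))
        g[i] = cur - prev               # marginal value of element i
        prev = cur
    return g

# toy usage: cut function of the path graph 0 - 1 - 2
def cut(S):
    return sum(1 for (u, v) in [(0, 1), (1, 2)] if (u in S) != (v in S))

print(lovasz_separation_oracle(np.array([0.2, 0.9, 0.1]), cut))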

Compared to the previous best runtime algorithms in [44] and [17], our algorithm improves their oracle complexity from $O(n^{3}\log^{2}n)$ to $O(n^{3}\log n)$ while also improving the number of arithmetic operations from $O(n^{4}\log^{O(1)}n)$ to $O(n^{4}\log n)$. We highlight that in [44], the authors manage to achieve an $O(n^{3}\log n)$ oracle complexity, but at the expense of an exponential runtime. It is also important to note that our improvements are achieved via a much more general algorithm, whereas the algorithms in [44, 17] work specifically for SFM.

In comparison to the current best oracle complexity algorithm in [32], our algorithm has a slightly worse oracle complexity, but we significantly improve the $\widetilde{O}(n^{8})$ additional arithmetic operations in his algorithm down to $O(n^{4}\log n)$.

Finally, note that the current best implementation of a separation oracle for the Lovász extension requires $n$ queries to $\mathsf{EO}$, and the current fastest cutting plane method requires $O(n^{2})$ arithmetic operations per step. So for any cutting plane algorithm for SFM that uses $T$ iterations, the best runtime we can hope for using state-of-the-art techniques is $O(Tn\cdot\mathsf{EO}+Tn^{2})$. Our algorithm, in fact, matches such a runtime bound. In particular, we use $O(n^{2}\log n)$ iterations of the cutting plane method with a total runtime of $O(n^{3}\log n\cdot\mathsf{EO}+n^{4}\log n)$. So in some sense, our result matches what can be achieved using the current best known algorithmic techniques for cutting plane methods. In contrast, the algorithm in [32] does not have such a feature, with the $\Omega(n^{8})$ additional arithmetic operations being much larger than its oracle complexity.

2 Technique Overview

In this section, we provide a brief overview of the techniques in prior work and in our work. In Section 2.1, we review the methods in [32]. In Section 2.2, we present a preliminary overview of our approach. In Section 2.3, we summarize current progress on the problem of minimizing convex functions with integer minimizers and discuss potential future directions.

2.1 An Overview of Previous Work

Before discussing our main technical insights, we first review the approach of [32] that achieves subquadratic oracle calls for minimizing convex functions with integer minimizers. The key ingredient in Jiang’s algorithm is an interplay between the polytope, formed by the separating hyperplanes returned by the separation oracle, which contains the integer minimizer, and a lattice that captures the structures of integer points.

In particular, in each iteration of the cutting plane method, Jiang’s algorithm examines the length of the (approximate) shortest vector in the lattice under the $\mathrm{Cov}(K)$-norm, where $\mathrm{Cov}(K)$ is the covariance matrix of the polytope $K$. The length of this vector under the $\mathrm{Cov}(K)$-norm provides a measurement of the width of the outer ellipsoid induced by $\mathrm{Cov}(K)^{-1}$. Thus, when this norm is small, it implies that the outer ellipsoid has a small width and we can safely proceed by reducing the dimension and projecting the lattice onto the orthogonal complement of the shortest vector. On the other hand, if the shortest vector still has a relatively large norm, then the polytope $K$ can be further refined using the cutting plane method. To achieve the best possible oracle complexity, the algorithm in [32] performs the width measurement in a step-by-step fashion, meaning that it actively checks the length of the shortest vector after every single cutting plane step. This ensures that the algorithm can enter the dimension reduction phase as soon as a short lattice vector can be obtained, which happens when the volume of $K$ is small enough.

Unfortunately, such a step-by-step strategy and careful width measurement come at a price – the approximate shortest vector subprocedure will be called in every iteration of the cutting plane method. Moreover, an expensive cutting plane method, such as the random walk based center of gravity method [6], that guarantees the volume shrinks in every step needs to be used.

2.2 Our Approach: Cutting Plane in Blocks and Lazy Width Measurement

Now we discuss our approach for overcoming the major computational bottleneck of [32], which is the step-by-step measurement of width mentioned above. For simplicity, we assume the integer minimizer $x^{*}\in\{0,1\}^{n}$ (i.e., $R=1$) in the subsequent discussion. By delaying the width measurement, i.e., finding an approximate shortest vector only after a block of cutting plane steps, we can utilize the much more efficient cutting plane method due to Vaidya [67]. However, this comes at the cost of a possibly much larger number of oracle calls, since the length of the shortest vector in the lattice might become very small when we actually perform the measurement, which will lead to a large increase in the volume of $K$ after dimension reduction. Nevertheless, we show that by carefully balancing the block size of cutting plane steps and the loss incurred during dimension reduction due to short lattice vectors, we can still achieve an $O(n^{2}\log n)$ oracle complexity.

To explain our approach, we first provide a brief introduction to Vaidya’s cutting plane method. Unlike classical cutting plane methods, such as the ellipsoid method and the center of gravity method, where each step is guaranteed to shrink the volume, Vaidya’s algorithm and analysis rely on controlling the volume of the Dikin ellipsoid induced by the log barrier on the polytope $K$. The algorithm iteratively finds the point $\omega_{K}$ whose corresponding Dikin ellipsoid has the largest volume, which is called the volumetric center. Subsequently, the algorithm queries the separation oracle at the volumetric center $\omega_{K}$. If $\omega_{K}$ is a minimizer, then we are done; otherwise, the algorithm computes the leverage score of each constraint with respect to the Hessian matrix, which measures the relative importance of each constraint. If all constraints are relatively important, then a new constraint is added based on the separating hyperplane returned by the separation oracle at $\omega_{K}$. If one of the constraints has leverage score smaller than some tiny constant, then it is dropped to ensure that the polytope $K$ always has at most $O(n)$ constraints. Due to the increase in the volume of the polytope $K$ when constraints are dropped, the volume of $K$ only shrinks in an amortized sense. In particular, only after $O(n\log n)$ steps is the volume of $K$ guaranteed to decrease by a multiplicative factor of $2^{-O(n\log n)}$ from the initial volume. It is crucial to note that no guarantee on the volume shrinkage of $K$ is known if only $o(n\log n)$ iterations of Vaidya’s cutting plane method are run.

Our idea is then to view these $O(n\log n)$ steps as a “block”: each time we measure the shortest vector and find that it still has a large norm, we execute $O(n\log n)$ steps of Vaidya’s cutting plane method and then re-examine the norm. This naturally induces a strategy that measures the width of the ellipsoid in a lazy fashion. The volume decrease of $K$ under this strategy is similar to that obtained by running the cutting plane method of [6] for $O(n\log n)$ steps. Therefore, the only issue is the significant decrease in the norm of the shortest vector during a block of cutting plane steps, which can be as small as $2^{-O(n\log n)}$. Fortunately, this shrinkage of the norm is acceptable, and it incurs only an additional $O(\log n)$ factor in the number of oracle calls.

One major advantage of using Vaidya’s cutting plane method is that the iterations can be performed very efficiently. Using the fast implementation in [34], a block of $O(n\log n)$ cutting plane steps can be done in $O(n^{3}\log n)$ time. Lazy width measurement also reduces the total number of approximate shortest vectors we need to compute from $\widetilde{O}(n^{2})$ to $\widetilde{O}(n)$, which opens up the gate to using the faster approximate shortest vector algorithms in [51].

Although the efficiency issues of the cutting plane step and the width measurement are addressed, we still need to carefully control the complexity of the dimension reduction step. This step can be particularly expensive due to the loss of structural information of the polytope $K$ after collapsing it onto a proper subspace $P$ on which the algorithm recurses, as discussed in [32]. In the standard setting of Vaidya’s cutting plane method, the polytope evolves in a “slow-changing” manner that makes it easy to maintain and update the volumetric center and its corresponding Hessian matrix. However, after reducing a dimension, we no longer have such a “slow-changing” property, and thus the new volumetric center and Hessian matrix have to be computed from scratch.

A natural idea would be to formulate the problem of recomputing the volumetric center and Hessian as a convex optimization task constrained to the polytope $K\cap P$ we have access to. On the surface, this problem can be solved straightforwardly and efficiently using the same cutting plane procedure. The caveat is that evaluating the volumetric function or its gradient requires $O(n^{\omega})$ time, where $\omega\approx 2.373$ is the exponent of matrix multiplication [18, 69, 40]. As the cutting plane method evolves for $O(n\log n)$ iterations, this would lead to a total of $O(n^{\omega+1}\log n)$ arithmetic operations. To circumvent this issue, we start from a simpler convex body containing $K\cap P$ whose volumetric center can be easily computed. Specifically, we choose the hyperrectangle centered at the center of the outer ellipsoid that contains $K\cap P$. We show that this method only affects the number of oracle calls by a factor of $O(\log n)$, despite causing the volume to blow up by a factor of $n^{O(n)}$. A similar approach is also taken in [32] to ease the computation of the centroid and covariance matrix of $K\cap P$. However, his method requires iteratively refining the hyperrectangle using constraints of $K\cap P$ until it coincides with $K\cap P$, at which point the algorithm relearns the collapsed polytope $K\cap P$. This approach can take as many as $\widetilde{O}(n^{2})$ steps (without calling the separation oracle) with $\Omega(n^{2})$ operations per step, and is mainly for achieving the best possible oracle complexity. Our approach is arguably simpler and much more efficient.

Another challenge is to prove that using volumetric centers and Dikin ellipsoids [15] is sufficient for the algorithm to make progress. In Jiang’s algorithm, the use of the centroid and covariance matrix makes the analysis straightforward due to the standard nice geometric property that the polytope $K$ is sandwiched between ellipsoids induced by the covariance matrix at the centroid. However, in our analysis, we use the volumetric center and Dikin ellipsoid of the log barrier, which requires a different approach. Past works that exclusively analyze the performance of the cutting plane method based on this approach [67] take a functional value point of view, measuring the progress of the algorithm via the change of the volumetric function. Instead, our analysis extracts key geometric information from Vaidya’s cutting plane method, showing that the volumetric center and Dikin ellipsoid evolve in a similar spirit as the centroid and covariance matrix.

In particular, we prove a structural result on using Dikin ellipsoids [15] to sandwich an $n$-dimensional polytope defined by $m$ constraints. Let us parametrize $K=\{x\in\mathbb{R}^{n}:Ax\geq b\}$ where $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^{m}$, and define the Hessian of the log-barrier as $H(x)=A^{\top}S_{x}^{-2}A$, where $S_{x}\in\mathbb{R}^{m\times m}$ is the diagonal matrix whose $i$-th diagonal entry is $a_{i}^{\top}x-b_{i}$. The volumetric function is defined as $F(x)=\frac{1}{2}\log\det(H(x))$ and we let $\operatorname{vc}(K)$ denote the minimizer of $F$, i.e., $\operatorname{vc}(K)$ is the volumetric center of $K$. The key structural result we prove is that $E(\operatorname{vc}(K),H(\operatorname{vc}(K)))\subseteq K\subseteq\operatorname{poly}(mn)\cdot E(\operatorname{vc}(K),H(\operatorname{vc}(K)))$, where $E(\operatorname{vc}(K),H(\operatorname{vc}(K)))$ is the ellipsoid centered at $\operatorname{vc}(K)$ and defined by $H(\operatorname{vc}(K))$. While it is standard for $K$ to be sandwiched by Dikin ellipsoids induced by a self-concordant barrier function and centered at the minimizer of that function, we note that $\operatorname{vc}(K)$ is not the minimizer of the log-barrier function (but rather the minimizer of $F$), while the Dikin ellipsoid is defined w.r.t. the Hessian $H$ of the log-barrier. In fact, for the Dikin ellipsoid centered at the analytic center, denoted by $\operatorname{ac}(K)$ (the minimizer of the log-barrier function), and the induced Hessian at $\operatorname{ac}(K)$, we have

\[E(\operatorname{ac}(K),H(\operatorname{ac}(K)))\subseteq K\subseteq O(m)\cdot E(\operatorname{ac}(K),H(\operatorname{ac}(K))).\]

On the other hand, to make progress in Vaidya’s cutting plane method, we have to work with the Dikin ellipsoid centered at and induced by $\operatorname{vc}(K)$ instead of $\operatorname{ac}(K)$. We first note that the scaled volumetric function indeed defines a self-concordant barrier function and we thus have

\[E(\operatorname{vc}(K),\nabla^{2}F(\operatorname{vc}(K)))\subseteq K\subseteq O(mn)\cdot E(\operatorname{vc}(K),\nabla^{2}F(\operatorname{vc}(K))).\]

For the left containment, we can safely replace $\nabla^{2}F(\operatorname{vc}(K))$ with $H(\operatorname{vc}(K))$, as for any $x\in K$ it is true that $E(x,H(x))\subseteq K$. For the right containment, we utilize the fact that $H(x)$ and $\nabla^{2}F(x)$ are close spectral approximations of each other; thus we have $E(\operatorname{vc}(K),\nabla^{2}F(\operatorname{vc}(K)))\subseteq O(\sqrt{m})\cdot E(\operatorname{vc}(K),H(\operatorname{vc}(K)))$, and we conclude

\[E(\operatorname{vc}(K),H(\operatorname{vc}(K)))\subseteq K\subseteq\operatorname{poly}(mn)\cdot E(\operatorname{vc}(K),H(\operatorname{vc}(K))).\]

We believe this sandwiching condition might be of independent interest for other geometric applications of Vaidya’s method.

To carry out the analysis, we use the potential function developed in [32], which captures the volume of the polytope and the density of the lattice simultaneously. We show that the block of cutting plane steps also leads to a rapid decrement of the potential. Unlike [32], the block cutting plane steps and Dikin ellipsoids cause extra losses in the volume of $K$ after dimension reduction, due to the possible appearance of very short vectors. We observe that such blowups are always at most $n^{O(n)}$ for each dimension reduction step. Thus, the potential increases by at most $O(n^{2}\log n)$ in total, and we need to use a total of $O(n^{2}\log n)$ oracle calls to counter this increment. To summarize, by trading for a slightly worse number of oracle calls, we make more room for the algorithm to gain extra speedup through leverage score maintenance, faster approximate shortest vector computation, and crude estimation of the convex body after reducing dimension.

2.3 Discussion

A natural question that arises from our result is whether it is possible to obtain a strongly-polynomial time algorithm with quadratic or subquadratic oracle complexity, matching the one in [32], while achieving the same improved runtime as ours. Our algorithm has only one $\log n$ factor in the oracle complexity, but we contend that this is inherent to Vaidya’s approach since it can only guarantee a volume decrease after $O(n\log n)$ cutting plane steps. Additionally, in the dimension reduction phase, we use a hyperrectangle to approximate the convex body, which introduces a volume increase of $n^{O(n)}$. To resolve this issue, an algorithm that approximately computes the volumetric center in $O(n^{3}\log n)$ time would be necessary. However, designing such an algorithm is an interesting and nontrivial data structure task, as discussed in the preceding subsection, since it requires maintaining and updating the log determinant of a Gram matrix under slow diagonal changes.

To achieve a subquadratic oracle complexity, a crucial ingredient in [32] is an approximate shortest vector algorithm with a sub-exponential approximation factor, first given in [2]. The main reason for the sub-exponential approximation ratio is a block-reduction scheme introduced in [56] that computes a more general notion of reduced lattice basis. Recent improvements on basis reduction algorithms make use of this block-reduction idea to obtain more refined recursive structures. Therefore, it is of interest to design approximate shortest vector algorithms that use $\widetilde{O}(n^{3})$ arithmetic operations while achieving a sub-exponential approximation factor.

3 Preliminary

In Section 3.1, we provide several basic notations, definitions and facts. In Section 3.2, we discuss the LLL algorithm and the shortest vector problem. In Section 3.3, we define and state several basic tools in convex geometry. In Section 3.4, we provide some related definitions about lattice projection. In Section 3.5, we present the slicing lemma. In Section 3.6, we state a dimension reduction lemma.

3.1 Notations and Basic Facts

Basic Notations.

For an integer $n$, we use $[n]$ to denote the set $\{1,2,\cdots,n\}$. For any function $f$, we use $\widetilde{O}(f)$ to denote $f\cdot\operatorname{poly}(\log f)$.

Matrices and Vectors.

For a vector $x$, we use $\|x\|_{2}$ to denote its $\ell_{2}$ norm and $x^{\top}$ to denote its transpose. For a matrix $A$, we use $A^{\top}$ to denote its transpose. We use ${\bf 0}_{n}$ to denote the length-$n$ vector whose entries are all zeros, and ${\bf 1}_{n}$ to denote the length-$n$ vector whose entries are all ones.

We say a square matrix $A\in\mathbb{R}^{n\times n}$ is PSD ($A\succeq 0$) if $x^{\top}Ax\geq 0$ for all vectors $x\in\mathbb{R}^{n}$. For a square matrix $A$, we use $\det(A)$ to denote the determinant of $A$. For a square and invertible matrix $A$, we use $A^{-1}$ to denote the inverse of $A$.

For a PSD matrix $A$, we define the induced matrix norm of any vector $x$ as $\|x\|_{A}:=\sqrt{x^{\top}Ax}$.

Ellipsoid.

Given a point $x_{0}\in\mathbb{R}^{n}$ and a PSD matrix $A\in\mathbb{R}^{n\times n}$, we use $E(x_{0},A)$ to denote the (not necessarily full-rank) ellipsoid given by

\[E(x_{0},A):=\{x\in x_{0}+W_{A}:(x-x_{0})^{\top}A(x-x_{0})\leq 1\},\]

where $W_{A}$ is the subspace spanned by the eigenvectors corresponding to nonzero eigenvalues of $A$. When the ellipsoid is centered at ${\bf 0}_{n}$, we use the short-hand notation $E(A)$ to denote $E({\bf 0}_{n},A)$.

Lattices.

Let $b_{1},\ldots,b_{k}\in\mathbb{R}^{n}$ be a set of linearly independent vectors. We use

\[\Lambda(b_{1},\ldots,b_{k})=\Big\{\sum_{i=1}^{k}\lambda_{i}b_{i}:\lambda_{i}\in\mathbb{Z}\Big\}\]

to denote the lattice generated by $b_{1},\ldots,b_{k}$, and $k$ is the rank of the lattice. If $k=n$, then the lattice is full rank. A basis of $\Lambda:=\Lambda(b_{1},\ldots,b_{k})$ is a set of $k$ linearly independent vectors that generates $\Lambda$ under integer combinations. Bases of $\Lambda$ are equivalent under unimodular transforms. We use $\lambda_{1}(\Lambda)$ to denote the $\ell_{2}$ norm of the shortest nonzero vector in $\Lambda$, and $\lambda_{1}(\Lambda,A)$ to denote the induced $A$-norm of the shortest nonzero vector in $\Lambda$.

Given a basis $B\in\mathbb{R}^{n\times k}$, the parallelotope associated to it is the polytope $P(B)=\{\sum_{i=1}^{k}\lambda_{i}b_{i}:\lambda_{i}\in[0,1),\forall i\in[k]\}$. The determinant of $\Lambda$ is the volume of $P(B)$, which is independent of the choice of basis.
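For intuition, the determinant of a (possibly non-full-rank) lattice can be computed from any basis $B$ as $\sqrt{\det(B^{\top}B)}$; the following small sketch (in Python, with made-up numbers) also illustrates its invariance under unimodular changes of basis.

import numpy as np

def lattice_det(B):
    """det of the lattice generated by the columns of B (volume of its parallelotope)."""
    return np.sqrt(np.linalg.det(B.T @ B))

B = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])             # basis of a rank-2 lattice in R^3
U = np.array([[1.0, 3.0],
              [0.0, 1.0]])             # unimodular: B and B @ U generate the same lattice
print(lattice_det(B), lattice_det(B @ U))   # both equal 3.0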

The dual lattice of Λ\Lambda is defined as follows:

Definition 3.1 (Dual lattice).

Given a lattice $\Lambda\subseteq\mathbb{R}^{n}$, the dual lattice $\Lambda^{*}$ is the set of all vectors $x\in\mathrm{span}\{\Lambda\}$ such that $\langle x,y\rangle\in\mathbb{Z}$ for all $y\in\Lambda$.

For more backgrounds about lattices, we refer the readers to lecture notes by Rothvoss [55].

Leverage Score.

We define the leverage score, a standard concept in numerical linear algebra [13, 7, 60, 61, 34, 62, 16]. We remark that the leverage score has multiple equivalent definitions; here we just present one of them.

Definition 3.2 (Leverage score).

Given a matrix $A\in\mathbb{R}^{m\times n}$, we define the leverage score for matrix $A$ to be $\sigma\in\mathbb{R}^{m}$, i.e.,

\[\sigma_{i}=a_{i}^{\top}(A^{\top}A)^{-1}a_{i},\quad\forall i\in[m].\]

Note that $a_{i}^{\top}$ is the $i$-th row of $A$.

We state a useful fact here.

Fact 3.3 (folklore).

Given a matrix $A\in\mathbb{R}^{m\times n}$, we have the following identity for its leverage scores:

\[\sum_{i=1}^{m}\sigma_{i}=n.\]
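A quick numerical check of this fact (a sketch on hypothetical random data):

import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 5
A = rng.standard_normal((m, n))               # full column rank with probability 1

G_inv = np.linalg.inv(A.T @ A)
sigma = np.array([a @ G_inv @ a for a in A])  # sigma_i = a_i^T (A^T A)^{-1} a_i

print(sigma.sum())                            # equals n = 5 up to rounding error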

3.2 LLL Algorithm for Shortest Vector Problem

Given a lattice $\Lambda$ and a corresponding basis $B\in\mathbb{R}^{n\times k}$, it is natural to seek an algorithm that finds a vector of norm $\lambda_{1}(\Lambda)$, or at least approximately finds one. The famous Lenstra-Lenstra-Lovász algorithm serves such a purpose:

Theorem 3.4 (LLL algorithm, [41]).

Let $b_{1},\cdots,b_{k}\in\mathbb{Z}^{n}$ be a basis for a lattice $\Lambda$ and let $A\in\mathbb{Z}^{n\times n}$ be a PSD matrix that is full rank on $\mathrm{span}(\Lambda)$. Let $D\in\mathbb{R}$ be such that $\|b_{i}\|_{A}^{2}\leq D$ for any $i\in[k]$. Then there exists an algorithm that outputs, in $O(n^{4}\log(D))$ arithmetic operations, a nonzero vector $b_{1}^{\prime}$ such that

\[\|b_{1}^{\prime}\|_{A}^{2}\leq 2^{k-1}\cdot\lambda_{1}^{2}(\Lambda,A).\]

Moreover, the integers occurring in the algorithm have bit sizes at most $O(n\log(D))$.

Recently, [51] improved the runtime of the LLL algorithm by leveraging the block reduction technique introduced in [56]. This is a key component in our $O(n^{4}\log n)$ time algorithm.

Theorem 3.5 (Theorem 2 of [51]).

Let $b_{1},\cdots,b_{k}\in\mathbb{Z}^{n}$ be a basis for a lattice $\Lambda$ and let $A\in\mathbb{Z}^{n\times n}$ be a PSD matrix that is full rank on $\mathrm{span}(\Lambda)$. Let $D\in\mathbb{R}$ be such that $\|b_{i}\|_{A}^{2}\leq D$ for any $i\in[k]$. Then there exists an algorithm that outputs, in $O(n^{3})$ arithmetic operations, a nonzero vector $b_{1}^{\prime}$ such that

\[\|b_{1}^{\prime}\|_{A}^{2}\leq 2^{k-1}\cdot\lambda_{1}^{2}(\Lambda,A).\]

Moreover, the integers occurring in the algorithm have bit sizes at most $O(n\log(D))$.
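For reference, the following is a toy floating-point sketch (in Python) of classical LLL basis reduction in the Euclidean norm; the algorithms of Theorems 3.4 and 3.5 work over the integers with careful precision control, and [51] additionally uses block reduction. The $A$-norm case reduces to the Euclidean one by replacing each basis vector $b$ with $L^{\top}b$, where $A=LL^{\top}$ is a Cholesky factorization.

import numpy as np

def lll_reduce(B, delta=0.75):
    """Toy floating-point LLL reduction of the lattice basis given by the columns of B."""
    B = B.astype(float).copy()
    k = B.shape[1]

    def gram_schmidt(B):
        Bs = np.zeros_like(B)
        mu = np.zeros((k, k))
        for i in range(k):
            Bs[:, i] = B[:, i]
            for j in range(i):
                mu[i, j] = B[:, i] @ Bs[:, j] / (Bs[:, j] @ Bs[:, j])
                Bs[:, i] -= mu[i, j] * Bs[:, j]
        return Bs, mu

    Bs, mu = gram_schmidt(B)
    i = 1
    while i < k:
        for j in range(i - 1, -1, -1):                    # size reduction
            q = round(mu[i, j])
            if q != 0:
                B[:, i] -= q * B[:, j]
                Bs, mu = gram_schmidt(B)
        lhs = Bs[:, i] @ Bs[:, i]
        rhs = (delta - mu[i, i - 1] ** 2) * (Bs[:, i - 1] @ Bs[:, i - 1])
        if lhs >= rhs:
            i += 1                                        # Lovasz condition holds
        else:
            B[:, [i - 1, i]] = B[:, [i, i - 1]]           # swap columns and backtrack
            Bs, mu = gram_schmidt(B)
            i = max(i - 1, 1)
    return B

# the first column of the output is a 2^{(k-1)/2}-approximate shortest vector
B = np.array([[47, 95],
              [31, 63]])
print(lll_reduce(B))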

3.3 Convex Geometry

In this section, we define notions related to the centroid and covariance. The work [32] heavily exploits the structure of these objects to design a subquadratic oracle complexity algorithm for minimizing convex functions. Our approach also relates to these notions.

For a convex body $K$, we use $\mathrm{vol}(K)$ to denote its volume, i.e., $\mathrm{vol}(K):=\int_{x\in K}\mathrm{d}x$. We first define the centroid of a convex body.

Definition 3.6.

Let $K\subseteq\mathbb{R}^{n}$ be a convex body. Let $g:K\rightarrow\mathbb{R}_{+}$ denote the uniform measure on the convex body $K$. We define the centroid of $K$ as

\[\mathrm{cg}(K)=\int_{K}g(x)\cdot x~\mathrm{d}x.\]

Equivalently, we can write $\mathrm{cg}(K)$ as

\[\mathrm{cg}(K)=\frac{1}{\mathrm{vol}(K)}\int_{K}x~\mathrm{d}x.\]

Then, we define the covariance of a convex body.

Definition 3.7 (Covariance of convex body, $\mathrm{Cov}(K)$).

Let $K\subseteq\mathbb{R}^{n}$ be a convex body. We define the covariance matrix of $K$ under the uniform measure as

\[\mathrm{Cov}(K)=\frac{1}{\mathrm{vol}(K)}\int_{K}(x-\mathrm{cg}(K))(x-\mathrm{cg}(K))^{\top}~\mathrm{d}x.\]
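For intuition, $\mathrm{cg}(K)$ and $\mathrm{Cov}(K)$ can be estimated for a small explicit polytope by rejection sampling from a bounding box; the following crude sketch (in Python, not the sampling machinery used in [6, 32]) does this for the standard simplex in $\mathbb{R}^{2}$.

import numpy as np

rng = np.random.default_rng(2)

# K = {x in R^2 : x >= 0, x_1 + x_2 <= 1}, the standard simplex
def in_K(x):
    return np.all(x >= 0) and x.sum() <= 1

samples = np.array([x for x in rng.uniform(0, 1, size=(200_000, 2)) if in_K(x)])
print(samples.mean(axis=0))            # approximates cg(K) = (1/3, 1/3)
print(np.cov(samples.T, bias=True))    # approximates Cov(K) = [[1/18, -1/36], [-1/36, 1/18]]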

It is well-known that any isotropic convex body (a convex body is isotropic if the uniform distribution over the body has zero mean and covariance matrix equal to the identity) is enclosed between two balls.

Lemma 3.8 (Ellipsoidal approximation of convex body, [38]).

Let $K$ be an isotropic convex body in $\mathbb{R}^{n}$. Then,

\[\sqrt{\frac{n+1}{n}}\cdot B_{2}\subseteq K\subseteq\sqrt{n(n+1)}\cdot B_{2},\]

where $B_{2}$ is the unit Euclidean ball in $\mathbb{R}^{n}$.

If $K$ is non-isotropic, then

\[\sqrt{\frac{n+1}{n}}\cdot E({\rm cg}(K),{\rm Cov}(K)^{-1})\subseteq K\subseteq\sqrt{n(n+1)}\cdot E({\rm cg}(K),{\rm Cov}(K)^{-1}).\]

3.4 Lattice Projection

We collect some standard facts about lattice projection that are directly implied by the Gram-Schmidt process. We use $\Pi_{W}(\cdot)$ to denote the orthogonal projection onto the subspace $W$.

Fact 3.9 (Lattice projection).

Let $\Lambda$ be a full rank lattice in $\mathbb{R}^{n}$ and $W$ be a linear subspace such that ${\rm dim}({\rm span}(\Lambda\cap W))={\rm dim}(W)$. Then

\[\det(\Lambda)=\det(\Lambda\cap W)\cdot\det(\Pi_{W^{\perp}}(\Lambda)).\]
Fact 3.10 (Dual of lattice projection).

Let $\Lambda$ be a full rank lattice in $\mathbb{R}^{n}$ and $W$ be a linear subspace such that ${\rm dim}({\rm span}(\Lambda\cap W))={\rm dim}(W)$. Then

\[(\Pi_{W}(\Lambda))^{*}=\Lambda^{*}\cap W.\]

3.5 Slicing Lemma

We present a variant of Lemma 3.2 in [31] in our Lemma 4.17. In particular, we replace the norm with respect to $\mathrm{Cov}(K)$ by the norm with respect to $H_{K}^{-1}$. Before stating our new lemma (Lemma 4.17), we first recall the high-dimensional slicing lemma of [31].

Lemma 3.11 (Lemma 3.2 in [31]).

Let $K$ be a convex body and $L$ be a full-rank lattice in $\mathbb{R}^{n}$. Let $W$ be an $(n-k)$-dimensional subspace of $\mathbb{R}^{n}$ such that ${\rm dim}(L\cap W)=n-k$. Then

\[\frac{{\rm vol}(K\cap W)}{\det(L\cap W)}\leq\frac{{\rm vol}(K)}{\det(L)}\cdot\frac{k^{O(k)}}{\lambda_{1}(L^{*},{\rm Cov}(K))^{k}},\]

where $L^{*}$ is the dual lattice (see Definition 3.1) and $\lambda_{1}(L^{*},{\rm Cov}(K))$ is the norm of the shortest nonzero vector in $L^{*}$ under the norm $\|\cdot\|_{{\rm Cov}(K)}$.

3.6 Dimension Reduction Lemma

The key building block of both [31] and our algorithm is a lattice-dimension reduction step. The following result from [31] shows that all integral points are preserved after a dimension reduction step.

Lemma 3.12 (Lemma 3.1 of [31]).

Let $W=x_{0}+W_{0}$ be an affine subspace, where $W_{0}$ is a subspace of $\mathbb{R}^{n}$ and $x_{0}\in\mathbb{R}^{n}$ is some fixed point, and let $E=E(x_{0},A)$ be an ellipsoid that has full rank on $W$. Given a vector $v\in\Pi_{W_{0}}(\mathbb{Z}^{n})\setminus\{{\bf 0}_{n}\}$ with $\|v\|_{A^{-1}}\leq 1/2$, there exists a hyperplane $P\not\supseteq W$ such that $E\cap\mathbb{Z}^{n}\subseteq P\cap W$.

4 Cutting Plane Method

In Section 4.1, we introduce the definition of log-barrier and volumetric center. In Section 4.2, we introduce leverage score and related notations. In Section 4.3, we present the convergence lemma. In Section 4.4, we present the sandwiching lemma. In Section 4.5, we show the closeness between Hessian of log-barrier and covariance matrix. In Section 4.6, we show the stability of approximate center. In Section 4.7, we show the closeness between approximate center and true center. In Section 4.8, we present a novel slicing lemma. In Section 4.9, we present the dynamic leverage score maintenance data structure. In Section 4.10, we present our main lemma.

4.1 Log-barrier and Volumetric Center

We start by defining the log-barrier, which has been widely used in convex optimization [54, 50, 33, 26, 35, 45, 63, 25].

Definition 4.1 (Log-barrier).

Let $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^{m}$. Let $a_{i}^{\top}$ denote the $i$-th row of $A$. The log-barrier is defined as

\[\phi(x)=\sum_{i=1}^{m}-\ln(a_{i}^{\top}x-b_{i})\]

for $x\in\mathbb{R}^{n}$.

Let $K$ be the bounded full-dimensional polytope $\{x:Ax\geq b\}$, where $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^{m}$ and $x\in\mathbb{R}^{n}$.

Definition 4.2 (Hessian and volumetric).

Given $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^{m}$, let $H(x)$ be defined as

\[H(x)=\sum_{i=1}^{m}\frac{a_{i}a_{i}^{\top}}{(a_{i}^{\top}x-b_{i})^{2}},\]

where $a_{i}^{\top}$ denotes the $i$-th row of $A$.

Let the (volumetric) function $F(x)$ be defined as

\[F(x):=\frac{1}{2}\ln(\det(H(x))).\]

$H(x)$ is the Hessian of the logarithmic barrier function $\sum_{i=1}^{m}-\ln(a_{i}^{\top}x-b_{i})$ and is positive definite for all $x$ in the interior of $K$.

Definition 4.3 (Volumetric center).

We define the volumetric center of $K$ as

\[\mathrm{vc}(K):=\arg\min_{x\in K}~F(x).\]

Observe that $F$ is a convex function, hence one can run a Newton-type algorithm to approximate $\mathrm{vc}(K)$ very fast.

4.2 Leverage Score of Log Hessian

We start with some definitions.

Definition 4.4.

We define $\sigma_{i}(x)$ to be the $i$-th leverage score of the matrix $H(x)$, i.e.,

\[\sigma_{i}(x)=\frac{a_{i}^{\top}H(x)^{-1}a_{i}}{(a_{i}^{\top}x-b_{i})^{2}},\quad\forall i\in[m].\]

Using $\sigma_{i}$, we can write the gradient of $F$ in a convenient form:

\[\nabla F(x)=-\sum_{i=1}^{m}\sigma_{i}(x)\frac{a_{i}}{a_{i}^{\top}x-b_{i}}.\]
Definition 4.5.

We define $Q(x)$ as

\[Q(x)=\sum_{i=1}^{m}\sigma_{i}(x)\frac{a_{i}a_{i}^{\top}}{(a_{i}^{\top}x-b_{i})^{2}},\]

where $\sigma_{i}(x)$ is defined in Definition 4.4.
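As a numerical illustration (a Python sketch on hypothetical random data, not the maintenance machinery of [34]), the quantities of Definitions 4.2-4.5 can be combined into a damped Newton iteration that approximates $\mathrm{vc}(K)$, using $Q(x)$ as a surrogate for $\nabla^{2}F(x)$:

import numpy as np

rng = np.random.default_rng(3)
m, n = 30, 4
A = rng.standard_normal((m, n))
b = -rng.uniform(1.0, 2.0, size=m)            # x = 0 is an interior point of K = {x : Ax >= b}

def slacks(x):
    return A @ x - b

def H(x):                                     # Hessian of the log-barrier (Definition 4.2)
    s = slacks(x)
    return (A / s[:, None]).T @ (A / s[:, None])

def F(x):                                     # volumetric function (Definition 4.2)
    return 0.5 * np.linalg.slogdet(H(x))[1]

def sigma(x):                                 # leverage scores (Definition 4.4)
    s = slacks(x)
    Hinv = np.linalg.inv(H(x))
    return np.einsum('ij,jk,ik->i', A, Hinv, A) / s ** 2

def grad_F(x):                                # gradient formula above
    return -(A.T @ (sigma(x) / slacks(x)))

def Q(x):                                     # surrogate Hessian (Definition 4.5)
    s = slacks(x)
    return (A * (sigma(x) / s ** 2)[:, None]).T @ A

x = np.zeros(n)
for _ in range(100):                          # damped Newton steps toward vc(K)
    dx = -np.linalg.solve(Q(x), grad_F(x))
    t = 1.0
    for _ in range(50):                       # backtrack until feasible and non-increasing
        if np.all(slacks(x + t * dx) > 0) and F(x + t * dx) <= F(x):
            break
        t *= 0.5
    x = x + t * dx

print(F(x), np.linalg.norm(grad_F(x)))        # gradient norm should be near 0 at vc(K)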

It is well-known that $Q(x)$ is a good approximation of $\nabla^{2}F(x)$:

Lemma 4.6 (Lemma 3 of [67]).

For any $x\in\mathbb{R}^{n}$, we have

\[Q(x)\preceq\nabla^{2}F(x)\preceq 5Q(x).\]

Finally, we define $\mu(x)$, which quantifies how well $Q(x)$ spectrally dominates $H(x)$:

Definition 4.7.

Let $\mu(x)$ be the largest number $\lambda$ such that

\[Q(x)\succeq\lambda H(x).\]

The following lemma provides upper and lower bounds on $\mu(x)$:

Lemma 4.8 (Lemma 4 of [67]).

For any $x\in K$, we have

\[\frac{1}{4m}\leq\mu(x)\leq 1.\]

Further, we have

\[\mu(x)\geq\min_{i\in[m]}\{\sigma_{i}(x)\}.\]

4.3 Volume Shrinking

The following result (Lemma 4.9) bounds the progress of adding or deleting a plane of Vaidya’s CPM.

Lemma 4.9.

Let $\delta\leq 10^{-4}$ and $\epsilon\leq 10^{-3}\delta$ be some constants and let $\rho^{k}$ denote the value of $F({\rm vc}(K))$ at the beginning of the $k$-th iteration. Then at the beginning of each iteration there exists a $z$ satisfying the condition

\[F(z)-F({\rm vc}(K))\leq\epsilon^{4}\mu({\rm vc}(K)).\]

Furthermore, the following statements hold:

  • If $\min_{i\in[m]}\{\sigma_{i}(z)\}\geq\epsilon$ at the $k$-th iteration, then

    \[\rho^{k+1}-\rho^{k}\geq\frac{(\delta\epsilon)^{1/2}}{5}.\]

  • Otherwise, we have

    \[\rho^{k}-\rho^{k+1}\leq 5\epsilon.\]

Next, in Lemma 4.10, we show that after $T=O(n\log m)$ iterations, the volume of the resulting convex body is only a $(\frac{1}{m})^{n}$ fraction of that of the original convex body.

Lemma 4.10.

Let $K\subseteq\mathbb{R}^{n}$ be a convex body with non-empty interior. Suppose we run CuttingPlaneMethod for $T=O(n\log m)$ iterations; then we obtain a convex body $K^{\prime}$ such that

\[{\rm vol}(K^{\prime})\leq\left(\frac{1}{m}\right)^{n}\cdot{\rm vol}(K).\]
Proof.

Let $\pi^{k}$ denote the volume of the polytope $K$ at the beginning of the $k$-th iteration. Note that ${\rm vol}(K^{\prime})=\pi^{T}$. Using Lemma 4.9, we shall obtain an upper bound on $\pi^{k}$ and show that after $O(n\log m)$ iterations the volume decreases by a factor of $(\frac{1}{m})^{n}$. First, we show that

\[\rho^{k}\geq\rho^{0}+\frac{1}{2}k\epsilon.\]

Since $K$ is bounded, the number of bounding planes is at least $n+1$, and to start with this number is exactly $n+1$. Thus by the $k$-th iteration the case of adding a plane must have occurred at least as often as the case of deleting a plane, since otherwise the number of planes would have fallen below $n+1$. So by the $k$-th iteration adding a plane must happen at least $k/2$ times and removing a plane must happen at most $k/2$ times. Hence

\[\rho^{k}-\rho^{0}\geq\frac{1}{2}\left(\frac{1}{5}k(\delta\epsilon)^{1/2}-5k\epsilon\right)\geq\frac{1}{2}k\epsilon,\]

where the last step follows from $\epsilon\leq 10^{-3}\delta$.

Setting $k=T$, we have

\[\rho^{T}-\rho^{0}\geq\frac{1}{2}T\epsilon.\tag{1}\]

We note that by Lemma 4.11, the polytope $K$ contains $E({\rm vc}(K),H_{K})$, so $\pi^{0}$ can be lower bounded by the volume of $E({\rm vc}(K),H_{K})$. Therefore,

\[\ln(\pi^{0})\geq-\frac{1}{2}\ln(\det(H_{K}))-n\log n=-\rho^{0}-n\log n.\tag{2}\]

To obtain an upper bound on $\pi^{T}$, we note that if $x^{*}$ is the point that minimizes the logarithmic barrier (Definition 4.1) over $K^{\prime}$, then

\[K^{\prime}\subseteq\{x:(x-x^{*})^{\top}H(x^{*})(x-x^{*})\leq m^{2}\}.\]

Then from the relationship between determinants and volume it follows that

\[{\rm vol}(K^{\prime})\leq(2m)^{n}(\det(H(x^{*})))^{-1/2}\leq(2m)^{n}(\det(H({\rm vc}(K^{\prime}))))^{-1/2}\leq(2m)^{n}\exp(-F({\rm vc}(K^{\prime}))),\tag{3}\]

where the last step follows from the definitions of $H$ and $F$ (see Definition 4.2).

Since $\sum_{i=1}^{m}\sigma_{i}(x)=n$ (by Fact 3.3), the case of deleting a plane is forced to happen at an iteration if the number of planes defining $K$ is greater than $n/\epsilon$, and hence $m$ never exceeds $n/\epsilon$.

Let us bound the difference between $\ln(\pi^{T})$ and $\ln(\pi^{0})$:

\begin{align*}
\ln(\pi^{T})-\ln(\pi^{0})\leq&~n\log(2m)-\rho^{T}-\ln(\pi^{0})\\
\leq&~n\log(2m)-\rho^{0}-\frac{1}{2}T\epsilon-\ln(\pi^{0})\\
\leq&~n\log(2m)-\frac{1}{2}T\epsilon+n\log n\\
\leq&~-n\log m,
\end{align*}

where the first step is by Eq. (3), the second step is by Eq. (1), the third step is by Eq. (2), and the last step is by $T=O(n\log m)$.

Exponentiating both sides, we obtain

\[{\rm vol}(K^{\prime})\leq\left(\frac{1}{m}\right)^{n}\cdot{\rm vol}(K).\]

Thus, we complete the proof. ∎

4.4 Sandwiching Lemma

We derive some sandwiching conditions regarding the ellipsoid induced by the Hessian of log barrier.

Lemma 4.11 (Sandwich convex body via log Hessian).

Let $K=\{x:Ax\geq b\}$ be a polytope where $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^{m}$. Let $H(x)$ be defined as in Definition 4.2. Then for any $x\in K$, we have that

\[E(x,H(x))\subseteq K,\]

and the following upper bound:

\[K\subseteq 2m^{1.5}n\cdot E({\rm vc}(K),H({\rm vc}(K))).\]
Proof.

Let us first consider the ellipsoid contained in $K$. By definition, we have that

\[E(x,H(x))=\{y:(y-x)^{\top}H(x)(y-x)\leq 1\}.\]

Without loss of generality assume $x=0$; then $y\in E(x,H(x))$ means

\[\sum_{i=1}^{m}\frac{(a_{i}^{\top}y)^{2}}{b_{i}^{2}}\leq 1,\]

which means that for any $i\in[m]$, it holds that

\[(a_{i}^{\top}y)^{2}\leq b_{i}^{2}.\]

Since $x=0\in K$, we must have that $b_{i}\leq 0$ for all $i\in[m]$. Hence, the squared condition only requires that $|a_{i}^{\top}y|\leq|b_{i}|$, so if $a_{i}^{\top}y\geq 0$, then clearly $y\in K$. Otherwise, due to the absolute value constraint, it must be the case that $a_{i}^{\top}y\geq b_{i}$. This concludes the proof of $E(x,H(x))\subseteq K$.

For the ellipsoid that contains $K$, we note that the volumetric function $F$ scaled by $\sqrt{m}$ is also a self-concordant barrier with complexity parameter $\sqrt{m}n$ [67, 66, 3], and scaling does not change the minimizer; therefore we have

\[K\subseteq mn\cdot E({\rm vc}(K),\nabla^{2}F(\operatorname{vc}(K))).\]

Recall that $\nabla^{2}F(\operatorname{vc}(K))\succeq\mu(\operatorname{vc}(K))\cdot H(\operatorname{vc}(K))$ (due to Lemma 4.6 and Definition 4.7) and $\frac{1}{4m}\leq\mu(\operatorname{vc}(K))\leq 1$ (see Lemma 4.8); we conclude that

\[\frac{1}{4m}\cdot H(\operatorname{vc}(K))\preceq\nabla^{2}F(\operatorname{vc}(K)),\]

thus $E(\operatorname{vc}(K),\nabla^{2}F(\operatorname{vc}(K)))\subseteq 2\sqrt{m}\cdot E(\operatorname{vc}(K),H(\operatorname{vc}(K)))$ and we conclude the desired result. ∎
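As a quick numerical sanity check of the first containment $E(x,H(x))\subseteq K$ (a Python sketch on hypothetical random data):

import numpy as np

rng = np.random.default_rng(0)
m, n = 12, 3
A = rng.standard_normal((m, n))
x0 = np.zeros(n)
b = -rng.uniform(0.5, 2.0, size=m)            # A x0 - b > 0, so x0 is interior to K = {Ax >= b}

s = A @ x0 - b
H = (A / s[:, None]).T @ (A / s[:, None])     # Hessian of the log-barrier at x0

# sample points on the boundary of E(x0, H) and verify that they satisfy A y >= b
L = np.linalg.cholesky(np.linalg.inv(H))      # L L^T = H^{-1}
for _ in range(1000):
    u = rng.standard_normal(n)
    y = x0 + L @ (u / np.linalg.norm(u))      # (y - x0)^T H (y - x0) = 1
    assert np.all(A @ y >= b - 1e-9)
print("all sampled boundary points of E(x0, H(x0)) lie in K")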

4.5 Closeness of Log Hessian and Covariance

Note that Lemma 3.8 and Lemma 4.11 together imply a spectral approximation between the matrix $H({\rm vc}(K))$ (see Definition 4.2) and ${\rm Cov}(K)$ (see Definition 3.7).

Lemma 4.12 (Closeness of log Hessian and covariance).

Let $K=\{x:Ax\geq b\}$ be a polytope with $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^{m}$. Then

\[\frac{1}{4m^{3}n^{2}}\cdot H({\rm vc}(K))\preceq{\rm Cov}(K)^{-1}\preceq 4n^{2}\cdot H({\rm vc}(K)).\]
Proof.

The proof relies on two sandwiching lemmas.

By Lemma 3.8, we have that

E(cg(K),Cov(K)1)K2nE(cg(K),Cov(K)1),\displaystyle E({\rm cg}(K),{\rm Cov}(K)^{-1})\subseteq K\subseteq 2n\cdot E({\rm cg}(K),{\rm Cov}(K)^{-1}),

by Lemma 4.11,

E(vc(K),H(vc(K)))K2m1.5nE(vc(K),H(vc(K))).\displaystyle E({\rm vc}(K),H({\rm vc}(K)))\subseteq K\subseteq 2m^{1.5}n\cdot E({\rm vc}(K),H({\rm vc}(K))).

We thus have

E(cg(K),Cov(K)1)\displaystyle E({\rm cg}(K),{\rm Cov}(K)^{-1})\subseteq 2m1.5nE(vc(K),H(vc(K))),\displaystyle~{}2m^{1.5}n\cdot E(\operatorname{vc}(K),H(\operatorname{vc}(K))),
E(vc(K),H(vc(K)))\displaystyle E(\operatorname{vc}(K),H(\operatorname{vc}(K)))\subseteq 2nE(cg(K),Cov(K)1).\displaystyle~{}2n\cdot E({\rm cg}(K),{\rm Cov}(K)^{-1}).

We prove one direction of the containment; the other direction is symmetric. Given

E(cg(K),Cov(K)1)\displaystyle E({\rm cg}(K),{\rm Cov}(K)^{-1})\subseteq 2m1.5nE(vc(K),H(vc(K))),\displaystyle~{}2m^{1.5}n\cdot E({\rm vc}(K),H(\operatorname{vc}(K))),

we note that re-centering the two ellipsoids so that they share a common center does not change the containment relation; therefore, we have the following:

E(Cov(K)1)\displaystyle E({\rm Cov}(K)^{-1})\subseteq 2m1.5nE(H(vc(K))),\displaystyle~{}2m^{1.5}n\cdot E(H(\operatorname{vc}(K))),

this directly implies the Loewner ordering

14m3n2H(vc(K))\displaystyle\frac{1}{4m^{3}n^{2}}\cdot H(\operatorname{vc}(K))\preceq Cov(K)1\displaystyle~{}{\rm Cov}(K)^{-1}

Following a similar approach, we can show that

Cov(K)1\displaystyle{\rm Cov}(K)^{-1}\preceq 4n2H(vc(K))\displaystyle~{}4n^{2}\cdot H(\operatorname{vc}(K))

this completes the proof. ∎
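
As a quick sanity check (not needed for the proof), consider the hypercube K=[1,1]nK=[-1,1]^{n} written with m=2nm=2n constraints. By symmetry its volumetric center is the origin, the log-barrier Hessian there is 2In2I_{n}, and the covariance of the uniform distribution on KK is 13In\frac{1}{3}I_{n}, so Cov(K)1=3In{\rm Cov}(K)^{-1}=3I_{n}; the ordering 14m3n22In3In4n22In\frac{1}{4m^{3}n^{2}}\cdot 2I_{n}\preceq 3I_{n}\preceq 4n^{2}\cdot 2I_{n} indeed holds, with plenty of slack since the factors in Lemma 4.12 are worst-case.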

4.6 Stability of Approximate Center

In this section, we prove another useful lemma concerning the Hessian matrix of the log barrier. It compares the Hessian evaluated at the volumetric center with the Hessian evaluated at an approximate center.

Before proceeding, we define some notation.

Definition 4.13.

Let K={x:Axb}K=\{x:Ax\geq b\} be a polytope with mm constraints of dimension nn. Let r(0,1)r\in(0,1). Define Σ(x,r)\Sigma(x,r) to be the region

Σ(x,r)=\displaystyle\Sigma(x,r)= {y:i[m],1raiybiaixbi1+r}.\displaystyle~{}\Big{\{}y:\forall i\in[m],1-r\leq\frac{a_{i}^{\top}y-b_{i}}{a_{i}^{\top}x-b_{i}}\leq 1+r\Big{\}}.

Recall that the CPM of [67] maintains an approximate volumetric center that is updated via Newton’s method. Newton’s method guarantees that the value of FF at the approximate center is not too far from its value at the volumetric center. The next lemma shows that such an approximate center also lies in the region Σ(vc(K),r)\Sigma(\operatorname{vc}(K),r) for an appropriate rr.

Lemma 4.14 (Lemma 10 of [67]).

Let FF be defined as in Definition 4.2. Let QQ be defined as in Definition 4.5. Let ϵ104\epsilon\leq 10^{-4} be a small constant and let zz be a point in KK such that F(z)F(vc(K))ϵμ(vc(K))F(z)-F(\operatorname{vc}(K))\leq\epsilon\sqrt{\mu(\operatorname{vc}(K))}. Then we have

  • zΣ(vc(K),5ϵ1/2)z\in\Sigma(\operatorname{vc}(K),5\epsilon^{1/2}).

  • μ(vc(K))1.5μ(z)\mu(\operatorname{vc}(K))\leq 1.5\mu(z).

  • 0.1F(z)Q(z)1F(z)F(z)F(vc(K))2F(z)Q(z)1F(z)0.1\nabla F(z)^{\top}Q(z)^{-1}\nabla F(z)\leq F(z)-F(\operatorname{vc}(K))\leq 2\nabla F(z)^{\top}Q(z)^{-1}\nabla F(z).

In the following lemma, we show the stability of the approximate center.

Lemma 4.15 (Stability of approximate center).

Let KK be a convex polytope with vc(K)\operatorname{vc}(K) being its volumetric center. Let HH and FF be defined as in Definition 4.2. Let ϵ(0,104)\epsilon\in(0,10^{-4}) be a small constant. Let zKz\in K be a point such that F(z)F(vc(K))ϵμ(vc(K))F(z)-F(\operatorname{vc}(K))\leq\epsilon\sqrt{\mu(\operatorname{vc}(K))}, then we have

(130ϵ)H(vc(K))H(z)(1+30ϵ)H(vc(K)).\displaystyle(1-30\epsilon)\cdot H(\operatorname{vc}(K))\preceq H(z)\preceq(1+30\epsilon)\cdot H(\operatorname{vc}(K)).
Proof.

By Lemma 4.14, we know that zΣ(vc(K),5ϵ1/2)z\in\Sigma(\operatorname{vc}(K),5\epsilon^{1/2}), which means that for any i[m]i\in[m], we have that

aizbiaivc(K)bi[15ϵ1/2,1+5ϵ1/2].\displaystyle\frac{a_{i}^{\top}z-b_{i}}{a_{i}^{\top}\operatorname{vc}(K)-b_{i}}\in[1-5\epsilon^{1/2},1+5\epsilon^{1/2}].

Recall that the matrix HH (Definition 4.2) is defined as

H(x)=\displaystyle H(x)= i=1maiai(aixbi)2,\displaystyle~{}\sum_{i=1}^{m}\frac{a_{i}a_{i}^{\top}}{(a_{i}^{\top}x-b_{i})^{2}},

which means that for different arguments, the only part that differs is the coefficients (aixbi)2(a_{i}^{\top}x-b_{i})^{2}. If we can approximate these coefficients well, then we can show that H(z)H(z) and H(vc(K))H(\operatorname{vc}(K)) are spectrally close. We prove the upper bound; the lower bound is similar:

H(z)=\displaystyle H(z)= i=1maiai(aizbi)2\displaystyle~{}\sum_{i=1}^{m}\frac{a_{i}a_{i}^{\top}}{(a_{i}^{\top}z-b_{i})^{2}}
\displaystyle\preceq i=1maiai(15ϵ1/2)2(aivc(K)bi)2\displaystyle~{}\sum_{i=1}^{m}\frac{a_{i}a_{i}^{\top}}{(1-5\epsilon^{1/2})^{2}(a_{i}^{\top}\operatorname{vc}(K)-b_{i})^{2}}
\displaystyle\preceq 1110ϵi=1maiai(aivc(K)bi)2\displaystyle~{}\frac{1}{1-10\epsilon}\sum_{i=1}^{m}\frac{a_{i}a_{i}^{\top}}{(a_{i}^{\top}\operatorname{vc}(K)-b_{i})^{2}}
=\displaystyle= 1110ϵH(vc(K))\displaystyle~{}\frac{1}{1-10\epsilon}H(\operatorname{vc}(K))
\displaystyle\preceq (1+30ϵ)H(vc(K)),\displaystyle~{}(1+30\epsilon)H(\operatorname{vc}(K)),

where the first step follows from the definition of HH, the second step follows from (aizbi)2(15ϵ1/2)2(aivc(K)bi)2(a_{i}^{\top}z-b_{i})^{2}\geq(1-5\epsilon^{1/2})^{2}(a_{i}^{\top}\operatorname{vc}(K)-b_{i})^{2}, the third step follows from (15ϵ1/2)2110ϵ(1-5\epsilon^{1/2})^{2}\geq 1-10\epsilon, the fourth step follows from the definition of HH, and the last step follows from (110ϵ)(1+30ϵ)1(1-10\epsilon)(1+30\epsilon)\geq 1 when ϵ(0,0.01)\epsilon\in(0,0.01). This completes our proof. ∎
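
The slack-ratio argument above is easy to check numerically. The following Python sketch (illustrative only; the random instance is ours, and zz is taken to be a nearby feasible point rather than an output of Newton’s method) verifies that the eigenvalues of H(x)1/2H(z)H(x)1/2H(x)^{-1/2}H(z)H(x)^{-1/2} lie between the extreme inverse squared slack ratios.

import numpy as np

# If every slack of z is within a multiplicative factor r_i of the corresponding
# slack of x, then H(z) is sandwiched between (1/max r^2) H(x) and (1/min r^2) H(x).
rng = np.random.default_rng(1)
n, m = 6, 30
A = rng.standard_normal((m, n))
b = -rng.uniform(1.0, 3.0, size=m)            # 0 is strictly feasible for {y : Ay >= b}

def log_barrier_hessian(A, b, x):
    s = A @ x - b                              # slacks a_i^T x - b_i
    return (A / s[:, None]).T @ (A / s[:, None])

x = np.zeros(n)
z = x + 1e-3 * rng.standard_normal(n)          # a nearby feasible point
ratios = (A @ z - b) / (A @ x - b)

Hx, Hz = log_barrier_hessian(A, b, x), log_barrier_hessian(A, b, z)
L = np.linalg.cholesky(Hx)
M = np.linalg.solve(L, np.linalg.solve(L, Hz).T)   # equals L^{-1} Hz L^{-T}, similar to Hx^{-1} Hz
eigs = np.linalg.eigvalsh((M + M.T) / 2)
assert eigs.min() >= 1 / ratios.max() ** 2 - 1e-9
assert eigs.max() <= 1 / ratios.min() ** 2 + 1e-9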

4.7 Closeness of Approximate and True Center

The goal of this section is to prove Lemma 4.16, which states that, under the norm induced by H(vc(K))H(\operatorname{vc}(K)), the approximate center and the volumetric center are at most ϵm\sqrt{\epsilon m} apart. We will later show that this discrepancy is in fact acceptable for our algorithm to make progress.

Lemma 4.16 (Closeness of approximate and true center in terms of HH norm).

Let KK be a convex polytope with vc(K)\operatorname{vc}(K) being its volumetric center and let ϵ(0,104)\epsilon\in(0,10^{-4}). Let HH and FF be defined as in Definition 4.2. Let zKz\in K be a point such that F(z)F(vc(K))ϵ5μ(vc(K))F(z)-F(\operatorname{vc}(K))\leq\frac{\epsilon}{5}\sqrt{\mu(\operatorname{vc}(K))}, then we have

zvc(K)H(vc(K))\displaystyle\|z-\operatorname{vc}(K)\|_{H(\operatorname{vc}(K))}\leq ϵm.\displaystyle~{}\sqrt{\epsilon m}.
Proof.

Throughout the proof, we set ϵ\epsilon to ϵ/5\epsilon/5.

By Lemma 4.14, we have that zΣ(vc(K),5ϵ1/2)z\in\Sigma(\operatorname{vc}(K),5\epsilon^{1/2}), which means

(1ϵ1/2)(aivc(K)bi)aizbi(1+ϵ1/2)(aivc(K)bi).\displaystyle(1-\epsilon^{1/2})\cdot(a_{i}^{\top}\operatorname{vc}(K)-b_{i})\leq a_{i}^{\top}z-b_{i}\leq(1+\epsilon^{1/2})\cdot(a_{i}^{\top}\operatorname{vc}(K)-b_{i}).

This means that if aizaivc(K)a_{i}^{\top}z\geq a_{i}^{\top}\operatorname{vc}(K),

aizaivc(K)=\displaystyle a_{i}^{\top}z-a_{i}^{\top}\operatorname{vc}(K)= (aizbi)(aivc(K)bi)\displaystyle~{}(a_{i}^{\top}z-b_{i})-(a_{i}^{\top}\operatorname{vc}(K)-b_{i})
\displaystyle\leq ϵ1/2(aivc(K)bi).\displaystyle~{}\epsilon^{1/2}(a_{i}^{\top}\operatorname{vc}(K)-b_{i}).

On the other hand if aiz<aivc(K)a_{i}^{\top}z<a_{i}^{\top}\operatorname{vc}(K),

aivc(K)aiz=\displaystyle a_{i}^{\top}\operatorname{vc}(K)-a_{i}^{\top}z= (aivc(K)bi)(aizbi)\displaystyle~{}(a_{i}^{\top}\operatorname{vc}(K)-b_{i})-(a_{i}^{\top}z-b_{i})
\displaystyle\leq (aivc(K)bi)(1ϵ1/2)(aivc(K)bi)\displaystyle~{}(a_{i}^{\top}\operatorname{vc}(K)-b_{i})-(1-\epsilon^{1/2})(a_{i}^{\top}\operatorname{vc}(K)-b_{i})
=\displaystyle= ϵ1/2(aivc(K)bi).\displaystyle~{}\epsilon^{1/2}(a_{i}^{\top}\operatorname{vc}(K)-b_{i}).

We thus have shown that |ai(zvc(K))|ϵ1/2(aivc(K)bi)|a_{i}^{\top}(z-\operatorname{vc}(K))|\leq\epsilon^{1/2}(a_{i}^{\top}\operatorname{vc}(K)-b_{i}). We proceed to measure the squared-H(vc(K))H(\operatorname{vc}(K)) norm of zvc(K)z-\operatorname{vc}(K):

zvc(K)H(vc(K))2=\displaystyle\|z-\operatorname{vc}(K)\|_{H(\operatorname{vc}(K))}^{2}= i=1m(aizaivc(K))2(aivc(K)bi)2\displaystyle~{}\sum_{i=1}^{m}\frac{(a_{i}^{\top}z-a_{i}^{\top}\operatorname{vc}(K))^{2}}{(a_{i}^{\top}\operatorname{vc}(K)-b_{i})^{2}}
\displaystyle\leq i=1mϵ(aivc(K)bi)2(aivc(K)bi)2\displaystyle~{}\sum_{i=1}^{m}\frac{\epsilon(a_{i}^{\top}\operatorname{vc}(K)-b_{i})^{2}}{(a_{i}^{\top}\operatorname{vc}(K)-b_{i})^{2}}
=\displaystyle= ϵm.\displaystyle~{}\epsilon m.

Taking square roots yields zvc(K)H(vc(K))ϵm\|z-\operatorname{vc}(K)\|_{H(\operatorname{vc}(K))}\leq\sqrt{\epsilon m}, which completes the proof of the lemma. ∎
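
For reference, the quantity controlled by Lemma 4.16 is cheap to evaluate given the constraint data; the short Python sketch below (the helper name and instance are ours) computes the H(x)H(x)-norm of a displacement directly from the slacks.

import numpy as np

def h_norm(A, b, x, z):
    # ||z - x||_{H(x)} for the log-barrier Hessian of {y : Ay >= b} at x,
    # i.e. sqrt( sum_i (a_i^T (z - x))^2 / (a_i^T x - b_i)^2 ).
    s = A @ x - b
    return float(np.sqrt(np.sum((A @ (z - x)) ** 2 / s ** 2)))

rng = np.random.default_rng(2)
A, b = rng.standard_normal((8, 3)), -np.ones(8)
print(h_norm(A, b, np.zeros(3), 1e-2 * rng.standard_normal(3)))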

4.8 High Dimensional Slicing Lemma

We present a novel high dimensional slicing lemma that uses the Hessian of the log barrier instead of the covariance matrix as in [31]. It incurs an extra (mn)O(k)(mn)^{O(k)} factor, but as we will see later, this does not affect the total number of oracle calls by much.

Lemma 4.17 (High dimensional slicing lemma, volumetric version).

Let KK be a convex body, let HKH_{K} denote the Hessian matrix of log barrier of KK at its volumetric center. Let LL be a full-rank lattice in n\mathbb{R}^{n}. Let WW be an (nk)(n-k)-dimensional subspace of n\mathbb{R}^{n} such that dim(LW)=nk\mathrm{dim}(L\cap W)=n-k. Then we have

vol(KW)det(LW)vol(K)det(L)kO(k)(mn)O(k)λ1(L,HK1)k\displaystyle\frac{\mathrm{vol}(K\cap W)}{\det(L\cap W)}\leq\frac{\mathrm{vol}(K)}{\det(L)}\cdot\frac{k^{O(k)}\cdot(mn)^{O(k)}}{\lambda_{1}(L^{*},H_{K}^{-1})^{k}}

where LL^{*} is the dual lattice and λ1(L,HK1)\lambda_{1}(L^{*},H_{K}^{-1}) is the length of the shortest nonzero vector in LL^{*} under the norm HK1\|\cdot\|_{H_{K}^{-1}}.

Proof.

Let vv denote the shortest nonzero vector in LL^{*} under the norm Cov(K)\|\cdot\|_{{\rm Cov}(K)} and let uu be the shortest nonzero vector in LL^{*} under the norm HK1\|\cdot\|_{H_{K}^{-1}}. By Lemma 4.12, we know that

14n2HK1Cov(K)4m3n2HK1.\displaystyle\frac{1}{4n^{2}}\cdot H_{K}^{-1}\preceq{\rm Cov}(K)\preceq 4m^{3}n^{2}\cdot H_{K}^{-1}.

We have

vCov(K)\displaystyle\|v\|_{{\rm Cov}(K)}\geq 12m1.5nvHK1\displaystyle~{}\frac{1}{2m^{1.5}n}\cdot\|v\|_{H_{K}^{-1}}
\displaystyle\geq 12m1.5nuHK1,\displaystyle~{}\frac{1}{2m^{1.5}n}\cdot\|u\|_{H_{K}^{-1}},

therefore, we have that

1λ1(L,Cov(K))\displaystyle\frac{1}{\lambda_{1}(L^{*},{\rm Cov}(K))}\leq 2m1.5nλ1(L,HK1).\displaystyle~{}\frac{2m^{1.5}n}{\lambda_{1}(L^{*},H_{K}^{-1})}.

Using Lemma 3.11, we have

vol(KW)det(LW)\displaystyle\frac{{\rm vol}(K\cap W)}{\det(L\cap W)}\leq vol(K)det(L)kO(k)λ1(L,Cov(K))k\displaystyle~{}\frac{{\rm vol}(K)}{\det(L)}\cdot\frac{k^{O(k)}}{\lambda_{1}(L^{*},{\rm Cov}(K))^{k}}

Finally, combining the above equations we conclude

vol(KW)det(LW)vol(K)det(L)kO(k)(mn)O(k)λ1(L,HK1)k.\displaystyle\frac{\mathrm{vol}(K\cap W)}{\det(L\cap W)}\leq\frac{\mathrm{vol}(K)}{\det(L)}\cdot\frac{k^{O(k)}\cdot(mn)^{O(k)}}{\lambda_{1}(L^{*},H_{K}^{-1})^{k}}.

Thus, we complete the proof. ∎
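
The quantity λ1(L,HK1)\lambda_{1}(L^{*},H_{K}^{-1}) is a shortest-vector length under a quadratic-form norm. The Python sketch below (a brute-force search over a tiny lattice, with a random positive definite matrix standing in for HK1H_{K}^{-1}; all names are ours) illustrates the standard reduction of such a norm to the Euclidean norm via a Cholesky factor, which is also how one would feed the norm to an LLL-type shortest vector routine.

import numpy as np
from itertools import product

rng = np.random.default_rng(3)
n = 3
B = rng.integers(-3, 4, size=(n, n)).astype(float)         # lattice basis (columns)
while abs(np.linalg.det(B)) < 0.5:                          # re-sample until full rank
    B = rng.integers(-3, 4, size=(n, n)).astype(float)

G = rng.standard_normal((n, n))
M = G @ G.T + np.eye(n)                                     # positive definite "norm" matrix
R = np.linalg.cholesky(M).T                                 # M = R^T R, so ||v||_M = ||R v||_2

best = np.inf
for coeffs in product(range(-4, 5), repeat=n):              # small exhaustive search (an estimate only)
    if any(coeffs):
        v = B @ np.array(coeffs, dtype=float)
        best = min(best, float(np.linalg.norm(R @ v)))
print("brute-force estimate of lambda_1 under the M-norm:", best)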

4.9 Faster Cutting Plane via Leverage Score Maintenance

Vaidya’s vanilla CPM algorithm requires computing all mm leverage scores per iteration, which would require O(nω)O(n^{\omega}) time to form a projection matrix. [34] shows that by carefully designing data structures for leverage score maintenance, each iteration can be improved to O(n2)O(n^{2}) amortized time. The following lemma states that, to correct the extra error induced by using such data structures, it suffices to run an extra O(ϵ1)O(\epsilon^{-1}) Newton steps. This provides the guarantee needed for [34], and we leverage it to speed up our algorithm.

Lemma 4.18 (Approximating function value via approximate leverage score, Lemma A.3 of [34]).

If the following conditions hold

  • Vaidya’s Newton step [67] uses the exact leverage scores σ\sigma and runs for T=O(nlogn)T=O(n\log n) iterations to obtain an approximate point zz such that

    F(z)F(vc(K))\displaystyle F(z)-F(\operatorname{vc}(K))\leq 0.1.\displaystyle~{}0.1.
  • The true and approximate leverage scores are close: σ~σ21/logO(1)(n)\|\widetilde{\sigma}-\sigma\|_{2}\leq 1/\log^{O(1)}(n).

Then, running Vaidya’s Newton step [67] with approximate leverage score σ~\widetilde{\sigma} for T~=T+O(1/ϵ)\widetilde{T}=T+O(1/\epsilon) iterations, we can obtain a z~\widetilde{z} such that

F(z~)F(vc(K))\displaystyle F(\widetilde{z})-F(\operatorname{vc}(K))\leq ϵ.\displaystyle~{}\epsilon.

To obtain the various guarantees, we need to find a zKz\in K with F(z)F(vc(K))cμ(vc(K))F(z)-F({\rm vc}(K))\leq c\cdot\sqrt{\mu(\operatorname{vc}(K))} for a small constant c104c\leq 10^{-4}. Note that whenever the smallest leverage score is at least ϵ\epsilon, we only need to run an extra O(ϵ1)O(\epsilon^{-1}) iterations of Newton’s method. In the other case, one can show that the old point zz is still a good starting point for Newton’s method; therefore, running an extra O(1)O(1) iterations suffices.

We state the data structure of [34] for completeness.

Lemma 4.19 (Theorem 5.1 of [34]).

Given an initial matrix Am×nA\in\mathbb{R}^{m\times n} with m=O(n)m=O(n), initial weight w0mw\in\mathbb{R}^{m}_{\geq 0}, there is a randomized data structure that approximately maintains the leverage score

σi(w)=\displaystyle\sigma_{i}(w)= (WA(AWA)1AW)i,i,i[m]\displaystyle~{}(\sqrt{W}A(A^{\top}WA)^{-1}A^{\top}\sqrt{W})_{i,i},~{}~{}~{}\forall i\in[m]

where Wm×mW\in\mathbb{R}^{m\times m} is the diagonal matrix that puts wmw\in\mathbb{R}^{m} on its diagonal. The data structure uses O(n2+o(1))O(n^{2+o(1)}) space and supports the following operations:

  • Init(Am×n,wm)(A\in\mathbb{R}^{m\times n},w\in\mathbb{R}^{m}): The data structure initializes in O(nω+o(1))O(n^{\omega+o(1)}) time.

  • Update(wm,un,wu,i[m],𝖺𝖼𝗍{ins,del,upd})(w\in\mathbb{R}^{m},u\in\mathbb{R}^{n},w_{u}\in\mathbb{R},i\in[m],\mathsf{act}\in\{\mathrm{ins},\mathrm{del},\mathrm{upd}\}): The data structure updates by

    • If 𝖺𝖼𝗍=ins\mathsf{act}=\mathrm{ins}, we insert a row uu with weight wuw_{u} into (A(k1),w(k1))(A^{(k-1)},w^{(k-1)}) such that

      wuuu\displaystyle w_{u}uu^{\top}\preceq 0.01(A(k1))W(k1)A(k1)\displaystyle~{}0.01(A^{(k-1)})^{\top}W^{(k-1)}A^{(k-1)}

      Suppose A(k1)A^{(k-1)} currently has iui_{u} rows; we append uu as the (iu+1)(i_{u}+1)-th row of A(k1)A^{(k-1)} and append wuw_{u} as the (iu+1)(i_{u}+1)-th entry of w(k1)w^{(k-1)}. In this case, we ignore the ww and ii from the input parameters.

    • If 𝖺𝖼𝗍=del\mathsf{act}=\mathrm{del}, let iv=ii_{v}=i, and let v,wvv,w_{v} denote the ivi_{v}-th row of (A(k1),w(k1))(A^{(k-1)},w^{(k-1)}). We delete the ivi_{v}-th row from (A(k1),w(k1))(A^{(k-1)},w^{(k-1)}) such that

      wvvv\displaystyle w_{v}vv^{\top}\preceq 0.01(A(k1))W(k1)A(k1)\displaystyle~{}0.01(A^{(k-1)})^{\top}W^{(k-1)}A^{(k-1)}

      In this case we ignore the w,u,wuw,u,w_{u} from input parameters of update function.

    • If 𝖺𝖼𝗍=upd\mathsf{act}=\mathrm{upd}, we update the weight vector from w(k1)w^{(k-1)} to w(k)w^{(k)} such that

      log(w(k))log(w(k1))2\displaystyle\|\log(w^{(k)})-\log(w^{(k-1)})\|_{2}\leq 0.01.\displaystyle~{}0.01.

      Here w(k)w^{(k)} denotes the ww from the input of the update function, and w(k1)w^{(k-1)} denotes the weight stored from the last iteration. In this case, we ignore the u,wu,iu,w_{u},i from the input parameters of the update function.

    This step takes amortized O(n2)O(n^{2}) time.

  • Query()(): The data structure outputs an approximate leverage score σ~m\widetilde{\sigma}\in\mathbb{R}^{m} such that

    σ~σ2\displaystyle\|\widetilde{\sigma}-\sigma\|_{2}\leq O(1/logc(n)),\displaystyle~{}O(1/\log^{c}(n)),

    this step takes O(n)O(n) time. Here c>1c>1 is some fixed constant.

In our application, we invoke the data structure in the following fashion: each time the data structure is initialized, it is updated and queried for O(nlogn)O(n\log n) consecutive steps, which takes a total of O(n3logn)O(n^{3}\log n) time. We then perform such a sequence of operations O(n)O(n) times, which leads to a total of O(n4logn)O(n^{4}\log n) time.
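
For concreteness, the exact leverage scores that the data structure approximates can be computed naively as in the following Python sketch (a reference implementation only; it pays a full matrix product and solve on every call, which is precisely the cost that the amortized data structure avoids, and the helper name and sanity check are ours).

import numpy as np

def leverage_scores(A, w):
    # sigma_i(w) = ( sqrt(W) A (A^T W A)^{-1} A^T sqrt(W) )_{ii}
    Aw = A * np.sqrt(w)[:, None]                     # sqrt(W) A
    P = Aw @ np.linalg.solve(A.T @ (A * w[:, None]), Aw.T)
    return np.diag(P).copy()

rng = np.random.default_rng(4)
m, n = 12, 4
A = rng.standard_normal((m, n))
w = rng.uniform(0.5, 2.0, size=m)
sigma = leverage_scores(A, w)
assert np.isclose(sigma.sum(), n)                    # leverage scores sum to rank(A) = n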

4.10 Main Lemma

The meta lemma of this section states that if we invoke O(nlogm)O(n\log m) oracle calls to add planes for Vaidya’s CPM, we end up with a polytope whose volume is at most a (1m)n\left(\frac{1}{m}\right)^{n} fraction of the volume of the original polytope.

Lemma 4.20.

Given a separation oracle 𝖲𝖮\operatorname{\mathsf{SO}} for a convex function ff defined on n\mathbb{R}^{n}. Let vc\mathrm{vc} be defined as in Definition 4.3 and let HH be defined as in Definition 4.2. Given a polytope KnK\subseteq\mathbb{R}^{n} with mm constraints that contains a minimizer xx^{*} of ff, and an error parameter ϵ>0\epsilon>0, there exists a cutting plane method CuttingPlaneMethod(𝖲𝖮,K,T,ϵ)\textsc{CuttingPlaneMethod}(\operatorname{\mathsf{SO}},K,T,\epsilon) with T=O(nlog(m/ϵ))T=O(n\log(m/\epsilon)) that uses at most O(nlog(m/ϵ))O(n\log(m/\epsilon)) calls to 𝖲𝖮\operatorname{\mathsf{SO}} and an extra O(n3log(m/ϵ))O(n^{3}\log(m/\epsilon)) arithmetic operations to output a polytope KK^{\prime} with at most O(n/ϵ)O(n/\epsilon) constraints, an approximate volumetric center zz of KK^{\prime}, and the Hessian matrix HH of the log barrier of KK^{\prime} such that the following hold:

  • Part 1. xKx^{*}\in K^{\prime} and KK^{\prime} is the intersection of KK with TT hyperplanes outputted by 𝖲𝖮\operatorname{\mathsf{SO}}.

  • Part 2. E(vc(K),H(vc(K)))KO(m1.5n)E(vc(K),H(vc(K)))E(\mathrm{vc}(K^{\prime}),H(\mathrm{vc}(K^{\prime})))\subseteq K^{\prime}\subseteq O(m^{1.5}n)\cdot E(\mathrm{vc}(K^{\prime}),H(\mathrm{vc}(K^{\prime}))).

  • Part 3. vol(K)(1m)nvol(K)\mathrm{vol}(K^{\prime})\leq(\frac{1}{m})^{n}\cdot\mathrm{vol}(K).

  • Part 4. zvc(K)H(vc(K))ϵm\|z-\mathrm{vc}(K^{\prime})\|_{H(\mathrm{vc}(K^{\prime}))}\leq\sqrt{\epsilon m}.

  • Part 5. (1ϵ)H(vc(K))H(z)(1+ϵ)H(vc(K))(1-\epsilon)\cdot H(\mathrm{vc}(K^{\prime}))\preceq H(z)\preceq(1+\epsilon)\cdot H(\mathrm{vc}(K^{\prime})).

Proof.

We prove the lemma item by item.

For Part 1, it is implied by the original Vaidya’s algorithm as in [67].

Part 2 is due to the sandwiching lemma for polytope as in Lemma 4.11.

Part 3 is owing to Lemma 4.10.

For Part 4, we prove it in Lemma 4.16.

For Part 5, we show it in Lemma 4.15.

Regarding the runtime, we use the data structure of Lemma 4.19 for TT consecutive operations, which takes a total of O(nω+o(1)+n3logm)=O(n3logm)O(n^{\omega+o(1)}+n^{3}\log m)=O(n^{3}\log m) arithmetic operations. Note that in each iteration the query guarantees that the approximate leverage scores satisfy σ~σ2O(1/logc(n))\|\widetilde{\sigma}-\sigma\|_{2}\leq O(1/\log^{c}(n)); by Lemma 4.18, we only need to run an extra O(1)O(1) iterations of Newton’s method to obtain an approximate volumetric center zz with the desired guarantee. Thus, the overall arithmetic operation count is O(n3logm)O(n^{3}\log m). ∎

5 Efficient Minimization via Fast Cutting and Lazy Width Measurement

In Section 5.1, we present our main algorithm, Algorithm 1. In Section 5.2, we present the correctness of our algorithm. In Section 5.3, we show the oracle complexity of our algorithm. In Section 5.4, we show the overall running time of our algorithm. In Section 5.5, we summarize our main result.

5.1 Our Algorithm

Our algorithm is a mixture of the efficient cutting plane method of [34], the fast shortest vector algorithm of [51], and a novel adaptation and simplification of [32]. The algorithm maintains an affine subspace WW, a lattice Λ\Lambda and a polytope KK. The algorithm then proceeds as follows: it computes an approximate shortest vector of Λ\Lambda with respect to the HK1H_{K}^{-1} norm, where HKH_{K} is the Hessian of the log barrier function at an approximate volumetric center. If the HK1H_{K}^{-1} norm of this vector is relatively large, the algorithm runs CuttingPlaneMethod for T=O(nlogn)T=O(n\log n) rounds; otherwise, it performs a dimension reduction step.

Algorithm 1 Our Algorithm
1:procedure Main(𝖲𝖮,R\operatorname{\mathsf{SO}},R) \triangleright Theorem 1.1
2:     m2nm\leftarrow 2n
3:     WnW\leftarrow\mathbb{R}^{n} be an affine subspace
4:     KB(R)K\leftarrow B_{\infty}(R) be a polytope \triangleright KK can be parameterized by Am×nA\in\mathbb{R}^{m\times n} and bmb\in\mathbb{R}^{m}
5:     Λn\Lambda\leftarrow\operatorname{\mathbb{Z}}^{n} be a lattice
6:     xK0x_{K}\leftarrow 0 be the approximate volumetric center
7:     HKi=1maiai(aixKbi)2H_{K}\leftarrow\sum_{i=1}^{m}\frac{a_{i}a_{i}^{\top}}{(a_{i}^{\top}x_{K}-b_{i})^{2}} be the Hessian of log barrier
8:     TO(nlogm),ϵ0.01T\leftarrow O(n\log m),\epsilon\leftarrow 0.01
9:     while dim(W)>1\mathrm{dim}(W)>1 do
10:         vFasterShortestVector(Λ,HK1)v\leftarrow\textsc{FasterShortestVector}(\Lambda,H_{K}^{-1}) \triangleright Theorem 3.5
11:         if vHK12100nlogn\|v\|_{H_{K}^{-1}}\geq 2^{-100n\log n} then
12:              (K,xK,HK)CuttingPlaneMethod(𝖲𝖮,K,T,ϵ)(K^{\prime},x_{K^{\prime}},H_{K}^{\prime})\leftarrow\textsc{CuttingPlaneMethod}(\operatorname{\mathsf{SO}},K,T,\epsilon) \triangleright Lemma 4.20
13:              KK,xKxK,HKHKK\leftarrow K^{\prime},x_{K}\leftarrow x_{K^{\prime}},H_{K}\leftarrow H_{K^{\prime}}
14:         else
15:              Find znz\in\operatorname{\mathbb{Z}}^{n} such that v=ΠWxK(z)v=\Pi_{W-x_{K}}(z)
16:              P{y:vy=(vz)xK+[zxK]}P\leftarrow\{y:v^{\top}y=(v-z)^{\top}x_{K}+[z^{\top}x_{K}]\}
17:              WWPW\leftarrow W\cap P
18:              Let E(w,A)E(w,A) be the ellipsoid 3m1.5nE(xK,HK)P3m^{1.5}n\cdot E(x_{K},H_{K})\cap P
19:              Kw+A1/2B(1)K\leftarrow w+A^{-1/2}B_{\infty}(1)
20:              xKw,HK=i=1maiai(aixKbi)2x_{K}\leftarrow w,H_{K}=\sum_{i=1}^{m}\frac{a_{i}a_{i}^{\top}}{(a_{i}^{\top}x_{K}-b_{i})^{2}}
21:              Construct hyperplane P0{y:vy=0}P_{0}\leftarrow\{y:v^{\top}y=0\}
22:              ΛΠP0(Λ)\Lambda\leftarrow\Pi_{P_{0}}(\Lambda)
23:         end if
24:     end while
25:     Find integral minimizer xnKx^{*}\in\operatorname{\mathbb{Z}}^{n}\cap K
26:     return xx^{*}
27:end procedure
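
Line 16 of Algorithm 1 only needs inner products to write down the new hyperplane. A minimal Python sketch is given below (with the bracket [zxK][z^{\top}x_{K}] read as rounding to the nearest integer, which is our reading of the notation; the function name is ours).

import numpy as np

def dimension_reduction_hyperplane(v, z, x_K):
    # Hyperplane {y : v^T y = c} of line 16, where v is the (approximate) shortest
    # lattice vector and z is an integral point with v = Pi_{W - x_K}(z).
    c = float((v - z) @ x_K) + round(float(z @ x_K))
    return v, c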

5.2 Correctness of Our Algorithm

We prove the output guarantee of our algorithm (Algorithm 1) in Lemma 5.1.

Lemma 5.1 (Formal version of Theorem 1.1, output guarantee part).

Given a separation oracle 𝖲𝖮\operatorname{\mathsf{SO}} for a convex function ff defined on n\mathbb{R}^{n} and a γ\gamma-approximation algorithm ApproxShortestVector for the shortest vector problem which takes 𝒯ApproxSV{\cal T}_{\textsc{ApproxSV}} arithmetic operations. Suppose the set of minimizers KK^{*} of ff is contained in a box of radius RR and all extreme points of KK^{*} are integral. Then Algorithm 1 finds an integral minimizer of ff.

Proof.

Recall that lattice projection and the dual of lattice projections are defined as in Definition 3.9 and Definition 3.10.

We start by showing, via induction, that the lattice Λ\Lambda is the orthogonal projection of n\operatorname{\mathbb{Z}}^{n} onto the subspace W0W_{0}. At the beginning of each iteration, we have KWK\subseteq W and ΛW0\Lambda\subseteq W_{0}, where W0W_{0} is the translation of WW passing through the origin. At the beginning of the algorithm, Λ=n\Lambda=\operatorname{\mathbb{Z}}^{n} and W=nW=\mathbb{R}^{n}, so Λ=ΠW0(n)\Lambda=\Pi_{W_{0}}(\operatorname{\mathbb{Z}}^{n}). Note that the lattice Λ\Lambda and the subspace WW are only updated in the dimension reduction step of Algorithm 1. For the inductive step, let Λt1\Lambda_{t-1} denote the lattice after the (t1)th(t-1)^{\rm th} dimension reduction step, suppose Λt1=ΠW0(n)\Lambda_{t-1}=\Pi_{W_{0}}(\operatorname{\mathbb{Z}}^{n}), and let us prove the claim for tt.

Λt=\displaystyle\Lambda_{t}= ΠP0(Λt1)\displaystyle~{}\Pi_{P_{0}}(\Lambda_{t-1})
=\displaystyle= ΠP0(ΠW0(n))\displaystyle~{}\Pi_{P_{0}}(\Pi_{W_{0}}(\operatorname{\mathbb{Z}}^{n}))
=\displaystyle= ΠW0P0(ΠW0(n))\displaystyle~{}\Pi_{W_{0}\cap P_{0}}(\Pi_{W_{0}}(\operatorname{\mathbb{Z}}^{n}))
=\displaystyle= ΠW0P0(n),\displaystyle~{}\Pi_{W_{0}\cap P_{0}}(\operatorname{\mathbb{Z}}^{n}),

To see the third equality, we note that P0P_{0} is the subspace orthogonal to vv and that vW0v\in W_{0}. As initially W0=nW_{0}=\mathbb{R}^{n}, we can inductively show that at time tt, the subspace W0W_{0} is the orthogonal complement of v1,,vt1v_{1},\ldots,v_{t-1}, where viv_{i} is the (approximate) shortest vector used in iteration ii. As Λt1=ΠW0(n)\Lambda_{t-1}=\Pi_{W_{0}}(\operatorname{\mathbb{Z}}^{n}), it lies in the subspace orthogonal to span(v1,,vt1){\rm span}(v_{1},\ldots,v_{t-1}). Projecting this lattice onto P0P_{0} makes it orthogonal to vtv_{t} as well. Thus, first projecting onto W0W_{0} and then onto P0P_{0} is equivalent to first projecting onto W0W_{0} and then onto W0P0W_{0}\cap P_{0}. The last equality follows since W0P0W_{0}\cap P_{0} is a subspace of W0W_{0}. This completes the proof of the lattice property.

Now we are ready to show that Algorithm 1 indeed finds the integral minimizer. Assuming ff has a unique minimizer xnx^{*}\in\operatorname{\mathbb{Z}}^{n}, we note that CuttingPlaneMethod preserves xx^{*}, so it suffices to show that the dimension reduction step also preserves xx^{*}. We show that in fact, the dimension reduction step preserves all integral points in KK.

Lemma 4.11 gives the following sandwiching condition:

12E(vc(K),H(vc(K)))K2m1.5nE(vc(K),H(vc(K))).\displaystyle\frac{1}{2}\cdot E(\mathrm{vc}(K),H({\rm vc}(K)))\subseteq K\subseteq 2m^{1.5}n\cdot E(\mathrm{vc}(K),H({\rm vc}(K))).

Recall that we set HKH_{K} to be a (1±ϵ)(1\pm\epsilon)-spectral approximation to H(vc(K))H({\rm vc}(K)):

(1ϵ)HKH(vc(K))(1+ϵ)HK\displaystyle(1-\epsilon)\cdot H_{K}\preceq H({\rm vc}(K))\preceq(1+\epsilon)\cdot H_{K}

We know that

xKvc(K)H(vc(K))\displaystyle\|x_{K}-{\rm vc}(K)\|_{H({\rm vc}(K))}\leq ϵm,\displaystyle~{}\sqrt{\epsilon m},

this means that vc(K)2ϵ1/2m1/2E(xK,H(xK)){\rm vc}(K)\in 2\epsilon^{1/2}m^{1/2}\cdot E(x_{K},H(x_{K})). Consequently, we have

(vc(K)xK)H(xK)(vc(K)xK)\displaystyle({\rm vc}(K)-x_{K})^{\top}H(x_{K})({\rm vc}(K)-x_{K})\leq 4ϵm.\displaystyle~{}4\epsilon m. (4)

Let yKy\in K, by the sandwiching condition, we also have y2m1.5nE(vc(K),H(xK))y\in 2m^{1.5}n\cdot E({\rm vc}(K),H(x_{K})) and subsequently

(yvc(K))H(xK)(yvc(K))\displaystyle(y-{\rm vc}(K))^{\top}H(x_{K})(y-{\rm vc}(K))\leq 4m3n2.\displaystyle~{}4m^{3}n^{2}. (5)

Combining Eq. (4) and (5), we conclude

(yxK)H(xK)(yxK)\displaystyle~{}(y-x_{K})^{\top}H(x_{K})(y-x_{K})
=\displaystyle= yxKH(xK)2\displaystyle~{}\|y-x_{K}\|_{H(x_{K})}^{2}
\displaystyle\leq 2vc(K)xKH(xK)2+2yvc(K)H(xK)2\displaystyle~{}2\|\operatorname{vc}(K)-x_{K}\|_{H(x_{K})}^{2}+2\|y-\operatorname{vc}(K)\|_{H(x_{K})}^{2}
\displaystyle\leq 8m3n2+8ϵm\displaystyle~{}8m^{3}n^{2}+8\epsilon m
\displaystyle\leq 9m3n2.\displaystyle~{}9m^{3}n^{2}.

We thus have shown that

K\displaystyle K\subseteq O(m1.5n)E(xK,HK).\displaystyle~{}O(m^{1.5}n)\cdot E(x_{K},H_{K}).

Now we proceed to show that each dimension reduction iteration preserves all integral points in KK. We have

Kn3m1.5nE(xK,HK)n.\displaystyle K\cap\operatorname{\mathbb{Z}}^{n}\subseteq 3m^{1.5}n\cdot E(x_{K},H_{K})\cap\operatorname{\mathbb{Z}}^{n}.

Since vHK11/(10n)\|v\|_{H_{K}^{-1}}\leq 1/(10n) is satisfied in a dimension reduction step, we can invoke Lemma 3.12, which states that all integral points in 3m1.5nE(xK,HK)3m^{1.5}n\cdot E(x_{K},H_{K}) lie on the hyperplane given by

P={y:vy=(vz)xK+[zxK]}.\displaystyle P=\{y:v^{\top}y=(v-z)^{\top}x_{K}+[z^{\top}x_{K}]\}.

Thus, we have KnKPK\cap\operatorname{\mathbb{Z}}^{n}\subseteq K\cap P and this finishes the proof of the lemma. ∎
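
The lattice projection Λ←ΠP0(Λ)\Lambda\leftarrow\Pi_{P_{0}}(\Lambda) used above amounts to projecting a set of generators onto the hyperplane orthogonal to vv, as in the following Python sketch (the generators are kept as a redundant set for simplicity, whereas the algorithm would re-reduce them before the next shortest vector computation).

import numpy as np

rng = np.random.default_rng(5)
n = 4
basis = np.eye(n)                               # columns generate the current lattice
v = rng.standard_normal(n)
P0 = np.eye(n) - np.outer(v, v) / (v @ v)       # orthogonal projector onto {y : v^T y = 0}
generators = P0 @ basis                         # these vectors generate Pi_{P0}(Lambda)
assert np.allclose(v @ generators, 0)           # every generator is orthogonal to v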

5.3 Oracle Complexity

We prove the oracle complexity of Algorithm 1.

Lemma 5.2 (Oracle complexity part of Theorem 5.4).

Given a separation oracle 𝖲𝖮\operatorname{\mathsf{SO}} for a convex function ff on n\mathbb{R}^{n} such that the set of minimizers KK^{*} of ff is contained in a box of radius RR and all extreme points of KK^{*} are integral, there exists a randomized algorithm (Algorithm 1) that outputs an integral minimizer of ff with at most O(n2logn+nlog(γnR))O(n^{2}\log n+n\log(\gamma nR)) calls to 𝖲𝖮\operatorname{\mathsf{SO}} with high probability.

Proof.

We consider the potential function

Φ:=log(vol(K)det(Λ)).\displaystyle\Phi:=\log({\rm vol}(K)\cdot\det(\Lambda)).

In the beginning, Φ=log(vol(B)det(I))=nlogR\Phi=\log({\rm vol}(B_{\infty})\cdot\det(I))=n\log R. One invocation of CuttingPlaneMethod, which makes O(nlogm)O(n\log m) oracle calls, reduces the volume by a factor of (1m)n=(12)nlogm(\frac{1}{m})^{n}=(\frac{1}{2})^{n\log m}; consequently, the potential decreases additively by nlogmn\log m.

Without loss of generality, let us assume we have a maximal sequence of kk dimension reduction steps at t=1,2,,kt=1,2,\ldots,k.

Note that the potential at the beginning of this maximal sequence of dimension reduction iteration is

eΦ(0)=\displaystyle e^{\Phi^{(0)}}= vol(K(0))det(Λ(0))\displaystyle~{}{\rm vol}(K^{(0)})\cdot\det(\Lambda^{(0)})
=\displaystyle= vol(K(0))det((Λ(0))).\displaystyle~{}\frac{{\rm vol}(K^{(0)})}{\det((\Lambda^{(0)})^{*})}.

Between steps tt and t+1t+1, K(t+1)K^{(t+1)} is constructed in the following fashion: first compute K(t)W(t+1)K^{(t)}\cap W^{(t+1)}, then consider its outer ellipsoid E(w,A)=3m1.5nE(xK,HK)PE(w,A)=3m^{1.5}n\cdot E(x_{K},H_{K})\cap P; this blows up the volume by a factor of at most nO(n)n^{O(n)}. Finally, set KK to be the hyperrectangle w+A1/2B(1)w+A^{-1/2}B_{\infty}(1) containing E(w,A)E(w,A). This again blows up the volume by a factor of at most nO(n)n^{O(n)}.

In the beginning of t=1t=1, we have W(1)=W(0)W^{(1)}=W^{(0)} and K(1)W(1)K^{(1)}\subseteq W^{(1)}. By the preceding discussion, we have that

vol(K(2))\displaystyle{\rm vol}(K^{(2)})\leq nc1nvol(K(1)W(2))\displaystyle~{}n^{c_{1}n}\cdot{\rm vol}(K^{(1)}\cap W^{(2)})

for some absolute constant c1c_{1}. Similarly, we can upper bound the volume of K(3)K^{(3)} as follows:

vol(K(3))\displaystyle{\rm vol}(K^{(3)})\leq nc2nvol(K(2)W(3))\displaystyle~{}n^{c_{2}n}\cdot{\rm vol}(K^{(2)}\cap W^{(3)})
\displaystyle\leq n(c1+c2)nvol(K(1)W(3)),\displaystyle~{}n^{(c_{1}+c_{2})n}\cdot{\rm vol}(K^{(1)}\cap W^{(3)}),

Applying this inequality recursively, we conclude

vol(K(k+1))\displaystyle{\rm vol}(K^{(k+1)})\leq nO(nk)vol(K(1)W(k+1))\displaystyle~{}n^{O(nk)}\cdot{\rm vol}(K^{(1)}\cap W^{(k+1)})

The potential after this sequence of dimension reduction iterations is

eΦ(k+1)=\displaystyle e^{\Phi^{(k+1)}}= vol(K(k+1))det(ΠW0(k+1)(Λ(0)))\displaystyle~{}{\rm vol}(K^{(k+1)})\cdot\det(\Pi_{W_{0}^{(k+1)}}(\Lambda^{(0)}))
=\displaystyle= nO(nk)vol(K(1)W(k+1))det(ΠW0(k+1)(Λ(0)))\displaystyle~{}n^{O(nk)}\cdot{\rm vol}(K^{(1)}\cap W^{(k+1)})\cdot\det(\Pi_{W_{0}^{(k+1)}}(\Lambda^{(0)}))
=\displaystyle= nO(nk)vol(K(1)W(k+1))det(ΠW0(k+1)((Λ(0))))\displaystyle~{}n^{O(nk)}\cdot\frac{{\rm vol}(K^{(1)}\cap W^{(k+1)})}{\det(\Pi_{W_{0}^{(k+1)}}((\Lambda^{(0)}))^{*})}
=\displaystyle= nO(nk)vol(K(1)W(k+1))det((Λ(0))W0(k+1))\displaystyle~{}n^{O(nk)}\cdot\frac{{\rm vol}(K^{(1)}\cap W^{(k+1)})}{\det((\Lambda^{(0)})^{*}\cap W_{0}^{(k+1)})}
\displaystyle\leq nO(nk)vol(K(0)W(k+1))det((Λ(0))W0(k+1))\displaystyle~{}n^{O(nk)}\cdot\frac{{\rm vol}(K^{(0)}\cap W^{(k+1)})}{\det((\Lambda^{(0)})^{*}\cap W_{0}^{(k+1)})}

where the third step is by taking the dual lattice projection (Fact 3.10), the fourth step is due to (ΠW0(k+1)(Λ(0)))=(Λ(0))W0(k+1)(\Pi_{W_{0}^{(k+1)}}(\Lambda^{(0)}))^{*}=(\Lambda^{(0)})^{*}\cap W_{0}^{(k+1)} by Fact 3.9, and the last step is by the volume shrinking after cutting.

Since W(k+1)W^{(k+1)} is a translation of the subspace W0(k+1)W_{0}^{(k+1)}, we can apply Lemma 4.17 by taking L=(Λ(1))L=(\Lambda^{(1)})^{*} to obtain

eΦ(k+1)eΦ(0)nO(nk)kO(k)(mn)O(k)λ1(Λ(0),(HK(0))1)k\displaystyle e^{\Phi^{(k+1)}}\leq e^{\Phi^{(0)}}\cdot\frac{n^{O(nk)}\cdot k^{O(k)}\cdot(mn)^{O(k)}}{\lambda_{1}(\Lambda^{(0)},(H_{K}^{(0)})^{-1})^{k}} (6)

It remains to provide a lower bound on λ1(Λ(0),(HK(0))1)\lambda_{1}(\Lambda^{(0)},(H_{K}^{(0)})^{-1}).

As CuttingPlaneMethod was used in the iteration preceding this maximal sequence (by maximality), the condition on line 11 of Algorithm 1 gives

v(0)(HK(0))1min{110γn,2100nlogn}\displaystyle\|v^{(0)}\|_{(H_{K}^{(0)})^{-1}}\geq\min\{\frac{1}{10\gamma n},2^{-100n\log n}\}

for the output vector v(0)Λ(0)v^{(0)}\in\Lambda^{(0)}, and that Λ(0)=Λ(1)\Lambda^{(0)}=\Lambda^{(1)} since a cutting plane iteration doesn’t change the lattice.

Since the ApproxShortestVector procedure is a γ\gamma-approximation and HK(0)H_{K}^{(0)} is a (1±ϵ)(1\pm\epsilon)-spectral approximation to Hvc(K)(0)H_{{\rm vc}(K)}^{(0)}, we have

λ1(Λ(0),(HK(0))1)v(0)(HK(0))1γ1γ2100nlogn\displaystyle\lambda_{1}(\Lambda^{(0)},(H_{K}^{(0)})^{-1})\geq\frac{\|v^{(0)}\|_{(H_{K}^{(0)})^{-1}}}{\gamma}\geq\frac{1}{\gamma\cdot 2^{100n\log n}} (7)

Combining Eq. (6) and Eq. (7), we have

eΦ(k+1)eΦ(0)γO(k)nO(nk)\displaystyle e^{\Phi^{(k+1)}}\leq e^{\Phi^{(0)}}\cdot\gamma^{O(k)}\cdot n^{O(nk)}

as m=O(n)m=O(n).

This shows that after a sequence of kk dimension reduction iterations, the potential increases additively by at most O(klog(γn)+nklogn)O(k\log(\gamma n)+nk\log n). As there are at most n1n-1 dimension reduction iterations in total, the total potential increase is at most O(nlog(γn)+n2logn)O(n\log(\gamma n)+n^{2}\log n).

Finally we note that whenever the potential becomes smaller than 100nlog(100γn)-100n\log(100\gamma n), Minkowski’s first theorem shows the existence of a nonzero vector vΛv\in\Lambda with vHK11/(100γn)\|v\|_{H_{K}^{-1}}\leq 1/(100\gamma n). This implies the γ\gamma-approximation algorithm ApproxShortestVector for the shortest vector problem will find a nonzero vector vΛv^{\prime}\in\Lambda with vHK11/(100n)\|v^{\prime}\|_{H_{K}^{-1}}\leq 1/(100n). So the next iteration where we run the LLL algorithm will always reduce the dimension.

Therefore, the TT-step CuttingPlaneMethod will be run at most

O(log(γnR)logn+n)\displaystyle O\big{(}\frac{\log(\gamma nR)}{\log n}+n\big{)}

times.

Since each run of CuttingPlaneMethod uses T=O(nlogm)=O(nlogn)T=O(n\log m)=O(n\log n) oracle calls, this also gives the total number of oracle calls

O(n2logn+nlog(γnR)).\displaystyle O(n^{2}\log n+n\log(\gamma nR)).

This finishes the proof of the lemma. ∎
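
Ignoring absolute constants, the potential accounting above can be summarized in a few lines of Python (the constants below are placeholders chosen to mirror the proof, not tuned values), which estimates the number of CuttingPlaneMethod phases.

import math

def num_cpm_phases(n, R, gamma):
    # Potential starts at about n log R, each CuttingPlaneMethod phase removes about
    # n log m, dimension reductions add at most about n log(gamma n) + n^2 log n in
    # total, and the potential stays above about -100 n log(100 gamma n).
    m = 2 * n
    start = n * math.log(R)
    total_increase = n * math.log(gamma * n) + n ** 2 * math.log(n)
    floor = -100 * n * math.log(100 * gamma * n)
    return (start + total_increase - floor) / (n * math.log(m))

print(num_cpm_phases(n=100, R=1e6, gamma=2.0 ** 100))    # order log(gamma n R)/log n + n, as in Lemma 5.2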

5.4 Runtime Analysis

The goal of this section is to prove Lemma 5.3.

Lemma 5.3 (Runtime part of Theorem 5.4).

Given a separation oracle 𝖲𝖮\operatorname{\mathsf{SO}} for a convex function ff on n\mathbb{R}^{n} such that the set of minimizers KK^{*} of ff is contained in a box of radius RR and all extreme points of KK^{*} are integral, there exists a randomized algorithm (Algorithm 1) that outputs an integral minimizer of ff, and uses O(n4log(nR))O(n^{4}\log(nR)) arithmetic operations.

Proof.

As we use γ\gamma-ApproximateShortestVector with γ=O(2n)\gamma=O(2^{n}), the total number of oracle calls is O(n2log(nR))O(n^{2}\log(nR)) by Lemma 5.2.

The dimension reduction step occurs at most O(n)O(n) times. Each dimension reduction step takes at most O(n3)O(n^{3}) arithmetic operations, which amounts to O(n4)O(n^{4}) arithmetic operations in total.

CuttingPlaneMethod is called with T=O(nlogn)T=O(n\log n), so each call uses O(n3logn)O(n^{3}\log n) arithmetic operations. As such calls happen at most O(nlog(nR)logn)O(\frac{n\log(nR)}{\log n}) times, the total number of arithmetic operations for CuttingPlaneMethod is at most O(n4log(nR))O(n^{4}\log(nR)).

Regarding the number of calls to FasterShortestVector, we note that there are at most O(n)O(n) dimension reduction steps, and at most O(nlogR)O(n\log R) calls to the sequence of CPM. Thus, the total number of calls to FasterShortestVector can be upper bounded by O(nlogR)O(n\log R), yielding a total of O(n4logR)O(n^{4}\log R) arithmetic operations. ∎

5.5 Main Result

The goal of this section is to prove Theorem 5.4.

Theorem 5.4 (Main result, formal version of Theorem 1.1).

Given a separation oracle 𝖲𝖮\operatorname{\mathsf{SO}} for a convex function ff on n\mathbb{R}^{n} such that the set of minimizers KK^{*} of ff is contained in a box of radius RR and all extreme points of KK^{*} are integral, there exists a randomized algorithm (Algorithm 1) that outputs an integral minimizer of ff with high probability, and uses

  • O(n2log(nR)){O}(n^{2}\log(nR)) calls to 𝖲𝖮\operatorname{\mathsf{SO}}.

  • O(n4log(nR)){O}(n^{4}\log(nR)) additional arithmetic operations.

Proof.

The proof follows from directly combining Lemma 5.1, Lemma 5.2 and Lemma 5.3. ∎

Acknowledgement

We would like to thank Jonathan Kelner for many helpful discussions and for suggesting the title of the paper. Lichen Zhang is supported in part by NSF grant No. 1955217 and NSF grant No. 2022448.

References

  • AC [91] Ilan Adler and Steven Cosares. A strongly polynomial algorithm for a special class of linear programs. Operations Research, 39(6):955–960, 1991.
  • AKS [01] Miklós Ajtai, Ravi Kumar, and D. Sivakumar. A sieve algorithm for the shortest lattice vector problem. In Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing, STOC ’01, 2001.
  • Ans [97] Kurt M. Anstreicher. Volumetric path following algorithms for linear programming. Math. Program., 1997.
  • AV [95] David S. Atkinson and Pravin M. Vaidya. A cutting plane algorithm for convex programming that uses analytic centers. Math. Program., 1995.
  • BLSS [20] Jan van den Brand, Yin Tat Lee, Aaron Sidford, and Zhao Song. Solving tall dense linear programs in nearly linear time. In STOC, 2020.
  • BV [04] Dimitris Bertsimas and Santosh Vempala. Solving convex programs by random walks. J. ACM, 2004.
  • BWZ [16] Christos Boutsidis, David P Woodruff, and Peilin Zhong. Optimal principal component analysis in distributed and streaming models. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 236–249, 2016.
  • CGJS [23] Deeparnab Chakrabarty, Andrei Graur, Haotian Jiang, and Aaron Sidford. Parallel submodular function minimization. In NeurIPS 2023, 2023.
  • Chu [12] Sergei Chubanov. A strongly polynomial algorithm for linear systems having a binary solution. Mathematical programming, 134(2):533–570, 2012.
  • Chu [15] Sergei Chubanov. A polynomial algorithm for linear optimization which is strongly polynomial under certain conditions on optimal solutions, 2015.
  • CLS [19] Michael B Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the current matrix multiplication time. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), 2019.
  • CM [91] Edith Cohen and Nimrod Megiddo. Improved algorithms for linear inequalities with two variables per inequality. In Proceedings of the twenty-third annual ACM symposium on Theory of Computing, pages 145–155, 1991.
  • CW [13] Kenneth L Clarkson and David P Woodruff. Low rank approximation and regression in input sparsity time. In Proceedings of the forty-fifth annual ACM symposium on Theory of Computing (STOC), pages 81–90, 2013.
  • DHNV [20] Daniel Dadush, Sophie Huiberts, Bento Natura, and László A Végh. A scaling-invariant algorithm for linear programming whose running time depends only on the constraint matrix. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 761–774, 2020.
  • Dik [67] II Dikin. Iterative solution of problems of linear and quadratic programming. In Doklady Akademii Nauk, volume 174, pages 747–748. Russian Academy of Sciences, 1967.
  • DSW [22] Yichuan Deng, Zhao Song, and Omri Weinstein. Discrepancy minimization in input-sparsity time. arXiv preprint arXiv:2210.12468, 2022.
  • DVZ [21] Dan Dadush, László A Végh, and Giacomo Zambelli. Geometric rescaling algorithms for submodular function minimization. Mathematics of Operations Research, 46(3):1081–1108, 2021.
  • DWZ [23] Ran Duan, Hongxun Wu, and Renfei Zhou. Faster matrix multiplication via asymmetric hashing. In FOCS, 2023.
  • Edm [03] Jack Edmonds. Submodular functions, matroids, and certain polyhedra. In Combinatorial Optimization—Eureka, You Shrink! Papers Dedicated to Jack Edmonds 5th International Workshop Aussois, France, March 5–9, 2001 Revised Papers, pages 11–26. Springer, 2003.
  • FI [03] Lisa Fleischer and Satoru Iwata. A push-relabel framework for submodular function minimization and applications to parametric optimization. Discrete Applied Mathematics, 131(2):311–322, 2003.
  • GJS [23] Andrei Graur, Haotian Jiang, and Aaron Sidford. Sparse submodular function minimization. In FOCS 2023. IEEE, 2023.
  • GLS [81] Martin Grötschel, László Lovász, and Alexander Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.
  • GLS [84] Martin Grötschel, László Lovász, and Alexander Schrijver. Geometric methods in combinatorial optimization. In Progress in combinatorial optimization, pages 167–183. Elsevier, 1984.
  • GLS [88] Martin Grötschel, László Lovász, and Alexander Schrijver. Geometric algorithms and combinatorial optimization. Springer, 1988.
  • GSZ [23] Yuzhou Gu, Zhao Song, and Lichen Zhang. Faster algorithms for structured linear and kernel support vector machines, 2023.
  • HJS+ [22] Baihe Huang, Shunhua Jiang, Zhao Song, Runzhou Tao, and Ruizhe Zhang. Solving SDP faster: A robust IPM framework and efficient implementation. In FOCS, 2022.
  • IFF [01] Satoru Iwata, Lisa Fleischer, and Satoru Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. Journal of the ACM (JACM), 48(4):761–777, 2001.
  • IO [09] Satoru Iwata and James B Orlin. A simple combinatorial algorithm for submodular function minimization. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 1230–1237. SIAM, 2009.
  • Iwa [03] Satoru Iwata. A faster scaling algorithm for minimizing submodular functions. SIAM Journal on Computing, 32(4):833–840, 2003.
  • Iwa [08] Satoru Iwata. Submodular function minimization. Math. Program., 2008.
  • Jia [21] Haotian Jiang. Minimizing convex functions with integral minimizers. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, 2021.
  • Jia [22] Haotian Jiang. Minimizing convex functions with rational minimizers. ACM Journal of the ACM (JACM), 2022.
  • JKL+ [20] Haotian Jiang, Tarun Kathuria, Yin Tat Lee, Swati Padmanabhan, and Zhao Song. A faster interior point method for semidefinite programming. In FOCS, 2020.
  • JLSW [20] Haotian Jiang, Yin Tat Lee, Zhao Song, and Sam Chiu-wai Wong. An improved cutting plane method for convex optimization, convex-concave games, and its applications. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 944–953, 2020.
  • JNW [22] Shunhua Jiang, Bento Natura, and Omri Weinstein. A faster interior-point method for sum-of-squares optimization. 49th International Colloquium on Automata, Languages, and Programming (ICALP), 2022.
  • JSWZ [21] Shunhua Jiang, Zhao Song, Omri Weinstein, and Hengjie Zhang. Faster dynamic matrix inverse for faster lps. In STOC, 2021.
  • Kha [80] L.G. Khachiyan. Polynomial algorithms in linear programming. USSR Computational Mathematics and Mathematical Physics, 1980.
  • KLS [95] Ravi Kannan, László Lovász, and Miklós Simonovits. Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry, 13(3):541–559, 1995.
  • Lev [65] Anatoly Yur’evich Levin. An algorithm for minimizing convex functions. In Doklady Akademii Nauk, volume 160, pages 1244–1247. Russian Academy of Sciences, 1965.
  • LG [24] Francois Le Gall. Faster rectangular matrix multiplication by combination loss analysis. In SODA, 2024.
  • LLL [82] Arjen K Lenstra, Hendrik Willem Lenstra, and László Lovász. Factoring polynomials with rational coefficients. Mathematische annalen, 1982.
  • LS [14] Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs in O(rank){O}(\sqrt{rank}) iterations and faster algorithms for maximum flow. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pages 424–433. IEEE, 2014.
  • LS [19] Yin Tat Lee and Aaron Sidford. Solving linear programs with sqrt (rank) linear system solves. arXiv preprint arXiv:1910.08033, 2019.
  • LSW [15] Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 1049–1065. IEEE, 2015.
  • LSZ+ [23] S Cliff Liu, Zhao Song, Hengjie Zhang, Lichen Zhang, and Tianyi Zhou. Space-efficient interior point method, with applications to linear programming and maximum weight bipartite matching. In ICALP, 2023.
  • McC [05] S Thomas McCormick. Submodular function minimization. Discrete Optimization, 12:321–391, 2005.
  • Meg [83] Nimrod Megiddo. Towards a genuinely polynomial algorithm for linear programming. SIAM Journal on Computing, 12(2):347–353, 1983.
  • New [65] Donald J Newman. Location of the maximum on unimodal surfaces. Journal of the ACM (JACM), 12(3):395–398, 1965.
  • NN [89] I.U.E. Nesterov and A.S. Nemirovski. Self-concordant Functions and Polynomial-time Methods in Convex Programming. USSR Academy of Sciences, Central Economic & Mathematic Institute, 1989.
  • NN [94] Yurii Nesterov and Arkadi Nemirovski. Interior-point polynomial algorithms in convex programming, volume 13. Siam, 1994.
  • NS [16] Arnold Neumaier and Damien Stehlé. Faster LLL-type reduction of lattice bases. In Proceedings of the ACM on International Symposium on Symbolic and Algebraic Computation (ISSAC), pages 373–380, 2016.
  • Orl [09] James B Orlin. A faster strongly polynomial time algorithm for submodular function minimization. Mathematical Programming, 118(2):237–251, 2009.
  • OV [20] Neil Olver and László A Végh. A simpler and faster strongly polynomial algorithm for generalized flow maximization. Journal of the ACM (JACM), 67(2):1–26, 2020.
  • Ren [88] James Renegar. A polynomial-time algorithm, based on Newton’s method, for linear programming. Math. Program., 1988.
  • Rot [16] Thomas Rothvoss. Integer optimization and lattices. University of Washington, Spring, 2016.
  • Sch [87] C.P. Schnorr. A hierarchy of polynomial time lattice basis reduction algorithms. Theor. Comput. Sci., 1987.
  • Sch [00] Alexander Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. Journal of Combinatorial Theory, Series B, 80(2):346–355, 2000.
  • Sho [77] Naum Shor. Cut-off method with space extension in convex programming problems. Cybernetics and systems analysis, 1977.
  • Sma [98] Steve Smale. Mathematical problems for the next century. The mathematical intelligencer, 20(2):7–15, 1998.
  • SWZ [17] Zhao Song, David P Woodruff, and Peilin Zhong. Low rank approximation with entrywise l1-norm error. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 688–701, 2017.
  • SWZ [19] Zhao Song, David P Woodruff, and Peilin Zhong. Relative error tensor low rank approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2772–2789. SIAM, 2019.
  • SYYZ [22] Zhao Song, Xin Yang, Yuanyuan Yang, and Tianyi Zhou. Faster algorithm for structured John ellipsoid computation. arXiv preprint arXiv:2211.14407, 2022.
  • SYZ [23] Zhao Song, Mingquan Ye, and Lichen Zhang. Streaming semidefinite programs: O(n){O}(\sqrt{n}) passes, small space and fast runtime. arXiv preprint arXiv:2309.05135, 2023.
  • Tar [86] Éva Tardos. A strongly polynomial algorithm to solve combinatorial linear programs. Operations Research, 34(2):250–256, 1986.
  • TKE [88] S. P. Tarasov, L. G. Khachiyan, and I. I. Èrlikh. The method of inscribed ellipsoids. Dokl. Akad. Nauk SSSR, 1988.
  • VA [93] Pravin M. Vaidya and David S. Atkinson. A Technique for Bounding the Number of Iterations in Path Following Algorithms, pages 462–489. World Scientific Publishing Company, 1993.
  • Vai [89] P.M. Vaidya. A new algorithm for minimizing convex functions over convex sets. In 30th Annual Symposium on Foundations of Computer Science, pages 338–343, 1989.
  • Vég [14] László A Végh. A strongly polynomial algorithm for generalized flow maximization. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 644–653, 2014.
  • VWXXZ [24] Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou. New bounds for matrix multiplication: from alpha to omega. In SODA, 2024.
  • VY [96] Stephen A Vavasis and Yinyu Ye. A primal-dual interior point method whose running time depends only on the constraint matrix. Mathematical Programming, 74(1):79–120, 1996.
  • Vyg [03] Jens Vygen. A note on Schrijver’s submodular function minimization algorithm. Journal of Combinatorial Theory, Series B, 88(2):399–402, 2003.
  • YN [76] David Yudin and Arkadii Nemirovski. Evaluation of the information complexity of mathematical programming problems. Ekonomika i Matematicheskie Metody, 1976.