Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts
Abstract
The incorporation of cutting planes within the branch-and-bound algorithm, known as branch-and-cut, forms the backbone of modern integer programming solvers. These solvers are the foremost method for solving discrete optimization problems and thus have a vast array of applications in machine learning, operations research, and many other fields. Choosing cutting planes effectively is a major research topic in the theory and practice of integer programming. We conduct a novel structural analysis of branch-and-cut that pins down how every step of the algorithm is affected by changes in the parameters defining the cutting planes added to the input integer program. Our main application of this analysis is to derive sample complexity guarantees for using machine learning to determine which cutting planes to apply during branch-and-cut. These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers. We exploit geometric and combinatorial structure of branch-and-cut in our analysis, which provides a key missing piece for the recent generalization theory of branch-and-cut.
1 Introduction
Integer programming (IP) solvers are the most widely-used tools for solving discrete optimization problems. They have numerous applications in machine learning, operations research, and many other fields, including MAP inference [22], combinatorial auctions [33], natural language processing [23], neural network verification [11], interpretable classification [37], training of optimal decision trees [9], and optimal clustering [30], among many others.
Under the hood, IP solvers use the tree-search algorithm branch-and-bound [26] augmented with cutting planes, known as branch-and-cut (B&C). A cutting plane is a linear constraint that is added to the LP relaxation at any node of the search tree. With a carefully selected cutting plane, the LP guidance can more efficiently lead B&C to the globally optimal integral solution. Cutting planes, specifically the family of Gomory mixed integer cuts which we study in this paper, are responsible for breakthrough speedups of modern IP solvers [14].
Successfully employing cutting planes can be challenging because there are infinitely many cuts to choose from and there are still many open questions about which cuts to employ when. A growing body of research has studied the use of machine learning for cut selection [34, 8, 7, 19]. In this paper, we analyze a machine learning setting where there is an unknown distribution over IPs—for example, a distribution over a shipping company’s routing problems. The learner receives a training set of IPs sampled from this distribution which it uses to learn cut parameters with strong average performance over the training set (leading, for example, to small search trees). We provide sample complexity bounds for this procedure, which bound the number of training instances sufficient to ensure that if a set of cut parameters leads to strong average performance over the training set, it will also lead to strong expected performance on future IPs from the same distribution. These guarantees apply no matter what procedure is used to optimize the cut parameters over the training set—optimal or suboptimal, automated or manual. We prove these guarantees by analyzing how the B&C tree varies as a function of the cut parameters on any IP. By bounding the “intrinsic complexity” of this function, we are able to provide our sample complexity bounds.


Figures 1 and 2 illustrate the need for distribution-dependent policies for choosing cutting planes. We plot the average number of nodes expanded by B&C as a function of a parameter $\mu \in [0,1]$ that controls its cut-selection policy, as we detail in Appendix A. In each figure, we draw a training set of facility location integer programs from two different distributions. In Figure 1, we define the distribution by starting with a uniformly random facility location instance and perturbing its costs. In Figure 2, the costs are more structured: the facilities are located along a line and the clients have uniformly random locations. In Figure 1, a smaller value of $\mu$ leads to small search trees, but in Figure 2, a larger value of $\mu$ is preferable. These figures illustrate that tuning cut parameters according to the instance distribution at hand can have a large impact on the performance of B&C, and that for one instance distribution, the best parameters for cut evaluation can be very different—in fact opposite—than the optimal parameters for another instance distribution.
The key challenge we face in developing a theory for cutting planes is that a cut added at the root remains in the LP relaxations stored in each node all the way to the leaves, thereby impacting the LP guidance that B&C uses to search throughout the whole tree. Tiny changes to any cut can thus completely change the entire course of B&C. At its core, our analysis therefore involves understanding an intricate interplay between the continuous and discrete components of our problem. The first, continuous component requires us to characterize how the solution of the LP relaxation changes as a function of its constraints. This optimal solution will move continuously through space until it jumps from one vertex of the LP polytope to another. We then use this characterization to analyze how the B&C tree—a discrete, combinatorial object—varies as a function of its LP guidance.
1.1 Our contributions
Our first main contribution (Section 3) addresses a fundamental question: how does an LP’s solution change when new constraints are added? As the constraints vary, the solution will jump from vertex to vertex of the LP polytope. We prove that one can partition the set of all possible constraint vectors into a finite number of regions such that within any one region, the LP’s solution has a clean closed form. Moreover, we prove that the boundaries defining this partition have a specific form, defined by degree-2 polynomials.
We build on this result to prove our second main contribution (Section 4), which analyzes how the entire B&C search tree changes as a function of the cuts added at the root. To prove this result, we analyze how every aspect of B&C—the variables branched on, the nodes selected to expand, and the nodes fathomed—changes as a function of the LP relaxations that are computed throughout the search tree. We prove that the set of all possible cuts can be partitioned into a finite number of regions such that within any one region, B&C builds the exact same search tree.
This result allows us to prove sample complexity bounds for learning high-performing cutting planes from the class of Gomory mixed integer (GMI) cuts, our third main contribution (Section 5). GMI cuts are one of the most important families of cutting planes in the field of integer programming. Introduced by Gomory [16], they dominate most other families of cutting planes [15], and are perhaps most directly responsible for the realization that a branch-and-cut framework is necessary for the speeds now achievable by modern IP solvers [3]. A historical account of these cuts is provided by Cornuéjols [14]. The structural results from Section 4 allow us to understand the “intrinsic complexity” of B&C’s performance as a function of the GMI cuts it uses. We quantify this notion of intrinsic complexity using pseudo-dimension [32], which then implies a sample complexity bound.
1.2 Related research
Learning to cut.
This paper helps develop a theory of generalization for cutting plane selection. This line of inquiry began with a paper by Balcan et al. [8], who studied Chvátal-Gomory cuts for (pure) integer programs (IPs). Unlike that work, which exploited the fact that there are only finitely many distinct Chvátal-Gomory cuts for a given IP, our analysis of GMI cuts is far more involved.
The main distinction between our analysis in this paper and the techniques used in previous papers on generalization guarantees for integer programming [5, 8, 7] can be summarized as follows. Let $\mu$ be a (potentially multidimensional) parameter controlling some aspect of the IP solver (e.g., a mixture parameter between branching rules or a cutting-plane parameter). In previous works, as $\mu$ varied, there were only a finite number of states each node of branch-and-cut could be in. For example, in the case of branching/variable selection, $\mu$ controls the additional branching constraint added to the IP at any given node of the search tree. There are only finitely many possible branching constraints, so there are only finitely many possible “child” IPs induced by $\mu$. Similarly, if $\mu$ represents the parameterization for Chvátal-Gomory cuts [12, 17], since Balcan et al. [8] showed that there are only finitely many distinct Chvátal-Gomory cuts for a given IP, as $\mu$ varies, there are only finitely many possible child IPs induced by $\mu$ at any stage of the search tree. However, in many settings, this property does not hold. For example, if $\mu = (\alpha, \beta)$ controls the normal vector and offset of an additional feasible constraint $\alpha^\top x \le \beta$, there are infinitely many possible IPs corresponding to the choice of $\mu$. Similarly, if $\mu$ controls the parameterization of a GMI cut, there are infinitely many IPs corresponding to the choice of $\mu$ (unlike Chvátal-Gomory cuts). In this paper, we develop a new structural understanding of B&C that is significantly more involved than the structural results in prior work.
Sensitivity analysis of integer and linear programs.
A related line of research studied the sensitivity of LPs, and to a lesser extent IPs, to changes in their parameters. Mangasarian and Shiau [28] and Li [27], for example, show that the optimal solution to an LP is a Lipschitz function of the right-hand side of its constraints but not of its objective. Cook et al. [13] study how the set of optimal solutions to an IP changes as the objective function and the right-hand side of the constraints vary. This paper fits into this line of research as we study how the solution to an LP varies as new rows are added. This function is not Lipschitz, but we show that it is well-structured.
2 Notation and branch-and-cut background
Integer and linear programs.
An integer program (IP) is defined by an objective vector $c \in \mathbb{R}^n$, a constraint matrix $A \in \mathbb{R}^{m \times n}$, and a constraint vector $b \in \mathbb{R}^m$, with the form
$$\max\{c^\top x : Ax \le b,\ x \ge 0,\ x \in \mathbb{Z}^n\}. \qquad (1)$$
The linear programming (LP) relaxation is formed by removing the integrality constraints:
$$\max\{c^\top x : Ax \le b,\ x \ge 0\}. \qquad (2)$$
Polyhedra and polytopes.
A set $P \subseteq \mathbb{R}^n$ is a polyhedron if there exist an integer $m$, a matrix $A \in \mathbb{R}^{m \times n}$, and a vector $b \in \mathbb{R}^m$ such that $P = \{x \in \mathbb{R}^n : Ax \le b\}$. $P$ is a rational polyhedron if there exist $A \in \mathbb{Q}^{m \times n}$ and $b \in \mathbb{Q}^m$ such that $P = \{x : Ax \le b\}$. A bounded polyhedron is called a polytope. The feasible regions of all IPs considered in this paper are assumed to be rational polytopes.¹ Let $P$ be a nonempty polyhedron. For any $c \in \mathbb{R}^n$, the set $\operatorname{argmax}\{c^\top x : x \in P\}$ is a face of $P$. Conversely, if $F$ is a nonempty face of $P$, then $F = \operatorname{argmax}\{c^\top x : x \in P\}$ for some $c \in \mathbb{R}^n$. Given a set of constraints $\sigma = \{\alpha_1^\top x \le \beta_1, \ldots, \alpha_k^\top x \le \beta_k\}$, let $P \cap \sigma$ denote the polyhedron that is the intersection of $P$ with all inequalities in $\sigma$.

¹This assumption is not a restrictive one. The Minkowski-Weyl theorem states that any polyhedron can be decomposed as the sum of a polytope and its recession cone. All results in this paper can be derived for rational polyhedra by considering the corresponding polytope in the Minkowski-Weyl decomposition.
Cutting planes.
A cutting plane is a linear constraint $\alpha^\top x \le \beta$. Let $P$ denote the feasible region of the LP relaxation in Equation (2) and let $P_I = P \cap \mathbb{Z}^n$ denote the feasible set of the IP in Equation (1). A cutting plane is valid if it is satisfied by every integer-feasible point: $\alpha^\top x \le \beta$ for all $x \in P_I$. A valid cut separates a point $x \in P \setminus P_I$ if $\alpha^\top x > \beta$. We interchangeably refer to a cut by its parameters $(\alpha, \beta) \in \mathbb{R}^{n+1}$ and the halfspace $\{x \in \mathbb{R}^n : \alpha^\top x \le \beta\}$ it defines.
An important family of cuts that we study in this paper is the set of Gomory mixed integer (GMI) cuts.
Definition 2.1 (Gomory mixed integer cut).
Suppose the feasible region of the IP is in equality form $\{x \in \mathbb{R}^n_{\ge 0} : Ax = b\}$ (which can be achieved by adding slack variables). For $u \in \mathbb{R}^m$, let $f_i$ denote the fractional part of $u^\top a_i$, where $a_i$ is the $i$th column of $A$, and let $f_0$ denote the fractional part of $u^\top b$. That is, $u^\top a_i = \lfloor u^\top a_i \rfloor + f_i$ and $u^\top b = \lfloor u^\top b \rfloor + f_0$. The Gomory mixed integer (GMI) cut parameterized by $u$ is given by
$$\sum_{i : f_i \le f_0} f_i x_i + \frac{f_0}{1 - f_0} \sum_{i : f_i > f_0} (1 - f_i) x_i \ge f_0.$$
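To make the definition concrete, here is a minimal NumPy sketch of the computation. The helper `gmi_cut` and the example data are our own illustrations, not from the paper; it assumes a pure IP in equality form and a multiplier $u$ for which $u^\top b$ is fractional, so the cut is non-vacuous.

```python
import numpy as np

def gmi_cut(A, b, u):
    """GMI cut parameterized by u for max{c^T x : Ax = b, x >= 0, x integral}.
    Returns (coeffs, rhs) for the cut  coeffs^T x >= rhs."""
    frac = lambda z: z - np.floor(z)              # fractional part
    f = frac(u @ A)                               # f_i = frac(u^T a_i), per column
    f0 = frac(u @ b)                              # f_0 = frac(u^T b), assumed > 0
    coeffs = np.where(f <= f0, f, f0 * (1 - f) / (1 - f0))
    return coeffs, f0

# Example: single constraint 2x_1 + 3x_2 = 7 with multiplier u = (1/2,).
A, b, u = np.array([[2.0, 3.0]]), np.array([7.0]), np.array([0.5])
coeffs, rhs = gmi_cut(A, b, u)
print(coeffs, rhs)   # [0.  0.5] 0.5, i.e., the cut x_2 >= 1
```

In this toy example the cut $0.5\,x_2 \ge 0.5$ recovers the fact that $3x_2 \equiv 7 \pmod 2$ forces $x_2 \ge 1$.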
Branch-and-cut.
We provide a high-level overview of branch-and-cut (B&C) and refer the reader to the textbook by Nemhauser and Wolsey [31] for more details. Given an IP, B&C searches through the IP’s feasible region by building a binary search tree. B&C solves the LP relaxation of the input IP and then adds any number of cutting planes. It stores this information at the root of its binary search tree. Let $x^*_{LP}$ denote the solution to the LP relaxation with the addition of the cutting planes. B&C next uses a variable selection policy to choose a variable $x_i$ to branch on. This means that it splits the IP’s feasible region in two: one set where $x_i \le \lfloor x^*_{LP}[i] \rfloor$ and the other where $x_i \ge \lceil x^*_{LP}[i] \rceil$. The left child of the root now corresponds to the IP with a feasible region defined by the first subset and the right child likewise corresponds to the second subset. B&C then chooses a leaf using a node selection policy and recurses, adding any number of cutting planes, branching on a variable, and so on. B&C fathoms a node—which means that it will never branch on that node—if 1) the LP relaxation at the node is infeasible, 2) the optimal solution to the LP relaxation is integral, or 3) the optimal solution to the LP relaxation is no better than the best integral solution found thus far. Eventually, B&C will fathom every leaf, at which point it has provably found the globally optimal integral solution. We assume there is a bound $\kappa$ on the size of the tree we allow B&C to build before we terminate, as is common in prior research [20, 24, 25, 5, 7, 8].
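The following sketch mirrors the loop just described. It is our own minimal illustration rather than any solver's implementation: it uses depth-first node selection and most-fractional branching instead of the product scoring rule defined below, generates no cuts (a comment marks where they would be added), and assumes the feasible region is bounded, as in the paper.

```python
import math
import numpy as np
from scipy.optimize import linprog

def branch_and_cut_skeleton(c, A, b, max_nodes=10_000):
    """Solve max{c^T x : Ax <= b, x >= 0, x integral} by branch-and-bound.
    Each node is the list of branching constraints (var index, bound, sense)
    added to the input IP, exactly as nodes are identified in Section 4."""
    best_val, best_x = -math.inf, None
    stack = [[]]                                   # depth-first node selection
    while stack and max_nodes > 0:
        branches, max_nodes = stack.pop(), max_nodes - 1
        lo, hi = [0.0] * len(c), [None] * len(c)
        for i, v, sense in branches:
            if sense == "<=": hi[i] = v if hi[i] is None else min(hi[i], v)
            else: lo[i] = max(lo[i], v)
        res = linprog(-np.asarray(c), A_ub=A, b_ub=b, bounds=list(zip(lo, hi)))
        if res.status != 0:                        # fathom: LP infeasible
            continue
        # (cutting planes would be generated here and appended to A, b)
        x, val = res.x, -res.fun
        if val <= best_val:                        # fathom: bound <= incumbent
            continue
        fractional = [abs(xi - round(xi)) for xi in x]
        if max(fractional) < 1e-6:                 # fathom: integral solution
            best_val, best_x = val, np.round(x)
            continue
        i = int(np.argmax(fractional))             # most-fractional branching
        stack.append(branches + [(i, math.floor(x[i]), "<=")])
        stack.append(branches + [(i, math.ceil(x[i]), ">=")])
    return best_val, best_x

# max x1 + x2 s.t. 2x1 + 3x2 <= 12, 2x1 + x2 <= 6: optimum 4 at (2, 2)
print(branch_and_cut_skeleton([1, 1], [[2, 3], [2, 1]], [12, 6]))
```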
Every step of B&C—including node and variable selection and the choice of whether or not to fathom—depends crucially on guidance from LP relaxations. To give an example, this is true of the product scoring rule [1], a popular variable selection policy that our results apply to.
Definition 2.2.
Let $x^*_{LP}$ denote the solution to the LP relaxation at a node, and for each variable $x_i$, let $x^*_{LP,i^-}$ and $x^*_{LP,i^+}$ denote the solutions to the LP relaxations of the node's two potential children obtained by branching on $x_i$ (where the objective value of an infeasible LP is taken to be $-\infty$). The product scoring rule branches on the variable $x_i$ that maximizes
$$\max\{c^\top x^*_{LP} - c^\top x^*_{LP,i^-},\ \gamma\} \cdot \max\{c^\top x^*_{LP} - c^\top x^*_{LP,i^+},\ \gamma\},$$
where $\gamma > 0$ is a small constant.
The tighter the LP relaxation, the more valuable the LP guidance, highlighting the importance of cutting planes.
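To make the rule concrete, the following sketch scores one candidate variable at the root by solving its two child LPs; the constant `gamma` and the $-\infty$ convention for infeasible children are our illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def product_score(c, A, b, x_lp, i, gamma=1e-6):
    """Product score of branching on variable i at the root, for the LP
    max{c^T x : Ax <= b, x >= 0} with optimal solution x_lp."""
    def child_obj(sense, bound):
        lo, hi = [0.0] * len(c), [None] * len(c)
        if sense == "<=": hi[i] = bound
        else: lo[i] = bound
        res = linprog(-np.asarray(c), A_ub=A, b_ub=b, bounds=list(zip(lo, hi)))
        return -res.fun if res.status == 0 else -np.inf   # infeasible child
    z = float(np.dot(c, x_lp))
    down = child_obj("<=", np.floor(x_lp[i]))              # left child LP
    up = child_obj(">=", np.ceil(x_lp[i]))                 # right child LP
    return max(z - down, gamma) * max(z - up, gamma)

# branch on the variable i maximizing product_score(c, A, b, x_lp, i)
```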
Polynomial arrangements in Euclidean space.
Let $p : \mathbb{R}^d \to \mathbb{R}$ be a polynomial of degree at most $\delta$. The polynomial partitions $\mathbb{R}^d$ into connected components that belong to either $\{z \in \mathbb{R}^d : p(z) \ne 0\}$ or $\{z \in \mathbb{R}^d : p(z) = 0\}$. When we discuss the connected components of $\mathbb{R}^d$ induced by $p$, we include connected components in both these sets. We make this distinction because previous work on sample complexity for data-driven algorithm design oftentimes only needed to consider the connected components of the former set. The number of connected components in both sets is $O(\delta^d)$ [36, 29, 35].
3 Linear programming sensitivity
Our main result in this section characterizes how an LP’s optimal solution is affected by the addition of one or more new constraints. In particular, fixing an LP $\max\{c^\top x : Ax \le b\}$ with $m$ constraints and $n$ variables, if $x^*_{LP}(\alpha, \beta)$ denotes the new LP optimum when the constraint $\alpha^\top x \le \beta$ is added, we pin down a precise characterization of $x^*_{LP}(\alpha, \beta)$ as a function of $\alpha$ and $\beta$. We show that $x^*_{LP}$ has a piecewise closed form: there are surfaces partitioning $\mathbb{R}^{n+1}$ such that within each connected component induced by these surfaces, $x^*_{LP}(\alpha, \beta)$ has a closed form. While the geometric intuition used to establish this piecewise structure relies on the basic property that optimal solutions to LPs are achieved at vertices, the surfaces defining the regions are perhaps surprisingly nonlinear: they are defined by multivariate degree-2 polynomials in $(\alpha, \beta)$. In Appendix B.1 we illustrate these surfaces for an example two-variable LP.
There are two main steps of our proof: (1) tracking the set of edges of the LP polytope intersected by the new constraint, and once that set of edges is fixed, (2) tracking which edge yields the vertex with the highest objective value.
Let $M$ denote the set of constraints of the LP. For $E \subseteq M$, let $A_E$ and $b_E$ denote the restrictions of $A$ and $b$ to the constraints in $E$. For $\alpha \in \mathbb{R}^n$, $\beta \in \mathbb{R}$, and $E \subseteq M$ with $|E| = n - 1$, let $A_E(\alpha)$ denote the $n \times n$ matrix obtained by adding the row vector $\alpha^\top$ to $A_E$, and let $A^i_E(\alpha, \beta)$ be the matrix $A_E(\alpha)$ with the $i$th column replaced by the right-hand-side vector $(b_E, \beta)$.
Theorem 3.1.
Let $\max\{c^\top x : Ax \le b\}$ be an LP with feasible region $P$ and let $x^*_{LP}$ denote the optimal solution. There is a set of at most $m\binom{m}{n-1}$ hyperplanes and at most $\binom{m}{n-1}^2$ degree-2 polynomial hypersurfaces partitioning $\mathbb{R}^{n+1}$ into connected components such that for each component $C$, one of the following holds: either (1) $x^*_{LP}(\alpha, \beta) = x^*_{LP}$ for all $(\alpha, \beta) \in C$, or (2) there is a set of constraints $E \subseteq M$ with $|E| = n - 1$ such that
$$x^*_{LP}(\alpha, \beta)[i] = \frac{\det(A^i_E(\alpha, \beta))}{\det(A_E(\alpha))} \quad \text{for each } i \in [n]$$
for all $(\alpha, \beta) \in C$.
Proof.
First, if the cut $(\alpha, \beta)$ does not separate $x^*_{LP}$, then $x^*_{LP}(\alpha, \beta) = x^*_{LP}$. The set of all such cuts is the halfspace in $\mathbb{R}^{n+1}$ given by $\{(\alpha, \beta) : \alpha^\top x^*_{LP} \le \beta\}$. All other cuts separate $x^*_{LP}$ and thus pass through $P$, and the new LP optimum is achieved at a vertex created by the cut. We consider the new vertices formed by the cut, which lie on edges (faces of dimension 1) of $P$. Letting $M$ denote the set of constraints that define $P$, each edge of $P$ can be identified with a subset $E \subseteq M$ of size $n - 1$ such that the edge is precisely the set of all points $x$ such that
$$a_i^\top x = b_i \ \text{ for all } i \in E \quad \text{and} \quad a_i^\top x \le b_i \ \text{ for all } i \in M \setminus E,$$
where $a_i^\top$ is the $i$th row of $A$. Let $A_E$ denote the restriction of $A$ to only the rows in $E$, and let $b_E$ denote the entries of $b$ corresponding to constraints in $E$. Drop the inequality constraints defining the edge, so the equality constraints $A_E x = b_E$ define a line in $\mathbb{R}^n$. The intersection of the cut and this line is precisely the solution to the system of $n$ linear equations in $n$ variables: $A_E x = b_E$, $\alpha^\top x = \beta$. By Cramer’s rule, the (unique) solution to this system is given by $x[i] = \det(A^i_E(\alpha, \beta)) / \det(A_E(\alpha))$. To ensure that the intersection point indeed lies on the edge of the polytope, we simply stipulate that it satisfies the inequality constraints in $M \setminus E$. That is,
$$a_k^\top \left( \frac{\det(A^1_E(\alpha, \beta))}{\det(A_E(\alpha))}, \ldots, \frac{\det(A^n_E(\alpha, \beta))}{\det(A_E(\alpha))} \right) \le b_k \qquad (3)$$
for every $k \in M \setminus E$ (note that if $(\alpha, \beta)$ satisfy any of these constraints, it must be that $\det(A_E(\alpha)) \ne 0$, which guarantees that the system indeed has a unique solution). Multiplying through by $\det(A_E(\alpha))$ shows that this constraint is a halfspace in $\mathbb{R}^{n+1}$, since $\det(A^i_E(\alpha, \beta))$ and $\det(A_E(\alpha))$ are both linear in $\alpha$ and $\beta$. The collection of all the hyperplanes defining the boundaries of these halfspaces over all edges of $P$ induces a partition of $\mathbb{R}^{n+1}$ into connected components such that for all $(\alpha, \beta)$ within a given connected component, the (nonempty) set of edges of $P$ that the hyperplane $\alpha^\top x = \beta$ intersects is invariant.

Now, consider a single connected component, denoted by $C$ for brevity. Let $e_1, \ldots, e_k$ denote the edges intersected by cuts in $C$, and let $E_1, \ldots, E_k \subseteq M$ denote the sets of constraints that are binding at each of these edges, respectively. For each pair $j, j' \in [k]$, consider the surface
$$\sum_{i=1}^n c_i \frac{\det(A^i_{E_j}(\alpha, \beta))}{\det(A_{E_j}(\alpha))} = \sum_{i=1}^n c_i \frac{\det(A^i_{E_{j'}}(\alpha, \beta))}{\det(A_{E_{j'}}(\alpha))}. \qquad (4)$$
Clearing the (nonzero) denominators shows this is a degree-2 polynomial hypersurface in $(\alpha, \beta)$. This hypersurface is the set of all $(\alpha, \beta)$ for which the LP objective value achieved at the vertex on edge $e_j$ is equal to the LP objective value achieved at the vertex on edge $e_{j'}$. The collection of these surfaces for each $j, j' \in [k]$ partitions $C$ into further connected components. Within each of these connected components, the edge containing the vertex that maximizes the objective is invariant. If this edge corresponds to binding constraints $E$, $x^*_{LP}(\alpha, \beta)$ has the closed form $x^*_{LP}(\alpha, \beta)[i] = \det(A^i_E(\alpha, \beta)) / \det(A_E(\alpha))$ for all $(\alpha, \beta)$ within this component. We now count the number of surfaces used to obtain our decomposition. $P$ has at most $\binom{m}{n-1}$ edges, and for each edge we first considered at most $m - n + 1 \le m$ hyperplanes representing decision boundaries for cuts intersecting that edge (Equation (3)), for a total of at most $m\binom{m}{n-1}$ hyperplanes. We then considered a degree-2 polynomial hypersurface for every pair of edges (Equation (4)), of which there are at most $\binom{m}{n-1}^2$. ∎
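The closed form in case (2) can be checked numerically. In the sketch below, the LP, the cut, and the binding edge set $E$ are our own toy choices: the cut separates the old optimum, and the determinant formula reproduces the solver's new optimum.

```python
import numpy as np
from scipy.optimize import linprog

# LP: max x1 + 2*x2 s.t. x1 <= 2, x2 <= 2, x >= 0; optimum (2, 2), value 6.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 2.0])

# Cut x1 + x2 <= 3 separates (2, 2); the new optimum (1, 2) lies on the
# edge where x2 <= 2 is binding, so E = {x2 <= 2} with |E| = n - 1 = 1.
alpha, beta = np.array([1.0, 1.0]), 3.0
A_E_alpha = np.vstack([A[[1]], alpha])        # A_E with row alpha^T appended
rhs = np.concatenate([b[[1]], [beta]])        # right-hand side (b_E, beta)

def col_replaced(M, i, col):
    M = M.copy(); M[:, i] = col; return M

x_formula = np.array([np.linalg.det(col_replaced(A_E_alpha, i, rhs))
                      for i in range(2)]) / np.linalg.det(A_E_alpha)

res = linprog(-c, A_ub=np.vstack([A, alpha]), b_ub=np.append(b, beta),
              bounds=[(0, None)] * 2)
print(x_formula, res.x)                       # both print [1. 2.]
```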
4 Structure and sensitivity of branch-and-cut
We now use Theorem 3.1 to answer a fundamental question about B&C: how does the B&C tree change when cuts are added at the root? Said another way, what is the structure of the B&C tree as a function of the set of cuts? We prove that the set of all possible cuts can be partitioned into a finite number of regions such that employing cuts from any one region yields exactly the same B&C tree. Moreover, we prove that the boundaries between regions are defined by constant-degree polynomials. As in the previous section, we focus on a single cut added to the root of the B&C tree. We provide an extension to multiple cuts in Appendix C.2.
We outline the main steps of our analysis:

1. In Lemma 4.2, we extend the LP sensitivity analysis of Section 3 to the LP relaxation at any node of the search tree, that is, with any set of branching constraints added to the input IP.
2. In Lemma 4.3, we analyze how the branching decisions of B&C are impacted by variations in the cuts.
3. In Lemma 4.4, we analyze how cuts affect which nodes are fathomed due to the integrality of the LP relaxation.
4. In Theorem 4.5, we analyze how the LP estimates based on cuts can lead to pruning nodes of the B&C tree, which gives us a complete description of when two cutting planes lead to the same B&C tree.
The full proofs from this section are in Appendix C.
Given an IP, let $\tau$ denote the maximum magnitude coordinate of any LP-feasible solution, rounded up. The set of all possible branching constraints is contained in $\{x_i \le \ell : i \in [n], \ell \in \{0, \ldots, \tau\}\} \cup \{x_i \ge \ell : i \in [n], \ell \in \{0, \ldots, \tau\}\}$, which is a set of size $2n(\tau + 1)$. Naïvely, there are at most $2^{2n(\tau+1)}$ subsets of branching constraints, but the following observation allows us to greatly reduce the number of sets we consider.
Lemma 4.1.
Fix an IP $(c, A, b)$. For a set $\sigma$ of branching constraints and a cut $(\alpha, \beta)$, let $x^*_\sigma(\alpha, \beta)$ denote the optimal solution of the LP relaxation with the constraints in $\sigma$ and the cut added. Define an equivalence relation $\sim$ on pairs of branching-constraint sets $\sigma_1, \sigma_2$ by $\sigma_1 \sim \sigma_2$ if $x^*_{\sigma_1}(\alpha, \beta) = x^*_{\sigma_2}(\alpha, \beta)$ for all possible cutting planes $(\alpha, \beta)$. The number of equivalence classes of $\sim$ is at most $(\tau + 2)^{2n}$.
By Cramer’s rule, every coordinate of an LP-feasible vertex is a ratio of determinants of square submatrices of the augmented matrix $(A \mid b)$; for integral data the denominator has magnitude at least 1, so $\tau \le \max_{A'} |\det(A')|$, where $A'$ is any square submatrix of $(A \mid b)$. This is at most $(a\sqrt{n})^n$ by Hadamard’s inequality, where $a$ is the maximum absolute value of any entry of $(A \mid b)$. However, $\tau$ can be much smaller in various cases. For example, if $A$ contains even one row $a_i^\top$ with only positive entries, then $\tau \le \lceil b_i / \min_j a_{ij} \rceil$, since $x \ge 0$.
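As a quick numerical illustration (our own toy data, assuming integral $A$ and $b$), the exhaustive determinant maximum can be far smaller than the worst-case Hadamard estimate:

```python
import numpy as np
from itertools import combinations

def det_bound(A, b):
    """Upper bound on tau via Cramer's rule for integral data: every vertex
    coordinate of {x : Ax <= b, x >= 0} is a ratio of determinants of square
    submatrices of (A | b), with an integral (hence >= 1) denominator."""
    Ab = np.hstack([A, b[:, None]]).astype(float)
    m, n1 = Ab.shape
    best = 0.0
    for k in range(1, min(m, n1) + 1):            # all square submatrices
        for rows in combinations(range(m), k):
            for cols in combinations(range(n1), k):
                best = max(best, abs(np.linalg.det(Ab[np.ix_(rows, cols)])))
    return best

A = np.array([[2, 3], [1, -1]])
b = np.array([12, 3])
k = min(A.shape[0], A.shape[1] + 1)
a = np.abs(np.hstack([A, b[:, None]])).max()
print(det_bound(A, b))            # 21.0: exact determinant maximum
print((a * np.sqrt(k)) ** k)      # 288.0: Hadamard worst-case estimate
```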
We will use the following notation in the remainder of this section. For a set $\sigma$ of branching constraints, let $A_\sigma$ and $b_\sigma$ denote the augmented constraint matrix and constraint vector when the constraints in $\sigma$ are added to $Ax \le b$. For a subset $E$ of the constraints, let $A_E$ and $b_E$ denote the restrictions of $A_\sigma$ and $b_\sigma$ to $E$. For $(\alpha, \beta) \in \mathbb{R}^{n+1}$ and $E$ with $|E| = n - 1$, let $A_E(\alpha)$ denote the matrix obtained by adding the row vector $\alpha^\top$ to $A_E$ and let $A^i_E(\alpha, \beta)$ be the matrix $A_E(\alpha)$ with the $i$th column replaced by $(b_E, \beta)$.
Lemma 4.2.
For any LP $\max\{c^\top x : Ax \le b\}$, there are at most $(\tau+2)^{2n}(m+2n)\binom{m+2n}{n-1}$ hyperplanes and at most $(\tau+2)^{2n}\binom{m+2n}{n-1}^2$ degree-2 polynomial hypersurfaces partitioning $\mathbb{R}^{n+1}$ into connected components such that for each component $C$ and every set of branching constraints $\sigma$, either: (1) $x^*_\sigma(\alpha, \beta) = x^*_\sigma$ for all $(\alpha, \beta) \in C$ (where $x^*_\sigma$ is the LP optimum under $\sigma$ without the cut), or (2) there is a set of constraints $E$ with $|E| = n - 1$ such that $x^*_\sigma(\alpha, \beta)[i] = \det(A^i_E(\alpha, \beta)) / \det(A_E(\alpha))$ for all $(\alpha, \beta) \in C$ and $i \in [n]$.
Proof sketch.
The same reasoning as in the proof of Theorem 3.1, applied to each reduced set of branching constraints from Lemma 4.1, yields a partition with the desired properties. ∎
Next, we refine the decomposition obtained in Lemma 4.2 so that the branching constraints added at each step of B&C are invariant within a region. Our results apply to the product scoring rule (Def. 2.2), which is used, for example, by the leading open-source solver SCIP [10].
Lemma 4.3.
There is a finite set of hyperplanes, degree-2 polynomial hypersurfaces, and degree-5 polynomial hypersurfaces (counted explicitly in Appendix C) partitioning $\mathbb{R}^{n+1}$ into connected components such that within each component, the branching constraints used at every step of B&C are invariant.
Proof sketch.
Fix a connected component $C$ in the decomposition established in Lemma 4.2. Then, for each $\sigma$, either $x^*_\sigma(\alpha, \beta) = x^*_\sigma$ or there exists $E$ such that $x^*_\sigma(\alpha, \beta)[i] = \det(A^i_E(\alpha, \beta))/\det(A_E(\alpha))$ for all $(\alpha, \beta) \in C$ and all $i \in [n]$. Now, if we are at a stage in the branch-and-cut tree where $\sigma$ is the list of branching constraints added so far, and the $i$th variable is being branched on next, the two constraints generated are $x_i \le \lfloor x^*_\sigma(\alpha, \beta)[i] \rfloor$ and $x_i \ge \lceil x^*_\sigma(\alpha, \beta)[i] \rceil$, respectively. If $C$ is a component where $x^*_\sigma(\alpha, \beta) = x^*_\sigma$, then there is nothing more to do, since the branching constraints at that point are trivially invariant over $C$. Otherwise, in order to further decompose $C$ such that the right-hand sides of these constraints are invariant for every $(\alpha, \beta)$ in the same piece, we add the two decision boundaries given by
$$\det(A^i_E(\alpha, \beta)) = \ell \det(A_E(\alpha)) \quad \text{and} \quad \det(A^i_E(\alpha, \beta)) = (\ell + 1)\det(A_E(\alpha))$$
for every $i \in [n]$, every relevant $E$, and every integer $\ell \in \{0, \ldots, \tau\}$. This ensures that within every connected component of $C$ induced by these boundaries (hyperplanes), $\lfloor x^*_\sigma(\alpha, \beta)[i] \rfloor$ and $\lceil x^*_\sigma(\alpha, \beta)[i] \rceil$ are invariant. A careful analysis of the definition of the product scoring rule provides the appropriate refinement of this partition. ∎
We now move to the most critical phase of branch-and-cut: deciding when to fathom a node. One reason a node might be fathomed is if the LP relaxation of the IP at that node has an integral solution. We derive conditions that ensure that nearby cuts have the same effect on the integrality of the original IP at any node in the search tree. Recall that $P_I$ is the set of integer points in $P$. Let $F \subseteq \mathbb{R}^{n+1}$ denote the set of all valid cuts for the input IP. The set $F$ is a polyhedron since it can be expressed as
$$F = \bigcap_{x \in P_I} \{(\alpha, \beta) \in \mathbb{R}^{n+1} : \alpha^\top x \le \beta\},$$
and $P_I$ is finite as $P$ is bounded. For cuts outside $F$, we assume the B&C tree takes some special form $\bot$ denoting an invalid cut. Our goal now is to decompose $F$ into connected components such that $\mathbb{1}[x^*_\sigma(\alpha, \beta) \in \mathbb{Z}^n]$ is invariant, for every $\sigma$, for all $(\alpha, \beta)$ in each component.
Lemma 4.4.
For any IP $(c, A, b)$, there is a finite set of hyperplanes, degree-2 polynomial hypersurfaces, and degree-5 polynomial hypersurfaces (counted explicitly in Appendix C) partitioning $F$ into connected components such that for each component $C$ and each $\sigma$, $\mathbb{1}[x^*_\sigma(\alpha, \beta) \in \mathbb{Z}^n]$ is invariant for all $(\alpha, \beta) \in C$.
Proof sketch.
Fix a connected component $C$ in the decomposition that includes the facets defining $F$ and the surfaces obtained in Lemma 4.3. For all $\sigma$, $i \in [n]$, and $\ell \in \{0, \ldots, \tau\}$, consider the surface
$$x^*_\sigma(\alpha, \beta)[i] = \ell. \qquad (5)$$
By Lemma 4.2, this surface is a hyperplane. Clearly, within any connected component of $C$ induced by these hyperplanes, for every $\sigma$ and $i$, $\mathbb{1}[x^*_\sigma(\alpha, \beta)[i] = \ell]$ is invariant. Finally, $x^*_\sigma(\alpha, \beta) \in \mathbb{Z}^n$ if and only if $x^*_\sigma(\alpha, \beta)[i] = \ell$ for some $\ell$, for every $i$, which means that $\mathbb{1}[x^*_\sigma(\alpha, \beta) \in \mathbb{Z}^n]$ is invariant for all cuts in that connected component. ∎
Suppose for a moment that a node is fathomed by B&C if and only if either the LP at that node is infeasible, or the LP optimal solution is integral—that is, the “bounding” of B&C is suppressed. In this case, the partition of $F$ obtained in Lemma 4.4 guarantees that the tree built by branch-and-cut is invariant within each connected component. Indeed, since the branching constraints at every node are invariant, and for every $\sigma$ the integrality of $x^*_\sigma(\alpha, \beta)$ is invariant, the (bounding-suppressed) B&C tree (and the order in which it is built) is invariant within each connected component in our decomposition. Equipped with this observation, we now analyze the full behavior of B&C.
Theorem 4.5.
Given an IP $(c, A, b)$, there is a finite set of polynomial hypersurfaces of degree at most 5 partitioning $F$ into connected components such that the branch-and-cut tree built after adding the cut $(\alpha, \beta)$ at the root is invariant over all $(\alpha, \beta)$ within a given component.
Proof sketch.
Fix a connected component $C$ in the decomposition induced by the set of hyperplanes and degree-2 and degree-5 hypersurfaces established in Lemma 4.4. Let
$$Q_1, \ldots, Q_{i_1}, I_1, Q_{i_1+1}, \ldots, Q_{i_2}, I_2, Q_{i_2+1}, \ldots \qquad (6)$$
denote the nodes of the tree branch-and-cut creates, in order of exploration, under the assumption that a node is pruned if and only if either the LP at that node is infeasible or the LP optimal solution is integral (so the “bounding” of branch-and-bound is suppressed). Here, a node is identified by the list of branching constraints added to the input IP. Nodes labeled by $Q$ are either infeasible or have fractional LP optimal solutions. Nodes labeled by $I$ have integral LP optimal solutions and are candidates for the incumbent integral solution at the point they are encountered. (The nodes are functions of $\alpha$ and $\beta$, as are the indices $i_1, i_2, \ldots$.) By Lemma 4.4 and the observation following it, this ordered list of nodes is invariant over all $(\alpha, \beta) \in C$.

Now, given a node index $\ell$, let $I(\ell)$ denote the incumbent node with the highest objective value encountered up until the $\ell$th node searched by B&C, and let $c^\top x^*_{I(\ell)}$ denote its objective value. For each node $Q_\ell$, let $\sigma_\ell$ denote the branching constraints added to arrive at node $Q_\ell$. The hyperplane
$$c^\top x^*_{\sigma_\ell}(\alpha, \beta) = c^\top x^*_{I(\ell)} \qquad (7)$$
(which is a hyperplane due to Lemma 4.2) partitions $C$ into two subregions. In one subregion, $c^\top x^*_{\sigma_\ell}(\alpha, \beta) \le c^\top x^*_{I(\ell)}$, that is, the objective value of the LP optimal solution is no greater than the objective value of the current incumbent integer solution, and so the subtree rooted at $Q_\ell$ is pruned. In the other subregion, $c^\top x^*_{\sigma_\ell}(\alpha, \beta) > c^\top x^*_{I(\ell)}$, and $Q_\ell$ is branched on further. Therefore, within each connected component of $C$ induced by all hyperplanes given by Equation (7) for all $\ell$, the set of nodes within the list (6) that are pruned is invariant. Combined with the surfaces established in Lemma 4.4, these hyperplanes partition $F$ into connected components such that as $(\alpha, \beta)$ varies within a given component, the tree built by branch-and-cut is invariant. ∎
5 Sample complexity bounds for B&C
In this section, we show how the results from the previous section can be used to provide sample complexity bounds for configuring B&C. Our results will apply to families of cuts parameterized by vectors $u$ from a set $\mathcal{U} \subseteq \mathbb{R}^m$, such as the family of GMI cuts from Definition 2.1. We assume there is an unknown, application-specific distribution $\mathcal{D}$ over IPs. The learner receives a training set of IPs sampled from this distribution. A sample complexity guarantee bounds the number of samples sufficient to ensure that for any parameter setting $u \in \mathcal{U}$, the B&C tree size on average over the training set is close to the expected B&C tree size. More formally, let $g_u(c, A, b)$ be the size of the tree B&C builds given the input $(c, A, b)$ after applying the cut defined by $u$ at the root. Given $\epsilon > 0$ and $\delta \in (0, 1)$, a sample complexity guarantee bounds the number of samples $N$ sufficient to ensure that with probability $1 - \delta$ over the draw $(c_1, A_1, b_1), \ldots, (c_N, A_N, b_N) \sim \mathcal{D}^N$, for every parameter setting $u \in \mathcal{U}$,
$$\left| \frac{1}{N} \sum_{j=1}^N g_u(c_j, A_j, b_j) - \mathbb{E}_{(c, A, b) \sim \mathcal{D}}[g_u(c, A, b)] \right| \le \epsilon. \qquad (8)$$
To derive our sample complexity guarantee, we use the notion of pseudo-dimension [32]. Let $\mathcal{G} = \{g_u : u \in \mathcal{U}\}$. The pseudo-dimension of $\mathcal{G}$, denoted $\operatorname{Pdim}(\mathcal{G})$, is the largest integer $N$ for which there exist $N$ IPs $(c_1, A_1, b_1), \ldots, (c_N, A_N, b_N)$ and $N$ thresholds $r_1, \ldots, r_N$ such that for every binary vector $(\sigma_1, \ldots, \sigma_N) \in \{0, 1\}^N$, there exists $g_u \in \mathcal{G}$ such that $g_u(c_j, A_j, b_j) \ge r_j$ if and only if $\sigma_j = 1$. Since the functions in $\mathcal{G}$ are bounded by the tree-size cap $\kappa$, the number of samples sufficient to ensure that Equation (8) holds is $N = O\big((\kappa/\epsilon)^2 (\operatorname{Pdim}(\mathcal{G}) + \ln(1/\delta))\big)$ [32]. Equivalently, for a given number of samples $N$, the left-hand side of Equation (8) can be bounded by $O\big(\kappa \sqrt{(\operatorname{Pdim}(\mathcal{G}) + \ln(1/\delta))/N}\big)$.
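For a sense of the magnitudes involved, the classical bound is a one-line calculation; the constant `C` below stands in for the unspecified universal constant from [32], and all numbers are illustrative.

```python
import math

def sample_complexity(pdim, kappa, eps, delta, C=1.0):
    """N = C * (kappa/eps)^2 * (pdim + ln(1/delta)) samples suffice for the
    uniform-convergence guarantee in Equation (8), up to the universal
    constant C from classical pseudo-dimension bounds [32]. Illustrative."""
    return math.ceil(C * (kappa / eps) ** 2 * (pdim + math.log(1 / delta)))

# e.g., tree sizes capped at kappa = 1000 nodes, accuracy eps = 100 nodes:
print(sample_complexity(pdim=50, kappa=1000, eps=100, delta=0.01))  # 5461
```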
So far, $\alpha$ and $\beta$ are parameters that do not depend on the input instance $(c, A, b)$. Suppose now that they do: $\alpha$ and $\beta$ are functions of $(A, b)$ and a parameter vector $u$ (as they are for GMI cuts). Despite the structure established in the previous section, if $\alpha$ and $\beta$ can depend on $(A, b)$ in arbitrary ways, one cannot even hope for a finite sample complexity, as illustrated by the following impossibility result. The full proofs of all results from this section are in Appendix D.
Theorem 5.1.
There exist functions $\alpha(u, A, b)$ and $\beta(u, A, b)$ such that
$$\operatorname{Pdim}(\{g_u : u \in \mathcal{U}\}) = \infty,$$
where $\mathcal{U} \subseteq \mathbb{R}$ is any set with $|\mathcal{U}| = \infty$.
However, in the case of GMI cuts, we show that the cutting plane coefficients parameterized by $u$ are highly structured. Combining this structure with our analysis of B&C allows us to derive polynomial sample complexity bounds.
Lemma 5.2.
Consider the family of GMI cuts parameterized by $u \in \mathcal{U} \subseteq \mathbb{R}^m$. There is a finite set of hyperplanes partitioning $\mathcal{U}$ into connected components such that $\lfloor u^\top a_i \rfloor$, $\lfloor u^\top b \rfloor$, and $\mathbb{1}[f_i \le f_0]$ are invariant, for every $i \in [n]$, within each component. (The set is finite because $\mathcal{U}$ is bounded, so only finitely many integer values of $\lfloor u^\top a_i \rfloor$ and $\lfloor u^\top b \rfloor$ arise.)
Proof sketch.
We have $f_i = u^\top a_i - \lfloor u^\top a_i \rfloor$ and $f_0 = u^\top b - \lfloor u^\top b \rfloor$. For all $i \in [n]$ and all integers $k_i$, the hyperplanes $u^\top a_i = k_i$ and $u^\top a_i = k_i + 1$ bound the region
$$k_i \le u^\top a_i < k_i + 1,$$
and the hyperplanes $u^\top b = k_0$ and $u^\top b = k_0 + 1$ bound the region
$$k_0 \le u^\top b < k_0 + 1.$$
In addition, for each $i \in [n]$, consider the hyperplane
$$u^\top a_i - k_i = u^\top b - k_0. \qquad (9)$$
Within any connected component of $\mathcal{U}$ determined by these hyperplanes, $\lfloor u^\top a_i \rfloor = k_i$ and $\lfloor u^\top b \rfloor = k_0$ are constant. Also, $\mathbb{1}[f_i \le f_0]$ is invariant within each component, since $f_i \le f_0$ if and only if $u^\top a_i - k_i \le u^\top b - k_0$, and the boundary of this condition is the hyperplane from Equation (9). The lemma follows by counting the hyperplanes. ∎
Let $\bar{\alpha} : \mathbb{R}^m \to \mathbb{R}^n$ denote the function taking GMI cut parameters $u$ to the corresponding vector of coefficients determining the resulting cutting plane, and let $\bar{\beta} : \mathbb{R}^m \to \mathbb{R}$ denote the offset of the resulting cutting plane. So (after multiplying through by $1 - f_0$),
$$\bar{\alpha}(u)[i] = \begin{cases} f_i (1 - f_0) & \text{if } f_i \le f_0 \\ f_0 (1 - f_i) & \text{if } f_i > f_0 \end{cases} \qquad \text{and} \qquad \bar{\beta}(u) = f_0 (1 - f_0)$$
(of course $f_0$ and each $f_i$ are functions of $u$, but we suppress this dependence for readability).
The next lemma allows us to transfer the polynomial partition of $\mathbb{R}^{n+1}$ from Theorem 4.5 to a polynomial partition of $\mathcal{U}$, incurring only a factor-2 increase in degree.
Lemma 5.3.
Let $p : \mathbb{R}^{n+1} \to \mathbb{R}$ be a polynomial of degree $d$. Let $C \subseteq \mathcal{U}$ be a connected component from Lemma 5.2. Define $q : C \to \mathbb{R}$ by $q(u) = p(\bar{\alpha}(u), \bar{\beta}(u))$. Then $q$ is a polynomial in $u$ of degree at most $2d$.
Proof.
By Lemma 5.2, there are integers $k_0, k_1, \ldots, k_n$ such that $\lfloor u^\top a_i \rfloor = k_i$ and $\lfloor u^\top b \rfloor = k_0$ for all $u \in C$. Also, the set $\{i : f_i \le f_0\}$ is fixed over all $u \in C$.

A degree-$d$ polynomial in the $n + 1$ variables $y_1, \ldots, y_{n+1}$ can be written as $p(y) = \sum_S \lambda_S \prod_{i \in S} y_i$ for some coefficients $\lambda_S \in \mathbb{R}$, where the sum is over multisets $S$ of $[n+1]$ of size at most $d$. Evaluating at $y = (\bar{\alpha}(u), \bar{\beta}(u))$, we get
$$q(u) = \sum_S \lambda_S \prod_{i \in S} (\bar{\alpha}(u), \bar{\beta}(u))[i].$$
Now, $f_i = u^\top a_i - k_i$ and $f_0 = u^\top b - k_0$ are linear in $u$ over $C$. The sum is over all multisets of size at most $d$, so each monomial consists of the product of at most $d$ degree-2 terms of the form $f_i(1 - f_0)$, $f_0(1 - f_i)$, or $f_0(1 - f_0)$. Thus, $q$ has degree at most $2d$, as desired. ∎
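The degree-doubling step can be verified symbolically on a toy instance; the linear forms chosen for $f_1$ and $f_0$ and the polynomial $p$ below are our own illustrative choices.

```python
import sympy as sp

u1, u2, y1, y2 = sp.symbols("u1 u2 y1 y2")
# On a fixed component of Lemma 5.2 the floors are constant, so f_1 and f_0
# are linear in u (illustrative coefficients):
f1 = 2*u1 + u2 - 1
f0 = u1 + 3*u2 - 2
alpha1 = f1 * (1 - f0)             # GMI coefficient on a component with f1 <= f0
beta = f0 * (1 - f0)               # GMI offset

p = y1**2 + 3*y1*y2 + y2 - 7       # an arbitrary degree-2 polynomial
q = sp.expand(p.subs({y1: alpha1, y2: beta}))
print(sp.total_degree(q, u1, u2))  # 4 = 2 * deg(p), as Lemma 5.3 predicts
```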
Applying Lemma 5.3 to every polynomial hypersurface in the partition of $\mathbb{R}^{n+1}$ established in Theorem 4.5 yields our main structural result for GMI cuts.
Lemma 5.4.
Consider the family of GMI cuts parameterized by $u \in \mathcal{U} \subseteq \mathbb{R}^m$. For any IP $(c, A, b)$, there is a finite set of hyperplanes and degree-10 polynomial hypersurfaces partitioning $\mathcal{U}$ into connected components such that the B&C tree built after adding the GMI cut defined by $u$ is invariant over all $u$ within a single component.
Bounding the pseudo-dimension of the class of tree-size functions is a direct application of the main theorem of Balcan et al. [6] along with standard results bounding the VC dimension of polynomial boundaries [2].
Theorem 5.5.
The pseudo-dimension of the class of tree-size functions $\{g_u : u \in \mathcal{U}\}$, on the domain of IPs with at most $n$ variables, at most $m$ constraints, and bounded entries, is polynomial in $m$ and $n$ and logarithmic in the bounds on the IP entries and the tree-size cap $\kappa$.
We generalize the analysis of this section to multiple GMI cuts at the root of the B&C tree in Appendix D. The analysis there is more involved since GMI cuts can be applied in sequence, re-solving the LP relaxation after each cut. In particular, each GMI cut applied in sequence has one more parameter than the previous one, so the hyperplane defined by each GMI cut depends (polynomially) on the parameters defining all GMI cuts before it. We show that if GMI cuts are sequentially applied at the root, the resulting partition of the parameter space is induced by polynomials whose degree grows exponentially in the number of sequentially applied cuts.
6 Conclusions
In this paper, we investigated fundamental questions about linear and integer programs: given an integer program, how many possible branch-and-cut trees are there if one or more additional feasible constraints can be added? Even more specifically, what is the structure of the branch-and-cut tree as a function of a set of additional constraints? Through a detailed geometric and combinatorial analysis of how additional constraints affect the LP relaxation’s optimal solution, we showed that the branch-and-cut tree is piecewise constant and precisely bounded the number of pieces. We showed that the structural understandings that we developed could be used to prove sample complexity bounds for configuring branch-and-cut.
Acknowledgements
This material is based on work supported by the National Science Foundation under grants CCF-1733556, CCF-1910321, IIS-1901403, and SES-1919453, the ARO under award W911NF2010081, the Defense Advanced Research Projects Agency under cooperative agreement HR00112020003, a Simons Investigator Award, an AWS Machine Learning Research Award, an Amazon Research Award, a Bloomberg Research Grant, and a Microsoft Research Faculty Fellowship.
References
- Achterberg [2007] Tobias Achterberg. Constraint Integer Programming. PhD thesis, Technische Universität Berlin, 2007.
- Anthony and Bartlett [2009] Martin Anthony and Peter Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009.
- Balas et al. [1996] Egon Balas, Sebastian Ceria, Gérard Cornuéjols, and N Natraj. Gomory cuts revisited. Operations Research Letters, 19(1):1–9, 1996.
- Balcan [2020] Maria-Florina Balcan. Data-driven algorithm design. In Tim Roughgarden, editor, Beyond Worst Case Analysis of Algorithms. Cambridge University Press, 2020.
- Balcan et al. [2018] Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In International Conference on Machine Learning (ICML), 2018.
- Balcan et al. [2021a] Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm design. In Annual Symposium on Theory of Computing (STOC), 2021a.
- Balcan et al. [2021b] Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, and Ellen Vitercik. Improved learning bounds for branch-and-cut. arXiv preprint arXiv:2111.11207, 2021b.
- Balcan et al. [2021c] Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, and Ellen Vitercik. Sample complexity of tree search configuration: Cutting planes and beyond. In Annual Conference on Neural Information Processing Systems (NeurIPS), 2021c.
- Bertsimas and Dunn [2017] Dimitris Bertsimas and Jack Dunn. Optimal classification trees. Machine Learning, 106(7):1039–1082, 2017.
- Bestuzheva et al. [2021] Ksenia Bestuzheva, Mathieu Besançon, Wei-Kun Chen, Antonia Chmiela, Tim Donkiewicz, Jasper van Doornmalen, Leon Eifler, Oliver Gaul, Gerald Gamrath, Ambros Gleixner, Leona Gottwald, Christoph Graczyk, Katrin Halbig, Alexander Hoen, Christopher Hojny, Rolf van der Hulst, Thorsten Koch, Marco Lübbecke, Stephen J. Maher, Frederic Matter, Erik Mühmer, Benjamin Müller, Marc E. Pfetsch, Daniel Rehfeldt, Steffan Schlein, Franziska Schlösser, Felipe Serrano, Yuji Shinano, Boro Sofranac, Mark Turner, Stefan Vigerske, Fabian Wegscheider, Philipp Wellner, Dieter Weninger, and Jakob Witzig. The SCIP Optimization Suite 8.0. Technical report, Optimization Online, December 2021. URL http://www.optimization-online.org/DB_HTML/2021/12/8728.html.
- Bunel et al. [2018] Rudy Bunel, Ilker Turkaslan, Philip H.S. Torr, Pushmeet Kohli, and M. Pawan Kumar. A unified view of piecewise linear neural network verification. In Annual Conference on Neural Information Processing Systems (NeurIPS), 2018.
- Chvátal [1973] Vašek Chvátal. Edmonds polytopes and a hierarchy of combinatorial problems. Discrete mathematics, 4(4):305–337, 1973.
- Cook et al. [1986] William Cook, Albertus MH Gerards, Alexander Schrijver, and Éva Tardos. Sensitivity theorems in integer linear programming. Mathematical Programming, 34(3):251–264, 1986.
- Cornuéjols [2007] Gérard Cornuéjols. Revival of the Gomory cuts in the 1990’s. Annals of Operations Research, 149(1):63–66, 2007.
- Cornuéjols and Li [2001] Gérard Cornuéjols and Yanjun Li. Elementary closures for integer programs. Operations Research Letters, 28(1):1–8, 2001.
- Gomory [1960] Ralph Gomory. An algorithm for the mixed integer problem. Research Report RM-2597, The Rand Corporation, 1960.
- Gomory [1958] Ralph E. Gomory. Outline of an algorithm for integer solutions to linear programs. Bulletin of the American Mathematical Society, 64(5):275–278, 1958.
- Gupta and Roughgarden [2017] Rishi Gupta and Tim Roughgarden. A PAC approach to application-specific algorithm selection. SIAM Journal on Computing, 46(3):992–1017, 2017.
- Huang et al. [2022] Zeren Huang, Kerong Wang, Furui Liu, Hui-Ling Zhen, Weinan Zhang, Mingxuan Yuan, Jianye Hao, Yong Yu, and Jun Wang. Learning to select cuts for efficient mixed-integer programming. Pattern Recognition, 123:108353, 2022.
- Hutter et al. [2009] Frank Hutter, Holger H Hoos, Kevin Leyton-Brown, and Thomas Stützle. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36(1):267–306, 2009. ISSN 1076-9757.
- Jeroslow [1974] Robert G Jeroslow. Trivial integer programs unsolvable by branch-and-bound. Mathematical Programming, 6(1):105–109, 1974.
- Kappes et al. [2013] Jörg Hendrik Kappes, Markus Speth, Gerhard Reinelt, and Christoph Schnörr. Towards efficient and exact map-inference for large scale discrete computer vision problems via combinatorial optimization. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1752–1758. IEEE, 2013.
- Khashabi et al. [2016] Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Peter Clark, Oren Etzioni, and Dan Roth. Question answering via integer programming over semi-structured knowledge. In International Joint Conference on Artificial Intelligence (IJCAI), 2016.
- Kleinberg et al. [2017] Robert Kleinberg, Kevin Leyton-Brown, and Brendan Lucier. Efficiency through procrastination: Approximately optimal algorithm configuration with runtime guarantees. In International Joint Conference on Artificial Intelligence (IJCAI), 2017.
- Kleinberg et al. [2019] Robert Kleinberg, Kevin Leyton-Brown, Brendan Lucier, and Devon Graham. Procrastinating with confidence: Near-optimal, anytime, adaptive algorithm configuration. Annual Conference on Neural Information Processing Systems (NeurIPS), 2019.
- Land and Doig [1960] Ailsa H Land and Alison G Doig. An automatic method of solving discrete programming problems. Econometrica, pages 497–520, 1960.
- Li [1993] Wu Li. The sharp lipschitz constants for feasible and optimal solutions of a perturbed linear program. Linear algebra and its applications, 187:15–40, 1993.
- Mangasarian and Shiau [1987] Olvi L Mangasarian and T-H Shiau. Lipschitz continuity of solutions of linear inequalities, programs and complementarity problems. SIAM Journal on Control and Optimization, 25(3):583–595, 1987.
- Milnor [1964] John Milnor. On the Betti numbers of real varieties. Proceedings of the American Mathematical Society, 15(2):275–280, 1964.
- Miyauchi et al. [2018] Atsushi Miyauchi, Tomohiro Sonobe, and Noriyoshi Sukegawa. Exact clustering via integer programming and maximum satisfiability. In AAAI Conference on Artificial Intelligence, 2018.
- Nemhauser and Wolsey [1999] George Nemhauser and Laurence Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, 1999.
- Pollard [1984] David Pollard. Convergence of Stochastic Processes. Springer, 1984.
- Sandholm [2013] Tuomas Sandholm. Very-large-scale generalized combinatorial multi-attribute auctions: Lessons from conducting $60 billion of sourcing. In Zvika Neeman, Alvin Roth, and Nir Vulkan, editors, Handbook of Market Design. Oxford University Press, 2013.
- Tang et al. [2020] Yunhao Tang, Shipra Agrawal, and Yuri Faenza. Reinforcement learning for integer programming: Learning to cut. International Conference on Machine Learning (ICML), 2020.
- Thom [1965] René Thom. Sur l’homologie des varietes algebriques reelles. In Differential and combinatorial topology, pages 255–265. Princeton University Press, 1965.
- Warren [1968] Hugh E Warren. Lower bounds for approximation by nonlinear manifolds. Transactions of the American Mathematical Society, 133(1):167–178, 1968.
- Zeng et al. [2017] Jiaming Zeng, Berk Ustun, and Cynthia Rudin. Interpretable classification models for recidivism prediction. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(3):689–722, 2017.
Appendix A Further details about plots
The version of the facility location problem we study involves a set of locations $L$ and a set of clients $C$. Facilities are to be constructed at some subset of the locations, and the clients in $C$ are served by these facilities. Each location $i \in L$ has a cost $f_i$ of being the site of a facility, and a cost $c_{ij}$ of serving client $j \in C$. Finally, each location $i$ has a capacity $k_i$, which is a limit on the number of clients $i$ can serve. The goal of the facility location problem is to arrive at a feasible set of locations for facilities and a feasible assignment of clients to these locations that minimizes the overall cost incurred.
The facility location problem can be formulated as the following IP, with binary variables $y_i$ indicating whether a facility is opened at location $i$ and $x_{ij}$ indicating whether client $j$ is served by location $i$:
$$\begin{aligned} \min \ & \sum_{i \in L} f_i y_i + \sum_{i \in L} \sum_{j \in C} c_{ij} x_{ij} \\ \text{s.t.} \ & \sum_{i \in L} x_{ij} = 1 && \text{for all } j \in C \\ & \sum_{j \in C} x_{ij} \le k_i y_i && \text{for all } i \in L \\ & x_{ij}, y_i \in \{0, 1\} && \text{for all } i \in L,\ j \in C. \end{aligned}$$
We consider the following two distributions over facility location IPs.
First distribution
Facility location IPs are generated by perturbing the costs and capacities of a base facility location IP. We generated the base IP by choosing the location costs and client-location costs uniformly at random from a fixed interval and the capacities uniformly at random from a fixed integer range. To sample from the distribution, we perturb this base IP by adding independent Gaussian noise with mean 0 and a fixed standard deviation to the cost of each location, the cost of each client-location pair, and the capacity of each location.
Second distribution
Facility location IPs are generated by placing evenly spaced locations along a line segment in the Cartesian plane. The location costs are all set to a common constant. Then, clients are placed uniformly at random in the unit square $[0, 1]^2$. The cost of serving client $j$ from location $i$ is the Euclidean distance between $i$ and $j$. Location capacities are chosen uniformly at random from a fixed integer range.
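The sketch below generates instances from the two distributions just described; all concrete parameters (instance sizes, noise scale, cost and capacity ranges) are placeholders of ours, since the exact experimental values are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbed_instance(base_f, base_c, base_k, sigma=1.0):
    """First distribution (sketch): Gaussian perturbation of a base instance.
    base_f: location costs, base_c: client-location costs, base_k: capacities."""
    f = base_f + rng.normal(0, sigma, base_f.shape)
    c = base_c + rng.normal(0, sigma, base_c.shape)
    k = np.maximum(1, np.round(base_k + rng.normal(0, sigma, base_k.shape)))
    return f, c, k

def line_instance(n_locations=20, n_clients=20, cap_range=(1, 5)):
    """Second distribution (sketch): locations evenly spaced on a segment,
    clients uniform in the unit square, service cost = Euclidean distance."""
    locs = np.column_stack([np.linspace(0, 1, n_locations), np.zeros(n_locations)])
    clients = rng.uniform(0, 1, size=(n_clients, 2))
    f = np.ones(n_locations)                      # common facility cost
    c = np.linalg.norm(locs[:, None, :] - clients[None, :, :], axis=2)
    k = rng.integers(cap_range[0], cap_range[1] + 1, n_locations)
    return f, c, k
```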
In our experiments, we add five cuts at the root of the B&C tree. These five cuts come from the set of Chvátal-Gomory and Gomory mixed integer cuts derived from the optimal simplex tableau of the LP relaxation. The five cuts added are chosen to maximize a $\mu$-weighted combination of cutting-plane scores:
$$\mu \cdot \operatorname{parallelism}(\alpha, \beta) + (1 - \mu) \cdot \operatorname{efficacy}(\alpha, \beta). \qquad (10)$$
The parallelism of a cut intuitively measures the angle formed by the objective vector and the normal vector of the cutting plane—promoting cutting planes that are nearly parallel with the objective direction. The efficacy, or depth, of a cut measures the perpendicular distance from the LP optimum to the cut—promoting cutting planes that are “deeper”, as measured with respect to the LP optimum. More details about these scoring rules can be found in Balcan et al. [8] and references therein. Given an IP, for each $\mu \in [0, 1]$ (discretized at regular steps) we choose the five cuts among the set of Chvátal-Gomory and Gomory mixed integer cuts that maximize (10). Figures 1 and 2 display the average tree size over 1000 samples drawn from the respective distribution for each value of $\mu$ used to choose cuts at the root. We ran our experiments using the C API of IBM ILOG CPLEX 20.1.0, with default cut generation disabled.
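Concretely, the two scores and their weighted combination in Equation (10) can be computed as follows (standard formulas; the function names are ours):

```python
import numpy as np

def parallelism(c, alpha):
    """Cosine of the angle between objective c and cut normal alpha; close to 1
    when the cut is nearly parallel to the objective direction."""
    return abs(np.dot(c, alpha)) / (np.linalg.norm(c) * np.linalg.norm(alpha))

def efficacy(alpha, beta, x_lp):
    """Perpendicular distance from the LP optimum x_lp to the cut
    alpha^T x <= beta (positive when the cut separates x_lp)."""
    return (np.dot(alpha, x_lp) - beta) / np.linalg.norm(alpha)

def weighted_score(c, alpha, beta, x_lp, mu):
    """The mu-weighted combination of Equation (10); the five candidate cuts
    with the highest scores are added at the root."""
    return mu * parallelism(c, alpha) + (1 - mu) * efficacy(alpha, beta, x_lp)
```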
Appendix B Omitted results and proofs from Section 3
B.1 Example in two dimensions
Consider a two-variable LP $\max\{c^\top x : Ax \le b\}$ whose optimum $x^*_{LP}$ lies at the intersection of two constraint hyperplanes $a_1^\top x = b_1$ and $a_2^\top x = b_2$. Consider adding an additional constraint $\alpha^\top x \le \beta$, where $\alpha = (\alpha_1, \alpha_2)$. We derive a description of the set of parameters $(\alpha, \beta)$ such that the cut hyperplane $\alpha^\top x = \beta$ intersects the hyperplanes $a_1^\top x = b_1$ and $a_2^\top x = b_2$. The intersection of $\alpha^\top x = \beta$ and $a_1^\top x = b_1$ is the solution of a $2 \times 2$ linear system, which exists if and only if the system's determinant is nonzero. This intersection point is in the LP feasible region if and only if it satisfies the remaining constraints (which additionally ensures the determinant condition). Similarly, $\alpha^\top x = \beta$ intersects $a_2^\top x = b_2$ at a point that exists if and only if the corresponding determinant is nonzero, and this intersection point is in the LP feasible region if and only if it satisfies the remaining constraints. Now, we put down an “indifference” curve in $(\alpha_1, \alpha_2, \beta)$-space that represents the set of $(\alpha, \beta)$ such that the value of the objective achieved at the two aforementioned intersection points is equal. Clearing the (nonzero) denominators, this is a degree-2 curve in $(\alpha_1, \alpha_2, \beta)$. In this example, the left-hand side can be factored into a product of two linear terms, so the curve is given by two planes. Figure 3 illustrates the resulting partition of $(\alpha_1, \alpha_2, \beta)$-space.

It turns out that when $n = 2$ the indifference curve can always be factored into a product of linear terms. Let the objective of the LP be $c = (c_1, c_2)$, and let $a_1^\top x = b_1$ and $a_2^\top x = b_2$ be two intersecting edges of the LP feasible region. Let $\alpha^\top x \le \beta$ be an additional constraint. The intersection points of this constraint with the two lines, if they exist, are given by Cramer's rule as in the proof of Theorem 3.1. The indifference surface is the set of $(\alpha, \beta)$ where the objective values at these two intersection points coincide. For $(\alpha, \beta)$ such that both intersection points exist, clearing denominators and some manipulation yields a degree-2 polynomial in $(\alpha, \beta)$ that factors into a product of two linear terms. This curve therefore consists of two planes.

This is however not true if $n \ge 3$. For example, consider an LP in three variables. Writing out the indifference surface for the vertex created by the cut on one edge and the vertex created on another edge yields a degree-2 polynomial in the cut parameters that does not factor into linear terms. Setting one of the cut parameters to a constant, we can plot the resulting surface, which is visibly nonlinear (Figure 4).

B.2 Linear programming sensitivity for multiple constraints
Lemma B.1.
Let $\max\{c^\top x : Ax \le b\}$ be an LP with feasible region $P$ and let $M$ denote the set of its constraints. Let $x^*_{LP}$ and $z^*_{LP}$ denote the optimal solution and its objective value, respectively. For $E \subseteq M$, let $A_E$ and $b_E$ denote the restrictions of $A$ and $b$ to $E$. For $\alpha_1, \ldots, \alpha_k \in \mathbb{R}^n$, $\beta_1, \ldots, \beta_k \in \mathbb{R}$, $J \subseteq [k]$, and $E \subseteq M$ with $|E| = n - |J|$, let $A_E(\{\alpha_j\}_{j \in J})$ denote the matrix obtained by adding the row vectors $\alpha_j^\top$, $j \in J$, to $A_E$, and let $A^i_E(\{\alpha_j, \beta_j\}_{j \in J})$ be that matrix with the $i$th column replaced by $(b_E, \{\beta_j\}_{j \in J})$. There is a set of at most $k$ hyperplanes, $m \sum_{j=1}^k \binom{k}{j}\binom{m}{n-j}$ degree-$k$ polynomial hypersurfaces, and $\sum_{j=1}^k \binom{k}{j}\binom{m}{n-j}^2$ degree-$2k$ polynomial hypersurfaces partitioning $\mathbb{R}^{k(n+1)}$ into connected components such that for each component $C$, one of the following holds: either (1) $x^*_{LP}(\alpha_1, \beta_1, \ldots, \alpha_k, \beta_k) = x^*_{LP}$, or (2) there is a subset of cuts indexed by $J \subseteq [k]$ and a set of constraints $E$ with $|E| = n - |J|$ such that
$$x^*_{LP}(\alpha_1, \beta_1, \ldots, \alpha_k, \beta_k)[i] = \frac{\det(A^i_E(\{\alpha_j, \beta_j\}_{j \in J}))}{\det(A_E(\{\alpha_j\}_{j \in J}))}$$
for all cut vectors in $C$ and $i \in [n]$.
Proof.
First, if none of the cuts $\alpha_1^\top x \le \beta_1, \ldots, \alpha_k^\top x \le \beta_k$ separate $x^*_{LP}$, then $x^*_{LP}(\alpha_1, \beta_1, \ldots, \alpha_k, \beta_k) = x^*_{LP}$ and $z^*(\alpha_1, \beta_1, \ldots, \alpha_k, \beta_k) = z^*_{LP}$. The set of all such cut vectors is given by the intersection of the $k$ halfspaces in $\mathbb{R}^{k(n+1)}$ given by
$$\alpha_j^\top x^*_{LP} \le \beta_j, \quad j = 1, \ldots, k. \qquad (11)$$
All other vectors of cuts contain at least one cut that separates $x^*_{LP}$, and those cuts therefore pass through $P$. The new LP optimum is thus achieved at a vertex created by the cuts that separate $x^*_{LP}$. As in the proof of Theorem 3.1, we consider all possible new vertices formed by our set of cuts. In the case of a single cut, these new vertices necessarily were on edges of $P$, but now they may lie on higher-dimensional faces.

Consider a subset of cuts that separate $x^*_{LP}$. Without loss of generality, denote these cuts by $\alpha_1^\top x \le \beta_1, \ldots, \alpha_j^\top x \le \beta_j$. We now establish conditions for these cuts to “jointly” form a new vertex of $P \cap \{x : \alpha_1^\top x \le \beta_1, \ldots, \alpha_j^\top x \le \beta_j\}$. Any vertex created by these cuts must lie on a face $f$ of $P$ with $\dim f = j$ (in the case that $j = n$, the relevant face with $\dim f = n$ is $P$ itself). Letting $M$ denote the set of constraints that define $P$, each dimension-$j$ face of $P$ can be identified with a (potentially empty) subset $E \subseteq M$ of size $n - j$ such that $f$ is precisely the set of all points $x$ such that
$$a_i^\top x = b_i \ \text{ for all } i \in E \quad \text{and} \quad a_i^\top x \le b_i \ \text{ for all } i \in M \setminus E,$$
where $a_i^\top$ is the $i$th row of $A$. Let $A_E$ denote the restriction of $A$ to only the rows in $E$, and let $b_E$ denote the entries of $b$ corresponding to the constraints in $E$. Consider removing the inequality constraints defining the face. The intersection of the cuts and this unbounded surface (if it exists) is precisely the solution to the system of $n$ linear equations in $n$ variables
$$A_E x = b_E, \quad \alpha_1^\top x = \beta_1, \ldots, \alpha_j^\top x = \beta_j.$$
Let $A_E(\alpha_1, \ldots, \alpha_j)$ denote the matrix obtained by adding row vectors $\alpha_1^\top, \ldots, \alpha_j^\top$ to $A_E$, and let $A^i_E(\alpha_1, \ldots, \alpha_j, \beta_1, \ldots, \beta_j)$ denote the matrix where the $i$th column is replaced by
$$(b_E, \beta_1, \ldots, \beta_j).$$
By Cramer’s rule, the solution to this system is given by
$$x[i] = \frac{\det(A^i_E(\alpha_1, \ldots, \alpha_j, \beta_1, \ldots, \beta_j))}{\det(A_E(\alpha_1, \ldots, \alpha_j))},$$
and the value of the objective at this point is
$$\sum_{i=1}^n c_i \frac{\det(A^i_E(\alpha_1, \ldots, \alpha_j, \beta_1, \ldots, \beta_j))}{\det(A_E(\alpha_1, \ldots, \alpha_j))}.$$
Now, to ensure that the unique intersection point (1) exists and (2) actually lies on $f$ (or simply lies in $P$, in the case that $j = n$), we stipulate that it satisfies the inequality constraints in $M \setminus E$. That is,
$$a_\ell^\top \left( \frac{\det(A^1_E(\cdots))}{\det(A_E(\cdots))}, \ldots, \frac{\det(A^n_E(\cdots))}{\det(A_E(\cdots))} \right) \le b_\ell \qquad (12)$$
for every $\ell \in M \setminus E$. If the cut vector satisfies any of these constraints, it must be that $\det(A_E(\alpha_1, \ldots, \alpha_j)) \ne 0$, which guarantees that the system indeed has a unique solution. Now, $\det(A_E(\alpha_1, \ldots, \alpha_j))$ is a polynomial in the cut vector of degree $j$, since it is multilinear in each coefficient of each $\alpha_t$, $t \in [j]$. Similarly, $\det(A^i_E(\alpha_1, \ldots, \alpha_j, \beta_1, \ldots, \beta_j))$ is a polynomial of degree $j$, again because it is multilinear in each cut parameter. Hence, the boundary of each constraint of the form given by Equation (12) is a polynomial hypersurface of degree at most $k$.

The collection of these polynomials for every constraint in $M \setminus E$, every subset of the cuts of size $j$, and every face of $P$ of dimension $j$, along with the hyperplanes determining separation constraints (Equation (11)), partition $\mathbb{R}^{k(n+1)}$ into connected components such that for all cut vectors within a given connected component, there is a fixed subset of the cuts and a fixed set of faces of $P$ such that the cuts with indices in that subset intersect every face in the set at a common vertex.

Now, consider a single connected component, denoted by $C$. Let $f_1, \ldots, f_t$ denote the faces intersected by vectors of cuts in $C$, and let (without loss of generality) $\alpha_1^\top x \le \beta_1, \ldots, \alpha_j^\top x \le \beta_j$ denote the subset of cuts that intersect these faces. Let $E_1, \ldots, E_t$ denote the sets of constraints that are binding at each of these faces, respectively. For each pair $p, q \in [t]$, consider the surface
$$\sum_{i=1}^n c_i \frac{\det(A^i_{E_p}(\cdots))}{\det(A_{E_p}(\cdots))} = \sum_{i=1}^n c_i \frac{\det(A^i_{E_q}(\cdots))}{\det(A_{E_q}(\cdots))},$$
which can be equivalently written as
$$\sum_{i=1}^n c_i \det(A^i_{E_p}(\cdots)) \det(A_{E_q}(\cdots)) = \sum_{i=1}^n c_i \det(A^i_{E_q}(\cdots)) \det(A_{E_p}(\cdots)). \qquad (13)$$
This is a degree-$2k$ polynomial hypersurface in the cut vector. This hypersurface is precisely the set of all cut vectors for which the LP objective value achieved at the vertex on face $f_p$ is equal to the LP objective value achieved at the vertex on face $f_q$. The collection of these surfaces for each $p, q \in [t]$ partitions $C$ into further connected components. Within each of these connected components, the face containing the vertex that maximizes the objective is invariant, and the subset of cuts passing through that vertex is invariant. If $E$ is the set of binding constraints representing this face, and $\alpha_1, \ldots, \alpha_j$ represent the subset of cuts intersecting this face, $x^*_{LP}$ and $z^*$ have the closed forms
$$x^*_{LP}(\cdots)[i] = \frac{\det(A^i_E(\alpha_1, \ldots, \alpha_j, \beta_1, \ldots, \beta_j))}{\det(A_E(\alpha_1, \ldots, \alpha_j))}$$
and
$$z^*(\cdots) = \sum_{i=1}^n c_i \frac{\det(A^i_E(\alpha_1, \ldots, \alpha_j, \beta_1, \ldots, \beta_j))}{\det(A_E(\alpha_1, \ldots, \alpha_j))}$$
for all cut vectors within this component. We now count the number of surfaces used to obtain our decomposition. First, we added $k$ hyperplanes encoding separation constraints for each of the cuts (Equation (11)). Then, for every subset of the cuts of size $j$, and for every face of $P$ with dimension $j$, we first considered at most $m$ polynomial hypersurfaces of degree at most $k$ representing decision boundaries for when the cuts in the subset intersected that face (Equation (12)). The number of $j$-dimensional faces of $P$ is at most $\binom{m}{n-j}$, so the total number of these hypersurfaces is at most $m \sum_{j=1}^k \binom{k}{j} \binom{m}{n-j}$. Finally, we considered a degree-$2k$ polynomial hypersurface for every subset of cuts and every pair of faces with dimension equal to the size of the subset, of which there are at most $\sum_{j=1}^k \binom{k}{j} \binom{m}{n-j}^2$. ∎
Appendix C Omitted results and proofs from Section 4
Proof of Lemma 4.1.
Consider as an example $\sigma_1 = \{x_1 \le 2, x_1 \le 1\}$ and $\sigma_2 = \{x_1 \le 1\}$ (the specific bounds here are illustrative). We have $x^*_{\sigma_1}(\alpha, \beta) = x^*_{\sigma_2}(\alpha, \beta)$ for any cut $(\alpha, \beta)$, because the constraint $x_1 \le 2$ is redundant in $\sigma_1$. More generally, any $\sigma$ can be reduced by preserving, for each variable, only the tightest $\le$ constraint and the tightest $\ge$ constraint, without affecting the resulting LP optimal solutions. The number of such unique reduced sets is at most $(\tau + 2)^{2n}$ (for each variable, there are $\tau + 2$ possibilities for the tightest $\le$ constraint: no constraint or one of $x_i \le 0, \ldots, x_i \le \tau$, and similarly $\tau + 2$ possibilities for the $\ge$ constraint). ∎
Proof of Lemma 4.2.
We carry out the same reasoning as in the proof of Theorem 3.1 for each reduced $\sigma$. The number of edges of $P \cap \sigma$ is at most $\binom{m + 2n}{n - 1}$. For each edge, we considered at most $m + 2n$ hyperplanes, for a total of at most $(m + 2n)\binom{m + 2n}{n - 1}$ halfspaces. Then, we had a degree-2 polynomial hypersurface for every pair of edges, of which there are at most $\binom{m + 2n}{n - 1}^2$. Summing over all reduced $\sigma$ (of which there are at most $(\tau + 2)^{2n}$), combined with the fact that if $\sigma$ is reduced then $|\sigma| \le 2n$, we get a total of at most $(\tau + 2)^{2n}(m + 2n)\binom{m + 2n}{n - 1}$ hyperplanes and at most $(\tau + 2)^{2n}\binom{m + 2n}{n - 1}^2$ degree-2 hypersurfaces, as desired. ∎
Proof of Lemma 4.4.
Fix a connected component $C$ in the decomposition that includes the facets defining $F$ and the surfaces obtained in Lemma 4.3. For all $\sigma$, $i \in [n]$, and $\ell \in \{0, \ldots, \tau\}$, consider the surface
$$x^*_\sigma(\alpha, \beta)[i] = \ell. \qquad (14)$$
This surface is a hyperplane, since by Lemma 4.2, either $x^*_\sigma(\alpha, \beta)[i]$ is constant over $C$ or $x^*_\sigma(\alpha, \beta)[i] = \det(A^i_E(\alpha, \beta))/\det(A_E(\alpha))$, where $E$ is the subset of constraints corresponding to $C$ and $\sigma$; in the latter case Equation (14) reads $\det(A^i_E(\alpha, \beta)) = \ell \det(A_E(\alpha))$, which is linear in $(\alpha, \beta)$. Clearly, within any connected component of $C$ induced by these hyperplanes, for every $\sigma$ and $i$, $\mathbb{1}[x^*_\sigma(\alpha, \beta)[i] = \ell]$ is invariant. Finally, $x^*_\sigma(\alpha, \beta) \in \mathbb{Z}^n$ if and only if $x^*_\sigma(\alpha, \beta)[i] = \ell$ for some $\ell$, for every $i$, which means that $\mathbb{1}[x^*_\sigma(\alpha, \beta) \in \mathbb{Z}^n]$ is invariant for all cuts in that connected component.

We now count the number of hyperplanes given by Equation (14). For each $\sigma$, there are at most $\binom{m + 2n}{n - 1}$ sets of binding edge constraints defining the formula of Lemma 4.2, and we have $n(\tau + 1)$ hyperplanes for each (one per variable $i$ and integer $\ell$); since $x^*_\sigma(\alpha, \beta)$ is LP-feasible, only $\ell \le \tau$ is relevant. So the total number of hyperplanes given by Equation (14) is at most $n(\tau + 1)(\tau + 2)^{2n}\binom{m + 2n}{n - 1}$. The number of facets defining $F$ is at most $|P_I|$. Adding these to the counts obtained in Lemma 4.3 yields the final tallies in the lemma statement. ∎
Proof of Theorem 4.5.
Fix a connected component $C$ in the decomposition induced by the set of hyperplanes and degree-2 and degree-5 hypersurfaces established in Lemma 4.4. Let
$$Q_1, \ldots, Q_{i_1}, I_1, Q_{i_1+1}, \ldots, Q_{i_2}, I_2, Q_{i_2+1}, \ldots \qquad (15)$$
denote the nodes of the tree branch-and-cut creates, in order of exploration, under the assumption that a node is pruned if and only if either the LP at that node is infeasible or the LP optimal solution is integral (so the “bounding” of branch-and-bound is suppressed). Here, a node is identified by the list of branching constraints added to the input IP. Nodes labeled by $Q$ are either infeasible or have fractional LP optimal solutions. Nodes labeled by $I$ have integral LP optimal solutions and are candidates for the incumbent integral solution at the point they are encountered. (The nodes are functions of $\alpha$ and $\beta$, as are the indices $i_1, i_2, \ldots$.) By Lemma 4.4 and the observation following it, this ordered list of nodes is invariant over all $(\alpha, \beta) \in C$.

Now, given a node index $\ell$, let $I(\ell)$ denote the incumbent node with the highest objective value encountered up until the $\ell$th node searched by B&C, and let $c^\top x^*_{I(\ell)}$ denote its objective value. For each node $Q_\ell$, let $\sigma_\ell$ denote the branching constraints added to arrive at node $Q_\ell$. The hyperplane
$$c^\top x^*_{\sigma_\ell}(\alpha, \beta) = c^\top x^*_{I(\ell)} \qquad (16)$$
(which is a hyperplane due to Lemma 4.2) partitions $C$ into two subregions. In one subregion, $c^\top x^*_{\sigma_\ell}(\alpha, \beta) \le c^\top x^*_{I(\ell)}$, that is, the objective value of the LP optimal solution is no greater than the objective value of the current incumbent integer solution, and so the subtree rooted at $Q_\ell$ is pruned. In the other subregion, $c^\top x^*_{\sigma_\ell}(\alpha, \beta) > c^\top x^*_{I(\ell)}$, and $Q_\ell$ is branched on further. Therefore, within each connected component of $C$ induced by all hyperplanes given by Equation (16) for all $\ell$, the set of nodes within the list (15) that are pruned is invariant. Combined with the surfaces established in Lemma 4.4, these hyperplanes partition $F$ into connected components such that as $(\alpha, \beta)$ varies within a given component, the tree built by branch-and-cut is invariant.
Finally, we count the total number of surfaces inducing this partition. Unlike the counting stages of the previous lemmas, we first have to count the number of connected components induced by the surfaces established in Lemma 4.4. This is because the ordered list of nodes explored by branch-and-cut (15) can be different across each component, and the hyperplanes given by Equation (16) depend on this list. To determine the connected components induced by the zero sets of the polynomials from Lemma 4.4, it suffices to consider the zero set of the product of all polynomials defining these surfaces. Denote this product polynomial by $h$. The degree of $h$ is the sum of the degrees of its factors: the hyperplanes contribute degree 1 each, the degree-2 hypersurfaces degree 2 each, and the degree-5 hypersurfaces degree 5 each. By Warren’s theorem, the number of connected components of $\{(\alpha, \beta) : h(\alpha, \beta) \ne 0\}$ is $O(\deg(h)^{n+1})$, and by the Milnor–Thom theorem, the number of connected components of $\{(\alpha, \beta) : h(\alpha, \beta) = 0\}$ is $O(\deg(h)^{n+1})$ as well. For every connected component $C$ in Lemma 4.4, the closed form of $x^*_\sigma(\alpha, \beta)$ is already determined due to Lemma 4.2, and so the number of hyperplanes given by Equation (16) is at most the number of possible pairs $(\sigma_\ell, I(\ell))$, which is at most $(\tau + 2)^{4n}$. So across all connected components, the total number of hyperplanes given by Equation (16) is at most $(\tau + 2)^{4n}$ times the number of components. Finally, adding this to the surface counts established in Lemma 4.4 yields the theorem statement. ∎
C.1 Product scoring rule for variable selection
Let $\sigma$ be the set of branching constraints added thus far. The product scoring rule branches on the variable $x_i$ that maximizes
$$\max\{c^\top x^*_\sigma(\alpha, \beta) - c^\top x^*_{\sigma_i^-}(\alpha, \beta),\ \gamma\} \cdot \max\{c^\top x^*_\sigma(\alpha, \beta) - c^\top x^*_{\sigma_i^+}(\alpha, \beta),\ \gamma\},$$
where $\sigma_i^- = \sigma \cup \{x_i \le \lfloor x^*_\sigma(\alpha, \beta)[i] \rfloor\}$ and $\sigma_i^+ = \sigma \cup \{x_i \ge \lceil x^*_\sigma(\alpha, \beta)[i] \rceil\}$.
Lemma C.1.
There is a set of at most $2n(\tau + 1)(\tau + 2)^{2n}\binom{m + 2n}{n - 1}$ hyperplanes, in addition to the hyperplanes and degree-2 polynomial hypersurfaces of Lemma 4.2, partitioning $\mathbb{R}^{n+1}$ into connected components such that for any connected component $C$ and any $\sigma$, the set of branching constraints $\{\sigma_i^-, \sigma_i^+ : i \in [n]\}$ is invariant across all $(\alpha, \beta) \in C$.
Proof.
Fix a connected component in the decomposition established in Lemma 4.2. By Lemma 4.2, for each , either the LP is infeasible or its optimal solution admits a closed form that holds across the component. Fix a variable , which corresponds to two branching constraints
(17)
If the component is one on which the LP at this node is infeasible, then these two branching constraints are trivially invariant over it. Otherwise, in order to further decompose the component so that the right-hand sides of these constraints are invariant for every , we add the two decision boundaries given by
for every , , and every integer in the relevant range. This ensures that within every connected component of induced by these boundaries (hyperplanes), the floor and ceiling of the branched coordinate of the LP optimum are invariant, so the branching constraints from Equation (17) are invariant. For a fixed , there are two hyperplanes for every corresponding to an edge of and , for a total of at most hyperplanes. Summing over all reduced , we get a total of hyperplanes. Adding these hyperplanes to the set of hyperplanes established in Lemma 4.2 yields the lemma statement. ∎
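In code, the invariance established in this proof is simply that the floor and ceiling of the branched LP coordinate are locally constant in the parameters, with the integer level sets of that coordinate as the decision boundaries. A minimal sketch, with hypothetical names:

```python
import math

def branching_pair(x_star_j):
    """The two branching constraints of Equation (17) as a function of the
    j-th coordinate of the LP optimum (assumed fractional, since integral
    nodes are not branched): x_j <= floor(x_star_j), x_j >= ceil(x_star_j)."""
    return math.floor(x_star_j), math.ceil(x_star_j)

def same_branching_cell(x_star_j_at_u, x_star_j_at_v):
    """Two parameter settings yield the same branching constraints exactly
    when no integer level set x_star_j = k separates them -- these level
    sets are the boundaries added in the proof above."""
    return math.floor(x_star_j_at_u) == math.floor(x_star_j_at_v)
```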
Proof of Lemma 4.3.
Fix a connected component in the decomposition established in Lemma C.1. We know that for each set of branching constraints :
• By Lemma 4.2, either the LP at that node is infeasible or its optimal solution admits a closed form that holds for all parameters in the component, and
• the set of branching constraints is invariant across all parameters in the component.
Suppose that is the list of branching constraints added so far. For any variable , let
Within a fixed connected component, these quantities are fixed. With this notation, we can write the product scoring rule as
where $\gamma$ is the same small constant as above.
By Lemma 4.2, we know that across all , either or there exists such that
and similarly for , defined according to some edge set . Therefore, for each , there is a single degree-2 polynomial hypersurface partitioning into connected components such that within each connected component, either
(18)
or vice versa, and similarly for . In particular, the former hypersurface will have one of four forms:
1. a comparison between two constants, which is uniformly satisfied or unsatisfied across all parameters in the component,
2. a comparison between a constant and a ratio of two linear functions, which gives a hyperplane,
3. a comparison between a ratio of two linear functions and a constant, which likewise gives a hyperplane, or
4. a comparison between two ratios of linear functions, which after cross-multiplying gives a degree-2 polynomial hypersurface.
In short, all of these are (possibly degenerate) degree-2 polynomial hypersurfaces.
Within any region induced by these hypersurfaces, the comparison between any two variables and will have the form
which at its most complex will equal
(19)
The boundary of this inequality is a degree-5 polynomial hypersurface. In any region induced by these hypersurfaces, the variable that branch-and-cut branches on is fixed.
We now count the total number of hypersurfaces. First, we count the number of degree-2 polynomial hypersurfaces from Equation (18): there is a hypersurface defined by each variable , set of branching constraints , cutoff such that , set corresponding to an edge of , and set (and similarly for and ). For a fixed , this amounts to hypersurfaces. Summing over all reduced , we have degree-2 polynomial hypersurfaces.
Next, we count the number of degree-5 polynomial hypersurfaces from Equation (19): there is a hypersurface defined by each pair of variables , set of branching constraints , cutoffs such that and , and sets corresponding to edges of . For a fixed , this amounts to hypersurfaces. Summing over all reduced , we have degree-5 polynomial hypersurfaces.
Adding these hypersurfaces to those from Lemma C.1, we get the lemma statement. ∎
C.2 Extension to multiple cutting planes
We can similarly derive a multi-cut version of Lemma 4.2 that controls the closed form of the LP optimal solution for any set of branching constraints. We use the following notation. Let be an LP and let denote the set of its constraints. For , let and denote the restrictions of and to . For , , and with , let denote the matrix obtained by adding row vectors to and let be the matrix with the th column replaced by .
Corollary C.2.
Fix an IP . There is a set of at most hyperplanes, degree- polynomial hypersurfaces, and degree- polynomial hypersurfaces partitioning into connected components such that for each component and every , one of the following holds: either (1) the LP is infeasible, or (2) there is a subset of cuts indexed by and a set of constraints with such that
for all .
Proof.
The exact same reasoning as in the proof of Lemma B.1 applies. We still have hyperplanes. Now, for each , for each subset with , and for every face of with , we have at most degree- polynomial hypersurfaces. The number of -dimensional faces of is at most , so the total number of these hypersurfaces is at most . Finally, for every , we considered degree- polynomial hypersurfaces for every subset of cuts and every pair of faces with degree equal to the size of the subset, of which there are at most , as desired. ∎
We now refine the decomposition obtained in Lemma 4.2 so that the branching constraints added at each step of branch-and-cut are invariant within a region. For ease of exposition, we assume that branch-and-cut uses a lexicographic variable selection policy. This means that the variable branched on at each node of the search tree is fixed and given by the lexicographic ordering . Generalizing the argument to work for other policies, such as the product scoring rule, can be done as in the single-cut case.
Lemma C.3.
Suppose branch-and-cut uses a lexicographic variable selection policy. Then, there is a set of at most hyperplanes, degree- polynomial hypersurfaces, and degree- polynomial hypersurfaces partitioning into connected components such that within each connected component, the branching constraints used at every step of branch-and-cut are invariant.
Proof.
Fix a connected component in the decomposition established in Corollary C.2. Then, by Corollary C.2, for each , either the LP is infeasible or there exist cuts (without loss of generality labeled by indices ) and there exists such that
for all and all . Now, if we are at a stage in the branch-and-cut tree where is the list of branching constraints added so far, and the th variable is being branched on next, the two constraints generated are
respectively. If the component is one on which the LP at this node is infeasible, then there is nothing more to do, since the branching constraints at that point are trivially invariant over the component. Otherwise, in order to further decompose it such that the right-hand sides of these constraints are invariant for every and every , we add the two decision boundaries given by
for every , , and every integer in the relevant range. This ensures that within every connected component of induced by these boundaries (degree- polynomial hypersurfaces), the floor and ceiling of the branched coordinate of the LP optimum are invariant, so the branching constraints added by, for example, a lexicographic branching rule, are invariant. For a fixed , there are two hypersurfaces for every subset , every corresponding to a -dimensional face of , and every , for a total of at most . Summing over all reduced , we get a total of hypersurfaces. Adding these hypersurfaces to the set of hypersurfaces established in Corollary C.2 yields the lemma statement. ∎
Now, as in the single-cut case, we consider the constraints that ensure that all cuts are valid. Let denote the set of all vectors of valid cuts. As before, this set is a polyhedron, since it can be written as the intersection, over all integer points feasible for the IP, of the linear constraints requiring each cut to be satisfied by that point.
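To see why this set is a polyhedron, note that every integer-feasible point imposes one linear constraint on the cut coefficients, and the set of valid cuts is the intersection of the resulting halfspaces. The following brute-force sketch (suitable only for tiny instances; the name is_valid_cut and the a-priori coordinate bound box are ours) checks exactly those constraints:

```python
import itertools
import numpy as np

def is_valid_cut(alpha, beta, A, b, box):
    """Check that the cut alpha^T x <= beta is satisfied by every
    integer-feasible point of {x in Z^n : Ax <= b, 0 <= x <= box}.
    Each such point x contributes the linear constraint alpha^T x <= beta
    on the pair (alpha, beta), which is why valid cuts form a polyhedron."""
    alpha, A, b = np.asarray(alpha), np.asarray(A), np.asarray(b)
    n = A.shape[1]
    for x in itertools.product(range(box + 1), repeat=n):
        x = np.asarray(x)
        if np.all(A @ x <= b) and alpha @ x > beta:
            return False  # a feasible integer point is cut off: invalid
    return True
```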
We now refine our decomposition further to control the integrality of the various LP solutions at each node of branch-and-cut.
Lemma C.4.
Given an IP , there is a set of at most hyperplanes, degree- polynomial hypersurfaces, and degree- polynomial hypersurfaces partitioning into connected components such that for each component and each set of branching constraints, the integrality pattern of the LP optimal solution is invariant for all vectors of cuts in the component.
Proof.
Fix a connected component in the decomposition that includes the facets defining and the surfaces obtained in Lemma C.3. For all , , and , consider the surface
(20)
This surface is a polynomial hypersurface of degree at most , due to Corollary C.2. Clearly, within any connected component induced by these hypersurfaces, the integrality of each coordinate of the LP optimal solution is invariant. Finally, if some coordinate of the LP optimal solution is fractional for some vector of cuts within a given connected component, then the solution is non-integral for all vectors of cuts in that connected component.
We now count the number of hypersurfaces given by Equation 20. For each , there are possible subsets of cut indices and at most binding face constraints defining the formula of Corollary C.2. For each subset-face pair, there are degree- polynomial hypersurfaces given by Equation 20. So the total number of such hypersurfaces over all is at most . The number of facets defining is at most . Adding these to the counts obtained in Lemma C.3 yields the final tallies in the lemma statement. ∎
At this point, as in the single-cut case, if the bounding aspect of branch-and-cut is suppressed, our decomposition yields connected components over which the branch-and-cut tree built is invariant. We now prove our main structural theorem for B&C as a function of multiple cutting planes at the root.
Theorem C.5.
Given an IP , there is a set of at most polynomial hypersurfaces of degree at most partitioning into connected components such that the branch-and-cut tree built after adding the cuts at the root is invariant over all within a given component. In particular, is invariant over each connected component.
Proof.
Fix a connected component in the decomposition induced by the set of hyperplanes, degree- hypersurfaces, and degree- hypersurfaces established in Lemma C.4. Let
(21)
denote the nodes of the tree branch-and-cut creates, in order of exploration, under the assumption that a node is pruned if and only if either the LP at that node is infeasible or the LP optimal solution is integral (so the “bounding” of branch-and-bound is suppressed). Here, a node is identified by the list of branching constraints added to the input IP. Nodes labeled by are either infeasible or have fractional LP optimal solutions. Nodes labeled by have integral LP optimal solutions and are candidates for the incumbent integral solution at the point they are encountered. (The nodes are functions of the cut parameters, as are the indices.) By Lemma C.4, this ordered list of nodes is invariant for all vectors of cuts in the component.
Now, given a node index , let denote the incumbent node with the highest objective value encountered up until the th node searched by B&C, and let denote its objective value. For each node , let denote the branching constraints added to arrive at node . The hyperplane
(22)
(which is a hyperplane due to Corollary C.2) partitions the component into two subregions. In one subregion, the objective value of the LP optimal solution is no greater than the objective value of the current incumbent integer solution, and so the subtree rooted at the node is pruned. In the other subregion, the LP objective value is strictly greater, and the node is branched on further. Therefore, within each connected component induced by all hyperplanes given by Equation 22, ranging over all nodes in the list, the set of nodes within the list (21) that are pruned is invariant. Combined with the surfaces established in Lemma C.4, these hyperplanes partition the parameter space into connected components such that as the vector of cuts varies within a given component, the tree built by branch-and-cut is invariant.
Finally, we count the total number of surfaces inducing this partition. Unlike the counting stages of the previous lemmas, we will first have to count the number of connected components induced by the surfaces established in Lemma C.4. This is because the ordered list of nodes explored by branch-and-cut (21) can be different across each component, and the hyperplanes given by Equation 22 depend on this list. From Lemma C.4 we have polynomial hypersurfaces of degree . The set of all such that lies on the boundary of any of these surfaces is precisely the zero set of the product of all polynomials defining these surfaces. Denote this product polynomial by . The degree of the product polynomial is the sum of the degrees of polynomials of degree , which is at most . By Warren’s theorem, the number of connected components of the complement of its zero set is , and by the Milnor-Thom theorem, the number of connected components of the zero set itself is as well. So the number of connected components induced by the surfaces in Lemma C.4 is bounded accordingly. For every such connected component, the closed form of the LP optimal solution is already determined due to Corollary C.2, and so the number of hyperplanes given by Equation 22 is at most the number of possible , which is at most . So across all connected components, the total number of hyperplanes given by Equation 22 is bounded by the product of these two counts. Finally, adding this to the surface counts established in Lemma C.4 yields the theorem statement. ∎
Appendix D Omitted results from Section 5
Proof of Theorem 5.1.
For a set , denotes the set of finite sequences of elements from . There is a bijection between the set of IPs and , so IPs can be uniquely represented as real numbers (and vice versa). Now, consider the set of all finite sequences of pairs of IPs and labels of the form , , that is, the set . There is a bijection between this set and , and in turn there is a bijection between and . Hence, there exists a bijection between and . Fix such a bijection , and let denote the inverse of , which is well defined and also a bijection.
Let be odd. For , let denote the IP
(23)
Since is odd, is infeasible, independent of . Jeroslow [21] showed that without the use of cutting planes or heuristics, branch-and-bound builds a tree of size before determining infeasibility and terminating. The objective is irrelevant to this behavior, but it is what generates distinct IPs with this property. Consider the cut , which is a valid cut for (this is in fact a Chvátal-Gomory cut [8]). In particular, since is odd, , so the equality constraint of is violated by this cut. Thus, the feasible region of the LP relaxation after adding this cut is empty, and branch-and-bound will terminate immediately at the root (building a tree of size ). Denote this cut by . On the other hand, let be the trivial cut . Adding this cut to the IP constraints does not change the feasible region, so branch-and-bound will build a tree of size .
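As a sanity check, the following sketch verifies both claims with scipy's LP solver, assuming the standard form of Jeroslow's instance (2(x_1 + ... + x_n) = n with n odd and 0 <= x_i <= 1): the plain LP relaxation is feasible at x_i = 1/2, while adding the Chvátal-Gomory cut x_1 + ... + x_n <= floor(n/2) empties it, so branch-and-cut terminates at the root.

```python
import numpy as np
from scipy.optimize import linprog

def jeroslow_lp_feasible(n, add_cut):
    """LP relaxation of the Jeroslow-style instance: 2 * sum(x) = n with
    0 <= x_i <= 1 and n odd.  With add_cut=True, the CG cut
    sum(x) <= floor(n/2) is appended, which contradicts sum(x) = n/2."""
    assert n % 2 == 1
    c = np.zeros(n)                          # the objective is irrelevant here
    A_eq, b_eq = [2 * np.ones(n)], [n]       # 2 * sum(x) = n
    A_ub, b_ub = (([np.ones(n)], [n // 2]) if add_cut else (None, None))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * n)
    return res.success

print(jeroslow_lp_feasible(7, add_cut=False))  # True: x_i = 1/2 is feasible
print(jeroslow_lp_feasible(7, add_cut=True))   # False: the root LP is empty
```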
We now define and . Let
The choice to use in the case that either for each , or and is arbitrary and unimportant. Now, for any integer , constructing a set of IPs and thresholds that is shattered is almost immediate. Let be distinct reals, and let . Then, the set can be shattered. Indeed, given a sign pattern , let
Then, if , , so and . If , , so and . So for any there is a set of IPs and thresholds that can be shattered, which yields the theorem statement.∎
Proof of Lemma 5.2.
We first record that $u^\top a_i$ for each $i$ and $u^\top b$ are bounded, since the multipliers $u$ range over a bounded domain. Now, for each $i$ and each integer $k$ in the corresponding range, put down the hyperplanes defining the two halfspaces
$u^\top a_i \le k$ and $u^\top a_i \ge k$ (24)
and the hyperplanes defining the two halfspaces
$u^\top b \le k$ and $u^\top b \ge k$ (25)
In addition, consider the hyperplane
$u^\top a_i - \lfloor u^\top a_i \rfloor = u^\top b - \lfloor u^\top b \rfloor$ (26)
for each $i$. Within any connected component determined by these hyperplanes, the floors $\lfloor u^\top a_i \rfloor$ and $\lfloor u^\top b \rfloor$ are constant. Furthermore, the comparison between the fractional parts $u^\top a_i - \lfloor u^\top a_i \rfloor$ and $u^\top b - \lfloor u^\top b \rfloor$ is invariant within each connected component, since once the floors are fixed, the decision boundary between the two outcomes is precisely the hyperplane given by Equation 26. The total number of hyperplanes of type 24 is , the total number of hyperplanes of type 25 is , and the total number of hyperplanes of type 26 is . Summing yields the lemma statement. ∎
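To make the objects of this proof concrete, here is a minimal sketch of how a GMI cut is assembled from a multiplier vector u, assuming the standard pure-integer form of the cut and assuming f_0 > 0 (all names are ours). The floors and the fractional-part comparisons appearing below are exactly the quantities that the hyperplanes of types 24–26 hold fixed, which is why the cut's coefficients are fixed linear functions of u within a component.

```python
import numpy as np

def gmi_cut(u, A, b):
    """GMI cut from multipliers u for the aggregated row u^T A x = u^T b,
    x >= 0 integral.  With f_j = frac(u . a_j) and f0 = frac(u . b), the
    cut is sum_j pi_j x_j >= f0, where pi_j = f_j if f_j <= f0 and
    pi_j = f0 * (1 - f_j) / (1 - f0) otherwise.  Requires f0 > 0, i.e.
    u . b fractional."""
    u, A, b = np.asarray(u, float), np.asarray(A, float), np.asarray(b, float)
    f = u @ A - np.floor(u @ A)   # fractional parts f_j, one per column a_j
    f0 = u @ b - np.floor(u @ b)  # fractional part of the right-hand side
    pi = np.where(f <= f0, f, f0 * (1 - f) / (1 - f0))
    return pi, f0                 # the cut reads: pi . x >= f0
```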
Proof of Lemma 5.4.
Let be a connected component in the partition established in Theorem 4.5, so can be written as the intersection of at most polynomial constraints of degree at most . Let be a connected component in the partition established in Lemma 5.2. By Lemma 5.3, there are at most polynomials of degree at most partitioning into connected components such that within each component, is invariant. If we consider the overlay of these polynomial surfaces over all components , we will get a partition of such that for every , is invariant over each connected component of . Once we have this we are done, since all in the same connected component of will be sent to the same connected component of by , and thus by Theorem 4.5 the behavior of branch-and-cut will be invariant.
We now tally up the total number of surfaces. The number of connected components was given by Warren’s theorem and the Milnor-Thom theorem to be , so the total number of degree- hypersurfaces is times this quantity, which yields the lemma statement.∎
D.1 Multiple GMI cuts at the root
In this section we extend our results to allow for multiple GMI cuts at the root of the B&C tree. These cuts can be added simultaneously, sequentially, or in rounds. If GMI cuts , are added simultaneously, both of them have the same dimension and are defined in the usual way. If GMI cuts , are added sequentially, has one more entry than . This is because when cuts are added sequentially, the LP relaxation is re-solved after the addition of the first cut, and the second cut has a multiplier for all original constraints as well as for the first cut (this ensures that the second cut can be chosen in a more informed manner). If cuts are made at the root, they can be added in sequential rounds of simultaneous cuts. In the following discussion, we focus on the case where all cuts are added sequentially—the other cases can be viewed as instantiations of this. We refer the reader to the discussion in Balcan et al. [8] for more details.
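A minimal sketch of the sequential protocol, reusing the gmi_cut helper sketched in the proof of Lemma 5.2 above (the function name and the >=-form bookkeeping of appended rows are ours):

```python
import numpy as np

def add_sequential_gmi_cuts(A, b, us):
    """Append GMI cuts one at a time: the k-th multiplier vector us[k] has
    m + k entries, one per original constraint plus one per previously
    added cut, matching the sequential setting described above."""
    A_cur, b_cur = np.asarray(A, float), np.asarray(b, float)
    for u in us:
        u = np.asarray(u, float)
        assert u.shape[0] == A_cur.shape[0]  # one multiplier per current row
        pi, f0 = gmi_cut(u, A_cur, b_cur)
        A_cur = np.vstack([A_cur, pi])       # the next cut sees this row too
        b_cur = np.append(b_cur, f0)
    return A_cur, b_cur
```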
To prove an analogous result for multiple GMI cuts (in sequence, that is, each successive GMI cut has one more parameter than the previous), we combine the reasoning used in the single-GMI-cut case with some technical observations in Balcan et al. [8].
Lemma D.1.
Consider the family of sequential GMI cuts parameterized by . For any IP , there are at most degree- polynomial hypersurfaces and degree- polynomial hypersurfaces partitioning the parameter space into connected components such that the B&C tree built after sequentially adding the GMI cuts defined by is invariant over all within a single component.
Proof.
We start with the setup used by Balcan et al. [8] to prove similar results for sequential Chvátal-Gomory cuts. Let be the columns of . We define the following augmented columns for each , and the augmented constraint vectors via the following recurrences:
and
for . In other words, is the th column of the constraint matrix of the IP and is the constraint vector after applying cuts . An identical induction argument to that of Balcan et al. [8] shows that for each ,
and
Now, as in the single-GMI-cut setting, consider the surfaces
(27)
and
(28)
for every , and every integer and every integer . In addition, consider the surfaces
(29)
for each . As observed by Balcan et al. [8], is a polynomial in of degree at most (as is ), so surfaces 27, 28, and 29 are all degree- polynomial hypersurfaces for all . Within any connected component of induced by these hypersurfaces, and are constant. Furthermore is invariant for every , where and .
Now, fix a connected component induced by the above hypersurfaces, and let be the intersection of polynomial inequalities of degree at most . Consider a single degree- polynomial inequality in variables , which can be written as
Now, the sets defined by are fixed within , so we can write this as
We have that and are degree- polynomials in . Since the sum is over all multisets such that , there are at most terms across the products, each of the form , , or . Therefore, the left-hand side is a polynomial of degree at most , and if is the intersection of polynomial inequalities each of degree at most , the set
can be expressed as the intersection of degree- polynomial inequalities.
To finish, we run this process for every connected component in the partition established by Theorem C.5. This partition consists of degree- polynomials over . By Warren’s theorem and the Milnor-Thom theorem, these polynomials partition into connected components. Running the above argument for each of these connected components of yields a total of polynomials of degree . Finally, we count the surfaces of the form (27), (28), and (29). The total number of degree- polynomials of type 27 is at most , the total number of degree- polynomials of type 28 is , and the total number of degree- polynomials of type 29 is . Summing these counts yields the desired number of surfaces in the lemma statement.
In any connected component of determined by these surfaces, is invariant for every connected component in the partition of established in Theorem C.5. This means that the tree built by branch-and-cut is invariant, which concludes the proof. ∎
Finally, applying the main result of Balcan et al. [6] to Lemma D.1, we get the following pseudo-dimension bound for the class of sequential GMI cuts at the root of the B&C tree.
Theorem D.2.
For , let denote the number of nodes in the tree B&C builds given the input after sequentially applying the GMI cuts defined by at the root. The pseudo-dimension of the set of functions on the domain of IPs with and is