This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts

Maria-Florina Balcan Computer Science Department, Machine Learning Department, Carnegie Mellon University. [email protected]    Siddharth Prasad Computer Science Department, Carnegie Mellon University. [email protected]    Tuomas Sandholm Computer Science Department, Carnegie Mellon University, Optimized Markets, Inc., Strategic Machine, Inc., Strategy Robot, Inc. [email protected]    Ellen Vitercik Department of Electrical Engineering and Computer Sciences, UC Berkeley. [email protected]
Abstract

The incorporation of cutting planes within the branch-and-bound algorithm, known as branch-and-cut, forms the backbone of modern integer programming solvers. These solvers are the foremost method for solving discrete optimization problems and thus have a vast array of applications in machine learning, operations research, and many other fields. Choosing cutting planes effectively is a major research topic in the theory and practice of integer programming. We conduct a novel structural analysis of branch-and-cut that pins down how every step of the algorithm is affected by changes in the parameters defining the cutting planes added to the input integer program. Our main application of this analysis is to derive sample complexity guarantees for using machine learning to determine which cutting planes to apply during branch-and-cut. These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers. We exploit geometric and combinatorial structure of branch-and-cut in our analysis, which provides a key missing piece for the recent generalization theory of branch-and-cut.

1 Introduction

Integer programming (IP) solvers are the most widely-used tools for solving discrete optimization problems. They have numerous applications in machine learning, operations research, and many other fields, including MAP inference [22], combinatorial auctions [33], natural language processing [23], neural network verification [11], interpretable classification [37], training of optimal decision trees [9], and optimal clustering [30], among many others.

Under the hood, IP solvers use the tree-search algorithm branch-and-bound [26] augmented with cutting planes, known as branch-and-cut (B&C). A cutting plane is a linear constraint that is added to the LP relaxation at any node of the search tree. With a carefully selected cutting plane, the LP guidance can more efficiently lead B&C to the globally optimal integral solution. Cutting planes, specifically the family of Gomory mixed integer cuts which we study in this paper, are responsible for breakthrough speedups of modern IP solvers [14].

Successfully employing cutting planes can be challenging because there are infinitely many cuts to choose from and there are still many open questions about which cuts to employ when. A growing body of research has studied the use of machine learning for cut selection [34, 8, 7, 19]. In this paper, we analyze a machine learning setting where there is an unknown distribution over IPs—for example, a distribution over a shipping company’s routing problems. The learner receives a training set of IPs sampled from this distribution which it uses to learn cut parameters with strong average performance over the training set (leading, for example, to small search trees). We provide sample complexity bounds for this procedure, which bound the number of training instances sufficient to ensure that if a set of cut parameters leads to strong average performance over the training set, it will also lead to strong expected performance on future IPs from the same distribution. These guarantees apply no matter what procedure is used to optimize the cut parameters over the training set—optimal or suboptimal, automated or manual. We prove these guarantees by analyzing how the B&C tree varies as a function of the cut parameters on any IP. By bounding the “intrinsic complexity” of this function, we are able to provide our sample complexity bounds.

Refer to caption
Figure 1: Facility location with 40 locations and 40 clients. Samples generated by perturbing a base facility location IP.
Refer to caption
Figure 2: Facility location with 80 locations, 80 clients, and random Euclidean distance costs.

Figures 1 and 2 illustrate the need for distribution-dependent policies for choosing cutting planes. We plot the average number of nodes expanded by B&C as a function of a parameter μ\mu that controls its cut-selection policy, as we detail in Appendix A. In each figure, we draw a training set of facility location integer programs from two different distributions. In Figure 1, we define the distribution by starting with a uniformly random facility location instance and perturbing its costs. In Figure 2, the costs are more structured: the facilities are located along a line and the clients have uniformly random locations. In Figure 1, a smaller value of μ\mu leads to small search trees, but in Figure 2, a larger value of μ\mu is preferable. These figures illustrate that tuning cut parameters according to the instance distribution at hand can have a large impact on the performance of B&C, and that for one instance distribution, the best parameters for cut evaluation can be very different—in fact opposite—than the optimal parameters for another instance distribution.

The key challenge we face in developing a theory for cutting planes is that a cut added at the root remains in the LP relaxations stored in each node all the way to the leaves, thereby impacting the LP guidance that B&C uses to search throughout the whole tree. Tiny changes to any cut can thus completely change the entire course of B&C. At its core, our analysis therefore involves understanding an intricate interplay between the continuous and discrete components of our problem. The first, continuous component requires us to characterize how the solution the LP relaxation changes as a function of its constraints. This optimal solution will move continuously through space until it jumps from one vertex of the LP tableau to another. We then use this characterization to analyze how the B&C tree—a discrete, combinatorial object—varies as a function of its LP guidance.

1.1 Our contributions

Our first main contribution (Section 3) addresses a fundamental question: how does an LP’s solution change when new constraints are added? As the constraints vary, the solution will jump from vertex to vertex of the LP polytope. We prove that one can partition the set of all possible constraint vectors into a finite number of regions such that within any one region, the LP’s solution has a clean closed form. Moreover, we prove that the boundaries defining this partition have a specific form, defined by degree-2 polynomials.

We build on this result to prove our second main contribution (Section 4), which analyzes how the entire B&C search tree changes as a function of the cuts added at the root. To prove this result, we analyze how every aspect of B&C—the variables branched on, the nodes selected to expand, and the nodes fathomed—changes as a function of the LP relaxations that are computed throughout the search tree. We prove that the set of all possible cuts can be partitioned into a finite number of regions such that within any one region, B&C builds the exact same search tree.

This result allows us to prove sample complexity bounds for learning high-performing cutting planes from the class of Gomory mixed integer (GMI) cuts, our third main contribution (Section 5). GMI cuts are one of the most important families of cutting planes in the field of integer programming. Introduced by Gomory [16], they dominate most other families of cutting planes [15], and are perhaps most directly responsible for the realization that a branch-and-cut framework is necessary for the speeds now achievable by modern IP solvers [3]. A historical account of these cuts is provided by Cornuéjols [14]. The structural results from Section 4 allow us to understand the “intrinsic complexity” of B&C’s performance as a function of the GMI cuts it uses. We quantify this notion of intrinsic complexity using pseudo-dimension [32], which then implies a sample complexity bound.

1.2 Related research

Learning to cut.

This paper helps develop a theory of generalization for cutting plane selection. This line of inquiry began with a paper by Balcan et al. [8], who studied Chvátal-Gomory cuts for (pure) integer programs (IPs). Unlike that work, which exploited the fact that there are only finitely many distinct Chvátal-Gomory cuts for a given IP, our analysis of GMI cuts is far more involved.

The main distinction between our analysis in this paper and the techniques used in previous papers on generalization guarantees for integer programming [5, 8, 7] can be summarized as follows. Let 𝝁\bm{\mu} be a (potentially multidimensional) parameter controlling some aspect of the IP solver (e.g. a mixture parameter between branching rules or a cutting-plane parameter). In previous works, as 𝝁\bm{\mu} varied, there were only a finite number of states each node of branch-and-cut could be in. For example, in the case of branching/variable selection, 𝝁\bm{\mu} controls the additional branching constraint added to the IP at any given node of the search tree. There are only finitely many possible branching constraints, so there are only finitely many possible “child” IPs induced by 𝝁\bm{\mu}. Similarly, if 𝝁\bm{\mu} represents the parameterization for Chvátal-Gomory cuts [12, 17], since Balcan et al. [8] showed that there are only finitely many distinct Chvátal-Gomory cuts for a given IP, as 𝝁\bm{\mu} varies, there are only finitely many possible child IPs induced by 𝝁\bm{\mu} at any stage of the search tree. However, in many settings, this property does not hold. For example if 𝝁=(𝜶,β)\bm{\mu}=(\bm{\alpha},\beta) controls the normal vector and offset of an additional feasible constraint 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta, there are infinitely many possible IPs corresponding to the choice of (𝜶,β)(\bm{\alpha},\beta). Similarly, if 𝝁\bm{\mu} controls the parameterization of a GMI cut, there are infinitely many IPs corresponding to the choice of 𝝁\bm{\mu} (unlike Chvátal-Gomory cuts). In this paper, we develop a new structural understanding of B&C that is significantly more involved than the structural results in prior work.

This paper ties in to a broader line of research that provides sample complexity bounds for algorithm configuration [e.g., 18, 6]. A chapter by Balcan [4] provides a comprehensive survey.

There have also been several papers that study how to use machine learning for cut selection from an applied perspective [34, 19]. In contrast, the goal of this paper is to provide theoretical guarantees.

Sensitivity analysis of integer and linear programs.

A related line of research studied the sensitivity of LPs, and to a lesser extent IPs, to changes in their parameters. Mangasarian and Shiau [28] and Li [27], for example, show that the optimal solution to an LP is a Lipschitz function of the right-hand-side of its constraints but not of its objective. Cook et al. [13] study how the set of optimal solutions to an IP changes as the objective function varies and the right-hand-side of the constraints varies. This paper fits in to this line of research as we study how the solution to an LP varies as new rows are added. This function is not Lipschitz, but we show that it is well-structured.

2 Notation and branch-and-cut background

Integer and linear programs.

An integer program (IP) is defined by an objective vector 𝒄n\bm{c}\in\mathbb{R}^{n}, a constraint matrix Am×nA\in\mathbb{Z}^{m\times n}, and a constraint vector 𝒃m\bm{b}\in\mathbb{Z}^{m}, with the form

max{𝒄T𝒙:A𝒙𝒃,𝒙𝟎,𝒙n}.\max\{\bm{c}^{T}\bm{x}:A\bm{x}\leq\bm{b},\bm{x}\geq\bm{0},\bm{x}\in\mathbb{Z}^{n}\}. (1)

The linear programming (LP) relaxation is formed by removing the integrality constraints:

max{𝒄T𝒙:A𝒙𝒃,𝒙𝟎}.\max\{\bm{c}^{T}\bm{x}:A\bm{x}\leq\bm{b},\bm{x}\geq\bm{0}\}. (2)

We denote the optimal solution to (1) by 𝒙𝖨𝖯\bm{x}^{*}_{\mathsf{IP}}. We denote the optimal solution to  (2) by 𝒙𝖫𝖯\bm{x}^{*}_{\mathsf{LP}} and its objective value by z𝖫𝖯=𝒄T𝒙𝖫𝖯z^{*}_{\mathsf{LP}}=\bm{c}^{T}\bm{x}^{*}_{\mathsf{LP}}. If σ\sigma is a set of constraints, we let 𝒙𝖫𝖯(σ)\bm{x}^{*}_{\mathsf{LP}}(\sigma) denote the LP optimum of (2) subject to these additional constraints (similarly define z𝖫𝖯(σ)z^{*}_{\mathsf{LP}}(\sigma) and 𝒙𝖨𝖯(σ)\bm{x}^{*}_{\mathsf{IP}}(\sigma)).

Polyhedra and polytopes.

A set Pn\pazocal{P}\subseteq\mathbb{R}^{n} is a polyhedron if there exists an integer mm, Am×nA\in\mathbb{R}^{m\times n}, and 𝒃m\bm{b}\in\mathbb{R}^{m} such that P={𝐱n:A𝐱𝐛}\pazocal{P}=\{\bm{x}\in\mathbb{R}^{n}:A\bm{x}\leq\bm{b}\}. P\pazocal{P} is a rational polyhedron if there exists Am×nA\in\mathbb{Z}^{m\times n} and 𝒃m\bm{b}\in\mathbb{Z}^{m} such that P={𝐱n:A𝐱𝐛}\pazocal{P}=\{\bm{x}\in\mathbb{R}^{n}:A\bm{x}\leq\bm{b}\}. A bounded polyhedron is called a polytope. The feasible regions of all IPs considered in this paper are assumed to be rational polytopes 111This assumption is not a restrictive one. The Minkowski-Weyl theorem states that any polyhedron can be decomposed as the sum of a polytope and its recession cone. All results in this paper can be derived for rational polyhedra by considering the corresponding polytope in the Minkowski-Weyl decomposition.. Let P={𝐱n:𝐚i𝐱bi,iM}\pazocal{P}=\{\bm{x}\in\mathbb{R}^{n}:\bm{a}^{i}\bm{x}\leq b_{i},i\in M\} be a nonempty polyhedron. For any IMI\subseteq M, the set FI:={𝒙n:𝒂i𝒙=bi,iI,𝒂i𝒙bi,iMI}F_{I}:=\{\bm{x}\in\mathbb{R}^{n}:\bm{a}^{i}\bm{x}=b_{i},i\in I,\bm{a}^{i}\bm{x}\leq b_{i},i\in M\setminus I\} is a face of P\pazocal{P}. Conversely, if FF is a nonempty face of P\pazocal{P}, then F=FIF=F_{I} for some IMI\subseteq M. Given a set of constraints σ\sigma, let P(σ)\pazocal{P}(\sigma) denote the polyhedron that is the intersection of P\pazocal{P} with all inequalities in σ\sigma.

Cutting planes.

A cutting plane is a linear constraint 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta. Let P\pazocal{P} be the feasible region of the LP relaxation in Equation (2) and P𝖨=Pn\pazocal{P}_{\mathsf{I}}=\pazocal{P}\cap\mathbb{Z}^{n} be the feasible set of the IP in Equation (1). A cutting plane is valid if it is satisfied by every integer-feasible point: 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta for all 𝒙P𝖨\bm{x}\in\pazocal{P}_{\mathsf{I}}. A valid cut separates a point 𝒙PP𝖨\bm{x}\in\pazocal{P}\setminus\pazocal{P}_{\mathsf{I}} if 𝜶T𝒙>β.\bm{\alpha}^{T}\bm{x}>\beta. We interchangeably refer to a cut by its parameters (𝜶,β)n+1(\bm{\alpha},\beta)\in\mathbb{R}^{n+1} and the halfspace 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta in n\mathbb{R}^{n} it defines.

An important family of cuts that we study in this paper is the set of Gomory mixed integer (GMI) cuts.

Definition 2.1 (Gomory mixed integer cut).

Suppose the feasible region of the IP is in equality form A𝒙=𝒃A\bm{x}=\bm{b}, 𝒙𝟎\bm{x}\geq\bm{0} (which can be achieved by adding slack variables). For 𝒖m\bm{u}\in\mathbb{R}^{m}, let fif_{i} denote the fractional part of (𝒖TA)i(\bm{u}^{T}A)_{i} and let f0f_{0} denote the fractional part of 𝒖T𝒃\bm{u}^{T}\bm{b}. That is, (𝒖TA)i=(𝒖TA)i+fi(\bm{u}^{T}A)_{i}=(\lfloor\bm{u}^{T}A\rfloor)_{i}+f_{i} and 𝒖T𝒃=𝒖T𝒃+f0\bm{u}^{T}\bm{b}=\lfloor\bm{u}^{T}\bm{b}\rfloor+f_{0}. The Gomory mixed integer (GMI) cut parameterized by 𝒖\bm{u} is given by

i:fif0fixi+f01f0i:fi>f0(1fi)xif0.\sum_{i:f_{i}\leq f_{0}}f_{i}x_{i}+\frac{f_{0}}{1-f_{0}}\sum_{i:f_{i}>f_{0}}(1-f_{i})x_{i}\geq f_{0}.

Branch-and-cut.

We provide a high-level overview of branch-and-cut (B&C) and refer the reader to the textbook by Nemhauser and Wolsey [31] for more details. Given an IP, B&C searches through the IP’s feasible region by building a binary search tree. B&C solves the LP relaxation of the input IP and then adds any number of cutting planes. It stores this information at the root of its binary search tree. Let 𝒙𝖫𝖯=(𝒙𝖫𝖯[1],,𝒙𝖫𝖯[n])\bm{x}_{\mathsf{LP}}^{*}=(\bm{x}_{\mathsf{LP}}^{*}[1],\dots,\bm{x}_{\mathsf{LP}}^{*}[n]) be the solution to the LP relaxation with the addition of the cutting planes. B&C next uses a variable selection policy to choose a variable xix_{i} to branch on. This means that it splits the IP’s feasible region in two: one set where xi𝒙𝖫𝖯[i]x_{i}\leq\lfloor\bm{x}_{\mathsf{LP}}^{*}[i]\rfloor and the other where xi𝒙𝖫𝖯[i]x_{i}\geq\lceil\bm{x}_{\mathsf{LP}}^{*}[i]\rceil. The left child of the root now corresponds to the IP with a feasible region defined by the first subset and the right child likewise corresponds to the second subset. B&C then chooses a leaf using a node selection policy and recurses, adding any number of cutting planes, branching on a variable, and so on. B&C fathoms a node—which means that it will never branch on that node—if 1) the LP relaxation at the node is infeasible, 2) the optimal solution to the LP relaxation is integral, or 3) the optimal solution to the LP relaxation is no better than the best integral solution found thus far. Eventually, B&C will fathom every leaf, and it can be verified that it has found the globally optimal integral solution. We assume there is a bound κ\kappa on the size of the tree we allow B&C to build before we terminate, as is common in prior research [20, 24, 25, 5, 7, 8].

Every step of B&C—including node and variable selection and the choice of whether or not to fathom—depends crucially on guidance from LP relaxations. To give an example, this is true of the product scoring rule [1], a popular variable selection policy that our results apply to.

Definition 2.2.

Let 𝒙𝖫𝖯\bm{x}_{\mathsf{LP}}^{*} be the solution to the LP relaxation at a node and z𝖫𝖯=𝒄T𝒙𝖫𝖯z_{\mathsf{LP}}^{*}=\bm{c}^{T}\bm{x}_{\mathsf{LP}}^{*}. The product scoring rule branches on the variable i[n]i\in[n] that maximizes: max{z𝖫𝖯z𝖫𝖯(xi𝒙𝖫𝖯[i]),106}max{z𝖫𝖯z𝖫𝖯(xi𝒙𝖫𝖯[i]),106}\max\{z_{\mathsf{LP}}^{*}-z_{\mathsf{LP}}^{*}(x_{i}\leq\lfloor\bm{x}_{\mathsf{LP}}^{*}[i]\rfloor),10^{-6}\}\cdot\max\{z_{\mathsf{LP}}^{*}-z_{\mathsf{LP}}^{*}(x_{i}\geq\lceil\bm{x}_{\mathsf{LP}}^{*}[i]\rceil),10^{-6}\}.

The tighter the LP relaxation, the more valuable the LP guidance, highlighting the importance of cutting planes.

Polynomial arrangements in Euclidean space.

Let p[y1,,yk]p\in\mathbb{R}[y_{1},\ldots,y_{k}] be a polynomial of degree at most dd. The polynomial pp partitions k\mathbb{R}^{k} into connected components that belong to either k{(y1,,yk):p(y1,,yk)=0}\mathbb{R}^{k}\setminus\{(y_{1},\ldots,y_{k}):p(y_{1},\ldots,y_{k})=0\} or {(y1,,yk):p(y1,,yk)=0}\{(y_{1},\ldots,y_{k}):p(y_{1},\ldots,y_{k})=0\}. When we discuss the connected components of k\mathbb{R}^{k} induced by pp, we include connected components in both these sets. We make this distinction because previous work on sample complexity for data-driven algorithm design oftentimes only needed to consider the connected components of the former set. The number of connected components in both sets is O(dk)O(d^{k}) [36, 29, 35].

3 Linear programming sensitivity

Our main result in this section characterizes how an LP’s optimal solution is affected by the addition of one or more new constraints. In particular, fixing an LP with mm constraints and nn variables, if 𝒙𝖫𝖯(𝜶T𝒙β)n\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta)\in\mathbb{R}^{n} denotes the new LP optimum when the constraint 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta is added, we pin down a precise characterization of 𝒙𝖫𝖯(𝜶T𝒙β)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta) as a function of 𝜶\bm{\alpha} and β\beta. We show that 𝒙𝖫𝖯(𝜶T𝒙β)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta) has a piece-wise closed form: there are surfaces partitioning n+1\mathbb{R}^{n+1} such that within each connected component induced by these surfaces, 𝒙𝖫𝖯(𝜶T𝒙β)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta) has a closed form. While the geometric intuition used to establish this piece-wise structure relies on the basic property that optimal solutions to LPs are achieved at vertices, the surfaces defining the regions are perhaps surprisingly nonlinear: they are defined by multivariate degree-22 polynomials in 𝜶,β\bm{\alpha},\beta. In Appendix B.1 we illustrate these surfaces for an example two-variable LP.

There are two main steps of our proof: (1) tracking the set of edges of the LP polytope intersected by the new constraint, and once that set of edges is fixed, (2) tracking which edge yields the vertex with the highest objective value.

Let M=[m]M=[m] denote the set of mm constraints. For EME\subseteq M, let AE|E|×nA_{E}\in\mathbb{R}^{|E|\times n} and 𝒃E|E|\bm{b}_{E}\in\mathbb{R}^{|E|} denote the restrictions of AA and 𝒃\bm{b} to EE. For 𝜶n\bm{\alpha}\in\mathbb{R}^{n}, β\beta\in\mathbb{R}, and EME\subseteq M with |E|=n1|E|=n-1, let AE,𝜶n×nA_{E,\bm{\alpha}}\in\mathbb{R}^{n\times n} denote the matrix obtained by adding row vector 𝜶\bm{\alpha} to AEA_{E} and let AE,𝜶,βin×nA^{i}_{E,\bm{\alpha},\beta}\in\mathbb{R}^{n\times n} be the matrix AE,𝜶A_{E,\bm{\alpha}} with the iith column replaced by (𝒃E,β)T(\bm{b}_{E},\beta)^{T}.

Theorem 3.1.

Let (𝐜,A,𝐛)(\bm{c},A,\bm{b}) be an LP and let 𝐱𝖫𝖯\bm{x}^{*}_{\mathsf{LP}} denote the optimal solution. There is a set of at most mnm^{n} hyperplanes and at most m2nm^{2n} degree-22 polynomial hypersurfaces partitioning n+1\mathbb{R}^{n+1} into connected components such that for each component CC, one of the following holds: either (1) 𝐱𝖫𝖯(𝛂T𝐱β)=𝐱𝖫𝖯\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta)=\bm{x}^{*}_{\mathsf{LP}} or (2) there is a set of constraints EME\subseteq M with |E|=n1|E|=n-1 such that

𝒙𝖫𝖯(𝜶T𝒙β)=(det(AE,𝜶,β1)det(AE,𝜶),,det(AE,𝜶,βn)det(AE,𝜶))\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta)=\left(\frac{\det(A_{E,\bm{\alpha},\beta}^{1})}{\det(A_{E,\bm{\alpha}})},\ldots,\frac{\det(A_{E,\bm{\alpha},\beta}^{n})}{\det(A_{E,\bm{\alpha}})}\right)

for all (𝛂,β)C(\bm{\alpha},\beta)\in C.

Proof.

First, if 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta does not separate 𝒙𝖫𝖯\bm{x}^{*}_{\mathsf{LP}}, then 𝒙𝖫𝖯(𝜶T𝒙β)=𝒙𝖫𝖯\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta)=\bm{x}^{*}_{\mathsf{LP}}. The set of all such cuts is the halfspace in n+1\mathbb{R}^{n+1} given by {(𝜶,β)n+1:𝜶T𝒙𝖫𝖯β}.\{(\bm{\alpha},\beta)\in\mathbb{R}^{n+1}:\bm{\alpha}^{T}\bm{x}^{*}_{\mathsf{LP}}\leq\beta\}. All other cuts separate 𝒙𝖫𝖯\bm{x}^{*}_{\mathsf{LP}} and thus pass through P={𝐱n:A𝐱𝐛,𝐱𝟎}\pazocal{P}=\{\bm{x}\in\mathbb{R}^{n}:A\bm{x}\leq\bm{b},\bm{x}\geq\bm{0}\}, and the new LP optimum is achieved at a vertex created by the cut. We consider the new vertices formed by the cut, which lie on edges (faces of dimension 11) of P\pazocal{P}. Letting MM denote the set of mm constraints that define P\pazocal{P}, each edge ee of P\pazocal{P} can be identified with a subset EME\subset M of size n1n-1 such that the edge is precisely the set of all points 𝒙\bm{x} such that

𝒂iT𝒙=bi\displaystyle\bm{a}_{i}^{T}\bm{x}=b_{i}\qquad iE\displaystyle\forall\,i\in E
𝒂iT𝒙bi\displaystyle\bm{a}_{i}^{T}\bm{x}\leq b_{i}\qquad iME,\displaystyle\forall\,i\in M\setminus E,

where 𝒂i\bm{a}_{i} is the iith row of AA. Let AEn1×nA_{E}\in\mathbb{R}^{n-1\times n} denote the restriction of AA to only the rows in EE, and let 𝒃E|E|\bm{b}_{E}\in\mathbb{R}^{|E|} denote the entries of 𝒃\bm{b} corresponding to constraints in EE. Drop the inequality constraints defining the edge, so the equality constraints define a line in n\mathbb{R}^{n}. The intersection of the cut 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta and this line is precisely the solution to the system of nn linear equations in nn variables: AE𝒙=𝒃E,𝜶T𝒙=βA_{E}\bm{x}=\bm{b}_{E},\bm{\alpha}^{T}\bm{x}=\beta. By Cramer’s rule, the (unique) solution 𝒙=(x1,,xn)\bm{x}=(x_{1},\ldots,x_{n}) to this system is given by xi=det(AE,𝜶,βi)det(AE,𝜶).x_{i}=\frac{\det(A_{E,\bm{\alpha},\beta}^{i})}{\det(A_{E,\bm{\alpha}})}. To ensure that the intersection point indeed lies on the edge of the polytope, we simply stipulate that it satisfies the inequality constraints in MEM\setminus E. That is,

j=1naijdet(AE,𝜶,βj)det(AE,𝜶)bi\sum_{j=1}^{n}a_{ij}\cdot\frac{\det(A_{E,\bm{\alpha},\beta}^{j})}{\det(A_{E,\bm{\alpha}})}\leq b_{i} (3)

for every iMEi\in M\setminus E (note that if 𝜶,β\bm{\alpha},\beta satisfy any of these constraints, it must be that det(AE,𝜶)0\det(A_{E,\bm{\alpha}})\neq 0, which guarantees that AE𝒙=𝒃E,𝜶T𝒙=βA_{E}\bm{x}=\bm{b}_{E},\bm{\alpha}^{T}\bm{x}=\beta indeed has a unique solution). Multiplying through by det(AE,𝜶)\det(A_{E,\bm{\alpha}}) shows that this constraint is a halfspace in n+1\mathbb{R}^{n+1}, since det(AE,𝜶)\det(A_{E,\bm{\alpha}}) and det(AE,𝜶,βi)\det(A^{i}_{E,\bm{\alpha},\beta}) are both linear in 𝜶\bm{\alpha} and β\beta. The collection of all the hyperplanes defining the boundaries of these halfspaces over all edges of P\pazocal{P} induces a partition of n+1\mathbb{R}^{n+1} into connected components such that for all (𝜶,β)(\bm{\alpha},\beta) within a given connected component, the (nonempty) set of edges of P\pazocal{P} that the hyperplane 𝜶T𝒙=β\bm{\alpha}^{T}\bm{x}=\beta intersects is invariant.

Now, consider a single connected component, denoted by CC for brevity. Let e1,,eke_{1},\ldots,e_{k} denote the edges intersected by cuts in CC, and let E1,,EkME_{1},\ldots,E_{k}\subset M denote the sets of constraints that are binding at each of these edges, respectively. For each pair ep,eqe_{p},e_{q}, consider the surface

i=1ncidet(AEp,𝜶,βi)det(AEp,𝜶)=i=1ncidet(AEq,𝜶,βi)det(AEq,𝜶).\sum_{i=1}^{n}c_{i}\cdot\frac{\det(A_{E_{p},\bm{\alpha},\beta}^{i})}{\det(A_{E_{p},\bm{\alpha}})}=\sum_{i=1}^{n}c_{i}\cdot\frac{\det(A_{E_{q},\bm{\alpha},\beta}^{i})}{\det(A_{E_{q},\bm{\alpha}})}. (4)

Clearing the (nonzero) denominators shows this is a degree-22 polynomial hypersurface in 𝜶,β\bm{\alpha},\beta in n+1\mathbb{R}^{n+1}. This hypersurface is the set of all (𝜶,β)(\bm{\alpha},\beta) for which the LP objective value achieved at the vertex on edge epe_{p} is equal to the LP objective value achieved at the vertex on edge eqe_{q}. The collection of these surfaces for each p,qp,q partitions CC into further connected components. Within each of these connected components, the edge containing the vertex that maximizes the objective is invariant. If this edge corresponds to binding constraints EE, 𝒙𝖫𝖯(𝜶T𝒙β)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta) has the closed form 𝒙𝖫𝖯(𝜶T𝒙β)[i]=det(AE,𝜶,βi)det(AE,𝜶)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta)[i]=\frac{\det(A_{E,\bm{\alpha},\beta}^{i})}{\det(A_{E,\bm{\alpha}})} for all (𝜶,β)(\bm{\alpha},\beta) within this component. We now count the number of surfaces used to obtain our decomposition. P\pazocal{P} has at most (mn1)mn1\binom{m}{n-1}\leq m^{n-1} edges, and for each edge EE we first considered at most |ME|m|M\setminus E|\leq m hyperplanes representing decision boundaries for cuts intersecting that edge (Equation (3)), for a total of at most mnm^{n} hyperplanes. We then considered a degree-22 polynomial hypersurface for every pair of edges (Equation (4)), of which there are at most (mn2)m2n\binom{m^{n}}{2}\leq m^{2n}. ∎

In Appendix B.2, we generalize Theorem 3.1 to understand 𝒙𝖫𝖯\bm{x}^{*}_{\mathsf{LP}} as a function of any KK constraints. In this case, we show that the piecewise structure is given by degree-2K2K multivariate polynomials.

4 Structure and sensitivity of branch-and-cut

We now use Theorem 3.1 to answer a fundamental question about B&C: how does the B&C tree change when cuts are added at the root? Said another way, what is the structure of the B&C tree as a function of the set of cuts? We prove that the set of all possible cuts can be partitioned into a finite number of regions where by employing cuts from any one region, the B&C tree remains exactly the same. Moreover, we prove that the boundaries between regions are defined by constant-degree polynomials. As in the previous section, we focus on a single cut added to the root of the B&C tree. We provide an extension to multiple cuts in Appendix C.2.

We outline the main steps of our analysis:

  1. 1.

    In Lemma 4.2 we use Theorem 3.1 to understand how the LP optimum at any node in the B&C tree behaves as a function of cuts added at the root.

  2. 2.

    In Lemma 4.3, we analyze how the branching decisions of B&C are impacted by variations in the cuts.

  3. 3.

    In Lemma 4.4, we analyze how cuts affect which nodes are fathomed due to the integrality of the LP relaxation.

  4. 4.

    In Theorem 4.5, we analyze how the LP estimates based on cuts can lead to pruning nodes of the B&C tree, which gives us a complete description of when two cutting planes lead to the same B&C tree.

The full proofs from this section are in Appendix C.

Given an IP, let τ=max𝒙P𝒙\tau=\lceil\max_{\bm{x}\in\pazocal{P}}\left\lVert\bm{x}\right\rVert_{\infty}\rceil be the maximum magnitude coordinate of any LP-feasible solution, rounded up. The set of all possible branching constraints is contained in BC:={𝐱[i],𝐱[i]}0τ,i[n]\pazocal{B}\pazocal{C}:=\{\bm{x}[i]\leq\ell,\bm{x}[i]\geq\ell\}_{0\leq\ell\leq\tau,i\in[n]} which is a set of size 2n(τ+1)2n(\tau+1). Naïvely, there are at most 22n(τ+1)2^{2n(\tau+1)} subsets of branching constraints, but the following observation allows us to greatly reduce the number of sets we consider.

Lemma 4.1.

Fix an IP (𝐜,A,𝐛)(\bm{c},A,\bm{b}). Define an equivalence relation on pairs of branching-constraint sets σ1,σ2BC\sigma_{1},\sigma_{2}\subseteq\pazocal{B}\pazocal{C}, by σ1σ2𝐱𝖫𝖯(𝛂T𝐱β,σ1)=𝐱𝖫𝖯(𝛂T𝐱β,σ2)\sigma_{1}\sim\sigma_{2}\iff\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{1})=\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{2}) for all possible cutting planes 𝛂T𝐱β\bm{\alpha}^{T}\bm{x}\leq\beta. The number of equivalence classes of \sim is at most τ3n\tau^{3n}.

By Cramer’s rule, τ|det(A~)|\tau\leq|\det(\widetilde{A})|, where A~\widetilde{A} is any square submatrix of AA. This is at most annn/2a^{n}n^{n/2} by Hadamard’s inequality, where aa is the maximum absolute value of any entry of AA. However, τ\tau can be much smaller in various cases. For example, if AA contains even one row with only positive entries, then τ𝒃\tau\leq\left\lVert\bm{b}\right\rVert_{\infty}.

We will use the following notation in the remainder of this section. Let AσA_{\sigma} and 𝒃σ\bm{b}_{\sigma} denote the augmented constraint matrix and vector when the constraints in σBC\sigma\subseteq\pazocal{B}\pazocal{C} are added. For EMσE\subseteq M\cup\sigma, let AE,σ|E|×nA_{E,\sigma}\in\mathbb{R}^{|E|\times n} and 𝒃E|E|\bm{b}_{E}\in\mathbb{R}^{|E|} denote the restrictions of AσA_{\sigma} and 𝒃σ\bm{b}_{\sigma} to EE. For 𝜶n,β\bm{\alpha}\in\mathbb{R}^{n},\beta\in\mathbb{R} and EMσE\subseteq M\cup\sigma with |E|=n1|E|=n-1, let AE,𝜶,σn×nA_{E,\bm{\alpha},\sigma}\in\mathbb{R}^{n\times n} denote the matrix obtained by adding row vector 𝜶\bm{\alpha} to AE,σA_{E,\sigma} and let AE,𝜶,β,σin×nA^{i}_{E,\bm{\alpha},\beta,\sigma}\in\mathbb{R}^{n\times n} be the matrix AE,𝜶,σA_{E,\bm{\alpha},\sigma} with the iith column replaced by (𝒃E,σ,β)T(\bm{b}_{E,\sigma},\beta)^{T}.

Lemma 4.2.

For any LP (𝐜,A,𝐛)(\bm{c},A,\bm{b}), there are at most (m+2n)nτ3n(m+2n)^{n}\tau^{3n} hyperplanes and at most (m+2n)2nτ3n(m+2n)^{2n}\tau^{3n} degree-22 polynomial hypersurfaces partitioning n+1\mathbb{R}^{n+1} into connected components such that for each component CC and every σBC\sigma\subset\pazocal{B}\pazocal{C}, either: (1) 𝐱𝖫𝖯(𝛂T𝐱β,σ)=𝐱𝖫𝖯(σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}^{*}_{\mathsf{LP}}(\sigma) and z𝖫𝖯(𝛂T𝐱β,σ)=z𝖫𝖯(σ)z^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=z^{*}_{\mathsf{LP}}(\sigma), or (2) there is a set of constraints EMσE\subseteq M\cup\sigma with |E|=n1|E|=n-1 such that 𝐱𝖫𝖯(𝛂T𝐱β,σ)[i]=det(AE,𝛂,β,σi)det(AE,𝛂,σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]=\frac{\det(A_{E,\bm{\alpha},\beta,\sigma}^{i})}{\det(A_{E,\bm{\alpha},\sigma})} for all (𝛂,β)C(\bm{\alpha},\beta)\in C.

Proof sketch.

The same reasoning in the proof of Theorem 3.1 yields a partition with the desired properties. ∎

Next, we refine the decomposition obtained in Lemma 4.2 so that the branching constraints added at each step of B&C are invariant within a region. Our results apply to the product scoring rule (Def. 2.2), which is used, for example, by the leading open-source solver SCIP [10].

Lemma 4.3.

There are at most 3(m+2n)nτ3n3(m+2n)^{n}\tau^{3n} hyperplanes, 3(m+2n)3nτ4n3(m+2n)^{3n}\tau^{4n} degree-2 polynomial hypersurfaces, and (m+2n)6nτ4n(m+2n)^{6n}\tau^{4n} degree-5 polynomial hypersurfaces partitioning n+1\mathbb{R}^{n+1} into connected components such that within each component, the branching constraints used at every step of B&C are invariant.

Proof sketch.

Fix a connected component CC in the decomposition established in Lemma 4.2. Then, for each σ\sigma, either 𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖫𝖯(σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}^{*}_{\mathsf{LP}}(\sigma) or there exists EMσE\subseteq M\cup\sigma such that 𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=det(AE,𝜶,β,σi)det(AE,𝜶,σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]=\frac{\det(A^{i}_{E,\bm{\alpha},\beta,\sigma})}{\det(A_{E,\bm{\alpha},\sigma})} for all (𝜶,β)C(\bm{\alpha},\beta)\in C and all i[n]i\in[n]. Now, if we are at a stage in the branch-and-cut tree where σ\sigma is the list of branching constraints added so far, and the iith variable is being branched on next, the two constraints generated are xi𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]x_{i}\leq\left\lfloor\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rfloor and xi𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]x_{i}\geq\left\lceil\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rceil, respectively. If CC is a component where 𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖫𝖯(σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}^{*}_{\mathsf{LP}}(\sigma), then there is nothing more to do, since the branching constraints at that point are trivially invariant over (𝜶,β)C(\bm{\alpha},\beta)\in C. Otherwise, in order to further decompose CC such that the right-hand-side of these constraints are invariant for every σ\sigma and every i=1,,ni=1,\ldots,n, we add the two decision boundaries given by

kdet(AE,𝜶,β,σi)det(AE,𝜶,σ)k+1k\leq\frac{\det(A^{i}_{E,\bm{\alpha},\beta,\sigma})}{\det(A_{E,\bm{\alpha},\sigma})}\leq k+1

for every ii, σ\sigma, and every integer k=0,,τ1k=0,\ldots,\tau-1. This ensures that within every connected component of CC induced by these boundaries (hyperplanes), 𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]\lfloor\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\rfloor and 𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]\lceil\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\rceil are invariant. A careful analysis of the definition of the product scoring rule provides the appropriate refinement of this partition. ∎

We now move to the most critical phase of branch-and-cut: deciding when to fathom a node. One reason a node might be fathomed is if the LP relaxation of the IP at that node has an integral solution. We derive conditions that ensure that nearby cuts have the same effect on the integrality of the original IP at any node in the search tree. Recall that P𝖨=Pn\pazocal{P}_{\mathsf{I}}=\pazocal{P}\cap\mathbb{Z}^{n} is the set of integer points in P\pazocal{P}. Let Vn+1\pazocal{V}\subseteq\mathbb{R}^{n+1} denote the set of all valid cuts for the input IP (𝒄,A,𝒃)(\bm{c},A,\bm{b}). The set V\pazocal{V} is a polyhedron since it can be expressed as

V=𝐱¯P𝖨{(𝜶,β)n+1:𝜶T𝐱¯β},\pazocal{V}=\bigcap_{\bm{\overline{x}}\in\pazocal{P}_{\mathsf{I}}}\{(\bm{\alpha},\beta)\in\mathbb{R}^{n+1}:\bm{\alpha}^{T}\bm{\overline{x}}\leq\beta\},

and P𝖨\pazocal{P}_{\mathsf{I}} is finite as P\pazocal{P} is bounded. For cuts outside V\pazocal{V}, we assume the B&C tree takes some special form denoting an invalid cut. Our goal now is to decompose V\pazocal{V} into connected components such that 𝟏[𝒙𝖫𝖯(𝜶T𝒙β,σ)n]\mathbf{1}\left[\bm{x}_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)\in\mathbb{Z}^{n}\right] is invariant for all (𝜶,β)(\bm{\alpha},\beta) in each component.

Lemma 4.4.

For any IP (𝐜,A,𝐛)(\bm{c},A,\bm{b}), there are at most 3(m+2n)nτ4n3(m+2n)^{n}\tau^{4n} hyperplanes, 3(m+2n)3nτ4n3(m+2n)^{3n}\tau^{4n} degree-22 polynomial hypersurfaces, and (m+2n)6nτ4n(m+2n)^{6n}\tau^{4n} degree-55 polynomial hypersurfaces partitioning n+1\mathbb{R}^{n+1} into connected components such that for each component CC and each σBC\sigma\subseteq\pazocal{B}\pazocal{C}, 𝟏[𝐱𝖫𝖯(𝛂T𝐱β,σ)n]\mathbf{1}\left[\bm{x}^{*}_{\mathsf{LP}}\left(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma\right)\in\mathbb{Z}^{n}\right] is invariant for all (𝛂,β)C(\bm{\alpha},\beta)\in C.

Proof sketch.

Fix a connected component CC in the decomposition that includes the facets defining V\pazocal{V} and the surfaces obtained in Lemma 4.3. For all σ\sigma, 𝒙𝖨P𝖨\bm{x}_{\mathsf{I}}\in\pazocal{P}_{\mathsf{I}}, and i[n]i\in[n], consider the surface

𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=𝒙𝖨[i].\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]=\bm{x}_{\mathsf{I}}[i]. (5)

By Lemma 4.2, this surface is a hyperplane. Clearly, within any connected component of CC induced by these hyperplanes, for every σ\sigma and 𝒙𝖨P𝖨\bm{x}_{\mathsf{I}}\in\pazocal{P}_{\mathsf{I}}, 𝟏[𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖨]\mathbf{1}[\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}_{\mathsf{I}}] is invariant. Finally, if 𝒙𝖫𝖯(𝜶T𝒙β,σ)n\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)\in\mathbb{Z}^{n} for some cut 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta within a given connected component, 𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖨\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}_{\mathsf{I}} for some 𝒙𝖨P𝖨(σ)P𝖨\bm{x}_{\mathsf{I}}\in\pazocal{P}_{\mathsf{I}}(\sigma)\subseteq\pazocal{P}_{\mathsf{I}}, which means that 𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖨n\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}_{\mathsf{I}}\in\mathbb{Z}^{n} for all cuts 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta in that connected component. ∎

Suppose for a moment that a node is fathomed by B&C if and only if either the LP at that node is infeasible, or the LP optimal solution is integral—that is, the “bounding” of B&C is suppressed. In this case, the partition of n+1\mathbb{R}^{n+1} obtained in Lemma 4.4 guarantees that the tree built by branch-and-cut is invariant within each connected component. Indeed, since the branching constraints at every node are invariant, and for every σ\sigma the integrality of 𝒙𝖫𝖯(𝜶T𝒙β,σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma) is invariant, the (bounding-suppressed) B&C tree (and the order in which it is built) is invariant within each connected component in our decomposition. Equipped with this observation, we now analyze the full behavior of B&C.

Theorem 4.5.

Given an IP (𝐜,A,𝐛)(\bm{c},A,\bm{b}), there is a set of at most O(14n(m+2n)3n2τ5n2)O(14^{n}(m+2n)^{3n^{2}}\tau^{5n^{2}}) polynomial hypersurfaces of degree 5\leq 5 partitioning n+1\mathbb{R}^{n+1} into connected components such that the branch-and-cut tree built after adding the cut 𝛂T𝐱β\bm{\alpha}^{T}\bm{x}\leq\beta at the root is invariant over all (𝛂,β)(\bm{\alpha},\beta) within a given component.

Proof sketch.

Fix a connected component CC in the decomposition induced by the set of hyperplanes and degree-22 hypersurfaces established in Lemma 4.4. Let

Q1,,Qi1,I1,Qi1+1,,Qi2,I2,Qi2+1,Q_{1},\ldots,Q_{i_{1}},I_{1},Q_{i_{1}+1},\ldots,Q_{i_{2}},I_{2},Q_{i_{2}+1},\ldots (6)

denote the nodes of the tree branch-and-cut creates, in order of exploration, under the assumption that a node is pruned if and only if either the LP at that node is infeasible or the LP optimal solution is integral (so the “bounding” of branch-and-bound is suppressed). Here, a node is identified by the list σ\sigma of branching constraints added to the input IP. Nodes labeled by QQ are either infeasible or have fractional LP optimal solutions. Nodes labeled by II have integral LP optimal solutions and are candidates for the incumbent integral solution at the point they are encountered. (The nodes are functions of 𝜶\bm{\alpha} and β\beta, as are the indices i1,i2,i_{1},i_{2},\ldots.) By Lemma 4.4 and the observation following it, this ordered list of nodes is invariant over all (𝜶,β)C(\bm{\alpha},\beta)\in C.

Now, given an node index \ell, let I()I(\ell) denote the incumbent node with the highest objective value encountered up until the \ellth node searched by B&C, and let z(I())z(I(\ell)) denote its objective value. For each node QQ_{\ell}, let σ\sigma_{\ell} denote the branching constraints added to arrive at node QQ_{\ell}. The hyperplane

z𝖫𝖯(𝜶T𝒙β,σ)=z(I())z^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{\ell})=z(I(\ell)) (7)

(which is a hyperplane due to Lemma 4.2) partitions CC into two subregions. In one subregion, z𝖫𝖯(𝜶T𝒙β,σ)z(I())z^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{\ell})\leq z(I(\ell)), that is, the objective value of the LP optimal solution is no greater than the objective value of the current incumbent integer solution, and so the subtree rooted at QQ_{\ell} is pruned. In the other subregion, z𝖫𝖯(𝜶T𝒙β,σ)>z(I())z^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{\ell})>z(I(\ell)), and QQ_{\ell} is branched on further. Therefore, within each connected component of CC induced by all hyperplanes given by Equation 16 for all \ell, the set of node within the list (15) that are pruned is invariant. Combined with the surfaces established in Lemma 4.4, these hyperplanes partition n+1\mathbb{R}^{n+1} into connected components such that as (𝜶,β)(\bm{\alpha},\beta) varies within a given component, the tree built by branch-and-cut is invariant. ∎

5 Sample complexity bounds for B&C

In this section, we show how the results from the previous section can be used to provide sample complexity bounds for configuring B&C. Our results will apply to families of cuts parameterized by vectors 𝒖\bm{u} from a set U\pazocal{U}, such as the family of GMI cuts from Definition 2.1. We assume there is an unknown, application-specific distribution D\pazocal{D} over IPs. The learner receives a training set SDN\pazocal{S}\sim\pazocal{D}^{N} of NN IPs sampled from this distribution. A sample complexity guarantee bounds the number of samples NN sufficient to ensure that for any parameter setting 𝒖U\bm{u}\in\pazocal{U}, the B&C tree size on average over the training set S\pazocal{S} is close to the expected B&C tree size. More formally, let g𝒖(𝒄,A,𝒃)g_{\bm{u}}(\bm{c},A,\bm{b}) be the size of the tree B&C builds given the input (𝒄,A,𝒃)(\bm{c},A,\bm{b}) after applying the cut defined by 𝒖\bm{u} at the root. Given ϵ>0\epsilon>0 and δ(0,1)\delta\in(0,1), a sample complexity guarantee bounds the number of samples NN sufficient to ensure that with probability 1δ1-\delta over the draw SDN\pazocal{S}\sim\pazocal{D}^{N}, for every parameter setting 𝒖U\bm{u}\in\pazocal{U},

|1N(𝒄,A,𝒃)Sg𝒖(𝒄,A,𝒃)𝔼[g𝒖(𝒄,A,𝒃)]|ϵ.\Bigg{|}\frac{1}{N}\sum_{(\bm{c},A,\bm{b})\in\pazocal{S}}g_{\bm{u}}(\bm{c},A,\bm{b})-\mathop{\mathbb{E}}\left[g_{\bm{u}}(\bm{c},A,\bm{b})\right]\Bigg{|}\leq\epsilon. (8)

To derive our sample complexity guarantee, we use the notion of pseudo-dimension [32]. Let G={g𝐮:𝐮U}\pazocal{G}=\{g_{\bm{u}}:\bm{u}\in\pazocal{U}\}. The pseudo-dimension of G\pazocal{G}, denoted Pdim(G)\textnormal{Pdim}(\pazocal{G}), is the largest integer NN for which there exist NN IPs (𝒄1,A1,𝒃1),,(𝒄N,AN,𝒃N)(\bm{c}_{1},A_{1},\bm{b}_{1}),\dots,(\bm{c}_{N},A_{N},\bm{b}_{N}) and NN thresholds r1,,rNr_{1},\dots,r_{N}\in\mathbb{R} such that for every binary vector (σ1,,σN){0,1}N(\sigma_{1},\dots,\sigma_{N})\in\{0,1\}^{N}, there exists g𝒖Gg_{\bm{u}}\in\pazocal{G} such that g𝒖(𝒄i,Ai,𝒃i)rig_{\bm{u}}(\bm{c}_{i},A_{i},\bm{b}_{i})\geq r_{i} if and only if σi=1\sigma_{i}=1. The number of samples sufficient to ensure that Equation (8) holds is N=O(κ2ϵ2(Pdim(G)+log1δ))N=O(\frac{\kappa^{2}}{\epsilon^{2}}(\textnormal{Pdim}(\pazocal{G})+\log\frac{1}{\delta})) [32]. Equivalently, for a given number of samples NN, the left-hand-side of Equation (8) can be bounded by κ1N(Pdim(G)+log1δ)\kappa\sqrt{\frac{1}{N}(\textnormal{Pdim}(\pazocal{G})+\log\frac{1}{\delta})}.

So far, 𝜶,β\bm{\alpha},\beta are parameters that do not depend on the input instance 𝒄,A,𝒃\bm{c},A,\bm{b}. Suppose now that they do: 𝜶,β\bm{\alpha},\beta are functions of 𝒄,A,𝒃\bm{c},A,\bm{b} and a parameter vector 𝒖\bm{u} (as they are for GMI cuts). Despite the structure established in the previous section, if 𝜶,β\bm{\alpha},\beta can depend on (𝒄,A,𝒃)(\bm{c},A,\bm{b}) in arbitrary ways, one cannot even hope for a finite sample complexity, illustrated by the following impossibility result. The full proofs of all results from this section are in Appendix D.

Theorem 5.1.

There exist functions 𝛂𝐜,A,𝐛:Un\bm{\alpha}_{\bm{c},A,\bm{b}}:\pazocal{U}\to\mathbb{R}^{n} and β𝐜,A,𝐛:U\beta_{\bm{c},A,\bm{b}}:\pazocal{U}\to\mathbb{R} such that

Pdim({g𝒖:𝒖U})=,\textnormal{Pdim}\left(\left\{g_{\bm{u}}:\bm{u}\in\pazocal{U}\right\}\right)=\infty,

where U\pazocal{U} is any set with |U|=|||\pazocal{U}|=|\mathbb{R}|.

However, in the case of GMI cuts, we show that the cutting plane coefficients parameterized by 𝒖\bm{u} are highly structured. Combining this structure with our analysis of B&C allows us to derive polynomial sample complexity bounds.

Lemma 5.2.

Consider the family of GMI cuts parameterized by 𝐮[U,U]m\bm{u}\in[-U,U]^{m}. There is a set of at most O(nU2A1𝐛1)O(nU^{2}\left\lVert A\right\rVert_{1}\left\lVert\bm{b}\right\rVert_{1}) hyperplanes partitioning [U,U]m[-U,U]^{m} into connected components such that 𝐮T𝐚i\lfloor\bm{u}^{T}\bm{a}_{i}\rfloor, 𝐮T𝐛\lfloor\bm{u}^{T}\bm{b}\rfloor, and 𝟏[fif0]\mathbf{1}[f_{i}\leq f_{0}] are invariant, for every ii, within each component.

Proof sketch.

We have fi=𝒖T𝒂i𝒖T𝒂if_{i}=\bm{u}^{T}\bm{a}_{i}-\lfloor\bm{u}^{T}\bm{a}_{i}\rfloor, f0=𝒖T𝒃𝒖T𝒃f_{0}=\bm{u}^{T}\bm{b}-\lfloor\bm{u}^{T}\bm{b}\rfloor, and since 𝒖[U,U]m\bm{u}\in[-U,U]^{m}, 𝒖T𝒂i[U𝒂i1,U𝒂i1]\lfloor\bm{u}^{T}\bm{a}_{i}\rfloor\in[-U\left\lVert\bm{a}_{i}\right\rVert_{1},U\left\lVert\bm{a}_{i}\right\rVert_{1}] and 𝒖T𝒃[U𝒃1,U𝒃1].\lfloor\bm{u}^{T}\bm{b}\rfloor\in[-U\left\lVert\bm{b}\right\rVert_{1},U\left\lVert\bm{b}\right\rVert_{1}]. For all ii, ki[U𝒂i1,U𝒂i1]k_{i}\in[-U\left\lVert\bm{a}_{i}\right\rVert_{1},U\left\lVert\bm{a}_{i}\right\rVert_{1}]\cap\mathbb{Z} and k0[U𝒃1,U𝒃1]k_{0}\in[-U\left\lVert\bm{b}\right\rVert_{1},U\left\lVert\bm{b}\right\rVert_{1}]\cap\mathbb{Z}, hyperplanes define the two regions

𝒖T𝒂i=kiki𝒖T𝒂i<ki+1\left\lfloor\bm{u}^{T}\bm{a}_{i}\right\rfloor=k_{i}\iff k_{i}\leq\bm{u}^{T}\bm{a}_{i}<k_{i}+1

and the hyperplanes defining the two halfspaces

𝒖T𝒃=k0k0𝒖T𝒃<k0+1.\left\lfloor\bm{u}^{T}\bm{b}\right\rfloor=k_{0}\iff k_{0}\leq\bm{u}^{T}\bm{b}<k_{0}+1.

In addition, for each ii, consider the hyperplane

𝒖T𝒂iki=𝒖T𝒃k0.\bm{u}^{T}\bm{a}_{i}-k_{i}=\bm{u}^{T}\bm{b}-k_{0}. (9)

Within any connected component of m\mathbb{R}^{m} determined by these hyperplanes, 𝒖T𝒂i\lfloor\bm{u}^{T}\bm{a}_{i}\rfloor and 𝒖T𝒃\lfloor\bm{u}^{T}\bm{b}\rfloor are constant. Also, 𝟏[fif0]\mathbf{1}[f_{i}\leq f_{0}] is invariant within each component, since if 𝒖T𝒂i=ki\lfloor\bm{u}^{T}\bm{a}_{i}\rfloor=k_{i} and 𝒖T𝒃=k0\lfloor\bm{u}^{T}\bm{b}\rfloor=k_{0}, fif0𝒖T𝒂iki𝒖T𝒃k0,f_{i}\leq f_{0}\iff\bm{u}^{T}\bm{a}_{i}-k_{i}\leq\bm{u}^{T}\bm{b}-k_{0}, which is the hyperplane from Equation 9. The lemma follows by counting the hyperplanes. ∎

Let 𝜶:[U,U]mn\bm{\alpha}:[-U,U]^{m}\to\mathbb{R}^{n} denote the function taking GMI cut parameters 𝒖\bm{u} to the corresponding vector of coefficients determining the resulting cutting plane, and let β:[U,U]m\beta:[-U,U]^{m}\to\mathbb{R} denote the offset of the resulting cutting plane. So (after multiplying through by 1f01-f_{0}),

𝜶(𝒖)[i]={fi(1f0)if fif0f0(1fi)if fi>f0\bm{\alpha}(\bm{u})[i]=\begin{cases}f_{i}(1-f_{0})&\text{if }f_{i}\leq f_{0}\\ f_{0}(1-f_{i})&\text{if }f_{i}>f_{0}\end{cases}

and β(𝒖)=f0(1f0)\beta(\bm{u})=f_{0}(1-f_{0}) (of course f0f_{0} and each fif_{i} are functions of 𝒖\bm{u}, but we suppress this dependence for readability).

The next lemma allows us to transfer the polynomial partition of n+1\mathbb{R}^{n+1} from Theorem 4.5 to a polynomial partition of [U,U]m[-U,U]^{m}, incurring only a factor 22 increase in degree.

Lemma 5.3.

Let p[y1,,yn+1]p\in\mathbb{R}[y_{1},\ldots,y_{n+1}] be a polynomial of degree dd. Let D[U,U]mD\subseteq[-U,U]^{m} be a connected component from Lemma 5.2. Define q:Dq:D\to\mathbb{R} by q(𝐮)=p(𝛂(𝐮),β(𝐮))q(\bm{u})=p(\bm{\alpha}(\bm{u}),\beta(\bm{u})). Then qq is a polynomial in 𝐮\bm{u} of degree 2d2d.

Proof.

By Lemma 5.2, there are integers k0,kik_{0},k_{i} for i[n]i\in[n] such that 𝒖T𝒂i=ki\lfloor\bm{u}^{T}\bm{a}_{i}\rfloor=k_{i} and 𝒖T𝒃=k0\lfloor\bm{u}^{T}\bm{b}\rfloor=k_{0} for all 𝒖D\bm{u}\in D. Also, the set S={i:fif0}S=\{i:f_{i}\leq f_{0}\} is fixed over all 𝒖D\bm{u}\in D.

A degree-dd polynomial pp in variables y1,,yn+1y_{1},\ldots,y_{n+1} can be written as T[n+1],|T|dλTiTyi\sum_{T\sqsubseteq[n+1],|T|\leq d}\lambda_{T}\prod_{i\in T}y_{i} for some coefficients λT\lambda_{T}\in\mathbb{R}, where T[n+1]T\sqsubseteq[n+1] means that TT is a multiset of [n+1][n+1]. Evaluating at (𝜶(𝒖),β(𝒖))(\bm{\alpha}(\bm{u}),\beta(\bm{u})), we get

|T|dλTiTSin+1fi(1f0)iTSin+1f0(1fi)iTi=n+1f0(1f0).\sum_{|T|\leq d}\lambda_{T}\prod_{\begin{subarray}{c}i\in T\cap S\\ i\neq n+1\end{subarray}}f_{i}(1-f_{0})\prod_{\begin{subarray}{c}i\in T\setminus S\\ i\neq n+1\end{subarray}}f_{0}(1-f_{i})\prod_{\begin{subarray}{c}i\in T\\ i=n+1\end{subarray}}f_{0}(1-f_{0}).

Now, fi=𝒖T𝒂ikif_{i}=\bm{u}^{T}\bm{a}_{i}-k_{i} and f0=𝒖T𝒃k0f_{0}=\bm{u}^{T}\bm{b}-k_{0} are linear in 𝒖\bm{u}. The sum is over all multisets of size at most dd, so each monomial consists of the product of at most dd degree-22 terms of the form fi(1f0)f_{i}(1-f_{0}), f0(1fi)f_{0}(1-f_{i}), or f0(1f0)f_{0}(1-f_{0}). Thus, deg(q)2d\deg(q)\leq 2d, as desired. ∎

Applying Lemma 5.3 to every polynomial hypersurface in the partition of n+1\mathbb{R}^{n+1} established in Theorem 4.5 yields our main structural result for GMI cuts.

Lemma 5.4.

Consider the family of GMI cuts parameterized by 𝐮[U,U]m\bm{u}\in[-U,U]^{m}. For any IP (𝐜,A,𝐛)(\bm{c},A,\bm{b}), there are at most O(nU2A1𝐛1)O(nU^{2}\left\lVert A\right\rVert_{1}\left\lVert\bm{b}\right\rVert_{1}) hyperplanes and 2O(n2)(m+2n)O(n3)τO(n3)2^{O(n^{2})}(m+2n)^{O(n^{3})}\tau^{O(n^{3})} degree-1010 polynomial hypersurfaces partitioning [U,U]m[-U,U]^{m} into connected components such that the B&C tree built after adding the GMI cut defined by 𝐮\bm{u} is invariant over all 𝐮\bm{u} within a single component.

Bounding the pseudo-dimension of the class of tree-size functions {g𝒖:𝒖[U,U]m}\{g_{\bm{u}}:\bm{u}\in[-U,U]^{m}\} is a direct application of the main theorem of Balcan et al. [6] along with standard results bounding the VC dimension of polynomial boundaries [2].

Theorem 5.5.

The pseudo-dimension of the class of tree-size functions {g𝐮:𝐮[U,U]m}\{g_{\bm{u}}:\bm{u}\in[-U,U]^{m}\} on the domain of IPs with A1a\left\lVert A\right\rVert_{1}\leq a and 𝐛1b\left\lVert\bm{b}\right\rVert_{1}\leq b is

O(mlog(abU)+mn3log(m+n)+mn3logτ).O\left(m\log(abU)+mn^{3}\log(m+n)+mn^{3}\log\tau\right).

We generalize the analysis of this section to multiple GMI cuts at the root of the B&C tree in Appendix D. The analysis there is more involved since GMI cuts can be applied in sequence, re-solving the LP relaxation after each cut. In particular, GMI cuts applied in sequence have one more parameter than the next, so the hyperplane defined by each GMI cut depends (polynomially) on the parameters defining all GMI cuts before it. We show that if KK GMI cuts are sequentially applied at the root, the resulting partition of the parameter space is induced by polynomials of degree O(K2)O(K^{2}).

6 Conclusions

In this paper, we investigated fundamental questions about linear and integer programs: given an integer program, how many possible branch-and-cut trees are there if one or more additional feasible constraints can be added? Even more specifically, what is the structure of the branch-and-cut tree as a function of a set of additional constraints? Through a detailed geometric and combinatorial analysis of how additional constraints affect the LP relaxation’s optimal solution, we showed that the branch-and-cut tree is piecewise constant and precisely bounded the number of pieces. We showed that the structural understandings that we developed could be used to prove sample complexity bounds for configuring branch-and-cut.

Acknowledgements

This material is based on work supported by the National Science Foundation under grants CCF-1733556, CCF-1910321, IIS-1901403, and SES-1919453, the ARO under award W911NF2010081, the Defense Advanced Research Projects Agency under cooperative agreement HR00112020003, a Simons Investigator Award, an AWS Machine Learning Research Award, an Amazon Research Award, a Bloomberg Research Grant, and a Microsoft Research Faculty Fellowship.

References

  • Achterberg [2007] Tobias Achterberg. Constraint Integer Programming. PhD thesis, Technische Universität Berlin, 2007.
  • Anthony and Bartlett [2009] Martin Anthony and Peter Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009.
  • Balas et al. [1996] Egon Balas, Sebastian Ceria, Gérard Cornuéjols, and N Natraj. Gomory cuts revisited. Operations Research Letters, 19(1):1–9, 1996.
  • Balcan [2020] Maria-Florina Balcan. Data-driven algorithm design. In Tim Roughgarden, editor, Beyond Worst Case Analysis of Algorithms. Cambridge University Press, 2020.
  • Balcan et al. [2018] Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In International Conference on Machine Learning (ICML), 2018.
  • Balcan et al. [2021a] Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, and Ellen Vitercik. How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm design. In Annual Symposium on Theory of Computing (STOC), 2021a.
  • Balcan et al. [2021b] Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, and Ellen Vitercik. Improved learning bounds for branch-and-cut. arXiv preprint arXiv:2111.11207, 2021b.
  • Balcan et al. [2021c] Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, and Ellen Vitercik. Sample complexity of tree search configuration: Cutting planes and beyond. In Annual Conference on Neural Information Processing Systems (NeurIPS), 2021c.
  • Bertsimas and Dunn [2017] Dimitris Bertsimas and Jack Dunn. Optimal classification trees. Machine Learning, 106(7):1039–1082, 2017.
  • Bestuzheva et al. [2021] Ksenia Bestuzheva, Mathieu Besançon, Wei-Kun Chen, Antonia Chmiela, Tim Donkiewicz, Jasper van Doornmalen, Leon Eifler, Oliver Gaul, Gerald Gamrath, Ambros Gleixner, Leona Gottwald, Christoph Graczyk, Katrin Halbig, Alexander Hoen, Christopher Hojny, Rolf van der Hulst, Thorsten Koch, Marco Lübbecke, Stephen J. Maher, Frederic Matter, Erik Mühmer, Benjamin Müller, Marc E. Pfetsch, Daniel Rehfeldt, Steffan Schlein, Franziska Schlösser, Felipe Serrano, Yuji Shinano, Boro Sofranac, Mark Turner, Stefan Vigerske, Fabian Wegscheider, Philipp Wellner, Dieter Weninger, and Jakob Witzig. The SCIP Optimization Suite 8.0. Technical report, Optimization Online, December 2021. URL http://www.optimization-online.org/DB_HTML/2021/12/8728.html.
  • Bunel et al. [2018] Rudy Bunel, Ilker Turkaslan, Philip H.S. Torr, Pushmeet Kohli, and M. Pawan Kumar. A unified view of piecewise linear neural network verification. In Annual Conference on Neural Information Processing Systems (NeurIPS), 2018.
  • Chvátal [1973] Vašek Chvátal. Edmonds polytopes and a hierarchy of combinatorial problems. Discrete mathematics, 4(4):305–337, 1973.
  • Cook et al. [1986] William Cook, Albertus MH Gerards, Alexander Schrijver, and Éva Tardos. Sensitivity theorems in integer linear programming. Mathematical Programming, 34(3):251–264, 1986.
  • Cornuéjols [2007] Gérard Cornuéjols. Revival of the Gomory cuts in the 1990’s. Annals of Operations Research, 149(1):63–66, 2007.
  • Cornuéjols and Li [2001] Gérard Cornuéjols and Yanjun Li. Elementary closures for integer programs. Operations Research Letters, 28(1):1–8, 2001.
  • Gomory [1960] Ralph Gomory. An algorithm for the mixed integer problem. resreport RM-2597, The Rand Corporation, 1960.
  • Gomory [1958] Ralph E. Gomory. Outline of an algorithm for integer solutions to linear programs. Bulletin of the American Mathematical Society, 64(5):275 – 278, 1958.
  • Gupta and Roughgarden [2017] Rishi Gupta and Tim Roughgarden. A PAC approach to application-specific algorithm selection. SIAM Journal on Computing, 46(3):992–1017, 2017.
  • Huang et al. [2022] Zeren Huang, Kerong Wang, Furui Liu, Hui-Ling Zhen, Weinan Zhang, Mingxuan Yuan, Jianye Hao, Yong Yu, and Jun Wang. Learning to select cuts for efficient mixed-integer programming. Pattern Recognition, 123:108353, 2022.
  • Hutter et al. [2009] Frank Hutter, Holger H Hoos, Kevin Leyton-Brown, and Thomas Stützle. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36(1):267–306, 2009. ISSN 1076-9757.
  • Jeroslow [1974] Robert G Jeroslow. Trivial integer programs unsolvable by branch-and-bound. Mathematical Programming, 6(1):105–109, 1974.
  • Kappes et al. [2013] Jörg Hendrik Kappes, Markus Speth, Gerhard Reinelt, and Christoph Schnörr. Towards efficient and exact map-inference for large scale discrete computer vision problems via combinatorial optimization. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1752–1758. IEEE, 2013.
  • Khashabi et al. [2016] Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Peter Clark, Oren Etzioni, and Dan Roth. Question answering via integer programming over semi-structured knowledge. In International Joint Conference on Artificial Intelligence (IJCAI), 2016.
  • Kleinberg et al. [2017] Robert Kleinberg, Kevin Leyton-Brown, and Brendan Lucier. Efficiency through procrastination: Approximately optimal algorithm configuration with runtime guarantees. In International Joint Conference on Artificial Intelligence (IJCAI), 2017.
  • Kleinberg et al. [2019] Robert Kleinberg, Kevin Leyton-Brown, Brendan Lucier, and Devon Graham. Procrastinating with confidence: Near-optimal, anytime, adaptive algorithm configuration. Annual Conference on Neural Information Processing Systems (NeurIPS), 2019.
  • Land and Doig [1960] Ailsa H Land and Alison G Doig. An automatic method of solving discrete programming problems. Econometrica, pages 497–520, 1960.
  • Li [1993] Wu Li. The sharp lipschitz constants for feasible and optimal solutions of a perturbed linear program. Linear algebra and its applications, 187:15–40, 1993.
  • Mangasarian and Shiau [1987] Olvi L Mangasarian and T-H Shiau. Lipschitz continuity of solutions of linear inequalities, programs and complementarity problems. SIAM Journal on Control and Optimization, 25(3):583–595, 1987.
  • Milnor [1964] John Milnor. On the Betti numbers of real varieties. Proceedings of the American Mathematical Society, 15(2):275–280, 1964.
  • Miyauchi et al. [2018] Atsushi Miyauchi, Tomohiro Sonobe, and Noriyoshi Sukegawa. Exact clustering via integer programming and maximum satisfiability. In AAAI Conference on Artificial Intelligence, 2018.
  • Nemhauser and Wolsey [1999] George Nemhauser and Laurence Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, 1999.
  • Pollard [1984] David Pollard. Convergence of Stochastic Processes. Springer, 1984.
  • Sandholm [2013] Tuomas Sandholm. Very-large-scale generalized combinatorial multi-attribute auctions: Lessons from conducting $60 billion of sourcing. In Zvika Neeman, Alvin Roth, and Nir Vulkan, editors, Handbook of Market Design. Oxford University Press, 2013.
  • Tang et al. [2020] Yunhao Tang, Shipra Agrawal, and Yuri Faenza. Reinforcement learning for integer programming: Learning to cut. International Conference on Machine Learning (ICML), 2020.
  • Thom [1965] René Thom. Sur l’homologie des varietes algebriques reelles. In Differential and combinatorial topology, pages 255–265. Princeton University Press, 1965.
  • Warren [1968] Hugh E Warren. Lower bounds for approximation by nonlinear manifolds. Transactions of the American Mathematical Society, 133(1):167–178, 1968.
  • Zeng et al. [2017] Jiaming Zeng, Berk Ustun, and Cynthia Rudin. Interpretable classification models for recidivism prediction. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(3):689–722, 2017.

Appendix A Further details about plots

The version of the facility location problem we study involves a set of locations JJ and a set of clients CC. Facilities are to be constructed at some subset of the locations, and the clients in CC are served by these facilities. Each location jJj\in J has a cost fjf_{j} of being the site of a facility, and a cost sc,js_{c,j} of serving client cCc\in C. Finally, each location jj has a capacity κj\kappa_{j} which is a limit on the number of clients jj can serve. The goal of the facility location problem is to arrive at a feasible set of locations for facilities and a feasible assignment of clients to these locations that minimizes the overall cost incurred.

The facility location problem can be formulated as the following 0,10,1 IP:

minimizejJfjxj+jJcCsc,jyc,jsubject tojJyc,j=1cCcCyc,jκjxjjJyc,j{0,1}cC,jJxj{0,1}jJ\begin{array}[]{ll}\text{minimize}&\displaystyle\sum_{j\in J}f_{j}x_{j}+\sum_{j\in J}\sum_{c\in C}s_{c,j}y_{c,j}\\ \text{subject to}&\displaystyle\sum_{j\in J}y_{c,j}=1\hfill\forall\,c\in C\\ &\displaystyle\sum_{c\in C}y_{c,j}\leq\kappa_{j}x_{j}\hfill\forall\,j\in J\\ &y_{c,j}\in\{0,1\}\hfill\qquad\forall\,c\in C,j\in J\\ &x_{j}\in\{0,1\}\hfill\forall\,j\in J\end{array}

We consider the following two distributions over facility location IPs.

First distribution

Facility location IPs are generated by perturbing the costs and capacities of a base facility location IP. We generated the base IP with 4040 locations and 4040 clients by choosing the location costs and client-location costs uniformly at random from [0,100][0,100] and the capacities uniformly at random from {0,,39}\{0,\ldots,39\}. To sample from the distribution, we perturb this base IP by adding independent Gaussian noise with mean 0 and standard deviation 1010 to the cost of each location, the cost of each client-location pair, and the capacity of each location.

Second distribution

Facility location IPs are generated by placing 8080 evenly-spaced locations along the line segment connecting the points (0,1/2)(0,1/2) and (1,1/2)(1,1/2) in the Cartesian plane. The location costs are all uniformly set to 11. Then, 8080 clients are placed uniformly at random in the unit square [0,1]2[0,1]^{2}. The cost sc,js_{c,j} of serving client cc from location jj is the distance between jj and cc. Location capacities are chosen uniformly at random from {0,,43}\{0,\ldots,43\}.

In our experiments, we add five cuts at the root of the B&C tree. These five cuts come from the set of Chvátal-Gomory and Gomory mixed integer cuts derived from the optimal simplex tableau of the LP relaxation. The five cuts added are chosen to maximize a weighting of cutting-plane scores:

μscore1+(1μ)score2.\mu\cdot\texttt{score}_{1}+(1-\mu)\cdot\texttt{score}_{2}. (10)

score1\texttt{score}_{1} is the parallelism of a cut, which intuitively measures the angle formed by the objective vector and the normal vector of the cutting plane—promoting cutting planes that are nearly parallel with the objective direction. score2\texttt{score}_{2} is the efficacy, or depth, of a cut, which measures the perpendicular distance from the LP optimum to the cut—promoting cutting planes that are “deeper”, as measured with respect to the LP optimum. More details about these scoring rules can be found in Balcan et al. [8] and references therein. Given an IP, for each μ[0,1]\mu\in[0,1] (discretized at steps of 0.010.01) we choose the five cuts among the set of Chvátal-Gomory and Gomory mixed integer cuts that maximize (10). Figures 1 and 2 display the average tree size over 1000 samples drawn from the respective distribution for each value of μ\mu used to choose cuts at the root. We ran our experiments using the C API of IBM ILOG CPLEX 20.1.0, with default cut generation disabled.

Appendix B Omitted results and proofs from Section 3

B.1 Example in two dimensions

Consider the LP

max{x+y:x1,y0,yx}.\max\{x+y:x\leq 1,y\geq 0,y\leq x\}.

The optimum is at (x,y)=(1,1)(x^{*},y^{*})=(1,1). Consider adding an additional constraint α1x+α2y1\alpha_{1}x+\alpha_{2}y\leq 1. Let hh denote the hyperplane α1x+α2y=1\alpha_{1}x+\alpha_{2}y=1. We derive a description of the set of parameters (α1,α2)(\alpha_{1},\alpha_{2}) such that hh intersects the hyperplanes x=1x=1 and y=xy=x. The intersection of hh and x=1x=1 is given by

(x,y)=(1,1α1α2),(x,y)=\left(1,\frac{1-\alpha_{1}}{\alpha_{2}}\right),

which exists if and only if α20\alpha_{2}\neq 0. This intersection point is in the LP feasible region if and only if 01α1α210\leq\frac{1-\alpha_{1}}{\alpha_{2}}\leq 1 (which additionally ensures that α20\alpha_{2}\neq 0). Similarly, hh intersects y=xy=x at

(x,y)=(1α1+α2,1α1+α2),(x,y)=\left(\frac{1}{\alpha_{1}+\alpha_{2}},\frac{1}{\alpha_{1}+\alpha_{2}}\right),

which exists if and only if α1+α20\alpha_{1}+\alpha_{2}\neq 0. This intersection point is in the LP feasible region if and only if 01α1+α210\leq\frac{1}{\alpha_{1}+\alpha_{2}}\leq 1. Now, we put down an “indifference” curve in (α1,α2)(\alpha_{1},\alpha_{2})-space that represents the set of (α1,α2)(\alpha_{1},\alpha_{2}) such that the value of the objective achieved at the two aforementioned intersection points is equal. This surface is given by

2α1+α2=1+1α1α2.\frac{2}{\alpha_{1}+\alpha_{2}}=1+\frac{1-\alpha_{1}}{\alpha_{2}}.

Since α1+α20\alpha_{1}+\alpha_{2}\neq 0 and α20\alpha_{2}\neq 0 (for the relevant α1,α2\alpha_{1},\alpha_{2} in consideration), this is equivalent to α12α22α1+α2=0,\alpha_{1}^{2}-\alpha_{2}^{2}-\alpha_{1}+\alpha_{2}=0, which is a degree-22 curve in α1,α2\alpha_{1},\alpha_{2}. The left-hand-side can be factored to write this as (α1α2)(α1+α21)=0(\alpha_{1}-\alpha_{2})(\alpha_{1}+\alpha_{2}-1)=0. Therefore, this curve is given by the two lines α1=α2\alpha_{1}=\alpha_{2} and α1+α2=1\alpha_{1}+\alpha_{2}=1. Figure 3 illustrates the resulting partition of (α1,α2)(\alpha_{1},\alpha_{2})-space.

Refer to caption
Figure 3: Decomposition of the parameter space: the blue region contains the set of (α1,α2)(\alpha_{1},\alpha_{2}) such that the constraint intersects the feasible region at x=1x=1 and x=yx=y. The red lines consist of all (α1,α2)(\alpha_{1},\alpha_{2}) such that the objective value is equal at these intersection points. The red lines partition the blue region into two components: one where the new optimum is achieved at the intersection of hh and x=yx=y, and one where the new optimum is achieved at the intersection of hh and x=1x=1.

It turns out that when n=2n=2 the indifference curve can always be factored into a product of linear terms. Let the objective of the LP be (c1,c2)(c_{1},c_{2}), and let s1x+s2y=u1s_{1}x+s_{2}y=u_{1} and t1x+t2y=v1t_{1}x+t_{2}y=v_{1} be two intersecting edges of the LP feasible region. Let α1x+α2y=β\alpha_{1}x+\alpha_{2}y=\beta be an additional constraint. The intersection points of this constraint with the two lines, if they exist, are given by

(s2βuα2s2α1s1α2,s1βuα1s1α2s2α1) and (t2βvα2t2α1t1α2,t2βvα1t1α2t2α1).\left(\frac{s_{2}\beta-u\alpha_{2}}{s_{2}\alpha_{1}-s_{1}\alpha_{2}},\frac{s_{1}\beta-u\alpha_{1}}{s_{1}\alpha_{2}-s_{2}\alpha_{1}}\right)\text{ and }\left(\frac{t_{2}\beta-v\alpha_{2}}{t_{2}\alpha_{1}-t_{1}\alpha_{2}},\frac{t_{2}\beta-v\alpha_{1}}{t_{1}\alpha_{2}-t_{2}\alpha_{1}}\right).

The indifference surface is thus given by

c1s2βuα2s2α1s1α2+c2s1βuα1s1α2s2α1=c1t2βvα2t2α1t1α2+c2t2βvα1t1α2t2α1.c_{1}\frac{s_{2}\beta-u\alpha_{2}}{s_{2}\alpha_{1}-s_{1}\alpha_{2}}+c_{2}\frac{s_{1}\beta-u\alpha_{1}}{s_{1}\alpha_{2}-s_{2}\alpha_{1}}=c_{1}\frac{t_{2}\beta-v\alpha_{2}}{t_{2}\alpha_{1}-t_{1}\alpha_{2}}+c_{2}\frac{t_{2}\beta-v\alpha_{1}}{t_{1}\alpha_{2}-t_{2}\alpha_{1}}.

For α1,α2\alpha_{1},\alpha_{2} such that s2α1s1α20s_{2}\alpha_{1}-s_{1}\alpha_{2}\neq 0 and t2α1t1α20t_{2}\alpha_{1}-t_{1}\alpha_{2}\neq 0, clearing denominators and some manipulation yields

(c1α2c2α1)((ut1vs1)α2(ut2vs2)α1+(s2t2t1s2)β)=0.(c_{1}\alpha_{2}-c_{2}\alpha_{1})((ut_{1}-vs_{1})\alpha_{2}-(ut_{2}-vs_{2})\alpha_{1}+(s_{2}t_{2}-t_{1}s_{2})\beta)=0.

This curve consists of the two planes c1α2c2α1=0c_{1}\alpha_{2}-c_{2}\alpha_{1}=0 and (ut1vs1)α2(ut2vs2)α1+(s2t2t1s2)β=0(ut_{1}-vs_{1})\alpha_{2}-(ut_{2}-vs_{2})\alpha_{1}+(s_{2}t_{2}-t_{1}s_{2})\beta=0.

This is however not true if n>2n>2. For example, consider an LP in three variables x,y,zx,y,z with the constraints x+y1,x+z1,x1,z1x+y\leq 1,x+z\leq 1,x\leq 1,z\leq 1. Writing out the indifference surface (assuming the objective is 𝒄=(1,1,1)T\bm{c}=(1,1,1)^{T}) for the vertex on the intersection of {x+y=1,x=1}\{x+y=1,x=1\} and the vertex on {x+z=1,z=1}\{x+z=1,z=1\} yields

α1α2α2βα32+α3β=0.\alpha_{1}\alpha_{2}-\alpha_{2}\beta-\alpha_{3}^{2}+\alpha_{3}\beta=0.

Setting β=1\beta=1, we can plot the resulting surface in α1,α2,α3\alpha_{1},\alpha_{2},\alpha_{3} (Figure 4).

Refer to caption
Figure 4: Indifference surface for two edges of the feasible region of an LP in three variables.

B.2 Linear programming sensitivity for multiple constraints

Lemma B.1.

Let (𝐜,A,𝐛)(\bm{c},A,\bm{b}) be an LP and let MM denote the set of its mm constraints. Let 𝐱𝖫𝖯\bm{x}^{*}_{\mathsf{LP}} and z𝖫𝖯z^{*}_{\mathsf{LP}} denote the optimal solution and its objective value, respectively. For FMF\subseteq M, let AF|F|×nA_{F}\in\mathbb{R}^{|F|\times n} and 𝐛F|F|\bm{b}_{F}\in\mathbb{R}^{|F|} denote the restrictions of AA and 𝐛\bm{b} to FF. For knk\leq n, 𝛂1,,𝛂kn\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}\in\mathbb{R}^{n}, β1,,βk\beta_{1},\ldots,\beta_{k}\in\mathbb{R}, and FMF\subseteq M with |F|=nk|F|=n-k, let AF,𝛂1,,𝛂kn×nA_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}}\in\mathbb{R}^{n\times n} denote the matrix obtained by adding row vectors 𝛂1,,𝛂k\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k} to AFA_{F} and let AF,𝛂1,β1,,𝛂k,βkin×nA^{i}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}}\in\mathbb{R}^{n\times n} be the matrix AF,𝛂1,,𝛂kn×nA_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}}\in\mathbb{R}^{n\times n} with the iith column replaced by [𝐛Fβ1βk]T\begin{bmatrix}\bm{b}_{F}&\beta_{1}&\cdots&\beta_{k}\end{bmatrix}^{T}. There is a set of at most KK hyperplanes, nKnmnnK^{n}m^{n} degree-KK polynomial hypersurfaces, and nKnm2nnK^{n}m^{2n} degree-2K2K polynomial hypersurfaces partitioning K(n+1)\mathbb{R}^{K(n+1)} into connected components such that for each component CC, one of the following holds: either (1) 𝐱𝖫𝖯(𝛂1T𝐱β1,,𝛂KT𝐱βK)=𝐱𝖫𝖯\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K})=\bm{x}^{*}_{\mathsf{LP}}, or (2) there is a subset of cuts indexed by 1,,k[K]\ell_{1},\ldots,\ell_{k}\in[K] and a set of constraints FMF\subseteq M with |F|=nk|F|=n-k such that

𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK)=(det(AF,𝜶1,β1,,𝜶k,βk1)det(AF,𝜶1,,𝜶k),,det(AF,𝜶1,β1,,𝜶k,βkn)det(AF,𝜶1,,𝜶k)),\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K})=\left(\frac{\det(A^{1}_{F,\bm{\alpha}_{\ell_{1}},\beta_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}},\beta_{\ell_{k}}})}{\det(A_{F,\bm{\alpha}_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}}})},\ldots,\frac{\det(A^{n}_{F,\bm{\alpha}_{\ell_{1}},\beta_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}},\beta_{\ell_{k}}})}{\det(A_{F,\bm{\alpha}_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}}})}\right),

for all (𝛂1,β1,,𝛂K,βK)C(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K})\in C.

Proof.

First, if none of 𝜶1T𝒙β1,,𝜶KT𝒙βK\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K} separate 𝒙𝖫𝖯\bm{x}^{*}_{\mathsf{LP}}, then 𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK)=𝒙𝖫𝖯\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K})=\bm{x}^{*}_{\mathsf{LP}} and z𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK)=z𝖫𝖯z^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K})=z^{*}_{\mathsf{LP}}. The set of all such cuts is given by the intersection of halfspaces in K(n+1)\mathbb{R}^{K(n+1)} given by

j=1K{(𝜶1,β1,,𝜶k,βk)K(n+1):𝜶jT𝒙𝖫𝖯βj}.\bigcap_{j=1}^{K}\left\{(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k})\in\mathbb{R}^{K(n+1)}:\bm{\alpha}_{j}^{T}\bm{x}^{*}_{\mathsf{LP}}\leq\beta_{j}\right\}. (11)

All other vectors of KK cuts contain at least one cut that separates 𝒙𝖫𝖯\bm{x}^{*}_{\mathsf{LP}}, and those cuts therefore pass through P={𝐱n:A𝐱𝐛,𝐱𝟎}\pazocal{P}=\{\bm{x}\in\mathbb{R}^{n}:A\bm{x}\leq\bm{b},\bm{x}\geq\bm{0}\}. The new LP optimum is thus achieved at a vertex created by the cuts that separate 𝒙𝖫𝖯\bm{x}^{*}_{\mathsf{LP}}. As in the proof of Theorem 3.1, we consider all possible new vertices formed by our set of KK cuts. In the case of a single cut, these new vertices necessarily were on edges of P\pazocal{P}, but now they may lie on higher dimensional faces.

Consider a subset of knk\leq n cuts that separate 𝒙𝖫𝖯\bm{x}^{*}_{\mathsf{LP}}. Without loss of generality, denote these cuts by 𝜶1T𝒙β1,,𝜶kT𝒙βk\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{k}^{T}\bm{x}\leq\beta_{k}. We now establish conditions for these kk cuts to “jointly” form a new vertex of P\pazocal{P}. Any vertex created by these cuts must lie on a face ff of P\pazocal{P} with dim(f)=k\dim(f)=k (in the case that k=nk=n, the relevant face ff with dim(f)=n\dim(f)=n is P\pazocal{P} itself). Letting MM denote the set of mm constraints that define P\pazocal{P}, each dimension-kk face ff of P\pazocal{P} can be identified with a (potentially empty) subset FMF\subset M of size nkn-k such that ff is precisely the set of all points 𝒙\bm{x} such that

𝒂iT𝒙=bi\displaystyle\bm{a}_{i}^{T}\bm{x}=b_{i}\qquad iF\displaystyle\forall\,i\in F
𝒂iT𝒙bi\displaystyle\bm{a}_{i}^{T}\bm{x}\leq b_{i}\qquad iMF,\displaystyle\forall\,i\in M\setminus F,

where 𝒂i\bm{a}_{i} is the iith row of AA. Let AFnk×nA_{F}\in\mathbb{R}^{n-k\times n} denote the restriction of AA to only the rows in FF, and let 𝒃Fnk\bm{b}_{F}\in\mathbb{R}^{n-k} denote the entries of 𝒃\bm{b} corresponding to the constraints in FF. Consider removing the inequality constraints defining the face. The intersection of the cuts 𝜶1T𝒙β1,,𝜶kT𝒙βk\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{k}^{T}\bm{x}\leq\beta_{k} and this unbounded surface (if it exists) is precisely the solution to the system of nn linear equations

AF𝒙\displaystyle A_{F}\bm{x} =𝒃F\displaystyle=\bm{b}_{F}
𝜶1T𝒙\displaystyle\bm{\alpha}_{1}^{T}\bm{x} =β1\displaystyle=\beta_{1}
\displaystyle\vdots
𝜶kT𝒙\displaystyle\bm{\alpha}_{k}^{T}\bm{x} =βk.\displaystyle=\beta_{k}.

Let AF,𝜶1,,𝜶kn×nA_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}}\in\mathbb{R}^{n\times n} denote the matrix obtained by adding row vectors 𝜶1,,𝜶k\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k} to AFA_{F}, and let AF,𝜶1,β1,,𝜶k,βkin×nA^{i}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}}\in\mathbb{R}^{n\times n} denote the matrix AF,𝜶1,,𝜶kA_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}} where the iith column is replaced by

[𝒃Fβ1βk]n.\begin{bmatrix}\bm{b}_{F}\\ \beta_{1}\\ \vdots\\ \beta_{k}\end{bmatrix}\in\mathbb{R}^{n}.

By Cramer’s rule, the solution to this system is given by

𝒙=(det(AF,𝜶1,β1,,𝜶k,βk1)det(AF,𝜶1,,𝜶k),,det(AF,𝜶1,β1,,𝜶k,βkn)det(AF,𝜶1,,𝜶k)),\bm{x}=\left(\frac{\det(A^{1}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}})}{\det(A_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}})},\ldots,\frac{\det(A^{n}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}})}{\det(A_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}})}\right),

and the value of the objective at this point is

𝒄T𝒙=i=1ncidet(AF,𝜶1,β1,,𝜶k,βki)det(AF,𝜶1,,𝜶k).\bm{c}^{T}\bm{x}=\sum_{i=1}^{n}c_{i}\cdot\frac{\det(A^{i}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}})}{\det(A_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}})}.

Now, to ensure that the unique intersection point 𝒙\bm{x} (1) exists and (2) actually lies on ff (or simply lies in P\pazocal{P}, in the case that F=F=\emptyset) , we stipulate that it satisfies the inequality constraints in MFM\setminus F. That is,

j=1naijdet(AF,𝜶1,β1,,𝜶k,βk1)det(AF,𝜶1,,𝜶k)bi\sum_{j=1}^{n}a_{ij}\frac{\det(A^{1}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}})}{\det(A_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}})}\leq b_{i} (12)

for every iMFi\in M\setminus F. If 𝜶1,β1,𝜶k,βk\bm{\alpha}_{1},\beta_{1}\ldots,\bm{\alpha}_{k},\beta_{k} satisfies any of these constraints, it must be that det(AF,𝜶1,,𝜶k)0\det(A_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}})\neq 0, which guarantees that AF𝒙=𝒃F,𝜶1T𝒙=β1,,𝜶kT𝒙=βkA_{F}\bm{x}=\bm{b}_{F},\bm{\alpha}^{T}_{1}\bm{x}=\beta_{1},\ldots,\bm{\alpha}^{T}_{k}\bm{x}=\beta_{k} indeed has a unique solution. Now, det(AF,𝜶1,,𝜶k)\det(A_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}}) is a polynomial in 𝜶1,,𝜶k\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k} of degree k\leq k, since it is multilinear in each coefficient of each 𝜶\bm{\alpha}_{\ell}, =1,,k\ell=1,\ldots,k. Similarly, det(AF,𝜶1,β1,,𝜶k,βk1)\det(A^{1}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}}) is a polynomial in 𝜶1,β1,,𝜶k,βk\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k} of degree k\leq k, again because it is multilinear in each cut parameter. Hence, the boundary each constraint of the form given by Equation 12 is a polynomial of degree at most kk.

The collection of these polynomials for every kk, every subset of {𝜶1T𝒙β1,,𝜶KT𝒙βK}\{\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K}\} of size kk, and every face of P\pazocal{P} of dimension kk, along with the hyperplanes determining separation constraints (Equation 11), partition K(n+1)\mathbb{R}^{K(n+1)} into connected components such that for all (𝜶1,β1,,𝜶K,βK)(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K}) within a given connected component, there is a fixed subset of KK and a fixed set of faces of P\pazocal{P} such that the cuts with indices in that subset intersect every face in the set at a common vertex.

Now, consider a single connected component, denoted by CC. Let f1,,ff_{1},\ldots,f_{\ell} denote the faces intersected by vectors of cuts in CC, and let (without loss of generality) 1,,k1,\ldots,k denote the subset of cuts that intersect these faces. Let F1,,FMF_{1},\ldots,F_{\ell}\subset M denote the sets of constraints that are binding at each of these faces, respectively. For each pair fp,fqf_{p},f_{q}, consider the surface

i=1ncidet(AFp,𝜶1,β1,,𝜶k,βki)det(AFp,𝜶1,,𝜶k)=i=1ncidet(AFq,𝜶1,β1,,𝜶k,βki)det(AFq,𝜶1,,𝜶k),\sum_{i=1}^{n}c_{i}\cdot\frac{\det(A^{i}_{F_{p},\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}})}{\det(A_{F_{p},\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}})}=\sum_{i=1}^{n}c_{i}\cdot\frac{\det(A^{i}_{F_{q},\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}})}{\det(A_{F_{q},\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}})},

which can be equivalently written as

i=1ncidet(AFp,𝜶1,β1,,𝜶k,βki)det(AFq,𝜶1,,𝜶k)=i=1ncidet(AFq,𝜶1,β1,,𝜶k,βki)det(AFp,𝜶1,,𝜶k).\sum_{i=1}^{n}c_{i}\cdot\det(A^{i}_{F_{p},\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}})\det(A_{F_{q},\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}})=\sum_{i=1}^{n}c_{i}\cdot\det(A^{i}_{F_{q},\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k}})\det(A_{F_{p},\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}}). (13)

This is a degree-2k2k polynomial hypersurface in (𝜶1,β1,,𝜶K,βK)K(n+1)(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K})\in\mathbb{R}^{K(n+1)}. This hypersurface is precisely the set of all cut vectors for which the LP objective achieved at the vertex on face fpf_{p} is equal to the LP objective value achieved at the vertex on face fqf_{q}. The collection of these surfaces for each p,qp,q partitions CC into further connected components. Within each of these connected components, the face containing the vertex that maximizes the objective is invariant, and the subset of cuts passing through that vertex is invariant. If FMF\subseteq M is the set of binding constraints representing this face, and 1,,k[K]\ell_{1},\ldots,\ell_{k}\in[K] represent the subset of cuts intersecting this face, 𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K}) and z𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK)z^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K}) have the closed forms:

𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK)=(det(AF,𝜶1,β1,,𝜶k,βk1)det(AF,𝜶1,,𝜶k),,det(AF,𝜶1,β1,,𝜶k,βkn)det(AF,𝜶1,,𝜶k)),\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K})=\left(\frac{\det(A^{1}_{F,\bm{\alpha}_{\ell_{1}},\beta_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}},\beta_{\ell_{k}}})}{\det(A_{F,\bm{\alpha}_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}}})},\ldots,\frac{\det(A^{n}_{F,\bm{\alpha}_{\ell_{1}},\beta_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}},\beta_{\ell_{k}}})}{\det(A_{F,\bm{\alpha}_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}}})}\right),

and

z𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK)=i=1ncidet(AF,𝜶1,β1,,𝜶k,βki)det(AF,𝜶1,,𝜶k).z^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K})=\sum_{i=1}^{n}c_{i}\cdot\frac{\det(A^{i}_{F,\bm{\alpha}_{\ell_{1}},\beta_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}},\beta_{\ell_{k}}})}{\det(A_{F,\bm{\alpha}_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}}})}.

for all (𝜶1,β1,,𝜶K,βK)(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K}) within this component. We now count the number of surfaces used to obtain our decomposition. First, we added KK hyperplanes encoding separation constraints for each of the KK cuts (Equation 11). Then, for every subset SKS\subseteq K of size n\leq n, and for every face FF of P\pazocal{P} with dim(F)=|S|\dim(F)=|S|, we first considered at most |MF|m|M\setminus F|\leq m degree-K\leq K polynomial hypersurfaces representing decision boundaries for when cuts in SS intersected that face (Equation 12). The number of kk-dimensional faces of P\pazocal{P} is at most (mnk)mnkmn1\binom{m}{n-k}\leq m^{n-k}\leq m^{n-1}, so the total number of these hypersurfaces is at most ((K0)++(Kn))mnnKnmn(\binom{K}{0}+\cdots+\binom{K}{n})m^{n}\leq nK^{n}m^{n}. Finally, we considered a degree-2K2K polynomial hypersurface for every subset of cuts and every pair of faces with degree equal to the size of the subset, of which there are at most nKn(mn2)nKnm2nnK^{n}\binom{m^{n}}{2}\leq nK^{n}m^{2n}. ∎

Appendix C Omitted results and proofs from Section 4

Proof of Lemma 4.1.

Consider as an example σ1={𝒙[1]1,𝒙[1]5}\sigma_{1}=\{\bm{x}[1]\leq 1,\bm{x}[1]\leq 5\} and σ2={𝒙[1]1}\sigma_{2}=\{\bm{x}[1]\leq 1\}. We have 𝒙𝖫𝖯(𝜶T𝒙β,σ1)=𝒙𝖫𝖯(𝜶T𝒙β,σ2)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{1})=\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{2}) for any cut 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta, because the constraint 𝒙[1]5\bm{x}[1]\leq 5 is redundant in σ1\sigma_{1}. More generally, any σBC\sigma\subseteq\pazocal{B}\pazocal{C} can be reduced by preserving only the tightest \leq constraint and tightest \geq constraint without affecting the resulting LP optimal solutions. The number of such unique reduced sets is at most ((τ+2)2)n<τ3n((\tau+2)^{2})^{n}<\tau^{3n} (for each variable, there are τ+2\tau+2 possibilities for the tightest \leq constraint: no constraint or one of 𝒙[i]0,,𝒙[i]τ\bm{x}[i]\leq 0,\ldots,\bm{x}[i]\leq\tau, and similarly τ+2\tau+2 possibilities for the \geq constraint). ∎

Proof of Lemma 4.2.

We carry out the same reasoning in the proof of Theorem 3.1 for each reduced σ\sigma. The number of edges of P(σ)\pazocal{P}(\sigma) is at most (m+|σ|n1)(m+|σ|)n1\binom{m+|\sigma|}{n-1}\leq(m+|\sigma|)^{n-1}. For each edge EE, we considered at most |(Mσ)E|m+|σ||(M\cup\sigma)\setminus E|\leq m+|\sigma| hyperplanes, for a total of at most (m+|σ|)n(m+|\sigma|)^{n} halfspaces. Then, we had a degree-22 polynomial hypersurface for every pair of edges, of which there are at most ((m+|σ|)n2)(m+|σ|)2n\binom{(m+|\sigma|)^{n}}{2}\leq(m+|\sigma|)^{2n}. Summing over all reduced σ\sigma (of which there are at most τ3n\tau^{3n}), combined with the fact that if σ\sigma is reduced then |σ|2n|\sigma|\leq 2n, we get a total of at most (m+2n)nτ3n(m+2n)^{n}\tau^{3n} hyperplanes and at most (m+2n)2nτ3n(m+2n)^{2n}\tau^{3n} degree-22 hypersurfaces, as desired. ∎

Proof of Lemma 4.4.

Fix a connected component CC in the decomposition that includes the facets defining V\pazocal{V} and the surfaces obtained in Lemma 4.3. For all σBC\sigma\in\pazocal{B}\pazocal{C}, 𝒙𝖨P𝖨\bm{x}_{\mathsf{I}}\in\pazocal{P}_{\mathsf{I}}, and i=1,,ni=1,\ldots,n, consider the surface

𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=𝒙𝖨[i].\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]=\bm{x}_{\mathsf{I}}[i]. (14)

This surface is a hyperplane, since by Lemma 4.2, either 𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=𝒙𝖫𝖯(σ)[i]\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]=\bm{x}^{*}_{\mathsf{LP}}(\sigma)[i] or 𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=det(AE,𝜶,β,σi)det(AE,𝜶,σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]=\frac{\det(A^{i}_{E,\bm{\alpha},\beta,\sigma})}{\det(A_{E,\bm{\alpha},\sigma})}, where EMσE\subseteq M\cup\sigma is the subset of constraints corresponding to σ\sigma and CC. Clearly, within any connected component of CC induced by these hyperplanes, for every σ\sigma and 𝒙𝖨P𝖨\bm{x}_{\mathsf{I}}\in\pazocal{P}_{\mathsf{I}}, 𝟏[𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖨]\mathbf{1}[\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}_{\mathsf{I}}] is invariant. Finally, if 𝒙𝖫𝖯(𝜶T𝒙β,σ)n\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)\in\mathbb{Z}^{n} for some cut 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta within a given connected component, 𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖨\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}_{\mathsf{I}} for some 𝒙𝖨P𝖨𝖧(σ)P𝖨\bm{x}_{\mathsf{I}}\in\pazocal{P}_{\mathsf{IH}}(\sigma)\subseteq\pazocal{P}_{\mathsf{I}}, which means that 𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖨n\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}_{\mathsf{I}}\in\mathbb{Z}^{n} for all cuts 𝜶T𝒙β\bm{\alpha}^{T}\bm{x}\leq\beta in that connected component.

We now count the number of hyperplanes given by Equation 14. For each σ\sigma, there are (m+|σ|n1)(m+2n)n1\binom{m+|\sigma|}{n-1}\leq(m+2n)^{n-1} binding edge constraints EMσE\subseteq M\cup\sigma defining the formula of Lemma 4.2, and we have n|P𝖨|n|\pazocal{P}_{\mathsf{I}}| hyperplanes for each EE. Since τ=max𝒙P𝖨𝒙\tau=\max_{\bm{x}\in\pazocal{P}_{\mathsf{I}}}\left\lVert\bm{x}\right\rVert_{\infty}, |P𝖨|τn|\pazocal{P}_{\mathsf{I}}|\leq\tau^{n}. So the total number of hyperplanes given by Equation 14 is at most τ3n(m+2n)n1nτn(m+2n)nτ4n.\tau^{3n}(m+2n)^{n-1}n\tau^{n}\leq(m+2n)^{n}\tau^{4n}. The number of facets defining V\pazocal{V} is at most |P𝖨𝖧||P𝖨|τn|\pazocal{P}_{\mathsf{IH}}|\leq|\pazocal{P}_{\mathsf{I}}|\leq\tau^{n}. Adding these to the counts obtained in Lemma 4.3 yields the final tallies in the lemma statement. ∎

Proof of Theorem 4.5.

Fix a connected component CC in the decomposition induced by the set of hyperplanes and degree-22 hypersurfaces established in Lemma 4.4. Let

Q1,,Qi1,I1,Qi1+1,,Qi2,I2,Qi2+1,Q_{1},\ldots,Q_{i_{1}},I_{1},Q_{i_{1}+1},\ldots,Q_{i_{2}},I_{2},Q_{i_{2}+1},\ldots (15)

denote the nodes of the tree branch-and-cut creates, in order of exploration, under the assumption that a node is pruned if and only if either the LP at that node is infeasible or the LP optimal solution is integral (so the “bounding” of branch-and-bound is suppressed). Here, a node is identified by the list σ\sigma of branching constraints added to the input IP. Nodes labeled by QQ are either infeasible or have fractional LP optimal solutions. Nodes labeled by II have integral LP optimal solutions and are candidates for the incumbent integral solution at the point they are encountered. (The nodes are functions of 𝜶\bm{\alpha} and β\beta, as are the indices i1,i2,i_{1},i_{2},\ldots.) By Lemma 4.4 and the observation following it, this ordered list of nodes is invariant over all (𝜶,β)C(\bm{\alpha},\beta)\in C.

Now, given an node index \ell, let I()I(\ell) denote the incumbent node with the highest objective value encountered up until the \ellth node searched by B&C, and let z(I())z(I(\ell)) denote its objective value. For each node QQ_{\ell}, let σ\sigma_{\ell} denote the branching constraints added to arrive at node QQ_{\ell}. The hyperplane

z𝖫𝖯(𝜶T𝒙β,σ)=z(I())z^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{\ell})=z(I(\ell)) (16)

(which is a hyperplane due to Lemma 4.2) partitions CC into two subregions. In one subregion, z𝖫𝖯(𝜶T𝒙β,σ)z(I())z^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{\ell})\leq z(I(\ell)), that is, the objective value of the LP optimal solution is no greater than the objective value of the current incumbent integer solution, and so the subtree rooted at QQ_{\ell} is pruned. In the other subregion, z𝖫𝖯(𝜶T𝒙β,σ)>z(I())z^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{\ell})>z(I(\ell)), and QQ_{\ell} is branched on further. Therefore, within each connected component of CC induced by all hyperplanes given by Equation 16 for all \ell, the set of node within the list (15) that are pruned is invariant. Combined with the surfaces established in Lemma 4.4, these hyperplanes partition n+1\mathbb{R}^{n+1} into connected components such that as (𝜶,β)(\bm{\alpha},\beta) varies within a given component, the tree built by branch-and-cut is invariant.

Finally, we count the total number of surfaces inducing this partition. Unlike the counting stages of the previous lemmas, we will first have to count the number of connected components induced by the surfaces established in Lemma 4.4. This is because the ordered list of nodes explored by branch-and-cut (15) can be different across each component, and the hyperplanes given by Equation 16 depend on this list. From Lemma 4.4 we have 3(m+2n)nτ4n3(m+2n)^{n}\tau^{4n} hyperplanes, 3(m+2n)3nτ4n3(m+2n)^{3n}\tau^{4n} degree-22 polynomial hypersurfaces, and (m+2n)6nτ4n(m+2n)^{6n}\tau^{4n} degree-55 polynomial hypersurfaces. To determine the connected components of n+1\mathbb{R}^{n+1} induced by the zero sets of these polynomials, it suffices to consider the zero set of the product of all polynomials defining these surfaces. Denote this product polynomial by pp. The degree of the product polynomial is the sum of the degrees of 3(m+2n)nτ4n3(m+2n)^{n}\tau^{4n} degree-11 polynomials, 3(m+2n)3nτ4n3(m+2n)^{3n}\tau^{4n} degree-22 polynomials, and (m+2n)6nτ4n(m+2n)^{6n}\tau^{4n} degree-55 polynomials, which is at most 3(m+2n)nτ4n+23(m+2n)3nτ4n+5(m+2n)6nτ4n<14(m+2n)3nτ4n3(m+2n)^{n}\tau^{4n}+2\cdot 3(m+2n)^{3n}\tau^{4n}+5\cdot(m+2n)^{6n}\tau^{4n}<14(m+2n)^{3n}\tau^{4n}. By Warren’s theorem, the number of connected components of n+1{(𝜶,β):p(𝜶,β)=0}\mathbb{R}^{n+1}\setminus\{(\bm{\alpha},\beta):p(\bm{\alpha},\beta)=0\} is O((14(m+2n)3nτ4n)n1)O((14(m+2n)^{3n}\tau^{4n})^{n-1}), and by the Milnor-Thom theorem, the number of connected components of {(𝜶,β):p(𝜶,β)=0}\{(\bm{\alpha},\beta):p(\bm{\alpha},\beta)=0\} is O((14(m+2n)3nτ4n)n1)O((14(m+2n)^{3n}\tau^{4n})^{n-1}) as well. So, the number of connected components induced by the surfaces in Lemma 4.4 is O(14n(m+2n)3n2τ4n2).O(14^{n}(m+2n)^{3n^{2}}\tau^{4n^{2}}). For every connected component CC in Lemma 4.4, the closed form of z𝖫𝖯(𝜶T𝒙β,σ)z^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{\ell}) is already determined due to Lemma 4.2, and so the number of hyperplanes given by Equation 16 is at most the number of possible σBC\sigma\subseteq\pazocal{B}\pazocal{C}, which is at most τ3n\tau^{3n}. So across all connected components CC, the total number of hyperplanes given by Equation 16 is O(14n(m+2n)3n2τ5n2).O(14^{n}(m+2n)^{3n^{2}}\tau^{5n^{2}}). Finally, adding this to the surface-counts established in Lemma 4.4 yields the lemma statement. ∎

C.1 Product scoring rule for variable selection

Let σ\sigma be the set of branching constraints added thus far. The product scoring rule branches on the variable i[n]i\in[n] that maximizes:

max{z𝖫𝖯(σ)z𝖫𝖯(xix𝖫𝖯(σ)[i],σ),γ}max{z𝖫𝖯(σ)z𝖫𝖯(xix𝖫𝖯(σ)[i],σ),γ},\max\{z_{\mathsf{LP}}^{*}(\sigma)-z_{\mathsf{LP}}^{*}(x_{i}\leq\lfloor x_{\mathsf{LP}}^{*}(\sigma)[i]\rfloor,\sigma),\gamma\}\cdot\max\{z_{\mathsf{LP}}^{*}(\sigma)-z_{\mathsf{LP}}^{*}(x_{i}\geq\lceil x_{\mathsf{LP}}^{*}(\sigma)[i]\rceil,\sigma),\gamma\},

where γ=106\gamma=10^{-6}.

Lemma C.1.

There is a set of of at most 3(m+2n)nτ3n3(m+2n)^{n}\tau^{3n} hyperplanes and (m+2n)2nτ3n(m+2n)^{2n}\tau^{3n} degree-22 polynomial hypersurfaces partitioning n+1\mathbb{R}^{n+1} into connected components such that for any connected component CC and any σ\sigma, the set of branching constraints {xi𝐱𝖫𝖯(𝛂T𝐱β,σ)[i],xi𝐱𝖫𝖯(𝛂T𝐱β,σ)[i]i[n]}\{x_{i}\leq\left\lfloor\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rfloor,x_{i}\geq\left\lceil\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rceil\mid i\in[n]\} is invariant across all (𝛂,β)C(\bm{\alpha},\beta)\in C.

Proof.

Fix a connected component CC in the decomposition established in Lemma 4.2. By Lemma 4.2, for each σ\sigma, either 𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖫𝖯(σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}^{*}_{\mathsf{LP}}(\sigma) or there exists EMσE\subseteq M\cup\sigma such that 𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=det(AE,𝜶,β,σi)det(AE,𝜶,σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]=\frac{\det(A^{i}_{E,\bm{\alpha},\beta,\sigma})}{\det(A_{E,\bm{\alpha},\sigma})} for all (𝜶,β)C(\bm{\alpha},\beta)\in C. Fix a variable i[n]i\in[n], which corresponds to two branching constraints

xi𝒙𝖫𝖯(𝜶T𝒙β,σ)[i] and xi𝒙𝖫𝖯(𝜶T𝒙β,σ)[i].x_{i}\leq\left\lfloor\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rfloor\text{ and }x_{i}\geq\left\lceil\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rceil. (17)

If CC is a component where 𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖫𝖯(σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}^{*}_{\mathsf{LP}}(\sigma), then these two branching constraints are trivially invariant over (𝜶,β)C(\bm{\alpha},\beta)\in C. Otherwise, in order to further decompose CC such that the right-hand-sides of these constraints are invariant for every σ\sigma, we add the two decision boundaries given by

kdet(AE,𝜶,β,σi)det(AE,𝜶,σ)k+1k\leq\frac{\det(A^{i}_{E,\bm{\alpha},\beta,\sigma})}{\det(A_{E,\bm{\alpha},\sigma})}\leq k+1

for every ii, σ\sigma, and every integer k=0,,τ1k=0,\ldots,\tau-1, where τ=max𝒙Pn𝒙\tau=\max_{\bm{x}\in\pazocal{P}\cap\mathbb{Z}^{n}}\left\lVert\bm{x}\right\rVert_{\infty}. This ensures that within every connected component of CC induced by these boundaries (hyperplanes),

𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=det(AE,𝜶,β,σi)det(AE,𝜶,σ) and 𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=det(AE,𝜶,β,σi)det(AE,𝜶,σ)\left\lfloor\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rfloor=\left\lfloor\frac{\det(A^{i}_{E,\bm{\alpha},\beta,\sigma})}{\det(A_{E,\bm{\alpha},\sigma})}\right\rfloor\text{ and }\left\lceil\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rceil=\left\lceil\frac{\det(A^{i}_{E,\bm{\alpha},\beta,\sigma})}{\det(A_{E,\bm{\alpha},\sigma})}\right\rceil

are invariant, so the branching constraints from Equation (17) are invariant. For a fixed σ\sigma, there are two hyperplanes for every EMσE\subseteq M\cup\sigma corresponding to an edge of P(σ)\pazocal{P}(\sigma) and i=1,,ni=1,\ldots,n, for a total of at most 2n(m+|σ|n1)2n(m+|σ|)n12n\binom{m+|\sigma|}{n-1}\leq 2n(m+|\sigma|)^{n-1} hyperplanes. Summing over all reduced σ\sigma, we get a total of 2n(m+2n)n1τ3n<2(m+2n)nτ3n2n(m+2n)^{n-1}\tau^{3n}<2(m+2n)^{n}\tau^{3n} hyperplanes. Adding these hyperplanes to the set of hyperplanes established in Lemma 4.2 yields the lemma statement. ∎

Proof of Lemma 4.3.

Fix a connected component CC in the decomposition established in Lemma C.1. We know that for each set of branching constraints σ\sigma:

  • By Lemma 4.2, either 𝒙𝖫𝖯(𝜶T𝒙β,σ)=𝒙𝖫𝖯(σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)=\bm{x}^{*}_{\mathsf{LP}}(\sigma) or there exists EMσE\subseteq M\cup\sigma such that 𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=det(AE,𝜶,β,σi)det(AE,𝜶,σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]=\frac{\det(A^{i}_{E,\bm{\alpha},\beta,\sigma})}{\det(A_{E,\bm{\alpha},\sigma})} for all (𝜶,β)C(\bm{\alpha},\beta)\in C and all i[n]i\in[n], and

  • The set of branching constraints {xi𝒙𝖫𝖯(𝜶T𝒙β,σ)[i],xi𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]i[n]}\{x_{i}\leq\left\lfloor\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rfloor,x_{i}\geq\left\lceil\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rceil\mid i\in[n]\} is invariant across all (𝜶,β)C(\bm{\alpha},\beta)\in C.

Suppose that σ\sigma is the list of branching constraints added so far. For any variable k[n]k\in[n], let

σk=(xk𝒙𝖫𝖯(𝜶T𝒙β,σ)[k],σ) and σk+=(xk𝒙𝖫𝖯(𝜶T𝒙β,σ)[k],σ).\sigma_{k}^{-}=(x_{k}\leq\left\lfloor\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[k]\right\rfloor,\sigma)\text{ and }\sigma_{k}^{+}=(x_{k}\geq\left\lceil\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[k]\right\rceil,\sigma).

So long as (𝜶,β)C(\bm{\alpha},\beta)\in C, σk\sigma_{k}^{-} and σk+\sigma_{k}^{+} are fixed. With this notation, we can write the product scoring rule as

max{z𝖫𝖯(𝜶T𝒙β,σ)z𝖫𝖯(𝜶T𝒙β,σk),γ}max{z𝖫𝖯(𝜶T𝒙β,σ)z𝖫𝖯(𝜶T𝒙β,σk+),γ},\max\{z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)-z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{k}^{-}),\gamma\}\cdot\max\{z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)-z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{k}^{+}),\gamma\},

where γ=106\gamma=10^{-6}.

By Lemma 4.2, we know that across all (𝜶,β)C(\bm{\alpha},\beta)\in C, either z𝖫𝖯(𝜶T𝒙β,σk+)=z𝖫𝖯(σk+)z^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{k}^{+})=z^{*}_{\mathsf{LP}}(\sigma_{k}^{+}) or there exists Ek+Mσk+E_{k}^{+}\subseteq M\cup\sigma_{k}^{+} such that

z𝖫𝖯(𝜶T𝒙β,σk+)=i=1ncidet(AEk+,𝜶,β,σk+i)det(AEk+,𝜶,σk+),z^{*}_{\mathsf{LP}}\left(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{k}^{+}\right)=\sum_{i=1}^{n}c_{i}\cdot\frac{\det\left(A^{i}_{E_{k}^{+},\bm{\alpha},\beta,\sigma_{k}^{+}}\right)}{\det\left(A_{E_{k}^{+},\bm{\alpha},\sigma_{k}^{+}}\right)},

and similarly for σk\sigma_{k}^{-}, defined according to some edge set EkMσkE_{k}^{-}\subseteq M\cup\sigma_{k}^{-}. Therefore, for each k[n]k\in[n], there is a single degree-2 polynomial hypersurface partitioning CC into connected components such that within each connected component, either

z𝖫𝖯(𝜶T𝒙β,σ)z𝖫𝖯(𝜶T𝒙β,σk)γz_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)-z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{k}^{-})\geq\gamma (18)

or vice versa, and similarly for σk+\sigma_{k}^{+}. In particular, the former hypersurface will have one of four forms:

  1. 1.

    z𝖫𝖯(σ)z𝖫𝖯(σk)γz^{*}_{\mathsf{LP}}(\sigma)-z^{*}_{\mathsf{LP}}(\sigma_{k}^{-})\geq\gamma, which is uniformly satisfied or not satisfied across all (𝜶,β)C(\bm{\alpha},\beta)\in C,

  2. 2.

    z𝖫𝖯(σ)i=1ncidet(AEk,𝜶,β,σki)det(AEk,𝜶,σk)γz^{*}_{\mathsf{LP}}(\sigma)-\sum_{i=1}^{n}c_{i}\cdot\frac{\det\left(A^{i}_{E_{k}^{-},\bm{\alpha},\beta,\sigma_{k}^{-}}\right)}{\det\left(A_{E_{k}^{-},\bm{\alpha},\sigma_{k}^{-}}\right)}\geq\gamma, which is a hyperplane,

  3. 3.

    i=1ncidet(AE,𝜶,β,σi)det(AE,𝜶,σ)z𝖫𝖯(σk)γ\sum_{i=1}^{n}c_{i}\cdot\frac{\det\left(A^{i}_{E,\bm{\alpha},\beta,\sigma}\right)}{\det\left(A_{E,\bm{\alpha},\sigma}\right)}-z^{*}_{\mathsf{LP}}(\sigma_{k}^{-})\geq\gamma, which is a hyperplane, or

  4. 4.

    i=1nci(det(AE,𝜶,β,σi)det(AE,𝜶,σ)det(AEk,𝜶,β,σki)det(AEk+,𝜶,σk))γ\sum_{i=1}^{n}c_{i}\left(\frac{\det\left(A^{i}_{E,\bm{\alpha},\beta,\sigma}\right)}{\det\left(A_{E,\bm{\alpha},\sigma}\right)}-\frac{\det\left(A^{i}_{E_{k}^{-},\bm{\alpha},\beta,\sigma_{k}^{-}}\right)}{\det\left(A_{E_{k}^{+},\bm{\alpha},\sigma_{k}^{-}}\right)}\right)\geq\gamma, which is a degree-2 polynomial hypersurface.

Simply said, these are all degree-2 polynomial hypersurfaces.

Within any region induced by these hypersurfaces, the comparison between any two variables xkx_{k} and xjx_{j} will have the form

max{z𝖫𝖯(𝜶T𝒙β,σ)z𝖫𝖯(𝜶T𝒙β,σk),γ}max{z𝖫𝖯(𝜶T𝒙β,σ)z𝖫𝖯(𝜶T𝒙β,σk+),γ}\displaystyle\max\{z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)-z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{k}^{-}),\gamma\}\cdot\max\{z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)-z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{k}^{+}),\gamma\}
\displaystyle\geq\, max{z𝖫𝖯(𝜶T𝒙β,σ)z𝖫𝖯(𝜶T𝒙β,σj),γ}max{z𝖫𝖯(𝜶T𝒙β,σ)z𝖫𝖯(𝜶T𝒙β,σj+),γ}\displaystyle\max\{z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)-z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{j}^{-}),\gamma\}\cdot\max\{z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)-z_{\mathsf{LP}}^{*}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{j}^{+}),\gamma\}

which at its most complex will equal

i=1nci(det(AE,𝜶,β,σi)det(AE,𝜶,σ)det(AEk,𝜶,β,σki)det(AEk,𝜶,σk))i=1nci(det(AE,𝜶,β,σi)det(AE,𝜶,σ)det(AEk+,𝜶,β,σk+i)det(AEk+,𝜶,σk+))\displaystyle\sum_{i=1}^{n}c_{i}\left(\frac{\det\left(A^{i}_{E,\bm{\alpha},\beta,\sigma}\right)}{\det\left(A_{E,\bm{\alpha},\sigma}\right)}-\frac{\det\left(A^{i}_{E_{k}^{-},\bm{\alpha},\beta,\sigma_{k}^{-}}\right)}{\det\left(A_{E_{k}^{-},\bm{\alpha},\sigma_{k}^{-}}\right)}\right)\cdot\sum_{i=1}^{n}c_{i}\left(\frac{\det\left(A^{i}_{E,\bm{\alpha},\beta,\sigma}\right)}{\det\left(A_{E,\bm{\alpha},\sigma}\right)}-\frac{\det\left(A^{i}_{E_{k}^{+},\bm{\alpha},\beta,\sigma_{k}^{+}}\right)}{\det\left(A_{E_{k}^{+},\bm{\alpha},\sigma_{k}^{+}}\right)}\right) (19)
\displaystyle\geq i=1nci(det(AE,𝜶,β,σi)det(AE,𝜶,σ)det(AEj,𝜶,β,σji)det(AEj,𝜶,σj))i=1nci(det(AE,𝜶,β,σi)det(AE,𝜶,σ)det(AEj+,𝜶,β,σj+i)det(AEj+,𝜶,σj+)).\displaystyle\sum_{i=1}^{n}c_{i}\left(\frac{\det\left(A^{i}_{E,\bm{\alpha},\beta,\sigma}\right)}{\det\left(A_{E,\bm{\alpha},\sigma}\right)}-\frac{\det\left(A^{i}_{E_{j}^{-},\bm{\alpha},\beta,\sigma_{j}^{-}}\right)}{\det\left(A_{E_{j}^{-},\bm{\alpha},\sigma_{j}^{-}}\right)}\right)\cdot\sum_{i=1}^{n}c_{i}\left(\frac{\det\left(A^{i}_{E,\bm{\alpha},\beta,\sigma}\right)}{\det\left(A_{E,\bm{\alpha},\sigma}\right)}-\frac{\det\left(A^{i}_{E_{j}^{+},\bm{\alpha},\beta,\sigma_{j}^{+}}\right)}{\det\left(A_{E_{j}^{+},\bm{\alpha},\sigma_{j}^{+}}\right)}\right).

This inequality can be written as a degree-5 polynomial hypersurface. In any region induced by these hypersurfaces, the variable that branch-and-cut branches on will be fixed.

We now count the total number of hypersurfaces. First, we count the number of degree-2 polynomial hypersurfaces from Equation (18): there is a hypersurface defined by each variable xkx_{k}, set of branching constraints σ\sigma, cutoff t[τ]t\in[\tau] such that σk=(xkt,σ)\sigma_{k}^{-}=(x_{k}\leq t,\sigma), set EMσE\subseteq M\cup\sigma corresponding to an edge of P(σ)\pazocal{P}(\sigma), and set EkMσkE_{k}^{-}\subseteq M\cup\sigma_{k}^{-} (and similarly for σk+\sigma_{k}^{+} and Ek+E_{k}^{+}). For a fixed σ\sigma, this amounts to 2nτ(m+|σ|n1)(m+|σ|+1n1)2nτ(m+|σ|+1)2(n1)2n\tau\binom{m+|\sigma|}{n-1}\binom{m+|\sigma|+1}{n-1}\leq 2n\tau(m+|\sigma|+1)^{2(n-1)} hypersurfaces. Summing over all τ3n\tau^{3n} reduced σ\sigma, we have 2nτ3n+1(m+2n+1)2(n1)2n\tau^{3n+1}(m+2n+1)^{2(n-1)} degree-2 polynomial hypersurfaces.

Next, we count the number of degree-5 polynomial hypersurfaces from Equation (19): there is a hypersurface defined by each pair of variables xk,xjx_{k},x_{j}, set of branching constraints σ\sigma, cutoffs tk,tj[τ]t_{k},t_{j}\in[\tau] such that σk=(xktk,σ)\sigma_{k}^{-}=(x_{k}\leq t_{k},\sigma) and σj=(xjtj,σ)\sigma_{j}^{-}=(x_{j}\leq t_{j},\sigma), and sets E,Ek,Ek+,Ej,Ej+E,E_{k}^{-},E_{k}^{+},E_{j}^{-},E_{j}^{+} corresponding to edges of P(σ),P(σk),P(σk+),P(σj),P(σj+)\pazocal{P}(\sigma),\pazocal{P}(\sigma_{k}^{-}),\pazocal{P}(\sigma_{k}^{+}),\pazocal{P}(\sigma_{j}^{-}),\pazocal{P}(\sigma_{j}^{+}). For a fixed σ\sigma, this amounts to n2τ2(m+|σ|n1)(m+|σ|+1n1)4n2τ2(m+|σ|+1)5(n1)n^{2}\tau^{2}\binom{m+|\sigma|}{n-1}\binom{m+|\sigma|+1}{n-1}^{4}\leq n^{2}\tau^{2}(m+|\sigma|+1)^{5(n-1)} hypersurfaces. Summing over all τ3n\tau^{3n} reduced σ\sigma, we have n2τ3n+2(m+2n+1)5(n1)n^{2}\tau^{3n+2}(m+2n+1)^{5(n-1)} degree-5 polynomial hypersurfaces.

Adding these hypersurfaces to those from Lemma C.1, we get the lemma statement. ∎

C.2 Extension to multiple cutting planes

We can similarly derive a multi-cut version of Lemma 4.2 that controls 𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma) for any set of branching constraints. We use the following notation. Let (𝒄,A,𝒃)(\bm{c},A,\bm{b}) be an LP and let MM denote the set of its mm constraints. For FMσF\subseteq M\cup\sigma, let AF,σ|F|×nA_{F,\sigma}\in\mathbb{R}^{|F|\times n} and 𝒃F,σ|F|\bm{b}_{F,\sigma}\in\mathbb{R}^{|F|} denote the restrictions of AσA_{\sigma} and 𝒃σ\bm{b}_{\sigma} to FF. For 𝜶1,,𝜶kn\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k}\in\mathbb{R}^{n}, β1,,βk\beta_{1},\ldots,\beta_{k}\in\mathbb{R}, and FMσF\subseteq M\cup\sigma with |F|=nk|F|=n-k, let AF,𝜶1,,𝜶k,σn×nA_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k},\sigma}\in\mathbb{R}^{n\times n} denote the matrix obtained by adding row vectors 𝜶1,,𝜶k\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k} to AF,σA_{F,\sigma} and let AF,𝜶1,β1,,𝜶k,βk,σin×nA^{i}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k},\sigma}\in\mathbb{R}^{n\times n} be the matrix AF,𝜶1,,𝜶k,σn×nA_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k},\sigma}\in\mathbb{R}^{n\times n} with the iith column replaced by [𝒃F,σβ1βk]T\begin{bmatrix}\bm{b}_{F,\sigma}&\beta_{1}&\cdots&\beta_{k}\end{bmatrix}^{T}.

Corollary C.2.

Fix an IP (𝐜,A,𝐛)(\bm{c},A,\bm{b}). There is a set of at most KK hyperplanes, nKn(m+2n)nτ3nnK^{n}(m+2n)^{n}\tau^{3n} degree-KK polynomial hypersurfaces, and nKn(m+2n)2nτ3nnK^{n}(m+2n)^{2n}\tau^{3n} degree-2K2K polynomial hypersurfaces partitioning K(n+1)\mathbb{R}^{K(n+1)} into connected components such that for each component CC and every σBC\sigma\subseteq\pazocal{B}\pazocal{C}, one of the following holds: either (1) 𝐱𝖫𝖯(𝛂1T𝐱β1,,𝛂KT𝐱βK,σ)=𝐱𝖫𝖯(σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)=\bm{x}^{*}_{\mathsf{LP}}(\sigma), or (2) there is a subset of cuts indexed by 1,,k[K]\ell_{1},\ldots,\ell_{k}\in[K] and a set of constraints FMσF\subseteq M\cup\sigma with |F|=nk|F|=n-k such that

𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)=(det(AF,𝜶1,β1,,𝜶k,βk,σ1)det(AF,𝜶1,,𝜶k,σ),,det(AF,𝜶1,β1,,𝜶k,βk,σn)det(AF,𝜶1,,𝜶k,σ)),\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)=\left(\frac{\det(A^{1}_{F,\bm{\alpha}_{\ell_{1}},\beta_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}},\beta_{\ell_{k}},\sigma})}{\det(A_{F,\bm{\alpha}_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}},\sigma})},\ldots,\frac{\det(A^{n}_{F,\bm{\alpha}_{\ell_{1}},\beta_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}},\beta_{\ell_{k}},\sigma})}{\det(A_{F,\bm{\alpha}_{\ell_{1}},\ldots,\bm{\alpha}_{\ell_{k}},\sigma})}\right),

for all (𝛂1,β1,,𝛂K,βK)C(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K})\in C.

Proof.

The exact same reasoning in the proof of Lemma B.1 applies. We still have KK hyperplanes. Now, for each σ\sigma, for each subset SKS\subseteq K with |S|n|S|\leq n, and for every face FF of P(σ)\pazocal{P}(\sigma) with dim(F)=|S|\dim(F)=|S|, we have at most mm degree-KK polynomial hypersurfaces. The number of kk-dimensional faces of P(σ)\pazocal{P}(\sigma) is at most (m+|σ|nk)(m+2n)n1\binom{m+|\sigma|}{n-k}\leq(m+2n)^{n-1}, so the total number of these hypersurfaces is at most nKn(m+2n)nτ3nnK^{n}(m+2n)^{n}\tau^{3n}. Finally, for every σ\sigma, we considered a degree-2K2K polynomal hypersurfaces for every subset of cuts and every pair of faces with degree equal to the size of the subset, of which there are at most nKn(m+2n)2nτ3nnK^{n}(m+2n)^{2n}\tau^{3n}, as desired. ∎

We now refine the decomposition obtained in Lemma 4.2 so that the branching constraints added at each step of branch-and-cut are invariant within a region. For ease of exposition, we assume that branch-and-cut uses a lexicographic variable selection policy. This means that the variable branched on at each node of the search tree is fixed and given by the lexicographic ordering x1,,xnx_{1},\ldots,x_{n}. Generalizing the argument to work for other policies, such as the product scoring rule, can be done as in the single-cut case.

Lemma C.3.

Suppose branch-and-cut uses a lexicographic variable selection policy. Then, there is a set of of at most KK hyperplanes, 3n2Kn(m+2n)nτ3n3n^{2}K^{n}(m+2n)^{n}\tau^{3n} degree-KK polynomial hypersurfaces, and nKn(m+2n)2nτ3nnK^{n}(m+2n)^{2n}\tau^{3n} degree-2K2K polynomial hypersurfaces partitioning n+1\mathbb{R}^{n+1} into connected components such that within each connected component, the branching constraints used at every step of branch-and-cut are invariant.

Proof.

Fix a connected component CC in the decomposition established in Corollary C.2. Then, by Corollary C.2, for each σ\sigma, either 𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)=𝒙𝖫𝖯(σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)=\bm{x}^{*}_{\mathsf{LP}}(\sigma) or there exists cuts (without less of generality) labeled by indices 1,,k[K]1,\ldots,k\in[K] and there exists FMσF\subseteq M\cup\sigma such that

𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)[i]=det(AF,𝜶1,β1,,𝜶k,βk,σi)det(AF,𝜶1,,𝜶k,σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)[i]=\frac{\det(A^{i}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k},\sigma})}{\det(A_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k},\sigma})}

for all (𝜶,β)C(\bm{\alpha},\beta)\in C and all i[n]i\in[n]. Now, if we are at a stage in the branch-and-cut tree where σ\sigma is the list of branching constraints added so far, and the iith variable is being branched on next, the two constraints generated are

xi𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)[i] and xi𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)[i],x_{i}\leq\left\lfloor\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)[i]\right\rfloor\text{ and }x_{i}\geq\left\lceil\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)[i]\right\rceil,

respectively. If CC is a component where 𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)=𝒙𝖫𝖯(σ)\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)=\bm{x}^{*}_{\mathsf{LP}}(\sigma), then there is nothing more to do, since the branching constraints at that point are trivially invariant over (𝜶1,β1,,𝜶K,βK)C(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K})\in C. Otherwise, in order to further decompose CC such that the right-hand-side of these constraints are invariant for every σ\sigma and every i=1,,ni=1,\ldots,n, we add the two decision boundaries given by

kdet(AF,𝜶1,β1,,𝜶k,βk,σi)det(AF,𝜶1,,𝜶k,σ)k+1k\leq\frac{\det(A^{i}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k},\sigma})}{\det(A_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k},\sigma})}\leq k+1

for every ii, σ\sigma, and every integer k=0,,τ1k=0,\ldots,\tau-1, where τ=max𝒙P𝒙\tau=\lceil\max_{\bm{x}\in\pazocal{P}}\left\lVert\bm{x}\right\rVert_{\infty}\rceil. This ensures that within every connected component of CC induced by these boundaries (degree-KK polynomial hypersurfaces),

𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=det(AF,𝜶1,β1,,𝜶k,βk,σi)det(AF,𝜶1,,𝜶k,σ)\left\lfloor\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rfloor=\left\lfloor\frac{\det(A^{i}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k},\sigma})}{\det(A_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k},\sigma})}\right\rfloor

and

𝒙𝖫𝖯(𝜶T𝒙β,σ)[i]=det(AF,𝜶1,β1,,𝜶k,βk,σi)det(AF,𝜶1,,𝜶k,σ)\left\lceil\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma)[i]\right\rceil=\left\lceil\frac{\det(A^{i}_{F,\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{k},\beta_{k},\sigma})}{\det(A_{F,\bm{\alpha}_{1},\ldots,\bm{\alpha}_{k},\sigma})}\right\rceil

are invariant, so the branching constraints added by, for example, a lexicographic branching rule, are invariant. For a fixed σ\sigma, there are two hypersurfaces for every subset S[K]S\subseteq[K], every FMσF\subseteq M\cup\sigma corresponding to a |S||S|-dimensional face of P(σ)\pazocal{P}(\sigma), and every i=1,,ni=1,\ldots,n, for a total of at most 2n2Kn(m+|σ||S|)2n2Kn(m+2n)n2n^{2}K^{n}\binom{m+|\sigma|}{|S|}\leq 2n^{2}K^{n}(m+2n)^{n}. Summing over all reduced σ\sigma, we get a total of 2n2Kn(m+2n)nτ3n2n^{2}K^{n}(m+2n)^{n}\tau^{3n} hypersurfaces. Adding these hypersurfaces to the set of hypersurfaces established in Corollary C.2 yields the lemma statement. ∎

Now, as in the single-cut case, we consider the constraints that ensure that all cuts are valid. Let VK(n+1)\pazocal{V}\subseteq\mathbb{R}^{K(n+1)} denote the set of all vectors of valid KK cuts. As before, V\pazocal{V} is a polyhedron, since we may write

V=k=1K𝐱𝖨𝖧P𝖨𝖧{(𝜶1,β1,,𝜶K,βk)K(n+1):𝜶kT𝐱𝖨𝖧βk}.\pazocal{V}=\bigcap_{k=1}^{K}\bigcap_{\bm{x}_{\mathsf{IH}}\in\pazocal{P}_{\mathsf{IH}}}\left\{(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{k})\in\mathbb{R}^{K(n+1)}:\bm{\alpha}^{T}_{k}\bm{x}_{\mathsf{IH}}\leq\beta_{k}\right\}.

We now refine our decomposition further to control the integrality of the various LP solutions at each node of branch-and-cut.

Lemma C.4.

Given an IP (𝐜,A,𝐛)(\bm{c},A,\bm{b}), there is a set of at most 2Kτn2K\tau^{n} hyperplanes, 4n2Kn(m+2n)nτ4n4n^{2}K^{n}(m+2n)^{n}\tau^{4n} degree-KK polynomial hypersurfaces, and nKn(m+2n)2nτ3nnK^{n}(m+2n)^{2n}\tau^{3n} degree-2K2K polynomial hypersurfaces partitioning K(n+1)\mathbb{R}^{K(n+1)} into connected components such that for each component CC, and each σBC\sigma\subseteq\pazocal{B}\pazocal{C},

𝟏[𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)n]\mathbf{1}\left[\bm{x}^{*}_{\mathsf{LP}}\left(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma\right)\in\mathbb{Z}^{n}\right]

is invariant for all (𝛂1,β1,,𝛂K,βK)C(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K})\in C.

Proof.

Fix a connected component CC in the decomposition that includes the facets defining V\pazocal{V} and the surfaces obtained in Lemma C.3. For all σBC\sigma\in\pazocal{B}\pazocal{C}, 𝒙𝖨P𝖨\bm{x}_{\mathsf{I}}\in\pazocal{P}_{\mathsf{I}}, and i=1,,ni=1,\ldots,n, consider the surface

𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)[i]=𝒙𝖨[i].\bm{x}^{*}_{\mathsf{LP}}\left(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma\right)[i]=\bm{x}_{\mathsf{I}}[i]. (20)

This surface is a polynomial hypersurface of degree at most KK, due to Corollary C.2. Clearly, within any connected component of CC induced by these hyperplanes, for every σ\sigma and 𝒙𝖨P𝖨\bm{x}_{\mathsf{I}}\in\pazocal{P}_{\mathsf{I}}, 𝟏[𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)=𝒙𝖨]\mathbf{1}[\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)=\bm{x}_{\mathsf{I}}] is invariant. Finally, if 𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)n\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)\in\mathbb{Z}^{n} for some KK cuts 𝜶1T𝒙β1,,𝜶KT𝒙βK\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K} within a given connected component, 𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)=𝒙𝖨\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)=\bm{x}_{\mathsf{I}} for some 𝒙𝖨P𝖨𝖧(σ)P𝖨\bm{x}_{\mathsf{I}}\in\pazocal{P}_{\mathsf{IH}}(\sigma)\subseteq\pazocal{P}_{\mathsf{I}}, which means that 𝒙𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)=𝒙𝖨n\bm{x}^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma)=\bm{x}_{\mathsf{I}}\in\mathbb{Z}^{n} for all vectors of KK cuts 𝜶1T𝒙β1,,𝜶KT𝒙βK\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K} in that connected component.

We now count the number of hyperplanes given by Equation 20. For each σ\sigma, there are nKnnK^{n} possible subsets of cut indices and at most (m+2n)n1(m+2n)^{n-1} binding face constraints FMσF\subseteq M\cup\sigma defining the formula of Corollary C.2. For each subset-face pair, there are n|P𝖨|nτnn|\pazocal{P}_{\mathsf{I}}|\leq n\tau^{n} degree-KK polynomial hypersurfaces given by Equation 20. So the total number of such hypersurfaces over all σ\sigma is at most τ3nn2Kn(m+2n)n1τn\tau^{3n}n^{2}K^{n}(m+2n)^{n-1}\tau^{n}. The number of facets defining V\pazocal{V} is at most K|P𝖨|KτnK|\pazocal{P}_{\mathsf{I}}|\leq K\tau^{n}. Adding these to the counts obtained in Lemma C.3 yields the final tallies in the lemma statement. ∎

At this point, as in the single-cut case, if the bounding aspect of branch-and-cut is suppressed, our decomposition yields connected components over which the branch-and-cut tree built is invariant. We now prove our main structural theorem for B&C as a function of multiple cutting planes at the root.

Theorem C.5.

Given an IP (𝐜,A,𝐛)(\bm{c},A,\bm{b}), there is a set of at most O(12nn2nK2n2(m+2n)2n2τ5n2)O(12^{n}n^{2n}K^{2n^{2}}(m+2n)^{2n^{2}}\tau^{5n^{2}}) polynomial hypersurfaces of degree at most 2K2K partitioning K(n+1)\mathbb{R}^{K(n+1)} into connected components such that the branch-and-cut tree built after adding the KK cuts 𝛂1T𝐱β1,,𝛂kT𝐱βk\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{k}^{T}\bm{x}\leq\beta_{k} at the root is invariant over all (𝛂1,β1,,𝛂K,βK)(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K}) within a given component. In particular, f𝐜,A,𝐛(𝛂1,β1,,𝛂K,βK)f_{\bm{c},A,\bm{b}}(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K}) is invariant over each connected component.

Proof.

Fix a connected component CC in the decomposition induced by the set of hyperplanes, degree-KK hypersurfaces, and degree-2K2K hypersurfaces established in Lemma C.4. Let

Q1,,Qi1,I1,Qi1+1,,Qi2,I2,Qi2+1,Q_{1},\ldots,Q_{i_{1}},I_{1},Q_{i_{1}+1},\ldots,Q_{i_{2}},I_{2},Q_{i_{2}+1},\ldots (21)

denote the nodes of the tree branch-and-cut creates, in order of exploration, under the assumption that a node is pruned if and only if either the LP at that node is infeasible or the LP optimal solution is integral (so the “bounding” of branch-and-bound is suppressed). Here, a node is identified by the list σ\sigma of branching constraints added to the input IP. Nodes labeled by QQ are either infeasible or have fractional LP optimal solutions. Nodes labeled by II have integral LP optimal solutions and are candidates for the incumbent integral solution at the point they are encountered. (The nodes are functions of 𝜶1,β1,,𝜶K,βK\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K}, as are the indices i1,i2,i_{1},i_{2},\ldots.) By Lemma C.4, this ordered list of nodes is invariant for all (𝜶1,β1,,𝜶K,βk)C(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{k})\in C.

Now, given an node index \ell, let I()I(\ell) denote the incumbent node with the highest objective value encountered up until the \ellth node searched by B&C, and let z(I())z(I(\ell)) denote its objective value. For each node QQ_{\ell}, let σ\sigma_{\ell} denote the branching constraints added to arrive at node QQ_{\ell}. The hyperplane

z𝖫𝖯(𝜶1T𝒙β1,,𝜶KT𝒙βK,σ)=z(I())z^{*}_{\mathsf{LP}}\left(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{K}^{T}\bm{x}\leq\beta_{K},\sigma_{\ell}\right)=z(I(\ell)) (22)

(which is a hyperplane due to Corollary C.2) partitions CC into two subregions. In one subregion, z𝖫𝖯(𝜶1T𝒙β1,,𝜶kT𝒙βk,σ)z(I())z^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{k}^{T}\bm{x}\leq\beta_{k},\sigma_{\ell})\leq z(I(\ell)), that is, the objective value of the LP optimal solution is no greater than the objective value of the current incumbent integer solution, and so the subtree rooted at QQ_{\ell} is pruned. In the other subregion, z𝖫𝖯(𝜶1T𝒙β1,,𝜶kT𝒙βk,σ)>z(I())z^{*}_{\mathsf{LP}}(\bm{\alpha}_{1}^{T}\bm{x}\leq\beta_{1},\ldots,\bm{\alpha}_{k}^{T}\bm{x}\leq\beta_{k},\sigma_{\ell})>z(I(\ell)), and QQ_{\ell} is branched on further. Therefore, within each connected component of CC induced by all hyperplanes given by Equation 22 for all \ell, the set of node within the list (21) that are pruned is invariant. Combined with the surfaces established in Lemma C.4, these hyperplanes partition K(n+1)\mathbb{R}^{K(n+1)} into connected components such that as (𝜶1,β1,𝜶K,βK)(\bm{\alpha}_{1},\beta_{1}\ldots,\bm{\alpha}_{K},\beta_{K}) varies within a given component, the tree built by branch-and-cut is invariant.

Finally, we count the total number of surfaces inducing this partition. Unlike the counting stages of the previous lemmas, we will first have to count the number of connected components induced by the surfaces established in Lemma C.4. This is because the ordered list of nodes explored by branch-and-cut (21) can be different across each component, and the hyperplanes given by Equation 22 depend on this list. From Lemma C.4 we have 6n2Kn(m+2n)2nτ4n6n^{2}K^{n}(m+2n)^{2n}\tau^{4n} polynomial hypersurfaces of degree 2K\leq 2K. The set of all (𝜶1,β1,𝜶K,βk)K(n+1)(\bm{\alpha}_{1},\beta_{1},\ldots\bm{\alpha}_{K},\beta_{k})\in\mathbb{R}^{K(n+1)} such that (𝜶1,β1,,𝜶K,βK)(\bm{\alpha}_{1},\beta_{1},\ldots,\bm{\alpha}_{K},\beta_{K}) lies on the boundary of any of these surfaces is precisely the zero set of the product of all polynomials defining these surfaces. Denote this product polynomial by pp. The degree of the product polynomial is the sum of the degrees of 6n2Kn(m+2n)2nτ4n6n^{2}K^{n}(m+2n)^{2n}\tau^{4n} polynomials of degree 2K\leq 2K, which is at most 2K6Kn2Kn(m+2n)2nτ4n=12n2Kn+2(m+2n)2nτ4n2K\cdot 6Kn^{2}K^{n}(m+2n)^{2n}\tau^{4n}=12n^{2}K^{n+2}(m+2n)^{2n}\tau^{4n}. By Warren’s theorem, the number of connected components of n+1{(𝜶,β):p(𝜶,β)=0}\mathbb{R}^{n+1}\setminus\{(\bm{\alpha},\beta):p(\bm{\alpha},\beta)=0\} is O((12n2Kn+2(m+2n)2nτ4n)n1)O((12n^{2}K^{n+2}(m+2n)^{2n}\tau^{4n})^{n-1}), and by the Milnor-Thom theorem, the number of connected components of {(𝜶,β):p(𝜶,β)=0}\{(\bm{\alpha},\beta):p(\bm{\alpha},\beta)=0\} is O((12n2Kn+2(m+2n)2nτ4n)n1)O((12n^{2}K^{n+2}(m+2n)^{2n}\tau^{4n})^{n-1}) as well. So, the number of connected components induced by the surfaces in Lemma C.4 is O(12nn2nK2n2(m+2n)2n2τ4n2).O(12^{n}n^{2n}K^{2n^{2}}(m+2n)^{2n^{2}}\tau^{4n^{2}}). For every connected component CC in Lemma C.4, the closed form of z𝖫𝖯(𝜶T𝒙β,σ)z^{*}_{\mathsf{LP}}(\bm{\alpha}^{T}\bm{x}\leq\beta,\sigma_{\ell}) is already determined due to Corollary C.2, and so the number of hyperplanes given by Equation 22 is at most the number of possible σBC\sigma\subseteq\pazocal{B}\pazocal{C}, which is at most τ3n\tau^{3n}. So across all connected components CC, the total number of hyperplanes given by Equation 22 is O(12nn2nK2n2(m+2n)2n2τ5n2).O(12^{n}n^{2n}K^{2n^{2}}(m+2n)^{2n^{2}}\tau^{5n^{2}}). Finally, adding this to the surface-counts established in Lemma C.4 yields the theorem statement. ∎

Appendix D Omitted results from Section 5

Proof of Theorem 5.1.

For a set X\pazocal{X}, X<\pazocal{X}^{<\mathbb{N}} denotes the set of finite sequences of elements from X\pazocal{X}. There is a bijection between the set of IPs (𝒄,A,𝒃)I:=n×m×n×m(\bm{c},A,\bm{b})\in\pazocal{I}:=\mathbb{R}^{n}\times\mathbb{Z}^{m\times n}\times\mathbb{Z}^{m} and \mathbb{R}, so IPs can be uniquely represented as real numbers (and vice versa). Now, consider the set of all finite sequences of pairs of IPs and ±1\pm 1 labels of the form ((𝒄𝟏,A1,𝒃1),ε1),,((𝒄𝑵,AN,𝒃N),εN)((\bm{c_{1}},A_{1},\bm{b}_{1}),\varepsilon_{1}),\ldots,((\bm{c_{N}},A_{N},\bm{b}_{N}),\varepsilon_{N}), ε1,,εN{1,1}\varepsilon_{1},\ldots,\varepsilon_{N}\in\{-1,1\}, that is, the set (I×{1,1})<(\pazocal{I}\times\{-1,1\})^{<\mathbb{N}}. There is a bijection between this set and (×{1,1})<(\mathbb{R}\times\{-1,1\})^{<\mathbb{N}}, and in turn there is a bijection between (×{1,1})<(\mathbb{R}\times\{-1,1\})^{<\mathbb{N}} and \mathbb{R}. Hence, there exists a bijection between U\pazocal{U} and (I×{1,1})<(\pazocal{I}\times\{-1,1\})^{<\mathbb{N}}. Fix such a bijection φ:U(I×{1,1})<\varphi:\pazocal{U}\to(\pazocal{I}\times\{-1,1\})^{<\mathbb{N}}, and let φ1:(I×{1,1})<U\varphi^{-1}:(\pazocal{I}\times\{-1,1\})^{<\mathbb{N}}\to\pazocal{U} denote the inverse of φ\varphi, which is well defined and also a bijection.

Let nn be odd. For cc\in\mathbb{R}, let 𝖨𝖯cI\mathsf{IP}_{c}\in\pazocal{I} denote the IP

maximizecsubject to2x1++2xn=n𝒙{0,1}n.\begin{array}[]{ll}\text{maximize}&c\\ \text{subject to}&2x_{1}+\cdots+2x_{n}=n\\ &\bm{x}\in\{0,1\}^{n}.\end{array} (23)

Since nn is odd, 𝖨𝖯c\mathsf{IP}_{c} is infeasible, independent of cc. Jeroslow [21] showed that without the use of cutting planes or heuristics, branch-and-bound builds a tree of size 2(n1)/22^{(n-1)/2} before determining infeasibility and terminating. The objective cc is irrelevant, but is important in generating distinct IPs with this property. Consider the cut x1++xnn/2x_{1}+\cdots+x_{n}\leq\left\lfloor n/2\right\rfloor, which is a valid cut for 𝖨𝖯c\mathsf{IP}_{c} (this is in fact a Chvátal-Gomory cut [8]). In particular, since nn is odd, x1++xnn/2x1++xn(n1)/2<n/2x_{1}+\cdots+x_{n}\leq\left\lfloor n/2\right\rfloor\implies x_{1}+\cdots+x_{n}\leq(n-1)/2<n/2, so the equality constraint of 𝖨𝖯c\mathsf{IP}_{c} is violated by this cut. Thus, the feasible region of the LP relaxation after adding this cut is empty, and branch-and-bound will terminate immediately at the root (building a tree of size 11). Denote this cut by (𝜶(1),β(1))=(𝟏,n/2)(\bm{\alpha}^{(-1)},\beta^{(-1)})=(\bm{1},\left\lfloor n/2\right\rfloor). On the other hand, let (𝜶(1),β(1))=(𝟎,0)(\bm{\alpha}^{(1)},\beta^{(1)})=(\bm{0},0) be the trivial cut 000\leq 0. Adding this cut to the IP constraints does not change the feasible region, so branch-and-bound will build a tree of size 2(n1)/22^{(n-1)/2}.

We now define 𝜶𝒄,A,𝒃\bm{\alpha}_{\bm{c},A,\bm{b}} and β𝒄,A,𝒃\beta_{\bm{c},A,\bm{b}}. Let

(𝜶𝒄,A,𝒃(𝒖),β𝒄,A,𝒃(𝒖))={(𝜶(1),β(1))if ((𝒄,A,𝒃),1)φ(𝒖) and ((𝒄,A,𝒃),1)φ(𝒖)(𝜶(1),β(1))if ((𝒄,A,𝒃),1)φ(𝒖) and ((𝒄,A,𝒃),1)φ(𝒖)(𝟎,0)otherwise.(\bm{\alpha}_{\bm{c},A,\bm{b}}(\bm{u}),\beta_{\bm{c},A,\bm{b}}(\bm{u}))=\begin{cases}(\bm{\alpha}^{(1)},\beta^{(1)})&\text{if }((\bm{c},A,\bm{b}),1)\in\varphi(\bm{u})\text{ and }((\bm{c},A,\bm{b}),-1)\notin\varphi(\bm{u})\\ (\bm{\alpha}^{(-1)},\beta^{(-1)})&\text{if }((\bm{c},A,\bm{b}),-1)\in\varphi(\bm{u})\text{ and }((\bm{c},A,\bm{b}),1)\notin\varphi(\bm{u})\\ (\bm{0},0)&\text{otherwise}\end{cases}.

The choice to use (𝟎,0)(\bm{0},0) in the case that either ((𝒄,A,𝒃),ε)φ(𝒖)((\bm{c},A,\bm{b}),\varepsilon)\notin\varphi(\bm{u}) for each ε{1,1}\varepsilon\in\{-1,1\}, or ((𝒄,A,𝒃),1)φ(𝒖)((\bm{c},A,\bm{b}),-1)\in\varphi(\bm{u}) and ((𝒄,A,𝒃),1)φ(𝒖)((\bm{c},A,\bm{b}),1)\in\varphi(\bm{u}) is arbitrary and unimportant. Now, for any integer N>0N>0, constructing a set of NN IPs and NN thresholds that is shattered is almost immediate. Let c1,,cNc_{1},\ldots,c_{N}\in\mathbb{R} be distinct reals, and let 1<r1,,rN<2(n1)/21<r_{1},\ldots,r_{N}<2^{(n-1)/2}. Then, the set {(𝖨𝖯c1,r1),,(𝖨𝖯cN,rN)}\{(\mathsf{IP}_{c_{1}},r_{1}),\ldots,(\mathsf{IP}_{c_{N}},r_{N})\} can be shattered. Indeed, given a sign pattern (ε1,,εN){1,1}N(\varepsilon_{1},\ldots,\varepsilon_{N})\in\{-1,1\}^{N}, let

𝒖=φ1((𝖨𝖯c1,ε1),,(𝖨𝖯cN,εN)).\bm{u}=\varphi^{-1}\left((\mathsf{IP}_{c_{1}},\varepsilon_{1}),\ldots,(\mathsf{IP}_{c_{N}},\varepsilon_{N})\right).

Then, if εi=1\varepsilon_{i}=1, (𝜶𝖨𝖯ci(𝒖),β𝖨𝖯ci(𝒖))=(𝜶(1),β(1))(\bm{\alpha}_{\mathsf{IP}_{c_{i}}}(\bm{u}),\beta_{\mathsf{IP}_{c_{i}}}(\bm{u}))=(\bm{\alpha}^{(1)},\beta^{(1)}), so g𝒖(𝖨𝖯ci)=2(n1)/2g_{\bm{u}}(\mathsf{IP}_{c_{i}})=2^{(n-1)/2} and sign(g𝒖(𝖨𝖯ci)ri)=1\textnormal{sign}(g_{\bm{u}}(\mathsf{IP}_{c_{i}})-r_{i})=1. If εi=1\varepsilon_{i}=-1, (𝜶𝖨𝖯ci(𝒖),β𝖨𝖯ci(𝒖))=(𝜶(1),β(1))(\bm{\alpha}_{\mathsf{IP}_{c_{i}}}(\bm{u}),\beta_{\mathsf{IP}_{c_{i}}}(\bm{u}))=(\bm{\alpha}^{(-1)},\beta^{(-1)}), so g𝒖(𝖨𝖯ci)=1g_{\bm{u}}(\mathsf{IP}_{c_{i}})=1 and sign(g𝒖(𝖨𝖯ci)ri)=1\textnormal{sign}(g_{\bm{u}}(\mathsf{IP}_{c_{i}})-r_{i})=-1. So for any NN there is a set of IPs and thresholds that can be shattered, which yields the theorem statement.∎

Proof of Lemma 5.2.

We have fi=𝒖T𝒂i𝒖T𝒂if_{i}=\bm{u}^{T}\bm{a}_{i}-\lfloor\bm{u}^{T}\bm{a}_{i}\rfloor, f0=𝒖T𝒃𝒖T𝒃f_{0}=\bm{u}^{T}\bm{b}-\lfloor\bm{u}^{T}\bm{b}\rfloor, and since 𝒖[U,U]m\bm{u}\in[-U,U]^{m}, 𝒖T𝒂i[U𝒂i1,U𝒂i1]\lfloor\bm{u}^{T}\bm{a}_{i}\rfloor\in[-U\left\lVert\bm{a}_{i}\right\rVert_{1},U\left\lVert\bm{a}_{i}\right\rVert_{1}] and 𝒖T𝒃[U𝒃1,U𝒃1].\lfloor\bm{u}^{T}\bm{b}\rfloor\in[-U\left\lVert\bm{b}\right\rVert_{1},U\left\lVert\bm{b}\right\rVert_{1}]. Now, for all ii, ki[U𝒂i1,U𝒂i1]k_{i}\in[-U\left\lVert\bm{a}_{i}\right\rVert_{1},U\left\lVert\bm{a}_{i}\right\rVert_{1}]\cap\mathbb{Z} and k0[U𝒃1,U𝒃1]k_{0}\in[-U\left\lVert\bm{b}\right\rVert_{1},U\left\lVert\bm{b}\right\rVert_{1}]\cap\mathbb{Z}, put down the hyperplanes defining the two halfspaces

𝒖T𝒂i=kiki𝒖T𝒂i<ki+1\left\lfloor\bm{u}^{T}\bm{a}_{i}\right\rfloor=k_{i}\iff k_{i}\leq\bm{u}^{T}\bm{a}_{i}<k_{i}+1 (24)

and the hyperplanes defining the two halfspaces

𝒖T𝒃=k0k0𝒖T𝒃<k0+1.\left\lfloor\bm{u}^{T}\bm{b}\right\rfloor=k_{0}\iff k_{0}\leq\bm{u}^{T}\bm{b}<k_{0}+1. (25)

In addition, consider the hyperplane

𝒖T𝒂iki=𝒖T𝒃k0\bm{u}^{T}\bm{a}_{i}-k_{i}=\bm{u}^{T}\bm{b}-k_{0} (26)

for each ii. Within any connected component of m\mathbb{R}^{m} determined by these hyperplanes, 𝒖T𝒂i\lfloor\bm{u}^{T}\bm{a}_{i}\rfloor and 𝒖T𝒃\lfloor\bm{u}^{T}\bm{b}\rfloor are constant. Furthermore, 𝟏[fif0]\mathbf{1}[f_{i}\leq f_{0}] is invariant within each connected component, since if 𝒖T𝒂i=ki\lfloor\bm{u}^{T}\bm{a}_{i}\rfloor=k_{i} and 𝒖T𝒃=k0\lfloor\bm{u}^{T}\bm{b}\rfloor=k_{0}, fif0𝒖T𝒂iki𝒖T𝒃k0,f_{i}\leq f_{0}\iff\bm{u}^{T}\bm{a}_{i}-k_{i}\leq\bm{u}^{T}\bm{b}-k_{0}, which is the hyperplane given by Equation 26. The total number of hyperplanes of type 24 is O(nUA1)O(nU\left\lVert A\right\rVert_{1}), the total number of hyperplanes of type 25 is O(U𝒃1)O(U\left\lVert\bm{b}\right\rVert_{1}), and the total number of hyperplanes of type 26 is nU2A1𝒃1nU^{2}\left\lVert A\right\rVert_{1}\left\lVert\bm{b}\right\rVert_{1}. Summing yields the lemma statement. ∎

Proof of Lemma 5.4.

Let Cn+1C\subseteq\mathbb{R}^{n+1} be a connected component in the partition established in Theorem 4.5, so CC can be written as the intersection of at most 14n(m+2n)3n2τ5n214^{n}(m+2n)^{3n^{2}}\tau^{5n^{2}} polynomial constraints of degree at most 55. Let D[U,U]mD\subseteq[-U,U]^{m} be a connected component in the partition established in Lemma 5.2. By Lemma 5.3, there are at most 14n(m+2n)3n2τ5n214^{n}(m+2n)^{3n^{2}}\tau^{5n^{2}} polynomials of degree at most 1010 partitioning DD into connected components such that within each component, 𝟏[(𝜶(𝒖),β(𝒖))C]\mathbf{1}[(\bm{\alpha}(\bm{u}),\beta(\bm{u}))\in C] is invariant. If we consider the overlay of these polynomial surfaces over all components CC, we will get a partition of [U,U]m[-U,U]^{m} such that for every CC, 𝟏[(𝜶(𝒖),β(𝒖))C]\mathbf{1}[(\bm{\alpha}(\bm{u}),\beta(\bm{u}))\in C] is invariant over each connected component of [U,U]m[-U,U]^{m}. Once we have this we are done, since all 𝒖\bm{u} in the same connected component of [U,U]m[-U,U]^{m} will be sent to the same connected component of n+1\mathbb{R}^{n+1} by (𝜶(𝒖),β(𝒖))(\bm{\alpha}(\bm{u}),\beta(\bm{u})), and thus by Theorem 4.5 the behavior of branch-and-cut will be invariant.

We now tally up the total number of surfaces. The number of connected components CC was given by Warren’s theorem and the Milnor-Thom theorem to be O(14n(n+1)(m+2n)3n2(n+1)τ5n2(n+1))O(14^{n(n+1)}(m+2n)^{3n^{2}(n+1)}\tau^{5n^{2}(n+1)}), so the total number of degree-1010 hypersurfaces is 14n(m+2n)3n2τ5n214^{n}(m+2n)^{3n^{2}}\tau^{5n^{2}} times this quantity, which yields the lemma statement.∎

D.1 Multiple GMI cuts at the root

In this section we extend our results to allow for multiple GMI cuts at the root of the B&C tree. These cuts can be added simultaneously, sequentially, or in rounds. If GMI cuts 𝒖1\bm{u}_{1}, 𝒖2\bm{u}_{2} are added simultaneously, both of them have the same dimension and are defined in the usual way. If GMI cuts 𝒖1\bm{u}_{1}, 𝒖2\bm{u}_{2} are added sequentially, 𝒖2\bm{u}_{2} has one more entry than 𝒖1\bm{u}_{1}. This is because when cuts are added sequentially, the LP relaxation is re-solved after the addition of the first cut, and the second cut has a multiplier for all original constraints as well as for the first cut (this ensures that the second cut can be chosen in a more informed manner). If KK cuts are made at the root, they can be added in sequential rounds of simultaneous cuts. In the following discussion, we focus on the case where all KK cuts are added sequentially—the other cases can be viewed as instantiations of this. We refer the reader to the discussion in Balcan et al. [8] for more details.

To prove an analogous result for multiple GMI cuts (in sequence, that is, each successive GMI cut has one more parameter than the previous), we combine the reasoning used in the single-GMI-cut case with some technical observations in Balcan et al. [8].

Lemma D.1.

Consider the family of KK sequential GMI cuts parameterized by 𝐮1[U,U]m,𝐮2[U,U]m+1,,𝐮K[U,U]m+K1\bm{u}_{1}\in[-U,U]^{m},\bm{u}_{2}\in[-U,U]^{m+1},\ldots,\bm{u}_{K}\in[-U,U]^{m+K-1}. For any IP (𝐜,A,𝐛)(\bm{c},A,\bm{b}), there are at most

O(nK(1+U)2KA1𝒃1)O\left(nK(1+U)^{2K}\left\lVert A\right\rVert_{1}\left\lVert\bm{b}\right\rVert_{1}\right)

degree-KK polynomial hypersurfaces and

2O(n2)KO(n3)(m+2n)O(n3)τO(n3)2^{O(n^{2})}K^{O(n^{3})}(m+2n)^{O(n^{3})}\tau^{O(n^{3})}

degree-4K24K^{2} polynomial hypersurfaces partitioning [U,U]m××[U,U]m+K1[-U,U]^{m}\times\cdots\times[-U,U]^{m+K-1} connected components such that the B&C tree built after sequentially adding the GMI cuts defined by 𝐮1,,𝐮K\bm{u}_{1},\ldots,\bm{u}_{K} is invariant over all (𝐮1,,𝐮K)(\bm{u}_{1},\ldots,\bm{u}_{K}) within a single component.

Proof.

We start with the setup used by Balcan et al. [8] to prove similar results for sequential Chvátal-Gomory cuts. Let 𝒂1,,𝒂nm\bm{a}_{1},\ldots,\bm{a}_{n}\in\mathbb{R}^{m} be the columns of AA. We define the following augmented columns 𝒂~i1m,,𝒂~iKm+K1\widetilde{\bm{a}}_{i}^{1}\in\mathbb{R}^{m},\ldots,\widetilde{\bm{a}}_{i}^{K}\in\mathbb{R}^{m+K-1} for each i[n]i\in[n], and the augmented constraint vectors 𝒃~1m,,𝒃~Km+K1\widetilde{\bm{b}}^{1}\in\mathbb{R}^{m},\ldots,\widetilde{\bm{b}}^{K}\in\mathbb{R}^{m+K-1} via the following recurrences:

𝒂~i1\displaystyle\bm{\widetilde{a}}^{1}_{i} =𝒂i\displaystyle=\bm{a}_{i}
𝒂~ik\displaystyle\bm{\widetilde{a}}^{k}_{i} =[𝒂~ik1𝒖k1T𝒂~ik1]\displaystyle=\begin{bmatrix}\bm{\widetilde{a}}^{k-1}_{i}\\ \bm{u}^{T}_{k-1}\bm{\widetilde{a}}^{k-1}_{i}\end{bmatrix}

and

𝒃~1\displaystyle\bm{\widetilde{b}}^{1} =𝒃\displaystyle=\bm{b}
𝒃~k\displaystyle\bm{\widetilde{b}}^{k} =[𝒃~k1𝒖k1T𝒃~k1]\displaystyle=\begin{bmatrix}\bm{\widetilde{b}}^{k-1}\\ \bm{u}_{k-1}^{T}\bm{\widetilde{b}}^{k-1}\end{bmatrix}

for k=2,,Kk=2,\ldots,K. In other words, 𝒂~ik\widetilde{\bm{a}}^{k}_{i} is the iith column of the constraint matrix of the IP and 𝒃~k\widetilde{\bm{b}}^{k} is the constraint vector after applying cuts 𝒖1,,𝒖k1\bm{u}_{1},\ldots,\bm{u}_{k-1}. An identical induction argument to that of Balcan et al. [8] shows that for each k[K]k\in[K],

𝒖kT𝒂~ik[(1+U)k𝒂i1,(1+U)k𝒂i1]\big{\lfloor}\bm{u}^{T}_{k}\widetilde{\bm{a}}_{i}^{k}\big{\rfloor}\in\left[-\left(1+U\right)^{k}\left\lVert\bm{a}_{i}\right\rVert_{1},\left(1+U\right)^{k}\left\lVert\bm{a}_{i}\right\rVert_{1}\right]

and

𝒖kT𝒃~k[(1+U)k𝒃1,(1+U)k𝒃1].\big{\lfloor}\bm{u}^{T}_{k}\widetilde{\bm{b}}^{k}\big{\rfloor}\in\left[-\left(1+U\right)^{k}\left\lVert\bm{b}\right\rVert_{1},\left(1+U\right)^{k}\left\lVert\bm{b}\right\rVert_{1}\right].

Now, as in the single-GMI-cut setting, consider the surfaces

𝒖kT𝒂~ik=ii𝒖kT𝒂~ik<i+1\big{\lfloor}\bm{u}_{k}^{T}\widetilde{\bm{a}}^{k}_{i}\big{\rfloor}=\ell_{i}\iff\ell_{i}\leq\bm{u}_{k}^{T}\widetilde{\bm{a}}^{k}_{i}<\ell_{i}+1 (27)

and

𝒖kT𝒃~k=0i𝒖kT𝒃~k<0+1\big{\lfloor}\bm{u}_{k}^{T}\widetilde{\bm{b}}^{k}\big{\rfloor}=\ell_{0}\iff\ell_{i}\leq\bm{u}_{k}^{T}\widetilde{\bm{b}}^{k}<\ell_{0}+1 (28)

for every i,ki,k, and every integer i[(1+U)k𝒂i1,(1+U)k𝒂i1]\ell_{i}\in[-(1+U)^{k}\left\lVert\bm{a}_{i}\right\rVert_{1},(1+U)^{k}\left\lVert\bm{a}_{i}\right\rVert_{1}]\cap\mathbb{Z} and every integer 0[(1+U)k𝒃1,(1+U)k𝒃1]\ell_{0}\in[-(1+U)^{k}\left\lVert\bm{b}\right\rVert_{1},(1+U)^{k}\left\lVert\bm{b}\right\rVert_{1}]\cap\mathbb{Z}. In addition, consider the surfaces

𝒖kT𝒂~iki=𝒖kT𝒃~k0\bm{u}_{k}^{T}\widetilde{\bm{a}}_{i}^{k}-\ell_{i}=\bm{u}_{k}^{T}\widetilde{\bm{b}}^{k}-\ell_{0} (29)

for each i,k,i,0i,k,\ell_{i},\ell_{0}. As observed by Balcan et al. [8], 𝒖kT𝒂~ik\bm{u}_{k}^{T}\widetilde{\bm{a}}^{k}_{i} is a polynomial in 𝒖1[1],,𝒖1[m],𝒖2[1],,𝒖2[m+1],,𝒖k[1],,𝒖k[m+k1]\bm{u}_{1}[1],\ldots,\bm{u}_{1}[m],\bm{u}_{2}[1],\ldots,\bm{u}_{2}[m+1],\ldots,\bm{u}_{k}[1],\ldots,\bm{u}_{k}[m+k-1] of degree at most kk (as is 𝒖kT𝒃~k\bm{u}_{k}^{T}\widetilde{\bm{b}}^{k}), so surfaces 2728, and 29 are all degree-KK polynomial hypersurfaces for all i,ki,k. Within any connected component of [U,U]m××[U,U]m+K1[-U,U]^{m}\times\cdots\times[-U,U]^{m+K-1} induced by these hypersurfaces, 𝒖kT𝒂~ik\lfloor\bm{u}_{k}^{T}\widetilde{\bm{a}}^{k}_{i}\rfloor and 𝒖kT𝒃~k\lfloor\bm{u}_{k}^{T}\widetilde{\bm{b}}^{k}\rfloor are constant. Furthermore 𝟏[fikf0k]\mathbf{1}[f^{k}_{i}\leq f^{k}_{0}] is invariant for every i,ki,k, where fik=𝒖kT𝒂~ik𝒖kT𝒂~ikf^{k}_{i}=\bm{u}_{k}^{T}\widetilde{\bm{a}}_{i}^{k}-\lfloor\bm{u}_{k}^{T}\widetilde{\bm{a}}_{i}^{k}\rfloor and f0k=𝒖kT𝒃~k𝒖kT𝒃~kf^{k}_{0}=\bm{u}_{k}^{T}\widetilde{\bm{b}}^{k}-\lfloor\bm{u}_{k}^{T}\widetilde{\bm{b}}^{k}\rfloor.

Now, fix a connected component D[U,U]m××[U,U]m+K1D\subseteq[-U,U]^{m}\times\cdots\times[-U,U]^{m+K-1} induced by the above hypersurfaces, and let CK(n+1)C\subseteq\mathbb{R}^{K(n+1)} be the intersection of qq polynomial inequalities of degree at most dd. Consider a single degree-dd polynomial inequality in K(n+1)K(n+1) variables y1,,yK(n+1)y_{1},\ldots,y_{K(n+1)}, which can be written as

T[K(n+1)]|T|dλTjTyj=T1,,TK[n+1]|T1|++|TK|dλT1,,TKj1T1yj1jKTKyjKγ.\sum_{\begin{subarray}{c}T\sqsubseteq[K(n+1)]\\ |T|\leq d\end{subarray}}\lambda_{T}\prod_{j\in T}y_{j}=\sum_{\begin{subarray}{c}T_{1},\ldots,T_{K}\sqsubseteq[n+1]\\ |T_{1}|+\cdots+|T_{K}|\leq d\end{subarray}}\lambda_{T_{1},\ldots,T_{K}}\prod_{j_{1}\in T_{1}}y_{j_{1}}\cdots\prod_{j_{K}\in T_{K}}y_{j_{K}}\leq\gamma.

Now, the sets S1,,SKS_{1},\ldots,S_{K} defined by Sk={i:fikf0k}S_{k}=\{i:f^{k}_{i}\leq f^{k}_{0}\} are fixed within DD, so we can write this as

T1,,TK[n+1]|T1|++|TK|dλT1,,TKk=1K[jTkSkjn+1fjk(1f0k)jTkSkjn+1f0k(1fjk)jTkj=n+1f0k(1f0k)]γ.\sum_{\begin{subarray}{c}T_{1},\ldots,T_{K}\sqsubseteq[n+1]\\ |T_{1}|+\cdots+|T_{K}|\leq d\end{subarray}}\lambda_{T_{1},\ldots,T_{K}}\prod_{k=1}^{K}\Bigg{[}\prod_{\begin{subarray}{c}j\in T_{k}\cap S_{k}\\ j\neq n+1\end{subarray}}f^{k}_{j}(1-f^{k}_{0})\prod_{\begin{subarray}{c}j\in T_{k}\setminus S_{k}\\ j\neq n+1\end{subarray}}f^{k}_{0}(1-f^{k}_{j})\prod_{\begin{subarray}{c}j\in T_{k}\\ j=n+1\end{subarray}}f^{k}_{0}(1-f^{k}_{0})\Bigg{]}\leq\gamma.

We have that fjkf_{j}^{k} and f0kf_{0}^{k} are degree-kk polynomials in 𝒖1,,𝒖k\bm{u}_{1},\ldots,\bm{u}_{k}. Since the sum is over all multisets T1,,TKT_{1},\ldots,T_{K} such that |T1|++|TK|d|T_{1}|+\cdots+|T_{K}|\leq d, there are at most dd terms across the products, each of the form fjk(1f0)kf_{j}^{k}(1-f_{0})^{k}, f0k(1fjk)f_{0}^{k}(1-f_{j}^{k}), or f0k(1f0)kf_{0}^{k}(1-f_{0})^{k}. Therefore, the left-hand-side is a polynomial of degree at most 2dK2dK, and if CK(n+1)C\subseteq\mathbb{R}^{K(n+1)} is the intersection of qq polynomial inequalities each of degree at most dd, the set

{(𝒖1,,𝒖K)D:(𝜶(𝒖1,,𝒖K),β(𝒖1,,𝒖K))C}[U,U]m××[U,U]m+K1\left\{\left(\bm{u}_{1},\ldots,\bm{u}_{K}\right)\in D:\left(\bm{\alpha}\left(\bm{u}_{1},\ldots,\bm{u}_{K}\right),\beta\left(\bm{u}_{1},\ldots,\bm{u}_{K}\right)\right)\in C\right\}\subseteq[-U,U]^{m}\times\cdots\times[-U,U]^{m+K-1}

can be expressed as the intersection of qq degree-2dK2dK polynomial inequalities.

To finish, we run this process for every connected component CK(n+1)C\subseteq\mathbb{R}^{K(n+1)} in the partition established by Theorem C.5. This partition consists of O(12nn2nK2n2(m+2n)2n2τ5n2)O(12^{n}n^{2n}K^{2n^{2}}(m+2n)^{2n^{2}}\tau^{5n^{2}}) degree-2K2K polynomials over K(n+1)\mathbb{R}^{K(n+1)}. By Warren’s theorem and the Milnor-Thom theorem, these polynomials partition K(n+1)\mathbb{R}^{K(n+1)} into O(12n(n+1)n2n(n+1)K2n2(n+1)(m+2n)2n2(n+1)τ5n2(n+1))O(12^{n(n+1)}n^{2n(n+1)}K^{2n^{2}(n+1)}(m+2n)^{2n^{2}(n+1)}\tau^{5n^{2}(n+1)}) connected components. Running the above argument for each of these connected components of K(n+1)\mathbb{R}^{K(n+1)} yields a total of O(12n(n+1)n2n(n+1)K2n2(n+1)(m+2n)2n2(n+1)τ5n2(n+1))O(12nn2nK2n2(m+2n)2n2τ5n2)=2O(n2)KO(n3)(m+2n)O(n3)τO(n3)O\left(12^{n(n+1)}n^{2n(n+1)}K^{2n^{2}(n+1)}(m+2n)^{2n^{2}(n+1)}\tau^{5n^{2}(n+1)}\right)\cdot O\left(12^{n}n^{2n}K^{2n^{2}}(m+2n)^{2n^{2}}\tau^{5n^{2}}\right)=2^{O(n^{2})}K^{O(n^{3})}(m+2n)^{O(n^{3})}\tau^{O(n^{3})} polynomials of degree 4K24K^{2}. Finally, we count the surfaces of the form (27),  (28), and (29). The total number of degree-KK polynomials of type 27 is at most O(nK(1+U)KA1)O(nK(1+U)^{K}\left\lVert A\right\rVert_{1}), the total number of degree-kk polynomials of type 28 is O(K(1+U)K𝒃1)O(K(1+U)^{K}\left\lVert\bm{b}\right\rVert_{1}), and the total number of degree-KK polynomials of type 29 is O(nK(1+U)2KA1𝒃)O(nK(1+U)^{2K}\left\lVert A\right\rVert_{1}\left\lVert\bm{b}\right\rVert). Summing these counts yields the desired number of surfaces in the lemma statement.

In any connected component of [U,U]m[-U,U]^{m} determined by these surfaces, 𝟏[(𝜶(𝒖),β(𝒖))C]\mathbf{1}[(\bm{\alpha}(\bm{u}),\beta(\bm{u}))\in C] is invariant for every connected component CK(n+1)C\subseteq\mathbb{R}^{K(n+1)} in the partition of K(n+1)\mathbb{R}^{K(n+1)} established in Theorem C.5. This means that the tree built by branch-and-cut is invariant, which concludes the proof. ∎

Finally, applying the main result of Balcan et al. [6] to Lemma D.1, we get the following pseudo-dimension bound for the class of KK sequential GMI cuts at the root of the B&C tree.

Theorem D.2.

For 𝐮1[U,U]m,𝐮2[U,U]m+1,,𝐮K[U,U]m+K1\bm{u}_{1}\in[-U,U]^{m},\bm{u}_{2}\in[-U,U]^{m+1},\ldots,\bm{u}_{K}\in[-U,U]^{m+K-1}, let g𝐮1,,𝐮K(𝐜,A,𝐛)g_{\bm{u}_{1},\ldots,\bm{u}_{K}}(\bm{c},A,\bm{b}) denote the number of nodes in the tree B&C builds given the input (𝐜,A,𝐛)(\bm{c},A,\bm{b}) after sequentially applying the GMI cuts defined by 𝐮1,,𝐮K\bm{u}_{1},\ldots,\bm{u}_{K} at the root. The pseudo-dimension of the set of functions {g𝐮𝟏,,𝐮K:(𝐮1,,𝐮K)[U,U]m××[U,U]m+K1}\{g_{\bm{u_{1}},\ldots,\bm{u}_{K}}:(\bm{u}_{1},\ldots,\bm{u}_{K})\in[-U,U]^{m}\times\cdots\times[-U,U]^{m+K-1}\} on the domain of IPs with A1a\left\lVert A\right\rVert_{1}\leq a and 𝐛1b\left\lVert\bm{b}\right\rVert_{1}\leq b is

O(mK3logU+mn3K2log(mnKτ)+mK2log(ab)).O\left(mK^{3}\log U+mn^{3}K^{2}\log(mnK\tau)+mK^{2}\log(ab)\right).