Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation
Abstract
Machine learning has been successfully applied to improve the efficiency of Mixed-Integer Linear Programming (MILP) solvers. However, learning-based solvers often suffer from severe performance degradation on unseen MILP instances—especially on large-scale instances from a perturbed environment—due to the limited diversity of training distributions. To tackle this problem, we propose Adversarial Instance Augmentation for branch-and-bound (B&B) Solvers (AdaSolver), a novel approach that promotes data diversity for learning-based branching modules and does not require knowledge of the problem type to generate new instances. We use bipartite graph representations for MILP instances and obtain various perturbed instances to regularize the solver by augmenting the graph structures with a learned augmentation policy. The major technical contribution of AdaSolver is that we formulate the non-differentiable instance augmentation as a contextual bandit problem and adversarially train the learning-based solver and the augmentation policy, enabling efficient gradient-based training of the augmentation policy. To the best of our knowledge, AdaSolver is the first general and effective framework for understanding and improving the generalization of both imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based) B&B solvers. Extensive experiments demonstrate that by producing various augmented instances, AdaSolver leads to a remarkable efficiency improvement across various distributions.
Index Terms:
Mixed-Integer Linear Programming, Branching, Graph Neural Network, Generalization and Robustness.

1 Introduction
Combinatorial optimization (CO) problems are among the most fundamental optimization problems and arise widely in scheduling [1], chip design [2], route planning [3], and other industrial scenarios. Generally, solving CO problems is time-consuming and computationally expensive due to their NP-hard nature. In practice, a large family of CO problems can be formulated as Mixed-Integer Linear Programs (MILPs). Exact MILP solvers, such as the B&B solvers SCIP [4] and Gurobi [5], usually rely on hand-crafted heuristics that require considerable manual tuning and complex workflows [6, 7]. To speed up the solvers, recent attempts have leveraged fast-inference learning-based models to approximate some complex heuristics [8, 9, 10] or to explore more effective heuristics within solvers [11, 12], and have achieved promising in-distribution efficiency improvements.
However, the generalization of learning-based solving models to out-of-distribution (OOD) scenarios remains a significant challenge for their real-world applications. First, the solvers are often trained on a given dataset but need to process instances that vary widely in scale in real-world scenarios. The learning-based solvers may thus encounter instances with much larger scales than the training instances in practice. Second, environment changes often perturb the instance structures. For example, in staff scheduling problems, emergencies occur and impose extra restrictions on the jobs. We thus need to consider extra kinds of constraints representing restrictions that were not included in the previous MILP formulations, leading to slight differences in the problem structures. Third, acquiring or generating more data from other distributions or with larger scales may not be feasible due to information security/privacy concerns or high data acquisition costs [13, 14]. For example, in a company’s daily job-shop scheduling problems, confidential business information, such as production capacity, can be estimated from the MILP instances. Thus, the company is likely to provide only limited and anonymous training data to avoid privacy disclosure.
Existing work attempts to improve the generalization of learning-based solvers by generating instances from different distributions and then employing domain-randomization-type approaches [15, 16]. These approaches have severe limitations when it comes to anonymous datasets, since no prior knowledge of the problem type is available to run data generation algorithms. Therefore, this paper focuses on this restrictive practical setting where training data only comes from limited training domains, and acquiring a significant number of new instances within a short time is intractable. Experiments in such a setting have demonstrated that the solving time can be much longer than that of traditional heuristics under distributional shifts, especially on larger-scale MILP instances, which hinders the applicability of neural solvers to real-world scenarios.
In such a practical setting, we represent MILP instances as bipartite graphs [10] and propose to use graph data augmentation—a data manipulation technique—to promote the diversity of graph structures in training distributions. Unlike domain randomization, data augmentation perturbs the graph structures and does not require more training instances or information on problem types. Moreover, we theoretically justify that data augmentation can reduce the generalization bound on unseen distributions in Section 7. We employ a learnable function as an augmentation policy network to explore effective augmentation strategies. However, two challenges in the augmentation procedure remain to be addressed. First, it is intractable to train the augmentation policy with gradient-based methods since modifying the graph structures is generally a non-differentiable operation, posing challenges for backward propagation. Second, the augmentation policy network receives significantly less training data than the learning-based solver, since the augmentation policy takes MILP instances as training data while each augmented instance can generate a sequence of branching samples as training data for the learning-based solver. The training of the augmentation policy thus falls behind that of the solver. These challenges of non-differentiability and unbalanced data require a practicable and sample-efficient training framework.
To address the aforementioned challenges, we propose Adversarial Instance Augmentation for Solver (AdaSolver), a robust optimization framework for learning-based B&B solvers. To the best of our knowledge, AdaSolver is the first general framework for understanding and improving the robustness and generalization ability of both the IL and RL B&B solvers under real-world restrictions. Specifically, we focus on the competitive graph neural network (GNN) approach for learning branching heuristics, resulting in two variants AdaSolver-IL and AdaSolver-RL in the IL and RL settings. AdaSolver comprises two components: (1) a GNN branching policy for branching variable selection and (2) an adversarial augmentation network (augmenter) to simulate the real-world perturbations by modifying the structures of MILPs’ bipartite graph representations and promoting structural diversity in newly augmented instances. To train the augmenter, we formulate the adversarial graph augmentation as a contextual bandit problem, which offers two key advantages. First, the contextual bandit formulation enables the gradient-based training of the augmenter that controls the non-differentiable graph structure augmentation. Second, under this formulation, we can propose a sample-efficient decision model adapted from [17] that can learn fast with a limited amount of data. Furthermore, to make sure that the augmented instances do not deviate severely from the training distributions, we leverage a discriminator network to filter out the bad instances to avoid disrupting the model training.
We conduct experiments under five benchmarks and compare our approach with baselines in the in-distribution and out-of-distribution testing instances. Our experiments demonstrate that AdaSolver significantly enhances solvers’ robustness to environmental changes and consistently improves the solving efficiency on various distributions.
2 Related Work
2.1 Machine Learning for Combinatorial Optimization
Learning to optimize CO has attracted increasing interest in the operations research and machine learning fields [13, 18]. Existing methods can be roughly classified into two categories. One focuses on solving specific CO problems in an end-to-end style, such as the Traveling Salesperson Problem (TSP) [19, 20, 21], the Satisfiability Problem (SAT) [22, 23], the Vehicle Routing Problem (VRP) [21, 24], and the Maximum Cut Problem (MCP) [25]. The other category employs learning-based methods to approximate heuristics in exact B&B solvers, such as branching [9, 10, 26, 27], node selection [8], and cut selection [28, 12]. Built on exact solvers, these methods have optimality guarantees and can be applied to solve general MILP instances. This paper concentrates on the branching module in exact solvers since the computational efficiency and quality of the selected variables during branching have a great impact on the overall B&B solving efficiency [29].
Recently, researchers have paid more attention to the generalization and robustness of end-to-end neural solvers. For example, existing works improve the generalization of end-to-end neural solvers by curriculum learning [15, 30], group distributionally robust optimization [31], knowledge distillation [32], generative adversarial training [33, 16], and so on. These methods require knowledge of the problem type for instance generation, and most of them focus on routing problems since instance generation for these problems is not costly. [34] also proposes label-preserving augmentations tailored to SAT problems. However, due to the restrictive practical setting mentioned in the introduction, these approaches are not applicable to exact B&B solvers when the instances are anonymous or hard to generate. For popular exact CO solvers, [35] proposes an adversarial attack approach to evaluate solvers’ robustness, and [36] aims to promote generalization across heterogeneous MILPs in the IL setting. Nonetheless, designing learning-based robust exact solvers for both IL and RL solvers remains largely unexplored.
2.2 Machine Learning for Branching
Existing works on learning to branch include imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based) algorithms. IL-based methods assume a branching expert and leverage a machine learning model to mimic the branching strategy of this expert. For example, [9] uses a support vector machine (SVM) to imitate a strong branching expert; [10, 27] represent each branching state as a bipartite graph and employ GNNs to improve the branching performance; [36] mimics the given optimal solutions as oracles with tree search. RL-based methods do not require a branching expert and explore the branch-and-bound trees to improve the heuristics automatically. For example, [37] first uses reinforcement learning for variable selection; other works use evolution strategies [38], retrospective branching trajectories [39], a superior Q-network trained with mixture data [40], and a tree Markov decision process [11] to further improve the performance of RL-based branching policies.
3 Background
3.1 Branch-and-Bound Algorithm
In practice, most CO problems can be formulated as Mixed-Integer Linear Programs (MILPs) of the form

$$\min_{x}\; c^{\top}x \quad \text{s.t.}\quad Ax \le b,\;\; l \le x \le u,\;\; x_j \in \mathbb{Z}\;\; \forall j \in \mathcal{I}, \tag{1}$$

where $c \in \mathbb{R}^{n}$ is the objective coefficient vector, $A \in \mathbb{R}^{m \times n}$ is the constraint matrix, $b \in \mathbb{R}^{m}$ is the right-hand side, $l$ and $u$ denote the lower and upper variable bounds, respectively, and $\mathcal{I} \subseteq \{1, \dots, n\}$ indexes the integer-constrained variables. The branch-and-bound (B&B) algorithm solves MILP instances with a divide-and-conquer approach. B&B recursively performs the following steps. First, it solves the linear relaxation of a selected subproblem, starting with the original MILP. It identifies an integer variable $x_j$ with fractional value $\hat{x}_j$ in the current LP solution and generates two subproblems by adding the constraints $x_j \le \lfloor \hat{x}_j \rfloor$ and $x_j \ge \lceil \hat{x}_j \rceil$ to the current subproblem, respectively. B&B then chooses a new subproblem to explore next. As the algorithm proceeds, the subproblems form a binary search tree, where the root node represents the original MILP and the left and right children of a subproblem are the two new subproblems created in the previous step. If a subproblem is infeasible or does not contain optimal solutions, the corresponding subtree is pruned.
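To make the recursion above concrete, the following minimal Python sketch traces the same steps for a minimization MILP; `solve_lp_relaxation`, `select_branching_variable`, and the node attributes (`integer_indices`, `with_bound`) are hypothetical placeholders standing in for the solver's LP engine, the branching rule, and subproblem bookkeeping.

```python
# A minimal branch-and-bound loop for a minimization MILP (depth-first node selection).
import math

def branch_and_bound(root, solve_lp_relaxation, select_branching_variable, tol=1e-6):
    best_obj, best_sol = math.inf, None        # incumbent
    open_nodes = [root]                        # subproblems awaiting exploration
    while open_nodes:
        node = open_nodes.pop()                # node selection (here: depth-first)
        lp = solve_lp_relaxation(node)         # returns None if the LP is infeasible
        if lp is None or lp.objective >= best_obj:
            continue                           # prune infeasible or dominated subtrees
        fractional = [j for j in node.integer_indices
                      if abs(lp.solution[j] - round(lp.solution[j])) > tol]
        if not fractional:                     # integral LP solution -> new incumbent
            best_obj, best_sol = lp.objective, lp.solution
            continue
        j = select_branching_variable(node, lp, fractional)   # the learned/heuristic step
        x_j = lp.solution[j]
        open_nodes.append(node.with_bound(j, upper=math.floor(x_j)))   # x_j <= floor(x_j)
        open_nodes.append(node.with_bound(j, lower=math.ceil(x_j)))    # x_j >= ceil(x_j)
    return best_obj, best_sol
```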
A key factor influencing the efficiency of the B&B algorithm is how to select a fractional variable to branch on. So far, the strong branching policy [41] has been found to produce the smallest B&B trees among the existing heuristics. Strong branching selects a branching variable by evaluating the expected bound improvement for each candidate variable.
3.2 Instance Bipartite Graph and Branching Samples
We represent each MILP instance (1) as a weighted bipartite graph $G = (\mathcal{C} \cup \mathcal{V}, E)$ as follows. The vertex sets $\mathcal{C}$ and $\mathcal{V}$ represent the MILP’s constraint set and variable set, respectively. Each node in $\mathcal{C}$ or $\mathcal{V}$ has its own feature vector containing the information of the corresponding constraint or variable. The edge-weight (adjacency) matrix is the constraint matrix $A$. We let $\mathbb{G}$ denote the set of all such bipartite graphs, called instance bipartite graphs.
Given a MILP instance, the B&B solver recursively solves a sequence of subproblems, which are also MILPs. We can thus represent each subproblem as a bipartite graph, as we do for the MILP instances. Furthermore, we would like to include some solver statistics in the features of these bipartite graphs, such as the reduced costs and incumbent values, to fully observe the solving states [10]. We call these observed bipartite graphs with solver statistics branching samples, and denote the space of such samples by $\mathcal{S}$.
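As an illustration, a minimal sketch of building this bipartite representation from the matrices in (1) is given below; the particular feature choices (right-hand side, row norm, objective coefficient, bounds, integrality flag) are illustrative placeholders rather than the exact feature set of [10].

```python
# Turn the matrices in (1) into a bipartite graph: constraint nodes, variable nodes,
# and one edge per nonzero of A.
import numpy as np

def milp_to_bipartite(A, b, c, lb, ub, is_integer):
    constraint_features = np.stack([b, np.abs(A).sum(axis=1)], axis=1)
    variable_features = np.stack([c, lb, ub, is_integer.astype(float)], axis=1)
    rows, cols = np.nonzero(A)                         # edges: nonzero coefficients
    edge_index = np.stack([rows, cols], axis=0)        # (constraint, variable) pairs
    edge_features = A[rows, cols].reshape(-1, 1)       # coefficient values as weights
    return constraint_features, edge_index, edge_features, variable_features

# Toy example: 2 constraints, 3 variables.
A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
graph = milp_to_bipartite(A, b=np.array([4.0, 5.0]), c=np.array([1.0, 1.0, 2.0]),
                          lb=np.zeros(3), ub=np.ones(3), is_integer=np.ones(3, dtype=bool))
```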

4 IL- and RL-based Branching Formulation
The sequential branching decisions made during B&B can be formulated as a Markov decision process (MDP) [8], where the solver and branching module are the environment and agent, respectively.
We consider the solving process as a finite-horizon MDP, called the branching MDP, with maximal time $T$, specified by the tuple $(\mathcal{S}, \mathcal{A}, r, f_{\text{trans}}, f_{\text{init}})$. Here $\mathcal{S}$ represents the state space of branching samples, $\mathcal{A}$ is the action space consisting of candidate branching variables, and $r$ is the reward function. The initialization function $f_{\text{init}}$ takes in a random noise $\xi$ sampled from a noise distribution and produces the instance $I = f_{\text{init}}(\xi)$, where we view the instance graph $G_I$ as the initial state $s_0$. The transition function $f_{\text{trans}}$ is deterministic since the branching process rigorously follows universal mathematical rules. A policy $\pi: \mathcal{S} \to \mathcal{A}$ maps an observed state to an action using the corresponding branching strategy.
At time $t$, the branching module observes a branching sample $s_t$ corresponding to the current subproblem. The branching module then selects a variable $a_t$ among the branching candidates according to the branching policy $\pi$. Finally, the solver splits the current subproblem on $a_t$ to generate another two subproblems and decides the next subproblem on which to perform branching.
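Schematically, one episode of this branching MDP can be rolled out as in the following hedged sketch; the `env.reset`/`env.step` interface and the `policy` callable are hypothetical placeholders for a solver wrapper, not a specific library API.

```python
# A schematic rollout of one episode of the branching MDP: the solver is the
# environment and the branching policy is the agent.
def rollout_episode(env, instance, policy):
    state, candidates, done = env.reset(instance)          # initial branching sample s_0
    total_reward, t = 0.0, 0
    while not done:
        action = policy(state, candidates)                 # choose a branching variable a_t
        state, candidates, reward, done = env.step(action) # solver branches, moves to next node
        total_reward += reward
        t += 1
    return total_reward, t                                 # return and episode length
```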
In practice, we only have access to a finite set of training instances $\{I_i\}_{i=1}^{N}$, and the testing instances may be drawn from an unknown environment different from the training distribution $\mathcal{D}_{\text{train}}$. The distributional shifts influence the initialization function in the testing environment, which we denote by $f^{\text{test}}_{\text{init}}$.
To describe the relationship between the training and testing distributions, we first define a cost function $c(\cdot, \cdot)$ to measure the discrepancy between two instance graphs. We make the following assumptions. (A1) The transition functions in the training and testing environments remain the same since the solver follows rigorous mathematical rules and the same node selection policy. (A2) The transition function $f_{\text{trans}}$ is $L_s$-Lipschitz in its first variable and $L_a$-Lipschitz in its second variable, that is,

$$\|f_{\text{trans}}(s_1, a) - f_{\text{trans}}(s_2, a)\| \le L_s \|s_1 - s_2\|, \qquad \|f_{\text{trans}}(s, a_1) - f_{\text{trans}}(s, a_2)\| \le L_a \|a_1 - a_2\|,$$

where the norm $\|\cdot\|$ is a general and simplified notation that can represent the norms in the state space or the action space. (A3) Each policy $\pi \in \Pi$ is Lipschitz continuous with Lipschitz constant $L_\pi$.
4.1 IL-based branching formulation
To solve the branching MDP mentioned above, we can employ imitation learning or reinforcement learning to train the branching agent. IL-based branching does not require the reward signal; instead, it assumes a strong branching expert with policy $\pi_E$ that is considered to be near optimal. A predefined dataset of branching samples and branching variables is obtained by running the expert on the training instances $I \sim \mathcal{D}_{\text{train}}$. Given the dataset, IL clones the behavior of the expert by enforcing the learned agent to make the same decisions as the expert under the same observations,

$$\min_{\pi \in \Pi}\; \mathbb{E}_{I \sim \mathcal{D}_{\text{train}}}\, \mathbb{E}_{(s, a) \sim d^{\pi_E}_{I}} \big[\ell\big(\pi(s),\, a\big)\big], \tag{2}$$

where $\ell$ is the loss function and $d^{\pi_E}_{I}$ is the joint distribution of the sequence of states and actions when running the expert on instance $I$, i.e., the distribution induced by $\pi_E$ and the transition function $f_{\text{trans}}$ starting from $G_I$, which is kept fixed during training. To simplify the notation, we let $\mathcal{L}_{\text{IL}}(\pi; I) = \mathbb{E}_{(s, a) \sim d^{\pi_E}_{I}}\big[\ell(\pi(s), a)\big]$.
4.2 RL-based branching formulation
The agent needs to explore the B&B tree using the current policy $\pi$ and tries to improve the policy by minimizing the negative accumulated reward,

$$\min_{\pi \in \Pi}\; \mathbb{E}_{I \sim \mathcal{D}_{\text{train}}}\, \mathbb{E}_{(s_t, a_t) \sim d^{\pi}_{I}} \Big[-\sum_{t=0}^{T} r(s_t, a_t)\Big], \tag{3}$$

where we should notice that the state distribution $d^{\pi}_{I}$ depends on the policy $\pi$, the instance $I$, and the transition function $f_{\text{trans}}$. We let $\mathcal{L}_{\text{RL}}(\pi; I) = \mathbb{E}_{(s_t, a_t) \sim d^{\pi}_{I}}\big[-\sum_{t=0}^{T} r(s_t, a_t)\big]$.
5 From Robust Optimization to Instance Augmentation
Given a branching policy $\pi$ and a training instance distribution $\mathcal{D}_{\text{train}}$, we formulate the distributionally robust training as the following optimization problems:

$$\min_{\pi \in \Pi}\; \sup_{\mathcal{D}:\, W(\mathcal{D},\, \mathcal{D}_{\text{train}}) \le \rho}\; \mathbb{E}_{I \sim \mathcal{D}}\big[\mathcal{L}_{\text{IL}}(\pi; I)\big], \tag{IL}$$
$$\min_{\pi \in \Pi}\; \sup_{\mathcal{D}:\, W(\mathcal{D},\, \mathcal{D}_{\text{train}}) \le \rho}\; \mathbb{E}_{I \sim \mathcal{D}}\big[\mathcal{L}_{\text{RL}}(\pi; I)\big], \tag{RL}$$

where $W(\cdot, \cdot)$ is the Wasserstein distance that measures the distance between distributions $\mathcal{D}$ and $\mathcal{D}_{\text{train}}$. Specifically, we define $W(\mathcal{D}, \mathcal{D}_{\text{train}}) = \inf_{\mu} \mathbb{E}_{(I, I') \sim \mu}\big[c(I, I')\big]$, where $\mu$ is a coupling distribution of $\mathcal{D}$ and $\mathcal{D}_{\text{train}}$. We let $\mathcal{L}$ represent $\mathcal{L}_{\text{IL}}$ or $\mathcal{L}_{\text{RL}}$ to simplify the notation. However, computing the supremum over the environment variable is intractable, so we use a Lagrangian relaxation with a penalty $\gamma > 0$ instead:

$$\min_{\pi \in \Pi}\; \sup_{\mathcal{D}}\; \Big\{ \mathbb{E}_{I \sim \mathcal{D}}\big[\mathcal{L}(\pi; I)\big] - \gamma\, W(\mathcal{D}, \mathcal{D}_{\text{train}}) \Big\}.$$
[43] shows that we can define surrogate loss functions

$$\phi_{\gamma}(\pi; I) = \sup_{I' \in \mathbb{G}}\; \big\{ \mathcal{L}(\pi; I') - \gamma\, c(I', I) \big\},$$

such that the gradients of the surrogate functions satisfy

$$\nabla_{\pi}\, \phi_{\gamma}(\pi; I) = \nabla_{\pi}\, \mathcal{L}(\pi; I^{\ast}),$$

where the augmented instance is

$$I^{\ast} = \arg\max_{I' \in \mathbb{G}}\; \big\{ \mathcal{L}(\pi; I') - \gamma\, c(I', I) \big\}.$$

This implies that in order to compute the gradient of $\phi_{\gamma}$, we just need to compute the gradient of the objective on the augmented instance $I^{\ast}$. Thus, we turn to finding an effective augmentation strategy.
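In practice this suggests the alternating scheme used by AdaSolver: the augmenter approximates the inner maximization, and the branching policy takes a gradient step of its loss on the resulting augmented instance. The sketch below illustrates one such step; `augmenter`, `collect_samples`, and `branch_loss` are hypothetical placeholders, and PyTorch is only one possible realization.

```python
# One adversarially augmented training step under the surrogate-gradient result above.
import torch

def robust_training_step(policy, optimizer, augmenter, instance, collect_samples, branch_loss):
    aug_instance = augmenter(instance)         # approximate argmax of L(pi; I') - gamma * c(I', I)
    samples = collect_samples(aug_instance)    # branching samples gathered by the solver
    loss = branch_loss(policy, samples)        # objective evaluated on augmented data
    optimizer.zero_grad()
    loss.backward()                            # gradient of the surrogate = gradient at I*
    optimizer.step()
    return loss.item()
```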
We first specify the optimal policy in the testing distribution, which is the best policy to generalize within the hypothesis policy class $\Pi$, i.e.,

$$\pi^{\ast} = \arg\min_{\pi \in \Pi}\; \mathbb{E}_{I \sim \mathcal{D}_{\text{test}}}\big[\mathcal{L}(\pi; I)\big]. \tag{4}$$

In practice, we can only access finitely many instances $\{I_i\}_{i=1}^{N}$ sampled from the training distribution $\mathcal{D}_{\text{train}}$ to estimate the objective, and we obtain the best policy on the dataset,

$$\hat{\pi} = \arg\min_{\pi \in \Pi}\; \hat{\mathcal{L}}(\pi), \tag{5}$$
where the empirical risks are

$$\hat{\mathcal{L}}_{\text{IL}}(\pi) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{T_i}\sum_{t=0}^{T_i} \ell\big(\pi(s^{i}_{t}),\, a^{i}_{t}\big), \qquad \text{(IL)}$$
$$\hat{\mathcal{L}}_{\text{RL}}(\pi) = \frac{1}{N}\sum_{i=1}^{N} \Big(-\sum_{t=0}^{T_i} r\big(s^{i}_{t},\, a^{i}_{t}\big)\Big), \qquad \text{(RL)}$$

with $(s^{i}_{t}, a^{i}_{t})$ denoting the branching samples collected on instance $I_i$ (by the expert in the IL setting and by the current policy in the RL setting).
We then derive the objective using data augmentation: we denote the distribution of augmented instances by $\tilde{\mathcal{D}}$ and the best policy on the augmented dataset by $\tilde{\pi}$,

$$\tilde{\pi} = \arg\min_{\pi \in \Pi}\; \tilde{\mathcal{L}}(\pi), \tag{6}$$

in which $\tilde{\mathcal{L}}$ is the empirical risk using the augmented instances.
6 AdaSolver
In this section, we present the robust optimization framework for learning-based exact MILP solvers using adversarial instance augmentation, namely Adversarial Instance Augmentation for the exact MILP Solver (AdaSolver).

6.1 Adversarial Augmentation
An overview of the proposed framework on the learn-to-branch task is depicted in Figure 2. The framework consists of two components: a learning-based branching policy network and an adversarial augmenter. The adversarial augmenter employs data augmentation to promote the diversity of graph structures in the training data, while the learning-based branching policy network tries to minimize the training loss on samples from both the original and augmented instances to extract task-relevant features across distributions.
6.1.1 Instance Graph Augmentation Operators
Inspired by the fact that the MILP problem structures are encoded in the instance graph structures, we propose to augment the instances by masking variables, constraints, and edges in the instance bipartite graphs to promote the diversity of graph structures. These operators are also reasonable from a real-world perspective. Notice that many MILP problems arise from real-world scheduling scenarios. These problems often feature variables that represent available workers, machines, vehicles, or other material resources, and they feature constraints that formulate the restrictive scheduling relationships among these resources. Therefore, these simple and reliable augmentation operators can also simulate perturbations of real-world resources, i.e., certain materials become inaccessible or some restrictions are lifted. Moreover, recent studies have shown that GNNs struggle to process certain “foldable” MILPs, as stated in [44]. In real-world scenarios, CO instances are highly structured and symmetric, resulting in numerous such structures in the training data that GNNs are unable to distinguish. This issue exacerbates the challenges posed by the limited diversity of training data. By augmenting the graph structures, AdaSolver effectively breaks the symmetric structures in the training instances and promotes the diversity of instance graph structures. Thus, instance augmentation can improve not only the generalization ability of the branching policy but also the sample efficiency, i.e., we can use fewer instances to attain comparable performance, as we show in Section 8.3.
In practice, we control the choice of augmentation operators and the masking proportion so that the augmented instances preserve a problem structure and computational hardness similar to the original instances, preventing harmful effects on training.
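A minimal sketch of the three masking operators is given below, under the assumption that masking removes the selected constraints, variables, and edges from the bipartite graph; function and argument names, and the default proportions, are illustrative assumptions.

```python
# Mask (remove) small random subsets of constraints, variables, and edges of the
# bipartite instance graph, keeping the surviving structure index-consistent.
import numpy as np

def mask_instance(cons_feats, edge_index, edge_feats, var_feats, rng,
                  p_cons=0.01, p_var=0.01, p_edge=0.01):
    keep_cons = rng.random(len(cons_feats)) >= p_cons       # constraint masking
    keep_var = rng.random(len(var_feats)) >= p_var          # variable masking
    keep_edge = rng.random(edge_index.shape[1]) >= p_edge   # edge masking
    # an edge survives only if both of its endpoints survive
    keep_edge &= keep_cons[edge_index[0]] & keep_var[edge_index[1]]
    # remap surviving endpoints to the compacted node index space
    cons_map = np.cumsum(keep_cons) - 1
    var_map = np.cumsum(keep_var) - 1
    new_edges = np.stack([cons_map[edge_index[0, keep_edge]],
                          var_map[edge_index[1, keep_edge]]], axis=0)
    return cons_feats[keep_cons], new_edges, edge_feats[keep_edge], var_feats[keep_var]

rng = np.random.default_rng(0)   # e.g., aug_graph = mask_instance(*graph, rng=rng)
```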
6.1.2 Contextual Bandit for Augmentation
The instance augmenter is a function $g: \mathbb{G} \to \mathbb{G}$ mapping an instance bipartite graph $G$ to an augmented instance graph $\tilde{G} = g(G)$ by applying augmentation operators on $G$. However, two challenges for the instance augmentation remain to be addressed. First, employing structural augmentation on a bipartite graph is generally a non-differentiable process. It is thus intractable to train the adversarial augmenter since the non-differentiable mapping poses challenges for backward propagation and the gradient-based update fails. Second, the training efficiency of the augmenter and the branching policy network is different. The training of the augmenter is much slower than that of the branching policy network since the augmenter receives much less training data. The adversarial augmenter operates on MILP instance graphs to obtain augmented instances $\tilde{G}$. Later, the expert policy runs on each augmented instance to generate a sequence of branching samples containing solving states and actions in the IL setting, or the agent runs on the augmented instances to collect samples in the RL setting to calibrate the branching policy. Consequently, the adversarial augmenter receives significantly less training data than the branching policy. The issues of non-differentiability and unsatisfactory training efficiency require a sample-efficient training strategy under sparse supervision signals. Therefore, we resort to a contextual bandit version of the proximal policy optimization (PPO) algorithm [17].
To formulate the graph augmentation as a contextual bandit problem, we view the solver as the environment and the adversarial augmenter as the agent. A contextual bandit problem can be represented as the tuple $(\mathcal{S}^{\text{CB}}, \mathcal{A}^{\text{CB}}, r^{\text{CB}})$. To avoid confusing the notation with that of the branching MDP, here we use the superscript CB to specify the contextual bandit formulation. Then we specify the state space $\mathcal{S}^{\text{CB}}$, the action space $\mathcal{A}^{\text{CB}}$, and the reward function $r^{\text{CB}}$ for a MILP instance as follows.
•
The state space $\mathcal{S}^{\text{CB}}$, also known as the context vector space, is the set of graph representations of MILP instances.
•
An action $a^{\text{CB}} = (\mathcal{V}_m, E_m, \mathcal{C}_m)$, where $\mathcal{V}_m$, $E_m$, and $\mathcal{C}_m$ are the subsets of variables, edges, and constraints to mask, respectively. The agent applies the action to an instance graph $G$ to obtain an augmented instance graph $\tilde{G}$. After that, we collect branching samples on it.
•
The reward $r^{\text{CB}}$ is defined to be the average loss of the branching policy on the samples collected from the augmented graph $\tilde{G}$.
We describe the training process for the augmenter policy network $g_{\phi}$ with parameters $\phi$ under the contextual bandit formulation as follows. In practice, the augmenter policy network takes as input an instance graph $G$ and outputs the masking probability of each node and edge. We select an action $a^{\text{CB}}$ and record the predicted probability, i.e., $g_{\phi}(a^{\text{CB}} \mid G)$. We leverage a learnable state-value function $V_{\psi}$ with parameters $\psi$ to calculate a variance-reduced estimate of the advantage function $\hat{A} = r^{\text{CB}} - V_{\psi}(G)$. In order to improve the training efficiency of the augmenter, we use a replay buffer consisting of the tuples $(G, a^{\text{CB}}, r^{\text{CB}}, g_{\phi_{\text{old}}}(a^{\text{CB}} \mid G))$ collected under historical augmentation policies for performing experience replay. We use $\rho(\phi) = g_{\phi}(a^{\text{CB}} \mid G) / g_{\phi_{\text{old}}}(a^{\text{CB}} \mid G)$ to represent the probability ratio of the current policy and the old policies sampled from the buffer. Finally, we reach the training objective of the adversarial augmenter, which includes the policy network and the state-value function, by minimizing
$$\mathcal{L}_{\text{aug}}(\phi, \psi) = -\,\mathbb{E}\Big[\min\big(\rho(\phi)\,\hat{A},\; \operatorname{clip}\big(\rho(\phi),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}\big)\Big] \;+\; \mathbb{E}\Big[\big(V_{\psi}(G) - r^{\text{CB}}\big)^{2}\Big], \tag{7}$$

where $\epsilon$ is the clip ratio. We show in Table VII that the sample-efficient bandit algorithm plays a vital role in the overall training process.
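A hedged PyTorch-style sketch of this clipped, bandit-style update (one instance per "episode", advantage estimated against the value baseline) is shown below; the variable names and the relative weighting of the value-function term are assumptions.

```python
# Clipped PPO-style loss for the augmenter in the contextual bandit setting.
import torch

def augmenter_loss(log_prob_new, log_prob_old, reward, value_pred, clip_eps=0.2, vf_coef=0.5):
    advantage = (reward - value_pred).detach()            # variance-reduced advantage estimate
    ratio = torch.exp(log_prob_new - log_prob_old)        # g_phi(a|G) / g_phi_old(a|G)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    policy_loss = -torch.min(unclipped, clipped).mean()   # clipped surrogate (reward maximization)
    value_loss = (value_pred - reward).pow(2).mean()      # fit the state-value baseline
    return policy_loss + vf_coef * value_loss
```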
6.1.3 Discriminator for Data Augmentation
The adversarial augmenter tends to produce instances quite different from the original distribution, leading to some extremely easy or hard instances. This is because the training objective is to maximize the loss of the GNN on the augmented data, but the augmenter does not receive any restrictions on the augmentation. Thus, the augmenter may ignore the distribution information and just produce some hard OOD instances, which might break the MILP problem structures. To address this problem, we leverage a discriminator network to identify and remove those instances that severely deviate from the original distribution.
We instantiate the discriminator using a graph convolutional encoder and an MLP decoder. The graph encoder extracts the structural information of the MILP instance and the decoder predicts the probability that the instance is from the original distribution or not. To train the discriminator $D_{\omega}$, we minimize the binary cross-entropy objective

$$\min_{\omega}\; -\,\mathbb{E}_{G \sim \mathcal{D}_{\text{train}}}\big[\log D_{\omega}(G)\big] \;-\; \mathbb{E}_{\tilde{G} \sim \tilde{\mathcal{D}}}\big[\log\big(1 - D_{\omega}(\tilde{G})\big)\big].$$
At inference time, the discriminator takes as input the augmented graph $\tilde{G}$ and decides to use the instance for training the GNN when $D_{\omega}(\tilde{G}) \ge 0.5$ and to discard it otherwise. Moreover, we use an additional regularized reward, i.e.,

$$\tilde{r}^{\text{CB}} = r^{\text{CB}} + \lambda\, D_{\omega}(\tilde{G}),$$

to control the augmentation, where $\lambda$ is a constant.
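The filtering step can be summarized by the following minimal sketch, where `discriminator` is a hypothetical callable returning the predicted in-distribution probability for an instance graph.

```python
# Discard augmented instances that the discriminator judges to be out of distribution.
def filter_augmented(discriminator, augmented_instances, threshold=0.5):
    kept = []
    for graph in augmented_instances:
        p_in_distribution = discriminator(graph)      # probability the instance looks "real"
        if p_in_distribution >= threshold:
            kept.append(graph)                        # keep it for training the branching policy
        # otherwise discard: it deviates too far from the original distribution
    return kept
```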
We provide the pseudo-code for the AdaSolver framework in Algorithm 1.
7 Theoretical Analysis for Adversarial Augmentation Framework
In this section, we provide a detailed theoretical analysis of the proposed instance augmentation technique. We analyze the generalization bound for data augmentation in both the IL and RL settings. We suppose that the augmented instances are drawn from a perturbed distribution $\tilde{\mathcal{D}}$.
7.1 IL Setting
Since the branching samples can be shuffled and collected by interval sampling, the transition function may have little effect on the training dataset. We thus omit the transition in the IL setting and view the collected branching samples as drawn independently from the same distribution. For the hypothesis policy class $\Pi$, we use the classical Rademacher complexity $\mathfrak{R}(\Pi)$, together with its counterpart $\tilde{\mathfrak{R}}(\Pi)$ under instance augmentation, to bound the generalization gap.
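For concreteness, a standard form of the empirical Rademacher complexity over the collected branching samples reads as follows (a sketch with generic notation; the paper's exact index sets and normalization may differ), and the augmented counterpart $\tilde{\mathfrak{R}}(\Pi)$ replaces the samples by those collected on augmented instances:

$$\mathfrak{R}(\Pi) \;=\; \mathbb{E}_{\sigma}\!\left[\,\sup_{\pi \in \Pi}\; \frac{1}{M}\sum_{i=1}^{M} \sigma_{i}\,\ell\big(\pi(s_{i}),\, a_{i}\big)\right], \qquad \sigma_{i} \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}\{-1, +1\},$$

where $(s_i, a_i)_{i=1}^{M}$ are the branching samples collected from the training instances.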
Now we derive the IL generalization bound between the empirical risk on the training dataset and the expected risk on the testing distribution. The classical proof of the generalization bound with Rademacher complexity can be found in [45], and the in-distribution generalization bound for data augmentation in [46]. We now derive the OOD version of the generalization bound under data augmentation in the IL setting (9).
Theorem 1.
Suppose that the loss function is bounded by 1 and is Lipschitz continuous. With probability at least $1-\delta$ and for any policy in $\Pi$, we have the generalization bounds in the out-of-distribution imitation learning setting as follows.
•
The generalization bound for empirical risk minimization (8).
•
The generalization bound for data augmentation (9).
The notation $\mu^{\pi}_{\mathcal{D}}$ denotes the distribution over branching samples that are collected from instances drawn from distribution $\mathcal{D}$ using policy $\pi$, and $W(\cdot, \cdot)$ is the Wasserstein distance between two such distributions with respect to the norm over the branching sample space. Furthermore, we suppose that instance augmentation closes the gap between the training and testing distributions, i.e., for some $\epsilon > 0$,

$$W\big(\mu^{\pi}_{\tilde{\mathcal{D}}},\, \mu^{\pi}_{\mathcal{D}_{\text{test}}}\big) \;\le\; W\big(\mu^{\pi}_{\mathcal{D}_{\text{train}}},\, \mu^{\pi}_{\mathcal{D}_{\text{test}}}\big) - \epsilon.$$

For sufficiently large $\epsilon$, the generalization bound in Equation (9) is lower than that in Equation (8), that is,
(10) | ||||
Thus, Theorem 1 shows that data augmentation is beneficial for out-of-distribution generalization settings.
7.2 RL Setting
We now explore the theory of instance augmentation for OOD RL settings. The transition function in the RL setting cannot be omitted since we use the parameterized policy to collect branching samples sequentially. We thus need to consider this sequential property to analyze the generalization. We denote the Rademacher complexity for the policy class by $\mathfrak{R}(\Pi)$ and its counterpart under instance augmentation by $\tilde{\mathfrak{R}}(\Pi)$. We derive the following generalization bound and generalization error for the RL setting in Theorem 2.
Notice that the sample distribution is closely related to the policy $\pi$ and changes with different policies, posing difficulties in analyzing the generalization gap. Thus, we consider the reparameterizable RL proposed in [47]. Notice that given an instance $I$ and a deterministic policy $\pi$, the solving process is deterministic since the transition is deterministic. Reparameterizable RL supposes that we can use a random variable $\xi$ to model the random noise and employ the reparameterization trick such that $I = f_{\text{init}}(\xi)$. The RL process thus samples $\xi$ from a fixed noise distribution in the training and testing environments to produce instances, and models the distributional shifts through different initialization functions. This presents significant technical convenience for our analysis.
Theorem 2.
Consider the solvers that can be formulated as reparameterizable RL with horizon $T$ and the parameters defined above. Suppose that the expected accumulated cost function is normalized and bounded by 1 and that the state reward function is Lipschitz continuous. With probability at least $1-\delta$, we have
(11) | ||||
8 Experiments
In this part, we conduct comparative experiments on multiple well-recognized benchmarks to demonstrate the effectiveness of our proposed AdaSolver framework in terms of solving efficiency and sample efficiency. Meanwhile, we also perform ablation studies and extensive studies to provide further insight into AdaSolver.
8.1 Experiment Setup
8.1.1 Benchmarks
We evaluate the performance of our proposed AdaSolver on four widely used benchmarks of NP-hard synthetic problem families and a challenging real-world Anonymous dataset. The synthetic datasets include set covering, combinatorial auctions, capacitated facility location, and maximum independent set problems. We follow the conventional data generation settings in [10], [26] and [27]. In each benchmark, we generate training instances and validation instances for AdaSolver. For the final evaluation, we generate an in-distribution testing set of 32 instances and five transfer sets of 32 instances each with larger scales. We emphasize that the instance scales in the transfer datasets increase in order and the maximum scale is much larger than that in previous works. The Anonymous dataset is a well-recognized real-world dataset consisting of a curated subset of the challenging MIPLIB library. It consists of large-scale instances with 98 for training, 20 for validation, and 20 for testing. The instances in the Anonymous dataset have an average of 49603 constraints and 37881 variables.
We report the instances’ scales for each benchmark in Table I. Due to limited space, more information about the benchmarks and instance generation for the synthetic datasets can be found in Appendix C.
Set Covering | Combinatorial Auctions | |||||||||||
Constraint number | 500 | 1000 | 2000 | 4000 | 3000 | 8000 | 193.03 | 386.25 | 576.75 | 771.65 | 963.56 | 1156.59 |
Variable number | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 500 | 1000 | 1500 | 2000 | 2500 | 3000 |
Capacitated Facility Location | Maximum Independent Set | |||||||||||
Constraint number | 10201 | 20301 | 40501 | 60701 | 80901 | 161701 | 1952.34 | 3946.56 | 5942.34 | 7936.40 | 9934.12 | 11931.65 |
Variable number | 10100 | 20100 | 40100 | 60100 | 80100 | 160100 | 500 | 1000 | 1500 | 2000 | 2500 | 3000 |
8.1.2 Metric
We report the average solving time (Time, lower is better) and primal-dual integral (PD integral, lower is better) for different branching approaches for the comparative experiments. For more details on the PD integral metric, please see Appendix A.3. Moreover, we provide more results in terms of another two well-recognized metrics, i.e., the average branching nodes (Node) and the average primal-dual gap (PD gap, lower is better) in Appendix E.1.
Following [12], we mainly focus on the Time metric in the distributions where the solver can solve all the instances within the time limit, since the solving time is the most direct metric reflecting solving efficiency. However, the average solving time is almost the same across methods in the distributions where the solvers cannot solve most of the instances within the limited time. As a result, it is hard to tell which method performs best, and the solving time metric is not suitable for these distributions. We thus focus on another well-recognized metric, the primal-dual integral, for these distributions instead. Given a time limit, a lower primal-dual integral implies a better quality of the feasible solutions found by the solver. Specifically, we focus on the Time metric on the smaller-scale distributions of the synthetic benchmarks (including slightly larger ones for the combinatorial auctions and capacitated facility location benchmarks), and on the primal-dual integral on the other distributions.
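For reference, a hedged sketch of how the primal-dual integral can be approximated from a solver's bound-update log is shown below; the exact definition used by SCIP may differ in details such as the treatment of unbounded gaps.

```python
# Approximate the primal-dual (PD) integral: the gap between primal and dual bounds
# integrated over solving time, computed piecewise-constantly from bound-update events.
def primal_dual_integral(events, time_limit):
    """`events` is a time-sorted list of (timestamp, primal_bound, dual_bound) tuples."""
    integral, prev_t, prev_gap = 0.0, 0.0, None
    for t, primal, dual in events:
        gap = abs(primal - dual)
        if prev_gap is not None:
            integral += prev_gap * (t - prev_t)       # area of the previous constant segment
        prev_t, prev_gap = t, gap
    if prev_gap is not None:
        integral += prev_gap * (time_limit - prev_t)  # carry the last gap to the time limit
    return integral
```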
The Node metric reflects the size of B&B searching trees. When the solver solves out an instance, fewer explored nodes imply a more efficient searching policy.
8.1.3 Baselines
Our baselines include a SCIP branching rule and six learning-based branchers covering competitive IL-based and RL-based approaches. The SCIP branching rule is the traditional state-of-the-art reliability pseudocost branching (RPB) heuristic. We then introduce the learning-based branching baselines. SVMRank [9] uses a support vector machine for learning to select the branching variables. Trees [48] is a branching policy using ExtraTrees. LMart [49] is based on LambdaMART. tMDP+DFS is a reinforcement learning method based on tree MDPs [11]. TreeGate is a tree search branching approach [36] trained with imitation learning. GNN is the graph neural network approach for learning to branch [10]. Finally, we build our AdaSolver framework on the IL-based GNN branching policy [10] (GNN) and the RL-based GNN branching policy (tMDP+DFS) [11], resulting in two methods, AdaSolver-IL and AdaSolver-RL. As stated in [11], the RL-based method performs worse than RPB or the IL-based baselines. For the RL-based setting, we thus mainly focus on the improvement of AdaSolver-RL over the underlying RL-based solvers.
8.1.4 Implementation
Throughout all experiments, the backend solver is the state-of-the-art open-source solver SCIP 7.0.3. The solving time limit on a testing instance is 900s. Similar to [10, 26, 27], we deactivate solver restarts and allow the cutting plane generation module only at the root node. We keep all the other SCIP parameters at their defaults so as to better observe the improvements from AdaSolver and make comparisons of these approaches as fair as possible.
For model training, we use Adam optimizer [50] to optimize the model parameters. When training AdaSolver, the adversarial augmenter randomly chooses training instances to perform augmentation in each training epoch. We then collect branching samples from the augmented instances. We choose the augmentation operators that perform the best for each benchmark. More details for these baselines and our AdaSolver are available in Appendix D.
8.2 Comparative Experiments on Solving Efficiency
8.2.1 Scale Generalization
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
RPB | 10.98 | 114.91 | 87.62 | 666.215 | 831.41 | 7152.10 | 900.00 | 11494.93 | 900.00 | 14757.09 | 900.00 | 23564.46 |
SVMRank | 11.40 | 116.21 | 158.29 | 989.67 | 890.66 | 8546.88 | 900.03 | 13444.79 | 900.05 | 17018.42 | 900.05 | 24535.92 |
Trees | 14.86 | 131.23 | 210.26 | 1238.87 | 899.87 | 8690.80 | 900.01 | 13226.00 | 900.02 | 16772.03 | 900.02 | 24599.38 |
LambdaMart | 9.46 | 106.92 | 127.34 | 812.27 | 875.87 | 7797.14 | 900.01 | 12927.04 | 900.02 | 16668.55 | 900.03 | 24525.11 |
tMDP+DFS | 20.73 | 159.59 | 444.38 | 2892.12 | 900.006 | 11559.79 | 900.00 | 15393.24 | 900.00 | 18384.06 | 900.00 | 24387.4 |
TreeGate | 25.16 | 187.99 | 271.82 | 1790.70 | 900.00 | 10391.53 | 900.00 | 14389.55 | 900.00 | 17747.06 | 900.02 | 24118.51 |
GNN | 9.25 | 116.22 | 95.57 | 716.25 | 844.74 | 7310.17 | 900.00 | 11899.40 | 900.00 | 16067.00 | 900.00 | 23786.51 |
AdaSolver-RL | 18.87 | 152.26 | 433.97 | 2885.97 | 900.00 | 11330.87 | 900.00 | 14466.95 | 900.00 | 17550.63 | 900.00 | 24319.00 |
AdaSolver-IL | 6.28 | 80.67 | 73.78 | 515.92 | 839.80 | 7079.13 | 900.00 | 10691.05 | 900.00 | 14452.71 | 900.00 | 21792.69 |
Set Covering |
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
RPB | 2.84 | 32.20 | 23.20 | 156.62 | 224.66 | 156.62 | 838.77 | 1622.46 | 900.00 | 2313.35 | 900.004 | 2657.51 |
SVMRank | 2.59 | 26.32 | 51.12 | 194.46 | 427.01 | 625.01 | 881.33 | 1730.40 | 900.07 | 2293.06 | 900.00 | 2580.31 |
Trees | 2.76 | 26.44 | 87.05 | 221.68 | 790.01 | 994.48 | 900.01 | 1782.20 | 900.01 | 2323.16 | 900.01 | 2535.18 |
LambdaMart | 1.90 | 24.69 | 28.08 | 156.10 | 287.69 | 446.49 | 862.31 | 1563.85 | 900.02 | 2136.37 | 900.02 | 2455.08 |
tMDP+DFS | 2.18 | 24.76 | 27.98 | 146.15 | 312.93 | 518.45 | 860.76 | 1871.73 | 900.00 | 2664.34 | 900.00 | 3103.08 |
TreeGate | 5.75 | 37.74 | 84.44 | 303.53 | 623.25 | 1168.37 | 891.50 | 2486.65 | 900.00 | 3302.70 | 900.00 | 3771.89 |
GNN | 2.20 | 30.48 | 20.30 | 167.60 | 195.68 | 384.17 | 819.28 | 1300.99 | 900.00 | 1931.54 | 900.00 | 2293.41 |
AdaSolver-RL | 2.04 | 106.83 | 17.28 | 117.28 | 323.94 | 517.91 | 846.32 | 1685.69 | 900.00 | 2379.82 | 900.00 | 2980.48 |
AdaSolver-IL | 1.63 | 22.44 | 16.34 | 117.13 | 178.32 | 314.32 | 817.57 | 1154.79 | 900.00 | 1716.55 | 900.00 | 2107.19 |
Combinatorial Auctions |
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
RPB | 65.67 | 170.34 | 287.52 | 719.26 | 689.45 | 3333.72 | 886.54 | 10762.83 | 892.90 | 20102.78 | 897.51 | 27527.24 |
SVMRank | 56.98 | 274.04 | 207.16 | 1011.28 | 592.15 | 3913.06 | 884.87 | 13898.43 | 873.08 | 22201.25 | 887.00 | 33649.01 |
Trees | 77.41 | 285.65 | 262.54 | 1038.46 | 652.15 | 3895.60 | 887.00 | 13649.01 | 878.81 | 22433.54 | 889.54 | 34909.74 |
LambdaMart | 51.30 | 281.76 | 201.06 | 988.92 | 591.41 | 3927.82 | 884.29 | 13908.60 | 872.67 | 22455.29 | 886.84 | 34976.25 |
tMDP+DFS | 73.35 | 275.38 | 270.71 | 1108.57 | 683.42 | 4253.40 | 886.07 | 12950.94 | 888.14 | 23675.80 | 893.70 | 34995.77 |
TreeGate | 47.96 | 180.17 | 192.29 | 757.39 | 661.64 | 4238.80 | 873.98 | 10553.59 | 886.34 | 19564.65 | 897.63 | 27529.18 |
GNN | 51.63 | 290.91 | 235.07 | 1107.29 | 660.25 | 4418.54 | 854.27 | 10297.25 | 868.93 | 19033.90 | 890.53 | 32302.74 |
AdaSolver-RL | 64.10 | 257.10 | 233.64 | 892.77 | 587.02 | 3332.38 | 852.53 | 7779.28 | 849.93 | 14727.70 | 852.51 | 27298.25 |
AdaSolver-IL | 34.51 | 196.56 | 147.47 | 738.36 | 555.66 | 3324.63 | 731.81 | 6644.35 | 819.31 | 13818.34 | 884.48 | 28231.23 |
Capacitated Facility Location |
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
RPB | 16.46 | 86.57 | 302.43 | 1045.88 | 883.79 | 3818.32 | 900.00 | 4634.32 | 900.00 | 5067.01 | 900.00 | 5683.46 |
SVMRank | 18.37 | 76.17 | 456.19 | 879.49 | 883.18 | 2539.03 | 900.07 | 3640.36 | 900.09 | 4313.94 | 900.08 | 5105.40 |
Trees | 26.04 | 91.01 | 701.12 | 1353.35 | 900.01 | 2828.39 | 900.02 | 3729.98 | 900.02 | 4353.62 | 900.03 | 5013.31 |
LambdaMart | 12.80 | 61.79 | 445.80 | 826.64 | 877.65 | 2475.11 | 900.02 | 3570.20 | 900.02 | 4221.15 | 900.04 | 5042.00 |
tMDP+DFS | 9.45 | 54.00 | 294.04 | 644.69 | 781.44 | 2201.37 | 897.91 | 3628.92 | 900.00 | 4613.32 | 900.01 | 5314.64 |
TreeGate | 67.48 | 229.32 | 844.92 | 2711.83 | 900.03 | 3939.96 | 900.03 | 4520.56 | 900.01 | 4906.46 | 900.01 | 5299.72 |
GNN | 11.09 | 65.73 | 254.12 | 558.31 | 817.04 | 2284.32 | 900.00 | 3532.03 | 900.01 | 4387.27 | 900.01 | 5168.77 |
AdaSolver-RL | 8.43 | 51.46 | 282.54 | 627.81 | 803.75 | 2117.01 | 879.81 | 3318.08 | 900.02 | 4148.24 | 900.00 | 5099.92 |
AdaSolver-IL | 7.39 | 44.56 | 210.36 | 460.80 | 806.78 | 2355.11 | 900.00 | 3591.38 | 900.00 | 4375.48 | 900.00 | 5152.28 |
Maximum Independent Set |
We evaluate the generalization ability of AdaSolver and compare it to other baselines on the four benchmarks of generated datasets (see Table II). According to Table II, our AdaSolver-IL consistently outperforms the baselines on the four synthetic benchmarks in terms of the Time or PD integral metric. AdaSolver-RL has a significant improvement over the underlying RL-based solver. Please refer to Appendix E.1 for results in terms of the Nodes and PD gap metrics.
In the easier distributions, AdaSolver-IL achieves the shortest solving time and outperforms all the baselines. GNN is a competitive baseline that outperforms the other learning-based baselines on most distributions of the four benchmarks, while LambdaMart is another promising learning-based baseline with stable performance. Compared to GNN, AdaSolver-IL achieves a 27.66% improvement in solving time on these easier distributions, leading to a total of 39.06% acceleration over the RPB heuristic. AdaSolver-RL improves the solving time of tMDP+DFS by up to 12.12%. The results demonstrate that AdaSolver is effective in improving performance in in-distribution or mildly out-of-distribution situations.
In the harder distributions, AdaSolver still significantly outperforms the other baselines with the lowest PD integral, leading to an improvement of 11.12% over GNN and 13.15% over tMDP+DFS. This demonstrates that AdaSolver can consistently enhance the generalization ability of the underlying solvers to instances with five to ten times larger scales.
We summarize the improvement of AdaSolver over the underlying learning-based solver GNN and tMDP+DFS in Table III, demonstrating the effectiveness of AdaSolver.
Benchmark | Improvement (%) | |
AdaSolver-IL | AdaSolver-RL | |
Set Covering | 14.44 | 4.02 |
Combinatorial Auctions | 14.12 | 10.95 |
Capacitated Facility Location | 26.95 | 23.35 |
Maximum Independent Set | 7.73 | 6.87 |
8.2.2 Perturbation Robustness (Real-World Dataset)
To evaluate the performance in real-world applications, we deploy AdaSolver on the large-scale real-world Anonymous dataset, a widely used subset of the MIPLIB library. The instances’ constraint numbers in the Anonymous dataset range from 3375 to 159823, and the variable numbers range from 1613 to 92261. The wide range of problem scales as well as the real-world perturbations pose significant challenges for the learning-based methods. The results are summarized in Table IV, which demonstrates that AdaSolver is able to improve the performance of the underlying solvers in both the IL and RL settings. AdaSolver-IL reaches the best PD gap and AdaSolver-RL has the best PD integral. The performance of AdaSolver sheds light on building generalizable neural solvers for real-world applications.
Method | Time(s) | Nodes | PD integral | PD gap |
RPB | 866.68 | 43907.95 | 76018.92 | 5.50e+19 |
SVMRank | 899.93 | 14519.66 | 65522.04 | 4.83e+19 |
Trees | 900.02 | 9658.55 | 76143.65 | 4.66e+19 |
LambdaMart | 830.45 | 16229.83 | 62415.32 | 4.66e+19 |
tMDP+DFS | 763.81 | 40933.30 | 44679.11 | 5.00e+18 |
GNN | 823.88 | 37469.58 | 45483.01 | 3.50e+18 |
AdaSolver-RL | 754.08 | 46746.05 | 42179.25 | 5.00e+18 |
AdaSolver-IL | 830.50 | 57156.53 | 42579.31 | 3.33e+18 |
8.3 Improving Sample Efficiency of the Branching Policy
We compare the performance of AdaSolver-IL and the underlying GNN solver trained on only 100 instances, one percent of the number of instances used in Section 8.2.1. The results in Table V demonstrate that the underlying learning-based solver GNN faces significant performance degradation when trained on fewer instances. Specifically, the performance of GNN decreases by 49.70% on the set covering benchmark and 76.45% on the combinatorial auctions benchmark. In such a situation, AdaSolver-IL effectively alleviates the performance degradation and outperforms GNN, which implies that AdaSolver is much more sample-efficient. In some real-world scenarios, we often have only a few training instances, and the limited availability of instances poses severe challenges for the training of learning-based solvers. Thus, the performance under fewer training instances strongly reflects the potential for practical applicability.
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
GNN | 13.41 | 138.13 | 278.75 | 1694.85 | 900.00 | 10049.36 | 900.00 | 14158.38 | 900.00 | 16706.91 | 900.00 | 24062.04 |
AdaSolver-IL | 7.35 | 85.06 | 237.97 | 1289.08 | 900.00 | 8840.34 | 900.00 | 12381.27 | 900.00 | 15240.56 | 900.00 | 22754.06 |
Set Covering |
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
GNN | 3.28 | 35.51 | 40.05 | 185.77 | 577.08 | 820.29 | 900.00 | 1981.16 | 900.00 | 2582.09 | 900.00 | 3014.64 |
AdaSolver-IL | 2.62 | 35.69 | 24.54 | 182.54 | 217.18 | 428.28 | 830.51 | 1349.77 | 900.00 | 1934.11 | 900.00 | 2326.64 |
Combinatorial Auctions |
8.4 Ablation Studies
We conduct ablation studies on the four synthetic benchmarks to show the contribution of the adversarial augmentation strategy. Specifically, we compare the performance of AdaSolver with a random augmentation approach (RA) without the adversarial graph augmenter. Please see Table VI for the results in terms of the Time, Nodes, and PD integral metrics. An interesting observation is that RA has a slight edge over AdaSolver in terms of solving time on the smallest-scale distribution of the set covering, capacitated facility location, and maximum independent set benchmarks, but AdaSolver consistently performs better on the other distributions. RA may be more effective in improving the in-distribution performance of the branching policy but fails to capture and produce the graph structures that are useful for generalization to larger instances. The results demonstrate that the adversarial graph augmenter plays an important role in finding the worst-case instances that are beneficial for the generalization of the branching policy networks.
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
RA | 6.13 | 79.31 | 85.01 | 572.71 | 852.54 | 7299.03 | 900.00 | 15603.95 | 900.00 | 18496.45 | 900.00 | 24254.16 |
AdaSolver-IL | 6.28 | 80.67 | 73.78 | 515.92 | 839.80 | 7079.13 | 900.00 | 10691.05 | 900.00 | 14452.71 | 900.00 | 21792.69 |
RA | 18.75 | 149.36 | 436.97 | 2931.59 | 900.00 | 11646.88 | 900.00 | 15478.89 | 900.00 | 18691.82 | 900.00 | 25966.91 |
AdaSolver-RL | 18.87 | 152.26 | 433.97 | 2885.97 | 900.00 | 11330.87 | 900.00 | 14466.95 | 900.00 | 17550.63 | 900.00 | 24319.00 |
Set Covering |
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
RA | 1.70 | 22.93 | 22.90 | 127.69 | 464.55 | 662.24 | 829.49 | 1327.54 | 900.00 | 1889.85 | 900.00 | 2299.41 |
AdaSolver-IL | 1.63 | 22.44 | 16.34 | 117.13 | 178.32 | 314.32 | 817.57 | 1154.79 | 900.00 | 1716.55 | 900.00 | 2107.19 |
RA | 2.18 | 24.76 | 18.80 | 118.94 | 301.43 | 489.06 | 845.88 | 1751.81 | 900.00 | 2548.19 | 900.00 | 3247.74 |
AdaSolver-RL | 2.04 | 106.83 | 17.28 | 117.28 | 323.94 | 517.91 | 846.32 | 1685.69 | 900.00 | 2379.82 | 900.00 | 2980.48 |
Combinatorial Auctions |
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
RA | 33.24 | 215.65 | 140.40 | 752.18 | 564.12 | 3548.71 | 865.28 | 11576.80 | 871.82 | 18585.50 | 877.97 | 32716.55 |
AdaSolver-IL | 34.51 | 196.56 | 147.47 | 738.36 | 555.66 | 3324.63 | 731.81 | 6644.35 | 819.31 | 13818.34 | 884.48 | 28231.23
RA | 68.96 | 238.76 | 243.00 | 936.84 | 637.58 | 3953.49 | 877.48 | 8579.21 | 892.98 | 21496.21 | 893.97 | 39560.20 |
AdaSolver-RL | 64.10 | 257.10 | 233.64 | 892.77 | 587.02 | 3332.38 | 852.53 | 7779.28 | 849.93 | 14727.70 | 852.51 | 27298.25 |
Capacitated Facility Location |
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
RA | 7.26 | 45.85 | 534.59 | 996.39 | 871.08 | 2999.36 | 900.00 | 5684.19 | 900.00 | 6331.45 | 900.00 | 6833.94 |
AdaSolver-IL | 7.39 | 44.56 | 210.36 | 460.80 | 806.78 | 2355.11 | 900.00 | 3591.38 | 900.00 | 4375.48 | 900.00 | 5152.28 |
RA | 8.00 | 48.94 | 275.73 | 623.85 | 790.42 | 2153.45 | 890.97 | 3686.33 | 900.00 | 4687.60 | 900.00 | 5625.48 |
AdaSolver-RL | 8.43 | 51.46 | 282.54 | 627.81 | 803.75 | 2117.01 | 879.81 | 3318.08 | 900.02 | 4148.24 | 900.00 | 5099.92 |
Maximum Independent Set |
8.5 Extensive Studies
8.5.1 Different Training Methods of the Augmenter
We compare AdaSolver with its variant that uses the REINFORCE algorithm to train the augmenter (AdaSolver-REINFORCE) in Table VII. AdaSolver outperforms AdaSolver-REINFORCE in all distributions of the set covering and combinatorial auctions benchmarks. It is well known that PPO is much more sample-efficient than REINFORCE, and AdaSolver further improves the performance using the sample-efficient training method.
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
AdaSolver-REINFORCE | 1.79 | 24.66 | 26.65 | 151.13 | 493.92 | 737.42 | 882.36 | 1472.78 | 900.00 | 1864.20 | 900.00 | 2189.29 |
AdaSolver-IL | 1.63 | 22.44 | 16.34 | 117.13 | 178.32 | 314.32 | 817.57 | 1154.79 | 900.00 | 1716.55 | 900.00 | 2107.19 |
Combinatorial Auctions |
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
AdaSolver-REINFORCE | 6.99 | 91.23 | 104.30 | 712.53 | 872.44 | 7749.23 | 900.00 | 11105.89 | 900.00 | 14739.92 | 900.00 | 23590.80 |
AdaSolver-IL | 6.28 | 80.67 | 73.78 | 515.92 | 839.80 | 7079.13 | 900.00 | 10691.05 | 900.00 | 14452.71 | 900.00 | 21792.69 |
Set Covering |
8.5.2 Sensitivity Analysis on the Masking Proportions
We analyze the sensitivity of AdaSolver’s performance to the masking proportion on the set covering and combinatorial auctions benchmarks, and we report the results in Table VIII. The results show that the performance of AdaSolver is not sensitive to different masking proportions and outperforms the default GNN policy when the proportion is limited to 4%. Another observation is that lower masking proportions tend to yield a better-learned policy, since they promote data diversity without significantly breaking the problem structures, avoiding biased data for training.
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
AdaSolver-IL 0.01 | 1.63 | 22.44 | 16.34 | 117.13 | 178.32 | 314.32 | 817.57 | 1154.79 | 900.00 | 1716.55 | 900.00 | 2107.19 |
AdaSolver-IL 0.02 | 1.96 | 27.68 | 19.21 | 155.75 | 194.64 | 375.73 | 819.28 | 1254.26 | 900.00 | 1743.97 | 900.00 | 2051.01 |
AdaSolver-IL 0.03 | 1.92 | 26.14 | 17.95 | 132.11 | 202.41 | 360.52 | 820.05 | 1240.96 | 900.00 | 1720.49 | 900.00 | 2054.11 |
AdaSolver-IL 0.04 | 2.22 | 30.55 | 20.35 | 156.33 | 193.72 | 361.55 | 820.80 | 1209.92 | 900.00 | 1700.80 | 900.00 | 1968.02 |
AdaSolver-IL 0.05 | 2.36 | 32.73 | 19.78 | 161.43 | 196.50 | 384.36 | 823.93 | 1292.45 | 900.00 | 1769.48 | 900.00 | 2116.08 |
Combinatorial Auctions |
Method | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral | Time(s) | PD integral |
AdaSolver-IL 0.01 | 6.28 | 80.67 | 73.78 | 515.92 | 839.80 | 7079.13 | 900.00 | 10691.05 | 900.00 | 14452.71 | 900.00 | 21792.69 |
AdaSolver-IL 0.02 | 7.09 | 87.37 | 90.46 | 671.65 | 872.50 | 7392.75 | 900.00 | 11415.87 | 900.00 | 14788.13 | 900.00 | 23364.03 |
AdaSolver-IL 0.03 | 7.05 | 87.98 | 90.39 | 672.11 | 730.26 | 7396.32 | 900.00 | 11282.07 | 900.00 | 15107.76 | 900.00 | 23151.86 |
AdaSolver-IL 0.04 | 9.05 | 98.56 | 99.42 | 737.68 | 869.89 | 8088.66 | 900.00 | 12128.56 | 900.00 | 15376.10 | 900.00 | 24009.53 |
AdaSolver-IL 0.05 | 9.20 | 98.94 | 107.61 | 794.04 | 876.45 | 8432.83 | 900.00 | 12479.55 | 900.00 | 16414.10 | 900.00 | 23640.26 |
Set Covering |
8.5.3 Analysis on the GNN Embeddings in the IL setting
We use t-SNE [51] to visualize the learned embeddings of the GNN branching policy trained on the training distribution. The results on the set covering benchmark are illustrated in Figure 3, where different colors represent samples from various distributions of the set covering benchmark. In order to investigate the robustness to environmental perturbations, we select ten distributions with different instance scales and nonzero-entry proportions of the constraint matrices. The instance bipartite graphs across distributions thus have different local graph structures.
We first observe that before training, the GNN’s embeddings on different distributions are separated from each other with little overlap (Figure 3(a)). The GNN model without AdaSolver cannot capture task-relevant features across distributions, i.e., it does not extract common task-relevant features across distributions of the same problem family. The predicted scores using these embeddings therefore become unstable across distributions, and the generalization ability of the GNN model is poor. We observe in Figure 3(b) that the embeddings of the GNN no longer scatter apart but still appear loose. Finally, we can see that our proposed AdaSolver framework leads to more concentrated and stable embeddings in Figure 3(c), providing evidence that AdaSolver is less sensitive to distributional shifts and perturbations.
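The visualization procedure can be reproduced with a minimal sketch like the one below, where `embeddings` (an N x d array of GNN embeddings gathered across distributions) and `labels` (a length-N array of distribution identifiers) are assumed to be precomputed.

```python
# Project GNN embeddings to 2-D with t-SNE and color points by source distribution.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embeddings(embeddings, labels, perplexity=30, seed=0):
    coords = TSNE(n_components=2, perplexity=perplexity, random_state=seed).fit_transform(embeddings)
    for d in np.unique(labels):
        mask = labels == d
        plt.scatter(coords[mask, 0], coords[mask, 1], s=4, label=f"distribution {d}")
    plt.legend()
    plt.show()
```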



8.5.4 Analysis on the Imitation Accuracy
We report the imitation learning accuracy on different distributions of the set covering and combinatorial auctions benchmarks in Table IX. For in-distribution imitation learning accuracy, AdaSolver-IL achieves accuracy comparable to or higher than that of GNN. An interesting observation is that on some distributions of the set covering benchmark, the imitation learning accuracy of AdaSolver-IL is slightly lower than that of GNN, yet AdaSolver-IL consistently outperforms GNN on these distributions. Thus, the imitation learning accuracy is not a decisive factor for solving performance, and higher imitation learning accuracy does not necessarily imply better performance.
Set Covering | ||||||
GNN | 70.02 | 66.36 | 54.07 | 48.03 | 48.05 | 16.08 |
AdaSolver-IL | 70.03 | 66.39 | 54.02 | 52.01 | 44.08 | 16.00 |
Combinatorial Auctions | ||||||
GNN | 67.63 | 72.42 | 67.86 | 72.07 | 76.02 | 47.03 |
AdaSolver-IL | 68.55 | 72.57 | 66.92 | 62.00 | 74.02 | 44.06 |
9 Conclusion
This paper focuses on the generalization and robustness issues of learning-based MILP solvers under restrictive training distributions. We observe from empirical results that the learned embeddings of the GNN are unstable across distributions on the learn-to-branch task, and we attribute this to the risk of overfitting to homogeneous graph structures. To tackle this challenge, we propose a distributionally robust optimization framework (AdaSolver) for neural solvers that promotes the diversity of instance graph structures in training distributions via adversarial instance augmentation. Specifically, we formulate the adversarial instance graph augmentation as a contextual bandit problem and employ a sample-efficient decision model to enable efficient training of the augmenter under insufficient training data. Experiments show that AdaSolver significantly improves the generalization of neural solvers on five well-recognized benchmarks.
Acknowledgments
The authors would like to thank all the anonymous reviewers for their insightful comments. This work was supported by the National Key R&D Program of China under contract 2022ZD0119801, National Natural Science Foundation of China grants U19B2026, U19B2044, 61836011, 62021001, and 61836006, and the Fundamental Research Funds for the Central Universities grant WK3490000004.
References
- [1] Z.-L. Chen, “Integrated production and outbound distribution scheduling: review and extensions,” Operations research, vol. 58, no. 1, pp. 130–148, 2010.
- [2] K. Ma, L. Xiao, J. Zhang, and T. Li, “Accelerating an fpga-based sat solver by software and hardware co-design,” Chinese Journal of Electronics, vol. 28, no. 5, pp. 953–961, 2019.
- [3] S. Liu, J. M. Pinto, and L. G. Papageorgiou, “A tsp-based milp model for medium-term planning of single-stage continuous multiproduct plants,” Industrial & Engineering Chemistry Research, vol. 47, no. 20, pp. 7733–7743, 2008.
- [4] K. Bestuzheva, M. Besançon, W.-K. Chen, A. Chmiela, T. Donkiewicz, J. van Doornmalen, L. Eifler, O. Gaul, G. Gamrath, A. Gleixner et al., “The scip optimization suite 8.0,” arXiv preprint arXiv:2112.08872, 2021.
- [5] L. Gurobi Optimization, “Gurobi optimizer reference manual,” 2021.
- [6] A. Colorni, M. Dorigo, F. Maffioli, V. Maniezzo, G. Righini, and M. Trubian, “Heuristics from nature for hard combinatorial optimization problems,” International Transactions in Operational Research, vol. 3, no. 1, pp. 1–21, 1996.
- [7] J. Hromkovič, Algorithmics for hard problems: introduction to combinatorial optimization, randomization, approximation, and heuristics. Springer Science & Business Media, 2013.
- [8] H. He, H. Daume III, and J. M. Eisner, “Learning to search in branch and bound algorithms,” Advances in Neural Information Processing Systems, vol. 27, 2014.
- [9] E. Khalil, P. Le Bodic, L. Song, G. Nemhauser, and B. Dilkina, “Learning to branch in mixed integer programming,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, no. 1, 2016.
- [10] M. Gasse, D. Chételat, N. Ferroni, L. Charlin, and A. Lodi, “Exact combinatorial optimization with graph convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 32, 2019.
- [11] L. Scavuzzo, F. Chen, D. Chetelat, M. Gasse, A. Lodi, N. Yorke-Smith, and K. Aardal, “Learning to branch with tree mdps,” in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 18 514–18 526. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2022/file/756d74cd58592849c904421e3b2ec7a4-Paper-Conference.pdf
- [12] Z. Wang, X. Li, J. Wang, Y. Kuang, M. Yuan, J. Zeng, Y. Zhang, and F. Wu, “Learning cut selection for mixed-integer linear programming via hierarchical sequence model,” in The Eleventh International Conference on Learning Representations, 2023.
- [13] Y. Bengio, A. Lodi, and A. Prouvost, “Machine learning for combinatorial optimization: a methodological tour d’horizon,” European Journal of Operational Research, vol. 290, no. 2, pp. 405–421, 2021.
- [14] J. Sakuma and S. Kobayashi, “A genetic algorithm for privacy preserving combinatorial optimization,” in Annual Conference on Genetic and Evolutionary Computation, 2007.
- [15] C. Wang, Z. Yu, S. McAleer, T. Yu, and Y. Yang, “Asp: Learn a universal neural solver!” arXiv preprint arXiv:2303.00466, 2023.
- [16] S. Geisler, J. Sommer, J. Schuchardt, A. Bojchevski, and S. Günnemann, “Generalization of neural combinatorial solvers through the lens of adversarial robustness,” in International Conference on Learning Representations, 2022. [Online]. Available: https://openreview.net/forum?id=vJZ7dPIjip3
- [17] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” CoRR, vol. abs/1707.06347, 2017. [Online]. Available: http://arxiv.org/abs/1707.06347
- [18] A. Lodi and G. Zarpellon, “On learning and branching: a survey,” Top, vol. 25, pp. 207–236, 2017.
- [19] O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” Advances in Neural Information Processing Systems, vol. 28, 2015.
- [20] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, “Neural combinatorial optimization with reinforcement learning,” arXiv preprint arXiv:1611.09940, 2016.
- [21] W. Kool, H. van Hoof, and M. Welling, “Attention, learn to solve routing problems!” in International Conference on Learning Representations, 2019. [Online]. Available: https://openreview.net/forum?id=ByxBFsRqYm
- [22] D. Selsam, M. Lamm, B. Bünz, P. Liang, L. de Moura, and D. L. Dill, “Learning a SAT solver from single-bit supervision,” in International Conference on Learning Representations, 2019. [Online]. Available: https://openreview.net/forum?id=HJMC_iA5tm
- [23] E. Yolcu and B. Póczos, “Learning local search heuristics for boolean satisfiability,” Advances in Neural Information Processing Systems, vol. 32, 2019.
- [24] M. Nazari, A. Oroojlooy, L. Snyder, and M. Takác, “Reinforcement learning for solving the vehicle routing problem,” Advances in Neural Information Processing Systems, vol. 31, 2018.
- [25] T. Barrett, W. Clements, J. Foerster, and A. Lvovsky, “Exploratory combinatorial optimization with reinforcement learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. 3243–3250.
- [26] P. Gupta, M. Gasse, E. Khalil, P. Mudigonda, A. Lodi, and Y. Bengio, “Hybrid models for learning to branch,” Advances in Neural Information Processing Systems, vol. 33, pp. 18 087–18 097, 2020.
- [27] P. Gupta, E. B. Khalil, D. Chetélat, M. Gasse, Y. Bengio, A. Lodi, and M. P. Kumar, “Lookback for learning to branch,” Transactions of Machine Learning Research (TMLR), 2022. [Online]. Available: https://openreview.net/forum?id=EQpGkw5rvL
- [28] Y. Tang, S. Agrawal, and Y. Faenza, “Reinforcement learning for integer programming: Learning to cut,” in International Conference on Machine Learning. PMLR, 2020, pp. 9367–9376.
- [29] T. Achterberg, “Constraint integer programming,” Ph.D. dissertation, 2007.
- [30] Z. Zhang, Z. Zhang, X. Wang, and W. Zhu, “Learning to solve travelling salesman problem with hardness-adaptive curriculum,” in AAAI Conference on Artificial Intelligence, vol. 36, 2022, pp. 9136–9144.
- [31] Y. Jiang, Y. Wu, Z. Cao, and J. Zhang, “Learning to solve routing problems via distributionally robust optimization,” in AAAI Conference on Artificial Intelligence, vol. 36, 2022, pp. 9786–9794.
- [32] J. Bi, Y. Ma, J. Wang, Z. Cao, J. Chen, Y. Sun, and Y. M. Chee, “Learning generalizable models for vehicle routing problems via knowledge distillation,” in Advances in Neural Information Processing Systems, A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, Eds., 2022. [Online]. Available: https://openreview.net/forum?id=sOVNpUEgKMp
- [33] L. Xin, W. Song, Z. Cao, and J. Zhang, “Generative adversarial training for neural combinatorial optimization models,” 2022. [Online]. Available: https://openreview.net/forum?id=9vsRT9mc7U
- [34] H. Duan, P. Vaezipoor, M. B. Paulus, Y. Ruan, and C. Maddison, “Augment with care: Contrastive learning for combinatorial problems,” in Proceedings of the 39th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, Eds., vol. 162. PMLR, 17–23 Jul 2022, pp. 5627–5642. [Online]. Available: https://proceedings.mlr.press/v162/duan22b.html
- [35] H. Lu, Z. Li, R. Wang, Q. Ren, X. Li, M. Yuan, J. Zeng, X. Yang, and J. Yan, “Roco: A general framework for evaluating robustness of combinatorial optimization solvers on graphs,” in The Eleventh International Conference on Learning Representations.
- [36] G. Zarpellon, J. Jo, A. Lodi, and Y. Bengio, “Parameterizing branch-and-bound search trees to learn branching policies,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 5, pp. 3931–3939, May 2021. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/16512
- [37] M. Etheve, Z. Alès, C. Bissuel, O. Juan, and S. Kedad-Sidhoum, “Reinforcement learning for variable selection in a branch and bound algorithm,” in Integration of Constraint Programming, Artificial Intelligence, and Operations Research, E. Hebrard and N. Musliu, Eds. Cham: Springer International Publishing, 2020, pp. 176–185.
- [38] H. Sun, W. Chen, H. Li, and L. Song, “Improving learning to branch via reinforcement learning,” in Learning Meets Combinatorial Algorithms at NeurIPS2020, 2020. [Online]. Available: https://openreview.net/forum?id=z4D7-PTxTb
- [39] C. W. F. Parsonson, A. Laterre, and T. D. Barrett, “Reinforcement learning for branch-and-bound optimisation using retrospective trajectories,” in AAAI Conference on Artificial Intelligence, vol. 37 (4), 2023, pp. 4061–4069. [Online]. Available: https://doi.org/10.1609/aaai.v37i4.25521
- [40] Q. Qu, X. Li, Y. Zhou, J. Zeng, M. Yuan, J. Wang, J. Lv, K. Liu, and K. Mao, “An improved reinforcement learning algorithm for learning to branch,” arXiv preprint arXiv:2201.06213, 2022.
- [41] D. Applegate, R. Bixby, V. Chvátal, and W. Cook, “Finding cuts in the tsp (a preliminary report),” Citeseer, Tech. Rep., 1995.
- [42] A. Prouvost, J. Dumouchelle, L. Scavuzzo, M. Gasse, D. Chételat, and A. Lodi, “Ecole: A gym-like library for machine learning in combinatorial optimization solvers,” in Learning Meets Combinatorial Algorithms at NeurIPS2020, 2020. [Online]. Available: https://openreview.net/forum?id=IVc9hqgibyB
- [43] R. Volpi, H. Namkoong, O. Sener, J. C. Duchi, V. Murino, and S. Savarese, “Generalizing to unseen domains via adversarial data augmentation,” in Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., 2018, pp. 5339–5349. [Online]. Available: https://proceedings.neurips.cc/paper/2018/hash/1d94108e907bb8311d8802b48fd54b4a-Abstract.html
- [44] Z. Chen, J. Liu, X. Wang, J. Lu, and W. Yin, “On representing mixed-integer linear programs by graph neural networks,” 2022.
- [45] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, 2nd ed., ser. Adaptive Computation and Machine Learning. Cambridge, MA: MIT Press, 2018.
- [46] S. Chen, E. Dobriban, and J. Lee, “A group-theoretic framework for data augmentation,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020, pp. 21 321–21 333. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2020/file/f4573fc71c731d5c362f0d7860945b88-Paper.pdf
- [47] H. Wang, S. Zheng, C. Xiong, and R. Socher, “On the generalization gap in reparameterizable reinforcement learning,” in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. PMLR, 09–15 Jun 2019, pp. 6648–6658. [Online]. Available: https://proceedings.mlr.press/v97/wang19o.html
- [48] C. Hansknecht, I. Joormann, and S. Stiller, “Cuts, primal heuristics, and learning to branch for the time-dependent traveling salesman problem,” 2018.
- [49] T. Joachims, “Optimizing search engines using clickthrough data,” Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002. [Online]. Available: https://api.semanticscholar.org/CorpusID:207605508
- [50] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1412.6980
- [51] L. Van der Maaten and G. Hinton, “Visualizing data using t-sne.” Journal of Machine Learning Research, vol. 9, no. 11, 2008.
- [52] C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, pp. 1–48, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:195811894
- [53] Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, “Random erasing data augmentation,” ArXiv, vol. abs/1708.04896, 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:2035600
- [54] L. Perez and J. Wang, “The effectiveness of data augmentation in image classification using deep learning,” ArXiv, vol. abs/1712.04621, 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:12219403
- [55] D. Yarats, I. Kostrikov, and R. Fergus, “Image augmentation is all you need: Regularizing deep reinforcement learning from pixels,” in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=GY6-6sTvGaf
- [56] M. Laskin, K. Lee, A. Stooke, L. Pinto, P. Abbeel, and A. Srinivas, “Reinforcement learning with augmented data,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020, pp. 19 884–19 895. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2020/file/e615c82aba461681ade82da2da38004a-Paper.pdf
- [57] M. Laskin, A. Srinivas, and P. Abbeel, “CURL: Contrastive unsupervised representations for reinforcement learning,” in Proceedings of the 37th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, H. D. III and A. Singh, Eds., vol. 119. PMLR, 13–18 Jul 2020, pp. 5639–5650. [Online]. Available: https://proceedings.mlr.press/v119/laskin20a.html
- [58] A. Sadeghi, M. Ma, B. Li, and G. B. Giannakis, “Distributionally robust semi-supervised learning over graphs,” arXiv preprint arXiv:2110.10582, 2021.
- [59] H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song, “Adversarial attack on graph structured data,” in International Conference on Machine Learning. PMLR, 2018, pp. 1115–1124.
- [60] F. Feng, X. He, J. Tang, and T.-S. Chua, “Graph adversarial training: Dynamically regularizing based on graph structure,” IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 6, pp. 2493–2504, 2019.
- [61] H. Xue, K. Zhou, T. Chen, K. Guo, X. Hu, Y. Chang, and X. Wang, “Cap: Co-adversarial perturbation on weights and features for improving generalization of graph neural networks,” arXiv preprint arXiv:2110.14855, 2021.
- [62] T. Dao, A. Gu, A. Ratner, V. Smith, C. De Sa, and C. Re, “A kernel theory of modern data augmentation,” in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. PMLR, 09–15 Jun 2019, pp. 1528–1537. [Online]. Available: https://proceedings.mlr.press/v97/dao19b.html
- [63] S. Wu, H. Zhang, G. Valiant, and C. Re, “On the generalization effects of linear transformations in data augmentation,” in Proceedings of the 37th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, H. D. III and A. Singh, Eds., vol. 119. PMLR, 13–18 Jul 2020, pp. 10 410–10 420. [Online]. Available: https://proceedings.mlr.press/v119/wu20g.html
- [64] K. Leyton-Brown, M. Pearson, and Y. Shoham, “Towards a universal test suite for combinatorial auction algorithms,” in Proceedings of the 2nd ACM Conference on Electronic Commerce, 2000, pp. 66–76.
- [65] E. Balas and A. Ho, Set covering algorithms using cutting planes, heuristics, and subgradient optimization: a computational study. Springer, 1980.
- [66] G. Cornuéjols, R. Sridharan, and J.-M. Thizy, “A comparison of heuristics and relaxations for the capacitated plant location problem,” European Journal of Operational Research, vol. 50, pp. 280–297, 1991.
- [67] D. Bergman, A. A. Ciré, W. J. van Hoeve, and J. Hooker, “Decision diagrams for optimization,” Constraints, vol. 20, pp. 494 – 495, 2015.
- [68] M. Gasse, S. Bowly, Q. Cappart, J. Charfreitag, L. Charlin, D. Chételat, A. Chmiela, J. Dumouchelle, A. Gleixner, A. M. Kazachkov, E. Khalil, P. Lichocki, A. Lodi, M. Lubin, C. J. Maddison, M. Christopher, D. J. Papageorgiou, A. Parjadis, S. Pokutta, A. Prouvost, L. Scavuzzo, G. Zarpellon, L. Yang, S. Lai, A. Wang, X. Luo, X. Zhou, H. Huang, S. Shao, Y. Zhu, D. Zhang, T. Quan, Z. Cao, Y. Xu, Z. Huang, S. Zhou, C. Binbin, H. Minggui, H. Hao, Z. Zhiyu, A. Zhiwu, and M. Kun, “The machine learning for combinatorial optimization competition (ml4co): Results and insights,” in Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, ser. Proceedings of Machine Learning Research, D. Kiela, M. Ciccone, and B. Caputo, Eds., vol. 176. PMLR, 06–14 Dec 2022, pp. 220–231. [Online]. Available: https://proceedings.mlr.press/v176/gasse22a.html
Haoyang Liu received the B.Sc. degree from the School of Mathematics, University of Science and Technology of China, Hefei, China, in 2023. He is currently a graduate student in the Department of Electronic Engineering and Information Science at the University of Science and Technology of China, Hefei, China. His research interests include reinforcement learning and learning to optimize.
Yufei Kuang received the B.Sc. degree in Statistics from Nanjing University, Nanjing, China, in 2020. He is currently a Ph.D. candidate in the School of Information Science and Technology at University of Science and Technology of China, Hefei, China. His research interests include reinforcement learning and learning to optimize.
Jie Wang received the B.Sc. degree in electronic information science and technology from University of Science and Technology of China, Hefei, China, in 2005, and the Ph.D. degree in computational science from the Florida State University, Tallahassee, FL, in 2011. He is currently a professor in the Department of Electronic Engineering and Information Science at University of Science and Technology of China, Hefei, China. His research interests include AI for science, knowledge graph, large-scale optimization, deep learning, etc. He is a senior member of IEEE.
Xijun Li is a senior researcher at HUAWEI Noah’s Ark Lab. Before that, he received the M.Sc. degree from Shanghai Jiao Tong University, P.R. China, in 2018. He is working towards his Ph.D. degree at the University of Science and Technology of China (HUAWEI-USTC Joint Ph.D. Program) under the supervision of Prof. Jie Wang. He has published 10+ papers at top peer-reviewed conferences and journals (such as ICLR, KDD, ICDE, SIGMOD, DAC, CIKM, ICDCS, TCYB, etc.) and has applied for or published 12 patents (first author of 6 patents) with Noah’s Ark Lab. He won the championship of the student leaderboard in the Dual Track of the NeurIPS’21 ML4CO Competition. His recent research interests focus on Mathematical Programming Solvers, Learning to Optimize (L2O) and Machine Learning for Computer Systems (ML4CS).
Yongdong Zhang received the Ph.D. degree in electronic engineering from Tianjin University, Tianjin, China, in 2002. He is currently a Professor at the University of Science and Technology of China. He has authored more than 100 refereed journal articles and conference papers. His current research interests include multimedia content analysis and understanding, multimedia content security, video encoding, and streaming media technology. He serves as an Editorial Board Member of Multimedia Systems journal and Neurocomputing. He was the recipient of the Best Paper Award in PCM 2013, ICIMCS 2013, and ICME 2010, and the Best Paper Candidate in ICME 2011. He is a Senior Member of IEEE.
Feng Wu received the B.S. degree in electrical engineering from Xidian University in 1992, and the M.S. and Ph.D. degrees in computer science from the Harbin Institute of Technology in 1996 and 1999, respectively. He is currently a Professor with the University of Science and Technology of China, where he is also the Dean of the School of Information Science and Technology. Before that, he was a Principal Researcher and the Research Manager with Microsoft Research Asia. His research interests include image and video compression, media communication, and media analysis and synthesis. He has authored or coauthored over 200 high-quality articles (including several dozens of IEEE Transactions papers) and top conference papers at MOBICOM, SIGIR, CVPR, and ACM MM. He has 77 granted U.S. patents. His 15 techniques have been adopted into international video coding standards. As a coauthor, he received the Best Paper Award of the 2009 IEEE Transactions on Circuits and Systems for Video Technology, PCM 2008, and SPIE VCIP 2007. He also received the Best Associate Editor Award from the IEEE Circuits and Systems Society in 2012. He served as the TPC Chair for MMSP 2011, VCIP 2010, and PCM 2009, and the Special Sessions Chair for ICME 2010 and ISCAS 2013. He serves as an Associate Editor for IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Multimedia, and several other international journals.
Appendix A More Background Information
A.1 Branching Rules
An efficient branching policy significantly influences the size of the B&B tree and thus the overall solving efficiency. Commonly used traditional branching rules include strong branching and pseudocost branching.
The strong branching rule is known to produce the smallest B&B trees among these branching rules. Let $z^{*}$ denote the optimal objective of the LP relaxation at the parent node. We call a variable $x_i$ a branching candidate if it takes a fractional value $x_i^{*}$ in the optimal solution of the parent LP relaxation. For a branching candidate $x_i$, we let $z_i^{-}$ and $z_i^{+}$ be the relaxed optimal objectives of the two children obtained by adding the bound $x_i \le \lfloor x_i^{*} \rfloor$ or $x_i \ge \lceil x_i^{*} \rceil$, respectively. Strong branching then chooses the branching variable from the branching candidate set $\mathcal{C}$ by the following rule,
$$ i^{*} \in \operatorname*{arg\,max}_{i \in \mathcal{C}} \; \max\big(z_i^{-} - z^{*}, \epsilon\big) \cdot \max\big(z_i^{+} - z^{*}, \epsilon\big), $$
where $\epsilon$ is a small positive constant. Since strong branching needs to solve the LP relaxations of the two sub-problems for each candidate variable, it requires considerable computation.
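The following is a minimal, self-contained sketch of this scoring loop; `solve_lp_with_bound` is a hypothetical helper (not part of our implementation) that returns the child LP objective after tightening one variable bound, and `eps` plays the role of the small constant $\epsilon$ above.

```python
import math

def strong_branching_choice(parent_obj, lp_solution, candidates, solve_lp_with_bound, eps=1e-6):
    """Select a branching variable by the strong branching product score.

    parent_obj:          LP-relaxation objective of the parent node (minimization).
    lp_solution:         dict mapping each variable to its fractional LP value.
    candidates:          variables taking fractional values in the parent LP solution.
    solve_lp_with_bound: hypothetical helper returning the child LP objective after
                         adding x <= floor(x*) ("down") or x >= ceil(x*) ("up").
    """
    best_var, best_score = None, -math.inf
    for var in candidates:
        value = lp_solution[var]
        obj_down = solve_lp_with_bound(var, math.floor(value), "down")
        obj_up = solve_lp_with_bound(var, math.ceil(value), "up")
        # Product of the two children's objective improvements over the parent.
        score = max(obj_down - parent_obj, eps) * max(obj_up - parent_obj, eps)
        if score > best_score:
            best_var, best_score = var, score
    return best_var
```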
The pseudocost branching rule is simpler and has a much lower computational cost than strong branching. It keeps track of the objective gains obtained by previous branchings and scores each candidate branching variable using this history information. Although much faster than strong branching, the pseudocost branching rule requires considerable human experience for manual tuning.
To overcome the disadvantages of strong branching and pseudocost branching, hybrid branching rules have been proposed that combine the two methods. Reliability pseudocost branching is one such hybrid rule with state-of-the-art performance, and it is therefore used as the default branching rule in SCIP.
A.2 GNN Encoder for Branching Samples
In recent studies [10], researchers use a single graph neural network composed of two interleaved half-convolutions to process the instance bipartite graphs or branching samples. The GNN first performs message passing from the variable nodes to the constraint nodes, and then from the constraint nodes back to the variable nodes to obtain the variable embeddings, i.e.,
$$ \mathbf{h}_{c_j} = \mathrm{MLP}_{\mathcal{C}}\Big(\mathbf{c}_j \oplus \sum_{(i,j)\in\mathcal{E}} \mathrm{MLP}_{\mathcal{E}}\big(\mathbf{v}_i \oplus \mathbf{e}_{ij}\big)\Big), \qquad \mathbf{h}_{v_i} = \mathrm{MLP}_{\mathcal{V}}\Big(\mathbf{v}_i \oplus \sum_{(i,j)\in\mathcal{E}} \mathrm{MLP}_{\mathcal{E}}'\big(\mathbf{h}_{c_j} \oplus \mathbf{e}_{ij}\big)\Big), $$
where MLP denotes a multi-layer perceptron, $\oplus$ is the concatenation operation of vectors, $\mathbf{v}_i$, $\mathbf{c}_j$ and $\mathbf{e}_{ij}$ denote the variable, constraint and edge features, and $\mathcal{E}$ is the edge set of the bipartite graph. The variable embeddings $\mathbf{h}_{v_i}$ can be used for downstream tasks such as optimal solution prediction and branching variable selection.
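A minimal PyTorch sketch of one such half-convolution is given below; the hidden size, sum aggregation and feature dimensions (nine variable, one constraint and one edge feature, as in Table XI) are illustrative assumptions rather than the exact implementation of [10].

```python
import torch
import torch.nn as nn

class HalfConvolution(nn.Module):
    """One message-passing direction on the variable-constraint bipartite graph."""

    def __init__(self, src_dim, dst_dim, edge_dim, hidden=64):
        super().__init__()
        self.msg_mlp = nn.Sequential(nn.Linear(src_dim + edge_dim + dst_dim, hidden), nn.ReLU())
        self.out_mlp = nn.Sequential(nn.Linear(dst_dim + hidden, hidden), nn.ReLU())

    def forward(self, src_feat, dst_feat, edge_index, edge_feat):
        # edge_index has shape (2, num_edges) with rows [source node, destination node].
        src_idx, dst_idx = edge_index
        messages = self.msg_mlp(torch.cat([src_feat[src_idx], edge_feat, dst_feat[dst_idx]], dim=-1))
        # Sum-aggregate the incoming messages on the destination side.
        aggregated = torch.zeros(dst_feat.size(0), messages.size(-1), device=dst_feat.device)
        aggregated.index_add_(0, dst_idx, messages)
        return self.out_mlp(torch.cat([dst_feat, aggregated], dim=-1))

# Variable -> constraint pass, then constraint -> variable pass.
var_to_cons = HalfConvolution(src_dim=9, dst_dim=1, edge_dim=1)
cons_to_var = HalfConvolution(src_dim=64, dst_dim=9, edge_dim=1)
```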

A.3 Primal-Dual Integral Metric
When employing the B&B algorithm, we keep track of the global primal and dual bounds. The global primal bound at time $t$, denoted $p(t)$, is the best objective value found by time $t$. The global dual bound $d(t)$ is the minimum dual bound over all leaves of the search tree at time $t$. By definition, the primal-dual gap is $g(t) = p(t) - d(t)$, and this gap decreases as the B&B algorithm proceeds. B&B finds the optimal solution if and only if $g(t) = 0$.
With time limit $T$, the primal-dual integral (PD integral) is defined as
$$ \int_{0}^{T} g(t)\, \mathrm{d}t = \int_{0}^{T} \big(p(t) - d(t)\big)\, \mathrm{d}t. $$
When solving the same instance, a solver that achieves a smaller PD integral typically finds better feasible solutions within the same solving time. The PD integral is a well-recognized metric for solver performance. In addition, the PD gap metric equals 0 if the instance is solved to optimality within the time limit.
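As an illustration, if the primal and dual bounds are logged as step functions at a few checkpoints during the solve, the PD integral can be approximated by a simple rectangle rule; the following sketch assumes such a log and is not part of the solver itself.

```python
def primal_dual_integral(times, primal_bounds, dual_bounds, time_limit):
    """Approximate the PD integral from logged (time, primal, dual) checkpoints.

    times:         increasing time stamps of the checkpoints (seconds), starting at 0.
    primal_bounds: best objective value found at each checkpoint (minimization).
    dual_bounds:   global dual bound at each checkpoint.
    time_limit:    total time limit T of the solve.
    """
    integral = 0.0
    for k in range(len(times)):
        # Each checkpoint's gap is held constant until the next checkpoint (step function).
        next_time = times[k + 1] if k + 1 < len(times) else time_limit
        gap = primal_bounds[k] - dual_bounds[k]
        integral += gap * (next_time - times[k])
    return integral

# Example: a solve whose gap closes at t = 60s within a 900s time limit.
print(primal_dual_integral([0, 30, 60], [100.0, 80.0, 75.0], [50.0, 70.0, 75.0], 900))  # 1800.0
```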
Appendix B Theoretical Analysis
B.1 More Related Work on Theoretical Results for Data Augmentation
Data augmentation is an effective method for introducing prior information and promoting data diversity, which is widely used in supervised learning [52, 53, 54] and reinforcement learning tasks [55, 56, 57]. Recently, augmentation on graph data has attracted more and more attention [58, 59, 60, 61]. For theoretical results, previous work [46] proposes a generalization bound for data augmentation in a group-theoretical way; [62] analyses from the kernel perspective of data augmentation; [63] studies the effects of data augmentation on the ridge estimator in an over-parametrized linear regression setting.
B.2 Proof of Theorem 1
The generalization bounds for Equations 8 and 9 can be derived via the classical approach; we briefly sketch the proof as follows. We start by splitting the term to be bounded into two parts,
The second part can be bounded by the Rademacher complexity using McDiarmid's inequality,
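For reference, the standard form of this bound (stated here in generic notation following [45], not in the notation of Theorem 1) is that, with probability at least $1-\delta$ over an i.i.d. sample $S$ of size $m$ and for a loss $\ell$ taking values in $[0,1]$,
$$ \sup_{h \in \mathcal{H}} \Big( \mathbb{E}\big[\ell(h)\big] - \widehat{\mathbb{E}}_{S}\big[\ell(h)\big] \Big) \;\le\; 2\,\mathfrak{R}_{m}(\ell \circ \mathcal{H}) + \sqrt{\frac{\log(1/\delta)}{2m}}, $$
where $\mathfrak{R}_{m}(\ell \circ \mathcal{H})$ denotes the Rademacher complexity of the loss class.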
To bound the first part, we see that
We complete the proof of Equation 8 by combining the bounds for these two parts. The proof of the generalization bound under instance augmentation (Equation 9) is the same.
We next compare the two generalization bounds.
If and the following inequality holds,
We thus obtain Equation 10, and the generalization bound under instance augmentation can be lower.
B.3 Proof of Theorem 2
We start by decomposing the generalization error as in the IL setting,
(12)
The inequality in (12) holds since, by definition, the training loss reaches its minimum at the learned policy. We then try to control the other three terms in (12).
For reparameterizable RL,
For the second term in Equation (12), we view the total return as a random variable, and the total returns sampled from instances drawn from the distribution are i.i.d. We then use the standard symmetrization argument to obtain the following bound, which holds with confidence at least $1-\delta$,
(13)
Finally, we bound the first and the last terms. Different from the IL setting, the transition function plays an important role in sampling the branching samples for training. Thus, the branching samples cannot be viewed as i.i.d. samples.
Appendix C Details of the Benchmarks
C.1 Synthetic Benchmarks for Evaluations
We evaluate the performance of our proposed methods on four classical benchmarks of NP-hard problem families, namely combinatorial auctions, minimum set covering, capacitated facility location and maximum independent set. We use these four synthetic benchmarks mainly for the following reasons. First, they are widely used testing beds for MILP solvers [10, 26, 27]. Second, we can generate a series of transfer distributions with much larger scales to evaluate the solvers' generalization ability. Third, these benchmarks represent a wide collection of MILP problems in practice.
We use the code at https://github.com/ds4dm/learn2branch for data generation. (1) Combinatorial auctions comprises instances generated following the arbitrary relationships procedure of [64]. (2) Set covering comprises instances generated following [65] with the default density 0.05. (3) Capacitated facility location comprises instances generated following [66]. (4) Maximum independent set comprises instances on Erdős-Rényi random graphs, generated following the procedure of [67]. For all benchmarks, we construct six distributions ordered by increasing instance scale or difficulty level, where the easiest has the smallest scales and the most difficult has the largest scales, five to ten times larger than the easiest. The baselines can solve all the instances of the easier distributions within 900s but reach the time limit on the hardest ones. Table X summarizes the hyperparameters of the generation algorithms.
We generate instances for training and instances for validation, and we evaluate on instances from each of the six distributions. For the GNN and AdaSolver-IL methods, we solve the training instances with SCIP 7.0.3 and record the branching samples and branching decisions of a strong branching expert. We collect samples for training and for validation in total. For RL-based solvers, we limit the processing time of each episode to no more than 1000s.
Distributions | ||||||
Row | 500 | 1000 | 2000 | 3000 | 4000 | 8000 |
Column | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 |
Set Covering |
Distributions | ||||||
Items | 100 | 200 | 300 | 400 | 500 | 600 |
Bids | 500 | 1000 | 1500 | 2000 | 2500 | 3000 |
Combinatorial Auctions |
Distributions | ||||||
Facilities | 100 | 200 | 400 | 600 | 800 | 1600 |
Customers | 100 | 100 | 100 | 100 | 100 | 100 |
Capacitated Facility Location |
Distributions | ||||||
Nodes | 500 | 1000 | 1500 | 2000 | 2500 | 3000 |
Affinity | 4 | 4 | 4 | 4 | 4 | 4 |
Maximum Independent Set |
C.2 Real-world Benchmarks for Evaluations
The real-world dataset we use in this paper is the Anonymous dataset from the NeurIPS 2021 ML4CO competition [68]. It consists of 98 training and 20 validation instances. As in the synthetic benchmarks, we collect training samples and validation samples for GNN and AdaSolver-IL. For RL-based solvers, we limit the processing time of each episode to no more than 1000s.
C.3 Datasets for the Visualization
We generate ten distributions of the set covering benchmark for visualization using combinations of different numbers of rows and different densities, with 1000 columns. The density hyperparameter controls the proportion of nonzero entries in the constraint matrices. We vary the density of the instances to obtain graphs with different local structures, simulating the environmental perturbations in real-world applications.
Appendix D Implementation Details
D.1 Hardware Specification and Experiment Settings
We conducted all the experiments on a single machine with NVIDIA GeForce RTX 3090 GPUs and Intel(R) Xeon(R) E5-2667 v4 CPUs @ 3.20GHz. During the evaluation, we run three random seeds for each instance.
D.2 Details of the Baselines
In this part, we introduce the details of the baselines used in this paper.
(1) GNN. We use the same implementation as the GCNN in [10]. First, we apply two interleaved half-convolutions to the input bipartite graphs; the graph convolution has two successive passes with 64 units, one from variables to constraints and one from constraints to variables. Then we apply a final 2-layer perceptron to the embeddings of the variable nodes, combined with a masked softmax activation, to produce a probability distribution over the branching candidates. We also use the prenorm layers proposed in [10] and conduct a pre-training stage for them.
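For illustration, the masked softmax over branching candidates can be sketched as follows (the tensor names are placeholders, not the exact implementation):

```python
import torch

def masked_softmax(variable_logits, candidate_mask):
    """Produce a probability distribution only over the branching candidate variables.

    variable_logits: (num_variables,) scores predicted by the final 2-layer perceptron.
    candidate_mask:  boolean tensor, True exactly for the branching candidates.
    """
    # Non-candidate variables receive -inf so that their probability is exactly zero.
    masked_logits = variable_logits.masked_fill(~candidate_mask, float("-inf"))
    return torch.softmax(masked_logits, dim=-1)

logits = torch.tensor([0.3, 1.2, -0.5, 0.8])
mask = torch.tensor([False, True, False, True])
print(masked_softmax(logits, mask))  # probability mass only on variables 1 and 3
```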
(2) SVMRank and LambdaMART. We implement the SVMRank and LambdaMART branching policies in SCIP following [10]. The two models take as input the 91-dimensional branching features described in [9] and are trained on pairs of variable state-rank samples for ranking. SVMRank is trained by minimizing a pairwise loss over variables, and LambdaMART is trained to optimize the normalized discounted cumulative gain. As mentioned in [10], we reduce the dataset to 250000 candidate variables for training and 100000 for validation because of the prohibitive memory and time cost.
(3) Trees. The Trees model is implemented following [10], where we obtain the input variable features by concatenating the bipartite graph variable features with the minimum, mean and maximum of the edge and constraint features over each variable's neighborhood. The model predicts a branching score for each candidate variable and is trained by minimizing the mean-squared error between the predicted scores and the ground-truth strong branching scores. We also reduce the size of the dataset for Trees.
(4) tMDP+DFS. tMDP+DFS is a reinforcement-learning-based method under the framework of tree Markov Decision Processes (tMDPs), a variant of temporal Markov Decision Processes. We implement the tMDP+DFS model with a GNN encoder composed of two interleaved half-convolutions as described in [11]; the GNN architecture is the same as in [10]. tMDP+DFS is trained by REINFORCE with tree policy gradients and depth-first-search node selection.
(5) TreeGate. TreeGate is an imitation learning framework with input features of 25-dimensional branching candidate states and 60-dimensional search tree states [36]. TreeGate introduces this tree-state parameterization by first embedding the tree-state inputs and then feeding them into the proposed gating layers to extract a tree-based signal. Combining the variable representations with the tree-state representation, TreeGate predicts a branching score for each branching candidate variable.
D.3 Representations of Instance Bipartite Graphs and Branching Samples
To avoid confusion, recall that the instance bipartite graphs are the input of the augmenter and do not contain solving statistics (since we augment an instance before solving it with the solver), while the branching samples are the input of the branching policy and do contain solving statistics. We use the MILP bipartite representation for instance graphs implemented in the Ecole 0.8 package [42], with nine variable features, one constraint feature and one edge feature, listed in Table XI. The branching samples use the graph representations described in [10], listed in Table XII.
Index | Variable Feature Name | Description |
0 | Objective | Objective coefficient |
1-4 | Variable type | Variable type (binary, integer, implicit-integer, continuous) |
5 | Specified bounds | Whether the variable has a lower bound |
6 | Specified bounds | Whether the variable has an upper bound |
7 | Lower bound | Whether the variable reaches its lower bound |
8 | Upper bound | Whether the variable reaches its upper bound |
Index | Constraint Feature Name | Description |
0 | Bias | Normalized right-hand-side of the constraint |
Index | Edge Feature Name | Description |
0 | Coefficient | Constraint coefficient |
Index | Variable Feature Name | Description |
0-3 | Variable type | Variable type |
4 | Normalized coefficient | Normalized objective coefficient |
5 | Specified bounds | Whether the variable has a lower bound |
6 | Specified bounds | Whether the variable has an upper bound |
7 | Lower bound | Whether the variable reaches its lower bound |
8 | Upper bound | Whether the variable reaches its upper bound |
9 | Solution fractionality | Fractional part of the variable |
10-13 | Categorical | Variable is at its bound or zero |
14 | Reduced cost | Amount by which the objective coefficient of the variable should decrease so that the variable assumes a positive value in the LP solution |
15 | Age | Number of LP iterations |
16 | Solution value | Value of the variable |
17 | Incumbent value | Value of the variable in the current best solution |
18 | Average incumbent value | Average value in the observed feasible primal solutions |
Index | Constraint Feature Name | Description |
0 | Cosine similarity | Cosine of the angle between objective coefficients and the coefficients of this constraint |
1 | Bias | Normalized right-hand side of the constraint |
2 | Age | Iterations since the last time the constraint was active |
3 | Normalized dual value | Value of dual variable corresponding to the constraint |
4 | Bounds | Whether the constraint reaches its bound |
Index | Edge Feature Name | Description |
0 | Coefficient | Constraint coefficient |
D.4 Implementation Details of AdaSolver
D.4.1 Model Architecture of the Augmenter
The augmenter comprises three parts: an augmentation policy network, a state-value function network and a discriminator network. Each part has a GNN graph encoder and MLP decoders for its downstream task. The GNN graph encoder first encodes the node features of the instance bipartite graphs into 10-dimensional embeddings and then applies an interleaved half-convolution layer to extract 64-dimensional representations for the variables and constraints, respectively. The augmentation policy and the state-value function share the same instance bipartite graph encoder. For the augmentation policy, we use three single-layer networks to obtain the masking probabilities of the variables, constraints and edges, respectively. Then we choose the nodes and edges to mask according to the predicted probabilities with a predefined masking proportion. The state-value function uses one single-layer network to predict the current state value. The discriminator uses one single-layer network to predict the probability that an augmented instance comes from the original distribution.
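A sketch of how the predicted masking probabilities can be turned into a concrete set of masked elements under a predefined masking proportion is given below; the sampling scheme and names are illustrative assumptions rather than our exact implementation.

```python
import torch

def select_mask(masking_probs, proportion):
    """Choose which elements (variables, constraints or edges) to mask.

    masking_probs: (num_elements,) probabilities predicted by a single-layer decoder head.
    proportion:    predefined masking proportion, e.g. 0.03 for masking 3% of constraints.
    """
    num_to_mask = max(1, int(proportion * masking_probs.numel()))
    # Sample without replacement, favoring elements with higher predicted probability.
    chosen = torch.multinomial(masking_probs, num_to_mask, replacement=False)
    mask = torch.zeros_like(masking_probs, dtype=torch.bool)
    mask[chosen] = True
    return mask

# Example: mask 3% of the constraints of an instance with 1000 constraints.
constraint_probs = torch.rand(1000).softmax(dim=-1)
constraint_mask = select_mask(constraint_probs, proportion=0.03)
```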
D.4.2 Training and Optimization
We use the Adam optimizer [50] to train the networks in all experiments.
In the IL setting, we train GNN with a batch size of 8 and the augmenter with a batch size of 4, each with its own initial learning rate. We divide the learning rate of GNN by 5 when the GNN validation loss does not improve for 10 epochs, and stop training if it does not improve for 20 epochs. The instance graph augmentation begins at Epoch 10. In each epoch after Epoch 10, we choose a number of instances to perform augmentation and extract branching samples from the augmented instances; the numbers of augmented instances and extracted samples differ between the combinatorial auctions benchmark, the other three synthetic benchmarks, and the real-world Anonymous dataset. Then we randomly choose 10000 branching samples from the original instances and 2000 samples from the augmented instances to train the GNN.
In the RL setting, we train tMDP+DFS with a batch size of 8 and the augmenter with a batch size of 4, each with its own initial learning rate. In each epoch, we choose 10 instances and run the branching agent on them. Every ten epochs, we additionally augment 10 instances and run the agent on them. The maximum exploration time is set to no more than 1000s on each instance.
D.4.3 Hyperparameters
We list the common hyperparameters in Table XIII. For the augmentation operators and masking proportions, we mask 3% of the constraints and 1% of the edges on the set covering benchmark, 1% of the variables on the combinatorial auctions benchmark, 1% of the constraints and 1% of the variables on the capacitated facility location benchmark, 1% of the edges and 5% of the constraints on the maximum independent set benchmark, and 2% of the constraints and 1% of the variables on the Anonymous dataset.
Hyperparameter | Value |
Half-convolution layers | 1 for variables and 1 for constraints |
Graph encoder hidden size | 64 |
Batch size | 4 |
PPO learning rate | 1e-3 |
PPO entropy coefficient | 1e-2 |
PPO epochs at each iteration | 3 |
PPO buffer size | 300 |
PPO clip ratio | 0.2 |
D.4.4 Variants of AdaSolver
Random augmentation (RA). The branching policy network is the same as GNN. Different from AdaSolver, RA does not have an augmentation network. Instead, RA randomly masks the constraints, variables and edges under the same operators and proportions as AdaSolver.
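A minimal sketch of this random masking baseline on an instance bipartite graph (stored here as simple index lists, an illustrative representation rather than our actual data structure) is:

```python
import random

def random_augment(num_constraints, num_variables, edges,
                   cons_ratio=0.0, var_ratio=0.0, edge_ratio=0.0, rng=random):
    """Randomly pick constraints, variables and edges to mask, as in the RA baseline.

    edges: list of (constraint_index, variable_index) pairs with nonzero coefficients.
    Returns the index sets of the elements to be masked.
    """
    masked_cons = set(rng.sample(range(num_constraints), int(cons_ratio * num_constraints)))
    masked_vars = set(rng.sample(range(num_variables), int(var_ratio * num_variables)))
    masked_edges = set(rng.sample(range(len(edges)), int(edge_ratio * len(edges))))
    return masked_cons, masked_vars, masked_edges

# Example: the set covering setting masks 3% of the constraints and 1% of the edges.
toy_edges = [(i % 1000, (7 * i) % 1000) for i in range(5000)]
masked = random_augment(1000, 1000, toy_edges, cons_ratio=0.03, edge_ratio=0.01)
```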
AdaSolver-REINFORCE. The branching policy network is the same as that of GNN. Different from AdaSolver, AdaSolver-REINFORCE employs the REINFORCE algorithm to train the augmentation network, while the other hyperparameters remain the same. Since REINFORCE is known to be less sample-efficient than PPO, the performance of AdaSolver-REINFORCE is inferior to that of AdaSolver.
We will release all the code for training and end-to-end evaluation once the paper is accepted.
Appendix E More Experimental Results
In this part, we provide results for the experiments in the main text in terms of the number of explored nodes (Nodes) and the primal-dual gap (PD gap). We first give some explanations of these metrics.
Nodes. The number of explored nodes reflects the size of the B&B search tree. However, it may not be an appropriate metric of solving efficiency. First, fewer nodes do not necessarily mean a shorter solving time, since the solver may spend more time on each node, resulting in a longer total solving time. Second, on distributions where the solvers cannot find the optimal solutions within the time limit, the Nodes metric does not properly reflect the solving performance. On the one hand, we cannot estimate the number of nodes that would be explored to reach the optimal value. On the other hand, given the time limit, fewer nodes may simply mean that the solver explores harder nodes that require more computation to solve the LP relaxations.
PD gap. (1) On distributions where the solvers can find the optimal solutions within the time limit, the PD gap reaches zero and fewer nodes imply better solving performance. (2) On distributions where the solvers cannot find the optimal solutions within the time limit, the PD gap is a suitable metric of solving efficiency: given the time limit, a lower PD gap implies that the solver has found a better feasible solution so far.
E.1 More Results of Main Evaluation
We present more results for the comparative experiments. Table XIV reports the performance of AdaSolver and the baselines in terms of the total number of nodes and the PD gap. The results in Table XIV show that on the easier distributions, AdaSolver-IL consistently explores significantly fewer nodes than tMDP+DFS and TreeGate, and only slightly more nodes than GNN. Nevertheless, AdaSolver-IL achieves higher solving efficiency, which means that AdaSolver-IL spends less time on each node than GNN and learns a wiser strategy. On the distributions where the solvers reach the time limit, AdaSolver-IL outperforms all the baselines in terms of the PD gap. We also observe that AdaSolver-IL explores more nodes than the other baselines within the limited time, which suggests that AdaSolver-IL learns a policy that chooses easier nodes to explore.
Method | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap |
RPB | 247.59 | 0.00 | 5866.93 | 0.00 | 52814.36 | 0.05 | 25630.00 | 0.10 | 18279.17 | 0.15 | 3910.22 | 0.26 |
SVMRank | 332.60 | 0.00 | 5478.59 | 0.00 | 20326.22 | 0.03 | 5152.89 | 0.13 | 2382.83 | 0.19 | 798.75 | 0.28 |
Trees | 396.46 | 0.00 | 5960.50 | 0.00 | 17705.65 | 0.07 | 6323.89 | 0.13 | 3552.26 | 0.18 | 1021.86 | 0.28 |
LambdaMart | 327.42 | 0.00 | 5928.39 | 0.00 | 28178.68 | 0.03 | 7501.61 | 0.13 | 3472.22 | 0.18 | 893.93 | 0.28 |
tMDP+DFS | 2324.27 | 0.00 | 35037.37 | 0.02 | 34062.76 | 0.04 | 11145.84 | 0.17 | 5365.54 | 0.22 | 1519.11 | 0.31 |
TreeGate | 1242.64 | 0.00 | 12826.16 | 0.004 | 19133.68 | 0.03 | 6901.62 | 0.15 | 3144.67 | 0.20 | 568.44 | 0.30 |
GNN | 259.61 | 0.00 | 3913.11 | 0.00 | 28926.49 | 0.05 | 14676.73 | 0.11 | 7162.61 | 0.16 | 1120.95 | 0.27 |
AdaSolver-RL | 2227.30 | 0.00 | 35936.70 | 0.02 | 32528.50 | 0.12 | 15368.45 | 0.17 | 8476.60 | 0.21 | 1653.05 | 0.32 |
AdaSolver-IL | 270.69 | 0.00 | 4361.15 | 0.00 | 40711.45 | 0.05 | 30649.53 | 0.10 | 19573.54 | 0.14 | 2367.05 | 0.27 |
Set Covering |
Method | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap |
RPB | 42.00 | 0.00 | 1616.12 | 0.00 | 14154.73 | 0.00 | 24072.15 | 0.01 | 15178.22 | 0.02 | 9185.16 | 0.02 |
SVMRank | 93.02 | 0.00 | 1604.26 | 0.00 | 11150.03 | 0.004 | 9843.47 | 0.01 | 7564.52 | 0.02 | 6019.93 | 0.02 |
Trees | 102.49 | 0.00 | 3487.29 | 0.00 | 29083.64 | 0.005 | 16295.54 | 0.01 | 12398.58 | 0.02 | 10584.14 | 0.02 |
LambdaMart | 87.58 | 0.00 | 1587.87 | 0.00 | 14109.08 | 0.003 | 15478.90 | 0.01 | 11426.58 | 0.02 | 8965.19 | 0.02 |
tMDP+DFS | 141.69 | 0.00 | 3290.90 | 0.00 | 27826.58 | 0.003 | 47398.77 | 0.01 | 28752.17 | 0.02 | 16827.58 | 0.02 |
TreeGate | 527.47 | 0.00 | 4473.12 | 0.00 | 26772.53 | 0.003 | 21841.75 | 0.02 | 15610.67 | 0.02 | 6948.79 | 0.03 |
GNN | 84.99 | 0.00 | 1306.10 | 0.00 | 11710.30 | 0.00 | 32724.97 | 0.009 | 21995.83 | 0.01 | 15017.98 | 0.02 |
AdaSolver-RL | 191.85 | 0.00 | 2308.95 | 0.00 | 31815.45 | 0.00 | 58011.39 | 0.01 | 38012.95 | 0.02 | 22258.45 | 0.02 |
AdaSolver-IL | 102.71 | 0.00 | 1953.57 | 0.00 | 39556.67 | 0.00 | 36135.36 | 0.008 | 27599.51 | 0.01 | 17599.67 | 0.01 |
Combinatorial Auctions |
Method | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap |
RPB | 387.31 | 0.00 | 413.93 | 0.00 | 145.20 | 0.002 | 11.78 | 0.002 | 2.38 | 0.02 | 1.20 | 0.07 |
SVMRank | 491.71 | 0.00 | 533.21 | 0.00 | 337.59 | 0.002 | 34.63 | 0.004 | 10.46 | 0.02 | 2.46 | 0.09 |
Trees | 508.60 | 0.00 | 588.19 | 0.00 | 317.75 | 0.002 | 31.00 | 0.004 | 8.22 | 0.02 | 2.78 | 0.09 |
LambdaMart | 445.69 | 0.00 | 520.89 | 0.00 | 338.38 | 0.002 | 34.79 | 0.004 | 10.51 | 0.02 | 2.52 | 0.09 |
tMDP+DFS | 723.95 | 0.00 | 788.96 | 0.00 | 361.68 | 0.002 | 42.21 | 0.002 | 7.68 | 0.02 | 2.89 | 0.09 |
TreeGate | 584.64 | 0.00 | 631.03 | 0.00 | 299.50 | 0.003 | 46.27 | 0.004 | 2.93 | 0.007 | 1.44 | 0.05 |
GNN | 458.11 | 0.00 | 592.82 | 0.00 | 290.25 | 0.002 | 73.76 | 0.003 | 16.22 | 0.005 | 3.17 | 0.07 |
AdaSolver-RL | 1024.40 | 0.00 | 839.95 | 0.00 | 392.70 | 0.002 | 116.00 | 0.004 | 36.08 | 0.004 | 8.13 | 0.04 |
AdaSolver-IL | 529.61 | 0.00 | 659.79 | 0.00 | 487.49 | 0.002 | 190.46 | 0.002 | 59.84 | 0.003 | 3.72 | 0.08 |
Capacitated Facility Location |
Method | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap |
RPB | 594.46 | 0.00 | 16097.93 | 0.00 | 17730.83 | 0.03 | 900.00 | 5387.45 | 900.00 | 1874.07 | 900.01 | 854.44 |
SVMRank | 289.60 | 0.00 | 5757.21 | 0.01 | 5097.74 | 0.02 | 1536.43 | 0.03 | 909.91 | 0.04 | 507.74 | 0.05 |
Trees | 653.03 | 0.00 | 13349.63 | 0.004 | 7205.15 | 0.02 | 2037.52 | 0.03 | 1444.09 | 0.04 | 912.21 | 0.05 |
LambdaMart | 425.62 | 0.00 | 12268.72 | 0.005 | 8108.65 | 0.02 | 1862.61 | 0.03 | 1084.89 | 0.04 | 605.03 | 0.05 |
tMDP+DFS | 459.08 | 0.00 | 21183.04 | 0.003 | 23298.80 | 0.02 | 7350.35 | 0.03 | 3615.30 | 0.04 | 1647.60 | 0.05 |
TreeGate | 2136.85 | 0.003 | 8585.86 | 0.01 | 3516.26 | 0.01 | 1198.28 | 0.04 | 925.65 | 0.05 | 999.87 | 0.05 |
GNN | 285.58 | 0.00 | 10623.60 | 0.001 | 16520.12 | 0.02 | 3809.20 | 0.03 | 1392.95 | 0.04 | 679.15 | 0.04 |
AdaSolver-RL | 351.50 | 0.00 | 22448.85 | 0.00 | 30533.75 | 0.01 | 12887.25 | 0.03 | 7027.05 | 0.04 | 4016.80 | 0.05 |
AdaSolver-IL | 241.12 | 0.00 | 10664.04 | 0.004 | 20861.74 | 0.01 | 19596.78 | 0.03 | 12479.65 | 0.04 | 6548.56 | 0.05 |
Maximum Independent Set |
E.2 More Results of Ablation Studies
We compare the Nodes and PD gap metrics in the ablation studies. The results in Table XV show that AdaSolver outperforms RA in terms of solving time and PD gap, demonstrating the effectiveness of our proposed adversarial instance augmenter.
Method | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap |
RA | 269.83 | 0.00 | 5676.33 | 0.00 | 44928.63 | 0.06 | 14929.49 | 0.18 | 8362.64 | 0.23 | 3301.91 | 0.31 |
AdaSolver-IL | 270.69 | 0.00 | 4361.15 | 0.00 | 40711.45 | 0.05 | 30649.53 | 0.10 | 19573.54 | 0.14 | 2367.05 | 0.27 |
RA | 2227.10 | 0.00 | 36310.10 | 0.02 | 31287.20 | 0.12 | 10865.95 | 0.18 | 5084.90 | 0.22 | 815.05 | 0.33 |
AdaSolver-RL | 2227.30 | 0.00 | 35936.70 | 0.02 | 32528.50 | 0.12 | 15368.45 | 0.17 | 8476.60 | 0.21 | 1653.05 | 0.32 |
Set Covering |
Method | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap |
RA | 101.24 | 0.00 | 2491.01 | 0.00 | 43006.75 | 0.00 | 29486.44 | 0.009 | 21203.95 | 0.01 | 14441.14 | 0.02 |
AdaSolver-IL | 102.71 | 0.00 | 1953.57 | 0.00 | 39556.67 | 0.00 | 36135.36 | 0.008 | 27599.51 | 0.01 | 17599.67 | 0.01 |
RA | 199.05 | 0.00 | 2237.20 | 0.00 | 30545.60 | 0.00 | 58141.90 | 0.01 | 31930.00 | 0.02 | 15079.60 | 0.03 |
AdaSolver-RL | 191.85 | 0.00 | 2308.95 | 0.00 | 31815.45 | 0.00 | 58011.39 | 0.01 | 38012.95 | 0.02 | 22258.45 | 0.02 |
Combinatorial Auctions |
Method | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap |
RA | 473.79 | 0.00 | 608.75 | 0.00 | 438.14 | 0.002 | 89.51 | 0.004 | 29.80 | 0.005 | 9.52 | 0.05 |
AdaSolver-IL | 529.61 | 0.00 | 659.79 | 0.00 | 487.49 | 0.002 | 190.46 | 0.002 | 59.84 | 0.003 | 3.72 | 0.08 |
RA | 1018.60 | 0.00 | 854.90 | 0.00 | 468.90 | 0.002 | 98.50 | 0.003 | 7.90 | 0.008 | 2.15 | 0.11 |
AdaSolver-RL | 1024.40 | 0.00 | 839.95 | 0.00 | 392.70 | 0.002 | 116.00 | 0.004 | 36.08 | 0.004 | 8.13 | 0.04 |
Capacitated Facility Location |
Method | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap |
RA | 226.79 | 0.00 | 40493.42 | 0.00 | 31616.33 | 0.01 | 14390.99 | 0.03 | 7323.71 | 0.04 | 4721.40 | 0.05 |
AdaSolver-IL | 241.12 | 0.00 | 10664.04 | 0.004 | 20861.74 | 0.01 | 19596.78 | 0.03 | 12479.65 | 0.04 | 6548.56 | 0.05 |
RA | 323.60 | 0.00 | 23020.05 | 0.007 | 27906.65 | 0.01 | 9073.45 | 0.03 | 2974.65 | 0.04 | 1434.90 | 0.05 |
AdaSolver-RL | 351.50 | 0.00 | 22448.85 | 0.00 | 30533.75 | 0.01 | 12887.25 | 0.03 | 7027.05 | 0.04 | 4016.80 | 0.05 |
Maximum Independent Set |
E.3 More Results of Longer Time Limit
We evaluate our proposed AdaSolver with a much longer time limit of two hours on the Anonymous dataset to investigate whether AdaSolver still performs well in this setting, while the models are trained with a time limit of 900s. The results in Table XVI demonstrate that AdaSolver significantly improves the performance of the underlying solvers.
Method | Time(s) | Nodes | PD integral | PD gap |
RPB | 6703.81 | 300709.45 | 435985.03 | 1.50e+19 |
tMDP+DFS | 5647.83 | 338191.85 | 241413.64 | 1.02 |
GNN | 5670.69 | 308766.63 | 206413.12 | 1.13 |
AdaSolver-RL | 5514.66 | 358002.10 | 211590.82 | 0.86 |
AdaSolver-IL | 5640.18 | 295459.21 | 203917.75 | 1.04 |
E.4 More Results on the Sample Efficiency
We compare the Nodes and PD gap metrics for the experiments of Section 8.3. The results in Table XVII show that AdaSolver significantly outperforms the underlying baselines in terms of the PD gap when trained on limited instances, demonstrating much higher sample efficiency. Notice that on the distributions where the solvers can solve the instances within the time limit, AdaSolver-IL also explores significantly fewer nodes.
Method | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap |
GNN | 769.60 | 0.00 | 15754.15 | 0.00 | 37577.55 | 0.10 | 26173.75 | 0.15 | 18570.70 | 0.20 | 4854.60 | 0.30 |
AdaSolver-IL | 445.25 | 0.00 | 10010.15 | 0.00 | 49802.45 | 0.08 | 38256.80 | 0.13 | 28539.60 | 0.17 | 7270.45 | 0.28 |
Set Covering |
Method | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap | Nodes | PD gap |
GNN | 217.50 | 0.00 | 3532.60 | 0.00 | 34269.20 | 0.004 | 33319.85 | 0.01 | 23282.75 | 0.02 | 16594.65 | 0.03 |
AdaSolver-IL | 107.93 | 0.00 | 1714.28 | 0.00 | 14221.71 | 0.00 | 31613.71 | 0.01 | 23753.37 | 0.01 | 16616.03 | 0.02 |
Combinatorial Auctions |