
Network Interdiction Goes Neural

Lei Zhang
Department of Computer Science
Virginia Tech
[email protected]
Zhiqian Chen
Department of Computer Science and Engineering
Mississippi State University
[email protected]
Chang-Tien Lu
Department of Computer Science
Virginia Tech
[email protected]
Liang Zhao
Department of Computer Science
Emory University
[email protected]
Abstract

Network interdiction problems are combinatorial optimization (CO) problems involving two players: one aims to solve an optimization problem on a network, while the other seeks to modify the network to thwart the first player's objectives. Such problems typically emerge in an attacker-defender context, encompassing areas such as military operations, disease spread analysis, and communication network management. The primary bottleneck in network interdiction arises from the high time complexity of conventional exact solvers and the challenges associated with devising efficient heuristic solvers. Graph neural networks (GNNs), recognized as a cutting-edge methodology, have shown significant effectiveness in addressing single-level CO problems on graphs, such as the traveling salesman problem, graph matching, and graph edit distance. Nevertheless, network interdiction presents a bi-level optimization challenge, which current GNNs find difficult to manage. To address this gap, we represent network interdiction problems as Mixed-Integer Linear Programming (MILP) instances, then apply a multipartite GNN with sufficient representational capacity to learn these formulations. This approach ensures that our neural network is more compatible with the mathematical algorithms designed to solve network interdiction problems, resulting in improved generalization. Through two distinct tasks, we demonstrate that our proposed method outperforms learning-based baseline models and provides advantages over traditional exact solvers.

1 Introduction

In recent years, graph neural networks (GNNs) have exhibited promise in combinatorial optimization (CO) problems, with graphs being the preferred representation due to the discrete nature of most CO problems and the ubiquity of network data in real-world applications. For example, GNNs have demonstrated effectiveness in CO problems such as the Traveling Salesman Problem (TSP) [20], graph matching [11, 17], and graph edit distance [1]. GNNs' inductive bias makes them well-suited for encoding graph-structured data, benefiting from permutation invariance and sensitivity to input sparsity. Efforts have also been made to understand the limitations and underlying mechanisms of GNNs in CO problems [3, 27, 10], and to integrate GNNs with classical methods like beam search [15] or tree search [18] for improved performance.

Beyond the realm of CO problems where the optimization is unilateral, real-world scenarios often involve adversarial settings requiring two levels of optimization. For instance, in a power grid system, it is not only important to design the network for maximum throughput (single level) but also to identify critical points where a compromise on the design of the network would cause the most significant damage (bi-level). This situation represents a more generalized CO problem with two competing roles: the follower aims to solve a CO problem on a fixed network, while the leader attempts to modify the network itself in a manner conflicting with the follower’s objectives. This generic scenario is known as network interdiction.

Traditional network interdiction solvers are commonly categorized as exact solvers and heuristic solvers [24]. Exact solvers aim to identify optimal solutions, necessitating the resolution of complex mathematical problems, typically NP-hard, which involves exploring a combinatorial space of potential network modifications and assessing their impact on the follower's solution [8, 25]. As the network size increases, the exponential growth in the number of potential modifications results in computational complexity that exceeds polynomial time bounds. Conversely, heuristic solvers are algorithms designed to efficiently find good, if not optimal, solutions more quickly than exact methods. However, due to the complexity and variability of these problems (such as diverse assumptions, a vast solution space, and inherent problem intricacies), it is uncommon for a pure heuristic solver to be effective. Instead, combining heuristic modules with traditional algorithms offers greater promise [24]. GNN-based machine learning methods, which have shown effectiveness in conventional unilateral CO problems [4], are clearly strong candidates for serving as heuristic modules in solving network interdiction problems more efficiently.

While it may seem intuitive to employ GNNs for addressing network interdiction problems, their application in this domain remains underexplored due to several challenges. Firstly, there is no effective representation method for network interdiction instances. For a GNN to learn to solve network interdiction problems, it must be able to encode all relevant information, including that of both the leader and the follower, in an attributed graph, and distinguish between instances with different solutions. Secondly, there is no theoretical guarantee that GNNs can be trained to perform algorithmically and generalize in solving network interdiction problems. Recent studies have shown that GNNs excel in certain CO tasks because they "align" with dynamic programming (DP) [27]. Given that DP or analogous polynomial-time heuristics can solve numerous CO problems, GNNs offer potential for generalization and extrapolation in these areas [10]. Unfortunately, network interdiction problems do not align with DP, so previous GNN approaches to CO cannot be applied to network interdiction in a straightforward manner.

This paper proposes a new computational framework to solve the network interdiction problem. To do so, we answer two key questions: 1) Can GNNs provide effective representation for network interdiction problems, and if so, what theoretical assurances exist? 2) If the alignment between DP and GNN does not justify the utilization of GNNs for network interdiction problems, what alternative rationale does? To answer these two questions, we leverage existing research on the expressive capabilities of GNNs while accounting for the unique attributes of network interdiction problems and their conventional solution methodologies. Specifically, we employ a multipartite graph structure to comprehensively represent network interdiction problems and develop a novel GNN model tailored for multipartite graphs, termed MMILP-GNN. This proposed framework demonstrates that the MMILP-GNN model, supplemented with the random feature trick, offers sufficient representation power for network interdiction problems. Furthermore, by design, the framework aligns with traditional algorithms used to solve network interdiction problems, thereby not only fitting the training data but also demonstrating a degree of generalization ability.

2 Preliminaries

A network interdiction problem typically includes two players engaging in a game of max-min or min-max on a defined graph. One player, commonly referred to as the follower or defender, aims to optimize a standard CO problem on the graph, such as finding the shortest path or the maximum flow between two nodes. The other player, often known as the leader or attacker, manipulates the network on which the follower operates, strategically disrupting the follower's objective through actions such as removing edges from the graph or adding costs to existing edges. Formally, we define an instance of a network interdiction problem as follows.

Definition 2.1 (Network Interdiction Problem).

The general form of a max-min network interdiction problem is defined as:

\max_{\mathbf{x}}\Theta(\mathbf{x})\quad\text{s.t.}\quad\mathbf{x}\in X, \qquad (1)

where $\mathbf{x}$ represents the leader/attacker's variable. The objective function of the interdiction $\Theta(\mathbf{x})$ is defined as the minimization of another function:

\Theta(\mathbf{x})=\min_{\mathbf{y}}f(\mathbf{x},\mathbf{y})\quad\text{s.t.}\quad\mathbf{y}\in Y(\mathbf{x}), \qquad (2)

where $\mathbf{y}$ represents the follower/defender's variables, $f(\mathbf{x},\mathbf{y})$ represents the follower/defender's objective function (affected by the leader/attacker's action $\mathbf{x}$), and $Y(\mathbf{x})$ is the set of feasible actions that the follower/defender can take for a given $\mathbf{x}$.

A min-max network interdiction problem is essentially the reverse of the max-min network interdiction problem, where the outer objective is now minimization, and the inner objective is maximization.

Example 2.1 (Shortest Path Interdiction).
\max_{\mathbf{x}\in X}\Big\{\min_{\mathbf{y}\in Y(\mathbf{x})}\sum_{(i,j)\in A}(c_{i,j}+d_{i,j}x_{i,j})\,y_{i,j}\Big\}\quad\text{s.t.}\quad\mathbf{x}\in\{0,1\}^{|A|}:\sum_{(i,j)\in A}x_{i,j}\leq\gamma,\quad T\mathbf{y}=\mathbf{b},\quad\mathbf{y}\geq 0, \qquad (3)

Eq. (3) illustrates the mathematical formulation of the shortest-path network interdiction problem. In this context, $\mathbf{x}=\{x_{i,j}\}_{(i,j)\in A}$ is a binary decision vector indicating interdicted edges, with $A$ being the network's arc set. Here, $c_{i,j}$ denotes the length of edge $(i,j)$, $d_{i,j}$ represents the additional length incurred if edge $(i,j)$ is interdicted, $y_{i,j}$ indicates whether edge $(i,j)$ is used in the follower's path (with $T\mathbf{y}=\mathbf{b}$ encoding flow conservation), and $\gamma$ is the budget constraint on the number of interdictions.
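To make the bi-level structure concrete, the sketch below (our own illustration using networkx; the arc list and data dictionaries are hypothetical placeholders) evaluates the inner problem of Eq. (3): once the leader's interdiction vector is fixed, the follower simply solves an ordinary shortest-path problem with edge lengths $c_{i,j}+d_{i,j}x_{i,j}$.

```python
# Sketch only: the follower's inner problem of Eq. (3) for a fixed interdiction x.
import networkx as nx

def follower_shortest_path(arcs, c, d, x, source, sink):
    """arcs: list of (i, j); c, d, x: dicts keyed by (i, j)."""
    G = nx.DiGraph()
    for (i, j) in arcs:
        # interdicted arcs become longer by d_ij
        G.add_edge(i, j, weight=c[(i, j)] + d[(i, j)] * x[(i, j)])
    return nx.shortest_path_length(G, source, sink, weight="weight")

# The leader's NP-hard outer problem is to pick x with sum(x) <= gamma that
# maximizes this value; enumerating all feasible x is exponential in |A|.
```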

Traditional exact solvers: Branch-and-Bound (BB) and Benders decomposition are two fundamental exact methods for solving network interdiction problems. Compared to Benders decomposition, BB is more versatile and applicable to a wider range of network interdiction problems [9]. The problems that BB can handle are normally formulated as MILPs, which take the form

\min_{x\in\mathbb{R}^{n}}c^{T}x\quad\text{s.t.}\quad Ax\circ b,\; l\leq x\leq u,\; x_{j}\in\mathbb{Z}\ \ \forall j\in I, \qquad (4)

where $\circ$ stands componentwise for one of the relations $\leq$, $=$, or $\geq$, and $I$ indexes the integer-constrained variables.

For general MILPs, the worst-case complexity can be similarly exponential, as the algorithm may need to consider an exponential number of subproblems in the search space. Efforts have been made to represent general LPs and MILPs using a bipartite graph structure and then utilize GNNs to estimate solutions [5, 6, 12], but these methods have not yet been applied to network interdiction problems.
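For illustration, below is a minimal PySCIPOpt sketch (our own example, not part of the paper) of building an MILP of the form in Eq. (4), with the relation $\circ$ instantiated as $\leq$; the data arguments are placeholders.

```python
# Sketch: a generic MILP of the form in Eq. (4) built with PySCIPOpt (SCIP's Python API).
from pyscipopt import Model, quicksum

def build_milp(c, A, b, l, u, integer_idx):
    m, n = len(b), len(c)
    model = Model("generic-milp")
    x = [model.addVar(lb=l[j], ub=u[j],
                      vtype="I" if j in integer_idx else "C",
                      name=f"x{j}") for j in range(n)]
    for i in range(m):
        # the relation "circ" is instantiated here as <=
        model.addCons(quicksum(A[i][j] * x[j] for j in range(n)) <= b[i])
    model.setObjective(quicksum(c[j] * x[j] for j in range(n)), "minimize")
    return model, x
```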

3 Proposed Framework

This section outlines the process of transforming a network interdiction problem into input suitable for a neural network, along with elucidating the neural network’s architecture. To ensure the model’s generalization ability, in Section 3.1, we process the problem instances into the Mixed Integer Linear Programming (MILP) form that BB can handle, following traditional methods. Given that GNNs have shown effectiveness in learning branching strategies [4], this approach simplifies the reasoning task for the GNN. These MILP formulations are then translated into multipartite graphs, capturing essential characteristics of the scenarios as outlined in Section 3.2. Subsequently, we propose a multipartite graph neural network, MMILP-GNN, tailored for estimating optimal interdiction decisions on the induced multipartite graphs rather than the original competitive network between the leader and follower, as elucidated in Section 3.3. More detailed theoretical rationale is provided in Section 4.

3.1 Preprocess Network Interdiction Instances

As the initial step in solving network interdiction problems, our aim is to process the problem instances using traditional methods up to the point where these methods become inadequate, leaving the challenging part to GNNs. As illustrated in Example 2.1, we have already transformed the problem instances into a constrained optimization problem. However, this format is not suitable for neural networks due to the nested two levels of optimization. To simplify this, we employ the “dualize-and-combine” approach. First, we derive the dual formulation of the inner problem with a fixed leader decision variable (in Example 2.1's case, the variable $x$), ensuring that both players' problems align in the same optimization direction. Subsequently, we release the leader decision variable as a free decision vector again, converting the whole problem into a single-level MILP.

It is noteworthy that the network interdiction problems addressed in this paper, represented by Eq. (3), involve a follower's problem that lends itself to being modeled as a convex optimization problem. This forms the basis for applying the “dualize-and-combine” technique for single-level reduction. Although this imposes a constraint on the types of problems we can tackle, the majority of network interdiction problems commonly encountered in real-world scenarios fall within this category [24].
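For concreteness, the following is a compact restatement (ours) of the “dualize” step for the shortest-path case, which is exactly how the single-level form in Eq. (5) below is obtained. For a fixed interdiction vector $\mathbf{x}$, the follower's problem is a linear program, and strong LP duality gives

\min_{\mathbf{y}\geq 0,\ T\mathbf{y}=\mathbf{b}}\ (\mathbf{c}+D\mathbf{x})^{T}\mathbf{y}\;=\;\max_{\bm{\pi}:\ T^{T}\bm{\pi}\leq\mathbf{c}+D\mathbf{x}}\ \mathbf{b}^{T}\bm{\pi}.

The “combine” step then releases $\mathbf{x}$ as a binary decision vector under the budget constraint, yielding a single maximization over $(\mathbf{x},\bm{\pi})$.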

Taking the shortest path interdiction problem in Eq. (3) as an example and following the above-mentioned steps, the constrained bi-level optimization problem can be transformed into a single-level optimization:

Example 3.1 (Single-Level Reduction for the Shortest Path Interdiction Problem).
\max_{\mathbf{x}\in X}\Big\{\min_{\mathbf{y}\in Y(\mathbf{x})}\sum_{(i,j)\in A}(c_{i,j}+d_{i,j}x_{i,j})\,y_{i,j}\Big\}\quad\text{s.t.}\quad\mathbf{x}\in\{0,1\}^{|A|}:\sum_{(i,j)\in A}x_{i,j}\leq\gamma,\quad T\mathbf{y}=\mathbf{b},\quad\mathbf{y}\geq 0

\Downarrow

\max_{\mathbf{x},\bm{\pi}}\ \mathbf{b}^{T}\bm{\pi}\quad\text{s.t.}\quad T^{T}\bm{\pi}\leq\mathbf{c}+D\mathbf{x},\quad\mathbf{x}\in\{0,1\}^{|A|}:\sum_{(i,j)\in A}x_{i,j}\leq\gamma \qquad (5)

3.2 Graph Representations for Network Interdiction Problems

We propose a Multipartite MILP-induced graph, or MMILP-graph, representation to encode the above single-level optimization into a form that is readable by GNNs while preserving all the needed information.

Definition 3.1 (MMILP-graph).

A multipartite MILP-induced graph is defined as a tuple $(G,H)$, where $G\equiv(W_{0}\cup W_{1}\cup\dots\cup W_{p}\cup V,\,E)$ consists of the vertex set $W_{0}\cup W_{1}\cup\dots\cup W_{p}\cup V$ and a weighted edge set $E$, and $H$ represents the vertex features.

  • Vertices: The vertex set is divided into three kinds of groups: one interdiction action variable vertex group $W_{0}$, $p$ dual-variable vertex groups $W_{1},\dots,W_{p}$, and one constraint vertex group $V$. The vertex groups are pairwise disjoint. For convenience, we use $V'$ to denote a generic vertex group, $V'\in\{W_{0},W_{1},\dots,W_{p},V\}$.

  • Edges: An edge in $E$ connects one variable vertex in any of the variable vertex groups with one constraint vertex. Note that there is no edge between two vertices in the same vertex group, nor between two different variable vertex groups. An edge can carry edge features: $E=E^{W_{0},V}\cup E^{W_{1},V}\cup\dots\cup E^{W_{p},V}$, where $E^{W_{k},V}$ represents the edges connecting one variable in $W_{k}$ and one constraint in $V$.

  • Vertex features: Each vertex has its corresponding feature vector that represents characteristics in the original interdiction problem.

Finally, an MMILP-graph is written as $(G,H)\in\mathcal{G}$, where $\mathcal{G}$ denotes the set of all MMILP-graphs.

Example 3.2 (MMILP-Graph for a Specific Shortest Path Interdiction Instance).

Taking the shortest path interdiction instance in Fig. 1(a) as an example, the induced MILP is given in Eq. (6), and the corresponding MMILP-graph is shown in Fig. 1(b).

\max\ (\pi_{0}-\pi_{6})
\text{s.t.}\quad v_{1}:\pi_{0}-\pi_{1}-x_{0,1}\leq 9,\quad v_{2}:\pi_{0}-\pi_{4}-x_{0,4}\leq 3,\quad v_{3}:\pi_{0}-\pi_{3}-x_{0,3}\leq 3,
\qquad v_{4}:\pi_{1}-\pi_{2}-x_{1,2}\leq 4,\quad v_{5}:\pi_{1}-\pi_{3}-x_{1,3}\leq 1,\quad v_{6}:\pi_{4}-\pi_{3}-x_{4,3}\leq 2,
\qquad v_{7}:\pi_{4}-\pi_{5}-x_{4,5}\leq 3,\quad v_{8}:\pi_{3}-\pi_{2}-x_{3,2}\leq 8,\quad v_{9}:\pi_{3}-\pi_{6}-x_{3,6}\leq 4,
\qquad v_{10}:\pi_{3}-\pi_{5}-x_{3,5}\leq 6,\quad v_{11}:\pi_{2}-\pi_{6}-x_{2,6}\leq 5,\quad v_{12}:\pi_{5}-\pi_{6}-x_{5,6}\leq 4,
\qquad v_{13}:x_{0,1}+x_{0,3}+x_{0,4}+x_{1,3}+x_{1,2}+x_{2,6}+x_{3,2}+x_{3,6}+x_{3,5}+x_{4,3}+x_{4,5}+x_{5,6}\leq 1 \qquad (6)
Figure 1: Network Interdiction Instance and the Corresponding MMILP-Graph. (a) Example Shortest Path Interdiction Instance; (b) MMILP-graph for Eq. (6).

Vertices: In Fig. 1(b), the vertex groups include one interdiction action variable group $\mathbf{x}=\{x_{0,1},x_{0,3},\dots,x_{5,6}\}$ (the vertices on the right side), only one dual-variable vertex group $\bm{\pi}=\{\pi_{0},\pi_{1},\dots,\pi_{6}\}$ (the vertices on the left side), and one constraint group $\{v_{1},v_{2},\dots,v_{13}\}$ (the vertices in the middle). In this problem, $p$ in Definition 3.1 equals $1$ because there is only one dual vector $\bm{\pi}$ in the reduced MILP.

Edges: One edge in Fig. 1(b) connects one constraint vertex (in the middle column) and one variable vertex (in the left or right column), representing the relationship between a constraint ($v_{i}$, $i=1,2,\dots,13$) and the involved variables in Eq. (6). The weight of the edge represents the coefficient of the variable in the constraint. For example, the weight of the edge between vertex $\pi_{0}$ and $v_{1}$, i.e., $E^{W_{1},V}_{0,1}$, is $1$ because in constraint $v_{1}$, i.e., $\pi_{0}-\pi_{1}-x_{0,1}\leq 9$, the coefficient of $\pi_{0}$ is $1$.

Vertex features: The vertex features for both variable vertices and constraint vertices align with the feature definitions in the bipartite graphs in [12, 19, 13, 5, 6]. $h_{i}^{V'}$ represents the feature vector for the $i$-th node of vertex group $V'$.

Relationship with existing work. This MMILP-graph representation is inspired by the use of bipartite graphs for solving general MILPs, as initially introduced by Gasse et al. [12] and subsequently utilized in various studies [19, 13, 5, 6]. Although the induced single-level MILPs of network interdiction instances in Section 3.1 can be represented in the standard MILP format, the MMILP-graph can not only capture the heterogeneity of different variables but also learn their different degrees of importance. Firstly, the variables in the induced problems typically play different roles. Comparing Eq. (5) with the general MILP in Eq. (4), the dual vector $\bm{\pi}$ and the interdiction decision vector $\mathbf{x}$ represent distinct kinds of variables; lumping them together in a bipartite MILP-graph overlooks the difference between $\mathbf{x}$ and $\bm{\pi}$, discarding information. Secondly, different variables have varying degrees of importance. Taking shortest-path interdiction as an example, our primary interest lies in the value assignment of the variable $\mathbf{x}$, because once $\mathbf{x}$ is assigned, the problem simplifies to a regular shortest-path problem that can be solved by Dijkstra's algorithm in polynomial time. This characteristic is also called decomposability, which is essential for using Benders decomposition on network interdiction problems.
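As a concrete (hypothetical) illustration of the construction, the sketch below assembles MMILP-graph data for an instance of Eq. (5) from its MILP ingredients $(T,\mathbf{c},D,\mathbf{b},\gamma)$; the helper name and feature choices are ours, not the authors' exact implementation.

```python
# Sketch: MMILP-graph data for Eq. (5). Groups: W0 = interdiction variables x,
# W1 = dual variables pi, V = constraints (one per arc plus the budget row).
import numpy as np

def build_mmilp_graph(T, c, D, b, gamma):
    n_pi, n_arcs = T.shape                             # T: node-arc incidence matrix
    # Constraints: T^T pi - D x <= c (one per arc), plus sum(x) <= gamma.
    E_W1_V = np.vstack([T.T, np.zeros((1, n_pi))])     # edge weights between pi and constraints
    E_W0_V = np.vstack([-D, np.ones((1, n_arcs))])     # edge weights between x and constraints
    h_V = np.concatenate([c, [gamma]]).reshape(-1, 1)  # constraint features: right-hand sides
    h_W0 = np.ones((n_arcs, 1))                        # x features (e.g. a variable-type flag)
    h_W1 = b.reshape(-1, 1)                            # pi features: objective coefficients b
    return {"E_W0_V": E_W0_V, "E_W1_V": E_W1_V,
            "h_W0": h_W0, "h_W1": h_W1, "h_V": h_V}
```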

3.3 Graph Neural Network for MMILP-Graphs

Given the above-formulated MMILP-graph, here we propose a Multipartite Graph Neural Network that takes it as input and predicts the solution of the network interdiction instance. The graph neural network aims to learn a $\mathcal{G}\rightarrow\mathbb{R}^{n}$ mapping, where $n$ is the number of interdiction decision variables, i.e., the number of vertices in group $W_{0}$ (one per candidate edge in the original network).

In detail, because of the heterogeneity and multipartite structure of the input graph, our graph neural network has several passes. In general, we design an encoding layer for each vertex group; we then propose a new message-passing strategy to learn the mapping between the groups of vertices that should communicate; and finally, a read-out layer produces estimates for the interdiction variable vertices, each corresponding directly to one candidate edge in the original network.

Variable and constraint vertices encoding: Raw vertex features are encoded into the embedding space with trainable functions $f_{in}^{V'}$, $V'\in\{W_{0},W_{1},\dots,W_{p},V\}$:

h_{i}^{0,V'}=f_{in}^{V'}(h_{i}^{V'}), \qquad (7)

where $h_{i}^{0,V'}$ represents the embedding (layer 0) for node $i$ in vertex group $V'$, and $f_{in}^{V'}$ represents the function encoding the raw vertex features of group $V'$ into the embedding space. In practice, we append random features [22] to the vertex features to enhance the separation power of our method, as explained in Section 4.
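A minimal sketch of this random-feature augmentation (following Sato et al. [22]; the dimension r_dim is an arbitrary choice of ours):

```python
import torch

def append_random_features(h, r_dim=4):
    # h: [num_vertices, feat_dim] raw features of one vertex group
    return torch.cat([h, torch.rand(h.size(0), r_dim)], dim=-1)
```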

Variables-constraints message passing: We stack multiple graph neural network layers with variables-to-constraints message passing ($v\rightarrow c$) and constraints-to-variables message passing ($c\rightarrow v$). Since there are multiple sets/groups of variables in the MMILP-graph, the trainable functions differ across groups as well. $v\rightarrow c$ and $c\rightarrow v$ are defined in turn in Eq. (8) and Eq. (9):

h_{i}^{l,V}=g_{l}^{V}\Big(h_{i}^{l-1,V},\ \sum_{k=0}^{p}\sum_{e=(i,j)\in E^{W_{k},V}}f_{l}^{W_{k}}\big(h_{i}^{l-1,V},h_{j}^{l-1,W_{k}},e\big)\Big), \qquad (8)

h_{i}^{l,W_{k}}=g_{l}^{W_{k}}\Big(h_{i}^{l-1,W_{k}},\ \sum_{e=(j,i)\in E^{W_{k},V}}f_{l}^{W_{k}}\big(h_{j}^{l-1,V},h_{i}^{l-1,W_{k}},e\big)\Big), \qquad (9)

where $h_{i}^{l,W_{k}}$ is the feature vector for node $i$ in variable group $W_{k}$ at layer $l$, and $h_{i}^{l,V}$ is the feature vector for node $i$ in constraint group $V$ at layer $l$.

Interdiction decision read-out: The read-out function is applied only to the interdiction decision variable vertices, i.e., $W_{0}$, as shown in Eq. (10):

\hat{x}_{i}=g_{out}(h_{i}^{L,W_{0}}), \qquad (10)

With the detailed operations defined, we denote the collection of such GNNs by $\mathcal{F}$:

\mathcal{F}=\{F:\mathcal{G}\rightarrow\mathbb{R}^{n}\mid F\ \text{yields (7), (8), (9), (10)}\} \qquad (11)

In practice, we use multi-layer perceptrons (MLPs) for all the trainable functions in Eqs. (7)-(10).
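The following PyTorch sketch (ours, not the released implementation) shows one simplified MMILP-GNN pass corresponding to Eqs. (7)-(10); edge features are reduced to scalar weights stored in dense matrices, and all trainable maps are MLPs as stated above.

```python
# Simplified sketch of one MMILP-GNN layer. E[k]: [num_constraints, num_vars_in_W_k]
# dense edge-weight matrix between variable group W_k and the constraint group V.
import torch
import torch.nn as nn

def mlp(d_in, d_out, d_h=64):
    return nn.Sequential(nn.Linear(d_in, d_h), nn.ReLU(), nn.Linear(d_h, d_out))

class MMILPGNNLayer(nn.Module):
    def __init__(self, d, num_groups):
        super().__init__()
        self.f_W = nn.ModuleList([mlp(d, d) for _ in range(num_groups)])   # f_l^{W_k}
        self.g_V = mlp(2 * d, d)                                           # g_l^{V}
        self.g_W = nn.ModuleList([mlp(2 * d, d) for _ in range(num_groups)])

    def forward(self, h_V, h_W, E):
        # h_V: [m, d]; h_W[k]: [n_k, d]
        msg_to_V = sum(E[k] @ self.f_W[k](h_W[k]) for k in range(len(h_W)))    # Eq. (8)
        new_V = self.g_V(torch.cat([h_V, msg_to_V], dim=-1))
        new_W = [self.g_W[k](torch.cat([h_W[k], E[k].T @ h_V], dim=-1))        # Eq. (9)
                 for k in range(len(h_W))]
        return new_V, new_W

# Read-out (Eq. (10)): an MLP g_out applied to the final W_0 embeddings, followed by a
# sigmoid, gives a per-edge interdiction probability, e.g. torch.sigmoid(g_out(h_W[0])).
```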

Since we train the models using optimal solutions as labels, the predicted values can be interpreted as the likelihood of whether an edge should be interdicted or the budget to be allocated to that edge. This is actually more convenient than most unilateral CO problems that involve sequential decision-making. For instance, solutions to the shortest path problem must consist of a sequence of connected edges in the graph from the source to the target, whereas shortest path network interdiction does not have this requirement.

4 Theoretical Analysis

4.1 Representation Power Analysis

The representation power of a neural network is determined by its ability to produce different outcomes for different inputs, which is fundamental to its effectiveness. Our approach to measuring the representation power of MMILP-GNN is heavily inspired by existing research on the representation power of GNNs [27] and the separation power of GNNs in LP [5] and MILP [6]. Following the extension of the Weisfeiler-Lehman (WL) test for LP-Graphs [5] and MILP-Graphs [6], we further extend it for MMILP-Graphs as shown in Algorithm 1. We extend these results to MMILP-GNNs for network interdiction instances and validate them in a similar manner.

Algorithm 1 $\mathrm{WL}_{\mathrm{MMILP}}$
1: A graph instance $(G,H)$ with $p$ dual variables, iteration limit $L>0$.
2: $C_{i}^{0,V}=\text{HASH}_{V}^{0}(h_{i}^{V})$
3: $C_{j}^{0,W_{k}}=\text{HASH}_{W_{k}}^{0}(h_{j}^{W_{k}})$ for $k\in\{0,1,2,\dots,p\}$.
4: for $l=1,2,\dots,L$ do
5:    $C_{i}^{l,V}=\text{HASH}_{V}^{l}\left(C_{i}^{l-1,V},\sum_{k=0}^{p}\sum_{j=1}^{n_{k}}E_{i,j}^{W_{k},V}\,\text{HASH}_{W_{k}}^{l'}\left(C_{j}^{l-1,W_{k}}\right)\right)$.
6:    for $k=0,1,\dots,p$ do
7:       $C_{j}^{l,W_{k}}=\text{HASH}_{W_{k}}^{l}\left(C_{j}^{l-1,W_{k}},\sum_{i=1}^{m}E_{i,j}^{W_{k},V}\,\text{HASH}_{V,W_{k}}^{l'}\left(C_{i}^{l-1,V}\right)\right)$.
8:    end for
9: end for
10: return The multisets containing all colors $\{\{C_{i}^{L,V}\}\}_{i=0}^{m}$ and $\{\{\{C_{j}^{L,W_{k}}\}\}_{j=0}^{n}\}_{k=0}^{p}$.
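A small Python sketch (ours) of the color-refinement procedure in Algorithm 1; generic tuple hashing and multiset aggregation stand in for the injective HASH functions and the weighted sums of the algorithm.

```python
# colors_V: initial features of constraint vertices; colors_W[k]: features of group W_k;
# E[k][i][j]: edge weight between constraint i and the j-th variable of group W_k.
def wl_mmilp(colors_V, colors_W, E, L):
    C_V = [hash(("V", c)) for c in colors_V]
    C_W = [[hash((f"W{k}", c)) for c in colors_W[k]] for k in range(len(colors_W))]
    for _ in range(L):
        new_V = [hash((C_V[i],
                       tuple(sorted((E[k][i][j], C_W[k][j])
                                    for k in range(len(C_W))
                                    for j in range(len(C_W[k]))
                                    if E[k][i][j] != 0))))
                 for i in range(len(C_V))]
        new_W = [[hash((C_W[k][j],
                        tuple(sorted((E[k][i][j], C_V[i])
                                     for i in range(len(C_V))
                                     if E[k][i][j] != 0))))
                  for j in range(len(C_W[k]))]
                 for k in range(len(C_W))]
        C_V, C_W = new_V, new_W
    # return sorted color lists, i.e. the multisets used to compare two MMILP-graphs
    return sorted(C_V), [sorted(cw) for cw in C_W]
```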
Theorem 4.1.

For two MMILP-graphs $(G,H)$ and $(\hat{G},\hat{H})$, they cannot be distinguished by $\mathrm{WL}_{\mathrm{MMILP}}$ if and only if $F(G,H)=F(\hat{G},\hat{H})$ for all $F\in\mathcal{F}$.

The proof of Theorem 4.1 follows from Lemmas 4.2 and 4.3.

Lemma 4.2.

Let $(G,H),(\hat{G},\hat{H})\in\mathcal{G}$. If $(G,H)\sim(\hat{G},\hat{H})$, then for any $F_{W}\in\mathcal{F}_{\text{GNN}}^{W}$ there exists a permutation $\sigma_{W}\in S_{n}$ such that $F_{W}(G,H)=\sigma_{W}(F_{W}(\hat{G},\hat{H}))$.

Lemma 4.3.

Let $(G,H),(\hat{G},\hat{H})\in\mathcal{G}$. If $F(G,H)=F(\hat{G},\hat{H})$ holds for any $F\in\mathcal{F}$, then $(G,H)\sim(\hat{G},\hat{H})$.

The proofs are detailed in Appendix A. Since our results align with those in [6] for MILPs, it is possible that the network interdiction instances induce foldable MILPs as introduced in their work. To address these cases, we enhance the representation power in our experiments by appending random features [22] to the MMILP-graphs.

4.2 Discussions on Algorithmic Alignment

The concept of algorithmic alignment, introduced by Xu et al. [27], describes the reasoning capabilities of GNNs. The core idea is that a neural network is more effective at learning to perform a reasoning task, such as solving a CO problem, when its architecture aligns well with the algorithm designed to solve that task. Specifically, GNNs have been found to align with DP, a paradigm capable of solving many CO problems. This alignment between DP and GNNs has been further examined by Dudzik and Veličković [10] using methods from category theory and abstract algebra.

However, network interdiction problems fall into a category that does not align with DP and thus cannot be naturally addressed by GNNs. DP relies on the principle of optimal substructure, where an optimal solution to a problem can be constructed from optimal solutions to its subproblems [7]. Network interdiction problems typically involve nested two-layer decision-making, which does not exhibit the overlapping-subproblem structure. The way our proposed method aligns better with the exact solver can also be explained with the abstract-algebra theories of Dudzik and Veličković [10]: by representing the network interdiction instances as multipartite graphs, the MMILP-GNN and the target algorithm (BB) at least share the same finite sets in the polynomial spans.

5 Experimental Evaluation

In the experiments, we focus on two specific network interdiction problems: the shortest path interdiction problem and the maximum flow interdiction problem.

Instance generation. We generate four maximum flow interdiction problem sets, MFI20, MFI30, MFI100, and MFI200, each consisting of 4,000 maximum flow interdiction instances with 20, 30, 100, and 200 nodes, respectively. Similarly, we generate three shortest-path interdiction problem sets, SPI20, SPI30, and SPI100. Further information on data generation is available in Appendix B.

Baseline methods.

  • GCN: The Graph Convolutional Network (GCN) is a popular GNN variant [16].

  • rGIN: The Graph Isomorphism Network (GIN) is a standard GNN that operates on the original graph of the instances [26]. To ensure a fair comparison, we use the version of GIN that incorporates random features, denoted as rGIN [22].

  • SCIP is the most powerful non-commercial optimization software package that incorporates BB [2]. We formulate both the maximum flow interdiction problem and the shortest path interdiction problem as MILP problems and allow SCIP to solve them. The detailed configuration can be found in Appendix D.

Details on validation and testing: For each of the datasets, we use 50% of instances for training, 25% for validation, and 25% for testing. The MMILP-GNN produces interdiction marginal probabilities for each edge in the graph as its final output. Given the predicted $\hat{x}$ in Eq. (10), we utilize two decision-making strategies to assess the model's performance.

  • End-to-end prediction: For a problem with a limited interdiction budget of $k$, the end-to-end prediction strategy simply applies interdictions to the top-$k$ edges based on the predicted marginal interdiction probability.

  • Predict-and-search: This strategy utilizes a trust-region-based algorithm to search for high-quality feasible solutions, guided by a mapping derived from the initial predictions [14]. The implementation of this strategy is directly derived from existing work outlined by Han et al. [13] and is detailed in Appendix C.

We employ these two decision-making strategies for different purposes and comparisons. Firstly, it is important to note that the end-to-end prediction method is not expected to outperform SCIP, because SCIP is expected to generate optimal solutions, while perfect generalization ability is theoretically unattainable for machine learning models (detailed in Section 5.1). As a result, the end-to-end prediction method is only compared with other learning-based methods. Secondly, to facilitate a fair comparison between our learning-based method and SCIP, we adopt the predict-and-search strategy. The rationale is that although SCIP is capable of providing the optimal solution, it typically requires a significant amount of time. Within a given timeframe, the predict-and-search strategy allows us to refine the predictions from our models. The goal is to assess whether our model can swiftly and reliably predict a smaller region than the original search space, enabling us to obtain better solutions faster than SCIP, before SCIP reaches an optimal solution.
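For reference, the end-to-end decision rule reduces to a top-$k$ selection over the predicted marginals; a minimal sketch (ours) is:

```python
# Sketch: interdict the k edges with the largest predicted probabilities from Eq. (10).
import torch

def end_to_end_decision(x_hat, budget_k):
    x = torch.zeros_like(x_hat)
    x[torch.topk(x_hat, budget_k).indices] = 1.0   # top-k edges get interdicted
    return x
```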

5.1 End-to-End Prediction Results

For end-to-end predictions, we conduct experiments on three small datasets for both the shortest-path interdiction problem and the maximum flow interdiction problem. Table 1 demonstrates that Neural Interdiction consistently outperforms the comparison models. The approximation ratio measures the average ratio to the optimal results: the higher, the better for max-min problems, and the lower, the better for min-max problems. Visualization of the predictions is available in Figure 2. It is important to acknowledge that learning-based methods, including our proposed Neural Interdiction, are not expected to match SCIP's performance because both interdiction problems are recognized as NP-hard, while neural network models operate in polynomial time. Hence, under the assumption that $\text{P}\neq\text{NP}$, learning-based methods cannot match exact solvers in general.

Figure 2: Example Shortest Path Interdiction Instance
Dataset  Method               Approximation Ratio  Optimality Gap
MFI20    rGIN                 1.33 ± 0.07          14.2 ± 0.92
MFI20    GCN                  1.42 ± 0.06          15.2 ± 0.65
MFI20    Neural Interdiction  1.22 ± 0.06          8.2 ± 0.32
MFI30    rGIN                 1.43 ± 0.02          23.2 ± 0.53
MFI30    GCN                  1.59 ± 0.07          25.8 ± 0.97
MFI30    Neural Interdiction  1.28 ± 0.02          9.6 ± 0.69
MFI100   rGIN                 1.48 ± 0.19          46.2 ± 3.48
MFI100   GCN                  1.59 ± 0.23          59.2 ± 5.89
MFI100   Neural Interdiction  1.33 ± 0.17          38.2 ± 3.83
SPI20    rGIN                 0.75 ± 0.18          19.75 ± 2.56
SPI20    GCN                  0.69 ± 0.17          23.25 ± 3.37
SPI20    Neural Interdiction  0.82 ± 0.14          18.25 ± 2.73
SPI30    rGIN                 0.68 ± 0.25          20.5 ± 2.66
SPI30    GCN                  0.64 ± 0.29          26.25 ± 5.25
SPI30    Neural Interdiction  0.81 ± 0.17          19.75 ± 2.37
SPI100   rGIN                 0.72 ± 0.22          24.96 ± 5.24
SPI100   GCN                  0.56 ± 0.26          27.75 ± 6.38
SPI100   Neural Interdiction  0.75 ± 0.19          23.25 ± 4.18
Table 1: Results of end-to-end predictions. For the approximation ratio, higher is better for SPI problems and lower is better for MFI problems.

5.2 Predict-and-Search Results

Using the predict-and-search strategy, we initially predict interdiction decision distributions with MMILP-GNN, followed by searching for near-optimal solutions within a trust region derived from the prediction. We then assess Neural Interdiction’s performance against SCIP within a limited time frame. In the case of Neural Interdiction, the inference time comprises two periods: model inference and search. Model inference, conducted via PyTorch on GPUs, is relatively negligible compared to search time. The search phase, also executed through SCIP, can be contrasted with the solving time of the original problem.

Due to limited resources for labeling and training, we conducted experiments on 2,000 instances of 200-node maximum flow interdiction problems. Figure 3 presents a comparison of solving times between the original problem solved by SCIP and the predict-and-search approach. This figure is based on SCIP logs from running both SCIP alone and SCIP combined with Neural Interdiction. In most cases, the pre-solving time is very similar, but our proposed method begins to show advantages starting from 25-35 seconds. This illustrates that the MMILP-GNN model, trained on the training set, consistently offers reliable estimates of where the optimal solution may lie, so that the search strategy can start from there and find good solutions more quickly than solving the exact problem.

Figure 3: Averaged results visualization from 2,000 runs for each of the two methods.

6 Conclusions and Limitations

We introduce a GNN-based framework for solving network interdiction problems by modeling them as single-level MILPs and encoding them onto multipartite graphs. Using a multipartite GNN, we learn the mapping between the graph representation and solution distributions, allowing us to derive final discrete solutions through two strategies. To our knowledge, this is the first learning-based method for addressing network interdiction problems. We provide theoretical evidence demonstrating the representational power of our model while also elucidating how the method aligns with conventional algorithms capable of solving such problems. The main limitation of this work is that, although the experimental results show our model can learn from training data and perform well on several datasets, its performance and generalization ability are not yet sufficient to make it a reliable solver for network interdiction problems.

References

  • Bai et al. [2020] Yunsheng Bai, Hao Ding, Ken Gu, Yizhou Sun, and Wei Wang. Learning-based efficient graph similarity computation via multi-scale convolutional set matching. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 3219–3226, 2020.
  • Bestuzheva et al. [2021] Ksenia Bestuzheva, Mathieu Besançon, Wei-Kun Chen, Antonia Chmiela, Tim Donkiewicz, Jasper van Doornmalen, Leon Eifler, Oliver Gaul, Gerald Gamrath, Ambros Gleixner, Leona Gottwald, Christoph Graczyk, Katrin Halbig, Alexander Hoen, Christopher Hojny, Rolf van der Hulst, Thorsten Koch, Marco Lübbecke, Stephen J. Maher, Frederic Matter, Erik Mühmer, Benjamin Müller, Marc E. Pfetsch, Daniel Rehfeldt, Steffan Schlein, Franziska Schlösser, Felipe Serrano, Yuji Shinano, Boro Sofranac, Mark Turner, Stefan Vigerske, Fabian Wegscheider, Philipp Wellner, Dieter Weninger, and Jakob Witzig. The SCIP Optimization Suite 8.0. Technical report, Optimization Online, December 2021. URL http://www.optimization-online.org/DB_HTML/2021/12/8728.html.
  • Böther et al. [2022] Maximilian Böther, Otto Kißig, Martin Taraz, Sarel Cohen, Karen Seidel, and Tobias Friedrich. What’s wrong with deep learning in tree search for combinatorial optimization. arXiv preprint arXiv:2201.10494, 2022.
  • Cappart et al. [2021] Quentin Cappart, Didier Chételat, Elias B. Khalil, Andrea Lodi, Christopher Morris, and Petar Veličković. Combinatorial optimization and reasoning with graph neural networks. In Zhi-Hua Zhou, editor, Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 4348–4355. International Joint Conferences on Artificial Intelligence Organization, 8 2021. doi: 10.24963/ijcai.2021/595. URL https://doi.org/10.24963/ijcai.2021/595. Survey Track.
  • Chen et al. [2022a] Ziang Chen, Jialin Liu, Xinshang Wang, and Wotao Yin. On representing linear programs by graph neural networks. In The Eleventh International Conference on Learning Representations, 2022a.
  • Chen et al. [2022b] Ziang Chen, Jialin Liu, Xinshang Wang, and Wotao Yin. On representing linear programs by graph neural networks. In The Eleventh International Conference on Learning Representations, 2022b.
  • Cormen et al. [2022] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to algorithms. MIT press, 2022.
  • Cormican [1995] Kelly J Cormican. Computational methods for deterministic and stochastic network interdiction problems. Technical report, NAVAL POSTGRADUATE SCHOOL MONTEREY CA, 1995.
  • Du and Pardalos [2013] Ding-Zhu Du and Panos M Pardalos. Handbook of Combinatorial Optimization: Supplement Volume A, volume 1. Springer Science & Business Media, 2013.
  • Dudzik and Veličković [2022] Andrew J Dudzik and Petar Veličković. Graph neural networks are dynamic programmers. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 20635–20647. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/8248b1ded388fcdbbd121bcdfea3068c-Paper-Conference.pdf.
  • Fey et al. [2020] M. Fey, J. E. Lenssen, C. Morris, J. Masci, and N. M. Kriege. Deep graph matching consensus. In International Conference on Learning Representations (ICLR), 2020.
  • Gasse et al. [2019] Maxime Gasse, Didier Chételat, Nicola Ferroni, Laurent Charlin, and Andrea Lodi. Exact combinatorial optimization with graph convolutional neural networks. Advances in neural information processing systems, 32, 2019.
  • Han et al. [2022] Qingyu Han, Linxin Yang, Qian Chen, Xiang Zhou, Dong Zhang, Akang Wang, Ruoyu Sun, and Xiaodong Luo. A gnn-guided predict-and-search framework for mixed-integer linear programming. In The Eleventh International Conference on Learning Representations, 2022.
  • Han et al. [2023] Qingyu Han, Linxin Yang, Qian Chen, Xiang Zhou, Dong Zhang, Akang Wang, Ruoyu Sun, and Xiaodong Luo. A GNN-guided predict-and-search framework for mixed-integer linear programming. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=pHMpgT5xWaE.
  • Joshi et al. [2019] Chaitanya K Joshi, Thomas Laurent, and Xavier Bresson. An efficient graph convolutional network technique for the travelling salesman problem. arXiv preprint arXiv:1906.01227, 2019.
  • Kipf and Welling [2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=SJU4ayYgl.
  • Li et al. [2019] Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. Graph matching networks for learning the similarity of graph structured objects. In International conference on machine learning, pages 3835–3845. PMLR, 2019.
  • Li et al. [2018] Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Combinatorial optimization with graph convolutional networks and guided tree search. Advances in neural information processing systems, 31, 2018.
  • Nair et al. [2020] Vinod Nair, Sergey Bartunov, Felix Gimeno, Ingrid Von Glehn, Pawel Lichocki, Ivan Lobov, Brendan O’Donoghue, Nicolas Sonnerat, Christian Tjandraatmadja, Pengming Wang, et al. Solving mixed integer programs using neural networks. arXiv preprint arXiv:2012.13349, 2020.
  • Prates et al. [2019] Marcelo Prates, Pedro HC Avelar, Henrique Lemos, Luis C Lamb, and Moshe Y Vardi. Learning to solve np-complete problems: A graph neural network for decision tsp. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4731–4738, 2019.
  • Prouvost et al. [2020] Antoine Prouvost, Justin Dumouchelle, Lara Scavuzzo, Maxime Gasse, Didier Chételat, and Andrea Lodi. Ecole: A gym-like library for machine learning in combinatorial optimization solvers. In Learning Meets Combinatorial Algorithms at NeurIPS2020, 2020. URL https://openreview.net/forum?id=IVc9hqgibyB.
  • Sato et al. [2021] Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Random features strengthen graph neural networks. In Proceedings of the 2021 SIAM international conference on data mining (SDM), pages 333–341. SIAM, 2021.
  • Schäfer [2021] Luca E Schäfer. GitHub - Luca-Elias-Schaefer/Gurobi-Models: In this project, one can find an extensive collection of classical optimization problems along with a proposed solution procedure using Gurobi. — github.com. https://github.com/Luca-Elias-Schaefer/Gurobi-Models, 2021. [Accessed 21-05-2024].
  • Smith and Song [2020] J Cole Smith and Yongjia Song. A survey of network interdiction models and algorithms. European Journal of Operational Research, 283(3):797–811, 2020.
  • Wood [1993] R Kevin Wood. Deterministic network interdiction. Mathematical and Computer Modelling, 17(2):1–18, 1993.
  • Xu et al. [2019] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.
  • Xu et al. [2020] Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken ichi Kawarabayashi, and Stefanie Jegelka. What can neural networks reason about? In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rJxbJeHFPS.

Appendix A Proofs for Section 4

Our approach to measuring the representation power of MMILP-GNN is heavily inspired by existing research on the representation power of GNNs [27] and the separation power of GNNs in LP [5] and MILP [6]. We also prove Lemma 4.2 and Lemma 4.3 in the same way corresponding lemmas and theorems are proved in [5].

Proof of Lemma 4.2.

Define $p+1$ sets (not multisets) that collect different colors:

\mathbf{C}_{l-1}^{V}=\{C_{1}^{l-1,V},\dots,C_{m}^{l-1,V},\hat{C}_{1}^{l-1,V},\dots,\hat{C}_{m}^{l-1,V}\},

and

\mathbf{C}_{l-1}^{W_{k}}=\{C_{1}^{l-1,W_{k}},\dots,C_{n}^{l-1,W_{k}},\hat{C}_{1}^{l-1,W_{k}},\dots,\hat{C}_{n}^{l-1,W_{k}}\}.

We aim to prove by induction that for any $l\in\{0,1,\dots,L\}$, the following hold:

  • (i) $C_{i}^{l,V}=C_{i'}^{l,V}$ implies $h_{i}^{l,V}=h_{i'}^{l,V}$, for $1\leq i,i'\leq m$;

  • (ii) $\hat{C}_{i}^{l,V}=\hat{C}_{i'}^{l,V}$ implies $\hat{h}_{i}^{l,V}=\hat{h}_{i'}^{l,V}$, for $1\leq i,i'\leq m$;

  • (iii) $C_{i}^{l,V}=\hat{C}_{i'}^{l,V}$ implies $h_{i}^{l,V}=\hat{h}_{i'}^{l,V}$, for $1\leq i,i'\leq m$;

  • (iv) $C_{j}^{l,W_{k}}=C_{j'}^{l,W_{k}}$ implies $h_{j}^{l,W_{k}}=h_{j'}^{l,W_{k}}$, for $1\leq j,j'\leq n_{k}$;

  • (v) $\hat{C}_{j}^{l,W_{k}}=\hat{C}_{j'}^{l,W_{k}}$ implies $\hat{h}_{j}^{l,W_{k}}=\hat{h}_{j'}^{l,W_{k}}$, for $1\leq j,j'\leq n_{k}$;

  • (vi) $C_{j}^{l,W_{k}}=\hat{C}_{j'}^{l,W_{k}}$ implies $h_{j}^{l,W_{k}}=\hat{h}_{j'}^{l,W_{k}}$, for $1\leq j,j'\leq n_{k}$.

The above claims (i)-(vi) are clearly true for $l=0$ due to the injectivity of $\text{HASH}_{V}^{0}$ and $\text{HASH}_{W_{k}}^{0}$. Now we assume that (i)-(vi) are true for some $l-1\in\{0,1,\dots,L-1\}$. Suppose that $C_{i}^{l,V}=C_{i'}^{l,V}$, i.e.,

\text{HASH}_{V}^{l}\left(C_{i}^{l-1,V},\sum_{k=0}^{p}\sum_{j=1}^{n_{k}}E_{i,j}^{W_{k},V}\,\text{HASH}_{W_{k}}^{l'}\left(C_{j}^{l-1,W_{k}}\right)\right)=\text{HASH}_{V}^{l}\left(C_{i'}^{l-1,V},\sum_{k=0}^{p}\sum_{j=1}^{n_{k}}E_{i',j}^{W_{k},V}\,\text{HASH}_{W_{k}}^{l'}\left(C_{j}^{l-1,W_{k}}\right)\right),

for some $1\leq i,i'\leq m$. It follows from the injectivity of $\text{HASH}_{V}^{l}$ that

C_{i}^{l-1,V}=C_{i'}^{l-1,V}, \qquad (12)

and

\sum_{k=0}^{p}\sum_{j=1}^{n_{k}}E_{i,j}^{W_{k},V}\,\text{HASH}_{W_{k}}^{l'}\left(C_{j}^{l-1,W_{k}}\right)=\sum_{k=0}^{p}\sum_{j=1}^{n_{k}}E_{i',j}^{W_{k},V}\,\text{HASH}_{W_{k}}^{l'}\left(C_{j}^{l-1,W_{k}}\right).

According to the linear independence property of $\text{HASH}_{W_{k}}^{l'}$, the above equation implies that

\sum_{C_{j}^{l-1,W_{k}}=C}E_{i,j}=\sum_{C_{j}^{l-1,W_{k}}=C}E_{i',j},\quad\forall\,C\in\mathbf{C}_{l-1}^{W_{k}}. \qquad (13)

Note that the induction assumption guarantees that $h_{j}^{l-1,W_{k}}=h_{j'}^{l-1,W_{k}}$ as long as $C_{j}^{l-1,W_{k}}=C_{j'}^{l-1,W_{k}}$. So one can assign to each $C\in\mathbf{C}_{l-1}^{W_{k}}$ some $h(C)\in\mathbb{R}^{d_{l-1}}$ such that $h_{j}^{l-1,W_{k}}=h(C)$ as long as $C_{j}^{l-1,W_{k}}=C$ for any $1\leq j\leq n_{k}$. Therefore, it follows from (13) that

\sum_{j=1}^{n}E_{i,j}f_{l}^{W_{k}}(h_{j}^{l-1,W_{k}})=\sum_{C\in\mathbf{C}_{l-1}^{W_{k}}}\sum_{C_{j}^{l-1,W_{k}}=C}E_{i,j}f_{l}^{W_{k}}(h(C)),
\sum_{j=1}^{n}E_{i',j}f_{l}^{W_{k}}(h_{j}^{l-1,W_{k}})=\sum_{C\in\mathbf{C}_{l-1}^{W_{k}}}\sum_{C_{j}^{l-1,W_{k}}=C}E_{i',j}f_{l}^{W_{k}}(h(C)),
\sum_{j=1}^{n}E_{i,j}f_{l}^{W_{k}}(h_{j}^{l-1,W_{k}})=\sum_{j=1}^{n}E_{i',j}f_{l}^{W_{k}}(h_{j}^{l-1,W_{k}}).

Note also that (12) and the induction assumption lead to $h_{i}^{l-1,V}=h_{i'}^{l-1,V}$. Then one can conclude that

h_{i}^{l,V}=g_{l}^{V}\left(h_{i}^{l-1,V},\sum_{k=0}^{p}\sum_{j=1}^{n_{k}}E_{i,j}^{W_{k},V}f_{l}^{W_{k}}(h_{j}^{l-1,W_{k}})\right)=g_{l}^{V}\left(h_{i'}^{l-1,V},\sum_{k=0}^{p}\sum_{j=1}^{n_{k}}E_{i',j}^{W_{k},V}f_{l}^{W_{k}}(h_{j}^{l-1,W_{k}})\right)=h_{i'}^{l,V}.

This proves claim (i) for $l$. The other five claims can be proved using similar arguments.

Therefore, we obtain from $(G,H)\sim(\hat{G},\hat{H})$ that

\left\{\left\{h_{1}^{L,V},h_{2}^{L,V},\dots,h_{m}^{L,V}\right\}\right\}=\left\{\left\{\hat{h}_{1}^{L,V},\hat{h}_{2}^{L,V},\dots,\hat{h}_{m}^{L,V}\right\}\right\},

and that

\left\{\left\{h_{1}^{L,W_{k}},h_{2}^{L,W_{k}},\dots,h_{n}^{L,W_{k}}\right\}\right\}=\left\{\left\{\hat{h}_{1}^{L,W_{k}},\hat{h}_{2}^{L,W_{k}},\dots,\hat{h}_{n}^{L,W_{k}}\right\}\right\}.

By the definition of the output layer, the above conclusion guarantees that $F(G,H)=\sigma(F(\hat{G},\hat{H}))$ for some $\sigma\in S_{n}$. ∎

Proof of Lemma 4.3.

It suffices to prove that, if $(G,H)$ can be distinguished from $(\hat{G},\hat{H})$ by the WL test, then there exists $F\in\mathcal{F}$ such that $F(G,H)\neq F(\hat{G},\hat{H})$. The distinguishability of the WL test implies that there exist $L\in\mathbb{N}$ and hash functions $\text{HASH}_{0,V}$, $\text{HASH}_{0,W}$, $\text{HASH}_{l,V}$, $\text{HASH}_{l,W}$, $\text{HASH}_{l,V}'$, and $\text{HASH}_{l,W}'$, for $l=1,2,\dots,L$, such that

\left\{\left\{C_{1}^{L,V},C_{2}^{L,V},\dots,C_{m}^{L,V}\right\}\right\}\neq\left\{\left\{\hat{C}_{1}^{L,V},\hat{C}_{2}^{L,V},\dots,\hat{C}_{m}^{L,V}\right\}\right\}, \qquad (14)

or

\left\{\left\{C_{1}^{L,W},C_{2}^{L,W},\dots,C_{n}^{L,W}\right\}\right\}\neq\left\{\left\{\hat{C}_{1}^{L,W},\hat{C}_{2}^{L,W},\dots,\hat{C}_{n}^{L,W}\right\}\right\}. \qquad (15)

We aim to construct some GNNs such that the following hold for any $l=0,1,\dots,L$:

  • (i) $h_{i}^{l,V}=h_{i'}^{l,V}$ implies $C_{i}^{l,V}=C_{i'}^{l,V}$, for $1\leq i,i'\leq m$;

  • (ii) $\hat{h}_{i}^{l,V}=\hat{h}_{i'}^{l,V}$ implies $\hat{C}_{i}^{l,V}=\hat{C}_{i'}^{l,V}$, for $1\leq i,i'\leq m$;

  • (iii) $h_{i}^{l,V}=\hat{h}_{i'}^{l,V}$ implies $C_{i}^{l,V}=\hat{C}_{i'}^{l,V}$, for $1\leq i,i'\leq m$;

  • (iv) $h_{j}^{l,W_{k}}=h_{j'}^{l,W_{k}}$ implies $C_{j}^{l,W_{k}}=C_{j'}^{l,W_{k}}$, for $1\leq j,j'\leq n$;

  • (v) $\hat{h}_{j}^{l,W_{k}}=\hat{h}_{j'}^{l,W_{k}}$ implies $\hat{C}_{j}^{l,W_{k}}=\hat{C}_{j'}^{l,W_{k}}$, for $1\leq j,j'\leq n$;

  • (vi) $h_{j}^{l,W_{k}}=\hat{h}_{j'}^{l,W_{k}}$ implies $C_{j}^{l,W_{k}}=\hat{C}_{j'}^{l,W_{k}}$, for $1\leq j,j'\leq n$.

It is clear that the above conditions (i)-(vi) hold for $l=0$ as long as we choose $f_{\mathrm{in}}^{V'}$ that is injective on the following $p+1$ sets (not multisets) respectively: $\{h_{1}^{V},\dots,h_{m}^{V},\hat{h}_{1}^{V},\dots,\hat{h}_{m}^{V}\}$ and $\{h_{1}^{W_{k}},\dots,h_{n}^{W_{k}},\hat{h}_{1}^{W_{k}},\dots,\hat{h}_{n}^{W_{k}}\}$. We then assume that (i)-(vi) hold for some $0\leq l-1<L$, and show that these conditions are also satisfied for $l$ if we choose $f_{l}^{V},f_{l}^{W},g_{l}^{V},g_{l}^{W}$ properly. Let us consider the set (not multiset)

\left\{\alpha_{1},\alpha_{2},\dots,\alpha_{s}\right\}\subset\mathbb{R}^{d_{l-1}}

that collects all different values in $h_{1}^{l-1,W},h_{2}^{l-1,W},\dots,h_{n}^{l-1,W},\hat{h}_{1}^{l-1,W},\hat{h}_{2}^{l-1,W},\dots,\hat{h}_{n}^{l-1,W}$. Let $d_{l}\geq s$ and let $e_{p}^{d_{l}}=(0,\dots,0,1,0,\dots,0)$ be the vector in $\mathbb{R}^{d_{l}}$ with the $p$-th entry being $1$ and all other entries being $0$, for $1\leq p\leq s$. Choose $f_{l}^{W}:\mathbb{R}^{d_{l-1}}\rightarrow\mathbb{R}^{d_{l}}$ as a continuous function satisfying $f_{l}^{W}(\alpha_{p})=e_{p}^{d_{l}}$, $p=1,2,\dots,s$, and choose $g_{l}^{V}:\mathbb{R}^{d_{l-1}}\times\mathbb{R}^{d_{l}}\rightarrow\mathbb{R}^{d_{l}}$ that is continuous and is injective when restricted to the set (not multiset)

\left\{\left(h_{i}^{l-1,V},\sum_{j=1}^{n}E_{i,j}f_{l}^{W}(h_{j}^{l-1,W})\right):1\leq i\leq m\right\}\cup\left\{\left(\hat{h}_{i}^{l-1,V},\sum_{j=1}^{n}\hat{E}_{i,j}f_{l}^{W}(\hat{h}_{j}^{l-1,W})\right):1\leq i\leq m\right\}.

Noticing that

\sum_{j=1}^{n}E_{i,j}f_{l}^{W}(h_{j}^{l-1,W})=\sum_{p=1}^{s}\left(\sum_{h_{j}^{l-1,W}=\alpha_{p}}E_{i,j}\right)e_{p}^{d_{l}},

and that $\{e_{1}^{d_{l}},e_{2}^{d_{l}},\dots,e_{s}^{d_{l}}\}$ is linearly independent, one can conclude that $h_{i}^{l,V}=h_{i'}^{l,V}$ if and only if $h_{i}^{l-1,V}=h_{i'}^{l-1,V}$ and $\sum_{j=1}^{n}E_{i,j}f_{l}^{W}(h_{j}^{l-1,W})=\sum_{j=1}^{n}E_{i',j}f_{l}^{W}(h_{j}^{l-1,W})$, where the second condition is equivalent to

\sum_{h_{j}^{l-1,W}=\alpha_{p}}E_{i,j}=\sum_{h_{j}^{l-1,W}=\alpha_{p}}E_{i',j},\quad\forall\,p\in\{1,2,\dots,s\}.

This, as well as condition (iv) for $l-1$, implies that

\sum_{j=1}^{n}E_{i,j}\,\text{HASH}_{l,W_{k}}'\left(C_{j}^{l-1,W}\right)=\sum_{j=1}^{n}E_{i',j}\,\text{HASH}_{l,W_{k}}'\left(C_{j}^{l-1,W}\right),

and hence that $C_{i}^{l,V}=C_{i'}^{l,V}$ by using $h_{i}^{l-1,V}=h_{i'}^{l-1,V}$ and condition (i) for $l-1$. Therefore, we know that (i) is satisfied for $l$, and one can show (ii) and (iii) for $l$ using similar arguments by taking $d_{l}$ large enough. In addition, $f_{l}^{V}$ and $g_{l}^{W}$ can also be chosen in a similar way such that (iv)-(vi) are satisfied for $l$.

Combining (14), (15), and conditions (i)-(vi) for $L$, we obtain that

\left\{\left\{h_{1}^{L,V},h_{2}^{L,V},\dots,h_{m}^{L,V}\right\}\right\}\neq\left\{\left\{\hat{h}_{1}^{L,V},\hat{h}_{2}^{L,V},\dots,\hat{h}_{m}^{L,V}\right\}\right\}, \qquad (16)

or

\left\{\left\{h_{1}^{L,W_{k}},h_{2}^{L,W_{k}},\dots,h_{n}^{L,W_{k}}\right\}\right\}\neq\left\{\left\{\hat{h}_{1}^{L,W_{k}},\hat{h}_{2}^{L,W_{k}},\dots,\hat{h}_{n}^{L,W_{k}}\right\}\right\}.

Without loss of generality, we can assume that (16) holds.

Consider the set (not multiset)

\{\beta_{1},\beta_{2},\dots,\beta_{t}\}\subset\mathbb{R}^{d_{L}},

that collects all different values in $h_{1}^{L,V},h_{2}^{L,V},\dots,h_{m}^{L,V},\hat{h}_{1}^{L,V},\hat{h}_{2}^{L,V},\dots,\hat{h}_{m}^{L,V}$. Let $k>1$ be a positive integer that is greater than the maximal multiplicity of an element in the multisets $\{\{h_{1}^{L,V},h_{2}^{L,V},\dots,h_{m}^{L,V}\}\}$ and $\{\{\hat{h}_{1}^{L,V},\hat{h}_{2}^{L,V},\dots,\hat{h}_{m}^{L,V}\}\}$. There exists a continuous function $\varphi:\mathbb{R}^{d_{L}}\rightarrow\mathbb{R}$ such that $\varphi(\beta_{q})=k^{q}$ for $q=1,2,\dots,t$, and due to (16) and the fact that the way of writing an integer as a $k$-ary expression is unique, it hence holds that

\sum_{i=1}^{m}\varphi(h_{i}^{L,V})\neq\sum_{i=1}^{m}\varphi(\hat{h}_{i}^{L,V}).

Set the dimension of the $(L+1)$-th layer as $1$: $d_{L+1}=1$,

f_{\text{out}}\left(\sum_{i=1}^{m}h_{i}^{L+1,V},\sum_{j=1}^{n}h_{j}^{L+1,W}\right)=\sum_{i=1}^{m}\varphi(h_{i}^{L,V})\neq\sum_{i=1}^{m}\varphi(\hat{h}_{i}^{L,V})=f_{\text{out}}\left(\sum_{i=1}^{m}\hat{h}_{i}^{L+1,V},\sum_{j=1}^{n}\hat{h}_{j}^{L+1,W}\right),

which guarantees the existence of $F\in\mathcal{F}_{\text{GNN}}$ that has $L+1$ layers and satisfies $F(G,H)\neq F(\hat{G},\hat{H})$. ∎

Appendix B Data Generation

In this study, without loss of generality, we concentrate on two specific network interdiction problems: the shortest path interdiction problem and the maximum flow interdiction problem. For simplicity, we adopt most of the hyperparameter settings from the project by Schäfer [23]. The budget constraint for each dataset is specified below.

For the shortest path network interdiction problem, after preprocessing, we solve the problem defined in Eq. (5). For each instance, we generate a fully connected directed graph and assign a cost and a capacity to each edge. The source node is set as the first node, and the sink node is set as the $N_{v}$-th node in the network, where $N_{v}$ is the total number of nodes. The cost on each edge is sampled uniformly between $1.0$ and $10.0$, and the capacity is sampled uniformly between $20.0$ and $70.0$. The budget constraints for the three synthetic datasets (SPI20, SPI30, and SPI100) are set to $15$, $15$, and $30$, respectively.

Similarly, for the maximum flow network interdiction problem, after preprocessing, we solve the problem in Eq. (17). For each instance, we generate a fully connected directed graph and assign a cost and a capacity to each edge. The source node is set as the first node, and the sink node is set as the $N_{v}$-th node in the network, where $N_{v}$ is the total number of nodes. The cost on each edge is sampled uniformly between $1.0$ and $10.0$, and the capacity is sampled uniformly between $10.0$ and $60.0$. The budget constraint for all three synthetic datasets (MFI20, MFI30, and MFI100) is set to $15$.

\min_{\alpha,\beta,\gamma}\ \sum_{(i,j)\in A}u_{i,j}\beta_{i,j},\quad\text{s.t.}\quad\alpha_{i}-\alpha_{j}+\beta_{i,j}+\gamma_{i,j}\geq 0\ \ \forall(i,j)\in A,\quad\alpha_{t}-\alpha_{s}\geq 1,\quad\sum_{(i,j)\in A}r_{i,j}\gamma_{i,j}\leq R,\quad\alpha_{i}\in\{0,1\}\ \forall i\in N,\quad\beta_{i,j},\gamma_{i,j}\in\{0,1\}\ \forall(i,j)\in A. \qquad (17)
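A sketch of this generation procedure (our own code following the stated parameter ranges; the paper's generator, adapted from Schäfer [23], may differ in details). The defaults below match the MFI settings; the SPI datasets use capacities in (20.0, 70.0) and budgets 15, 15, and 30.

```python
# Sketch: generate one fully connected directed instance with random costs/capacities.
import random

def generate_instance(num_nodes, cost_rng=(1.0, 10.0), cap_rng=(10.0, 60.0), budget=15):
    arcs, cost, capacity = [], {}, {}
    for i in range(num_nodes):
        for j in range(num_nodes):
            if i != j:                                   # fully connected directed graph
                arcs.append((i, j))
                cost[(i, j)] = random.uniform(*cost_rng)
                capacity[(i, j)] = random.uniform(*cap_rng)
    return {"arcs": arcs, "cost": cost, "capacity": capacity,
            "source": 0, "sink": num_nodes - 1, "budget": budget}
```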

Appendix C The Predict-and-Search Strategy

The predict-and-search strategy, as employed in the work of Han et al. [13], involves predicting solution distributions and subsequently conducting a search for near-optimal solutions within a trust region derived from the predictions. The algorithm detailing this approach is outlined in Algorithm 2.

Algorithm 2 Predict-and-search Algorithm

Parameter: sizes $\{k_{0},k_{1}\}$, radius of the neighborhood $\Delta$
Input: instance $M$, probability prediction $F_{\theta}(M)$
Output: solution $x\in\mathbb{R}^{n}$

1: Sort the components in $F_{\theta}(M)$ from smallest to largest to obtain sets $I_{0}$ and $I_{1}$.
2: for $d=1:n$ do
3:     if $d\in I_{0}\cup I_{1}$ then
4:         create binary variable $\delta_{d}$
5:         if $d\in I_{0}$ then
6:             create constraint $x_{d}\leq\delta_{d}$
7:         else
8:             create constraint $1-x_{d}\leq\delta_{d}$
9:         end if
10:     end if
11: end for
12: create constraint $\sum_{d\in I_{0}\cup I_{1}}\delta_{d}\leq\Delta$
13: Let $M'$ denote the instance $M$ with the new constraints and variables
14: Let $x=\mathrm{SOLVE}(M')$
15: return $x$
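A PySCIPOpt sketch (ours) of the trust-region construction in Algorithm 2; variable and parameter names are illustrative, and the time limit is an assumption.

```python
# Sketch: add fix-or-flip indicators delta_d around the predicted assignment and
# bound their sum by the trust-region radius Delta before re-solving with SCIP.
from pyscipopt import Model, quicksum

def predict_and_search(model: Model, x_vars, pred, k0, k1, Delta, time_limit=60):
    order = sorted(range(len(pred)), key=lambda d: pred[d])
    I0 = order[:k0]                       # variables most likely to be 0
    I1 = order[len(order) - k1:]          # variables most likely to be 1
    deltas = []
    for d in I0 + I1:
        delta = model.addVar(vtype="B", name=f"delta_{d}")
        deltas.append(delta)
        if d in I0:
            model.addCons(x_vars[d] <= delta)        # x_d may leave 0 only if delta_d = 1
        else:
            model.addCons(1 - x_vars[d] <= delta)    # x_d may leave 1 only if delta_d = 1
    model.addCons(quicksum(deltas) <= Delta)         # trust region
    model.setParam("limits/time", time_limit)
    model.optimize()
    return [model.getVal(v) for v in x_vars]
```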

Appendix D SCIP Parameter Settings

The parameter settings for the SCIP solver are shown in Table 2.

Parameter Value
randomization/randomseedshift 0
randomization/lpseed 0
randomization/permutationseed 0
separating/maxrounds 0
presolving/maxrestarts 0
Heuristics AGGRESSIVE
Table 2: SCIP solver parameter settings.

Appendix E Other Experiment Settings

The implementation of our MMILP-GNN is modified from the branching-imitation module in the Ecole package with PyTorch [21]. We use Adam as the optimizer with a learning rate of 0.0001. All experiments were conducted on a desktop with one Intel 12900K CPU and one Nvidia Tesla V100 GPU, with 64GB RAM.
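For completeness, a minimal training-loop sketch matching the stated optimizer settings (our illustration; the choice of a binary cross-entropy loss against the optimal interdiction labels is an assumption, and model/loader are placeholders):

```python
import torch

def train(model, loader, epochs=100):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.BCELoss()                 # labels: optimal interdiction decisions
    for _ in range(epochs):
        for graph, labels in loader:
            opt.zero_grad()
            pred = model(graph)                  # per-edge interdiction probabilities
            loss = loss_fn(pred, labels)
            loss.backward()
            opt.step()
```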