A Customized Augmented Lagrangian Method for Block-Structured Integer Programming
Abstract
Integer programming with block structures has received considerable attention recently and is widely used in many practical applications such as train timetabling and vehicle routing problems. It is known to be NP-hard due to the presence of integer variables. We define a novel augmented Lagrangian function by directly penalizing the inequality constraints and establish strong duality between the primal problem and the augmented Lagrangian dual problem. Then, a customized augmented Lagrangian method is proposed to exploit the block structure. In particular, the minimization of the augmented Lagrangian function is decomposed into multiple subproblems by decoupling the linking constraints, and these subproblems can be efficiently solved using the block coordinate descent method. We also establish the convergence of the proposed method. To make the algorithm more practical, we further introduce several refinement techniques to identify high-quality feasible solutions. Numerical experiments on a few interesting scenarios show that our proposed algorithm often achieves a satisfactory solution and is quite effective.
Index Terms:
Integer programming, augmented Lagrangian method, block coordinate descent, convergence

1 Introduction
In this paper, we consider a block-structured integer programming problem:
$$
\begin{aligned}
\min_{x}\quad & c^{\top}x, & \text{(1a)}\\
\text{s.t.}\quad & Ax \le b, & \text{(1b)}\\
& x_j \in X_j,\ j = 1, 2, \dots, p, & \text{(1c)}
\end{aligned}
$$
where $x_j \in \{0,1\}^{n_j}$ is the $j$-th block variable of $x$, i.e., $x = (x_1^{\top}, x_2^{\top}, \dots, x_p^{\top})^{\top}$ with $n = \sum_{j=1}^{p} n_j$. In (1), $c \in \mathbb{R}^{n}$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^{m}$, and the constraint set $X_j$ is the set of binary vectors in a polyhedron, i.e., $X_j = \{x_j \in \{0,1\}^{n_j} : B_j x_j \le d_j\}.$
The constraints (1c) can be reformulated as $Bx \le d$, where the block diagonal matrix $B = \mathrm{diag}(B_1, \dots, B_p)$ is formed by the small submatrices $B_j$ as the main diagonal and $d = (d_1^{\top}, \dots, d_p^{\top})^{\top}$. Correspondingly, $A$ and $c$ can be rewritten as $A = [A_1, A_2, \dots, A_p]$ with $A_j \in \mathbb{R}^{m \times n_j}$ and $c = (c_1^{\top}, \dots, c_p^{\top})^{\top}$ with $c_j \in \mathbb{R}^{n_j}$.
Assume that these constraints (1c) are “nice” in the sense that an integer program with just these constraints is easy. Therefore, if the coupling constraints (1b) are ignored, the remaining problem, which is only composed of the constraints (1c), is easier to solve than the original problem (1). For convenience, we assume each $X_j$ is not empty and (1) is feasible. Denote by $f^{*}$ the optimal value of the problem (1). This block structure is closely related to an important model known as “$n$-fold integer programming (IP)” studied extensively in computer vision [1, 2], machine learning [3, 4] and theoretical computer science [5, 6], etc. The theoretical foundations of $n$-fold IPs have significant implications for efficient algorithm development in various fields. For example, an algorithmic theory of integer programming based on $n$-fold IP was proposed in [5]. Recent advancements have been provided in [6]. Furthermore, the progress in theory and application of integer programming with a block structure was summarized in [7, 8]. While existing works on $n$-fold IP mainly focus on asymptotic analyses, this work aims to develop an efficient augmented Lagrangian approach tailored to the block structure for practical efficiency with a convergence guarantee.
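As a concrete illustration of this block structure, the following sketch (with small hypothetical blocks of our own; `block_diag`, `B1`, `B2` are illustrative names, not from the paper) assembles the block-diagonal matrix formed from the submatrices described above and checks a candidate binary point against the stacked constraints:

```python
import numpy as np

def block_diag(*blocks):
    # Place each submatrix B_j on the main diagonal of B, as in the
    # reformulation of (1c) into one stacked system B x <= d.
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    B = np.zeros((rows, cols))
    r = c = 0
    for blk in blocks:
        B[r:r + blk.shape[0], c:c + blk.shape[1]] = blk
        r += blk.shape[0]
        c += blk.shape[1]
    return B

B1 = np.array([[1.0, 1.0]])              # constraints of block x_1 (n_1 = 2)
B2 = np.array([[1.0, 0.0], [0.0, 1.0]])  # constraints of block x_2 (n_2 = 2)
d = np.array([1.0, 1.0, 1.0])

B = block_diag(B1, B2)                   # shape (3, 4), one column group per block
x = np.array([1, 0, 1, 1])               # candidate binary point (x_1, x_2)
feasible = bool(np.all(B @ x <= d))
```

Because each row of the stacked system touches only one block's columns, feasibility with respect to (1c) can be checked block by block, which is what the decomposition methods below exploit.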
1.1 Related Work
The branch and bound algorithm for general integer programming (IP) was first introduced by Land and Doig [9]. Gomory [10] developed a cutting plane algorithm for integer programming problems. These two approaches are at the heart of current state-of-the-art software for integer programming. Unfortunately, these methods often suffer from a high computational burden due to the discrete constraints, so they are not good choices for solving some large-scale practical problems. Therefore, it is necessary to develop efficient approaches that obtain feasible and desirable solutions, even if these may not be globally optimal. Considering the block structure of the integer linear program (1), a natural idea is to decompose a large global problem into smaller local subproblems. There are three typical decomposition methods for solving such problems: Benders [11], Dantzig-Wolfe (DW) [12] and Lagrangian decompositions [13]. The Benders decomposition method is used to deal with mixed integer programming problems by decomposing them into a master problem and a primal subproblem. It generates cuts that are added to the master problem by solving the dual of the primal subproblem. This method is not suitable for (1), since our primal subproblem is an integer programming problem, which is still NP-hard. The DW decomposition method can solve the block-structured integer programming problem (1) based on the resolution theorem. To improve the tractability of large-scale problems, the DW decomposition relies on column generation. However, it might be harder or even intractable to solve the master problem by column generation if further constraints are applied to $X_j$ [14, Chapter 8.2]. The Lagrangian decomposition introduces Lagrange multipliers and constructs a sequence of simpler subproblems. Although the method itself has a limitation due to its inherent solution symmetry for certain practical problems, the Lagrangian duality bound is useful in the branch-and-bound procedure.
A Lagrangian heuristic algorithm has also been proposed in [15] for solving a real-world train timetabling problem, which decomposed the problem into smaller subproblems using Lagrangian relaxation and developed a heuristic algorithm based on subgradient optimization to generate feasible solutions.
Many techniques from continuous optimization, including the alternating direction method of multipliers (ADMM), have recently been applied to integer programming. The authors in [16] proposed an $\ell_p$-box ADMM for solving binary integer programs, where the binary constraint is replaced by the intersection of a box and an $\ell_p$-norm sphere. The authors in [17] proposed an augmented Lagrangian method (ALM) and an ADMM based on the augmented Lagrangian function for two-block mixed integer linear programming (MILP). For the multi-block MILP problem, an extended ADMM was proposed in [18] that employs a release-and-fix approach to solve the subproblems. There are also some heuristic methods based on ADMM [19, 20, 21, 22] that have been successfully applied to various practical integer programming problems.
There are some other widespread approaches based on relaxations of the binary constraint, including linear programming (LP) relaxation [23, 24] and semidefinite relaxation [25, 26]. For the multi-block MILP problem, several inexact methods have been proposed, including a distributed algorithm relying on primal decomposition [27], a dual decomposition method [28], and a decomposition-based outer approximation method [29]. By introducing continuous variables to replace the discrete variables, exact penalty methods [30, 31, 32] have been studied for solving nonlinear IP problems. The problem is then transformed into an equivalent nonlinear continuous optimization problem.
1.2 Contributions
In this paper, we propose a customized ALM for solving (1). Our main contributions are listed below.
- (i) We define a novel augmented Lagrangian (AL) function by directly penalizing the inequality constraints (1b) and establish strong duality between the primal problem (1) and the AL dual problem.
- (ii) Based on the special structure of (1), we propose two block coordinate descent (BCD)-type methods that are well-suited for solving the resulting subproblems in our ALM framework. These methods utilize classical update and proximal linear update techniques, denoted as ALM-C and ALM-P, respectively. We also analyze their convergence properties under proper conditions.
- (iii) To address challenges in finding the global optimal solution for practical problems such as train timetabling, we introduce refinement strategies and propose a customized ALM to enhance the quality of solutions generated by the ALM. Our numerical experiments demonstrate the effectiveness of the proposed method in solving large-scale instances.
Note that the ADMM-based method in [21] solves each subproblem only once per iteration for a fixed Lagrangian multiplier. On the other hand, our ALM method involves multiple iterations using the BCD method to minimize each AL function until certain rules are satisfied. Once the AL subproblem is solved exactly, the strong duality guarantees that the ALM can converge to a global minimizer of the problem (1). Therefore, achieving high accuracy in minimizing the AL function allows our ALM to achieve superior solutions with fewer iterations, resulting in significantly reduced computational time compared to the ADMM. This claim is supported by our numerical tests. Additionally, we introduce Assumptions 3.1 and 3.2, derived from the structural characteristics of the practical problem in [20] and [21], and subsequently analyze the theoretical properties of the ALM-C method under these assumptions. However, our ALM-P method has wider applicability and does not require these specific assumptions. Moreover, the ALM can be regarded as a dual ascent algorithm with respect to dual variables, hence ensuring its convergence. In contrast, the convergence analysis of the ADMM for solving this kind of problem remains unclear.
The existing ALM-based methods for solving integer programming in [33] mainly focused on the duality of the augmented Lagrangian dual for MILP with equality constraints, but numerical experiments were not provided. Moreover, our approach differs significantly in that we handle inequality constraints directly, without introducing slack variables. This approach has two key benefits: it reduces the number of variables, thus decreasing the computational burden in high-dimensional cases, and it allows for a more customized algorithmic design based on the inherent structure of the problem. In [17], the $\ell_1$ norm was considered as the augmented term in the AL function for MILP, so that the minimization of the AL function can be decomposed into multiple low-dimensional $\ell_1$-penalty subproblems due to the separable blocks. These subproblems were then solved in parallel using the Gurobi solver. In our work, we take a different approach by using a quadratic term in the AL function, which allows the augmented Lagrangian subproblem to be reduced to a linear program under certain conditions. Furthermore, we update the blocks of variables sequentially, one at a time.
1.3 Notation and Organization
Let $\mathbb{R}^{n}$, $\mathbb{Z}^{n}$ and $\{0,1\}^{n}$ denote the sets of $n$-dimensional real, integer and binary vectors, respectively. The superscript $\top$ means “transpose”. Denote by $a_i^{\top}$ the $i$-th row of the matrix $A$ and by $b_i$ the $i$-th element of the vector $b$. For convenience, we let $A_{\mathcal{J}}$ denote the submatrix consisting of columns of $A$ indexed by $\mathcal{J}$ and $x_{\mathcal{J}}$ denote the subvector consisting of entries of $x$ indexed by $\mathcal{J}$. A neighborhood of a point $\bar{x}$ is a set $\mathcal{N}(\bar{x})$ consisting of all points $x$ such that $\|x - \bar{x}\| \le \epsilon$. Let $e$ be a row vector of all ones.
This paper is structured as follows. An AL function is defined and the strong duality of the problem (1) is discussed in section 2. We propose a customized ALM incorporating BCD methods and refinement strategies to improve the quality of the ALM solutions in section 3. We establish the convergence results of both the BCD methods to minimize the AL function and the ALM applied to the whole problem (1) in section 4. The proposed method is applied to two practical problems in section 5. Concluding remarks are made in the last section.
2 The AL Strong Duality
The Lagrangian relaxation (LR) of (1) with respect to the constraint (1b) has the following form:
$$\min_{x \in X}\ c^{\top}x + \lambda^{\top}(Ax - b), \tag{2}$$
where $X := X_1 \times X_2 \times \cdots \times X_p$ and $\lambda \ge 0$ is a Lagrange multiplier associated with the constraint (1b). We can observe that (2) is much easier to solve than the original problem (1) since (2) can be decomposed into $p$ low-dimensional independent subproblems. However, there may exist a non-zero duality gap when certain constraints are relaxed by using the classical Lagrangian dual [33]. To reduce the duality gap, we add a quadratic penalty function to the Lagrangian function in (2) and solve the augmented Lagrangian dual problem, which is defined as follows.
Definition 2.1 (AL Dual)
We define an AL function by
$$L(x, \lambda, \rho) := c^{\top}x + \lambda^{\top}(Ax - b) + \frac{\rho}{2}\,\big\|\max(Ax - b,\, 0)\big\|^{2}, \tag{3}$$
where the maximum is taken componentwise and $\rho \ge 0$ is a penalty parameter. The corresponding AL relaxation of (1) is given by
$$d(\lambda, \rho) := \min_{x \in X}\ L(x, \lambda, \rho). \tag{4}$$
We call the following maximization problem the AL dual problem:
$$\max_{\lambda \ge 0,\ \rho \ge 0}\ d(\lambda, \rho). \tag{5}$$
Note that the classical AL function of (1) is given by
$$c^{\top}x + \frac{\rho}{2}\,\Big\|\max\Big(Ax - b + \frac{\lambda}{\rho},\, 0\Big)\Big\|^{2} - \frac{\|\lambda\|^{2}}{2\rho}. \tag{6}$$
We prefer using the form of the AL function (3) rather than the classical version (6) due to the absence of $\lambda$ inside the quadratic term of the max function. This makes it possible to convert the AL function into a linear function under certain conditions, making the problem (4) easier to solve, which will be explained in the next section. We also verify that the quadratic term in (3) is an exact penalty and that strong duality holds between the AL dual problem (5) and the primal problem (1).
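To make the role of the penalty concrete, the following sketch (toy data of our own; `aug_lagrangian` is an illustrative name, not the paper's code) evaluates the AL function (3) at a feasible point and at a point violating the coupling constraint, showing that the quadratic term activates only on violations:

```python
import numpy as np

def aug_lagrangian(x, c, A, b, lam, rho):
    # AL function (3): c^T x + lam^T (A x - b) + (rho/2) ||max(A x - b, 0)||^2
    r = A @ x - b
    return c @ x + lam @ r + 0.5 * rho * np.sum(np.maximum(r, 0.0) ** 2)

# Hypothetical instance: one coupling constraint x_1 + x_2 <= 1.
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
lam = np.array([0.5])

L_feas = aug_lagrangian(np.array([1.0, 0.0]), c, A, b, lam, rho=10.0)  # feasible point
L_viol = aug_lagrangian(np.array([1.0, 1.0]), c, A, b, lam, rho=10.0)  # violates (1b)
```

At the feasible point the penalty vanishes and the AL value stays below the objective; at the infeasible point the quadratic term dominates as the penalty parameter grows, which is the mechanism behind the exact-penalty property.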
Lemma 2.1 (Strong Duality)
Suppose the problem (1) is feasible and its optimal value $f^{*}$ is bounded. If a minimum achievable non-zero slack exists, i.e., there is a $\delta > 0$ such that for any $x \in X$ with $Ax \not\le b$,
$$\big\|\max(Ax - b,\, 0)\big\| \ge \delta, \tag{7}$$
then there exists a finite $\rho^{*} \ge 0$ such that $\max_{\lambda \ge 0}\, d(\lambda, \rho^{*}) = f^{*}.$
Proof:
For any $\lambda \ge 0$ and $\rho \ge 0$, since every feasible point $x$ of (1) satisfies $x \in X$ and $Ax \le b$, we have
$$d(\lambda, \rho) \le L(x, \lambda, \rho) = c^{\top}x + \lambda^{\top}(Ax - b) \le c^{\top}x. \tag{8}$$
Then $\max_{\lambda \ge 0,\, \rho \ge 0} d(\lambda, \rho) \le f^{*}$.
Now it suffices to find a finite such that . We first let be any arbitrary feasible solution of (1), that is, and . Denote by the linear programming (LP) relaxation of . Since the value of the LP relaxation of (1) is bounded [34], i.e., , we set , then . Moreover,
(9) |
where is a given parameter. We consider the following two cases:
Case 1: . In this case we have for all . By letting , we can obtain that
(10) |
Case 2: . Denote by a positive optimal vector of dual variables for in the LP relaxation of (1). In this case we get
Since
it yields
(11) |
where the second inequality holds due to the definition of . Thus the inequalities (2) and (2) by letting be and imply that
This together with (9) and (8) yields that
Hence we complete the proof. ∎
One can observe from the proof that there exists a finite value $\rho^{*}$ such that for all $\rho \ge \rho^{*}$, the strong duality still holds. Therefore, given a sufficiently large penalty parameter $\rho$, we can achieve satisfactory feasibility for (1). Specifically, as $\rho$ increases beyond $\rho^{*}$, the augmented Lagrangian method penalizes violations of the constraints (1b) more heavily, thereby pushing the solution closer to the feasible region of the problem. The strong duality allows us to obtain a globally optimal solution to the problem (1) by solving the augmented Lagrangian dual problem (5).
To utilize the concave structure of the dual function $d(\lambda, \rho)$, we apply the projected subgradient method for solving (5), since the dual function is not differentiable. We first give the definition of the subgradient and subdifferential.
Definition 2.2
Let $f: \mathbb{R}^{m} \to \mathbb{R}$ be a convex function. The vector $s \in \mathbb{R}^{m}$ is called a subgradient of $f$ at $\bar{\lambda} \in \mathbb{R}^{m}$ if
$$f(\lambda) \ge f(\bar{\lambda}) + s^{\top}(\lambda - \bar{\lambda}) \quad \text{for all } \lambda \in \mathbb{R}^{m}.$$
The subdifferential of $f$ at $\bar{\lambda}$ is the set of all subgradients of $f$ at $\bar{\lambda}$, which is given by
$$\partial f(\bar{\lambda}) = \big\{s \in \mathbb{R}^{m} : f(\lambda) \ge f(\bar{\lambda}) + s^{\top}(\lambda - \bar{\lambda}),\ \forall \lambda \in \mathbb{R}^{m}\big\}.$$
Since $d$ is concave, we apply Definition 2.2 to $-d$, allowing us to apply properties of the subdifferential of a convex function to a concave function.
Proposition 2.1
Consider the dual function $d(\lambda, \rho)$ defined in (4). Then the subdifferentials of $d$ at $\lambda$ and $\rho$ satisfy
$$A\bar{x} - b \in \partial_{\lambda}\, d(\lambda, \rho), \qquad \frac{1}{2}\,\big\|\max(A\bar{x} - b,\, 0)\big\|^{2} \in \partial_{\rho}\, d(\lambda, \rho), \tag{12}$$
where $\bar{x}$ is a solution of (4) with input $(\lambda, \rho)$.
3 Augmented Lagrangian Method
In this section, we introduce the augmented Lagrangian method framework. This method is composed of two steps for solving the augmented Lagrangian dual problem (5). We first use the primal information $x$ to construct a subgradient of $d$ at $(\lambda, \rho)$. Since $\lambda$ and $\rho$ satisfy $\lambda \ge 0$ and $\rho \ge 0$, we then apply the projected subgradient method to update the parameters $\lambda$ and $\rho$. Starting from $\lambda^{0}$ and $\rho^{0}$, we can update $\lambda$ and $\rho$ at the $k$-th iteration by
$$\lambda^{k+1} = \max\big(\lambda^{k} + \alpha_{k}(Ax^{k} - b),\, 0\big), \tag{14a}$$
$$\rho^{k+1} = \rho^{k} + \frac{\alpha_{k}}{2}\,\big\|\max(Ax^{k} - b,\, 0)\big\|^{2}, \tag{14b}$$
where $\alpha_{k} > 0$ is a step size and $x^{k}$ is an optimal solution of the augmented Lagrangian relaxation problem (4) at $\lambda^{k}$ and $\rho^{k}$, that is,
$$x^{k} \in \operatorname*{argmin}_{x \in X}\ L(x, \lambda^{k}, \rho^{k}). \tag{15}$$
Therefore, the iterative processes (15), (14a) and (14b) consisting of primal and dual variables make up the ALM framework.
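A minimal sketch of the dual step (14a)-(14b) (our own illustrative code with toy data; `dual_update` is an assumed name, not the paper's implementation) is:

```python
import numpy as np

def dual_update(x, lam, rho, A, b, alpha):
    # One projected subgradient step on the AL dual, following Proposition 2.1:
    # A x - b is a subgradient in lam, (1/2)||max(A x - b, 0)||^2 in rho.
    r = A @ x - b
    lam_new = np.maximum(lam + alpha * r, 0.0)                     # cf. (14a)
    rho_new = rho + 0.5 * alpha * np.sum(np.maximum(r, 0.0) ** 2)  # cf. (14b)
    return lam_new, rho_new

# Hypothetical one-constraint example.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
lam1, rho1 = dual_update(np.array([1.0, 1.0]), np.array([0.0]), 1.0, A, b, alpha=0.1)   # violated
lam2, rho2 = dual_update(np.array([0.0, 0.0]), np.array([0.05]), 1.0, A, b, alpha=0.1)  # satisfied
```

When the coupling constraint is violated, both the multiplier and the penalty grow; when it is satisfied, the multiplier is projected back onto the nonnegative orthant and the penalty stays put.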
Solving the x-subproblem (15) is an important step in the ALM. While the subproblem (15) has no closed-form solution in general, we can apply an iterative method to solve it exactly or inexactly. Consequently, the ALM for solving the problem (1) consists of outer and inner iterations. The index $k$ is used to denote the outer iteration number, and the index $t$ is used to denote the inner iteration number.
3.1 A BCD method for subproblem (15)
In this subsection, we present a BCD method for solving the x-subproblem (15) in the ALM framework. The BCD method minimizes the function $L$ by updating the blocks cyclically in the order $x_1, x_2, \dots, x_p$, fixing the other blocks at their latest values during each update. Denote $x^{t,j} = (x_1^{t+1}, \dots, x_j^{t+1}, x_{j+1}^{t}, \dots, x_p^{t})$, where $x_j^{t}$ is the value of $x_j$ at its $t$-th update. Given fixed parameters $\lambda$ and $\rho$, we can also calculate the partial gradient of $L$ with respect to $x_j$ at a point $x$ by
$$\nabla_{j} L(x, \lambda, \rho) = c_j + A_j^{\top}\lambda + \rho\, A_j^{\top} \max(Ax - b,\, 0).$$
At each step, we consider two types of updates for every $j = 1, \dots, p$:
Classical:
$$x_j^{t+1} \in \operatorname*{argmin}_{x_j \in X_j}\ L(x_1^{t+1}, \dots, x_{j-1}^{t+1}, x_j, x_{j+1}^{t}, \dots, x_p^{t}, \lambda, \rho); \tag{16a}$$
Proximal linear:
$$x_j^{t+1} \in \operatorname*{argmin}_{x_j \in X_j}\ \big\langle \nabla_{j} L(x^{t,j-1}, \lambda, \rho),\ x_j \big\rangle + \frac{1}{2\tau}\,\|x_j - x_j^{t}\|^{2}, \tag{16b}$$
where $\tau > 0$ is a step size. In fact, if one defines the projection operator by
$$\mathrm{Proj}_{X_j}(z) := \operatorname*{argmin}_{x_j \in X_j}\ \|x_j - z\|^{2},$$
then we obtain the following equivalent form of (16b):
$$x_j^{t+1} \in \mathrm{Proj}_{X_j}\big(x_j^{t} - \tau \nabla_{j} L(x^{t,j-1}, \lambda, \rho)\big).$$
In general, the classical subproblem (16a) is fundamentally harder to solve because of the quadratic term in the objective function. However, we derive a simplified form of this subproblem under certain conditions, which will be discussed in Subsection 3.1.2. By contrast, the prox-linear subproblem (16b) is relatively easy to solve because the objective function becomes linear with respect to $x_j$. Now we summarize the ALM for solving (1) in Algorithm 1, which allows each $x_j$ to be updated by (16a) or (16b). We assume that each block is updated by the same scheme in (16a) and (16b) for all iterations.
3.1.1 Proximal linear update of BCD
Before considering the proximal linear subproblem (16b), we first give a definition of a linear operator, which is essential in the BCD method.
Definition 3.1
The linear operator associated to a vector is defined by
where is a nonempty and closed set.
Benefiting from the good characteristic of $x$ being a binary variable, we have
$$\|x_j - x_j^{t}\|^{2} = e\,x_j - 2\,(x_j^{t})^{\top}x_j + e\,x_j^{t}, \tag{17}$$
so the objective function in (16b) is linear. Therefore, we can also rewrite (16b) as
$$x_j^{t+1} \in \operatorname*{argmin}_{x_j \in X_j}\ \Big\langle \nabla_{j} L(x^{t,j-1}, \lambda, \rho) + \frac{1}{\tau}\Big(\frac{1}{2}e^{\top} - x_j^{t}\Big),\ x_j \Big\rangle. \tag{18}$$
Due to the discrete nature of the feasible set $X_j$, it is crucial to choose the step size $\tau$ appropriately. If $\tau$ is too large, the BCD method will not converge. If $\tau$ is too small, the BCD method will get stuck at some points. The reason why this happens will be explained in the convergence analysis.
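Since each $X_j$ is a finite set of binary vectors, the prox-linear step (16b) can in principle be carried out by enumeration over the block's feasible points. The sketch below (our own illustrative code; `prox_linear_step` and the explicit listing of $X_j$ are assumptions for the example, not the paper's implementation) does exactly that:

```python
import numpy as np

def prox_linear_step(grad_j, xj_old, Xj, tau):
    # Minimize <grad_j, xj> + (1/(2*tau)) ||xj - xj_old||^2 over the finite
    # binary set X_j; for binary xj this objective is linear (cf. (17)-(18)).
    best, best_val = None, float("inf")
    for xj in Xj:
        val = grad_j @ xj + np.sum((xj - xj_old) ** 2) / (2.0 * tau)
        if val < best_val:
            best, best_val = xj, val
    return best

# Hypothetical block with X_j listed explicitly (n_j = 2).
Xj = [np.array([0, 0]), np.array([1, 0]), np.array([0, 1])]
step = prox_linear_step(np.array([-1.0, 2.0]), np.array([0, 0]), Xj, tau=1.0)
```

In practice one would exploit the structure of $X_j$ (e.g., shortest-path structure in timetabling) instead of enumerating, but the enumeration makes the role of the step size visible: shrinking `tau` inflates the proximal term and eventually freezes the iterate at `xj_old`.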
3.1.2 Classical update of BCD
We start by giving the following assumption of model (1), which is very common in many applications such as train timetabling, vehicle routing, and allocation problems.
Assumption 3.1
The entries of the matrix $A$ are either $0$ or $1$. The vector $b$ equals $\mathbf{1}$, the vector of all ones.
Under this assumption, we consider the subproblem (16a). For each , if the condition always holds for each iteration of the BCD method, then
(19) | ||||
To prove the above derivation, we introduce following two notations at the -th update:
(22) |
where Obviously, and . Therefore, we have
(23) | |||
(24) |
and
(27) |
Since each element in $A$ is either $0$ or $1$, we have
(28) |
Denote . Then
(31) |
Then the iterative scheme for x-update is given by
We can observe that the condition induces the decomposition of minimizing the augmented Lagrangian function (3) into a set of subproblems with linear objective functions, which makes (16a) easier to solve. Therefore, the key point in the derivation of the linearization process is utilizing the fact that the entries in $A$ are either $0$ or $1$. We next verify that this condition always holds in each iteration of the BCD method under the following assumption.
Assumption 3.2
Denote by the indices of columns of the matrix containing only zeros and the -th element of . At least one of the following holds.
- (i) For all , there exists such that and .
- (ii) For all , there is at most one nonzero element in the vector and .
Remark: (a) Assumption 3.2 (i) requires that if an element satisfies , then at least one of and the -th column vector of is zero. Assumption 3.2 (ii) requires that if the block constraint set is large enough such that holds, then the weight corresponding to each block variable has only one nonzero element. (b) Assumption 3.2 usually holds when and are sparse. This is particularly true in the train timetabling problem with the objective of scheduling more trains, as will be shown in Section 5.
Lemma 3.1
Suppose Assumption 3.2 holds. From any starting point , the solution generated by the BCD method with the classical update at each iteration satisfies for all and .
Proof:
Let be the solution generated by the BCD method after -th classical update. We argue by contradiction and suppose that there exists such that . Then for any satisfying , we have
which implies that for any ,
then
(35) |
We consider the following two cases corresponding to Assumption 3.2:
Overall, we can reformulate the subproblem (16a) into (19) based on the special structure of the model (1) under certain conditions, and thus make it easier to solve. It is worth noting that if Assumption 3.2 is not satisfied, we can still consider (19) as a linear approximation of (16a). This linearization technique is widely used, including in the works [20] and [21]; however, those works lacked theoretical guarantees. Our main result shows that under the specified assumptions, using the update (19) in the ALM-C method is equivalent to solving (16a) exactly, theoretically ensuring the effectiveness of our method.
3.2 Finding a good feasible solution by set packing
Since the BCD method may not always yield a feasible solution [35], we can adopt several refinement strategies that are very useful for finding a feasible solution to the problem (1) in practice. Very often, (1) represents a problem where we optimize under limited resources, and this provides us a view of set packing problems. Since we produce candidate solutions all along, a natural idea is to utilize past iterates to construct a feasible solution. A simple strategy is to maintain a solution pool for each block that includes the past BCD iterates. Intuitively, the solution pool may include feasible or “nice” solutions in some sense.
To illustrate the above approach, we first introduce the iterative sequential technique.
3.2.1 A sweeping technique
The simplest technique is probably to select a subset of blocks, one by one, until infeasibility is detected. We present this sequential method in Algorithm 2, which can be understood as simply sweeping the blocks and selecting a candidate solution from the pool if feasibility is still guaranteed; otherwise we skip the current block and continue.
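A minimal sketch of this sweeping idea (our own illustrative code, not Algorithm 2 itself; candidates are abstracted to their resource usage and `feasible` stands in for checking the coupling constraints (1b)):

```python
def sweep(pools, feasible):
    # Visit blocks in order; keep a candidate from each block's solution pool
    # only if the partial selection stays feasible, otherwise skip the block.
    chosen = {}
    for j, pool in enumerate(pools):
        for cand in pool:
            trial = dict(chosen)
            trial[j] = cand
            if feasible(trial):
                chosen[j] = cand
                break
    return chosen

# Toy stand-in: each candidate is abstracted to its resource usage and the
# coupling constraint caps the total usage at 2.
pools = [[2, 1], [2, 1], [2, 1]]
selection = sweep(pools, lambda sel: sum(sel.values()) <= 2)
```

The output is feasible by construction, but greedy in the block order: here the first block's expensive candidate crowds out all later blocks, which motivates the more systematic packing formulation below.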
3.2.2 A packing technique
Let us further explore the idea of selecting solutions in a systematic way. Formally, for each block $j$, we introduce a set of binary variables $\mu_j^{s} \in \{0,1\}$ that defines the current selection: $\mu_j^{s} = 1$ if the $s$-th candidate in the pool of block $j$ is selected and $\mu_j^{s} = 0$ otherwise, where $s \in S_j$ and $S_j$ indexes the solution pool of block $j$. We consider the following problem:
$$\min_{\mu}\ \sum_{j=1}^{p} \sum_{s \in S_j} \big(c_j^{\top} x_j^{s}\big)\, \mu_j^{s} \tag{38a}$$
$$\text{s.t.}\quad \sum_{s \in S_j} \mu_j^{s} \le 1, \quad j = 1, \dots, p, \tag{38b}$$
$$\sum_{j=1}^{p} \sum_{s \in S_j} \big(A_j x_j^{s}\big)\, \mu_j^{s} \le b, \quad \mu_j^{s} \in \{0,1\}. \tag{38c}$$
In view of (38b), one may recognize the above problem as a restricted master problem appearing in column generation algorithms where (38c) stands for a set of knapsack constraints.
Specifically, the coupling constraints (38c) in our model are cliques, representing complete subgraphs where each pair of distinct nodes is connected. This allows us to reformulate the model as a maximum independent set problem. Therefore, we can find a feasible solution which satisfies by solving the problem (38). Since this problem is still hard, we only solve the relaxation problem of the maximal independent set. Now we go into details. Since the knapsack (38c) means the candidate solutions may conflict, we can construct a conflict graph . In this graph, represents the set of nodes, where each node corresponds to a solution generated as the algorithm proceeds, and is the set of edges that connect two conflicting solutions, meaning they violate the coupling constraints. In this view, we only have to maintain the graph and find a maximal independent set . Therefore, the output feasible point corresponds to for each . If , then . We summarize the maximal independent set technique in Algorithm 3. We also note that Algorithm 2 can be seen as a special case of Algorithm 3.
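The conflict-graph construction behind Algorithm 3 can be sketched as follows (illustrative code of our own; a simple greedy scan stands in for whatever maximal-independent-set routine is actually used):

```python
def maximal_independent_set(n_nodes, edges):
    # Greedy scan: select a node if none of its neighbors was selected yet;
    # the result is a maximal independent set of the conflict graph.
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    selected, blocked = [], set()
    for v in range(n_nodes):
        if v not in blocked:
            selected.append(v)
            blocked |= adj[v]
    return selected

# Hypothetical conflict graph on 4 candidate solutions: an edge joins two
# candidates that violate a coupling constraint together.
mis = maximal_independent_set(4, [(0, 1), (1, 2), (2, 3)])
```

By construction no two selected candidates conflict, so the selection respects the coupling constraints; maximality means no further candidate can be added without creating a conflict.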
Based on the above-mentioned techniques, we improve the ALM and propose a customized ALM in Algorithm 4.
Although Algorithms 2 and 3 can help us find a feasible solution to problem (1), the quality of their output remains unjustified. To evaluate and improve the quality of the solution, a simple way is to estimate upper and lower bounds on the objective function value and then calculate the gap between them. The smaller the gap, the better the current solution. Obviously, our method can provide an upper bound on the objective function value. As for the generation of the lower bound, we can use the following methods. (i) LP relaxation: by directly relaxing the binary integer variables to continuous variables, we solve a linear programming problem exactly to obtain a lower bound on the objective function value. (ii) LR relaxation: since the Lagrangian dual problem (2) is separable, we can solve the decomposed subproblems exactly to obtain a lower bound on the objective function value.
In general, the Lagrangian dual bound is at least as tight as the bound obtained from the usual linear programming relaxation [14]. Note that the relaxation must be solved to optimality to yield a valid bound. Therefore, we can use a combination of the LR method and Algorithm 4. To be specific, the LR method aims at generating a lower bound on the objective function value in (1); its solution is usually infeasible. Steps 2 and 3 in Algorithm 4 are used for generating feasible solutions of (1). The smaller the gap between the lower bound and the upper bound, the closer the feasible solution is to the global optimal solution of the problem. It can be seen from the iterative procedure that after many iterations, this algorithm generates many feasible solutions while the lower bound of the model is constantly improving. Finally, we select the best solution. The combination of these two methods takes advantage of both the ALM and the LR method; it not only finds a good feasible solution but also evaluates the quality of that solution.
Compared with the ADMM-based method in [21], we utilize the BCD method to perform multiple iterations on the subproblem (15) until the solutions remain unchanged, which improves the solution accuracy of the subproblem and thereby reduces the total number of iterations. Moreover, we adopt different refinement techniques to further enhance the solution quality.
4 Convergence Analysis
In this section, we present the convergence analysis of the block coordinate descent method for solving the augmented Lagrangian relaxation problem (4) and the augmented Lagrangian method for solving the dual problem (5). Unless otherwise stated, the convergence results presented in this section do not rely on Assumptions 3.1 and 3.2.
4.1 Convergence of BCD
We begin this section with a property of the augmented Lagrangian function (3) that is fundamental in the convergence analysis.
Proposition 4.1
The gradient of $L(\cdot, \lambda, \rho)$ is Lipschitz continuous with constant $L_A := \rho\,\|A\|^{2}$ on $X$, namely,
$$\|\nabla L(x, \lambda, \rho) - \nabla L(y, \lambda, \rho)\| \le L_A\, \|x - y\|$$
for all $x, y \in X$. Furthermore,
$$L(y, \lambda, \rho) \le L(x, \lambda, \rho) + \langle \nabla L(x, \lambda, \rho),\ y - x \rangle + \frac{L_A}{2}\,\|y - x\|^{2} \tag{39}$$
for all $x, y \in X$.
Proof:
For any ,
Then we have
It follows from the convexity of the function that
which implies the estimate (39). ∎
Considering the two different updates in the BCD method, we first summarize the convergence property of the BCD method for solving (16a), which has been presented in [36]. Before that, we give the definition of a blockwise optimal solution, which is also called a coordinatewise minimum point in [37]. Given parameters $\lambda$ and $\rho$, a feasible solution $\bar{x}$ is called a blockwise optimal solution of the problem (4) if for each $j$, we have for all $x_j \in X_j$,
$$L(\bar{x}_1, \dots, \bar{x}_{j-1}, \bar{x}_j, \bar{x}_{j+1}, \dots, \bar{x}_p, \lambda, \rho) \le L(\bar{x}_1, \dots, \bar{x}_{j-1}, x_j, \bar{x}_{j+1}, \dots, \bar{x}_p, \lambda, \rho).$$
Lemma 4.1

Suppose Assumptions 3.1 and 3.2 hold. Then (i) the BCD method with the classical update (16a) is executable; (ii) the sequence of objective function values generated by the BCD method becomes constant after finitely many iterations; (iii) the output of the BCD method is a blockwise optimal solution of the problem (4).
Proof:
If Assumptions 3.1 and 3.2 hold, Lemma 3.1 tells us that the subproblem (16a) can be solved exactly. (i) Since the constraint set of each subproblem (16a) is bounded, all subproblems have an optimal solution. Therefore, the BCD method for solving (16a) is executable. (ii) The result in (i) shows that the sequence generated by the BCD method exists. Thus the sequence of objective function values can only take finitely many different values. This, together with the monotonically decreasing property of the function values, yields that the sequence must eventually become constant. (iii) By the definition of the blockwise optimal solution and the fact that the subproblem (16a) can be solved exactly, we arrive at the conclusion. ∎
A similar conclusion can be found in [36], but it does not clarify how to solve the subproblem. Note that each (global) optimal solution of the model (4) is blockwise optimal, but not vice versa. Subsequently, we analyze the convergence of (16b). The following lemma ensures that the function value of $L$ decreases after each iteration if the iterate changes and the step size is chosen properly.
Lemma 4.2
Let $\{x^{t}\}$ be a sequence generated by (16b). Then
$$L(x^{t+1}, \lambda, \rho) - L(x^{t}, \lambda, \rho) \le \Big(\frac{L_A}{2} - \frac{1}{2\tau}\Big)\,\|x^{t+1} - x^{t}\|^{2}, \tag{40}$$
where $L_A$ is the Lipschitz constant in Proposition 4.1.
Proof:
This lemma tells us that a small step size satisfying $\tau < 1/L_A$, with $L_A$ the Lipschitz constant in Proposition 4.1, leads to a decrease in the function value when $x^{t+1} \neq x^{t}$. However, the following lemma states that when the step size is too small, the iteration returns the same point. We let $\nabla L(x)$ denote the gradient of $L(\cdot, \lambda, \rho)$ at $x$.
Lemma 4.3
If the step size satisfies when , then it holds that
Proof:
If , for all , we have
where the last inequality holds due to . It yields that
By the definition of the operator , we arrive at the conclusion. ∎
Based on Lemma 4.3, the implementation of the BCD method heavily relies on the choice of the step size $\tau$. Hence, we give a definition of a $\tau$-stationary point.
Definition 4.1
For the AL relaxation problem (4), if a point $\bar{x} \in X$ satisfies
$$\bar{x}_j \in \mathrm{Proj}_{X_j}\big(\bar{x}_j - \tau \nabla_{j} L(\bar{x}, \lambda, \rho)\big), \quad j = 1, \dots, p,$$
with $\tau > 0$, then it is called a $\tau$-stationary point.
For the augmented Lagrangian relaxation problem (4), given parameters and , we say that is a -local minimizer if there is an integer such that
Note that a -local minimizer is a global minimizer due to the fact for all . The following important result reveals a relationship between a $\tau$-stationary point and a global minimizer of the problem (4).
Theorem 4.1
We have the following relationships between the $\tau$-stationary point and the local minimizer of the problem (4).
- (i) If is a local minimizer, then is a $\tau$-stationary point for any step size .
- (ii) If is a $\tau$-stationary point with , then is a -local minimizer of the problem (4) under the assumption that the entries in $A$ and $b$ are integral.
Proof:
(i) If is a local minimizer, then for any we have
Therefore,
which implies that is a $\tau$-stationary point.
(ii) Since is a stationary point, then for any we have
(43) |
For any , we define the index sets and . Then
which together with (43) and yields that
Since the entries in $A$ and $b$ are integral, is an integer for every . This implies that
Then for any , it follows from the convexity of that
Therefore, is a -local minimizer of the problem (4). ∎
We can observe from Theorem 4.1 that every blockwise optimal solution of (4) is a $\tau$-stationary point. Conversely, if the entries in $A$ and $b$ are integral and , then every $\tau$-stationary point of (4) is blockwise optimal. These results on the relationships between $\tau$-stationary points, blockwise optimal solutions and local (global) minimizers are summarized intuitively in Figure 1.
We finally present the main result on the convergence of the BCD method for solving (16b).
Theorem 4.2

(Convergence properties) Let be a sequence generated by (16b). If the step size satisfies , then we have
- (i) The sequence is nonincreasing and
(44)
- (ii) The sequence converges to a $\tau$-stationary point after at most iterations, where .
Proof:
(ii) Given the parameters and , we can observe that the model (4) can be equivalently written as
where is the indicator function of the set $X$, i.e., it equals $0$ on $X$ and $+\infty$ otherwise. To verify that the sequence converges to a $\tau$-stationary point, we check that the conditions in [38, Theorem 1] hold.
Firstly, since the set $X$ is semi-algebraic, its indicator function is a Kurdyka-Łojasiewicz (KL) function [39, 38]. It is easy to see that the objective is also a KL function, and hence their sum satisfies the KL property, which is a crucial condition ensuring convergence.
Secondly, the Lipschitz constant in Proposition 4.1 is bounded if $\rho$ has an upper bound, and the problem (4) is inf-bounded.
Suppose is a $\tau$-stationary point; then for every , we have
(45) |
If each iteration always finds a new point until arriving at , we get
(46) |
where the first inequality holds due to the conclusion in (i) and the fact . Thus combining (45) with (46) yields that . It implies that we only need at most steps to arrive at a $\tau$-stationary point. ∎
If each iteration of the BCD method always finds a new point, then according to Theorem 4.2 (i), we have . By Theorem 1 in [38], we can conclude that where is a limit point of the sequence . If we choose an initial point that is already close to an optimal solution and an appropriate step size , based on specific convergence criteria detailed in Theorem 4.2, the BCD method can find a global solution of problem (4).
4.2 Convergence of ALM
Assume the BCD method returns a global minimizer to the augmented Lagrangian relaxation problem (4) in each inner loop. In this subsection, we focus on the convergence property of the projected subgradient method for solving the dual problem (5). For convenience, we introduce the following constants for subsequent analysis:
Let denote the subgradient of . We estimate the distance between the dual function value at each iteration and the optimal value of problem (1) in the following theorem.
Theorem 4.3
If we take the step size with , then,
where . Furthermore, if the positive sequence is bounded, then converges to some .
Proof:
Let be the sequence generated by Algorithm 1. We derive from Lemma 3 in [17] that
(47)
For all , we have
Then the right-hand side of (47) satisfies
Thus we arrive at the first result.
The second claim follows from [40, Theorem 7.4]; the proof there for the subgradient method extends readily to the projected subgradient method. ∎
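The projected subgradient iteration analyzed above can be sketched as follows; the tiny instance, the explicitly enumerated feasible set `X`, and the diminishing step size `c0 / sqrt(k)` are illustrative assumptions rather than the paper's exact setup:

```python
import numpy as np

def dual_value_and_subgrad(lam, c, A, b, X):
    """d(lam) = min_{x in X} c@x + lam@(A@x - b); a subgradient is A@x* - b."""
    vals = [c @ x + lam @ (A @ x - b) for x in X]
    x_star = X[int(np.argmin(vals))]
    return min(vals), A @ x_star - b

def projected_subgradient(c, A, b, X, steps=100, c0=1.0):
    """Maximize the concave dual over lam >= 0 with diminishing steps."""
    lam = np.zeros(len(b))
    best = -np.inf
    for k in range(1, steps + 1):
        d, g = dual_value_and_subgrad(lam, c, A, b, X)
        best = max(best, d)  # track the best dual value found
        lam = np.maximum(lam + (c0 / np.sqrt(k)) * g, 0.0)  # project onto lam >= 0
    return best, lam

# Toy instance: min -2*x1 - 3*x2  s.t.  x1 + x2 <= 1, x binary (enumerable X).
c = np.array([-2.0, -3.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
X = [np.array(v, dtype=float) for v in [(0, 0), (1, 0), (0, 1), (1, 1)]]
best, lam = projected_subgradient(c, A, b, X)
```

On this toy instance the best dual value reaches -3.0, which happens to coincide with the primal optimum, mirroring the strong duality established for the augmented Lagrangian dual.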
Although the theoretical analysis of ALM-C relies on Assumption 3.1, our tests show that ALM-C is also effective in more general settings. Furthermore, we present the ALM-P method as a versatile alternative that does not rely on this assumption.
5 Applications
In this section, we evaluate the proposed algorithms on two practical problems: the train timetabling problem and the vehicle routing problem. For the compared methods, we use three termination criteria: a maximum number of iterations, a time limit, and an optimality gap based on the difference between the objective value produced by our methods and the best known value. All instances are tested on a 2019 MacBook Pro with 8 GB of memory and an Intel Core i5 (Turbo Boost up to 4.1 GHz) with 128 MB eDRAM.
5.1 Capacitated vehicle routing problem
We consider the capacitated vehicle routing problem with time windows (CVRPTW). The problem is defined on a complete directed graph , where is the node set and is the edge set. Node 0 represents the depot where the vehicles are based, and nodes 1 to represent the customers that need to be served. Each edge in has an associated travel time . Each customer has a demand and a service time window . We let be the distance from node to node and be a large constant. The objective is to construct a set of least-cost vehicle routes starting and ending at the depot, such that each customer is visited exactly once within their time window, and the total demand of customers served in each route does not exceed vehicle capacity .
To formulate this problem as an integer program, we define the following decision variables: (i) : a binary variable equal to 1 if edge is used by vehicle , and 0 otherwise; (ii) : a continuous variable indicating the service start time at customer for vehicle . The block-structured integer linear programming formulation is then:
min | (48a) | |||
s.t. | (48b) | |||
(48c) | ||||
(48d) | ||||
(48e) | ||||
(48f) | ||||
(48g) | ||||
(48h) |
The block structure lies in the routing variables of each vehicle , which are subject to flow balance, capacity, and time window constraints. By relaxing the coupling constraints (48b), the problem decomposes into a separate routing subproblem for each vehicle.
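The effect of this decoupling can be seen in a minimal numeric check: once the coupling constraints are dualized, the objective separates by vehicle, so blockwise minimization matches the joint minimum. The two vehicles and their candidate-route costs below are hypothetical:

```python
import itertools

# Two "vehicles", each choosing among three candidate routes; the numbers
# are hypothetical reduced costs after dualizing the coupling constraints.
reduced_costs = {
    "vehicle_1": [4.0, 2.5, 3.0],
    "vehicle_2": [1.0, 5.0, 2.0],
}

# Joint minimization over all route combinations (exponential in general).
joint = min(sum(combo) for combo in itertools.product(*reduced_costs.values()))

# Blockwise minimization: one cheap subproblem per vehicle.
blockwise = sum(min(costs) for costs in reduced_costs.values())

assert joint == blockwise  # separability makes the decomposition exact
```

Each subproblem here is trivial; in the actual CVRPTW setting it is a single-vehicle routing problem with capacity and time window constraints.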
5.1.1 Parameter Setting
We let ALM-C and ALM-P denote Algorithm 4 with the updates (16a) and (16b) in step 2, respectively. We compare our proposed methods with the Gurobi solver (version 11.0.0) and OR-Tools on all instances from the Solomon dataset [41]. Since the ADMM in [20] is not adaptable to all instances, we do not compare with it. To illustrate the scale of these instances, we present a subset of representative examples in Table I. The notations and represent the number of vehicles, the number of customers and the number of edges. and denote the numbers of variables and , respectively. To evaluate the robustness of the compared methods, we conduct experiments on the C1-type instances with various problem sizes, as detailed in Table II. The results for all remaining instances from the Solomon dataset are presented in Table III.
We adopt a “vehicle and route-based” formulation for the CVRPTW, which differs from the space-time network flow formulation used in [20]. The subproblem is therefore a routing problem with capacity and time window constraints, which we solve with the Gurobi solver for convenience. Note that the formulation does not affect Gurobi’s performance. To ensure a fair comparison and make full use of Gurobi, we implemented efficient callback functions based on Dantzig’s formulation to fine-tune its performance. We set a time limit of 2000 seconds for the compared methods, except for OR-Tools, whose time limit is set to 500 seconds, and use the symbol “–” to indicate that a solver failed to find a feasible solution within the allotted time.
No. | |||||||
---|---|---|---|---|---|---|---|
R101-50 | 12 | 50 | 2,550 | 50 | 30,636 | 30,600 | 612 |
R201-50 | 6 | 50 | 2,550 | 50 | 15,318 | 15,300 | 306 |
RC101-50 | 8 | 50 | 2,550 | 50 | 20,424 | 20,400 | 408 |
RC201-50 | 5 | 50 | 2,550 | 50 | 12,765 | 12,750 | 255 |
C101-100 | 10 | 100 | 10,100 | 100 | 101,030 | 101,000 | 1,010 |
C201-100 | 3 | 100 | 10,100 | 100 | 30,309 | 30,300 | 303 |
5.1.2 Performances of the Proposed Algorithm
In the subsequent tables, the “” and “” columns report the best-known objective values and the objective values of the feasible solutions generated by the compared methods, respectively. The “Time” column reports the CPU time (in seconds) taken by each method to meet its stopping criterion. The optimality gap is defined as the relative difference between the obtained objective value and the best-known value. Since OR-Tools lacks an inherent termination criterion, it runs for the entire time limit and outputs the resulting feasible solution; we therefore do not report its solution time. Table II shows the stability of the compared methods under various problem sizes on the C1-type instances. Our methods outperform OR-Tools and Gurobi, and are more stable than OR-Tools. From Table III, the ALM-C algorithm outperforms OR-Tools and Gurobi on most instances, achieving significantly lower optimality gaps and competitive computation times. The ALM-P algorithm also exhibits promising results, often outperforming Gurobi in efficiency and solution quality. Overall, both ALM-C and ALM-P demonstrate superiority and robustness in solving the CVRPTW, providing high-quality solutions with good computational efficiency compared to existing solvers and heuristics.
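For concreteness, the gap percentages reported in the tables are consistent with the usual relative gap against the best-known objective; this is our reading of the tables, not a formula quoted from the text:

```python
def gap(z, z_best):
    """Relative optimality gap against the best-known value, in percent."""
    return abs(z - z_best) / abs(z_best) * 100.0

# Spot-check against Table II, instance c101 with 25 customers:
alm_c = round(gap(191.8, 191.3), 1)    # ALM-C vs. best known -> 0.3
or_tools = round(gap(632.6, 191.3), 1)  # OR-Tools vs. best known -> 230.7
```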
ins. | OR-Tools | Gurobi | ALM-P | ALM-C | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gap1 | gap1 | Time | gap1 | Time | gap1 | Time | ||||||||
c101 | 25 | 3 | 191.3 | 632.6 | 230.7% | 191.8 | 0.3% | 0.0 | 191.8 | 0.3% | 2.4 | 191.8 | 0.3% | 2.7 |
50 | 5 | 362.4 | 514.4 | 41.9% | 363.2 | 0.2% | 1.1 | 363.2 | 0.2% | 5.9 | 363.2 | 0.2% | 5.0 | |
100 | 10 | 827.3 | 828.9 | 0.2% | 828.9 | 0.2% | 8.9 | 828.9 | 0.2% | 57.8 | 828.9 | 0.2% | 55.0 | |
c102 | 25 | 3 | 190.3 | 786.1 | 313.1% | 190.7 | 0.2% | 1.9 | 190.7 | 0.2% | 3.7 | 190.7 | 0.2% | 4.6 |
50 | 5 | 361.4 | 667.9 | 84.8% | 362.2 | 0.2% | 25.5 | 362.2 | 0.2% | 16.0 | 362.2 | 0.2% | 15.3 | |
100 | 10 | 827.3 | 828.9 | 0.2% | - | 2000 | 828.9 | 0.2% | 167.4 | 828.9 | 0.2% | 196.4 | ||
c103 | 25 | 3 | 190.3 | 786.1 | 313.1% | 190.7 | 0.2% | 7.0 | 190.7 | 0.2% | 15.9 | 190.7 | 0.2% | 11.0 |
50 | 5 | 361.4 | 667.9 | 84.8% | 362.2 | 0.2% | 1584.8 | 365.3 | 1.1% | 40.0 | 362.2 | 0.2% | 41.5 | |
100 | 10 | 826.3 | - | - | - | 2000 | 839.0 | 1.5% | 362.6 | 828.1 | 0.2% | 259.9 | ||
c104 | 25 | 3 | 186.9 | 875.6 | 368.5% | 187.4 | 0.3% | 45.1 | 188.6 | 0.9% | 33.5 | 188.6 | 0.9% | 17.9 |
50 | 5 | 358.0 | 606.2 | 69.3% | 360.1 | 0.6% | 2000 | 362.2 | 1.2% | 57.1 | 358.9 | 0.3% | 319.8 | |
100 | 10 | 822.9 | 1202.3 | 46.1% | - | 2000 | 890.9 | 8.3% | 667.5 | 904.9 | 10.0% | 363.3 | ||
c105 | 25 | 3 | 191.3 | 609.6 | 218.6% | 191.8 | 0.3% | 0.1 | 191.8 | 0.3% | 3.3 | 191.8 | 0.3% | 3.4 |
50 | 5 | 362.4 | 482.4 | 33.1% | 363.2 | 0.2% | 0.3 | 363.2 | 0.2% | 6.4 | 363.2 | 0.2% | 6.0 | |
100 | 10 | 827.3 | 828.9 | 0.2% | 828.9 | 0.2% | 15.8 | 828.9 | 0.2% | 53.3 | 828.9 | 0.2% | 34.2 | |
c106 | 25 | 3 | 191.3 | 639.6 | 234.3% | 191.8 | 0.3% | 0.1 | 191.8 | 0.3% | 3.6 | 191.8 | 0.3% | 3.8 |
50 | 5 | 362.4 | 430.4 | 18.8% | 363.2 | 0.2% | 1.4 | 363.2 | 0.2% | 5.8 | 363.2 | 0.2% | 5.9 | |
100 | 10 | 827.3 | 828.9 | 0.2% | 828.9 | 0.2% | 650.6 | 828.9 | 0.2% | 69.6 | 828.9 | 0.2% | 47.0 | |
c107 | 25 | 3 | 191.3 | 565.6 | 195.6% | 191.8 | 0.3% | 0.1 | 191.8 | 0.3% | 3.2 | 191.8 | 0.3% | 4.3 |
50 | 5 | 362.4 | 457.4 | 26.2% | 363.2 | 0.2% | 0.8 | 363.2 | 0.2% | 6.1 | 363.2 | 0.2% | 6.6 | |
100 | 10 | 827.3 | 828.9 | 0.2% | 828.9 | 0.2% | 14.6 | 828.9 | 0.2% | 65.2 | 828.9 | 0.2% | 35.2 | |
c108 | 25 | 3 | 191.3 | 564.6 | 195.1% | 191.8 | 0.3% | 0.9 | 191.8 | 0.3% | 4.5 | 191.8 | 0.3% | 4.4 |
50 | 5 | 362.4 | - | - | 363.2 | 0.2% | 18.6 | 363.2 | 0.2% | 8.4 | 363.2 | 0.2% | 14.2 | |
100 | 10 | 827.3 | - | - | - | 2000 | 828.9 | 0.2% | 98.0 | 828.9 | 0.2% | 265.4 | ||
c109 | 25 | 3 | 191.3 | 475.6 | 148.6% | 191.8 | 0.3% | 9.6 | 191.8 | 0.3% | 6.3 | 191.8 | 0.3% | 9.7 |
50 | 5 | 362.4 | 363.2 | 0.2% | 363.2 | 0.2% | 1867.4 | 363.2 | 0.2% | 12.3 | 363.2 | 0.2% | 13.1 | |
100 | 10 | 827.3 | 828.9 | 0.2% | - | 2000 | 828.9 | 0.2% | 234.5 | 828.9 | 0.2% | 304.3 |
ins. | OR-Tools | Gurobi | ALM-P | ALM-C | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
c201.50 | 3 | 360.2 | 1622 | 350.3% | 361.8 | 0.4% | 0.1 | 361.8 | 0.4% | 4.2 | 361.8 | 0.4% | 3.4 |
c202.50 | 3 | 360.2 | 1622 | 350.3% | 361.8 | 0.4% | 5.7 | 361.8 | 0.4% | 19.1 | 366.8 | 1.8% | 7.1 |
c203.50 | 3 | 359.8 | 1713.4 | 376.2% | 361.4 | 0.4% | 113.5 | 361.4 | 0.4% | 43.7 | 361.4 | 0.4% | 76.8 |
c204.50 | 2 | 350.1 | 1435.3 | 310.0% | 351.7 | 0.5% | 2000 | 366.9 | 4.8% | 55.7 | 351.7 | 0.5% | 166.3 |
c205.50 | 3 | 359.8 | 1729 | 380.5% | 361.4 | 0.4% | 1.1 | 361.4 | 0.4% | 7.9 | 361.4 | 0.4% | 7.0 |
c206.50 | 3 | 359.8 | 1741.8 | 384.1% | 361.4 | 0.4% | 1.5 | 361.4 | 0.4% | 14.4 | 361.4 | 0.4% | 19.1 |
c207.50 | 3 | 359.6 | 1622.3 | 351.1% | 361.2 | 0.4% | 11.9 | 361.2 | 0.5% | 25.0 | 361.2 | 0.4% | 125.7 |
c208.50 | 2 | 350.5 | 1629.2 | 364.8% | 352.1 | 0.5% | 8.0 | 355.9 | 1.5% | 14.2 | 352.1 | 0.5% | 31.5 |
r101.50 | 12 | 1044 | 1541.9 | 47.7% | 1046.7 | 0.3% | 2.1 | 1046.7 | 0.3% | 38.5 | 1046.7 | 0.3% | 47.3 |
r102.50 | 11 | 909 | 1374.8 | 51.2% | 911.4 | 0.3% | 2000 | 939.5 | 3.4% | 248.9 | 925.3 | 1.8% | 317.2 |
r103.50 | 9 | 772.9 | 1069.1 | 38.3% | - | - | 2000 | 806.4 | 4.3% | 289.8 | 803.8 | 4.0% | 256.1 |
r104.50 | 6 | 625.4 | 743.5 | 18.9% | - | - | 2000 | 654.4 | 4.6% | 472.1 | 686.6 | 9.8% | 649.9 |
r105.50 | 9 | 899.3 | 1101.7 | 22.5% | 901.9 | 0.3% | 268.5 | 906.13 | 0.8% | 61.1 | 901.9 | 0.3% | 60.2 |
r106.50 | 5 | 793 | 937.8 | 18.3% | - | - | 2000 | - | - | - | - | - | - |
r107.50 | 7 | 711.1 | 790.5 | 11.2% | - | - | 2000 | 816.47 | 14.8% | 120.5 | 741.5 | 4.3% | 336.2 |
r108.50 | 6 | 617.7 | 698.1 | 13.0% | - | - | 2000 | 645.7 | 4.5% | 215.1 | 643.9 | 4.2% | 466.5 |
r109.50 | 8 | 786.8 | 873.2 | 11.0% | 805.6 | 2.4% | 2000 | 796.3 | 1.2% | 77.2 | 788.7 | 0.2% | 324.9 |
r110.50 | 7 | 697 | 887.1 | 27.3% | - | - | 2000 | 754.9 | 8.3% | 560.6 | 768.1 | 10.2% | 225.4 |
r111.50 | 7 | 707.2 | 784.5 | 10.9% | - | - | 2000 | 750.0 | 6.0% | 297.8 | 801.9 | 13.4% | 220.8 |
r112.50 | 6 | 630.2 | 730.6 | 15.9% | - | - | 2000 | 665.7 | 5.6% | 476.5 | 698.0 | 10.8% | 300.2 |
r201.50 | 6 | 791.9 | 1196.9 | 51.1% | 794.3 | 0.3% | 5.2 | 803.5 | 1.5% | 27.0 | 801.3 | 1.2% | 30.4 |
r202.50 | 5 | 698.5 | 1173.2 | 68.0% | 723.0 | 3.5% | 2000 | 735.7 | 5.3% | 437.0 | 726.3 | 4.0% | 337.2 |
r203.50 | 5 | 605.3 | 1173.2 | 93.8% | 608.0 | 0.4% | 2000 | 639.4 | 5.6% | 168.6 | 653.0 | 7.9% | 380.9 |
r204.50 | 2 | 506.4 | 1127.9 | 122.7% | 512.4 | 1.2% | 2000 | 524.0 | 3.5% | 75.8 | 583.8 | 15.3% | 401.6 |
r205.50 | 4 | 690.1 | 1132.3 | 64.1% | 700.2 | 1.5% | 2000 | 732.4 | 6.1% | 74.5 | 731.7 | 6.0% | 87.8 |
r206.50 | 4 | 632.4 | 1066.9 | 68.7% | 657.2 | 3.9% | 2000 | 682.1 | 7.9% | 188.4 | 677.2 | 7.1% | 266.6 |
r207.50 | 3 | 361.6 | 1046.8 | 189.5% | - | - | 2000 | 362.6 | 0.3% | 13.1 | 364.1 | 0.7% | 25.2 |
r208.50 | 1 | 328.2 | 1019.6 | 210.7% | - | - | 2000 | 329.3 | 0.3% | 11.9 | 329.3 | 0.3% | 13.3 |
r209.50 | 4 | 600.6 | 1043.6 | 73.8% | 639.6 | 6.5% | 2000 | 608.5 | 1.3% | 217.9 | 609.4 | 1.5% | 58.5 |
r210.50 | 4 | 645.6 | 1119.5 | 73.4% | 661.0 | 2.4% | 2000 | 710.2 | 10.0% | 288.6 | 669.2 | 3.7% | 253.4 |
r211.50 | 3 | 535.5 | 958.7 | 79.0% | - | - | 2000 | 590.4 | 10.3% | 168.8 | 555.1 | 3.7% | 339.8 |
rc101.50 | 8 | 944 | 1108.3 | 17.4% | 945.6 | 0.2% | 2000 | 945.6 | 0.2% | 191.2 | 945.6 | 0.2% | 78.8 |
rc102.50 | 7 | 822.5 | 940.7 | 14.4% | - | - | 2000 | 823.1 | 0.1% | 510.3 | 823.1 | 0.1% | 242.9 |
rc103.50 | 6 | 710.9 | 834.9 | 17.4% | - | - | 2000 | 736.4 | 3.6% | 134.7 | 751.3 | 5.7% | 335.2 |
rc104.50 | 5 | 545.8 | 641.4 | 17.5% | - | - | 2000 | 546.5 | 0.1% | 101.6 | 546.5 | 0.1% | 91.5 |
rc105.50 | 8 | 855.3 | 1112.7 | 30.1% | - | - | 2000 | 873.2 | 2.1% | 99.0 | 902.7 | 5.5% | 134.4 |
rc106.50 | 6 | 723.2 | 793 | 9.7% | - | - | 2000 | 733.7 | 1.5% | 79.3 | 728.1 | 0.7% | 152.4 |
rc107.50 | 6 | 642.7 | 752.3 | 17.1% | - | - | 2000 | 650.3 | 1.2% | 235.2 | 644.0 | 0.2% | 183.7 |
rc108.50 | 6 | 598.1 | 690.3 | 15.4% | - | - | 2000 | 600.7 | 0.4% | 256.1 | 599.2 | 0.2% | 276.4 |
rc201.50 | 5 | 684.8 | 1904.7 | 178.1% | 686.3 | 0.2% | 12.5 | 687.7 | 0.4% | 31.9 | 686.3 | 0.2% | 14.8 |
rc202.50 | 5 | 613.6 | 1172.6 | 91.1% | 615.0 | 0.2% | 2000 | 615.6 | 0.3% | 62.2 | 615.0 | 0.2% | 85.9 |
rc203.50 | 4 | 555.3 | 1163.1 | 109.5% | 556.5 | 0.2% | 2000 | 590.4 | 6.3% | 256.7 | 558.5 | 0.6% | 265.8 |
rc204.50 | 3 | 444.2 | 1090.3 | 145.5% | 509.4 | 14.7% | 2000 | 451.8 | 1.7% | 218.4 | 462.3 | 4.1% | 153.7 |
rc205.50 | 5 | 630.2 | 1222.7 | 94.0% | 632.0 | 0.3% | 2000 | 632.0 | 0.3% | 89.8 | 632.0 | 0.3% | 56.0 |
rc206.50 | 5 | 610 | 1088.5 | 78.4% | 611.7 | 0.3% | 2000 | 611.7 | 0.3% | 36.8 | 611.7 | 0.3% | 42.9 |
rc207.50 | 4 | 558.6 | 998.3 | 78.7% | - | - | 2000 | 591.7 | 5.9% | 315.5 | 608.5 | 8.9% | 281.1 |
rc208.50 | 2 | 269.1 | 936.9 | 248.2% | - | - | 2000 | 269.6 | 0.2% | 65.0 | 269.6 | 0.2% | 109.1 |
To demonstrate the advantages of our methods, we show the convergence curves of Gurobi’s primal and dual bounds, compared with the solution found by our solver, using the instance “C109.50” as an example. As Figure 2 shows, our methods achieve near-optimal solutions in around 15 seconds, significantly faster than Gurobi, which takes approximately 300 seconds. We also observe that as the objective values increase, the corresponding constraint violation decreases, facilitating fast convergence.



5.2 Train timetabling problem
We consider the following space-time network model for the train timetabling problem (TTP) at the macro level, based on the model in [42]. We use a directed acyclic multigraph to characterize the train timetabling problem, where and denote the set of all nodes and the set of all arcs. For each train , sets or parameters with a superscript or subscript refer to the objects associated with that train. For each arc , we introduce a binary variable equal to 1 if the arc is selected. For each node , let and be the sets of arcs in leaving and entering node , respectively. Then the integer programming model of the TTP is given by
(49a) | ||||
s.t. | (49b) | |||
(49c) | ||||
(49d) | ||||
(49e) | ||||
(49f) | ||||
(49g) |
where is the “profit” of using arc . and denote the artificial origin and destination nodes, respectively. and denote the set of trains that may pass through node and the set of nodes in conflict with node , respectively. denotes the (exponentially large) family of maximal subsets of pairwise incompatible arcs. In this model, (49b), (49c), and (49d) ensure that the arcs selected for train form a valid path in ; (49e) represents the headway constraints; and (49f) forbids the simultaneous selection of incompatible arcs, imposing the track capacity constraints.
Let . Then this space-time network model can be rewritten in the general form (1). Our goal is to show that the proposed Algorithm 4 is fully capable of providing implementable timetables for the Jinghu railway. Specifically, our algorithms are tested on the timetabling problem for the Beijing-Shanghai high-speed railway (Jinghu high-speed railway in Mandarin). As one of the busiest railways in the world, the Beijing-Shanghai high-speed railway transported over 210 million passengers in 2019***For details, see https://en.wikipedia.org/wiki/Beijing-Shanghai_high-speed_railway.. In our test case, the problem consists of 29 stations and 292 trains in both directions (up and down), with two major speed levels: 300 km/h and 350 km/h. Several numerical experiments based on data from the Beijing-Shanghai high-speed railway demonstrate the feasibility and effectiveness of the proposed strategy.
We compare our proposed methods with the Gurobi solver and ADMM [21] on small and real-world instances, as presented in Tables IV and V. The notations and represent the number of trains, the number of stations, and the time window. In most large-scale cases the Gurobi solver requires a much longer time, so we set a time limit of two hours. Since (49a) maximizes revenue, reversing the objective into a minimization yields negative arc costs; both subproblems (16a) and (16b) are then solved by the Bellman-Ford algorithm, an efficient method for shortest path problems with negative arc weights.
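A minimal sketch of such a subproblem solver follows (toy arcs, not the Jinghu network); Bellman-Ford correctly handles the negative costs obtained by negating arc revenues:

```python
import math

def bellman_ford(n, edges, source):
    """Single-source shortest paths allowing negative arc costs.
    edges: list of (u, v, w) arcs; returns distances (math.inf if unreachable)."""
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(n - 1):  # at most n-1 relaxation rounds suffice
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:  # early exit once all labels are stable
            break
    return dist

# Toy acyclic arc set: negating "profits" gives negative costs to minimize.
edges = [(0, 1, -2.0), (0, 2, -1.0), (1, 3, -3.0), (2, 3, -1.5)]
dist = bellman_ford(4, edges, 0)  # best path 0 -> 1 -> 3 with cost -5.0
```

Because the space-time network is acyclic, no negative cycles can occur, so the algorithm always terminates with the optimal path labels.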
5.2.1 Performances on small instances
We first validate our algorithm on a smaller sub-network using a subset of stations of the Jinghu railway. We set the revenue of each arc proportional to the distance between the two stations, reflecting the intuition that longer rides should yield higher income. For our proposed methods, we set uniformly and for all instances, with a time limit of 100 seconds. Since the optimal value is unknown, we report the upper bounds (UB) obtained by the Gurobi solver as a reference in Table IV. We can observe that our ALM-C performs competitively with the Gurobi solver in terms of both objective value and computation time.
Gurobi | ADMM | ALM-P | ALM-C | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
UB | Time | Time | Time | Time | |||||||
20 | 15 | 200 | 20952 | 20952 | 5.0 | 20952 | 14.2 | 20952 | 13.1 | 20952 | 10.0 |
25 | 15 | 200 | 23396 | 23396 | 11.5 | 23396 | 31.3 | 21085 | 22.1 | 23396 | 11.4 |
30 | 15 | 300 | 25707 | 25707 | 14.4 | 24552 | 14.1 | 25707 | 23.0 | 25707 | 9.2 |
40 | 15 | 300 | 34305 | 34305 | 24.3 | 31817 | 100.0 | 32061 | 100.0 | 34305 | 17.0 |
5.2.2 Performances on real-world instances
To verify the efficiency of ALM-C and ALM-P on large-scale data with all stations involved, we consider the five examples of ascending problem size in Table V. As before, we maximize the total revenue of the schedule. In practice, the revenue of an arc may be ambiguous. Moreover, since the Jinghu high-speed railway operates at a high level of utilization, introducing new trains is always beneficial because it covers additional unmet demand. In this view, we introduce an alternative objective that simply maximizes the number of scheduled trains. Perhaps not surprisingly, this simplified objective can dramatically speed up our methods in practice, because the coefficient vector of the objective is sparse, allowing faster identification of the optimal solution.
For our proposed methods, we set and for instances No. 1-2, and for instances No. 3 and No. 5, and for instance No. 4. We set and to zero for all instances. Both the number of variables and the number of constraints of the model (49) are very large for this practical TTP instance, with dimensions up to tens of millions. Tables VI and VII show the performance of our proposed methods and the Gurobi solver, where the reported gap is the corresponding relative optimality gap.
No. | ||||||
---|---|---|---|---|---|---|
1 | 50 | 29 | 300 | 49,837 | 231,722 | 308,177 |
2 | 50 | 29 | 600 | 113,737 | 1,418,182 | 1,396,572 |
3 | 100 | 29 | 720 | 145,135 | 1,788,088 | 4,393,230 |
4 | 150 | 29 | 960 | 199,211 | 3,655,151 | 9,753,590 |
5 | 292 | 29 | 1080 | 228,584 | 13,625,558 | 26,891,567 |
No. | Gurobi | ADMM | ALM-P | ALM-C | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
gap2 | Time | gap2 | Time | gap2 | Time | gap2 | Time | |||||
1 | 6.18e+04 | 13.6% | 7200.0 | 6.94e+04 | 3.0% | 54.1 | 6.93e+04 | 3.1% | 84.3 | 6.98e+04 | 2.5% | 72.2 |
2 | 7.61e+04 | 45.3% | 7200.0 | 1.34e+05 | 3.5% | 96.2 | 1.34e+05 | 3.4% | 144.4 | 1.35e+05 | 3.0% | 126.1 |
3 | 9.69e+04 | 61.5% | 7200.0 | 2.33e+05 | 7.3% | 558.3 | 2.36e+05 | 6.3% | 414.9 | 2.36e+05 | 6.3% | 396.6 |
4 | 1.23e+05 | 74.5% | 7200.0 | 4.38e+05 | 9.6% | 1422.7 | 4.42e+05 | 8.7% | 924.3 | 4.45e+05 | 8.1% | 714.9 |
5 | 2.95e+05 | 74.0% | 7200.0 | 9.15e+05 | 19.5% | 3600.0 | 9.60e+05 | 15.6% | 2094.2 | 9.63e+05 | 15.3% | 1722.7 |
No. | Gurobi | ADMM | ALM-P | ALM-C | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gap1 | Time | gap1 | Time | gap1 | Time | gap1 | Time | ||||||
1 | – | 27 | – | 3836.0 | 27 | – | 3.4 | 27 | – | 3.4 | 27 | – | 3.5 |
2 | 50 | 50 | 0 | 2965.8 | 50 | 0 | 8.2 | 50 | 0 | 8.1 | 50 | 0 | 8.2 |
3 | 100 | 11 | 89.0% | 7200.0 | 89 | 11.0% | 3001.4 | 100 | 0 | 127.4 | 100 | 0 | 96.4 |
4 | 150 | 14 | 90.7% | 7200.0 | 148 | 1.3% | 3005.3 | 150 | 0 | 99.5 | 150 | 0 | 96.4 |
5 | 292 | 30 | 89.7% | 7200.0 | 286 | 2.1% | 3003.8 | 292 | 0 | 203.1 | 292 | 0 | 172.1 |
The results clearly demonstrate the effectiveness of our methods compared to Gurobi, especially on large-scale data. For instance No. 1, both ALM-C and ALM-P schedule 27 trains within a few seconds, at least 1000 times faster than the Gurobi solver, while achieving satisfactory accuracy. In particular, when the data scale reaches tens of millions in instance No. 5, ALM-C and ALM-P successfully schedule all 292 trains. In contrast, the Gurobi solver schedules only 30 trains and takes much longer. The results on the other instances further illustrate the effectiveness of this technique. We therefore conclude that the customized ALM provides fast and globally optimal solutions for this practical problem.
6 Conclusion
In this paper, we study general integer programming with block structure. Benefiting from this special structure, we extend the augmented Lagrangian method, originally designed for continuous problems, to solve problem (1) effectively. By introducing a novel augmented Lagrangian function, we establish strong duality and optimality for problem (1). Furthermore, we provide convergence results for the proposed methods on both the augmented Lagrangian relaxation and the dual problem. To obtain high-quality feasible solutions, we develop a customized ALM combined with refinement techniques that iteratively improve primal and dual solution quality. Numerical experiments demonstrate that the customized ALM is efficient and performs well in finding optimal solutions to a wide variety of practical problems.
References
- [1] P. Wang, C. Shen, A. van den Hengel, and P. H. Torr, “Large-scale binary quadratic optimization using semidefinite relaxation and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 470–485, 2016.
- [2] B. S. Y. Lam and A. W.-C. Liew, “A fast binary quadratic programming solver based on stochastic neighborhood search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 32–49, 2020.
- [3] X. Lin, Z. J. Hou, H. Ren, and F. Pan, “Approximate mixed-integer programming solution with machine learning technique and linear programming relaxation,” in 2019 3rd International Conference on Smart Grid and Smart Cities (ICSGSC). IEEE, 2019, pp. 101–107.
- [4] R. Anderson, J. Huchette, W. Ma, C. Tjandraatmadja, and J. P. Vielma, “Strong mixed-integer programming formulations for trained neural networks,” Mathematical Programming, vol. 183, no. 1, pp. 3–39, 2020.
- [5] F. Eisenbrand, C. Hunkenschröder, K.-M. Klein, M. Kouteckỳ, A. Levin, and S. Onn, “An algorithmic theory of integer programming,” arXiv preprint arXiv:1904.01361, 2019.
- [6] J. Cslovjecsek, M. Kouteckỳ, A. Lassota, M. Pilipczuk, and A. Polak, “Parameterized algorithms for block-structured integer programs with large entries,” in Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). SIAM, 2024, pp. 740–751.
- [7] J. A. De Loera, R. Hemmecke, S. Onn, and R. Weismantel, “N-fold integer programming,” Discrete Optimization, vol. 5, no. 2, pp. 231–241, 2008.
- [8] L. Chen, On Block-Structured Integer Programming and Its Applications. Cham: Springer International Publishing, 2019, pp. 153–177.
- [9] A. H. Land and A. G. Doig, “An automatic method for solving discrete programming problems,” in 50 Years of Integer Programming 1958-2008. Springer, 2010, pp. 105–132.
- [10] R. E. Gomory, “Outline of an algorithm for integer solutions to linear programs,” Bulletin of the American Mathematical Society, vol. 64, pp. 275–278, 1958.
- [11] J. F. Benders, “Partitioning procedures for solving mixed-variables programming problems,” Numerische Mathematik, vol. 4, no. 1, pp. 238–252, 1962.
- [12] G. B. Dantzig and P. Wolfe, “Decomposition principle for linear programs,” Operations Research, vol. 8, no. 1, pp. 101–111, 1960.
- [13] A. M. Geoffrion, “Lagrangean relaxation for integer programming,” in Approaches to Integer Programming. Springer, 1974, pp. 82–114.
- [14] M. Conforti, G. Cornuéjols, and G. Zambelli, Integer programming. Springer, 2014, vol. 271.
- [15] A. Caprara, M. Monaci, P. Toth, and P. L. Guida, “A lagrangian heuristic algorithm for a real-world train timetabling problem,” Discrete Applied Mathematics, vol. 154, no. 5, pp. 738–753, 2006.
- [16] B. Wu and B. Ghanem, “ℓp-box ADMM: A versatile framework for integer programming,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1695–1708, 2018.
- [17] K. Sun, M. Sun, and W. Yin, “Decomposition methods for global solutions of mixed-integer linear programs,” arXiv preprint arXiv:2102.11980, 2021.
- [18] M. Feizollahi, “Large-scale unit commitment: Decentralized mixed integer programming approaches,” Ph.D. dissertation, Georgia Institute of Technology, 2015.
- [19] R. Takapoui, “The alternating direction method of multipliers for mixed-integer optimization applications,” Ph.D. dissertation, Stanford University, 2017.
- [20] Y. Yao, X. Zhu, H. Dong, S. Wu, H. Wu, L. C. Tong, and X. Zhou, “ADMM-based problem decomposition scheme for vehicle routing problem with time windows,” Transportation Research Part B: Methodological, vol. 129, pp. 156–174, 2019.
- [21] Q. Zhang, R. M. Lusby, P. Shang, and X. Zhu, “Simultaneously re-optimizing timetables and platform schedules under planned track maintenance for a high-speed railway network,” Transportation Research Part C: Emerging Technologies, vol. 121, p. 102823, 2020.
- [22] Y. Kanno and S. Kitayama, “Alternating direction method of multipliers as a simple effective heuristic for mixed-integer nonlinear optimization,” Structural and Multidisciplinary Optimization, vol. 58, no. 3, pp. 1291–1295, 2018.
- [23] E. L. Johnson, G. L. Nemhauser, and M. W. Savelsbergh, “Progress in linear programming-based algorithms for integer programming: An exposition,” Informs Journal on Computing, vol. 12, no. 1, pp. 2–23, 2000.
- [24] J. Feldman, M. J. Wainwright, and D. R. Karger, “Using linear programming to decode binary linear codes,” IEEE Transactions on Information Theory, vol. 51, no. 3, pp. 954–972, 2005.
- [25] M. Laurent and F. Rendl, “Semidefinite programming and integer programming,” Handbooks in Operations Research and Management Science, vol. 12, pp. 393–514, 2005.
- [26] P. Wang, C. Shen, and A. Van Den Hengel, “A fast semidefinite approach to solving binary quadratic problems,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1312–1319.
- [27] A. Camisa, I. Notarnicola, and G. Notarstefano, “A primal decomposition method with suboptimality bounds for distributed mixed-integer linear programming,” in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 3391–3396.
- [28] R. Vujanic, P. M. Esfahani, P. J. Goulart, S. Mariéthoz, and M. Morari, “A decomposition method for large scale MILPs, with performance guarantees and a power system application,” Automatica, vol. 67, pp. 144–156, 2016.
- [29] P. Muts, I. Nowak, and E. M. Hendrix, “The decomposition-based outer approximation algorithm for convex mixed-integer nonlinear programming,” Journal of Global Optimization, vol. 77, no. 1, pp. 75–96, 2020.
- [30] W. Murray and K.-M. Ng, “An algorithm for nonlinear optimization problems with binary variables,” Computational Optimization and Applications, vol. 47, no. 2, pp. 257–288, 2010.
- [31] S. Lucidi and F. Rinaldi, “Exact penalty functions for nonlinear integer programming problems,” Journal of Optimization Theory and Applications, vol. 145, no. 3, pp. 479–488, 2010.
- [32] C. Yu, K. L. Teo, and Y. Bai, “An exact penalty function method for nonlinear mixed discrete programming problems,” Optimization Letters, vol. 7, no. 1, pp. 23–38, 2013.
- [33] M. J. Feizollahi, S. Ahmed, and A. Sun, “Exact augmented Lagrangian duality for mixed integer linear programming,” Mathematical Programming, vol. 161, no. 1, pp. 365–387, 2017.
- [34] C. E. Blair and R. G. Jeroslow, “The value function of a mixed integer program: I,” Discrete Mathematics, vol. 19, no. 2, pp. 121–138, 1977.
- [35] J. A. González and J. Castro, “A heuristic block coordinate descent approach for controlled tabular adjustment,” Computers & Operations Research, vol. 38, no. 12, pp. 1826–1835, 2011.
- [36] S. Jäger and A. Schöbel, “The blockwise coordinate descent method for integer programs,” Mathematical Methods of Operations Research, vol. 91, no. 2, pp. 357–381, 2020.
- [37] P. Tseng, “Convergence of a block coordinate descent method for nondifferentiable minimization,” Journal of Optimization Theory and Applications, vol. 109, no. 3, pp. 475–494, 2001.
- [38] J. Bolte, S. Sabach, and M. Teboulle, “Proximal alternating linearized minimization for nonconvex and nonsmooth problems,” Mathematical Programming, vol. 146, no. 1, pp. 459–494, 2014.
- [39] H. Attouch, J. Bolte, and B. F. Svaiter, “Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods,” Mathematical Programming, vol. 137, no. 1-2, pp. 91–129, 2013.
- [40] A. Ruszczynski, Nonlinear Optimization. Princeton University Press, 2011.
- [41] M. M. Solomon, “Algorithms for the vehicle routing and scheduling problems with time window constraints,” Operations Research, vol. 35, no. 2, pp. 254–265, 1987.
- [42] A. Caprara, M. Fischetti, and P. Toth, “Modeling and solving the train timetabling problem,” Operations Research, vol. 50, no. 5, pp. 851–861, 2002.