Towards large-scale probabilistic set covering problem: an efficient Benders decomposition approach
Abstract
In this paper, we investigate the probabilistic set covering problem (PSCP), in which the right-hand side is a random vector and the covering constraint is required to be satisfied with a prespecified probability. We consider the case arising from sample average approximation (or finite discrete distributions). We develop an effective Benders decomposition (BD) algorithm for solving large-scale PSCPs, which enjoys two key advantages: (i) the number of variables in the underlying Benders reformulation is independent of the scenario size; and (ii) the Benders cuts can be separated by an efficient combinatorial algorithm. For the special case in which the random vector is composed of several independent random blocks/subvectors, we explicitly take this block structure into consideration and develop a more efficient BD algorithm. Numerical results on instances with up to one million scenarios demonstrate the effectiveness of the proposed BD algorithms over a black-box MIP solver's branch-and-cut and automatic BD algorithms and over a state-of-the-art algorithm in the literature.
Keywords:
Benders decomposition · Large-scale optimization · Probabilistic set covering · Sample average approximation

1 Introduction
In this paper, we consider the probabilistic set covering problem (PSCP) of the form:

(PSCP)   $\min\big\{\, c^\top x \;:\; \mathbb{P}(Ax \ge \xi) \ge 1-\epsilon,\ x \in \{0,1\}^n \,\big\},$

where $A = (a_{ij})$ is a 0-1 matrix of dimension $m \times n$, $c \in \mathbb{R}^n_+$ is the cost vector of the columns, $\epsilon \in (0,1)$ is a confidence parameter chosen by the decision maker, typically near zero, and $\xi$ is a random vector taking values in $\{0,1\}^m$. (PSCP) generalizes the well-known set covering problem (SCP) [29], in which the random vector $\xi$ is replaced by the all-ones vector, and has obvious applications in location problems [8]. It also serves as a building block in various applications; see [4, 26].
(PSCP) falls into the class of probabilistic programs (also known as chance-constrained programs), which have been extensively investigated in the literature. We refer to the surveys [3, 16, 15, 24] for a detailed discussion of probabilistic programs and the various algorithms. Beraldi and Ruszczyński [8] first investigated (PSCP) and developed a specialized branch-and-bound algorithm, which involves enumerating the so-called p-efficient points introduced by [23] and solving a deterministic SCP for each p-efficient point. Saxena et al. [26] studied the special case in which the random vector $\xi$ can be decomposed into blocks/subvectors $\xi_1, \ldots, \xi_T$ formed by subsets $I_1, \ldots, I_T \subseteq [m]$, such that $\xi_t$ and $\xi_{t'}$ are independent for any two distinct $t, t' \in [T]$ (throughout the paper, for a nonnegative integer $s$, we denote $[s] := \{1, \ldots, s\}$, where $[s] = \emptyset$ if $s = 0$). Problem (PSCP) in this special case can be rewritten as

(PSCP')   $\min\big\{\, c^\top x \;:\; \prod_{t \in [T]} \mathbb{P}(A_t x \ge \xi_t) \ge 1-\epsilon,\ x \in \{0,1\}^n \,\big\},$

where $A_t$ is the submatrix of $A$ formed by the rows in $I_t$, $t \in [T]$. Saxena et al. [26] developed an equivalent mixed integer programming (MIP) formulation for (PSCP') based on the p-efficient and p-inefficient points of the distributions of the $\xi_t$ and solved it by a customized branch-and-cut (B&C) algorithm. For set covering models with uncertainty in the constraint matrix $A$, we refer to [2, 7, 12, 14, 20, 27, 28, 30] and the references therein.
One weakness of the above two approaches for solving (PSCP) is that they require an enumeration of the p-efficient (and p-inefficient) points of the distribution of $\xi$ or of the $\xi_t$, which can be computationally demanding, especially when the dimensions are large. In addition, the MIP formulation of Saxena et al. [26] may grow exponentially with the dimensions of the blocks $\xi_t$, making it capable of solving (PSCP) only when these dimensions are small. Luedtke and Ahmed [19] addressed this difficulty by using the sample average approximation (SAA) approach [19, 22], where (PSCP) is replaced by an approximation problem based on Monte Carlo samples of the random vector $\xi$. The approximation problem is again of the form (PSCP) but with a finite discrete distribution of $\xi$. As such, it can be reformulated as a deterministic MIP problem and solved by state-of-the-art MIP solvers. In general, the selection of the scenario size is crucial for the approximation quality: the larger the scenario size, the smaller the approximation error (see [3, 19, 21] for related discussions). However, since the size of the MIP formulation grows linearly with the number of scenarios, this poses a new challenge for MIP solvers when solving (PSCP) with a huge number of scenarios.
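To make the SAA idea concrete, the following minimal sketch (in Julia, the language used for the implementation in Section 4; the function and variable names are ours, not the paper's) evaluates the empirical probability that a candidate cover $x$ satisfies the covering constraint over $K$ sampled scenarios; $x$ is accepted by the sample approximation exactly when this value is at least $1-\epsilon$.

```julia
# Sketch (illustrative names): estimate the empirical probability that a candidate
# cover x satisfies A*x >= xi over the sampled scenarios xi^1, ..., xi^K.
function empirical_cover_probability(A::Matrix{Int}, scenarios::Vector{Vector{Int}},
                                     probs::Vector{Float64}, x::Vector{Int})
    cover = A * x                    # cover[i] = number of selected columns covering row i
    prob = 0.0
    for (k, xi) in enumerate(scenarios)
        if all(cover .>= xi)         # scenario k is satisfied: A*x >= xi^k componentwise
            prob += probs[k]         # accumulate its probability p_k
        end
    end
    return prob
end
```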
The first contribution of this paper is a customized Benders decomposition (BD) algorithm that is capable of solving (PSCP) with a huge number of scenarios, due to the following two reasons. First, the number of variables in the underlying Benders reformulation is independent of the number of scenarios of the random data. Second, the Benders cuts can be separated by an efficient combinatorial algorithm. Our results show that the proposed BD algorithm is much more efficient than a black-box MIP solver's branch-and-cut and automatic BD algorithms.
The second contribution of this paper is that, for (PSCP) with the block structure of [26], i.e., problem (PSCP'), we develop a mixed integer nonlinear programming (MINLP) formulation for an alternative approximation problem that explicitly takes the block structure into consideration, and we propose a BD algorithm for it. The proposed BD algorithm is much more robust than the state-of-the-art approach in [26] in terms of solving (PSCP') with different input parameters. Moreover, we show that the new approximation problem returns almost the same solution as the traditional one (when the scenario size is large) but is much more computationally tractable with the proposed BD algorithm.
2 Mathematical formulations
Consider (PSCP) with an (arbitrary) finite discrete distribution of the random vector $\xi$, that is, there exist finitely many scenarios $\xi^1, \ldots, \xi^K \in \{0,1\}^m$ such that $\mathbb{P}(\xi = \xi^k) > 0$ for $k \in [K]$ and $\sum_{k \in [K]} \mathbb{P}(\xi = \xi^k) = 1$. We introduce, for each $i \in [m]$, a binary variable $y_i$, where $y_i = 1$ guarantees $\sum_{j \in [n]} a_{ij} x_j \ge 1$, and, for each $k \in [K]$, a binary variable $z_k$, where $z_k = 1$ guarantees $Ax \ge \xi^k$. Then (PSCP) can be formulated as the following MIP formulation [19]:

(1a)   $\min\ \sum_{j\in[n]} c_j x_j$
(1b)   $\text{s.t.}\ \sum_{j\in[n]} a_{ij} x_j \ge y_i,\ \forall\, i\in[m],$
(1c)   $y_i \ge z_k,\ \forall\, (i,k)\in E,$
(1d)   $\sum_{k\in[K]} p_k z_k \ge 1-\epsilon,$
(1e)   $x\in\{0,1\}^n,\ y\in\{0,1\}^m,\ z\in\{0,1\}^K,$

where $p_k = \mathbb{P}(\xi = \xi^k)$, $k \in [K]$, and $E := \{(i,k) \in [m] \times [K] : \xi_i^k = 1\}$. Constraints (1c)–(1e) ensure $\mathbb{P}(Ax \ge \xi) \ge 1-\epsilon$. Note that if $\xi^k = 0$ for some $k \in [K]$, we can set $z_k = 1$ and remove this variable from the formulation. Therefore, we can assume $\xi^k \neq 0$ for all $k \in [K]$, and thus every $k \in [K]$ appears in at least one pair of $E$, in the following.
For problem (PSCP') (i.e., problem (PSCP) with the block structure of $\xi$) with a finite discrete distribution of each block: $\mathbb{P}(\xi_t = \xi_t^k) = p_{tk} > 0$ for $k \in [K_t]$ and $t \in [T]$, a similar convex MINLP formulation can be presented as follows:

(2a)   $\min\ \sum_{j\in[n]} c_j x_j$
(2b)   $\text{s.t.}\ \sum_{j\in[n]} a_{ij} x_j \ge y_i,\ \forall\, i\in[m],$
(2c)   $y_i \ge z_{tk},\ \forall\, (i,k)\in E_t,\ t\in[T],$
(2d)   $\sum_{t\in[T]} \log\Big(\sum_{k\in[K_t]} p_{tk} z_{tk}\Big) \ge \log(1-\epsilon),$
(2e)   $x\in\{0,1\}^n,\ y\in\{0,1\}^m,\ z_t\in\{0,1\}^{K_t},\ \forall\, t\in[T].$

Here, $E_t := \{(i,k) : i \in I_t,\ k \in [K_t],\ (\xi_t^k)_i = 1\}$ (similarly, we assume $\xi_t^k \neq 0$ for all $t \in [T]$ and $k \in [K_t]$); the term $\sum_{k\in[K_t]} p_{tk} z_{tk}$ in (2d) represents (a lower bound on) the value of $\mathbb{P}(A_t x \ge \xi_t)$, where $\xi_t$ is the subvector of $\xi$ formed by the rows in $I_t$; and constraints (2c)–(2e) ensure that $\prod_{t\in[T]} \mathbb{P}(A_t x \ge \xi_t) \ge 1-\epsilon$.
Both problems (1) and (2) can be seen as approximations of problem (PSCP) in the context of the SAA approach. When the block number $T$ is equal to one, problems (1) and (2) are equivalent. However, when the block number is larger than one, the two problems are not equivalent in general, as the finite discrete distribution of $\xi$ obtained from Monte Carlo sampling may not have an independent block structure. In Section 4, we will perform computational experiments to compare the performance of the two approximations.
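For intuition, consider the following minimal example of this non-equivalence (ours, not from the computational setup). Let $m = 2$, $I_1 = \{1\}$, $I_2 = \{2\}$, and suppose the $K = 2$ sampled scenarios are $\xi^1 = (1,1)$ and $\xi^2 = (0,0)$, each with probability $1/2$. For a solution $x$ with $Ax = (0,0)$,

$\mathbb{P}(Ax \ge \xi) = \mathbb{P}(\xi = (0,0)) = \tfrac{1}{2}, \qquad \prod_{t\in[2]} \mathbb{P}(A_t x \ge \xi_t) = \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4},$

so with $\epsilon = 0.5$ such an $x$ is feasible for problem (1) but infeasible for problem (2); the product form implicitly treats the sampled blocks as independent even though the empirical joint distribution is perfectly correlated.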
Although problems (1) and (2) can be solved to optimality by state-of-the-art MIP/MINLP solvers, the potentially huge numbers of scenario variables and related constraints make this approach able to solve only moderate-sized instances. To alleviate the computational difficulty, the authors of [17, 18, 19] proposed preprocessing techniques to reduce the numbers of variables and constraints in problem (1). These preprocessing techniques can also be extended to problem (2). Unfortunately, the numbers of variables and constraints after preprocessing can still grow linearly with the number of scenarios, making it challenging to solve problems with a huge number of scenarios. For problem (2), Saxena et al. [26] proposed an equivalent MIP reformulation in the space of the $x$ and $y$ variables. However, the size of this MIP reformulation depends on the number of the so-called p-efficient and p-inefficient points, which generally grows exponentially with the block sizes $|I_t|$.
To overcome the above weakness, we develop an efficient BD approach in Section 3, which projects the scenario variables $z$ out of problems (1) and (2) and adds Benders cuts on the fly. This approach is based on a favorable property of the two formulations, detailed as follows. Let (MIP') (respectively, (MINLP')) be the relaxation of problem (1) (respectively, (2)) obtained by removing the integrality constraints on the variables $z$ (respectively, $z_t$, $t \in [T]$). The following proposition shows that this operation does not change the objective values of problems (1) and (2).
Proposition 1

The optimal value of problem (1) (respectively, problem (2)) is equal to the optimal value of (MIP') (respectively, (MINLP')).
Proof
Let $\nu_1$ and $\nu_2$ be the optimal values of problems (1) and (MIP'), respectively. Clearly, $\nu_2 \le \nu_1$. To prove the other direction, suppose that $(\hat{x}, \hat{y}, \hat{z})$ is an optimal solution of (MIP'). Let $\bar{z}$ be defined by $\bar{z}_k = \lceil \hat{z}_k \rceil$, $k \in [K]$. By (1c), we have $\hat{y}_i \ge \hat{z}_k$ for all $(i,k) \in E$; since $\hat{y}$ is binary, this implies $\hat{y}_i \ge \bar{z}_k$ for all $(i,k) \in E$, and together with $\sum_{k\in[K]} p_k \bar{z}_k \ge \sum_{k\in[K]} p_k \hat{z}_k \ge 1-\epsilon$, $(\hat{x}, \hat{y}, \bar{z})$ must be a feasible solution of problem (1). As a result, $\nu_1 \le c^\top \hat{x} = \nu_2$. The proof for the equivalence of problems (2) and (MINLP') is analogous. ∎
It is also easy to see that relaxing the binary variables $y$ into continuous variables in formulations (MIP') and (MINLP') does not change their optimal values. However, as noted by [19], leaving the integrality constraints on these variables can lead MIP solvers to favor branching on them and effectively improve the overall performance. Our preliminary experiments also justified this statement (for both (MIP') and (MINLP')). Due to this, we decide to keep the integrality constraints on the $y$ variables in the two formulations.
3 Benders decomposition
In this section, we first review the (generalized) BD approach for generic convex MINLPs and then apply it to formulations (1) and (2).
3.1 Benders decomposition for convex MINLPs
Here we briefly review the (generalized) BD algorithm for convex MINLPs. For more details, we refer to [6, 10, 13]. We consider a convex MINLP of the form

(3)   $\min\big\{\, f(x) \;:\; g_j(x, w) \le 0,\ \forall\, j \in [\ell],\ x \in \{0,1\}^n,\ w \in \mathbb{R}^q \,\big\},$

where $f$ and $g_j$, $j \in [\ell]$, are convex and twice differentiable functions. For simplicity of discussion, we assume that $f$ is a linear function of $x$ only. Notice that this is the case in the context of solving formulations (1) and (2).
Let $\mathcal{J}_0 \subseteq [\ell]$ and $\mathcal{J}_1 := [\ell] \setminus \mathcal{J}_0$ be a partition of the constraint index set, specified below. Problem (3) can be formulated as the following master problem in the $x$ space:

(4)   $\min\big\{\, f(x) \;:\; \phi(x) \le 0,\ x \in \{0,1\}^n \,\big\},$

where

(5)   $\phi(\hat{x}) := \min_{w}\Big\{\, \sum_{j \in \mathcal{J}_1} \rho_j\, [g_j(\hat{x}, w)]_+ \;:\; g_j(\hat{x}, w) \le 0,\ \forall\, j \in \mathcal{J}_0 \,\Big\}$

is a convex function measuring the infeasibility of the constraints $g_j$, $j \in \mathcal{J}_1$, at the point $\hat{x}$ (if $\phi(\hat{x}) > 0$). Here $[\cdot]_+ := \max\{\cdot, 0\}$, and $\rho_j > 0$, $j \in \mathcal{J}_1$, are weights that can be chosen to reduce (5) to, e.g., an $\ell_1$-norm minimization. For simplicity, we assume that $\mathcal{J}_0$ is the set of constraints that can be satisfied at all $x \in \{0,1\}^n$, and its complement $\mathcal{J}_1$ is the set of constraints that can possibly be violated at some $x \in \{0,1\}^n$. A special case is that $\mathcal{J}_1$ is a singleton, say $\mathcal{J}_1 = \{j_1\}$, as considered in the context of solving formulations (1) and (2). In this case, we can set $\rho_{j_1} = 1$ and solve the equivalent problem:

(6)   $\phi(\hat{x}) = \min_{w}\big\{\, g_{j_1}(\hat{x}, w) \;:\; g_j(\hat{x}, w) \le 0,\ \forall\, j \in \mathcal{J}_0 \,\big\}.$
The master problem (4) can be solved by a linear programming (LP) based B&C approach in which the nonlinear constraint $\phi(x) \le 0$ is replaced by the so-called (generalized) Benders feasibility cuts. In particular, let $\hat{x}$ be a solution of the LP relaxation of the current master problem. Due to convexity, $\phi$ can be underestimated by a supporting hyperplane at $\hat{x}$, so we obtain the following Benders feasibility cut

(7)   $\phi(\hat{x}) + s^\top (x - \hat{x}) \le 0,$

where

(8)   $s = \nabla_x g_{j_1}(\hat{x}, \hat{w}) + \sum_{j \in \mathcal{J}_0} \hat{\lambda}_j\, \nabla_x g_j(\hat{x}, \hat{w}),$

and $\hat{w}$ and $\hat{\lambda}$ are optimal primal and Lagrangian dual solutions of the convex problem (6) with $x = \hat{x}$, derived using the Karush–Kuhn–Tucker (KKT) conditions; see [13] for more details.
3.2 Benders decomposition for generic PSCPs
Now, we apply the BD approach to the MIP formulation (1) of the PSCP. Based on Proposition 1, we project the $z$ variables out of the formulation and obtain the master problem

(9)   $\min\Big\{\, \sum_{j\in[n]} c_j x_j \;:\; \sum_{j\in[n]} a_{ij} x_j \ge y_i,\ \forall\, i \in [m],\ \ \phi(y) \le 0,\ \ x \in \{0,1\}^n,\ y \in \{0,1\}^m \,\Big\}.$

For a solution $\hat{y} \in [0,1]^m$, there always exists a feasible solution $z$ satisfying (1c) (e.g., $z = 0$), and thus $\phi(\hat{y})$ can be presented as follows:

(10)   $\phi(\hat{y}) = \min_{z}\Big\{\, 1-\epsilon - \sum_{k\in[K]} p_k z_k \;:\; z_k \le \hat{y}_i,\ \forall\, (i,k) \in E,\ \ 0 \le z_k \le 1,\ \forall\, k \in [K] \,\Big\}.$
Note that, as discussed at the end of Section 2, it is also possible to project the $y$ variables out of the formulation. However, based on our experience, leaving the $y$ variables (whose number is generally not large) in the master problem makes the BD algorithm converge much faster and thus improves the overall performance.
Let $\mu_{ik} \ge 0$ be the dual variable associated with the constraint $z_k \le \hat{y}_i$, $(i,k) \in E$, in problem (10). The Lagrangian function reads

(11)   $L(z, \mu) = 1-\epsilon - \sum_{k\in[K]} p_k z_k + \sum_{(i,k)\in E} \mu_{ik}\, (z_k - \hat{y}_i).$

Let $\hat{\mu}$ be an optimal dual solution. Hence $\phi(\hat{y}) = \min_{0 \le z \le 1} L(z, \hat{\mu})$, and the Benders feasibility cut (7) reduces to

(12)   $\phi(\hat{y}) - \sum_{(i,k)\in E} \hat{\mu}_{ik}\, (y_i - \hat{y}_i) \le 0.$
Next, we derive a closed formula for the Benders feasibility cut (12). Without loss of generality, we assume that the entries of $\hat{y}$ are sorted non-decreasingly, i.e., $\hat{y}_1 \le \hat{y}_2 \le \cdots \le \hat{y}_m$. Let $i(k) := \min\{ i \in [m] : (i,k) \in E \}$ for each $k \in [K]$ (which is well-defined since $\xi^k \neq 0$). Then one optimal solution of problem (10) is given by

(13)   $\hat{z}_k = \hat{y}_{i(k)},\ \forall\, k \in [K].$

Substituting $\hat{z}$ into the KKT system of problem (10), we derive an optimal dual solution as follows:

(14)   $\hat{\mu}_{ik} = p_k$ if $i = i(k)$, and $\hat{\mu}_{ik} = 0$ otherwise, for all $(i,k) \in E.$

Indeed, for each $k \in [K]$ the stationarity condition $-p_k + \sum_{i : (i,k)\in E} \hat{\mu}_{ik} = 0$ holds because only $\hat{\mu}_{i(k)k} = p_k$ is nonzero, and complementary slackness holds because $\hat{z}_k = \hat{y}_{i(k)}$ makes the corresponding constraint tight.
Plugging $\hat{z}$ and $\hat{\mu}$ into (12) and regrouping the terms, we obtain the Benders feasibility cut

(15)   $\sum_{i\in[m]} \pi_i\, y_i \ge 1-\epsilon.$

Here $\pi_i := \sum\{\, p_k : k \in [K],\ i(k) = i \,\}$ for $i \in [m]$ (with $\pi_i = 0$ if no such $k$ exists), and the cut is violated by $\hat{y}$ if and only if $\phi(\hat{y}) > 0$. Given a solution $\hat{y}$, the above procedure for the separation of Benders feasibility cuts can be implemented with a complexity of $O(m \log m + \sum_{k\in[K]} m_k)$, where $m_k$ denotes the number of nonzero entries of $\xi^k$.
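As an illustration, the following sketch (in Julia; all names are ours and the cut form follows our reading of (15)) separates a Benders feasibility cut from a fractional master solution; it sorts the entries of $\hat{y}$ once and then scans the support of each scenario.

```julia
# Sketch of the combinatorial separation of cut (15) (illustrative names).
# y_hat: LP values of y; supp[k]: rows i with xi_i^k = 1; p[k]: scenario probabilities.
function separate_benders_cut(y_hat::Vector{Float64}, supp::Vector{Vector{Int}},
                              p::Vector{Float64}, eps::Float64)
    m, K = length(y_hat), length(supp)
    order = sortperm(y_hat)          # sort rows by non-decreasing y_hat, O(m log m)
    rank = invperm(order)            # rank[i] = position of row i in the sorted order
    pi_coef = zeros(m)               # cut coefficients pi_i
    phi = 1 - eps                    # value of phi(y_hat) as defined in (10)
    for k in 1:K                     # total work O(sum_k |supp[k]|)
        ik = supp[k][1]
        for i in supp[k]             # i(k): row of scenario k with the smallest y_hat value
            rank[i] < rank[ik] && (ik = i)
        end
        pi_coef[ik] += p[k]
        phi -= p[k] * y_hat[ik]
    end
    # The cut  sum_i pi_coef[i] * y_i >= 1 - eps  is violated at y_hat iff phi > 0.
    return phi > 1e-6, pi_coef, 1 - eps
end
```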
3.3 Benders decomposition for PSCPs with block structure of $\xi$
Next, we apply the BD approach to the MINLP formulation (2) of (PSCP) with the block structure of $\xi$. Similarly, we project the $z_t$ variables out of the formulation and obtain the master problem

(16)   $\min\Big\{\, \sum_{j\in[n]} c_j x_j \;:\; \sum_{j\in[n]} a_{ij} x_j \ge y_i,\ \forall\, i \in [m],\ \ \sum_{t\in[T]} \log\big(\phi_t(y)\big) \ge \log(1-\epsilon),\ \ x \in \{0,1\}^n,\ y \in \{0,1\}^m \,\Big\},$

where, for a solution $\hat{y} \in [0,1]^m$, the values $\phi_t(\hat{y})$, $t \in [T]$, are given by

(17)   $\phi_t(\hat{y}) = \max_{z_t}\Big\{\, \sum_{k\in[K_t]} p_{tk} z_{tk} \;:\; z_{tk} \le \hat{y}_i,\ \forall\, (i,k) \in E_t,\ \ 0 \le z_{tk} \le 1,\ \forall\, k \in [K_t] \,\Big\}.$
Let $\hat{y}$ be a solution of the LP relaxation of the current master problem. Without loss of generality, for each $t \in [T]$ and $k \in [K_t]$, let $i_t(k)$ denote a row index $i \in I_t$ with $(i,k) \in E_t$ of smallest value $\hat{y}_i$. Note that $\log(\phi_t(\hat{y}))$ may not be well-defined for some $t \in [T]$. Indeed, if $\hat{y}_{i_t(k)} = 0$ for all $k \in [K_t]$, then for any feasible solution of problem (17), $z_{tk} = 0$ must hold for all $k \in [K_t]$, and thus $\phi_t(\hat{y}) = 0$. To bypass this difficulty, we may add a valid inequality to the master problem (16) that removes such solutions $\hat{y}$.
Next, we consider the case where $\hat{y}_{i_t(k)} > 0$ holds for some $k \in [K_t]$, i.e., $\log(\phi_t(\hat{y}))$ is well-defined. Let $\mu_{tik} \ge 0$ be the dual variable associated with the constraint $z_{tk} \le \hat{y}_i$, $(i,k) \in E_t$, in problem (17). The Lagrangian function for problem (17) reads

(18)   $L_t(z_t, \mu_t) = \sum_{k\in[K_t]} p_{tk} z_{tk} - \sum_{(i,k)\in E_t} \mu_{tik}\, (z_{tk} - \hat{y}_i).$

Using a similar analysis as in Section 3.2, the Benders feasibility cut (7) can be written as

(19)   $\sum_{t\in[T]} \Big( \log\big(\phi_t(\hat{y})\big) + \frac{1}{\phi_t(\hat{y})} \sum_{(i,k)\in E_t} \hat{\mu}_{tik}\, (y_i - \hat{y}_i) \Big) \ge \log(1-\epsilon),$

where $\hat{\mu}_t$, with $\hat{\mu}_{t,i_t(k),k} = p_{tk}$ and all other entries equal to zero, is the optimal dual solution of problem (17), $t \in [T]$.
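A corresponding sketch of the block-structured separation (again in Julia, with our own names and our reading of cut (19)): for each block $t$ it computes $\phi_t(\hat{y})$ and accumulates the linearization of $\sum_t \log(\phi_t(y))$.

```julia
# Sketch of the separation of the block cut (19) (illustrative names).
# supp[t][k]: rows i in block t with (xi_t^k)_i = 1; p[t][k]: block-scenario probabilities.
argmin_row(rows, y) = rows[argmin(y[rows])]        # row in `rows` with the smallest y value

function separate_block_cut(y_hat::Vector{Float64}, supp, p, eps::Float64)
    m, T = length(y_hat), length(supp)
    coef = zeros(m)                  # coefficients of y in the linearized cut
    constant = 0.0                   # constant term, equals sum_t (log(phi_t(y_hat)) - 1)
    logsum = 0.0                     # sum_t log(phi_t(y_hat))
    for t in 1:T
        iks = [argmin_row(supp[t][k], y_hat) for k in eachindex(supp[t])]   # i_t(k)
        phi_t = sum(p[t][k] * y_hat[iks[k]] for k in eachindex(supp[t]))
        phi_t > 0 || return nothing  # log(phi_t) undefined: exclude y_hat by a valid inequality
        for k in eachindex(supp[t])
            coef[iks[k]] += p[t][k] / phi_t
        end
        constant += log(phi_t) - 1.0
        logsum += log(phi_t)
    end
    # Cut:  constant + sum_i coef[i] * y[i] >= log(1 - eps); violated iff logsum < log(1 - eps).
    return logsum < log(1 - eps) - 1e-6, coef, constant, log(1 - eps)
end
```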
4 Computational results
In this section, we present computational results to illustrate the effectiveness of the proposed BD algorithms for solving large-scale PSCPs. The proposed BD algorithms were implemented in Julia 1.7.3 using CPLEX 20.1.0. We use a branch-and-Benders-cut (B&BC) approach [25] to implement the BD algorithms of Sections 3.2 and 3.3, where the Benders feasibility cuts are added on the fly at each node of the search tree via the cut callback framework. In addition, before starting the B&BC algorithm, we apply the preprocessing techniques of [17, 18, 19] to simplify formulations (1) and (2) and follow [9, 11] in applying an iterative procedure to solve the LP relaxation of the master problem. The latter attempts to collect effective Benders feasibility cuts for the initialization of the relaxed master problem and thereby improve the overall performance. At every iteration of the procedure, the LP is solved, and the Benders feasibility cuts violated by the current LP solution are added to the LP. This loop is repeated until the dual bound improves by at most 0.01% for 100 consecutive iterations. At the end of this procedure, all violated cuts are added as static cuts to the relaxed master problem. The parameters of CPLEX were set to run the code in single-threaded mode, with a time limit of 7200 seconds and a relative MIP gap of 0%. The computational experiments were conducted on a cluster of computers equipped with Intel(R) Xeon(R) Gold 6140 CPUs @ 2.30GHz. Throughout this section, all averages are reported as shifted geometric means [1].
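The root cut loop can be sketched as follows (Julia-style pseudocode; `solve_master_lp` and `add_cut!` are hypothetical helpers standing in for the actual JuMP/CPLEX calls, and the stopping rule mirrors the 0.01%/100-iteration rule described above).

```julia
# Sketch of the iterative root cut loop used to initialize the relaxed master problem.
function root_cut_loop!(master, separate; tol = 1e-4, patience = 100)
    stall, last_bound = 0, -Inf
    while stall < patience
        y_hat, bound = solve_master_lp(master)      # hypothetical: solve current LP relaxation
        for cut in separate(y_hat)                  # separate Benders feasibility cuts at y_hat
            add_cut!(master, cut)                   # hypothetical: add a violated cut to the LP
        end
        # relative dual-bound improvement; count consecutive iterations with improvement <= 0.01%
        improve = isfinite(last_bound) ? (bound - last_bound) / max(abs(last_bound), 1.0) : Inf
        stall = improve <= tol ? stall + 1 : 0
        last_bound = bound
    end
    return master      # the collected cuts are then kept as static cuts in the B&BC master
end
```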
Following [26], we construct PSCP instances using the 60 deterministic SCP instances from [5]. We construct instances with two block sizes, namely 10 and 20, and instances that do not have a block structure of the random vector (that is, $\xi$ has only a single block). We use two probability distributions of $\xi$, called the circular and star distributions [8, 26], to construct the scenarios of $\xi$ or $\xi_t$. For the circular distribution, the $i$-th entry of the random vector $\xi$ is determined by independent Bernoulli random variables associated with adjacent rows $i$ and $(i \bmod m)+1$, whose success probabilities are uniformly chosen, where $\bmod$ denotes the modulo operator. For the star distribution, the $i$-th entry of $\xi$ is set to 1 or 0 depending on independent Poisson random variables whose rates are uniformly chosen. The distributions of the random subvectors $\xi_t$ are described in a similar manner. In our random construction procedure, the scenario size $K$ takes two values (the largest being $10^6$) and the confidence parameter $\epsilon$ is chosen from $\{0.05, 0.1\}$. In total, there are 1440 PSCP instances of problems (1) and (2) in our testbed.
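As a simplified stand-in for scenario generation (it is not the exact circular or star construction of [8, 26]; all parameter choices below are illustrative), the following sketch draws $K$ samples of a 0-1 demand vector with independent Bernoulli entries and assigns each distinct scenario its empirical probability.

```julia
# Simplified stand-in for scenario generation (illustrative, not the circular/star scheme).
using Random

function sample_scenarios(m::Int, K::Int; q = 0.05, rng = Random.default_rng())
    counts = Dict{Vector{Int},Int}()
    for _ in 1:K
        xi = Int.(rand(rng, m) .< q)            # xi_i = 1 with probability q, independently
        counts[xi] = get(counts, xi, 0) + 1
    end
    scenarios = collect(keys(counts))
    probs = [counts[s] / K for s in scenarios]  # p_k = empirical frequency of scenario k
    return scenarios, probs
end
```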
Table 1: Comparison of formulations (1) and (2), both solved by the proposed BD algorithms (the first two rows correspond to the two scenario sizes).

| | | All | Formulation (1) | | | | Formulation (2) | | | | Difference | |
| | | | S | T | G(%) | ST | S | T | G(%) | ST | ND | O(%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Scenario size | | 720 | 580 | 140.9 | 18.8 | 3.6 | 644 | 63.0 | 18.8 | 3.6 | 15 | 1.0 |
| | | 720 | 571 | 224.0 | 19.1 | 26.7 | 617 | 158.4 | 16.1 | 33.2 | 3 | 1.0 |
| Block size | 10 | 480 | 384 | 162.1 | 24.4 | 9.3 | 454 | 59.1 | 14.5 | 10.0 | 10 | 0.9 |
| | 20 | 480 | 388 | 173.7 | 24.1 | 10.1 | 443 | 61.4 | 13.6 | 9.4 | 8 | 1.0 |
| | single | 480 | 379 | 199.3 | 16.2 | 11.6 | 364 | 273.3 | 19.5 | 16.4 | 0 | 0.0 |
We first compare the performance of problems (1) and (2) when using them as Monte Carlo sampling approximations for solving PSCPs. We use the BD algorithms of Sections 3.2 and 3.3 to solve the two problems, respectively. Table 1 reports the total number of instances solved within the time limit (S), the average CPU time in seconds (T), and the average separation time in seconds (ST). For instances that cannot be solved within the time limit, we report the average relative end gap in percentage (G(%)) returned by CPLEX. To see the difference between the two problems, we also report the number of instances with different optimal values (ND) and the average relative gap between the optimal values of the two problems on these instances (O(%)). As shown in Table 1, in most cases the optimal values of problems (1) and (2) are identical, and even when the optimal values differ, the difference is usually very small. This shows that the gap between the two problems is indeed very small, especially for the cases with a huge scenario size. As for computational efficiency, for instances with multiple blocks (i.e., block sizes of 10 and 20), problem (2) is much easier to solve by the proposed BD algorithm. This is reasonable since, by exploiting the block structure of the random vector $\xi$, each separation call aggregates information from all $T$ blocks into the generated cuts, making the proposed BD algorithm for problem (2) converge much faster. In contrast, for instances with a single block, solving problem (2) is outperformed by solving problem (1).
Next, we compare the performance of the proposed BD algorithm (called BD) with CPLEX's default branch-and-cut and automatic BD algorithms (called CPX and AUTO-BD, respectively) for solving generic PSCPs with an arbitrary distribution of $\xi$ (i.e., problem (1)). The results are summarized in Table 2. From Table 2, we see that the performance of BD is much better than that of CPX. Indeed, for the large-scale instances with one million scenarios, CPX was even unable to solve the LP relaxation of problem (1) within the given time limit. Comparing AUTO-BD and BD, we observe that for the instances with the smaller scenario size, BD is outperformed by AUTO-BD by a factor of roughly 1.4 (in terms of CPU time), but for the large-scale instances with one million scenarios, BD outperforms AUTO-BD by a factor of roughly 3.
Table 2: Comparison of CPX, AUTO-BD, and the proposed BD for solving problem (1) (the two rows correspond to the smaller and larger scenario sizes, respectively).

| Scenario size | All | CPX | | | AUTO-BD | | | BD | | | |
| | | S | T | G(%) | S | T | G(%) | S | T | G(%) | ST |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 720 | 528 | 975.9 | 26.7 | 582 | 102.0 | 14.7 | 580 | 140.9 | 15.5 | 3.6 |
| | 720 | – | – | – | 578 | 700.0 | 14.3 | 571 | 224.0 | 14.9 | 26.7 |
Finally, we compare the performance of the proposed BD algorithm with the B&C algorithm of Saxena et al. [26] (denoted by B&C) for solving PSCPs with block structure of $\xi$ (i.e., problem (2)). B&C solves an MIP formulation built on the so-called p-efficient and p-inefficient points (which must be enumerated before starting the algorithm). Table 3 summarizes the performance results on instances with block sizes 10 and 20. We do not report results on instances with a single block, as the block size is too large to enumerate the p-efficient and p-inefficient points within a reasonable period of time. For B&C, we report the average number of (linear) constraints used to model the probabilistic constraint (C) and, for completeness, the CPU time spent in the enumeration phase (ET), which is not included in the time spent solving the MIP formulation. As shown in Table 3, when $\epsilon$ and the block size are small, or when $\xi$ follows the circular distribution, the number of constraints in the MIP formulation of Saxena et al. [26] is small, and thus B&C performs better than BD. However, for instances with a large $\epsilon$, a large block size, or the star distribution of $\xi$, the large number of constraints in the MIP formulation makes B&C perform worse. In contrast, as the proposed BD algorithm does not need to solve a large MIP formulation, it performs much better on these instances. Moreover, it avoids calling a potentially time-consuming enumeration procedure for the p-efficient and p-inefficient points.
Table 3: Comparison of B&C [26] and the proposed BD for solving problem (2) on instances with block sizes 10 and 20.

| | | All | B&C | | | | | BD | | | |
| | | | S | T | G(%) | C | ET | S | T | G(%) | ST |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Confidence parameter | 0.05 | 480 | 477 | 23.4 | 10.0 | 2308 | 107.9 | 454 | 57.0 | 11.5 | 9.1 |
| | 0.1 | 480 | 385 | 274.0 | 21.5 | 27874 | 156.6 | 443 | 63.7 | 14.4 | 10.4 |
| Block size | 10 | 480 | 473 | 35.1 | 13.6 | 3658 | 4.4 | 454 | 59.1 | 18.3 | 10.0 |
| | 20 | 480 | 389 | 184.8 | 21.5 | 17592 | 3183.5 | 443 | 61.4 | 13.3 | 9.4 |
| Distribution | circular | 480 | 456 | 51.2 | 15.8 | 5008 | 79.0 | 450 | 60.2 | 15.2 | 9.4 |
| | star | 480 | 406 | 127.3 | 23.3 | 12853 | 213.7 | 447 | 60.4 | 13.3 | 10.0 |
In summary, our computational results illustrate the effectiveness of the proposed BD algorithms for solving PSCPs. More specifically, for generic PSCPs with an arbitrary distribution of $\xi$, the proposed BD algorithm is much more efficient than the branch-and-cut and automatic BD algorithms of CPLEX; for PSCPs with block structure of $\xi$, the proposed BD algorithm is much more robust than the state-of-the-art approach in [26] in terms of solving (PSCP') with different input parameters.
References
- [1] Achterberg, T.: Constraint Integer Programming. Ph.D. thesis, Technical University of Berlin (2007)
- [2] Ahmed, S., Papageorgiou, D.J.: Probabilistic set covering with correlations. Operations Research 61(2), 438–452 (2013)
- [3] Ahmed, S., Shapiro, A.: Solving chance-constrained stochastic programs via sampling and integer programming. In: Chen, Z.L., Raghavan, S. (eds.) Tutorials in Operations Research: State-of-the-Art Decision-Making Tools in the Information-Intensive Age, pp. 261–269. INFORMS (2008)
- [4] Azizi, N., García, S., Irawan, C.A.: Discrete location problems with uncertainty. In: Salhi, S., Boylan, J. (eds.) The Palgrave Handbook of Operations Research, pp. 43–71. Palgrave Macmillan, Cham (2022)
- [5] Beasley, J.E.: OR-library: Distributing test problems by electronic mail. Journal of the Operational Research Society 41(11), 1069–1072 (1990)
- [6] Belotti, P., Kirches, C., Leyffer, S., Linderoth, J., Luedtke, J., Mahajan, A.: Mixed-integer nonlinear optimization. Acta Numerica 22, 1–131 (2013)
- [7] Beraldi, P., Bruni, M.E.: An exact approach for solving integer problems under probabilistic constraints with random technology matrix. Annals of Operations Research 177(1), 127–137 (2010)
- [8] Beraldi, P., Ruszczyński, A.: The probabilistic set-covering problem. Operations Research 50(6), 956–967 (2002)
- [9] Bodur, M., Luedtke, J.R.: Mixed-integer rounding enhanced Benders decomposition for multiclass service-system staffing and scheduling with arrival rate uncertainty. Management Science 63(7), 2073–2091 (2017)
- [10] Bonami, P., Kilinç, M., Linderoth, J.: Algorithms and software for convex mixed integer nonlinear programs. In: Lee, J., Leyffer, S. (eds.) Mixed Integer Nonlinear Programming, pp. 1–39. Springer New York (2012)
- [11] Fischetti, M., Ljubić, I., Sinnl, M.: Benders decomposition without separability: A computational study for capacitated facility location problems. European Journal of Operational Research 253(3), 557–569 (2016)
- [12] Fischetti, M., Monaci, M.: Cutting plane versus compact formulations for uncertain (integer) linear programs. Mathematical Programming Computation 4, 239–273 (2012)
- [13] Geoffrion, A.M.: Generalized Benders decomposition. Journal of Optimization Theory and Applications 10(4), 237–260 (1972)
- [14] Hwang, H.S.: A stochastic set-covering location model for both ameliorating and deteriorating items. Computers & Industrial Engineering 46(2), 313–319 (2004)
- [15] Küçükyavuz, S., Jiang, R.: Chance-constrained optimization under limited distributional information: A review of reformulations based on sampling and distributional robustness. EURO Journal on Computational Optimization 10, 100030 (2022)
- [16] Küçükyavuz, S., Sen, S.: An introduction to two-stage stochastic mixed-integer programming. In: Batta, R., Peng, J. (eds.) Tutorials in Operations Research: Leading Developments from INFORMS Communities, pp. 1–27. INFORMS (2017)
- [17] Lejeune, M.A.: Preprocessing techniques and column generation algorithms for stochastically efficient demand. Journal of the Operational Research Society 59(9), 1239–1252 (2008)
- [18] Lejeune, M., Noyan, N.: Mathematical programming approaches for generating p-efficient points. European Journal of Operational Research 207(2), 590–600 (2010)
- [19] Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic constraints. SIAM Journal on Optimization 19(2), 674–699 (2008)
- [20] Lutter, P., Degel, D., Büsing, C., Koster, A.M.C.A., Werners, B.: Improved handling of uncertainty and robustness in set covering problems. European Journal of Operational Research 263(1), 35–49 (2017)
- [21] Nemirovski, A., Shapiro, A.: Scenario approximations of chance constraints. In: Calafiore, G., Dabbene, F. (eds.) Probabilistic and Randomized Methods for Design under Uncertainty, pp. 3–47. Springer London (2006)
- [22] Pagnoncelli, B.K., Ahmed, S., Shapiro, A.: Sample average approximation method for chance constrained programming: Theory and applications. Journal of Optimization Theory and Applications 142(2), 399–416 (2009)
- [23] Prékopa, A.: Dual method for the solution of a one-stage stochastic programming problem with random RHS obeying a discrete probability distribution. Zeitschrift für Operations Research 34(6), 441–461 (1990)
- [24] Prékopa, A.: Probabilistic programming. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming: Handbooks in Operations Research and Management Science, vol. 10, pp. 267–351. Elsevier (2003)
- [25] Rahmaniani, R., Crainic, T.G., Gendreau, M., Rei, W.: The Benders decomposition algorithm: A literature review. European Journal of Operational Research 259(3), 801–817 (2017)
- [26] Saxena, A., Goyal, V., Lejeune, M.A.: MIP reformulations of the probabilistic set covering problem. Mathematical Programming 121(1), 1–31 (2010)
- [27] Shen, H., Jiang, R.: Chance-constrained set covering with Wasserstein ambiguity. Mathematical Programming 198(1), 621–674 (2023)
- [28] Song, Y., Luedtke, J.R.: Branch-and-cut approaches for chance-constrained formulations of reliable network design problems. Mathematical Programming Computation 5(4), 397–432 (2013)
- [29] Toregas, C., Swain, R., ReVelle, C.S., Bergman, L.: The location of emergency service facilities. Operations Research 19(6), 1363–1373 (1971)
- [30] Wu, H.H., Küçükyavuz, S.: Probabilistic partial set covering with an oracle for chance constraints. SIAM Journal on Optimization 29(1), 690–718 (2019)