Towards large-scale probabilistic set covering problem: an efficient Benders decomposition approach
Abstract
In this paper, we investigate the probabilistic set covering problem (PSCP), in which the right-hand side is a random vector and the covering constraint is required to be satisfied with a prespecified probability. We consider the case arising from sample average approximation (or finite discrete distributions). We develop an effective Benders decomposition (BD) algorithm for solving large-scale PSCPs, which enjoys two key advantages: (i) the number of variables in the underlying Benders reformulation is independent of the scenario size; and (ii) the Benders cuts can be separated by an efficient combinatorial algorithm. For the special case in which the random vector is composed of several independent random blocks/subvectors, we explicitly take this block structure into consideration and develop a more efficient BD algorithm. Numerical results on instances with up to one million scenarios demonstrate the effectiveness of the proposed BD algorithms over a black-box MIP solver's branch-and-cut and automatic BD algorithms and over a state-of-the-art algorithm in the literature.
Keywords:
Benders decomposition · Large-scale optimization · Probabilistic set covering · Sample average approximation

1 Introduction
In this paper, we consider the probabilistic set covering problem (PSCP) of the form:

(PSCP)   $\min\big\{\, c^\top x \;:\; \mathbb{P}(Ax \ge \xi) \ge 1-\epsilon,\ x \in \{0,1\}^n \,\big\},$

where $A = (a_{ij})$ is a 0-1 matrix of dimension $m \times n$, $c \in \mathbb{R}^n_+$ is the cost vector of the columns, $\epsilon \in (0,1)$ is a confidence parameter chosen by the decision maker, typically near zero, and $\xi$ is a random vector taking values in $\{0,1\}^m$. (PSCP) generalizes the well-known set covering problem (SCP) [29], in which the random vector $\xi$ is replaced by the all-ones vector, and has obvious applications in location problems [8]. It also serves as a building block in various applications; see [4, 26].
(PSCP) falls into the class of probabilistic programs (also known as chance-constrained programs), which have been extensively investigated in the literature. We refer to the surveys [3, 16, 15, 24] for a detailed discussion of probabilistic programs and the various algorithms. Beraldi and Ruszczyński [8] first investigated (PSCP) and developed a specialized branch-and-bound algorithm, which involves enumerating the so-called p-efficient points introduced by [23] and solving a deterministic SCP for each p-efficient point. Saxena et al. [26] studied the special case in which the random vector $\xi$ can be decomposed into blocks/subvectors $\xi_1, \ldots, \xi_T$ formed by subsets $I_1, \ldots, I_T \subseteq [m]$, such that $\xi_t$ and $\xi_{t'}$ are independent for any two distinct $t, t' \in [T]$ (throughout the paper, for a nonnegative integer $s$, we denote $[s] := \{1, \ldots, s\}$, where $[s] = \emptyset$ if $s = 0$). Problem (PSCP) in this special case can be rewritten as

(PSCP')   $\min\big\{\, c^\top x \;:\; \prod_{t \in [T]} \mathbb{P}(A_t x \ge \xi_t) \ge 1-\epsilon,\ x \in \{0,1\}^n \,\big\},$

where $A_t$ is the submatrix of $A$ formed by the rows in $I_t$, $t \in [T]$. Saxena et al. [26] developed an equivalent mixed integer programming (MIP) formulation for (PSCP') based on the p-efficient and p-inefficient points of the distributions of the $\xi_t$ and solved it by a customized branch-and-cut (B&C) algorithm. For set covering models with uncertainty in the constraint matrix $A$, we refer to [2, 7, 12, 14, 20, 27, 28, 30] and the references therein.
One weakness of the above two approaches for solving (PSCP) is that they require an enumeration of the p-efficient (and p-inefficient) points of the distribution of $\xi$ or of the $\xi_t$, which can be computationally demanding, especially when the dimensions are large. In addition, the MIP formulation of Saxena et al. [26] may grow exponentially with the dimensions of the blocks $\xi_t$, making it capable of solving (PSCP) only when these dimensions are small. Luedtke and Ahmed [19] addressed this difficulty by using the sample average approximation (SAA) approach [19, 22], where (PSCP) is replaced by an approximation problem based on Monte Carlo samples of the random vector $\xi$. The approximation problem is again of the form (PSCP) but with a finite discrete distribution of $\xi$. As such, it can be reformulated as a deterministic MIP problem and solved by state-of-the-art MIP solvers. In general, the selection of the scenario size is crucial for the approximation quality: the larger the scenario size, the smaller the approximation error (see [3, 19, 21] for related discussions). However, since the size of the MIP formulation grows linearly with the number of scenarios, this poses a new challenge for MIP solvers when solving (PSCP) with a huge number of scenarios.
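To make the SAA idea concrete, the following minimal sketch (in Julia, the language used for the implementation in Section 4; the function and variable names are ours, not the paper's) evaluates the empirical probability that a candidate cover $x$ satisfies the covering constraint over $K$ sampled scenarios; $x$ is accepted by the sample approximation exactly when this value is at least $1-\epsilon$.

```julia
# Sketch (illustrative names): estimate the empirical probability that a candidate
# cover x satisfies A*x >= xi over the sampled scenarios xi^1, ..., xi^K.
function empirical_cover_probability(A::Matrix{Int}, scenarios::Vector{Vector{Int}},
                                     probs::Vector{Float64}, x::Vector{Int})
    cover = A * x                    # cover[i] = number of selected columns covering row i
    prob = 0.0
    for (k, xi) in enumerate(scenarios)
        if all(cover .>= xi)         # scenario k is satisfied: A*x >= xi^k componentwise
            prob += probs[k]         # accumulate its probability p_k
        end
    end
    return prob
end
```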
The first contribution of this paper is a customized Benders decomposition (BD) algorithm that is capable of solving (PSCP) with a huge number of scenarios, due to the following two reasons. First, the number of variables in the underlying Benders reformulation is independent of the number of scenarios of the random data. Second, the Benders cuts can be separated by an efficient combinatorial algorithm. Our results show that the proposed BD algorithm is much more efficient than a black-box MIP solver's branch-and-cut and automatic BD algorithms.
The second contribution of this paper is that, for (PSCP) with the block structure of [26], i.e., problem (PSCP'), we develop a mixed integer nonlinear programming (MINLP) formulation for an alternative approximation problem that explicitly takes the block structure into consideration, and we propose a BD algorithm for it. The proposed BD algorithm is much more robust than the state-of-the-art approach in [26] in terms of solving (PSCP') with different input parameters. Moreover, we show that the new approximation problem returns almost the same solution as the traditional one (when the scenario size is large) but is much more computationally tractable with the proposed BD algorithm.
2 Mathematical formulations
Consider (PSCP) with an (arbitrary) finite discrete distribution of the random vector $\xi$, that is, there exist finitely many scenarios $\xi^1, \ldots, \xi^K \in \{0,1\}^m$ such that $\mathbb{P}(\xi = \xi^k) > 0$ for $k \in [K]$ and $\sum_{k \in [K]} \mathbb{P}(\xi = \xi^k) = 1$. We introduce, for each $i \in [m]$, a binary variable $y_i$, where $y_i = 1$ guarantees $\sum_{j \in [n]} a_{ij} x_j \ge 1$, and, for each $k \in [K]$, a binary variable $z_k$, where $z_k = 1$ guarantees $Ax \ge \xi^k$. Then (PSCP) can be formulated as the following MIP formulation [19]:

(1a)   $\min\ \sum_{j\in[n]} c_j x_j$
(1b)   $\text{s.t.}\ \sum_{j\in[n]} a_{ij} x_j \ge y_i,\ \forall\, i\in[m],$
(1c)   $y_i \ge z_k,\ \forall\, (i,k)\in E,$
(1d)   $\sum_{k\in[K]} p_k z_k \ge 1-\epsilon,$
(1e)   $x\in\{0,1\}^n,\ y\in\{0,1\}^m,\ z\in\{0,1\}^K,$

where $p_k = \mathbb{P}(\xi = \xi^k)$, $k \in [K]$, and $E := \{(i,k) \in [m] \times [K] : \xi_i^k = 1\}$. Constraints (1c)–(1e) ensure $\mathbb{P}(Ax \ge \xi) \ge 1-\epsilon$. Note that if $\xi^k = 0$ for some $k \in [K]$, we can set $z_k = 1$ and remove this variable from the formulation. Therefore, we can assume $\xi^k \neq 0$ for all $k \in [K]$, and thus every $k \in [K]$ appears in at least one pair of $E$, in the following.
For problem (PSCP') (i.e., problem (PSCP) with the block structure of $\xi$) with a finite discrete distribution of each block: $\mathbb{P}(\xi_t = \xi_t^k) = p_{tk} > 0$ for $k \in [K_t]$ and $t \in [T]$, a similar convex MINLP formulation can be presented as follows:

(2a)   $\min\ \sum_{j\in[n]} c_j x_j$
(2b)   $\text{s.t.}\ \sum_{j\in[n]} a_{ij} x_j \ge y_i,\ \forall\, i\in[m],$
(2c)   $y_i \ge z_{tk},\ \forall\, (i,k)\in E_t,\ t\in[T],$
(2d)   $\sum_{t\in[T]} \log\Big(\sum_{k\in[K_t]} p_{tk} z_{tk}\Big) \ge \log(1-\epsilon),$
(2e)   $x\in\{0,1\}^n,\ y\in\{0,1\}^m,\ z_t\in\{0,1\}^{K_t},\ \forall\, t\in[T].$

Here, $E_t := \{(i,k) : i \in I_t,\ k \in [K_t],\ (\xi_t^k)_i = 1\}$ (similarly, we assume $\xi_t^k \neq 0$ for all $t \in [T]$ and $k \in [K_t]$); the term $\sum_{k\in[K_t]} p_{tk} z_{tk}$ in (2d) represents (a lower bound on) the value of $\mathbb{P}(A_t x \ge \xi_t)$, where $\xi_t$ is the subvector of $\xi$ formed by the rows in $I_t$; and constraints (2c)–(2e) ensure that $\prod_{t\in[T]} \mathbb{P}(A_t x \ge \xi_t) \ge 1-\epsilon$.
Both problems (1) and (2) can be seen as approximations of problem (PSCP) in the context of the SAA approach. When the block number $T$ is equal to one, problems (1) and (2) are equivalent. However, when the block number is larger than one, the two problems are not equivalent in general, as the finite discrete distribution of $\xi$ obtained from Monte Carlo sampling may not have an independent block structure. In Section 4, we will perform computational experiments to compare the performance of the two approximations.
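For intuition, consider the following minimal example of this non-equivalence (ours, not from the computational setup). Let $m = 2$, $I_1 = \{1\}$, $I_2 = \{2\}$, and suppose the $K = 2$ sampled scenarios are $\xi^1 = (1,1)$ and $\xi^2 = (0,0)$, each with probability $1/2$. For a solution $x$ with $Ax = (0,0)$,

$\mathbb{P}(Ax \ge \xi) = \mathbb{P}(\xi = (0,0)) = \tfrac{1}{2}, \qquad \prod_{t\in[2]} \mathbb{P}(A_t x \ge \xi_t) = \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4},$

so with $\epsilon = 0.5$ such an $x$ is feasible for problem (1) but infeasible for problem (2); the product form implicitly treats the sampled blocks as independent even though the empirical joint distribution is perfectly correlated.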
Although problems (1) and (2) can be solved to optimality by state-of-the-art MIP/MINLP solvers, the potentially huge numbers of scenario variables and related constraints make this approach able to solve only moderate-sized instances. To alleviate the computational difficulty, the authors of [17, 18, 19] proposed preprocessing techniques to reduce the numbers of variables and constraints in problem (1). These preprocessing techniques can also be extended to problem (2). Unfortunately, the numbers of variables and constraints after preprocessing can still grow linearly with the number of scenarios, making it challenging to solve problems with a huge number of scenarios. For problem (2), Saxena et al. [26] proposed an equivalent MIP reformulation in the space of the $x$ and $y$ variables. However, the size of this MIP reformulation depends on the number of the so-called p-efficient and p-inefficient points, which generally grows exponentially with the block sizes $|I_t|$.
To overcome the above weakness, we develop an efficient BD approach in Section 3, which projects the scenario variables $z$ out of problems (1) and (2) and adds Benders cuts on the fly. This approach is based on a favorable property of the two formulations, detailed as follows. Let (MIP') (respectively, (MINLP')) be the relaxation of problem (1) (respectively, (2)) obtained by removing the integrality constraints on the variables $z$ (respectively, $z_t$, $t \in [T]$). The following proposition shows that this operation does not change the objective values of problems (1) and (2).
Proposition 1

The optimal value of problem (1) (respectively, problem (2)) is equal to the optimal value of (MIP') (respectively, (MINLP')).
Proof
Let $\nu_1$ and $\nu_2$ be the optimal values of problems (1) and (MIP'), respectively. Clearly, $\nu_2 \le \nu_1$. To prove the other direction, suppose that $(\hat{x}, \hat{y}, \hat{z})$ is an optimal solution of (MIP'). Let $\bar{z}$ be defined by $\bar{z}_k = \lceil \hat{z}_k \rceil$, $k \in [K]$. By (1c), we have $\hat{y}_i \ge \hat{z}_k$ for all $(i,k) \in E$; since $\hat{y}$ is binary, this implies $\hat{y}_i \ge \bar{z}_k$ for all $(i,k) \in E$, and together with $\sum_{k\in[K]} p_k \bar{z}_k \ge \sum_{k\in[K]} p_k \hat{z}_k \ge 1-\epsilon$, $(\hat{x}, \hat{y}, \bar{z})$ must be a feasible solution of problem (1). As a result, $\nu_1 \le c^\top \hat{x} = \nu_2$. The proof for the equivalence of problems (2) and (MINLP') is analogous. ∎
It is also easy to see that relaxing the binary variables $y$ into continuous variables in formulations (MIP') and (MINLP') does not change their optimal values. However, as noted by [19], leaving the integrality constraints on these variables can lead MIP solvers to favor branching on them and effectively improve the overall performance. Our preliminary experiments also justified this statement (for both (MIP') and (MINLP')). Due to this, we decide to keep the integrality constraints on the $y$ variables in the two formulations.
3 Benders decomposition
In this section, we first review the (generalized) BD approach for generic convex MINLPs and then apply it to formulations (1) and (2).
3.1 Benders decomposition for convex MINLPs
Here we briefly review the (generalized) BD algorithm for convex MINLPs. For more details, we refer to [6, 10, 13]. We consider a convex MINLP of the form

(3)   $\min\big\{\, f(x) \;:\; g_j(x, w) \le 0,\ \forall\, j \in [\ell],\ x \in \{0,1\}^n,\ w \in \mathbb{R}^q \,\big\},$

where $f$ and $g_j$, $j \in [\ell]$, are convex and twice differentiable functions. For simplicity of discussion, we assume that $f$ is a linear function of $x$ only. Notice that this is the case in the context of solving formulations (1) and (2).
Let $\mathcal{J}_0 \subseteq [\ell]$ and $\mathcal{J}_1 := [\ell] \setminus \mathcal{J}_0$ be a partition of the constraint index set, specified below. Problem (3) can be formulated as the following master problem in the $x$ space:

(4)   $\min\big\{\, f(x) \;:\; \phi(x) \le 0,\ x \in \{0,1\}^n \,\big\},$

where

(5)   $\phi(\hat{x}) := \min_{w}\Big\{\, \sum_{j \in \mathcal{J}_1} \rho_j\, [g_j(\hat{x}, w)]_+ \;:\; g_j(\hat{x}, w) \le 0,\ \forall\, j \in \mathcal{J}_0 \,\Big\}$

is a convex function measuring the infeasibility of the constraints $g_j$, $j \in \mathcal{J}_1$, at the point $\hat{x}$ (if $\phi(\hat{x}) > 0$). Here $[\cdot]_+ := \max\{\cdot, 0\}$, and $\rho_j > 0$, $j \in \mathcal{J}_1$, are weights that can be chosen to reduce (5) to, e.g., an $\ell_1$-norm minimization. For simplicity, we assume that $\mathcal{J}_0$ is the set of constraints that can be satisfied at all $x \in \{0,1\}^n$, and its complement $\mathcal{J}_1$ is the set of constraints that can possibly be violated at some $x \in \{0,1\}^n$. A special case is that $\mathcal{J}_1$ is a singleton, say $\mathcal{J}_1 = \{j_1\}$, as considered in the context of solving formulations (1) and (2). In this case, we can set $\rho_{j_1} = 1$ and solve the equivalent problem:

(6)   $\phi(\hat{x}) = \min_{w}\big\{\, g_{j_1}(\hat{x}, w) \;:\; g_j(\hat{x}, w) \le 0,\ \forall\, j \in \mathcal{J}_0 \,\big\}.$
The master problem (4) can be solved by a linear programming (LP) based B&C approach in which the nonlinear constraint $\phi(x) \le 0$ is replaced by the so-called (generalized) Benders feasibility cuts. In particular, let $\hat{x}$ be a solution of the LP relaxation of the current master problem. Due to convexity, $\phi$ can be underestimated by a supporting hyperplane at $\hat{x}$, so we obtain the following Benders feasibility cut

(7)   $\phi(\hat{x}) + s^\top (x - \hat{x}) \le 0,$

where

(8)   $s = \nabla_x g_{j_1}(\hat{x}, \hat{w}) + \sum_{j \in \mathcal{J}_0} \hat{\lambda}_j\, \nabla_x g_j(\hat{x}, \hat{w}),$

and $\hat{w}$ and $\hat{\lambda}$ are optimal primal and Lagrangian dual solutions of the convex problem (6) with $x = \hat{x}$, derived using the Karush–Kuhn–Tucker (KKT) conditions; see [13] for more details.
3.2 Benders decomposition for generic PSCPs
Now, we apply the BD approach to the MIP formulation (1) of the PSCP. Based on Proposition 1, we project the $z$ variables out of the formulation and obtain the master problem

(9)   $\min\Big\{\, \sum_{j\in[n]} c_j x_j \;:\; \sum_{j\in[n]} a_{ij} x_j \ge y_i,\ \forall\, i \in [m],\ \ \phi(y) \le 0,\ \ x \in \{0,1\}^n,\ y \in \{0,1\}^m \,\Big\}.$

For a solution $\hat{y} \in [0,1]^m$, there always exists a feasible solution $z$ satisfying (1c) (e.g., $z = 0$), and thus $\phi(\hat{y})$ can be presented as follows:

(10)   $\phi(\hat{y}) = \min_{z}\Big\{\, 1-\epsilon - \sum_{k\in[K]} p_k z_k \;:\; z_k \le \hat{y}_i,\ \forall\, (i,k) \in E,\ \ 0 \le z_k \le 1,\ \forall\, k \in [K] \,\Big\}.$
Note that, as discussed at the end of Section 2, it is also possible to project the $y$ variables out of the formulation. However, based on our experience, leaving the $y$ variables (whose number is generally not large) in the master problem makes the BD algorithm converge much faster and thus improves the overall performance.
Let $\mu_{ik} \ge 0$ be the dual variable associated with the constraint $z_k \le \hat{y}_i$, $(i,k) \in E$, in problem (10). The Lagrangian function reads

(11)   $L(z, \mu) = 1-\epsilon - \sum_{k\in[K]} p_k z_k + \sum_{(i,k)\in E} \mu_{ik}\, (z_k - \hat{y}_i).$

Let $\hat{\mu}$ be an optimal dual solution. Hence $\phi(\hat{y}) = \min_{0 \le z \le 1} L(z, \hat{\mu})$, and the Benders feasibility cut (7) reduces to

(12)   $\phi(\hat{y}) - \sum_{(i,k)\in E} \hat{\mu}_{ik}\, (y_i - \hat{y}_i) \le 0.$
Next, we derive a closed formula for the Benders feasibility cut (12). Without loss of generality, we assume that the entries of $\hat{y}$ are sorted non-decreasingly, i.e., $\hat{y}_1 \le \hat{y}_2 \le \cdots \le \hat{y}_m$. Let $i(k) := \min\{ i \in [m] : (i,k) \in E \}$ for each $k \in [K]$ (which is well-defined since $\xi^k \neq 0$). Then one optimal solution of problem (10) is given by

(13)   $\hat{z}_k = \hat{y}_{i(k)},\ \forall\, k \in [K].$

Substituting $\hat{z}$ into the KKT system of problem (10), we derive an optimal dual solution as follows:

(14)   $\hat{\mu}_{ik} = p_k$ if $i = i(k)$, and $\hat{\mu}_{ik} = 0$ otherwise, for all $(i,k) \in E.$

Indeed, for each $k \in [K]$ the stationarity condition $-p_k + \sum_{i : (i,k)\in E} \hat{\mu}_{ik} = 0$ holds because only $\hat{\mu}_{i(k)k} = p_k$ is nonzero, and complementary slackness holds because $\hat{z}_k = \hat{y}_{i(k)}$ makes the corresponding constraint tight.
Plugging $\hat{z}$ and $\hat{\mu}$ into (12) and regrouping the terms, we obtain the Benders feasibility cut

(15)   $\sum_{i\in[m]} \pi_i\, y_i \ge 1-\epsilon.$

Here $\pi_i := \sum\{\, p_k : k \in [K],\ i(k) = i \,\}$ for $i \in [m]$ (with $\pi_i = 0$ if no such $k$ exists), and the cut is violated by $\hat{y}$ if and only if $\phi(\hat{y}) > 0$. Given a solution $\hat{y}$, the above procedure for the separation of Benders feasibility cuts can be implemented with a complexity of $O(m \log m + \sum_{k\in[K]} m_k)$, where $m_k$ denotes the number of nonzero entries of $\xi^k$.
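As an illustration, the following sketch (in Julia; all names are ours and the cut form follows our reading of (15)) separates a Benders feasibility cut from a fractional master solution; it sorts the entries of $\hat{y}$ once and then scans the support of each scenario.

```julia
# Sketch of the combinatorial separation of cut (15) (illustrative names).
# y_hat: LP values of y; supp[k]: rows i with xi_i^k = 1; p[k]: scenario probabilities.
function separate_benders_cut(y_hat::Vector{Float64}, supp::Vector{Vector{Int}},
                              p::Vector{Float64}, eps::Float64)
    m, K = length(y_hat), length(supp)
    order = sortperm(y_hat)          # sort rows by non-decreasing y_hat, O(m log m)
    rank = invperm(order)            # rank[i] = position of row i in the sorted order
    pi_coef = zeros(m)               # cut coefficients pi_i
    phi = 1 - eps                    # value of phi(y_hat) as defined in (10)
    for k in 1:K                     # total work O(sum_k |supp[k]|)
        ik = supp[k][1]
        for i in supp[k]             # i(k): row of scenario k with the smallest y_hat value
            rank[i] < rank[ik] && (ik = i)
        end
        pi_coef[ik] += p[k]
        phi -= p[k] * y_hat[ik]
    end
    # The cut  sum_i pi_coef[i] * y_i >= 1 - eps  is violated at y_hat iff phi > 0.
    return phi > 1e-6, pi_coef, 1 - eps
end
```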
3.3 Benders decomposition for PSCPs with block structure of $\xi$
Next, we apply the BD approach to the MINLP formulation (2) of (PSCP) with the block structure of $\xi$. Similarly, we project the $z_t$ variables out of the formulation and obtain the master problem

(16)   $\min\Big\{\, \sum_{j\in[n]} c_j x_j \;:\; \sum_{j\in[n]} a_{ij} x_j \ge y_i,\ \forall\, i \in [m],\ \ \sum_{t\in[T]} \log\big(\phi_t(y)\big) \ge \log(1-\epsilon),\ \ x \in \{0,1\}^n,\ y \in \{0,1\}^m \,\Big\},$

where, for a solution $\hat{y} \in [0,1]^m$, the values $\phi_t(\hat{y})$, $t \in [T]$, are given by

(17)   $\phi_t(\hat{y}) = \max_{z_t}\Big\{\, \sum_{k\in[K_t]} p_{tk} z_{tk} \;:\; z_{tk} \le \hat{y}_i,\ \forall\, (i,k) \in E_t,\ \ 0 \le z_{tk} \le 1,\ \forall\, k \in [K_t] \,\Big\}.$
Let $\hat{y}$ be a solution of the LP relaxation of the current master problem. Without loss of generality, for each $t \in [T]$ and $k \in [K_t]$, let $i_t(k)$ denote a row index $i \in I_t$ with $(i,k) \in E_t$ of smallest value $\hat{y}_i$. Note that $\log(\phi_t(\hat{y}))$ may not be well-defined for some $t \in [T]$. Indeed, if $\hat{y}_{i_t(k)} = 0$ for all $k \in [K_t]$, then for any feasible solution of problem (17), $z_{tk} = 0$ must hold for all $k \in [K_t]$, and thus $\phi_t(\hat{y}) = 0$. To bypass this difficulty, we may add a valid inequality to the master problem (16) that removes such solutions $\hat{y}$.
Next, we consider the case where $\hat{y}_{i_t(k)} > 0$ holds for some $k \in [K_t]$, i.e., $\log(\phi_t(\hat{y}))$ is well-defined. Let $\mu_{tik} \ge 0$ be the dual variable associated with the constraint $z_{tk} \le \hat{y}_i$, $(i,k) \in E_t$, in problem (17). The Lagrangian function for problem (17) reads

(18)   $L_t(z_t, \mu_t) = \sum_{k\in[K_t]} p_{tk} z_{tk} - \sum_{(i,k)\in E_t} \mu_{tik}\, (z_{tk} - \hat{y}_i).$

Using a similar analysis as in Section 3.2, the Benders feasibility cut (7) can be written as

(19)   $\sum_{t\in[T]} \Big( \log\big(\phi_t(\hat{y})\big) + \frac{1}{\phi_t(\hat{y})} \sum_{(i,k)\in E_t} \hat{\mu}_{tik}\, (y_i - \hat{y}_i) \Big) \ge \log(1-\epsilon),$

where $\hat{\mu}_t$, with $\hat{\mu}_{t,i_t(k),k} = p_{tk}$ and all other entries equal to zero, is the optimal dual solution of problem (17), $t \in [T]$.
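A corresponding sketch of the block-structured separation (again in Julia, with our own names and our reading of cut (19)): for each block $t$ it computes $\phi_t(\hat{y})$ and accumulates the linearization of $\sum_t \log(\phi_t(y))$.

```julia
# Sketch of the separation of the block cut (19) (illustrative names).
# supp[t][k]: rows i in block t with (xi_t^k)_i = 1; p[t][k]: block-scenario probabilities.
argmin_row(rows, y) = rows[argmin(y[rows])]        # row in `rows` with the smallest y value

function separate_block_cut(y_hat::Vector{Float64}, supp, p, eps::Float64)
    m, T = length(y_hat), length(supp)
    coef = zeros(m)                  # coefficients of y in the linearized cut
    constant = 0.0                   # constant term, equals sum_t (log(phi_t(y_hat)) - 1)
    logsum = 0.0                     # sum_t log(phi_t(y_hat))
    for t in 1:T
        iks = [argmin_row(supp[t][k], y_hat) for k in eachindex(supp[t])]   # i_t(k)
        phi_t = sum(p[t][k] * y_hat[iks[k]] for k in eachindex(supp[t]))
        phi_t > 0 || return nothing  # log(phi_t) undefined: exclude y_hat by a valid inequality
        for k in eachindex(supp[t])
            coef[iks[k]] += p[t][k] / phi_t
        end
        constant += log(phi_t) - 1.0
        logsum += log(phi_t)
    end
    # Cut:  constant + sum_i coef[i] * y[i] >= log(1 - eps); violated iff logsum < log(1 - eps).
    return logsum < log(1 - eps) - 1e-6, coef, constant, log(1 - eps)
end
```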
4 Computational results
In this section, we present computational results to illustrate the effectiveness of the proposed BD algorithms for solving large-scale PSCPs. The proposed BD algorithms were implemented in Julia 1.7.3 using CPLEX 20.1.0. We use a branch-and-Benders-cut (B&BC) approach [25] to implement the BD algorithms of Sections 3.2 and 3.3, where the Benders feasibility cuts are added on the fly at each node of the search tree via the cut callback framework. In addition, before starting the B&BC algorithm, we apply the preprocessing techniques of [17, 18, 19] to simplify formulations (1) and (2) and follow [9, 11] in applying an iterative procedure to solve the LP relaxation of the master problem. The latter attempts to collect effective Benders feasibility cuts for the initialization of the relaxed master problem and thereby improve the overall performance. At every iteration of the procedure, the LP is solved, and the Benders feasibility cuts violated by the current LP solution are added to the LP. This loop is repeated until the dual bound improves by at most 0.01% for 100 consecutive iterations. At the end of this procedure, all violated cuts are added as static cuts to the relaxed master problem. The parameters of CPLEX were set to run the code in single-threaded mode, with a time limit of 7200 seconds and a relative MIP gap of 0%. The computational experiments were conducted on a cluster of computers equipped with Intel(R) Xeon(R) Gold 6140 CPUs @ 2.30GHz. Throughout this section, all averages are reported as shifted geometric means [1].
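The root cut loop can be sketched as follows (Julia-style pseudocode; `solve_master_lp` and `add_cut!` are hypothetical helpers standing in for the actual JuMP/CPLEX calls, and the stopping rule mirrors the 0.01%/100-iteration rule described above).

```julia
# Sketch of the iterative root cut loop used to initialize the relaxed master problem.
function root_cut_loop!(master, separate; tol = 1e-4, patience = 100)
    stall, last_bound = 0, -Inf
    while stall < patience
        y_hat, bound = solve_master_lp(master)      # hypothetical: solve current LP relaxation
        for cut in separate(y_hat)                  # separate Benders feasibility cuts at y_hat
            add_cut!(master, cut)                   # hypothetical: add a violated cut to the LP
        end
        # relative dual-bound improvement; count consecutive iterations with improvement <= 0.01%
        improve = isfinite(last_bound) ? (bound - last_bound) / max(abs(last_bound), 1.0) : Inf
        stall = improve <= tol ? stall + 1 : 0
        last_bound = bound
    end
    return master      # the collected cuts are then kept as static cuts in the B&BC master
end
```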
Following [26], we construct PSCP instances using the 60 deterministic SCP instances from [5]. We construct instances with two block sizes, namely 10 and 20, and instances that do not have a block structure of the random vector (that is, $\xi$ has only a single block). We use two probability distributions of $\xi$, called the circular and star distributions [8, 26], to construct the scenarios of $\xi$ or $\xi_t$. For the circular distribution, the $i$-th entry of the random vector $\xi$ is determined by independent Bernoulli random variables associated with adjacent rows $i$ and $(i \bmod m)+1$, whose success probabilities are uniformly chosen, where $\bmod$ denotes the modulo operator. For the star distribution, the $i$-th entry of $\xi$ is set to 1 or 0 depending on independent Poisson random variables whose rates are uniformly chosen. The distributions of the random subvectors $\xi_t$ are described in a similar manner. In our random construction procedure, the scenario size $K$ takes two values (the largest being $10^6$) and the confidence parameter $\epsilon$ is chosen from $\{0.05, 0.1\}$. In total, there are 1440 PSCP instances of problems (1) and (2) in our testbed.
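As a simplified stand-in for scenario generation (it is not the exact circular or star construction of [8, 26]; all parameter choices below are illustrative), the following sketch draws $K$ samples of a 0-1 demand vector with independent Bernoulli entries and assigns each distinct scenario its empirical probability.

```julia
# Simplified stand-in for scenario generation (illustrative, not the circular/star scheme).
using Random

function sample_scenarios(m::Int, K::Int; q = 0.05, rng = Random.default_rng())
    counts = Dict{Vector{Int},Int}()
    for _ in 1:K
        xi = Int.(rand(rng, m) .< q)            # xi_i = 1 with probability q, independently
        counts[xi] = get(counts, xi, 0) + 1
    end
    scenarios = collect(keys(counts))
    probs = [counts[s] / K for s in scenarios]  # p_k = empirical frequency of scenario k
    return scenarios, probs
end
```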
Table 1: Comparison of formulations (1) and (2), both solved by the proposed BD algorithms (the first two rows correspond to the two scenario sizes).

| | | All | Formulation (1) | | | | Formulation (2) | | | | Difference | |
| | | | S | T | G(%) | ST | S | T | G(%) | ST | ND | O(%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Scenario size | | 720 | 580 | 140.9 | 18.8 | 3.6 | 644 | 63.0 | 18.8 | 3.6 | 15 | 1.0 |
| | | 720 | 571 | 224.0 | 19.1 | 26.7 | 617 | 158.4 | 16.1 | 33.2 | 3 | 1.0 |
| Block size | 10 | 480 | 384 | 162.1 | 24.4 | 9.3 | 454 | 59.1 | 14.5 | 10.0 | 10 | 0.9 |
| | 20 | 480 | 388 | 173.7 | 24.1 | 10.1 | 443 | 61.4 | 13.6 | 9.4 | 8 | 1.0 |
| | single | 480 | 379 | 199.3 | 16.2 | 11.6 | 364 | 273.3 | 19.5 | 16.4 | 0 | 0.0 |
We first compare the performance of problems (1) and (2) when using them as Monte Carlo sampling approximations for solving PSCPs. We use the BD algorithms of Sections 3.2 and 3.3 to solve the two problems, respectively. Table 1 reports the total number of instances solved within the time limit (S), the average CPU time in seconds (T), and the average separation time in seconds (ST). For instances that cannot be solved within the time limit, we report the average relative end gap in percentage (G(%)) returned by CPLEX. To see the difference between the two problems, we also report the number of instances with different optimal values (ND) and the average relative gap between the optimal values of the two problems on these instances (O(%)). As shown in Table 1, in most cases the optimal values of problems (1) and (2) are identical, and even when the optimal values differ, the difference is usually very small. This shows that the gap between the two problems is indeed very small, especially for the cases with a huge scenario size. As for computational efficiency, for instances with multiple blocks (i.e., block sizes of 10 and 20), problem (2) is much easier to solve by the proposed BD algorithm. This is reasonable since, by exploiting the block structure of the random vector $\xi$, each separation call aggregates information from all $T$ blocks into the generated cuts, making the proposed BD algorithm for problem (2) converge much faster. In contrast, for instances with a single block, solving problem (2) is outperformed by solving problem (1).
Next, we compare the performance of the proposed BD algorithm (called BD) with CPLEX's default branch-and-cut and automatic BD algorithms (called CPX and AUTO-BD, respectively) for solving generic PSCPs with an arbitrary distribution of $\xi$ (i.e., problem (1)). The results are summarized in Table 2. From Table 2, we see that the performance of BD is much better than that of CPX. Indeed, for the large-scale instances with one million scenarios, CPX was even unable to solve the LP relaxation of problem (1) within the given time limit. Comparing AUTO-BD and BD, we observe that for the instances with the smaller scenario size, BD is outperformed by AUTO-BD by a factor of roughly 1.4 (in terms of CPU time), but for the large-scale instances with one million scenarios, BD outperforms AUTO-BD by a factor of roughly 3.
Table 2: Comparison of CPX, AUTO-BD, and the proposed BD for solving problem (1) (the two rows correspond to the smaller and larger scenario sizes, respectively).

| Scenario size | All | CPX | | | AUTO-BD | | | BD | | | |
| | | S | T | G(%) | S | T | G(%) | S | T | G(%) | ST |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 720 | 528 | 975.9 | 26.7 | 582 | 102.0 | 14.7 | 580 | 140.9 | 15.5 | 3.6 |
| | 720 | – | – | – | 578 | 700.0 | 14.3 | 571 | 224.0 | 14.9 | 26.7 |
Finally, we compare the performance of the proposed BD algorithm with the B&C algorithm of Saxena et al. [26] (denoted by B&C) for solving PSCPs with block structure of $\xi$ (i.e., problem (2)). B&C solves an MIP formulation built on the so-called p-efficient and p-inefficient points (which must be enumerated before starting the algorithm). Table 3 summarizes the performance results on instances with block sizes 10 and 20. We do not report results on instances with a single block, as the block size is too large to enumerate the p-efficient and p-inefficient points within a reasonable period of time. For B&C, we report the average number of (linear) constraints used to model the probabilistic constraint (C) and, for completeness, the CPU time spent in the enumeration phase (ET), which is not included in the time spent solving the MIP formulation. As shown in Table 3, when $\epsilon$ and the block size are small, or when $\xi$ follows the circular distribution, the number of constraints in the MIP formulation of Saxena et al. [26] is small, and thus B&C performs better than BD. However, for instances with a large $\epsilon$, a large block size, or the star distribution of $\xi$, the large number of constraints in the MIP formulation makes B&C perform worse. In contrast, as the proposed BD algorithm does not need to solve a large MIP formulation, it performs much better on these instances. Moreover, it avoids calling a potentially time-consuming enumeration procedure for the p-efficient and p-inefficient points.
Table 3: Comparison of B&C [26] and the proposed BD for solving problem (2) on instances with block sizes 10 and 20.

| | | All | B&C | | | | | BD | | | |
| | | | S | T | G(%) | C | ET | S | T | G(%) | ST |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Confidence parameter | 0.05 | 480 | 477 | 23.4 | 10.0 | 2308 | 107.9 | 454 | 57.0 | 11.5 | 9.1 |
| | 0.1 | 480 | 385 | 274.0 | 21.5 | 27874 | 156.6 | 443 | 63.7 | 14.4 | 10.4 |
| Block size | 10 | 480 | 473 | 35.1 | 13.6 | 3658 | 4.4 | 454 | 59.1 | 18.3 | 10.0 |
| | 20 | 480 | 389 | 184.8 | 21.5 | 17592 | 3183.5 | 443 | 61.4 | 13.3 | 9.4 |
| Distribution | circular | 480 | 456 | 51.2 | 15.8 | 5008 | 79.0 | 450 | 60.2 | 15.2 | 9.4 |
| | star | 480 | 406 | 127.3 | 23.3 | 12853 | 213.7 | 447 | 60.4 | 13.3 | 10.0 |
In summary, our computational results illustrate the effectiveness of the proposed BD algorithms for solving PSCPs. More specifically, for generic PSCPs with an arbitrary distribution of $\xi$, the proposed BD algorithm is much more efficient than the branch-and-cut and automatic BD algorithms of CPLEX; for PSCPs with block structure of $\xi$, the proposed BD algorithm is much more robust than the state-of-the-art approach in [26] in terms of solving (PSCP') with different input parameters.
References
- [1] Achterberg, T.: Constraint Integer Programming. Ph.D. thesis, Technical University of Berlin (2007)
- [2] Ahmed, S., Papageorgiou, D.J.: Probabilistic set covering with correlations. Operations Research 61(2), 438–452 (2013)
- [3] Ahmed, S., Shapiro, A.: Solving chance-constrained stochastic programs via sampling and integer programming. In: Chen, Z.L., Raghavan, S. (eds.) Tutorials in Operations Research: State-of-the-Art Decision-Making Tools in the Information-Intensive Age, pp. 261–269. INFORMS (2008)
- [4] Azizi, N., García, S., Irawan, C.A.: Discrete location problems with uncertainty. In: Salhi, S., Boylan, J. (eds.) The Palgrave Handbook of Operations Research, pp. 43–71. Palgrave Macmillan, Cham (2022)
- [5] Beasley, J.E.: OR-library: Distributing test problems by electronic mail. Journal of the Operational Research Society 41(11), 1069–1072 (1990)
- [6] Belotti, P., Kirches, C., Leyffer, S., Linderoth, J., Luedtke, J., Mahajan, A.: Mixed-integer nonlinear optimization. Acta Numerica 22, 1–131 (2013)
- [7] Beraldi, P., Bruni, M.E.: An exact approach for solving integer problems under probabilistic constraints with random technology matrix. Annals of Operations Research 177(1), 127–137 (2010)
- [8] Beraldi, P., Ruszczyński, A.: The probabilistic set-covering problem. Operations Research 50(6), 956–967 (2002)
- [9] Bodur, M., Luedtke, J.R.: Mixed-integer rounding enhanced Benders decomposition for multiclass service-system staffing and scheduling with arrival rate uncertainty. Management Science 63(7), 2073–2091 (2017)
- [10] Bonami, P., Kilinç, M., Linderoth, J.: Algorithms and software for convex mixed integer nonlinear programs. In: Lee, J., Leyffer, S. (eds.) Mixed Integer Nonlinear Programming, pp. 1–39. Springer New York (2012)
- [11] Fischetti, M., Ljubić, I., Sinnl, M.: Benders decomposition without separability: A computational study for capacitated facility location problems. European Journal of Operational Research 253(3), 557–569 (2016)
- [12] Fischetti, M., Monaci, M.: Cutting plane versus compact formulations for uncertain (integer) linear programs. Mathematical Programming Computation 4, 239–273 (2012)
- [13] Geoffrion, A.M.: Generalized Benders decomposition. Journal of Optimization Theory and Applications 10(4), 237–260 (1972)
- [14] Hwang, H.S.: A stochastic set-covering location model for both ameliorating and deteriorating items. Computers & Industrial Engineering 46(2), 313–319 (2004)
- [15] Küçükyavuz, S., Jiang, R.: Chance-constrained optimization under limited distributional information: A review of reformulations based on sampling and distributional robustness. EURO Journal on Computational Optimization 10, 100030 (2022)
- [16] Küçükyavuz, S., Sen, S.: An introduction to two-stage stochastic mixed-integer programming. In: Batta, R., Peng, J. (eds.) Tutorials in Operations Research: Leading Developments from INFORMS Communities, pp. 1–27. INFORMS (2017)
- [17] Lejeune, M.A.: Preprocessing techniques and column generation algorithms for stochastically efficient demand. Journal of the Operational Research Society 59(9), 1239–1252 (2008)
- [18] Lejeune, M., Noyan, N.: Mathematical programming approaches for generating p-efficient points. European Journal of Operational Research 207(2), 590–600 (2010)
- [19] Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic constraints. SIAM Journal on Optimization 19(2), 674–699 (2008)
- [20] Lutter, P., Degel, D., Büsing, C., Koster, A.M.C.A., Werners, B.: Improved handling of uncertainty and robustness in set covering problems. European Journal of Operational Research 263(1), 35–49 (2017)
- [21] Nemirovski, A., Shapiro, A.: Scenario approximations of chance constraints. In: Calafiore, G., Dabbene, F. (eds.) Probabilistic and Randomized Methods for Design under Uncertainty, pp. 3–47. Springer London (2006)
- [22] Pagnoncelli, B.K., Ahmed, S., Shapiro, A.: Sample average approximation method for chance constrained programming: Theory and applications. Journal of Optimization Theory and Applications 142(2), 399–416 (2009)
- [23] Prékopa, A.: Dual method for the solution of a one-stage stochastic programming problem with random RHS obeying a discrete probability distribution. Zeitschrift für Operations Research 34(6), 441–461 (1990)
- [24] Prékopa, A.: Probabilistic programming. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming: Handbooks in Operations Research and Management Science, vol. 10, pp. 267–351. Elsevier (2003)
- [25] Rahmaniani, R., Crainic, T.G., Gendreau, M., Rei, W.: The Benders decomposition algorithm: A literature review. European Journal of Operational Research 259(3), 801–817 (2017)
- [26] Saxena, A., Goyal, V., Lejeune, M.A.: MIP reformulations of the probabilistic set covering problem. Mathematical Programming 121(1), 1–31 (2010)
- [27] Shen, H., Jiang, R.: Chance-constrained set covering with Wasserstein ambiguity. Mathematical Programming 198(1), 621–674 (2023)
- [28] Song, Y., Luedtke, J.R.: Branch-and-cut approaches for chance-constrained formulations of reliable network design problems. Mathematical Programming Computation 5(4), 397–432 (2013)
- [29] Toregas, C., Swain, R., ReVelle, C.S., Bergman, L.: The location of emergency service facilities. Operations Research 19(6), 1363–1373 (1971)
- [30] Wu, H.H., Küçükyavuz, S.: Probabilistic partial set covering with an oracle for chance constraints. SIAM Journal on Optimization 29(1), 690–718 (2019)