
¹ School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China ({chenweikun,chenyilong,lvwei}@bit.edu.cn)
² Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China ([email protected])

Towards large-scale probabilistic set covering problem: an efficient Benders decomposition approach

Wei-Kun Chen¹, Yi-Long Chen¹, Yu-Hong Dai², Wei Lv¹
Abstract

In this paper, we investigate the probabilistic set covering problem (PSCP), in which the right-hand side is a random vector $\xi$ and the covering constraint is required to be satisfied with a prespecified probability. We consider the case arising from sample average approximation (or finite discrete distributions). We develop an effective Benders decomposition (BD) algorithm for solving large-scale PSCPs, which enjoys two key advantages: (i) the number of variables in the underlying Benders reformulation is independent of the scenario size; and (ii) the Benders cuts can be separated by an efficient combinatorial algorithm. For the special case in which $\xi$ is a combination of several independent random blocks/subvectors, we explicitly take this block structure into consideration and develop a more efficient BD algorithm. Numerical results on instances with up to one million scenarios demonstrate the effectiveness of the proposed BD algorithms over a black-box MIP solver's branch-and-cut and automatic BD algorithms and a state-of-the-art algorithm in the literature.

Keywords:
Benders decomposition · Large-scale optimization · Probabilistic set covering · Sample average approximations

1 Introduction

In this paper, we consider the probabilistic set covering problem (PSCP) of the form:

$$\min\left\{c^{\top}x \,:\, \mathbb{P}\{Ax\geq\xi\}\geq 1-\epsilon,\ x\in\{0,1\}^{n}\right\}, \tag{PSCP}$$

where $A$ is a 0-1 matrix of dimension $m\times n$, $c\in\mathbb{R}^{n}_{+}$ is the vector of column costs, $\epsilon$ is a confidence parameter chosen by the decision maker, typically near zero, and $\xi$ is a random vector taking values in $\{0,1\}^{m}$. (PSCP) generalizes the well-known set covering problem (SCP) [29], in which the random vector $\xi$ is replaced by the all-ones vector, and has obvious applications in location problems [8]. It also serves as a building block in various applications; see [4, 26].

(PSCP) falls into the class of probabilistic programs (also known as chance-constrained programs), which have been extensively investigated in the literature. We refer to the surveys [3, 16, 15, 24] for a detailed discussion of probabilistic programs and the various algorithms. Beraldi and Ruszczyński [8] first investigated (PSCP) and developed a specialized branch-and-bound algorithm, which involves enumerating the so-called $(1-\epsilon)$-efficient points introduced by [23] and solving a deterministic SCP for each $(1-\epsilon)$-efficient point. Saxena et al. [26] studied the special case in which the random vector $\xi$ can be decomposed into $T$ blocks/subvectors $\xi^{1},\ldots,\xi^{T}$ formed by subsets $\mathcal{M}^{t}\subseteq[m]$, where $\xi^{t_{1}}$ and $\xi^{t_{2}}$ are independent for any two distinct $t_{1},t_{2}\in[T]$ (throughout the paper, for a nonnegative integer $\tau$, we denote $[\tau]=\{1,\ldots,\tau\}$, where $[\tau]=\varnothing$ if $\tau=0$). Problem (PSCP) in this special case can be rewritten as

$$\min\left\{c^{\top}x \,:\, \prod_{t=1}^{T}\mathbb{P}\{A^{t}x\geq\xi^{t}\}\geq 1-\epsilon,\ x\in\{0,1\}^{n}\right\}, \tag{PSCP’}$$

where $A^{t}$ is the submatrix of $A$ formed by the rows in $\mathcal{M}^{t}$, $t\in[T]$. Saxena et al. [26] developed an equivalent mixed integer programming (MIP) formulation for (PSCP’) based on the $(1-\epsilon)$-efficient and -inefficient points of the distribution of $\xi^{t}$ and solved it by a customized branch-and-cut (B&C) algorithm. For set covering models with uncertainty in the constraint matrix $A$, we refer to [2, 7, 12, 14, 20, 27, 28, 30] and the references therein.

One weakness of the above two approaches for solving (PSCP) is that they require an enumeration of the $(1-\epsilon)$-efficient (and -inefficient) points of the distribution of $\xi$ or $\xi^{t}$, which can be computationally demanding, especially when the dimensions are large. In addition, the MIP formulation of Saxena et al. [26] may grow exponentially with the dimensions of $\xi^{t}$, making it capable of solving (PSCP) only when the dimensions of $\xi^{t}$ are small. Luedtke and Ahmed [19] addressed this difficulty by using the sample average approximation (SAA) approach [19, 22], in which (PSCP) is replaced by an approximation problem based on Monte Carlo samples of the random vector $\xi$. The approximation problem is also of the form (PSCP) but with a finite discrete distribution of $\xi$. As such, it can be reformulated as a deterministic MIP problem and solved by state-of-the-art MIP solvers. In general, the choice of the scenario size is crucial for the approximation quality: the larger the scenario size, the smaller the approximation error (see [3, 19, 21] for related discussions). However, since the size of the MIP formulation grows linearly with the number of scenarios, this poses a new challenge for MIP solvers when (PSCP) has a huge number of scenarios.
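To make the sampled chance constraint concrete, the following sketch (our own illustration, not code from the paper or from [19]; the function name and array layout are assumptions) checks feasibility of a candidate $x$ under the SAA: $x$ is feasible for the sampled problem if the probability mass of the scenarios it covers is at least $1-\epsilon$.

```python
# Illustrative sketch (ours) of the SAA feasibility check.
import numpy as np

def saa_feasible(A, x, scenarios, probs, epsilon):
    """A: (m, n) 0-1 matrix; x: (n,) 0-1 vector; scenarios: (s, m) 0-1 matrix
    whose i-th row is the sampled xi_i; probs: (s,) weights summing to 1."""
    cover = A @ x                                  # (A x)_k for each row k
    covered = (scenarios <= cover).all(axis=1)     # scenario i is covered iff A x >= xi_i
    return probs[covered].sum() >= 1.0 - epsilon   # empirical probability >= 1 - epsilon
```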

The first contribution of this paper is a customized Benders decomposition (BD) algorithm that is capable of solving (PSCP) with a huge number of scenarios, for the following two reasons. First, the number of variables in the underlying Benders reformulation is independent of the number of scenarios of the random data. Second, the Benders cuts can be separated by an efficient combinatorial algorithm. Our results show that the proposed BD algorithm is much more efficient than a black-box MIP solver's branch-and-cut and automatic BD algorithms.

The second contribution of this paper is that, for (PSCP) with a block structure of $\xi$ [26], i.e., problem (PSCP’), we develop a mixed integer nonlinear programming (MINLP) formulation for an alternative approximation problem that explicitly takes the block structure of $\xi$ into consideration, and we propose a BD algorithm for it. The proposed BD algorithm is much more robust than the state-of-the-art approach in [26] in terms of solving (PSCP’) with different input parameters. Moreover, we show that the new approximation problem returns almost the same solution as the traditional one (when the scenario size is large) but is computationally much more tractable for the proposed BD algorithm.

The rest of the paper is organized as follows. Section 2 presents the mathematical formulations for (PSCP) and (PSCP’) and establishes a favorable property of the formulations. Section 3 develops the BD algorithms for solving the two formulations. Finally, Section 4 presents the numerical results.

2 Mathematical formulations

Consider (PSCP) with an (arbitrary) finite discrete distribution of the random vector $\xi$; that is, there exist finitely many scenarios $\xi_{1},\ldots,\xi_{s}\in\{0,1\}^{m}$ such that $\mathbb{P}\{\xi=\xi_{i}\}=p_{i}$ for $i\in[s]$ and $\sum_{i=1}^{s}p_{i}=1$. We introduce, for each $k\in[m]$, a binary variable $v_{k}$, where $v_{k}=1$ guarantees $A_{k}x\geq 1$, and for each $i\in[s]$, a binary variable $z_{i}$, where $z_{i}=1$ guarantees $v\geq\xi_{i}$. Then (PSCP) can be formulated as the following MIP [19]:

$$\begin{aligned}
\min\quad & c^{\top}x && (1a)\\
\text{s.t.}\quad & Ax\geq v, && (1b)\\
& v_{k}\geq z_{i},\quad \forall\, k\in[m],\ i\in\mathcal{S}_{k}, && (1c)\\
& \sum_{i=1}^{s}p_{i}z_{i}\geq 1-\epsilon, && (1d)\\
& x\in\{0,1\}^{n},\ v\in\{0,1\}^{m},\ z\in\{0,1\}^{s}, && (1e)
\end{aligned}$$

where $\mathcal{S}_{k}=\{i\in[s] : \xi_{ik}=1\}$, $k\in[m]$. Constraints (1c)–(1e) ensure $\mathbb{P}\{v\geq\xi\}\geq 1-\epsilon$. Note that if $\xi_{i}=\boldsymbol{0}$ for some $i\in[s]$, we can set $z_{i}=1$ and remove variable $z_{i}$ from the formulation. Therefore, in the following we can assume $\xi_{i}\neq\boldsymbol{0}$ for all $i\in[s]$ and thus $\cup_{k\in[m]}\mathcal{S}_{k}=[s]$.
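For illustration, the following sketch (ours; the helper name and data layout are assumptions, not from the paper) builds the sets $\mathcal{S}_{k}$ from a 0-1 scenario matrix and applies the simple preprocessing step just described: scenarios with $\xi_{i}=\boldsymbol{0}$ are removed and their probability mass is credited to the right-hand side of (1d).

```python
# Illustrative helper (ours): build S_k and drop all-zero scenarios, whose z_i
# can be fixed to 1 as discussed above.
import numpy as np

def build_scenario_sets(scenarios, probs, epsilon):
    """scenarios: (s, m) 0-1 array of xi_i; probs: (s,) array summing to 1."""
    nonzero = scenarios.any(axis=1)
    fixed_mass = probs[~nonzero].sum()           # scenarios with xi_i = 0 (z_i fixed to 1)
    scenarios, probs = scenarios[nonzero], probs[nonzero]
    S = [np.flatnonzero(scenarios[:, k]) for k in range(scenarios.shape[1])]
    rhs = (1.0 - epsilon) - fixed_mass           # remaining mass required from nonzero scenarios
    return rhs, S, probs
```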

For problem (PSCP’) (i.e., problem (PSCP) with a block structure of $\xi$) with a finite discrete distribution of each $\xi^{t}$, namely $\mathbb{P}\{\xi^{t}=\xi^{t}_{i}\}=p_{it}$ for $i\in[s]$, $t\in[T]$, with $\sum_{i=1}^{s}p_{it}=1$, a similar convex MINLP formulation can be presented as follows:

$$\begin{aligned}
\min\quad & c^{\top}x && (2a)\\
\text{s.t.}\quad & Ax\geq v, && (2b)\\
& v_{k}\geq z_{it},\quad \forall\, t\in[T],\ k\in\mathcal{M}^{t},\ i\in\mathcal{S}^{t}_{k}, && (2c)\\
& \eta_{t}\leq \ln\Big(\sum_{i=1}^{s}p_{it}z_{it}\Big),\ \forall\, t\in[T], \qquad \sum_{t=1}^{T}\eta_{t}\geq\ln(1-\epsilon), && (2d)\\
& x\in\{0,1\}^{n},\ v\in\{0,1\}^{m},\ z\in\{0,1\}^{s\times T},\ \eta\in\mathbb{R}^{T}. && (2e)
\end{aligned}$$

Here, $\mathcal{S}^{t}_{k}=\{i\in[s] : \xi^{t}_{ik}=1\}$ (and, similarly, we assume $\cup_{k\in\mathcal{M}^{t}}\mathcal{S}^{t}_{k}=[s]$); variable $\eta_{t}$ represents the value of $\ln(\mathbb{P}\{v^{t}\geq\xi^{t}\})$, where $v^{t}$ is the subvector of $v$ formed by the rows in $\mathcal{M}^{t}$; and constraints (2c)–(2e) ensure that $\prod_{t=1}^{T}\mathbb{P}\{v^{t}\geq\xi^{t}\}\geq 1-\epsilon$.

Both problems (1) and (2) can be seen as approximations of problem (PSCP) in the context of the SAA approach. When the number of blocks $T$ equals one, problems (1) and (2) are equivalent. However, when $T$ is larger than one, the two problems are not equivalent in general, as the finite discrete distribution of $\xi$ obtained from Monte Carlo sampling may not have an independent block structure. In Section 4, we perform computational experiments to compare the performance of the two approximations.

Although problems (1) and (2) can be solved to optimality by state-of-the-art MIP/MINLP solvers, the potentially huge number of scenario variables $z$ and related constraints makes this approach able to solve only moderate-sized instances. To alleviate this computational difficulty, the authors of [17, 18, 19] proposed preprocessing techniques to reduce the numbers of variables and constraints in problem (1); these techniques can also be extended to problem (2). Unfortunately, the numbers of variables and constraints after preprocessing can still grow linearly with the number of scenarios, so solving problems with a huge number of scenarios remains challenging. For problem (2), Saxena et al. [26] proposed an equivalent MIP reformulation in the space of $x$ and $v$. However, the size of this MIP reformulation depends on the number of the so-called $(1-\epsilon)$-efficient and -inefficient points, which generally grows exponentially with the block sizes of $\xi^{t}$.

To overcome this weakness, we develop an efficient BD approach in Section 3, which projects the scenario variables $z$ out of problems (1) and (2) and adds Benders cuts on the fly. This approach is based on a favorable property of the two formulations, detailed as follows. Let (MIP’) (respectively, (MINLP’)) be the relaxation of problem (1) (respectively, (2)) obtained by removing the integrality constraints $z\in\{0,1\}^{s}$ (respectively, $z\in\{0,1\}^{s\times T}$). The following proposition shows that this operation does not change the optimal values of problems (1) and (2).

Proposition 1

Problems (1) and (2) are equivalent to (MIP’) and (MINLP’), respectively, in terms of sharing the same optimal values.

Proof

Let $o_{\rm MIP}$ and $o_{\rm MIP’}$ be the optimal values of problems (1) and (MIP’), respectively. Clearly, $o_{\rm MIP’}\leq o_{\rm MIP}$. To prove the other direction, suppose that $(x,v,z)$ is an optimal solution of (MIP’). Let $\bar{z}\in\{0,1\}^{s}$ be defined by $\bar{z}_{i}=\min_{k\in\mathcal{M}_{i}}v_{k}$, where $\mathcal{M}_{i}=\{k\in[m] : i\in\mathcal{S}_{k}\}$, $i\in[s]$. By (1c), we have $z_{i}\leq\min_{k\in\mathcal{M}_{i}}v_{k}=\bar{z}_{i}$, and thus $\sum_{i=1}^{s}p_{i}\bar{z}_{i}\geq\sum_{i=1}^{s}p_{i}z_{i}\geq 1-\epsilon$, so $(x,v,\bar{z})$ is a feasible solution of problem (1) with the same objective value. As a result, $o_{\rm MIP}\leq o_{\rm MIP’}$. The proof of the equivalence of problems (2) and (MINLP’) is analogous. ∎

It is also easy to see that relaxing the binary variables $v$ to continuous variables in $[0,1]$ in formulations (MIP’) and (MINLP’) does not change their optimal values. However, as noted in [19], keeping the integrality constraints on $v$ leads MIP solvers to favor branching on the $v$ variables, which effectively improves the overall performance. Our preliminary experiments also confirmed this observation (for both (MIP’) and (MINLP’)). For this reason, we keep the integrality constraints $v\in\{0,1\}^{m}$ in the two formulations.

3 Benders decomposition

In this section, we first review the (generalized) BD approach for generic convex MINLPs and then apply it to formulations (1) and (2).

3.1 Benders decomposition for convex MINLPs

Here we briefly review the (generalized) BD algorithm for convex MINLPs. For more details, we refer to [6, 10, 13]. We consider a convex MINLP of the form

$$\min\{f(x,y) \,:\, g_{\ell}(x,y)\leq 0,\ \forall\,\ell\in[L],\ Ax\geq b,\ x\in\mathbb{Z}^{p}\times\mathbb{R}^{n-p},\ y\in\mathbb{R}^{\tau}\}, \tag{3}$$

where $f(x,y)$ and $g_{1},\ldots,g_{L}$ are convex and twice differentiable functions. For simplicity of discussion, we assume that $f(x,y)=c^{\top}x$ is a linear function of $x$; note that this is the case in the context of solving formulations (1) and (2).

Let $\mathcal{X}=\{x\in\mathbb{Z}^{p}\times\mathbb{R}^{n-p} : Ax\geq b\}$ and $\mathcal{Y}(x)=\{y\in\mathbb{R}^{\tau} : g_{\ell}(x,y)\leq 0,\ \forall\,\ell\in[L]\}$. Problem (3) can be formulated as the following master problem in the $x$ space:

$$\min\{c^{\top}x \,:\, \Psi(x)\leq 0,\ x\in\mathcal{X}\}, \tag{4}$$

where

$$\Psi(x)=\min\Big\{\sum_{\ell\in\mathcal{L}^{\bot}}w_{\ell}r_{\ell} \,:\, g_{\ell}(x,y)\leq 0,\ \forall\,\ell\in\mathcal{L},\ g_{\ell}(x,y)\leq r_{\ell},\ r_{\ell}\geq 0,\ \forall\,\ell\in\mathcal{L}^{\bot}\Big\} \tag{5}$$

is a convex function measuring the infeasibility of the constraints $g_{\ell}(x,y)\leq 0$, $\ell\in[L]$, at point $x$ (when $\mathcal{Y}(x)=\varnothing$). Here $w_{\ell}>0$, $\ell\in\mathcal{L}^{\bot}$, are weights that can be chosen so that (5) reduces to, e.g., an $\|\cdot\|_{1}$-norm minimization. For simplicity, we assume that $\mathcal{L}\subseteq[L]$ is the set of constraints that can be satisfied at every $x\in\mathcal{X}$, and its complement $\mathcal{L}^{\bot}$ is the set of constraints that may be violated at some $x\in\mathcal{X}$. A special case is that $\mathcal{L}^{\bot}=\{\ell^{\prime}\}$ is a singleton, as in the context of solving formulations (1) and (2). In this case, we can set $w_{\ell^{\prime}}=1$ and solve the equivalent problem

$$\Psi(x)=\min\left\{g_{\ell^{\prime}}(x,y) \,:\, g_{\ell}(x,y)\leq 0,\ \forall\,\ell\in[L]\backslash\{\ell^{\prime}\}\right\}. \tag{6}$$

The master problem (4) can be solved by a linear programming (LP) based B&C approach in which the nonlinear constraint $\Psi(x)\leq 0$ is replaced by so-called (generalized) Benders feasibility cuts. In particular, let $x^{*}$ be a solution of the LP relaxation of the current master problem. By convexity, $\Psi(x)$ is underestimated by the supporting hyperplane $\Psi(x^{*})+\mu(x^{*})^{\top}(x-x^{*})$, so we obtain the following Benders feasibility cut

$$0\geq\Psi(x^{*})+\mu(x^{*})^{\top}(x-x^{*}), \tag{7}$$

where

$$\mu_{i}(x^{*})=\frac{\partial g_{\ell^{\prime}}(x^{*},y^{*})}{\partial x_{i}}+\sum_{\ell\in[L]\backslash\{\ell^{\prime}\}}u_{\ell}^{*}\frac{\partial g_{\ell}(x^{*},y^{*})}{\partial x_{i}},\quad \forall\, i\in[n], \tag{8}$$

$y^{*}$ and $u^{*}$ are optimal primal and Lagrangian dual solutions of the convex problem (6) with $x=x^{*}$, derived using the Karush-Kuhn-Tucker (KKT) conditions; see [13] for more details.

3.2 Benders decomposition for generic PSCPs

Now, we apply the BD approach to the MIP formulation (1) of the PSCP. By Proposition 1, we can project the $z$ variables out of the formulation and obtain the master problem

$$\min\{c^{\top}x \,:\, Ax\geq v,\ \Psi(v)\leq 0,\ (x,v)\in\{0,1\}^{n+m}\}. \tag{9}$$

For a solution $v^{*}\in[0,1]^{m}$, there always exists a feasible $z$ satisfying the constraints in (1c), and thus $\Psi(v^{*})$ can be written as follows:

$$\Psi(v^{*})=\min\left\{(1-\epsilon)-\sum_{i=1}^{s}p_{i}z_{i} \,:\, z_{i}\leq v^{*}_{k},\ \forall\, k\in[m],\ i\in\mathcal{S}_{k}\right\}. \tag{10}$$

Note that, as discussed at the end of Section 2, it is also possible to project the $v$ variables out of the formulation. However, in our experience, keeping the $v$ variables (whose number $m$ is generally not large) in the master problem makes the BD algorithm converge much faster and thus improves the overall performance.

Let $u_{ki}$ be the dual variable associated with the constraint $z_{i}\leq v_{k}^{*}$ in problem (10). The Lagrangian function reads

$$L(z,u)=(1-\epsilon)-\sum_{i=1}^{s}p_{i}z_{i}+\sum_{k=1}^{m}\sum_{i\in\mathcal{S}_{k}}u_{ki}(z_{i}-v^{*}_{k}). \tag{11}$$

Let $u^{*}$ be an optimal dual solution. Then $\mu_{k}(v^{*})=-\sum_{i\in\mathcal{S}_{k}}u_{ki}^{*}$, and the Benders feasibility cut (7) reduces to

$$0\geq\Psi(v^{*})-\sum_{k=1}^{m}\sum_{i\in\mathcal{S}_{k}}u^{*}_{ki}(v_{k}-v_{k}^{*}). \tag{12}$$

Next, we derive a closed-form expression for the Benders feasibility cut (12). Without loss of generality, we assume that the values $v^{*}$ are sorted non-decreasingly, i.e., $v^{*}_{1}\leq\cdots\leq v_{m}^{*}$. Let $\mathcal{M}_{i}=\{k\in[m] : i\in\mathcal{S}_{k}\}$. Then one optimal solution of problem (10) is given by

$$z^{*}_{i}=\min_{k\in\mathcal{M}_{i}}v^{*}_{k}=v^{*}_{k_{i}},\quad\text{where}~k_{i}=\min\{k : i\in\mathcal{S}_{k}\},\ \forall\, i\in[s]. \tag{13}$$

Substituting $z^{*}$ into the KKT system

$$\begin{aligned}
&\frac{\partial L(z,u)}{\partial z_{i}}=-p_{i}+\sum_{k\in\mathcal{M}_{i}}u_{ki}=0,\quad \forall\, i\in[s],\\
&u\geq 0,\quad z_{i}\leq v^{*}_{k},\quad u_{ki}(z_{i}-v^{*}_{k})=0,\quad \forall\, k\in[m],\ i\in\mathcal{S}_{k},
\end{aligned}$$

we derive an optimal dual solution as follows:

$$u^{*}_{ki}=\begin{cases}p_{i},&\text{if}\ k=k_{i};\\ 0,&\text{otherwise},\end{cases}\quad \forall\, i\in[s],\ k\in\mathcal{M}_{i}. \tag{14}$$

Plugging $\Psi(v^{*})=1-\epsilon-\sum_{i=1}^{s}p_{i}v^{*}_{k_{i}}$ and $u^{*}$ into (12) and regrouping the terms, we obtain the Benders feasibility cut

$$\sum_{k=1}^{m}p(\mathcal{S}_{k}\backslash\mathcal{S}_{[k-1]})\,v_{k}\geq 1-\epsilon. \tag{15}$$

Here $\mathcal{S}_{[k-1]}=\cup_{k^{\prime}\in[k-1]}\mathcal{S}_{k^{\prime}}$, $\mathcal{S}_{[0]}=\varnothing$, and $p(\mathcal{S}_{k}\backslash\mathcal{S}_{[k-1]})=\sum_{i\in\mathcal{S}_{k}\backslash\mathcal{S}_{[k-1]}}p_{i}$. Given a solution $v^{*}\in[0,1]^{m}$, the above separation procedure for the Benders feasibility cuts can be implemented with complexity $\mathcal{O}(m\log(m)+\sum_{k\in[m]}|\mathcal{S}_{k}|)$.
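A minimal sketch of this separation routine is given below (our own illustration with an assumed data layout, not the authors' Julia implementation). It processes the entries of $v^{*}$ in non-decreasing order, accumulates the masses $p(\mathcal{S}_{k}\backslash\mathcal{S}_{[k-1]})$, evaluates $\Psi(v^{*})$, and returns the cut (15) whenever it is violated.

```python
# Sketch (ours) of the combinatorial separation of the Benders feasibility
# cut (15); S[k] and probs can be built, e.g., by build_scenario_sets above.
import numpy as np

def separate_benders_cut(v_star, S, probs, rhs, tol=1e-6):
    """rhs is 1 - epsilon (or the reduced right-hand side after preprocessing).
    Returns (coef, rhs) for a violated cut  sum_k coef[k] * v_k >= rhs,
    or None if Psi(v_star) <= 0 (no violated cut)."""
    m, s = len(v_star), len(probs)
    coef = np.zeros(m)
    seen = np.zeros(s, dtype=bool)            # scenarios covered by rows processed so far
    psi = rhs
    for k in np.argsort(v_star):              # process rows in non-decreasing order of v*
        new = S[k][~seen[S[k]]]               # S_k \ S_[k-1] in the sorted order
        coef[k] = probs[new].sum()            # p(S_k \ S_[k-1])
        seen[new] = True
        psi -= coef[k] * v_star[k]            # Psi(v*) = rhs - sum_i p_i v*_{k_i}
    return (coef, rhs) if psi > tol else None
```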

3.3 Benders decomposition for PSCPs with block structure of $\xi$

Next, we apply the BD approach to the MINLP formulation (2) of (PSCP) with a block structure of $\xi$. Similarly, we project the $z$ variables out of the formulation and obtain the master problem

$$\min\left\{c^{\top}x \,:\, Ax\geq v,\ \sum_{t=1}^{T}\eta_{t}\geq\ln(1-\epsilon),\ \Psi_{t}(v^{t},\eta_{t})\leq 0,\ \forall\, t\in[T],\ (x,v)\in\{0,1\}^{n+m}\right\}, \tag{16}$$

where, for a solution $(v^{*},\eta^{*})\in[0,1]^{m}\times\mathbb{R}^{T}$, the functions $\Psi_{t}((v^{t})^{*},\eta_{t}^{*})$, $t\in[T]$, are given by

$$\Psi_{t}((v^{t})^{*},\eta_{t}^{*})=\min\left\{\eta^{*}_{t}-\ln\Big(\sum_{i=1}^{s}p_{it}z_{it}\Big) \,:\, z_{it}\leq v^{*}_{k},\ \forall\, k\in\mathcal{M}^{t},\ i\in\mathcal{S}^{t}_{k}\right\}. \tag{17}$$

Let $m_{t}=|\mathcal{M}^{t}|$. Without loss of generality, we assume $\mathcal{M}^{t}=[m_{t}]$ and $v_{1}^{*}\leq\cdots\leq v_{m_{t}}^{*}$. Note that $\Psi_{t}((v^{t})^{*},\eta_{t}^{*})$ may not be well defined for some $((v^{t})^{*},\eta_{t}^{*})\in[0,1]^{m_{t}}\times\mathbb{R}$. Indeed, letting $\mathcal{M}^{t}_{i}=\{k\in\mathcal{M}^{t} : i\in\mathcal{S}^{t}_{k}\}$, if $\min_{k\in\mathcal{M}^{t}_{i}}v^{*}_{k}=0$ for all $i\in[s]$, then for any feasible solution $z_{\cdot t}$ of problem (17), $z_{it}=0$ must hold for all $i\in[s]$, and thus $\Psi_{t}((v^{t})^{*},\eta_{t}^{*})=+\infty$. To bypass this difficulty, we can add the valid inequality $\sum_{k=1}^{k^{\prime}}v_{k}\geq 1$ to the master problem (16) to remove $(v^{*},\eta^{*})$, where $k^{\prime}\in[m_{t}]$ is the minimum index such that $\cup_{k=1}^{k^{\prime}}\mathcal{S}^{t}_{k}=[s]$.
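The index $k^{\prime}$ can be found in a single pass over the rows of $\mathcal{M}^{t}$, as in the following sketch (ours; the data layout is an assumption).

```python
# Sketch (ours): smallest k' such that S^t_1, ..., S^t_{k'} jointly cover all scenarios.
import numpy as np

def min_cover_index(S_t, s):
    """S_t[k-1] lists the scenarios in S^t_k, rows taken in the order of M^t = [m_t]."""
    seen = np.zeros(s, dtype=bool)
    for k, Sk in enumerate(S_t, start=1):
        seen[Sk] = True
        if seen.all():
            return k
    return None        # block t never covers all scenarios
```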

Next, we consider the case in which $\min_{k\in\mathcal{M}^{t}_{i}}v^{*}_{k}>0$ holds for some $i\in[s]$, i.e., $\Psi_{t}((v^{t})^{*},\eta_{t}^{*})<+\infty$ is well defined. Let $u_{ki}$ be the dual variable associated with the constraint $z_{it}\leq v_{k}^{*}$ in problem (17). The Lagrangian function for problem (17) reads

$$L(z_{\cdot t},u)=\eta^{*}_{t}-\ln\Big(\sum_{i=1}^{s}p_{it}z_{it}\Big)+\sum_{k\in\mathcal{M}^{t}}\sum_{i\in\mathcal{S}^{t}_{k}}u_{ki}(z_{it}-v^{*}_{k}). \tag{18}$$

Using an analysis similar to that in Section 3.2, the Benders feasibility cut (7) can be written as

$$0\geq\Psi_{t}((v^{t})^{*},\eta_{t}^{*})+(\eta_{t}-\eta_{t}^{*})-\sum_{k\in\mathcal{M}^{t}}\sum_{i\in\mathcal{S}^{t}_{k}}u^{*}_{ki}(v_{k}-v_{k}^{*}), \tag{19}$$

where $u^{*}$ is an optimal dual solution.

It is easy to see that the point $z^{*}_{\cdot t}$ defined by

$$z^{*}_{it}=\min_{k\in\mathcal{M}^{t}_{i}}v^{*}_{k}=v^{*}_{k_{i}},\quad\text{where}~k_{i}=\min\{k : i\in\mathcal{S}^{t}_{k}\},\ \forall\, i\in[s], \tag{20}$$

is optimal for problem (17). Substituting $z_{\cdot t}^{*}$ into the following KKT system

$$\begin{aligned}
&\frac{\partial L(z_{\cdot t},u)}{\partial z_{it}}=\frac{-p_{it}}{\sum_{j=1}^{s}p_{jt}z_{jt}}+\sum_{k\in\mathcal{M}^{t}_{i}}u_{ki}=0,\quad \forall\, i\in[s],\\
&u\geq 0,\quad z_{it}\leq v^{*}_{k},\quad u_{ki}(z_{it}-v^{*}_{k})=0,\quad \forall\, k\in\mathcal{M}^{t},\ i\in\mathcal{S}^{t}_{k},
\end{aligned}$$

we can see that $u^{*}$ defined by

$$u^{*}_{ki}=\begin{cases}\dfrac{p_{it}}{\delta},&\text{if}\ k=k_{i};\\ 0,&\text{otherwise},\end{cases}\quad \forall\, i\in[s],\ k\in\mathcal{M}^{t}_{i},\quad\text{where}~\delta=\sum_{i=1}^{s}p_{it}v^{*}_{k_{i}}, \tag{21}$$

is an optimal dual solution. Thus, the Benders feasibility cut (19) becomes

$$\sum_{k=1}^{m_{t}}p(\mathcal{S}^{t}_{k}\backslash\mathcal{S}^{t}_{[k-1]})\,v_{k}\geq\delta\,(\eta_{t}+1-\ln\delta), \tag{22}$$

where $\mathcal{S}^{t}_{[k-1]}=\cup_{k^{\prime}\in[k-1]}\mathcal{S}^{t}_{k^{\prime}}$, $\mathcal{S}^{t}_{[0]}=\varnothing$, and $p(\mathcal{S}^{t}_{k}\backslash\mathcal{S}^{t}_{[k-1]})=\sum_{i\in\mathcal{S}^{t}_{k}\backslash\mathcal{S}^{t}_{[k-1]}}p_{it}$.

For a solution $(v^{*},\eta^{*})\in[0,1]^{m}\times\mathbb{R}^{T}$, the Benders feasibility cuts (22) can be separated with complexity $\mathcal{O}(\sum_{t\in[T]}(m_{t}\log(m_{t})+\sum_{k\in[m_{t}]}|\mathcal{S}^{t}_{k}|))$. It is worth remarking that, unlike the BD algorithm for formulation (1), in which at most one cut is added per iteration, the BD algorithm for formulation (2) can add up to $T$ cuts per iteration.
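The per-block separation can be sketched analogously (again our own illustration, with an assumed data layout): for each block $t$ it computes $\delta=\sum_{i}p_{it}v^{*}_{k_{i}}$ in the same sorted pass, detects the degenerate case $\delta=0$ (in which the valid inequality discussed earlier in this subsection is added instead), and otherwise returns the cut (22), rewritten as $\sum_{k}p(\mathcal{S}^{t}_{k}\backslash\mathcal{S}^{t}_{[k-1]})v_{k}-\delta\,\eta_{t}\geq\delta(1-\ln\delta)$, whenever $\eta^{*}_{t}>\ln\delta$.

```python
# Sketch (ours) of the separation of cut (22) for one block t.  rows_t lists the
# row indices in M^t; S_t[j] lists the scenarios i with xi^t_{i, rows_t[j]} = 1.
import numpy as np

def separate_block_cut(v_star, eta_star_t, rows_t, S_t, probs_t, tol=1e-6):
    order = np.argsort(v_star[rows_t])          # block rows in non-decreasing order of v*
    coef = {}                                   # coefficients of v_k for k in M^t
    seen = np.zeros(len(probs_t), dtype=bool)
    delta = 0.0
    for j in order:
        k = rows_t[j]
        new = S_t[j][~seen[S_t[j]]]             # S^t_k \ S^t_[k-1]
        coef[k] = probs_t[new].sum()            # p(S^t_k \ S^t_[k-1])
        seen[new] = True
        delta += coef[k] * v_star[k]            # delta = sum_i p_it * v*_{k_i}
    if delta <= tol:                            # Psi_t = +infinity: add the valid inequality instead
        return None
    if eta_star_t <= np.log(delta) + tol:       # cut (22) is satisfied at (v*, eta*)
        return None
    # cut (22):  sum_k coef[k] * v_k  -  delta * eta_t  >=  delta * (1 - ln delta)
    return coef, delta, delta * (1.0 - np.log(delta))
```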

4 Computational results

In this section, we present computational results to illustrate the effectiveness of the proposed BD algorithms for solving large-scale PSCPs. The proposed BD algorithms were implemented in Julia 1.7.3 using CPLEX 20.1.0. We use a branch-and-Benders-cut (B&BC) approach [25] to implement the BD algorithms of Sections 3.2 and 3.3, where the Benders feasibility cuts are added on the fly at each node of the search tree via the cut callback framework. In addition, before starting the B&BC algorithm, we apply the preprocessing techniques of [17, 18, 19] to simplify formulations (1) and (2), and we follow [9, 11] in applying an iterative procedure to solve the LP relaxation of the master problem. The latter attempts to collect effective Benders feasibility cuts for the initialization of the relaxed master problem and thus to improve the overall performance. At every iteration of the procedure, the LP is solved, and the Benders feasibility cuts violated by the current LP solution are added to the LP. This loop is repeated until the dual bound increases by at most 0.01% for 100 consecutive iterations. At the end of this procedure, all violated cuts are added as static cuts to the relaxed master problem. The parameters of CPLEX were set to run the code in single-threaded mode, with a time limit of 7200 seconds and a relative MIP gap of 0%. The computational experiments were conducted on a cluster of computers equipped with Intel(R) Xeon(R) Gold 6140 CPUs @ 2.30GHz. Throughout this section, all averages are reported as shifted geometric means with a shift of 1 [1].
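The root-LP loop just described can be organized as in the following sketch (our own illustration; `solve_lp`, `separate_cuts`, and `add_cuts` are placeholders for solver-specific routines and are not part of the paper's Julia/CPLEX implementation).

```python
# Sketch (ours) of the iterative root-LP procedure described above.
def root_lp_loop(solve_lp, separate_cuts, add_cuts, rel_tol=1e-4, max_stalls=100):
    """solve_lp() -> (dual_bound, lp_solution); separate_cuts(sol) -> list of
    violated Benders feasibility cuts; add_cuts(cuts) adds them to the LP."""
    collected, stalls, best = [], 0, None
    while True:
        bound, sol = solve_lp()
        cuts = separate_cuts(sol)
        if not cuts:                       # LP relaxation of the master is solved
            break
        add_cuts(cuts)
        collected.extend(cuts)
        if best is not None and bound <= best * (1.0 + rel_tol):
            stalls += 1                    # dual bound improved by at most 0.01%
            if stalls >= max_stalls:       # 100 consecutive small improvements: stop
                break
        else:
            stalls = 0
        best = bound if best is None else max(best, bound)
    return collected                       # later added as static cuts to the master
```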

Following [26], we construct PSCP instances from the 60 deterministic SCP instances of [5]. We construct instances with two block sizes, namely 10 and 20, as well as instances that do not have a block structure of the random vector $\xi$ (that is, $\xi$ has only a single block). We use two probability distributions of $\xi$, called the circular and star distributions [8, 26], to construct scenarios $\xi_{i}$ or $\xi_{i}^{t}$. For the circular distribution, the $k$-th entry of the random vector $\xi$ is set to $\xi_{k}=\max\{Y_{k},Y_{(k\bmod m)+1}\}$, where $Y_{k}$, $k\in[m]$, are independent Bernoulli random variables with $\Pr\{Y_{k}=1\}=\alpha_{k}$, the $\alpha_{k}$ are uniformly chosen from $[0.01,0.0275]$, and $\bmod$ denotes the modulo operator. For the star distribution, the $k$-th entry of the random vector $\xi$ is set to 0 if $V_{k}\leq 1$ and to 1 otherwise, where $V_{k}=Y_{k}+Y_{m+1}$, the $Y_{k}$, $k\in[m+1]$, are independent Poisson random variables with $\Pr\{Y_{k}=\tau\}=\frac{e^{-\lambda_{k}}\lambda_{k}^{\tau}}{\tau!}$ ($\tau\in\mathbb{Z}_{+}$), and the $\lambda_{k}$ are uniformly chosen from $[0.1,0.2]$. The distributions of the random variables $\xi^{t}$ are described in a similar manner. In our random construction procedure, the scenario size $s$ and the confidence parameter $\epsilon$ are chosen from $\{10^{4},10^{6}\}$ and $\{0.05,0.1\}$, respectively. In total, there are 1440 PSCP instances of problems (1) and (2) in our testbed.
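As an illustration of the two distributions, the following sketch (our code; the use of NumPy's random generator and the function names are implementation choices, not from the paper) draws $s$ scenarios of $\xi$ for a single block.

```python
# Sketch (ours) of the circular and star scenario generators described above.
import numpy as np

def circular_scenarios(m, s, seed=0):
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(0.01, 0.0275, size=m)                  # alpha_k ~ U[0.01, 0.0275]
    Y = rng.random((s, m)) < alpha                             # Y_k ~ Bernoulli(alpha_k)
    return np.maximum(Y, np.roll(Y, -1, axis=1)).astype(int)   # xi_k = max{Y_k, Y_{(k mod m)+1}}

def star_scenarios(m, s, seed=0):
    rng = np.random.default_rng(seed)
    lam = rng.uniform(0.1, 0.2, size=m + 1)                    # lambda_k ~ U[0.1, 0.2]
    Y = rng.poisson(lam, size=(s, m + 1))                      # Y_k ~ Poisson(lambda_k)
    V = Y[:, :m] + Y[:, m:]                                    # V_k = Y_k + Y_{m+1}
    return (V > 1).astype(int)                                 # xi_k = 1 iff V_k > 1
```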

Table 1: Performance comparison of solving problems (1) and (2).
| | | All | S (1) | T (1) | G(%) (1) | ST (1) | S (2) | T (2) | G(%) (2) | ST (2) | ND | ΔO(%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $s$ | $10^4$ | 720 | 580 | 140.9 | 18.8 | 3.6 | 644 | 63.0 | 18.8 | 3.6 | 15 | 1.0 |
| $s$ | $10^6$ | 720 | 571 | 224.0 | 19.1 | 26.7 | 617 | 158.4 | 16.1 | 33.2 | 3 | 1.0 |
| Block size | 10 | 480 | 384 | 162.1 | 24.4 | 9.3 | 454 | 59.1 | 14.5 | 10.0 | 10 | 0.9 |
| Block size | 20 | 480 | 388 | 173.7 | 24.1 | 10.1 | 443 | 61.4 | 13.6 | 9.4 | 8 | 1.0 |
| Block size | $m$ | 480 | 379 | 199.3 | 16.2 | 11.6 | 364 | 273.3 | 19.5 | 16.4 | 0 | 0.0 |

We first compare the performance of problems (1) and (2) when using them as Monte Carlo sampling approximations for solving PSCPs. We use the BD algorithms of Sections 3.2 and 3.3 to solve the two problems. Table 1 reports the total number of instances solved within the time limit (S), the average CPU time in seconds (T), and the average separation time in seconds (ST). For instances that cannot be solved within the time limit, we report the average relative end gap in percentage (G(%)) returned by CPLEX. To see the difference between the two problems, we report the number of instances with different optimal values (ND) and the gap between the optimal values of these instances, defined by $\Delta\text{O}(\%)=100\times\frac{|o_{1}-o_{2}|}{\max\{o_{1},o_{2}\}}$, where $o_{1}$ and $o_{2}$ denote the optimal values of the two problems. As shown in Table 1, in most cases the optimal values of problems (1) and (2) are identical, and even when they differ, the difference is usually very small. This shows that the gap between the two problems is indeed very small, especially for the cases with a huge scenario size. As for computational efficiency, for instances with multiple blocks (block sizes of 10 and 20), problem (2) is much easier to solve by the proposed BD algorithm. This is reasonable because, by exploiting the block structure of the random vector $\xi$, up to $T$ cuts can be constructed in a single iteration, making the proposed BD algorithm for problem (2) converge much faster. In contrast, for instances with a single block, solving problem (2) is outperformed by solving problem (1).

Next, we compare the performance of the proposed BD algorithm (called BD) with CPLEX's default branch-and-cut and automatic BD algorithms (called CPX and AUTO-BD, respectively) for solving generic PSCPs with an arbitrary distribution of $\xi$ (i.e., problem (1)). The results are summarized in Table 2. From Table 2, we see that the performance of BD is much better than that of CPX. Indeed, for large-scale instances with $10^{6}$ scenarios, CPX was not even able to solve the LP relaxation of problem (1) within the given time limit. Comparing AUTO-BD and BD, we observe that for small-scale instances with $10^{4}$ scenarios, BD is outperformed by AUTO-BD by a factor of 1.4 (in terms of CPU time), but for large-scale instances with $10^{6}$ scenarios, BD outperforms AUTO-BD by a factor of 3.1.

Table 2: Performance comparison of CPX, AUTO-BD, and BD on problem (1). "–" means that CPX was unable to find a feasible solution within the time limit.

| $s$ | All | S (CPX) | T (CPX) | G(%) (CPX) | S (AUTO-BD) | T (AUTO-BD) | G(%) (AUTO-BD) | S (BD) | T (BD) | G(%) (BD) | ST (BD) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $10^4$ | 720 | 528 | 975.9 | 26.7 | 582 | 102.0 | 14.7 | 580 | 140.9 | 15.5 | 3.6 |
| $10^6$ | 720 | – | – | – | 578 | 700.0 | 14.3 | 571 | 224.0 | 14.9 | 26.7 |

Finally, we compare the performance of the proposed BD algorithm with the B&C algorithm of Saxena et al. [26] (denoted by B&C) for solving PSCPs with a block structure of $\xi$ (i.e., problem (2)). B&C solves an MIP formulation built on the so-called $(1-\epsilon)$-efficient and -inefficient points (which must be enumerated before starting the algorithm). Table 3 summarizes the performance results on instances with block sizes 10 and 20. We do not report results on instances with a single block, as the block size is too large to enumerate the $(1-\epsilon)$-efficient and -inefficient points within a reasonable period of time. For B&C, we report the average number of (linear) constraints used to model the probabilistic constraint (C); for completeness, we also report the CPU time spent in the enumeration phase (ET), which is not included in the time spent solving the MIP formulation. As shown in Table 3, when $\epsilon$ and the block size are small or $\xi^{t}$ follows the circular distribution, the number of constraints in the MIP formulation of Saxena et al. [26] is small, and thus B&C performs better than BD. However, for instances with a large $\epsilon$, a large block size, or the star distribution of $\xi^{t}$, the large number of constraints in the MIP formulation makes B&C perform worse. In contrast, as the proposed BD algorithm does not need to solve a large MIP formulation, it performs much better on these instances. Moreover, it avoids calling a potentially time-consuming enumeration procedure for the $(1-\epsilon)$-efficient and -inefficient points.

Table 3: Performance comparison of B&C of [26] and BD on problem (2).

| | | All | S (B&C) | T (B&C) | G(%) (B&C) | C (B&C) | ET (B&C) | S (BD) | T (BD) | G(%) (BD) | ST (BD) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\epsilon$ | 0.05 | 480 | 477 | 23.4 | 10.0 | 2308 | 107.9 | 454 | 57.0 | 11.5 | 9.1 |
| $\epsilon$ | 0.1 | 480 | 385 | 274.0 | 21.5 | 27874 | 156.6 | 443 | 63.7 | 14.4 | 10.4 |
| Block size | 10 | 480 | 473 | 35.1 | 13.6 | 3658 | 4.4 | 454 | 59.1 | 18.3 | 10.0 |
| Block size | 20 | 480 | 389 | 184.8 | 21.5 | 17592 | 3183.5 | 443 | 61.4 | 13.3 | 9.4 |
| Distribution | circular | 480 | 456 | 51.2 | 15.8 | 5008 | 79.0 | 450 | 60.2 | 15.2 | 9.4 |
| Distribution | star | 480 | 406 | 127.3 | 23.3 | 12853 | 213.7 | 447 | 60.4 | 13.3 | 10.0 |

In summary, our computational results illustrate the effectiveness of the proposed BD algorithms for solving PSCPs. More specifically, for generic PSCPs with an arbitrary distribution of $\xi$, the proposed BD algorithm is much more efficient than the branch-and-cut and automatic BD algorithms of CPLEX; for PSCPs with a block structure of $\xi$, the proposed BD algorithm is much more robust than the state-of-the-art approach of [26] in terms of solving (PSCP’) with different input parameters.

References

  • [1] Achterberg, T.: Constraint Integer Programming. Ph.D. thesis, Technical University of Berlin (2007)
  • [2] Ahmed, S., Papageorgiou, D.J.: Probabilistic set covering with correlations. Operations Research 61(2), 438–452 (2013)
  • [3] Ahmed, S., Shapiro, A.: Solving chance-constrained stochastic programs via sampling and integer programming. In: Chen, Z.L., Raghavan, S. (eds.) Tutorials in Operations Research: State-of-the-Art Decision-Making Tools in the Information-Intensive Age, pp. 261–269. INFORMS (2008)
  • [4] Azizi, N., García, S., Irawan, C.A.: Discrete location problems with uncertainty. In: Salhi, S., Boylan, J. (eds.) The Palgrave Handbook of Operations Research, pp. 43–71. Palgrave Macmillan, Cham (2022)
  • [5] Beasley, J.E.: OR-library: Distributing test problems by electronic mail. Journal of the Operational Research Society 41(11), 1069–1072 (1990)
  • [6] Belotti, P., Kirches, C., Leyffer, S., Linderoth, J., Luedtke, J., Mahajan, A.: Mixed-integer nonlinear optimization. Acta Numerica 22, 1–131 (2013)
  • [7] Beraldi, P., Bruni, M.E.: An exact approach for solving integer problems under probabilistic constraints with random technology matrix. Annals of Operations Research 177(1), 127–137 (2010)
  • [8] Beraldi, P., Ruszczyński, A.: The probabilistic set-covering problem. Operations Research 50(6), 956–967 (2002)
  • [9] Bodur, M., Luedtke, J.R.: Mixed-integer rounding enhanced Benders decomposition for multiclass service-system staffing and scheduling with arrival rate uncertainty. Management Science 63(7), 2073–2091 (2017)
  • [10] Bonami, P., Kilinç, M., Linderoth, J.: Algorithms and software for convex mixed integer nonlinear programs. In: Lee, J., Leyffer, S. (eds.) Mixed Integer Nonlinear Programming, pp. 1–39. Springer New York (2012)
  • [11] Fischetti, M., Ljubić, I., Sinnl, M.: Benders decomposition without separability: A computational study for capacitated facility location problems. European Journal of Operational Research 253(3), 557–569 (2016)
  • [12] Fischetti, M., Monaci, M.: Cutting plane versus compact formulations for uncertain (integer) linear programs. Mathematical Programming Computation 4, 239–273 (2012)
  • [13] Geoffrion, A.M.: Generalized Benders decomposition. Journal of Optimization Theory and Applications 10(4), 237–260 (1972)
  • [14] Hwang, H.S.: A stochastic set-covering location model for both ameliorating and deteriorating items. Computers & Industrial Engineering 46(2), 313–319 (2004)
  • [15] Küçükyavuz, S., Jiang, R.: Chance-constrained optimization under limited distributional information: A review of reformulations based on sampling and distributional robustness. EURO Journal on Computational Optimization 10, 100030 (2022)
  • [16] Küçükyavuz, S., Sen, S.: An introduction to two-stage stochastic mixed-integer programming. In: Batta, R., Peng, J. (eds.) Tutorials in Operations Research: Leading Developments from INFORMS Communities, pp. 1–27. INFORMS (2017)
  • [17] Lejeune, M.A.: Preprocessing techniques and column generation algorithms for stochastically efficient demand. Journal of the Operational Research Society 59(9), 1239–1252 (2008)
  • [18] Lejeune, M., Noyan, N.: Mathematical programming approaches for generating p-efficient points. European Journal of Operational Research 207(2), 590–600 (2010)
  • [19] Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic constraints. SIAM Journal on Optimization 19(2), 674–699 (2008)
  • [20] Lutter, P., Degel, D., Büsing, C., Koster, A.M.C.A., Werners, B.: Improved handling of uncertainty and robustness in set covering problems. European Journal of Operational Research 263(1), 35–49 (2017)
  • [21] Nemirovski, A., Shapiro, A.: Scenario approximations of chance constraints. In: Calafiore, G., Dabbene, F. (eds.) Probabilistic and Randomized Methods for Design under Uncertainty, pp. 3–47. Springer London (2006)
  • [22] Pagnoncelli, B.K., Ahmed, S., Shapiro, A.: Sample average approximation method for chance constrained programming: Theory and applications. Journal of Optimization Theory and Applications 142(2), 399–416 (2009)
  • [23] Prékopa, A.: Dual method for the solution of a one-stage stochastic programming problem with random RHS obeying a discrete probability distribution. Zeitschrift für Operations Research 34(6), 441–461 (1990)
  • [24] Prékopa, A.: Probabilistic programming. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming: Handbooks in Operations Research and Management Science, vol. 10, pp. 267–351. Elsevier (2003)
  • [25] Rahmaniani, R., Crainic, T.G., Gendreau, M., Rei, W.: The Benders decomposition algorithm: A literature review. European Journal of Operational Research 259(3), 801–817 (2017)
  • [26] Saxena, A., Goyal, V., Lejeune, M.A.: MIP reformulations of the probabilistic set covering problem. Mathematical Programming 121(1), 1–31 (2010)
  • [27] Shen, H., Jiang, R.: Chance-constrained set covering with Wasserstein ambiguity. Mathematical Programming 198(1), 621–674 (2023)
  • [28] Song, Y., Luedtke, J.R.: Branch-and-cut approaches for chance-constrained formulations of reliable network design problems. Mathematical Programming Computation 5(4), 397–432 (2013)
  • [29] Toregas, C., Swain, R., ReVelle, C.S., Bergman, L.: The location of emergency service facilities. Operations Research 19(6), 1363–1373 (1971)
  • [30] Wu, H.H., Küçükyavuz, S.: Probabilistic partial set covering with an oracle for chance constraints. SIAM Journal on Optimization 29(1), 690–718 (2019)