
On Generating Lagrangian Cuts for Two-Stage Stochastic Integer Programs

Rui Chen, James Luedtke
(Department of Industrial and Systems Engineering, University of Wisconsin-Madison
)
Abstract

We investigate new methods for generating Lagrangian cuts to solve two-stage stochastic integer programs. Lagrangian cuts can be added to a Benders reformulation, and are derived from solving single-scenario integer programming subproblems identical to those used in the nonanticipative Lagrangian dual of a stochastic integer program. While Lagrangian cuts have the potential to significantly strengthen the Benders relaxation, generating them can be computationally demanding. We investigate new techniques for generating Lagrangian cuts with the goal of obtaining methods that provide significant improvements to the Benders relaxation quickly. Computational results demonstrate that our proposed method improves the Benders relaxation significantly faster than previous methods for generating Lagrangian cuts and, when used within a branch-and-cut algorithm, significantly reduces the size of the search tree for three classes of test problems.

1 Introduction

We study methods for solving two-stage stochastic integer programs (SIPs) with general mixed-integer first-stage and second-stage variables. Two-stage stochastic programs are used to model problems with uncertain data, where a decision maker first decides the values of first-stage decision variables, then observes the values of the uncertain data, and finally decides the values of second-stage decision variables. The objective is to minimize the sum of the first-stage cost and the expected value of the second-stage cost, where the expected value is taken with respect to the distribution of the uncertain data. Each realization of the uncertain data is called a scenario. Assuming the uncertainty is modeled with a finite set of scenarios $S$, a two-stage SIP can be formulated as follows:

z_{\text{IP}}=\min_{x}\Big\{c^{\top}x+\sum_{s\in S}p_{s}Q_{s}(x):Ax\geq b,\ x\in X\Big\}, \qquad (1)

where $c\in\mathbb{R}^{n}$, and for each $s\in S$, $p_{s}$ denotes the probability of scenario $s$ and $Q_{s}:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ is the recourse function of scenario $s$, defined as

Q_{s}(x)=\min_{y}\big\{(q^{s})^{\top}y:\ W^{s}y\geq h^{s}-T^{s}x,\ y\in Y\big\}. \qquad (2)

Here $X\subseteq\mathbb{R}^{n}$ and $Y\subseteq\mathbb{R}^{n_{y}}$ denote the integrality restrictions on some or all variables of $x$ and $y$, respectively. Sign restrictions and variable bounds, if present, are assumed to be included in the constraints $Ax\geq b$ or $W^{s}y\geq h^{s}-T^{s}x$. The matrices $W^{s},T^{s}$ and vectors $q^{s},h^{s}$ are the realizations of the uncertain data associated with scenario $s\in S$. We do not assume relatively complete recourse, i.e., the recourse problem (2) may be infeasible for some $s\in S$ and $x\in X$ satisfying $Ax\geq b$, in which case $Q_{s}(x)=+\infty$ by convention.

Two-stage SIPs can also be written in the extensive form:

\min_{x,y^{s}}\ \ c^{\top}x+\sum_{s\in S}p_{s}(q^{s})^{\top}y^{s} \qquad (3)
\text{s.t.}\ \ Ax\geq b,\ x\in X,
\qquad\ \ W^{s}y^{s}\geq h^{s}-T^{s}x,\ y^{s}\in Y,\ s\in S.

From this perspective, SIPs are essentially large-scale mixed-integer programs (MIPs) with special block structure. Directly solving (3) as a MIP can be difficult when $|S|$ is large. Therefore, significant research has been devoted to decomposition methods for solving SIP problems. Benders decomposition [1, 2] and dual decomposition [3] are the two most commonly used decomposition methods for solving SIPs. In Benders decomposition, the second-stage variables are projected out of the Benders master model, and linear programming (LP) relaxations of scenario subproblems are solved to add cuts to the master model. In dual decomposition, a copy of the first-stage variables is created for each scenario, and constraints are added to require the copies to be equal to each other. A Lagrangian relaxation is then formed by relaxing these so-called nonanticipativity constraints. Solving the corresponding Lagrangian dual problem requires solving single-scenario MIPs to provide function values and supergradients of the Lagrangian relaxation. A more detailed description of dual decomposition for SIPs is given in Section 2.2. Usually dual decomposition yields a much stronger bound than Benders decomposition, but the bound takes much longer to compute. Rahmaniani et al. [4] proposed Benders dual decomposition (BDD), in which Lagrangian cuts generated by solving single-scenario MIPs identical to the subproblems in dual decomposition are added to the Benders formulation to strengthen the relaxation. Similarly, Li and Grossmann [5] developed a Benders-like decomposition algorithm that uses both Benders cuts and Lagrangian cuts for solving convex mixed 0-1 nonlinear stochastic programs. While these methods represent an interesting hybrid between the Benders and dual decomposition approaches, the time spent generating Lagrangian cuts remains a limitation.
In this paper, we extend this line of work by investigating methods to generate Lagrangian cuts more efficiently.

This work more broadly contributes to the significant body of research investigating the use of cutting planes to strengthen the Benders model. Laporte and Louveaux [6] propose integer L-shaped cuts for SIPs with pure binary first-stage variables. Sen and Higle [7] and Sen and Sherali [8] apply disjunctive programming techniques to convexify the second-stage problems for SIPs with pure binary first-stage variables. Ntaimo [9] investigates the use of Fenchel cuts to strengthen the subproblem relaxations. Gade et al. [10] and Zhang and Küçükyavuz [11] demonstrate how Gomory cuts can be used within a Benders decomposition method for solving SIPs with pure integer first-stage and second-stage variables, leading to a finitely convergent algorithm. Qi and Sen [12] use cuts valid for multi-term disjunctions introduced in [13] to derive an algorithm for SIPs with mixed-integer recourse. van der Laan and Romeijnders [14] propose a new class of cuts, scaled cuts, which integrates previously generated Lagrangian cuts into a cut-generation model, again leading to a finitely convergent cutting-plane method for SIPs with mixed-integer recourse. Bodur et al. [15] compare the strength of split cuts derived in the Benders master problem space to those added in the scenario LP relaxation space. Zou et al. [16] propose strengthened Benders cuts and Lagrangian cuts to solve pure binary multistage SIPs.

There are also other approaches for solving SIPs that do not rely on a Benders model. Lulli and Sen [17] develop a column generation based algorithm for solving multistage SIPs. Lubin et al. [18] apply proximal bundle methods to parallelize the dual decomposition algorithm. Boland et al. [19] and Guo et al. [20] investigate the use of progressive hedging to calculate the Lagrangian dual bound. Kim and Dandurand [21] propose a new branching method for dual decomposition of SIPs. Kim et al. [22] apply asynchronous trust-region methods to solve the Lagrangian dual used in dual decomposition.

The goal of this paper is to develop an effective way of generating Lagrangian cuts to add to a Benders model. The main contributions of our work are summarized as follows.

  1.

    We propose a normalization for generating Lagrangian cuts similar to the one used in [23] for separating Benders cuts. This normalization can be used to construct a Lagrangian cut separation problem different from the one used in [16] and [4].

  2.

    We propose methods for accelerating the generation of Lagrangian cuts, including solving the cut generation problem in a restricted subspace, and using a MIP approximation to identify a promising restricted subspace. Numerical results indicate that these approaches lead to significantly faster bound improvement from Lagrangian cuts.

  3.

    We conduct an extensive numerical study on three classes of two-stage SIPs. We compare the impact of different strategies for generating Lagrangian cuts. Computational results are also given for using our method within branch-and-cut as an exact solution method. We find that this method has potential to outperform both a pure Benders approach and a pure dual decomposition approach.

Finally, we note that while we focus on new techniques for generating Lagrangian cuts for use in solving two-stage SIPs, these techniques can be directly applied to multistage SIPs by integrating them within the stochastic dual dynamic integer programming (SDDIP) algorithm of [16].

2 Preliminaries

2.1 Branch-and-Cut Based Methods

Problem (1) can be reformulated as the following Benders model:

\min_{x,\theta_{s}}\Big\{c^{\top}x+\sum_{s\in S}p_{s}\theta_{s}:(x,\theta_{s})\in E^{s},\ s\in S\Big\}, \qquad (4)

where for each $s\in S$, $E^{s}$ contains the first-stage constraints and the epigraph of $Q_{s}$, i.e.,

E^{s}=\{(x,\theta_{s})\in X\times\mathbb{R}:Ax\geq b,\ \theta_{s}\geq Q_{s}(x)\}. \qquad (5)

In a typical approach to solving (4), each recourse function $Q_{s}$ is replaced by a cutting-plane underestimate $\hat{Q}_{s}$, which creates a relaxation and is dynamically updated. In each iteration, an approximate problem defined by the current underestimate is solved to obtain a candidate solution, and then a cut generation problem is solved to update the cutting-plane approximation if necessary. This process is repeated until no cuts are identified.

We review two types of valid inequalities for $E^{s}$. The first class of cuts is the Benders cuts [1, 2]. Benders cuts are generated from the LP relaxation of the problem (2) defining $Q_{s}(x)$. Given a candidate solution $\hat{x}$, the LP relaxation of the recourse problem is solved:

\min_{y}\big\{(q^{s})^{\top}y:W^{s}y\geq h^{s}-T^{s}\hat{x}\big\}. \qquad (6)

Let $\mu$ be an optimal dual solution of problem (6). By LP duality, the following Benders cut is valid for $E^{s}$:

\mu^{\top}T^{s}x+\theta_{s}\geq\mu^{\top}h^{s}. \qquad (7)

If the SIP has continuous recourse, i.e., $Y=\mathbb{R}^{n_{y}}$, then (7) is tight at $\hat{x}$, i.e., $Q_{s}(\hat{x})=-\mu^{\top}T^{s}\hat{x}+\mu^{\top}h^{s}$. The cutting-plane model $\hat{Q}_{s}$ is often constructed by iteratively adding Benders cuts until the lower bound converges to the LP relaxation bound. Benders cuts are sufficient to provide convergence for solving SIPs with continuous recourse [24].
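To make the construction concrete, the following sketch (toy data; $\mu$ is assumed to be an optimal dual solution of (6), and `benders_cut` is an illustrative helper, not code from the paper) assembles the coefficient vector $\mu^{\top}T^{s}$ and right-hand side $\mu^{\top}h^{s}$ of cut (7):

```python
# Hedged sketch: build the Benders cut (7), mu^T T^s x + theta_s >= mu^T h^s,
# from a given dual vector mu and toy scenario data T, h.  The names and
# numbers are illustrative, not from the paper.
def benders_cut(mu, T, h):
    # coefficient of x in  (mu^T T) x + theta_s >= mu^T h
    coeff = [sum(mu[i] * T[i][j] for i in range(len(mu))) for j in range(len(T[0]))]
    rhs = sum(mu[i] * h[i] for i in range(len(mu)))
    return coeff, rhs

mu = [1.0, 0.5]                  # assumed optimal dual of the scenario LP (6)
T = [[2.0, 0.0], [0.0, 4.0]]     # toy T^s
h = [3.0, 1.0]                   # toy h^s
coeff, rhs = benders_cut(mu, T, h)
print(coeff, rhs)  # [2.0, 2.0] 3.5
```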

Another useful family of cuts is the integer L-shaped cuts introduced in [6]. These cuts are valid only when the first-stage variables are binary, i.e., $X=\{0,1\}^{n}$, but can be applied even when the second stage includes integer decision variables. In this case, given $\hat{x}\in\{0,1\}^{n}$ and a value $L_{s}$ such that $Q_{s}(x)\geq L_{s}$ for all feasible $x$, the following integer L-shaped cut is a valid inequality for $E^{s}$:

\theta_{s}\geq Q_{s}(\hat{x})-(Q_{s}(\hat{x})-L_{s})\Big(\sum_{i:\hat{x}_{i}=1}(1-x_{i})+\sum_{i:\hat{x}_{i}=0}x_{i}\Big). \qquad (8)

A standard branch-and-cut algorithm using Benders cuts and integer L-shaped cuts is described in the Appendix. The branch-and-cut algorithm can be implemented using a lazy constraint callback in modern MIP solvers, which allows the addition of Benders or integer L-shaped cuts when the solver encounters a solution $(\hat{\theta},\hat{x})$ with $\hat{x}\in X$ (i.e., $\hat{x}$ satisfies any integrality constraints) but for which $\hat{\theta}_{s}<Q_{s}(\hat{x})$ for some $s\in S$. These two classes of cuts are sufficient to guarantee convergence for SIPs with continuous recourse or pure binary first-stage variables. However, the efficiency of the algorithm depends significantly on the strength of the cutting-plane models $\hat{Q}_{s}$. Given poor relaxations of the recourse functions, the branch-and-bound search may explore a huge number of nodes, resulting in long solution times. As discussed in Section 1, many methods have been proposed to strengthen the model $\hat{Q}_{s}$ in order to accelerate the algorithm.
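As a small illustration of (8) (toy numbers, not code from the paper), the sketch below evaluates the cut's right-hand side at a binary point $x$: the cut is tight at $\hat{x}$ and drops to the trivial bound $L_{s}$ as soon as one coordinate is flipped.

```python
# Hedged sketch of the integer L-shaped cut (8).  Given a binary candidate
# x_hat, Q_hat = Q_s(x_hat), and a valid lower bound L_s on Q_s, the cut
# bounds theta_s from below at any binary x.
def integer_l_shaped_rhs(x, x_hat, Q_hat, L_s):
    # delta counts the coordinates where x differs from x_hat
    delta = sum((1 - x[i]) if x_hat[i] == 1 else x[i] for i in range(len(x)))
    return Q_hat - (Q_hat - L_s) * delta

x_hat = [1, 0, 1]
Q_hat, L_s = 10.0, 2.0
print(integer_l_shaped_rhs(x_hat, x_hat, Q_hat, L_s))      # tight at x_hat: 10.0
print(integer_l_shaped_rhs([0, 0, 1], x_hat, Q_hat, L_s))  # one flip: 2.0
```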

We note that the validity of Lagrangian cuts does not require either continuous recourse or pure binary first-stage variables. However, outside of those settings, additional cuts or a specialized branching scheme would be required to obtain a convergent algorithm. We refer the reader to, e.g., [25, 12, 14, 11] for examples of methods that can be used to obtain a finitely convergent algorithm in other settings. Lagrangian cuts could potentially be added to enhance any of these approaches.

2.2 Dual Decomposition

In dual decomposition [3], a copy $x^{s}$ of the first-stage variables $x$ is created for each scenario $s\in S$, which yields a reformulation of (1):

\min_{x,x^{s},y^{s}}\ \ \sum_{s\in S}p_{s}\big(c^{\top}x^{s}+(q^{s})^{\top}y^{s}\big)
\text{s.t.}\ \ Ax^{s}\geq b,\quad s\in S,
\qquad\ \ T^{s}x^{s}+W^{s}y^{s}\geq h^{s},\quad s\in S,
\qquad\ \ x^{s}\in X,\ y^{s}\in Y,\quad s\in S,
\qquad\ \ x^{s}=x,\quad s\in S.

Lagrangian relaxation is applied to the nonanticipativity constraints $x^{s}=x$ in this formulation with multipliers $\lambda^{s}$ for $s\in S$, which gives the following Lagrangian relaxation problem:

z(\lambda)=\min_{x,x^{s},y^{s}}\ \ \sum_{s\in S}p_{s}\big(c^{\top}x^{s}+(q^{s})^{\top}y^{s}\big)+\sum_{s\in S}p_{s}(\lambda^{s})^{\top}(x^{s}-x) \qquad (9)
\text{s.t.}\ \ (x^{s},y^{s})\in K^{s},\quad s\in S,

where $K^{s}:=\{x\in X,\ y\in Y:Ax\geq b,\ T^{s}x+W^{s}y\geq h^{s}\}$ for each $s\in S$. Throughout this paper we assume that $K^{s}$ is nonempty for each $s\in S$.

The nonanticipative Lagrangian dual problem is defined as $z_{D}=\max_{\lambda}z(\lambda)$. Note that the constraint $\sum_{s\in S}p_{s}\lambda^{s}=0$ is implicitly enforced in $\max_{\lambda}z(\lambda)$ since $z(\lambda)=-\infty$ when $\sum_{s\in S}p_{s}\lambda^{s}\neq 0$. Under this condition, the variables $x$ in (9) can be eliminated, and hence (9) can be solved by solving a separate optimization problem for each scenario $s\in S$. The nonanticipative Lagrangian dual problem

z_{D}=\max_{\lambda}\Big\{z(\lambda):\sum_{s\in S}p_{s}\lambda^{s}=0\Big\}

is a convex program whose optimal value gives a lower bound on the optimal value of (1). The following theorem provides a primal characterization of the bound $z_{D}$.

Theorem 1 (Carøe and Schultz [3]).

The following equality holds:

z_{D}=\min_{x,y^{s}}\Big\{c^{\top}x+\sum_{s\in S}p_{s}(q^{s})^{\top}y^{s}:(x,y^{s})\in\text{conv}(K^{s}),\ s\in S\Big\}.

Empirical evidence (e.g., [26, 27]) shows that the Lagrangian dual bound $z_{D}$ is often a tight lower bound on the optimal objective value of (1). Such a bound can be used in a branch-and-bound algorithm and to generate feasible solutions using heuristics (see, e.g., [3, 21]). However, the Lagrangian dual problem can be hard to solve due to its large size (with $n|S|$ variables) and the difficulty of solving MIP subproblems to evaluate $z(\lambda)$ and obtain supergradients of $z(\cdot)$ at a point $\lambda$.
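On a toy instance small enough to enumerate, evaluating $z(\lambda)$ reduces to one minimization per scenario after $x$ is eliminated using $\sum_{s}p_{s}\lambda^{s}=0$. The sketch below is illustrative only: the data are invented, and each scenario's feasible set is stored directly as (first-stage $x$, second-stage cost $(q^{s})^{\top}y$) pairs rather than solved as a MIP.

```python
# Hedged sketch: evaluate the Lagrangian relaxation z(lambda) of (9) by full
# enumeration of tiny scenario feasible sets.  With sum_s p_s lambda^s = 0,
# z(lambda) = sum_s p_s min_{(x,y) in K^s} [(c + lambda^s)^T x + (q^s)^T y].
def z_lagrangian(c, p, lam, K):
    total = 0.0
    for s in range(len(K)):
        best = min(
            sum((c[j] + lam[s][j]) * x[j] for j in range(len(c))) + ycost
            for (x, ycost) in K[s])
        total += p[s] * best
    return total

c = [1.0]
p = [0.5, 0.5]
# scenario s -> list of (first-stage x, second-stage cost) pairs (toy data)
K = [[((0,), 4.0), ((1,), 1.0)],
     [((0,), 0.0), ((1,), 3.0)]]
print(z_lagrangian(c, p, [(0.0,), (0.0,)], K))   # 1.0 at lambda = 0
print(z_lagrangian(c, p, [(1.0,), (-1.0,)], K))  # 1.5, a stronger dual bound
```

Here the optimal nonanticipative value is 2.0 (take $x=0$ in both scenarios), so both multiplier choices give valid lower bounds, and the nonzero multipliers tighten the bound.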

2.3 Benders Dual Decomposition

The BDD algorithm [4] is a version of Benders decomposition that uses strengthened Benders cuts and Lagrangian cuts to tighten the Benders model. In particular, two types of scenario MIPs are solved to generate optimality and feasibility cuts for $E^{s}$:

  1.

    Given any $\lambda\in\mathbb{R}^{n}$, let $(\bar{x}^{s}_{\lambda},\bar{y}^{s}_{\lambda})$ be an optimal solution of

    \min_{x,y}\{\lambda^{\top}x+(q^{s})^{\top}y:(x,y)\in K^{s}\}.

    Then the following Lagrangian optimality cut is valid for $E^{s}$:

    \lambda^{\top}(x-\bar{x}^{s}_{\lambda})+\theta_{s}\geq(q^{s})^{\top}\bar{y}^{s}_{\lambda}. \qquad (10)
  2.

    Given any $\lambda\in\mathbb{R}^{n}$, let $(\hat{x}^{s}_{\lambda},\hat{y}^{s}_{\lambda},\hat{u}^{s}_{\lambda},\hat{v}^{s}_{\lambda})$ be an optimal solution of

    \min_{x,y,u,v}\{\mathbbm{1}^{\top}v+\mathbbm{1}^{\top}u+\lambda^{\top}x:Ax+u\geq b,\ T^{s}x+W^{s}y+v\geq h^{s},\ x\in X,\ y\in Y\}.

    Then the following Lagrangian feasibility cut is valid for $E^{s}$:

    \lambda^{\top}(x-\hat{x}^{s}_{\lambda})\geq\mathbbm{1}^{\top}\hat{v}^{s}_{\lambda}+\mathbbm{1}^{\top}\hat{u}^{s}_{\lambda}. \qquad (11)

The BDD algorithm generates both types of Lagrangian cuts by heuristically solving a Lagrangian cut generation problem. Numerical results in [4] show that Lagrangian cuts can close a significant portion of the gap at the root node for a variety of SIP problems. Our goal in this work is to provide new methods for quickly finding strong Lagrangian cuts.

3 Selection and Normalization of Lagrangian Cuts

We begin by introducing an alternative view of the Lagrangian cuts (10) and (11) which leads to a different cut generation formulation. Let $\overline{Q}_{s}^{*}:\mathbb{R}^{n}\times\mathbb{R}_{+}\rightarrow\mathbb{R}$ be defined as

\overline{Q}_{s}^{*}(\pi,\pi_{0})=\min_{x}\big\{\pi^{\top}x+\pi_{0}Q_{s}(x):Ax\geq b,\ x\in X\big\}=\min_{x,y}\big\{\pi^{\top}x+\pi_{0}(q^{s})^{\top}y:(x,y)\in K^{s}\big\} \qquad (12)

for $(\pi,\pi_{0})\in\mathbb{R}^{n}\times\mathbb{R}_{+}$. According to (12), for any $(\pi,\pi_{0})\in\mathbb{R}^{n}\times\mathbb{R}_{+}$, the following inequality is valid for the set $E^{s}$ defined in (5):

\pi^{\top}x+\pi_{0}\theta_{s}\geq\overline{Q}_{s}^{*}(\pi,\pi_{0}). \qquad (13)

We next demonstrate that the set of all cuts of the form (13) is equivalent to the set of Lagrangian cuts (10) and (11), and therefore we refer to cuts of the form (13) as Lagrangian cuts. For any $\Pi_{s}\subseteq\mathbb{R}^{n}\times\mathbb{R}_{+}$, let $P_{s}(\Pi_{s})$ denote the set defined by all cuts (13) with coefficients restricted to $\Pi_{s}$, i.e.,

P_{s}(\Pi_{s}):=\{(x,\theta_{s}):\pi^{\top}x+\pi_{0}\theta_{s}\geq\overline{Q}_{s}^{*}(\pi,\pi_{0})\text{ for all }(\pi,\pi_{0})\in\Pi_{s}\}.
Proposition 2.

Let $\Pi_{s}\subseteq\mathbb{R}^{n+1}$ be a neighborhood of the origin and let $\Pi_{s}^{*}=\Pi_{s}\cap(\mathbb{R}^{n}\times\mathbb{R}_{+})$. Define $\bar{P}_{s}:=\{(x,\theta_{s}):\text{(10) and (11) hold for all }\lambda\in\mathbb{R}^{n}\}$. Then $P_{s}(\Pi_{s}^{*})=\bar{P}_{s}$.

Proof.

We first show that $P_{s}(\Pi_{s}^{*})\subseteq\bar{P}_{s}$. Let $(x,\theta_{s})\in P_{s}(\Pi_{s}^{*})$. We show that $(x,\theta_{s})$ satisfies (10) and (11) for all $\lambda\in\mathbb{R}^{n}$. Let $\lambda\in\mathbb{R}^{n}$, and choose $\rho^{1}_{\lambda}$ and $\rho^{2}_{\lambda}$ large enough that $(\lambda/\rho^{1}_{\lambda},1/\rho^{1}_{\lambda}),(\lambda/\rho^{2}_{\lambda},0)\in\Pi_{s}^{*}$, and thus $(x,\theta_{s})$ satisfies (13) with these values of $(\pi,\pi_{0})$. Multiplying inequality (13) with $(\pi,\pi_{0})=(\lambda/\rho^{1}_{\lambda},1/\rho^{1}_{\lambda})\in\Pi_{s}^{*}$ by $\rho^{1}_{\lambda}$ gives

\lambda^{\top}x+\theta_{s}\geq\rho^{1}_{\lambda}\overline{Q}_{s}^{*}(\lambda/\rho^{1}_{\lambda},1/\rho^{1}_{\lambda})=\overline{Q}_{s}^{*}(\lambda,1)=\lambda^{\top}\bar{x}^{s}_{\lambda}+(q^{s})^{\top}\bar{y}^{s}_{\lambda},

where $(\bar{x}^{s}_{\lambda},\bar{y}^{s}_{\lambda})$ is as defined in (10) and where we have used the observation that the function $\overline{Q}_{s}^{*}$ is positively homogeneous. Subtracting $\lambda^{\top}\bar{x}^{s}_{\lambda}$ from both sides shows that $(x,\theta_{s})$ satisfies (10). On the other hand, inequality (13) with $(\pi,\pi_{0})=(\lambda/\rho^{2}_{\lambda},0)\in\Pi_{s}^{*}$ implies

(\lambda/\rho^{2}_{\lambda})^{\top}x \geq \overline{Q}_{s}^{*}(\lambda/\rho^{2}_{\lambda},0)
= \min_{x,y}\{\lambda^{\top}x:Ax\geq b,\ T^{s}x+W^{s}y\geq h^{s},\ x\in X,\ y\in Y\}/\rho^{2}_{\lambda}
\geq \min_{x,y,u,v}\{\mathbbm{1}^{\top}v+\mathbbm{1}^{\top}u+\lambda^{\top}x:Ax+u\geq b,\ T^{s}x+W^{s}y+v\geq h^{s},\ x\in X,\ y\in Y\}/\rho^{2}_{\lambda}
= (\lambda^{\top}\hat{x}^{s}_{\lambda}+\mathbbm{1}^{\top}\hat{v}^{s}_{\lambda}+\mathbbm{1}^{\top}\hat{u}^{s}_{\lambda})/\rho^{2}_{\lambda}.

Multiplying both sides by $\rho^{2}_{\lambda}$ and subtracting $\lambda^{\top}\hat{x}^{s}_{\lambda}$ from both sides gives inequality (11). Since $\lambda$ was arbitrary, this shows $P_{s}(\Pi_{s}^{*})\subseteq\bar{P}_{s}$.

Next let $(x,\theta_{s})\in\bar{P}_{s}$. We aim to show $(x,\theta_{s})\in P_{s}(\Pi_{s}^{*})$. Let $(\pi,\pi_{0})\in\Pi_{s}^{*}$. If $\pi_{0}>0$, then (13) is satisfied by scaling (10) with $\lambda=\pi/\pi_{0}$. We can similarly show that $\pi^{\top}x+\pi_{0}^{\prime}\theta_{s}\geq\overline{Q}_{s}^{*}(\pi,\pi_{0}^{\prime})$ for any $\pi_{0}^{\prime}>0$. Now assume $\pi_{0}=0$. By Theorem 7.1 of [28], $\overline{Q}_{s}^{*}$ is lower semicontinuous, and hence

\pi^{\top}x=\pi^{\top}x+\liminf_{\pi_{0}^{\prime}\rightarrow 0}\pi_{0}^{\prime}\theta_{s}\geq\liminf_{\pi_{0}^{\prime}\rightarrow 0}\overline{Q}_{s}^{*}(\pi,\pi_{0}^{\prime})\geq\overline{Q}_{s}^{*}(\pi,0),

i.e., (13) is satisfied. Therefore, we have $\bar{P}_{s}\subseteq P_{s}(\Pi_{s}^{*})$. ∎

Since Proposition 2 holds for any neighborhood $\Pi_{s}$ of the origin, it allows flexibility in choosing a normalization on the Lagrangian cut coefficients when generating cuts. We discuss the choice of such a normalization in Section 4.

Let $z_{\text{LC}}$ denote the bound we obtain after adding all Lagrangian cuts, i.e.,

z_{\text{LC}}:=\min_{x,\theta_{s}}\Big\{c^{\top}x+\sum_{s\in S}p_{s}\theta_{s}:(x,\theta_{s})\in P_{s}(\mathbb{R}^{n}\times\mathbb{R}_{+}),\ s\in S\Big\}.

Since $\overline{Q}_{s}^{*}$ is positively homogeneous and the set defined by an inequality is invariant to any positive scaling of the inequality, we also have

z_{\text{LC}}=\min_{x,\theta_{s}}\Big\{c^{\top}x+\sum_{s\in S}p_{s}\theta_{s}:(x,\theta_{s})\in P_{s}(\Pi_{s}^{*}),\ s\in S\Big\}

for any $\Pi_{s}^{*}=\Pi_{s}\cap(\mathbb{R}^{n}\times\mathbb{R}_{+})$ such that $\Pi_{s}$ is a neighborhood of the origin. Rahmaniani et al. [4] showed that for $|S|=1$, adding all Lagrangian optimality cuts (10) and feasibility cuts (11) yields a Benders master problem with optimal objective value equal to $z_{\text{IP}}$. This is not true in general when $|S|>1$. However, Li and Grossmann [5] showed that $z_{\text{LC}}\geq z_{D}$. We show in the following theorem that this is an equality, i.e., $z_{\text{LC}}=z_{D}$. Since $z_{\text{IP}}=z_{D}$ by Theorem 1 when $|S|=1$, this result generalizes the result in [4].

Theorem 3.

The equality $z_{\text{LC}}=z_{D}$ holds.

Proof.

Although $z_{\text{LC}}\geq z_{D}$ has been shown in [5], we show both directions for completeness.

By Theorem 1,

z_{D}=\min_{x,y^{s}}\Big\{c^{\top}x+\sum_{s\in S}p_{s}(q^{s})^{\top}y^{s}:(x,y^{s})\in\text{conv}(K^{s}),\ s\in S\Big\}=\min_{x,\theta_{s}}\Big\{c^{\top}x+\sum_{s\in S}p_{s}\theta_{s}:(x,\theta_{s})\in U^{s},\ s\in S\Big\},

where $U^{s}:=\text{proj}_{(x,\theta_{s})}\{(x,y^{s},\theta_{s}):\theta_{s}\geq(q^{s})^{\top}y^{s},\ (x,y^{s})\in\text{conv}(K^{s})\}$. Therefore, we only need to show that $U^{s}=P_{s}(\mathbb{R}^{n}\times\mathbb{R}_{+})$ for each $s\in S$.

On the one hand, let $s\in S$ and $(\bar{x},\bar{\theta}_{s})\in U^{s}$. Then there exists $\bar{y}^{s}$ such that $\bar{\theta}_{s}\geq(q^{s})^{\top}\bar{y}^{s}$ and $(\bar{x},\bar{y}^{s})\in\text{conv}(K^{s})$. Then by the definition of $\overline{Q}_{s}^{*}$, we have

\pi^{\top}\bar{x}+\pi_{0}\bar{\theta}_{s}\geq\pi^{\top}\bar{x}+\pi_{0}(q^{s})^{\top}\bar{y}^{s}\geq\min_{x,y}\{\pi^{\top}x+\pi_{0}(q^{s})^{\top}y:(x,y)\in\text{conv}(K^{s})\}=\overline{Q}_{s}^{*}(\pi,\pi_{0})

for all $(\pi,\pi_{0})\in\mathbb{R}^{n}\times\mathbb{R}_{+}$, which implies that $(\bar{x},\bar{\theta}_{s})\in P_{s}(\mathbb{R}^{n}\times\mathbb{R}_{+})$.

On the other hand, let $s\in S$ and $(\bar{x},\bar{\theta}_{s})\notin U^{s}$. Since $U^{s}$ is a polyhedron, which is convex and closed, there exists a hyperplane strictly separating $(\bar{x},\bar{\theta}_{s})$ from $U^{s}$. In other words, there exist $u\in\mathbb{R}^{n}$ and $v\in\mathbb{R}$ such that $u^{\top}\bar{x}+v\bar{\theta}_{s}<\min_{x,\theta_{s}}\{u^{\top}x+v\theta_{s}:(x,\theta_{s})\in U^{s}\}$. Note that $v\geq 0$ in this case, since $\min_{x,\theta_{s}}\{u^{\top}x+v\theta_{s}:(x,\theta_{s})\in U^{s}\}=-\infty$ if $v<0$. Then

u^{\top}\bar{x}+v\bar{\theta}_{s}<\min_{x,\theta_{s}}\{u^{\top}x+v\theta_{s}:(x,\theta_{s})\in U^{s}\}=\min_{x,y}\{u^{\top}x+v(q^{s})^{\top}y:(x,y)\in\text{conv}(K^{s})\}=\overline{Q}_{s}^{*}(u,v),

i.e., $(\bar{x},\bar{\theta}_{s})$ violates the Lagrangian cut $u^{\top}x+v\theta_{s}\geq\overline{Q}_{s}^{*}(u,v)$. Therefore, $(\bar{x},\bar{\theta}_{s})\notin P_{s}(\mathbb{R}^{n}\times\mathbb{R}_{+})$. ∎

We next consider the problem of separating a point from $P_{s}(\Pi_{s}^{*})$ using Lagrangian cuts. Given a candidate solution $(\hat{x},\hat{\theta}_{s})$ and a convex and compact $\Pi_{s}^{*}$, the problem of separating $(\hat{x},\hat{\theta}_{s})$ from $P_{s}(\Pi_{s}^{*})$ can be formulated as the following bounded convex optimization problem:

\max_{\pi,\pi_{0}}\{\overline{Q}_{s}^{*}(\pi,\pi_{0})-\pi^{\top}\hat{x}-\pi_{0}\hat{\theta}_{s}:(\pi,\pi_{0})\in\Pi_{s}^{*}\}. \qquad (14)

Problem (14) can be solved by bundle methods, given an oracle for evaluating the function value and a supergradient of $\overline{Q}_{s}^{*}$. The function value and a supergradient of $\overline{Q}_{s}^{*}$ can be evaluated at any $(\pi,\pi_{0})\in\mathbb{R}^{n}\times\mathbb{R}_{+}$ by solving the single-scenario MIP (12). Let $(x^{*},y^{*})$ be an optimal solution of (12). Then $\overline{Q}_{s}^{*}(\pi,\pi_{0})=\pi^{\top}x^{*}+\pi_{0}(q^{s})^{\top}y^{*}$, and $(x^{*},(q^{s})^{\top}y^{*})$ is a supergradient of $\overline{Q}_{s}^{*}$ at $(\pi,\pi_{0})$.
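This oracle can be sketched as follows on a toy scenario set small enough to enumerate in place of solving the MIP (12); the data and helper name are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch of the oracle used by a bundle method for (14): evaluate
# Qbar*_s(pi, pi0) and a supergradient via (12), here by enumerating a toy
# list Ks of (x, (q^s)^T y) pairs instead of calling a MIP solver.
def eval_qbar_star(pi, pi0, Ks):
    x_star, ycost_star = min(
        Ks, key=lambda pair: sum(pi[j] * pair[0][j] for j in range(len(pi))) + pi0 * pair[1])
    val = sum(pi[j] * x_star[j] for j in range(len(pi))) + pi0 * ycost_star
    # (x*, (q^s)^T y*) is a supergradient of Qbar*_s at (pi, pi0)
    return val, (x_star, ycost_star)

Ks = [((0,), 4.0), ((1,), 1.0)]   # toy scenario feasible set
val, g = eval_qbar_star((1.0,), 1.0, Ks)
print(val, g)  # 2.0 ((1,), 1.0)
```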

Although (14) is a convex optimization problem, the requirement to solve a potentially large number of MIPs of the form (12) to evaluate function values and supergradients of $\overline{Q}_{s}^{*}$ to find a single Lagrangian cut can make the use of such cuts computationally prohibitive. Numerical results in [16] indicate that although Lagrangian cuts can significantly improve the LP relaxation bound for many SIP problems, exact separation of Lagrangian cuts does not improve overall solution time. In an attempt to reduce the time required to separate Lagrangian cuts, Rahmaniani et al. [4] applied an inner approximation heuristic. In the next section, we propose an alternative approach for generating Lagrangian cuts more quickly.

4 Restricted Separation of Lagrangian Cuts

Observe that given any $(\pi,\pi_{0})\in\mathbb{R}^{n}\times\mathbb{R}_{+}$, evaluating $\overline{Q}_{s}^{*}(\pi,\pi_{0})$ gives a valid inequality (13). By picking $(\pi,\pi_{0})$ properly, we show in the following proposition that a small number of Lagrangian cuts is sufficient to give the perfect information bound [29], defined as

z_{\text{PI}}:=\sum_{s\in S}p_{s}\min_{x,y}\Big\{c^{\top}x+(q^{s})^{\top}y:(x,y)\in K^{s}\Big\}.

Although the perfect information bound can be computed by solving a separate subproblem for each scenario, it is often stronger than the LP relaxation bound because each subproblem is a MIP.
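On an enumerable toy instance, the perfect information bound is simply one independent minimization per scenario; the sketch below uses invented data and is not code from the paper:

```python
# Hedged sketch: compute z_PI by solving each scenario subproblem
# independently, here by enumerating tiny feasible sets K[s] of
# (first-stage x, second-stage cost) pairs.
def z_pi(c, p, K):
    return sum(
        p[s] * min(sum(c[j] * x[j] for j in range(len(c))) + ycost
                   for (x, ycost) in K[s])
        for s in range(len(K)))

c = [1.0]
p = [0.5, 0.5]
K = [[((0,), 4.0), ((1,), 1.0)],
     [((0,), 0.0), ((1,), 3.0)]]
print(z_pi(c, p, K))  # 0.5*min(4, 2) + 0.5*min(0, 4) = 1.0
```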

Proposition 4.

The following equality holds:

z_{\text{PI}}=\min_{x,\theta_{s}}\Big\{c^{\top}x+\sum_{s\in S}p_{s}\theta_{s}:c^{\top}x+\theta_{s}\geq\overline{Q}_{s}^{*}(c,1),\ s\in S\Big\}. \qquad (15)
Proof.

We first show the “$\leq$” direction. It follows by observing that if $(x,\{\theta_{s}\}_{s\in S})$ satisfies $c^{\top}x+\theta_{s}\geq\overline{Q}_{s}^{*}(c,1)$ for all $s\in S$, then

c^{\top}x+\sum_{s\in S}p_{s}\theta_{s}\geq c^{\top}x+\sum_{s\in S}p_{s}\big(-c^{\top}x+\overline{Q}_{s}^{*}(c,1)\big)=\sum_{s\in S}p_{s}\overline{Q}_{s}^{*}(c,1)=\sum_{s\in S}p_{s}\min_{x,y}\big\{c^{\top}x+(q^{s})^{\top}y:(x,y)\in K^{s}\big\}=z_{\text{PI}}. \qquad (16)

The “$\geq$” direction follows from the fact that the first inequality in (16) must hold at equality for all optimal solutions of (15). This is because if $c^{\top}x+\theta_{s}>\overline{Q}_{s}^{*}(c,1)$ for some $s\in S$, we can obtain a feasible solution with a lower objective value by decreasing the value of $\theta_{s}$. ∎

From this result, we see that Lagrangian cuts with $(\pi,\pi_{0})=(c,1)$ already yield the perfect information bound. One interpretation is that useful Lagrangian cuts can be obtained even when searching over a heavily constrained set of coefficients, which motivates our study of restricted separation of Lagrangian cuts.

4.1 General Framework

We propose solving the following restricted version of (14) to search for a Lagrangian cut that separates a solution $(\hat{x},\hat{\theta}_{s})$:

\max_{\pi,\pi_{0}}\{\overline{Q}_{s}^{*}(\pi,\pi_{0})-\pi^{\top}\hat{x}-\pi_{0}\hat{\theta}_{s}:(\pi,\pi_{0})\in\Pi_{s}\}, \qquad (17)

where $\Pi_{s}$ can be any convex compact subset of $\mathbb{R}^{n}\times\mathbb{R}_{+}$, not necessarily defined by a neighborhood of the origin.

While setting $\Pi_{s}$ to be a neighborhood of the origin would give exact separation of Lagrangian cuts, our proposal is to use a more restricted definition of $\Pi_{s}$ with the goal of obtaining a separation problem that can be solved faster. As one might expect, the strength of the cut generated from (17) depends heavily on the choice of $\Pi_{s}$. We propose to define $\Pi_{s}$ using the span of a small set of vectors. Specifically, we require

\pi=\sum_{k=1}^{K}\beta_{k}\pi^{k}

for some $\beta\in\mathbb{R}^{K}$, where $\{\pi^{k}\}_{k=1}^{K}$ are chosen vectors. If $\{\pi^{k}\}_{k=1}^{K}$ span $\mathbb{R}^{n}$, then this approach reduces to exact separation. However, for smaller $K$, we expect a trade-off between computation time and strength of cuts. A natural possibility for the vectors $\{\pi^{k}\}_{k=1}^{K}$, which we explore in Section 4.3, is to use the coefficients of Benders cuts derived from the LP relaxations.
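As a toy illustration of the restriction (not the paper's algorithm), the sketch below fixes $\pi_{0}=1$, searches $\beta$ over a coarse grid inside a compact box, and evaluates $\overline{Q}_{s}^{*}$ by enumeration; all names and data are assumptions:

```python
# Hedged sketch of restricted separation (17): pi is confined to the span of
# a few basis vectors pi^k (with pi0 fixed to 1 here), and the violation
# Qbar*_s(pi, 1) - pi.x_hat - theta_hat is maximized over a coarse beta grid.
def restricted_separation(basis, x_hat, theta_hat, Ks, grid):
    n = len(x_hat)
    best = None
    for beta in grid:                      # compact candidate set for beta
        pi = [sum(b * pk[j] for b, pk in zip(beta, basis)) for j in range(n)]
        val = min(sum(pi[j] * x[j] for j in range(n)) + ycost
                  for (x, ycost) in Ks)    # Qbar*_s(pi, 1) by enumeration
        viol = val - sum(pi[j] * x_hat[j] for j in range(n)) - theta_hat
        if best is None or viol > best[0]:
            best = (viol, pi)
    return best                            # (violation, cut coefficients pi)

basis = [(1.0,)]                 # e.g., coefficients of one earlier Benders cut
Ks = [((0,), 4.0), ((1,), 1.0)]  # toy scenario feasible set
grid = [(-1.0,), (0.0,), (1.0,), (2.0,)]
print(restricted_separation(basis, (0.5,), 0.0, Ks, grid))  # (2.0, [2.0])
```

The compact grid plays the role of the normalization: without it, the violation could grow without bound as $\beta$ is scaled up.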

4.2 Solution of (17)

We present a basic cutting-plane algorithm for solving (17) in Algorithm 1. The classical cutting-plane method used in the algorithm can be replaced by a bundle-type method, e.g., the level method [30]. Within the solution process of (17), the function Q¯s\overline{Q}_{s}^{*} is approximated by an upper bounding cutting-plane model Qs:n×+Q_{s}^{*}:\mathbb{R}^{n}\times\mathbb{R}_{+}\rightarrow\mathbb{R} defined by

Qs(π,π0)=minz,θsz{πz+π0θsz:(z,θsz)E^s}Q_{s}^{*}(\pi,\pi_{0})=\min_{z,\theta_{s}^{z}}\{\pi^{\top}z+\pi_{0}\theta_{s}^{z}:(z,\theta_{s}^{z})\in\hat{E}^{s}\} (18)

for (π,π0)n×+(\pi,\pi_{0})\in\mathbb{R}^{n}\times\mathbb{R}_{+}, where E^s\hat{E}^{s} is a finite subset of EsE^{s}. In our implementation, for each scenario ss, all feasible first-stage solutions obtained from evaluating Q¯s(π,π0)\overline{Q}_{s}^{*}(\pi,\pi_{0}) and the corresponding feasible second-stage costs are added to E^s\hat{E}^{s}. Specifically, each time we solve (12) for some (π,π0)(\pi,\pi_{0}), we collect all optimal and sub-optimal solutions (x,y)(x,y) from the solver and store the first-stage solution z=xz=x and the corresponding feasible second-stage cost θsz=(qs)y\theta_{s}^{z}=(q^{s})^{\top}y. To guarantee finite convergence, we assume that at least one of the optimal solutions obtained when solving (12) is an extreme point of conv(Ks)\text{conv}(K^{s}). Note that even though we may solve problem (17) associated with different x^,θ^s\hat{x},\hat{\theta}_{s} and Πs\Pi_{s}, the previous QsQ_{s}^{*} remains a valid overestimate of Q¯s\overline{Q}_{s}^{*}. Therefore, whenever we need to solve (17) with some new x^,θ^s\hat{x},\hat{\theta}_{s} and Πs\Pi_{s}, we use the currently stored E^s\hat{E}^{s} to initialize QsQ_{s}^{*}. In our implementation, to avoid unboundedness of QsQ_{s}^{*}, E^s\hat{E}^{s} is initialized with the perfect information solution (zs,(qs)ys)(z^{s},(q^{s})^{\top}y^{s}), where (zs,ys)argminx,y{cx+(qs)y:(x,y)Ks}(z^{s},y^{s})\in\arg\min_{x,y}\{c^{\top}x+(q^{s})^{\top}y:(x,y)\in K^{s}\}.

Input: (x^,θ^s\hat{x},\hat{\theta}_{s}), Πs\Pi_{s}, E^s\hat{E}^{s}
Output: (π,π0)(\pi^{*},\pi^{*}_{0}), E^s\hat{E}^{s}
1 Initialize UB+\leftarrow+\infty, LB\leftarrow-\infty
2 while UB >0>0 and UB-LBδ\geq\deltaUB do
3       Solve UBmaxπ,π0{Qs(π,π0)πx^π0θ^s:(π,π0)Πs}UB\leftarrow\max_{\pi,\pi_{0}}\{Q_{s}^{*}(\pi,\pi_{0})-\pi^{\top}\hat{x}-\pi_{0}\hat{\theta}_{s}:(\pi,\pi_{0})\in\Pi_{s}\}, and collect solution (π,π0)=(π^,π^0)(\pi,\pi_{0})=(\hat{\pi},\hat{\pi}_{0})
4       Solve (12) to evaluate Q¯s(π^,π^0)\overline{Q}_{s}^{*}(\hat{\pi},\hat{\pi}_{0}). Update E^s\hat{E}^{s} and QsQ_{s}^{*} with optimal and sub-optimal solutions obtained while solving (12)
5       if LB<Q¯s(π^,π^0)π^x^π^0θ^sLB<\overline{Q}_{s}^{*}(\hat{\pi},\hat{\pi}_{0})-\hat{\pi}^{\top}\hat{x}-\hat{\pi}_{0}\hat{\theta}_{s} then
6            LBQ¯s(π^,π^0)π^x^π^0θ^sLB\leftarrow\overline{Q}_{s}^{*}(\hat{\pi},\hat{\pi}_{0})-\hat{\pi}^{\top}\hat{x}-\hat{\pi}_{0}\hat{\theta}_{s}
7             (π,π0)(π^,π^0)(\pi^{*},\pi_{0}^{*})\leftarrow(\hat{\pi},\hat{\pi}_{0})
8       end if
10 end while
Algorithm 1 Solution of (17).

While Algorithm 1 is for the most part a standard cutting-plane algorithm, an important detail is the stopping condition which uses a relative tolerance δ[0,1)\delta\in[0,1). Convergence of Algorithm 1 follows standard arguments for a cutting-plane algorithm. Specifically, because the set of extreme-point optimal solutions from (12) is finite, there will eventually be an iteration where the optimal solution obtained when solving (12) is already contained in the set E^s\hat{E}^{s}. When that occurs, after updating LBLB it will hold that UB=LBUB=LB. At that point it either holds that UB 0\leq 0 or UB-LB =0<δ=0<\deltaUB and the algorithm terminates. If the optimal value of (17) is positive, then UB will be strictly positive throughout the algorithm. In that case Algorithm 1 returns a violating Lagrangian cut with cut violation LB((1δ)UB,UB]\in((1-\delta)\text{UB},\text{UB}]. The value of δ\delta controls the trade-off between the strength of the cut and the running time. On the other hand, if the optimal value of (17) is non-positive, then we can terminate as soon as UB0\leq 0, since in this case we are assured a violated cut will not be found.
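The loop of Algorithm 1 can be sketched in code. In the following toy sketch (our own illustration, not the paper's implementation), the scenario set EsE^{s} is a small explicit list, and the master maximization over Πs\Pi_{s} is replaced by a search over a finite grid of candidate multipliers so that no LP solver is needed; all function names and data are invented:

```python
# Runnable sketch of Algorithm 1 with relative tolerance delta.
# Assumptions (not from the paper): E_full plays the role of E^s, and the
# master problem over Pi_s is approximated by a finite candidate grid.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_value(pi, pi0, E):
    """min over (z, theta) in E of pi.z + pi0*theta (cutting-plane model
    when E is the discovered subset, true value when E is the full set)."""
    return min(dot(pi, z) + pi0 * th for z, th in E)

def separate(x_hat, th_hat, candidates, E_full, E_hat, delta=0.5):
    UB, LB, best = float("inf"), float("-inf"), None
    while UB > 0 and UB - LB >= delta * UB:
        # Master step: maximize the model violation over the candidate set.
        UB, (pi, pi0) = max(
            (dual_value(p, p0, E_hat) - dot(p, x_hat) - p0 * th_hat, (p, p0))
            for p, p0 in candidates)
        # Oracle step: evaluate the true dual value and enrich the model.
        E_hat.append(min(E_full, key=lambda e: dot(pi, e[0]) + pi0 * e[1]))
        viol = dual_value(pi, pi0, E_full) - dot(pi, x_hat) - pi0 * th_hat
        if viol > LB:
            LB, best = viol, (pi, pi0)
    return best, LB

E_full = [([0.0], 0.0), ([1.0], 2.0)]        # toy (z, theta) pairs
candidates = [([0.0], 1.0), ([1.0], 1.0), ([-1.0], 1.0),
              ([1.0], 0.0), ([-1.0], 0.0)]   # grid stand-in for Pi_s
best, lb = separate([0.5], -1.0, candidates, E_full, E_hat=[([0.0], 0.0)])
print(best, lb)  # a violated cut multiplier and its violation
```

Here the point (x̂,θ̂s)=(0.5,-1) lies below the toy value function, so the sketch returns a multiplier with strictly positive violation and stops once the δ\delta-tolerance is met.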

We next demonstrate that even when using δ>0\delta>0 it is possible to attain the full strength of the Lagrangian cuts for a given set Πs\Pi_{s} when applying them repeatedly within a cutting-plane algorithm for solving a relaxation of problem (4). To make this result concrete, we state such a cutting-plane algorithm in Algorithm 2. At each iteration tt, we approximate the true objective function by the cutting-plane model:

cx+sSpsQ^st(x)=cx+sSpsminθs{θs:π¯x+π¯0θsτ¯,(π¯0,π¯,τ¯)Ψst}c^{\top}x+\sum_{s\in S}p_{s}\hat{Q}_{s}^{t}(x)=c^{\top}x+\sum_{s\in S}p_{s}\min_{\theta_{s}}\{\theta_{s}:\bar{\pi}^{\top}x+\bar{\pi}_{0}\theta_{s}\geq\bar{\tau},\ \forall(\bar{\pi}_{0},\bar{\pi},\bar{\tau})\in\Psi_{s}^{t}\}

where Ψst\Psi_{s}^{t} represents the Benders cuts and Lagrangian cuts associated with scenario ss that have been added to the master relaxation up to iteration tt. At iteration tt, a master problem based on the current cutting-plane model is solved and generates candidate solution (xt,{θst}sS)(x^{t},\{\theta_{s}^{t}\}_{s\in S}). For each scenario ss, the restricted Lagrangian cut separation problem (17) is solved using Algorithm 1 with Πs\Pi_{s} restricted to be a (typically low-dimensional) set Πst\Pi_{s}^{t}. The cutting-plane model is then updated by adding the violated Lagrangian cuts (if any) for each scenario.

Input: {Q^s0}sS\{\hat{Q}_{s}^{0}\}_{s\in S}
Output: {Q^st}sS\{\hat{Q}_{s}^{t}\}_{s\in S}
1 Initialize t0t\leftarrow 0, cutfound \leftarrow True
2 while cutfound = True do
3       cutfound \leftarrow False;
4       Solve minx,θs{cx+sSpsθs:θsQ^st(x),Axb}\min_{x,\theta_{s}}\{c^{\top}x+\sum_{s\in S}p_{s}\theta_{s}:\theta_{s}\geq\hat{Q}^{t}_{s}(x),Ax\geq b\} to obtain solution (xt,{θst}sS)(x^{t},\{\theta_{s}^{t}\}_{s\in S})
5       for sSs\in S do
6             Solve Benders subproblem (6) and obtain dual solution μt,s\mu^{t,s};
7             if (μt,s)Tsxt+θst<(μt,s)hs(\mu^{t,s})^{\top}T^{s}x^{t}+\theta_{s}^{t}<(\mu^{t,s})^{\top}h^{s} then
8                   Update Q^st\hat{Q}_{s}^{t} using (7) defined by μ=μt,s\mu=\mu^{t,s} to obtain Q^st+1\hat{Q}_{s}^{t+1};
9                   cutfound \leftarrow True;
11             end if
13       end for
14       if cutfound = False then
15             for sSs\in S do
16                   Generate Πstn\Pi_{s}^{t}\subseteq\mathbb{R}^{n}
17                   Solve (17) using Algorithm 1 with (x^,θ^s)=(xt,θst)(\hat{x},\hat{\theta}_{s})=(x^{t},\theta_{s}^{t}) and Πs=Πst\Pi_{s}=\Pi_{s}^{t}, and collect solution (πt,s,π0t,s):=(π,π0)(\pi^{t,s},\pi^{t,s}_{0}):=(\pi^{*},\pi^{*}_{0})
18                   if Q¯s(πt,s,π0t,s)(πt,s)xtπ0t,sθst>0\overline{Q}_{s}^{*}(\pi^{t,s},\pi_{0}^{t,s})-(\pi^{t,s})^{\top}x^{t}-\pi^{t,s}_{0}\theta_{s}^{t}>0 then
19                         Update Q^st\hat{Q}_{s}^{t} using cut (13) with (π,π0)=(πt,s,π0t,s)(\pi,\pi_{0})=(\pi^{t,s},\pi^{t,s}_{0}) to obtain Q^st+1\hat{Q}_{s}^{t+1};
20                         cutfound \leftarrow True;
22                   end if
24             end for
26       end if
27      tt+1t\leftarrow t+1
28 end while
Algorithm 2 Restricted Separation of Lagrangian Cuts

We show in the next theorem that if we search over a static compact set Πs\Pi_{s} and keep adding Lagrangian cuts by solving (17) to a relative tolerance δ[0,1)\delta\in[0,1), the algorithm converges to a solution that satisfies all the restricted Lagrangian cuts.

Theorem 5.

In Algorithm 2, let ΠstΠs\Pi_{s}^{t}\equiv\Pi_{s} be a compact set independent of tt, and assume that at each iteration of Algorithm 2, for each scenario sSs\in S, a Lagrangian cut (if a violated one exists) is generated by solving (17) to δ\delta relative tolerance with δ[0,1)\delta\in[0,1). Then for each scenario ss, every accumulation point (x¯,θ¯s)(\bar{x},\bar{\theta}_{s}) of the sequence {(xt,θst)}t=1\{(x^{t},\theta_{s}^{t})\}_{t=1}^{\infty} satisfies (x¯,θ¯s)Ps(Πs)(\bar{x},\bar{\theta}_{s})\in P_{s}(\Pi_{s}).

Proof.

Note that Q¯s\overline{Q}_{s}^{*} is a piecewise linear concave function and is therefore Lipschitz continuous. Let {(xit,θsit)}t=1\{(x^{i_{t}},\theta_{s}^{i_{t}})\}_{t=1}^{\infty} be an arbitrary subsequence of {(xt,θst)}t=1\{(x^{t},\theta_{s}^{t})\}_{t=1}^{\infty} that converges to a point (x¯,θ¯s)(\bar{x},\bar{\theta}_{s}). Assume for contradiction that

maxπ,π0{Q¯s(π,π0)πx¯π0θ¯s:(π,π0)Πs}=:ϵ>0.\max_{\pi,\pi_{0}}\{\overline{Q}_{s}^{*}(\pi,\pi_{0})-\pi^{\top}\bar{x}-\pi_{0}\bar{\theta}_{s}:(\pi,\pi_{0})\in\Pi_{s}\}=:\epsilon>0.

Since Πs\Pi_{s} is bounded, there exists τ\tau such that

maxπ,π0{|π(xitx¯)+π0(θsitθ¯s)|:(π,π0)Πs}ϵ/2 for all tτ.\max_{\pi,\pi_{0}}\{|\pi^{\top}(x^{i_{t}}-\bar{x})+\pi_{0}(\theta_{s}^{i_{t}}-\bar{\theta}_{s})|:(\pi,\pi_{0})\in\Pi_{s}\}\leq\epsilon/2\text{ for all }t\geq\tau.

This implies that for all tτt\geq\tau,

maxπ,π0{Q¯s(π,π0)πxitπ0θsit:(π,π0)Πs}\displaystyle\max_{\pi,\pi_{0}}\{\overline{Q}_{s}^{*}(\pi,\pi_{0})-\pi^{\top}x^{i_{t}}-\pi_{0}\theta_{s}^{i_{t}}:(\pi,\pi_{0})\in\Pi_{s}\}
=\displaystyle= maxπ,π0{Q¯s(π,π0)πx¯π0θ¯sπ(xitx¯)π0(θsitθ¯s):(π,π0)Πs}\max_{\pi,\pi_{0}}\{\overline{Q}_{s}^{*}(\pi,\pi_{0})-\pi^{\top}\bar{x}-\pi_{0}\bar{\theta}_{s}-\pi^{\top}(x^{i_{t}}-\bar{x})-\pi_{0}(\theta_{s}^{i_{t}}-\bar{\theta}_{s}):(\pi,\pi_{0})\in\Pi_{s}\}
\displaystyle\geq maxπ,π0{Q¯s(π,π0)πx¯π0θ¯s|π(xitx¯)+π0(θsitθ¯s)|:(π,π0)Πs}\max_{\pi,\pi_{0}}\{\overline{Q}_{s}^{*}(\pi,\pi_{0})-\pi^{\top}\bar{x}-\pi_{0}\bar{\theta}_{s}-|\pi^{\top}(x^{i_{t}}-\bar{x})+\pi_{0}(\theta_{s}^{i_{t}}-\bar{\theta}_{s})|:(\pi,\pi_{0})\in\Pi_{s}\}
\displaystyle\geq maxπ,π0{Q¯s(π,π0)πx¯π0θ¯s:(π,π0)Πs}\max_{\pi,\pi_{0}}\{\overline{Q}_{s}^{*}(\pi,\pi_{0})-\pi^{\top}\bar{x}-\pi_{0}\bar{\theta}_{s}:(\pi,\pi_{0})\in\Pi_{s}\}
maxπ,π0{|π(xitx¯)+π0(θsitθ¯s)|:(π,π0)Πs}\displaystyle-\max_{\pi,\pi_{0}}\{|\pi^{\top}(x^{i_{t}}-\bar{x})+\pi_{0}(\theta_{s}^{i_{t}}-\bar{\theta}_{s})|:(\pi,\pi_{0})\in\Pi_{s}\}
\displaystyle\geq ϵϵ/2=ϵ/2,\epsilon-\epsilon/2=\epsilon/2,

where the second “\geq” follows from maxx{f(x)}maxx{f(x)+g(x)}maxx{g(x)}\max_{x}\{f(x)\}\geq\max_{x}\{f(x)+g(x)\}-\max_{x}\{g(x)\}. Let (π(it),π0(it))(\pi^{(i_{t})},\pi_{0}^{(i_{t})}) denote the multiplier associated with the Lagrangian cut added at iteration iti_{t}. For any t>tτt^{\prime}>t\geq\tau,

Q¯s(π(it),π0(it))(π(it))xitπ0(it)θsit0\overline{Q}_{s}^{*}(\pi^{(i_{t})},\pi_{0}^{(i_{t})})-(\pi^{(i_{t})})^{\top}x^{i_{t^{\prime}}}-\pi_{0}^{(i_{t})}\theta_{s}^{i_{t^{\prime}}}\leq 0

since the Lagrangian cut associated with (π(it),π0(it))(\pi^{(i_{t})},\pi_{0}^{(i_{t})}) was added before iteration iti_{t^{\prime}}. On the other hand,

Q¯s(π(it),π0(it))(π(it))xitπ0(it)θsit(1δ)maxπ,π0{Q¯s(π,π0)πxitπ0θsit:(π,π0)Πs}(1δ)ϵ/2.\overline{Q}_{s}^{*}(\pi^{(i_{t^{\prime}})},\pi_{0}^{(i_{t^{\prime}})})-(\pi^{(i_{t^{\prime}})})^{\top}x^{i_{t^{\prime}}}-\pi_{0}^{(i_{t^{\prime}})}\theta_{s}^{i_{t^{\prime}}}\\ \geq(1-\delta)\max_{\pi,\pi_{0}}\{\overline{Q}_{s}^{*}(\pi,\pi_{0})-\pi^{\top}x^{i_{t^{\prime}}}-\pi_{0}\theta_{s}^{i_{t^{\prime}}}:(\pi,\pi_{0})\in\Pi_{s}\}\geq(1-\delta)\epsilon/2.

Since the sequence {(xit,θsit)}t=1\{(x^{i_{t}},\theta_{s}^{i_{t}})\}_{t=1}^{\infty} is bounded and the function Q¯s\overline{Q}_{s}^{*} is Lipschitz continuous, the functions (π,π0)Q¯s(π,π0)πxitπ0θsit(\pi,\pi_{0})\mapsto\overline{Q}_{s}^{*}(\pi,\pi_{0})-\pi^{\top}x^{i_{t^{\prime}}}-\pi_{0}\theta_{s}^{i_{t^{\prime}}} are Lipschitz continuous with a common constant. Combining the two preceding inequalities, there exists ρ>0\rho>0 such that (π(it),π0(it))(π(it),π0(it))ρ\|(\pi^{(i_{t})},\pi_{0}^{(i_{t})})-(\pi^{(i_{t^{\prime}})},\pi_{0}^{(i_{t^{\prime}})})\|\geq\rho for all t>tτt^{\prime}>t\geq\tau, so the multipliers form an infinite ρ\rho-separated set. This contradicts the boundedness of Πs\Pi_{s}. ∎

In practice, we allow Πst\Pi_{s}^{t} to change with iteration tt, but Theorem 5 provides motivation for solving (17) to a δ\delta relative tolerance. In Section 5.2.1, we present results of experiments demonstrating that for at least one problem class, using δ\delta much larger than 0 can lead to significantly faster improvement in the lower bound obtained when using Lagrangian cuts. When allowing Πst\Pi_{s}^{t} to change with iteration tt, Theorem 5 no longer applies. Instead, in this setting we follow common practice and terminate the cut-generation process once no more violated cuts are found, or when it is detected that progress in increasing the lower bound has slowed significantly (see Section 5.3 for an example stopping condition).

4.3 Choice of Πs\Pi_{s}

The primary difference between the exact separation (14) and the proposed restricted separation problem (17) is the restriction constraint (π,π0)Πs(\pi,\pi_{0})\in\Pi_{s}. Ideally, we would like to choose Πs\Pi_{s} so that (17) is easy to solve, in particular avoiding evaluating the function Q¯s\overline{Q}_{s}^{*} too many times, while also yielding a cut (13) that has a similar strength compared with the Lagrangian cut corresponding to the optimal solution of (14). Our proposal that aims to satisfy these properties is to define Πs\Pi_{s} to be the span of past Benders cuts with some normalization. Specifically, we propose

Πs={(π,π0):βK s.t. π=k=1Kβkπk,απ0+π11,π00}.\Pi_{s}=\Big{\{}(\pi,\pi_{0}):\exists\beta\in\mathbb{R}^{K}\text{ s.t. }\pi=\sum_{k=1}^{K}\beta_{k}\pi^{k},\alpha\pi_{0}+\|\pi\|_{1}\leq 1,\pi_{0}\geq 0\Big{\}}. (19)

Here KK and α\alpha are both predetermined parameters, and {πk}k=1K\{\pi^{k}\}_{k=1}^{K} are the coefficients of the last KK Benders cuts corresponding to scenario ss. The choice of {πk}k=1K\{\pi^{k}\}_{k=1}^{K} is important. The motivation for this choice is that combinations of cut coefficients from the LP relaxation can give a good approximation of the slope of a supporting hyperplane of QsQ_{s} at a (near-)optimal solution. The normalization used here is similar to the one proposed for the selection of Benders cuts in [23].

Different normalizations lead to different cuts. Our second proposal is to use

Πs={(π,π0):βK s.t. π=k=1Kβkπk,απ0+β11,π00},\Pi_{s}=\Big{\{}(\pi,\pi_{0}):\exists\beta\in\mathbb{R}^{K}\text{ s.t. }\pi=\sum_{k=1}^{K}\beta_{k}\pi^{k},\alpha\pi_{0}+\|\beta\|_{1}\leq 1,\pi_{0}\geq 0\Big{\}}, (20)

which replaces π1\|\pi\|_{1} in (19) with β1\|\beta\|_{1}. This normalization tends to yield solutions with sparse β\beta. An advantage of this normalization is that it leads to a MIP approximation for optimally choosing the πk\pi_{k}’s from a set of candidate vectors, which we discuss next.
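To make the difference between the two normalizations concrete, the following sketch (with a made-up basis; the helper name is ours, not the paper's) tests membership of the same (β,π0)(\beta,\pi_{0}) in both sets:

```python
# Hypothetical membership test contrasting normalizations (19) and (20):
# both restrict pi to the span of the basis, but (19) bounds ||pi||_1
# while (20) bounds ||beta||_1. Basis and numbers are made up.

def in_pi_set(beta, pi0, basis, alpha, norm="pi"):
    """Return True if (pi, pi0) with pi = sum_k beta_k pi^k satisfies the
    normalization of (19) (norm='pi') or (20) (norm='beta')."""
    if pi0 < 0:
        return False
    pi = [sum(b * v[i] for b, v in zip(beta, basis))
          for i in range(len(basis[0]))]
    size = sum(abs(c) for c in pi) if norm == "pi" else sum(abs(b) for b in beta)
    return alpha * pi0 + size <= 1

basis = [[0.5, 0.5], [0.5, -0.5]]
# beta = (1, 1) gives pi = (1, 0): ||pi||_1 = 1 but ||beta||_1 = 2, so the
# point is feasible for (19) yet infeasible for (20) when alpha*pi0 = 0.
print(in_pi_set([1.0, 1.0], 0.0, basis, alpha=1.0, norm="pi"))   # True
print(in_pi_set([1.0, 1.0], 0.0, basis, alpha=1.0, norm="beta")) # False
```

The example illustrates that when the basis vectors partially cancel, (19) can admit weight vectors that (20) excludes, which is one way the β\beta-norm encourages sparsity.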

Let Vs={vk}kIsV^{s}=\{v^{k}\}_{k\in I_{s}} be a given (potentially large) collection of coefficient vectors for a scenario sSs\in S. We construct a MIP model for selecting a subset of these vectors of size at most KK, to then use in (20). Specifically, we introduce a binary variable zkz_{k} for each kIsk\in I_{s}, where zk=1z_{k}=1 indicates we select vkv^{k} as one of the πk\pi^{k} vectors to use in (20). Using these decision variables, the problem of selecting a subset of at most KK vectors that maximizes the normalized violation can be formulated as the following mixed-integer nonlinear program:

maxβk,zk,π,π0\displaystyle\max_{\beta_{k},z_{k},\pi,\pi_{0}}\ Q¯s(π,π0)πx^π0θ^s\displaystyle\overline{Q}_{s}^{*}(\pi,\pi_{0})-\pi^{\top}\hat{x}-\pi_{0}\hat{\theta}_{s} (21)
s.t. π=kIsβkvk,\displaystyle\pi=\sum_{k\in I_{s}}\beta_{k}v^{k}, (22)
απ0+kIs|βk|1,\displaystyle\alpha\pi_{0}+\sum_{k\in I_{s}}|\beta_{k}|\leq 1, (23)
zkβkzk,\displaystyle-z_{k}\leq\beta_{k}\leq z_{k}, kIs,\displaystyle k\in I_{s}, (24)
kIszkK,\displaystyle\sum_{k\in I_{s}}z_{k}\leq K, (25)
zk{0,1},\displaystyle z_{k}\in\{0,1\}, kIs,\displaystyle k\in I_{s}, (26)
π00.\displaystyle\pi_{0}\geq 0. (27)

The objective and constraints (22) and (23) match the formulation (20), whereas constraint (25) limits the number of selected vectors to at most KK, and (24) enforces that if a vector is not selected then the weight on that vector must be zero. This problem is computationally challenging due to the implicit function Q¯s\overline{Q}_{s}^{*}. However, if we replace Q¯s\overline{Q}_{s}^{*} by its current cutting-plane approximation QsQ_{s}^{*}, then the problem can be solved as a MIP. Specifically, the MIP approximation of (21)-(27) is given by:

maxβk,zk,π,π0,τ\displaystyle\max_{\beta_{k},z_{k},\pi,\pi_{0},\tau} τπx^π0θ^s\displaystyle\tau-\pi^{\top}\hat{x}-\pi_{0}\hat{\theta}_{s} (28)
s.t. τπz+π0θsz,\displaystyle\tau\leq\pi^{\top}z+\pi_{0}\theta_{s}^{z}, (z,θsz)E^s,\displaystyle(z,\theta_{s}^{z})\in\hat{E}^{s},
(22)(27).\displaystyle\eqref{pik con1}-\eqref{pik con-1}.

After solving this approximation, we choose the vectors {πk}k=1K\{\pi_{k}\}_{k=1}^{K^{\prime}} to be the vectors vkv_{k} with zk=1z_{k}=1. Given this basis, we then solve (17) with Πs\Pi_{s} defined in (20) to generate Lagrangian cuts. The basis size KK^{\prime} in this case can be strictly smaller than KK. Observe that the optimal value of (28) gives an upper bound on the optimal value of (21)-(27). Therefore, if the optimal objective value of (28) is small, we cannot generate a highly violated Lagrangian cut for this scenario, and hence we can skip this scenario when generating Lagrangian cuts.
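The selection logic behind (28) can be mimicked by brute force on a toy example. The sketch below is our own stand-in, not the paper's MIP: it enumerates subsets of at most KK candidate vectors and grid-searches the weights β\beta under the normalization (23), spending the remaining budget on π0\pi_{0}; all data are fabricated:

```python
from itertools import combinations, product

# Brute-force stand-in for the MIP (28) on toy data: enumerate subsets of at
# most K candidate vectors and grid-search beta subject to
# alpha*pi0 + ||beta||_1 <= 1. A real implementation solves (28) as a MIP.

def model_violation(pi, pi0, x_hat, th_hat, E_hat):
    # tau is the tightest value with tau <= pi.z + pi0*theta over E_hat
    tau = min(sum(p * zi for p, zi in zip(pi, z)) + pi0 * th for z, th in E_hat)
    return tau - sum(p * xi for p, xi in zip(pi, x_hat)) - pi0 * th_hat

def select_basis(V, K, x_hat, th_hat, E_hat, alpha=1.0,
                 grid=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    best_val, best_subset = float("-inf"), None
    for k in range(1, K + 1):
        for subset in combinations(range(len(V)), k):
            for beta in product(grid, repeat=k):
                norm = sum(abs(b) for b in beta)
                if norm > 1:
                    continue
                pi0 = (1.0 - norm) / alpha    # spend the slack on pi0
                pi = [sum(b * V[j][i] for b, j in zip(beta, subset))
                      for i in range(len(x_hat))]
                val = model_violation(pi, pi0, x_hat, th_hat, E_hat)
                if val > best_val:
                    best_val, best_subset = val, subset
    return best_subset, best_val

V = [[1.0], [-1.0]]                   # candidate coefficient vectors (made up)
E_hat = [([0.0], 0.0), ([1.0], 2.0)]  # current model points (z, theta)
subset, val = select_basis(V, K=1, x_hat=[-0.5], th_hat=0.5, E_hat=E_hat)
print(subset, val)  # -> (0,) 0.5
```

On this toy data the first candidate vector is selected, producing the cut x0x\geq 0 that cuts off x̂=0.5\hat{x}=-0.5 by 0.50.5.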

5 Computational Study

We report our results from a computational study investigating different variants of our proposed methods for generating Lagrangian cuts and also comparing these to existing approaches. In Section 5.2 we investigate the bound closed over time by just adding cuts to the LP relaxation. In Section 5.3 we investigate the impact of using our cut-generation strategy to solve our test instances to optimality within a simple branch-and-cut algorithm.

We conduct our study on three test problems. The first two are standard SIP problems from the literature: the stochastic server location problem (SSLP, binary first-stage and mixed-integer second-stage) introduced in [31], and the stochastic network interdiction problem (SNIP, binary first-stage and continuous second-stage) from [32]. We also create a variant of the stochastic server location problem (SSLPV) that has mixed-binary first-stage variables and continuous second-stage variables. Our test set includes 24 SSLP instances with 20-50 first-stage variables, 2000-2100 second-stage variables, and 50-200 scenarios. The SSLPV test instances are the same as for SSLP, except that the number of first-stage variables is doubled, with half of them being continuous. We use 20 instances of the SNIP problem having 320 first-stage variables, 2586 second-stage variables, and 456 scenarios. More details of the test problems and instances can be found in the Appendix.

We consider in total six implementations of Lagrangian cuts:

  1. StrBen: Strengthened Benders cuts from [16], i.e., (10) with λ\lambda equal to the negative of the Benders cut coefficients each time a Benders cut is generated.

  2. BDD: BDD3 from [4]. This algorithm uses a multiphase implementation that first generates Benders cuts and strengthened Benders cuts, and then generates Lagrangian cuts based on a regularized inner approximation for separation.

  3. Exact: Exact separation of Lagrangian cuts of the form (17) with Πs={(π,π0):απ0+π11,π00}\Pi_{s}=\{(\pi,\pi_{0}):\alpha\pi_{0}+\|\pi\|_{1}\leq 1,\pi_{0}\geq 0\}.

  4. Rstr1: Restricted separation of Lagrangian cuts (17) with Πs\Pi_{s} defined in (19) and {πk}k=1K\{\pi_{k}\}_{k=1}^{K} being the coefficients of the last KK Benders cuts corresponding to scenario ss.

  5. Rstr2: Restricted separation of Lagrangian cuts (17) with Πs\Pi_{s} defined in (20) and {πk}k=1K\{\pi_{k}\}_{k=1}^{K} being the coefficients of the last KK Benders cuts corresponding to scenario ss.

  6. RstrMIP: Restricted separation of Lagrangian cuts (17) with Πs\Pi_{s} defined in (20) and {πk}k=1K\{\pi_{k}\}_{k=1}^{K} determined by solving the MIP approximation (28) with Vs={vk}kIsV^{s}=\{v^{k}\}_{k\in I_{s}} being the set of all Benders cut coefficients that have been generated for scenario ss.

5.1 Implementation Details

All experiments are run on a Windows laptop with 16GB RAM and an Intel Core i7-7660U processor running at 2.5GHz. All LPs, MIPs and convex quadratic programs (QPs) are solved using the optimization solver Gurobi 8.1.1 for SNIP and SSLP instances, and are solved using Gurobi 9.0.3 for SSLPV instances.

In Algorithm 1, we also terminate if the upper bound is small enough (<106(|θ^s|+1)<10^{-6}(|\hat{\theta}_{s}|+1)) or two consecutive multipliers are too close to each other (infinity norm of the difference <1010<10^{-10}). A computational issue that can occur when solving (17) is that if we obtain a solution with π0=0\pi_{0}=0 (or very small), the solution (x,y)(x^{*},y^{*}) of (12) does not provide information about Qs(x)Q_{s}(x^{*}) since the coefficients associated with the yy variables are equal to zero. In that case, if we store the “optimal” second-stage cost (qs)y(q^{s})^{\top}y^{*} and use it together with xx^{*} to define a cut in the cutting-plane model (18), this second-stage cost value (qs)y(q^{s})^{\top}y^{*} can be very far from Qs(x)Q_{s}(x^{*}) and lead to a very loose cut. Therefore, in our implementation, when π0\pi_{0} is small (<104<10^{-4}), after finding (x,y)(x^{*},y^{*}) from solving (12), we reevaluate Qs(x)Q_{s}(x^{*}) by solving the scenario subproblem (2) with x=xx=x^{*} fixed and use this value as θsx\theta_{s}^{x^{*}}.

For adding Benders cuts in Algorithm 2, the initialization of Q^s0\hat{Q}_{s}^{0} is obtained from one round of Benders cuts. Specifically, to avoid unboundedness, we solve minx{0:Axb}\min_{x}\{0:Ax\geq b\} to obtain a candidate solution and add the corresponding Benders cuts for all scenarios. As described in Algorithm 2, Benders cuts from the LP relaxation are added using the classical cutting-plane method [33]. This implementation was used for the SNIP test problem. For the SSLP test problem we found this phase took too long to converge, so the Benders-cut phase of Algorithm 2 was instead implemented with the level method [30]. Benders cuts are only added if the violation is at least 104(|θ^s|+1)10^{-4}(|\hat{\theta}_{s}|+1). We store the coefficients of all Benders cuts that have been added to the master problem. For each sSs\in S, we define Πs\Pi_{s} in the definitions of Rstr1 and Rstr2 based on the coefficients of the KK Benders cuts most recently found by the algorithm for scenario ss.

We use Algorithm 1 for solving the separation problem, with one exception. When performing exact separation of Lagrangian cuts (i.e., without constraining Πs\Pi_{s} to be a low-dimensional set), we use a level method for solving (17), because in that case we found Algorithm 1 converged too slowly. Based on preliminary experiments, the normalization coefficient α\alpha in (19) or (20) is set to 1 for tests on SSLP and SSLPV instances, and to 0.1 for tests on SNIP instances.

In all SSLP, SSLPV, and SNIP instances the set {x:Axb}\{x:Ax\geq b\} is integral and we have relatively complete recourse. Therefore, no feasibility cuts are necessary. Only Lagrangian cuts with π0106\pi_{0}\geq 10^{-6} are added in our tests.

For the BDD tests, we follow the implementation of [4] and set the regularization coefficient as δ0=0.01\delta_{0}=0.01 for SSLP and SSLPV instances and as δ0=100\delta_{0}=100 for SNIP instances. These values are chosen based on preliminary experiments as producing the best improvement in bound over time.

We have also tested our instances using the general purpose SIP solver DSP [34, 22, 21]. All DSP experiments are run on an Ubuntu virtual machine built on the same Windows machine using the binary version of DSP 1.1.3 with CPLEX 12.10 as the MIP solver.

5.2 LP Relaxation Results

We first compare results obtained from different implementations of Lagrangian cuts on our test problems and study the impact of different parameters in the algorithm. The stopping condition of Algorithm 2 for all root node experiments is that either the algorithm hits the one-hour time limit or there is no cut that can be found by the algorithm (with a 10610^{-6} relative tolerance).

To compare the overall performance of the methods visually across many test instances, we present results in the form of a γ\gamma-gap-closed profile. Given a set of problem instances PP and a set of cut generation methods MM, let g¯p\bar{g}_{p} denote the largest gap closed by any of the methods in MM (compared with the LP bound) for instance pPp\in P. The γ\gamma-gap-closed profile is defined with respect to a threshold γ(0,1]\gamma\in(0,1]. Given the value of γ\gamma, we define tp,mγt^{\gamma}_{p,m} as the earliest time at which method mMm\in M closes at least γg¯p\gamma\bar{g}_{p} of the gap for instance pPp\in P. If method mm does not close at least γg¯p\gamma\bar{g}_{p} of the gap before termination, we define tp,mγ=t^{\gamma}_{p,m}=\infty. The γ\gamma-gap-closed profile is a plot over time τ\tau of the cumulative distribution of the γ\gamma-gap-closed ratio ρmγ(τ)\rho_{m}^{\gamma}(\tau), where

ρmγ(τ)=|{pP:tp,mγτ}||P|.\rho_{m}^{\gamma}(\tau)=\frac{|\{p\in P:t^{\gamma}_{p,m}\leq\tau\}|}{|P|}.
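This ratio is a simple fraction of instances; the following sketch (with fabricated method names and times) illustrates the calculation:

```python
import math

# Sketch of the gamma-gap-closed ratio. times[p][m] is t^gamma_{p,m}: the
# earliest time method m closed at least gamma * gbar_p of the gap on
# instance p (math.inf if it never did). All values below are fabricated.

def gap_closed_ratio(times, method, tau):
    """rho_m^gamma(tau): fraction of instances on which method m reached the
    gamma threshold by time tau."""
    hit = sum(1 for p in times if times[p][method] <= tau)
    return hit / len(times)

times = {
    "inst1": {"Rstr1": 30.0, "Exact": 900.0},
    "inst2": {"Rstr1": 120.0, "Exact": math.inf},
}
print(gap_closed_ratio(times, "Rstr1", 100.0))   # 0.5
print(gap_closed_ratio(times, "Exact", 1000.0))  # 0.5
```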

In this paper, PP is either the set of all SSLP test instances, all SNIP test instances or all SSLPV instances, and MM is the set of all methods with all parameters we have tested. Therefore, for each instance pp, g¯p\bar{g}_{p} is defined to be the largest gap we have been able to close with Lagrangian cuts. We set γ\gamma equal to either 0.750.75 or 0.950.95 to summarize the time needed to obtain what we refer to as a “reasonably good bound” (γ=0.75\gamma=0.75) or a “strong bound” (γ=0.95\gamma=0.95).

To illustrate the behavior of these algorithms we also present the lower bound obtained over time using different cut generation schemes on some individual instances.

5.2.1 Relative Gap Tolerance δ\delta

Figure 1: Convergence profile of Exact on an SSLP instance (left) and an SSLPV instance (right) with varying δ\delta values.
Figure 2: γ\gamma-gap-closed profile for SSLP instances obtained by Exact (top) and Rstr1 (K=10K=10, bottom) with γ=0.75\gamma=0.75 (left) or γ=0.95\gamma=0.95 (right) and varying δ\delta values.
Figure 3: γ\gamma-gap-closed profile for SNIP instances obtained by Rstr1 with γ=0.75\gamma=0.75 (left) or γ=0.95\gamma=0.95 (right), K=10K=10 and varying δ\delta values.

We first examine the impact of varying the relative gap tolerance δ\delta. Experiments are run with δ\delta equal to 0.1%,5%,50%0.1\%,5\%,50\% and 100%100\%, respectively. A small δ\delta value (e.g., 0.1%0.1\%) means solving (17) to near optimality, which may generate strong cuts but makes each cut time-consuming to generate. A large δ\delta value may sacrifice some cut strength in an iteration in exchange for earlier termination.

For different δ\delta values, in Figure 1, we compare the bounds obtained by Exact over time. As the trend is quite similar for instances within the same problem class, we only report the convergence profiles for one SSLP instance (sslp1_20_100_200) and one SSLPV instance (sslp1_20_100_200_var). For the SNIP instances, which have many more first-stage variables, Exact often cannot finish one iteration of separation within an hour, so we defer the convergence profile of Exact on one SNIP instance to the Appendix. For the SSLP and SSLPV instances, a large δ\delta value tends to close the gap faster than a small δ\delta value. Plots providing the same comparison for Rstr1 with K=10K=10 are provided in the Appendix. For Rstr1, a large δ\delta value helps improve the bound efficiently on SSLP and SSLPV instances, while the choice of δ\delta has little impact on the SNIP instances.

To summarize the impact of δ\delta over the full set of test instances, we plot the 0.750.75-gap-closed profile and 0.950.95-gap-closed profile with different δ\delta values for SSLP and SNIP instances in Figures 2 and 3. We omit the gap-closed profiles for the SSLPV instances since they are similar to those for the SSLP instances. The gap-closed profiles for SNIP instances with Exact are omitted as no useful bound is obtained by Exact for any of the SNIP instances. For the SSLP and SSLPV instances, it is preferable to use a large δ\delta value, as both a reasonable bound (0.750.75 gap closed) and a strong bound (0.950.95 gap closed) are obtained earlier when δ\delta is large. For the SNIP instances, there is no clear preference, although small δ\delta values perform slightly better; this is because the time spent on each separation iteration does not vary much with δ\delta for the SNIP instances.

Conclusion. Depending on the problem, the choice of δ\delta can have either a mild or a large impact, but we find that a relatively large δ\delta, e.g., δ=50%\delta=50\%, performs consistently well across our instance classes.

5.2.2 Basis Size KK

Figure 4: γ\gamma-gap-closed profile for SSLP instances obtained by Rstr1 and Rstr2 (top), and RstrMIP (bottom) with γ=0.75\gamma=0.75 (left) or γ=0.95\gamma=0.95 (right), δ=50%\delta=50\% and varying KK values.
Figure 5: γ\gamma-gap-closed profile for SNIP instances obtained by Rstr1 and Rstr2 (top), and RstrMIP (bottom) with γ=0.75\gamma=0.75 (left) or γ=0.95\gamma=0.95 (right), δ=50%\delta=50\% and varying KK values.

We next investigate the impact of the choice of the basis size KK in Rstr1, Rstr2 and RstrMIP. In Figures 4 and 5 we plot the 0.750.75-gap-closed profiles and the 0.950.95-gap-closed profiles with different KK values for SSLP and SNIP instances, respectively. The results for SSLPV instances are omitted as they are similar to the results for SSLP instances. Plots showing the evolution of the lower bound obtained using Rstr1, Rstr2 and RstrMIP with different KK values on an example SNIP instance, an example SSLP instance and an example SSLPV instance are given in Appendix.

As seen in Figures 4 and 5, the trends when applying Rstr1 are quite different across problem classes. For SSLP and SSLPV instances, smaller KK obtains a reasonably good bound faster, since the restricted separation problem is much easier to solve when KK is small. However, for SNIP instances K=20K=20 dominates K=5K=5 and K=10K=10. We believe this is because KK does not significantly affect the time to generate a cut for the SNIP instances, while K=20K=20 gives much stronger cuts for closing the gap. Comparing Rstr1 and Rstr2, Rstr2 appears to be less sensitive to KK in terms of obtaining a reasonably good bound for SSLP instances. On the other hand, the performance of Rstr2 is almost identical to that of Rstr1 for SNIP instances. For SSLP and SSLPV instances with Rstr1 or Rstr2, we find that larger KK often leads to longer solution times for obtaining a reasonable bound, while small KK (K=5K=5) fails to reach a strong bound. We observe that RstrMIP is fairly robust to the choice of the basis size KK. One reason is that the constraint kIszkK\sum_{k\in I_{s}}z_{k}\leq K in (28) is often loose in the early stages of the algorithm for relatively large KK, since the constraint απ0+β11\alpha\pi_{0}+\|\beta\|_{1}\leq 1 tends to select a sparse β\beta and thus a sparse zz. On the same instance, different basis sizes yield very similar bounds at the end of the run.

Conclusion. The basis size KK governs a trade-off between the strength of cuts and computation time. We find that the best choice of KK for Rstr1 depends on the instance. A slightly larger KK (K=20K=20) often works better for Rstr2, while the performance of RstrMIP is much more robust to the choice of KK.

5.2.3 Comparison between Methods

Figure 6: γ-gap-closed profiles for SSLP instances (top), SSLPV instances (middle), and SNIP instances (bottom) with γ=0.75 (left) or γ=0.95 (right).

We present results obtained by the strengthened Benders cuts (StrBen), Benders dual decomposition (BDD), and the variants of our restricted Lagrangian cut separation (Exact, Rstr1, Rstr2 and RstrMIP). For all variants of restricted Lagrangian cut separation, we use relative gap tolerance δ=50%. For the basis size, we set K=20 for Rstr1, Rstr2 and RstrMIP. We plot the 0.75-gap-closed profiles and the 0.95-gap-closed profiles for these methods in Figure 6.

RstrMIP has the best performance on SSLP instances in terms of the gap closed over time. On SSLPV instances, the performances of Rstr2 and RstrMIP are similar to each other and much better than those of the other methods. On SNIP instances, the performances of Rstr1, Rstr2 and RstrMIP are almost identical, although RstrMIP is much more robust to the choice of K, as observed in Section 5.2.2. The curves for BDD are flat in the bottom plots of Figure 6 because BDD fails to close 75% of the gap on every SNIP instance. Convergence profiles for BDD on one SSLP instance, one SSLPV instance and one SNIP instance can be found in the Appendix.

Conclusion. RstrMIP has the best overall performance and in particular is more robust to the choice of KK.

5.2.4 Integrality gaps

In Table 1, we report statistics (minimum, maximum and average) of the LP integrality gap and of the integrality gap obtained after 60 minutes of Lagrangian cut generation by RstrMIP with K=20 for each problem class. The integrality gaps are calculated as (z_IP − LB)/|z_IP|, where z_IP is the optimal value and LB is the lower bound obtained from the Benders model before or after adding the generated Lagrangian cuts. The results show that RstrMIP closes the vast majority of the gap to the optimal value. Since the resulting bound closely approximates the optimal value, and the exact Lagrangian dual bound must lie between this bound and the optimal value, these results imply that, on these test instances, our method approximates the optimal value of the Lagrangian dual well, and that the Lagrangian dual bound is itself close to the optimal value. We emphasize, however, that this experiment is only meant to illustrate the potential of this method to close a majority of the gap when given enough time; we recommend an early stopping condition that terminates cut generation when the bound improvement has slowed significantly.
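As a small illustration, the gap formula can be computed directly; the numeric values below are hypothetical and not taken from the table:

```python
def integrality_gap(z_ip, lb):
    """Relative integrality gap (z_IP - LB) / |z_IP|."""
    return (z_ip - lb) / abs(z_ip)

# Hypothetical example: against an optimal value of 100, a Benders bound
# of 55.6 gives a 44.4% gap, while a post-cut bound of 99.3 gives 0.7%.
lp_gap = integrality_gap(100.0, 55.6)
post_cut_gap = integrality_gap(100.0, 99.3)
```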

Table 1: Integrality gap closed by Lagrangian cuts generated by RstrMIP.

                  LP gap (%)            Gap after RstrMIP cuts (%)
Class        min    max   average         min    max   average
SSLP        32.9   58.0      44.4         0.0    3.6       0.7
SSLPV       18.2   27.8      23.8         0.0    2.1       0.8
SNIP        22.1   34.2      28.0         0.8    5.9       3.0

5.3 Tests for Solving to Optimality

Finally, we study the use of our approach for generating Lagrangian cuts within a branch-and-cut method to solve SIPs to optimality, where Benders cuts and integer L-shaped cuts are added via the Gurobi lazy constraint callback whenever a new solution satisfying the integrality constraints is found. We refer to this method as LBC. We consider only RstrMIP with δ=50% and K=10 for generating the Lagrangian cuts. To reduce the time spent on Lagrangian cut generation at the root node, an early termination rule is applied: we stop generating Lagrangian cuts if the gap closed in the last 5 iterations is less than 1% of the total gap closed so far. For comparison, we also experiment with Benders cuts integrated with branch-and-cut (BBC, Algorithm 3 in the Appendix) and the DSP solver [34] on the same instances. A two-hour time limit is set for these experiments. Gurobi heuristics are turned off for the branch-and-cut master problem to avoid a bug in Gurobi 8.1.1 in which MIP solutions generated by Gurobi heuristics sometimes fail to trigger the callback. (This bug is reportedly fixed in Gurobi 9.1.)
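The early termination rule can be sketched as follows; this is a minimal illustration of our own, where `bounds` holds the lower bound recorded after each cut generation iteration:

```python
def should_stop(bounds, window=5, threshold=0.01):
    """Stop cut generation when the gap closed over the last `window`
    iterations is less than `threshold` of the total gap closed so far.

    `bounds` is the sequence of lower bounds after each round of cuts,
    with bounds[0] the initial Benders bound."""
    if len(bounds) <= window:
        return False  # not enough history yet
    total_closed = bounds[-1] - bounds[0]
    recent_closed = bounds[-1] - bounds[-1 - window]
    return total_closed > 0 and recent_closed < threshold * total_closed
```

With this rule, cut generation continues as long as the recent rounds still contribute a non-negligible share of the overall bound improvement.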

Table 2: Comparison of algorithms for solving SSLP instances to optimality.

                    On |S|=50 instances       On |S|=200 instances
                  LBC      BBC     DSP       LBC      BBC     DSP
# solved        12/12     0/12   12/12     12/12     0/12    7/12
Avg soln time    1496    ≥7200    1259      4348    ≥7200   ≥5365
Avg gap (%)       0.0     18.1       –       0.0     27.0       –
Avg B&C time       25    ≥7200       –       163    ≥7200       –
Avg # nodes       775    13808       –      1411     3954       –

We present the results obtained by LBC, BBC and DSP on SSLP instances in Table 2. We report the number of instances with sample size |S|=50 or |S|=200 that each algorithm solved within the time limit ('# solved'), the average total solution time of each algorithm ('Avg soln time'), including time to generate cuts where relevant, the average ending optimality gap ('Avg gap (%)'), the average time spent solving the problem with branch-and-cut, not including the initial cut generation time ('Avg B&C time'), and the average number of branch-and-bound nodes explored ('Avg # nodes'). The ending optimality gap of a method on an instance is calculated as (UB − LB)/max{|UB|, |LB|}, where UB and LB are the upper and lower bounds obtained by the method when the time limit was reached.
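The ending optimality gap can be computed as in this short sketch (the values in the example are hypothetical):

```python
def optimality_gap(ub, lb):
    """Ending optimality gap (UB - LB) / max(|UB|, |LB|)."""
    return (ub - lb) / max(abs(ub), abs(lb))

# Hypothetical bounds at the time limit: UB = 110, LB = 90.
gap = optimality_gap(110.0, 90.0)  # 20/110, about 18.2%
```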

All SSLP instances are solved to optimality by LBC within the two-hour time limit, with at most a few thousand nodes explored. On the other hand, none of the SSLP instances is solved by BBC before hitting the time limit, leaving a relatively large optimality gap. The SSLP instances are hard for BBC because neither Benders cuts nor integer L-shaped cuts are strong enough to efficiently cut off infeasible points when second-stage variables are discrete. Compared with DSP, LBC solves more |S|=200 SSLP instances within the time limit. The performance of LBC is relatively stable compared with DSP in the sense that LBC scales well as |S| increases. The efficiency of DSP depends heavily on the strength of the nonanticipative dual bound: for all instances where DSP outperforms LBC in solution time, no branching is needed to close the dual gap. We observe that LBC spends most of its solution time generating Lagrangian cuts for SSLP instances; the branch-and-cut time required after generating the Lagrangian cuts is very small.

Table 3: Comparison of algorithms for solving SSLPV instances to optimality.

                    On |S|=50 instances       On |S|=200 instances
                  LBC      BBC     DSP       LBC      BBC     DSP
# solved        11/12     0/12    3/12      7/12     0/12    2/12
Avg soln time   ≥2537    ≥7200       –     ≥5894    ≥7200       –
Avg gap (%)       0.1     18.4       –       0.3     20.8       –
Avg B&C time     ≥605    ≥7197       –     ≥1156    ≥7191       –
Avg # nodes     ≥7101   ≥22970       –     ≥6300    ≥8945       –

In Table 3, we report results obtained by these methods on the SSLPV instances. We do not report the average solution time of DSP because on many of the SSLPV instances DSP fails at a very early stage by outputting an infeasible solution. Even though the performance of our approach at the root node is similar for SSLPV and SSLP, we find the SSLPV instances far more difficult to solve to optimality than the SSLP instances. As with the SSLP instances, BBC is unable to solve any of the SSLPV instances because the Benders cuts are not strong enough. Within the two-hour time limit, only a small fraction of the SSLPV instances is solved correctly by DSP. On the other hand, LBC solves the majority of the SSLPV instances, and the ending optimality gaps are small for the unsolved instances, showing its capability and stability across problem classes.

Table 4: Comparison of algorithms for solving SNIP instances to optimality.

                  LBC      BBC     DSP
# solved        20/20    20/20    0/20
Avg soln time    2333      163   ≥7200
Avg gap (%)       0.0      0.0       –
Avg B&C time       71      116       –
Avg # nodes      1231     3348       –
Table 5: Comparison of LBC and BBC on SNIP instances with Gurobi presolve and cuts turned off.

                  LBC        BBC
# solved        20/20       9/20
Avg soln time    2385      ≥4981
Avg gap (%)       0.0        2.9
Avg B&C time       71      ≥4534
Avg # nodes      4236   ≥2923914

In Table 4 we report results of using these methods to solve the SNIP instances. No SNIP instance is solved within two hours by DSP, because DSP does not finish solving the nonanticipative dual problem of these instances within that time. BBC solves the SNIP instances faster than LBC because the time spent on cut generation at the root node is much shorter and the Benders cuts, when combined with Gurobi's presolve and cut generation procedures, are sufficient to keep the number of nodes small.

To better understand the impact of the Lagrangian cuts versus the Benders cuts on their own, we also report in Table 5 results obtained by LBC and BBC with Gurobi presolve and cuts disabled. Gurobi presolve and cuts, which significantly improve the LP bounds for BBC, turn out to be essential for solving the SNIP instances with Benders cuts alone: without them, BBC is unable to solve many of the instances within the time limit. LBC still solves all SNIP instances, demonstrating the strength of the Lagrangian cuts generated.

6 Conclusion

We derive a new method for generating Lagrangian cuts to add to a Benders formulation of a two-stage SIP. Extensive numerical experiments on three SIP problem classes indicate that the proposed method can improve the bound more quickly than alternative methods based on Lagrangian cuts. When incorporating the proposed method within a simple branch-and-cut algorithm for solving the problems to optimality, we find that our method solves one problem class within the time limit that is not solved by the open-source solver DSP, and performs modestly better on the other problem classes.

A direction for future work is to explore the integration of our techniques for generating Lagrangian cuts with other classes of cuts for two-stage SIPs, as well as their use for multistage SIPs via the SDDIP algorithm proposed in [16].

Appendix

A Branch-and-Cut Algorithm

A standard branch-and-cut algorithm for solving two-stage SIPs with either continuous recourse or pure binary first-stage variables is described in Algorithm 3. The algorithm starts with approximations \{\hat{Q}_{s}\}_{s\in S} of \{Q_{s}\}_{s\in S} and uses them to create an initial LP relaxation of problem (4), represented by the root node 0, which is added to the list of candidate nodes \mathcal{L}. In our case, the approximations \{\hat{Q}_{s}\}_{s\in S} are constructed from Benders cuts and Lagrangian cuts. In each iteration, the algorithm picks a node from \mathcal{L}, solves the corresponding LP, which is a relaxation of problem (4) with the addition of the constraints added by branching, and obtains a candidate solution (\hat{x},\{\hat{\theta}_{s}\}_{s\in S}). If \hat{x} does not satisfy the integrality constraints (\hat{x}\notin X), the algorithm creates two new subproblems (nodes) by subdividing the feasible region of that node: it chooses an integer decision variable x_{j} with \hat{x}_{j}\notin\mathbb{Z}, adds the constraint x_{j}\leq\lfloor\hat{x}_{j}\rfloor in one subproblem, and adds the constraint x_{j}\geq\lceil\hat{x}_{j}\rceil in the other. If \hat{x} satisfies the integrality constraints (\hat{x}\in X), the algorithm evaluates Q_{s}(\hat{x}) for all s\in S. If any violated Benders cuts or integer L-shaped cuts are identified, they are used to update \{\hat{Q}_{s}\}_{s\in S}, and the relaxation at that node is solved again. This process of searching for and adding violated cuts when \hat{x}\in X can be implemented using the lazy constraint callback available in commercial solvers. If no violated Benders or integer L-shaped cuts are identified, the current solution is feasible, and the upper bound \bar{z} and the best known solution (the incumbent x^{*}) are updated.
The algorithm terminates when there are no more nodes to explore in the candidate list \mathcal{L}.

Input: \{\hat{Q}_{s}\}_{s\in S}
Output: (x^{*},\{\theta_{s}^{*}\}_{s\in S})
1  Initialize \mathcal{L}\leftarrow\{0\}, LP_{0}\leftarrow\min_{x,\theta}\{c^{T}x+\sum_{s\in S}p_{s}\theta_{s}:\theta_{s}\geq\hat{Q}_{s}(x),s\in S,\ Ax\geq b\}, \bar{z}\leftarrow+\infty, (x^{*},\{\theta_{s}^{*}\}_{s\in S})\leftarrow\emptyset
2  while \mathcal{L}\neq\emptyset do
3      Choose a node i\in\mathcal{L}
4      Solve LP_{i} and, if feasible, obtain an optimal solution (\hat{x},\{\hat{\theta}_{s}\}_{s\in S}) and optimal value \hat{z}
5      if LP_{i} is feasible and \hat{z}<\bar{z} then
6          if \hat{x}\in X then
7              for s\in S do
8                  Evaluate Q_{s}(\hat{x})
9                  Add the Benders cut (7) to update \hat{Q}_{s} if violated
10                 if X=\{0,1\}^{n} then
11                     Add the integer L-shaped cut (8) to update \hat{Q}_{s} if violated
12                 end if
13             end for
14             if \hat{\theta}_{s}=Q_{s}(\hat{x}) for all s\in S then
15                 \mathcal{L}\leftarrow\mathcal{L}\setminus\{i\}
16                 (x^{*},\{\theta_{s}^{*}\}_{s\in S})\leftarrow(\hat{x},\{\hat{\theta}_{s}\}_{s\in S})
17                 \bar{z}\leftarrow\hat{z}
18             end if
19         else
20             Choose an integer variable x_{j} with \hat{x}_{j}\notin\mathbb{Z}, then branch on LP_{i} to construct LP_{i_{1}} and LP_{i_{2}}, where LP_{i_{1}} is LP_{i} with the additional constraint x_{j}\leq\lfloor\hat{x}_{j}\rfloor and LP_{i_{2}} is LP_{i} with the additional constraint x_{j}\geq\lceil\hat{x}_{j}\rceil
21             \mathcal{L}\leftarrow(\mathcal{L}\setminus\{i\})\cup\{i_{1},i_{2}\}
22         end if
23     else
24         \mathcal{L}\leftarrow\mathcal{L}\setminus\{i\}
25     end if
26 end while
Algorithm 3: Branch-and-cut for SIPs.
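The branching step of Algorithm 3 can be sketched as follows; representing a node by a dictionary of variable-bound overrides is our own simplification for illustration:

```python
import math

def branch(node_bounds, j, x_hat_j):
    """Split a node on fractional variable x_j: one child adds
    x_j <= floor(x_hat_j), the other adds x_j >= ceil(x_hat_j).

    `node_bounds` maps a variable index to its (lower, upper) bounds."""
    lo, hi = node_bounds.get(j, (0.0, math.inf))
    left = dict(node_bounds)
    left[j] = (lo, min(hi, math.floor(x_hat_j)))    # x_j <= floor
    right = dict(node_bounds)
    right[j] = (max(lo, math.ceil(x_hat_j)), hi)    # x_j >= ceil
    return left, right
```

Both children exclude the fractional value x̂_j while together covering every integer value the variable could take at the parent node.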

Test Problems

The SSLP problem

The SSLP problem [31] is a two-stage SIP with pure binary first-stage and mixed-binary second-stage variables. In this problem, the decision maker chooses, at some cost, a subset of m sites at which to allocate servers in the first stage. In the second stage, the availability of each client i is observed, and every available client must be served at some site. The first-stage variables are denoted by x, with x_{j}=1 if and only if a server is located at site j. The second-stage variables are denoted by y and y_{0}, where y_{ij}^{s}=1 if and only if client i is served at site j in scenario s, and y_{0j}^{s} is the amount of resource shortage at site j in scenario s. Allocating a server at site j costs c_{j}. Each allocated server can provide up to u units of resource. Client i uses d_{ij} units of resource and generates revenue q_{ij} if served at site j. Each unit of resource shortage at site j incurs a penalty q_{0j}. The client availability in scenario s is represented by h^{s}, with h_{i}^{s}=1 if and only if client i is present in scenario s. The extensive formulation of the problem is as follows:

\min_{x,y^{s},y_{0}^{s}}  \sum_{j=1}^{m}c_{j}x_{j}+\sum_{s\in S}p_{s}\Big(\sum_{j=1}^{m}q_{0j}y_{0j}^{s}-\sum_{i=1}^{n}\sum_{j=1}^{m}q_{ij}y_{ij}^{s}\Big)
s.t.  \sum_{i=1}^{n}d_{ij}y_{ij}^{s}-y_{0j}^{s}\leq ux_{j},   j=1,\ldots,m,\ s\in S,
\sum_{j=1}^{m}y_{ij}^{s}=h_{i}^{s},   i=1,\ldots,n,\ s\in S,
x_{j}\in\{0,1\},   j=1,\ldots,m,
y_{ij}^{s}\in\{0,1\},   i=1,\ldots,n,\ j=1,\ldots,m,\ s\in S,
y_{0j}^{s}\geq 0,   j=1,\ldots,m,\ s\in S.

All SSLP instances are generated with (m,n)\in\{(20,100),(30,70),(40,50),(50,40)\} and |S|\in\{50,200\}. For each size (m,n,|S|), three instances are generated, indexed by k\in\{1,2,3\}. For each SSLP instance, the data are generated as follows:

  • p_{s}=1/|S|, s\in S.

  • c_{j}\sim Uniform(\{40,41,\ldots,80\}), j=1,\ldots,m.

  • d_{ij}=q_{ij}\sim Uniform(\{0,1,\ldots,25\}), i=1,\ldots,n, j=1,\ldots,m.

  • q_{0j}=1000, j=1,\ldots,m.

  • u=\sum_{i=1}^{n}\sum_{j=1}^{m}d_{ij}/m.

  • h_{i}^{s}\sim Bernoulli(1/2), i=1,\ldots,n, s\in S.
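The data generation above can be sketched in a few lines; the random seed and the return structure are our own choices for reproducibility, not part of the original instance files:

```python
import random

def generate_sslp(m, n, num_scenarios, seed=0):
    """Generate SSLP data following the distributions listed above."""
    rng = random.Random(seed)
    c = [rng.randint(40, 80) for _ in range(m)]                     # server costs
    d = [[rng.randint(0, 25) for _ in range(m)] for _ in range(n)]  # demands = revenues
    q0 = [1000] * m                                                 # shortage penalties
    u = sum(sum(row) for row in d) / m                              # server capacity
    h = [[rng.randint(0, 1) for _ in range(n)]                      # client availability
         for _ in range(num_scenarios)]
    p = [1 / num_scenarios] * num_scenarios                         # equal probabilities
    return c, d, q0, u, h, p
```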

All SSLP instances are named in the form sslpk_m_n_|S|, where m denotes the number of sites, n the number of clients, |S| the number of scenarios, and k the instance number.

The SSLPV problem

The SSLPV problem is a variant of the SSLP problem with mixed-binary first-stage and pure continuous second-stage variables. There are two main differences between SSLP and SSLPV. First, in the SSLPV problem, in addition to choosing which sites to “open” for servers, the decision maker also has the option to add more server capacity than the “base” amount if the site is open. Second, in the second stage, we allow the clients’ demands to be partially met by the servers, and only the unmet shortage is penalized. Thus, in this problem the recourse problem is a linear program. We assume that if a site j is selected (a binary decision), it has a base capacity u^{b} (independent of j), installed at cost c_{j}^{b}. In addition, it is possible to install any fraction of an additional u^{n} units of capacity at site j, at total cost c_{j}^{n} if all of the additional capacity is installed, with the cost reduced proportionally if only a fraction is installed. The amount of additional server capacity added at site j is represented by a continuous variable x^{+}_{j}\in[0,1]. The extensive formulation of the problem is as follows:

\min_{x,x^{+},y^{s},y_{0}^{s}}  \sum_{j=1}^{m}\big(c_{j}^{b}x_{j}+c_{j}^{n}x^{+}_{j}\big)+\sum_{s\in S}p_{s}\Big(\sum_{j=1}^{m}q_{0j}y_{0j}^{s}-\sum_{i=1}^{n}\sum_{j=1}^{m}q_{ij}y_{ij}^{s}\Big)
s.t.  \sum_{i=1}^{n}d_{ij}y_{ij}^{s}-y_{0j}^{s}\leq u^{b}x_{j}+u^{n}x^{+}_{j},   j=1,\ldots,m,\ s\in S,
\sum_{j=1}^{m}y_{ij}^{s}=h_{i}^{s},   i=1,\ldots,n,\ s\in S,
x_{j}\in\{0,1\},\ x^{+}_{j}\in[0,1],\ x^{+}_{j}\leq x_{j},   j=1,\ldots,m,
y_{ij}^{s}\in[0,1],   i=1,\ldots,n,\ j=1,\ldots,m,\ s\in S,
y_{0j}^{s}\geq 0,   j=1,\ldots,m,\ s\in S.

Each SSLPV instance corresponds to an SSLP instance. In all SSLPV instances, we set u^{b}=(1-\rho)u and u^{n}=\rho u, and for each site j we set c_{j}^{b}=(1-\rho)c_{j} and c_{j}^{n}=\rho c_{j}, where \rho=0.5 and u and c_{j} are the data from the corresponding SSLP instance. The other data are the same as in the SSLP instances. All SSLPV instances are named in the form sslpk_m_n_|S|_var, where m denotes the number of sites, n the number of clients, |S| the number of scenarios, and k the instance number.
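The SSLPV parameters can be derived from the SSLP data by the following one-line transformation (a trivial sketch; ρ=0.5 as in our instances):

```python
def sslpv_params(u, c, rho=0.5):
    """Split SSLP capacity u and costs c into base/additional parts."""
    ub, un = (1 - rho) * u, rho * u          # u^b = (1-rho)u, u^n = rho*u
    cb = [(1 - rho) * cj for cj in c]        # c^b_j = (1-rho)c_j
    cn = [rho * cj for cj in c]              # c^n_j = rho*c_j
    return ub, un, cb, cn
```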

The SNIP problem

The SNIP problem [32] is a two-stage SIP with pure binary first-stage and continuous second-stage variables. In this problem, the defender interdicts arcs on a directed network by installing sensors in the first stage to maximize the probability of detecting an attacker. In the second stage, the origin and destination of the attacker are observed, and the attacker chooses the maximum-reliability path from its origin to its destination. Let N and A denote the node set and the arc set of the network, and let D\subseteq A denote the set of interdictable arcs. The first-stage variables are denoted by x, with x_{a}=1 if and only if the defender installs a sensor on arc a\in D. The scenarios s\in S define the possible origin/destination combinations of the attacker, with u^{s} the origin and v^{s} the destination of the attacker in scenario s. The second-stage variables are denoted by \pi, where \pi_{i}^{s} denotes the maximum probability of reaching destination v^{s} undetected from node i in scenario s. The budget for installing sensors is b, and the cost of installing a sensor on arc a\in D is c_{a}. For each arc a\in A, the probability of traveling on arc a undetected is r_{a} if the arc is not interdicted, or q_{a} if the arc is interdicted. Finally, let \bar{\pi}_{j}^{s} denote the maximum probability of reaching the destination undetected from node j when no sensors are installed. The extensive formulation of the problem is as follows:

\min_{x,\pi^{s}}  \sum_{s\in S}p_{s}\pi_{u^{s}}^{s}
s.t.  \sum_{a\in D}c_{a}x_{a}\leq b,
\pi_{i}^{s}-r_{a}\pi_{j}^{s}\geq 0,   a=(i,j)\in A\setminus D,\ s\in S,
\pi_{i}^{s}-r_{a}\pi_{j}^{s}\geq-(r_{a}-q_{a})\bar{\pi}_{j}^{s}x_{a},   a=(i,j)\in D,\ s\in S,
\pi_{i}^{s}-q_{a}\pi_{j}^{s}\geq 0,   a=(i,j)\in D,\ s\in S,
\pi_{v^{s}}^{s}=1,   s\in S,
x_{a}\in\{0,1\},   a\in D.

All of the SNIP instances used in this paper are from [32]. We consider instances with snipno = 3 and budget b\in\{30,50,70,90\}. All instances have 320 first-stage binary variables, 2586 second-stage continuous variables per scenario, and 456 scenarios.
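The constants \bar{\pi}_{j}^{s} can be computed for each scenario by a Dijkstra-style search on the reversed network, which is exact because all arc reliabilities lie in [0, 1]; the following sketch is our own illustration, not the implementation used in the paper:

```python
import heapq

def max_reliability(arcs, r, dest):
    """Compute pi_bar[j]: the max probability of reaching `dest` undetected
    from each node when no sensors are installed.

    `arcs` is a list of (i, j) pairs; `r` maps each arc to its reliability."""
    rev = {}  # reversed adjacency: for each node j, the arcs entering it
    for (i, j) in arcs:
        rev.setdefault(j, []).append((i, r[(i, j)]))
    pi = {dest: 1.0}
    heap = [(-1.0, dest)]  # max-heap via negated reliabilities
    while heap:
        neg_p, j = heapq.heappop(heap)
        if -neg_p < pi.get(j, 0.0):
            continue  # stale entry
        for (i, rij) in rev.get(j, []):
            cand = rij * pi[j]
            if cand > pi.get(i, 0.0):
                pi[i] = cand
                heapq.heappush(heap, (-cand, i))
    return pi
```

Multiplying reliabilities along a path plays the role of summing lengths in ordinary shortest-path search, so the greedy label-setting argument still applies.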

Some Convergence Profiles

Figures 8-11 present the evolution of the lower bound over time on representative SSLP, SSLPV and SNIP instances using various cut generation approaches and parameters.

Figure 7: Convergence profile of an SNIP instance (instance0_50_3) with varying δ values.
Figure 8: Convergence profile of Rstr1 on an SSLP instance (top left), an SSLPV instance (top right), and a SNIP instance (bottom) with K=10 and varying δ values.
Figure 9: Convergence profile of Rstr1 on an SSLP instance (top left), an SSLPV instance (top right), and a SNIP instance (bottom) with δ=50% and varying K values.
Figure 10: Convergence profile of Rstr2 on an SSLP instance (top left), an SSLPV instance (top right), and a SNIP instance (bottom) with δ=50% and varying K values.
Figure 11: Convergence profile of RstrMIP on an SSLP instance (top left), an SSLPV instance (top right), and a SNIP instance (bottom) with δ=50% and varying K values, and comparison with BDD.

From Figure 11 we see that on the example SSLP instance, BDD does not make much progress in the first 10 minutes, but then makes steady improvement. The shift occurs when BDD switches from its initial phase, in which it generates only strengthened Benders cuts, to its second phase, in which it heuristically generates Lagrangian cuts. In the initial phase, the strengthened Benders cuts are not strong enough to substantially improve the bound after the first iteration, so significant bound progress only resumes after the switch to the second phase. We experimented with making this transition earlier (or skipping the first phase altogether) but found that this led to poor performance of BDD. It appears that the first phase is important for building an inner approximation that is used for separating Lagrangian cuts in the next phase. The convergence profiles of BDD on the SNIP and SSLPV instances clearly show its slow convergence relative to all variants of our proposed restricted Lagrangian cut generation approach.

References

  • [1] J. F. Benders, “Partitioning procedures for solving mixed-variables programming problems,” Numerische Mathematik, vol. 4, pp. 238–252, 1962.
  • [2] R. M. Van Slyke and R. Wets, “L-shaped linear programs with applications to optimal control and stochastic programming,” SIAM Journal on Applied Mathematics, vol. 17, no. 4, pp. 638–663, 1969.
  • [3] C. C. Carøe and R. Schultz, “Dual decomposition in stochastic integer programming,” Operations Research Letters, vol. 24, no. 1-2, pp. 37–45, 1999.
  • [4] R. Rahmaniani, S. Ahmed, T. G. Crainic, M. Gendreau, and W. Rei, “The Benders dual decomposition method,” Operations Research, vol. 68, pp. 878–895, 2020.
  • [5] C. Li and I. E. Grossmann, “An improved L-shaped method for two-stage convex 0–1 mixed integer nonlinear stochastic programs,” Computers & Chemical Engineering, vol. 112, pp. 165–179, 2018.
  • [6] G. Laporte and F. V. Louveaux, “The integer L-shaped method for stochastic integer programs with complete recourse,” Operations Research Letters, vol. 13, no. 3, pp. 133–142, 1993.
  • [7] S. Sen and J. L. Higle, “The C^3 theorem and a D^2 algorithm for large scale stochastic mixed-integer programming: set convexification,” Mathematical Programming, vol. 104, no. 1, pp. 1–20, 2005.
  • [8] S. Sen and H. D. Sherali, “Decomposition with branch-and-cut approaches for two-stage stochastic mixed-integer programming,” Mathematical Programming, vol. 106, no. 2, pp. 203–223, 2006.
  • [9] L. Ntaimo, “Fenchel decomposition for stochastic mixed-integer programming,” Journal of Global Optimization, vol. 55, no. 1, pp. 141–163, 2013.
  • [10] D. Gade, S. Küçükyavuz, and S. Sen, “Decomposition algorithms with parametric Gomory cuts for two-stage stochastic integer programs,” Mathematical Programming, vol. 144, no. 1-2, pp. 39–64, 2014.
  • [11] M. Zhang and S. Küçükyavuz, “Finitely convergent decomposition algorithms for two-stage stochastic pure integer programs,” SIAM Journal on Optimization, vol. 24, no. 4, pp. 1933–1951, 2014.
  • [12] Y. Qi and S. Sen, “The ancestral Benders’ cutting plane algorithm with multi-term disjunctions for mixed-integer recourse decisions in stochastic programming,” Mathematical Programming, vol. 161, no. 1-2, pp. 193–235, 2017.
  • [13] B. Chen, S. Küçükyavuz, and S. Sen, “Finite disjunctive programming characterizations for general mixed-integer linear programs,” Operations Research, vol. 59, no. 1, pp. 202–210, 2011.
  • [14] N. van der Laan and W. Romeijnders, “A converging Benders’ decomposition algorithm for two-stage mixed-integer recourse models,” 2020. http://www.optimization-online.org/DB_HTML/2020/12/8165.html.
  • [15] M. Bodur, S. Dash, O. Günlük, and J. Luedtke, “Strengthened Benders cuts for stochastic integer programs with continuous recourse,” INFORMS Journal on Computing, vol. 29, no. 1, pp. 77–91, 2017.
  • [16] J. Zou, S. Ahmed, and X. A. Sun, “Stochastic dual dynamic integer programming,” Mathematical Programming, vol. 175, no. 1-2, pp. 461–502, 2019.
  • [17] G. Lulli and S. Sen, “A branch-and-price algorithm for multistage stochastic integer programming with application to stochastic batch-sizing problems,” Management Science, vol. 50, no. 6, pp. 786–796, 2004.
  • [18] M. Lubin, K. Martin, C. G. Petra, and B. Sandıkçı, “On parallelizing dual decomposition in stochastic integer programming,” Operations Research Letters, vol. 41, no. 3, pp. 252–258, 2013.
  • [19] N. Boland, J. Christiansen, B. Dandurand, A. Eberhard, J. Linderoth, J. Luedtke, and F. Oliveira, “Combining progressive hedging with a Frank–Wolfe method to compute Lagrangian dual bounds in stochastic mixed-integer programming,” SIAM Journal on Optimization, vol. 28, no. 2, pp. 1312–1336, 2018.
  • [20] G. Guo, G. Hackebeil, S. M. Ryan, J.-P. Watson, and D. L. Woodruff, “Integration of progressive hedging and dual decomposition in stochastic integer programs,” Operations Research Letters, vol. 43, no. 3, pp. 311–316, 2015.
  • [21] K. Kim and B. Dandurand, “Scalable branching on dual decomposition of stochastic mixed-integer programming problems,” 2018. http://www.optimization-online.org/DB_HTML/2018/10/6867.html.
  • [22] K. Kim, C. G. Petra, and V. M. Zavala, “An asynchronous bundle-trust-region method for dual decomposition of stochastic mixed-integer programming,” SIAM Journal on Optimization, vol. 29, no. 1, pp. 318–342, 2019.
  • [23] M. Fischetti, D. Salvagnin, and A. Zanette, “A note on the selection of Benders’ cuts,” Mathematical Programming, vol. 124, no. 1-2, pp. 175–182, 2010.
  • [24] P. Kall and S. W. Wallace, Stochastic Programming. Springer, 1994.
  • [25] S. Ahmed, M. Tawarmalani, and N. V. Sahinidis, “A finite branch-and-bound algorithm for two-stage stochastic integer programs,” Mathematical Programming, vol. 100, no. 2, pp. 355–377, 2004.
  • [26] P. Schütz, A. Tomasgard, and S. Ahmed, “Supply chain design under uncertainty using sample average approximation and dual decomposition,” European Journal of Operational Research, vol. 199, no. 2, pp. 409–419, 2009.
  • [27] S. Solak, J.-P. B. Clarke, E. L. Johnson, and E. R. Barnes, “Optimization of R&D project portfolios under endogenous uncertainty,” European Journal of Operational Research, vol. 207, no. 1, pp. 420–433, 2010.
  • [28] R. T. Rockafellar, Convex Analysis, vol. 36. Princeton University Press, 1970.
  • [29] M. Avriel and A. Williams, “The value of information and stochastic programming,” Operations Research, vol. 18, no. 5, pp. 947–954, 1970.
  • [30] C. Lemaréchal, A. Nemirovskii, and Y. Nesterov, “New variants of bundle methods,” Mathematical Programming, vol. 69, no. 1-3, pp. 111–147, 1995.
  • [31] L. Ntaimo and S. Sen, “The million-variable “march” for stochastic combinatorial optimization,” Journal of Global Optimization, vol. 32, no. 3, pp. 385–400, 2005.
  • [32] F. Pan and D. P. Morton, “Minimizing a stochastic maximum-reliability path,” Networks: An International Journal, vol. 52, no. 3, pp. 111–119, 2008.
  • [33] J. E. Kelley, Jr., “The cutting-plane method for solving convex programs,” Journal of the Society for Industrial and Applied Mathematics, vol. 8, no. 4, pp. 703–712, 1960.
  • [34] K. Kim and V. M. Zavala, “Algorithmic innovations and software for the dual decomposition method applied to stochastic mixed-integer programs,” Mathematical Programming Computation, vol. 10, no. 2, pp. 225–266, 2018.