A Unified Framework for Multistage and Multilevel Mixed Integer Linear Optimization

1 Introduction

2 Related and Previous Work

3 Setup and preliminaries

4 Computational Complexity

5 Duality and the Value Function

6 Reformulations

7 Algorithmic Approaches

8 Conclusions

Acknowledgements

References

1.1 Motivation

1.2 Focus of the Paper

2.1 Applications

2.2 Algorithms

3.1 Canonical Risk Function

3.2 Technical Assumptions

3.3 Alternative Models

3.4 Related Classes

5.1 Value Functions

5.2 Dual Functions

5.3 Dual Problems

5.4 Structure of the Value Function

5.5 Approximating the Value Function

5.6 Reaction and Risk Functions

5.7 Optimality Conditions

6.1 Value-Function (Optimality-based) Reformulation

6.2 Risk-Function (Projection-based) Reformulation

6.3 Polyhedral (Convexification-based) Reformulation

6.4 Deterministic Equivalent (Decomposition-based) Reformulation

7.1 Decomposition Methods

7.2 Convexification-based Methods

3.3.1 Other Risk Functions.

3.4.1 Single-Stage Problems.

3.4.2 Bilevel Problems.

3.4.3 Two-Stage Stochastic Optimization Problems with Recourse.

3.4.4 Zero-Sum and Interdiction Problems.

5.5.1 Dual Functions from Branch and Bound

5.5.2 Iterative Refinement

5.5.3 Primal Functions

7.1.1 Continuous Problems

7.1.2 Discrete Problems

Alternative Form of (2SMILP).

Pessimistic Risk Function.

Recursive Risk Functions.

Abstract

Assumption 1.

Assumption 2.

Definition 1.

Definition 2.

Theorem 5.1.

Theorem 5.2 (Weak Duality).

Theorem 5.3 (Strong Duality).

Suresh Bolusani [email protected] Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA Stefano Coniglio [email protected] Department of Mathematical Sciences, University of Southampton, Southampton, UK Ted K. Ralphs [email protected] Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA Sahar Tahernejad [email protected] Lindo Systems, Inc., Chicago, IL 60642

(April 16, 2020)

We introduce a unified framework for the study of multilevel mixed integer linear optimization problems and multistage stochastic mixed integer linear optimization problems with recourse. The framework highlights the common mathematical structure of the two problems and allows for the development of a common algorithmic framework. Focusing on the two-stage case, we investigate, in particular, the nature of the value function of the second-stage problem, highlighting its connection to dual functions and the theory of duality for mixed integer linear optimization problems, and summarize different reformulations. We then present two main solution techniques, one based on a Benders-like decomposition to approximate either the risk function or the value function, and the other one based on cutting plane generation.

This article introduces a unified framework for the study of multilevel mixed integer linear optimization problems and multistage stochastic mixed integer linear optimization problems with recourse. This unified framework provides insights into the nature of these two well-known classes of optimization problems, highlights their common mathematical structure, and allows results from the wider literature devoted to both classes to be exploited for the development of a common algorithmic framework.

Historically, research in mathematical optimization, which is arguably the most widely applied theoretical and methodological framework for solving optimization problems, has been primarily focused on “idealized” models aimed at informing the decision process of a single decision maker (DM) facing the problem of making a single set of decisions at a single point in time under perfect information. Techniques for this idealized case are now well developed, with efficient implementations widely available in off-the-shelf software.

In contrast, most real-world applications involve multiple DMs, and decisions must be made at multiple points in time under uncertainty. To allow for this additional complexity, a number of more sophisticated modeling frameworks have been developed, including multistage and multilevel optimization. In line with the recent optimization literature, we use the term multistage optimization to denote the decision process of a single DM over multiple time periods with an objective that factors in the (expected) impact at future stages of the decisions taken at the current stage. With the term multilevel optimization, on the other hand, we refer to game-theoretic decision processes in which multiple DMs with selfish objectives make decisions in turn, competing to optimize their own individual outcomes in the context of settings such as, e.g., economic markets.

Because the distinction between multistage and multilevel optimization problems appears substantial from a modeling perspective, their development has been undertaken independently by different research communities. Indeed, multistage problems have arisen out of the necessity to account for stochasticity, which is done by explicitly including multiple decision stages in between each of which the value of a random variable is realized. Knowledge of the precise values of the quantities that were unknown at earlier stages allows for so-called recourse decisions to be made in later stages in order to correct possible mis-steps taken due to the lack of precise information. On the other hand, multilevel optimization has been developed primarily to model multi-round games (technically known as finite, extensive form games with perfect information in the general case and Stackelberg games in the case of two rounds) in which the decision (or strategy) of a given player at a given round must take into account the reactions of other players in future rounds.

Despite these distinctions, these classes of problems share an important common structure from a mathematical and methodological perspective that makes considering them in a single, unifying framework attractive. It is not difficult to understand the source of this common structure—from the standpoint of an individual DM, the complexity of the decision process comes from uncertainty about the future. From an analytical perspective, the methods we use for dealing with the uncertainty arising from a lack of knowledge of the precise values of input parameters to later-stage decision problems can also be used to address the uncertainty arising from a lack of knowledge of the future actions of another self-interested player. In fact, one way of viewing the outcome of a random variable is as a “decision” made by a DM about whose objective function nothing is known. Both cases require consideration of a set of outcomes arising from either the different ways in which the uncertain future could unfold or from the different possible actions the other players could take. Algorithms for solving these difficult optimization problems must, either explicitly or implicitly, rely on efficient methods for exploring this outcome space. This commonality turns out to be more than a philosophical abstraction. The mathematical connections between multistage and multilevel optimization problems run deep and existing algorithms for the two cases already exhibit common features, as we illustrate in what follows.

In the rest of this paper, we address the broad class of optimization problems that we refer to from here on by the collective name multistage mixed integer linear optimization problems. Such problems allow for multiple decision stages, with decisions at each stage made by a different DM and with each stage followed by the revelation of the value of one or more random variables affecting the available actions at the subsequent stages. Each DM is assumed to have their own objective function for the evaluation of the decisions made at all stages following the stage at which they make their first decision, including stages whose decision they do not control. Importantly, the objective functions of different DMs may (but do not necessarily) conflict. Algorithmically, the focus of such problems is usually on determining an optimal decision at the first stage. At the point in time when later-stage decisions must be made, the optimization problem faced by those DMs has the same form as the one faced by early-stage DMs but, in it, the decisions made at the earlier stages act as deterministic inputs. Note that, while we have assumed different DMs at each stage, it is entirely possible to model scenarios in which a single DM makes multiple decisions over time. From a mathematical standpoint, this latter situation is equivalent to the case in which different DMs share the same objective function and we thus do not differentiate these situations in what follows.

Although the general framework we introduce applies more broadly, we focus here on problems with two decision stages and two DMs, as well as stochastic parameters whose values are realized between the two stages. We further restrict our consideration to the case in which we have both continuous and integer variables but the constraints and objective functions are linear. We refer to these problems as two-stage mixed integer linear optimization problems (2SMILPs). Despite this restricted setting, the framework can be extended to multiple stages and more general forms of constraints and objective functions in a conceptually straightforward way (see, e.g., Section 3.3).

In this section, we give a brief overview of related works. The literature on these topics is vast and the below overview is not intended to be exhaustive by any means, but only to give a general sense of work that has been done to date. The interested reader should consult other articles in this volume for additional background and relevant citations.

Multilevel and multistage structures, whose two level/stage versions are known as bilevel optimization problems and two-stage stochastic optimization problems with recourse (2SPRs), arise naturally in a vast array of applications of which we touch on only a small sample here.

In the modeling of large organizations with hierarchical decision-making processes, such as corporations and governments, Bard (1983) discusses a bilevel corporate structure in which top management is the leader and subordinate divisions, which may have their own conflicting objectives, are the followers. Similarly, government policy-making can be viewed from a bilevel optimization perspective: Bard et al. (2000) models a government encouraging biofuel production through subsidies to the petro-chemical industry, while Amouzegar and Moshirvaziri (1999) models a central authority that sets prices and taxes for hazardous waste disposal while polluting firms respond by making location-allocation and recycling decisions.

A large body of work exists on interdiction problems, which model competitive games where two players have diametrically opposed goals and the first player has the ability to prevent one or more of the second player’s possible activities (variables) from being engaged in at a non-zero level. Most of the existing literature on these problems has focused on variations of the well-studied network interdiction problem Wollmer (1964); McMasters and Mustin (1970); Ghare et al. (1971); Wood (1993); Cormican et al. (1998); Israeli and Wood (2002); Held and Woodruff (2005); Janjarassuk and Linderoth (2008), in which the lower-level DM is an entity operating a network of some sort and the upper-level DM (or interdictor) attempts to interdict the network as much as possible via the removal (complete or otherwise) of portions (subsets of arcs or nodes) of the network. A more general case which does not involve networks (the so-called linear system interdiction problem) was studied in Israeli (1999) and later in DeNegre (2011). A related set of problems involves an attacker disrupting warehouses or other facilities to maximize the resulting transportation costs faced by the firm (the follower) Church et al. (2004); Scaparra and Church (2008); Zhang et al. (2016). A trilevel version of this problem involves the firm first fortifying the facilities, then the attacker interdicting them, and finally the firm re-allocating customers Church and Scaparra (2006). More abstract graph-theoretical interdiction problems in which the vertices of a graph are removed in order to reduce the graphs’ stability/clique number are studied in Rutenburg (1994); Furini et al. (2019); Coniglio and Gualandi (2017).

Multilevel problems arise in a wide range of industries. For instance, in the context of the electricity industry, Hobbs and Nelson (1992) applies bilevel optimization to demand-side management while Grimm et al. (2016) formulates a trilevel problem with power network expansion investments in the first level, market clearing in the second, and redispatch in the third. Cote et al. (2003); Coniglio et al. (2020b) addresses the capacity allocation and pricing problem for the airline industry. Dempe et al. (2005) presents a model for the natural-gas shipping industry. A large amount of work has been carried out in the context of traffic planning problems, including constructing road networks to maximize users’ benefits Ben-Ayed et al. (1992), toll revenue maximization Labbé et al. (1998), and hazardous material transportation Kara and Verter (2004). For a general review on these problems and for one specialized to price-setting, the reader is referred to Migdalas (1995); Labbé and Violin (2013). More applications arise in chemical engineering and bioengineering in the design and control of optimized systems. For example, Clark and Westerberg (1990) optimizes a chemical process by controlling temperature and pressure (first-stage actions) where the system (second stage) reaches an equilibrium as it naturally minimizes the Gibbs free energy. Burgard et al. (2003) develops gene-deletion strategies (first stage) to allow overproduction of a desired chemical by a cell (second stage). In the area of telecommunication networks, bilevel optimization has been used for modeling the behavior of a networking communication protocol (second-level problem) which the network operator, acting as first-level DM, can influence but not directly control. The case of routing over a TCP/IP network is studied in Amaldi et al. (2013c, b, a, 2014).

The literature on game theory features many works on bilevel optimization problems naturally arising from the computation of Stackelberg equilibria in different settings. Two main variants of the Stackelberg paradigm are typically considered: one in which the followers can observe the action that the leader draws from its commitment and, therefore, the commitment is in pure strategies Stackelberg (2010), and one in which the followers cannot do that directly and, hence, the leader’s commitment can be in mixed strategies Conitzer and Sandholm (2006); von Stengel and Zamir (2010). While most of the works focus on the case with a single leader and a single follower (which leads to a proper bilevel optimization problem), some work has been done on the case with more than two players: see Conitzer and Korzhyk (2011); Basilico et al. (2016, 2017); Coniglio et al. (2017); Basilico et al. (2020); Marchesi et al. (2018); Castiglioni et al. (2019b); Coniglio et al. (2020a) for the single-leader multi-follower case, Smith et al. (2014); Lou and Vorobeychik (2015); Laszka et al. (2016); Lou et al. (2017); Gan et al. (2018) for the multi-leader single-follower case, or Castiglioni et al. (2019a); Pang and Fukushima (2005); Leyffer and Munson (2010); Kulkarni and Shanbhag (2014) for the multi-leader multi-follower case. Practical applications are often found in security games, which correspond to competitive situations where a defender (leader) has to allocate scarce resources to protect valuable targets from an attacker (follower) Paruchuri et al. (2008); Kiekintveld et al. (2009); An et al. (2011); Tamble (2011). Other practical applications are found in, among others, inspection games Avenhaus et al. (1991) and mechanism design Sandholm (2002). The works on the computation of a correlated equilibrium Celli et al. (2019) as well those on Bayesian persuasion Celli et al. (2020), where a leader affects the behavior of the follower(s) by a signal, also fall in this category.

Finally, there are deep connections between bilevel optimization and the algorithmic decision framework that drives branch and bound itself, and it is likely that the study of bilevel optimization problems may lead to improved methods for solving single-level optimization problems. For example, the problem of determining the disjunction whose imposition results in the largest bound improvement within a branch-and-bound framework and the problem of determining the maximum bound-improving inequality are themselves bilevel optimization problems Mahajan (2009); Mahajan and Ralphs (2010); Coniglio and Tieves (2015). The same applies in $n$ -ary branching when one looks for a branching decision leading to the smallest possible number of child nodes Lodi et al. (2009); Segundo et al. (2019).

Multistage problems and, in particular, two-stage stochastic optimization problems with recourse, arise in an equally wide array of application areas, including scheduling, forestry, pollution control, telecommunication and finance. Grass and Fischer (2016) surveys literature, applications, and methods for solving disaster-management problems arising in the humanitarian context. Gupta et al. (2007) addresses network-design problems where, in the second stage, after one of a finite set of scenarios is realized, additional edges of the network can be bought, and provides constant-factor approximation algorithms. A number of works address the two-stage stochastic optimization with recourse version of classical combinatorial optimization problems: among others, Dhamdhere et al. (2005) considers the spanning-tree problem, Katriel et al. (2007) the matching problem, and Gendreau et al. (1996); Gørtz et al. (2012) the vehicle routing problem. For references to other areas of applicability, see the books Birge and Louveaux (2011); Kall and Mayer (2010) and, in particular, Wallace and Ziemba (2005).

The first recognizable formulations for bilevel optimization problems were introduced in the 1970s in Bracken and McGill (1973) and this is when the term was also first coined. Beginning in the early 1980s, these problems attracted increased interest. Vicente and Calamai (1994) provide a large bibliography of the early developments.

There is now a burgeoning literature on continuous bilevel linear optimization, but it is only in the past decade that work on the discrete case has been undertaken in earnest by multiple research groups. Moore and Bard (1990) was the first to introduce a framework for general integer bilevel linear optimization and to suggest a simple branch-and-bound algorithm. The same authors also proposed a more specialized algorithm for binary bilevel optimization problems in Bard and Moore (1992). Following these early works, the focus shifted primarily to various special cases, especially those in which the lower-level problem has the integrality property. Dempe (2001) considers a special case characterized by continuous upper-level variables and integer lower-level variables and uses a cutting plane approach to approximate the lower-level feasible region (a somewhat similar approach is adopted in Dempe and Kue (2017) for solving a bilinear mixed integer bilevel problem with integer second-level variables). Wen and Yang (1990) consider the opposite case, where the lower-level problem is a linear optimization problem and the upper-level problem is an integer optimization problem, using linear optimization duality to derive exact and heuristic solutions.

The publication of a general algorithm for pure integer problems in DeNegre and Ralphs (2009) (based on the groundwork laid in a later-published dissertation DeNegre (2011)) spurred renewed interest in developing general-purpose algorithms. The evolution of work is summarized in Table 1, which indicates the types of variables supported in both the first and second stages (C indicates continuous, B indicates binary, and G indicates general integer). The aforementioned network interdiction problem is a special case that continues to receive significant attention, since tractable duality conditions exist for the lower-level problem Wollmer (1964); McMasters and Mustin (1970); Ghare et al. (1971); Wood (1993); Cormican et al. (1998); Israeli and Wood (2002); Held and Woodruff (2005); Janjarassuk and Linderoth (2008).

Citation	Stage 1 Variable Types	Stage 2 Variable Types
Wen and Yang (1990)	B	C
Bard and Moore (1992)	B	B
Faísca et al. (2007)	B, C	B, C
Saharidis and Ierapetritou (2008)	B, C	B, C
Garcés et al. (2009)	B	C
DeNegre and Ralphs (2009), DeNegre (2011)	G	G
Köppe et al. (2010)	G or C	G
Baringo and Conejo (2012)	B, C	C
Xu and Wang (2014)	G	G, C
Zeng and An (2014)	G, C	G, C
Caramia and Mari (2015)	G	G
Caprara et al. (2016)	B	B
Hemmati and Smith (2016)	B, C	B, C
Tahernejad et al. (2016)	G, C	G, C
Wang and Xu (2017)	G	G
Lozano and Smith (2017)	G	G, C
Fischetti et al. (2018), Fischetti et al. (2017)	G, C	G, C

Table 1: Evolution of algorithms for bilevel optimization

As for the case of multistage stochastic optimization, the two-stage linear stochastic optimization problem with recourse in which both the first- and second-stage problems contain only continuous variables has been well studied both theoretically and methodologically. Birge and Louveaux (2011); Kall and Mayer (2010) survey the related literature. The integer version of the problem was first considered in the early 1990s by Louveaux and van der Vlerk (1993) for the case of two-stage problems with simple integer recourse. Combining the methods developed for the linear version with the branch-and-bound procedure, Laporte and Louveaux (1993) proposed an algorithm known as the integer L-shaped method where the first-stage problem contains only binary variables. Due to the appealing structural properties of a (mixed) binary integer optimization problem, a substantial amount of literature since then has been considering the case of a two-stage stochastic problem with (mixed) binary variables in one or both stages Sen and Higle (2005); Sherali and Zhu (2006); Gade et al. (2012). The case of two-stage problems with a pure integer recourse has also been frequently visited, see Schultz et al. (1998); Kong et al. (2006). It must be noted that methods such as these, which are typically developed for special cases, often rely on the special structure of the second-stage problem, thus being often not applicable to the two-stage problem with mixed integer restrictions. Algorithms for stochastic optimization problems with integer recourse were proposed by Carøe and Tind (1998) and Sherali and Fraticelli (2002).

The defining feature of a multistage optimization problem is that the values of the first-stage variables (sometimes called upper-level or leader variables in the bilevel optimization literature) must be (conceptually) fixed without explicit knowledge of future events, the course of which can be influenced by the first-stage decision itself. Due to this influence, the perceived “value” of the first-stage decision must take into account the effect of this decision on the likelihood of occurrence of these future events.

More concretely, the first-stage DM’s overall objective is to minimize the sum of two terms, the first term representing the immediate cost of implementation of the first-stage solution and the second term representing the desirability of the first-stage decision in terms of its impact on the decisions taken at later stages. The general form of a two-stage mixed integer linear optimization problem is then

\min_{x\in\mathcal{P}_{1}\cap X}\left\{cx+\Xi(x)\right\},

(2SMILP)

where

\mathcal{P}_{1}=\left\{x\in\mathbb{R}^{n_{1}}\mid A^{1}x\geq b^{1}\right\}

is the first-stage feasible region, with $A^{1}\in\mathbb{Q}^{m_{1}\times n_{1}}$ and $b^{1}\in\mathbb{Q}^{m_{1}}$ defining the associated linear constraints, and $X=\mathbb{Z}^{r_{1}}_{+}\times\mathbb{Q}^{n_{1}-r_{1}}_{+}$ representing integrality, rationality, and non-negativity requirements on the first-stage variables, denoted by $x$ . Note that we require the continuous variables to take on rational values in order to ensure that the second-stage problem defined in (SS) below has an optimal value that is attainable when that value is finite. In practice, solvers always return such solutions, so this is a purely technical detail. The linear function $cx$ with $c\in\mathbb{Q}^{n_{1}}$ is the aforementioned term that reflects the immediate cost of implementing the first-stage solution. The function $\Xi:\mathbb{Q}^{n_{1}}\rightarrow\mathbb{Q}\cup\{\pm\infty\}$ is the risk function, which takes only rational input for technical reasons discussed later. $\Xi$ is the aforementioned term representing the first-stage DM’s evaluation of the impact of a given choice for the value of the first-stage variables on future decisions. Similar concepts of risk functions have been employed in many different application domains and will be briefly discussed in Section 3.3. To enable the development of a practical methodology for the solution of these problems, however, we now define the specific class of functions we consider.

Our canonical risk function is a generalization of the risk function traditionally used in defining 2SPRs. As usual, let us now introduce a random variable $U$ over an outcome space $\Omega$ representing the set of possible future scenarios that could be realized between the making of the first- and second-stage decisions. The values of this random variable will be input parameters to the so-called second-stage problem to be defined below.

As is common in the literature on 2SPRs, we assume that $U$ is discrete, i.e., that the outcome space $\Omega$ is finite, so that $\omega\in\Omega$ represents which of a finite number of explicitly enumerated scenarios is actually realized. In practice, this assumption is not very restrictive, as one can exploit any algorithm for the case in which $\Omega$ is assumed finite to solve cases where $\Omega$ is not (necessarily) finite by utilizing a technique for discretization, such as sample average approximation (SAA) Shapiro (2003). As $U$ is discrete in this work, we can associate with it a probability distribution defined by $p\in\mathbb{R}^{|\Omega|}$ such that $0\leq p_{\omega}\leq 1$ and $\sum_{\omega\in\Omega}p_{\omega}=1$ .

With this setup, the canonical risk function for $x\in\mathbb{Q}^{n_{1}}$ is

\Xi(x)=\mathbb{E}\left[\Xi_{\omega}(x)\right]=\sum_{\omega\in\Omega}p_{\omega}\Xi_{\omega}(x),

(RF)

where $\Xi_{\omega}(x)$ is the scenario risk function, defined as

\Xi_{\omega}(x)=\min\left\{d^{1}y^{\omega}\;\middle|\;y^{\omega}\in\operatorname{argmin}\{d^{2}y\mid y\in\mathcal{P}_{2}(b^{2}_{\omega}-A^{2}_{\omega}x)\cap Y\}\right\};

(2SRF)

the set

\displaystyle\mathcal{P}_{2}(\beta)

\displaystyle=\left\{y\in\mathbb{R}^{n_{2}}\mid G^{2}y\geq\beta\right\}

is one member of a family of polyhedra that is parametric w.r.t. the right-hand side vector $\beta\in\mathbb{R}^{m_{2}}$ and represents the second-stage feasibility conditions; and $Y=\mathbb{Z}^{r_{2}}_{+}\times\mathbb{Q}^{n_{2}-r_{2}}_{+}$ represents the second-stage integrality and non-negativity requirements. The deterministic input data defining $\Xi_{\omega}$ are $d^{1},d^{2}\in\mathbb{Q}^{n_{2}}$ and $G^{2}\in\mathbb{Q}^{m_{2}\times n_{2}}$ . $A^{2}_{\omega}\in\mathbb{Q}^{m_{2}\times n_{1}}$ and $b^{2}_{\omega}\in\mathbb{Q}^{m_{2}}$ represent the realized values of the random input parameters in scenario $\omega\in\Omega$ , i.e., $U(\omega)=(A^{2}_{\omega},b^{2}_{\omega})$ .

As indicated in (RF) and (2SRF), the inner optimization problem faced by the second-stage DM is parametric only in its right-hand side, which is determined jointly by the value $\omega$ of the random variable $U$ and by the chosen first-stage solution. It will be useful in what follows to define a family of second-stage optimization problems

\inf\left\{d^{2}y\;\middle|\;y\in\mathcal{P}_{2}(\beta)\cap Y\right\}

(SS)

that are parametric in the right-hand side $\beta\in\mathbb{R}^{m_{2}}$ (we use “ $\inf$ ” instead of “ $\min$ ” here because, for $\beta\not\in\mathbb{Q}^{m_{2}}$ , the minimum may not exist). By further defining

\displaystyle\beta^{\omega}(x)=b^{2}_{\omega}-A^{2}_{\omega}x

to be the parametric right-hand side that arises when the chosen first-stage solution is $x\in X$ and the realized scenario is $\omega\in\Omega$ , we can identify the member of the parametric family defined in (SS) in scenario $\omega\in\Omega$ when the chosen first-stage solution is $x\in X$ as that with feasible region $\mathcal{P}_{2}(\beta^{\omega}(x))\cap Y$ . Associated with each $x\in\mathbb{Q}^{n_{1}}$ and $\omega\in\Omega$ is the set of all alternative optimal solutions to the second-stage problem (SS) (we allow for $x\not\in X$ here because such solutions arise when solving certain relaxations), called the rational reaction set and denoted by

\mathcal{R}^{\omega}(x)=\operatorname{argmin}\{d^{2}y\mid y\in\mathcal{P}_{2}(b^{2}_{\omega}-A^{2}_{\omega}x)\cap Y\}.

For a given $x\in\mathbb{Q}^{n_{1}}$ , $\mathcal{R}^{\omega}(x)$ may be empty if $\mathcal{P}_{2}(b^{2}_{\omega}-A^{2}_{\omega}x)\cap Y$ is itself empty or if the second-stage problem is unbounded (we assume in Section 3.2 that this cannot happen, however).

When $|\mathcal{R}^{\omega}(x)|>1$ , the second-stage DM can, in principle, choose which alternative optimal solution to implement. We must therefore specify in the definition of the risk function a rule by which to choose one of the alternatives. According to our canonical risk function (RF) and the corresponding scenario risk function (2SRF), the rule is to choose, for each scenario $\omega\in\Omega$ , the alternative optimal solution that minimizes $d^{1}y^{\omega}$ , which corresponds to choosing the collection $\{y^{\omega}\}_{\omega\in\Omega}$ of solutions to the individual scenario subproblems that minimizes

\displaystyle d^{1}\left(\sum_{\omega\in\Omega}p_{\omega}y^{\omega}\right).

This is known as the optimistic or semi-cooperative case in the bilevel optimization literature, since it corresponds to choosing the alternative that is most beneficial to the first-stage DM. Throughout the paper, we consider this case unless otherwise specified. In Section 3.3, we discuss other forms of risk function.

Because of the subtleties introduced above, there are a number of ways one could define the “feasible region” of (2SMILP). We define the feasible region for scenario $\omega$ (with respect to both first- and second-stage variables) as

\mathcal{F}^{\omega}=\{(x,y^{\omega})\in X\times Y\mid x\in\mathcal{P}_{1},y^{\omega}\in\mathcal{R}^{\omega}(x)\}

(FR)

and members of $\mathcal{F}^{\omega}$ as feasible solutions for scenario $\omega$ . Note that this definition of feasibility does not prevent having $(x,y^{\omega})\in\mathcal{F}^{\omega}$ but $d^{1}y^{\omega}>\Xi_{\omega}(x)$ . This will not cause any serious difficulties, but is something to keep in mind.

We can similarly define the feasible region with respect to just the first-stage variables as

\mathcal{F}^{1}=\bigcap_{\omega\in\Omega}\operatorname{proj}_{x}(\mathcal{F}^{\omega}).

(FR-Proj)

Since $\Xi(x)=\infty$ for $x\in\mathbb{Q}^{n_{1}}$ if the feasible region $\mathcal{P}_{2}(\beta^{\omega}(x))\cap Y$ of the second-stage problem (SS) is empty for some $\omega\in\Omega$ , we have that, for $x\in\mathcal{P}_{1}\cap X$ , the following are equivalent:

x\in\mathcal{F}^{1}\Leftrightarrow x\in\bigcap_{\omega\in\Omega}\operatorname{proj}_{x}(\mathcal{F}^{\omega})\Leftrightarrow\mathcal{R}^{\omega}(x)\not=\emptyset\;\forall\omega\in\Omega\Leftrightarrow\Xi(x)<\infty.

Finally, it will be convenient to define $\mathcal{P}^{\omega}$ to be the feasible region of the relaxation of the deterministic two-stage problem under scenario $\omega\in\Omega$ that is obtained by dropping the optimality requirement for the second-stage variables $y^{\omega}$ , as well as any integrality restrictions. Formally, we have:

\mathcal{P}^{\omega}=\left\{(x,y^{\omega})\in\mathbb{R}^{n_{1}+n_{2}}_{+}\mid x\in\mathcal{P}_{1},y^{\omega}\in\mathcal{P}_{2}(b^{2}_{\omega}-A^{2}_{\omega}x)\right\}.

Later in Section 7, we will use these sets to define a relaxation for the entire problem that will be used as the basis for the development of a branch-and-cut algorithm.

We now note the following assumptions made in the remainder of the paper.

$\mathcal{P}^{\omega}$ is bounded for all $\omega\in\Omega$ .

This assumption, which is made for ease of presentation and can be relaxed, results in the boundedness of (2SMILP).

All first-stage variables with at least one non-zero coefficient in the second-stage problem (the so-called linking variables) are integer, i.e.,

L=\left\{i\in\{1,\dots,n_{1}\}\;\middle|\;a^{\omega}_{i}\neq 0\textrm{ for some }\omega\in\Omega\right\}\subseteq\left\{1,...,r_{1}\right\},

where $a^{\omega}_{i}$ represents the $i^{\textrm{th}}$ column of matrix $A^{2}_{\omega}$ .

These two assumptions together guarantee that an optimal solution exists whenever (2SMILP) is feasible Vicente et al. (1996). It also guarantees that the convex hull of $\mathcal{F}^{\omega}$ is a polyhedron, which is important for the algorithms we discuss later. Note that, due to the assumption of optimism, we can assume w.l.o.g. that all first-stage variables are linking variables by simply interpreting the non-linking variables as belonging to the second stage. While this may seem conceptually inconsistent with the intent of the original model, it is not difficult to see that the resulting model is mathematically equivalent, since these variables do not affect the second-stage problem and thus, the optimistic selection of values for those variables will be the same in either case.

Before closing this section, we remark that, in this article, we do not allow second-stage variables in the first-stage constraints. While this case can be handled with techniques similar to those we describe in the paper from an algorithmic perspective, it does require a more complicated notation which, for the sake of clarity, we prefer not to adopt. Detailed descriptions of algorithms for this more general case in the bilevel setting are provided in Tahernejad et al. (2016); Bolusani and Ralphs (2020).

For completeness, we present here an alternative form of (2SMILP) that is closer to the traditional form in which bilevel optimization problems are usually specified in the literature. Adopting the traditional notation, (2SMILP) can be alternatively written as

Note that, in the first stage, the minimization is carried out with respect to both $x$ and $y$ . This again specifies the optimistic case discussed earlier, since the above formulation requires that, for a given $x\in X$ , we select $\{y^{\omega}\}_{\omega\in\Omega}$ such that

\displaystyle d^{1}\left(\sum_{\omega\in\Omega}p_{\omega}y^{\omega}\right)

is minimized.

As already pointed out, the canonical risk function defined in (RF) assumes the optimistic case, since it encodes selection of the alternative optimal solution to the second-stage problem that is most beneficial to the first-stage DM. This is the case we focus on in the remainder of the paper. The pessimistic case, on the other hand, is easily modeled by defining the scenario risk function to be

\Xi_{\omega}(x)=\max\left\{d^{1}y^{\omega}\;\middle|\;y^{\omega}\in\operatorname{argmin}\{d^{2}y\mid y\in\mathcal{P}_{2}(\beta^{\omega}(x)\cap Y)\}\right\}.

We remark that, while the optimistic and pessimistic cases may coincide in some cases (e.g., when (SS) admits a single optimal solution for every $x$ ), this coincidence is rarely observed in practice and would be hard to detect in any case. In general, the pessimistic case is more difficult to solve, though the algorithms discussed in Section 7 can be modified to handle it.

Although we limit ourselves to problems with two stages in this paper, we briefly mention that more general risk functions can be defined by recursively defining risk functions at earlier stages in terms of later-stage risk functions. This is akin to the recursive definition of the cost-to-go functions that arise in stochastic dynamic programming (see Bertsekas (2017)). With such recursive definitions, it is possible to generalize much of the methodology described here in a relatively straightforward way, though the algorithm complexity grows exponentially with the addition of each stage. It is doubtful exact algorithms can be made practical in such cases.

Other forms of risk function have been used in the literature, especially in finance. In robust optimization, for example, one might consider a risk function of the form

\Xi(x)=\max_{\omega\in\Omega}\left\{\Xi_{\omega}(x)\right\},

which models the impact on the first-stage DM of the worst-case second-stage realization of the random variables. A popular alternative in finance applications that is slightly less conservative is the conditional value at risk, the expected value taken over the worst $\alpha$ -percentile of outcomes Rockafellar and Uryasev (2000); Uryasev (2000). While it is possible to incorporate such risk functions into the general algorithmic framework we present here, for the purposes of limiting the scope of the discussion, we focus herein only on risk functions in the canonical form (RF).

With $\Xi$ defined as in (RF), the problem (2SMILP) generalizes several well-known classes of optimization problems.

When $d^{1}=d^{2}$ and $|\Omega|=1$ , the two stages of (2SMILP) can be collapsed into a single stage and the problem reduces to a traditional mixed integer linear optimization problem (MILP). It is natural that algorithms for (2SMILP) rely heavily on solving sequences of related single-stage MILPs and we discuss parametric versions of this class in later sections. For continuity, we utilize the notation for the second-stage variables and input data throughout. The case of $r_{2}=0$ (in which there are no integer variables) further reduces to a standard linear optimization problem (LP).

When $|\Omega|=1$ and assuming that we may have $d^{1}\not=d^{2}$ , (2SMILP) takes the form of a mixed integer bilevel linear optimization problem (MIBLP). Dropping the scenario super/subscript for simplicity, this problem is more traditionally written as

\min_{x\in\mathcal{P}_{1}\cap X,y\in Y}\left\{cx+d^{1}y\,\Big{|}\,y\in\underbrace{\operatorname{argmin}\{d^{2}y\mid y\in\mathcal{P}_{2}(b^{2}-A^{2}x)\cap Y\}}_{\mathcal{R}(x)}\right\}.

(MIBLP)

Note that this formulation implicitly specifies the optimistic case, since if $\mathcal{R}(x)$ is not a singleton, it requires that among the alternative optima, the solution minimizing $d^{1}y$ be chosen. In this setting, the bilevel risk function can be written as

\displaystyle\Xi(x)

\displaystyle=\min\left\{d^{1}y\mid y\in\mathcal{R}(x)\right\}.

When $d^{1}=d^{2}$ , either the inner or the outer minimization in (2SRF) is redundant and (2SMILP) takes the form of a two-stage stochastic mixed integer linear optimization problem with recourse. In this case, for each scenario $\omega\in\Omega$ we can write the scenario risk function more simply as

\Xi_{\omega}(x)=\min\big{\{}d^{1}y^{\omega}\mid y^{\omega}\in\mathcal{P}_{2}(b^{2}_{\omega}-A^{2}_{\omega}x)\cap Y\big{\}}.

The second-stage solution $y^{\omega}$ corresponding to scenario $\omega\in\Omega$ is usually called the recourse decision. These problems involve a single DM optimizing a single objective function, but capable of controlling two sets of variables: the first-stage here-and-now variables $x$ and the second-stage wait-and-see variables $y^{\omega}$ , whose value is set after observing the realization of the random event $\omega$ .

For $d^{1}=-d^{2}$ (and typically, $|\Omega|=1$ ), (2SMILP) subsumes the case of zero-sum problems, which model competitive games in which two players have exactly opposing goals. An even more specially-structured subclass of zero-sum problems are interdiction problems, in which the first-stage variables are in one-to-one correspondence with those of the second stage and represent the ability of the first-stage DM to “interdict” (i.e., forcing to take value zero) individual variables of the second-stage DM. Formally, the effect of interdiction can be modeled using a variable upper-bound constraint

y\leq u(e-x)

in the second-stage problem, where $u\in\mathbb{R}^{n}$ is a vector of natural upper bounds on the vector of variables $y$ and $e$ is an $n$ -dimensional column vector of ones (here, $n=n_{1}=n_{2}$ ). Formally, the mixed integer interdiction problem is

\max_{x\in\mathcal{P}_{1}\cap X}\min_{y\in\mathcal{P}_{2}(x)\cap Y}d^{2}y

where (abusing notation slightly), we have

\displaystyle\mathcal{P}_{2}(x)

\displaystyle=\left\{y\in\mathbb{R}^{n_{2}}\mid G^{2}y\geq b^{2},y\leq u(e-x)\right\}.

Within the discrete optimization community, the framework typically used for assessing problem complexity is based primarily on the well-known theory of ${\mathsf{NP}}$ -completeness, which has evolved from the foundational work of Cook (1971), Karp (1975), and Garey and Johnson (1979). This has lead to the ubiquitous practice of classifying optimization problems as being either in the class ${\mathsf{P}}$ or the class ${\mathsf{NP}}$ -hard, the latter being an all-encompassing and amorphous class that includes essentially all optimization problems not known to be polynomially solvable. This categorization lacks the refinement necessary for consideration of classes such as those described in this article. It is indeed easy to show that multistage optimization problems are ${\mathsf{NP}}$ -hard in general Calamai and Vicente (1994); Jeroslow (1985); Ben-Ayed and Blair (1990); Hansen et al. (1992), but this merely tells us that these problems are not in ${\mathsf{P}}$ (assuming ${\mathsf{P}}\not={\mathsf{NP}}$ ), which is not surprising. What we would really like to know is for which complexity class (the decision versions of) these problems are complete.

In the presence of a hierarchical structure with $k$ levels (and when $\Omega$ is a singleton), the natural complexity class to consider is $\Sigma_{k}^{\mathsf{P}}$ , i.e., the $k^{\textrm{th}}$ level of the polynomial hierarchy. From an optimization perspective, this hierarchy (originally introduced in Stockmeyer (1976)) is a scheme for classifying multilevel decision problems beyond the usual classes ${\mathsf{P}}$ and ${\mathsf{NP}}$ . The class ${\mathsf{P}}$ (which contains all decision problems that can be solved in polynomial time) occupies the $0^{\textrm{th}}$ level, also known as $\Sigma_{0}^{\mathsf{P}}$ . The first level, $\Sigma_{1}^{\mathsf{P}}$ , is the class also known as ${\mathsf{NP}}$ , which consists of all problems for which there exists a certificate verifiable in polynomial time or, equivalently, all problems that can be solved in non-deterministic polynomial time. The $k^{\textrm{th}}$ level, $\Sigma_{k}^{\mathsf{P}}$ , contains all problems with certificates that can be be verified in polynomial time (equivalently, all problems solvable in non-deterministic polynomial time), assuming the existence of an oracle for solving problems in the class $\Sigma_{k-1}^{\mathsf{P}}$ . While it is clear that $\Sigma_{k}^{\mathsf{P}}\subseteq\Sigma_{\ell}^{\mathsf{P}}$ for any $k,\ell\in\mathbb{N}\cup\{0\}$ with $k\leq\ell$ , $\Sigma_{k}^{\mathsf{P}}\subset\Sigma_{\ell}^{\mathsf{P}}$ is conjectured to hold for all $k,\ell\in\mathbb{N}\cup\{0\}$ with $k<\ell$ (the well-known ${\mathsf{P}}\neq{\mathsf{NP}}$ conjecture is a special case). It is also known that $\Sigma_{k}^{\mathsf{P}}=\Sigma_{k+1}^{\mathsf{P}}$ would imply $\Sigma_{k}^{\mathsf{P}}=\Sigma_{\ell}^{\mathsf{P}}$ for all $\ell\geq k+1$ , which would cause the polynomial hierarchy to collapse to level $k$ (for $k=0$ , we would have ${\mathsf{P}}={\mathsf{NP}}$ ). The notions of completeness and hardness commonly used for ${\mathsf{NP}}$ translate directly to $\Sigma_{k}^{\mathsf{P}}$ . A proof that $k$ -level optimization problems with binary variables, linear constraints, and linear objective functions are hard for $\Sigma_{k}^{\mathsf{P}}$ is contained in Jeroslow (1985). Such result suffices to show that the multistage problems with $k$ stages treated in this paper are (in their optimization version) $\Sigma_{k}^{\mathsf{P}}$ -hard (and those with $k=2$ stages are $\Sigma_{2}^{\mathsf{P}}$ -hard). A compendium of $\Sigma_{2}^{\mathsf{P}}$ -complete/hard problems, somewhat similar in spirit to Garey and Johnson (1979), can be found in Schaefer and Umans (2002), with more recent updates available online.

For the case of two-stage stochastic optimization problems with recourse with linear constraints, linear objective functions, and mixed integer variables, the assumption of a finite outcome space $\Omega$ of either fixed or polynomially bounded size suffices to guarantee that the decision version of such a problem is ${\mathsf{NP}}$ -complete. Indeed, when $|\Omega|$ is considered a constant or is bounded by a polynomial in the total number of variables and constraints, one can directly introduce a block-structured reformulation of the problem with one block per scenario $\omega\in\Omega$ that contains the coefficients of the constraints that $y^{\omega}$ should satisfy (we discuss such a reformulation in Section 6). As such reformulation is of polynomial size, solutions to the corresponding optimization problem can clearly be certified in polynomial time by checking that they satisfy all the polynomially-many constraints featured in the formulation, which, in turn, implies that the problem belongs to ${\mathsf{NP}}$ . When the outcome space $\Omega$ is continuous, the problem becomes $\#{\mathsf{P}}$ -hard in general Dyer and Stougie (2006); Hanasusanto et al. (2016). While a single sample average approximation problem with a finite or polynomially-bounded number of samples can be used to approximate a continuous problem by solving a single discrete optimization problem of polynomial size, Hanasusanto et al. (2016) shows that even finding an approximate solution using the SAA method is $\#{\mathsf{P}}$ -hard. New results on the complexity of 2SPRs featuring a double-exponential algorithm can be found in Klein (2019).

Virtually all algorithms for the exact solution of optimization problems produce a proof of optimality that depends on the construction of a solution to a strong dual. Although the duality theory for MILPs is not widely known, the most effective algorithms for solving MILPs (which are variants of the well-known branch-and-bound algorithm) do produce a solution to a certain dual problem. A natural approach to solving (2SMILP) is therefore to embed the production of the “dual proof” of optimality of the second-stage problem (SS) into the formulation of the first-stage problem, reducing the original two-stage problem to a traditional single-stage optimization problem.

The reformulations and algorithmic approaches that we present in Sections 6 and 7 all use some variant of this strategy. In particular, the algorithms we describe are based on iteratively strengthening an initial relaxation in a fashion reminiscent of many iterative optimization algorithms. The strengthening operation essentially consists of the dynamic construction of both a proof of optimality of the second-stage problem and of corresponding first- and second-stage solutions.

In the remainder of the section, we introduce the central concepts of a duality theory for mixed integer linear optimization problems (and more general discrete optimization problems), emphasizing its connection to solution methods for (2SMILP). This introduction is necessarily brief and we refer the reader to Güzelsoy and Ralphs (2007); Hassanzadeh and Ralphs (2014b, a) for more details specific to the treatment here and to Wolsey (1981b); Williams (1996) for earlier foundational work on IP duality. Although the “dual problem” is usually a fixed (non-parametric) optimization problem associated with a fixed (non-parametric) “primal problem,” the typical concepts of duality employed in constructing dual proofs of optimality and in designing solution algorithms inherently involve parametric families of optimization problems. This makes the tools offered by this theory particularly suitable for employment in this setting. To preserve the connection with the material already introduced, we consider the family of MILPs parameterized on the right-hand side $\beta\in\mathbb{R}^{m_{2}}$ that was introduced earlier as (SS) and use the same notation. We reproduce it here for convenience:

\inf\left\{d^{2}y\;\middle|\;y\in\mathcal{P}_{2}(\beta)\cap Y\right\},

(SS)

where

\displaystyle\mathcal{P}_{2}(\beta)

\displaystyle=\left\{y\in\mathbb{R}^{n_{2}}\mid G^{2}y\geq\beta\right\}

and $\beta\in\mathbb{R}^{m_{2}}$ is the input parameter. When we want to refer to a (fixed) generic instance in this parametric family, the notation $b$ will be used to indicate a fixed (but arbitrary) right-hand side. We also refer to specific right-hand sides arising in the solution of (2SMILP) using the notation defined earlier.

Among possible notions of duality, the one most relevant to the development of optimization algorithms is one that also has an intuitive interpretation in terms of familiar economic concepts. This theory rests fundamentally on an understanding of the so-called value function, which we introduce below. The value function of an MILP has been studied by a number of authors and a great deal is known about its structure and properties. Early work on the value function includes Blair and Jeroslow (1977, 1979, 1984); Blair (1995), while the material here is based on the work in Güzelsoy and Ralphs (2007); Güzelsoy (2009); Hassanzadeh and Ralphs (2014b, a).

As a starting point, consider an instance of (SS) with fixed right-hand side $b$ and let us interpret the values of the variables as specifying a numerical “level of engagement” in certain activities in an economic market. Further, let us interpret the constraints as corresponding to limitations imposed on these activities due to available levels of certain scarce resources (it is most natural to think of “ $\leq$ ” constraints in this interpretation). In each row $j$ of the constraint matrix, the coefficient $G^{2}_{ij}$ associated with activity (variable) $i$ can then be thought of as representing the rate at which resource $j$ is consumed by engagement in activity $i$ . In this interpretation, the optimal primal solution then specifies the level of each activity in which one should engage in order to maximize profits (it is most natural here to think in terms of maximization), given the fixed level of resources $b$ .

Assuming that additional resources were available, how much should one be willing to pay? The intuitive answer is that one should be willing to pay at most the marginal amount by which profits would increase if more of a particular resource were made available. Mathematically, this information can be extracted from the value function $\phi:\mathbb{R}^{m_{2}}\rightarrow\mathbb{R}\cup\{\pm\infty\}$ associated with (SS), defined by

\phi(\beta)=\inf_{y\in\mathcal{P}_{2}(\beta)\cap Y}d^{2}y,

(2SVF)

for $\beta\in\mathbb{R}^{m_{2}}$ . Since this function returns the optimal profit for any given basket of resources, its gradient at $b$ (assuming $\phi$ is differentiable at $b$ ) tells us what the marginal change in profit would be if the level of resources available changed in some particular direction. Thus, the gradient specifies a “price” on that basket of additional resources.

The reader familiar with the theory of duality for linear optimization problems should recognize that the solution to the usual dual problem associated with an LP of the form (SS) (i.e., assuming $r_{2}=0$ ) provides exactly this same information. In fact, we describe below that the set of optimal solutions to the LP dual are precisely the subgradients of its associated value function. This dual solution can hence be interpreted as a linear price on the resources and is sometimes referred to as a vector of “dual prices.” The optimal dual prices allow us to easily determine whether it will be profitable to enter into a particular activity $i$ by comparing the profit $d^{2}_{i}$ obtained by entering into that activity to the cost $uG^{2}_{i}$ of the required resources, where $u$ is a given vector of dual prices and $G_{i}^{2}$ is the $i$ -th column of $G^{2}$ . The difference $d^{2}_{i}-uG^{2}_{i}$ between the profit and the cost is the reduced profit/cost in linear optimization. It is easily proven that the reduced profit of each activity entered into at a non-zero level (i.e., reduced profits of the variables with non-zero value) must be non-negative (again, in the case of maximization) and duality provides an intuitive economic interpretation of this result.

Although the construction of the full value function is challenging even in the simplest case of a linear optimization problem, approximations to the value function in the local area around $b$ can still be used for sensitivity analysis and in optimality conditions. The general dual problem we describe next formalizes this idea by formulating the problem of constructing a function that bounds the value function from below everywhere but yields a strong approximation near a fixed right-hand side $b$ . Such a so-called “dual function” can yield approximate “prices” and its iterative construction can also be used in a more technical way to guide the evolution of an algorithm by providing gradient information helpful in finding the optimal solution, as well as providing a proof of its optimality.

The above discussion leads to the natural concept of a dual (price) function from which we can derive a general notion of a dual problem.

A dual function $F:\mathbb{R}^{m_{2}}\rightarrow\mathbb{R}$ is a function that satisfies $F(\beta)\leq\phi(\beta)$ for all $\beta\in\mathbb{R}^{m_{2}}$ . We call such a dual function strong at $b\in\mathbb{R}^{m_{2}}$ if $F(b)=\phi(b)$ .

Dual functions are naturally associated with relaxations of the original problem, as the value function of any relaxation yields a feasible dual function. In particular, the value function of the well-known LP relaxation is the best convex under-estimator of the value function.

Also of interest are functions that bound the value function from above, which we refer to as primal functions.

A primal function $H:\mathbb{R}^{m_{2}}\rightarrow\mathbb{R}$ is a function that satisfies $H(\beta)\geq\phi(\beta)$ for all $\beta\in\mathbb{R}^{m_{2}}$ . We call such a primal function strong at $b$ if $H(b)=\phi(b)$ .

In contrast to dual functions, primal functions are naturally associated with restrictions of the original problem and the value function of any such restriction yields a valid primal function.

It is immediately evident that a pair of primal and dual functions yields optimality conditions. If we have a primal function $H^{*}$ and a dual function $F^{*}$ such that $F^{*}(b)=\gamma=H^{*}(b)$ for some $b\in\mathbb{R}^{m_{2}}$ , then we must also have $\phi(b)=\gamma$ . Proofs of optimality of this nature are produced by many optimization algorithms.

The concepts we have discussed so far further lead us to the definition of a generalized dual problem, originally introduced in Güzelsoy and Ralphs (2007), for an instance of (SS) with right-hand side $b\in\mathbb{R}^{m_{2}}$ . This problem simply calls for the construction of a dual function that is strong for a particular fixed right-hand side $b\in\mathbb{R}^{m_{2}}$ by determining

\max_{F\in\Upsilon^{m_{2}}}\{F(b):F(\beta)\leq\phi(\beta),\ \forall\beta\in\mathbb{R}^{m_{2}}\},

(MILPD)

where $\Upsilon^{m_{2}}\subseteq\{f\;|\;f:\mathbb{R}^{m_{2}}\rightarrow\mathbb{R}\}$ . Here, $\Upsilon^{m_{2}}$ can be taken to be a specific class of functions, such as linear or subadditive, to obtain specialized dual problems for particular classes of optimization problems. It is clear that (MILPD) always has a solution $F^{*}$ that is strong, provided that the value function is real-valued everywhere (and hence belongs to $\Upsilon^{m_{2}}$ , however it is defined), since $\phi$ itself is a solution whenever it is finite everywhere.¹¹1When the value function is not real-valued everywhere, we have to show that there is a real-valued function that coincides with the value function when it is real-valued and is itself real-valued everywhere else, but is still a feasible dual function (see Wolsey (1981a)).

Although it may not be obvious, this notion of a dual problem naturally generalizes existing notions for particular problem classes. For example, consider again a parametric family of LPs defined as in (SS) (i.e., assuming $r_{2}=0$ ). We show informally that the usual LP dual problem with respect to a fixed instance with right-hand side $b$ can be derived by taking $\Upsilon^{m_{2}}$ to be the set of all non-decreasing linear functions in (MILPD) and simplifying the resulting formulation. First, let a non-decreasing linear function $F:\mathbb{R}^{m_{2}}\rightarrow\mathbb{R}$ be given. Then, $\exists u\in\mathbb{R}^{m_{2}}_{+}$ such that $F(\beta)=u\beta$ for all $\beta\in\mathbb{R}^{m_{2}}$ . It follows that

\displaystyle F(\beta)=u\beta\leq uG^{2}y=\sum_{j=1}^{n_{2}}uG^{2}_{j}y_{j}\;\forall\beta\in\mathbb{R}^{m_{2}}.

From the above, it then follows that, for any $\beta\in\mathbb{R}^{n_{2}}$ , we have

	$\displaystyle uG^{2}_{j}\leq d^{2}_{j}\;\forall j\in\{1,\dots,m_{2}\}$	$\displaystyle\Rightarrow u\beta\leq uG^{2}y\leq d^{2}y\;\forall y\in\mathcal{P}_{2}(\beta)\cap Y$
		$\displaystyle\Rightarrow u\beta\leq\min_{y\in\mathcal{P}_{2}(\beta)\cap Y}d^{2}y$
		$\displaystyle\Rightarrow u\beta\leq\phi(\beta)$
		$\displaystyle\Rightarrow F(\beta)\leq\phi(\beta).$

The conditions on the left-hand side above are exactly the feasibility conditions for the usual LP dual and the final condition on the right is the feasibility condition for (MILPD). Hence, the usual dual feasibility conditions ensure that $u$ defines a linear function that bounds the value function from below and is a dual function in the sense we have defined. The fact that the epigraph of $\phi$ is a convex polyhedral cone in this case (it is the max of linear functions associated with extreme points of the feasible region of the dual problem) is enough to show that the dual (MILPD) is strong in the LP case, even when we take $\Upsilon^{m_{2}}$ to be the set of (non-decreasing) linear functions. Furthermore, it is easy to show that any subgradient of $\phi$ at $b$ is an optimal solution (and in fact, the set of all dual feasible solutions is precisely the subdifferential of the value function at the origin).

The concepts just discussed can be easily seen in the following small example (note that this example is equality-constrained, in which case most of the above derivation carries through unchanged, but the dual function no longer needs to be non-decreasing).

Example 1

	$\displaystyle\min$	$\displaystyle\ 6y_{1}+7y_{2}+5y_{3}$
	$\displaystyle\operatorname{s.\!t.}$	$\displaystyle\ 2y_{1}-7y_{2}+y_{3}=b$
		$\displaystyle\ y_{1},y_{2},y_{3}\in\mathbb{R}_{+}.$

The solution to the dual of this LP is unique whenever $b$ is non-zero and can be easily obtained by considering the ratios $c_{j}/a_{j}$ of objective coefficient to constraint coefficient for $j=1,2,3$ , which determine which single primal variable will take a non-zero value in the optimal basic feasible solution. Depending on the sign of $b$ , we obtain one of two possible dual solutions:

u^{*}=\begin{cases}\displaystyle 6/2=3&\textrm{if }b>0\\ \displaystyle 7/(-7)=-1&\textrm{if }b<0.\end{cases}

Thus, the value function associated with this linear optimization problem is as shown in Figure 1. Note that, when $b=0$ , the dual solution is not unique and can take any value between $-1$ and $3$ . This set of solutions corresponds to the set of subgradients at the single point of non-differentiability of the value function.

Refer to caption — Figure 1: Value Function of the LP in Example 5.3.

This function has one of two gradients for all points that are differentiable, and these gradients are equal to one of the two dual solutions derived above. ∎

As we mentioned previously, solution methods for (2SMILP) inherently involve the explicit or implicit approximation of several functions, including the value function $\phi$ in (2SVF) and the risk function $\Xi$ in (RF), which ultimately derives its structure from $\phi$ . Here, we summarize the results described in the series of papers Güzelsoy and Ralphs (2007); Güzelsoy (2009); Hassanzadeh and Ralphs (2014b). Most importantly, the function is piecewise polyhedral, lower semi-continuous, subadditive, and has a discrete structure that is derivative of the structure of two related value functions which we now introduce.

Let $y_{C}$ , $d^{2}_{C}$ , and $G^{2}_{C}$ be the parts of each of the vectors/matrices describing the second-stage problem (SS) that are associated with the continuous variables and let $y_{I}$ , $d^{2}_{I}$ , and $G^{2}_{I}$ be likewise for the integer variables. The continuous restriction (CR) is the LP obtained by dropping the integer variables in the second-stage problem (or equivalently, setting them to zero). This problem has its own value function, defined as

\phi_{C}(\beta)=\min_{y_{C}\in\mathbb{R}^{n_{2}-r_{2}}_{+}}\{d^{2}_{C}y_{C}\mid G^{2}_{C}y_{C}\geq\beta\}.

(CRVF)

On the other hand, if we instead drop the continuous variables from the problem, we can then consider the integer restriction (IR), which has value function

\phi_{I}(\beta)=\min_{y_{I}\in\mathbb{Z}^{r_{2}}_{+}}\{d^{2}_{I}y_{I}\mid G^{2}_{I}y_{I}\geq\beta\}.

(IRVF)

To illustrate how these two functions combine to yield the structure of $\phi$ and to briefly summarize some of the important results from the study of this function carried out in the aforementioned papers, consider the following simple example of a two-stage stochastic mixed integer linear optimization problem with a single constraint. Note that, in this example, $d^{1}=d^{2}$ and we are again considering the equality-constrained case in order to make the example a bit more interesting.

Example 2

	$\displaystyle\min$	$\displaystyle\Psi(x_{1},x_{2})=\ -3x_{1}-4x_{2}+\mathbb{E}[\phi(b^{2}_{\omega}-2x_{1}-0.5x_{2})]$
	$\displaystyle\operatorname{s.\!t.}$	$\displaystyle\ x_{1}\leq 5,x_{2}\leq 5$
		$\displaystyle\ x_{1},x_{2}\in\mathbb{R}_{+},$

where

	$\displaystyle\phi(\beta)=\min$	$\displaystyle\ 6y_{1}+4y_{2}+3y_{3}+4y_{4}+5y_{5}+7y_{6}$
	$\displaystyle\operatorname{s.\!t.}$	$\displaystyle\ 2y_{1}+5y_{2}-2y_{3}-2y_{4}+5y_{5}+5y_{6}=\beta$
		$\displaystyle\ y_{1},y_{2},y_{3}\in\mathbb{Z}_{+},y_{4},y_{5},y_{6}\in\mathbb{R}_{+},$

with $\Omega=\{1,2\}$ , $b_{1}^{2}=6$ , and $b_{2}^{2}=12$ . Figure 2 shows the objective function $\Psi$ and the second-stage value function $\phi$ for this example.

Examining Figure 5.4, it appears that $\phi$ is comprised of a collection of translations of $\phi_{C}$ , each of which has a structure that is similar to the value function of the LP in Example 5.3. At points where $\phi$ is differentiable, the gradient always corresponds to one of the two solutions to the dual of the continuous restriction (precisely as in Example 5.3, dual solutions are the ratios of objective function coefficients to constraint coefficients for the continuous variables), which are in turn also the gradients of $\phi_{C}$ . The so-called points of strict local convexity are the points of non-differentiability that are the extreme points of the epigraphs of the translations of $\phi_{C}$ and are determined by the solutions to the integer restriction. In particular, they correspond to points at which $\phi_{I}$ and $\phi$ are coincident. For instance, observe that, in the example, $\phi_{I}(5)=\phi(5)=4$ . Finally, we can observe in Figure 5.4 how the structure of $\phi$ translates into that of $\Psi$ . ∎

Although the case illustrated in the example is of a single constraint, these properties can be made rigorous and do generalize to higher dimension with roughly the same intuition. The general principle is that the value function of an MILP is the minimum of translations of the value function of the continuous restriction $\phi_{C}$ , where the points of translation (the aforementioned points of strict local convexity) are determined by the value function of the integer restriction $\phi_{I}$ .

Hassanzadeh and Ralphs (2014b) Under the assumption that $\{\beta\in\mathbb{R}^{m_{2}}\mid\phi_{I}(\beta)<\infty\}$ is compact and $\operatorname{epi}(\phi_{C})$ is pointed, there exists a finite set $\mathcal{S}\subseteq Y$ such that

\phi(\beta)=\min_{y_{I}\in\mathcal{S}}\{d_{I}^{2}y_{I}+\phi_{C}(\beta-G_{I}^{2}y_{I})\}.

Under the assumptions of the theorem, this result provides a finite description of $\phi$ .

Constructing functions that bound the value function of an MILP is an important part of solution methods for both traditional single-stage optimization problems and their multistage counterparts. Functions bounding the value function from below are exactly the dual functions defined earlier and arise from relaxations of the original problem (such as the LP relaxation, for instance). They are naturally obtained as by-products of common solution algorithms, such as branch and bound, which itself works by iteratively strengthening a given relaxation and produces a dual proof of optimality.

Functions that bound the value function from above are the primal functions defined earlier and can be obtained by considering the value function of a restriction of the original problem. While it is a little less obvious how to obtain such functions in general (solution algorithms generally do not produce practically useful primal functions), they can be obtained by taking the minimum over the value functions of restrictions obtained by fixing the values of integer variables, as we describe below.

Both primal and dual functions can be iteratively improved by producing new such functions and combining them with existing ones by taking the maximum over several bounding functions in the dual case or the minimum over several bounding functions in the primal case. When such functions are iteratively constructed to be strong at different right-hand side values, such as when solving a sequence of instances with different right-hand sides, such a technique can yield an approximation with good fidelity across a larger part of the domain than any singly constructed function could—this is, in fact, the principle implicitly behind the algorithms we describe in Section 7.1.

Dual functions can be obtained from most practical solution algorithms for solving the MILP associated with (2SVF) with input $\beta=b^{2}_{\omega}-A_{\omega}^{2}x$ , i.e., for computing $\phi(b^{2}_{\omega}-A^{2}_{\omega}x)$ . This is because their existence is (at least implicitly) the very source of the proof of optimality produced by such algorithms. To illustrate, we show how a dual function can be obtained as the by-product of the branch-and-bound algorithm.

Branch and bound is an algorithm that searches the feasible region by partitioning it and then recursively solving the resulting subproblems. Implemented naively, this results in an inefficient complete enumeration, but this potential inefficiency is avoided by utilizing upper and lower bounds computed for each subproblem to intelligently “prune” the search. The recursive partitioning process can be envisioned as a process of searching a rooted tree, each of whose nodes corresponds to a subproblem. Although it is not usually described this way, the branch-and-bound algorithm can be interpreted as constructing a function feasible to (MILPD), thus producing not only a solution but also a dual proof of its optimality.

To understand this interpretation, suppose we evaluate $\phi(b)$ for $b\in\mathbb{R}^{m_{2}}$ by solving the associated MILP using a standard branch-and-bound algorithm with branching done on elementary (a.k.a. variable) disjunctions of type $y_{j}\leq\pi_{0}\vee y_{j}\geq\pi_{0}+1$ for some $j\in\{1,\dots,r_{2}\}$ and $\pi_{0}\in\mathbb{Z}$ . Because the individual subproblems in the branch-and-bound tree differ only by the bounds on the variables, it will be convenient to assume that all integer variables have finite initial lower and upper bounds denoted by the vectors $u,l\in\mathbb{Z}^{r_{2}}$ (such bounds exist by Assumption 1, even if they are not part of the formulation). In this case, we have that

\mathcal{P}_{2}(\beta)=\{y\in\mathbb{R}^{n_{2}}\mid G^{2}y\geq\beta,y_{I}\geq l,-y_{I}\geq-u\}.

The solution of (SS) by branch and bound for right-hand side $b$ yields a branch-and-bound tree whose leaves, contained in the set $\mathcal{T}$ , are each associated with the subproblem

\min\{d^{2}y\mid y\in\mathcal{P}_{2}^{t}(b)\cap Y\},

where

\mathcal{P}_{2}^{t}(\beta)=\{y\in\mathbb{R}^{n_{2}}\mid G^{2}y\geq\beta,y_{I}\geq l^{t},-y_{I}\geq-u^{t}\}

is the parametric form of the polytope containing the feasible points associated with subproblem $t\in\mathcal{T}$ , with $l^{t},\ u^{t}\in\mathbb{Z}_{+}^{r_{2}}$ being the modified bounds on the integer variables imposed by branching.

Assuming that no pruning took place, the validity of the overall method comes from the fact that valid methods of branching ensure that

\bigcup_{t\in\mathcal{T}}\mathcal{P}_{2}^{t}(b)\cap Y=\mathcal{P}_{2}(b)\cap Y,

so that the feasible regions of the leaf nodes constitute a partition of the original feasible region. This partition can be thought of as a single logical disjunction that serves to strengthen the original LP relaxation. The proof of optimality that branch and bound produces derives from global lower and upper bounds derived from local bounds associated with each node $t\in\mathcal{T}$ . We denote by $L^{t}$ and $U^{t}$ , respectively, the lower and upper bounds on the optimal solution value of the subproblem, where

\displaystyle L^{t}

\displaystyle=\phi_{\rm LP}^{t}(b),\quad U^{t}=d^{2}y^{t},

in which $\phi^{t}_{LP}$ is as defined in (NVF) below and $y^{t}\in Y$ is the best solution known for subproblem $t\in\mathcal{T}$ ( $U^{t}=\infty$ if no solution is known). Assuming (SS) is solved to optimality and $y^{*}\in Y$ is an optimal solution, we must have

\displaystyle L:=\min_{t\in\mathcal{T}}L^{t}=d^{2}y^{*}=\min_{t\in\mathcal{T}}U^{t}=:U,

where $L$ and $U$ are the global lower and upper bounds.

From the information encoded in the branch-and-bound tree, the overall dual function can be constructed by deriving a parametric form of the lower bound, combining dual functions for the individual subproblems in set $\mathcal{T}$ . For this purpose, we define the value function

\phi_{\rm LP}(\beta,\lambda,\nu)=\min\{d^{2}y\mid y\in\mathcal{P}_{2}(\beta),\lambda\leq y_{I}\leq\nu,y_{C}\geq 0\}

(PNVF)

of a generic LP relaxation, which captures the bounds on the integer variables as also being parametric. Using (PNVF), the value function of the LP relaxation associated with a particular node $t\in\mathcal{T}$ (only parametric in the original right-hand side) can be obtained as

\phi_{\rm LP}^{t}(\beta)=\phi_{\rm LP}(\beta,l^{t},u^{t}).

(NVF)

For all $t\in\mathcal{T}$ such that $\phi_{\rm LP}^{t}(b)<\infty$ , let $({v^{t}},\underline{v}^{t},\bar{v}^{t})$ be an optimal solution to the dual of the LP relaxation at node $t\in\mathcal{T}$ , where $v^{t}$ , $\underline{v}^{t}$ , and $\bar{v}^{t}$ are, respectively, the dual variables associated with the original inequality constraints, the lower bounds on integer variables, and the upper bounds on integer variables. Then, by LP duality we have that

\phi_{\rm LP}^{t}(b)=v^{t}b+\underline{v}^{t}l^{t}-\bar{v}^{t}u^{t}.

For each node $t\in\mathcal{T}$ for which $\phi_{\rm LP}^{t}(b)=\infty$ (the associated subproblem is infeasible), we instead let $({v^{t}},\underline{v}^{t},\bar{v}^{t})$ be a dual feasible solution that provides a finite bound exceeding $U$ (such can be found by, e.g., adding some multiple of the dual ray that proves infeasibility to the feasible dual solution found in the final iteration of the simplex algorithm).

Finally, from the above we have that

\underline{\phi}_{\rm LP}^{t}(\beta)=v^{t}\beta+\underline{v}^{t}l^{t}-\bar{v}^{t}u^{t}

(NDF)

is a dual function w.r.t. the LP relaxation at node $t$ that is strong at $b$ . Finally, we can combine these individual dual functions together to obtain

\underline{\phi}^{\mathcal{T}}(\beta)=\min_{t\in\mathcal{T}}\underline{\phi}_{\rm LP}^{t}(\beta)=\min_{t\in\mathcal{T}}\{v^{t}\beta+\underline{v}^{t}l^{t}-\bar{v}^{t}u^{t}\},

(BB-DF)

a dual function for the second-stage problem yielded by the tree $\mathcal{T}$ and which is also strong at the right-hand side $b$ , i.e., $\underline{\phi}^{\mathcal{T}}(b)=\phi(b)$ .

In principle, a stronger dual function can be obtained by replacing the single linear dual function (which is strong at $b$ for the LP relaxation) associated with each subproblem above by its full value function $\phi_{\rm LP}^{t}$ to obtain

\underline{\phi}^{\mathcal{T}}_{*}(\beta)=\min_{t\in\mathcal{T}}\phi_{\rm LP}^{t}(\beta).

(BB-DF-BIS)

In practice, constructing a complete description of $\underline{\phi}^{\mathcal{T}}_{*}$ is not practical (even evaluating it for a given $\beta$ requires the solution of $|\mathcal{T}|$ LPs). We can instead construct a function that bounds it from below (and hence is also a dual function for the original problem) by exploiting the entire collection of dual solutions arising during the solution process. For example, let

\displaystyle\underline{\phi}_{\rm LP}(\beta,\lambda,\nu)=\max_{t\in\mathcal{T}}\{v^{t}\beta+\underline{v}^{t}\lambda-\bar{v}^{t}\nu\},

which consists of an approximation of the full value function $\phi_{\rm LP}$ using the optimal dual solutions at each leaf node. Replacing $\underline{\phi}_{\rm LP}^{t}(\beta)$ with $\underline{\phi}_{\rm LP}(\beta,l^{t},u^{t})$ in (BB-DF) results in a potentially stronger but still practical dual function. Of course, it is also possible to add dual solutions found at non-leaf nodes, as well as other suboptimal dual solutions arising during the solution process, but there is an obvious trade-off between strength and tractability. More details on this methodology are contained in Güzelsoy and Ralphs (2007); Hassanzadeh and Ralphs (2014a).

In iterative algorithms such as those we introduce in Section 7, the single dual function (BB-DF) we get by evaluating the value function for one right-hand side can be iteratively augmented by taking the maximum over a sequence of similarly derived dual functions. Taking this basic idea a step further, Ralphs and Güzelsoy (2005, 2006); Güzelsoy (2009) developed methods of warm starting the solution process of an MILP. Such methods may serve to enhance tractability, though this is still an active area of research. When evaluating $\phi$ repeatedly for different values in its domain, we do not need to solve each instance from scratch—it is possible to use the tree resulting from a previous solve as a starting point and simply further refine it to obtain a dual function that remains strong for the previous right-hand side of interest and is made to be strong for a new right-hand side.

Hassanzadeh and Ralphs (2014a) shows how to use this iterative-refinement approach to construct a lower approximation of the value function of an MILP in the context of a Benders-like algorithm for two-stage stochastic optimization within a single branch-and-bound tree. In fact, with enough sampling this method can be used to construct a single tree whose associated dual function is strong at every right-hand side (provided the set of feasible right-hand sides is finite). refinement approach in approximating the value function of Example 5.4.

Example 3

Consider the value function of Example 5.4, reported in Figure 2. The sequence of evaluations of the value function in this example are the ones arising from first-stage solutions generated by solving the master problem in a generalized Benders algorithm, such as the one described in Section 7.1. Here, we only illustrate the way in which the dual function is iteratively refined in each step.

We first evaluate $\phi(3.5)$ by branch and bound. Figure 3 shows both the tree obtained (far left) and the value function itself (in blue). The dual function arises from the solution to the dual of the LP relaxation in each of the nodes in the branch-and-bound tree. We exhibit the values of the dual solution for each node in the tree in Table 2. Explicit upper and lower bounds were added with upper bounds initially taking on a large value $M$ , representing infinity. Note that the dual values associated with the bound constraints are actually nothing more than the reduced costs associated with the variables.

Table 2: Dual solutions for each node in the branch-and-bound tree

$t$	$v^{t}$	$\underline{v}^{t}$						$\bar{v}^{t}$
$0$	$0.8$	$4.4$	$0.0$	$4.6$	$5.6$	$1.0$	$3.0$	$0.0$	$0.0$	$0.0$	$0.0$	$0.0$	$0.0$
$1$	$1.0$	$4.0$	$0.0$	$5.0$	$6.0$	$0.0$	$2.0$	$0.0$	$-1.0$	$0.0$	$0.0$	$0.0$	$0.0$
$2$	$-1.5$	$9.0$	$11.5$	$0.0$	$1.0$	$12.5$	$14.5$	$0.0$	$0.0$	$0.0$	$0.0$	$0.0$	$0.0$

The dual function associated with this first branch-and-bound tree is the minimum of the two linear functions shown in Figure 3 in green and labeled as “Node 1” and “Node 2.” Formally, this dual function is

\underline{\phi}^{\mathcal{T}_{1}}=\min\{\underline{\phi}_{\rm LP}^{1},\underline{\phi}_{\rm LP}^{2}\},

where the nodal dual functions for the three nodes are

	$\displaystyle\underline{\phi}_{\rm LP}^{0}(\beta)$	$\displaystyle=0.8\beta$
	$\displaystyle\underline{\phi}_{\rm LP}^{1}(\beta)$	$\displaystyle=\beta$
	$\displaystyle\underline{\phi}_{\rm LP}^{2}(\beta)$	$\displaystyle=-1.5\beta+11.5.$

In other words, we have $v^{1}_{0}=1$ (the value of the dual variable associated with the single equality constraint in Node 1), while $\underline{v}^{1}l^{1}-\bar{v}^{1}u^{1}=0$ (this is the contribution from the dual value corresponding to the bound constraints, which we take to be a constant here, as in (NDF)). Similarly, $v^{2}_{0}=-1.5$ and $\underline{v}^{1}l^{1}-\bar{v}^{1}u^{1}=11.5$ . The dual function $\phi_{\rm LP}^{\mathcal{T}_{1}}$ is strong in the interval $[0,5]$ , but yields a weaker lower bound outside this interval. If we subsequently evaluate the right-hand side $9.5$ , we see that

\underline{\phi}_{\rm LP}^{\mathcal{T}_{1}}(9.5)=\min\{9.5,-2.75\}=-2.75\not=\phi(9.5)=8.5.

To obtain a strong dual function for the new right-hand side, we identify that node 2 is the node whose bound needs to be improved by further refining the tree by branching (this is the linear function yielding the bound in this part of the domain). By further partitioning the subproblem associated with node 2, we obtain the tree pictured to the right of the first tree in Figure 3. We obtain the dual function

\underline{\phi}_{\rm LP}^{\mathcal{T}_{2}}=\min\{\underline{\phi}_{\rm LP}^{1},\underline{\phi}_{\rm LP}^{3},\underline{\phi}_{\rm LP}^{4}\},

which is strong at the right-hand side $9.5$ .

Note that this new function is no longer strong at the initial right-hand side of 3.5. To ensure that this single dual function remains strong for all previously evaluated right-hand sides, we must take the max over the collection of dual functions found at each iteration. This function is still obtained from the single tree, but we are effectively strengthening the leaf node dual functions by taking the max over all dual solutions arising on the path from the root subproblem (this is still a bound on the optimal solution value to the LP relaxation). In this case, we get the strengthened function

\min\{\max\{\underline{\phi}_{\rm LP}^{1},\underline{\phi}_{\rm LP}^{0}\},\max\{\underline{\phi}_{\rm LP}^{3},\underline{\phi}_{\rm LP}^{2},\underline{\phi}_{\rm LP}^{0}\},\max\{\underline{\phi}_{\rm LP}^{4},\underline{\phi}_{\rm LP}^{2},\underline{\phi}_{\rm LP}^{0}\}\}.

This can be seen as an approximation of $\phi^{\mathcal{T}}_{*}$ by replacing the full value function at each node with an approximation made of just the dual solutions arising on the path to the root node. ∎

Upper approximations of $\phi$ can be obtained by considering the value functions of the second-stage problem (SS) obtained by fixing variables. For example, consider the integer-fixing value function

\bar{\phi}_{\hat{y}}(\beta)=d_{I}^{2}\hat{y}_{I}+\phi_{C}(\beta-A_{\omega I}^{2}\hat{y}_{I})

(IFVF)

obtained by fixing the integer part $\hat{y}_{I}\in\mathbb{Z}^{r_{2}}$ to be equal to that from some previously found second-stage solution $\hat{y}\in Y$ , where $\phi_{C}$ is as defined in (CRVF). Then, we have $\bar{\phi}_{\hat{y}_{I}}(\beta)\geq\phi(\beta)$ for all $\beta\in\mathbb{R}^{m_{2}}$ . If $\hat{y}$ is an optimal solution to the second-stage problem with respect to a given first-stage solution $x\in\mathcal{P}_{1}\cap X$ and a given realized value of $\omega$ , then we have $\hat{y}\in\operatorname{argmin}_{y\in\mathcal{P}_{2}(b^{2}_{\omega}-A^{2}_{\omega}x)\cap Y}d^{2}y$ and $\bar{\phi}_{\hat{y}}$ is strong at $\beta=b^{2}_{\omega}-A^{2}_{\omega}x$ .

In a fashion similar to a cutting plane method, we can iteratively improve the global upper bounding function by taking the minimum of all bounding functions of the form (IFVF) found so far, i.e.,

\bar{\phi}(\beta)=\min_{y\in\mathcal{R}}\bar{\phi}_{y}(\beta),

where $\mathcal{R}$ is the set of all second-stage solutions that have been found when evaluating $\phi(\beta)$ for different values of $\beta$ .

A pictorial example of this type of upper bounding function is shown in Figure 4, where each of the labeled cones shown is the value function of a restriction of the original MILP. The upper bounding function is the minimum over all of these cones.

Figure 4: Upper bounding functions obtained at right-hand sides

\beta_{i},i=1,\dots,5

Because it is particularly relevant in the present context, we also now introduce a function known as the second-stage (optimistic) reaction function. This function is closely related to the risk function but its input is a second stage right-hand side, rather than a first-stage solution. Although this function, like the second-stage value function $\phi$ , takes a right-hand side $\beta$ as its input, it can nevertheless be used to evaluate a first-stage solution $x\in X$ in scenario $\omega\in\Omega$ by evaluating it with respect to $\beta_{\omega}(x)$ . The function is defined as

\rho(\beta)=\inf\big{\{}d^{1}y\mid y\in\operatorname{argmin}\{d^{2}y\mid y\in\mathcal{P}_{2}(\beta)\cap Y\}\big{\}},

(ReF)

for $\beta\in\mathbb{R}^{m_{2}}$ . Note that, although the evaluation of this function appears to require solving a bilevel optimization problem, its evaluation is actually equivalent to solving a lexicographic optimization problem, a somewhat easier computational task.

The importance of the function $\rho$ is to enable us to see that, for (2SMILP), the scenario risk functions $\Xi_{\omega}$ defined in (2SRF) are not in fact completely independent functions but, rather, are connected, since

\Xi_{\omega}(x)=\rho(\beta_{\omega}(x)).

Thus, these functions only differ from each other in the affine map $\beta_{\omega}(x)=b^{2}_{\omega}-A_{\omega}^{2}x$ that is applied to $x$ .

The structure of the functions $\rho$ and $\Xi$ derives from that of $\phi$ and can be understood through a somewhat more involved application of the same principles used to derive the function $\phi$ . Their structure, though combinatorially much more complex, is nevertheless also piecewise polyhedral. Approximations of $\Xi$ can be derived easily from approximation of $\rho$ in a straightforward way, since $\Xi=\mathbb{E}\left[\Xi_{\omega}\right]$ and the scenario risk functions are themselves defined in terms of $\rho$ , as discussed earlier. The approximation of $\rho$ is quite involved, but it can be obtained by methods that are natural generalizations of those used to approximate $\phi$ . The main challenge is that the evaluation of $\rho(\beta)$ for a particular value of $\beta$ itself reduces to the solution of a lexicographic optimization problem, which in turn requires knowledge of $\phi$ . We may approximate $\rho$ by repeatedly evaluating it, extracting primal and dual information from the solution algorithm as we do with $\phi$ , but this requires repeatedly evaluating $\phi$ , which is itself expensive.

In the algorithms we discuss in Section 7, we approach this difficulty by constructing a single approximation of $\phi$ in tandem with the approximation of $\rho$ . We need only ensure that the approximation of $\phi$ is guaranteed to be strong (i.e., equal to the value function) exactly in the region needed to properly evaluate $\rho$ . The result is an approximation of $\rho$ that is strong in the region of a given right-hand side and this is exactly what is needed for a Benders-type algorithm to converge. More details regarding the Benders-type algorithm are contained next in Section 7. Further details on the structure of and methods for approximating $\rho$ and $\Xi$ can be found in Bolusani and Ralphs (2020), which describes a Benders-type algorithm for solving (2SMILP).

To solidify the connection between the notion of duality described in this section and captured in the dual problem (MILPD), we end this section by formally stating both the weak and strong duality properties arising from this theory. These properties are a proper generalization of the well-known ones from linear optimization and can be used to derive optimality conditions that generalize those from LP duality. These are, in turn, the conditions that can be used to derive the reformulations presented in Section 6.

If $F\in\Upsilon^{m_{2}}$ is feasible for (MILPD) and $y\in\mathcal{P}_{2}(b)\cap Y$ , then $F(b)\leq d^{2}y$ .

Let $b\in\mathbb{R}^{m_{2}}$ be such that $\phi(b)<\infty$ . Then, there exists both $F\in\Upsilon^{m_{2}}$ that is feasible for (MILPD) and $y^{*}\in\mathcal{P}_{2}(b)\cap Y$ such that $F(b)=\phi(b)=d^{2}y^{*}$ .

The form of the dual (MILPD) makes these properties rather self-evident, but Theorem 5.3 nevertheless yields optimality conditions that are useful in practice. In particular, the dual functions arising from branch-and-bound algorithms that were described earlier in Section 5.5.1 are the strong dual functions that provide the optimality certificates for the solutions produced by such algorithms and are the basis on which the algorithms described in Section 7.1 are developed.

A crucial step in deriving solution algorithms is to consider some conceptual reformulations that underlie the problems under study, each of which suggests a particular algorithmic strategy. These formulations are heavily based on the duality theory and methodology in the previous section, as we better clarify in the following. In all cases, the goal is to derive, from some initial problem description, a formulation that is, at least in principle, a single-stage mathematical optimization problem that can be tackled with (possibly generalized versions of) the standard algorithmic approaches used for solving mathematical optimization problems. In this case, as in the solution of single-stage MILPs, the main tools are cutting plane methods (branch and cut) and decomposition methods (Benders’ algorithm and the exploitation of the block structure using a Dantzig-Wolfe decomposition).

The first reformulation we describe is a variation on (2SMILP-Alt), the standard formulation used in most of the bilevel optimization literature. This formulation introduces the second-stage variables explicitly and formulates the problem in the form of a classical mathematical optimization problem using a technique that is standard—replacing the requirement that the solution to the second-stage problem be optimal with explicit optimality conditions. To achieve this, we introduce a constraint involving the value function $\phi$ , as well as the second-stage feasibility conditions. This is roughly equivalent to imposing primal and dual feasibility along with equality of the primal and dual objectives in the linear optimization case (the constraint involving the value function must be satisfied at equality, though it is stated as an inequality). The formulation is as follows.


$\displaystyle\min\quad$	$\displaystyle cx+\sum_{\omega\in\Omega}p_{\omega}d^{1}y^{\omega}$
$\displaystyle\operatorname{s.\!t.}\quad$	$\displaystyle A^{1}x\geq b^{1}$		(0a)
	$\displaystyle G^{2}y^{\omega}\geq b^{2}_{\omega}-A^{2}_{\omega}x\quad$	$\displaystyle\forall\omega\in\Omega$	(0b)
	$\displaystyle d^{2}y^{\omega}\leq\phi(b^{2}_{\omega}-A^{2}_{\omega}x)\quad$	$\displaystyle\forall\omega\in\Omega$	(0c)
	$\displaystyle x\in X$		(0d)
	$\displaystyle y^{\omega}\in Y\quad$	$\displaystyle\forall\omega\in\Omega.$	(0e)

It is clear that this formulation cannot be constructed explicitly, but rather, must be solved by the iterative approximation of the constraints involving the value function (which we refer to as the second-stage optimality constraints). This reformulation suggests a family of methods described in Section 7 in which we replace $\phi$ with a primal function $\bar{\phi}$ , as defined in Definition 2).

Notice that, when $r_{2}=0$ , so that the second-stage problem (SS) is a linear optimization problem, we can exploit the fact that the optimality conditions for this problem involve linear functions. This allows for, in essence, substituting for $\phi$ the objective function of the classical LP dual of (SS), after introducing the corresponding variables and constraints. This, overall, leads to a tractable primal-dual reformulation—the technique is applied, for instance, in Coniglio and Tieves (2015). The alternative idea of, rather than the dual of (SS), introducing its KKT conditions, is arguably more popular and has been often exploited in a number of “classical” works on mixed integer bilevel optimization problems, including, among others, Labbé et al. (1998). Note, however, that while there is an analog of this reformulation that applies in the setting of (2SMILP) (see DeNegre (2011)), it has so far proved not to be practical and, therefore, we will not present any algorithms for its solution in Section 7.

The next reformulation we consider exploits the finiteness of $\Omega$ and avoids introducing the second-stage variables explicitly. It reads as follows.

$\displaystyle\min$	$\displaystyle c^{1}x+\sum_{\omega\in\Omega}p_{\omega}z^{\omega}$		(RFR)
$\displaystyle\operatorname{s.\!t.}$	$\displaystyle z^{\omega}\geq\Xi_{\omega}(x)$	$\displaystyle\omega\in\Omega$
	$\displaystyle x\in\mathcal{P}_{1}\cap X$
	$\displaystyle z^{\omega}\in\mathbb{R}$	$\displaystyle\omega\in\Omega.$

This reformulation mirrors the original formulation implicitly adopted when we first defined (2SMILP), in which the second-stage variables are not (explicitly) present. However, we can also interpret it as a projection onto the $X$ -space of the value-function reformulation described in the previous section. In fact, it is not hard to see that the set $\{x\in\mathcal{P}_{1}\mid\Xi(x)<\infty\}$ is exactly $\mathcal{F}^{1}$ (the projection of the feasible region onto the space of the first-stage variables) as defined in (FR-Proj). As such, this formulation is a natural basis for a Benders-type algorithm that we describe in Section 7.1, in which we replace $\Xi$ with an under-estimator to obtain a master problem which is then iteratively improved until convergence.

An apparently unrelated reformulation generalizes the notion of convexification used heavily in the polyhedral theory that underlies the solution methodology of standard MILPs. Convexification considers the following conceptual reformulation:

	$\displaystyle\min$	$\displaystyle cx+\sum_{\omega\in\Omega}p_{\omega}d^{1}y^{\omega}$		(POLY-R)
	s.t.	$\displaystyle(x,y^{\omega})\in\operatorname{conv}(\mathcal{F}^{\omega})\quad\forall\omega\in\Omega,$		(POLY-R)

where $\mathcal{F}^{\omega}$ is the feasible region under scenario $\omega$ , defined as in (FR). Under our earlier assumptions, the convex hull of $\mathcal{F}^{\omega}$ is a polyhedron whose extreme points are members of $\mathcal{F}^{\omega}$ . Thus, due to the linearity of the objective function, we can w.l.o.g. replace the requirement that $(x,y^{\omega})\in\mathcal{F}^{\omega}$ with the requirement that $(x,y^{\omega})\in\operatorname{conv}(\mathcal{F}^{\omega})$ , thereby convexifying the feasible region.

With this reformulation, we have transformed the requirement that the second-stage solution be optimal for the parameterized second-stage problem (SS) into a requirement that the combination of first- and second-stage solutions lie in a polyhedral feasible region. This reformulation suggests a different class of algorithms based on the dynamic generation of valid inequalities, such as those so successfully employed in the case of MILPs. We describe an algorithm of such class in Section 7.2.

Finally, we remark that the finiteness of $\Omega$ allows for solving the problem via a block-angular reformulation based on the formulation (2SMILP-Alt) presented earlier, which is in the spirit of the so-called deterministic equivalent reformulation used in two-stage stochastic optimization. This renders the stochastic problem as a deterministic MIBLP, which can then be solved via standard methods for that case with the requisite further reformulations (of course, exploiting the block structure of the resulting matrices). For details, see Tahernejad (2019).

We now summarize a range of methodologies that arise naturally from the reformulations of the previous section. Any practical method of solving (2SMILP) must have as a fundamental step the evaluation of $\phi(\beta)$ for certain fixed values of $\beta\in\mathbb{R}^{m_{2}}$ , an operation which can be challenging in itself, since the corresponding problem is ${\mathsf{NP}}$ -hard. From the evaluation of $\phi$ , both primal and dual information is obtained, which can be used to build approximations. While some methods explicitly build such approximations, other methods do it only implicitly. In all cases, information about the value function that is built up through repeated evaluations can be exploited.

Similarly, in the dual methods that we describe below, the function $\Xi$ is also evaluated for various values of $x\in X$ (or rather the function $\rho$ ) and, similarly, approximations of this function can be built from primal and dual information obtained during its evaluation. In order to develop computationally tractable methods, a key aspect is to limit the number of such function evaluations as much as possible and to exploit to the maximum extent possible the information generated as a by-product of these single function evaluations.

Decomposition methods are based on generalizations of Benders’ original method of decomposing a given mathematical optimization problem by splitting the variables into two subsets and forming a master problem by projecting out one subset. More concretely, we are simply solving a reformulation of the form (RFR).

For illustration, we consider the simplest case in which we have only continuous variables in both stages ( $r_{1}=r_{2}=0$ ) and $d^{1}=d^{2}$ . Since the first- and second-stage objectives are the same in this case, the full problem is nothing more than a standard linear optimization problem, but Benders’ approach nevertheless applies when either fixing the first stage variables results in a more tractable second-stage LP (such as a min-cost flow problem). In the Benders approach, we (conceptually) rewrite the LP as

\min\left\{cx+\sum_{\omega\in\Omega}p_{\omega}\phi(\beta^{\omega}(x))\;\middle|\;x\in\mathcal{P}_{1}\right\},

where $\phi$ is the value function (2SVF). Note that, because $\Xi(x)=\sum_{\omega\in\Omega}p_{\omega}\phi(\beta^{\omega}(x))$ , this is just a simplification of the original formulation implicitly adopted when we first defined (2SMILP). As we observed earlier, the value function in the LP case is the maximum of linear functions associated with the dual solutions. Recalling that we can restrict the description to only the extreme points of the dual feasible region, we can further rewrite the LP as

\min\left\{cx+\sum_{\omega\in\Omega}p_{\omega}z^{\omega}\;\middle|\;\begin{array}[]{ll}x\in\mathcal{P}_{1}\\ z^{\omega}\geq u\beta^{\omega}(x),u\in\mathcal{E},\omega\in\Omega\\ z^{\omega}\in\mathbb{R},\omega\in\Omega\end{array}\right\},

(LP)

where $\mathcal{E}$ is the set of such extreme points of the dual of the second-stage LP (which we assumed to be bounded and nonempty). Thus, the linear constraints involving the variable $z^{\omega}$ (along with the fact that $z^{\omega}$ is minimized) are precisely equivalent to requiring $z^{\omega}=\phi(b^{2}_{\omega}-A^{2}_{\omega}x)$ , so this reformulation is exactly the formulation (RFR) specialized to this case.

A straightforward solution approach is then to solve (LP) by a cutting plane algorithm, which results in the classical $L$ -shaped method for solving (continuous) stochastic linear optimization problems Van Slyke and Wets (1969). From the point of view we have taken in this article, this method can be interpreted as one that approximates the value function from below as the maximum of the strong dual functions generated in each iteration. The strong dual functions arise from the solutions to the dual of the second-stage problem and yield what are typically called Benders cuts (inequalities of the form imposed in (LP)). The Benders approach is then to iteratively improve this approximation until convergence.

The case $d^{1}\not=d^{2}$ is more complex. The epigraph of the value function of the second-stage problem is no longer necessarily a polyhedral cone, and the function itself is no longer necessarily convex. Formulating the equivalent of (LP) thus requires integer variables. Alternative formulations using the related complementary slackness optimality conditions are also commonly used (see Colson et al. (2005)).

For the case in which there are integer variables, the approach just described can be applied by simply replacing the linear strong dual functions (Benders’ cuts) with strong under-estimators of the risk function constructed from dual functions arising from solution algorithms for the second-stage problem, such as those based on branch and bound described in Section 5.5.1. In this approach, we work directly with the reformulation (RFR), employing the generalized Benders-type algorithm summarized in Figure 5 and a master problem defined as follows.

$\displaystyle\min$	$\displaystyle c^{1}x+\sum_{\omega\in\Omega}p_{\omega}z^{\omega}$		(MASTER)
$\displaystyle\operatorname{s.\!t.}$	$\displaystyle z^{\omega}\geq\underline{\Xi}_{\omega}(x)$	$\displaystyle\omega\in\Omega$
	$\displaystyle x\in\mathcal{P}_{1}$
	$\displaystyle z^{\omega}\in\mathbb{R}$	$\displaystyle\omega\in\Omega.$

Step 0. Initialize $k\leftarrow 1$ , $\underline{\Xi}^{0}_{\omega}(x)=-\infty$ for all $x\in\mathbb{Q}^{n_{1}}$ , $\omega\in\Omega$ . Step 1. Solve the Master Problem a) Set $\displaystyle\underline{\Xi}_{\omega}=\max_{i=0,\dots,k-1}\underline{\Xi}^{i}_{\omega}$ for $\omega\in\Omega$ . b) Solve (MASTER) to obtain an optimal solution $(x^{k},\{z^{\omega}_{k}\}_{\{\omega\in\Omega\}})$ . Step 2. Solve the Subproblem a) Evaluate $\Xi_{\omega}(x^{k})$ to obtain an optimal solution $y^{\omega,k}$ for $\omega\in\Omega$ and the strong under-estimator $\underline{\Xi}^{k}_{\omega}$ . b) Termination check: Is $z^{\omega}_{k}=d^{1}y^{\omega,k}$ for $\omega\in\Omega$ ? 1. If yes, STOP. $x^{k}$ is an optimal solution to (RFR). 2. If no, set $k\leftarrow k+1$ and go to Step 1.

Figure 5: Generalized Benders Algorithm for solving 2SMILPs.

When $d^{1}=d^{2}$ , the approximation of the scenario risk function and of the risk function itself reduces to the direct approximation of the second-stage value function, and the algorithm can be described rather succinctly. A basic version was originally proposed as the integer $L$ -shaped algorithm for two-stage stochastic optimization problems with integer recourse by Laporte and Louveaux (1993); Carøe and Tind (1998). The version based on dual functions from branch and bound that we describe here is described in Hassanzadeh and Ralphs (2014a).

To briefly summarize, as in the LP case, we rewrite (2SMILP) as

\min\left\{cx+\sum_{\omega\in\Omega}p_{\omega}z^{\omega}\;\middle|\;\begin{array}[]{ll}x\in\mathcal{P}_{1}\cap X\\ z^{\omega}\geq\phi(\beta^{\omega}(x))&\omega\in\Omega\\ z^{\omega}\in\mathbb{R}&\omega\in\Omega\end{array}\right\}.

By replacing $\phi$ with the maximum of a set $\mathcal{G}^{\omega}$ of dual functions associated with scenario $\omega\in\Omega$ (alternatively, we can employ one universal set of dual functions, as indicated in (LP) above), we obtain a convergent Benders-like algorithm based on iteratively solving a master problem of the form

\min\left\{cx+\sum_{\omega\in\Omega}p_{\omega}z^{\omega}\;\middle|\;\begin{array}[]{ll}x\in\mathcal{P}_{1}\cap X\\ z^{\omega}\geq F(\beta^{\omega}(x))&F\in\mathcal{G}^{\omega},\omega\in\Omega\\ z^{\omega}\in\mathbb{R}&\omega\in\Omega\end{array}\right\},

which generalizes (LP). The key to making this approach work in practice is that the dual functions we need be easily available as a by-product of evaluating the second-stage value function $\phi$ for a fixed value of $\beta$ .

The most general case in which $d^{1}\not=d^{2}$ is conceptually no more complex than that described above, but the details of the methodology for approximating $\Xi$ and in linearizing the master problem are quite involved. The reader is referred to Bolusani and Ralphs (2020) for the details.

Primal algorithms are based on the implicit solution of (POLY-R) and generalize the well-known framework of branch and cut that has been so successful in the MILP case. This class of algorithms is based on the iterative approximation of $\operatorname{conv}(\mathcal{F}^{\omega})$ beginning with the approximation $\mathcal{P}^{\omega}$ , the feasible region in scenario $\omega$ of the following relaxation obtained by dropping both the value-function constraint (0c) and the integrality requirements (0d) and (0e) from ( ‣ 6.1).


$\displaystyle\min\quad$	$\displaystyle cx+\sum_{\omega\in\Omega}p_{\omega}d^{1}y^{\omega}$
$\displaystyle\operatorname{s.\!t.}\quad$	$\displaystyle A^{1}x\geq b^{1}$		(0a)
	$\displaystyle G^{2}y^{\omega}\geq b^{2}_{\omega}-A^{2}_{\omega}x\quad$	$\displaystyle\forall\omega\in\Omega$	(0b)
	$\displaystyle x\in\mathbb{R}^{n_{1}}_{+}$		(0c)
	$\displaystyle y^{\omega}\in\mathbb{R}^{n_{2}}_{+}\quad$	$\displaystyle\forall\omega\in\Omega.$	(0d)

Being an LP, this relaxation is easily solved, but it is not difficult to see, however, that it is rather weak (see, e.g., Example 7.2). A straightforward way to strengthen it is simply by including the integrality constraints (0d) and (0e) from ( ‣ 6.1). This leads to an MILP relaxation with feasible set

\displaystyle\mathcal{S}^{\omega}=\mathcal{P}^{\omega}\cap(X\times Y)

in scenario $\omega\in\Omega$ , which, while stronger, is clearly more difficult to solve and also still potentially weak—whether adopting it is a computationally good idea is a purely empirical question. When the number of scenarios is large, dropping constraints (0b) from the relaxation may also be advantageous, since this may reduce the size of the relaxation.

As in cutting plane methods for MILPs, the idea is to improve this initial formulation with the addition of inequalities valid for $\mathcal{S}^{\omega}$ , $\mathcal{F}^{\omega}$ , $\bigcup_{\omega\in\Omega}\mathcal{F}^{\omega}$ , or even $\mathcal{F}^{1}$ . In some cases, inequalities may first be derived with respect to $\mathcal{S}^{\omega}$ or $\mathcal{F}^{\omega}$ for some particular scenario $\omega\in\Omega$ and then lifted to become valid for a larger set. Inequalities valid for $\mathcal{S}^{\omega}$ (which can be referred to as feasibility cuts) can be generated using any number of well-known procedures associated with cutting plane algorithms for mixed integer linear programming. Inequalities valid for $\mathcal{F}^{\omega}$ (which can be referred to as optimality cuts) are the more interesting case because they can incorporate information about the value function in order to eliminate members of $\mathcal{P}^{\omega}$ that are not two-stage feasible.

In early work on these methods, the authors of DeNegre and Ralphs (2009) developed inequalities valid for $\mathcal{F}^{\omega}$ in the case for which $\Omega$ is a singleton and the variables must all be integer ( $r_{1}=n_{1}$ and $r_{2}=n_{2}$ ), which illustrate the basic principles. When the input data are integer, a very simple argument can be used to generate an inequality valid for $\mathcal{F}^{\omega}$ but violated by $(\hat{x},\hat{y})$ , an extreme point of $\mathcal{P}^{\omega}$ not in $\mathcal{F}^{\omega}$ , by taking advantage of the discrete nature of the feasible set. Assuming the solution of the LP relaxation is an extreme point of $\mathcal{P}^{\omega}$ , there is thus a hyperplane supporting $\mathcal{P}^{\omega}$ and inducing a face of dimension $0$ . As such, there exist $f\in\mathbb{R}^{n_{1}}$ , $g\in\mathbb{R}^{n_{2}}$ , and $\gamma\in\mathbb{R}$ such that the hyperplane $\{(x,y)\in\mathbb{R}^{n_{1}+n_{2}}\mid fx+gy=\gamma\}$ intersects $\mathcal{P}^{\omega}$ in the unique point $(\hat{x},\hat{y})$ . Hence, we have that $fx+gy\leq\gamma$ for all $(x,y)\in\mathcal{P}^{\omega}$ . Finally, since the face of $\mathcal{P}^{\omega}$ induced by this inequality does not contain any other members of $\mathcal{S}^{\omega}$ , we can “push” the hyperplane until it meets the next integer point without separating any additional members of $\mathcal{F}^{\omega}$ . Hence,

fx+gy\leq\gamma-1

is valid for $\mathcal{F}^{\omega}$ . This procedure is similar in spirit to the Gomory procedure for standard MILPs. It is used, for instance, in Dempe and Kue (2017). We next describe the method with a brief example.

Example 4

Consider the instance

$\max_{x}\min_{y}\left\{y\mid-x+y\leq 2,-2x-y\leq-2,3x-y\leq 3,y\leq 3,x,y\in\mathbb{Z}_{+}\right\},$

with $|\Omega|=1$ , whose feasible region is the set $\mathcal{F}=\{(0,2),(1,0),(2,3)\}$ shown in Figure 6. Solving the LP relaxation yields the point $(1,3)$ , which is not feasible. This point is eliminated by the addition of the inequality $x-2y\geq-4$ , which is valid for the feasible region $\mathcal{F}$ and is obtained as a strengthening of the inequality $x-2y\geq-5$ , which is valid for the LP relaxation itself.

Figure 6: Example of a valid inequality.

∎

This cut generation procedure is enough to yield a converging algorithm in this case, but it amounts to removing infeasible points one by one and is not scalable in general. An important observation is that this cut only exploits integrality of the solutions and does not take into account any information about the second-stage value function.

A generalized version of this basic methodology has since been described in Tahernejad et al. (2016) and enhanced with additional classes of inequalities, including those valid for the general mixed integer case described in Fischetti et al. (2018, 2017). Inequalities valid for more general discrete probability spaces are derived in Gade et al. (2012) for the case $d^{1}=d^{2}$ .

Stronger cuts can be obtained by using disjunctive arguments based on knowledge of the value function. In particular, an option is to add constraints of the form

d^{2}y^{\omega}\leq\bar{\phi}(b^{2}_{\omega}-A^{2}_{\omega}x),

where $\bar{\phi}$ is a primal function, as defined in Definition 2). Such primal functions can take many forms and imposing such constraints may be expensive. In general, the form of such functions will be either affine or piecewise polyhedral (“standard” disjunctive programming techniques can be used to obtain a reformulation which only involves linear functions).

We have introduced a unified framework for multistage mixed integer linear optimization problems which encompasses both multilevel mixed integer linear optimization problems and multistage mixed integer linear optimization problems with recourse. Focusing on the two-stage case, we have investigated the nature of the value function of the second-stage problem and highlighted its connection to dual functions and the theory of duality for mixed integer linear optimization problems. We have summarized different reformulations for this broad class of problems, which rely on either the risk function, the value function, or on their polyhedral nature. We have then presented the main solution techniques for problems of this class, considering both dual- and primal-type methods, the former based on a Benders-like decomposition to approximate either the risk function or the value function, and the latter based on a cutting plane technique that relies on the polyhedral nature of these problems. While much work is still to be done for solving multistage mixed integer linear optimization problems with techniques that are (mutatis mutandis, given their intrinsic harder nature) of comparable efficiency to those for solving single-level problems, we believe that the theoretical understanding of multistage mixed integer linear problems is now sufficiently mature to make this an achievable objective.

This research was made possible with support from National Science Foundation Grants CMMI-1435453, CMMI-0728011, and ACI-0102687, as well as Office of Naval Research Grant N000141912330.

$\displaystyle\min_{x,\{y^{\omega}\}_{\omega\in\Omega}}$

$\displaystyle cx+\sum_{\omega\in\Omega}p_{\omega}d^{1}y^{\omega}$

(2SMILP-Alt)

$\displaystyle\operatorname{s.\!t.}$

$\displaystyle A^{1}x\geq b^{1}$

$\displaystyle x\in X$

$\displaystyle\left.\begin{array}[]{llllr}y^{\omega}\in&\operatorname{argmin}_{y}&d^{2}y\\ &\operatorname{s.\!t.}&A^{2}_{\omega}x+G^{2}y\geq b^{2}_{\omega}\\ &&y\in Y\end{array}\right\}\forall\omega\in\Omega.$

Amaldi et al. [2013a] E. Amaldi, A. Capone, S. Coniglio, and L.G. Gianoli. Energy-aware traffic engineering with elastic demands and mmf bandwidth allocation. In Proc of 18th IEEE Int. Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD 2013), pages 169–174. IEEE, 2013a.
Amaldi et al. [2013b] E. Amaldi, A. Capone, S. Coniglio, and L.G. Gianoli. Network optimization problems subject to max-min fair flow allocation. IEEE Communications Letters, 17(7):1463–1466, 2013b.
Amaldi et al. [2013c] E. Amaldi, S. Coniglio, L.G. Gianoli, and C.U. Ileri. On single-path network routing subject to max-min fair flow allocation. Electronic Notes in Discrete Mathematics, 41:543–550, 2013c.
Amaldi et al. [2014] E. Amaldi, S. Coniglio, and L. Taccari. Maximum throughput network routing subject to fair flow allocation. In Proc. of Int. Symp. on Combinatorial Optimization (ISCO 2014), pages 1–12. Springer, 2014.
Amouzegar and Moshirvaziri [1999] M.A. Amouzegar and K. Moshirvaziri. Determining optimal pollution control policies: An application of bilevel programmingoa. European Journal of Operational Research, 119(1):100–120, 1999.
An et al. [2011] B. An, J. Pita, E. Shieh, M. Tambe, C. Kiekintveld, and J. Marecki. Guards and Protect: Next generation applications of security games. ACM SIGecom Exchanges, 10(1):31–34, 2011.
Avenhaus et al. [1991] R. Avenhaus, A. Okada, and S. Zamir. Inspector leadership with incomplete information. In Game equilibrium models IV, pages 319–361. Springer, 1991.
Bard [1983] J. Bard. An algorithm for solving the general bilevel programming problem. Mathematics of Operations Research, 8(2):260–272, 1983.
Bard and Moore [1992] J. Bard and J.T. Moore. An algorithm for the discrete bilevel programming problem. Naval Research Logistics, 39(3):419–435, 1992.
Bard et al. [2000] J.F. Bard, J. Plummer, and J.C. Sourie. A bilevel programming approach to determining tax credits for biofuel production. European Journal of Operational Research, 120:30–46, 2000.
Baringo and Conejo [2012] L. Baringo and A.J. Conejo. Transmission and wind power investment. IEEE Transactions on Power Systems, 27(2):885–893, May 2012.
Basilico et al. [2016] N. Basilico, S. Coniglio, and N. Gatti. Methods for finding leader-follower equilibria with multiple followers (extended abstract). In Proc. of 2016 Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2016), pages 1363–1364, 2016.
Basilico et al. [2017] N. Basilico, S. Coniglio, N. Gatti, and A. Marchesi. Bilevel programming approaches to the computation of optimistic and pessimistic single-leader-multi-follower equilibria. In Proc. of 16th Int. Symp. on Experimental Algorithms (SEA 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
Basilico et al. [2020] N. Basilico, S. Coniglio, N. Gatti, and A. Marchesi. Bilevel programming methods for computing single-leader-multi-follower equilibria in normal-form and polymatrix games. EURO Journal on Computational Optimization, 8:3–31, 2020.
Ben-Ayed and Blair [1990] O. Ben-Ayed and C. Blair. Computational difficulties of bilevel linear programming. Operations Research, 38:556–560, 1990.
Ben-Ayed et al. [1992] O. Ben-Ayed, C. Blair, D. Boyce, and L. LeBlanc. Construction of a real-world bilevel linear programming model of the highway network design problem. Annals of Operations Research, 34(1):219–254, 1992.
Bertsekas [2017] D.P. Bertsekas. Dynamic Programming and Optimal Control. Athena scientific, 2017.
Birge and Louveaux [2011] J.R. Birge and F. Louveaux. Introduction to stochastic programming. Springer Science & Business Media, 2011.
Blair [1995] C.E. Blair. A closed-form representation of mixed-integer program value functions. Mathematical Programming, 71(2):127–136, 1995.
Blair and Jeroslow [1977] C.E. Blair and R.G. Jeroslow. The value function of a mixed integer program: I. Discrete Mathematics, 19(2):121–138, 1977.
Blair and Jeroslow [1979] C.E. Blair and R.G. Jeroslow. The value function of a mixed integer program: Ii. Discrete Mathematics, 25(1):7–19, 1979.
Blair and Jeroslow [1984] C.E. Blair and R.G. Jeroslow. Constructive characterizations of the value-function of a mixed-integer program i. Discrete Applied Mathematics, 9(3):217–233, 1984.
Bolusani and Ralphs [2020] S. Bolusani and T.K. Ralphs. A Framework for Generalized Benders’ Decomposition and Its Application to Multilevel Optimization. Technical report, COR@L Laboratory Report 20T-004, Lehigh University, 2020. URL http://coral.ie.lehigh.edu/~ted/files/papers/MultilevelBenders20.pdf.
Bracken and McGill [1973] J. Bracken and J.T. McGill. Mathematical programs with optimization problems in the constraints. Operations Research, 21(1):37–44, 1973.
Burgard et al. [2003] A.P. Burgard, P. Pharkya, and C.D. Maranas. OptKnock: A Bilevel Programming Framework for Identifying Gene Knockout Strategies for Microbial Strain Optimization. Biotechnology and Bioengineering, 84:647–657, 2003.
Calamai and Vicente [1994] P. Calamai and L. Vicente. Generating quadratic bilevel programming problems. ACM Transactions on Mathematical Software, 20:103–119, 1994.
Caprara et al. [2016] A. Caprara, M. Carvalho, A. Lodi, and G. Woeginger. Bilevel knapsack with interdiction constraints. INFORMS Journal on Computing, 28(2):319–333, 2016.
Caramia and Mari [2015] M. Caramia and R. Mari. Enhanced exact algorithms for discrete bilevel linear problems. Optimization Letters, 9(7):1447–1468, 2015.
Carøe and Tind [1998] C.C. Carøe and J. Tind. L-shaped decomposition of two-stage stochastic programs with integer recourse. Mathematical Programming, 83(1):451–464, 1998.
Castiglioni et al. [2019a] M. Castiglioni, A. Marchesi, and N. Gatti. Be a leader or become a follower: the strategy to commit to with multiple leaders. In Proc. of 28th Int. Joint Conf. on Artificial Intelligence (IJCAI 2019), 2019a.
Castiglioni et al. [2019b] M. Castiglioni, A. Marchesi, N. Gatti, and S. Coniglio. Leadership in singleton congestion games: What is hard and what is easy. Artificial Intelligence, 277:103–177, 2019b.
Celli et al. [2019] A. Celli, S. Coniglio, and N. Gatti. Computing optimal ex ante correlated equilibria in two-player sequential games. In Proc. of 18th Int. Conf. on Autonomous Agents and MultiAgent Systems (AAMAS 2019), pages 909–917. International Foundation for Autonomous Agents and Multiagent Systems, 2019.
Celli et al. [2020] A. Celli, S. Coniglio, and N. Gatti. Private bayesian persuasion with sequential games. In Proc. of 34th AAAI Conf. on Artificial Intelligence (AAAI 2020), pages 1–8. AAAI Press, 2020.
Church and Scaparra [2006] R.L. Church and M.P. Scaparra. Protecting critical assets: The $r$ -interdiction median problem with fortification. Geographical Analysis, 39(2):129–146, 2006.
Church et al. [2004] R.L. Church, M.P. Scaparra, and R.S. Middleton. Identifying critical infrastructure: The median and covering facility interdiction problems. Annals of the Association of American Geographers, 94(3):491–502, 2004.
Clark and Westerberg [1990] P.A. Clark and A.W. Westerberg. Bilevel programming for steady-state chemical process design i. fundamentals and algorithms. Computers & Chemical Engineering, 14(1):87–97, 1990.
Colson et al. [2005] B. Colson, P. Marcotte, and G. Savard. Bilevel programming: A survey. 4OR: A Quarterly Journal of Operations Research, 3(2):87–107, 2005.
Coniglio and Gualandi [2017] S. Coniglio and S. Gualandi. On the separation of topology-free rank inequalities for the max stable set problem. In Proc. of 16th Int. Symp. on Experimental Algorithms (SEA 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
Coniglio and Tieves [2015] S. Coniglio and M. Tieves. On the generation of cutting planes which maximize the bound improvement. In Proc. of 14th Int. Symp. on Experimental Algorithms (SEA 2015), pages 97–109. Springer, 2015.
Coniglio et al. [2017] S. Coniglio, N. Gatti, and A. Marchesi. Pessimistic leader-follower equilibria with multiple followers. In Proc. of 26th Int. Joint Conf. on Artificial Intelligence (IJCAI 2017), pages 171–177. AAAI Press, 2017.
Coniglio et al. [2020a] S. Coniglio, N. Gatti, and A. Marchesi. Computing a pessimistic stackelberg equilibrium with multiple followers: The mixed-pure case. Algorithmica, 82(5):1189–1238, 2020a.
Coniglio et al. [2020b] S. Coniglio, M. Sirvent, and M. Weibelzahl. Airport capacity extension, fleet investment, and optimal aircraft scheduling in a multi-level market model: Quantifying the costs of imperfect markets. Submitted, 2020b.
Conitzer and Korzhyk [2011] V. Conitzer and D. Korzhyk. Commitment to correlated strategies. In Proc. of 25th AAAI Conf. on Artificial Intelligence (AAAI 2011), pages 632–637, 2011.
Conitzer and Sandholm [2006] V. Conitzer and T. Sandholm. Computing the optimal strategy to commit to. In Proc. of 7th ACM Conf. on Electronic Commerce (EC 2006), pages 82–90, 2006.
Cook [1971] S.A. Cook. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing, pages 151–158. ACM, 1971.
Cormican et al. [1998] K.J. Cormican, D.P. Morton, and R.K. Wood. Stochastic network interdiction. Operations Research, 46(2):184–197, 1998.
Cote et al. [2003] J.-P. Cote, P Marcotte, and G. Savard. A bilevel modelling approach to pricing and fare optimisation in the airline industry. Journal of Revenue and Pricing Management, 2(1):23–36, 2003.
Dempe [2001] S. Dempe. Discrete bilevel optimization problems. Technical Report D-04109, Institut fur Wirtschaftsinformatik, Universitat Leipzig, Leipzig, Germany, 2001.
Dempe and Kue [2017] S. Dempe and F. Mefo Kue. Solving discrete linear bilevel optimization problems using the optimal value reformulation. Journal of Global Optimization, 68(2):255–277, Jun 2017. ISSN 1573-2916. doi: 10.1007/s10898-016-0478-5. URL https://doi.org/10.1007/s10898-016-0478-5.
Dempe et al. [2005] S. Dempe, V. Kalashnikov, and R. Z. Rios-Mercado. Discrete bilevel programming: Application to a natural gas cash-out problem. European Journal of Operational Research, 166(2):469–488, 2005.
DeNegre [2011] S. DeNegre. Interdiction and Discrete Bilevel Linear Programming. PhD, Lehigh University, 2011. URL http://coral.ie.lehigh.edu/{~}ted/files/papers/ScottDeNegreDissertation11.pdf.
DeNegre and Ralphs [2009] S. DeNegre and T.K. Ralphs. A Branch-and-Cut Algorithm for Bilevel Integer Programming. In Proceedings of the Eleventh INFORMS Computing Society Meeting, pages 65–78, 2009. doi: 10.1007/978-0-387-88843-9˙4. URL http://coral.ie.lehigh.edu/~ted/files/papers/BILEVEL08.pdf.
Dhamdhere et al. [2005] K. Dhamdhere, R. Ravi, and M. Singh. On two-stage stochastic minimum spanning trees. In International Conference on Integer Programming and Combinatorial Optimization, pages 321–334. Springer, 2005.
Dyer and Stougie [2006] M. Dyer and L. Stougie. Computational complexity of stochastic programming problems. Mathematical Programming, 106(3):423–432, May 2006. ISSN 1436-4646. doi: 10.1007/s10107-005-0597-0. URL https://doi.org/10.1007/s10107-005-0597-0.
Faísca et al. [2007] N.P. Faísca, V. Dua, B. Rustem, P.M. Saraiva, and E.N. Pistikopoulos. Parametric global optimisation for bilevel programming. Journal of Global Optimization, 38:609–623, 2007.
Fischetti et al. [2017] M. Fischetti, I. Ljubić, M. Monaci, and M. Sinnl. A new general-purpose algorithm for mixed-integer bilevel linear programs. Operations Research, 65(6):1615–1637, 2017.
Fischetti et al. [2018] M. Fischetti, I. Ljubić, M. Monaci, and M. Sinnl. On the use of intersection cuts for bilevel optimization. Mathematical Programming, 72:77–103, 2018.
Furini et al. [2019] F. Furini, I. Ljubic, S. Martin, and P. San Segundo. The maximum clique interdiction game. European Journal of Operational Research, 277(1):112–127, 2019.
Gade et al. [2012] D. Gade, S. Küçükyavuz, and S. Sen. Decomposition algorithms with parametric gomory cuts for two-stage stochastic integer programs. Mathematical Programming, pages 1–26, 2012.
Gan et al. [2018] J. Gan, E. Elkind, and M. Wooldridge. Stackelberg security games with multiple uncoordinated defenders. In Proc. of 17th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2008), 2018.
Garcés et al. [2009] L.P. Garcés, A.J. Conejo, R. García-Bertrand, and R. Romero. A bilevel approach to transmission expansion planning within a market environment. IEEE Transactions on Power Systems, 24(3):1513–1522, Aug 2009.
Garey and Johnson [1979] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Thoery of NP-Completeness. W.H. Freeman and Company, 1979.
Gendreau et al. [1996] M. Gendreau, G. Laporte, and R. Séguin. Stochastic vehicle routing. European Journal of Operational Research, 88(1):3–12, 1996.
Ghare et al. [1971] P.M. Ghare, D.C. Montgomery, and W.C. Turner. Optimal interdiction policy for a flow network. Naval Research Logistics Quarterly, 18:27–45, 1971.
Gørtz et al. [2012] I.L. Gørtz, V. Nagarajan, and R. Saket. Stochastic vehicle routing with recourse. In International Colloquium on Automata, Languages, and Programming, pages 411–423. Springer, 2012.
Grass and Fischer [2016] E. Grass and K. Fischer. Two-stage stochastic programming in disaster management: A literature survey. Surveys in Operations Research and Management Science, 21(2):85–100, 2016.
Grimm et al. [2016] V. Grimm, A. Martin, M. Martin, M. Weibelzahl, and G. Zöttl. Transmission and generation investment in electricity markets: The effects of market splitting and network fee regimes. European Journal of Operational Research, 254(2):493 – 509, 2016. ISSN 0377-2217. doi: https://doi.org/10.1016/j.ejor.2016.03.044. URL http://www.sciencedirect.com/science/article/pii/S0377221716301904.
Gupta et al. [2007] A. Gupta, R. Ravi, and A. Sinha. Lp rounding approximation algorithms for stochastic network design. Mathematics of Operations Research, 32(2):345–364, 2007.
Güzelsoy [2009] M. Güzelsoy. Dual Methods in Mixed Integer Linear Programming. PhD, Lehigh University, 2009. URL http://coral.ie.lehigh.edu/{~}ted/files/papers/MenalGuzelsoyDissertation09.pdf.
Güzelsoy and Ralphs [2007] M. Güzelsoy and T.K. Ralphs. Duality for Mixed-Integer Linear Programs. International Journal of Operations Research, 4:118–137, 2007. URL http://coral.ie.lehigh.edu/~ted/files/papers/MILPD06.pdf.
Hanasusanto et al. [2016] G.A. Hanasusanto, D. Kuhn, and W. Wiesemann. A comment on “computational complexity of stochastic programming problems”. Mathematical Programming, 159(1):557–569, Sep 2016. ISSN 1436-4646. doi: 10.1007/s10107-015-0958-2. URL https://doi.org/10.1007/s10107-015-0958-2.
Hansen et al. [1992] P. Hansen, B. Jaumard, and G. Savard. New branch-and-bound rules for linear bilevel programming. SIAM Journal on Scientific and Statistical Computing, 13(5):1194–1217, 1992.
Hassanzadeh and Ralphs [2014a] A. Hassanzadeh and T.K. Ralphs. A Generalized Benders’ Algorithm for Two-Stage Stochastic Program with Mixed Integer Recourse. Technical Report COR@L Laboratory Report 14T-005, Lehigh University, 2014a. URL http://coral.ie.lehigh.edu/~ted/files/papers/SMILPGenBenders14.pdf.
Hassanzadeh and Ralphs [2014b] A. Hassanzadeh and T.K. Ralphs. On the Value Function of a Mixed Integer Linear Optimization Problem and an Algorithm for Its Construction. Technical report, COR@L Laboratory Report 14T-004, Lehigh University, 2014b. URL http://coral.ie.lehigh.edu/~ted/files/papers/MILPValueFunction14.pdf.
Held and Woodruff [2005] H. Held and D.L. Woodruff. Heuristics for multi-stage interdiction of stochastic networks. Journal of Heuristics, 11(5-6):483–500, 2005.
Hemmati and Smith [2016] M. Hemmati and J.C. Smith. A mixed integer bilevel programming approach for a competitive set covering problem. Technical report, Clemson University, 2016.
Hobbs and Nelson [1992] B.F. Hobbs and S.K. Nelson. A nonlinear bilevel model for analysis of electric utility demand-side planning issues. Annals of Operations Research, 34(1):255–274, 1992.
Israeli [1999] E. Israeli. System Interdiction and Defense. PhD thesis, Naval Postgraduate School, 1999.
Israeli and Wood [2002] E. Israeli and R.K. Wood. Shortest path network interdiction. Networks, 40(2):97–111, 2002.
Janjarassuk and Linderoth [2008] U. Janjarassuk and J. Linderoth. Reformulation and sampling to solve a stochastic network interdiction problem. Networks, 52:120–132, 2008.
Jeroslow [1985] R.G. Jeroslow. The polynomial hierarchy and a simple model for competitive analysis. Mathematical Programming, 32(2):146–164, Jun 1985. ISSN 1436-4646. doi: 10.1007/BF01586088. URL https://doi.org/10.1007/BF01586088.
Kall and Mayer [2010] P. Kall and J. Mayer. Stochastic linear programming: models, theory, and computation. Springer Verlag, 2010. ISBN 1441977287.
Kara and Verter [2004] B. Kara and V. Verter. Designing a road network for hazardous materials transportation. Transportation Science, 38(2):188–196, 2004.
Karp [1975] R.M. Karp. On the computational complexity of combinatorial problems. Networks, 5:45–68, 1975.
Katriel et al. [2007] I. Katriel, C. Kenyon-Mathieu, and E. Upfal. Commitment under uncertainty: Two-stage stochastic matching problems. In International Colloquium on Automata, Languages, and Programming, pages 171–182. Springer, 2007.
Kiekintveld et al. [2009] C. Kiekintveld, M. Jain, J. Tsai, J. Pita, F. Ordóñez, and M. Tambe. Computing optimal randomized resource allocations for massive security games. In AAMAS, pages 689–696, 2009.
Klein [2019] K.-M. Klein. About the complexity of two-stage stochastic ips, 2019.
Kong et al. [2006] N. Kong, A.J. Schaefer, and B. Hunsaker. Two-stage integer programs with stochastic right-hand sides: a superadditive dual approach. Mathematical Programming, 108(2):275–296, 2006.
Köppe et al. [2010] M. Köppe, M. Queyranne, and C. T. Ryan. Parametric integer programming algorithm for bilevel mixed integer programs. Journal of Optimization Theory and Applications, 146(1):137–150, Jul 2010.
Kulkarni and Shanbhag [2014] A.A. Kulkarni and U.V. Shanbhag. A shared-constraint approach to multi-leader multi-follower games. Set-Valued and Variational Analysis, 22(4):691–720, 2014.
Labbé and Violin [2013] M. Labbé and A. Violin. Bilevel programming and price setting problems. 4OR, 11(1):1–30, 2013.
Labbé et al. [1998] M. Labbé, P. Marcotte, and G. Savard. A bilevel model of taxation and its application to optimal highway pricing. Management Science, 44:1608–1622, 1998.
Laporte and Louveaux [1993] G. Laporte and F.V. Louveaux. The integer l-shaped method for stochastic integer programs with complete recourse. Operations research letters, 13(3):133–142, 1993.
Laszka et al. [2016] A. Laszka, J. Lou, and Y. Vorobeychik. Multi-defender strategic filtering against spear-phishing attacks. In Proc. of 30th AAAI Conf. on Artificial Intelligence (AAAI 2016), 2016.
Leyffer and Munson [2010] S. Leyffer and T. Munson. Solving multi-leader-common-follower games. OPT MET SO, 25(4):601–623, 2010.
Lodi et al. [2009] A. Lodi, T.K. Ralphs, F. Rossi, and S. Smriglio. Interdiction branching. Technical Report OR/09/10, DEIS-Università di Bologna, 2009.
Lou and Vorobeychik [2015] J. Lou and Y. Vorobeychik. Equilibrium analysis of multi-defender security games. In Proc. of 24th Int. Joint Conf. on Artificial Intelligence (IJCAI 2019), 2015.
Lou et al. [2017] J. Lou, A.M. Smith, and Y. Vorobeychik. Multidefender security games. IEEE INTELL SYST, 32(1):50–60, 2017.
Louveaux and van der Vlerk [1993] F.V. Louveaux and M.H. van der Vlerk. Stochastic programming with simple integer recourse. Mathematical programming, 61(1):301–325, 1993.
Lozano and Smith [2017] L. Lozano and J.C. Smith. A value-function-based exact approach for the bilevel mixed-integer programming problem. Operations Research, 65(3):768–786, 2017.
Mahajan [2009] A. Mahajan. On Selecting Disjunctions in Mixed Integer Linear Programming. PhD, Lehigh University, 2009. URL http://coral.ie.lehigh.edu/{~}ted/files/papers/AshutoshMahajanDissertation09.pdf.
Mahajan and Ralphs [2010] A. Mahajan and T.K. Ralphs. On the Complexity of Selecting Disjunctions in Integer Programming. SIAM Journal on Optimization, 20(5):2181–2198, 2010. doi: 10.1137/080737587. URL http://coral.ie.lehigh.edu/~ted/files/papers/%****␣MIBLP.bbl␣Line␣700␣****Branching08.pdf.
Marchesi et al. [2018] A. Marchesi, S. Coniglio, and N. Gatti. Leadership in singleton congestion games. In Proc. of 27th Int. Joint Conf. on Artificial Intelligence (IJCAI 2018), pages 447–453. AAAI Press, 2018.
McMasters and Mustin [1970] A.W. McMasters and T.M. Mustin. Optimal interdiction of a supply network. Naval Research Logistics Quarterly, 17:261–268, 1970.
Migdalas [1995] A. Migdalas. Bilevel programming in traffic planning: models, methods and challenge. Journal of Global Optimization, 7:381–405, 1995.
Moore and Bard [1990] J.T. Moore and J.F. Bard. The mixed integer linear bilevel programming problem. Operations Research, 38(5):911–921, 1990.
Pang and Fukushima [2005] J-S. Pang and M. Fukushima. Quasi-variational inequalities, generalized nash equilibria, and multi-leader-follower games. Computational Management Science, 2(1):21–56, 2005.
Paruchuri et al. [2008] P. Paruchuri, J.P. Pearce, J. Marecki, M. Tambe, F. Ordonez, and S. Kraus. Playing games for security: an efficient exact algorithm for solving bayesian stackelberg games. In Proc. of 7th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2008), pages 895–902, 2008.
Ralphs and Güzelsoy [2005] T.K. Ralphs and M. Güzelsoy. The SYMPHONY Callable Library for Mixed Integer Programming. In Proceedings of the Ninth INFORMS Computing Society Conference, pages 61–76, 2005. doi: 10.1007/0-387-23529-9˙5. URL http://coral.ie.lehigh.edu/~ted/files/papers/SYMPHONY04.pdf.
Ralphs and Güzelsoy [2006] T.K. Ralphs and M. Güzelsoy. Duality and Warm Starting in Integer Programming. In The Proceedings of the 2006 NSF Design, Service, and Manufacturing Grantees and Research Conference, 2006. URL http://coral.ie.lehigh.edu/~ted/files/papers/DMII06.pdf.
Rockafellar and Uryasev [2000] R.T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.
Rutenburg [1994] V. Rutenburg. Propositional truth maintenance systems: Classification and complexity analysis. Annals of Mathematics and Artificial Intelligence, 10(3):207–231, 1994.
Saharidis and Ierapetritou [2008] G.K. Saharidis and M.G. Ierapetritou. Resolution method for mixed integer bi-level linear problems based on decomposition technique. Journal of Global Optimization, 44(1):29–51, 2008.
Sandholm [2002] W.H. Sandholm. Evolutionary implementation and congestion pricing. The Review of Economic Studies, 69(3):667–689, 2002.
Scaparra and Church [2008] M.P. Scaparra and R.L. Church. A bilevel mixed-integer program for critical infrastructure protection planning. Computers and Operations Research, 35(6):1905–1923, 2008. ISSN 0305-0548.
Schaefer and Umans [2002] M. Schaefer and C. Umans. Completeness in the polynomial-time hierarchy: A compendium. SIGACT news, 33(3):32–49, 2002.
Schultz et al. [1998] R. Schultz, L. Stougie, and M.H. Van Der Vlerk. Solving stochastic programs with integer recourse by enumeration: A framework using Gröbner basis. Mathematical Programming, 83(1):229–252, 1998.
Segundo et al. [2019] P. San Segundo, S. Coniglio, F. Furini, and I. Ljubić. A new branch-and-bound algorithm for the maximum edge-weighted clique problem. European Journal of Operational Research, 278(1):76–90, 2019.
Sen and Higle [2005] S. Sen and J.L. Higle. The $C^{3}$ theorem and a $D^{2}$ algorithm for large scale stochastic mixed-integer programming: Set convexification. Mathematical Programming, 104(1):1–20, 2005. ISSN 0025-5610.
Shapiro [2003] A. Shapiro. Monte carlo sampling methods. Handbooks in operations research and management science, 10:353–425, 2003.
Sherali and Fraticelli [2002] H.D. Sherali and B.M.P. Fraticelli. A modification of Benders’ decomposition algorithm for discrete subproblems: An approach for stochastic programs with integer recourse. Journal of Global Optimization, 22(1):319–342, 2002.
Sherali and Zhu [2006] H.D. Sherali and X. Zhu. On solving discrete two-stage stochastic programs having mixed-integer first-and second-stage variables. Mathematical Programming, 108(2):597–616, 2006.
Smith et al. [2014] A. Smith, Y. Vorobeychik, and J. Letchford. Multidefender security games on networks. ACM SIGMETRICS Performance Evaluation Review, 41(4):4–7, 2014.
Stackelberg [2010] H. Von Stackelberg. Market structure and equilibrium. Springer Science & Business Media, 2010.
Stockmeyer [1976] L.J. Stockmeyer. The polynomial-time hierarchy. Theoretical Computer Science, 3:1–22, 1976.
Tahernejad [2019] S. Tahernejad. Two-stage Mixed Integer Stochastic Bilevel Linear Optimization. PhD, Lehigh University, 2019.
Tahernejad et al. [2016] S. Tahernejad, T.K. Ralphs, and S.T. DeNegre. A Branch-and-Cut Algorithm for Mixed Integer Bilevel Linear Optimization Problems and Its Implementation. Mathematical Programming Computation (to appear), 2016. URL http://coral.ie.lehigh.edu/~ted/files/papers/MIBLP16.pdf. To appear, Mathematical Programming Computation.
Tamble [2011] M. Tamble. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press, 2011. ISBN 1107096421, 9781107096424.
Uryasev [2000] S. Uryasev. Conditional value-at-risk: Optimization algorithms and applications. In Proceedings of the IEEE/IAFE/INFORMS 2000 Conference on Computational Intelligence for Financial Engineering (CIFEr)(Cat. No. 00TH8520), pages 49–57. IEEE, 2000.
Van Slyke and Wets [1969] R.M. Van Slyke and R. Wets. L-shaped linear programs with applications to optimal control and stochastic programming. SIAM Journal on Applied Mathematics, pages 638–663, 1969.
Vicente et al. [1996] L. Vicente, G. Savard, and J. Júdice. Discrete linear bilevel programming problem. Journal of Optimization Theory and Applications, 89(3):597–614, 1996.
Vicente and Calamai [1994] L.N. Vicente and P.H. Calamai. Bilevel and multilevel programming: A bibliography review. Journal of Global Optimization, 5(3):291–306, 1994.
von Stengel and Zamir [2010] B. von Stengel and S. Zamir. Leadership games with convex strategy sets. Games and Economic Behavior, 69(2):446–457, 2010.
Wallace and Ziemba [2005] S.W. Wallace and W.T. Ziemba. Applications of stochastic programming. SIAM, 2005.
Wang and Xu [2017] L. Wang and P. Xu. The watermelon algorithm for the bilevel integer linear programming problem. SIAM Journal on Optimization, 27(3):1403–1430, 2017.
Wen and Yang [1990] U.P. Wen and Y.H. Yang. Algorithms for solving the mixed integer two-level linear programming problem. Computers & Operations Research, 17(2):133–142, 1990.
Williams [1996] H.P. Williams. Duality in mathematics and linear and integer programming. Journal of Optimization Theory and Applications, 90(2):257–278, 1996.
Wollmer [1964] R. Wollmer. Removing arcs from a network. Operations Research, 12(6):934–940, 1964.
Wolsey [1981a] L.A. Wolsey. Integer programming duality: Price functions and sensitivity analysis. Mathematical Programming, 20(1):173–195, 1981a. ISSN 0025-5610.
Wolsey [1981b] L.A. Wolsey. Integer programming duality: Price functions and sensitivity analaysis. Mathematical Programming, 20:173–195, 1981b.
Wood [1993] R.K. Wood. Deterministic network interdiction. Mathematical and Computer Modelling, 17(2):1–18, 1993.
Xu and Wang [2014] P. Xu and L. Wang. An exact algorithm for the bilevel mixed integer linear programming problem under three simplifying assumptions. Computers & operations research, 41:309–318, 2014.
Zeng and An [2014] B. Zeng and U. An. Solving bilevel mixed integer program by reformulations and decomposition. Optimization online, pages 1–34, 2014.
Zhang et al. [2016] Y. Zhang, L.V. Snyder, T.K. Ralphs, and Z. Xue. The Competitive Facility Location Problem Under Disruption Risks. Transportation Research Part E: Logistics and Transportation Review, 93, 2016. ISSN 13665545. doi: 10.1016/j.tre.2016.07.002. URL http://coral.ie.lehigh.edu/~ted/files/papers/CFLPD16.pdf.