
Numerical Methods for Convex Multistage Stochastic Optimization

Guanghui Lan Georgia Institute of Technology, Atlanta, Georgia 30332, USA, [email protected]
Research of this author was partially supported by the NSF grant DMS-1953199 and the NSF AI Institute grant NSF-2112533.
   Alexander Shapiro Georgia Institute of Technology, Atlanta, Georgia 30332, USA, [email protected]
Research of this author was partially supported by Air Force Office of Scientific Research (AFOSR) under Grant FA9550-22-1-0244.

Optimization problems involving sequential decisions in a stochastic environment have been studied in Stochastic Programming (SP), Stochastic Optimal Control (SOC) and Markov Decision Processes (MDP). In this paper we mainly concentrate on the SP and SOC modelling approaches. In these frameworks there are natural situations when the considered problems are convex. The classical approach to sequential optimization is based on dynamic programming. It suffers from the so-called “Curse of Dimensionality”, in that its computational complexity increases exponentially with the dimension of the state variables. Recent progress in solving convex multistage stochastic problems is based on cutting planes approximations of the cost-to-go (value) functions of dynamic programming equations. Cutting planes type algorithms in dynamical settings are one of the main topics of this paper. We also discuss Stochastic Approximation type methods applied to multistage stochastic optimization problems. From the computational complexity point of view, these two types of methods seem to be complementary to each other. Cutting plane type methods can handle multistage problems with a large number of stages, but a relatively smaller number of state (decision) variables. On the other hand, stochastic approximation type methods can only deal with a small number of stages, but a large number of decision variables.


Key words: Stochastic Programming, Stochastic Optimal Control, Markov Decision Process, dynamic programming, risk measures, Stochastic Dual Dynamic Programming, Stochastic Approximation method, cutting planes algorithm.

AMS subject classifications: 65K05, 90C15, 90C39, 90C40.

1 Introduction

Traditionally, different communities of researchers dealt with optimization problems involving uncertainty, modelled in stochastic terms, using different terminology and modeling frameworks. In this respect we can point to the fields of Stochastic Programming (SP), Stochastic Optimal Control (SOC) and Markov Decision Processes (MDP). Historically, the developments in SP on the one hand, and SOC and MDP on the other, went along different directions with different modeling frameworks and solution methods. In this paper we mainly concentrate on the SP and SOC approaches. In these frameworks there are natural situations when the considered problems are convex. In continuous optimization, convexity is the main consideration allowing the development of efficient numerical algorithms [30].

The main goal of this paper is to present some recent developments in numerical approaches to solving convex optimization problems involving sequential decision making. We do not try to give a comprehensive review of the subject with a complete list of references, etc. Rather the aim is to present a certain point of view about some recent developments in solving convex multistage stochastic problems.

The SOC modeling is convenient for demonstrating some basic ideas since, on one hand, it can be naturally written in MDP terms and, on the other hand, it can be formulated in the Stochastic Programming framework. Stochastic Programming (SP) has a long history. Two stage stochastic programming (with recourse) was introduced in Dantzig [8] and Beale [2], and was intrinsically connected with linear programming. From the beginning SP was aimed at numerical solutions. Until about twenty years ago, the modeling approach to two and multistage SP was predominantly based on the construction of scenarios represented by scenario trees. This approach allows one to formulate the so-called deterministic equivalent optimization problem, with the number of decision variables more or less proportional to the number of scenarios. When the deterministic equivalent could be represented as a linear program, such problems were considered to be numerically solvable. Because of that, the topic of SP was often viewed as large scale linear programming. For a presentation of these developments we refer to Birge [6] and references therein.

From the point of view of the scenario construction approach there is not much difference between two stage and multistage SP. In both cases the numerical effort in solving the deterministic equivalent is more or less proportional to the number of generated scenarios. This view of SP started to change with the development of randomization methods and the sample complexity theory. From the point of view of solving the deterministic equivalent, even two stage linear stochastic programs are computationally intractable; their computational complexity is #P-hard for a sufficiently high accuracy (cf., [10],[16]). On the other hand, under reasonable assumptions, the number of randomly generated scenarios (by Monte Carlo sampling techniques) required to solve two stage SP problems with accuracy \varepsilon>0 and high probability is of order O(\varepsilon^{-2}), [47, Section 5.3]. While randomization methods were reasonably successful in solving two stage problems, as far as multistage SP is concerned the situation is different. The number of scenarios needed to solve multistage SP problems grows exponentially with the number of stages, [50],[47, Section 5.8.2].

The classical approach to sequential optimization is based on dynamic programming, [3]. Dynamic programming also has a long history and is at the heart of the SOC and MDP modeling. It suffers from the so-called “Curse of Dimensionality”, the term coined by Bellman: its computational complexity increases exponentially with the dimension of the state variables. There is a large literature trying to deal with this problem by using various approximations of dynamic programming equations (e.g., [37]). Most of these methods are heuristics and do not give verifiable guarantees for the accuracy of the obtained solutions.

Recent progress in solving convex multistage SP problems is based on cutting planes approximations of the cost-to-go (value) functions of dynamic programming equations. These methods make it possible to estimate the error of the computed solution. Cutting planes type algorithms in dynamical settings are one of the main topics of this paper. We discuss such algorithms in the frameworks of SP and SOC. Moreover, we will also present extensions of stochastic approximation (a.k.a. stochastic gradient descent) type methods [38, 29, 28, 21] for multistage stochastic optimization, referred to as dynamic stochastic approximation in [24]. From the computational complexity point of view, these two types of methods seem to be complementary to each other in the following sense. Cutting plane type methods can handle multistage problems with a large number of stages, but a relatively smaller number of state (decision) variables. On the other hand, stochastic approximation type methods can only deal with a small number of stages, but a large number of decision variables. These methods share the following common features: (a) both methods utilize the convex structure of the cost-to-go (value) functions of dynamic programming equations, (b) both methods do not require explicit discretization of the state space, (c) both methods guarantee convergence to global optimality, (d) rates of convergence for both methods have been established.

We use the following notation and terminology throughout the paper. For a\in{\mathbb{R}} we denote [a]_{+}:=\max\{0,a\}. Unless stated otherwise, \|\cdot\| denotes the Euclidean norm in {\mathbb{R}}^{n}. By \mathop{\rm dist}(x,S):=\inf_{y\in S}\|x-y\| we denote the distance from a point x\in{\mathbb{R}}^{n} to a set S\subset{\mathbb{R}}^{n}. We write x^{\top}y or \langle x,y\rangle for the scalar product \sum_{i=1}^{n}x_{i}y_{i} of vectors x,y\in{\mathbb{R}}^{n}. It is said that a set S\subset{\mathbb{R}}^{n} is polyhedral if it can be represented by a finite number of affine constraints; it is said that a function f:{\mathbb{R}}^{n}\to{\mathbb{R}} is polyhedral if it can be represented as the maximum of a finite number of affine functions. For a process \xi_{1},\xi_{2},..., we denote by \xi_{[t]}=(\xi_{1},...,\xi_{t}) its history up to time t. By {\mathbb{E}}_{|X}[\,\cdot\,] we denote the conditional expectation, conditional on random variable (random vector) X. We use the same notation \xi_{t} for the random vector and for its particular realization (vector variable); the particular meaning will be clear from the context. For a probability space (\Omega,{\cal F},{\mathbb{P}}), by L_{p}(\Omega,{\cal F},{\mathbb{P}}), p\in[1,\infty), we denote the space of random variables Z:\Omega\to{\mathbb{R}} having finite p-th order moment, i.e., such that \int|Z|^{p}d{\mathbb{P}}<\infty. Equipped with the norm \|Z\|_{p}:=\left(\int|Z|^{p}d{\mathbb{P}}\right)^{1/p}, L_{p}(\Omega,{\cal F},{\mathbb{P}}) becomes a Banach space. The dual of {\cal Z}:=L_{p}(\Omega,{\cal F},{\mathbb{P}}) is the space {\cal Z}^{*}=L_{q}(\Omega,{\cal F},{\mathbb{P}}) with q\in(1,\infty] such that 1/p+1/q=1.

2 Stochastic Programming

In Stochastic Programming (SP) the T-stage optimization problem can be written as

\min\limits_{x_{t}\in{\cal X}_{t}}\;{\mathbb{E}}\Big[\sum_{t=1}^{T}f_{t}(x_{t},\xi_{t})\Big] \quad (2.1)
{\rm s.t.}\;\;B_{t}x_{t-1}+A_{t}x_{t}=b_{t},\;t=1,...,T. \quad (2.2)

Here ft:nt×dtf_{t}:{\mathbb{R}}^{n_{t}}\times{\mathbb{R}}^{d_{t}}\to{\mathbb{R}} are objective functions, ξtdt\xi_{t}\in{\mathbb{R}}^{d_{t}} are random vectors, 𝒳tnt{\cal X}_{t}\subset{\mathbb{R}}^{n_{t}} are nonempty closed sets, Bt=Bt(ξt)B_{t}=B_{t}(\xi_{t}), At=At(ξt)A_{t}=A_{t}(\xi_{t}) and bt=bt(ξt)b_{t}=b_{t}(\xi_{t}) are matrix and vector functions of ξt\xi_{t}, t=1,,Tt=1,...,T, with B1=0B_{1}=0.

Constraints (2.2) often represent some type of balance equations between successive stages, while the set 𝒳t{\cal X}_{t} can be viewed as representing local constraints at time tt. It is possible to consider balance equations in a more general form, nevertheless formulation (2.2) will be sufficient for our purpose of discussing convex problems. In particular, if the objective functions ft(xt,ξt):=ctxtf_{t}(x_{t},\xi_{t}):=c_{t}^{\top}x_{t} are linear, with ct=ct(ξt)c_{t}=c_{t}(\xi_{t}), and the sets 𝒳t{\cal X}_{t} are polyhedral, then problem (2.1) - (2.2) becomes a linear multistage stochastic program.

The sequence of random vectors ξ1,\xi_{1},..., is viewed as a data process. Unless stated otherwise we make the following assumption throughout the paper.

Assumption 2.1

The probability distribution of the random process \xi_{t}, t=1,..., does not depend on our decisions, i.e., it is independent of the chosen decision policy.

At every time period t=1,..., we have information about the (observed) realization \xi_{[t]}=(\xi_{1},...,\xi_{t}) of the data process. Naturally our decisions should be based on the information available at the time of the decision and should not depend on the future unknown values \xi_{t+1},...,\xi_{T}; this is the so-called principle of nonanticipativity. That is, the decision variables x_{t}=x_{t}(\xi_{[t]}) are functions of the data process, and a sequence of such functions for every stage t=1,...,T, is called a policy or a decision rule. The optimization in (2.1) - (2.2) is performed over policies satisfying the feasibility constraints. The feasibility constraints of problem (2.1) - (2.2) should be satisfied with probability one, and the expectation is taken with respect to the probability distribution of the random vector \xi_{[T]}=(\xi_{1},...,\xi_{T}). It could be noted that the dependence of decision variables x_{t}, as well as of the parameters (B_{t},A_{t},b_{t}), on the data process is often suppressed in the notation. It is worthwhile to emphasize that, because of Assumption 2.1, it suffices to consider policies depending only on the history of the data process, without the history of the decisions.

The first stage decision x1x_{1} is of the main interest and assumed to be deterministic, i.e., made before observing future realization of the data process. In that framework vector ξ1\xi_{1} and the corresponding parameters (A1,b1)(A_{1},b_{1}) are deterministic, i.e., their values are known at the time of first stage decision. Also the first stage objective function f1(x1)=f1(x1,ξ1)f_{1}(x_{1})=f_{1}(x_{1},\xi_{1}) is a function of x1x_{1} alone.

We can write the following dynamic programming equations for problem (2.1) - (2.2) (e.g., [47, Section 3.1]). At the last stage the cost-to-go (value) function is

QT(xT1,ξT)=infxT𝒳T{fT(xT,ξT):BTxT1+ATxT=bT}Q_{T}(x_{T-1},\xi_{T})=\inf_{x_{T}\in{\cal X}_{T}}\left\{f_{T}(x_{T},\xi_{T}):B_{T}x_{T-1}+A_{T}x_{T}=b_{T}\right\} (2.3)

and then going backward in time for t=T1,,2t=T-1,...,2, the cost-to-go function is

Qt(xt1,ξ[t])=infxt𝒳t{ft(xt,ξt)+𝔼|ξ[t][Qt+1(xt,ξ[t+1])]:Btxt1+Atxt=bt}.Q_{t}(x_{t-1},\xi_{[t]})=\inf_{x_{t}\in{\cal X}_{t}}\left\{f_{t}(x_{t},\xi_{t})+{\mathbb{E}}_{|\xi_{[t]}}[Q_{t+1}(x_{t},\xi_{[t+1]})]:B_{t}x_{t-1}+A_{t}x_{t}=b_{t}\right\}. (2.4)

The optimal value of problem (2.1) - (2.2) is given by the optimal value of the first stage problem

minx1𝒳1f1(x1)+𝔼[Q2(x1,ξ2)]s.t.A1x1=b1.\min\limits_{x_{1}\in{\cal X}_{1}}f_{1}(x_{1})+{\mathbb{E}}\left[Q_{2}(x_{1},\xi_{2})\right]\;{\rm s.t.}\;A_{1}x_{1}=b_{1}. (2.5)

The optimization in (2.3) - (2.4) is over (nonrandom) variables xtntx_{t}\in{\mathbb{R}}^{n_{t}} (recall that Bt=Bt(ξt)B_{t}=B_{t}(\xi_{t}), At=At(ξt)A_{t}=A_{t}(\xi_{t}) and bt=bt(ξt)b_{t}=b_{t}(\xi_{t}) are functions of ξt\xi_{t}). The corresponding optimal policy is defined by x¯t=x¯t(ξ[t])\bar{x}_{t}=\bar{x}_{t}(\xi_{[t]}), t=2,,Tt=2,...,T, where

x¯targminxt𝒳t{ft(xt,ξt)+𝔼|ξ[t][Qt+1(xt,ξ[t+1])]:Btx¯t1+Atxt=bt},\bar{x}_{t}\in\mathop{\rm arg\,min}_{x_{t}\in{\cal X}_{t}}\left\{f_{t}(x_{t},\xi_{t})+{\mathbb{E}}_{|\xi_{[t]}}[Q_{t+1}(x_{t},\xi_{[t+1]})]:B_{t}\bar{x}_{t-1}+A_{t}x_{t}=b_{t}\right\}, (2.6)

with the expectation term of Q_{T+1}(\cdot,\cdot) at the last stage omitted. Of main interest is the first stage solution given by the optimal solution \bar{x}_{1} of the first stage problem (2.5).

Since x¯t1\bar{x}_{t-1} is a function of ξ[t1]\xi_{[t-1]}, the cost-to-go function Qt(x¯t1,ξ[t])Q_{t}(\bar{x}_{t-1},\xi_{[t]}) can be considered as a function of ξ[t]\xi_{[t]}. It represents the optimal value of problem

minxτ𝒳τ𝔼[τ=t+1Tfτ(xτ,ξτ)]s.t.Bτxτ1+Aτxτ=bτ,τ=t+1,,T,\min\limits_{x_{\tau}\in{\cal X}_{\tau}}{\mathbb{E}}\Big{[}\sum_{\tau=t+1}^{T}f_{\tau}(x_{\tau},\xi_{\tau})\Big{]}\;\;{\rm s.t.}\;B_{\tau}x_{\tau-1}+A_{\tau}x_{\tau}=b_{\tau},\;\tau=t+1,...,T, (2.7)

conditional on realization ξ[t]\xi_{[t]} of the data process. This is the dynamic programming (Bellman) principle of optimality. This is also the motivation for the name “cost-to-go” function. Another name for the cost-to-go functions is “value functions”.

Remark 2.1

Let us recall the following simple result. Consider an extended real valued function ϕ:n×m{+}\phi:{\mathbb{R}}^{n}\times{\mathbb{R}}^{m}\to{\mathbb{R}}\cup\{+\infty\}. We have that if ϕ(x,y)\phi(x,y) is a convex function of (x,y)n×m(x,y)\in{\mathbb{R}}^{n}\times{\mathbb{R}}^{m}, then the min-value function φ(x):=infymϕ(x,y)\varphi(x):=\inf_{y\in{\mathbb{R}}^{m}}\phi(x,y) is also convex. This can be applied to establish convexity of the cost-to-go functions. That is, suppose that the function fT(xT,ξT)f_{T}(x_{T},\xi_{T}) is convex in xTx_{T} and the set 𝒳T{\cal X}_{T} is convex. Then the extended real valued function taking value fT(xT,ξT)f_{T}(x_{T},\xi_{T}) if xT𝒳Tx_{T}\in{\cal X}_{T} and BTxT1+ATxT=bTB_{T}x_{T-1}+A_{T}x_{T}=b_{T}, and value ++\infty otherwise, is convex. Since the cost-to-go function QT(xT1,ξT)Q_{T}(x_{T-1},\xi_{T}) can be viewed as the min-function of this extended real valued function, it follows that it is convex in xT1x_{T-1}. Also note that if the cost-to-go function Qt+1(xt,ξ[t+1])Q_{t+1}(x_{t},\xi_{[t+1]}) is convex in xtx_{t}, then its expected value in (2.4) is also convex. Consequently by induction going backward in time it is not difficult to show the following convexity property of the cost-to-go functions.

Proposition 2.1

If the objective functions ft(xt,ξt)f_{t}(x_{t},\xi_{t}) are convex in xtx_{t} and the sets 𝒳t{\cal X}_{t} are convex, t=1,,Tt=1,...,T, then the cost-to-go functions Qt(xt1,ξ[t])Q_{t}(x_{t-1},\xi_{[t]}), t=2,,Tt=2,...,T, are convex in xt1x_{t-1} and the first stage problem (2.5) is convex. In particular such convexity follows for linear multistage stochastic programs.

This convexity property is crucial for the development of efficient numerical algorithms. It is worthwhile to note that Assumption 2.1 is essential here. Dependence of the distribution of the random data on decisions typically destroys convexity even in the linear case. \hfill\square

The dynamic programming equations reduce the problem from optimization over policies, i.e., over infinite dimensional functional spaces, to evaluation of the cost-to-go functions Q_{t}(x_{t-1},\xi_{[t]}) of finite dimensional vectors. However, the dependence of the cost-to-go functions on the whole history of the data process makes them numerically unmanageable as the number of stages increases. The situation is simplified dramatically if we make the following assumption of stagewise independence.

Assumption 2.2 (Stagewise independence)

The probability distribution of \xi_{t+1} is independent of \xi_{[t]} for t\geq 1.

Under Assumption 2.2 of stagewise independence, the conditional expectations in dynamic equations (2.4) become the corresponding unconditional expectations. It is straightforward to show then by induction, going backward in time, that the dynamic equations (2.4) can be written as

Qt(xt1,ξt)=infxt𝒳t{ft(xt,ξt)+𝒬t+1(xt):Btxt1+Atxt=bt},Q_{t}(x_{t-1},\xi_{t})=\inf_{x_{t}\in{\cal X}_{t}}\left\{f_{t}(x_{t},\xi_{t})+{\cal Q}_{t+1}(x_{t}):B_{t}x_{t-1}+A_{t}x_{t}=b_{t}\right\}, (2.8)

where

𝒬t+1(xt):=𝔼[Qt+1(xt,ξt+1)],{\cal Q}_{t+1}(x_{t}):={\mathbb{E}}[Q_{t+1}(x_{t},\xi_{t+1})], (2.9)

with the expectation taken with respect to the marginal distribution of ξt+1\xi_{t+1}. In that setting one needs to keep track of the (expectation) cost-to-go functions 𝒬t+1(xt){\cal Q}_{t+1}(x_{t}) only. The optimal policy is defined by the equations

x¯targminxt𝒳t{ft(xt,ξt)+𝒬t+1(xt):Btx¯t1+Atxt=bt}.\bar{x}_{t}\in\mathop{\rm arg\,min}_{x_{t}\in{\cal X}_{t}}\left\{f_{t}(x_{t},\xi_{t})+{\cal Q}_{t+1}(x_{t}):B_{t}\bar{x}_{t-1}+A_{t}x_{t}=b_{t}\right\}. (2.10)

Note that here the optimal policy values can be computed iteratively, starting from the optimal solution \bar{x}_{1} of the first stage problem, and then sequentially computing a minimizer on the right hand side of (2.10) going forward in time for t=2,...,T. Of course, this procedure requires availability of the cost-to-go functions {\cal Q}_{t+1}(x_{t}).

It is worthwhile to note that here the optimal policy decision \bar{x}_{t} can be viewed as a function of \bar{x}_{t-1} and \xi_{t}, for t=2,...,T. Of course, \bar{x}_{t-1} is a function of \bar{x}_{t-2} and \xi_{t-1}, and so on. So eventually \bar{x}_{t} can be represented as a function of the history \xi_{[t]} of the data process. Assumption 2.1 was crucial for this conclusion, since then we need to consider only the history of the data process rather than also the history of our decisions. This is an important point; we will return to it later.

Remark 2.2

In the SP setting the process \xi_{t} is viewed as a data process, and the assumption of stagewise independence could be unrealistic. In order to model stagewise dependence we assume the following Markovian property of the data process: the conditional distribution of \xi_{t+1} given \xi_{[t]} is the same as the conditional distribution of \xi_{t+1} given \xi_{t}. In such Markovian setting the dynamic equations (2.4) become

Qt(xt1,ξt)=infxt𝒳t{ft(xt,ξt)+𝔼|ξt[Qt+1(xt,ξt+1)]:Btxt1+Atxt=bt}.Q_{t}(x_{t-1},\xi_{t})=\inf_{x_{t}\in{\cal X}_{t}}\left\{f_{t}(x_{t},\xi_{t})+{\mathbb{E}}_{|\xi_{t}}[Q_{t+1}(x_{t},\xi_{t+1})]:B_{t}x_{t-1}+A_{t}x_{t}=b_{t}\right\}. (2.11)

That is, compared with the stagewise independent case, in the Markovian setting the expected cost-to-go function

𝒬t+1(xt,ξt)=𝔼|ξt[Qt+1(xt,ξt+1)]{\cal Q}_{t+1}(x_{t},\xi_{t})={\mathbb{E}}_{|\xi_{t}}[Q_{t+1}(x_{t},\xi_{t+1})] (2.12)

also depends on ξt\xi_{t}. \hfill\square

3 Stochastic Optimal Control

Consider the classical Stochastic Optimal Control (SOC) (discrete time, finite horizon) model (e.g., Bertsekas and Shreve [4]):

\min\limits_{u_{t}\in{\cal U}_{t}(x_{t})}\;{\mathbb{E}}^{\pi}\Big[\sum_{t=1}^{T}c_{t}(x_{t},u_{t},\xi_{t})+c_{T+1}(x_{T+1})\Big], \quad (3.1)
{\rm s.t.}\;\;x_{t+1}=F_{t}(x_{t},u_{t},\xi_{t}),\;t=1,...,T. \quad (3.2)

Variables x_{t}\in{\mathbb{R}}^{n_{t}}, t=1,...,T+1, represent the state of the system, u_{t}\in{\mathbb{R}}^{m_{t}}, t=1,...,T, are controls, \xi_{t}\in{\mathbb{R}}^{d_{t}}, t=1,...,T, are random vectors, c_{t}:{\mathbb{R}}^{n_{t}}\times{\mathbb{R}}^{m_{t}}\times{\mathbb{R}}^{d_{t}}\to{\mathbb{R}}, t=1,...,T, are cost functions, c_{T+1}(x_{T+1}) is a final cost function, F_{t}:{\mathbb{R}}^{n_{t}}\times{\mathbb{R}}^{m_{t}}\times{\mathbb{R}}^{d_{t}}\to{\mathbb{R}}^{n_{t+1}} are (measurable) mappings and {\cal U}_{t}(x_{t}) is a (measurable) multifunction mapping x_{t}\in{\mathbb{R}}^{n_{t}} to a subset of {\mathbb{R}}^{m_{t}}. Values x_{1} and \xi_{0} are deterministic (initial conditions); it is also possible to view x_{1} as random with a given distribution, but this is not essential for the following discussion. The optimization in (3.1) - (3.2) is performed over policies \pi determined by decisions u_{t} and state variables x_{t}. This is emphasized in the notation {\mathbb{E}}^{\pi}; we will discuss this below.

Unless stated otherwise we assume that probability distribution of the random process ξt\xi_{t} does not depend on our decisions (Assumption 2.1). We can consider problem (3.1)-(3.2) in the framework of SP if we view yt=(xt,ut)y_{t}=(x_{t},u_{t}), t=1,,Tt=1,...,T, as decision variables. Suppose that the mapping Ft(xt,ut,ξt)F_{t}(x_{t},u_{t},\xi_{t}) is affine, i.e.,

Ft(xt,ut,ξt):=Atxt+Btut+bt,t=1,,T,F_{t}(x_{t},u_{t},\xi_{t}):=A_{t}x_{t}+B_{t}u_{t}+b_{t},\;t=1,...,T, (3.3)

where At=At(ξt)A_{t}=A_{t}(\xi_{t}), Bt=Bt(ξt)B_{t}=B_{t}(\xi_{t}) and bt=bt(ξt)b_{t}=b_{t}(\xi_{t}) are matrix and vector valued functions of ξt\xi_{t}. The constraints ut𝒰t(xt)u_{t}\in{\cal U}_{t}(x_{t}) can be viewed as local constraints in the SP framework; in particular if 𝒰t(xt)𝒰t{\cal U}_{t}(x_{t})\equiv{\cal U}_{t} are independent of xtx_{t}, then 𝒰t{\cal U}_{t} can be viewed as a counterpart of the set 𝒳t{\cal X}_{t} in problem (2.1) - (2.2). For the affine mapping of the form (3.3), the state equations (3.2) become

xt+1AtxtBtut=bt,t=1,,T.x_{t+1}-A_{t}x_{t}-B_{t}u_{t}=b_{t},\;t=1,...,T. (3.4)

These state equations are linear in yt=(xt,ut)y_{t}=(x_{t},u_{t}) and can be viewed as a particular case of the balance equations (2.2) of the stochastic program (2.1) - (2.2). Note, however, that here decision utu_{t} should be made before realization of ξt\xi_{t} becomes known, and xtx_{t} and utu_{t} are functions of ξ[t1]\xi_{[t-1]} rather than ξ[t]\xi_{[t]}. We emphasize that xtx_{t} and utu_{t} can be considered as functions of the history ξ[t1]\xi_{[t-1]} of the random process ξt\xi_{t}, without dependence on the history of the decisions. As in the SP framework, this follows from the dynamic programming equations discussed below, and Assumption 2.1 is essential for that conclusion.

There is a time shift in equations (3.4) as compared with the SP equations (2.2). We emphasize that in both the SOC and SP approaches the decisions are made based on the information available at the time of the decision; this is the principle of nonanticipativity. Because of the time shift, the dynamic programming equations for the SOC problem are slightly different from the respective equations in SP. More importantly, in the SOC framework there is a clear separation between the state and control variables. This will have an important consequence for the respective dynamic programming equations, which we consider next.

Similar to SP, the dynamic programming equations for the SOC problem (3.1) - (3.2) can be written as follows. At the last stage, the value function VT+1(xT+1)=cT+1(xT+1)V_{T+1}(x_{T+1})=c_{T+1}(x_{T+1}) and, going backward in time for t=T,,1t=T,...,1, the value functions

Vt(xt,ξ[t1])=infut𝒰t(xt)xt+1=Ft(xt,ut,ξt)𝔼|ξ[t1][ct(xt,ut,ξt)+Vt+1(xt+1,ξ[t])].V_{t}(x_{t},\xi_{[t-1]})=\inf\limits_{{u_{t}\in{\cal U}_{t}(x_{t})}\atop{x_{t+1}=F_{t}(x_{t},u_{t},\xi_{t})}}{\mathbb{E}}_{|\xi_{[t-1]}}\left[c_{t}(x_{t},u_{t},\xi_{t})+V_{t+1}\big{(}x_{t+1},\xi_{[t]}\big{)}\right]. (3.5)

The optimization in the right hand side of (3.5) is performed jointly in utu_{t} and xt+1x_{t+1}. The optimal value of the SOC problem (3.1)-(3.2) is given by the first stage value function V1(x1)V_{1}(x_{1}), and can be viewed as a function of the initial conditions x1x_{1}. Note that because of the state equation (3.2), equations (3.5) can be written in the following equivalent form

Vt(xt,ξ[t1])=infut𝒰t(xt)𝔼|ξ[t1][ct(xt,ut,ξt)+Vt+1(Ft(xt,ut,ξt),ξ[t])].V_{t}(x_{t},\xi_{[t-1]})=\inf\limits_{u_{t}\in{\cal U}_{t}(x_{t})}{\mathbb{E}}_{|\xi_{[t-1]}}\left[c_{t}(x_{t},u_{t},\xi_{t})+V_{t+1}\big{(}F_{t}(x_{t},u_{t},\xi_{t}),\xi_{[t]}\big{)}\right]. (3.6)

The corresponding optimal policy is defined by

(u¯t,x¯t+1)argminut𝒰t(x¯t)xt+1=Ft(x¯t,ut,ξt)𝔼|ξ[t1][ct(x¯t,ut,ξt)+Vt+1(xt+1,ξ[t])].(\bar{u}_{t},\bar{x}_{t+1})\in\mathop{\rm arg\,min}\limits_{{u_{t}\in{\cal U}_{t}(\bar{x}_{t})}\atop{x_{t+1}=F_{t}(\bar{x}_{t},u_{t},\xi_{t})}}{\mathbb{E}}_{|\xi_{[t-1]}}\left[c_{t}(\bar{x}_{t},u_{t},\xi_{t})+V_{t+1}\big{(}x_{t+1},\xi_{[t]}\big{)}\right]. (3.7)

Similar to SP, it follows that \bar{u}_{t} can be considered as a function of \xi_{[t-1]}, and \bar{x}_{t+1} as a function of \xi_{[t]}. That is, (\bar{u}_{t},\bar{x}_{t}) is a function of \xi_{[t-1]}; this was already mentioned in the previous paragraphs. Again Assumption 2.1 is essential for the above derivations and conclusions.

Remark 3.1

Let us discuss convexity of the value functions. Consider graph of the multifunction 𝒰t(){\cal U}_{t}(\cdot):

𝖦𝗋(𝒰t):={(xt,ut):ut𝒰t(xt),xtnt}.{\sf Gr}({\cal U}_{t}):=\left\{(x_{t},u_{t}):u_{t}\in{\cal U}_{t}(x_{t}),\;x_{t}\in{\mathbb{R}}^{n_{t}}\right\}. (3.8)

Note that minimization minut𝒰t(xt)φ(xt,ut)\min_{u_{t}\in{\cal U}_{t}(x_{t})}\varphi(x_{t},u_{t}) of a function φ:nt×mt{+}\varphi:{\mathbb{R}}^{n_{t}}\times{\mathbb{R}}^{m_{t}}\to{\mathbb{R}}\cup\{+\infty\} can equivalently be written as minut𝒰t(xt)φ^(xt,ut)\min_{u_{t}\in{\cal U}_{t}(x_{t})}\hat{\varphi}(x_{t},u_{t}), where function φ^(xt,ut)\hat{\varphi}(x_{t},u_{t}) coincides with φ(xt,ut)\varphi(x_{t},u_{t}) when ut𝒰t(xt)u_{t}\in{\cal U}_{t}(x_{t}), and is ++\infty otherwise. Of course, the constraint ut𝒰t(xt)u_{t}\in{\cal U}_{t}(x_{t}) is the same as (xt,ut)𝖦𝗋(𝒰t)(x_{t},u_{t})\in{\sf Gr}({\cal U}_{t}). If function φ(xt,ut)\varphi(x_{t},u_{t}) is convex and the set 𝖦𝗋(𝒰t){\sf Gr}({\cal U}_{t}) is convex, then the corresponding function φ^(xt,ut)\hat{\varphi}(x_{t},u_{t}) is convex. Consequently in that case the min-function minut𝒰t(xt)φ(xt,ut)\min_{u_{t}\in{\cal U}_{t}(x_{t})}\varphi(x_{t},u_{t}) is a convex function of xtx_{t}. By applying this observation and using induction going backward in time, it is not difficult to show that value functions Vt(xt,ξ[t1])V_{t}\big{(}x_{t},\xi_{[t-1]}\big{)} are convex in xtx_{t}, t=1,,Tt=1,...,T, if the following assumption holds.

Assumption 3.1

The function cT+1()c_{T+1}(\cdot) is convex, and for every t=1,,T,t=1,...,T, the cost function ct(xt,ut,ξt)c_{t}(x_{t},u_{t},\xi_{t}) is convex in (xt,ut)(x_{t},u_{t}), the mapping FtF_{t} is affine of the form (3.3), the graph 𝖦𝗋(𝒰t){\sf Gr}({\cal U}_{t}) is a convex subset of nt×mt{\mathbb{R}}^{n_{t}}\times{\mathbb{R}}^{m_{t}}, t=1,,Tt=1,...,T.

Note that, in particular, 𝖦𝗋(𝒰t){\sf Gr}({\cal U}_{t}) is convex if 𝒰t(xt)𝒰t{\cal U}_{t}(x_{t})\equiv{\cal U}_{t} is independent of xtx_{t} and the set 𝒰t{\cal U}_{t} is convex. When 𝒰t(xt){\cal U}_{t}(x_{t}) is dependent on xtx_{t}, we need some constructive way to represent it in order to ensure convexity of 𝖦𝗋(𝒰t){\sf Gr}({\cal U}_{t}). For example suppose that the feasibility sets are defined by inequality constraints as:

𝒰t(xt):={utmt:gt(xt,ut)0},{\cal U}_{t}(x_{t}):=\{u_{t}\in{\mathbb{R}}^{m_{t}}:g_{t}(x_{t},u_{t})\leq 0\}, (3.9)

where gt:nt×mttg_{t}:{\mathbb{R}}^{n_{t}}\times{\mathbb{R}}^{m_{t}}\to{\mathbb{R}}^{\ell_{t}}. Then 𝖦𝗋(𝒰t)={(xt,ut):gt(xt,ut)0}{\sf Gr}({\cal U}_{t})=\{(x_{t},u_{t}):g_{t}(x_{t},u_{t})\leq 0\}. Consequently if every component of mapping gt(xt,ut)g_{t}(x_{t},u_{t}) is a convex function, then 𝖦𝗋(𝒰t){\sf Gr}({\cal U}_{t}) is convex. \hfill\square

In the SOC framework the random process \xi_{1},..., is often viewed as noise or disturbances (note that here \xi_{1} is also random). In such cases the assumption of stagewise independence (Assumption 2.2) is natural. In the stagewise independent case the dynamic programming equations (3.6) simplify, with the value functions V_{t}(x_{t}), t=1,...,T, being functions of the state variables only, that is

Vt(xt)=infut𝒰t(xt)𝔼[ct(xt,ut,ξt)+Vt+1(Ft(xt,ut,ξt))],V_{t}(x_{t})=\inf\limits_{u_{t}\in{\cal U}_{t}(x_{t})}{\mathbb{E}}\left[c_{t}(x_{t},u_{t},\xi_{t})+V_{t+1}\big{(}F_{t}(x_{t},u_{t},\xi_{t})\big{)}\right], (3.10)

where the expectation is taken with respect to the marginal distribution of ξt\xi_{t}. Consequently the optimal policy is defined by the optimal controls u¯t=πt(xt)\bar{u}_{t}=\pi_{t}(x_{t}), where

u¯targminut𝒰t(xt)𝔼[ct(xt,ut,ξt)+Vt+1(Ft(xt,ut,ξt))].\bar{u}_{t}\in\mathop{\rm arg\,min}_{u_{t}\in{\cal U}_{t}(x_{t})}{\mathbb{E}}\left[c_{t}(x_{t},u_{t},\xi_{t})+V_{t+1}\big{(}F_{t}(x_{t},u_{t},\xi_{t})\big{)}\right]. (3.11)

That is, in the stagewise independent case it suffices to consider policies with controls of the form u_{t}=\pi_{t}(x_{t}).

Remark 3.2

Suppose that the distribution of \xi_{t} can depend on (x_{t},u_{t}), i.e., the probability distribution P_{t}(\cdot|x_{t},u_{t}) of \xi_{t} is conditional on (x_{t},u_{t}). That is, the distribution of the random process is affected by our decisions and consequently Assumption 2.1 is not satisfied. Under the stagewise independence Assumption 2.2, the dynamic programming equations (3.10) still hold and the optimal policy is determined by (3.11). However in that case \bar{u}_{t} depends on the history (\bar{x}_{1},\bar{u}_{1},...,\bar{u}_{t-1},\bar{x}_{t}) of the decision process. This history depends on the decisions (on the considered policy) and cannot be represented in terms of the process \xi_{t} alone. It could also be noted that in the stagewise independence case, SOC can be formulated as a Markov Decision Process (MDP) with transition probabilities P_{t}(\cdot|x_{t},u_{t}) determined by the state equations (3.2). We will not pursue the MDP formulation in the following discussion, and will rather deal with the SOC model instead. \hfill\square

As an example let us consider the classical inventory problem.

Example 3.1 (Inventory problem)

Consider problem [54],

\min\limits_{u_{t}\geq 0}\;{\mathbb{E}}\Big[\sum_{t=1}^{T}c_{t}u_{t}+\psi_{t}(x_{t}+u_{t},D_{t})\Big] \quad (3.12)
{\rm s.t.}\;\;x_{t+1}=x_{t}+u_{t}-D_{t},\;t=1,...,T, \quad (3.13)

where ct,bt,htc_{t},b_{t},h_{t} are the ordering cost, backorder penalty cost and holding cost per unit, respectively, xtx_{t} is the current inventory level, utu_{t} is the order quantity, DtD_{t} is the demand at time t=1,,Tt=1,...,T, and

ψt(yt,dt):=bt[dtyt]++ht[ytdt]+.\psi_{t}(y_{t},d_{t}):=b_{t}[d_{t}-y_{t}]_{+}+h_{t}[y_{t}-d_{t}]_{+}.

In that format x_{t} are the state variables, u_{t} are the controls, D_{t} are the random disturbances, and {\cal U}_{t}:={\mathbb{R}}_{+} is the feasibility set (independent of the state variables). At time t=1 the inventory level x_{1} and demand value D_{0} are known (initial conditions). The decision about u_{t} at stage t=1,...,T, should be made before a realization of the demand D_{t} becomes known. That is, at time t the history D_{[t-1]} of the demand process is available, but the future realizations D_{t},...,D_{T} are not known.

By making the change of variables y_{t}=x_{t}+u_{t}, problem (3.12) - (3.13) is often written in the following equivalent form

\min\limits_{y_{t}\geq x_{t}}\;{\mathbb{E}}\Big[\sum_{t=1}^{T}c_{t}(y_{t}-x_{t})+\psi_{t}(y_{t},D_{t})\Big] \quad (3.14)
{\rm s.t.}\;\;x_{t+1}=y_{t}-D_{t},\;t=1,...,T. \quad (3.15)

Problem (3.14) -(3.15) can be viewed as an SP problem in decision variables yty_{t} and xtx_{t}.

It is convenient to write the dynamic programming equations for the inventory problem in the form (3.14) - (3.15); of course, this can be reformulated for the form (3.12) - (3.13) as well. As before we assume here that the distribution of the demand process does not depend on our decisions. At stage t=T,...,1, the value functions satisfy the following equation

Vt(xt,D[t1])=infytxt{ct(ytxt)+𝔼|D[t1][ψt(yt,Dt)+Vt+1(ytDt,D[t])]},V_{t}(x_{t},D_{[t-1]})=\inf\limits_{y_{t}\geq x_{t}}\left\{c_{t}(y_{t}-x_{t})+{\mathbb{E}}_{|D_{[t-1]}}\big{[}\psi_{t}(y_{t},D_{t})+V_{t+1}\left(y_{t}-D_{t},D_{[t]}\right)\big{]}\right\}, (3.16)

with VT+1()0V_{T+1}(\cdot)\equiv 0. The objective functions ct(ytxt)+ψt(yt,Dt)c_{t}(y_{t}-x_{t})+\psi_{t}(y_{t},D_{t}) are convex in (xt,yt)(x_{t},y_{t}) and hence value functions Vt(xt,D[t1])V_{t}(x_{t},D_{[t-1]}) are convex in xtx_{t} (see Remark 3.1).

The optimal policy of how much to order at stage tt is given by u¯t=y¯tx¯t\bar{u}_{t}=\bar{y}_{t}-\bar{x}_{t}, where

y¯targminytx¯t{ct(ytx¯t)+𝔼|D[t1][ψt(yt,Dt)+Vt+1(ytDt,D[t])]}.\bar{y}_{t}\in\mathop{\rm arg\,min}\limits_{y_{t}\geq\bar{x}_{t}}\left\{c_{t}(y_{t}-\bar{x}_{t})+{\mathbb{E}}_{|D_{[t-1]}}\big{[}\psi_{t}(y_{t},D_{t})+V_{t+1}\left(y_{t}-D_{t},D_{[t]}\right)\big{]}\right\}. (3.17)

That is, starting with the initial value \bar{x}_{1}=x_{1}, the value \bar{y}_{1} is computed as an optimal solution of the minimization problem on the right hand side of (3.17) for t=1 (at the first stage, since D_{0} is deterministic, the conditional expectation in (3.17) is just the unconditional expectation with respect to D_{1}). Consequently the next inventory level \bar{x}_{2}=\bar{y}_{1}-D_{1} is computed given an observed realization of D_{1}. Then \bar{y}_{2} is computed as an optimal solution of the minimization problem on the right hand side of (3.17) for t=2, the inventory level is updated as \bar{x}_{3}=\bar{y}_{2}-D_{2}, and so forth going forward in time. This defines an optimal policy with the inventory level and order quantity at stage t being functions of the observed realization of the history D_{[t-1]} of the demand process.

If the demand process DtD_{t} is stagewise independent, then value functions Vt(xt)V_{t}(x_{t}) do not depend on the history of the demand process, and equations (3.16) become

Vt(xt)=infytxt{ct(ytxt)+𝔼[ψt(yt,Dt)+Vt+1(ytDt)]},V_{t}(x_{t})=\inf\limits_{y_{t}\geq x_{t}}\left\{c_{t}(y_{t}-x_{t})+{\mathbb{E}}\big{[}\psi_{t}(y_{t},D_{t})+V_{t+1}\left(y_{t}-D_{t}\right)\big{]}\right\}, (3.18)

with the expectations taken with respect to the marginal distributions of the demand process. By using convexity of the objective function it is possible to show that in the stagewise independence case the so-called basestock policy is optimal. That is, optimal control (optimal order quantity) u¯t\bar{u}_{t} is a function of state xtx_{t}, and is given by

u¯t=max{xt,yt}xt=[ytxt]+,\bar{u}_{t}=\max\{x_{t},y^{*}_{t}\}-x_{t}=[y^{*}_{t}-x_{t}]_{+}, (3.19)

where yty^{*}_{t} is the (unconstrained) minimizer

ytargminyt{ctyt+𝔼[ψt(yt,Dt)+Vt+1(ytDt)]}.y^{*}_{t}\in\mathop{\rm arg\,min}\limits_{y_{t}}\left\{c_{t}y_{t}+{\mathbb{E}}\big{[}\psi_{t}(y_{t},D_{t})+V_{t+1}\left(y_{t}-D_{t}\right)\big{]}\right\}. (3.20)

That is, if at stage t the current inventory level x_{t} is greater than or equal to y^{*}_{t}, then order nothing; otherwise order y^{*}_{t}-x_{t}. Of course, computation of the (critical) values y^{*}_{t} is based on availability of the value functions V_{t}(\cdot). \hfill\square
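To make this concrete, the following sketch computes the last-stage basestock level by minimizing (3.20) with V_{T+1}\equiv 0 over a grid of candidate order-up-to levels, with the expectation replaced by a sample average; the cost parameters and the gamma demand distribution are hypothetical choices made only for this illustration.

```python
import numpy as np

# Sketch of the last-stage basestock computation (3.19)-(3.20) for the
# inventory problem, with V_{T+1} = 0.  The cost parameters and the demand
# distribution are hypothetical, and the expectation in (3.20) is replaced by
# a sample average over demand scenarios.
rng = np.random.default_rng(2)
c, b, h = 1.0, 4.0, 0.5                                  # ordering, backorder, holding costs
demand = rng.gamma(shape=4.0, scale=5.0, size=20_000)    # sample of D_T

def stage_cost(y):
    """Sample estimate of c*y + E[ psi_T(y, D_T) ] for an order-up-to level y."""
    return c * y + np.mean(b * np.maximum(demand - y, 0.0)
                           + h * np.maximum(y - demand, 0.0))

# Approximate minimizer y* of (3.20) by a search over a grid of candidate levels.
grid = np.linspace(0.0, demand.max(), 2_000)
y_star = grid[np.argmin([stage_cost(y) for y in grid])]

# For this newsvendor-type cost, y* is the (b-c)/(b+h) quantile of the demand,
# which gives a quick sanity check of the sample-based minimizer.
print(y_star, np.quantile(demand, (b - c) / (b + h)))

# Basestock policy (3.19): order nothing if x_T >= y*, otherwise order up to y*.
for x in [5.0, 15.0, 40.0]:
    print("inventory", x, "order", max(y_star - x, 0.0))
```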

4 Risk Averse and Distributionally Robust Optimization

In the risk averse approach the expectation functional is replaced by a risk measure. Let (Ω,,)(\Omega,{\cal F},{\mathbb{P}}) be a probability space and 𝒵{\cal Z} be a linear space of measurable functions (random variables) Z:ΩZ:\Omega\to{\mathbb{R}}. Risk measure {\cal R} is a functional assigning a number to random variable Z𝒵Z\in{\cal Z}, that is :𝒵{\cal R}:{\cal Z}\to{\mathbb{R}}. In the risk neutral case, {\cal R} is the expectation operator, i.e., (Z):=𝔼[Z]=ΩZ(ω)𝑑(ω){\cal R}(Z):={\mathbb{E}}[Z]=\int_{\Omega}Z(\omega)d{\mathbb{P}}(\omega). In order for this expectation to be well defined we have to restrict the space of considered random variables; for the expectation it is natural to take the space of integrable functions, i.e., 𝒵:=L1(Ω,,){\cal Z}:=L_{1}(\Omega,{\cal F},{\mathbb{P}}).

It was suggested in Artzner et al [1] that reasonable risk measures should satisfy the following axioms: subadditivity, monotonicity, translation equivariance, positive homogeneity. Risk measures satisfying these axioms were called coherent. For a thorough discussion of coherent risk measures we can refer to [11],[47]. It is possible to show that if the space 𝒵{\cal Z} is a Banach lattice, in particular if 𝒵=Lp(Ω,,){\cal Z}=L_{p}(\Omega,{\cal F},{\mathbb{P}}), then a (real valued) coherent risk measure :𝒵{\cal R}:{\cal Z}\to{\mathbb{R}} is continuous in the norm topology of 𝒵{\cal Z} (cf., [43, Proposition 3.1]). It follows then by the Fenchel - Moreau theorem that {\cal R} has the following dual representation

(Z)=supζ𝔄ΩZ(ω)ζ(ω)𝑑(ω),Z𝒵,{\cal R}(Z)=\sup_{\zeta\in{\mathfrak{A}}}\int_{\Omega}Z(\omega)\zeta(\omega)d{\mathbb{P}}(\omega),\;\;Z\in{\cal Z}, (4.1)

where 𝔄{\mathfrak{A}} is a convex weakly compact subset of 𝒵{\cal Z}^{*}. Moreover, the axioms of coherent risk measures imply that every ζ𝔄\zeta\in{\mathfrak{A}} is a density function, i.e., almost surely ζ(ω)0\zeta(\omega)\geq 0 and Ωζ(ω)𝑑(ω)=1\int_{\Omega}\zeta(\omega)d{\mathbb{P}}(\omega)=1 (cf., [43, Theorem 2.2]).

An important example of coherent risk measure is the Average Value-at-Risk:

{\sf AV@R}_{\alpha}(Z) := (1-\alpha)^{-1}\int_{\alpha}^{1}F^{-1}_{Z}(\tau)d\tau \quad (4.2)
= \inf_{\theta\in{\mathbb{R}}}\left\{\theta+(1-\alpha)^{-1}{\mathbb{E}}[Z-\theta]_{+}\right\},\;\;\alpha\in(0,1] \quad (4.3)

where FZ(z):=(Zz)F_{Z}(z):={\mathbb{P}}(Z\leq z) is the cumulative distribution function (cdf) of ZZ and FZ1(τ):=inf{z:FZ(z)τ}F^{-1}_{Z}(\tau):=\inf\{z:F_{Z}(z)\geq\tau\} is the respective quantile. It is natural in this example to use the space 𝒵=L1(Ω,,){\cal Z}=L_{1}(\Omega,{\cal F},{\mathbb{P}}) and its dual 𝒵=L(Ω,,){\cal Z}^{*}=L_{\infty}(\Omega,{\cal F},{\mathbb{P}}). In various equivalent forms this risk measure was introduced in different contexts by different authors under different names, such as Expected Shortfall, Expected Tail Loss, Conditional Value-at-Risk; variational form (4.3) was suggested in [32],[41]. In the dual form, it has representation (4.1) with

𝔄={ζ𝒵:0ζ(1α)1,Ωζ𝑑=1}.\begin{array}[]{l}{\mathfrak{A}}=\left\{\zeta\in{\cal Z}^{*}:0\leq\zeta\leq(1-\alpha)^{-1},\;\int_{\Omega}\zeta d{\mathbb{P}}=1\right\}.\end{array}
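For a finite sample these two expressions can be computed directly. The following sketch (with an arbitrary normal sample and an arbitrary level \alpha, chosen only for illustration) evaluates {\sf AV@R}_{\alpha} both via the variational form (4.3), whose infimum is attained at an \alpha-quantile of Z, and via the tail-average form (4.2).

```python
import numpy as np

# Sketch of the Average Value-at-Risk of a finite sample, computed via the
# variational formula (4.3) and via the tail-average form (4.2).  The sample
# and the level alpha are arbitrary choices for illustration.
rng = np.random.default_rng(1)
z = rng.normal(size=10_000)          # sample of the random variable Z
alpha = 0.9

def avar_variational(z, alpha):
    """AV@R as inf_theta { theta + (1-alpha)^{-1} E[Z - theta]_+ };
    for a finite sample the infimum is attained at an alpha-quantile of Z."""
    theta = np.quantile(z, alpha)
    return theta + np.mean(np.maximum(z - theta, 0.0)) / (1.0 - alpha)

def avar_tail(z, alpha):
    """AV@R as the average of F_Z^{-1}(tau) over tau in [alpha, 1]; for an
    empirical distribution with n*alpha integer this is the mean of the
    largest n*(1-alpha) observations."""
    k = int(np.ceil(len(z) * (1.0 - alpha)))
    return np.sort(z)[-k:].mean()

print(avar_variational(z, alpha), avar_tail(z, alpha))   # the two values agree
```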

Representation (4.1) can be viewed from the distributionally robust point of view. Recall that if P is a probability measure on (\Omega,{\cal F}) which is absolutely continuous with respect to {\mathbb{P}}, then by the Radon - Nikodym Theorem it has density \zeta=dP/d{\mathbb{P}}. Consider the set {\mathfrak{M}} of probability measures P, absolutely continuous with respect to {\mathbb{P}}, defined as {\mathfrak{M}}:=\{P:dP/d{\mathbb{P}}\in{\mathfrak{A}}\}. Then representation (4.1) of the functional {\cal R} can be written as (by writing {\mathbb{E}}_{P}[Z] we emphasize that the expectation is taken with respect to the probability measure (probability distribution) P)

(Z)=supP𝔐𝔼P[Z].{\cal R}(Z)=\sup_{P\in{\mathfrak{M}}}{\mathbb{E}}_{P}[Z]. (4.4)

In Distributionally Robust Optimization (DRO) it is argued that the “true” distribution is not known, and consequently a set {\mathfrak{M}} of probability measures (distributions) is constructed in some way and referred to as the ambiguity set. In that framework we can view (4.4) as the definition of the corresponding functional. It is straightforward to verify that the obtained functional {\cal R}, defined on an appropriate space of random variables, satisfies the axioms of coherent risk measures. So we have a certain duality relation between the risk averse and distributionally robust approaches to stochastic optimization. However, there is an essential difference in the way the corresponding ambiguity set {\mathfrak{M}} is constructed. In the risk averse approach the set {\mathfrak{M}} consists of probability measures absolutely continuous with respect to a specified (reference) measure {\mathbb{P}}. On the other hand, in the DRO setting such a dominating measure may not be naturally defined. This happens, for instance, in popular settings where the ambiguity set is defined by moment constraints or is the set of probability measures within a prescribed Wasserstein distance from the empirical distribution defined by the data.

In order to extend the risk averse and distributionally robust optimization to the dynamical setting of multistage stochastic optimization, we need to construct a conditional counterpart of the respective functional {\cal R}. In writing dynamic equations (2.4) and (3.6) we used conditional expectations in a somewhat informal manner. For the expectation operator there is a classical definition of conditional expectation. That is, let PP be a probability measure and 𝒢{\cal G} be a sigma subalgebra of {\cal F}. It is said that random variable, denoted 𝔼|𝒢[Z]{\mathbb{E}}_{|{\cal G}}[Z], is the conditional expected value of random variable ZZ given 𝒢{\cal G} if the following two properties hold: (i) 𝔼|𝒢[Z]:Ω{\mathbb{E}}_{|{\cal G}}[Z]:\Omega\to{\mathbb{R}} is 𝒢{\cal G}-measurable, (ii) A𝔼|𝒢[Z](ω)𝑑P(ω)=AZ(ω)𝑑P(ω)\int_{A}{\mathbb{E}}_{|{\cal G}}[Z](\omega)dP(\omega)=\int_{A}Z(\omega)dP(\omega) for any A𝒢A\in{\cal G}. There are many versions of 𝔼|𝒢[Z](ω){\mathbb{E}}_{|{\cal G}}[Z](\omega) which can differ from each other on sets of PP-measure zero. The conditional expectation depends on measure PP, therefore we sometimes write 𝔼P|𝒢[Z]{\mathbb{E}}_{P|{\cal G}}[Z] to emphasize this when various probability measures are considered.

It seems natural to define the conditional counterpart of the distributionally robust functional, defined in (4.4), as \sup_{P\in{\mathfrak{M}}}{\mathbb{E}}_{P|{\cal G}}[Z]. However there are technical issues with the precise meaning of such a definition since there are different versions of conditional expectations related to different probability measures P\in{\mathfrak{M}}. Even more importantly, there are conceptual problems with such a definition, and in fact this is not how conditional risk measures are defined (cf., [34]). In the risk averse setting there is a natural concept of law invariance. For a random variable Z its cumulative distribution function (cdf), with respect to the reference probability measure {\mathbb{P}}, is F_{Z}(z):={\mathbb{P}}(Z\leq z), z\in{\mathbb{R}}. It is said that random variables Z and Z^{\prime} are distributionally equivalent if their cumulative distribution functions F_{Z}(z) and F_{Z^{\prime}}(z) coincide for all z\in{\mathbb{R}}. It is said that a risk measure {\cal R} is law invariant if {\cal R}(Z)={\cal R}(Z^{\prime}) for any distributionally equivalent Z and Z^{\prime}. For example the Average Value-at-Risk measure {\cal R}={\sf AV@R}_{\alpha} is law invariant.

Note that law invariance is defined with respect to the reference measure {\mathbb{P}}. Law invariant coherent risk measure (Z){\cal R}(Z) can be considered as a function of the cumulative distribution function FZF_{Z}, and consequently its conditional counterpart can be defined as the respective function of the conditional counterpart of FZF_{Z}; this definition can be made precise. On the other hand, in the DRO setting where there is no dominating probability measure and the concept of law invariance is not applicable, it is not completely clear how to define a conditional counterpart of functional {\cal R} defined in (4.4), although an attempt was made in [34]. See Remark 5.6 below for a further discussion of law invariance.

In the risk neutral setting we were particularly interested in the stagewise independent case (Assumption 2.2). Under that assumption the dynamic programming equations simplify to (2.8) - (2.9) in the SP and to (3.10) in the SOC frameworks, respectively. Natural counterparts of these equations are obtained by replacing the expectation operator with a coherent risk measure. That is, in the risk averse SP framework the counterpart of (2.9) is

𝒬t+1(xt):=t+1[Qt+1(xt,ξt+1)],t=1,,T1,{\cal Q}_{t+1}(x_{t}):={\cal R}_{t+1}[Q_{t+1}(x_{t},\xi_{t+1})],\;t=1,...,T-1, (4.5)

where t+1{\cal R}_{t+1} is a coherent risk measure applied to the random variable Qt+1(xt,ξt+1)Q_{t+1}(x_{t},\xi_{t+1}). For example,

t():=(1λt)𝔼[]+λt𝖠𝖵@𝖱αt(),λt[0,1],{\cal R}_{t}(\cdot):=(1-\lambda_{t}){\mathbb{E}}[\,\cdot\,]+\lambda_{t}{\sf AV@R}_{\alpha_{t}}(\cdot),\;\;\lambda_{t}\in[0,1], (4.6)

is a law invariant coherent risk measure representing a convex combination of the expectation and the Average Value-at-Risk. Such a choice of risk measure aims at a compromise between minimizing the cost on average and controlling the risk of upper deviations from the average (upper quantiles of the cost) at every stage of the process. In applications the parameters \lambda_{t} and \alpha_{t} are often constants, independent of the stage t, i.e., the risk measure {\cal R}_{t}={\cal R} is the same for every stage (cf., [51]).

Remark 4.1

The dynamic programming equations, with (4.5) replacing (2.9), correspond to the nested formulation of the risk averse multistage SP problem (cf., [42],[47, Section 6.5]). In the nested risk averse multistage optimization the risk is controlled at every stage of the process, and the total nested value of the cost does not have a practical interpretation. Nevertheless, when solving the corresponding risk averse optimization problem, it would be useful to estimate its optimal value for the purpose of evaluating the error of the computed solution. \hfill\square

In the DRO setting the counterpart of equations (2.9) can be written as

𝒬t+1(xt):=supP𝔐t+1𝔼P[Qt+1(xt,ξt+1)],{\cal Q}_{t+1}(x_{t}):=\sup_{P\in{\mathfrak{M}}_{t+1}}{\mathbb{E}}_{P}[Q_{t+1}(x_{t},\xi_{t+1})], (4.7)

where {\mathfrak{M}}_{t+1} is a set of probability distributions of the vector \xi_{t+1}. Note that by the convexity and monotonicity properties of coherent risk measures, convexity of the cost-to-go functions is preserved here. That is, convexity of Q_{t+1}(\cdot,\xi_{t+1}) implies convexity of {\cal Q}_{t+1}(\cdot) on the left hand side of equations (4.5) and (4.7). This in turn implies convexity of Q_{t}(\cdot,\xi_{t}) on the left hand side of equation (2.8), provided the function f_{t}(\cdot,\xi_{t}) is convex and the set {\cal X}_{t} is convex. Therefore, similarly to the risk neutral case (compare with Proposition 2.1), we have the following convexity property.

Proposition 4.1

If the objective functions ft(xt,ξt)f_{t}(x_{t},\xi_{t}) are convex in xtx_{t} and the sets 𝒳t{\cal X}_{t} are convex for t=1,,Tt=1,...,T, then the cost-to-go functions Qt+1(xt,ξt+1)Q_{t+1}(x_{t},\xi_{t+1}) and 𝒬t+1(xt){\cal Q}_{t+1}(x_{t}) are convex in xtx_{t}. In particular such convexity of the cost-to-go functions follows for risk averse linear multistage stochastic programs.

Similar considerations apply to the SOC framework, with the risk averse counterpart of equations (3.10) written as

Vt(xt)=infut𝒰t(xt)t[ct(xt,ut,ξt)+Vt+1(Ft(xt,ut,ξt))],V_{t}(x_{t})=\inf\limits_{u_{t}\in{\cal U}_{t}(x_{t})}{\cal R}_{t}\left[c_{t}(x_{t},u_{t},\xi_{t})+V_{t+1}\big{(}F_{t}(x_{t},u_{t},\xi_{t})\big{)}\right], (4.8)

where t{\cal R}_{t}, t=1,,Tt=1,...,T, are coherent risk measures. Again these dynamic equations correspond to the nested formulation of the respective SOC problem. Similar to the risk neutral case, the value functions Vt(xt)V_{t}(x_{t}) are convex if Assumption 3.1 holds.

5 Dynamic Cutting Planes Algorithms

The basic idea of cutting planes algorithms for solving convex problems is approximation of the (convex) objective function by cutting planes. That is, let f:n{+}f:{\mathbb{R}}^{n}\to{\mathbb{R}}\cup\{+\infty\} be an extended real valued function. Its domain is dom(f):={xn:f(x)<+}{\rm dom}(f):=\{x\in{\mathbb{R}}^{n}:f(x)<+\infty\}. We say that an affine function (x)=α+βx\ell(x)=\alpha+\beta^{\top}x is a cutting plane of f(x)f(x) if f(x)(x)f(x)\geq\ell(x) for all xnx\in{\mathbb{R}}^{n}. If moreover f(x¯)=(x¯)f(\bar{x})=\ell(\bar{x}) for some x¯dom(f)\bar{x}\in{\rm dom}(f), then (x)\ell(x) is said to be a supporting plane of f(x)f(x), at the point x¯\bar{x}. When the function ff is convex, its supporting plane at a point x¯dom(f)\bar{x}\in{\rm dom}(f) is given by (x)=f(x¯)+g(xx¯)\ell(x)=f(\bar{x})+g^{\top}(x-\bar{x}), where gg is a subgradient of ff at x¯\bar{x}. The set of all subgradients, at a point xdom(f)x\in{\rm dom}(f), is called the subdifferential of ff at xx and denoted f(x)\partial f(x). The subdifferential of a convex function at every interior point of its domain is nonempty, [40, Theorem 23.4].
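As a small illustration of this construction (not taken from the paper), the following Python sketch maintains a max-of-affine cutting plane model of the convex function f(x)=x^{2}; the function and the trial points at which supporting planes are generated are hypothetical choices made only for this example.

```python
import numpy as np

# Minimal illustration of a cutting-plane (max-of-affine) model of a convex
# function.  The function f(x) = x**2 and the trial points below are
# hypothetical choices made only for this sketch.

def f(x):            # convex function to be approximated
    return x ** 2

def f_subgrad(x):    # a subgradient of f at x (here f is differentiable)
    return 2 * x

cuts = []            # list of (alpha, beta) representing l(x) = alpha + beta * x

def add_cut(x_bar):
    """Add the supporting plane of f at x_bar: l(x) = f(x_bar) + g*(x - x_bar)."""
    g = f_subgrad(x_bar)
    cuts.append((f(x_bar) - g * x_bar, g))

def model(x):
    """Current cutting-plane approximation: the pointwise maximum of all cuts."""
    return max(alpha + beta * x for alpha, beta in cuts)

for x_bar in [-2.0, 0.5, 1.5]:        # trial points where cuts are generated
    add_cut(x_bar)

for x in [-1.0, 0.0, 1.0]:
    assert model(x) <= f(x) + 1e-12   # the model is a lower bound on f
    print(x, model(x), f(x))
```

By construction the model never exceeds f; this lower bounding property is what the dynamic cutting planes algorithms discussed below exploit.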

For multistage linear SP problems a cutting planes type algorithm, called Stochastic Dual Dynamic Programming (SDDP), was introduced in Pereira and Pinto [31], building on the nested decomposition algorithm of Birge [5]. The SDDP algorithm moves backward and forward in time in order to build piecewise linear approximations of the cost-to-go functions by their cutting planes.

Remark 5.1

Let us recall how to compute a subgradient of the optimal value function of a linear program. That is, let Q(x)Q(x) be the optimal value of linear program

miny0cys.t.Bx+Ay=b,\min_{y\geq 0}c^{\top}y\;\;{\rm s.t.}\;\;Bx+Ay=b, (5.1)

considered as a function of the vector x (if for some x problem (5.1) does not have a feasible solution, then Q(x):=+\infty). The dual of (5.1) is the linear program

maxλλ(bBx)s.t.Aλc.\max_{\lambda}\lambda^{\top}(b-Bx)\;\;{\rm s.t.}\;\;A^{\top}\lambda\leq c. (5.2)

Function Q(\cdot) is piecewise linear convex, and its subgradient at a point x where Q(x) is finite is given by -B^{\top}\bar{\lambda}, with \bar{\lambda} being an optimal solution of the dual problem (5.2) (e.g., [47, Section 2.1.1]). That is, a subgradient of the optimal value function of a linear program is computed by solving the dual of that linear program. A more general setting amenable to a linear programming formulation is when the objective function of (5.1) is polyhedral. In such cases a subgradient of the respective optimal value function can be computed by solving the dual of the obtained linear program. This principle is applied in the SDDP algorithm to the approximation of the cost-to-go functions by their cutting planes in a dynamical setting. This motivates the “Dual Dynamic” part in the name of the algorithm. \hfill\square
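The following sketch illustrates Remark 5.1 numerically under hypothetical data: it evaluates Q(x) with a generic linear programming solver, obtains the subgradient -B^{\top}\bar{\lambda} from an optimal solution of the dual (5.2), and checks the subgradient inequality at a few points.

```python
import numpy as np
from scipy.optimize import linprog

# Small numerical illustration of Remark 5.1: a subgradient of the value
# function Q(x) = min_{y>=0} {c^T y : Bx + Ay = b} is obtained from an optimal
# solution of the dual LP (5.2).  The data below are hypothetical.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
B = np.eye(2)
b = np.array([4.0, 5.0])
c = np.array([1.0, 2.0, 4.0])

def Q(x):
    """Optimal value of the primal LP (5.1) for a given x."""
    res = linprog(c, A_eq=A, b_eq=b - B @ x, bounds=(0, None), method="highs")
    return res.fun

def subgrad_Q(x):
    """Solve the dual LP (5.2): max lambda^T (b - Bx) s.t. A^T lambda <= c."""
    res = linprog(-(b - B @ x), A_ub=A.T, b_ub=c,
                  bounds=(None, None), method="highs")
    lam = res.x
    return -B.T @ lam            # subgradient of Q at x

x = np.array([1.0, 1.0])
g = subgrad_Q(x)
print("Q(x) =", Q(x), "subgradient =", g)
# The subgradient inequality Q(z) >= Q(x) + g^T (z - x) holds at test points:
for z in [np.array([1.5, 1.0]), np.array([0.5, 2.0])]:
    assert Q(z) >= Q(x) + g @ (z - x) - 1e-8
```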

5.1 SDDP algorithm for SP problems

Consider SP problem (2.1) - (2.2). Suppose that Assumptions 2.1 and 2.2 hold, and hence the corresponding cost-to-go functions are defined by equations (2.8) - (2.9). If probability distributions of random vectors ξt\xi_{t} are continuous, these distributions should be discretized in order to compute the respective expectations (see Remark 5.2 below). So we also make the following assumption.

Assumption 5.1

The probability distribution of \xi_{t} has finite support \{\xi_{t1},...,\xi_{tN}\} with respective probabilities p_{t1},...,p_{tN}, t=2,...,T (for the sake of simplicity we assume that the number N of realizations of \xi_{t} is the same for every stage t=2,...,T; recall that in the SP framework \xi_{1} is deterministic). Denote A_{tj}:=A_{t}(\xi_{tj}), B_{tj}:=B_{t}(\xi_{tj}), b_{tj}:=b_{t}(\xi_{tj}), and by Q_{tj}(x_{t-1}):=Q_{t}(x_{t-1},\xi_{tj}) the cost-to-go functions defined in (2.8).

The corresponding expected value cost-to-go functions can be written as

𝒬t(xt1)=j=1NptjQtj(xt1).\begin{array}[]{ll}{\cal Q}_{t}(x_{t-1})=\sum_{j=1}^{N}p_{tj}Q_{tj}(x_{t-1}).\end{array} (5.1)

Suppose further that the objective functions ft(xt,ξtj):=ctjxtf_{t}(x_{t},\xi_{tj}):=c_{tj}^{\top}x_{t} are linear in xtx_{t}, and the sets 𝒳t:=+nt{\cal X}_{t}:={\mathbb{R}}^{n_{t}}_{+}, i.e., the constraint xt𝒳tx_{t}\in{\cal X}_{t} can be written as xt0x_{t}\geq 0, t=1,,Tt=1,...,T. In that case (2.1) - (2.2) becomes the standard linear multistage SP problem:

minxt0𝔼[t=1Tctxt]s.t.Btxt1+Atxt=bt,t=1,,T.\min\limits_{x_{t}\geq 0}{\mathbb{E}}\Big{[}\sum_{t=1}^{T}c_{t}^{\top}x_{t}\Big{]}\;\;{\rm s.t.}\;B_{t}x_{t-1}+A_{t}x_{t}=b_{t},\;t=1,...,T. (5.2)

It could be noted that if the sets 𝒳t{\cal X}_{t} are polyhedral and the functions ft(xt,ξt)f_{t}(x_{t},\xi_{t}) are polyhedral in xtx_{t}, then the problem is linear and can be formulated in the standard form by the well known techniques of linear programming.

Remark 5.2

From the modeling point of view, random vectors \xi_{t} are often assumed to have continuous distributions. In such cases these continuous distributions have to be discretized in order to perform numerical calculations. One possible approach to such discretization is to use Monte Carlo sampling techniques. That is, a random sample \xi_{t1},...,\xi_{tN} of size N is generated from the probability distribution of the random vector \xi_{t}, t=2,...,T. This approach is often referred to as the Sample Average Approximation (SAA) method. The constructed discretized problem can be considered in the above framework with equal probabilities p_{tj}=1/N, j=1,...,N. Statistical properties of the SAA method in the multistage setting are discussed, e.g., in [47, Section 5.8]. It could be mentioned that the question of the sample size N, i.e., how many discretization points per stage to use, can only be addressed in the framework of the original model with continuous distributions. One of the advantages of the SAA method is that statistical tests can be applied to the results of computational procedures; this is discussed in [9]. \hfill\square
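A minimal sketch of this discretization step, assuming a hypothetical lognormal model for the stagewise independent data process, could look as follows.

```python
import numpy as np

# Sketch of the SAA discretization described in Remark 5.2: for each stage
# t = 2,...,T an i.i.d. sample of size N is drawn from the (continuous)
# distribution of xi_t and used as its discrete approximation with equal
# probabilities 1/N.  The lognormal model below is a hypothetical choice.
rng = np.random.default_rng(0)
T, N = 5, 100

scenarios = {}                       # scenarios[t]: array of N realizations of xi_t
probs = np.full(N, 1.0 / N)
for t in range(2, T + 1):
    scenarios[t] = rng.lognormal(mean=0.0, sigma=0.5, size=N)

# Any expectation E[g(xi_t)] in the dynamic programming equations is then
# replaced by the sample average sum_j p_{tj} g(xi_{tj}):
def g(xi):                           # an arbitrary test function
    return np.maximum(xi - 1.0, 0.0)

print({t: float(probs @ g(scenarios[t])) for t in scenarios})
```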

The SDDP algorithm for linear multistage SP problems, having a finite number of scenarios, was described in detail in a number of publications (see, e.g., the recent survey paper [12] and references therein). For later reference we briefly discuss it below. The SDDP algorithm consists of backward and forward steps. At each iteration of the algorithm, in the backward step the current cutting planes approximations of the cost-to-go functions {\cal Q}_{t}(\cdot) are updated by adding respective cuts. That is, at the last stage the cost-to-go function Q_{Tj}(x_{T-1}), j=1,...,N, is given by the optimal value of the linear program

minxT0cTjxTs.t.BTjxT1+ATjxT=bTj.\min_{x_{T}\geq 0}c_{Tj}^{\top}x_{T}\;\;{\rm s.t.}\;\;B_{Tj}x_{T-1}+A_{Tj}x_{T}=b_{Tj}. (5.3)

The subgradient of QTj(xT1)Q_{Tj}(x_{T-1}) at a chosen point x~T1\tilde{x}_{T-1} is given by gTj=BTjλ~Tjg_{Tj}=-B_{Tj}\tilde{\lambda}_{Tj}, where λ~Tj\tilde{\lambda}_{Tj} is an optimal solution of the dual of problem (5.3) (see Remark 5.1 above). Consequently the subgradient of 𝒬T(xT1){\cal Q}_{T}(x_{T-1}) at x~T1\tilde{x}_{T-1} is computed as 𝗀T=j=1NpTjgTj{\sf g}_{T}=\sum_{j=1}^{N}p_{Tj}g_{Tj}, and hence the cutting plane

T(xT1)=𝒬T(x~T1)+𝗀T(xT1x~T1)\ell_{T}(x_{T-1})={\cal Q}_{T}(\tilde{x}_{T-1})+{\sf g}_{T}^{\top}(x_{T-1}-\tilde{x}_{T-1}) (5.4)

is added to the current family of cutting planes of 𝒬T(){\cal Q}_{T}(\cdot). Actually at the last stage the above cutting plane T()\ell_{T}(\cdot) is the supporting plane of 𝒬T(){\cal Q}_{T}(\cdot) at x~T1\tilde{x}_{T-1}. This is because 𝒬T(x~T1){\cal Q}_{T}(\tilde{x}_{T-1}) is explicitly computed by solving problem (5.3).
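A sketch of how the last stage cut (5.4) could be assembled is given below (Python, using scipy.optimize.linprog with the HiGHS solver; we assume each subproblem (5.3) is feasible and bounded, and rely on the documented convention that the equality-constraint marginals are derivatives of the optimal value with respect to the right hand side):

import numpy as np
from scipy.optimize import linprog

def last_stage_cut(x_trial, c, A, B, b, p):
    """Value Q_T(x_trial) and subgradient g_T of the expected cost-to-go
    function at the trial point, cf. (5.3)-(5.4).  c, A, B, b are lists of
    per-realization data (j = 1,...,N) and p the probabilities p_{Tj}."""
    val = 0.0
    grad = np.zeros_like(x_trial, dtype=float)
    for cj, Aj, Bj, bj, pj in zip(c, A, B, b, p):
        # default bounds (0, None) encode x_T >= 0 in problem (5.3)
        res = linprog(cj, A_eq=Aj, b_eq=bj - Bj @ x_trial, method="highs")
        lam = res.eqlin.marginals          # duals: d(optimal value)/d(rhs)
        val += pj * res.fun                # contribution to Q_T(x_trial)
        grad += pj * (-(Bj.T @ lam))       # g_{Tj} = -B_{Tj}^T lambda_{Tj}
    return val, grad

The returned pair defines the cutting plane (5.4) at the trial point \tilde{x}_{T-1}.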

At one step back, at stage T-1, the cost-to-go function Q_{T-1,j}(x_{T-2}), j=1,...,N, is given by the optimal value of the program

minxT10cT1,jxT1+𝒬T(xT1)s.t.BT1,jxT2+AT1jxT1=bT1,j.\min_{x_{T-1}\geq 0}c_{T-1,j}^{\top}x_{T-1}+{\cal Q}_{T}(x_{T-1})\;\;{\rm s.t.}\;\;B_{T-1,j}x_{T-2}+A_{T-1j}x_{T-1}=b_{T-1,j}. (5.5)

The cost-to-go function 𝒬T(){\cal Q}_{T}(\cdot) is not known. Therefore it is replaced by the approximation 𝒬¯T()\underline{\cal Q}_{T}(\cdot) given by the maximum of the current family of cutting planes including the cutting plane added at stage TT as described above. Since 𝒬¯T()\underline{\cal Q}_{T}(\cdot) is the maximum of a finite number of affine functions, the obtained problems

minxT10cT1,jxT1+𝒬¯T(xT1)s.t.BT1,jxT2+AT1jxT1=bT1,j,\min_{x_{T-1}\geq 0}c_{T-1,j}^{\top}x_{T-1}+\underline{\cal Q}_{T}(x_{T-1})\;\;{\rm s.t.}\;\;B_{T-1,j}x_{T-2}+A_{T-1j}x_{T-1}=b_{T-1,j}, (5.6)

j=1,,Nj=1,...,N, are linear. Let 𝔔T1,j(xT2){\mathfrak{Q}}_{T-1,j}(x_{T-2}) be the optimal value function of problem (5.6) and 𝔔¯T1(xT2)=j=1NpT1,j𝔔T1,j(xT2)\bar{{\mathfrak{Q}}}_{T-1}(x_{T-2})=\sum_{j=1}^{N}p_{T-1,j}{\mathfrak{Q}}_{T-1,j}(x_{T-2}) be the corresponding estimate of the expected value cost function. Note that since 𝒬T()𝒬¯T(){\cal Q}_{T}(\cdot)\geq\underline{\cal Q}_{T}(\cdot), we have that QT1,j()𝔔T1,j()Q_{T-1,j}(\cdot)\geq{\mathfrak{Q}}_{T-1,j}(\cdot), j=1,,Nj=1,...,N, and hence 𝒬T1()𝔔¯T1(){\cal Q}_{T-1}(\cdot)\geq\bar{{\mathfrak{Q}}}_{T-1}(\cdot). Gradient gT1,jg_{T-1,j} of the optimal value function 𝔔T1,j(xT2){\mathfrak{Q}}_{T-1,j}(x_{T-2}) can be computed at a chosen point x~T2\tilde{x}_{T-2} by solving the dual of problem (5.6) and hence computing its optimal solution. Then 𝗀T1=j=1NpT1,jgT1,j{\sf g}_{T-1}=\sum_{j=1}^{N}p_{T-1,j}g_{T-1,j} is the gradient of 𝔔¯T1(xT2)\bar{{\mathfrak{Q}}}_{T-1}(x_{T-2}) at x~T2\tilde{x}_{T-2}. Consequently

\ell_{T-1}(x_{T-2})=\bar{{\mathfrak{Q}}}_{T-1}(\tilde{x}_{T-2})+{\sf g}_{T-1}^{\top}(x_{T-2}-\tilde{x}_{T-2}) (5.7)

is a cutting plane of 𝒬T1(){\cal Q}_{T-1}(\cdot), and is added to the current collection of cutting planes of 𝒬T1(){\cal Q}_{T-1}(\cdot). And so on going backward in time until the corresponding approximation of the first stage problem is obtained:

minx10c1x1+𝒬¯2(x1)s.t.A1x1=b1.\min_{x_{1}\geq 0}c_{1}^{\top}x_{1}+\underline{\cal Q}_{2}(x_{1})\;\;{\rm s.t.}\;\;A_{1}x_{1}=b_{1}. (5.8)

Let us note that by construction, {\cal Q}_{t}(\cdot)\geq\underline{\cal Q}_{t}(\cdot) for t=2,...,T. Therefore any cutting plane of \underline{\cal Q}_{t}(\cdot) is also a cutting plane of {\cal Q}_{t}(\cdot). Also it follows that the optimal value of problem (5.8) is less than or equal to the optimal value of problem (2.1) - (2.2). Therefore the optimal value of problem (5.8) provides a (deterministic) lower bound for the optimal value of the considered multistage problem.
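A possible way to store and evaluate the collection of cutting planes defining \underline{\cal Q}_{t}(\cdot) is sketched below (the class name and interface are ours and are not taken from any particular SDDP implementation):

import numpy as np

class CutApproximation:
    """Lower approximation of a convex cost-to-go function by the maximum
    of affine cuts, as used for the functions underline{Q}_t."""

    def __init__(self):
        self.cuts = []                           # list of (value, gradient, point)

    def add_cut(self, value, grad, point):
        self.cuts.append((float(value), np.asarray(grad, dtype=float),
                          np.asarray(point, dtype=float)))

    def __call__(self, x):
        if not self.cuts:
            return -np.inf                       # no cuts yet: trivial lower bound
        return max(v + g @ (x - p) for v, g, p in self.cuts)

Each backward step adds one cut per stage via add_cut, and the forward step evaluates the resulting piecewise linear minorant.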

The computed estimates 𝒬¯t()\underline{\cal Q}_{t}(\cdot), t=2,,Tt=2,...,T, of the cost-to-go functions, and optimal solution x¯1\bar{x}_{1} of the first stage problem (5.8), define a feasible policy in the same way as in (2.10). Since this policy is feasible, its value is greater than or equal to the optimal value of the considered problem (2.1) - (2.2). In the forward step this value is estimated by randomly generating realizations of the data process and evaluating the corresponding policy values. That is, consider a sample path (scenario) ξ^2,,ξ^T\hat{\xi}_{2},...,\hat{\xi}_{T} of the data process. Let B^t=Bt(ξ^t)\hat{B}_{t}=B_{t}(\hat{\xi}_{t}), A^t=At(ξ^t)\hat{A}_{t}=A_{t}(\hat{\xi}_{t}), b^t=bt(ξ^t)\hat{b}_{t}=b_{t}(\hat{\xi}_{t}) and c^t=ct(ξ^t)\hat{c}_{t}=c_{t}(\hat{\xi}_{t}) be realizations of the random parameters corresponding to the generated sample path. Starting with first stage solution x¯1\bar{x}_{1}, next values x¯t\bar{x}_{t} are computed iteratively going forward in time by computing an optimal solution of the optimization problem in (2.10) with 𝒬t+1(){\cal Q}_{t+1}(\cdot) replaced by 𝒬¯t+1()\underline{\cal Q}_{t+1}(\cdot). That is

x¯targminxt0{c^txt+𝒬¯t+1(xt):B^tx¯t1+A^txt=b^t}.\bar{x}_{t}\in\mathop{\rm arg\,min}_{x_{t}\geq 0}\left\{\hat{c}_{t}^{\top}x_{t}+\underline{\cal Q}_{t+1}(x_{t}):\hat{B}_{t}\bar{x}_{t-1}+\hat{A}_{t}x_{t}=\hat{b}_{t}\right\}. (5.9)

The corresponding total sum

ϑ^:=c^1x¯1+t=2Tc^tx¯t\hat{\vartheta}:=\hat{c}_{1}^{\top}\bar{x}_{1}+\sum_{t=2}^{T}\hat{c}_{t}^{\top}\bar{x}_{t} (5.10)

is a function of the generated sample and hence is random. Its expected value 𝔼[ϑ^]{\mathbb{E}}[\hat{\vartheta}] is equal to the value of the constructed policy, and thus 𝔼[ϑ^]{\mathbb{E}}[\hat{\vartheta}] is greater than or equal to the optimal value of problem (2.1) - (2.2).

The forward step of the algorithm proceeds to evaluate value of the constructed policy by randomly generating the sample paths (scenarios). At every stage t=2,,Tt=2,...,T, a point ξt\xi_{t} is generated at random according to the distribution of the corresponding random vector. That is, a point ξ^t\hat{\xi}_{t} is chosen with probability ptip_{ti}, i=1,,Ni=1,...,N, from the set {ξt1,,ξtN}\{\xi_{t1},...,\xi_{tN}\}. This generates the corresponding sample path (scenario) ξ^2,,ξ^T\hat{\xi}_{2},...,\hat{\xi}_{T} of the data process. Consequently the policy values for the generated sample path are computed, by using the rule (5.9), and thus the total sum ϑ^\hat{\vartheta}, defined in (5.10), is evaluated. The computed value ϑ^\hat{\vartheta} gives a point estimate of value of the constructed policy. This procedure can be repeated several times by randomly (and independently from each other) generating MM scenarios. Let ϑ^1,,ϑ^M\hat{\vartheta}^{1},...,\hat{\vartheta}^{M} be values, calculated in accordance with (5.10), for each generated scenario. Their average ϑ¯:=M1i=1Mϑ^i\bar{\vartheta}:=M^{-1}\sum_{i=1}^{M}\hat{\vartheta}^{i} gives an unbiased estimate of the value of the considered policy. An upper edge ϑ¯+zαS/M\bar{\vartheta}+z_{\alpha}S/\sqrt{M} of the corresponding confidence interval is viewed as a statistical estimate of an upper bound for the optimal value of the considered multistage problem. Here S2=(M1)1i=1M(ϑ^iϑ¯)2S^{2}=(M-1)^{-1}\sum_{i=1}^{M}(\hat{\vartheta}^{i}-\bar{\vartheta})^{2} is an estimate of the variance Var(ϑ^){\rm Var}(\hat{\vartheta}), and zαz_{\alpha} is the critical level, say zα=2z_{\alpha}=2. The justification for this bound is that by the Central Limit Theorem (CLT) for large MM, the average ϑ¯\bar{\vartheta} has approximately normal distribution and zαz_{\alpha} corresponds to mass 100(1α)%100(1-\alpha)\% of the standard normal distribution. For MM not too large it is a common practice to use critical values from tt-distribution with M1M-1 degrees of freedom, rather than normal. In any case this is just a heuristic justified by the CLT. Nevertheless it is useful and typically gives a reasonable estimate of the upper bound.
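The statistical upper bound described above amounts to the following computation (a sketch; policy_values stands for the simulated totals \hat{\vartheta}^{1},...,\hat{\vartheta}^{M} and z_alpha for the chosen critical value):

import numpy as np

def statistical_upper_bound(policy_values, z_alpha=2.0):
    """Upper edge of the confidence interval for the value of the policy."""
    v = np.asarray(policy_values, dtype=float)
    M = v.size
    mean = v.mean()                  # bar{vartheta}
    s = v.std(ddof=1)                # S, with the (M-1)^{-1} normalization
    return mean + z_alpha * s / np.sqrt(M)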

Remark 5.3

The constructed approximations 𝒬¯t()\underline{\cal Q}_{t}(\cdot), of the expected value cost-to-go functions, also define a feasible policy for the original problem with continuous distributions of the data process vectors ξt\xi_{t}. Its value can be estimated in the same way by generating a sample path ξ^2,,ξ^T\hat{\xi}_{2},...,\hat{\xi}_{T} from the original continuous distribution and computing the corresponding policy values in the forward step in accordance with (5.9). Consequently the policy value is estimated by averaging the computed values for a number of randomly generated sample paths. This can be used for construction of statistical upper bound for the original problem. Expectation of the optimal value of the SAA problem, based on randomly generated sample, is less than or equal to the optimal value of the original problem. Therefore the lower bound generated by the SDDP algorithm gives a point estimate of a lower bound of the original problem. We can refer to [9] for various related statistical testing procedures. \hfill\square

The difference between the (statistical) upper bound and (deterministic) lower bound provides an upper bound for the optimality gap of the constructed policy. It can be used as a stopping rule by stopping the iteration procedure when this estimated gap is smaller than a specified precision level. The bound is quite conservative and for larger problems may not be reducible to the specified accuracy level in a reasonable computational time. A more practical heuristic approach is to stop the iterations after the lower bound stabilizes and new iterations, with additional cutting planes, do not significantly improve (increase) the lower bound.

The above procedure depends on a choice of the points \tilde{x}_{t}, referred to as the trial points. The forward step suggests to use the computed values \bar{x}_{t} as trial points in the next iteration of the backward step of the algorithm. The rationale for this can be explained as follows. In general, in order to approximate the cost-to-go functions uniformly with a given accuracy in a specified region, the number of cutting planes grows exponentially with increase of the dimension of the decision variables x_{t}. In many applications the regions where the cost-to-go functions should be approximated more accurately are considerably smaller than the regions of possible variability of the decision variables. By randomly generating scenarios, the forward step generates trial points where the cost-to-go functions are likely to require a more accurate approximation.

Remark 5.4

The trial points are supposed to be chosen at stages t=2,...,T-1. This suggests that the above procedure is relevant for the number of stages T\geq 3. For two stage problems, i.e., when T=2, there is no sampling involved and this becomes the classical Kelley's cutting plane algorithm, [18]. It is well known in deterministic convex optimization that Kelley's algorithm can behave poorly with increase of the number of decision variables, and much better cutting plane algorithms can be designed by using regularization or trust region/level set strategies [26, 20]. However, it is not clear how such methods can be extended to the dynamic multistage settings (although such attempts were made). The reason is that in multistage problems the optimization is performed over policies which are functions of the data process. It is not known a priori where realizations of the considered policies may happen in future stages, and an attempt to restrict the regions of search by some form of regularization may not be helpful. \hfill\square

Proofs of convergence of SDDP type algorithms were studied in several publications (e.g., [13] and references therein). These proofs are based on the argument that eventually, almost surely, the forward step of the algorithm will go through every possible scenario many times, and hence the cost-to-go functions will be approximated with any accuracy in the relevant regions. Since the number of scenarios grows exponentially with increase of the number of stages, this is not a very realistic assumption. Numerical experiments indicate that, in a certain sense, the numerical complexity of the SDDP method grows almost linearly with increase of the number of stages while the dimension of the decision vectors is kept constant. This property of the rate of convergence was investigated in the recent paper [22].

5.1.1 Interstage dependence

Suppose now that the data process \xi_{t}\in{\mathbb{R}}^{d} is Markovian, and hence the cost-to-go functions are defined by equations (2.11) - (2.12). There are different ways in which such Markovian structure can be modeled. One classical approach is time series analysis. The data process is modeled as, say, a first order autoregressive process, i.e.,

ξt=μ+Φξt1+εt,\xi_{t}=\mu+\Phi\xi_{t-1}+\varepsilon_{t}, (5.11)

where \Phi\in{\mathbb{R}}^{d\times d} and \mu\in{\mathbb{R}}^{d} are parameters of the process and \varepsilon_{t} is a sequence of random error vectors. It is assumed that the error process \varepsilon_{t} is stagewise independent (in fact it is usually assumed that the \varepsilon_{t} are i.i.d.). Suppose further that only the right hand side variables b_{t} in the considered problem are random, and set \xi_{t}:=b_{t}. Then we can write the cost-to-go functions in terms of x_{t} and \xi_{t}, treating \varepsilon_{t} as the corresponding random process:

Qt(xt1,ξt1,εt)=infxt0,ξt{ctxt+𝒬t+1(xt,ξt):Btxt1+Atxt=ξt,ξtμΦξt1=εt},Q_{t}(x_{t-1},\xi_{t-1},\varepsilon_{t})=\inf_{x_{t}\geq 0,\,\xi_{t}}\left\{c_{t}^{\top}x_{t}+{\cal Q}_{t+1}(x_{t},\xi_{t}):\begin{array}[]{lll}B_{t}x_{t-1}+A_{t}x_{t}=\xi_{t},\\ \xi_{t}-\mu-\Phi\xi_{t-1}=\varepsilon_{t}\end{array}\right\}, (5.12)

with

𝒬t+1(xt,ξt)=𝔼[Qt+1(xt,ξt,εt+1)].{\cal Q}_{t+1}(x_{t},\xi_{t})={\mathbb{E}}[Q_{t+1}(x_{t},\xi_{t},\varepsilon_{t+1})]. (5.13)

Here (x_{t},\xi_{t}) are treated as decision variables, \varepsilon_{t} is viewed as the corresponding random process, and the expectation in (5.13) is taken with respect to \varepsilon_{t+1}.

There are two drawbacks in the above approach. First, the number of decision variables is artificially increased, with the cutting planes in the corresponding SDDP algorithm computed jointly in x_{t} and \xi_{t}. This increases the computational complexity of the numerical procedure. Second, in order to have a linear problem, only the right hand side parameters are allowed to be random. An alternative approach is to approximate the Markovian process \xi_{t} by a Markov chain, which we are going to discuss next.

Suppose that the distribution of the random vector \xi_{t} is supported on a set \Xi_{t}\subset{\mathbb{R}}^{d_{t}}, t=2,...,T. Consider the partition444For the sake of simplicity we assume that the number N of cells is the same at every stage. \Xi_{ti}, i=1,...,N, of \Xi_{t} by Voronoi cells, i.e.,

Ξti:={ξtΞt:ξtηtiξtηtjfor allji},\Xi_{ti}:=\left\{\xi_{t}\in\Xi_{t}:\|\xi_{t}-\eta_{ti}\|\leq\|\xi_{t}-\eta_{tj}\|\;\text{for all}\;j\neq i\right\},

where \eta_{ti}\in\Xi_{t} are given points. That is, \Xi_{ti} is the set of points of \Xi_{t} closest to \eta_{ti}. There are various ways in which the center points \eta_{ti} can be chosen. For example, \eta_{ti} could be determined by minimizing the squared distances:

minηt1,,ηtNmini=1,,Nξtηti2dP(ξt).\min_{\eta_{t1},...,\eta_{tN}}\int\min_{i=1,...,N}\|\xi_{t}-\eta_{ti}\|^{2}dP(\xi_{t}). (5.14)

The above optimization problem is not convex. Nevertheless it can be reasonably well solved by various methods (cf., [27]). Consider the conditional probabilities p_{tij}:=P(\xi_{t+1}\in\Xi_{t+1,j}|\xi_{t}\in\Xi_{ti}). Note that different Voronoi cells can have common boundary points; we assume that the probabilities of such events are zero, so that the probabilities p_{tij} are well defined. This leads to an approximation of the data process by a (possibly nonhomogeneous) Markov chain with transition probabilities p_{tij} of moving from point \eta_{ti} to point \eta_{t+1,j} at the next stage.
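A sketch of the Markov chain construction is given below (Python; the centers are computed by k-means, which is one possible method for (5.14), and the transition probabilities are estimated as empirical frequencies from available sample paths of the data process; the array layout of the paths is our assumption):

import numpy as np
from scipy.cluster.vq import kmeans2

def fit_markov_chain(paths, N):
    """paths: array of shape (S, T-1, d) with S sample paths of xi_2,...,xi_T.
    Returns per stage centers eta_{ti} and transition matrices p_{tij}."""
    n_paths, stages, d = paths.shape
    centers, labels = [], []
    for t in range(stages):
        c, lab = kmeans2(paths[:, t, :], N, minit="++")    # Voronoi centers, cf. (5.14)
        centers.append(c)
        labels.append(lab)
    trans = []
    for t in range(stages - 1):
        P = np.zeros((N, N))
        for i, j in zip(labels[t], labels[t + 1]):
            P[i, j] += 1.0                                  # count transitions cell i -> cell j
        P /= np.maximum(P.sum(axis=1, keepdims=True), 1.0)  # empirical p_{tij}
        trans.append(P)
    return centers, trans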

Consider cti:=ct(ηti)c_{ti}:=c_{t}(\eta_{ti}), Ati:=At(ηti)A_{ti}:=A_{t}(\eta_{ti}), Bti:=Bt(ηti)B_{ti}:=B_{t}(\eta_{ti}) and bti:=bt(ηti)b_{ti}:=b_{t}(\eta_{ti}), i=1,,Ni=1,...,N. Recall dynamic equations (2.11) - (2.12) for Markovian data process. For the constructed Markov chain these dynamic equations can be written as

Qt(xt1,ηti)=infxt0{ctixt+𝒬t+1(xt,ηti):Btixt1+Atixt=bti},i=1,,N,Q_{t}(x_{t-1},\eta_{ti})=\inf_{x_{t}\geq 0}\Big{\{}c_{ti}^{\top}x_{t}+{\cal Q}_{t+1}(x_{t},\eta_{ti}):B_{ti}x_{t-1}+A_{ti}x_{t}=b_{ti}\Big{\}},\;i=1,...,N, (5.15)

where

𝒬t+1(xt,ηti)=j=1NptijQt+1(xt,ηt+1,j).{\cal Q}_{t+1}(x_{t},\eta_{ti})=\sum_{j=1}^{N}p_{tij}Q_{t+1}(x_{t},\eta_{t+1,j}). (5.16)

Note that the cost-to-go functions Qt(xt1,ηt)Q_{t}(x_{t-1},\eta_{t}) are convex in xt1x_{t-1}, while the argument ηt\eta_{t} can take only the values ηt1,,ηtN\eta_{t1},...,\eta_{tN}.

It is possible to apply the SDDP method separately for every ηt=ηti\eta_{t}=\eta_{ti}, i=1,,Ni=1,...,N (cf., [27]). That is, in the backward step of the algorithm the cost-to-go functions 𝒬t+1,i(xt):=𝒬t+1(xt,ηti){\cal Q}_{t+1,i}(x_{t}):={\cal Q}_{t+1}(x_{t},\eta_{ti}) are approximated by their cutting planes (in xtx_{t}) separately for each ηti\eta_{ti}, i=1,,Ni=1,...,N. This defines a policy for every realization (sample path) η1,,\eta_{1},..., of the constructed Markov chain. This can be used in the forward step of the algorithm.

It could happen that a sample path ξ1,,\xi_{1},..., of the original data process is different from every sample path η1,,\eta_{1},..., of the constructed Markov chain. In order to extend the constructed policy to a policy for the original data process, one approach is to define the policy values for a sample path of the original data process by taking the corresponding policy values from the sample path of the Markov chain closest in some sense to the considered sample path of the original data process.

The above approach is based on discretization (approximation) of the Markov process by a Markov chain. Compared with the approach based on autoregressive modeling of the data process, the Markov chain discretization is not restricted to the case where only the right hand side parameters are random. Also it uses cutting planes approximations only with respect to the variables x_{t}, which could be computationally advantageous. On the other hand, its computational complexity and required computer memory grow with increase of the number N of per stage discretization points. Also, unlike the SAA method, there is no available inference describing statistical properties of such discretization. From an intuitive point of view, such discretization could become problematic with increase of the dimension of the random vectors of the data process.

5.1.2 Duality of multistage linear stochastic programs

The linear multistage stochastic program (5.2), with a finite number of scenarios, can be viewed as a large scale linear program, the so called deterministic equivalent. By the standard theory of linear programming it has the (Lagrangian) dual. Suppose that the primal problem (5.2) has finite optimal value. Then the optimal values of the primal and its dual are equal to each other and both problems have optimal solutions. Dualization of the feasibility constraints leads to the following (Lagrangian) dual of problem (5.2) (cf., [47, Section 3.2.3])

maxπ𝔼[t=1Tbtπt]s.t.ATπTcT,At1πt1+𝔼|ξ[t1][Btπt]ct1,t=2,,T.\begin{array}[]{cll}\max\limits_{\pi}&{\mathbb{E}}\big{[}\sum_{t=1}^{T}b_{t}^{\top}\pi_{t}\big{]}\\ {\rm s.t.}&A^{\top}_{T}\pi_{T}\leq c_{T},\\ &A_{t-1}^{\top}\pi_{t-1}+{\mathbb{E}}_{|\xi_{[t-1]}}\left[B_{t}^{\top}\pi_{t}\right]\leq c_{t-1},\;t=2,...,T.\end{array} (5.17)

The optimization in (5.17) is over policies πt=πt(ξ[t])\pi_{t}=\pi_{t}(\xi_{[t]}), t=1,,Tt=1,...,T.

It is possible to write dynamic programming equations for the dual problem. The idea of applying the SDDP algorithm to the dual problem was introduced in [25]. In what follows we use the approach of [15] based on formulation (5.17). As before we make Assumptions 2.1, 2.2 and 5.1. At the last stage t=Tt=T, given πT1\pi_{T-1} and ξ[T1]\xi_{[T-1]}, we need to solve the following problem with respect to πT\pi_{T}:

maxπT1,,πTNj=1NpTjbTjπTjs.t.ATjπTjcTj,j=1,,N,AT1πT1+j=1NpTjBTjπTjcT1.\begin{array}[]{cll}\max\limits_{\pi_{T1},\ldots,\pi_{TN}}&\sum\limits_{j=1}^{N}p_{Tj}b_{Tj}^{\top}\pi_{Tj}\\ {\rm s.t.}&A_{Tj}^{\top}\pi_{Tj}\leq c_{Tj},\;j=1,...,N,\\ &A_{T-1}^{\top}\pi_{T-1}+\sum\limits_{j=1}^{N}p_{Tj}B_{Tj}^{\top}\pi_{Tj}\leq c_{T-1}.\end{array} (5.18)

Note that j=1NpTjbTjπTj=𝔼[bTπT]\sum\limits_{j=1}^{N}p_{Tj}b_{Tj}^{\top}\pi_{Tj}={\mathbb{E}}[b_{T}^{\top}\pi_{T}] and j=1NpTjBTjπTj=𝔼[BTπT]\sum\limits_{j=1}^{N}p_{Tj}B_{Tj}^{\top}\pi_{Tj}={\mathbb{E}}\left[B_{T}^{\top}\pi_{T}\right]. Since ξT\xi_{T} is independent of ξ[T1]\xi_{[T-1]}, the expectation in (5.18) is unconditional with respect to the distribution of ξT\xi_{T}.

The optimal value VT(πT1,ξT1)V_{T}(\pi_{T-1},\xi_{T-1}) and an optimal solution (π¯T1,,π¯TN)(\bar{\pi}_{T1},\ldots,\bar{\pi}_{TN}) of problem (5.18) are functions of vectors πT1\pi_{T-1} and cT1=cT1(ξT1)c_{T-1}=c_{T-1}(\xi_{T-1}) and matrix AT1=AT1(ξT1)A_{T-1}=A_{T-1}(\xi_{T-1}). And so on going backward in time, using the stagewise independence assumption, we can write the respective dynamic programming equations for t=T1,,2t=T-1,...,2, as

maxπt1,,πtNj=1Nptj[btjπtj+Vt+1(πtj,ξtj)]s.t.At1πt1+j=1NptjBtjπtjct1,\begin{array}[]{cll}\max\limits_{\pi_{t1},\ldots,\pi_{tN}}&\sum\limits_{j=1}^{N}p_{tj}\left[b_{tj}^{\top}\pi_{tj}+V_{t+1}(\pi_{tj},\xi_{tj})\right]\\ {\rm s.t.}&A_{t-1}^{\top}\pi_{t-1}+\sum\limits_{j=1}^{N}p_{tj}B_{tj}^{\top}\pi_{tj}\leq c_{t-1},\end{array} (5.19)

with Vt(πt1,ξt1)V_{t}(\pi_{t-1},\xi_{t-1}) being the optimal value of problem (5.19). Finally at the first stage the following problem should be solved555Value functions VtV_{t} here should not be confused with the value functions of the SOC framework.

maxπ1b1π1+V2(π1,ξ1).\max_{\pi_{1}}b_{1}^{\top}\pi_{1}+V_{2}(\pi_{1},\xi_{1}). (5.20)

Let us make the following observations about the dual problem. Unlike the primal problem, the optimization (maximization) problems (5.18) and (5.19) do not decompose into separate problems with respect to each πtj\pi_{tj} and should be solved as one linear program with respect to (πt1,,πtN)(\pi_{t1},...,\pi_{tN}). If AtA_{t} and ctc_{t}, t=2,,Tt=2,...,T, are deterministic, then Vt(πt1)V_{t}(\pi_{t-1}) is only a function of πt1\pi_{t-1}. The value (cost-to-go) function Vt(πt1,ξt1)V_{t}(\pi_{t-1},\xi_{t-1}) is a concave function of πt1\pi_{t-1}. This allows to apply an SDDP type algorithm to the dual problem (see [15] for details). For any feasible policy of the dual problem, value of the first stage problem (5.20) gives an upper bound for the optimal value of the primal problem (5.2). Since the dual is a maximization problem, the SDDP algorithm applied to the dual problem produces a deterministic upper bound for the optimal value of problem (5.2).

It is interesting to note that it could happen that the dual problem does not have relatively complete recourse even if the primal problem has it. It is said that the problem (primal or dual) has relatively complete recourse, if at every stage t=2,,Tt=2,...,T, for any generated solutions by the forward process at the previous stages, the respective dynamic program has a feasible solution at the stage tt for every realization of the random data. Without relatively complete recourse it could happen at a certain stage that for some realizations of the data process, the corresponding optimization problem is infeasible and the forward process cannot continue. Note that the relatively complete recourse is related to the policy construction in the forward dynamic process and may not hold even if the problem has finite optimal value. It is still possible to proceed with construction of the dual upper bound by introducing a penalty term to the dual problem (cf., [15]).

5.2 Cutting planes algorithm for SOC problems

Consider the SOC problem (3.1) - (3.2). Suppose that Assumptions 2.1 and 2.2 hold, and hence equation (3.10) for the value function follows. Suppose further that Assumption 3.1 is satisfied, and hence value functions Vt()V_{t}(\cdot) are convex. Then a cutting planes algorithm of the SDDP type can be applied in a way similar to the SP setting. There are, however, some subtle differences which we are going to discuss next.

In the backward step of the algorithm an approximation V¯t()\underline{V}_{t}(\cdot) of the value function Vt()V_{t}(\cdot) is constructed by maximum of a family of cutting planes, similar to the SP setting, going backward in time. Let x~t\tilde{x}_{t}, t=2,,T+1t=2,...,T+1, be trial points, generated say by the forward step of the algorithm at the previous iteration. At the backward step, starting at time t=T+1t=T+1 the cutting (supporting) plane T+1(xT+1)=cT+1(x~T+1)+𝗀T+1(xT+1x~T+1)\ell_{T+1}(x_{T+1})=c_{T+1}(\tilde{x}_{T+1})+{\sf g}_{T+1}^{\top}(x_{T+1}-\tilde{x}_{T+1}), where 𝗀T+1{\sf g}_{T+1} is a subgradient of cT+1()c_{T+1}(\cdot) at x~T+1\tilde{x}_{T+1}, is added to the current family of cutting planes of VT+1()V_{T+1}(\cdot). Going backward in time, at stage tt we need to compute a subgradient of function

infut𝒰t(xt)j=1Nptj[ctj(xt,ut)+V¯t+1(Atjxt+Btjut+btj)]\inf\limits_{u_{t}\in{\cal U}_{t}(x_{t})}\sum_{j=1}^{N}p_{tj}\left[c_{tj}(x_{t},u_{t})+\underline{V}_{t+1}\big{(}A_{tj}x_{t}+B_{tj}u_{t}+b_{tj}\big{)}\right] (5.21)

at the trial point xt=x~tx_{t}=\tilde{x}_{t}, where V¯t+1()=max1iMt+1,i()\underline{V}_{t+1}(\cdot)=\max_{1\leq i\leq M}\ell_{t+1,i}(\cdot) is the current approximation of Vt+1()V_{t+1}(\cdot) by cutting planes t+1,i()\ell_{t+1,i}(\cdot). The minimization problem (5.21) is linear if functions ctj(xt,ut)c_{tj}(x_{t},u_{t}) are linear (polyhedral) in utu_{t}, and 𝒰t(xt){\cal U}_{t}(x_{t}) is of the form (3.9) with the corresponding mapping gt(xt,ut)g_{t}(x_{t},u_{t}) being affine in utu_{t}.

Let us first consider the case where 𝒰t(xt)𝒰t{\cal U}_{t}(x_{t})\equiv{\cal U}_{t} does not depend on xtx_{t}. Then the required subgradient is given by (cf., [39, Theorem 23.8])

𝗀t=j=1Nptj[ctj(x~t,u~t)+AtjV¯t+1(Atjx~t+Btju~t+btj)],{\sf g}_{t}=\sum_{j=1}^{N}p_{tj}\left[\nabla c_{tj}(\tilde{x}_{t},\tilde{u}_{t})+A^{\top}_{tj}\nabla\underline{V}_{t+1}\big{(}A_{tj}\tilde{x}_{t}+B_{tj}\tilde{u}_{t}+b_{tj}\big{)}\right], (5.22)

where \tilde{u}_{t}\in{\cal U}_{t} is a minimizer of problem (5.21) for x_{t}=\tilde{x}_{t}, and \nabla c_{tj}(\tilde{x}_{t},\tilde{u}_{t}) is a subgradient of c_{tj}(\cdot,\tilde{u}_{t}) at \tilde{x}_{t}. We also need to compute a subgradient \nabla\underline{V}_{t+1}(\cdot) at the points A_{tj}\tilde{x}_{t}+B_{tj}\tilde{u}_{t}+b_{tj}, j=1,...,N. The subgradient \nabla\underline{V}_{t+1}(\cdot) at a point x_{t+1} is given by the gradient \nabla\ell_{t+1,\nu}(x_{t+1}) of its cutting plane with \nu\in\{1,...,M\} such that \ell_{t+1,\nu}(x_{t+1})=\underline{V}_{t+1}(x_{t+1}), i.e., \ell_{t+1,\nu} is the supporting plane of \underline{V}_{t+1}(\cdot) at x_{t+1}. Of course, the gradient of an affine function \ell(x)=\alpha+\beta^{\top}x is \nabla\ell(x)=\beta for any x. The cutting plane is then \underline{V}_{t}(\tilde{x}_{t})+{\sf g}_{t}^{\top}(x_{t}-\tilde{x}_{t}), where

V¯t(x~t)=j=1Nptj[ctj(x~t,u~t)+V¯t+1(Atjx~t+Btju~t+btj)].\underline{V}_{t}(\tilde{x}_{t})=\sum_{j=1}^{N}p_{tj}\left[c_{tj}(\tilde{x}_{t},\tilde{u}_{t})+\underline{V}_{t+1}\big{(}A_{tj}\tilde{x}_{t}+B_{tj}\tilde{u}_{t}+b_{tj}\big{)}\right].
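The computation of this cut in the case {\cal U}_{t}(x_{t})\equiv{\cal U}_{t} can be sketched as follows (Python; the minimizer \tilde{u}_{t} of (5.21) is assumed to have been computed elsewhere, the costs c_{tj} are passed as callables, their subgradients with respect to x_{t} at the trial point as vectors, and the cuts of \underline{V}_{t+1} as triples (value, gradient, point)):

import numpy as np

def value_and_subgrad(cuts, y):
    """Evaluate the maximum of affine cuts at y and return the gradient of an active cut."""
    vals = [v + g @ (y - p) for v, g, p in cuts]
    k = int(np.argmax(vals))
    return vals[k], cuts[k][1]

def soc_cut(x_trial, u_opt, costs, grad_costs_x, A, B, b, p, cuts_next):
    """Value underline{V}_t(x_trial) and subgradient g_t as in (5.22)."""
    val = 0.0
    g = np.zeros_like(x_trial, dtype=float)
    for cj, qj, Aj, Bj, bj, pj in zip(costs, grad_costs_x, A, B, b, p):
        y = Aj @ x_trial + Bj @ u_opt + bj               # next state realization
        vy, gy = value_and_subgrad(cuts_next, y)
        val += pj * (cj(x_trial, u_opt) + vy)
        g += pj * (qj + Aj.T @ gy)                       # formula (5.22)
    return val, g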

It could be noted that in the present case the required subgradient, and hence the corresponding cutting plane, can be computed without solving the dual problem. Therefore it may be inappropriate in this setting to call it Stochastic Dual Dynamic Programming.

When 𝒰t(xt){\cal U}_{t}(x_{t}) depends on xtx_{t}, calculation of a subgradient of function (5.21) could be more involved; nevertheless in some cases it is still possible to proceed. Suppose that 𝒰t(xt){\cal U}_{t}(x_{t}) is of the form (3.9) with the corresponding mapping gt(,)g_{t}(\cdot,\cdot) having convex components. That is, minimization problem (5.21) can be written as

minutj=1Nptj[ctj(xt,ut)+V¯t+1(Atjxt+Btjut+btj)]s.t.gt(xt,ut)0.\min\limits_{u_{t}}\sum_{j=1}^{N}p_{tj}\left[c_{tj}(x_{t},u_{t})+\underline{V}_{t+1}\big{(}A_{tj}x_{t}+B_{tj}u_{t}+b_{tj}\big{)}\right]\;\;{\rm s.t.}\;g_{t}(x_{t},u_{t})\leq 0. (5.23)

Then the required gradient is given by the gradient of the Lagrangian Lt(xt,u¯t,λ¯t)L_{t}(x_{t},\bar{u}_{t},\bar{\lambda}_{t}) of problem (5.23) with u¯t\bar{u}_{t} being the respective optimal solution and λ¯t\bar{\lambda}_{t} being the corresponding Lagrange multipliers vector, provided the optimal solution and Lagrange multipliers are unique (e.g., [7, Theorem 4.24]).

The forward step of the algorithm is performed similarly to the SP setting. Starting with the initial value x_{1}=\bar{x}_{1} and a generated sample path \hat{\xi}_{t}, t=1,..., the policy \bar{u}_{t}=\pi_{t}(x_{t}) is computed iteratively going forward in time (compare with (3.11)). That is, at stage t=1,..., given the value \bar{x}_{t} of the state vector, the control value is computed as

u¯targminut𝒰t(x¯t)j=1Nptj[ctj(x¯t,ut)+V¯t+1(Atjx¯t+Btjut+btj)].\bar{u}_{t}\in\mathop{\rm arg\,min}_{u_{t}\in{\cal U}_{t}(\bar{x}_{t})}\sum_{j=1}^{N}p_{tj}\left[c_{tj}(\bar{x}_{t},u_{t})+\underline{V}_{t+1}\big{(}A_{tj}\bar{x}_{t}+B_{tj}u_{t}+b_{tj}\big{)}\right]. (5.24)

Consequently the next stage state value is computed

x¯t+1=At(ξ^t)x¯t+Bt(ξ^t)u¯t+bt(ξ^t),\bar{x}_{t+1}=A_{t}(\hat{\xi}_{t})\bar{x}_{t}+B_{t}(\hat{\xi}_{t})\bar{u}_{t}+b_{t}(\hat{\xi}_{t}), (5.25)

and so on. Recall that in the SOC framework, ξ1\xi_{1} is also random. The computed values x¯t\bar{x}_{t} of the state variables can be used as trial points in the next iteration of the backward step of the algorithm.

Remark 5.5 (QQ-factor approach)

Consider dynamic equations (3.10) with affine state equations of the form (3.3), and define666The QQ-functions here should not be confused with the cost-to-go functions of the SP framework.

Q_{t}(x_{t},u_{t}):={\mathbb{E}}\left[c_{t}(x_{t},u_{t},\xi_{t})+V_{t+1}(A_{t}x_{t}+B_{t}u_{t}+b_{t})\right]. (5.26)

We have that

Vt(xt)=infut𝒰t(xt)Qt(xt,ut).V_{t}(x_{t})=\inf\limits_{u_{t}\in{\cal U}_{t}(x_{t})}Q_{t}(x_{t},u_{t}). (5.27)

The dynamic equations (3.10) can be written in terms of Qt(xt,ut)Q_{t}(x_{t},u_{t}) as

Qt(xt,ut)=𝔼[ct(xt,ut,ξt)+infut+1𝒰t+1(Atxt+Btut+bt)Qt+1(Atxt+Btut+bt,ut+1)].Q_{t}(x_{t},u_{t})={\mathbb{E}}\Big{[}c_{t}(x_{t},u_{t},\xi_{t})+\inf\limits_{u_{t+1}\in{\cal U}_{t+1}(A_{t}x_{t}+B_{t}u_{t}+b_{t})}Q_{t+1}\big{(}A_{t}x_{t}+B_{t}u_{t}+b_{t},u_{t+1}\big{)}\Big{]}. (5.28)

The optimal policy u¯t=πt(xt)\bar{u}_{t}=\pi_{t}(x_{t}), defined by (3.11), can be written in terms of the QQ-functions as

u¯targminut𝒰t(xt)Qt(xt,ut).\bar{u}_{t}\in\mathop{\rm arg\,min}_{u_{t}\in{\cal U}_{t}(x_{t})}Q_{t}(x_{t},u_{t}). (5.29)

Under Assumption 3.1, the functions Q_{t}(x_{t},u_{t}) are convex jointly in x_{t} and u_{t}. Consequently the cutting plane algorithm can be applied directly to the dynamic equations (5.28). That is, in the backward step the current lower approximations \underline{Q}_{t}(x_{t},u_{t}) of the respective functions Q_{t}(x_{t},u_{t}) are updated by adding cutting planes \ell_{t}(x_{t},u_{t}) computed at trial points (\tilde{x}_{t},\tilde{u}_{t}) generated in the forward step of the algorithm. The constructed approximations \underline{Q}_{t}(\cdot,\cdot) are used in the forward step of the algorithm for generating a point estimate of the expected value of the corresponding policy for a randomly generated sample path. \hfill\square
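For the Q-factor variant, the policy (5.29) can be read directly off a cut approximation of Q_{t}; a small sketch follows (replacing the exact minimization over {\cal U}_{t}(x_{t}) by enumeration of a finite candidate set of controls is our simplification, not part of the method):

import numpy as np

def q_policy(x, U_candidates, q_cuts):
    """Greedy control from cuts of Q_t(x, u), cf. (5.29).
    q_cuts: list of (value, grad_x, grad_u, point_x, point_u)."""
    def Q(u):
        return max(v + gx @ (x - px) + gu @ (u - pu)
                   for v, gx, gu, px, pu in q_cuts)
    return min(U_candidates, key=Q)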

5.3 SDDP algorithms for risk averse problems.

Cutting planes algorithms of SDDP type can be extended to risk averse problems (cf., [33],[44]). Let us start with the SP framework.

5.3.1 Stochastic Programming framework.

We follow the assumptions and notation of Section 5.1, and assume that equation (2.9) for the cost-to-go functions, is replaced by its risk averse counterpart (4.5). The backward step of the algorithm is performed similar to the risk neutral case. We only need to make adjustment to calculation of subgradients of the cutting planes approximations 𝒬¯t+1()\underline{\cal Q}_{t+1}(\cdot) of the cost-to-go functions 𝒬t+1(){\cal Q}_{t+1}(\cdot).

In numerical algorithms, we deal with discrete distributions having a finite support. So we assume in this section that Ω={ω1,,ωN}\Omega=\{\omega_{1},...,\omega_{N}\} is a finite space equipped with sigma algebra {\cal F} of all subsets of Ω\Omega, and reference probability measure {\mathbb{P}} with the respective nonzero probabilities pi=({ωi})p_{i}={\mathbb{P}}(\{\omega_{i}\}), i=1,,Ni=1,...,N. For a random variable (a function) Z:ΩZ:\Omega\to{\mathbb{R}} , we denote Zi:=Z(ωi)Z_{i}:=Z(\omega_{i}), i=1,,Ni=1,...,N.

Remark 5.6

Recall definition of law invariant risk measures discussed in section 4. Law invariant coherent risk measures have naturally defined conditional counterparts used in construction of the corresponding nested risk measures (see equations (5.51) - (5.52) below). Suppose for the moment that the probabilities pip_{i} are such that for ,𝒥{1,,N}{\cal I},{\cal J}\subset\{1,...,N\} the equality ipi=j𝒥pj\sum_{i\in{\cal I}}p_{i}=\sum_{j\in{\cal J}}p_{j} holds only if =𝒥{\cal I}={\cal J}. Then two variables Z,Z:ΩZ,Z^{\prime}:\Omega\to{\mathbb{R}} are distributionally equivalent iff they do coincide. Therefore in that case any risk measure defined on the space of variables Z:ΩZ:\Omega\to{\mathbb{R}} is law invariant. In the theory of law invariant coherent risk measures it is usually assumed that the reference probability space (Ω,,)(\Omega,{\cal F},{\mathbb{P}}) is atomless; of course an atomless space cannot be finite. In the case of finite space Ω\Omega it makes sense to consider law invariant risk measures when all probabilities pip_{i} are equal to each other, i.e., pi=1/Np_{i}=1/N, i=1,,Ni=1,...,N. In that case Z,Z:ΩZ,Z^{\prime}:\Omega\to{\mathbb{R}} are distributionally equivalent iff there exists a permutation π:ΩΩ\pi:\Omega\to\Omega such that Zi=Zπ(i)Z^{\prime}_{i}=Z_{\pi(i)}, i=1,,Ni=1,...,N. Anyway we will not restrict the following discussion to the case of all probabilities pip_{i} being equal to each other. \hfill\square

In the considered setting 𝔼[Z]=i=1NpiZi{\mathbb{E}}_{\mathbb{P}}[Z]=\sum_{i=1}^{N}p_{i}Z_{i}, and the dual representation (4.1) of coherent risk measure can be written as

(Z)=supζ𝔄i=1NpiζiZi,\begin{array}[]{ll}{\cal R}(Z)=\sup\limits_{\zeta\in{\mathfrak{A}}}\sum_{i=1}^{N}p_{i}\zeta_{i}Z_{i},\end{array} (5.30)

where 𝔄{\mathfrak{A}} is a convex closed subset of N{\mathbb{R}}^{N} consisting of vectors ζ0\zeta\geq 0 such that i=1Npiζi=1\sum_{i=1}^{N}p_{i}\zeta_{i}=1. Since it is assumed that (Z){\cal R}(Z) is real valued, the set 𝔄{\mathfrak{A}} is bounded and hence is compact.

The subdifferential of (){\cal R}(\cdot) at ZZ is given by

(Z)=argmaxζ𝔄i=1NpiζiZi.\begin{array}[]{ll}\partial{\cal R}(Z)=\mathop{\rm arg\,max}\limits_{\zeta\in{\mathfrak{A}}}\sum_{i=1}^{N}p_{i}\zeta_{i}Z_{i}.\end{array} (5.31)

Specific formulas for the subgradients, in various examples of coherent risk measures, are listed in [47, Section 6.3.2]. These formulas can be applied for computation of cutting planes in the backward step of the algorithm. Several such examples were discussed and implemented in an SDDP type algorithm in [51].

As it was described in Section 5.1, in the risk neutral setting the forward step of the algorithm has two functions; namely, construction of statistical upper bounds and generation of trial points. The constructed approximations 𝒬¯t+1(xt)\underline{\cal Q}_{t+1}(x_{t}) of the cost-to-go functions define a feasible policy in the same way as in the risk neutral case by using formula (5.9). Therefore the forward step can be used for generation of trial points similar to the risk neutral case. However, because of the nonlinearity of the risk measures, it does not produce an unbiased estimate of the risk averse value of the constructed policy and cannot be applied in a straightforward way for computing statistical upper bounds similar to the risk neutral case. At the moment there are no known efficient numerical algorithms for computing statistical upper bounds for risk averse problems in the SP framework. Interestingly, as we will argue below, in the SOC setting it is possible to construct reasonably efficient statistical upper bounds for a certain class of risk measures.

5.3.2 Stochastic Optimal Control framework.

As before we suppose that Assumptions 2.1 and 2.2 hold, and thus value functions Vt(xt)V_{t}(x_{t}), associated with coherent risk measure {\cal R}, are determined by the dynamic equations (4.8). Furthermore, we suppose that Assumption 3.1 is satisfied, and hence value functions Vt(xt)V_{t}(x_{t}) are convex.

We consider in this section the following class of coherent risk measures. This class is quite broad and includes many risk measures used in practical applications (the presentation below is based on [14]). We assume that the considered risk measures can be represented as

t(Z):=infθΘ𝔼t[Ψ(Z,θ)],Z𝒵,{\cal R}_{t}(Z):=\inf_{\theta\in\Theta}{\mathbb{E}}_{{\mathbb{P}}_{t}}[\Psi(Z,\theta)],\;\;Z\in{\cal Z}, (5.32)

where Θ\Theta is a nonempty convex closed subset of a finite dimensional vector space and Ψ:×Θ\Psi:{\mathbb{R}}\times\Theta\to{\mathbb{R}} is a real valued function. For the sake of simplicity, we assume that the function Ψ(z,θ)\Psi(z,\theta) is the same at every stage, while the reference probability distribution t{\mathbb{P}}_{t} of ξt\xi_{t} can be different at different stages. We make the following assumptions about function Ψ(z,θ)\Psi(z,\theta). (i) For every Z𝒵Z\in{\cal Z}, the expectation in the right hand side of (5.32) is well defined and the infimum is finite valued. (ii) Function Ψ(z,θ)\Psi(z,\theta) is convex in (z,θ)×Θ(z,\theta)\in{\mathbb{R}}\times\Theta. (iii) Ψ(,θ)\Psi(\cdot,\theta) is monotonically nondecreasing, i.e., if z1z2z_{1}\leq z_{2}, then Ψ(z1,θ)Ψ(z2,θ)\Psi(z_{1},\theta)\leq\Psi(z_{2},\theta) for all θΘ\theta\in\Theta.

Under these assumptions, the functional defined in (5.32) satisfies the axioms of convexity and monotonicity. Convex combination of expectation and Average Value-at-Risk, defined in (4.6), is of that form with Θ:=\Theta:={\mathbb{R}}, 𝒵:=L1(Ω,,){\cal Z}:=L_{1}(\Omega,{\cal F},{\mathbb{P}}) and

Ψ(z,θ):=(1λt)z+λt(θ+(1αt)1[zθ]+).\Psi(z,\theta):=(1-\lambda_{t})z+\lambda_{t}\left(\theta+(1-\alpha_{t})^{-1}[z-\theta]_{+}\right). (5.33)

For risk measures of the form (5.32), dynamic programming equations (4.8) can be written as

Vt(xt)=infut𝒰t(xt),θΘj=1NptjΨ(ctj(xt,ut)+Vt+1(Atjxt+Btjut+btj),θ)𝔼t[Ψ(ct(xt,ut,ξt)+Vt+1(Atxt+Btut+bt),θ)].V_{t}(x_{t})=\inf\limits_{u_{t}\in{\cal U}_{t}(x_{t}),\,\theta\in\Theta}\underbrace{\sum_{j=1}^{N}p_{tj}\Psi\big{(}c_{tj}(x_{t},u_{t})+V_{t+1}(A_{tj}x_{t}+B_{tj}u_{t}+b_{tj}),\theta\big{)}}_{{\mathbb{E}}_{{\mathbb{P}}_{t}}[\Psi(c_{t}(x_{t},u_{t},\xi_{t})+V_{t+1}(A_{t}x_{t}+B_{t}u_{t}+b_{t}),\theta)]}. (5.34)

Note that the minimization in the right hand side of (5.34) is performed jointly in controls utu_{t} and parameter vector θ\theta.

The backward step of the algorithm can be performed similarly to the risk neutral case. In case {\cal U}_{t}(x_{t})\equiv{\cal U}_{t} does not depend on x_{t}, by the chain rule of differentiation, formula (5.22) for the required subgradient is extended to

j=1Nptj[Ψ(ytj,θ¯t)(ctj(xt,u¯t)+AtjV¯t+1(Atjxt+Btju¯t+btj))],\sum_{j=1}^{N}p_{tj}\left[\Psi^{\prime}(y_{tj},\bar{\theta}_{t})\left(\nabla c_{tj}(x_{t},\bar{u}_{t})+A^{\top}_{tj}\nabla\underline{V}_{t+1}\big{(}A_{tj}x_{t}+B_{tj}\bar{u}_{t}+b_{tj}\big{)}\right)\right], (5.35)

where u¯t\bar{u}_{t} and θ¯t\bar{\theta}_{t} are minimizers in the right hand side of (5.34), with Vt+1()V_{t+1}(\cdot) replaced by its current approximation V¯t+1()\underline{V}_{t+1}(\cdot), and Ψ(ytj,θ¯t)\Psi^{\prime}(y_{tj},\bar{\theta}_{t}) is a subgradient of Ψ(,θ¯t)\Psi(\cdot,\bar{\theta}_{t}) at ytj:=ctj(xt,u¯t)+V¯t+1(Atjxt+Btju¯t+btj)y_{tj}:=c_{tj}(x_{t},\bar{u}_{t})+\underline{V}_{t+1}\big{(}A_{tj}x_{t}+B_{tj}\bar{u}_{t}+b_{tj}\big{)}.

The subgradient Ψ(,θ)\Psi^{\prime}(\cdot,\theta) can be easily computed in many interesting examples (e.g., [47, Section 6.3.2]). For example, consider convex combination of expectation and Average Value-at-Risk measure, defined in (4.6). The corresponding function Ψ(z,θ)\Psi(z,\theta) of the form (5.33), is piecewise linear in zz. Then Ψ(z,θ)=1λt\Psi^{\prime}(z,\theta)=1-\lambda_{t} if z<θz<\theta, Ψ(z,θ)=1λt+λt(1αt)1\Psi^{\prime}(z,\theta)=1-\lambda_{t}+\lambda_{t}(1-\alpha_{t})^{-1} if z>θz>\theta; and if z=θz=\theta, then Ψ(z,θ)\Psi^{\prime}(z,\theta) can be any point in the interval [1λt,1λt+λt(1αt)1][1-\lambda_{t},1-\lambda_{t}+\lambda_{t}(1-\alpha_{t})^{-1}].
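For the convex combination of expectation and Average Value-at-Risk, the function \Psi of (5.33) and the subgradient just described take the following form (a sketch; at the kink z=\theta we return the left endpoint of the admissible interval):

def psi(z, theta, lam, alpha):
    """Psi(z, theta) of (5.33): convex combination of expectation and AV@R."""
    return (1.0 - lam) * z + lam * (theta + max(z - theta, 0.0) / (1.0 - alpha))

def psi_prime(z, theta, lam, alpha):
    """A subgradient of Psi(., theta) at z (left endpoint chosen at z = theta)."""
    return (1.0 - lam) + (lam / (1.0 - alpha) if z > theta else 0.0)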

The computed approximations V¯t+1()\underline{V}_{t+1}(\cdot) of the value functions define a feasible policy in the same way as in the risk neutral case. Also by construction we have that Vt()V¯t()V_{t}(\cdot)\geq\underline{V}_{t}(\cdot), t=1,,Tt=1,...,T. Therefore the algorithm produces a (deterministic) lower bound for the optimal value of the problem.

Formula (5.34) shows that the controls and value of the parameter θ\theta can be computed simultaneously. This leads to the following statistical upper bound for the (risk averse) value of the policy defined by the computed approximation. Let ξ^1,,ξ^T\hat{\xi}_{1},...,\hat{\xi}_{T} be a sample path of the data process. Suppose for the moment that t=𝔼{\cal R}_{t}={\mathbb{E}}, i.e., consider the risk neutral case. Let π\pi be a policy determined by controls u¯t\bar{u}_{t} and states x¯t\bar{x}_{t}, and c^t:=ct(x¯t,u¯t,ξ^t)\hat{c}_{t}:=c_{t}(\bar{x}_{t},\bar{u}_{t},\hat{\xi}_{t}), t=1,,Tt=1,...,T, and c^T+1:=cT+1(x¯T+1)\hat{c}_{T+1}:=c_{T+1}(\bar{x}_{T+1}), be cost values associated with the sample path. Then the corresponding point estimate of the expectation (3.1) of the total cost for the considered policy π\pi, can be computed iteratively going backward in time with 𝔳T+1:=c^T+1{\mathfrak{v}}_{T+1}:=\hat{c}_{T+1} and

𝔳t:=c^t+𝔳t+1,t=T,,1.{\mathfrak{v}}_{t}:=\hat{c}_{t}+{\mathfrak{v}}_{t+1},\;\;t=T,...,1. (5.36)

Value 𝔳1{\mathfrak{v}}_{1} gives an unbiased estimate of the corresponding expected total cost.

This process can be adapted for general risk averse case as follows. Let A^t=At(ξ^t)\hat{A}_{t}=A_{t}(\hat{\xi}_{t}), B^t=Bt(ξ^t)\hat{B}_{t}=B_{t}(\hat{\xi}_{t}) and b^t=bt(ξ^t)\hat{b}_{t}=b_{t}(\hat{\xi}_{t}), t=1,,Tt=1,...,T, be realizations of the parameters corresponding to the generated sample path. For the approximations V¯t()\underline{V}_{t}(\cdot) consider the corresponding policy with controls u¯t\bar{u}_{t} and parameters θ¯t\bar{\theta}_{t} computed going forward, starting with initial value x¯1\bar{x}_{1} and using (compare with (5.24) - (5.25))

(u¯t,θ¯t)\displaystyle(\bar{u}_{t},\bar{\theta}_{t}) \displaystyle\in argminut𝒰t(x¯t),θΘj=1NptjΨ(ctj(x¯t,ut)+V¯t+1(Atjx¯t+Btjut+btj),θ),\displaystyle\mathop{\rm arg\,min}\limits_{u_{t}\in{\cal U}_{t}(\bar{x}_{t}),\,\theta\in\Theta}\sum_{j=1}^{N}p_{tj}\Psi\big{(}c_{tj}(\bar{x}_{t},u_{t})+\underline{V}_{t+1}(A_{tj}\bar{x}_{t}+B_{tj}u_{t}+b_{tj}),\theta\big{)}, (5.37)
x¯t+1\displaystyle\bar{x}_{t+1} =\displaystyle= A^tx¯t+B^tu¯t+b^t.\displaystyle\hat{A}_{t}\ \bar{x}_{t}+\hat{B}_{t}\bar{u}_{t}+\hat{b}_{t}. (5.38)

Let c^t:=ct(x¯t,u¯t,ξ^t)\hat{c}_{t}:=c_{t}(\bar{x}_{t},\bar{u}_{t},\hat{\xi}_{t}), t=1,,Tt=1,...,T, and c^T+1:=cT+1(x¯T+1)\hat{c}_{T+1}:=c_{T+1}(\bar{x}_{T+1}), be cost values associated with the generated sample path. Consider the following values defined iteratively going backward in time: 𝔳T+1:=c^T+1{\mathfrak{v}}_{T+1}:=\hat{c}_{T+1} and

𝔳t:=Ψ(c^t+𝔳t+1,θ¯t),t=T,,1.{\mathfrak{v}}_{t}:=\Psi(\hat{c}_{t}+{\mathfrak{v}}_{t+1},\bar{\theta}_{t}),\;\;t=T,...,1. (5.39)

It is possible to show that 𝔼[𝔳1]{\mathbb{E}}[{\mathfrak{v}}_{1}] is greater than or equal to value of the policy defined by the considered approximate value functions, and hence is an upper bound on the optimal value of the risk averse problem (cf., [14]). Expectation 𝔼[𝔳1]{\mathbb{E}}[{\mathfrak{v}}_{1}] can be estimated by randomly generating sample paths (scenarios) and averaging the computed realizations of random variable 𝔳1{\mathfrak{v}}_{1}. This is similar to construction of statistical upper bound in the risk neutral setting. The inequality: “𝔼[𝔳1]{\mathbb{E}}[{\mathfrak{v}}_{1}]\geq value of the constructed policy”, is based on Jensen’s inequality applied to convex function Ψ(,θ)\Psi(\cdot,\theta) and, unlike the risk neutral case, can be strict. Therefore the above procedure introduces an additional error in the estimated optimality gap. This additional error becomes larger when function Ψ(,θ)\Psi(\cdot,\theta) is “more nonlinear”. Numerical experiments indicate that this procedure gives a reasonable upper bound, for instance for risk measures of the form (4.6) (cf., [14]).
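The backward recursions (5.36) and (5.39) along a single sample path can be sketched as follows (the argument psi stands for the function \Psi; passing the identity in z recovers the risk neutral recursion (5.36)):

def path_estimate(costs, thetas, psi):
    """costs  = [hat c_1, ..., hat c_T, hat c_{T+1}] along one sample path,
    thetas = [bar theta_1, ..., bar theta_T] from (5.37).
    Returns v_1; averaging over independent paths estimates E[v_1]."""
    v = costs[-1]                                  # v_{T+1} = hat c_{T+1}
    for c, theta in zip(reversed(costs[:-1]), reversed(thetas)):
        v = psi(c + v, theta)                      # v_t = Psi(hat c_t + v_{t+1}, bar theta_t)
    return v

For the risk neutral estimate (5.36) one would call path_estimate(costs, thetas, lambda z, th: z).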

QQ-factor approach.

Consider the dynamic equations (5.34) and define (see Remark 5.5)

Qt(xt,ut,θt):=𝔼t[Ψ(ct(xt,ut,ξt)+Vt+1(Atxt+Btut+bt),θt)].Q_{t}(x_{t},u_{t},\theta_{t}):={\mathbb{E}}_{{\mathbb{P}}_{t}}\left[\Psi\big{(}c_{t}(x_{t},u_{t},\xi_{t})+V_{t+1}(A_{t}x_{t}+B_{t}u_{t}+b_{t}),\theta_{t}\big{)}\right]. (5.40)

We have that

Vt(xt)=infut𝒰t,θtΘQt(xt,ut,θt),V_{t}(x_{t})=\inf\limits_{u_{t}\in{\cal U}_{t},\,\theta_{t}\in\Theta}Q_{t}(x_{t},u_{t},\theta_{t}),

and hence dynamic equations (5.34) can be written in terms of Qt(xt,ut,θt)Q_{t}(x_{t},u_{t},\theta_{t}) as

Qt(xt,ut,θt)=𝔼t[Ψ(ct(xt,ut,ξt)+infut+1𝒰t+1,θt+1ΘQt+1(Atxt+Btut+bt,ut+1,θt+1),θt)].Q_{t}(x_{t},u_{t},\theta_{t})={\mathbb{E}}_{{\mathbb{P}}_{t}}\left[\Psi\Big{(}c_{t}(x_{t},u_{t},\xi_{t})+\inf\limits_{u_{t+1}\in{\cal U}_{t+1},\,\theta_{t+1}\in\Theta}Q_{t+1}\big{(}A_{t}x_{t}+B_{t}u_{t}+b_{t},u_{t+1},\theta_{t+1}\big{)},\theta_{t}\Big{)}\right].

The cutting planes algorithm can be applied directly to functions Qt(xt,ut,θt)Q_{t}(x_{t},u_{t},\theta_{t}) rather than to the value functions Vt(xt)V_{t}(x_{t}). In the backward step of the algorithm, subgradients with respect to xt,utx_{t},u_{t} and θt\theta_{t}, of the current approximations of Qt(xt,ut,θt)Q_{t}(x_{t},u_{t},\theta_{t}), should be computed at the respective trial points, and the obtained cutting plane t(xt,ut,θt)\ell_{t}(x_{t},u_{t},\theta_{t}) is added to the current family of cutting planes of Qt(xt,ut,θt)Q_{t}(x_{t},u_{t},\theta_{t}). An advantage of that approach could be that the calculation of the respective subgradients does not require solving nonlinear optimization programs even if the function Ψ\Psi is not polyhedral.

5.3.3 Infinite horizon setting

An infinite horizon (stationary) counterpart of the SOC problem (3.1) - (3.2) is

minπΠ𝔼π[t=1γt1c(xt,ut,ξt)],\min\limits_{\pi\in\Pi}{\mathbb{E}}^{\pi}\Big{[}\sum_{t=1}^{\infty}\gamma^{t-1}c(x_{t},u_{t},\xi_{t})\Big{]}, (5.41)

where \gamma\in(0,1) is the so-called discount factor. The optimization (minimization) in (5.41) is over the set of feasible policies \Pi satisfying777When the number of scenarios is finite, the feasibility constraints should be satisfied for all possible realizations of the random process \xi_{t}. (w.p.1):

ut𝒰,xt+1=F(xt,ut,ξt),t1.u_{t}\in{\cal U},\;x_{t+1}=F(x_{t},u_{t},\xi_{t}),\;t\geq 1. (5.42)

It is assumed that ξtP\xi_{t}\sim P, t=1,,t=1,..., is an i.i.d. (independent identically distributed) sequence of random vectors, with common distribution PP. The cost c(x,u,ξ)c(x,u,\xi) and the mapping F(x,u,ξ)F(x,u,\xi) are assumed to be the same at every stage, and the (nonempty) feasibility set 𝒰{\cal U} of controls does not depend on the state and stage.

Bellman equation for the value function, associated with problem (5.41), can be written as

V(x)=infu𝒰𝔼P[c(x,u,ξ)+γV(F(x,u,ξ))].V(x)=\inf_{u\in{\cal U}}{\mathbb{E}}_{P}[c(x,u,\xi)+\gamma V(F(x,u,\xi))]. (5.43)

Assuming that c(x,u,\xi) is bounded, equation (5.43) has a unique solution \bar{V}(\cdot). The optimal policy for problem (5.41) is given by u_{t}=\pi(x_{t}), t=1,..., with

π(x)argminu𝒰𝔼P[c(x,u,ξ)+γV¯(F(x,u,ξ))].\pi(x)\in\mathop{\rm arg\,min}_{u\in{\cal U}}{\mathbb{E}}_{P}\big{[}c(x,u,\xi)+\gamma\bar{V}(F(x,u,\xi))\big{]}. (5.44)

Similar to Assumption 3.1, let us make the following assumption ensuring convexity of the solution \bar{V}(\cdot) of equation (5.43).

Assumption 5.2

The function c(x,u,ξ)c(x,u,\xi) is convex in (x,u)(x,u), the mapping F(x,u,ξ)=A(ξ)x+B(ξ)u+b(ξ)F(x,u,\xi)=A(\xi)x+B(\xi)u+b(\xi) is affine, and the set 𝒰{\cal U} is convex closed.

In order to solve equation (5.43) numerically we need to discretize the possibly continuous distribution PP. Suppose that the SAA method is applied, i.e., a sample ξ1,,ξN\xi^{1},...,\xi^{N} of size NN from the distribution PP is generated (say by Monte Carlo sampling techniques), and the distribution PP in Bellman equation (5.43) is replaced by its empirical estimate888By δξ\delta_{\xi} we denote measure assigning mass one at point ξ\xi, the so-called Dirac measure. PN=N1j=1NδξjP_{N}=N^{-1}\sum_{j=1}^{N}\delta_{\xi^{j}}, assigning probability 1/N1/N to each generated point. This raises the question of the involved sample complexity, i.e., how large should be the sample size NN in order for the SAA problem to give an accurate approximation of the original problem. In some applications the discount factor is very close to one. It is well known that as the discount factor γ\gamma approaches one, it becomes more difficult to solve the problem. Note that the optimal value of problem (5.41) increases at the rate of O((1γ)1)O((1-\gamma)^{-1}) as γ\gamma approaches one. It is shown in [45] that the required sample size NN is of order O((1γ)1ε2)O((1-\gamma)^{-1}\varepsilon^{-2}) as a function of the discount factor γ\gamma and the error level ε>0\varepsilon>0. That is, the error grows more or less proportionally to the optimal value as γ\gamma approaches one. This indicates that the sample size required to achieve a specified relative error is not sensitive to the discount factor being close to one.

Suppose now that random vector ξ\xi has a finite number ξ1,,ξN\xi^{1},...,\xi^{N} of possible values with respective probabilities p1,,pNp_{1},...,p_{N}. For example, in the SAA approach this is a random sample generated by Monte Carlo sampling techniques, in which case pi=1/Np_{i}=1/N, i=1,,Ni=1,...,N. In that setting Bellman equation (5.43) can be written as

V(x)=infu𝒰j=1Npj[cj(x,u)+γV(Ajx+Bju+bj)],V(x)=\inf_{u\in{\cal U}}\,\sum_{j=1}^{N}p_{j}[c_{j}(x,u)+\gamma V(A_{j}x+B_{j}u+b_{j})], (5.45)

where cj(x,u):=c(x,u,ξj)c_{j}(x,u):=c(x,u,\xi^{j}), Aj:=A(ξj)A_{j}:=A(\xi^{j}), Bj:=B(ξj)B_{j}:=B(\xi^{j}) and bj:=b(ξj)b_{j}:=b(\xi^{j}), j=1,,Nj=1,...,N.

A cutting planes algorithm of SDDP type can be applied to equation (5.45). Given a current approximation V¯()\underline{V}(\cdot) of the value function by cutting planes and a trial point x~\tilde{x}, new cutting plane (x)=v~+𝗀(xx~)\ell(x)=\tilde{v}+{\sf g}^{\top}(x-\tilde{x}) is added to the set of cutting planes of the value function, where

u~\displaystyle\tilde{u}\in argminu𝒰j=1Npj[cj(x~,u)+γV¯(Ajx~+Bju+bj)],\displaystyle\mathop{\rm arg\,min}\limits_{u\in{\cal U}}\sum_{j=1}^{N}p_{j}[c_{j}(\tilde{x},u)+\gamma\underline{V}(A_{j}\tilde{x}+B_{j}u+b_{j})], (5.46)
v~=\displaystyle\tilde{v}= j=1Npj[cj(x~,u~)+γV¯(Ajx~+Bju~+bj)],\displaystyle\sum_{j=1}^{N}p_{j}[c_{j}(\tilde{x},\tilde{u})+\gamma\underline{V}(A_{j}\tilde{x}+B_{j}\tilde{u}+b_{j})], (5.47)
𝗀=\displaystyle{\sf g}= j=1Npj[cj(x~,u~)+γAjV¯(Ajx~+Bju~+bj)].\displaystyle\sum_{j=1}^{N}p_{j}[\nabla c_{j}(\tilde{x},\tilde{u})+\gamma A_{j}^{\top}\nabla\underline{V}(A_{j}\tilde{x}+B_{j}\tilde{u}+b_{j})]. (5.48)

If c(x,u,ξ)c(x,u,\xi) is linear (polyhedral) in uu and the set 𝒰{\cal U} is polyhedral given by linear inequalities, the minimization problem in the right hand side of (5.46) can be formulated as a linear programming problem. The subgradient of V¯()\underline{V}(\cdot) at a point xx can be computed in the same way as in (5.22) by using subgradient of its supporting plane at xx.
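A sketch of one cut update (5.46) - (5.48) for the stationary value function is given below (Python; for simplicity the minimization over u\in{\cal U} is replaced by enumeration of a finite candidate set of controls, which is our simplification; costs[j](x,u) and grad_costs_x[j](x,u) are callables evaluating c_{j} and a subgradient of c_{j}(\cdot,u), the cuts of \underline{V} are stored as triples (value, gradient, point), and the cut list is assumed nonempty):

import numpy as np

def stationary_cut(x_trial, U_candidates, costs, grad_costs_x, A, B, b, p, cuts, gamma):
    """Returns (v_tilde, g) defining the new cut at x_trial, cf. (5.46)-(5.48)."""
    def V(y):                                    # evaluate underline{V} and an active gradient
        vals = [v + g @ (y - pt) for v, g, pt in cuts]
        k = int(np.argmax(vals))
        return vals[k], cuts[k][1]

    def objective(u):                            # expected stage cost plus discounted underline{V}
        return sum(pj * (cj(x_trial, u) + gamma * V(Aj @ x_trial + Bj @ u + bj)[0])
                   for cj, Aj, Bj, bj, pj in zip(costs, A, B, b, p))

    u_best = min(U_candidates, key=objective)    # approximate minimizer in (5.46)
    v_tilde = objective(u_best)                  # value (5.47)
    g = np.zeros_like(x_trial, dtype=float)
    for cj, qj, Aj, Bj, bj, pj in zip(costs, grad_costs_x, A, B, b, p):
        y = Aj @ x_trial + Bj @ u_best + bj
        g += pj * (qj(x_trial, u_best) + gamma * Aj.T @ V(y)[1])   # subgradient (5.48)
    return v_tilde, g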

Similar to the case of finite horizon, by construction we have that \bar{V}(\cdot)\geq\underline{V}(\cdot), and hence \underline{V}(x_{1}) gives a (deterministic) lower bound for the optimal value of problem (5.41) with initial value x_{1} of the state vector. Important questions related to the above algorithm are how to generate trial points and how to construct an upper bound for the optimal value of problem (5.41). The current implementation, which was tested in numerical experiments, is to apply the forward step of the cutting planes algorithm to a truncated version of problem (5.41). Note that

|t=T+1γt1c(xt,ut,ξt)|t=T+1γt1|c(xt,ut,ξt)|κγT1γ,\left|\sum_{t=T+1}^{\infty}\gamma^{t-1}c(x_{t},u_{t},\xi_{t})\right|\leq\sum_{t=T+1}^{\infty}\gamma^{t-1}\left|c(x_{t},u_{t},\xi_{t})\right|\leq\frac{\kappa\gamma^{T}}{1-\gamma}, (5.49)

where κ\kappa is an upper bound for |c(x,u,ξ)||c(x,u,\xi)|. Therefore the suggestion is to choose the horizon TT such that κγT(1γ)1ε\kappa\gamma^{T}(1-\gamma)^{-1}\leq\varepsilon, where ε>0\varepsilon>0 is a prescribed accuracy. Then the forward steps are applied to the finite horizon version of the problem using current approximation V¯()\underline{V}(\cdot) of the value function. One run of the forward step, applied to the truncated version, will generate a point estimate (up to accuracy ε\varepsilon) of value of the constructed policy, and T1T-1 trial points. It is possible to choose, say at random, one of these trial points for construction of the cutting plane for the value function.
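The choice of the truncation horizon amounts to the following simple computation (for instance, with \kappa=10, \gamma=0.99 and \varepsilon=0.1 it gives T=917, illustrating that the required horizon may be large when \gamma is close to one):

import math

def truncation_horizon(kappa, gamma, eps):
    """Smallest T with kappa * gamma**T / (1 - gamma) <= eps, cf. (5.49)."""
    return max(1, math.ceil(math.log(eps * (1.0 - gamma) / kappa) / math.log(gamma)))

print(truncation_horizon(10.0, 0.99, 0.1))   # prints 917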

Such a choice of the trial points may not be optimal. The question of the “best way” to construct trial points in the above algorithm has not been carefully investigated. Another problem with this procedure is that when the discount factor is close to one, the required horizon T could be too large for a numerical implementation; recall that in order to construct the statistical upper bound the forward step should be run a reasonable number of times. In such cases, if the problem can be formulated as a linear program, deterministic upper bounds obtained from the dual problem can be more efficient (cf., [46]).

Risk averse case.

The infinite horizon risk neutral formulation (5.41) can be extended to the risk averse setting by replacing the expectation operator in Bellman equation (5.43) with a coherent risk measure {\cal R}. That is, risk averse counterpart of Bellman equation (5.43) is

V(x)=\inf_{u\in{\cal U}}{\cal R}[c(x,u,\xi)+\gamma V(F(x,u,\xi))]. (5.50)

This equation corresponds to the nested counterpart of the risk neutral problem (5.41):

minπΠlimTT(t=1Tγt1c(xt,ut,ξt)),\begin{array}[]{ll}\min\limits_{\pi\in\Pi}\ \lim\limits_{T\to\infty}{\mathfrak{R}}_{T}\left(\sum_{t=1}^{T}\gamma^{t-1}c(x_{t},u_{t},\xi_{t})\right),\end{array} (5.51)

where T{\mathfrak{R}}_{T} is the corresponding TT-stage nested risk measure (cf., [42]). That is, as it was discussed in section 3, for a considered policy, state xtx_{t} and control utu_{t} at stage tt are functions of ξ[t1]\xi_{[t-1]}. Consider the corresponding cost Zt:=γt1c(xt,ut,ξt)Z_{t}:=\gamma^{t-1}c(x_{t},u_{t},\xi_{t}) which is a function of ξ[t]\xi_{[t]}. Then

{\mathfrak{R}}_{T}\left(\sum_{t=1}^{T}Z_{t}\right)={\cal R}\Big{(}Z_{1}+{\cal R}_{|\xi_{[1]}}\big{(}Z_{2}+...+{\cal R}_{|\xi_{[T-1]}}(Z_{T})\big{)}\Big{)}, (5.52)

where |ξ[t]{\cal R}_{|\xi_{[t]}} is the conditional counterpart of {\cal R}.

The backward step of the cutting planes algorithm in the risk averse setting is basically the same as the one discussed above for the risk neutral case. In order to construct a statistical upper bound we can proceed similar to the construction of section 5.3.2. That is, suppose that risk measure {\cal R} is of the form (5.32) with function Ψ(z,θ)\Psi(z,\theta) satisfying conditions (i) - (iii) specified in section 5.3.2. Suppose also that ξ\xi has a finite number ξ1,,ξN\xi^{1},...,\xi^{N} of possible values with reference probabilities p1,,pNp_{1},...,p_{N}. Then Bellman equation (5.50) can be written as

V(x)=infu𝒰,θΘj=1NpjΨ(cj(x,u)+γV(Ajx+Bju+bj),θ).V(x)=\inf_{u\in{\cal U},\,\theta\in\Theta}\,\sum_{j=1}^{N}p_{j}\Psi\big{(}c_{j}(x,u)+\gamma V(A_{j}x+B_{j}u+b_{j}),\theta\big{)}. (5.53)

A statistical upper bound can be constructed in a way similar to section 5.3.2 using a truncated version of problem (5.41).

Periodical setting

In various applications the data process $\xi_{t}$ has a periodical behavior. One such example is the hydropower generation planning discussed in [51]. The uncertain process in that example is water inflows recorded on a monthly basis. Historical data of 79 years of records of the natural monthly energy inflow are available. This allows modeling the uncertain process as a periodical random process with period ${\sf p}=12$. The planning in that example was for 5 years, with another 5 years added to eliminate the end of horizon effect. This resulted in a $T=120$ stage stochastic optimization problem. The periodical approach, which we discuss below, allows a drastic reduction in the number of stages while giving almost the same policy value for the first stage.

In [48] such a periodical approach was introduced from the stochastic programming point of view. We discuss this in the framework of SOC. Consider the following infinite horizon SOC problem with discount factor $\gamma\in(0,1)$:

minut𝒰t\displaystyle\min\limits_{u_{t}\in{\cal U}_{t}} 𝔼π[t=1γt1ct(xt,ut,ξt)]\displaystyle{\mathbb{E}}^{\pi}\left[\sum\limits_{t=1}^{\infty}\gamma^{t-1}c_{t}(x_{t},u_{t},\xi_{t})\right] (5.54)
s.t.\displaystyle{\rm s.t.} xt+1=Ft(xt,ut,ξt),t1.\displaystyle x_{t+1}=F_{t}(x_{t},u_{t},\xi_{t}),\;t\geq 1. (5.55)

We assume the following periodic structure, with period ${\sf p}\geq 1$: the random data process $\xi_{t}$ is stagewise independent with $\xi_{t}$ and $\xi_{t+{\sf p}}$ having the same probability distribution, and with $c_{t}(\cdot,\cdot,\cdot)=c_{t+{\sf p}}(\cdot,\cdot,\cdot)$, $F_{t}(\cdot,\cdot,\cdot)=F_{t+{\sf p}}(\cdot,\cdot,\cdot)$ and ${\cal U}_{t}={\cal U}_{t+{\sf p}}$, for $t\geq 1$.

Bellman equations for this problem take the form (compare with (3.10))

Vt(xt)\displaystyle V_{t}(x_{t}) =\displaystyle= infut𝒰t𝔼[ct(xt,ut,ξt)+γVt+1(Ft(xt,ut,ξt))],t=1,,𝗉1,\displaystyle\inf\limits_{u_{t}\in{\cal U}_{t}}{\mathbb{E}}\left[c_{t}(x_{t},u_{t},\xi_{t})+\gamma V_{t+1}\big{(}F_{t}(x_{t},u_{t},\xi_{t})\big{)}\right],\;t=1,...,{\sf p}-1, (5.56)
V𝗉(x𝗉)\displaystyle V_{\sf p}(x_{\sf p}) =\displaystyle= infu𝗉𝒰𝗉𝔼[c𝗉(x𝗉,u𝗉,ξ𝗉)+γV1(F𝗉(x𝗉,u𝗉,ξ𝗉))],t=𝗉.\displaystyle\inf\limits_{u_{\sf p}\in{\cal U}_{\sf p}}{\mathbb{E}}\left[c_{\sf p}(x_{\sf p},u_{\sf p},\xi_{\sf p})+\gamma V_{1}\big{(}F_{\sf p}(x_{\sf p},u_{\sf p},\xi_{\sf p})\big{)}\right],\;t={\sf p}. (5.57)

For period ${\sf p}=1$, the periodical problem (5.54)-(5.55) coincides with the stationary problem (5.41) of section 5.3.3, and equation (5.57) becomes Bellman equation (5.43). The optimal policy for problem (5.54)-(5.55) is given by $u_{t}=\pi_{t}(x_{t})$ with

\pi_{t}(x_{t})\in\mathop{\rm arg\,min}\limits_{u_{t}\in{\cal U}_{t}}{\mathbb{E}}\left[c_{t}(x_{t},u_{t},\xi_{t})+\gamma\bar{V}_{t+1}\big{(}F_{t}(x_{t},u_{t},\xi_{t})\big{)}\right],\;t=1,\ldots,{\sf p}-1, (5.58)
\pi_{\sf p}(x_{\sf p})\in\mathop{\rm arg\,min}\limits_{u_{\sf p}\in{\cal U}_{\sf p}}{\mathbb{E}}\left[c_{\sf p}(x_{\sf p},u_{\sf p},\xi_{\sf p})+\gamma\bar{V}_{1}\big{(}F_{\sf p}(x_{\sf p},u_{\sf p},\xi_{\sf p})\big{)}\right],\;t={\sf p}, (5.59)

and πt+𝗉()=πt()\pi_{t+{\sf p}}(\cdot)=\pi_{t}(\cdot) for t1t\geq 1, where (V¯1,,V¯𝗉)(\bar{V}_{1},...,\bar{V}_{\sf p}) is the solution of equations (5.56) - (5.57).
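
To make the fixed point system (5.56)-(5.57) concrete, the following minimal Python sketch performs backward sweeps of value iteration on a discretized one-dimensional state space with sampled realizations of $\xi_{t}$; the cost, dynamics, grids and discount factor are hypothetical stand-ins, not data from [51].

```python
import numpy as np

# Minimal sketch: fixed-point iteration for the periodic Bellman equations
# (5.56)-(5.57).  One-dimensional state, discretized state/control grids and
# sampled realizations of xi_t; all problem data below are illustrative.
rng = np.random.default_rng(0)
p, gamma = 12, 0.8                                 # period and discount factor
x_grid = np.linspace(0.0, 1.0, 41)                 # discretized state space
u_grid = np.linspace(0.0, 1.0, 11)                 # discretized control set U_t
xi = [rng.uniform(0.0, 0.5, size=20) for _ in range(p)]   # samples of xi_t

def cost(x, u, xi_t):                              # stage cost c_t(x, u, xi)
    return (x - u) ** 2 + 0.1 * u + xi_t

def dynamics(x, u, xi_t):                          # F_t(x, u, xi), kept in [0, 1]
    return np.clip(x + u - xi_t, 0.0, 1.0)

V = np.zeros((p, x_grid.size))                     # V_t on the grid, t = 1,...,p
for sweep in range(100):
    V_old = V.copy()
    for t in reversed(range(p)):                   # backward sweep over one period
        V_next = V[(t + 1) % p]                    # V_{p+1} is identified with V_1
        for i, x in enumerate(x_grid):
            q = [np.mean(cost(x, u, xi[t])
                         + gamma * np.interp(dynamics(x, u, xi[t]), x_grid, V_next))
                 for u in u_grid]
            V[t, i] = min(q)                       # expectation, then minimization over u
    if np.max(np.abs(V - V_old)) < 1e-4:           # stop when a sweep no longer changes V
        break
print("approximate V_1(0.5):", np.interp(0.5, x_grid, V[0]))
```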

Such a periodical formulation can be extended to the (nested) risk averse setting, and cutting planes type algorithms can be applied in a way similar to what was discussed above (cf., [48]).

6 Computational Complexity of Cutting Plane Methods

As mentioned earlier, SDDP reduces to Kelley’s cutting plane method for two stage problems, and it can behave poorly as the number of decision variables increases. On the other hand, numerical experiments indicate that this type of algorithm can scale well with respect to the number of stages $T$. In this section we summarize some recent theoretical studies in [22, 17] on the rate of convergence of SDDP. We first establish the number of iterations required by the SDDP method, and then present a variant of SDDP that can improve this complexity bound in terms of the dependence on the number of stages. Related developments, with a focus more on mixed-integer nonlinear optimization, can also be found in [53].

6.1 Computational complexity of SDDP

Consider the multi-stage stochastic programming problem (2.1)-(2.2). Similar to the previous section, we assume that the $\xi_{t}$'s are stagewise independent (i.e., Assumption 2.2) and finitely supported (i.e., Assumption 5.1). For simplicity we further assume that the probabilities for $\xi_{ti}$, $i=1,\ldots,N$, are equal, i.e., $p_{ti}=1/N$, $t=2,\ldots,T$, $i=1,\ldots,N$. In that case, the dynamic equations in (2.8)-(2.9) reduce to

Qt(xt1,ξti)=infxt𝒳t{Fti(xt):=ft(xt,ξti)+𝒬t+1(xt):Btixt1+Atixt=bti},Q_{t}(x_{t-1},\xi_{ti})=\inf_{x_{t}\in{\cal X}_{t}}\left\{F_{ti}(x_{t}):=f_{t}(x_{t},\xi_{ti})+{\cal Q}_{t+1}(x_{t}):B_{ti}x_{t-1}+A_{ti}x_{t}=b_{ti}\right\}, (6.1)

where

𝒬t+1(xt)=1Ni=1NQt+1(xt,ξt+1,i).{\cal Q}_{t+1}(x_{t})=\tfrac{1}{N}{\textstyle\sum}_{i=1}^{N}Q_{t+1}(x_{t},\xi_{t+1,i}). (6.2)

The basic scheme of the SDDP method has been discussed in Section 5.1. Here we add a few more details to facilitate its complexity analysis. Each iteration of the SDDP method to solve problem (2.1)-(2.2) contains a forward step and a backward step. In the forward step of the SDDP method, we randomly pick an index $i_{t}$ out of $\{1,\ldots,N\}$ and solve the problem

xtkargmin{F¯titk1(xt):=ft(xt,ξtit)+𝒬¯t+1k1(xt)}s.t.Btitxt1k+Atitxt=btit,xt𝒳t,\begin{array}[]{lll}x_{t}^{k}&\in&\mathop{\rm arg\,min}\left\{\underline{F}_{t{i_{t}}}^{k-1}(x_{t}):=f_{t}(x_{t},\xi_{t{i_{t}}})+\underline{\cal Q}_{t+1}^{k-1}(x_{t})\right\}\\ &&\ \ \ {\mbox{s.t.}\ \ \ B_{ti_{t}}x_{t-1}^{k}+A_{ti_{t}}x_{t}=b_{ti_{t}}},\;x_{t}\in{\cal X}_{t},\end{array}

to update xtkx_{t}^{k}. Equivalently, one can view xtkx_{t}^{k} as being randomly chosen from x~tik\tilde{x}_{ti}^{k}, i=1,,Ni=1,\ldots,N, defined as

x~tikargmin{F¯tik1(xt):=ft(xt,ξti)+𝒬¯t+1k1(xt)}s.t.Btixt1k+Atixt=bti,xt𝒳t.\begin{array}[]{lll}\tilde{x}_{ti}^{k}&\in&\mathop{\rm arg\,min}\left\{\underline{F}_{ti}^{k-1}(x_{t}):=f_{t}(x_{t},\xi_{ti})+\underline{\cal Q}_{t+1}^{k-1}(x_{t})\right\}\\ &&\ \ \ {\mbox{s.t.}\ \ \ B_{ti}x_{t-1}^{k}+A_{ti}x_{t}=b_{ti}},\;x_{t}\in{\cal X}_{t}.\end{array} (6.3)

Note that we do not need to compute $\tilde{x}_{ti}^{k}$ for $i\neq i_{t}$, even though they will be used in the analysis of the SDDP method. In the next section, we will present a deterministic dual dynamic programming method which chooses the feasible solution $x_{t}^{k}$ in the forward step in a more aggressive manner.

In the backward step of SDDP, starting from the last stage, we update the cutting plane models 𝒬¯tk\underline{\cal Q}_{t}^{k} according to

\underline{\cal Q}_{t}^{k}(x_{t-1})=\max\left\{\underline{\cal Q}_{t}^{k-1}(x_{t-1}),\tfrac{1}{N}{\textstyle\sum}_{i=1}^{N}\left[{\mathfrak{Q}}_{ti}^{k}(x_{t-1}^{k})+\langle({\mathfrak{Q}}_{ti}^{k})^{\prime}(x_{t-1}^{k}),x_{t-1}-x_{t-1}^{k}\rangle\right]\right\},

for t=T,T1,,2t=T,T-1,\ldots,2. Here 𝔔tik(xt1k){\mathfrak{Q}}_{ti}^{k}(x_{t-1}^{k}) and (𝔔tik)(xt1k)({\mathfrak{Q}}_{ti}^{k})^{\prime}(x_{t-1}^{k}), respectively, denote the optimal value and a subgradient at the point xt1kx_{t-1}^{k} for the following approximate value function

𝔔tik(xt1k)=min{F¯tik(xt):=ft(xt,ξti)+𝒬¯t+1k(xt)}s.t.Btixt1k+Atixt=bti,xt𝒳t.\begin{array}[]{lll}{\mathfrak{Q}}_{ti}^{k}(x_{t-1}^{k})&=&\min\left\{\underline{F}_{ti}^{k}(x_{t}):=f_{t}(x_{t},\xi_{ti})+\underline{\cal Q}_{t+1}^{k}(x_{t})\right\}\\ &&\ \ {\mbox{s.t.}\ \ B_{ti}x_{t-1}^{k}+A_{ti}x_{t}=b_{ti}},x_{t}\in{\cal X}_{t}.\end{array}
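
The cutting plane model maintained in the backward step can be represented simply as a collection of affine minorants. The following minimal Python sketch shows this data structure and the averaged-cut update displayed above; the per-scenario optimal values ${\mathfrak{Q}}_{ti}^{k}(x_{t-1}^{k})$ and subgradients are assumed to be supplied by a stage subproblem solver, which is not shown.

```python
import numpy as np

class CutModel:
    """Piecewise linear lower model  Q_t(x) >= max_j (alpha_j + <beta_j, x>).

    Minimal sketch of the object updated in the backward step of SDDP/EDDP;
    how the scenario values/subgradients are obtained (stage solves) is left out.
    """
    def __init__(self, n, lower_bound=0.0):
        self.alphas = [lower_bound]              # start with a trivial constant cut
        self.betas = [np.zeros(n)]

    def value(self, x):
        """Evaluate the current lower approximation at x."""
        return max(a + b @ x for a, b in zip(self.alphas, self.betas))

    def add_average_cut(self, x_trial, values, subgrads):
        """Backward-step update: average the scenario cuts at the trial point.

        values[i]   ~  optimal value of the i-th scenario subproblem at x_trial
        subgrads[i] ~  a subgradient of that value function at x_trial
        """
        v = np.mean(values)
        g = np.mean(np.asarray(subgrads), axis=0)
        self.alphas.append(v - g @ x_trial)      # intercept of the new averaged cut
        self.betas.append(g)

# toy usage with made-up numbers for a 2-dimensional state
model = CutModel(n=2)
model.add_average_cut(np.array([1.0, 0.5]), values=[3.0, 4.0],
                      subgrads=[np.array([1.0, 0.0]), np.array([0.0, 2.0])])
print(model.value(np.array([1.0, 0.5])))         # 3.5 = average scenario value
```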

In order to assess the progress made by each SDDP iteration, we need to introduce a few notions. We say that a search point $x_{t}^{k}$ is $\epsilon_{t}$-saturated at iteration $k$ if ${\cal Q}_{t+1}(x_{t}^{k})-\underline{\cal Q}_{t+1}^{k}(x_{t}^{k})\leq\epsilon_{t}$. Intuitively, a point $x_{t}^{k}$ is saturated if we have a good approximation of the value function ${\cal Q}_{t+1}$ at $x_{t}^{k}$. Moreover, we say that an $\epsilon_{t}$-saturated search point $x_{t}^{k}$ at stage $t$ is $\delta_{t}$-distinguishable if $\|x_{t}^{k}-x_{t}^{j}\|>\delta_{t}$ for all other $\epsilon_{t}$-saturated search points $x_{t}^{j}$, $j\leq k-1$, that have been generated for stage $t$ so far by the algorithm. Let us denote by $S_{t}^{k-1}$ the set of saturated points at stage $t$. An $\epsilon_{t}$-saturated search point $x_{t}^{k}$ is $\delta_{t}$-distinguishable if $\mathop{\rm dist}(x_{t}^{k},S_{t}^{k-1})>\delta_{t}$. Recall that $\mathop{\rm dist}(x,S)$ denotes the distance from the point $x$ to the set $S$. Hence, in this case, the search point $x_{t}^{k}$ is far enough away from the other saturated points. The rate of convergence of SDDP depends on how fast it generates $\epsilon_{t}$-saturated and $\delta_{t}$-distinguishable search points. For simplicity we assume in this paper that $\epsilon_{t}=\delta_{t}=\epsilon$, $t=1,\ldots,T$, for a given tolerance $\epsilon>0$.

Under certain regularity assumptions, it can be shown that the functions $F_{ti}$ and the lower approximation functions $\underline{F}_{ti}^{k}$ are Lipschitz continuous with constant $M$. Utilizing these continuity assumptions, it can be shown that the probability for SDDP to find a new $\epsilon$-distinguishable and $\epsilon$-saturated search point at the $k$-th iteration can be bounded from below by

1N¯(1𝖯𝗋{g~tkϵ,t=1,,T1}),\tfrac{1}{\bar{N}}(1-{\sf Pr}\{\tilde{g}_{t}^{k}\leq\epsilon,t=1,\ldots,T-1\}), (6.4)

where

N¯:=NT2andg~tk:=1Ni=1Ndist(x~tik,Stk1).\bar{N}:=N^{T-2}\ \ \mbox{and}\ \ \ \tilde{g}_{t}^{k}:=\tfrac{1}{N}{\textstyle\sum}_{i=1}^{N}\mathop{\rm dist}(\tilde{x}_{ti}^{k},S_{t}^{k-1}). (6.5)

On the other hand, if for some iteration kk, we have g~tkϵ\tilde{g}_{t}^{k}\leq\epsilon for all t=1,,T1t=1,\ldots,T-1, then

F11(x1k)FF11(x1k)F¯11k1(x1k)2M(T1)ϵ.F_{11}(x_{1}^{k})-F^{*}\leq F_{11}(x_{1}^{k})-\underline{F}^{k-1}_{11}(x_{1}^{k})\leq 2M(T-1)\,\epsilon. (6.6)

Note that the upper bound 2M(T1)2M(T-1) can be further reduced so that it does not depend on TT if a discounting factor is incorporated in each stage of the problem.

Now let KK denote the number of iterations performed by the SDDP method before it finds a forward path (x1k,,xTk)(x_{1}^{k},\ldots,x_{T}^{k}) such that (6.6) holds. Using the previous observation in (6.4), it can be shown (see [22]) that 𝔼[K]K¯ϵN¯+2{\mathbb{E}}[K]\leq\bar{K}_{\epsilon}\bar{N}+2, where N¯\bar{N} is defined in (6.5) and

K¯ϵ:=(T1)(Dϵ+1)n.\bar{K}_{\epsilon}:=(T-1)\left(\tfrac{D}{\epsilon}+1\right)^{n}. (6.7)

Here ntnn_{t}\leq n and DtDD_{t}\leq D for t=1,,Tt=1,\ldots,T denote the dimension and diameter of the feasible sets 𝒳t{\cal X}_{t}, respectively. In addition, for any α1\alpha\geq 1, we have

𝖯𝗋{KαK¯ϵN¯+1}exp((α1)2K¯ϵ22αN¯).{\sf Pr}\{K\geq\alpha\bar{K}_{\epsilon}\bar{N}+1\}\leq\exp\left(-\tfrac{(\alpha-1)^{2}\bar{K}_{\epsilon}^{2}}{2\alpha\bar{N}}\right).

We now add a few remarks about the above complexity results for SDDP. Firstly, since SDDP is a randomized algorithm, we provide bounds on the expected number of iterations required to find an approximate solution of problem (2.1)-(2.2). We also show that the probability of having large deviations from these expected bounds for SDDP decays exponentially fast. Secondly, the complexity bound for the SDDP method depends on N¯\bar{N}, which increases exponentially with respect to TT. We will show in the next section how to reduce the dependence on TT by presenting a variant of the SDDP method.
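
For a rough feeling of how these bounds scale, the following snippet evaluates $\bar{K}_{\epsilon}$ from (6.7), together with the resulting EDDP and SDDP iteration bounds, for purely illustrative parameter values (the numbers are hypothetical, not taken from any experiment).

```python
# Illustrative evaluation of the bound (6.7) and of the SDDP/EDDP iteration bounds;
# all parameter values below are hypothetical.
T, N, n, D, eps = 5, 20, 4, 1.0, 0.1
K_eps = (T - 1) * (D / eps + 1) ** n             # (6.7): exponential in the dimension n
N_bar = N ** (T - 2)                             # (6.5): exponential in the horizon T
print(f"K_eps      ~ {K_eps:.2e}")
print(f"EDDP bound ~ {K_eps + 1:.2e}")           # independent of N_bar
print(f"SDDP bound ~ {K_eps * N_bar + 2:.2e}")   # grows with N_bar, i.e., with T
```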

6.2 Explorative Dual Dynamic Programming

The SDDP method in the previous section chooses the feasible solution xtkx_{t}^{k} in a randomized manner. In this section, we will introduce a deterministic method, called Explorative Dual Dynamic Programming (EDDP) first presented in [22], which chooses the feasible solution in the forward step in an aggressive manner. As we will see, the latter approach will exhibit better iteration complexity while the former one is easier to implement.

The EDDP method consists of the forward step and backward step. In the forward step of EDDP, for each stage tt, we solve NN subproblems defined in (6.3) to compute the search points x~tik\tilde{x}_{ti}^{k}, i=1,,Ni=1,\ldots,N. For each x~tik\tilde{x}_{ti}^{k}, we further compute the quantity dist(x~tik,Stk1)\mathop{\rm dist}(\tilde{x}_{ti}^{k},S_{t}^{k-1}), the distance between x~tik\tilde{x}_{ti}^{k} and the set Stk1S_{t}^{k-1} of currently saturated search points in stage tt. Then we will choose from x~tik\tilde{x}_{ti}^{k}, i=1,,Ni=1,\ldots,N, the one with the largest value of dist(x~tik,Stk1)\mathop{\rm dist}(\tilde{x}_{ti}^{k},S_{t}^{k-1}) as xtkx_{t}^{k}, i.e., dist(xtk,Stk1)=maxi=1,,Ndist(x~tik,Stk1)\mathop{\rm dist}(x_{t}^{k},S_{t}^{k-1})=\max_{i=1,\ldots,N}\mathop{\rm dist}(\tilde{x}_{ti}^{k},S_{t}^{k-1}). We can break the ties arbitrarily. In view of the above discussion, the EDDP method always chooses the most “distinguishable” forward path to encourage exploration in an aggressive manner. This also explains the origin of the name EDDP. The backward step of EDDP is similar to SDDP.
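
The forward-step selection rule of EDDP can be written in a few lines. The sketch below (in Python) assumes the candidate points $\tilde{x}_{ti}^{k}$ have already been computed by solving the subproblems (6.3), and simply picks the candidate farthest from the current set of saturated points.

```python
import numpy as np

def eddp_select(candidates, saturated):
    """EDDP forward step at stage t: pick the most 'distinguishable' candidate.

    candidates : array of shape (N, n), the points  x~_{ti}^k, i = 1,...,N
    saturated  : array of shape (m, n), the set S_t^{k-1} of saturated points
    Returns the chosen x_t^k and its distance to S_t^{k-1}.
    """
    if saturated.size == 0:                      # nothing saturated yet
        return candidates[0], np.inf
    # dist(x, S) for every candidate x
    dists = np.linalg.norm(candidates[:, None, :] - saturated[None, :, :],
                           axis=2).min(axis=1)
    i_star = int(np.argmax(dists))               # ties broken arbitrarily
    return candidates[i_star], dists[i_star]

# toy usage with hypothetical candidate solutions of (6.3)
cands = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.1]])
S = np.array([[0.1, 0.0]])
x_k, d = eddp_select(cands, S)
print(x_k, d)        # picks [1., 1.], the candidate farthest from the saturated set
```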

Under the same regularity assumptions as in SDDP, it can be shown that each EDDP iteration will either generate at least one ϵ\epsilon-distinguishable and ϵ\epsilon-saturated search point at some stage t=1,,Tt=1,\ldots,T, or find a feasible solution x1kx_{1}^{k} of problem (2.1)-(2.2) such that

F11(x1k)F\displaystyle F_{11}(x_{1}^{k})-F^{*} 2M(T1)ϵ.\displaystyle\leq 2M(T-1)\,\epsilon. (6.8)

As a result, the total number of iterations performed by EDDP before finding a feasible policy of problem (2.1)-(2.2) satisfying (6.8) is bounded by $\bar{K}_{\epsilon}+1$, where $\bar{K}_{\epsilon}$ is defined in (6.7) (see [22]). This complexity bound does not depend on the number of scenarios $\bar{N}$, and hence significantly outperforms the one for the original SDDP method. Moreover, we can show that the relation $\mathop{\rm dist}(x_{1}^{k},S_{1}^{k-1})\leq\delta$ implies the relation in (6.8) for a properly chosen value of $\delta$, and hence it can be used as a termination criterion for the EDDP method. Note that it is possible to further reduce the dependence of the computational complexity of EDDP on $T$ so that the term $\bar{K}_{\epsilon}$ in (6.7) does not depend on $T$. However, this would require us to further modify EDDP by incorporating early termination of the forward step. More specifically, we need to terminate the forward step at some stage $\bar{t}\leq T$ and start the backward step from $\bar{t}$, whenever $\mathop{\rm dist}(x_{\bar{t}}^{k},S_{\bar{t}}^{k-1})$ falls within a certain threshold value (see [53, 17]).

It should be noted that although the complexity of SDDP is worse than that of EDDP, its performance in earlier steps of the algorithm should be similar to that of EDDP. Intuitively, for earlier iterations, the tolerance parameter $\delta_{t}$ used to define $\delta_{t}$-distinguishable points is large. As long as $\delta_{t}$ is large enough so that the solutions $\tilde{x}^{k}_{ti}$ are contained within a ball with diameter roughly in the order of $\delta_{t}$, one can choose any point randomly from $\tilde{x}^{k}_{ti}$ as $x^{k}_{t}$. In this case, SDDP will perform similarly to EDDP. This may explain why SDDP exhibits good practical performance in the low accuracy region. In the high accuracy region, the new EDDP algorithm seems to be a much better choice in terms of its theoretical complexity.

6.3 Complexity of EDDP and SDDP over an infinite horizon

It is possible to generalize EDDP and SDDP to solve stochastic programs over an infinite horizon (see Section 5.3.3), when the number of stages $T=\infty$, the cost function $f_{t}=f$, the feasible set ${\cal X}_{t}={\cal X}$ for all $t$, and the random variables $\xi_{t}$ are iid (independent identically distributed) and can be viewed as realizations of a random vector $\xi$. A common approach is to approximate the infinite horizon with a finite horizon. For example, using the relation (5.49) in the SOC problem, a horizon length of $T=O((\epsilon(1-\gamma))^{-1})$ suffices for a target accuracy $\epsilon$. A direct extension of the EDDP method shows that the computational complexity depends only quadratically on the time horizon $T$. Exploiting the fact that the infinite-horizon problem is stationary, it has been shown that a simple modification that combines the forward and backward steps can improve the dependence on $T$ from quadratic to linear. Alternative selection strategies, including the use of upper bounds on the cost-to-go function and random sampling as in SDDP, can also be incorporated (see [17] for more details).

7 Dynamic Stochastic Approximation Algorithms

7.1 Extension of Stochastic Approximation

Stochastic approximation (SA) has attracted much attention recently for solving static stochastic optimization problems given in the form of

minx𝒳{f(x):=𝔼ξ[F(x,ξ)]},\min_{x\in{\cal X}}\left\{f(x):={\mathbb{E}}_{\xi}[F(x,\xi)]\right\}, (7.1)

where 𝒳{\cal X} is a closed convex set, ξ\xi denotes the random vector and F(,ξ)F(\cdot,\xi) is a lower semicontinuous convex function. Observe that we can cast two-stage stochastic optimization in the form of (7.1) by viewing FF as the value function of the second-stage problem (see [28, 23, 19]).

The basic SA algorithm, initially proposed by Robbins and Monro [38], mimics the simple projected gradient descent method by replacing the exact gradient with its unbiased estimator. Important improvements of the SA methods have been made by Nemirovski and Yudin [29] and later by Polyak and Juditsky [35, 36]. During the past few years, significant progress has been made in SA methods, including the incorporation of momentum and variance reduction, and generalization to the nonconvex setting [21]. It has been shown in [28, 23] that SA type methods can significantly outperform the SAA approach for solving static (or two-stage) stochastic programming problems. However, it remains unclear whether these SA methods can be generalized for multi-stage stochastic optimization problems with $T\geq 3$.

In this section, we discuss a recently developed dynamic stochastic approximation (DSA) method for multi-stage stochastic optimization [24]. The basic idea of the DSA method is to apply an inexact primal-dual SA method for solving the tt-th stage optimization problem to compute an approximate stochastic subgradient for its associated value functions. For simplicity, we focus on the basic scheme of the DSA algorithm and its main convergence properties for solving three-stage stochastic optimization problems. We also briefly discuss how to generalize it for solving more general form of multi-stage stochastic optimization with T>3T>3.

The main problem of interest in this section is the following three-stage SP problem:

min\displaystyle\min f1(x1,ξ1)+\displaystyle f_{1}(x_{1},\xi_{1})+ 𝔼|ξ1[min\displaystyle{\mathbb{E}}_{|\xi_{1}}[\min f2(x2,ξ2)\displaystyle\ f_{2}(x_{2},\xi_{2}) +𝔼|ξ[2][min\displaystyle+{\mathbb{E}}_{|\xi_{[2]}}[\min f3(x3,ξ3)]]\displaystyle\ f_{3}(x_{3},\xi_{3})]] (7.2)
s.t. A1x1=b1,\displaystyle\ A_{1}x_{1}=b_{1}, s.t. A2x2+B2x1=b2,\displaystyle\ A_{2}x_{2}+B_{2}x_{1}=b_{2}, s.t. A3x3+B3x2=b3,\displaystyle\ A_{3}x_{3}+B_{3}x_{2}=b_{3},
x1𝒳1,\displaystyle x_{1}\in{\cal X}_{1},\ \ \ x2𝒳2,\displaystyle\ x_{2}\in{\cal X}_{2},\ \ \ x3𝒳3.\displaystyle\ x_{3}\in{\cal X}_{3}.

We can write problem (7.2) in a more compact form by using value functions as discussed in (2.3)-(2.5). More specifically, let $Q_{3}(x_{2},\xi_{[3]})$ be the value function at the third stage and ${\cal Q}_{3}(x_{2},\xi_{[2]})$ be the corresponding expected value function conditional on $\xi_{[2]}$:

Q3(x2,ξ[3]):=minf3(x3,ξ3) s.t.A3x3+B3x2=b3,x3𝒳3.𝒬3(x2,ξ[2]):=𝔼|ξ[2][Q3(x2,ξ[3])].\begin{array}[]{lll}Q_{3}(x_{2},\xi_{[3]})&:=&\min\ f_{3}(x_{3},\xi_{3})\\ &&\text{ s.t.}\ \ A_{3}x_{3}+B_{3}x_{2}=b_{3},\\ &&\quad\quad\quad x_{3}\in{\cal X}_{3}.\\ {\cal Q}_{3}(x_{2},\xi_{[2]})&:=&{\mathbb{E}}_{|\xi_{[2]}}[Q_{3}(x_{2},\xi_{[3]})].\end{array} (7.3)

We can then define the stochastic cost-to-go function Q2(x1,ξ[2])Q_{2}(x_{1},\xi_{[2]}) and its corresponding (expected) value function as

Q2(x1,ξ[2]):=minf2(x2,ξ2)+𝒬3(x2,ξ[2]) s.t.A2x2+B2x1=b2,x2𝒳2.𝒬2(x1,ξ1):=𝔼|ξ1[Q2(x1,ξ[2])]=𝔼[Q2(x1,ξ2)].\begin{array}[]{lll}Q_{2}(x_{1},\xi_{[2]})&:=&\min\ f_{2}(x_{2},\xi_{2})+{\cal Q}_{3}(x_{2},\xi_{[2]})\\ &&\ \text{ s.t.}\ \ A_{2}x_{2}+B_{2}x_{1}=b_{2},\\ &&\quad\quad\quad x_{2}\in{\cal X}_{2}.\\ {\cal Q}_{2}(x_{1},\xi_{1})&:=&{\mathbb{E}}_{|\xi_{1}}[Q_{2}(x_{1},\xi_{[2]})]={\mathbb{E}}[Q_{2}(x_{1},\xi_{2})].\end{array} (7.4)

Problem (7.2) can then be formulated equivalently as

minf1(x1,ξ1)+𝒬2(x1,ξ1) s.t.A1x1=b1,x1𝒳1.\begin{array}[]{ll}\min\ f_{1}(x_{1},\xi_{1})+{\cal Q}_{2}(x_{1},\xi_{1})\\ \text{ s.t.}\ \ A_{1}x_{1}=b_{1},\\ \quad\quad\quad x_{1}\in{\cal X}_{1}.\end{array} (7.5)

7.2 Approximate stochastic subgradients

In order to solve problem (7.5) by stochastic approximation type methods, we need to understand how to compute first-order information of the value functions ${\cal Q}_{2}$ and ${\cal Q}_{3}$. Recall that for a given closed convex set ${\cal X}\subseteq{\mathbb{R}}^{n}$ and a closed convex function ${\cal Q}:{\cal X}\to{\mathbb{R}}$, $g(x)$ is called an $\epsilon$-subgradient of ${\cal Q}$ at $x\in{\cal X}$ if

𝒬(y)𝒬(x)+g(x),yxϵy𝒳.{\cal Q}(y)\geq{\cal Q}(x)+\langle g(x),y-x\rangle-\epsilon\ \ \forall y\in{\cal X}. (7.6)

The collection of all such $\epsilon$-subgradients of ${\cal Q}$ at $x$ is called the $\epsilon$-subdifferential of ${\cal Q}$ at $x$, denoted by $\partial_{\epsilon}{\cal Q}(x)$. Since both ${\cal Q}_{2}$ and ${\cal Q}_{3}$ are given in the form of (conditional) expectation, their exact first-order information is hard to compute. We resort to the computation of a stochastic $\epsilon$-subgradient of these value functions, defined as follows. Observe that we do not assume stagewise independence throughout this section.

Definition 7.1

G(x,ξ[t])G(x,\xi_{[t]}) is called a stochastic ϵ\epsilon-subgradient of the value function 𝒬t(x,ξ[t1])=𝔼|ξ[t1][Qt(x,ξ[t])]{\cal Q}_{t}(x,\xi_{[t-1]})={\mathbb{E}}_{|\xi_{[t-1]}}[Q_{t}(x,\xi_{[t]})] if G(x,ξ[t])G(x,\xi_{[t]}) is an unbiased estimator of an ϵ\epsilon-subgradient of 𝒬t(x,ξ[t1]){\cal Q}_{t}(x,\xi_{[t-1]}) with respect to xx, i.e.,

𝔼|ξ[t1][G(x,ξ[t])]=g(x,ξ[t1])andg(x,ξ[t1])ϵ𝒬t(x,ξ[t1]).{\mathbb{E}}_{|\xi_{[t-1]}}[G(x,\xi_{[t]})]=g(x,\xi_{[t-1]})\ \ \mbox{and}\ \ g(x,\xi_{[t-1]})\in\partial_{\epsilon}{\cal Q}_{t}(x,\xi_{[t-1]}). (7.7)

To compute a stochastic $\epsilon$-subgradient of ${\cal Q}_{2}$ (resp., ${\cal Q}_{3}$), we have to compute an approximate subgradient of the corresponding stochastic value function $Q_{2}(x_{1},\xi_{[2]})$ (resp., $Q_{3}(x_{2},\xi_{[3]})$). To this end, we further assume that strong Lagrange duality holds for the optimization problems defined in (7.4) (resp., (7.3)) almost surely. In other words, these problems can be formulated as saddle point problems:

Q2(x1,ξ[2])\displaystyle Q_{2}(x_{1},\xi_{[2]}) =maxy2minx2𝒳2b2B2x1A2x2,y2+f2(x2,ξ2)+𝒬3(x2,ξ[2]),\displaystyle=\max_{y_{2}}\min_{x_{2}\in{\cal X}_{2}}\langle b_{2}-B_{2}x_{1}-A_{2}x_{2},y_{2}\rangle+f_{2}(x_{2},\xi_{2})+{\cal Q}_{3}(x_{2},\xi_{[2]}), (7.8)
Q3(x2,ξ[3])\displaystyle Q_{3}(x_{2},\xi_{[3]}) =maxy3minx3𝒳3b3B3x2A3x3,y3+f3(x3,ξ3).\displaystyle=\max_{y_{3}}\min_{x_{3}\in{\cal X}_{3}}\langle b_{3}-B_{3}x_{2}-A_{3}x_{3},y_{3}\rangle+f_{3}(x_{3},\xi_{3}). (7.9)

One set of sufficient conditions to guarantee the equivalence between (7.4) (resp.,(7.3)) and (7.8) (resp., (7.9)) is that (7.4) (resp.,(7.3)) is solvable and the Slater condition holds.

Observe that (7.8) and (7.9) are special cases of a more generic saddle point problem:

Q(u,ξ):=maxyminx𝒳bBuAx,y+f(x,ξ)+𝒬(x),Q(u,\xi):=\max_{y}\min_{x\in{\cal X}}\langle b-Bu-Ax,y\rangle+f(x,\xi)+{\cal Q}(x), (7.10)

where bb, AA and BB are linear mappings of the random variable ξ\xi. For example, (7.9) is a special case of (7.10) with u=x2u=x_{2}, y=y3y=y_{3}, ξ=ξ3\xi=\xi_{3}, f=f3f=f_{3} and 𝒬=0{\cal Q}=0. It is worth noting that the first stage problem can also be viewed as a special case of (7.10), since (7.5) is equivalent to

maxy1minx1𝒳1{b1A1x1,y1+f1(x1,ξ1)+𝒬2(x1,ξ1)}.\max_{y_{1}}\min_{x_{1}\in{\cal X}_{1}}\ \left\{\langle b_{1}-A_{1}x_{1},y_{1}\rangle+f_{1}(x_{1},\xi_{1})+{\cal Q}_{2}(x_{1},\xi_{1})\right\}. (7.11)

We now discuss how to relate an approximate solution of the saddle point problem in (7.10) to an approximate subgradient of QQ. Let (x,y)(x_{*},y_{*}) be a pair of optimal solutions of the saddle point problem (7.10), i.e.,

Q(u,ξ)\displaystyle Q(u,\xi) =y,bBuAx+f(x,ξ)+𝒬(x)=f(x,ξ)+𝒬(x),\displaystyle=\langle y_{*},b-Bu-Ax_{*}\rangle+f(x_{*},\xi)+{\cal Q}(x_{*})=f(x_{*},\xi)+{\cal Q}(x_{*}), (7.12)

where the second identity follows from the complementary slackness of Lagrange duality. It can be shown (see Lemma 1 of [24]) that if

gap(z¯;x,y)\displaystyle\mathop{\rm gap}(\bar{z};x,y_{*}) :=y,bBuAx¯+f(x¯,ξ)+𝒬(x¯)\displaystyle:=\langle y_{*},b-Bu-A\bar{x}\rangle+f(\bar{x},\xi)+{\cal Q}(\bar{x}) (7.13)
y¯,bBuAxf(x,ξ)𝒬(x)ϵ,x𝒳,\displaystyle\quad-\langle\bar{y},b-Bu-Ax\rangle-f(x,\xi)-{\cal Q}(x)\leq\epsilon,\ \forall x\in{\cal X},

for a given z¯:=(x¯,y¯)Z\bar{z}:=(\bar{x},\bar{y})\in Z and un0u\in{\mathbb{R}}^{n_{0}}, then BTy¯-B^{T}\bar{y} is an ϵ\epsilon-subgradient of Q(u,ξ)Q(u,\xi) at uu.

In view of this observation, in order to compute a stochastic subgradient of 𝒬t(u,ξ[t1])=𝔼[Qt(u,ξ[t])|ξ[t1]]{\cal Q}_{t}(u,\xi_{[t-1]})={\mathbb{E}}[Q_{t}(u,\xi_{[t]})|\xi_{[t-1]}] at a given point uu, we can first generate a random realization ξt\xi_{t} conditionally on ξ[t1]\xi_{[t-1]} and then try to find a pair of solutions (x¯,y¯)(\bar{x},\bar{y}) satisfying

yt,btBtuAtx¯+f(x¯,ξt)+𝒬t+1(x¯,ξ[t])\displaystyle\langle y_{t*},b_{t}-B_{t}u-A_{t}\bar{x}\rangle+f(\bar{x},\xi_{t})+{\cal Q}_{t+1}(\bar{x},\xi_{[t]})
y¯,btBtuAtxf(x,ξt)𝒬t+1(x,ξ[t])ϵ,x𝒳t,\displaystyle-\langle\bar{y},b_{t}-B_{t}u-A_{t}x\rangle-f(x,\xi_{t})-{\cal Q}_{t+1}(x,\xi_{[t]})\leq\epsilon,\ \forall x\in{\cal X}_{t},

where $y_{t*}\equiv y_{t*}(\xi_{[t]})$ denotes the optimal solution for the $t$-th stage problem associated with the random realization $\xi_{[t]}$. We will then use $-B^{T}\bar{y}$ as a stochastic $\epsilon$-subgradient of ${\cal Q}_{t}(u,\xi_{[t-1]})$ at $u$. The difficulty is that the function ${\cal Q}_{t+1}(\bar{x},\xi_{[t]})$ is also given in the form of an expectation, and requires a numerical procedure to estimate its value. We will discuss this in more detail in the next section.

7.3 The DSA algorithm and its convergence properties

Our goal in this section is to present the basic scheme of the dynamic stochastic approximation algorithm applied to problem (7.5). This algorithm relies on the following three key primal-dual steps, referred to as stochastic primal-dual transformation (SPDT), applied to the generic saddle point problem in (7.10) at every stage.

(p+,d+,d~)=SPDT(p,d,d_,𝒬,u,ξ,f,X,θ,τ,η)(p_{+},d_{+},\tilde{d})={\rm SPDT}(p,d,d_{\_},{\cal Q}^{\prime},u,\xi,f,X,\theta,\tau,\eta):

d~\displaystyle\tilde{d} =θ(dd_)+d.\displaystyle=\theta(d-d_{\_})+d. (7.14)
p+\displaystyle p_{+} =argminxXbBuAx,d~+f(x,ξ)+𝒬,x+τ2xp2.\displaystyle=\mathop{\rm arg\,min}_{x\in X}\langle b-Bu-Ax,\tilde{d}\rangle+f(x,\xi)+\langle{\cal Q}^{\prime},x\rangle+\tfrac{\tau}{2}\|x-p\|^{2}. (7.15)
d+\displaystyle d_{+} =argminyb+Bu+Ap+,y+η2yd2.\displaystyle=\mathop{\rm arg\,min}_{y}\langle-b+Bu+Ap_{+},y\rangle+\tfrac{\eta}{2}\|y-d\|^{2}. (7.16)

In the above primal-dual transformation, the input (p,d,d_)(p,d,d_{\_}) denotes the current primal solution, dual solution, and the previous dual solution, respectively. Moreover, the input 𝒬{\cal Q}^{\prime} denotes a stochastic ϵ\epsilon-subgradient for 𝒬{\cal Q} at the current search point pp. The parameters (u,ξ,f,X)(u,\xi,f,X) describe the problem in (7.10) and (θ,τ,η)(\theta,\tau,\eta) are certain algorithmic parameters to be specified. Given these input parameters, the relation in (7.14) defines a dual extrapolation (or prediction) step to estimate the dual variable d~\tilde{d} for the next iterate. Based on this estimate, (7.15) performs a primal projection to compute p+p_{+}, and then (7.16) updates in the dual space to compute d+d_{+} by using the updated p+p_{+}. Note that the Euclidean projection in (7.15) can be extended to the non-Euclidean setting by replacing the term xp2/2\|x-p\|^{2}/2 with Bregman divergence. We assume that the above SPDT operator can be performed very fast or even has explicit expressions. The primal-dual transformation is closely related to the alternating direction method of multipliers and the primal-dual hybrid gradient method (see [24] for an account of history for this procedure).
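
For illustration, the following minimal Python sketch spells out one application of (7.14)-(7.16), assuming (as a simplification not required by the method) a linear stage cost $f(x,\xi)=\langle c,x\rangle$ and a box feasible set $X$, for which both prox steps have closed forms; all data and parameter values are hypothetical.

```python
import numpy as np

def spdt(p, d, d_prev, Qp, u, c, A, B, b, X_lo, X_hi, theta, tau, eta):
    """One stochastic primal-dual transformation (7.14)-(7.16).

    Sketch for the Euclidean case with linear stage cost f(x, xi) = <c, x> and a
    box feasible set X = [X_lo, X_hi]; Qp is a stochastic epsilon-subgradient of
    the next-stage value function at the current primal point p.
    """
    d_tilde = theta * (d - d_prev) + d                        # (7.14) dual extrapolation
    # (7.15): argmin_x <b - B u - A x, d_tilde> + <c + Qp, x> + tau/2 ||x - p||^2 over the box
    p_plus = np.clip(p - (c + Qp - A.T @ d_tilde) / tau, X_lo, X_hi)
    # (7.16): argmin_y <-b + B u + A p_plus, y> + eta/2 ||y - d||^2
    d_plus = d + (b - B @ u - A @ p_plus) / eta
    return p_plus, d_plus, d_tilde

# toy usage with illustrative random data
rng = np.random.default_rng(1)
n, m, n0 = 3, 2, 3
A, B, b = rng.standard_normal((m, n)), rng.standard_normal((m, n0)), rng.standard_normal(m)
c, Qp, u = rng.standard_normal(n), np.zeros(n), rng.standard_normal(n0)
p, d = np.zeros(n), np.zeros(m)
p, d_new, _ = spdt(p, d, d, Qp, u, c, A, B, b, 0.0, 1.0, theta=1.0, tau=2.0, eta=2.0)
print(p, d_new)
```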

In order to solve problem (7.5), we will combine the above primal-dual transformation applied to all the three stages, the scenario generation for the random variables ξ2\xi_{2} and ξ3\xi_{3} in the second and third stage, and certain averaging steps in both the primal and dual spaces to compute an approximate pair of primal and dual solution for the saddle point problem in the form of (7.10) at each stage. More specifically, the DSA method consists of three loops. The innermost (third) loop runs N3N_{3} steps of SPDT in order to compute an approximate stochastic subgradient of the value function 𝒬3{\cal Q}_{3} of the third stage. The second loop consists of N2N_{2} SPDTs applied to the saddle point formulation of the second-stage problem, which requires the output from the third loop. The outer loop applies N1N_{1} SPDTs to the saddle point formulation of the first-stage optimization problem in (7.5), using the approximate stochastic subgradients for 𝒬2{\cal Q}_{2} computed by the second loop. In this algorithm, we need to generate N1N_{1} and N1×N2N_{1}\times N_{2} realizations for the random vectors ξ2\xi_{2} and ξ3\xi_{3}, respectively.

Observe that the DSA algorithm described above is conceptual only, since we have not specified any algorithmic parameters $(\theta,\tau,\eta)$ yet. Two sets of parameters will be chosen depending on the stages where the SPDTs are applied. One set of parameters, which may lead to a slightly slower rate of convergence but can guarantee the boundedness of the generated dual iterates, will be used for the second and third stages, while another parameter setting is used in the first stage to achieve a faster rate of convergence. Suppose that $f_{t}$, $t=1,2,3$, are generally (not necessarily strongly) convex. By properly specifying the algorithmic parameters, it can be shown that by setting $N_{3}={\cal O}(1/\epsilon^{2})$ and $N_{2}={\cal O}(1/\epsilon^{2})$, we will find an approximate $\epsilon$-solution, i.e., a point $\bar{x}_{1}\in{\cal X}_{1}$ such that

𝔼[f1(x¯1,ξ1)+𝒬2(x¯1,ξ1)(f1(x,ξ1)+𝒬2(x,ξ1))]ϵ,𝔼[Ax¯1b]ϵ,\begin{array}[]{l}{\mathbb{E}}[f_{1}(\bar{x}_{1},\xi_{1})+{\cal Q}_{2}(\bar{x}_{1},\xi_{1})-(f_{1}(x_{*},\xi_{1})+{\cal Q}_{2}(x_{*},\xi_{1}))]\leq\epsilon,\\ {\mathbb{E}}[\|A\bar{x}_{1}-b\|]\leq\epsilon,\end{array}

in at most $N_{1}={\cal O}(1/\epsilon^{2})$ outer iterations. As a consequence, the numbers of random samples of $\xi_{2}$ and $\xi_{3}$ used in the DSA method are bounded by

N1=𝒪(1/ϵ2)andN1×N2=𝒪(1/ϵ4),N_{1}=\mathcal{O}(1/\epsilon^{2})\ \ \mbox{and}\ \ \ N_{1}\times N_{2}=\mathcal{O}(1/\epsilon^{4}), (7.17)

respectively. Now consider the case when the objective functions $f_{t}$, $t=1,2,3$, are strongly convex. In that case the above complexity results can be significantly improved. In particular, by choosing $N_{3}={\cal O}(1/\sqrt{\epsilon})$ and $N_{2}={\cal O}(1/\epsilon)$, we will find an approximate $\epsilon$-solution in at most $N_{1}={\cal O}(1/\epsilon)$ outer iterations. As a result, it can be shown that the number of random samples of $\xi_{2}$ and $\xi_{3}$ will be bounded by $N_{1}$ and $N_{1}\times N_{2}$, i.e., ${\cal O}(1/\epsilon)$ and ${\cal O}(1/\epsilon^{2})$, respectively.

It should be noted that our analysis of DSA focuses on the optimality of the first-stage decisions, and the decisions we generated for the later stages are mainly used for computing the approximate stochastic subgradients for the value functions at each stage. Except for the first stage decision x¯1\bar{x}_{1}, the performance guarantees (e.g., feasibility and optimality) that we can provide for later stages are dependent on the sequences of random variables (or scenarios) we generated.

7.4 DSA for general multistage stochastic optimization

In this section, we consider a multistage stochastic optimization problem given by

minf1(x1,ξ1)+𝒬2(x1,ξ1) s.t.A1x1=b1,x1𝒳1,\begin{array}[]{ll}\min\ f_{1}(x_{1},\xi_{1})+{\cal Q}_{2}(x_{1},\xi_{1})\\ \text{ s.t.}\ \ A_{1}x_{1}=b_{1},\\ \quad\quad\quad x_{1}\in{\cal X}_{1},\end{array} (7.18)

where the value functions ${\cal Q}_{t}$, $t=2,\ldots,T$, are recursively defined by

𝒬t(xt1,ξ[t1]):=Ft1(xt1,ξt1)+𝔼|ξ[t1][Qt(xt1,ξ[t])],t=2,,T1,Qt(xt1,ξ[t]):=minft(xt,ξt)+𝒬t+1(xt) s.t.Atxt+Btxt1=bt,xt𝒳t,\begin{array}[]{lll}{\cal Q}_{t}(x_{t-1},\xi_{[t-1]})&:=&F_{t-1}(x_{t-1},\xi_{t-1})+{\mathbb{E}}_{|\xi_{[t-1]}}[Q_{t}(x_{t-1},\xi_{[t]})],\ \ t=2,\ldots,T-1,\\ Q_{t}(x_{t-1},\xi_{[t]})&:=&\min\ f_{t}(x_{t},\xi_{t})+{\cal Q}_{t+1}(x_{t})\\ &&\ \text{ s.t.}\ \ A_{t}x_{t}+B_{t}x_{t-1}=b_{t},\\ &&\quad\quad\quad x_{t}\in{\cal X}_{t},\end{array} (7.19)

and

\begin{array}[]{lll}{\cal Q}_{T}(x_{T-1},\xi_{[T-1]})&:=&{\mathbb{E}}_{\xi_{T}|\xi_{[T-1]}}[Q_{T}(x_{T-1},\xi_{[T]})],\\ Q_{T}(x_{T-1},\xi_{[T]})&:=&\min\ f_{T}(x_{T},\xi_{T})\\ &&\text{ s.t.}\ \ A_{T}x_{T}+B_{T}x_{T-1}=b_{T},\\ &&\quad\quad\quad x_{T}\in{\cal X}_{T}.\end{array} (7.20)

Here $\xi_{t}$ are random variables, $f_{t}(\cdot,\xi_{t})$ are relatively simple functions, and $F_{t}(\cdot,\xi_{t})$ are general (not necessarily simple) Lipschitz continuous convex functions. We also assume that one can compute a subgradient $F_{t}^{\prime}(x_{t},\xi_{t})$ of the function $F_{t}(x_{t},\xi_{t})$ at any point $x_{t}\in{\cal X}_{t}$ for a given $\xi_{t}$.

Problem (7.18) is more general than problem (7.2) (or equivalently problem (7.5)) in the following sense. First, we are dealing with a more complicated multistage stochastic optimization problem where the number of stages TT in (7.18) can be greater than three. Second, the value function 𝒬t(xt1,ξ[t1]){\cal Q}_{t}(x_{t-1},\xi_{[t-1]}) in (7.19) is defined as the summation of Ft1(xt1,ξt1)F_{t-1}(x_{t-1},\xi_{t-1}) and 𝔼|ξ[t1][Qt(xt1,ξ[t])]{\mathbb{E}}_{|\xi_{[t-1]}}[Q_{t}(x_{t-1},\xi_{[t]})], where Ft1F_{t-1} is not necessarily simple. The DSA algorithm can be generalized for solving problem (7.18). More specifically, we will call the SPDT operators to compute a stochastic ϵ\epsilon-subgradient of 𝒬t+1{\cal Q}_{t+1} at xtx_{t}, t=1,,T2t=1,\ldots,T-2, in a recursive manner until we obtain the ϵ\epsilon-subgradient of 𝒬T{\cal Q}_{T} at xT1x_{T-1}.

Similar to the three-stage problem, let $N_{t}$ be the number of iterations for the stage $t$ subproblem. For the last stage $T$, we will set $N_{T}={\cal O}(1/\epsilon)$ and $N_{T}={\cal O}(1/\sqrt{\epsilon})$ for the generally convex and strongly convex cases, respectively. For the middle stages $t=2,\ldots,T-1$, we set $N_{t}={\cal O}(1/\epsilon^{2})$ and $N_{t}={\cal O}(1/\epsilon)$. Different algorithmic settings will be used for the generally convex and strongly convex cases. Moreover, a less aggressive stepsize selection will be used for the inner loops to guarantee the boundedness of the generated dual variables. Under this parameter selection, we can show that the number of outer loops performed by the DSA method to find an approximate $\epsilon$-solution can be bounded by $N_{1}={\cal O}(1/\epsilon^{2})$ and ${\cal O}(1/\epsilon)$, respectively, for the generally convex and strongly convex problems.

In view of these results, the total number of scenarios required to find an $\epsilon$-solution of (7.18) is given by $N_{2}\times N_{3}\times\cdots\times N_{T}$, and hence grows exponentially with respect to $T$, no matter whether the objective functions are strongly convex or not. These sampling complexity bounds match well the lower bounds in [49, 52], implying that multi-stage stochastic optimization problems are essentially intractable for $T\geq 5$ and a moderate target accuracy. Hence, it is reasonable to use the DSA algorithm only for multi-stage stochastic optimization problems with $T=3$ or $4$ and $\epsilon$ relatively large. However, it is interesting to point out that the DSA algorithm only needs to go through the scenario tree once, and hence its memory requirement increases only linearly with respect to $T$.
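
To illustrate, under the parameter choices discussed above for the generally convex case (and ignoring constant factors), the total number of scenarios behaves like

N_{2}\times N_{3}\times\cdots\times N_{T}={\cal O}\big(\epsilon^{-2(T-2)}\cdot\epsilon^{-1}\big)={\cal O}\big(\epsilon^{-(2T-3)}\big),

so that already for $T=5$ one needs of order $10^{7}$ scenarios for $\epsilon=10^{-1}$, and of order $10^{14}$ scenarios for $\epsilon=10^{-2}$.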

7.5 Combined EDDP and DSA for hierarchical problems

In this section, we discuss how EDDP/SDDP type algorithms can be applied together with DSA type algorithms for solving a class of hierarchical stationary stochastic programs (HSSPs). HSSPs can model problems with a hierarchy of decision-making, e.g., how managerial decisions influence day-to-day operations in a factory. More specifically, the upper level decisions are made by solving a stationary stochastic program over an infinite horizon (see Section 5.3.3), where the number of stages T=T=\infty, the cost function ft=ff_{t}=f, the feasible set 𝒳t=𝒳{\cal X}_{t}={\cal X}, and the random variables ξt\xi_{t} are iid having the same distribution as ξ\xi. The lower level decisions are determined by a stochastic two-stage program,

f(x,ξ):=minz1Z1(x,ξ)f1(z1,ξ)+𝔼ζ[minz2Z2(z1,ζ)f2(z2,ζ)],\displaystyle f(x,\xi):=\min_{z_{1}\in Z_{1}(x,\xi)}f_{1}(z_{1},\xi)+\mathbb{E}_{\zeta}\big{[}\min_{z_{2}\in Z_{2}(z_{1},\zeta)}f_{2}(z_{2},\zeta)\big{]}, (7.21)

where ζ\zeta denotes the random variable independent of ξ\xi involved in the second stage problem.

We extend the dual dynamic programming method to a so-called hierarchical dual dynamic programming method by accounting for inexact solutions to the subproblems. To solve the lower-level stochastic multi-stage problem, we approximate it using the SAA approach and solve the resulting problem using a primal-dual stochastic approximation method. This method can be generalized to a DSA-type method when the number of stages is three or more. Our results show that when solving an infinite-horizon hierarchical problem where the top-level decision variable is of modest size (e.g., $n=4$) and the lower-level stochastic multistage program has a modest number of stages (2 or 3), then by integrating dual dynamic programming to handle the top-level decisions and a stochastic approximation type method for the lower-level decisions, we can get a polynomial computational complexity that is independent of the dimension of the lower-level decision variables (see [17] for details).

References

  • [1] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9:203–228, 1999.
  • [2] E. M. L. Beale. On minimizing a convex function subject to linear inequalities. Journal of the Royal Statistical Society, Series B, 17:173–184, 1955.
  • [3] R.E. Bellman. Dynamic Programming. Princeton University Press, 1957.
  • [4] D.P. Bertsekas and S.E. Shreve. Stochastic Optimal Control, The Discrete Time Case. Academic Press, New York, 1978.
  • [5] J.R. Birge. Decomposition and partitioning methods for multistage stochastic linear programs. Operations Research, 33:989–1007, 1985.
  • [6] J.R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer, New York, 2nd edition, 2011.
  • [7] J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer Series in Operations Research. Springer, 2000.
  • [8] G.B. Dantzig. Linear programming under uncertainty. Management Science, 1:197–206, 1955.
  • [9] L. Ding, S. Ahmed, and A. Shapiro. A python package for multi-stage stochastic programming. Optimization online, 2019.
  • [10] M. Dyer and L. Stougie. Computational complexity of stochastic programming problems. Mathematical Programming, 106:423–432, 2006.
  • [11] H. Föllmer and A. Schied. Stochastic Finance: An Introduction in Discrete Time. Walter de Gruyter, Berlin, 2nd edition, 2004.
  • [12] C. Füllner and S. Rebennack. Stochastic dual dynamic programming and its variants. Optimization online, 2021.
  • [13] P. Girardeau, V. Leclére, and A. B. Philpott. On the convergence of decomposition methods for multistage stochastic convex programs. Mathematics of Operations Research, 40:130–145, 2016.
  • [14] V. Guigues, A. Shapiro, and Y. Cheng. Risk-averse stochastic optimal control: an efficiently computable statistical upper bound. Optimization online, 2022.
  • [15] V. Guigues, A. Shapiro, and Y. Cheng. Duality and sensitivity analysis of multistage linear stochastic programs. European Journal of Operational Research, 308:752–767, 2023.
  • [16] G.A. Hanasusanto, D. Kuhn, and W. Wiesemann. A comment on computational complexity of stochastic programming problems. Mathematical Programming, 159:557–569, 2015.
  • [17] C. Ju and G. Lan. Dual dynamic programming for stochastic programs over an infinite horizon. arXiv, 2023.
  • [18] J.E. Kelley. The cutting-plane method for solving convex programs. Journal of the Society for Industrial and Applied Mathematics, 8:703–712, 1960.
  • [19] G. Lan. An optimal method for stochastic composite optimization. Mathematical Programming, 133(1):365–397, 2012.
  • [20] G. Lan. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Mathematical Programming, 149(1-2):1–45, 2015.
  • [21] G. Lan. First-order and Stochastic Optimization Methods for Machine Learning. Springer Nature, Switzerland AG, 2020.
  • [22] G. Lan. Complexity of stochastic dual dynamic programming. Mathematical Programming, 191:717–754, 2022.
  • [23] G. Lan, A. S. Nemirovski, and A. Shapiro. Validation analysis of mirror descent stochastic approximation method. Mathematical Programming, 134:425–458, 2012.
  • [24] G. Lan and Z. Zhou. Dynamic stochastic approximation for multi-stage stochastic optimization. Mathematical Programming, 187:487–532, 2021.
  • [25] V. Leclére, P. Carpentier, J-P Chancelier, A. Lenoir, and F. Pacaud. Exact converging bounds for Stochastic Dual Dynamic Programming via Fenchel duality. SIAM J. Optimization, 30:1223–1250, 2020.
  • [26] C. Lemaréchal, A. Nemirovskii, and Y. Nesterov. New variants of bundle methods. Mathematical Programming, 69:111–147, 1995.
  • [27] N. Löhndorf and A. Shapiro. Modeling time-dependent randomness in stochastic dual dynamic programming. European Journal of Operational Research, 273:650–661, 2019.
  • [28] A. S. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19:1574–1609, 2009.
  • [29] A. S. Nemirovski and D. Yudin. Problem complexity and method efficiency in optimization. Wiley-Interscience Series in Discrete Mathematics. John Wiley, XV, 1983.
  • [30] Y. Nesterov and A. Nemirovski. Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, 1994.
  • [31] M.V.F. Pereira and L.M.V.G. Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52(1-3):359–375, 1991.
  • [32] G. Pflug. Some remarks on the value-at-risk and the conditional value-at-risk. In Probabilistic Constrained Optimization: Methodology and Applications, S. Uryasev (Ed.). Kluwer Academic Publishers, Norwell, MA, 2000.
  • [33] A.B. Philpott, V.L. de Matos, and E. Finardi. On solving multistage stochastic programs with coherent risk measures. Operations Research, 61(4):957–970, 2013.
  • [34] A. Pichler and A. Shapiro. Mathematical foundations of distributionally robust multistage optimization. SIAM J. Optimization, 31:3044–3067, 2021.
  • [35] B.T. Polyak. New stochastic approximation type procedures. Automat. i Telemekh., 7:98–107, 1990.
  • [36] B.T. Polyak and A.B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM J. Control and Optimization, 30:838–855, 1992.
  • [37] W.B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley and Sons, New York, 2nd edition, 2011.
  • [38] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400–407, 1951.
  • [39] R. T Rockafellar. Conjugate Duality and Optimization. Society for Industrial and Applied Mathematics, Philadelphia, 1974.
  • [40] R.T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
  • [41] R.T Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions. J. of Banking and Finance, 26(7):1443–1471, 2002.
  • [42] A. Ruszczyński and A. Shapiro. Conditional risk mappings. Mathematics of Operations Research, 31:544–561, 2006.
  • [43] A. Ruszczyński and A. Shapiro. Optimization of convex risk functions. Mathematics of Operations Research, 31:433–452, 2006.
  • [44] A. Shapiro. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research, 209:63–72, 2011.
  • [45] A. Shapiro and Y. Cheng. Central limit theorem and sample complexity of stationary stochastic programs. Operations Research Letters, 49:676–681, 2021.
  • [46] A. Shapiro and Y. Cheng. Dual bounds for periodical stochastic programs. Operations Research, 2021.
  • [47] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia, third edition, 2021.
  • [48] A. Shapiro and L. Ding. Periodical multistage stochastic programs. SIAM J. Optimization, 30:2083–2102, 2020.
  • [49] A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. E-print available at: http://www.optimization-online.org, 2004.
  • [50] A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. In V. Jeyakumar and A.M. Rubinov, editors, Continuous Optimization: Current Trends and Applications: Current Trends and Applications, pages 111–144. Springer, 2005.
  • [51] A. Shapiro, W. Tekaya, J.P. da Costa, and M. Pereira Soares. Risk neutral and risk averse stochastic dual dynamic programming method. European Journal of Operational Research, 224:375–391, 2013.
  • [52] A. Shapiro. On complexity of multistage stochastic programs. Operations Research Letters, 34:1–8, 2006.
  • [53] S. Zhang and X. Sun. Stochastic dual dynamic programming for multistage stochastic mixed-integer nonlinear optimization. Mathematical Programming, 196:935–985, 2022.
  • [54] P.H. Zipkin. Foundations of Inventory Management. McGraw-Hill, 2000.