
Scenario-decomposition Solution Framework for
Nonseparable Stochastic Control Problems

Xin Huang Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong (e-mails: [email protected]; [email protected]).    Duan Li Corresponding author. D. Li is with School of Data Science, City University of Hong Kong, Kowloon, Hong Kong (e-mail: [email protected]). The research of this author is supported by Hong Kong Research Grants Council under grant 11200219.    Daniel Zhuoyu Long 11footnotemark: 1
Abstract

When stochastic control problems do not possess separability and/or monotonicity, the dynamic programming pioneered by Bellman in 1950s fails to work as a time-decomposition solution method. Such cases have posted a great challenge to the control society in both theoretical foundation and solution methodologies for many years. With the help of the progressive hedging algorithm proposed by Rockafellar and Wets in 1991, we develop a novel scenario-decomposition solution framework for stochastic control problems which could be nonseparable and/or non-monotonic, thus extending the reach of stochastic optimal control. We discuss then some of its promising applications, including online quadratic programming problems and dynamic portfolio selection problems with smoothing properties.

Keywords: Nonseparable stochastic control, scenario decomposition, progressive hedging algorithm, online quadratic programming, dynamic portfolio selection.

1 Introduction

Stochastic control problems can be, in general, formulated as follows,

$$(\mathcal{P}) \quad \min_{u_t,\, t=0,\ldots,T-1} \; \mathrm{E}[J(x_0,u_0,x_1,u_1,\ldots,x_{T-1},u_{T-1},x_T)]$$
$$\text{s.t.} \quad x_{t+1}=f_t(x_t,u_t,\xi_t),$$
$$g_t(x_t,u_t)\leq 0, \quad g_T(x_T)\leq 0, \quad t=0,1,\ldots,T-1,$$

where $x_t\in\mathbb{R}^m$ is the state with $x_0$ given, $u_t\in\mathbb{R}^n$ is the control, and $g_t(x_t,u_t)\leq 0$ and $g_T(x_T)\leq 0$ represent, respectively, the running constraints on states and controls and the constraint on the terminal state. Moreover, $\xi_t\in\mathbb{R}^p$ is a white-noise vector, and $f_t:\mathbb{R}^m\times\mathbb{R}^n\times\mathbb{R}^p\rightarrow\mathbb{R}^m$ is the system dynamics. Thus, the system under consideration is of a Markovian property. The performance measure $J$ is backward separable if there exist functions $\phi_t:\mathbb{R}^m\times\mathbb{R}^n\times\mathbb{R}\rightarrow\mathbb{R}$, $t=0,1,\ldots,T-1$, and $\phi_T:\mathbb{R}^m\rightarrow\mathbb{R}$ such that

$$J=\phi_0(x_0,u_0,\phi_1(x_1,u_1,\phi_2(\ldots\phi_{T-2}(x_{T-2},u_{T-2},\phi_{T-1}(x_{T-1},u_{T-1},\phi_T(x_T)))\ldots))).$$

The backward separable objective function $J$ is then said to be backward monotonic if, for all $t=0,1,\ldots,T-1$, the condition

$$\phi_t(\hat{x}_t,\hat{u}_t,\phi_{t+1}(\ldots\phi_{T-1}(\hat{x}_{T-1},\hat{u}_{T-1},\phi_T(\hat{x}_T))\ldots))\leq\phi_t(\tilde{x}_t,\tilde{u}_t,\phi_{t+1}(\ldots\phi_{T-1}(\tilde{x}_{T-1},\tilde{u}_{T-1},\phi_T(\tilde{x}_T))\ldots))$$

implies the following: for any triple $(x_{t-1},u_{t-1},\xi_{t-1})$ such that $\hat{x}_t=\tilde{x}_t=f_{t-1}(x_{t-1},u_{t-1},\xi_{t-1})$, we have

$$\phi_{t-1}(x_{t-1},u_{t-1},\phi_t(\hat{x}_t,\hat{u}_t,\phi_{t+1}(\ldots\phi_{T-1}(\hat{x}_{T-1},\hat{u}_{T-1},\phi_T(\hat{x}_T))\ldots)))\leq$$
$$\phi_{t-1}(x_{t-1},u_{t-1},\phi_t(\tilde{x}_t,\tilde{u}_t,\phi_{t+1}(\ldots\phi_{T-1}(\tilde{x}_{T-1},\tilde{u}_{T-1},\phi_T(\tilde{x}_T))\ldots))).$$

When $(\mathcal{P})$ satisfies both the separability and the monotonicity defined above, the celebrated dynamic programming (DP) Bellman1952 is a powerful time-decomposition solution approach based on the principle of optimality.
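As a routine illustration (standard material, included here only for concreteness), the common additive expected cost fits the above definitions directly:

```latex
J=\sum_{t=0}^{T-1}L_t(x_t,u_t)+L_T(x_T)
\quad\text{is backward separable with}\quad
\phi_T(x_T)=L_T(x_T),\qquad
\phi_t(x_t,u_t,y)=L_t(x_t,u_t)+y,\quad t=0,1,\ldots,T-1.
```

Since each $\phi_t$ above is nondecreasing in its last argument $y$, such an objective is also backward monotonic, which is exactly why DP handles additive costs so naturally.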

There exist, however, plenty of problems of interest that do not satisfy these fundamental requirements of DP. One notorious nonseparable case is the variance minimization problem (see white1974dynamic and li2003variance). The obstacle is mainly that the variance operation, unlike the expectation operator, does not satisfy the tower property along the time horizon. Variance minimization naturally emerges in the dynamic mean-variance (MV) portfolio selection problem. After many years of struggle, li2000optimal finally solves it by embedding the original nonseparable problem into a family of separable auxiliary problems that are analytically solvable by DP. In the early days, sniedovich1986c and domingo1993experiments consider nonseparable problems with objective functions of the form $h(u)=\psi(v(u),z(u))$, where both $v$ and $z$ are functions in additive forms w.r.t. stages. Under the assumption that $\psi$ is pseudo-concave w.r.t. its arguments, the authors of sniedovich1986c and domingo1993experiments develop the so-called C-programming to convert the primal problem into a separable version that can be handled by DP, and report its applications to variance minimization (see also sniedovich1987class) and fractional programming (see also sniedovich1990solution). carraway1990generalized proposes a generalized DP for multi-criteria optimization problems that violate the monotonicity. li1990new, li1991extension, and li1990multiple consider a class of nonseparable problems where the nonseparable objective function is a monotone function of several separable sub-objectives. Among these three papers, the first two deal with deterministic cases, whereas the last one deals with the stochastic counterpart. They introduce the concept of $k$th-order separability and convert the primal nonseparable problem into a separable $k$-objective optimization problem that can be solved by the multi-objective DP Envelope.
They further develop conditions under which a specific Pareto solution is optimal to the original nonseparable problem. Moreover, li1997cost investigates a nonseparable cost smoothing problem for the discrete-time deterministic linear-quadratic control.

Different from the above works, this paper aims to develop a novel solution framework based on scenario decomposition, which is fundamentally distinct from the methods governed by time-decomposition-based DP. Our solution framework relies on the progressive hedging algorithm (PHA) pioneered in rockafellar1991scenarios. In contrast to DP, our PHA-oriented solution scheme can be applied to stochastic control problems that may be neither separable nor monotonic. We emphasize that the powerful capability of PHA in dealing with non-separability or non-monotonicity in stochastic control has not been fully recognized to date. We further apply the newly developed solution scheme to two nonseparable (thus intractable by DP) real-world applications: online quadratic programming (QP) and a novel variation of the dynamic portfolio selection problem with smoothing properties. Interestingly, the considered MV problem with the smoothing feature can be embedded into a series of auxiliary problems that turn out to be a concrete type of our proposed online QP model.

The rest of the paper proceeds as follows. In Section 2, we build up the scenario-decomposition solution framework by applying PHA to general stochastic control problems whose information flow follows a tree structure. We then demonstrate its prominent application to the online QP problem in Section 3. In Section 4, we further apply this solution methodology to dynamic portfolio selection problems and their novel variations with smoothing features, and analyze experimental results. Finally, we conclude the paper in Section 5.

2 Solution Approach by Scenario Decomposition

We consider in this paper the problem $(\mathcal{P})$ with a Markovian system. As the most prominent feature of our formulation, the objective function in general can be nonseparable and/or non-monotonic. On the other hand, we confine the structure of the information flow to a tree form, where the system randomness $\boldsymbol{\xi}=\{\xi_0,\xi_1,\ldots,\xi_{T-1}\}$ is realized stage by stage, and a series of realizations of the $\xi_t$'s forms a scenario of the tree, indexed by $i$. From the scenario-analysis perspective, the dynamic stochastic control problem can be equipped with a scenario tree that reflects the information flow of the underlying uncertainties. Figure 1 exemplifies a specific three-stage tree structure, where $\boldsymbol{\xi}$ is realized successively from $\xi_0$ to $\xi_1$ and finally to $\xi_2$, leading to seven possible scenarios (paths of $\boldsymbol{\xi}$) in total. The number in each circle node represents a possible value of the disturbance realized right before that stage. Note that any parent node (starting from the square root node) can in general have a different number of children nodes. In contrast to DP, whose effectiveness comes from time decomposition, the solution power of the PHA that we adopt in this paper is rooted in scenario decomposition. Invented almost thirty years ago, PHA has been successfully applied to several application areas, including power systems scheduling problems (see dos2009practical, among others) and water resource planning problems (see, e.g., carpentier2013long). For more details on the general methodology of PHA, please refer to rockafellar1991scenarios.

Figure 1: A scenario tree with three stages and seven scenarios.

Let us denote by $\mathcal{I}$ the scenario set consisting of all possible scenarios, and by $\boldsymbol{\xi}^i=\{\xi_0^i,\xi_1^i,\ldots,\xi_{T-1}^i\}$ the realizations of the disturbance under scenario $i\in\mathcal{I}$. Assuming that scenario $i$ occurs with probability $\rho_i$, fixed at time 0, we can rewrite the objective of $(\mathcal{P})$ as $\min\sum_{i\in\mathcal{I}}\rho_i J_i$, where $J_i$ denotes the sub-objective under $\boldsymbol{\xi}^i$. It is then natural to decompose problem $(\mathcal{P})$ into a family of scenario subproblems and to consider the following individual scenario subproblem for each $i\in\mathcal{I}$,

$$(\mathcal{P}^i) \quad \min_{u_t,\,\forall t} \; J_i=J(x_0^i,u_0,x_1^i,u_1,\ldots,x_{T-1}^i,u_{T-1},x_T^i)$$
$$\text{s.t.} \quad x_{t+1}^i=f_t(x_t^i,u_t,\xi_t^i), \quad x_0^i=x_0,$$
$$g_t(x_t^i,u_t)\leq 0, \quad g_T(x_T^i)\leq 0, \quad t=0,1,\ldots,T-1,$$

which is a deterministic optimal control problem and should be much easier to solve than the original stochastic one. In this paper, we further assume that each $(\mathcal{P}^i)$ is convex w.r.t. the control variable $\mathbf{u}=(u_0',u_1',\ldots,u_{T-1}')'\in\mathbb{R}^{nT}$. Although the optimal solution of $(\mathcal{P}^i)$ satisfies all the admissible constraints of the primal problem $(\mathcal{P})$, it is not implementable in reality, since we have "stolen" the future information (i.e., the future realization of $\boldsymbol{\xi}$) when solving each scenario subproblem at time 0. In other words, the scenario-specific solutions violate the so-called nonanticipativity constraint, which is either explicitly or implicitly implied in any stochastic control problem. To force any admissible solution to meet nonanticipativity, scenario bundles, as a partition of $\mathcal{I}$, are formed at each time according to the scenario tree of the underlying problem. Graphically speaking, the scenarios passing through each node at a certain time stage are grouped together to form a bundle. In Figure 1, for instance, at time 0 all the scenarios form a single bundle, which is the scenario set itself, and we denote this partition by $\mathcal{I}_0=\{\mathcal{I}_{0,1}\}=\{\{i_1,\ldots,i_7\}\}$; when $t=1$ we have two bundles forming $\mathcal{I}_1=\{\mathcal{I}_{1,1},\mathcal{I}_{1,2}\}=\{\{i_1,\ldots,i_4\},\{i_5,\ldots,i_7\}\}$; and finally, for $t=2$ we have five bundles forming the partition of $\mathcal{I}$ at that time, i.e., $\mathcal{I}_2=\{\mathcal{I}_{2,1},\mathcal{I}_{2,2},\mathcal{I}_{2,3},\mathcal{I}_{2,4},\mathcal{I}_{2,5}\}=\{\{i_1\},\{i_2,i_3\},\{i_4\},\{i_5\},\{i_6,i_7\}\}$.
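The bundle construction above is purely combinatorial: scenarios sharing the same node history up to time $t$ fall into the same bundle. A minimal sketch follows; the node labels are illustrative assumptions chosen so that the resulting partitions match the bundles of Figure 1 described in the text.

```python
# Sketch: scenario bundles for the three-stage tree of Figure 1.
# A scenario is identified with the sequence of nodes it passes through;
# two scenarios belong to the same bundle at time t exactly when they share
# the same node history up to t. Node labels ("r", "a", ...) are hypothetical.
from collections import defaultdict

# scenario id -> (node at t=0, node at t=1, node at t=2)
scenarios = {
    "i1": ("r", "a", "c"),
    "i2": ("r", "a", "d"),
    "i3": ("r", "a", "d"),
    "i4": ("r", "a", "e"),
    "i5": ("r", "b", "f"),
    "i6": ("r", "b", "g"),
    "i7": ("r", "b", "g"),
}

def bundles(scenarios, t):
    """Partition the scenario set by node history up to time t."""
    groups = defaultdict(list)
    for i, path in scenarios.items():
        groups[path[: t + 1]].append(i)
    return sorted(groups.values())

print(bundles(scenarios, 0))  # one bundle: the whole scenario set
print(bundles(scenarios, 1))  # two bundles: {i1..i4} and {i5..i7}
print(bundles(scenarios, 2))  # five bundles, as in the text
```

Running this reproduces the partitions $\mathcal{I}_0$, $\mathcal{I}_1$, and $\mathcal{I}_2$ from the text, with one, two, and five bundles, respectively.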
Nonanticipativity naturally requires any implementable policy to react identically to all indistinguishable scenarios (scenarios from the same bundle), and this is achieved by taking conditional expectations of the scenario-specific solutions over the related bundle. More specifically, the implementable control at time $t$, if scenario $i$ occurs, is computed through

$$\hat{u}_t^i=\sum_{j\in\mathcal{I}_{t,l}}\frac{\rho_j}{\sum_{j'\in\mathcal{I}_{t,l}}\rho_{j'}}\,u_t^j, \quad i\in\mathcal{I}_{t,l}, \quad l=1,\ldots,|\mathcal{I}_t|, \qquad (1)$$

where $u_t^j$ is the scenario-$j$-based admissible control at time $t$, and $|\mathcal{I}_t|$ is the number of scenario bundles in the partition $\mathcal{I}_t$. Note that $|\mathcal{I}_t|$ determines the number of implementable controls corresponding to the different realizations at that time. In fact, the procedure in (1) can be characterized as a linear transformation $\hat{\mathbf{u}}_t=\mathbf{T}_t\mathbf{u}_t$, where $\hat{\mathbf{u}}_t=((\hat{u}_t^1)',(\hat{u}_t^2)',\ldots,(\hat{u}_t^{|\mathcal{I}|})')'\in\mathbb{R}^{n|\mathcal{I}|}$, $\mathbf{u}_t=((u_t^1)',(u_t^2)',\ldots,(u_t^{|\mathcal{I}|})')'\in\mathbb{R}^{n|\mathcal{I}|}$, and the projection matrix $\mathbf{T}_t$ can be easily built up from the scenario probabilities based on the structure of $\mathcal{I}_t$. The overall linear mapping is then

$$\begin{pmatrix}\hat{\mathbf{u}}_0\\ \hat{\mathbf{u}}_1\\ \vdots\\ \hat{\mathbf{u}}_{T-1}\end{pmatrix}=\begin{pmatrix}\mathbf{T}_0&0&\cdots&0\\ 0&\mathbf{T}_1&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\mathbf{T}_{T-1}\end{pmatrix}\begin{pmatrix}\mathbf{u}_0\\ \mathbf{u}_1\\ \vdots\\ \mathbf{u}_{T-1}\end{pmatrix}.$$
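As a sanity check, the aggregation (1) and its matrix form can be coded directly. The numbers below are illustrative assumptions (scalar controls and uniform probabilities over the seven scenarios of Figure 1), not values from the paper:

```python
import numpy as np

# Illustrative setup: 7 scenarios (Figure 1), scalar control (n = 1),
# uniform scenario probabilities; bundles at t = 1 are {i1..i4} and {i5..i7}.
rho = np.full(7, 1.0 / 7.0)
bundles_t1 = [[0, 1, 2, 3], [4, 5, 6]]  # 0-based scenario indices

def projection_matrix(bundles, rho):
    """Build T_t: row i replaces u_t^i by the conditional mean of its bundle."""
    T = np.zeros((len(rho), len(rho)))
    for bundle in bundles:
        p = rho[bundle] / rho[bundle].sum()  # conditional probabilities (1)
        for i in bundle:
            T[i, bundle] = p
    return T

T1 = projection_matrix(bundles_t1, rho)
u1 = np.arange(1.0, 8.0)   # scenario-specific admissible controls u_1^j
u1_hat = T1 @ u1           # implementable controls: bundle-wise averages

# Scenarios in the same bundle now share one control value, and T_t is
# idempotent (a projection): applying it twice changes nothing.
assert np.allclose(u1_hat[:4], u1[:4].mean())
assert np.allclose(T1 @ u1_hat, u1_hat)
```

The idempotence checked at the end is what makes $\mathbf{T}_t$ a projection onto the subspace of nonanticipative (bundle-wise constant) controls.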

The beauty of PHA lies in its augmented Lagrangian formulation, which progressively aggregates the scenario-specific solutions into an implementable one and forces them to converge to the optimal solution of the primal problem $(\mathcal{P})$, which is both admissible and implementable. More precisely, PHA deals with an augmented Lagrangian problem at each iteration $\nu=0,1,\ldots$, constructed by adding a linear Lagrangian term and a quadratic penalty term to the scenario-specific objective function in order to penalize any utilization of anticipative information from the future. That is, in the $\nu$th iteration we solve the following augmented Lagrangian problem for each $i\in\mathcal{I}$,

$$(\mathcal{P}^{i,\nu}) \quad \min_{\mathbf{u}} \; J(x_0^i,u_0,x_1^i,u_1,\ldots,x_{T-1}^i,u_{T-1},x_T^i)+\mathbf{u}'\mathbf{w}^{i,\nu}+\frac{1}{2}\alpha\|\mathbf{u}-\hat{\mathbf{u}}^{i,\nu}\|_2^2,$$
$$\text{s.t.} \quad x_{t+1}^i=f_t(x_t^i,u_t,\xi_t^i), \quad x_0^i=x_0,$$
  g_t(x