Distributionally robust stochastic optimal control
The main goal of this paper is to discuss the construction of distributionally robust counterparts of stochastic optimal control problems. Randomized and non-randomized policies are considered. In particular, necessary and sufficient conditions for the existence of non-randomized optimal policies are given.
Keywords: stochastic optimal control, dynamic equations, distributional robustness, game formulation, risk measures
1 Introduction
Consider the following (discrete time, finite horizon) Stochastic Optimal Control (SOC) model (e.g., [1]):
$$\min_{\pi\in\Pi}\ \mathbb{E}^{\pi}\Big[\sum_{t=1}^{T} c_t(x_t,u_t,\xi_t)+c_{T+1}(x_{T+1})\Big], \tag{1.1}$$
where $\Pi$ is the set of policies satisfying the constraints
$$x_{t+1}=F_t(x_t,u_t,\xi_t),\ \ u_t\in\mathcal{U}_t,\ \ t=1,\dots,T. \tag{1.2}$$
Here the variables $x_t\in\mathcal{X}$, $t=1,\dots,T+1$, represent the state of the system, $u_t$, $t=1,\dots,T$, are controls, $\xi_t\in\mathbb{R}^{d_t}$, $t=1,\dots,T$, are random vectors, $\mathcal{X}$ is a closed subset of $\mathbb{R}^{n}$, $c_t:\mathcal{X}\times\mathcal{U}_t\times\mathbb{R}^{d_t}\to\mathbb{R}$, $t=1,\dots,T$, are cost functions, $c_{T+1}:\mathcal{X}\to\mathbb{R}$ is a final cost function, $F_t:\mathcal{X}\times\mathcal{U}_t\times\mathbb{R}^{d_t}\to\mathcal{X}$ are (measurable) mappings and $\mathcal{U}_t$ is a (nonempty) subset of $\mathbb{R}^{m_t}$. The value $x_1$ is deterministic (initial condition); it is also possible to view $x_1$ as random with a given distribution, but this is not essential for the following discussion.
• Unless stated otherwise we assume that the probability law of the random process $\xi_1,\dots,\xi_T$ does not depend on our decisions.
Remark 1.1.
The above assumption is basic for our analysis. It views $\xi_1,\dots,\xi_T$ as a random data process and assumes that its probability law does not depend on the respective states and actions. This assumption is reasonable in many applications; for instance, in the Inventory Model example discussed in Section 4 it means that the probability distribution of the demand process does not depend on the inventory level and order quantity. This assumption makes it possible to separate the transition probabilities, defined by the functional relations $x_{t+1}=F_t(x_t,u_t,\xi_t)$, from the probability distribution of the data process (see Remarks 1.2 and 2.1 below).
The optimization in (1.1) is performed over policies determined by decisions $u_t$ and state variables $x_t$ considered as functions of the history $\xi_{[t-1]}:=(\xi_1,\dots,\xi_{t-1})$ of the data process, $t=1,\dots,T$, and satisfying the feasibility constraints (1.2). We also use the notation $\xi_{[t]}:=(\xi_1,\dots,\xi_t)$. For the sake of simplicity, in order not to distract from the main message of the paper, we assume that the control sets $\mathcal{U}_t$ do not depend on the state variables. It is possible to extend the analysis to the general case, where the control sets $\mathcal{U}_t(x_t)$ are functions of the state variables.
Remark 1.2.
Note that because of the basic assumption that the probability distribution of $\xi_1,\dots,\xi_T$ does not depend on our decisions (does not depend on states and actions), it suffices to consider policies as functions of the data process alone, i.e., $u_t=\pi_t(\xi_{[t-1]})$. It could also be noted that in case the random process is stagewise independent, i.e., the random vector $\xi_{t+1}$ is independent of $\xi_{[t]}$, $t=1,\dots,T-1$, it suffices to consider policies of the form $u_t=\pi_t(x_t)$.
Consider the Banach space $C(\Xi)$ of continuous functions $\psi:\Xi\to\mathbb{R}$ on a compact metric space $\Xi$, equipped with the sup-norm $\|\psi\|:=\sup_{\xi\in\Xi}|\psi(\xi)|$. The dual space of $C(\Xi)$ is the space of finite signed measures $\mu$ on $\Xi$ with respect to the bilinear form $\langle\mu,\psi\rangle=\int_\Xi\psi\,d\mu$ (Riesz representation).
Remark 1.3.
Denote by $\mathcal{P}(\Xi)$ the set of probability measures on the set $\Xi$ equipped with its Borel sigma algebra. The set $\mathcal{P}(\Xi)$ is a weakly∗ closed subset of the unit ball of the dual space of $C(\Xi)$ and hence is weakly∗ compact by the Banach-Alaoglu theorem. The weak∗ topology of $\mathcal{P}(\Xi)$ (in probability theory this weak∗ topology is often referred to as the weak topology, e.g., [2]) is metrizable (e.g., [6, Theorem 6.30]). It can be noted that if $\psi_k\in C(\Xi)$ converges (in the norm topology) to $\psi$ and $P_k\in\mathcal{P}(\Xi)$ converges weakly∗ to $P$, then $\int_\Xi\psi_k\,dP_k$ converges to $\int_\Xi\psi\,dP$.
Remark 1.4.
Let us recall the following properties of the min-max problem:
$$\min_{x\in A}\ \max_{y\in B}\ f(x,y), \tag{1.3}$$
where $A$ and $B$ are nonempty sets and $f:A\times B\to\mathbb{R}$ is a real valued function. A point $(\bar{x},\bar{y})\in A\times B$ is a saddle point of problem (1.3) if $f(\bar{x},y)\le f(\bar{x},\bar{y})$ for all $y\in B$ and $f(x,\bar{y})\ge f(\bar{x},\bar{y})$ for all $x\in A$. If a saddle point exists, then problem (1.3) has the same optimal value as its dual
$$\max_{y\in B}\ \min_{x\in A}\ f(x,y), \tag{1.4}$$
and $\bar{x}$ is an optimal solution of problem (1.3) and $\bar{y}$ is an optimal solution of problem (1.4). Conversely, if the optimal values of problems (1.3) and (1.4) are the same, $\bar{x}$ is an optimal solution of problem (1.3) and $\bar{y}$ is an optimal solution of problem (1.4), then $(\bar{x},\bar{y})$ is a saddle point. By Sion's minimax theorem [16], if $A$ and $B$ are convex subsets of linear topological spaces, $f(x,y)$ is continuous, convex in $x$ and concave in $y$, and at least one of the sets $A$ or $B$ is compact, then the optimal values of problems (1.3) and (1.4) are equal to each other.
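As a simple illustration, consider $f(x,y):=xy$ on $A=B=[-1,1]$. Then
$$\min_{x\in[-1,1]}\max_{y\in[-1,1]} xy=\min_{x\in[-1,1]}|x|=0
\quad\text{and}\quad
\max_{y\in[-1,1]}\min_{x\in[-1,1]} xy=\max_{y\in[-1,1]}(-|y|)=0,$$
so the optimal values of the min-max and max-min problems coincide, in accordance with Sion's theorem (here $f$ is bilinear and both sets are convex and compact), and $(\bar{x},\bar{y})=(0,0)$ is a saddle point.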
The risk neutral SOC model (1.1) has been thoroughly investigated. On the other hand, the analysis of its distributionally robust counterpart is more involved. The distributionally robust approach to Markov Decision Processes (MDPs) can be traced back to [7] and [10]. The origins of distributional robustness can also be related to dynamic game theory (e.g., [8] and references therein). By duality arguments the distributionally robust formulations are closely related to the risk averse settings. The aim of this paper is to describe and formulate certain related questions in the framework of the SOC model. In particular we consider randomized policies and give necessary and sufficient conditions for the existence of non-randomized optimal policies. We also discuss a relation between the nested risk averse and game formulations of distributionally robust problems.
2 Distributionally Robust Stochastic Optimal Control
In the distributionally robust counterpart of the risk neutral SOC problem (1.1) it is assumed that the probability law of the data process $\xi_1,\dots,\xi_T$ is not specified exactly, but rather is a member of a specified so-called ambiguity set of probability distributions. This modeling concept can be applied in various contexts, such as the Inventory Model discussed in Section 4, where the order quantity must be determined in the presence of an uncertain demand distribution. Specifically, consider the setting where at stage $t$ there is an ambiguity set $\mathfrak{M}_t(\xi_{[t-1]})$ of probability measures on $\Xi_t$, depending on the history $\xi_{[t-1]}$ (cf., [11]). That is, $\mathfrak{M}_t(\xi_{[t-1]})\subset\mathcal{P}(\Xi_t)$, where $\mathfrak{M}_t:\Xi_1\times\cdots\times\Xi_{t-1}\rightrightarrows\mathcal{P}(\Xi_t)$ is the respective multifunction. Here $\xi_t\in\Xi_t$, $t=1,\dots,T$, for some pre-determined sets $\Xi_t\subset\mathbb{R}^{d_t}$.
We view the distributionally robust counterpart of problem (1.1) as a dynamic game between the decision maker (the controller) and the adversary (the nature). Define the history of the decision process at stage $t$ as
$$h_t:=(x_1,u_1,\xi_1,\dots,x_{t-1},u_{t-1},\xi_{t-1},x_t). \tag{2.5}$$
At stage $t$, based on $h_t$ the nature chooses a probability measure $P_t\in\mathfrak{M}_t(\xi_{[t-1]})$, and at the same time the controller chooses its action $u_t\in\mathcal{U}_t$. For a realization $\xi_t\sim P_t$, the next state is $x_{t+1}=F_t(x_t,u_t,\xi_t)$, and so on the process is continued. (It is also possible to let the ambiguity set depend on the past actions $u_1,\dots,u_{t-1}$; this does not change the essence of our discussion for the dynamic game.) We write the corresponding distributionally robust problem as
$$\min_{\pi\in\Pi}\ \max_{\gamma\in\Gamma}\ \mathbb{E}^{\pi,\gamma}\Big[\sum_{t=1}^{T} c_t(x_t,u_t,\xi_t)+c_{T+1}(x_{T+1})\Big], \tag{2.6}$$
where $\Pi$ is the set of policies of the controller and $\Gamma$ is the set of policies of the nature (cf., [13, p. 812]).
Unless stated otherwise we make the following assumptions about the data of the problem.
Assumption 2.1.
The sets $\mathcal{U}_t$ and $\Xi_t$, $t=1,\dots,T$, are compact, and the functions $c_t(x_t,u_t,\xi_t)$, $t=1,\dots,T$, $c_{T+1}(x_{T+1})$ and $F_t(x_t,u_t,\xi_t)$, $t=1,\dots,T$, are continuous.
2.1 Dynamic Equations
The dynamic programming equations for the distributionally robust counterpart (2.6) of problem (1.1) are $V_{T+1}(x_{T+1},\xi_{[T]})=c_{T+1}(x_{T+1})$ for all $x_{T+1}$ and $\xi_{[T]}$, and for $t=T,\dots,1$,
$$V_t(x_t,\xi_{[t-1]})=\min_{u\in\mathcal{U}_t}\ \max_{P\in\mathfrak{M}_t(\xi_{[t-1]})}\ \mathbb{E}_{\xi_t\sim P}\big[c_t(x_t,u,\xi_t)+V_{t+1}\big(F_t(x_t,u,\xi_t),\xi_{[t]}\big)\big]. \tag{2.7}$$
To ensure that (2.7) is well defined, we need the value functions to be measurable. We establish below the continuity of the value functions and show that the optimal action in (2.7) is attained. We refer the reader to the Appendix for the definition of continuous multifunctions.
Proposition 2.1.
Suppose that Assumption 2.1 holds and the multifunctions $\mathfrak{M}_t(\cdot)$, $t=1,\dots,T$, are continuous. Then the value functions $V_t(x_t,\xi_{[t-1]})$, $t=1,\dots,T+1$, defined by the dynamic equations (2.7), are continuous and the minimum in the right hand side of (2.7) is attained.
Proof.
We argue by induction going backward in $t$. We have that $V_{T+1}(\cdot)=c_{T+1}(\cdot)$ and hence $V_{T+1}$ is continuous by the assumption. Now suppose that $V_{t+1}(\cdot,\cdot)$ is continuous. Consider the function
$$\phi_t(x_t,\xi_{[t-1]},u,P):=\mathbb{E}_{\xi_t\sim P}\big[c_t(x_t,u,\xi_t)+V_{t+1}\big(F_t(x_t,u,\xi_t),\xi_{[t]}\big)\big].$$
From Assumption 2.1 the set $\Xi_t$ is compact, and subsequently for any sequence $(x^k_t,\xi^k_{[t-1]},u^k)$ converging to $(x_t,\xi_{[t-1]},u)$, we have that $c_t(x^k_t,u^k,\cdot)+V_{t+1}\big(F_t(x^k_t,u^k,\cdot),(\xi^k_{[t-1]},\cdot)\big)$ converges to $c_t(x_t,u,\cdot)+V_{t+1}\big(F_t(x_t,u,\cdot),(\xi_{[t-1]},\cdot)\big)$ uniformly in $\xi_t\in\Xi_t$. For any $P^k\in\mathcal{P}(\Xi_t)$ converging weakly∗ to $P$, it follows by Remark 1.3 that $\phi_t(x^k_t,\xi^k_{[t-1]},u^k,P^k)\to\phi_t(x_t,\xi_{[t-1]},u,P)$. In addition, since $\Xi_t$ is compact, $C(\Xi_t)$ is separable, and $\mathcal{P}(\Xi_t)$ with the weak∗ topology is metrizable [6, Theorem 6.30]. It follows that $\phi_t$ is jointly continuous in $P$ (with respect to the weak∗ topology of $\mathcal{P}(\Xi_t)$) and $(x_t,\xi_{[t-1]},u)$. Consequently we obtain by Theorem 7.1 that the function
$$\varphi_t(x_t,\xi_{[t-1]},u):=\max_{P\in\mathfrak{M}_t(\xi_{[t-1]})}\phi_t(x_t,\xi_{[t-1]},u,P)$$
is continuous in $(x_t,\xi_{[t-1]},u)$. Finally, again by Theorem 7.1 we have that
$$V_t(x_t,\xi_{[t-1]})=\min_{u\in\mathcal{U}_t}\varphi_t(x_t,\xi_{[t-1]},u)$$
is continuous. The minimum is attained since $\mathcal{U}_t$ is compact. ∎
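For a finite (or suitably discretized) model the dynamic equations (2.7) can be evaluated directly by backward recursion. The following sketch is only illustrative and rests on several simplifying assumptions: finite state, control and noise sets, a history-independent ambiguity set given as a finite list of probability vectors, and a transition mapping whose values stay on the state grid; all names are hypothetical.

```python
import numpy as np

def robust_backward_recursion(states, controls, noises, ambiguity, c, F, c_final, T):
    """Value functions V[t], t = T+1, ..., 1, of recursion (2.7) for a finite
    model with a history-independent (rectangular) ambiguity set."""
    V = {T + 1: np.array([c_final(x) for x in states])}
    index = {x: i for i, x in enumerate(states)}
    for t in range(T, 0, -1):
        V_t = np.empty(len(states))
        for i, x in enumerate(states):
            best = np.inf
            for u in controls:
                # stage cost plus cost-to-go for every noise realization;
                # F(t, x, u, xi) is assumed to return an element of `states`
                stage = np.array([c(t, x, u, xi) + V[t + 1][index[F(t, x, u, xi)]]
                                  for xi in noises])
                # worst-case expectation over the finite ambiguity set
                best = min(best, max(p @ stage for p in ambiguity))
            V_t[i] = best
        V[t] = V_t
    return V
```

Recording the minimizing control at each state along the way yields a (non-randomized) policy of the form $u_t=\pi_t(x_t)$, as in the stagewise independent setting of Section 2.4.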
Remark 2.1.
It can be noted that because of the basic assumption that the probability law of the process $\xi_1,\dots,\xi_T$ does not depend on the states and actions, it follows that the state $x_t$ can be considered as a function of $\xi_{[t-1]}$, and hence the value functions and optimal policies can be viewed as functions of the process $\xi_{[t-1]}$ alone (compare with Remark 1.2).
2.2 Randomized Policies
We also consider randomized policies for the controller. That is, at stage $t$, the nature chooses a probability measure $P_t\in\mathfrak{M}_t(\xi_{[t-1]})$, and at the same time the controller chooses its action $u_t$ at random according to a probability distribution $\mu_t\in\mathcal{P}(\mathcal{U}_t)$, where $\mathcal{P}(\mathcal{U}_t)$ denotes the set of (Borel) probability measures on $\mathcal{U}_t$. Consequently $x_{t+1}=F_t(x_t,u_t,\xi_t)$ with $u_t\sim\mu_t$ and $\xi_t\sim P_t$, and so on. Here the decision process is defined by histories $h_t$, as defined in (2.5), with the actions $u_t$ chosen at random.
This corresponds to the min-max formulation
$$\min_{\pi\in\Pi^{\mathrm{r}}}\ \max_{\gamma\in\Gamma}\ \mathbb{E}^{\pi,\gamma}\Big[\sum_{t=1}^{T}c_t(x_t,u_t,\xi_t)+c_{T+1}(x_{T+1})\Big], \tag{2.9}$$
where $\Pi^{\mathrm{r}}$ is the set of randomized policies of the controller that map the history $h_t$ into $\mathcal{P}(\mathcal{U}_t)$ at each stage. For randomized policies the counterpart of dynamic equation (2.7) becomes $\mathcal{V}_{T+1}(x_{T+1},\xi_{[T]})=c_{T+1}(x_{T+1})$ for all $x_{T+1}$ and $\xi_{[T]}$, and for $t=T,\dots,1$,
$$\mathcal{V}_t(x_t,\xi_{[t-1]})=\min_{\mu\in\mathcal{P}(\mathcal{U}_t)}\ \max_{P\in\mathfrak{M}_t(\xi_{[t-1]})}\ \mathbb{E}_{u\sim\mu,\ \xi_t\sim P}\big[c_t(x_t,u,\xi_t)+\mathcal{V}_{t+1}\big(F_t(x_t,u,\xi_t),\xi_{[t]}\big)\big] \tag{2.10}$$
(cf., [9], [17]). The controller has a non-randomized optimal policy if for every $t$ and every $(x_t,\xi_{[t-1]})$ problem (2.11) below has an optimal solution supported on a single point of $\mathcal{U}_t$:
$$\min_{\mu\in\mathcal{P}(\mathcal{U}_t)}\ \max_{P\in\mathfrak{M}_t(\xi_{[t-1]})}\ \mathbb{E}_{u\sim\mu,\ \xi_t\sim P}\big[c_t(x_t,u,\xi_t)+\mathcal{V}_{t+1}\big(F_t(x_t,u,\xi_t),\xi_{[t]}\big)\big]. \tag{2.11}$$
Note that by the Banach-Alaoglu theorem the set $\mathcal{P}(\mathcal{U}_t)$ is compact in the weak∗ topology of the dual of the space $C(\mathcal{U}_t)$. With similar arguments as in Proposition 2.1, we have the following.
Proposition 2.2.
Suppose that Assumption 2.1 holds and the multifunctions $\mathfrak{M}_t(\cdot)$, $t=1,\dots,T$, are continuous. Then the value functions $\mathcal{V}_t(x_t,\xi_{[t-1]})$, $t=1,\dots,T+1$, defined by the dynamic equations (2.10), are continuous and the minimum in the right hand side of (2.10) is attained.
Proof.
We have that $\mathcal{V}_{T+1}(\cdot)=c_{T+1}(\cdot)$ and hence $\mathcal{V}_{T+1}$ is continuous by the assumption. Now suppose that $\mathcal{V}_{t+1}(\cdot,\cdot)$ is continuous. Consider the function
$$\phi_t(x_t,\xi_{[t-1]},\mu,P):=\mathbb{E}_{u\sim\mu,\ \xi_t\sim P}\big[c_t(x_t,u,\xi_t)+\mathcal{V}_{t+1}\big(F_t(x_t,u,\xi_t),\xi_{[t]}\big)\big].$$
From Assumption 2.1, both $\mathcal{U}_t$ and $\Xi_t$ are compact, and subsequently for any $(x^k_t,\xi^k_{[t-1]})$ converging to $(x_t,\xi_{[t-1]})$, we have that $c_t(x^k_t,u,\xi_t)+\mathcal{V}_{t+1}\big(F_t(x^k_t,u,\xi_t),\xi^k_{[t]}\big)$ converges to $c_t(x_t,u,\xi_t)+\mathcal{V}_{t+1}\big(F_t(x_t,u,\xi_t),\xi_{[t]}\big)$ uniformly in $(u,\xi_t)\in\mathcal{U}_t\times\Xi_t$. For $\mu^k$ converging weakly∗ to $\mu$ and $P^k$ converging weakly∗ to $P$, from compactness of $\mathcal{U}_t$ and $\Xi_t$, it can be readily shown that $\mu^k\times P^k$ converges to $\mu\times P$ in the weak∗ topology of the dual of $C(\mathcal{U}_t\times\Xi_t)$, and hence $\phi_t(x^k_t,\xi^k_{[t-1]},\mu^k,P^k)\to\phi_t(x_t,\xi_{[t-1]},\mu,P)$ (Remark 1.3). In addition, both $\mathcal{P}(\mathcal{U}_t)$ and $\mathcal{P}(\Xi_t)$ with the weak∗ topology are metrizable [6, Theorem 6.30]. It follows that $\phi_t$ is continuous jointly in $(\mu,P)$ (with respect to the product weak∗ topology of $\mathcal{P}(\mathcal{U}_t)$ and $\mathcal{P}(\Xi_t)$) and $(x_t,\xi_{[t-1]})$. The rest of the proof then follows the same lines as in Proposition 2.1. ∎
Remark 2.3.
Note that here the state $x_t$ also depends on the history of the chosen (randomized) controls $u_1,\dots,u_{t-1}$. Therefore the optimal policy cannot be written as a function of $\xi_{[t-1]}$ alone (compare with Remark 2.1).
In Section 4 we will discuss necessary and sufficient conditions for the existence of non-randomized optimal policies for the controller.
2.3 Nested Formulation
For non-randomized policies of the controller, dynamic equations (2.7) correspond to the following nested formulation of the respective distributionally robust SOC problem. For a non-randomized policy $\pi=(\pi_1,\dots,\pi_T)$, consider the total cost
$$Z^\pi:=\sum_{t=1}^{T}c_t(x_t,u_t,\xi_t)+c_{T+1}(x_{T+1}).$$
Recall that we can consider here policies as functions of $\xi_{[t-1]}$ alone, and that the state $x_t$ is also a function of $\xi_{[t-1]}$ (see Remark 2.1). Therefore we can view $Z^\pi$ as a function of the process $\xi_1,\dots,\xi_T$.
Consider linear spaces $\mathcal{Z}_t$, $t=1,\dots,T$, of bounded random variables $Z_t=Z_t(\xi_{[t]})$, and set $\mathcal{Z}_0:=\mathbb{R}$. For $t=T,\dots,1$, define recursively the mappings $\mathfrak{R}_t:\mathcal{Z}_t\to\mathcal{Z}_{t-1}$,
$$\mathfrak{R}_t(Z_t):=\sup_{P\in\mathfrak{M}_t(\xi_{[t-1]})}\mathbb{E}_{P}\big[Z_t\,\big|\,\xi_{[t-1]}\big]. \tag{2.12}$$
Note that $\mathcal{Z}_0=\mathbb{R}$ and hence $\mathfrak{R}_1(Z_1)$ is a real number. Let
$$\mathfrak{R}(\cdot):=\mathfrak{R}_1\circ\cdots\circ\mathfrak{R}_T(\cdot) \tag{2.13}$$
be the corresponding composite functional. Then with $Z^\pi$ viewed as a function of $\xi_{[T]}$, the nested distributionally robust counterpart of problem (1.1) is
$$\min_{\pi\in\Pi}\ \mathfrak{R}\big(Z^\pi\big), \tag{2.14}$$
where (as before) $\Pi$ is the set of non-randomized policies of the controller.
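For instance, for $T=2$ the composite functional takes the form
$$\mathfrak{R}(Z^\pi)=\sup_{P_1\in\mathfrak{M}_1}\mathbb{E}_{P_1}\Big[\sup_{P_2\in\mathfrak{M}_2(\xi_1)}\mathbb{E}_{P_2}\big[Z^\pi\,\big|\,\xi_1\big]\Big],$$
so that the worst-case distribution at the second stage is allowed to depend on the realization of $\xi_1$.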
2.4 Stagewise Independence Setting
An important particular case of the above framework is when the ambiguity sets do not depend on $\xi_{[t-1]}$, i.e., $\mathfrak{M}_t(\xi_{[t-1]})\equiv\mathfrak{M}_t$ for all $\xi_{[t-1]}$. In that case the value functions and optimal policies do not depend on $\xi_{[t-1]}$ and the respective dynamic equations can be written as
$$V_t(x_t)=\min_{u\in\mathcal{U}_t}\ \max_{P\in\mathfrak{M}_t}\ \mathbb{E}_{\xi_t\sim P}\big[c_t(x_t,u,\xi_t)+V_{t+1}\big(F_t(x_t,u,\xi_t)\big)\big]. \tag{2.15}$$
Also in that case the assumption of continuity of the multifunctions $\mathfrak{M}_t(\cdot)$ holds automatically, since they are constant.
3 Duality of Game Formulation
The dynamic equations of the dual of problem (2.9) are: $\bar{\mathcal{V}}_{T+1}(x_{T+1},\xi_{[T]})=c_{T+1}(x_{T+1})$ for all $x_{T+1}$ and $\xi_{[T]}$, and for $t=T,\dots,1$,
$$\bar{\mathcal{V}}_t(x_t,\xi_{[t-1]})=\max_{P\in\mathfrak{M}_t(\xi_{[t-1]})}\ \min_{\mu\in\mathcal{P}(\mathcal{U}_t)}\ \mathbb{E}_{u\sim\mu,\ \xi_t\sim P}\big[c_t(x_t,u,\xi_t)+\bar{\mathcal{V}}_{t+1}\big(F_t(x_t,u,\xi_t),\xi_{[t]}\big)\big]. \tag{3.1}$$
For given $(x_t,\xi_{[t-1]})$ and $P\in\mathfrak{M}_t(\xi_{[t-1]})$, the minimization in the right hand side of (3.1) is over all probability measures $\mu$ supported on the set $\mathcal{U}_t$. It is straightforward to see that the minimum is attained at a Dirac measure $\delta_{\bar{u}}$ (we denote by $\delta_a$ the Dirac measure of mass one at the point $a$), where $\bar{u}=\bar{u}(x_t,\xi_{[t-1]},P)$ depends on $P$. Therefore it suffices to perform the minimization in (3.1) over Dirac measures, and hence we can write equation (3.1) in the following equivalent way
$$\bar{\mathcal{V}}_t(x_t,\xi_{[t-1]})=\max_{P\in\mathfrak{M}_t(\xi_{[t-1]})}\ \min_{u\in\mathcal{U}_t}\ \mathbb{E}_{\xi_t\sim P}\big[c_t(x_t,u,\xi_t)+\bar{\mathcal{V}}_{t+1}\big(F_t(x_t,u,\xi_t),\xi_{[t]}\big)\big]. \tag{3.2}$$
Similar to Proposition 2.2, it can be shown that the value functions $\bar{\mathcal{V}}_t$ are well defined and continuous. By the standard theory of min-max, we have that $\bar{\mathcal{V}}_t(\cdot)\le\mathcal{V}_t(\cdot)$, $t=1,\dots,T+1$. We next establish that equality indeed holds when the multifunctions $\mathfrak{M}_t(\cdot)$, $t=1,\dots,T$, are convex-valued, i.e., the sets $\mathfrak{M}_t(\xi_{[t-1]})$ are convex for all $\xi_{[t-1]}$.
Proposition 3.1.
Suppose that Assumption 2.1 holds and the multifunctions $\mathfrak{M}_t(\cdot)$, $t=1,\dots,T$, are continuous and convex-valued. Then $\mathcal{V}_t(\cdot)=\bar{\mathcal{V}}_t(\cdot)$, $t=1,\dots,T+1$, and the min-max problems in the right hand side of the respective dynamic equations possess saddle points.
Proof.
Under the specified assumptions, the sets $\mathcal{P}(\mathcal{U}_t)$ and $\mathfrak{M}_t(\xi_{[t-1]})$ are weakly∗ compact and convex, and of course the expectation $\mathbb{E}_{u\sim\mu,\ \xi_t\sim P}[\,\cdot\,]$ is linear with respect to $\mu$ and $P$ and continuous in the respective weak∗ topologies. Then by using Sion's duality theorem (see Remark 1.4) and applying induction backward in time, we can conclude that $\mathcal{V}_t(\cdot)=\bar{\mathcal{V}}_t(\cdot)$, $t=1,\dots,T+1$. Further, by compactness arguments the respective min-max and max-min problems have optimal solutions, which implies existence of the saddle point. ∎
It is also worth mentioning here that the duality of the game formulation (when considering randomized policies of the controller) does not require convexity of the cost functions $c_t$ or any structural assumptions on the transition mappings $F_t$.
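When the control set is finite and the ambiguity set is the convex hull of finitely many measures, the one-step problem behind (2.10) and (3.1) reduces to a matrix game, and the equality of the min-max and max-min values asserted in Proposition 3.1 can be checked numerically by linear programming. The sketch below is purely illustrative: the cost matrix C is random, and the helper game_value is an assumed name rather than an existing routine.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
C = rng.uniform(size=(4, 3))             # C[i, j]: expected cost of control i under measure j

def game_value(C):
    """Value and optimal mixed strategy of the minimizing player of the matrix game C."""
    m, n = C.shape
    obj = np.r_[np.zeros(m), 1.0]        # variables (mu_1, ..., mu_m, v); minimize v
    A_ub = np.c_[C.T, -np.ones(n)]       # sum_i mu_i C[i, j] <= v for every column j
    b_ub = np.zeros(n)
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)
    b_eq = np.array([1.0])               # mu is a probability vector
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.fun, res.x[:m]

v_minmax, mu = game_value(C)             # controller randomizes over controls
v_maxmin, lam = game_value(-C.T)         # nature's problem; value has the opposite sign
print(v_minmax, -v_maxmin)               # the two values coincide (up to solver tolerance)
```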
4 Existence of Non-randomized Optimal Policies
We have the following necessary and sufficient condition for the existence of a non-randomized optimal policy of the controller (cf., [9, Theorem 2.2]).
Theorem 4.1.
Proof.
In particular, we have the following result (cf., [5]).
Corollary 4.1.
Suppose that the sets $\mathcal{U}_t$ are convex, for every $\xi_t\in\Xi_t$ the functions $c_t(x_t,u_t,\xi_t)$ are convex in $(x_t,u_t)$ and $c_{T+1}(x_{T+1})$ is convex, the mappings $F_t(x_t,u_t,\xi_t)$ are affine in $(x_t,u_t)$, the multifunctions $\mathfrak{M}_t(\cdot)$ are continuous and convex-valued, $t=1,\dots,T$, and Assumption 2.1 holds. Then the value functions are convex in $x_t$ for every $\xi_{[t-1]}$, and the controller has a non-randomized optimal policy.
Proof.
In many interesting applications the considered problem is convex in the sense of Corollary 4.1. In such cases there is no point in considering randomized policies. As an example consider the classical Inventory Model (cf., [18]).
Inventory Model.
Consider the following inventory model (cf., [18])
$$\begin{array}{ll}\min\limits_{\pi\in\Pi} & \mathbb{E}^{\pi}\Big[\sum_{t=1}^{T}\big(c\,u_t+b\,[D_t-x_t-u_t]_{+}+h\,[x_t+u_t-D_t]_{+}\big)\Big]\\[4pt] \text{s.t.} & x_{t+1}=x_t+u_t-D_t,\ \ u_t\ge 0,\ \ t=1,\dots,T,\end{array} \tag{4.1}$$
where $D_1,\dots,D_T$ is a (random) demand process, $c$, $b$ and $h$ are the ordering, backorder penalty and holding costs per unit, respectively, $x_t$ is the inventory level, $u_t$ is the order quantity at time $t$, and $[a]_+:=\max\{a,0\}$. Suppose that the distribution of $D_t$ is supported on $\Xi_t=[\underline{d}_t,\bar{d}_t]$ with $0\le\underline{d}_t\le\bar{d}_t<\infty$, $t=1,\dots,T$, being a finite subinterval of the nonnegative real line.
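A small numerical illustration of the distributionally robust counterpart of (4.1) can be obtained by backward recursion when the demand has finite support, the ambiguity set is a finite collection of demand distributions and the order-up-to level is restricted to a grid. All data below are invented for the sketch, and the next inventory level is clipped to the grid for simplicity.

```python
import numpy as np

c, b, h = 1.0, 4.0, 0.5                  # ordering, backorder and holding costs per unit
demand = np.array([0.0, 1.0, 2.0, 3.0])  # finite demand support
ambiguity = [np.array([0.25, 0.25, 0.25, 0.25]),
             np.array([0.10, 0.20, 0.30, 0.40])]
grid = np.linspace(0.0, 6.0, 61)         # inventory levels
T = 3

V = np.zeros_like(grid)                  # V_{T+1} = 0
for t in range(T, 0, -1):
    V_new = np.empty_like(grid)
    for i, x in enumerate(grid):
        best = np.inf
        for y in grid[grid >= x]:        # order up to level y >= x, i.e. u = y - x
            stage = (c * (y - x)
                     + b * np.maximum(demand - y, 0.0)
                     + h * np.maximum(y - demand, 0.0)
                     + np.interp(y - demand, grid, V))   # cost-to-go, clipped to the grid
            best = min(best, max(p @ stage for p in ambiguity))
        V_new[i] = best
    V = V_new
```

Since the problem is convex in the sense of Corollary 4.1, nothing is lost by restricting the controller to non-randomized (order-up-to) policies.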
5 Construction of Ambiguity Sets
There are several ways in which the ambiguity sets $\mathfrak{M}_t$ can be constructed. We now discuss an approach which can be considered as an extension of the risk averse modeling of multistage stochastic programs.
Let $P$ be a probability measure on a (closed) set $\Xi\subset\mathbb{R}^{d}$, and $\mathfrak{M}$ be a set of probability measures on $\Xi$ absolutely continuous with respect to $P$. Consider the corresponding distributionally robust functional
$$\mathcal{R}(Z):=\sup_{Q\in\mathfrak{M}}\ \mathbb{E}_Q[Z], \tag{5.1}$$
defined on an appropriate space $\mathcal{Z}$ of random variables $Z:\Xi\to\mathbb{R}$. Such a functional can be considered as a coherent risk measure (e.g., [14, Chapter 6]). The functional $\mathcal{R}$ is law invariant if $\mathcal{R}(Z)=\mathcal{R}(Z')$ when $Z$ and $Z'$ are distributionally equivalent, i.e., $P(Z\le z)=P(Z'\le z)$ for any $z\in\mathbb{R}$. A law invariant coherent risk measure can be considered as a function of the cumulative distribution function (cdf) $F_Z(z):=P(Z\le z)$.
An important example of a law invariant coherent risk measure is the Average Value-at-Risk
$$\mathsf{AV@R}_\alpha(Z):=\inf_{\tau\in\mathbb{R}}\Big\{\tau+\alpha^{-1}\,\mathbb{E}_P[Z-\tau]_+\Big\},$$
where $\alpha\in(0,1]$. It has dual representation of the form (5.1) with
$$\mathfrak{M}=\big\{Q\ll P:\ dQ/dP\le\alpha^{-1}\big\}$$
(the notation $Q\ll P$ means that the measure $Q$ is absolutely continuous with respect to the measure $P$).
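As a numerical illustration, the above variational formula for the Average Value-at-Risk and its dual representation of the form (5.1) can be compared on an equally weighted empirical sample; the sketch assumes that $\alpha n$ is an integer, so that the worst-case density simply puts the maximal admissible weight $1/(\alpha n)$ on the $\alpha n$ largest outcomes.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=10_000)              # sample of the random variable Z
alpha, n = 0.1, 10_000

# variational form: AV@R_alpha(Z) = inf_tau { tau + E[(Z - tau)_+] / alpha },
# with the infimum attained at the (1 - alpha)-quantile of Z
tau = np.quantile(z, 1 - alpha)
avar_primal = tau + np.mean(np.maximum(z - tau, 0.0)) / alpha

# dual form (5.1): sup of E_Q[Z] over densities dQ/dP <= 1/alpha;
# for equal weights 1/n the supremum is the average of the largest alpha*n outcomes
k = int(round(alpha * n))
avar_dual = np.sort(z)[-k:].mean()

print(avar_primal, avar_dual)            # agree up to the empirical quantile convention
```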
Now let $\mathbf{P}$ be a reference probability measure on $\Xi_1\times\cdots\times\Xi_T$. Denote by $P_{[t]}$ and $P_{[t-1]}$ the corresponding marginal probability measures on $\Xi_1\times\cdots\times\Xi_t$ and $\Xi_1\times\cdots\times\Xi_{t-1}$. Let us define the conditional measures $P_{t|\xi_{[t-1]}}$, with respect to $P_{[t]}$ and $P_{[t-1]}$, recursively going backward in time. That is, using the Regular Probability Kernel (RPK) (e.g., [4, III-70]), define $P_{t|\xi_{[t-1]}}$ as a probability measure on $\Xi_t$ for almost every (a.e.) $\xi_{[t-1]}$ (a.e. with respect to $P_{[t-1]}$), such that for any measurable sets $A\subset\Xi_1\times\cdots\times\Xi_{t-1}$ and $B\subset\Xi_t$,
$$P_{[t]}(A\times B)=\int_A P_{t|\xi_{[t-1]}}(B)\,dP_{[t-1]}(\xi_{[t-1]}). \tag{5.2}$$
The conditional counterpart $\mathcal{R}_{|\xi_{[t-1]}}$ of a law invariant coherent risk measure can be defined as the respective function of the conditional cdf of $Z_t$ (cf., [15]). For example, the conditional counterpart of the Average Value-at-Risk can be defined in that way and is given by
$$\mathsf{AV@R}_{\alpha|\xi_{[t-1]}}(Z_t)=\inf_{\tau\in\mathbb{R}}\Big\{\tau+\alpha^{-1}\,\mathbb{E}_{|\xi_{[t-1]}}[Z_t-\tau]_+\Big\},$$
where $\mathbb{E}_{|\xi_{[t-1]}}$ is the conditional counterpart of the expectation $\mathbb{E}_P$.
In turn this defines the corresponding set $\mathfrak{M}_t(\xi_{[t-1]})$ of conditional probability measures (via the dual representation of the conditional risk measure). The respective one-step mappings $\mathfrak{R}_t$, defined in (2.12), are given by
$$\mathfrak{R}_t(Z_t)=\mathcal{R}_{|\xi_{[t-1]}}(Z_t),\ \ Z_t\in\mathcal{Z}_t. \tag{5.3}$$
This leads to the respective nested functional $\mathfrak{R}$, defined in (2.13), and the nested formulation (2.14) of the problem with respect to non-randomized policies of the controller.
In this construction of nested counterparts of law invariant risk measures it is essential that the ambiguity sets consist of probability measures absolutely continuous with respect to the reference measure. However, the above approach can be extended beyond that setting. That is, the conditional set $\mathfrak{M}_t(\xi_{[t-1]})$ of probability measures on $\Xi_t$ can be defined in some way with respect to the conditional distribution $P_{t|\xi_{[t-1]}}$. For example, $\mathfrak{M}_t(\xi_{[t-1]})$ can consist of probability measures with Wasserstein distance from $P_{t|\xi_{[t-1]}}$ less than or equal to a constant $r>0$.
Of course, in such a construction it should be verified that the dynamic programming equations (2.7) (in the case of non-randomized policies) and (2.10) (in the case of randomized policies) are well defined. There are two important cases where the corresponding multifunctions are continuous and hence the value functions are continuous (Proposition 2.2). One such case is when the ambiguity sets do not depend on $\xi_{[t-1]}$ (see Section 2.4). This corresponds to the case where the reference measure $\mathbf{P}$ is given by the direct product of the marginal measures $P_1,\dots,P_T$. In that case the ambiguity sets $\mathfrak{M}_t$ can be arbitrary. For example, $\mathfrak{M}_t$ can be the set of probability measures with Wasserstein distance from the reference measure $P_t$ less than or equal to a positive constant $r$. The dynamic programming equations take the form (2.15), and by Proposition 2.2 the value functions are continuous.
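For a finite support, the worst-case expectation over such a Wasserstein ball can be computed by a small transportation linear program. In the sketch below the reference measure, the cost function and the restriction of the candidate measures to the same grid as the reference measure are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

grid = np.linspace(-3.0, 3.0, 41)                    # common support of reference and candidate measures
p_ref = np.exp(-grid**2 / 2); p_ref /= p_ref.sum()   # reference measure P_t
f = grid**2                                          # cost whose expectation nature maximizes
r = 0.25                                             # Wasserstein-1 radius

m = len(grid)
# decision variable: transport plan pi[i, j] between p_ref and the candidate measure,
# flattened row-major; the candidate measure is q_j = sum_i pi[i, j]
obj = -np.tile(f, m)                                 # maximize sum_ij pi_ij f(grid_j)
A_eq = np.kron(np.eye(m), np.ones(m))                # row sums: sum_j pi[i, j] = p_ref[i]
b_eq = p_ref
dist = np.abs(grid[:, None] - grid[None, :]).ravel()
A_ub, b_ub = dist.reshape(1, -1), np.array([r])      # transportation cost <= r
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (m * m))
print(p_ref @ f, -res.fun)                           # nominal vs worst-case expectation
```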
Another important case is when the sets $\Xi_t$, $t=1,\dots,T$, are finite, i.e., $\xi_t$ has a discrete distribution with finite support. Then continuity of the multifunctions $\mathfrak{M}_t(\cdot)$ trivially holds, and hence the value functions are continuous.
6 Conclusions
The main technical assumption used in the considered construction is continuity of the multifunctions $\mathfrak{M}_t(\cdot)$. This assumption ensures continuity of the value functions and hence measurability of the objective functions in the dynamic programming equations (2.7) and (2.10). It holds automatically in two important cases, namely when the ambiguity sets do not depend on the history $\xi_{[t-1]}$ of the process or when the sets $\Xi_t$ are finite. It could be noted that upper semicontinuity of $\mathfrak{M}_t(\cdot)$ alone is not sufficient for ensuring continuity (or even semicontinuity) of the value functions. In general it could be difficult to verify the assumption of continuity of $\mathfrak{M}_t(\cdot)$, in which case the measurability question remains open.
By using the game formulation it is possible to consider randomized policies of the controller. In Theorem 4.1 we give necessary and sufficient conditions for the existence of non-randomized optimal policies. The assumption that the multifunctions $\mathfrak{M}_t(\cdot)$ are convex-valued, i.e., that the respective ambiguity sets are convex, is rather mild. The assumption of continuity of $\mathfrak{M}_t(\cdot)$ is needed in order to ensure that the respective dynamic equations are well defined.
There is a delicate point which we would like to mention. In the game formulation, after the nature chooses the probability measure $P_t\in\mathfrak{M}_t(\xi_{[t-1]})$, the system moves to the next state with $\xi_t$ distributed according to $P_t$. On the other hand, in the risk averse approach the existence of reference (conditional) measures $P_{t|\xi_{[t-1]}}$ is assumed, and the probability law of the process $\xi_1,\dots,\xi_T$ is defined by the reference measure $\mathbf{P}$. In both cases the game and the nested risk averse (for non-randomized policies) approaches lead to the same dynamic equations, the same optimal policies of the controller and the same optimal values. However, the interpretation in terms of realizations of the random process could be different.
References
- [1] D.P. Bertsekas and S.E. Shreve. Stochastic Optimal Control, The Discrete Time Case. Academic Press, New York, 1978.
- [2] P. Billingsley. Probability and Measure, Third Edition. Wiley, 1995.
- [3] J. Frédéric Bonnans and Alexander Shapiro. Perturbation Analysis of Optimization Problems. Springer Series in Operations Research. Springer, 2000.
- [4] C. Dellacherie and P.A. Meyer. Probabilities and Potential. North-Holland Publishing, Amsterdam, Netherlands, 1988.
- [5] E. Delage, D. Kuhn, and W. Wiesemann. "Dice"-sion-making under uncertainty: When can a random decision reduce risk? Management Science, 65(7):3282–3301, 2019.
- [6] C.D. Aliprantis and K.C. Border. Infinite Dimensional Analysis: A Hitchhiker's Guide. Springer, 2006.
- [7] G.N. Iyengar. Robust Dynamic Programming. Mathematics of Operations Research, 30:257–280, 2005.
- [8] A. Jaśkiewicz and A. S. Nowak. Zero-sum stochastic games. In T. Basar and G. Zaccour, editors, Handbook of dynamic game theory. Springer, 2016.
- [9] Y. Li and A. Shapiro. Rectangularity and duality of distributionally robust Markov decision processes. https://arxiv.org/abs/2308.11139, 2023.
- [10] A. Nilim and L. El Ghaoui. Robust control of Markov decision processes with uncertain transition probabilities. Operations Research, 53:780–798, 2005.
- [11] A. Shapiro. Rectangular sets of probability measures. Operations Research, 64(2):528–541, 2016.
- [12] A. Shapiro. Interchangeability principle and dynamic equations in risk averse stochastic programming. Operations Research Letters, 45:377–381, 2017.
- [13] A. Shapiro. Distributionally robust optimal control and MDP modeling. Operations Research Letters, 49:809–814, 2021.
- [14] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia, third edition, 2021.
- [15] A. Shapiro and A. Pichler. Conditional distributionally robust functionals. Operations Research, online:1–13, 2023.
- [16] M. Sion. On general minimax theorems. Pacific Journal of Mathematics, 8:171–176, 1958.
- [17] S. Wang, N. Si, J. Blanchet, and Z. Zhou. On the foundation of distributionally robust reinforcement learning. https://arxiv.org/abs/2311.09018, 2023.
- [18] P.H. Zipkin. Foundations of Inventory Management. McGraw-Hill, Boston, MA, 2000.
7 Appendix
Let $X$ and $Y$ be nonempty compact metric spaces, $f:X\times Y\to\mathbb{R}$ be a continuous real valued function, and $\mathcal{G}:X\rightrightarrows Y$ be a multifunction (point-to-set mapping). It is said that the multifunction $\mathcal{G}$ is closed if $y_k\in\mathcal{G}(x_k)$, $x_k\to x$ and $y_k\to y$ implies that $y\in\mathcal{G}(x)$. If $\mathcal{G}$ is closed, then it is closed valued, i.e., the set $\mathcal{G}(x)$ is closed for every $x\in X$. It is said that the multifunction $\mathcal{G}$ is upper semicontinuous if for any $x\in X$ and any neighborhood $V$ of $\mathcal{G}(x)$ there is a neighborhood $W$ of $x$ such that $\mathcal{G}(x')\subset V$ for any $x'\in W$. In the considered framework of compact sets, the multifunction $\mathcal{G}$ is closed iff it is closed valued and upper semicontinuous (e.g., [3, Lemma 4.3]).
It is said that the multifunction $\mathcal{G}$ is lower semicontinuous if for any $x\in X$, any $y\in\mathcal{G}(x)$ and any neighborhood $V$ of $y$ there is a neighborhood $W$ of $x$ such that $\mathcal{G}(x')\cap V\neq\emptyset$ for any $x'\in W$. It is said that $\mathcal{G}$ is continuous if it is closed and lower semicontinuous.
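For example, the multifunction $\mathcal{G}:[-1,1]\rightrightarrows[0,1]$ defined by $\mathcal{G}(0):=[0,1]$ and $\mathcal{G}(x):=\{0\}$ for $x\neq 0$ is closed (hence upper semicontinuous) but not lower semicontinuous, and for $f(x,y):=-y$ the corresponding value function $\inf_{y\in\mathcal{G}(x)}f(x,y)$ equals $-1$ at $x=0$ and $0$ elsewhere, and hence is not continuous. This illustrates why upper semicontinuity of the multifunction alone is not sufficient in Theorem 7.1 below.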
Theorem 7.1.
Suppose that the sets $X$ and $Y$ are compact, the function $f:X\times Y\to\mathbb{R}$ is continuous, and the multifunction $\mathcal{G}:X\rightrightarrows Y$ is continuous. Then the value function $\vartheta(x):=\inf_{y\in\mathcal{G}(x)}f(x,y)$ is real valued and continuous.