Preference Robust Modified Optimized Certainty Equivalent

Qiong Wu¹ and Huifu Xu²

¹ Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong. Email: [email protected]

² Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong. Email: [email protected]

Abstract

Ben-Tal and Teboulle [6] introduced the concept of the optimized certainty equivalent (OCE) of an uncertain outcome as the maximum present value of a combination of the cash to be taken out from the uncertain income at present and the expected utility value of the remaining uncertain income. In this paper, we consider two variations of the OCE. First, we introduce a modified OCE by maximizing the combination of the utility of the cash and the expected utility of the remaining uncertain income, so that the combined quantity is expressed entirely in utility terms. Second, we consider a situation where the true utility function is unknown but it is possible to use partially available information to construct a set of plausible utility functions. To mitigate the risk arising from the ambiguity, we introduce a robust model where the modified OCE is based on the worst-case utility function from the ambiguity set. In the case when the ambiguity set of utility functions is constructed by a Kantorovich ball centered at a nominal utility function, we show how the modified OCE and the corresponding worst-case utility function can be identified by solving two linear programs alternately. We also show that the robust modified OCE is statistically robust in a data-driven environment where the underlying data are potentially contaminated. Some preliminary numerical results are reported to demonstrate the performance of the modified OCE and the robust modified OCE model.

Keywords. Robust modified optimized certainty equivalent, ambiguity of utility function, Kantorovich ball, piecewise linear approximation, error bounds, statistical robustness

1 Introduction

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space with $\sigma$-algebra $\mathcal{F}$ and probability measure $\mathbb{P}$, and let $\xi:(\Omega,\mathcal{F},\mathbb{P})\to{\rm I\!R}$ be a random variable representing the future income of a decision maker (DM). The optimized certainty equivalent of $\xi$ is defined as

{\rm(OCE)}\quad\quad\displaystyle{S_{u}(\xi):=\sup_{x\in{\rm I\!R}}\;\;\{x+{\mathbb{E}}_{P}[u(\xi-x)]\}},

(1.1)

where $u:{\rm I\!R}\to{\rm I\!R}$ is the decision maker’s utility function and $P:=\mathbb{P}\circ\xi^{-1}$ is the probability measure on ${\rm I\!R}$ induced by $\xi$. The concept was first introduced by Ben-Tal and Teboulle [6] and is closely related to other notions of certainty equivalent and risk measures; see [7] for a comprehensive discussion. The economic interpretation of this notion is as follows: the decision maker may need to consume part of $\xi$ at present, denoted by $x$, so that the sure present value of $\xi$ under the consumption plan becomes $x+{\mathbb{E}}_{P}[u(\xi-x)]$; the optimized certainty equivalent $S_{u}(\xi)$ then corresponds to the optimal allocation of the consumption, which maximizes the sure present value of $\xi$. As a measure, it enjoys a number of nice properties including constancy ($S_{u}(C)=C$ for constant $C$), risk aversion ($S_{u}(\xi)\leq{\mathbb{E}}_{P}[\xi]$) and translation invariance ($S_{u}(\xi+C)=S_{u}(\xi)+C$). In particular, if $u$ is a normalized exponential utility function, it coincides with the classical certainty equivalent $u^{-1}({\mathbb{E}}_{P}[u(\xi)])$ in the economics literature. Moreover, if $u(t)=-\frac{1}{\alpha}(-t)_{+}$ where $\alpha\in(0,1)$ and $(t)_{+}=\max\{t,0\}$ for $t\in{\rm I\!R}$, then the OCE effectively recovers the conditional value-at-risk (CVaR):

\begin{aligned}
S_{u}(\xi) &= \sup_{x\in{\rm I\!R}}\;\left\{x-\frac{1}{\alpha}{\mathbb{E}}_{P}[(-\xi+x)_{+}]\right\}\\
&= -\inf_{x\in{\rm I\!R}}\;\left\{x+\frac{1}{\alpha}{\mathbb{E}}_{P}[(-\xi-x)_{+}]\right\}=-\text{CVaR}_{\alpha}(-\xi).
\end{aligned}

(1.2)

The last equality is Rockafellar and Uryasev’s formulation of CVaR; see [7, 41]. Since CVaR is an average of quantiles, it is also known as average value-at-risk (AVaR), tail value-at-risk (TVaR) and expected shortfall; see [38, 37].
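The identity (1.2) can be checked numerically on a small sample. The sketch below is only an illustration under stated assumptions: the sample, the level `alpha` and the function names are hypothetical, the distribution is the empirical one, and `alpha` times the sample size is assumed to be an integer so that the lower-tail average is well defined.

```python
import numpy as np

def oce_cvar_utility(samples, alpha):
    """OCE with u(t) = -(1/alpha) * (-t)_+ under the empirical distribution."""
    xi = np.asarray(samples, dtype=float)
    # x - (1/alpha) * E[(x - xi)_+] is concave and piecewise linear in x with
    # kinks at the sample points, so the supremum is attained at a sample point.
    values = [x - np.mean(np.maximum(x - xi, 0.0)) / alpha for x in xi]
    return max(values)

def lower_tail_average(samples, alpha):
    """Average of the worst alpha-fraction of outcomes (alpha * N assumed integer)."""
    xi = np.sort(np.asarray(samples, dtype=float))
    k = int(round(alpha * len(xi)))
    return xi[:k].mean()

samples = np.arange(1.0, 11.0)   # hypothetical incomes 1, 2, ..., 10, equally likely
alpha = 0.2
oce_value = oce_cvar_utility(samples, alpha)
tail_value = lower_tail_average(samples, alpha)
print(oce_value, tail_value)     # the two numbers should agree up to rounding
```

Here the OCE value coincides with the average of the two smallest incomes, which is what $-\text{CVaR}_{\alpha}(-\xi)$ amounts to for an income variable in this setting.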

In this paper, we revisit the OCE from two perspectives. One is to consider a modified version of the optimized certainty equivalent

{\rm(MOCE)}\quad\quad\displaystyle{M_{u}(\xi):=\sup_{x\in{\rm I\!R}}\;\;\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\}}.

(1.3)

The modification is motivated by the aim of aligning the sure present value of $\xi$ with expected utility theory [35] by considering the utility of present consumption $u(x)$ instead of the monetary value $x$. Recall that in Von Neumann-Morgenstern expected utility theory [35], the utility function is used to represent the decision maker’s preference relation over a prospect space including both random and deterministic prospects, and such a representation is unique up to a positive linear transformation. This means we can use both $u$ and $100u$ to represent the DM’s preference. However, the two utility functions would lead to completely different optimal values and optimal solutions in the OCE model. In contrast, the optimal solution is not affected in the MOCE model, and the optimal value is simply scaled by the same factor as the utility function. In our view, this kind of “invariance” of the optimal allocation $x^{*}$ and “scalability” w.r.t. the utility function is important because the optimal decision on the allocation/consumption $x$ should be determined by the DM’s risk preference irrespective of its equivalent representations.
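The effect of replacing $u$ by $100u$ can be seen in a small grid search. The following is a minimal sketch, assuming an exponential utility, a hypothetical three-point income distribution and a uniform grid of candidate consumption levels; none of these choices come from the paper.

```python
import numpy as np

xi = np.array([0.5, 1.0, 2.0])          # hypothetical equally likely incomes
grid = np.linspace(-6.0, 6.0, 12001)    # candidate consumption levels x

def u(t):
    return 1.0 - np.exp(-t)             # exponential utility, chosen for illustration

def argmax_oce(scale):
    # OCE objective x + E[scale * u(xi - x)] evaluated on the grid
    obj = grid + scale * u(xi[None, :] - grid[:, None]).mean(axis=1)
    return grid[np.argmax(obj)]

def argmax_moce(scale):
    # MOCE objective scale * u(x) + E[scale * u(xi - x)] evaluated on the grid
    obj = scale * (u(grid) + u(xi[None, :] - grid[:, None]).mean(axis=1))
    return grid[np.argmax(obj)]

x_oce_1, x_oce_100 = argmax_oce(1.0), argmax_oce(100.0)
x_moce_1, x_moce_100 = argmax_moce(1.0), argmax_moce(100.0)
print("OCE maximizers :", x_oce_1, x_oce_100)    # differ under rescaling
print("MOCE maximizers:", x_moce_1, x_moce_100)  # coincide under rescaling
```

In this example the OCE maximizer shifts by roughly $\ln 100$ when the utility is rescaled, whereas the MOCE maximizer stays put.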

The modified OCE model may be regarded as a special case of the well known consumption/investment models in economics [35, 19, 12] where $u(x)$ is the utility of the current consumption/investment whereas ${\mathbb{E}}[u(\xi-x)]$ is the expected utility of the remaining asset to be consumed/invested in future. In these models, the utility functions for the current consumption and future consumption are identical. It is also possible to use different utility functions when the consumption at present is used for a new investment or production.

The other is to consider a situation where the decision maker’s utility function $u(\cdot)$ is ambiguous, in other words, there is incomplete information to identify a utility function $u$ which captures the decision maker’s true utility preference. Consequently we propose to consider a robust optimized certainty equivalent measure

{\rm(RMOCE)}\quad\quad\displaystyle{R(\xi):=\sup_{x\in{\rm I\!R}}\inf_{u\in{\cal U}}\;\;\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\}},

(1.4)

where ${\cal U}$ is a set of plausible utility functions consistent with the observed utility preferences of the decision maker. The definition is in line with the philosophy of robust optimization, where the optimized certainty equivalent value is based on the worst-case utility function from the set ${\cal U}$ to mitigate the risk arising from potentially inaccurate use or misuse of the utility function. By convention, we call ${\cal U}$ the ambiguity set. In the case that ${\cal U}$ is a singleton, RMOCE reduces to MOCE. Note that RMOCE should be differentiated from the distributionally robust formulation of OCE by Wiesemann et al. [50], where the focus is on the ambiguity of $P$. In decision analysis, $P$ is known as a decision maker’s belief about the state of nature whereas $u$ characterizes the decision maker’s taste for risk/utility. The RMOCE model concerns the ambiguity of the decision maker’s taste rather than belief.
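To make the sup-inf structure of (1.4) concrete, the sketch below evaluates it by brute force. It assumes a finite ambiguity set of three hypothetical utility functions, a hypothetical three-point income distribution and a grid of consumption levels; the paper's ambiguity set is a Kantorovich ball, which this toy example does not implement.

```python
import numpy as np

xi = np.array([0.5, 1.0, 2.0])            # hypothetical equally likely incomes
grid = np.linspace(-2.0, 4.0, 6001)       # candidate consumption levels x

# Hypothetical finite ambiguity set of nondecreasing concave utilities.
utilities = [
    lambda t: 1.0 - np.exp(-t),
    lambda t: (1.0 - np.exp(-2.0 * t)) / 2.0,
    lambda t: np.minimum(t, 0.5 * t),     # piecewise linear: slope 1 for t <= 0, 0.5 for t > 0
]

def objective(u):
    # u(x) + E[u(xi - x)] for every x on the grid
    return u(grid) + u(xi[None, :] - grid[:, None]).mean(axis=1)

values = np.array([objective(u) for u in utilities])   # shape (3, len(grid))
moce_each = values.max(axis=1)       # M_u(xi) for each u in the set
worst_case = values.min(axis=0)      # inner infimum over u, for each x
rmoce = worst_case.max()             # outer supremum over x
x_robust = grid[np.argmax(worst_case)]
print(rmoce, x_robust, moce_each)
```

By construction the robust value never exceeds the smallest individual MOCE value, which reflects the usual max-min inequality.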

Ambiguity of utility preference is a well-discussed topic in behavioural economics. For instance, Thurstone [44] regards such ambiguity as a lack of accurate description of human behaviour. Karmarkar [27] and Weber [49] ascribe the ambiguity to cognitive difficulty and incomplete information. The ambiguity may also arise in decision making problems which involve several stakeholders who fail to reach a consensus. Parametric and non-parametric approaches have subsequently been proposed to assess the true utility function, including discrete choice models (Train [45]) and standard and paired gambling approaches for preference comparisons and certainty equivalence (Farquhar [15]). We refer readers to Hu et al. [23] for an excellent overview.

In decision making under uncertainty, a decision maker may choose the worst-case utility function among a set of plausible utility functions representing his/her risk preference to mitigate the overall risk. This kind of research may be traced back to Maccheroni [33]. Cerreia-Vioglio et al. [10] seem to be the first to investigate ambiguity of the decision maker’s utility function in the certainty equivalent model $u^{-1}({\mathbb{E}}[u(\xi)])$ by considering the worst-case certainty equivalent from a given set of utility functions in their cautious expected utility model. They show that the DM’s risk preference can be represented by a worst-case certainty equivalent if and only if it is given by a binary relation satisfying weak order, continuity, weak monotonicity and negative certainty independence (NCI). NCI states that if a sure outcome is not enough to compensate the DM for a risky prospect, then its mixture with another lottery, which reduces the certainty appeal, will not be more attractive than the same mixture of the risky prospect and the lottery.

Armbruster and Delage [3] give a comprehensive treatment of the topic from a minimax preference robust optimization (PRO) perspective. Specifically, they propose to use available information on the decision maker’s utility preference, such as preferring certain lotteries over other lotteries and being risk averse, $S$-shaped or prudent, to construct an ambiguity set of plausible utility functions and then base the optimal decision on the worst-case utility function from the ambiguity set. Hu and Mehrotra [24] consider a probabilistic representation of the class of increasing concave utility functions by confining them to a compact interval and normalizing them with range $[0,1]$. In doing so, they propose a moment-type framework for constructing the ambiguity set of the decision maker’s utility preference which covers a number of important approaches such as the certainty equivalent and pairwise comparison. Hu and Stepanyan [25] propose a so-called reference-based almost stochastic dominance method for constructing a set of utility functions near a reference utility which satisfies a certain stochastic dominance relationship and use the set to characterize the decision maker’s preference. Over the past few years, research on PRO has received increasing attention in the communities of stochastic/robust optimization and risk management; see for instance [22, 21, 14, 52, 31, 32] and references therein.

In both the (MOCE) and (RMOCE) models, the true probability distribution $P$ is assumed to be known. In data-driven problems, the true $P$ is unknown but it is possible to use empirical data to construct an approximation of $P$. Unfortunately, such data may be contaminated and consequently we may be concerned about the quality of the MOCE values calculated from them. This kind of issue is well studied in robust statistics [26] and can be traced back to the earlier work of Hampel [20]. Cont et al. [13] first study the quality of the plug-in estimators of law invariant risk measures using Hampel’s classical concept of qualitative robustness [20], that is, the plug-in estimator of a risk functional is said to be qualitatively robust if it is insensitive to the variation of sampling data. Using Hampel’s theorem, Cont et al. [13] demonstrate that the qualitative robustness of a plug-in estimator is equivalent to the weak continuity of the risk functional and that value at risk (VaR) is qualitatively robust whereas conditional value at risk (CVaR) is not. Krätschmer et al. [30] argue that the use of Hampel’s classical concept of qualitative robustness may be problematic because it requires the risk measure essentially to be insensitive with respect to the tail behaviour of the random variable, and the recent financial crisis shows that a faulty estimate of tail behaviour can lead to a drastic underestimation of the risk. Consequently, they propose a refined notion of qualitative robustness that applies also to tail-dependent statistical functionals and that allows one to compare statistical functionals with regard to their degree of robustness. The new concept captures the trade-off between robustness and sensitivity and can be quantified by an index of qualitative robustness. Guo and Xu [17] take a step forward by deriving quantitative statistical robustness of PRO models. Xu and Zhang [51] extend the analysis to distributionally robust optimization models.

In this paper, we consider a situation where the decision maker has a nominal utility function but lacks complete information as to whether it is the true one. Consequently we propose to use the Kantorovich ball centered at the nominal utility function as the ambiguity set. We begin with piecewise linear utility (PLU) functions defined over a convex and closed interval of ${\rm I\!R}$ and show that the inner minimization problem in the definition of RMOCE can be reformulated as a linear program when $\xi$ has a finite discrete distribution. We then propose an iterative algorithm to compute the RMOCE by solving the inner minimization problem and the outer maximization problem alternately.

To extend the scope of the proposed computational method, we extend the discussion to the cases that the utility functions are not necessarily piecewise linear and the domain of the utility function is unbounded. We derive error bounds arising from using PLU-based RMOCE to approximate the general RMOCE. Since our numerical scheme for computing the RMOCE is based on the samples of $\xi$ , we study statistical robustness of the sample-based RMOCE to address the case that the sample data of $\xi$ are potentially contaminated. Finally we carry out some numerical tests on the proposed computational schemes for concave utility functions.

The rest of the paper is organized as follows. Section 2 discusses the basic properties of MOCE and RMOCE. Section 3 presents numerical schemes for computing the RMOCE when the utility functions in the ambiguity set are piecewise linear. Section 4 details the approximation of the ambiguity set of general utility functions by the ambiguity set of piecewise linear utility functions and its effect on RMOCE. Section 5 discusses the RMOCE model with utility functions having unbounded domain and outlines potential extensions of the MOCE model to multi-attribute decision making. Section 6 discusses statistical robustness of RMOCE when it is calculated with contaminated data. Section 7 reports numerical results and finally Section 8 concludes with a brief summary of the main contributions of the paper.

2 Properties of MOCE and RMOCE

We begin by discussing the well-definedness of MOCE and RMOCE. Let $L_{p}(\Omega,{\cal F},\mathbb{P})$ denote the space of random variables mapping from $(\Omega,{\cal F},\mathbb{P})$ to ${\rm I\!R}$ with finite $p$-th order moments, and let $\xi\in L_{p}(\Omega,{\cal F},\mathbb{P})$. Let $\mathscr{U}$ be the set of nondecreasing concave utility functions $u:{\rm I\!R}\to{\rm I\!R}$. Throughout this paper, we make a blanket assumption to ensure the well-definedness of the expected utility in the definitions of MOCE and RMOCE.

Assumption 2.1

There exist gauge functions $\phi_{1}:{\rm I\!R}\to{\rm I\!R}$ and $\phi_{2}:{\rm I\!R}\to{\rm I\!R}$ parameterized by $x$ satisfying ${\mathbb{E}}_{P}[\phi_{i}(\xi)]<\infty$ for $i=1,2$ such that

|u(\xi-x)|\leq\phi_{1}(\xi)\quad\text{and}\quad\sup_{u\in{\mathscr{U}}}|u(\xi-x)|\leq\phi_{2}(\xi),\forall x,\xi\in{\rm I\!R}.

The condition stipulates the interaction between the tails of the distribution of $\xi$ and the tails of the utility function. We refer readers to Guo and Xu [18] for a more detailed discussion. To facilitate the forthcoming discussions, we let $\mathscr{P}(\Xi)$ denote the set of probability measures on $\Xi\subset{\rm I\!R}$, and for each fixed $x$, define

{\cal M}^{\phi_{i}}:=\{P\in\mathscr{P}(\Xi):{\mathbb{E}}_{P}[\phi_{i}(\xi)]<\infty\}

for $i=1,2$. Let ${\cal C}_{\Xi}^{\phi_{i}}$ denote the class of continuous functions $h:\Xi\to{\rm I\!R}$ such that $|h(t)|\leq C(\phi_{i}(t)+1)$ for all $t\in\Xi$ and some constant $C>0$. The $\phi_{i}$-topology, denoted by $\tau_{\phi_{i}}$, is the coarsest topology on ${\cal M}^{\phi_{i}}$ for which the mapping $g_{h}(P):=\int_{\Xi}h(z)P(dz)$ is continuous for every $h\in{\cal C}_{\Xi}^{\phi_{i}}$. A sequence $\{P_{N}\}\subset{\cal M}^{\phi_{i}}$ is said to converge $\phi_{i}$-weakly to $P\in{\cal M}^{\phi_{i}}$, written ${P_{N}}\xrightarrow[]{\phi_{i}}P$, if it converges w.r.t. $\tau_{\phi_{i}}$. Note that when the support set of $\xi$ is a compact set in ${\rm I\!R}$, the $\phi_{i}$-topology reduces to the ordinary topology of weak convergence.

Our first technical result is on the attainability of the optimum in the definition of MOCE.

Proposition 2.1

Assume: (a) Assumption 2.1 holds, (b) there exists $\alpha$ such that $\{x\in{\rm I\!R}:u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\geq\alpha\}$ is a compact set, (c) the support set of $\xi$, denoted by $\Xi=[\xi_{\min},\xi_{\max}]$, is bounded, (d) $u$ is strictly concave over $\Xi$. Then for $P\in{\cal M}^{\phi_{1}}$,

M_{u}(\xi)=\sup_{x\in[\xi_{\min}/2,\;\xi_{\max}/2]}\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\}.

(2.5)

Moreover, if $\{P_{N}\}\subset\mathscr{P}(\Xi)$ and $P_{N}$ converges weakly to $\delta_{\hat{\xi}}$ , the Dirac probability measure at $\hat{\xi}$ , then $M_{u}(\xi_{N})$ converges to $2u(\hat{\xi}/2)$ .

Proof. Since $u$ is a strictly concave function, (1.3) is a convex optimization problem. Condition (b) ensures existence of an optimal solution, denoted by $x^{*}$ . Following a similar analysis to the proof of [7, Lemma 2.1], we can write down the first order optimality condition of the program at $x^{*}$ ,

0\in\partial u(x^{*})+\partial{\mathbb{E}}_{P}[u(\xi-x^{*})],

(2.6)

where $\partial u$ denotes the convex subdifferential [40]. Note that $\partial u(x)=[u_{+}^{\prime}(x),u_{-}^{\prime}(x)]$ for any $x\in{\rm I\!R}$, where $u^{\prime}_{-},u^{\prime}_{+}$ denote the left and right derivatives of $u$ at $x$, and

\partial{\mathbb{E}}_{P}[u(\xi-x^{*})]=-{\mathbb{E}}_{P}[\partial u(\xi-x^{*})],

where the expectation/integration on the right-hand side is in the sense of Aumann [4]. Consequently we can rewrite (2.6) as

\begin{aligned}
0 &\in [u_{+}^{\prime}(x^{*}),u_{-}^{\prime}(x^{*})]-{\mathbb{E}}_{P}\left[[u_{+}^{\prime}(\xi-x^{*}),u_{-}^{\prime}(\xi-x^{*})]\right]\\
&= [u_{+}^{\prime}(x^{*}),u_{-}^{\prime}(x^{*})]-\left[{\mathbb{E}}_{P}[u_{+}^{\prime}(\xi-x^{*})],{\mathbb{E}}_{P}[u_{-}^{\prime}(\xi-x^{*})]\right],
\end{aligned}

(2.7)

which yields

u^{\prime}_{+}(x^{*})-{\mathbb{E}}_{P}[u^{\prime}_{-}(\xi-x^{*})]\leq 0\leq u^{\prime}_{-}(x^{*})-{\mathbb{E}}_{P}[u^{\prime}_{+}(\xi-x^{*})].

Since $u^{\prime}_{-}$ and $u^{\prime}_{+}$ are non-increasing, the inequality above implies

u^{\prime}_{+}(x^{*})\leq{\mathbb{E}}_{P}[u^{\prime}_{-}(\xi-x^{*})]\leq u^{\prime}_{-}(\xi_{\min}-x^{*})

(2.8)

and

u^{\prime}_{-}(x^{*})\geq{\mathbb{E}}_{P}[u^{\prime}_{+}(\xi-x^{*})]\geq u^{\prime}_{+}(\xi_{\max}-x^{*}).

(2.9)

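Moreover, since $u$ is strictly concave, $u_{+}^{\prime}(t^{\prime})>u_{-}^{\prime}(t^{\prime\prime})$ for any $t^{\prime}<t^{\prime\prime}$. Hence $x^{*}<\xi_{\min}-x^{*}$ would contradict (2.8) and $\xi_{\max}-x^{*}<x^{*}$ would contradict (2.9), so inequalities (2.8)-(2.9) imply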

x^{*}\geq\xi_{\min}-x^{*}\quad{\rm and}\quad x^{*}\leq\xi_{\max}-x^{*},

and hence (2.5).

The second part of the claim follows directly from the first part in that the interval $[\xi_{\min}^{N}/2,\xi_{\max}^{N}/2]$ converges to a single point $\hat{\xi}/2$ and $\tau_{\phi_{1}}$-convergence coincides with weak convergence because of the restriction of the range of $\xi$ to a compact subset of ${\rm I\!R}$.

Ben-Tal and Teboulle [7] derive a result similar to the first part of the proposition for the optimized certainty equivalent and demonstrate that $S_{u}(\xi)\in[\xi_{\min},\xi_{\max}]$ under the conditions that $u$ is concave and $1\in\partial u(0)$, rather than strictly concave. Strict concavity is needed to ensure that the optimum in (2.5) is achieved in $[\xi_{\min}/2,\xi_{\max}/2]$. Otherwise a counterexample can be found; see Example 2.3.
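Both parts of Proposition 2.1 can be illustrated numerically. The sketch below assumes a particular strictly concave increasing utility, $u(t)=t-\sqrt{1+t^{2}}$, a hypothetical three-point distribution and a grid search over a deliberately wide range of $x$; these are illustrative choices only.

```python
import numpy as np

def u(t):
    # Strictly concave, increasing utility chosen for illustration (an assumption).
    return t - np.sqrt(1.0 + t * t)

def moce_on_grid(support, probs, grid):
    # Evaluate u(x) + E[u(xi - x)] on the grid and return (max value, maximizer).
    obj = u(grid) + (u(support[None, :] - grid[:, None]) * probs[None, :]).sum(axis=1)
    k = np.argmax(obj)
    return obj[k], grid[k]

grid = np.linspace(-5.0, 10.0, 15001)      # much wider than [xi_min/2, xi_max/2]

# Part 1: the maximizer should fall in [xi_min/2, xi_max/2] = [0.5, 2.0].
support = np.array([1.0, 2.0, 4.0])
probs = np.array([1.0, 1.0, 1.0]) / 3.0
value, x_star = moce_on_grid(support, probs, grid)

# Part 2: for a Dirac measure at xi_hat = 3 the value should equal 2 * u(xi_hat / 2).
value_dirac, x_dirac = moce_on_grid(np.array([3.0]), np.array([1.0]), grid)
print(x_star, value_dirac, 2.0 * u(1.5))
```

The search range is intentionally not restricted to the interval in (2.5), so that the location of the maximizer is an outcome of the computation rather than an input.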

Like the optimized certainty equivalent, the newly defined modified optimized certainty equivalent enjoys a number of properties as stated in the next proposition.

Proposition 2.2 (Properties of MOCE)

Let $u:{\rm I\!R}\rightarrow(-\infty,+\infty)$ be a closed proper function. Under Assumption 2.1, the following assertions hold.

(i) $M_{u}$ is law invariant.

(ii) (Monotonicity) For any $\xi_{1}\leq\xi_{2}\in L_{p}(\Omega,{\cal F},\mathbb{P})$ with respective distributions (push-forward probabilities) $P_{1},P_{2}\in{\cal M}^{\phi_{1}}$, $M_{u}(\xi_{1})\leq M_{u}(\xi_{2})$.

(iii) (Risk aversion) If $u(t)\leq t$ for all $t\in{\rm I\!R}$, then $M_{u}(\xi)\leq{\mathbb{E}}_{P}[\xi]$ for any random variable $\xi$.

(iv) (Second-order stochastic dominance) Let $\xi_{1},\xi_{2}$ be random variables with compact support. Then for any concave utility function $u$,

$M_{u}(\xi_{1})\geq M_{u}(\xi_{2})\Longleftrightarrow C_{u}(\xi_{1})\geq C_{u}(\xi_{2}),$

where $C_{u}(\xi):=u^{-1}({\mathbb{E}}_{P}[u(\xi)])$ is the classical certainty equivalent.

(v) (Concavity and positive subhomogeneity) If $u$ is concave, then $M_{u}(\cdot)$ is also concave. Moreover, if $u(0)\geq 0$, then

M_{u}(\delta\xi)\leq\delta M_{u}(\xi),\;\forall\delta\in[1,\infty)\quad{\text{and}}\quad M_{u}(\delta\xi)\geq\delta M_{u}(\xi),\;\forall\delta\in[0,1].

(2.10)

Proof. Parts (i)-(iii) follow straightforwardly from the definitions. We prove the rest.

Part (iv). “$\Longleftarrow$”. By the definition of the certainty equivalent, $C_{u}(\xi_{1})\geq C_{u}(\xi_{2})$ implies ${\mathbb{E}}_{P}[u(\xi_{1})]\geq{\mathbb{E}}_{P}[u(\xi_{2})]$ for all concave utility functions. The latter implies that $\xi_{1}$ dominates $\xi_{2}$ in second order, which in turn guarantees that $\xi_{1}-x$ dominates $\xi_{2}-x$ in second order for any fixed $x\in{\rm I\!R}$. Consequently ${\mathbb{E}}_{P}[u(\xi_{1}-x)]\geq{\mathbb{E}}_{P}[u(\xi_{2}-x)]$ for any $x\in{\rm I\!R}$. Adding $u(x)$ to both sides of the inequality and taking the supremum over $x$, we obtain $M_{u}(\xi_{1})\geq M_{u}(\xi_{2})$.

“$\Longrightarrow$”. Let $x_{1},x_{2}$ be the points where the suprema in $M_{u}(\xi_{1})$ and $M_{u}(\xi_{2})$ are attained. Then

\begin{aligned}
M_{u}(\xi_{1})=u(x_{1})+{\mathbb{E}}_{P}[u(\xi_{1}-x_{1})] &\geq M_{u}(\xi_{2})=u(x_{2})+{\mathbb{E}}_{P}[u(\xi_{2}-x_{2})]\\
&\geq u(x_{1})+{\mathbb{E}}_{P}[u(\xi_{2}-x_{1})],
\end{aligned}

which yields ${\mathbb{E}}_{P}[u(\xi_{1}-x_{1})]\geq{\mathbb{E}}_{P}[u(\xi_{2}-x_{1})]$ . The latter implies ${\mathbb{E}}_{P}[u(\xi_{1})]\geq{\mathbb{E}}_{P}[u(\xi_{2})]$ .

Part (v). First we prove the concavity of $M_{u}$ , i.e. for $\lambda\in(0,1)$ and any random variables $\xi_{1}$ , $\xi_{2}$ ,

M_{u}(\lambda\xi_{1}+(1-\lambda)\xi_{2})\geq\lambda M_{u}(\xi_{1})+(1-\lambda)M_{u}(\xi_{2}).

Since $u$ is concave, the function $f(z,x):=u(x)+u(z-x)$ is jointly concave over ${\rm I\!R}\times{\rm I\!R}$. Therefore, for any $x_{1},x_{2}\in{\rm I\!R}$, with $x_{\lambda}:=\lambda x_{1}+(1-\lambda)x_{2}$ and $\xi_{\lambda}:=\lambda\xi_{1}+(1-\lambda)\xi_{2}$, one has

{\mathbb{E}}_{P}[f(\xi_{\lambda},x_{\lambda})]\geq\lambda{\mathbb{E}}_{P}[f(\xi_{1},x_{1})]+(1-\lambda){\mathbb{E}}_{P}[f(\xi_{2},x_{2})].

Since $M_{u}(\xi_{\lambda})=M_{u}(\lambda\xi_{1}+(1-\lambda)\xi_{2})=\sup_{x\in{\rm I\!R}}{\mathbb{E}}_{P}[f(\xi_{\lambda},x)]$, it follows that

M_{u}(\xi_{\lambda})\geq\sup_{x_{1},x_{2}}\left\{\lambda{\mathbb{E}}_{P}[f(\xi_{1},x_{1})]+(1-\lambda){\mathbb{E}}_{P}[f(\xi_{2},x_{2})]\right\}=\lambda M_{u}(\xi_{1})+(1-\lambda)M_{u}(\xi_{2}).

Next, we prove the subhomogeneity of $M_{u}$. Let $s(\delta):=\frac{1}{\delta}M_{u}(\delta\xi)$ for $\delta>0$. Then

s(\delta)=\sup_{x\in{\rm I\!R}}\left\{\frac{1}{\delta}u(\delta x)+{\mathbb{E}}_{P}\left[\frac{1}{\delta}u(\delta(\xi-x))\right]\right\}.

(2.11)

Let $\delta_{2}>\delta_{1}>0$ . By the concavity of $u$ ,

\frac{u(\delta_{2}t)-u(0)}{\delta_{2}-0}\leq\frac{u(\delta_{1}t)-u(0)}{\delta_{1}-0}.

Since $u(0)\geq 0$ , the above inequality implies

\frac{1}{\delta_{2}}u(\delta_{2}t)\leq\frac{1}{\delta_{1}}u(\delta_{1}t),\;\forall t\in{\rm I\!R}.

(2.12)

Inequality (2.12) also implies

{\mathbb{E}}_{P}\left[\frac{1}{\delta_{2}}u(\delta_{2}(\xi-x))\right]\leq{\mathbb{E}}_{P}\left[\frac{1}{\delta_{1}}u(\delta_{1}(\xi-x))\right].

A combination of the two inequalities implies that the objective function in (2.11) is non-increasing in $\delta$, and hence so is $s(\delta)$. Setting $\delta_{1}=1$ and $\delta_{2}=1$ in turn in the resulting inequality $s(\delta_{2})\leq s(\delta_{1})$, we obtain (2.10).
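The concavity and subhomogeneity inequalities in part (v) can be spot-checked on a toy instance. This is a minimal sketch assuming the exponential utility $u(t)=1-e^{-t}$ (so $u(0)=0$), two hypothetical incomes defined on the same three equally likely scenarios, and a grid approximation of the supremum; a spot check of this kind illustrates the statement but does not prove it.

```python
import numpy as np

def u(t):
    return 1.0 - np.exp(-t)      # concave with u(0) = 0

def moce(samples, grid=np.linspace(-10.0, 10.0, 20001)):
    # Grid approximation of sup_x { u(x) + E[u(xi - x)] } for equally likely samples.
    xi = np.asarray(samples, dtype=float)
    obj = u(grid) + u(xi[None, :] - grid[:, None]).mean(axis=1)
    return obj.max()

xi1 = np.array([0.5, 1.0, 2.0])   # outcomes listed scenario by scenario
xi2 = np.array([3.0, 0.2, 1.5])   # second income on the same scenarios
lam = 0.3

m1, m2 = moce(xi1), moce(xi2)
m_mix = moce(lam * xi1 + (1.0 - lam) * xi2)
m_double, m_half = moce(2.0 * xi1), moce(0.5 * xi1)

print("concavity      :", m_mix, ">=", lam * m1 + (1.0 - lam) * m2)
print("delta = 2      :", m_double, "<=", 2.0 * m1)
print("delta = 1/2    :", m_half, ">=", 0.5 * m1)
```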

Next, we discuss how the utility function $u$ may be recovered from a given modified certainty equivalent $M_{u}(\xi)$ , which is an important property enjoyed by the OCE. Let

\xi_{p}=\left\{\begin{array}[]{ll}z&\text{with probability}\;p,\\ 0&\text{with probability}\;1-p,\end{array}\right.

where $0<p<1$ and $z>0$ . For a concave utility function $u$ , the modified optimized certainty equivalent $M_{u}(\xi_{p})$ can be written as

M_{u}[z,p]:=\sup_{0\leq x\leq z/2}\{u(x)+pu(z-x)+(1-p)u(-x)\}.

(2.13)

Proposition 2.3

If $u$ is a strongly risk averse utility function, i.e., $u(t)<t$ for all $t\neq 0$, and $u(0)=0$, then $\lim_{p\rightarrow 0^{+}}\frac{M_{u}[z,p]}{p}=u(z).$

Proof. Observe that $x^{*}=0$ is the optimal solution of problem (2.13) if and only if

\begin{aligned}
u(x)+pu(z-x)+(1-p)u(-x) &\leq u(0)+pu(z-0)+(1-p)u(-0)\\
&= pu(z),\quad\forall x\in[0,z/2].
\end{aligned}

(2.14)

The inequality above can be equivalently written as

p[u(z-x)-u(-x)-u(z)]\leq-u(-x)-u(x),\forall x\in[0,z/2].

(2.15)

Since $u$ is strongly risk averse, that is, $u(t)<t$ for all $t\neq 0$, we have $-u(-x)-u(x)>0$ for $x\neq 0$ and hence inequality (2.15) holds for $p$ sufficiently small. This in turn shows that inequality (2.14) holds and hence $x^{*}=0$ is the optimal solution of problem (2.13) for all $p$ sufficiently small. Thus we have $M_{u}[z,p]=pu(z)$ and the conclusion follows.

Example 2.1

We give a few examples which illustrate how MOCE can be calculated in closed form and how it differs from OCE. Let $M_{u}(\xi)=2(1-\left({\mathbb{E}}_{P}[e^{-\xi}]\right)^{1/2})$. Then $M_{u}[z,p]=2-2(pe^{-z}+(1-p))^{1/2}$ and

\displaystyle u(z)=\lim_{p\rightarrow 0^{+}}\frac{M_{u}[z,p]}{p}=\lim_{p\rightarrow 0^{+}}\frac{2-2(pe^{-z}+1-p)^{1/2}}{p}=\lim_{p\rightarrow 0^{+}}\frac{1-e^{-z}}{(pe^{-z}+1-p)^{1/2}}.

Here the last equality follows from L’Hôpital’s rule. Hence, the recovered utility function is $u(z)=1-e^{-z}$.
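The limit in Proposition 2.3 can also be observed numerically from the definition (2.13), without using the closed form. The sketch below assumes the exponential utility $u(t)=1-e^{-t}$, the value $z=1$, three hypothetical probabilities and a grid over $[0,z/2]$; it only shows the ratio $M_{u}[z,p]/p$ approaching $u(z)$ for this instance.

```python
import numpy as np

def u(t):
    return 1.0 - np.exp(-t)

def moce_two_point(z, p, n_grid=5001):
    # Grid approximation of (2.13): sup over x in [0, z/2] of
    # u(x) + p * u(z - x) + (1 - p) * u(-x).
    x = np.linspace(0.0, z / 2.0, n_grid)
    return np.max(u(x) + p * u(z - x) + (1.0 - p) * u(-x))

z = 1.0
ps = [1e-1, 1e-2, 1e-3]                       # hypothetical probabilities tending to 0
ratios = [moce_two_point(z, p) / p for p in ps]
errors = [abs(r - u(z)) for r in ratios]

for p, r, e in zip(ps, ratios, errors):
    print(f"p={p:g}  M/p={r:.6f}  |M/p - u(z)|={e:.2e}")
```

In this instance the gap between the ratio and $u(z)$ shrinks roughly in proportion to $p$.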

Example 2.2 (Exponential Utility Function)

Let $u(t)=1-e^{-t}$ , $t\in{\rm I\!R}$ . It is easy to derive that the optimal solution of problem (1.3) is $x^{*}=-\frac{1}{2}\ln{\mathbb{E}}_{P}\left[e^{-\xi}\right]$ and the modified optimized certainty equivalent is $M_{u}(\xi)=2\left(1-\left({\mathbb{E}}_{P}[e^{-\xi}]\right)^{1/2}\right).$ On the other hand, it follows from [7] that $S_{u}(\xi)=-\ln{\mathbb{E}}_{P}[e^{-\xi}].$ Since $u(t)\leq t$ for all $t\in{\rm I\!R}$ , then we can deduce from the definitions that $M_{u}(\xi)\leq S_{u}(\xi)$ . Indeed the strict inequality holds in that $u(t)=t$ only at $t=0$ .
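The closed forms in Example 2.2 can be compared against a direct grid search. The sketch below assumes a hypothetical three-point income distribution and a uniform grid of consumption levels; the variable names are illustrative.

```python
import numpy as np

xi = np.array([0.5, 1.0, 2.0])             # hypothetical equally likely incomes
grid = np.linspace(-5.0, 5.0, 100001)      # candidate consumption levels, step 1e-4

def u(t):
    return 1.0 - np.exp(-t)

expected_u = u(xi[None, :] - grid[:, None]).mean(axis=1)   # E[u(xi - x)] on the grid
moce_obj = u(grid) + expected_u                            # MOCE objective (1.3)
oce_obj = grid + expected_u                                # OCE objective (1.1)

m = np.mean(np.exp(-xi))                   # E[exp(-xi)]
x_closed = -0.5 * np.log(m)                # closed-form MOCE maximizer
moce_closed = 2.0 * (1.0 - np.sqrt(m))     # closed-form MOCE value
oce_closed = -np.log(m)                    # closed-form OCE value

x_grid = grid[np.argmax(moce_obj)]
print("x*   :", x_grid, x_closed)
print("MOCE :", moce_obj.max(), moce_closed)
print("OCE  :", oce_obj.max(), oce_closed)
```

The grid values should match the closed forms up to discretization error, and the MOCE value should lie strictly below the OCE value.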

Example 2.3 (Piecewise Linear Utility Function)

Let

u(t)=\left\{\begin{array}[]{ll}\gamma_{2}t&\text{if}\;\;t\leq 0,\\ \gamma_{1}t&\text{if}\;\;t>0,\end{array}\right.

where $0\leq\gamma_{1}<1\leq\gamma_{2}$ . Then the utility function $u$ can be written as $u(t)=\gamma_{1}(t)_{+}-\gamma_{2}(-t)_{+}$ and the modified optimized certainty equivalent is

M_{u}(\xi)=\sup_{x\in{\rm I\!R}}\{\gamma_{1}(x)_{+}-\gamma_{2}(-x)_{+}-\gamma_{2}{\mathbb{E}}_{P}[(x-\xi)_{+}]+\gamma_{1}{\mathbb{E}}_{P}[(\xi-x)_{+}]\}.

(2.16)

Comparing this with the optimized certainty equivalent (see [7])

S_{u}(\xi)=\sup_{x\in{\rm I\!R}}\{x-\gamma_{2}{\mathbb{E}}_{P}[(x-\xi)_{+}]+\gamma_{1}{\mathbb{E}}_{P}[(\xi-x)_{+}]\},

we can also conclude that $M_{u}(\xi)\leq S_{u}(\xi)$ because $u(t)\leq t$ for all $t\in{\rm I\!R}$ .

It is instructive to see where the optimum in (2.16) is achieved. We consider the case where $P$ is the Dirac distribution at a point $t>0$, that is, $[\xi_{\min},\xi_{\max}]=\{t\}$. Consequently

\displaystyle u(x)+{\mathbb{E}}_{P}[u(\xi-x)]=\left\{\begin{array}[]{ll}(\gamma_{2}-\gamma_{1})x+t\gamma_{1}&\;{\rm if}\;x\leq 0,\\ t\gamma_{1}&\;{\rm if}\;0<x\leq t,\\ (\gamma_{1}-\gamma_{2})x+t\gamma_{2}&\;{\rm if}\;x\geq t.\end{array}\right.

(2.20)

The set of optimal solutions is $[0,t]$, which is not contained in $[\xi_{\min}/2,\xi_{\max}/2]=\{t/2\}$. This shows that, without strict concavity of $u$, the optimal solutions need not be confined to the interval in (2.5).
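The flat region of maximizers in Example 2.3 is easy to reproduce. The following sketch assumes the hypothetical parameter values $\gamma_{1}=0.5$, $\gamma_{2}=2$ and $t=1$, and identifies maximizers on a grid up to a small numerical tolerance.

```python
import numpy as np

gamma1, gamma2, t = 0.5, 2.0, 1.0          # hypothetical values with 0 <= gamma1 < 1 <= gamma2

def u(s):
    # u(s) = gamma1 * (s)_+ - gamma2 * (-s)_+
    return gamma1 * np.maximum(s, 0.0) - gamma2 * np.maximum(-s, 0.0)

grid = np.linspace(-1.0, 2.0, 3001)        # candidate consumption levels, step 1e-3
obj = u(grid) + u(t - grid)                # u(x) + E[u(xi - x)] for the Dirac measure at t

# Grid points whose objective value equals the maximum up to rounding error.
maximizers = grid[obj >= obj.max() - 1e-12]
print(maximizers.min(), maximizers.max(), obj.max())
```

The maximizers fill the whole interval $[0,t]$ rather than the single point $t/2$, while the optimal value equals $t\gamma_{1}$.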

We now move on to discuss the properties of the robust modified optimized certainty equivalent.

Proposition 2.4 (Properties of RMOCE)

Let $u:{\rm I\!R}\rightarrow[-\infty,+\infty)$ be a closed proper function. Under Assumption 2.1, the following assertions hold.

(i) $R(\xi)$ is law invariant.

(ii) (Monotonicity) For any $\xi_{1}\leq\xi_{2}\in L_{p}(\Omega,{\cal F},\mathbb{P})$ with respective distributions (push-forward probabilities) $P_{1},P_{2}\in{\cal M}^{\phi_{2}}$, $R(\xi_{1})\leq R(\xi_{2})$.

(iii) (Risk aversion) If $u(t)\leq t$ for all $t\in{\rm I\!R}$ and $u\in\mathscr{U}$, then $R(\xi)\leq{\mathbb{E}}_{P}[\xi]$ for any random variable $\xi$.

(iv) (Second-order stochastic dominance) Let $\xi_{1},\xi_{2}$ be random variables with compact support. Then for any concave utility function $u$,

$C_{u}(\xi_{1})\geq C_{u}(\xi_{2})\Longrightarrow R(\xi_{1})\geq R(\xi_{2}),$

where $C_{u}(\xi)$ is the classical certainty equivalent.

(v) (Concavity and positive subhomogeneity) If $u$ is concave, then $R(\cdot)$ is also concave. Moreover, if $u(0)\geq 0$, then

R(\delta\xi)\leq\delta R(\xi),\;\forall\delta\in[1,\infty)\quad{\text{and}}\quad R(\delta\xi)\geq\delta R(\xi),\;\forall\delta\in[0,1].

(2.21)

Proof. Parts (i)-(iii) are obvious.

Part (iv). Following an argument similar to the proof of part (iv) of Proposition 2.2, we can show that $C_{u}(\xi_{1})\geq C_{u}(\xi_{2})$ implies ${\mathbb{E}}_{P}[u(\xi_{1})]\geq{\mathbb{E}}_{P}[u(\xi_{2})]$ and ${\mathbb{E}}_{P}[u(\xi_{1}-x)]\geq{\mathbb{E}}_{P}[u(\xi_{2}-x)]$ for any fixed $x\in{\rm I\!R}$ and hence

u(x)+{\mathbb{E}}_{P}[u(\xi_{1}-x)]\geq u(x)+{\mathbb{E}}_{P}[u(\xi_{2}-x)].

Taking infimum on both sides w.r.t. $u$ over ${\cal U}$ and then supremum w.r.t. $x$ , we obtain $R(\xi_{1})\geq R(\xi_{2})$ .

Part (v). Let $g_{u}(\delta,x):=\frac{1}{\delta}u(\delta x)+{\mathbb{E}}_{P}\left[\frac{1}{\delta}u(\delta(\xi-x))\right].$ We can show as in the proof of Proposition 2.2 (v) that $g_{u}(\cdot,x)$ is non-increasing over $(0,\infty)$. This property is preserved after taking the infimum in $u$ over ${\mathcal{U}}$ and then the supremum in $x$ over ${\rm I\!R}$.

Before concluding this section, we remark that it is possible to use a different utility function $v$ for the present consumption $x$ , i.e.,

{\rm(RMOCE^{\prime})}\quad\quad\displaystyle{M_{u,v}(\xi):=\sup_{x\in{\rm I\!R}}\;\{v(x)+{\mathbb{E}}_{P}[u(\xi-x)]\}}.

(2.22)

In that case, some of the properties of MOCE may be retained. For example, law invariance, monotonicity, risk aversion, concavity, positive subhomogeneity and second-order stochastic dominance are all satisfied when $v$ enjoys the same property as $u$. Proposition 2.3 also holds when $v$ satisfies the same property as $u$. However, the change will have an effect on Proposition 2.1, in which case it will be difficult to estimate the interval containing the optimal solution.

3 Computation of RMOCE

Having investigated the properties of MOCE and RMOCE in the previous section, we now discuss numerical schemes for computing RMOCE. To this end, we need a concrete structure for the ambiguity set. As reviewed in the introduction, various approaches have been proposed in the preference robust optimization literature for constructing an ambiguity set of utility functions, depending on the availability of information. Here we consider a situation where the decision maker has a nominal utility function obtained from empirical data or subjective judgement but lacks complete information to determine whether it is the true utility function, that is, the one which captures the decision maker’s preference precisely. Consequently we may construct a ball of utility functions centered at the nominal utility function under an appropriate metric. Here we concentrate on the Kantorovich metric.

3.1 Kantorovich ball of piecewise linear utility functions

We begin by considering a ball of utility functions centered at a piecewise linear utility function under the Kantorovich metric. In practice, a decision maker’s utility preferences are often elicited through questionnaires. For example, a customer’s utility preference may be elicited via the customer’s willingness to pay at certain price points [43, 32]. From a computational point of view, piecewise linear utility functions may bring significant convenience to the calculation of OCE, see Nouiehed et al. [36].

Let $t_{1}<\cdots<t_{N}$ be an ordered sequence of points in $[a,b]$ and $T:=\{t_{1},\cdots,t_{N}\}$ with $t_{1}=a$ and $t_{N}=b$. Let $\mathscr{U}_{N}$ be the class of continuous, non-decreasing, concave, piecewise linear functions defined over $[a,b]$ with kinks in $T$ which are Lipschitz continuous with modulus $L$ and satisfy the normalization conditions $u(a)=0$ and $u(b)=1$. For $u_{N}^{0}\in\mathscr{U}_{N}$, we consider a ball in $\mathscr{U}_{N}$ under the Kantorovich metric

\mathbb{B}_{K}(u_{N}^{0},r)=\left\{u_{N}\in\mathscr{U}_{N}|\mathsf{d\kern-0.70007ptl}_{K}(u_{N},u_{N}^{0})\leq r\right\},

(3.23)

where the subscript $K$ represents the Kantorovich metric and

\displaystyle{\mathsf{d\kern-0.70007ptl}_{K}(u,v):=\sup_{g\in\mathscr{G}_{K}}|\langle g,u\rangle-\langle g,v\rangle|=\sup_{g\in\mathscr{G}_{K}}\left\{\int_{{\rm I\!R}}g(t)du(t)-\int_{{\rm I\!R}}g(t)dv(t)\right\}}

(3.24)

and

\mathscr{G}_{K}:=\{g:{\rm I\!R}\rightarrow{\rm I\!R}\mid|g(t)-g(t^{\prime})|\leq|t-t^{\prime}|,\forall t,t^{\prime}\in{\rm I\!R}\}.

(3.25)

Note that piecewise linear utility functions are used to approximate general utility functions in the utility preference robust optimization model [18]. The difference is that here we use the Kantorovich ball to construct the ambiguity set of DM’s utility function whereas the authors use pairwise comparison approach to elicit the DM’s utility preferences in [18]. The next proposition states that $\mathsf{d\kern-0.70007ptl}_{K}(u_{N},u_{N}^{0})$ may be computed by solving a linear program.

Proposition 3.1

The Kantorovich distance $\mathsf{d\kern-0.70007ptl}_{K}(u_{N},u_{N}^{0})$ is the optimal value of the following linear program:


\begin{aligned}
\max_{\begin{subarray}{c}y_{1},\cdots,y_{N-1}\\ z_{1},\cdots,z_{N}\end{subarray}}\quad & \sum_{j=2}^{N}(\beta_{j-1}-\beta_{j-1}^{0})y_{j-1} \\
{\rm s.t.}\quad & y_{j-1}\leq z_{j-1}(t_{j}-t_{j-1})+\frac{1}{2}(t_{j}-t_{j-1})^{2},\quad j=2,\cdots,N, \\
& -y_{j-1}\leq-z_{j-1}(t_{j}-t_{j-1})+\frac{1}{2}(t_{j}-t_{j-1})^{2},\quad j=2,\cdots,N, \\
& y_{j-1}\leq z_{j}(t_{j}-t_{j-1})+\frac{1}{2}(t_{j}-t_{j-1})^{2},\quad j=2,\cdots,N, \\
& -y_{j-1}\leq-z_{j}(t_{j}-t_{j-1})+\frac{1}{2}(t_{j}-t_{j-1})^{2},\quad j=2,\cdots,N.
\end{aligned}

(3.26)

Proof. Let $g\in\mathscr{G}_{K}$ . By definition,

\int_{a}^{b}g(t)du_{N}(t)=\sum_{j=2}^{N}\beta_{j-1}\int_{t_{j-1}}^{t_{j}}g(t)dt,

where $\beta_{j-1}$ denotes the slope of $u_{N}$ on the interval $[t_{j-1},t_{j}]$. Since $-g\in\mathscr{G}_{K}$ for each $g\in\mathscr{G}_{K}$,

\mathsf{d\kern-0.70007ptl}_{K}(u_{N},u_{N}^{0})=\sup_{g\in\mathscr{G}_{K}}\sum_{j=2}^{N}(\beta_{j-1}-\beta_{j-1}^{0})\int_{t_{j-1}}^{t_{j}}g(t)dt,

where $\beta_{j-1}^{0}$ denotes the slope of $u_{N}^{0}$ on the interval $[t_{j-1},t_{j}]$. Note that in this formulation, $\mathsf{d\kern-0.70007ptl}_{K}(u_{N},u^{0}_{N})$ depends on the slopes of $u_{N},u_{N}^{0}$ rather than their function values. Let $y_{j-1}:=\int_{t_{j-1}}^{t_{j}}g(t)dt$ and $z_{j}:=g(t_{j})$. Since $|g(t)-g(t_{j-1})|\leq t-t_{j-1}$ for all $t\in[t_{j-1},t_{j}]$, we have

z_{j-1}(t_{j}-t_{j-1})-\frac{1}{2}(t_{j}-t_{j-1})^{2}\leq y_{j-1}\leq z_{j-1}(t_{j}-t_{j-1})+\frac{1}{2}(t_{j}-t_{j-1})^{2}

for $j=2,\cdots,N$ . Likewise, since $|g(t)-g(t_{j})|\leq t_{j}-t$ for all $t\in[t_{j-1},t_{j}]$ , we have

z_{j}(t_{j}-t_{j-1})-\frac{1}{2}(t_{j}-t_{j-1})^{2}\leq y_{j-1}\leq z_{j}(t_{j}-t_{j-1})+\frac{1}{2}(t_{j}-t_{j-1})^{2}

for $j=2,\cdots,N$ . To complete the proof, it suffices to show that conditions

|g(t)-g(t_{j-1})|\leq t-t_{j-1}\;\;\text{and}\;\;|g(t)-g(t_{j})|\leq t_{j}-t,\forall t\in[t_{j-1},t_{j}]

(3.27)

are adequate to cover the generic condition

|g(t^{\prime})-g(t^{\prime\prime})|\leq|t^{\prime}-t^{\prime\prime}|,\forall t^{\prime},t^{\prime\prime}\in[a,b].

(3.28)

We consider two cases.

Case 1. $t^{\prime},t^{\prime\prime}\in[t_{i-1},t_{i}]$ for some $i$. In this case, the generic condition is adequately covered by $|g(t)-g(t_{i-1})|\leq t-t_{i-1}$ for all $t\in[t_{i-1},t_{i}]$, because the objective depends on $g$ only through the integrals $\int_{t_{j-1}}^{t_{j}}g(t)dt$.

Case 2. $t^{\prime},t^{\prime\prime}$ lie in two intervals, i.e., $t^{\prime}\in[t_{i-1},t_{i}]$ and $t^{\prime\prime}\in[t_{j-1},t_{j}]$ , where $i<j$ . Then by (3.27),

\begin{aligned}
|g(t^{\prime})-g(t^{\prime\prime})| &\leq |g(t^{\prime})-g(t_{i})|+|g(t_{i})-g(t_{i+1})|+\cdots+|g(t_{j-2})-g(t_{j-1})|+|g(t_{j-1})-g(t^{\prime\prime})| \\
&\leq t_{i}-t^{\prime}+t_{i+1}-t_{i}+\cdots+t_{j-1}-t_{j-2}+t^{\prime\prime}-t_{j-1}=t^{\prime\prime}-t^{\prime}.
\end{aligned}

The proof is complete.
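For a quick numerical cross-check of a distance value, one can also use the classical one-dimensional identity for the Kantorovich distance: when $u$ and $v$ are continuous with $u(a)=v(a)$ and $u(b)=v(b)$, integration by parts gives $\int_{a}^{b}g\,d(u-v)=-\int_{a}^{b}g^{\prime}(t)(u(t)-v(t))dt$, and taking the supremum over $|g^{\prime}|\leq 1$ yields $\int_{a}^{b}|u(t)-v(t)|dt$. The sketch below evaluates this integral exactly for two piecewise linear functions on a shared grid. It is not the linear program (3.26); the function name and the sample data are hypothetical.

```python
def kantorovich_pl(t, u, v):
    """Integral of |u - v| over [t[0], t[-1]] for two piecewise linear
    functions given by their values u[j], v[j] at shared breakpoints t[j].
    Assumes u and v agree at both endpoints (normalization)."""
    total = 0.0
    for j in range(1, len(t)):
        width = t[j] - t[j - 1]
        d0 = u[j - 1] - v[j - 1]  # difference at the left breakpoint
        d1 = u[j] - v[j]          # difference at the right breakpoint
        if d0 * d1 >= 0.0:
            # no sign change: area of a trapezoid
            total += 0.5 * width * (abs(d0) + abs(d1))
        else:
            # sign change inside the interval: two triangles
            total += 0.5 * width * (d0 * d0 + d1 * d1) / (abs(d0) + abs(d1))
    return total


# Hypothetical data: both functions are concave with u(a)=0 and u(b)=1.
dist_simple = kantorovich_pl([0.0, 0.5, 1.0], [0.0, 0.5, 1.0], [0.0, 0.75, 1.0])
dist_cross = kantorovich_pl(
    [0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0],
    [0.0, 0.6, 0.8, 1.0],
    [0.0, 0.5, 0.9, 1.0],
)
print(dist_simple, dist_cross)
```

The second example has a sign change of $u-v$ inside the middle interval, which exercises the two-triangle branch.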

3.2 Alternating iterative algorithm for computing RMOCE

We are now ready to discuss how to compute the RMOCE with the ambiguity set of piecewise linear utility functions constructed by the Kantorovich ball. Assume that the probability distribution of random variable $\xi$ is discrete with $P(\xi=\xi_{k})=p_{k}$ for $k=1,...,K$ and $u_{N},u_{N}^{0}\in\mathscr{U}_{N}$ . Then we can rewrite the RMOCE problem (1.4) as

{\rm(RMOCE-PLU)}\quad\;\;R_{N}(\xi):=\displaystyle{\max_{x\in{\rm I\!R}}\min_{u_{N}\in\mathbb{B}_{K}(u^{0}_{N},r)}}\;\;u_{N}(x)+\sum_{k=1}^{K}p_{k}u_{N}(\xi_{k}-x).

(3.29)

Recall that in 2.1, we show that the optimal solutions of MOCE are contained in the interval $[\xi_{\min}/2,\xi_{\max}/2]$ when the utility function is strictly concave. Unfortunately, this result is not applicable to problem (3.34) because $u_{N}^{s}$ is piecewise linear. However, under some fairly moderate conditions, we are able to show that the set of optimal solutions is bounded. The next proposition states this.

Proposition 3.2

Consider MOCE problem (1.3). Let $X^{*}$ denote the set of optimal solutions. Assume: (a) $u$ is a piecewise linear concave function and (b) $u$ has at least two pieces in the interval $[\xi_{\min},\xi_{\max}]$ . Then the following assertions hold.

(i)

$X^{*}$ is a compact and convex set.
(ii)

If $0\in[\xi_{\min},\xi_{\max}]$ , then $X^{*}\subset[\xi_{\min},\xi_{\max}]$ .
(iii)

If $\xi_{\min}\geq 0$ , then $X^{*}\subset[0,\xi_{\max}]$ .
(iv)

If $\xi_{\max}\leq 0$ , then $X^{*}\subset[\xi_{\min},0]$ .

Proof. Part (i). Observe first that $X^{*}$ is a convex set since problem (1.3) is a convex optimization problem. Suppose for the sake of a contradiction that $X^{*}$ is unbounded. Then either $X^{*}$ is a right half line or a left half line. We consider the former. In that case, there exists $x^{*}\in X^{*}$ sufficiently large such that

[\xi_{\min},\xi_{\max}]\subset[\xi_{\min}-x^{*},x^{*}].

(3.30)

By the first order optimality condition

0\in\partial u(x^{*})+\partial{\mathbb{E}}_{P}[u(\xi-x^{*})]=\partial u(x^{*})-{\mathbb{E}}_{P}[\partial u(\xi-x^{*})].

(3.31)

The equality holds because of Clarke regularity, see [9, 11]. Since $u$ is concave and $x^{*}\geq\xi_{\max}-x^{*}$, any subgradient in the set $\partial u(x^{*})$ is less than or equal to any subgradient in $\partial u(\xi-x^{*})$ for all $\xi\in[\xi_{\min},\xi_{\max}]$. This means the optimality condition holds if and only if $x^{*}$ and $\xi_{\min}-x^{*}$ are in the domain of the same linear piece. But this contradicts assumption (b). Using a similar argument, we can also show that $X^{*}$ cannot be a left half line.

Part (ii). Assume for a contradiction that $x^{*}>\xi_{\max}$ . Then inclusion (3.30) holds. Following a similar analysis to that in Part (i), we can show that in this case $x^{*}$ does not satisfy (3.31). If $x^{*}<\xi_{\min}\leq 0$ , then

[\xi_{\min},\xi_{\max}]\subset[x^{*},\xi_{\max}-x^{*}].

(3.32)

Consequently we can show that $x^{*}$ cannot satisfy the optimality condition (3.31).

Part (iii). In this case, we can show that $x^{*}$ cannot be larger than $\xi_{\max}$ because otherwise we would have (3.30) and a contradiction to the optimality condition. Likewise, if $x^{*}<0$, then the inclusion (3.32) would be invoked.

Part (iv) is similar to Part (iii); we omit the details.

Note that if we strengthen the condition on two linear pieces in the interval $[\xi_{\min},\xi_{\max}]$ to the smaller interval $[\xi_{\min}/2,\xi_{\max}/2]$, then we are able to strengthen the conclusions in Parts (ii)-(iv) so that $X^{*}$ is included in $[\xi_{\min}/2,\xi_{\max}/2]$. We leave this to the reader as an exercise.
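To make the bounded search region concrete, the following sketch solves the MOCE problem for a piecewise linear concave $u$ and a discrete $\xi$ by enumeration. On a compact interval the objective $u(x)+\sum_{k}p_{k}u(\xi_{k}-x)$ is piecewise linear in $x$, so a maximizer can be found among the interval endpoints and the breakpoints $t_{j}$ and $\xi_{k}-t_{j}$. The helper names, the grid, the utility values and the distribution are hypothetical, and the search interval $[\xi_{\min},\xi_{\max}]$ is taken from Proposition 3.2 (ii) under its assumptions.

```python
from bisect import bisect_right


def pl_eval(t, vals, s):
    """Evaluate the piecewise linear interpolant through (t[j], vals[j]) at s."""
    if not t[0] <= s <= t[-1]:
        raise ValueError("argument outside [a, b]")
    j = min(max(bisect_right(t, s), 1), len(t) - 1)
    slope = (vals[j] - vals[j - 1]) / (t[j] - t[j - 1])
    return vals[j - 1] + slope * (s - t[j - 1])


def moce_pl(t, vals, xi, p, lo, hi):
    """Maximize u(x) + sum_k p_k u(xi_k - x) over [lo, hi] by checking
    the endpoints and every breakpoint of the objective inside [lo, hi]."""
    cands = {lo, hi}
    cands.update(tj for tj in t if lo <= tj <= hi)
    cands.update(xk - tj for xk in xi for tj in t if lo <= xk - tj <= hi)

    def objective(x):
        return pl_eval(t, vals, x) + sum(
            pk * pl_eval(t, vals, xk - x) for xk, pk in zip(xi, p)
        )

    best = max(sorted(cands), key=objective)
    return best, objective(best)


# Hypothetical instance: [a, b] = [-2, 4], concave u with u(a)=0, u(b)=1,
# and a kink at t = 1 inside [xi_min, xi_max] = [0, 2].
t_grid = [-2.0, 0.0, 1.0, 4.0]
u_vals = [0.0, 0.6, 0.8, 1.0]
xi_vals = [0.0, 1.0, 2.0]
probs = [0.25, 0.5, 0.25]
best_x, best_val = moce_pl(t_grid, u_vals, xi_vals, probs, 0.0, 2.0)
print(best_x, best_val)
```

In this instance all arguments $x$ and $\xi_{k}-x$ stay inside $[a,b]$, so no extension of $u$ beyond its domain is needed.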

Now we propose the alternating iterative algorithm for solving the maximin problem (3.29).

Algorithm 3.1

Step 0. Choose an initial point $x^{0}$ .

Step 1. For $s=1,2,\ldots$, solve

\displaystyle{u^{s}_{N}\in\arg\min_{u_{N}\in\mathbb{B}_{K}(u^{0}_{N},r)}}u_{N}(x^{s-1})+\sum_{k=1}^{K}p_{k}u_{N}(\xi_{k}-x^{s-1})

(3.33)

and

\displaystyle{x^{s}\in\arg\max_{x\in X}u_{N}^{s}(x)+\sum_{k=1}^{K}p_{k}u_{N}^{s}(\xi_{k}-x),}

(3.34)

where $X$ is a compact subset of ${\rm I\!R}$ .

Step 2. Stop when $x^{s+1}=x^{s}$ and $u_{N}^{s+1}=u_{N}^{s}$ .
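The control flow of Steps 0 to 2 can be sketched as follows. This is a simplified illustration under loud assumptions, not the scheme analysed in this section: the Kantorovich ball is replaced by a finite list of candidate utilities (so the inner problem (3.33) is a plain enumeration rather than a linear program), and $X$ is replaced by a finite grid. Because a finite set is not convex, a saddle point need not exist and the iteration may cycle, hence the iteration cap. All names and data are hypothetical.

```python
from bisect import bisect_right


def pl_eval(t, vals, s):
    """Evaluate the piecewise linear interpolant through (t[j], vals[j]) at s."""
    j = min(max(bisect_right(t, s), 1), len(t) - 1)
    slope = (vals[j] - vals[j - 1]) / (t[j] - t[j - 1])
    return vals[j - 1] + slope * (s - t[j - 1])


def alternate(t, candidates, xi, p, x_grid, x0, max_iter=50):
    """Alternate between the worst-case utility at the current x (inner step)
    and the best x for that utility (outer step) until both repeat."""

    def value(vals, x):
        return pl_eval(t, vals, x) + sum(
            pk * pl_eval(t, vals, xk - x) for xk, pk in zip(xi, p)
        )

    x_prev, u_prev = x0, None
    for s in range(1, max_iter + 1):
        # inner step, stand-in for (3.33): index of the minimizing candidate
        u_s = min(range(len(candidates)), key=lambda i: value(candidates[i], x_prev))
        # outer step, stand-in for (3.34): best x on the grid for that candidate
        x_s = max(x_grid, key=lambda x: value(candidates[u_s], x))
        if x_s == x_prev and u_s == u_prev:
            return x_s, u_s, s, True
        x_prev, u_prev = x_s, u_s
    return x_prev, u_prev, max_iter, False


# Hypothetical data: three concave candidates on [-2, 4] with u(-2)=0, u(4)=1.
t_grid = [-2.0, 0.0, 1.0, 4.0]
candidates = [
    [0.0, 0.60, 0.80, 1.0],  # nominal utility
    [0.0, 0.50, 0.70, 1.0],
    [0.0, 0.55, 0.68, 1.0],
]
xi_vals, probs = [0.0, 1.0, 2.0], [0.25, 0.5, 0.25]
x_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
x_star, idx, iters, converged = alternate(t_grid, candidates, xi_vals, probs, x_grid, 0.0)
print(x_star, idx, iters, converged)
```

In a faithful implementation the enumeration in the inner step would be replaced by a linear program over the Kantorovich ball, solved with an LP solver of the reader's choice.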

Note that in (3.34), we restrict $x$ to take values in a convex and compact set $X$, since 3.2 guarantees that the optimal $x^{*}$ is contained in such a set. Another important issue concerning the algorithm is whether a sequence $\{x^{s}\}$ generated by the algorithm converges to an optimal solution of (RMOCE-PLU). The next proposition addresses this.

Proposition 3.3

Algorithm 3.1 either terminates in a finite number of steps with a solution of the (RMOCE-PLU) model or generates a sequence $\{(x^{s},u_{N}^{s})\}$ whose cluster points, if they exist, are optimal solutions of the (RMOCE-PLU) model.

Proof. Let $(x^{*},u^{*})$ be a cluster point of the sequence generated by Algorithm 3.1. Then for all $u\in\mathbb{B}_{K}(u^{0}_{N},r)$ and $x\in X$,

u^{*}(x)+{\mathbb{E}}_{P}[u^{*}(\xi-x)]\leq u^{*}(x^{*})+{\mathbb{E}}_{P}[u^{*}(\xi-x^{*})]\leq u(x^{*})+{\mathbb{E}}_{P}[u(\xi-x^{*})].

(3.35)

For $s=1,2,...$ ,

u^{s+1}(x^{s})+{\mathbb{E}}_{P}[u^{s+1}(\xi-x^{s})]\leq u(x^{s})+{\mathbb{E}}_{P}[u(\xi-x^{s})]

and

u^{s}(x^{s})+{\mathbb{E}}_{P}[u^{s}(\xi-x^{s})]\geq u^{s}(x)+{\mathbb{E}}_{P}[u^{s}(\xi-x)].

(3.36)

If Algorithm 3.1 terminates in finitely many steps, then $x^{s+1}=x^{s}$ and $u^{s+1}=u^{s}$ for some $s$ and $(x^{s},u^{s})$ satisfies (3.35). In what follows we consider the case that Algorithm 3.1 generates an infinite sequence $\{(x^{s},u^{s})\}$. Let $(\hat{x},\hat{u})$ be a cluster point of $\{(x^{s},u^{s})\}$. For simplicity of notation, we assume that $(x^{s},u^{s})\rightarrow(\hat{x},\hat{u})$. If $(\hat{x},\hat{u})$ is not a saddle point, then it violates one of the inequalities in (3.35). Without loss of generality, consider the case that the first inequality of (3.35) is violated, that is, there exists $x_{0}$ such that

\hat{u}(x_{0})+{\mathbb{E}}_{P}[\hat{u}(\xi-x_{0})]>\hat{u}(\hat{x})+{\mathbb{E}}_{P}[\hat{u}(\xi-\hat{x})].

Since $\hat{u}$ is continuous, then for sufficiently large $s$ ,

u^{s}(x_{0})+{\mathbb{E}}_{P}[u^{s}(\xi-x_{0})]>u^{s}(x^{s})+{\mathbb{E}}_{P}[u^{s}(\xi-x^{s})],

which is a contradiction to (3.36). In the same manner, we can show that $(\hat{x},\hat{u})$ satisfies the second inequality in (3.35). The proof is complete.

Note that the cluster point is indeed a saddle point of the maximin problem (3.29) and existence of the latter is guaranteed by the fact that the objective function is linear in $u$ and concave in $x$ . Problem (3.33) is a convex problem because $\mathbb{B}_{K}(u^{0}_{N},r)$ is a compact and convex set. By writing each utility function $u_{N}\in\mathscr{U}_{N}$ as

u_{N}(t)=(a_{1}t+b_{1})\mathds{1}_{[t_{1},t_{2}]}(t)+\sum_{j=2}^{N-1}(a_{j}t+b_{j})\mathds{1}_{(t_{j},t_{j+1}]}(t)

(3.37)

for $t\in[a,b]$ and writing down the Lagrange dual of problem (3.26),


\begin{aligned}
\min_{\begin{subarray}{c}\lambda^{i}_{j},i=1,2,3,4\\ j=2,\cdots,N\end{subarray}}\quad & -\frac{1}{2}\sum_{j=2}^{N}(\lambda_{j}^{1}+\lambda_{j}^{2}+\lambda_{j}^{3}+\lambda_{j}^{4})(t_{j}-t_{j-1})^{2} \\
{\rm s.t.}\quad & (\beta_{j-1}-\beta_{j-1}^{0})+(\lambda_{j}^{1}-\lambda_{j}^{2}+\lambda_{j}^{3}-\lambda_{j}^{4})=0,\quad j=2,\cdots,N, \\
& (\lambda_{j+1}^{2}-\lambda_{j+1}^{1})(t_{j+1}-t_{j})+(\lambda_{j}^{4}-\lambda_{j}^{3})(t_{j}-t_{j-1})=0,\quad j=2,\cdots,N-1, \\
& (\lambda_{2}^{2}-\lambda_{2}^{1})(t_{2}-t_{1})=0, \\
& (\lambda_{N}^{4}-\lambda_{N}^{3})(t_{N}-t_{N-1})=0, \\
& \lambda_{j}^{i}\leq 0,\quad j=2,\cdots,N,\; i=1,2,3,4,
\end{aligned}

(3.38)

we can reformulate problem (3.33) as a linear program:


\begin{aligned}
(a^{s},b^{s})\in\arg\min_{\begin{subarray}{c}a_{j},b_{j},\\ j=1,\cdots,N-1\end{subarray}}\quad & (a_{1}x^{s-1}+b_{1})\mathds{1}_{[t_{1},t_{2}]}(x^{s-1})+\sum_{j=2}^{N-1}(a_{j}x^{s-1}+b_{j})\mathds{1}_{(t_{j},t_{j+1}]}(x^{s-1}) \\
& +\sum_{k=1}^{K}p_{k}\Bigl\{(a_{1}(\xi_{k}-x^{s-1})+b_{1})\mathds{1}_{[t_{1},t_{2}]}(\xi_{k}-x^{s-1}) \\
& \qquad+\sum_{j=2}^{N-1}(a_{j}(\xi_{k}-x^{s-1})+b_{j})\mathds{1}_{(t_{j},t_{j+1}]}(\xi_{k}-x^{s-1})\Bigr\} \\
{\rm s.t.}\quad & a_{j-1}t_{j}+b_{j-1}=a_{j}t_{j}+b_{j},\quad j=2,\cdots,N-1, \\
& a_{1}t_{1}+b_{1}=0, \\
& a_{N-1}t_{N}+b_{N-1}=1, \\
& a_{j+1}\leq a_{j},\quad j=1,\cdots,N-2, \\
& 0\leq a_{j}\leq L,\quad j=1,\cdots,N-1, \\
& -\frac{1}{2}\sum_{j=2}^{N}(\lambda_{j}^{1}+\lambda_{j}^{2}+\lambda_{j}^{3}+\lambda_{j}^{4})(t_{j}-t_{j-1})^{2}\leq r, \\
& (a_{j-1}-a_{j-1}^{0})+(\lambda_{j}^{1}-\lambda_{j}^{2}+\lambda_{j}^{3}-\lambda_{j}^{4})=0,\quad j=2,\cdots,N, \\
& (\lambda_{j+1}^{2}-\lambda_{j+1}^{1})(t_{j+1}-t_{j})+(\lambda_{j}^{4}-\lambda_{j}^{3})(t_{j}-t_{j-1})=0,\quad j=2,\cdots,N-1, \\
& (\lambda_{2}^{2}-\lambda_{2}^{1})(t_{2}-t_{1})=0, \\
& (\lambda_{N}^{4}-\lambda_{N}^{3})(t_{N}-t_{N-1})=0, \\
& \lambda_{j}^{i}\leq 0,\quad j=2,\cdots,N,\; i=1,2,3,4,
\end{aligned}

(3.39)

where $a_{j-1}^{0}$ denotes the slope of $u_{N}^{0}$ on the interval $[t_{j-1},t_{j}]$. In (3.39), the first constraint requires the piecewise linear function to be continuous at the kinks, the second and third constraints represent the normalization conditions, the fourth requires concavity of the utility function, the fifth represents the Lipschitz condition, and the remaining constraints represent the Kantorovich ball of radius $r$. Note that here we use the Lagrange dual problem (3.38) instead of the primal problem (3.26) because the latter would introduce the bilinear terms $(\beta_{j-1}-\beta_{j-1}^{0})y_{j-1}$.

4 RMOCE with non-piecewise linear utility functions

The computational schemes discussed in the previous section are applicable when the ambiguity set is constructed by a Kantorovich ball of piecewise linear utility functions. In practice, the utility functions are not necessarily piecewise linear. This raises the question of how much we may miss if we use ${\rm(RMOCE-PLU)}$ to compute (RMOCE) with the ambiguity set constructed by the Kantorovich ball of general utility functions. In this section, we address this issue, which is essentially about bounding the modelling error. To maximize the scope of coverage, we consider a $\zeta$-ball instead of the Kantorovich ball. Let $\mathscr{U}_{L}$ be the class of continuous, non-decreasing, concave functions defined over $[a,b]$ which are Lipschitz continuous with modulus $L$ and satisfy the normalization conditions $u(a)=0$ and $u(b)=1$. For $u^{0}\in\mathscr{U}_{L}$, we define

\mathbb{B}(u^{0},r):=\left\{u\in\mathscr{U}_{L}\;|\;\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u^{0})\leq r\right\},

(4.40)

where

\displaystyle{\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,v):=\sup_{g\in\mathscr{G}}|\langle g,u\rangle-\langle g,v\rangle|},

(4.41)

$\mathscr{G}$ is a set of measurable functions defined over ${\rm I\!R}$ and $\langle g,u\rangle:=\int_{{\rm I\!R}}g(t)du(t)$ . $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}$ is known as a pseudo metric. It can be observed that $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,v)=0$ if and only if $\langle g,u\rangle=\langle g,v\rangle$ for all $g\in\mathscr{G}$ but not necessarily $u=v$ unless $\mathscr{G}$ is sufficiently large. By specifying particular properties of functions in set $\mathscr{G}$ , we may obtain some specific metric such as Kantorovich metric $\mathsf{d\kern-0.70007ptl}_{K}$ and the Kolmogorov metric with $\mathscr{G}=\displaystyle{\mathscr{G}_{I}}$ , where $\mathscr{G}_{I}$ consists of all indicator functions defined as

\mathds{1}_{(a,t]}(s):=\begin{cases}1&\text{if }s\in(a,t],\\ 0&\text{otherwise}.\end{cases}

(4.42)

With the definition of the $\zeta$ -ball and $u^{0}\in\mathscr{U}_{L}$ , we may define the corresponding RMOCE as

{\rm(RMOCE(\zeta))}\quad\;\;R(\xi):=\displaystyle{\max_{x\in{\rm I\!R}}\min_{u\in\mathbb{B}(u^{0},r)}}\;\;u(x)+{\mathbb{E}}_{P}[u(\xi-x)]

(4.43)

and the one when the utility functions are restricted to be piecewise linear:

{\rm(RMOCE(\zeta,N))}\quad\;\;R_{N}(\xi):=\displaystyle{\max_{x\in{\rm I\!R}}\min_{u_{N}\in\mathbb{B}_{N}(u_{N}^{0},r)}}\;\;u_{N}(x)+{\mathbb{E}}_{P}[u_{N}(\xi-x)],

(4.44)

where

\mathbb{B}_{N}(u^{0}_{N},r):=\left\{u_{N}\in\mathscr{U}_{N}:\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u_{N},u^{0}_{N})\leq r\right\}.

(4.45)

We investigate the difference between $\mathbb{B}_{N}(u_{N}^{0},r)$ and $\mathbb{B}(u^{0},r)$ and its propagation to the optimal values. Let $\mathcal{U}_{1}$ and $\mathcal{U}_{2}$ be two sets of utility functions. Let $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,\mathcal{U}_{1}):=\inf_{\tilde{u}\in\mathcal{U}_{1}}\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,\tilde{u})$ be the distance from $u$ to $\mathcal{U}_{1}$, let $\mathbb{D}_{\mathscr{G}}(\mathcal{U}_{1},\mathcal{U}_{2}):=\sup_{u\in\mathcal{U}_{1}}\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,\mathcal{U}_{2})$ be the deviation distance of $\mathcal{U}_{1}$ from $\mathcal{U}_{2}$, and let $\mathbb{H}_{\mathscr{G}}(\mathcal{U}_{1},\mathcal{U}_{2}):=\max\{\mathbb{D}_{\mathscr{G}}(\mathcal{U}_{1},\mathcal{U}_{2}),\mathbb{D}_{\mathscr{G}}(\mathcal{U}_{2},\mathcal{U}_{1})\}$ be the Hausdorff distance between $\mathcal{U}_{1}$ and $\mathcal{U}_{2}$.

4.1 Error bound on the ambiguity set

We start by quantifying the difference between the ambiguity sets. To this effect, we need a couple of technical results.

Proposition 4.1

([18, Proposition 4.1]) For each fixed $u\in\mathscr{U}_{L}$, let $u_{N}\in\mathscr{U}_{N}$ be such that $u_{N}(t_{i})=u(t_{i})$ for $i=1,\cdots,N$ and

u_{N}(t):=u(t_{i-1})+\frac{u(t_{i})-u(t_{i-1})}{t_{i}-t_{i-1}}(t-t_{i-1})\;{\rm for}\;t\in[t_{i-1},t_{i}],\;i=2,\cdots,N.

(4.46)

Then

\|u_{N}-u\|_{\infty}:=\sup_{t\in[a,b]}|u_{N}(t)-u(t)|\leq L\beta_{N},

(4.47)

where $\beta_{N}:=\max_{i=2,\cdots,N}(t_{i}-t_{i-1})$ . Moreover, in the case when $\mathscr{G}=\mathscr{G}_{K}$ , it holds that

\displaystyle{\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u_{N})\leq 2\beta_{N}}.

(4.48)

In the case when $\mathscr{G}=\mathscr{G}_{I}$ , $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u_{N})\leq L\beta_{N}$ .

Here and later on, we call $u_{N}$ defined in (4.46) a projection of $u$ on $\mathscr{U}_{N}$. Next, we quantify the deviation distance and the Hausdorff distance between $\zeta$-balls in $\mathscr{U}_{L}$ and $\mathscr{U}_{N}$.
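The sup-norm bound (4.47) is easy to probe numerically. The sketch below interpolates a smooth concave function on equally spaced grids and compares the observed error with $L\beta_{N}$. The test function, the interval $[0,1]$ and the checking resolution are hypothetical choices made for illustration; by the computation in (4.51), the same number also equals the distance under $\mathscr{G}_{I}$ for normalized functions.

```python
import math


def u(t):
    """Hypothetical member of U_L on [0, 1]: increasing, concave, u(0)=0, u(1)=1."""
    return (1.0 - math.exp(-t)) / (1.0 - math.exp(-1.0))


# Lipschitz modulus of u on [0, 1]; the derivative is largest at t = 0.
L = 1.0 / (1.0 - math.exp(-1.0))


def sup_error(n_points, n_check=2000):
    """Largest observed |u_N - u| on a fine grid, where u_N is the linear
    interpolant of u on n_points equally spaced breakpoints, as in (4.46)."""
    h = 1.0 / (n_points - 1)  # mesh size beta_N of the equally spaced grid
    worst = 0.0
    for m in range(n_check + 1):
        s = m / n_check
        i = min(int(s / h), n_points - 2)  # index of the interval containing s
        left, right = i * h, (i + 1) * h
        u_n = u(left) + (u(right) - u(left)) / h * (s - left)
        worst = max(worst, abs(u_n - u(s)))
    return worst, h


results = {n: sup_error(n) for n in (3, 5, 9, 17)}
for n, (err, beta) in results.items():
    print(n, err, L * beta)
```

For a smooth function the observed error decays much faster than $L\beta_{N}$; the bound is tight only for functions with kinks between breakpoints.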

Lemma 4.1

Let $u_{N}\in\mathscr{U}_{N}$ , $u\in\mathscr{U}_{L}$ and $\delta$ , $r$ be any positive numbers. Then the following holds:

(i)

$\mathbb{D}_{\mathscr{G}}(\mathbb{B}_{N}(u_{N},r+\delta),\mathbb{B}_{N}(u_{N},r))\leq\delta$ , $\mathbb{D}_{\mathscr{G}}(\mathbb{B}(u,r+\delta),\mathbb{B}(u,r))\leq\delta$ ,
(ii)

If $u_{N}$ is defined as in (4.46) and $\mathscr{G}=\mathscr{G}_{K}\cup\mathscr{G}_{I}$ , then $\mathbb{H}_{\mathscr{G}}(\mathbb{B}_{N}(u_{N},r),\mathbb{B}(u,r))\leq\max\{2,L\}\beta_{N}$ and $\mathbb{D}_{\mathscr{G}}(\mathbb{B}(u_{N},r+\delta),\mathbb{B}(u_{N},r))\leq\delta+2\max\{2,L\}\beta_{N}$ .

Proof. The proof is similar to that of [46]; we include a sketch for completeness.

Part (i). We only prove the first inequality, as the second one can be proved analogously. Let $\tilde{u}_{N}\in\mathbb{B}_{N}(u_{N},r+\delta)\setminus\mathbb{B}_{N}(u_{N},r)$ and $u_{N}^{\lambda}:=\lambda\tilde{u}_{N}+(1-\lambda)u_{N}\in\mathscr{U}_{N}$ , where $\lambda:=r/\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u_{N},\tilde{u}_{N})\in(0,1)$ . By the definition of $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}$ , we have $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u_{N}^{\lambda},u_{N})=\sup_{g\in\mathscr{G}}\langle g,u_{N}^{\lambda}-u_{N}\rangle=\lambda\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u_{N},\tilde{u}_{N})=r$ , which implies $u_{N}^{\lambda}\in\mathbb{B}_{N}(u_{N},r)$ . Thus

	$\displaystyle\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},\mathbb{B}_{N}(u_{N},r))$	$\displaystyle\leq$	$\displaystyle\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},u_{N}^{\lambda})=(1-\lambda)\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},u_{N})$
		$\displaystyle=$	$\displaystyle\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},u_{N})-r\leq r+\delta-r=\delta.$

Since $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\hat{u}_{N},\mathbb{B}_{N}(u_{N},r))=0$ for $\hat{u}_{N}\in\mathbb{B}_{N}(u_{N},r)$ , we have $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\hat{u}_{N},\mathbb{B}_{N}(u_{N},r))\leq\delta$ for all $\hat{u}_{N}\in\mathbb{B}_{N}(u_{N},r+\delta)$ . By the definition of $\mathbb{D}_{\mathscr{G}}$ , we have

\mathbb{D}_{\mathscr{G}}(\mathbb{B}_{N}(u_{N},r+\delta),\mathbb{B}_{N}(u_{N},r))=\sup_{\hat{u}_{N}\in\mathbb{B}_{N}(u_{N},r+\delta)}\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\hat{u}_{N},\mathbb{B}_{N}(u_{N},r))

and hence (i) holds.

Part (ii). Let $\tilde{u}_{N}\in\mathbb{B}_{N}(u_{N},r)$ . Under 4.1,

\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},u)\leq\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},u_{N})+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u_{N},u)\leq r+\max\{2,L\}\beta_{N},

which implies $\mathbb{B}_{N}(u_{N},r)\subset\mathbb{B}(u,r+\max\{2,L\}\beta_{N})$ . By Part (i),

\mathbb{D}_{\mathscr{G}}(\mathbb{B}_{N}(u_{N},r),\mathbb{B}(u,r))\leq\mathbb{D}_{\mathscr{G}}(\mathbb{B}(u,r+\max\{2,L\}\beta_{N}),\mathbb{B}(u,r))\leq\max\{2,L\}\beta_{N}.

Similarly, we have $\mathbb{D}_{\mathscr{G}}(\mathbb{B}(u,r),\mathbb{B}_{N}(u_{N},r))\leq\max\{2,L\}\beta_{N}$ . The result holds due to the definition of Hausdorff distance under $\zeta$ -metric.

Now we turn to prove $\mathbb{D}_{\mathscr{G}}(\mathbb{B}(u_{N},r+\delta),\mathbb{B}(u_{N},r))\leq\delta+2\max\{2,L\}\beta_{N}$ . Since $u_{N}\in\mathscr{U}_{N}$ , then we can find a $u\in\mathscr{U}_{L}$ such that $u_{N}$ is the projection of $u$ . Hence for any $\tilde{u}\in\mathbb{B}(u_{N},r+\delta)$ , we have $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},u)\leq\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},u_{N})+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u_{N},u)\leq r+\delta+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u_{N},u)$ . Consequently, $\mathbb{B}(u_{N},r+\delta)\subset\mathbb{B}(u,r+\delta+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u_{N},u))$ . On the other hand, for any $\tilde{u}\in\mathbb{B}(u,r-\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u_{N},u))$ , we have

\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},u_{N})\leq\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},u)+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u_{N})\leq r-\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u_{N})+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u_{N})=r,

hence $\mathbb{B}(u,r-\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u_{N}))\subset\mathbb{B}(u_{N},r)$ . Therefore, according to the definition of $\mathbb{D}_{\mathscr{G}}$ and Part (i),

	$\displaystyle\mathbb{D}_{\mathscr{G}}(\mathbb{B}(u_{N},r+\delta),\mathbb{B}(u_{N},r))$	$\displaystyle\leq$	$\displaystyle\mathbb{D}_{\mathscr{G}}(\mathbb{B}(u,r+\delta+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u_{N},u)),\mathbb{B}(u,r-\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u_{N})))$
		$\displaystyle\leq$	$\displaystyle\delta+2\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u_{N})\leq\delta+2\max\{2,L\}\beta_{N}.$

The proof is complete.

With 4.1, we are ready to quantify the difference between $\mathbb{B}(u,r)$ and $\mathbb{B}_{N}(u_{N},r)$ .

Theorem 4.1

Let $u\in\mathscr{U}_{L}$, let $u_{N}$ be the projection of $u$ defined as in (4.46), and let $\mathscr{G}=\mathscr{G}_{K}\cup\mathscr{G}_{I}$. Then

\mathbb{H}_{\mathscr{G}}(\mathbb{B}(u,r),\mathbb{B}_{N}(u_{N},r))\leq 5\max\{2,L\}\beta_{N}.

(4.49)

Proof. By the triangle inequality of the Hausdorff distance in the space of $\mathscr{U}_{L}$ , we have

\mathbb{H}_{\mathscr{G}}(\mathbb{B}(u,r),\mathbb{B}_{N}(u_{N},r))\leq\mathbb{H}_{\mathscr{G}}(\mathbb{B}(u,r),\mathbb{B}(u_{N},r))+\mathbb{H}_{\mathscr{G}}(\mathbb{B}(u_{N},r),\mathbb{B}_{N}(u_{N},r)).

From 4.1, $\mathbb{H}_{\mathscr{G}}(\mathbb{B}(u,r),\mathbb{B}(u_{N},r))\leq\max\{2,L\}\beta_{N}$ , so it suffices to show $\mathbb{H}_{\mathscr{G}}(\mathbb{B}(u_{N},r),\mathbb{B}_{N}(u_{N},r))\leq 4\max\{2,L\}\beta_{N}$ . By the definition of $\mathbb{D}_{\mathscr{G}}$ ,

\begin{aligned}
\mathbb{D}_{\mathscr{G}}(\mathbb{B}(u_{N},r),\mathbb{B}_{N}(u_{N},r)) &= \sup_{\tilde{u}\in\mathbb{B}(u_{N},r)}\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},\mathbb{B}_{N}(u_{N},r)) \\
&\leq \sup_{\tilde{u}\in\mathbb{B}(u_{N},r)}\left[\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},\tilde{u}_{N})+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},\mathbb{B}_{N}(u_{N},r))\right] \\
&\leq \sup_{\tilde{u}\in\mathbb{B}(u_{N},r)}\left[\max\{2,L\}\beta_{N}+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},\mathbb{B}_{N}(u_{N},r))\right] \\
&\leq \mathbb{D}_{\mathscr{G}}(\mathbb{B}_{N}(u_{N},\max\{2,L\}\beta_{N}+r),\mathbb{B}_{N}(u_{N},r))+\max\{2,L\}\beta_{N} \\
&\leq 2\max\{2,L\}\beta_{N},
\end{aligned}

where $\tilde{u}_{N}$ is the projection of $\tilde{u}$ . The second inequality follows from (4.48), the third inequality is due to the fact that for any $\tilde{u}\in\mathbb{B}(u_{N},r)$ , its projection $\tilde{u}_{N}$ satisfies

\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},u_{N})\leq\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},\tilde{u})+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},u_{N})\leq\max\{2,L\}\beta_{N}+r,

that is, $\tilde{u}_{N}\in\mathbb{B}_{N}(u_{N},\max\{2,L\}\beta_{N}+r)$. The last inequality follows from part (i) of 4.1. Likewise, we have

\begin{aligned}
\mathbb{D}_{\mathscr{G}}(\mathbb{B}_{N}(u_{N},r),\mathbb{B}(u_{N},r)) &= \sup_{\tilde{u}_{N}\in\mathbb{B}_{N}(u_{N},r)}\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},\mathbb{B}(u_{N},r)) \\
&\leq \sup_{\tilde{u}_{N}\in\mathbb{B}_{N}(u_{N},r)}\left[\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},\tilde{u})+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},\mathbb{B}(u_{N},r))\right] \\
&\leq \sup_{\tilde{u}_{N}\in\mathbb{B}_{N}(u_{N},r)}\left[\max\{2,L\}\beta_{N}+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},\mathbb{B}(u_{N},r))\right] \\
&\leq \max\{2,L\}\beta_{N}+\mathbb{D}_{\mathscr{G}}(\mathbb{B}(u_{N},\max\{2,L\}\beta_{N}+r),\mathbb{B}(u_{N},r)) \\
&\leq 4\max\{2,L\}\beta_{N},
\end{aligned}

where the third inequality is derived from the fact that for any $\tilde{u}_{N}\in\mathbb{B}_{N}(u_{N},r)$ , that is, $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},u_{N})\leq r$ , we have $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},u_{N})\leq\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},\tilde{u}_{N})+\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u}_{N},u_{N})\leq\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},\tilde{u}_{N})+r\leq\max\{2,L\}\beta_{N}+r$ , that is $\tilde{u}\in\mathbb{B}(u_{N},\max\{2,L\}\beta_{N}+r)$ . The last inequality follows from part (ii) of 4.1. Finally, by the definition of Hausdorff distance under metric $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}$ , the proof is complete.

4.2 Error bound on the optimal value

Theorem 4.2

Let $u^{0}\in\mathscr{U}_{L}$ and $u_{N}^{0}\in\mathscr{U}_{N}$ be defined as in (4.46). If $\mathscr{G}=\mathscr{G}_{K}\cup\mathscr{G}_{I}$ in (4.45), then $|R(\xi)-R_{N}(\xi)|\leq 10\max\{2,L\}\beta_{N}$ .

Proof. It is well known that

\begin{aligned}
|R_{N}(\xi)-R(\xi)|\leq\max_{x\in\mathds{R}}\Bigl| & \inf_{u\in\mathbb{B}(u^{0},r)}\left\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\right\} \\
& -\inf_{u_{N}\in\mathbb{B}_{N}(u^{0}_{N},r)}\left\{u_{N}(x)+{\mathbb{E}}_{P}[u_{N}(\xi-x)]\right\}\Bigr|.
\end{aligned}

Let $\delta$ be a small positive number. For any $x\in\mathds{R}$ , we can find $u^{x}\in\mathbb{B}(u^{0},r)$ and $u^{x}_{N}\in\mathbb{B}_{N}(u^{0}_{N},r)$ depending on $\delta$ such that

$\displaystyle u^{x}(x)+{\mathbb{E}}_{P}[u^{x}(\xi-x)]$	$\displaystyle\leq$	$\displaystyle\inf_{u\in{\mathbb{B}(u^{0},r)}}\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\}+\delta,$
$\displaystyle u^{x}_{N}(x)+{\mathbb{E}}_{P}[u^{x}_{N}(\xi-x)]$	$\displaystyle\geq$	$\displaystyle\inf_{u_{N}\in{\mathbb{B}_{N}(u_{N}^{0},r)}}\{u_{N}(x)+{\mathbb{E}}_{P}[u_{N}(\xi-x)]\},$
$\displaystyle\sup_{t\in[a,b]}\|u_{N}^{x}(t)-u^{x}(t)\|$	$\displaystyle\leq$	$\displaystyle\mathbb{H}(\mathbb{B}_{N}(u_{N}^{0},r),\mathbb{B}(u^{0},r))+\delta,$

where $\mathbb{H}$ denotes the Hausdorff distance in the space of continuous functions defined on $[a,b]$ equipped with infinity norm $\|\cdot\|_{\infty}$ . Combining the above inequalities

\begin{aligned}
& \inf_{u_{N}\in\mathbb{B}_{N}(u^{0}_{N},r)}\{u_{N}(x)+{\mathbb{E}}_{P}[u_{N}(\xi-x)]\}-\inf_{u\in\mathbb{B}(u^{0},r)}\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\} \\
&\quad\leq {\mathbb{E}}_{P}[u_{N}^{x}(\xi-x)-u^{x}(\xi-x)]+(u_{N}^{x}(x)-u^{x}(x))+\delta \\
&\quad\leq 2\|u_{N}^{x}-u^{x}\|_{\infty}+\delta \\
&\quad\leq 2\mathbb{H}(\mathbb{B}_{N}(u^{0}_{N},r),\mathbb{B}(u^{0},r))+3\delta.
\end{aligned}

By exchanging the positions of $\mathbb{B}_{N}(u^{0}_{N},r)$ and $\mathbb{B}(u^{0},r)$ , we have

	$\displaystyle\inf_{u\in{\mathbb{B}(u^{0},r)}}\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\}-\inf_{u_{N}\in{\mathbb{B}_{N}(u^{0}_{N},r)}}\{u_{N}(x)+{\mathbb{E}}_{P}[u_{N}(\xi-x)]\}$
	$\displaystyle\leq 2\mathbb{H}(\mathbb{B}(u^{0},r),\mathbb{B}_{N}(u^{0}_{N},r))+3\delta.$

Since $\delta>0$ can be arbitrarily small, we obtain

	$\displaystyle\|R_{N}(\xi)-R(\xi)\|$	$\displaystyle\leq$	$\displaystyle\max_{x\in\mathds{R}}\left\|\inf_{u\in{\mathbb{B}(u^{0},r)}}\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\}-\inf_{u\in{\mathbb{B}_{N}(u_{N}^{0},r)}}\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\}\right\|$
		$\displaystyle\leq$	$\displaystyle 2\mathbb{H}(\mathbb{B}(u^{0},r),\mathbb{B}_{N}(u^{0}_{N},r)).$

The main challenge here is that $\mathbb{H}(\mathbb{B}(u^{0},r),\mathbb{B}_{N}(u^{0}_{N},r))$ differs from $\mathbb{H}_{\mathscr{G}}(\mathbb{B}(u^{0},r),\mathbb{B}_{N}(u^{0}_{N},r))$ . In what follows, we show that

\mathbb{H}_{\mathscr{G}_{I}}(\mathbb{B}(u^{0},r),\mathbb{B}_{N}(u^{0}_{N},r))=\mathbb{H}(\mathbb{B}(u^{0},r),\mathbb{B}_{N}(u^{0}_{N},r)),

(4.50)

where $\mathscr{G}_{I}=\{\mathds{1}_{(a,t]}(\cdot):t\in[a,b]\}$ . Let $\tilde{u}\in\mathbb{B}(u^{0},r)$ and $\tilde{u}_{N}\in\mathbb{B}_{N}(u^{0}_{N},r)$ ,

$\displaystyle\mathsf{d\kern-0.70007ptl}_{\mathscr{G}_{I}}(\tilde{u},\tilde{u}_{N})$	$\displaystyle=$	$\displaystyle\sup_{g\in\mathscr{G}_{I}}\left\|\int_{a}^{b}g(t)d\tilde{u}(t)-\int_{a}^{b}g(t)d\tilde{u}_{N}(t)\right\|$	(4.51)
	$\displaystyle=$	$\displaystyle\sup_{t\in[a,b]}\left\|\int_{a}^{t}1d\tilde{u}(s)-\int_{a}^{t}1d\tilde{u}_{N}(s)\right\|$
	$\displaystyle=$	$\displaystyle\sup_{t\in[a,b]}\left\|\tilde{u}(t)-\tilde{u}_{N}(t)-\tilde{u}(a)+\tilde{u}_{N}(a)\right\|$
	$\displaystyle=$	$\displaystyle\sup_{t\in[a,b]}\|\tilde{u}(t)-\tilde{u}_{N}(t)\|.$

The last equality is due to the fact that $\tilde{u}(a)=\tilde{u}_{N}(a)=0$ . By taking the infimum w.r.t. $\tilde{u}_{N}\in\mathbb{B}_{N}(u^{0}_{N},r)$ and then the supremum w.r.t. $\tilde{u}\in\mathbb{B}(u^{0},r)$ on both sides of the equality above, we obtain $\mathbb{D}_{\mathscr{G}_{I}}(\mathbb{B}(u^{0},r),\mathbb{B}_{N}(u_{N}^{0},r))=\mathbb{D}(\mathbb{B}(u^{0},r),\mathbb{B}_{N}(u_{N}^{0},r)).$ Swapping the positions of $\mathbb{B}(u^{0},r)$ and $\mathbb{B}_{N}(u_{N}^{0},r)$ , we obtain (4.50). Combining this with Theorem 4.1, we obtain the conclusion.

5 Extensions

In this section, we discuss potential extensions of the MOCE and RMOCE models by considering utility functions with unbounded domain and multivariate utility functions.

5.1 Utility function with unbounded domain

In some important applications such as finance and economics, the underlying random variables which represent market demand, stock price and rate of return often have unbounded support. This raises a question as to whether our proposed model and computational schemes in the previous sections can be effectively applied to these situations. Here we discuss this issue.

We start by defining the set of nonconstant increasing functions over ${\rm I\!R}$ , denoted by $\mathscr{U}_{\infty}$ ; that is, we no longer restrict the domain of $u$ to a bounded interval $[a,b]$ . For $u^{0}\in\mathscr{U}_{\infty}$ , the $\zeta$ -ball in $\mathscr{U}_{\infty}$ is defined as

\mathbb{B}_{\infty}(u^{0},r):=\{u\in\mathscr{U}_{\infty}\;|\;\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u^{0})\leq r\},

where $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}$ is the pseudo-metric defined in (4.41) and, throughout this section, $\mathscr{G}$ is a set of measurable functions. The robust modified optimized certainty equivalent model based on $\mathbb{B}_{\infty}(u^{0},r)$ is defined as

{\rm(RMOCE)_{\infty}}\quad\;\;R_{\infty}(\xi):=\displaystyle{\max_{x\in X}\min_{u\in\mathbb{B}_{\infty}(u^{0},r)}}\;\;u(x)+{\mathbb{E}}_{P}[u(\xi-x)],

(5.52)

where $X\subset{\rm I\!R}$ is a compact set of implementable decisions. Our aim is to solve $\rm(RMOCE)_{\infty}$ ; the concern is that the numerical schemes proposed in Section 3 cannot be applied to this problem directly. Let $u^{0}_{\rm truc}$ be the truncation of $u^{0}$ over $[a,b]$ and define

\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r):=\{u\in\mathscr{U}_{[a,b]}\;|\;\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u^{0}_{\rm truc})\leq r\},

(5.53)

where $\mathscr{U}_{[a,b]}$ denotes the set of nonconstant nondecreasing functions defined over $[a,b]$ . We rewrite (4.43) as

{\rm(RMOCE)_{[a,b]}}\quad\;\;R_{[a,b]}(\xi):=\displaystyle{\max_{x\in X}\min_{u\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)}}\;\;u(x)+{\mathbb{E}}_{P}[u(\xi-x)].

(5.54)

What we are interested in here is the difference between $\rm(RMOCE)_{\infty}$ and $\rm(RMOCE)_{[a,b]}$ in terms of the optimal value. We will show that, under some moderate conditions, the difference between $R_{\infty}(\xi)$ and $R_{[a,b]}(\xi)$ depends only on the radius of the $\zeta$ -ball, and therefore we may solve $\rm(RMOCE)_{\infty}$ approximately by solving $\rm(RMOCE)_{[a,b]}$ . The latter can be solved by the piecewise linear approximation scheme detailed in Section 3.

To build the bridge between $\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)$ and $\mathbb{B}_{\infty}(u^{0},r)$ , we define the following set

\tilde{\mathbb{B}}_{[a,b]}(u^{0},r):=\{u\in{\mathscr{U}}_{\infty}:\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,u^{0})\leq r,\;u(t)=u(a)\text{ for }t<a,\;u(t)=u(b)\text{ for }t>b\}.

(5.55)

Notice that $\tilde{\mathbb{B}}_{[a,b]}(u^{0},r)$ is not a ball under the pseudo-metric. We establish the connection between $\tilde{\mathbb{B}}_{[a,b]}$ and $\mathbb{B}_{\infty}$ in the following proposition.

Proposition 5.1

Let $u^{0}\in\mathscr{U}_{\infty}$ and assume that there exists a positive number $\theta$ such that

\sup_{g\in\mathscr{G},u\in\mathbb{B}_{\infty}(u^{0},r)}\int_{{\rm I\!R}}|g(t)|du(t)\leq\theta.

(5.56)

Then for any $\epsilon>0$ there exist constants $a<0$ and $b>0$ such that

\mathbb{H}_{\mathscr{G}}(\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon),\mathbb{B}_{\infty}(u^{0},r))\leq\sup_{g\in\mathscr{G},u\in\mathbb{B}_{\infty}(u^{0},r)}\left|\int_{{\rm I\!R}\setminus[a,b]}g(t)du(t)\right|\leq\epsilon.

(5.57)

Proof. From the condition (5.56), for any $\epsilon>0$ there exist constants $a<0$ and $b>0$ such that

\sup_{g\in\mathscr{G},u\in\mathbb{B}_{\infty}(u^{0},r)}\int_{{\rm I\!R}\setminus[a,b]}|g(t)|du(t)\leq\epsilon.

(5.58)

For any fixed $u\in\mathbb{B}_{\infty}(u^{0},r)$ , let $\tilde{u}(t)=u(t)$ for $t\in[a,b]$ , $\tilde{u}(t)=u(a)$ for $t<a$ and $\tilde{u}(t)=u(b)$ for $t>b$ . Then we obtain

$\displaystyle\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(\tilde{u},u^{0})$	$\displaystyle=$	$\displaystyle\sup_{g\in\mathscr{G}}\left\|\int_{{\rm I\!R}}g(t)d\tilde{u}(t)-\int_{{\rm I\!R}}g(t)du^{0}(t)\right\|$
	$\displaystyle=$	$\displaystyle\sup_{g\in\mathscr{G}}\left\|\int_{a}^{b}g(t)d(\tilde{u}(t)-u^{0}(t))-\int_{{\rm I\!R}\setminus[a,b]}g(t)du^{0}(t)\right\|$
	$\displaystyle=$	$\displaystyle\sup_{g\in\mathscr{G}}\left\|\int_{a}^{b}g(t)d(\tilde{u}(t)-u^{0}(t))+\int_{{\rm I\!R}\setminus[a,b]}g(t)d(u(t)-u^{0}(t))-\int_{{\rm I\!R}\setminus[a,b]}g(t)du(t)\right\|$
	$\displaystyle\leq$	$\displaystyle r+\sup_{g\in\mathscr{G}}\left\|\int_{{\rm I\!R}\setminus[a,b]}g(t)du(t)\right\|\leq r+\epsilon.$

Hence $\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)$ and

\mathbb{D}_{\mathscr{G}}(u,\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon))\leq\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(u,\tilde{u})=\sup_{g\in\mathscr{G}}\left|\int_{{\rm I\!R}\setminus[a,b]}g(t)du(t)\right|\leq\epsilon.

(5.59)

By taking supremum w.r.t. $u$ over $\mathbb{B}_{\infty}(u^{0},r)$ on both sides of (5.59), we obtain

\mathbb{D}_{\mathscr{G}}(\mathbb{B}_{\infty}(u^{0},r),\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon))\leq\epsilon.

Since $\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)\subset\mathbb{B}_{\infty}(u^{0},r+\epsilon)$ , we have

\mathbb{D}_{\mathscr{G}}(\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon),\mathbb{B}_{\infty}(u^{0},r))\leq\mathbb{D}_{\mathscr{G}}(\mathbb{B}_{\infty}(u^{0},r+\epsilon),\mathbb{B}_{\infty}(u^{0},r))\leq\epsilon,

where the last inequality is from part (i) of 4.1. Consequently (5.57) follows.
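To give a feel for how the interval $[a,b]$ in (5.58) might be chosen, the following Python sketch computes the tail increment $\int_{{\rm I\!R}\setminus[a,b]}du(t)$ for a single bounded utility and searches for the smallest symmetric interval $[-b,b]$ whose tail increment is at most $\epsilon$. The logistic utility, the restriction to $|g|\leq 1$ and the function names are illustrative assumptions that do not come from the paper; in particular, the sketch checks one function only and not the supremum over $\mathscr{G}$ and $\mathbb{B}_{\infty}(u^{0},r)$ required by the proposition.

```python
import math


def logistic(t):
    """Bounded increasing utility, used purely as an illustrative assumption."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


# Limits of the logistic utility at -infinity and +infinity.
U_AT_MINUS_INF, U_AT_PLUS_INF = 0.0, 1.0


def tail_increment(a, b):
    """Total increment of u outside [a, b], i.e. the integral of du over the tails."""
    return (logistic(a) - U_AT_MINUS_INF) + (U_AT_PLUS_INF - logistic(b))


def smallest_symmetric_interval(eps):
    """Smallest b > 0 (up to bisection accuracy) with tail_increment(-b, b) <= eps."""
    lo, hi = 0.0, 1.0
    while tail_increment(-hi, hi) > eps:   # grow until the tails are small enough
        hi *= 2.0
    for _ in range(100):                   # bisection; hi always satisfies the bound
        mid = 0.5 * (lo + hi)
        if tail_increment(-mid, mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


eps = 1e-3
b = smallest_symmetric_interval(eps)
# For the logistic utility the tail increment is 2 / (1 + exp(b)),
# so the threshold has the closed form log(2 / eps - 1).
b_closed_form = math.log(2.0 / eps - 1.0)
print(b, b_closed_form, tail_increment(-b, b))
```

For utilities with heavier tails the same search applies, but the resulting interval grows faster as $\epsilon$ decreases.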

From Proposition 5.1, we can see that when the interval $[a,b]$ is large enough, the difference between $\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)$ and $\mathbb{B}_{\infty}(u^{0},r)$ is not significant. We now turn to comparing $\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)$ with $\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)$ . As in [18, Section 6.2], the extended function $\tilde{u}$ of $u\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)$ lies in $\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)$ , where $\epsilon\geq\sup_{g\in\mathscr{G}}\left|\int_{{\rm I\!R}\setminus[a,b]}g(t)du^{0}(t)\right|$ holds by (5.58) because $u^{0}\in\mathbb{B}_{\infty}(u^{0},r)$ . By exploiting this relationship, we can quantify the difference between $R_{\infty}(\xi)$ and $R_{[a,b]}(\xi)$ in the following theorem.

Theorem 5.1

Assume there exists a constant $\delta>0$ such that

\sup_{u\in\mathbb{B}_{\infty}(u^{0},r+\delta),x\in X}\int_{{\rm I\!R}}|u(\xi-x)|P(d\xi)<\infty

(5.60)

and the condition in Proposition 5.1 is fulfilled. Then for any $\epsilon>0$ , there exist constants $a<0$ and $b>0$ such that

|R_{\infty}(\xi)-R_{[a,b]}(\xi)|\leq 3\epsilon.

(5.61)

Proof. It follows from conditions (5.60) and (5.56) that for any $0<\epsilon<\delta$ there exist constants $a<0<b$ such that

\sup_{u\in\mathbb{B}_{\infty}(u^{0},r+\delta),x\in X}\int_{\xi-x\in{\rm I\!R}\setminus[a,b]}|u(\xi-x)|P(d\xi)\leq\epsilon/3

(5.62)

and (5.58) holds. Since $\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)\subset\mathbb{B}_{\infty}(u^{0},r+\delta)$ , the above inequality implies

\sup_{u\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\bigl{(}|u(a)|P((-\infty,a))+|u(b)|P((b,+\infty))\bigr{)}\leq\epsilon/3.

(5.63)

By definitions of $R_{\infty}(\xi)$ and $R_{[a,b]}(\xi)$ ,

			$\displaystyle\|R_{\infty}(\xi)-R_{[a,b]}(\xi)\|$
		$\displaystyle\leq$	$\displaystyle\sup_{x\in X}\left\|\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\left[u(x)+\int_{{\rm I\!R}}u(\xi-x)P(d\xi)\right]-\inf_{\hat{u}\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)}\left[\hat{u}(x)+\int_{\xi-x\in[a,b]}\hat{u}(\xi-x)P(d\xi)\right]\right\|$
		$\displaystyle\leq$	$\displaystyle\sup_{x\in X}\left\|\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\left[u(x)+\int_{{\rm I\!R}}u(\xi-x)P(d\xi)\right]-\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]\right\|$
		$\displaystyle+$	$\displaystyle\sup_{x\in X}\left\|\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]-\inf_{\hat{u}\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)}\left[\hat{u}(x)+\int_{\xi-x\in[a,b]}\hat{u}(\xi-x)P(d\xi)\right]\right\|.$

Let us estimate the first term on the right-hand side of the last inequality above. Observe that

	$\displaystyle\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\left[u(x)+\int_{{\rm I\!R}}u(\xi-x)P(d\xi)\right]-\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]$
	$\displaystyle\leq\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\left\|\int_{{\rm I\!R}}u(\xi-x)P(d\xi)-\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right\|+\|u(x)-\tilde{u}(x)\|\right]$
	$\displaystyle\leq\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\Bigg{[}\left\|\int_{\xi-x\in[a,b]}u(\xi-x)P(d\xi)-\int_{\xi-x\in[a,b]}\tilde{u}(\xi-x)P(d\xi)\right\|$
	$\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad+\|u(x)-\tilde{u}(x)\|+\frac{2\epsilon}{3}\Bigg{]}$
	$\displaystyle\leq\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\sup_{t\in[a,b]}\|u(t)-\tilde{u}(t)\|+\|u(x)-\tilde{u}(x)\|+\frac{2\epsilon}{3}\right]$
	$\displaystyle\leq\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\mathsf{d\kern-0.70007ptl}_{\mathscr{G}_{I}}(u,\tilde{u})+\|u(x)-\tilde{u}(x)\|+\frac{2\epsilon}{3}\right].$

Thus

	$\displaystyle\sup_{x\in X}\left\|\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\left[u(x)+\int_{{\rm I\!R}}u(\xi-x)P(d\xi)\right]-\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]\right\|$
	$\displaystyle\leq 2\mathbb{H}_{\mathscr{G}_{I}}(\mathbb{B}_{\infty}(u^{0},r),\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon))+\frac{2\epsilon}{3}.$		(5.64)

The last inequality holds due to $X\subset[a,b]$ . Now let us turn to the second term. For any $x\in X$ and a fixed positive number $\varepsilon$ , we can find $\hat{u}_{\xi}\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)$ and its extended function $\tilde{u}_{\xi}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)$ such that

	$\displaystyle\hat{u}_{\xi}(x)+\int_{\xi-x\in[a,b]}\hat{u}_{\xi}(\xi-x)P(d\xi)$	$\displaystyle\leq$	$\displaystyle\inf_{\hat{u}\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)}\left[\hat{u}(x)+\int_{\xi-x\in[a,b]}\hat{u}(\xi-x)P(d\xi)\right]+\varepsilon,$
	$\displaystyle\tilde{u}_{\xi}(x)+\int_{{\rm I\!R}}\tilde{u}_{\xi}(\xi-x)P(d\xi)$	$\displaystyle\geq$	$\displaystyle\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right].$

Consequently we have

	$\displaystyle\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]-\inf_{\hat{u}\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)}\left[\hat{u}(x)+\int_{\xi-x\in[a,b]}\hat{u}(\xi-x)P(d\xi)\right]$
	$\displaystyle\leq\tilde{u}_{\xi}(x)+\int_{{\rm I\!R}}\tilde{u}_{\xi}(\xi-x)P(d\xi)-\hat{u}_{\xi}(x)-\int_{\xi-x\in[a,b]}\hat{u}_{\xi}(\xi-x)P(d\xi)+\varepsilon$
	$\displaystyle=\int_{\xi-x\in{\rm I\!R}\setminus[a,b]}\tilde{u}_{\xi}(\xi-x)P(d\xi)+\varepsilon$
	$\displaystyle\leq\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\Bigl{(}\|\tilde{u}(a)\|P((-\infty,a))+\|\tilde{u}(b)\|P((b,+\infty))\Bigr{)}+\varepsilon.$		(5.65)

The equality holds because $\tilde{u}_{\xi}$ is the extended function of $\hat{u}_{\xi}$ and $x\in[a,b]$ . By exchanging the positions of $\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)$ and $\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)$ , we have

	$\displaystyle\inf_{\hat{u}\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)}\left[\hat{u}(x)+\int_{\xi-x\in[a,b]}\hat{u}(\xi-x)P(d\xi)\right]-\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]$
	$\displaystyle\leq\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\Bigl{(}\|\tilde{u}(a)\|P((-\infty,a))+\|\tilde{u}(b)\|P((b,+\infty))\Bigr{)}+\varepsilon.$		(5.66)

Since $\varepsilon$ can be arbitrarily small, we obtain

	$\displaystyle\sup_{x\in X}\Bigg{\|}\inf_{\hat{u}\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)}\left[\hat{u}(x)+\int_{\xi-x\in[a,b]}\hat{u}(\xi-x)P(d\xi)\right]$
	$\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad-\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]\Bigg{\|}$
	$\displaystyle\leq\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}(\|\tilde{u}(a)\|P((-\infty,a))+\|\tilde{u}(b)\|P((b,+\infty)))\leq\epsilon/3.$		(5.67)

Combining (5.64)-(5.67), we obtain (5.61) from (5.57).

5.2 Multiattribute utility case

The OCE models that we discussed so far are for single attribute decision making. It might be interesting to ask whether the models can be extended to multi-attribute decision making. The answer is yes. Here we present two potential extended models. One is to consider the case that the utility function has an additive structure, that is, the multivariate utility function is the sum of the marginal utility functions of each attribute. Such utility functions are widely used in the literature, see e.g. [28, 1, 2]. In that case, given $\xi:\Omega\rightarrow{\rm I\!R}^{m}$ and $U:{\rm I\!R}^{m}\rightarrow{\rm I\!R}$ , we may define the MOCE as

{\rm(MMOCE-A)}\quad\quad\displaystyle{M_{u}(\xi):=\sup_{x\in{\rm I\!R}^{m}}\;\;U(x)+{\mathbb{E}}_{P}[U(\xi-x)]},

(5.68)

where the multiattribute utility function $U(x)=\sum_{i=1}^{m}u_{i}(x_{i})$ and $u_{i}:{\rm I\!R}\rightarrow{\rm I\!R}$ is the marginal utility function with respect to the $i$ th attribute. The formulation can be simplified when the probability distribution of $\xi$ is the product of its marginal distributions:

\displaystyle{M_{u}(\xi)=\sum_{i=1}^{m}\sup_{x_{i}\in{\rm I\!R}}\;\;\{u_{i}(x_{i})+{\mathbb{E}}_{P_{i}}[u_{i}(\xi_{i}-x_{i})]\}}.

(5.69)

The economic interpretation of the model is that the decision maker might have a portfolio of random assets $\xi_{i}$ , $i=1,\cdots,m$ , and the DM would like to cash out $x_{i}$ from asset $i$ . The marginal utilities may be the same or different. Problem (5.69) is decomposable as it stands; thus it retains the properties outlined in Section 2 and can be computed by solving $m$ single-attribute MOCE problems in parallel.
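The decomposition in (5.69) can be illustrated with a small sample-average sketch in Python. The exponential marginal utilities, the uniform samples, the ternary search and all function names are assumptions made for illustration only; the search interval $[\xi_{\min}/2,\xi_{\max}/2]$ is the one reported for the MOCE solutions in Section 7. The sketch solves $m$ one-dimensional problems and then checks that the stacked solution attains the same value in the joint objective (5.68).

```python
import math
import random


def make_exp_utility(alpha):
    """Concave increasing marginal utility (an illustrative choice)."""
    return lambda t: (1.0 - math.exp(-alpha * t)) / alpha


def moce_1d(u, samples, iters=200):
    """Sample-average MOCE: max over x of u(x) + mean_k u(xi_k - x).

    The objective is concave for concave u, so ternary search on
    [min(samples)/2, max(samples)/2] is used.
    """
    def objective(x):
        return u(x) + sum(u(s - x) for s in samples) / len(samples)

    lo, hi = min(samples) / 2.0, max(samples) / 2.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    x_best = 0.5 * (lo + hi)
    return objective(x_best), x_best


def joint_objective(utils, data, x):
    """U(x) + mean_k U(xi^k - x) with additive U(x) = sum_i u_i(x_i)."""
    total = sum(u(xi) for u, xi in zip(utils, x))
    total += sum(u(row[i] - x[i]) for row in data
                 for i, u in enumerate(utils)) / len(data)
    return total


random.seed(0)
alphas = [0.5, 1.0, 2.0]                       # one utility per attribute
utils = [make_exp_utility(a) for a in alphas]
data = [[random.uniform(-1.0, 1.0) for _ in alphas] for _ in range(50)]

marginal = [moce_1d(u, [row[i] for row in data]) for i, u in enumerate(utils)]
x_star = [x for _, x in marginal]
value_decomposed = sum(v for v, _ in marginal)
value_joint = joint_objective(utils, data, x_star)
print(x_star, value_decomposed, value_joint)
```

Because the $m$ subproblems share no variables, they could equally be dispatched to separate workers; the loop above runs them sequentially for simplicity.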

When the utility function is non-additive, we may consider the following model:

{\rm(MMOCE-B)}\quad\quad\displaystyle{M_{u}(\xi):=\sup_{t\in{\rm I\!R}_{+}}\;\;\{u(td)+{\mathbb{E}}_{P}[u(\xi-td)]\}},

(5.70)

where $d$ is a fixed vector of weights. In this model, cash to be taken out from the assets is in a prefixed proportion. (MMOCE-B) is essentially a univariate MOCE model. Note that it is possible to further extend model (MMOCE-A) by replacing the deterministic vector $x$ with a random vector $X$ :

{\rm(MMOCE-A^{\prime})}\quad\quad\displaystyle{M_{u}(\xi):=\sup_{X}\;\;{\mathbb{E}}[U(X)]+{\mathbb{E}}[U(\xi-X)]}.

(5.71)

This kind of model has potential applications in finance where a firm detaches risky assets from non-risky assets in order to reduce the systemic risk [47]. In that context, problem (MMOCE-A’) is to find an optimal separation $X$ from the existing overall portfolio of assets $\xi$ . The problem is intrinsically two-stage; one may use a linear/polynomial decision rule [5] or the K-adaptivity method [8] to obtain a (MMOCE-A)-version of approximation. Note also that model (MMOCE-A’) is related to the IDR-based CDE model recently studied by Qi et al. [39] who use OCE for optimizing individualised medical treatment. Since all of the extended models outlined above require much more detailed analysis, we leave them for future research.

6 Quantitative statistical robustness

6.1 Motivation

In Section 3, we discuss in detail how to obtain an approximate solution of (RMOCE) (to ease reading, we repeat the model here):

{\rm(RMOCE-P)}\quad\quad\displaystyle{R(P):=\max_{x\in X}\inf_{u\in\mathcal{U}}{\mathbb{E}}_{P}[u(x)+u(\xi-x)]},

(6.72)

where $\xi$ follows probability distribution $P$ . A key assumption is that the true probability distribution $P$ is known and discrete. This assumption may not be satisfied in data-driven problems where the true $P$ is unknown, and one often uses empirical data to construct an approximation of $P$ . Worse still, such data may be contaminated.

Let $\tilde{\xi}^{1},...,\tilde{\xi}^{N}$ denote the contaminated empirical data (we call them perceived data; henceforth $N$ denotes the sample size rather than the number of breakpoints). Let $Q_{N}:=\frac{1}{N}\sum_{i=1}^{N}\delta_{\tilde{\xi}^{i}}$ be the empirical distribution constructed with the perceived data, where $\delta_{\tilde{\xi}^{i}}$ is the Dirac measure at $\tilde{\xi}^{i}$ . We use the perceived data to solve the RMOCE model (assume that the model is solved precisely without computational error):

{\rm(RMOCE-Q_{N})}\quad\quad\displaystyle{R(Q_{N}):=\max_{x\in X}\inf_{u\in\mathcal{U}}{\mathbb{E}}_{Q_{N}}[u(x)+u(\xi-x)]}.

(6.73)

We then ask whether $R(Q_{N})$ is a good estimate of $R(P)$ from a statistical point of view. This question is concerned with data perturbation rather than the modelling/computational errors discussed in Section 3.

To proceed with the analysis, we introduce another empirical distribution, denoted by $P_{N}:=\frac{1}{N}\sum_{i=1}^{N}\delta_{\xi^{i}}$ , which is constructed from the purified perceived data $\xi^{1},...,\xi^{N}$ (the noise in the perceived data is detached; we call them real data henceforth). In practice, it is impossible to detach the noise; we introduce the notion purely for the convenience of statistical analysis. Let $R(P_{N})$ be the optimal value of (RMOCE-P) with $P$ replaced by $P_{N}$ . By the classical law of large numbers, we know that $P_{N}\to P$ and $R(P_{N})\to R(P)$ under moderate conditions. Thus in the literature of stochastic programming, $R(P_{N})$ is called a statistical estimator of $R(P)$ and here we emphasize that this estimator is based on real data.

Our question is then whether $R(Q_{N})$ is close to $R(P_{N})$ , because the former is the only quantity that we are able to obtain. To address this question, we assume the perceived data are iid, so that $Q_{N}\to Q$ for some $Q$ as $N\to\infty$ . In other words, the perceived data may be viewed as if they are generated by the invisible distribution $Q$ . Let $R(Q)$ denote the optimal value of (RMOCE) with $P$ being replaced by $Q$ . We then have

R(Q_{N})-R(P_{N})=R(Q_{N})-R(Q)+R(Q)-R(P)+R(P)-R(P_{N}).

Thus if $R(Q_{N})\to R(Q)$ as $N\to\infty$ uniformly for all $Q$ close to $P$ and $R(Q)\to R(P)$ as $Q\to P$ , then $R(Q_{N})$ is close to $R(P_{N})$ . This explains roughly the motivation of this section. The formal quantitative statistical robustness analysis is a bit more complex, as we will examine the difference between the probability distributions of $R(Q_{N})$ and $R(P_{N})$ under some metric rather than estimating $R(Q_{N})-R(P_{N})$ for each given set of perceived data.

6.2 Statistical analysis

For any two probability measures $P,Q\in\mathscr{P}({\rm I\!R})$ , define the pseudo-metric between $P$ and $Q$ by

\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(P,Q):=\sup_{g\in\mathscr{G}}\bigl{|}{\mathbb{E}}_{P}[g(\xi)]-{\mathbb{E}}_{Q}[g(\xi)]\bigr{|}.

(6.74)

It can be seen that $\mathsf{d\kern-0.70007ptl}_{\mathscr{G}}(P,Q)$ is the maximal difference between the expected values of the functions in the class $\mathscr{G}$ with respect to $P$ and $Q$ . The specific pseudo-metrics that we consider in this paper are the Fortet-Mourier metric and the Kantorovich metric. Recall that the $p$ -th order Fortet-Mourier metric with $p\geq 1$ for $P,Q\in\mathscr{P}({\rm I\!R})$ is defined by

\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q):=\sup_{g\in\mathscr{G}_{p}({\rm I\!R})}\left|\int_{{\rm I\!R}}g(\xi)P(d\xi)-\int_{{\rm I\!R}}g(\xi)Q(d\xi)\right|,

(6.75)

where

\mathscr{G}_{p}({\rm I\!R}):=\{g:{\rm I\!R}\rightarrow{\rm I\!R}\mid|g(\xi)-g(\tilde{\xi})|\leq c_{p}(\xi,\tilde{\xi})\|\xi-\tilde{\xi}\|,\forall\xi,\tilde{\xi}\in{\rm I\!R}\}

and

c_{p}(\xi,\tilde{\xi}):=\max\{1,\|\xi\|,\|\tilde{\xi}\|\}^{p-1},\quad\forall\xi,\tilde{\xi}\in{\rm I\!R}.

When $p=1$ , the functions in $\mathscr{G}_{p}({\rm I\!R})$ are globally Lipschitz continuous with modulus $1$ and $\mathscr{G}_{p}({\rm I\!R})$ coincides with $\mathscr{G}_{K}$ in (3.25). Thus $\mathsf{d\kern-0.70007ptl}_{FM,1}(P,Q)=\mathsf{d\kern-0.70007ptl}_{K}(P,Q)$ . For more details, see [16, 42, 48].
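For the case $p=1$, the Kantorovich metric between two empirical distributions on ${\rm I\!R}$ with the same number of equally weighted atoms can be computed by sorting, since in one dimension it equals the average absolute difference of the order statistics. The Python sketch below uses this standard fact; the function names and the toy data are illustrative assumptions, and the sketch does not cover $p>1$ or samples of unequal size.

```python
def kantorovich_empirical(xs, ys):
    """Kantorovich distance between two equal-size, equal-weight empirical
    distributions on the real line, via sorted samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def expectation_gap(g, xs, ys):
    """|E_P[g] - E_Q[g]| for the empirical distributions of xs and ys."""
    return abs(sum(map(g, xs)) / len(xs) - sum(map(g, ys)) / len(ys))


real = [0.0, 1.0, 3.0]          # hypothetical clean data
perceived = [2.0, 0.5, 1.5]     # hypothetical contaminated data

d_k = kantorovich_empirical(real, perceived)
# Any 1-Lipschitz test function gives a lower bound on the distance.
gap = expectation_gap(lambda t: abs(t - 1.0), real, perceived)
print(d_k, gap)
```

The test function $g(t)=|t-1|$ belongs to the class of Lipschitz functions with modulus $1$, so its expectation gap cannot exceed the computed distance.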

To state the statistical robustness result, let ${\rm I\!R}^{\otimes N}$ and $\mathcal{B}({\rm I\!R})^{\otimes N}$ denote the Cartesian product ${\rm I\!R}\times\cdot\cdot\cdot\times{\rm I\!R}$ and its Borel sigma algebra. Let $P^{\otimes N}$ denote the probability measure on the measurable space $({\rm I\!R}^{\otimes N},\mathcal{B}({\rm I\!R})^{\otimes N})$ with marginal $P$ on each $({\rm I\!R},\mathcal{B}({\rm I\!R}))$ and $Q^{\otimes N}$ with marginal $Q$ . Now we can state the definition of statistical robustness of a statistical estimator, which is proposed in [18, 48].

Definition 6.1 (Quantitative statistical robustness)

Let ${\cal M}\subset\mathscr{P}({\rm I\!R})$ be a set of probability measures. A sequence of statistical estimators $\hat{T}_{N}$ is said to be quantitatively statistically robust on $\mathcal{M}$ w.r.t. $(\mathsf{d\kern-0.70007ptl}_{K},\mathsf{d\kern-0.70007ptl}_{FM,p})$ if there exists a positive constant $C$ such that for all $N$

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}(P^{\otimes N}\circ\hat{T}_{N}^{-1},Q^{\otimes N}\circ\hat{T}_{N}^{-1})\leq C\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)<+\infty,\;\forall P,Q\in\mathcal{M},

(6.76)

where $\mathsf{d\kern-0.70007ptl}_{K}$ is the Kantorovich metric on $\mathscr{P}({\rm I\!R})$ and $\mathsf{d\kern-0.70007ptl}_{FM,p}$ is the Fortet-Mourier metric on $\mathscr{P}({\rm I\!R})$ .

Here $P^{\otimes N}\circ\hat{T}_{N}^{-1}$ and $Q^{\otimes N}\circ\hat{T}_{N}^{-1}$ are probability measures/distributions on ${\rm I\!R}$ . The next theorem states quantitative statistical robustness of $\hat{R}_{N}:=R(Q_{N})$ .

Theorem 6.1

Assume: (a) There exists a positive constant $L>0$ such that for all $x\in X$ and $u\in\mathcal{U}$ ,

|u(\xi-x)-u(\xi^{\prime}-x)|\leq L\max\left\{1,|\xi|,|\xi^{\prime}|\right\}^{p-1}|\xi-\xi^{\prime}|,

(b) set $\mathcal{U}$ is chosen such that $\psi(t):=\sup_{u\in\mathcal{U}}|u(t)|$ is a gauge function, that is, $\psi:{\rm I\!R}\rightarrow[0,\infty)$ is continuous and $\psi\geq 1$ holds outside a compact set. Then for any $N\in\mathbb{N}$ ,

\displaystyle{\mathsf{d\kern-0.70007ptl}_{K}(P^{\otimes N}\circ\hat{R}_{N}^{-1},Q^{\otimes N}\circ\hat{R}_{N}^{-1})\leq L\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q),\forall P,Q\in\mathcal{M}^{\phi}},

(6.77)

where $\phi(\xi):=C_{0}+L(|\xi|+|\xi|^{p})$ for some constant $C_{0}>1$ .

Proof. By definition

	$\displaystyle\mathsf{d\kern-0.70007ptl}_{K}(P^{\otimes N}\circ\hat{R}_{N}^{-1},Q^{\otimes N}\circ\hat{R}_{N}^{-1})$
	$\displaystyle=\sup_{g\in\mathscr{G}_{K}}\left\|\int_{{\rm I\!R}}g(t)P^{\otimes N}\circ\hat{R}_{N}^{-1}(dt)-\int_{{\rm I\!R}}g(t)Q^{\otimes N}\circ\hat{R}_{N}^{-1}(dt)\right\|$
	$\displaystyle=\sup_{g\in\mathscr{G}_{K}}\left\|\int_{{\rm I\!R}^{\otimes N}}g(\hat{R}(\boldsymbol{\xi}^{N}))P^{\otimes N}(d\boldsymbol{\xi}^{N})-\int_{{\rm I\!R}^{\otimes N}}g(\hat{R}(\boldsymbol{\xi}^{N}))Q^{\otimes N}(d\boldsymbol{\xi}^{N})\right\|,$		(6.78)

where $\boldsymbol{\xi}^{N}=(\xi^{1},...,\xi^{N})$ and we write $\hat{R}(\boldsymbol{\xi}^{N})$ for $\hat{R}_{N}$ . To see the well-definedness of the pseudo-metric, notice that for every $g\in\mathscr{G}_{K}$ and a fixed $\boldsymbol{\xi}_{0}^{N}\in{\rm I\!R}^{\otimes N}$

|g(\hat{R}(\boldsymbol{\xi}^{N}))|\leq|g(\hat{R}(\boldsymbol{\xi}_{0}^{N}))|+|\hat{R}(\boldsymbol{\xi}^{N})-\hat{R}(\boldsymbol{\xi}_{0}^{N})|,

(6.79)

From condition (b) and the nondecreasing property of $u$ , there exists a positive number $C_{0}$ such that

\sup_{u\in\mathcal{U},x\in X}|u(x)+u(\xi-x)|\leq C_{0}+L(|\xi|+|\xi|^{p}),\forall\xi\in{\rm I\!R}.

(6.80)

By the definition of $\hat{R}(\boldsymbol{\xi}^{N})$ , it follows that

|\hat{R}(\boldsymbol{\xi}^{N})|=\left|\max_{x\in X}\inf_{u\in\mathcal{U}}\frac{1}{N}\sum_{k=1}^{N}[u(x)+u(\xi^{k}-x)]\right|\leq\frac{1}{N}\sum_{k=1}^{N}\phi(\xi^{k}).

Moreover,

	$\displaystyle\int_{{\rm I\!R}^{\otimes N}}\|\hat{R}(\boldsymbol{\xi}^{N})\|P^{\otimes N}(d\boldsymbol{\xi}^{N})$	$\displaystyle\leq$	$\displaystyle\int_{{\rm I\!R}^{\otimes N}}\frac{1}{N}\sum_{k=1}^{N}\phi(\xi^{k})P^{\otimes N}(d\boldsymbol{\xi}^{N})$
		$\displaystyle=$	$\displaystyle\int_{{\rm I\!R}}\phi(\xi)P(d\xi)<\infty,\forall P\in\mathcal{M}^{\phi},$		(6.81)

where the equality holds due to the fact that $\xi^{1},...,\xi^{N}$ are independent and identically distributed. Combining (6.79) and (6.81) we can obtain

\int_{{\rm I\!R}^{\otimes N}}g(\hat{R}(\boldsymbol{\xi}^{N}))P^{\otimes N}(d\boldsymbol{\xi}^{N})<\infty,\forall P\in\mathcal{M}^{\phi}.

A similar argument applies to $\int_{{\rm I\!R}^{\otimes N}}g(\hat{R}(\boldsymbol{\xi}^{N}))Q^{\otimes N}(d\boldsymbol{\xi}^{N})$ for any $Q\in\mathcal{M}^{\phi}$ . Next, for any $P,Q\in\mathcal{M}^{\phi}$ ,

$\displaystyle\|R(P)-R(Q)\|$	$\displaystyle=$	$\displaystyle\left\|\sup_{x\in X}\inf_{u\in\mathcal{U}}\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\}-\sup_{x\in X}\inf_{u\in\mathcal{U}}\{u(x)+{\mathbb{E}}_{Q}[u(\xi-x)]\}\right\|$
	$\displaystyle\leq$	$\displaystyle\sup_{x\in X}\sup_{u\in\mathcal{U}}\|{\mathbb{E}}_{P}[u(\xi-x)]-{\mathbb{E}}_{Q}[u(\xi-x)]\|$
	$\displaystyle\leq$	$\displaystyle L\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q),$

where the last inequality follows from condition (a). Then we can obtain

|g(\hat{R}(\tilde{\xi}^{1},...,\tilde{\xi}^{N}))-g(\hat{R}(\hat{\xi}^{1},...,\hat{\xi}^{N}))|\leq|\hat{R}(\tilde{\xi}^{1},...,\tilde{\xi}^{N})-\hat{R}(\hat{\xi}^{1},...,\hat{\xi}^{N})|\leq\frac{1}{N}\sum_{k=1}^{N}\sup_{x\in X,u\in\mathcal{U}}|u(\tilde{\xi}^{k}-x)-u(\hat{\xi}^{k}-x)|\leq\frac{L}{N}\sum_{k=1}^{N}\max\{1,|\tilde{\xi}^{k}|,|\hat{\xi}^{k}|\}^{p-1}|\tilde{\xi}^{k}-\hat{\xi}^{k}|.

(6.82)

It follows by [48, Lemma 4.4] that

\displaystyle(6.78)\leq\mathsf{d\kern-0.70007ptl}_{FM,p}(P^{\otimes N},Q^{\otimes N})\leq L\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)

(6.83)

and hence inequality (6.77).
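The pathwise estimate (6.82) can be checked numerically in the case $p=1$. The Python sketch below is an illustration under several assumptions that are not taken from the paper: the ambiguity set is replaced by a small finite family of piecewise linear concave utilities with Lipschitz modulus $L=1$, the decision set $X=[-1,1]$ is discretized on a grid, and the contamination mechanism is hypothetical. It computes the estimator on real and perceived samples and compares the gap with $\frac{L}{N}\sum_{k}|\tilde{\xi}^{k}-\hat{\xi}^{k}|$.

```python
import random


def make_utility(s):
    """Concave, nondecreasing, 1-Lipschitz utility: slope 1 for t < 0
    and slope s in (0, 1] for t >= 0."""
    return lambda t: min(t, s * t)


# Finite stand-in for the ambiguity set (an assumption, not a Kantorovich ball).
UTILS = [make_utility(s) for s in (0.25, 0.5, 1.0)]
X_GRID = [-1.0 + 0.02 * j for j in range(101)]   # discretized X = [-1, 1]
L = 1.0                                          # common Lipschitz modulus


def rmoce_hat(samples):
    """max over the grid of min over UTILS of u(x) + mean_k u(xi_k - x)."""
    n = len(samples)
    best = float("-inf")
    for x in X_GRID:
        worst = min(u(x) + sum(u(s - x) for s in samples) / n for u in UTILS)
        best = max(best, worst)
    return best


random.seed(1)
N = 200
real = [random.gauss(0.0, 1.0) for _ in range(N)]
# Hypothetical contamination: roughly 10% of the points receive additive noise.
perceived = [xi + (random.uniform(-2.0, 2.0) if random.random() < 0.1 else 0.0)
             for xi in real]

gap = abs(rmoce_hat(perceived) - rmoce_hat(real))
bound = L / N * sum(abs(p - r) for p, r in zip(perceived, real))
print(gap, bound)
```

The bound holds for every pair of samples, not only on average, which is what allows it to be lifted to the distributions of the estimators in (6.83).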

7 Numerical tests

We have carried out some tests on the numerical schemes for computing RMOCE. In this section, we report the preliminary numerical results.

The first set of tests compares the MOCE model (1.3) with the OCE model (1.1) in terms of the optimal values and the optimal solutions. We do so by considering $\xi$ following some specific distributions, including uniform, Gamma, lognormal and normalized Pareto distributions. The second set of tests concerns the RMOCE model and the numerical schemes proposed in Section 3. We investigate how the optimal value and the worst case utility function in the RMOCE model change as the radius of the ambiguity set and the number of breakpoints vary. We use the parallel particle swarm optimization method [29, 34] to solve problem (3.29) and the CVX solver to solve the inner minimization problem (3.33). All the tests are carried out in Matlab R2021a installed on a PC (16GB RAM, CPU 2.3 GHz) with an Intel Core i7 processor.

Throughout the section we restrict $\mathscr{U}$ to the set of all increasing concave utility functions defined on a compact interval $[a,b]$ . We take $[a,b]$ as the domain of $u$ which is the union of ranges of $x$ and $\xi-x$ for $x\in[\xi_{\min}/2,\xi_{\max}/2]$ by 3.2 because the number $N$ of breakpoints can guarantee that $\beta_{N}\leq\xi_{\max}-\xi_{\min}$ . We generate iid samples $\xi^{1},...,\xi^{K}$ for random variable $\xi$ with equal probabilities $p_{k}=1/K$ for $k=1,...,K$ .

In the first set of tests of OCE and MOCE, we set the nominal utility function as $u^{0}(t)=(1-e^{-2t})/2$ . Table 1 displays the optimal values and the optimal solutions as well as the CPU times. The third and fourth columns present the optimal values of the MOCE and OCE models, and the fifth and sixth columns present the optimal solutions of the MOCE and OCE models, respectively. As we can see, the OCE values are consistently larger than the MOCE values; this is because $u(t)<t$ . Moreover, we find the optimal solutions of the MOCE problem (under $x^{*}$ ) fall within $[\xi_{\min}/2,\xi_{\max}/2]$ although we have not displayed the intervals due to the limitation of space. This complies with 2.1.

Table 1: Numerical results of MOCE

Distribution	K	$M_{u}(\xi)$	$S_{u}(\xi)$	$x^{*}$	$\eta^{*}$	CPU time
Uniform (-1,1)	10	-0.5590	-0.4440	-0.2220	-0.4441	0.8700
	100	-0.1950	-0.1782	-0.0891	-0.1782	0.9914
	1000	-0.3508	-0.3008	-0.1504	-0.3008	3.8415
Lognormal (0,1)	10	0.4929	0.6792	0.3396	0.6792	0.4279
	100	0.5182	0.7303	0.3651	0.7303	0.8139
	1000	0.5313	0.7578	0.3789	0.7578	3.7001
Pareto (1,1.5)	10	0.8692	2.0337	1.0169	2.0337	0.4484
	100	0.8990	2.2926	1.1463	2.2925	0.7263
	1000	0.8942	2.2461	1.1231	2.2461	3.6693
Gamma (0.53,3)	10	0.3392	0.4143	0.2072	0.4143	0.5002
	100	0.4415	0.5824	0.2912	0.5825	0.7094
	1000	0.4088	0.5255	0.2628	0.5255	3.7729
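The pattern visible in Table 1, where $x^{*}$ is half of $\eta^{*}$ and $S_{u}(\xi)$ coincides with $\eta^{*}$, can be reproduced with a short Python sketch. It assumes that (1.1) is the standard form $S_{u}(\xi)=\sup_{\eta}\{\eta+{\mathbb{E}}_{P}[u(\xi-\eta)]\}$ and that (1.3) is $M_{u}(\xi)=\sup_{x}\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\}$. For $u^{0}(t)=(1-e^{-2t})/2$ the first-order conditions then give $\eta^{*}=-\frac{1}{2}\log m$, $x^{*}=\eta^{*}/2$, $S_{u}(\xi)=\eta^{*}$ and $M_{u}(\xi)=1-\sqrt{m}$ with $m={\mathbb{E}}_{P}[e^{-2\xi}]$. This derivation, the ternary search and the function names are not taken from the paper and should be read as an illustration only; the samples are freshly generated, so the numbers differ from those in Table 1.

```python
import math
import random


def u0(t):
    """Nominal utility used in the first set of tests."""
    return (1.0 - math.exp(-2.0 * t)) / 2.0


def maximize_concave(objective, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    x = 0.5 * (lo + hi)
    return objective(x), x


def oce(samples):
    """Sample-average OCE: max over eta of eta + mean_k u0(xi_k - eta)."""
    obj = lambda eta: eta + sum(u0(s - eta) for s in samples) / len(samples)
    return maximize_concave(obj, min(samples), max(samples))


def moce(samples):
    """Sample-average MOCE: max over x of u0(x) + mean_k u0(xi_k - x)."""
    obj = lambda x: u0(x) + sum(u0(s - x) for s in samples) / len(samples)
    return maximize_concave(obj, min(samples) / 2.0, max(samples) / 2.0)


random.seed(2021)
samples = [random.uniform(-1.0, 1.0) for _ in range(100)]

S_num, eta_num = oce(samples)
M_num, x_num = moce(samples)

m = sum(math.exp(-2.0 * s) for s in samples) / len(samples)
eta_closed = -0.5 * math.log(m)       # closed-form OCE solution and value
M_closed = 1.0 - math.sqrt(m)         # closed-form MOCE value
print(S_num, eta_num, M_num, x_num, eta_closed, M_closed)
```

Under these assumptions the gap $S_{u}(\xi)-M_{u}(\xi)=\eta^{*}-1+e^{-\eta^{*}}$ is nonnegative for every sample, in line with the ordering of the two value columns in Table 1.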

In the second set of tests about RMOCE, we set the nominal utility as $u^{0}(t)=(1-e^{-\alpha t})/2$ where $\alpha\in{\rm I\!R}_{+}$ is a parameter which determines the degree of concavity of the utility function. The number of random samples is fixed at $K=100$ for the uniform distribution and $K=10$ for the Gamma, lognormal and normalized Pareto distributions. The parameters of the tests are listed in Table 2, where the 4th column gives the Lipschitz modulus of the utility functions. For the cases where the random samples are generated by the uniform distribution, Figures 3 and 3 visualize the worst case utility functions and the optimal values as the radius decreases. Figure 3 visualizes the change of optimal values as the number of breakpoints increases. It can be seen that the number of breakpoints has little effect on the optimal value. For the cases when $\xi$ follows the Gamma, lognormal and normalized Pareto distributions, Figures 4 and 5 visualize the changes of the worst case utility functions and the optimal values as the radius decreases. We can see that the worst case utility function moves closer to the nominal utility function as the radius of the ambiguity set decreases to zero, and that the optimal value increases as the radius decreases. This is because the Kantorovich ball becomes smaller when the radius decreases. In the case that $r=0$ , the worst case utility function is the piecewise linear approximation of the nominal utility function. The error bound of the optimal value is also depicted in Figures 3 and 5; note that the error bound gets smaller as the number of breakpoints increases in Figure 3. Table 3 provides the optimal values and running times for different numbers of breakpoints.

Table 2: Parameters of RMOCE tests

Distribution	$\alpha$	N	L
Uniform (-1,1)	2	10	30
Lognormal (0,1)	1/2	300	10
Pareto (1,1.5)	1/3	300	10
Gamma (0.53,3)	1/2	300	10

Table 3: Running time for different number of breakpoints

N	Optimal value	CPU time
20	-108.4846	20.7796
40	-108.6524	24.1097
60	-108.5553	31.4506
80	-108.5657	36.7656
100	-108.5648	41.9548
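The weak dependence of the optimal value on the number of breakpoints is consistent with how quickly a piecewise linear interpolant approaches a smooth concave utility. The sketch below only illustrates that approximation step; it does not solve the RMOCE linear programs. The interval $[-2,2]$, the value $\alpha=2$, the equally spaced breakpoints, the evaluation grid and the function names are assumptions made for illustration.

```python
import math


def nominal_utility(t, alpha=2.0):
    """Assumed nominal utility u0(t) = (1 - exp(-alpha * t)) / 2."""
    return (1.0 - math.exp(-alpha * t)) / 2.0


def piecewise_linear(u, a, b, n):
    """Return the linear interpolant of u on n equal subintervals of [a, b]."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    us = [u(t) for t in ts]

    def u_n(t):
        # Index of the subinterval containing t, clipped so that t = b is valid.
        i = min(int((t - a) / (b - a) * n), n - 1)
        w = (t - ts[i]) / (ts[i + 1] - ts[i])
        return (1.0 - w) * us[i] + w * us[i + 1]

    return u_n


def sup_error(u, u_n, a, b, grid=2001):
    """Approximate sup-norm distance between u and u_n on a uniform grid."""
    points = (a + (b - a) * k / (grid - 1) for k in range(grid))
    return max(abs(u(t) - u_n(t)) for t in points)


a, b = -2.0, 2.0  # assumed range of x and xi - x
errors = {}
for n in (20, 40, 60, 80, 100):
    u_n = piecewise_linear(nominal_utility, a, b, n)
    errors[n] = sup_error(nominal_utility, u_n, a, b)
    print(n, errors[n])
```

Because the utility is concave, each interpolant lies below it, and the sup-norm gap shrinks roughly quadratically in the breakpoint spacing.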

8 Conclusion

In this paper we explore variations of the concept of optimized certainty equivalent with a number of new inputs. First, we propose a modified optimized certainty equivalent (MOCE) model by considering the utility of present consumption. The optimal strategy (which balances present and future consumption) is uniquely determined by the decision maker’s risk preference rather than by his/her utility representations (which are not unique). The resulting MOCE value is positively homogeneous in $u$. The MOCE is also in alignment with the consumption models in economics. Second, there is a distinction between OCE and MOCE in terms of the utility functions to be used in the model. The classical OCE model requires the utility function to satisfy $u(0)=0$ and $1\in\partial u(0)$, whereas the new MOCE model does not require these conditions. Third, we propose a preference robust version of the new MOCE model for the case where the decision maker’s true utility function is ambiguous. Such ambiguity does exist in practice, and this paper provides a comprehensive treatment of the preference robust MOCE model, from modelling to computational scheme and underlying theory. Fourth, in the case where the proposed RMOCE model is applied to data-driven problems in which the underlying exogenous data (samples of $\xi$) are potentially contaminated, we derive sufficient conditions under which the RMOCE calculated with the data is statistically robust. Fifth, we outline potential extensions of the MOCE model from single-attribute decision making to multi-attribute decision making and point out potential applications in asset re-organization. In summary, this paper provides a new outlook on OCE in both modelling and analysis, which complements the existing research in the literature.

References

[1] A. E. Abbas. Multiattribute utility copulas. Operations Research, 57(6):1367–1383, 2009.
[2] A. E. Abbas and Z. Sun. Multiattribute utility functions satisfying mutual preferential independence. Operations Research, 63(2):378–393, 2015.
[3] B. Armbruster and E. Delage. Decision making under uncertainty when preference information is incomplete. Management Science, 61(1):111–128, 2015.
[4] R. J. Aumann. Integrals of set-valued functions. Journal of Mathematical Analysis and Applications, 12(1):1–12, 1965.
[5] D. Bampou and D. Kuhn. Scenario-free stochastic programming with polynomial decision rules. In 2011 50th IEEE Conference on Decision and Control and European Control Conference, pages 7806–7812. IEEE, 2011.
[6] A. Ben-Tal and M. Teboulle. Expected utility, penalty functions, and duality in stochastic nonlinear programming. Management Science, 32(11):1445–1466, 1986.
[7] A. Ben-Tal and M. Teboulle. An old-new concept of convex risk measures: The optimized certainty equivalent. Mathematical Finance, 17(3):449–476, 2007.
[8] D. Bertsimas and C. Caramanis. Adaptability via sampling. In 2007 46th IEEE Conference on Decision and Control, pages 4717–4722. IEEE, 2007.
[9] J. V. Burke, X. Chen, and H. Sun. The subdifferential of measurable composite max integrands and smoothing approximation. Mathematical Programming, 181(2):229–264, 2020.
[10] S. Cerreia-Vioglio, D. Dillenberger, and P. Ortoleva. Cautious expected utility and the certainty effect. Econometrica, 83(2):693–728, 2015.
[11] F. H. Clarke. Optimization and Nonsmooth Analysis. SIAM, 1990.
[12] J. H. Cochrane. Asset pricing: Revised edition. Princeton University Press, 2009.
[13] R. Cont, R. Deguest, and G. Scandolo. Robustness and sensitivity analysis of risk measurement procedures. Quantitative Finance, 10(6):593–606, 2010.
[14] E. Delage, S. Guo, and H. Xu. Shortfall risk models when information on loss function is incomplete. Operations Research, 2022.
[15] P. H. Farquhar. State of the art-utility assessment methods. Management Science, 30(11):1283–1300, 1984.
[16] A. L. Gibbs and F. E. Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002.
[17] S. Guo and H. Xu. Statistical robustness in utility preference robust optimization models. Mathematical Programming, pages 1–42, 2021.
[18] S. Guo and H. Xu. Utility preference robust optimization with moment-type information structure. Optimization online, 2021.
[19] N. H. Hakansson. Optimal investment and consumption strategies under risk for a class of utility functions, pages 525–545. Elsevier, 1975.
[20] F. R. Hampel. A general qualitative definition of robustness. The Annals of Mathematical Statistics, 42(6):1887–1896, 1971.
[21] W. Haskell, H. Xu, and W. Huang. Preference robust optimization for choice functions on the space of cdfs. SIAM Journal on Optimization, 2022.
[22] W. B. Haskell, W. Huang, and H. Xu. Preference elicitation and robust optimization with multi-attribute quasi-concave choice functions. arXiv preprint arXiv:1805.06632, 2018.
[23] J. Hu, M. Bansal, and S. Mehrotra. Robust decision making using a general utility set. European Journal of Operational Research, 269(2):699–714, 2018.
[24] J. Hu and S. Mehrotra. Robust decision making over a set of random targets or risk-averse utilities with an application to portfolio optimization. IIE Transactions, 47(4):358–372, 2015.
[25] J. Hu and G. Stepanyan. Optimization with reference-based robust preference constraints. SIAM Journal on Optimization, 27(4):2230–2257, 2017.
[26] P. J. Huber and E. M. Ronchetti. Robust Statistics. Wiley, Hoboken, 2nd edition, 2009.
[27] U. S. Karmarkar. Subjectively weighted utility: A descriptive extension of the expected utility model. Organizational Behavior and Human Performance, 21(1):61–72, 1978.
[28] R. L. Keeney, H. Raiffa, and R. F. Meyer. Decisions with multiple objectives: preferences and value trade-offs. Cambridge University Press, 1993.
[29] J. Kennedy and R. Eberhart. Particle swarm optimization. In Proceedings of ICNN’95-international conference on neural networks, volume 4, pages 1942–1948. IEEE, 1995.
[30] V. Krätschmer, A. Schied, and H. Zähle. Qualitative and infinitesimal robustness of tail-dependent statistical functionals. Journal of Multivariate Analysis, 103(1):35–47, 2012.
[31] J. Y. Li. Inverse optimization of convex risk functions. Management Science, 67(11):7113–7141, 2021.
[32] J. Liu, Z. Chen, and H. Xu. Multistage utility preference robust optimization. arXiv preprint arXiv:2109.04789, 2021.
[33] F. Maccheroni. Maxmin under risk. Economic Theory, 19(4):823–831, 2002.
[34] E. Mezura-Montes and C. A. C. Coello. Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm and Evolutionary Computation, 1(4):173–194, 2011.
[35] O. Morgenstern and J. Von Neumann. Theory of games and economic behavior. Princeton University Press, 1953.
[36] M. Nouiehed, J. Pang, and M. Razaviyayn. On the pervasiveness of difference-convexity in optimization and statistics. Mathematical Programming, 174(1):195–222, 2019.
[37] W. Ogryczak and A. Ruszczynski. Dual stochastic dominance and related mean-risk models. SIAM Journal on Optimization, 13(1):60–78, 2002.
[38] G. C. Pflug and W. Römisch. Modeling, measuring and managing risk. World Scientific, 2007.
[39] Z. Qi, Y. Cui, Y. Liu, and J. Pang. Estimation of individualized decision rules based on an optimized covariate-dependent equivalent of random outcomes. SIAM Journal on Optimization, 29(3):2337–2362, 2019.
[40] R. T. Rockafellar. Convex analysis, volume 36. Princeton University Press, 1970.
[41] R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.
[42] W. Römisch. Stability of stochastic programming problems. Handbooks in Operations Research and Management Science, 10:483–554, 2003.
[43] D. Sauré and J. P. Vielma. Ellipsoidal methods for adaptive choice-based conjoint analysis. Operations Research, 67(2):315–338, 2019.
[44] L. L. Thurstone. A law of comparative judgment. Psychological Review, 34(4):273, 1927.
[45] K. E. Train. Discrete choice methods with simulation. Cambridge University Press, 2009.
[46] W. Wang and H. Xu. Robust spectral risk optimization when information on risk spectrum is incomplete. SIAM Journal on Optimization, 30(4):3198–3229, 2020.
[47] W. Wang, H. Xu, and T. Ma. Optimal scenario-dependent multivariate shortfall risk measure and its application in capital allocation. Available at SSRN 3849125, 2021.
[48] W. Wang, H. Xu, and T. Ma. Quantitative statistical robustness for tail-dependent law invariant risk measures. Quantitative Finance, pages 1–17, 2021.
[49] M. Weber. Decision making with incomplete information. European Journal of Operational Research, 28(1):44–57, 1987.
[50] W. Wiesemann, D. Kuhn, and M. Sim. Distributionally robust convex optimization. Operations Research, 62(6):1358–1376, 2014.
[51] H. Xu and S. Zhang. Quantitative statistical robustness in distributionally robust optimization models. Pacific Journal of Optimization Special Issue, 2021.
[52] S. Zhang and H. Xu. Preference robust generalized shortfall risk measure based on the cumulative prospect theory when the value function and weighting functions are ambiguous. arXiv preprint arXiv:2112.10142, 2021.

$$
\begin{aligned}
\|R_{N}(\xi)-R(\xi)\|\leq\max_{x\in{\rm I\!R}}\Bigg\|&\inf_{u\in\mathbb{B}(u^{0},r)}\left\{u(x)+{\mathbb{E}}_{P}[u(\xi-x)]\right\}\\
&-\inf_{u_{N}\in\mathbb{B}_{N}(u^{0}_{N},r)}\left\{u_{N}(x)+{\mathbb{E}}_{P}[u_{N}(\xi-x)]\right\}\Bigg\|.
\end{aligned}
$$

$$
\begin{aligned}
&\|R_{\infty}(\xi)-R_{[a,b]}(\xi)\|\\
&\leq\sup_{x\in X}\left\|\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\left[u(x)+\int_{{\rm I\!R}}u(\xi-x)P(d\xi)\right]-\inf_{\hat{u}\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)}\left[\hat{u}(x)+\int_{\xi-x\in[a,b]}\hat{u}(\xi-x)P(d\xi)\right]\right\|\\
&\leq\sup_{x\in X}\left\|\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\left[u(x)+\int_{{\rm I\!R}}u(\xi-x)P(d\xi)\right]-\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]\right\|\\
&\quad+\sup_{x\in X}\left\|\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]-\inf_{\hat{u}\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)}\left[\hat{u}(x)+\int_{\xi-x\in[a,b]}\hat{u}(\xi-x)P(d\xi)\right]\right\|.
\end{aligned}
$$

$$
\begin{aligned}
&\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\left[u(x)+\int_{{\rm I\!R}}u(\xi-x)P(d\xi)\right]-\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]\\
&\leq\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\left\|\int_{{\rm I\!R}}u(\xi-x)P(d\xi)-\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right\|+\|u(x)-\tilde{u}(x)\|\right]\\
&\leq\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\Bigg[\left\|\int_{\xi-x\in[a,b]}u(\xi-x)P(d\xi)-\int_{\xi-x\in[a,b]}\tilde{u}(\xi-x)P(d\xi)\right\|\\
&\qquad\qquad\qquad\qquad\qquad\qquad\qquad+\|u(x)-\tilde{u}(x)\|+\frac{2\epsilon}{3}\Bigg]\\
&\leq\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\sup_{t\in[a,b]}\|u(t)-\tilde{u}(t)\|+\|u(x)-\tilde{u}(x)\|+\frac{2\epsilon}{3}\right]\\
&\leq\inf_{u\in\mathbb{B}_{\infty}(u^{0},r)}\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\mathsf{d\kern-0.70007ptl}_{\mathscr{G}_{I}}(u,\tilde{u})+\|u(x)-\tilde{u}(x)\|+\frac{2\epsilon}{3}\right].
\end{aligned}
$$

$$
\begin{aligned}
&\sup_{x\in X}\Bigg\|\inf_{\hat{u}\in\mathbb{B}_{[a,b]}(u^{0}_{\rm truc},r)}\left[\hat{u}(x)+\int_{\xi-x\in[a,b]}\hat{u}(\xi-x)P(d\xi)\right]\\
&\qquad\qquad-\inf_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\left[\tilde{u}(x)+\int_{{\rm I\!R}}\tilde{u}(\xi-x)P(d\xi)\right]\Bigg\|\\
&\leq\sup_{\tilde{u}\in\tilde{\mathbb{B}}_{[a,b]}(u^{0},r+\epsilon)}\big(\|\tilde{u}(a)\|P((-\infty,a))+\|\tilde{u}(b)\|P((b,+\infty))\big)\leq\epsilon/3.
\end{aligned}
\tag{5.67}
$$