
Sensitivity of multiperiod optimization problems with respect to the adapted Wasserstein distance

Daniel Bartl
Faculty of Mathematics
University of Vienna, Austria
[email protected]
and Johannes Wiesel
Department of Statistics
Columbia University
[email protected]
Abstract.

We analyze the effect of small changes in the underlying probabilistic model on the value of multiperiod stochastic optimization problems and optimal stopping problems. We work in finite discrete time and measure these changes with the adapted Wasserstein distance. We prove explicit first-order approximations for both problems. Expected utility maximization is discussed as a special case.

Keywords: robust multiperiod stochastic optimization, sensitivity analysis, (adapted) Wasserstein distance, optimal stopping
Both authors thank Mathias Beiglböck, Yifan Jiang, and Jan Obłój for helpful discussions and two referees for a careful reading. DB acknowledges support from Austrian Science Fund (FWF) through projects ESP-31N and P34743. JW acknowledges support from NSF grant DMS-2205534.

1. Introduction

Consider a (real-valued) discrete-time stochastic process X=(X_{t})_{t=1}^{T} whose probabilistic behavior is governed by a reference model P. Typically, such a model could describe the evolution of the stochastic process in an idealized probabilistic setting, as is customary in mathematical finance, or it could be derived from historical observations, as is a common assumption in statistics and machine learning. In both cases, one expects P to merely approximate the true but unknown model. Consequently, an important question pertains to the effect that (small) misspecifications of P have on quantities of interest in these areas. In this note, we analyze this question in two fundamental instances: optimal stopping problems and convex multiperiod stochastic optimization problems. For simplicity we focus on the latter in this introduction: consider

v(P):=\inf_{a\text{ admissible control}}E_{P}[f(X,a(X))],

where f\colon\mathbb{R}^{T}\times\mathbb{R}^{T}\to\mathbb{R} is convex in the control variable (i.e., its second argument). The admissible controls are the (uniformly bounded) predictable functions a=(a_{t})_{t=1}^{T}, i.e., a_{t}(X) only depends on X_{1},\dots,X_{t-1}. For concreteness, let us mention that utility maximization—an essential problem in mathematical finance—falls into this framework by setting

f(x,a):=-U\Big{(}g(x)-\sum_{t=1}^{T}a_{t}(x_{t}-x_{t-1})\Big{)},

where U\colon\mathbb{R}\to\mathbb{R} is a concave utility function, g\colon\mathbb{R}^{T}\to\mathbb{R} is a payoff function and X_{0}\in\mathbb{R} is a fixed initial value—see Example 2.6 for more details.

The question of how changes of the model P influence the value v(P) clearly depends on the chosen distance between models. In order to answer it in a generic way (i.e., without restricting to parametric models), one first needs to choose a suitable metric on the laws of stochastic processes \mathcal{P}(\mathbb{R}^{T}). A crucial observation, which has appeared in different contexts and goes back at least to Aldous (1981); Hellwig (1996); Pflug (2010); Pflug and Pichler (2012), is that any metric compatible with weak convergence (and also variants thereof that account for integrability, such as the Wasserstein distance) is too coarse to guarantee continuity of the map P\mapsto v(P) in general. Roughly put, the reason is that two processes can have very similar laws but completely different filtrations, hence completely different sets of admissible controls. This fact has been rediscovered several times during the last decades, and researchers from different fields have introduced modifications of the weak topology that guarantee such continuity properties of P\mapsto v(P); we refer to Backhoff-Veraguas et al. (2020b) for detailed historical references. Strikingly, all these different modifications of the weak topology turn out to be equivalent to the so-called weak adapted topology: this is the coarsest topology that makes multiperiod optimization problems continuous, see Backhoff-Veraguas et al. (2020b); Bartl et al. (2021a).

With the choice of topology settled, the next question pertains to the choice of a suitable distance. This is already relevant in a one-period framework, where the weak and weak adapted topologies coincide. Recent research shows that the Wasserstein distance, which metrizes the weak topology, is (perhaps surprisingly) powerful and versatile. Analogously, the adapted Wasserstein distance \mathcal{AW}_{p} (see Section 2 for the definition) metrizes the weak adapted topology, and Pflug (2010); Backhoff-Veraguas et al. (2020a) show that the multiperiod optimization problems considered in this note are Lipschitz-continuous w.r.t. \mathcal{AW}_{p}. However, the Lipschitz constants depend on global continuity parameters of f and are thus far from being sharp in general. Moreover, the exact computation of the (worst case) value of v(Q), where Q is in a neighbourhood of P, requires solving an infinite-dimensional optimization problem, which does not admit explicit solutions in general. Both of these issues already occur in a one-period setting—despite the results of, e.g., Blanchet and Murthy (2019); Bartl et al. (2019), which relate this infinite-dimensional optimization problem to a simpler dual problem. In conclusion, computing the error

\mathcal{E}(r):=\sup_{\mathcal{AW}_{p}(P,Q)\leq r}v(Q)-v(P)

exactly is only possible in a few (arguably degenerate) cases.

In this note we address both issues by extending ideas of Bartl et al. (2021b); Oblój and Wiesel (2021) from a one-period setting to a multiperiod setting. The key insight of these works is that in a one-period setting, computing first-order approximations for \mathcal{E}(r) is virtually always possible, while obtaining exact expressions might be infeasible in many cases. Our results go hand in hand with those of Bartl et al. (2021b), and we obtain explicit closed-form expressions for \mathcal{E}^{\prime}(0), which have intuitive interpretations. For instance, we show in Theorem 2.4 that, under mild integrability and differentiability assumptions,

\sup_{Q\,:\,\mathcal{AW}_{p}(P,Q)\leq r}v(Q)=v(P)+r\cdot\Big{(}\sum_{t=1}^{T}E_{P}\big{[}\big{|}E_{P}\big{[}\partial_{x_{t}}f(X,a^{\ast}(X))\,\big{|}\,\mathcal{F}_{t}^{X}\big{]}\big{|}^{\frac{p}{p-1}}\big{]}\Big{)}^{\frac{p-1}{p}}+o(r)

holds as r\downarrow 0, where \partial_{x_{t}}f(x,a) is the partial derivative with respect to the t-th coordinate of x, \mathcal{F}^{X}_{t}=\sigma(X_{1},\dots,X_{t}), a^{\ast} is the unique optimizer for v(P) and o denotes the Landau symbol. In the case of utility maximization with p=q=2 (and g\equiv 0 for simplicity), the first-order correction term is essentially the expected quadratic variation of a^{\ast}, not computed under P itself, but distorted by the conditioned Radon-Nikodym density of an equivalent martingale measure w.r.t. P—see Example 2.6 for details.

Investigating robustness of optimization problems in varying formulations is a recurring theme in the optimization literature; we refer to Rahimian and Mehrotra (2019) and the references therein for an overview. In the context of mathematical finance, representing distributional uncertainty through Wasserstein neighbourhoods goes back (at least) to Pflug and Wozabal (2007) and has seen a spike in recent research activity, leading to many impressive developments, see, e.g., the duality results in Gao and Kleywegt (2016); Blanchet and Murthy (2019); Kuhn et al. (2019); Bartl et al. (2019) and applications in mathematical finance Blanchet et al. (2021), machine learning and statistics Shafieezadeh-Abadeh et al. (2019); Blanchet et al. (2020). Our theoretical results are directly linked to Acciaio and Hou (2022); Backhoff et al. (2022), which characterize the speed of convergence between the true and the (modified) empirical measure in the adapted Wasserstein distance, and to new developments on computationally efficient relaxations of optimal transport problems, see Eckstein and Pammer (2022). For completeness, we mention that other notions of distance have been used to model distributional uncertainty, see, e.g., Lam (2016, 2018); Calafiore (2007) in the context of operations research, Huber (2011); Lindsay (1994) in the context of statistics, and Herrmann and Muhle-Karbe (2017); Hobson (1998); Karoui et al. (1998) in the context of mathematical finance.

2. Main results

2.1. Preliminaries

We start by setting up notation. Let T\in\mathbb{N}, let \mathbb{R}^{T} be the path space of a stochastic process in finite discrete time, and let \mathcal{P}_{p}({\mathbb{R}}^{T}) denote the set of all Borel probability measures on {\mathbb{R}}^{T} with finite p-th moment. Throughout this article, X\colon{\mathbb{R}}^{T}\to{\mathbb{R}}^{T} is the identity (i.e., the canonical process) and X,Y\colon{\mathbb{R}}^{T}\times{\mathbb{R}}^{T}\to{\mathbb{R}}^{T} denote the projections onto the first and second coordinate, respectively. The filtration generated by X is denoted by (\mathcal{F}^{X}_{t})_{t=0}^{T}, i.e., \mathcal{F}_{t}^{X}:=\sigma(X_{s}:s\leq t) and {\mathcal{F}}_{0}^{X}:=\{\emptyset,{\mathbb{R}}^{T}\}. Sometimes we write (\mathcal{F}^{X,Y}_{t})_{t=1}^{T} for the filtration generated by the process (X_{t},Y_{t})_{t=1}^{T}.

For a function f\colon{\mathbb{R}}^{T}\times{\mathbb{R}}^{T}\to{\mathbb{R}} we write \partial_{x_{t}}f for the partial derivative of f in the t-th coordinate of x, that is,

\partial_{x_{t}}f(x,a)=\lim_{\varepsilon\downarrow 0}\frac{1}{\varepsilon}(f(x+\varepsilon e_{t},a)-f(x,a)),

where e_{t} is the t-th unit vector; we write \nabla_{a}f for the gradient in a and \nabla_{a}^{2}f for the Hessian in a. We adopt the same notation for functions f\colon{\mathbb{R}}^{T}\to{\mathbb{R}} or f\colon{\mathbb{R}}^{T}\times\{1,\dots,T\}\to{\mathbb{R}} and write \partial_{x_{t}}f for the partial derivative of f in the t-th coordinate of x\in{\mathbb{R}}^{T}. For univariate functions \ell\colon{\mathbb{R}}\to{\mathbb{R}} we simply write \ell^{\prime},\ell^{\prime\prime} for the first and second derivatives.

Definition 2.1.

Let P,Q\in\mathcal{P}_{p}({\mathbb{R}}^{T}). A Borel probability measure \pi on {\mathbb{R}}^{T}\times{\mathbb{R}}^{T} is called a coupling (between P and Q) if its first marginal distribution is P and its second one is Q. A coupling \pi is called causal if

(2.1) \pi((Y_{1},\dots,Y_{t})\in A\,|\,X_{1},\dots,X_{T})=\pi((Y_{1},\dots,Y_{t})\in A\,|\,X_{1},\dots,X_{t})

\pi-almost surely for all Borel sets A\subseteq{\mathbb{R}}^{t} and all 1\leq t\leq T; a causal coupling is called bicausal if (2.1) also holds with the roles of X and Y reversed.

Phrased differently, (2.1) means that under \pi, conditionally on the ‘past’ X_{1},\dots,X_{t}, the ‘future’ X_{t+1},\dots,X_{T} is independent of Y_{1},\dots,Y_{t}; see, e.g., (Bartl et al., 2021a, Lemma 2.2) for this and further equivalent characterizations of (bi-)causality. It is also instructive to analyze condition (2.1) in the case of a Monge coupling, i.e., when there is a transport map \psi\colon{\mathbb{R}}^{T}\to{\mathbb{R}}^{T} such that Y=\psi(X)=(\psi_{t}(X))_{t=1}^{T} \pi-almost surely. Indeed, in this case (2.1) simply means that \psi_{t} needs to be \mathcal{F}^{X}_{t}-measurable.

Fix p\in(1,\infty) and define the adapted Wasserstein distance on \mathcal{P}_{p}({\mathbb{R}}^{T}) by

(2.2) \mathcal{AW}_{p}(P,Q):=\inf_{\pi}\Big{(}\sum_{t=1}^{T}E_{\pi}[|X_{t}-Y_{t}|^{p}]\Big{)}^{1/p},

where the infimum is taken over all bicausal couplings \pi between P and Q. Set

B_{r}(P):=\{Q\in\mathcal{P}_{p}({\mathbb{R}}^{T}):\mathcal{AW}_{p}(P,Q)\leq r\}

for r\geq 0, and denote by q:=p/(p-1) the conjugate Hölder exponent of p.
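
For intuition, the distance (2.2) can be evaluated explicitly for finitely supported laws via its well-known backward (nested) formulation, see, e.g., Pflug and Pichler (2012); Backhoff-Veraguas et al. (2020a): for T=2, one first computes conditional Wasserstein distances at time 2 and then solves an optimal transport problem at time 1 whose cost includes these inner values. The following Python sketch (our own illustration; the function names, the LP-based transport solver and the toy example below are not taken from the paper) implements this for p=2.

import numpy as np
from scipy.optimize import linprog

def ot_cost(pw, qw, cost):
    """Exact discrete optimal transport cost via linear programming."""
    m, n = cost.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0      # row sums equal pw
    for j in range(n):
        A_eq[m + j, j::n] = 1.0               # column sums equal qw
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([pw, qw]),
                  bounds=(0, None), method="highs")
    return res.fun

def disintegrate(law):
    """Split a law {(x1, x2): prob} into its first marginal and the kernels x1 -> law of x2."""
    marg, kernel = {}, {}
    for (x1, x2), w in law.items():
        marg[x1] = marg.get(x1, 0.0) + w
        kernel.setdefault(x1, {})
        kernel[x1][x2] = kernel[x1].get(x2, 0.0) + w
    for x1, k in kernel.items():
        for x2 in k:
            k[x2] /= marg[x1]
    return marg, kernel

def adapted_w2_squared(P, Q):
    """AW_2^2 between two-period finitely supported laws via the nested formulation."""
    margP, kerP = disintegrate(P)
    margQ, kerQ = disintegrate(Q)
    xs, ys = sorted(margP), sorted(margQ)
    outer = np.zeros((len(xs), len(ys)))
    for i, x1 in enumerate(xs):
        for j, y1 in enumerate(ys):
            xs2, ys2 = sorted(kerP[x1]), sorted(kerQ[y1])
            inner = np.array([[(a - b) ** 2 for b in ys2] for a in xs2])
            pw = np.array([kerP[x1][a] for a in xs2])
            qw = np.array([kerQ[y1][b] for b in ys2])
            outer[i, j] = (x1 - y1) ** 2 + ot_cost(pw, qw, inner)
    pw = np.array([margP[x] for x in xs])
    qw = np.array([margQ[y] for y in ys])
    return ot_cost(pw, qw, outer)

# toy two-period laws, given as {(x1, x2): probability}; here the conditional laws
# at time 2 coincide, so the adapted distance is driven by the time-1 marginals only
P = {(0.0, 1.0): 0.5, (0.0, -1.0): 0.5}
Q = {(0.1, 1.0): 0.25, (0.1, -1.0): 0.25, (-0.1, 1.0): 0.25, (-0.1, -1.0): 0.25}
print("AW_2(P, Q) =", adapted_w2_squared(P, Q) ** 0.5)    # 0.1 in this example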

2.2. The uncontrolled case

We are now in a position to state the main results of the paper. We start with a simplified case, where f depends on X only and there are no controls. The sensitivities of the stochastic optimization and optimal stopping problems in Sections 2.3 and 2.4, respectively, can be seen as natural extensions of this result; indeed, the sensitivity computed in Theorem 2.2 already exhibits the structure common to all the results presented here.

Theorem 2.2.

Let f\colon{\mathbb{R}}^{T}\to{\mathbb{R}} be continuously differentiable and assume that there exists c>0 such that

\sum_{s=1}^{T}|\partial_{x_{s}}f(x)|\leq c\Big{(}1+\sum_{s=1}^{T}|x_{s}|^{p-1}\Big{)}

for every x\in{\mathbb{R}}^{T}. Then, as r\downarrow 0,

\sup_{Q\in B_{r}(P)}E_{Q}[f(X)]=E_{P}[f(X)]+r\cdot\Big{(}\sum_{t=1}^{T}E_{P}\Big{[}\big{|}E_{P}[\partial_{x_{t}}f(X)|\mathcal{F}^{X}_{t}]\big{|}^{q}\Big{]}\Big{)}^{1/q}+o(r).
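
Whenever the conditional expectations E_{P}[\partial_{x_{t}}f(X)|\mathcal{F}^{X}_{t}] are accessible, the first-order correction term in Theorem 2.2 is straightforward to estimate by simulation. The following minimal Python sketch does this for p=q=2 in a toy model of our own choosing (not taken from the paper): X_{1}\sim N(0,1), X_{2}=X_{1}+N(0,1) and f(x)=x_{1}x_{2}, for which the conditional expectations are explicit and the correction term equals \sqrt{2}.

import numpy as np

rng = np.random.default_rng(0)
n, r, p = 200_000, 0.05, 2.0
q = p / (p - 1)

# toy reference model P: X1 ~ N(0,1), X2 = X1 + N(0,1)
X1 = rng.standard_normal(n)
X2 = X1 + rng.standard_normal(n)

# f(x) = x1 * x2, so d_{x1} f = x2 and d_{x2} f = x1.  In this model
#   E_P[d_{x1} f | F_1^X] = E_P[X2 | X1] = X1   and   E_P[d_{x2} f | F_2^X] = X1.
F1, F2 = X1, X1

correction = (np.mean(np.abs(F1) ** q) + np.mean(np.abs(F2) ** q)) ** (1 / q)
print("E_P[f(X)]                       ", np.mean(X1 * X2))            # ~ 1
print("first-order correction term     ", correction)                  # ~ sqrt(2)
print("sup over B_r(P), to first order ", np.mean(X1 * X2) + r * correction)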

2.3. Multiperiod stochastic optimization problems

Fix a constant L throughout this section, and denote by \mathcal{A} the set of all predictable controls bounded by L, i.e., every a=(a_{t})_{t=1}^{T}\in\mathcal{A} is such that a_{t}\colon\mathbb{R}^{T}\to\mathbb{R} only depends on x_{1},\dots,x_{t-1} (with the convention that a_{1} is deterministic) and satisfies |a_{t}|\leq L. Recall that

v(Q)=\inf_{a\in\mathcal{A}}E_{Q}[f(X,a(X))],

where f\colon\mathbb{R}^{T}\times\mathbb{R}^{T}\to\mathbb{R} is assumed to be convex in the control variable (i.e., its second argument).

Assumption 2.3.

For every x\in\mathbb{R}^{T}, f(x,\cdot) is twice continuously differentiable and strongly convex in the sense that \nabla^{2}_{a}f(X,\cdot)\succ\varepsilon(X)I on [-L,L]^{T}, where I is the identity matrix and P(\varepsilon(X)>0)=1. (Here, for two T\times T-matrices A and B, we write A\succ B if A-B is positive semidefinite, that is, \langle Az,z\rangle\geq\langle Bz,z\rangle for all z\in{\mathbb{R}}^{T}.)

Moreover, f(\cdot,a) is differentiable for every a\in\mathbb{R}^{T}, its partial derivatives \partial_{x_{t}}f are jointly continuous, and there is a constant c>0 such that

\sum_{s=1}^{T}|\partial_{x_{s}}f(x,a)|\leq c\cdot\Big{(}1+\sum_{s=1}^{T}|x_{s}|^{p-1}\Big{)}

for every x\in\mathbb{R}^{T} and a\in[-L,L]^{T}.

Theorem 2.4.

If Assumption 2.3 holds true, then there exists exactly one a^{\ast}\in\mathcal{A} such that v(P)=E_{P}[f(X,a^{\ast}(X))]. Furthermore, as r\downarrow 0,

\sup_{Q\in B_{r}(P)}v(Q)=v(P)+r\cdot\Big{(}\sum_{t=1}^{T}E_{P}\big{[}|E_{P}[\partial_{x_{t}}f(X,a^{\ast}(X))|\mathcal{F}_{t}^{X}]|^{q}\big{]}\Big{)}^{1/q}+o(r).
Remark 2.5.

The restriction to controls that are uniformly bounded (i.e., satisfy |a_{t}(x)|\leq L) is necessary to guarantee continuity of Q\mapsto v(Q) in general. This can be seen easily in the utility maximization example below—even when restricting to models that satisfy a no-arbitrage condition, see, e.g., (Backhoff-Veraguas et al., 2020a, Remark 5.3).

Example 2.6.

Let \ell\colon\mathbb{R}\to\mathbb{R} be a convex loss function, i.e., \ell is bounded from below and convex. Moreover, let g\colon\mathbb{R}^{T}\to\mathbb{R} be (the negative of) a payoff function and consider the problem

u(P):=\inf_{a\in\mathcal{A}}E_{P}\Big{[}\ell\Big{(}g(X)+\sum_{t=1}^{T}a_{t}(X)(X_{t}-X_{t-1})\Big{)}\Big{]},

where X_{0}\in\mathbb{R} is a fixed value. As discussed in the introduction, u(P) corresponds to the utility maximization problem with payoff g.

Suppose that \ell is twice continuously differentiable with |\ell^{\prime}(u)|\leq c(1+|u|^{p-1}) and \ell^{\prime\prime}>0, that g is continuously differentiable with bounded derivative, and that P(X_{t-1}=X_{t})=0 for all t. Then Assumption 2.3 is satisfied.

The assumption that P(X_{t+1}=X_{t})=0 is used to prove strong convexity in the sense of Assumption 2.3. In the present one-dimensional setting, it simply means that the stock price does not stay constant from time t to t+1 with positive probability. It is satisfied, for instance, if X is a binomial tree under P, or if P has a density with respect to the Lebesgue measure—in particular, if X is a discretized SDE with non-zero volatility. Moreover, the assumption that the derivative of g is bounded can be relaxed at the price of restricting to \ell with a slower growth.

Corollary 2.7.

In the setting of Example 2.6: let a^{\ast} be the unique optimizer for u(P), set a^{\ast}_{T+1}:=0,

Z:=g(X)+\sum_{t=1}^{T}a^{\ast}_{t}(X_{t}-X_{t-1}),

and

V:=\Big{(}\sum_{t=1}^{T}E_{P}\Big{[}\Big{|}(a^{\ast}_{t+1}-a^{\ast}_{t})\cdot E_{P}\big{[}\ell^{\prime}(Z)|\mathcal{F}_{t}^{X}\big{]}-E_{P}\big{[}\ell^{\prime}(Z)\partial_{x_{t}}g(X)|\mathcal{F}_{t}^{X}\big{]}\Big{|}^{q}\Big{]}\Big{)}^{1/q}.

Then

\sup_{Q\in B_{r}(P)}u(Q)=u(P)+r\cdot V+o(r)\qquad\text{as }r\downarrow 0.

Note that for p=q=2 and g=0, V is essentially the expected quadratic variation of a^{\ast}, not computed under P itself, but distorted by the conditioned Radon-Nikodym density of an equivalent martingale measure w.r.t. P.
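
To make this remark precise (this is our reading of it, under the additional assumptions stated below, not a verbatim statement from the paper), specializing the definition of V to p=q=2 and g\equiv 0 gives

V=\Big{(}\sum_{t=1}^{T}E_{P}\big{[}(a^{\ast}_{t+1}-a^{\ast}_{t})^{2}\,E_{P}[\ell^{\prime}(Z)|\mathcal{F}_{t}^{X}]^{2}\big{]}\Big{)}^{1/2}.

If, in addition, \ell^{\prime}>0 and the optimizer a^{\ast} is interior, so that the first-order conditions E_{P}[\ell^{\prime}(Z)(X_{t}-X_{t-1})|\mathcal{F}_{t-1}^{X}]=0 hold, then, at least formally, D:=\ell^{\prime}(Z)/E_{P}[\ell^{\prime}(Z)] is the density of an equivalent martingale measure and

V=E_{P}[\ell^{\prime}(Z)]\Big{(}\sum_{t=1}^{T}E_{P}\big{[}(a^{\ast}_{t+1}-a^{\ast}_{t})^{2}\,E_{P}[D|\mathcal{F}_{t}^{X}]^{2}\big{]}\Big{)}^{1/2},

i.e., the quadratic variation of a^{\ast} with each squared increment weighted by E_{P}[D|\mathcal{F}_{t}^{X}]^{2}.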

2.4. Optimal stopping problems

Let f\colon{\mathbb{R}}^{T}\times\{1,\dots,T\}\to\mathbb{R} be such that f(X,t) is {\mathcal{F}}_{t}^{X}-measurable for t=1,\dots,T and consider

s(Q):=\inf_{\tau\in\mathrm{ST}}E_{Q}[f(X,\tau)],

where \mathrm{ST} refers to the set of all bounded stopping times with respect to the canonical filtration, i.e., \tau\in\mathrm{ST} if \tau\colon{\mathbb{R}}^{T}\to\{1,\dots,T\} is such that \{\tau\leq t\}\in\mathcal{F}_{t}^{X} for every t.

Theorem 2.8.

Assume that f(\cdot,t) is continuously differentiable for every t=1,\dots,T and that there is a constant c>0 such that

\sum_{s=1}^{T}|\partial_{x_{s}}f(x,t)|\leq c\Big{(}1+\sum_{s=1}^{T}|x_{s}|^{p-1}\Big{)}

for every x\in{\mathbb{R}}^{T} and t=1,\dots,T. Furthermore, assume that there exists exactly one optimal stopping time \tau^{\ast} for s(P). Then, as r\downarrow 0,

\sup_{Q\in B_{r}(P)}s(Q)=s(P)+r\cdot\left(\sum_{t=1}^{T}E_{P}\left[\left|E_{P}\left[\partial_{x_{t}}f(X,\tau^{\ast})|{\mathcal{F}}_{t}^{X}\right]\right|^{q}\right]\right)^{1/q}+o(r).
Example 2.9.

It is instructive to consider Theorem 2.8 in the special case where f is Markovian, i.e., there is a function g\colon{\mathbb{R}}\to{\mathbb{R}} such that f(x,t)=g(x_{t}) for all t and x. Indeed, in this case, the first-order correction term simplifies to E_{P}[|g^{\prime}(X_{\tau^{\ast}})|^{q}]^{1/q}.
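
The computation behind this simplification is short: since \partial_{x_{t}}f(x,s)=g^{\prime}(x_{t})\mathbf{1}_{\{s=t\}} and \{\tau^{\ast}=t\}\in\mathcal{F}_{t}^{X},

\sum_{t=1}^{T}E_{P}\big{[}\big{|}E_{P}[\partial_{x_{t}}f(X,\tau^{\ast})|\mathcal{F}_{t}^{X}]\big{|}^{q}\big{]}=\sum_{t=1}^{T}E_{P}\big{[}|g^{\prime}(X_{t})|^{q}\mathbf{1}_{\{\tau^{\ast}=t\}}\big{]}=E_{P}\big{[}|g^{\prime}(X_{\tau^{\ast}})|^{q}\big{]}.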

2.5. Extensions and open questions

To the best of our knowledge, this is the first work addressing the nonparametric sensitivity of multiperiod optimization problems (w.r.t. the adapted Wasserstein distance). Below we identify possible extensions, which are outside the scope of the current article. We plan to address these in future work.

(1) Our results extend to {\mathbb{R}}^{d}-valued stochastic processes and \mathcal{AW}_{p} defined with an arbitrary norm on {\mathbb{R}}^{d}. Similarly, the set of predictable controls in Theorem 2.4 can be chosen to be any compact convex subset of {\mathbb{R}}^{d}. The modifications needed are in line with Bartl et al. (2021b).

(2) A natural extension of our results from a financial perspective would be the analysis of sensitivities for robust option pricing: let P be a martingale law (i.e., X is an (\mathcal{F}_{t}^{X})_{t=1}^{T}-martingale under P) and consider

\sup_{Q\in B_{r}(P):\,Q\text{ is a martingale law}}E_{Q}[f(X)].

In one-period models (T=1) this was carried out in Bartl et al. (2021b); Nendel and Sgarabottolo (2022). In a similar manner, it is natural to analyze the sensitivity of robust American option pricing by restricting to martingale laws in Theorem 2.8.

(3) There are certain natural examples for f that do not satisfy our regularity assumptions, e.g., in mathematical finance. In a one-period framework, the regularity of f can be relaxed systematically, see Bartl et al. (2021b); Nendel and Sgarabottolo (2022), and it is interesting to investigate whether this is the case here as well.

(4) In some examples, the restriction to bounded controls is automatic, see, e.g., Rásonyi and Stettner (2005). For instance, in the setting of Example 2.6 with g=0, we suspect that arguments similar to those used in Rásonyi and Stettner (2005) might show that a “conditional full support condition” on P is sufficient to obtain the first-order approximation with unbounded strategies.

(5) We suspect that the assumption on the uniqueness of the optimizer in Theorems 2.4 and 2.8 can be relaxed. Indeed, at least in a two-period setting, modifications of the arguments presented here can cover the general case in Theorem 2.8.

(6) Motivated by the literature on distributionally robust optimization problems cited in the introduction, one could also consider min-max problems of the form

\inf_{a\in\mathcal{A}}\sup_{Q\in B_{r}(P)}E_{Q}[f(X,a(X))].

An important observation is that most arguments in the analysis of such problems (in the one-period setting) heavily rely on (convexity and) compactness of B_{r}(P); both properties fail to hold true in multiple periods. It was recently shown in Bartl et al. (2021a) that these can be recovered by passing to an appropriate factor space of processes together with general filtrations.

(7) The present methods can be extended to cover functionals that depend not only on P but also on its disintegrations—as is common in weak optimal transport (see, e.g., Gozlan et al. (2017)). As an example, consider T=2 and J(Q):=E_{Q}[f(X_{1})+g(Q_{X_{1}})], where the functions f and g are suitably (Fréchet) differentiable. Using the same arguments as in the proof of Theorem 2.2, one can show that the first-order correction term equals (E_{P}[|f^{\prime}(X_{1})|^{q}+|E_{P}[g^{\prime}(P_{X_{1}})]|^{q}])^{1/q}. (When completing a first draft of this paper, we learned that similar results have been established by Jiang in independent research.)

3. Proofs

3.1. Proof of Theorem 2.2

We need the following technical lemma, which essentially states that causal couplings can be approximated by bicausal ones with similar marginals. For a Borel probability measure \pi on {\mathbb{R}}^{T}\times{\mathbb{R}}^{T} and a Borel mapping \phi from {\mathbb{R}}^{T}\times{\mathbb{R}}^{T} to another Polish space, \phi_{\ast}\pi denotes the push-forward of the measure \pi under \phi.

Lemma 3.1.

Let P,Q\in\mathcal{P}_{p}({\mathbb{R}}^{T}) and let \pi be a causal coupling between P and Q. For each \delta>0 there exists Y^{\delta}\colon{\mathbb{R}}^{T}\times{\mathbb{R}}^{T}\to{\mathbb{R}}^{T} such that Y^{\delta}_{t} is {\mathcal{F}}^{X,Y}_{t}-measurable, X_{t} is \sigma(Y^{\delta}_{t})-measurable, and |Y_{t}^{\delta}-Y_{t}|\leq\delta for every 1\leq t\leq T.

In particular, \pi^{\delta}:=(X,Y^{\delta})_{\ast}\pi is a bicausal coupling between P and Q^{\delta}:=Y^{\delta}_{\ast}\pi.

Proof.

For \delta>0 we consider the Borel mappings

\psi_{\delta}\colon\mathbb{R}\to(0,\delta)\quad\text{and}\quad\phi_{\delta}\colon\mathbb{R}\to\delta\mathbb{Z}:=\{\delta k:k\in\mathbb{Z}\},

where \psi_{\delta} is a (Borel-)isomorphism and \phi_{\delta}(x):=\max\{\delta k:\delta k\leq x\}. For t=1,\dots,T, set

Y_{t}^{\delta}:=\phi_{\delta}(Y_{t})+\psi_{\delta}(X_{t}).

By definition X_{t} is \sigma(Y_{t}^{\delta})-measurable, Y^{\delta}_{t} is \mathcal{F}^{X,Y}_{t}-measurable, and |Y_{t}^{\delta}-Y_{t}|\leq\delta. It remains to note that the bicausality constraints (2.1) are satisfied: causality of \pi^{\delta} is inherited from the causality of \pi because each Y^{\delta}_{t} is \mathcal{F}^{X,Y}_{t}-measurable, while the reversed condition holds because X_{1},\dots,X_{t} are already determined by Y^{\delta}_{1},\dots,Y^{\delta}_{t}. ∎
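
The construction is easy to experiment with numerically. In the following Python sketch we pick, purely for illustration, a scaled logistic map as the Borel isomorphism \psi_{\delta} (any Borel isomorphism onto (0,\delta) works) and verify both properties of the lemma on a few simulated paths; none of the concrete choices below appear in the paper.

import numpy as np

delta = 0.1

def psi(x):
    """A concrete Borel isomorphism R -> (0, delta): a scaled logistic map."""
    return delta / (1.0 + np.exp(-x))

def psi_inv(u):
    """Inverse of psi, mapping (0, delta) back to R."""
    return np.log(u / (delta - u))

def phi(y):
    """Round down to the grid delta * Z."""
    return delta * np.floor(y / delta)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))                 # five sample paths with T = 3
Y = X + 0.3 * rng.standard_normal((5, 3))       # paths of the second process

Y_delta = phi(Y) + psi(X)

# |Y_t^delta - Y_t| <= delta for all paths and times ...
print(bool(np.all(np.abs(Y_delta - Y) <= delta)))
# ... and X_t can be read off from Y_t^delta alone (sigma(Y_t^delta)-measurability of X_t)
X_recovered = psi_inv(Y_delta - phi(Y_delta))
print(float(np.max(np.abs(X_recovered - X))))   # numerically zero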

The following proof serves as a blueprint for the proofs of Theorems 2.4 and 2.8.

Proof of Theorem 2.2.

To simplify notation, set

F_{t}:=E_{P}[\partial_{x_{t}}f(X)|\mathcal{F}^{X}_{t}]\quad\text{for }t=1,\dots,T.

We first prove the upper bound, that is

(3.1) \limsup_{r\to 0}\frac{1}{r}\Big{(}\sup_{Q\in B_{r}(P)}E_{Q}[f(X)]-E_{P}[f(X)]\Big{)}\leq\Big{(}\sum_{t=1}^{T}E_{P}[|F_{t}|^{q}]\Big{)}^{1/q}.

To that end, for any r>0, let Q=Q^{r}\in B_{r}(P) be such that

E_{Q}[f(X)]\geq\sup_{R\in B_{r}(P)}E_{R}[f(X)]-o(r),

and let \pi=\pi^{r} be an (almost) optimal bicausal coupling between P and Q, i.e.,

\Big{(}\sum_{t=1}^{T}E_{\pi}[|X_{t}-Y_{t}|^{p}]\Big{)}^{1/p}\leq\mathcal{AW}_{p}(P,Q)+o(r)\leq r+o(r).

The fundamental theorem of calculus and Fubini’s theorem imply

E_{Q}[f(X)]-E_{P}[f(X)]=E_{\pi}[f(Y)-f(X)]=\sum_{t=1}^{T}\int_{0}^{1}E_{\pi}[\partial_{x_{t}}f(X+\lambda(Y-X))\cdot(Y_{t}-X_{t})]\,d\lambda.

Moreover, by the tower property and Hölder’s inequality,

E_{\pi}[\partial_{x_{t}}f(X+\lambda(Y-X))\cdot(Y_{t}-X_{t})]=E_{\pi}\Big{[}E_{\pi}[\partial_{x_{t}}f(X+\lambda(Y-X))|\mathcal{F}_{t}^{X,Y}]\cdot(Y_{t}-X_{t})\Big{]}\leq E_{\pi}\Big{[}|E_{\pi}[\partial_{x_{t}}f(X+\lambda(Y-X))|\mathcal{F}_{t}^{X,Y}]|^{q}\Big{]}^{1/q}\cdot E_{\pi}[|Y_{t}-X_{t}|^{p}]^{1/p}.

We next claim that, for every \lambda\in[0,1],

(3.2) E_{\pi}\Big{[}\Big{|}E_{\pi}[\partial_{x_{t}}f(X+\lambda(Y-X))|\mathcal{F}_{t}^{X,Y}]\Big{|}^{q}\Big{]}^{1/q}\to E_{P}[|F_{t}|^{q}]^{1/q}

as r\to 0. Indeed, since \pi is bicausal, we have that

E_{\pi}[\partial_{x_{t}}f(X)|\mathcal{F}_{t}^{X,Y}]=E_{P}[\partial_{x_{t}}f(X)|\mathcal{F}_{t}^{X}]=F_{t}

\pi-almost surely, see, e.g., (Bartl et al., 2021a, Lemma 2.2). Therefore, Jensen's inequality shows that

E_{\pi}\Big{[}\Big{|}E_{\pi}[\partial_{x_{t}}f(X+\lambda(Y-X))|\mathcal{F}_{t}^{X,Y}]-F_{t}\Big{|}^{q}\Big{]}\leq E_{\pi}\Big{[}|\partial_{x_{t}}f(X+\lambda(Y-X))-\partial_{x_{t}}f(X)|^{q}\Big{]},

which converges to zero; this follows from the continuity of \partial_{x_{t}}f, since \sum_{t=1}^{T}E_{\pi}[|X_{t}-Y_{t}|^{p}]\to 0, and since |\partial_{x_{t}}f(x)|^{q}\leq\tilde{c}(1+\sum_{s=1}^{T}|x_{s}|^{p}) by the growth assumption and q(p-1)=p. Then (3.2) follows from the triangle inequality.

We conclude that

E_{Q}[f(X)]-E_{P}[f(X)]\leq\sum_{t=1}^{T}\Big{(}E_{P}[|F_{t}|^{q}]^{1/q}+o(1)\Big{)}E_{\pi}[|Y_{t}-X_{t}|^{p}]^{1/p}\leq\Big{(}\sum_{t=1}^{T}E_{P}[|F_{t}|^{q}]+o(1)\Big{)}^{1/q}\Big{(}\sum_{t=1}^{T}E_{\pi}[|Y_{t}-X_{t}|^{p}]\Big{)}^{1/p},

where the second inequality follows from Hölder's inequality between \ell^{q}(\mathbb{R}^{T}) and \ell^{p}(\mathbb{R}^{T}). Recalling that \pi is an almost optimal bicausal coupling between P and Q and that \mathcal{AW}_{p}(P,Q)\leq r, this shows (3.1).

It remains to prove the lower bound, that is,

(3.3) \liminf_{r\to 0}\frac{1}{r}\Big{(}\sup_{Q\in B_{r}(P)}E_{Q}[f(X)]-E_{P}[f(X)]\Big{)}\geq\Big{(}\sum_{t=1}^{T}E_{P}[|F_{t}|^{q}]\Big{)}^{1/q}.

To that end, we first use the duality between \|\cdot\|_{\ell^{q}(\mathbb{R}^{T})} and \|\cdot\|_{\ell^{p}(\mathbb{R}^{T})}, which yields the existence of a\in[0,\infty)^{T} satisfying

\Big{(}\sum_{t=1}^{T}E_{P}[|F_{t}|^{q}]\Big{)}^{1/q}=\sum_{t=1}^{T}E_{P}[|F_{t}|^{q}]^{1/q}a_{t}\quad\text{ and }\quad\sum_{t=1}^{T}a_{t}^{p}=1.

Next we use the duality between \|\cdot\|_{L^{q}(P)} and \|\cdot\|_{L^{p}(P)}, which yields the existence of random variables (Z_{t})_{t=1}^{T} satisfying

E_{P}[|F_{t}|^{q}]^{1/q}a_{t}=E_{P}[F_{t}Z_{t}]\quad\text{and}\quad E_{P}[|Z_{t}|^{p}]^{1/p}=a_{t}

for t=1,\dots,T. Combining both results,

(3.4) \sum_{t=1}^{T}E_{P}[F_{t}Z_{t}]=\Big{(}\sum_{t=1}^{T}E_{P}[|F_{t}|^{q}]\Big{)}^{1/q}\quad\text{and}\quad\sum_{t=1}^{T}E_{P}[|Z_{t}|^{p}]=1.

Note that, since F_{t} is \mathcal{F}^{X}_{t}-measurable, Z_{t} can be chosen \mathcal{F}^{X}_{t}-measurable as well.
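
For concreteness, if not all of the F_{t} vanish P-almost surely (otherwise the right-hand side of (3.3) is zero and the lower bound is trivial, since P\in B_{r}(P)), one admissible choice is the standard L^{p}-L^{q} duality element

Z_{t}:=\frac{\operatorname{sign}(F_{t})\,|F_{t}|^{q-1}}{\big{(}\sum_{s=1}^{T}E_{P}[|F_{s}|^{q}]\big{)}^{1/p}},\qquad t=1,\dots,T,

for which both identities in (3.4) follow by a direct computation using p(q-1)=q.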

At this point, for fixed r>0, we would like to define Q^{r} as the law of X+rZ and \pi=\pi^{r} as the law of (X,X+rZ). Since Z_{t} is \mathcal{F}^{X}_{t}-measurable, \pi is clearly causal. Unfortunately, however, it does not need to be bicausal in general. We thus first apply Lemma 3.1 to P and Q=Q^{r} with \delta=1/n, which yields measures Q^{n}=Q^{r,n} and processes Y^{n}=Y^{r,n} which satisfy the assertion of Lemma 3.1.

Now fix r>0. Since

\mathcal{AW}_{p}(P,Q^{n})\leq\Big{(}\sum_{t=1}^{T}E_{\pi}[|X_{t}-Y_{t}^{n}|^{p}]\Big{)}^{1/p}\leq\Big{(}\sum_{t=1}^{T}(ra_{t})^{p}\Big{)}^{1/p}+\frac{T}{n}=r+\frac{T}{n},

we have that, for every \varepsilon>0,

\sup_{R\in B_{r+\varepsilon}(P)}E_{R}[f(X)]-E_{P}[f(X)]\geq\lim_{n\to\infty}E_{\pi}[f(Y^{n})-f(X)].

Using the fundamental theorem of calculus and Fubini's theorem as before, the fact that Y^{n}_{t} is \mathcal{F}^{X}_{t}-measurable shows that

E_{P}[f(Y^{n})-f(X)]=\sum_{t=1}^{T}\int_{0}^{1}E_{P}\Big{[}\partial_{x_{t}}f(X+\lambda(Y^{n}-X))\cdot(Y^{n}_{t}-X_{t})\Big{]}\,d\lambda=\sum_{t=1}^{T}\int_{0}^{1}E_{P}\Big{[}E_{P}[\partial_{x_{t}}f(X+\lambda(Y^{n}-X))|{\mathcal{F}}_{t}^{X}]\cdot(Y^{n}_{t}-X_{t})\Big{]}\,d\lambda\to\sum_{t=1}^{T}\int_{0}^{1}E_{P}\Big{[}E_{P}[\partial_{x_{t}}f(X+\lambda rZ)|{\mathcal{F}}_{t}^{X}]\cdot rZ_{t}\Big{]}\,d\lambda

as n\to\infty, by the growth assumption and since Y^{n}_{t}-X_{t}\to rZ_{t} in L^{p}(P).

In a final step we let r\to 0. Applying the previous step to \varepsilon=o(r) shows that

\liminf_{r\to 0}\frac{1}{r}\Big{(}\sup_{R\in B_{r}(P)}E_{R}[f(X)]-E_{P}[f(X)]\Big{)}\geq\liminf_{r\to 0}\sum_{t=1}^{T}\int_{0}^{1}E_{P}\big{[}E_{P}[\partial_{x_{t}}f(X+\lambda rZ)|{\mathcal{F}}_{t}^{X}]\cdot Z_{t}\big{]}\,d\lambda=\sum_{t=1}^{T}E_{P}\big{[}E_{P}[\partial_{x_{t}}f(X)|{\mathcal{F}}_{t}^{X}]\cdot Z_{t}\big{]},

where the equality holds by the growth assumption on |\partial_{x_{t}}f|. Recalling the choice of (Z_{t})_{t=1}^{T} (see (3.4)) completes the proof. ∎

3.2. Proof of Theorem 2.4

The proof of Theorem 2.4 has a similar structure to the proof of Theorem 2.2, but some additional arguments are needed to take care of the optimization over a\in\mathcal{A}. Throughout, we work under Assumption 2.3. We start with two auxiliary results.

Lemma 3.2.

Let (Q_{n})_{n\in\mathbb{N}} be such that \mathcal{AW}_{p}(P,Q_{n})\to 0 as n\to\infty. Then v(Q_{n})\to v(P).

Proof.

Let Q\in\mathcal{P}_{p}({\mathbb{R}}^{T}) and let \pi be a bicausal coupling between P and Q. Let \varepsilon>0 be arbitrary and fix a\in\mathcal{A} that satisfies E_{P}[f(X,a(X))]\leq v(P)+\varepsilon. Next define b by

b_{t}:=E_{\pi}[a_{t}(X)|{\mathcal{F}}_{T}^{Y}]\qquad\text{for }t=1,\dots,T.

By bicausality, b_{t} is actually measurable with respect to {\mathcal{F}}_{t-1}^{Y} (see, e.g., (Bartl et al., 2021a, Lemma 2.2)) and clearly |b_{t}|\leq L; thus b\in\mathcal{A}. Moreover, convexity of f(x,\cdot) implies that

v(Q)\leq E_{Q}[f(Y,b(Y))]=E_{\pi}[f(Y,E_{\pi}[a(X)|{\mathcal{F}}_{T}^{Y}])]\leq E_{\pi}[f(Y,a(X))].

The fundamental theorem of calculus and Hölder’s inequality yield

E_{\pi}[f(Y,a(X))]-E_{P}[f(X,a(X))]=\int_{0}^{1}\sum_{t=1}^{T}E_{\pi}\Big{[}\partial_{x_{t}}f(X+\lambda(Y-X),a(X))(Y_{t}-X_{t})\Big{]}\,d\lambda\leq\int_{0}^{1}\sum_{t=1}^{T}E_{\pi}[|\partial_{x_{t}}f(X+\lambda(Y-X),a(X))|^{q}]^{1/q}E_{\pi}[|Y_{t}-X_{t}|^{p}]^{1/p}\,d\lambda.

Using the growth assumption and arguing as in the proof of Theorem 2.2, the last term is at most of order \mathcal{AW}_{p}(P,Q). As \varepsilon was arbitrary, this shows v(Q)-v(P)\leq O(\mathcal{AW}_{p}(P,Q)) (where again O denotes the Landau symbol), and reversing the roles of P and Q completes the proof. ∎

Lemma 3.3.

There exists exactly one a^{\ast}\in\mathcal{A} such that v(P)=E_{P}[f(X,a^{\ast}(X))].

Proof.

This is a standard result. The existence follows from Komlós' lemma, see Komlós (1967), and the uniqueness from strict convexity. ∎

Proof of Theorem 2.4.

Let a^{\ast}\in\mathcal{A} be the unique optimizer of v(P) (see Lemma 3.3) and, for shorthand notation, set F_{t}:=E_{P}[\partial_{x_{t}}f(X,a^{\ast}(X))|\mathcal{F}_{t}^{X}] for t=1,\dots,T.

We first prove the upper bound. We claim that it follows from combining the reasoning in the proof of Theorem 2.2 and Lemma 3.2. Indeed, let Q^{r}\in B_{r}(P) be such that

v(Q^{r})\geq\sup_{Q\in B_{r}(P)}v(Q)-o(r)

and let \pi=\pi^{r} be a bicausal coupling between P and Q^{r} that is (almost) optimal for \mathcal{AW}_{p}(P,Q^{r}). Define b^{r}\in\mathcal{A} by b^{r}:=E_{\pi}[a^{\ast}(X)|{\mathcal{F}}_{T}^{Y}] and use convexity of f(x,\cdot) to conclude that

v(Q^{r})\leq E_{\pi}[f(Y,b^{r}(Y))]\leq E_{\pi}[f(Y,a^{\ast}(X))].

From here on, it follows from the fundamental theorem of calculus and Hölder’s inequality just as in the proof of Theorem 2.2 that

v(Q^{r})-v(P)\leq\sum_{t=1}^{T}E_{\pi}[\partial_{x_{t}}f(X,a^{\ast}(X))\cdot(Y_{t}-X_{t})]+o(r)\leq r\Big{(}\sum_{t=1}^{T}E_{\pi}[|F_{t}|^{q}]\Big{)}^{1/q}+o(r).

This completes the proof of the upper bound.

We proceed with the lower bound. To that end, we start with the same construction as in the proof of Theorem 2.2: let \pi=\pi^{r} be the law of (X,X+rZ), and let Q^{r} denote its second marginal, where Z satisfies (3.4), that is, Z_{t} is \mathcal{F}_{t}^{X}-measurable for every t and

\sum_{t=1}^{T}E_{P}[|Z_{t}|^{p}]\leq 1\quad\text{and}\quad\Big{(}\sum_{t=1}^{T}E_{P}[|F_{t}|^{q}]\Big{)}^{1/q}=\sum_{t=1}^{T}E_{P}[F_{t}Z_{t}].

Again, \pi might only be causal and not bicausal, and we would need to rely on Lemma 3.1. For the sake of a clearer presentation, we omit this step here.

For each r, let a^{r}\in\mathcal{A} be almost optimal for v(Q^{r}), that is,

E_{Q^{r}}[f(Y,a^{r}(Y))]\leq v(Q^{r})+o(r).

Observe that, by construction of Q^{r} (i.e., since Z_{t} is \mathcal{F}_{t}^{X}-measurable for each t), there is b^{r}\in\mathcal{A} such that \pi(a^{r}(Y)=b^{r}(X))=1.

Now let (r_{n})_{n} be an arbitrary sequence that converges to zero. By Lemma 3.4 below, after passing to a subsequence (r_{n_{k}})_{k}, b^{r_{n_{k}}}(X) converges to a^{\ast}(X) P-almost surely. Since

v(P)\leq E_{P}[f(X,b^{r_{n_{k}}}(X))]

for all k, the fundamental theorem of calculus and the growth assumption imply

v(Q^{r_{n_{k}}})-v(P)\geq E_{P}[f(X+r_{n_{k}}Z,b^{r_{n_{k}}}(X))-f(X,b^{r_{n_{k}}}(X))]-o(r_{n_{k}})\geq\sum_{t=1}^{T}E_{P}[E_{P}[\partial_{x_{t}}f(X,b^{r_{n_{k}}}(X))|\mathcal{F}_{t}^{X}]\cdot r_{n_{k}}Z_{t}]-o(r_{n_{k}}).

Since b^{r_{n_{k}}}(X)\to a^{\ast}(X) P-almost surely, the continuity of \partial_{x_{t}}f and the growth assumption imply that

\liminf_{k\to\infty}\frac{v(Q^{r_{n_{k}}})-v(P)}{r_{n_{k}}}\geq\sum_{t=1}^{T}E_{P}[E_{P}[\partial_{x_{t}}f(X,a^{\ast}(X))|\mathcal{F}_{t}^{X}]\cdot Z_{t}].

To complete the proof, it remains to recall the choice of Z and that (r_{n})_{n} was an arbitrary sequence. ∎

Lemma 3.4.

In the setting of the proof of Theorem 2.4: there exists a subsequence (r_{n_{k}})_{k} such that b^{r_{n_{k}}}(X)\to a^{\ast}(X) P-almost surely.

Proof.

Recall that b^{r_{n}} was chosen almost optimally for v(Q^{r_{n}}), hence

v(Q^{r_{n}})\geq E_{P}[f(X+r_{n}Z,b^{r_{n}}(X))]-o(r_{n})\geq E_{P}[f(X,b^{r_{n}}(X))]-O(r_{n}),

where the last inequality holds by continuity and the growth assumptions on f, see the proof of Theorem 2.4. Next recall that \nabla_{a}^{2}f(X,a)\succ\varepsilon(X)I for a\in[-L,L]^{T} with P(\varepsilon(X)>0)=1. In particular, a second-order Taylor expansion shows that

E_{P}[f(X,b^{r_{n}}(X))]-E_{P}[f(X,a^{\ast}(X))]\geq E_{P}\Big{[}\langle\nabla_{a}f(X,a^{\ast}(X)),b^{r_{n}}(X)-a^{\ast}(X)\rangle\Big{]}+E_{P}\Big{[}\frac{\varepsilon(X)}{2}\|b^{r_{n}}(X)-a^{\ast}(X)\|_{\ell^{2}({\mathbb{R}}^{T})}^{2}\Big{]}.

The first term is non-negative by optimality of a^{\ast}. Thus, since v(Q^{r_{n}})\to v(P) by Lemma 3.2, this implies that the second term must converge to zero. As \varepsilon is strictly positive, this can only happen if b^{r_{n}}(X)\to a^{\ast}(X) in P-probability. Hence, after passing to a subsequence, b^{r_{n}}(X)\to a^{\ast}(X) P-almost surely. ∎

3.3. Proof of Corollary 2.7

For shorthand notation, set

(a\cdot x)_{T}:=\sum_{t=1}^{T}a_{t}(x_{t}-x_{t-1}).

The goal is to apply Theorem 2.4 to the function

f(x,a):=\ell(g(x)+(a\cdot x)_{T})

for (x,a)\in{\mathbb{R}}^{T}\times\mathbb{R}^{T}. To that end, we start by checking Assumption 2.3. Since g is continuously differentiable and \ell is twice continuously differentiable, the parts of Assumption 2.3 pertaining to the differentiability of f hold true. Moreover,

\langle\nabla^{2}_{a}f(x,a)u,u\rangle=\ell^{\prime\prime}(g(x)+(a\cdot x)_{T})\sum_{t=1}^{T}u_{t}^{2}(x_{t}-x_{t-1})^{2}

for any u\in\mathbb{R}^{T}. Since \ell^{\prime\prime}>0 and P(X_{t}=X_{t-1})=0 for every t by assumption, one can readily verify that there is \varepsilon(X) with P(\varepsilon(X)>0)=1 such that

\nabla^{2}_{a}f(X,\cdot)\succ\varepsilon(X)I\qquad\text{on }[-L,L]^{T}.

Next observe that

\partial_{x_{t}}f(x,a)=\ell^{\prime}(g(x)+(a\cdot x)_{T})(\partial_{x_{t}}g(x)+(a_{t}-a_{t+1})).

A quick computation involving the boundedness of g^{\prime} and the growth assumption on \ell^{\prime} shows that

|\partial_{x_{t}}f(x,a)|\leq c\Big{(}1+\sum_{s=1}^{T}|x_{s}|^{p-1}\Big{)}\qquad\text{ for all }x\in\mathbb{R}^{T}\text{ and }a\in[-L,L]^{T}.

In particular, Assumption 2.3 is satisfied, and the proof follows by applying Theorem 2.4. ∎
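
For concreteness, the following Python sketch evaluates u(P) and the first-order correction V of Corollary 2.7 (with p=q=2) in a two-period binomial model. All concrete choices (the tree parameters, the quadratic loss \ell(z)=z^{2}, the payoff g(x)=x_{2}, and the unconstrained least-squares computation of a^{\ast}, for which we assume the bound L is large enough not to bind) are our own illustrations and do not appear in the paper.

import numpy as np

X0, u, d, pu = 1.0, 1.2, 0.9, 0.55
r = 0.05                       # radius of the adapted Wasserstein ball

# the four paths (x1, x2) of the binomial tree and their probabilities under P
paths = np.array([(X0 * a, X0 * a * b) for a in (u, d) for b in (u, d)])
probs = np.array([pu * pu, pu * (1 - pu), (1 - pu) * pu, (1 - pu) * (1 - pu)])
x1, x2 = paths[:, 0], paths[:, 1]

# predictable controls: a1 is deterministic, a2 depends on x1 (up or down)
up1 = x1 > X0
A = np.column_stack([x1 - X0, (x2 - x1) * up1, (x2 - x1) * ~up1])

# quadratic loss l(z) = z^2 and payoff g(x) = x2:
#   u(P) = min_a E_P[(x2 + a1 (x1 - X0) + a2(x1) (x2 - x1))^2],
# a weighted least-squares problem that we solve exactly below
w = np.sqrt(probs)
theta, *_ = np.linalg.lstsq(w[:, None] * A, -w * x2, rcond=None)
a1, a2_up, a2_down = theta
a2 = np.where(up1, a2_up, a2_down)

Z = x2 + A @ theta
uP = np.sum(probs * Z ** 2)
lprime = 2.0 * Z               # l'(z) = 2z

# E_P[l'(Z) | F_1^X] is the conditional average over the second-period branch
cond = np.where(up1,
                np.sum(probs * lprime * up1) / np.sum(probs * up1),
                np.sum(probs * lprime * ~up1) / np.sum(probs * ~up1))

# correction V from Corollary 2.7 with a*_3 := 0, dg/dx1 = 0 and dg/dx2 = 1
term1 = (a2 - a1) * cond
term2 = (0.0 - a2) * lprime - lprime
V = np.sqrt(np.sum(probs * term1 ** 2) + np.sum(probs * term2 ** 2))

print("u(P)                           ", uP)
print("first-order worst case u-value ", uP + r * V)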

3.4. Proof of Theorem 2.8

We start with the upper bound. To that end, let \tau^{\ast} be the optimal stopping time for s(P), let Q\in B_{r}(P) be such that s(Q)\geq\sup_{R\in B_{r}(P)}s(R)-o(r), and let \pi be an (almost) optimal bicausal coupling for \mathcal{AW}_{p}(P,Q). Arguing similarly to Lemma 3.2, we can use the coupling \pi to build a stopping time \tau such that

E_{Q}[f(X,\tau(X))]\leq E_{\pi}[f(Y,\tau^{\ast}(X))]

—see (Backhoff-Veraguas et al., 2020b, Lemma 7.1) or (Bartl et al., 2021a, Proposition 5.8) for detailed proofs. Under the growth assumption on f, the fundamental theorem of calculus and Fubini's theorem yield

s(Q)-s(P)\leq E_{\pi}[f(Y,\tau^{\ast}(X))-f(X,\tau^{\ast}(X))]+o(r)=\int_{0}^{1}\sum_{t=1}^{T}E_{\pi}\left[\partial_{x_{t}}f(X+\lambda(Y-X),\tau^{\ast}(X))\cdot(Y_{t}-X_{t})\right]\,d\lambda+o(r)\leq r\int_{0}^{1}\Big{(}\sum_{t=1}^{T}E_{\pi}\left[|E_{\pi}[\partial_{x_{t}}f(X+\lambda(Y-X),\tau^{\ast}(X))|{\mathcal{F}}_{t}^{X,Y}]|^{q}\right]\Big{)}^{1/q}\,d\lambda+o(r),

where the last inequality follows from Hölder’s inequality and since

\sum_{t=1}^{T}E_{\pi}[|X_{t}-Y_{t}|^{p}]\leq r^{p}

in the same way as in the proof of Theorem 2.2. We also conclude using similar arguments that

\lim_{r\to 0}E_{\pi}\left[|E_{\pi}[\partial_{x_{t}}f(X+\lambda(Y-X),\tau^{\ast}(X))|{\mathcal{F}}_{t}^{X,Y}]|^{q}\right]=E_{P}\left[|E_{P}[\partial_{x_{t}}f(X,\tau^{\ast})|{\mathcal{F}}_{t}^{X}]|^{q}\right]

for every \lambda\in[0,1] and every t=1,\dots,T.

We proceed with the lower bound. To make the presentation concise, we assume here that T=2—the general case follows from a (somewhat tedious) adaptation of the arguments presented here. The assumption that the optimal stopping time \tau^{\ast} is unique implies, by the Snell envelope theorem, that

P(f(X,1)\neq E_{P}[f(X,2)|{\mathcal{F}}_{1}^{X}])=1;

in particular

(3.5) \{\tau^{\ast}=1\}=\{f(X,1)<E_{P}[f(X,2)|{\mathcal{F}}_{1}^{X}]\},\qquad\{\tau^{\ast}=2\}=\{f(X,1)>E_{P}[f(X,2)|{\mathcal{F}}_{1}^{X}]\}.

As before, set F_{t}:=E_{P}[\partial_{x_{t}}f(X,\tau^{\ast})|{\mathcal{F}}_{t}^{X}] and take Z that satisfies (3.4), i.e., Z_{t} is \mathcal{F}_{t}^{X}-measurable for every t, and

(3.6) E_{P}[|Z_{1}|^{p}]+E_{P}[|Z_{2}|^{p}]\leq 1\quad\text{and}\quad E_{P}[F_{1}Z_{1}]+E_{P}[F_{2}Z_{2}]=(E_{P}[|F_{1}|^{q}]+E_{P}[|F_{2}|^{q}])^{1/q}.

Next, for every r>0, set

A^{r}:=\{f(X+rZ,1)<E_{P}[f(X+rZ,2)|{\mathcal{F}}_{1}^{X}]\}\cap\{\tau^{\ast}=1\},\qquad B^{r}:=\{f(X+rZ,1)>E_{P}[f(X+rZ,2)|{\mathcal{F}}_{1}^{X}]\}\cap\{\tau^{\ast}=2\}.

Define the process

X^{r}:=X+rZ\mathbf{1}_{A^{r}\cup B^{r}}.

Since A^{r},B^{r} and Z_{1} are {\mathcal{F}}_{1}^{X}-measurable, the coupling \pi^{r}:=(X,X^{r})_{\ast}P is causal between P and P^{r}:=(X^{r})_{\ast}P. Using Lemma 3.1 (just as in the proof of Theorem 2.2), we can actually assume without loss of generality that \pi^{r} is in fact bicausal and that \mathcal{F}^{X^{r}}_{t}=\mathcal{F}^{X}_{t} for each t—we will leave this detail to the reader and proceed.

In particular, since

|X_{t}-X_{t}^{r}|\leq r|Z_{t}|,

it follows from (3.6) that \mathcal{AW}_{p}(P,P^{r})\leq r; thus

(3.7) \sup_{Q\in B_{r}(P)}s(Q)\geq\inf_{\tau\in\mathrm{ST}}E_{P}[f(X^{r},\tau(X^{r}))]
(3.8) =E_{P}[f(X^{r},1)\wedge E_{P}[f(X^{r},2)|\mathcal{F}^{X}_{1}]],

where the equality holds by the Snell envelope theorem and since \mathcal{F}^{X}_{1}=\mathcal{F}^{X^{r}}_{1}.

Next note that

f(X^{r},1)<E_{P}[f(X^{r},2)|\mathcal{F}^{X}_{1}]\ \text{ and }\ f(X,1)<E_{P}[f(X,2)|\mathcal{F}^{X}_{1}]\qquad\text{on }A^{r},
f(X^{r},1)>E_{P}[f(X^{r},2)|\mathcal{F}^{X}_{1}]\ \text{ and }\ f(X,1)>E_{P}[f(X,2)|\mathcal{F}^{X}_{1}]\qquad\text{on }B^{r},
f(X^{r},1)\wedge E_{P}[f(X^{r},2)|\mathcal{F}^{X}_{1}]=f(X,1)\wedge E_{P}[f(X,2)|\mathcal{F}^{X}_{1}]\qquad\text{on }(A^{r}\cup B^{r})^{c}\in\mathcal{F}_{1}^{X}.

Combined with (3.7) and since

s(P)=E_{P}[f(X,1)\wedge E_{P}[f(X,2)|{\mathcal{F}}_{1}^{X}]],

we get

\sup_{Q\in B_{r}(P)}s(Q)-s(P)\geq E_{P}\big{[}(f(X^{r},1)-f(X,1))\mathbf{1}_{A^{r}}+(E_{P}[f(X^{r},2)-f(X,2)|{\mathcal{F}}_{1}^{X}])\mathbf{1}_{B^{r}}\big{]}=E_{P}\big{[}(f(X^{r},1)-f(X,1))\mathbf{1}_{A^{r}}+(f(X^{r},2)-f(X,2))\mathbf{1}_{B^{r}}\big{]},

where the last equality holds by the tower property. Using the fundamental theorem of calculus just as in the proof of Theorem 2.2 shows that

\sup_{Q\in B_{r}(P)}\frac{1}{r}\big{(}s(Q)-s(P)\big{)}\geq\sum_{t=1}^{2}E_{P}\big{[}\partial_{x_{t}}f(X,1)Z_{t}\mathbf{1}_{A^{r}}+\partial_{x_{t}}f(X,2)Z_{t}\mathbf{1}_{B^{r}}\big{]}-o(1)\to\sum_{t=1}^{2}E_{P}\big{[}\partial_{x_{t}}f(X,\tau^{\ast})Z_{t}\big{]}

as r\downarrow 0, where the convergence holds because, by (3.5), \mathbf{1}_{A^{r}}\to\mathbf{1}_{\{\tau^{\ast}=1\}} and \mathbf{1}_{B^{r}}\to\mathbf{1}_{\{\tau^{\ast}=2\}}. To complete the proof, it remains to recall the definition of Z, see (3.6). ∎

References

  • Acciaio and Hou [2022] Beatrice Acciaio and Songyan Hou. Convergence of adapted empirical measures on \mathbb{R}^{d}. arXiv preprint arXiv:2211.10162, 2022.
  • Aldous [1981] D. Aldous. Weak convergence and general theory of processes. Department of Statistics, University of California, Berkeley, CA 94720, 1981.
  • Backhoff et al. [2022] Julio Backhoff, Daniel Bartl, Mathias Beiglböck, and Johannes Wiesel. Estimating processes in adapted Wasserstein distance. The Annals of Applied Probability, 32(1):529–550, 2022.
  • Backhoff-Veraguas et al. [2020a] Julio Backhoff-Veraguas, Daniel Bartl, Mathias Beiglböck, and Manu Eder. Adapted Wasserstein distances and stability in mathematical finance. Finance and Stochastics, 24(3):601–632, 2020a.
  • Backhoff-Veraguas et al. [2020b] Julio Backhoff-Veraguas, Daniel Bartl, Mathias Beiglböck, and Manu Eder. All adapted topologies are equal. Probability Theory and Related Fields, 178(3):1125–1172, 2020b.
  • Bartl et al. [2019] Daniel Bartl, Samuel Drapeau, and Ludovic Tangpi. Computational aspects of robust optimized certainty equivalents and option pricing. Mathematical Finance, 9(1):203, March 2019.
  • Bartl et al. [2021a] Daniel Bartl, Mathias Beiglböck, and Gudmund Pammer. The Wasserstein space of stochastic processes. arXiv preprint arXiv:2104.14245, 2021a.
  • Bartl et al. [2021b] Daniel Bartl, Samuel Drapeau, Jan Oblój, and Johannes Wiesel. Sensitivity analysis of Wasserstein distributionally robust optimization problems. Proceedings of the Royal Society A, 477(2256):20210176, 2021b.
  • Blanchet and Murthy [2019] Jose Blanchet and Karthyek Murthy. Quantifying distributional model risk via optimal transport. Mathematics of Operations Research, 44(2):565–600, 2019.
  • Blanchet et al. [2020] José Blanchet, Yang Kang, José Luis Montiel Olea, Viet Anh Nguyen, and Xuhui Zhang. Machine learning’s dropout training is distributionally robust optimal. arXiv preprint arXiv:2009.06111, 2020.
  • Blanchet et al. [2021] Jose Blanchet, Lin Chen, and Xun Yu Zhou. Distributionally robust mean-variance portfolio selection with Wasserstein distances. Management Science, 2021.
  • Calafiore [2007] Giuseppe C Calafiore. Ambiguous risk measures and optimal robust portfolios. SIAM Journal on Optimization, 18(3):853–877, 2007.
  • Eckstein and Pammer [2022] Stephan Eckstein and Gudmund Pammer. Computational methods for adapted optimal transport. arXiv preprint arXiv:2203.05005, 2022.
  • Gao and Kleywegt [2016] Rui Gao and Anton J Kleywegt. Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199, 2016.
  • Gozlan et al. [2017] Nathael Gozlan, Cyril Roberto, Paul-Marie Samson, and Prasad Tetali. Kantorovich duality for general transport costs and applications. Journal of Functional Analysis, 273(11):3327–3405, 2017.
  • Hellwig [1996] M. Hellwig. Sequential decisions under uncertainty and the maximum theorem. J. Math. Econom., 25(4):443–464, 1996.
  • Herrmann and Muhle-Karbe [2017] Sebastian Herrmann and Johannes Muhle-Karbe. Model uncertainty, recalibration, and the emergence of delta–vega hedging. Finance and Stochastics, 21(4):873–930, 2017.
  • Hobson [1998] David G Hobson. Volatility misspecification, option pricing and superreplication via coupling. Annals of Applied Probability, pages 193–205, 1998.
  • Huber [2011] Peter J Huber. Robust statistics. In International encyclopedia of statistical science, pages 1248–1251. Springer, 2011.
  • Jiang [2023] Yifan Jiang. Wasserstein distributional sensitivity to model uncertainty in a dynamic context. DPhil Transfer of Status Thesis, University of Oxford, January 2023. Private communication.
  • Karoui et al. [1998] Nicole El Karoui, Monique Jeanblanc-Picqué, and Steven E Shreve. Robustness of the Black and Scholes formula. Mathematical Finance, 8(2):93–126, 1998.
  • Komlós [1967] Janos Komlós. A generalization of a problem of Steinhaus. Acta Mathematica Academiae Scientiarum Hungaricae, 18(1-2):217–229, 1967.
  • Kuhn et al. [2019] Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, and Soroosh Shafieezadeh-Abadeh. Wasserstein distributionally robust optimization: Theory and applications in machine learning. In Operations research & management science in the age of analytics, pages 130–166. Informs, 2019.
  • Lam [2016] Henry Lam. Robust sensitivity analysis for stochastic systems. Mathematics of Operations Research, 41(4):1248–1275, 2016.
  • Lam [2018] Henry Lam. Sensitivity to serial dependency of input processes: A robust approach. Management Science, 64(3):1311–1327, 2018.
  • Lindsay [1994] Bruce G Lindsay. Efficiency versus robustness: the case for minimum Hellinger distance and related methods. The Annals of Statistics, 22(2):1081–1114, 1994.
  • Nendel and Sgarabottolo [2022] Max Nendel and Alessandro Sgarabottolo. A parametric approach to the estimation of convex risk functionals based on Wasserstein distances. arXiv preprint arXiv:2210.14340, 2022.
  • Oblój and Wiesel [2021] Jan Oblój and Johannes Wiesel. Distributionally robust portfolio maximization and marginal utility pricing in one period financial markets. Mathematical Finance, 31(4):1454–1493, 2021.
  • Pflug [2010] G Ch Pflug. Version-Independence and Nested Distributions in Multistage Stochastic Optimization. SIAM Journal on Optimization, 20(3):1406–1420, January 2010.
  • Pflug and Wozabal [2007] Georg Pflug and David Wozabal. Ambiguity in portfolio selection. Quantitative Finance, 7(4):435–442, 2007.
  • Pflug and Pichler [2012] Georg Ch Pflug and Alois Pichler. A Distance For Multistage Stochastic Optimization Models. SIAM Journal on Optimization, 22(1):1–23, January 2012.
  • Rahimian and Mehrotra [2019] Hamed Rahimian and Sanjay Mehrotra. Distributionally robust optimization: A review. arXiv preprint arXiv:1908.05659, 2019.
  • Rásonyi and Stettner [2005] Miklós Rásonyi and Lukasz Stettner. On utility maximization in discrete-time financial market models. The Annals of Applied Probability, 15(2):1367–1395, 2005.
  • Shafieezadeh-Abadeh et al. [2019] Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, and Peyman Mohajerin Esfahani. Regularization via mass transportation. Journal of Machine Learning Research, 20(103):1–68, 2019.