
An optimal transport based characterization of convex order

Johannes Wiesel
Columbia University, Department of Statistics
1255 Amsterdam Avenue
New York, NY 10027, USA
[email protected]
and Erica Zhang
Columbia University, Department of Statistics
1255 Amsterdam Avenue
New York, NY 10027, USA
[email protected]
Abstract.

For probability measures $\mu,\nu$ and $\rho$ define the cost functionals

\[C(\mu,\rho):=\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy),\quad C(\nu,\rho):=\sup_{\pi\in\Pi(\nu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy),\]

where $\langle\cdot,\cdot\rangle$ denotes the scalar product and $\Pi(\cdot,\cdot)$ is the set of couplings. We show that two probability measures $\mu$ and $\nu$ on $\mathbb{R}^{d}$ with finite first moments are in convex order (i.e. $\mu\preceq_{c}\nu$) iff $C(\mu,\rho)\leq C(\nu,\rho)$ holds for all probability measures $\rho$ on $\mathbb{R}^{d}$ with bounded support. This generalizes a result by Carlier. Our proof relies on a quantitative bound for the infimum of $\int f\,d\nu-\int f\,d\mu$ over all $1$-Lipschitz functions $f$, which is obtained through optimal transport duality and Brenier's theorem. Building on this result, we derive new proofs of well-known one-dimensional characterizations of convex order. We also describe new computational methods for investigating convex order and applications to model-independent arbitrage strategies in mathematical finance.

JW acknowledges support by NSF Grant DMS-2205534. Part of this research was performed while JW was visiting the Institute for Mathematical and Statistical Innovation (IMSI), which is supported by the National Science Foundation (Grant No. DMS-1929348). JW thanks Beatrice Acciaio, Guillaume Carlier, Max Nendel, Gudmund Pammer and Ruodu Wang for helpful discussions. EZ acknowledges support through the summer internship program of the Columbia University Department of Statistics.

1. Introduction and main result

Fix two probability measures $\mu,\nu\in\mathcal{P}(\mathbb{R}^{d})$ with

\[\int|x|\,\mu(dx)<\infty,\quad\int|y|\,\nu(dy)<\infty.\]

Recall that $\mu$ and $\nu$ are in convex order (denoted by $\mu\preceq_{c}\nu$) iff

\[\int f\,d\mu\leq\int f\,d\nu\qquad\text{for all convex functions }f:\mathbb{R}^{d}\to\mathbb{R}.\]

As any convex function is bounded from below by an affine function, the above integrals take values in $(-\infty,\infty]$. The notion of convex order is very well studied, see e.g. Ross et al. (1996); Müller and Stoyan (2002); Shaked and Shanthikumar (2007); Arnold (2012) and the references therein for an overview. It plays a pivotal role in mathematical finance since Strassen (1965) established that $\mu\preceq_{c}\nu$ if and only if $\mathcal{M}(\mu,\nu)$, the set of martingale laws on $\mathbb{R}^{d}\times\mathbb{R}^{d}$ with marginals $\mu$ and $\nu$, is non-empty. This result is also the reason why convex order has taken center stage in the field of martingale optimal transport, see e.g. Galichon et al. (2014); Beiglböck et al. (2013, 2015); De March and Touzi (2019); Obłój and Siorpaes (2017); Guo and Obłój (2019); Alfonsi et al. (2019, 2020); Alfonsi and Jourdain (2020); Jourdain and Margheriti (2022); Massa and Siorpaes (2022) and the references therein. Furthermore, convex order is central to dependence modelling and risk aggregation, see e.g. Tchen (1980); Rüschendorf and Uckelmann (2002); Wang and Wang (2011); Embrechts et al. (2013); Bernard et al. (2017).
While there is an abundance of explicit characterizations of convex order available in one dimension (i.e. $d=1$), see e.g. (Shaked and Shanthikumar, 2007, Chapter 3), the case $d>1$ seems to be less studied to the best of our knowledge. The main goal of this article is to fill this gap: we discuss a characterization of convex order that holds in general dimensions and is based on the theory of optimal transport (OT). Optimal transport goes back to the seminal works of Monge (1781) and Kantorovich (1958). It is concerned with the problem of transporting probability distributions in a cost-optimal way. We refer to Rachev and Rüschendorf (1998) and Villani (2003, 2008) for an overview. For this paper we only need a few basic concepts from OT. Most importantly we will need the cost functionals

\[C(\mu,\rho):=\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy),\qquad C(\nu,\rho):=\sup_{\pi\in\Pi(\nu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy).\]

Here $\Pi(\mu,\nu)$ denotes the set of probability measures on $\mathbb{R}^{d}\times\mathbb{R}^{d}$ with marginals $\mu$ and $\nu$. Our main result is the following:

Theorem 1.1.

Assume that $\mu,\nu\in\mathcal{P}(\mathbb{R}^{d})$ have finite first moments. Then

(1) \[\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right),\]

where

\[\mathcal{P}^{1}(\mathbb{R}^{d}):=\{\rho\in\mathcal{P}(\mathbb{R}^{d}):\ \mathrm{supp}(\rho)\subseteq B_{1}(0)\}\]

and

\[\mathcal{C}^{1}(\mathbb{R}^{d}):=\{f:\mathbb{R}^{d}\to\mathbb{R}\ \mathrm{convex},\ 1\text{-Lipschitz}\}.\]

Theorem 1.1 shows that convex order of $\mu$ and $\nu$ is equivalent to the ordering $C(\mu,\rho)\leq C(\nu,\rho)$ for all $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$, i.e. to an order relation defined through $C(\cdot,\cdot)$ on the space of probability measures. Contrary to standard characterizations of convex order using potential functions or cdfs, it holds in any dimension and can be seen as a natural generalization of the following result:

Corollary 1.2.

Denote the 2-Wasserstein metric by

\[\mathcal{W}_{2}(\mu,\nu):=\inf_{\pi\in\Pi(\mu,\nu)}\sqrt{\int|x-y|^{2}\,\pi(dx,dy)}.\]

If $\mu$ and $\nu$ have finite second moments, then they are in convex order if and only if

(2) \[\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}\leq\int|y|^{2}\,\nu(dy)-\int|x|^{2}\,\mu(dx)\]

holds for all probability measures $\rho$ on $\mathbb{R}^{d}$ with bounded support.

Corollary 1.2 itself has an interesting history. To the best of our knowledge, it was first stated in Carlier (2008) for compactly supported measures $\mu,\nu$. His proof relies on a well-known connection between convex functions and OT for the squared Euclidean distance, called Brenier's theorem (see Brenier (1991); Rüschendorf and Rachev (1990)), together with a certain probabilistic first-order condition, see (Carlier, 2008, Proposition 1). We emphasize here that, contrary to the setting of Brenier's theorem, no assumptions on the probability measures $\mu$ and $\nu$ are made except for the compact support condition; in particular there is no need to assume that they are absolutely continuous with respect to the Lebesgue measure.

Interestingly, Carlier’s result does not seem to be very well-known in the literature on stochastic order. We conjecture that this is mainly due to his use of the French word “balayée” instead of convex order, so that the connection is not immediately apparent. For this reason, one aim of this note is to popularize Carlier’s result, making it accessible to a wider audience, while simultaneously showcasing potential applications. As it turns out, Corollary 1.2 is at least partially known to the mathematical finance community: indeed, the “only if” direction of Corollary 1.2 was rediscovered in (Alfonsi and Jourdain, 2020, Equation (2.2)) for (not necessarily compactly supported) probability measures $\mu,\nu$ with finite second moments.

Theorem 1.1 differs from Carlier’s work in three aspects: first, as the convex order is classically embedded in $\mathcal{P}_{1}(\mathbb{R}^{d})$ and does not require moments of higher order or compact support assumptions (see e.g. Nendel (2020)), Theorem 1.1 is simultaneously more concise and arguably more natural than Corollary 1.2. Second, our proof of Theorem 1.1 (and thus also of Corollary 1.2) follows a different route than Carlier’s original proof, which argues purely on the space of probability measures (i.e. the “primal side” of optimal transport); instead, we combine Brenier’s theorem with classical optimal transport duality. Lastly, we discuss three implications of Theorem 1.1: we first give a proof of a characterization of convex order in one dimension through quantile functions. Then we use Theorem 1.1 to derive new computational methods for testing convex order between $\mu$ and $\nu$. For the computation we exploit state-of-the-art computational OT methods, which are efficient for potentially high-dimensional problems and have recently seen a spike in research activity; we refer to Peyré and Cuturi (2019) for an overview. Finally we discuss applications of Theorem 1.1 to the theory of so-called model-independent arbitrages, see (Acciaio et al., 2013, Definition 1.2).

This article is structured as follows: in Section 2 we state examples and consequences of Theorem 1.1. In particular we connect it to some well-known results in the theory of convex order. The proof of the main result is given in Section 3. Sections 4 and 5 discuss numerical and mathematical finance applications of Theorem 1.1, respectively. Remaining proofs are collected in Section 6.

2. Discussion and consequences of main results

To sharpen intuition, let us first discuss the case $d=1$. By Theorem 1.1 we can obtain a new proof of a well-known representation of convex order on the real line, see e.g. (Shaked and Shanthikumar, 2007, Theorem 3.A.5). Here we denote the quantile function of a probability measure $\mu$ by

\[F_{\mu}^{-1}(x):=\inf\{y\in\mathbb{R}:\ \mu((-\infty,y])\geq x\}.\]
Corollary 2.1.

For $d=1$ we have

\[\mu\preceq_{c}\nu\quad\Leftrightarrow\quad\int_{0}^{x}[F^{-1}_{\mu}(y)-F_{\nu}^{-1}(y)]\,dy\geq 0\ \text{ for all }x\in[0,1],\text{ with equality for }x=1.\]
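For empirical or finitely supported measures this criterion is straightforward to check numerically. The following minimal sketch (our own illustration, not part of the paper's repository; the function name, grid size and tolerance are arbitrary choices) compares running integrals of approximate quantile functions:

```python
import numpy as np

def check_convex_order_1d(x_mu, x_nu, grid_size=1000, tol=1e-6):
    """Check the criterion of Corollary 2.1 on two 1-d samples/atom lists.

    Approximates int_0^x [F_mu^{-1}(y) - F_nu^{-1}(y)] dy on a grid of
    quantile levels and checks non-negativity plus (near) equality at x = 1.
    """
    u = (np.arange(grid_size) + 0.5) / grid_size       # quantile levels in (0, 1)
    diff = np.quantile(x_mu, u) - np.quantile(x_nu, u)  # approximate quantile difference
    running = np.cumsum(diff) / grid_size               # running integral of the difference
    return bool(np.all(running >= -tol) and abs(running[-1]) < tol)

# The uniform measure on {-1, 1} is dominated by the uniform measure on {-2, 2}:
print(check_convex_order_1d([-1.0, 1.0], [-2.0, 2.0]))   # True
print(check_convex_order_1d([-2.0, 2.0], [-1.0, 1.0]))   # False
```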

The proofs of all results of this section are collected in Section 6. We continue with general $d\in\mathbb{N}$ and give a geometric interpretation of Corollary 1.2 by restating it as follows: $\mu\preceq_{c}\nu$ holds iff

(3) \[\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}\leq\mathcal{W}_{2}(\nu,\delta_{z})^{2}-\mathcal{W}_{2}(\mu,\delta_{z})^{2}\]

for all $\rho\in\mathcal{P}(\mathbb{R}^{d})$ with bounded support and all Dirac measures $\delta_{z}$, $z\in\mathbb{R}^{d}$. Indeed, varying $\rho$ over Dirac measures in (2) implies that the means of $\mu$ and $\nu$ have to be equal; equation (3) then follows from simple algebra. In particular, the difference of squared Wasserstein costs $\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}$ is maximised over $\rho$ at point masses. Lastly, Theorem 1.1 can also be reformulated as: $\mu\preceq_{c}\nu$ iff

\[\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,z\rangle\,\pi(dx,dz)\leq\sup_{\pi\in\Pi(\nu,\rho)}\int\langle y,z\rangle\,\pi(dy,dz),\]

i.e. for any $\rho\in\mathcal{P}(\mathbb{R}^{d})$ with bounded support, the maximal covariance between $\mu$ and $\rho$ is less than the maximal covariance between $\nu$ and $\rho$. This provides a natural intuition for the classical pedestrian description of convex order, namely that “$\nu$ is more spread out than $\mu$”.
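To spell out the Dirac-measure argument above: the only coupling of $\mu$ and $\delta_{z}$ is the product measure, so

\[\mathcal{W}_{2}(\mu,\delta_{z})^{2}=\int|x-z|^{2}\,\mu(dx)=\int|x|^{2}\,\mu(dx)-2\Big\langle z,\int x\,\mu(dx)\Big\rangle+|z|^{2},\]

and analogously for $\nu$. Choosing $\rho=\delta_{z}$ in (2) therefore reads

\[-2\Big\langle z,\int y\,\nu(dy)-\int x\,\mu(dx)\Big\rangle\leq 0\qquad\text{for all }z\in\mathbb{R}^{d},\]

which forces $\int y\,\nu(dy)=\int x\,\mu(dx)$. With equal means, $\mathcal{W}_{2}(\nu,\delta_{z})^{2}-\mathcal{W}_{2}(\mu,\delta_{z})^{2}=\int|y|^{2}\,\nu(dy)-\int|x|^{2}\,\mu(dx)$, so (2) and (3) are indeed equivalent.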

We next give a simple example for Corollary 1.2.

Example 2.2.

Let us take $\mu=\delta_{0}$ and $\nu$ with mean zero. Now, recalling (4) and bounding $\mathcal{W}_{2}(\nu,\rho)$ from above by choosing the product coupling, we obtain that for any $\rho$ with finite second moment

\begin{align*}
\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}
&=\mathcal{W}_{2}(\nu,\rho)^{2}-\int|x|^{2}\,\rho(dx)\\
&\leq\int|y|^{2}\,\nu(dy)-2\int\langle x,y\rangle\,\nu(dx)\rho(dy)\\
&=\int|y|^{2}\,\nu(dy)\\
&=\int|y|^{2}\,\nu(dy)-\int|x|^{2}\,\mu(dx).
\end{align*}

In conclusion we recover the well-known fact $\delta_{0}\preceq_{c}\nu$.

We now state two direct corollaries of Corollary 1.2. We consider the cost $c(x,y):=|x-y|^{2}/2$ and recall that a function $f$ is $c$-concave if

\[f(x)=\inf_{y\in\mathbb{R}^{d}}\left(c(x,y)-g(y)\right)\]

for some function $g:\mathbb{R}^{d}\to\mathbb{R}$. We then have the following:

Corollary 2.3.

We have

\[\int g\,d\nu\leq\int g\,d\mu\qquad\text{for all }c\text{-concave functions }g:\mathbb{R}^{d}\to\mathbb{R}\]

if and only if

\[\mathcal{W}_{2}(\nu,\rho)^{2}\leq\mathcal{W}_{2}(\mu,\rho)^{2}\qquad\text{for all }\rho\in\mathcal{P}(\mathbb{R}^{d})\text{ with compact support}.\]

Corollary 1.2 also directly implies the following well-known result:

Corollary 2.4.

If $\mu\preceq_{c}\nu$ then

\[\mathcal{W}_{2}(\mu,\nu)^{2}\leq\int|y|^{2}\,\nu(dy)-\int|x|^{2}\,\mu(dx).\]

In particular $\mu\preceq_{c}\nu$ implies

\[\sup_{\pi\in\Pi(\mu,\nu)}\int\langle x,y\rangle\,\pi(dx,dy)\geq\int|x|^{2}\,\mu(dx).\]

3. Proof of Theorem 1.1

Let us start by setting up some notation. We denote the scalar product on $\mathbb{R}^{d}$ by $\langle\cdot,\cdot\rangle$. We write $|\cdot|$ for the Euclidean norm on $\mathbb{R}^{d}$. The ball in $\mathbb{R}^{d}$ around $x$ of radius $r>0$ will be denoted by $B_{r}(x)$. We write $\nabla f(x)$ for the derivative of a function $f:\mathbb{R}^{d}\to\mathbb{R}$ at a point $x\in\mathbb{R}^{d}$. We denote the $d$-dimensional Lebesgue measure by $\lambda$.

In order to keep this article self-contained, we summarise some properties of optimal transport at the beginning of this section, and refer to (Villani, 2003, Chapter 2.1) for a more detailed treatment.
By definition we have for any $\rho\in\mathcal{P}_{2}(\mathbb{R}^{d})$ that

(4) \[\mathcal{W}_{2}(\mu,\rho)^{2}=\int|x|^{2}\,\mu(dx)+\int|y|^{2}\,\rho(dy)-2\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy).\]

In this section we thus (re-)define the cost function $c(x,y):=\langle x,y\rangle$ and recall that the convex conjugate $f^{*}:\mathbb{R}^{d}\to\mathbb{R}\cup\{\infty\}$ of a function $f:\mathbb{R}^{d}\to\mathbb{R}$ is given by

\[f^{*}(y):=\sup_{x\in\mathbb{R}^{d}}\left(\langle y,x\rangle-f(x)\right).\]

The subdifferential of a proper convex function $f:\mathbb{R}^{d}\to\mathbb{R}\cup\{\infty\}$ is defined as

\[\partial f(x):=\{y\in\mathbb{R}^{d}:\ f(x^{\prime})-f(x)\geq\langle y,x^{\prime}-x\rangle\text{ for all }x^{\prime}\in\mathbb{R}^{d}\}.\]

It is non-empty if $x$ belongs to the interior of the domain of $f$. We have

(5) \[f(x)+f^{*}(y)-\langle x,y\rangle=0\quad\Leftrightarrow\quad y\in\partial f(x).\]

Lastly we recall the duality

(6) \begin{align*}
C(\mu,\rho)&=\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy)\\
&=\inf_{f\oplus g\geq c}\left(\int f\,d\mu+\int g\,d\rho\right)\\
&=\inf_{f\oplus g\geq c,\ f,g\text{ proper, convex}}\left(\int f\,d\mu+\int g\,d\rho\right)
\end{align*}

and the existence of an optimal pair $(f,f^{*})$ of (lower semicontinuous, proper) convex conjugate functions. Replacing $\mu$ by $\nu$ in the display above, we obtain a similar duality for $C(\nu,\rho)$.

3.1. Proof of Theorem 1.1: the equivalent case

We first prove Theorem 1.1 for measures $\mu,\nu$ which are equivalent to the $d$-dimensional Lebesgue measure $\lambda$, i.e. $\mu,\nu\sim\lambda$. As $\mu,\nu\in\mathcal{P}_{1}(\mathbb{R}^{d})$, the domain of the optimising potential $f$ for $C(\mu,\rho)$ (resp. $C(\nu,\rho)$) is $\mathbb{R}^{d}$ in this case. Recall furthermore that

(7) \[\partial f(x)=\overline{\mathrm{Conv}}\Big(\lim_{x_{k}\to x}\nabla f(x_{k})\Big),\]

see e.g. (Villani, 2003, 2.1.3.3). We write

\[\|\partial f\|_{\infty}:=\sup_{x\in\mathbb{R}^{d}}\sup_{y\in\partial f(x)}|y|.\]

We now prove Theorem 1.1 when $\mu,\nu\sim\lambda$:

Proposition 3.1.

Assume $\mu,\nu\in\mathcal{P}_{1}(\mathbb{R}^{d})$ and $\mu,\nu\sim\lambda$. Recall

\[\mathcal{P}^{1}(\mathbb{R}^{d})=\{\rho\in\mathcal{P}(\mathbb{R}^{d}):\ \mathrm{supp}(\rho)\subseteq B_{1}(0)\}\]

as well as the 1-Lipschitz convex functions

\[\mathcal{C}^{1}(\mathbb{R}^{d})=\{f:\mathbb{R}^{d}\to\mathbb{R}\ \mathrm{convex},\ \|\partial f\|_{\infty}\leq 1\}.\]

Then we have

\[\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right).\]
Proof.

As $\mu,\nu$ have finite first moments and $\rho$ is compactly supported, $|C(\mu,\rho)|,|C(\nu,\rho)|<\infty$ follows from Hölder's inequality. We now fix $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$ and take an optimal convex pair $(\hat{f},\hat{g})$ in (6) for $C(\nu,\rho)$. Next we apply Brenier's theorem in the form of (Villani, 2003, Theorem 2.12), which states that $\rho=\nabla\hat{f}_{*}\nu$. (We note that the result is stated under the additional requirement that $\nu,\rho\in\mathcal{P}_{2}(\mathbb{R}^{d})$. However, as $\rho$ is supported on the unit ball, it can be checked that the arguments of (Villani, 2003, proof of Theorem 2.9), in particular boundedness from below, carry over when simply adding $|x|$ to the potential $\hat{f}$ instead of adding $|x|^{2}/2$ to $\hat{f}$ and $|y|^{2}/2$ to $\hat{g}$.) Furthermore, as $\mathrm{supp}(\rho)\subseteq B_{1}(0)$ we conclude $\|\partial\hat{f}\|_{\infty}\leq 1$ by (7) and

\begin{align*}
C(\nu,\rho)-C(\mu,\rho)&\geq\int\hat{f}\,d\nu+\int\hat{g}\,d\rho-\left(\int\hat{f}\,d\mu+\int\hat{g}\,d\rho\right)\\
&=\int\hat{f}\,d\nu-\int\hat{f}\,d\mu\\
&\geq\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right).
\end{align*}

Taking the infimum over $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$ shows that

\[\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right)\geq\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right).\]

On the other hand, fix $f\in\mathcal{C}^{1}(\mathbb{R}^{d})$ and set $g:=f^{*}$. Define $\hat{\rho}:=\nabla f_{*}\mu$ and note that $\hat{\rho}\in\mathcal{P}^{1}(\mathbb{R}^{d})$. Then again by Brenier's theorem we obtain optimality of the pair $(f,g)$ for $C(\mu,\hat{\rho})$, and thus

\begin{align*}
\int f\,d\nu-\int f\,d\mu&=\left(\int f\,d\nu+\int g\,d\hat{\rho}\right)-\left(\int g\,d\hat{\rho}+\int f\,d\mu\right)\\
&\geq C(\nu,\hat{\rho})-C(\mu,\hat{\rho})\\
&\geq\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right).
\end{align*}

Taking the infimum over $f\in\mathcal{C}^{1}(\mathbb{R}^{d})$ shows

\[\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right)\geq\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right).\]

This concludes the proof. ∎

3.2. Proof of Theorem 1.1: the general case

We now prove Theorem 1.1 for general measures $\mu,\nu\in\mathcal{P}_{1}(\mathbb{R}^{d})$ through approximation in the $1$-Wasserstein sense.

Proof of Theorem 1.1.

Let us take sequences $(\mu_{n})_{n\in\mathbb{N}}$, $(\nu_{n})_{n\in\mathbb{N}}$ in $\mathcal{P}_{1}(\mathbb{R}^{d})$ satisfying

\[\lim_{n\to\infty}\mathcal{W}_{1}(\mu,\mu_{n})=0=\lim_{n\to\infty}\mathcal{W}_{1}(\nu,\nu_{n}),\qquad\mu_{n},\nu_{n}\sim\lambda\text{ for all }n\in\mathbb{N},\]

where $\mathcal{W}_{1}$ denotes the $1$-Wasserstein distance. Recall that $\mathcal{C}^{1}(\mathbb{R}^{d})$ denotes the set of convex $1$-Lipschitz functions. Thus, e.g. by the Kantorovich-Rubinstein formula (Villani, 2008, (5.11)),

(8) \[\lim_{n\to\infty}\sup_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left|\int f\,d\mu-\int f\,d\mu_{n}\right|\leq\lim_{n\to\infty}\mathcal{W}_{1}(\mu,\mu_{n})=0.\]

The same holds for $(\nu_{n})_{n\in\mathbb{N}}$ and $\nu$. Next, take an optimal coupling $\pi=\pi(dx,dy)$ for $C(\nu,\rho)$ and an optimal coupling $\pi^{n}=\pi^{n}(dx,dz)$ for $\mathcal{W}_{1}(\nu,\nu_{n})$. Then $\hat{\pi}^{n}(dy,dz):=\int\pi^{n}(dx,dz)\pi_{x}(dy)$ is a coupling of $\rho$ and $\nu_{n}$. Furthermore, as $|y|\leq 1$ $\rho$-a.s. we have

\begin{align*}
C(\nu,\rho)-C(\nu_{n},\rho)&\leq\left|\int\langle y,x-z\rangle\,\pi^{n}(dx,dz)\pi_{x}(dy)\right|\\
&\leq\int|x-z|\,\pi^{n}(dx,dz)\\
&\leq\mathcal{W}_{1}(\nu_{n},\nu).
\end{align*}

Exchanging the roles of $\nu$ and $\nu_{n}$ then yields

\[|C(\nu,\rho)-C(\nu_{n},\rho)|\leq\mathcal{W}_{1}(\nu_{n},\nu).\]

As the right-hand side is independent of $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$, this shows

(9) \[\lim_{n\to\infty}\sup_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left|C(\nu,\rho)-C(\nu_{n},\rho)\right|=0.\]

A similar argument holds for $(\mu_{n})_{n\in\mathbb{N}}$ and $\mu$. We can now write

\begin{align*}
\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\Bigg[&\left(\int f\,d\nu_{n}-\int f\,d\mu_{n}\right)\\
&+\left(\int f\,d\nu-\int f\,d\nu_{n}\right)-\left(\int f\,d\mu-\int f\,d\mu_{n}\right)\Bigg]
\end{align*}

and

\begin{align*}
\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\big(C(\nu,\rho)-C(\mu,\rho)\big)=\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\Bigg[&\big(C(\nu_{n},\rho)-C(\mu_{n},\rho)\big)\\
&+\big(C(\nu,\rho)-C(\nu_{n},\rho)\big)-\big(C(\mu,\rho)-C(\mu_{n},\rho)\big)\Bigg].
\end{align*}

Applying Proposition 3.1, taking $n\to\infty$ and using (8) and (9) then concludes the proof. ∎

3.3. Proof of Corollary 1.2

We now detail the proof of Corollary 1.2. We start with a preliminary result, which is an immediate corollary of Theorem 1.1.

Corollary 3.2.

Assume $\mu,\nu\in\mathcal{P}_{1}(\mathbb{R}^{d})$. Then we have

(10) \[\inf_{f\ \mathrm{convex}}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right),\]

where $\mathcal{P}^{\infty}(\mathbb{R}^{d})$ denotes the set of probability measures with bounded support. In particular

\[\int f\,d\mu\leq\int f\,d\nu\qquad\text{for all convex functions }f:\mathbb{R}^{d}\to\mathbb{R}\]

if and only if

\[C(\mu,\rho)\leq C(\nu,\rho)\qquad\text{for all }\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d}).\]
Proof.

Multiplying both sides of (1) by $k>0$ yields

\[\inf_{f\in\mathcal{C}^{k}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{k}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right)\]

with the definitions

\[\mathcal{P}^{k}(\mathbb{R}^{d})=\{\rho\in\mathcal{P}(\mathbb{R}^{d}):\ \mathrm{supp}(\rho)\subseteq B_{k}(0)\}\]

and

\[\mathcal{C}^{k}(\mathbb{R}^{d}):=\{f:\mathbb{R}^{d}\to\mathbb{R}\text{ convex},\ \|\partial f\|_{\infty}\leq k\}.\]

Taking $k\to\infty$ we obtain

\[\inf_{f\text{ convex, Lipschitz}}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right).\]

Lastly, any convex function $f:\mathbb{R}^{d}\to\mathbb{R}$ can be approximated pointwise from below by convex Lipschitz functions. Thus

\[\inf_{f\ \mathrm{convex}}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right).\]

The claim thus follows. ∎

Remark 3.3.

If $\mu,\nu\in\mathcal{P}_{p}(\mathbb{R}^{d})$ for some $p\geq 1$, then by Hölder's inequality and density of finitely supported measures in the $q$-Wasserstein space we also obtain

\[\inf_{f\ \mathrm{convex}}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}_{q}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right),\]

where $1/p+1/q=1$.

Proof of Corollary 1.2.

Recall from (4) that

\begin{align*}
C(\mu,\rho)&=\frac{1}{2}\left(\int|x|^{2}\,\mu(dx)+\int|z|^{2}\,\rho(dz)-\mathcal{W}_{2}(\mu,\rho)^{2}\right),\\
C(\nu,\rho)&=\frac{1}{2}\left(\int|y|^{2}\,\nu(dy)+\int|z|^{2}\,\rho(dz)-\mathcal{W}_{2}(\nu,\rho)^{2}\right).
\end{align*}

Combining this with (10) from Corollary 3.2 yields

\begin{align*}
\inf_{f\text{ convex}}\left(\int f\,d\nu-\int f\,d\mu\right)&=\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right)\\
&=\frac{1}{2}\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big(\int|y|^{2}\,\nu(dy)+\int|z|^{2}\,\rho(dz)-\mathcal{W}_{2}(\nu,\rho)^{2}\\
&\qquad\qquad\qquad-\int|x|^{2}\,\mu(dx)-\int|z|^{2}\,\rho(dz)+\mathcal{W}_{2}(\mu,\rho)^{2}\Big)\\
&=\frac{1}{2}\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big(\mathcal{W}_{2}(\mu,\rho)^{2}-\mathcal{W}_{2}(\nu,\rho)^{2}+\int|y|^{2}\,\nu(dy)-\int|x|^{2}\,\mu(dx)\Big).
\end{align*}

Thus

\begin{align*}
&\int f\,d\mu\leq\int f\,d\nu\qquad\text{for all convex functions }f:\mathbb{R}^{d}\to\mathbb{R}\\
\Leftrightarrow\quad&\inf_{f\text{ convex}}\left(\int f\,d\nu-\int f\,d\mu\right)\geq 0\\
\Leftrightarrow\quad&\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big(\mathcal{W}_{2}(\mu,\rho)^{2}-\mathcal{W}_{2}(\nu,\rho)^{2}\Big)\geq\int|x|^{2}\,\mu(dx)-\int|y|^{2}\,\nu(dy)\\
\Leftrightarrow\quad&\sup_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big(\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}\Big)\leq\int|y|^{2}\,\nu(dy)-\int|x|^{2}\,\mu(dx).
\end{align*}

The claim follows. ∎

4. Numerical examples

In this section we illustrate Theorem 1.1 numerically. We focus on the following toy examples, where convex order or its absence is easy to establish:

Example 4.1.

$\mu=\mathcal{N}(0,\sigma^{2}I)$ and $\nu=\mathcal{N}(0,I)$ for $\sigma^{2}\in[0,2]$ and $d=1,2$.

Example 4.2.

$\mu=\frac{1}{2}\left(\delta_{-1-s}+\delta_{1+s}\right)$ and $\nu=\frac{1}{2}\left(\delta_{-1}+\delta_{1}\right)$ for $s\in[-1,1]$.

Example 4.3.

\[\mu=\frac{1}{4}\left(\delta_{(-1-s,0)}+\delta_{(1+s,0)}+\delta_{(0,1+s)}+\delta_{(0,-1-s)}\right)\]

and

\[\nu=\frac{1}{4}\left(\delta_{(-1,0)}+\delta_{(1,0)}+\delta_{(0,1)}+\delta_{(0,-1)}\right)\]

for $s\in[-1,1]$.

A general numerical implementation for testing convex order of two measures $\mu,\nu\in\mathcal{P}_{2}(\mathbb{R}^{d})$ in general dimensions, together with the examples discussed here, can be found in the GitHub repository https://github.com/johanneswiesel/Convex-Order. In the implementation we use the POT package (https://pythonot.github.io) to compute optimal transport distances.

Let us set

\[V(\mu,\nu):=\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right)\]

and note that by Theorem 1.1 we have the relationship

\[\mu\preceq_{c}\nu\quad\Leftrightarrow\quad V(\mu,\nu)\geq 0.\]

Clearly the computation of $V(\mu,\nu)$ hinges on the numerical exploration of the convex set of probability measures $\mathcal{P}^{1}(\mathbb{R}^{d})$. We propose two methods for this: our first method only considers finitely supported measures $\rho$, which are dense in $\mathcal{P}^{1}(\mathbb{R}^{d})$ in the Wasserstein topology. It relies on the Dirichlet distribution on $\mathbb{R}^{g-1}$, $g\in\mathbb{N}$, with density

\[f(x_{1},\dots,x_{g};\alpha_{1},\dots,\alpha_{g})=\frac{1}{B(\alpha)}\prod_{i=1}^{g}x_{i}^{\alpha_{i}-1}\]

for $x_{1},\dots,x_{g}\in[0,1]$ satisfying $\sum_{i=1}^{g}x_{i}=1$. Here $\alpha_{1},\dots,\alpha_{g}>0$, $\alpha:=(\alpha_{1},\dots,\alpha_{g})$ and $B(\alpha)$ denotes the multivariate Beta function. Fixing $g$ grid points $\{k_{1},\dots,k_{g}\}$ in $B_{1}(0)$, we can consider any realization of a Dirichlet random variable $(X_{1},\dots,X_{g})$ as a probability distribution assigning probability mass $X_{i}$ to the grid point $k_{i}$, $i\in\{1,\dots,g\}$. This leads to the following algorithm:

Algorithm 1 Basic algorithm for the Indirect Dirichlet method
Input: probability measures $\mu$, $\nu$, maximal number of evaluations $N$, number of grid points $g$.
Output: $V(\mu,\nu)$.
Generate a grid $G$ of $B_{1}(0)$ of $g$ equidistant points and consider Dirichlet random variables modelling $\rho$ supported on $G$. Use Bayesian optimization to solve

\[\inf\,\big[C(\rho,\nu)-C(\rho,\mu)\big]\]

over the set of Dirichlet distributions on $\mathbb{R}^{g-1}$. Terminate after $N$ steps.
Return: $\inf\,\big[C(\rho,\nu)-C(\rho,\mu)\big]$.

The main computational challenge in Algorithm 1 is the efficient evaluation of $C(\rho,\nu)$ and $C(\rho,\mu)$. For this we aim to write $C(\rho,\nu)$ and $C(\rho,\mu)$ as linear programs. We offer two different variants of Algorithm 1:

  • Indirect Dirichlet method with histograms: if we have access to finitely supported approximations $\textbf{a}$ and $\textbf{b}$ of $\mu$ and $\nu$ respectively and the measure $\rho$ is supported on $G$ as above, then we solve the linear programs $C(\textbf{a},\rho)$ and $C(\textbf{b},\rho)$ as is standard in optimal transport theory (see the sketch after this list).

  • Indirect Dirichlet method with samples: here we draw a number of samples from $\mu$ and $\nu$ respectively and denote the respective empirical distributions of these samples by $\textbf{a}$ and $\textbf{b}$. As before we assume that we have access to a probability measure $\rho$ supported on $G$. We then solve the linear programs $C(\textbf{a},\rho)$ and $C(\textbf{b},\rho)$.
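The evaluation of $C(\textbf{a},\rho)$ and $C(\textbf{b},\rho)$ as linear programs can be carried out with POT's exact solver. The following minimal sketch (our own illustration, not the repository's implementation; function names, grid and weights are assumptions) evaluates the objective $C(\rho,\nu)-C(\rho,\mu)$ for one candidate Dirichlet weight vector on a grid of $B_{1}(0)$:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def max_correlation_cost(x_rho, w_rho, y, w_y):
    """C(rho, .) = sup over couplings of int <x, y> d pi.
    Maximising <x, y> is the same as minimising the cost -<x, y>."""
    M = -x_rho @ y.T                        # cost matrix, shape (len(x_rho), len(y))
    return -ot.emd2(w_rho, w_y, M)          # ot.emd2 returns the minimal transport cost

def objective(w_rho, grid, x_mu, w_mu, x_nu, w_nu):
    """Objective of Algorithm 1 for rho = sum_i w_rho[i] * delta_{grid[i]}."""
    return (max_correlation_cost(grid, w_rho, x_nu, w_nu)
            - max_correlation_cost(grid, w_rho, x_mu, w_mu))

# Toy usage: Example 4.2 with s = 0.5, so mu is NOT dominated by nu in convex order.
grid = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)        # grid of B_1(0) in d = 1
x_mu, w_mu = np.array([[-1.5], [1.5]]), np.array([0.5, 0.5])
x_nu, w_nu = np.array([[-1.0], [1.0]]), np.array([0.5, 0.5])
rng = np.random.default_rng(0)
w_rho = rng.dirichlet(np.ones(len(grid)))                # one Dirichlet candidate
# A negative value for some rho shows V(mu, nu) < 0, i.e. mu is not <=_c nu.
print(objective(w_rho, grid, x_mu, w_mu, x_nu, w_nu))
```

In the full algorithm, the Bayesian optimizer repeatedly proposes new Dirichlet parameters and queries this objective until the evaluation budget $N$ is exhausted.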

An alternative to Algorithm 1 is to directly draw samples from a distribution $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$. We call this the Direct randomized Dirichlet method, see Algorithm 2 below.

Algorithm 2 Direct randomized Dirichlet method
Input: probability measures $\mu$, $\nu$, maximal number of evaluations $N$.
Output: $V(\mu,\nu)$.
Draw samples from $\mu$ and $\nu$ and denote the empirical distributions of these samples by $\textbf{a}$ and $\textbf{b}$ respectively. Draw samples from a Dirichlet distribution and randomize their signs, under the constraint that the empirical distribution $\rho$ of these samples is an element of $\mathcal{P}^{1}(\mathbb{R}^{d})$. Use Bayesian optimization to solve

\[\inf\,\big[C(\rho,\nu)-C(\rho,\mu)\big]\]

over the set of these distributions. Terminate after $N$ steps.
Return: minimal value of $C(\rho,\nu)-C(\rho,\mu)$.
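The support construction in Algorithm 2 can be sketched as follows (our reading of the description above; the flat Dirichlet parameter and the function name are illustrative choices):

```python
import numpy as np

def draw_direct_dirichlet_support(m, d, alpha=1.0, seed=None):
    """Draw m candidate support points for rho: Dirichlet samples in R^d with
    independently randomized signs. Since the coordinates of each Dirichlet
    sample sum to 1, every point has Euclidean norm at most 1, so the empirical
    measure of these points is an element of P^1(R^d)."""
    rng = np.random.default_rng(seed)
    pts = rng.dirichlet(alpha * np.ones(d), size=m)    # rows on the unit simplex
    signs = rng.choice([-1.0, 1.0], size=(m, d))       # random sign flips
    return pts * signs                                  # support of rho (taken with uniform weights)
```

The objective $C(\rho,\nu)-C(\rho,\mu)$ for the resulting empirical measure can then be evaluated exactly as in the sketch after Algorithm 1.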

We refer to the GitHub repository for a more detailed discussion, in particular for the implementation and further comments. For each example stated at the beginning of this section and each pair $(\mu,\nu)$ we plot $V(\mu,\nu)$ for the three methods discussed above, see Figures 1 and 2.

Figure 1. Values of different estimators of $V(\mu,\nu)$ plotted against $\sigma$ for Example 4.1. Both plots use $N=100$ samples.
Figure 2. Values of different estimators of $V(\mu,\nu)$ plotted against $s$ for Examples 4.2 (left) and 4.3 (right). Both plots use $N=100$ samples.

Discounting numerical errors, all estimators seem to detect convex order. The direct randomized Dirichlet method is less complex; however, it does not seem to explore the space $\mathcal{P}^{1}(\mathbb{R}^{d})$ as well as the two indirect Dirichlet methods. On the other hand, both indirect Dirichlet methods yield very similar results for the examples considered. As the name suggests, the “indirect Dirichlet method with samples” works on samples directly, which might be more convenient for practical applications on real data.

As can be expected from the numerical implementation, the histogram method consistently yields the lowest runtimes, while the runtimes of the other methods are much higher. Indeed, when working with samples, the weights of the empirical distributions are constant, while the OT cost matrices $\textbf{M}_{\textbf{a}}$ and $\textbf{M}_{\textbf{b}}$ in the implementation have to be re-computed in each iteration, which is very costly; for the histogram method, the weights of $\rho$ change while the grid stays constant, and thus $\textbf{M}_{\textbf{a}}$ and $\textbf{M}_{\textbf{b}}$ stay constant as well.

5. Model independent arbitrage strategies

Let us consider a financial market with $d$ financial assets and denote its price process by $(S_{t})_{t\geq 0}$. Let us assume $S_{0}=s_{0}\in\mathbb{R}$ and fix two maturities $T_{1}<T_{2}$. If call options with these maturities are traded at all strikes, then the prices of the call options determine the distribution of $S_{T_{1}}$ and $S_{T_{2}}$ under any martingale measure; this fact was first established by Breeden and Litzenberger (1978). Let us denote the laws of $S_{T_{1}}$ and $S_{T_{2}}$ by $\mu$ and $\nu$ respectively. If trading is only allowed at times $0$, $T_{1}$ and $T_{2}$, the following definition is natural and will be crucial for our analysis.

Definition 5.1.

The triple of measurable functions $(u_{1},u_{2},\Delta)$ is a model-independent arbitrage if $u_{1}\in L^{1}(\mu)$, $u_{2}\in L^{1}(\nu)$ and

\[u_{1}(x)-\int u_{1}\,d\mu+u_{2}(y)-\int u_{2}\,d\nu+\Delta(x)(y-x)>0\quad\text{for all }(x,y)\in\mathbb{R}^{d}\times\mathbb{R}^{d}.\]

If no such strategies exist, then we call the market free of model-independent arbitrage.

In the above, $u_{1}$ and $u_{2}$ can be interpreted as payoffs of Vanilla options with market prices $\int u_{1}\,d\mu$ and $\int u_{2}\,d\nu$ respectively, while the term $\Delta(x)(y-x)$ denotes the gains or losses from buying $\Delta(x)$ assets at time $T_{1}$ and holding them until $T_{2}$.

The following theorem makes the connection between model-independent arbitrages and convex order of $\mu$ and $\nu$ apparent. It can essentially be found in (Guyon et al., 2017, Theorem 3.4).

Theorem 5.2.

The following are equivalent:

  (i) The market is free of model-independent arbitrage.

  (ii) $\mathcal{M}(\mu,\nu)\neq\emptyset$.

  (iii) $\mu\preceq_{c}\nu$.

In particular, if $\mu\npreceq_{c}\nu$, then there exists a convex function $f$ such that the triple $(-f(x),f(y),-g(x))$ is a model-independent arbitrage. Here $g$ is a measurable selector of the subdifferential of $f$.

The strategy $(-f(x),f(y),-g(x))$ is often called a calendar spread. As our setting is not exactly covered by (Guyon et al., 2017, Theorem 3.4) and the proof is not hard, we include it here.

Proof of Theorem 5.2.

(ii)$\Leftrightarrow$(iii) is Strassen's theorem, see Strassen (1965). If $\mu\npreceq_{c}\nu$, then by definition there exists a convex function $f$ such that

\[\int f\,d\mu>\int f\,d\nu.\]

On the other hand, $f$ is convex and thus satisfies

\[f(y)-f(x)\geq g(x)(y-x)\quad\text{for all }(x,y)\in\mathbb{R}^{d}\times\mathbb{R}^{d}.\]

Combining the two inequalities above shows that $(-f(x),f(y),-g(x))$ is a model-independent arbitrage, and thus (i)$\Rightarrow$(iii). It remains to show (ii)$\Rightarrow$(i), which is well known. Indeed, taking expectations in the inequality

\[u_{1}(x)-\int u_{1}\,d\mu+u_{2}(y)-\int u_{2}\,d\nu+\Delta(x)(y-x)>0\quad\text{for all }(x,y)\in\mathbb{R}^{d}\times\mathbb{R}^{d}\]

under any martingale measure with marginals $\mu,\nu$ leads to a contradiction. This concludes the proof. ∎

As a direct consequence of Theorem 5.2, we can use Theorem 1.1 to detect model-independent arbitrages in the market under consideration: indeed, Theorem 1.1 states that $\mu\npreceq_{c}\nu$ implies the existence of a probability measure $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$ satisfying

\[C(\rho,\nu)-C(\rho,\mu)<0.\]

Next, if $\nu\sim\lambda$, then the proof of Theorem 1.1 shows that $\rho=\nabla\hat{f}_{*}\nu$ for some convex function $\hat{f}:\mathbb{R}^{d}\to\mathbb{R}$ and

\[\int\hat{f}\,d\nu-\int\hat{f}\,d\mu\leq C(\rho,\nu)-C(\rho,\mu)<0,\quad\text{i.e. }\int\hat{f}\,d\nu<\int\hat{f}\,d\mu.\]

In particular, a model-independent arbitrage strategy is given by the calendar spread $(-\hat{f}(x),\hat{f}(y),-\nabla\hat{f}(x))$. Via an approximation argument, this result remains true for arbitrary probability measures $\nu\in\mathcal{P}_{1}(\mathbb{R}^{d})$. In particular, we can use the same methods as in Section 4 to find $\rho$. We then estimate $\nabla\hat{f}(x)$ from the optimizing transport plan $\pi\in\Pi(\rho,\nu)$ of $C(\rho,\nu)$ by taking the conditional expectation $\int x\,\pi_{y}(dx)$, where $(\pi_{y})_{y\in\mathbb{R}^{d}}$ denotes the conditional probability distribution of $\pi$ with respect to its second marginal $\nu$. This is a standard technique (see e.g. Deb et al. (2021) for details). In conclusion we can obtain an explicit arbitrage strategy.
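For discrete measures, this barycentric-projection step can be sketched with POT's exact solver as follows (an illustration under our own naming conventions, not the repository's implementation):

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def barycentric_gradient_estimate(x_rho, w_rho, y_nu, w_nu):
    """Estimate grad f_hat at the support points of nu via the barycentric
    projection of an optimal plan for C(rho, nu).

    x_rho : (m, d) support points of rho,  w_rho : (m,) weights summing to 1
    y_nu  : (n, d) support points of nu,   w_nu  : (n,) weights summing to 1
    Returns an (n, d) array whose j-th row approximates grad f_hat(y_j).
    """
    # C(rho, nu) maximises <x, y>, i.e. it minimises the cost -<x, y>
    M = -x_rho @ y_nu.T                     # (m, n) cost matrix
    plan = ot.emd(w_rho, w_nu, M)           # optimal coupling pi, shape (m, n)
    # conditional expectation int x pi_y(dx): weighted average of x over each column of the plan
    return (plan.T @ x_rho) / w_nu[:, None]
```

An estimate of $\hat{f}$ itself can then be obtained, for instance, by numerically integrating these gradient estimates.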

To illustrate the ideas outlined above, we return to Example 4.1, i.e. $\mu=\mathcal{N}(0,\sigma^{2}I)$ and $\nu=\mathcal{N}(0,I)$ for $\sigma^{2}>0$ and $d=1,2$. Having determined $\rho$ such that $C(\rho,\nu)-C(\rho,\mu)<0$, we estimate $\nabla\hat{f}$ numerically. We show estimates for $\nabla\hat{f}$ and $\hat{f}$ in the plots below.

Figure 3. Plot of estimates for $\nabla\hat{f}$ and $\hat{f}$ for $\mu=\mathcal{N}(0,2)$, $\nu=\mathcal{N}(0,1)$, $d=1$. Both plots use $N=100$ samples.
Figure 4. Plot of estimate for $\hat{f}$ for $\mu=\mathcal{N}(0,4I)$, $\nu=\mathcal{N}(0,I)$, $d=2$. Both plots use $N=100$ samples.

6. Remaining proofs

Proof of Corollary 2.3.

Recall that a function $g:\mathbb{R}^{d}\to\mathbb{R}$ is $c$-concave iff $f(x):=|x|^{2}/2-g(x)$ is convex. In particular

\begin{align*}
\int g\,d\mu-\int g\,d\nu&=\left(-\int\left[\frac{|x|^{2}}{2}-g(x)\right]\,\mu(dx)+\int\left[\frac{|y|^{2}}{2}-g(y)\right]\,\nu(dy)\right)\\
&\qquad+\int\frac{|x|^{2}}{2}\,\mu(dx)-\int\frac{|y|^{2}}{2}\,\nu(dy)\\
&=\int f\,d\nu-\int f\,d\mu+\int\frac{|x|^{2}}{2}\,\mu(dx)-\int\frac{|y|^{2}}{2}\,\nu(dy).
\end{align*}

By (10) we obtain

\begin{align*}
\inf_{g\ c\text{-concave}}\left(\int g\,d\mu-\int g\,d\nu\right)&=\inf_{f\text{ convex}}\left(\int f\,d\nu-\int f\,d\mu\right)+\int\frac{|x|^{2}}{2}\,\mu(dx)-\int\frac{|y|^{2}}{2}\,\nu(dy)\\
&=\frac{1}{2}\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big(\mathcal{W}_{2}(\mu,\rho)^{2}-\mathcal{W}_{2}(\nu,\rho)^{2}+\int|y|^{2}\,\nu(dy)-\int|x|^{2}\,\mu(dx)\\
&\qquad\qquad\qquad+\int|x|^{2}\,\mu(dx)-\int|y|^{2}\,\nu(dy)\Big)\\
&=\frac{1}{2}\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big(\mathcal{W}_{2}(\mu,\rho)^{2}-\mathcal{W}_{2}(\nu,\rho)^{2}\Big).
\end{align*}

This concludes the proof. ∎

Proof of Corollary 2.4.

The first claim follows from Corollary 1.2 by setting $\rho=\mu$. By (4) the above implies

\[2\int|x|^{2}\,\mu(dx)\leq 2\sup_{\pi\in\Pi(\mu,\nu)}\int\langle x,y\rangle\,\pi(dx,dy),\]

so the second claim follows. ∎

Proof of Corollary 2.1.

First, (Wang et al., 2020, Theorem 2 & Lemma 1) show that $\mu\preceq_{c}\nu$ iff

(11) \[\int_{0}^{1}[F_{\nu}^{-1}(1-u)-F_{\mu}^{-1}(1-u)]\,dh(u)\geq 0\]

for all concave functions $h$ such that the above integral is finite. As any concave function is Lebesgue-almost surely differentiable, standard approximation arguments imply that (11) holds iff

\[\int_{0}^{1}g(u)[F_{\nu}^{-1}(u)-F_{\mu}^{-1}(u)]\,du\geq 0\]

for all bounded increasing left-continuous functions $g:(0,1)\to\mathbb{R}$. But

\[\{F_{\rho}^{-1}:\ \rho\in\mathcal{P}(\mathbb{R})\text{ with bounded support}\}\]

is exactly the set of all bounded increasing left-continuous functions on $(0,1)$. Noting that by (Villani, 2003, Equation (2.47))

\begin{align*}
\mathcal{W}_{2}(\nu,\rho)^{2}&=\int_{0}^{1}(F_{\nu}^{-1}(x)-F_{\rho}^{-1}(x))^{2}\,dx\\
&=\int y^{2}\,\nu(dy)-2\int_{0}^{1}F_{\nu}^{-1}(x)F_{\rho}^{-1}(x)\,dx+\int z^{2}\,\rho(dz),
\end{align*}

we calculate

\begin{align*}
\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}&=\int y^{2}\,\nu(dy)-2\int_{0}^{1}F_{\rho}^{-1}(u)F_{\nu}^{-1}(u)\,du+\int z^{2}\,\rho(dz)\\
&\quad-\int x^{2}\,\mu(dx)+2\int_{0}^{1}F_{\rho}^{-1}(u)F_{\mu}^{-1}(u)\,du-\int z^{2}\,\rho(dz)\\
&=2\int_{0}^{1}F_{\rho}^{-1}(u)[F_{\mu}^{-1}(u)-F_{\nu}^{-1}(u)]\,du+\int y^{2}\,\nu(dy)-\int x^{2}\,\mu(dx).
\end{align*}

This concludes the proof. ∎

References

  • Acciaio et al. [2013] B. Acciaio, M. Beiglböck, F. Penkner, and W. Schachermayer. A model-free version of the Fundamental Theorem of Asset Pricing and the Super-replication Theorem. Math. Finance, DOI: 10.1111/mafi.12060, 2013.
  • Alfonsi and Jourdain [2020] A. Alfonsi and B. Jourdain. Squared quadratic Wasserstein distance: optimal couplings and Lions differentiability. ESAIM Prob. Stat., 24:703–717, 2020.
  • Alfonsi et al. [2019] A. Alfonsi, J. Corbetta, and B. Jourdain. Sampling of one-dimensional probability measures in the convex order and computation of robust option price bounds. Int. J. Theor. Appl. Finance, 22(3), 2019.
  • Alfonsi et al. [2020] A. Alfonsi, J. Corbetta, and B. Jourdain. Sampling of probability measures in the convex order by Wasserstein projection. Ann. Henri Poincare, 56(3):1706–1729, 2020.
  • Arnold [2012] B. Arnold. Majorization and the Lorenz order: A brief introduction, volume 43. Springer Science & Business Media, 2012.
  • Beiglböck et al. [2013] M. Beiglböck, P. Henry-Labordère, and F. Penkner. Model-independent bounds for option prices—a mass transport approach. Finance Stoch., 17(3):477–501, 2013.
  • Beiglböck et al. [2015] M. Beiglböck, M. Nutz, and N. Touzi. Complete duality for martingale optimal transport on the line. Ann. Prob., 45(5):3038–3074, 2015.
  • Bernard et al. [2017] C. Bernard, L. Rüschendorf, and S. Vanduffel. Value-at-risk bounds with variance constraints. J. Risk Insur., 84(3):923–959, 2017.
  • Breeden and Litzenberger [1978] D. Breeden and R. Litzenberger. Prices of state-contingent claims implicit in option prices. Journal of Business, pages 621–651, 1978.
  • Brenier [1991] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure Appl. Math., 44(4):375–417, 1991.
  • Carlier [2008] Guillaume Carlier. Remarks on Toland's duality, convexity constraint and optimal transport. Pacific Journal of Optimization, 4(3):423–432, 2008.
  • De March and Touzi [2019] H. De March and N. Touzi. Irreducible convex paving for decomposition of multidimensional martingale transport plans. Ann. Prob., 47(3):1726–1774, 2019.
  • Deb et al. [2021] Nabarun Deb, Promit Ghosal, and Bodhisattva Sen. Rates of estimation of optimal transport maps using plug-in estimators via barycentric projections. Advances in Neural Information Processing Systems, 34:29736–29753, 2021.
  • Embrechts et al. [2013] P. Embrechts, G. Puccetti, and L. Rüschendorf. Model uncertainty and VaR aggregation. J. Bank. Financ., 37(8):2750–2764, 2013.
  • Galichon et al. [2014] A. Galichon, P. Henry-Labordère, and N. Touzi. A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options. Ann. Appl. Prob., 24(1):312–336, 2014.
  • Guo and Obłój [2019] Gaoyue Guo and Jan Obłój. Computational methods for martingale optimal transport problems. Ann. Appl. Prob., 29(6):3311–3347, 2019.
  • Guyon et al. [2017] Julien Guyon, Romain Menegaux, and Marcel Nutz. Bounds for VIX futures given S&P 500 smiles. Finance and Stochastics, 21:593–630, 2017.
  • Jourdain and Margheriti [2022] B. Jourdain and W. Margheriti. Martingale Wasserstein inequality for probability measures in the convex order. Bernoulli, 28(2):830–858, 2022.
  • Kantorovich [1958] L. Kantorovich. On the translocation of masses. Manag. Sci., (5):1–4, 1958.
  • Massa and Siorpaes [2022] M. Massa and P. Siorpaes. How to quantise probabilities while preserving their convex order. arXiv preprint arXiv:2206.10514, 2022.
  • Monge [1781] G. Monge. Mémoire sur la théorie des déblais et des remblais. De l’Imprimerie Royale, 1781.
  • Müller and Stoyan [2002] A. Müller and D. Stoyan. Comparison methods for stochastic models and risks, volume 389. Wiley, 2002.
  • Nendel [2020] M. Nendel. A note on stochastic dominance, uniform integrability and lattice properties. Bull. Lond. Math. Soc., 52(5):907–923, 2020.
  • Obłój and Siorpaes [2017] J. Obłój and P. Siorpaes. Structure of martingale transports in finite dimensions. arXiv preprint arXiv:1702.08433, 2017.
  • Peyré and Cuturi [2019] G. Peyré and M. Cuturi. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
  • Rachev and Rüschendorf [1998] S. Rachev and L. Rüschendorf. Mass Transportation Problems: Volume I: Theory, volume 1. Springer Science & Business Media, 1998.
  • Ross et al. [1996] S. M. Ross, J. Kelly, R. Sullivan, W. Perry, D. Mercer, R. Davis, T. Washburn, E. Sager, J. Boyce, and V. Bristow. Stochastic processes, volume 2. Wiley, New York, 1996.
  • Rüschendorf and Rachev [1990] L. Rüschendorf and S. Rachev. A characterization of random variables with minimum L2-distance. J. Multivariate Anal., 32(1):48–54, 1990.
  • Rüschendorf and Uckelmann [2002] L. Rüschendorf and L. Uckelmann. Variance minimization and random variables with constant sum. In et al. Cuadras, editor, Distributions with given marginals and statistical modelling, pages 211–222. Springer, 2002.
  • Shaked and Shanthikumar [2007] M. Shaked and J. Shanthikumar. Stochastic orders. Springer, 2007.
  • Strassen [1965] V. Strassen. The existence of probability measures with given marginals. Ann. Math. Statist., pages 423–439, 1965.
  • Tchen [1980] A. Tchen. Inequalities for distributions with given marginals. Ann. Prob., pages 814–827, 1980.
  • Villani [2003] C. Villani. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.
  • Villani [2008] C. Villani. Optimal transport: old and new, volume 338. Springer Berlin, 2008.
  • Wang and Wang [2011] B. Wang and R. Wang. The complete mixability and convex minimization problems with monotone marginal densities. J. Multivariate Anal., 102(10):1344–1360, 2011.
  • Wang et al. [2020] Q. Wang, R. Wang, and Y. Wei. Distortion riskmetrics on general spaces. Astin Bull., 50(3):827–851, 2020.