An optimal transport based characterization of convex order

Johannes Wiesel
Columbia University, Department of Statistics
1255 Amsterdam Avenue
New York, NY 10027, USA

Erica Zhang
Columbia University, Department of Statistics
1255 Amsterdam Avenue
New York, NY 10027, USA

Abstract.

For probability measures $\mu,\nu$ and $\rho$ define the cost functionals

\displaystyle C(\mu,\rho):=\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy),\quad C(\nu,\rho):=\sup_{\pi\in\Pi(\nu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy),

where $\langle\cdot,\cdot\rangle$ denotes the scalar product and $\Pi(\cdot,\cdot)$ is the set of couplings. We show that two probability measures $\mu$ and $\nu$ on $\mathbb{R}^{d}$ with finite first moments are in convex order (i.e. $\mu\preceq_{c}\nu$ ) iff $C(\mu,\rho)\leq C(\nu,\rho)$ holds for all probability measures $\rho$ on $\mathbb{R}^{d}$ with bounded support. This generalizes a result by Carlier. Our proof relies on a quantitative bound for the infimum of $\int f\,d\nu-\int f\,d\mu$ over all $1$ -Lipschitz functions $f$ , which is obtained through optimal transport duality and Brenier’s theorem. Building on this result, we derive new proofs of well-known one-dimensional characterizations of convex order. We also describe new computational methods for investigating convex order and applications to model-independent arbitrage strategies in mathematical finance.

JW acknowledges support by NSF Grant DMS-2205534. Part of this research was performed while JW was visiting the Institute for Mathematical and Statistical Innovation (IMSI), which is supported by the National Science Foundation (Grant No. DMS-1929348). JW thanks Beatrice Acciaio, Guillaume Carlier, Max Nendel, Gudmund Pammer and Ruodu Wang for helpful discussions. EZ acknowledges support through the summer internship program of the Columbia University Department of Statistics.

1. Introduction and main result

Fix two probability measures $\mu,\nu\in\mathcal{P}(\mathbb{R}^{d})$ with

\int|x|\,\mu(dx)<\infty,\quad\int|y|\,\nu(dy)<\infty.

Recall that $\mu$ and $\nu$ are in convex order (denoted by $\mu\preceq_{c}\nu$ ) iff

\displaystyle\int f\,d\mu\leq\int f\,d\nu\qquad\text{for all convex functions }f:\mathbb{R}^{d}\to\mathbb{R}.

As any convex function is bounded from below by an affine function, the above integrals take values in $(-\infty,\infty]$ . The notion of convex order is very well studied, see e.g. Ross et al. (1996); Müller and Stoyan (2002); Shaked and Shanthikumar (2007); Arnold (2012) and the references therein for an overview. It plays a pivotal role in mathematical finance since Strassen (1965) established that $\mu\preceq_{c}\nu$ if and only if $\mathcal{M}(\mu,\nu)$ — the set of martingale laws on $\mathbb{R}^{d}\times\mathbb{R}^{d}$ with marginals $\mu$ and $\nu$ — is non-empty. This result is also the reason why convex order has taken the center stage in the field of martingale optimal transport, see e.g. Galichon et al. (2014); Beiglböck et al. (2013, 2015); De March and Touzi (2019); Obłój and Siorpaes (2017); Guo and Obłój (2019); Alfonsi et al. (2019, 2020); Alfonsi and Jourdain (2020); Jourdain and Margheriti (2022); Massa and Siorpaes (2022) and the references therein. Furthermore, convex order plays a pivotal role in dependence modelling and risk aggregation, see e.g. Tchen (1980); Rüschendorf and Uckelmann (2002); Wang and Wang (2011); Embrechts et al. (2013); Bernard et al. (2017).
While there is an abundance of explicit characterizations of convex order available in one dimension (i.e. $d=1$), see e.g. (Shaked and Shanthikumar, 2007, Chapter 3), the case $d>1$ seems to be less studied, to the best of our knowledge. The main goal of this article is to fill this gap: we discuss a characterization of convex order that holds in general dimensions and is based on the theory of optimal transport (OT). Optimal transport goes back to the seminal works of Monge (1781) and Kantorovich (1958). It is concerned with the problem of transporting probability distributions in a cost-optimal way. We refer to Rachev and Rüschendorf (1998) and Villani (2003, 2008) for an overview. For this paper we only need a few basic concepts from OT. Most importantly, we will need the cost functionals

\displaystyle C(\mu,\rho):=\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy),\qquad C(\nu,\rho):=\sup_{\pi\in\Pi(\nu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy).

Here $\Pi(\mu,\nu)$ denotes the set of probability measures on $\mathbb{R}^{d}\times\mathbb{R}^{d}$ with marginals $\mu$ and $\nu$ . Our main result is the following:

Theorem 1.1.

Assume that $\mu,\nu\in\mathcal{P}(\mathbb{R}^{d})$ have finite first moments. Then

(1)

\displaystyle\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right),

where

\displaystyle\mathcal{P}^{1}(\mathbb{R}^{d}):=\{\rho\in\mathcal{P}(\mathbb{R}^{d}):\ \mathrm{supp}(\rho)\subseteq B_{1}(0)\}

and

\displaystyle\mathcal{C}^{1}(\mathbb{R}^{d}):=\{f:\mathbb{R}^{d}\to\mathbb{R}\ \mathrm{convex},1\text{-}\mathrm{Lipschitz}\}.

Theorem 1.1 implies that $\mu\preceq_{c}\nu$ is equivalent to the order relation $C(\mu,\rho)\leq C(\nu,\rho)$ for all probability measures $\rho$ with bounded support. Contrary to standard characterizations of convex order using potential functions or cdfs, it holds in any dimension and can be seen as a natural generalization of the following result:

Corollary 1.2.

Denote the 2-Wasserstein metric by

\displaystyle\mathcal{W}_{2}(\mu,\nu):=\inf_{\pi\in\Pi(\mu,\nu)}\sqrt{\int|x-y|^{2}\,\pi(dx,dy)}.

If $\mu$ and $\nu$ have finite second moment, then they are in convex order if and only if

(2)

\displaystyle\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}\leq\int|y|^{2}\,\nu(dy)-\int|x|^{2}\,\mu(dx)

holds for all probability measures $\rho$ on $\mathbb{R}^{d}$ with bounded support.
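Inequality (2) is easy to probe numerically in one dimension. The following is a minimal sketch, assuming $d=1$ and uniform measures on equally many atoms, in which case the optimal coupling for the squared distance pairs the sorted atoms. The helper names and the choice of test measures are illustrative and not taken from the paper's implementation.

```python
import random


def w2_squared_sorted(xs, ys):
    """Squared 2-Wasserstein distance between uniform 1-d measures on equally many atoms."""
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def second_moment(xs):
    """Second moment of the uniform measure on the atoms xs."""
    return sum(x * x for x in xs) / len(xs)


rng = random.Random(0)
n = 100
mu = [-0.5, 0.5] * (n // 2)  # mu = (delta_{-1/2} + delta_{1/2}) / 2
nu = [-1.0, 1.0] * (n // 2)  # nu = (delta_{-1} + delta_{1}) / 2, so mu is below nu in convex order

# Right-hand side of (2): difference of second moments.
rhs = second_moment(nu) - second_moment(mu)

# Left-hand side of (2), maximised over 50 randomly drawn test measures rho with bounded support.
worst_lhs = max(
    w2_squared_sorted(nu, rho) - w2_squared_sorted(mu, rho)
    for rho in ([rng.uniform(-3.0, 3.0) for _ in range(n)] for _ in range(50))
)
```

A finite sample of test measures $\rho$ can only refute convex order, never certify it; here the inequality holds for every sampled $\rho$, as Corollary 1.2 predicts.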

Corollary 1.2 itself has an interesting history. To the best of our knowledge, it was first stated in Carlier (2008) for compactly supported measures $\mu,\nu$. His proof relies on a well-known connection between convex functions and OT for the squared Euclidean distance called Brenier’s theorem (see Brenier (1991); Rüschendorf and Rachev (1990)) together with a certain probabilistic first-order condition, see (Carlier, 2008, Proposition 1). We emphasize here that, contrary to the setting of Brenier’s theorem, no assumptions on the probability measures $\mu$ and $\nu$ except for the compact support condition are made; in particular there is no need to assume that these are absolutely continuous with respect to the Lebesgue measure.

Interestingly, Carlier’s result does not seem to be very well-known in the literature on stochastic order. We conjecture that this is mainly due to his use of the French word “balayée” instead of convex order, so that the connection is not immediately apparent. For this reason, one aim of this note is to popularize Carlier’s result, making it accessible to a wider audience, while simultaneously showcasing potential applications. As it turns out, Corollary 1.2 is at least partially known to the mathematical finance community: indeed, the “only if” direction of Corollary 1.2 was rediscovered in (Alfonsi and Jourdain, 2020, Equation (2.2)) for (not necessarily compactly supported) probability measures $\mu,\nu$ with finite second moments.

Theorem 1.1 differs from Carlier’s work in three aspects: first, as the convex order is classically embedded in $\mathcal{P}_{1}(\mathbb{R}^{d})$ and does not require moments of higher order or compact support assumptions (see e.g. Nendel (2020)), Theorem 1.1 is simultaneously more concise and arguably more natural than Corollary 1.2. Second, our proof of Theorem 1.1 (and thus also Corollary 1.2) follows a different route than Carlier’s original proof, which argues purely on the space of probability measures (i.e. the “primal side” in optimal transport). Instead, we combine Brenier’s theorem with the theory of the classical optimal transport duality. Lastly, we discuss three implications of Theorem 1.1: we first give a proof of a characterization of convex order in one dimension through quantile functions. Then we use Theorem 1.1 to derive new computational methods for testing convex order between $\mu$ and $\nu$. For the computation we exploit state-of-the-art computational OT methods, which are efficient for potentially high-dimensional problems. These have recently seen a spike in research activity. We refer to Peyré and Cuturi (2019) for an overview. Finally we discuss applications of Theorem 1.1 to the theory of so-called model-independent arbitrages, see (Acciaio et al., 2013, Definition 1.2).

This article is structured as follows: in Section 2 we state examples and consequences of Theorem 1.1. In particular we connect it to some well-known results in the theory of convex order. The proof of the main results is given in Section 3. Sections 4 and 5 discuss numerical and mathematical finance applications of Theorem 1.1 respectively. Remaining proofs are collected in Section 6.

2. Discussion and consequences of main results

To sharpen intuition, let us first discuss the case $d=1$ . By Theorem 1.1 we can obtain a new proof of a well-known representation of convex order on the real line, see e.g. (Shaked and Shanthikumar, 2007, Theorem 3.A.5). Here we denote the quantile function of a probability measure $\mu$ by

\displaystyle F_{\mu}^{-1}(x):=\inf\{y\in\mathbb{R}:\ \mu((-\infty,y])\geq x\}.

Corollary 2.1.

For $d=1$ we have

\displaystyle\mu\preceq_{c}\nu\quad\Leftrightarrow\quad\int_{0}^{x}[F^{-1}_{\mu}(y)-F_{\nu}^{-1}(y)]\,dy\geq 0\quad\text{for all }x\in[0,1],

with equality for $x=1$.
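For empirical measures the criterion of Corollary 2.1 reduces to a finite check. The following is a minimal sketch, assuming both measures are uniform on the same number $n$ of atoms: the quantile functions are then step functions with jumps at multiples of $1/n$, the integral is piecewise linear in $x$, and it suffices to inspect it at $x=1/n,2/n,\dots,1$. The function name and the tolerance are illustrative choices.

```python
from itertools import accumulate


def in_convex_order_1d(mu_atoms, nu_atoms, tol=1e-12):
    """Check the quantile criterion for two uniform measures on n atoms each."""
    if len(mu_atoms) != len(nu_atoms):
        raise ValueError("this sketch assumes equally many atoms")
    n = len(mu_atoms)
    # On ((i-1)/n, i/n] the quantile functions equal the i-th order statistics.
    increments = [(x - y) / n for x, y in zip(sorted(mu_atoms), sorted(nu_atoms))]
    # Values of the integral at x = 1/n, 2/n, ..., 1; it is linear in between.
    partial = list(accumulate(increments))
    # Non-negative everywhere, with equality (equal means) at x = 1.
    return all(p >= -tol for p in partial) and abs(partial[-1]) <= tol
```

For instance, `in_convex_order_1d([-0.5, 0.5], [-1.0, 1.0])` compares the two-point measures on $\{\pm 1/2\}$ and $\{\pm 1\}$, while swapping the arguments reverses the roles of $\mu$ and $\nu$.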

The proofs of all results of this section are collected in Section 6. We continue with general $d\in\mathbb{N}$ and give a geometric interpretation of Corollary 1.2 by restating it as follows: $\mu\preceq_{c}\nu$ holds iff

(3)

\displaystyle\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}\leq\mathcal{W}_{2}(\nu,\delta_{z})^{2}-\mathcal{W}_{2}(\mu,\delta_{z})^{2}

for all $\rho\in\mathcal{P}(\mathbb{R}^{d})$ with bounded support, where $\delta_{z}$ denotes the Dirac measure at an arbitrary point $z\in\mathbb{R}^{d}$. Indeed, varying $\rho$ over Dirac measures in (2) implies that the means of $\mu$ and $\nu$ have to be equal; equation (3) then follows from simple algebra. This implies in particular that the difference between the squared Wasserstein costs from $\nu$ and from $\mu$ to $\rho$ is maximised at the point masses. Lastly, Theorem 1.1 can also be reformulated as: $\mu\preceq_{c}\nu$ iff

\displaystyle\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,z\rangle\,\pi(dx,dz)\leq\sup_{\pi\in\Pi(\nu,\rho)}\int\langle y,z\rangle\,\pi(dy,dz),

i.e. for any $\rho\in\mathcal{P}(\mathbb{R}^{d})$ with bounded support, the maximal covariance between $\mu$ and $\rho$ is at most the one between $\nu$ and $\rho$. This provides a natural intuition for a classical pedestrian description of convex order, namely that “$\nu$ is more spread out than $\mu$”.

We next give a simple example for Corollary 1.2.

Example 2.2.

Let us take $\mu=\delta_{0}$ and $\nu$ with mean zero. Now, recalling (4) and bounding $\mathcal{W}_{2}(\nu,\rho)$ from above by choosing the product coupling, we obtain that for any $\rho$ with finite second moment

	$\displaystyle\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}$	$\displaystyle=\mathcal{W}_{2}(\nu,\rho)^{2}-\int\|x\|^{2}\,\rho(dx)$
		$\displaystyle\leq\int\|y\|^{2}\,\nu(dy)-2\int\langle x,y\rangle\,\rho(dx)\nu(dy)$
		$\displaystyle=\int\|y\|^{2}\nu(dy)$
		$\displaystyle=\int\|y\|^{2}\,\nu(dy)-\int\|x\|^{2}\,\mu(dx).$

In conclusion we recover the well-known fact $\delta_{0}\preceq_{c}\nu$ .

We now state two direct corollaries of Corollary 1.2. We consider the cost $c(x,y):=|x-y|^{2}/2$ and recall that a function $f$ is $c$ -concave, if

\displaystyle f(x)=\inf_{y\in\mathbb{R}^{d}}(g(y)-c(x,y))

for some function $g:\mathbb{R}^{d}\to\mathbb{R}$ . We then have the following:

Corollary 2.3.

We have

\displaystyle\int g\,d\nu\leq\int g\,d\mu\qquad\text{for all }c\text{-concave functions }g:\mathbb{R}^{d}\to\mathbb{R}

if and only if

\displaystyle\mathcal{W}_{2}(\nu,\rho)^{2}\leq\mathcal{W}_{2}(\mu,\rho)^{2}\qquad\text{for all }\rho\in\mathcal{P}(\mathbb{R}^{d})\text{ with compact support}.

Corollary 1.2 also directly implies the following well-known result:

Corollary 2.4.

If $\mu\preceq_{c}\nu$ then

\mathcal{W}_{2}(\mu,\nu)^{2}\leq\int|y|^{2}\,\nu(dy)-\int|x|^{2}\,\mu(dx).

In particular $\mu\preceq_{c}\nu$ implies

\displaystyle\sup_{\pi\in\Pi(\mu,\nu)}\int\langle x,y\rangle\,\pi(dx,dy)\geq\int|x|^{2}\,\mu(dx).
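The second inequality of Corollary 2.4 can be read off from the first one by expanding the squared distance. A short derivation, assuming that $\mu$ and $\nu$ have finite second moments so that all terms are finite:

```latex
% Expand |x-y|^2 = |x|^2 + |y|^2 - 2<x,y> inside the definition of W_2:
\mathcal{W}_{2}(\mu,\nu)^{2}
  =\int|x|^{2}\,\mu(dx)+\int|y|^{2}\,\nu(dy)
   -2\sup_{\pi\in\Pi(\mu,\nu)}\int\langle x,y\rangle\,\pi(dx,dy).
% Insert this into the first inequality of Corollary 2.4 and cancel \int|y|^2 d\nu:
2\int|x|^{2}\,\mu(dx)\leq 2\sup_{\pi\in\Pi(\mu,\nu)}\int\langle x,y\rangle\,\pi(dx,dy).
```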

3. Proof of Theorem 1.1

Let us start by setting up some notation. We denote the scalar product on $\mathbb{R}^{d}$ by $\langle\cdot,\cdot\rangle$ . We write $|\cdot|$ for the Euclidean norm on $\mathbb{R}^{d}$ . The ball in $\mathbb{R}^{d}$ around $x$ of radius $r>0$ will be denoted by $B_{r}(x)$ . We write $\nabla f(x)$ for the derivative of a function $f:\mathbb{R}^{d}\to\mathbb{R}$ at a point $x\in\mathbb{R}^{d}$ . We denote the $d$ -dimensional Lebesgue measure by $\lambda$ .

In order to keep this article self-contained, we summarise some properties of optimal transport at the beginning of this section, and refer to (Villani, 2003, Chapter 2.1) for a more detailed treatment.
By definition we have for any $\rho\in\mathcal{P}_{2}(\mathbb{R}^{d})$ that

(4)

\displaystyle\mathcal{W}_{2}(\mu,\rho)^{2}=\int|x|^{2}\,\mu(dx)+\int|y|^{2}\,\rho(dy)-2\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy).

In this section we thus (re-)define the cost function $c(x,y):=\langle x,y\rangle$ and recall that the convex conjugate $f^{*}:\mathbb{R}^{d}\to\mathbb{R}\cup\{\infty\}$ of a function $f:\mathbb{R}^{d}\to\mathbb{R}$ is given by

\displaystyle f^{*}(y):=\sup_{x\in\mathbb{R}^{d}}\left(\langle y,x\rangle-f(x)\right).

The subdifferential of a proper convex function $f:\mathbb{R}^{d}\to\mathbb{R}\cup\{\infty\}$ is defined as

\displaystyle\partial f(x):=\{y\in\mathbb{R}^{d}:\ f(x^{\prime})-f(x)\geq\langle y,x^{\prime}-x\rangle\text{ for all }x^{\prime}\in\mathbb{R}^{d}\}.

It is non-empty if $x$ belongs to the interior of the domain of $f$ . We have

(5)

\displaystyle f(x)+f^{*}(y)-\langle x,y\rangle=0\quad\Leftrightarrow\quad y\in\partial f(x).

Lastly we recall the duality

(6)

\displaystyle\begin{split}C(\mu,\rho)&=\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,y\rangle\,\pi(dx,dy)\\ &=\inf_{f\oplus g\geq c}\int f\,d\mu+\int g\,d\rho\\ &=\inf_{f\oplus g\geq c,\ f,g\text{ proper, convex }}\int f\,d\mu+\int g\,d\rho\end{split}

and the existence of an optimal pair $(f,f^{*})$ of (lower semicontinuous, proper) convex conjugate functions. Replacing $\mu$ by $\nu$ in the display above, we obtain a similar duality for $C(\nu,\rho)$ .

3.1. Proof of Theorem 1.1: the equivalent case

We first prove Theorem 1.1 for measures $\mu,\nu$ , which are equivalent to the $d$ -dimensional Lebesgue measure $\lambda$ , i.e. $\mu,\nu\sim\lambda$ . As $\mu,\nu\in\mathcal{P}_{1}(\mathbb{R}^{d})$ , the domain of the optimising potential $f$ for $C(\mu,\rho)$ (resp. $C(\nu,\rho)$ ) is $\mathbb{R}^{d}$ in this case. Recall furthermore that

(7)

\displaystyle\partial f(x)=\overline{\text{Conv}}(\lim_{x_{k}\to x}\nabla f(x_{k})),

see e.g. (Villani, 2003, 2.1.3.3). We write

\|\partial f\|_{\infty}:=\sup_{x\in\mathbb{R}^{d}}\sup_{y\in\partial f(x)}|y|.

We now prove Theorem 1.1 when $\mu,\nu\sim\lambda$ :

Proposition 3.1.

Assume $\mu,\nu\in\mathcal{P}_{1}(\mathbb{R}^{d})$ , $\mu,\nu\sim\lambda$ . Recall

\displaystyle\mathcal{P}^{1}(\mathbb{R}^{d})=\{\rho\in\mathcal{P}(\mathbb{R}^{d}):\ \mathrm{supp}(\rho)\subseteq B_{1}(0)\}

as well as the 1-Lipschitz convex functions

\displaystyle\mathcal{C}^{1}(\mathbb{R}^{d})=\{f:\mathbb{R}^{d}\to\mathbb{R}\ \mathrm{convex},\|\partial f\|_{\infty}\leq 1\}.

Then we have

\displaystyle\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right).

Proof.

As $\mu,\nu$ have finite first moment and $\rho$ is compactly supported, $|C(\mu,\rho)|,|C(\nu,\rho)|<\infty$ follows from Hölder’s inequality. We now fix $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$ and take an optimal convex pair $(\hat{f},\hat{g})$ in (6) for $C(\nu,\rho)$. Next we apply Brenier’s theorem in the form of (Villani, 2003, Theorem 2.12), which states that $\rho=\nabla\hat{f}_{*}\nu$. (Footnote 1: We note that the result is stated under the additional requirement that $\nu,\rho\in\mathcal{P}_{2}(\mathbb{R}^{d})$. However, as $\rho$ is supported on the unit ball, it can be checked that the arguments of (Villani, 2003, proof of Theorem 2.9) (in particular boundedness from below) carry over when simply adding $|x|$ to the potential $\hat{f}$ instead of adding $|x|^{2}/2$ to $\hat{f}$ and $|y|^{2}/2$ to $\hat{g}$.) Furthermore, as $\text{supp}(\rho)\subseteq B_{1}(0)$ we conclude $\|\partial\hat{f}\|_{\infty}\leq 1$ by (7) and

	$\displaystyle C(\nu,\rho)-C(\mu,\rho)$	$\displaystyle\geq\int\hat{f}\,d\nu+\int\hat{g}\,d\rho-\left(\int\hat{f}\,d\mu+\int\hat{g}\,d\rho\right)$
		$\displaystyle=\int\hat{f}\,d\nu-\int\hat{f}\,d\mu$
		$\displaystyle\geq\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right).$

Taking the infimum over $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$ shows that

\displaystyle\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right)\geq\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right).

On the other hand, fix $f\in\mathcal{C}^{1}(\mathbb{R}^{d})$ and set $g:=f^{*}$ . Define $\hat{\rho}:=\nabla f_{*}\mu$ and note that $\hat{\rho}\in\mathcal{P}^{1}(\mathbb{R}^{d})$ . Then again by Brenier’s theorem we obtain optimality of the pair $(f,g)$ for $C(\mu,\hat{\rho})$ , and thus

	$\displaystyle\int f\,d\nu-\int f\,d\mu$	$\displaystyle=\left(\int f\,d\nu+\int g\,d\hat{\rho}\right)-\left(\int g\,d\hat{\rho}+\int f\,d\mu\right)$
		$\displaystyle\geq C(\nu,\hat{\rho})-C(\mu,\hat{\rho})$
		$\displaystyle\geq\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right).$

Taking the infimum over $f\in\mathcal{C}^{1}(\mathbb{R}^{d})$ shows

\displaystyle\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right)\geq\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right).

This concludes the proof. ∎
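To see the second half of the proof at work, consider $d=1$ and $f(x)=|x|$, which lies in $\mathcal{C}^{1}(\mathbb{R})$ and satisfies $f^{*}=0$ on $[-1,1]$. The sketch below replaces $\mu$ by an empirical measure without an atom at $0$ (an assumption made purely for illustration, since Proposition 3.1 requires $\mu\sim\lambda$) and checks that $\hat{\rho}=\nabla f_{*}\mu$, the law of the sign, attains $C(\mu,\hat{\rho})=\int f\,d\mu+\int f^{*}\,d\hat{\rho}=\int|x|\,\mu(dx)$, while another $\rho$ supported in $[-1,1]$ gives at most this value. In one dimension and for uniform measures on equally many atoms, $C$ is obtained by pairing sorted atoms; the helper name is hypothetical.

```python
import random


def max_corr_sorted(xs, ys):
    """C(a, b) for uniform 1-d measures on equally many atoms (comonotone pairing)."""
    return sum(x * y for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


rng = random.Random(0)
mu_atoms = [rng.gauss(0.0, 1.0) for _ in range(200)]  # empirical stand-in for mu
rho_hat = [1.0 if x > 0 else -1.0 for x in mu_atoms]  # push-forward of mu under the gradient of |x|

# Dual value int f dmu + int f* drho_hat, where f* vanishes on [-1, 1].
dual_value = sum(abs(x) for x in mu_atoms) / len(mu_atoms)

# Primal value C(mu, rho_hat); it should coincide with the dual value.
primal_value = max_corr_sorted(mu_atoms, rho_hat)

# Any other rho supported in [-1, 1] is dominated by the same dual value.
other_rho = [rng.uniform(-1.0, 1.0) for _ in mu_atoms]
other_value = max_corr_sorted(mu_atoms, other_rho)
```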

3.2. Proof of Theorem 1.1: the general case

We now prove Theorem 1.1 for general measures $\mu,\nu\in\mathcal{P}_{1}(\mathbb{R}^{d})$ through approximation in the $1$ -Wasserstein sense.

Proof of Theorem 1.1.

Let us take sequences $(\mu_{n})_{n\in\mathbb{N}}$, $(\nu_{n})_{n\in\mathbb{N}}$ in $\mathcal{P}_{1}(\mathbb{R}^{d})$ satisfying

\displaystyle\lim_{n\to\infty}\mathcal{W}_{1}(\mu,\mu_{n})=0=\lim_{n\to\infty}\mathcal{W}_{1}(\nu,\nu_{n}),\qquad\mu_{n},\nu_{n}\sim\lambda\text{ for all }n\in\mathbb{N},

where $\mathcal{W}_{1}$ denotes the $1$-Wasserstein distance. Recall that $\mathcal{C}^{1}(\mathbb{R}^{d})$ denotes the set of convex $1$-Lipschitz functions. Thus, e.g. by the Kantorovich-Rubinstein formula (Villani, 2008, (5.11)),

(8)

\displaystyle\lim_{n\to\infty}\sup_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left|\int fd\mu-\int fd\mu_{n}\right|\leq\lim_{n\to\infty}\mathcal{W}_{1}(\mu,\mu_{n})=0.

The same holds for $(\nu_{n})_{n\in\mathbb{N}}$ and $\nu$. Next, fix $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$, take an optimal coupling $\pi=\pi(dx,dy)$ for $C(\nu,\rho)$ with disintegration $\pi(dx,dy)=\nu(dx)\pi_{x}(dy)$, and an optimal coupling $\pi^{n}=\pi^{n}(dx,dz)$ for $\mathcal{W}_{1}(\nu,\nu_{n})$. Then $\hat{\pi}^{n}(dy,dz):=\int\pi^{n}(dx,dz)\pi_{x}(dy)$ is a coupling of $\rho$ and $\nu_{n}$. Furthermore, as $|y|\leq 1$ $\rho$-a.s. we have

\displaystyle\begin{split}C(\nu,\rho)-C(\nu_{n},\rho)&\leq\left|\int\langle y,x-z\rangle\,\pi^{n}(dx,dz)\pi_{x}(dy)\right|\\ &\leq\int|x-z|\,\pi^{n}(dx,dz)\\ &\leq\mathcal{W}_{1}(\nu_{n},\nu).\end{split}

Exchanging the roles of $\nu$ and $\nu_{n}$ then yields

\displaystyle|C(\nu,\rho)-C(\nu_{n},\rho)|\leq\mathcal{W}_{1}(\nu_{n},\nu).

As the rhs is independent of $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$ this shows

(9)

\displaystyle\lim_{n\to\infty}\sup_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left|C(\nu,\rho)-C(\nu_{n},\rho)\right|=0.
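The stability estimate $|C(\nu,\rho)-C(\nu_{n},\rho)|\leq\mathcal{W}_{1}(\nu_{n},\nu)$ behind (9) can be observed directly on samples. The following is a minimal sketch, assuming $d=1$ and uniform measures on equally many atoms, for which both $C$ and $\mathcal{W}_{1}$ are attained by pairing sorted atoms; the perturbation used to build $\nu_{n}$ and all helper names are illustrative.

```python
import random


def max_corr_sorted(xs, ys):
    """C(a, b) for uniform 1-d measures on equally many atoms (comonotone pairing)."""
    return sum(x * y for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def w1_sorted(xs, ys):
    """1-Wasserstein distance between uniform 1-d measures on equally many atoms."""
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


rng = random.Random(1)
n = 500
nu = [rng.gauss(0.0, 1.0) for _ in range(n)]
nu_n = [x + rng.gauss(0.0, 0.1) for x in nu]      # a small perturbation of nu
rho = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # supported in the unit ball

gap = abs(max_corr_sorted(nu, rho) - max_corr_sorted(nu_n, rho))
bound = w1_sorted(nu, nu_n)
```

The bound does not depend on the particular $\rho$ drawn, which is what allows the supremum over $\mathcal{P}^{1}(\mathbb{R}^{d})$ in (9).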

A similar argument holds for $(\mu_{n})_{n\in\mathbb{N}}$ and $\mu$ . We can now write

	$\displaystyle\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right)$	$\displaystyle=\inf_{f\in\mathcal{C}^{1}(\mathbb{R}^{d})}\Bigg{[}\left(\int f\,d\nu_{n}-\int f\,d\mu_{n}\right)$
		$\displaystyle+\left(\int f\,d\nu-\int f\,d\nu_{n}\right)-\left(\int f\,d\mu-\int f\,d\mu_{n}\right)\Bigg{]}$

and

	$\displaystyle\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\big{(}C(\nu,\rho)-C(\mu,\rho)\big{)}$	$\displaystyle=\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\Bigg{[}\big{(}C(\nu_{n},\rho)-C(\mu_{n},\rho)\big{)}$
		$\displaystyle+\big{(}C(\nu,\rho)-C(\nu_{n},\rho)\big{)}-\big{(}C(\mu,\rho)-C(\mu_{n},\rho)\big{)}\Bigg{]}.$

Applying Proposition 3.1, taking $n\to\infty$ and using (8), (9) then concludes the proof. ∎

3.3. Proof of Corollary 1.2

We now detail the proof of Corollary 1.2. We start with a preliminary result, which is an immediate corollary of Theorem 1.1.

Corollary 3.2.

Assume $\mu,\nu\in\mathcal{P}_{1}(\mathbb{R}^{d})$ . Then we have

(10)

\displaystyle\inf_{f\ \mathrm{convex}}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right),

where $\mathcal{P}^{\infty}(\mathbb{R}^{d})$ denotes the set of probability measures with bounded support. In particular

\displaystyle\int f\,d\mu\leq\int f\,d\nu\qquad\text{for all convex functions }f:\mathbb{R}^{d}\to\mathbb{R}

if and only if

\displaystyle C(\mu,\rho)\leq C(\nu,\rho)\qquad\text{for all }\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d}).

Proof.

Multiplying both sides of (1) by $k>0$ yields

\displaystyle\inf_{f\in\mathcal{C}^{k}(\mathbb{R}^{d})}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{k}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right)

with the definitions

\displaystyle\mathcal{P}^{k}(\mathbb{R}^{d})=\{\rho\in\mathcal{P}(\mathbb{R}^{d}):\ \text{supp}(\rho)\subseteq B_{k}(0)\}

and

\displaystyle\mathcal{C}^{k}(\mathbb{R}^{d}):=\{f:\mathbb{R}^{d}\to\mathbb{R}\text{ convex},\|\partial f\|_{\infty}\leq k\}.

Taking $k\to\infty$ we obtain

\displaystyle\inf_{f\text{ convex, Lipschitz}}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right).

Lastly, any convex function $f:\mathbb{R}^{d}\to\mathbb{R}$ can be approximated pointwise from below by convex Lipschitz functions. Thus

\displaystyle\inf_{f\ \mathrm{convex}}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right).

The claim thus follows. ∎
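The scaling step at the beginning of the proof of Corollary 3.2 can be spelled out as follows. This is a short worked computation in which the dilation $T_{k}$ and the notation $\rho_{k}$ are introduced here only for illustration:

```latex
% For k > 0 write T_k(y) := k y and \rho_k := (T_k)_* \rho.
% Couplings of (\mu,\rho_k) are the images of couplings of (\mu,\rho) under (x,y) -> (x,ky), so
C(\mu,\rho_{k})=\sup_{\pi\in\Pi(\mu,\rho)}\int\langle x,ky\rangle\,\pi(dx,dy)=k\,C(\mu,\rho),
\qquad
\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})\ \Leftrightarrow\ \rho_{k}\in\mathcal{P}^{k}(\mathbb{R}^{d}).
% Likewise f \in \mathcal{C}^{1} iff kf \in \mathcal{C}^{k}, and
\int kf\,d\nu-\int kf\,d\mu=k\left(\int f\,d\nu-\int f\,d\mu\right).
```

Both sides of (1) are therefore multiplied by the same factor $k$ when passing from $(\mathcal{C}^{1},\mathcal{P}^{1})$ to $(\mathcal{C}^{k},\mathcal{P}^{k})$.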

Remark 3.3.

If $\mu,\nu\in\mathcal{P}_{p}(\mathbb{R}^{d})$ for some $p\geq 1$ , then by Hölder’s inequality and density of finitely supported measures in the $q$ -Wasserstein space we also obtain

\displaystyle\inf_{f\ \mathrm{convex}}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}_{q}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right),

where $1/p+1/q=1$ .

Proof of Corollary 1.2.

Recall from (4) that

	$\displaystyle C(\mu,\rho)$	$\displaystyle=\frac{1}{2}\left(\int\|x\|^{2}\,\mu(dx)+\int\|z\|^{2}\,\rho(dz)-\mathcal{W}_{2}(\mu,\rho)^{2}\right),$
	$\displaystyle C(\nu,\rho)$	$\displaystyle=\frac{1}{2}\left(\int\|y\|^{2}\,\nu(dy)+\int\|z\|^{2}\,\rho(dz)-\mathcal{W}_{2}(\nu,\rho)^{2}\right).$

Combining this with (10) from Corollary 3.2 yields

	$\displaystyle\inf_{f\text{ convex}}\left(\int f\,d\nu-\int f\,d\mu\right)=\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right)$
	$\displaystyle=\frac{1}{2}\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big{(}\int\|y\|^{2}\,\nu(dy)+\int\|z\|^{2}\,\rho(dz)-\mathcal{W}_{2}(\nu,\rho)^{2}$
	$\displaystyle\qquad\qquad\qquad\qquad-\int\|x\|^{2}\,\mu(dx)-\int\|z\|^{2}\,\rho(dz)+\mathcal{W}_{2}(\mu,\rho)^{2}\Big{)}$
	$\displaystyle=\frac{1}{2}\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big{(}\mathcal{W}_{2}(\mu,\rho)^{2}-\mathcal{W}_{2}(\nu,\rho)^{2}+\int\|y\|^{2}\,\nu(dy)-\int\|x\|^{2}\,\mu(dx)\Big{)}.$

Thus

		$\displaystyle\int f\,d\mu\leq\int f\,d\nu\qquad\text{for all convex functions }f:\mathbb{R}^{d}\to\mathbb{R}$
	$\displaystyle\Leftrightarrow$	$\displaystyle\inf_{f\text{ convex}}\left(\int f\,d\nu-\int f\,d\mu\right)\geq 0$
	$\displaystyle\Leftrightarrow$	$\displaystyle\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big{(}\mathcal{W}_{2}(\mu,\rho)^{2}-\mathcal{W}_{2}(\nu,\rho)^{2}\Big{)}\geq\int\|x\|^{2}\,\mu(dx)-\int\|y\|^{2}\,\nu(dy)$
	$\displaystyle\Leftrightarrow$	$\displaystyle\sup_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big{(}\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}\Big{)}\leq\int\|y\|^{2}\,\nu(dy)-\int\|x\|^{2}\,\mu(dx).$

The claim follows. ∎

4. Numerical examples

In this section we illustrate Theorem 1.1 numerically. We focus on the following toy examples, where convex order or its absence is easy to establish:

Example 4.1.

$\mu=\mathcal{N}(0,\sigma^{2}I)$ and $\nu=\mathcal{N}(0,I)$ for $\sigma^{2}\in[0,2]$ for $d=1,2$ .

Example 4.2.

$\mu=\frac{1}{2}\left(\delta_{-1-s}+\delta_{1+s}\right)$ and $\nu=\frac{1}{2}\left(\delta_{-1}+\delta_{1}\right)$ for $s\in[-1,1]$ .

Example 4.3.

\mu=\frac{1}{4}\left(\delta_{(-1-s,0)}+\delta_{(1+s,0)}+\delta_{(0,1+s)}+\delta_{(0,-1-s)}\right)

and

\nu=\frac{1}{4}\left(\delta_{(-1,0)}+\delta_{(1,0)}+\delta_{(0,1)}+\delta_{(0,-1)}\right)

for $s\in[-1,1]$ .

A general numerical implementation for testing convex order of two measures $\mu,\nu\in\mathcal{P}_{2}(\mathbb{R}^{d})$ in general dimensions, together with the examples discussed here, can be found in the GitHub repository https://github.com/johanneswiesel/Convex-Order. In the implementation we use the POT package (https://pythonot.github.io) to compute optimal transport distances.

Let us set

\displaystyle V(\mu,\nu):=\inf_{\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})}\left(C(\nu,\rho)-C(\mu,\rho)\right)

and note that by Theorem 1.1 we have the relationship

\displaystyle\mu\preceq_{c}\nu\quad\Leftrightarrow\quad V(\mu,\nu)\geq 0.
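Evaluating $C(\nu,\rho)-C(\mu,\rho)$ at any single $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$ gives an upper bound on $V(\mu,\nu)$, so one negative value already rules out convex order. The following is a minimal sketch for Example 4.3, assuming that $\rho$ is uniform on four atoms in the unit disc: for uniform measures on equally many atoms an optimal coupling can be chosen as a permutation, so $C$ can be computed by brute force. This is not the paper's implementation (which relies on POT), and all function names are hypothetical.

```python
import itertools
import math
import random


def max_corr_uniform(xs, ys):
    """C(a, b) for uniform measures on equally many atoms in R^d, by brute force."""
    n = len(xs)
    best = -math.inf
    for perm in itertools.permutations(range(n)):
        value = sum(
            sum(xi * yi for xi, yi in zip(xs[i], ys[perm[i]])) for i in range(n)
        ) / n
        best = max(best, value)
    return best


def example_4_3(s):
    """Atoms of mu and nu from Example 4.3."""
    mu = [(-1.0 - s, 0.0), (1.0 + s, 0.0), (0.0, 1.0 + s), (0.0, -1.0 - s)]
    nu = [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    return mu, nu


def random_point_in_unit_disc(rng):
    """Uniform draw from the unit disc in R^2."""
    radius, angle = math.sqrt(rng.random()), 2.0 * math.pi * rng.random()
    return (radius * math.cos(angle), radius * math.sin(angle))


def upper_bound_on_v(s, n_trials=50, seed=0):
    """Smallest value of C(nu, rho) - C(mu, rho) over randomly drawn four-atom rho."""
    rng = random.Random(seed)
    mu, nu = example_4_3(s)
    best = math.inf
    for _ in range(n_trials):
        rho = [random_point_in_unit_disc(rng) for _ in range(4)]
        best = min(best, max_corr_uniform(nu, rho) - max_corr_uniform(mu, rho))
    return best
```

For $s>0$ the measure $\mu$ is more spread out than $\nu$ and the bound becomes negative, whereas for $s<0$ every sampled value stays non-negative.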

Clearly the computation of $V(\mu,\nu)$ hinges on the numerical exploration of the convex set of probability measures $\mathcal{P}^{1}(\mathbb{R}^{d})$. We propose two methods for this: our first method only considers finitely supported measures $\rho$, which are dense in $\mathcal{P}^{1}(\mathbb{R}^{d})$ in the Wasserstein topology. It relies on the Dirichlet distribution on the space $\mathbb{R}^{g-1}$, $g\in\mathbb{N}$, with density

\displaystyle f(x_{1},\dots,x_{g};\alpha_{1},\dots,\alpha_{g})=\frac{1}{B(\alpha)}\prod_{i=1}^{g}x_{i}^{\alpha_{i}-1}

for $x_{1},\dots,x_{g}\in[0,1]$ satisfying $\sum_{i=1}^{g}x_{i}=1$ . Here $\alpha_{1},\dots,\alpha_{g}>0$ , $\alpha:=(\alpha_{1},\dots,\alpha_{g})$ and $B(\alpha)$ denotes the Beta function. Fixing $g$ grid points $\{k_{1},\dots,k_{g}\}$ in $B_{1}(0)$ , we can consider any realization of a Dirichlet random variable $(X_{1},\dots,X_{g})$ as a probability distribution assigning probability mass $X_{i}$ to the grid point $k_{i}$ , $i\in\{1,\dots,g\}$ . This leads to the following algorithm:

Algorithm 1 Basic algorithm for Indirect Dirichlet method

Input: probability measures $\mu$, $\nu$, maximal number of evaluations $N$, number of grid points $g$
Output: $V(\mu,\nu)$

Generate a grid $G$ of $B_{1}(0)$ with $g$ equidistant points and consider Dirichlet random variables modelling $\rho$ supported on $G$. Use Bayesian optimization to solve

\displaystyle\inf\,[C(\rho,\nu)-C(\rho,\mu)]

over the set of Dirichlet distributions on $\mathbb{R}^{g-1}$. Terminate after $N$ steps.

return $\inf\,[C(\rho,\nu)-C(\rho,\mu)]$.
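The structure of Algorithm 1 can be sketched in a few lines for $d=1$. The sketch below makes several simplifying assumptions that are not part of the paper: plain random search over Dirichlet realizations replaces Bayesian optimization, all Dirichlet parameters are set to $\alpha_{i}=1$, the grid includes the endpoints $\pm 1$ (i.e. $B_{1}(0)$ is treated as the closed ball), and $C$ is evaluated through the one-dimensional quantile coupling instead of a linear program. All function names are hypothetical.

```python
import bisect
import math
import random
from itertools import accumulate


def max_corr_1d(xs, ps, ys, qs):
    """C(a, b) in d = 1 for weighted discrete measures, via the quantile coupling."""
    a, b = sorted(zip(xs, ps)), sorted(zip(ys, qs))
    cum_a = list(accumulate(w for _, w in a))
    cum_b = list(accumulate(w for _, w in b))
    total, lo = 0.0, 0.0
    for hi in sorted(set(cum_a + cum_b)):
        mid = 0.5 * (lo + hi)  # both quantile functions are constant on (lo, hi)
        i = min(bisect.bisect_left(cum_a, mid), len(a) - 1)
        j = min(bisect.bisect_left(cum_b, mid), len(b) - 1)
        total += (hi - lo) * a[i][0] * b[j][0]
        lo = hi
    return total


def sample_dirichlet(alphas, rng):
    """Dirichlet sample obtained by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [x / total for x in draws]


def estimate_v(mu_atoms, nu_atoms, n_grid=11, n_eval=200, seed=0):
    """Random-search estimate (an upper bound) of V(mu, nu) for uniform 1-d measures."""
    rng = random.Random(seed)
    grid = [-1.0 + 2.0 * k / (n_grid - 1) for k in range(n_grid)]
    w_mu = [1.0 / len(mu_atoms)] * len(mu_atoms)
    w_nu = [1.0 / len(nu_atoms)] * len(nu_atoms)
    best = math.inf
    for _ in range(n_eval):
        rho = sample_dirichlet([1.0] * n_grid, rng)  # weights of rho on the grid
        value = (max_corr_1d(nu_atoms, w_nu, grid, rho)
                 - max_corr_1d(mu_atoms, w_mu, grid, rho))
        best = min(best, value)
    return best
```

Applied to Example 4.2, `estimate_v([-0.5, 0.5], [-1.0, 1.0])` corresponds to $s=-0.5$ and `estimate_v([-1.5, 1.5], [-1.0, 1.0])` to $s=0.5$; only the latter should produce a negative estimate.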

The main computational challenge in Algorithm 1 is the efficient evaluation of $C(\rho,\nu)$ and $C(\rho,\mu)$ . For this we aim to write $C(\rho,\nu)$ and $C(\rho,\mu)$ as linear programs. We offer two different variants of Algorithm 1:

• Indirect Dirichlet method with histograms: If we have access to finitely supported approximations $\textbf{a}$ and $\textbf{b}$ of $\mu$ and $\nu$ respectively and the measure $\rho$ is supported on $G$ as above, then we solve the linear programs $C(\textbf{a},\rho)$ and $C(\textbf{b},\rho)$ as is standard in optimal transport theory.
• Indirect Dirichlet method with samples: Here we draw a number of samples from $\mu$ and $\nu$ respectively and denote the respective empirical distributions of these samples by $\textbf{a}$ and $\textbf{b}$. As before we assume that we have access to a probability measure $\rho$ supported on $G$. We then solve the linear programs $C(\textbf{a},\rho)$ and $C(\textbf{b},\rho)$.

An alternative to Algorithm 1 is to directly draw samples from a distribution $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$ . We call this the Direct randomized Dirichlet method, see Algorithm 2 below.

Algorithm 2 Direct randomized Dirichlet method

Input: probability measures $\mu$, $\nu$, maximal number of evaluations $N$
Output: $V(\mu,\nu)$

Draw samples from $\mu$ and $\nu$ and denote the empirical distributions of these samples by $\textbf{a}$ and $\textbf{b}$ respectively. Draw samples from a Dirichlet distribution and randomize their signs, under the constraint that the empirical distribution $\rho$ of these samples is an element of $\mathcal{P}^{1}(\mathbb{R}^{d})$. Use Bayesian optimization to solve

\displaystyle\inf\,[C(\rho,\nu)-C(\rho,\mu)]

over the set of these distributions. Terminate after $N$ steps.

return minimal value of $\inf\,[C(\rho,\nu)-C(\rho,\mu)]$.

We refer to the GitHub repository for a more detailed discussion, in particular for the implementation and further comments. For each example stated at the beginning of this section and each pair $(\mu,\nu)$ we plot $V(\mu,\nu)$ for the three methods discussed above, see Figures 3 and 4.

Figure 1. Values of different estimators of $V(\mu,\nu)$ plotted against $\sigma$ for Example 4.1. Both plots use $N=100$ samples.

Discounting numerical errors, all estimators seem to detect convex order. The direct randomized Dirichlet method is less complex; however it does not seem to explore the $\mathcal{P}^{1}(\mathbb{R}^{d})$ -space as well as the two indirect Dirichlet methods. On the other hand, both of the indirect Dirichlet methods yield very similar results for the examples considered. As the name suggests, the “indirect Dirichlet method with samples” works on samples directly, which might be more convenient for practical applications on real data.

As can be expected from the numerical implementation, the histogram method consistently yields the lowest runtimes, while runtimes of the other methods are much higher. Indeed, when working with samples, the weights of the empirical distributions are constant, while the OT cost matrices $\textbf{M}_{\textbf{a}}$ and $\textbf{M}_{\textbf{b}}$ in the implementation have to be re-computed in each iteration, which is very costly; for the histogram method, the weights $\rho$ change, while the grid stays constant, and thus so do $\textbf{M}_{\textbf{a}}$ and $\textbf{M}_{\textbf{b}}$.

5. Model independent arbitrage strategies

Let us consider a financial market with $d$ financial assets and denote its price process by $(S_{t})_{t\geq 0}$. Let us assume $S_{0}=s_{0}\in\mathbb{R}^{d}$ and fix two maturities $T_{1}<T_{2}$. If call options with these maturities are traded at all strikes, then the prices of the call options determine the distribution of $S_{T_{1}}$ and $S_{T_{2}}$ under any martingale measure; this fact was first established by Breeden and Litzenberger (1978). Let us denote the laws of $S_{T_{1}}$ and $S_{T_{2}}$ by $\mu$ and $\nu$ respectively. If trading is only allowed at $0,T_{1}$ and $T_{2}$, the following definition is natural and will be crucial for our analysis.

Definition 5.1.

The triple of measurable functions $(u_{1},u_{2},\Delta)$ is a model-independent arbitrage if $u_{1}\in L^{1}(\mu)$ , $u_{2}\in L^{1}(\nu)$ and

\displaystyle u_{1}(x)-\int u_{1}\,d\mu+u_{2}(y)-\int u_{2}\,d\nu+\Delta(x)(y-x)>0,\quad\text{for all }(x,y)\in\mathbb{R}^{d}\times\mathbb{R}^{d}.

If no such strategies exist, then we call the market free of model-independent arbitrage.

In the above, $u_{1}$ and $u_{2}$ can be interpreted as payoffs of Vanilla options with market prices $\int u_{1}\,d\mu$ and $\int u_{2}\,d\nu$ respectively, while the term $\Delta(x)(y-x)$ denotes the gains or losses from buying $\Delta(x)$ assets at time $T_{1}$ and holding them until $T_{2}$ .

The following theorem makes the connection between model-independent arbitrages and convex order of $\mu$ and $\nu$ apparent. It can essentially be found in (Guyon et al., 2017, Theorem 3.4).

Theorem 5.2.

The following are equivalent:

(i)

The market is free of model-independent arbitrage.
(ii)

$\mathcal{M}(\mu,\nu)\neq\emptyset$ .
(iii)

$\mu\preceq_{c}\nu$ .

In particular, if $\mu\npreceq_{c}\nu$ , then there exists a convex function $f$ , such that the triple $(-f(x),f(y),-g(x))$ is a model-independent arbitrage. Here $g$ is a measurable selector of the subdifferential of $f$ .

The strategy $(-f(x),f(y),-g(x))$ is often called a calendar spread. As our setting is not quite exactly covered by (Guyon et al., 2017, Theorem 3.4) and the proof is not hard, we include it here.

Proof of Theorem 5.2.

(ii) $\Leftrightarrow$ (iii) is Strassen’s theorem, see Strassen (1965). If $\mu\npreceq_{c}\nu$ , then by definition there exists a convex function $f$ such that

\int f\,d\mu>\int f\,d\nu.

On the other hand, $f$ is convex and thus satisfies

\displaystyle f(y)-f(x)\geq g(x)(y-x)\quad\text{for all }(x,y)\in\mathbb{R}^{d}\times\mathbb{R}^{d}.

Combining the two equations above shows that $(-f(x),f(y),-g(x))$ is a model-independent arbitrage, and thus (i) $\Rightarrow$ (iii). It remains to show (ii) $\Rightarrow$ (i), which is well known. Indeed, taking expectations in the inequality

\displaystyle u_{1}(x)-\int u_{1}\,d\mu+u_{2}(y)-\int u_{2}\,d\nu+\Delta(x)(y-x)>0,\quad\text{for all }(x,y)\in\mathbb{R}^{d}\times\mathbb{R}^{d}

under any martingale measure with marginals $\mu,\nu$ leads to a contradiction. This concludes the proof. ∎
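The calendar spread from the first part of the proof can be checked on a toy example. The following Python sketch assumes $d=1$ and uses the illustrative choices $\mu=\frac{1}{2}(\delta_{-1}+\delta_{1})$ , $\nu=\delta_{0}$ , $f(x)=x^{2}$ and $g(x)=2x$ , none of which are taken from the paper; the function name `calendar_spread_payoff` is hypothetical. Here $\int f\,d\mu=1>0=\int f\,d\nu$ , so $\mu\npreceq_{c}\nu$ , and the payoff simplifies to $(y-x)^{2}+1$ .

```python
def calendar_spread_payoff(f, g, mu, nu, x, y):
    """Payoff of (u1, u2, Delta) = (-f, f, -g) net of option prices, for d = 1.

    mu and nu are finitely supported measures given as lists of (atom, weight).
    """
    price_mu = sum(w * f(a) for a, w in mu)  # integral of f with respect to mu
    price_nu = sum(w * f(a) for a, w in nu)  # integral of f with respect to nu
    u1_part = -f(x) + price_mu               # u1(x) - int u1 dmu with u1 = -f
    u2_part = f(y) - price_nu                # u2(y) - int u2 dnu with u2 = f
    return u1_part + u2_part - g(x) * (y - x)  # Delta(x)(y - x) with Delta = -g


mu = [(-1.0, 0.5), (1.0, 0.5)]  # illustrative choice, not from the paper
nu = [(0.0, 1.0)]


def f(x):
    return x * x


def g(x):
    return 2.0 * x


# Evaluate the payoff on a grid of (x, y) values in [-5, 5]^2.
payoffs = [
    calendar_spread_payoff(f, g, mu, nu, i / 4, j / 4)
    for i in range(-20, 21)
    for j in range(-20, 21)
]
```

On this grid the payoff is bounded below by $1$ , in line with the strict inequality required in Definition 5.1.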

As a direct consequence of Theorem 5.2, we can use Theorem 1.1 to detect model-independent arbitrages in the market under consideration: indeed, Theorem 1.1 states that $\mu\npreceq_{c}\nu$ implies existence of a probability measure $\rho\in\mathcal{P}^{1}(\mathbb{R}^{d})$ satisfying

\displaystyle C(\rho,\nu)-C(\rho,\mu)<0.

Next, if $\nu\sim\lambda$ , then the proof of Theorem 1.1 shows that $\rho=\nabla\hat{f}(x)_{*}\nu$ for some convex function $\hat{f}:\mathbb{R}^{d}\to\mathbb{R}$ and

\displaystyle\int\hat{f}\,d\nu-\int\hat{f}\,d\mu\leq C(\rho,\nu)-C(\rho,\mu)<0,\quad\text{i.e. }\int\hat{f}\,d\nu<\int\hat{f}\,d\mu.

In particular, a model-independent arbitrage strategy is given by the calendar spread $(-\hat{f}(x),\hat{f}(y),-\nabla\hat{f}(x))$ . Via an approximation argument, this result remains true for arbitrary probability measures $\nu\in\mathcal{P}_{1}(\mathbb{R}^{d})$ . In particular, we can use the same methods as in Section 4 to find $\rho$ . We then estimate $\nabla\hat{f}$ from the optimizing transport plan $\pi\in\Pi(\rho,\nu)$ of $C(\rho,\nu)$ by taking the conditional expectation $\int x\,\pi_{y}(dx)$ , where $(\pi_{y})_{y\in\mathbb{R}^{d}}$ denotes the conditional probability distribution of $\pi$ with respect to its second marginal $\nu$ . This is a standard technique (see e.g. Deb et al. (2021) for details). In conclusion, we obtain an explicit arbitrage strategy.
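For finitely supported $\rho$ and $\nu$ the conditional expectation $\int x\,\pi_{y}(dx)$ reduces to a weighted column average of the plan. The following Python sketch illustrates this under the assumption that the plan is stored as a nested list with rows indexed by the atoms of $\rho$ and columns by the atoms of $\nu$ ; the function name `barycentric_projection` and this data layout are hypothetical and not taken from the repository.

```python
def barycentric_projection(plan, rho_atoms):
    """Column-wise conditional means of a discrete plan in Pi(rho, nu).

    plan[i][j] is the mass coupled between the i-th atom of rho and the j-th
    atom of nu; rho_atoms[i] is a tuple of d coordinates. The j-th entry of
    the result is the estimate of grad f_hat at the j-th atom of nu.
    """
    d = len(rho_atoms[0])
    n_rows = len(plan)
    estimates = []
    for j in range(len(plan[0])):
        col_mass = sum(plan[i][j] for i in range(n_rows))  # mass of nu at atom j
        if col_mass <= 0:
            raise ValueError("nu must put positive mass on every column")
        mean = tuple(
            sum(plan[i][j] * rho_atoms[i][k] for i in range(n_rows)) / col_mass
            for k in range(d)
        )
        estimates.append(mean)
    return estimates


# Toy plan: rho has atoms 0 and 1 with masses 1/4 and 3/4, nu has two atoms of mass 1/2.
rho_atoms = [(0.0,), (1.0,)]
plan = [[0.25, 0.0], [0.25, 0.5]]
grad_estimates = barycentric_projection(plan, rho_atoms)
```

In the toy plan the first atom of $\nu$ is coupled equally to the atoms $0$ and $1$ of $\rho$ , so its estimate is $1/2$ , while the second atom of $\nu$ is coupled only to $1$ .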

To illustrate the ideas outlined above, we return to Example 4.1, i.e. $\mu=\mathcal{N}(0,\sigma^{2}I)$ and $\nu=\mathcal{N}(0,I)$ for $\sigma^{2}>0$ and $d=1,2$ . Having determined $\rho$ such that $C(\rho,\nu)-C(\rho,\mu)<0$ , we estimate $\nabla\hat{f}$ numerically. We show estimates for $\nabla\hat{f}$ and $\hat{f}$ in the plots below.

6. Remaining proofs

Proof of Corollary 2.3.

Recall that a function $g:\mathbb{R}^{d}\to\mathbb{R}$ is $c$ -concave, iff $f(x):=|x|^{2}/2-g(x)$ is convex. In particular

	$\displaystyle\int g\,d\mu-\int g\,d\nu$	$\displaystyle=\left(-\int\left[\frac{\|x\|^{2}}{2}-g(x)\right]\,\mu(dx)+\int\left[\frac{\|y\|^{2}}{2}-g(y)\right]\,\nu(dy)\right)$
		$\displaystyle\qquad+\int\frac{\|x\|^{2}}{2}\,\mu(dx)-\int\frac{\|y\|^{2}}{2}\,\nu(dy)$
		$\displaystyle=\int f\,d\nu-\int f\,d\mu+\int\frac{\|x\|^{2}}{2}\,\mu(dx)-\int\frac{\|y\|^{2}}{2}\,\nu(dy).$

By (10) we obtain

	$\displaystyle\inf_{g\ c\text{-concave}}\left(\int g\,d\mu-\int g\,d\nu\right)$
	$\displaystyle=\inf_{f\text{ convex}}\left(\int f\,d\nu-\int f\,d\mu\right)+\int\frac{\|x\|^{2}}{2}\,\mu(dx)-\int\frac{\|y\|^{2}}{2}\,\nu(dy)$
	$\displaystyle=\frac{1}{2}\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big{(}\mathcal{W}_{2}(\mu,\rho)^{2}-\mathcal{W}_{2}(\nu,\rho)^{2}+\int\|y\|^{2}\,\nu(dy)-\int\|x\|^{2}\,\mu(dx)$
	$\displaystyle\qquad\qquad\qquad+\int\|x\|^{2}\,\mu(dx)-\int\|y\|^{2}\,\nu(dy)\Big{)}$
	$\displaystyle=\frac{1}{2}\inf_{\rho\in\mathcal{P}^{\infty}(\mathbb{R}^{d})}\Big{(}\mathcal{W}_{2}(\mu,\rho)^{2}-\mathcal{W}_{2}(\nu,\rho)^{2}\Big{)}.$

This concludes the proof. ∎

Proof of Corollary 2.4.

The first claim follows from Corollary 1.2 by setting $\rho=\mu$ . By (4) the above implies

\displaystyle 2\int|x|^{2}\,\mu(dx)\leq 2\sup_{\pi\in\Pi(\mu,\nu)}\int\langle x,y\rangle\,\pi(dx,dy),

so the second claim follows. ∎

Proof of Corollary 2.1.

First, (Wang et al., 2020, Theorem 2 & Lemma 1) show that $\mu\preceq_{c}\nu$ iff

(11)

\displaystyle\int_{0}^{1}[F_{\nu}^{-1}(1-u)-F_{\mu}^{-1}(1-u)]\,dh(u)\geq 0

for all concave functions $h$ such that the above integral is finite. As any concave function is Lebesgue-almost everywhere differentiable, standard approximation arguments imply that (11) holds iff

\displaystyle\int_{0}^{1}g(u)[F_{\nu}^{-1}(u)-F_{\mu}^{-1}(u)]\,du\geq 0

for all bounded increasing left-continuous functions $g:(0,1)\to\mathbb{R}$ . But

\displaystyle\{F_{\rho}^{-1}:\ \rho\in\mathcal{P}(\mathbb{R})\text{ with bounded support}\}

is exactly the set of all bounded increasing left-continuous functions on $(0,1)$ . Noting that by (Villani, 2003, Equation (2.47))

	$\displaystyle\mathcal{W}_{2}(\nu,\rho)^{2}$	$\displaystyle=\int_{0}^{1}(F_{\nu}^{-1}(x)-F_{\rho}^{-1}(x))^{2}\,dx$
		$\displaystyle=\int y^{2}\,\nu(dy)-2\int_{0}^{1}F_{\nu}^{-1}(x)F_{\rho}^{-1}(x)\,dx+\int z^{2}\,\rho(dz),$

we calculate

	$\displaystyle\mathcal{W}_{2}(\nu,\rho)^{2}-\mathcal{W}_{2}(\mu,\rho)^{2}$	$\displaystyle=\int y^{2}\,\nu(dy)-2\int_{0}^{1}F_{\rho}^{-1}(u)F_{\nu}^{-1}(u)\,du+\int z^{2}\,\rho(dz)$
		$\displaystyle\quad-\int x^{2}\,\mu(dx)+2\int_{0}^{1}F_{\rho}^{-1}(u)F_{\mu}^{-1}(u)\,du-\int z^{2}\,\rho(dz)$
		$\displaystyle=2\int_{0}^{1}F_{\rho}^{-1}(u)[F_{\mu}^{-1}(u)-F_{\nu}^{-1}(u)]\,du$
		$\displaystyle\quad+\int y^{2}\,\nu(dy)-\int x^{2}\,\mu(dx).$

This concludes the proof. ∎
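In dimension one the quantile representation used in this proof also gives a simple numerical check: since $\mathcal{W}_{2}(\mu,\rho)^{2}=\int x^{2}\,\mu(dx)-2C(\mu,\rho)+\int z^{2}\,\rho(dz)$ , the formula above yields $C(\mu,\rho)=\int_{0}^{1}F_{\mu}^{-1}(u)F_{\rho}^{-1}(u)\,du$ . The following Python sketch approximates this integral by a midpoint rule in the setting of Example 4.1 with $d=1$ . The helper names `cost_1d` and `gap`, the grid size and the choice of $\rho$ as the uniform law on $[-1,1]$ are illustrative assumptions and are not taken from the repository.

```python
from statistics import NormalDist


def cost_1d(q_mu, q_rho, n=2000):
    """Midpoint-rule approximation of int_0^1 q_mu(u) q_rho(u) du, i.e. C(mu, rho) for d = 1."""
    return sum(q_mu((k + 0.5) / n) * q_rho((k + 0.5) / n) for k in range(n)) / n


def gap(sigma, q_rho):
    """Approximate C(nu, rho) - C(mu, rho) for mu = N(0, sigma^2) and nu = N(0, 1)."""
    q_nu = NormalDist(0.0, 1.0).inv_cdf    # quantile function of nu
    q_mu = NormalDist(0.0, sigma).inv_cdf  # quantile function of mu
    return cost_1d(q_nu, q_rho) - cost_1d(q_mu, q_rho)


def q_rho(u):
    """Quantile function of the uniform law on [-1, 1] (illustrative choice of rho)."""
    return 2.0 * u - 1.0


gap_in_order = gap(0.5, q_rho)      # sigma < 1, so mu is below nu in convex order
gap_not_in_order = gap(2.0, q_rho)  # sigma > 1, so mu is not below nu in convex order
```

For this particular $\rho$ the exact value of the difference is $(1-\sigma)/\sqrt{\pi}$ , so the sign of the computed gap changes at $\sigma=1$ , consistent with the characterization of convex order.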

References

Acciaio et al. [2013] B. Acciaio, M. Beiglböck, F. Penkner, and W. Schachermayer. A model-free version of the Fundamental Theorem of Asset Pricing and the Super-replication Theorem. Math. Finance, DOI: 10.1111/mafi.12060, 2013.
Alfonsi and Jourdain [2020] A. Alfonsi and B. Jourdain. Squared quadratic Wasserstein distance: optimal couplings and Lions differentiability. ESAIM Prob. Stat., 24:703–717, 2020.
Alfonsi et al. [2019] A. Alfonsi, J. Corbetta, and B. Jourdain. Sampling of one-dimensional probability measures in the convex order and computation of robust option price bounds. Int. J. Theor. Appl. Finance, 22(3), 2019.
Alfonsi et al. [2020] A. Alfonsi, J. Corbetta, and B. Jourdain. Sampling of probability measures in the convex order by Wasserstein projection. Ann. Inst. Henri Poincaré Probab. Stat., 56(3):1706–1729, 2020.
Arnold [2012] B. Arnold. Majorization and the Lorenz order: A brief introduction, volume 43. Springer Science & Business Media, 2012.
Beiglböck et al. [2013] M. Beiglböck, P. Henry-Labordère, and F. Penkner. Model-independent bounds for option prices—a mass transport approach. Finance Stoch., 17(3):477–501, 2013.
Beiglböck et al. [2015] M. Beiglböck, M. Nutz, and N. Touzi. Complete duality for martingale optimal transport on the line. Ann. Prob., 45(5):3038–3074, 2015.
Bernard et al. [2017] C. Bernard, L. Rüschendorf, and S. Vanduffel. Value-at-risk bounds with variance constraints. J. Risk Insur., 84(3):923–959, 2017.
Breeden and Litzenberger [1978] D. Breeden and R. Litzenberger. Prices of state-contingent claims implicit in option prices. Journal of Business, pages 621–651, 1978.
Brenier [1991] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure Appl. Math., 44(4):375–417, 1991.
Carlier [2008] G. Carlier. Remarks on Toland’s duality, convexity constraint and optimal transport. Pacific Journal of Optimization, 4(3):423–432, 2008.
De March and Touzi [2019] H. De March and N. Touzi. Irreducible convex paving for decomposition of multidimensional martingale transport plans. Ann. Prob., 47(3):1726–1774, 2019.
Deb et al. [2021] N. Deb, P. Ghosal, and B. Sen. Rates of estimation of optimal transport maps using plug-in estimators via barycentric projections. Advances in Neural Information Processing Systems, 34:29736–29753, 2021.
Embrechts et al. [2013] P. Embrechts, G. Puccetti, and L. Rüschendorf. Model uncertainty and VaR aggregation. J. Bank. Financ., 37(8):2750–2764, 2013.
Galichon et al. [2014] A. Galichon, P. Henry-Labordère, and N. Touzi. A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options. Ann. Appl. Prob., 24(1):312–336, 2014.
Guo and Obłój [2019] G. Guo and J. Obłój. Computational methods for martingale optimal transport problems. Ann. Appl. Prob., 29(6):3311–3347, 2019.
Guyon et al. [2017] J. Guyon, R. Menegaux, and M. Nutz. Bounds for VIX futures given S&P 500 smiles. Finance Stoch., 21:593–630, 2017.
Jourdain and Margheriti [2022] B. Jourdain and W. Margheriti. Martingale Wasserstein inequality for probability measures in the convex order. Bernoulli, 28(2):830–858, 2022.
Kantorovich [1958] L. Kantorovich. On the translocation of masses. Manag. Sci., (5):1–4, 1958.
Massa and Siorpaes [2022] M. Massa and P. Siorpaes. How to quantise probabilities while preserving their convex order. arXiv preprint arXiv:2206.10514, 2022.
Monge [1781] G. Monge. Mémoire sur la théorie des déblais et des remblais. De l’Imprimerie Royale, 1781.
Müller and Stoyan [2002] A. Müller and D. Stoyan. Comparison methods for stochastic models and risks, volume 389. Wiley, 2002.
Nendel [2020] M. Nendel. A note on stochastic dominance, uniform integrability and lattice properties. Bull. Lond. Math. Soc., 52(5):907–923, 2020.
Obłój and Siorpaes [2017] J. Obłój and P. Siorpaes. Structure of martingale transports in finite dimensions. arXiv preprint arXiv:1702.08433, 2017.
Peyré and Cuturi [2019] G. Peyré and M. Cuturi. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
Rachev and Rüschendorf [1998] S. Rachev and L. Rüschendorf. Mass Transportation Problems: Volume I: Theory, volume 1. Springer Science & Business Media, 1998.
Ross et al. [1996] S. M. Ross, J. Kelly, R. Sullivan, W. Perry, D. Mercer, R. Davis, T. Washburn, E. Sager, J. Boyce, and V. Bristow. Stochastic processes, volume 2. Wiley New York, 1996.
Rüschendorf and Rachev [1990] L. Rüschendorf and S. Rachev. A characterization of random variables with minimum L2-distance. J. Multivariate Anal., 32(1):48–54, 1990.
Rüschendorf and Uckelmann [2002] L. Rüschendorf and L. Uckelmann. Variance minimization and random variables with constant sum. In C. Cuadras et al., editors, Distributions with given marginals and statistical modelling, pages 211–222. Springer, 2002.
Shaked and Shanthikumar [2007] M. Shaked and J. Shanthikumar. Stochastic orders. Springer, 2007.
Strassen [1965] V. Strassen. The existence of probability measures with given marginals. Ann. Math. Statist., pages 423–439, 1965.
Tchen [1980] A. Tchen. Inequalities for distributions with given marginals. Ann. Prob., pages 814–827, 1980.
Villani [2003] C. Villani. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.
Villani [2008] C. Villani. Optimal transport: old and new, volume 338. Springer Berlin, 2008.
Wang and Wang [2011] B. Wang and R. Wang. The complete mixability and convex minimization problems with monotone marginal densities. J. Multivariate Anal., 102(10):1344–1360, 2011.
Wang et al. [2020] Q. Wang, R. Wang, and Y. Wei. Distortion riskmetrics on general spaces. Astin Bull., 50(3):827–851, 2020.

