This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

On the existence of Monge solutions to multi-marginal optimal transport with quadratic cost and uniform discrete marginals

Pedram Emami [email protected], [email protected]  and  Brendan Pass University of Alberta, Edmonton, Alberta, Canada
Abstract.

A natural and important question in multi-marginal optimal transport is whether the Monge ansatz is justified; does there exist a solution of Monge, or deterministic, form? We address this question for the quadratic cost when each marginal measure is mm-empirical (that is, uniformly supported on mm points). By direct computation, we provide an example showing that the ansatz can fail when the underlying dimension dd is 22, number of marginals NN to be matched is 33 and the size mm of their supports is 33. As a consequence, the set of mm-empirical measures is not barycentrically convex when N3N\geq 3, d2d\geq 2 and m3m\geq 3. It is a well known consequence of the Birkhoff-von Neumann Theorem that the Monge ansatz holds for N=2N=2, standard techniques show it holds when d=1d=1, and we provide a simple proof here that it holds whenever m=2m=2. Therefore, the NN, dd and mm in our counterexample are as small as possible.

Key words and phrases:
multi-marginal optimal transport, discrete marginals, Monge solutions, Wasserstein barycenter
2010 Mathematics Subject Classification:
49Q22
BP is pleased to acknowledge support from Natural Sciences and Engineering Research Council of Canada Grant 04658-2018. The work of PE was completed in partial fulfillment of the requirements for a doctoral degree in mathematics at the University of Alberta.

1. Introduction

Given probability measures μ1,μ2,,μN\mu_{1},\mu_{2},...,\mu_{N} (known as the marginals) on respective bounded domains X1,X2,,XNdX_{1},X_{2},...,X_{N}\subseteq\mathbb{R}^{d} and a cost function c:X1×X2××XNc:X_{1}\times X_{2}\times...\times X_{N}\rightarrow\mathbb{R}, the multi-marginal optimal transport problem is to minimize

(1) infγΠ(μ1,μ2,,μN)X1×X2××XNc(x1,x2,,xN)𝑑γ,\inf_{\gamma\in\Pi(\mu_{1},\mu_{2},...,\mu_{N})}\int_{X_{1}\times X_{2}\times...\times X_{N}}c(x_{1},x_{2},...,x_{N})\,d\gamma,

over the set Π(μ1,μ2,,μN)\Pi(\mu_{1},\mu_{2},...,\mu_{N}) of probability measures on X1×X2××XNX_{1}\times X_{2}\times...\times X_{N} whose marginals are the μi\mu_{i}. In particular, the N=2N=2 case is the classical optimal transport problem, which represents a very active area of modern mathematics, with many applications, reviewed in, for example [27, 25, 28]. The more general case, N3N\geq 3, has recently emerged as an important problem in its own right, owing largely to it own wide variety of applications; see, for instance, the survey papers [19, 8].

A central question in this theory is whether minimizers concentrate on the graphs of functions over the first variable x1x_{1}; solutions with this structure are known as Monge solutions. Restricting the minimization to measures arising from such mappings, or, adopting the terminology of [9], making the Monge ansatz, when justified, amounts to a spectacular dimensional reduction and therefore greatly simplifies the problem computationally. In the classical N=2N=2 case, it is well known that if μ1\mu_{1} is absolutely continuous with respect to Lebesgue measure, then, for cost satisfying mild structural conditions, the solution to (1) is unique and of Monge type [4, 11, 15]. Central among such costs is the quadratic cost, c(x1,x2)=|x1x2|2c(x_{1},x_{2})=|x_{1}-x_{2}|^{2}, which induces a metric, known as the Wasserstein distance, on the set of probability measures, defined by W2(μ1,μ2):=infγΠ(μ1,μ2)X1×X2|x1x2|2𝑑γW_{2}(\mu_{1},\mu_{2}):=\sqrt{\inf_{\gamma\in\Pi(\mu_{1},\mu_{2})}\int_{X_{1}\times X_{2}}|x_{1}-x_{2}|^{2}\,d\gamma}; Monge solution and uniqueness results for this cost were first proven by Brenier [3]. The situation is much more subtle and less well understood when N3N\geq 3. However, by now many examples of cost functions for which unique Monge solutions exist for absolutely continuous marginals [5, 12, 7, 22, 20], as well as many for which they do not [6, 18, 17], have been discovered, and some general (though very strong) conditions on cc ensuring this structure have been developed [16, 14, 21]. Of special note is the natural, multi-marginal extension of the quadratic cost,

(2) c(x1,x2,,xN)=i,j=1N|xixj|2,c(x_{1},x_{2},...,x_{N})=\sum_{i,j=1}^{N}|x_{i}-x_{j}|^{2},

or equivalently, c(x1,x2,,xN)=|i=1Nxi|2c(x_{1},x_{2},...,x_{N})=-\left|\sum_{i=1}^{N}x_{i}\right|^{2}, for which existence of Monge solutions and uniqueness were proven by Gangbo and Swiech [10]. Solving (1) with this cost is known to be equivalent to finding a Wasserstein barycenter (with equal weights on the measures), which, as introduced by Agueh-Carlier [1], is a minimizer among probability measures ν\nu on d\mathbb{R}^{d} of

(3) νi=1NW22(ν,μi).\nu\mapsto\sum_{i=1}^{N}W_{2}^{2}(\nu,\mu_{i}).

Wasserstein barycenters are Fréchet means on the Wasserstein space, and are an extremely active area of current research, due to many applications in statistics, image processing and data science, among other areas (see, for instance, [24, 13, 29, 26] and the survey in the monograph [23]).

In this short note we are interested in discrete, rather than absolutely continuous, marginals μi\mu_{i}. In general, optimal couplings between discrete measures need not be unique or of Monge type; indeed, in general, there may not even exist a single coupling γΠ(μ1,μ2,,μN)\gamma\in\Pi(\mu_{1},\mu_{2},...,\mu_{N}) which is of Monge type. However, such transport maps at least exist when all the marginals are uniformly distributed on a common number mm of points, as is often the case in applications. We will call such measures mm-empirical measures. In this case, when N=2N=2, the existence of Monge solutions for any cost function cc is an easy consequence of the Birkhoff-von Neumann Theorem, although solutions to (1) are not unique in general. An immediate consequence is that the set of mm-empirical measures is geodesically convex in Wasserstein space; that is, any two elements can be connected by a Wasserstein geodesic (often called a displacement interpolant) which lies in this set.

On the other hand, even for mm-empirical measures, Monge solutions for N3N\geq 3 do not exist for some cost functions. Indeed, as exposed in [9], when N3N\geq 3, the convex polytope Π(μ1,,μN)\Pi(\mu_{1},...,\mu_{N}) has extreme points which are not of Monge type, unlike in the N=2N=2 case. This immediately implies the existence of cost functions for which Monge solutions do not exist, and [9] provides examples with relevance in physics for which this occurs. Intuition provided there, however, suggests that it is the repulsive nature of certain costs which leads to non-Monge optimizers for mm-empirical measures, in agreement with known results in the continuous case. It remains reasonable to expect existence of Monge solutions for the quadratic cost function (2). It is similarly natural to conjecture that mm-empirical marginals always have mm-empirical Wasserstein barycenters (that is, that the set of mm-empirical measures is barycentrically convex); we note that upper bounds on the size of the support of barycenters are known for general discrete measures [2], but the bound obtained there is N(m1)+1N(m-1)+1 in the case of mm-empirical marginals, much larger than mm.

We exhibit here, by direct computation, an example showing that Monge solutions may not exist for N3N\geq 3 even for the quadratic cost. An immediate consequence is that the set of mm-empirical measures is not barycentrically convex; there exist mm-empirical marginals μi\mu_{i} for which there does not exist a Wasserstein barycenter within the same set.

Our counterexample is for three measures (N=3N=3) in two dimensions (d=2d=2) supported on three points (m=3m=3). These are the smallest values of these three parameters for which such counterexamples can exist. Indeed, for N=2N=2, Monge solutions can always be found as discussed above. When d=1d=1, optimal couplings are well known to be monotone increasing, which implies they are of Monge form and unique. Finally, we present here a simple proof that there always exists a Monge solution for any collection of 22-empirical marginals, for any values of dd and NN.

2. A counterexample to Monge solutions for marginals with three-point support in 2\mathbb{R}^{2}

We define the set of mm-empirical measures on d\mathbb{R}^{d} as Em:={1mi=1mδxi:x1,..,xmd}E_{m}:=\{\frac{1}{m}\sum_{i=1}^{m}\delta_{x_{i}}:x_{1},..,x_{m}\in\mathbb{R}^{d}\}.

Here we present a counterexample to the existence of Monge solutions for 33 marginals supported uniformly on 33 points in 2\mathbb{R}^{2}.

Example 1.

Let μ1,μ2,μ3E3\mu_{1},\mu_{2},\mu_{3}\in E_{3} be the 33-empirical measures supported on 33 points in 2\mathbb{R}^{2} defined by

μ1\displaystyle\mu_{1} =13(δ(0.4417,4.7665)+δ(0.27748,1.0397)+δ(1.4826,4.7896)),\displaystyle=\frac{1}{3}\left(\delta_{(0.4417,-4.7665)}+\delta_{(-0.27748,1.0397)}+\delta_{(1.4826,4.7896)}\right),
μ2\displaystyle\mu_{2} =13(δ(2.1054,3.9784)+δ(3.5763,1.8988)+δ(3.328,1.558)),\displaystyle=\frac{1}{3}\left(\delta_{(-2.1054,-3.9784)}+\delta_{(3.5763,-1.8988)}+\delta_{(3.328,-1.558)}\right),
(4) μ3\displaystyle\mu_{3} =13(δ(3.6728,0.23451)+δ(1.6988,2.2917)+δ(1.1644,2.386)).\displaystyle=\frac{1}{3}\left(\delta_{(-3.6728,0.23451)}+\delta_{(1.6988,-2.2917)}+\delta_{(-1.1644,-2.386)}\right).

The optimal transport problem (1) with cost (2) can easily be solved directly by linear programming, yielding the optimal measure γ=16[δx113+δx122+δx211+δx223+δx331+δx332]\gamma=\frac{1}{6}[\delta_{x^{113}}+\delta_{x^{122}}+\delta_{x^{211}}+\delta_{x^{223}}+\delta_{x^{331}}+\delta_{x^{332}}], with an optimal cost of 68.027\mathbf{68.027}. Here, xα1α2α3:=(x1α1,x2α3,x3α3)(2)3x^{\alpha^{1}\alpha^{2}\alpha^{3}}:=(x_{1}^{\alpha^{1}},x_{2}^{\alpha^{3}},x_{3}^{\alpha^{3}})\in(\mathbb{R}^{2})^{3}, where the elements xiαi2x_{i}^{\alpha^{i}}\in\mathbb{R}^{2} in the support of each μi\mu_{i} are ordered as in (4): for example, x113=((0.4417,4.7665),(2.1054,3.9784),(1.1644,2.386))x^{113}=((0.4417,-4.7665),(-2.1054,-3.9784),(-1.1644,-2.386)).

On the other hand, potential Monge solutions arise from transport maps, which correspond to measures of the form γ=13[δx1,σ2(1),σ3(1)+δx2,σ2(2),σ3(2)+δx3,σ2(3),σ3(3)]\gamma=\frac{1}{3}[\delta_{x^{1,\sigma_{2}(1),\sigma_{3}(1)}}+\delta_{x^{2,\sigma_{2}(2),\sigma_{3}(2)}}+\delta_{x^{3,\sigma_{2}(3),\sigma_{3}(3)}}] for permutations σ2\sigma_{2} and σ3\sigma_{3} of the set {1,2,3}\{1,2,3\} of indices. One can easily check that the minimal Monge cost (MMC), obtained by minimizing the transport cost among the (N!)m1=(3!)2=36(N!)^{m-1}=(3!)^{2}=36 transport maps, is obtained by γ=13[δx113+δx222+δx331]\gamma=\frac{1}{3}[\delta_{x^{113}}+\delta_{x^{222}}+\delta_{x^{331}}], yielding a strictly higher total cost of 68.065\mathbf{68.065}. It follows that the problem (1) with quadratic cost (2) and marginals (4) does not admit a Monge solution.

Since a 33-empirical Wasserstein barycenter of μ1,μ2\mu_{1},\mu_{2} and μ3\mu_{3} would immediately yield a Monge solution to (1) via the equivalence between the problems identified in [1], this also implies that there cannot exist a 33-empirical Wasserstein barycenter νE3\nu\in E_{3} of these marginals.

3. Existence of Monge solutions for uniform empirical measures with two-point supports in d\mathbb{R}^{d}

We now present a simple proof that Monge solutions exist for 22-empirical marginals.

Theorem 1.

Let μi=12(δxi1+δxi2)E2\mu_{i}=\frac{1}{2}\left(\delta_{x_{i}^{1}}+\delta_{x_{i}^{2}}\right)\in E_{2} be 22-empirical measures on d\mathbb{R}^{d} for i=1,,Ni=1,...,N. Then problem (1) with cost (2) has a Monge solution.

Proof.

Since the transform xixix¯ix_{i}\rightarrow x_{i}-\bar{x}_{i}, where each xi:=(xi1+xi2)/2x_{i}:=(x_{i}^{1}+x_{i}^{2})/2 is the mean of μi\mu_{i}, shifts the values of the functional (1) but does not change its minimizer, we can assume without loss of generality that the mean of each μi\mu_{i} is 0; that is, xi2=xi1x_{i}^{2}=-x_{i}^{1}. Now, choose αi{1,2}\alpha_{i}\in\{1,2\} for i=1,2,Ni=1,2...,N, to maximize

M=maxαi{1,2}:i=1,2,,N|i=1nxiαi|2.M=\max_{\alpha_{i}\in\{1,2\}:i=1,2,...,N}\left|\sum_{i=1}^{n}x_{i}^{\alpha_{i}}\right|^{2}.

Then set x^=(x1α1,x2α2,,xNαN)(d)N\hat{x}=(x_{1}^{\alpha^{1}},x_{2}^{\alpha^{2}},...,x_{N}^{\alpha^{N}})\in(\mathbb{R}^{d})^{N} and γ=12[δx^+δx^]\gamma=\frac{1}{2}[\delta_{\hat{x}}+\delta_{-\hat{x}}]. It is clear that γΠ(μ1,,μN)\gamma\in\Pi(\mu_{1},...,\mu_{N}) and that γ\gamma is supported on the graph of the transport map x1α1(x2α2,,xNαN)x_{1}^{\alpha^{1}}\mapsto(x_{2}^{\alpha^{2}},...,x_{N}^{\alpha^{N}}), x1α1(x2α2,,xNαN)-x_{1}^{\alpha^{1}}\mapsto(-x_{2}^{\alpha^{2}},...,-x_{N}^{\alpha^{N}}). Since |i=1nxi|2=M\left|\sum_{i=1}^{n}x_{i}\right|^{2}=M γ\gamma a.e, we have

(d)N|i=1nxi|2𝑑γ=M,-\int_{(\mathbb{R}^{d})^{N}}\left|\sum_{i=1}^{n}x_{i}\right|^{2}d\gamma=-M,

whereas, for any other γ~Π(μ1,μN)\tilde{\gamma}\in\Pi(\mu_{1},...\mu_{N}) we have

(d)N|i=1nxi|2𝑑γ~M,-\int_{(\mathbb{R}^{d})^{N}}\left|\sum_{i=1}^{n}x_{i}\right|^{2}d\tilde{\gamma}\geq-M,

establishing the optimality of the Monge coupling γ\gamma for the cost function c(x1,x2,,xN)=|i=1nxi|2c(x_{1},x_{2},...,x_{N})=-\left|\sum_{i=1}^{n}x_{i}\right|^{2}, which is well known to be equivalent to (2) [10].

Corollary 1.

Let μi=12(δxi1+δxi2)E2\mu_{i}=\frac{1}{2}\left(\delta_{x_{i}^{1}}+\delta_{x_{i}^{2}}\right)\in E_{2} be 22-empirical measures on d\mathbb{R}^{d} for i=1,,Ni=1,...,N. Then there exists a 22-empirical Wasserstein barycenter νE2\nu\in E_{2} of the μi\mu_{i}.

Proof.

Let x±:=±1Ni=1Nxiαi2x^{\pm}:=\pm\frac{1}{N}\sum_{i=1}^{N}x_{i}^{\alpha_{i}}\in\mathbb{R}^{2}, where the αi1,2\alpha_{i}\in{1,2} are as in the previous proof. The equivalence between multi-marginal optimal transport and the Wasserstein barycenter established in Proposition 4.1 of [1] then implies that 12[δx++δx]E2\frac{1}{2}[\delta_{x^{+}}+\delta_{x^{-}}]\in E_{2} is a Wasserstein barycenter.

4. Discussion

The counterexample in Example 1 was found by brute force, through randomly generating many collections of empirical marginals and computing the solutions to (1). A couple of observations can be gleaned from the data obtained by this procedure:

  1. (1)

    The occurrence of problems that do not have Monge solutions is quite rare; for three measures supported on three points in 2\mathbb{R}^{2}, we found that only about one in every 50005000 to 80008000 problems does not have a Monge solution. The frequency of these counterexamples seems to increase with the underlying dimension; we found that the failure of the Monge ansatz occurs more often in 3\mathbb{R}^{3}.

  2. (2)

    Whenever the Monge ansatz fails, the difference between the minimal Monge cost and the actual optimal cost is quite small. We identified around 500500 cases when the Monge ansatz fails in 2\mathbb{R}^{2}; among these the greatest difference between the cost of the minimal Monge cost and the actual optimal cost was 4.3%4.3\%. In fact, in most cases the error was much lower than the above upper bound as is shown in Figure 1.

The last observation suggest rigorously quantifying the error made by the Monge ansatz as an intriguing line of future research. It seems possible that the errors are small enough that the Monge ansatz as an approximation of (1) may still be a useful tool in some applications.

Refer to caption
Figure 1. The distribution of the errors of minimal Monge costs for 500500 occurrences of multi-marginal optimal transport problems with 33-empirical marginals in 3\mathbb{R}^{3} which do not have Monge solutions.

References

  • [1] Martial Agueh and Guillaume Carlier, Barycenters in the wasserstein space, SIAM Journal on Mathematical Analysis 43 (2011), no. 2, 904–924.
  • [2] Ethan Anderes, Steffen Borgwardt, and Jacob Miller, Discrete wasserstein barycenters: Optimal transport for discrete data, Mathematical Methods of Operations Research 84 (2016), 389–409.
  • [3] Yann Brenier, Décomposition polaire et réarrangement monotone des champs de vecteurs, C. R. Acad. Sci. Paris Sér. I Math. 305 (1987), no. 19, 805–808. MR 923203
  • [4] Luis A. Caffarelli, Allocation maps with general cost functions, Partial differential equations and applications, Lecture Notes in Pure and Appl. Math., vol. 177, Dekker, New York, 1996, pp. 29–35. MR 1371577
  • [5] G. Carlier, On a class of multidimensional optimal transportation problems, J. Convex Anal. 10 (2003), no. 2, 517–529. MR 2044434
  • [6] Guillaume Carlier and Bruno Nazaret, Optimal transportation for the determinant, ESAIM Control Optim. Calc. Var. 14 (2008), no. 4, 678–698. MR 2451790
  • [7] Maria Colombo, Luigi De Pascale, and Simone Di Marino, Multimarginal optimal transport maps for one-dimensional repulsive costs, Canad. J. Math. 67 (2015), no. 2, 350–368. MR 3314838
  • [8] Simone Di Marino, Augusto Gerolin, and Luca Nenna, Optimal transportation theory with repulsive costs, Topological optimization and optimal transport, Radon Ser. Comput. Appl. Math., vol. 17, De Gruyter, Berlin, 2017, pp. 204–256. MR 3729378
  • [9] Gero Friesecke, A Simple Counterexample to the Monge Ansatz in Multimarginal Optimal Transport, Convex Geometry of the Set of Kantorovich Plans, and the Frenkel–Kontorova Model, SIAM Journal on Mathematical Analysis 51 (2019), no. 6, 4332–4355.
  • [10] W. Gangbo and A. Świȩch, Optimal maps for the multidimensional Monge-Kantorovich problem, Comm. Pure Appl. Math. 51 (1998), no. 1, 23–45.
  • [11] Wilfrid Gangbo and Robert J. McCann, The geometry of optimal transportation, Acta Math. 177 (1996), no. 2, 113–161. MR 1440931
  • [12] Henri Heinich, Problème de Monge pour nn probabilités, C. R. Math. Acad. Sci. Paris 334 (2002), no. 9, 793–795. MR 1905042
  • [13] Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Hai Bui, Viet Huynh, and Dinh Phung, Multilevel clustering via Wasserstein means, Proceedings of the 34th International Conference on Machine Learning (Doina Precup and Yee Whye Teh, eds.), Proceedings of Machine Learning Research, vol. 70, PMLR, 06–11 Aug 2017, pp. 1501–1509.
  • [14] Young-Heon Kim and Brendan Pass, A general condition for Monge solutions in the multi-marginal optimal transport problem, SIAM J. Math. Anal. 46 (2014), no. 2, 1538–1550. MR 3190751
  • [15] Vladimir Levin, Abstract cyclical monotonicity and Monge solutions for the general Monge-Kantorovich problem, Set-Valued Anal. 7 (1999), no. 1, 7–32. MR 1699061
  • [16] B. Pass, Uniqueness and monge solutions in the multimarginal optimal transportation problem, SIAM Journal on Mathematical Analysis 43 (2011), no. 6, 2758–2775.
  • [17] by same author, On the local structure of optimal measures in the multi-marginal optimal transportation problem, Calculus of Variations and Partial Differential Equations 43 (2012), 529–536, 10.1007/s00526-011-0421-z.
  • [18] Brendan Pass, Remarks on the semi-classical Hohenberg-Kohn functional, Nonlinearity 26 (2013), no. 9, 2731–2744. MR 3106579
  • [19] by same author, Multi-marginal optimal transport: theory and applications, ESAIM Math. Model. Numer. Anal. 49 (2015), no. 6, 1771–1790. MR 3423275
  • [20] Brendan Pass and Adolfo Vargas-Jiménez, Monge solutions and uniqueness in multi-marginal optimal transport via graph theory, Adv. Math. 428 (2023), Paper No. 109101, 43. MR 4601780
  • [21] Brendan W. Pass and Adolfo Vargas-Jiménez, A general framework for multi-marginal optimal transport, To appeach in Math. Programming.
  • [22] by same author, Multi-marginal optimal transportation problem for cyclic costs, SIAM J. Math. Anal. 53 (2021), no. 4, 4386–4400. MR 4296755
  • [23] Gabriel Peyré and Marco Cuturi, Computational optimal transport: With applications to data science, 2019.
  • [24] Julien Rabin, Gabriel Peyré, Julie Delon, and Marc Bernot, Wasserstein barycenter and its application to texture mixing, Scale Space and Variational Methods in Computer Vision (Berlin, Heidelberg) (Alfred M. Bruckstein, Bart M. ter Haar Romeny, Alexander M. Bronstein, and Michael M. Bronstein, eds.), Springer Berlin Heidelberg, 2012, pp. 435–446.
  • [25] Filippo Santambrogio, Optimal transport for applied mathematicians, vol. 55, Birkäuser, NY, 2015.
  • [26] M. Staib, S. Claici, J. Solomon, and S. Jegelka, Parallel Streaming Wasserstein Barycenters, Advances in Neural Information Processing Systems (I. Guyon et al., ed.), vol. 30, Curran Associates, Inc., 2017.
  • [27] Cédric Villani, Topics in optimal transportation, vol. 58, American Mathematical Soc., 2003.
  • [28] Cédric Villani, Optimal transport, Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 338, Springer-Verlag, Berlin, 2009, Old and new. MR 2459454
  • [29] H. Yang and E. Tabak, Conditional density estimation, latent variable discovery, and optimal transport, Comm. Pure Appl. Math. 75 (2022), no. 3, 610–663. MR 4373178