This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Minimizing optimal transport for functions with fixed-size nodal sets

Qiang Du Department of Applied Physics and Applied Mathematics, and Data Science Institute, Columbia University, 500 W120 Street, New York, NY 10027, USA [email protected]  and  Amir Sagiv Department of Applied Physics and Applied Mathematics, Columbia University, 500 W120 Street, New York, NY 10027, USA [email protected]
Abstract.

Consider the class of zero-mean functions with fixed LL^{\infty} and L1L^{1} norms and exactly NN\in\mathbb{N} nodal points. Which functions ff minimize Wp(f+,f)W_{p}(f_{+},f_{-}), the Wasserstein distance between the measures whose densities are the positive and negative parts? We provide a complete solution to this minimization problem on the line and the circle, which provides sharp constants for previously proven “uncertainty principle”-type inequalities, i.e., lower bounds on NWp(f+,f)N\cdot W_{p}(f_{+},f_{-}). We further show that, while such inequalities hold in many metric measure spaces, they are no longer sharp when the non-branching assumption is violated; indeed, for metric star-graphs, the optimal lower bound on Wp(f+,f)W_{p}(f_{+},f_{-}) is not inversely proportional to the size of the nodal set, NN. Based on similar reductions, we make connections between the analogous problem of minimizing Wp(f+,f)W_{p}(f_{+},f_{-}) for ff defined on Ωd\Omega\subset\mathbb{R}^{d} with an equivalent optimal domain partition problem.

Key words and phrases:
Wasserstein, Uncertainty Principle, Nodal Set, Metric Graph, Optimal Partition
2020 Mathematics Subject Classification:
28A75, 49Q05, 49Q22, 52C35

1. Introduction

The study of nodal sets (or zero sets) of functions is a classic topic in analysis. Loosely speaking, the key question is, for a given a function, “how big” its nodal set Z(f)={x|f(x)=0}Z(f)=\{x~{}|~{}f(x)=0\} is.111Here we use the term “nodal set” for a broad class of functions (see Section 2), whereas in other places it refers to the zero sets of a special class of functions, e.g., Laplacian eigenfunctions.

In recent years, this question has been connected to the notion of optimal transport. The intuition is as follows: given a sufficiently regular metric Borel probability space Ω\Omega (e.g., a domain in d\mathbb{R}^{d} or a Riemannian manifold) and a sufficiently regular and bounded real-valued function f:Ωf:\Omega\to\mathbb{R} with mean zero over Ω\Omega, we consider its positive and negative parts f±(x)±max(±f(x),0)f_{\pm}(x)\equiv\pm\max(\pm f(x),0) as densities, and the nodal set Z(f)Z(f) as the interfaces of their supports. If the nodal set is small, then there should be some regions in the support of f+f_{+} that cannot be near the support of ff_{-}. Therefore, the cost of transporting f+f_{+} to ff_{-} cannot be arbitrarily small.

A common metric which quantifies the notion of transport cost is the Wasserstein-pp distance Wp(f+,f)W_{p}(f_{+},f_{-}). To make the connection between the nodal set and the transport cost more transparent, we ask:

(1) Given a nodal set of a certain size, which functions minimize Wp(f+,f)W_{p}(f_{+},f_{-})?

The answer to this minimization problem should depend on the norms of ff; first, the larger f1\|f\|_{1} is, the more mass there is to transport. Second, the more localized the function is, as measured in this work by f\|f\|_{\infty}, the more mass can be concentrated near the interface, thus decreasing the transport distance and cost.

Before rigorously formalizing our question (see Question 1 in Section 2), we note that the motivation behind it originates from the following type of inequalities: for a domain  Ω\Omega,

(2) W1(f+,f)Surf{xΩ:f(x)=0}c(Ω)(fL1fL)ρfL1,fC0(Ω¯),W_{1}(f_{+},f_{-})\cdot{\rm Surf}\left\{x\in\Omega:f(x)=0\right\}\geq c(\Omega)\left(\frac{\|f\|_{L^{1}}}{\|f\|_{L^{\infty}}}\right)^{\rho}\|f\|_{L^{1}}\,,\qquad\forall f\in C^{0}(\bar{\Omega}),

where ρ>0\rho>0, c(Ω)>0c(\Omega)>0 is a constant that depends only on the domain Ω\Omega, and Surf{\rm Surf} is a surface measure, e.g., the co-dimension 11 Hausdorff measure. The first inequality of this type was proven by Steinerberger [48] for two-dimensional domains such as the unit square [0,1]2[0,1]^{2}, with ρ=1\rho=1. This result was then generalized to certain compact domains in d\mathbb{R}^{d} with arbitrary d1d\geq 1 by Steinerberger and the second author with ρ=41/d\rho=4-1/d, see [45]. Subsequently, Carroll, Massaneda, and Ortega-Cerdà [16] improved the exponent to ρ=21/d\rho=2-1/d, and Cavalletti and Farinelli proved that the optimal ρ=1\rho=1 is true in all dimensions [17]. The analysis in [17] takes place at a much larger class of metric measure spaces that satisfy the Curvature-Dimension (CD(K,N){\rm CD}(K,N)) condition, where the nodal set can be measured using the notion of Perimeter [2, 39]. That result was then also proven for RCD(K,){\rm RCD}(K,\infty) metric spaces [3, 4] by De-Ponti and Farinelli for all p1p\geq 1 [21].

Despite the great level of generality of the above results, some fundamental questions still remain: is the inequality (2) with an exponent ρ=1\rho=1 sharp? First, we do not know what the optimal constant c(Ω)c(\Omega) is. More fundamentally, one may ask whether a multiplicative inequality is the natural one. Indeed, some works studied related additive inequalities [13, 15, 41, 52] regarding transport of the Lebesgue measure between disjoint domains. 222In particular, [15, Proposition 2.5] and [41, Lemma 2.8] consider the overall transport outside of a bounded set EdE\subset\mathbb{R}^{d} and prove that a lower bound on the transport cost must be inversely proportional to the perimeter of EE, i.e., if FEcF\subseteq E^{c} with volume |E|=|F|=1|E|=|F|=1, then Wp(𝟙E,𝟙F)>C(d)/Per(E)W_{p}(\mathbbm{1}_{E},\mathbbm{1}_{F})>C(d)/{\rm Per}(E) for some universal C(d)>0C(d)>0.

But even the most general results regarding inequalities of the form (2) require an essentially non-branching property [17, 21]: all RCD(K,){\rm RCD}(K,\infty) spaces satisfy this property [44], whereas the result of [17] is proven only to CD(K,N){\rm CD}(K,N) spaces which satisfy this property. A primary takeaway of our study is that it is indeed necessary in order for inequalities like (2) to be sharp.

1.1. Main results

In this paper, we formalize the minimization problem (1) and establish a strategy to its solution. We find the minimizers on three families of one-dimensional domains: a line (or an interval), a circle, and metric star graphs. The analysis of these domains leads us to the following key conclusions:

Refer to caption
Figure 1. An exemplary minimizer of Wp(f+,f)W_{p}(f_{+},f_{-}) with 33 nodal points on an interval.
  • Generalized nodal sets for L1LL^{1}\cap L^{\infty} functions. Common to all these settings are that the minimizers are step functions, and therefore do not have nodal sets in the usual sense. Hence, it is crucial to extend the class of admissible functions and notion of nodal sets of continuous functions to a broader class of L1LL^{1}\cap L^{\infty} functions. Furthermore, it may be expected that this would be the right formulation through which to explore the minimizers in higher dimensions (see Section 6).

  • Strategy. Our work establishes a strategy to approach the minimization problem under consideration through a series of reductions of the class of feasible functions. The reduction strategy consists of three stages: (1) We reduce the functional minimization problem to a geometric one. Fixing the nodal domain, i.e., the interface between the positive and negative parts (see Definition 1), we show that it is always preferable to allocate the L1L^{1} mass near the nodal domain. Thus, given a function ff we find a new function gg which is (up to a sign and scaling) the indicator function of a subset of the supports of f±f_{\pm}, and for which the overall transport cost is cheaper. (2) We find that there is always a preferable configuration of the mass and the nodal points such that the optimal transport plan only couples adjacent intervals. (3) Steps (1) and (2) reduce the infinite-dimensional variational problem to a constrained finite dimensional problem, which one can solve by elementary means.

  • The minimizers By posing the question as a minimization problem, our work goes beyond previous works on the subject since we gain insight into the minimizers themselves. In the simplest case, the interval (or for every smooth non-intersecting curve), the minimizers are step functions whose only possible values are either ±f\pm\|f\|_{\infty} or 0, as can be seen in Fig. 1. Thus, these are functions for which the optimal transport between f+f_{+} and ff_{-} is local, across a single interface. Next, even though the circle introduces a new global structure, we show that the nature of the problem (and the minimizers) do not change.

  • Sharp multiplicative inequalities for the interval and the circle. The complete solution of the minimization problem allows us to explore the sharpness of the multiplicative “uncertainty principle” (2). For an interval, we find that multiplicative inequalities are optimal (Section 3). The sharp inequality for p1p\geq 1 is

    Wp(f+,f)|Z(f)|211p(f1f)f11p.W_{p}(f_{+},f_{-})\cdot|Z(f)|\geq 2^{-1-\frac{1}{p}}\left(\frac{\|f\|_{1}}{\|f\|_{\infty}}\right)\|f\|_{1}^{\frac{1}{p}}\,.

    We further show that the same conclusion is not limited to an interval domain but also remains valid for a circle (see Section 4). This, in turn, leads us to seek for an example of domains, e.g., a metric star graph, for which the sharp lower bound is not expected to be multiplicative (see Section 5).

  • Metric star graphs. We carry our strategy to the more complicated case of metric star graphs (e.g., we need to solve Kontorvich’ problem and not Monge’s); Indeed, as before, Theorem 13 reduces the general minimization problem over L1LL^{1}\cap L^{\infty} into a finite-dimensional constrained optimization problem. The key difference, however, is that in contrast to the line and the circle, the optimal relation between the minimal transport cost and the other factors under consideration is not multiplicative on star graphs. The introduction of a node yields optimal lower bounds which depend on the geometry of the domain. For example, in the case of a star with DD\in\mathbb{N} sufficiently long edges,

    W1(f+,f)N~14f1ff1,N~=|Z(f)|1+{D2,Deven,(D+1)(D1)2D,Dodd.W_{1}(f_{+},f_{-})\tilde{N}\geq\frac{1}{4}\frac{\|f\|_{1}}{\|f\|_{\infty}}\|f\|_{1}\,,\qquad\tilde{N}=|Z(f)|-1+\left\{\begin{array}[]{cc}\frac{D}{2}\,,&D~{}{\rm even}\,,\\ \frac{(D+1)(D-1)}{2D}\,,&D~{}{\rm odd}\,.\end{array}\right.

    When some of the edges are not sufficiently long, even more complicated forms of inequalities emerge. Our conclusions for star graphs echo and contrast the analysis of [17], where the multiplicative lower bound (2) is proved for a broad class of non-branching spaces. Since stars, and metric graphs in general, are branching spaces (see also [26]), one might expect a different type of results, as we indeed prove in Section 5.

The strategy and issues identified in this work are expected to be generalizable to high dimensional settings. The one-dimensional settings help establish the strategy in its simplest form, and allow us to reach easy-to-compute and precise constants. Furthermore, by considering graphs, we are able to expose the key challenges in going beyond all previous works on non-branching spaces. In Section 6, we outline the conjectures and key challenges in generalizing our strategy to multi-dimensional domains, as well as to general metric graphs.

1.2. Structure of the paper

Preliminaries, definitions, and the formulation of our minimization problem (Question 1) are given in Section 2. Then, the problem is solved for the interval (Theorem 5) and the circle (Theorem 7) in Sections 3 and 4, respectively. For star graphs, in Section 5 we re-formulate Question 1 as a finite-dimensional constrained minimization problem (Theorem 13) and solve it for a number of special cases. Finally, an outlook on the problem in multiple dimensions and on general metric graphs is presented in Section 6.

2. Settings

2.1. Nodal Sets.

Given a one-dimensional domain II, constants c,c1>0c_{\infty},c_{1}>0, and NN\in\mathbb{N}, define

(3) X=X(c,c1,N,I){f:Imeasurablesuchthatf=c,f1=c1,|Z(f)|=N,If(x)𝑑x=0.},X=X(c_{\infty},c_{1},N,I)\equiv\left\{f:I\to\mathbb{R}~{}~{}{\rm measurable}~{}~{}{\rm such}~{}{\rm that}~{}~{}\begin{array}[]{l}\|f\|_{\infty}=c_{\infty}\,,\\ \|f\|_{1}=c_{1}\,,\\ \left|Z(f)\right|=N\,,\\ \int_{I}f(x)\,dx=0\,.\end{array}~{}~{}\right\}\,,

where Z(f)Z(f) is the set of points where ff changes its signed, defined as follows:

Definition 1 (effective nodal set).

Consider a measurable f:If:I\to\mathbb{R} with finite L1L^{1} and LL^{\infty} norms.

  • Define

    Z1(f){x|f(x)>0}{x|f(x)<0}.Z_{1}(f)\equiv\partial\{x~{}|f(x)>0\}\cap\partial\{x~{}|f(x)<0\}\,.
  • Let 𝒵(f)π0(f1({0}))\mathcal{Z}(f)\equiv\pi_{0}\left(f^{-1}(\{0\})\right), the set of all connected components of f1({0})f^{-1}(\{0\}).

  • Let 𝒵(f)𝒵(f)\mathcal{Z}^{\prime}(f)\subseteq\mathcal{Z}(f) be the set of all elements of 𝒵(f)\mathcal{Z}(f) whose closure intersects both {x|f(x)>0}\partial\{x~{}|f(x)>0\} and {x|f(x)<0}\partial\{x~{}|f(x)<0\}.

  • Let Z2(f)Z_{2}(f) be a set which consists of a unique point xJx\in J for every J𝒵(f)J\in\mathcal{Z}^{\prime}(f).

  • Finally, define the effective nodal set as Z(f)Z1(f)Z2(f)Z(f)\equiv Z_{1}(f)\cup Z_{2}(f).

Refer to caption
Figure 2. Under Definition 1, the portrayed function has |Z(f)|=2|Z(f)|=2.

Our intention is to define Z(f)Z(f) in a way which exactly captures every sign change of ff once. Let us see that this is indeed achieved by the definition: consider, for example, the function in Fig. 2. While f=0f=0 on [A,B][A,B], and so [A,B]𝒵(f)[A,B]\in\mathcal{Z}(f), it is not an element of 𝒵(f)\mathcal{Z}^{\prime}(f), since it does not intersect with {f<0}\partial\{f<0\}. Hence, no point in [A,B][A,B] is contained in Z(f)Z(f). The same is true for [D,E][D,E]. By definition, CZ1(f)C\in Z_{1}(f). Lastly, [F,G]𝒵(f)[F,G]\in{\mathcal{Z}^{\prime}(f)}, and therefore FZ2(f)F\in Z_{2}(f). Overall, ff changes its sign exactly twice and indeed |Z(f)|=|{C,F}|=2|Z(f)|=|\{C,{F}\}|=2.

We contrast our definition with those used in previous works:

  • In [45, 48], the objects of study are continuous functions, and therefore the interface between the supports of f+f_{+} and ff_{-} is always contained in the set f1({0})f^{-1}(\{0\}). For general measurable functions, therefore, Definition 1 of the effective nodal set need not coincide with f1({0})f^{-1}(\{0\}) for fC0(I)f\in C^{0}(I) (consider e.g., f(x)=x2f(x)=x^{2} on [1,1][-1,1]). Moreover, for functions which are merely bounded but not continuous, these interfaces are not necessarily in f1({0})f^{-1}(\{0\}), and hence the somewhat more complicated form of Definition 1.

  • The authors of [17] considered the quantity Per({x|f(x)>0}){\rm Per}(\{x~{}|~{}f(x)>0\}). First note that, in Euclidean settings, the Perimeter coincides with the Hausdorff measure d1\mathcal{H}^{d-1} of the reduced boundary of {f>0}\{f>0\}, and in particular for d=1d=1 this is just the cardinality of the set.

    Furthermore, the zero set in [17], {x|f(x)>0}\partial\{x|f(x)>~{}0\}, includes points where ff does not changes signs, e.g., AA and BB in Figure 2. This is not an issue when seeking lower bounds of the type (2), as adding points which are not interfaces between f+f_{+} and ff_{-} can only increase the left hand side of (2). Since we solve minimization problems on functions with exactly NN sign changes, it is easier if we avoid such issues, even at a cost of a slightly more elaborated definition.

2.2. Optimal Transport and the Wasserstein distance.

We briefly recall the definition and certain key properties of the Wasserstein-pp distance. We refer to [47, 51] for a more comprehensive treatment of this topic. Let p1p\geq 1, Ωn\Omega\subseteq\mathbb{R}^{n} a Borel set, and denote by p(Ω)\mathbb{P}_{p}(\Omega) the set of all Borel probability measures on Ω\Omega with finite pp-th moments. Define the Wasserstein-pp distance between two measures μ1,μ2p(Ω)\mu_{1},\mu_{2}\in\mathbb{P}_{p}(\Omega) as

(4) Wp(μ1,μ2)infγΓ(μ1,μ2)Kp1p(γ),Kp(γ)Ω×Ω|xy|pdγ(x,y),W_{p}(\mu_{1},\mu_{2})\equiv\inf\limits_{\gamma\in\Gamma(\mu_{1},\mu_{2})}K_{p}^{\frac{1}{p}}(\gamma)\,,\qquad K_{p}(\gamma)\equiv\int\limits_{\Omega\times\Omega}|x-y|^{p}\,\mathop{}\!\mathrm{d}\gamma(x,y)\,,

where |xy||x-y| is the geodesic distance on II and Γ(μ1,μ2)\Gamma(\mu_{1},\mu_{2}) is the set of all Borel probability measures on Ω×Ω\Omega\times\Omega with marginals μ1\mu_{1} and μ2\mu_{2}, i.e.,

(5) μ1(A)=γ(Ω×A),μ2(A)=γ(A×Ω),\mu_{1}(A)=\gamma(\Omega\times A)\,,\qquad\mu_{2}(A)=\gamma(A\times\Omega)\,,

for any γΓ(μ1,μ2)\gamma\in\Gamma(\mu_{1},\mu_{2}) and any Borel set AΩA\subseteq\Omega.

A measure γ\gamma is often called a transport plan and γ\gamma minimizing KpK_{p} is said to solve the Kantorovich problem. In some cases, there exists an optimal transport map, a function T:ΩΩT:\Omega\to\Omega which solves the so-called Monge problem:

(6) inf{T|T#μ1=μ2}Kp1p(T),Kp(T)Ω|xT(x)|p𝑑μ1(x),\inf\limits_{\{T~{}~{}|~{}~{}T_{\#}\mu_{1}=\mu_{2}\}}K_{p}^{\frac{1}{p}}(T)\,,\qquad K_{p}(T)\equiv\int\limits_{\Omega}|x-T(x)|^{p}\,d\mu_{1}(x)\,,

where by T#μ1T_{\#}\mu_{1} we mean the pushforward of μ1\mu_{1} by TT, i.e., the measure which assigns to any Borel set AΩA\subseteq\Omega the measure T#μ1(A)=μ1(T1(A))T_{\#}\mu_{1}(A)=\mu_{1}(T^{-1}(A)). Any map that pushes μ1\mu_{1} to μ2\mu_{2} induces a transport plan (id,T)#μ1Γ(μ1,μ2)({\rm id},T)_{\#}\mu_{1}\in\Gamma(\mu_{1},\mu_{2}), and the transport cost is unambiguously defined, i.e., we can write Kp(T)K_{p}(T) as a shorthand for Kp((id,T)#μ1)K_{p}(({\rm id},T)_{\#}\mu_{1}).

On an interval II\subseteq\mathbb{R}, for any p1p\geq 1 and any two atomless measures, an optimal transport plan is induced by an optimal transport map [47, Theorem 2.9], i.e., Wpp(μ1,μ2)=Kp(T)W_{p}^{p}(\mu_{1},\mu_{2})=K_{p}(T), with a monotonically increasing TT defined by

(7) T=Fμ11Fμ2,T=F_{\mu_{1}}^{-1}\circ F_{\mu_{2}}\,,

where Fμ(y)μ(,y)F_{\mu}(y)\equiv\mu(-\infty,y) is the cumulative distribution function (CDF), and the inverse is taken in the generalized sense, Fμ1(x)inf{t|Fμ(t)x}F^{-1}_{\mu}(x)\equiv\inf\{t\in\mathbb{R}~{}|~{}F_{\mu}(t)\geq x\}.333The statement holds more generally for the optimal transport with respect to any convex cost function h(xy)h(x-y) on the line, but we will not pursue this level of generality here. For p>1p>1, this map is also unique [47]. Hence, the Wasserstein-pp distance has the much simpler form

(8) Wp(μ1,μ2)=Fμ11Fμ21Lp().W_{p}(\mu_{1},\mu_{2})=\|F_{\mu_{1}}^{-1}-F_{\mu_{2}}^{-1}\|_{L^{p}(\mathbb{R})}\,.

In the particular case of p=1p=1, one gets the more straightforward formula with the CDFs (and not their inverses) W1(μ1,μ2)=Fμ1Fμ2L1()W_{1}(\mu_{1},\mu_{2})=\|F_{\mu_{1}}-F_{\mu_{2}}\|_{L^{1}(\mathbb{R})}, see [46, 50].

Our main question can be rigorously formalized as

Question 1.

For a one-dimensional domain II with nonzero measure |I|>0|I|>0, p1p\geq 1, c>0c_{\infty}>0, c1(0,c|Ω|]c_{1}\in(0,c_{\infty}|\Omega|], and N+N\in\mathbb{N}_{+}, what are the minimizers and the minimum value of the minimization problem

(9) minfX(c,c1,N,I)Wp(f+,f).\min_{f\in X(c_{\infty},c_{1},N,I)}W_{p}(f_{+},f_{-})\,.

By Wp(f+,f)W_{p}(f_{+},f_{-}), we mean the WpW_{p}-distance between the measures whose densities are f+f_{+} and ff_{-}. Note that c1(0,c|Ω|]c_{1}\in(0,c_{\infty}|\Omega|] and N>0N>0 are specified in the statement of Question 1 as necessary and sufficient condition for X(c,c1,N,I)X(c_{\infty},c_{1},N,I)\neq\emptyset.

A crucial ingredient to study the above minimization problem is to establish an equivalence formulation that provides a characterization of the minimizers to the original problem. Connected to this, we present a sub-class of functions, defined below:

Definition 2.

Let II be a one dimensional domain (a curve or a metric graph). Denote by Xs=Xs(c,c1,N,I)X_{s}=X_{s}(c_{\infty},c_{1},N,I) the set of step-functions fX(c,c1,N,I)f\in X(c_{\infty},c_{1},N,I) such that f=±cf=~{}\pm c_{\infty} only on intervals adjacent to points in Z(f)Z(f) and 0 everywhere else; see, e.g., Figure 1 and Figure 3(B).

3. The Interval

In this section, we study the case of a nonempty interval I=(0,L)I=(0,L) with L>0L>0. Our strategy to answer Question 1, here and throughout this paper, is that of optimization; for every candidate function fXf\in X, we attempt to construct gXg\in X such that Wp(f+,f)>Wp(g+,g)W_{p}(f_{+},f_{-})>W_{p}(g_{+},g_{-}). The minimizers will therefore be the only functions for which further optimization is not possible, and we will show that, by construction, those are also global minimizers.

Lemma 2.

Let I=(0,L)I=(0,L). For every fX=X(c,c1,N,I)f\in X=X(c_{\infty},c_{1},N,I), denote Z(f)={z1,,zN}Z(f)=\{z_{1},\ldots,z_{N}\} with zi<zi+1z_{i}<z_{i+1} for all 1i<N1\leq i<N. Then, there exists a function gXsg\in X_{s} such that

  1. (i)

    Z(f)=Z(g)Z(f)=Z(g)

  2. (ii)

    For any ziZ(g)z_{i}\in Z(g), g0g\geq 0 on Ii=(zi,zi+1)I_{i}=(z_{i},z_{i+1}) if and only if f0f\geq 0 there, and

    Iig(x)𝑑x=Iif(x)𝑑x.\int_{I_{i}}g(x)\,dx=\int_{I_{i}}f(x)\,dx.
  3. (iii)

    Wp(f+,f)Wp(g+,g)W_{p}(f_{+},f_{-})\geq W_{p}(g_{+},g_{-}) for any p1p\geq 1, with strong inequality if fXsf\not\in X_{s}.

Remark.

Even though the optimal transport plan in this case is given by the monotonic map (7), we will work in this proof with a general coupling γΓ(f+,f)\gamma\in\Gamma(f_{+},f_{-}). This level of generality shows that at least the geometric nature of the problem extends to the following more general setting: let Ω\Omega be a simple curve, c(x,y)=h(|xy|)c(x,y)=h(|x-y|) where h:++h:\mathbb{R}_{+}\to\mathbb{R}_{+} is a monotonically increasing function in the geodesic distance for two points x,yΩx,y\in\Omega, and define the cc-transport cost as

(10) Kc(T)=Ω×Ωc(x,T(x)),dμ(x).K_{c}(T)=\int_{\Omega\times\Omega}c(x,T(x))\,,d\mu(x)\,.

Then, defining the optimal transport with respect to c(x,y)c(x,y) analogously to (4), Lemma 2 still holds. Also, it will allow us to generalize this statement immediately to star graphs in Section 5.

Proof.

Consider an optimal transport plan between f+f_{+} and ff_{-}, i.e., a measure γΓ(f+,f)\gamma\in\Gamma(f_{+},f_{-}) such that Kp(γ)=Wpp(f+,f)K_{p}(\gamma)=W_{p}^{p}(f_{+},f_{-}), see equation (5). The intuition is that for each 0jN0\leq j\leq N, some of the mass on IjI_{j} has to be transported to the left by γ\gamma, and some has to be transported to the right, see Fig. 3(a).444The red dotted line, separating between the left and right transported parts in IjI_{j}, is vertical in Fig. 3(a), since the optimal transport in this case is given by a map. In the case when the optimal transport is given by a coupling, this separation would be better depicted by a curve h(x)h(x) with 0h(x)f+(x)0\leq h(x)\leq f_{+}(x). It is therefore less costly to have those respective masses already concentrated near the endpoints of IjI_{j}, see Fig. 3(b).

(a)
Refer to caption
(b)
Refer to caption
Figure 3. (A) For an interval IjI_{j} where f(x)0f(x)\geq 0 and a given transport plan γΓ(f+,f)\gamma\in\Gamma(f_{+},f_{-}), the mass left of the dotted lines will be transported to intervals II_{\ell} with <j\ell<j, and the mass to the right of the dotted to intervals II_{\ell} with >j\ell>j. (B) The cost of transport for the portrayed re-organization of ff on IjI_{j} is cheaper.

To prove the above intuition rigorously, we will inductively define a sequence of functions in the following way: let f0ff^{0}\equiv f and γ0γ\gamma^{0}\equiv\gamma. For 1jN1\leq j\leq N, we will define a function fjf^{j} and a new transport map γjΓ(f+j,fj)\gamma^{j}\in\Gamma(f_{+}^{j},f_{-}^{j}) such that they satisfy conditions (i–iii) of Lemma 2 on the intervals I0,,Ij1I_{0},\ldots,I_{j-1} and such that Kp(γj1)Kp(γj)K_{p}(\gamma_{j-1})\geq K_{p}(\gamma_{j}) for any p1p\geq 1.

Since ff has a definite sign on each interval, suppose without loss of generality that ff is non-negative on IjI_{j} (the negative case is completely symmetric). Define pjγj(Ij×)p^{j}\equiv\gamma^{j}(I_{j}\times\cdot); this is a Borel measure on II which specifies how much mass from IjI_{j} is transported in II under γj\gamma^{j}.

Since the functions f+f_{+} and ff_{-} (thus the functions f+jf_{+}^{j} and fjf_{-}^{j}) have disjoint supports, γj\gamma^{j} does not transports from IjI^{j} into itself, and so pj(Ij)=γj(Ij×Ij)=0p^{j}(I_{j})=\gamma^{j}(I_{j}\times I_{j})=0. There exist two constants λj,ϱj0\lambda_{j},\varrho_{j}\geq 0, such that (identifying z0=0z_{0}=0 and zN+1=Lz_{N+1}=L)

pj((z0,zj))=λj,pj((zj+1,zN+1))=ϱj.{p^{j}}((z_{0},z_{j}))=\lambda_{j}\,,\qquad{p^{j}}((z_{j+1},z_{N+1}))=\varrho_{j}\,.

These are the masses transported from IjI_{j} to its left and right, respectively.

We construct a new function fj+1Xf^{j+1}\in X with Z(fj)=Z(fj+1)Z(f^{j})=Z(f^{j+1}) as follows:555Since the nodal set Z(fj)Z(f^{j}) is independent of jj, we write zjz_{j} unambiguously.

fj+1(x)sign(f)(x){c,x(zj,zj+λjc),0,x(zj+λjc,zj+1ϱjc),c,x(zj+1ϱjc,zj)fj(x),otherwise.f^{j+1}(x)\equiv{\rm sign}(f)(x)\,\cdot\,\left\{\begin{array}[]{ll}c_{\infty}\,,&x\in\left(z_{j},z_{j}+\frac{\lambda_{j}}{c_{\infty}}\right)\,,\\ 0\,,&x\in\left(z_{j}+\frac{\lambda_{j}}{c_{\infty}},z_{j+1}-\frac{\varrho_{j}}{c_{\infty}}\right)\,,\\ c_{\infty}\,,&x\in\left(z_{j+1}-\frac{\varrho_{j}}{c_{\infty}},z_{j}\right)\\ f^{j}(x)\,,&{\rm otherwise}\,.\end{array}\right.

Clearly fj+1f^{j+1} satisfies conditions (i)–(ii) of Lemma 2 on the interval IjI_{j}, and by induction on the intervals I0,,Ij1I_{0},\ldots,I_{j-1} as well. Since the mass transported from IjI_{j} to the left is now concentrated on (zj,zj+λj/c)(z_{j},z_{j}+\lambda_{j}/c_{\infty}) as much as possible (with density =c=c_{\infty}), one can transport this mass to the left at a lower cost, by definition (4). The same holds for the transport out of IjI_{j} to the right. The transport from any other positive interval is defined identically to γj\gamma^{j}. The resulting γj+1\gamma^{j+1}, the optimal transport plan between f+j+1f^{j+1}_{+} and fj+1f^{j+1}_{-}, satisfies Kp(γj)Kp(γj+1)K_{p}(\gamma^{j})\geq K_{p}(\gamma^{j+1}) for any p1p\geq 1. This is because, under the new optimal transport plan, some of the mass is transported over a shorter distance, and no mass is transported over a longer distance.

Moreover, note that if fj+1fjf^{j+1}\neq f^{j}, i.e., if the construction really did change the function (and so also γjγj+1\gamma^{j}\neq\gamma^{j+1}), then a nonzero mass is now transported over a shorter distance, and so a strict inequality holds Kp(γj)>Kp(γj+1)K_{p}(\gamma^{j})>K_{p}(\gamma^{j+1}).

Finally, we set gfN+1g\equiv f^{N+1}. ∎

Lemma 2 implies that for any p1p\geq 1

minfX(c,c1,N,I)Wp(f+,f)minfXs(c,c1,N,I)Wp(f+,f).\min\limits_{f\in X(c_{\infty},c_{1},N,I)}W_{p}(f_{+},f_{-})\geq\min\limits_{f\in X_{s}(c_{\infty},c_{1},N,I)}W_{p}(f_{+},f_{-})\,.

And furthermore the minimum on the left hand side is attained only on Xs(c,c1,N,I)X_{s}(c_{\infty},c_{1},N,I). Hence, Question 1 reduces as follows:

Corollary 3.

For I=(0,L)I=(0,L), the two minimization problems

minfX(c,c1,N,I)Wp(f+,f)andminfXs(c,c1,N,I)Wp(f+,f),\min\limits_{f\in X(c_{\infty},c_{1},N,I)}W_{p}(f_{+},f_{-})\quad\text{and}\quad\min\limits_{f\in X_{s}(c_{\infty},c_{1},N,I)}W_{p}(f_{+},f_{-})\,,

have the same minimizers and the same minimum value for any p1p\geq 1.

Corollary 3 allows us consider the minimizers in Question 1 only from XsX_{s}, simply by shifting the mass without changing the nodal set itself. To further reduce the problem, we will now allow for non-local shifts: we will move mass across sub-intervals, which may also change the nodal set (but not its size). To ensure that these non-local shifts reduce the transport cost, we will make use of the monotonicity properties of (7).

Lemma 4.

Let I=(0,L)I=(0,L) and fXs(c,c1,N,I)f\in X_{s}(c_{\infty},c_{1},N,I). There exists a function gXsg\in X_{s} such that Wp(f+,f)Wp(g+,g)W_{p}(f_{+},f_{-})\geq W_{p}(g_{+},g_{-}), with the following property: denoting the optimal transport map (7) of gg by T=T[g]T=T[g], then TT only transport mass between adjacent intervals, i.e., T(Ij)Ij1Ij+1T(I_{j})\subseteq I_{j-1}\cup I_{j+1} for every Ij=(zj,zj+1)I_{j}=(z_{j},z_{j+1}) where {z1,,zN}=Z(g)\{z_{1},\ldots,z_{N}\}=Z(g), z0=0z_{0}=0, and zN+1=Lz_{N+1}=L.

Proof.

Suppose without loss of generality that ff is nonnegative on I0=(z0,z1)I_{0}=(z_{0},z_{1}). By definition, ff is characterized by 3N3N nonnegative numbers, {zi,li,ri}i=1N\{z_{i},l_{i},r_{i}\}_{i=1}^{N}, such that f=(1)i+1cf=(-1)^{i+1}c_{\infty} on (zili,zi)(z_{i}-l_{i},z_{i}), f=(1)icf=(-1)^{i}c_{\infty} on (zi,zi+ri)(z_{i},z_{i}+r_{i}), and f=0f=0 everywhere else,666Since fXsf\in X_{s}, behaviors such as the interval [F,G][F,G] in Figure 2 are excluded. see Fig. 4.

Refer to caption
Figure 4. A nodal point z2z_{2} and the supports of f+f_{+} and ff_{-} to its left and right, with z2l2z_{2}-l_{2} and z2+r2z_{2}+r_{2} annotated.

We will again construct a sequence of functions fjXs(c,c1,N)f^{j}\in X_{s}(c_{\infty},c_{1},N), now characterized by {zij,rij,lij}i=1N\{z_{i}^{j},r_{i}^{j},l_{i}^{j}\}_{i=1}^{N}, such that

  1. (i)

    Kp(Tj)Kp(Tj+1)K_{p}(T^{j})\geq K_{p}(T^{j+1}), where TjT^{j} is the monotone optimal transport map (7) associated with Wp(f+j,fj){W_{p}}(f^{j}_{+},f^{j}_{-}).

  2. (ii)

    Tj+1(Ik)Ik+1Ik1T^{j+1}(I_{k})\subseteq I_{k+1}\cup I_{k-1} for k=0,,jk=0,\ldots,j

Set f0=ff^{0}=f. Suppose without loss of generality that fjf^{j} is nonnegative on IjI_{j}. By Lemma 2, the mass in (zjj,zjj+rjj)(z_{j}^{j},z_{j}^{j}+r_{j}^{j}) is transported to the left, i.e., to I0,,Ij1I_{0},\ldots,I_{j-1}. By the induction assumption, in the jj-th step all of the mass in intervals with index less than jj is transported to adjacent intervals. Hence our inductive construction only needs to consider mass transported from IjI_{j} to the right.

Assume Tj=TT^{j}=T transports an interval E(zj+1lj+1j,zj+1)IjE\subseteq(z_{j+1}-l_{j+1}^{j},z_{j+1})\subseteq I_{j} to a non-adjacent sub-interval to the right of IjI_{j}, i.e., T(E)(zj+3j,L)T(E)\subseteq(z_{j+3}^{j},L) but T(E)Ij+1=T(E)\cap I_{j+1}=\emptyset.777Since fjf^{j} is non-negative on IjI_{j}, it is also non-negative on Ij+2I_{j+2}, and therefore the next-nearest interval on which ff is non-positive is Ij+3=(zj+3,zj+4)I_{j+3}=(z_{j+3},z_{j+4}).

For simplicity, suppose further that EE maps into a single interval, i.e., T(E)IiT(E)\subseteq I_{i} with i>j+2i>j+2; Otherwise, if TT only maps a part of EE into IiI_{i}, one can split EE into different sub-intervals each mapping into a distinct interval.

To construct fj+1f^{j+1}, we would like to perform a shift operation, that is, to shift T(E)T(E) into Ij+1I_{j+1}. To make this more precise, consider first the ideal situation where fj=0f^{j}=0 on some subset E~Ij+1\tilde{E}\subseteq I_{j+1} of equal length, i.e., |E~|=|E||\tilde{E}|=|E|. In loose terms, it means that there is space for EE to be shifted into Ij+1I_{j+1}, and so we would set

(11) fj+1(x)={c,xE~,0,xT(E)fj(x),otherwise.f^{j+1}(x)=\left\{\begin{array}[]{cc}-c_{\infty}\,,&x\in\tilde{E}\,,\\ 0\,,&x\in T(E)\\ f^{j}(x)\,,&{\rm otherwise}\,.\end{array}\right.

In particular, in this case Z(fj)=Z(fj+1).Z(f^{j})=Z(f^{j+1}).

The above construction, however, might no be possible; it might be that the interval Ij+1I_{j+1} is already full, by which we mean that if (zj+1j+rj+1j,zj+1jlj+2j)Ij+1(z_{j+1}^{j}+r_{j+1}^{j},z_{j+1}^{j}-l_{j+2}^{j})\subset I_{j+1}, the sub-interval on which f=0f=0 is shorter than T(E)T(E). Then the mass of T(E)T(E) cannot be shifted there (and fj+1f^{j+1} cannot be defined as in (11)), since ff is a step function of height ±c\pm c_{\infty}, and so cannot exceed this value and stay in Xs(c,c1,N)X_{s}(c_{\infty},c_{1},N). If this is the case, we need to push some or all of the points zj+2j,,zijz_{j+2}^{j},\ldots,z_{i}^{j} to the right by up to |E||E|, and then we can repeat the above construction as in (11), with the shifted intervals and nodal points.

The shift operation is depicted on Figure 5. Let us consider the effect of the shift operation on the overall transport cost Wp(f+j+1,fj+1W_{p}(f_{+}^{j+1},f_{-}^{j+1}):

  • The mass |E|c|E|c_{\infty}, which was previously transported between from EIjE\subseteq I_{j} to Tj(E)IiT^{j}(E)\subseteq I_{i}, is now transported over as shorter distance, to Tj+1(E)Ij+1T^{j+1}(E)\subseteq I_{j+1}.

  • Suppose that for >j\ell>j, the nodal endpoint zjz_{\ell}^{j} was pushed to the right. By the inductive construction, the transport from/to II_{\ell} could not have been from a point to the left of zjz_{j}. If it was transported to/from a point to the right of IiI_{i}, then the overall transport distance decreased.

  • Suppose that for >j\ell>j, the nodal endpoint zjz_{\ell}^{j} was pushed to the right. Suppose without loss of generality that Isupp(f+)I_{\ell}\subseteq{\rm supp}(f_{+}) (the negative case is analogous) and that for some DID\subseteq I_{\ell}, we have Tj(D)Imsupp(f)T^{j}(D)\subseteq I_{m}\subset{\rm supp}(f_{-}) with >m>j\ell>m>j; the proposed shift would then increase the transport distance, by potentially pushing DD away from T(D)T(D). This scenario, however, is impossible due to the monotonicity of TjT^{j}, see (7): take two points xEx\in E and xDx^{\prime}\in D, then x<xx<x^{\prime} but we T(x)>T(x)T(x)>T(x^{\prime}) (since i>)i>\ell), hence a contradiction.

We note here that our construction cannot change the order of points, i.e., zkj+1zk+1j+1z_{k}^{j+1}\leq z_{k+1}^{j+1}. Similarly, the construction keeps the nodal points inside II, i.e., z1j+10z_{1}^{j+1}\geq 0 and zNj+1Lz_{N}^{j+1}\leq L.

Refer to caption
Figure 5. The shift operation described in the proof of Lemma 4: on the jj-th step, some EIjE\subseteq I_{j} is mapped by TjT^{j} to a non-adjacent interval IiI_{i} right of EE. We shift that mass to Ij+1I_{j+1}. In the process, some or all of the nodal points in between, e.g., zjz_{\ell}^{j} with j+1<<ij+1<\ell<i, need to be pushed away (to the right) to “make room” for the shifted mass.

We established the first property in the induction, that Kp(Tj)Kp(Tj+1)K_{p}(T^{j})\geq K_{p}(T^{j+1}). Now, note that the new transport map we constructed is monotone and pushes f+j+1f_{+}^{j+1} to fj+1f_{-}^{j+1}. Hence, it is an optimal transport map between f+j+1f_{+}^{j+1} and fj+1f_{-}^{j+1} (the unique one for p>1p>1), Tj+1T^{j+1}. By construction, it satisfies the adjacency requirement, that Tj+1(Ik)Ik+1Ik1T^{j+1}(I_{k})\subseteq I_{k+1}\cup I_{k-1} for all kjk\leq j.

This completes the jj-th step. We take g=fNg=f^{N}, which completes the construction. Finally, note that unless f=gf=g, we strictly reduced the transport cost at some stage of the induction, and so Wp(f+,f)>Wp(g+,g)W_{p}(f_{+},f_{-})>W_{p}(g_{+},g_{-}). ∎

Remark.

The proof of Lemma 4 relies on the convexity of the cost function h(|xy|)=|xy|ph(|x-y|)=|x-y|^{p} with p[1,)p\in[1,\infty). When hh is convex, the monotone map (7) is a solution of Monge’s problem (6), see [47]. This is no longer the case if one considers a concave cost function, e.g., |xy|p|x-y|^{p} with p(0,1)p\in(0,1). For concave costs, it is known that the maps may not be monotone [29, 36], and a different type of analysis will be needed. Another interesting cost function not considered in this work is the LL^{\infty} cost [7, 18].

We conclude that fXs(c,c1,N,I)f\in X_{s}(c_{\infty},c_{1},N,I) can be a minimizer of the problem (9) for p1p\geq 1 if and only if it has the following structure:

  1. (i)

    Z(f)={z1,,zN}(0,L)Z(f)=\{z_{1},\ldots,z_{N}\}\subset(0,L).

  2. (ii)

    There exist d1,,dN>0d_{1},\ldots,d_{N}>0 such that zj+djzj+1dj+1z_{j}+d_{j}\leq z_{j+1}-d_{j+1} for all j=1,,Nj=1,\ldots,N, z1d10z_{1}-d_{1}\geq 0, and zN+dNLz_{N}+d_{N}\leq L.

  3. (iii)

    without loss of generality, assume that f0f\geq 0 on (z0,z1)(z_{0},z_{1}). Then for jj odd, g(x)=cg(x)=c_{\infty} on (zjdj,zj)(z_{j}-d_{j},z_{j}) and g(x)=cg(x)=-c_{\infty} on (zj,zj+dj)(z_{j},z_{j}+d_{j}), and vice versa for jj even.

In less formal language, a minimizing step-function attains its maximal value cc_{\infty} anti-symmetrically around each nodal point zjz_{j}, see Figure 1. The optimal transport plan γΓ(f+,f)\gamma\in\Gamma(f_{+},f_{-}), which is given by the monotone map (7), only transports across each nodal point, and Kp(γ)K_{p}(\gamma) is just the sum of costs accrued at each nodal point. The only questions that now remain concern the distribution of the width parameters d1,dNd_{1},\ldots d_{N}, and the overall minimal optimal transport cost.

Theorem 5.

Let I=(0,L)I=(0,L) with L>0L>0, p1p\geq 1, c>0c_{\infty}>0, c1(0,Lc]c_{1}\in(0,Lc_{\infty}], and N+N\in\mathbb{N}_{+}. The minimizers of the problem (9) are step functions, anti-symmetric about each nodal point, with value ±c\pm c_{\infty} and width c1/(2cN)c_{1}/(2c_{\infty}N). These minimizers satisfy

(12) minfX(c,c1,N,I)Wp(f+,f)=2p+1pc1Ncc11p.\min\limits_{f\in X(c_{\infty},c_{1},N,I)}W_{p}(f_{+},f_{-})=2^{-\frac{p+1}{p}}\frac{c_{1}}{Nc_{\infty}}c_{1}^{\frac{1}{p}}\,.
Proof.

By (8), the transport cost across a single nodal point zjz_{j} depends on the inverse CDFs.888Since Fμ(y)=μ(,y)F_{\mu}(y)=\mu(-\infty,y) might not be bijective, we define the inverse CDF by F1(x)inf{y|F(y)y}F^{-1}(x)\equiv\inf\{y\in\mathbb{R}~{}~{}|~{}~{}F(y)\geq y\}, see [47, Section 2.1]. By Lemmas 2 and 4, it suffices to consider step functions fXsf\in X_{s} where mass is transported only to adjacent sub-intervals. Suppose without loss of generality that f=cf=c_{\infty} on (z1d1,z1)(z_{1}-d_{1},z_{1}) and f=cf=-c_{\infty} on (z1,z1+d1)(z_{1},z_{1}+d_{1}). By definition, F+(z1)=F(z1+d1)=cd1F_{+}(z_{1})=F_{-}(z_{1}+d_{1})=c_{\infty}d_{1}. Hence Ff+1(cd1)Ff1(cd1)=d1F_{f_{+}}^{-1}(c_{\infty}d_{1})-F_{f_{-}}^{-1}(c_{\infty}d_{1})=d_{1}. On the interval (of cumulative probabilities) (0,cd1)(0,c_{\infty}d_{1}), the inverse CDFs are linear with slope 1/c1/c_{\infty}, and so for every t(0,cd1)t\in(0,c_{\infty}d_{1}) the difference between the inverse CDFs is constant, i.e., d1d_{1}. Hence,

(13) [0cd1|Ff+1(t)Ff1(t)|p𝑑t]1p=[cd1p+1]1p=c1pd11+1p.\left[\int\limits_{0}^{c_{\infty}d_{1}}|F_{f_{+}}^{-1}(t)-F_{f_{-}}^{-1}(t)|^{p}\,dt\right]^{\frac{1}{p}}=[c_{\infty}d_{1}^{p+1}]^{\frac{1}{p}}=c_{\infty}^{\frac{1}{p}}d_{1}^{1+\frac{1}{p}}\,.

Summing up the contributions of all nodal points, then by (8) we have that

(14a) Wpp(f+,f)=j=1Ncdjp+1.W_{p}^{p}(f_{+},f_{-})=\sum\limits_{j=1}^{N}c_{\infty}d_{j}^{p+1}\,.
For simplicity of computations, we will minimize Wpp(f+,f)W_{p}^{p}(f_{+},f_{-}) (which is equivalent to minimizing Wp(f+,f)W_{p}(f_{+},f_{-})). This is a constrained minimization problem, under the L1L^{1} constraint
(14b) j=1Ndj=c12c.\sum\limits_{j=1}^{N}d_{j}=\frac{c_{1}}{2c_{\infty}}\,.

By the method of Lagrange multipliers, let

=cjdjp+1λ(jdjc1/2c).\mathcal{L}=c_{\infty}\sum_{j}d_{j}^{p+1}-\lambda(\sum_{j}d_{j}-c_{1}/2c_{\infty})\,.

We get from the condition dj=0\partial_{d_{j}}\mathcal{L}=0 that

(15) λ=(p+1)cdjp,1jN.\lambda=(p+1)c_{\infty}d_{j}^{p}\,,\qquad 1\leq j\leq N\,.

The condition λ=0\partial_{\lambda}\mathcal{L}=0 yields

c12c=j=1Ndj=N[λ(p+1)c]1p,\frac{c_{1}}{2c_{\infty}}=\sum\limits_{j=1}^{N}d_{j}=N\left[\frac{\lambda}{(p+1)c_{\infty}}\right]^{\frac{1}{p}},

which leads to,

λ=(p+1)c(c12Nc)p.\lambda=(p+1)c_{\infty}\left(\frac{c_{1}}{2Nc_{\infty}}\right)^{p}\,.

Combining the above expressions of λ\lambda together, we get dj=c1/2Ncd_{j}=c_{1}/2Nc_{\infty}, and the overall cost is, by (14a),

Wpp(f+,f)=Nc(c12Nc)p+1=12p+1(c1Nc)pc1.W_{p}^{p}(f_{+},f_{-})=Nc_{\infty}\left(\frac{c_{1}}{2Nc_{\infty}}\right)^{p+1}=\frac{1}{2^{p+1}}\left(\frac{c_{1}}{Nc_{\infty}}\right)^{p}c_{1}\,.

As a consequence, we have

Corollary 6.

Let I=(0,L)I=(0,L) and p1p\geq 1. For any fL(I)f\in L^{\infty}(I) we have

(16) Wp(f+,f)|Z(f)|211p(f1f)f11p.W_{p}(f_{+},f_{-})\cdot|Z(f)|\geq 2^{-1-\frac{1}{p}}\left(\frac{\|f\|_{1}}{\|f\|_{\infty}}\right)\|f\|_{1}^{\frac{1}{p}}\,.

This inequality is sharp, where equality holds if and only if ff is a minimizer of (9) as described in Theorem 5.

We thus see that for the case of I=(0,L)I=(0,L), we have a sharp inequality (16) in a multiplicative form. The inequality shows no dependence on the length of LL, which can also be seen from the scaling properties of the quantities involves. Moreover, we are able to characterize an explicit constant factor which is also the best possible. Our result establishes that for p=1p=1, the inequality in [17, Prop. 3.1] is (16), and therefore it is indeed sharp. For p1p\geq 1, the scaling in [21, Corollary 3.3] of the lower bound is the same as here, 211/p2^{-1-1/p}, but the overall constant is lower since it is proven in more general settings. We may attribute the proof of the inequality (16) not only to properties of the special geometry in one dimension, but also the way the minimization problem (9) is posed; it is the solution to the latter that leads to, as a by-product, the most natural ”uncertainty principle” in a multiplicative form.

4. The Circle

We now consider the minimizers of W1(f+,f)W_{1}(f_{+},f_{-}) in X=X(c,c1,N,I)X=X(c_{\infty},c_{1},N,I) for the case where II is a circle. As we see from the earlier discussion on the case of an interval, by scaling of c1c_{1} we can restrict our attention to the unit circle I=S1I=S^{1}. The geodesic distance between two points eit,eisS1e^{it},e^{is}\in S^{1} is defined by

d(eit,eis)=min{(ts)mod(2π),(st)mod(2π)}.d(e^{it},e^{is})=\min\{(t-s)~{}{\rm mod}(2\pi),(s-t)~{}{\rm mod}(2\pi)\}\,.

For every zjZ(f)z_{j}\in Z(f) we denote zj=eisjz_{j}=e^{is_{j}} such that, without loss of generality

0=s1<s2<<sN<sN+1=2π,0=s_{1}<s_{2}<\cdots<s_{N}<s_{N+1}=2\pi\,,

i.e., the nodal points are ordered from the xx-axis in a counterclockwise direction.

In this section we show that, on a circle, the minimizers and minimal value of (9) are analogous to those on the interval (Theorem 5 and Corollary 6):

Theorem 7.

Let p1p\geq 1, c>0c_{\infty}>0, c1(0,2cπ]c_{1}\in(0,2c_{\infty}\pi], and N+N\in\mathbb{N}_{+}. The minimizers of Wp(f+,f)W_{p}(f_{+},f_{-}) over X(c,c1,N,S1)X(c_{\infty},c_{1},N,S^{1}) are step functions in Xs(c,c1,N,S1)X_{s}(c_{\infty},c_{1},N,S^{1}), anti-symmetric about each nodal point, with value ±c\pm c_{\infty} and width c1/2cNc_{1}/2c_{\infty}N. Hence

minfX(c,c1,N,S1)Wp(f+,f)=211pc1Ncc11p.\min\limits_{f\in X(c_{\infty},c_{1},N,S^{1})}W_{p}(f_{+},f_{-})=2^{-1-\frac{1}{p}}\frac{c_{1}}{Nc_{\infty}}c_{1}^{\frac{1}{p}}\,.
Proof.

Throughout this proof, we use the existence of an optimal transport map; for all orders p1p\geq 1 there exists such a map T:S1S1T:S^{1}\to S^{1}, as established for p>1p>1 in see [37] or [51, Theorem 2.47], and for p=1p=1 in [28] (see also [47, Section 3.1] and [14, 10] for general metric settings). A more elaborate analysis of optimal transport maps on the circle appears in [24].

First, we note that the proof of Lemma 2 carries to the circle without change: it is an iterative process done on ff in each interval (arch) {eis|s(sj,sj+1)}j=1N\{e^{is}~{}|~{}s\in(s_{j},s_{j+1})\}_{j=1}^{N}. Hence, we can restrict our attention to solving the optimal-transport minimization problem on Xs(c,c1,N,S1)X_{s}(c_{\infty},c_{1},N,S^{1}), see Definition 2.

Next, we turn to extend Lemma 4 to the circle, i.e., to show that a function fXsf\in X_{s} has a cheaper transport cost if the optimal transport map associated with it only transports mass between adjacent arcs. Here lies the main new challenge: we cannot simply implement our inductive “shifting” strategy from Lemma 4, since there are no end-points to the circle, and hence no natural candidate arc IjI_{j} to start the induction from. To this end, we prove the following lemma:

Lemma 8.

Let fXsf\in X_{s} and let TT be the optimal transport map associated with Wp(f+,f)W_{p}(f_{+},f_{-}) for a fixed p1p\geq 1. Then, there exists a partition S1=J1JKS^{1}=J_{1}\cup\cdots\cup J_{K} into disjoint arcs such that for each arc JkJ_{k}, either

  1. (1)

    All points xJksupp(f+)x\in J_{k}\cap{\rm supp}(f_{+}) are transported clockwise to T(x)JkT(x)\in J_{k}, or

  2. (2)

    all points xJksupp(f+)x\in J_{k}\cap{\rm supp}(f_{+}) are transported counterclockwise to T(x)JkT(x)\in J_{k}, or

  3. (3)

    f=0f=0 on JkJ_{k}.

To prove Lemma 8, we will rely on the existence of an optimal transport map (in the sense of Monge). For p>1p>1, recall the following theorem due to McCann [37] (for the Euclidean case, see [11]), presented here in a simplified form:

Theorem (McCann [37]).

Let Ω\Omega be a C3C^{3} connected, compact, Riemannian manifold without boundaries. Let p>1p>1 and consider two Borel probability measures μ\mu and ν\nu with finite pp-moments, such that μ\mu is absolutely continuous with respect to the volume measure of Ω\Omega. Then, with respect to the Wasserstein-pp distance (4), there exists an optimal transport map TT. Moreover, there exists a vector field VV such that

(17) T(x)=expx[V],T(x)=\exp_{x}[V]\,,

where exp\exp is the exponential map with respect to the geodesic distance.

Remark.

McCann’s theorem holds for the general class of optimal transport problems with respect to KcK_{c} (see (10)) for a strictly convex cost function c(x,y)=h(|xy|)c(x,y)=h(|x-y|). Furthermore, the vector-field VV is characterized in terms of the gradient of the Kantorovich potential, see [37, Theorem 13] for details.

The case of p=1p=1 is similar: the circle decomposes into a union of geodesics lines (arcs), known as “transport rays”, which intersect (potentially) only at their endpoints. On each such transport ray, the optimal transport map is monotone; see details in [47, Section 3.1] and [28].

Proof of Lemma 8.

First consider p>1p>1. Choose any xS1x\in S^{1}, and suppose first that y=T(x)y=T(x) is transported clockwise from xx, i.e., yy is clockwise to xx on the shorter arc between the two points. Then, any point wS1supp(f+)w\in S^{1}\cap{\rm supp}(f_{+}) lying on that arc is also transported clockwise, since the vector field VV points clockwise on that arc. Hence, the set of points xS1x\in S^{1} for which TT transports clockwise is a union of arcs, and so each JkJ_{k} is a connected component of that set. If y=T(x)y=T(x) is counterclockwise, the proof is analogous, and together these type of arcs cover supp(f+)supp(f){\rm supp}(f_{+})\cup{\rm supp}(f_{-}). The remaining points on S1S^{1} are those where f=0f=0, and by the extension of Lemma 2 to the circle, it too is covered by disjoint arcs. This completes the proof for p>1p>1.

For p=1p=1, the decomposition of the circle into transport rays, on each of which the optimal-transport map is monotone, yields an analogous proof. ∎

Proof of Theorem 7 - continued: Suppose first that J1S1J_{1}\neq S^{1}, i.e., it is not the case that all points xsupp(f+)x\in{\rm supp}(f_{+}) are transported clockwise (or counterclockwise). Then, on each arc JkJ_{k} we can apply the analysis from the case of the interval. Suppose points are transported clockwise on JkJ_{k}. Therefore the counterclockwise end of JkJ_{k} has to be in supp(f+){\rm supp}(f_{+}), and we can choose it as the starting point of the inductive process in Lemma 4.

Refer to caption
Figure 6. The “lagging” scenario as defined in the proof of Lemma 8: red intervals are in the support of f+f_{+}, Blue intervals are in the support of ff_{-}. The two dash-dotted lines are equator lines: They indicate that e.g., IjI_{j} would transport to IiI_{i} counterclockwise under TT.

If, on the contrary, J1=S1J_{1}=S^{1} assume that without loss of generality, all points xsupp(f+)x\in{\rm supp}(f_{+}) are transported counterclockwise. If each xsupp(f+)x\in{\rm supp}(f_{+}) is transported to the adjacent interval, the proof is completed. Assume otherwise, i.e., that we are in a “lagging” scenario as in Figure 6. We claim that this cannot be the optimal transport map for a minimizer of Wp(f+,f)W_{p}(f_{+},f_{-}). This can be shown by constructing gXsg\in X_{s} for which Wp(g+,g)<Wp(f+,f)W_{p}(g_{+},g_{-})<W_{p}(f_{+},f_{-}) as follows:

Let Z(f)={z1,zN}Z(f)=\{z_{1},\ldots z_{N}\} be the nodal points arranged in a counterclockwise order, with an arbitrary starting point. As before, denote by IjI_{j} the arcs between zjz_{j} and zj+1z_{j+1}, where INI_{N} is the arc between zNz_{N} and z1z_{1}. Assume without loss of generality that I1supp(f+)I_{1}\subseteq{\rm supp}(f_{+}), and set I1I_{1} for gg to be the same as for ff. Adjacent to I1I_{1} counterclockwise we set a negative arc I2I_{2} precisely of the length |I1||I_{1}|, and we define the new pushforward map S#g+=gS_{\#}g_{+}=g_{-} as the monotonic map from I1I_{1} to I2I_{2}. We do so iteratively - we position positive intervals precisely of the size they had for ff, and then a negative interval of the same size. In defining gg, the L1L^{1} and LL^{\infty} norms are unchanged, and so it the number of nodal points. While the map SS might not be the optimal transport map between g+g_{+} to gg_{-}, the overall transport distances have been reduced and so Wp(f+,f)=Kp(T)>Kp(S)>Wp(g+,g)W_{p}(f_{+},f_{-})=K_{p}(T)>K_{p}(S)>W_{p}(g_{+},g_{-}).

Hence, we have extended Lemma 4 to the circle, for all p1p\geq 1. Now, since the transport occurs only across nodal points to adjacent intervals, our Lagrange-multipliers analysis for the case of the interval applies, and we obtain the desired result. ∎

Remark.

The work of Delon, Salomon, and Sobolevskii [24] suggests an alternative route to prove Lemma 8 on the circle: “lifting” each measure on the circle to a periodic measure on the line, they study locally optimal transport maps between the “lifted” periodic measure. These measures are similar to Fμ1FνF_{\mu}^{-1}\circ F_{\nu} up to a shift, and therefore in particular, are monotonic. We do not pursue this strategy further in this work, and refer to [24] for details.

5. Star graphs

Given a positive integer DD, we define the star graph SD=SD(L1,,LD)S_{D}=S_{D}(L_{1},\ldots,L_{D}) as the quotient space of the disjoint union of DD intervals Ij=[0,Lj)I_{j}=[0,L_{j}) where Lj>0L_{j}>0 for 1,,D1,\ldots,D, under the equivalence relation 0Ij00_{I_{j}}\equiv 0. For ease of notation, denote by tjt_{j} the point tIjt\in I_{j} for every 1jD1\leq j\leq D and every t(0,Lj]t\in(0,L_{j}].

We will call the 0 point the vertex of the star. Definition 1 for Z(f)Z(f) extends to I=SDI=S_{D}. Consequently, the definitions of X(c,c1,N,I)X(c_{\infty},c_{1},N,I) (see (3)) and Xs(c,c1,N,I)X_{s}(c_{\infty},c_{1},N,I) (see Definition 2) naturally extend to the case of star graph I=SDI=S_{D} as well. The distance between any two points xIjx\in I_{j} and yIky\in I_{k} is |xy||x-y| if j=kj=k, and x+yx+y otherwise, i.e., if the two points are on different edges of the star, the geodesic distance between them is the length of the path going from xx to yy through the vertex 0.

Refer to caption
Figure 7. Left and Center stars have 33 and 44 long edges, respectively. But in a star graph, the lengths of the edges matter: as the length L30+L_{3}\to 0^{+}, S4S_{4} is deformed into S3S_{3} (Right). This metric structure in turn manifests itself in the minimization problem and in its solution, see Theorem 13.

Stars are a class of spaces where we might expect the optimal dependence between Wp(f+,f)W_{p}(f_{+},f_{-}) and the number of nodal points NN to be non-multiplicative, for the following reason: In [17, 20], the sharp (up to a constant) uncertainty quantification (2) was derived for metric spaces which are essentially non-branching. Intuitively, this means that if 1(t),2(t):[0,1]Ω\ell_{1}(t),\ell_{2}(t):[0,1]\to\Omega are two “generic” geodesic lines of unit length, and 1(t)=2(t)\ell_{1}(t)=\ell_{2}(t) on an open subset of [0,1][0,1], then 1=2\ell_{1}=\ell_{2} everywhere; see [44] for details. Star graphs, however, are certainly spaces with branching (and so are trees in general). Hence, in search for new types of dependencies between minWp(f+,f)\min W_{p}(f_{+},f_{-}) and NN, stars are excellent candidates over which to study the minimization problem stated in Question 1.

The technical difficulty is that, on the star, we do not have explicit optimal maps such as (7) for the interval, nor do we even expect the existence of optimal transport maps, i.e., we expect a solution to the Kontorovich problem (4), but not Monge’s (6). More broadly, stars are an example of metric graphs, on which the study of optimal transport is at a relatively early stage [26, 35].

Main results: The key element of our analysis is that it is always “useful”, in the sense of minimizing Wp(f+,f)W_{p}(f_{+},f_{-}), to position one of the nodal points at the vertex of the star. We approach Question 1 on stars by establishing a correspondence between transport over a star-vertex and transport on the real line (Lemma 9). This equivalence allows us to reduce the minimization problem to a finite-dimensional constrained optimization problem (Theorem 13).

From there, the optimization problem bifurcates into several different cases, depending on both the topology and the lengths of the star’s edges. In Sections 5.25.5, we work the details of the following cases:

  • For an even number of sufficiently long edges, the vertex 0SD0\in S_{D} is equivalent to D/2D/2 nodal points. Hence, we have a multiplicative uncertainty principle of the type (presented here for simplicity with p=1p=1)

    W1(g+,g)c124c1N~,N~=N1+D2,W_{1}(g_{+},g_{-})\geq\frac{c_{1}^{2}}{4c_{\infty}}\frac{1}{\tilde{N}}\,,\qquad\tilde{N}=N-1+\frac{D}{2}\,,

    see Section 5.3.

  • For an odd number of sufficiently long edges, the main complication is that there is an imbalance between the number of positive and negative edges around 0. Nevertheless, we get the same type of inequality, only now with (Section 5.4)

    N~N1+D~,D~(D+1)(D1)2D.\tilde{N}\equiv N-1+\tilde{D}\,,\qquad\tilde{D}\equiv\frac{(D+1)(D-1)}{2D}\,.
  • When one of the edges is short, no N~\tilde{N} or D~\tilde{D} type inequalities emerge, but the lower bounds we obtain “interpolate” between the case of a star with D1D-1 long edges (and a “degenerate” DD-th edge with length zero) to the case of DD long edges (Section 5.5).

In summary, for stars with DD long edges, the vertex 0SD0\in S_{D} is effectively equivalent to D/2D/2 or D~=(D+1)(D1)2D\tilde{D}=\frac{(D+1)(D-1)}{2D} nodal points on the line, depending on whether DD is even or odd, respectively. Finally, while we can interpolate between DD to D1D-1 edges by shortening/lengthening the edges, the general case of a star does not seem to admit such a clean result; Indeed, an “uncertainty principle”-type lower bounds on W1(f+,f)|Z(f)|W_{1}(f_{+},f_{-})\cdot|Z(f)|, such as (16), breaks even in relatively simple metric graphs, thus demonstrating that the non-branching property used in [17] is indeed necessary.

5.1. Stars – the general framework

Our strategy to prove the main result, Theorem 13, consists of two parts. In Lemma 9, we analyze the optimal transport problem in the case of a single nodal point on the star vertex. Lemma 11 generalizes Lemma 4 on the optimality of transfer to adjacent sub-interval to the case of the star graph. The main technical difficulty here is that we cannot assume the existence of optimal transport maps (in the sense of Monge’s problem (6), as opposed to Kantorovich (4)), on graphs. Indeed Lemma 9 shows that already in simple settings of Z(f)={0}Z(f)=\{0\} the optimal transport plan will not be induced by a map. Moreover, we cannot expect to have monotonicity in the strict sense, due to the geometry of the graph. Nonetheless, Lemma 12 shows that the optimal transport plans satisfy a sufficient monotonicity-like property.

We begin with functions satisfying Z(f)={0}Z(f)=\{0\}.

Lemma 9.

Let D>M1D>M\geq 1 be integers and consider fX(c,c1,1,SD)f\in X(c_{\infty},c_{1},1,S_{D}) where for every xjIjSDx_{j}\in I_{j}\subseteq S_{D},

(18) f(xj){0,1jM,0,M+1jD,.f(x_{j})\left\{\begin{array}[]{ll}\geq 0\,,&1\leq j\leq M\,,\\ \leq 0\,,&M+1\leq j\leq D\,,\end{array}\right..

Define g:g:\mathbb{R}\to\mathbb{R} as999Here we identify xx\in\mathbb{R} with the point xjIjx_{j}\in I_{j} with the exact same value. This is unambiguous since each fjf_{j} is only defined on the respective edge IjI_{j}.

(19) g(x)=g(x;f)=g+(x)+g(x)j=1Mfj(x)+j=M+1Dfj(x)g(x)=g(x;f)=g_{+}(x)+g_{-}(x)\equiv\sum\limits_{j=1}^{M}{f_{j}(x)}+\sum\limits_{j=M+1}^{D}{f_{j}(-x)}

such that g+g_{+} is supported on (0,maxj=1,,MLj)(0,\max_{j=1,\ldots,M}L_{j}) and gg_{-} is supported on (maxj=M+1,,DLj,0)(-\max_{j=M+1,\ldots,D}L_{j},0). Then there is a surjective correspondence Φ:Γ(f+,f)Γ(g+,g)\Phi:\Gamma(f_{+},f_{-})\to\Gamma(g_{+},g_{-}) such that Kp(γ)=Kp(Φ[γ])K_{p}(\gamma)=K_{p}(\Phi[{\gamma}]) for every p1p\geq 1 and γΓ(f+,f)\gamma\in\Gamma(f_{+},f_{-}). Hence,

Wp(g+,g)=Wp(f+,f).W_{p}(g_{+},g_{-})=W_{p}(f_{+},f_{-})\,.
Proof.

For for every γΓ(f+,f)\gamma\in\Gamma(f_{+},f_{-}) and any two measurable sets E+,E(0,)E_{+},E_{-}\subseteq(0,\infty), define

(20) Φ[γ](E+,E)j=1Mi=M+1Dγ(E+Ij,EIi),\Phi[\gamma](E_{+},-E_{-})\equiv\sum\limits_{j=1}^{M}\sum\limits_{i=M+1}^{D}\gamma(E_{+}\cap I_{j},-E_{-}\cap-I_{i})\,,

where for every A(0,)A\subseteq(0,\infty) we define A{x|xA}-A\equiv\{-x~{}~{}|~{}~{}x\in A\}. Φ[γ]\Phi[\gamma] is a Borel measure on ×\mathbb{R}\times\mathbb{R} with marginals g+g_{+} and gg_{-}, simply by additivity. That Kp(γ)=Kp(Φ[γ])K_{p}(\gamma)=~{}K_{p}(\Phi[\gamma]) follows from definition (4): the same amount of mass travels the same distance in both plans. To summarize, we have shown the inclusion

{Kp(γ)|γΓ(f+,f)}={Kp(Φ[γ])|γΓ(f+,f)}{Kp(η)|ηΓ(g+,g)}.\{K_{p}(\gamma)~{}~{}|\gamma\in\Gamma(f_{+},f_{-})\}=\{K_{p}(\Phi[\gamma])~{}~{}|\gamma\in\Gamma(f_{+},f_{-})\}\subseteq\{K_{p}(\eta)~{}~{}|~{}~{}\eta\in\Gamma(g_{+},g_{-})\}\,.

To conclude that Wp(f+,f)=Wp(g+,g)W_{p}(f_{+},f_{-})=W_{p}(g_{+},g_{-}) we need to prove inclusion in the other direction, i.e., to find for any ηΓ(g+,g)\eta\in\Gamma(g_{+},g_{-}) a coupling γηΓ(f+,f)\gamma_{\eta}\in\Gamma(f_{+},f_{-}) such that Φ[γη]=η\Phi[\gamma_{\eta}]=\eta. We define γη\gamma_{\eta} in a symmetric manner: for any two measurable sets A,B(0,)A,B\subseteq(0,\infty) and 1i,jD1\leq i,j\leq D,

γη(AIi,BIj)η(A,B)fiL1(AIi)g+L1(A)fjL1(BIj)gL1(B),\gamma_{\eta}(A\cap I_{i},B\cap I_{j})\equiv\eta(A,-B)\frac{\|f_{i}\|_{L^{1}(A\cap I_{i})}}{\|g_{+}\|_{L^{1}(A)}}\frac{\|f_{j}\|_{L^{1}(B\cap I_{j})}}{\|g_{-}\|_{L^{1}(-B})}\,,

and for any general sets 𝒜,SD\mathcal{A},\mathcal{B}\subseteq S_{D}, then

γη(𝒜,)i=1Dj=1Dγη(𝒜Ii,Ij).\gamma_{\eta}(\mathcal{A},\mathcal{B})\equiv\sum\limits_{i=1}^{D}\sum\limits_{j=1}^{D}\gamma_{\eta}(\mathcal{A}\cap I_{i},\mathcal{B}\cap I_{j})\,.

Again, by definition γη\gamma_{\eta} is a Borel measure on SD×SDS_{D}\times S_{D}. To verify that γηΓ(f+,f)\gamma_{\eta}\in\Gamma(f_{+},f_{-}), we compute its marginals. Let (x,y)suppγη(x,y)\in{\rm supp}\gamma_{\eta}. Necessarily yy is an element in one of the negative intervals IM+1,IDI_{M+1},\ldots I_{D}, then for any measurable set A(0,)A\subseteq(0,\infty)

γη(AIi,SD)\displaystyle\gamma_{\eta}(A\cap I_{i},S_{D}) =γη(AIi,j=M+1DIj)\displaystyle=\gamma_{\eta}\left(A\cap I_{i},\bigcup\limits_{j=M+1}^{D}I_{j}\right)
=j=M+1Dγη(AIi,Ij)\displaystyle=\sum\limits_{j=M+1}^{D}\gamma_{\eta}(A\cap I_{i},I_{j})
=η(A,(,0))fiL1(AAi)g+L1(A)j=M+1DfjL1(Ij)gL1(,0)\displaystyle=\eta(A,(-\infty,0))\frac{\|f_{i}\|_{L^{1}(A\cap A_{i})}}{\|g_{+}\|_{L^{1}(A)}}\sum\limits_{j=M+1}^{D}\frac{\|f_{j}\|_{L^{1}(I_{j})}}{\|g_{-}\|_{L^{1}(-\infty,0)}}
=η(A,(,0))fiL1(AIi)g+L1(A)1=fiL1(AIi),\displaystyle=\eta(A,(-\infty,0))\frac{\|f_{i}\|_{L^{1}(A\cap I_{i})}}{\|g_{+}\|_{L^{1}(A)}}\cdot 1=\|f_{i}\|_{L^{1}(A\cap I_{i})}\,,

where at the last passage we used the fact that g+g_{+} is the density of the second-coordinate marginal of any ηΓ(g+,g)\eta\in\Gamma(g_{+},g_{-}), i.e., that η(A,(,0))=g+L1(A)\eta(A,(-\infty,0))=\|g_{+}\|_{L^{1}(A)}. The calculation of the second-coordinate marginals of γη\gamma_{\eta} is analogous.

Finally, to prove that Φ[γη]=η\Phi[\gamma_{\eta}]=\eta, consider a pair of open sets E+,E(0,)E_{+},E_{-}\subseteq(0,\infty). By (20)

Φ[γη](E+,E)\displaystyle\Phi[\gamma_{\eta}](E_{+},-E_{-}) =j=1Mi=M+1Dη(E+,E)fiL1(E+Ii)g+L1(E+)fjL1(EIj)gL1(E)\displaystyle=\sum\limits_{j=1}^{M}\sum\limits_{i=M+1}^{D}\eta(E_{+},-E_{-})\frac{\|f_{i}\|_{L^{1}(E_{+}\cap I_{i})}}{\|g_{+}\|_{L^{1}(E_{+})}}\frac{\|f_{j}\|_{L^{1}(E_{-}\cap I_{j})}}{\|g_{-}\|_{L^{1}(-E_{-})}}
=η(E+,E)g+L1(E+)gL1(E)j=1MfjL1(EIj)i=M+1DfiL1(E+Ii)\displaystyle=\frac{\eta(E_{+},-E_{-})}{\|g_{+}\|_{L^{1}(E_{+})}\cdot\|g_{-}\|_{L^{1}(-E_{-}})}\sum\limits_{j=1}^{M}\|f_{j}\|_{L^{1}(E_{-}\cap I_{j})}\sum\limits_{i=M+1}^{D}\|f_{i}\|_{L^{1}(E_{+}\cap I_{i})}
=η(E+,E)g+L1(E+)gL1(E)f+L1(E+)fL1(E)\displaystyle=\frac{\eta(E_{+},-E_{-})}{\|g_{+}\|_{L^{1}(E_{+})}\cdot\|g_{-}\|_{L^{1}(-E_{-})}}\|f_{+}\|_{L^{1}(E_{+})}\|f_{-}\|_{L^{1}(-E_{-})}
=η(E+,E).\displaystyle=\eta(E_{+},-E_{-})\,.

To solve the minimization problem on SDS_{D}, we first state an extension of Lemma 2 to star graphs, where the proof is completely identical to that of Lemma 2:

Lemma 10.

For every fX=X(c,c1,N,SD)f\in X=X(c_{\infty},c_{1},N,S_{D}), there exists a function hXs(c,c1,N,SD)h\in X_{s}(c_{\infty},c_{1},N,S_{D}) such that

  1. (i)

    Z(f)=Z(h)Z(f)=Z(h)

  2. (ii)

    For any sub-interval JJ between to adjacent nodal points, i.e., J=(z,z)J=(z,z^{\prime}) where z,zZ(f)z,z^{\prime}\in Z(f) and (z,z)Z(f)=(z,z^{\prime})\cap Z(f)=\emptyset, then h0h\geq 0 on JJ if and only if f0f\geq 0 there, and

    (21) Jh(x)𝑑x=Jf(x)𝑑x.\int_{J}h(x)\,dx=\int_{J}f(x)\,dx\,.

    If 0Z(f)0\not\in Z(f), (21) also holds when JJ is a maximal star-like subgraph for which 0J0\in J but Z(f)Z(f) is disjoint from the interior of JJ.

  3. (iii)

    Wp(f+,f)Wp(h+,h)W_{p}(f_{+},f_{-})\geq W_{p}(h_{+},h_{-}) for any p1p\geq 1 with equality possible only if fXs(c,c1,N,SD)f\in X_{s}(c_{\infty},c_{1},N,S_{D}) as well.

We now proceed to prove an analog of Lemma 4 for star graphs; the minimizers of Wp(f+,f)W_{p}(f_{+},f_{-}) in Xs(c,c1,N,SD)X_{s}(c_{\infty},c_{1},N,S_{D}) are those where mass is transported only between adjacent sub-intervals.

Lemma 11.

Let fXs(c,c1,N,SD)f\in X_{s}(c_{\infty},c_{1},N,S_{D}). There exists gXs(c,c1,N,SD)g\in X_{s}(c_{\infty},c_{1},N,S_{D}) such that Wp(f+,f)Wp(g+,g)W_{p}(f_{+},f_{-})\geq W_{p}(g_{+},g_{-}) for every p1p\geq 1, with the following property:

If γΓ(g+,g)\gamma\in\Gamma(g_{+},g_{-}) is an optimal transport plan, then γ\gamma only transport mass between adjacent intervals. By this, we mean that if JJ and JJ^{\prime} are two closed sub-intervals between two adjacent nodal points (or the smallest star-like subgraph between the nodal points closest to 0), then γ(J×J)0\gamma(J\times J^{\prime})\neq 0 only if JJ\partial J\cap\partial J^{\prime}\neq\emptyset.

The new challenge in proving Lemma 11 is the absence of a monotonicity property. On the interval, the explicitly optimal transport map (7) is monotone. On the circle, monotonicity is a consequence of the exponential map form of (17) (or of the more specific construction in [24]). On star graphs, while we cannot hope for monotonicity in the usual sense, we show that a similar property (Lemma 12) is sufficient to demonstrate that the shift operations described in Lemma 4 can be implemented on SDS_{D} as well (Lemma 11).

Proof of Lemma 11.

We define the following disjoint partition:

SD=Ji=1Dj=1n(i)Ji,j,S_{D}=J^{*}\cup~{}\bigcup\limits_{i=1}^{D}\bigcup\limits_{j=1}^{n(i)}J_{i,j}\,,

where on each interval IiI_{i} that comprises the star graph SDS_{D}, we denote the sub-intervals between adjacent nodal points by Ji,1,,Ji,niIiJ_{i,1},\ldots,J_{i,n_{i}}\subset I_{i} where ni0n_{i}\geq 0 for each 1id1\leq i\leq d, and where the subintervals are ordered by decreasing distances from 0. Finally, let JJ^{*} be the smallest star-like graph between the closest nodal point to 0. If 0Z(f)0\in Z(f), then J=J^{*}=\emptyset. For ease of notation, it is useful to denote J=Ji,n(i)+1J^{*}=J_{i,n(i)+1} for all 1iD1\leq i\leq D.

We repeat the inductive construction of Lemma 4 (see Figure 8 for illustration): for each jj, we iterate over all 1iD1\leq i\leq D for which jn(i)j\leq n(i). Suppose without loss of generality that Ji,jsupp(f+)J_{i,j}\cap{\rm supp}(f_{+})\neq\emptyset and that for some set of nonzero measure EJi,jsupp(f+)E\subseteq J_{i,j}\cap{\rm supp}(f_{+}) we have that γ(E,)\gamma(E,\cdot) is supported on non-adjacent intervals, i.e.,

cγ(E,SD(Ji,j1Ji,j+1))>0.c\equiv\gamma\left(E,S_{D}\setminus\left(J_{i,j-1}\cup J_{i,j+1}\right)\right)>0\,.

For each non-adjacent interval where γ(E,)\gamma(E,\cdot) is supported, we will shift that exact same mass to either Ji,j1J_{i,j-1} or Ji.j+1J_{i.j+1}, depending on the relative order between the intervals.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 8. The shifts of Lemma 11 on S3S_{3}. (A). EE is being (partially) transported into UU under the optimal transport plan γ\gamma. (B). After the shift, the relevant mass of UU is shifted to the interval adjacent to EE, where mass on some of the intervals between EE and where UU previously was has been shifted to the right. (C). If x1Ex_{1}\in E, y3Uy_{3}\in U, i.e., (x1,y3)supp(γ)(x_{1},y_{3})\in{\rm supp}(\gamma), then this “push” could be harmful if (x2,y3)supp(γ)(x^{\prime}_{2},y^{\prime}_{3})\in{\rm supp}(\gamma), since then y3y^{\prime}_{3} could be pushed away from x2x^{\prime}_{2}. We show that this scenario is forbidden when γ\gamma is an optimal transport plan with respect to the cost function h(xy)=|xy|ph(x-y)=|x-y|^{p} for p1p\geq 1.

As in Lemma 4, these shifts might require to push away from Ji,jJ_{i,j} some nodal points to create room for the negative mass. See Figure 8 for an illustration. Suppose then that some interval of length AA was shifted from Ji,jJ_{i^{\prime},j^{\prime}} to Ji,j±1J_{i,j\pm 1}, and that all or some of the mass between them has been pushed away from Ji,j±1J_{i,j\pm 1}. To exclude these scenarios, we state a monotonicity lemma (in the spirit of [47, Theorem 2.9]):

Lemma 12.

Let γ\gamma be an optimal transport plan with respect to WpW_{p} with p1p\geq 1, let 1i,j,k,lD1\leq i,j,k,l\leq D, let x,x,y,y>0x,x^{\prime},y,y^{\prime}>0, and let (xi,yk),(xj,yl)supp(γ)(x_{i},y_{k}),(x^{\prime}_{j},y^{\prime}_{l})\in{\rm supp}(\gamma). Then the following scenarios are impossible:

  1. (1)

    i,j,ki,j,k are distinct, k=lk=l, x<xx^{\prime}<x, and y<yy^{\prime}<y.

  2. (2)

    j,k,lj,k,l are distinct, i=ji=j, x<xx^{\prime}<x, and y<yy^{\prime}<y.

  3. (3)

    i=ji=j, x<xx^{\prime}<x, and y<yy^{\prime}<y (it may or may not be that i=ki=k, i.e., (xi,yk)(x_{i},y_{k}) may or may not be part of the same edge).

Here, as before, for every 1iD1\leq i\leq D and every x(0,Li)x\in(0,L_{i}), we denote by xix_{i} the point on IiI_{i} whose distance from the vertex 0 is xx. The same notation applies to yky_{k}, xjx^{\prime}_{j}, and yly^{\prime}_{l}.

The proof of Lemma 12 is given after we complete the proof of Lemma 11.

Lemma 12 is a star-graph analog of the monotonicity property (7) of optimal transport on the line: item (3) is the most straightforward, since it implies that all four points lie on the same linear path. Item (1) (and analogously (2)) is depicted in Figure 8(C) with ii, jj, and kk being distinct, k=lk=l, i.e., xix_{i} and xjx^{\prime}_{j} do not lie on the same edges, and yky^{\prime}_{k} is on the path between xix_{i} and yky_{k}.

With the help of Lemma 12, we now make the following claim: if a set with nonzero mass between Ji,jJ_{i,j} and Ji,jJ_{i^{\prime},j^{\prime}} was pushed away from Ji,jJ_{i,j}, then it cannot increase the overall transport cost.

As in the case of the interval or the circle, proving this claim would show that our inductive construction decreases the transport cost, thus proving the lemma.

To proceed with the proof of the above claim, we note that such a “push” away of a set DD from Ji,jJ_{i,j} (towards the vertex 0) could increase the overall transport cost in three scenarios: either (i) the optimal-transport plan γ\gamma couples DD to a set further away from the vertex on IiI_{i}, i.e., in Ji,1,,Ji,j1J_{i,1},\ldots,J_{i,j-1}, or, (ii) γ\gamma couples DD to an interval between IiI_{i} and DD (iii) γ\gamma couples DD to a third interval, i.e., not IiI_{i} and not IiI_{i^{\prime}}. We will now rule out both all these three scenarios.

First, (i) is impossible since there cannot be a point xiJi,1Ji,j2x^{\prime}_{i}\in J_{i,1}\cup\ldots\cup J_{i,j-2} which was previously transported into a pushed interval, due to the inductive construction. The only other way for the transport distance to increase with the shift is if yky^{\prime}_{k} is on the path between xix_{i} and yky_{k} and that (xi,yk),(xj,yk)supp(γ)(x_{i},y_{k}),(x^{\prime}_{j},y^{\prime}_{k})\in{\rm supp}(\gamma) with yyy^{\prime}\neq y. Lemma 12, item (3) similarly rules out scenario (ii). We are left with scenario (iii), depicted in Figure 8(c). We remark that it is really this scenario that distinguishes the star graph from the interval.

Here, iji\neq j, and so Lemma 12, items (1)–(2), imply that |x||x||x^{\prime}|\geq|x|. Hence, up to relabeling of the edges, we have that (x1,y3),(x2,y3)spt(γ)(x_{1},y_{3}),(x^{\prime}_{2},y^{\prime}_{3})\in{\rm spt}(\gamma) with |x||x||x^{\prime}|\geq|x|, and |y|>|y||y|>|y^{\prime}|. By a slight abuse of notation, let xx and xx^{\prime} be the infimum of all values such that x1x_{1} and x2x^{\prime}_{2} satisfy these hypotheses, i.e., the closest to the vertex 0.

To rule this scenario out, let us now define an auxiliary function, f¯\bar{f}, which exchanges the values of ff on (0,x)(0,x) between I1I_{1} and I2I_{2}. Formally, for every j=1,Dj=1,\ldots D and every tjIjt_{j}\in I_{j}

f¯(tj)={f(t2)ifj=1andt(0,x)f(t1)ifj=2andt(0,x)f(tj)otherwise\bar{f}(t_{j})=\left\{\begin{array}[]{ccc}f(t_{2})&{\rm if}~{}j=1&{\rm and}~{}t\in(0,x)\\ f(t_{1})&{\rm if}~{}j=2&{\rm and}~{}t\in(0,x)\\ f(t_{j})&{\rm otherwise}&\end{array}\right.

Let us also define γ¯\bar{\gamma} to be the coupling between f+¯\bar{f_{+}} and f¯\bar{f_{-}} which is identical to γ\gamma, with the corresponding changes. We now proceed to make one helpful observation that γ¯\bar{\gamma} is an optimal transport plan between f¯+\bar{f}_{+} and f¯\bar{f}_{-}.

Indeed, since x1x_{1} is transported to I3I_{3}, then no other point on (0,x1)(0,x_{1}) is transported to (x1,L1)(x_{1},L_{1}). This would contradict monotonicity on the interval I1I_{1} (see item (3) of Lemma 12). The analogous statement holds for x2x^{\prime}_{2} as well. Hence, the “exchange” of intervals did not change the distances along which mass is transported, and so Kp(γ)=Kp(γ¯)K_{p}(\gamma)=K_{p}(\bar{\gamma}), and so Wp(f+,f)=Kp(γ)Wp(f¯+,f¯)W_{p}(f_{+},f_{-})=K_{p}(\gamma)\geq W_{p}(\bar{f}_{+},\bar{f_{-}}).

We now want to show an inequality in the other direction. Let γ\gamma_{*} be an optimal transport plan for f¯\bar{f}, then γ¯\bar{\gamma_{*}} is a coupling of f+f_{+} and ff_{-}, and so

Wp(f+,f)=Kp(γ)=Kp(γ¯)Wp(f¯+,f¯)=Kp(γ)=Kp(γ¯)Wp(f+,f).W_{p}(f_{+},f_{-})=K_{p}(\gamma)=K_{p}(\bar{\gamma})\geq W_{p}(\bar{f}_{+},\bar{f}_{-})=K_{p}(\gamma_{*})=K_{p}(\bar{\gamma}_{*})\geq W_{p}(f_{+},f_{-})\,.

Hence we have equalities throughout, and in particular Kp(γ¯)=Wp(f¯+,f¯)K_{p}(\bar{\gamma})=W_{p}(\bar{f}_{+},\bar{f_{-}}), and so γ¯\bar{\gamma} is an optimal transport plan from f¯+\bar{f}_{+} to f¯\bar{f}_{-}.

By construction (x2,y3)spt(γ¯)(x^{\prime}_{2},y^{\prime}_{3})\in{\rm spt}(\bar{\gamma}) (as it were with γ\gamma), and due to the “exchange” now we have that (x2,y3)spt(γ¯)(x_{2},y_{3})\in{\rm spt}(\bar{\gamma}). This violates item (3) in Lemma 12.

Thus we arrive at a contradiction. Hence x=xx^{\prime}=x. But now consider the situation where (x1.y3),(x2,y3)spt(γ)(x_{1}.y_{3}),(x_{2},y^{\prime}_{3})\in{\rm spt}(\gamma), with y>yy>y^{\prime}. By the exact same argument with the roles of yy and xx interchanged, we arrive at a contradiction again. We have ruled out scenario (iii), as depicted in Fig. 8(c).

To summarize, the proposed shift cannot increase the transport cost, and therefore Lemma 11 is proven.

Proof of Lemma 12.

Consider first the case of p>1p>1, where c(x,y)=h(xy)=|xy|pc(x,y)=h(x-y)=|x-y|^{p} is strictly convex. First, recall that the support of γ\gamma is cc-cyclically monotone [47, Theorem 1.38], i.e.,

(22) h(xiyk)+h(xkyl)h(xiyl)+h(xjyk).h(x_{i}-y_{k})+h(x_{k}^{\prime}-y_{l}^{\prime})\leq h(x_{i}-y_{l}^{\prime})+h(x_{j}^{\prime}-y_{k})\,.

We will show in detail how the first of the three scenarios is impossible, as the others follow similarly.

Assume without loss of generality that x1I1x_{1}\in I_{1}, y3,y3I3y_{3}^{\prime},y_{3}\in I_{3}, and x2I2x^{\prime}_{2}\in I_{2}, as in Figure 8(c). Denote a=|y||y|>0a=|y|-|y^{\prime}|>0. Then (22) reads as

(23) h(|x|+|y|+a)+h(|x|+|y|)h(|x|+|y|)+h(|x|+|y|+a).h(|x|+|y^{\prime}|+a)+h(|x^{\prime}|+|y^{\prime}|)\leq h(|x|+|y^{\prime}|)+h(|x^{\prime}|+|y^{\prime}|+a)\,.

Suppose |x||x||x^{\prime}|\leq|x| first. Then

|x|+|y|<|x|+|y|+a|x|+|y|+a,|x^{\prime}|+|y^{\prime}|<|x^{\prime}|+|y^{\prime}|+a\leq|x|+|y^{\prime}|+a\,,
|x|+|y||x|+|y|<|x|+|y|+a.|x^{\prime}|+|y^{\prime}|\leq|x|+|y^{\prime}|<|x|+|y^{\prime}|+a\,.

We can express the “sandwiched” numbers, |x|+|y||x|+|y^{\prime}| and |x|+|y|+a|x^{\prime}|+|y^{\prime}|+a, as linear interpolation between the endpoints α=|x|+|y|\alpha=|x^{\prime}|+|y^{\prime}| and β=|x|+|y|+a\beta=|x|+|y^{\prime}|+a,

|x|+|y|+a=tα+(1t)β,|x|+|y|=tβ+(1t)α,|x^{\prime}|+|y^{\prime}|+a=t\alpha+(1-t)\beta\,,\qquad|x|+|y^{\prime}|=t\beta+(1-t)\alpha\,,

where

taa+|x||x|(0,1).t\equiv\frac{a}{a+|x|-|x^{\prime}|}\in(0,1)\,.

However, hh is strictly convex for p>1p>1, and so combined with (23), then

h(α)+h(β)\displaystyle h(\alpha)+h(\beta) h(tα+(1t)β)+h(tβ+(1t)α)\displaystyle\leq h(t\alpha+(1-t)\beta)+h(t\beta+(1-t)\alpha)
<th(α)+(1t)h(β)+th(β)+(1t)h(α)\displaystyle<th(\alpha)+(1-t)h(\beta)+th(\beta)+(1-t)h(\alpha)
=h(α)+h(β),\displaystyle=h(\alpha)+h(\beta)\,,

which is a contradiction.

The case of p=1p=1 is proven by approximating h(xy)=|xy|h(x-y)=|x-y| by a sequence of strictly convex functions, see [47]. Crucially, even though x,ySDx,y\in S_{D} and not on the real line, h(xy)h(x-y) is really a shorthand for hh operating on the geodesic distance (on the graph) between xx and yy, and hence the procedure generalizes to the graph.

We have proven item (1) in the lemma. Item (2) is analogous, but with interchanging xx and xx^{\prime} with yy and yy^{\prime}, respectively. Item (3) reduces to the standard analysis of the interval, since all four points lie on the same line.

Remark.

While it is preferable to have 0Z(f)0\in Z(f), it might not be possible; A sufficient condition is to have at least two sufficiently long edges. Up to a relabeling of the edges, we make the hypothesis that

|I1|,|I2|>c1/2c|I_{1}|,|I_{2}|>c_{1}/2c_{\infty}\,

i.e., all of the positive and negative mass could be allocated on I1I_{1} and I2I_{2}, respectively, in intervals adjacent to 0. This is a sufficient condition that avoids somewhat pathological star graphs for which 0 cannot be efficiently utilized as a nodal point. Such graphs might resemble an interval attached to many short edges at its end, see e.g., Fig. 9.

Refer to caption
Figure 9. A star with 66 edges, where L2,,L6εL_{2},\ldots,L_{6}\sim\varepsilon with ϵ<<1\epsilon<<1. However, since most of the edges are very short, this star should be almost equivalent to an interval, in terms of transport and mass allocation. In particular, for a function with a single nodal point (N=1N=1), that point cannot be the vertex 0.

Since the minimizers of Wp(f+,f)W_{p}(f_{+},f_{-}) on SDS_{D} shift mass only between adjacent sub-intervals (Lemma 11), we can now integrate our analysis of the single-point case (Lemma 9) and conclude that it is always preferable to have a nodal point on the vertex. Hence, minimizers have the following form:

  1. (1)

    The vertex is a nodal point, i.e., 0Z(f)0\in Z(f), and there exists r1,,rD0r_{1},\ldots,r_{D}\geq 0 such that ff on [0,rj]Ij[0,r_{j}]\subseteq I_{j} is non-negative for 1jM1\leq j\leq M, and non-positive for M+1jDM+1\leq j\leq D. The overall mass concentrated around 0 is therefore

    c~1cj=1Drj.\tilde{c}_{1}\equiv c_{\infty}\sum\limits_{j=1}^{D}r_{j}\,.
  2. (2)

    The other N1N-1 nodal points are “internal”, i.e., for each zZ(f)z_{\ell}\in Z(f) there exists 1jD1\leq j\leq D such that z]0,Lj[z_{\ell}\in]0,L_{j}[ and ff is an anti-symmetric step function around it with length d>0d_{\ell}>0, as in the interval case of Section 3.

Therefore, one needs to optimize only the positive step-widths r1,,rDr_{1},\ldots,r_{D} and d1,,dN1d_{1},\ldots,d_{N-1}. We summarize these results in the following theorem:

Theorem 13.

Suppose SDS_{D} has at least two edges longer than c1/2cc_{1}/2c_{\infty}. Question 1, i.e., minimizing Wpp(f+,f)W_{p}^{p}(f_{+},f_{-}) over X(c,c1,N,SD)X(c_{\infty},c_{1},N,S_{D}), is equivalent to finding r1,,rD,d1,,dN1>0r_{1},\ldots,r_{D},d_{1},\ldots,d_{N-1}>0 and a configuration function q:{1,,N1}{1,,D}q:\{1,\ldots,N-1\}\to\{1,\ldots,D\} which minimize

(24a) =1N1cdp+1+Wpp(g+,g),\sum\limits_{\ell=1}^{N-1}c_{\infty}d_{\ell}^{p+1}+W_{p}^{p}(g_{+},g_{-})\,,
where
(24b) g(x)cj=1M𝟙[0,rj](x)cj=M+1D𝟙[rj,0](x),g(x)\equiv c_{\infty}\sum\limits_{j=1}^{M}\mathbbm{1}_{[0,r_{j}]}(x)-c_{\infty}\sum\limits_{j=M+1}^{D}\mathbbm{1}_{[-r_{j},0]}(x)\,,
subject to the mass conservation conditions
(24c) {cj=1Mrj=cj=M+1Drj12c~1,c~1+2c=1N1d=c1,\left\{\begin{array}[]{cc}c_{\infty}\sum\limits_{j=1}^{M}r_{j}=c_{\infty}\sum\limits_{j=M+1}^{D}r_{j}\equiv\frac{1}{2}\tilde{c}_{1}\,,&\\ \tilde{c}_{1}+2c_{\infty}\sum\limits_{\ell=1}^{N-1}d_{\ell}=c_{1}\,,&\end{array}\right.
and the admissibility constraints
(24d) rj+2q()=jdLj,1jD.r_{j}+2\sum\limits_{q(\ell)=j}d_{\ell}\leq L_{j}\,,\qquad 1\leq j\leq D\,.

Note that, since gg is given explicitly by the equivalence established in Lemma 9, see (19), and since it is a function on the line, Wp(g+,g)W_{p}(g_{+},g_{-}) is given by the respective inverse CDFs as in (8). Therefore optimizing Wpp(g+,g)W_{p}^{p}(g_{+},g_{-}) involves a direct, closed form calculation (see below). Moreover, if the edges are long, i.e., Lj>c1/cL_{j}>c_{1}/c_{\infty} for any 1jD1\leq j\leq D, then the first constraint can be satisfied for any assignment of rjr_{j}’s and nodal point, i.e., for any choice of qq. Hence, (24) becomes computable in closed form without having to enumerate over all (N1)D(N-1)^{D} configuration functions qq.

We can also see what is the complication introduced by short edges - if e.g., LD0L_{D}\to 0, then effectively the number of effective edges becomes D1D-1 (in the sense that only vanishingly small mass can be assigned to IDI_{D}), thus changing the solution. Hence, obtaining a closed-form expression for (24) can be algebraically daunting. In fact, already for N=1N=1, where Z(f)={0}Z(f)=\{0\}, there are some complications. We will work out a few cases.

5.2. Long edges, 𝐍=𝟏{\bf N=1}

We find the minimizers of Wp(f+,f)W_{p}(f_{+},f_{-}) on fX(c,c~1,N=1,SD)f\in X(c_{\infty},\tilde{c}_{1},N=1,S_{D}) where all edges are sufficiently long.101010The notation of the L1L^{1} norm by c~1\tilde{c}_{1} here will be convenient when we consider N1N\geq 1 in the next section. As discussed above, it is always preferable to have the nodal point at 0. Moreover, it would always be better to distribute the L1L^{1} mass on all edges. This is because the L1L^{1} and LL^{\infty} constraints imply that “wasted” edges (on which fj=0f_{j}=0 everywhere) lead to transport over larger distances, and so to a more expensive optimal transport cost. Hence, Wp(f+,f)=Wp(g+,g)W_{p}(f_{+},f_{-})=W_{p}(g_{+},g_{-}), the equivalent function on the line given by (24a). Since the edges are sufficiently long, if ff has 1M<D1\leq M<D positive edges, then gg could be a general function with g=max(D,DM)c\|g\|_{\infty}=\max(D,D-M)c_{\infty}. Hence, by our analysis for the interval (Section 3), the optimal gg is of the form:

g(x)={(DM)c,x(r,0),Dc,x(0,r+),0,otherwise.g(x)=\left\{\begin{array}[]{ll}-(D-M)c_{\infty}\,,&x\in(-r_{-},0)\,,\\ Dc_{\infty}\,,&x\in(0,r_{+})\,,\\ 0\,,&{\rm otherwise}\,.\end{array}\right.

Crucially, an f:SDf:S_{D}\to\mathbb{R} to which this gg is equivalent is only possible when all of SDS_{D} edges (the different IjI_{j}’s) are sufficiently long. In that case, the corresponding ff will be of the form

(25) fj(x)={c,x(0,r+),1jM,c,x(0,r),M+1jD,0,otherwise,f_{j}(x)=\left\{\begin{array}[]{lll}c_{\infty}\,,&x\in(0,r_{+})\,,&1\leq j\leq M\,,\\ -c_{\infty}\,,&x\in(0,r_{-})\,,&M+1\leq j\leq D\,,\\ 0\,,&{\rm otherwise}\,,\end{array}\right.

where r+r_{+} and rr_{-} are determined by the mass conservation relation (24c), which reduces to

(26) Mr+=(DM)r.Mr_{+}=(D-M)r_{-}\,.

In these settings, minimizing Wp(f+,f)=Wp(g+,g)W_{p}(f_{+},f_{-})=W_{p}(g_{+},g_{-}) reduces to finding the optimal parameters r+,r>0r_{+},r_{-}>0 and 1MD1\leq M\leq D.

Proposition 14 (N=1N=1 on a star).

Suppose SDS_{D} has at least two edges longer than c1/2cc_{1}/2c_{\infty}. Then,

argminWpp(f+,f)overXs(c,c~1,1,SD),{\rm arg}\min W_{p}^{p}(f_{+},f_{-})\qquad\quad{\rm over}\qquad\quad X_{s}(c_{\infty},\tilde{c}_{1},1,S_{D})\,,

is given by (25) with M=D/2M=\lfloor D/2\rfloor, the lower integer part of D/2D/2.

Proof.

Since the equivalent gg (see Lemma 9) is a function on an interval, we can use the explicit formula in terms of the inverse CDFs, (8). By direct calculation, the inverse CDFs, defined on the interval [0,c~1/2][0,\tilde{c}_{1}/2], are G+1(t)=r+2c~11(t)G_{+}^{-1}(t)=r_{+}2\tilde{c}_{1}^{-1}(t) and G1(t)=r+r2c~11tG_{-}^{-1}(t)=-r_{-}+r_{-}2\tilde{c}_{1}^{-1}t. Assume without loss of generality that r+rr_{+}\geq r_{-}, and so α2c~11(r+r)0\alpha\equiv 2\tilde{c}_{1}^{-1}(r_{+}-r_{-})\geq 0. Hence,

Wpp(g+,g)\displaystyle W_{p}^{p}(g_{+},g_{-}) =0c~1/2[r+2c~11t(r+r2c~11t)]p𝑑t\displaystyle=\int\limits_{0}^{\tilde{c}_{1}/2}\left[r_{+}2\tilde{c}_{1}^{-1}t-(-r_{-}+r_{-}2\tilde{c}_{1}^{-1}t)\right]^{p}\,dt
(27) =0c~1/2[αt+r]p𝑑t0c~1/2[r]p.\displaystyle=\int\limits_{0}^{\tilde{c}_{1}/2}\left[\alpha t+r_{-}\right]^{p}\,dt\geq\int\limits_{0}^{\tilde{c}_{1}/2}\left[r_{-}\right]^{p}\,.

Hence, Wpp(g+,g)W_{p}^{p}(g_{+},g_{-}) is minimized when α=0\alpha=0, i.e., when r+=rr_{+}=r_{-}, which by the mass conservation relation (26) yields M=D/2M=D/2.

When DD is even, substitution of M=D/2M=D/2 into (27) yields

(28) DevenWpp(g+,g)=c~12rp=12Dp(c~1c)pc~1,r=r+=c1~c1D,D~{}{\rm even}\qquad\qquad W_{p}^{p}(g_{+},g_{-})=\frac{\tilde{c}_{1}}{2}r_{-}^{p}=\frac{1}{2D^{p}}\left(\frac{\tilde{c}_{1}}{c_{\infty}}\right)^{p}\tilde{c}_{1}\,,\qquad r_{-}=r_{+}=\frac{\tilde{c_{1}}}{c_{\infty}}\frac{1}{D}\,,

which is the same as in the single step function case for D=2D=2, see (13).

When DD is odd, M=D/2M=D/2 is not an integer, and therefore not an admissible choice. We claim that the optimal choices are either M=D/2=(D1)/2M=\lfloor D/2\rfloor=(D-1)/2 or M=D/2M=\lceil D/2\rceil. To see that, first note that the integrand in (27) is monotonic in α\alpha. Moreover, by the mass conservation relation (26), we get that

(29) α=α(M)=1c(1M1DM),\alpha=\alpha(M)=\frac{1}{c_{\infty}}\left(\frac{1}{M}-\frac{1}{D-M}\right)\,,

for any 1MD1\leq M\leq D. Hence, Wpp(g+,g)W_{p}^{p}(g_{+},g_{-}) is monotonic convex in MM near its minimum at D/2D/2, and the closest integer values, D/2\lfloor D/2\rfloor and D/2\lceil D/2\rceil, minimize the cost.

To find the minimal cost, first, by direct computation of the integral in (27), we get

(30) Wpp(g+,g)=1(p+1)α[(αc~12+r)p+1rp+1].W_{p}^{p}(g_{+},g_{-})=\frac{1}{(p+1)\alpha}\left[\left(\frac{\alpha\tilde{c}_{1}}{2}+r_{-}\right)^{p+1}-r_{-}^{p+1}\right]\,.

By substitution of M=D/2=(D1)/2M=\lfloor D/2\rfloor=(D-1)/2 into (29), we get that

α(D2)=1c(2D12D+1)=4c(D1)(D+1).\displaystyle\alpha\left(\lfloor\frac{D}{2}\rfloor\right)=\frac{1}{c_{\infty}}\left(\frac{2}{D-1}-\frac{2}{D+1}\right)=\frac{4}{c_{\infty}(D-1)(D+1)}\,.

Similarly, by the same constraint in (27), we have that

rcD+12=c~12r=c~1c(D+1),andr+=c~1c(D1).r_{-}c_{\infty}\frac{D+1}{2}=\frac{\tilde{c}_{1}}{2}\,\,\Longrightarrow r_{-}=\frac{\tilde{c}_{1}}{c_{\infty}(D+1)},\quad\text{and}\quad r_{+}=\frac{\tilde{c}_{1}}{c_{\infty}(D-1)}\,.

Substituting the expressions for α\alpha and rr_{-} into (30), we get

Wpp(g+,g)\displaystyle W_{p}^{p}(g_{+},g_{-}) =c(D1)(D+1)4(p+1)[(4c~12c(D1)(D+1)+c~1c(D+1))p+1(c~1c(D+1))p+1]\displaystyle=\frac{c_{\infty}(D-1)(D+1)}{4(p+1)}\left[\left(\frac{4\tilde{c}_{1}}{2c_{\infty}(D-1)(D+1)}+\frac{\tilde{c}_{1}}{c_{\infty}(D+1)}\right)^{p+1}-\left(\frac{\tilde{c}_{1}}{c_{\infty}(D+1)}\right)^{p+1}\right]
=14(p+1)(D+1)p(D1)p(c~1c)pc1~[(D+1)p+1(D1)p+1].\displaystyle=\frac{1}{4(p+1)(D+1)^{p}(D-1)^{p}}\left(\frac{\tilde{c}_{1}}{c_{\infty}}\right)^{p}\tilde{c_{1}}\left[(D+1)^{p+1}-(D-1)^{p+1}\right]\,.

To compare this expression to the even case (28), let us set p=1p=1. Then

(31) W1(g+,g)=D2(D+1)(D1)c~1cc1~=c~124D~c,W_{1}(g_{+},g_{-})=\frac{D}{2(D+1)(D-1)}\frac{\tilde{c}_{1}}{c_{\infty}}\tilde{c_{1}}=\frac{\tilde{c}_{1}^{2}}{4\tilde{D}c_{\infty}}\,,

with D~=(D+1)(D1)2D\tilde{D}=\frac{(D+1)(D-1)}{2D} as defined before. Since

12(D+1)14D~12(D1),\frac{1}{2(D+1)}\leq\frac{1}{4\tilde{D}}\leq\frac{1}{2(D-1)}\,,

we easily see that W1W_{1} lies in between the optimal cost for stars with D1D-1 and D+1D+1. This is consistent with the intuition that as DD increases, the cost decreases.

Either DD is even or odd, the overall transport cost is cheaper than the case of an internal point. This makes intuitive sense, because a nodal point on the vertex allows for more mass to be transported over a shorter distance, due to Lemma 9. ∎

5.3. Long edges, 𝐍𝟏{\bf N\geq 1}, 𝐃{\bf D} is even

In principle, now one can go to the general case of N1N\geq 1 and, using the minimization formulation (24), find the minimizers. Comparing the even and odd cases in (28) and (31), respectively, it seems that the case of DD even is more tractable.

The total cost of transport through the 0-vertex depends on a single parameter r=r+r_{-}=r_{+}, which is not yet determined. Equivalently, it depends on c~1\tilde{c}_{1}, the amount of mass concentrated around the vertex, with 0c~1c10\leq\tilde{c}_{1}\leq c_{1}. Since we assume that the edges are sufficiently long, then (based on our analysis for the interval in Section 3) the minimization problem (24) simplifies by d1==dN1=dintd_{1}=\cdots=d_{N-1}=d_{\rm int}. We therefore wish to minimize

(32) Wpp(f+,f)=(N1)cdintp+1+D2cr+p+1,W_{p}^{p}(f_{+},f_{-})=(N-1)c_{\infty}d_{\rm int}^{p+1}+\frac{D}{2}c_{\infty}r_{+}^{p+1}\,,

subject to

2(N1)dintc+Dr+c=c1.2(N-1)d_{\rm int}c_{\infty}+Dr_{+}c_{\infty}=c_{1}\,.

As in the case of the interval (Section 3), we use the method of Lagrange multipliers

=(N1)cdintp+1+D2cr+p+1λ(2(N1)dintc+Dr+cc1).\mathcal{L}=(N-1)c_{\infty}d_{\rm int}^{p+1}+\frac{D}{2}c_{\infty}r_{+}^{p+1}-\lambda(2(N-1)d_{\rm int}c_{\infty}+Dr_{+}c_{\infty}-c_{1})\,.

Then

0=dint=(p+1)(N1)cdintp2λ(N1)cdintp=2λp+1,0=\partial_{d_{\rm int}}\mathcal{L}=(p+1)(N-1)c_{\infty}d_{\rm int}^{p}-2\lambda(N-1)c_{\infty}\quad\Longrightarrow\quad d_{\rm int}^{p}=\frac{2\lambda}{p+1}\,,

and

0=r+=(p+1)D2cr+pλDcr+p=2λp+1.0=\partial_{r_{+}}\mathcal{L}=(p+1)\frac{D}{2}c_{\infty}r_{+}^{p}-\lambda Dc_{\infty}\quad\Longrightarrow\quad r_{+}^{p}=\frac{2\lambda}{p+1}\,.

In particular, r+=dintr_{+}=d_{\rm int}. The constraint associated with c1c_{1} then yields

dint=r+=121(N1)+D2c1cd_{\rm int}=r_{+}=\frac{1}{2}\frac{1}{(N-1)+\frac{D}{2}}\frac{c_{1}}{c_{\infty}}

and the minimal optimal transport cost is

Wp(f+,f)=c1cc11p21+1p1N~,N~N1+D2.W_{p}(f_{+},f_{-})=\frac{c_{1}}{c_{\infty}}\frac{c_{1}^{\frac{1}{p}}}{2^{1+\frac{1}{p}}}\frac{1}{\tilde{N}}\,,\qquad\tilde{N}\equiv N-1+\frac{D}{2}\,.

In comparison with Theorem 5 for the interval, we see that for a star with an even number of edges that are sufficiently long, NN is replaced by N~\tilde{N}, i.e., the vertex as a nodal point is equivalent to D/2D/2 nodal points on a line.

5.4. Long edges, 𝐍𝟏{\bf N\geq 1}, 𝐃{\bf D} is odd

In this case, the same argument yields a much more cumbersome expression, and so we resolve it only for p=1p=1. Lagrange multipliers method now yields

=(N1)cdint2+14D~c~12cλ(2(N1)dintc+c~1c1),for D~(D+1)(D1)2D.\mathcal{L}=(N-1)c_{\infty}d_{\rm int}^{2}+\frac{1}{4\tilde{D}}\frac{\tilde{c}_{1}^{2}}{c_{\infty}}-\lambda\left(2(N-1)d_{\rm int}c_{\infty}+\tilde{c}_{1}-c_{1}\right)\,,\quad\text{for }\tilde{D}\equiv\frac{(D+1)(D-1)}{2D}.

Here, since r+rr_{+}\neq r_{-}, it is easier to minimize the cost with respect to c~1\tilde{c}_{1}, the total L1L^{1} mass around the vertex. Then

0=c~1=12D~c~1cλ,0=\partial_{\tilde{c}_{1}}\mathcal{L}=\frac{1}{2\tilde{D}}\frac{\tilde{c}_{1}}{c_{\infty}}-\lambda\,,

and so

0=dint=2(N1)cdint2λ(N1)cdint=λ=12D~c~1c.0=\partial_{d_{\rm int}}\mathcal{L}=2(N-1)c_{\infty}d_{\rm int}-2\lambda(N-1)c_{\infty}\quad\Longrightarrow\quad d_{\rm int}=\lambda=\frac{1}{2\tilde{D}}\frac{\tilde{c}_{1}}{c_{\infty}}\,.

The L1L^{1} constraint now reads as

N1D~c~1cc+c~1=c1c~1=D~N1+D~c1.\frac{N-1}{\tilde{D}}\frac{\tilde{c}_{1}}{c_{\infty}}c_{\infty}+\tilde{c}_{1}=c_{1}\quad\Longrightarrow\quad\tilde{c}_{1}=\frac{\tilde{D}}{N-1+\tilde{D}}c_{1}\,.

As a sanity check, we see that c~1c1\tilde{c}_{1}\to c_{1} as DD\to\infty. We note that, in this case, the sharp lower bound for fX(c,c1,N,SD)f\in X(c_{\infty},c_{1},N,S_{D}) reads like

W1(f+,f)\displaystyle W_{1}(f_{+},f_{-}) (N1)cdint2+14D~c~12c\displaystyle\geq(N-1)c_{\infty}d_{\rm int}^{2}+\frac{1}{4\tilde{D}}\frac{\tilde{c}_{1}^{2}}{c_{\infty}}
c124(N1+D~)c\displaystyle\geq\frac{c_{1}^{2}}{4(N-1+\tilde{D})c_{\infty}}
(33) 14N~c12c,forN~N1+D~.\displaystyle\geq\frac{1}{4\tilde{N}}\frac{c_{1}^{2}}{c_{\infty}}\,,\qquad\text{for}\quad\tilde{N}\equiv N-1+\tilde{D}\,.

Compared with the case where the number of edges is even (see Sec. 5.3), D~\tilde{D} may be viewed as the effective multiplicity of the nodal point at the vertex.

The key difference between the lower bounds on W1W_{1} for stars and the lower bounds obtained for non-branching spaces (the interval, the circle, and general dd-dimensional surfaces in (2), see [16, 17, 45]), is that here the lower bound does not admit a multiplicative ‘uncertainty principle’-like structure of 4W1Nc1cc14W_{1}\cdot N\geq\frac{c_{1}}{c_{\infty}}c_{1}, where AA depends on the domain and norms of ff. Nevertheless, for the cases in Sections 5.3 and 5.4, we may still say that the structure is preserved with an effective NN that is replaced by N~\tilde{N} to account for the particular geometry of the nodal point at the vertex of the star. However, discussions below imply that the precise characterization of N~\tilde{N} could be much more involved than only the degree DD with short edges present.

5.5. 𝐃=𝟑{\bf D=3} with one short edge, 𝐍𝟏{\bf N\geq 1} and 𝐩=𝟏{\bf p=1}

In the long-edge cases discussed above, the minimization problem given in Theorem 13 is assumed to have a solution in the interior of the constraint set (24d). To see further how the geometry affects the conclusion on the minimal optimal transport cost, we consider the special case of D=3D=3, with two long edges and a third short one as an illustrative example. Our goal is to see how the limit L30+L_{3}\to 0^{+} interpolates between S3S_{3} with three long edges, and S2[L1,L2]S_{2}\cong[-L_{1},L_{2}].

First, note that to optimize the transport cost in the present case, it remains advantageous to include the node of the star in Z(f)Z(f). Moreover, the maximal amount of mass should be assigned to the short edge if its length is sufficiently small. Meanwhile, away from the node, we retain the symmetric structure around the other points in Z(f)Z(f) that are located on the long edges. Near those points, we utilize earlier discussion and use dintd_{int} to represent the length of the subintervals on which the functions are taken to be ±c\pm c_{\infty} with the total optimal transport cost given by (N1)cdint2(N-1)c_{\infty}d_{int}^{2}.

As before, to get an explicit expression for (24a), we need to work out W1(g+,g)W_{1}(g_{+},g_{-}) as a function of the overall mass assigned to the vertex, c~1\tilde{c}_{1}. Assume without loss of generality that f0f\geq 0 on I1I_{1} and f0f\leq 0 on I2I_{2} and I3I_{3} near the node. For L31L_{3}\ll 1, then the maximal amount of mass should be assigned to I3I_{3}, i.e., r3=L3r_{3}=L_{3} and f(x)=cf(x)=-c_{\infty} on I3I_{3}. By the consideration above, we have that the equivalent g:g:\mathbb{R}\to\mathbb{R} is of the form

g(x)={cx(r1,0),2cx(0,L3),cx(L3,r2),0otherwise.g(x)=\left\{\begin{array}[]{ll}c_{\infty}&x\in(-r_{1},0)\,,\\ -2c_{\infty}&x\in(0,L_{3})\,,\\ -c_{\infty}&x\in(L_{3},r_{2})\,,\\ 0&{\rm otherwise}\,.\end{array}\right.

for two undetermined constants r1,r2>0r_{1},r_{2}>0. For p=1p=1, we can use the CDF counterpart of (8), W1(g+,g)=G+GL1()W_{1}(g_{+},g_{-})=\|G_{+}-G_{-}\|_{L^{1}(\mathbb{R})}, where G±G_{\pm} are the CDFs of g±g_{\pm}. By direct computation

G+(t)G(t)={c(t+r1)t(r1,0),cr12ctt(0,L3),cr12cL3ctx(L3,r2).G_{+}(t)-G_{-}(t)=\left\{\begin{array}[]{ll}c_{\infty}(t+r_{1})&t\in(-r_{1},0)\,,\\ c_{\infty}r_{1}-2c_{\infty}t&t\in(0,L_{3})\,,\\ c_{\infty}r_{1}-2c_{\infty}L_{3}-c_{\infty}t&x\in(L_{3},r_{2})\,.\end{array}\right.

Since the overall mass conservation (24c) reduces to r1=r2+L3=c~1/2cr_{1}=r_{2}+L_{3}=\tilde{c}_{1}/2c_{\infty}, we have that

W1(g+,g)\displaystyle W_{1}(g_{+},g_{-}) =G+GL1()\displaystyle=\|G_{+}-G_{-}\|_{L^{1}(\mathbb{R})}
=14c~1r1+14c~1r2+14c~1L3r2cL3\displaystyle=\frac{1}{4}\tilde{c}_{1}r_{1}+\frac{1}{4}\tilde{c}_{1}r_{2}+\frac{1}{4}\tilde{c}_{1}L_{3}-r_{2}c_{\infty}L_{3}
=c~124cr2cL3\displaystyle=\frac{\tilde{c}_{1}^{2}}{4c_{\infty}}-r_{2}c_{\infty}L_{3}
=c~124c(c~12cL3)cL3.\displaystyle=\frac{\tilde{c}_{1}^{2}}{4c_{\infty}}-\left(\frac{\tilde{c}_{1}}{2c_{\infty}}-L_{3}\right)c_{\infty}L_{3}\,.

Putting these considerations together, the minimization problem in (24) reduces to

W1(f+,f)\displaystyle W_{1}(f_{+},f_{-}) =(N1)cdint2+c~124c(c~12cL3)cL3\displaystyle=(N-1)c_{\infty}d_{\rm int}^{2}+\frac{\tilde{c}_{1}^{2}}{4c_{\infty}}-\left(\frac{\tilde{c}_{1}}{2c_{\infty}}-L_{3}\right)c_{\infty}L_{3}
=c[(N1)dint2+(c~12cL32)2+34L32],\displaystyle=c_{\infty}\left[(N-1)d_{\rm int}^{2}+\left(\frac{\tilde{c}_{1}}{2c_{\infty}}-\frac{L_{3}}{2}\right)^{2}+\frac{3}{4}L_{3}^{2}\right]\,,

subject to the L1L^{1} mass conservation constraint

2(N1)dint+c~1c=c1c.2(N-1)d_{\rm int}+\frac{\tilde{c}_{1}}{c_{\infty}}=\frac{c_{1}}{c_{\infty}}\,.

Thus, for βL3c/c1\beta\equiv L_{3}c_{\infty}/c_{1}, we get the optimal transport cost

(34) W1(f+,f)c[14N(c1cL3)2+34L32]=c124c(1β)2+3Nβ2NW_{1}(f_{+},f_{-})\geq c_{\infty}\left[\frac{1}{4N}\left(\frac{c_{1}}{c_{\infty}}-L_{3}\right)^{2}+\frac{3}{4}L_{3}^{2}\right]=\frac{c_{1}^{2}}{4c_{\infty}}\frac{(1-\beta)^{2}+3N\beta^{2}}{N}

with equality for

dint=12N(c1cL3), and c~1c=c1Nc+(N1)L3N.d_{\rm int}=\frac{1}{2N}\left(\frac{c_{1}}{c_{\infty}}-L_{3}\right),\;\text{ and }\;\frac{\tilde{c}_{1}}{c_{\infty}}=\frac{c_{1}}{Nc_{\infty}}+\frac{(N-1)L_{3}}{N}\,.

β\beta as an interpolation parameter between S3S_{3} and an interval.

We can understand these expressions between the two limits of either shortening or lengthening I3I_{3}, i.e., as the star graph deforms into an interval with L30+L_{3}\to 0^{+} on one hand and as third edge becomes long enough for us to recover the case of three long edges, as described in Sections 5.2 and 5.4. Note that our analysis involving the short edge holds precisely until the case where there is no distinction between I2I_{2} and I3I_{3} in terms of mass allocation. By the results in Section 5.4, the optimal mass around the origin for D=3D=3 is c~1=4c1/(3N+1)\tilde{c}_{1}=4c_{1}/(3N+1). Since near the vertex, L3L_{3} and L2L_{2} both contain the support of ff_{-}, the threshold length for L3L_{3} to enable the optimal c~1\tilde{c}_{1} thus corresponds to β=cL3/c1=c~1/4c1=1/(3N+1)\beta^{*}=c_{\infty}L_{3}/c_{1}=\tilde{c}_{1}/4c_{1}=1/(3N+1). Indeed, we may rewrite the right hand side of (34) to get an equivalent form given by

W1(f+,f)c124c3N+1N(β13N+1)2+c124c(N+13).\displaystyle W_{1}(f_{+},f_{-})\geq\frac{c_{1}^{2}}{4c_{\infty}}\frac{3N+1}{N}\left(\beta-\frac{1}{3N+1}\right)^{2}+\frac{c_{1}^{2}}{4c_{\infty}(N+\frac{1}{3})}\,.

The lower bound is clearly a decreasing function of β\beta for β[0,1/(3N+1)]\beta\in[0,1/(3N+1)], hence interpolating, as expected, between the lower bound derived in (16) for p=1p=1 and β=0\beta=0 (i.e., D=2D=2 for the case of an interval) and that given in (33) with D=3D=3 and D~=43\tilde{D}=\frac{4}{3}.

Meanwhile, for small but nonzero L3L_{3}, the non-multiplicative nature of the lower bound on the right hand side of (34) is evident. Compared to the cases in Sections 5.3 and 5.4, we see that the multiplicative form is lost not only in terms of NN but also with respect to c1c\frac{c_{1}}{c_{\infty}}.

Remark.

What can be said on stars such as in Fig. 9, for which Theorem 13 does not apply? We outline the strategy for their analysis: first, the equivalence relation in Lemma 9 can be extended to a single nodal point near the vertex, on the long edge. The adjacency and monotonicity arguments (Lemmas 11 and 12) seem to go through as well. Then, the optimization problem in Theorem 13 should be extended by adding another parameter to represent the location of that “special” nodal point, which is near the vertex. Finally, we remark that the condition for a long edge being c1/2c\geq c_{1}/2c_{\infty} is sufficient, but not necessarily sharp. The sharpness of this lower bound would become one interesting element in the study of Question 1 on general metric graphs.

6. Outlook

An important message of this work is that the original problem of minimizing the transport cost over the specified function class is equivalent to a generalized minimal surface problem associated with optimal domain partition. For one-dimensional domains, minimizing the cost reduces to optimally positioning the nodal points and locating masses around them. We believe that this approach can be generalized to more challenging cases, some of which are discussed below.

Multidimensional domains

In an analogous way to the one-dimensional settings of this study, it seems that on bounded and regular domain Ωd\Omega\subseteq\mathbb{R}^{d}, the minimizers will be step functions concentrated near the interfaces between f+f_{+} and ff_{-}. If this is indeed the case, the key remains to be the solution to the minimization problem

(35) minfXs(1,c1/c,N,Ω)Wp(f+,f).\min\limits_{f\in X_{s}(1,c_{1}/c_{\infty},N,\Omega)}W_{p}(f_{+},f_{-}).

It should be commented that, for step functions, the existence of a lower bound ((2) [16, 17, 45]) combined with nucleation arguments such as [41, Lemma 2.8], should yield the existence of minimizers. An element in Xs(1,c1/c,N,Ω)X_{s}(1,c_{1}/c_{\infty},N,\Omega) corresponds to a unique partition of Ω\Omega into three disjoint subsets Ω+,Ω,Ω0Ω\Omega_{+},\Omega_{-},\Omega_{0}\subset\Omega, where Ω0=Ω(Ω+Ω)\Omega_{0}=\Omega\setminus(\Omega_{+}\cup\Omega_{-}), such that

|Ω|=|Ω+|=c12,d1(Γ)=N,ΓΩ+Ω,|\Omega_{-}|=|\Omega_{+}|=\frac{c_{1}}{2}\,,\qquad\mathcal{H}^{d-1}(\Gamma)=N\,,\qquad\Gamma\equiv\partial\Omega_{+}\cap\partial\Omega_{-}\,,

assuming that the partitions are sufficiently regular such that d1(Γ)\mathcal{H}^{d-1}(\Gamma) is well defined. Denoting the set of such partitions by Y(c1,N,Ω)Y(c_{1},N,\Omega), one may attempt a similar approach to the one presented in this work: reducing the functional minimization problem (35) to a purely geometric one:

(36) minfX(1,c1,N,Ω)Wp(f+,f)=min(Ω,Ω+,Ω0)Y(c1,N,Ω)Wp(𝟙Ω+,𝟙Ω)\min\limits_{f\in X(1,c_{1},N,\Omega)}W_{p}(f_{+},f_{-})=\min\limits_{(\Omega_{-},\Omega_{+},\Omega_{0})\in Y(c_{1},N,\Omega)}W_{p}(\mathbbm{1}_{\Omega_{+}},\mathbbm{1}_{\Omega_{-}})

Clearly, the latter becomes an optimal partition problem or a minimal surface problem. That is, the problem under consideration may be formulated as an equivalent geometric question.

Question 15.

For a bounded and regular domain Ωd\Omega\subset\mathbb{R}^{d}, p1p\geq 1, c1>0c_{1}>0, and N>0N>0, what are the optimal partitions (Ω,Ω+,Ω0)(\Omega_{-},\Omega_{+},\Omega_{0}) of Ω\Omega and the corresponding minimum value associated with the problem

(37) min(Ω,Ω+,Ω0)Y(c1,N,Ω)Wp(𝟙Ω+,𝟙Ω)\min\limits_{(\Omega_{-},\Omega_{+},\Omega_{0})\in Y(c_{1},N,\Omega)}W_{p}(\mathbbm{1}_{\Omega_{+}},\mathbbm{1}_{\Omega_{-}})

This draws interesting connections to possibly other optimal domain partition problems such as those related to different optimal transport minimization problems, e.g., partitions defined by the optimal quantization (centroidal Voronoi tessellations) [25, 38]. One may find more discussions on other optimal domain partition problems in [12]. A variation of Question 15 considered in [13, 41, 42, 52] is to modify the set of feasible partitions by removing the constraint on the prescribed measure of the nodal, and consider the minimization of |Ω+|+λW1(𝟙Ω+,𝟙Ω)|\partial\Omega_{+}|+\lambda W_{1}(\mathbbm{1}_{\Omega_{+}},\mathbbm{1}_{\Omega_{-}}) where |Ω+||\partial\Omega_{+}| is the Hausdorff measure of the boundary set of Ω+\Omega_{+} and λ>0\lambda>0 is a penalty constant.

The solution to the purely geometric Question 15 can be used to offer a new perspectives on the existing studies concerning (2) in multiple-dimensions [48, 45, 16, 17] and to make further connections with the bilayer membrane limits [42, 34]. In particular, the optimal transport problem admits a decomposition into a continuous family of one-dimensional optimal-transport problems. One of the main challenges is that, as we seek to solve (37), this decomposition changes as well. Based on the discussion on the one dimensional examples, it is obvious that the optimal partitions may not be unique in general, even though they yield the same optimal transport cost. While a complete solution is beyond the scope of this work, one might conjecture that Γ\Gamma is made of piecewise planar surfaces, which are the conventional minimal surfaces in the Euclidean space.

Metric graphs and Laplacian eigenfunctions

This work is a starting point for the study of the Wasserstein minimization problems on metric graphs, for which the star SDS_{D} (Section 5) is perhaps the most basic nontrivial example. Our analysis for SDS_{D} suggests what might be the key challenges for general metric graphs - first, that the associated minimization problems might be algebraically complicated, and second, that the equivalence principles (Lemma 9) are not easily extended to general graphs. The second key challenge is to find a monotonicity-like property which generalizes Lemma 12 to non-star graphs.

While the study of optimal transport on metric graphs is relatively new [26], the study of nodal sets of special functions on metric graphs is a well established field, see e.g., [1, 5, 6, 9, 19, 31, 32], as well as [8] and the references therein. In its heart, the question is a natural extension of Sturm’s Oscillations Theory: given the NN-th Laplacian eigenfunction on a certain metric graph, how many zeroes does it have?

Independently, uncertainty principles for the Wasserstein distance such as (16) were applied to generalize Sturm and Sturm-Hurwitz (sums of Laplacian eigenfunctions) type results in dimensions d1d\geq 1 [16, 17, 45, 48]. Generally speaking, the approach proceeds in two steps: first, prove a lower bound of the type (16), and then, obtain an upper bound on W(f+,f)W(f_{+},f_{-}) for the specialized type of functions under consideration, e.g., Laplacian eigenfunction. The current work might therefore open the way of connecting these two lines of works, though many open challenges remain before such a connection is established.

Minimization over more specialized function classes

Concerning the possible connection to properties of Laplacian eigenfunctions, related studies on lower bounds and uncertainty-principles of Laplacian eigenfunctions on Riemannian manifolds (and RCD{\rm RCD} spaces in general) can be found in [21, 40, 49]. One may note that the key to get the new lower bounds like (16) presented in this work is through the study of the minimization problem over the function class defined by, e.g., (3). The latter class is large enough to allow minimizers taking on the form of step functions. On the other hand, eigenfunctions of elliptic operators are smooth. Thus, another natural research direction to explore is the study of possibly different bounds, associated with similar minimization problems, but over classes of functions that are more regular or of special forms. In particular, the special forms may correspond to discrete function spaces, and the problem of passing from discretizations of LL^{\infty} functions, where the minimization problem is finite-dimensional, to the continuum limit (studied in this paper), might be of independent interest.

Functional analytic perspective

Inequalities such as (2) and (16) can be thought of as interpolation inequalities. On the lower-bound side, there are the L1L^{1} and LL^{\infty} norms involving no derivatives. The upper bound consists of two terms, which can be connected to derivative norms of different orders, respectively: first, |Z(f)||Z(f)| is smaller than D𝟙supp(f+)1\|D\mathbbm{1}_{{\rm supp}(f_{+})}\|_{1}, where the derivative DD is taken in the sense of functions of bounded variations [27, 30]; then, Wp(f+,f)W_{p}(f_{+},f_{-}) can be viewed as norms of derivatives of negative order, e.g., the W2W_{2} distance is related to the standard Sobolev H˙1\dot{H}^{-1} norm; See [33, 43] for details.

While this functional-analytic perspective is not used in this paper, it can lead to the exploration of more general variational problems of the type Question 1, e.g., by allowing for constraints not only in L1L^{1} and LL^{\infty}, but in other norms such as LpL^{p} norms for 1<p<1<p<\infty. Moreover, understanding (2) as an interpolation inequality, (16) can be thought of as a sharp interpolation inequality with optimal constants, of which an extensive literature exists [22, 23].

Acknowledgments

The authors would like to thank Ram Band for many invigorating discussions and help provided along the way, as well as for suggesting to study this minimization problem on metric graph and highlighting some of the challenges in that area. The authors would also like to thank Raghavendra Venkatraman for bringing to our attention an issue with one of the definitions in a previous version of this manuscript. Many thanks to Fabio Cavalletti and Tim Laux for useful comments and references. Thanks also to Zirui Xu for bringing references [34, 42] to our attention. The research of Q.D. is supported in part by NSF DMS-1937254, DMS-2012562 and CCF-1704833. A.S. acknowledges the support of the AMS-Simons Travel Grant, and is supported in part by Simons Foundation Math + X Investigator Award #376319 (Michael I. Weinstein).

Data availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

References

  • [1] Lior Alon, Ram Band, and Gregory Berkolaiko, Nodal statistics on quantum graphs, Communications in Mathematical Physics 362 (2018), no. 3, 909–948.
  • [2] Luigi Ambrosio and Simone Di Marino, Equivalent definitions of bv space and of total variation on metric measure spaces, Journal of Functional Analysis 266 (2014), no. 7, 4150–4188.
  • [3] Luigi Ambrosio, Nicola Gigli, Andrea Mondino, and Tapio Rajala, Riemannian ricci curvature lower bounds in metric measure spaces with σ\sigma-finite measure, Transactions of the American Mathematical Society 367 (2015), no. 7, 4661–4701.
  • [4] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré, Metric measure spaces with riemannian ricci curvature bounded from below, Duke Mathematical Journal 163 (2014), no. 7, 1405–1490.
  • [5] Ram Band, The nodal count {\{0, 1, 2, 3,…}\} implies the graph is a tree, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 372 (2014), no. 2007, 20120504.
  • [6] Ram Band, Gregory Berkolaiko, Hillel Raz, and Uzy Smilansky, The number of nodal domains on quantum graphs as a stability index of graph partitions, Communications in Mathematical Physics 311 (2012), no. 3, 815–838.
  • [7] E Barron, M Bocea, and R Jensen, Duality for the ll^{\infty} optimal transport problem, Transactions of the American Mathematical Society 369 (2017), no. 5, 3289–3323.
  • [8] Gregory Berkolaiko and Peter Kuchment, Introduction to quantum graphs, no. 186, American Mathematical Soc., 2013.
  • [9] Gregory Berkolaiko and Tracy Weyand, Stability of eigenvalues of quantum graphs with respect to magnetic perturbation and the nodal count of the eigenfunctions, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 372 (2014), no. 2007, 20120522.
  • [10] Stefano Bianchini and Fabio Cavalletti, The Monge problem for distance cost in geodesic spaces, Communications in Mathematical Physics 318 (2013), no. 3, 615–673.
  • [11] Yann Brenier, Polar factorization and monotone rearrangement of vector-valued functions, Communications on pure and applied mathematics 44 (1991), no. 4, 375–417.
  • [12] Dorin Bucur, Giuseppe Buttazzo, and Antoine Henrot, Existence results for some optimal partition problems, Advances in Mathematical Sciences and Applications 8 (1998), 571–579.
  • [13] Giuseppe Buttazzo, Guillaume Carlier, and Maxime Laborde, On the Wasserstein distance between mutually singular measures, Advances in Calculus of Variations 13 (2020), no. 2, 141–154.
  • [14] Luis Caffarelli, Mikhail Feldman, and Robert McCann, Constructing optimal maps for Monge’s transport problem as a limit of strictly convex costs, Journal of the American Mathematical Society 15 (2002), no. 1, 1–26.
  • [15] Jules Candau-Tilh and Michael Goldman, Existence and stability results for an isoperimetric problem with a non-local interaction of wasserstein type, ESAIM: Control, Optimisation and Calculus of Variations 28 (2022), 37.
  • [16] Tom Carroll, Xavier Massaneda, and Joaquim Ortega-Cerdà, An enhanced uncertainty principle for the Vaserstein distance, Bulletin of the London Mathematical Society 52 (2020), no. 6, 1158–1173.
  • [17] Fabio Cavalletti and Sara Farinelli, Indeterminacy estimates and the size of nodal sets in singular spaces, Advances in Mathematics 389 (2021), 107919.
  • [18] Thierry Champion, Luigi De Pascale, and Petri Juutinen, The \infty-wasserstein distance: Local solutions and existence of optimal transport maps, SIAM Journal on Mathematical Analysis 40 (2008), no. 1, 1–20.
  • [19] Yves Colin de Verdière, Magnetic interpretation of the nodal defect on graphs, Analysis & PDE 6 (2013), no. 5, 1235–1242.
  • [20] Nicolò De Ponti and Sara Farinelli, Eigenfunctions and a lower bound on the wasserstein distance, arXiv preprint arXiv:2104.12097 (2021).
  • [21] by same author, Indeterminacy estimates, eigenfunctions and lower bounds on wasserstein distances, Calculus of Variations and Partial Differential Equations 61 (2022), no. 4, 1–17.
  • [22] Manuel Del Pino and Jean Dolbeault, Best constants for gagliardo–nirenberg inequalities and applications to nonlinear diffusions, Journal de Mathématiques Pures et Appliquées 81 (2002), no. 9, 847–875.
  • [23] by same author, The optimal euclidean lp-sobolev logarithmic inequality, Journal of Functional Analysis 197 (2003), no. 1, 151–161.
  • [24] Julie Delon, Julien Salomon, and Andrei Sobolevski, Fast transport optimization for monge costs on the circle, SIAM Journal on Applied Mathematics 70 (2010), no. 7, 2239–2258.
  • [25] Qiang Du, Vance Faber, and Max Gunzburger, Centroidal Voronoi tessellations: Applications and algorithms, SIAM review 41 (1999), no. 4, 637–676.
  • [26] Matthias Erbar, Dominik Forkert, Jan Maas, and Delio Mugnolo, Gradient flow formulation of diffusion equations in the Wasserstein space over a metric graph, arXiv preprint arXiv:2105.05677 (2021).
  • [27] Lawrence C Evans and Ronald F Garzepy, Measure theory and fine properties of functions, Routledge, 2018.
  • [28] Mikhail Feldman and Robert McCann, Monge’s transport problem on a Riemannian manifold, Transactions of the American Mathematical Society 354 (2002), no. 4, 1667–1697.
  • [29] Wilfrid Gangbo and Robert J McCann, The geometry of optimal transportation, Acta Mathematica 177 (1996), no. 2, 113–161.
  • [30] Enrico Giusti and Graham Hale Williams, Minimal surfaces and functions of bounded variation, vol. 80, Springer, 1984.
  • [31] Sven Gnutzmann, Uzy Smilansky, and Joachim Weber, Nodal counting on quantum graphs, Waves in random media 14 (2003), no. 1, S61.
  • [32] Matthias Hofmann, James B Kennedy, Delio Mugnolo, and Marvin Plümer, On pleijel’s nodal domain theorem for quantum graphs, Annales Henri Poincaré, Springer, 2021, pp. 1–30.
  • [33] Grégoire Loeper, Uniqueness of the solution to the vlasov–poisson system with bounded density, Journal de mathématiques pures et appliquées 86 (2006), no. 1, 68–79.
  • [34] Luca Lussardi, Mark A Peletier, and Matthias Röger, Variational analysis of a mesoscale model for bilayer membranes, Journal of Fixed Point Theory and Applications 15 (2014), no. 1, 217–240.
  • [35] Jose Mazón, Julio D Rossi, and Julian Toledo, Optimal mass transport on metric graphs, SIAM Journal on Optimization 25 (2015), no. 3, 1609–1632.
  • [36] Robert J McCann, Exact solutions to the transportation problem on the line, Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 455 (1999), no. 1984, 1341–1380.
  • [37] by same author, Polar factorization of maps on Riemannian manifolds, Geometric & Functional Analysis GAFA 11 (2001), no. 3, 589–608.
  • [38] Quentin Mérigot, Filippo Santambrogio, and Clément Sarrazin, Non-asymptotic convergence bounds for wasserstein approximation using point clouds, Advances in Neural Information Processing Systems 34 (2021), 12810–12821.
  • [39] Michele Miranda Jr, Functions of bounded variation on “good” metric spaces, Journal de mathématiques pures et appliquées 82 (2003), no. 8, 975–1004.
  • [40] Mayukh Mukherjee, A sharp wasserstein uncertainty principle for laplace eigenfunctions, arXiv preprint arXiv:2103.11633 (2021).
  • [41] Michael Novack, Ihsan Topaloglu, and Raghavendra Venkatraman, Least wasserstein distance between disjoint shapes with perimeter regularization, Journal of Functional Analysis 284 (2023), no. 1, 109732.
  • [42] Mark A Peletier and Matthias Röger, Partial localization, lipid bilayers, and the elastica functional, Archive for rational mechanics and analysis 193 (2009), no. 3, 475–537.
  • [43] Rémi Peyre, Comparison between w2 distance and H˙1\dot{H}^{-1} norm, and localization of wasserstein distance, ESAIM: Control, Optimisation and Calculus of Variations 24 (2018), no. 4, 1489–1501.
  • [44] Tapio Rajala and Karl-Theodor Sturm, Non-branching geodesics and optimal maps in strong cd(k,)cd(k,\infty)-spaces, Calculus of Variations and Partial Differential Equations 50 (2014), no. 3, 831–846.
  • [45] Amir Sagiv and Stefan Steinerberger, Transport and interface: an uncertainty principle for the Wasserstein distance, SIAM Journal on Mathematical Analysis 52 (2020), no. 3, 3039–3051.
  • [46] T Salvemini, Sul calcolo degli indici di concordanza tra due caratteri quantitativi, Atti della VI Riunione della Soc. Ital. di Statistica (1943).
  • [47] Filippo Santambrogio, Optimal transport for applied mathematicians, vol. 55, Springer, 2015.
  • [48] Stefan Steinerberger, A metric sturm–liouville theory in two dimensions, Calculus of Variations and Partial Differential Equations 59 (2020), no. 1, 1–14.
  • [49] by same author, Wasserstein distance, fourier series and applications, Monatshefte für Mathematik 194 (2021), no. 2, 305–338.
  • [50] SS Vallender, Calculation of the Wasserstein distance between probability distributions on the line, Theory of Probability & Its Applications 18 (1974), no. 4, 784–786.
  • [51] Cédric Villani, Topics in optimal transportation, no. 58, American Mathematical Soc., 2003.
  • [52] Qinglan Xia and Bohan Zhou, The existence of minimizers for an isoperimetric problem with Wasserstein penalty term in unbounded domains, Advances in Calculus of Variations (2021).